# Evaluation of the Routing Algorithms with Reduced **Energy Consumption** lle Dimitrievski, Valentin Mollov stract: This paper presents an evaluation of a special routing algorithm with energy reduced onsumption in comparison to one without any energy consumption optimization. We apply Noxim discrete event simulator specialized for simulation of Network on Chip (NoC) architectures. Evaluation is made over a 7x7 2D mesh structure which is the most used model nowadays. Comparison between the popular state-of-the-art XY routing algorithm and DyAD one is given, as well. DyAD routing algorithm for NoC combines the advantages of both deterministic and adaptive routing algorithms. Calculations on the above algorithms are performed by means of the Noxim simulator with examples on 2D mesh topology. Key words: Adaptive, deterministic, DyAD, Network on Chip, power consumption, random selection, routing algorithms, 2D mesh topology. #### 1. INTRODUCTION ps" Users view their om 1 to will be Today, into the billion transistors' era, many application specific chips performing advanced functions are used in the everyday smart devices. NoC (Network on Chip) has become the most used paradigm for performing such advanced functions like image and signal processing, coding and decoding of digital video frames with or without compression, etc. The problem of the efficient routing between each individual core in such smart powerful encoding or decoding digital video engine becomes an important issue as it concerns the energy consumption. These powerful NoC engines are an essential part of almost every modern commercial application like image cameras, smart phones, etc. Generally, two types of routing algorithms are used in the NoC architectures - one is adaptive, another is deterministic, while the new DyAD (Dynamically Adaptive Deterministic) is applied in the NoC with reduced power consumption. ## 2. PROBLEM FORMULATION (ROUTING SCHEMES) Application of the XY algorithm (Fig.1) which is deterministic routing algorithm has the advantages of simplicity, deadlock free, and livelock free [1]. Unfortunately, the XY algorithm does not have power consumption minimization due to the application of non-adaptive approach. The advantage of deterministic routing algorithm logic is simple and can provide low latency only when the network is not congested [2]. If the Packed Injection Ratio (PIR) is increased, deterministic routers start to suffer from network degradation as they are unable to respond dynamically to the network congestion. The adaptive routing approach avoids congested links by using alternative routing paths and that leads to higher throughput rate. Adaptively routing should have extra embedded routing logic circuitry which must take decisions on the good routing path. Fig. 1. XY routing scheme (algorithm) New algorithms are suggested to perform characteristic of adaptively algorithms and state-ofthe-art deterministic routing algorithms. In this paper algorithm with dynamic adaptively and deterministic characteristic will be used to perform power consumption reduction, shown on Fig.2. This routing algorithm routing procedure is dependable of the current network congestion. Here, the working principle is that near each network unit (CPU, FPGA, DSP), a separate router is attached to perform local network monitoring of the traffic and to take decisions based on the retrieved information about the traffic flow. There are two possible states of routing: is case the network is not congested a DyAD router is working in deterministic mode and this mode shows low latency enabled by this mode of operation. When the network is congested, DyAD router is working in adaptively routing mode and thus avoids congested links by exploiting other routing paths, which also leads to high network throughput. It is highly desirable when NoC-based applications are designed. A valid Fig. 2. DyAD routing scheme approach which is livelock and deadlock free is proposed [1] when combining deterministic with adaptive routing algorithms. ## 3. DyAD PRACTICAL ROUTER IMPLEMENTATION DyAD is a new paradigm which exploits the advantages of deterministic and adaptive routing techniques. Any deterministic and adaptive routing algorithm can be combined to form a DyAD router. ## 3.1 Platform description We consider the most frequently used 2D mesh platform - Fig.3. In this regular architecture each tile is composed of a processing element PE and a router. The router embedded onto each tile is connected to four neighboring tiles and its local PE via channels. Each channel consists of two directional point-to-point links between two routers or a router and PE. Fig. 3. DyAD test 7x7 routing platform on 2D mesh newtork Because of the limited silicon on-chip resources and the low-latency requirements for typical NoC applications, wormhole switching is used as the switching scheme for the on-chip routers. Under this scheme, a packet is split into so called flits (flow control digits) which are then routed in a pipelined fashion. To minimize the implementation cost, the onchip network should be implemented with very little area overhead. Thus, instead of having huge memories (e.g. SRAM or DRAM) as buffered space, it is more reasonable to use registers as memory buffer elements. A 7x7 crossbar switch is used as switching fabrics because of its nice cost/performance trade-offs for switches with small numbers of ports. In the conducted research presented here, we use XY routing scheme to representative deterministic routing because of its simplicity and wide popularity. XY routing scheme is minimal path routing algorithm and is free of deadlock and livelock [2]. Unlike deterministic routing where the routing path is fixed once the source and the destination addresses are given, the adaptive routing offers packets more flexibility in choosing their routing paths, as multiple routing paths exist. However, when using adaptive routing scheme, caution must be taken in order to solve the deadlock problem, which may be caused by packets waiting for each other in a cycle. Starting from these observations, we use routing algorithms that require no virtual channels for NoC, to be deadlock free, the routing algorithm need to prohibit at least one turn in each of the possible routing cycles. In addition, in order to preserve the adaptiveness, it should not prohibit more turns than necessary. Chiu [6] proposed the odd-even turn model which restricts the locations where some types of turns can take place, so that the algorithm remains deadlock-free. More precisely, the odd-even routing prohibits the east→north and east→south turns at any tiles located in an even column. It also prohibits the north→west and south→west at any tile located in an odd column. Compared to other adaptive routing algorithm without virtual channels support [5], the degree of adaptiveness provided by the odd-even routing is distributed more evenly across the network. Thus, we choose minimal odd-even routing as the adaptive routing scheme for the on-chip routers. The usage of minimal routing helps not only in reducing the energy consumption of communications, but also keeps the network free form the livelock. # 4. EXPERIMENTAL RESULTS We will evaluate DyAD routing algorithm on most used topology in NoC and that is 2D mesh topology. Evaluation of the NoC critical parameters: average latency, average throughput, and dynamically energy consumption will be made with Noxim simulator. #### 4.1 Simulator overview Noxim simulator [4,7] (Fig.4) was originally written in SystemC, a system description language based on object Univer Bas softwa on nur In which Tal NoC synthe applie routing 7x7 a both v Th are as 4.2 8 avera powe simul inves elemfixed topolespe appli led flits (flow n a pipelined cost, the onvith very little having huge ered space, it nemory buffer as switching nce trade-offs here, we use deterministic popularity. path routing lock [2]. routing path destination puting offers outing paths, , when using be taken in hich may be in a cycle. use routing s for NoC, to m need to the possible the possible preserve the e turns than Id-even turn some types thm remains prohibits the tiles located north—west odd column. ithm without degree of n routing is routing as the routers. The in reducing ins, but also hm on most sh topology. rs: average cally energy plator. nally written le based on object C++ programming language, developed by University of Catania Italy. Basically there are two types of network simulator software available. The first group simulators are based on events occurrence, the second one is based on number of cycles on which an event will occur. In our simulations we will use Noxim simulator which is driven by cycles. Table 1 presents numerical results about major NoC parameters calculated upon simulation under synthetic workload [3] for different injection rates applied on two network topologies, and respective routing algorithms: XY routing algorithm on 2D mesh 7x7 and DyAD routing algorithm on 2D mesh 7X7, both with bit reversal routing traffic pattern. The network traffic models used for our simulations are as follows: - Uniform random: here source and destination nodes are chosen via a uniform random process. The load is balanced, because the source and destination coordinates are uniformly distributed, although the random process that generates the coordinates may cause transient hotspots; - Transpose: In transpose traffic, the destination coordinates are the transpose of the source coordinates. Under this load the network's diagonal bisection is a bottleneck as all packets must cross it. The transpose traffic, in combination with the Dimension Router Ordering (DOR) algorithm commonly found in NoC's, produces a highly imbalanced network load. Here, the links located counter-clockwise from the centre of the mesh are utilized while the clockwise links remain unused; - Bit-complement: In this traffic load, the destination coordinates are the bit-wise inversion of the source coordinates. This load stresses the horizontal and vertical network bisection. Under this load DOR statically spreads traffic across all of the bisectional links, providing a perfectly balanced network load. ## 4.2 Simulator scenarios We will use different scenarios for evaluation of average latency, average throughput and dynamical power consumption using the above mentioned Noxim simulator. The test topology for all scenarios under investigation is 2D mesh with 7x7 nodes (processing elements) using bit-complement traffic pattern under fixed packet size and variable injection rate. This topology can be found mostly used nowadays, especially in NoC based devices. Here is the list of applied scenarios: Evaluated for average throughput and average latency. XY routing algorithm and DyAD routing algorithm are used in this case; - Evaluated for global average throughput and global average latency. XY routing algorithm and DyAD routing algorithm are used here; - Evaluated for energy consumption and average throughput. XY routing algorithm and DyAD routing algorithm are used in this case. Fig. 4. Noxim simulator sample screenshot Below, we present graphically the following dependences retrieved by simulation: Fig. 5. Average latency (cycles) vs offered traffic Fig. 6. Offered traffic (fits/cycle/node) vs injection rate Fig.5 shows the average latency [cycles] vs offered traffic [fits/cycle/node] plot for XY 2D mesh (upper, blue color) and DyAD topologies (lower, red color), while Fig.6 presents the magnitude of the offered traffic [fits/cycle/node] vs injection rate (ranges from 0 to 1). On Fig.7 plot is shown the total energy [J] vs throughput [flits/cycle/IP] dependency for XY and DyAD routing algorithms, respectively. Fig.8 graph presents the global average latency [cycles] vs global throughput [flits/cycle]. It is clearly visible, that the waveforms on this plot are very close to these, given on Fig.5. The proximity of the above results comes from the common origin of parameters that are calculated – the average latency on Fig.5 and Fig. 7. Total Energy (J) vs throughput (flits/cycle/IP) Fig. 8. Global average latency (cycles) vs Global throughput (flits/cycle) the global average latency on Fig.8. All plots are derived for the cases of NoC 2D mesh evaluated under offered traffic and XY and DyAD routing algorithms. ## 5. CONCLUSION By applying the state-of-the-art deterministic XY routing algorithm we have prepared an evaluation of the new power consumption optimized dynamical adaptive deterministic routing (DyAD) approach. The obtained results could lead to the idea of using this power consumption optimized routing algorithm for application in media processors with multiple processing elements (PE). These PE can apply coding and decoding of digital information, digital sound or image, where there is a need to optimize over one single chip processor and to implement all above mentioned functions. The next step of our research is to find some optimal strategy between switching from one routing algorithm to another and that will respectively lead to inclusion of two routing modes on the NoC based media processor. Control of the network congestion will depend basically from the traffic in the NoC network and it will be fluctuating when it is used by specific application on NoC. ### 6. FUTURE RESEARCH DIRECTIONS The future work will be based on how to optimize media processor based on NoC principle to work under different network workloads and with different specific functional cores implemented on it. We will choose suitable FPGA platform (by architecture, technology, and corresponding resources) for performing implementation of the NoC with 2D mesh connection between cores and two types of routing algorithms - one without power consumption optimization, and another with applying power consumption optimization. Modern FPGA technologies and platforms can be used for such performance evaluation of NoC with different connections between cores based media processors. Our next step will also be to write a code for implementation of NoC 2D mesh network topology. That can be one of the further practical extensions of the investigations, presented in this paper. [7 Table 1. Numerical results obtained with Noxim simulator | ROUTING<br>ALGORITHM | INJECTION<br>RATE | TOTAL<br>RECEIVED<br>PACKETS | TOTAL<br>RECEIVED<br>FLITS | GLOBAL<br>AVERAGE<br>DELAY<br>(CYCLES) | GLOBAL<br>AVERAGE<br>THROUGHPUT<br>(FLITS/CYCLE) | TROUGHPUT<br>(FLITS/<br>CYCLE/IP) | MAX DELAY<br>(CYCLES) | TOTAL<br>ENERGY<br>(J) | DYNAMIC<br>ENERGY<br>(J) | STATIC<br>ENERGY<br>(J) | |--------------------------------------------------------|-------------------|------------------------------|----------------------------|----------------------------------------|--------------------------------------------------|-----------------------------------|-----------------------|------------------------|--------------------------|-------------------------| | TRAFFIC BIT<br>REVERSAL<br>Routing XY<br>2D MESH 7 X 7 | 0.001 | 23855 | 855999 | 1042.33 | 0.0354 | 0.0351 | 67612 | 3.71E-04 | 3.34E-05 | 3.38E-04 | | | 0.01 | 40291 | 1444958 | 19761 | 0.0753 | 0.0756 | 98731 | 3.73E-04 | 3.67E-05 | 3.38E-04 | | | 0.025 | 45757 | 1644051 | 34460.8 | 0.0866 | 0.0869 | 99163 | 3.72E-04 | 3.43E-05 | 3.38E-04 | | | 0.05 | 46218 | 1663529 | 43145.2 | 0.0891 | 0.0897 | 99818 | 3.73E-04 | 3.54E-05 | 3.38E-04 | | | 0.075 | 45922 | 1653947 | 46380.9 | 0.0843 | 0.0849 | 99938 | 3.73E-04 | 3.53E-05 | 3.38E-04 | | | 0.1 | 47189 | 1702186 | 47524.2 | 0.0894 | 0.09 | 99708 | 3.75E-04 | 3.71E-05 | 3.38E-04 | | | 0.25 | 46557 | 1672806 | 50247 | 0.0851 | 0.085 | 99869 | 3.73E-04 | 3.55E-05 | 3.38E-04 | | | 0.5 | 47857 | 1718470 | 51227.1 | 0.0876 | 0.0882 | 99789 | 3.73E-04 | 3.56E-05 | 3.38E-04 | | | 0.75 | 48154 | 1725726 | 51342.5 | 0.0916 | 0.0922 | 99872 | 3.73E-04 | 3.55E-05 | 3.38E-04 | | | 1 | 48042 | 1733840 | 51761.9 | 0.0887 | 0.89 | 99918 | 3.74E-04 | 3.65E-05 | 3.38E-04 | | TRAFFIC BIT REVERSAL Routing DyAD 2D MESH 7 X 7 | 0.001 | 2221 | 4441 | 23.56 | 0.0019 | 0.0019 | 65 | 6.66E-05 | 3.61E-07 | 6.62E-05 | | | 0.01 | 22952 | 45908 | 24.948 | 0.018 | 0.019 | 82 | 6.99E-05 | 3.73E-06 | 6.62E-05 | | | 0.025 | 50589 | 101176 | 186.133 | 0.03 | 0.04 | 9181 | 7.39E-05 | 7.65E-06 | 6.62E-05 | | | 0.05 | 64858 | 129715 | 368.165 | 0.06 | 0.06 | 9487 | 7.45E-05 | 8.25E-05 | 6.62E-05 | | | 0.075 | 71706 | 143410 | 571.775 | 0.08 | 0.09 | 9631 | 7.47E-05 | 8.52E-06 | 6.62E-05 | | | 0.1 | 76743 | 153483 | 624.802 | 0.12 | 0.13 | 9354 | 7.48E-05 | 8.65E-06 | 6.62E-05 | | | 0.25 | 95619 | 191236 | 1398.88 | 0.23 | 0.26 | 6147 | 7.50E-05 | 8.82E-06 | 6.62E-05 | | | 0.5 | 103500 | 207000 | 2636.93 | 0.25 | 0.28 | 7623 | 7.54E-05 | 9.18E-06 | 6.62E-05 | | | 0.75 | 103500 | 207000 | 3255.41 | 0.26 | 0.29 | 8120 | 7.55E-05 | 9.25E-06 | 6.62E-05 | | | 1 | 103500 | 207000 | 3573.28 | 0.26 | 0.29 | 8371 | 7.54E-05 | 9.24E-06 | 6.62E-05 | ## REFERENCES: sh XY of cal ing hm ple ply ital ize all me ting d to sed end will tion nize vork rent (by ding NoC two ower lying n be with edia e for logy. ns of L.M.Ni and P.K.McKinley. A survey of wormhole routing techniques in direct networks. IEEE Transactions on Computers, v. 26, pp. 62-76, Feb. 1993. [2] W.J. Dally and B.Towles. Route packets, not wires: onchip interconnection networks. Proceedings of DAC, pp.684–689, June 2001. [3] P.V.Gratz., S.W.Keckler. Realistic workload characterization and analysis for Network-on-Chip design, Proc. of the 4th Workshop on Chip Multiprocessor Memory Systems and Interconnects (CMP-MSI), January 2010. [4] Noxim simulator: https://github.com/davidepatti/Noxim [5] C.J.Glass, L.M.Ni. The turn model for adaptive routing, Journal of ACM,41(5), pp. 874 - 902, Sept. 1994. [6] G.-M.Chiu. The odd-even turn model for adaptive routing. IEEE Transactions on Parallel and Distributed Systems, 11(7), pp. 729-738, July 2000. [7] V.Catania, A.Mineo, S.Monteleone, M.Palesi, D.Patti. Noxim: An open, extensible and cycle-accurate Network on Chip simulator. IEEE Int. Conference on Applicationspecific Systems, Architectures and Processors 2015. July 27-29, 2015, Toronto, Canada. ## About the authors: Ile Dimitrievski, Ph.D. student Technical University of Sofia, Bulgaria Faculty of Computer Systems and Technology Email: <a href="mailto:ile.dimitrievski@gmail.com">ile.dimitrievski@gmail.com</a> Assoc. Prof. Valentin Stoyanov Mollov, Ph.D. Technical University of Sofia, Bulgaria Faculty of Computer Systems and Technology Email: mollov@tu-sofia.bg