Simple fault-tolerant method to balance load in network-on-chip

A fault-tolerant odd–even (FTOE) turn model and a load-balancing multi-router fault-tolerant (LBMF) routing method that do not use virtual channels for mesh network-on-chip (NoC) are presented. By applying the FTOE rules and routing packets around faulty regions in advance, LBMF balances the load around the faulty regions and shortens the fault-tolerant paths. Simulation results show that LBMF improves the network injection rate over related works by 25 and 40% for a 16 × 16 mesh in the presence of 20 faults.

Introduction: Deadlock is a major challenge in designing a fault-tolerant routing algorithm. Virtual channels are mainly used in the network to avoid deadlocks, but they introduce additional area and increase the complexity of the router control logic, so a wormhole fault-tolerant routing algorithm for network-on-chip (NoC) should introduce few or no virtual channels [1]. Most existing fault-tolerant routing methods without virtual channels for mesh NoC avoid deadlock by using a turn model [2][3][4]. However, the limitations of these turn models result in load imbalance and long, circuitous paths.
To address this problem, we propose a fault-tolerant odd-even (FTOE) turn model, which is more applicable to fault-tolerant routing than other turn models. On the basis of the FTOE turn model, we propose a load-balancing and multi-router fault-tolerant (LBMF) routing method. LBMF can balance the link load around the faulty regions and shorten the fault-tolerant paths.

Model: In this Letter, we adopt the convex region fault model [3] for the analysis and simulation, in which the faulty nodes are enclosed in non-overlapping rectangular regions and the faulty regions do not share boundaries. Faults at either the link or the router level may result in failure of the system; for simplicity, we consider only router-level faults in this Letter.
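The constraints of the fault model can be illustrated with a small check. This is a minimal sketch, not the Letter's implementation: it assumes each faulty region is given as an inclusive rectangle `(x_min, y_min, x_max, y_max)`, and it interprets "do not share boundaries" as meaning that the rings of fault-free nodes surrounding two regions have no node in common. The names `Region`, `with_boundary` and `regions_valid` are hypothetical.

```python
from itertools import combinations
from typing import Iterable, Tuple

# Assumed representation: a faulty region is an inclusive rectangle
# (x_min, y_min, x_max, y_max) in mesh coordinates.
Region = Tuple[int, int, int, int]


def with_boundary(r: Region) -> Region:
    """Grow a region by one node on every side to include its boundary ring."""
    x0, y0, x1, y1 = r
    return (x0 - 1, y0 - 1, x1 + 1, y1 + 1)


def rectangles_intersect(a: Region, b: Region) -> bool:
    """True if two inclusive rectangles have at least one node in common."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


def regions_valid(regions: Iterable[Region]) -> bool:
    """True if no two regions overlap, touch, or share boundary-ring nodes."""
    return not any(
        rectangles_intersect(with_boundary(a), with_boundary(b))
        for a, b in combinations(list(regions), 2)
    )
```

Under this interpretation, two regions need at least two fault-free columns or rows between them; regions separated by a single column would share that column as a boundary and are rejected.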
In FTOE, the routing method obeys the following two rules.
Fig. 1 Permitted and prohibited turns using FTOE
Rule 1: No packet is allowed to take an EastSouth (ES) or NorthWest (NW) turn at any node located in an even column.
Rule 2: No packet is allowed to take a SouthWest (SW) or EastNorth (EN) turn at any node located in an odd column.
Fig. 1 shows these two rules. According to the FTOE rules, ES and SW turns cannot occur in the same column, so the rightmost column of a clockwise cycle cannot be formed. Likewise, NW and EN turns cannot occur in the same column, so the rightmost column of a counter-clockwise cycle cannot be formed. Therefore, FTOE is deadlock-free.
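The deadlock argument can be checked mechanically. The sketch below is illustrative only and assumes the prohibited turns are {ES, NW} in even columns and {SW, EN} in odd columns, which is the assignment consistent with the argument above (ES and SW never permitted in the same column, nor NW and EN). A turn is written as two letters, the direction of travel before and after the turn; the function and constant names are hypothetical.

```python
# Assumed FTOE prohibited turns, keyed by column parity (0 = even, 1 = odd).
PROHIBITED = {
    0: {"ES", "NW"},
    1: {"SW", "EN"},
}

# Turns that the rightmost column of a cycle must contain.
CLOCKWISE_RIGHTMOST = ("ES", "SW")          # East->South at top, South->West at bottom
COUNTER_CLOCKWISE_RIGHTMOST = ("EN", "NW")  # East->North at bottom, North->West at top


def turn_allowed(column: int, turn: str) -> bool:
    """True if `turn` is permitted at a node in the given column."""
    return turn not in PROHIBITED[column % 2]


def column_can_close_cycle(column: int, turns) -> bool:
    """True if every turn in `turns` is permitted in this column."""
    return all(turn_allowed(column, t) for t in turns)
```

For every column, `column_can_close_cycle` returns `False` for both `CLOCKWISE_RIGHTMOST` and `COUNTER_CLOCKWISE_RIGHTMOST`, because each pair contains one turn that is prohibited in even columns and one that is prohibited in odd columns.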
LBMF routing: Packets are routed inside the network using the following three rules. C, S and D denote the current node, source node and destination node, respectively. The odd (even) boundary column of the faulty region is marked as O (E).
Rule 3: Fault-free routing. If S is in an even (odd) column and D is on the northwest (southwest) side of S, packets are first routed West to the odd (even) column. Otherwise, the packets are routed using YX routing (Y direction first, then X direction).
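Rule 3 can be sketched as a path generator. This is a minimal illustration under stated assumptions, not the authors' implementation: nodes are `(x, y)` with x the column (growing East) and y the row (growing North); after the initial West hop the packet is assumed to continue with YX routing; and the prohibited turns are assumed to be {ES, NW} in even columns and {SW, EN} in odd columns. The names `fault_free_route` and `violates_ftoe` are hypothetical.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (x, y): x = column, grows East; y = row, grows North

# Assumed FTOE prohibited turns by column parity (0 = even, 1 = odd).
PROHIBITED = {0: {"ES", "NW"}, 1: {"SW", "EN"}}

STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def fault_free_route(s: Node, d: Node) -> List[str]:
    """Return the hop directions from s to d under Rule 3 (no faults)."""
    (x, y), (dx, dy) = s, d
    hops: List[str] = []
    going_west = dx < x
    # A NW turn is prohibited in even columns and a SW turn in odd columns,
    # so step West once to reach a column where the Y-to-X turn is permitted.
    if going_west and ((x % 2 == 0 and dy > y) or (x % 2 == 1 and dy < y)):
        hops.append("W")
        x -= 1
    # YX routing: finish the Y dimension first, then the X dimension.
    hops += ["N" if dy > y else "S"] * abs(dy - y)
    hops += ["E" if dx > x else "W"] * abs(dx - x)
    return hops


def violates_ftoe(s: Node, hops: List[str]) -> bool:
    """True if any turn on the path is prohibited in the column where it occurs."""
    x, y = s
    for prev, nxt in zip(hops, hops[1:]):
        x += STEP[prev][0]
        y += STEP[prev][1]
        if prev != nxt and prev + nxt in PROHIBITED[x % 2]:
            return True
    return False
```

For example, `fault_free_route((4, 2), (1, 5))` returns `['W', 'N', 'N', 'N', 'W', 'W']`: S is in an even column and D lies to the northwest, so the packet first moves West to column 3 and only then turns North. The path length still equals the Manhattan distance, so the route stays minimal.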
Rule 4: When S and D are on the North and South sides of the faulty region, respectively, or D is in the same row as the faulty region, packets will encounter the faulty region along the y-dimension, as shown in Fig. 2. Routing the packets around the faulty region consists of two phases:
(i) The position of the faulty region is predicted (the North and South nodes of the faulty region should be notified of its position information).
(ii) The packet obeys the FTOE rules to route around the faulty region in advance.
Since packets route around the faulty region in advance, and some packets that are routed around the faulty region along the West boundary in [2][3][4] are instead routed to the East, the link load around the faulty region is balanced and the fault-tolerant paths of some packets are shortened.
Rule 5: When S and D are on the East and West sides of the faulty region, respectively, packets will encounter the faulty region along the x-dimension. Packets are first routed by Rule 3 until they reach a boundary node of the faulty region. To avoid prohibited turns, the West boundary of the faulty region has two columns, one even and one odd, which offer just enough flexibility for the packets to make turns in all situations, as shown in Fig. 3.
Fig. 3 Two cases of routing around faulty region along x-dimension
Deadlock avoidance and livelock avoidance: According to Rules 3-5, when packets reach any boundary of a faulty region or route along the faulty region, the rightmost column of a cycle cannot be formed, so there is no deadlock in LBMF.
If the source and destination are not blocked by the faulty region, LBMF is minimal and thus livelock-free. Otherwise, packets may be routed to the West or along the boundary of the faulty region. There are two cases in which westward routing ends: the packet reaches the same column as the West boundary of the region, or the packet reaches a node on the left edge of the network. Once the current and destination nodes are on the same side of the faulty region, routing along the boundary of the faulty region ends. Furthermore, a packet cannot route around the same faulty region repeatedly. Thus, LBMF is livelock-free.
Simulation results: We use BookSim2.0 to compare the average network latency of LBMF with that of Wu [2] and Fu [3]. We assume a network topology size of 8 × 8 and simulate the cases with one, two and five faults. As shown in Fig. 4a, when there is one fault in the network, the average network latency of the Wu [2], Fu [3] and LBMF routing methods is similar. As the number of faults increases, more faulty regions are formed. LBMF improves the network injection rate by 3.5 and 8.5% over Wu and by 11.4 and 18.5% over Fu at a network latency of 100 cycles.
To evaluate the effect of network size on the routing methods, a 16 × 16 mesh with one, ten and twenty faults is simulated. As Fig. 4b shows, as the network size increases, the threshold of the injection rate is lowered and the average network latency increases. The reason is that the fault-tolerant paths become longer and the related network congestion increases. With only one fault, Fu performs better than Wu and LBMF. With more than two faults, LBMF balances the load because some packets that would be routed around the faulty region along the West boundary are instead routed to the East. LBMF improves the network injection rate by 27.6 and 25% over Wu and by 42.3 and 40% over Fu at a network latency of 100 cycles.
ELECTRONICS LETTERS 12th May 2016 Vol. 52 No. 10 pp. 814-816
The results reveal that, regardless of how the network size is expanded, LBMF achieves lower average network latency than Wu [2] and Fu [3].
LBMF is theoretically proved to be deadlock- and livelock-free. Simulation results demonstrate that, regardless of the number of faults or the network size, LBMF performs better in reducing average network latency than state-of-the-art fault-tolerant routing methods designed for NoC without virtual channels.

Fig. 2 Four cases of routing around faulty region along y-dimension