Fault-Tolerant Network-On-Chip Router Architecture Design for Heterogeneous Computing Systems in the Context of Internet of Things

Network-on-chip (NoC) architectures have become a popular communication platform for heterogeneous computing systems owing to their scalability and high performance. Aggressive technology scaling makes these architectures prone to both permanent and transient faults. This study focuses on the tolerance of a NoC router to permanent faults. A permanent fault in a NoC router severely impacts the performance of the entire network. Thus, it is necessary to incorporate component-level protection techniques in a router. In the proposed scheme, the input port utilizes a bypass path, virtual channel (VC) queuing, and VC closing strategies. Moreover, the routing computation stage utilizes spatial redundancy and double routing strategies, and the VC allocation stage utilizes spatial redundancy. The switch allocation stage utilizes run-time arbiter selection. The crossbar stage utilizes a triple bypass bus. The proposed router is highly fault-tolerant compared with the existing state-of-the-art fault-tolerant routers. The reliability of the proposed router is 7.98 times higher than that of the unprotected baseline router in terms of the mean-time-to-failure metric. The silicon protection factor metric is used to calculate the protection ability of the proposed router. Consequently, it is confirmed that the proposed router has a greater protection ability than the conventional fault-tolerant routers.


Introduction
Applications that improve the lifestyles of users, such as the Internet of Things (IoT), cloud computing, and cognitive computing, have attracted considerable attention in recent years [1,2]. These applications and systems continuously generate enormous amounts of data [3] and require exascale computing systems to process them. Exascale computing systems have high computation and storage capabilities, with several heterogeneous cores on a chip [4]. A previous study investigated the design of fault-tolerant capabilities for industrial cyber-physical systems (ICPS) and real-time monitoring using efficient hardware infrastructure [5]. The authors emphasized the use of artificial intelligence and deep learning and the development of new fault-tolerant techniques to

• This study utilizes the inherent redundancies in the pipeline and lookahead routing to maintain the performance in the presence of faults.
• This study proposes highly fault-tolerant schemes for each stage of the router pipeline.
• This study compares the latency, hardware consumption, and reliability of the proposed architecture with those of the state-of-the-art fault-tolerant router architectures.

Related Work
In [18], the authors emphasized the fault diagnosis and fault-tolerant control problem of Markov jump systems (MJS), which exist abundantly in mobile manipulator systems. They investigated the fault-tolerant control for MJS sensor faults. The developed system model is based on stochastic noise terms and the time delay of the state variables and results in the characterization of more features than the current designs. Industry 4.0 and ICPS are the core concepts of IoT. In [19], the authors investigated and reviewed the monitoring, fault diagnosis, and control tasks related to ICPS. Moreover, unobservable attacks, problems in data-driven assessment, and fault-tolerant schemes were discussed in detail. A cyber-physical system comprises embedded sensors and actuators for an interaction with the environment. The availability of the Internet has improved the scalability and functionalities of such systems, but they are susceptible to security threats. Therefore, it is necessary to avoid such threats using cyber security techniques. In [20], the authors analyzed the impact of replay attacks on ICPS. They proposed that the tolerance against replay attacks can be improved by adding an authentication signal to the control unit. Cyber security for power systems has been studied extensively; researchers have focused on several cyber-physical attacks, but they have rarely considered availability attacks. In [21], the authors modeled a hybrid cyber attack by combining availability and integrity attacks. They examined false-negative and false-alarm attack scenarios and proved that the proposed model lowers the attacks with a reduced cost.
In [22], the authors proposed the BulletProof router. It utilizes N-modular redundancy (NMR) and error correcting codes (ECCs) to protect the components of the router against permanent faults. However, NMR has a large area overhead. The RoCo router investigated in [23] is divided into horizontal and vertical modules. It uses parallel arbiters and smaller XBARs for horizontal and vertical connections. The horizontal and vertical modules perform independently. Thus, a defective horizontal module does not disturb the operation of the vertical module, and vice versa.
In [24], the authors proposed the Vicis router. It tolerates permanent faults at the network and router levels. At the network level, it uses input port swapping and adaptive routing algorithms. At the router level, it uses ECC and a bypass bus to tolerate permanent faults in the input buffers and XBAR, respectively. In [25], the REPAIR router was investigated. It improves the input port swapping algorithm of the Vicis router through expensive re-routing. However, it incurs an area overhead of 50%, which is higher than that of the Vicis router (40%).
In [26], the authors proposed the PVS router. It exploits a partial VC-sharing strategy to tolerate faults in the input port and RC unit. However, if the shared component becomes faulty, all the associated input ports become inaccessible. Accordingly, the decoupled resource sharing (DRS) router, which shares the resources of three adjacent input ports through DRS modules, was proposed to overcome this problem [27]. Even when the DRS module of an input port becomes faulty, it does not disturb the functionality of the neighboring input ports.
In [28], the authors proposed the SHIELD router. It tolerates permanent faults in all the pipeline stages. The RC unit employs spatial redundancy. The VA unit employs resource sharing. The SA unit uses the default winner strategy. The XBAR unit employs multiple secondary bypass paths. Additionally, the NoCGuard router was proposed in [29]. It also tolerates permanent faults in all the pipeline stages. The RC unit employs resource sharing and double-routing strategies. The VA unit uses the default winner strategy. The SA unit employs run-time arbiter selection and default winner strategies. The XBAR unit uses multiple secondary bypass paths. Moreover, this router tolerates a higher number of faults in each pipeline stage than the SHIELD router at a reduced cost.
In [30], the authors proposed the high-performance router (HPR). It tolerates faults in all the components of the router. The input port buffers employ ECC. The RC unit employs a dual routing scheme. The VA unit employs the default winner strategy. The SA unit employs a run-time arbiter selection strategy. The XBAR unit employs a double bypass bus strategy. In [31], the authors proposed the Defender router. It also tolerates faults in all the components of the router. The input port employs resource sharing. The RC and VA units both employ the default winner strategy. The SA unit employs a run-time arbiter selection strategy. The XBAR unit provides two bypass buses that bypass a faulty crossbar. Moreover, this router tolerates more faults than the HPR router.
In [32], the authors proposed NoCAlert. It is a comprehensive online fault detection mechanism for NoC router architectures. It comprises multiple checkers for each router component. They seamlessly and concurrently monitor functional irregularities. NoCAlert detects 97% of router faults. This fault detection mechanism incurs area and power overheads of 0.3% and 0.7%, respectively.
The proposed fault-tolerant router detects and tolerates permanent faults in all the components of the router. It uses NoCAlert for fault detection [32]. The proposed router tolerates a higher number of faults in all the components of the router than existing state-of-the-art fault-tolerant routers.
Figure 1 depicts the baseline NoC router. It comprises five input ports, five output ports, and four pipeline stages, namely, RC, VA, SA, and XBAR [33]. Each input port comprises de-multiplexers, multiplexers, and VCs, as shown in Figure 2a. The de-multiplexers and multiplexers guide the flits in and out of the VCs. Each input port has four VCs. Each VC comprises buffers and stores incoming flits.

Proposed Fault-Tolerant NoC Router Architecture
The first pipeline stage comprises RC units that compute the route for the packet on its arrival. This stage operates only on the head flit. The baseline router employs lookahead routing, which computes the routing for the downstream router and embeds the result in the packet.
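The lookahead computation described above can be illustrated with a minimal Python sketch. It assumes a 2D mesh with XY routing (the algorithm named later for the RC checkers); the coordinate convention, the port labels, and the function names `xy_route` and `lookahead_route` are hypothetical and are not taken from the router implementation.

```python
# Hypothetical port labels; the text does not fix a coordinate convention.
LOCAL, EAST, WEST, NORTH, SOUTH = "L", "E", "W", "N", "S"

# Assumed convention: x grows towards EAST, y grows towards NORTH.
STEP = {EAST: (1, 0), WEST: (-1, 0), NORTH: (0, 1), SOUTH: (0, -1)}


def xy_route(cur, dst):
    """Output port chosen by XY routing at router `cur` for destination `dst`."""
    (cx, cy), (dx, dy) = cur, dst
    if dx != cx:                       # finish the X dimension first
        return EAST if dx > cx else WEST
    if dy != cy:                       # then the Y dimension
        return NORTH if dy > cy else SOUTH
    return LOCAL                       # already at the destination


def lookahead_route(cur, out_port, dst):
    """Route to be used at the downstream router reached through `out_port`.

    Returns None when the packet is ejected at `cur` (no downstream router).
    """
    if out_port == LOCAL:
        return None
    sx, sy = STEP[out_port]
    downstream = (cur[0] + sx, cur[1] + sy)
    return xy_route(downstream, dst)
```

In this sketch, the current router forwards the head flit through `out_port` and embeds the value of `lookahead_route` in the packet, so the downstream router does not have to wait for its own RC result.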
The second pipeline stage comprises the VA unit, which allocates an empty VC buffer for each packet at the downstream router. VA comprises two stages of arbiters, as shown in Figure 2b. In the first stage of VA, the input VC with a head flit competes with other VCs for an empty VC in the downstream router. The second stage arbitrates among the input VCs winning arbitration for the same downstream VC.
The third pipeline stage comprises the SA unit, which grants the flits access to the XBAR. It is a two-stage process, as shown in Figure 2c. The first stage arbitrates among the VCs of an input port attempting to access the XBAR. The second stage arbitrates among the input ports winning arbitration for the same output port of the XBAR. The baseline router comprises two separable SA units: non-speculative and speculative SA. Speculative SA occurs in parallel with VA. The flits that win both VA and speculative SA simultaneously traverse the XBAR in the next cycle. The flits that do not win VA or speculative SA proceed to arbitration through non-speculative SA in the next cycle.
The fourth pipeline stage comprises the XBAR unit, which connects the flits at the input ports to the output ports. The baseline router comprises a multiplexer-based 5 × 5 XBAR, as shown in Figure 2d. The SA stage provides control signals to reconfigure the multiplexers every cycle.
Each component of the router pipeline performs a distinct role in the operation of the router. The functionality of each pipeline component depends on the results of the previous component. Thus, it is necessary to protect all the components against permanent faults.

Fault-Tolerant Design of Input Port
The input port comprises a de-multiplexer, multiplexer, and VCs. Permanent faults in the de-multiplexer and multiplexer block flits from moving into and out of the VCs. A permanent fault inside a VC can corrupt the flits. Thus, it is necessary to bypass or mask the effects of these faults. We propose a bypass path with a VC queuing and closing mechanism, which maintains the functionality of the router even if the de-multiplexer, multiplexer, and all the VCs become faulty. Figure 3 shows the modified input port. When a fault occurs in a VC, the control unit signals the upstream router to stop sending flits to the faulty VC. Hence, this VC closes. Upon the failure of all four VCs, the multiplexer, or the de-multiplexer, the entire port becomes faulty. Then, the bypass path is activated, and flits use it to reach the XBAR. The flow of these flits is still controlled by the current router. The upstream router holds a flit until it wins VA and SA in the current router. Thus, flits are physically stored in the upstream router, but are virtually queued and arbitrated in the current router through control signals between adjacent routers. Once they win both the VA and SA stages, they traverse the bypass path to reach the XBAR and, finally, their destination.
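The VC closing and bypass activation rules can be summarized in a short behavioral sketch. It is only an illustration under stated assumptions: the class `InputPort` and the function `upstream_may_send` are hypothetical names, and credit-based flow control is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class InputPort:
    """Fault state of one input port (hypothetical model, four VCs as in the text)."""
    vc_faulty: list = field(default_factory=lambda: [False] * 4)
    mux_faulty: bool = False
    demux_faulty: bool = False

    def bypass_active(self):
        """The bypass path takes over when the whole port is unusable."""
        return self.mux_faulty or self.demux_faulty or all(self.vc_faulty)

    def open_vcs(self):
        """VCs the upstream router may still send flits to (faulty VCs are closed)."""
        if self.bypass_active():
            return []                  # flits stay buffered in the upstream router
        return [i for i, bad in enumerate(self.vc_faulty) if not bad]


def upstream_may_send(port, vc, won_va, won_sa):
    """Decide whether the upstream router may release a flit towards `port`.

    With the bypass active, the flit is released only after it has won both
    VA and SA in the current router (virtual queuing). Otherwise it may be
    sent to any VC that has not been closed.
    """
    if port.bypass_active():
        return won_va and won_sa
    return vc in port.open_vcs()
```

Keeping the decision in the current router, as in `upstream_may_send`, mirrors the statement that flits are stored upstream but arbitrated downstream.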

Fault-Tolerant Design of RC Stage
Each input port has its own RC unit. A permanent fault in an RC unit can cause deadlock or misrouted flits. The baseline router employs the lookahead routing mechanism. Accordingly, an erroneous computation by a faulty RC unit does not cause misrouting in the current router; the misrouting occurs in the downstream router, which uses this computation. To handle this situation, we utilize a double routing strategy and provide a redundant RC unit per input port. Figure 4 shows the modified RC stage. RC_N represents the RC unit that computes the lookahead route in a fault-free scenario. RC_C represents the redundant RC unit. In the case of a fault, the following scenarios arise.

Scenario 1
If the RC_N unit becomes faulty, the RC_C unit replaces it and computes the lookahead route. Figure 4a depicts this scenario.

Scenario 2
If both the RC_N and RC_C units become faulty, the packet in the downstream router is blocked. To handle this situation, both RC units in the downstream router are activated. The RC_N unit computes the lookahead route, whereas the RC_C unit computes the current route. Figure 4b depicts this scenario.

Scenario 3
If both the RC_N and RC_C units of the current router and the RC_N unit of the downstream router are faulty, then the RC_C unit of the downstream router performs the operation using a double routing strategy. It first computes the current route for a packet. While the packet is in the VA and SA stages, it computes the lookahead route. Figure 4c depicts this scenario.
The NoCAlert checkers [32] were used to detect faults in the RC unit. Figure 5 shows the RC fault detection mechanism. The Error 1 signal is asserted when the input and output ports of a flit are the same. The Error 2 signal is asserted when a flit from the north or south input port turns to the east or west output port. Both cases violate the working principle of the XY routing algorithm.
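The three scenarios and the two RC checks can be expressed compactly as follows. This is a behavioral sketch only: the function names, the returned labels, and the port letters are hypothetical, and the combination in which only the downstream RC_N unit survives is not described in the text, so it is reported as such.

```python
FAULT_FREE = "current RC_N computes the lookahead route"
SCENARIO_1 = "current RC_C computes the lookahead route"
SCENARIO_2 = "downstream RC_C computes the current route, RC_N the lookahead route"
SCENARIO_3 = "downstream RC_C performs double routing"
NOT_COVERED = "combination not described in the text"


def rc_assignment(cur_n_ok, cur_c_ok, down_n_ok, down_c_ok):
    """Select the RC unit(s) that provide the route, given unit health flags."""
    if cur_n_ok:
        return FAULT_FREE
    if cur_c_ok:
        return SCENARIO_1
    if down_n_ok and down_c_ok:
        return SCENARIO_2
    if down_c_ok:
        return SCENARIO_3
    return NOT_COVERED


def rc_errors(in_port, out_port):
    """Return (error1, error2) for one routing decision under XY routing.

    error1: the flit would leave through the port it arrived on.
    error2: a flit from the north or south input turns to east or west.
    """
    error1 = in_port == out_port
    error2 = in_port in ("N", "S") and out_port in ("E", "W")
    return error1, error2
```

For example, `rc_errors("N", "E")` flags a Y-to-X turn, which XY routing never produces, so the control unit can treat the corresponding RC unit as faulty.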

Fault-Tolerant Design of VA Stage
VA occurs in two stages: VA1 and VA2. In VA1, there is an A:1 arbiter for every input VC. In VA2, there is an A:1 arbiter for every output VC, where A represents (the number of VCs per output port) × (the number of output ports), that is, 20. We separately consider fault tolerance in both stages.

First Stage of VA (VA1)
To handle a faulty arbiter in VA1, we propose adding a spare arbiter for every input port. Figure 6 shows the modified VA1 stage. The request lines from all four VCs of an input port connect to the spare arbiter through a 4:1 multiplexer. The output of the spare arbiter connects to the corresponding arbiters in VA2 through a 2:1 multiplexer. The control unit generates the necessary signals in case of a fault. When an arbiter becomes faulty, the control unit routes its request lines to the spare arbiter, which allocates an output VC to the input VC. When two or more input VC arbiters become faulty, the control unit assigns their request lines to the spare arbiter in a round-robin manner. This technique functions well even if all the arbiters of an input port become faulty.

Second Stage of VA (VA2)
We employ pipeline optimization to mask the effect of a faulty arbiter in VA2. In [25], the authors proposed combining allocation that removes VA2 from the pipeline of the flit. Figure 7 shows the proposed scheme for VA2. From SA1, we observe that only one VC of an input port can send a flit at a time, because the VCs of the same input port share one input of the XBAR. SA2 selects only one flit for an output port from multiple requests. These are structural restrictions imposed by the data path of the router. Owing to these restrictions, a flit can be allocated to an output VC by performing VA1 in series with SA. The output VC can be successfully assigned to the input VC even when its associated arbiter becomes faulty.
The NoCAlert checkers [32] were used to detect faults in the arbiters. Figure 8 shows the arbiter fault detection mechanism. The Error 1 signal is asserted when one or more request lines are high, but all the grant lines are zero. The Error 2 signal is asserted when multiple grants are detected. The Error 3 signal is asserted if the arbiter issues a grant without a request. All these cases violate the working principle of the arbiter.
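The arbiter checks and the round-robin sharing of the spare VA1 arbiter can be sketched as follows. The names `arbiter_errors` and `SpareArbiterScheduler` are hypothetical, request and grant lines are modeled as lists of 0/1 values, and the sketch is a software illustration rather than the hardware design.

```python
def arbiter_errors(requests, grants):
    """Return (error1, error2, error3) for one arbitration result.

    error1: at least one request is pending but nothing is granted.
    error2: more than one grant line is asserted.
    error3: a grant is issued on a line that did not request.
    """
    error1 = any(requests) and not any(grants)
    error2 = sum(grants) > 1
    error3 = any(g and not r for r, g in zip(requests, grants))
    return error1, error2, error3


class SpareArbiterScheduler:
    """Shares one spare VA1 arbiter among the VCs whose own arbiter is faulty."""

    def __init__(self, num_vcs=4):
        self.num_vcs = num_vcs
        self.last = -1                 # VC served most recently by the spare

    def next_vc(self, faulty):
        """Return the index of the faulty-arbiter VC to serve next, or None."""
        for offset in range(1, self.num_vcs + 1):
            vc = (self.last + offset) % self.num_vcs
            if faulty[vc]:
                self.last = vc
                return vc
        return None
```

With the arbiters of VC 0 and VC 2 marked faulty, successive calls to `next_vc` alternate between the two VCs, which corresponds to the round-robin assignment performed by the control unit.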

Fault-Tolerant Design of SA Stage
The baseline router comprises two similar sets of SA units: non-speculative and speculative SA. We exploit this redundancy to tolerate faulty arbiters. Figure 9 shows the modified SA stage. SA_NS_1 and SA_NS_2 handle non-speculative requests, whereas SA_S_1 and SA_S_2 handle speculative requests. If the arbiter in the non-speculative SA becomes faulty, its requests shift to a speculative SA arbiter. Now, speculative SA handles both types of requests. Both stages of SA units exploit this strategy. The proposed technique tolerates faults at runtime and avoids stalls.
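The run-time arbiter selection can be pictured with a small sketch for one arbiter pair. Two points are assumptions and not statements from the text: the stand-in arbiter simply grants the lowest requesting index, and, when the speculative arbiter serves both request classes, non-speculative requests are given priority. The function name `sa_stage_grant` is hypothetical.

```python
def sa_stage_grant(ns_requests, s_requests, ns_faulty=False):
    """Return (ns_grant, s_grant) indices, or None where nothing is granted.

    ns_requests / s_requests: 0/1 request lines of the non-speculative and
    speculative classes for the same SA stage.
    """
    def pick(reqs):
        # Stand-in arbiter (assumption): the lowest requesting index wins.
        return next((i for i, r in enumerate(reqs) if r), None)

    if not ns_faulty:
        # Fault-free case: each class is served by its own arbiter.
        return pick(ns_requests), pick(s_requests)

    # Faulty non-speculative arbiter: the speculative arbiter handles both
    # classes. Assumed policy: non-speculative requests go first.
    ns_grant = pick(ns_requests)
    s_grant = pick(s_requests) if ns_grant is None else None
    return ns_grant, s_grant
```

Because the request lines are only re-steered, the switch takes effect in the same cycle, which is consistent with the claim that faults are tolerated at run time without stalls.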

Fault-Tolerant Design of XBAR Stage
XBAR connects the input and output ports of a router. If a fault occurs in an XBAR, flits cannot reach the output ports. We propose an XBAR with a triple bypass bus to make it fault-tolerant. Figure 10 shows the proposed fault-tolerant scheme for the XBAR.
We add three bypass buses to traverse across the faulty XBAR: horizontal, vertical, and local bypass buses. The horizontal bypass bus connects the X-dimension input ports to all the output ports. The vertical bypass bus connects the Y-dimension input ports to the Y-dimension and local output ports. In the XY routing algorithm, flits first traverse the X-dimension and then the Y-dimension. Thus, Y-dimension input ports do not connect to X-dimension output ports. The local bypass bus connects the local input port to all the output ports. In the worst case, that is, when all the multiplexers of the XBAR are faulty, the proposed XBAR still transfers up to three flits at a time.
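The connectivity of the three bypass buses can be written down as a small table, from which the limit of three flits per cycle follows directly. The sketch assumes that each bus carries one flit per cycle and that each output port accepts one flit per cycle; the names `BYPASS_BUS`, `bypass_bus_for`, and `schedule_bypass` are hypothetical.

```python
ALL_OUTPUTS = {"E", "W", "N", "S", "L"}

# Bus connectivity as described in the text (port letters are illustrative).
BYPASS_BUS = {
    "horizontal": {"inputs": {"E", "W"}, "outputs": ALL_OUTPUTS},
    "vertical":   {"inputs": {"N", "S"}, "outputs": {"N", "S", "L"}},
    "local":      {"inputs": {"L"},      "outputs": ALL_OUTPUTS},
}


def bypass_bus_for(in_port, out_port):
    """Name of the bus that can carry in_port -> out_port, or None if illegal."""
    for name, bus in BYPASS_BUS.items():
        if in_port in bus["inputs"]:
            return name if out_port in bus["outputs"] else None
    raise ValueError(f"unknown input port: {in_port}")


def schedule_bypass(requests):
    """Grant at most one (in_port, out_port) request per bus and per output."""
    granted, busy_bus, busy_out = [], set(), set()
    for in_port, out_port in requests:
        bus = bypass_bus_for(in_port, out_port)
        if bus is None or bus in busy_bus or out_port in busy_out:
            continue
        granted.append((in_port, out_port, bus))
        busy_bus.add(bus)
        busy_out.add(out_port)
    return granted
```

In this model, the east and west inputs compete for the single horizontal bus, and the north and south inputs compete for the vertical bus, so no more than three requests are granted in one cycle.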

Latency Analysis
We examine and compare the proposed router with the baseline router to analyze the effect of the fault-tolerant circuitry on latency. The Gem5 simulator [34] was used for the simulation. The proposed router was implemented in Garnet [35], which is a cycle-accurate NoC simulator integrated into Gem5. The simulation is performed on an 8 × 8 mesh network with four VCs per port. Each VC has 16 buffers of 128 bits. Synthetic and application benchmark traffic patterns are used for the simulation. The most effective method to simulate faults is to inject faults based on the failure in time (FIT) values of the component. The FIT values are minute and require applications to run for a long period of time. To speed up the simulation, we inject multiple permanent faults in the router components after 1 million cycles of its operation.
In the first part of the experiment, the synthetic traffic patterns are used for simulation. The simulation runs for five different injection rates. Figures 11 and 12 show the latency assessment of the proposed router compared with that of the baseline unprotected router. As the injection rate approaches 0.1, contention increases. This causes latency to increase, as packets have to wait longer for resource allocation. Faults in the router pipeline aggravate the delay in resource allocation and further contribute to an increase in latency. Beyond an injection rate of 0.1, latency increases exponentially. In a fault-free scenario, the proposed router consumes no additional cycles. When faults are injected into the proposed router, the average latency increases by 2.69% and 3.17% for the uniform random and tornado traffic patterns, respectively.
In the second part of the experiment, the application benchmark traffic patterns, that is, Stanford Parallel Applications for Shared-Memory (SPLASH-2) [36] and Princeton Application Repository for Shared-Memory Computers (PARSEC) [37], are used for simulation. The configuration of the routers remains the same as in the first part of the experiment. Each core has its own cache and directory. Figures 13 and 14 show the latency assessment of the proposed router compared with that of the baseline unprotected router. The average latency of the proposed router increases by 15% and 12% for the SPLASH-2 [36] and PARSEC [37] benchmark traffic patterns, respectively.

Hardware Overhead Analysis
The baseline router was first implemented in Verilog HDL for hardware overhead analysis. Then, NoCAlert fault detection checkers [32] were added to each pipeline component. Finally, the reconfigurable fault tolerance scheme for each pipeline component was implemented on top of the router. When there is no fault in the router, it behaves like the baseline router. In the presence of a fault in a pipeline component, the corresponding fault-tolerance circuitry becomes activated to perform router operation. For synthesis, we used the NangateOpenCell 15 nm technology library [38] with the Cadence Encounter RTL compiler. The synthesis results reveal that the proposed router consumes 26.6% more area and 28% more power than the baseline router.

Lifetime Reliability Analysis Using MTTF
We use the mean-time-to-failure (MTTF) [39] metric to evaluate the lifetime reliability of the proposed router compared with that of the baseline router. The MTTF of a component can be calculated as

$$ \mathrm{MTTF}_{Component} = \frac{10^{9}}{\mathrm{FIT}_{Component}} \qquad (1) $$

where FIT_Component is defined as the number of failures per billion hours of operation. To estimate FIT_Component, we use the FIT estimation model proposed in [40]. For the time-dependent dielectric breakdown (TDDB) failure mechanism, the MTTF is given as

$$ \mathrm{MTTF}_{TDDB} = \frac{A_{TDDB}}{N_{logic\_gate} \, D} \left( \frac{1}{V_{dd}} \right)^{(a - bT)} \exp\!\left( \frac{X + Y/T + ZT}{KT} \right) \qquad (2) $$

where A_TDDB, a, b, X, Y, and Z are fitting parameters, whose values are derived in [41]. N_logic_gate is the transistor count of a logic gate, D is the duty cycle (100%), V_dd is the operating voltage (1 V), T is the operating temperature (300 K), and K is the Boltzmann constant. The sum-of-failure-rate (SOFR) [42] model is utilized to calculate the FIT of a logic circuit. It assumes that the FIT of a logic circuit is the sum of the FITs of the individual gates:

$$ \mathrm{FIT}_{circuit} = \sum_{i} \mathrm{FIT}_{logic\_gate,\, i} \qquad (3) $$

FIT Calculation for Baseline Router
The baseline router comprises five input ports. Each port comprises four VCs. Each VC stores 16 flits. Each flit is 128 bits wide. The basic component of the input port is a D flip-flop. The RC unit comprises two comparators, one for each dimension. VA and SA comprise arbiters. The XBAR unit comprises multiplexers. Table 1 lists the fundamental component (FC), the FIT of each FC, the number of FCs, and the total FIT of each stage of the baseline router. Table 2 lists the same quantities for each stage of the correction circuitry.
The proposed fault-tolerant router operates well as long as the underlying baseline router or correction circuitry is fault-free. The MTTF of a system having two components, i.e., baseline router and correction circuitry, with the failure rates FIT_1 and FIT_2, respectively, is expressed by utilizing the SOFR model [42] as

$$ \mathrm{MTTF}_{system} = \frac{10^{9}}{\mathrm{FIT}_{1}} + \frac{10^{9}}{\mathrm{FIT}_{2}} + \frac{10^{9}}{\mathrm{FIT}_{1} + \mathrm{FIT}_{2}} \qquad (4) $$

where FIT_1 is the FIT of the baseline router calculated as 20,480 + 117 + 1468 + 215 + 4096 = 26,376, and FIT_2 is the FIT of the correction circuitry calculated as 1024 + 117 + 271.5 + 32 + 2867.2 = 4311.7. By substituting these values in equation (4), the MTTF is determined to be 302,426.68 hours. Thus, the lifetime reliability of the proposed router is 7.98 times higher than that of the baseline router.
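The reported figures can be reproduced with a few lines of Python. The per-stage FIT values are those quoted above. The combination rule used for the two-component system (the sum of 10^9/FIT_1, 10^9/FIT_2, and 10^9/(FIT_1 + FIT_2)) is an assumption, adopted here because it reproduces the reported 302,426.68 hours; the variable and function names are hypothetical.

```python
# Per-stage FIT values quoted in the text (input port, RC, VA, SA, XBAR).
BASELINE_FIT = [20480, 117, 1468, 215, 4096]
CORRECTION_FIT = [1024, 117, 271.5, 32, 2867.2]

HOURS_PER_FIT_UNIT = 1e9  # FIT counts failures per billion hours


def mttf_hours(fit):
    """MTTF in hours of a single component with the given FIT."""
    return HOURS_PER_FIT_UNIT / fit


fit1 = sum(BASELINE_FIT)      # SOFR: FIT of the baseline router
fit2 = sum(CORRECTION_FIT)    # SOFR: FIT of the correction circuitry

mttf_baseline = mttf_hours(fit1)

# Assumed two-component combination rule (reproduces the reported value).
mttf_proposed = HOURS_PER_FIT_UNIT * (1 / fit1 + 1 / fit2 + 1 / (fit1 + fit2))

improvement = mttf_proposed / mttf_baseline

print(f"FIT_1 = {fit1}, FIT_2 = {fit2:.1f}")
print(f"MTTF (proposed) = {mttf_proposed:.2f} h, improvement = {improvement:.2f}x")
```

Note that the input-port term dominates FIT_1, so the improvement figure is most sensitive to the FIT assumed for the VC buffers.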

Reliability Analysis Using SPF
We use the silicon protection factor (SPF) [22] metric to compare the reliability of the proposed router with that of state-of-the-art fault-tolerant router architectures. SPF represents the amount of protection offered by the fault-tolerant system. The higher the SPF, the more resilient each transistor is to defects. The number of defects in a system is directly proportional to its area. Thus, SPF provides a representative measure of the fault tolerance provided by the proposed system. It is expressed as

$$ \mathrm{SPF} = \frac{\text{average number of defects that cause router failure}}{\text{area of the fault-tolerant router normalized to that of the baseline router}} $$

To obtain the average number of defects that cause the failure of a router, we first calculate the number of defects that each component of the router tolerates.

• Input port: The baseline router consists of five input ports. Each port consists of a de-multiplexer, multiplexer, and four VCs. The proposed fault-tolerant methodology tolerates faults in all the six components of an input port. Thus, a router tolerates a maximum of 30 input port defects. A defect in a de-multiplexer/multiplexer and bypass path causes the failure of an input port. Thus, a minimum of 2 defects cause input port failure.

If a stage of the router fails, the entire router fails. We therefore take the smallest of the minimum numbers of defects that cause the failure of each stage. Thus, min{2 (input port), 4 (RC), 2 (VA), 2 (SA), 2 (XBAR)} = 2 defects cause router failure. To calculate the maximum number of faults the router tolerates, we add the maximum number of faults each stage tolerates. Thus, the router tolerates a maximum of 30 (input port) + 5 (RC) + 40 (VA) + 10 (SA) + 5 (XBAR) = 90 defects. The router fails if one more defect occurs. Thus, the maximum number of defects that cause router failure is 90 + 1 = 91.
The average number of defects that cause router failure is expressed as

$$ \frac{2 + 91}{2} = 46.5 $$

Table 3 presents the comparison of the SPF value of the proposed router with those of the state-of-the-art fault-tolerant router architectures. By exploiting inter-component dependencies and inherent redundancies, the proposed fault-tolerant architecture tolerates more faults and incurs a smaller area overhead than all the state-of-the-art fault-tolerant router architectures. In addition, it achieves the highest SPF value among them. This indicates that the proposed router is more reliable than the state-of-the-art fault-tolerant router architectures.
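The defect counting can be checked with a short script. The per-stage minimum and maximum values are those quoted in the text. Two points are assumptions: the average is taken as the midpoint of the minimum and maximum numbers of defects that cause failure, and SPF is taken as this average divided by the normalized area (1 + area overhead). The dictionary and function names are hypothetical.

```python
# Minimum number of defects that cause the failure of each stage (from the text).
MIN_DEFECTS_TO_FAIL = {"input_port": 2, "RC": 4, "VA": 2, "SA": 2, "XBAR": 2}

# Maximum number of defects each stage tolerates (from the text).
MAX_DEFECTS_TOLERATED = {"input_port": 30, "RC": 5, "VA": 40, "SA": 10, "XBAR": 5}

# The router fails as soon as its weakest stage fails.
min_to_fail = min(MIN_DEFECTS_TO_FAIL.values())

# One defect beyond the total tolerated count causes failure.
max_to_fail = sum(MAX_DEFECTS_TOLERATED.values()) + 1

# Assumption: average taken as the midpoint of the two extremes.
avg_to_fail = (min_to_fail + max_to_fail) / 2


def spf(avg_defects, area_overhead):
    """Assumed SPF definition: average defects to failure per unit normalized area."""
    return avg_defects / (1.0 + area_overhead)


print(min_to_fail, max_to_fail, avg_to_fail)
print(spf(avg_to_fail, 0.266))  # 26.6% area overhead reported by synthesis
```

Under this definition, a design that tolerates many defects but needs a large area overhead can still score lower than a leaner design, which is why the area overhead enters the comparison.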

Conclusions
NoC architectures are increasingly adopted in exascale heterogeneous computing systems owing to their scalability and performance. These systems are used in IoT applications, cognitive computing, and cloud computing. The reliability of NoC is one of the key issues. This paper proposed efficient techniques to improve the reliability of an NoC router against permanent faults. The proposed techniques provided fault tolerance for the input port, RC, VA, SA, and XBAR at the cost of modest additional circuitry. The hardware synthesis results revealed that the proposed router consumes 26.6% more area and 28% more power than the baseline router. The MTTF analysis showed that the reliability of the proposed router is 7.98 times higher than that of the baseline router. We used the SPF metric to estimate the protection ability of the proposed router. The result showed that the proposed router has a larger SPF value than that of the existing fault-tolerant router architectures and tolerates a greater number of faults in each router pipeline component. The idea of using the inherent redundancies in pipeline and adaptive algorithms can be used to design more reliable router architectures in the future.

Conflicts of Interest:
The authors declare no conflict of interest in the publication of this paper.
