Detection and Countermeasures of Security Attacks and Faults on NoC-based Many-Cores

The modularization and manufacture of many-core systems-on-chip involving several vendors open up a vulnerability: the inclusion of Hardware Trojans (HTs). In addition, the reduced feature size of transistors may accelerate aging effects, leading to faults. The literature presents techniques to tackle security and fault tolerance, such as cryptography, authentication codes, error correction codes, and the runtime creation of flow profiles to detect anomalous behavior. However, at the communication level (i.e., the NoC), there is a gap in generic methods to detect attacks or faults. As detailed in the state-of-the-art section, approaches that protect the NoC against attacks add hardware to the NoC itself, which is prone to security attacks or faults. This work decouples the detection of attacks or faults by using separate data and control NoCs. The adoption of a control NoC enables the proposal of the Communication Session Protocol to monitor message exchange, detect abnormal behavior, and recover the communication from an eventual failure or attack. The execution time overhead varies according to the application communication model, from 3.5% to 33%. Such overhead is acceptable because, once an abnormal communication behavior is detected, the protocol changes the path between communicating task pairs and resumes the application execution.

The increasing number of different features and functionalities inside a single chip also increases the variety of third-party IPs (3PIPs). Such IPs come from different vendors due to competitive prices and time-to-market. The presence of 3PIPs raises the risk of Hardware Trojan (HT) insertion [8]. Assuming HTs infect the NoC, they can perform several types of attacks that threaten security principles [9]. Such attacks may affect confidentiality by redirecting messages to malicious agents, availability by dropping messages or blocking a communication path, and integrity by corrupting the content of a packet traversing the NoC.
Besides HT attacks, eventual faults may violate the system dependability. Dependability is the ability to deliver service that can justifiably be trusted [10]. Dependability includes the following attributes: availability, reliability, safety, integrity, and maintainability. The literature presents works focusing on communication dependability using secure cryptographic key exchange [11] and cooperative communication [12] in environments such as wireless sensor networks, unlike our proposal, which targets MCSoCs.
Thus, in the context of MCSoCs, a unified method must be proposed to deal with security threats, manufacturing faults, aging, or application constraints (e.g., QoS). Although this work focuses on securing the communication against HT attacks, the method is general and applicable to fault tolerance and QoS constraints.
The motivation of this work is the following question: "how to make the communication between processing elements safe and fault-tolerant?" The literature presents several techniques, such as cryptography [13], authentication codes [14], error correction codes [15], and the runtime creation of a flow profile to detect anomalous behavior [16]. Adopting these techniques makes it possible to detect violations related to security or faults in the NoC. However, they provide neither a generic method to detect the attacks or faults nor the evidence needed to execute countermeasures. For example, a PE may detect a tampered packet, then discard it and request a retransmission. This retransmission will probably use the same path, and the new packet will arrive with the same problem.
The goal of this work is to propose an original method to secure the communication between processing elements. This method relies on three pillars: (1) a control NoC with broadcast transmission; (2) a data network with support for XY and source routing; (3) an additional layer in the message exchange mechanism. The Processing Element (PE) operating system (OS) has exclusive access to the control NoC, thus preventing access to it by malicious applications. The message exchange mechanism involves at least two packets: a request for data and the data transmission. Our proposal pairs each of these packets with an additional packet sent through the control NoC, carrying a unique identifier for each communicating pair, named session key, or K_s. This method enables the receiver of a packet to confirm its authenticity using K_s. Decoupling the attack or fault detection method between the data and control NoCs helps in anomaly detection and in decision-making regarding countermeasures or fault tolerance. If K_s does not match the stored key, a possible countermeasure is to request that the packet be resent, avoiding the original path. As the data network supports source routing, a new path is computed to circumvent the affected region.
The original contributions of this work concerning the state-of-the-art include:
• a unified method to handle security threats and faults, which can be extended to cope with aging effects and quality-of-service;
• a protocol that mitigates attacks or circumvents faults without the need to locate the issue. A rerouting mechanism defines a new path that avoids the affected one;
• a proposal that decouples the communication medium (NoC) from the infrastructure responsible for dealing with security and fault issues. Proposals in the literature add hardware modules in the hardware prone to attacks and faults, i.e., the NoC. Our work adopts a parallel and simple NoC to detect these issues. This contribution enables the adoption of 3PIP NoCs, keeping the design safe.
This work is organized as follows. Section II presents the baseline architecture and the threat model. Section III discusses the related work. Section IV describes the main original contribution, the Communication Session Protocol. The protocol: (i) monitors the message exchange; (ii) detects abnormal behavior (possibly caused by a failure or an attack); (iii) recovers the communication from an eventual failure or attack. Section V evaluates the proposed protocol, and Section VI concludes this work and points out directions for future work.

II. SECURE ARCHITECTURE AND THREAT MODEL
Figure 1 presents the MCSoC reference architecture, a homogeneous NoC-based many-core. Each PE has a 32-bit RISC processor, a DMNI module (a network interface with DMA capabilities) [17], local dual-port memory, two routers, and wrappers. The reference architecture is derived from the publicly available MEMPHIS [18] many-core.
Two NoCs interconnect the PEs: the data NoC and the control NoC. The data NoC transfers data messages exchanged by applications. The data NoC adopts duplicated physical channels, wormhole packet switching, and simultaneous support for distributed XY and source routing. The control NoC transfers the control messages, such as: (i) the set of PEs belonging to a secure zone; (ii) the definition of a fault-free path to circumvent a secure zone. The control NoC has the following features: broadcast as the default transmission mode, bufferless routers, and single-flit messages. Its router has five internal blocks: two finite state machines, two arbiters, and an 8-slot CAM buffer. The control NoC router has a small area footprint, corresponding roughly to 20% of the data router [19].
Both NoCs contain wrappers in the flow control signals (W in Figure 1). When activated, a wrapper discards all incoming and outgoing packets of a given port. This approach guarantees the creation of a logical barrier at the hardware level and thus the creation of secure zones. A Secure Zone (SZ) is a defense mechanism that spatially isolates a region of the many-core, reserving it for the execution of an application with security constraints (App_sec) [21]. SZs isolate the App_sec computation and communication from other applications running on the system.
A particular case of SZ is the Opaque Secure Zone (OSZ) [22]. OSZ is a defense mechanism executed at runtime. In summary, the method relies on finding a rectilinear region of the system whose PEs are not executing user tasks and mapping App_sec onto it. If there is no "free" region to map App_sec, the method migrates tasks to open space in the system. The OSZ activation occurs by setting wrappers ("W" in Figure 1) at the boundaries of the rectilinear region, blocking all incoming and outgoing traffic trying to cross the OSZ.
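The wrapper placement described above can be sketched as follows. This is a minimal illustration, assuming the rectilinear region is a rectangle given by two corner coordinates on a 2D mesh; the function name `osz_wrappers`, the port names, and the convention that NORTH means increasing y are hypothetical and not taken from the reference platform.

```python
# Hypothetical port names and offsets; NORTH = +y is an assumption.
PORT_OFFSETS = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}

def osz_wrappers(x0, y0, x1, y1, width, height):
    """Return the set of (x, y, port) whose wrappers close a rectangular zone.

    A wrapper is needed on every port of a zone PE that faces a PE
    located inside the mesh but outside the zone.
    """
    wrappers = set()
    for x in range(x0, x1 + 1):
        for y in range(y0, y1 + 1):
            for port, (dx, dy) in PORT_OFFSETS.items():
                nx, ny = x + dx, y + dy
                inside_mesh = 0 <= nx < width and 0 <= ny < height
                inside_zone = x0 <= nx <= x1 and y0 <= ny <= y1
                if inside_mesh and not inside_zone:
                    wrappers.add((x, y, port))
    return wrappers

# A 2x2 zone in the corner of a 4x4 mesh needs four wrappers.
zone = osz_wrappers(0, 0, 1, 1, width=4, height=4)
print(sorted(zone))
```

Ports facing the mesh edge are skipped in this sketch because no traffic can cross them; a real implementation may choose to close them as well.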
OSZs prevent attacks from outside sources, such as Denial-of-Service (DoS), timing, spoofing, and man-in-the-middle attacks [22,23]. Even though the method is robust against external attacks, it remains vulnerable when data NoC routers placed inside a Secure Zone are infected by HTs.
The method we present to secure the communication is general and not coupled to the OSZ method. We chose the OSZ secure architecture because it prevents most of the attacks reported in the literature [22,24]. Figure 2 illustrates an example of a DoS attack executed by an HT. Figure 2(a) presents a 3x3 system with two communicating tasks running on it, T1 and T2, and an HT (deactivated) in a router of the path. Figure 2  According to [25], HTs can be Always On, meaning that they are always active, or Triggered, meaning that a condition must be met to activate them, as in the example above. HT triggers include time and physical condition (internal triggers) or user input and component output (external triggers).

B. THREAT MODEL
Thus, even an OSZ can be attacked internally by HTs, regardless of the activation model. Assuming the NoC is a 3PIP, it can be infected by HTs, requiring countermeasures to avoid or mitigate the attacks. The threat model considered in this work includes the following DoS HT attacks [26]:
• Packet Loss: one or more routers drop packets. Therefore, the target never receives the message, and the tasks are left waiting, blocking their execution. This is a DoS-type attack, also known as a blackhole attack [27].
• Packet Misrouting: one or more routers change the packet header, sending the packet to the wrong destination. Consequently, the target PE stays blocked, waiting for the requested packet, as in the previous attack.
• Port Blocking: one or more router ports cannot send or receive packets, making packets stall and causing contention. In this case, the blocking can be temporary, affecting only the packet latency, or permanent, blocking the application.
The present work considers the CPU, memory, DMNI, control NoC, and the OS as reliable entities. Third-party entities like data NoC and applications are unreliable.

III. RELATED WORK
This section presents works exploring the effects or countermeasures against attacks, including Hardware Trojans and DoS. Then, Section III-A presents a table summarizing the related work, comparing them to our proposal.
Charles and Mishra [28] propose a trust-aware routing to bypass malicious IPs during a message transmission through an NoC. In this work, the routers compute the trust they have in their neighbors. The trust values are computed following a trust model, adjusted at runtime. Whenever a packet does not reach the destination or the response does not come back to the source, the trust values of the routers are updated. Then, the packet is resent, and the routing now considers the new values of trust. However, this trust-aware routing uses the same NoC that is considered unsafe to update and configure the trust values, which might also be vulnerable to attacks. Our work also adopts routing to bypass malicious IPs, but using a control NoC to avoid paths in the data NoC that may be compromised by attacks.
Software-Defined Networking (SDN) is a paradigm that can be used as a countermeasure against attacks in the NoC. SDN removes the communication management from routers and moves it into a centralized controller responsible for configuring the routers. Ruaro et al. [29] adopt SDN to reduce the NoC energy consumption, provide QoS and fault tolerance. Moreover, the proposal adopts a secure method to configure the SDN routers using private keys. Experiments show that SDN can avoid attacks such as DoS, Flooding, and Spoofing. According to the Authors, the proposal is still vulnerable to HTs, lacking the discussion of methods that can be applied to prevent them.
Charles et al. [16] propose a detection and localization method of malicious IPs attacking the system with a distributed denial-of-service (DDoS) attack. The proposed framework is based on a communication model obtained at design time. At runtime, attacks are detected when performance boundaries are violated. In terms of localization of the IP causing the DoS attack, the authors use a congested graph containing information of all congested paths and routers of the system. Based on that, the attacker can be found most of the time. Similar to our threat model, this proposal focuses on DoS attacks, however, only in detection and localization, not exploring the recovery or defense mechanisms against such attacks on the system.

VOLUME xx, 2021
Zhang et al. [30] explore and evaluate the effects of two HT-based DoS attacks: blackhole and sinkhole. The blackhole attack occurs when an infected router stops forwarding the packets it receives, dropping them or sending them to unsafe destinations. The sinkhole attack works mostly on adaptive routing by requesting packets from neighbor routers, claiming to have buffer space. Then, when the router receives a packet, it might also drop it or send it to another recipient. The authors explore several configurations of those attacks, varying parameters such as HT location, packet length, and the presence of a defense mechanism based on adaptive routing. In this case, the HT attacks are similar to our threat model. Nonetheless, the difference is that, in the work of Zhang et al., the NoC equipped with the defense mechanism is itself considered untrusted, making the countermeasure vulnerable, while we chose a control NoC to circumvent this issue, since it is responsible for detecting and recovering from HT attacks.
Chaves et al. [31] propose detecting DoS (flooding) attacks in an NoC by monitoring packet collisions, locating the collision point and the directions of the malicious traffic. They propose to enhance the routers with a DoS monitor, which supervises the latency of the packets and reports it to the OS of the PE processor. When reaching the OS, the latency values are analyzed, triggering a DoS suspicion report in case of latencies out of the expected range and identifying the router where the collision happened. To detect the direction of the malicious traffic, the authors also propose storing in the packet the inputs that competed with the sensitive traffic. Thus, when the packet reaches the OS, it is possible to extract the direction of the malicious traffic. More recently, Chaves et al. [32] explored protection against flooding DoS, proposing an extra monitor in the Network Interface, which monitors the length of the packets and reports the presence of large malicious packets intended to cause congestion in the network. Similar to our proposal, both works present approaches for detecting DoS attacks; however, they do not detail a recovery method, as our proposal does.
Hussain et al. [33] propose an Energy Efficient Trojan Detection design (EETD) to detect the presence of HTs. The proposal is based on two detection units. The first one, named End-to-end Trojan Detection Unit (EDU), is placed at the PE and is always active. The EDUs are responsible for packet authentication and send an attack detection signal to a detection management unit. The second detection unit, named Localization Unit (LU), is placed at each router port and is power gated to save energy. The work can detect HTs with low performance overhead and energy consumption compared to state-of-the-art techniques. However, unlike our proposal, this work does not present security countermeasures once an attack is detected.
Daoud and Rafla explore the effects of the blackhole attack [27] and a method to detect this attack [34]. They propose a protocol based on inter-router acknowledgment, making it possible to detect which router dropped a packet. Once again, the instrumentation is in the router, which is a structure considered vulnerable to attacks. Even though the authors also focus on detecting blackhole attacks, they do not present a recovery procedure.
Raparti et al. [35] propose a security mechanism against a snooping attack caused by HTs. The HT could transmit a packet to the wrong destination (misrouting). The Snooping Invalidation Module aims to discard packets with invalid headers, thus preventing sensitive information from reaching undesired targets. Furthermore, they present THANOS, a mechanism that observes the network and sends an alert in case of suspicious behavior. However, according to the authors, THANOS only mitigates the attacks. Thus, it still requires a more robust defense mechanism that could completely recover the system from such attacks.
Harttung et al. [36] and Moriam et al. [37] use cryptography and authentication to protect the system against HTs that can tamper with or drop messages in the NoC. Three authentication approaches with symmetric cryptography are considered to detect modifications to the messages, which are discarded in case of tampering. Furthermore, the receiver can detect a packet loss via timeout and request a retransmission. This work is similar to ours since it detects the attack with a timeout mechanism. However, the recovery protocol occurs in the same network that tampered with the received packet, thus also being exposed to HT attacks. Moreover, the recovery protocol is not detailed by the authors.
JYV et al. [38] consider four HT types: Flit Quantity Trojan (QT), Address Trojan (AT), Head Hardware Trojan (HHT), and Tail Hardware Trojan (THT). The proposed solution to mitigate the action of the HTs is a Shuffle Encoder placed before the Input FIFO, obfuscating the information inside the packet and avoiding the triggering of the HT that could affect the packet. Simulation results in a 4x4 NoC show that the proposed method is efficient in thwarting the HT attacks. This work presents versions of HTs that are similar to the ones discussed in our threat model. However, the solution is based on instrumenting the router, which is considered untrusted, affecting the reliability of the security mechanism.
Hazra et al. [39] also model four HT types and propose a detection mechanism. The detection mechanism is based on machine learning techniques such as Decision Tree (DT), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN). The learning phase uses the total execution time, the power consumption, the total number of instructions executed, the total number of read misses, and the total number of write misses. The authors evaluate the prediction accuracy for each Trojan model separately. DT and SVM are the methods with the best detection accuracy. Like our proposal, this work focuses on detecting an HT attack. However, as discussed above, there is a lack of works that focus on recovery besides the detection of attacks.
Sinha et al. [40] also use machine learning (ML) techniques to enhance system security. The authors propose Sniffer, a tool to detect and locate flooding DoS attacks. The authors train the ML module with network features such as Buffer Waiting Time, Inter-Flit Interval, and Virtual Channel Occupancy. Each router has a module to observe the features mentioned above and a module for detecting abnormal traffic based on the ML technique. This work does not present a recovery process, and it instruments the same NoC that is under attack to detect the attack.

A. DISCUSSION
Table 1 classifies the works in terms of the adopted security mechanism; the instrumentation, meaning the unit or units responsible for the security mechanisms; and whether there is a detection and recovery protocol. Daoud et al. [27], Charles et al. [16], and Hazra et al. [39] are examples of works that focus only on the detection of attacks, instead of implementing a countermeasure. Adaptive routing is an approach used by several authors when the goal is to protect the communication against DoS attacks and HTs. However, implementations are mostly inserted in routers assumed to be vulnerable to HTs, which compromises the reliability of the security mechanism. Moreover, even though most works focus on detecting HTs and suspicious behavior, only a few of them propose a recovery protocol.
Our proposal adopts a detection and recovery protocol that uses a control NoC to monitor packet transmission and find alternative paths to avoid faulty or malicious nodes of an NoC. The adoption of a control NoC enables the detection of suspicious behaviors on 3PIP NoCs. It is possible to insert an HT in the control NoC, but it can be easily detected. The detection is considered simple because the control NoC router has a low complexity design and a small silicon area footprint (20% of the data router [19]). For this reason, the control NoC is considered a reliable entity.

IV. COMMUNICATION SESSION PROTOCOL
The Introduction Section mentioned the three pillars of the proposal: (1) control NoC; (2) data network with support for XY and source routing; (3) an additional layer in the message exchange mechanism.
The control NoC [19] is a lightweight network-on-chip, with all packets having one flit. When transmitting in broadcast, default transmission mode, packets reach all PEs of the system. Thus, this NoC can find a path from a source PE to a target PE if it exists, even in the presence of a fault or an HT in the path of the data NoC. This NoC may also use the unicast transmission to create a path between a source and a target PE, using a backtracking procedure. For security reasons, only the OS accesses the control NoC, avoiding its use by malicious applications.
The data NoC is a standard wormhole packet switching NoC without virtual channels. It has two particular architectural features. The first one is the adoption of two physical channels, acting as two disjoint NoCs. The flit size is half of the word size to minimize the area overhead, and the network interface is responsible for serializing and deserializing the flits. The reason to adopt two physical NoCs is to enable fully adaptive routing. The second feature is simultaneous support for XY (default routing algorithm) and source routing (SR). Source routing is required when, e.g., it is necessary to circumvent an OSZ or avoid a path with a faulty or infected router.
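As a reminder of what the default algorithm does, the sketch below computes the hops of a deterministic XY route on a 2D mesh: the packet first travels along the X axis and then along the Y axis. The function name `xy_route` and the (x, y) tuple representation are illustrative assumptions, not the router implementation.

```python
def xy_route(src, dst):
    """Return the list of (x, y) routers visited by XY routing, src included."""
    x, y = src
    path = [src]
    # First resolve the X offset...
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # ...then resolve the Y offset.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 1)))
```

Because the XY path between two PEs is always the same, a retransmission crosses the same routers again. Source routing removes this restriction by letting the sender place an explicit sequence of hops in the packet.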
The following subsections detail the third pillar of the method, the additional layer in the message exchange mechanism. This layer is the communication session protocol, able to supervise message exchanges, detect suspicious behaviors and recover from attacks or failures.

A. MONITORING MESSAGE EXCHANGE
The protocol starts by establishing a session (Definition 1). Figure 3(b) presents the sequence diagram with each step of the protocol, from its creation up to its end.
Definition 1. Session: establishment of a virtual connection between a producer-consumer pair, using the control NoC. The session is defined by a unique identifier, known only to the communicating pair.
Definition 2. Session key (K_s): unique identifier for a communicating pair, represented by the tuple {rnd, ID_p, ID_c}, where rnd is a random number, ID_p is the producer task identifier, and ID_c is the consumer task identifier. The OS of each PE has a table to store the K_s values.
The consumer starts the protocol in the first Message_Request packet, transmitted using the data NoC. In parallel, through the control NoC, the consumer sends a Start_Session packet to the producer, with K_s in its payload (Definition 2). When receiving this packet, the producer starts the session by using rnd and the task identifiers. When the producer delivers the requested message, it also transmits the Session_Ack packet through the control NoC, confirming the successful session creation. The Session_Ack also contains K_s to avoid tampered acknowledgments. At this point, both sides of the communication have K_s and can exchange packets. Although not implemented in this work, additional security can be obtained by sending a cryptographic key at system startup, allowing the session key of each communicating pair to be sent encrypted, as in [23,24]. Then, after the creation of the session, the message exchange occurs as follows:
• The consumer sends two packets to the producer: MRC (Message_Req_Ctrl) through the control NoC and MR (Message_Request) through the data NoC. Numbers 1 and 2 in Figure 3 illustrate this step.
• Once the producer has received the MRC and the MR and has data to send, it transmits two packets to the consumer: MDC (Message_Deliv_Ctrl) through the control NoC and MD (Message_Delivery) through the data NoC. Numbers 3 and 4 in Figure 3 illustrate this step.
Step four then goes back to step one while there are messages to be transmitted. Once the consumer finishes its execution, it closes the current session by sending an End_Session message (control NoC) to all tasks that produce data for it. The producers close the session on their side when they receive an End_Session message, clearing all the values used by the protocol.
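The packet sequence of one session can be summarized with a small trace generator. This is only a sketch of the nominal order: it assumes that the first message is covered by the Start_Session and Session_Ack packets and that every later message uses the MRC/MR/MDC/MD steps. The function name `session_trace` and the tuple layout are hypothetical.

```python
def session_trace(n_messages):
    """List (packet, noc, sender) tuples for a session with n_messages >= 1."""
    # Session creation happens together with the first message exchange.
    trace = [
        ("Start_Session", "control", "consumer"),
        ("Message_Request", "data", "consumer"),
        ("Session_Ack", "control", "producer"),
        ("Message_Delivery", "data", "producer"),
    ]
    for _ in range(n_messages - 1):
        trace += [
            ("Message_Req_Ctrl", "control", "consumer"),    # step 1 (MRC)
            ("Message_Request", "data", "consumer"),        # step 2 (MR)
            ("Message_Deliv_Ctrl", "control", "producer"),  # step 3 (MDC)
            ("Message_Delivery", "data", "producer"),       # step 4 (MD)
        ]
    # The consumer closes the session through the control NoC.
    trace.append(("End_Session", "control", "consumer"))
    return trace

for packet, noc, sender in session_trace(2):
    print(f"{sender:>8} -> {packet} ({noc} NoC)")
```

The listing shows the order in which packets are sent. Since the two NoCs are independent, the control and data packets of the same step may reach the receiver in either order.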
Two important issues in a message exchange protocol are: (i) correct reception order; (ii) network congestion by transmitted but not consumed packets. The message exchange protocol adopted in this work avoids these two problems. Data transmission does not inject packets into the network but stores them in the OS, until a consumption request, through the reception of a MR packet. Thus, a packet injected into the network is consumed by the receiver, ensuring message ordering, and avoiding network congestion.
Packets transmitted in both NoCs may arrive in any order.
If the data packet arrives before the control packet, the OS stores it up to the reception of the corresponding control packet. The same scenario occurs in the opposite reception order.
The data packet (MR or MD) enables the OS to retrieve part of K_s, {ID_p, ID_c} (steps 2 and 4 in Figure 3). The control packet (MRC or MDC) contains the rnd value (steps 1 and 3 in Figure 3). To validate a data packet, the OS searches the K_s table for a line matching the received {ID_p, ID_c, rnd}. The OS accepts the data packet if and only if the received values match some line of the K_s table.
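The validation step, including the two possible arrival orders, can be sketched as below. This is an illustrative model and not the OS code: the class name `SessionTable`, its methods, the 16-bit width of rnd, and the assumption that the OS can associate a control packet with its (ID_p, ID_c) pair are all hypothetical.

```python
import secrets

class SessionTable:
    """Toy K_s table that pairs control and data packets in any arrival order."""

    def __init__(self):
        self.keys = set()        # stored K_s tuples: (rnd, id_p, id_c)
        self.waiting_data = {}   # pair -> payload (data arrived first)
        self.waiting_rnd = {}    # pair -> rnd (control arrived first)

    def start_session(self, id_p, id_c):
        rnd = secrets.randbits(16)  # assumed width, for illustration only
        self.keys.add((rnd, id_p, id_c))
        return rnd

    def _validate(self, pair, rnd, payload):
        # Accept if and only if {rnd, ID_p, ID_c} matches a stored line.
        if (rnd, pair[0], pair[1]) in self.keys:
            return ("ACCEPT", payload)
        return ("REJECT", None)

    def on_control(self, pair, rnd):
        if pair in self.waiting_data:
            return self._validate(pair, rnd, self.waiting_data.pop(pair))
        self.waiting_rnd[pair] = rnd   # hold rnd until the data packet arrives
        return ("WAIT", None)

    def on_data(self, pair, payload):
        if pair in self.waiting_rnd:
            return self._validate(pair, self.waiting_rnd.pop(pair), payload)
        self.waiting_data[pair] = payload  # hold data until the control packet
        return ("WAIT", None)

table = SessionTable()
rnd = table.start_session(id_p=1, id_c=2)
print(table.on_control((1, 2), rnd))   # control first: wait for data
print(table.on_data((1, 2), b"msg"))   # matching data: accepted
```

A rejected packet is what triggers the detection and recovery steps described in the next subsections.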

B. DETECTING SUSPICIOUS BEHAVIOR
Three situations may signal suspicious behavior to the OS: (i) a mismatch when comparing K_s; (ii) an unexpected packet arriving at the data NoC without a previous message request; (iii) a timeout in the reception of the data or control packet. Figure 4(a) illustrates a correct packet reception. The straight yellow arrow corresponds to data packets, while the dashed purple arrow refers to control packets.
According to Section II-B, the proposed method may deal with:
• Packet Loss: the receiver PE knows that a data packet should arrive due to the reception of a control packet. The present work uses a timeout mechanism to detect this type of event.
• Packet Misrouting: for the receiver PE, the effect is similar to a packet loss. However, the misrouted packet goes to a PE that was not expecting data, and as this PE did not receive a control packet, the packet is discarded.
• Port Blocking: this attack may be permanent or intermittent. If it is permanent, it is similar to a packet loss. If the attack is intermittent, the data packet may arrive with a wrong sequence number or a latency higher than a threshold, which is also detected by the timeout mechanism.
Figure 4(b) illustrates a packet loss or permanent port blocking due to a fault in the router. The current work uses K_s to detect the above threats. A sequence number in the data packet payload enables detecting intermittent packet blocking.
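The three suspicious situations can be expressed as a small decision function. The sketch below is an assumption-laden illustration: the function name `classify`, its parameters, the verdict strings, and the use of cycle counts are hypothetical, and only the case where the control packet arrives before the data packet is covered by the timeout.

```python
def classify(requested, ctrl_cycle, data_cycle, key_ok, now, timeout):
    """Return a verdict for one packet pair.

    requested:  True if this PE issued a message request for the pair
    ctrl_cycle: arrival cycle of the control packet, or None
    data_cycle: arrival cycle of the data packet, or None
    key_ok:     result of the K_s comparison (meaningful when both arrived)
    """
    if data_cycle is not None and ctrl_cycle is None and not requested:
        return "UNEXPECTED"          # e.g. misrouted packet: discard it
    if ctrl_cycle is not None and data_cycle is None:
        # The control packet announced data that has not arrived yet.
        return "TIMEOUT" if now - ctrl_cycle > timeout else "WAITING"
    if ctrl_cycle is not None and data_cycle is not None:
        if not key_ok:
            return "KEY_MISMATCH"
        if data_cycle - ctrl_cycle > timeout:
            return "TIMEOUT"         # late arrival, e.g. intermittent blocking
        return "OK"
    return "WAITING"                 # nothing suspicious observed so far

print(classify(True, 100, None, False, now=400, timeout=200))  # lost packet
print(classify(False, None, 120, False, now=130, timeout=200)) # misrouted
```

Packet loss and permanent port blocking both end in the timeout verdict, which is consistent with the text: the receiver does not need to distinguish them before starting the recovery.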
To increase the ability of the method to detect attacks and faults, the control NoC may embed in the payload other parameters, such as a sequence number (currently embedded in the payload of the data packet), a Message Authentication Code (MAC), or a timestamp. The sequence number enables, e.g., the detection of packets dropped by an HT or a faulty link. A MAC enables the detection of corrupted packets. The timestamp allows for detecting anomalous variations on the latency, which may imply a timing attack.

C. RECOVERING FROM ATTACKS OR FAILURES
After detecting an attack or fault, the receiver starts the recovery process. Here, the control NoC also plays a major role. Instead of wasting resources to detect the HT location [16] or the faulty link [31], the proposal adopts a rerouting mechanism. To find a new path, the control NoC avoids the output port used by the producer task and the input port used by the consumer task. These two rules ensure a new path without using the previous routers due to the restrictions imposed on the broadcast transmission. Figure 4(c) shows the consumer PE notifying the producer PE of a missing packet detected by the timeout (the packet could have been dropped or received in an incorrect order). The use of the control NoC and the broadcast transmission ensures the reception of this notification, avoiding the affected router(s). The producer PE injects a Path_Search message in the control NoC to the consumer PE. If a path exists, this packet arrives at the consumer PE, and the consumer PE starts a backtracking process to the producer PE, with the new path, as shown in Figure 4(d).
Note that the focus of this work is not the detection of the HT(s) or faulty router(s) location. The method uses a path search approach that avoids the previous path. With the new path, the producer PE resends the lost packet (as explained previously, the producer stores the message in a local buffer) using source routing (SR). All subsequent packets use this path until a new event is detected. Note that once the path is defined, there is no additional overhead in the producer-consumer communication, except for a slight increase in the latency if the SR path is longer than the previous one (e.g., Figure 4).
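The path search with the two port rules can be approximated in software by a breadth-first search followed by backtracking. This sketch is an analogy under stated assumptions, not the control NoC hardware: the breadth-first search stands in for the broadcast, the names `path_search`, `banned_out`, and `banned_in` are hypothetical, and only the two port restrictions named in the text are enforced.

```python
from collections import deque

PORTS = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}
OPPOSITE = {"EAST": "WEST", "WEST": "EAST", "NORTH": "SOUTH", "SOUTH": "NORTH"}

def path_search(src, dst, width, height, banned_out, banned_in):
    """Return the output ports of a new src->dst path, or None if none exists.

    banned_out: output port previously used at the producer (src)
    banned_in:  input port previously used at the consumer (dst)
    """
    parent = {src: None}            # router -> (previous router, port taken)
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for port, (dx, dy) in PORTS.items():
            nxt = (node[0] + dx, node[1] + dy)
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue
            if node == src and port == banned_out:
                continue            # rule 1: avoid the old output port
            if nxt == dst and OPPOSITE[port] == banned_in:
                continue            # rule 2: avoid the old input port
            if nxt not in parent:
                parent[nxt] = (node, port)
                queue.append(nxt)
    if dst not in parent:
        return None
    ports, node = [], dst
    while parent[node] is not None:  # backtrack from consumer to producer
        node, port = parent[node]
        ports.append(port)
    return ports[::-1]

# The XY path (0,0) -> (2,0) left through EAST and arrived through WEST.
print(path_search((0, 0), (2, 0), 3, 3, banned_out="EAST", banned_in="WEST"))
```

The returned port list plays the role of the source-routing header used for the retransmission and for all subsequent packets of the pair.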

V. RESULTS
This section evaluates the Communication Session Protocol in terms of application execution time overhead (Section V-A), overhead on the session handling routines (Section V-B), and analysis of the recovery protocol impact on real benchmarks (Section V-C).
The many-core, described in Section II, is modeled at the RTL level (NoCs and DMNI in VHDL, and the processor and memory in SystemC) [20]. The OS and applications are described in C language. Such low-level simulation generates clock-cycle accurate results.
Each application is modeled as a CTG(T, E) (Communicating Task Graph), a directed and connected graph. Each vertex t_i ∈ T represents a task, and each edge e_ij ∈ E represents the communication from t_i to t_j. Tasks communicate with each other using a message-passing protocol similar to MPI (Message Passing Interface). Results are gathered by simulating seven benchmarks: DTW, MPEG, MWD, MPEG4, Dijkstra, VOPD, and AES in a 4x4 MCSoC. These benchmarks are used in the many-core research community [41,42]. Figure 5 presents two application graphs corresponding to the MPEG and AES applications. The benchmarks have distinct communication models, such as pipeline and master-slave, to avoid biased results. The MPEG application follows a pipeline communication model. The AES application follows a master-slave model, with one task responsible for distributing the computation to the other tasks. The AES benchmark is parameterizable in the number of slave tasks (4 or 8) and the number of 16-byte blocks to encrypt (8 up to 512). For example, AES_4_8 requires two iterations to encrypt or decrypt a 128-byte message (8 blocks of 16 bytes), while AES_8_8 requires only one iteration for the same workload.

A. OVERHEAD OF THE COMMUNICATION PROTOCOL
Table 2 compares the applications' execution time between the baseline system and the one with the Communication Session protocol. Two applications follow the pipeline communication model, MPEG and DTW. The execution time overhead for these applications is 3.79% (MPEG) and 3.55% (DTW). This small performance overhead is due to the communication model: only one message is exchanged between each communicating task pair per iteration, requiring one session synchronization per iteration.
The CTG of the remaining applications either follows a master-slave model (AES, Dijkstra) or is a complex graph with several tasks acting as masters (MWD, MPEG4, VOPD). For these applications, the execution time overhead ranges from 18.01% to 33.31%. The overhead is higher due to the number of messages the master task(s) must synchronize. The protocol synchronization directly affects the execution time: when a task has many communication dependencies, the synchronization delay propagates and delays the sending or receiving of subsequent messages to the next tasks.
The AES application is used to illustrate the effect of protocol synchronization. Increasing the number of slave tasks from 4 to 8 implies a higher execution time overhead due to the larger number of messages to synchronize. The number of iterations the application executes (number of blocks to be handled divided by the number of slave tasks) shows the effect of the synchronization propagation. For AES_4, the execution time overhead stabilizes at approximately 24% from 16 iterations (64 blocks) onward. For AES_8, the overhead also stabilizes at 16 iterations, which corresponds to 128 blocks, at approximately 27%.
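The relation between the benchmark name and the number of iterations can be sketched as follows. This is a minimal illustration, assuming that each slave task handles one 16-byte block per iteration (consistent with the AES_4_8 and AES_8_8 examples in the benchmark description) and that a partially filled last iteration still counts as one iteration. The function names are hypothetical and are not part of the benchmark code.

```python
def parse_aes_benchmark(name: str) -> tuple[int, int]:
    """Split a name such as 'AES_4_8' into (slave tasks, 16-byte blocks)."""
    _, slaves, blocks = name.split("_")
    return int(slaves), int(blocks)


def aes_iterations(num_slaves: int, num_blocks: int) -> int:
    """Iterations needed when each slave handles one block per iteration.

    Assumption: the last iteration may be partially filled, so the
    division is rounded up (integer ceiling, no floating point).
    """
    if num_slaves <= 0 or num_blocks < 0:
        raise ValueError("num_slaves must be positive and num_blocks non-negative")
    return -(-num_blocks // num_slaves)


if __name__ == "__main__":
    for name in ("AES_4_8", "AES_8_8", "AES_4_64", "AES_8_128"):
        slaves, blocks = parse_aes_benchmark(name)
        print(name, aes_iterations(slaves, blocks))
```

Under this assumption, AES_4_64 and AES_8_128 both execute 16 iterations, which matches the points at which the reported overheads stabilize.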
These results define the protocol execution time overhead according to the communication characteristics of the application. We consider that the overhead of up to 27% on application execution time is an acceptable cost considering the security and fault tolerance benefits added by the communication session protocol.

B. OVERHEAD OF SESSION HANDLING ROUTINES
The Communication Session protocol adds computation to the OS to handle the protocol packets. There are two new algorithms: the handling of the MRC packets and the handling of the MDC packets. In addition, the routines that handle MR and MD packets were modified to interface with the control NoC.
To analyze the impact of the protocol in the OS, Table 3 shows the time (in clock cycles) taken by the OS to handle the messages in the Session protocol, compared to the baseline implementation, considering 128 iterations of the application. The values refer to the AES_4_128 simulation, considering two cases:
• Case 1: control packets arrive before data packets, observed in the AES Slaves.
• Case 2: data packets arrive before control packets, observed in the AES Master. This happens because the master receives the MR and MD messages from the four slaves almost simultaneously. The OS prioritizes the handling of the data NoC packets to avoid network congestion.
Results show that the second case presents a higher overhead for both services because data packets arrive before control packets. In this case, the packets arriving at the data NoC need to be stored and then retrieved when the control packets arrive to validate them.
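The two arrival orders can be illustrated with a small sketch. It is not the OS implementation: the class, the method names, and the matching of control and data packets by a sequence number are assumptions made only to show why Case 2 costs more (the data packet must be stored and later retrieved).

```python
class SessionEndpoint:
    """Toy model of matching control-NoC and data-NoC packets (hypothetical)."""

    def __init__(self):
        self.pending_control = set()  # control packets still waiting for data (Case 1)
        self.pending_data = {}        # data packets stored until control arrives (Case 2)
        self.delivered = []           # validated (seq, payload) pairs, in validation order

    def on_control(self, seq):
        """Handle a packet arriving on the control NoC."""
        if seq in self.pending_data:
            # Case 2: data arrived first, so it is retrieved from storage now.
            payload = self.pending_data.pop(seq)
            self.delivered.append((seq, payload))
            return "validated_stored_data"
        self.pending_control.add(seq)
        return "waiting_for_data"

    def on_data(self, seq, payload):
        """Handle a packet arriving on the data NoC."""
        if seq in self.pending_control:
            # Case 1: control arrived first, so the data is validated on arrival.
            self.pending_control.remove(seq)
            self.delivered.append((seq, payload))
            return "validated_on_arrival"
        # Case 2: store the data packet until its control packet shows up.
        self.pending_data[seq] = payload
        return "stored"


if __name__ == "__main__":
    ep = SessionEndpoint()
    print(ep.on_control(1), ep.on_data(1, b"block-1"))  # Case 1 order
    print(ep.on_data(2, b"block-2"), ep.on_control(2))  # Case 2 order
```

In this sketch, Case 2 touches the storage structure twice (one store, one retrieve), whereas Case 1 validates the packet as soon as it arrives.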
This experiment also shows the difference between services: the REQUEST takes longer than the DELIVERY. This happens because the REQUEST is also responsible for sending the packet (i.e., executing the DELIVERY) if the producer already has the packet ready when the REQUEST arrives. Table 3 shows that message exchange transactions have a relatively high percentage overhead, but a small one in absolute terms (less than 400 clock cycles in the worst case). Thus, applications with a pipeline model (such as MPEG) have a minor execution time penalty when adding the proposed protocol, as shown in Table 2. On the other hand, due to the serialization in handling messages in applications with a master-slave communication model (such as AES), this overhead accumulates, explaining the higher runtime overhead.

C. RECOVERY COST
After detecting a suspicious behavior, the recovery process starts, as detailed in Section IV-C. The recovery process adopts a rerouting mechanism. With a new path established, all subsequent packets use this path. Thus, the overhead of the recovery process occurs once, when detecting the suspicious behavior.
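As an illustration of rerouting around a blocked router, the sketch below searches a 2D mesh for a shortest path that avoids a given set of routers. It is a generic breadth-first search used as a stand-in: the actual mechanism is the one described in Section IV-C, and the function name, coordinate scheme, and use of the whole mesh as the search region are assumptions.

```python
from collections import deque


def shortest_path(src, dst, width, height, blocked):
    """Return a shortest router path from src to dst in a width x height mesh.

    Routers are (x, y) tuples; routers in `blocked` are never traversed.
    Returns a list of routers including src and dst, or None if no path exists.
    """
    if src in blocked or dst in blocked:
        return None
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and nxt not in blocked and nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    original = shortest_path((0, 0), (2, 1), 4, 4, blocked=set())
    rerouted = shortest_path((0, 0), (2, 1), 4, 4, blocked={(1, 0)})
    print(len(original) - 1, len(rerouted) - 1)  # hop counts before and after
```

In this example the source and target differ in both mesh dimensions, so an alternative minimal path exists and the rerouted path keeps the original hop count. When the endpoints share a row or column and a router between them is blocked, any alternative path is necessarily longer.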
Two scenarios are simulated to evaluate the impact of the rerouting and packet recovery mechanisms: one with a pipeline application (MPEG) and the other with a master-slave application (AES_4), both applications having five tasks. The tasks are mapped inside an OSZ that encloses six routers (yellow-highlighted area), one of them infected by an HT. Figure 6 illustrates the MPEG and HT mapping. Each application executes ten iterations, with the HT configured to block all ports of the infected router at 3 ms. Figure 7 shows the time taken for each iteration for both applications. Each graph has three curves:
• Baseline, execution without the Communication Session protocol;
• Session, execution with the Communication Session protocol, without the activation of the HT;
• Attack, execution with the protocol, the HT activation at 3 ms, and the time spent for the recovery process.
Figure 7(a) illustrates the MPEG application. The application stalls at iteration 6, triggering the recovery process in parallel at different PEs. The next iteration, after the recovery process, executes faster. Due to the pipeline structure of the application, data remains buffered in the producer PEs. Once the new path is established, the data is transferred to its targets.
Figure 7(b) illustrates the AES application. This application also stalls at iteration 6. Due to the master-slave communication model, it is not possible to buffer intermediate data. Therefore, it is necessary to finish the recovery process to restore the original latency.
As shown in Figure 7, the recovery process overhead happens once. After the recovery process, the HT is still active, but the applications are not affected by it. Note that the latency after the attack is the same for the 'session' and 'attack' scenarios. The latency is the same because the new paths have the same number of hops as the original ones. Table 4 presents the execution time for each scenario.
The overheads using the Communication Session Protocol are consistent with the ones presented in Table 2, varying according to the communication model. The MPEG application increases its execution time by 0.8% when it is necessary to reconfigure the paths due to the HT attack. The additional overhead of the AES application is 8.85% for the execution of 10 iterations. These overheads decrease as the number of executed iterations increases, since the recovery cost is paid only once.
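The amortization of the one-time recovery cost can be approximated with a simple model. The sketch assumes that the recovery cost is fixed and that every iteration takes the same time, so the relative overhead scales inversely with the number of iterations. These are simplifying assumptions, not measurements: they ignore effects such as the faster MPEG iteration right after recovery, and the function name is hypothetical.

```python
def amortized_overhead_pct(measured_pct, measured_iterations, target_iterations):
    """Extrapolate a one-time recovery overhead to a different run length.

    Model (assumption): overhead_pct = recovery_cost / (iterations * time_per_iteration),
    with recovery_cost and time_per_iteration constant.
    """
    if measured_iterations <= 0 or target_iterations <= 0:
        raise ValueError("iteration counts must be positive")
    return measured_pct * measured_iterations / target_iterations


if __name__ == "__main__":
    # Reported values for 10 iterations: 8.85% (AES) and 0.8% (MPEG).
    for iterations in (10, 100, 1000):
        aes = amortized_overhead_pct(8.85, 10, iterations)
        mpeg = amortized_overhead_pct(0.8, 10, iterations)
        print(f"{iterations} iterations: AES ~{aes:.3f}%, MPEG ~{mpeg:.3f}%")
```

Under this model, a run ten times longer would see roughly one tenth of the recovery overhead reported for 10 iterations.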