Self-Calibrated Energy-Efficient and Reliable Channels for On-Chip Interconnection Networks

Energy-efficient and reliable channels are provided for on-chip interconnection networks (OCINs) using a self-calibrated voltage scaling technique with a self-corrected green (SCG) coding scheme. This self-calibrated low-power coding and voltage scaling technique increases reliability and reduces energy consumption simultaneously. The SCG coding is a joint bus and error correction coding scheme that provides a reliable mechanism for channels. In addition, it achieves a significant reduction in energy consumption via a joint triplication bus power model for crosstalk avoidance. Based on the SCG coding scheme, the proposed self-calibrated voltage scaling technique adjusts the voltage swing for energy reduction. Furthermore, this technique tolerates timing variations. Based on UMC 65 nm CMOS technology, the proposed channels reduce energy consumption by nearly 28.3% compared with uncoded channels at the lowest voltage. This approach makes the channels of OCINs tolerant of transient malfunctions and realizes energy efficiency.


Introduction
As the design complexity of multicore systems-on-chip (SoCs) continues to increase, a global approach is needed to effectively transport and manage on-chip communication traffic and to optimize wire efficiency. As processing technologies shrink, the ratio of interconnection delay to gate delay will increase in advanced technologies [1], indicating that on-chip interconnection architectures will dominate performance in future SoC designs. Therefore, modern SoC designs face a number of problems caused by the communication among multiple processor elements. Additionally, in current multicore SoC designs, reducing power consumption is the primary challenge for advanced technologies. Therefore, the process-independent network-on-chip (NoC) has been considered an effective solution for integrating a multicore system. NoC was investigated for dealing with the challenges of on-chip data communication caused by the increasing scale of next-generation SoC designs [2,3]. The most important characteristics of NoC are a packet-switched approach [4] and a flexible, user-defined topology [5]. Furthermore, on-chip interconnection networks (OCINs) provide the building blocks and the microarchitecture for NoCs [6,7]. However, some physical effects in nanoscale technology degrade the performance and reliability of OCINs. Moreover, channels in OCINs dominate the overall power consumption [8,9].
On-chip physical interconnections will become a limiting factor for performance and energy consumption. For on-chip interconnections, three critical issues must be addressed: delay, power, and reliability. For the delay issue, propagation delay increases because of coupling capacitances. For long global lines, discharging large capacitances takes considerable time. For the power issue, power dissipation increases due to both parasitic and coupling capacitances. Finally, the reliability of on-chip interconnections is degraded by noise. In advanced technologies, circuits and interconnects become more sensitive to noise as operating voltages decrease. Furthermore, increasing coupling noise, soft-error rate, and bouncing noise also decrease the reliability of circuits. Thus, self-calibrated circuitry has become essential for near-future interconnection architecture designs.
In this paper, we propose a novel self-calibrated energy-efficient and reliable channel design for OCINs. The proposed channels reduce energy consumption while maintaining reliability. The channels are developed using the self-calibrated voltage scaling technique with the self-corrected green (SCG) coding scheme. The rest of this paper is organized as follows. Section 2 analyzes previous reliable and low-power coding schemes. The self-calibrated low-power coding and voltage scaling channels are presented in Section 3. Sections 4 and 5 describe the proposed SCG coding scheme and the self-calibrated voltage scaling technique, respectively. The simulation results are given in Section 6. Finally, Section 7 concludes the paper.

Previous Low-Power and Reliable Interconnect Techniques
To achieve low-latency, reliable, and low-energy on-chip communication, energy efficiency is the primary challenge for current OCIN designs with nanoscale effects. First, coupling capacitance increases significantly in nanoscale technology. Second, decreasing operating voltage makes the interconnection increasingly susceptible to noise. Due to crosstalk noise, the coupling effect not only aggravates the power-delay metrics but also deteriorates the signal integrity. Many techniques have been developed to reduce the coupling capacitance effect using bus encoding schemes [10][11][12][13][14][15][16][17][18]. Bus encoding is an elegant and effective technique for eliminating the crosstalk effect, and provides a reliability bound for on-chip interconnects. Moreover, in order to provide a reliability bound for on-chip interconnects, forward error correction (FEC) and automatic repeat request (ARQ) techniques are widely used in NoC [5,19]. Additionally, a joint error correction coding and bus coding technique is an effective solution to resolve delay, power, and reliability issues. Encoding schemes for low-power and reliability issues were proposed in [20][21][22][23][24][25]; these schemes increase the reliability of on-chip interconnections. Moreover, robust self-calibrating transmission schemes were proposed in [19][26][27][28], which examined some physical properties of on-chip interconnects, with the goal of achieving fast, reliable, and low-energy communication.
The combination of different coding schemes has been investigated to increase system reliability and to reduce energy dissipation. Incorporating crosstalk avoidance codes with forward error correction coding is one way to provide low-power and reliable on-chip interconnection. Duplicate-add-parity (DAP) [20], modified dual rail (MDR) [23], boundary shift code (BSC) [22,23], and Hamming codes [20] are forward error correction codes that increase the reliability of interconnections. A unified framework of coding with crosstalk avoidance codes (CAC), error control codes (ECC), and linear crosstalk codes (LXC) was proposed in [20,21]. It provides practical codes to solve delay, power, and reliability problems jointly, as shown in Figure 1. CAC avoids specific code patterns or code transitions to reduce delay and power consumption by decreasing the crosstalk effect. ECC is able to detect and correct the error bits. However, the CAC must not modify the parity bits. In order to reduce the coupling effect of the parity bits, LXC is applied without destroying the parity bits. Other approaches are based on the unified framework to improve the error correction ability and to address signal integrity in OCINs [20][21][22][23][24][25].
Figure 1: A unified framework for joint crosstalk avoidance code and error correction code. Blocks: crosstalk avoidance code (CAC), error control code (ECC), linear crosstalk code (LXC).
CACs are designed to improve the signal integrity and to reduce the coupling effect. The purpose of CAC is to reduce the worst-case switching patterns, which are the forbidden overlap condition (FOC), the forbidden transition condition (FTC), and the forbidden pattern condition (FPC) [20]. FOC represents a codeword transition from 010 to 101 or from 101 to 010. In addition, FTC represents a codeword transition from 01 to 10 or from 10 to 01, and FPC represents a codeword containing a 010 or 101 pattern. In order to reduce or avoid the worst-case switching patterns, many coding schemes have been proposed against these three conditions [25]. The forbidden overlap code provides a 5-bit codeword for a 4-bit dataword to eliminate FOC. The forbidden pattern code also uses a 5-bit codeword for a 4-bit dataword to avoid FPC in the codeword. Additionally, forbidden transition codes provide a 4-bit codeword for a 3-bit dataword to prevent FTC. However, these three coding schemes do not satisfy the forbidden adjacent boundary pattern condition, which requires that two adjacent bit boundaries in the codes cannot both be of 01-type and 10-type. Hence, one-lambda codes were proposed not only to avoid FTC and FPC but also to satisfy the forbidden adjacent boundary pattern condition [25]. However, they need an 8-bit codeword to transfer a 4-bit dataword.
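The three forbidden conditions defined above can be checked mechanically. The following is a minimal sketch, assuming codewords are given as equal-length strings of `0` and `1` with adjacent characters representing adjacent wires; the function names are illustrative and do not come from the cited works.

```python
def has_forbidden_pattern(word: str) -> bool:
    """FPC: the codeword itself contains a 010 or 101 pattern."""
    return "010" in word or "101" in word


def has_forbidden_transition(prev: str, cur: str) -> bool:
    """FTC: some pair of adjacent wires switches 01 -> 10 or 10 -> 01."""
    return any(
        {prev[i:i + 2], cur[i:i + 2]} == {"01", "10"}
        for i in range(len(prev) - 1)
    )


def has_forbidden_overlap(prev: str, cur: str) -> bool:
    """FOC: some triple of adjacent wires switches 010 -> 101 or 101 -> 010."""
    return any(
        {prev[i:i + 3], cur[i:i + 3]} == {"010", "101"}
        for i in range(len(prev) - 2)
    )


if __name__ == "__main__":
    print(has_forbidden_pattern("0100"))             # contains 010
    print(has_forbidden_transition("0110", "0101"))  # last two wires: 10 -> 01
    print(has_forbidden_overlap("0101", "1010"))     # 010 -> 101 on wires 0..2
```

A checker of this kind can be used to screen a candidate codebook: a code satisfies a condition if the corresponding function returns `False` for every codeword (FPC) or every ordered pair of codewords (FTC, FOC).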
Joint coding schemes based on the unified framework shown in Figure 1 provide better communication performance. However, these schemes just combine different kinds of codes directly, since the intrinsic qualities of CACs and ECCs are mutually exclusive, except for duplicating codes (DAP, MDR, and BSC) [20,23]. In DAP coding, nevertheless, the critical path of the parity bit is much longer than the others. Moreover, the CAC must be a code that does not modify the parity bits in any way, because ECC decoding has to occur before any other decoding in the receiver. In order to reduce the coupling effect of the parity bits, the linear crosstalk code can be applied without destroying the parity bits.

Self-Calibrated Low-Power and Energy-Efficient Channel Design
The self-calibrated energy-efficient and reliable channels are developed using a self-calibrated voltage scaling technique and a joint bus/error correction coding scheme, which is called the SCG coding scheme. Figure 2 shows the block diagrams of the proposed channels for OCINs. The SCG coding scheme reduces coupling effects and has a rapid correction ability that reduces the physical transfer unit size in routers. The self-calibrated voltage scaling technique achieves the optimal operating voltage for link wires in channels according to the SCG coding scheme. Additionally, the proposed technique overcomes increasing variation in advanced technologies and facilitates energy-efficient on-chip data communication. Therefore, the proposed self-calibrated low-power coding and voltage scaling realize energy-efficient and reliable channels for OCINs.
The SCG coding scheme is a joint bus and error correction coding scheme that provides low-energy and high-reliability channels for OCINs. The SCG coding scheme is constructed in two stages, the green bus coding stage and the triplication error correction coding stage. In routers, an undecoded code increases the area and energy dissipation of switching circuits because of the large physical transfer unit size. Therefore, the error correction code should be decoded in routers to reduce the power dissipation and the area of switching circuits and buffers. The triplication error correction coding stage achieves rapid correction to reduce the physical transfer unit size in routers via a self-corrected mechanism at the bit level. To efficiently reduce the coupling effect, the green bus coding stage is developed using the joint triplication bus power model, which depends upon the characteristics of triplication error correction coding. The SCG coding can avoid the FOC and FPC and reduce the FTC to achieve power saving in channels. The bit width varies across the stages of the self-calibrated low-power coding and voltage scaling channel. The green bus coding encodes packets with a 4-to-5 codec. To increase the reliability of channels, the triplication error correction stage increases the bit width from k bits to 3k bits. Although the SCG coding increases the number of link wires in channels, on-chip wires are cheap and plentiful with the increasing number of metal layers in advanced technologies [29,30].
Because error correction coding increases the reliability of channels, designers can trade off power consumption against reliability by reducing the operating voltage. Therefore, the operating voltage of the link wires in channels is adjusted according to the SCG coding scheme using a self-calibrated voltage scaling technique. This technique detects error conditions of channels in the triplication error correction stage, feeds the control signals back to the low swing drivers, and adjusts the operating voltage of the link wires. The self-calibrated voltage scaling technique determines the optimal operating point to trade off energy consumption against reliability. The SCG coding scheme and the self-calibrated voltage scaling technique are described in Sections 4 and 5, respectively.

Self-Corrected Green (SCG) Coding Scheme
This section describes the SCG coding scheme, a joint bus and error correction coding scheme. The proposed scheme provides low-energy and reliable channels for advanced technologies. The SCG coding scheme is constructed in two stages, the green bus coding stage and the triplication error correction coding stage. The green bus coding has the advantages of shorter delay for error correction coding, greater energy reduction, and smaller area than other approaches. The green bus coding is developed using the joint triplication bus power model to achieve additional energy reductions for triplication error correction coding.

Triplication Error Correction Stage.
The triplication error correction coding scheme shown in Figure 3 is a single-error-correcting code that triplicates each bit. Based on coding theory, a code with a Hamming distance of h can detect up to h − 1 errors and correct up to ⌊(h − 1)/2⌋ errors. For triplication error correction coding, the Hamming distance of each bit is 3. Therefore, each bit can be corrected individually when no more than one error bit exists in the three triplicated bits, which are defined as a triplication set. The error bit can be corrected by a majority gate. Figure 3 also shows the function of the majority gate. The critical delay of the decoder is the constant delay of a majority gate, which is significantly smaller than that of other error correction approaches [19][20][21][22][23][24][25]. Restated, the triplication error correction coding has a rapid correction ability via a self-correction mechanism at the bit level. Therefore, triplication error correction coding is more suitable for OCINs because data can be decoded and encoded in each router with the small delay of triplication correction coding.
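The bit-level self-correction described above can be illustrated with a short behavioral sketch. It assumes bits are Python integers 0 or 1 and that the three copies of each bit occupy consecutive wire positions; the function names are illustrative only.

```python
def triplicate(bits):
    """Encode: send every data bit on three consecutive wires."""
    return [b for b in bits for _ in range(3)]


def majority(a, b, c):
    """Majority gate: output is 1 when at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)


def decode(wires):
    """Decode each triplication set independently with a majority gate."""
    if len(wires) % 3 != 0:
        raise ValueError("wire count must be a multiple of 3")
    return [majority(*wires[i:i + 3]) for i in range(0, len(wires), 3)]


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    sent = triplicate(data)
    sent[4] ^= 1                 # a single flipped wire in the second set
    print(decode(sent) == data)  # the single error is corrected
```

Because every set is decoded on its own, one error per set is corrected regardless of how many sets are affected, while two errors inside the same set cause a wrong decision for that bit.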
Additionally, one advantage of incorporating error correction mechanisms in an OCIN data stream is that the supply voltage of channels can be reduced without compromising the system reliability. Reducing the supply voltage, $V_{dd}$, increases the bit error probability. To simplify the error sources, we assume that the bit error probability, $\varepsilon$, is given by the following equation when a Gaussian distributed noise voltage, $V_N$, with variance $\sigma_N^2$ is added to the signal waveform:
$$\varepsilon = Q\left(\frac{V_{dd}}{2\sigma_N}\right), \quad (1)$$
where $Q(x)$ is given as
$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-y^2/2}\, dy. \quad (2)$$
Each triplication set is error-free if and only if no transmitted bit is in error or exactly one transmitted bit is in error. For each triplication set, $P_{\text{1-bit correct}}$ is given as
$$P_{\text{1-bit correct}} = (1-\varepsilon)^3 + 3\varepsilon(1-\varepsilon)^2. \quad (3)$$

Figure 3: Triplication channels and majority gate.
For $k$-bit data, transmission is error-free if and only if all $k$ triplication sets are correct. Thus, $P_{k\text{ bits correct}}$ is given by
$$P_{k\text{ bits correct}} = \left[(1-\varepsilon)^3 + 3\varepsilon(1-\varepsilon)^2\right]^k. \quad (4)$$
Hence, the word-error probability is
$$P_{\text{word error}} = 1 - P_{k\text{ bits correct}}. \quad (5)$$
For a small bit error probability, $\varepsilon$, (5) simplifies to
$$P_{\text{word error}} \approx 3k\varepsilon^2. \quad (6)$$
By contrast, the word-error probabilities of the Hamming code and duplicate-add-parity (DAP) [20,21] are proportional to $k^2\varepsilon^2$, so the word-error probability of triplication is much smaller. Triplication error correction coding can also avoid the FOC and FPC, which increase energy dissipation via the coupling effect. Because error correction coding increases the reliability of on-chip interconnections, designers can trade off power consumption against reliability by reducing the operating voltage. To simplify the cumulative effect of noise sources, the noise model for interconnects assumes that Gaussian distributed noise with voltage $V_N$ and variance $\sigma_N^2$ is added to the signal. In addition, we assume that errors on different link lines are independent. The bit error probability, $\varepsilon$, is given in (1) and (2), where $V_{dd}$ is the signal voltage swing. For a given $\sigma_N^2$, the bit error probability increases as the signal voltage swing decreases. However, a specific error control or correction coding scheme can operate at a lower signal voltage swing and still guarantee the reliability of the interconnections if and only if the following condition is satisfied:
$$P_{\text{ECC}}(\hat{\varepsilon}) \le P_{\text{uncode}}(\varepsilon), \quad (7)$$
where $P_{\text{ECC}}$ and $P_{\text{uncode}}$ are the word-error probabilities of the coded and uncoded channels, $\varepsilon$ is the bit error probability at the full swing voltage of 1.0 V, and $\hat{\varepsilon}$ is the bit error probability at a lower swing voltage. To obtain the lowest supply voltage for a specific error correction code at the same level of reliability as the uncoded case, the supply voltage can be revised as
$$\hat{V}_{dd} = V_{dd}\, \frac{Q^{-1}(\hat{\varepsilon})}{Q^{-1}(\varepsilon)}. \quad (8)$$
The inverse of the Gaussian distribution function is also called the probit function, $\Phi(x)$. It has been proved that the underlying Gaussian integral has no elementary primitive.
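The small-$\varepsilon$ simplification of the triplication word-error probability used in this section can be checked by expanding the per-set success probability. The derivation below is a sketch under the stated assumptions only: bit errors are independent, and a triplication set is decoded correctly when at most one of its three bits is flipped. The symbol $P_{\text{set}}$ is shorthand introduced here for that per-set probability.

```latex
\begin{align*}
P_{\text{set}} &= (1-\varepsilon)^3 + 3\varepsilon(1-\varepsilon)^2
                = (1-\varepsilon)^2(1+2\varepsilon)
                = 1 - 3\varepsilon^2 + 2\varepsilon^3, \\
P_{\text{word error}} &= 1 - P_{\text{set}}^{\,k}
                = 1 - \left(1 - 3\varepsilon^2 + 2\varepsilon^3\right)^k
                \approx 3k\varepsilon^2 \qquad (\varepsilon \ll 1).
\end{align*}
```

The leading term grows linearly with the bit width $k$, which is the reason the triplication word-error probability compares favorably with codes whose leading term grows as $k^2\varepsilon^2$.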
To solve this problem, this work first approximates the bit error probability numerically as the voltage swing varies. By integrating from 100 − $V_{dd}$/2, the integral range on the x-axis is divided into 0.0001 V segments, and each segment produces a trapezoid. The areas of all trapezoids are then summed, which gives the approximation of the bit error probability. Therefore, the lowest voltage swing for a specific error correction code that satisfies (8) can be obtained.
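The numerical procedure can be sketched as follows. This is not the authors' program but a minimal illustration under explicit assumptions: the bit error probability is modeled as $Q(V/(2\sigma_N))$, the integration is done in normalized units (multiples of $\sigma_N$) with a step of 0.001 rather than in 0.0001 V segments, the infinite upper limit is truncated, and the lowest swing is located by bisection. All function names are illustrative.

```python
import math


def q_trapezoid(x, span=12.0, step=1e-3):
    """Approximate Q(x) = P(N(0,1) > x) by summing trapezoids on [x, x+span]."""
    n = int(round(span / step))
    h = span / n
    pdf = lambda t: math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    total = 0.5 * (pdf(x) + pdf(x + span))
    for i in range(1, n):
        total += pdf(x + i * h)
    return total * h


def word_error_uncoded(eps, k):
    """1 - (1 - eps)^k, evaluated stably for very small eps."""
    return -math.expm1(k * math.log1p(-eps))


def word_error_triplication(eps, k):
    """A set fails when two or three of its bits flip: 3 eps^2 - 2 eps^3."""
    set_error = 3.0 * eps ** 2 - 2.0 * eps ** 3
    return -math.expm1(k * math.log1p(-set_error))


def solve_decreasing(f, target, lo=0.0, hi=12.0, iters=40):
    """Smallest x in [lo, hi] with f(x) <= target, for a decreasing f."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return hi


def lowest_swing(eps_full, k, v_full=1.0):
    """Lowest swing at which triplication matches the uncoded word-error rate."""
    x_full = solve_decreasing(q_trapezoid, eps_full)  # equals v_full / (2 sigma)
    target = word_error_uncoded(eps_full, k)
    x_low = solve_decreasing(
        lambda x: word_error_triplication(q_trapezoid(x), k), target)
    return v_full * x_low / x_full


if __name__ == "__main__":
    print(round(lowest_swing(1e-20, 32), 3))
```

Under these assumptions, an uncoded bit error probability of $10^{-20}$ at 1.0 V yields a triplication swing of about 0.70 V, which is in line with the value quoted for the SCG code in the discussion of Figure 4.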
When an uncoded channel is operated at the full swing supply voltage (1.0 V), different levels of bit error probability, $\varepsilon$, can be obtained by altering the variance of the Gaussian distribution. Figures 4(a) and 4(b) show the voltages of specific error correction codes versus different uncoded word error rates with k = 8 and k = 32, respectively, where k is the bit width. If the bit error probability of an uncoded word, $\varepsilon$, is $10^{-20}$, the specific voltages of the Hamming code [20], the duplicate-add-parity code [20,21], the joint crosstalk avoidance and triple-error-correction code (JTEC) [24], and the proposed SCG code are 0.705 V, 0.710 V, 0.579 V, and 0.696 V, respectively. The JTEC code uses a double error correction coding stage to enhance error correction and obtains lower voltages. However, the delay and area overheads of the JTEC are much worse than those of other approaches. Compared to other ECC codes, the proposed SCG code has the better characteristic that its lowest supply voltage increases slowly as the uncoded word error rate increases.

Joint Triplication Bus Power Model.
Although triplication error correction coding can avoid many forbidden conditions, some power-hungry transition patterns cannot be eliminated entirely. These patterns are mainly generated by the FTC and self-switching activity. The FTC is satisfied when a bit pattern does not have a transition from 01 to 10 or from 10 to 01. This work modified the RLC cyclic bus model in [31] by considering loading capacitances and coupling capacitances. Figure 5(a) shows the modified model with a four-bit bus, where C1 is the loading capacitance of line 1 and C12 is the coupling capacitance between line 1 and line 2. Moreover, the bus lines are parallel and coplanar. Most of the electrical field is trapped between adjacent lines and the ground. Figure 5(b) shows an approximate bus power model that ignores the parasitic capacitances between nonadjacent lines.
We assume all grounded capacitors have the same value without considering the fringing effect of boundary lines, because fringing capacitors are much smaller than loading and coupling capacitors, even for wide buses. Therefore, this work utilized a joint triplication bus model to implement the bus coding stage to further reduce energy consumption. For a 4-bit triplication bus, the capacitance matrix $C_t$ can be expressed as
$$C_t = C_L \begin{bmatrix} 3+\lambda & -\lambda & 0 & 0 \\ -\lambda & 3+2\lambda & -\lambda & 0 \\ 0 & -\lambda & 3+2\lambda & -\lambda \\ 0 & 0 & -\lambda & 3+\lambda \end{bmatrix}. \quad (9)$$
The parameter $\lambda$ is defined as the ratio of the coupling capacitance, $C_x$, to the loading capacitance, $C_L$. Therefore, $\lambda$ depends on the technology, the specific geometry, the metal layer, and bus shielding. The parameter $\lambda$ typically increases with technology scaling. For instance, the value of $\lambda$ is between 6 and 10, depending on the metal layer, for standard 65 nm CMOS technology with the minimum distance between wires. The parameter $\lambda$ should be much larger in more advanced technologies. Additionally, the coefficient of the loading capacitances is 3 for the three triplicated bits.
Five transition states exist between two adjacent lines, four of which are described in [32]. These five types can be separated into two cases. The first case is static transitions, including type I (single line switching), type II (two lines switching in opposite directions), and type III (no switching or two lines switching in the same direction), as shown in Figure 6. The other case is dynamic transitions, which include type IV and type V, corresponding to type II and type III with signal misalignment, respectively. A static transition is defined as two adjacent lines switching at the same time without noise or different delays. A dynamic transition means that the two adjacent lines may be misaligned.
The power consumption formula is shown in (10), where E and P are energy and power density, respectively; f and V ($V_{dd}$) are frequency and voltage (supply voltage), respectively. $B_i$ is the current voltage level (1 or 0) of line i, and $B_i^{-1}$ is the previous voltage level of line i. The power density, P, can be transformed into (11). The items in (11) are defined in (12). The term $r_i$ indicates that line i switches, regardless of the direction of the change and of the adjacent lines. This item, $r_i$, only considers loading capacitances. The term $r_i \oplus r_j$ indicates that only one of the two lines i and j is changing (Type I). Additionally, $d_{ij}$ indicates that the two lines change in opposite directions (Type II and Type V). Compared with the other two definitions, $r_i$ and $r_i \oplus r_j$, the voltage difference across the coupling capacitance is doubled for $d_{ij}$, which contributes a factor of 4 when squared. Using (12), the power formula can be obtained as (13) with the parameter $\lambda$. The term $\alpha$ is the coefficient of coupling effects and switching activities. All five transition states except Type IV are considered in the power formula (13).
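The role of the three switching terms can be illustrated with a small sketch that scores one bus transition. It is an illustration under stated assumptions rather than the exact formula (13): the loading coefficient is 3 per data line, coupling exists only between adjacent lines with ratio `lam`, and the result is a normalized cost that is proportional to the switching coefficient up to a constant factor that the text does not pin down. The second function evaluates the same cost as a quadratic form on an assumed tridiagonal capacitance matrix, as a cross-check.

```python
def switching_cost(prev, cur, lam):
    """Normalized cost of one transition: loading term plus coupling term."""
    delta = [c - p for p, c in zip(prev, cur)]   # -1, 0 or +1 per line
    r = [abs(d) for d in delta]                  # r_i: line i switches
    loading = 3 * sum(r)                         # coefficient 3 per data line
    coupling = 0
    for i in range(len(delta) - 1):
        single = r[i] ^ r[i + 1]                              # Type I
        opposite = 1 if delta[i] * delta[i + 1] == -1 else 0  # Type II / V
        coupling += single + 4 * opposite                     # factor 4 for d_ij
    return loading + lam * coupling


def quadratic_form_cost(prev, cur, lam):
    """Same cost as delta^T C delta on an assumed tridiagonal matrix C / C_L."""
    n = len(prev)
    delta = [c - p for p, c in zip(prev, cur)]
    total = 0
    for i in range(n):
        neighbours = (i > 0) + (i < n - 1)
        total += (3 + lam * neighbours) * delta[i] ** 2
        if i < n - 1:
            total += 2 * (-lam) * delta[i] * delta[i + 1]
    return total


if __name__ == "__main__":
    lam = 8
    print(switching_cost([0, 0, 0, 0], [1, 1, 1, 1], lam))  # same direction
    print(switching_cost([0, 1, 0, 1], [1, 0, 1, 0], lam))  # all opposite
```

With `lam = 8`, four lines rising together cost only the loading term, while four lines switching in alternating directions cost nine times as much, which is why reducing opposite-direction transitions dominates the coding objective when $\lambda$ is large.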

Green Bus Coding Stage for Crosstalk Avoidance.
The purpose of the green bus coding stage is to minimize the value of $\alpha$ in (13) by encoding signals when $\lambda$ > 2. Figure 7 shows the design flow of green bus coding. First, a triplication capacitance matrix is established using the RLC cyclic model. Then the power formula with coefficient $\alpha$ is derived, where $\alpha$ represents the switching factor that accounts for coupling capacitances. The green bus coding stage only affects the coefficient $\alpha$. Next, the codewords that minimize the value of $\alpha$ are selected and mapped to the datawords. Based on the mapping between codewords and datawords, the green bus coding stage can be implemented.
According to the design flow of the green bus coding stage, the modified switching activity, $\alpha$, should be minimized. Therefore, to convert the 4-bit dataword into a 5-bit codeword, a 32 × 31 transition state table is established by calculating $\alpha$. Then, 16 transition patterns with minimal values of $\alpha$ are selected as the codewords to eliminate crosstalk. The green bus coding chooses a 4:5 code to minimize $\alpha$, depending on the energy-saving bound and the latency of the codec. In a data bus, the bit width of the data is usually a multiple of 4. According to the energy-saving bound analysis of [33], the energy-saving bounds of 4:5 to 4:8 codes are between 40% and 55%. However, the latency of the codec increases significantly as the size of the codeword increases. Figure 8(a) shows the relationships between the 4-bit dataword and the 5-bit codeword. According to these relationships, the datawords can be grouped into two sets, the original set and the converted set, as shown in Figure 8(b). When the transmitted data are in the converted set, the green bus coding stage converts the data into the original set via one-to-one mapping. Meanwhile, the converted bit, c4, is asserted, and c0 and c2 are inverted and mapped to the original set. Notably, x1 and x3 are never modified.
Figure 9 shows the circuit implementation of green bus coding, including the encoder and decoder. By using the joint triplication bus model, the circuitry of green bus coding is simpler and more effective than that of other approaches. An extra shielding line to reduce the coupling effect is not needed between two adjacent 5-bit codewords because the boundary data of the 5-bit codeword are set to roughly 0. Table 1 compares green bus coding with increased wire spacing when $\lambda$ = 8. Although increasing the wire spacing can achieve more energy reduction than green bus coding, it incurs a large area overhead. Additionally, the energy-delay product (EDP) of green bus coding is smaller than that of double wire spacing.
The proposed green bus coding stage has the following properties.
(1) Use c4 as the detection bit to decode c0 and c2.It can simplify the circuitries of encoder and decoder, especially that of the decoder.
(2) The encoded bit always equals the data bit at certain bit positions, where c1 = x1 and c3 = x3.
(3) By focusing on the joint bus and error correction coding scheme, the SCG coding scheme can avoid FOC and FPC and reduce FTC to further reduce power consumption.
(4) Adding extra shielding lines to reduce the coupling effect between two adjacent codewords with increasing coding bits is unnecessary.
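Properties (1) and (2) fix the structure of the 4-to-5 codec even without the mapping table. The sketch below illustrates that structure only. The membership of the converted set is defined by Figure 8(b), which is not reproduced in the text, so `EXAMPLE_CONVERTED_SET` is a hypothetical placeholder (datawords containing a 010 or 101 pattern) and not the authors' table; the function names are likewise illustrative.

```python
from itertools import product


def _as_string(x):
    """Helper: write the dataword as x3 x2 x1 x0 for pattern matching."""
    x0, x1, x2, x3 = x
    return f"{x3}{x2}{x1}{x0}"


# Hypothetical placeholder for the converted set of Figure 8(b).
EXAMPLE_CONVERTED_SET = {
    x for x in product([0, 1], repeat=4)
    if "010" in _as_string(x) or "101" in _as_string(x)
}


def encode(x, converted_set=EXAMPLE_CONVERTED_SET):
    """Map (x0, x1, x2, x3) to (c0, c1, c2, c3, c4)."""
    x0, x1, x2, x3 = x
    c4 = 1 if x in converted_set else 0   # converted bit
    return (x0 ^ c4, x1, x2 ^ c4, x3, c4)  # c1 = x1 and c3 = x3 always


def decode(c):
    """Use c4 as the detection bit to restore c0 and c2."""
    c0, c1, c2, c3, c4 = c
    return (c0 ^ c4, c1, c2 ^ c4, c3)


if __name__ == "__main__":
    word = (1, 0, 1, 0)          # x0=1, x1=0, x2=1, x3=0
    code = encode(word)
    print(code, decode(code) == word)
```

Whatever converted set is used, the decoder needs only two XOR gates driven by c4, which matches the observation that the detection bit simplifies the decoder in particular.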

Self-Calibrated Voltage Scaling Technique
The proposed self-calibrated voltage scaling technique is applied to reduce the operating voltage of channels for energy reduction. Based on the SCG coding scheme, the triplication error correction coding stage can correct errors on the link wires. The SCG coding scheme allows for reductions in signal voltage swing and, at the same time, achieves the same word error rate as uncoded link wires. When the bit error rate is in the range from $10^{-20}$ to $10^{-10}$, a 0.7 V signal swing for link wires can maintain the same reliability as the uncoded case at 1.0 V, as shown in Figure 4. Therefore, a low swing driver and a level converter are implemented with three voltage levels, as shown in Figure 11: high voltage (HV = $V_{dd}$), middle voltage (MV = $V_{dd} - V_t$), and low voltage (LV = $V_{dd} - 2V_t$). Low-$V_t$ PMOS diodes are utilized to produce the low swing voltages, as shown in Figure 11(a). In UMC 65 nm CMOS technology, the threshold voltages of normal-$V_t$ and low-$V_t$ PMOS devices are 0.25 V and 0.15 V, respectively. Therefore, a normal-$V_t$ device would provide only two voltage levels. To realize the lowest voltage of 0.7 V, low-$V_t$ PMOS devices and three voltage levels are selected. Three control signals, S0-S2, determine the voltage swing of the link wires, and Figure 11(a) shows the relationships between control signals and voltages. Based on the different voltages, the low swing driver and level converter can be implemented as shown in Figures 11(b) and 11(c), respectively. Therefore, the timing overhead of switching the voltage can be kept within one cycle.
Figure 12 shows the control policy and voltage state diagram of the self-calibrated voltage scaling technique. The crosstalk-aware test error detection stage is triggered by T_start, and crosstalk-aware test vectors are generated. Test results are compared by the test error detector. Initially, the crosstalk-aware test vectors are transmitted at the lowest voltage level of 0.7 V. With error correction coding, the test error detector should report zero errors. If the error detector detects errors, the test vectors are transferred again at a higher voltage (0.85 V or 1 V). The initial voltage swing of the link wires is determined once the test result is free of errors. When the test is finished, the run-time error detection stage is activated.
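The initial calibration policy can be summarized in a few lines. The sketch below is a behavioral illustration only: `run_test` is a hypothetical callable standing in for the test vector generator and test error detector, the swing levels are computed as $V_{dd} - nV_t$ for n = 0, 1, 2 with a 0.7 V floor, and the fallback to the highest level when every test fails is an assumption of this sketch.

```python
def swing_levels(v_dd=1.0, v_t=0.15, v_min=0.7):
    """Levels V_dd - n*V_t (n = 0, 1, 2) at or above v_min, lowest first."""
    levels = [round(v_dd - n * v_t, 3) for n in range(3)]
    return sorted(v for v in levels if v >= v_min - 1e-9)


def calibrate_initial_swing(levels, run_test):
    """Try each level from lowest to highest until the test is error-free.

    run_test(voltage) must return the number of detected errors.
    """
    for voltage in levels:
        if run_test(voltage) == 0:
            return voltage
    return levels[-1]  # assumption: stay at full swing if all tests fail


if __name__ == "__main__":
    levels = swing_levels()                          # low-V_t devices
    noisy_link = lambda v: 0 if v >= 0.8 else 3      # toy channel model
    print(levels, calibrate_initial_swing(levels, noisy_link))
    print(swing_levels(v_t=0.25))                    # normal-V_t devices
```

With the 0.15 V threshold the three levels are 0.7 V, 0.85 V, and 1.0 V, whereas the 0.25 V threshold leaves only two levels above 0.7 V, consistent with the choice of low-$V_t$ devices.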
After the crosstalk-aware test error detection stage, the run-time error detection stage raises V_scale to trigger the scaling mechanism once in every window of N clock cycles. Based on the error rate, the voltage control unit can further increase or decrease the signal voltage swing during run-time. But In the  [36]. The MAF-based test patterns are a simple pattern stream that represents six different crosstalk effects: rising speedup (Sr), falling speedup (Sf), rising delay (Dr), falling delay (Df), positive glitch (Gp), and negative glitch (Gn). For test wires with n bits, one victim line and n − 1 aggressor lines exist. All aggressor lines switch simultaneously to generate a speedup, delay, or glitch error on the victim line. The MAF test vectors can achieve high error coverage. Additionally, the MAF-based test can be considered an aggressive test that covers other pattern transition cases. To test n-bit on-chip interconnects, six fault models must be tested on each line. Therefore, testing n bits needs 6n test pattern transitions to complete an MAF-based test.
The test pattern generator of the MAF-based self-test methodology is implemented as a finite state machine (FSM). The FSM needs a minimum of 8 cycles to complete the six fault tests on one victim line, indicating that the test pattern generator requires 8n cycles to complete an n-bit MAF test. The test time is much shorter than that of a linear feedback shift register. The FSM, which is triggered by the T_start signal, generates the values of the victim line and the aggressor lines, the counter reset (C_reset), and the counter enable (C_enable). After each cycle through states S1-S8 of the FSM, C_enable triggers the victim counter. The decoder and the output 2-to-1 MUX ensure that each data bit (Di) selects the correct value (victim or aggressor value) during the test. When the value of the victim counter (C_value) is equal to n − 1 in the S8 state, the test is finished and the FSM returns to the S0 state.
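The 6n transitions of an MAF-style test can be enumerated in software as a reference for the hardware generator. The sketch below uses the commonly used definitions of the six faults in the maximal aggressor fault (MAF) model, which are an assumption here since the text does not spell them out: for glitches the victim is held while all aggressors switch, for delays the aggressors switch against the victim, and for speedups they switch with it. It lists vector pairs only and does not model the 8-state FSM.

```python
# (victim_before, victim_after, aggressor_before, aggressor_after)
MAF_FAULTS = {
    "Gp": (0, 0, 0, 1),  # positive glitch: victim low, aggressors rise
    "Gn": (1, 1, 1, 0),  # negative glitch: victim high, aggressors fall
    "Dr": (0, 1, 1, 0),  # rising delay: victim rises, aggressors fall
    "Df": (1, 0, 0, 1),  # falling delay: victim falls, aggressors rise
    "Sr": (0, 1, 0, 1),  # rising speedup: all lines rise
    "Sf": (1, 0, 1, 0),  # falling speedup: all lines fall
}


def maf_vectors(n):
    """Yield (victim, fault, before, after) for all 6n MAF transitions."""
    for victim in range(n):
        for fault, (vb, va, ab, aa) in MAF_FAULTS.items():
            before = [vb if i == victim else ab for i in range(n)]
            after = [va if i == victim else aa for i in range(n)]
            yield victim, fault, before, after


if __name__ == "__main__":
    for victim, fault, before, after in maf_vectors(3):
        print(victim, fault, before, "->", after)
```

For an n-bit link this produces exactly 6n vector pairs, matching the count given above; the hardware FSM spends 8 cycles per victim because consecutive pairs do not always share a vector.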

Run-Time Error Detection Stage.
The run-time error detection stage detects timing variations of the link wires. Timing delay variations of on-chip interconnections are due to crosstalk noise, process variations, temperature variations, and other noise sources. The master-slave flip-flop (MSFF) [37] and the double sampling data checking technique [38] have been proposed to detect timing errors. The MSFF contains a master flip-flop and a slave flip-flop, both of which operate at the same frequency. However, the slave flip-flop is positive-edge triggered by a clock that is delayed by $\Delta t$ relative to that of the master flip-flop. We assume the data captured by the slave flip-flop are correct. The data captured by the master flip-flop and the slave flip-flop are compared using an XOR gate; an error flag is generated when the two data are not identical. When an error occurs, the control circuit stalls the pipeline data flow for 1 clock cycle and the slave flip-flop resends the correct data to the master flip-flop. The principle of the double sampling data checking technique is similar to that of the MSFF.
The timing delay variation of on-chip interconnects affects the design of Δt. Crosstalk causes the propagation delay of an on-chip interconnection to differ from one transition pattern to another. As the timing variation of on-chip interconnections increases, detecting timing errors becomes difficult across voltage levels. However, the MSFF and the double sampling data checking technique are limited by the clock period and the fixed delay line, respectively. Therefore, the run-time error detection stage is constructed using the adaptive timing borrowing technique, as shown in Figure 10. The adaptive timing borrowing technique modifies the double sampling data checking technique with an adaptive delay line. In addition, it provides correction ability via a multiplexer. With the adaptive delay line, the modified double sampling data checking technique can borrow time from the next clock period.
Figure 14 presents analytical results for the timing constraints. To ensure that the modified double sampling data checking technique functions correctly, the time interval Δt must be set appropriately, and each pipeline stage must be considered. If the delay between DFF1 and DFF2 exceeds 1 clock cycle, DFF1 samples erroneous data. The maximum data path delay can be extended to 1 clock cycle plus the time interval Δt, as in (14), where t_DFF is the clock-to-Q delay of the D flip-flop, t_d is the data path delay (from the input of the low swing driver to the output of the level converter), t_XOR is the XOR propagation delay, and t_setup is the setup time of the D flip-flop. DFF3 samples the comparison signal, which compares the sampled data before DFF2 and after DFF2. In addition, DFF3 must sample the comparison signal before the next datum arrives; therefore, Δt must satisfy (15). Additionally, the pipeline stages after the double sampling data checking stage must satisfy the basic constraint in (16) to avoid excessive timing borrowing. Equations (14) and (15) are the timing conditions that avoid erroneous detections, and (16) is the timing condition that prevents setup timing violations in the sequential circuitry. According to (14)-(16), the upper and lower bounds of the time interval Δt are derived as (17). When the time interval Δt is appropriate, the run-time error detection stage corrects erroneous data and provides run-time error rate information, allowing the self-calibrated voltage scaling technique to adjust the voltage swing levels of the link wires.
If (14) is not satisfied, a type I statistical error occurs: the double sampling data checking technique fails to detect a true error and assumes that the sampled data are correct. On the other hand, if (15) is not satisfied, a type II statistical error occurs: the technique misjudges and asserts an error flag even though the transferred data are correct. Timing delay variation is caused by the crosstalk effect, process variation, width variation, and voltage variation. In view of increasing timing variation, the adaptive delay line is an effective solution that satisfies these conditions. Furthermore, the data path delay t_d is affected significantly by operating voltages and input vectors. Therefore, the adaptive delay line generates three time intervals Δt for different signal voltage levels to satisfy the timing condition in (17), and it can be implemented as a digitally controlled delay line with MUXs. Adjusting the time interval Δt guarantees the functionality of the double sampling data checking technique across different voltage swing levels and process variations.
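As a rough illustration of how a feasible window for Δt follows from such constraints, the sketch below computes lower and upper bounds from the quantities defined above. The two inequalities in the docstring are assumed forms that are typical of double sampling schemes; they are not a reproduction of (14)-(17). The flip-flop, XOR, and setup delays in the example call are hypothetical values, while the 485 ps and 910 ps data path delays are the 0.7 V figures reported in the simulation results.

```python
def delta_t_window(t_clk, t_d_min, t_d_max, t_dff, t_xor, t_setup):
    """Return (lower, upper) bounds on delta_t, all in picoseconds.

    Assumed constraint forms (illustrative only):
      late data still reach the delayed sample:
          t_dff + t_d_max <= t_clk + delta_t
      the comparison settles before the next datum arrives:
          delta_t + t_xor + t_setup <= t_dff + t_d_min
    """
    lower = max(0, t_dff + t_d_max - t_clk)
    upper = t_dff + t_d_min - t_xor - t_setup
    if lower > upper:
        raise ValueError("no feasible delta_t for these delays")
    return lower, upper


# 1 GHz clock; t_dff, t_xor, and t_setup are hypothetical values.
lo, hi = delta_t_window(t_clk=1000, t_d_min=485, t_d_max=910,
                        t_dff=60, t_xor=40, t_setup=20)
print(f"feasible delta_t window: {lo} ps to {hi} ps")
```

Under these assumptions the window closes as the spread between the fastest and slowest data path delays grows, which is consistent with the need for a voltage-dependent, adjustable delay line.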

Simulation Results
This section presents simulation results demonstrating the improvement in energy and reliability achieved by the SCG coding scheme and the self-calibrated voltage scaling technique. All simulation results are based on UMC 65 nm 1P9M CMOS technology. For OCINs, the metal layers can be categorized into upper-level, middle-level, and lower-level layers. In most cases [39][40][41], the upper-level metal layers are routed for power grids and global clock distribution via low resistance metals. Additionally, the lower-level metal layers are routed for local resources. Therefore, the link wires between processor elements are modeled as metal-6 wires with a minimum width and spacing of 0.10 μm in UMC 65 nm 1P9M CMOS technology. Simulation results include an analysis of different error-correction coding schemes, the energy-delay product (EDP) of different joint coding schemes, the energy saving of SCG coding in an 8 × 8 mesh network, process-variation timing analysis, and an analysis of the self-calibrated voltage scaling technique.
Table 2 lists different combinations of joint coding schemes, such as the Hamming code (HC), FTC+HC, FOC+HC, and boundary shift code (BSC) in [23], one lambda code (OLC)+HC and DAP+shielding (DSAP) in [25], JTEC in [24], and the proposed SCG coding scheme. For 8-bit link wires, Table 2 summarizes the physical transfer unit size in channels and routers, the maximum delay and average energy of link wires, and the corresponding lowest supply voltage of each scheme. Table 2 also summarizes the codec of each approach, including the corresponding codec area, power, and latency. The lowest supply voltages are theoretical values from Figure 5 when ε = 10^-20. The JTEC uses double error correction coding to enhance error correction. However, its codec overhead and energy dissipation (unoptimized JTEC for 8-bit) are much worse than those of the other approaches. Although the JTEC can reduce the supply voltage to the lowest point at the same uncoded word-error-rate, its latency is larger than that of the others due to long chains of XOR gates. Furthermore, the lowest voltage of JTEC increases rapidly as the bit error rate increases.
Except for SCG coding, DAP, and DSAP, the critical delays of the codecs are larger than 0.5 ns. Consequently, these codecs are not appropriate for integration into high-speed routers. As a result, the physical transfer unit sizes in routers for these codecs are larger than that of the proposed coding scheme, which increases network area and energy consumption. The delays of the green coding stage and the triplication error correction stage are 0.28 ns and 0.09 ns, respectively, and the power consumption of the triplication error correction stage is only 41.5 μW. Hence, the proposed SCG coding scheme has the smallest codec overhead. Additionally, the green bus coding stage is integrated only in the sender node and the receiver node.
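The short delay of the triplication error correction stage is plausible because triplication can be decoded with a simple per-bit vote. The following is a minimal sketch, assuming the decoder is a bitwise majority vote over three copies placed on adjacent wires; the function names and the wire ordering are illustrative assumptions, not the circuit described in this work.

```python
def triplicate(bits):
    """Encode each data bit as three identical copies on adjacent wires."""
    return [b for b in bits for _ in range(3)]


def majority_decode(wires):
    """Recover each data bit by a majority vote over its three copies."""
    if len(wires) % 3 != 0:
        raise ValueError("wire count must be a multiple of 3")
    return [1 if sum(wires[i:i + 3]) >= 2 else 0
            for i in range(0, len(wires), 3)]


data = [1, 0, 1, 1]
sent = triplicate(data)
sent[4] ^= 1  # inject a single-wire error into the second data bit
print(majority_decode(sent) == data)
```

Each vote depends on only three wires, so one erroneous copy per data bit is corrected without the long XOR chains that dominate the latency of syndrome-based decoders.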
The delay and energy of link wires are calculated via the delay model and energy model given by [33], where τ_0 is defined as the delay of a crosstalk-free wire. The proposed SCG coding scheme achieves the largest energy reduction by reducing coupling effects on link wires, and it avoids the FOC and FPC through the triplication error correction coding stage. Additionally, the SCG coding scheme reduces the FTC and self-switching activities through the green bus coding stage, which is based on the joint triplication power model. Although the triplication error correction stage triplicates the transferred data and increases the physical transfer unit size on link wires, it also enhances data reliability and avoids the worst crosstalk patterns. Moreover, the worst-case delay is reduced from (1 + 4λ)τ_0 to (1 + 2λ)τ_0.
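The reduction of the worst-case delay from (1 + 4λ)τ_0 to (1 + 2λ)τ_0 can be checked with a first-order crosstalk model. The sketch below assumes that the delay of a switching wire is (1 + kλ)τ_0, where k sums, over its two neighbors, the difference between the wire's transition and the neighbor's transition; it also assumes that the three copies of each bit occupy adjacent wires. Both are illustrative assumptions, and the exact model in [33] may differ.

```python
from itertools import product

TRANSITIONS = (-1, 0, 1)  # falling, quiet, rising


def coupling_factor(victim, left, right):
    """Multiples of lambda seen by a switching victim wire (0 if it is quiet)."""
    if victim == 0:
        return 0
    return abs(victim - left) + abs(victim - right)


def wire_delay(victim, left, right, lam, tau0=1.0):
    """First-order delay (1 + k * lambda) * tau0 of the victim wire."""
    return (1 + lam * coupling_factor(victim, left, right)) * tau0


def worst_factor_uncoded():
    """Worst coupling factor when neighbors switch independently."""
    return max(coupling_factor(v, l, r)
               for v, l, r in product(TRANSITIONS, repeat=3))


def worst_factor_triplicated():
    """Worst coupling factor when every bit drives three adjacent wires."""
    worst = 0
    for bits in product(TRANSITIONS, repeat=3):
        wires = [b for b in bits for _ in range(3)]
        for i in range(1, len(wires) - 1):
            k = coupling_factor(wires[i], wires[i - 1], wires[i + 1])
            worst = max(worst, k)
    return worst


print(worst_factor_uncoded(), worst_factor_triplicated())
```

With triplication, every wire has at least one neighbor carrying the same bit, so at most one neighbor can switch in the opposite direction, which bounds the coupling factor at 2 instead of 4 in this model.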
Figure 15(a) shows the energy-delay product (EDP) reduction relative to the uncoded word under different λ values. The coefficient λ is defined as the ratio of the coupling capacitance between two adjacent lines to the loading capacitance. The energy and the delay are measured as the average energy dissipation in 1 ns and the propagation delay from the transmitter to the receiver, respectively. The proposed SCG coding achieves the highest EDP reduction regardless of the value of λ. Through the tradeoff between reliability and power consumption, the signal swing levels of specific codes can be reduced further to the lowest values permitted by their error correction abilities. The lowest signal swing guarantees the same word error rate as that of the uncoded word. When λ equals 4 with a full swing signal (1.0 V), the SCG coding scheme achieves a 34.34% EDP reduction compared to the uncoded word and a 56.54% EDP reduction relative to traditional Hamming codes. The coding schemes can further increase EDP savings at the lowest operating voltages. In Figure 15(b), the proposed SCG coding achieves a 67.29% EDP saving relative to the uncoded word when λ is 4 and the operating voltage is 0.69 V.
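For reference, the EDP reduction metric used in this comparison can be written as a short calculation. The sketch below assumes the usual definition, EDP = energy × delay, with the reduction expressed as a percentage of the reference EDP; the numbers in the example call are hypothetical and are not taken from Table 2 or Figure 15.

```python
def edp(energy, delay):
    """Energy-delay product in the units of the inputs (e.g. pJ * ns)."""
    return energy * delay


def edp_reduction_percent(e_ref, d_ref, e_new, d_new):
    """Percentage EDP reduction of a coded link relative to a reference link."""
    ref = edp(e_ref, d_ref)
    if ref <= 0:
        raise ValueError("reference EDP must be positive")
    return 100.0 * (1.0 - edp(e_new, d_new) / ref)


# Hypothetical example: 20% less energy and half the delay.
print(edp_reduction_percent(e_ref=1.0, d_ref=1.0, e_new=0.8, d_new=0.5))
```

Because the metric is a product, a code that adds wires and therefore energy can still come out ahead if it removes the slowest crosstalk patterns.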
The proposed SCG coding is also simulated with different lengths of link wires. Figure 16 shows the simulation environment setup with different numbers of routers (N) and various lengths (M) of link wires. The green bus coding stage is integrated only in the routers of the sender node and the receiver node. Each router has 5 input/output ports and a 4-stage pipeline for mesh interconnection networks. The first stage includes switch setup, the error correction decoder, and the header decoder. The second and third stages are routing traversal and arbitration, respectively. The final stage comprises the error correction encoder and the link wires. The length of the link wires is set to M μm of metal-6 with a minimum width and spacing of 0.10 μm. The clock frequency is as high as 1 GHz. Figure 17 illustrates the energy reduction for different numbers of routers (N) and different lengths (M) under the normal voltage (1.0 V) and the lowest voltage (0.7 V). Following several NoC chips [39][40][41], the length of the link wires ranges from 200 μm to 1800 μm. The energy reduction increases as the length of the link wires increases. Additionally, the SCG coding scheme achieves significant energy savings by reducing both the coupling effect and the supply voltage.
Figure 18 shows the energy dissipation of an 8 × 8 mesh interconnection network with different joint CAC and ECC coding schemes under their lowest supply voltages. The simulation environment is an 8 × 8 mesh topology with uniform random traffic patterns. The routing and arbitration algorithms are XY routing and round robin, respectively, and the FIFO depth of each output buffer is 8 flits. Each flit is 32 bits. The length of the link wires is set to 800 μm of metal-6 with a minimum width and spacing of 0.10 μm. The clock frequency is as high as 1 GHz. To reach 1 GHz, the 32-bit uncoded data are divided into four 8-bit groups for the different joint CAC and ECC coding schemes. The proposed SCG coding scheme realizes the largest energy saving among the joint CAC and ECC coding schemes.
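The XY routing used in this setup is dimension-ordered: a packet first travels along the X dimension until it reaches the destination column and then along the Y dimension. A minimal sketch for an 8 × 8 mesh follows; the coordinate convention and the function name `xy_route` are assumptions made for illustration.

```python
def xy_route(src, dst, size=8):
    """Return the routers visited after src under dimension-ordered XY routing."""
    for x, y in (src, dst):
        if not (0 <= x < size and 0 <= y < size):
            raise ValueError("coordinates must lie inside the mesh")
    x, y = src
    dst_x, dst_y = dst
    path = []
    while x != dst_x:  # route along X first
        x += 1 if dst_x > x else -1
        path.append((x, y))
    while y != dst_y:  # then route along Y
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))
print(len(xy_route((0, 0), (7, 7))))
```

The number of hops, and therefore the number of link traversals that contribute to channel energy, equals the Manhattan distance between source and destination.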
The self-calibrated voltage scaling technique is designed and simulated with the SCG scheme based on UMC 65 nm CMOS technology. The length of the link wires is set to 800 μm of metal-6 with a minimum width and spacing of 0.10 μm. The clock frequency is as high as 1 GHz. Therefore, the timing of the link wires should be analyzed at different voltage levels and under process variations. The different transient patterns must also be considered. This analysis helps designers implement the adaptive delay line and guarantees correct functioning of the double sampling data checking mechanism. The modified double sampling data checking circuit provides error information for the self-calibrated voltage scaling mechanism at run time. However, the time interval Δt must satisfy the constraints discussed in Section 5. The data path delay t_d is clearly affected by voltages (swing levels of link wires) and input data vectors. Additionally, PVT (process, voltage, and temperature) variation affects both devices and on-chip wires. Therefore, the delays of the link wires are analyzed using Monte Carlo simulations of PVT variation at different voltage levels.
Figures 19(a)-19(c) show the data path delay t_d for the rising speedup (Sr), falling speedup (Sf), rising delay (Dr), falling delay (Df), normal rising (Nr), and normal falling (Nf) cases under high voltage (1.0 V), medium voltage (0.85 V), and low voltage (0.7 V), respectively. The supply voltages have a 15% variation within the 3σ range, with means of 1.0 V, 0.85 V, and 0.7 V. The maximum and minimum values of t_d occur in the Dr case and the Sf case, respectively. The maximum/minimum values under 0.7 V, 0.85 V, and 1 V are 910/485 ps, 619/333 ps, and 471/271 ps, respectively. According to (12)-(15), the upper bounds of Δt under 0.7 V, 0.85 V, and 1 V are about 485 ps, 333 ps, and 271 ps, respectively. The operating voltage clearly influences the timing interval. Therefore, the adaptive delay line generates three time intervals Δt for the different signal voltage levels: 450 ps, 300 ps, and 200 ps, which are 45%, 30%, and 20% of a clock period. The adaptive delay line can thus be designed using a digitally controlled delay line. Adjusting the time interval guarantees the functionality of the double sampling data checking technique at different voltage swing levels and under process variations. In addition, the analysis indicates that timing delay variation on link wires is much smaller under a high operating voltage. In other words, if the error rate detected by the double sampling data checking technique increases, the control unit increases the voltage to narrow the timing variation and enhance reliability.
In OCINs, link wires in channels dominate the overall power consumption in advanced technologies. The proposed SCG coding scheme eliminates most crosstalk effects and achieves energy reduction. As Figure 15(b) shows, the EDP reduction of low swing link wires exceeds 60% compared with that of an uncoded bus when the low swing drivers operate at 0.7 V. The proposed self-calibrated voltage scaling technique finds the optimal operating voltage, and the tradeoff between energy consumption and reliability is determined by the self-calibrated circuitry. However, the power overhead of the self-calibrated voltage scaling technique reduces the energy efficiency of the channels.
Figure 21 shows the energy analysis of the proposed self-calibrated energy-efficient and reliable channels at different voltages. The wire length is set to 1800 μm. The SCG coding stage reduces the energy consumption by about 14.1% by decreasing the coupling effect and self-switching activities. From Figure 21, the total overhead of the SCG coding scheme and the self-calibrated voltage scaling technique is roughly 6.9%. To elucidate the energy overhead, the right side of Figure 21 shows the energy breakdown of the SCG coding and self-calibrated voltage scaling. The double sampling data checking mechanism with the adaptive delay line accounts for almost 80% of the energy overhead because a large number of flip-flops is needed. If the error correction decoders are moved before the run-time error detection stage, the energy overhead can be reduced by decreasing the number of flip-flops to one-third. However, not only will reliability deteriorate, but the range of adaptive timing borrowing will also degrade. Therefore, this is again a tradeoff between reliability and energy consumption.
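The run-time behavior of the voltage control unit can be sketched as a small state update over the three swing levels used in the simulations. The sketch relies on the 5% error-rate threshold per observation window and on the rule that the run-time level may not fall below the level set by the crosstalk-aware test stage. Raising the swing by exactly one level when the threshold is reached is an assumption, as are the names `LEVELS` and `next_level`.

```python
LEVELS = (0.70, 0.85, 1.00)  # swing levels in volts, lowest first


def next_level(level_idx, errors, transfers, floor_idx=0, threshold=0.05):
    """Choose the swing level index for the next observation window.

    floor_idx is the lowest level allowed by the test-mode error detection.
    """
    if transfers <= 0:
        raise ValueError("window must contain at least one transfer")
    error_rate = errors / transfers
    if error_rate < threshold:
        # Few errors: lower the swing one level, but never below the floor.
        return max(floor_idx, level_idx - 1)
    # Assumed response to a high error rate: raise the swing one level.
    return min(len(LEVELS) - 1, level_idx + 1)


idx = 2  # start at full swing (1.0 V)
for errors in (0, 1, 9, 2):  # errors observed in windows of 100 transfers
    idx = next_level(idx, errors, transfers=100)
    print(LEVELS[idx])
```

A policy of this kind settles at the lowest level whose observed error rate stays under the threshold, which is how the tradeoff between energy and reliability is resolved at run time.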
Table 3 summarizes the SCG coding scheme and the self-calibrated voltage scaling technique, including the area overhead in a router and the energy overhead and energy reduction in channels. The wire length is again set to 1800 μm. The energy reduction of the self-calibrated voltage scaling technique is due to the low swing of the link wires. The total area overhead is about 14.4% relative to a router that uses X-Y routing and round-robin arbitration. The router has 5 input/output ports and a 4-stage pipeline, and the FIFO depth of each output buffer is 8 flits. Each flit is 32 bits. The adaptive double sampling data checking circuit, the MAF-based test generator, and the voltage control unit account for 71%, 8%, and 21% of the area of the self-calibrated voltage scaling circuitry, respectively.

Conclusion
The physical effects of crosstalk and PVT variations in nanoscale technologies degrade the performance of on-chip interconnection networks (OCINs). This work combines a self-calibrated voltage scaling technique with a self-corrected green (SCG) coding scheme to overcome increasing variations and achieve energy-efficient on-chip data communication. The SCG coding scheme is used to construct reliable and energy-efficient channels. It has two stages: the triplication error correction coding stage and the green bus coding stage. Triplication error correction coding is a reliable mechanism whose rapid, bit-level self-correction reduces the physical transfer unit (phit) size in routers. Green bus coding reduces energy consumption significantly via a joint triplication bus power model that eliminates crosstalk effects. The self-calibrated voltage scaling technique is designed together with the SCG coding scheme. It adjusts the voltage swing of link wires via two error detection stages: the crosstalk-aware test error detection stage and the run-time error detection stage. Furthermore, the self-calibrated voltage scaling technique tolerates timing variations of channels. Based on UMC 65 nm CMOS technology, the proposed self-calibrated energy-efficient and reliable channels reduce energy consumption by nearly 28.3% compared with that of uncoded channels at the lowest voltage.

Figure 4: The corresponding voltages of specific error correction coding versus different uncoded word-error-rate with (a) k = 8 and (b) k = 32.

Figure 5: (a) Bus model for 4 bits. (b) The approximate bus power model.

Figure 6: Five transition types for two adjacent wires.
Figure 7 flowchart steps: transition definition; establish the triplication capacitance matrix C_t by the RLC cyclic model; derive the power formula with the coefficient α; find the codeword that minimizes α; map the codeword to the dataword; circuit implementation.

Figure 7: The design flow of the green bus coding stage.

Figure 8: (a) The mapping table between the 4-bit dataword and the 5-bit codeword of the green bus coding stage. (b) The two sets and Boolean expression of the green bus coding stage.

Figure 10: The block diagrams of self-calibrated voltage scaling technique with crosstalk-aware test error detection stage and run-time error detection stage.

Figure 12: The control policy of self-calibrated voltage scaling technique.

Figure 15: The energy-delay product (EDP) reduction to uncoded code under different values of λ with (a) full swing signal and (b) the lowest swing signal.

Figure 17: Energy reduction under different lengths of link wires and different number of routers.
Figure 15(b) shows the energy reduction compared to the uncoded word under different λ values at the lowest signal swing level. Simulation results indicate that the proposed SCG coding realizes more EDP saving than the other joint coding schemes.

Figure 19: The data path delay (t d ) distributions of rising speedup, falling speedup, rising delay, falling delay, normal rising, and normal falling cases under (a) high voltage (1.0 V), (b) medium voltage (0.85 V) and (c) low voltage (0.7 V).

Figure 20: Voltage levels of the self-calibrated voltage scaling technique under six phases with different noise distributions and timing variations.

Figure 21: Energy analysis of the self-calibrated energy-efficient and reliable interconnection architecture.

Table 1: Comparisons between green bus coding and increasing wire spacing.
Figure 10 presents the block diagram of the self-calibrated voltage scaling technique. This technique comprises low swing drivers, level converters, a voltage scaling control unit, a crosstalk-aware test error detection stage, and a run-time error detection stage. Depending on the results of the two error detection stages, the voltage control unit adjusts the voltage swing levels of the link wires. The crosstalk-aware test error detection stage detects errors using maximal aggressor fault (MAF) test patterns in the test mode. The run-time error detection stage detects errors using the double sampling data checking technique and the adaptive delay line. Moreover, the self-calibrated voltage scaling technique tolerates timing variations through the adaptive timing borrowing technique. In response to detected errors, the self-calibrated voltage scaling technique reduces the voltage swing for energy reduction while guaranteeing that the reliability remains within the confidence interval.
... pseudorandom pattern sequences. By changing the feedback polynomial of the LFSR, the LFSR generates different subsets of the maximum-length LFSR sequence (at most 2^n − 1 patterns when the LFSR tests n-bit data with primitive polynomials). However, test patterns generated by the LFSR-based TPG are complicated and require a long test time to achieve high error coverage. Hence, a better self-test methodology is needed to achieve low hardware overhead, fast test time, and high error coverage. Depending on the test vectors, therefore, the test error detector can detect erroneous data following error correction coding. The crosstalk-aware test vectors are generated by a test pattern generator with the maximal aggressor fault (MAF) model, as shown in Figure 13.
... voltage in the run-time error detection stage cannot be lower than the voltage level determined by the crosstalk-aware test error detection stage. The error rate is defined as the ratio of the error data to the total transmission data in one window. If the error rate is less than 5%, the signal voltage swing is reduced by one level or kept at the lowest safe signal swing.

Table 2: Summaries of different joint coding schemes for 8-bit link wires.

Table 3: Summaries of SCG coding and self-calibrated voltage scaling.