Coarse-Grained Online Monitoring of BTI Aging by Reusing Power-Gating Infrastructure

In this paper, we present a novel coarse-grained technique for monitoring online the bias temperature instability (BTI) aging of circuits by exploiting their power gating infrastructure. The proposed technique relies on monitoring the discharge time of the virtual-power-network during standby operations, the value of which depends on the threshold voltage of the CMOS devices in a power-gated design (PGD). It does not require any distributed sensors, because the virtual-power-network is already distributed in a PGD. It consists of a hardware block for measuring the discharge time concurrently with normal standby operations and a processing block for estimating the BTI aging status of the PGD according to collected measurements. Through SPICE simulation, we demonstrate that the BTI aging estimation error of the proposed technique is less than 1% and 6.2% for PGDs with static operating frequency and dynamic voltage and frequency scaling, respectively. Its area cost is also found negligible. The power gating minimum idle time (MIT) cost induced by the energy consumed for monitoring the discharge time is evaluated on two scalar machine models using either x86 or ARM instruction sets. It is found less than $1.3\times $ and $1.45\times $ the original power gating MIT, respectively. We validate the proposed technique through accelerated aging experiments conducted with five actual chips that contain an ARM cortex M0 processor, manufactured with a 65 nm CMOS technology.

Vasileios Tenentes, Member, IEEE, Daniele Rossi, Member, IEEE, Sheng Yang, Saqib Khursheed, Bashir M. Al-Hashimi, Fellow, IEEE, and Steve R. Gunn

Index Terms: Aging, bias temperature instability (BTI), power gating, sensor.

I. INTRODUCTION
Bias temperature instability (BTI) is the major aging mechanism in very deep submicron CMOS technologies [1]. It induces detrimental effects on devices, such as performance degradation, which can lead to in-the-field failures. Many techniques for online BTI monitoring provide a warning about imminent faults by focusing on its local detrimental effects. They monitor, in a fine-grained fashion, the devices or paths in a design that are more vulnerable to aging [2]-[12].
The sensors utilized for fine-grained BTI monitoring fall mainly into two categories: sensors monitoring the path delay of logic circuits [2], [3] and sensors monitoring the frequency drift of ring oscillators [4]-[6]. The former require the sensitization of critical paths and provide a warning indication when the path delay has violated a predefined delay threshold. The latter integrate ring oscillators at stressed areas and monitor the aging status of the sensors. Hybrid methods also exist [7], [8]. Other methods [9]-[11] reduce the area cost by selecting a subset of critical paths to monitor. However, many paths of modern circuits can become critical in the field due to temperature and workload variability [1], [12]. Therefore, for online fine-grained BTI monitoring, multiple devices or paths should be monitored at various predefined delay thresholds, which inevitably impacts design complexity and area/power cost.
Many online applications require a global indication of the BTI status of a circuit without a warning indication about imminent faults. For such applications, a low-cost, coarse-grained indication of the BTI status of a design can be practical, and the high cost of fine-grained monitoring can be avoided. One such application is the reliability management of multicore systems, which requires a BTI indication for balancing the workload among identical cores under long-term reliability constraints. Such cores share a similar workload and, therefore, similar fine-grained degradation characteristics. Another application is the dynamic thermal/power management (DTM) of system-on-chips (SOCs), such as smart SOCs [13], [14], which tune power reduction techniques online [15], [16] according to measurements provided by on-chip sensors. Recent results [17], [18] show that the BTI-induced threshold voltage $V_{th}$ degradation of the CMOS devices is accompanied not only by detrimental effects, but also by some benefits. Leakage power reduction techniques become more efficient [17], [19] and static power consumption decreases over time [18], [20]. Therefore, for DTM systems to harvest such aging benefits, a coarse-grained BTI indication would suffice. Finally, fine-grained BTI monitoring is not very practical for memories.
In this paper, we present a novel coarse-grained BTI aging monitoring technique, which is applicable to power-gated designs (PGDs). Power gating has already been proven to be an effective solution for tackling static power consumption and has been widely adopted in many modern processors [21]. We show that the leakage current reduction caused by BTI aging in nanometer technologies [17], [18] considerably impacts the virtual-power-network discharge time during the standby of a PGD. The proposed technique consists of a hardware block for measuring online the virtual-power-network discharge time, and a processing block for estimating the BTI aging status of the PGD according to the collected measurements. The proposed technique provides an indication of the average aging status of all the CMOS devices in the PGD, and cannot be used for providing a warning about imminent faults. However, it features some advantages over path-based monitoring techniques. First, the discharge time is measured on the virtual-power-network, which is already distributed in the PGD, and thus distributed sensors are not required. Second, high aging estimation resolution is achieved, because the impact of aging on the discharge time is on the order of hundreds of nanoseconds, while on path delay it is on the order of picoseconds. Third, it is also applicable to memories, because the discharge time is sensitive to the aging status of all the CMOS devices in the design, and the workload is not required to be known at design time. Finally, the proposed technique is performed concurrently with normal standby operations, enabling the harvesting of BTI static power reduction benefits by online applications, such as the DTM system of SOCs. To the best of our knowledge, this is the first coarse-grained technique for online BTI monitoring.
This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/
The remainder of this paper is organized as follows. The SOC architecture with DTM and the discharge time of the virtual-power-network, denoted as $d_V$ hereafter, are introduced in Section II. The results on the static power consumption reduction of designs due to BTI aging are also discussed there. The proposed technique for monitoring the average threshold voltage degradation induced by BTI, which consists of an on-chip $d_V$ sensor and a processing block, is presented in Section III. The performance and the area cost of the proposed technique are evaluated by means of SPICE simulation of IWLS'05 [22] benchmarks in Section IV. Results on the energy consumed by the processing block using two scalar machine models with x86 and ARM instruction sets are also presented, and its impact on the power gating minimum idle time (MIT) [23] is evaluated. The sensitivity of the discharge time $d_V$ to aging is validated in Section V through accelerated aging experiments conducted using five actual chips with an SOC that contains an ARM Cortex-M0 processor fabricated with a 65-nm technology. Finally, conclusions are drawn in Section VI.

II. BACKGROUND AND MOTIVATION
Fig. 1 shows an SOC architecture with an embedded DTM system [1], [14]. Designs with different power-management capabilities, such as power gating and dynamic voltage and frequency scaling (DVFS), are integrated into the SOC. The DTM system consists of a DTM core and software. It collects measurements from on-chip sensors related to the status of the designs (power consumption, temperature, aging, and so on), and optimizes their features (performance, power consumption, temperature, and reliability) by controlling the power-management capabilities of the designs accordingly [14]. The interconnection between the designs and the DTM core is achieved through functional interconnection (bus or network-on-chip) [1], shared nonvolatile memory (NVM) [1], and sensor access mechanisms (SAMs) [24]. The DTM core is used for processing data coming from on-chip sensors.
Power gating is a static power reduction technique that adds pMOS header and/or nMOS footer power switches, often referred to as sleep transistors (STs), that allow a circuit to operate in two modes: the power-ON and the power-OFF mode. The general scheme using header STs is shown in Fig. 2(a). During periods of inactivity, the circuit is set to the power-OFF mode in order to reduce static power consumption. The STs are used for disconnecting the virtual power supply $V_{Vdd}$ of the circuit from the power supply $V_{dd}$. The wake-up (power-OFF → power-ON) and the standby (power-ON → power-OFF) operations are implemented by a finite state machine (FSM) that resides in the always-ON (operating with $V_{dd}$) power domain of the power gating controller. Each operation follows a protocol to coordinate the activation and deactivation of design features, such as clock gating, isolation, and state retention [21]. A typical case, where the circuit is equipped with clock-gating and isolation features, is shown in Fig. 2(b). With the deassertion of the power-ON signal, the protocol of the standby operation is to: 1) enable clock gating; 2) enable isolation by asserting the isolate signal; 3) reset the power-gated logic by asserting the reset signal; and 4) disconnect $V_{Vdd}$ from $V_{dd}$ by asserting the sleep signal to open the STs. The protocol of the wake-up operation is the reverse sequence of actions.
The operations of a PGD can be self-controlled or externally controlled. In the first case, PGDs contain specialized idle-time monitoring circuitry for detecting idle periods during their operation. In the second case, they are controlled by an external processing block (the DTM core in our case), which selects the best suited idle intervals according to system-level objectives (minimizing power and temperature, maximizing reliability, and so on). The proposed coarse-grained BTI monitoring technique has been considered for the second case of PGDs. However, in principle, it is also applicable to the first case. Another approach [25] provides the self-controlled ability without requiring any idle-time monitoring circuitry by deploying predefined idle intervals together with intervals in which the circuit operates at a higher than nominal voltage. A coarse-grained BTI indication could also be beneficial to this approach.
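The standby and wake-up protocols described above can be sketched as an ordered list of actions. This is only an illustrative model: the `Step` names and the `wake_up_sequence` helper are hypothetical and do not correspond to any signal naming or FSM encoding given in the paper.

```python
from enum import Enum


class Step(Enum):
    # Hypothetical labels for the four standby actions listed in the text.
    CLOCK_GATE = "enable clock gating"
    ISOLATE = "assert the isolate signal"
    RESET = "assert the reset signal"
    SLEEP = "assert the sleep signal (open the STs)"


# Standby order as enumerated in the text: steps 1) to 4).
STANDBY_SEQUENCE = [Step.CLOCK_GATE, Step.ISOLATE, Step.RESET, Step.SLEEP]


def wake_up_sequence(standby):
    """Wake-up undoes the standby actions in reverse order."""
    return list(reversed(standby))


if __name__ == "__main__":
    for step in wake_up_sequence(STANDBY_SEQUENCE):
        print("undo:", step.value)
```

In this sketch the sleep signal is the last action of standby and the first one undone at wake-up, which is the point at which the discharge of the virtual-power-network starts and ends, respectively.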
We point out that the virtual power supply $V_{Vdd}$ is distributed by a virtual-power-network in the design, as shown in Fig. 2(a). For monitoring the BTI of power-gated designs, we propose using the virtual-power-network discharge time $d_V$ [shown in Fig. 2(b)], which is the time required by the virtual-power-network to discharge after the assertion of the sleep signal during the application of a standby protocol.
Recent research on the BTI-induced $V_{th}$ degradation of CMOS devices has reported a significant leakage current reduction. It was shown in [18] that after only one month of operation, the power consumption due to leakage current drops to 50% of the initial power consumption at time $t = 0$. It further reduces to less than 30% and 20% after one year and ten years of operation, respectively. In [18], all leakage current components are considered. However, since high-k technologies (thicker dielectrics) considerably reduce the gate leakage [21], and the junction leakage $I_j$ is not affected by $V_{th}$ [26], this phenomenon has been attributed to a reduction of the subthreshold leakage current $I_{sub-th}$. Particularly, when the STs are OFF, the virtual-power-network $V_{Vdd}$ discharges via the leakage current $I_{leak}$ [21], expressed by (1), where $W$ is the width and $L$ is the length of the device channel, $q$ is the electron charge, $k$ is the Boltzmann constant, $T$ is the temperature, and $\lambda$ is a fabrication characterization parameter.
According to BTI aging models [27], [28], $V_{th}$ increases over time, an effect that decreases the circuit subthreshold current $I_{sub-th}$ exponentially over time, as derived from (1). Previous BTI monitoring techniques monitor either the path delay or the frequency drift of ring oscillators, which are affected by the active current. The active current varies almost linearly with $V_{th}$ [29]. However, the $I_{sub-th}$ of a circuit, which affects the discharge time $d_V$, varies exponentially with $V_{th}$. Therefore, $I_{sub-th}$ is expected to be more sensitive to $V_{th}$ than the active current, especially after the early lifetime of the circuit, when the variability of $V_{th}$ with time $t$ is lower. These observations motivated the exploration of the virtual-power-network discharge time, which is affected by $I_{sub-th}$, for monitoring BTI.
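To make the exponential dependence on $V_{th}$ concrete, the following is a minimal sketch that assumes a textbook subthreshold form, $I \propto (W/L)(kT/q)^2 \exp\big((V_{gs} - V_{th})/(\lambda kT/q)\big)$, which may differ in detail from (1). The default values for `lam`, `i0`, and the temperature are hypothetical and chosen only for illustration.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def thermal_voltage(temp_k):
    """Thermal voltage kT/q in volts."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def subthreshold_current(v_th, temp_k=353.15, v_gs=0.0,
                         w_over_l=1.0, lam=1.5, i0=1.0):
    """Assumed textbook form; i0 and lam are hypothetical fitting constants."""
    v_t = thermal_voltage(temp_k)
    return i0 * w_over_l * v_t ** 2 * math.exp((v_gs - v_th) / (lam * v_t))


def leakage_ratio(delta_v_th, temp_k=353.15, lam=1.5):
    """I(Vth0 + dVth) / I(Vth0); in this model it does not depend on Vth0."""
    return math.exp(-delta_v_th / (lam * thermal_voltage(temp_k)))


if __name__ == "__main__":
    for dv_mv in (0, 20, 40, 60, 80):
        print(dv_mv, "mV ->", leakage_ratio(dv_mv / 1000.0))
```

Under these assumptions, each additional increment of threshold voltage shift multiplies the leakage by the same factor, whereas a quantity that varies almost linearly with $V_{th}$ changes by a roughly constant amount per increment.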

III. PROPOSED BTI MONITORING TECHNIQUE FOR PGDs
The proposed BTI aging monitoring technique consists of a virtual-power-network discharge time $d_V$ sensor and an online processing block for estimating BTI aging according to the collected measurements, which are described in the following. The cost of the processing block is also analyzed.

A. Discharge Time Sensor
The $d_V$ sensor, shown in Fig. 3, is a very small circuit that resides in the power-gating controller and operates as a time-to-digital converter. This type of sensor is already used by power gating DFT infrastructure [30]. The power gating FSM controls the sensor by asserting the measure signal together with the sleep signal in order to collect the $d_V$ measurement on every standby operation. Then, the sensor, which consists of only a logic AND gate, an inverter, and a counter, counts the clock rising edges $c$ until the virtual voltage $V_{Vdd}$ drops to logic-"0." This happens when the inverter input ($V_{Vdd}$) drops below $m \cdot V_{dd}$, where $m \cdot V_{dd}$ is its logic threshold voltage. Then, its output, the discharged signal, switches to logic-"1," deasserting the enable signal of the counter. The value $c(t_i)$ of the counter is $d_V$ at time $t_i$ expressed in clock cycles. Therefore, the measured discharge time is $d_V(t_i) = c(t_i) \cdot T_{clk}$, where $T_{clk}$ is the circuit clock period. Although the logic threshold voltage $m \cdot V_{dd}$ of the inverter affects the absolute $d_V(t_i)$ value, it does not affect the relative value, which is evaluated as $d_V(t_i)/d_V(t = 0)$, where $d_V(t = 0)$ is the discharge time at $t = 0$. However, a logic threshold voltage $m \cdot V_{dd}$ lower than $0.15 \cdot V_{dd}$ should be avoided in order to limit the discharge time (and so the monitoring time) to hundreds of nanoseconds.
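The behavior of the time-to-digital conversion can be sketched in software. The sketch below counts clock edges while a modeled virtual rail voltage stays at or above $m \cdot V_{dd}$ and converts the count to time. The exponential decay used for the rail voltage is purely an assumption for illustration (the real discharge waveform comes from SPICE or silicon), and all function names are hypothetical.

```python
import math


def count_cycles(v_of_t, v_dd, m, t_clk, max_cycles=10_000_000):
    """Count clock edges until the rail voltage falls below m * v_dd."""
    c = 0
    while c < max_cycles and v_of_t(c * t_clk) >= m * v_dd:
        c += 1
    return c


def discharge_time(c, t_clk):
    """Measured discharge time: counter value times the clock period."""
    return c * t_clk


def exp_decay(v_dd, tau):
    """Toy rail model (assumption): V(t) = v_dd * exp(-t / tau)."""
    return lambda t: v_dd * math.exp(-t / tau)


if __name__ == "__main__":
    t_clk = 1e-9  # 1 GHz clock, hypothetical
    fresh = count_cycles(exp_decay(1.0, 500e-9), 1.0, 0.5, t_clk)
    aged = count_cycles(exp_decay(1.0, 1000e-9), 1.0, 0.5, t_clk)
    print(discharge_time(fresh, t_clk), discharge_time(aged, t_clk))
    print("relative d_V:", aged / fresh)
```

In this toy model the relative value `aged / fresh` is set by the ratio of the decay constants and not by the choice of `m`, which mirrors the observation that the inverter threshold affects the absolute but not the relative discharge time. The measurement is quantized to one clock period.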

B. Collection and Analysis of Characterization Data
The BTI-aware $d_V$ characterization process is shown in Fig. 4. First, CMOS device models are characterized with $V_{th}$ using [27] and [28] for various values of aging temperature $T_A$ and operating time $t$. A statistical evaluation of the workload impact on device stress [12], based on structural correlations of the logic, was used. We considered temperature values from a set $S_T$ and time values from a set $S_t$. The characterization is applied to a PGD of 21 cascaded inverters (casc21) synthesized with a 32-nm high-k metal gate CMOS technology [31]. We have considered a small circuit in order to explore the tradeoffs using SPICE simulation. The operating frequency of this circuit, including a 30% guardband, is lower than 4 GHz, which is usually the highest frequency of commercial applications. The number of STs is selected to fulfill the constraint of an IR-drop ≤ 10% in this analysis. The synthesis and SPICE simulations are conducted using commercial EDA tools. The $d_V$ characterization data are presented in Fig. 5 and are discussed in the following.
In Fig. 5(a), we show the $d_V$ characterization data when the temperature during standby operation is kept constant at $T_M = 80$ °C and the average aging temperature $T_A$ varies as follows: $T_A \in S_T = [60, 80, 100, 120]$ °C. As expected, $d_V$ increases as time $t$ and aging temperature $T_A$ increase. Indeed, from (1), we derive that the subthreshold leakage current of the devices of the circuit decreases as their threshold voltage increases because of BTI [18], [21]. In Fig. 5(b), we present the $d_V$ characterization data when the average aging temperature is kept constant at $T_A = 80$ °C and the temperature during standby $T_M$ varies as follows: $T_M \in S_T = [60, 80, 100, 120]$ °C. In this case, $d_V$ decreases considerably with the temperature during standby, since the subthreshold leakage current (1) of the devices of the circuit increases substantially with the temperature [21]. If we compare the range of $d_V$ values in Fig. 5(a) (2507 to 5411 ns) with that in Fig. 5(b) (1375 to 5561 ns) for a specific time ($t = 5$ years), we conclude that the effect of the temperature during standby $T_M$ on $d_V$ outweighs the effect of the average aging temperature $T_A$. In Fig. 5(c), we present the $d_V$ characterization data for an average aging temperature $T_A$ equal to the temperature during standby $T_M$ ($T_A = T_M$), selected from the set $S_T = [60, 80, 100, 120]$ °C. We note that for the same time $t_i$, $d_V$ decreases with temperature, thus confirming the great sensitivity of $d_V$ to the temperature during standby $T_M$. In Section V, we collect measurements from actual chips that follow the $d_V$ trends shown in Fig. 5. Hence, $d_V$ characterization data could also be fitted on actual measurements, and points can be obtained using extrapolation.
In Fig. 6, we present the impact of the BTI-induced $\Delta V_{th}$ (x-axis) of the pMOS devices on the propagation delay $pd$ (left y-axis) of casc21 and on its virtual-power-network discharge time $d_V$ (right y-axis), measured for $T_A = T_M = 100$ °C. The graphs depict the relative values compared with those at $t = 0$, when also $\Delta V_{th} = 0$. As expected, the trends validate that the propagation delay $pd$ is affected almost linearly by $\Delta V_{th}$, increasing up to 1.47× after ten years, while the discharge time $d_V$ is affected exponentially, increasing up to 18.7× after ten years.

C. Online Processing Block and Cost Analysis
The basic concept for monitoring BTI aging by processing the virtual-power-network discharge time $d_V$ is described by means of the example shown in Fig. 7. Over time (x-axis), a circuit operates at various temperatures $T(t)$ (y-axis) and executes the standby operation many times, at various time moments $t_i$. We note that the temperature $T_M$ can be considered constant during the discharge time, which is on the order of nanoseconds and much shorter than the thermal transient cooldown from power-ON to power-OFF mode, which is on the order of microseconds [32]. As time increases, the average aging temperature $T_A(t_i) = \sum_{t=t_1}^{t_i} T(t)/i$ affects $V_{th}$ due to BTI [27], [28]. However, as $t_i \to \infty$, both the average aging temperature $T_A(t_i)$ and the average temperature during standby $T_{AM}(t_i) = \sum_{t=t_1}^{t_i} T_M(t)/i$ converge to a constant value. Therefore, we consider that the temperature during standby $T_M(t)$ is a random variable that follows the distribution of $T(t)$. This assumption is realistic, because each $T_M(t_i)$ is a sample of $T(t)$ at the moment of the standby operation $t = t_i$, as shown in Fig. 7. Later, in Section IV-D, we present results when this assumption is removed.
The online processing block is shown in Fig. 8. A cumulative moving average filter is utilized to compute the average discharge time $\bar{d}_V$ from the history of standby operations. The filter updates $\bar{d}_V$ from each new measurement using a convergence speed parameter $s$, where the measured discharge time is obtained from the counter value and $T_{clk}$ is the circuit clock period. This filter, which is applied whenever the discharged signal is asserted, requires time to converge to the average discharge time. A higher $s$ value makes the filter converge faster, but with a higher sensitivity to noise, as will be shown in Section IV. Note that the average discharge time $\bar{d}_V$ that is provided by the moving average filter depends on the average temperature $T_{AM}$ during every previous standby. Therefore, as $T_{AM}$ converges to the average aging temperature $T_A$, the computed $\bar{d}_V$ depends only on the aging status of the circuit. Based on the $|S_t| \times |S_T|$ collected $d_V$ characterization data [Fig. 5(c)], which are discrete $d_V$ points in the space $t \times T$, the function $T_A(t, \bar{d}_V)$ can be approximated using either interpolation coefficients [33] (cubic or linear) or a lookup table. An NVM, which is accessible for online processing, stores these data. The aging temperature $T_A(t_i, \bar{d}_V)$ until time moment $t_i$ is computed using the stored data and the average discharge time $\bar{d}_V(t_i)$ provided by the moving average filter. Then, a BTI model [27], [28] is used to compute the average $V_{th}$ degradation of the CMOS devices in the PGD up to time $t_i$, as shown in Fig. 8. The processing block is embedded in the DTM core (Section II) as software.
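The role of the convergence speed parameter $s$ can be illustrated with a small sketch. The update rule used below, an exponentially weighted running average with weight $s$, is an assumption made here for illustration and is not claimed to be the exact filter expression of the paper; the function names and the seeding with the first measurement are likewise hypothetical.

```python
def update_average(avg_prev, c_i, t_clk, s):
    """Blend the new measurement c_i * t_clk into the running average."""
    d_v = c_i * t_clk
    return avg_prev + s * (d_v - avg_prev)


def run_filter(counts, t_clk, s):
    """Apply the update once per standby operation; return all estimates."""
    avg = counts[0] * t_clk  # seed with the first measurement (assumption)
    history = [avg]
    for c in counts[1:]:
        avg = update_average(avg, c, t_clk, s)
        history.append(avg)
    return history


if __name__ == "__main__":
    t_clk = 1e-9
    # Hypothetical counter readings: a step from 1000 to 2000 cycles.
    counts = [1000] + [2000] * 50
    slow = run_filter(counts, t_clk, s=0.01)
    fast = run_filter(counts, t_clk, s=0.05)
    print(slow[-1], fast[-1])
```

With this kind of update, a larger `s` tracks a change in the measurements sooner, at the price of passing more of each individual fluctuation into the estimate, which is consistent with the tradeoff described for $s$ in the text.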
The DTM core consumes power for the execution of the moving average filter, which affects power gating efficiency. This cost is evaluated in terms of energy and of its impact on the MIT [23], which represents the minimum time that a PGD must stay in power-OFF mode (denoted by $MIT_{orig}$) in order to save energy. The energy consumed by the PGD while it is idle is $E(idle) = P_{OFF} \cdot MIT_{orig}$, where $P_{OFF}$ is the static power consumption in the OFF state. The PGD also consumes energy $E(PGD)$ for recharging during wake-up. Thus, the energy consumed during idle plus the recharging energy must be lower than the energy that would be consumed if the PGD were always ON

$$P_{OFF} \cdot MIT_{orig} + E(PGD) \le P_{ON} \cdot MIT_{orig} \tag{2}$$

where $P_{ON}$ is the circuit static power consumption in the power-ON state. Considering that $P_{OFF} \approx 0.05 P_{ON}$ due to power gating [23], (2) becomes

$$MIT_{orig} \ge \frac{E(PGD)}{0.95 P_{ON}}. \tag{3}$$

For the proposed MIT evaluation, we consider the dynamic energy $E(dyn)$ of the DTM core. We do not consider its static energy, since the DTM core is already present in the SOC and is never power-gated. Thus, the proposed MIT, denoted by $MIT_{prop}$, is given by

$$MIT_{prop} \ge \frac{E(PGD) + E(dyn)}{0.95 P_{ON}}. \tag{4}$$

As in [34], we reasonably consider that half of the internal PGD nodes are at logic-"1" during wake-up. Thus, the energy $E(PGD)$ for recharging the PGD depends on the effective capacitance of the power network $C_{PDN}$ and half of the capacitance of the logic: $E(PGD) \approx (C_{PDN} + 0.5 C_{PGD}) V_{dd}^2$. Also, the effective capacitance of the power network is almost half of that of the design [34], thus $C_{PDN} \approx 0.5 C_{PGD}$. Therefore, $E(PGD) \approx C_{PGD} V_{dd}^2$. As for $E(dyn)$, it is given by $E(dyn) = a C_{core} V_{dd}^2 s_{clk}$, where $C_{core}$ is the capacitance of the DTM core, $a$ is the switching activity, and $s_{clk}$ is the number of clock cycles to execute the software. Hence, the MIT cost $C_{MIT} = MIT_{prop}/MIT_{orig}$ becomes

$$C_{MIT} = 1 + \frac{E(dyn)}{E(PGD)} \approx 1 + \frac{a \, C_{core} \, s_{clk}}{C_{PGD}}. \tag{5}$$

For a relative evaluation, we consider the sizes of the PGD and the DTM core to be similar ($C_{core} \approx C_{PGD}$). Thus, (5) becomes

$$C_{MIT} \approx 1 + a \cdot s_{clk}. \tag{6}$$

The $C_{MIT}$ of the proposed technique therefore depends on the switching activity $a$ of the DTM core and the elapsed clock cycles $s_{clk}$.
As for the switching activity, we can consider a value a = 0.15, as in [35].
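As a worked example of the MIT cost, the sketch below evaluates the ratio $MIT_{prop}/MIT_{orig}$ under the assumptions stated above: with $E(PGD) \approx C_{PGD} V_{dd}^2$ and $E(dyn) = a C_{core} V_{dd}^2 s_{clk}$, the ratio reduces to $1 + a \cdot s_{clk} \cdot C_{core}/C_{PGD}$. The function name and the $s_{clk}$ values used in the demonstration are hypothetical and chosen only to show the arithmetic.

```python
def mit_cost(a, s_clk, c_core_over_c_pgd=1.0):
    """MIT cost ratio: 1 + a * (C_core / C_PGD) * s_clk.

    With C_core / C_PGD = 1.0 (similar sizes) this is 1 + a * s_clk.
    """
    if a < 0 or s_clk < 0 or c_core_over_c_pgd < 0:
        raise ValueError("inputs must be non-negative")
    return 1.0 + a * c_core_over_c_pgd * s_clk


if __name__ == "__main__":
    a = 0.15  # switching activity, as considered in the text
    for s_clk in (1, 2, 3, 4):  # hypothetical cycle counts
        print(s_clk, "cycles ->", round(mit_cost(a, s_clk), 3))
```

Since the cost grows linearly with `s_clk`, keeping the filter update to a few clock cycles keeps the MIT overhead small.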
In addition, we evaluate the energy cost of the proposed technique. For this reason, we introduce a new metric, the ratio of the dynamic energy $E(dyn)$ consumed by the proposed technique on the DTM core to the energy that power gating saves when the circuit is idle for time $t_{idle}$. The ratio of energy cost to energy savings will be referred to simply as the energy cost $E_{cost}$ hereafter, and is given by (7). When $E_{cost} > 100\%$, the consumed energy is greater than the saved energy. Since the energy stored in the circuit, $E_{core} \approx C_{core} V_{dd}^2$, is almost equal to the energy consumed during the discharge due to power gating, $E_{cost}$ can be expressed as in (8), where $s_{clk}$ is the time to execute the software, whereas $t_{idle\_clk}$ is the idle time and $d_{V\_clk}$ is the discharge time $d_V$, all expressed in clock cycles. As a worst case analysis using (8), we consider that $t_{idle\_clk} \approx 10$ clock cycles, as in [34], whereas $d_{V\_clk} \approx 1000$ clock cycles, as evidenced by simulation results (Section IV) and experimental measurements (Section V). In Section IV-F, we present the energy and MIT cost of the processing block using metrics (6) and (8).

IV. SIMULATION RESULTS
To evaluate the performance of the proposed technique, we apply it to a circuit of 21 cascaded inverters, referred to as casc21, and to the c432, s38584, and s38417 benchmarks from the IWLS'05 suite [22]. All circuits have been synthesized with a 32-nm high-k metal gate CMOS technology [31]. By means of SPICE simulations, we compare the aging estimation resolution achieved by the proposed technique against path-based approaches (Section IV-B). Also, we evaluate the performance of the proposed technique considering DVFS, and we demonstrate its robustness against temperature variation. Finally, the cost of the proposed technique is evaluated in terms of area overhead, memory requirements, energy required by the processing block, and its impact on the MIT. For any quantity $Q$ at time $t_i$, we evaluate its relative error using $\varepsilon_Q(t_i) = |Est(Q(t_i)) - Act(Q(t_i))| / Act(Q(t_i))$, where $Est(Q(t_i))$ and $Act(Q(t_i))$ are the estimated and actual values of the quantity $Q$ at time $t_i$. The average relative error at time $t_i$ is also computed.
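The relative error metric can be written as a short helper. This is a minimal sketch: the function names are hypothetical, and the averaging shown (a plain mean over paired estimated and actual values, for instance across Monte Carlo permutations) is an assumption, since the exact averaging expression is not reproduced here.

```python
def relative_error(est, act):
    """Relative error |Est - Act| / |Act| of an estimated quantity."""
    if act == 0:
        raise ValueError("actual value must be non-zero")
    return abs(est - act) / abs(act)


def mean_relative_error(est_values, act_values):
    """Plain mean of the relative errors of paired samples (assumption)."""
    if len(est_values) != len(act_values) or not est_values:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [relative_error(e, a) for e, a in zip(est_values, act_values)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical estimated and actual values of some quantity Q.
    print(relative_error(101.0, 100.0))
    print(mean_relative_error([101.0, 98.0], [100.0, 100.0]))
```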

A. Monte Carlo Simulation Setup
A circuit may operate using one or multiple DVFS operating modes that are controlled by DTM system policies, which affect its power consumption and its operating temperature. In order to simulate how $d_V$ is affected by the DTM policies, we generate random workloads from 500 Monte Carlo permutations, varying the active policy. Particularly, each permutation is a Markov chain constructed by integrating the time range between $t = 0$ and $t = 10$ years with a time step of $dt$. For each step $s_i$, which corresponds to the time from $t_i$ to $t_i + dt$, we assume that the circuit executes a task with a task average temperature $T(t_i)$. Each $T(t_i)$ is considered to be a random value from a normal distribution with mean temperature $T_p$ and standard deviation $\sigma_p$, the values of which are indicated by the policy. For each step $s_i$, the devices are characterized according to the models [27], [28] using the average temperature of all the tasks executed until task $s_i$, $T_A(t_i) = \sum_{j=1}^{i} T(t_j)/i$, and statistical stress values [12]. During the integration, unless stated otherwise, we assume that the circuit executes eight tasks per day and that each task is followed by a standby operation.
Example: Consider a scenario where the temperature $T(t_i)$ of a PGD during the execution of a task is a random variable with mean temperature $T_p = 80$ °C and standard deviation $\sigma_p = 3$ °C. A Monte Carlo permutation of this scenario, with $dt = 0.25$ days, is shown in Fig. 9(a), where the temperature $T(t_i)$ of a task and the average temperature $T_A(t_i)$ of all tasks that have been executed until time $t_i$ are shown. Next, Fig. 9(b) shows the $V_{th}$ degradation $\Delta V_{th}^{i}(t_i)$ at time $t_i$ when the aging temperature is $T_A(t_i)$. The initial $V_{th}$ for a pMOS is 0.49155 V and $T_A$ is 80 °C [Fig. 9(a)]. $\Delta V_{th}$ is 16.88% after four years and reaches approximately 20% after ten years. Finally, Fig. 9(c) shows $d_V(t_i)$ after each task (shown as dots) and the average virtual-power-network discharge time $\bar{d}_V(t_i) = \sum_{j=1}^{i} d_V(t_j)/i$ (shown as a line) until time $t_i$, when we apply this scenario to casc21.
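The temperature part of one Monte Carlo permutation can be sketched as follows. The sketch draws a task temperature per step from a normal distribution with mean $T_p$ and standard deviation $\sigma_p$ and tracks the running average $T_A(t_i)$. It covers only the temperature sampling, not the device characterization or SPICE steps, and the function name and seed are hypothetical.

```python
import random


def simulate_temperatures(t_p, sigma_p, n_steps, seed=0):
    """Return per-task temperatures and their running average T_A."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    temps, running_avg = [], []
    total = 0.0
    for i in range(1, n_steps + 1):
        t = rng.gauss(t_p, sigma_p)  # task temperature T(t_i)
        total += t
        temps.append(t)
        running_avg.append(total / i)  # T_A(t_i) = sum of T(t_j) / i
    return temps, running_avg


if __name__ == "__main__":
    # Ten years with dt = 0.25 days, as in the example scenario.
    n_steps = int(10 * 365 / 0.25)
    temps, t_a = simulate_temperatures(80.0, 3.0, n_steps)
    print(n_steps, "steps; final T_A =", round(t_a[-1], 2))
```

The running average settles close to $T_p$ as the number of tasks grows, which is the convergence behavior that the online processing block relies on.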

B. Robustness to Noise: Path Delay Versus Discharge Time
During the simulations, we also collect path-delay data. Fig. 9(d) presents the path delay for each task (points) and the average path delay (line), when the tasks shown in Fig. 9(a) are applied to the cascaded inverters circuit casc21. Comparing the discharge time [Fig. 9(c)] with the path delay [Fig. 9(d)] values, we observe that the discharge time is on the order of hundreds of nanoseconds, while the path delay is on the order of hundreds of picoseconds. If we assume a very small measured path-delay deviation of 5% at $t = 0.6$ years [Fig. 9(d)], where the average path delay is 0.2 ns and the $\Delta V_{th}$ of the pMOS devices is 60 mV [Fig. 9(b)], then the average path delay increases from 0.2 to 0.21 ns, which is the value that corresponds to a $\Delta V_{th}$ degradation of 83 mV at time $t = 3.8$ years. This corresponds to a time error of 3.2 years. The propagated error on the estimated $\Delta V_{th}$ using path delay is $\varepsilon^{pd}_{\Delta V_{th}} = 38\%$ [Fig. 9(b)], which is also the aging estimation resolution that can be achieved by path-based techniques. If we now assume a small deviation of 5% in the measured discharge time at $t = 0.6$ years, then the average discharge time varies from 1176 to 1235 ns, which corresponds to the discharge time due to $\Delta V_{th} = 60.5$ mV that occurs at time $t = 0.76$ years (for the same operating conditions). The propagated time error is 0.16 years, and the error of the estimation using the discharge time would be $\varepsilon^{d_V}_{\Delta V_{th}} < 1\%$, which is a 97% error reduction, and hence resolution increase, compared with the aging estimation resolution using path delay. Finally, in Fig. 9, we observe that the path delay increases by less than 23%, while the discharge time increases by more than 1100% after ten years of lifetime. Note that the robustness evaluation of the ring oscillator frequency drift sensors is similar to that of the path-delay-based sensors, because the path delay of the ring oscillator is its oscillation period. Therefore, we conclude that the discharge time is more robust to random noise and offers higher aging estimation resolution than path delay and ring oscillator frequency drift.
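The error figures quoted in this comparison follow from simple arithmetic on the threshold voltage shifts given in the text (60 mV actual, 83 mV inferred from path delay, 60.5 mV inferred from discharge time). The short sketch below reproduces that arithmetic; the function and variable names are hypothetical.

```python
def propagated_error(dvth_actual_mv, dvth_estimated_mv):
    """Relative error of the estimated threshold voltage shift."""
    return abs(dvth_estimated_mv - dvth_actual_mv) / dvth_actual_mv


# Values taken from the text: a 5% measurement deviation at t = 0.6 years.
path_delay_error = propagated_error(60.0, 83.0)   # roughly 0.38
discharge_error = propagated_error(60.0, 60.5)    # below 0.01
error_reduction = 1.0 - discharge_error / path_delay_error

# Time errors: the shifted measurement maps to a later point in life.
path_delay_time_error = 3.8 - 0.6    # years
discharge_time_error = 0.76 - 0.6    # years

if __name__ == "__main__":
    print(f"path delay:     {path_delay_error:.1%}")
    print(f"discharge time: {discharge_error:.2%}")
    print(f"reduction:      {error_reduction:.1%}")
```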

C. Results on Circuits Implementing Various DTM Policies
First, we consider that the benchmarks operate using a single policy (static operating frequency) that follows a thermal profile p = [90 °C, 3 °C], with average aging temperature T p = 90 °C and deviation σ p = 3 °C. Second, we consider three policies with operating voltages (V dd1 , V dd2 , V dd3 ) = (0.9, 1, 1.1) V and thermal profiles p L = [75 °C, 2 °C], p M = [85 °C, 2 °C], and p H = [100 °C, 2 °C], respectively. Table I presents the results. The first column shows the circuit name, and column "policies #" shows the number of available policies. We assume that eight tasks/day are executed; column "cp-every" reports the change-policy rule, which takes values from the set ["day," "month," "never"]. When "cp-every" is set to "day," the active policy of the circuit remains unchanged for eight tasks, after which a new policy is randomly selected among the [ p L , p M , p H ] policies. Similarly, the value "month" indicates that the active policy remains unchanged for one month (30 × 8 = 240 tasks). The value "never" applies only to the single policy case.
The column labeled "discharge time d V sensor" contains information related to the d V sensor (Section III-A): the convergence speed parameter "s" of the moving average filter, the number of standby operations required to converge "sb #," and the average relative error ε d V of the moving average filter over all the Monte Carlo permutations. Note that, for s = 0.01, the filter requires 265 standby operations to converge for c432 (single policy), while it requires only 29 operations for s = 0.05. The sensor thus converges earlier for higher "s" values, which, however, comes with a higher error due to the filter's higher sensitivity to workload fluctuations. The error ε d V is small, in the range [0.36%-0.97%] and [4.1%-8.6%] for designs with single and multiple policies, respectively.
The BTI estimation also requires fewer standby operations to converge as s increases. The BTI monitoring of casc21 requires 268 standby operations for s = 0.01, while it converges with the first standby operation for s = 0.05. The error ε V th of the average threshold voltage degradation estimation is very small: less than 1% for designs with a single policy and in the range [0.5%-6.2%] for designs with multiple policies. For the Monte Carlo permutations conducted, convergence occurs within 3 h to 0.09 years. However, this is obtained considering only eight standby operations per day, which is a small number. For circuits that are power-gated more frequently, convergence could occur in minutes.
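The trade-off governed by the convergence speed s can be illustrated with a short sketch. It assumes the moving average filter of Section III-A is an exponentially weighted average that starts from zero; that form, the starting value, and the function names are assumptions made for illustration, so the counts it produces are not expected to reproduce the "sb #" values of Table I. The last helper only converts a number of standby operations into elapsed time at eight operations per day, which is how 1 and 268 operations map to about 3 h and 0.09 years.

```python
def moving_average(samples, s, init=0.0):
    """Exponentially weighted moving average (assumed filter form).

    s is the convergence speed in (0, 1]; init is an assumed starting value.
    """
    avg = init
    history = []
    for x in samples:
        avg += s * (x - avg)  # move a fraction s toward the new sample
        history.append(avg)
    return history


def ops_to_converge(s, target=1.0, tolerance=0.10):
    """Standby operations until the average is within tolerance of a constant input."""
    avg, ops = 0.0, 0
    while abs(avg - target) / target >= tolerance:
        avg += s * (target - avg)
        ops += 1
    return ops


def standby_ops_to_years(ops, ops_per_day=8):
    """Convert a number of standby operations into elapsed years."""
    return ops / ops_per_day / 365.0


if __name__ == "__main__":
    for s in (0.01, 0.05):
        print(s, ops_to_converge(s))
    print(standby_ops_to_years(268))
```

A larger s shortens convergence because each new sample carries more weight, which is also why the filtered value follows workload fluctuations more closely.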
Figs. 10-12 focus on a single Monte Carlo permutation to present these trends in more detail. Figs. 10 and 11 show the discharge time d V (t i ) and the average d V (t i ) given by the moving average filter (left y-axis) as a function of time (x-axis) for circuit c432, for the single policy (Fig. 10) and the three policy (Fig. 11, where the three d V regions correspond to the three policies) cases, respectively. Fig. 10 shows results for the considered s values, s = 0.01 and s = 0.05. The relative error ε d V (t i ) of the average discharge time estimation is also depicted (right y-axis). Fig. 12 shows the estimated (Est( V th (t i ))) and the actual ( V th (t i )) average V th degradation (left y-axis) over time t (x-axis) for the single policy case, together with their relative error ε V th (t i ) (right y-axis). The relative error between the estimated and the actual V th values is higher at the beginning, but it decreases as the filter converges. The average value of the error ε V th (t i ) is 0.4% after convergence. The convergence point is taken as the moment when the relative error falls below 10%, which occurs at 0.013 years [Fig. 12(b)]. For the case of three policies (Fig. 11), the V th degradation estimation error follows a similar trend. Its average value ε V th (t i ) is 3.2% after convergence, which occurs at 0.024 years.
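The convergence criterion used above (the first moment at which the relative error drops below 10%) can be sketched as follows. The function name and the sample values are hypothetical; only the 10% threshold comes from the text.

```python
def convergence_index(estimated, actual, threshold=0.10):
    """Index of the first sample whose relative error is below threshold, or None."""
    for i, (est, act) in enumerate(zip(estimated, actual)):
        if abs(est - act) / abs(act) < threshold:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical estimates approaching a constant actual value of 1.0.
    print(convergence_index([0.5, 0.8, 0.95, 1.0], [1.0, 1.0, 1.0, 1.0]))
```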

D. Temperature Variation During Standby Operations
Both the temperature during standby operations T M (t i ) and the temperature of the executed task T (t i ) were so far considered independent random numbers following the temperature variation of the active policy. However, one reason to power-OFF a circuit could be an elevated temperature, in which case the average temperature during standby might be higher than the average temperature of the active policy. Therefore, we repeat all the simulations by modeling T M as T M (t i ) = T (t i ) + d T M + σ T M , where d T M is a drift and σ T M is a white noise deviation of the temperature during standby at time t i , compared with the task temperature T (t i ). For a high deviation of σ T M = 10 °C and without a drift (d T M = 0 °C), the proposed technique exhibits no notable additional error, because the white noise is canceled by the moving average filter. The drift introduces an error in the average threshold voltage estimation, which for d T M = 5 °C can reach 9.4%. However, this error is systematic, and thus it can be corrected by the processing block. Even if this error is ignored, the drift is the same for identical designs, and hence, it does not affect the practicality of the proposed technique for comparing their aging status.
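A small numerical sketch shows why averaging removes the white noise term but not the drift. It assumes the noise is zero-mean Gaussian, which the text does not state, and uses a fixed task temperature of 90 °C purely for illustration; the function names are hypothetical.

```python
import random


def standby_temperature(task_temp, drift, sigma, rng):
    """T_M = T + d_TM + noise, with Gaussian noise as an assumption."""
    return task_temp + drift + rng.gauss(0.0, sigma)


def mean_offset(n, drift, sigma, seed=0):
    """Average of (T_M - T) over n standby operations."""
    rng = random.Random(seed)
    task_temp = 90.0  # illustrative task temperature in degrees Celsius
    total = 0.0
    for _ in range(n):
        total += standby_temperature(task_temp, drift, sigma, rng) - task_temp
    return total / n


if __name__ == "__main__":
    print(mean_offset(20000, drift=0.0, sigma=10.0))  # close to 0
    print(mean_offset(20000, drift=5.0, sigma=10.0))  # close to the drift
```

The averaged offset tends to the drift alone, which is why the remaining error is systematic and can in principle be compensated.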

E. Area Cost and System Memory Requirements
We evaluate the area cost of the hardware block as well as the memory requirements of the processing block. The discharge time sensor (Section III-A) consists of only a logic AND gate, an inverter, and a clock cycle counter. This type of delay sensor may already be part of the power gating DFT infrastructure [30], [36]. The maximum number of bits |CC| for the counter is |CC| = log 2 (d V (t = 10, T A = 120, T M = 60)/T clk ) = 14 bits. It is obtained with an operating clock period T clk = 1 ns and the maximum d V value that is observed, that is, after time t = 10 years with average temperature T A = 120 °C and temperature during standby T M = 60 °C (the lower temperature considered). The overall area overhead, when the DFT infrastructure [30] is not available, is ≤ 0.4% of s38417 and does not depend on the size of the design. In addition, we examined the NVM size |M| required by the processing block software in order to approximate the T A (t, d V ) function. Using linear interpolation coefficients from 64 collected points, |M| = 4 × 4 × (# of points) bytes, that is, four linear coefficients of 4 bytes each per point. Thus, |M| = 4 × 4 × 64 bytes = 1 Kbyte, which is a very low memory cost. The discharge time sensor is accessible by the DTM core (Section II) through cross-layer SAMs that reuse DFT and interconnection infrastructure [1], [14], [24].
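The two sizing calculations above are simple enough to check with a few lines. In this sketch the 10 000 ns discharge time is a hypothetical value chosen only because it falls in the range that needs 14 bits; the maximum d V itself is not stated in this section. The bit count uses the integer bit length, which matches rounding the logarithm up except at exact powers of two.

```python
import math


def counter_bits(max_discharge_ns, clk_period_ns=1.0):
    """Bits needed to count the clock cycles of the longest discharge time."""
    cycles = math.ceil(max_discharge_ns / clk_period_ns)
    return max(1, cycles.bit_length())


def lut_bytes(points, coeffs_per_point=4, bytes_per_coeff=4):
    """NVM bytes for the interpolation coefficients of the lookup table."""
    return coeffs_per_point * bytes_per_coeff * points


if __name__ == "__main__":
    print(counter_bits(10000.0))  # hypothetical maximum discharge time
    print(lut_bytes(64))
```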

F. Energy and Minimum Idle Time Cost
We implemented the moving average filter in the C programming language, and it was compiled into 7 and 12 instructions from the x86 and ARM instruction sets, respectively. We consider that each instruction is executed in one clock cycle; thus, s clk_x86 = 7 + 2 = 9 and s clk_ARM = 12 + 3 = 15, also counting the clock cycles for checking the discharged signal. Next, we use (6) and (8) to evaluate the processing block cost.
1) Moving Average Filter: Since MIT is less than the time the circuit needs to discharge (MIT < d V [34]), we examine the energy cost when t idle belongs to one of two possible intervals: 1) MIT ≤ t idle < d V and 2) d V ≤ t idle ≤ 1 s. The DTM core knows whether the PGD was fully discharged through the discharged signal of the sensor. If the PGD wakes up before the circuit discharges (MIT ≤ t idle < d V ), the moving average filter execution is avoided, and only two and three instructions are required from the x86 and ARM sets, respectively, to check the value of the discharged signal, implying [using (6)] a C MIT of 1.3× and 1.45×, respectively, as shown in Table II. The average energy cost in this interval is given by (9), evaluated for MIT ≤ t idle < d V by using A = MIT and B = d V . For the x86 and ARM architectures, the resulting E cost values, shown in Table II, are 7.3% and 10.9%, respectively. When d V ≤ t idle , the filter is executed and the energy cost is evaluated using (9) with A = d V and B = 1 s in clock cycles. It is found to be 9.8E-05% and 1.7E-04% for each architecture, respectively (Table II). The worst case energy cost for this process occurs when t idle = d V and is evaluated using (8) at 7.1% and 11.8% for each architecture, respectively.
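The cycle accounting behind the two intervals can be summarized in a short sketch. The instruction counts are those reported above (one clock cycle per instruction); the dictionary and function names are hypothetical, and the sketch covers only the cycle counts, not the energy expressions (6), (8), and (9).

```python
# Instruction counts reported in the text, one clock cycle per instruction.
FILTER_INSTRUCTIONS = {"x86": 7, "ARM": 12}
CHECK_INSTRUCTIONS = {"x86": 2, "ARM": 3}


def wakeup_cycles(arch, discharged):
    """Clock cycles spent by the DTM core when the PGD wakes up."""
    cycles = CHECK_INSTRUCTIONS[arch]  # always check the discharged signal
    if discharged:
        cycles += FILTER_INSTRUCTIONS[arch]  # run the filter only after a full discharge
    return cycles


if __name__ == "__main__":
    for arch in ("x86", "ARM"):
        print(arch, wakeup_cycles(arch, False), wakeup_cycles(arch, True))
```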

2) Aging Monitoring Process (Accessing of the Lookup Table): We showed in Section IV-B that a 5% d V variability propagates to a V th shift error < 1% and that such V th variability is exhibited between PGDs with a 0.16-year age difference. Due to this resolution bound, the aging monitoring process only needs to run periodically with a long period of 0.16 years (approximately two months), and hence its energy cost is negligible. Also, the larger the PGD is compared with the DTM core, the lower the cost presented in Table II becomes. Fig. 13 shows the floorplan of an actual SOC, which has a DTM core that is an ARM cortex M0 processor located at the bottom-left corner of the SOC. Note that most blocks in the SOC are larger than the core.

V. EXPERIMENTAL VALIDATION
To demonstrate the impact of aging on the discharge time, we conduct experiments with actual chips. The experimental setup is shown in Fig. 13. The test chips used in our experiment contain the SOC Tokashi [37] [Fig. 13(a)] and are manufactured with a 65-nm CMOS technology. V dd is connected to a 1.2 V power supply. The SOC has an ARM cortex M0 processor that is power-gated as a single block and has an exposed V Vdd pin [Fig. 13(b)] that can be directly accessed by an external oscilloscope [Fig. 13(c)]. Through the external oscilloscope, we collect virtual voltage V Vdd waveforms during standby operations of the processor over time. These measurements are postprocessed to emulate the operation of the proposed processing block. The impact of the oscilloscope's probe (∼10 MΩ resistance) on the discharge time is negligible, since the V Vdd network discharges mainly through the chip (∼50 kΩ resistance). The same instrument is used throughout the experiments, and measurements are evaluated relative to those obtained at t = 0; thus, any systematic variability induced by the instrument should not impact the observed trends.
To accelerate aging between measurement collections, we operate the chips at 70 °C using a temperature chamber that has ≤5% accuracy error, while executing a computationally intensive synthetic benchmark, Dhrystone [38]. The discharge time is evaluated from oscilloscope measurements as the time interval from the assertion of the sleep signal to the moment when V Vdd reaches a logic threshold of 25% of V dd . We collect K measurements at each of the time points t = 0, 200, and 400 h of operation. For each set of K measurements at a time point t, we compute the average discharge time relative to the average discharge time observed at t = 0. This normalized discharge time, which emulates the moving average filter, is simply referred to as the average discharge time hereafter. It is computed for each time point t as the mean of the K measurements d V i (t) divided by the mean of the K measurements d V i (t = 0) collected at the beginning of the experiment. The measurements at each time point are considered to occur simultaneously, since the aging status of the chips is only slightly affected during the few seconds of their manual collection.
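The normalization described above can be written compactly. The expression below is one form consistent with the text, assuming the average is the arithmetic mean over the K measurements; the bar notation and the symbol $d_{V_i}(t)$ for the i-th measurement at time point t are introduced here for illustration and may differ from the original equation.

```latex
\bar{d}_V(t) \;=\; \frac{\dfrac{1}{K}\sum_{i=1}^{K} d_{V_i}(t)}{\dfrac{1}{K}\sum_{i=1}^{K} d_{V_i}(t=0)}
```

By construction, $\bar{d}_V(0) = 1$, so values above 1 indicate a longer discharge time than at the beginning of the experiment.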
In Fig. 14(a), we present the average discharge time of a set of K = 10 measurements for every time point t = 0, 200, and 400 h of operation for a set of chips. After 200 h of operation, there is a 5%-17.4% increase of the average discharge time, which grows to 9.3%-26.7% after 400 h of operation, compared with the average discharge time at t = 0. As expected, a clear increase of the average discharge time is observed for all the examined chips, confirming its sensitivity to the aging status of the chips. On the other hand, the absolute d V i measurements are highly sensitive to random noise and vary in the range [613 ns-1240 ns]. Next, we obtain a trend for the static power P stnorm over time by assuming that the charge stored in the circuit and the leakage current I leak are constant during discharging: P stnorm = I leak (t)/I leak (t = 0) ∝ d V (t = 0)/d V (t). Fig. 14(b) shows the computed static power trend for the examined chips. These results are consistent with the static power reduction with BTI aging reported in [18]. The aging of the chips at t = 0 differs, since they were manufactured in 2012 and have also been used for other purposes.
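As a worked example of the static power relation, the sketch below converts the reported increases of the average discharge time into a static power trend. It treats the proportionality as an equality normalized to t = 0, which is an assumption, and the function name is hypothetical.

```python
def static_power_trend(avg_discharge_norm):
    """Static power relative to t = 0, taken as the inverse of the normalized discharge time."""
    if avg_discharge_norm <= 0:
        raise ValueError("normalized discharge time must be positive")
    return 1.0 / avg_discharge_norm


if __name__ == "__main__":
    # Reported increases of the average discharge time after 200 h and 400 h.
    for increase in (0.05, 0.174, 0.093, 0.267):
        print(increase, round(static_power_trend(1.0 + increase), 3))
```

Under this assumption, a 17.4% longer discharge time corresponds to roughly 15% lower static power, and a 26.7% longer discharge time to roughly 21% lower static power.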
In the next experiment, we focus on another chip, relatively "fresher" than those used for the previous experiments, and we repeat the experiment for 4000 h (approximately 5.5 months). Fig. 15(a) shows the collected data. We collect d V measurements every 100 h while t < 600 h [Fig. 15(b)] and every 500 h when t > 600 h [Fig. 15(c)]. We also collect data at t = 4000 h. The same process as before is followed on each measurement. The observed trend of the average virtual-power-network discharge time d V is consistent with the expected trend, thus confirming its sensitivity to the BTI aging status of the design.
Note that the examined core (Fig. 13) is power-gated as a single block. However, the proposed technique can also be adapted for cores with individually power-gated blocks by following coarse-grained rules, which depend on the objectives of the application that utilizes the coarse-grained BTI monitoring. For example, an application that aims to maximize reliability can consider the most aged block as representative of the core, while an application that aims to minimize power consumption can consider the average aging of all blocks instead. The proposed technique remains unaffected in principle, and only additional software is required for following such coarse-grained rules. The analytical tools presented in Section III-C can be used for analyzing the cost, which is architecture and objective dependent.
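The two example rules can be sketched in a few lines of software. The function name, the objective labels, and the per-block aging values are hypothetical; the sketch only illustrates choosing the most aged block for a reliability objective and the average over blocks for a power objective.

```python
def core_aging(block_aging, objective):
    """Combine per-block aging estimates into one coarse-grained value for the core."""
    if not block_aging:
        raise ValueError("at least one block estimate is required")
    if objective == "reliability":
        return max(block_aging)  # the most aged block represents the core
    if objective == "power":
        return sum(block_aging) / len(block_aging)  # average aging of all blocks
    raise ValueError("unknown objective: " + objective)


if __name__ == "__main__":
    estimates = [0.01, 0.03, 0.02]  # hypothetical per-block V th shifts in volts
    print(core_aging(estimates, "reliability"))
    print(core_aging(estimates, "power"))
```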

VI. CONCLUSION
We presented a coarse-grained technique for monitoring online the impact of BTI aging on the CMOS devices of power-gated designs (PGDs). It consists of an on-chip virtual-power-network sensor embedded in the power-gating controller and a processing block for processing the collected measurements. The proposed technique features some advantages over fine-grained techniques: 1) it does not require the mission profile to be known during design, making it also applicable to memories; 2) up to 97% higher average aging estimation resolution is achieved than that of path-delay-based techniques; and 3) the virtual-power-network is already distributed in the PGD, and thus it does not require additional distributed sensors. By means of SPICE simulation, we evaluated the performance of the proposed technique on PGDs with static operating frequency and DVFS. The average threshold voltage estimation error induced by random temperature variation was found to be negligible. The MIT increase caused by the energy consumed by the proposed software was evaluated on two scalar machine models that use x86 and ARM instruction sets and was found to be <30% and <45%, respectively. Through accelerated aging experiments using five actual chips with an SOC that contains an ARM Cortex processor, we validated the discharge time sensitivity to the BTI aging status of the processor.

Fig. 9. For T A = 80 °C: (a) scenario of tasks temperature; (b) pMOS V th degradation; (c) discharge time d V ; and (d) path delay over time t.
Figs. 10(a) and 11(a) focus on the time range [0-10] years. The average relative error is 0.55% and 0.89% for s = 0.01 and s = 0.05, respectively, for the single policy case, and 3.2% for the three policies. Figs. 10(b) and 11(b) focus on the time range [0-0.4] years.
The reported d V values are relative to time t = 0. A clear increasing trend of the average discharge time d V over time, up to 2.79× the average d V at time t = 0, is shown after 4000 h of operation. In particular, d V increases by 2.75× after almost a month [Fig. 15(b)] and continues increasing, almost linearly, by 1% every 79 days [Fig. 15(c)]. The absolute d V values are in the range [410-1650 ns].

TABLE I. AVERAGE DISCHARGE TIME AND BTI ESTIMATION RESULTS FROM MONTE CARLO SIMULATIONS USING SINGLE AND MULTIPLE POLICIES

TABLE II. AVERAGE ENERGY-SAVING AND MIT COSTS FOR PROCESSING