Footer Voltage Controlled Dual Keeper Domino Logic with Static Switching Approach

Abstract. In this paper, two circuits, namely Footer Voltage Controlled Dual Keeper domino logic (FVCDK) and Footer Voltage Controlled Dual Keeper with Static Switching domino logic (FVCDK-SS), are presented in order to achieve high speed, low power consumption and robustness. The dual keeper arrangement reduces the loop gain of the feedback circuitry, which leads to lower delay variability. The keeper circuitry is controlled using the footer voltage to reduce the contention current in the initial evaluation phase, thus providing enhanced speed. In FVCDK-SS domino logic, unwanted transients at the output are reduced by incorporating a pseudo-dynamic buffer in the proposed FVCDK domino logic, which further reduces the dynamic power consumption. The proposed logic is validated by comparing it to a wide range of existing domino logic circuits on performance metrics such as delay, power, power-delay product and unity noise gain. To gauge the wide fan-in capabilities of the proposed logic, results are shown for OR gates of various fan-in. The circuits are simulated with the industry-standard Cadence tool suite using a 45 nm technology library.


Introduction
The low-power and high-speed requirements of wide fan-in applications such as SRAM, pre-encoders, OR gates and tag comparators [1] have been effectively met by domino logic circuit design. A conventional domino logic gate, shown in Fig. 1, consists of a pre-charge transistor (M_Pre), controlled by the clock (CLK), and a Pull-Down Network (PDN). The low transistor count of domino logic makes it an attractive choice for wide fan-in circuits, as only two transistors are needed in addition to the evaluation network. Moreover, this logic needs only a Pull-Up Network (PUN) or a PDN, not both, which leads to a much smaller footprint than static CMOS logic. When CLK is LOW, M_Pre turns ON and charges the dynamic node (Dyn_node) to the supply voltage (VDD). This is known as the pre-charge phase. When CLK goes HIGH, M_Pre turns OFF and Dyn_node is conditionally discharged to ground (GND) if the PDN evaluates to TRUE. This is known as the evaluation phase.
The footer NMOS M_n is optional, giving two distinct forms of conventional domino logic, namely Footed Domino Logic (FDL) (Fig. 1(a)) and Footless Domino Logic (FLDL) (Fig. 1(b)). The footless domino logic is faster than the footed domino logic because there is no stacking effect. However, this speed improvement comes at the cost of increased power consumption and leakage current in comparison to the footed domino logic.
In order to prevent Dyn_node from floating in a high-impedance state during the evaluation phase when the PDN evaluates to FALSE, a keeper transistor (M_K) is used. It counteracts the charge leakage at the dynamic node during the evaluation phase in such a scenario [2], [3], [4], [5]. If the clock frequency is low, then, in the absence of a keeper transistor, the voltage at Dyn_node can stray from its ideal value of VDD to a lower value due to charge leakage during the elongated evaluation phase. The amount by which Dyn_node falls below VDD can be taken as a measure of the robustness of the circuit, with values closer to VDD corresponding to more robust circuits. Robustness can be enhanced by increasing the aspect ratio of the keeper, but at the cost of speed. This is because if the PDN evaluates to TRUE in the evaluation phase, it tries to discharge the node while M_K tries to keep it at VDD, and the resulting contention current slows the discharge. Several modifications have been made to the conventional domino logic to reduce the contention current for increased speed and lower power.
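The pre-charge and evaluation behaviour described above can be summarized in a small behavioural sketch. It is an idealized logic-level model of a footed domino OR gate with a keeper, assuming ideal switches and no leakage or contention. The function name `domino_or_phase` and its arguments are illustrative and do not come from the simulated circuits.

```python
from typing import Sequence, Tuple


def domino_or_phase(clk: int, inputs: Sequence[int], dyn_prev: int) -> Tuple[int, int]:
    """Return (dyn, out) after one clock phase of an idealized domino OR gate.

    clk      -- 0 for the pre-charge phase, 1 for the evaluation phase
    inputs   -- logic levels applied to the parallel NMOS devices of the PDN
    dyn_prev -- logic level of Dyn_node before this phase
    """
    if clk == 0:
        dyn = 1          # pre-charge: M_Pre is ON and pulls Dyn_node to VDD
    elif any(inputs):
        dyn = 0          # evaluation, PDN TRUE: Dyn_node discharges to GND
    else:
        dyn = dyn_prev   # evaluation, PDN FALSE: the keeper holds Dyn_node
    out = 1 - dyn        # the static output inverter drives OUT
    return dyn, out


# One pre-charge phase followed by an evaluation with a single input HIGH.
dyn, out = domino_or_phase(0, [0, 0, 0, 0], dyn_prev=0)
dyn, out = domino_or_phase(1, [0, 1, 0, 0], dyn_prev=dyn)
```

The third branch is where the keeper matters: without it, the model would have to let `dyn` drift below VDD during a long evaluation phase, which is the robustness concern raised above.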
In this paper, two novel domino structures, FVCDK and FVCDK-SS, are introduced. Both circuits take advantage of a controlled dual keeper arrangement to minimize the contention current at the early stage of evaluation and a controlled discharge path. These two techniques working in parallel lead to improvement in speed, power and power-delay product.
The organization of this paper is as follows: Section 2 presents an overview of the previous studies in the field of domino logic circuits. Section 3 explains the proposed circuits and the design methodology. Section 4 presents the simulation results obtained for the proposed logic and their comparison with existing architectures. Section 5 concludes the discussion.

Related Work
The study on domino logic focuses on one or more of the following:
• Counteracting the leakage current.
• Low power consumption.
• Reduced unwanted switching at the output.
These enhancements are achieved by modifying the keeper circuitry (to reduce contention current), by adding a static switching mechanism at the output node (to reduce unwanted transients), or by providing an additional discharge path (to improve speed). These techniques are described in the following sub-sections.

Dual Keeper Modification
In the conventional domino logic shown in Fig. 1(b), the positive feedback loop formed by the output inverter I_1 and keeper M_K increases the delay variability caused by variation in process parameters such as C_ox and t_ox. The gain of this loop is given by Eq. (1), where A_inv is the gain of I_1, g_m/keeper is the transconductance of keeper M_K, and Z_dyn is the impedance at the dynamic node.
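Going only by the three quantities defined for Eq. (1), the loop gain would be expected to take the product form sketched below. This is a reconstruction under that assumption and not a quotation of the equation itself.

```latex
\text{Loop Gain} \approx A_{inv} \cdot g_{m/keeper} \cdot Z_{dyn}
```

Under this form, anything that lowers the effective transconductance of the keeper path lowers the loop gain, which is the lever the dual keeper arrangements discussed next rely on.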
The Grounded PMOS Keeper (GPK) domino logic [6], shown in Fig. 2(a), divides the keeper M_K into M_K1 and M_K2 such that the sum of their lengths is the same as the length of M_K. This reduces the loop gain, thus reducing the delay variability. However, since the keeper M_K2 always remains ON, this leads to a high contention current and hence reduced speed.
This contention current can be reduced by modifying the dual keeper circuitry so that it is turned OFF during the initial stage of the evaluation phase. In the Clock Delayed Dual Keeper (CDDK) domino logic [7], depicted in Fig. 2(b), the keeper circuitry is enabled only after a delay produced by the inverter I_1, so it stays disabled during the initial part of the evaluation phase. This curtails the contention current and hence enhances speed.

Static Switching Mechanism
Unwanted switching transients at the output node during the pre-charge phase aggravate the dynamic power consumption of conventional domino logic circuits. During the evaluation phase, if the PDN evaluates to TRUE, Dyn_node is discharged to GND and hence OUT is charged to VDD. In the subsequent pre-charge phase, Dyn_node is charged to VDD by M_Pre, causing OUT to be discharged to GND. This transient at the output node is undesired, as the output would again be charged to VDD in the next evaluation phase if the inputs to the PDN remain unchanged. Figure 3 shows the domino logic using a Pseudo-Dynamic Buffer (PDB) [8] and the Clock Delayed Dual Keeper with Static Switching (CDDK-SS) mechanism [9], both of which reduce this unwanted transient at the output node. In both, the source terminal of the NMOS M_n1 of the output inverter is connected to V_foot instead of GND. If the inputs to the PDN remain HIGH for two consecutive cycles, charging of Dyn_node to VDD by M_Pre in the pre-charge phase does not discharge OUT to GND. This is because M_n remains OFF and, since the PDN is TRUE, the voltage at V_foot rises, thus preventing M_n1 from discharging the output node.
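The saving from static switching can be made concrete by counting output transitions over a few clock cycles. The sketch below is an idealized logic-level model, assuming that a conventional gate always resets OUT in the pre-charge phase while a static switching gate holds OUT until the next evaluation. The function names are illustrative only.

```python
def output_trace(pdn_per_cycle, static_switching):
    """Return OUT after every pre-charge and evaluation phase (idealized)."""
    out, trace = 0, []
    for pdn_true in pdn_per_cycle:
        # Pre-charge phase: a conventional gate resets OUT to 0 here,
        # whereas a static switching gate holds the previous value of OUT.
        if not static_switching:
            out = 0
        trace.append(out)
        # Evaluation phase: OUT follows the result of the PDN.
        out = 1 if pdn_true else 0
        trace.append(out)
    return trace


def count_transitions(trace):
    """Number of 0->1 and 1->0 changes in the OUT trace."""
    return sum(a != b for a, b in zip(trace, trace[1:]))


# PDN stays TRUE for three cycles and then becomes FALSE.
pattern = [True, True, True, False]
conventional = count_transitions(output_trace(pattern, static_switching=False))
static = count_transitions(output_trace(pattern, static_switching=True))
```

For this input pattern the conventional model toggles OUT six times and the static switching model twice. Since each transition charges or discharges the output load, fewer transitions translate into lower dynamic power.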

Dual Keeper with Additional Discharge Path
In the evaluation phase, if the PDN evaluates to TRUE, the time required to discharge Dyn_node to GND depends on the path delay offered by the PDN and the footer transistor M_n. In the Clock Delayed Dual Keeper domino logic with Additional Discharge Path (CDDK-ADP) [10], shown in Fig. 4, an additional discharge path for Dyn_node is available through M_n1 and M_n2. During the pre-charge phase, Dyn_node is charged to VDD. Since CLK is LOW, M_n2 is in the cutoff region, and hence the additional discharge path remains OFF. In the initial part of the evaluation phase, if the PDN evaluates to TRUE, the voltage at V_foot rises, which turns M_n1 ON. Since CLK is HIGH during this period, M_n2 also turns ON. This enables Dyn_node to discharge through the additional path comprising M_n1 and M_n2, and hence reduces the delay.

Proposed Work
The operation of the proposed FVCDK circuit is illustrated in Fig. 5 for the evaluation phase when the PDN evaluates to either TRUE or FALSE. Bold and dotted lines represent transistors in the ON and OFF states, respectively. During the initial evaluation phase, if the PDN evaluates to TRUE, the voltage at V_foot increases. The extent of this increase depends on the relative sizing of the PDN and the footer transistor (M_n), and it can be raised by increasing the aspect ratio of the transistors in the PDN. The increase in V_foot reduces the driving capability of the keeper M_K2 and hence reduces the contention current, which in turn speeds up the discharging of Dyn_node. After Dyn_node is discharged sufficiently, even if M_K2 turns ON, the keeper M_K1 remains OFF, so this topology yields a lower contention current throughout the discharging of Dyn_node. It may therefore be inferred that if the PDN is TRUE, at least one of the two keepers is in the OFF state, thus reducing the contention current.
The FVCDK-SS works in a similar way to FVCDK, but uses a PDB to eliminate the unnecessary switching at the output node and lower the dynamic power consumption of the circuit. The advantages of the PDB are described in Subsec. 2.2.
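The dependence of the V_foot rise on the relative sizing of the PDN and the footer can be illustrated with a first-order estimate. The sketch below assumes that the conducting PDN and the footer M_n behave as linear resistors whose resistance is inversely proportional to their aspect ratio, and that Dyn_node is still close to VDD at the start of evaluation. This resistive divider is a simplification introduced here for illustration, not the model used in the simulations, and the function name is hypothetical.

```python
def v_foot_estimate(vdd, pdn_aspect_ratio, footer_aspect_ratio):
    """First-order V_foot at the start of evaluation (resistive divider).

    Assumption: on-resistance is proportional to 1 / (W/L), in arbitrary
    units, so only the ratio of the two aspect ratios matters.
    """
    r_pdn = 1.0 / pdn_aspect_ratio
    r_footer = 1.0 / footer_aspect_ratio
    # Dyn_node (about VDD) -> PDN -> V_foot -> footer -> GND
    return vdd * r_footer / (r_pdn + r_footer)


# Equal sizing versus a PDN with twice the aspect ratio of the footer.
v_equal = v_foot_estimate(1.2, pdn_aspect_ratio=1.0, footer_aspect_ratio=1.0)
v_wide_pdn = v_foot_estimate(1.2, pdn_aspect_ratio=2.0, footer_aspect_ratio=1.0)
```

In this estimate a wider PDN raises V_foot further, which matches the qualitative trend stated for FVCDK: a larger PDN aspect ratio weakens M_K2 more strongly during early evaluation.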

Results
The simulations of the circuits are carried out with the industry-standard Cadence® tool suite using a 45 nm technology library. Various footed domino circuit topologies are analyzed using metrics such as power consumption, delay, Power-Delay Product (PDP) and Unity Noise Gain (UNG). For a fair comparison, all transistors are kept at minimum size, i.e., channel width and length are set to 120 nm and 45 nm, respectively, with supply voltage and temperature set to 1.2 V and 300 K, respectively. Footless topologies are not considered for this comparison, as they would lead to high power consumption due to excessive leakage at the 45 nm technology node.
Figure 7 shows the output waveform of the proposed FVCDK circuit during the pre-charge and evaluation phases. The delay is calculated by making one of the inputs HIGH as the evaluation phase starts and measuring the time taken for the OUT signal to rise from LOW to VDD/2. The power is determined under the same simulation environment by calculating the average power consumption over a period of time. UNG readings are taken by applying a noise pulse of 100 ps duration to all the inputs of the PDN during the evaluation phase [11]. The amplitude of the noise pulse is varied until the output node reaches the same amplitude. The UNG is a measure of the DC robustness of the circuits.
Table 1 shows the power, delay, PDP and UNG values for a 128 fan-in OR gate designed using various footed domino topologies, including the proposed FVCDK and FVCDK-SS logic. It can be seen from Fig. 8 that FVCDK-SS offers the smallest rise-time delay and the lowest power consumption, and hence the lowest PDP. FVCDK, on the other hand, improves on all three metrics when compared to the pre-existing domino logic circuits, with the exception of CDDK-SS. While the speed and power consumption of FVCDK are slightly worse than those of CDDK-SS, it makes up for this with a larger UNG value, as depicted in Fig. 9, resulting in a more robust circuit.
The lower delay of FVCDK-SS and FVCDK relative to CDDK-SS and CDDK, respectively, is attributed to the reduced contention current in the proposed circuits.
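The delay, power and PDP measurements described in this section can be reproduced from exported waveform samples along the following lines. This is a minimal sketch assuming the waveforms are available as plain lists of time, voltage and supply-current samples. The function names and the sample data are hypothetical and are not taken from the Cadence setup or from the reported results.

```python
def crossing_time(times, values, threshold):
    """First time the waveform rises through threshold (linear interpolation)."""
    for i in range(len(times) - 1):
        v0, v1 = values[i], values[i + 1]
        if v0 < threshold <= v1:
            t0, t1 = times[i], times[i + 1]
            return t0 + (threshold - v0) * (t1 - t0) / (v1 - v0)
    return None  # the waveform never reaches the threshold


def average_power(times, supply_current, vdd):
    """Average of VDD * I over the sampled window (trapezoidal rule)."""
    energy = 0.0
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        energy += vdd * 0.5 * (supply_current[i] + supply_current[i + 1]) * dt
    return energy / (times[-1] - times[0])


def power_delay_product(power, delay):
    """PDP is the product of average power and delay."""
    return power * delay


# Made-up samples: the input goes HIGH at t = 0 and OUT rises afterwards.
vdd = 1.2
t = [0.0, 1.0, 2.0, 3.0]
out = [0.0, 0.0, 1.2, 1.2]
i_supply = [1.0, 1.0, 1.0, 1.0]

delay = crossing_time(t, out, vdd / 2) - 0.0  # input edge assumed at t = 0
power = average_power(t, i_supply, vdd)
pdp = power_delay_product(power, delay)
```

The delay follows the definition used here, measured from the input edge to the instant OUT crosses VDD/2, and the power is averaged over the whole sampled window.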
There is no contention reduction mechanism in GPK, which leads to the largest delay for the GPK domino logic. Figure 11 shows the effect of fan-in on delay, power consumption and PDP for OR gates designed using FVCDK and FVCDK-SS. With higher fan-in, the capacitance at the dynamic node increases, leading to increased power consumption as well as increased delay. Figure 12 and Fig. 13 depict the effect of changes in supply voltage and temperature, respectively, on power, delay and PDP values. With a rise in VDD, power increases because of the increase in leakage current. This is accompanied by a reduction in delay because of the increased driving capacity of the circuit. Both power and delay increase with a rise in temperature.
To obtain better performance, the transistors in the PDN and the PMOS transistor in the output inverter were replaced with their lower-V_th counterparts. Moreover, to achieve lower power consumption, all the other transistors were replaced with their higher-V_th counterparts. As the high-V_th transistors lie on the non-critical delay path, the speed of the circuit is not affected. The results obtained for the 128-input OR gate using this approach are shown in Tab. 3. A speed improvement of 47 % and a power reduction of 14 % are observed for FVCDK-SS with this technique.
In order to verify the robustness of the proposed circuits, corner case analysis was conducted. In this analysis, the circuits were simulated using transistors with extreme fabrication parameters. The delay and power obtained for the circuit in these corner conditions tend to deviate from their typical-corner values. A circuit is said to have an inadequate design margin if it does not function correctly at any of these process extremes. The corner cases used for the analysis are slow NMOS-slow PMOS (SS), slow NMOS-fast PMOS (SF), fast NMOS-slow PMOS (FS) and fast NMOS-fast PMOS (FF).
Table 2 presents the values of delay, power and PDP for these corners, including the results for the typical NMOS-typical PMOS (TT) case. The topologies used were CDDK, FVCDK, CDDK-SS and FVCDK-SS. A 128-input OR gate was used for this analysis.
In addition, Monte Carlo simulations were performed over 1000 points in order to obtain the delay variability for the compared topologies. The delay variability was obtained by dividing the standard deviation of the delay by its mean value. Table 4 shows the values obtained for both the standard deviation and the variability factor of the delay. The variability factors obtained for FVCDK and FVCDK-SS were 13.56 % and 9.29 %, respectively, down from 19.04 % and 10.50 % for CDDK and CDDK-SS, respectively. This illustrates the lower delay variability offered by the proposed domino circuits compared with their CDDK counterparts.
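The variability factor used above is the ratio of the standard deviation of the delay to its mean. A minimal sketch of that calculation follows, assuming the Monte Carlo delays are available as a list of numbers and that the sample standard deviation is intended (the text does not say whether the sample or the population form was used). The function names are illustrative.

```python
import statistics


def delay_variability(delays):
    """Variability factor: standard deviation of the delay divided by its mean."""
    return statistics.stdev(delays) / statistics.mean(delays)


def relative_reduction(before, after):
    """Fractional reduction when a quantity drops from `before` to `after`."""
    return (before - after) / before


# Relative reductions implied by the variability factors reported above.
fvcdk_vs_cddk = relative_reduction(19.04, 13.56)
fvcdk_ss_vs_cddk_ss = relative_reduction(10.50, 9.29)
```

Applied to the reported factors, the relative reduction in variability is about 28.8 % for FVCDK against CDDK and about 11.5 % for FVCDK-SS against CDDK-SS.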

Conclusion
The novel FVCDK and FVCDK-SS domino circuits were proposed in this paper. Both circuits are designed to reduce the contention current by switching OFF the keeper arrangement during the initial part of the evaluation phase. This leads to reduced power consumption and increased speed at the cost of a slight degradation in the UNG value, as compared to the existing dual keeper footed domino circuits. In addition, the FVCDK-SS domino circuit offers even better speed and power efficiency than FVCDK by trading off the UNG value. This is due to the reduction in unnecessary switching of the output node.
The simulation for all the process corners validates that both circuits are tolerant to process variation. The standard deviation of delay across all the corner cases about the typical values illustrates that the performance of the proposed circuits is less sensitive to process changes than that of CDDK and CDDK-SS. The simulations are also carried out for a range of temperature and supply voltage variations to validate the robustness of the circuits. Also, the use of Multi-Threshold CMOS (MTCMOS) results in a 53 % and 47 % speed improvement in FVCDK and FVCDK-SS, respectively. However, this improvement comes at the cost of increased fabrication complexity, as both the lower-V_th and higher-V_th counterparts of the same MOS transistor need to be fabricated on the same die. By comparing the proposed circuits with the conventional footed domino circuit, where all are designed using MTCMOS, we note that the proposed circuits outperform the latter in speed, power and PDP metrics by a significant margin.