Multibit Full Comparator Logic in Quantum-Dot Cellular Automata

In the last few years, binary comparators have received a great deal of attention as parts of complex computational data-paths. However, while several multi-bit architectures have been demonstrated using conventional CMOS technologies, few examples of n-bit comparators with n greater than 4 can be found in the literature for designs based on emerging nanotechnologies, such as Quantum-Dot Cellular Automata, Nano Magnetic Logic, and many others. This brief proposes a novel approach to designing efficient multi-bit binary comparators using the Quantum-Dot Cellular Automata nanotechnology. The presented approach improves on state-of-the-art competitors in terms of computational complexity and average energy consumption. As an example, in comparison with its direct counterparts, the 32-bit comparator designed as proposed here saves up to 26%, 23% and 11% of the occupied area, the number of basic cells used and the average energy consumption, respectively. When implemented using Nano Magnetic Logic, the 4-bit version of the novel comparator uses 1183 magnets and 38 clock phases.


I. INTRODUCTION
Several emerging nanotechnologies, such as Quantum-dot Cellular Automata (QCA) [1], Nano Magnetic Logic (NML) [2] and carbon nanotubes [3], are recognized as promising solutions to overcome the limits of conventional CMOS technology. In the recent past, QCA-based binary arithmetic circuits, including adders, multipliers and comparators, have received a great deal of attention [4]–[19]. However, parallel binary comparators realized in such emerging technologies still represent a challenge. These comparators are basic computational modules extensively used in many digital circuits and system applications. Unfortunately, while their realization with the traditional CMOS approach is relatively straightforward, designing efficient parallel comparators in QCA or NML poses several practical difficulties.

This brief presents a novel and easy-to-use approach for the design of efficient QCA-based multi-bit parallel comparators. A novel logic formulation is introduced to reduce the complexity and the energy consumption of tree-based comparator architectures. These architectures consist of replicas of two very simple encoding sub-circuits, which compare 2-bit sub-words of the operands, and a third module that furnishes the final comparison result. The resulting high regularity makes it simpler to partition the cells into proper clock zones.
All the designs presented here were laid out and characterized with the QCA Designer-E 2.2 software tool. Post-layout results show that the 32-bit version of the novel comparator complying with the 2DDWave clocking scheme [20] requires 2587 cells, occupies an overall area of ∼5.9 µm², uses 18 clock phases to perform the generic operation and dissipates an average energy of ∼8.02 × 10⁻² eV. When the same basic clocking scheme is adopted, the novel comparator achieves a complexity ∼22% lower than that of [14] and an average energy consumption ∼11% lower. Here, complexity means the number of majority gates (MGs) used.
The proposed methodology has also been applied to realize a 4-bit NML-based comparator. Results obtained with the ToPoliNano CAD tool show that the proposed NML-based comparator requires 1183 magnets and uses 38 clock phases.
The rest of this brief is organized as follows:
- Section II provides a brief background.
- Section III introduces the novel comparator logic.
- Section IV presents post-layout results and comparisons with existing counterparts.
- Section V draws the conclusions.

The basic QCA cell has two stable polarizations, which are associated with the bit values '0' and '1' [1]. By properly combining multiple instances of the basic cell, the fundamental logic elements, i.e., MGs, inverters (INVs) and wires, are obtained. More complex logic functions are implemented by exploiting the interaction between adjacent QCA cells and employing either the coplanar or the multilayer routing strategy [1].
To guarantee the correct data-flow directionality, the cells within a QCA-based design are partitioned into so-called clock zones, each associated with a proper clock signal. Each clock zone in the computational path behaves like a D-latch. This makes such a design intrinsically pipelined, with a latency that depends on the number of cascaded clock zones.
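The pipelining behaviour described above can be illustrated with an idealized model in which each clock zone is a latch and one simulation step is one clock phase. This is a minimal sketch, not a physical QCA simulation. It assumes conventional four-phase clocking, so a new operand pair can enter every four phases. The helper name `simulate_zone_chain` is hypothetical.

```python
from collections import deque
from typing import Hashable, List, Optional, Tuple

PHASES_PER_CYCLE = 4  # assumption: conventional four-phase QCA clocking


def simulate_zone_chain(
    inputs: List[Hashable], num_zones: int, issue_every: int = PHASES_PER_CYCLE
) -> List[Tuple[int, Hashable]]:
    """Idealized model of cascaded clock zones acting as latches.

    One step = one clock phase: every zone captures the value held by its
    predecessor, and zone 0 captures a new input every `issue_every` phases.
    Returns (phase, value) pairs recorded when a value reaches the last zone.
    """
    if num_zones < 1:
        raise ValueError("need at least one clock zone")
    zones: List[Optional[Hashable]] = [None] * num_zones
    pending = deque(inputs)
    outputs: List[Tuple[int, Hashable]] = []
    phase = 0
    while pending or any(z is not None for z in zones):
        issue = bool(pending) and phase % issue_every == 0
        new_value = pending.popleft() if issue else None
        zones = [new_value] + zones[:-1]  # every zone latches its predecessor
        phase += 1
        if zones[-1] is not None:
            outputs.append((phase, zones[-1]))
    return outputs


if __name__ == "__main__":
    # 18 cascaded zones: latency of 18 phases (4.5 four-phase cycles),
    # while a new operand pair still enters every cycle.
    print(simulate_zone_chain(["op0", "op1", "op2"], num_zones=18))
```

In this model the latency equals the number of cascaded zones, while throughput stays at one result per clock cycle. This is the sense in which the design is "intrinsically pipelined".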
In the last few years, efficient QCA implementations have been proposed for several binary arithmetic functions, with special effort devoted to comparators [7]–[20].
Representative examples of n-bit comparators are described in [9], [11], [12], [14], [15] and [19]. The serial structure of [9] uses one 1-bit comparator and two 1-bit registers to process the n bits of the operands A(n−1:0) and B(n−1:0) serially. The main advantage of this full comparator is its limited resource usage, which comes at the expense of an equally limited throughput. The solution presented in [11] leads to a faster implementation, but it recognizes only the condition A ≥ B.
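A bit-serial full comparator of the kind described for [9] can be modelled as a 1-bit comparison step that updates two 1-bit state registers once per clock cycle. The sketch below is hypothetical. It assumes that the bits are fed least-significant first and that the two registers hold "greater" and "less" flags. The actual bit order and register semantics of [9] are not given here.

```python
def serial_compare(a: int, b: int, n: int) -> str:
    """Compare two unsigned n-bit operands one bit per step (LSB first).

    gt and lt model the two 1-bit registers: a differing bit at position i
    overrides whatever the less significant bits decided, while equal bits
    keep the stored state.
    """
    gt = lt = 0  # register contents after reset
    for i in range(n):  # one step (clock cycle) per bit
        ai, bi = (a >> i) & 1, (b >> i) & 1
        eq = 1 - (ai ^ bi)
        gt = (ai & (1 - bi)) | (eq & gt)
        lt = ((1 - ai) & bi) | (eq & lt)
    if gt:
        return "greater"
    if lt:
        return "less"
    return "equal"
```

For example, `serial_compare(5, 3, 4)` returns `"greater"` after four steps. The n-step loop is exactly the throughput limitation mentioned above.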
All the approaches proposed in [12], [14], [15] and [19] allow n-bit parallel full comparators to be designed. However, while the tree-based (TB) architecture presented in [12] was demonstrated for n up to 8, the two architectures introduced in [14], namely the low-cost cascade-based (CB) and the fast TB comparators, were characterized for n ranging from 2 to 32.
The full comparator recently presented in [15] was designed with a novel 5-input MG and was characterized for operand word-lengths up to 4 bits. Finally, [19] presents pre-layout results for a comparator …

… Fig. 2a and (1a)–(1b).

To better explain the operation of the novel comparator, consider the example in Fig. 4, which compares the 8-bit 2's complement numbers A(7:0) = 11000000 (i.e., −64) and B(7:0) = 01000000 (i.e., 64). Since n = 8, n/2 = 4 instances of the FEM are required to process as many 2-bit sub-words of the operands. The signals $M0^{i}_{1}$ and $M0^{i}_{0}$, with i = 0, …, 3, are fed to the h = 2 levels of IEMs. As shown in Fig. 4, the first level (j = 1) consists of y = 2 IEMs, whereas the second level (j = 2) uses y = 1 IEM. The two levels of IEMs set both signals $M2^{0}_{1}$ and $M2^{0}_{0}$ to 0, which, as shown in Fig. 2b, encodes the condition A < B. Finally, the FRM furnishes $A_{big}B$ = 0, $A_{eq}B$ = 0 and $A_{less}B$ = 1.
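The internal equations (1)–(4) and Figs. 2–3 are not reproduced in this section. The following sketch therefore models only the tree organization described above: n/2 FEMs on 2-bit sub-words, pairwise IEM levels and one FRM. Everything inside the modules is a hypothetical choice made for illustration, not the authors' logic:
- The code stores (ge, gt) = (A ≥ B, A > B) for the compared portion, so that, as in the text, 00 at the root means A < B.
- Each MG is modelled as `maj(a, b, c)`, with AND and OR obtained by fixing one input to 0 or 1.
- The 2's complement sign bit is handled by swapping the roles of A and B at that position.

```python
from typing import List, Tuple

Code = Tuple[int, int]  # hypothetical 2-bit code (ge, gt): (A >= B, A > B)


def maj(a: int, b: int, c: int) -> int:
    """Three-input majority gate; maj(x, y, 0) = AND, maj(x, y, 1) = OR."""
    return (a & b) | (b & c) | (a & c)


def inv(a: int) -> int:
    """Inverter."""
    return 1 - a


def iem(hi: Code, lo: Code) -> Code:
    """Hypothetical IEM: merge a more significant (hi) and a less significant
    (lo) code. Because gt implies ge, ge = gt_h + ge_h*ge_l and
    gt = gt_h + ge_h*gt_l, each realizable as a single MG."""
    ge_h, gt_h = hi
    ge_l, gt_l = lo
    return maj(gt_h, ge_h, ge_l), maj(gt_h, ge_h, gt_l)


def fem(a1: int, a0: int, b1: int, b0: int, sign_word: bool = False) -> Code:
    """Hypothetical FEM for one 2-bit sub-word. In the sub-word holding the
    2's complement sign bit, a1 and b1 swap roles (a negative A is smaller)."""
    if sign_word:
        a1, b1 = b1, a1
    bit1 = (maj(a1, inv(b1), 1), maj(a1, inv(b1), 0))  # (a1 OR !b1, a1 AND !b1)
    bit0 = (maj(a0, inv(b0), 1), maj(a0, inv(b0), 0))
    return iem(bit1, bit0)


def frm(code: Code) -> Tuple[int, int, int]:
    """Hypothetical FRM: decode the root code into (A_big_B, A_eq_B, A_less_B)."""
    ge, gt = code
    return gt, maj(ge, inv(gt), 0), inv(ge)


def encode_tree(a: int, b: int, n: int, signed: bool = True) -> Tuple[Code, int]:
    """Run n/2 FEMs, then pairwise IEM levels; return root code and level count."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    mask = (1 << n) - 1
    a, b = a & mask, b & mask  # 2's complement bit patterns

    def bit(x: int, k: int) -> int:
        return (x >> k) & 1

    codes: List[Code] = [
        fem(bit(a, 2 * i + 1), bit(a, 2 * i), bit(b, 2 * i + 1), bit(b, 2 * i),
            sign_word=signed and i == n // 2 - 1)
        for i in range(n // 2)  # i = 0 holds bits (1:0)
    ]
    levels = 0
    while len(codes) > 1:  # each IEM level halves the number of codes
        codes = [iem(codes[2 * k + 1], codes[2 * k]) for k in range(len(codes) // 2)]
        levels += 1
    return codes[0], levels


def compare(a: int, b: int, n: int, signed: bool = True) -> Tuple[int, int, int]:
    """Full comparison: (A_big_B, A_eq_B, A_less_B)."""
    return frm(encode_tree(a, b, n, signed)[0])
```

Under these assumptions, `encode_tree(-64, 64, 8)` goes through two IEM levels and ends with root code (0, 0). `compare(-64, 64, 8)` returns (0, 0, 1), matching the outputs stated for the Fig. 4 example.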
It is worth noting that the proposed approach leads to highly regular full comparator architectures. The TB structures presented in [14] and [19] employ up to six different basic modules, depending on the operand word-length n. In contrast, a binary comparator designed as proposed here employs only three simple modules, independently of n, properly arranged within the TB structure. Moreover, the six basic modules used in [14] and [19] consist of up to six MGs and three INVs, as depicted in Fig. 3, whereas the FEM, IEM and FRM blocks used within the novel comparators are significantly simpler. The design complexity of the novel comparators can be evaluated in terms of the number of MGs and INVs used and the number of MGs within the worst computational path (#MGs_CP), as given in (4). The preliminary comparison summarized in Table I shows that, for the same operand word-length and computational capability, the novel comparator exhibits the shortest computational path and uses the fewest MGs. However, a reliable comparison requires analyzing the post-layout characteristics, because compliance with the QCA layout rules [21] may require additional clock phases. For example, thermodynamic effects constrain the minimum and maximum number of cells in a single clock zone.
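The regularity argument can be quantified at module level. The following sketch counts FEMs, IEMs and FRMs and the number of modules on the worst path. The general rule of y = n/2^(j+1) IEMs at level j is an assumption extrapolated from the n = 8 example (y = 2, then y = 1). The MG-level figures of (4) are not reproduced, because the module internals are not given here. The chained variant is a hypothetical contrast, not the CB design of [14], and all names are illustrative.

```python
from typing import Dict, List


def tree_structure(n: int) -> Dict[str, object]:
    """Module counts of a tree comparator built from FEM, IEM and FRM blocks.

    Assumes n is a power of two >= 2 and that level j (1..h) holds
    n / 2**(j + 1) IEMs.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    fems = n // 2
    h = fems.bit_length() - 1  # h = log2(n / 2)
    iems_per_level: List[int] = [n // 2 ** (j + 1) for j in range(1, h + 1)]
    return {
        "FEM": fems,
        "IEM_per_level": iems_per_level,
        "IEM_total": sum(iems_per_level),  # equals n/2 - 1
        "FRM": 1,
        "modules_on_path": h + 2,  # one FEM, h IEMs, one FRM
    }


def chained_path(n: int) -> int:
    """Modules on the worst path if the n/2 - 1 IEMs were chained linearly."""
    return 1 + (n // 2 - 1) + 1
```

For n = 32, the tree needs 16 FEMs, 15 IEMs over four levels (8, 4, 2, 1) and one FRM, with 6 modules on the worst path. A linear chain of the same IEMs would put 17 modules on that path.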
In accordance with [7]–[15], the proposed designs use a minimum of 2 and a maximum of 16 cascaded cells per clock zone. Moreover, due to their higher robustness [21], multilayer crossovers are preferred to coplanar ones. Clocking wires are assumed to be buried below the QCA base layer and to be endowed with small metal pads. These pads ensure a uniform electric field and precise control of the inter-dot barriers of the cells. Accordingly, the new layouts have been partitioned into uniform, regular and bounded clock zones. In particular, the novel comparators have been laid out to comply with the 2DDWave clocking scheme demonstrated in [20] and have been characterized with the QCA Designer-E 2.2 software tool using its default settings.
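The 2-to-16-cell rule translates directly into a lower bound on the number of clock zones, and hence phases, that a straight run of cascaded cells needs. The following hypothetical helper splits a chain of cells into as few zones as the rule allows, with balanced sizes. It ignores the geometric constraints of the clocking scheme, which real layouts must also satisfy.

```python
from typing import List


def partition_chain(length: int, min_cells: int = 2, max_cells: int = 16) -> List[int]:
    """Split `length` cascaded cells into the fewest clock zones that respect
    the min/max cells-per-zone rule, balancing the zone sizes."""
    if length < min_cells:
        raise ValueError("chain shorter than the minimum zone size")
    zones = -(-length // max_cells)  # ceil(length / max_cells)
    base, extra = divmod(length, zones)
    sizes = [base + 1] * extra + [base] * (zones - extra)
    if any(s < min_cells or s > max_cells for s in sizes):
        raise ValueError("no valid partition with these limits")
    return sizes
```

For example, a 40-cell wire needs at least three zones, such as [14, 13, 13], and therefore contributes at least three clock phases to the path it belongs to.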
Samples of the implemented layouts are depicted in Fig. 5, whereas Fig. 6 shows some simulation results. Post-layout results are collected in Table II and compared with several existing counterparts. To provide a reference, the TB comparators presented in [14] and [19] have also been characterized in terms of energy consumption. Under the same basic layout rules, the new comparator saves up to 26%, 23% and 11% of the occupied area, the number of QCA cells used and the average energy consumption, respectively, compared with [14].
An energy saving of up to 19% is reached when the 2DDWave clocking scheme is adopted. In this case, the number of cells used does not change significantly with respect to the layouts implemented with the basic clocking scheme. However, as expected, the occupied area increases due to the geometric restrictions that the 2DDWave scheme imposes on the length and width of each clock zone [20].
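The geometric restriction mentioned above stems from how 2DDWave assigns clock phases. In the commonly used formulation of 2DDWave:
- the layout is tiled into equal square zones;
- tile (x, y) receives clock phase (x + y) mod 4;
- signals may only move east (+x) or south (+y) to the next tile.

The following sketch encodes that rule under these assumptions. The tile size used in [20] and in this work is not modelled, and the function names are hypothetical.

```python
from typing import List, Tuple

Tile = Tuple[int, int]


def phase_2ddwave(tile: Tile, num_phases: int = 4) -> int:
    """Clock phase of tile (x, y) under a diagonal 2DDWave assignment."""
    x, y = tile
    return (x + y) % num_phases


def path_latency(path: List[Tile]) -> int:
    """Number of clock phases a signal spends along a tile path.

    Every hop must go to the east or south neighbour, which is exactly the
    tile holding the next clock phase; any other hop raises ValueError.
    """
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        if (x1 - x0, y1 - y0) not in ((1, 0), (0, 1)):
            raise ValueError(f"illegal hop {(x0, y0)} -> {(x1, y1)}")
    return len(path)
```

Every legal route from tile (0, 0) to tile (X, Y) visits X + Y + 1 tiles, so latency is fixed by geometry and detours are impossible. Routing around congestion therefore tends to stretch the layout rather than lengthen a wire, which is consistent with the larger area observed with 2DDWave.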
The reliability of the proposed designs has been analyzed with the probabilistic transfer matrix (PTM) approach [22], assuming that the primary inputs are uniformly distributed and that each gate has an error probability p = 0.1. First, reliabilities of 0.7216, 0.81 and 0.6152 were estimated for the basic modules FEM, IEM and FRM, respectively. Then, a fidelity of 0.6739 was estimated for the output signals of the new 4-bit comparator.
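The PTM method composes per-gate probabilistic transfer matrices to obtain the output distribution of a circuit. For small blocks, the same reliability figure can also be obtained by brute-force enumeration of fault patterns. Here reliability means the probability that every output is correct, averaged over uniformly distributed inputs, with each gate independently flipping its output with probability p. The following sketch uses that enumeration. It substitutes for the matrix formulation rather than implementing it, and it does not reproduce the FEM, IEM or FRM netlists, which are not given here. The netlist format and function names are hypothetical.

```python
from itertools import product
from math import prod
from typing import Callable, Dict, List, Sequence, Tuple

Gate = Tuple[str, Callable[..., int], Sequence[str]]  # (output, function, inputs)


def maj(a: int, b: int, c: int) -> int:
    """Three-input majority gate."""
    return (a & b) | (b & c) | (a & c)


def inv(a: int) -> int:
    """Inverter."""
    return 1 - a


def _evaluate(env: Dict[str, int], gates: List[Gate], faults: Sequence[int]) -> Dict[str, int]:
    """Evaluate gates in topological order; a fault flips that gate's output."""
    values = dict(env)
    for (out, fn, ins), flip in zip(gates, faults):
        values[out] = fn(*(values[i] for i in ins)) ^ flip
    return values


def reliability(inputs: List[str], gates: List[Gate], outputs: List[str], p: float) -> float:
    """Probability that all outputs are correct, with uniformly distributed
    primary inputs and each gate failing independently with probability p."""
    total = 0.0
    for vec in product((0, 1), repeat=len(inputs)):
        env = dict(zip(inputs, vec))
        golden = _evaluate(env, gates, [0] * len(gates))
        for faults in product((0, 1), repeat=len(gates)):
            faulty = _evaluate(env, gates, faults)
            if all(faulty[o] == golden[o] for o in outputs):
                total += prod(p if f else 1 - p for f in faults)
    return total / 2 ** len(inputs)


if __name__ == "__main__":
    print(reliability(["a", "b", "c"], [("y", maj, ("a", "b", "c"))], ["y"], 0.1))
    # Two cascaded inverters: two flips cancel, so the result exceeds 0.81.
    print(reliability(["x"], [("n1", inv, ("x",)), ("y", inv, ("n1",))], ["y"], 0.1))
```

With p = 0.1, a single MG gives 0.9. A block whose two outputs are each produced by one MG fed by primary inputs gives 0.9² = 0.81. Two cascaded inverters give 0.82, because a double fault is masked. Accounting for this masking is the point of the PTM-style analysis.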
To further analyze the compared architectures, the generalized cost function CF proposed in [23] has been extended as given in (5) to take into account the consumed energy E:

CF = (#MGs + #INVs + #CO) × #Phases × E    (5)

Here, #MGs and #INVs represent the number of logic gates used, #CO the number of crossovers and #Phases the delay, and all the weightings are set to 1. The reduced energy consumption and size, together with the lower number of crossovers, allow the new circuits to achieve a CF up to 13% lower than that of [14] and up to 27% lower than that of [19].

Finally, the 4-bit version of the novel comparator has been implemented in NML, demonstrating that the proposed logic can be efficiently exploited also with different nanotechnologies. The layout obtained with the ToPoliNano CAD tool is shown in Fig. 7, whereas some simulation results are reported in Fig. 8. The proposed NML-based comparator requires 1183 magnets and uses 38 clock phases.
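The cost function (5) is straightforward to evaluate. The following sketch computes it with all weightings equal to 1 and expresses a comparison against a reference design as a fractional reduction. It assumes that percentages such as "up to 13%" are computed as 1 − CF_new/CF_ref. The class and function names are hypothetical, and no values from Table II are reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignMetrics:
    mgs: int          # #MGs, majority gates
    invs: int         # #INVs, inverters
    crossovers: int   # #CO
    phases: int       # #Phases, delay in clock phases
    energy: float     # E, average energy consumption


def cost_function(d: DesignMetrics) -> float:
    """CF = (#MGs + #INVs + #CO) x #Phases x E, with all weightings set to 1."""
    return (d.mgs + d.invs + d.crossovers) * d.phases * d.energy


def relative_improvement(new: DesignMetrics, ref: DesignMetrics) -> float:
    """Fractional CF reduction of `new` with respect to `ref` (0.13 means 13%)."""
    return 1.0 - cost_function(new) / cost_function(ref)
```

Because the three factors multiply, a design can win on CF through fewer gates and crossovers, fewer phases or lower energy. The gains reported above combine the first and last of these.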

V. CONCLUSION
The novel logic presented here allows efficient multi-bit QCA-based full comparators to be designed. Only three elementary modules, named FEM, IEM and FRM, are required to process multi-bit operands, which leads to highly regular TB architectures. Moreover, in comparison with state-of-the-art competitors, the comparators designed as proposed here achieve reduced computational complexity and average energy consumption.