Low Power 256-bit Modified Carry Select Adder

Carry Select Adder (CSLA) is one of the high-speed adders used in many computational systems to perform fast arithmetic operations. Compared with the earlier Ripple Carry Adder (RCA) and Carry Look-Ahead Adder, the Regular CSLA (R-CSLA) is observed to provide optimized results in terms of area. This study proposes an efficient method that replaces the RCA with a Binary to Excess-1 Converter (BEC). The modified CSLA architecture has been developed using a gate-level modification to significantly reduce the delay and power of the CSLA. Based on this modification, 8-, 16-, 32-, 64- and 128-bit Square-Root CSLA (SQRT CSLA) architectures have been developed and compared with the regular SQRT CSLA architecture. The proposed 256-bit design has reduced power and delay compared with the regular SQRT CSLA. The designs were developed as structural Verilog modules, synthesized using the Xilinx ISE simulator and implemented in the Cadence RTL compiler using 0.18 µm technology. For 256-bit addition, the simple gate-level modification proposed in this study reduces the power by 19.4% compared with the R-CSLA. The result analysis shows that the proposed architecture achieves a twofold advantage in terms of delay and power.


INTRODUCTION
Power and area play a major role in integrated circuit design because of the growing popularity of portable systems and the rapid growth of power density in VLSI circuits. Addition is a crucial arithmetic function and strongly influences the overall performance of digital systems. Adders are among the most widely used components in electronic applications. For example, microprocessors perform millions of instructions per second. Increasingly portable devices such as mobile phones and laptops require more battery backup. Low-power (Edison and Manikandababu, 2012) and area-efficient addition and multiplication have always been a fundamental requirement of high-performance processors and systems. Designing an efficient adder is among the most difficult problems in VLSI design.
Carry Select Adders are used in high-speed applications because they reduce propagation delay. The basic operation of the Carry Select Adder (CSLA) is parallel computation. The CSLA generates multiple carries and partial sums. The final sum and carry are selected by multiplexers (mux). Multiple pairs of Ripple Carry Adders (RCA) are used in the CSLA structure (Mitra and Dutta, 2012). Hence, the CSLA is not area efficient.
The proposed method uses a Binary to Excess-1 Converter (BEC) instead of the RCA with Cin = 1 in the regular CSLA. The main goal of the BEC logic is to use fewer logic gates than the n-bit full adder, so that the modified CSLA architecture has lower area and power consumption. In the modified CSLA, the input bits are given in a linear manner to achieve low power. This study is implemented for higher-order bits (up to 256 bits), and the comparison between the regular SQRT CSLA and the modified SQRT CSLA is discussed below.

Binary to Excess-1 Converter
The main idea of this study is to use a BEC instead of the RCA with Cin = 1 in order to reduce the area and power consumption of the regular CSLA. To replace an n-bit RCA, an (n+1)-bit BEC is required. The modified CSLA architecture has been developed using the Binary to Excess-1 Converter (BEC). Figure 1 illustrates how the basic function of the CSLA is obtained by using the 4-bit BEC together with the mux. The XOR gate in the BEC of the Modified CSLA is replaced with the optimized AND-OR-Invert (AOI) XOR gate in the Modified Area Efficient CSLA. Replacing the n-bit RCA with an (n+1)-bit BEC reduces the number of gates, and when the optimized XOR gate is used in the Modified CSLA, a large further reduction in the number of gates is verified. The multiplexer (mux) is used to select either the BEC output or the inputs given directly to the BEC circuit of the next block. In this design, the major function of the mux is to derive the adder speed.
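The BEC-plus-mux function described above can be modelled in a few lines of Python. This is only a behavioural sketch, not the structural Verilog used in the study: the function names `bec` and `bec_mux` are hypothetical, and the gate equations assume the usual increment-by-one form, in which output bit i is the XOR of input bit i with the AND of all lower input bits.

```python
def bec(b, n_bits=4):
    """Binary to Excess-1 Converter: returns (b + 1) mod 2**n_bits.

    Assumed gate form: X0 = ~B0, Xi = Bi ^ (B0 & ... & B(i-1)).
    """
    bits = [(b >> i) & 1 for i in range(n_bits)]
    out = 0
    lower_and = 1  # AND of all lower input bits; 1 at the LSB gives X0 = ~B0
    for i, bit in enumerate(bits):
        out |= (bit ^ lower_and) << i
        lower_and &= bit
    return out


def bec_mux(b, cin, n_bits=4):
    """Mux stage: pass the input through when cin = 0, else take the BEC output."""
    return bec(b, n_bits) if cin else b
```

With `cin = 0` the partial result is passed through unchanged, and with `cin = 1` the incremented value is selected, which is the behaviour a second RCA with Cin = 1 would otherwise provide.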
According to the control signal Cin (Subha and Durga, 2013), the mux selects its output from its two inputs (the input bits as per the block size and the BEC output). The importance of the BEC logic stems from the large silicon area reduction obtained when a CSLA with a large number of bits is designed. The Boolean expressions of the 4-bit BEC (Pandey et al., 2013) are listed as (note the functional symbols ~ NOT, & AND, ^ XOR):
X0 = ~B0
X1 = B0 ^ B1
X2 = B2 ^ (B0 & B1)
X3 = B3 ^ (B0 & B1 & B2)

Regular SQRT carry select adder: In this part, the operation of the regular SQRT CSLA and its delay calculation are discussed. A SQRT carry select adder is constructed using the conventional 4-bit Ripple Carry Adder (RCA). The RCA uses multiple full adders to perform the addition. Each full adder takes as its carry-in the carry-out of the preceding adder.
The CSLA divides the words to be added into blocks and forms two sums for each block in parallel, one with an assumed carry-in (Cin) of 0 and the other with a Cin of 1. The carry-out from one stage of the 4-bit RCA is used as the select signal for the multiplexer.
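The block-wise operation described above can be sketched behaviourally in Python. The names `full_adder`, `rca` and `regular_csla` are hypothetical, and the block sizes passed in the usage note (2, 2, 3, 4, 5 for 16 bits) are an assumption about the SQRT grouping rather than a value stated here. For simplicity the sketch also gives the first block two RCAs, although a hardware design would typically use a single RCA there.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def rca(a, b, cin, n_bits):
    """Ripple carry adder: each stage takes the previous stage's carry-out."""
    total, carry = 0, cin
    for i in range(n_bits):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


def regular_csla(a, b, block_sizes, cin=0):
    """Regular CSLA: per block, two RCAs (Cin = 0 and Cin = 1) and a mux."""
    result, shift, carry = 0, 0, cin
    for n in block_sizes:
        mask = (1 << n) - 1
        a_blk, b_blk = (a >> shift) & mask, (b >> shift) & mask
        sum0, c0 = rca(a_blk, b_blk, 0, n)  # speculative result for Cin = 0
        sum1, c1 = rca(a_blk, b_blk, 1, n)  # speculative result for Cin = 1
        # Mux: the carry from the previous block selects one of the two results
        blk_sum, carry = (sum1, c1) if carry else (sum0, c0)
        result |= blk_sum << shift
        shift += n
    return result, carry
```

For example, `regular_csla(a, b, [2, 2, 3, 4, 5])` adds two 16-bit words and returns the 16-bit sum together with the final carry-out.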
The multiplexer then selects the corresponding sum bits of the next block. This speeds up the computation process of the adder. Thus, the carry select adder achieves a higher speed of operation at the cost of an increased number of devices in the circuit, which in turn increases the area and power consumed by circuits of this type.

For the remaining groups, the arrival time of the mux selection input (Nair, 2013) is always greater than the arrival time of the data outputs from the RCAs. Thus, the delay of groups 3 to 5 is determined by the arrival time of the mux selection input and the mux delay (Mitra and Dutta, 2012; Pandey et al., 2013).

One set of the 2-bit RCA in group 2 has 2 FA for Cin = 1 and the other set has 1 FA and 1 HA for Cin = 0. The area, as the total number of gates, can be calculated as follows:
Gate count = 57 (FA + HA + mux)
FA = 39 (3 × 13) (FA: Full Adder)
HA = 6 (1 × 6) (HA: Half Adder)
Mux = 12 (3 × 4)

Proposed SQRT carry select adder: In this type of adder, the block of RCA with an input carry of 1 is replaced with a block of Binary to Excess-1 Converter (BEC), as shown in Fig. 2. This is done to reduce the area and power requirement of the conventional Carry Select Adder.
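The group 2 gate count above can be reproduced with a short calculation. The per-cell costs (FA = 13, HA = 6, mux = 4 gates per bit) are taken from the figures in the text. The BEC-based count is only an illustrative estimate under two assumptions that are not stated in the text: an XOR costs 5 gates (inferred from HA = XOR + AND = 6), and a 3-bit BEC is built from 1 NOT, 1 AND and 2 XOR gates. All names are hypothetical.

```python
# Per-cell gate costs taken from the group 2 figures in the text
FA, HA, MUX = 13, 6, 4

# Assumption: a half adder is one XOR plus one AND, so an XOR costs 5 gates
XOR, AND, NOT = HA - 1, 1, 1


def regular_group2_gates():
    """Two 2-bit RCAs (3 FA + 1 HA in total) and a 3-bit-wide mux."""
    return 3 * FA + 1 * HA + 3 * MUX


def bec3_gates():
    """Assumed 3-bit BEC: X0 = ~B0, X1 = B0 ^ B1, X2 = B2 ^ (B0 & B1)."""
    return NOT + AND + 2 * XOR


def modified_group2_gates():
    """One 2-bit RCA (1 FA + 1 HA), a 3-bit BEC and a 3-bit-wide mux."""
    return 1 * FA + 1 * HA + bec3_gates() + 3 * MUX
```

Under these assumptions `regular_group2_gates()` reproduces the count of 57 given above, while `modified_group2_gates()` gives a smaller estimate for the BEC-based group. The latter is a derived illustration, not a figure reported in this study.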
The area of each group is calculated manually by counting the number of gates used in the structure. The percentage reductions of the CSLA for different word sizes are given in Table 2.
Delay evaluation methodology of modified SQRT CSLA: The 32-bit modified SQRT CSLA structure is given in Fig. 2. The steps leading to the delay evaluation are given in Tables 1 and 3. The second group has a 2-bit RCA. Instead of another 2-bit RCA with Cin = 1, a 3-bit BEC is used, which adds 1 to the output of the 2-bit RCA. Based on the delay values, the arrival time of the selection input c1 of the 6:3 mux is earlier than that of s3 and c3 and later than that of s2. Thus, sum3 depends on s3 and the mux, and the final c3 (output from the mux) depends on the partial c3 (input to the mux) and the mux. An area count of the CSLA is given in Table 1. The modified partial CSLA structures of group 2 and group 5 are given in Fig. 3a and b.
For the remaining groups, the arrival time of the mux selection input is always greater than the arrival time of the data inputs from the BECs. Thus, the delay of the remaining groups depends on the arrival time of the mux selection input and the mux delay. The reductions in area, power and delay of the proposed model relative to the earlier models are given in Table 2. The implementation results in terms of leakage, switching and total power are given in Table 3. These results were obtained with the Cadence RTL compiler.
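The structure of the modified group 2 can be checked with a small behavioural model: a 2-bit RCA with Cin = 0, a 3-bit BEC that adds 1 to the RCA output, and a 6:3 mux controlled by c1. This is a sketch under the assumption that the BEC operates on the 3-bit value formed by the RCA carry-out and sum bits; the function names are hypothetical and timing is not modelled.

```python
def rca2_cin0(a, b):
    """2-bit RCA with Cin = 0: a half adder on bit 0, a full adder on bit 1.

    Returns the 3-bit value {carry, s1, s0}.
    """
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    s0, k0 = a0 ^ b0, a0 & b0            # half adder
    s1 = a1 ^ b1 ^ k0                    # full adder sum
    k1 = (a1 & b1) | (k0 & (a1 ^ b1))    # full adder carry-out
    return (k1 << 2) | (s1 << 1) | s0


def bec3(x):
    """3-bit BEC: adds 1 to the 3-bit input (modulo 8)."""
    b0, b1, b2 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    return ((b2 ^ (b0 & b1)) << 2) | ((b1 ^ b0) << 1) | (b0 ^ 1)


def modified_group2(a, b, c1):
    """Modified group 2: returns (2-bit sum, carry-out c3)."""
    partial = rca2_cin0(a, b)                 # result assuming carry-in 0
    plus_one = bec3(partial)                  # result assuming carry-in 1
    selected = plus_one if c1 else partial    # 6:3 mux selected by c1
    return selected & 0b11, selected >> 2
```

Because the RCA output with Cin = 0 is at most 6 for 2-bit operands, adding 1 in the BEC never overflows three bits, so the selected value equals the true sum for either value of c1.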

RESULTS AND DISCUSSION
The design proposed in this study has been developed using Verilog HDL and synthesized in the Cadence RTL compiler using typical libraries of TSMC 180 nm technology. The CSLA designs were developed as structural Verilog modules and synthesized using the Xilinx ISE simulator, version 10.1, and the implementation was done in the Cadence RTL compiler.
The percentage reduction in the total power dissipation and the delay, with respect to the worst path of the flow, is given in Table 2. The analysis shows a considerable decrease in power and delay, with a slight increase in area, compared with the earlier work (Ramkumar and Kittur, 2012).
The implementation results are shown in Table 3. The percentage of delay overhead is shown in Fig. 4.
The percentage reductions in the cell area, total power, power-delay product and area-delay product as a function of the bit size are shown in Fig. 5.

CONCLUSION
After comparing the different parameters of various adders with the proposed modified SQRT CSLA, it is evident that the power dissipation has been reduced to the desired extent with a slight increase in area. The proposed model provides a good tradeoff between time and power consumption. Hence, the modified 256-bit CSLA is more efficient for VLSI hardware implementation. Further work is needed to reduce the area and to extend the design to higher-order adders (512-bit), thus improving the overall system performance.

Table 1: Area count of CSLA (by group no.)

The structure of the 16-bit regular Square-Root Carry Select Adder (SQRT CSLA) has five groups of different-size RCAs. Only the delay evaluation of group 2 is discussed:
• Group 2 has two sets of 2-bit RCAs. The regular CSLA structure (Mitra and Dutta, 2012) has two Ripple Carry Adders (RCA). One RCA is used with an initial carry Cin = 0 and the other with a carry Cin = 1.

Table 3: Implementation result
*: Total power = Leakage power + Internal power + Switching power

The maximum estimated areas of each group of the modified and regular SQRT CSLA are given in Table 1.