A 159 μW, Fourth-Order, Feedforward, Multi-bit Sigma-Delta Modulator for 100 kHz Bandwidth Image Sensors in 65-nm CMOS Process

A fourth-order, three-stage, feedforward cascade sigma-delta modulator (ƩΔM) for CMOS image sensor applications is realized in a low-leakage, high-threshold-voltage 65 nm standard CMOS process. A top-down CAD methodology is used for the design of the building blocks, which involves statistical and simulation-based optimization at different stages of the modulator. The multi-bit ƩΔ architecture employs an OTA sharing technique with a dual integrating scheme in the first stage, and gain-boosted pseudo-differential class-C inverters as OTAs in the remaining two stages, for low area and power consumption. The operation of the proposed ƩΔM is validated through post-layout simulations under worst-case conditions. The ƩΔM operates from a 1 V supply and offers a peak signal-to-noise ratio of 92 dB and a peak signal-to-noise-plus-distortion ratio of 89 dB for a signal bandwidth of 100 kHz. The overall power and estimated area consumed by the ƩΔM, including auxiliary blocks, are 159 μW and 101.2 mm², respectively.


Introduction
The development in ubiquitous computing and artificial intelligence over the last decade has led to a remarkable rise in the application of CMOS image sensors (CISs). The scaling down of CMOS technologies permits the implementation of large sensor arrays on the same die; therefore, the demand for low-power, compact analog-to-digital converters (ADCs) with moderate speed has increased. The main design challenges for the signal conditioning circuit of CISs are: 1) low power consumption, 2) miniature size, 3) immunity to noise and 4) the signal should be processed in a stable state before being sent to the telemetry system [1]. The conceptual block diagram of a CIS with the column ADC is shown in Fig. 1. A CIS consists of a pixel array, column-parallel readout circuitry, a row decoder, biasing circuits, buffer memory and a correlated double sampling (CDS) circuit [2]. Sigma-delta modulators (ƩΔMs) are usually employed as ADCs because of their high resolution at low frequencies. However, the use of multiple operational transconductance amplifiers (OTAs) in ƩΔMs makes them bulky and power hungry. Therefore, the key to designing low-power, small-size ƩΔMs is to improve the OTAs.
The constraint imposed by the threshold voltage V_TH on scaling and low power consumption has led to the development of many low-voltage design techniques, such as level shifting [4], floating-gate (FG) metal-oxide-semiconductor transistors (MOSTs) [5], sub-threshold MOSTs [6] and bulk-driven (BD) MOSTs [7]. Other widely used techniques for low-power, compact ƩΔMs are OTA sharing between two stages [8], [9], and using inverters as OTAs [10][11][12][13][14][15][16][17]. However, these techniques limit the dynamic range of ƩΔMs and contribute more noise. In [13], a ƩΔM is introduced which employs inverters operating near the threshold voltage instead of conventional OTAs. This modulator provides good performance but consumes more current. In [14], a class-C inverter is used instead of OTAs for a low-voltage, low-supply incremental ƩΔM. For low power consumption and small static current, the transistors in the inverter are operated in the sub-threshold region. Due to the low dc gain of class-C inverters, the ƩΔM suffers from low performance, non-linearities and leakage. In [15], a high-threshold-voltage transistor inverter is employed to improve the performance of the ƩΔM. The leakage issues are reduced using switches with charge protection and re-arranged reference signal schemes. In order to improve the signal-to-noise ratio (SNR) of the ƩΔM, a gain-boosted class-C inverter is employed in [16]. The gain-boosting technique raises the gain of the traditional class-C inverter to 83 dB, but degrades the ƩΔM performance in high-speed CIS applications. This issue can be resolved using a reset clock with small-offset class-C inverters. A behavioral model of the ƩΔM is also introduced in [16], which needs further development for better accuracy. In [18], a ƩΔM using a discrete-time (DT) passive loop filter is reported, which gives acceptable performance with low power consumption. However, the use of large capacitors in the loop filter increases the overall size. In [19], a DT ƩΔM uses the bulk-driven technique for implementing the OTA, but shows degraded performance in terms of signal-to-noise-plus-distortion ratio (SNDR) and dynamic range (DR).
This paper presents a fourth-order cascade (2-1-1), 3-bit, feed-forward (FF) ƩΔM with a dual integrating scheme (DIS), implemented in 65 nm CMOS technology at a supply voltage of 1 V. Based on the frequency range of CIS signals, the proposed ƩΔM is designed for a 100 kHz signal bandwidth; however, it can be used for lower frequencies with minor adjustments. The design of the ƩΔM relies on exhaustive behavioral modeling that involves both statistical and simulation-based optimization at the subsystem level. The rest of the paper is organized as follows. Section 2 discusses the architectural considerations, the trade-offs related to the ƩΔM specifications using canonical equations, and a detailed top-down behavioral modeling for block-level specifications. In Sec. 3, the circuit-level implementation of the ƩΔM is presented, and its operation is validated through post-layout simulation results presented in Sec. 4. Lastly, Section 5 gives the conclusion of the paper.

ƩΔ Modulator System Level Design Considerations
For higher resolution and speed of ƩΔ converters, the oversampling ratio (OSR) should be small to restrict the clock speed and hence the bandwidth of the integrators [20]. Single-loop, one-bit ƩΔ converters exhibit good accuracy at higher filter orders. Unfortunately, an increase in filter order results in stability issues at the modulator output, in the form of low-frequency oscillations and large amplitudes, which deteriorate the modulator's SNR [21], [22]. Cascaded topologies achieve higher-order noise shaping while relying on second-order modulators for better stability [21]. These topologies demand stringent block-level specifications to reduce noise leakage at the input of the modulator, which makes them power hungry and area consuming. To enhance the DR of the ƩΔM, the resolution of the embedded quantizers is increased. A multi-bit quantizer reduces the in-band quantization noise power by roughly 6 dB for every additional bit [21]. In contrast to single-bit quantizers, multi-bit quantizers add complexity to the design through additional analog circuitry. The proposed modulator employs a cascaded multi-bit ƩΔ topology to achieve a high DR with a low OSR. The 2-1-1, 3-bit ƩΔM utilizes the OTA sharing technique with DIS in its first loop for low power and area. The use of a 3-bit quantizer improves the overall accuracy by 12 dB compared to a single-bit quantizer.
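The 6 dB-per-bit rule quoted above can be checked with a short numerical sketch. It uses the textbook expression for the in-band quantization noise of an ideal L-th order modulator and assumes, as a simplification, a quantizer step of FS/2^B; the function name and the 1 V full scale are illustrative choices, not values taken from the design flow.

```python
import math

def inband_quantization_noise_db(order, osr, bits, full_scale=1.0):
    """In-band quantization noise power (dB re 1 V^2) of an ideal modulator.

    Assumes white quantization noise of power delta^2/12 shaped by an
    ideal (1 - z^-1)^order noise transfer function.
    """
    delta = full_scale / (2 ** bits)  # assumed step size (approximation)
    white_power = delta ** 2 / 12.0
    shaping = math.pi ** (2 * order) / ((2 * order + 1) * osr ** (2 * order + 1))
    return 10.0 * math.log10(white_power * shaping)

# Fourth-order modulator at OSR = 32, single-bit versus 3-bit quantizer
one_bit = inband_quantization_noise_db(order=4, osr=32, bits=1)
three_bit = inband_quantization_noise_db(order=4, osr=32, bits=3)
improvement_db = one_bit - three_bit  # two extra bits, about 6 dB each
```

Under this approximation the two additional bits lower the in-band noise by about 12 dB, in line with the figure quoted in the text.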

ƩΔM Architecture Selection
The architecture and the block-level specifications of the Σ∆M are decided by behavioral modeling of the modulator [21][22][23]. In this paper, an optimization-based CAD synthesis tool, SIMulink-based SIgma-DElta Simulator (SIMSIDES) [21], is used for developing the topology for the given specifications. The architecture is chosen from the cascade topologies of ƩΔM, based on the $2\text{-}1^{L-2}$ relation, where L is the modulator order. The blocks of the cascaded topology are generally described by three parameters: quantizer resolution (B), OSR and L. Once these parameters are found, Schreier's MATLAB Delta-Sigma toolbox [24] is used for finding the suitable topology. The in-band error power (IBE) of the ƩΔM is expressed as follows [21]:

$$\mathrm{IBE} = P_{CN} + P_{Q} + P_{nl} + P_{st} \qquad (1)$$

where P_CN, P_Q, P_nl and P_st are the in-band error powers of circuit noise, quantization error, non-linearity errors and settling errors, respectively. The ƩΔM is designed in such a way that: Moreover, as the signal bandwidth is moderate, the OSR can be more flexible [21]. Based on (2), a fourth-order, 2-1-1 topology is the best fit. The detailed behavioral block diagram of the cascaded 2-1-1 multi-bit ƩΔM for 16-bit resolution is shown in Fig. 2, in which the scaling factors of the in-loop integrators are denoted by a_i, b_i, c_i, where i = 1, 2, 3, .... The first loop acts as a second-order Σ∆M, followed by the second and third stages as first-order Σ∆Ms. The SNR of the Σ∆M is further improved by replacing the single-bit quantizer of the third stage by a 3-bit quantizer. For the proper operation of the Σ∆M, the following equations must be satisfied [21].
where K_q is the gain of the quantizer. The coefficients are chosen to limit the output of the integrators to within 10% to 80% of the supply voltage when the Σ∆M is not overloaded. The Σ∆M can be considered as a two-port system with inputs (x, e) and output (y), which can be represented in the Z-domain by:

$$Y(z) = STF(z)\,X(z) + NTF(z)\,E(z)$$

where X(z) and E(z) are the Z-transforms of the input signal and the quantization noise, respectively, and STF(z) and NTF(z) are the signal transfer function and the noise transfer function. The first three scaling factors are chosen arbitrarily and the others are calculated to map the corresponding STF(z) and NTF(z) in the Z-domain. The overall transfer function of Σ∆M is given by:

Block-Level Specifications (High Level Sizing)
After the architecture of the modulator is decided, the given specifications (resolution and signal bandwidth) are mapped onto the electrical specifications of the different sub-circuits, such as amplifiers, switches and comparators, and of passive elements such as resistors and capacitors. The behavioral model of the ƩΔM for 16-bit resolution with a 100 kHz signal frequency is implemented in SIMSIDES and is simulated for a time (N - 1)T_s, where T_s is the sampling time and N is the number of levels.
The model developed includes all the non-idealities associated with the quantizer, such as those of switched-capacitor (SC) circuits and comparators. To determine the DC gain, slew rate, output swing and maximum output current of the OTAs, the DC gains of the OTAs are swept against each other for the desired SNR. Figure 3 shows the 3-dimensional plots of SNR as a function of the DC gains of the different OTAs, where A0, A1 and A2 represent the DC gains of Integrator 0, Integrator 1 and Integrator 2, respectively, and Ain is the input signal amplitude (in volts). Based on the results obtained from the behavioral model of the cascaded 2-1-1, 3-bit Σ∆M, the block-level specifications are summarized in Tab. 1. The loop coefficients of the Σ∆M, determined from the capacitor ratios, are given in Tab. 2.

Proposed Σ∆ Modulator
A fourth-order cascade 2-1-1 FF Σ∆M composed of a second-order FF Σ∆M and two first-order Σ∆Ms is proposed, as shown in Fig. 4. The benefits of employing FF at the system level are: 1) the signal transfer function (STF) is unity; 2) the building blocks are less sensitive to non-idealities; 3) the internal signal swing is reduced; 4) the overload level is improved, thus improving the DR; and 5) the complexity of the Σ∆M is reduced [25]. The internal swing is further reduced by employing a 3-bit quantizer, thus relaxing the gain requirements of the OTAs.
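With a unity STF, the modulator output is the input plus quantization noise shaped by the NTF. The following sketch evaluates the magnitude of an ideal fourth-order NTF, (1 - z^-1)^4, which is an assumption made here for illustration; the NTF actually realized depends on the loop coefficients and circuit non-idealities. The 6.4 MHz sampling frequency and 100 kHz band edge are the values quoted in the paper.

```python
import cmath
import math

def ntf_magnitude(freq_hz, fs_hz, order=4):
    """Magnitude of the ideal noise transfer function (1 - z^-1)^order."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs_hz)
    return abs((1 - z_inv) ** order)

fs = 6.4e6                                    # sampling frequency from the paper
gain_at_band_edge = ntf_magnitude(100e3, fs)  # strong attenuation in the signal band
gain_at_nyquist = ntf_magnitude(fs / 2, fs)   # noise is pushed to high frequencies
attenuation_db = 20 * math.log10(gain_at_band_edge)
```

The quantization noise is attenuated heavily inside the 100 kHz band and amplified (by 2^4 = 16 for this ideal NTF) at half the sampling frequency, where the decimation filter removes it.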
A fully differential switched-capacitor (SC) implementation is used for the Σ∆M because of its large DR and immunity to surrounding noise. It is clocked by two non-overlapping phases ɸ_1 and ɸ_2, together with their delayed versions (ɸ_1d and ɸ_2d) for the reduction of charge injection effects in the SC circuits. During ɸ_1, the input signal is sampled on the sampling capacitor (C_1), and in phase ɸ_2 the charge is transferred to the integration capacitor (C_2) for integration. Symmetrical voltage references +V_ref and -V_ref, where +V_ref = 1 V and -V_ref = 0 V, are used to minimize the effect of the feedback levels on the DR of the modulator. The switches are implemented using CMOS transmission gates. The ON-resistance of a CMOS transmission gate guarantees rail-to-rail operation as long as V_DD - V_SS > V_TN + V_TP. The nMOS and pMOS transistors are sized for a small on-resistance to limit the harmonic distortion of the Σ∆M.
Tab. 3. Clock signal representation in Fig. 4.
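The sample-then-transfer operation described above can be summarized by a simple behavioral model. This is a minimal sketch of an ideal delaying SC integrator, y[n] = y[n-1] + (C_1/C_2) x[n-1]; the optional `leak` term is a hypothetical stand-in for finite OTA gain, and the capacitor values in the usage line are illustrative rather than taken from the design.

```python
def delaying_sc_integrator(samples, c_sample, c_int, leak=0.0):
    """Behavioral model of a delaying switched-capacitor integrator.

    Each input is sampled on c_sample in phase 1 and its charge is
    transferred to c_int in phase 2, so it appears at the output one
    clock period later. leak = 0 gives the ideal integrator.
    """
    gain = c_sample / c_int
    v_out = 0.0
    outputs = []
    for v_in in samples:
        outputs.append(v_out)                       # output before this transfer
        v_out = (1.0 - leak) * v_out + gain * v_in  # charge transfer in phase 2
    return outputs

# Constant 1 V input with an assumed capacitor ratio C1/C2 = 1/4
ramp = delaying_sc_integrator([1.0, 1.0, 1.0, 1.0], c_sample=1.0, c_int=4.0)
```

For a constant input the output ramps by C_1/C_2 per clock period, starting one period after the first sample.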
The Σ∆M employs three non-inverting, parasitic-insensitive delaying switched-capacitor integrators (SCIs) to reduce the double settling problem. The two SCIs used in the first loop of the Σ∆M are merged into a single SCI using the opamp sharing technique, thus reducing the overall area and power. As shown in Fig. 4, the first and second integrators are represented by the upper and lower sides of the shared integrator. The use of a shared opamp affects the linearity of the Σ∆M due to the residual charge stored on the input parasitics of the OTA [9]. However, due to the high performance of the shared opamp, it does not suffer from the residual charge.
The integrators used in the proposed Σ∆M employ sampling capacitors (C_i,a/b, where i = 1, 2, 3, ...) and switches to perform double sampling (DS) of the input analog signal, as shown in Fig. 4. The sampling and integration operations are performed using slow, time-interleaved versions of the clock signals ɸ_1 and ɸ_2 (ɸ_S1 = ɸ_1/2 and ɸ_S2 = ɸ_2/2). However, the DAC circuit employed in the feedback path consists of single sampling capacitors and switches operating at the nominal sampling frequency (ɸ_1). As the most critical blocks of the Σ∆M operate at ɸ_1/2, the GBW product and gain requirements of the OTAs are relaxed compared to conventional OTAs, therefore reducing the power consumption. For higher linearity of the Σ∆M, a memory-less return-to-zero scheme is used for the 1-bit feedback DAC [21]. The different clock signals, along with their non-overlapping signals, represented in Fig. 4 correspond to the signals given in Tab. 3.
Instead of using conventional opamps for A2 and A3, a pseudo-differential class-C inverter (PDI) with gain boosting is used as the amplifier in the SC circuits. In contrast to a conventional opamp, the PDI provides no virtual ground because it has only a single input. Instead, the input node of the inverter is kept near the offset voltage (V_off) by forming a closed loop as follows: where V_inv is the input voltage of the inverter, A_inv is the inverter dc gain, and V_C1 is the voltage across capacitor C_1.
During phase ɸ 2 , the charge transferred through , where V_1 is the input signal. An auto-zeroing technique can be employed to cancel the offsets by forming a virtual ground. The gain-boosted PDI configuration of the SCI avoids the requirement of common-mode feedback (CMFB) circuits at low supply voltages [26]. During phase ɸ_1, the CMFB capacitor (C_M) is discharged to the signal ground level, whereas in phase ɸ_2, C_M is charged to the common-mode voltage (V_CM). The CMFB loop is realized by applying the difference between V_CM and signal ground to the integrator.

Circuit Level Implementation
The Σ∆M is generally integrated on a chip surrounded by thousands of transistors, resulting in leakage issues, increased power consumption and harmonic distortion [27]. A low-leakage, high-threshold-voltage (LL_HVT) transistor technology is used instead of standard-performance (SP) transistors. The LL_HVT devices exhibit less leakage current than the SP 65 nm CMOS package. From Tab. 2, a convenient topology for each sub-circuit, i.e., OTA, comparator, switches and passive elements, is chosen to meet the specifications at the circuit level. The selected circuit topologies are analyzed, and the impact of temperature variations, technology corners and supply voltage is taken into consideration. For the correct operation of the Σ∆M circuit, the worst-case performances of the different sub-blocks are considered.
The operation of sub-circuits of proposed Σ∆M is discussed as follows.

Opamps
The total in-band error power contributed by A2 is attenuated in the signal band by the gain of the front-end integrator (A0) [21]. Therefore, the performance of A0 is more demanding than A1. Thus, the power consumption can be reduced by designing A2 with relaxed specifications. A0 is implemented using a fully differential, three-stage opamp [28] for a low supply voltage (1 V), as the noise constraints are easily met by its output swing (>70% of V_DD) with a switched-capacitor common-mode feedback (SCCMFB) circuit. The schematic of the three-stage opamp (A0) is shown in Fig. 5. The common-source amplifiers used in the second and third stages do not limit the output current to the bias current, and provide a high slew rate (SR) with low static power consumption. The compensation devices R_C1, R_C2 and C_C1, C_C2 are sized so that a phase margin of at least 70° is achieved during the integration phase.
During the second phase, the loop gain is increased by a factor of (1 + C_1a/C_I) and the load capacitance increases. The robustness of A0 to mismatch and process variations is analyzed using a Monte Carlo simulation over 1000 runs (3-sigma interval). A0 has a DC gain of 76 dB, a 72° phase margin and a 202 MHz GBW product, and consumes a power of 85 µW. Figure 6 shows the AC performance of A0 with a capacitive load of 1 pF.
Due to the relaxed specifications, the remaining OTAs (A2 and A3) are realized using gain-boosted PDIs. For the sake of simplicity, the gain-boosting circuits are not discussed here [16]. For a higher dc gain and GBW product, the inverter is operated at the boundary of the triode and saturation regions, which is realized by using LL_HVT transistors whose collective threshold voltage (V_TN + V_TP) equals the supply voltage [29].
The operation of the class-C inverter is divided into three stages, shown in Tab. 4. In phase ɸ_1, both transistors operate in the deep triode region, forming a feedback loop with the input offset voltage (V_X). At the beginning of phase ɸ_2, V_X changes to (V_OFF - V_1) and one of the transistors of the inverter operates in the saturation region while the other operates in the deep triode region, depending on V_DD. Due to the negative feedback, the charge is transferred through C_1, making V_X = V_OFF again. At the completion of phase ɸ_2, both transistors operate in the deep triode region. The inverter provides a large dc gain when operated in the deep triode region, and a higher slew rate with small static current is achieved when either of the transistors works in the inversion region. As the class-C inverter has a low short-circuit current, the settling time is reduced by ~70% without increasing the static current.
The AC performance of A2, including process variation and component mismatches, is shown in Fig. 7. A2 has an average DC gain of 48 dB, a phase margin of 87° and a GBW of 78 MHz. The overall transistor sizing and electrical performances of both A0 and A2 are summarized in Tab. 5 and Tab. 6, respectively. Minimum-length transistors are avoided to reduce flicker noise and mismatch effects.
Tab. 6. Simulation results for the A0 and A1.

Comparator
Most of the non-idealities associated with comparators are suppressed by the noise shaping of the loop filters. The design specifications of the comparator are obtained from Tab. 1. Hysteresis and offset can be tolerated, but the comparison time must stay within one quarter of the clock period (clock frequency of 6.4 MHz) [21]. In order to attain the required resolution time and hysteresis, a single-bit quantizer, shown in Fig. 8, is realized using a conventional dynamic comparator. The comparator offers small static power dissipation and high input impedance, and is immune to noise and mismatch effects [30]. The operation of the comparator is depicted in Tab. 7.
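As a rough timing check, the sampling frequency and the comparator's time budget can be estimated as follows. The sketch assumes an OSR of 32 (the value stated in the Conclusions), a sampling frequency of 2 × OSR × BW, and reads the comparator requirement as one quarter of the clock period, which is an interpretation; the variable names are illustrative.

```python
signal_bandwidth_hz = 100e3   # signal bandwidth from the paper
osr = 32                      # oversampling ratio stated in the Conclusions

# Sampling frequency: Nyquist rate (2 * BW) multiplied by the OSR
fs_hz = 2 * osr * signal_bandwidth_hz

# Clock period and the assumed comparator budget of one quarter period
clock_period_s = 1.0 / fs_hz
comparison_budget_s = clock_period_s / 4.0

print(f"fs = {fs_hz / 1e6:.1f} MHz")
print(f"comparison budget = {comparison_budget_s * 1e9:.2f} ns")
```

Under these assumptions the sampling frequency matches the 6.4 MHz clock quoted in the text, and the comparator has roughly 39 ns to resolve.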
The total delay (t_delay) of the comparator is given by the expression in [30]. The transient response of the comparator is shown in Fig. 9. The transistor sizing and the electrical results of the comparator are summarized in Tab. 8 and Tab. 9, respectively.

Clock Generator
The non-overlapping clocks ɸ_1 and ɸ_2 are important for the optimal operation of the SC Σ∆M. In order to reduce the effect of clock feedthrough, delayed versions of ɸ_1 and ɸ_2 (ɸ_1d and ɸ_2d) are applied to the switches at the input terminals of the modulator [20]. The schematic of the clock generator with the clock phase schemes is shown in Fig. 10. To avoid capacitive loading of the different signals, all clock signals are buffered. Figure 11 shows the switching characteristics of the generated clock signals. The phase delay and non-overlapping time are 497 ps and 240 ps, respectively.

3-bit Quantizer
At the end of the third stage of the modulator, a 3-bit quantizer is implemented to digitize the A3 output and convert it back to the analog domain. The 3-bit quantizer, shown in Fig. 11, uses a differential flash quantizer with a resistor ladder DAC [21].

Results and Discussion
The 2-1-1, 3-bit Σ∆M is implemented in a 65 nm standard CMOS process, with an estimated area of 101.2 mm², excluding input/output pads, as shown in Fig. 12. The chip layout has separate analog, digital and mixed supplies, and every section is surrounded by guard rings. Major attention is given to the area of the SCIs, digital cells and other auxiliary circuits. Although the Σ∆M is fully differential, layout techniques such as common-centroid placement, symmetry and dummy transistors are used to reduce common-mode interference. Both the digital signals (DAC control and clock signals) and the analog supplies are routed using buses that surround the critical analog blocks to shield them from noise. The Σ∆M has a power consumption of 159 µW, including the band gap reference (BGR) and clock generators. The distribution of power and area consumed by the major parts of the Σ∆M is shown in Fig. 13 and Fig. 14, respectively.
The performance of the Σ∆M is evaluated under worst-case conditions through multiple post-layout simulations at the transistor level in the CADENCE environment. The 65536-point fast Fourier transform (FFT) spectrum of the ƩΔM with a pre-amplifier of 10 dB gain and the result summary are given in Fig. 15 and Tab. 10, respectively. Figure 16 shows the SNR and SNDR versus the normalized input amplitude. To measure the SNDR and SNR of the modulator effectively, the input amplitude was increased in 10 dB steps from −85 dB to −10 dB, then in 1 dB steps from −10 dB to 0 dB to obtain more detailed data. With an input sinusoidal signal of 51.1 kHz for a signal bandwidth of 100 kHz, the Σ∆M offers an SNR and SNDR of 92 dB and 89 dB, respectively. The effective number of bits (ENOB) is equal to 14.49, given by (SNDR − 1.76)/6.02. The clock frequencies of 3.1 MHz and 6.4 MHz are supplied using on-chip clock generators.
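The ENOB figure follows directly from the SNDR through the standard conversion for an ideal quantizer driven by a full-scale sine wave. A minimal check, with an illustrative function name:

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB: (SNDR - 1.76) / 6.02."""
    return (sndr_db - 1.76) / 6.02

enob = enob_from_sndr(89.0)  # peak SNDR reported for the modulator
print(f"ENOB = {enob:.2f} bits")
```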
Based on Tab. 11, the variation of DR with BW and of BW with FOM_1 are shown in Fig. 17 and Fig. 18, respectively. It is concluded that the presented Σ∆M has an overall FOM higher than those of related recent Σ∆Ms.
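The definition of the figure of merit is not given in this section, so the sketch below evaluates two commonly used definitions, the Walden FOM (energy per conversion step) and the Schreier FOM (in dB), using the reported power, bandwidth, SNDR and ENOB. These may differ from the FOM_1 used by the authors, and the function names are illustrative.

```python
import math

def walden_fom_joule_per_step(power_w, enob_bits, bandwidth_hz):
    """Walden FOM: P / (2^ENOB * 2 * BW), in joules per conversion step."""
    return power_w / (2 ** enob_bits * 2 * bandwidth_hz)

def schreier_fom_db(sndr_db, bandwidth_hz, power_w):
    """Schreier FOM: SNDR + 10 * log10(BW / P), in dB."""
    return sndr_db + 10 * math.log10(bandwidth_hz / power_w)

power_w = 159e-6      # reported power consumption
bandwidth_hz = 100e3  # reported signal bandwidth
fom_w = walden_fom_joule_per_step(power_w, enob_bits=14.49, bandwidth_hz=bandwidth_hz)
fom_s = schreier_fom_db(sndr_db=89.0, bandwidth_hz=bandwidth_hz, power_w=power_w)
```

A lower Walden FOM and a higher Schreier FOM both indicate better energy efficiency, so comparisons across designs should state which definition is used.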

Conclusions
In this paper, a 2-1-1, 3-bit FF Σ∆M employing DIS in the first stage is realized in a 65 nm standard CMOS process with a 1 V power supply for CIS applications. The Σ∆M oversamples an input signal of 100 kHz bandwidth with an OSR of 32. Due to the OTA sharing in the first loop and the use of gain-boosted, pseudo-differential class-C inverters for the remaining OTAs, the Σ∆M achieves low power and area consumption. The Σ∆M achieves an ENOB of 14.49, an SNR of 92 dB and an SNDR of 89 dB, while consuming an area and power of 101.2 mm² and 159 µW, respectively. The post-layout results confirm that the presented Σ∆M can be used in various low-power, high-resolution CIS applications.

Fig. 3. (a) Variation of SNR with DC gain of A0 and input signal amplitude.

Fig. 5. A fully differential 3-stage opamp with SCCMFB circuitry used at the first stage.

Fig. 10. Schematic of clock generator with the clock phase schemes.

Fig. 11. 3-bit quantizer.
The differential flash ADC compares the integrator (A2) output with the voltages generated at different resistors in the resistor ladder. The quantizers employ the same comparators discussed above. The thermometer code from the comparator outputs is converted into a 1-of-8 code (d_0-7), which controls the resistor ladder DAC. The resistor ladder uses 8 resistors connected between V_DD and gnd, thus giving a full scale of 1 V with a current consumption of 35 µA.
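The thermometer-to-1-of-8 conversion can be illustrated with a simplified behavioral model. The sketch below is single-ended (the actual quantizer is differential) and assumes equal resistors, so that the 8-resistor ladder provides 7 reference taps at k/8 of V_DD and one comparator per tap; the function name is hypothetical. The total ladder resistance is derived here from the quoted 1 V full scale and 35 µA current and is not a value stated in the paper.

```python
def flash_quantize(v_in, v_dd=1.0, n_resistors=8):
    """Single-ended model of a flash quantizer with a resistor ladder.

    Returns the thermometer code (one bit per comparator) and the
    corresponding 1-of-N code that would select the DAC level.
    """
    # Reference taps between the equal resistors: V_DD * k / N, k = 1..N-1
    taps = [v_dd * k / n_resistors for k in range(1, n_resistors)]
    thermometer = [1 if v_in > tap else 0 for tap in taps]
    level = sum(thermometer)  # number of comparators that tripped
    one_of_n = [1 if i == level else 0 for i in range(n_resistors)]
    return thermometer, one_of_n

thermometer, one_of_8 = flash_quantize(0.55)

# Derived arithmetic: total ladder resistance for 1 V across the ladder at 35 uA
ladder_resistance_ohm = 1.0 / 35e-6
```

For a 0.55 V input, the four lowest comparators trip, and the 1-of-8 code selects level 4 of the ladder DAC.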