A 3.5 GS/s 1-1 MASH VCO ADC With Second-Order Noise Shaping

In this work, a 3.5 GS/s voltage-controlled oscillator (VCO) analog-to-digital converter (ADC) using multi-stage noise shaping (MASH) is presented. This 28nm CMOS ADC achieves second-order noise shaping in an easily-scalable, open-loop configuration. A key enabler of the high-bandwidth MASH VCO ADC is the use of a multi-bit estimated error signal. With an OSR of 16, an SNDR of 67 dB and DR of 68 dB are achieved in 109.375 MHz bandwidth. The full-custom pseudo-analog circuits consume 9 mW, while the automatically generated digital circuits consume another 24 mW. A $\mathbf{FoM_{DR} = 163}$ dB and core area of $\mathbf{0.017\,\mathbf{mm}^2}$ are obtained.


I. INTRODUCTION
Over the past few years, ring oscillator-based voltage-controlled oscillator (VCO) analog-to-digital converters (ADCs) have gained popularity [1], [2]. They offer a multitude of interesting properties, such as inherent anti-aliasing, guaranteed monotonicity, a very low intrinsic noise level and a high sensitivity. Even more enticing is their ability to leverage CMOS technology scaling without the challenges that traditional analog circuits face. This has led to their adoption in a wide range of fields, including sensor readouts, IoT and biomedical applications [3]-[20].
An important next step to improve the performance of VCO A/D conversion is increasing the noise shaping order at high bandwidths (>100 MHz). Multiple higher-order VCO ADC architectures have already been proposed and implemented. However, these options are often not well suited to our goal of creating an easily scalable, high-speed ADC.
For instance, designs such as [11]-[14], [21]-[24] reach impressive performance. However, in addition to VCOs, they also depend on loop filters using OTAs and are therefore not as easily scalable, which counters one of the principal advantages of VCO-based A/D conversion. Higher-order continuous-time Σ∆-modulators using only VCOs have also been presented [7], [25]-[28], but these can be challenging to design for high bandwidths due to the trade-off between performance and stability linked to excess loop delay.
We therefore believe 1-1 multi-stage noise shaping (MASH) VCO ADCs to be a promising option for high bandwidths, as they enable purely VCO-based higher-order A/D conversion without this trade-off. Although this technique has been implemented for bandwidths of up to 2 MHz [29], to date, no successful designs using this technique at high bandwidths have been presented. The main difficulty is realizing a sufficiently accurate coupling between the successive stages.
The authors are with the Department of Electronics and Information Systems (ELIS), Ghent University, 9052 Gent, Belgium (e-mail: brendan.saux@ugent.be).
In this work, the coupling between the two stages is performed by using a multi-bit estimated error signal, which allows multi-phase readout of the first stage. The potential issue of nonlinearity in the second stage, which leads to noise leakage, is solved by the use of a pseudo-differential setup using cross-coupling. We will show that this method also offers an increased robustness against pulse-width errors. Special care was taken to maintain the integrity of the estimated error signal by the use of dedicated buffers to pull the VCO output signal rail-to-rail and the use of fast sense amplifiers in the error estimation circuit to reduce metastability.
This results in the first implementation of a higher-order purely VCO-based ADC with a bandwidth greater than 100 MHz. This is also the first implementation of a 1-1 MASH VCO ADC featuring multi-bit error estimation.
The rest of this paper is organised as follows. In section II, the implemented architecture using the multi-bit estimated error signal is presented and analyzed at the system level. In section III, the issue of nonlinearity in the second VCO ADC stage is discussed along with its system-level solution. Section IV details the circuit-level implementation and design considerations. Section V contains measurement results and explains the calibration procedure to suppress harmonics. Finally, section VI concludes this work.

II. MASH VCO A/D CONVERSION
In the following sections, explicit references to time and frequency variables in the text will be omitted for readability, e.g., $E$ will be used instead of $E(s)$.

A. Multi-Bit Error Estimation
This work builds on [30], where it was shown that a single-stage VCO ADC is strictly equivalent to a pulse frequency modulator (PFM) followed by a pulse shaping filter and a sampler. A few of the insights obtained from this model will be essential to the understanding of the implemented architecture. We will summarize these here first.
Fig. 2: Conceptual illustration of the performance degradation due to presence of the spurs of the first PFM sideband in the signal bandwidth [30].
A model of a simple VCO ADC with relevant in-band signals is shown in Fig. 1. For simplicity, the VCO frequency gain $K$ was normalized to 1 and the contribution of the VCO rest frequency $f_0$ was omitted. An input signal $V_{in}$ is applied to the system and is subsequently integrated, sampled and differentiated to obtain the output signal $D$. Two noise sources are present in the system: $M_{LF}$ represents the in-band PFM components and $U_{al}$ represents the aliased high-frequency PFM components that enter the bandwidth through sampling. $U_{al}$ is what is commonly referred to as the quantization noise in the phase model interpretation of VCO A/D conversion. However, $M_{LF}$, which consists of spurious tones of the first PFM sideband, also degrades the performance of the system. Assuming the readout circuits use both the rising and falling edges of the VCO, the first PFM sideband is centered around the effective VCO rest frequency $f_{eff} = 2N_{\phi}f_0$ as shown in Fig. 2. Here $N_{\phi}$ represents the number of readout phases.
Two insights are now paramount. The first is that $M_{LF}$ cannot be removed from the system and therefore represents a fundamental limit [30]. The only way to reduce it for a given bandwidth is either by increasing $f_0$, which is only possible up to a certain point, or by boosting the number of readout phases $N_{\phi}$. It is therefore desirable to use a high $N_{\phi}$, especially at high bandwidths.
The second is that the 'quantization noise' $U_{al}$ is added for each VCO phase at the moment of sampling. Therefore, by subtracting a VCO output and its sampled and held value, the quantization noise added at the moment of sampling of that phase can be estimated. The total quantization noise in this VCO ADC stage, which is the sum of the quantization noise of each phase, can then be estimated by summing the estimated error signals of each phase. This multi-bit global error can be fed to the input of a second ADC stage. By combining the output of the two stages using the correct noise cancellation filters (NCF), $U_{al}$ of the first stage can be cancelled. This enables the efficient implementation of 1-1 MASH architectures using multi-phase readout in the first stage.
In contrast, almost all previously published VCO-based 1-1 MASH [31], [32] or sturdy MASH [16] ADC architectures use a single-phase readout in the first stage. These architectures will therefore be limited to low bandwidths to limit the effect of the low-frequency PFM spurs $M_{LF}$, since the first PFM sideband will be located at the much lower frequency of $2f_0$. Since this is solved in our architecture, we expect a substantially higher performance at high bandwidths. This is discussed in more detail after a description and analysis of our architecture.
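To make the relation $f_{eff} = 2N_{\phi}f_0$ concrete, the following minimal sketch evaluates the center frequency of the first PFM sideband for a few phase counts. The rest frequency of 1 GHz is a hypothetical value chosen only for illustration, and the function name is not taken from the paper.

```python
def effective_rest_frequency(n_phases: int, f0_hz: float) -> float:
    """Center of the first PFM sideband when both VCO edges are read out."""
    return 2 * n_phases * f0_hz


F0_HZ = 1.0e9  # hypothetical VCO rest frequency, for illustration only

for n_phi in (1, 4, 32):
    f_eff = effective_rest_frequency(n_phi, F0_HZ)
    print(f"N_phi = {n_phi:2d}: f_eff = {f_eff / 1e9:.0f} GHz")
```

With these assumed numbers, single-phase readout places the sideband at $2f_0$, whereas 32-phase readout moves it 32 times further away from the signal band.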

B. Initial 1-1 MASH VCO Architecture
Fig. 3 shows a first iteration of the implemented 1-1 MASH VCO architecture. In this, a 32-phase VCO readout is used, where each phase $W_i$ of the first stage is followed by a readout quantization/sampling/differentiation (QSD) block [33]. These QSDs generate the VCO phases' quantized 1-bit outputs $D_{1,i}$, which are afterwards combined to form the overall output $D_1$ of the first stage. The QSDs are modified to also perform the error estimation (see section IV-B for the circuit implementation) and generate 1-bit pulse signals $E_i$ which represent the quantization error. The 1-bit pulse signals each contain the quantization error of their respective VCO phase and can therefore be summed to obtain a global estimated error signal $E$ as explained above. Fig. 4 shows the relevant time-domain waveforms to obtain $E$, which is identical to what would be obtained using a multi-bit counter that counts the individual phase edges. Notice that the information in $E_i$ is contained in the individual pulse lengths.
Finally, each phase of the second stage is also followed by a QSD, which generates the 1-bit outputs $D_{2,i}$. These are combined to get the overall output $D_2$ of the second stage. In this initial setup, a rather straightforward way of driving the second stage is used. The estimated error signals $E_i$ are immediately summed to calculate $E$, which is fed to the second stage.
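The behaviour described above can be illustrated with a small idealized model. The sketch below is not the authors' Verilog-A testbench: it represents the VCO phase as an integer, assumes ideal phases that toggle at multiples of $\pi/N_{\phi}$, and uses a hypothetical sinusoidal input. It checks that the summed XOR outputs of the QSDs, and the summed error bits $E_i$ within a clock period, both equal what a multi-bit edge counter would give. All names and constants are illustrative.

```python
import math

N_PHI = 32    # readout phases of the first stage
UNITS = 1000  # integer fine steps per phase step pi/N_PHI (illustrative)
SUB = 8       # observation points per clock period for E(t)


def phase_level(u: int, i: int) -> int:
    """Logic level of phase W_i when the VCO phase equals u fine steps.

    Phase i toggles each time the VCO phase crosses (i + k*N_PHI) * pi/N_PHI.
    """
    return ((u - i * UNITS) // (N_PHI * UNITS)) % 2


def simulate(n_samples: int = 200):
    u = 0                                              # accumulated VCO phase
    held = [phase_level(u, i) for i in range(N_PHI)]   # W_ZOH,i
    floor_prev = u // UNITS                            # ideal edge counter
    d1, checks = [], []
    for n in range(n_samples):
        # Hypothetical input: 8 to 24 phase steps per clock (always < N_PHI)
        adv = int(UNITS * (16 + 8 * math.sin(2 * math.pi * n / 50)))
        for s in range(1, SUB + 1):
            u_t = u + adv * s // SUB
            # Error estimation: E_i(t) = W_i(t) XOR W_ZOH,i, summed over phases
            e = sum(phase_level(u_t, i) ^ held[i] for i in range(N_PHI))
            checks.append(e == u_t // UNITS - floor_prev)
        u += adv
        sampled = [phase_level(u, i) for i in range(N_PHI)]
        # QSD: sample each phase, differentiate with an XOR, then sum
        d1.append(sum(a ^ b for a, b in zip(sampled, held)))
        held, floor_prev = sampled, u // UNITS
    return d1, checks


d1, checks = simulate()
print(d1[:10], all(checks))
```

The equality only holds because the phase advance per clock stays below $N_{\phi}$ phase steps, so that every phase toggles at most once per period.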

C. Analysis and Theoretical Performance
We will now perform an analysis of the architecture of Fig. 3. To provide a more easily accessible discussion of this architecture, it is performed here using the more conventional phase model interpretation of VCO A/D conversion instead of the PFM model discussed above. However, it must be stressed that this is only valid if the number of readout phases in the first and second stage is high enough to be able to neglect the in-band PFM spurs $M_{LF}$.
A block diagram representation of the 1-1 MASH architecture using the phase model is shown in Fig. 5 (black). The free-running frequency $f_0$ was ignored for clarity. $N_{\phi1}$, $K_1$ and $Q_1$ respectively represent the number of phases, frequency gain and phase quantization error of the first stage. $N_{\phi2}$, $K_2$ and $Q_2$ represent the same concepts for the second stage.
The output of the first stage can be written as in [34] by (1), where the star operator $[\cdot]^*$ indicates the effect of sampling [35]. For simplicity, the $\mathrm{sinc}(sT_s)$ leading to anti-aliasing was assumed to be approximately equal to 1 in the signal band.
The two stages are coupled through the error estimator block (blue), which subtracts the VCO output $W_1$ and its sampled and held value, as expressed in (2). Hence the estimated error signal $E$ provides information on the quantization noise of the first stage. Since $E$ is the sum of $N_{\phi1}$ individual estimated error bits, the frequency gain $K_2$ of the second stage must be represented as the gain per individual bit, or $K_2 = f_{range,2}/N_{\phi1}$, where $f_{range,2}$ represents the frequency range of the second stage. The output of the second stage $D_2$ is again a digital signal and can be found analogously to (1). The total output signal $D$ can then be calculated by applying the correct NCFs to $D_1$ and $D_2$ and summing the results. In this sum, $G$ represents the noise cancellation filter gain, which should optimally be $G_{opt} = \frac{f_s}{2N_{\phi2}K_2}$. The resulting expression (4) shows the expected second-order noise shaping, i.e., the first-order shaped noise is cancelled.
Fig. 3: Initial MASH VCO architecture.
Fig. 5: Simplified block diagram of a MASH VCO ADC (black). The equivalent error $N_{NL}$ to model the effect of nonlinearity in the second stage (red) was also added.
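As a rough illustration of the cancellation argument, the following sketch writes out a textbook-style 1-1 MASH recombination under the phase model. It is not a reproduction of the paper's equations (1)-(4): the sign convention of $E$, the delays in the noise cancellation filters and the treatment of $E$ as a sampled quantity are simplifying assumptions made here, and the sampling operator and the sinc term are dropped.

```latex
% Hedged sketch; sign, delay and scaling conventions are assumptions.
\begin{align}
D_1 &\approx \frac{2N_{\phi1}K_1}{f_s}\,V_{in}
      + \frac{N_{\phi1}}{\pi}\,(1-z^{-1})\,Q_1 \\
E   &\approx -\frac{N_{\phi1}}{\pi}\,Q_1
      \qquad \text{(expressed in error bits)} \\
D_2 &\approx \frac{2N_{\phi2}K_2}{f_s}\,E
      + \frac{N_{\phi2}}{\pi}\,(1-z^{-1})\,Q_2 \\
D   &= D_1 + G\,(1-z^{-1})\,D_2 \\
    &\approx \frac{2N_{\phi1}K_1}{f_s}\,V_{in}
      + \Bigl(1-\frac{G}{G_{opt}}\Bigr)\frac{N_{\phi1}}{\pi}\,(1-z^{-1})\,Q_1
      + G\,\frac{N_{\phi2}}{\pi}\,(1-z^{-1})^{2}\,Q_2 ,
\qquad G_{opt} = \frac{f_s}{2N_{\phi2}K_2}
\end{align}
```

Under these assumptions, choosing $G = G_{opt}$ removes the first-order shaped $Q_1$ term and leaves only the second-order shaped $Q_2$, which is consistent with the expression for $G_{opt}$ given in the text.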
A theoretical estimate of the performance can then be calculated as for a Σ∆-modulator [36], similar to what was done in [34]. Due to the use of an XOR-based QSD, both rising and falling VCO edges are counted, allowing $Q_2$ to be modeled as white noise uniformly distributed over $[0, \pi/N_{\phi2}[$ [29]. We first find the signal power, in which $f_{range,1} = 2AK_1$ represents the frequency range of the first stage, with $A$ the amplitude of the input signal.
After integrating over the output-referred noise power spectral density, the in-band noise power can be calculated. Finally, the SQNR is found as (7). Though this expression seems to suggest that $N_{\phi1}$ does not influence the performance, this is only true if $N_{\phi1}$ is high enough to make the in-band PFM spurs negligible. Additionally, the assumption that the quantization noise $Q_2$ is distributed as white noise is not entirely accurate [30].
Fig. 6: Theoretical SQNR limit as a function of the OSR for a single-ended 1-1 MASH VCO ADC with second-order noise shaping (7) or a single-stage first-order VCO ADC [34], using the parameters from Table I.
Building on the results of the previous analysis, we can now compare the performance of a MASH VCO ADC to that of a single-stage VCO ADC. Fig. 6 presents a comparison between the theoretical SQNR of the MASH architecture as derived in (7), and the SQNR of a single-stage VCO ADC, as calculated in [34] and adapted for a QSD that counts both rising and falling edges. The parameters used for this are summarized in Table I. Realistic values for the frequency range $f_{range}$ and free-running frequency $f_0$ were used, which were based on our final sizing. For the single-stage case, the parameters from the first stage of the MASH were used. Note that this does not include thermal noise or circuit non-idealities, whose influence will be discussed further in this text.
As expected, the benefits of second-order noise shaping increase for higher OSRs, yet remain significant at lower OSRs. This highlights the potential of the MASH-based approach to deliver high performance at increased bandwidths.
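The OSR dependence can be illustrated with the standard high-OSR approximation for $(1-z^{-1})^L$ noise shaping of white quantization noise. The sketch below is generic Σ∆ arithmetic, not the paper's equation (7): it ignores the stage gains and the parameters of Table I, so only the slopes versus OSR are meaningful. Function names are illustrative.

```python
import math


def inband_noise_fraction(order: int, osr: float) -> float:
    """Fraction of white quantization-noise power that remains in band after
    (1 - z^-1)^order shaping, using the usual high-OSR approximation."""
    return math.pi ** (2 * order) / ((2 * order + 1) * osr ** (2 * order + 1))


def shaping_gain_db(order: int, osr: float) -> float:
    """In-band noise suppression in dB relative to the total noise power."""
    return -10 * math.log10(inband_noise_fraction(order, osr))


def bandwidth_hz(fs_hz: float, osr: float) -> float:
    """Signal bandwidth for a given sampling rate and oversampling ratio."""
    return fs_hz / (2 * osr)


print(bandwidth_hz(3.5e9, 16) / 1e6)  # 109.375 MHz, as quoted in the abstract
for osr in (8, 16, 32):
    print(osr, round(shaping_gain_db(1, osr), 1), round(shaping_gain_db(2, osr), 1))
```

Each doubling of the OSR improves the in-band noise by about 9 dB for first-order and about 15 dB for second-order shaping, which is why the advantage of the second-order architecture grows with the OSR.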

D. Comparison With Prior Art 1-1 MASH VCO ADCs
Previously published work on 1-1 MASH VCO ADCs falls in two categories. In the first category, briefly mentioned above, the coupling is simplified by performing error estimation on a single VCO phase [31], [32]. However, to obtain higher-order noise shaping, this limits these MASH ADCs to using only 1-phase readout ($N_{\phi1} = 1$) in the first stage, as the quantization error of any additional phases would otherwise not be cancelled. Using the proposed multi-bit error estimation scheme, multi-phase readout in the first stage is possible, leading to improved performance for high bandwidths due to the reduced influence of the in-band PFM spurs.
Fig. 7: Simulated SQNR of the single-ended 1-1 MASH VCO ADC for different values of $N_{\phi1}$. The case of $N_{\phi1} = 1$ corresponds to the architectures of [31], [32]. Figure annotations: as used in this work, 72 dB; theoretical performance according to (7), 75 dB.
To determine the advantages of our architecture, we simulated the 1-1 MASH architecture at the system level using ideal Verilog-A components for a varying $N_{\phi1}$ and a constant $N_{\phi2} = 32$. For these simulations, we again use the parameters listed in Table I, with the exception of a variable $N_{\phi1}$ ranging from 1 to 32. The results of these simulations are shown in Fig. 7.
Assuming a 2-stage design, the case of $N_{\phi1} = 1$ is equivalent to the systems using single-bit error estimation of [31], [32]. Due to the high bandwidth, the performance for this case is clearly severely limited by the presence of the low-frequency PFM spurs in the spectrum as explained above. As we move towards architectures with multi-bit error estimation, the influence of the PFM spurs decreases and the performance asymptotically approaches the SQNR expected from (7).
It could be argued that another readout structure could be more suitable for the $N_{\phi1} = 1$ architectures, as due to the QSDs used in our work, the maximal VCO frequency is limited to $f_s/2$ [37]. However, to match our performance, this would imply designing a VCO with a free-running frequency $f_0$ of approximately 32 GHz, which renders the subsequent high-speed counter design far from trivial [4]. The multi-phase approach for the first VCO therefore significantly reduces speed requirements and facilitates 1-1 MASH VCO A/D conversion at high bandwidths.
The second category is the architecture implemented in [29]. Here, multi-phase readout of the first stage is possible by using an approximate 1-bit estimated error signal which takes into account all phases of the first stage. The error is approximated as the delay between the rising edge of the clock and the first subsequent edge of any VCO phase. Due to the closed-loop operation of the first stage, this approximation was shown to be very accurate.
However, delay mismatches between the error estimation circuits in detecting the edges of the different VCO phases will lead to first-order shaped noise leaking into the output spectrum [29]. First explorations at higher bandwidths revealed that these delays become progressively more difficult to match, especially considering the relative complexity of the error estimation circuits used. Additionally, since the pulse lengths of $E$ can be very short in this architecture, a process similar to dead-zone elimination in PLLs is implemented. The error pulses are extended by XOR'ing each sampled phase with the third next phase. It is questionable whether this still works at high bandwidths, as this method seems to rely on the assumption that the VCO frequency of the first stage is constant. As the OSR decreases, this frequency will increasingly vary [29] and the pulse extension will no longer be constant. For these reasons, we believe this architecture is not ideal for high-bandwidth operation.

A. Impact of Nonlinearity in the Second Stage
In Fig. 8(a) the simulated post-layout frequency characteristic of our ring oscillator in the second stage is shown (blue). The frequency characteristic visibly exhibits mostly second-order nonlinearity. If a nonlinear ring oscillator is used in the second stage, harmonic distortion of $E$ will be introduced in the system. From (2), $E$ contains the broadband quantization noise $Q_1$ of the first stage. Distortion of $E$ will therefore add broadband noise which will not cancel out with the quantization noise of the first stage. This effect can be modeled as an added noise source $N_{NL}$ at the input of the second stage as shown in Fig. 5 (red). Equation (4) must then be extended with a term in $N_{NL}$. It is clear that this introduces a first-order noise shaped component in the output signal, whose effect on the performance needs to be investigated.
In order to quantify this, we performed simulations using ideal Verilog-A components with the same parameters as in the previous section. However, the ideal ring oscillator in the second stage was now replaced with its post-layout extraction and driven with ideal current sources, hence realistically modeling the nonlinearity of the second stage. The resulting SQNR is limited to 69 dB. While this may be a marginally acceptable result for the high-bandwidth design we present here, it leaves very little margin for additional noise sources. In light of this, we explored an architectural improvement, which is the subject of the next section. While for our design the improvement mostly serves to create more margin, the new technique is even more important for 1-1 MASH designs that aim for a higher SQNR or in cases where the second stage exhibits more nonlinearity.

B. Architecture With Cross-coupled Estimated Error Signals
VCO ADCs often use a pseudo-differential architecture to reduce even-order distortion. We apply this principle to both stages of the MASH VCO ADC. In a first step, we create a pseudo-differential first stage. This can simply be accomplished by adding a negative channel as shown in Fig. 9 (left). The output $D_1$ of the first stage is then calculated from the outputs of the two channels. In contrast with previously published 1-1 MASH VCO architectures [29], [31], [32], we also use a pseudo-differential setup for the second stage. This will reduce distortion of the broadband noise contained in $E$ and will therefore mitigate the effect analyzed above. To obtain a pseudo-differential second stage, we first add a negative channel and then propose to cross-couple the estimated error signals of the two channels of the first stage as shown in Fig. 9.
Simulations were performed for the cross-coupled architecture using the same parameters as above. The SQNR without nonlinearity is now increased to 75 dB due to the pseudo-differential operation. When performing simulations including nonlinearity, an SQNR of 74 dB is found, which only represents a reduction of 1 dB compared to the ideal case. The cross-coupling has therefore led to a reduced impact of nonlinearity in the second stage. Notice that there is still a small reduction in performance, as the pseudo-differential operation does not suppress odd-order distortion and some quantization noise leakage therefore still occurs.
The effect of cross-coupling is also visible in the equivalent frequency characteristic of the second stage after cross-coupling, which is shown in Fig. 8(b). This clearly demonstrates the linearized characteristic and the suppression of the dominant second-order nonlinearity compared to Fig. 8(a). The maximum nonlinearity was reduced from 18% to 3%.
An additional advantage of the cross-coupling is that the entire architecture now follows the guidelines outlined in [38], which recommends the use of a pseudo-differential setup for all ring oscillators to improve the power-supply rejection ratio (PSRR) and common-mode rejection ratio (CMRR). A final system-level advantage of the cross-coupling will be explained in section III-D, where it will be shown that using a pseudo-differential second stage leads to an increased robustness against pulse-width errors in the error estimation circuit.
Fig. 9: Final improved MASH VCO architecture with cross-coupling.
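The suppression of even-order nonlinearity by pseudo-differential operation can be seen in a toy model. The sketch below assumes two identical oscillators with a hypothetical polynomial tuning curve driven by $+x$ and $-x$; the coefficients are arbitrary and do not correspond to the post-layout characteristic of Fig. 8.

```python
def vco_frequency(x: float, a1: float = 1.0, a2: float = 0.1, a3: float = 0.02) -> float:
    """Hypothetical polynomial tuning curve; coefficients are illustrative only."""
    return a1 * x + a2 * x**2 + a3 * x**3


def pseudo_differential(x: float) -> float:
    """Difference of two identical oscillators driven with +x and -x."""
    return vco_frequency(x) - vco_frequency(-x)


def even_part(f, x: float) -> float:
    """Even-symmetric part of f at x; it contains all even-order terms."""
    return 0.5 * (f(x) + f(-x))


for x in (0.1, 0.5, 1.0):
    print(x, even_part(vco_frequency, x), even_part(pseudo_differential, x))
```

The even part of the single-ended curve equals the second-order term, while the even part of the differential curve vanishes. The third-order term survives, in line with the residual leakage noted in the text.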

C. Practical Implementation
As explained above, the second stage is operated pseudo-differentially by cross-coupling the estimated error signals $E_+$ and $E_-$ from the positive and negative channel of the first stage. In our practical implementation, the combination of the $E_i$ bits and the cross-coupling are performed in a natural way by driving the second stage with an array of differential current-steering current sources operated by the bits $E_i$ and $\overline{E}_i$ as shown on the right of Fig. 10. These current-steering sources are common in high-speed DACs and offer a highly improved dynamic behavior compared to single-ended alternatives [39]. The use of a pseudo-differential second stage allows for the efficient use of these current-steering sources, as all current used for the combination of the $E_i$ bits is also used to drive the ring oscillators of the second stage. To operate the current sources, the error estimation is expanded to generate both $E_i$ and its complementary signal $\overline{E}_i$. The bits $E_i$ and $\overline{E}_i$ are generated in parallel using symmetrical logic gates to obtain the same timing behavior, without requiring any further logic. The implementation for this is discussed in section IV-B.

D. Influence of Pulse Width Errors
As the information in $E_i$ is contained in the pulse widths, an important concern in 1-1 MASH VCO ADCs is the presence of pulse width errors in the error estimation circuit [29]. This issue can be understood as follows. Let us consider the signal $E_i$ and its complement $\overline{E}_i$. These are outputs of an error estimation circuit with a VCO phase $W_i$ and its sampled and held value $W_{ZOH,i}$ as inputs. The time-domain waveforms are shown in Fig. 11. Ideally, $E_i$ would go down at the rising edge of the sampled and held $W_{ZOH,i}$ and go up at the rising edge of $W_i$, as indicated. The inverse is true for $\overline{E}_i$. The ideal pulse width of these signals is shown in bold lines. However, realistically, the rising edges will be delayed and undergo a limitation on their rise time. We indicate this with an effective delay $t_{r,i}$. The delay and rate limitation of the falling edges will be modeled by a similar $t_{f,i}$. The effect of $t_{r,i}$ and $t_{f,i}$ is that a current proportional to $t_{f,i} - t_{r,i}$ will be added to the signal to the second stage during a sampling period. Moreover, every $W_i$ can have a different $t_{f,i} - t_{r,i}$, e.g., due to path length variations in the layout when connecting to the second stage. Other factors that can contribute to this include proximity effects and systematic variations due to temperature or stress gradients [40]. In the initial single-ended architecture, only $E_i$ is used as input of the second stage, which leads to an introduction of mismatch noise in the input of the second stage as the VCO cycles through the different $E_i$ bits [29]. This is especially relevant for high-speed designs, as the relative duration of pulse width variations compared to the clock period increases for higher clock frequencies.
In our implemented architecture with cross-coupling, we expect this effect to be mitigated. Due to a symmetrical layout, path length variations of $E_i$ and $\overline{E}_i$ of a single error estimation circuit will be very similar. Additionally, these signals will be subjected to comparable proximity effects and gradients. As a result, $t_{f,i} - t_{r,i}$ of $E_i$ and $\overline{E}_i$ can be matched. The same current proportional to $t_{f,i} - t_{r,i}$ will be added to both the positive and negative channel of the second stage. These contributions will largely compensate each other due to the pseudo-differential operation of the second stage, thereby reducing the effect of $t_{f,i} - t_{r,i}$ variations.
To demonstrate the effect of cross-coupling, we again perform simulations using ideal Verilog-A components, comparing the single-ended setup with the proposed design. In order to model the effect discussed above, delays and rise and fall limitations were added to the error estimation circuit. To stress test the robustness of both designs, relatively large variations were introduced with $t_{r,i} - t_{f,i}$ distributed from 0 ps to 75 ps in a gradient-like fashion, to reflect path length variations in the wiring towards the second stage.
For the single-ended configuration, this simulation results in an SQNR of 65 dB, or a reduction of 7 dB compared to the ideal single-ended result of 72 dB obtained above using only Verilog-A components. For our proposed architecture using cross-coupling, an SQNR of 71 dB is obtained, which is a reduction of 4 dB compared to the ideal results using cross-coupling. The effect of $t_{f,i} - t_{r,i}$ variations has therefore been reduced, leading to a significant improvement in SQNR. This increased robustness, even for the very large $t_{f,i} - t_{r,i}$ variations in this stress test, is an interesting advantage of the proposed cross-coupled architecture.
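The compensation mechanism can be sketched with a deliberately simple toy model, which is not the Verilog-A setup used for the simulations above. It assumes that, within one clock period, the complement of $E_i$ has the complementary pulse width and that each error estimation circuit adds exactly the same width error to $E_i$ and to its complement. Pulse widths, errors and the period are hypothetical integers in picoseconds.

```python
def single_ended_sum(widths, errors):
    """Summed pulse widths of the E_i bits, each with its own width error."""
    return sum(w + d for w, d in zip(widths, errors))


def differential_sum(widths, errors, period):
    """Positive minus negative channel when E_i and its complement
    carry the same width error (assumption of this toy model)."""
    pos = sum(w + d for w, d in zip(widths, errors))
    neg = sum((period - w) + d for w, d in zip(widths, errors))
    return pos - neg


PERIOD_PS = 286                 # roughly one clock period at 3.5 GS/s
widths = [40, 120, 200, 10]     # hypothetical ideal pulse widths in ps
gradient = [0, 25, 50, 75]      # hypothetical gradient-like width errors in ps
ideal = [0, 0, 0, 0]

print(single_ended_sum(widths, gradient) - single_ended_sum(widths, ideal))
print(differential_sum(widths, gradient, PERIOD_PS)
      - differential_sum(widths, ideal, PERIOD_PS))
```

In this idealized model the single-ended sum is shifted by the total width error, while the differential quantity is unaffected. In practice the matching between $E_i$ and its complement is imperfect, which is consistent with the residual SQNR loss reported above.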

IV. CMOS CIRCUITS
A block diagram of the circuit is shown in Fig. 10. The different building blocks will be described in the next sections. The circuits were developed for a 28nm CMOS technology, which offers transistors in different flavors, including ultra low/high threshold voltage transistors (ulvt/uhvt). Note that in sizing tables, the symbol # represents the total multiplier for each finger (i.e., the number of fingers per transistor times the multiplier of each transistor).

A. Ring Oscillators
The ring oscillator VCOs consist of 32 differential feedforward delay cells, shown in Fig. 12(a) [4], [41]. Each invertor is identically sized. This topology was selected as it is better suited to high-bandwidth operation than a traditional structure using cross-coupled auxiliary invertors [41].
The first stage is directly driven by the overall input voltage $V_{in}$ through the resistive drive circuit of [3], [41], [42], shown on the left of Fig. 10, which will be combined with an off-chip calibration procedure to achieve our desired linearity. This procedure is further described in section V-A.
As was explained above, the cross-coupling and summation are performed by driving the second stage with an array of differential switched current sources operated by the bits $E_i$ and $\overline{E}_i$. The switched current sources are shown in Fig. 10 (red) along with their sizing.
The ring oscillator sizing is based on thermal noise considerations using the equations developed in [43], [44], in which $\gamma_N$ and $\gamma_P$ are the NMOS and PMOS channel noise factors, respectively. $R_{ring} = \partial V_{ring}/\partial I_{ring}$ can be extracted from the I(V) characteristic of a simple delay cell whose transistors consist of 1 finger, evaluated at the intersection with the load line of the resistive driver, as explained in [44].
The ratio of the resistors $R_1/R_2$ in the drive circuit is a compromise between the linearity and the frequency tuning range, which determines the SQNR. In this design, the SQNR is prioritized, while we count on the end-of-chain calibration to reduce the distortion components. However, because calibration becomes harder as nonlinearity increases, a target uncalibrated nonlinearity of around 1% has been put forward, which we believe is sufficient for good calibration. This leads to an $R_1/R_2$ ratio of 0.75. $R_{ring}$ can be decreased by increasing the number of fingers of the transistors in the delay cell.
Since the thermal noise introduced by the second stage is heavily suppressed by the VCO gain $K_1$ of the first stage and shaped when referred to the input, the second ring oscillator can be much smaller. We opt for a width 4 times larger than minimal to avoid poor PSRR [38].
The ring oscillators are sized in order to obtain an SNR due to only thermal noise of 71 dB, leading to the sizings in Fig. 12 and the frequency characteristics in Fig. 13 and Fig. 8 for the ring oscillators in the first and second stage, respectively. With the post-layout simulated SQNR also at 71 dB, a total SNR of 68 dB is projected.
Invertor buffers (gray invertors in Fig. 12) are introduced between the VCOs and the QSDs for two reasons. First, they serve to isolate the VCOs from kickback from the sense amplifiers. Second, the sense amplifier inputs must be rail-to-rail to avoid signal-dependent delays. The buffers are sized to pull the VCO outputs rail-to-rail while minimizing the capacitive load on the VCOs. Their sizing is summarized in Fig. 12(c). The output of a VCO delay cell is shown in Fig. 14 before (red) and after the buffers (blue). It can be observed that the buffers successfully pull the VCO output rail-to-rail.
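The projected total SNR follows from adding the thermal and quantization noise contributions in power, assuming that they are uncorrelated. A minimal sketch of this arithmetic, with an illustrative function name:

```python
import math


def combine_snr_db(*snr_db: float) -> float:
    """Total SNR in dB when independent noise contributions add in power."""
    noise = sum(10 ** (-s / 10) for s in snr_db)
    return -10 * math.log10(noise)


# Thermal-noise-limited SNR of 71 dB combined with an SQNR of 71 dB
print(round(combine_snr_db(71.0, 71.0), 1))  # 68.0
```

Two equal contributions cost about 3 dB, which gives the projected 68 dB.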

B. QSD With Error Estimation
A key building block is the QSD following each VCO phase, shown in Fig. 15 [30]. The first flipflop of the QSD is implemented as a sense amplifier for reasons that will be explained in the next paragraph. The second flipflop and XOR gate implement the differentiation and are embedded in the automatically synthesized digital core.
For the error estimation, an XOR operation is added to the QSD of the first stage, shown in red in Fig. 15. This implements the subtraction shown in Fig. 3. The XOR gates are shown in Fig. 17 along with their sizing. Fast operation is achieved by using a pass-transistor logic implementation. Symmetrical circuits are used to generate $E_i$ and $\overline{E}_i$ in parallel. These 1-bit signals are then directly connected to the cross-coupled switched current sources driving the second stage.
For high-speed ADCs in general, metastability is an important concern. In the MASH architecture, this is also highly relevant. As mentioned above, information about $Q_1$ is contained in the pulse lengths of $E_i$. There is a finite chance that a transition of a VCO phase $W_i$ happens close to the rising edge of the clock. Consequently, a relatively small signal is applied to the sense amplifier, leading to a metastability event. In this case, the rising edge of $E_i$ will be delayed by the time necessary for the sense amplifier to amplify the input to a logic value. $Q_1$ will not be cancelled completely and first-order shaped noise will leak into the spectrum. The need for a sufficiently fast sense amplifier was confirmed through system-level simulations.
After a comparison of multiple sense amplifier topologies, the double-tail sense amplifier was selected. This is a regenerative sense amplifier offering a very small regeneration time and, consequently, an extremely small metastability window [45]. Since the double-tail sense amplifier has a reset operation when CLK goes low, its outputs are invalid during half a clock period. The sense amplifier is therefore followed by a set-reset latch to maintain its output during the entire clock cycle [45], thereby implementing the desired sample and hold behavior. A cross-coupled NOR latch is used. The sense amplifier and latch are displayed in Fig. 16 along with their corresponding sizing. Through simulations using a post-layout extracted sense amplifier, it was assessed that the sense amplifier speed was sufficient to avoid performance loss due to metastability.

C. Digital Core
The digital nature of VCO ADC output signals makes it possible to fully exploit automated tools for their subsequent processing, and the use of these tools was explored in this work. The digital core was synthesized from a behavioral Verilog description using Cadence's Genus Synthesis Solution. For the automated place and route, the Innovus Implementation System was used. This tool can also automatically insert signal and clock tree buffering. Configuration scripts were written for both. Unfortunately, due to lack of experience with these complex tools, the resulting digital circuit, while fully functional, is less efficient than we hoped for. For example, a layer of power-consuming and unnecessary registers was inserted (layer 1), as visible in Fig. 18, which shows the signal flow through the digital core. We only discovered this after fabrication, when we measured a higher-than-expected power consumption in the digital circuit. Note that the digital core does not include the NCFs, which are implemented off-chip as is often done for experimental MASH ADCs [29], [46], [47]. However, we found it important to include the estimated power consumption and area in case of an on-chip implementation. After place and route with the aforementioned automated tools, an added power of 1.5 mW and area of 0.0006 mm$^2$ were found.

Fig. 18: Signal flow diagram in the digital core.

D. Inter-stage Gain Mismatch
Multiple sources of non-idealities in the circuit were already discussed above.For example, the influence of nonlinearity in the second stage, pulse width errors and metastability in the sense amplifiers were addressed.A final factor that must be considered is inter-stage gain mismatch, which can also have an impact on the performance of the MASH.
To quantify the effect of inter-stage gain mismatch, we first performed post-layout simulations to evaluate the range of possible values for the optimal noise cancellation filter gain $G_{opt}$ over process and temperature corners. Since $G_{opt} = \frac{f_s}{2 N_{\phi 2} K_2}$, as explained in section II-C, $G_{opt}$ can be calculated as a function of the frequency gain $K_2$ of the second stage. The $G_{opt}$ values resulting from these simulations are summarized in Table II. This table also shows the deviation with respect to the $G_{opt}$ value obtained under nominal conditions (27 °C, TT). The corners with the largest $G_{opt}$ deviations are indicated in bold and are found for 0 °C SS (+13%) and 70 °C FF (−13%). As a consequence, by using the value of $G_{opt}$ obtained under nominal simulation conditions, we do not expect $G_{opt}$ deviations larger than ±13% over temperature and process corners.
Afterwards, post-layout simulations including thermal noise were performed to evaluate the effect of a mismatch of $G$ with respect to its optimum. The results are shown in Fig. 19. Note that the case of −100% gain mismatch corresponds to an inter-stage gain of 0, which therefore represents the performance of a single-stage VCO ADC consisting of only the first stage. From Fig. 19, we find that the architecture can tolerate variations of $G$ of the aforementioned ±13% around its optimum with less than 0.5 dB loss in SNR. This relatively flat behavior around the optimum can be understood by considering that, even with a small amount of first-stage quantization noise leakage due to non-ideal gain matching, the total noise at this point is still dominated by other contributions, most notably the thermal noise. Additionally, any quantization noise that does leak is still first-order noise shaped. Only when the gain mismatch exceeds a certain level do we observe a rapidly decreasing SNR. Qualitatively, these observations are also consistent with the measurement results of the 1-1 MASH VCO ADC reported in [29]. Since the nominal post-layout value of $G_{opt}$ is not expected to deviate by more than ±13% over temperature and process corners, the expected reduction in SNR due to NTF gain mismatch is less than 0.5 dB. We found this acceptable and did not use a separate $G$ calibration, but used the nominal $G_{opt}$ obtained after post-layout simulations throughout our measurements.
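The flat behavior around the optimum can be made plausible with a back-of-the-envelope noise budget. The sketch below is not the post-layout simulation behind Fig. 19. It assumes that the leaked $Q_1$ amplitude scales linearly with the relative gain error, that all other noise is independent of $G$, and it borrows the measured SNR values reported in section V-B (67 dB for the 1-1 MASH, 62 dB when only the first stage is read out) as the two end points. The function and parameter names are hypothetical.

```python
import math

def snr_with_mismatch(mismatch, snr_matched_db=67.0, snr_first_stage_db=62.0):
    """Estimated SNR in dB for a relative gain error mismatch = G/G_opt - 1.

    Illustrative assumptions (not taken from the simulations):
      * the noise at perfect matching does not depend on G,
      * the leaked Q1 power scales with mismatch**2,
      * mismatch = -1 (G = 0) reproduces the first-stage-only SNR.
    """
    n_other = 10 ** (-snr_matched_db / 10)        # noise re. signal at G = G_opt
    n_single = 10 ** (-snr_first_stage_db / 10)   # noise re. signal at G = 0
    n_q1 = n_single - n_other                     # Q1 power when fully leaked
    n_total = n_other + mismatch ** 2 * n_q1
    return -10 * math.log10(n_total)

for m in (0.0, 0.13, -0.13, -0.5, -1.0):
    loss = snr_with_mismatch(0.0) - snr_with_mismatch(m)
    print(f"mismatch {m:+.0%}: SNR loss = {loss:.2f} dB")
```

Under these assumptions a ±13% gain error costs well under 0.5 dB, while the loss grows quickly for larger errors, which matches the qualitative trend described above.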

V. PROTOTYPE MEASUREMENTS
To demonstrate the performance of our architecture, a prototype was manufactured in a 28nm CMOS technology node. Multiple supply voltage domains are present, which can be configured individually. More specifically, the supply voltages of the first and second VCOs, the sense amplifiers and XOR gates, and the digital core can all be set separately. The sampling frequency was set at 3.5 GHz. The supply voltages of the digital core and the sense amplifiers were set at 1.05 V and 1 V respectively. All other voltages were set at the nominal supply voltage of 0.9 V. A micrograph of the chip is shown in Fig. 20(a). The total core area is only 0.015 mm$^2$, including 0.006 mm$^2$ for the analog core. The circuit consumes 9 mW and 24 mW in the analog and digital core respectively for a full-scale input signal of 900 mV$_{pp}$, as summarized in Fig. 20(b). Fig. 20(c) shows a breakdown of the dominant noise sources.

A. Measured Nonlinearity Curve and Calibration Procedure
In a first step, we evaluate the distortion of the ADC. For this, we perform calibration measurements where the ADC is driven by a spectrally pure sine wave and the output $D$ is captured. The output can then be split into three components: (i) the fundamental sine wave, which corresponds to the signal component $D_{sig}$, (ii) the distortion DIST, which corresponds to the distortion of $D_{sig}$, and (iii) the noise $r$. This is done using a fit of the components at the input frequency $f_{in}$ and harmonics of $f_{in}$, leading to $D_{sig}$ and DIST. The residue is the noise $r$. For this procedure, efficient sine-wave fitting algorithms as developed in IEEE-STD-1057 can be used [48]. For future reference, we introduce the clean output signal, which is the output signal without the noise:
$$D_{cl}[n] = D_{sig}[n] + DIST[n]$$
Here $n$ represents the n-th sample. The extracted signal component $D_{sig}$ and distortion DIST can be referred to the input to obtain what we will call the instantaneous distortion, i.e. the distortion with respect to the input signal for every sample. The resulting curves for a 2 MHz and a 26.5 MHz signal are shown in Fig. 21(a) and 21(b) respectively. A clear frequency-dependent effect can be observed, as the distortion noticeably increases for higher frequencies. This can be explained by parasitic capacitances at the tune node of the VCO influencing the VCO current and frequency [48]. Traditional VCO ADC calibration methods as used in [4], [7], [8] model the VCO nonlinearity by a single nonlinear function and are not capable of correcting the frequency-dependent distortion components.
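The split of the captured output into a signal component, a distortion component and a noise residue can be sketched as follows. This is a minimal illustration and not the IEEE-STD-1057 fitting routine itself: it assumes coherent sampling (an integer number of input periods in the record), in which case the least-squares fit at $f_{in}$ and its harmonics reduces to simple DFT-style projections. The function name, the number of harmonics and the removal of the DC offset are choices made for this example only.

```python
import math

def split_output(d, cycles, n_harmonics=5):
    """Split record d into (d_sig, dist, r) using f_in and its harmonics.

    Assumes coherent sampling: the record holds exactly `cycles` input
    periods, and the harmonic bins h*cycles are distinct and below len(d)/2.
    The DC offset is removed and assigned to none of the three components.
    """
    n = len(d)
    mean = sum(d) / n

    def component(h):
        # Projection of d on cos/sin at the h-th harmonic of the input.
        w = 2 * math.pi * h * cycles / n
        a = 2 / n * sum(x * math.cos(w * i) for i, x in enumerate(d))
        b = 2 / n * sum(x * math.sin(w * i) for i, x in enumerate(d))
        return [a * math.cos(w * i) + b * math.sin(w * i) for i in range(n)]

    d_sig = component(1)                      # fundamental
    dist = [0.0] * n
    for h in range(2, n_harmonics + 1):       # harmonics 2..n_harmonics
        dist = [acc + c for acc, c in zip(dist, component(h))]
    r = [x - mean - s - e for x, s, e in zip(d, d_sig, dist)]
    return d_sig, dist, r

# Synthetic record: fundamental plus a small third harmonic and an offset.
n, cycles = 1024, 7
w = 2 * math.pi * cycles / n
d = [0.2 + math.sin(w * i) + 0.01 * math.sin(3 * w * i) for i in range(n)]
d_sig, dist, r = split_output(d, cycles)
d_cl = [s + e for s, e in zip(d_sig, dist)]   # clean output: signal + distortion
```

For non-coherent records, the projections would have to be replaced by a general least-squares solve over the same sine and cosine basis.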
Instead, the linearity of the ADC is improved by using the foreground calibration procedure we recently published in [48], which explicitly takes these frequency-dependent components into account. We provide an overview of the applied procedure in this section and refer to the aforementioned work for an in-depth discussion. It is shown in [48] that a VCO can be modeled as in Fig. 22. Once the estimated coefficients $\hat{a}_i$, $\hat{b}_j$ and $\hat{c}_k$ of the distortion function are known, they can be used to correct the ADC output. During normal operation, $\widehat{NL}(D)$ is subtracted from the ADC output to obtain the corrected output $D_{corr}$, as shown in Fig. 24. Also, as first described in [7], the nonlinearity correction is executed on a partially decimated signal, after prior anti-aliasing filtering. This improves performance by removing out-of-band shaped quantization noise that would otherwise be partly converted to in-band white noise after passing through the nonlinearity correction block.
Note that although the 1-1 MASH is more complex than the single-stage VCO ADC calibrated in [48], we expect that the same procedure can still be applied. The first stage is the dominant source of distortion of $V_{in}$ and can be modeled as in Fig. 22, as it is simply a single-stage VCO. While the second stage introduces distortion of $E$, no significant distortion of the global input signal $V_{in}$ due to the second stage is expected. This is a consequence of (2), where it is shown that $E$ is an approximation of the quantization error of the first stage. As mentioned before, the nonlinearity in the second stage is mainly of concern for the $Q_1$ noise leakage, which has little to no input signal dependency. Note that this leakage cannot be corrected by the calibration procedure, which is one of the reasons the cross-coupling technique was introduced in section III.
In the present work, the calibration is performed using $N_i = N_j = 5$ and $N_k = 2$. The nonlinearity correction itself is implemented off-chip in a bit-accurate manner. $D$ and $D_{corr}$ are represented as 14-bit fixed-point numbers. Other calculations are performed with the same accuracy. The polynomials $\sum_{i=0}^{N_i} \hat{a}_i D^i[n]$ and $\sum_{j=1}^{N_j} \hat{b}_j D^j[n]$ of $\widehat{NL}[\cdot]$ are approximated using 512-element LUTs containing the polynomial values, and linear interpolation is performed between the 2 points closest to the input of the correction block. This is similar to the method first developed in [4]. To evaluate the cost of an on-chip implementation, an implementation was generated using automatic synthesis and place and route tools. The estimated additional power consumption and area for the correction would be 2.2 mW (of which 1.4 mW in the LUTs) and 0.0015 mm$^2$ respectively. The results of this calibration procedure are discussed in the next section.
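The LUT-based polynomial approximation with linear interpolation can be sketched as follows. This is a floating-point illustration only and not the bit-accurate 14-bit fixed-point implementation described above. The function names, the input range and the example coefficients are hypothetical.

```python
def build_lut(coeffs, d_min, d_max, size=512):
    """Tabulate the polynomial sum(coeffs[i] * d**i) on a uniform grid."""
    step = (d_max - d_min) / (size - 1)
    lut = []
    for n in range(size):
        d = d_min + n * step
        acc = 0.0
        for c in reversed(coeffs):      # Horner evaluation
            acc = acc * d + c
        lut.append(acc)
    return lut

def lut_lookup(lut, d, d_min, d_max):
    """Linear interpolation between the two LUT entries closest to d."""
    pos = (d - d_min) / (d_max - d_min) * (len(lut) - 1)
    pos = min(max(pos, 0.0), len(lut) - 1.0)   # clamp to the table range
    idx = min(int(pos), len(lut) - 2)          # lower neighbour
    frac = pos - idx
    return lut[idx] + frac * (lut[idx + 1] - lut[idx])

# Hypothetical 5th-order coefficients on a normalized input range [-1, 1].
a_hat = [0.0, 0.0, 0.01, -0.02, 0.003, 0.001]
lut = build_lut(a_hat, -1.0, 1.0)
approx = lut_lookup(lut, 0.3, -1.0, 1.0)
exact = sum(c * 0.3 ** i for i, c in enumerate(a_hat))
print(f"LUT: {approx:.9f}, direct: {exact:.9f}")
```

The table replaces the per-sample evaluation of a high-order polynomial by one lookup, one subtraction, one multiplication and one addition, at the price of a small interpolation error that shrinks with the square of the grid spacing.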

B. Single- and Two-tone Measurement Results
Fig. 25 shows the output frequency spectrum for a 26.5 MHz, 750 mV$_{pp}$ sinusoidal input. The input signal was filtered by a 27 MHz low-pass filter. Some lower-frequency generator spurs are therefore present in the output spectrum, but do not affect the measurement interpretation. A bandwidth of 109.375 MHz (OSR = 16) is used. The gray curve shows the output frequency spectrum when only reading out the first stage. This shows a significant amount of in-band first-order shaped noise, resulting in SNR = 62 dB. For the 1-1 MASH (red curve), an SNR of 67 dB and an uncalibrated SNDR of 36 dB are obtained. This uncalibrated SNDR is largely in line with the design target of section IV-A to limit the nonlinearity of the first stage to about 1%. Applying the calibration procedure described above to the 1-1 MASH leads to the blue curve and an improved SNDR = 67 dB. Fig. 26 shows the result of a two-tone test with −8.5 dBFS input signals at 85 MHz and 89.99 MHz, where all harmonics are under −70 dBFS after calibration.
The improvement after calibration is also visible in Fig. 28, which shows the achieved SNDR before calibration and the SNR, SNDR and SFDR after calibration for a 750 mV$_{pp}$ input over a range of input frequencies. The SFDR after calibration is over 80 dB for all measured frequencies. Fig. 27 illustrates the influence of the VCO supply voltage on the obtained SNDR and SNR. Due to the PVT dependency of VCO nonlinearity, using a foreground calibration procedure (commonly used in most high-bandwidth VCO ADCs [1], [4], [6]) results in a calibration performance that varies with changes in temperature and voltage. For example, the results shown in Fig. 27 closely align with those of [4]. Fig. 27 shows a narrow range of around 7 mV where the calibrated SNDR (red curve) remains above 65 dB, after which it drops.
To address this, one solution is to recalibrate the ADC when the supply voltage changes in order to maintain a nearly constant SNDR [4], although this may not always be practical. Another approach involves interpolation: calibration measurements can be performed at multiple supply voltages at calibration time. As many modern SoCs are equipped with PVT monitors, the obtained calibration coefficients can then be interpolated during regular operation using the information from the on-board voltage sensor. This approach is illustrated by the green curve of Fig. 27, where quadratic interpolation between calibration coefficients obtained at 0.85 V, 0.9 V and 0.95 V was performed, achieving an SNDR of 65 dB over a 110 mV range between 0.84 V and 0.95 V. An error of 1.92 mV was introduced in the supply voltage estimate used for the interpolation, to mimic the limited resolution of published voltage sensors in our technology node [49]. Finally, it is also common practice in many high-performance chips to use voltage regulators such as a low-dropout regulator (LDO) to isolate sensitive ADCs and analog components from the noisy supply of digital circuits [50], [51]. Many recently published LDOs offer line regulations that keep variations well within the previously mentioned 7 mV range [52], and therefore also effectively address this issue. The same interpolation method, using a temperature sensor, can also be employed to address the effects of temperature fluctuations.
Fig. 29 shows the SNDR as a function of the input signal amplitude of a 26.5 MHz sine before and after calibration. We obtain a dynamic range DR = 68 dB. We also find a peak SNDR = 67 dB for a 750 mV$_{pp}$ signal at 26.5 MHz. Dynamic range measurements on an extra chip from the same wafer gave results similar to those reported in this work.
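The supply-voltage interpolation of the calibration coefficients discussed above can be sketched as follows. The sketch uses a Lagrange formulation of quadratic interpolation through three calibration points, which is one possible way to implement it and not necessarily the one used for Fig. 27. The function name and the coefficient values are hypothetical; only the three calibration voltages are taken from the text.

```python
def interpolate_coeffs(v, cal_points):
    """Quadratic (Lagrange) interpolation of calibration coefficient sets.

    cal_points: three (supply_voltage, [coefficients]) pairs obtained at
    calibration time. Returns the coefficient list estimated at voltage v.
    """
    (v0, c0), (v1, c1), (v2, c2) = cal_points
    w0 = (v - v1) * (v - v2) / ((v0 - v1) * (v0 - v2))
    w1 = (v - v0) * (v - v2) / ((v1 - v0) * (v1 - v2))
    w2 = (v - v0) * (v - v1) / ((v2 - v0) * (v2 - v1))
    return [w0 * a + w1 * b + w2 * c for a, b, c in zip(c0, c1, c2)]

# Hypothetical coefficient sets measured at the three calibration voltages.
cal_points = [
    (0.85, [0.012, -0.031, 0.0040]),
    (0.90, [0.010, -0.027, 0.0035]),
    (0.95, [0.009, -0.024, 0.0032]),
]
v_sensor = 0.87   # supply voltage reported by the on-chip monitor (example)
print(interpolate_coeffs(v_sensor, cal_points))
```

At the three calibration voltages the function returns the stored coefficient sets themselves, so the scheme falls back to the plain foreground calibration whenever the supply sits at a calibrated point.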
Note that the same set of calibration coefficients, derived from an initial calibration measurement, was used for all further measurements in this work (except for the green curve in Fig. 27). These measurements clearly support our claim that the 1-1 MASH can be calibrated in the same way as a single-stage VCO ADC. Based on these measurements, Table III summarizes the performance of our prototype and compares it to state-of-the-art VCO ADCs [5]- [9] and a recent Σ∆ modulator [51]. From the table it is clear that our prototype compares favorably with these designs, achieving an excellent bandwidth, figure of merit (FoM) and area.
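As a quick consistency check of the headline numbers, the bandwidth and the figure of merit can be recomputed from the values reported in this work. The sketch assumes the usual Schreier definition $FoM_{DR} = DR + 10\log_{10}(BW/P)$, which is not spelled out in this section. The function name is hypothetical, and evaluating the FoM both with and without the estimated on-chip NCF and correction power is a choice made for this example.

```python
import math

def schreier_fom_dr(dr_db, bandwidth_hz, power_w):
    """Schreier figure of merit based on dynamic range, in dB."""
    return dr_db + 10 * math.log10(bandwidth_hz / power_w)

fs = 3.5e9                      # sampling frequency (Hz)
osr = 16                        # oversampling ratio
bw = fs / (2 * osr)             # signal bandwidth (Hz)

p_core = 9e-3 + 24e-3           # measured analog + digital core power (W)
p_extra = 1.5e-3 + 2.2e-3       # estimated on-chip NCF + correction power (W)

fom_core = schreier_fom_dr(68.0, bw, p_core)
fom_total = schreier_fom_dr(68.0, bw, p_core + p_extra)
print(f"BW = {bw / 1e6:.3f} MHz")
print(f"FoM_DR (core only)       = {fom_core:.1f} dB")
print(f"FoM_DR (incl. estimates) = {fom_total:.1f} dB")
```

Both variants round to the same integer value in dB, so the reported figure of merit is not sensitive to whether the estimated off-chip blocks are included.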

VI. CONCLUSION
In this work, an open-loop 1-1 MASH VCO ADC architecture was presented and a prototype was manufactured in 28nm CMOS. By using the estimated quantization noise of a first VCO ADC stage as the input of a second stage and combining both outputs using the correct noise cancellation filters, second-order noise shaping was achieved. To the authors' knowledge, this results in the first higher-order purely VCO-based ADC operating at more than 100 MHz bandwidth. A key enabler is the use of error estimation on all phases of the first stage, which makes it possible to significantly reduce the in-band PFM spurs that limit the encoding accuracy. The ADC core consumes 33 mW. An SNDR of 67 dB and a DR of 68 dB are obtained with a bandwidth of 109.375 MHz. Including the estimated

Fig. 1: Simplified PFM model of a single-stage VCO ADC showing in-band signals.

Fig. 8: Frequency characteristic of the second stage VCO obtained after post-layout simulation without cross-coupling (a) and equivalent with cross-coupling (b).Note that E can only take on integer values between 0 and 32.
(right). Now the positive and negative channels are driven by the opposite signals $E = E_+ - E_-$ and $-E = E_- - E_+$ respectively. The rationale for this choice of signals is that, due to the pseudo-differential operation of the first stage, the total quantization noise of the first stage now takes the form $Q_{1,+} - Q_{1,-}$, which must be present in the output of the second stage to obtain second-order noise shaping. A division by 2 is introduced to keep the input values of each channel of the second stage in the same range. By calculating $D_2$ as $D_2 = D_{2,+} - D_{2,-}$, we have created a pseudo-differential second stage.

Fig. 11: Conceptual illustration of the influence of pulse-width errors.

Fig. 12: Ring oscillator using feed-forward delay cells (a) along with the sizing of the delay cells in the first and second stage (b) and the sizing of the inverter buffers (c).

Fig. 13: Frequency characteristic of the first stage VCO obtained after post-layout simulation.

Fig. 14: Post-layout output waveform of a VCO delay cell and the subsequent buffer for a 750 mV$_{pp}$ 26.5 MHz input.

Fig. 15: Simplified single-ended representation of the QSD block with error estimation.

Fig. 16: Double-tail sense amplifier (left) and latch to maintain output over entire clock period (right) (a) and sizing (b).

Fig. 19: Influence of variations of the noise cancellation gain $G$ compared to the optimal value $G_{opt}$ on the post-layout simulated SNR including thermal noise.

Fig. 22: Model of the VCO using the resistive driver.
In the VCO model of Fig. 22, $g[\cdot]$, $h[\cdot]$ and $f[\cdot]$ are nonlinear functions. The nonlinear components can be isolated from the linear and constant components, i.e. the free-running frequency and frequency gain, into an isolated distortion function $NL[\cdot]$. This results in the redrawn Fig. 23. An estimate $\widehat{NL}[\cdot]$ of $NL[\cdot]$ can now be written as a function of $D_{cl}$, with coefficients $\hat{a}_i$, $\hat{b}_j$ and $\hat{c}_k$. The difference $\Delta D_{cl}^j[n] = D_{cl}^j[n] - D_{cl}^j[n-1]$ is used to approximate the time derivative originating from the capacitive effects mentioned above. $N_i$, $N_j$ and $N_k$ represent the highest powers of the polynomials used for the approximation. During the fitting step of the calibration, the coefficients of $\widehat{NL}[\cdot]$ are determined by fitting $\widehat{NL}[\cdot]$ to DIST using least-squares minimization.

TABLE I: Parameters of the 1-1 MASH VCO Architecture.