Delay Error Shaping in ΔΣ Modulators Using Time-Interleaved High Resolution Quantizers

As wideband Delta-Sigma Modulators (DSMs) are restricted in oversampling ratio (OSR), and a low OSR reduces the benefit of higher-order loop filters, increasing the internal resolution is an obvious way to achieve a high signal-to-quantization-noise ratio (SQNR). State-of-the-art implementations mostly restrict the internal resolution to 4–6 bits, as efficient quantizers (QTZs) with higher resolution add excessive delay into the DSM. Multi-step or time-interleaved quantizers (TI-QTZs) are an effective way to enhance resolution at high sampling rates, but their latency in excess of one clock cycle usually prohibits their use in DSMs. This paper proposes a new architecture to employ high-resolution multi-step QTZs, such as TI SAR or pipeline ADCs. In the proposed architecture, an excess loop delay (ELD) of several clock cycles is purposefully allowed in the LSBs. While the MSBs are conventionally ELD-compensated, the LSBs are not, and the resulting error is corrected in the digital domain. It is shown that the matching requirements are relaxed by first-order shaping. The idea is also applicable to Leslie-Singh and noise-coupling architectures, which are compared to the proposed architecture in an extensive system-level analysis and simulation. Based on the presented results, design recommendations are given for the target application depending on OSR, internal bitwidth and expected analog-digital matching.

I. INTRODUCTION
While high resolutions can be achieved with DSMs over low and moderate bandwidths, requiring a large bandwidth imposes significant limitations on these modulators. As the maximum sampling frequency is practically limited to a couple of GHz, the chosen OSR is as low as 6 in recent state-of-the-art (SoA) designs [1]. Possible solutions to maintain a high resolution are either an increase of the loop filter order or an increase of the internal QTZ resolution. At such a low OSR, the advantage of higher-order noise shaping diminishes, rendering single-loop filters of higher than 3rd order inefficient due to a reduced maximum stable amplitude (MSA) and stability issues. Alternatively, multi-stage noise-shaping (MASH) architectures can be employed. These require rather complex digital noise cancellation filters and especially suffer from noise leakage due to analog/digital mismatch. Successful recent implementations of wideband MASH can be found, e.g., in [2].
Employing medium- to high-resolution multibit internal quantization has also gained popularity in wideband designs [1], [3], [4]. The internal resolution, though, is still limited to 4–6 bits [4], [5], [6]. A high-resolution and at the same time efficient QTZ, e.g. a TI-SAR [5], might therefore appear as a better choice within a DSM, as illustrated in Fig. 1a. This, however, is not feasible: although the TI-QTZ achieves a very high sampling rate f_s, its overall latency and the resulting excess loop delay (ELD) τ_d = t_delay/T_s become very large, greatly exceeding a clock cycle, which leads to instability. Even though TI and multi-step QTZs have previously been used [6], [7], [8], none of these examples solved the latency issue: either the TI-QTZ still resolved within one clock cycle, which contradicts the original purpose of achieving high sampling rates using TI, or the added ELD was left uncompensated and suppressed by a high OSR, which is not an option when targeting wideband DSMs.
In this paper, a novel approach to compensating large ELD due to high-resolution internal QTZs is proposed. It exploits the fact that in any multi-step QTZ, such as a SAR or pipeline ADC, the MSBs of each sample are resolved much earlier than the LSBs, as indicated in Fig. 1b. By purposefully allowing an arbitrarily long ELD of, e.g., several clock cycles for the LSBs, the use of TI-QTZs becomes possible. The remaining delay error is corrected in the digital domain, with relaxed matching requirements owing to delay error shaping.
The paper is organized as follows: Section II reviews SoA techniques for compensating ELD in excess of one clock cycle. Section III introduces the delay error shaping architecture. Section IV compares this approach with similar architectures from prior art. Section V presents extensive system-level simulation results and confirms the relaxed matching requirements of delay error shaping. Section VI summarizes design recommendations, and Section VII concludes the paper.
II. SOA OF ELD COMPENSATION WITH τ_d ≥ 1
ELD is a well-known nonideality in continuous-time DSMs (CTDSMs), which destabilizes the loop and thus requires compensation. Many compensation approaches are available [9], all of which rely on tuning coefficients and realizing a 0th-order path in the loop filter. The targeted noise transfer function (NTF) can be restored ideally, however only for τ_d < 1. This strict limitation has led to attempts in the SoA to maximize the available conversion time within one clock cycle, e.g. by preliminary sampling [4] or early bitwise feedback [10].
Prior work has also introduced ELD compensation in excess of one clock cycle. In [11], the 0th-order path is added as analog feedback in front of the QTZ, which closes the fast loop independently of the QTZ latency. However, this technique cannot restore the original NTF, and its maximum allowed ELD remains limited to τ_d < 2. In [12], an ELD compensation of 2 clock cycles is realized for a 2-TI dual-slope QTZ by using prior timing information from the analog slope of the sub-QTZs. This technique, however, is limited to a maximum of 2 time-interleaved channels of dual-slope ADCs, which makes it unfavorable for high sampling rates. Lastly, [13] presented a 5-TI bandpass noise-shaping (NS) SAR inside a DSM, where the fast path, which is necessary for robust NS, is relaxed by the z → z² transformation. This trick allows an additional clock cycle of latency in the QTZ, but is only applicable to f_s/4 bandpass NTFs. Besides, the overall ELD is still limited to τ_d < 2 and remains uncompensated in the 2nd-order DSM loop around the NS-QTZ.
Fig. 2. (a) Block diagram of the delay error shaping architecture, with 0 < τ_d < 1 being the delay to resolve the MSBs and x being the number of additional clock cycles to resolve the LSBs. (b) Equivalent linear CT-to-discrete-time (DT) model.
Next, a different solution for ELD in excess of one clock cycle is proposed. It builds upon the idea of treating early and late available bits from the internal QTZ differently [10]. Further, it employs established SoA design techniques, such as digital-to-analog converter (DAC) segmentation [3], [4] and digital correction filters [2]. In the proposed delay error shaping approach, the impact of the nonidealities of both techniques turns out to be smaller than in existing approaches.

III. DELAY ERROR SHAPING APPROACH
In many high-speed QTZ architectures, such as TI-SAR or multi-step ADCs, some MSBs are resolved shortly after sampling, while the final LSBs are resolved significantly later. Thus, when using, e.g., a B-times TI-QTZ inside a DSM as illustrated in Fig. 1b, a few MSBs of each sub-channel can be fed into the loop within τ_d < 1, which can be properly compensated by common ELD compensation techniques; in contrast, the LSBs cannot be correctly compensated, as their τ_d ≥ 1. Even though the ELD-compensated MSBs allow stable operation and an almost unaltered MSA of the DSM, the uncompensated LSBs yield a visibly increased quantization noise floor. The core idea of the presented technique is to nonetheless feed those late LSBs into the loop, to digitally correct for their uncompensated delay error, and thereby to restore the ideal quantization noise. It will be shown that this is advantageous with respect to analog/digital matching when compared to alternative architectures.

A. System Analysis
A CTDSM with a B-times time-interleaved N-bit internal QTZ is assumed, which provides L MSBs within an ELD of 0 < τ_d < 1 for each sub-channel. All remaining (N − L) LSBs are converted at most x clock cycles later, x ∈ ℕ. Even though the architecture allows i subgroups of LSBs to be converted with different delays of x_i clock cycles, the analysis in this section is restricted to the simplified case of all LSBs appearing simultaneously after x clock cycles, which simplifies the derivation of the system transfer functions. The more general case with multiple LSB subgroups is demonstrated in Section V.
For a B-TI-QTZ, it follows that x < B in order to finish each conversion in time for the required sampling rate f_s; i.e., the TI-QTZ has a sampling rate of f_s and a resolution of N bits, and its latency leads to a maximum ELD of (τ_d + x). The idea is illustrated in the block diagram in Fig. 2a, where L_0(s) and L_1(s) describe the loop filter and H_corr(z) is a digital correction filter. The QTZ output V_q(z) is divided into an MSB and an LSB section with
$$V_q(z) = V_{q,\mathrm{MSB}}(z) + V_{q,\mathrm{LSB}}(z). \tag{1}$$
Thereby, the N-bit B-TI-QTZ outputs L bits (MSBs) within the first clock cycle after every sampling instant; these MSBs are fed back into the loop, and their delay is compensated by any conventional ELD compensation approach. Subsequently, the N-bit B-TI-QTZ outputs the remaining (N − L) bits (LSBs) in the x-th clock cycle. The proposed architecture in Fig. 2a feeds these late LSBs back into the loop, but does not compensate for their actual ELD (τ_d + x), as conventional ELD compensation techniques only compensate for τ_d < 1; instead, only τ_d is compensated. Since this creates a compensation error for the LSBs, a digital correction filter is used to remove that error from the output. From the block diagram in Fig. 2a, one can see that the feedback DAC structure remains unaltered compared to a conventional N-bit DAC; the only difference is that the LSB path has an additional delay x. The loop filter also remains unaltered, which means that both the coefficients and the ELD compensation of the original loop filter remain unchanged; i.e., the loop filter and compensation are designed as if all bits were fed back after ELD τ_d, but in the proposed architecture the (N − L) LSBs are allowed to arrive later.
Under these conditions, a linear model can be derived, which is shown in Fig. 2b. Here, the equivalent DT feedback transfer function L_1(z) is used, which results from CT-DT conversion of the combination of ELD τ_d, DAC transfer function R_DAC(s) (not shown), loop filter transfer function L_1(s) and the implicit sampling in the QTZ [14]. Additive white noise E_q is assumed in the QTZ, which is reasonable for the intended high resolution. With NTF(z) = 1/(1 + L_1(z)), the modulator output in Fig. 2b becomes
$$V_q(z) = \mathrm{STF}(z)\,U(z) + \mathrm{NTF}(z)\,E_q(z) + \bigl(1-\mathrm{NTF}(z)\bigr)\bigl(1-z^{-x}\bigr)\,V_{q,\mathrm{LSB}}(z), \tag{2}$$
consisting of the input signal U (filtered by the STF), the shaped quantization noise E_q, and an additional term due to the uncompensated delay of the LSB section of the TI-QTZ.
Fig. 3. Output spectrum of a 3rd-order CTDSM using the uncorrected output V_q (blue) and the corrected output V (red). An OSR of 6 and an NTF out-of-band (OOB) gain of 2.5 are used with an illustrative 10-bit 2-TI-QTZ with an MSB delay of τ_d = 0.5 and an LSB delay of one additional clock cycle (τ_d + 1).
The LSBs are then digitally corrected, V(z) = V_q,MSB(z) + H_corr(z)·V_q,LSB(z), with the correction filter chosen as
$$H_\mathrm{corr}(z) = z^{-x} + \bigl(1-z^{-x}\bigr)\,\mathrm{NTF}_d(z), \tag{3}$$
where NTF_d is the digital representation of the analog NTF; ideally, both are identical. The final output after the correction filter then becomes
$$V(z) = \mathrm{STF}(z)\,U(z) + \mathrm{NTF}(z)\,E_q(z), \tag{4}$$
which is the ideal, full-resolution output of a conventional N-bit modulator. Thus, when additional x clock cycles of ELD are allowed in the DSM for the LSBs, the only necessary modification is a digital correction filter on the LSBs, given in Eq. (3): first, it consists of a simple delay of length x, which adds the LSBs to the correct MSB samples and thus creates the full N-bit QTZ output; second, it employs a shaped version (1 − z^{-x}) of the digital representation of the analog NTF.
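The cancellation in Eqs. (2)–(4) can be checked with a deliberately simplified DT stand-in for Fig. 2b. The Python sketch below is not the paper's 3rd-order CTDSM; it assumes (hypothetically) a loop (1 − z^{-1})²·Y = (2z^{-1} − z^{-2})·(U − FB), i.e. NTF(z) = (1 − z^{-1})² and STF(z) = 1 − NTF(z), an ideal 10-bit mid-rise QTZ whose top L = 5 bits are fed back immediately while the remaining LSBs enter the feedback x clock cycles later, and NTF_d = NTF. E_q = V_q − Y is taken from the simulated quantizer instead of being modeled, so the check exercises the algebra with a real, nonlinear QTZ.

```python
import math

def conv(h, x):
    """Causal FIR filtering y[n] = sum_k h[k] x[n-k], zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k)
            for n in range(len(x))]

def poly_mul(a, b):
    """Product of two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def one_minus_z(x):
    """Coefficients of (1 - z^-x), x >= 1."""
    return [1.0] + [0.0] * (x - 1) + [-1.0]

def past(seq, k):
    """seq[k] for k >= 0, zero before the simulation starts."""
    return seq[k] if k >= 0 else 0.0

NTF = [1.0, -2.0, 1.0]   # toy NTF(z) = (1 - z^-1)^2, also used as NTF_d
STF = [0.0, 2.0, -1.0]   # STF(z) = 1 - NTF(z) for the loop Y = L1 (U - FB)

def simulate(u, x, n_bits=10, l_msb=5):
    """Toy loop (1 - z^-1)^2 Y = (2 z^-1 - z^-2)(U - FB) with late LSB feedback."""
    delta, delta_msb = 2.0 / 2 ** n_bits, 2.0 / 2 ** l_msb
    y, vq, vm, vl, d = [], [], [], [], []        # d = U - FB (loop filter input)
    for n in range(len(u)):
        yn = (2 * past(y, n - 1) - past(y, n - 2)
              + 2 * past(d, n - 1) - past(d, n - 2))
        code = min(max(math.floor((yn + 1.0) / delta), 0), 2 ** n_bits - 1)
        v = -1.0 + (code + 0.5) * delta                        # N-bit V_q
        v_msb = -1.0 + ((code >> (n_bits - l_msb)) + 0.5) * delta_msb
        y.append(yn)
        vq.append(v)
        vm.append(v_msb)
        vl.append(v - v_msb)                                   # V_q,LSB
        # feedback: MSBs of this sample plus LSBs of the sample x clocks ago
        d.append(u[n] - (v_msb + past(vl, n - x)))
    return y, vq, vm, vl

def correct(vm, vl, x):
    """Eq. (3): V = V_MSB + H_corr V_LSB, H_corr = z^-x + (1 - z^-x) NTF_d."""
    h_corr = poly_mul(one_minus_z(x), NTF)
    h_corr[x] += 1.0
    return [m + c for m, c in zip(vm, conv(h_corr, vl))]

x = 2
u = [0.5 * math.sin(2 * math.pi * n / 128) for n in range(2000)]
y, vq, vm, vl = simulate(u, x)
eq = [a - b for a, b in zip(vq, y)]                            # E_q = V_q - Y
ideal = [a + b for a, b in zip(conv(STF, u), conv(NTF, eq))]    # Eq. (4)
lsb_term = conv(poly_mul(STF, one_minus_z(x)), vl)              # last term, Eq. (2)
v = correct(vm, vl, x)
print(max(abs(a - b) for a, b in zip(v, ideal)))                # Eq. (4) residual
print(max(abs(a - b - c) for a, b, c in zip(vq, ideal, lsb_term)))  # Eq. (2) residual
```

Both printed residuals should sit at floating-point rounding level; all function and variable names are illustrative only.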
The proposed modulator in Fig. 2a has been simulated at system level for an exemplary 3rd-order CIFF/FB CTDSM with a 2-TI 10-bit internal QTZ, divided into L = 5 MSBs and the remaining (N − L) = 5 LSBs, which arrive in the first (τ_d = 0.5) and second clock cycle (τ_d + 1) after the sampling point, respectively. The spectra of the uncorrected and the digitally corrected outputs of the architecture in Fig. 2a are shown in Fig. 3. The uncorrected output exhibits increased quantization noise with only first-order noise shaping due to the improperly compensated LSBs, while ideal noise shaping is achieved in the corrected output. The results confirm Eqs. (2) and (4), the latter assuming perfect matching of the digital filter with the analog NTF.
Note that, due to the additional shaped delay error, the loop filter swings may increase slightly, which can reduce the MSA depending on the ratio of early MSBs to late LSBs. This is analyzed in Section V.

B. LSB Noise Leakage
In most wideband CTDSMs, coefficient tuning of the loop filter is used to obtain stable performance over PVT. Still, such global RC tuning usually has an accuracy in the range of 1–5%, which results in a remaining analog-digital filter mismatch. With a mismatched correction filter, the output of Eq. (4) becomes
$$V(z) = \mathrm{STF}(z)\,U(z) + \mathrm{NTF}(z)\,E_q(z) + \bigl(\mathrm{NTF}_d(z)-\mathrm{NTF}(z)\bigr)\bigl(1-z^{-x}\bigr)\,V_{q,\mathrm{LSB}}(z), \tag{5}$$
where the last term is the remaining LSB delay error, which only vanishes when the digital NTF_d matches its analog counterpart NTF, as assumed in Eq. (4). In the presence of mismatch, full resolution cannot be achieved, as part of the uncompensated LSB delay error leaks into the output. Note that this matching-critical leakage error is high-pass shaped. The amount of leakage therefore depends not only on the mismatch and the bitwidth of the LSBs, but also on the number of delayed clock cycles x in the shaping term. From Eq. (5), leakage is expected to decrease with increasing OSR and to increase for larger delay x due to weaker error shaping. This relation is depicted in Fig. 4, where the error shaping term is plotted for x = 1 … 5 and the band edge is indicated for OSR = 6 and 10. Accordingly, the best leakage suppression is achieved for a large OSR and small x.
[Fig. 4: error shaping term (1 − z^{-x}) for x = 1 … 5 versus normalized frequency f/f_s; band edges marked for OSR = 6 and 10]
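As a rough numeric companion to Fig. 4, the sketch below evaluates the magnitude of the shaping term (1 − z^{-x}) at the band edge f_b/f_s = 1/(2·OSR). For x ≤ 5 and OSR ≥ 6 the term grows monotonically over the signal band, so the band edge is its in-band maximum; the helper names are hypothetical.

```python
import cmath
import math

def shaping_db(x, f):
    """|1 - z^-x| in dB at normalized frequency f = f/f_s, with z = exp(j 2 pi f)."""
    return 20 * math.log10(abs(1 - cmath.exp(-2j * math.pi * f * x)))

def band_edge_db(x, osr):
    """Shaping term at the band edge f_b/f_s = 1 / (2 OSR)."""
    return shaping_db(x, 1.0 / (2 * osr))

for osr in (6, 10):
    print(f"OSR = {osr}: " + ", ".join(
        f"x={x}: {band_edge_db(x, osr):5.1f} dB" for x in range(1, 6)))
```

Closed form for comparison: |1 − e^{-j2πfx}| = 2·|sin(πfx)|, so the band-edge value is 2·sin(πx/(2·OSR)).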
Although more delay degrades the shaping and thus increases the leaked error, rather slow multi-step or TI-QTZs still show advantageous error shaping: the later a bit is resolved, the less significant it is within the total bitwidth. In other words, while the signal power decreases by 6 dB per bit towards the LSBs, the loss of shaping suppression per additional clock cycle of delay is only 6 dB or less (cf. Fig. 4). Thus, when splitting the LSB section into i subgroups, each with additional delay 1 ≤ x_i < B, earlier LSBs are strongly shaped, while for later LSBs the amount of leaked error remains limited.
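The 6 dB bound can be checked with a small-angle estimate of the shaping term inside the signal band, assuming ωx ≪ 1 with ω = 2πf/f_s:

```latex
\bigl|1-e^{-j\omega x}\bigr| = 2\left|\sin\frac{\omega x}{2}\right| \approx \omega x
\qquad\Rightarrow\qquad
20\log_{10}\frac{\bigl|1-e^{-j\omega (x+1)}\bigr|}{\bigl|1-e^{-j\omega x}\bigr|}
\approx 20\log_{10}\frac{x+1}{x} \;\le\; 20\log_{10}2 \approx 6.02\ \mathrm{dB}
```

Each additional clock cycle thus costs about 6.0, 3.5, 2.5 and 1.9 dB of in-band suppression for x = 1 → 2, 2 → 3, 3 → 4 and 4 → 5, respectively, whereas each bit of lower significance gains 20·log10(2) ≈ 6.02 dB.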

C. Practical QTZ Considerations
The proposed delay error shaping approach can be applied to any conventional modulator, as it requires no modification of the loop filter or DAC, apart from the now possible increase, or easier realization, of resolution by employing TI or pipelined quantization. Next, its practicality with respect to the internal QTZ is briefly discussed.
In a pipelined QTZ, the first stage resolves L coarse bits within τ_d < 1, which are fed back with ideal ELD compensation; a second stage could, e.g., resolve all remaining (N − L) bits, which are fed back without proper compensation but with digital correction; the bits could also be distributed over more stages.
When a TI-ADC such as a TI-SAR is used, each individual sub-ADC leads to an ELD larger than one clock cycle, but each sub-ADC also provides some MSBs shortly after sampling.
These can be fed back with conventional ELD compensation, whereas the later bits are again fed back uncompensated, whenever they are ready, but with the proposed digital correction.
Both structures are well suited for the proposed delay error shaping, while the use of a TI-QTZ comes with an inherent advantage. Time-interleaving nonidealities, most prominently offset, gain and timing mismatch [15], result in undesired interleaving spurs at the combined ADC output. For B sub-channels and a signal bandwidth of f_b = f_s/(2·OSR), the lowest spur frequency is located at
$$f_\mathrm{spur,min} = \frac{f_s}{B} - f_b. \tag{6}$$
Advantageously, as long as the lowest spur frequency is higher than the signal bandwidth, all time-interleaving nonidealities fall out of band; this is achieved for
$$\frac{f_s}{B} - f_b > f_b \;\Leftrightarrow\; B < \mathrm{OSR}. \tag{7}$$
Consequently, when the OSR is larger than the number of TI channels, costly tuning or correction of TI errors is not necessary, and the spurs only need to be removed by the decimation filter at the output. In contrast, the major nonidealities of a pipelined QTZ, such as residue gain error, would still need to be as good as the overall resolution of the QTZ. This highlights that the proposed architecture, even though usable with any multi-step QTZ within a DSM, is particularly favorable for employing a TI-SAR. Although Eq. (7) defines a new design trade-off, which limits the maximum number of TI channels to B < OSR, this is no practical limitation: only a few SoA DSM implementations use an OSR as low as 6 or below. Even at such an extremely low OSR, time-interleaving of the internal QTZ with B = 2 … 5 channels is still possible without requiring any correction. Note that OOB blocker signals could violate this condition, since Eq. (6) assumes that the input lies within the signal band. In that case, two additional effects within a CTDSM alleviate problems due to TI nonidealities. Firstly, the input signal is filtered by the signal transfer function (STF) before arriving at the TI-QTZ. This is one of the most important features of CTDSMs, especially when OOB blockers are of concern, and typically a maximally flat low-pass STF with strict out-of-band roll-off is implemented.
Thus, any OOB signal is attenuated before it reaches the TI-QTZ and before it can potentially be mixed into the signal band by TI nonidealities. Secondly, any remaining TI-related error, being introduced at the QTZ, is subject to the full noise shaping of the NTF. To conclude, TI nonidealities are of little practical concern when a TI-QTZ is used as the internal QTZ of a CTDSM.
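Eqs. (6) and (7) can be checked numerically. The sketch below assumes the usual first-order TI spur model (offset spurs at k·f_s/B, gain and timing spurs at k·f_s/B ± f_in, harmonics ignored) and a worst-case in-band tone at f_in = f_b; the function names are hypothetical. Exact fractions avoid rounding ambiguities right at the band edge.

```python
from fractions import Fraction

def fold(f, fs=Fraction(1)):
    """Fold a frequency into the first Nyquist zone [0, fs/2]."""
    f %= fs
    return fs - f if f > fs / 2 else f

def ti_spur_frequencies(b_channels, f_in, fs=Fraction(1)):
    """Basic spur set of a B-channel TI-QTZ for an input tone at f_in.

    Offset mismatch: k*fs/B; gain and timing mismatch: k*fs/B +/- f_in,
    for k = 1 .. B-1 (harmonics and higher-order products ignored)."""
    if b_channels < 2:
        raise ValueError("time interleaving needs at least two channels")
    spurs = set()
    for k in range(1, b_channels):
        base = k * fs / b_channels
        spurs.update({fold(base, fs), fold(base + f_in, fs), fold(base - f_in, fs)})
    return sorted(spurs)

def ti_spurs_out_of_band(b_channels, osr, fs=Fraction(1)):
    """True if all spurs of a worst-case in-band tone (f_in = f_b) lie above f_b."""
    f_b = fs / (2 * osr)
    return min(ti_spur_frequencies(b_channels, f_b, fs)) > f_b

print(ti_spurs_out_of_band(3, 8))   # True:  B = 3 < OSR = 8
print(ti_spurs_out_of_band(8, 6))   # False: B = 8 > OSR = 6
```

The lowest spur returned for f_in = f_b is f_s/B − f_b, in line with Eq. (6).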

D. Practical DAC Considerations
The feedback DACs in DSMs typically have the same resolution as the internal QTZ. The outermost feedback DAC is especially critical regarding ADC linearity. Most SoA DSMs realize thermometer-coded unary DACs with bitwidths of 3–5 bits in order to achieve a decent differential nonlinearity (DNL). Analog and digital calibration techniques are implemented in most designs exceeding 12-bit linearity [16]. When the bitwidth is increased, as proposed in combination with the delay error shaping architecture, the additional effort for the high-resolution DAC can be kept small by adding a segmented LSB section [3], [4]. The conventional low-resolution MSB section carries most of the signal power and also causes significant loading of the input summation node. Thus, its unary implementation with very good linearity control is mandatory, which can be realized either by calibration or by shaping/dynamic element matching (DEM). When an LSB section is added to form a segmented high-resolution DAC, its impact on capacitive amplifier loading is equivalent to only one additional thermometer element of the MSB section. As the delay error shaping architecture already segments the internal bits, it is natural to segment the feedback DACs accordingly.
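The segmentation argument can be illustrated with a small, idealized Python sketch (hypothetical helper names, unsigned codes, no mismatch): an N-bit code is split into 2^L − 1 unary MSB cells and N − L binary LSB cells. The complete LSB section weighs 2^(N−L) − 1 LSB units, i.e. less than one extra unary MSB cell, which is why its loading is comparable to a single additional thermometer element.

```python
def segment_code(code, n_bits, l_msb):
    """Split an unsigned N-bit code into 2^L - 1 unary MSB cells and N - L binary LSB cells."""
    if not 0 <= code < 2 ** n_bits:
        raise ValueError("code out of range")
    unit = 2 ** (n_bits - l_msb)            # weight of one unary MSB cell in LSB units
    msb, lsb = divmod(code, unit)
    thermometer = [1] * msb + [0] * (2 ** l_msb - 1 - msb)
    lsb_bits = [(lsb >> b) & 1 for b in range(n_bits - l_msb)]   # index 0 = LSB
    return thermometer, lsb_bits

def segmented_dac(thermometer, lsb_bits, n_bits, l_msb):
    """Ideal (mismatch-free) DAC output in units of one N-bit LSB."""
    unit = 2 ** (n_bits - l_msb)
    return unit * sum(thermometer) + sum(bit << b for b, bit in enumerate(lsb_bits))

therm, lsbs = segment_code(0b1011001101, n_bits=10, l_msb=5)
print(sum(therm), lsbs)                          # 22 [1, 0, 1, 1, 0]
print(segmented_dac([0] * 31, [1] * 5, 10, 5))   # 31: full LSB section < one MSB cell (32)
```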
On a side note, the inherent delay of the LSBs in the proposed technique has a beneficial side effect on the spurious-free dynamic range (SFDR) compared to a standard N-bit segmented DAC. Although the overall N-bit (segmented) DAC is not modified, the (N − L) LSBs represent the quantization noise of an L-bit QTZ; thus they have dither/noise-like properties, which become decorrelated from the MSB DAC due to the intentional z^{-x} delay. This effect has also been explained in [17]. Behavioral simulations of the proposed delay error shaping architecture showed an SFDR improvement of more than 10 dB in the presence of outer DAC LSB mismatch, compared to a modulator using an equally segmented DAC without LSB delay.
To conclude the DAC considerations, extending the internal bitwidth and applying delay error shaping adds only minor complexity to the critical parts of the feedback DACs and inherently includes shaping of the LSB DAC mismatch error, with relaxed matching requirements.

E. Digital Correction Filter
The main modification compared to a conventional modulator when using the proposed delay error shaping technique is the addition of the digital correction filter H_corr(z) given in Eq. (3). It acts only on the output LSBs of the modulator; as it is not part of the feedback loop, it is not critical regarding speed or loop stability. A simple exemplary realization of this filter is given in Fig. 5, which consists of delayed LSB addition, first-order shaping and an R-tap finite impulse response (FIR) filter. The FIR coefficients f_0 … f_{R−1} are chosen based on the analog loop filter coefficients to replicate the NTF impulse response of the modulator, which ultimately needs to match the in-band portion of the analog NTF to restore ideal performance. Inaccuracies of this FIR filter, such as a finite number of taps and limited coefficient precision, affect the modulator's performance similarly to analog mismatch; therefore, they also benefit from the reduced sensitivity due to first-order shaping.
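One possible streaming realization of the structure described for Fig. 5 is sketched below; it is an assumption, not the authors' circuit. Each clock, the MSBs of the newest sample and the LSBs of the sample taken x clocks earlier arrive. The MSBs are buffered for x clocks, so the output is the corrected sample V[n − x] = V_MSB[n − x] + V_LSB[n − 2x] + Σ_k f_k·(V_LSB[n − x − k] − V_LSB[n − 2x − k]), i.e. H_corr(z) = z^{-x} + (1 − z^{-x})·FIR(z) applied with an overall output latency of x clocks. The class name and the toy FIR taps are hypothetical; a batch reference on time-aligned sequences is included for comparison.

```python
from collections import deque

class StreamingLsbCorrection:
    """Hypothetical streaming form of H_corr(z) = z^-x + (1 - z^-x) * FIR(z)."""

    def __init__(self, x, fir):
        if x < 1 or not fir:
            raise ValueError("need x >= 1 and at least one FIR tap")
        self.x, self.fir = x, list(fir)                   # f_0 ... f_{R-1}
        self.msb_fifo = deque([0.0] * x)                  # MSBs of the last x samples
        size = len(self.fir) + x                          # LSB history needed
        self.lsb_hist = deque([0.0] * size, maxlen=size)

    def push(self, msb_new, lsb_late):
        """One clock: MSB of sample n and LSB of sample n - x arrive; returns V[n - x]."""
        msb_old = self.msb_fifo.popleft()                 # V_MSB[n - x]
        self.msb_fifo.append(msb_new)
        h = self.lsb_hist
        h.appendleft(lsb_late)                            # h[j] = V_LSB[n - x - j]
        shaped = sum(f * (h[k] - h[k + self.x]) for k, f in enumerate(self.fir))
        return msb_old + h[self.x] + shaped               # delayed LSB + shaped FIR term

def batch_correction(v_msb, v_lsb, x, fir):
    """Reference on aligned sequences: V[m] = V_MSB[m] + V_LSB[m-x] + sum_k f_k (V_LSB[m-k] - V_LSB[m-k-x])."""
    def at(k):
        return v_lsb[k] if k >= 0 else 0.0
    return [v_msb[m] + at(m - x)
            + sum(f * (at(m - k) - at(m - k - x)) for k, f in enumerate(fir))
            for m in range(len(v_msb))]

filt = StreamingLsbCorrection(x=1, fir=[1.0, -2.0, 1.0])   # toy FIR taps
outputs = [filt.push(msb, lsb) for msb, lsb in [(0.5, 0.0), (0.25, 0.01), (0.0, -0.02)]]
```

Because the FIR sits entirely outside the loop, its latency only adds to the output delay, which matches the statement above that the filter is not speed- or stability-critical.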

IV. ANALYSIS OF RELATED ARCHITECTURES
Having established the delay error shaping architecture, we compare and evaluate it against similar SoA architectures, even though these were proposed for different purposes. Besides the few prior works that proposed architectures for ELD of τ_d < 2 clock cycles (cf. Section II), two further architectures are investigated below, which, with minor changes to a conventional DSM architecture, can similarly employ the MSB/LSB separation of the delay error shaping approach. The Leslie-Singh (X-0 MASH) architecture [18] and the digital noise-coupling approach of [19] are two alternative ways of processing late LSBs in a DSM differently from early MSBs. Both concepts are applied to the same scenario of a wideband CTDSM with a high-resolution TI-QTZ and analyzed in the following.

A. Leslie-Singh Architecture
The X-0 MASH, i.e. Leslie-Singh architecture [18], can be understood as a DSM with a high-resolution QTZ, where the QTZ output is split into an MSB and an LSB section, but only the MSBs are used in the feedback of the modulator. The LSBs are digitally subtracted from the coarsely quantized DSM output via a cancellation filter. Although this has not been shown in the literature, a TI-QTZ can favorably be used within a Leslie-Singh architecture: the LSBs then have additional delay, which can easily be applied to both digital output paths to synchronize the bits. Thus, the LSBs can have unlimited delay, as long as the MSBs are fed back within the first clock cycle (τ_d < 1), where ELD can be conventionally compensated. This TI-QTZ Leslie-Singh architecture is illustrated in Fig. 6a. Analysis of its linearized model in Fig. 6b yields
$$V_q(z) = \mathrm{STF}(z)\,U(z) + \mathrm{NTF}(z)\,E_q(z) + \bigl(1-\mathrm{NTF}(z)\bigr)\,V_{q,\mathrm{LSB}}(z) \tag{8}$$
for the modulator output. To cancel the LSB error in the corrected output V(z) = V_q,MSB(z) + H_corr(z)·V_q,LSB(z), a digital correction filter with
$$H_\mathrm{corr}(z) = \mathrm{NTF}_d(z) \tag{9}$$
is necessary. Compared to the delay error shaping approach (cf. Section III), the additional first-order shaping property of Eq. (3) is lost. This predictably makes the matching requirements of the digital filter more stringent, which is the known disadvantage of MASH architectures. On the other hand, the Leslie-Singh architecture is simpler than the delay error shaping architecture in Fig. 2 due to the missing LSB DAC, as the LSB section in the X-0 MASH operates completely independently of the feedback loop in a feedforward fashion.

B. Digital Noise Coupling Architecture
Another relevant architecture was proposed in [19] for digital noise coupling; it targeted a DT modulator and aimed to mitigate the exponential growth of the DEM linearization logic in the feedback path of a DSM by feeding back MSBs and LSBs separately at different locations in the loop. The solution in [19] was to feed back the LSBs only around the QTZ, with a one-clock-cycle delay for causality reasons. That paper considered neither ELD in general nor, specifically, ELD of more than one clock cycle, e.g. due to a high-resolution TI-QTZ; however, the resulting architecture can be adapted to the motivation of this paper and used for a similar delay error shaping.
The corresponding block diagram is shown in Fig. 7a. As in the proposed approach of Fig. 2a, the (early) MSBs are fed back within the first clock cycle in the outer loop. In contrast to Fig. 2a, the (late) LSBs are coupled into the loop only through a direct path around the QTZ. Going beyond the original architecture of [19], an additional delay x is allowed in the LSB loop in order to enable the same kind of TI-QTZ as in the proposed architecture. In the resulting architecture in Fig. 7a, an uncompensated ELD error (of the LSBs) is thereby introduced, as the LSBs are not properly processed within the loop.
As for the approaches in Sections III and IV-A, the introduced error can be corrected by a digital filter. CT-DT conversion and linearization of the QTZ lead to the equivalent model in Fig. 7b. It turns out to be closely related to the delay error shaping model in Fig. 2b, with the key difference that the LSB feedback is not processed by the whole loop filter L_1(z), but only fed back around the QTZ. The modulator output becomes
$$V_q(z) = \mathrm{STF}(z)\,U(z) + \mathrm{NTF}(z)\,E_q(z) + \Bigl[1-\mathrm{NTF}(z)\bigl(1-z^{-(x+1)}\bigr)\Bigr]\,V_{q,\mathrm{LSB}}(z). \tag{10}$$
In order to cancel the LSB error entirely in V(z) = V_q,MSB(z) + H_corr(z)·V_q,LSB(z), the digital correction filter needs to be
$$H_\mathrm{corr}(z) = \bigl(1-z^{-(x+1)}\bigr)\,\mathrm{NTF}_d(z). \tag{11}$$
Compared to Eq. (3), error shaping is present but degraded due to the effective delay of (x + 1) in the LSB feedback. This means that even for a single additional clock cycle of LSB delay, ideal first-order shaping by (1 − z^{-1}) cannot be achieved. Referring to Fig. 4, the LSB error shaping in the digital noise-coupling architecture is degraded to the next worse transfer function compared to the proposed delay error shaping in Fig. 2, and thus increased LSB error leakage is expected.
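Combining Eq. (2) with Eq. (3), Eq. (8) with Eq. (9), and Eq. (10) with Eq. (11) for a mismatched NTF_d, the remaining leakage has the form (NTF_d − NTF)·S(z)·V_q,LSB, with S(z) = 1 − z^{-x} for delay error shaping, S(z) = 1 for Leslie-Singh and S(z) = 1 − z^{-(x+1)} for digital noise coupling. The sketch below compares only the band-averaged power of S(z), assuming that (NTF_d − NTF) is roughly flat in band and that the LSB power is the same in all three cases; it is a first-order illustration with hypothetical helper names, not a substitute for the Monte Carlo results.

```python
import math

def band_avg_power_db(delay, osr):
    """Band average of |1 - exp(-j w delay)|^2 over 0 <= w <= pi/OSR, in dB.

    Closed form of (1/w_b) * integral of (2 - 2 cos(w delay)) dw with w_b = pi/OSR."""
    a = delay * math.pi / osr
    return 10 * math.log10(2.0 * (1.0 - math.sin(a) / a))

def leakage_shaping_db(x, osr):
    """In-band power of the LSB leakage shaping factor S(z) per architecture."""
    return {
        "Leslie-Singh, Eq. (9)": 0.0,                          # S(z) = 1
        "delay error shaping, Eq. (3)": band_avg_power_db(x, osr),
        "digital noise coupling, Eq. (11)": band_avg_power_db(x + 1, osr),
    }

for name, db in leakage_shaping_db(x=1, osr=8).items():
    print(f"{name:34s} {db:6.1f} dB")
```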

V. SIMULATION RESULTS
System-level simulations are performed using Matlab/Simulink. Unless mentioned otherwise, the simulation example is a 3rd-order CT CIFF/FB modulator with an OSR of 8; it uses an N-bit B-TI-QTZ, which is divided into L early MSBs (with τ_d = 0.5) and (N − L) late LSBs; the LSBs are distributed over several following clock cycles (with delays τ_d + x_i, x_i = 1, 2, …). The underlying architecture is, e.g., an internal B-TI-QTZ, where N bits are successively converted over B clock cycles, while the L MSBs of each sub-channel are fed back into the DSM within the first clock cycle with proper ELD compensation. As explained in Section III-C, TI nonidealities in the given scenario usually fall out of band or are attenuated by the combination of STF and NTF. Thus, matched sub-channels are assumed. As shown in the following, L = 5 MSBs is a favorable choice and a reasonable compromise between MSA and QTZ speed requirements. The specific delays of the LSBs are more flexible, as any number 2 … L of LSBs per clock cycle would be feasible and fulfill the condition in Eq. (7). We have chosen 3-bit LSB groups per clock cycle for illustration, which leaves some time margin that can be exploited for improved input-referred noise [20] or redundant cycles [21]. With the bits numbered from the MSB (bit 1) to the LSB (bit N), the resulting additional delay of LSB bit b is then
$$x_b = \left\lceil \frac{b-L}{3} \right\rceil, \qquad b = L+1, \dots, N. \tag{12}$$
The NTF, which includes a resonator around the last two integrators, is generated with the Delta-Sigma Toolbox [22] using an OOB gain of 2.5, a commonly chosen value for multibit DSMs in the SoA [4]. Each integrator is modeled by a clipping 1-pole model with a DC gain of 60 dB and a finite gain-bandwidth product (GBW) of f_s, 2f_s and 3f_s for the first, second and third integrator, respectively.
Coefficients are generated by closed-loop fitting [23] using the impulse-invariant transformation, assuming non-return-to-zero (NRZ) DACs in the feedback paths and a capacitive DAC into the last integrator to realize the 0th-order path around the QTZ for ELD compensation [24]. ELD compensation is calculated for the nominal MSB delay of τ_d = 0.5.
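Eq. (12) and the constraint x < B from Section III-A can be tabulated with a few lines of Python; the grouping of 3 LSBs per clock cycle follows the example above, and the function names are hypothetical.

```python
def lsb_delays(n_bits, l_msb, group=3):
    """Extra clock cycles x_b of each LSB b = L+1 .. N (bit 1 = MSB), cf. Eq. (12)."""
    # integer ceiling of (b - L) / group
    return {b: (b - l_msb + group - 1) // group for b in range(l_msb + 1, n_bits + 1)}

def min_ti_channels(n_bits, l_msb, group=3):
    """Smallest channel count B with x < B for the latest LSB group."""
    return max(lsb_delays(n_bits, l_msb, group).values(), default=0) + 1

print(lsb_delays(10, 5))        # {6: 1, 7: 1, 8: 1, 9: 2, 10: 2}
print(min_ti_channels(10, 5))   # 3, i.e. B = 3 < OSR = 8 satisfies Eq. (7)
```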
The system-level simulations are used to compare the three architectures. The aim is to highlight their differences with respect to OSR, internal bitwidth, LSB delays and analog vs. digital mismatch. Firstly, the architectures are expected to differ in their impact on the MSA, as uncompensated ELD reduces the MSA, and both the digital noise-coupling and the delay error shaping architecture are expected to suffer from this. Secondly, following the above derivations, mismatch and digital correction filter inaccuracies are expected to affect the architectures differently for different OSRs.

A. Maximum Stable Amplitude
The more early, properly ELD-compensated MSBs are used, the fewer improperly compensated LSBs remain, which negatively affect the internal swings and the MSA. The MSA is therefore simulated next for an exemplary 10-bit internal QTZ, e.g. realized with a TI-SAR ADC, while varying the number L of early MSBs that are fed back in the first clock cycle at τ_d = 0.5. In each following clock cycle, the next 3 of the remaining LSBs are fed back with an additional T_s delay in the case of delay error shaping (Fig. 2a) and digital noise coupling (Fig. 7a). Moreover, the NTF OOB gain is varied between 2, 2.5 and 3, where a higher OOB gain generally results in a stronger tendency to instability and thus a lower MSA; loop filter and feedback DAC swing scaling is kept equal across all modulators in all simulations for comparability, although in practice one could scale the swings to optimize the MSA. Note also that the STF is equal for all modulators regardless of delayed or missing LSB feedback: as long as V_q,LSB (cf. Eqs. (2), (8) and (10)) can be assumed to behave like the white quantization noise of an L-bit QTZ, the STF is not affected.
The MSA results, taken at the point of maximum SQNR for an input signal at f_in = 0.05·f_b (where the STF is 0 dB), are presented in Fig. 8. As expected, the MSA improves with a higher number L of MSBs for all architectures. This is intuitive, as the amount of shaped quantization noise in the loop is directly affected. For increasing NTF OOB gain, the OOB quantization noise increases due to the more aggressive NTF, so the internal swings increase and the MSA degrades for all analyzed modulators. The feasible OOB gain is thus limited to about 3, which is within the range of typical values chosen in the SoA.
The Leslie-Singh approach outperforms the other approaches for fewer than 5 MSBs. The reason is that in the digital noise-coupling and delay error shaping architectures, additional noise from the first-order-shaped LSBs is added into the loop, which has a larger impact on the internal swings when the number of MSBs is small. A tipping point, beyond which more MSBs no longer improve the swing and MSA of the modulator, exists because, for a very low OSR and a fast input signal, the step size of the feedback DAC output is much larger than an LSB. The DAC output rather exhibits step sizes in the range of a low-resolution DAC, meaning that not all high-resolution QTZ bits are needed to keep the internal swings and MSA at a good value.
The presented results show that delay error shaping and similar approaches become feasible with 4 early MSBs, while 5 or more MSBs are preferred, depending on the NTF aggressiveness, to prevent a loss in MSA. As shown in recent SoA wideband CTDSM examples, this is achievable even at very high sampling rates.

B. Analog-Digital Filter Mismatch
Digital correction relies on rebuilding a part of the analog loop filter transfer function; thus, it suffers from mismatch and noise leakage. From the analysis above, Leslie-Singh is expected to suffer the most, the noise-coupling architecture less, and the proposed delay error shaping technique the least, because the latter two modulators still feed the LSBs into the loop, where they are partially processed. To show this, a Monte Carlo simulation of 100 runs with varying loop filter coefficients was performed, while the digital filter was fixed to the ideal NTF. A normally distributed local coefficient variation with σ_loc = 0.3% is assumed, which models random local mismatch. Additionally, a uniformly distributed global variation of ±2% is applied equally to all loop filter coefficients as a relative deviation from their nominal values, which represents the remaining deviation due to process spread after the coarse loop filter tuning usually applied in CTDSMs. Also, global normally distributed amplifier deviations of σ_GBW = 10% and σ_DC = 3 dB are applied.
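The variation model described above could be drawn, for instance, as in the following Python sketch. How the global and local deviations are combined (here multiplicatively) and the nominal coefficient values are assumptions; the integrator GBWs of f_s, 2f_s and 3f_s and the 60 dB DC gain follow the simulation setup above.

```python
import random

def perturb_loop_filter(coeffs, rng, sigma_loc=0.003, global_span=0.02):
    """One Monte Carlo draw of loop filter coefficients.

    A single uniform global deviation in [-global_span, +global_span] is shared
    by all coefficients (residual error after coarse RC tuning); each
    coefficient additionally gets an independent normal local deviation."""
    g = rng.uniform(-global_span, global_span)
    return [c * (1.0 + g) * (1.0 + rng.gauss(0.0, sigma_loc)) for c in coeffs]

def perturb_amplifiers(gbws, dc_gains_db, rng, sigma_gbw=0.10, sigma_dc_db=3.0):
    """Global amplifier spread: one relative GBW factor and one DC gain offset (dB) for all."""
    k_gbw = 1.0 + rng.gauss(0.0, sigma_gbw)
    d_dc = rng.gauss(0.0, sigma_dc_db)
    return [g * k_gbw for g in gbws], [a + d_dc for a in dc_gains_db]

rng = random.Random(2024)
nominal = [0.8, 0.4, 0.15, 1.1]    # hypothetical loop filter coefficient values
runs = []
for _ in range(100):               # 100 Monte Carlo runs, as in the text
    coeffs = perturb_loop_filter(nominal, rng)
    gbws, dcs = perturb_amplifiers([1.0, 2.0, 3.0], [60.0] * 3, rng)  # GBW in units of f_s
    runs.append((coeffs, gbws, dcs))
```

Each tuple in `runs` would then parameterize one modulator simulation, while the digital correction filter stays fixed to the nominal NTF.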
The results are shown in Fig. 9 for the exemplary 3rd-order modulator with 10-bit TI-QTZ, an OSR of 8, L = 5 MSBs, 3 LSBs per following clock cycle and an input amplitude of −6 dBFS. As known from SoA, a conventional multibit modulator would perform very robustly against moderate coefficient variations, as no matching requirements are present and slight changes in the NTF are negligible. In contrast, the Leslie-Singh architecture heavily depends on good matching: even with the reasonably small coefficient variation in the simulation and the very small OSR, which usually reduces the tendency to leakage [14], the mean SQNR drops by 7 dB compared to nominal coefficients, and the standard deviation increases significantly to 4.7 dB. Both the delay error shaping and the digital noise-coupling approach mitigate this drawback, but the proposed delay error shaping architecture from Fig. 2a significantly outperforms the other approaches, which matches the analysis throughout Section III and Section IV. A small SQNR spread remains due to the limited first-order shaping, but overall the ideal high resolution is replicated much more closely by the delay error shaping architecture, with a mean SQNR loss of only 2.1 dB compared to its mismatch-free SQNR of 95.6 dB.

C. Digital Filter Accuracy
Similar to the impact of analog mismatch on the modulator performance, simplifications and coefficient inaccuracies in the digital correction filter $H_{\mathrm{corr}}(z)$, e.g., due to bitwidth truncation, can lead to increased LSB noise leakage. An exemplary correction filter for the proposed delay error shaping architecture was shown in Fig. 5. Next, the impact of a finite number R of FIR taps as well as a finite FIR coefficient fractional precision is simulated. Note that the matching-critical part is solely the R-tap FIR filter, which is realized in the same way across all architectures.
The results are shown in Fig. 10 for the exemplary 3rd-order modulator with 10-bit TI-QTZ, an OSR of 8 and L = 5 MSBs. In Fig. 10a, the number of FIR taps R is swept from 2 to 18, while the individual coefficients are simulated ideally (with double precision). Coefficients of higher taps are simply truncated, without any modification or optimization of the filter. In Fig. 10b, the coefficients' fractional precision, i.e., the number of bits after the binary point at which the coefficient value is truncated, is swept, while the number of FIR taps is set to a sufficiently large value (R = 20).
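The two simplifications swept in Fig. 10 can be sketched as follows: keep only the first R taps of a correction filter impulse response and optionally truncate each coefficient to a given number of fractional bits. This Python sketch assumes truncation toward zero and uses a made-up impulse response `h`; it is not the correction filter of Fig. 5. The squared error against the full response is a crude, hypothetical proxy for how closely the simplified filter approximates the ideal one.

```python
import math


def simplify_fir(h, taps, frac_bits=None):
    """Keep the first `taps` coefficients of h; optionally truncate each one
    toward zero to `frac_bits` fractional bits."""
    kept = list(h[:taps])          # higher taps are dropped, no re-optimization
    if frac_bits is None:
        return kept                # ideal (double-precision) coefficients
    scale = 2 ** frac_bits
    return [math.trunc(c * scale) / scale for c in kept]


def approx_error(h, simplified):
    """Squared error between the full response h and the simplified one."""
    padded = simplified + [0.0] * (len(h) - len(simplified))
    return sum((a - b) ** 2 for a, b in zip(h, padded))


h = [1.0, -0.7, 0.3, -0.1]         # made-up impulse response for illustration
print(simplify_fir(h, 3, frac_bits=2))   # [1.0, -0.5, 0.25]
print(approx_error(h, simplify_fir(h, 2)))
```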
All architectures achieve close to ideal performance when using R ≥ 14 taps with a precision of 13 bits or more. However, the delay error shaping architecture slightly outperforms the others: the relaxed analog-digital matching requirements due to first-order shaping allow the necessary FIR filter to be simplified down to R = 10 taps with 9-bit precision while keeping the SQNR close to ideal. Note that local peaks are present because truncating both the filter length and the coefficient precision can coincidentally improve the approximation of the ideal NTF impulse response; this opens the possibility for further optimization by slightly modifying the target NTF response to simplify the filter requirements.
While these exemplary results generally depend on the order of the DSM, the MSB segmentation ratio and the required overall resolution, the relaxed analog-digital matching due to delay error shaping can be exploited to simplify the digital correction compared to, e.g., a Leslie-Singh (X-0 MASH) approach.

D. Internal Resolution and OSR
To allow a favorable architectural choice, we generate a design map covering the most important design choices among the Leslie-Singh, the noise-coupling and the delay error shaping architectures. Results for several (high) QTZ resolutions and several (low) OSRs, simulated over mismatch for an exemplary 3rd-order CTDSM, are given in tabular form to highlight the trends of each presented architecture. Table I reports the MSA (chosen at the point of maximum SQNR) and the achieved SQNR over mismatch (mean, sigma and 3-sigma minimum over 100 MC simulations). Additionally, the minimum number of digital correction filter FIR taps and FIR coefficient fractional bits needed to achieve an SQNR within 2 dB of its ideal value, cf. Section V-C, are reported. The mismatch settings are the same as in Section V-B, and the number of early MSBs is L = 5 for all systems. The remaining LSBs are again fed back in groups of up to 3 bits for delay error shaping and digital noise-coupling.
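The statistics reported in Table I can be computed from the per-run SQNR values as sketched below. The sketch assumes that the 3-sigma minimum is the mean minus three sample standard deviations, and it defines the minimum number of taps (or fractional bits) as the smallest swept value from which all larger values stay within 2 dB of the ideal SQNR, so that a coincidental local peak (cf. Section V-C) is not counted. Both definitions, the function names and the example numbers are assumptions for illustration, not data from the paper.

```python
import statistics


def mc_summary(sqnr_db):
    """Mean, sample standard deviation and 3-sigma minimum of MC SQNR runs."""
    mean = statistics.mean(sqnr_db)
    sigma = statistics.stdev(sqnr_db)
    return mean, sigma, mean - 3.0 * sigma


def min_passing(sweep, ideal_db, tol_db=2.0):
    """Smallest swept value from which every larger value is within tol_db
    of ideal_db; None if even the largest swept value fails."""
    best = None
    for value in sorted(sweep, reverse=True):
        if sweep[value] < ideal_db - tol_db:
            break
        best = value
    return best


# Made-up numbers: taps -> SQNR in dB, with a coincidental peak at 4 taps.
sweep = {2: 70.0, 4: 99.0, 6: 95.0, 8: 98.5, 10: 99.5}
print(min_passing(sweep, ideal_db=100.0))  # 8
```

Scanning from the largest swept value downward makes the criterion ignore the isolated pass at 4 taps, which a simple "first value that passes" rule would report.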

VI. DISCUSSION AND DESIGN GUIDELINES
Studying Table I, all architectures achieve almost identical nominal performance. The maximum SQNR is only lowered by the MSA, which is around −1 dB for Leslie-Singh and around −2 dB for digital noise-coupling and delay error shaping. This result shows that even with a slow QTZ with an overall ELD of $\tau_d \geq 2$ in the LSBs, the performance of a high-speed, high-resolution conventional DSM can be achieved with the proposed techniques. In the presence of coefficient and integrator mismatch, the architectures are affected differently, as was also seen in the specific example in Fig. 9.
Leslie-Singh trends: First, the standard deviation under mismatch stays almost constant across different OSRs, which is expected, as the ratio between the inband quantization noise $E_q$ and the leaked LSB delay error $V_{q,\mathrm{LSB}}$ in Eq. (8) is not affected. Second, both the standard deviation and the mean SQNR under mismatch deteriorate drastically relative to the ideal SQNR as the internal resolution increases. This is because, in our example, the number of early MSBs is fixed while the number of late LSBs increases, and in Leslie-Singh these LSBs determine the resolution of the 2nd MASH stage. This directly contributes to the amount of leaked $V_{q,\mathrm{LSB}}$.
Delay error shaping trends: a clear improvement with increasing OSR can be seen in the standard deviation and in the difference between mean and ideal SQNR. For higher oversampling, the robustness against mismatch increases in accordance with Eq. (5), as the LSB delay error is shaped. Although mismatch affects the delay error shaping architecture more as the number of (late) bits increases, the loss is significantly lower than for the Leslie-Singh architecture. The absolute matching advantage becomes more prominent for a larger number of (late) internal bits, which are fed back with an ELD of multiple clock cycles in the delay error shaping approach, as well as for increasing OSR. Due to the inherent shaping, this approach also outperforms the others regarding digital filter complexity, typically saving 1 to 4 FIR taps and 1 to 3 FIR coefficient fractional bits.
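The OSR trend can be illustrated with an idealized calculation: if the leaked LSB delay error were white and shaped by an ideal first-order filter $1 - z^{-1}$, its in-band power relative to the unshaped case follows from integrating $|1 - e^{-j\omega}|^2 = 2 - 2\cos\omega$ over $[0, \pi/\mathrm{OSR}]$, which gives $2 - \frac{2\,\mathrm{OSR}}{\pi}\sin\frac{\pi}{\mathrm{OSR}}$. The Python sketch below evaluates this ratio; it is a textbook first-order shaping result, not the paper's Eq. (5), and it ignores the loop filter and the actual leakage path.

```python
import math


def first_order_inband_gain_db(osr: float) -> float:
    """In-band power of white noise shaped by (1 - z^-1), relative to the
    unshaped white noise, for a lowpass band [0, pi/OSR]."""
    wb = math.pi / osr
    # (1/wb) * integral_0^wb (2 - 2cos w) dw = 2 - 2 sin(wb)/wb
    ratio = 2.0 - 2.0 * math.sin(wb) / wb
    return 10.0 * math.log10(ratio)


for osr in (6, 8, 10):
    print(osr, round(first_order_inband_gain_db(osr), 1))
```

Under these assumptions, the in-band attenuation grows from roughly 10 dB at an OSR of 6 to roughly 13 dB at 8 and 15 dB at 10, which illustrates why first-order shaping of the leaked error pays off more at higher OSR.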
Digital noise-coupling trends: this approach performs worse than both Leslie-Singh and delay error shaping concerning MSA and lies somewhere in between concerning matching robustness; this confirms the analysis of the reduced shaping of the leaked error in Section IV. It does not stand out in any specific characteristic and provides a medium matching improvement at a slightly degraded MSA.
As a metric for comparison, the difference between the 3-sigma minima of the analyzed modulators can be used. The proposed delay error shaping approach increasingly outperforms the Leslie-Singh architecture with both increasing OSR and increasing overall QTZ resolution N. The two corner cases are an improvement of around 3 dB for 8 bits at an OSR of 6 and an improvement of around 14 dB for 12 bits at an OSR of 10. Although Leslie-Singh is inferior even in the lower resolution scenario, the improvement under mismatch is much less significant in this case, while the Leslie-Singh circuitry is less complex.
Thus, if a 7 to 9-bit internal resolution at a very low OSR < 8 is targeted, the Leslie-Singh architecture including a TI-QTZ is a viable candidate, as its matching requirements are manageable and generally lower than those of, e.g., a comparable 2-1 MASH architecture [14].
The tipping point at which the delay error shaping approach noticeably outperforms Leslie-Singh also depends on the expected loop filter mismatch and the applied trimming accuracy. For the exemplary global coefficient variation of ±2% in the presented results, delay error shaping with N = 10 already achieves a benefit of 9.4 dB at the very low OSR of 6, which increases to 13 dB at an OSR of 10. For the given modulator scenario, this also seems to be the sweet spot between a decent improvement compared to Leslie-Singh and a limited loss in mean SQNR compared to the ideal case. Higher internal resolution and OSR favor the delay error shaping approach even more; however, the deviation from the maximum possible SQNR under mismatch grows.
While the L-bit MSB DAC is the same for all architectures, the LSB feedback DAC segment is a key difference between the presented architectures. For Leslie-Singh, no LSB DAC is required at all. As described in Section III-D, the proposed delay error shaping approach requires an additional (N − L)-bit LSB DAC segment, but with significantly relaxed matching requirements compared to a standard segmented DAC of N-bit resolution, leading to rather low area and power consumption. This effect is even more pronounced in the digital noise-coupling approach, as the LSB DAC segment is only required in the innermost feedback loop, and any of its errors are subject to noise shaping by the preceding loop filter.
Finally, as the presented architectures have individual advantages and at the same time follow a very similar structure, the different approaches can be combined within the same DSM, which provides even more degrees of freedom in the design. As each LSB can be treated individually in both the feedback DACs and the digital filter, mixing different approaches is as simple as adding (or removing) a bit in the DACs or applying digital first-order shaping in front of the R-tap FIR filter. For instance, for the exemplary 10-bit QTZ with 5 early MSBs, one could apply delay error shaping to the following 3 LSBs to significantly improve matching robustness, while the remaining 2 LSBs could be processed in a Leslie-Singh manner to reduce additional DAC effort.
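The bit-level routing in this example can be sketched by splitting the quantizer output code into MSB-first fields: 5 early MSBs, 3 LSBs for delay error shaping and 2 LSBs for Leslie-Singh processing. The Python helpers are hypothetical; they assume an unsigned binary output code and ignore both the fact that the LSB fields of a TI-QTZ only become available in later clock cycles and any redundancy in the SAR or pipeline coding.

```python
def split_code(code, groups=(5, 3, 2)):
    """Split an unsigned quantizer code into MSB-first bit fields."""
    n_bits = sum(groups)
    if not 0 <= code < 2 ** n_bits:
        raise ValueError("code out of range")
    fields, shift = [], n_bits
    for width in groups:
        shift -= width
        fields.append((code >> shift) & ((1 << width) - 1))
    return fields


def join_fields(fields, groups=(5, 3, 2)):
    """Inverse of split_code: recombine the fields with binary weights."""
    code = 0
    for value, width in zip(fields, groups):
        code = (code << width) | value
    return code


msb, lsb_des, lsb_ls = split_code(0b1011001101)
# msb     -> MSB DAC (conventionally ELD-compensated)
# lsb_des -> delay error shaping path (LSB DAC + digital first-order shaping)
# lsb_ls  -> Leslie-Singh path (digital correction only, no LSB DAC)
```

Changing the `groups` tuple is all it takes to move a bit from one treatment to another in this sketch, mirroring the per-LSB flexibility described above.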
To return to the motivation of this work, the goal was to find a way of including a high resolution TI-QTZ within a DSM. Such an 8/10/12-bit TI-QTZ DSM is not practically realizable at high sampling rates using conventional ELD design strategies. Instead, the best solution is to approximate the ideal modulator as closely as possible. With delay error shaping, a novel architecture has been proposed that can be a viable solution to the limitations of wideband DSMs, while two other possible architectures, namely Leslie-Singh and a noise-coupling architecture using a TI-QTZ, have been investigated, compared and discussed.

VII. CONCLUSION
A novel delay error shaping approach is presented, which allows the use of high latency QTZs inside DSMs and maximizes the internal resolution at low OSR by purposefully allowing ELD in excess of one clock cycle, as seen, e.g., when using TI-QTZs. The architecture is analyzed on transfer function level, while the concept is proven by system level simulations. Related architectures are analyzed in a similar fashion, namely the Leslie-Singh architecture as well as an adjusted configuration of digital noise-coupling using a TI-QTZ. Extensive behavioral simulations are performed to point out differences between the approaches and to provide design guidelines on when a certain architecture should be considered. For the first time, the usage of arbitrary high resolution B-TI-QTZ is investigated for lowpass DSMs. The proposed delay error shaping approach manages to significantly relax analog loop filter matching requirements, while the necessary analog adjustments to a conventional modulator are small. On the other hand, if the internal resolution is lower and analog mismatch is less critical, the Leslie-Singh approach including a TI-QTZ becomes feasible. Overall, the presented results can form the basis of wideband DSMs at very low OSR using high resolution TI-QTZs.