A digital phase‐based on‐fly offset compensation method for decision feedback equalisers

Universidad Industrial de Santander; The European Organization for Nuclear Research (CERN)

Abstract
A low‐complexity method to reduce the offset voltage of dynamic comparators employed as samplers in decision feedback equalisers (DFE) is introduced. The authors propose the phase‐domain offset reduction technique (PORT), which leverages an all‐digital phase estimation of the output data for offset compensation, without setting the comparator input to a common‐mode voltage (VCM). While traditional techniques might break the data link for offset adjustment, the proposed technique allows the comparator to be calibrated on‐the‐fly. Measurements from a 26‐dB‐loss on‐chip emulated channel with chip‐scope capability validate PORT through eye diagrams at the sampler input. A prototype was implemented in a TSMC 130 nm 1.2 V process, and experimental results show the possibility of extending PORT to state‐of‐the‐art technology nodes for multi‐gigabit operation.


| INTRODUCTION
Offset reduction is one of the major concerns at the front-end of high-speed wireline receivers. An offset-reduction technique has to be carefully chosen considering the additional circuit complexity and capacitive load penalty to the signal path. Comparators, used as samplers or slicers in decision feedback equalisers (DFEs), face the challenge of sensing signals at data rates above 20 Gb/s with limited input signal swing. Considering the aggregated losses of low-pass channels, which can reach up to 40 dB, signal amplitude at comparator input could be as low as 20 mV [1]. As a result, comparator sensitivity specifications become limited by the accuracy of the offset correction scheme. Furthermore, any load added to the signal path to set up an offset-calibration scheme has a highly negative impact on signal amplitude and power consumption.
Traditional offset correction methods break the communication link to perform calibration. A typical correction scheme sets the comparator input to a common-mode voltage (V CM ) for offset sensing and compensation. This scheme requires additional circuits (often switches) to open the input signal path and connect the sampler input to V CM . If the application demands on-the-fly operation, an extra signal path must process the input data while calibration is executed. Therefore, extra circuitry must be added and, consequently, the capacitive load increases, demanding more power and area to meet timing specifications.
A novel scheme is presented to reduce the offset of dynamic comparators used in DFE circuits for high-speed interfaces. An integrated receiver is described that implements the phase-domain offset reduction technique (PORT), and measurement results show its potential for on-fly offset correction in high-speed link receivers. PORT operates on the phase of the output signal, has low complexity, and allows a digital implementation without compromising speed or power. In contrast to traditional offset-reduction techniques, which connect the slicer input to V CM and duplicate the signal path to allow on-the-fly operation [2–5], the main characteristic of PORT is that calibration does not require setting the comparator input to a common-mode level. As a result, PORT removes the need for the alternative signal path. Part of this work has been reported by the authors in a brief paper [6]. This work adds: (1) a detailed description of the technique's operation and its integration in a DFE topology, (2) a new approach to residual offset analysis and minimisation, (3) extended measurement results obtained through the chip-scope technique, and (4) algorithm proposals to improve measurement performance.
Section 2 presents common alternatives to compensate offset in DFEs; Section 3 describes PORT's operation.

Figure 1a shows a traditional double-data-rate (DDR) receiver front-end composed of a resistive termination (T-coil), a continuous-time linear equaliser (CTLE), two decision feedback equalisers for data and timing (edge) sampling, respectively, and a clock and data recovery (CDR) block. The first tap of both the data and edge equalisers uses a predictive or partial-response implementation (prDFE) to meet timing requirements [7]. Commonly, a third sampler is used for equaliser-coefficient adaptation, eye-diagram monitoring, and offset correction. During tuning, the third sampler extracts the error signal (dLev) required by the adaptation algorithm. The clock of the adaptive comparator (clk TRAIN ) has a different phase from that of the data samplers (clk EVEN and clk ODD ), which is necessary for eye-diagram monitoring during normal operation [3].

| COMMON APPROACH FOR OFFSET CALIBRATION IN HIGH-SPEED LINKS
The third comparator also allows offset calibration of a specific sampler while data transmission continues, in contrast to typical offset correction at the beginning of link operation [3,4]. Calibrating the samplers only before data transmission begins sacrifices the ability to track offset changes due to temperature and supply-voltage (VT) variations during link operation. On-fly calibration allows the samplers to be compensated for this VT-induced variability. Figures 1b,c present the traditional concept of on-fly calibration. In Figure 1b, the signal path (yellow line) includes samplers 1 and 2 as the prDFE section (red blocks), through the setting of multiplexers A and B. At the same time, offset calibration is performed on sampler 3 (grey block) through multiplexer C, using a third digital-to-analogue converter (DAC) connected to the local summing point. A similar procedure is used to compensate the offset of sampler 2, resulting in an equaliser formed by samplers 1 and 3, as Figure 1c shows.
On-fly offset reduction on the samplers of Figure 1 implies that, during calibration, each comparator input has to be disconnected from the signal path (V IN ) and connected to a common-mode voltage V CM , as presented in Figures 1b,c [8]. The sampler inputs are swapped by switches at the input of each path, as Figure 2 shows. In Figure 2, calibration is performed on the third sampler, whose input is connected to V CM , while comparators 1 and 2 equalise the input signal, so their inputs remain connected to the summing circuit. The main problem with the topology of Figure 2 is the load added by the switches and extra signal paths, which increases total losses and degrades signal amplitude. The additional circuitry also increases power consumption and area. Figure 3 shows two alternatives that reduce the number of switches. The circuit of Figure 3a places switches at the output of each local summer (the summer of each prDFE section), thus reducing the total load on the global summer. However, the method of Figure 3a only reduces the comparator's offset, so the offset of the summer amplifiers still degrades performance. The implementation of Figure 3b uses switches only at the sampler inputs while turning off the summer that receives the signal from the high-order taps. The load capacitance decreases to 50% of its original value, at the cost of losing the capability to perform on-fly correction.
Furthermore, the works presented in [2, 9–11] calibrate offset during regular operation using digital back-end algorithms and offset-sampling techniques based on setting V CM at the comparator input. Back-end routines increase complexity, and thus area and power, while traditional offset-sampling methods add load to the signal path. Other alternatives, such as the one presented in [12], achieve fully on-fly operation by doubling the number of samplers and multiplexers, thus increasing power.
An alternative way to overcome the aggregated losses caused by extra signal paths and parasitic capacitance is to increase the peaking of the frequency response of the preceding equalisation stages. This additional equalisation increases power consumption. Figure 4 presents the simulated increase in the load capacitance of the DFE summing circuit and in the bias current of the continuous-time linear equaliser (CTLE), respectively. These circuits belong to two different serial links, a 28 nm 28 Gbps DDR link and a 130 nm 8 Gbps quad-data-rate (QDR) link, simulated with and without offset correction. In the 28 nm technology, the input and output capacitance of a complementary switch corresponds to 20% of the total load, and the parasitic component rises to 30% when routing and interconnections are included.
To guarantee that the CTLE and DFE summer achieve the required bandwidth and equalisation gain with the additional load, the CTLE bias current (and thus the circuit dimensions) must be increased by more than 50%. This increase strongly impacts overall power consumption. The 8 Gbps link implemented in 130 nm technology behaves similarly: the capacitance increases by 25%, demanding 50% more bias current than the initial design.
By eliminating the need to connect the sampler input to V CM for offset correction, the power increase shown in Figure 4 can be mitigated. Therefore, an alternative way to measure offset that does not require switches at the comparator input is needed.

| PROPOSED OFFSET REDUCTION TECHNIQUE
The phase-domain offset reduction technique (PORT) is an alternative way to sense and compensate offset in dynamic voltage comparators without connecting their input to a common-mode voltage. PORT senses the comparator offset through the phase of its output signals, as shown in Figure 5. Because the comparator is dynamic, its outputs change continuously between the reset and comparison states, even if the input data does not change. The comparator outputs can therefore be seen as two different oscillations whose phase difference carries information about the offset. This phase difference is measured with a phase detector (PD), as in a PLL. The PD senses the phase difference between V OUT1 and V OUT2 , and its output controls the transitions of a finite-state machine (FSM). The FSM outputs X 1 and X 2 then set the bias currents of a preamplifier. Correct adjustment of the currents I 1 and I 2 reduces the offset introduced by the system, which combines the offset of the comparator and that of the preamplifier.
A phase detector and the FSM form PORT's core. Figure 6 shows the PD structure, a classical phase-frequency detector [13] consisting of two D-type flip-flops and an AND gate at the output. Each flip-flop is a master-slave pass-gate topology, and the AND gate is implemented in standard static CMOS logic. The FSM consists of two 8-bit up/down counters, allowing a differential variation of X 1 and X 2 . Figure 7 presents the state diagram of the FSM. Finally, X 1 and X 2 control two DACs. A relevant aspect of the proposed technique is that the calibration circuit can be synthesised from standard digital cells, which eases migration between technology nodes.

| Performance description
Figure 8 illustrates the calibration process. The total input-referred offset at the preamplifier input without calibration (Figure 5) is:

$$V_{OFF} = V_{off2} + \frac{V_{off1}}{A_v} \qquad (1)$$
where A v is the gain of the pre-amplifier, and V off1 and V off2 are offsets of the comparator and pre-amp, respectively.
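As a quick numerical illustration of Equation 1, the sketch below refers the comparator offset back to the preamplifier input. The offset values and gain are hypothetical examples, not measured values from the prototype.

```python
def input_referred_offset(v_off1: float, v_off2: float, a_v: float) -> float:
    """Total offset referred to the preamplifier input (Equation 1).

    v_off1: comparator offset (V); it is divided by the preamplifier gain a_v
    v_off2: preamplifier offset (V); it already sits at the preamplifier input
    """
    if a_v == 0:
        raise ValueError("preamplifier gain must be non-zero")
    return v_off2 + v_off1 / a_v


# Hypothetical values: 20 mV comparator offset, 5 mV preamp offset, gain of 4.
v_off = input_referred_offset(v_off1=20e-3, v_off2=5e-3, a_v=4.0)
print(f"{v_off * 1e3:.1f} mV")  # 10.0 mV
```

The larger the preamplifier gain, the more the comparator contribution is suppressed, which is why the correction is applied through the preamplifier bias currents.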
Assume an input sequence as shown at the top of Figure 8a and a positive offset such that |V in | < V OFF ; the comparator then cannot distinguish between a logic 1 and a logic 0 at its input. Therefore, output V out1 is clamped to V DD , while V out2 oscillates between V DD and ground because the comparator continuously alternates between the reset and comparison phases (Figure 8a). The calibration circuit is turned on at point A of Figure 8b; while the comparator is saturated, the UP and DW outputs of the phase detector are always high and low, respectively. This behaviour increments X 1 while X 2 decreases, producing a differential change in the bias currents I 1 and I 2 through the DACs. The change in bias currents eventually produces an additional offset V CORR in the direction opposite to V OFF . If the offset is negative, the roles of the UP and DW signals are exchanged, as are those of X 1 and X 2 . If |V in | > (V OFF − V CORR ) when the next logic zero reaches the input, the comparator output V out1 goes low (point B of Figure 8b). Then, at the next rising edge of V out1 (point C), the DW signal goes high, resetting the phase detector (point D). Consequently, the bias currents stop increasing and V out1 and V out2 exchange roles. The process then restarts in the opposite direction, making the bias currents oscillate around a newly reached DC level (Figure 8c). These final current conditions are used as the stop criterion of the calibration process. Furthermore, Figure 8c shows a half clock … The PD of Figure 6 includes delay stages to eliminate glitches.
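The calibration sequence above can be reproduced with a behavioural sketch. It is a hypothetical abstraction, not a transistor-level model: the phase detector and FSM are lumped into an up/down counter that moves one DAC step per clock cycle (up while the comparator resolves a 1, down while it resolves a 0), and the DACs and preamplifier are lumped into V_CORR = code × LSB. The 50 mV offset, ±20 mV input and 1 mV step are illustrative choices.

```python
import random
from typing import List


def simulate_port(v_off: float = 50e-3, v_in_amp: float = 20e-3,
                  lsb: float = 1e-3, n_cycles: int = 400,
                  seed: int = 1) -> List[float]:
    """Behavioural sketch of the PORT loop; returns V_CORR after each cycle.

    Hypothetical abstraction: the phase detector + FSM pair acts as an
    up/down counter moving one DAC step per clock cycle, up while the
    comparator resolves a 1 and down while it resolves a 0. The DACs and the
    preamplifier are lumped into V_CORR = code * lsb, subtracted at the input.
    """
    rng = random.Random(seed)
    code = 0
    v_corr_history = []
    for _ in range(n_cycles):
        v_in = rng.choice((-1, 1)) * v_in_amp       # DC-balanced random data
        v_corr = code * lsb
        # Comparator decision including the offset and the present correction.
        decision = 1 if v_in + v_off - v_corr > 0 else -1
        code += decision                            # accumulator (FSM) step
        v_corr_history.append(code * lsb)
    return v_corr_history


history = simulate_port()
settled = history[40:]
print(f"V_CORR after settling: {min(settled) * 1e3:.0f} to {max(settled) * 1e3:.0f} mV")
```

While |V_in| < V_OFF − V_CORR the decision never changes, so V_CORR ramps one step per cycle (about 30 cycles here). Once the residual drops below the input amplitude, the data itself flips the decisions and V_CORR dithers within roughly ±20 mV of the 50 mV offset. This mirrors the point made in the residual-offset discussion: without filtering, the loop only guarantees a residual offset below the input amplitude.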
The described behaviour does not require setting the comparator input to a common-mode voltage V CM while calibration is carried out. Therefore, PORT avoids all the switches at the input of each sampler in Figure 2. Moreover, the feedback loop includes only an accumulator (the FSM), so the system behaves as a single-dominant-pole system. Having only an accumulator in the loop provides stable operation over a wide range of bias currents and quantisation steps, which also translates into a large tolerance to PVT variations. With a correct selection of I 1,2 and of the DAC reference voltage, the circuit behaves as a dominant-pole system with a 90° phase margin. Additionally, because the FSM is an up/down counter, its critical path does not limit the settling time of the bias currents, so PORT achieves a short convergence time; calibration speed is limited only by the DACs.

| Residual offset
PORT can be summarised as follows: first, a bit sequence is applied at the slicer input; second, the phase difference between the sampler outputs is measured to estimate the offset; finally, the preamplifier bias current is adjusted, based on the phase-detector output, using the FSM and DACs. Within the feedback loop formed by the calibration circuit, the correction signal V CORR tracks the total input-referred offset V OFF (Figure 5). Thus, the residual offset V RES (defined as V RES = V OFF − V CORR ) decreases as the technique converges. Reducing V RES improves the slicer's sensitivity, since offset is a key factor in the minimum signal amplitude that the slicer can process. The instant at which the input-signal magnitude exceeds the residual offset, that is, |V in | > V RES , gives the stop criterion, as Figure 8c shows. This behaviour does not imply cancellation of V RES : PORT finds an equilibrium point at which the residual offset remains below the input signal amplitude, so that offset does not bias the slicer decision.
To minimise the residual offset, the circuit of Figure 5 can be modified in two ways. First, a low-pass filter (LPF) is inserted between the phase detector and the FSM, as Figure 9 suggests. Second, the DC component of the input data must be zero, that is, the number of logic 1s must equal the number of 0s (for example, by using a data scrambler). The filter extracts any long-term DC level of the slicer's output so that the negative feedback can cancel it. The filter also reduces the ripple of the X 1 and X 2 signals, which helps to achieve a more accurate offset cancellation.
The low-pass filter can be implemented using a majority-voting (MJV) algorithm, in the same way as in a clock-and-data recovery (CDR) circuit [14,15]. This type of filter uses N samples of the UP and DOWN signals and a voting function to calculate its output. The chosen voting function is the average of the UP and DOWN signals because of its simple hardware implementation. Therefore, the filter produces an effective UP (UP EFF ) signal if the number of UP samples is larger than the number of DOWN samples, and vice versa for the effective DOWN (DW EFF ) signal, as Figure 10 shows. Using an MJV filter at the output of the phase detector avoids the multiplication blocks that a classical digital filter at the output of the FSM would need. Furthermore, the input signals of the MJV block are 1 bit wide, in contrast with the 8-bit output of the FSM, resulting in low hardware overhead and low impact on the critical path. The main drawback of including a filter in the calibration loop is an increase in convergence time. Even if the filter is first order, the extra pole might lead to stability issues; for that reason, the magnitude of the feedback current (which is related to the feedback gain) has to be selected carefully. Moreover, the lower the pole frequency of the filter, the higher the low-frequency feedback gain and thus the better the residual offset can be minimised. However, stability becomes critical as the filter approaches an integrator.
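The voting rule can be sketched in a few lines. This is an illustrative sketch, assuming non-overlapping blocks of N phase-detector samples and a tie producing no update; the function name and the +1/−1/0 encoding are hypothetical, not the on-chip implementation.

```python
from typing import Iterable, List


def majority_vote(up: Iterable[int], dw: Iterable[int], n: int) -> List[int]:
    """Majority-voting low-pass filter over non-overlapping blocks of n samples.

    up, dw: 1-bit phase-detector outputs (0 or 1), one per clock cycle.
    Returns one value per complete block: +1 (UP_EFF), -1 (DW_EFF), 0 on a tie.
    """
    up, dw = list(up), list(dw)
    if len(up) != len(dw):
        raise ValueError("UP and DW streams must have the same length")
    if n <= 0:
        raise ValueError("block length must be positive")
    votes = []
    for start in range(0, len(up) - n + 1, n):
        balance = sum(up[start:start + n]) - sum(dw[start:start + n])
        votes.append((balance > 0) - (balance < 0))  # sign of the UP/DW balance
    return votes


print(majority_vote([1, 1, 0, 1, 0, 0, 1, 0], [0, 0, 1, 0, 1, 1, 0, 1], n=4))  # [1, -1]
```

Only 1-bit counts and a comparison are needed, which is the hardware advantage the text attributes to placing the filter before the 8-bit FSM. Feeding one vote per block to the counter also lowers the effective loop update rate by N, which is the convergence-time cost mentioned above.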
The need for a DC-balanced input can be explained using a linear model of the calibration circuit (Figure 11), in which gain blocks model the comparator and phase detector. The output D OUT is: where K C , K PRE , K DAC and K PD represent the gains of the comparator, preamplifier, DAC and phase detector, respectively. Further, MV is the gain of the majority-voting block [15], the accumulator corresponds to the FSM, V OS DAC is the offset of the DACs, and V OS PD is an equivalent offset caused by mismatch between the UP and DW paths of the phase detector. Equation 2 shows a high-pass behaviour because of a zero at z = 1, which is a consequence of the accumulator in the feedback path. A zero at z = 1 implies that, once the calibration loop is turned on and reaches steady state, it attenuates any DC component of V in , as well as the signals V OFF1 , V OFF2 and V OS DAC . In other words, the average value of the output tends to zero. The only offset contribution that still affects the output is V OS PD , which, however, is attenuated by K PD .
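A deliberately simplified version of the loop in Figure 11 shows where the zero at z = 1 comes from. As a simplifying assumption (not the full model of Equation 2), the comparator and phase detector are lumped into one forward gain K, the DAC and preamplifier into one feedback gain G, the FSM is an accumulator with one cycle of delay, and V OFF lumps the comparator and preamplifier offsets:

$$
\begin{aligned}
D[n] &= K\big(V_{in}[n] + V_{OFF} - V_{CORR}[n]\big),\\
V_{CORR}[n] &= V_{CORR}[n-1] + G\,D[n-1]
\;\Rightarrow\; V_{CORR}(z) = \frac{G\,z^{-1}}{1 - z^{-1}}\,D(z),\\
\frac{D(z)}{V_{in}(z) + V_{OFF}(z)} &= \frac{K\,(1 - z^{-1})}{1 - (1 - KG)\,z^{-1}} = \frac{K\,(z - 1)}{z - (1 - KG)}.
\end{aligned}
$$

The zero at z = 1 makes the DC gain zero, so a constant offset, or any DC component of V in , is removed from D OUT in steady state. The single pole at z = 1 − KG lies inside the unit circle for 0 < KG < 2, consistent with the single-dominant-pole behaviour described for the loop without a filter.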
When the calibration process finishes and the circuit switches to normal operation, the last value of the FSM output is stored, generating a constant signal V CORR that is continuously subtracted from the input. Therefore, if the DC component of V IN is zero during calibration, the contributions of V OFF1 and V OFF2 are cancelled during normal operation. If the input signal does not have a zero average while offset correction is performed, V IN influences the calculation of the compensation signal V CORR ; that is, the system treats V IN as another offset source. For instance, if the input data is a bitstream generated by a 15th-order pseudo-random bit sequence (PRBS15), each 32,767-bit (32 kb) period contains 16,384 logic 1s and 16,383 logic 0s, so the sequence is almost, but not exactly, DC balanced. The resulting DC component, and thus the residual offset, is 61 μV.
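The near balance of a PRBS15 stream can be checked directly. The sketch assumes the common polynomial x^15 + x^14 + 1 (the prototype's exact polynomial is not stated here) and reports the DC component normalised to a ±1 symbol swing; converting it into volts depends on the signal swing assumed, which this section does not state.

```python
from typing import List


def prbs15_period(seed: int = 0x7FFF) -> List[int]:
    """One period of a PRBS15 bit stream from a 15-bit Fibonacci LFSR.

    Feedback taps at state bits 14 and 13 (x^15 + x^14 + 1, a common PRBS15
    polynomial); any non-zero seed gives the maximal period 2**15 - 1.
    """
    if not 0 < seed < 1 << 15:
        raise ValueError("seed must be a non-zero 15-bit value")
    state, bits = seed, []
    for _ in range((1 << 15) - 1):
        new = ((state >> 14) ^ (state >> 13)) & 1
        state = ((state << 1) | new) & 0x7FFF
        bits.append(new)
    assert state == seed, "LFSR did not return to its seed: not maximal length"
    return bits


bits = prbs15_period()
ones = sum(bits)
zeros = len(bits) - ones
# Map logic 1 -> +1 and logic 0 -> -1 and average: the normalised DC component.
dc = (ones - zeros) / len(bits)
print(ones, zeros, f"{dc:.3e}")  # 16384 16383 3.052e-05
```

Any maximal-length sequence has exactly one more 1 than 0 per period, so the residual DC shrinks as 1/(2^n − 1) with the PRBS order n.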
Having a DC-balanced V IN during offset calibration has the same effect as connecting the input to a common-mode voltage, but without the switches that such a connection requires; this is the main advantage of the proposed technique. Since the transmitters of many high-speed standards include a scrambler (whose primary function is to reorganise the transmitted data to avoid undesired sequences, such as long runs of consecutive logic 1s or 0s), no additional hardware is needed to randomise the data and reduce its average level.
A high-speed link also has to execute a training and calibration procedure before data transmission starts. During training, the transmitter and receiver communicate mainly to tune the equalisation and clock-and-data recovery parameters. A group of specific data sequences is therefore produced at the transmitter to adjust the DFE coefficients (h 1 , h 2 , …, h n ) and the CDR loop. Traditional training sequences have a period composed of a logic 1 followed by a 0, as Figure 12 shows. This sequence has zero DC value, so PORT is compatible with current training procedures without additional hardware. Although offset reduction has to be executed before equalisation tuning (because equalisation depends on the sampling precision of the input signal), the pattern shown in Figure 12

| SYSTEM IMPLEMENTATION
PORT was tested on silicon using the system shown in the block diagram of Figure 13. Since the main application of PORT is high-speed serial links, testing and validation consist of measuring the differences in eye-diagram aperture at the pre-amplifier output before and after calibration. Eye-diagram measurements are performed on-chip without the need for external probes. A pseudo-random bit sequence (PRBS) source sends a bitstream through an emulated low-pass channel, and the slicer recovers it. The testing system is composed of the PRBS generator, a digitally programmable low-pass filter, a digitally controlled phase mixer, a strong-arm comparator with a current-controlled pre-amplifier, and a serial peripheral interface (SPI). PORT's core is in the feedback path of the comparator and, given its fully digital implementation, its operation (and that of the other blocks) can be controlled through the SPI. The majority-voting block was implemented off-chip by computing the voting rule on data extracted by a field-programmable gate array (FPGA); its output was then applied to the circuit via the SPI.

| Eye-diagram calculation method
The strategy to test PORT is to build eye-diagrams at the pre-amplifier output nodes before and after calibration, using data collected through the SPI. The differences in the maximum and minimum levels of the eye-diagrams show the effectiveness of the proposed method. The PRBS generator emulates the data of a communication link. The low-pass filter acts as a lossy channel, and its programmable cutoff frequency is used to emulate different losses (bandwidth and attenuation). The phase mixer varies the instant at which the comparator senses the input data. The pre-amplifier has four current sources: the calibration DACs control the first two, and the other two are used for eye-diagram construction and twisting. Figure 14 presents a microphotograph of the implemented system, which was fabricated in a standard 130 nm CMOS technology with a 1.2 V supply to prove the concept. The calibration circuit measures 134 × 35 μm. Both the phase detector and the FSM are fully synthesisable, allowing migration between technology nodes. The FSM occupies 5 × 35 μm and includes features such as variable output resolution (for coarse and fine calibration and for varying the feedback gain and convergence time) and sign control for negative-feedback testing.
The procedure for constructing an eye-diagram is as follows. The output signal of the pre-amplifier in Figure 13 (v PRE ), represented by the circuit in Figure 15, is:

$$v_{PRE} = A_V\, v_{CH} + V_{DC} \qquad (3)$$

where A V is the pre-amplifier gain, v CH is the output of the low-pass filter, and V DC is a DC unbalance caused by the difference between the two bias currents controlled by DAC 3 and DAC 4 (in magnitude and sign). The larger the difference between I B1 and I B2 , the larger the unbalance at the pre-amp output. The inherent offsets of the low-pass filter and pre-amplifier also affect v CH and V DC , respectively. A DC unbalance at the pre-amp output produces a vertical shift in v AMP .
A voltage comparator, whose decision threshold is ideally zero, samples the pre-amplifier output. Voltage levels higher and lower than the comparator threshold will produce a logic 1 and 0, respectively, that is, comparator output corresponds to a 1-bit digital version of pre-amp output; this resolution is not enough to measure a full eye-diagram aperture.
There are three options for sampling the pre-amp output with enough detail to measure the eye-diagram aperture: increasing the number of voltage comparators with different thresholds [16,17], varying the threshold of a single comparator, or shifting the pre-amp output through V DC . The first alternative implies high power consumption because of the larger number of comparators acting as an ADC, and it is not compatible with traditional DFE topologies. The second option is disadvantageous for high-speed operation because of the additional load at the comparator input needed to produce a variable threshold.
The third alternative implies that the shift caused by V DC will displace the upper and lower limits of v AMP (v eye1 and v eye2 ) up to the comparator threshold, as Figure 15 shows. If |V DC | is greater than v eye1 , so that v eye1 − V DC < 0, comparator output will be always 0 (frame A of Figure 15); otherwise, when v eye1 − V DC > 0, D OUT will vary between 1 and 0 as a function of input data (frames B, C and D). The point where v eye1 is equal to the vertical displacement V DC (v eye1 − V DC = 0) sets the upper aperture of the eye-diagram. Taking into account that V DC can be set digitally using DAC 3,4 , it is possible to have a digital representation of v eye1 .
Following the same procedure, v eye2 can be measured by varying V DC so that the lower aperture of the eye-diagram reaches the comparator threshold, that is, v eye2 + V DC = 0 (frame E).
To determine whether V DC is below or above the upper and lower eye apertures v eye1 and v eye2 , D OUT must be captured and its mean value measured. When the vertical displacement V DC shifts v AMP so that the comparator threshold is below v eye1 and above v eye2 , the output data D OUT coincide with a recovered version of the input data D IN . The data source is a PRBS generator whose mean value over the whole sequence is 0, that is, it produces (practically) the same number of 1s and 0s. Hence, if D OUT = D IN , the recovered data have the same statistical properties as the input stream, and the mean value of D OUT (μ D OUT ) is also 0:

$$\mu_{D_{OUT}} = \frac{1}{N}\sum_{n=1}^{N} D_{OUT}[n] = 0 \qquad (4)$$

If the mean value of D OUT is 0, the comparator can recover the data and V DC lies within an open region of the eye-diagram. If the mean value is greater or lower than 0, V DC corresponds to a closed region of the eye-diagram. When a DC level is added to the pre-amp output, D OUT is biased towards 1 (so that μ = 1) if V DC is larger than the maximum value of v eye1 , and vice versa. The average value is calculated from 50,000 samples of D OUT for each step of the DAC 3,4 digital words, which is adequate for PRBS7 and PRBS15 sources (127-bit and 32,767-bit periods).
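The mean-based criterion can be sketched in software. Everything here is a hypothetical stand-in for the hardware: a PRBS7 source, a first-order low-pass channel sampled once per UI in normalised units, one fixed sampling phase, and a comparator output D OUT = +1 when the sample exceeds V DC (matching the v eye1 − V DC comparison above). Because one PRBS7 period holds one more 1 than 0, the sketch compares the sum of D OUT over whole periods with its known value instead of testing for exactly zero.

```python
from typing import List, Tuple


def prbs7(n: int, seed: int = 0x7F) -> List[int]:
    """PRBS7 bits from a 7-bit Fibonacci LFSR (x^7 + x^6 + 1, period 127)."""
    state, bits = seed, []
    for _ in range(n):
        new = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new) & 0x7F
        bits.append(new)
    return bits


def channel(bits: List[int], a: float = 0.3) -> List[float]:
    """Hypothetical first-order low-pass channel, sampled once per UI (±1 swing)."""
    y, out = 0.0, []
    for b in bits:
        y = a * y + (1.0 - a) * (1.0 if b else -1.0)
        out.append(y)
    return out


def eye_edges(samples: List[float], expected_sum: int,
              step: float = 0.01, n_codes: int = 100) -> Tuple[float, float]:
    """Sweep V_DC and return the lowest/highest value inside the open eye.

    D_OUT is +1 when the sample exceeds V_DC and -1 otherwise. A V_DC code is
    'open' when the sum of D_OUT equals the known sum of the transmitted data,
    i.e. when the mean of D_OUT matches the PRBS statistics.
    """
    open_levels = []
    for code in range(-n_codes, n_codes + 1):
        v_dc = code * step
        d_out_sum = sum(1 if v - v_dc > 0 else -1 for v in samples)
        if d_out_sum == expected_sum:
            open_levels.append(v_dc)
    if not open_levels:
        raise ValueError("eye is closed at this sampling phase")
    return min(open_levels), max(open_levels)


periods = 40
bits = prbs7(127 * (periods + 1))
samples = channel(bits)[127:]            # drop the first period as a warm-up
data = bits[127:]                        # exactly `periods` full PRBS7 periods
expected = sum(1 if b else -1 for b in data)   # equals `periods`: one extra 1 each
lo, hi = eye_edges(samples, expected)
print(f"open eye for V_DC between {lo:+.2f} and {hi:+.2f} (normalised units)")
```

Here hi plays the role of v eye1 and −lo that of v eye2 , each resolved to one V DC step; repeating the sweep at other sampling phases gives the horizontal dimension described next.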
The horizontal aperture of the eye-diagram at the pre-amp output can be measured by repeating the previous procedure for different phase differences between the PRBS and comparator clocks. Using the phase mixer, the vertical amplitude can be calculated at different sampling instants, as Figure 16 shows. The phase mixer can be configured digitally for four-quadrant operation, which allows the comparator clock to be shifted across an entire unit interval. As a result, by combining vertical and horizontal sweeps through DAC 3,4 and the phase mixer, respectively, the eye-diagram at the pre-amplifier output can be measured without accessing it physically.

| Circuit implementation
The implementation of each building block of the scheme in Figures 5 and 13 is based on the classical structures as follows:

- Dynamic voltage comparator: implemented using a strong-arm topology [18]. The pre-amplifier is based on a degenerated common-source circuit with active load (Figure 17). Its bias current is formed by four current mirrors: two for calibration and two for twisting and eye-diagram construction.
- PRBS: implemented as a shift-register counter with programmable word length, producing pseudo-random sequences based on seventh-, 15th-, 21st- and 31st-order polynomials.
- Low-pass filter: the filter emulates a channel and uses a Gm-C topology. Its gain and bandwidth can be controlled by varying the number of input transconductors and the total capacitance of each node, respectively (Figure 18). Each Gm stage was implemented using Nauta amplifiers (Figure 19), owing to their high bandwidth and the rapid prototyping enabled by digital standard cells [19].
- Phase mixer: the well-known analogue phase interpolator, which uses in-phase and quadrature input clock signals provided by an external source (Figure 20) to produce 32 different output phases (from 0° to 360°) [20].
- DAC: the authors selected a classical 8-bit R-2R DAC to simplify the design [21]; it achieves a maximum operating speed of 200 MHz.
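How a four-quadrant interpolator turns I/Q clocks into 32 phases can be illustrated with an idealised sketch. The code mapping (two most significant bits select the quadrant, the rest trade weight linearly between adjacent axes) is an assumption for illustration, not the circuit of Figure 20; real interpolators also depend on the edge shapes of the input clocks.

```python
import math


def pi_phase_deg(code: int, steps_per_quadrant: int = 8) -> float:
    """Output phase of a hypothetical linearly weighted I/Q phase interpolator.

    code runs from 0 to 4*steps_per_quadrant - 1; the quadrant selects which two
    of the axes I, Q, -I, -Q are mixed, and the remainder sets their weights.
    """
    n_codes = 4 * steps_per_quadrant
    if not 0 <= code < n_codes:
        raise ValueError("code out of range")
    quadrant, m = divmod(code, steps_per_quadrant)
    w_a = (steps_per_quadrant - m) / steps_per_quadrant  # weight of leading axis
    w_b = m / steps_per_quadrant                         # weight of next axis
    # Unit phasors of the four axes: I, Q, -I, -Q (0, 90, 180, 270 degrees).
    axes = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
    ax, ay = axes[quadrant]
    bx, by = axes[(quadrant + 1) % 4]
    x = w_a * ax + w_b * bx
    y = w_a * ay + w_b * by
    return math.degrees(math.atan2(y, x)) % 360.0


for code in (0, 2, 4, 6, 8):
    ideal = code * 360.0 / 32
    print(f"code {code:2d}: ideal {ideal:6.2f} deg, interpolated {pi_phase_deg(code):6.2f} deg")
```

Codes on the axes and at the middle of each quadrant land exactly on the ideal phase, while the codes in between deviate; this kind of interpolation nonlinearity bounds the accuracy of the horizontal eye-diagram axis.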
Considering that the phase detector is connected directly to the output of the comparator, the additional load imposed by flip-flops ( Figure 6) could be critical in high-speed applications. A typical DFE structure uses a comparator to resolve 1-tap and 2-tap within 1 UI. Thus additional loading could degrade timing performance.

| EXPERIMENTAL RESULTS
Experimental validation of PORT was achieved using the setup in Figure 21. The testing board contains the fabricated circuit employing the chip-on-board technique. An FPGA was used to set up configuration registers and to extract data by communicating with the on-chip SPI interface.
As a first test, the filter is configured to provide 26 dB of attenuation, and a PRBS15 sequence is transmitted at 800 Mbps. Since the PRBS output amplitude is 1.2 V (the supply voltage), the filter output is 60 mV. Although this data rate is lower than that of state-of-the-art serial links, the filter configuration emulates the same attenuation as a common 3 m cable for USB 3.1 at 7 Gbps [22]; the main purpose of this prototype is to serve as a proof of concept for PORT. An offset voltage of 50 mV is forced using the twister transistors and DAC 3,4 in order to exercise PORT. Figure 22 shows the measured DAC signals during the calibration process: the DACs reach steady state after 400 ns, indicating that PORT has finished. In this test, the calibration loop does not include the low-pass (MJV) filter, resulting in a higher ripple at the DAC outputs. Figure 22 shows both the typical-case post-layout simulation and the measurement of one sample; the 35 mV difference between the two signals indicates the influence of mismatch on the circuit. PORT's average current consumption is 550 μA, including the DACs. A faster DAC could reduce the calibration time; conversely, power consumption can be reduced by lowering the clock rate of the calibration loop, at the expense of a longer convergence time.

Figure 23 shows an eye-diagram at the pre-amp output obtained with the methodology described in Section 4.1. The data rate is 800 Mbps from a PRBS15 source, and the filter is configured for an attenuation of 26 dB. The yellow area corresponds to an open region of the eye-diagram because the average value of D OUT is 0, implying that V DC is bounded between − v eye2 and v eye1 (Figure 15). The blue area is the closed region, where the mean of D OUT differs from 0.
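The signal level and timing quoted for this test follow from simple arithmetic, sketched below with the values stated in the text (1.2 V PRBS amplitude, 26 dB loss, 800 Mbps); only the helper name is introduced here.

```python
def db_to_ratio(loss_db: float) -> float:
    """Voltage ratio corresponding to a loss in dB (20*log10 convention)."""
    return 10 ** (-loss_db / 20)


v_prbs = 1.2          # PRBS output amplitude (V), equal to the supply voltage
loss_db = 26.0        # emulated channel attenuation (dB)
v_out = v_prbs * db_to_ratio(loss_db)
ui = 1 / 800e6        # unit interval at 800 Mbps (s)
print(f"{v_out * 1e3:.1f} mV, UI = {ui * 1e9:.2f} ns")  # 60.1 mV, UI = 1.25 ns
```

The 1.25 ns unit interval is also the width of the time window of the measured eye-diagrams.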
The vertical amplitude of Figure 23 is quantified from the DC unbalance at the pre-amp output, considering the bias current of each transistor: where V OFF1 is the offset of the comparator and V OFF2 the offset of the pre-amplifier. Furthermore, N 3 and N 4 are the digital words that control the bias of the twisting transistors, and V REF is the reference voltage of both DACs. N 3 and N 4 are initially set to produce V REF /2 at the DAC outputs and are then varied differentially: first, N 3 increases while N 4 decreases to find v eye2 ; then N 3 decreases while N 4 increases to measure v eye1 . As a result, the upper and lower edges of the yellow region are at 65 mV and −55 mV, respectively, implying an inherent offset of 5 mV. Figures 24a,b show the eye-diagrams at the slicer input without applying PORT. The input data again corresponds to a PRBS15 source (32 kb), and this test considers offsets of +50 mV and −50 mV. The offset was measured from the maximum and minimum values of each diagram, and it includes the contributions of the pre-amplifier, the comparator and the twister's DACs. The vertical amplitude of both eye-diagrams is 113 mV for a filter attenuation of 26 dB, while the time window is 1.25 ns, corresponding to a data rate of 800 Mb/s. The diagrams were constructed with a 5-bit time (phase) resolution, set by the phase mixer, and an 8-bit amplitude resolution, set by the DACs. These values impose steps of 78 ps and 4.7 mV on the X-axis and Y-axis, respectively.

Figure 25 shows the eye-diagram after applying PORT. A majority-voting-based digital low-pass filter was included to minimise the residual offset; the MJV block was implemented in software using data extracted through the FPGA and applied via the SPI interface. The diagram is now centred around 0 V, showing the effectiveness of the proposed technique.
The residual offset is 6 mV, which is caused mainly by the DAC resolution and corresponds to the minimum value that can be sensed by Equation 5. Figures 26a,b present two further eye-diagrams with an offset of 30 mV and a filter attenuation of 23 dB, and Figure 26c shows the corrected version, again indicating successful offset correction.
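Two of the numbers above can be cross-checked with a short calculation: the inherent offset is the midpoint of the eye edges read from Figure 23 (interpreting the lower edge as −55 mV), and the 4.7 mV amplitude step matches an 8-bit DAC only under the assumption that V REF equals the 1.2 V supply, which the text does not state explicitly.

```python
def eye_centre_offset(v_eye_top: float, v_eye_bottom: float) -> float:
    """Offset of the eye centre from 0 V, given its upper and lower edges (V)."""
    return (v_eye_top + v_eye_bottom) / 2


offset = eye_centre_offset(65e-3, -55e-3)  # edges of the yellow region, Figure 23
v_ref = 1.2                                # assumed DAC reference voltage (V)
lsb = v_ref / 2 ** 8                       # amplitude step of an 8-bit DAC
print(f"offset = {offset * 1e3:.1f} mV, LSB = {lsb * 1e3:.1f} mV")  # offset = 5.0 mV, LSB = 4.7 mV
```

The 6 mV residual offset after calibration is of the same order as this amplitude step, consistent with the DAC resolution being the dominant limit.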
A higher DAC resolution results in a smaller residual offset. For instance, if the DAC resolution is increased by 3 bits, the residual offset is reduced by a factor of 8 (to 750 μV). The main issue in modifying the DACs is that a high-resolution converter (more than 12 bits) with highly linear behaviour (low DNL and INL) is needed to achieve an offset below 1 mV; any DAC non-ideality affects the effectiveness of the offset correction.
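The scaling quoted above can be written as a one-line model. It assumes the residual offset is proportional to the DAC LSB, anchored at the measured 6 mV for 8 bits; the function and its parameters are illustrative only.

```python
def residual_offset(bits: int, residual_ref: float = 6e-3, bits_ref: int = 8) -> float:
    """Residual offset (V), assuming it scales with the DAC LSB (halves per extra bit)."""
    return residual_ref / 2 ** (bits - bits_ref)


for b in (8, 11, 12):
    print(b, f"{residual_offset(b) * 1e6:.0f} uV")
# 8 6000 uV
# 11 750 uV
# 12 375 uV
```

This idealised model ignores DNL and INL, which the text identifies as the practical limit once the resolution is raised.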
Another experiment was performed to emulate a vertical displacement and find the maximum offset that the calibration can tolerate. First, an unbalance larger than those measured in Figures 24 and 26 is induced by the twister and the calibration circuit is turned on; next, the unbalance is increased further and the calibration is performed again. Using the twister's DACs, an offset of 245 mV was generated and successfully calibrated. Finally, Table 1 compares the proposed technique with related works, focusing on compatibility with DFE topologies.

| CONCLUSIONS
The phase-domain offset reduction technique (PORT) for dynamic voltage comparators has been proposed and verified experimentally. The proposed method uses the phase of the output data as the variable to measure offset. Because the comparator input is not connected to V CM during offset calibration, PORT's main advantage is that it avoids loading the signal path. The proposed method tracks the influence of temperature and supply-voltage variations on offset during data transfer. The calibration core was fully synthesised, which allows the technique to be extended to different fabrication processes and applications.