Adaptive Decision Feedback Equalization for Multi-Gbps Serial Links: Challenges and Opportunities
Sensor Networks and Data Communications

This editorial briefly examines challenges in the design of adaptive decision feedback equalizers (DFEs) for multi-Gbps serial links. The state of the art of serial links over wire channels is reviewed, and the impairments of wire channels at high frequencies and their effect on the performance of serial links are examined. This is followed by a close examination of channel equalization techniques that combat inter-symbol interference. Challenges and opportunities in the design of adaptive decision feedback equalizers are then examined, including timing constraints, power consumption, adaptive references for error generation and thresholds for logic-state determination, data-DFE, eye-opening monitor DFE, edge-DFE, and floating-tap DFE.


Introduction
Data-intensive applications such as video streaming, cloud computing, and virtual presence devices require data to be transmitted among chips, modules, and chassis via serial copper channels at tens of gigabits per second (Gbps), as shown in Table 1. The data rate of these links is limited by inter-symbol interference (ISI) arising from channel impairments such as finite bandwidth, reflection, and crosstalk [1]. Although low-loss channels and low-reflection vias and connectors are highly desirable, they are costly. Combating channel imperfections by means of channel equalization has proven to be the most robust, effective, and economical approach. This editorial briefly examines the difficulties encountered in the design of adaptive decision feedback equalizers for multi-Gbps serial links. In addition, it surveys recent developments that overcome these difficulties, as well as challenges that are yet to be conquered.

Channel Equalization
ISI can be reduced by either boosting the high-frequency components or attenuating the low-frequency components of data symbols at the transmitter prior to transmission. This is known as pre-emphasis [12][13][14][15]. Pre-emphasis is usually realized using a finite impulse response (FIR) filter whose zeros ideally cancel the poles of the channel [16]. Pre-emphasis filters are of low order, typically 2 to 3, which indicates that channels can be adequately modeled as low-order low-pass systems provided that reflection and crosstalk are not accounted for [17,18]. Since the exact characteristics of channels are typically unknown a priori, the optimal coefficients of pre-emphasis FIR filters cannot be determined precisely. As a result, pre-emphasis alone is incapable of achieving full channel equalization. The deployment of pre-emphasis is also hindered by crosstalk to neighboring channels and devices if the high-frequency components of data symbols are overly amplified, or by a deteriorated signal-to-noise ratio if the low-frequency components are overly attenuated. Another limitation of pre-emphasis is its inability to eliminate ISI caused by reflection and crosstalk, which is significant at high data rates.
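The pole-zero cancellation argument can be checked numerically. The sketch below assumes a hypothetical first-order discrete-time low-pass channel with pole a and a 2-tap FIR de-emphasis filter scaled to keep the peak transmit swing at 1; the functions `channel`, `pre_emphasis`, and `isi_ratio` are illustrative names, not taken from the cited works. With the FIR zero placed exactly on the channel pole the post-cursors vanish, while a mis-estimated coefficient leaves residual ISI.

```python
import numpy as np

def channel(x, a):
    # Hypothetical first-order low-pass channel: y[n] = a*y[n-1] + (1-a)*x[n]
    y = np.zeros(len(x))
    state = 0.0
    for n, xn in enumerate(x):
        state = a * state + (1 - a) * xn
        y[n] = state
    return y

def pre_emphasis(x, alpha):
    # 2-tap FIR: p[n] = (x[n] - alpha*x[n-1]) / (1 + alpha); dividing by (1 + alpha)
    # keeps the peak transmit swing at 1, so low frequencies are attenuated.
    x_prev = np.concatenate(([0.0], x[:-1]))
    return (x - alpha * x_prev) / (1 + alpha)

def isi_ratio(h):
    # Peak-distortion ISI: sum of |post-cursors| relative to the main cursor.
    return np.sum(np.abs(h[1:])) / abs(h[0])

a = 0.6                                   # assumed channel pole
impulse = np.zeros(64)
impulse[0] = 1.0

h_raw = channel(impulse, a)
h_matched = channel(pre_emphasis(impulse, a), a)    # FIR zero cancels the channel pole
h_guess = channel(pre_emphasis(impulse, 0.4), a)    # coefficient mis-estimated

print(round(isi_ratio(h_raw), 3))       # 1.5
print(round(isi_ratio(h_matched), 3))   # 0.0
print(round(isi_ratio(h_guess), 3))     # 0.5
print(round(h_matched[0], 3))           # 0.25 = (1 - a)/(1 + a): the price is a smaller cursor
```

The shrinking main cursor in the matched case is the signal-to-noise penalty mentioned above: the transmitter cannot exceed its peak swing, so equalization comes from attenuating the low-frequency content.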
Far-end channel equalization combats ISI by either amplifying the high-frequency components of received data symbols or removing post-cursors by subtracting estimated post-cursors from the data symbols. The former is known as continuous-time linear equalization (CTLE), while the latter is termed nonlinear equalization. Since the effect of channel imperfections is embedded in the data symbols received at the far end of the channel, post-equalization can be adjusted objectively to eliminate the effect of channel impairments, thereby achieving better channel equalization. This differs fundamentally from pre-emphasis, which equalizes channels blindly unless a back channel exists that conveys the performance of pre-emphasis, measured at the far end of the channel, back to the transmitter. Although CTLE can ideally cancel the poles of the channel and achieve full channel equalization, amplifying the high-frequency components of incoming data symbols, which are often corrupted by noise and disturbances, also amplifies that noise and those disturbances, degrading the performance of the link. It is also difficult to obtain a large CTLE gain at high frequencies. As a result, CTLE with only a moderate gain is typically deployed, and most channel equalization tasks, especially the removal of reflection- and crosstalk-induced ISI, are left to nonlinear equalizers [18][19][20][21]. Because CTLE is typically configured differentially with RC source degeneration, where the capacitor shorts the resistor at high frequencies and thereby boosts the high-frequency gain, CTLE is subject to a mismatch-induced input offset voltage. As data symbols are severely attenuated by the time they reach the far end of the channel, the input offset voltage of CTLE can become comparable to the amplitude of the received data symbols, mandating offset compensation in CTLE [22,23].
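The limited high-frequency boost of a source-degenerated CTLE can be seen from its transfer function. The sketch below uses a commonly quoted first-order model of a differential pair degenerated by R_S in parallel with C_S: the DC gain is g_m R_D / k with k = 1 + g_m R_S / 2, a zero sits at 1/(R_S C_S), the degeneration pole at k/(R_S C_S), and the load pole at 1/(R_D C_L). All component values are hypothetical. The peaking can never exceed k, and the load pole eats into it further, which is one reason only moderate CTLE gain is practical.

```python
import numpy as np

def ctle_response(f, gm, r_d, r_s, c_s, c_l):
    # First-order model of a source-degenerated differential pair
    # (degeneration R_S in parallel with C_S between the two sources).
    s = 2j * np.pi * f
    k = 1 + gm * r_s / 2                  # degeneration factor
    dc_gain = gm * r_d / k                # low-frequency gain is reduced by k
    w_zero = 1 / (r_s * c_s)              # C_S starts shorting R_S here
    w_p1 = k / (r_s * c_s)                # degeneration pole
    w_p2 = 1 / (r_d * c_l)                # output (load) pole
    return dc_gain * (1 + s / w_zero) / ((1 + s / w_p1) * (1 + s / w_p2))

# Hypothetical component values
gm, r_d, r_s, c_s, c_l = 20e-3, 500.0, 400.0, 400e-15, 50e-15
k = 1 + gm * r_s / 2
f = np.logspace(6, 11, 2001)              # 1 MHz to 100 GHz
mag = np.abs(ctle_response(f, gm, r_d, r_s, c_s, c_l))
dc = mag[0]
peak = mag.max()
print(f"DC gain {dc:.2f}, peak gain {peak:.2f} at {f[mag.argmax()] / 1e9:.1f} GHz")
print(f"peaking {20 * np.log10(peak / dc):.1f} dB, bounded by 20*log10(k) = {20 * np.log10(k):.1f} dB")
```

Raising R_S increases k and hence the available peaking, but it lowers the DC gain by the same factor, so the boost is obtained by suppressing low frequencies rather than by adding gain at high frequencies.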
Unlike CTLE, nonlinear equalization removes post-cursors by subtracting estimated post-cursors from incoming data symbols directly. The estimate of a post-cursor is obtained by multiplying the corresponding past decision of the slicer by an appropriate coefficient, i.e., a tap coefficient. The most widely used nonlinear equalization is decision feedback equalization (DFE), proposed by Austin in 1967 [24]. A distinct characteristic of DFE is that it does not amplify noise or crosstalk, as no amplification of the high-frequency components of incoming data symbols is performed. In addition, since the DFE taps can be adjusted in accordance with the channel response, which captures the effect of the channel impairments, DFE is capable not only of eliminating ISI caused by finite channel bandwidth but also of removing ISI caused by reflection and crosstalk, a capability not possessed by pre-emphasis. DFE is therefore the most robust and widely used channel equalization technique.
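The subtraction of post-cursor estimates built from past decisions can be sketched in a few lines. This is a behavioral model only, assuming NRZ symbols of ±1, a hypothetical pulse response with three post-cursors plus a reflection-like post-cursor at 6 UI, and tap coefficients already set equal to the post-cursors (adaptation is omitted here).

```python
import numpy as np

def dfe(received, taps):
    # Behavioral DFE: subtract post-cursor estimates built from past slicer decisions.
    decisions = np.zeros(len(received))
    past = np.zeros(len(taps))            # past[k] = decision made k+1 UIs ago
    for n, x in enumerate(received):
        y = x - np.dot(taps, past)        # equalized sample (slicer input)
        d = 1.0 if y >= 0 else -1.0       # slicer decision
        decisions[n] = d
        past = np.concatenate(([d], past[:-1]))
    return decisions

rng = np.random.default_rng(1)
bits = rng.choice([-1.0, 1.0], size=2000)
# Hypothetical pulse: main cursor, three post-cursors, and a reflection at 6 UI
pulse = np.array([1.0, 0.55, 0.3, 0.1, 0.0, 0.0, 0.2])
rx = np.convolve(bits, pulse)[:len(bits)]

no_eq = np.where(rx >= 0, 1.0, -1.0)
with_dfe = dfe(rx, pulse[1:])             # taps set equal to the post-cursors
print(np.mean(no_eq != bits), np.mean(with_dfe != bits))
```

Because the sum of the post-cursors exceeds the main cursor, the unequalized slicer makes errors whenever all of them add destructively, whereas the DFE removes them exactly when its past decisions are correct. No high-frequency gain is applied anywhere, so noise is not amplified.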

Challenges and Opportunities
Timing constraints
DFE is a negative feedback system. The most widely used DFE is based on the least-mean-square (LMS) principle. LMS DFE obtains the desired tap coefficients by minimizing the power of the difference between the equalized and desired data symbols at the center of the data eyes. For high-speed serial links, only sign-sign LMS (SS-LMS), which uses the sign rather than the actual value of the difference between equalized and desired data symbols, is used. The operation of a generic SS-LMS DFE includes data slicing, which also generates the signed difference (error) between the equalized data symbols (the input of the slicer) and the desired data symbols (the output of the slicer); the multiplication of tap coefficients with past decisions of the slicer; and the subtraction of the DFE taps from incoming data symbols. These operations must be completed within one unit interval (UI).
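The SS-LMS update can be written out explicitly. With equalized sample y[n] = x[n] − Σ c_k d[n−k] and error e[n] = y[n] − d[n]·V_ref, the gradient of e² with respect to c_k is proportional to −e[n]·d[n−k], so the sign-sign update is c_k ← c_k + μ·sign(e[n])·sign(d[n−k]). The sketch below assumes NRZ ±1 symbols, a fixed reference level equal to the main cursor, and a hypothetical pulse response; the step size μ is also an assumption.

```python
import numpy as np

def ss_lms_dfe(received, n_taps, v_ref, mu=0.002):
    # Sign-sign LMS adaptation of DFE taps (sketch, fixed reference level v_ref).
    taps = np.zeros(n_taps)
    past = np.zeros(n_taps)                   # past decisions, most recent first
    for x in received:
        y = x - np.dot(taps, past)            # subtract current tap estimates
        d = 1.0 if y >= 0 else -1.0           # slicer decision
        err = y - d * v_ref                   # error against the desired level
        # c_k += mu * sign(e[n]) * sign(d[n-k]); past holds +/-1, so sign(past) = past
        taps += mu * np.sign(err) * past
        past = np.concatenate(([d], past[:-1]))
    return taps

rng = np.random.default_rng(3)
bits = rng.choice([-1.0, 1.0], size=20000)
pulse = np.array([1.0, 0.4, 0.2, 0.1])        # hypothetical cursor + post-cursors
rx = np.convolve(bits, pulse)[:len(bits)]
taps = ss_lms_dfe(rx, n_taps=3, v_ref=pulse[0])
print(np.round(taps, 3))                      # expected to settle close to 0.4, 0.2, 0.1
```

Only two one-bit quantities enter each update, which is why SS-LMS suits high-speed hardware: the error slicer and the decision slicer provide them directly, and the multiply reduces to an XOR driving an up/down counter.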
The need for positive feedback in slicers to produce a large output voltage swing sets a lower bound on slicer latency. Most slicers are based on the sense-amplifier architecture, which consists of a differentially configured sensing stage and a positive-feedback latch output stage. To meet timing requirements, loop unrolling has emerged as an effective means of sidestepping the difficulty of lowering slicer latency [25][26][27][28]. DFEs operating at data rates in the vicinity of 10 Gbps typically employ only first-order loop unrolling, i.e., loop unrolling for tap 1 only.
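First-order loop unrolling replaces the feedback subtraction for tap 1 with two speculative comparators and a multiplexer. The behavioral sketch below, assuming NRZ ±1 symbols and a hypothetical first post-cursor h1 = 0.7, shows that the two structures make identical decisions; the gain in hardware is that the previous decision only has to drive a 2:1 mux select instead of settling a summing node.

```python
import numpy as np

def direct_dfe_1tap(received, h1):
    d_prev, out = -1.0, []
    for x in received:
        y = x - h1 * d_prev                  # subtraction waits for the previous decision
        d = 1.0 if y >= 0 else -1.0
        out.append(d)
        d_prev = d
    return np.array(out)

def unrolled_dfe_1tap(received, h1):
    d_prev, out = -1.0, []
    for x in received:
        # Both comparators resolve in parallel, before d[n-1] is known
        if_prev_high = 1.0 if x - h1 >= 0 else -1.0     # speculates d[n-1] = +1
        if_prev_low = 1.0 if x + h1 >= 0 else -1.0      # speculates d[n-1] = -1
        d = if_prev_high if d_prev > 0 else if_prev_low  # 2:1 mux selects the valid one
        out.append(d)
        d_prev = d
    return np.array(out)

rng = np.random.default_rng(4)
bits = rng.choice([-1.0, 1.0], size=5000)
rx = np.convolve(bits, [1.0, 0.7])[:len(bits)] + rng.normal(0.0, 0.05, len(bits))
direct = direct_dfe_1tap(rx, 0.7)
unrolled = unrolled_dfe_1tap(rx, 0.7)
print(np.array_equal(direct, unrolled), np.mean(unrolled != bits))
```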
Higher-order loop unrolling, i.e., loop unrolling for taps 1~3, has also been used to meet timing constraints when the data rate reaches tens of Gbps. This, however, comes at the cost of exponentially increased silicon area and power consumption [5,6,29]. To relax the timing constraints and lower the power consumption of the remaining taps, half-rate or even quarter-rate approaches have been used [5,6,16,26,29]. This approach trades silicon area for power consumption.
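The exponential cost follows from simple counting: unrolling N taps requires one comparator for each of the 2^N possible patterns of the unrolled decisions, and M-way interleaving duplicates that set in every lane while dividing each lane's rate by M. The helper below is a back-of-the-envelope sketch for NRZ data slicers only (error comparators and the muxes are ignored); the data rates are illustrative.

```python
def unrolled_dfe_budget(data_rate_gbps, unrolled_taps, interleave):
    """Count data comparators in an NRZ DFE with loop unrolling and M-way interleaving."""
    per_lane = 2 ** unrolled_taps              # one comparator per pattern of unrolled decisions
    comparators = per_lane * interleave        # every interleaved lane needs its own set
    lane_rate_ghz = data_rate_gbps / interleave
    ui_ps = 1000.0 / data_rate_gbps
    return comparators, lane_rate_ghz, ui_ps

for rate, taps, lanes in [(10, 1, 1), (28, 1, 2), (28, 3, 2), (56, 3, 4)]:
    n, f_lane, ui = unrolled_dfe_budget(rate, taps, lanes)
    print(f"{rate} Gb/s, {taps} unrolled tap(s), {lanes} lane(s): "
          f"{n} comparators, each lane at {f_lane:g} Gb/s, UI = {ui:.1f} ps")
```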
In addition to data slicing, multiplication and summation are also needed in DFE, and these operations must be completed in less than one UI. Each DFE tap to be subtracted from incoming data symbols is the product of a tap coefficient, which is an analog quantity, and a past decision of the slicer, which is a digital quantity. To perform the multiplication, fully differential current-steering multipliers, in which the tail current represents the tap coefficient and the slicer output steers the current, are widely favored [30][31][32]. The summation block subtracts the DFE taps from incoming data symbols. The large capacitance at the tap summing node of the summation block, arising from the large number of DFE taps, has a detrimental impact on the speed of the summation block [26]. To minimize delay, current-mode summation is widely favored over its voltage-mode counterpart owing to the intrinsic speed advantages of current-mode circuits [33]. In addition to delay, as the output of the summation block is also the input of the slicer, a large output voltage swing of the summation block is critical in minimizing slicer errors. Enlarging the input transistors of the summation block increases the current delivered to the load, and hence the output voltage swing, but also reduces speed because of the larger input capacitance. Lowering the load resistance reduces the time constant of the summing node, and hence the delay of the summation block, but also lowers the input voltage of the slicer. One effective way to speed up the summation block without reducing the load resistance is inductive peaking, i.e., placing inductors in series with the load resistors of the summation block [19,34]. Though effective, the large area cost of on-chip spiral inductors needs to be justified.
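The load-resistance trade-off can be quantified with a single-pole model of the summing node: the node capacitance grows with the number of taps, the time constant is R_L times that capacitance, and the voltage reaching the slicer within one UI is the steady-state swing I·R_L scaled by the settled fraction 1 − exp(−UI/τ). All values below (25 Gb/s, 2 mA signal current, 20 fF slicer input, 5 fF per tap) are hypothetical.

```python
import math

def summing_node(data_rate_gbps, r_load, i_signal, c_slicer, c_per_tap, n_taps):
    """Single-pole estimate for a resistively loaded current-mode DFE summer."""
    ui = 1e-9 / data_rate_gbps                  # seconds
    c_node = c_slicer + n_taps * c_per_tap      # every tap's steering pair adds load
    tau = r_load * c_node
    settled = 1.0 - math.exp(-ui / tau)         # fraction of the final value reached in 1 UI
    swing = i_signal * r_load * settled         # voltage actually presented to the slicer
    return tau, settled, swing

for n_taps in (2, 15):
    for r_load in (100.0, 200.0):
        tau, settled, swing = summing_node(25, r_load, 2e-3, 20e-15, 5e-15, n_taps)
        print(f"{n_taps:2d} taps, R_L={r_load:.0f} ohm: tau={tau * 1e12:.1f} ps, "
              f"settled={settled:.3f}, swing={swing * 1e3:.0f} mV")
```

With these assumed numbers, halving R_L makes the node settle almost completely but still delivers a smaller slicer input than the larger resistor, which only settles partially; this is the tension that motivates inductive peaking and the integrating summers discussed next.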
Current-integrating summation with resistor loads offers the advantage of low power consumption [35][36][37]. Replacing the resistor loads with PMOS transistor loads that operate in an ON/OFF mode further lowers the power consumption [30][31][32]. The speed of current-integrating summation can be further increased by replacing current feedback taps with capacitive charge feedback [5,6,29]. To further reduce power consumption, switched-capacitor summers have been proposed [32]. Because they require complex clocking schemes, switched-capacitor summers are typically used only for the summation of the first tap, with the remaining taps implemented using the current-integrating approach.
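A first-order comparison suggests why integration helps. An ideal current-integrating summer converts the net current (signal minus the tap currents steered by past decisions) into a voltage ΔV = I_net·T_int / C, so its gain is g_m·T_int / C, whereas a resistive summer's gain g_m·R_L is capped by the requirement that R_L·C settle within the UI. The sketch assumes a 25 Gb/s link, hypothetical g_m and node capacitance, integration over one full UI, and a resistive load sized so that 3·R_L·C equals one UI.

```python
def integrating_summer_output(i_signal, tap_currents, past_decisions, t_int, c_int):
    # Ideal current integration: net current (signal minus tap currents steered by
    # past decisions) charges the output capacitance for t_int seconds.
    i_net = i_signal - sum(i_k * d_k for i_k, d_k in zip(tap_currents, past_decisions))
    return i_net * t_int / c_int

gm = 10e-3          # assumed input-pair transconductance (S)
ui = 40e-12         # 25 Gb/s unit interval (s)
c_node = 40e-15     # assumed summing-node capacitance (F)

gain_integrating = gm * ui / c_node      # voltage gain after integrating for one UI
r_load = ui / (3 * c_node)               # assumption: resistive load sized so 3*R*C = 1 UI
gain_resistive = gm * r_load
print(f"integrating gain {gain_integrating:.1f}, resistive gain {gain_resistive:.1f}")
print(integrating_summer_output(1e-3, [0.3e-3, 0.1e-3], [1, -1], ui, c_node))
```

Under these assumptions the integrating summer reaches the same output swing with roughly a third of the bias current, which is the source of its power advantage; real designs must also account for reset time and the nonideal integration of the input pair.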

Power consumption
Increasing power consumption generally helps speed up circuits. The aggressive scaling of CMOS technology has allowed designers to trade silicon area for power reduction and speed improvement. This is largely achieved using approaches similar to time interleaving, in which the speed requirement and power consumption of each time-interleaved block are reduced by approximately the same factor by which the silicon area increases. Typical examples in Gbps serial links are loop unrolling for the first few taps and half-rate or even quarter-rate clocking for the remaining taps. Loop unrolling is geared towards meeting timing constraints, whereas half-rate and quarter-rate clocking are aimed at lowering power consumption. A power-speed trade-off also exists in current-mode logic, which is widely used in serial links owing to its high-speed operation and low switching noise. Recently, charge steering has emerged as a promising technique to significantly reduce both the power consumption and the latency of the building blocks of digital logic [38][39][40]. Compared with current-mode logic, charge-steering logic replaces the load resistors and the tail current source with clocked switches and steers charge between the device capacitors at the output nodes and the capacitor at the common-source node. Unlike switched-capacitor networks, which require a complete charge transfer between clock phases so that their performance depends only on capacitance ratios, charge-steering circuits transfer an amount of charge per clock phase that is controlled by the input voltage. As a result, a tunable voltage gain can be obtained by adjusting the ratio of the capacitors involved. The small time constant formed by the switches and capacitors enables rapid charge transfer and hence high-speed operation. The elimination of the tail current source and load resistors also greatly lowers power consumption. As demonstrated in Ref. [39], a 20-fold reduction in the power consumption of a clock and data recovery circuit was obtained.
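The source of the power saving can be illustrated with a first-order estimate that is not taken from Ref. [39]. A current-mode gate dissipates V_DD·I_tail continuously, whereas a charge-steering gate only draws from the supply the charge it steers each cycle, approximated here as the charge C_T·V_T collected on the tail capacitor. All component values and the clock rate are hypothetical, and the resulting ratio depends entirely on them.

```python
def cml_power(vdd, i_tail):
    # Current-mode logic: the tail current flows continuously, independent of data rate.
    return vdd * i_tail

def charge_steering_power(vdd, c_tail, v_tail, f_clk):
    # First-order estimate: each clock cycle the supply restores the charge
    # C_T * V_T that was steered from the precharged outputs into the tail capacitor.
    return vdd * c_tail * v_tail * f_clk

p_cml = cml_power(vdd=1.0, i_tail=2e-3)
p_cs = charge_steering_power(vdd=1.0, c_tail=20e-15, v_tail=0.3, f_clk=10e9)
print(f"CML {p_cml * 1e3:.2f} mW, charge steering {p_cs * 1e6:.0f} uW")
```

Unlike the static current-mode estimate, the charge-steering figure scales with clock frequency, so the advantage narrows as the clock rate rises for a fixed steered charge.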

Adaptive references and thresholds
The variation of channel characteristics, the uncertainty in the arrival time of reflected signals, and the randomness of crosstalk from neighboring devices require that both the number of DFE taps and the values of the tap coefficients be set adaptively. In SS-LMS DFE, the error signal is theoretically the sign of the difference between the equalized data symbols (the input of the slicer) and the desired data symbols (the output of the slicer). However, since the output of the slicer is a full-swing digital level while its input is a small analog signal, the sign of their difference is determined solely by the slicer decision rather than toggling between 1 and -1 in response to residual ISI. This difficulty can be overcome by comparing equalized data symbols with their desired voltages, V_ref,1 for data 1 and V_ref,0 for data 0. The logic state of data symbols is determined by comparing equalized data symbols with a threshold voltage V_T, typically set to the common-mode voltage of the data symbols.
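The references themselves can be adapted with the same sign-based rule used for the taps: whenever the slicer decides 1, V_ref,1 moves up or down by μ according to the sign of y − V_ref,1, and likewise for V_ref,0. The sketch below assumes hypothetical equalized levels with a common-mode offset and Gaussian noise, and, as an additional assumption not stated in the text, places V_T at the midpoint of the two references.

```python
import numpy as np

def adapt_references(samples, mu=1e-3):
    # Sign-sign adaptation of V_ref,1, V_ref,0 and a midpoint threshold V_T (sketch).
    v_ref1 = v_ref0 = v_t = 0.0
    for y in samples:
        if y >= v_t:                                   # slicer decides "1"
            v_ref1 += mu * np.sign(y - v_ref1)         # error slicer compares y with V_ref,1
        else:                                          # slicer decides "0"
            v_ref0 += mu * np.sign(y - v_ref0)         # error slicer compares y with V_ref,0
        v_t = 0.5 * (v_ref1 + v_ref0)                  # assumption: threshold at the midpoint
    return v_ref1, v_ref0, v_t

rng = np.random.default_rng(6)
bits = rng.integers(0, 2, size=20000)
# Hypothetical equalized levels with a common-mode offset and additive noise
y = np.where(bits == 1, 0.35, -0.25) + rng.normal(0.0, 0.03, size=bits.size)
v_ref1, v_ref0, v_t = adapt_references(y)
print(round(v_ref1, 3), round(v_ref0, 3), round(v_t, 3))
```

Because each update uses only the sign of the error, each reference settles near the median of its level, and the error sign now toggles with residual ISI and noise instead of merely repeating the decided bit.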
For DFE with 4-PAM signaling, the error signals are generated by comparing incoming data symbols with their desired voltages: V_ref,00 for data bits 00, V_ref,01 for 01, V_ref,10 for 10, and V_ref,11 for 11. The data bits of incoming data symbols are determined by comparing the data symbols with three threshold voltages: V_T,00/01, defining the border between data symbols 00 and 01; V_T,01/10, defining the border between 01 and 10; and V_T,10/11, defining the border between 10 and 11. Since neither the references nor the thresholds are known a priori, and both vary with data rate and channel characteristics, they should be set adaptively in accordance with the data rate and channel characteristics, similar to the tap coefficients. Even for DFE with 2-PAM signaling, a total of 4 adaptive algorithms are needed (one for the tap coefficients, one for threshold V_T, and two for references V_ref,1 and V_ref,0), resulting in an excessive amount of silicon and power consumption.
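The 4-PAM decision and error generation described above can be sketched as follows, with levels ordered 00 < 01 < 10 < 11 as in the text. The threshold and reference values are hypothetical, and the loop count simply adds one loop per reference, one per threshold, and one for the tap coefficients, reproducing the 2-PAM count of 4.

```python
LEVELS = ("00", "01", "10", "11")      # ordered from lowest to highest voltage, as in the text

def pam4_slice(y, thresholds, refs):
    """Return the decided symbol and the sign of the error against its reference.

    thresholds: (V_T,00/01, V_T,01/10, V_T,10/11), increasing
    refs: dict mapping symbol -> V_ref for that symbol
    """
    idx = sum(y >= t for t in thresholds)        # number of thresholds exceeded: 0..3
    symbol = LEVELS[idx]
    err_sign = 1 if y >= refs[symbol] else -1    # one error comparator per reference
    return symbol, err_sign

def adaptive_loop_count(pam_levels, n_tap_loops=1):
    # one loop per reference, one per threshold, plus the tap-coefficient loop(s)
    return pam_levels + (pam_levels - 1) + n_tap_loops

thresholds = (-0.4, 0.0, 0.4)                                 # hypothetical V_T values
refs = {"00": -0.6, "01": -0.2, "10": 0.2, "11": 0.6}         # hypothetical V_ref values
print(pam4_slice(0.5, thresholds, refs))                      # ('11', -1)
print(pam4_slice(-0.1, thresholds, refs))                     # ('01', 1)
print(adaptive_loop_count(2), adaptive_loop_count(4))         # 4 8
```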
Silicon area and power consumption become prohibitively high for DFE with 4-PAM signaling if every reference for error generation and every threshold for logic-state determination is set adaptively [27][41][42][43][44]. It was shown in Ref. [44] that the number of references can be reduced from 4 to 1, with the chosen reference for error generation corresponding to one of the four data symbols. The reference is updated only on the arrival of the corresponding data symbol. This approach trades performance for receiver simplicity. Critical information is still missing on the relation between the chosen reference and the performance of DFE: which factors dictate the choice of the reference, and which reference gives the best performance? Also, what is the performance compromise if only one reference is used?
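One cost of a single reference is visible even in a toy model: error information is only produced when the chosen symbol arrives, so with equiprobable 4-PAM symbols the adaptation receives roughly a quarter of the updates. The sketch below gates the reference update on the chosen symbol ("11" here, an arbitrary choice) and assumes correct decisions and hypothetical noisy levels; how the gated error is used for the taps in Ref. [44] is not modeled.

```python
import numpy as np

LEVELS = {"00": -0.6, "01": -0.2, "10": 0.2, "11": 0.6}   # hypothetical ideal PAM-4 levels

def single_reference(decided, samples, chosen="11", mu=1e-3):
    # One reference, updated only when the chosen symbol is decided (sketch).
    v_ref = 0.0
    updates = 0
    for s, y in zip(decided, samples):
        if s != chosen:
            continue                        # no error information for the other symbols
        v_ref += mu * np.sign(y - v_ref)
        updates += 1
    return v_ref, updates / len(decided)

rng = np.random.default_rng(7)
decided = rng.choice(list(LEVELS), size=20000)     # assume decisions are correct
samples = np.array([LEVELS[s] for s in decided]) + rng.normal(0.0, 0.02, size=decided.size)
v_ref, update_rate = single_reference(decided, samples)
print(round(v_ref, 3), round(update_rate, 3))
```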

Data-DFE, eye-opening monitor DFE, and Edge-DFE
Unlike SS-LMS data-DFE, eye-opening monitor (EOM) adaptive DFE, hereafter referred to as EOM-DFE, uses a predefined minimum data eye as the benchmark to guide the search for the desired tap coefficients [45][46][47][48][49]. Since equalization is activated only when the data eyes are smaller than the predefined minimum data eye, and ceases once they exceed it, EOM-DFE is more power-efficient. Similar to data-DFE, which imposes no explicit constraint on data jitter, EOM-DFE suffers from the lack of a tight constraint on data jitter. For example, a one-dimensional EOM has only a vertical eye-opening constraint and no constraint on data jitter. Although a two-dimensional EOM does constrain data jitter, the jitter constraint conflicts with the vertical eye-opening constraint. A hexagon EOM tightens the jitter constraint without sacrificing vertical eye opening, but at the expense of more silicon area and power consumption [50]. Data-dependent jitter, a prominent form of deterministic jitter, arises from ISI induced by channel impairments [51]. Transition (edge) information, normally used for recovering the clock embedded in the data, has also been utilized to generate the error information used by DFE [52]. This DFE differs from data-DFE and EOM-DFE and is known as edge-DFE [53]. Edge-DFE searches for the desired tap coefficients by minimizing the power of the timing error (jitter) of the data eyes. Since timing error information is readily available in phase-picking clock recovery [33], edge-DFE incurs no additional cost. Compared with data-DFE, edge-DFE possesses a number of attractive characteristics, including reduced data-dependent jitter and a relaxed CTLE gain constraint [53]. This, however, comes at the cost of a reduced vertical eye opening. To eliminate this drawback, edge-DFE and data-DFE were employed simultaneously in Ref. [16] so as to exploit the advantages of both.
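The gating behavior of EOM-DFE can be illustrated with a hypothetical search procedure; the cited works may use different search strategies. The sketch measures a one-dimensional (vertical) eye opening from equalized samples against a known training pattern, perturbs one tap at a time by a fixed step, keeps moves that open the eye, and stops as soon as the eye reaches the predefined minimum. If the initial eye already meets the benchmark, no adaptation runs at all. The channel, step size, and benchmarks are assumptions.

```python
import numpy as np

def equalize(received, taps):
    # Decision-directed DFE with fixed taps; returns the slicer inputs.
    past = np.zeros(len(taps))
    y_out = np.empty(len(received))
    for n, x in enumerate(received):
        y = x - taps @ past
        y_out[n] = y
        d = 1.0 if y >= 0 else -1.0
        past = np.concatenate(([d], past[:-1]))
    return y_out

def vertical_eye(y, bits):
    # 1-D EOM view: lowest '1' sample minus highest '0' sample (negative = closed).
    return y[bits > 0].min() - y[bits < 0].max()

def eom_dfe_search(received, bits, n_taps, eye_min, step=0.05, max_iter=100):
    taps = np.zeros(n_taps)
    eye = vertical_eye(equalize(received, taps), bits)
    iterations = 0
    while eye < eye_min and iterations < max_iter:    # adapt only while the eye is too small
        iterations += 1
        improved = False
        for k in range(n_taps):
            for delta in (step, -step):
                trial = taps.copy()
                trial[k] += delta
                trial_eye = vertical_eye(equalize(received, trial), bits)
                if trial_eye > eye:                    # keep moves that open the eye
                    taps, eye, improved = trial, trial_eye, True
                    break
        if not improved:
            break
    return taps, eye, iterations

rng = np.random.default_rng(5)
bits = rng.choice([-1.0, 1.0], size=1000)
pulse = np.array([1.0, 0.45, 0.25, 0.15])      # hypothetical channel
rx = np.convolve(bits, pulse)[:len(bits)]

taps, eye, iters = eom_dfe_search(rx, bits, n_taps=3, eye_min=1.6)
print(np.round(taps, 2), round(eye, 2), iters)
idle_taps, idle_eye, idle_iters = eom_dfe_search(rx, bits, n_taps=3, eye_min=0.2)
print(idle_iters)   # 0: the initial eye already meets the benchmark, so no adaptation runs
```

Note that the search only looks at vertical opening; nothing in it constrains where the zero crossings fall, which is the jitter blind spot of one-dimensional EOM-DFE described above.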
In addition, the strength of each DFE can be adjusted objectively and independently of the other. What is critically needed is a theory that governs the assignment of the weighting factors between data-DFE and edge-DFE so as to yield optimal performance.
Phase-picking-based data recovery is intrinsically sensitive to errors in data sampling unless multiple samples per symbol time are taken and a digital filtering mechanism such as majority voting is employed to filter out irregular samples. Similarly, phase-tracking-based data recovery is also prone to sampling errors, as it relies solely on data samples at the center of the data eyes. Integration-based data recovery, which determines the logic state of incoming data by integrating the data over one symbol time and comparing the result with a predefined threshold, is therefore preferred over its sampling counterparts [54][55][56]. The simultaneous deployment of data-DFE and edge-DFE maximizes both the horizontal and vertical openings of the data eyes, thereby maximizing the difference between logic 1 and logic 0 and, consequently, minimizing the BER.
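The robustness argument can be seen with a toy waveform: a logic-1 symbol represented by 16 points per UI (a hypothetical resolution) with a short glitch exactly at the eye center. A single center sample is corrupted, majority voting over three oversampled phases rejects the glitch, and integration over the whole UI averages it out. The sample positions and glitch amplitude are assumptions chosen for illustration.

```python
import numpy as np

def single_sample_decision(ui_waveform):
    # Sample once at the eye center (phase-tracking style).
    return 1 if ui_waveform[len(ui_waveform) // 2] >= 0 else -1

def majority_vote_decision(ui_waveform, positions):
    # Oversample at a few phases and take the majority (odd number of samples).
    votes = sum(ui_waveform[p] >= 0 for p in positions)
    return 1 if votes > len(positions) // 2 else -1

def integrate_decision(ui_waveform, threshold=0.0):
    # Integrate over the whole UI (mean of the samples) and compare with a threshold.
    return 1 if np.mean(ui_waveform) >= threshold else -1

ui = np.full(16, 0.3)        # a logic-1 symbol, 16 points per UI (hypothetical)
ui[8] = -1.0                 # short glitch exactly at the eye center
print(single_sample_decision(ui), majority_vote_decision(ui, (4, 8, 12)), integrate_decision(ui))
# -> -1 1 1
```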

Floating tap DFE
For severely dispersive channels, a large number of taps is needed to remove post-cursors, resulting in excessive power and silicon consumption as well as a large capacitance at the input of the slicer, which makes the timing constraint very difficult to meet. When channels contain impedance discontinuities, which typically occur at vias and connectors, reflection-induced post-cursors are generated. These post-cursors have two distinct characteristics: location uncertainty and amplitude uncertainty. They often reside far from the main cursor, with a large number of insignificant post-cursors in between. If a generic DFE is used, equalizing these channels requires a large number of taps, even though the taps corresponding to the insignificant post-cursors between the main cursor and the reflection-induced post-cursors contribute negligibly to the overall equalization. As a result, not only is a significant amount of silicon and power wasted, but an exceedingly large capacitance is also introduced at the input of the slicer. To address this problem, floating-tap DFE dynamically locates reflection-induced post-cursors and places taps only at their locations, so that the number of taps can be significantly reduced [20,21,57,58]. Despite its importance, only a very limited number of studies on floating-tap DFE are available. More research is clearly warranted on how to dynamically place taps at the locations where large reflection-induced post-cursors reside.
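A minimal placement rule conveys the idea, assuming the post-cursors have already been estimated (in hardware they must be located adaptively at run time, which this sketch does not model). A few fixed taps cover the first post-cursors, and the floating taps go to the largest remaining post-cursors by magnitude. The pulse response, with a hypothetical reflection at 11 to 12 UI, and the function name are illustrative.

```python
import numpy as np

def place_floating_taps(post_cursors, n_fixed, n_floating):
    # Hypothetical rule: fixed taps cover the first n_fixed post-cursors; floating
    # taps go to the largest remaining post-cursors by magnitude.
    post = np.asarray(post_cursors, dtype=float)
    fixed = list(range(1, n_fixed + 1))                   # tap positions in UI
    tail = np.abs(post[n_fixed:])
    order = np.argsort(tail)[::-1][:n_floating]
    floating = sorted(int(i) + n_fixed + 1 for i in order)
    return fixed, floating

# post[0] is the post-cursor at 1 UI; a reflection shows up around 11-12 UI
post = [0.3, 0.12, 0.02, 0.01, 0.0, 0.0, 0.01, 0.0, 0.0, 0.0, 0.15, -0.08, 0.01, 0.0, 0.0]
fixed, floating = place_floating_taps(post, n_fixed=2, n_floating=2)
print(fixed, floating)                                    # [1, 2] [11, 12]
print(f"{len(fixed) + len(floating)} taps instead of {max(floating)} for a generic DFE")
```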

Conclusions
The impairments of wire channels and their impact on multi-Gbps serial links were examined. This was followed by a close examination of channel equalization techniques, in particular pre-emphasis and post-equalization, to combat inter-symbol interference. The difficulties encountered in the design of adaptive decision feedback equalizers were investigated, recent developments that overcome these difficulties were reviewed, and challenges that are yet to be conquered were explored.