Cost-effective digital coherent short-reach transmission system with D8QAM and low-complexity DSP

We propose a cost-effective digital coherent scheme with low-complexity digital signal processing (DSP) for short-reach optical interconnection. Differential 8-ary quadrature amplitude modulation (D8QAM) with 1-decision-aided adaptive differential decoding bypasses carrier recovery and enables cycle-slip-free operation. We experimentally demonstrate that the receiver sensitivity of 400-Gb/s D8QAM is insensitive to the laser type and is the same as that of 400-Gb/s 16QAM for 2-km transmission with a distributed feedback (DFB) laser. The proposed adaptive equalizer (AEQ), which uses real-valued finite impulse response (FIR) filters with shorter tap lengths for the real-imaginary filters, allows hardware-efficient implementation with high robustness to receiver-side timing skew. For 400-Gb/s D8QAM 10-km transmission, our AEQ achieves performance comparable to conventional 4×4 real-valued multi-input multi-output (MIMO) equalization and to the existing simplified AEQs, with complexity reductions of 50% and 14%, respectively.


Introduction
New applications such as 5G, cloud computing, augmented/virtual reality, and storage have significantly driven traffic demand. With the standardization of 400 Gigabit Ethernet (GbE) for data center networks, 800 GbE or 1.6 TbE is foreseeable soon after 2020 [1]. To go beyond 400 Gb/s in short-reach transmission, traditional intensity-modulation direct-detection solutions require more parallel paths, larger electrical bandwidth, or advanced DSP, which inevitably increases cost or power consumption. On the other hand, advances in photonic integration and complementary metal-oxide-semiconductor (CMOS) technology have rapidly reduced the cost and power consumption of coherent transceivers [2], making them more competitive for short-reach applications. However, deployment is still challenging because narrow-linewidth lasers are required at both ends and the DSP consumes considerable power. To solve these problems, self-homodyne detection using a transmitted pilot tone as the local oscillator (LO) has been proposed [3][4][5]. Since the signal and the LO originate from the same optical source and the transmission path difference is small, frequency offset (FO) estimation is not required and the impact of laser phase noise is minimized, enabling the use of low-cost lasers with large linewidth (e.g., DFB lasers). Nevertheless, such systems require automatic polarization controllers for optical polarization de-multiplexing, which eliminate the digital AEQ at the cost of increased system complexity and cost. Transmitting the pilot tone over an additional fiber [3] or polarization [4] also sacrifices capacity and requires high-output-power lasers to extend the system reach.
Since chromatic dispersion is small in short-reach transmission, the complexity of conventional coherent DSP is dominated by the AEQ and carrier recovery. For AEQ simplification, K. Matsuda et al. proposed a complex-valued 1-N AEQ [6], which consists of one 1-tap 2×2 MIMO and two independent N-tap FIR filters. Compared with an N-tap conventional 2×2 MIMO, the 1-N AEQ has ∼42% lower complexity because the number of FIR filters is almost halved. In [7], X. Zhang et al. proposed reversing the order of the two filter sections (i.e., the N-1 AEQ) to further reduce the AEQ complexity in parallel implementations. However, the complex-valued FIR filters used in both schemes result in poor in-phase and quadrature (IQ) skew tolerance. Note that in this paper an upright N names a specific AEQ (e.g., the aforementioned "1-N" or "N-1" AEQ), while an italic N denotes a filter tap length. To mitigate the impact of timing skew, J. Cheng et al. proposed another AEQ (hereinafter referred to as the JC AEQ), which consists of a 1-tap 2×2 complex-valued MIMO, four N-tap real-valued FIR filters, and a post 3-tap 4×4 real-valued MIMO [8]. By eliminating the IQ crosstalk compensation filters, the JC AEQ achieves a ∼59% complexity reduction compared with conventional 4×4 MIMO. Nevertheless, it sacrifices system performance, limiting the simulated transmission distance to 2.5 km for 56-GBaud 16QAM at a 1-dB receiver sensitivity penalty.
On the other hand, blind phase search (BPS) [9] has been a popular carrier phase estimation (CPE) algorithm for high-order QAM owing to its feedforward implementation and superior performance. By contrast, the quadrature-phase-shift-keying (QPSK) partition-based Mth-power scheme [10] has lower computational complexity and is therefore more hardware-friendly. However, both algorithms suffer from cycle slips. To mitigate cycle slips, a recursive probability-weighted BPS scheme based on prior knowledge of the laser linewidth was proposed [11]. Cycle-slip-free operation can also be realized by applying a frequency-domain low-pass filter before phase extraction [12]. In both schemes, FO estimation is needed before phase recovery. To eliminate the impact of both FO and cycle slips, double differential encoding [13] is a good solution. Its two consecutive phase-differentiation operations mitigate the impact of laser phase noise and therefore enable large linewidth tolerance. Nevertheless, the noise enhancement caused by differential decoding degrades system performance.
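The way two consecutive phase differentiations cancel a constant FO can be seen in a toy, noise-free sketch. The QPSK data phases, the FO step `omega`, the carrier phase `theta0`, and the helper names are illustrative assumptions, not details taken from [13]:

```python
import math


def wrap(phi):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))


def integrate(phases):
    """Cumulative phase sum: one differential-encoding stage."""
    out, acc = [], 0.0
    for p in phases:
        acc = wrap(acc + p)
        out.append(acc)
    return out


def differentiate(phases):
    """Consecutive phase difference: one differential-decoding stage."""
    return [wrap(b - a) for a, b in zip(phases, phases[1:])]


data = [0.0, math.pi / 2, math.pi, -math.pi / 2, math.pi / 2, 0.0, math.pi]  # QPSK phases
tx = integrate(integrate(data))            # double differential encoding
omega, theta0 = 0.18, 1.0                  # assumed FO step (rad/symbol) and carrier phase
rx = [wrap(p + theta0 + omega * k) for k, p in enumerate(tx)]

single = differentiate(rx)                 # data integrated once, plus a constant +omega
double = differentiate(single)             # omega and theta0 cancel; data[2:] is recovered
print([round(x, 3) for x in double])
```

A single differentiation leaves the FO as a constant rotation `omega`; only the second one removes it, at the price of the extra noise enhancement mentioned above (not modeled here).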
In this paper, we propose a cost-effective and hardware-efficient 400-Gb/s coherent scheme for short-reach optical interconnection. Firstly, decision-aided adaptive differential decoding bypasses carrier recovery and mitigates cycle slips. Secondly, the theoretical differential coding penalty of D8QAM is 2.6 dB lower than that of differential 16QAM (D16QAM), relaxing the need for long decision feedback. We numerically demonstrate that carrier-recovery-free D8QAM with 1-decision-aided differential decoding provides a 1.4-dB signal-to-noise ratio (SNR) advantage over 16QAM and achieves a linewidth tolerance of 1.5×10⁻³ even at 2-GHz FO. Experimental results show that the receiver sensitivity of 400-Gb/s D8QAM is insensitive to the laser type and is the same as that of 400-Gb/s 16QAM for 2-km transmission with a DFB laser. Thirdly, the proposed AEQ, which uses real-valued FIR filters with shorter tap lengths for the real-imaginary filters, enables hardware-efficient implementation and high tolerance to receiver-side timing skew. We experimentally demonstrate that the proposed AEQ achieves performance comparable to conventional 4×4 real-valued MIMO with 40% and 50% lower complexity for 68-GBaud D8QAM 2-km and 10-km transmission, respectively. Compared with the existing simplified AEQs, our AEQ provides either a 0.7-dB receiver sensitivity advantage or a 22% complexity reduction at the same 2-km transmission performance. For 10-km transmission, the complexity advantage over the 1-N AEQ and the N-1 AEQ decreases slightly to 14%.

Differential encoding
As depicted in Fig. 1(a), normalized differential pre-coding is used for high-order modulation formats with multiple amplitude levels, so that only the phase is integrated. When the phase distribution differs between rings, the number of distinct absolute phases increases after differential pre-coding, as can be seen from the constellations of D8QAM in Fig. 1(a) and of D16QAM, D32QAM, and D64QAM in Fig. 1(c). For star 16QAM with a normalized inner radius of 0.697 and no phase offset between the two rings, the constellation remains unchanged after differential encoding and is similar to that of D8QAM with a normalized inner radius of ∼0.65. In terms of differential decoding, it is well known that conventional one-symbol-delay differential decoding mitigates laser phase noise and converts the FO into a constant phase rotation, so additional phase recovery is still required for final signal recovery. Moreover, the performance degradation caused by noise enhancement in the differential operation generally increases with the format order. Maximum likelihood sequence estimation (MLSE)-based differential phase detection can improve the performance of the differential decoder. As numerically demonstrated in [14], MLSE with 4 symbols reduces the differential coding penalty to within 0.13 dB for differential binary phase-shift keying and 0.5 dB for differential QPSK. However, this approach is applicable only to signals without FO, so FO estimation is still required before differential decoding in digital coherent systems. In addition, its complexity increases significantly with the format order. Decision-aided adaptive differential decoding is an alternative that mitigates the differential coding penalty while bypassing carrier recovery [15]. Multiple delayed symbols, combined with feedback decisions and adaptive coefficients updated by the least-mean-square (LMS) algorithm, generate a set of symbols whose phases are roughly aligned with the delayed symbol of interest. Averaging these symbols yields a less noisy reference for differential decoding and consequently improves system performance.
Since closed-form theoretical curves are not easily obtained for all differentially encoded formats, we predict the theoretical limits by simulating the transmission of each format over an additive white Gaussian noise (AWGN) channel without other impairments (e.g., FO and laser linewidth). The simulated SNR of 16QAM in Fig. 2(a) agrees well with the theoretical expectation. Differential encoding scales to higher-order formats at the cost of an increased coding penalty. Specifically, as shown in Fig. 2(a), the required SNR at a BER of 10⁻³ is degraded by 2.7 dB, 5.3 dB, 6.6 dB, and 7.8 dB when differential encoding without decision feedback (L=0) is applied to 8QAM, 16QAM, 32QAM, and 64QAM, respectively. By contrast, the SNR penalty of differential star 16QAM is only 1.7 dB, which is 3.6 dB less than that of D16QAM owing to its better phase noise tolerance. This indicates that geometric shaping can be used to minimize the differential coding penalty for higher-order formats. Figure 2(b) depicts the SNR penalty of each format listed in the legend, obtained by comparing the required SNR with differential encoding and a varied decision feedback length (L) against the non-differential case. The required SNRs (at a BER of 10⁻³) of non-differential 8QAM, 16QAM, star 16QAM, 32QAM, and 64QAM are 13.7 dB, 16.5 dB, 17.7 dB, 19.6 dB, and 22.5 dB, respectively [see Fig. 2(a)]. Figure 2(b) shows that increasing the decision feedback length to 7 reduces the SNR penalty below 1 dB for all formats. The trade-off between performance improvement and hardware implementation complexity favors one-symbol decision feedback (i.e., L=1) in the adaptive differential decoding [see Fig. 1(b)].
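To make the decoding procedure concrete, the following Python sketch implements normalized differential pre-coding and one possible form of differential decoding with one decision feedback (L=1). It is a minimal illustration under our own assumptions, not the exact algorithm of [15]: the 4+4 ring 8QAM radii, the unit-phasor reference, the complex LMS update, the step size, and all helper names are hypothetical choices. The reference for symbol k combines the previous received phasor with the one before it, rotated by the previously decided phase increment, and the LMS-adapted weights absorb the constant rotation that an FO introduces.

```python
import cmath
import math
import random

# Hypothetical star 8QAM (4 inner + 4 outer points); the radii are illustrative assumptions.
R_IN, R_OUT = 1.0, 1.0 + math.sqrt(3.0)
CONST_8QAM = ([R_IN * cmath.exp(1j * math.pi / 2 * m) for m in range(4)]
              + [R_OUT * cmath.exp(1j * (math.pi / 4 + math.pi / 2 * m)) for m in range(4)])


def nearest(z):
    """Minimum-distance decision on the 8QAM constellation."""
    return min(CONST_8QAM, key=lambda c: abs(z - c))


def precode(data):
    """Normalized differential pre-coding: integrate the phase only, keep each amplitude."""
    out, prev = [], 1.0 + 0j
    for d in data:
        s = d * prev / abs(prev)
        out.append(s)
        prev = s
    return out


def channel(tx, fo_per_symbol, lw_ts, sigma, rng):
    """Apply FO (cycles/symbol), Wiener phase noise (linewidth*T_s), and complex AWGN."""
    theta, out = 0.0, []
    for k, s in enumerate(tx):
        theta += rng.gauss(0.0, math.sqrt(2 * math.pi * lw_ts))
        phase = 2 * math.pi * fo_per_symbol * k + theta
        out.append(s * cmath.exp(1j * phase) + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)))
    return out


def decode(rx, mu=0.02, adaptive=True):
    """Differential decoding with one decision feedback (L=1) and LMS-adapted weights.

    With adaptive=False this reduces to conventional one-symbol-delay decoding.
    Returns (decisions for rx[1:], residual phase error of each decision).
    """
    u = [r / abs(r) for r in rx]          # unit phasors: only the phase is differential
    w0, w1 = 1 + 0j, 0j                   # start from conventional decoding
    decisions, phase_err, prev_dec = [], [], None
    for k in range(1, len(rx)):
        x0 = u[k - 1]
        x1 = u[k - 2] * prev_dec if prev_dec is not None else 0j  # aligned by decided increment
        ref = w0 * x0 + w1 * x1
        z = rx[k] * ref.conjugate() / abs(ref)
        d_hat = nearest(z)
        decisions.append(d_hat)
        phase_err.append(cmath.phase(z * d_hat.conjugate()))
        dec_phasor = d_hat / abs(d_hat)
        if adaptive and prev_dec is not None:
            err = u[k] * dec_phasor.conjugate() - ref   # ideal reference minus actual reference
            w0 += mu * err * x0.conjugate()
            w1 += mu * err * x1.conjugate()
        prev_dec = dec_phasor
    return decisions, phase_err


rng = random.Random(7)
data = [rng.choice(CONST_8QAM) for _ in range(4000)]
rx = channel(precode(data), 2e9 / 68e9, 1e-4, 0.05, rng)   # 2-GHz FO at 68 GBaud
dec, perr = decode(rx)
ser = sum(d != t for d, t in zip(dec, data[1:])) / len(dec)
print(f"SER={ser:.4f}, mean residual phase={sum(perr[2000:]) / 2000:.3f} rad")
```

With `adaptive=False`, the residual phase error settles at the FO-induced rotation (about 0.18 rad per symbol for a 2-GHz FO at 68 GBaud), whereas the adapted weights drive it toward zero without any separate FO estimator.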
Although differential star 16QAM has the smallest penalty at L=1, its required SNR is still 2.3 dB higher than that of 16QAM because of the 1.2-dB SNR gap between star 16QAM and square 16QAM [see Fig. 2(a)]. By contrast, D8QAM (L=1) outperforms 16QAM by 1.4 dB, thanks to the 2.8-dB SNR advantage of 8QAM over 16QAM. D8QAM (L=1) also bypasses carrier recovery, using three complex multipliers for signal recovery, which can be eliminated by using CORDIC in the polar domain [16]. Moreover, D16QAM, D32QAM, and D64QAM, with their dense constellation points [see Fig. 1(c)], require very high-resolution digital-to-analog and analog-to-digital converters to minimize the implementation penalty. Taking all these factors into account, we adopt D8QAM with 1-decision-aided differential decoding in our scheme.
We numerically investigate the performance of D8QAM (L=1) over the AWGN channel at various FOs. As shown in Fig. 3(a), the performance is unaffected by FOs up to 5 GHz. In principle, the tolerable FO range is limited only by the effective bandwidth of the system. We also compare the linewidth tolerance of 400-Gb/s 8QAM and 16QAM at 0-GHz FO with that of 400-Gb/s D8QAM (L=1) at 2-GHz FO. The QPSK-partitioning-based Viterbi-Viterbi (VV) CPE algorithm [10] with block sizes (BZs) of 108 and 80 is used for 68-GBaud 8QAM and 50-GBaud 16QAM, respectively. The block sizes in this paper are chosen to match the clock rate (∼625 MHz) of the application-specific integrated circuit (ASIC). The linewidth tolerance is defined as the overall linewidth-symbol-duration product (Δν·T_s) that causes a 1-dB SNR penalty at the BER threshold of 10⁻³ relative to the zero-linewidth case, where Δν is the overall laser linewidth and T_s is the symbol duration. As shown in Fig. 3(b), the linewidth tolerance of 400-Gb/s 16QAM is about half that of 8QAM, and both are below 10⁻⁴. By contrast, the linewidth tolerance of 400-Gb/s D8QAM (L=1) reaches 1.5×10⁻³ even at 2-GHz FO. This indicates that low-cost lasers with large linewidth can be deployed in a 400-Gb/s D8QAM system with negligible penalty. Compared with 16QAM, D8QAM (L=1) without carrier recovery provides a 1.4-dB SNR advantage and a larger linewidth tolerance, making it attractive for cost-effective and power-efficient coherent transceivers.
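The block-size choice and the linewidth-tolerance definition reduce to a few lines of arithmetic. This sketch assumes that the block size is the number of symbols per ASIC clock cycle rounded down, and that the overall linewidth Δν is the sum of the transmitter and receiver laser linewidths (as for Lorentzian lineshapes); the function names are hypothetical.

```python
ASIC_CLOCK_HZ = 625_000_000


def cpe_block_size(symbol_rate_hz: int, clock_hz: int = ASIC_CLOCK_HZ) -> int:
    """Symbols per ASIC clock cycle, rounded down (assumed block-size rule)."""
    return symbol_rate_hz // clock_hz


def linewidth_ts(total_linewidth_hz: float, symbol_rate_hz: float) -> float:
    """Overall linewidth-symbol-duration product, with T_s = 1 / symbol rate."""
    return total_linewidth_hz / symbol_rate_hz


def max_linewidth_hz(tolerance: float, symbol_rate_hz: float) -> float:
    """Largest overall linewidth that stays within a given Δν·T_s tolerance."""
    return tolerance * symbol_rate_hz


print(cpe_block_size(68_000_000_000), cpe_block_size(50_000_000_000))  # 108 80
dfb_plus_ecl = 1.3e6 + 100e3               # DFB (~1.3 MHz) + ECL (100 kHz), as in the experiment
print(linewidth_ts(dfb_plus_ecl, 68e9))     # about 2.1e-5, far below 1.5e-3
print(max_linewidth_hz(1.5e-3, 68e9) / 1e6, "MHz")  # about 102 MHz at 68 GBaud
```

Under these assumptions, a tolerance of 1.5×10⁻³ at 68 GBaud corresponds to an overall linewidth of roughly 100 MHz, which is why DFB-class lasers leave a large margin.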

Proposed adaptive equalizer
For short-reach (≤10 km) transmission, polarization mode dispersion is negligible and chromatic dispersion is a static impairment. Therefore, an N-tap conventional 2×2 MIMO can be divided into two sections: two N-tap FIR filters that separately equalize the signal of each polarization, and one 1-tap 2×2 MIMO for polarization de-multiplexing. The order of these two sections determines the configuration of the simplified AEQ (i.e., either the 1-N AEQ or the N-1 AEQ). We first investigate the impact of polarization rotation on the performance of a 400-Gb/s D8QAM (L=1) system with a 3-dB bandwidth of 32 GHz, a laser linewidth of 2 MHz, and an FO of 1 GHz using different AEQ schemes. For link lengths up to 10 km, this simulation considers chromatic dispersion of 170 ps/nm, polarization mode dispersion of 1 ps, and a maximum polarization rotation speed of 50 rad/ms (as specified for 400ZR). As shown in Fig. 4(a), the required SNR with a 25-tap 2×2 MIMO remains practically unchanged for polarization rotation speeds up to 50 rad/ms. The N-1 AEQ and the 1-N AEQ with 25-tap FIR filters are also immune to polarization rotation, at the cost of slight performance degradation. Compared with the complex-valued N-1 AEQ, which processes the IQ components jointly, the proposed AEQ using real-valued filters [see Fig. 4(b)] processes the IQ signals independently and is therefore more robust to receiver-side IQ skew.
Figure 5 illustrates the coefficients of the 21-tap real-imaginary filters (H_x/y,ri/ir) for 400-Gb/s D8QAM transmission over 2-km standard single-mode fiber (SSMF), obtained numerically and experimentally. In both cases, the filter weights are dominated by a few taps because the IQ crosstalk induced by chromatic dispersion is small. Therefore, the complexity of the AEQ can be reduced by using shorter tap lengths for the real-imaginary filters (i.e., N_2 < N_1), as shown in Fig. 4(b). Comparing Fig. 5(a) with Fig. 5(b), we also find that the dominant filter weights are centered in simulation but shifted to the left in the experiment, owing to the inter-symbol interference (ISI) induced by bandwidth limitation. For a given coherent short-reach transmission system, both the tap length N_2 and the cropping start position should therefore be optimized.
For the filter weight update, the proposed AEQ uses the same radius-directed error for all filters rather than different errors [7], in order to achieve better performance. Taking the x polarization as an example, the N_1-tap real-real (H_x,rr) and imaginary-imaginary (H_x,ii) filters and the N_2-tap real-imaginary (H_x,ri and H_x,ir) filters are updated according to Eq. (1), where µ and k are the step size and the sample index, respectively, and ε_x(k) is the radius-directed error computed with respect to R_k, the radius of the constellation ring nearest to each equalizer output. W_xx,rr, W_xx,ri, W_xx,ir, and W_xx,ii are the coefficients of the W_xx filter in the 1-tap 4×4 MIMO, which are also updated using the same radius-directed error ε_x(k). Because H_x,ri and H_x,ir have shorter tap lengths, X′_in,r and X′_in,i are sections of the input signal vectors X_in,r and X_in,i, respectively. We numerically investigate the SNR penalties of 400-Gb/s D8QAM (L=1) using the proposed AEQ with various cropping ratios (1−N_2/N_1) for different amounts of chromatic dispersion. In this simulation, chromatic dispersion up to 170 ps/nm (corresponding to 10-km SSMF), a 3-dB bandwidth of 32 GHz, a laser linewidth of 2 MHz, and an FO of 1 GHz are considered. For each transmission distance, the converged performance at tap length N_1 (denoted in the legend of Fig. 6) without cropping (i.e., N_2 = N_1) is taken as the reference. As shown in Fig. 6, for chromatic dispersion within 170 ps/nm (10-km SSMF transmission), the degradation resulting from halving the tap lengths of the real-imaginary filters (i.e., a 50% cropping ratio) is less than 1 dB. This gives a complexity reduction of more than 50% compared with an N_1-tap conventional 4×4 MIMO.
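The following is a generic sketch, not the update rule of Eq. (1), of how a single-polarization real-valued AEQ with cropped real-imaginary filters could be driven by a radius-directed error. It assumes ε = R_k² − |y|², correlation-form FIR filters (tap n multiplies sample k+n), H_ri mapping the real input to the imaginary output and H_ir the reverse, and it omits the 1-tap 4×4 MIMO stage; the class and method names are hypothetical.

```python
import math


def fir(taps, x, start):
    """Real-valued FIR output: sum_n taps[n] * x[start + n] (correlation form)."""
    return sum(t * v for t, v in zip(taps, x[start:start + len(taps)]))


class CroppedRealAEQ:
    """Single-polarization sketch: N1-tap rr/ii filters plus cropped N2-tap ri/ir filters.

    Assumed convention: h_ri maps the real input to the imaginary output, h_ir maps the
    imaginary input to the real output. `offset` is the start of the N2-tap window inside
    the N1-tap window, i.e. the cropping start position to be optimized.
    """

    def __init__(self, n1, n2, offset):
        assert 0 < n2 <= n1 and 0 <= offset <= n1 - n2
        self.offset = offset
        self.h_rr = [0.0] * n1
        self.h_ii = [0.0] * n1
        self.h_rr[n1 // 2] = self.h_ii[n1 // 2] = 1.0   # center-spike initialization
        self.h_ri = [0.0] * n2
        self.h_ir = [0.0] * n2

    def output(self, xr, xi, k):
        """Equalized (real, imag) output for the input window starting at sample k."""
        yr = fir(self.h_rr, xr, k) + fir(self.h_ir, xi, k + self.offset)
        yi = fir(self.h_ii, xi, k) + fir(self.h_ri, xr, k + self.offset)
        return yr, yi

    def update(self, xr, xi, k, radii, mu):
        """One radius-directed LMS step (gradient descent on eps**2); returns eps."""
        yr, yi = self.output(xr, xi, k)
        mag = math.hypot(yr, yi)
        r_k = min(radii, key=lambda r: abs(r - mag))   # nearest constellation ring
        eps = r_k ** 2 - mag ** 2
        o = k + self.offset
        for n in range(len(self.h_rr)):
            self.h_rr[n] += mu * eps * yr * xr[k + n]
            self.h_ii[n] += mu * eps * yi * xi[k + n]
        for n in range(len(self.h_ri)):
            self.h_ir[n] += mu * eps * yr * xi[o + n]
            self.h_ri[n] += mu * eps * yi * xr[o + n]
        return eps

    def real_multipliers(self):
        """Real multipliers per output sample in the filtering path (updates excluded)."""
        return 2 * len(self.h_rr) + 2 * len(self.h_ri)
```

For example, `CroppedRealAEQ(21, 11, 5)` mirrors the 2-km tap lengths and needs 2N_1 + 2N_2 = 64 real multipliers per output sample, compared with 4N_1 = 84 without cropping; fed with a symbol whose imaginary part contains real-part crosstalk, one update already pushes the center tap of `h_ri` negative to cancel it.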

Experimental setup
The experimental setup of the proposed scheme is shown in Fig. 7. In the transmitter-side DSP, the 68-GBaud differentially pre-coded 8QAM signal was first pulse-shaped by root-raised-cosine (RRC) filters with a roll-off factor of 0.1 and then resampled to 88 GSa/s. After pre-compensation of the transmitter-side skew, independent signals for the X and Y polarizations were loaded into the memory of an arbitrary waveform generator (AWG, Keysight M8196A) with a 3-dB bandwidth of 32 GHz. After SSMF transmission, a variable optical attenuator (VOA) was employed to adjust the received optical power, 3% of which was monitored by a power meter. Due to the limited number of available DFB lasers, a DFB laser (central wavelength 1550 nm, linewidth ∼1.3 MHz, output power 15 dBm) was used only in the transmitter, while a 100-kHz external cavity laser (ECL) was used in the receiver. The four outputs of a commercial integrated coherent receiver with 40-GHz analog bandwidth were directly connected to a 256-GSa/s oscilloscope (Keysight UXR0594AP). The offline DSP includes resampling to 2 samples/symbol, the Gram-Schmidt orthogonalization procedure (GSOP), RRC filtering, and clock recovery. In the proposed AEQ, optimized tap lengths of N_1 = 21 and N_2 = 11 are used for the H_x/y,rr/ii and H_x/y,ri/ir filters, respectively, for 2-km transmission. For 10-km transmission, N_1 and N_2 are increased to 27 and 19, respectively. Adaptive differential decoding with one decision feedback was then applied to recover the signals before symbol de-mapping and bit error counting.
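Several setup parameters can be cross-checked with simple arithmetic. The sketch below (helper names are ours) computes the raw dual-polarization line rate, without any FEC or framing overhead since none is specified here, the one-sided bandwidth occupied by the RRC-shaped signal, and the oversampling ratios at the AWG and the oscilloscope.

```python
def line_rate(baud, bits_per_symbol, n_pol=2):
    """Raw dual-polarization line rate in b/s (no FEC or framing overhead)."""
    return baud * bits_per_symbol * n_pol


def rrc_bandwidth(baud, roll_off):
    """One-sided baseband bandwidth occupied by an RRC-shaped signal."""
    return (1 + roll_off) * baud / 2


baud = 68e9
print(line_rate(baud, 3) / 1e9, "Gb/s")       # D8QAM at 68 GBaud
print(line_rate(50e9, 4) / 1e9, "Gb/s")       # 16QAM reference at 50 GBaud
print(rrc_bandwidth(baud, 0.1) / 1e9, "GHz")  # compare with the 32-GHz AWG 3-dB bandwidth
print(88e9 / baud)                            # AWG samples per symbol
print(256e9 / baud)                           # oscilloscope samples per symbol before resampling
```

For 68-GBaud D8QAM this gives a raw rate of 408 Gb/s and an occupied bandwidth of about 37.4 GHz, which exceeds the 32-GHz 3-dB bandwidth of the AWG.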

Experimental results
Since 16QAM has been standardized in 400ZR, we choose 16QAM as the reference for performance comparison. D16QAM, with its dense constellation points [see Fig. 1(c)], is not experimentally verified in this paper, because digital-to-analog and analog-to-digital converters with high enough resolution to ensure an acceptable implementation penalty are not available. We first investigate the best-case (i.e., back-to-back) performance of 400-Gb/s 16QAM and D8QAM using 100-kHz-linewidth ECLs and a 23-tap conventional 4×4 MIMO. For 400-Gb/s 16QAM, fast-Fourier-transform-based FO estimation and VV CPE (BZ=80) were used for carrier recovery, while adaptive differential decoding without/with one decision feedback was employed for 400-Gb/s D8QAM. As shown in Fig. 8(a), 1-decision-aided differential decoding improves the receiver sensitivity (at a BER of 10⁻³) of 400-Gb/s D8QAM from −17.8 dBm to −21.2 dBm. This 3.4-dB improvement is ∼2 dB larger than the numerical result in Fig. 2. The simulation results in Fig. 2 show that the theoretical SNR at a BER of 10⁻³ for D8QAM (L=1) is 1.4 dB lower than that of 16QAM. This indicates that, in shot-noise-limited ideal coherent systems, the receiver sensitivity of D8QAM (L=1) is 1.4 dB better than that of 16QAM at the same symbol rate. However, at the same bit rate, the higher symbol rate of D8QAM reduces the theoretical receiver sensitivity advantage from 1.4 dB to ∼0.15 dB (1.4 − 10·log10(4/3)). This is validated by Fig. 8(b), where the receiver sensitivity of D8QAM (L=1) is ∼0.1 dB better than that of 16QAM at 224 Gb/s. The 1-dB receiver sensitivity penalty of D8QAM (L=1) relative to 16QAM at 400 Gb/s in Fig. 8(a) is attributed to the larger ISI at the higher symbol rate of 68 GBaud. Nevertheless, we believe that this 1-dB penalty is negligible for short-reach transmission, especially considering the advantages of D8QAM (e.g., carrier-recovery-free operation and large linewidth tolerance).
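The ∼0.15-dB figure follows from a short derivation, assuming shot-noise-limited detection in which the electrical SNR scales with the received power P_rx divided by the symbol rate R_s (the noise bandwidth), a dual-polarization bit rate R_b, and ΔSNR_req = 1.4 dB as the simulated required-SNR advantage of D8QAM (L=1) over 16QAM:

```latex
\begin{aligned}
\mathrm{SNR} &\propto \frac{P_\mathrm{rx}}{R_s}, \qquad
\frac{R_{s,\mathrm{D8QAM}}}{R_{s,\mathrm{16QAM}}}
  = \frac{R_b/(2\cdot 3)}{R_b/(2\cdot 4)} = \frac{4}{3},\\
\Delta S &= \Delta\mathrm{SNR}_\mathrm{req}
  - 10\log_{10}\frac{R_{s,\mathrm{D8QAM}}}{R_{s,\mathrm{16QAM}}}
  = 1.4 - 10\log_{10}\frac{4}{3} \approx 1.4 - 1.25 = 0.15~\mathrm{dB}.
\end{aligned}
```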
Figure 9 compares the performance of 400-Gb/s D8QAM (L=1) and 16QAM over 0 km and 2 km of SSMF using a 23-tap conventional 4×4 MIMO with different transmitter-side lasers. As shown in Fig. 9(a), for 400-Gb/s 16QAM, the transmission penalty with a transmitter-side ECL is negligible. However, replacing the ECL with the DFB laser in the transmitter degrades the receiver sensitivity of 16QAM by ∼0.7 dB, giving the same performance as 400-Gb/s D8QAM. By contrast, as shown in Fig. 9(b), D8QAM maintains the same performance regardless of the transmission distance and the laser type, indicating a negligible transmission penalty and a larger linewidth tolerance than 16QAM.
Finally, we investigate the performance of 400-Gb/s D8QAM (L=1) over 2-km and 10-km SSMF using various AEQs. Note that the complex-valued N-1 AEQ in this paper uses the same error to update the coefficients of the N-tap FIR filter and the 1-tap 2×2 MIMO (as in Ref. [6]), which gives better performance than using different errors [7]. As shown in Fig. 10(a), the proposed AEQ (N_1=21, N_2=11) achieves approximately the same receiver sensitivity as the N-1 AEQ (N=21), indicating that halving the tap lengths of the real-imaginary filters has a negligible impact. Assuming that one complex multiplication requires four real multipliers (RMs), the proposed AEQ (N_1=21, N_2=11) uses ∼22% and 40% fewer RMs than the N-1 AEQ (N=21) and a 15-tap 4×4 MIMO, respectively (see Table 1), at comparable 2-km transmission performance. Compared with the JC AEQ (N=25) and the 1-N AEQ (N=21), the proposed AEQ not only performs 0.7 dB better but also reduces the complexity by about 12% and 22%, respectively.
To keep the transmission penalty within 1 dB for the longer 10-km link, the tap lengths of the FIR filters in all AEQs are increased. During the experiment, we found that the JC AEQ fails to work for 68-GBaud D8QAM 10-km transmission when the received optical power is lower than −16 dBm, due to uncompensated IQ crosstalk; its performance is therefore not shown in Fig. 10(b). As shown in Fig. 10(b), the proposed AEQ (N_1=27, N_2=19) achieves receiver sensitivity comparable to the 1-N AEQ (N=27), the N-1 AEQ (N=27), and a 25-tap 4×4 MIMO. It is also the most hardware-efficient of all the AEQs, requiring 14%, 14%, and 50% fewer RMs than the 1-N AEQ, the N-1 AEQ, and the 4×4 MIMO, respectively. The complexity advantage of the proposed AEQ over the 1-N AEQ and the N-1 AEQ decreases with fiber length because the accumulated chromatic dispersion requires longer real-imaginary filters. Our AEQ, which uses real-imaginary filters for IQ crosstalk compensation, enables longer reach, better performance, and lower complexity than the JC AEQ.
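A minimal sketch of the RM counting behind these comparisons, assuming four RMs per complex multiplication and counting only the multipliers in the equalization path (not the coefficient updates). The filter counts follow the AEQ structures described in this paper; the function names are ours. Under this counting, the ratios reproduce the percentages quoted in the text, including the ∼59% saving of the JC AEQ over a 25-tap 4×4 MIMO.

```python
def rm_conventional_4x4(n):
    """16 real-valued N-tap filters."""
    return 16 * n


def rm_1n_or_n1(n):
    """Two complex N-tap FIRs plus a 1-tap complex 2x2 MIMO (4 RMs per complex multiply)."""
    return 4 * (2 * n) + 4 * 4


def rm_jc(n):
    """1-tap complex 2x2 MIMO + four real N-tap FIRs + 3-tap real 4x4 MIMO."""
    return 4 * 4 + 4 * n + 16 * 3


def rm_proposed(n1, n2):
    """Per polarization: rr and ii (N1 taps) + ri and ir (N2 taps); plus a 1-tap real 4x4 MIMO."""
    return 2 * (2 * n1 + 2 * n2) + 16


def saving(new, ref):
    """Percentage of RMs saved by `new` relative to `ref`, rounded to an integer."""
    return round(100 * (1 - new / ref))


ours_2km, ours_10km = rm_proposed(21, 11), rm_proposed(27, 19)
print(ours_2km, ours_10km)                                    # 144 200
print(saving(ours_2km, rm_1n_or_n1(21)),                     # vs N-1 / 1-N AEQ (N=21): 22
      saving(ours_2km, rm_conventional_4x4(15)),             # vs 15-tap 4x4 MIMO: 40
      saving(ours_2km, rm_jc(25)))                           # vs JC AEQ (N=25): 12
print(saving(ours_10km, rm_1n_or_n1(27)),                    # vs 1-N / N-1 AEQ (N=27): 14
      saving(ours_10km, rm_conventional_4x4(25)))            # vs 25-tap 4x4 MIMO: 50
print(saving(rm_jc(25), rm_conventional_4x4(25)))            # JC AEQ vs 4x4 MIMO: 59
```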

Conclusion
We have proposed a 400-Gb/s coherent-lite scheme for short-reach transmission, enabled by D8QAM with 1-decision-aided adaptive differential decoding and a hardware-efficient AEQ. D8QAM (L=1) bypasses carrier recovery and mitigates the impact of cycle slips while achieving high FO tolerance and a large linewidth tolerance of 1.5×10⁻³. We experimentally demonstrate that the receiver sensitivity of 400-Gb/s D8QAM (L=1) over 2-km SSMF is insensitive to the laser type and is the same as that of conventional 16QAM when a DFB laser is used. The proposed AEQ achieves performance comparable to conventional 4×4 MIMO with 40% and 50% lower complexity for 2-km and 10-km transmission, respectively. Despite shorter tap lengths for the real-imaginary filters, our AEQ outperforms the existing simplified AEQs in terms of either performance (by 0.7 dB for 2-km SSMF) or computational complexity (22% and 14% fewer RMs for 2-km and 10-km transmission, respectively, at comparable performance).
Disclosures. The authors declare no conflicts of interest.