Computationally efficient 104 Gb/s PWL-Volterra equalized 2D-TCM-PAM8 in dispersion unmanaged DML-DD system

Two-dimensional eight-level pulse amplitude modulation with trellis-coded modulation (2D-TCM-PAM8) is proposed to overcome the bandwidth limitation for high-speed signal transmission due to its high spectral efficiency. However, the high coding gain of the TCM can only be achieved in bandlimited additive white Gaussian noise (AWGN) channels and cannot be achieved in nonlinear channels without any equalizers. In the directly modulated laser and direct detection (DML-DD) transmission system, the transceiver nonlinearities and the interaction between DML chirp and fiber dispersion introduce nonlinear distortion. To compensate for the nonlinear distortion, we propose a computationally efficient piecewise linear (PWL)-Volterra equalizer. In this equalizer, we first use the PWL equalizer to correct the skewed eye diagram and then employ a simple 2nd-order Volterra equalizer to compensate for the residual nonlinear distortions. By using the PWL-Volterra equalizer prior to the Viterbi decoder, the high coding gain of TCM can be achieved. In the experiment, a 104 Gb/s 8-state 2D-TCM-PAM8 signal generated in a ∼20 GHz DML is successfully transmitted over 10 km standard single-mode fiber (SSMF) in the C band, with the bit error ratio (BER) below the HD-FEC limit of 3.8 × 10⁻³. Compared with a conventional 2nd-order Volterra equalizer of similar BER performance, the PWL-Volterra equalizer reduces the computational complexity by 29%. © 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement


Introduction
5G is a driving force behind artificial intelligence, the Internet of Things (IoT) ecosystem, cloud services, data centers, and edge computing, and it places increasing pressure on the limited bandwidth of existing short-reach optical communication systems. For data center interconnect (DCI) applications, intensity modulation and direct detection (IM/DD) systems have attracted considerable research interest due to their low cost, small footprint, and simple configuration [1,2]. Several advanced modulation formats have been proposed for the IM/DD system, such as pulse amplitude modulation with four amplitude levels (PAM4), discrete multi-tone modulation (DMT), and carrier-less amplitude-phase modulation (CAP) [3][4][5]. PAM4 has the advantage of only requiring 2-bit digital-to-analog converters (DACs), thus simplifying the transmitter. In IM/DD systems, a directly modulated laser (DML) based transmitter is more desirable than an externally modulated laser (EML) or Mach-Zehnder modulator (MZM) based transmitter due to its lower cost and lower power consumption [6]. To overcome the bandwidth limitation of a DML-DD system, many techniques have been proposed, such as PAM4-duobinary [7,8] and PAM8 [9][10][11]. A 100 Gb/s PAM4-duobinary signal was transmitted over 15 km standard single-mode fiber (SSMF) using one 16.8 GHz DML in the C band with pre-coding, a Volterra equalizer, and 2-memory (16-state) maximum likelihood sequence estimation (MLSE) [7]. A 112 Gb/s PAM4-duobinary signal was demonstrated over 1 km SSMF using an 18 GHz DML with digital pre-compensation and a 7-level training-sequence-aided least mean square (TS-LMS) algorithm [8]. A 101.25 Gb/s PAM8 signal was transmitted over 10 km SSMF with optical filtering and a joint nonlinear equalization algorithm based on a 19-tap cascaded multi-modulus algorithm (CMMA) and a 389-tap Volterra filter (VF) [9]. 4 × 96 Gb/s PAM8 transmission over 15 km SSMF in the O band was demonstrated using a 3rd-order Volterra equalizer [10]. The above-mentioned schemes require either an optical filter or complex digital signal processing (DSP) algorithms.
Alternatively, multi-dimensional trellis-coded modulation (TCM) with higher-order PAM has been proposed to achieve high spectral efficiency and high coding gain in IM/DD systems [11][12][13]. In [11], a spectrally efficient 31.25 Gb/s two-dimensional eight-level pulse amplitude modulation with trellis-coded modulation (2D-TCM-PAM8) signal was demonstrated with a 0.5 dB sensitivity improvement over standard PAM4 at the same bit rate. It is therefore desirable to apply 2D-TCM-PAM8 in high-bit-rate DML-DD systems. However, when a 2D-TCM-PAM8 signal with a bit rate beyond 100 Gb/s is transmitted through an SSMF in a C-band DML-DD system, the signal quality degrades quickly after transmission. The main reason is that TCM was introduced for bandwidth-limited additive white Gaussian noise (AWGN) channels [14], whereas the channel after transmission is not an AWGN channel because of linear distortions and various nonlinear distortions. The linear distortions come from the bandwidth limitation of the experimental components. The nonlinear impairments mainly come from two sources: 1) high-frequency electronic devices (electrical amplifiers, etc.) and optical components (modulators and detectors); 2) the skewed eye diagram induced by the interaction between the DML chirp and fiber dispersion.
In our previous work [15,16], we proposed and demonstrated that a piecewise linear (PWL) equalizer can effectively correct the skewed eye diagram in DML-DD transmissions of 56 Gb/s PAM4 over 40 km and 84 Gb/s PAM4 over 20 km. However, the PWL equalizer is unable to remove nonlinear distortions other than amplitude-dependent distortions. On the other hand, a 2nd-order Volterra equalizer can compensate most of the nonlinear distortions, but the number of multipliers required by its 2nd-order term grows quadratically with the tap length. The aim of this work is to compensate for the linear and nonlinear distortions with reduced complexity. Our proposal is a PWL-Volterra equalizer. In this equalizer, we first use the PWL equalizer to alleviate the bandwidth limitation and correct the eye skew, and then employ a Volterra equalizer with a small number of 2nd-order taps to remove the residual nonlinear distortions, such as the nonlinearities of electrical/optical components. We apply the PWL-Volterra equalizer prior to the Viterbi decoder, thereby ensuring an approximately AWGN channel so that the TCM keeps its coding gain. The PWL-Volterra equalizer enables 104 Gb/s 8-state 2D-TCM-PAM8 transmission over 10 km SSMF in the C band with the bit error ratio (BER) below 3.8 × 10⁻³. The implemented PWL-Volterra equalizer has a 29% complexity reduction compared to the conventional 2nd-order Volterra equalizer with the same BER performance.

Principle of PWL-Volterra equalized 2D-TCM-PAM8
The transmitter DSP flow of 2D-TCM-PAM8 is shown in Fig. 1(a). The convolutional encoder and constellation mapper are used to generate the 2D-TCM-PAM8 signal. In the encoder, 2 bits (b4-b5) of the 5 input bits go through an 8-state convolutional encoder with a rate of 2/3, and the other 3 bits (x1-x3) are not encoded and are called uncoded bits. After this process, 6 bits (x1-x6), consisting of 3 uncoded bits (x1-x3) and 3 coded bits (x4-x6), are obtained. Next, the constellation mapper maps them to two consecutive PAM8 symbols in the time domain, Z1 and Z2. The essential concept in the constellation mapper is set partitioning. The purpose of set partitioning is to distribute the 6 bits (x1-x6) mentioned above over 64 constellation points so that transitions occur only at the largest squared Euclidean distance (SED). This process yields 8 subsets, S1-S8, with 8 constellation points in each subset. The constellation points in the same subset have the largest SED, and the transition and SED between them are called the parallel transition and parallel distance, respectively. In contrast, the SED between constellation points of different subsets is named the sequence distance. The smaller of the parallel distance and the sequence distance is called the free distance, which determines the coding gain. Besides, since two symbols are generated from 5 information bits, the spectral efficiency of 2D-TCM-PAM8 is 5/2 = 2.5 bits per symbol. For the PWL-Volterra equalized 2D-TCM-PAM8, the receiver-side DSP includes a PWL-Volterra equalizer (a PWL equalizer and a 2nd-order Volterra equalizer with a small number of taps) and an 8-state Viterbi decoder, as shown in Fig. 1(c). The implementation of the PWL equalizer is described in Fig. 1(d). It consists of three steps: amplitude threshold decomposition, linear multichannel equalization, and linear addition. Here, the sequence X = [−1, 5.3, −5, 6.7, −6.9, −3] is used as an example. More implementation details of the PWL equalizer can be found in [16].
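The three PWL steps can be sketched in Python as follows. This is a minimal illustration and not the implementation from [16]: the clipping-based decomposition, the function names `threshold_decompose` and `pwl_equalize`, and the placeholder tap values are all assumptions, and in practice the taps of each FFE would be trained (for example by LMS).

```python
import numpy as np

def threshold_decompose(x, thresholds):
    """Step 1: split x into three amplitude bands that sum back to x.

    Assumed scheme: continuous threshold decomposition by clipping.
    """
    lo, hi = sorted(thresholds)
    x = np.asarray(x, dtype=float)
    low = np.minimum(x, lo)           # part of each sample up to the lower threshold
    mid = np.clip(x, lo, hi) - lo     # excursion between the two thresholds
    high = np.maximum(x, hi) - hi     # excursion above the upper threshold
    return np.stack([low, mid, high])

def pwl_equalize(x, thresholds, taps):
    """Steps 2 and 3: one FFE per band, then add the three outputs."""
    bands = threshold_decompose(x, thresholds)
    return sum(np.convolve(b, h, mode="same") for b, h in zip(bands, taps))

X = [-1, 5.3, -5, 6.7, -6.9, -3]          # example sequence from the text
bands = threshold_decompose(X, thresholds=(1, -3))

# Placeholder taps (hypothetical): every branch is a pass-through filter.
taps = np.array([[0.0, 1.0, 0.0]] * 3)
Y = pwl_equalize(X, (1, -3), taps)
```

When all three branches share the same taps, the structure collapses to a single linear FFE because convolution is linear; the piecewise behavior comes only from letting the three tap sets differ.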
As mentioned before, the PWL-Volterra equalizer consists of a PWL equalizer and a 2nd-order Volterra equalizer. The 2nd-order Volterra equalizer can be implemented in the following form [17]:

$$ y(k) = \sum_{l_1=0}^{L_1-1} h_1(l_1)\, x(k-l_1) + \sum_{l_1=0}^{L_2-1} \sum_{l_2=0}^{l_1} h_2(l_1, l_2)\, x(k-l_1)\, x(k-l_2) $$

where y(k) is the equalizer output, x(k − l_m) is the (k − l_m)-th sample of the received signal sequence x, and h1(l1) and h2(l1, l2) are the 1st- and 2nd-order Volterra kernels, whose lengths define the number of involved samples. The computational complexities of both the PWL and the 2nd-order Volterra equalizers are shown in Table 1. Here, the number of multipliers indicates the computational complexity. For the PWL equalizer, a pre-equalizer (e.g., an FFE) and three parallel FFEs are used. We use L0 to represent the memory length of the pre-equalizer. This pre-equalizer is used to distribute the received signal around [−7, −5, −3, −1, 1, 3, 5, 7] and to partly compensate the linear channel distortions. Its coefficients are updated according to the LMS algorithm and calculated independently of the PWL coefficients. The three parallel FFEs have the same memory length in the experiment, which is represented by L_each. As for the 2nd-order Volterra equalizer, L1 and L2 indicate the tap lengths of the 1st- and 2nd-order terms, respectively.

[Table 1. Computational complexity]
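A 2nd-order Volterra equalizer of this kind can be sketched in Python as below. The sketch assumes the common causal form with a triangular second-order kernel (only terms with l2 ≤ l1 are kept); the function name `volterra2` and the kernel values in the example are hypothetical, and kernel training is not shown.

```python
import numpy as np

def volterra2(x, h1, h2):
    """Causal 2nd-order Volterra filter (assumed triangular kernel).

    y(k) = sum_{l1} h1[l1] * x(k-l1)
         + sum_{l1} sum_{l2<=l1} h2[l1, l2] * x(k-l1) * x(k-l2)
    """
    x = np.asarray(x, dtype=float)
    h1 = np.asarray(h1, dtype=float)
    h2 = np.tril(np.asarray(h2, dtype=float))   # keep only l2 <= l1
    L1, L2 = len(h1), h2.shape[0]
    pad = max(L1, L2) - 1
    xp = np.concatenate([np.zeros(pad), x])     # zeros stand in for x(k < 0)
    y = np.empty(len(x))
    for k in range(len(x)):
        end = k + pad + 1
        w1 = xp[end - L1:end][::-1]             # x(k), x(k-1), ..., x(k-L1+1)
        w2 = xp[end - L2:end][::-1]             # x(k), x(k-1), ..., x(k-L2+1)
        y[k] = h1 @ w1 + w2 @ h2 @ w2
    return y

# Toy kernels, chosen only to exercise the code
x = np.array([1.0, 2.0, -1.0])
y = volterra2(x, h1=[1.0, 0.5], h2=[[0.1, 0.0], [0.2, 0.0]])
```

With a triangular kernel of length L2 there are L2(L2 + 1)/2 second-order terms, which is where the quadratic growth in multipliers comes from.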

Experimental setup
The experimental setup of 104 Gb/s 2D-TCM-PAM8 in the DML-DD system is shown in Fig. 2, which consists of a transmitter, a transmission link, and a receiver. The convolutional encoder, constellation mapping, and Viterbi decoder are procedures specific to 2D-TCM-PAM8. At the transmitter, the 104 Gb/s (41.6 GBd) 2D-TCM-PAM8 signal is formed from an imported data sequence by the convolutional encoder and constellation mapping. The signal is then pulse shaped by a root-raised-cosine filter with a roll-off factor of 0.01. The output signal of the arbitrary waveform generator (AWG, 64 GSa/s), with a peak-to-peak voltage (Vpp) of 0.95 V, passes through a 6 dB radio frequency (RF) attenuator, an RF amplifier with 26 dB gain (SHF806A), and another 6 dB RF attenuator before being launched into the DML. The first 6 dB RF attenuator keeps the RF amplifier in its linear region, while the second one reduces the amplitude after the RF amplifier. The DML's center wavelength, 3-dB bandwidth, bias current, input Vpp, and output power are 1550.9 nm, ∼20 GHz, ∼115 mA, 4.75 V, and 7.5 dBm, respectively. Here, the high bias current of the DML makes adiabatic chirp, rather than transient chirp, dominant [18,19]. Note that all equalizers in this work operate at 2 samples/symbol. The symbol error rate (SER) is used to optimize the threshold sets of the PWL equalizer for 2D-TCM-PAM8. After the PWL equalizer, an 8-state Viterbi decoder is used to decode the trellis of 2D-TCM-PAM8 with low memory. Finally, a total of 32768 × 2 2D-TCM-PAM8 symbols (32768 × 5 original input bits) are used to evaluate the BER performance.

Experimental results
In this section, we first optimize the threshold set of the PWL equalizer for 2D-TCM-PAM8 and then measure the SER and BER performances of the PWL-Volterra equalized 2D-TCM-PAM8. We also compare the BER and computational complexity of the PWL-Volterra equalizer with those of the conventional Volterra equalizer. Finally, the SER and BER performances of 2D-TCM-PAM8 signals at different bit rates over 10 km transmission are analyzed.

Threshold setting optimization of PWL equalizer for 2D-TCM-PAM8
For the PWL-Volterra equalized 2D-TCM-PAM8, we first optimize the threshold set of the PWL equalizer to achieve its best performance. As mentioned above, the threshold set τ = {λ1, λ2} is chosen in our experiment and three FFEs in parallel are used. Before the three parallel FFEs, we use a pre-equalizer (an 11-tap FFE, L0 = 11) to distribute the signal around [−7, −5, −3, −1, 1, 3, 5, 7] to facilitate accurate segmentation. Since most of the pre-equalized samples are within the range [−10, 10], we vary λ1 and λ2 from −10 to 10 to find the optimum threshold set for the 104 Gb/s signal after 10 km transmission at a fixed received optical power (ROP) of 5.6 dBm. The 3D colormap surface of the SER versus the thresholds λ1 and λ2 of the PWL equalizer is shown in Fig. 3(a). The SER surface is symmetric about the diagonal running from the lower left to the upper right, because the order of λ1 and λ2 does not affect the equalizer performance. The worst SERs occur at the four corners of the surface. In these cases, the PWL equalizer works as a conventional FFE. A skewed and closed eye diagram with the threshold set {−10, 10} is shown in Fig. 3(i). Thanks to the PWL equalizer, the skewed eye diagram can be well corrected by using the threshold set {1, −3}, which corresponds to an SER of 0.1 [see Fig. 3(ii)]. It is the optimum threshold set we have found. Therefore, the threshold set {1, −3} is used in the following analysis.
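A threshold sweep of this kind can be organized as in the following Python sketch. It is only an outline under stated assumptions: `evaluate_ser` is a hypothetical user-supplied callback that would train the PWL equalizer for a given threshold pair and return the measured SER, and the `toy_cost` function at the end is a stand-in with no relation to the measured data. Because the order of λ1 and λ2 does not matter, only unordered pairs are visited.

```python
import itertools
import numpy as np

PAM8_LEVELS = np.arange(-7, 8, 2)   # [-7, -5, -3, -1, 1, 3, 5, 7]

def pam8_decide(y):
    """Hard decision: map each sample to the nearest PAM8 level."""
    y = np.asarray(y, dtype=float)
    return PAM8_LEVELS[np.abs(y[:, None] - PAM8_LEVELS).argmin(axis=1)]

def symbol_error_rate(y, ref):
    """Fraction of hard decisions that differ from the reference symbols."""
    return float(np.mean(pam8_decide(y) != np.asarray(ref)))

def search_thresholds(evaluate_ser, grid):
    """Return (best_ser, (lam1, lam2)) over unordered threshold pairs."""
    pairs = itertools.combinations_with_replacement(grid, 2)
    return min((evaluate_ser(a, b), (a, b)) for a, b in pairs)

# Toy stand-in for the real train-and-measure step (not experimental data)
def toy_cost(lam1, lam2):
    return abs(lam1 + 3) + abs(lam2 - 1)

best_cost, best_pair = search_thresholds(toy_cost, range(-10, 11))
```

In a real sweep, `symbol_error_rate` would be called inside `evaluate_ser` on the equalized symbols and the known transmitted symbols.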

SER and BER performances of PWL-Volterra equalized 2D-TCM-PAM8
As shown in Fig. 4(a), we compared the SER performances of 104 Gb/s 2D-TCM-PAM8 over 10 km transmission without an equalizer, with an FFE (164 taps), with the PWL equalizer (11-tap pre-equalizer and 51 taps for each segment), with a Volterra equalizer (81 first-order taps and 7 second-order taps), and with the PWL-Volterra equalizer. The corresponding BERs after the 8-state Viterbi decoder are shown in Fig. 4(b). Here, the FFE has the same total number of taps as the PWL equalizer for a fair comparison. We can observe poor SER performance of 2D-TCM-PAM8 without any equalizer, as shown in Fig. 4(a). Moreover, the poor SER cannot be improved much even if an FFE is used, since linear equalizers cannot compensate nonlinear distortions. On the other hand, the PWL equalizer can correct the eye skew effectively, and the 2nd-order Volterra equalizer can compensate most of the nonlinear distortions. However, the PWL equalizer is unable to mitigate nonlinear distortions other than eye skew, and the number of multipliers required by the 2nd-order term in Volterra equalizers grows quadratically. Therefore, we propose to first use the PWL equalizer to correct the eye skew and then employ a Volterra equalizer with a small number of 2nd-order taps to compensate for the remaining nonlinear distortions. By using this scheme, the SER performance is improved. After the Viterbi decoder, the BER with the PWL-Volterra equalizer is significantly better than that without equalizers and that with the other equalizers in Fig. 4(b). The BERs without equalizers and with FFEs are poor because TCM relies on an AWGN channel, and the nonlinear distortion means that the channel is no longer AWGN. Only when the nonlinear distortions are compensated effectively can TCM show its high coding gain. The PWL-Volterra equalizer can effectively compensate for the nonlinear distortions and enables 104 Gb/s transmission over 10 km with the BER below the HD-FEC limit (3.8 × 10⁻³). If only PWL(51) or only Volterra(81, 7) is used, the BER cannot be brought below the HD-FEC limit. If the number of taps of the Volterra equalizer is increased, the HD-FEC limit can be reached, yet with rather high complexity, as discussed in the next section. Therefore, we proposed and used the PWL-Volterra scheme in this experiment. As shown in Fig. 4(c), the PWL-Volterra equalizer is effective for both the BtB and 10 km cases. For the BtB case, a BER below the HD-FEC limit can be achieved when the received optical power (ROP) is higher than 2.6 dBm. After the 10 km transmission, the SER and BER of the 104 Gb/s signal are 6.2 × 10⁻² and 3.4 × 10⁻³ at the ROP of 5.6 dBm, respectively.

Performance and computational complexity comparison of PWL-Volterra and Volterra equalizer
In the previous section, we demonstrated that the PWL-Volterra scheme brings the BER below the HD-FEC limit. Here, we compare the PWL-Volterra scheme with the conventional Volterra equalizer [17,20] to evaluate the reduction in computational complexity. Figure 5(a) shows the BER performance of 104 Gb/s 2D-TCM-PAM8 over 10 km transmission using different equalizers: Volterra(81, 11), Volterra(81, 11, 5), Volterra(81, 11, 11), Volterra(181, 11, 11), and PWL(51)-Volterra(81, 7). We observed that Volterra(181, 11, 11) achieves a BER similar to that of the PWL-Volterra scheme, but it requires up to 11 3rd-order taps, which brings considerable complexity. Besides, after comparing Volterra(81, 11) and Volterra(181, 11), we prefer the one with 181 1st-order taps due to its lower BER. Therefore, we focus on the 2nd-order Volterra equalizer only, and we include only 2nd-order Volterra equalizers with 181 1st-order taps in the BER comparisons in Fig. 5(b). Figure 5( . The complexity bar chart is shown in Fig. 5(b), where the complexity is calculated according to Table 1. Since the PWL equalizer with a 51-tap linear equalizer for each segment and the 2nd-order Volterra(81, 7) are used in the PWL-Volterra scheme, it requires only 301 multipliers, while Volterra(181, 15) requires 421 multipliers. Therefore, the PWL-Volterra equalizer achieves a BER performance similar to that of the conventional 2nd-order Volterra equalizer, yet with a 29% reduction in computational complexity.
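The reported totals can be checked with a short calculation. Table 1 is not reproduced in this text, so the per-equalizer counts below are assumptions: one multiplier per linear tap, two per term of a triangular second-order kernel, an 11-tap pre-equalizer with three 51-tap branches for the PWL stage, and a baseline of 181 first-order and 15 second-order taps. Under these assumptions the counts match the stated 301 and 421.

```python
def pwl_multipliers(l0, l_each, branches=3):
    """Assumed count: pre-equalizer taps plus the taps of each parallel FFE."""
    return l0 + branches * l_each

def volterra2_multipliers(l1, l2):
    """Assumed count: L1 linear taps plus 2 multiplies for each of the
    L2*(L2+1)/2 second-order kernel terms."""
    return l1 + l2 * (l2 + 1)

proposed = pwl_multipliers(11, 51) + volterra2_multipliers(81, 7)   # 164 + 137
baseline = volterra2_multipliers(181, 15)                           # 181 + 240
reduction = 1 - proposed / baseline

print(proposed, baseline, f"{reduction:.1%}")   # 301 421 28.5%
```

The 28.5% saving rounds to the 29% figure quoted in the text.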

Performance comparison of 2D-TCM-PAM8 at different bit rates over 10 km transmission
In this subsection, we briefly examine the SER and BER performances of the PWL-Volterra equalizer at different bit rates (i.e., 56 Gb/s, 72 Gb/s, 88 Gb/s, and 104 Gb/s). Figures 6(a) and 6(b) show the SER and BER performances of 2D-TCM-PAM8 at these four rates over 10 km transmission with the previously discussed PWL(51)-Volterra(81, 7). The 56 Gb/s (22.4 GBd) and 72 Gb/s (28.8 GBd) signals perform well. With this equalizer and the 8-state Viterbi decoder, their BER values are below the HD-FEC limit when the ROP is higher than 0.6 dBm. When the bit rate grows to 88 Gb/s (35.2 GBd), the signal lies close to the edge of the system bandwidth, and noticeable SER and BER degradation can be observed. The degradation continues up to 104 Gb/s (41.6 GBd), where, owing to the bandwidth limitation, the BER reaches the HD-FEC limit only at an ROP of 5.6 dBm. Besides, the electrical power spectra of the 2D-TCM-PAM8 signals at different bit rates over 10 km are shown in Fig. 6(c), in which evidence of dominant adiabatic chirp can be found.
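The symbol rates quoted above follow from the 2.5 bits per symbol of 2D-TCM-PAM8, and the same arithmetic gives a rough idea of how much bandwidth each signal occupies. The sketch below assumes ideal root-raised-cosine shaping with the 0.01 roll-off from the experimental setup, for which the one-sided bandwidth is (1 + roll-off) × symbol rate / 2; the function names are illustrative, and the comparison with the ∼20 GHz DML bandwidth ignores the other components in the link.

```python
BITS_PER_SYMBOL = 5 / 2    # 5 information bits per two PAM8 symbols
ROLL_OFF = 0.01            # root-raised-cosine roll-off factor

def symbol_rate_gbd(bit_rate_gbps):
    """Symbol rate in GBd for a given net bit rate in Gb/s."""
    return bit_rate_gbps / BITS_PER_SYMBOL

def occupied_bandwidth_ghz(bit_rate_gbps, roll_off=ROLL_OFF):
    """One-sided bandwidth of an ideally RRC-shaped signal in GHz."""
    return (1 + roll_off) * symbol_rate_gbd(bit_rate_gbps) / 2

for rate in (56, 72, 88, 104):
    print(rate, round(symbol_rate_gbd(rate), 1), round(occupied_bandwidth_ghz(rate), 3))
```

Under these assumptions the 88 Gb/s signal occupies about 17.8 GHz and the 104 Gb/s signal about 21.0 GHz, which is consistent with the degradation setting in as the signal approaches and then exceeds the ∼20 GHz DML bandwidth.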

Conclusion
Two kinds of nonlinearities occur in high-bit-rate 2D-TCM-PAM8 transmission over a 10 km DML-DD system: the transceiver nonlinearities and the interaction between the DML adiabatic chirp and fiber chromatic dispersion. These nonlinearities can degrade the coding performance of the 2D-TCM-PAM8 signal. Here we proposed a computationally efficient PWL-Volterra equalizer to solve this problem. Specifically, the PWL equalizer is used to correct the skewed eye diagram, and the 2nd-order Volterra equalizer is used to compensate for the remaining nonlinear distortions. The PWL-Volterra equalizer enables 104 Gb/s 8-state 2D-TCM-PAM8 transmission over 10 km SSMF with a BER below the HD-FEC limit. Besides, we compared the BER performance and the complexity of the PWL-Volterra equalizer with those of the Volterra equalizer alone. A 29% complexity reduction is observed relative to using only the Volterra equalizer to achieve the same BER performance.