Reduced-state MLSE for an IM/DD system using PAM modulation

The performance of a high-speed intensity-modulation (IM)/direct-detection (DD) transmission system can be limited by the bandwidth of the optical transceivers. One popular way to cope with this limitation is to use maximum likelihood sequence estimation (MLSE) at the receiver. However, a practical problem of MLSE is its high implementation complexity. Even though the channel impulse response can be truncated by a two-tap filter before the MLSE, the implementation remains problematic for multi-level modulation formats. In this paper, we propose and demonstrate a reduced-state MLSE for band-limited IM/DD transmission systems using M-ary pulse amplitude modulation (PAM-M) formats. A conventional Viterbi algorithm searches a reduced-state trellis, which is constructed from a coarse pre-decision of the signal equalized by a feed-forward equalizer. The proposed MLSE thus reduces the implementation complexity significantly. We evaluate the performance of the proposed reduced-state MLSE in 100∼140-Gb/s PAM-4/6/8 transmission systems implemented using a 1.3-µm directly modulated laser. The results show that the proposed MLSE achieves almost the same performance as the conventional MLSE while reducing the implementation complexity, assessed by the number of multiplications and additions, by a factor of 4∼16.


Introduction
The proliferation of bandwidth-intensive applications such as media services, cloud storage, and datacenters is now driving phenomenal growth of data traffic in optical edge networks. Because of the sheer market size and short replacement cycles of such networks, it is of paramount importance to develop optical transceivers in a cost-effective and future-proof manner. Intensity-modulation (IM)/direct-detection (DD) systems can meet the power-consumption and cost-effectiveness requirements of these cost-sensitive applications [1][2][3]. Directly modulated lasers (DMLs) and electro-absorption modulated lasers (EMLs) are widely used for short- and intermediate-reach links because of their small footprint and cost-effectiveness [4][5][6][7]. However, the bandwidth limitation of optical transceivers, caused in many cases by the modulation bandwidth of the DML or the electro-absorption modulator (EAM), makes it challenging to increase the data rate as much as the networks require. In this regard, M-ary pulse amplitude modulation (PAM-M) has attracted a great deal of attention as a practical way to increase the data rate with bandwidth-limited transceivers [8][9][10][11][12][13]. For example, the PAM-4 modulation format has been adopted in the IEEE 400GbE standard. Several 200-Gb/s/λ transmission experiments have also explored PAM-6 and PAM-8 formats for next-generation 800-Gb/s or 1.6-Tb/s Ethernet links [10][11][12].
The major deleterious effect of bandwidth limitation is waveform distortion induced by inter-symbol interference (ISI). One popular approach to dealing with this distortion is electrical equalization at the receiver. A feed-forward equalizer (FFE) is most often used to combat the ISI. However, this linear equalizer not only enhances the noise at frequencies where the channel frequency response is low but is also ineffective at compensating for nonlinear distortions, such as those caused by the nonlinear modulation dynamics of the DML [9]. The decision-feedback equalizer (DFE) can avoid these problems, but it suffers from error bursts caused by decision-error propagation. In addition, the timing constraints imposed by the feedback structure make it challenging to realize high-speed DFEs.
Maximum likelihood sequence estimation (MLSE) can detect the data sequence without noise enhancement or error propagation. The Viterbi algorithm is typically employed to search a state trellis for the most probable sequence [14,15]. However, when MLSE is used in a system with a long delay spread, the state trellis grows considerably, and the implementation complexity increases enormously. A practical way to lower the complexity of MLSE is to place a partial response equalizer before it. This equalizer truncates the system response by shaping it to a desired impulse response (DIR) with a short delay spread [16]. The partial response equalizer can therefore be composed of two filters: a conventional FFE and a simple post-filter that performs the DIR shaping so as to suppress the noise enhancement arising from the FFE [8,17]. MLSE assisted by such channel memory truncation has attracted widespread interest in IM/DD systems [17][18][19][20][21][22]. However, when a PAM-M modulation format is used, channel memory truncation alone may not be sufficient to reduce the complexity of MLSE. The complexity of MLSE is governed by the number of trellis states, $M^L$, where L+1 is the length of the delay spread. Even though a two-tap post-filter reduces L to 1, the number of trellis states still grows rapidly with M, which makes MLSE impractical for multi-level PAM IM/DD systems. Several reduced-state MLSE schemes have been proposed to lower the complexity. For example, in the M-method, only some of the trellis states are labelled 'active', and processing takes place only on these active states [23]. However, this requires an extra sorting process to select the active states. In addition, the complexity of metric computation in the M-method is proportional to the constellation size M and thus increases rapidly in multi-level PAM systems. A reduced-state sequence estimator exploiting set partitioning was proposed in [24]. However, this scheme introduces an embedded per-survivor decision feedback loop, which is generally regarded as an obstacle to high-speed implementation.
In this paper, we propose and demonstrate a reduced-state MLSE for band-limited IM/DD transmission systems using PAM-M formats. The key idea is to make a coarse pre-decision of the signal at the output of the FFE and thereby limit the number of states required in the MLSE. The output of the FFE at the receiver can be nearly ISI-free (even though it suffers from noise enhancement). We can therefore make a coarse pre-decision that narrows the candidates down to a number of most likely symbols, P, smaller than M. These P symbols are then used to construct a reduced-state trellis, so that the number of trellis states in our scheme is reduced to $P^L$. We evaluate the performance of the proposed scheme using 100∼140-Gb/s PAM-4/6/8 signals generated from a 1.31-µm DML. We show that, even when P is only 2, the proposed reduced-state MLSE achieves nearly the same performance as the conventional MLSE. At the same time, for PAM-4/6/8 systems, the proposed MLSE reduces the numbers of multiplications and additions by 75%, 89%, and 93.8%, respectively, and the storage units by 50%, 67%, and 75%, respectively.

Conventional MLSE
The MLSE searches the state trellis for the most probable sequence [16]. If the channel impulse response spans L+1 symbols, the state at time index n can be defined as $\mathbf{p}_n = [x_{n-1}, x_{n-2}, \ldots, x_{n-L}]$, where $x_k$ is the transmitted symbol at time index k. That is, the state at any instant is determined by the L most recent symbols. Since each element $x_{n-k}$ of the state vector $\mathbf{p}_n$ takes one of M values drawn from the PAM-M signal set S, i.e., $x_{n-k} \in S$, the trellis has $M^L$ states. The PAM-M constellation is typically $S = \{\pm A, \pm 3A, \ldots, \pm (M-1)A\}$, where 2A is the spacing between adjacent levels. Since M transitions (or 'branches' in the trellis diagram) leave and enter each state, the trellis has $M^{L+1}$ transitions in total. The cost assigned to each transition is called the branch metric; it quantifies the difference between the corresponding branch sequence and the received signal. For transitions from a state $\mathbf{p}_n$, the branch metric is usually the squared Euclidean distance

$$|y_n - x_n - \mathbf{p}_n \cdot \mathbf{f}|^2, \quad x_n \in S, \qquad (1)$$

where $y_n$ is the received signal, $\cdot$ denotes the inner product, and $\mathbf{f} = [f_1, f_2, \ldots, f_L]$ denotes the post-cursor ISI coefficients of the channel impulse response (the main-cursor coefficient $f_0$ is assumed to be unity). As shown in Fig. 1(a), the conventional MLSE first computes the branch metrics and then employs the Viterbi algorithm to find the path with the minimum accumulated branch metric (i.e., the survivor path) through the trellis. Once the survivor path is determined, the indices of the trellis states representing the most probable data sequence are traced back. Finally, these indices are sent to the decoder block and decoded using the signal set S.
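To make the branch-metric and Viterbi steps concrete, the following Python sketch implements a conventional MLSE over the full $M^L$-state trellis using the branch metric of Eq. (1). It is a minimal illustration, not the authors' implementation: the function names `pam_levels` and `conventional_mlse` are hypothetical, the L symbols preceding the first sample are assumed known (set to the lowest level), and whole survivor paths are kept in memory instead of a finite trace-back of length B.

```python
import itertools


def pam_levels(M, A=1.0):
    """PAM-M levels {±A, ±3A, ..., ±(M-1)A} in ascending order (M even)."""
    return [(2 * i - (M - 1)) * A for i in range(M)]


def conventional_mlse(y, f, levels):
    """Viterbi search over the full M^L-state trellis.

    y: received samples at the MLSE input
    f: post-cursor ISI coefficients [f1, ..., fL] (f0 = 1)
    levels: PAM-M signal set S
    Assumption: the L symbols before y[0] are known and equal to levels[0].
    """
    L, M = len(f), len(levels)
    # A state p_n = (x_{n-1}, ..., x_{n-L}) is stored as a tuple of level indices.
    states = list(itertools.product(range(M), repeat=L))
    inf = float("inf")
    cost = {s: inf for s in states}
    cost[(0,) * L] = 0.0  # known start-up state
    paths = {s: [] for s in states}
    for yn in y:
        new_cost = {s: inf for s in states}
        new_paths = {s: [] for s in states}
        for s in states:
            if cost[s] == inf:
                continue  # state not reachable yet
            isi = sum(fk * levels[i] for fk, i in zip(f, s))  # p_n . f
            for j, xn in enumerate(levels):  # M branches leave each state
                bm = (yn - xn - isi) ** 2  # branch metric, Eq. (1)
                nxt = (j,) + s[:-1]  # shift x_n into the state
                if cost[s] + bm < new_cost[nxt]:  # add-compare-select
                    new_cost[nxt] = cost[s] + bm
                    new_paths[nxt] = paths[s] + [j]
        cost, paths = new_cost, new_paths
    best = min(states, key=cost.__getitem__)  # survivor with minimum metric
    return [levels[i] for i in paths[best]]
```

For PAM-4 with L=1, each step evaluates $4^2 = 16$ branch metrics, matching the trellis of Fig. 1(b).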
A practical problem with the conventional MLSE is its implementation complexity, since the number of trellis states ($M^L$) becomes enormous when the signal-set size M is large and the channel memory L+1 is long. In practice, a partial response equalizer can be used to shorten the impulse response seen by the MLSE so that L is reduced. As shown in Fig. 1(a), the partial response equalizer, implemented as a linear FFE followed by a post-filter, is placed before the MLSE [8,17]. The linear FFE compensates for the ISI at the expense of noise enhancement. The post-filter then reshapes the impulse response of the system so that it approximates the original channel impulse response. In doing so, it suppresses the noise enhanced by the linear FFE and thus maximizes the signal-to-noise ratio (SNR) seen by the MLSE [17,25]. A two-tap linear filter is often adopted as the post-filter, since it introduces ISI of the shortest possible length. Its z-transform can be written as

$$H(z) = 1 + \alpha z^{-1}, \qquad (2)$$

where $0 < \alpha \le 1$. Overall, the combination of the linear FFE and the post-filter yields an impulse response truncated to a memory length of two, i.e., L=1. The number of trellis transitions required in the MLSE thus becomes $M^2$. For example, as shown in the trellis diagram of Fig. 1(b), the number of state transitions is $4^2 = 16$ for PAM-4 signaling, where the node indices denote the trellis states. The post-filter coefficient α can be obtained adaptively using the traditional least mean square (LMS) algorithm with a training sequence [22,25].
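As a minimal sketch of the LMS adaptation of α, assume the FFE has already converged and only α is adapted, so that the post-filter output $y_n = r_{eq}(n) + \alpha r_{eq}(n-1)$ matches the target $x_n + \alpha x_{n-1}$ on a training sequence. The error is then $w_n + \alpha w_{n-1}$, where $w_n = r_{eq}(n) - x_n$ is the FFE output noise, so LMS drives α toward $-E[w_n w_{n-1}]/E[w_n^2]$, a one-tap noise-whitening coefficient. This stand-alone, single-coefficient formulation, the function names, and the synthetic high-pass noise model are assumptions for illustration; in the experiment, α is determined together with the other tap coefficients.

```python
import random


def lms_post_filter_alpha(r_eq, x, mu=1e-3, alpha0=0.0):
    """Adapt alpha of the post-filter H(z) = 1 + alpha z^-1 by LMS (sketch).

    r_eq: FFE output samples (FFE assumed already converged)
    x: known training symbols aligned with r_eq
    The error e_n = y_n - (x_n + alpha x_{n-1}), with y_n = r_n + alpha r_{n-1},
    reduces to w_n + alpha w_{n-1}, where w_n = r_n - x_n.
    """
    alpha = alpha0
    for n in range(1, len(r_eq)):
        w_prev = r_eq[n - 1] - x[n - 1]
        e = (r_eq[n] - x[n]) + alpha * w_prev
        alpha -= mu * e * w_prev  # gradient step on e_n^2 / 2 w.r.t. alpha
    return alpha


def simulate_training(N, levels, sigma=0.3, seed=1):
    """Hypothetical test signal: symbols plus high-pass noise w_n = v_n - 0.5 v_{n-1}."""
    rng = random.Random(seed)
    x = [rng.choice(levels) for _ in range(N)]
    v = [rng.gauss(0.0, sigma) for _ in range(N)]
    w = [v[0]] + [v[n] - 0.5 * v[n - 1] for n in range(1, N)]
    return [xn + wn for xn, wn in zip(x, w)], x
```

For the synthetic noise above, $-E[w_n w_{n-1}]/E[w_n^2] = 0.5/1.25 = 0.4$, and the adapted α settles close to that value. Negatively correlated (high-pass) noise, as left behind by an FFE boosting high frequencies, yields a positive α, consistent with $0 < \alpha \le 1$.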

Proposed reduced-state MLSE
Here we propose a reduced-state MLSE that lowers the implementation complexity considerably.
For this purpose, we employ a threshold detector to make a coarse pre-decision of the signal before applying the MLSE. The purpose of this pre-decision is to reduce the number of states needed to construct the state trellis of the MLSE. Figure 1(c) shows the block diagram of the proposed MLSE. The partial response equalizer is also used in our scheme, so the ISI induced by the band limitation is eliminated by the FFE. The equalized signal at time index n is denoted by $r_{eq}(n)$. The threshold detector makes a coarse pre-decision of $r_{eq}(n)$, which allows us to limit the number of most likely symbols for $r_{eq}(n)$ to P, where P < M. The value of P is determined by the SNR of the equalized signal and the number of modulation levels. In practical applications, it would be necessary to measure the BER performance as a function of P.
P can then be chosen based on the trade-off between the performance and the complexity of the proposed MLSE. Figure 2(a) shows an example amplitude histogram of a 100-Gb/s PAM-4 signal (generated using a band-limited DML) after the FFE, i.e., the histogram of $r_{eq}(n)$. The signal is nearly ISI-free but is corrupted by noise, as shown in the inset. The width of the amplitude distribution around each symbol level is therefore determined mostly by the SNR, which is 13.9 dB in this case. Note that the signal also suffers from nonlinear distortions induced by the modulation dynamics of the DML, as evidenced by the amplitude-dependent time skew in the eye diagram. The figure shows that the amplitude of each symbol is distributed around its symbol level with a small variance, which implies that errors occur mostly between adjacent symbols. We can therefore limit the number of most likely symbols for $r_{eq}(n)$ to 2 in this case and use this information to construct the state trellis. Figure 2(b) shows another example, where the line rate of the PAM-4 signal is increased to 112 Gb/s. In this case, the SNR after equalization drops to 9.3 dB, and the amplitude distribution of each symbol is broader than in Fig. 2(a). Thus, $r_{eq}(n)$ is likely to be decoded in the end as one of three adjacent symbols, and we can set the number of most likely symbols (i.e., P) to 3. The set of most likely symbols consists of the P symbols with the smallest Euclidean distance to $r_{eq}(n)$, so it can be determined by identifying the amplitude region in which $r_{eq}(n)$ falls. We first divide the signal amplitude range into (M−P+1) regions according to the Euclidean distance between $r_{eq}(n)$ and the symbol levels. These regions have (M−P) boundaries, which require the same number of thresholds. For example, Fig. 2(a) shows the case where the signal amplitude is divided into 3 regions, i.e., P=2. Given $r_{eq}(n)$, the set of most likely symbols after the threshold detector can be expressed as

$$S_n^{eq} = \{\, s \in S : |r_{eq}(n) - s| \text{ is among the } P \text{ smallest distances from } r_{eq}(n) \text{ to the levels of } S \,\}. \qquad (3)$$

For example, in Fig. 2(a), where P=2, the two decision thresholds are at −A and A, and the corresponding sets of most likely symbols in Regions 1, 2, and 3 are {−3A, −A}, {−A, A}, and {A, 3A}, respectively. In the example of Fig. 2(b), there is only one decision threshold, with two sets of most likely symbols, {−3A, −A, A} and {−A, A, 3A}, for Regions 1 and 2, respectively. One may notice that sample values which most likely correspond to the same symbol, for example −A (orange curve), are split between two adjacent regions (i.e., Regions 1 and 2). This is because these sample values have different sets of most likely symbols: a sample slightly smaller than −A is more likely to be −3A than A, whereas a sample slightly larger than −A is more likely to be A than −3A. Note also that the threshold detector is simple to realize, since it consists of only (M−P) comparators.
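The threshold detector can be sketched in a few lines of Python. One consistent choice of thresholds, assumed here rather than stated in the text, places threshold j midway between the ordered levels $s_j$ and $s_{j+P}$; this reproduces the thresholds at −A and A for P=2 and at 0 for P=3 in Fig. 2, and always returns the P levels closest to $r_{eq}(n)$ as in Eq. (3). The function names `thresholds` and `candidate_set` are hypothetical.

```python
def thresholds(levels, P):
    """M - P thresholds; threshold j is the midpoint of ordered levels j and j + P."""
    s = sorted(levels)
    return [(s[j] + s[j + P]) / 2 for j in range(len(s) - P)]


def candidate_set(r, levels, P):
    """Coarse pre-decision: the P most likely symbols S_n^eq for one FFE output r."""
    s = sorted(levels)
    # One comparator per threshold (M - P in total); the count gives the region index.
    j = sum(r > t for t in thresholds(s, P))
    return s[j:j + P]
```

With A = 1, `candidate_set(-2.0, [-3, -1, 1, 3], 2)` gives `[-3, -1]` (Region 1), while `candidate_set(-0.2, [-3, -1, 1, 3], 3)` gives `[-3, -1, 1]`. Counting comparator outputs mirrors the parallel-comparator realization described above; a software implementation could use a binary search instead.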
For each $r_{eq}(n)$, we select $S_n^{eq}$, which is composed of the P most likely symbols. Since $S_n^{eq}$ is smaller than S, the state trellis of the MLSE shrinks accordingly. The state of our proposed scheme is defined as

$$\hat{\mathbf{p}}_n = [\hat{x}_{n-1}, \hat{x}_{n-2}, \ldots, \hat{x}_{n-L}], \quad \hat{x}_{n-k} \in S_{n-k}^{eq}. \qquad (4)$$

The size of the state trellis therefore depends not on the size of the signal set S but on P, the size of $S_n^{eq}$ after the threshold detector. The trellis thus has $P^L$ states and $P^{L+1}$ transitions. For example, when P=2, the state trellis for a PAM-4 signal with a DIR of length two (i.e., L=1) has only 2 states and 4 transitions, as shown in Fig. 1(d), which is much simpler than the trellis of the conventional MLSE shown in Fig. 1(b). We now describe how the Viterbi algorithm searches this reduced-state trellis. As shown in Fig. 1(c), we first calculate the branch metrics for transitions from a state $\hat{\mathbf{p}}_n$ as

$$|y_n - \hat{x}_n - \hat{\mathbf{p}}_n \cdot \mathbf{f}|^2, \quad \hat{x}_n \in S_n^{eq}. \qquad (5)$$

Only $P^{L+1}$ branch metrics are computed in Eq. (5), far fewer than the $M^{L+1}$ required by the conventional MLSE. A conventional Viterbi algorithm then searches this reduced-state trellis and outputs the state indices representing the most probable data sequence. Finally, these indices are decoded using the sets $S_n^{eq}$.
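Putting the pieces together, the sketch below implements the reduced-state Viterbi search for the two-tap DIR used here (L=1, $\mathbf{f} = [\alpha]$). At each time index, $S_n^{eq}$ is obtained from $r_{eq}(n)$ by a nearest-P threshold detector, the P states are labelled by the elements of $S_{n-1}^{eq}$, and only $P^2$ branch metrics of Eq. (5) are evaluated. The function names, the known start-up symbol `x_init`, the midpoint thresholds, and storing full survivor paths instead of a trace-back of length B are assumptions for this illustration.

```python
def nearest_p(r, levels, P):
    """Threshold detector: the P levels closest to r, using M - P comparisons."""
    s = sorted(levels)
    thresholds = [(s[j] + s[j + P]) / 2 for j in range(len(s) - P)]
    j = sum(r > t for t in thresholds)  # region index
    return s[j:j + P]


def reduced_state_mlse(y, r_eq, alpha, levels, P, x_init):
    """Reduced-state Viterbi search for a two-tap DIR (L = 1, f = [alpha]).

    y: MLSE input samples (post-filter output)
    r_eq: FFE output samples used for the coarse pre-decision
    x_init: symbol assumed known just before y[0] (hypothetical start-up rule)
    """
    prev_set = [x_init]  # S_{n-1}^eq: labels of the current states
    cost, paths = [0.0], [[]]
    for yn, rn in zip(y, r_eq):
        cur_set = nearest_p(rn, levels, P)  # S_n^eq
        new_cost = [float("inf")] * len(cur_set)
        new_paths = [[] for _ in cur_set]
        for i, xp in enumerate(prev_set):  # state (x^_{n-1},)
            for j, xn in enumerate(cur_set):  # P branches per state
                bm = (yn - xn - alpha * xp) ** 2  # branch metric, Eq. (5)
                if cost[i] + bm < new_cost[j]:  # add-compare-select
                    new_cost[j] = cost[i] + bm
                    new_paths[j] = paths[i] + [xn]
        prev_set, cost, paths = cur_set, new_cost, new_paths
    best = min(range(len(cost)), key=cost.__getitem__)
    return paths[best]  # symbols taken from the time-varying sets S_n^eq
```

Because the state labels are drawn from $S_{n-1}^{eq}$, which changes with n, the trellis has a fixed size of P states while its labels follow the pre-decision. Setting P = M recovers the full-state search for L=1.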

Complexity comparisons
The complexity of MLSE is determined mainly by the branch-metric computation and the Viterbi algorithm. From Eq. (1), computing the branch metrics requires $L \times M^{L+1}$ multiplications and $(L+1) \times M^{L+1}$ additions. Meanwhile, the Viterbi algorithm needs $M^L$ comparisons, $M^{L+1}$ additions, and $M^L$ storage locations, each capable of storing (B+1) values. Here, B is the trace-back length, which is typically set to five times the channel memory length L+1 [13]. Overall, the conventional MLSE requires $L \times M^{L+1}$ multiplications, $(L+2) \times M^{L+1}$ additions, $M^L$ comparisons, and $(B+1) \times M^L$ storage units. The complexity of the proposed MLSE can be estimated in the same way: it requires $L \times P^{L+1}$ multiplications, $(L+2) \times P^{L+1}$ additions, $(P^L + M - P)$ comparisons, and $(B+1) \times P^L$ storage units. Note that this count includes the (M−P) comparisons required by the threshold detector. The complexities of both MLSEs are thus roughly proportional to the number of state transitions. Consequently, the complexity of the conventional MLSE increases significantly with the number of modulation levels M, whereas in our reduced-state MLSE the number of state transitions depends not on M but on the number of most likely symbols P. Our MLSE therefore reduces the implementation complexity considerably, especially when M is large. In general, the proposed MLSE reduces the numbers of multiplications and additions both by $(M^{L+1} - P^{L+1})/M^{L+1}$, the storage units by $(M^L - P^L)/M^L$, and the comparisons by $(M^L - P^L - M + P)/M^L$. For example, when P=2, for PAM-4/6/8 signals with a truncated channel of length two (i.e., L=1), the proposed MLSE reduces the numbers of multiplications and additions by 75%, 89%, and 93.8%, respectively, and the storage units by 50%, 67%, and 75%, respectively. These results are summarized in Table 1.
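The operation counts above are easy to tabulate. This Python sketch simply evaluates the formulas given in the text (with the default B = 5(L+1)) and the resulting fractional reductions; the function names `mlse_complexity` and `reduction` are hypothetical.

```python
def mlse_complexity(M, L, P=None, B=None):
    """Operation counts per the formulas in the text.

    P=None gives the conventional MLSE; otherwise the reduced-state MLSE with
    P most likely symbols (the M - P threshold comparisons are included).
    B defaults to 5(L + 1), the typical trace-back length quoted in the text.
    """
    if B is None:
        B = 5 * (L + 1)
    K = M if P is None else P  # candidate values per state element
    return {
        "mult": L * K ** (L + 1),
        "add": (L + 2) * K ** (L + 1),
        "comp": K ** L + (0 if P is None else M - P),
        "storage": (B + 1) * K ** L,
    }


def reduction(M, L, P):
    """Fractional reduction of each count: reduced-state vs. conventional MLSE."""
    conv, red = mlse_complexity(M, L), mlse_complexity(M, L, P)
    return {k: 1 - red[k] / conv[k] for k in conv}
```

For L=1 and P=2, `reduction` reproduces the 75%/89%/93.8% (multiplications and additions) and 50%/67%/75% (storage) figures for PAM-4/6/8, gives zero reduction in comparisons, and yields a 62.5% reduction in comparisons for PAM-4 with L=2.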
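Note that the number of comparisons is unchanged when L=1, since $M^L - P^L - M + P = 0$ in that case; only when L>1 does the proposed MLSE also reduce the number of comparisons.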

Experimental setup
We evaluate the performance of the proposed reduced-state MLSE on 100∼140-Gb/s PAM-4/6/8 links implemented using a 1.31-µm DML. Figure 3(a) shows the experimental setup. The PAM signals are generated off-line and loaded into a digital-to-analog converter (DAC) with 6-bit resolution and 35-GHz bandwidth. To generate PAM-6, we first map each 5-bit word onto a two-dimensional 32-QAM constellation and then project the resulting point onto its two axes to obtain two PAM-6 symbols [13]. The DAC operates at 1 sample/symbol. The signals are fed directly to a DML operating at 1313 nm, which emits an output power of 12.1 dBm when biased at 85 mA. Operating the DML at a high bias current not only increases the output power but also enhances the modulation bandwidth [26]. Figure 3(b) shows the measured E/O response of the DML used in our experiment; its 3-dB bandwidth is 21.8 GHz. The peak-to-peak amplitude of the driving signal is set to 3.56 V to strike a balance between the SNR and the waveform distortions induced by the nonlinear modulation dynamics of the DML [9]. After transmission over standard single-mode fiber (SSMF), the PAM signals are detected by a PIN-TIA detector (bandwidth = 33 GHz) and digitized at 80 Gsample/s by a real-time oscilloscope. The captured waveforms are processed off-line. The post-detection digital signal processing includes resampling, synchronization, and electrical equalization. The half-symbol-spaced linear FFE has an asymmetric structure with 33 taps, and the split between pre- and post-cursor taps is optimized for this tap count. The equalizer performance improves as the number of taps increases but levels off gradually beyond this value. A two-tap linear filter is employed after the FFE as a post-filter. Together with the linear FFE, this post-filter works as a partial response equalizer and truncates the channel to a memory length of two (L=1). The bit-error ratio (BER) is measured over $10^6$ symbols. The tap coefficients, including α of the post-filter, are determined by the LMS algorithm using training sequences and remain unchanged after the algorithm converges.
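The PAM-6 generation step can be illustrated as follows: each group of 5 bits selects one point of a 32-point cross constellation (a 6×6 grid of PAM-6 levels with the four corners removed), and the point's two coordinates are sent as two consecutive PAM-6 symbols, giving 2.5 bits/symbol. The cross shape is the usual 32-QAM layout, but the bit-to-point labelling below (a plain binary index) is a hypothetical choice; the labelling used in the experiment is not specified here.

```python
import itertools

PAM6 = [-5, -3, -1, 1, 3, 5]
# 32-point cross constellation: the 6x6 grid of PAM-6 levels minus its 4 corners.
CROSS32 = [(i, q) for i, q in itertools.product(PAM6, PAM6)
           if not (abs(i) == 5 and abs(q) == 5)]


def bits_to_pam6(bits):
    """Map each 5-bit group to a 32-QAM point and emit its two coordinates
    as two consecutive PAM-6 symbols (2.5 bits/symbol).

    The bit-to-point labelling (plain binary index into CROSS32) is a
    hypothetical choice for illustration only.
    """
    if len(bits) % 5:
        raise ValueError("number of bits must be a multiple of 5")
    symbols = []
    for k in range(0, len(bits), 5):
        index = int("".join(str(b) for b in bits[k:k + 5]), 2)
        i, q = CROSS32[index]
        symbols.extend([i, q])
    return symbols
```

For example, the ten bits `0000011111` select the first and last constellation points and become the four PAM-6 symbols −5, −3, 5, 3.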

Experimental results and discussions
We first investigate the performance of the proposed reduced-state MLSE for a 100-Gb/s PAM-4 signal. Figure 4 shows the BER performance of the PAM-4 signal after 0- and 40-km transmission, together with the BER performance obtained using the linear FFE and the conventional MLSE. The results show that the receiver sensitivity improves slightly after 40-km transmission, which can be ascribed to the negative dispersion of SSMF at the operating wavelength of the DML. The figure also shows that the linear FFE always exhibits the worst performance and cannot reach the forward error correction (FEC) threshold of $5 \times 10^{-3}$ [27], because of the noise enhancement and nonlinear distortions in our DML-based transmission system. In contrast, MLSE alleviates the noise enhancement and thus improves the BER performance significantly. For example, the conventional MLSE improves the BER from $1.2 \times 10^{-2}$ to $3.6 \times 10^{-6}$ at a received optical power of −3 dBm after 40-km transmission.
The results also show that our proposed MLSE with P=2 performs similarly to the conventional MLSE: the sensitivity difference between the two MLSEs is merely ∼0.1 dB. This confirms that, in the presence of the noise enhancement induced by the linear FFE, errors occur mostly between adjacent symbols, as shown in Fig. 2(a). We can therefore reduce the set of possible values for each element of the state vector $\mathbf{p}_n$ from 4 to 2 in the state trellis. The slight performance degradation arises mainly because errors occasionally occur between non-adjacent symbols, so the performance of our proposed MLSE can be improved by setting P=3.
The results also show that the reduced-state MLSE with P=3 achieves BER performance as good as that of the conventional MLSE after 0- and 40-km transmission. Next, we increase the line rate to 112 and 120 Gb/s by employing the PAM-6 and PAM-8 formats, respectively. The BER performance of these PAM signals is plotted in Figs. 5 and 6. Because the baud rates of these two PAM signals are lower than that of the 50-Gbaud PAM-4 signal, they suffer less from the bandwidth limitation of the DML than the 100-Gb/s PAM-4 signal does. Thus, BERs lower than $10^{-2}$ are achieved at a received power of −3 dBm using only the FFE. Nevertheless, the FEC threshold cannot be reached with this linear equalizer. The conventional MLSE brings clear BER improvements, especially for the PAM-6 signal; for example, it achieves a BER of $5 \times 10^{-3}$ for the PAM-6 signal after 0- and 40-km transmission. Our proposed MLSE with P=2 exhibits the same performance as the conventional MLSE for both PAM-6 and PAM-8 signals. It is worth noting that our proposed MLSE with P=3 does not bring any further improvement. This is because the SNR at the output of the FFE is relatively high, owing to the lower baud rates of the PAM-6 and PAM-8 signals compared with PAM-4. The noise enhancement caused by the FFE is thus not significant enough to require P larger than 2 for either PAM-6 or PAM-8. Since the tap length of the linear FFE strongly affects the overall performance of the system, we next investigate the system performance for various FFE tap lengths.
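The baud-rate comparison for the PAM-6 and PAM-8 signals follows from dividing each line rate by the number of bits carried per symbol, assuming the quoted line rates are raw rates and that PAM-6 carries 2.5 bits/symbol through the 32-QAM mapping:

$$
R_s^{\text{PAM-4}} = \frac{100\ \text{Gb/s}}{2\ \text{bit/symbol}} = 50\ \text{GBd},\qquad
R_s^{\text{PAM-6}} = \frac{112\ \text{Gb/s}}{2.5\ \text{bit/symbol}} = 44.8\ \text{GBd},\qquad
R_s^{\text{PAM-8}} = \frac{120\ \text{Gb/s}}{3\ \text{bit/symbol}} = 40\ \text{GBd}.
$$

Both the PAM-6 and PAM-8 signals therefore run at a lower symbol rate than the 50-Gbaud PAM-4 signal and are less affected by the 21.8-GHz modulation bandwidth of the DML.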
Figure 7 shows the BER performance versus the FFE tap length for PAM-4/6/8 signals at a received optical power of −3 dBm. The FEC threshold cannot be reached with the linear FFE even when the tap length is as large as 50. The proposed MLSE with P=3 exhibits the same performance as the conventional MLSE regardless of the FFE tap length in all cases. With P=2, our proposed MLSE also matches the conventional MLSE, except for the PAM-4 format, where it underperforms the conventional MLSE very slightly. It should be noted that, compared with the conventional MLSE, the proposed reduced-state MLSE with P=2 reduces the numbers of multiplications, additions, and storage units by 75%, 75%, and 50%, respectively, for the 100-Gb/s PAM-4 transmission system. Finally, we compare the performance of the proposed MLSE with that of the conventional MLSE as the line rate of the PAM signals generated by the DML (bandwidth of 21.8 GHz) is increased. Figure 8 shows the results for a received optical power of −3 dBm and a transmission distance of 40 km. Owing to the limited modulation bandwidth of the DML and the higher SNR requirements, the BER performance deteriorates as the line rate increases. The proposed MLSE with P=2 exhibits the same performance as the conventional MLSE at all line rates for PAM-6/8 transmission. For PAM-4, the reduced-state MLSE with P=2 supports a line rate about 2% lower than that of the conventional MLSE. With P=3, the proposed MLSE performs as well as the conventional MLSE for all PAM formats at all measured line rates. We are able to transmit a 120-Gb/s PAM-6 signal over 40-km SSMF using both the conventional MLSE and the proposed one.

Summary
We have proposed and demonstrated a reduced-state MLSE for band-limited IM/DD transmission systems using M-ary PAM formats. By making a coarse pre-decision of the ISI-free signal equalized by the linear FFE, we reduce the number of most likely symbols of the signal and thus the number of trellis states of the MLSE. The complexity of the MLSE, which is governed mainly by the number of trellis states, then depends on the number of most likely symbols rather than on the size of the signal set. Since the number of most likely symbols is smaller than the size of the signal set, the complexity of MLSE can be reduced considerably. We have experimentally compared the performance of our reduced-state MLSE with that of the conventional MLSE in 100∼140-Gb/s PAM-4/6/8 transmission systems implemented using a 1.31-µm directly modulated laser. The proposed MLSE with only 2 most likely symbols performs similarly to the conventional MLSE. For PAM-4/6/8 transmission systems, the proposed MLSE reduces the numbers of multiplications and additions by 75%, 89%, and 93.8%, respectively, and the storage units by 50%, 67%, and 75%, respectively. We therefore believe that the proposed MLSE could be used to implement high-performance, cost-effective IM/DD systems.

Fig. 1. (a) Block diagram of the conventional MLSE and (b) its trellis diagram with 4 states and 16 transitions. (c) Block diagram of the proposed reduced-state MLSE and (d) its trellis diagram with 2 states and 4 transitions.

Fig. 2. The amplitude distribution of received symbols after linear FFE and the eye diagrams for PAM-4 signal after 40-km transmission at (a) 100 and (b) 112 Gb/s. The dashed lines are the decision thresholds of the threshold detector.