Kernel mapping for mitigating nonlinear impairments in optical short-reach communications

Abstract: Nonlinear impairments induced by opto-electronic components are one of the fundamental performance-limiting factors in high-speed optical short-reach communications, significantly hindering capacity improvement. This paper proposes to employ a kernel mapping function to map the signals in a Hilbert space to their inner product in a reproducing kernel Hilbert space, an approach that is experimentally demonstrated to mitigate nonlinear impairments in optical short-reach communication systems. The operation principle is derived. An intensity modulation/direct detection system with a 1.5-µm vertical cavity surface emitting laser and a 10-km 7-core fiber achieving 540.68-Gbps (net-rate 505.31-Gbps) is experimentally demonstrated. The experimental results reveal that the kernel mapping based schemes achieve transmission performance comparable to that of the Volterra filtering scheme, even when the latter uses a high order.


Introduction
Driven by the emergence of cloud applications, more than 70% of network traffic stays within datacenters, where high-speed short-reach transmission techniques are in high demand [1]. Optical fiber transmission has evolved over decades and is widely recognized as the most cost- and energy-efficient technique for offering ultra-high capacity [1,2]. The datacenter I/O port transmission speed is approaching 200-Gbps per lambda [3]. To meet such a high data rate, intensity modulation direct detection (IM/DD) systems with advanced modulation formats such as pulse amplitude modulation (PAM) or discrete multi-tone (DMT) have been experimentally demonstrated [4][5][6][7][8][13]. However, such an IM/DD system imposes stringent requirements on optoelectronic devices, which is a significant challenge for cost-sensitive datacenters. Meanwhile, spatial division multiplexing (SDM) techniques provide an alternative approach to cope with the ever-increasing capacity demand, showing great potential to achieve 1-Tbps and beyond [9]. Novel system techniques, such as a vertical cavity surface emitting laser (VCSEL) combined with multi-core fiber (MCF), are promising for increasing the spatial bandwidth density [10]. Although cost-effective broadband opto-electrical components have been developing rapidly to keep pace with the capacity requirements of optical short-reach systems, the impairments that these components introduce into high-speed short-reach communication systems, especially the nonlinear ones, cannot be neglected and have become a significant issue that hinders capacity improvement [2]. These nonlinear impairments come from non-idealities of various opto-electronic devices [2,11], such as the nonlinear modulation characteristics of the laser, saturated power amplification in the optical/electrical amplifiers, and square-law detection of the signal at the photodetector, and are consequently difficult to mitigate with any analytical model.
Recently, Volterra filtering [12,13] and machine learning (ML) algorithms [14,15] have been introduced as numerical methods for channel equalization against the nonlinear impairments in 200-Gbps and beyond optical short-reach communication systems [4][5][6][7][8][13]. Compared with linear signal processing methods that operate only in a one-dimensional scalar signal space, such as feed forward equalization, these methods handle the signal recovery in a complete and sequential high-dimensional space, known as a Hilbert space (HS). However, their benefit in mitigating nonlinear impairments is still limited. The Volterra filtering scheme uses high-order multiplications of all the inputs that have been applied to the system, and its complexity increases exponentially with the nonlinear kernel order. Typically, to keep the complexity at an acceptable level, a nonlinear kernel order of no more than 3 is implemented [12,13]. However, such a small order limits the ability to compensate the complicated nonlinear impairments in optical short-reach systems. Many ML algorithms, such as [14,15], compensate the nonlinear impairments by learning through multiple neural layers, which also treat the signal in a high-dimensional manner. Nevertheless, the more neural layers are introduced, the higher the computational complexity becomes. It remains an open question whether the high computational complexity of ML-based equalizers is appropriate for optical short-reach communications.
From the signal processing perspective, the aforementioned challenge of the existing methods for mitigating nonlinear impairments is referred to as the 'curse of dimension' dilemma [16]. When the signal dimension increases, the performance can be improved, but at the cost of intolerable complexity. It often happens that, once the signal dimension has been increased to a certain level, a very small performance improvement requires a dramatic increase in the number of calculations. One way to address this dilemma is to find a suitable dimension mapping function. Kernel mapping, also referred to as the 'kernel trick' [18], was proposed to address the 'curse of dimension', and hence has great potential to tackle the issues caused by complicated nonlinear impairments in optical short-reach communications as well. A kernel mapping function maps the signals in an HS to the inner product in a reproducing kernel Hilbert space (RKHS). Therefore, the complicated calculations caused by a high (or infinite)-dimensional space are not needed.
In this regard, this paper introduces the kernel mapping method for signal processing in optical short-reach communications, where the 'Mercer kernel' is utilized to map the signals in the HS to the inner product in the RKHS. A 540.68-Gbps (net-rate 505.31-Gbps) optical short-reach system with IM/DD, DMT, a 1.5-µm VCSEL and 10-km 7-core fiber is built to experimentally demonstrate the kernel mapping idea. In this paper, we significantly extend the previous work reported in [19] by: 1) providing theoretical analysis and operational principles of the kernel mapping; 2) introducing mathematical derivations of the kernel least mean square (KLMS) and kernel recursive least squares (KRLS) algorithms; 3) showing experimental results for the DSP flow optimization process; 4) extending the experimental results with both KLMS and KRLS algorithms. The results reveal that kernel mapping is effective in mitigating the nonlinear impairments for short-reach optical communications. The proposed KLMS and KRLS achieve transmission performance comparable to that of the Volterra filtering scheme.

Operation principles
In this section, we introduce operational principles of the kernel mapping for nonlinear impairment mitigation in short-reach optical communication systems.
In this paper, matrices are denoted by boldface capital letters, vectors are denoted by boldface lowercase letters, and scalars are denoted by lowercase letters. A boldface lowercase letter with a subscript represents a vector in the vector space. A boldface lowercase letter with brackets, such as x(k), denotes a vector sampled at a given moment, while a normal lowercase letter with brackets, such as d(k), denotes a scalar at a given sampling point.
The basics of HS and RKHS are detailed in Appendix A.1 and Appendix A.2, respectively. To employ kernel mapping in optical communications, the signal processing in the RKHS is done with the inner product of the signals. To further improve the analytic power of the RKHS in optical communications, kernel mapping is needed for transforming the reproducing kernel $\kappa(\mathbf{x}_i, \mathbf{x}_j)$ from the RKHS to the feature space F composed of the feature vectors $\boldsymbol{\phi}(\cdot)$ of the RKHS.
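According to Mercer's theorem, any reproducing kernel $\kappa(\mathbf{x}_i, \mathbf{x}_j)$ in the RKHS can be expanded with the non-negative eigenvalues $\lambda_p$ and eigenfunctions $\theta_p$, which is expressed in (1).

$$\kappa(\mathbf{x}_i, \mathbf{x}_j) = \sum_{p=1}^{\infty} \lambda_p\, \theta_p(\mathbf{x}_i)\, \theta_p(\mathbf{x}_j) \tag{1}$$

Then, the kernel mapping $\boldsymbol{\phi}(\cdot)$ is constructed as follows:

$$\boldsymbol{\phi}(\mathbf{x}) = \left[\sqrt{\lambda_1}\,\theta_1(\mathbf{x}),\ \sqrt{\lambda_2}\,\theta_2(\mathbf{x}),\ \ldots\right]^{T} \tag{2}$$

where $\boldsymbol{\phi}(\cdot)$ is a kernel vector (i.e., a feature vector) in the feature space F. The dimension of the kernel mapping feature space is determined by the number of positive eigenvalues, which is infinite when the Gaussian kernel function is employed. Thus, F is equal to the RKHS with the Gaussian kernel function. Without loss of generality, we do not distinguish between F and the RKHS in this paper. An important rule of the kernel function $\kappa(\mathbf{x}_i, \mathbf{x}_j)$ is expressed as:

$$\boldsymbol{\phi}(\mathbf{x}_i)^{T}\boldsymbol{\phi}(\mathbf{x}_j) = \kappa(\mathbf{x}_i, \mathbf{x}_j) \tag{3}$$

A concrete example of expanding the calculation for two samples $x_{i,p}$ and $x_{i,q}$ in the Gaussian kernel with (3) is presented in (4).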
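For orientation, the scalar Gaussian-kernel expansion can be written out explicitly. The following is a sketch under an assumption: it uses the common parameterisation $\kappa(x, x') = \exp(-(x-x')^2/(2\sigma^2))$, whereas the normalisation of $\sigma$ in (23) may differ, and the correspondence of the two lines to (4a) and (4b) is inferred from the surrounding text rather than taken from it.

```latex
\begin{align}
\kappa(x_{i,p}, x_{i,q})
  &= \exp\!\left(-\frac{(x_{i,p}-x_{i,q})^{2}}{2\sigma^{2}}\right) \\
  &= \exp\!\left(-\frac{x_{i,p}^{2}}{2\sigma^{2}}\right)
     \exp\!\left(-\frac{x_{i,q}^{2}}{2\sigma^{2}}\right)
     \sum_{n=0}^{\infty}\frac{\left(x_{i,p}\,x_{i,q}\right)^{n}}{\sigma^{2n}\,n!}
\end{align}
```

The second line follows from expanding the cross term $\exp(x_{i,p}x_{i,q}/\sigma^{2})$ as a power series, so every monomial order $n = 0, 1, 2, \ldots$ of the two samples appears in the sum.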
Equation (4) demonstrates that the Gaussian kernel function is capable of processing all the orders of the kernels, i.e., $(1, x, x^2, x^3, \ldots)$, where the higher orders $(x^2, x^3, \ldots)$ represent nonlinear distortions. Therefore, after the kernel mapping the nonlinear effects are transformed into the inner product. However, the usage of the Gaussian kernel function in our proposed method does not need the concrete expansion shown in (4b); the inner product shown in (23) in Appendix A.2 is enough. It is easy to see that the feature space F is the same as the RKHS since they share the same basis set. If we denote a vector $\boldsymbol{\rho}$ in the RKHS as $\boldsymbol{\rho} = \sum_{i=1}^{m} a_i \boldsymbol{\phi}(\mathbf{x}_i)$, any continuous mapping $f: \mathbb{R}^t \to \mathbb{R}$ in the RKHS meets (5) [18].
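The point that the kernel value already contains every polynomial order can be checked numerically. The sketch below is illustrative only: it assumes the scalar Gaussian kernel in the form exp(-(x-y)^2 / (2 sigma^2)), and the function names `gaussian_kernel`, `feature_map` and `truncated_kernel` are hypothetical helpers, not part of the original work. It compares the closed-form kernel with the inner product of explicit feature vectors truncated at a finite order.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Scalar Gaussian kernel, assuming the exp(-(x - y)^2 / (2 sigma^2)) form."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def feature_map(x, order, sigma=1.0):
    """First `order + 1` coordinates of an explicit feature vector.

    Coordinate n is exp(-x^2 / (2 sigma^2)) * x^n / (sigma^n * sqrt(n!)),
    so the inner product of two such vectors is a truncated power series.
    """
    envelope = math.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return [
        envelope * x ** n / (sigma ** n * math.sqrt(math.factorial(n)))
        for n in range(order + 1)
    ]


def truncated_kernel(x, y, order, sigma=1.0):
    """Inner product of the truncated feature vectors of x and y."""
    fx = feature_map(x, order, sigma)
    fy = feature_map(y, order, sigma)
    return sum(a * b for a, b in zip(fx, fy))


if __name__ == "__main__":
    x, y = 0.8, -0.5
    exact = gaussian_kernel(x, y)
    for order in (1, 3, 6, 12):
        approx = truncated_kernel(x, y, order)
        print(f"order {order:2d}: |error| = {abs(approx - exact):.3e}")
```

The error shrinks as the truncation order grows, while the closed-form kernel needs a single exponential regardless of order, which is the practical content of the 'kernel trick'.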
Equation (5) implies that the linear model in the RKHS has the universal approximation property [20]. Combining (28) in Appendix A.2 and (5), we can recover the signal impaired by high-order nonlinear distortions in the RKHS in a linear manner. Such signal recovery is based on the optimization of $\boldsymbol{\phi}$ and $\boldsymbol{\rho}$ with training data sets. The kernel mapping in the RKHS is illustrated in Fig. 1. The initial signal vector $\mathbf{x}$ is distorted by the nonlinear impairments, which are difficult to model with any analytic solution in the HS. After the kernel mapping $\boldsymbol{\phi}(\cdot)$, we can get an analytic solution in the RKHS with feature vectors $\boldsymbol{\phi}(\mathbf{x})$. Then, a $\boldsymbol{\rho}^{T}\boldsymbol{\phi}$ based transformation (i.e. matrix rotation) is used to map the signals in the RKHS to $\mathbb{R}$, where they can be easily processed with linear compensating schemes for mitigating the nonlinear impairments. The strategy for employing kernel mapping is to formulate the classic linear signal processing in the RKHS to iteratively mitigate nonlinear impairments. We use the training data set $\{\mathbf{x}(k), d(k)\}|_{k=1,2,\ldots,N}$ to obtain the optimal parameters for the kernel mapping $\boldsymbol{\phi}(\cdot)$ as well as the equalization coefficients. Here, $\mathbf{x}(k)$ is the received signal vector at the k-th sampling point in the HS, $d(k)$ is the transmitted signal at the k-th sampling point, and $N$ is the training data size. $\mathbf{w}(k)|_{k=1,2,\ldots,N}$ is the estimated filter weight vector at the k-th sampling point. Assuming $\mathbf{d} = (d(1), d(2), \ldots, d(N))$, the error signal, defined as the difference between $\mathbf{d}$ and the kernel-mapped signal, is given in (6). Equations (6) and (28) in Appendix A.2 are the same if (3) is satisfied. The geometrical interpretation of the error signal $\mathbf{e}$, the desired signal $\mathbf{d}$ and the kernel-mapped signal is shown in Fig. 2.
The kernel mapping and the equalization coefficients reach the optimization convergence point when the error signal is perpendicular to the kernel mapping space [17]. This principle is shared with many linear signal processing methods. As a result, combining the kernel mapping with linear signal processing algorithms, such as LMS and RLS, makes optical short-reach communication systems tolerant to nonlinear impairments.

Kernel least mean square (KLMS) algorithm
The design of the KLMS algorithm at the k-th time sampling point in the RKHS is inspired by the classic LMS algorithm, which is expressed in (7), where $e(k)$ is the error signal between the transmitted signal $d(k)$ and the equalized signal, $\mathbf{w}(k)$ is the estimated filter weight vector, and $\eta$ is the step-size parameter. Iterating the weight-update formula in (7b) yields (8). Thus, after $k$ training samples, the weight estimate is expressed as a linear combination of all the previous and present transformed inputs, weighted by the prediction errors (and scaled by $\eta$). This also matches the theorem employing $\boldsymbol{\rho}$ in (5) [20]. When the estimated weights are used to compensate a new received vector $\mathbf{x}'$, the recovered signal is calculated by (9). Notably, the weights $\mathbf{w}$ do not appear explicitly in the equalized signals that are influenced by the nonlinear impairments. Instead, the past errors are each multiplied by the kernel evaluated on the previously received data (i.e., the training data) and then summed. This shows that the kernel mapping can be carried out without explicitly referring to the coefficient vector $\mathbf{w}$ or the high-dimensional feature vector $\boldsymbol{\phi}$. Thus, the 'kernel trick' proposed for optical communications can address the nonlinear impairments in a high-dimensional manner while avoiding the 'curse of dimension'. The KLMS algorithm can then be expressed as in (10).
The function $f_k$ is the equalization mapping at the k-th iteration, which also corresponds to the k-th time sampling point. Similar to the LMS algorithm, (10a) is a mapping function, (10b) is an error function and (10c) is an updating function. The signal processing flow of the KLMS is shown in Algorithm 1. $N_L$ is the number of iterations, which is also the training overhead. All the nonlinear equalizer coefficients are derived using a training data set.
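The mapping, error and updating steps described above can be sketched in code. This is a minimal illustration of the commonly stated KLMS recursion, not the authors' implementation: it assumes a Gaussian kernel of the form exp(-||a-b||^2 / (2 sigma^2)), zero initial weights, and one new kernel centre per training sample. The names `kernel_to_centres`, `klms_train`, `klms_predict` and `toy_link`, the step size, and the tanh-shaped toy nonlinearity are all hypothetical choices made for the example.

```python
import numpy as np


def kernel_to_centres(centres, x, sigma):
    """Gaussian kernel between every stored centre (rows) and one vector x."""
    diff = centres - x
    return np.exp(-np.sum(diff * diff, axis=1) / (2.0 * sigma ** 2))


def klms_train(X, d, eta=0.2, sigma=1.0):
    """One pass of KLMS over the training set {x(k), d(k)}.

    Returns the stored centres x(k) and the coefficients eta * e(k).
    """
    X = np.asarray(X, dtype=float)
    d = np.asarray(d, dtype=float)
    coeffs = np.zeros(len(d))
    for k in range(len(d)):
        # Mapping step: output built from the k samples stored so far.
        y_k = coeffs[:k] @ kernel_to_centres(X[:k], X[k], sigma)
        # Error step followed by the update: x(k) becomes a new centre.
        coeffs[k] = eta * (d[k] - y_k)
    return X, coeffs


def klms_predict(x_new, centres, coeffs, sigma=1.0):
    """Equalized output for a new received vector x_new."""
    return float(coeffs @ kernel_to_centres(centres, np.asarray(x_new, dtype=float), sigma))


def toy_link(n, seed=0):
    """Hypothetical memoryless nonlinear channel with a 3-tap input window."""
    rng = np.random.default_rng(seed)
    d = rng.choice(np.array([-3.0, -1.0, 1.0, 3.0]), size=n)
    r = np.tanh(0.4 * d) + 0.02 * rng.standard_normal(n)
    X = np.stack([np.roll(r, 1), r, np.roll(r, -1)], axis=1)
    return X, d


if __name__ == "__main__":
    X, d = toy_link(600, seed=1)
    centres, coeffs = klms_train(X[:500], d[:500])
    pred = np.array([klms_predict(x, centres, coeffs) for x in X[500:]])
    print("test MSE after KLMS:", np.mean((pred - d[500:]) ** 2))
```

Note that the stored state grows by one centre and one coefficient per training sample, which is the linear storage growth in $N_L$ discussed later in the paper.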

Kernel recursive least squares (KRLS) algorithm
The KLMS deals with the instantaneous value of the squared estimation error, and hence faces the same convergence rate problem as the LMS. In comparison, the RLS algorithm aims at minimizing the sum of the squared estimation errors, and it generally provides a significantly faster convergence rate and better performance than the LMS.
In the KRLS, introducing the quantities defined in (11) yields (12), which can be transformed to (13). We then define $\mathbf{P}(k) = [\eta\mathbf{I} + \boldsymbol{\Phi}(k)^{T}\boldsymbol{\Phi}(k)]^{-1}$, which can be presented in recursive form. Defining $\boldsymbol{\beta}(k) = \mathbf{P}(k)\mathbf{D}(k)$, we obtain the corresponding update of $\boldsymbol{\beta}(k)$, where $e(k)$ is the prediction error. The KRLS scheme at the k-th iteration is expressed in (18). Similar to the RLS algorithm, (18a) is a mapping function, (18b) is an error function and (18c) is an updating function. The signal processing flow of the KRLS is shown in Algorithm 2.
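The recursion outlined above can be sketched as follows. This is a textbook-style KRLS update written under stated assumptions, not the authors' implementation: it assumes a Gaussian kernel of the form exp(-||a-b||^2 / (2 sigma^2)), treats $\eta$ as the regularisation term in $\mathbf{P}(k) = [\eta\mathbf{I} + \boldsymbol{\Phi}(k)^{T}\boldsymbol{\Phi}(k)]^{-1}$, and grows $\mathbf{P}(k)$ with a block-matrix inverse. The helper names `gaussian_gram`, `krls_train` and `krls_predict` and the intermediate names `h`, `z` and `r` are hypothetical.

```python
import numpy as np


def gaussian_gram(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def krls_train(X, d, eta=0.1, sigma=1.0):
    """Recursively build P(k) = [eta*I + K(k)]^-1 and beta(k) = P(k) D(k)."""
    X = np.asarray(X, dtype=float)
    d = np.asarray(d, dtype=float)
    # The Gaussian kernel gives kappa(x, x) = 1, so the first P is a scalar.
    P = np.array([[1.0 / (eta + 1.0)]])
    beta = P[:, 0] * d[0]
    for k in range(1, len(d)):
        h = gaussian_gram(X[:k], X[k], sigma)[:, 0]  # kernels to stored samples
        z = P @ h
        r = eta + 1.0 - z @ h                        # Schur complement
        e = d[k] - h @ beta                          # prediction error e(k)
        P = np.block([
            [P * r + np.outer(z, z), -z[:, None]],
            [-z[None, :], np.ones((1, 1))],
        ]) / r
        beta = np.concatenate([beta - z * e / r, [e / r]])
    return beta, P


def krls_predict(x_new, X, beta, sigma=1.0):
    """Equalized output for a new received vector x_new."""
    return float(gaussian_gram(X, x_new, sigma)[:, 0] @ beta)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((40, 3))
    d = np.sin(X[:, 0]) + 0.1 * X[:, 1] ** 2
    beta, P = krls_train(X, d)
    print("P shape:", P.shape, "prediction at X[0]:", krls_predict(X[0], X, beta))
```

Because $\mathbf{P}(k)$ is a $k \times k$ matrix, the stored state grows quadratically with the number of training samples, in contrast to the linear growth of the KLMS.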

Experimental setup
To quantify the benefits of using the kernel trick in optical short-reach communication systems, a high-speed IM/DD system is demonstrated experimentally. The experimental setup is shown in Fig. 3, which also includes the digital signal processing (DSP) flow of DMT modulation and demodulation. The DMT signals are generated offline in MATLAB and loaded into a 92-GSa/s digital to analog converter (DAC). The inverse fast Fourier transform (IFFT) size and the cyclic prefix (CP) length of the DMT signal are set to 1024 and 16, respectively. In the experiments, the VCSEL die is electrically driven via a 100-µm GSG 50-GHz probe. The light generated in the VCSEL is coupled into a single-mode fiber. No temperature controller (TEC) is used in the setup. The VCSEL bias current is set to 7.8-mA. The measured central wavelength of the probed VCSEL is 1543.2-nm and the captured output optical power is 1-dBm. The maximum 3-dB bandwidth of the VCSEL is about 22-GHz. The signal is split by a 1:8 optical coupler and launched into the fan-in module of the 10-km 7-core MCF. An Erbium-doped fiber amplifier (EDFA) with 14.8-dBm output power is used before the fan-in module to compensate for the extra loss from de-correlation. Optical delay lines are used to de-correlate the signals to emulate a practical system using seven independently modulated VCSELs. After the 10-km 7-core MCF, the signals are detected individually after a fan-out module. The inter-core crosstalk is −45 dB/100 km, and therefore the cores of the MCF can be treated as almost independent considering the typical reach within datacenters [21].
The 7 branches connected to the 7 cores of the MCF are each followed by a DCF (−159 ps/nm), and the remaining branch is used for the optical back-to-back (OBtB) transmission demonstration. The signal is amplified by a pre-amplifier EDFA, and an optical tunable filter (OTF) is utilized to filter out the amplified spontaneous emission (ASE) noise. A 90-GHz PIN photo-detector (PD) is used at the receiver. A variable optical attenuator (VOA) is used before the PD. The signal after direct detection is amplified by a 65-GHz linear electrical amplifier and captured by a 160-GSa/s digital storage oscilloscope (DSO). The captured signal is then processed offline.
The measured system frequency response with both OBtB transmission and MCF transmission is shown in Fig. 3(b). The response for the MCF based system includes the DCF, which reduces the influence of chromatic dispersion induced power fading and increases the system bandwidth. However, for short-reach optical communication systems where the fiber length is typically up to a few kilometers, the DCF is not always preferred because of the extra deployment cost. In the OBtB case, 220 subcarriers are loaded. Here, bit-power loading [22] is used to improve the system capacity and spectral efficiency. Quadrature amplitude modulation (QAM) orders ranging from 64QAM down to 16QAM are employed. In the 10-km MCF case, 210 subcarriers are loaded and bit-power loading is also employed.
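The DMT parameters quoted in the setup can be turned into a few derived quantities. The sketch below is a back-of-the-envelope calculation, not part of the reported experiment: it assumes the DMT sample rate equals the 92-GSa/s DAC rate (no oversampling) and that the loaded subcarriers are contiguous starting from the lowest frequency. The function name `dmt_parameters` and the dictionary keys are hypothetical.

```python
def dmt_parameters(sample_rate_hz, ifft_size, cp_len, n_loaded):
    """Derive basic DMT quantities from the sample rate and frame layout."""
    spacing_hz = sample_rate_hz / ifft_size
    return {
        # Frequency distance between adjacent subcarriers.
        "subcarrier_spacing_hz": spacing_hz,
        # Bandwidth spanned by the loaded subcarriers (contiguity assumed).
        "occupied_bandwidth_hz": n_loaded * spacing_hz,
        # Fraction of each DMT frame spent on the cyclic prefix.
        "cp_overhead": cp_len / (ifft_size + cp_len),
        # DMT frames transmitted per second.
        "frame_rate_hz": sample_rate_hz / (ifft_size + cp_len),
    }


if __name__ == "__main__":
    for label, n_loaded in (("OBtB", 220), ("10-km MCF", 210)):
        p = dmt_parameters(92e9, 1024, 16, n_loaded)
        print(
            f"{label}: spacing {p['subcarrier_spacing_hz'] / 1e6:.2f} MHz, "
            f"occupied {p['occupied_bandwidth_hz'] / 1e9:.2f} GHz, "
            f"CP overhead {100 * p['cp_overhead']:.2f} %"
        )
```

Under these assumptions the loaded band stays just below 20 GHz in both cases, which is of the same order as the roughly 22-GHz VCSEL bandwidth reported above.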
The system nonlinear impairments mainly come from the VCSEL's chirp and its interaction with the fiber dispersion. In addition, other factors, such as the high peak-to-average power ratio of the DMT signal, inter-subcarrier mixing in the square-law detection of the photodetector, and saturation of the electrical amplifiers, also contribute to the nonlinearities of the short-reach optical link.

Experimental results and discussions
The kernel mapping is first used to optimize the OBtB case, which is shown in Fig. 4. The original BER is measured with linear channel equalization. Here, Volterra filtering schemes that consider nonlinear distortions up to the 2nd, 3rd and 4th order are compared. In the Volterra filtering, the tap numbers of the 2nd-, 3rd- and 4th-order nonlinear kernels are set to 15, 9 and 11, respectively. In the KLMS and the KRLS, the tap number is 3. With the 2nd-order Volterra filtering, the BER decreases as the training overhead $N_L$ increases, but it remains higher than the limit of the continuously-interleaved Bose-Chaudhuri-Hocquenghem FEC (CI-BCH (1020, 988)) [23], which has a BER limit of $4.52\times10^{-3}$ [24]. Volterra filtering reaches the CI-BCH FEC limit when the 3rd-order nonlinear distortions are considered. When terms up to the 4th order are kept, the additional BER improvement is limited and comes at the expense of much higher computational complexity. Thus, the Volterra filtering with nonlinear distortions up to the 3rd order is used in the following experimental results. In contrast, the KLMS and KRLS reach a BER lower than the CI-BCH FEC limit with 4096 training samples. The 3rd-order Volterra filtering slightly outperforms the KLMS, and the KRLS outperforms the Volterra filtering. The BER performance versus the received optical power in the OBtB case is shown in Fig. 5. The achieved line rate at OBtB is 91.72-Gbps. The linear equalization compensates the linear impairments and reduces the BER to about $3\times10^{-2}$. The system performance is greatly improved with Volterra filtering, and the BER is reduced to the CI-BCH FEC limit of $4.52\times10^{-3}$. The KLMS reaches performance similar to that of the Volterra filtering, and the KRLS outperforms the Volterra filtering. The BER performance after the 10-km MCF is shown in Fig. 6. The received optical power is 7-dBm.
With kernel mapping, the BER of all 7 cores can be reduced below the CI-BCH FEC limit. The KLMS performs similarly to the Volterra filtering, while the KRLS achieves better performance. The achieved line rates for the 7 cores are 80.51-Gbps, 72.86-Gbps, 73.36-Gbps, 87.91-Gbps, 78.24-Gbps, 71.77-Gbps, and 76.03-Gbps, respectively. The total system capacity with the 10-km 7-core fiber is 540.68-Gbps (net-rate 505.31-Gbps). The kernel mapping carries out an infinite-dimensional calculation from the HS to the RKHS, so the proposed KLMS and KRLS avoid truncation operations. This is the main reason that the kernel mapping based methods can achieve similar or even better performance than the Volterra filtering scheme, which needs truncation operations in the HS to keep the implementation complexity acceptable. When the modulation format changes from multi-carrier modulation to single-carrier modulation, such as pulse amplitude modulation, the kernel mapping based equalization has also been experimentally shown to be an effective method [25]. It is worth noting that the kernel mapping method has a training process to optimize the parameters. The training process needs to be carried out when the channel state changes, which does not occur frequently in static channels such as optical fiber channels.
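The aggregate capacity figure can be cross-checked from the per-core line rates listed above. The short sketch below sums the seven rates and reports the ratio between the stated net rate and the total line rate. The observation that this ratio matches a 7% overhead is an inference from the numbers, not something stated in the text, and the variable and function names are hypothetical.

```python
# Per-core line rates in Gbps, as listed for the 10-km 7-core MCF case.
CORE_LINE_RATES_GBPS = [80.51, 72.86, 73.36, 87.91, 78.24, 71.77, 76.03]
STATED_NET_RATE_GBPS = 505.31


def total_line_rate(rates):
    """Aggregate line rate over all cores."""
    return sum(rates)


def net_to_line_ratio(net_rate, line_rate):
    """Fraction of the line rate that remains as net rate."""
    return net_rate / line_rate


if __name__ == "__main__":
    total = total_line_rate(CORE_LINE_RATES_GBPS)
    ratio = net_to_line_ratio(STATED_NET_RATE_GBPS, total)
    print(f"total line rate: {total:.2f} Gbps")
    print(f"net/line ratio:  {ratio:.4f}")
    # Assumption for illustration: a 7 % overhead would give total / 1.07.
    print(f"total / 1.07:    {total / 1.07:.2f} Gbps")
```

The sum reproduces the 540.68-Gbps total, and dividing it by 1.07 reproduces the 505.31-Gbps net rate to two decimals.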
Considering the storage complexity of kernel methods, the KLMS algorithm uses stochastic gradient descent to obtain the coefficients, and it has linear storage complexity in terms of the number of iterations $N_L$, i.e. $O(N_L)$ [17,26]. The KRLS algorithm, on the other hand, calculates the coefficients by solving a least-squares problem, involving the inversion of a kernel matrix whose dimension depends on the number of iterations. Thus, it has quadratic storage complexity in terms of $N_L$, i.e. $O(N_L^2)$ [17,26]. Moreover, the number of multiplications of the kernel methods mainly depends on the size of the feature space F that is composed of the feature vectors $\boldsymbol{\phi}(\cdot)$ of the RKHS. A factor that influences the size of F is the training overhead $N_L$. The number of multiplications of the KLMS is equal to $N_L$, while that of the KRLS is $N_L^2$ [17,27,28]. In comparison, the number of multiplications of the Volterra series equalizer is $\sum_{p=1}^{P} (M_p)^p$, where $M_p$ is the tap number (i.e. memory length) of the p-th order nonlinear kernel. Table 1 shows the number of multiplications of the different algorithms. According to the experimental results shown in Fig. 4, the BER performance saturates when the training overhead exceeds 4096 for both the KLMS and KRLS. Therefore, we take this value to compare the kernel methods with the Volterra filtering.
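The multiplication counts discussed above can be evaluated directly. The sketch below implements the three expressions as stated in the text. The linear (first-order) tap number of the Volterra equalizer is not given here, so the example only sums the nonlinear orders whose tap numbers are stated (15, 9 and 11), which means the printed values are illustrative and are not claimed to reproduce Table 1. All function names are hypothetical.

```python
def volterra_multiplications(taps_by_order):
    """Sum of (M_p)^p over the given orders p, with M_p the tap number."""
    return sum(taps ** order for order, taps in taps_by_order.items())


def klms_multiplications(n_l):
    """KLMS multiplication count as stated in the text: N_L."""
    return n_l


def krls_multiplications(n_l):
    """KRLS multiplication count as stated in the text: N_L squared."""
    return n_l ** 2


if __name__ == "__main__":
    n_l = 4096  # training overhead at which the BER saturates
    # Nonlinear orders only; the first-order tap number is not specified.
    up_to_3rd = volterra_multiplications({2: 15, 3: 9})
    up_to_4th = volterra_multiplications({2: 15, 3: 9, 4: 11})
    print("Volterra, orders 2-3:", up_to_3rd)
    print("Volterra, orders 2-4:", up_to_4th)
    print("KLMS:", klms_multiplications(n_l))
    print("KRLS:", krls_multiplications(n_l))
```

Adding the 4th-order kernel multiplies the nonlinear term count by more than an order of magnitude, which illustrates why the order is usually truncated at 3.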
It can be seen that as the feature space grows, the kernel functions require a higher number of multiplication operations, which directly reflects the computational complexity. Although the KRLS requires less training overhead for a given BER level, its higher computational complexity may hinder its practical viability. In [17,29], the authors have analyzed sparsification techniques, such as the novelty criterion (NC), the coherence criterion, the quantization criterion and the surprise criterion, to reduce the size of the feature space and thus decrease $N_L$. The theoretical analysis has shown that sparsification techniques can greatly reduce the complexity of kernel mapping based equalization techniques. In [28], the number of multiplications of the KLMS decreased from 5000 to 150 after NC sparsification. Therefore, sparsification techniques could be an interesting direction for developing kernel mapping for channel equalization in short-reach optical links. Moreover, the mapping process in Eq. (4a) uses exponentiation, which might require an application specific integrated circuit (ASIC) design in the implementation. The authors in [30] exploited an exponentially large quantum state space through controllable entanglement and interference to use the quantum state space as the feature space. It is envisioned that the exponentiation of the kernel mapping process could be handled well by future quantum computers [30].
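To make the sparsification idea concrete, the sketch below adds a novelty-criterion test to a KLMS loop: a new sample becomes a kernel centre only if it is far enough from every stored centre and its prediction error is large enough; otherwise it is discarded. This is a simplified illustration under stated assumptions (Gaussian kernel of the form exp(-||a-b||^2 / (2 sigma^2)), hypothetical thresholds `delta_dist` and `delta_err`, hypothetical function name `klms_nc_train`), not the procedure evaluated in the cited works.

```python
import numpy as np


def klms_nc_train(X, d, eta=0.2, sigma=1.0, delta_dist=0.1, delta_err=0.05):
    """KLMS with a novelty-criterion dictionary (illustrative sketch).

    Returns the retained centres and their coefficients.
    """
    X = np.asarray(X, dtype=float)
    d = np.asarray(d, dtype=float)
    centres = [X[0]]
    coeffs = [eta * d[0]]
    for x_k, d_k in zip(X[1:], d[1:]):
        C = np.stack(centres)
        sq_dist = np.sum((C - x_k) ** 2, axis=1)
        # Prediction error with the current (sparse) dictionary.
        e_k = d_k - float(np.dot(coeffs, np.exp(-sq_dist / (2.0 * sigma ** 2))))
        # Novelty test: far from all centres AND poorly predicted.
        if np.sqrt(sq_dist.min()) > delta_dist and abs(e_k) > delta_err:
            centres.append(x_k)
            coeffs.append(eta * e_k)
    return np.stack(centres), np.array(coeffs)


if __name__ == "__main__":
    # Two input points repeated 50 times each: only two centres are needed.
    X = np.tile(np.array([[0.0, 0.0], [1.0, 1.0]]), (50, 1))
    d = np.tile(np.array([1.0, -1.0]), 50)
    centres, coeffs = klms_nc_train(X, d)
    print("training samples:", len(d), "retained centres:", len(centres))
```

Since the cost of evaluating the equalizer scales with the number of retained centres, a smaller dictionary translates directly into fewer kernel evaluations per output sample, at the price of discarding the information in the rejected samples.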

Conclusions
To conclude, we introduce the 'kernel trick' to optical short-reach communication systems to efficiently mitigate nonlinear impairments. From the signal processing perspective, the kernel mapping uses an inner product calculation instead of a direct high-dimensional mapping scheme, addressing the nonlinear impairments in a high-dimensional manner while avoiding the 'curse of dimension'. For a short-reach IM/DD system, the experimental results have demonstrated that kernel mapping achieves a BER below the CI-BCH FEC limit, with performance comparable to that of the Volterra filtering scheme.

A.1. Hilbert space (HS)
Considering an inner product signal space H with an infinite orthonormal basis set $\{\mathbf{x}_{b,i}\}_{i=1}^{\infty}$, the signal vector $\mathbf{x}$ in H can be expressed as in (19), where $a_i$ is the coefficient. Once $a_i$ meets the conditions in (20), the distance between two vectors in the inner product space H becomes meaningful and the matrix norm is orthonormal. Such an inner product space H is complete and is also referred to as an HS. The conventional signal analysis and processing in optical communications are normally performed in the HS. The training data set is assumed to be $\{\mathbf{x}(k), d(k)\}|_{k=1,2,\ldots,N}$, where $\mathbf{x}(k)$ is the received signal vector at the k-th sampling point in the HS, $d(k)$ is the transmitted signal at the k-th sampling point, and $N$ is the training data size. The target of recovering the transmitted signal over a fiber link at the receiver in the HS is given in (21), where $\mathbf{W}(k)$ is the linear transformation (i.e. coefficient matrix) of the received signal vector $\mathbf{x}(k)$.

A.2. Reproducing kernel Hilbert space (RKHS)
Different from the conventional equalization methods, when kernel mapping is introduced in optical short-reach communications the signal space is changed to the RKHS. We define a function $\kappa(\mathbf{x}_i, \mathbf{x}_j)$ whose inputs are two t-dimensional vectors $\mathbf{x}_i$ and $\mathbf{x}_j$ in H and whose output is a real number. $\kappa(\mathbf{x}_i, \mathbf{x}_j)$ is a kernel function if and only if the Gram matrix shown in (22a) is a positive semi-definite matrix, which is defined as (22b).
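The Gram-matrix condition can be checked numerically for a given set of vectors. The sketch below builds the Gram matrix of a Gaussian kernel, assumed here in the form exp(-||a-b||^2 / (2 sigma^2)), and tests positive semi-definiteness through the eigenvalues of the symmetrised matrix. The helper names `gaussian_gram_matrix` and `is_positive_semidefinite` and the tolerance are hypothetical choices.

```python
import numpy as np


def gaussian_gram_matrix(X, sigma=1.0):
    """Gram matrix G[i, j] = kappa(x_i, x_j) for the rows x_i of X."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))


def is_positive_semidefinite(G, tol=1e-10):
    """True if all eigenvalues of the symmetrised matrix are >= -tol."""
    G = np.asarray(G, dtype=float)
    eigenvalues = np.linalg.eigvalsh((G + G.T) / 2.0)
    return bool(np.all(eigenvalues >= -tol))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 3))  # twenty 3-dimensional vectors
    G = gaussian_gram_matrix(X)
    print("Gaussian Gram matrix PSD:", is_positive_semidefinite(G))
    # A symmetric matrix that is not a valid Gram matrix, for contrast.
    print("counter-example PSD:", is_positive_semidefinite([[0.0, 1.0], [1.0, 0.0]]))
```

A passing check on sampled data does not prove that a function is a valid kernel, but a failing check on any data set is enough to rule it out.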
As a result, the kernel function $\kappa$ is a continuous, symmetric and positive function defined in the space H, also referred to as a 'Mercer kernel'. The Gaussian kernel [17] is one of the representative kernel functions and can be expressed as in (23), where the parameter $\sigma$ is the bandwidth of the Gaussian kernel. Assume H to be any vector space of all real-valued functions generated by the kernel $\kappa$. Functions $f$ and $g$ from H are defined as follows:

$$f = \sum_{i=1}^{n} \alpha_i\, \kappa(\mathbf{x}_i, \cdot), \qquad g = \sum_{j=1}^{m} \beta_j\, \kappa(\mathbf{x}_j, \cdot), \qquad \forall n, m \in \mathbb{N} \tag{24}$$

where $\alpha_i$ and $\beta_j$ are the coefficients of the vectors $\mathbf{x}_i$ and $\mathbf{x}_j$ defined in the domain H. The bilinear form of $f$ and $g$ is defined in (25). Since the bilinear form in (25) satisfies the symmetry, linearity and positive definiteness properties [20], (25) can be considered as the inner product belonging to the space H. Besides, it also meets the condition of the 'reproducing' property, expressed in (26). Thus, the kernel function defined in the space H meets the conditions of both the inner product and the 'reproducing' property. Such a function is recognized as the 'reproducing kernel function', and the space composed of the kernel functions is called the RKHS. The vector signal $\mathbf{x}$ in the HS is mapped to the RKHS by the kernel function $\kappa(\cdot, \mathbf{x})$, and the linear processing mechanism $f + g$ is transformed to an inner product calculation $\langle f, g \rangle$. Instead of (21), in the RKHS the target of the recovered signal over a fiber link at the k-th sampling point is given in (27). The optimal solution of (27) can be expressed as:

$$f = \sum_{k=1}^{N} a_k\, \kappa(\cdot, \mathbf{x}(k)) \tag{28}$$

where $a_k$ are the optimized coefficients, and $N$ is the length of the training data $\{\mathbf{x}(k), d(k)\}|_{k=1,2,\ldots,N}$. Although the functions in the RKHS are expansions with an arbitrary number of variables, the result can be expressed with the training data set only [10,12]. Hence, the optimization problem with an arbitrary number of variables is transformed into one with $N$ variables. Finally, we arrive at (1) in Section II to continue the derivations.