Optica Publishing Group

Complex principal component analysis-based complex-valued fully connected NN equalizer for optical fibre communications

Open Access

Abstract

An increasing number of schemes have been proposed to mitigate the Kerr nonlinearity that restricts the transmission capacity of optical fibres. In this paper, we propose a complex principal component analysis-based complex-valued fully connected neural network (P-CFNN) model aided by perturbation theory and demonstrate it experimentally on a dual-polarization 64-quadrature-amplitude modulation coherent optical communication system. What we believe to be a novel complex principal component analysis ($\mathbb {C}$PCA) algorithm applied to a complex-valued fully connected neural network (CFNN) is designed to further reduce the computational complexity of the model. Meanwhile, an equivalent real-valued fully connected neural network (RFNN) with the same time complexity as the CFNN is proposed for fair performance comparison. Under all launched optical powers, the P-CFNN equalizer performs best among all compared algorithms, and the maximum ΔQ-factor over the case without nonlinear compensation reaches 3.94 dB. In addition, under the constraint of the same Q-factor, we confirmed that the proposed P-CFNN achieves a 40% reduction in time complexity and a 70% reduction in space complexity compared with the PCA-based RFNN, which demonstrates the strong application prospects of the P-CFNN equalizer in optical fibre communication systems.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The large amount of information transported over optical fibre deeply affects this digitally interconnected world [1]. Since optical signals suffer various linear and nonlinear impairments during transmission in optical communication systems, the capacity of optical channels is under increasing pressure [2–6]. Currently, many mainstream digital signal processing (DSP) algorithms can address most linear impairments [7], such as chromatic dispersion (CD) and polarization mode dispersion (PMD) [8]. In actual optical signal transmission, owing to the nonlinear characteristics of optical fibre channels, high-order modulation signals [9] are more susceptible to nonlinear impairments such as self-phase modulation, cross-phase modulation, and four-wave mixing at high launched optical powers (LOPs). To mitigate these nonlinear effects, nonlinear compensation (NLC) algorithms have been introduced for digital coherent detection [10,11]. Because the transmission characteristics of signals in optical fibres can be described by the nonlinear Schrödinger equation (NLSE), several classical DSP techniques have been proposed, such as digital back-propagation (DBP) [12], the Volterra series nonlinear equalizer (VNLE) [13] and perturbation theory-based NLC (PB-NLC) [14]. PB-NLC is typically developed from the first-order perturbation of the NLSE; compared with DBP and VNLE, its reduced computational complexity is generally attributed to the single-step perturbation-based pre/postdistortion (PPD) algorithm.

With the emergence of artificial intelligence, machine learning algorithms based on data-driven computing have been widely applied in the domain of optical fibre communications, and an increasing number of studies have explored the great potential of machine learning to overcome optical fibre nonlinearity [15–19]. In contrast to the classical algorithms mentioned above, which rely on precise transmission link information, a class of nonlinear models called neural networks (NNs) only require a large number of optical signals at the receiver side to train the networks to establish the mapping relationship between input and output [20–22]. The learned mapping is then employed at the execution stage to mitigate nonlinear effects. In [23], the authors verified the effectiveness of a nonlinear equalizer based on a bidirectional long short-term memory (Bi-LSTM) neural network in digital coherent optical communication systems. In contrast, we proposed a nonlinear equalizer based on a bidirectional gated recurrent unit (Bi-GRU) neural network to further reduce the computational complexity from the perspective of network structure optimization [24]. Nevertheless, the computational complexity of these algorithms remained too high until Shaoliang Zhang et al. creatively proposed a nonlinear equalizer based on NNs driven by perturbation features [25], which greatly promoted the application of NNs as a technology to mitigate nonlinear impairments in digital coherent optical communication systems. On this basis, the principal component analysis (PCA) technique was applied to a perturbation-aided deep neural network to further improve system performance [26].

It is well known that digital signals are usually represented in complex form in optical signal processing for optical fibre communication systems. However, most existing real-valued neural network frameworks ignore the correlation between the real and imaginary parts of complex signals. Therefore, a class of complex-valued neural networks (CVNNs), which use complex-valued parameters and variables to process information, has emerged [27,28]. In [27], the authors proposed a scheme for optical signal processing at the communication system receiver based on complex-valued fully connected neural networks and demonstrated the significant advantages of CVNNs. In addition, a ‘complex ReLU’ activation function was applied to channel equalization for the first time in [28]. These studies demonstrate the broad application prospects of CVNNs in the field of optical communication.

To the best of our knowledge, this paper is the first to apply the combination of perturbation theory and CVNNs to nonlinear equalization in coherent optical communication. The CVNN framework we use is based on fully complex forms of both neuron states and weight coefficients [29], which allows more natural processing of the received symbols in optical fibre communication systems because they are intrinsically complex. Furthermore, to reduce the space-time complexity, we extend the PCA method to the complex domain to fit our designed framework, which we call complex PCA ($\mathbb {C}$PCA). Based on the above, we propose a $\mathbb {C}$PCA-based complex-valued fully connected neural network (P-CFNN) model driven by perturbation theory to mitigate nonlinear effects in optical fibre communication systems. In summary, the content and innovations of our work mainly include the following:

  • Construction of data features: We use first-order perturbation theory to analyse received symbols to obtain nonlinear impairment features, and we specifically propose a $\mathbb {C}$PCA technique to reduce the overall dimension of the perturbation eigenvalues in complex form, which preserves the complex form and can directly drive the CFNN.
  • Experimental verification: We built an experimental platform to verify our P-CFNN equalizer: a 375 km 120-Gbit/s dual-polarization 64-quadrature-amplitude modulation (DP-64 QAM) coherent optical communication system. We then examined the influence of the number of hidden layer neurons and of the complex nonlinear activation functions on the performance of the equalizer according to the change in quality factor (Q-factor), and ultimately selected an appropriate activation function and network structure for the experimental platform. In addition, we demonstrated the generalization ability of the proposed equalizer under different LOPs.
  • Performance analysis: We propose an equivalent real-valued fully connected neural network (RFNN) with the same computational complexity as the CFNN to fairly compare network performance, and we discuss the superiority of the overall complexity of the P-CFNN scheme over other schemes. Under the constraint of the same Q-factor, we confirmed that the proposed P-CFNN achieves a 40% reduction in time complexity and a 70% reduction in space complexity compared with the PCA-based RFNN (P-RFNN).

The remainder of this paper is organized as follows. In Sec. 2, we review the first-order perturbation analysis principle and introduce the $\mathbb {C}$PCA and CFNN design methods. Section 3 introduces the experimental setup and results of our P-CFNN equalizer and analyses the complexity and other performance indicators. The last section summarizes this paper.

2. Principles

In this section, we review the optical pulse propagation principle and discuss the theoretical basis of $\mathbb {C}$PCA and how we designed the CFNN equalizer.

2.1 Perturbation theory analysis

For our polarization multiplexing coherent optical communication transmission system, the propagation of the optical pulse can be accurately modelled by the well-known Manakov equation:

$$\begin{aligned} &\frac{{\partial {U_{x/y}}\left(z,t\right)}}{{\partial z}} + \frac{\alpha }{2}{U_{x/y}}\left(z,t\right) + j\frac{{{\beta _2}}}{2}\frac{{{\partial ^2}}}{{\partial {t^2}}}{U_{x/y}}(z,t)\\ &= j\gamma \frac{8}{9}\left( {{{\left| {{U_x}(z,t)} \right|}^2} + {{\left| {{U_y}\left(z,t\right)} \right|}^2}} \right){U_{x/y}}(z,t), \end{aligned}$$
where ${U_{x/y}}\left (z,t\right )$ represents the complex-valued signal envelope for the $x$- or $y$-polarization at retarded time $t$ and distance $z$ in an optical fibre, $\alpha$ is the linear attenuation coefficient, ${\beta _2}$ is the group velocity dispersion coefficient and $\gamma$ is the Kerr nonlinearity coefficient [30]. According to Eq. (1), we can obtain the nonlinear perturbation terms for the specified symbol based on first-order perturbation theory, abbreviated as follows [5,31]:
$$\left\{ \begin{array}{l} \Delta U_{x}=P_{0}^{3/2}\sum\limits_{m,n}C_{m,n}T_x\\ \Delta U_{y}=P_{0}^{3/2}\sum\limits_{m,n}C_{m,n}T_y\\ T_x=X_{m}X^*_{m+n}X_{n}+X_{m}Y^*_{m+n}Y_{n}\\ T_y=Y_{m}Y^*_{m+n}Y_{n}+Y_{m}X^*_{m+n}X_{n} \end{array}\right.,$$
where $\Delta U_{x}$ and $\Delta U_{y}$ are nonlinear perturbation terms in the $x$- and $y$-polarization, respectively. $m$, $n$, and $m+n$ are symbol indices with respect to the specified symbol, and $^*$ expresses the complex conjugation. $P_{0}$ is the peak power of the transmitted signal, and $C_{m,n}$ denotes the perturbation constant coefficients. In the conception of this study, machine learning algorithms can be regarded as special tools for $C_{m,n}$ fitting from the received symbols without the link information. In Eq. (2), the triplets of intrachannel four-wave mixing and intrachannel cross-phase modulation are defined by $T_x$ and $T_y$ in $x$-polarization and $y$-polarization, respectively.
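The triplet construction of Eq. (2) can be sketched in plain Python. The function name, the symbol-sequence arguments and the per-symbol index $k$ are illustrative assumptions, not from the paper:

```python
def perturbation_triplets(X, Y, k, pairs):
    """Build the intrachannel triplets T_x, T_y of Eq. (2) for symbol index k.

    X, Y  : complex symbol sequences for the x- and y-polarization.
    pairs : iterable of (m, n) index offsets around symbol k.
    Returns two lists of complex triplets, one entry per (m, n) pair.
    """
    Tx, Ty = [], []
    for m, n in pairs:
        xm, xn, xmn = X[k + m], X[k + n], X[k + m + n]
        ym, yn, ymn = Y[k + m], Y[k + n], Y[k + m + n]
        # T_x = X_m X*_{m+n} X_n + X_m Y*_{m+n} Y_n
        Tx.append(xm * xmn.conjugate() * xn + xm * ymn.conjugate() * yn)
        # T_y = Y_m Y*_{m+n} Y_n + Y_m X*_{m+n} X_n
        Ty.append(ym * ymn.conjugate() * yn + ym * xmn.conjugate() * xn)
    return Tx, Ty
```

The perturbation terms $\Delta U_{x/y}$ then follow by weighting these triplets with $P_0^{3/2}C_{m,n}$ and summing over the $(m,n)$ pairs.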

Selecting meaningful and discriminative features is the central pursuit of feature engineering, and $m$ and $n$ are the key factors that determine the triplets. Because of the cost of the actual calculation, the smallest index range with the greatest impact on the specified symbol should be considered first. In this study, we adopt the following rules to define $m$ and $n$ [32]:

$$\left\{ \begin{array}{l} \left| m \right| + \left| n \right| \leq K\\ \left| {m \times n} \right| \leq K \end{array} \right.,$$
where $K$ is a hyperparameter that controls the number of triplets. Figure 1 shows the combination of indices selected for $m$ and $n$, where the central blue highlighted points are the selected triplets.
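The index set of Eq. (3) can be enumerated directly; a minimal sketch (the function name is ours):

```python
def select_triplet_indices(K):
    """Enumerate the (m, n) offsets of Eq. (3): |m| + |n| <= K and |m * n| <= K."""
    return [(m, n)
            for m in range(-K, K + 1)
            for n in range(-K, K + 1)
            if abs(m) + abs(n) <= K and abs(m * n) <= K]
```

The resulting set is symmetric in $m$ and $n$, matching the symmetric pattern of Fig. 1.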

Fig. 1. The highlighted points are selected as triplets.

2.2 PCA extension in complex fields

The PCA method is one of the most widely used dimension reduction algorithms. For high-dimensional feature data, dimension reduction preserves the most important features while removing noise and unimportant features, which saves a large amount of computing time within a certain range of information loss. Such feature extraction generally requires that different categories of features remain well separated after the original features are transformed.

As shown in Fig. 2, we assume that the $T_x$ and $T_y$ obtained through perturbation analysis comprise $p$ samples with $q$-dimensional features each, forming a complex matrix ${T}_{p\times q}$. The main idea of PCA is to map the $q$-dimensional features onto reconstructed $r$-dimensional features, which are brand-new orthogonal features called the principal components. The objective of PCA is to find a new set of orthogonal bases (i.e., $H=\left \{h_{1},h_{2},{\ldots }h_{r}\right \},r< q$) onto which the data points are projected. In signal processing, signals are considered to have large variances and noise to have small variances. Thus, the best $r$-dimensional features are those for which the variances of the projections of the $q$-dimensional samples reach their maximum values.

Fig. 2. $\mathbb {C}$PCA diagram. The real part of $T_{p\times q}$ is used to fit the dimension reduction matrix $H_{q\times r}$, which is then adopted to transform the dimension of its imaginary part.

In [26], the authors converted the imaginary part of the complex feature dimension of ${T}_{p\times q}$ to real values and stacked it with the real part of ${T}_{p\times q}$ to form data points of dimension $2q$. They then directly reduced the dimension of the data points to $2r$ with PCA, ignoring the physical correlation between the real and imaginary parts of the triplets. In practical coherent optical communication systems, the in-phase (I) and quadrature (Q) components of a signal collectively form an equivalent complex baseband model. The symbols of both I/Q channels come from the same light source and modulator and are transmitted through the same optical fibre; therefore, the data in the I/Q channels share the same physical characteristics in the same experimental environment. In this context, we can arbitrarily select the real or imaginary parts of the perturbation triplets of the signal to fit the dimension reduction matrix and then use this matrix to transform ${T}_{p\times q}$. Based on this idea, we propose a $\mathbb {C}$PCA technique suited to the CFNN framework, which can be applied to the complex-valued perturbation triplets while preserving their complex structure. We denote the real part of the triplets as $s_{i} (1\leq i\leq p)$; the proposed $\mathbb {C}$PCA first projects data point $s_{i}$ onto base $h_{j} (1\leq j\leq r)$. Its projection distance is $s_{i}^{\mathrm {T}}h_{j}$, where $\mathrm {T}$ denotes the matrix transpose. The variance $D_{j}$ of all data projected on this base is expressed as:

$$\begin{aligned} D_{j}=\frac{1}{p}\sum\limits_{i=1}^{p}(s_{i}^{\mathrm{T}}h_{j}-s_{center}^{\mathrm{T}}h_{j})^{2} & =\frac{1}{p}h_{j}^{\mathrm{T}}SS^{\mathrm{T}}h_{j}, \end{aligned}$$
where $S$ is the matrix of centred samples $s_{i}-s_{center}$ and the covariance matrix is $G=\frac {1}{p}SS^{\mathrm {T}}$. Then, we calculate the $h_{j}$ corresponding to the maximum of $D_{j}$ using a Lagrange multiplier:
$$\left\{ \begin{array}{l} Gh_{j}=\lambda _{j}h_{j}\\ D_{j}=h_{j}^{\mathrm{T}}\lambda _{j}h_{j}=\lambda _{j} \end{array} \right..$$

The maximum variance is the maximum value of eigenvalue $\lambda _{j}$ of the covariance matrix, and the optimal projection direction is the eigenvector corresponding to the maximum eigenvalue. Finally, we take the first $r$ maximum eigenvectors to fit the dimension reduction matrix $H_{q\times r}$ of the real part data of $T_{p\times q}$, as shown in Fig. 2:

$$H_{q\times r}=\left\{h_{1},h_{2},{\ldots}h_{r}\right\}_{q\times r}.$$

Then, the fitted dimension reduction matrix is applied to the imaginary part of $T_{p\times q}$ to transform the dimensions. Consequently, the real and imaginary components of the reduced dimension $T_{p\times r}$ can be expressed as:

$$\left\{ \begin{array}{l} Re(T_{p\times r})=Re(T_{p\times q})\cdot H_{q\times r}\\ Im(T_{p\times r})=Im(T_{p\times q})\cdot H_{q\times r} \end{array} \right..$$

Considering the inherent connection between the real and imaginary parts of $T_{p\times q}$, this $\mathbb {C}$PCA-based dimension reduction of the complex eigenvalues is more reasonable in terms of physical interpretability and fits our CFNN equalizer.
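The whole $\mathbb {C}$PCA procedure of Eqs. (4)–(7), fitting $H$ on the real part and applying it to both parts, can be sketched with NumPy. Names are illustrative, and samples are stored as rows here, whereas the derivation above uses column vectors:

```python
import numpy as np

def cpca_fit_transform(T, r):
    """CPCA sketch: T is a p x q complex matrix of triplet features.

    The projection H (q x r) is fitted on the centred real part only,
    then the same H transforms both real and imaginary parts (Eq. (7)).
    """
    S = T.real - T.real.mean(axis=0)    # centred real-part samples (p x q)
    G = (S.T @ S) / T.shape[0]          # covariance matrix (q x q), cf. Eq. (4)
    eigval, eigvec = np.linalg.eigh(G)  # eigendecomposition, Eq. (5); ascending order
    H = eigvec[:, ::-1][:, :r]          # top-r eigenvectors -> H_{q x r}, Eq. (6)
    return T.real @ H + 1j * (T.imag @ H), H
```

Because `np.linalg.eigh` returns eigenvalues in ascending order, the columns are reversed before the first $r$ eigenvectors are taken.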

2.3 Complex-valued neural network design

In this section, we present the design of our mathematical framework for CFNNs. The preceding subsections explained how the proposed $\mathbb {C}$PCA reduces the dimension of the perturbation features; the complex-valued $T_{p\times r}$ is then fed into the input layer of the CFNN we designed. In our proposed framework, a complex number is represented as a complex variable $z$, and the complex neuron function can be defined as $f(z)=u(z) +jv(z)$, where $u$ and $v$ are the real and imaginary parts of $f$, respectively. Complex-valued activation functions adopt the following real-imaginary type:

$$g_\mathbb{C}(f) =g\left ( Re\left ( f\right )\right ) +jg\left ( Im\left ( f\right )\right ),$$
where $g_\mathbb {C}$ is a complex-valued activation function and $g$ is a real-valued activation function applied separately to the real and imaginary parts of the complex neuron function. When training a neural network, the back-propagation process typically employs gradient descent to update the weights. However, at the early stage of CVNN research, the complex-valued activation function was constrained to be holomorphic according to Liouville’s theorem [33], which hindered the generalization of back-propagation to complex fields. After Wirtinger calculus extended the concept of complex derivatives to nonholomorphic functions [34], it was found that functions whose real and imaginary parts are differentiable with respect to each parameter are also compatible with back-propagation. Figure 3 shows the CFNN architecture for the training process, in which the activation functions, weight initialization and loss function are all complex-valued. We adopt an extended exponential linear unit (ELU) activation function in the CFNN [35]. It is an evolution of the rectified linear unit (ReLU) activation function that keeps the activation in a noise-robust state, and it can be expressed as:
$$\begin{aligned} ELU=\left\{ \begin{array}{ll} a, & a>0\\ \mu\left ( e^{a}-1\right ), & a\leq 0\\ \end{array} \right., \end{aligned}$$
where $\mu$ is a hyperparameter that scales the negative section, typically set to 0.1. Consequently, we employ a complex ELU ($\mathbb {C}$ELU), which applies separate ELUs to the real and imaginary components of a neuron, as the CFNN hidden layer activation function, i.e.,
$$\mathbb{C}ELU=ELU\left ( Re\left ( z\right )\right ) +jELU\left ( Im\left ( z\right )\right ).$$
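A minimal sketch of the $\mathbb {C}$ELU of Eqs. (9) and (10), with $\mu = 0.1$ as stated in the text (function names are ours):

```python
import math

def elu(a, mu=0.1):
    """Real ELU of Eq. (9); mu scales the negative branch."""
    return a if a > 0 else mu * (math.exp(a) - 1.0)

def celu(z, mu=0.1):
    """Complex ELU of Eq. (10): separate ELUs on Re(z) and Im(z)."""
    return complex(elu(z.real, mu), elu(z.imag, mu))
```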

For weight initialization, we follow the Glorot criterion [36], which initializes the weights so that the variances of the layer inputs, outputs and their gradients are kept equal. In addition, the equalizers in this paper mainly address classification tasks, so we use the $\mathbb {C}SoftMax$ function as the output activation function to map the probability of each category, which is expressed as:

$$\mathbb{C}SoftMax(z_{i})=\frac{e^{\left | z_{i}\right |}}{\sum\limits_{c=1}^{C}e^{\left | z_{c}\right |}},$$
where $z_{i}$ is the output value of the $i$-th ($i =1, 2, 3,{\ldots }, C$) neuron and $C$ is the number of output nodes (i.e., the number of classes) in the output layer. In this paper, we adopt a DP-64 QAM signal in our experiment, so the output layer has 64 nodes. To avoid local optimal solutions as much as possible, we use adaptive moment estimation (Adam) as our optimizer.
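The $\mathbb {C}SoftMax$ of Eq. (11) acts on the magnitudes of the complex outputs; a stdlib-only sketch, with the usual max-subtraction trick for numerical stability (which leaves Eq. (11) unchanged):

```python
import math

def csoftmax(z):
    """CSoftMax of Eq. (11): a real softmax over |z_i| for complex outputs z."""
    mags = [abs(zi) for zi in z]
    m = max(mags)                           # subtract the max for stability
    exps = [math.exp(mi - m) for mi in mags]
    s = sum(exps)
    return [e / s for e in exps]
```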

Fig. 3. The CFNN architecture consists of complex-valued features in the input layer, two fully connected hidden layers, and back-propagation constantly updating the weights.

3. Experimental results

3.1 Experimental setup

This section introduces the experimental platform and demonstrates the feasibility of the CFNN equalizer in a 120-Gbit/s coherent optical fibre transmission system. The experimental setup of the DP-64 QAM coherent optical communication system is shown in Fig. 4. On the transmitter side, the symbol sequence of the pulse-shaped 64-QAM signal generated in MATLAB is uploaded to an arbitrary waveform generator (AWG) at a sampling rate of 25 GSa/s. Each analogue signal is amplified by an electric amplifier (EA) and then sent into an in-phase/quadrature (I/Q) modulator. The nominal linewidth of the external cavity laser (ECL) is 100 kHz. A polarization-multiplexed signal is synthesized using a polarization-maintaining optical coupler (PM-OC), which splits the signal into two channels: one through the delay line (DL) and the other through the polarization controller (PC). The transmission link consists of multiple spans of G.652D standard single-mode fibre (SSMF) with a total length of $5 \times 75$ km. The SSMF chromatic dispersion and nonlinear index are $21.6676 \times 10^{-24}\ \mathrm{s^2/km}$ and $1.3\ \mathrm{/(W \cdot km)}$, respectively.

Fig. 4. Optical communication experimental platform for 120-Gb/s DP-64 QAM. AWG: arbitrary waveform generator; EA: electric amplifier; ECL: external cavity laser; DL: delay line; PC: polarization controller; PM-OC: polarization-maintaining optical coupler; VOA: variable optical attenuator; SSMF: standard single-mode fibre; EDFA: Erbium-doped fibre amplifier; LO: local oscillator; PBS: polarization beam splitter; BPD: balanced photonic detector; ADC: analogue-to-digital converter.

At the receiver side, a 100 kHz local oscillator (LO) and the output signal are both sent to a coherent receiver, which is composed of polarization beam splitters (PBSs), 90-degree optical hybrid couplers and balanced photonic detectors (BPDs). The $x$- and $y$-polarization components of the received optical signal and the local oscillator are separately combined and detected by two identical phase-diversity receivers. Finally, analogue-to-digital converters (ADCs) with a sampling rate of 100 GSa/s perform the analogue-to-digital conversion. The offline DSP chain includes a low-pass filter (LPF), which removes the high-frequency components generated by the demodulator, I/Q imbalance compensation (I/Q-IC), chromatic dispersion compensation (CDC), clock recovery (CR), polarization demultiplexing (PD), polarization-mode dispersion (PMD) compensation, frequency offset estimation (FOE), carrier phase recovery (CPR), and the CFNN equalizer that decides the symbols. To better compensate for nonlinear impairments, the received symbols first pass through this series of linear equalization stages, which makes the nonlinear input features obtained by the neural network purer. The machine learning algorithm is therefore placed at the end of the offline DSP chain, allowing the neural network equalizer to focus on residual nonlinear impairments.

3.2 Results and discussion

In this section, we validated the performance of the perturbation theory-aided P-CFNN equalizer for 64-QAM symbol decisions in terms of nonlinear compensation. We verified the feasibility of the proposed algorithm by transmitting 64-QAM signals over a 5-span 75 km SSMF link. Figures 5(a)$\sim$(d) show the received 64-QAM constellations before the nonlinear equalizer at launched optical powers (LOPs) of −4 dBm, −1 dBm, 1 dBm and 5 dBm, respectively. The signal constellation suffers severe nonlinear distortion at high LOPs.

Fig. 5. Constellations of the received signals at different LOPs. (a) −4 dBm constellation; (b) −1 dBm constellation; (c) 1 dBm constellation; (d) 5 dBm constellation.

In practical optical communication engineering, it is only meaningful to compare the performance of NNs at an equivalent level of complexity. Currently, there is still no standard for CFNNs and RFNNs of equivalent comparative capacity. In this section, we focus on how to compare the two kinds of NNs under the same time complexity and on the resulting impact on their space complexity. Time complexity describes the relationship between the time required for algorithm execution and the amount of input data. Multipliers usually require many more logic gates than adders, which makes the computational complexity of multipliers several times higher than that of adders. Therefore, for optical fibre communication DSP algorithms that incorporate neural network structures, the simplest estimate of time complexity counts only the number of real multiplications of the algorithm. This metric is also known as the number of real multiplications per symbol (RMpS) [37]. On the other hand, space complexity describes the relationship between the storage required for algorithm execution and the amount of input data, which is usually indicated by the number of parameters.

In the following sections, $\mathbb {C}$ and $\mathbb {R}$ indicate whether a value corresponds to the complex domain or real domain, respectively. Considering that complex plane $\mathbb {C}$ is isomorphic to $\mathbb {R}^2$, one complex parameter ($P_\mathbb {C}$) is equivalent to two real parameters ($P_\mathbb {R}$) so that $P_\mathbb {C}=2P_\mathbb {R}$, and one complex multiplication ($M_\mathbb {C}$) requires four multiplications ($M_\mathbb {R}$) of real numbers, i.e., $M_\mathbb {C}=4M_\mathbb {R}$. For a general complex-valued multilayer perceptron (CV-MLP) and real-valued multilayer perceptron (RV-MLP) with $l$ hidden layers, the global number of RMpS is given by the following formula:

$$\left\{ \begin{array}{l} M_\mathbb{C}=4N_{I}^{\mathbb{C}}N_{1}^{\mathbb{C}}+4\sum_{i=1}^{l-1}N_{i}^{\mathbb{C}}N_{i+1}^{\mathbb{C}}+4N_{l}^{\mathbb{C}}N_{O}^{\mathbb{C}}\\ M_\mathbb{R}=N_{I}^{\mathbb{R}}N_{1}^{\mathbb{R}}+\sum_{i=1}^{l-1}N_{i}^{\mathbb{R}}N_{i+1}^{\mathbb{R}}+N_{l}^{\mathbb{R}}N_{O}^{\mathbb{R}} \end{array} \right.,$$
where $N_{i}$ $(i=1,{\ldots }, l)$ is the number of neurons in the $i$-th hidden layer, $N_{I}$ corresponds to the number of triplets derived from Eq. (3) and $N_{O}$ to the number of output-layer neurons.

In this study, we comprehensively consider system performance and computing speed and select two hidden layers in our framework. Unlike RFNNs, CFNNs take the triplets as complex numbers, so $N_{I}^{\mathbb {R}}=2N_{I}^{\mathbb {C}}$. For classification tasks, $N_{O}^{\mathbb {R}}=N_{O}^{\mathbb {C}}$. For our model with two hidden layers, $l$ is set to 2 in Eq. (12) and the following can be deduced:

$$\left\{ \begin{array}{l} M_\mathbb{C}=4N_{I}^{\mathbb{C}}N_{1}^{\mathbb{C}}+4N_{1}^{\mathbb{C}}N_{2}^{\mathbb{C}}+4N_{2}^{\mathbb{C}}N_{O}^{\mathbb{C}}\\ M_\mathbb{R}=N_{I}^{\mathbb{R}}N_{1}^{\mathbb{R}}+N_{1}^{\mathbb{R}}N_{2}^{\mathbb{R}}+N_{2}^{\mathbb{R}}N_{O}^{\mathbb{R}} \end{array} \right..$$

Combining the above conditions, to make $M_\mathbb {C}\approx M_\mathbb {R}$, we deliberately restrict the following conditions to achieve performance comparisons with the equivalent computational complexity:

$$\left\{ \begin{array}{l} N_{1}^{\mathbb{R}}=2N_{1}^{\mathbb{C}}\\ N_{2}^{\mathbb{R}}=2N_{2}^{\mathbb{C}} \end{array} \right..$$

Based on the hidden layer structure of Eq. (14), we can obtain the following trainable parameters that determine the space complexity:

$$\left\{ \begin{array}{l} \begin{aligned} P_\mathbb{C} & =2N_{I}^{\mathbb{C}}N_{1}^{\mathbb{C}}+2N_{1}^{\mathbb{C}}N_{2}^{\mathbb{C}}+2N_{2}^{\mathbb{C}}N_{O}^{\mathbb{C}}\\ P_\mathbb{R} & =N_{I}^{\mathbb{R}}N_{1}^{\mathbb{R}}+N_{1}^{\mathbb{R}}N_{2}^{\mathbb{R}}+N_{2}^{\mathbb{R}}N_{O}^{\mathbb{R}}\\ & =4N_{I}^{\mathbb{C}}N_{1}^{\mathbb{C}}+4N_{1}^{\mathbb{C}}N_{2}^{\mathbb{C}}+2N_{2}^{\mathbb{C}}N_{O}^{\mathbb{C}} \end{aligned} \end{array} \right..$$

Consequently, when comparing these two NNs at the same time complexity, the space complexity $P_\mathbb {R}$ is approximately twice that of $P_\mathbb {C}$.
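The complexity bookkeeping of Eqs. (13)–(15) under the constraint of Eq. (14) can be checked numerically; the input width $N_I^{\mathbb C}=100$ in the test below is an arbitrary illustrative value:

```python
def rmps_and_params(n_i_c, n1_c, n2_c, n_o):
    """Real multiplications (Eq. (13)) and real parameters (Eq. (15)) for a
    two-hidden-layer CFNN and its equivalent RFNN under Eq. (14):
    N1_R = 2*N1_C, N2_R = 2*N2_C, with N_I_R = 2*N_I_C and N_O_R = N_O_C."""
    n_i_r, n1_r, n2_r = 2 * n_i_c, 2 * n1_c, 2 * n2_c
    m_c = 4 * (n_i_c * n1_c + n1_c * n2_c + n2_c * n_o)  # CFNN RMpS
    m_r = n_i_r * n1_r + n1_r * n2_r + n2_r * n_o        # RFNN RMpS
    p_c = 2 * (n_i_c * n1_c + n1_c * n2_c + n2_c * n_o)  # CFNN parameters
    p_r = n_i_r * n1_r + n1_r * n2_r + n2_r * n_o        # RFNN parameters
    return m_c, m_r, p_c, p_r
```

With the paper's $N_1^{\mathbb C}=32$, $N_2^{\mathbb C}=64$ and $N_O=64$, the input-layer and hidden-layer multiplication counts of $M_\mathbb{C}$ and $M_\mathbb{R}$ match exactly; only the output-layer term differs, which is what the $\approx$ in the text absorbs.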

To effectively mitigate the nonlinear impairments in optical transmission systems, the triplet features of the received symbols obtained from first-order perturbation theory are input into our CFNN as driving data. When designing the overall framework, considering both performance and computing speed, we use two hidden layers with $N_1$ and $N_2$ neurons in the CFNN. Figure 6 presents the contour distributions of the global accuracy of the CFNN classifier over the values of $N_1$ and $N_2$ under different LOPs. For the M-QAM coherent optical communication system, the corresponding $BER=(1-Accuracy)/\log _{2}{M}$. As the accuracy increases, the BER decreases, but the number of hidden layer neurons also increases correspondingly. Hence, we can obtain the optimal hidden layer setting for the equalizer by balancing performance against complexity (positively correlated with the number of hidden layer neurons) under different LOPs. In the experimental stage, we generated two order-23 pseudo-random bit sequences (PRBSs) and combined them to construct a stronger random sequence that cannot be learned by the NNs [38]. The datasets for each LOP contain approximately $2^{20}$ symbols, which we split evenly into a training set (50%) and a test set (50%). During the training stage, we applied dropout at the input layer to prevent overfitting; we gradually increased the dropout probability from a small value and observed the change in model performance to find the optimal configuration. In addition, we set an initial learning rate of 0.1 with a decay rule (for example, after every 30 epochs the learning rate is multiplied by 0.1) and continually adjusted the learning rate to optimize convergence.
The data patterns used in the training and test datasets have a maximum normalized cross-correlation of 0.5% to ensure data independence. After comprehensively considering high accuracy and low computational complexity, the contour lines of the central region corresponding to the $N_1$ and $N_2$ values are considered first for selection. Finally, we chose 32 and 64 neurons in the two hidden layers of our CFNN, corresponding to the diamond boxes in Fig. 6. In addition, to avoid adverse effects of the order of data input on network training, we shuffle the data before each epoch to improve the generalization performance of the network.
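The accuracy-to-BER mapping used above is a one-liner; a sketch for the 64-QAM case (the function name is ours):

```python
import math

def ber_from_accuracy(accuracy, M=64):
    """BER = (1 - Accuracy) / log2(M) for an M-QAM symbol classifier (Sec. 3.2)."""
    return (1.0 - accuracy) / math.log2(M)
```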

Fig. 6. The hidden layer neuron numbers and corresponding accuracy rates when the LOP is (a) −4 dBm, (b) −1 dBm, (c) 1 dBm and (d) 5 dBm. The diamond boxes correspond to the chosen $N_1=32$ and $N_2=64$.

The activation function is one of the essential characteristics of an NN, which is used to add nonlinear factors to the network and improve the expression ability of the model. Different nonlinear activation functions have different effects on NNs. We referred to the tanh, sigmoid, ReLU, LeakyReLU, and ELU activation functions in typical NNs to perform a comparative analysis of the bit-error ratio (BER) and the loss, which can be expressed as:

$$\left\{ \begin{array}{l} tanh(a)=\frac{e^{a}-e^{-a}}{e^{a}+e^{-a}}\\ sigmoid(a) = \frac{1}{1 + e^{-a}}\\ ReLU(a)=max(a,0)\\ LeakyReLU(a) = \left\{ \begin{array}{l} a,\quad a \ge 0\\ 0.1 a,\quad a < 0 \end{array} \right. \end{array} \right..$$

As shown in Fig. 7, these are extended to $\mathbb {C}$tanh, $\mathbb {C}$sigmoid, $\mathbb {C}$ReLU, and $\mathbb {C}$LeakyReLU for CFNNs following the construction in Eq. (8), which can be calculated as:

$$\left\{ \begin{array}{l} \mathbb{C}tanh=tanh\left (Re\left ( z\right )\right ) +jtanh\left ( Im\left ( z\right )\right )\\ \mathbb{C}sigmoid=sigmoid\left (Re\left ( z\right )\right ) +jsigmoid\left ( Im\left ( z\right )\right )\\ \mathbb{C}ReLU=ReLU\left (Re\left ( z\right )\right ) +jReLU\left ( Im\left ( z\right )\right )\\ \mathbb{C}LeakyReLU\\ =LeakyReLU\left (Re\left ( z\right )\right ) +jLeakyReLU\left ( Im\left ( z\right )\right ) \end{array} \right..$$

Since the BER values themselves are very small, plotting $\left |lg(BER)\right |$ makes the differences clearly distinguishable. Consequently, we used $lg(loss)$ and $\left |lg(BER)\right |$ vertical axes to show the trends of the curves more plainly. As seen from the curves of the different activation functions, when the number of epochs exceeds 100, the ReLU-type functions generally converge better in both loss and BER than $\mathbb {C}$tanh and $\mathbb {C}$sigmoid, mainly because they mitigate the vanishing-gradient problem. Finally, as seen from the BER curve trend, $\mathbb {C}$ELU achieves the best classification accuracy because it combines the advantages of $\mathbb {C}$ReLU and $\mathbb {C}$LeakyReLU: on the one hand, it saturates softly on the negative half-axis of the input, which avoids the permanently dead neurons of ReLU; on the other hand, its mean output is close to 0, which alleviates shifts in the output distribution and accelerates model convergence.
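The split-style complex activations of Eqs. (8)-(10) apply each real activation separately to the real and imaginary parts; a minimal NumPy sketch (our own helper names):

```python
import numpy as np

def split_complex(g):
    """Lift a real activation g to complex inputs, per the split construction
    g_C(z) = g(Re(z)) + j*g(Im(z))."""
    return lambda z: g(np.real(z)) + 1j * g(np.imag(z))

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
relu = lambda a: np.maximum(a, 0.0)
leaky_relu = lambda a: np.where(a >= 0, a, 0.1 * a)
elu = lambda a, mu=1.0: np.where(a > 0, a, mu * np.expm1(a))

c_tanh, c_sigmoid = split_complex(np.tanh), split_complex(sigmoid)
c_relu, c_leaky_relu, c_elu = map(split_complex, (relu, leaky_relu, elu))
```

For example, `c_relu(-1+2j)` zeroes the negative real part while passing the positive imaginary part, giving `0+2j`.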

Fig. 7. Loss and BER convergence trajectory of different complex CFNN activation functions. (a) Loss; (b) BER.

As mentioned above, $K$ is a hyperparameter that controls the number of triplets and determines the feature dimension of the input signal. From the limit in Eq. (3), the number of triplets is positively correlated with $K$. When enough feature dimensions are input into an NN, PCA reduces the dimensionality of the high-dimensional features to lower the computational complexity as far as possible while preserving the integrity of the information. However, the transformation from the reconstructed input features to the principal components inevitably discards some useful information; trading off complexity against equalization performance is therefore essential for neural network equalizers. In actual performance testing, the nonlinear effects of the channel at low LOPs are not significant, making it difficult to distinguish the effects of PCA on different nonlinear equalizers. Conversely, when the LOP becomes too high, the nonlinear interference in the channel is so severe that the advantage of retaining principal components with PCA weakens noticeably. In summary, the impact of PCA is most pronounced when the channel operates near the optimal transmission power. Hence, we verified the effects of real PCA on RFNNs and of the $\mathbb {C}$PCA designed for CFNNs, and plotted the Q-factor curves at an LOP of 1 dBm. The relation between the Q-factor and the BER is $Q = 20\log_{10}[\sqrt{2}\,\mathrm{erfc}^{-1}(2\,\mathrm{BER})]$. Figure 8 shows the Q-factor curves of the RFNN, CFNN, P-RFNN and P-CFNN equalizers as $K$ increases from 20 to 80. The curves clearly show that the CFNN Q-factor is higher than that of the RFNN, and the P-CFNN Q-factor is higher than that of the P-RFNN. For the P-RFNN and P-CFNN equalizers, both the real PCA and $\mathbb {C}$PCA start dimensionality reduction from $K$=80.
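The Q-BER relation above is straightforward to evaluate in code; since the Python standard library lacks an inverse erfc, the sketch below inverts it numerically by bisection (a stdlib-only illustration, not the authors' implementation):

```python
import math

def ber_from_q_db(q_db: float) -> float:
    """BER = 0.5 * erfc(Q_lin / sqrt(2)), with Q_lin = 10**(Q_dB/20)."""
    q_lin = 10.0 ** (q_db / 20.0)
    return 0.5 * math.erfc(q_lin / math.sqrt(2.0))

def q_db_from_ber(ber: float) -> float:
    """Q_dB = 20*log10(sqrt(2) * erfcinv(2*BER)); erfc inverted by bisection."""
    target = 2.0 * ber
    lo, hi = 0.0, 10.0          # erfc is strictly decreasing on [0, 10]
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid) > target:
            lo = mid
        else:
            hi = mid
    return 20.0 * math.log10(math.sqrt(2.0) * 0.5 * (lo + hi))
```

The two functions are mutual inverses; for instance, a Q-factor of 8.45 dB corresponds to a BER of roughly 4 × 10⁻³.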
It is noteworthy that when the dimension is reduced too far (e.g., $K$=20), model training is insufficient because the NN lacks the necessary prior knowledge at its input, and the Q-factor improvements from both PCA methods are not significant. At the other end of the curves, when the dimensionality reduction is small (e.g., $K$=70), the improvement from both PCA methods is also significantly reduced: retaining more principal component feature vectors does not necessarily express the data features more accurately. As shown in Fig. 9, we further calculated the Q-factor gains of the P-RFNN and P-CFNN over the RFNN and CFNN, respectively, as $K$ varies from 20 to 70, denoted $\Delta$Q-factor. At every degree of dimensionality reduction, the Q-factor improvement obtained with $\mathbb {C}$PCA exceeds that of real PCA. From the $\Delta$Q-factor curve of the P-CFNN, when $\mathbb {C}$PCA reduces the selected dimensions from $K$=80 to $K$=50, the $\Delta$Q-factor reaches a maximum increase of approximately 0.57 dB.
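Following the description of $\mathbb {C}$PCA in Fig. 2 and Eq. (7), the reduction fits the projection matrix $H_{q\times r}$ on the real part of the complex feature matrix and then applies the same $H$ to both the real and imaginary parts. The sketch below is our own minimal NumPy rendering (the centering step is an assumption; the paper's exact preprocessing may differ):

```python
import numpy as np

def cpca_reduce(T: np.ndarray, r: int) -> np.ndarray:
    """Reduce a complex feature matrix T (p x q) to p x r.
    H is fitted on Re(T) only and applied to both parts, as in Eq. (7)."""
    X = np.real(T)
    Xc = X - X.mean(axis=0)                        # centre the real part
    cov = Xc.T @ Xc / (X.shape[0] - 1)             # q x q sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)         # ascending eigenvalues
    H = eigvecs[:, np.argsort(eigvals)[::-1][:r]]  # top-r principal axes
    return np.real(T) @ H + 1j * (np.imag(T) @ H)
```

For example, a $p\times 80$ complex feature matrix reduced to $p\times 50$ keeps the leading principal components of the real part while transforming both parts consistently.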

Fig. 8. Curve of the Q-factor with $K$ from 80 via PCA dimension reduction for an LOP of 1 dBm.

Fig. 9. Curve of $\Delta$Q-factor with $K$ from 20 to 70 for an LOP of 1 dBm.

Based on the above analysis, Fig. 10 compares the $\Delta$Q-factor performance of the RFNN, CFNN, P-CFNN, P-RFNN and the Bi-GRU NN-based nonlinear equalizer [24] against the case without (W/O) an NLC algorithm, all executed in the same experimental environment. All equalizers were fed feature dimensions with $K$=50, where both the P-CFNN and P-RFNN applied dimension reduction from $K$=80 to 50 for optimal performance. Figure 10(a) shows that all equalizers can perform nonlinear compensation in the high-LOP region with large nonlinear interference. The P-CFNN equalizer clearly performs best under all LOPs in Fig. 10(b), where the maximum $\Delta$Q-factor reaches 3.94 dB. In addition, we built a 64-GBaud, 1200-km simulation setup to verify the applicability of the results in this paper. As shown in Fig. 10(c), the linear phase noise (LPN) is caused by the 150-kHz linewidth of the lasers and the 2-GHz frequency offset between the transmitter and receiver lasers. At the receiver, the offline DSP is consistent with that of the experimental system. Figure 10(d) shows the improved Q-factor in the simulation system, which exhibits a trend similar to the experimental results. These results confirm that the conclusions of this paper still hold for different data rates and transmission distances.

Fig. 10. System performance with different equalizers vs. LOPs. P-CFNN: red; P-RFNN: yellow; CFNN: purple; RFNN: green; Bi-GRU: cyan; W/O NLC: grey. (a) Q-factor vs. LOPs; (b) Improved Q-factor vs. LOPs; (c) Simulation setup; (d) Simulation results.

Our discussion of the overall algorithm complexity is divided into time complexity and space complexity. In practice, the complexity introduced by the $\mathbb {C}$PCA dimension reduction process is negligible, because NN equalizers are more commonly applied at the transmitting end for pre-compensation [25]; when pre-compensating at the transmitting end of the optical fibre communication system, the computational complexity of PCA can be absorbed by a lookup table (LUT) method [26]. Hence, the complexity of the equalization process for all equalizers mainly comes from propagating the perturbation feature triplets through the network model. Since a multiplier costs several times as much as an adder, the time complexity can be represented by the number of real multiplications per symbol (RMpS). According to Eq. (14), we adopted the RFNN with the structure equivalent to the CFNN mentioned above to contrast the complexity of each algorithm fairly: when $N_{1}^{\mathbb {C}}$ = 32 and $N_{2}^{\mathbb {C}}$ = 64, $N_{1}^{\mathbb {R}}$ and $N_{2}^{\mathbb {R}}$ in the RFNN are set to 64 and 128, respectively. Since we perform a classification task on 64-QAM signals, $N_{O}^{\mathbb {R}}$ = $N_{O}^{\mathbb {C}}$ = 64. We then adjusted the number of input triplets to calculate the total RMpS for each algorithm separately.

Fig. 11. The performance of different equalizers versus complexity: (a) Time complexity and (b) space complexity.

Fig. 12. Computational complexity, including the RMpS and parameters.

For simplicity, we take the RFNN equalization performance at an LOP of 1 dBm as the benchmark, where $K$ is set to 80, $N_T$ is the number of feature triplets corresponding to that $K$ value, and the Q-factor is 8.45 dB. We then calculated the $K$ value each of the other NNs requires to reach the benchmark and the corresponding space-time complexity. According to Eq. (13) and Eq. (15), we calculated the RMpS and the number of parameters for the RFNN and CFNN. To reach a Q-factor of 8.45 dB, the RFNN with $N_T$ = 1753 requires 240768 real multiplications, and its number of parameters, indicating the space complexity, is likewise 240768. Under the constraint of the same Q-factor, the $N_T$ of the CFNN only needs to be 1005, in which case the corresponding RMpS and parameter counts are 153216 and 76608, respectively; thus the time complexity of the CFNN is reduced by 36% and its space complexity by 68% compared with the RFNN. In addition, we added the complexity of the Bi-GRU algorithm in [24] for comparison and used the bar charts in Fig. 12 to present the RMpS and parameter counts of the various algorithms more intuitively; the number of Bi-GRU multiplications is approximately 2.5 times that of the RFNN at the same grade. Furthermore, after applying the real PCA algorithm to the RFNN and our designed $\mathbb {C}$PCA to the CFNN, the space-time complexities of the P-RFNN and P-CFNN are reduced to different degrees: the RMpS of the P-RFNN and P-CFNN are 137344 and 82560, and their parameter counts are 137344 and 41280, respectively. Finally, we plotted the complexity of the different equalizers as a function of the Q-factor in Fig. 11. From the comprehensive analysis of the Q-factor improvement under the same complexity shown in Fig. 9 and the complexity reduction under the same performance shown in Fig. 11, the P-CFNN holds an absolute advantage over the P-RFNN, reducing the time complexity by 40% and the space complexity by 70%. Consequently, we conclude that the proposed $\mathbb {C}$PCA algorithm is more effective on CFNNs than the conventional PCA algorithm is on RFNNs, further revealing the considerable potential of complex-valued NNs.
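The complexity figures above follow directly from Eqs. (14) and (15); the short sketch below reproduces them (bias terms are neglected, consistent with the reported counts, and the mapping from $N_T$ triplets to input width, $N_I^{\mathbb {C}} = N_T$ and $N_I^{\mathbb {R}} = 2N_T$, is our reading of the equivalent-network construction):

```python
def rmps_rfnn(n_i, n1, n2, n_o):
    # Eq. (14): one real multiplication per real weight.
    return n_i * n1 + n1 * n2 + n2 * n_o

def rmps_cfnn(n_i, n1, n2, n_o):
    # Eq. (14): each complex multiplication costs 4 real multiplications.
    return 4 * (n_i * n1 + n1 * n2 + n2 * n_o)

def params_cfnn(n_i, n1, n2, n_o):
    # Eq. (15): each complex weight stores two real parameters.
    return 2 * (n_i * n1 + n1 * n2 + n2 * n_o)

# Benchmark point (Q = 8.45 dB): RFNN needs N_T = 1753 triplets, CFNN only 1005.
assert rmps_rfnn(2 * 1753, 64, 128, 64) == 240768   # RFNN RMpS = parameters
assert rmps_cfnn(1005, 32, 64, 64) == 153216        # CFNN RMpS
assert params_cfnn(1005, 32, 64, 64) == 76608       # CFNN parameters
```

The quoted 36% time and 68% space reductions are exactly 1 − 153216/240768 and 1 − 76608/240768.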

4. Conclusion

In this paper, we proposed a P-CFNN model driven by perturbation theory and demonstrated it experimentally on a 375 km, 120-Gbit/s DP-64 QAM coherent optical communication system. To further reduce the overall computational complexity of the model, we designed a novel $\mathbb {C}$PCA technique for CFNNs, which is more physically interpretable than conventional PCA applied to RFNNs. Considering both performance and computing speed, we investigated the choice of the number of neurons in the CFNN hidden layers. We also examined the effect of different nonlinear activation functions on model convergence, finally choosing $\mathbb {C}$ELU as our nonlinear activation function because it helps the CFNN converge better and achieve higher model accuracy. In addition, this paper proposed an equivalent RFNN with the same time complexity as the CFNN for fair performance comparison. Under all LOPs, the P-CFNN equalizer performs best among all compared algorithms, and the maximum $\Delta$Q-factor over the case without an NLC algorithm reaches 3.94 dB. Under the constraint of the same Q-factor, the proposed P-CFNN achieved a 40% reduction in time complexity and a 70% reduction in space complexity compared with the P-RFNN, which demonstrates the strong application prospects of the P-CFNN equalizer in optical fibre communication systems.

Funding

National Natural Science Foundation of China (62075014); National Key Research and Development Program of China (2021YFB2900703).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. G. Agrawal, Fiber-Optic Communication Systems (Wiley-Interscience, 2002).

2. E. Ip, “Nonlinear compensation using backpropagation for polarization-multiplexed transmission,” J. Lightwave Technol. 28(6), 939–951 (2010). [CrossRef]  

3. I. T. Lima, T. D. DeMenezes, V. S. Grigoryan, et al., “Nonlinear compensation in optical communications systems with normal dispersion fibers using the nonlinear Fourier transform,” J. Lightwave Technol. 35(23), 5056–5068 (2017). [CrossRef]

4. X. Huang, Y. Wang, C. Li, et al., “Improved DBSCAN algorithm based signal recovery technology in coherent optical communication systems,” Opt. Commun. 521, 128590 (2022). [CrossRef]

5. L. Zhu, H. Yao, H. Chang, et al., “Adaptive optics for orbital angular momentum-based internet of underwater things applications,” IEEE Internet Things J. 9(23), 24281–24299 (2022). [CrossRef]  

6. R. Alexey, A. Evgeny, S. Oleg, et al., “Compensation of nonlinear impairments using inverse perturbation theory with reduced complexity,” J. Lightwave Technol. 38(6), 1250–1257 (2020). [CrossRef]  

7. H. Louchet, K. Kuzmin, and A. Richter, “Improved DSP algorithms for coherent 16-QAM transmission,” in European Conference on Optical Communication (ECOC), (Brussels, Belgium, Sep. 2008), pp. 1–2.

8. J. Zhao, Y. Liu, and T. Xu, “Advanced DSP for coherent optical fiber communication,” Appl. Sci. 9(19), 4192 (2019). [CrossRef]  

9. M. Seimetz, High-order modulation for optical fiber transmission (Springer, 2009).

10. E. Giacoumidis, S. Mhatli, M. F. Stephens, et al., “Reduction of nonlinear intersubcarrier intermixing in coherent optical OFDM by a fast Newton-based support vector machine nonlinear equalizer,” J. Lightwave Technol. 35(12), 2391–2397 (2017). [CrossRef]

11. C. Li, Y. Wang, J. Wang, et al., “Convolutional neural network-aided DP-64 QAM coherent optical communication systems,” J. Lightwave Technol. 40(9), 2880–2889 (2022). [CrossRef]  

12. C.-Y. Lin, A. Napoli, B. Spinnler, et al., “Adaptive digital back-propagation for optical communication systems,” in Optical fiber communication conference, (San Francisco, California, United States, 2014), pp. M3C–4.

13. S. Deligiannidis, C. Mesaritakis, and A. Bogris, “Performance and complexity analysis of bi-directional recurrent neural network models versus Volterra nonlinear equalizers in digital coherent systems,” J. Lightwave Technol. 39(18), 5791–5798 (2021). [CrossRef]

14. T. Oyama, T. Hoshida, H. Nakashima, et al., “Proposal of improved 16-QAM symbol degeneration method for simplified perturbation-based nonlinear equalizer,” in Conference on Optical Fibre Technology (COFT), (Melbourne, Australia, Jul. 2014), pp. 941–943.

15. G. Chen, J. Du, L. Sun, et al., “Nonlinear distortion mitigation by machine learning of SVM classification for PAM-4 and PAM-8 modulated optical interconnection,” J. Lightwave Technol. 36(3), 650–657 (2018). [CrossRef]  

16. M. Li, S. Yu, J. Yang, et al., “Nonparameter nonlinear phase noise mitigation by using M-ary support vector machine for coherent optical systems,” IEEE Photonics J. 5(6), 7800312 (2013). [CrossRef]  

17. D. Wang, M. Zhang, Z. Cai, et al., “Combatting nonlinear phase noise in coherent optical systems with an optimized decision processor based on machine learning,” Opt. Commun. 369, 199–208 (2016). [CrossRef]

18. S. Zhou, R. Gao, Q. Zhang, et al., “Nonlinear compensation for OAM optical fiber communication system based on naive Gaussian Bayes algorithm,” in Asia Communications and Photonics Conference (ACP) and International Conference on Information Photonics and Optical Communications (IPOC), (Beijing, China, Oct. 2020), pp. M4A–254.

19. J. Zhang, W. Chen, M. Gao, et al., “K-means-clustering-based fiber nonlinearity equalization techniques for 64-QAM coherent optical communication system,” Opt. Express 25(22), 27570–27580 (2017). [CrossRef]  

20. Q. Fan, G. Zhou, T. Gui, et al., “Advancing theoretical understanding and practical performance of signal processing for nonlinear optical communications through machine learning,” Nat. Commun. 11(1), 3694 (2020). [CrossRef]  

21. W. Xiong, B. Redding, S. Gertler, et al., “Deep learning of ultrafast pulses with a multimode fiber,” APL Photonics 5(9), 096106 (2020). [CrossRef]  

22. N. Chi, Y. Zhao, M. Shi, et al., “Gaussian kernel-aided deep neural network equalizer utilized in underwater PAM-8 visible light communication system,” Opt. Express 26(20), 26700–26712 (2018). [CrossRef]

23. S. Deligiannidis, A. Bogris, C. Mesaritakis, et al., “Compensation of fiber nonlinearities in digital coherent systems leveraging long short-term memory neural networks,” J. Lightwave Technol. 38(21), 5991–5999 (2020). [CrossRef]  

24. X. Liu, Y. Wang, X. Wang, et al., “Bi-directional gated recurrent unit neural network based nonlinear equalizer for coherent optical communication system,” Opt. Express 29(4), 5923–5933 (2021). [CrossRef]  

25. S. Zhang, F. Yaman, K. Nakamura, et al., “Field and lab experimental demonstration of nonlinear impairment compensation using neural networks,” Nat. Commun. 10(1), 1–8 (2019). [CrossRef]  

26. Y. Gao, Z. A. El-Sahn, A. Awadalla, et al., “Reduced complexity nonlinearity compensation via principal component analysis and deep neural networks,” in Optical Fiber Communication Conference (OFC), (San Diego, California, Mar. 2019), pp. Th2A–49.

27. S. A. Bogdanov and O. S. Sidelnikov, “Use of complex fully connected neural networks to compensate for nonlinear effects in fibre-optic communication lines,” Quantum Electron. 51(5), 459–462 (2021). [CrossRef]  

28. W. Zhou, J. Shi, L. Zhao, et al., “Comparison of real- and complex-valued NN equalizers for photonics-aided 90-Gbps D-band PAM-4 coherent detection,” J. Lightwave Technol. 39(21), 6858–6868 (2021). [CrossRef]

29. J. A. Barrachina, C. Ren, C. Morisseau, et al., “Complex-valued vs. real-valued neural networks for classification perspectives: An example on non-circular data,” in ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (Toronto, ON, Canada, 2021), pp. 2990–2994.

30. E. Ip and J. M. Kahn, “Compensation of dispersion and nonlinear impairments using digital backpropagation,” J. Lightwave Technol. 26(20), 3416–3425 (2008). [CrossRef]  

31. Z. Tao, L. Dou, W. Yan, et al., “Multiplier-free intrachannel nonlinearity compensating algorithm operating at symbol rate,” J. Lightwave Technol. 29(17), 2570–2576 (2011). [CrossRef]  

32. C. Li, Y. Wang, L. Han, et al., “Optical fiber nonlinearity equalizer with support vector regression based on perturbation theory,” Opt. Commun. 507, 127627 (2022). [CrossRef]  

33. T. L. Clarke, “Generalization of neural networks to the complex plane,” in 1990 International Joint Conference on Neural Networks (IJCNN), (San Diego, California, 1990), pp. 435–440.

34. M. Amin, M. I. Amin, A. Y. H. Al-Nuaimi, et al., “Wirtinger calculus based gradient descent and Levenberg-Marquardt learning algorithms in complex-valued neural networks,” in International Conference on Neural Information Processing (ICONIP), (Berlin, Germany, 2011), pp. 550–559.

35. D.-A. Clevert, T. Unterthiner, and S. Hochreiter, “Fast and accurate deep network learning by exponential linear units (ELUs),” in International Conference on Learning Representations (ICLR), (San Juan, Puerto Rico, 2016).

36. X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in International conference on artificial intelligence and statistics (AISTATS), (Sardinia, Italy, 2010), pp. 249–256.

37. P. J. Freire, Y. Osadchuk, B. Spinnler, et al., “Performance versus complexity study of neural network equalizers in coherent optical systems,” J. Lightwave Technol. 39(19), 6085–6096 (2021). [CrossRef]  

38. T. Liao, L. Xue, L. Huang, et al., “Training data generation and validation for a neural network-based equalizer,” Opt. Lett. 45(18), 5113–5116 (2020). [CrossRef]  




