Memory-controlled deep LSTM neural network post-equalizer used in high-speed PAM VLC system

Linear and nonlinear impairments severely limit the transmission performance of high-speed visible light communication systems. Neural network-based equalizers have been applied to optical communication systems, enabling significant improvements in system performance, such as transmission data rate and distance. In this paper, a memory-controlled deep long short-term memory (LSTM) neural network post-equalizer is proposed to mitigate both linear and nonlinear impairments in pulse amplitude modulation (PAM) based visible light communication (VLC) systems. Both 1.15-Gbps PAM4 and 0.9-Gbps PAM8 VLC systems are successfully demonstrated, based on a single red LED, with a bit error ratio (BER) below the hard-decision forward error correction (HD-FEC) limit of 3.8 × 10⁻³. Compared with the traditional finite impulse response (FIR) based equalizer, the Q factor is improved by 1.2 dB and the transmission distance is increased by one-third with the same experimental hardware setup. Compared with traditional nonlinear hybrid Volterra equalizers, significant advantages in complexity and system performance are demonstrated for the LSTM-based equalizer. To the best of our knowledge, this is the first demonstration of using deep LSTM in VLC systems.

As the system transmission rate increases, ISI becomes stronger and more taps are needed when designing the equalizer. Due to their high computational complexity, hybrid Volterra filters are generally limited to the second or third order with a limited number of high-order taps [9], and LUT schemes generally use an extremely limited number of symbols [10,13]. An ideal nonlinear equalizer can effectively compensate for nonlinearities. However, the computational complexity increases as the number of taps increases. It is well known that complexity at the level of non-deterministic polynomial-time (NP) hardness and above should normally be avoided. An LSTM neural cell-based equalization network is designed to solve this problem. Long-term memory parameters are used to store the relatively slowly changing channel. Short-term memory parameters are used to quickly process the finite number of taps that affect a symbol. Benefiting from the characteristics of neural networks, the complexity of the equalizer can be significantly reduced to meet the complexity requirements.
Compared with traditional adaptive equalization, ML techniques are effective at using limited training samples to create a probability-based model. More specifically, ML emphasizes generalization ability and concept learning (CL) [14], and quantifies learning ability through generalization errors and under/over-fitting [15]. In recent years, many powerful traditional ML models [16] and simple multilayer neural network (NN) models [17] have been successfully applied in communication systems, but studies have pointed out that some flawed experimental methods can cause the performance of machine learning in communication systems to be seriously overestimated [18]. With the rise of deep neural networks in other fields and their breakthroughs in nonlinear problems [19,20], we explore the use of deep neural networks for nonlinear compensation in VLC systems.
In this paper, we propose and experimentally demonstrate a post-equalizer employing a memory-controlled deep 32-layer LSTM neural network and a softmax-function based probabilistic classification model for PAM4 and PAM8 VLC systems. The memory-controlled LSTM NN can simultaneously compensate for linear and nonlinear distortions. Applying the LSTM-based equalization scheme improves system performance compared with the original FIR-based and Volterra-based equalization schemes. A data rate of 1.15 Gb/s is successfully achieved over 1-m indoor free-space transmission based on a single red LED, with a bit error ratio (BER) under the 7% hard-decision forward error correction (HD-FEC) limit of 3.8 × 10⁻³. Further, our experiments demonstrate that the LSTM neural network is suitable for truly random sequences. To the best of our knowledge, this is the first time that an LSTM from machine learning has been successfully applied in VLC systems.

Principle
In VLC systems, the nonlinear distortion is easily illustrated by mapping the Tx signal amplitudes against the Rx signal amplitudes. Figure 1 shows back-to-back PAM-8 data passed through the VLC channel. Impaired by nonlinearity, the data deviate from a straight line. This creates a trade-off in VLC systems: reducing the modulation depth reduces nonlinearity, while increasing the modulation depth improves the signal-to-noise ratio due to the increased extinction ratio.
where b is the for PAM syst designed
For a traditional equalizer, the weight fitting is reset after each retraining, so it has no memory effect. In contrast, the parameters in the neural network retain long-term memory across multiple trainings, which helps the model compensate for nonlinearity more accurately. Specifically, during training the LSTM gives the highest priority to the latest training sequence, but also retains some weights from the original channel model. As shown in Fig. 2, sequences x'(n) and y'(n) of a certain length are used as the training sequence to converge the parameters of the neural network. As can be seen from Eq. (2), y(n) is related to the N-tap vector X_n. For convenience, we express X_n as Eq. (4), where x(n) is the n-th input symbol of the N taps and T denotes the transpose of the vector. Therefore, the input and output of each training sample and of each equalization set can be expressed as Eq. (5).
An LSTM is a probability-based model that cannot be used directly for equalization. Figure 3 shows the structure of the LSTM-based equalizer. The equalizer includes an input layer, a hidden layer, a classification layer, and an output layer. The standard LSTM cell is defined in [20]. We propose to transform the neural network output into a time-domain equalizer via a merge node according to the softmax function of Eq. (6) and Eq. (7):

Deep LSTM neural network structure
Here, P(y = Li) represents the probability that y is equal to the amplitude level Li, x is the Rx vector of taps, and w_k are the weights of the equalizer.
NoL is the number of amplitude levels (e.g., for PAM-8, NoL = 8, L1 = −7·NF, L2 = −5·NF, ..., L8 = 7·NF, where NF is the normalization factor). For the long/short-term memory links in Fig. 3, we use the standard LSTM cell structure in [20]. Figure 4 shows the flow chart of the training process and equalization. To train more precisely than with a direct accuracy-based error, such as the mean square error (MSE) between the actual and predicted values, a softmax-based cross-entropy is used in the algorithm, as in Eq. (8).
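The classification step and the motivation for cross-entropy can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the example scores, and the choice of a hard decision on the most probable level are assumptions made for illustration, and the scores stand in for whatever the network's last layer produces.

```python
import math


def pam_levels(n_levels, nf=1.0):
    """Amplitude levels -(M-1), ..., -1, +1, ..., +(M-1), scaled by nf (NF)."""
    return [nf * (2 * i - (n_levels - 1)) for i in range(n_levels)]


def softmax(scores):
    """Numerically stable softmax: one probability per amplitude level."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hard_decision(probs, levels):
    """Assumed decision rule: output the most probable amplitude level."""
    best = max(range(len(levels)), key=lambda i: probs[i])
    return levels[best]


def cross_entropy(probs, true_index):
    """Cross-entropy against a one-hot target at true_index."""
    return -math.log(probs[true_index])


levels = pam_levels(8)                                # PAM-8 with NF = 1
scores = [0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 2.0, 0.5]     # hypothetical class scores
probs = softmax(scores)
true_index = 6                                        # transmitted level is +5
decided = hard_decision(probs, levels)
decision_error = (decided - levels[true_index]) ** 2  # accuracy-based error
loss = cross_entropy(probs, true_index)
```

In this example the decided symbol is already correct, so the accuracy-based error is zero, while the cross-entropy is still positive and keeps providing a training signal.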

Training and transmission
In Eq. (8), $Y_i$ is the i-th probability of the amplitude levels, and $X_{train}$ and $Y_{train}$ are the Rx and Tx training sets. Inside the NN optimizer, however, the sample is cut into whole blocks called batches to calculate the training error, as in Eq. (9), where m is the batch size and $\text{Complete sample} = \bigcup_i \text{batch}_i$. Consequently, if pseudo-random sequences are used for training and testing, there is a high probability that $\text{batch}_i = \text{batch}_j$ for some $i \neq j$. This situation is very common in the communication field and causes the serial numbers of the sequence to be remembered, which is known as the memory of the neural network and gives a wrong estimate of system performance [18]. We used a simple but effective way to control this memory effect: each batch is composed by randomly picking single samples from the whole sample set, as in Eq. (11), where $\text{batch}_k$ is the new input to the optimizer. In theory, the situation in Eq. (9) cannot occur once this method is used. To verify the feasibility of this method, we experimentally compared pseudo-random sequences with random sequences to confirm that the LSTM equalizer avoids the memory trap described above.
Most neural networks have memory effects, whether they are deep neural networks (DNN), convolutional neural networks (CNN), or LSTMs. It should be noted that when a neural network model is applied to a communication system, part of the memory is beneficial: for example, remembering the previous parameters before a new training greatly reduces the amount of training required. Another part of the memory is harmful, such as remembering the training sequence itself, because the sequence to be transmitted in a communication system should be unpredictable.
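The batch composition described above can be sketched as follows. This is an illustrative sketch only: the helper names, the toy periodic symbol stream, and sampling without replacement inside a batch are assumptions, not details taken from the experiment.

```python
import random


def make_samples(symbols, n_taps):
    """Pair each n_taps-long window with its last symbol as the target."""
    return [(tuple(symbols[i:i + n_taps]), symbols[i + n_taps - 1])
            for i in range(len(symbols) - n_taps + 1)]


def sequential_batches(samples, batch_size):
    """Cut the sample list into consecutive whole blocks."""
    stop = len(samples) - batch_size + 1
    return [samples[i:i + batch_size] for i in range(0, stop, batch_size)]


def random_batches(samples, batch_size, n_batches, rng):
    """Compose every batch from samples drawn at random from the full set."""
    return [rng.sample(samples, batch_size) for _ in range(n_batches)]


# Toy periodic PAM4 stream (period 6) standing in for a short pseudo-random
# sequence that repeats during training.
period = [-3, -1, 1, 3, 1, -1]
symbols = period * 20
samples = make_samples(symbols, n_taps=3)

seq = sequential_batches(samples, batch_size=len(period))
distinct_sequential = len({tuple(b) for b in seq})   # every block is the same

rnd = random_batches(samples, batch_size=len(period),
                     n_batches=len(seq), rng=random.Random(0))
```

With consecutive blocks aligned to the sequence period, the optimizer sees the same batch again and again, which is the condition under which the order of the sequence can be memorized. Random composition removes that alignment.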

Complexity analysis
We theoretically estimated the complexity of the LSTM network, and the results show that its algorithmic complexity is acceptable. The training complexity is at the same level as that of the second-order Volterra equalizer and much lower than that of the third-order Volterra equalizer. In addition, a high-order Volterra equalizer without simplification is not a practical algorithm with current computing power, since its complexity is NP-hard.
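To give a feel for why high-order Volterra equalizers become impractical, the sketch below counts the distinct kernel coefficients of a Volterra series with symmetric kernels. This is a standard combinatorial count and is offered as an illustration; it is not the complexity expression of Table 1, and the function names are hypothetical.

```python
from math import comb


def volterra_terms(n_taps, order):
    """Distinct coefficients of one symmetric Volterra kernel of given order."""
    return comb(n_taps + order - 1, order)


def volterra_total(n_taps, max_order):
    """Distinct coefficients of all kernels from order 1 up to max_order."""
    return sum(volterra_terms(n_taps, p) for p in range(1, max_order + 1))


# Growth of the coefficient count with the number of taps for orders 1 to 3.
for n_taps in (11, 21, 41):
    print(n_taps, [volterra_total(n_taps, p) for p in (1, 2, 3)])
```

The first-order count grows linearly with the number of taps, while each additional order raises the polynomial degree of the growth by one.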

Table 1. Complexity of training and equalization. N is the number of equalizer taps, L is the length of the training sequence, I is the number of iterations, H is the number of hidden layers.
As shown in Table 1, the complexity of training and equalizing determines the complexity of implementing the algorithm. According to Eq. (2), the high-order multi-tap Volterra equalizer cannot be implemented in practice, since calculating its parameters by solving the higher-order Volterra series exceeds the NP-hard level. In existing practice, a second-order Volterra algorithm with limited taps, based on minimum-MSE convergence, is generally used.
Figure 5 shows the experimental setup of the VLC system with PAM modulation employing LSTM-based equalization. At the transmitter, the original binary bits are converted to PAM symbols, and up-conversion is performed after up-sampling. The signal is then generated by an AWG and passes through the basic hardware equalizer [21]. To reach the LED's switching current threshold, the signal is amplified by an electronic amplifier (EA) and combined with a direct current (DC) by a bias-T. The red chip of an ordinary commercial RGB LED (Engine LZ4-20MA00) was selected to transmit the signal into free space.

Experimental setup and results
At the receiving end, the detected optical signal is converted into an electrical signal by a PIN photodiode (Hamamatsu 10784). The weak original signal is amplified by an EA and sampled by an oscilloscope (Agilent 54855A) at a sampling rate of 2 GSample/s. Finally, the signal is transferred through a network cable/GPIB interface to a laptop with a graphics processing unit (GPU, Nvidia GTX1060) and stored for offline digital signal processing.
In the offline digital signal processing of the PAM4 and PAM8 signals, synchronization is followed by the down-conversion and down-sampling that correspond to the transmitter operations. After passing through the equalizer, the symbols are decoded into the original binary signal. As shown in Fig. 6, the cross-entropy continues to converge after the mean square error (MSE) has become almost 0 in the training process. If MSE is chosen as the error function in the LSTM, as shown by the blue line in the figure, the MSE reaches 0 after 4 × 10² iterations, so the parameter convergence ends regardless of the convergence threshold. This causes under-fitting, and the parameters of the network cannot be fitted accurately. Note that the zero-valued MSE points were placed artificially on the logarithmic axis so that the curve can be displayed. In general deep neural network experiments, the cross-entropy error is in fact widely used in the training process of other applications, such as image identification and natural language processing.
For a clear comparison, better-performing but more complex FIR equalizers were used as references: a data-aided LMS equalizer followed by a decision-directed least mean square (DD-LMS) adaptive equalizer, and a data-aided LMS equalizer combined with a Volterra equalizer. The results of applying the LMS equalizer, the hybrid Volterra equalizer, and the LSTM equalizer to PAM8 with different numbers of taps are shown in Fig. 7. With the optimal number of taps, the LSTM-based equalization scheme outperforms the original LMS and DD-LMS based equalization scheme by approximately 1.2 dB. PAM8 systems using only the LSTM equalizer have about half the errors of systems using the LMS + Volterra equalizer, resulting in a Q increase of approximately 0.8. The BER performance of VLC PAM8 systems using only the Volterra equalizer is unacceptable due to severe over-fitting. The Volterra equalizer pre-converged by LMS has an approximately 0.5 better Q value than the LMS equalizer in PAM8 systems.
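For readers who want to relate the Q values quoted here to BER, the sketch below uses the common Gaussian-noise relation BER = 0.5·erfc(Q/√2). The paper does not state how its Q factor is computed, so this relation and the function name are assumptions.

```python
import math
from statistics import NormalDist


def ber_to_q_db(ber):
    """Q factor in dB for a given BER (valid for 0 < ber < 0.5).

    Assumes BER = 0.5 * erfc(Q / sqrt(2)), i.e. Q is the standard normal
    quantile at probability 1 - BER.
    """
    q_linear = NormalDist().inv_cdf(1.0 - ber)
    return 20.0 * math.log10(q_linear)


hd_fec_q_db = ber_to_q_db(3.8e-3)                 # roughly 8.5 dB
halving_gain_db = ber_to_q_db(1.9e-3) - hd_fec_q_db
```

Under this assumption the HD-FEC limit of 3.8 × 10⁻³ corresponds to a Q factor of roughly 8.5 dB, and halving the BER near that limit raises Q by less than 1 dB.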
It is worth noting that the Volterra equalizer in this comparison uses more taps, trading complexity for optimal performance. When the number of taps is lower than 11, the parameter fitting of the equalizer clearly cannot follow the non-ideal response of the system. On the other hand, when the number of taps is too high, the performance of the system is reduced, because the additive white Gaussian noise (AWGN) accumulated in the system affects the accuracy of the optimizer in the equalizer.
The performance of the LSTM equalizer in the PAM4 and PAM8 systems was studied separately. As the baud rate increases, ISI also increases, and the number of taps must be increased to cope with the nonlinearity of the system. The complexity of the high-order Volterra equalizer then makes it no longer theoretically suitable. However, the LSTM nonlinear equalizer can still be used, since its complexity increases linearly with the number of taps. The LSTM equalization scheme keeps the BER of the system below the HD-FEC threshold when the transmission rate reaches 1.15 Gb/s in PAM4 and 0.975 Gb/s in PAM8 VLC systems. As shown in Fig. 8, the system performance of the LSTM-based equalization scheme is significantly better than that of the LMS + DD-LMS and LMS + Volterra based equalization schemes. Due to the bandwidth limitation and filtering effect, system performance is seriously degraded by noise and cannot be compensated by DSP beyond a certain baud rate. When the PAM8 system is operating at 325 MBd, applying the LSTM equalizer can achieve error-free transmission. In contrast, at 275 MBd the LMS + DD-LMS scheme already fails to meet the BER requirement, while the BER remains below the HD-FEC threshold with the LSTM scheme.
At the optimal working conditions, the LSTM scheme improves the transmission distance of the PAM8 VLC system compared with the traditional schemes. As shown in Fig. 9, the BER improvements are more significant when the noise impact introduced by the bandwidth is smaller. When the transmission bit rate is 750 Mbit/s, the LSTM equalizer allows the transmission distance to reach 1.2 m while keeping the bit error ratio below the HD-FEC threshold. The schemes using the LMS + DD-LMS and LMS + Volterra equalizers can only achieve transmission distances of 0.8 m and 0.9 m, respectively. When the transmission bit rate is 900 Mbit/s, the LSTM equalizer extends the transmission distance of the system from 0.7 m to 0.9 m.
In addition, we studied the optimal operation of the LSTM equalizer under different bias direct current (DC) and input signal peak-to-peak voltage (VPP) conditions. As shown in Fig. 10, across the different working conditions at a transmission distance of 0.8 m and a transmission bit rate of 750 Mbit/s, the LSTM equalizer extends the operating range under the FEC threshold limit. The PAM8 system has the best BER performance when the bias current is 60 mA and VPP is 0.4 V.
Fig. 11. Q factor performance curve of pseudo-random and random sequence.
In particular, the over-estimation of LSTM performance when using pseudo-random sequences, described in Section 2.2 of this paper, has been analyzed experimentally. As shown in Fig. 11, pseudo-random sequences overestimate system performance relative to random sequences in two respects: memory effects and unrealistic coverage of patterns.
Specifically, the memory effect is due to the incorrect use of the same training sequence and test sequence in the experiment, which causes the neural network to remember the serial numbers of the transmission sequence. Therefore, if a pseudo-random sequence is used in the experiment, the system performance will be overestimated. The unrealistic pattern coverage arises because, in practical applications, covering all tap patterns would require training sequences that are far too long. For example, completely covering 40 taps of PAM4 requires a training sequence containing 4^40 patterns. However, some training methods from deep learning can bring the performance closer to the theoretical limit, that is, the pseudo-random sequence performance in Fig. 11. One possible direction is to introduce forgetting factors that ignore the unnecessary, redundant parts of the training.
Finally, the effect of the training sequence length on the different nonlinear algorithms was measured experimentally. The Q factor after only one training is shown in Fig. 12. In general, the LSTM and Volterra based equalizers have similar requirements on the length of the training sequence. Specifically, when the length of the training sequence is less than 1/16, the parameters of the two equalizers do not converge well. In all of the above experiments, we used only one training sequence of the same, sufficiently convergent length to compare the LSTM and Volterra equalizer algorithms. It is worth mentioning that, once convergence is good, multiple training cycles bring no significant performance improvement for the traditional equalizer, because each complete training cycle updates the parameters without memory. In contrast, under the influence of long-term memory, a system with an LSTM-based equalizer that has been trained more than once achieves higher performance.
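The scale of the pattern-coverage argument above can be checked with a few lines of arithmetic. The baud rate below is an assumption derived from the reported 1.15 Gb/s PAM4 rate at 2 bits per symbol, ignoring any overhead, and the function names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def pattern_count(n_levels, n_taps):
    """Number of distinct symbol patterns over n_taps taps."""
    return n_levels ** n_taps


def years_to_send(n_symbols, baud_rate):
    """Time needed to transmit n_symbols at baud_rate, in years."""
    return n_symbols / baud_rate / SECONDS_PER_YEAR


patterns = pattern_count(4, 40)          # 4^40 = 2^80, about 1.2e24
pam4_baud = 1.15e9 / 2                   # assumed 575 MBd
years = years_to_send(patterns, pam4_baud)
```

Even sending each pattern position only once would take tens of millions of years at this symbol rate, which is why full pattern coverage cannot be reached by any practical training sequence.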

Conclusions
We theoretically and experimentally investigated a time-domain memory-controlled LSTM neural network based equalization scheme for a band-limited PAM-8 VLC system. A data rate of 1.15 Gb/s is successfully demonstrated over 0.8-m indoor free-space transmission based on a single red LED, with a BER lower than 3.8 × 10⁻³. In addition, it is shown experimentally that the LSTM-based equalizer can outperform the original FIR-based equalizer and the Volterra-based equalizer. Furthermore, we theoretically and experimentally verified that the scheme is applicable to random sequences and identified the causes of the performance overestimation problem of neural networks found in previous studies.