Two-Lane DNN Equalizer Using Balanced Random-Oversampling for W-Band PS-16QAM RoF Delivery over 4.6 km

For W-band long-range mm-wave wireless transmission systems, the nonlinearity introduced by photoelectric devices, optical fibers, and wireless power amplifiers can be handled by deep learning equalization algorithms. In addition, the PS technique is considered an effective measure to further increase the capacity of the modulation-constrained channel. However, since the probabilistic distribution of m-QAM varies with the amplitude, it is difficult to learn valuable information from the minority class, which limits the benefit of nonlinear equalization. To overcome this imbalanced machine learning problem, we propose a novel two-lane DNN (TLD) equalizer using the random oversampling (ROS) technique in this paper. The combination of PS at the transmitter and ROS at the receiver improved the overall performance of the W-band wireless transmission system, and our 4.6 km RoF delivery experiment verified its effectiveness for the W-band mm-wave PS-16QAM system. Based on our proposed equalization scheme, we achieved single-channel 10-Gbaud W-band PS-16QAM wireless transmission over a 100 m optical fiber link and a 4.6 km free-space wireless distance. The results show that, compared with the typical TLD without ROS, the TLD-ROS can improve the receiver's sensitivity by 1 dB. Furthermore, a 45.6% reduction in complexity was achieved, and the number of training samples was reduced by 15.5%. Considering the actual wireless physical layer and its requirements, there is much to be gained from the joint use of deep learning and balanced data pre-processing techniques.


Introduction
With the advancement of 5G mobile communication, millimeter-wave band systems can provide higher data rates and less wireless interference, even in high-mobility environments [1][2][3][4][5][6][7][8][9][10][11]. Moreover, as wireless communications develop, bandwidth expansion is driven by the growing demand for capacity in next-generation 6G mobile networks, which is expected to lead to ultra-high data rates [12,13]. Recently, the W-band (75-110 GHz) has received more and more attention for its wide bandwidth, relatively low atmospheric loss, and good directionality, and it has been considered a potential candidate for long-distance mobile communication [14][15][16][17]. Photonics-aided W-band signal generation is a critical technology in long-distance radio-over-fiber (RoF) transmission systems. It breaks the bottleneck of the bandwidth limitation of traditional electrical equipment and can generate cost-effective and stable high-frequency millimeter-wave signals [18][19][20][21].
To further enlarge the capacity of the modulation-constrained channel, the probabilistic shaping (PS) technique has been introduced into optical fiber communication systems [22]. Moreover, with the development of the PS technique, significant breakthroughs can be achieved in extending transmission distance and improving spectral efficiency [23]. However, photoelectric and electronic devices usually introduce nonlinear impairments at the same time.

For the NN classification algorithm, a class imbalance occurs when the number of samples in some classes is greater than the number of instances in other classes. Here, we define the larger data set as the majority class and the smaller one as the minority class. For long-range high-speed wireless channels impaired by nonlinearity, phase noise, and high loss, it is difficult to learn from the samples of the minority class in the presence of serious class overlapping. The main function of the random oversampling (ROS) algorithm is to overcome the imbalance problem by redistributing the training dataset. In our work, we employed the ROS technique on the Rx side of the W-band PS-16QAM RoF delivery. The basic principle of ROS is shown in Figure 1. As we know, N-order modulation signal equalization can be regarded as N-class classification. The PS technique introduces a severe skew into the class distribution of the baseband PS-16QAM data, generating majority classes in the inner rings and a few minority classes in the outer rings, which results in the original imbalanced dataset shown in Figure 1a. The core idea of the ROS algorithm is to randomly extract samples from the minority classes and replicate them multiple times in order to balance the class distribution of the training set [42,43].
Figure 1. The principle of random oversampling and the corresponding target output of (a) the original imbalanced dataset, (b) the random oversampled dataset, and (c) the balanced dataset.

The procedure of using the classic ROS algorithm to generate a balanced training set {S′} is described as follows:
Step 1: For an original training dataset {S}, the data size of the smallest minority class is counted as N−. Meanwhile, the data size of the largest majority class is written as N+, and the sample rate is SR. Here, N+ + N− = N, and N− × SR is the length of the final training dataset for every classification. The imbalance ratio is λ = N+/N−.
Step 2: For one specific minority classification and the largest majority classification, all of their original samples constructed the original minority class training set {S−} and the majority class training set {S+}, respectively.
Step 3: For each required replication, we randomly chose a number from 1 to N− and found the corresponding sample in {S−}.
Step 4: The selected samples were added to the minority class set {S−}.
Step 5: The generated training set was obtained as {S′} = {S+} ∪ {S−}.
Step 6: For all minority classifications, we repeated the operations above to obtain a large number of samples. In addition, for the majority classifications, we extracted only a small subset of samples from the imbalanced dataset. Finally, we obtained a balanced dataset.
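The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: here every class is simply brought up to the majority-class size by random replication, whereas the paper's SR parameter additionally allows subsampling of the majority classes.

```python
import numpy as np

def random_oversample(X, y, rng=None):
    """Balance a labelled training set by randomly replicating
    minority-class samples until every class matches the size of
    the largest majority class (cf. Steps 1-6 above)."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    n_major = counts.max()                     # N+ of the largest majority class
    X_parts, y_parts = [], []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)           # original samples of this class
        extra = rng.choice(idx, size=n_major - n, replace=True)  # random replication
        keep = np.concatenate([idx, extra])    # {S'} = originals plus replicas
        X_parts.append(X[keep])
        y_parts.append(y[keep])
    return np.concatenate(X_parts), np.concatenate(y_parts)

# toy imbalanced dataset: class 0 is the majority (6 samples vs 2 and 2)
X = np.arange(10.0).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])
Xb, yb = random_oversample(X, y, rng=0)
# every class now has 6 samples, 18 samples in total
```

After balancing, each of the three classes holds as many samples as the original majority class, which is the behavior the TLD training stage relies on.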


Two-Lane DNN (TLD) Equalizer
A two-lane DNN (TLD) equalizer was employed in our experiment to mitigate the nonlinear impairment. The name "two-lane DNN" indicates that the equalizer is composed of two DNNs, one designed to process the real signal sequence and the other applied to the imaginary signal sequence, since our modulated signal is a complex 16QAM signal. Figure 2 shows the schematic diagram of the TLD equalizer combined with ROS. In the training process, our proposed TLD equalizer was trained in two steps. In the first step, the training dataset was randomly oversampled to generate balanced 16QAM signals. In the second step, the equalizer was trained with the help of the balanced 16QAM training sequence, and the weight values of the TLD equalizer were further optimized until the target error value was achieved. In the testing process, in contrast, 30% of the originally received PS-16QAM signal, used as the testing dataset, was directly input into the well-trained TLD neural network. Finally, the BER decision could be implemented according to the equalized testing signals. Therefore, it is worth noting that ROS was only applied to the training data, and the TLD network was trained with the help of the balanced training data.
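As a concrete illustration of this train-then-test flow, the sketch below trains one lane (here the I-lane) of a TLD-style network with plain gradient descent. The layer sizes, learning rate, target value, and data are illustrative stand-ins, not the authors' implementation; the paper's actual structure is [180-300-300-1].

```python
import numpy as np

rng = np.random.default_rng(0)
n0, n1, n2 = 8, 16, 16                # small illustrative layer sizes
# weights initialised uniformly in [-1, 1], as described in the text
W1 = rng.uniform(-1, 1, (n1, n0))
W2 = rng.uniform(-1, 1, (n2, n1))
W3 = rng.uniform(-1, 1, (1, n2))

relu = lambda v: np.maximum(0.0, v)   # "ReLU" hidden-layer activation

def forward(x):
    h1 = relu(W1 @ x)                 # first hidden layer
    h2 = relu(W2 @ h1)                # second hidden layer
    return (W3 @ h2)[0], h1, h2       # linear ("purelin") output layer

def train_step(x, t, eta=1e-3):
    """One LMS / back-propagation update of all three weight matrices."""
    global W1, W2, W3
    o, h1, h2 = forward(x)
    e = o - t                         # error fed back to the equalizer
    d2 = (W3[0] * e) * (h2 > 0)       # back-propagate through the ReLU
    d1 = (W2.T @ d2) * (h1 > 0)
    W3 -= eta * np.outer([e], h2)
    W2 -= eta * np.outer(d2, h1)
    W1 -= eta * np.outer(d1, x)
    return 0.5 * e ** 2               # LMS cost for this sample

x = rng.standard_normal(n0)           # one (balanced) training input vector
losses = [train_step(x, t=1.0) for _ in range(50)]
# the LMS cost shrinks as the weights adapt toward the target
```

The Q-lane DNN is trained in exactly the same way on the imaginary part; only the training data are oversampled, while testing data are passed through unchanged.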
In general, an NN is made up of an input layer, several hidden layers, and an output layer. Our proposed TLD equalizer has one input layer, L (set as two in our experiment) hidden layers, and one output layer. Considering that the complexity of a typical M-layer NN is described as $C_{MLP} = n_0 n_1 + n_1 n_2 + n_2 n_M$ [44], the neural network we used was a simple, low-layer one, with a structure of [180-300-300-1]. There were 180 units in the input layer, 300 cells in each hidden layer, and one unit in the output layer. Such parameters are proven capable of satisfying the need for nonlinear equalization without overly increasing the complexity of the neural network, as shown in the following sections. $w_k^{I,mj}$ and $w_k^{Q,mj}$ are the weight values of the two lanes, respectively, where I or Q denotes the I or Q lane of signals each deals with; k represents the current k-th layer (0th for the input layer and 3rd for the output layer); and m and j represent the m-th node in the former layer and the j-th node in the current layer, respectively. Firstly, weight value randomization ($w_k^{I,mj}, w_k^{Q,mj} \in [-1, 1]$), the learning rates, and the iterative learning epoch settings initialized the TLD equalizers. Secondly, the oversampled PS-16QAM dataset was separated into I- and Q-lane vectors, which were then sent to the input layer with a length of $n_0$. As shown in Figure 2, the input vectors can be written as $X_I(n) = [x_I(n), x_I(n-1), \ldots, x_I(n-n_0+1)]$ and $X_Q(n) = [x_Q(n), x_Q(n-1), \ldots, x_Q(n-n_0+1)]$, which are then multiplied by the weight values $w_1^{I,mj}$ and $w_1^{Q,mj}$ in the first hidden layer, respectively.
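Plugging the [180-300-300-1] structure into the complexity expression $C_{MLP}$ above gives the per-symbol multiplication count for one lane. This is a quick illustrative calculation, not a figure taken from the paper:

```python
# C_MLP = n0*n1 + n1*n2 + n2*n_M for the [180-300-300-1] structure
layers = [180, 300, 300, 1]
c_mlp = sum(a * b for a, b in zip(layers, layers[1:]))
print(c_mlp)  # 180*300 + 300*300 + 300*1 = 144300 multiplications per lane
```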
Because the neurons between the layers are fully connected, the output of the j-th neuron in the first hidden layer can be described as follows:

$h_{I,j}^{1} = f\Big(\sum_{m=1}^{n_0} w_1^{I,mj}\, x_I(n-m+1)\Big), \qquad h_{Q,j}^{1} = f\Big(\sum_{m=1}^{n_0} w_1^{Q,mj}\, x_Q(n-m+1)\Big),$

where $n_i$ denotes the number of nodes in the i-th layer, with $n_0$ and $n_M$ defining the numbers of nodes in the input layer and output layer, respectively. Based on the feed-forward training process, the output of the l-th neuron in the second hidden layer can be calculated as follows:

$h_{I,l}^{2} = f\Big(\sum_{m=1}^{n_1} w_2^{I,ml}\, h_{I,m}^{1}\Big), \qquad h_{Q,l}^{2} = f\Big(\sum_{m=1}^{n_1} w_2^{Q,ml}\, h_{Q,m}^{1}\Big).$

Here, as mentioned above, L is set to 2, and $n_1$ is the number of nodes in the first hidden layer. Take $w_L^{I,ml}$ in the DNN dealing with I-lane signals as an example: it represents the weight value in the second hidden layer linking the m-th node in the (L−1)-th hidden layer and the l-th node in the current layer. $h_{I,m}^{L}$ and $h_{Q,m}^{L}$ denote the output of the m-th neuron in the L-th hidden layer for the I-lane DNN and the Q-lane DNN, respectively. It is worth noting that $f(\cdot)$ denotes the nonlinear activation function between hidden layers. In DNNs, there are several common activation functions, such as the "sigmoid" and "tanh" functions.
Taking gradient explosion and gradient vanishing into consideration, we chose the "ReLU" function, which can be described as:

$f(x) = \max(0, x).$

Meanwhile, the transformation from the last hidden layer to the output layer is linear, contrary to that from the input layer to the hidden layer, which is nonlinear. Considering this, we chose the "purelin" function, $g(x) = x$, as the activation function $g(\cdot)$ in the output layer. The final equalized output is given as:

$O_n = g\Big(\sum_{m=1}^{n_2} w_3^{I,m1}\, h_{I,m}^{2}\Big).$

We note that the weight values adaptively updated based on the least mean squares (LMS) error function, which can be given as:

$J_n = \frac{1}{2T} \sum_{n=1}^{T} \big(T_n - O_n\big)^2,$

where $J_n$ is the cost function and T refers to the length of the training dataset $X(n)$. Then, we subtracted the predetermined expected output value $T_n$ from the obtained output result $O_n$ to obtain the error value $e_n$, which was fed back to the TLD equalizers to participate in the calculation. With the aid of BP algorithms, the weight values $w_k^{I,mj}$ and $w_k^{Q,mj}$ were updated constantly until reaching the preset epoch or error value. The process of iterations can be given as:

$\frac{\partial\, \mathrm{real}(J_n)}{\partial w_k^{I,mj}} = \mathrm{real}(e_n) \cdot \frac{\partial\, \mathrm{real}(O_n)}{\partial h_{I,j}^{L}} \cdot \frac{\partial h_{I,j}^{L}}{\partial h_{I,j}^{L-1}} \cdots, \qquad w_k^{I,mj} \leftarrow w_k^{I,mj} - \eta\, \Delta w_k^{I,mj}.$
Here, η denotes the learning rate, and the ∆ symbol refers to the calculated gradients. In our proposed TLD-ROS equalization scheme, the original received PS-16QAM signal was firstly processed by the deployed ROS, and the target output was then correspondingly modified. Finally, the evenly distributed 16QAM was fed into the TLD classifier to obtain the optimum network parameters.

Figure 3 shows the experimental setup of our presented W-band PS-16QAM delivery over a 4.6 km free-space wireless transmission system. On the transmitter side, shown in Figure 3a, an 88.5 GHz frequency spacing was generated by two individual external cavity lasers (ECLs) based on optical heterodyne beating. The baseband PS-16QAM signal was firstly generated via offline Matlab-2020b software and then digital-to-analog converted (DAC) by an arbitrary waveform generator (AWG) with a sampling rate of 100 GSa/s. Boosted by two parallel electrical amplifiers (EAs), the baseband I and Q components of the PS-16QAM signal, created from a pseudo-random binary sequence (PRBS) with a length of 2^20 − 1, were used to drive the I/Q modulator. In order to avoid overfitting in the equalization process, we used a PRBS of order 20 to generate the transmitted sequence. The period of a PRBS-20 is 2^20 − 1 = 1,048,575 bits. The total transmitted I/Q length extracted from the generated PRBS-20 sequence was 16,384 symbols (65,536 bits), much less than 1,048,575 bits, which effectively avoided the repetition of pseudo-random numbers. ECL1 at 1550 nm, with a linewidth of 100 kHz and an average power of 16 dBm, served as the signal light source and carried the transmitted PS-16QAM data via an I/Q modulator with a 3 dB optical bandwidth of 30 GHz and a half-wave voltage of 2.7 V at 1 GHz. Then, the optical beam was amplified by a polarization-maintaining erbium-doped fiber amplifier (PM-EDFA).
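The overfitting argument above (transmit far fewer bits than the PRBS period) can be illustrated with a small LFSR sketch. The feedback polynomial x^20 + x^3 + 1 is one common maximal-length choice for a PRBS-20 and is an assumption here, since the paper does not state which polynomial was used:

```python
def prbs20(nbits, seed=0xABCDE):
    """Fibonacci LFSR for a PRBS-20; feedback polynomial x^20 + x^3 + 1
    (assumed here), giving a period of 2**20 - 1 = 1,048,575 bits."""
    state = seed & 0xFFFFF             # 20-bit register, must be nonzero
    bits = []
    for _ in range(nbits):
        out = (state >> 19) & 1        # stage 20 is the output
        fb = out ^ ((state >> 2) & 1)  # taps at stages 20 and 3
        bits.append(out)
        state = ((state << 1) | fb) & 0xFFFFF
    return bits

# 16,384 16QAM symbols * 4 bits/symbol = 65,536 bits -- far below the
# 1,048,575-bit period, so the transmitted pattern never repeats
bits = prbs20(16_384 * 4)
```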
Then, ECL2 at a center wavelength of 1549.3 nm, operating as a local oscillator (LO) with a frequency spacing of 88.5 GHz from the ECL1 light, was used to generate the W-band mm-wave signal. After delivery over a 100 m standard single-mode fiber (SSMF), the optical power of the combined beam was adjusted by an attenuator (ATT) to obtain the optimum input power into the photodiode (PD); the PD operated within the frequency range of 10~170 GHz at a −2 V DC bias and with an output power of −7 dBm.
As depicted in Figure 3a, at the transmitter, the output 88.5 GHz PS-16QAM signal was amplified by a low-noise amplifier (LNA) with a 30 dB gain and a cascaded power amplifier (PA, saturated output of 18 dBm), and then the boosted W-band signal was fed into a W-band horn antenna (HA) with a 30 dBi gain. A pair of identical lenses (Lens 1 and Lens 2), the diameter and focal length of which were 10 cm and 60 cm, respectively, was designed to amplify the mm-wave signal, thereby improving the SNR. The pair of lenses was placed between the transmitter and receiver HAs to focus the wireless mm-wave signal at the position of the receiving W-band HA. Figure 4 shows photos of the long-distance radio-over-fiber delivery of the PS-16QAM signal over 4.6 km on the Fudan campus. The Tx side of the W-band 4.6 km wireless transmission system was situated at the Guanghua Building on the Handan Campus of Fudan University, and the Rx side was located at the WuLi Building on the Jiangwan Campus of Fudan University. The distance between them was 4.6 km on the satellite map. At the receiver, the signal received by the HA was firstly boosted by a low-noise amplifier (LNA) with a 30 dB gain and then down-converted into a 13.5 GHz intermediate frequency (IF) signal via a commercial mixer driven by a radio frequency (RF) signal with a frequency of 75 GHz. Additionally, the amplified IF signal, after an EA with a gain of 26 dB, was finally captured by a digital storage oscilloscope with a sampling rate of 100 GSa/s and an electrical bandwidth of 45 GHz.
As shown in Figure 4, the offline DSP at the Rx side consists of down-conversion, resampling, the Gram-Schmidt orthogonalization procedure (GSOP), the constant modulus algorithm (CMA), frequency offset estimation (FOE), carrier phase estimation (CPE), decision-directed least mean squares (DD-LMS), and nonlinear equalization. Here, the nonlinear equalization methods we compared include the typical Volterra series, the TLD equalizer, and the TLD equalizer combined with ROS (TLD-ROS).


Traditional DSP Algorithm
Figure 5(a1-a5) provides the density distribution of PS-16QAM constellation diagrams after successive DSP algorithms, namely GSOP, CMA, FOE, CPR, and DD-LMS. As seen in Figure 5(a2), the constellation diagrams were circles after the CMA algorithm, but after the FOE algorithm, the frequency offset noise could be mitigated. In addition, by employing carrier phase recovery (CPR), the problem of phase offset was solved. After the DD-LMS algorithm, four points in the inner circle could be identified from the complex-valued fuzzy signals. However, the remaining constellation points were widely scattered, as shown in Figure 5(a5). This implies that the signals in the outer circles, which require a higher SNR, are easily affected by noise. Meanwhile, as shown in Figure 5(b1-b5), the complex-valued 16QAM signals were successfully separated into real and imaginary parts after the aforementioned DSP algorithms. However, they were still fuzzy, since these steps are linear algorithms and are useless against nonlinear noise. In the 16QAM constellation diagram in Figure 5(b5), the signals recovered after DD-LMS are evenly distributed and relatively concentrated around the 16 points. This proves that PS-16QAM does not have the same density distribution as 16QAM, as PS-16QAM has a severe skew in the class distribution of its constellation diagram due to the PS algorithm. In order to conquer this uneven distribution problem, the data pre-processing method of ROS is considered a simple and valid approach to achieving balanced learning.



Results of ROS Processing
As seen in Figure 6a, the red-brown columns mark the count distribution of constellation points in the inner ring of PS-16QAM, the blue ones in the middle ring, and the purple ones in the outer ring. The distribution of the transmitted PS-16QAM varied unevenly with the information entropy, and the imbalance becomes more severe as the information entropy grows smaller. The process of the ROS method is presented here. The information entropy in our experiment was set as 5.6 bit/symbol. First, we oversampled the original unbalanced PS-16QAM training dataset. Thus, the corresponding target output has an equal distribution in Figure 6b. The constellation diagram of PS-16QAM before ROS could not be fully separated into 16 points in Figure 6c, while Figure 6d shows that it could be divided into 16 equal parts after ROS. There are two methods of ROS. In the first approach, called "ROS I/Q" in Figure 7a, the complex QAM sequence is firstly divided into the real-valued (RV) I and Q parts and then randomly oversampled separately. The I and Q components, after ROS I/Q, are finally trained through the TLD equalizer, respectively. In the other approach, denoted as "ROS QAM" in Figure 7b, the density of the complex QAM signal is directly flattened via complex-valued ROS and then split into real and imaginary parts as the input dataset sent into the two independent DNN equalizers, respectively. In our experiment, we assumed that 70% of the received symbols after DD-LMS were constructed as training samples. Therefore, when transmitting 16,384 symbols, the length of the training dataset was around 11,000 (16,384 × 70% ≈ 11,000), and the length of the testing set was around 5000 (16,384 × 30% ≈ 5000). For the ROS I/Q scheme, the sizes of the generated balanced training datasets for the I-branch and Q-branch were 17,208 and 17,108, respectively, caused by the difference between the I- and Q-distributions.
For the ROS QAM scheme, assuming that the lengths of the original and repeated samples were 11,000 and 15,864 symbols, respectively, the same target output number for the I-branch and Q-branch was 26,864 symbols (11,000 + 15,864 = 26,864).
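The sample bookkeeping above can be reproduced with a short sketch. The shaping probabilities below are hypothetical (so the resulting counts will not match the experimental 17,208/17,108 or 26,864 exactly), but the 70/30 split and the ROS QAM balancing rule are the same:

```python
import numpy as np

rng = np.random.default_rng(7)
n_sym = 16384
n_train = int(n_sym * 0.7)        # 11,468 symbols (the paper rounds to ~11,000)
n_test = n_sym - n_train          # 4916 symbols (the paper rounds to ~5000)

# hypothetical Maxwell-Boltzmann-like shaping: inner-ring classes more likely
levels = np.array([-3, -1, 1, 3])
I, Q = np.meshgrid(levels, levels)
p = np.exp(-0.1 * (I**2 + Q**2).ravel())
p /= p.sum()

y_train = rng.choice(16, size=n_train, p=p)   # imbalanced training labels
counts = np.bincount(y_train, minlength=16)

# ROS QAM: replicate complex symbols until all 16 classes match the largest;
# the SAME balanced sequence then feeds both the I- and Q-lane DNNs
balanced_len = counts.max() * 16
n_repeated = balanced_len - n_train           # number of replicated samples
```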
Then, the category values Y (∈ [1,16]) were converted to standard complex 16QAM signals with I data (∈ [−3,−1,1,3]) and Q data (∈ [−3,−1,1,3]). Furthermore, the generated ROS samples were fed into the TLD with the MSE loss, where the weight values on the I- and Q-paths, respectively, were updated until the threshold error value or iteration number was reached.
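The category-to-symbol conversion can be sketched as follows. The row-major ordering of indices is an assumption for illustration; the paper does not specify which category maps to which constellation point:

```python
import numpy as np

LEVELS = np.array([-3, -1, 1, 3])   # standard 16QAM I and Q levels

def category_to_qam(y):
    """Map category values Y in [1, 16] to complex 16QAM symbols
    (row-major index ordering assumed)."""
    y0 = np.asarray(y) - 1          # shift to 0..15
    i = LEVELS[y0 % 4]              # I component
    q = LEVELS[y0 // 4]             # Q component
    return i + 1j * q

syms = category_to_qam([1, 4, 16])
# -> [-3-3j, 3-3j, 3+3j] under this ordering
```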


BER Performances Using TLD Equalizer with or without ROS Preprocessing
Moreover, we compared the BER performance versus the intermediate frequency (IF) after down-conversion for 10-Gbaud PS-16QAM signals when the optical power into the uni-traveling-carrier photodiode (UTC-PD) was fixed at −1 dBm, as shown in Figure 8. It can be found that the BER increased with increasing IF, and when the IF was 21 GHz, there was steep fading in the high-frequency part of the electrical spectrum of the PS-16QAM signal, which severely affected the BER performance. In our experimental setup, the IF was optimized as 14 GHz. Meanwhile, we compared the BER performances of PS-16QAM signals employing only TLD equalizers and the two different TLD-ROS equalizer schemes. It is obvious that the TLD-ROS equalizer schemes contribute greatly to the drop in BER compared with the TLD equalizer. Moreover, it is worth noting that in this case, the PS-16QAM signal was firstly shaped into a "flat" 16QAM constellation and then trained by the neural network model, where a "perfect" ROS was carried out, namely, equiprobable symbols without repetition.

Figure 9a illustrates the BER performance versus the input optical power for the PS-16QAM signal and the 16QAM signal, respectively. All nonlinear NN equalizers have the same neural structure [180-300-300-1], where 180 defines the memory length of the input layer. It is evident that when employing traditional DSP algorithms, PS-16QAM has a better BER performance than 16QAM, since it has a lower SNR requirement. However, the linear CMMA and DD-LMS DSP algorithms are ineffective against the nonlinear problem. On the contrary, with the aid of TLD equalizers, the BER performances improve significantly. There is a difference in BER between 16QAM signals and PS-16QAM signals employing TLD equalizers.
When the optical input power is less than −1 dBm, the BER performance of PS-16QAM is better than that of 16QAM when using TLD equalizers, because PS-16QAM is better adapted to channel transmission when the SNR is low. Conversely, when the optical input power is larger than −1 dBm, 16QAM outperforms PS-16QAM, because the greater input power meets the higher SNR requirement of 16QAM. Moreover, the outer-ring constellation points of PS-16QAM signals are minority classes, which suffer inadequate training during the machine learning phase. Therefore, PS-16QAM may not perform as well as 16QAM when the optical input power is greater than −1 dBm. It is essential to deploy the ROS technique to overcome the imbalanced learning issue of PS-16QAM. With the help of ROS, the receiver sensitivity of PS-16QAM signals equalized by TLD-ROS was improved by 1 dB at the FEC-HD BER threshold of 3.8 × 10⁻³, compared with 16QAM signals employing only the TLD equalizers.
10−Gbaud PS−16QAM signals, and the electrical spectrum when the received IF was (b) 13.5 GHz and (c) 21 GHz. Figure 9a illustrates the BER performance versus the input optical power for the PS-16QAM signal and the 16QAM signal, respectively. All nonlinear NN equalizers have the same neural structure [180-300-300-1], where 180 defines the memory length of the input layer. It is evident that when employing traditional DSP algorithms, PS-16QAM has a better BER performance than 16QAM, since it has a lower demand for SNR. However, linear CMMA and DD-LMS DSP algorithms are ineffective against the nonlinear problem. On the contrary, with the aid of TLD equalizers, the BER performances improve significantly. There is a difference in BER between 16QAM signals and PS-16QAM signals employing TLD equalizers. When the optical input power is less than −1 dBm, the BER performance of PS-16QAM is better than that of 16QAM due to the use of TLD equalizers, because PS-16QAM is better adapted to channel transmission when the SNR is low. Contrarily, when the optical input power is larger than −1 dBm, 16QAM outperforms PS-16QAM, because the greater input power meets the higher SNR requirement of 16QAM. Moreover, the outer ring constellation points of PS-16QAM signals are minority classes, which suffers inadequate training during the machine learning phase. Therefore, PS-16QAM may not perform as well as 16QAM when the optical input power is greater than −1 dBm. It is essential to deploy the ROS technique to overcome the imbalance learning issue of PS-16QAM. With the help of ROS, the receiver sensitivity of PS-16QAM signals equalized by TLD-ROS was improved, as 1 dB at a BER of the FEC-HD threshold of 3.8 × 10 −3 was compared with 16QAM signals only employing the TLD equalizers.
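As a concrete illustration, random oversampling simply repeats minority-class samples at random until every symbol class appears as often as the majority class. A minimal NumPy sketch of this balancing step (the class counts and symbols below are hypothetical, not taken from the experiment):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_oversample(symbols, labels):
    """Balance a labeled symbol set by randomly repeating minority-class
    samples until every class matches the majority-class count."""
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    out_x, out_y = [], []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        # draw with replacement up to the majority-class size
        pick = rng.choice(idx, size=target, replace=True)
        out_x.append(symbols[pick])
        out_y.append(labels[pick])
    return np.concatenate(out_x), np.concatenate(out_y)

# Toy PS-16QAM-like imbalance: inner-ring symbols dominate outer-ring ones.
labels = np.array([0] * 800 + [1] * 150 + [2] * 50)   # hypothetical counts
symbols = rng.normal(size=labels.size) + 1j * rng.normal(size=labels.size)
bal_x, bal_y = random_oversample(symbols, labels)
# every class now has 800 samples, so the training set is "flat"
```

The "perfect" ROS described above is the limiting case of this procedure: rather than repeating received symbols, the transmitter-side shaping itself is flattened so that all classes are equiprobable without duplication.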

Training Accuracy
For the implementation of DSP, following carrier phase recovery, we deployed various linear and nonlinear equalizers: a linear 21-tap CMMA equalizer combined with 223-tap DD-LMS; a 321-tap 2nd-order Volterra equalizer; a TLD equalizer with the structure [180-300-300-1], i.e., 180 units in the input layer, 300 cells in each hidden layer, and one neuron in the output layer; and a TLD equalizer combined with the ROS balancing method, with the PS-16QAM signal transmitted at 10 Gbaud. As shown in Figure 10a, the BER performance improved with greater optical power. Figure 10b-d shows that, as the optical power into the UTC-PD increased, the constellation diagrams recovered after DD-LMS became clearer, although they still could not be separated into 16 clusters. With the input optical power fixed at 0 dBm, the constellation diagrams recovered after the Volterra equalizer remained fuzzy and concentrated in four areas. The TLD equalizers outperformed the Volterra series in learning accuracy, separating the constellation points into 16 clusters and delivering better BER performance. The BER dropped further when we employed our proposed TLD-ROS, and the constellation diagram became clearer still. It is worth noting that, in this case, the PS-16QAM signal was first shaped into a "flat" 16QAM constellation and then used to train the neural network model, so that a "perfect" ROS was carried out, i.e., equiprobable symbols without repetition. We also compared the two TLD-ROS schemes: the BER results in Figure 9a show that the TLD-ROSQAM scheme performs better than TLD-ROSI/Q, with a relatively lower BER. This verifies that ROS is an effective algorithm for handling the imbalanced learning problem, and that ROSQAM in particular is well suited to imbalances caused by PAS. In terms of receiver sensitivity, the PS-16QAM signal using the TLD-ROSQAM equalizer showed a 1 dB gain over the typical TLD equalizer without balancing ROS at a BER of 3.8 × 10−3.
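For reference, each lane of the [180-300-300-1] structure can be sketched as a plain feed-forward pass. The random weights, ReLU activation, and the pairing of separate I and Q lanes are illustrative assumptions (the section does not specify activations or training details); a trained network would use learned weights:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def make_lane(sizes, rng):
    """One lane: a fully connected [180-300-300-1] regressor.
    Weights are random placeholders; in practice they are trained."""
    return [(rng.normal(scale=0.01, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def lane_forward(layers, x):
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:      # hidden layers: nonlinear activation
            x = relu(x)
    return x                         # output layer: linear

rng = np.random.default_rng(1)
sizes = [180, 300, 300, 1]           # input memory, two hidden layers, output
i_lane = make_lane(sizes, rng)       # one lane per signal component (assumed)
q_lane = make_lane(sizes, rng)

window = rng.normal(size=(1, 180))   # one 180-sample input window
i_out = lane_forward(i_lane, window)
q_out = lane_forward(q_lane, window)
symbol = i_out + 1j * q_out          # recovered complex symbol estimate
```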

Training Dataset Size
Next, we compared the BER curves of PS-16QAM versus different training set sizes of the TLD schemes in Figure 11, with the optical input power fixed at −1 dBm. A larger training dataset is beneficial to BER reduction, although at the expense of effective system capacity. The BER of PS-16QAM employing the TLD-ROSQAM equalizer performed best, dropping below 3.8 × 10−3 once the training set exceeded 11,000 samples. In contrast, the TLD-ROSI/Q equalizer required 1000 extra training samples, and the TLD equalizer 2000 extra. This means the ROS method not only ensures adequate training of the minority classes but also reduces the required number of training samples. Although ROS enlarges the training set by duplication, the demand for training capacity decreased. In the following discussion, the training dataset size was set to 11,000 samples.

Memory Size in Input Layer
Furthermore, we compared the BER performance of PS-16QAM signals at different optical input powers versus the memory size of the NN model. The BER performance improved significantly as the system's SNR increased. As seen in Figure 12, at low SNR (−3 dBm optical input power), the HD-FEC threshold of 3.8 × 10−3 could not be met until the memory size of the NN model exceeded 220, whereas at −1 dBm the threshold was reached with a memory size of only 180. BER performance continued to improve with larger memory sizes, at the expense of computational complexity. Beyond a memory size of 180, however, the BER curve decreased only slowly, since overfitting sets in when the NN has too many parameters. In light of these results, 180 is the preferred memory size for the NN model in this experiment.
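A memory size of 180 means each network input is a window of 180 consecutive received samples. A small sketch of how such windows can be built with vectorized indexing (the exact framing and any symbol alignment are illustrative assumptions, not the paper's pre-processing):

```python
import numpy as np

def sliding_windows(samples, memory):
    """Build NN input vectors: each row holds `memory` consecutive
    received samples around the symbol to be equalized."""
    n = len(samples) - memory + 1
    # index matrix: row i selects samples[i : i + memory]
    idx = np.arange(memory)[None, :] + np.arange(n)[:, None]
    return samples[idx]

rx = np.arange(1000, dtype=float)   # stand-in for received samples
X = sliding_windows(rx, memory=180)
# X.shape is (821, 180): 821 overlapping windows of 180 samples each
```

Larger `memory` values lengthen each input row, which grows the n0*n1 term of the network's complexity linearly; this is the cost traded against the BER gains discussed above.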

Neuron Number in Hidden Layers
Moreover, we further compared the BER performance at an input optical power of −1 dBm versus the numbers of neural cells n1 and n2 in hidden layer 1 and hidden layer 2, respectively. The numbers of neurons n0 and nM in the input layer and the output layer were 180 and 1, respectively, for all TLD schemes. Here, the green dots mark the points at which the BER is 3.8 × 10−3. As shown in Figure 13, for the TLD equalizer, the minimum data pair is 410-280. For the TLD-ROSI/Q equalizer, the minimum data pairs are 280-240 or 320-200, and 270-200 for the TLD-ROSQAM equalizer. Viewed horizontally along the axis in Figure 13, the BER surfaces show that, when n2 is fixed, the BER generally drops as n1 increases, with slight fluctuations. Likewise, viewed vertically, the BER surfaces show the same trend with increasing n2. The results imply that the BER performance can be improved by increasing n1 or n2. Furthermore, we compared the complexity of different TLD schemes in Table 1, where each n1-n2 data pair marks the hidden-layer parameters at which the BER reaches the HD-FEC threshold. We also listed the complexity of the optimal 2nd-order Volterra equalizer in the same table, despite its BER not reaching the HD-FEC threshold under the same conditions.

The complexity of a typical M-layer NN is described as C_MLP = n0n1 + n1n2 + n2nM [44], where ni denotes the number of nodes in the i-th layer, with n0 and nM defining the numbers of nodes in the input layer and output layer, respectively. Moreover, for a traditional Volterra equalizer, the total number of multiplications gives the following complexity measure: C(N, L) = Σ_{n=1}^{N} (L−1+n)!/[(n−1)!(L−1)!] [45], with N representing the Volterra order and L denoting the memory length. According to Table 1, the complexity of the optimal 2nd-order Volterra equalizer with 321 taps is close to that of the TLD-ROS equalizers, although its nonlinear compensation capability is far weaker. This further proves the superiority of the TLD-ROSQAM equalizer, which reaches the HD-FEC threshold with the simplest structure. In our tests, the Adam optimizer was used with a batch size of 128, and the network required 30 epochs to converge. In our proposed TLD-ROSQAM equalization scheme, the neural network is trained on abundant selectively oversampled data, so samples in the minority classes receive more training updates than those in the majority classes; meanwhile, the computational burden of the well-trained network does not increase in the prediction phase. Furthermore, the complexity can be decreased by reducing the number of neural cells, and we see a complexity reduction of 45.6% for TLD-ROSQAM compared with a typical TLD equalizer without balancing ROS.
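The two complexity formulas above can be evaluated directly. Plugging in the threshold-reaching hidden-layer pairs reported earlier (410-280 for TLD, 270-200 for TLD-ROSQAM) reproduces the 45.6% reduction; the helper function names are ours:

```python
from math import factorial

def mlp_complexity(nodes):
    """Multiplications of an M-layer fully connected NN:
    C_MLP = n0*n1 + n1*n2 + ... + n_{M-1}*n_M."""
    return sum(a * b for a, b in zip(nodes[:-1], nodes[1:]))

def volterra_complexity(N, L):
    """Total multiplications of an N-th order, memory-L Volterra equalizer:
    C(N, L) = sum_{n=1}^{N} (L-1+n)! / ((n-1)! (L-1)!)."""
    return sum(factorial(L - 1 + n) // (factorial(n - 1) * factorial(L - 1))
               for n in range(1, N + 1))

tld        = mlp_complexity([180, 410, 280, 1])   # typical TLD at threshold
tld_rosqam = mlp_complexity([180, 270, 200, 1])   # TLD-ROSQAM at threshold
volterra   = volterra_complexity(2, 321)          # optimal 2nd-order, 321 taps

reduction = 1 - tld_rosqam / tld   # ≈ 0.456, i.e., 45.6% fewer multiplications
```

The Volterra count (103,683) indeed lands close to the TLD-ROSQAM count (102,800), matching the observation that the two schemes have comparable complexity but very different nonlinear compensation capability.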

Conclusions
In this paper, we proposed a novel equalization scheme, applying a two-lane DNN combined with balancing ROS to improve the overall performance of W-band long-range wireless transmission. Its effectiveness was verified by the results of a PS-16QAM 4.6 km ROF delivery experiment. In our proposed nonlinear equalization scheme, we employed ROS before TLD classification to avoid the class imbalance problem introduced by the PS technique. The experiment demonstrates that the ROS algorithm can enhance system performance without increasing the complexity of the prediction phase. In terms of receiver sensitivity, the PS-16QAM signal using the TLD-ROSQAM equalizer showed a 1 dB gain over typical NN equalizers without balancing ROS at the HD-FEC threshold BER of 3.8 × 10−3. We compared the training accuracy, sample size requirement, and network structure complexity of the TLD, TLD-ROSI/Q, and TLD-ROSQAM schemes, and found that TLD-ROSQAM demands less sample capacity and a shallower network structure, enabling learning-imbalance mitigation. Moreover, we observed a complexity reduction of 45.6% for TLD-ROSQAM compared with a typical TLD equalizer without balancing ROS. Considering the actual wireless physical layer and its requirements, there is much to be gained from the joint use of deep learning and balanced data pre-processing techniques.

Data Availability Statement: The raw/processed data required to reproduce these findings cannot be shared at this time, as the data also form part of an ongoing study.

Conflicts of Interest:
The authors declare no conflict of interest.