Practical Demonstration of 5G NR Transport Over-Fiber System with Convolutional Neural Network

This study describes an experimental realization of digital predistortion (DPD) for a fifth-generation (5G) multiband new radio (NR) optical fronthaul (OFH) based on a Radio over Fiber (RoF) link. To enhance the performance and reduce the complexity of RoF links, a novel Convolutional Neural Network (CNN) based DPD technique is proposed and compared with the generalised memory polynomial (GMP) based DPD method. To support the enhanced mobile broadband scenario, the experimental testbed uses a 5G NR waveform at 10 GHz with 20 MHz bandwidth and a flexible-waveform signal at 3 GHz with 20 MHz bandwidth. A Mach Zehnder Modulator driven by two distinct radio frequency waveforms modulates a 1310 nm optical carrier from a distributed feedback laser, and the signal is transmitted over 10 km of standard single mode fiber. The experimental outcomes are described in terms of the error vector magnitude, the number of estimated coefficients, and the number of multiplications. The goal of the research is to determine whether CNN-based DPD improves performance while lowering complexity, so as to meet 3GPP Release 17 criteria.


Introduction
With major developments in the fifth generation (5G) and beyond, the radio access network (RAN) has been centralized (C-RAN) in response to the ever increasing number of base stations (BS) [1], which decreases capital expenditure by simplifying network management [2]. An optical fronthaul (OFH) connects the baseband units (BBUs) to the remote radio heads (RRHs) to facilitate C-RAN (see Figure 1). Because they extend the reach of wireless links for both short- and long-range applications, microwave photonics-based solutions such as Radio over Fiber (RoF) are of great significance for connecting the BBUs with the RRHs [1,2], with 5G now in the operational stage in most parts of the world [1,2].
In digital RoF (D-RoF) systems, the required analogue-to-digital converters (ADCs) and digital-to-analogue converters (DACs) make the process exceedingly costly. Furthermore, the common public radio interface (CPRI) imposes restrictions because of its high data rate requirement and limited bandwidth. Sigma-delta RoF (SD-RoF) can be used to bypass the CPRI bottleneck: because the method is based on sigma-delta modulation, which uses only 1-bit quantization, conventional ADCs and DACs are not needed. However, the method is difficult to implement and is therefore not recommended. Because the quantization noise is significant for 1 bit, an extra band pass filter (BPF) is required at the RRH. This added complexity is not the only issue for SD-RoF implementation; the BPF also introduces additional amplitude and phase noise, which necessitates a second method to remove these impairments [13].
Figure. Block diagram of the RoF system with CNN-based DPD. Transmitting the input and output across the general RoF link yields the RoF NN model $\hat{I}$. The error is then backpropagated through $\hat{I}$ to train $\hat{I}^{-1}$. The resulting DPD model is connected to the RoF link to linearize it. DPD is performed in the digital baseband, which eliminates the need for DACs and ADCs. The construction and training of the CNN are shown in Algorithm 1.
It is clear that the alternative techniques (D-RoF/SD-RoF) consume considerable time, resources and power. As a result, analog RoF (A-RoF) systems are the preferable alternative for the optical fronthaul (OFH) due to their legacy, infrastructure, and cost-effectiveness. Mitigating nonlinearities in RoF transmission is critical for maximising the system's potential and has attracted considerable attention. A variety of strategies have been employed to address these challenges; the most widely used are reviewed in Section 2.
Laser nonlinearities become more significant as transmission quality diminishes. In long-range networks, however, nonlinearities caused by the combination of fiber dispersion and laser chirp are frequently the primary cause of impairment and degradation [7]. Orthogonal Frequency Division Multiplexing (OFDM) signals, such as the 5G waveforms considered here, are also susceptible to these distortions because of their high peak-to-average power ratio (PAPR).
To the best of the authors' knowledge, this study introduces compensation of nonlinear behavior and signal degradation for OFH-based RoF systems carrying 5G NR signals. The contributions of this study, established after the literature assessment on nonlinearity mitigation in Section 2, are as follows:
1. In the experimental testbed, multiband 5G NR signals are used to cover enhanced mobile broadband (eMBB) scenarios and small cells at 3 GHz and 10 GHz, respectively.
2. A DPD method based on a Convolutional Neural Network (CNN) is proposed and demonstrated. The proposed DPD identification approach achieves better performance and lower complexity than our previous machine learning approach.
3. A simple CNN-based DPD algorithm is proposed as an upgrade to our previously published deep learning based DPD technique for 20 MHz 5G New Radio (NR) RoF links. The CNN DPD technique is trained with a new procedure that does not use the Indirect Learning Architecture (ILA): an RoF CNN first models the generic RoF link, and this model is then used to train the proposed DPD CNN by backpropagating the errors.
4. A comparative experimental investigation was conducted in which the previously proposed ILA-based GMP method was benchmarked against the CNN technique using a 5G NR multiband signal. Performance is evaluated using the Error Vector Magnitude (EVM), and complexity is measured by the numbers of multiplications and coefficients required.

Literature Review
This section reviews the linearization methodologies that have been proposed for RoF systems, with particular attention to Machine Learning (ML) methods. Linearization of the OFH has been a significant research area, as summarized in Table 1. The drive for better linearization and higher performance has shifted the focus towards ML, which is the core subject of this work. Table 1 presents a thorough review of the literature on reducing RoF system impairments; it outlines the method used, the type of linearization, the category, the parameters examined, and the benefits and drawbacks of each method.

Convolutional Neural Network Architecture
Convolutional Neural Networks (CNNs) are advanced networks that, like the proposed NN-based DPD model, require a large amount of training data. The DPD model is cascaded with the RoF link, but the ideal output of the DPD model is unknown. Because the output of the RoF link is known, however, we create an RoF NN model and train it to mimic the original RoF link. Once this RoF CNN has been trained, we can backpropagate through it to adjust the parameters of the proposed CNN DPD model.
The convolutional layer is the first layer of a CNN. Using a kernel, it extracts features from the input data and outputs a feature map (the convolved data).
A convolutional operation thus involves three components: the kernel or filter (kernel matrix), the input data (input matrix), and the feature map.
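The relationship between input, kernel and feature map can be illustrated with a minimal one-dimensional sketch. The sample values and the 3-tap kernel below are purely illustrative assumptions and are not taken from the experiment; as in most CNN layers, the kernel is applied without flipping.

```python
def conv1d_valid(signal, kernel):
    """Slide `kernel` over `signal` (stride 1, no padding) and return the feature map."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# Hypothetical input samples and kernel (illustrative values only).
samples = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 0.0, -1.0]

feature_map = conv1d_valid(samples, kernel)
print(feature_map)  # one output per valid kernel position
```

With five input samples and a three-tap kernel, the feature map has three entries, one for each position at which the kernel fits entirely inside the input.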
Assuming the examined RoF link has the transfer function $H(n)$ and output signal $y(n)$, and that a baseband signal $x(n)$ must be delivered over it, where $H(n) = y(n)/x(n)$, DPD tries to calculate the inverse transfer function of this RoF link, represented by $\hat{I}^{-1}$, whose output is denoted by $\hat{x}(n)$.
This can be written as:
$$\hat{x}(n) = \hat{I}^{-1}\big(x(n)\big), \qquad y(n) = H(n)\,\hat{x}(n) \approx G\,x(n)$$
where $G$ represents the gain. The CNN finds the $\hat{I}^{-1}$ used for predistortion. The DPD CNN cannot be trained directly because the ideal $\hat{x}(n)$ is not known, as illustrated in Figure 2. Instead, a second CNN first models the RoF link, taking $\hat{x}(n)$ as input and $y(n)/G$ as output for a generic RoF link. Through this training, the CNN learns an approximate transfer function $\hat{I}$. Once the RoF CNN model has been trained, its weights are fixed and it is connected to the CNN DPD model. The original input $x(n)$ and the output of the cascade are then used as training data to compute the error with a loss function, and this error is backpropagated through $\hat{I}$ to train $\hat{I}^{-1}$.

CNN Model Salient Features
This section reviews, one by one, the characteristics of a CNN that are required for use in a DPD-based RoF system. These characteristics form the foundation for a successful implementation of the CNN.

Optimizer
The Adam algorithm combines the Momentum and RMSProp methods: it keeps an exponentially decaying average of past gradients as well as an exponentially decaying average of past squared gradients. Because it combines momentum with per-parameter adaptive step sizes, Adam converges quickly, which is why it is among the most widely used optimizers.
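A minimal sketch of the Adam update for a single scalar parameter is given below. The hyperparameters are the commonly quoted defaults apart from the learning rate, and the quadratic test function is a hypothetical stand-in for the DPD loss; none of these values come from the experiment.

```python
import math


def adam_minimize(grad, w0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a scalar function given its gradient `grad`, starting from `w0`."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g        # decaying average of gradients
        v = beta2 * v + (1.0 - beta2) * g * g    # decaying average of squared gradients
        m_hat = m / (1.0 - beta1 ** t)           # bias correction for the early steps
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


# Hypothetical loss (w - 3)^2, whose gradient is 2 * (w - 3).
w_opt = adam_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_opt)
```

The first-moment average supplies the momentum, while dividing by the root of the second-moment average scales the step for each parameter.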

Activation Functions
A node's response is not defined a priori; the activation function determines it. It does so by applying a function to the weighted sum of the node's inputs plus its bias, thereby generating the node's response, and it allows the network to learn complex data patterns. Activation functions convert a node's incoming signals into an output signal that is either passed to the next layer of the network or forms the output. The gradient of a sigmoid shrinks as the absolute value of its input increases, so the chance of a vanishing gradient is lower with ReLU. ReLU has a constant gradient for positive inputs, which aids learning and makes it the better choice.
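The difference in gradient behaviour can be checked numerically. The sketch below evaluates the derivative of the sigmoid and of ReLU at a few illustrative input values.

```python
import math


def sigmoid_grad(z):
    """Derivative of the logistic sigmoid at z."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def relu_grad(z):
    """Derivative of ReLU at z (taken as 0 for non-positive inputs)."""
    return 1.0 if z > 0 else 0.0


for z in (0.0, 2.0, 5.0, 10.0):
    print(z, sigmoid_grad(z), relu_grad(z))
```

The sigmoid derivative peaks at 0.25 for a zero input and decays rapidly as the input grows, whereas the ReLU derivative stays at 1 for every positive input.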

Regularization
CNNs with a large number of learnable parameters and sparse or noisy training data are prone to overfitting. Overfitting occurs when a model improves its performance on the training data but fails to generalize to fresh test instances from the same problem domain. We employed an L2 regularization approach to avoid this problem.
In this method, a regularization term is added to the cost function. This term shrinks the values of the weight matrices, which results in a simpler model and greatly reduces overfitting.
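The shrinking effect of the regularization term can be sketched as follows, assuming a penalty of the form λ·Σw² added to the loss (scaling conventions differ between references, so this form is an assumption). The weights, learning rate and λ are hypothetical, and the data gradient is set to zero to isolate the decay.

```python
def l2_penalty(weights, lam):
    """Regularization term lam * sum(w^2) added to the cost."""
    return lam * sum(w * w for w in weights)


def sgd_step_with_l2(weights, grads, lr, lam):
    """One gradient step on loss + lam * sum(w^2); the penalty gradient is 2 * lam * w."""
    return [w - lr * (g + 2.0 * lam * w) for w, g in zip(weights, grads)]


# Hypothetical weights; zero data gradient so that only the decay acts.
weights = [1.0, -2.0, 0.5]
zero_grads = [0.0, 0.0, 0.0]
for _ in range(10):
    weights = sgd_step_with_l2(weights, zero_grads, lr=0.1, lam=0.5)

print(weights, l2_penalty(weights, 0.5))
```

Each step multiplies every weight by the same factor below one, so the weights decay geometrically towards zero without ever reaching it.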
In L2, we have:
$$\text{Cost} = \text{Loss} + \lambda \sum_{i} w_i^2$$
where $w$ is the weight and $\lambda$ is the regularization parameter (which is tuned for more exact results). The L2 method is known as weight decay because it forces the weights to decay towards zero, though they never reach zero exactly. Having established the components of a CNN, we now describe the DPD CNN, which is used to predistort the RoF system, and the replicated RoF CNN, which is needed for training this DPD CNN. The CNN used here is a feedforward fully connected network with N hidden layers and K neurons per hidden layer. Figure 3 depicts the symbolic structure of the employed CNN. Because baseband signals are complex-valued, with real and imaginary components, the CNN has two inputs and two outputs. At least one of the hidden layers is activated using the ReLU function (owing to its lower complexity).
The output of the first hidden layer is:
$$l_1 = f(W_1 l_0 + b_1)$$
where $l_1$ represents the output of the first hidden layer, $f$ is the nonlinear activation function, $W_1$ and $b_1$ are the weight and bias of the first layer, and $l_0$ denotes the two-element network input. The general output of the $i$-th layer is:
$$l_i = f(W_i l_{i-1} + b_i), \quad i \in \mathbb{N} : 2 \le i \le N$$
After $N$ hidden layers, the final output is:
$$l_{N+1} = W_{N+1} l_N + b_{N+1}$$
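The layer-by-layer computation can be sketched as a plain forward pass through a fully connected network with two inputs (real and imaginary parts) and two outputs. The linear output layer, the weights, and the choice N = 2, K = 3 are illustrative assumptions, not the values used in the experiment.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def affine(W, b, v):
    """Compute W v + b for a weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_i for row, b_i in zip(W, b)]


def forward(iq, hidden, output):
    """iq: [real, imag]; hidden: list of (W, b) pairs; output: a single (W, b) pair."""
    layer = iq
    for W, b in hidden:
        layer = relu(affine(W, b, layer))   # hidden layers with ReLU activation
    W_out, b_out = output
    return affine(W_out, b_out, layer)      # output layer, assumed linear here


# Toy network with N = 2 hidden layers and K = 3 neurons (hypothetical weights).
hidden = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]),
    ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 0.0]),
]
output = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0])

x = complex(0.5, -0.25)                     # one complex baseband sample
out = forward([x.real, x.imag], hidden, output)
print(out)
```

The complex sample is split into its real and imaginary parts before entering the network, and the two outputs can be recombined into a complex sample afterwards.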

Training Algorithm
The CNN DPD model was trained using the following algorithm. Mean Square Error (MSE) is employed as the loss function, Adam is used for optimization, ReLU is used as the activation function, and backpropagation is used to update the weights. To improve the performance, Z iterations are used.
As previously stated, we use the input and output of the original RoF link to train our emulated RoF CNN model; once this model is obtained, we connect the CNN DPD model to it and train the DPD model. Once this training is complete, we attach the actual RoF link to the DPD CNN and begin the predistortion process (see pseudocode in Algorithm 1). Algorithm 1 shows that the CNN is first built on $I(x(n))$ with updates to $\hat{I}$, followed by predistortion using $\hat{I}^{-1}(x(n))$, as shown in Figure 2.
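The order of the training steps can be sketched with a deliberately simplified stand-in. In the sketch below, two-coefficient odd polynomials replace both CNNs, plain gradient descent replaces Adam, and the memoryless `link` function is a hypothetical compressive nonlinearity; only the sequence (fit the link model, freeze it, train the predistorter through it) mirrors the procedure described above.

```python
def link(x, gain=2.0):
    """Hypothetical memoryless RoF link: linear gain with cubic compression."""
    return gain * (x - 0.1 * x ** 3)


def poly(c, x):
    """Stand-in model c[0] * x + c[1] * x^3 (used in place of a CNN)."""
    return c[0] * x + c[1] * x ** 3


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


GAIN, LR, Z = 2.0, 0.1, 2000                  # Z iterations, illustrative values
xs = [i / 20.0 - 1.0 for i in range(41)]      # training inputs in [-1, 1]
ys = [link(x, GAIN) / GAIN for x in xs]       # gain-normalised link outputs

# Stage 1: fit the link model i_hat on (input, output / G) pairs.
i_hat = [1.0, 0.0]
for _ in range(Z):
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        e = poly(i_hat, x) - y
        g0 += 2.0 * e * x
        g1 += 2.0 * e * x ** 3
    i_hat = [i_hat[0] - LR * g0 / len(xs), i_hat[1] - LR * g1 / len(xs)]

# Stage 2: freeze i_hat and train the predistorter so that i_hat(dpd(x)) ~ x.
dpd = [1.0, 0.0]
for _ in range(Z):
    g0 = g1 = 0.0
    for x in xs:
        u = poly(dpd, x)                                # predistorted sample
        e = poly(i_hat, u) - x                          # error at the model output
        back = i_hat[0] + 3.0 * i_hat[1] * u ** 2       # backpropagation through frozen i_hat
        g0 += 2.0 * e * back * x
        g1 += 2.0 * e * back * x ** 3
    dpd = [dpd[0] - LR * g0 / len(xs), dpd[1] - LR * g1 / len(xs)]

# Stage 3: place the predistorter in front of the actual link.
mse_without = mse([link(x, GAIN) / GAIN for x in xs], xs)
mse_with = mse([link(poly(dpd, x), GAIN) / GAIN for x in xs], xs)
print(mse_without, mse_with)
```

Only the predistorter coefficients are updated in the second stage; the link model acts purely as a differentiable path for the error.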

Experimental Setup
The technique was validated using the multiband 5G NR scenario at 3 GHz (20 MHz) and 10 GHz (20 MHz) described in our prior work [54], in which no DPD was used. As an upgrade to that architecture, a multiband DPD block has been added to improve the performance of the link. The system depicted in Figure 11 uses a dual-drive Mach Zehnder Modulator (MZM) driven by two separate RF signals and a 1310 nm DFB laser. The Vector Signal Generator (VSG) labelled VSG1 transmits RF1, a 5G NR waveform at 10 GHz, while the 5G transceiver transmits RF2, a 3 GHz flexible-waveform signal. The DPD process can be broken down into three phases.
In the first phase, the signals are up-converted to their respective carrier frequencies of 3 GHz and 10 GHz, transmitted through 10 km of Standard Single-Mode Fiber (SSMF), and detected by a photodetector (0.71 A/W) that converts them back to the electrical domain. An amplification step is introduced because the bands need to be separated independently. The 10 GHz and 3 GHz signals are separated using a diplexer (DPX) and then sent to separate vector signal analyzers (VSAs). For performance evaluation, each VSA output is passed to the post-processing block. This phase is carried out without DPD, so the output is evaluated without any predistortion.
The DPD method represented in Figure 4 is used in the second phase, referred to as the DPD training phase, in which training continues until the error converges.
In other words, the DPD inverts the amplitude and phase responses obtained at the electrical amplifiers EA1 and EA2. CNN methods can be used depending on the user's needs and comparative requirements.
We use the positioning reference signal (PRS) defined in the 5G NR architecture to synchronize the received waveforms (both input and output). The PRS is assumed to have a bandwidth of 20 MHz (106 resource blocks). In the time domain, the received and reference signals are correlated, and the resulting power delay profile (PDP) is passed through a maximum block to determine the strongest path of arrival.
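The correlate-then-pick-the-maximum step can be sketched as follows. The short reference sequence is a hypothetical stand-in for the PRS, which is not generated here, and the delay and scaling are illustrative.

```python
def cross_correlate(received, reference):
    """Magnitude of the sliding correlation of `reference` against `received`."""
    n_ref = len(reference)
    profile = []
    for lag in range(len(received) - n_ref + 1):
        acc = sum(received[lag + k] * reference[k].conjugate() for k in range(n_ref))
        profile.append(abs(acc))
    return profile


def strongest_path_delay(received, reference):
    """Return the lag at which the correlation profile is largest."""
    profile = cross_correlate(received, reference)
    return max(range(len(profile)), key=profile.__getitem__)


# Hypothetical reference sequence (stand-in for the PRS), delayed and attenuated.
reference = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
true_delay = 5
received = [0j] * true_delay + [0.8 * s for s in reference] + [0j] * 4

estimated_delay = strongest_path_delay(received, reference)
print(estimated_delay)
```

Once the delay is known, the received waveform can be shifted by that number of samples so that the input and output are aligned sample by sample before training.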
In the third phase, the baseband signals are fed into the DPD block, where they are predistorted, and are then upconverted to their carrier frequencies by their respective VSGs before being fed onto the optical link. The signal obtained at the photodiode passes through the diplexer (DPX), which isolates the individual bands before they are sent to the DPD training step. During the DPD validation step, the switches are flipped in the opposite direction. The 5G NR frames are evaluated by predistorting them and then transferring them to the VSG. Because the nonlinearities of the RoF link change only slowly, owing to thermal effects and component ageing, we conclude that real-time adaptation is unnecessary. Table 2 summarizes the parameters employed, which have previously been used in [1,2] and other state-of-the-art studies [54,55].
Table 3 lists the parameters that define the architecture of the intended Convolutional Neural Network. The parameter selection is done by trial and error [1]. The final section of the table assesses the NN's complexity by computing its expressions in terms of its coefficients.

Results and Discussion
The results for the experimental setup outlined in the previous section are discussed here. The Mean Square Error (MSE) is one measure of the accuracy of coefficient estimation for the various architectures. Results for the GMP and Magnitude Selective Affine (MSA) DPD methods from our previous work are compared with the proposed CNN method. When no DPD is used, the MSE is 27 dB, while GMP has 30 dB. For Canonical Piecewise Linearization (CPWL) and MSA, the value is 35 dB, whereas CNN has an MSE of 39 dB.
The proposed approach is also evaluated in terms of Error Vector Magnitude (EVM). For this comparison, only the GMP method from our recent work [2] is used as the baseline against the proposed CNN method.

Error Vector Magnitude
Error Vector Magnitude (EVM) is the performance indicator most commonly used in 3GPP for this kind of evaluation. EVM measures the difference between the expected value of a demodulated symbol and the value actually received. EVM can be expressed as [5,55]:
$$\mathrm{EVM}(\%) = \sqrt{\frac{\frac{1}{M}\sum_{m=1}^{M} \left|S_m - S_{0,m}\right|^2}{\frac{1}{M}\sum_{m=1}^{M} \left|S_{0,m}\right|^2}} \times 100$$
where $M$ is the number of constellation symbols, $S_m$ denotes the received symbol with index $m$, and $S_{0,m}$ is the ideal constellation symbol associated with $S_m$. The EVM limit for 3GPP using 256 QAM is 3.5% [56]. Figure 5a shows the EVM for the flexible waveforms in three cases: without DPD, with the GMP DPD method, and with the CNN DPD method. The CNN DPD method yields a greater EVM reduction than GMP for all the flexible waveforms. Similarly, Figure 5b shows that, as the RF input power is varied, the proposed CNN method reduces the EVM considerably more than the GMP method. At a high RF input power of 5 dBm, the CNN DPD reduces the EVM to approximately 2% while GMP gives an EVM of around 5%; the EVM reduction with the CNN method is thus about 9%, which brings the performance within the limits set by 3GPP.
Telecom 2022, 3
Figure 5. EVM performance comparison: (a) CNN DPD and no-DPD cases for flexible 5G NR waveforms; (b) proposed CNN DPD method versus the GMP method and no DPD for varying RF input power.
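An EVM figure of this kind can be computed with the short sketch below, which assumes the common RMS definition normalised by the power of the reference constellation. The four QPSK-like symbols and the fixed offset are hypothetical test data, not measured results.

```python
import math


def evm_percent(received, ideal):
    """RMS error vector magnitude in percent, normalised by the reference power."""
    err = sum(abs(r - s) ** 2 for r, s in zip(received, ideal))
    ref = sum(abs(s) ** 2 for s in ideal)
    return 100.0 * math.sqrt(err / ref)


# Hypothetical ideal symbols and a received copy with a small fixed offset.
ideal = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
received = [s + 0.03 for s in ideal]

evm = evm_percent(received, ideal)
print(round(evm, 2), evm < 3.5)   # compare against the 3.5% limit for 256 QAM
```

The same function applies unchanged to larger constellations, provided each received symbol is paired with its ideal reference symbol.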

Complexity Considerations
By lowering complexity while attaining performance equivalent to the DNN approach outlined in prior work [2], CNN-DPD achieves a significant improvement. Table 4 shows the complexity of building a DPD model, which is mostly determined by the number of real multipliers required, as multipliers use the majority of the hardware resources. MSA DPD (220 multiplications) is significantly less complex than CPWL (880 multiplications), as seen in Table 4. More complex models can be constructed by increasing the memory depth Q and nonlinearity order K in the Volterra series, but the computational complexity must be considered. The choice of DPD model therefore involves a trade-off between complexity and performance.
Table 4 summarizes the complexities of the CNN and Volterra techniques. CNN complexity is a challenging problem; however, as shown in this work, it can be reduced by employing small values of N and K.
Table 4. Comparisons of complexity.
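How N and K drive the cost can be illustrated by counting the real multiplications and coefficients of a fully connected network with two inputs and two outputs. This is a sketch that counts only the weight products and ignores the activations; it is not the complexity expression from Table 3, and the values N = 2 and K = 10 are hypothetical.

```python
def dense_multiplications(n_inputs, n_hidden_layers, n_neurons, n_outputs):
    """Real multiplications per sample for a fully connected network."""
    first = n_inputs * n_neurons                            # input to first hidden layer
    middle = (n_hidden_layers - 1) * n_neurons * n_neurons  # between hidden layers
    last = n_neurons * n_outputs                            # last hidden layer to output
    return first + middle + last


def dense_coefficients(n_inputs, n_hidden_layers, n_neurons, n_outputs):
    """Number of trainable coefficients (weights plus biases)."""
    weights = dense_multiplications(n_inputs, n_hidden_layers, n_neurons, n_outputs)
    biases = n_hidden_layers * n_neurons + n_outputs
    return weights + biases


# Hypothetical sizes: N = 2 hidden layers with K = 10 neurons each.
print(dense_multiplications(2, 2, 10, 2), dense_coefficients(2, 2, 10, 2))
```

Because the middle term grows with the square of K, reducing the number of neurons per layer lowers the multiplication count faster than removing a layer does.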
Table 5 summarizes the MSE and EVM values at 5 dBm for the proposed method.

Conclusions
This paper describes the effective implementation of a 5G NR multiband OFH utilising Convolutional Neural Networks to reduce RoF nonlinearities. A novel CNN approach was developed that not only has lower complexity than conventional machine learning methods, with a 75% decrease in complexity overheads and multiplications, but also performs better. The theoretical foundations and components needed to design a neural network are also explained. 5G New Radio multiband signals at 3 GHz and 10 GHz are transmitted over a distance of 10 km. The proposed CNN-DPD approach reduces EVM by 8% to 2.1%, bringing the performance within 3GPP limits.