Bandwidth-Scalable Digital Predistortion of Active Phased Array Using Transfer Learning Neural Network

This paper proposes a transfer learning neural network (TLNN) approach for digital predistortion (DPD) of mmWave active phased arrays (APAs) operating under variable signal bandwidth. Compared with the conventional artificial neural network (ANN) method, the proposed approach achieves similar linearization performance at much lower computational complexity by transferring part of a model trained at one bandwidth to another bandwidth. In 5G, the increased signal bandwidth triggers considerable memory effects in the APA. Moreover, handling different signal bandwidths typically requires a time-consuming recalculation of the predistorter parameters. In this paper, the authors propose to address these challenges with a DPD model based on transfer learning. The proposed approach was validated with over-the-air (OTA) measurements on an APA excited with signals of varying bandwidth, from 20 MHz to 100 MHz. Experimental results show a significant reduction in training time while maintaining good linearization performance. With the applied TLNN DPD, an 8.5 dB improvement in adjacent channel leakage ratio (ACLR) and an 8.6 % points improvement in error vector magnitude (EVM) are achieved. Under the variable bandwidth regime, the complexity of the DPD model to be identified, in terms of the number of multiplications, is reduced from 199168 to 160. The proposed TLNN DPD proved to be robust to variations in the bandwidth of the APA excitation signal.


I. INTRODUCTION
Active phased array (APA) transmitters with multiple antennas operating at mmWave frequencies, as used in recent wireless communication systems, face new challenges in the form of wide bandwidth, strong nonlinearity, mutual coupling between antennas, and dynamic changes of the bandwidth. Digital predistortion (DPD) techniques based on conventional methods cannot easily handle these challenges without increasing the computational complexity. In addition to the wide bandwidth, 5G has introduced dynamic bandwidth selection, which requires the mobile transmitter to adapt quickly to different operating conditions. Dynamic bandwidth selection, together with the impact of the transmission channel, makes it highly important to reuse the parameters adjusted for calibration, linearization, etc. [1]. The transmission quality of the communication system depends to a high degree on how well it can dynamically change the bandwidth and power level with a minimum penalty in speed and cost. The state-of-the-art (SoA) DPD systems deployed by the industry perform excellently under relatively steady conditions, where the bandwidth and power do not change rapidly. When transmission parameters and the environment change rapidly, the existing DPD methods need to update a huge number of coefficients, which can make the system complex and slow. Artificial neural networks (ANNs) have been widely used in modeling nonlinear devices because of their good ability to approximate nonlinear functions [2], [3]. For wide bandwidth signals in particular, memory effects have a significant impact.

The associate editor coordinating the review of this manuscript and approving it for publication was Amjad Ali.

VOLUME 11, 2023
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
There are generally two dynamic neural network structures for handling memory effects [4]. The first, the recurrent neural network (RNN), uses both feed-forward and feedback signal processing with output-to-input time-delay lines. The second, the time-delay neural network (TDNN), combines I/Q processing with input time-delay lines to handle memory effects. To extract amplitude and phase information from modulated complex waveforms, ANNs must operate either with complex-valued (CV) input signals, weights and activation outputs, or as real-valued (RV) double-input double-output networks (with real weights and activation outputs), i.e., with the signals represented as multiple I and Q components. CV operation leads to heavy calculations and a longer training phase [5]; therefore, the model proposed in this work uses the RV concept. Real-valued time-delay neural networks (RVTDNNs) offer superior performance and easy baseband implementation when used for inverse modeling of PAs with strong nonlinearities and memory effects [6].
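To see why a real-valued network can still represent the amplitude and phase changes of a complex signal, the sketch below uses a standard identity: multiplying a complex sample by a complex gain is equivalent to applying a real 2 × 2 matrix to its [I, Q] pair. This is a generic mathematical illustration, not code from this work, and the helper names are hypothetical.

```python
import numpy as np

def complex_to_iq(z):
    """Split complex baseband samples into a real (..., 2) array of [I, Q]."""
    z = np.asarray(z, dtype=complex)
    return np.stack([z.real, z.imag], axis=-1)

def iq_to_complex(iq):
    """Recombine a real (..., 2) array of [I, Q] into complex samples."""
    iq = np.asarray(iq, dtype=float)
    return iq[..., 0] + 1j * iq[..., 1]

def complex_gain_as_real_matrix(g):
    """Real 2x2 matrix W such that W @ [I, Q] equals [Re(g*z), Im(g*z)]."""
    return np.array([[g.real, -g.imag],
                     [g.imag,  g.real]])

# A complex gain (amplitude scaling plus phase rotation) applied to one sample
g = 0.8 * np.exp(1j * np.pi / 6)
z = 0.3 - 0.5j
W = complex_gain_as_real_matrix(g)
out_rv = W @ complex_to_iq(z)   # real-valued double-input double-output
out_cv = g * z                  # complex-valued reference
print(np.allclose(iq_to_complex(out_rv), out_cv))  # True
```

A real weight matrix of this particular structure reproduces a complex multiplication exactly; an RV network with unconstrained real weights is more general, which is one reason the RV formulation is attractive for I/Q modeling.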
However, as the bandwidth and nonlinearity increase, the RVTDNN requires a higher input dimension, i.e., a larger number of I/Q samples, and more hidden layers, which make the model slow. Several works based on transfer learning have been introduced to cope with these challenges [6], [7]. The study of transfer learning is motivated by the fact that knowledge learned previously can be applied intelligently to solve new problems faster or with better solutions [8]. Similar problems arise in other dense ANNs with several layers and neurons, such as those used in image recognition [9] and channel estimation [10], [11]. In these works, transfer learning enables rapid image recognition and channel estimation by leveraging prior knowledge. Inspired by these works, this paper investigates transfer learning DPD for bandwidth-scalable APAs. Fig. 1 shows the block diagram of the proposed transfer learning neural network (TLNN) linearization technique. Part of the narrow-bandwidth model obtained from previous training is transferred and combined with fine-tuning layers to form the new model for the wide bandwidth.
This paper is organized as follows. Section I is the introduction. Section II presents the proposed linearization method. The measurement setup is described in Section III. The optimization of the pre-designed model and the reference model is described in Section IV. Section V covers the transfer learning implementation. Bandwidth-scalable predistortion results are shown in Section VI, and finally, the conclusion of this work is presented in Section VII.

II. PROPOSED TLNN LINEARIZATION METHOD
This section describes the selected model for linearization, the data structure and architecture of the model together with a complexity analysis of the proposed neural network.

A. SISO MODEL FOR TLNN-BASED LINEARIZATION
Several modified DPD algorithms have been introduced to combat the challenges raised by the recently introduced APA-based hardware configurations for 5G mmWave transmitters [12], [13], [14]. A single-input single-output (SISO) model, in which the entire transmitter is considered as a two-port system observed by a far-field observation receiver, has been presented in [15], [16], and [17]. A DPD technique based on the memory polynomial model (MPM) and this SISO model has been used to linearize the antenna array in the presence of crosstalk. It has been shown that the trained DPD is able to mitigate the impact of crosstalk at the PA outputs, also called load modulation, within a limited range of steering angles. The step size for reusing the trained model depends on the target linearity specification and on the amount of coupling among the branches of the APA, which in turn depends on the size of the array and the distance between the patches [18]. In [19], potential mismatches between the PAs are compensated so that they all exhibit the same behavior. In this way, linearization in all directions can be achieved with a single DPD, in contrast to linearizing the main beam only. However, this approach requires analog circuits to compensate for the mismatch in each branch, which may introduce high complexity and delay for large arrays and must also cope with potential changes in the PAs' behavior due to crosstalk. In the present work, based on the SISO model, the reference signal for DPD identification is obtained through far-field measurements with an observation antenna placed in the main beam direction (Fig. 1), and the focus is on the challenges related to wide bandwidth and dynamic bandwidth behavior.
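For reference, the generic memory polynomial model mentioned above expresses the output as a sum of terms x(n−q)|x(n−q)|^(k−1) weighted by complex coefficients, which can be identified by linear least squares. The sketch below shows only this textbook form; the order K, memory depth Q and the synthetic system used for the demo are illustrative assumptions and not the configuration used in [15], [16], [17]. In an indirect-learning DPD, the same fit would typically be run with the PA output as x and the PA input as y.

```python
import numpy as np

def mpm_basis(x, K, Q):
    """Memory polynomial basis matrix with columns x(n-q) * |x(n-q)|**(k-1).

    k = 1..K is the nonlinearity order and q = 0..Q the memory tap;
    samples before n = 0 are taken as zero.
    """
    x = np.asarray(x, dtype=complex)
    N = len(x)
    cols = []
    for q in range(Q + 1):
        xq = np.concatenate([np.zeros(q, dtype=complex), x[:N - q]])  # x(n - q)
        for k in range(1, K + 1):
            cols.append(xq * np.abs(xq) ** (k - 1))
    return np.column_stack(cols)

def fit_mpm(x, y, K, Q):
    """Least-squares estimate of the MPM coefficients that map x to y."""
    coeffs, *_ = np.linalg.lstsq(mpm_basis(x, K, Q), y, rcond=None)
    return coeffs

# Illustrative system: linear and third-order terms with one memory tap
rng = np.random.default_rng(0)
x = 0.5 * (rng.standard_normal(2000) + 1j * rng.standard_normal(2000)) / np.sqrt(2)
true_coeffs = np.array([1.0, 0.0, -0.15 + 0.05j, 0.1, 0.0, -0.02j])
y = mpm_basis(x, K=3, Q=1) @ true_coeffs
est = fit_mpm(x, y, K=3, Q=1)
print(np.allclose(est, true_coeffs))  # True
```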

B. DATA STRUCTURE OF THE MODEL
The data structure of the exploited TLNN is shown in Fig. 2, where y_I(n) and y_Q(n) are the I/Q components of the input to the ANN, and x_I(n) and x_Q(n) are the I/Q components of the output of the network. The data format of the source and target datasets is the same: the input consists of the current and delayed samples y_I(n), y_I(n−1), … and y_Q(n), y_Q(n−1), …, and the output consists of x_I(n) and x_Q(n), where M denotes the number of delay lines at the input of the network. The training procedure is as follows. A source dataset, e.g., measured I/Q samples of a 5G signal with a 20 MHz channel bandwidth, is used for offline training. Part of the network is then used as a transfer learning model for the target dataset, which is a 5G signal with the same or a different channel bandwidth. As illustrated in Fig. 2, the first k layers of the model, FC_k, extract the nonlinear characteristics of the APA in the low-bandwidth case and are frozen after offline training. The output of the frozen layers, T_n, is obtained by applying f_frozen(·), the function representing the frozen layers, to the network input. The block diagram in Fig. 2 represents a generic implementation of the TL concept.
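A minimal sketch of how delayed I/Q samples could be arranged into network inputs and targets is given below. It assumes M taps per component, i.e. 2M real inputs per time step (consistent with the 2MP input-layer multiplications counted in Section II-D), zero-padding before the first sample, and an [I taps, Q taps] ordering; the ordering and the function name are illustrative assumptions.

```python
import numpy as np

def build_tdnn_dataset(y, x, M):
    """Arrange complex baseband data into RVTDNN inputs and targets.

    y : complex samples fed to the network (e.g. APA output observed OTA)
    x : complex target samples (e.g. APA input), same length as y
    M : taps per I/Q component (current sample plus M-1 delays)

    Returns inputs (N, 2M) = [y_I(n)..y_I(n-M+1), y_Q(n)..y_Q(n-M+1)]
    and targets (N, 2) = [x_I(n), x_Q(n)].
    """
    y = np.asarray(y, dtype=complex)
    x = np.asarray(x, dtype=complex)
    N = len(y)
    padded = np.concatenate([np.zeros(M - 1, dtype=complex), y])
    # Column m holds y(n - m) for every n (zero before the first sample)
    taps = np.stack([padded[M - 1 - m: M - 1 - m + N] for m in range(M)], axis=1)
    inputs = np.concatenate([taps.real, taps.imag], axis=1)
    targets = np.stack([x.real, x.imag], axis=1)
    return inputs, targets

# Usage: 4 taps per component give 8 real inputs per time step
rng = np.random.default_rng(1)
y_demo = rng.standard_normal(100) + 1j * rng.standard_normal(100)
inputs, targets = build_tdnn_dataset(y_demo, y_demo, M=4)
print(inputs.shape, targets.shape)  # (100, 8) (100, 2)
```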

C. TRANSFER LEARNING DPD ARCHITECTURE
The DPD architecture used in this work is based on the RVTDNN, in which an arbitrary number of memory taps can be used [6]. The same tap configuration is employed for the input and feedback signals regardless of the physics to be modeled. The proposed architecture has a fully connected structure, and the input-output relationship between the hidden layers is defined as [21], [22], [23]:

$$\mathbf{y}^{(j)} = f\left(\mathbf{W}\,\mathbf{x}^{(j-1)} + \mathbf{B}\right) \tag{4}$$

where j denotes the j-th fully connected layer, f(·) is the activation function, y^(j) is a P × 1 vector representing the output values of the j-th layer, W is a P × Q matrix representing the trainable coefficients, x^(j−1) is a Q × 1 vector representing the outputs of the previous layer, and B is a P × 1 vector representing the trainable biases. Thus, Q is the number of outputs of the previous layer, and P is the number of outputs of the current layer, i.e., the number of inputs to the next layer. Through the activation function, denoted f in Fig. 2, arbitrary nonlinear functions can be approximated. The proposed RVTDNN architecture uses the rectified linear unit (ReLU) activation function, which is less computationally expensive than the hyperbolic tangent (Tanh) and Sigmoid functions because it involves simpler mathematical operations [24], [25]. The ReLU activation function is defined as:

$$f(x) = \max(0, x) \tag{5}$$

The ReLU activation function introduces nonlinearity by setting negative inputs to 0, which also adds sparsity to the ANN and can simplify the computations. The z fine-tuning layers, where z = N − k and N is the total number of hidden layers, are referred to as transferred layers (TL). The output of the i-th fine-tuning layer, (TL)_i, is written as:

$$(\mathrm{TL})_i = f_1\left(\mathbf{w}_i^{T}\,(\mathrm{TL})_{i-1} + \mathbf{b}_i\right), \qquad (\mathrm{TL})_0 = T_n \tag{6}$$

where w_i and b_i are the weights and biases of the i-th transfer layer, and the final output, Y′_n, is defined as:

$$Y'_n = f_2\left(\mathbf{w}_{\mathrm{out}}^{T}\,(\mathrm{TL})_z + \mathbf{b}_{\mathrm{out}}\right) \tag{7}$$

where w_out and b_out denote the weights and biases of the output layer and (TL)_z is the output of the z-th transfer layer. f_1(·) and f_2(·) are activation functions that can be chosen differently; in the presented work, both are of the ReLU type.
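The fully connected layer relation y = f(W x + B) and the frozen-plus-fine-tuning composition described above can be sketched in NumPy as follows. The layer sizes follow the paper's description (three frozen hidden layers of 256 neurons, one fine-tuning layer of 16 neurons, two outputs, ReLU activations), but the random weights, the 8-element input (M = 4) and all helper names are hypothetical; a trained TLNN would load its weights instead.

```python
import numpy as np

def relu(v):
    """ReLU activation: element-wise max(0, v)."""
    return np.maximum(0.0, v)

def dense(x, W, B, f=relu):
    """Fully connected layer y = f(W x + B), with W of shape (P, Q)."""
    return f(W @ x + B)

rng = np.random.default_rng(0)
M, P_FROZEN, P_FT = 4, 256, 16  # taps per I/Q component, frozen width, fine-tuning width

def init_layer(P, Q):
    """Hypothetical He-style random init; a real TLNN would load trained weights."""
    return rng.standard_normal((P, Q)) * np.sqrt(2.0 / Q), np.zeros(P)

# Three frozen hidden layers, standing in for layers copied from the narrow-bandwidth model
frozen = [init_layer(P_FROZEN, 2 * M),
          init_layer(P_FROZEN, P_FROZEN),
          init_layer(P_FROZEN, P_FROZEN)]
# Layers that would be trained for the new bandwidth
fine_tune = [init_layer(P_FT, P_FROZEN)]
W_out, b_out = init_layer(2, P_FT)

def tlnn_forward(u):
    """Map a 2M-element real input vector to the two outputs [I, Q]."""
    t = u
    for W, B in frozen:              # T_n: output of the frozen layers
        t = dense(t, W, B)
    tl = t                           # (TL)_0 = T_n
    for W, B in fine_tune:           # (TL)_i with activation f_1 = ReLU
        tl = dense(tl, W, B)
    return dense(tl, W_out, b_out)   # Y'_n with f_2 = ReLU, as stated in the text

out = tlnn_forward(rng.standard_normal(2 * M))
print(out.shape)  # (2,)
```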
The experimental dataset is divided into a training set (70%) and a validation set (30%). The weights and biases of the network are learned by minimizing an appropriate loss function. The two most widely used loss functions for regression tasks are the mean square error (MSE) loss and the Huber loss. The Huber loss is a robust loss function used for a wide range of regression tasks [26], and it is used in the presented work. The Huber loss behaves quadratically for small residuals and linearly for large residuals, and is defined as [27]:

$$L_\delta(Y_n, Y'_n) = \begin{cases} \frac{1}{2}\left(Y_n - Y'_n\right)^2, & \left|Y_n - Y'_n\right| \le \delta \\ \delta\left(\left|Y_n - Y'_n\right| - \frac{1}{2}\delta\right), & \text{otherwise} \end{cases} \tag{8}$$

where δ, set to 1, is the parameter of the Huber loss, and Y_n and Y′_n denote the observed (target) and predicted values, respectively. The network is trained through backpropagation with the Adam optimization algorithm, which approaches a local minimum of the loss. The measured data are collected and uploaded using MATLAB. The ANN is built and trained using the Keras 2.3.0-tf package in Python.
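The Huber loss with δ = 1 and the 70%/30% training/validation split can be made explicit with a short NumPy sketch. In Keras, the corresponding built-in loss is tf.keras.losses.Huber(delta=1.0); the NumPy version below only illustrates its behavior, and the function names and the contiguous (unshuffled) split are assumptions, since the text does not state how the split is drawn.

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for |r| <= delta, linear beyond."""
    r = np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))
    quadratic = 0.5 * r ** 2
    linear = delta * (r - 0.5 * delta)
    return float(np.mean(np.where(r <= delta, quadratic, linear)))

def train_val_split(inputs, targets, train_frac=0.7):
    """Contiguous split: first 70% for training, remaining 30% for validation."""
    n_train = int(round(train_frac * len(inputs)))
    return (inputs[:n_train], targets[:n_train]), (inputs[n_train:], targets[n_train:])

# Small residuals are penalized quadratically, large ones only linearly
print(huber_loss([0.0], [0.5]))  # 0.125
print(huber_loss([0.0], [3.0]))  # 2.5
```

For a residual of 3, the squared-error term 0.5·3² would be 4.5, whereas the Huber loss grows only linearly to 2.5; this reduced sensitivity to outliers, such as occasional measurement glitches in OTA data, is what makes it robust.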

D. COMPLEXITY OF THE PROPOSED ANN
The complexity analysis takes Eq. 4 as its starting point, assuming only fully connected layers with an equal number of neurons, so that P = Q. Between each pair of consecutive fully connected hidden layers, there are P² multiplications. Between the input layer and the first hidden layer there are 2MP multiplications, where M is the number of time delays and P is the number of neurons. There are 2P multiplications between the last hidden layer and the output layer. The total number of multiplications is:

$$N_{\mathrm{mult}} = 2MP + (J-1)P^2 + 2P \tag{9}$$

where J is the number of hidden layers.
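Eq. (9) is easy to check numerically. With M = 4 time delays, the small helper below (a hypothetical name) reproduces the two multiplication counts reported in this paper: 199168 for a regular ANN with four hidden layers of 256 neurons, and 160 for a single 16-neuron layer. As noted in the conclusion, the second figure covers only the part that must be identified, not the frozen layers.

```python
def num_multiplications(M, P, J):
    """Eq. (9): 2MP (input to first hidden) + (J-1)P^2 (hidden to hidden) + 2P (last hidden to output)."""
    if J < 1:
        raise ValueError("at least one hidden layer is assumed")
    return 2 * M * P + (J - 1) * P ** 2 + 2 * P

print(num_multiplications(M=4, P=256, J=4))  # 199168 (regular ANN)
print(num_multiplications(M=4, P=16, J=1))   # 160 (fine-tuning layer only)
```

For the regular ANN, the (J − 1)P² term contributes 196608 of the 199168 multiplications, which is why the count grows roughly quadratically with the number of neurons.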

III. OTA MEASUREMENTS SETUP
The block diagram of the OTA measurement setup using a compact antenna test range (CATR) is shown in Fig. 3 [20], and the actual laboratory setup is shown in Fig. 4. Fig. 5a and Fig. 5b illustrate the amplitude-to-amplitude (AM/AM) gain distortion and the amplitude-to-phase (AM/PM) phase distortion at the APA output. Fig. 5c shows the time-domain compression of the waveform at the APA output. All these measurements are based on a 100 MHz bandwidth signal.
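AM/AM and AM/PM characteristics such as those in Fig. 5a and Fig. 5b are typically extracted from time-aligned input and output baseband records. The sketch below assumes the two records are already time- and phase-aligned, and uses a hypothetical memoryless test nonlinearity in place of measured APA data; the function name is also illustrative.

```python
import numpy as np

def am_am_am_pm(x_in, y_out):
    """AM/AM gain (dB) and AM/PM phase shift (deg) versus input amplitude.

    Assumes x_in and y_out are time-aligned complex baseband records.
    """
    x_in = np.asarray(x_in, dtype=complex)
    y_out = np.asarray(y_out, dtype=complex)
    keep = np.abs(x_in) > 0                      # gain is undefined at zero input
    x_in, y_out = x_in[keep], y_out[keep]
    amp_in = np.abs(x_in)
    gain_db = 20 * np.log10(np.abs(y_out) / amp_in)
    phase_deg = np.degrees(np.angle(y_out * np.conj(x_in)))  # phase of y relative to x
    return amp_in, gain_db, phase_deg

# Hypothetical test nonlinearity: gain compression plus amplitude-dependent phase
rng = np.random.default_rng(0)
x = (rng.standard_normal(1000) + 1j * rng.standard_normal(1000)) / np.sqrt(2)
r = np.abs(x)
y = x / np.sqrt(1 + r ** 2) * np.exp(1j * 0.2 * r ** 2)
amp, gain_db, phase_deg = am_am_am_pm(x, y)
# Plotting gain_db and phase_deg against amp gives AM/AM and AM/PM scatter plots
```

With memory effects present, as for the wideband APA here, these plots show a spread of points around the mean curve rather than a single thin line.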

IV. ANN OPTIMIZATION RESULTS
The ANN optimization methodology presented in [30] was used in this paper. The methodology is applied to an […] trade-off between the ACLR, the EVM and the number of multiplications. By keeping the number of time delays at 4 and the number of neurons at 256, it is possible to achieve an ACLR improvement of 13.1 dB, as shown in Fig. 6a, and an EVM improvement of 8.8 % points, Fig. 6b, while keeping the number of multiplications as low as possible, i.e., approx. 199 k, Fig. 6c. Increasing the number of neurons beyond 256 yields incremental ACLR improvements below 0.4 dB and incremental EVM improvements below 0.2 % points, which we consider negligible for the purpose of our optimization procedure, as shown in Fig. 6a-b. Fig. 6c clearly indicates that in a dense network with several hidden layers, the number of multiplications grows drastically with the number of neurons, quadratically through the (J − 1)P² term. This is in agreement with Equation (9) […] layers, for achieving a lower training time and computational complexity. In [30], a procedure to find the optimal number of layers and neurons has been proposed. The PSD in Fig. 7a shows the out-of-band improvement obtained by deploying the proposed optimized ANN-based DPD. Fig. 7b and Fig. 7c show the AM/AM and AM/PM distortions, which relate to the in-band EVM. These results are well aligned with the performance expected from the proposed optimization procedure, whose results are summarized in Fig. 6.
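Since the results are reported as ACLR in dB and EVM in percent (an EVM improvement in % points is simply the difference between two such EVM values), a minimal sketch of both metrics follows. It assumes an RMS EVM normalized to the reference power and an FFT-based ACLR with a rectangular window and user-supplied channel edges, reported as adjacent-to-main power ratio (more negative is better). The function names, parameters and toy signal are illustrative assumptions and do not reproduce the standardized 3GPP measurement procedure.

```python
import numpy as np

def evm_percent(measured, reference):
    """RMS EVM in percent, normalized to the average reference power."""
    measured = np.asarray(measured, dtype=complex)
    reference = np.asarray(reference, dtype=complex)
    err_power = np.mean(np.abs(measured - reference) ** 2)
    return 100.0 * np.sqrt(err_power / np.mean(np.abs(reference) ** 2))

def aclr_db(x, fs, bw, f_center, f_offset):
    """Worst-side adjacent-channel power relative to main-channel power, in dB.

    Uses a plain FFT periodogram (rectangular window). Channels are bw wide;
    adjacent channels are centred at f_center -/+ f_offset.
    """
    spectrum = np.abs(np.fft.fft(np.asarray(x, dtype=complex))) ** 2
    freqs = np.fft.fftfreq(len(x), d=1.0 / fs)

    def band_power(fc):
        return spectrum[np.abs(freqs - fc) <= bw / 2].sum()

    p_main = band_power(f_center)
    p_adj = max(band_power(f_center - f_offset), band_power(f_center + f_offset))
    return 10 * np.log10(p_adj / p_main)

# Toy check: main-channel tone plus a 20 dB weaker tone in the upper adjacent channel
fs, n = 1000.0, 1000
t = np.arange(n) / fs
x = np.exp(2j * np.pi * 100 * t) + 0.1 * np.exp(2j * np.pi * 200 * t)
print(round(aclr_db(x, fs, bw=20.0, f_center=100.0, f_offset=100.0), 2))  # -20.0
```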

V. TRANSFER LEARNING IMPLEMENTATION
To implement the transfer learning algorithm, part of the model trained for the 20 MHz bandwidth is copied and used as a transferred pre-designed model for linearizing the 100 MHz bandwidth signal. This is done by freezing three hidden layers of the trained 20 MHz model. The frozen layers are then combined with the fine-tuning layers to build the model for the 100 MHz bandwidth. The implemented architecture of the TLNN is shown in Fig. 8. Table 1 summarizes the implementation procedure used for the proposed method. Table 2 shows the network configuration parameters for the regular ANN and the TLNN. With the transfer learning approach, the number of hidden layers to be trained is reduced from four to one, and the number of neurons from 256 to 16. Furthermore, because the model is transferred from one bandwidth to another, the transferred pre-designed model already contains most of the knowledge of the nonlinear behavior of the APA.
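The freeze-and-fine-tune procedure can be illustrated with a small NumPy sketch: the frozen layers are evaluated once to produce the features T_n, and only the fine-tuning layer and the output layer are updated on the new-bandwidth data. Everything in it is hypothetical (random frozen weights standing in for the 20 MHz model, synthetic targets, reduced layer widths, a linear output layer, and plain gradient descent instead of Adam), and it only shows which parameters change; in Keras, the same effect is obtained by setting trainable = False on the copied layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(v):
    return np.maximum(0.0, v)

# Hypothetical frozen layers standing in for the model trained at the narrow bandwidth
N_IN, N_FROZEN, N_FT, N_OUT = 8, 32, 16, 2
frozen = [(0.3 * rng.standard_normal((N_FROZEN, N_IN)), np.zeros(N_FROZEN)),
          (0.3 * rng.standard_normal((N_FROZEN, N_FROZEN)), np.zeros(N_FROZEN))]
frozen_snapshot = [(W.copy(), b.copy()) for W, b in frozen]

def frozen_features(U):
    """T_n for a batch U of shape (N, N_IN); the frozen weights are only read."""
    T = U
    for W, b in frozen:
        T = relu(T @ W.T + b)
    return T

# Synthetic data for the new bandwidth (illustrative only)
U = rng.standard_normal((500, N_IN))
targets = np.tanh(U[:, :2] + 0.5 * U[:, 2:4])
T = frozen_features(U)  # evaluated once, since the frozen part is not retrained

# Trainable fine-tuning layer (ReLU) and output layer (linear here, for simplicity)
W1, b1 = 0.1 * rng.standard_normal((N_FT, N_FROZEN)), np.zeros(N_FT)
W2, b2 = 0.1 * rng.standard_normal((N_OUT, N_FT)), np.zeros(N_OUT)

def head_loss():
    """Mean Huber loss (delta = 1) of the fine-tuning head on the new data."""
    pred = relu(T @ W1.T + b1) @ W2.T + b2
    r = np.abs(pred - targets)
    return float(np.mean(np.where(r <= 1.0, 0.5 * r ** 2, r - 0.5)))

loss_before = head_loss()
lr = 0.1
for _ in range(300):
    Z1 = T @ W1.T + b1
    H = relu(Z1)
    pred = H @ W2.T + b2
    g_pred = np.clip(pred - targets, -1.0, 1.0) / pred.size  # dLoss/dpred for Huber
    g_Z1 = (g_pred @ W2) * (Z1 > 0)                          # backprop through ReLU
    W2 -= lr * (g_pred.T @ H)
    b2 -= lr * g_pred.sum(axis=0)
    W1 -= lr * (g_Z1.T @ T)
    b1 -= lr * g_Z1.sum(axis=0)
loss_after = head_loss()
print(f"head loss: {loss_before:.4f} -> {loss_after:.4f}")
```

Because T is computed once and never changes, each training step only touches the small head, which is the source of the reduced identification effort discussed in Section VI.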

VI. BANDWIDTH-SCALABLE PREDISTORTION RESULTS
First, the model for the reference 100 MHz bandwidth based on the regular ANN has been optimized using the same procedure as described in Section IV. […] benchmarked with the regular ANN, which has four fully connected layers with 256 neurons each. The structures of the input and output layers are the same for the regular ANN and the TLNN. Based on Equation (9), the number of multiplications for the regular ANN, with M = 4, J = 4 and P = 256, is:

$$N_{\mathrm{mult}}^{\mathrm{ANN}} = 2 \cdot 4 \cdot 256 + 3 \cdot 256^2 + 2 \cdot 256 = 199168$$

and for the TLNN with 1 hidden layer and 16 neurons, it results in:

$$N_{\mathrm{mult}}^{\mathrm{TLNN}} = 2 \cdot 4 \cdot 16 + 0 \cdot 16^2 + 2 \cdot 16 = 160$$

[…] of neurons to, e.g., 8 neurons, results in degraded performance in terms of EVM and ACLR. This highlights the role of the fine-tuning layers. Detailed performance comparisons between the regular ANN and the proposed TLNN are given in Table 3 and Table 4 for 50 MHz and 100 MHz bandwidths, respectively. These results show that it is possible to achieve approximately the same linearization performance as the regular ANN, i.e., an EVM improvement of 8.6 % points and an ACLR improvement of 9 dB, by using the TLNN with 16 neurons, as shown in Table 4. Hence, the proposed approach proves to be robust to changes in signal bandwidth and can be used as a bandwidth-scalable linearization technique. On the other hand, the TLNN allows for reducing the number of hidden layers (by reusing the frozen model) and […] Table) size necessary to implement the DPD. Instead of storing two completely different sets of ANN DPD parameters (SoA approach), one for the narrow-bandwidth use case and the other for the wide-bandwidth use case, system engineers need to store far fewer parameters for linearizing the wide-bandwidth use case, because they can reuse most of those calculated for the narrow bandwidth. A long identification time is a problem for an adaptive online NN-based linearization technique. However, with the proposed TLNN, the time needed to calculate the incremental layers is reduced, which relaxes the requirements on adaptive online processing.
The HW implementation itself is challenged by the realization of the online OTA feedback receiver. A far-field observation antenna is needed to provide the OTA feedback signal for adaptive online DPD. The feedback signal could be obtained from the receiver antenna of the same device, but proper implementation techniques are still under discussion in industry and academia. One promising proposal is to use the auxiliary antenna connection (the diversity or MIMO antenna) [31]. However, the low coupling ratio between the transmitter and these auxiliary antennas may be an issue.

VII. CONCLUSION
This paper presented a bandwidth-scalable over-the-air DPD of an APA transmitter based on a TLNN method. The proposed methodology reduces the hardware implementation complexity in terms of the number of multiplications while ensuring the same linearization performance as a regular ANN. In the proposed method, part of the model is fixed as a pre-designed model, and an incremental model component is then trained as fine-tuning adaptation layers to build the final model. This paper demonstrated how such a TL technique can be used to implement a bandwidth-scalable digital predistorter. The ANN layers identified for one signal bandwidth were reused and extended with an incremental neuron layer, allowing the ANN predistorter to successfully linearize input signals with wider bandwidths. The proposed linearization technique was validated with measurements on a state-of-the-art 4 × 4 APA, using a setup with up- and down-conversion between sub-6 GHz and 28 GHz. Experimental results showed that our optimized ANN-based DPD could linearize a 20 MHz 5G signal with an EVM improvement of 8.8 % points and an ACLR improvement of 13.3 dB. It was also demonstrated that, using TL, the same ANN DPD can be reused to linearize a 5G signal with a much wider bandwidth, namely 100 MHz. To do so, only an additional layer of 16 neurons was added on top of the reused ANN DPD. This approach yielded an EVM improvement of 8.6 % points and an ACLR improvement of 8.5 dB. The multiplications of the "Frozen layers" must still be considered when evaluating the complexity of the overall TLNN-based DPD actuator; however, the complexity of the TLNN-based DPD model identification is reduced by a factor of 199168/160 (about 1245) compared with the conventional ANN. The reduced complexity makes it possible to lower the cost of the implementation in digital hardware.
Further research is being conducted to make the proposed bandwidth-scalable DPD fully robust to the signal bandwidth and other transmitter operating conditions. Our future goal is to enhance the TL methodology to obtain a universal set of parameters that can be fully reused to linearize multiple signal bandwidths. Such a result would further lower the complexity and cost of the DPD implementation in digital hardware. Furthermore, we expect that if the average output power and the peak-to-average power ratio change greatly, the nonlinear characteristics of the power amplifier will also change. Investigating the capability of TL-based ANNs for power-scalable scenarios may therefore be an interesting direction for future work.

He is currently an Associate Professor of RF and mm-wave circuits and systems with the Department of Electronic Systems, Aalborg University. He has 20 years of experience in RF and millimeter wave circuits and systems, including 12 years of experience in CMOS RF/mixed-signal IC design. He is the grant holder and the PI of two Danish national research projects, and the Management Committee Member substitute from Denmark in the EU COST Action IC1301 with the aim to gather international efforts and address efficient wireless power transmission technologies. His current research interests include circuits and antennas for 5G and satellite communications, low-power CMOS RF and millimeter wave circuits and systems, circuits and systems for biomedical imaging, and artificial intelligence. He is a TPC Member of IEEE NORCAS. His Ph.D. thesis was listed at the Spar Nord Annual Best Thesis Nomination. He serves as a Reviewer for IEEE and Kluwer.

Since 1993, he has been with Aalborg University, where he is currently a Full Professor, heading the Antennas, Propagation and Millimeter-Wave Systems Laboratory, with 25 researchers. He is also the Head of the Doctoral School on Wireless Communication, with some 40 Ph.D. students enrolled.
He has also worked as a Consultant for the development of more than 100 antennas for mobile terminals, including the first internal antenna for mobile phones, in 1994, with the lowest SAR, the first internal triple-band antenna, in 1998, with low-SAR and high-TRP and TIS, and lately, various multi-antenna systems rated as the most efficient on the market. He has worked most of the time with joint university and industry projects and has received more than U.S. $21M in direct research funding. He is also the Project Leader of the RANGE project with a total budget of more than U.S. $8M investigating high-performance centimeter/millimeter-wave antennas for 5G mobile phones. He has been one of the pioneers in establishing over-the-air measurement systems. The measurement technique is now well established for mobile terminals with single antennas, and he chaired the various COST groups with liaison to 3GPP and CTIA for the over-the-air test of MIMO terminals. He is also involved in MIMO OTA measurement. He has published more than 500 peer-reviewed papers, six books, and 12 book chapters, and holds over 50 patents. His research interests include radio communication for mobile terminals, especially small antennas, diversity systems, propagation, and biological effects.