On Segmented Predistortion for Linearization of RF Power Amplifiers

This paper presents a general survey of digital predistortion (DPD) techniques with segmentation. Global DPD is compared with two segmented approaches, namely Vector-Switched DPD and Decomposed Vector Rotation DPD, with the support of experiments on a strongly nonlinear three-way Doherty PA. The comparison shows the interest of both segmented approaches in terms of linearization performance, complexity and ease of implementation compared to the global DPD. The paper starts with some mathematical generalities on interpolation and splines. It focuses on segmented models derived from Volterra series, even though the presented principles can also be applied to neural networks.


Introduction
Digital predistortion is an efficient technique to linearize power amplifiers (PA) in wireless transmitters. It is widely used in base stations of cellular communication and broadcast systems. Power amplifiers are critical elements of radiocommunication systems because their power efficiency determines the autonomy and cost of the equipment and their linearity influences communication performance. New waveforms proposed in order to improve spectral occupancy exhibit very high crest factors and are very sensitive to PA nonlinearities. But to achieve good power efficiency, it is necessary to operate the PA in a nonlinear region. Linearization techniques are therefore necessary to limit in-band signal distortion and out-of-band spectral regrowth.
The principle of digital predistortion consists in processing the baseband complex envelope of the PA input signal by a predistorter (DPD) so that the cascade of the DPD and PA becomes linear up to a certain amplitude value. The DPD should have characteristics that are the inverse of those of the PA, so as to pre-compensate for the nonlinear distortion and the dynamic behavior (memory effects) of the PA. Many models of nonlinear dynamic systems have been proposed for the DPD [1], [2]. They are discrete-time baseband models with complex input and output signals. Most of them are derived from truncated Volterra series (Memory Polynomial models (MP) [3], Generalized Memory Polynomial models (GMP) [4], [5]) or dynamic Volterra series (Dynamic Deviation Reduction-Based models (DDR) [6]) with limitation to finite memory lengths and nonlinearity orders. Neural networks are also potential candidates but they require nonlinear optimization techniques to update their coefficients, which is not very appropriate for real-time adaptive DPD.
Volterra series present two interesting properties for the modeling of nonlinear dynamic systems: generality, and linearity of the model with respect to its coefficients, which simplifies their identification. But their number of coefficients increases dramatically with memory depth and order of nonlinearity. Moreover, they are generally built on a nonorthogonal basis of monomials, which leads to bad numerical properties for their identification, especially for high-order nonlinearity.
Models derived from Volterra series such as GMP or DDR have proven their effectiveness for numerous applications using mildly nonlinear PA such as class AB PA. But advanced architectures of power transmitters with good efficiency, such as Doherty PA, envelope tracking, switched or out-phasing PA, exhibit strongly nonlinear dynamic behavior that is more difficult to model. Also, new communication systems allow for very high data rates by using very wide bandwidth multidimensional signals (e.g. 4G and 5G systems with carrier aggregation and MIMO). This raises new challenges for DPD in terms of bandwidth, nonlinearity and dynamic behavior. It becomes difficult for a global DPD model to achieve an accurate representation of the system with good numerical properties and low computational complexity. This has moved research interest towards different local modeling approaches in which the global operating space is split into several subspaces represented by local models well suited to each subspace. These local models have to be joined in some way to cover the global space. One of the motivations is to decompose a complicated problem into several simpler ones.
Another one is to use the locality to obtain a quasi-orthogonal basis. The decomposition (segmentation) can be applied to the temporal signal magnitude and phase, to the signal spectrum, or to the system itself. This segmented or piecewise DPD approach raises different questions such as:
• how to partition the original space,
• how to determine good models for the different segments of the partition,
• how to handle DPD operators with complex inputs,
• how to handle the dynamic aspects,
• how to estimate the coefficients of these local models and how to represent them with sparsity,
• how to join the local models.
Piecewise approximation is not a new idea. There has been a lot of work in particular on piecewise interpolation or approximation of nonlinear real-valued functions, e.g. using splines. The application of these theories to the case of DPD is not straightforward for two main reasons:
• the DPD is not a simple function. It is a dynamic nonlinear system;
• the input and output of the DPD are complex signals. The DPD is a complex-valued operator of multiple complex variables.
This paper presents a general survey of DPD techniques with segmentation in the temporal domain. It also presents an experimental comparison of different approaches. It focuses on segmented models derived from Volterra series, even though the presented principles can also be applied to neural networks. It starts with some mathematical generalities on interpolation, approximation and splines (Sec. 2). Section 3 gives some generalities on modeling and training of DPD. Section 4 focuses on segmented DPD with functions of a single real-valued variable. Section 5 is dedicated to segmented DPD that can manage nonlinearity and memory domains. Section 6 briefly presents some advanced segmented multidimensional DPD for multiband or MIMO applications. Section 7 is devoted to an experimental comparison of two of the most promising segmented DPD techniques. Section 8 is the conclusion.

Polynomial Interpolation/Approximation
For a function $y = f(x) : [a, b] \to \mathbb{R}$ and a set of $N + 1$ points $\{(x_0, y_0), (x_1, y_1), \dots, (x_N, y_N)\}$ with distinct abscissae, there is a unique interpolation polynomial $p(x)$ of degree at most $N$ such that $p(x_i) = y_i$. This polynomial can be expressed in different ways, e.g. using Newton's divided difference formula or the Lagrange interpolation formula. The interpolation polynomial of $f$ can be written using the basis made of the Lagrange polynomials $L_{N,i}(x)$ as:
$$p(x) = \sum_{i=0}^{N} y_i \, L_{N,i}(x), \qquad L_{N,i}(x) = \prod_{j=0,\, j \neq i}^{N} \frac{x - x_j}{x_i - x_j}$$
The approximation error is $e(x) = f(x) - p(x)$ and, for $f(x)$ being $(N + 1)$ times differentiable, there exists $\xi \in [a, b]$ such that:
$$e(x) = \frac{f^{(N+1)}(\xi)}{(N+1)!} \prod_{i=0}^{N} (x - x_i)$$
Weierstrass has shown that $f$ can be uniformly approximated by a polynomial, but this polynomial is not the interpolation polynomial. Indeed, even if the interval between points $x_i$ is reduced and the degree of $p(x)$ is increased, $p(x)$ does not necessarily converge towards $f(x)$. This is called Runge's phenomenon.
The approximation error depends on the product $v(x) = (x - x_0) \cdots (x - x_N)$, whose value is related to the segmentation points $x_0, x_1, \dots, x_N$. Tchebychev has studied the segmentation that minimizes the maximum of $|v(x)|$ on the interval $[a, b]$. Tchebychev's alternance theorem states that the monic polynomial $u(x)$ of degree $n$ that minimizes $L = \max |u(x)|$ (on a given interval) takes alternately the values $\pm L$ at $n + 1$ points. On $[-1, +1]$, the best solution is $u(x) = 2^{1-n} T_n(x)$, where $T_n$ is the Tchebychev polynomial of degree $n$. On the interval $[-1, +1]$, it is defined by:
$$T_n(x) = \cos(n \arccos x)$$
The polynomial $T_n(x)$ has $n$ roots $x_k$, $T_n(x_k) = 0$.
We can deduce that the segmentation $x_0, x_1, \dots, x_N$ that minimizes the maximum of $|(x - x_0) \cdots (x - x_N)|$ on the interval $[-1, +1]$ corresponds to the roots of the Tchebychev polynomial of degree $N + 1$. They are given by:
$$x_k = \cos\left(\frac{(2k+1)\pi}{2(N+1)}\right), \qquad k = 0, \dots, N$$
The solution for the interval $[a, b]$ is obtained as:
$$x_k = \frac{a+b}{2} + \frac{b-a}{2} \cos\left(\frac{(2k+1)\pi}{2(N+1)}\right), \qquad k = 0, \dots, N$$
Using the Tchebychev segmentation, the approximation error is bounded by:
$$|e(x)| \le \frac{(b-a)^{N+1}}{2^{2N+1} (N+1)!} \max_{\xi \in [a,b]} \left|f^{(N+1)}(\xi)\right|$$
This bound is $4^{N+1}/2$ times smaller than that of the general case, $(b-a)^{N+1} \max |f^{(N+1)}| / (N+1)!$. This highlights the importance of the segmentation to minimize the approximation error. For example, Fig. 1 shows the interpolation by a polynomial of degree $N = 10$ of the function $y = 1/(1 + x^2)$ for $x \in [-5, +5]$ using a uniform segmentation or a Tchebychev segmentation.
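The comparison between uniform and Tchebychev segmentation can be reproduced numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `lagrange_interp` and `tchebychev_nodes` are hypothetical and the evaluation grid of 2001 points is an arbitrary choice.

```python
import numpy as np

def lagrange_interp(nodes, values, x):
    """Evaluate the Lagrange interpolation polynomial through (nodes, values) at x."""
    x = np.asarray(x, dtype=float)
    p = np.zeros_like(x)
    for i, (xi, yi) in enumerate(zip(nodes, values)):
        others = np.delete(nodes, i)
        # L_{N,i}(x) = prod_{j != i} (x - x_j) / (x_i - x_j)
        basis = np.prod((x[:, None] - others) / (xi - others), axis=1)
        p += yi * basis
    return p

def tchebychev_nodes(a, b, N):
    """Roots of T_{N+1} mapped from [-1, 1] to [a, b] (N + 1 nodes)."""
    k = np.arange(N + 1)
    return 0.5 * (a + b) + 0.5 * (b - a) * np.cos((2 * k + 1) * np.pi / (2 * (N + 1)))

def f(x):
    return 1.0 / (1.0 + x**2)

a, b, N = -5.0, 5.0, 10
x_test = np.linspace(a, b, 2001)          # dense evaluation grid (assumed)

uni = np.linspace(a, b, N + 1)            # uniform segmentation
tch = tchebychev_nodes(a, b, N)           # Tchebychev segmentation
err_uni = np.max(np.abs(f(x_test) - lagrange_interp(uni, f(uni), x_test)))
err_tch = np.max(np.abs(f(x_test) - lagrange_interp(tch, f(tch), x_test)))
print(f"max error, uniform nodes:    {err_uni:.3f}")
print(f"max error, Tchebychev nodes: {err_tch:.3f}")
```

With uniform nodes the error is dominated by large oscillations near the ends of the interval (Runge's phenomenon), whereas the Tchebychev nodes keep the error spread over the whole interval.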
The Lagrange polynomials L N ,i (x) using Tchebychev segmentation are interesting because their supports are well localized around points x i . So they are quasi-orthogonal which is not at all true for uniform segmentation. Figure 2 shows the Lagrange polynomials of degree 10 with a uniform or a Tchebychev segmentation.
In the case of the approximation of a data sequence of $M$ samples $(x_k, y_k)$, $k = 0, \dots, M - 1$, with $x_k \in [a, b]$, by a polynomial $p(x_k)$ of degree $n$, $p(x_k)$ can be expressed using different bases corresponding to different segmentations of $[a, b]$. If we call respectively $L_{i,Tch}(x_k)$ and $L_{i,Uni}(x_k)$, $i = 0, \dots, n$, the Lagrange bases with Tchebychev or uniform segmentation, the approximation polynomial $p(x_k)$ is:
$$p(x_k) = \sum_{i=0}^{n} c_i \, L_{i,Tch}(x_k) \tag{1}$$
$$p(x_k) = \sum_{i=0}^{n} d_i \, L_{i,Uni}(x_k) \tag{2}$$
If a least-squares criterion is used, minimizing $\sum_{k=0}^{M-1} (p(x_k) - y_k)^2$, the vectors $c$ and $d$ containing the coefficients $c_i$ or $d_i$ are obtained as solutions of:
$$c = \left(\Phi_{Tch}^T \Phi_{Tch}\right)^{-1} \Phi_{Tch}^T \, y, \qquad d = \left(\Phi_{Uni}^T \Phi_{Uni}\right)^{-1} \Phi_{Uni}^T \, y \tag{3}$$
The vector $y$ is the $M \times 1$ column vector of samples $y_k$, $k = 0, \dots, M - 1$, $\Phi$ is the $M \times (n + 1)$ matrix of the basis functions $\Phi(k, i) = L_i(x_k)$ for the chosen basis, and $\Phi^T$ is the transpose of $\Phi$. In theory, whatever the chosen basis, the optimal solution for $p$ should be the same. But the condition number of the correlation matrix $R = \Phi^T \Phi$ strongly depends on the basis. For example, for a polynomial of degree $n = 35$, the condition number of the correlation matrix $R$ is greater than $10^{16}$ for the uniform segmentation while it is smaller than 30 for the Tchebychev segmentation. Therefore, due to those bad numerical properties, the quality of the approximation obtained with uniform segmentation can be degraded, especially when computation is done in single precision. The good locality of Lagrange polynomials with Tchebychev segmentation has been exploited for DPD modeling by Barradas et al. in [7][8][9]. Details will be given later.
Another interesting point to recall is the influence of noisy data $y_i$ on the approximation results. It is important, for example, to better understand the influence of the measurement noise on the PA output when training the DPD using the indirect learning approach (see Sec. 3). It is well known that the noise will introduce a bias on the coefficients of the DPD. Suppose that the data $y_i$ are corrupted by some noise $\epsilon_i$, with $|\epsilon_i| < \epsilon$. The corrupted data are denoted $\hat{y}_i = y_i + \epsilon_i$ and the corresponding interpolation polynomial $\hat{p}$. The error $\hat{p}(x) - p(x)$ can be bounded with the Lebesgue constant $\Lambda_N$ that depends on the segmentation nodes:
$$|\hat{p}(x) - p(x)| \le \epsilon \, \Lambda_N, \qquad \Lambda_N = \max_{x \in [a,b]} \sum_{i=0}^{N} |L_{N,i}(x)|$$
The Lebesgue constant increases with the polynomial degree and is much smaller for Tchebychev segmentation than for uniform segmentation.
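The sensitivity of the correlation matrix to the choice of basis can be checked with a short experiment. The sketch below is only illustrative: it assumes NumPy, uniformly spaced sample abscissae on [-1, 1] and a degree n = 20 (lower than the n = 35 quoted above, to keep the basis evaluation itself well behaved), so the figures it prints are not those of the text. The helper name `lagrange_basis_matrix` is hypothetical.

```python
import numpy as np

def lagrange_basis_matrix(nodes, x):
    """Phi[k, i] = L_i(x_k) for the Lagrange basis built on `nodes`."""
    cols = []
    for i, xi in enumerate(nodes):
        others = np.delete(nodes, i)
        cols.append(np.prod((x[:, None] - others) / (xi - others), axis=1))
    return np.column_stack(cols)

n = 20                                   # assumed degree (the text quotes n = 35)
x = np.linspace(-1.0, 1.0, 2000)         # assumed sample abscissae x_k
k = np.arange(n + 1)
nodes_uni = np.linspace(-1.0, 1.0, n + 1)
nodes_tch = np.cos((2 * k + 1) * np.pi / (2 * (n + 1)))

cond = {}
for name, nodes in (("uniform", nodes_uni), ("Tchebychev", nodes_tch)):
    Phi = lagrange_basis_matrix(nodes, x)
    R = Phi.T @ Phi                      # correlation (Gram) matrix
    cond[name] = np.linalg.cond(R)
    print(f"{name:>10s}: cond(R) = {cond[name]:.3e}")
```

The same least-squares problem is being solved in both cases; only the basis, and therefore the conditioning of the normal equations, changes.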
Lagrange interpolation only requires that $p(x_i) = f(x_i)$ for $i = 0, \dots, n$. Other polynomial interpolations impose additional conditions on the derivatives of $f$. For example, Hermite interpolation requires that the first derivatives of $f$ and $p$ be equal at points $x_i$. For these $2(n + 1)$ conditions, there is a unique polynomial solution $p(x)$ of degree at most $2n + 1$. The approximation error $e(x) = f(x) - p(x)$, for $f(x)$ being $(2n + 2)$ times differentiable, is such that:
$$e(x) = \frac{f^{(2n+2)}(\xi)}{(2n+2)!} \prod_{i=0}^{n} (x - x_i)^2, \qquad \xi \in [a, b]$$
Hermite interpolation does not guarantee a uniform approximation of $f$ when $n$ increases. But Bernstein polynomials associated to a continuous function $f$, denoted $B_n f(x)$, converge uniformly towards $f$ when the degree $n$ increases. They correspond to a uniform segmentation and are expressed, for $x \in [a, b]$, as:
$$B_n f(x) = \sum_{k=0}^{n} f\left(a + k \frac{b-a}{n}\right) B_{k,n}(x)$$
The Bernstein basis polynomials of degree $n$ are defined by
$$B_{k,n}(x) = \binom{n}{k} \frac{(x - a)^k (b - x)^{n-k}}{(b - a)^n}, \qquad k = 0, \dots, n.$$
Figure 3 shows the Bernstein basis polynomials for $n = 10$ on the interval $[-5, 5]$. It can be seen that they are quite well localized around points $x_i$. Also, the sum $\sum_{k=0}^{n} B_{k,n}(x) = 1$ for all $x \in [a, b]$. The Bernstein approximation is obtained by a convex linear combination, which leads to good numerical properties. Figure 4 shows the approximation by a polynomial of degree $n = 10$ of the function $y = 1/(1 + x^2)$ for $x \in [-5, +5]$ using a Bernstein polynomial (uniform segmentation) or a Lagrange polynomial with Tchebychev segmentation. On this example, it can be seen that the approximation error obtained with the Bernstein polynomial is larger than that obtained with Lagrange interpolation and a Tchebychev segmentation. But there is no Runge's phenomenon in the Bernstein approximation and, if $f(x)$ is continuous, it converges uniformly (even if rather slowly) towards $f(x)$, which is not true for Lagrange interpolation. An interesting way to obtain a faster convergence of the approximation is to use piecewise approximation.
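The slow but uniform convergence of the Bernstein approximation, and the partition-of-unity property of its basis, can be observed with the following sketch. It assumes NumPy; the function name `bernstein_approx` and the degrees 10, 40 and 160 are arbitrary choices made for illustration.

```python
import numpy as np
from math import comb

def bernstein_approx(f, a, b, n, x):
    """Return B_n f(x) on [a, b] and the matrix of basis values B_{k,n}(x)."""
    t = (np.asarray(x, dtype=float) - a) / (b - a)    # map [a, b] onto [0, 1]
    k = np.arange(n + 1)
    binom = np.array([comb(n, j) for j in k], dtype=float)
    # basis[m, k] = C(n, k) t_m^k (1 - t_m)^(n - k)
    basis = binom * t[:, None] ** k * (1.0 - t[:, None]) ** (n - k)
    samples = f(a + k * (b - a) / n)                  # f at the uniform nodes
    return basis @ samples, basis

def f(x):
    return 1.0 / (1.0 + x**2)

a, b = -5.0, 5.0
x = np.linspace(a, b, 1001)
errors = {}
for n in (10, 40, 160):
    approx, basis = bernstein_approx(f, a, b, n, x)
    errors[n] = np.max(np.abs(f(x) - approx))
    print(f"n = {n:3d}: max error = {errors[n]:.3f}, "
          f"basis sums to 1: {np.allclose(basis.sum(axis=1), 1.0)}")
```

The error decreases when the degree increases, without oscillations at the ends of the interval, but a high degree is needed to reach a small error.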

Piecewise Polynomial Approximation
Approximation of functions by a single polynomial has some limitations, such as Runge's phenomenon for Lagrange uniform interpolation, or the need for high-degree polynomials leading to high computational complexity and potential numerical problems. Piecewise approximation is a way to overcome some of these limitations. It makes it possible to take advantage of the regularity of the function in a limited region.
To approximate a function $y = f(x)$, $x \in [a, b]$, the basic idea of piecewise polynomial approximation is to partition the interval $[a, b]$ into $N$ segments over which $f$ is approximated by a polynomial. As the segments are shorter than the length of $[a, b]$, $f$ should have a more regular shape on each local segment than on the global interval, and it should be possible to use polynomials of smaller degree than for the global polynomial approximation. Without additional constraints, the piecewise approximation may be discontinuous. Generally, some regularity is required for the approximation and a popular approach is the approximation by spline functions. There are two different ways to present splines. The first one consists in considering piecewise polynomials with continuity constraints on the function and its first derivatives at the borders of the segments. In this first approach the regularity constraints can be different for the different nodes. The second approach is based on a trade-off between the accuracy of the approximation and its regularity. In the second approach, the natural spline solution is obtained for a set of $N$ data $(x_i, y_i)_{i=1}^{N}$ by approximating $f$ by a function $h$ minimizing the criterion:
$$\sum_{i=1}^{N} \left(y_i - h(x_i)\right)^2 + \lambda \int_a^b \left(h^{(m)}(x)\right)^2 dx$$
where $\lambda > 0$ is a smoothing constant. The solution of this problem is a piecewise function made of polynomials of degree $2m - 1$ in segments defined by boundaries $x_i$ and satisfying continuity at the boundaries for the function and its $2m - 2$ first derivatives. With $m = 2$, this gives cubic splines.

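For a given degree $d$ (piecewise polynomials of degree $d$ with $d - 1$ continuous derivatives) and for a given set $s$ of non-decreasing segmentation values (knots) $x_0 \le x_1 \le \dots \le x_n$, the corresponding set of spline functions is denoted $S_{d,s}$.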
Cardinal splines correspond to an infinite number of knots with unit spacing.
With cubic interpolation, for a given set of $n + 1$ points $(x_i, y_i)_{i=0}^{n}$, there are $n$ cubic polynomial pieces, i.e. $4n$ coefficients to determine, and a set of $n + 1$ linear equations $h(x_i) = y_i$ plus $3(n - 1)$ continuity equations, which gives a total of $4n - 2$ linear equations. So there are 2 remaining degrees of freedom that correspond to different types of splines. For example, natural splines are defined by setting $h''(x_0) = 0$ and $h''(x_n) = 0$. Natural splines minimize the integral of the squared second derivative of $h$. $S_{d,s}$ is a vector space of dimension $n + d$. An interesting basis is that of B-splines (basis splines) [10]. B-splines are spline functions with minimum support $[x_i, x_{i+d+1}[$. They can be defined by a recurrence relation for a given degree of polynomials $d$ and a partition $s$ of the interval $[a, b]$ with $n + 1$ knots and $n > d + 2$. The recurrence relation defining $B_{i,d,s}$ is:
$$B_{i,0,s}(x) = \begin{cases} 1 & \text{if } x_i \le x < x_{i+1} \\ 0 & \text{otherwise} \end{cases}, \qquad B_{i,d,s}(x) = \frac{x - x_i}{x_{i+d} - x_i} B_{i,d-1,s}(x) + \frac{x_{i+d+1} - x}{x_{i+d+1} - x_{i+1}} B_{i+1,d-1,s}(x)$$
The B-spline $B_{i,d,s}$ can be expressed as a concatenation of $d + 1$ successive polynomial pieces of degree $d$. To obtain a full basis of B-spline functions on $[a, b]$, we must add $2d$ basis functions whose support is partly inside $[a, b]$, with usually $d$ nodes equal to $a$ and $d$ nodes equal to $b$. A knot has a multiplicity $m$ if it is repeated $m$ times in the sequence.
Any spline function $h_{d,s}$ can be written as a linear combination of B-splines:
$$h_{d,s}(x) = \sum_{i} c_i \, B_{i,d,s}(x)$$
Figure 5 shows an example of cubic B-splines defined for $x \in [0, 1]$ with a uniform segmentation and a sequence of 9 knots. The basis functions are well localized around the knots and are therefore quasi-orthogonal. It also shows another spline basis proposed in [11] that keeps a very short support when $d$ increases.
Figure 5: Cubic B-splines (top) and spline basis of [11] (bottom) with uniform segmentation.
Spline functions can be used to approximate functions generally defined by a set of data points. The approximation can be based on interpolation or on minimizing a given criterion. A commonly used approximation criterion in the field of DPD is the least-squares (LS) criterion. The approximation by a spline function with the LS criterion can be achieved as in (1) and (2). In the case of the approximation of a data sequence by a spline function $h(x)$ of degree $d$ and segmentation $s$, $h(x)$ can be expressed as a linear combination of B-splines with coefficients $c_i$:
$$h(x_k) = \sum_{i=0}^{n} c_i \, B_{i,d,s}(x_k)$$
For the LS criterion $\min \sum_{k=0}^{M-1} (h(x_k) - y_k)^2$, the vector $c$ of coefficients $c_i$ is obtained as the solution of (3), where $y$ is the $M \times 1$ column vector of samples $y_k$, $k = 0, \dots, M - 1$, and $\Phi$ is the $M \times (n + 1)$ matrix of the B-spline functions.
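The B-spline basis described above is commonly evaluated with the Cox-de Boor recursion. The following pure-Python sketch uses it for cubic B-splines on 9 uniform knots over [0, 1], with d repeated knots at each end as mentioned in the text; the function name `bspline_basis` is hypothetical and the recursive form is chosen for readability, not speed.

```python
def bspline_basis(i, d, knots, x):
    """Cox-de Boor recursion: value at x of the B-spline B_{i,d} built on `knots`."""
    if d == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + d] > knots[i]:                       # skip 0/0 terms at repeated knots
        left = ((x - knots[i]) / (knots[i + d] - knots[i])
                * bspline_basis(i, d - 1, knots, x))
    if knots[i + d + 1] > knots[i + 1]:
        right = ((knots[i + d + 1] - x) / (knots[i + d + 1] - knots[i + 1])
                 * bspline_basis(i + 1, d - 1, knots, x))
    return left + right

d = 3
interior = [j / 8 for j in range(9)]                  # 9 uniform knots on [0, 1]
knots = [0.0] * d + interior + [1.0] * d              # d extra knots at each end
n_basis = len(knots) - d - 1                          # 8 segments + d = 11 functions

xs = [j / 200 for j in range(200)]                    # sample grid on [0, 1)
sums = [sum(bspline_basis(i, d, knots, x) for i in range(n_basis)) for x in xs]
print("number of basis functions:", n_basis)
print("largest deviation of the basis sum from 1:", max(abs(s - 1.0) for s in sums))
```

Each basis function is zero outside d + 1 consecutive segments, which is the locality property that makes the regression matrix sparse and the basis quasi-orthogonal.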

Generalities on DPD
A general presentation of digital predistortion can be found in [1]. Figure 6 shows a basic transmitter architecture using adaptive DPD. Volterra-based models are very common for DPD. Most of them can be expressed as a linear combination of regressors that are derived from the input signal $z(n)$. For example, for the generalized memory polynomial model (GMP), the output $y(n)$ is expressed as:
$$y(n) = \sum_{k=0}^{K_a-1} \sum_{l=0}^{L_a-1} a_{k,l} \, z(n-l) |z(n-l)|^k + \sum_{k=1}^{K_b} \sum_{l=0}^{L_b-1} \sum_{m=1}^{M_b} b_{k,l,m} \, z(n-l) |z(n-l-m)|^k + \sum_{k=1}^{K_c} \sum_{l=0}^{L_c-1} \sum_{m=1}^{M_c} c_{k,l,m} \, z(n-l) |z(n-l+m)|^k \tag{4}$$
It can be written as a dot product between the $N_c \times 1$ vector $d$ of all the coefficients $(a_{k,l}, b_{k,l,m}, c_{k,l,m})$ and the $N_c \times 1$ vector $\varphi(n) = (\varphi_1(n), \dots, \varphi_{N_c}(n))^T$ of regressors:
$$y(n) = \varphi(n)^T d$$
For a set of $N$ signal samples, the column vector $y = (y(n), \dots, y(n - N + 1))^T$ is equal to:
$$y = \Phi_z \, d$$
where $\Phi_z$ is the $N \times N_c$ matrix of regressors, $\varphi_i(n)$ is the $i$-th basis function and $N_c$ is the number of coefficients.
There are two general approaches for the identification of coefficients, namely the direct (DLA) and the indirect learning (ILA) approaches. The DLA approach tries to minimize a criterion based on the error $e_{DLA}(n) = x(n) - y_0(n)$, where $x$ is the DPD input and $y_0(n) = y(n)/G$ is the normalized output with a reference gain $G$ (Fig. 7). Many DLA algorithms are based on first identifying the PA model and then inverting it or using it to train the DPD.
The ILA approach (Fig. 8) is based on first solving a postdistortion problem and then using the postdistorter as a predistorter. The postdistorter is a fictive block placed after the PA that corrects the normalized PA output in order to minimize a criterion based on the error $e_{ILA}(n) = z(n) - w(n)$, where $z$ is the PA input and $w(n)$ is the postdistorter output. Using an LS criterion $J = \sum_{i=n-N+1}^{n} |e_{ILA}(i)|^2$ on $N$ observation samples corresponding to the PA output and input $(y_0(i), z(i))_{i=n-N+1}^{n}$, the coefficients of the postdistorter that minimize $J$ are solutions of a linear set of equations. The optimum coefficient vector is obtained with the pseudo-inverse of the regression matrix $\Phi_{y_0}$ built on $y_0$:
$$\hat{d} = \left(\Phi_{y_0}^H \Phi_{y_0}\right)^{-1} \Phi_{y_0}^H \, z \tag{5}$$
where $z = (z(n), \dots, z(n - N + 1))^T$ is the vector of PA input samples and $\Phi_{y_0}^H$ is the Hermitian transpose of $\Phi_{y_0}$. The ILA approach is popular because the coefficient identification is a linear optimization problem. However, the measurement noise at the output of the PA introduces a bias on the solution. The DLA approach does not suffer from this drawback but it leads to a nonlinear optimization problem.
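The ILA identification described above can be illustrated on a toy example. The sketch below is not the setup of the paper: it assumes NumPy, a hypothetical PA model (`toy_pa`, cubic compression plus a small one-sample memory term), a memory polynomial postdistorter with K = 5 and L = 2, a reference gain G = 1, and a single ILA iteration starting without predistortion.

```python
import numpy as np

def mp_regressors(u, K, L):
    """Memory-polynomial regression matrix: columns u(n-l) |u(n-l)|^k."""
    cols = []
    for l in range(L):
        ul = np.concatenate([np.zeros(l, dtype=complex), u[:len(u) - l]])  # u(n - l)
        for k in range(K):
            cols.append(ul * np.abs(ul) ** k)
    return np.column_stack(cols)

def toy_pa(z, G=1.0):
    """Hypothetical PA: cubic compression plus a small one-sample memory term."""
    z1 = np.concatenate([[0.0], z[:-1]])
    return G * (z * (1.0 - 0.2 * np.abs(z) ** 2) + 0.05 * z1)

def nmse(ref, sig):
    """Normalized mean square error of `sig` with respect to `ref`."""
    return np.sum(np.abs(sig - ref) ** 2) / np.sum(np.abs(ref) ** 2)

rng = np.random.default_rng(0)
N, K, L, G = 4000, 5, 2, 1.0
x = 0.7 * rng.uniform(0, 1, N) * np.exp(2j * np.pi * rng.uniform(0, 1, N))

# ILA step: no DPD yet, so the PA input is z = x.
z = x
y0 = toy_pa(z, G) / G                                 # normalized PA output
Phi_y0 = mp_regressors(y0, K, L)
d_hat, *_ = np.linalg.lstsq(Phi_y0, z, rcond=None)    # LS solution of Phi_y0 d = z

# The postdistorter coefficients are copied into the predistorter.
z_dpd = mp_regressors(x, K, L) @ d_hat
nmse_raw = nmse(x, toy_pa(x, G) / G)
nmse_dpd = nmse(x, toy_pa(z_dpd, G) / G)
print(f"NMSE without DPD: {10 * np.log10(nmse_raw):.1f} dB")
print(f"NMSE with DPD:    {10 * np.log10(nmse_dpd):.1f} dB")
```

`np.linalg.lstsq` is used here instead of forming the normal equations explicitly, which avoids squaring the condition number of the regression matrix. The example is noise-free, so the bias mentioned in the text does not appear.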
Both approaches depend on the matrix $R = \Phi_{y_0}^H \Phi_{y_0}$. Unfortunately, the successive signal samples are usually correlated and therefore the matrix $R$ is badly conditioned, especially when the nonlinearity orders are high. To manage this problem, different methods can be used: regularization techniques such as $L_2$-norm Tikhonov regularization (ridge regression), orthogonal polynomial bases, orthogonalization (e.g. Gram-Schmidt technique), or dimension reduction by suppressing less significant basis functions (e.g. orthogonal matching pursuit algorithm (OMP)). Another possible approach is to segment the problem into several problems of smaller dimensions. Different types of segmented DPD have been proposed. They are presented in the following sections.

Segmented DPD with Functions of a Single Real-Valued Variable
We have seen in Sec. 2 some results about interpolation or approximation of functions by polynomials or piecewise polynomials in the case of functions of a single real-valued variable. But in general, DPD are not such simple functions. First, the input signal is complex-valued and secondly, the DPD has to take into account the memory effects of the PA, whose behavior is ruled by nonlinear differential equations. The problem can be simplified by considering that the memory length is finite. The DPD can then be represented by a multivariate function of complex-valued variables. The complexity is much higher than for a function of a single real-valued input variable.
Fortunately, there are several cases where simplifications can be made, allowing the nonlinear aspects of the DPD to be represented by a simple function of a single real variable. This is the object of the next two sections.

Piecewise Modeling in Quasi-Memoryless Models
In this section we will address the cases of quasimemoryless models and of block oriented models separating nonlinearity from memory effects. These are special cases where single-variable real-valued functions can be used for the DPD model.
The first works on DPD were dedicated to signals with narrow bandwidths for which PA memory effects could be neglected. In that case, the quasi-memoryless PA can be modeled with its AM/AM and AM/PM characteristics as functions of the magnitude (or power) of the input signal $z(n)$. The baseband equivalent of the PA output can be expressed as $y(n) = G_{PA}(|z(n)|) z(n)$, where the magnitude and the phase of the gain $G_{PA}$ respectively represent the AM/AM and AM/PM of the PA. A major breakthrough was realized by Cavers [12], who proposed to model the DPD as a simple complex gain $G_{PD}$ depending only on the magnitude of the input signal and compensating for the AM/AM and AM/PM distortion of the PA. So, the output $z(n)$ of the DPD, for an input signal $x(n)$, can be written as:
$$z(n) = G_{PD}(|x(n)|) \, x(n)$$
The DPD complex gain should pre-compensate the PA gain so that the PA output is proportional to the input signal $x$ with a reference gain $G$, which means:
$$G_{PD}(|x(n)|) \, G_{PA}(|z(n)|) = G$$
The DPD corrective complex gain can be implemented by a look-up table (LUT) [12] or by a polynomial function [13], e.g.:
$$G_{PD}(|x(n)|) = \sum_{k=0}^{K-1} a_k \, |x(n)|^k$$
The content of the LUT or the coefficients of the polynomial are adaptively updated by DLA or ILA. In practice, either the DPD is directly updated or the PA gain is first estimated and then inverted to obtain the DPD.
For polynomial DPD, the DPD output is directly obtained by (6). For the LUT case, the LUT is addressed by the quantized value of some companding function of the input. If we denote by $(x_k)_{k=0}^{N-1}$ the different possible quantized values, the LUT contains the corresponding gains $G_{PD,k} = G_{PD}(x_k)$. For an input value $x$, the corrective gain is obtained by some interpolation or approximation of the function $G_{PD}(x)$ from the data $(x_k, G_{PD}(x_k))$.
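The gain-based LUT predistorter can be sketched in a few lines. The example below is illustrative only: the PA gain `pa_gain` is a hypothetical quasi-memoryless model, the LUT is indexed uniformly in amplitude, its entries are computed offline by a fixed-point iteration on the gain equation (instead of the adaptive DLA or ILA update used in practice), and linear interpolation is used between entries.

```python
import cmath

def pa_gain(r):
    """Hypothetical quasi-memoryless PA gain G_PA(r): AM/AM compression and AM/PM."""
    return cmath.exp(0.2j * r * r) / (1.0 + 0.3 * r * r)

def build_gain_lut(n_entries, r_max, G=1.0, n_iter=50):
    """Tabulate G_PD(r_k) solving G_PD(r) * G_PA(r * |G_PD(r)|) = G by fixed point."""
    lut = []
    for k in range(n_entries):
        r = r_max * k / (n_entries - 1)               # uniform spacing in amplitude
        g = 1.0 + 0.0j
        for _ in range(n_iter):
            g = G / pa_gain(r * abs(g))
        lut.append(g)
    return lut

def predistort(x, lut, r_max):
    """z = G_PD(|x|) x, with linear interpolation between adjacent LUT entries."""
    pos = min(abs(x), r_max) / r_max * (len(lut) - 1)
    k = min(int(pos), len(lut) - 2)
    frac = pos - k
    return ((1.0 - frac) * lut[k] + frac * lut[k + 1]) * x

R_MAX = 0.5
lut = build_gain_lut(64, R_MAX)
xs = [R_MAX * m / 100 * cmath.exp(0.37j * m) for m in range(1, 101)]   # test inputs
zs = [predistort(x, lut, R_MAX) for x in xs]
err_raw = max(abs(pa_gain(abs(x)) * x - x) for x in xs)                # PA alone
err_dpd = max(abs(pa_gain(abs(z)) * z - x) for z, x in zip(zs, xs))    # DPD + PA
print(f"max deviation from linear response without DPD: {err_raw:.2e}")
print(f"max deviation from linear response with DPD:    {err_dpd:.2e}")
```

The fixed-point iteration converges here because the chosen PA model is only mildly compressive over the tabulated range; a PA driven closer to saturation would need a more careful inversion.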
In order to cope with complicated PA characteristics, it is possible to increase the polynomial order or the LUT size, but at the price of introducing ripples in the function (Runge's phenomenon) and of increasing training convergence time, complexity and numerical problems. Therefore, different segmented approaches have been proposed: piecewise linear regression of the AM/AM and AM/PM of the PA gain [14], piecewise polynomial modeling of AM/AM and AM/PM in two regions (with a high-order polynomial for the saturation region and a low-order one for the linear region) [15], cubic spline interpolation [16], [17], piecewise interpolation by arcs of circles [18], and piecewise bilinear rational function [19] of the PA gain with piecewise inverse of the PA model to obtain the DPD model. An advantage of piecewise linear and piecewise bilinear functions is that they can be easily inverted. For example, in the case of a bilinear function, the inverse of the PA model (input $z$, output $y$) that is used for the DPD (input $x$, output $z$) is again a bilinear function whose coefficients follow directly from those of the PA model. In most of those studies, the segmentation (number of segments, position of the knots) and the degrees of the polynomial segments are determined in an empirical way. In [20], the authors use the derivatives of the AM/AM characteristic to segment it into linear, nonlinear and saturation regions.
Cavers worked on the optimal LUT spacing [21]. He derived the optimum companding function of the input magnitude for table indexing. It depends on the signal statistics and on the PA characteristics. He showed that for a class AB PA, equispacing by amplitude is close to the optimum. In [22], the authors propose a non-uniform LUT indexing function that gives a signal-to-quantization-noise ratio that does not depend on the input signal statistics or on the power backoff of the PA. In [23], a segmented approach is used with more LUT entries in the strongly nonlinear segments than in the linear ones. In [24], the LUT spacing is dynamically optimized as a function of the online-estimated PA characteristics and of the input signal statistics, using histograms to approximate the signal statistics. In [25] and in [26], the authors theoretically study respectively the optimal spacing of a piecewise linear LUT DPD and of a quadratically interpolated LUT DPD.
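As an illustration of why a bilinear segment is easy to invert, consider a generic bilinear (Möbius) map. The parametrization below, with hypothetical complex coefficients $a, b, c, d$ on one segment such that $ad - bc \neq 0$, is an assumption made for this sketch and is not necessarily the form used in [19]. If the PA model on the segment is

$$y = \frac{a z + b}{c z + d},$$

then requiring the linearized response $y = G x$ and solving for $z$ gives the DPD on that segment:

$$z = \frac{d \, G x - b}{a - c \, G x}.$$

The inverse is again a bilinear function of $x$, obtained in closed form from the four coefficients of the PA segment, with no iterative inversion.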

Piecewise Modeling in Block-Oriented Models
In order to take into account PA memory effects, one possible approach is to use block-oriented nonlinear models (BONL) separating nonlinearity from memory effects, such as Wiener, Hammerstein and Wiener-Hammerstein models. These models associate in cascade or in parallel several linear time-invariant filters (LTI) that represent the dynamics of the system and static nonlinear (SNL) blocks. A Wiener model is made of a cascade of an LTI filter followed by an SNL block, and a Hammerstein model of an SNL block followed by a linear filter. One drawback of BONL models is that the identification of their parameters has to be done by nonlinear optimization techniques. But an interesting point is that SNL blocks can be represented as complex gains that are single-variable functions of their input signal magnitude. Therefore, all the piecewise techniques presented in 4.1 can be applied to the SNL blocks of BONL models.
In [27], the PA is modeled by a Wiener model where the SNL block is represented by a simplicial canonical piecewise linear (SCPWL) function [28]. Two SCPWL functions are used respectively for the AM/AM and AM/PM characteristics. The DPD is a Hammerstein model. The inverse of the SNL block of the PA model is also a piecewise linear function that can be easily obtained by inverse coordinate mapping. In [29], a piecewise linear predistorter is also proposed. The parameters of the SNL block are estimated by particle swarm optimization (PSO).
In [30], the authors apply a Wiener model to the PA. The SNL block of the PA Wiener model comprises an AM/AM characteristic that is modeled by a piecewise linear, continuous and monotonically increasing function and an AM/PM characteristic represented by a piecewise constant function. The segmentation is determined empirically. The DPD is modeled by a memory polynomial model. A direct learning adaptive DPD is proposed. First the PA model is identified using the RLS (recursive least squares) algorithm. There are two parameters to identify for each segment of the SNL block. The inverse of the PA model is easy to obtain. It is a Hammerstein model with a piecewise SNL block. The DPD parameters are obtained by a piecewise RLS (PRLS) algorithm. The error to minimize is calculated thanks to the inverse function of the PA. This segmented approach leads to a direct learning adaptive algorithm that is less complex than common direct learning algorithms while offering the same level of performance.
In [31], two Hammerstein models with Catmull-Rom cubic spline static nonlinearity are used for the DPD, one for correcting AM/AM distortion and the other for correcting AM/PM distortion. The DPD coefficients are identified using ILA with a separable nonlinear least squares (SNLS) optimization [32]. The segmentation is optimized empirically. The approach makes it possible to compensate for high-degree nonlinearity with a limited set of coefficients.
In [33], we proposed a BONL DPD called "Filtered LUT" or FLUT. The FLUT DPD is made of an SNL block followed by a linear filter. But unlike Hammerstein models, the filter coefficients vary with the magnitude of the DPD input signal. The SNL block is implemented by a piecewise linear LUT (gain-LUT). A codebook stores the coefficients of the different filters. Both the gain-LUT and the codebook of filters are indexed by the magnitude of the input signal with a uniform companding function.

Piecewise Modeling in Volterra Based Models
Models derived from Volterra or from dynamic Volterra series can be reformulated (or generalized) with nonlinear single-variable functions of the input-signal magnitude.
For example, an MP DPD model with input $x$ and output $z$ defined as:
$$z(n) = \sum_{k=0}^{K-1} \sum_{l=0}^{L-1} a_{k,l} \, x(n-l) \, |x(n-l)|^k$$
can be reformulated with $L$ single-variable nonlinear functions $f_{NL,l}$ as:
$$z(n) = \sum_{l=0}^{L-1} x(n-l) \, f_{NL,l}(|x(n-l)|), \qquad f_{NL,l}(|x|) = \sum_{k=0}^{K-1} a_{k,l} \, |x|^k$$
These single-variable nonlinear functions $f_{NL,l}$ can be approximated by different piecewise functions. A DPD model where the $f_{NL,l}$ are modeled by complex-valued cubic splines is proposed by Safari et al. in [34] and compared with the MP model. The obtained piecewise model is linear with respect to its coefficients and can be identified using the LS criterion and the ILA approach, with equations similar to (3) and (5). It shows better results than the MP model for a smaller number of coefficients. The same kind of approach is proposed in [35] using a piecewise MP with 2nd-order nonlinearity and a single knot.
The same method can be applied to reformulate GMP, DDR or any other Volterra-based models. The principle is to first do the summation on the nonlinearity orders. In a GMP model, each of the 3 terms of (4) can be reformulated in the same way. For example, writing the DPD input as $x$ and denoting by $L$ and $M$ the numbers of delays $l$ and lags $m$ of the second term:
$$\sum_{k} \sum_{l=0}^{L-1} \sum_{m=1}^{M} b_{k,l,m} \, x(n-l) \, |x(n-l-m)|^k = \sum_{l=0}^{L-1} \sum_{m=1}^{M} x(n-l) \, f_{NL,l,m}(|x(n-l-m)|)$$
There are $M L$ single-variable nonlinear functions $f_{NL,l,m}$.
Barradas et al. in [9] show how this approach can be applied to any Volterra-based model. They approximate the $f_{NL,l,m}$ by cubic spline functions constructed with B-splines. They suggest using Tchebychev nodes as spline knots. They show that the numerical stability (matrix conditioning) of ILA identification with this approach is much better than with the GMP model for similar performance and complexity. In [7] they explain that the high locality (limited support) of nonlinear spline bases makes them quasi-orthogonal, which accounts for their better numerical properties. They develop their analysis by a theoretical comparison of polynomials and LUTs in PA modeling.
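The equivalence between the double-sum MP form and its reformulation with one gain function per delay can be verified numerically. The sketch below assumes NumPy and random coefficients; here `f_nl` is still the polynomial gain, and the piecewise models cited above would replace it by a spline or a LUT of the magnitude.

```python
import numpy as np

rng = np.random.default_rng(1)
K, L, N = 4, 3, 256
a = rng.normal(size=(K, L)) + 1j * rng.normal(size=(K, L))    # coefficients a_{k,l}
x = rng.normal(size=N) + 1j * rng.normal(size=N)              # DPD input samples

def delayed(u, l):
    """u(n - l) with zero initial conditions."""
    return np.concatenate([np.zeros(l, dtype=complex), u[:len(u) - l]])

# Direct MP form: z(n) = sum_k sum_l a_{k,l} x(n-l) |x(n-l)|^k
z_mp = sum(a[k, l] * delayed(x, l) * np.abs(delayed(x, l)) ** k
           for k in range(K) for l in range(L))

def f_nl(l, r):
    """Gain function f_NL,l(r) = sum_k a_{k,l} r^k of the real magnitude r."""
    return sum(a[k, l] * r ** k for k in range(K))

# Reformulated form: z(n) = sum_l x(n-l) f_NL,l(|x(n-l)|)
z_fnl = sum(delayed(x, l) * f_nl(l, np.abs(delayed(x, l))) for l in range(L))
print("forms agree:", np.allclose(z_mp, z_fnl))
```

Once the model is written this way, the nonlinearity orders no longer appear explicitly: the designer only chooses how each real-valued-argument function is represented.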
This approach is applied in [36] with f NL,l,m functions approximated by piecewise Lagrange (APL) basis functions.
Authors of [37] consider the determination of DPD as a multivariate regression problem and use the fact that any multivariate function can be approximated by a sum of separable functions to express a very general form of DPD with single-variable real-valued nonlinear functions. These functions can then be piecewise approximated by splines.
In [38], Zhu et al. propose a different approach to cope with envelope tracking PA, the behavior of which changes significantly from one power region to another. The technique is called "Decomposed Piecewise Volterra Series". It is based on vector threshold decomposition of the input signal $x(n) = |x(n)| e^{j\varphi(n)}$. Threshold decomposition was initially proposed by Heredia for real-valued signals in [39]. For a given set of real positive increasing thresholds $(\lambda_i)_{i=1}^{N}$ and $\lambda_0 = 0$, $N$ sub-signals $x_i$ are obtained, with:
$$x_i(n) = \begin{cases} 0 & \text{if } |x(n)| \le \lambda_{i-1} \\ (|x(n)| - \lambda_{i-1}) \, e^{j\varphi(n)} & \text{if } \lambda_{i-1} < |x(n)| \le \lambda_i \\ (\lambda_i - \lambda_{i-1}) \, e^{j\varphi(n)} & \text{if } |x(n)| > \lambda_i \end{cases}$$
Every sub-signal is processed by a specific sub-DPD model (here DDR models are used), and the outputs of all these sub-DPD are summed to obtain the global predistorted signal. In [40], the authors apply GMP models for the sub-DPD and the thresholds are determined using the slope and the rate of change of the slope of the AM/AM characteristic. In [41], the same technique is used with a learning algorithm that decorrelates the DDR polynomial basis functions and is applied on each sub-DPD independently.
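The vector threshold decomposition can be sketched as follows for a single complex sample. This is a minimal illustration with a hypothetical function name and arbitrary thresholds; the last threshold is assumed to be at least as large as the peak magnitude, so that the sub-signals sum back to the original sample.

```python
import cmath

def threshold_decompose(x, thresholds):
    """Split a complex sample x into one sub-signal per magnitude zone."""
    r, phi = abs(x), cmath.phase(x)
    lam = [0.0] + list(thresholds)         # lambda_0 = 0 < lambda_1 < ... < lambda_N
    subs = []
    for i in range(1, len(lam)):
        # part of the magnitude lying in ]lambda_{i-1}, lambda_i], clipped to the zone
        mag = min(max(r - lam[i - 1], 0.0), lam[i] - lam[i - 1])
        subs.append(mag * cmath.exp(1j * phi))
    return subs

thresholds = [0.3, 0.6, 1.0]               # assumed thresholds, peak magnitude <= 1
x = 0.65 * cmath.exp(0.8j)
subs = threshold_decompose(x, thresholds)
print([round(abs(s), 3) for s in subs])    # magnitude carried by each zone
print(abs(sum(subs) - x) < 1e-12)          # the sub-signals add up to x
```

All sub-signals keep the phase of the input sample, which is what makes the decomposition applicable to complex baseband signals; each sub-signal would then feed its own sub-DPD.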

Segmented DPD for Nonlinearity and Memory Domains
In Sec. 4, the segmentation is applied to the nonlinearities only. For some types of PA, however, e.g. Doherty PAs, the memory effects differ from one power level to another. In such cases, it may be useful to also segment the memory domain. The approaches presented in this section are not limited to the piecewise approximation of single-variable nonlinear functions: they partition the global space of the input signals and fit a piecewise DPD to each region of the partition. We present three techniques: Vector-Switched (VS) DPD, Canonical Piecewise Linear (CPWL) DPD and Decomposed Vector Rotation (DVR) DPD.

Switched DPD
Switched DPD makes it possible to derive a piecewise DPD that takes both nonlinearity and memory effects into account and applies the segmentation in both domains. The principle consists in switching between several DPD models, each of them being well suited to a specific segment (or region) of the input signal. The continuity between the different models is a delicate notion. There are at least two questions: how should this continuity be defined, and is it really necessary? Indeed, we can consider continuity with respect to time and continuity with respect to magnitude. When the models include some kind of memory, the final conditions of one model can be used as the initial conditions of the next model, which ensures some kind of temporal continuity.
Switched DPD makes it possible to use, in each region, a DPD model of smaller complexity (nonlinearity orders, memory lengths) than would be necessary with a global model. It is able to represent hard nonlinearities and to identify the region models with good numerical stability.
In [42], Afsardoost et al. propose the vector-switched (VS) DPD model. The VS model is a set of DPD models that are switched and applied to the input according to a switching function based on the region of the input signal. The space of input signals $x(n)$ is partitioned into $N$ regions using vector quantization (VQ). VQ is applied to vectors of $Q$ successive complex input samples $X^{(Q)}(n) = \{x(n), x(n-1), \ldots, x(n-Q+1)\}$. For an input sample $x(n)$, the class of $X^{(Q)}(n)$ is determined and the corresponding model is chosen for $x(n)$. The authors note that it is generally sufficient to use $Q = 2$ and to apply VQ to the input magnitude only.
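The switching mechanism can be sketched as follows. This is a minimal illustration, not the implementation of [42]: it assumes VQ on magnitude vectors with $Q = 2$, a codebook trained with plain Lloyd (k-means) iterations, and region "models" reduced to hypothetical complex gains so that the example stays short. All function names are illustrative.

```python
import numpy as np

def magnitude_vectors(x, Q):
    """Rows are [|x(n)|, |x(n-1)|, ..., |x(n-Q+1)|], with zeros before the first sample."""
    padded = np.concatenate((np.zeros(Q - 1), np.abs(x)))
    return np.column_stack([padded[Q - 1 - q: len(padded) - q] for q in range(Q)])

def nearest_class(vectors, codebook):
    """Index of the closest codeword (squared Euclidean distance) for each row."""
    d2 = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def train_codebook(vectors, n_classes, n_iter=20, seed=0):
    """Plain Lloyd iterations; an empty class keeps its previous codeword."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), n_classes, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest_class(vectors, codebook)
        for c in range(n_classes):
            members = vectors[labels == c]
            if len(members):
                codebook[c] = members.mean(axis=0)
    return codebook

def switched_dpd(x, codebook, gains):
    """Select one model per sample from the class of its magnitude vector."""
    labels = nearest_class(magnitude_vectors(x, codebook.shape[1]), codebook)
    return gains[labels] * x, labels

rng = np.random.default_rng(2)
x = (rng.normal(size=2000) + 1j * rng.normal(size=2000)) / np.sqrt(2)
codebook = train_codebook(magnitude_vectors(x, Q=2), n_classes=4)
gains = np.array([1.0, 1.1, 1.2, 1.3], dtype=complex)   # placeholder region models
z, labels = switched_dpd(x, codebook, gains)
```

In an actual VS-DPD each placeholder gain would be replaced by a full region model (for example a low-order GMP) identified on the samples of its class.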
The authors of [41] apply this approach with a set of GMP models and a learning algorithm that decorrelates the GMP polynomial basis functions. This training is applied to each DPD model independently, using the input samples of the corresponding region. The VQ segmentation is performed on the magnitude of the input signal with $Q = 1$.

From PWL to Memory-SCPWL DPD
Piecewise linear (PWL) functions, and in particular simplicial canonical piecewise linear (SCPWL) functions, were first applied to approximate functions of a single real variable such as those presented in Sec. 4, either in static cases to model the AM/AM and AM/PM characteristics of quasi-memoryless PAs [43], [44], or in dynamic cases to represent the nonlinear functions in Volterra-based models [45]. They were then also used to model functions of a complex input [46], under the name memory-SCPWL functions. A PWL representation has several advantages. In particular, because it is affine on each segment, it can be inverted very easily. A PWL function can be described segment by segment, which may require a large number of coefficients. The number of coefficients can be reduced by using a global representation called a Canonical PWL (CPWL) function [47], or an even more compact form called a Simplicial Canonical PWL (SCPWL) function [28].
An SCPWL function $f$ of a single real variable $x$ is a linear combination of basis functions $\lambda_i(x)$ weighted by coefficients $c_i$, where $\sigma$ is the number of segment breakpoints and the basis functions are defined from a set of increasing breakpoint values $(\beta_i)_{i=1}^{\sigma}$. The coefficients $c_i$ can be complex-valued.
Cheong et al. [46] modified that expression to make it suitable for modeling nonlinearities and memory effects. For a given memory length $L$, the new form applies the basis functions to each of the input samples $(x(n-l))_{l=0}^{L}$, with coefficients $c_{l,i}$. This new form is linear with respect to its coefficients $c_{l,i}$, which simplifies its identification. Replacing $|x - \beta_k|$, where $x$ is real-valued, by $\big||x| - \beta_k\big| e^{j \arg(x)}$, where $x = |x| e^{j \arg(x)}$ is complex-valued, is called Decomposed Vector Rotation (DVR). The authors of [46] compared the memory-SCPWL DPD with MP and GMP DPDs. They showed that the memory-SCPWL DPD offers better modeling accuracy for sharp nonlinearities and that it is less sensitive to noise at the PA output, which reduces the potential bias on the coefficients for ILA identification.
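The DVR substitution and the linearity in the coefficients can be illustrated with a short sketch. It is a simplified illustration, not the model of [46]: it keeps only the linear terms $x(n-l)$ and the first-order terms $\big||x(n-l)| - \beta_k\big| e^{j\arg(x(n-l))}$, assumes equi-spaced thresholds, and uses a toy hard limiter as the target. Function names are hypothetical.

```python
import numpy as np

def delayed(x, l):
    """x(n - l) with zeros before the first sample."""
    return np.concatenate((np.zeros(l, dtype=complex), x[:len(x) - l]))

def dvr_basis(x, betas, L):
    """Columns: x(n-l), then ||x(n-l)| - beta_k| * exp(j*arg(x(n-l))) for every l and k."""
    cols = []
    for l in range(L + 1):
        xl = delayed(x, l)
        phase = np.exp(1j * np.angle(xl))
        cols.append(xl)
        cols.extend(np.abs(np.abs(xl) - b) * phase for b in betas)
    return np.column_stack(cols)

def nmse_db(y, y_hat):
    """Error power over reference power, in dB."""
    return 10 * np.log10(np.sum(np.abs(y - y_hat) ** 2) / np.sum(np.abs(y) ** 2))

rng = np.random.default_rng(3)
n = 4000
x = rng.uniform(0, 1, n) * np.exp(1j * rng.uniform(-np.pi, np.pi, n))
y = np.minimum(np.abs(x), 0.6) * np.exp(1j * np.angle(x))    # toy hard limiter

betas = np.arange(1, 6) / 5              # equi-spaced thresholds 0.2, ..., 1.0
Phi = dvr_basis(x, betas, L=1)

# The model is linear in its coefficients, so ordinary least squares applies.
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
coef_lin, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)

nmse_dvr = nmse_db(y, Phi @ coef)
nmse_lin = nmse_db(y, x * coef_lin[0])
```

Because one threshold coincides with the clipping level of the toy limiter, the piecewise-linear basis can follow the sharp corner that a single linear gain cannot.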
Of course, CPWL functions can also be used with the generalized form of Volterra DPD using functions of single real-valued variables [48].

Decomposed Vector Rotation (DVR) Models for DPD
Zhu [49] has extended the memory-SCPWL model given by (7) into a more general model called the Decomposed Vector Rotation (DVR) model. It starts from the CPWL representation of a finite-memory nonlinear system with real-valued input and output signals z and y. CPWL partitions the input signal space into K polyhedral regions separated by hyperplanes whose boundaries are defined with thresholds $(\beta_k)_{k=1}^{K}$. The input-output relation is expressed as:

$$y(n) = \sum_{l=0}^{L} a_l\, x(n-l) + b + \sum_{k=1}^{K} c_k \left| \sum_{l=0}^{L} a_{k,l}\, x(n-l) - \beta_k \right|$$

where $L$ is the memory length. The hyperplanes are defined by $\sum_{l=0}^{L} a_{k,l}\, x(n-l) - \beta_k = 0$. CPWL can approximate a wide range of continuous nonlinear functions with very good accuracy [50]. Unfortunately, this model is not linear with respect to its coefficients $(a_l, b, c_k)$ together with the hyperplane parameters, and it is not directly usable for a complex-valued input signal $x$. To overcome these two limitations, Zhu proposed a new formulation based on the decomposed vector rotation of each input sample. In order to introduce interaction between signals at different time instants, other terms can be added to the model, which becomes more or less complex depending on the terms that are retained. In [51], Zhu discusses the respective merits of Volterra series and CPWL functions for DPD.
The authors of [52] propose a modification of the DVR model that avoids the calculation of absolute values. The same kind of modified model is used in [53] for the linearization of a radio-over-fiber link.

Advanced Segmented DPD for Multidimensional DPD
In this section, we briefly discuss some more advanced points, such as the application of segmented DPD to systems using carrier aggregation with concurrent multi-band transmitters or MIMO techniques. Both cases correspond to multiple-input DPD (or multidimensional DPD).
In systems using carrier aggregation with concurrent multi-band transmitters, the input signal includes several frequency bands that can be widely spaced in non-contiguous intra-band or inter-band aggregation. It is possible to consider the multi-band signal as a single one-dimensional (1D) single-band signal and to design the DPD accordingly, but this requires a very high sampling frequency and leads to very high complexity. A better approach consists in splitting the 1D signal into $N$ sub-signals corresponding to each of the $N$ bands. This is a segmentation in the frequency domain. The sub-signals have a much narrower bandwidth than the original 1D signal and can be digitized separately at a reasonable sampling rate. Separate feedback paths are used for the different sub-signals. Most often, $N = 2$ or $3$. The $N$-dimensional DPD must take the different sub-signals into account to construct the predistorted signal corresponding to each carrier and to suppress the cross-distortion generated between the sub-signals [54]. For example, for a 2D-DPD with two input sub-signals $x_1$ and $x_2$ and two predistorted output sub-signals $z_1$ and $z_2$, a possible DPD model is a multi-dimensional MP model. Such a model has a large number of coefficients, so it can be interesting to use piecewise models to decrease the number of necessary coefficients. Naraharisetti et al. [11], [55] have reformulated the model in a more general form in which each output $z_i$, with $i = 1$ or $2$, is expressed through nonlinear gain functions $G_m^{(i)}$. These gain functions depend on the two real-valued signals $|x_1|$ and $|x_2|$. They can be approximated by different piecewise functions, in particular using spline basis functions.
2D cubic spline bases are used in [11], [55] and extended to 3D cubic spline bases in [56]. The gain functions are approximated by linear combinations of the cubic spline basis functions $\varphi_{j,k}$, whose coefficients are obtained by LS fitting of the measured data. Naraharisetti proposed the following formulation for the spline basis [11], [55]. For a given set of 2D knots $(|x_{1,u}|^2, |x_{2,v}|^2)$, the basis functions $\varphi_{j,k}(|x_1|^2, |x_2|^2)$ are created with 2D interpolating cubic splines. The 2D cubic spline basis is built from 1D cubic spline bases by tensor product. The number of coefficients is equal to $M N_{s1} N_{s2}$.
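The tensor-product construction and the resulting coefficient count can be sketched as follows. This is only an illustration under stated assumptions: piecewise-linear interpolation functions replace the cubic splines of [11], [55] to keep the code short, the knots are uniform, $N_{s1}$ and $N_{s2}$ are taken to be the numbers of knots per dimension, and $M = 3$ is a hypothetical value.

```python
import numpy as np

def hat_basis(u, knots):
    """1D piecewise-linear interpolation basis: column j is 1 at knots[j], 0 at the others."""
    eye = np.eye(len(knots))
    return np.column_stack([np.interp(u, knots, eye[j]) for j in range(len(knots))])

def tensor_basis(u1, u2, knots1, knots2):
    """Row n holds all products phi_j(u1[n]) * phi_k(u2[n])."""
    b1 = hat_basis(u1, knots1)                 # shape (n, Ns1)
    b2 = hat_basis(u2, knots2)                 # shape (n, Ns2)
    return np.einsum("nj,nk->njk", b1, b2).reshape(len(u1), -1)

rng = np.random.default_rng(4)
p1 = rng.uniform(0, 1, 1000)       # stand-in for |x1|^2
p2 = rng.uniform(0, 1, 1000)       # stand-in for |x2|^2
knots1 = np.linspace(0, 1, 5)      # Ns1 = 5 (assumed)
knots2 = np.linspace(0, 1, 4)      # Ns2 = 4 (assumed)
B = tensor_basis(p1, p2, knots1, knots2)

M = 3                              # hypothetical value of M
n_coefficients = M * B.shape[1]    # M * Ns1 * Ns2 = 60
```

The count grows multiplicatively with the number of knots per dimension, which is why the knot grids are usually kept coarse in the 2D and 3D cases.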
In [57], instead of segmenting the model using a single input magnitude, the partitioning is done using vector quantization on the combined input magnitudes. In [58], [59], 2D CPWL models are proposed.
The case of DPD for MIMO transmitters is quite similar to that of multi-band transmitters. Nonlinear crosstalk effects due to coupling at the inputs of the elementary PAs create cross-modulation of the elementary signals. In [60], multivariate polynomial models are used. To reduce the complexity of such DPDs, different piecewise approaches have been studied. For example, Dual-Input Canonical Piecewise-Linear DPDs are presented in [61], [62] for MIMO applications with two antennas.
A promising new approach for multidimensional DPD with multiple inputs and multiple outputs, called Tensor-Network-Based DPD, is proposed in [63]. This new development should facilitate the identification of Volterra series with very high memory lengths and memory orders. It is based on the tensor-network-based MIMO Volterra system framework of [64].

Experimental Comparisons
In this section, we give some experimental comparisons of three types of DPD: a non-segmented GMP model, a vector-switched DPD and a DVR DPD.
The experiments were carried out with a strongly nonlinear 3-way Doherty PA based on three LDMOS transistors (BLF7G22LS-130 from Ampleon). Its maximum peak output power is 57 dBm and its linear gain is 16 dB. The PA is preceded by a driver with a gain of 31.5 dB. The carrier frequency is 2.14 GHz. The input signal is an LTE signal with a bandwidth of 20 MHz and a PAPR of 8 dB. The tests were carried out at an average PA output power of 47.3 dBm. For the presented results, training was done with ILA.

Non-Segmented GMP-DPD
The structure of the GMP model (values of the eight parameters $(K_a, L_a, K_b, L_b, M_b, K_c, L_c, M_c)$) is determined by a hill-climbing (HC) optimization algorithm that we proposed in [65]. This algorithm optimizes a trade-off between the number of coefficients and the accuracy of the model. The obtained GMP structure has 38 coefficients. The obtained normalized mean square error (NMSE) is equal to −32.7 dB. NMSE is defined as the ratio between the power of the error and the power of the output signal. The AM/AM and AM/PM characteristics with and without the GMP DPD are given in Fig. 9.
To improve the results, we can try to increase the number of coefficients, to the detriment of complexity. But the HC algorithm shows that the results cannot be significantly improved in this way. For example, with 80 coefficients, the NMSE is only equal to −33.9 dB (only 1.2 dB better than with 38 coefficients).
By inspection of the AM/AM characteristic, we can clearly distinguish different areas with different slopes (gains). Segmented approaches are therefore good candidates to improve performance without significantly increasing the complexity.
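The NMSE definition above translates directly into code. The following minimal sketch expresses it in dB; the function name and the synthetic test signal are illustrative assumptions.

```python
import numpy as np

def nmse_db(y, y_hat):
    """NMSE in dB: power of the error divided by the power of the reference output."""
    err = np.sum(np.abs(y - y_hat) ** 2)
    ref = np.sum(np.abs(y) ** 2)
    return 10.0 * np.log10(err / ref)

rng = np.random.default_rng(5)
y = rng.normal(size=1000) + 1j * rng.normal(size=1000)
value = nmse_db(y, 1.1 * y)     # a 10 % amplitude error gives about -20 dB
```

On this scale, the −32.7 dB reported for the global GMP-DPD corresponds to an error power of roughly 0.05 % of the output power.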

Vector-Switched DPD (VS-DPD)
To apply vector-switched DPD, we have to determine a suitable segmentation and the structure of the model in each segment. We have chosen GMP models for each segment (or VQ class) with 14 coefficients and a maximum order of nonlinearity equal to 5 (instead of 38 coefficients and a maximum nonlinearity order of 11 in the global GMP model). We used the same structure for all of the segment models because the FPGA implementation will be sized by the most complex of those structures.
We have compared several types of segmentation:
• scalar uniform quantization, scalar non-uniform Lloyd-Max segmentation, vector quantization (VQ) with different vector dimensions,
• training of the segmentation codebook on the PA input or output signal,
• segmentation determined by the signals or by the PA characteristics.
For each type of segmentation, we have tested different values for the number of segments. We observed that segmentation determined by the signals gives better results than segmentation driven by the PA characteristics (slope variation in the AM/AM characteristic), so we focused on segmentation determined by the signals. Learning the VQ codebook with the normalized PA output signal $y_0$ is slightly better than with the original input signal $x$, but it is easier to train with the input signal, so we trained the VQ on the input signal. Moreover, quantizing complex signals does not improve the results compared to quantizing the signal magnitude, so we applied quantization to the signal magnitude.
Figure 10 illustrates how the NMSE varies as a function of the number of segments $N_S$ for different types of quantization and VQ vector dimensions. It can be seen that, for scalar quantization and $N_S > 10$, using a Lloyd-Max quantizer instead of uniform segmentation only slightly improves the results. Using VQ segmentation with a vector size equal to 2 or 3 instead of scalar segmentation, however, improves the NMSE by approximately 1 dB. A vector size equal to 2 is sufficient. The results improve slowly when the number of classes increases, but this number must remain small enough that, in each training buffer of $N$ samples, the population of each class is large enough for a good identification of the class model. On the same Fig. 10, we have added the result obtained by the global GMP-DPD with 38 coefficients. We see that it is possible to achieve a similar NMSE with VS-DPD using 4 segments and VQ segmentation or Lloyd-Max scalar segmentation, each segment corresponding to a model of 14 coefficients. Figures 11 and 12 show the result of the segmentation of the AM/AM characteristic into respectively 12 and 4 classes with VQ (vector size = 2) or scalar Lloyd-Max quantization.
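The difference between uniform and Lloyd-Max scalar segmentation of the magnitude can be sketched as follows. This is a minimal illustration on synthetic data, not the procedure used for the measurements: the magnitudes are assumed to be Rayleigh-like, the Lloyd-Max iterations are started from the uniform quantizer, and the figure of merit is the quantization distortion of the magnitude, not the NMSE of the resulting DPD.

```python
import numpy as np

def quantize(r, levels):
    """Nearest-level index for each magnitude."""
    return np.abs(r[:, None] - levels[None, :]).argmin(axis=1)

def uniform_levels(r, n_segments):
    """Mid-points of n_segments equal-width cells covering [0, max(r)]."""
    edges = np.linspace(0.0, r.max(), n_segments + 1)
    return 0.5 * (edges[:-1] + edges[1:])

def lloyd_max_levels(r, n_segments, n_iter=50):
    """Lloyd-Max iterations started from the uniform quantizer."""
    levels = uniform_levels(r, n_segments)
    for _ in range(n_iter):
        idx = quantize(r, levels)
        for c in range(n_segments):
            if np.any(idx == c):               # an empty cell keeps its level
                levels[c] = r[idx == c].mean()
    return levels

def distortion(r, levels):
    """Mean squared quantization error of the magnitude."""
    return np.mean((r - levels[quantize(r, levels)]) ** 2)

rng = np.random.default_rng(6)
r = np.abs(rng.normal(size=5000) + 1j * rng.normal(size=5000)) / np.sqrt(2)
d_uniform = distortion(r, uniform_levels(r, 8))
d_lloyd = distortion(r, lloyd_max_levels(r, 8))
```

Lloyd-Max places more segments where the magnitude samples are dense, which also tends to balance the population of the classes used for identification.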
Concerning the implementation, for VS-DPD with 12 segments we have to store 168 (12 × 14) coefficients instead of 38 coefficients for the global GMP-DPD, but this is not a problem because it remains very small compared to the memory size of common FPGAs. VS-DPD has several advantages compared to the global GMP-DPD. First, the real-time computational complexity of the DPD is much reduced for VS-DPD (14 coefficients instead of 38). Secondly, the identification of the coefficients is greatly facilitated, since the covariance matrix R of (5) has much smaller dimensions and is better conditioned (the two condition numbers differ by a factor of about $10^5$). Thirdly, the dynamic range of the coefficients is strongly reduced. For VS-DPD, the ratio between the magnitudes of the largest and the smallest coefficients is smaller than 500 for all the models, while it is around $4 \times 10^5$ for the global GMP-DPD. This last point is important for a fixed-point implementation of the DPD. The only small drawback of VS-DPD compared to the global GMP-DPD is that each input sample has to be quantized in order to determine its class, and the coefficients of the DPD have to be changed for each signal sample.
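The impact of the coefficient dynamic range on a fixed-point implementation can be made concrete with a short calculation. The sketch below assumes that the number of bits needed just to span a magnitude ratio is its base-2 logarithm; the bits required for precision come on top of that and are not modeled here.

```python
import math

def dynamic_range_bits(ratio):
    """Bits needed to span a largest-to-smallest magnitude ratio (assumed log2 rule)."""
    return math.log2(ratio)

bits_vs = dynamic_range_bits(500)      # VS-DPD bound quoted above: about 9 bits
bits_gmp = dynamic_range_bits(4e5)     # global GMP-DPD value quoted above: about 18.6 bits
stored_coefficients = 12 * 14          # 168 coefficients for 12 segments

print(f"VS-DPD: {bits_vs:.1f} bits, global GMP-DPD: {bits_gmp:.1f} bits")
```

Under this assumption, the global GMP-DPD needs almost 10 additional bits of word length only to cover the spread of its coefficients.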

Decomposed Vector Rotation DPD (DVR-DPD)
For DVR-DPD, we have to determine the DPD structure: the number of segments (or number of thresholds), the memory depths and the terms that are kept in the model given by (8). Indeed, (8) contains 6 elementary types of basis functions, but some of them may not be useful and others could be added. In [66] we proposed an algorithm based on a hill-climbing heuristic for the sizing of DVR models. This algorithm searches for a structure that optimizes a trade-off between the number of coefficients of the model and its modeling accuracy. The thresholds are equi-spaced, but it would be interesting to optimize their values. Figure 13 shows the influence of the number of segments $K$ and of the memory depth $M$ on the NMSE. The curves do not decrease monotonically when the number of segments increases; an optimization of the threshold values might make them more regular. The obtained NMSE values are quite similar to those of VS-DPD for the same number of segments. For example, for $N_S = 12$ segments, the NMSE is close to −36 dB in both cases, with $M = 3$ for DVR.
For VS-DPD, each model has 14 coefficients and we have to store 12 × 14 coefficients in memory. For DVR, the model is the same for every sample and has 208 coefficients. The condition number of its matrix R is $10^5$ times greater than that of the global GMP model with 38 coefficients. The dynamic range of the coefficient magnitudes is larger for DVR (≈ 1800) than for VS-DPD (≈ 500), but it is much smaller than that of the global GMP-DPD. For DVR, the identification of the coefficients requires dealing with a 208 × 208 matrix R, while for VS-DPD there are 12 identifications to perform (one per segment), each of them with a 14 × 14 matrix R. The identification step is therefore less complex for VS-DPD.
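The difference in identification effort can be estimated with a rough operation count. The sketch below assumes that solving an n × n system of normal equations costs on the order of n³ operations; constant factors and the cost of building the matrices R are ignored.

```python
def cubic_cost(n):
    """Order-of-magnitude cost of solving an n x n linear system (assumed n^3 scaling)."""
    return n ** 3

cost_dvr = cubic_cost(208)          # one 208 x 208 system
cost_vs = 12 * cubic_cost(14)       # twelve 14 x 14 systems
ratio = cost_dvr / cost_vs

print(cost_dvr, cost_vs, round(ratio))
```

With this scaling, the single DVR solve is more than two orders of magnitude more expensive than the twelve VS-DPD solves together.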
Compared to the global GMP-DPD with 38 coefficients, the NMSE is improved by 3 dB with VS-DPD (12 segments and 14 coefficients per segment) and by 3.5 dB with DVR (12 segments and memory length = 3). Figure 15 shows the normalized power spectral densities obtained with the different DPDs, with 12 segments for the VS and DVR DPDs (the sampling frequency is equal to 200 MHz).

Conclusion
This paper is a survey of predistortion techniques using segmentation.
A comparison of global GMP DPD with two segmented approaches, namely Vector-Switched DPD (VS-DPD) and Decomposed Vector Rotation DPD (DVR-DPD), is presented with the support of experiments on a strongly nonlinear 3-way Doherty PA. It shows the interest of both segmented approaches in terms of linearization performance, complexity and ease of implementation compared to the global GMP-DPD. VS-DPD and DVR-DPD improve the NMSE by 3 dB and 3.5 dB respectively. VS-DPD shows very good numerical properties, both for fixed-point real-time implementation and for the identification of the coefficients. The dynamic range of the coefficients of the segmented VS and DVR DPDs is very small compared to that of the global GMP DPD. The numerical conditioning of VS-DPD is very good.
Several paths remain to be explored in the area of segmented DPD, such as optimizing the segmentation, using tensor algorithms for multivariate DPD, studying the potential of local model networks, and combining segmentation with neural networks.