Multiple-Antenna Emitters Identification Based on a Memoryless Power Amplifier Model

Power amplifier (PA) nonlinearity is typically unique at the radio frequency (RF) front-end for particular emitters. It can play a crucial role in the application of specific emitter identification (SEI). In this paper, under the Multi-Input Multi-Output (MIMO) multipath communication scenario, two data-aided approaches are proposed to identify multi-antenna emitters using PA nonlinearity. Built upon a memoryless polynomial model, the first approach formulates a linear least square (LLS) problem and presents the closed-form solution of nonlinear coefficients in a MIMO system by means of singular value decomposition (SVD) operation. Another alternative approach estimates nonlinear coefficients of each individual PA through nonlinear least square (NLS) solved by the regularized Gauss–Newton iterative scheme. Moreover, there are some practical discussions of our proposed approaches about the mismatch of the order of PA model and the rank-deficient condition. Finally, the average misclassification rate is derived based on the minimum error probability (MEP) criterion, and the proposed approaches are validated to be effective through extensively numerical simulations.


Introduction
Specific emitter identification (SEI) is committed to distinguish individual radiation sources by using essential radio frequency fingerprint (RFF) features extracted from different emitters. It can be applied to military communication [1], physical layer authentication [2], and enhancement of the security in wireless network, such as very high frequency (VHF) radio networks, Wi-Fi networks, cognitive radios, cellular networks [3], and so on.
In general, based on different states of signals, the identifiable RFF features for SEI are usually extracted from the transient signal or the steady-state signal. The transient signal, actually the turn-on signal, carries unique and unintentional information that is advantageous to emitter identification, and the features underlying are mainly extracted from the instantaneous amplitude, phase, frequency, and energy envelope [4][5][6]. Nevertheless, it is difficult to capture the transient signal since the duration time is often too short to use. As for the steady-state signal, many researchers focus on extracting the distinguishable features that are generated by hardware imperfection of the components inside the radiation source through advanced signal processing techniques. On one hand, the statistical characteristics of the original RF signal, such as the high order spectrum, have been used as the features to identify different emitters [7,8]. On the other hand, the Time-Frequency Analysis (TFA) methods [9,10], Wavelet Transform (WT) [11,12], and Hilbert-Huang Transform (HHT) [13,14] are successively applied to extract the transform domain characteristics from the received RF signals. However, these methods have little knowledge of impairments inside the individual emitter, and the performance can be easily affected by the wireless channels. Hence, there are many additional works sequence. In other words, PA nonlinearity is treated as a unique RFF of the emitter to fulfill the identification task, and we assume QAM is used by emitters when communicating with the receiver.

Memoryless Nonlinear PA Modeling
Generally, the memoryless polynomial model is simple in expression and can describe the intermodulation distortion of an RF PA well, such as PA9440 amplifier [28], in the narrowband communication system. More specifically, the polynomial coefficients are corresponding to the intermodulation coefficient such as third-order interception point (IP 3 ) and fifth-order interception point (IP 5 ). In this work, we choose the memoryless polynomial model to characterize the nonlinear behavior of all PA units in a multi-antenna emitter; then, the relationship between baseband equivalent input and output of the PA [20] at jth antenna branch can be written as where P denotes the max order of the PA model and can be configured as P = 3, 5, 7, . . . . At the RF front-end of emitters, the frequency components that resulted from the even terms in the model can be removed by the bandpass filter, thus the even terms are ignored in Equation (1). s j (n) is the baseband equivalent input of the nonlinear system, denoting the nth QAM symbol transmitted over the jth antenna. The superscript "*" is the conjugate operator. α 2p−1,j denotes the normalized (2p − 1)th order PA coefficient of the jth antenna and without loss of generality, we set α 1,j = 1 hereafter. x j (n) stands for the response of the nonlinear system. In this paper, we suppose that each emitter is equipped with J antennas. Therefore, the normalized discrete-time baseband equivalent form of the nonlinear distortion model for the multi-antenna system can be expressed as . . . (s 1 (n)) p · s * 1 (n) p−1 (s 2 (n)) p · (s * 2 (n)) p−1 . . .
wheres j (n) is the normalized version of s j (n).

MIMO Multipath Channel
The propagation channel considered in this paper is a linear discrete MIMO system with J transmit antennas and R receive antennas. Given the length of the training sequence N, the signal received at the antenna r can be represented by where ⊗ denotes the convolution operation. x j is the nonlinear distortion signal transmitted over the jth antenna, h rj represents the channel impulse response between the jth transmit antenna to the rth receive antenna and remains time-invariant during the data receiving process, and v r is the zero-mean additive white Gaussian noise received at the antenna r. Thus, we can unfold Equation (3) naturally in the form of y r = X (1) h r1 + X (2) where the order of the channel is L, X (j) ∈ C (N+L−1)×L is a Toeplitz matrix populated by x j in Equation (3). Note that the nonlinear coefficients are cross-coupled with channel coefficients in Equation (4); therefore, the major target arising in this paper is to get the accurate estimations of the separate nonlinear coefficient {α 2p−1,j } via the received signal {y r } with the aid of training sequences in MIMO multipath scenarios, before we can identify a specific emitter.

The Proposed Estimation Approaches
As mentioned above, it has been reported in [19,20] that the nonlinear coefficients of PAs can be estimated with two stages. The first stage establishes the initial estimation of channel coefficients and nonlinear coefficients through some well-designed training sequences sorted by amplitude. Then, an iterative method is applied at the second stage to eliminate the estimate bias. In [27], a PA parameter estimator combined the best linear unbiased estimation (BLUE) and singular value decomposition (SVD) is proposed for the SISO system. Since the method in [27] can get a closed-form solution of the PA nonlinear coefficients and has no constraint on the ordering of the amplitude of training symbols, which is more practical compared to the one in [19], we extend it to the MIMO multi-path scenarios and mark it as linear method in MIMO (LMM). Furthermore, we propose an alternative method to extract PA parameters in a nonlinear least square (NLS) manner. It should be noted that both LMM and NLS approaches can be degraded into SISO systems.

The LMM Approach
Note that, in [27], if the product terms of the channel coefficients and the nonlinear coefficients are substituted by some new integrated variables, we can obtain a set of linear equations with regard to the new variables, thus the product terms can be readily solved by LLS. Then, the only thing left to us is to extract α 2p−1,j from the solution.
According to Equation (4), we first vectorize signals received by all R antennas into vector , where the superscript "T" is the transpose operator.
In addition, the signal at the receiver side now is where v vec is the corresponding reshaped noise vector. h α ∈ C RJL(P+1)/2 is the integrated vector composed of all independent variables and can be represented as: with u r,l h = W r,l h α vec , in which l h = 0, . . . , L − 1, I α ∈ C (P+1)/2×(P+1)/2 is a unit matrix and blkdiag(·) is the block diagonalization function. In addition, D s is defined by with d s ∈ C (N+L−1)×JL(P+1)/2 being constructed by the known training sequence. To elaborate the process, we provide a numerical example in Appendix A.
Consequently, the least square (LS) estimation of h α in Equation (5) iŝ where the superscript " †" denotes the pseudo-inverse operation. It is worth noting that, compared to the BLUE method in [27], the least square solution of Equation (12) has no requirement on the estimation of noise power. Obviously, the condition of a unique solution to Equation (12) is natural that D s is full column rank and N ≥ JL(P + 1)/2 − L + 1 is satisfied. Afterwards, we define and further perform SVD on the matrix Q∈ C J(P+1)/2×RL to get a closed-form estimation of the PA parameters in the MIMO system. Since the normalized first-order nonlinear coefficients are assumed equal to 1, the execution steps of PA nonlinear coefficients estimator can be summarized as: (1) Reshape the observations according to Equation (5), and estimate h α according to Equation (12).
(2) Reshape theĥ α into the matrix Q, then perform SVD on Q j = U j ∑ j V H j , where Q j is a submatrix consisting of the ((j − 1)(P + 1)/2 + 1)th to (j(P + 1)/2)th rows of the matrix Q.
(3) Estimate the nonlinear coefficients of jth transmitting antenna as follows: where U (:,1) j and U (1,1) j are the first column and first element of U j , respectively.

The NLS Approach
Bearing in mind the received signal in Equation (4) for MIMO multipath transmission system, we can now further expand the expression with a series of nonlinear equations due to the existence of product term of the nonlinear coefficient α 2p−1,j and the channel coefficient h rj (l h ). As a consequence, the problem of nonlinear coefficients estimation can thus be alternatively transformed into a NLS optimization one when introducing a training sequence with length N.
In order to get a more robust solution, we choose to solve a constrained NLS optimization problem with q = RJL + J (P − 1) /2 independent variables, and the cost function can be given by: where · 2 denotes 2-norm. z∈ C q is the vector consisting of the independent variables and can be described as z = z T h , z T α T , in which z α and z h are respectively as z α = α 3,1 , . . . , α P,1 , . . . , α 3,J , . . . , α P,J T (16) with γz 2 2 2 is the regularization. g(z)∈ C R(N+L−1) denotes the residual function: C q → C R(N+L−1) with R(N + L − 1) ≥ q, that is, with ∆y r = y r − X (1) h r1 + X (2) h r2 + · · · + X (J) h rJ . (20) According to [29], the regularized Gauss-Newton iterative method is introduced here to figure out z, i.e., with The superscript "H" denotes conjugate transposition, and I q is a q × q unit matrix. J(z) is the Jacobian matrix of g(z), that is, the first-order derivative of g(z) with respect to z.
In general, given the Jacobian matrix J(z) with full column rank, one can apply the regularized Gauss-Newton method in Equation (21) to optimize problem Equation (15) and can eventually obtain the optimal solution z opt . In addition, the appropriate regularization factor γ can improve the condition number of the inverse matrix in Equation (22), which guarantees the robustness of the NLS optimization.

Practical Discussions on the Proposed Approaches
As mentioned in Section 3, we can get a closed-form solution for the problem of nonlinear coefficient estimation through the LMM approach. Alternatively, an iterative NLS approach can also be applicable to extract PA parameters from observations. However, in practice, there are two main factors that affect the accuracy of our proposed algorithms. The first factor is the case where the order of PA model is mismatched between the transmitting and receiving ends. The second one is the case where the matrix D s and the Jacobian matrix J(z) are rank-deficient. In the next Sections 4.1 and 4.2, we first theoretically analyze the impact of the mismatched model. After that, we present the rank-deficient condition of the D s and J(z) matrix in Section 4.3.

Overdetermined Order of the PA Model
In this subsection, we consider the impact of the overdetermined order of the PA model on the estimation accuracy of the nonlinear coefficients. Assume that all the PAs of an emitter actually have the same model order as P. If we obtain an overdetermined order of the PA model beforehand, e.g., P 0 with (P 0 > P), the expected observation of y vec = D where D , and the additive white noise has no effect on the unbiasedness of the estimation results. Therefore, we can conclude that, if the order of the PA model is overdetermined, the estimation of the nonlinear coefficients obtained by the LMM approach is still unbiased.

Underdetermined Order of the PA Model
In the sequel, we attempt to analyze the case that the order of the PA model is underdetermined, i.e., P 0 is lower than the actual P. It is easy to understand that D (1,..
, so that the estimation of h α can be written as: Attention must be paid to non-zero h in Equation (24) due to the non-zero nonlinear PA coefficients through the (P 0 + 2)th to the Pth order. The resultingĥ α is a biased estimation w.r.t.
where c∈ C m×n is a constant matrix with m = P 0 +1 2 and n = P−P 0 2 .
Then, the estimate bias of the nonlinear PA coefficients can be given as the following Proposition 1.
Therefore, we can conclude that, when the order of the PA model is underdetermined, the bias term of each nonlinear coefficient is related to training sequence and the higher-order nonlinear coefficients. In addition, the high-order nonlinear coefficients (i.e., α P 0 +2,j to α P,j ) are still unknown.

Rank Deficiency Condition of the Proposed Approaches
As mentioned earlier, it is impossible to determine a unique solution for the LLS problem in Equation (12) when the matrix D s is rank-deficient. In addition, for the NLS problem in Equation (15), it is also hard for the regularized Gauss-Newton method in Equation (21) to find the global optimal solution when the Jacobian matrix J(z) is rank-deficient at every z. However, since QAM is employed by all emitters, the training sequence has only a limited level of amplitude. We find that the rank attributes of D s and J(z) are both associated with the maximum order of the PA model. The specific relationship can be revealed by Proposition 2 as follows.

Proposition 2.
Given QAM symbols with M modulus values as the training sequence and the maximum P order of the PA model, the matrix D s is rank-deficient and the Jacobian matrix J(z) is also rank-deficient at every z, if M < (P + 1)/2.

Proof of Proposition 2. See Appendix C.
According to the Proposition 2, the D s and J(z) are full rank as long as the amplitude type of signal is sufficient. Therefore, our proposed approach can be applied in some single-carrier communication systems with higher order QAM modulation such as 16-QAM, 64-QAM and 256-QAM, which have 3, 9, and 32 different modulus values, respectively. Actually, it appears that our proposed approaches are readily suitable for other popular wireless communication systems, such as MIMO-OFDM systems, where there will be no rank-deficient problem of D s and J(z) in nature since the amplitude of the transmitted signal is generally continuous. Moreover, it is not necessary to estimate the channel order in a MIMO-OFDM system due to its ability to resist multipath effects, which will make the proposed approach more practical.

Error Rate Analysis For Classification
In this paper, we apply a minimum error probability (MEP) criterion [30] based on Bayesian theory to classify different emitters, and the RFF feature of each emitter is composed of the estimated nonlinear coefficients and can be expressed as follows: with where p = 3, 5, . . . , P.
Here, we take the case where there are two emitters as an example to give the derivation of the decision criteria, and the binary hypothesis test model can be considered as where C i denotes the category i, i = 1, 2; v α is composed of the residual additive Gaussian noise in the estimations of the nonlinear coefficients and we assume that each element of v α obeys a Gaussian distribution with a zero mean and a variance of δ 2 ; m 1 ∈ C J(P−1) and m 2 ∈ C J(P−1) are respectively the mean vectors of the estimated feature vector a for the two emitters, which can be obtained from the samples collected offline. Thus, the decision rule can be derived based on MEP criterion as follows: where the test statistic is a ts = (m 2 − m 1 ) T a and the decision threshold is thr = (m 2 For simplicity, if we assume that the variables in a are independent of each other, then the test statistic a ts ∼ N m ts , δ 2 ts , for the i-th emitter, the mean is as m ∑ k=1 δ 2 β 2 k with β k being the k-th element of (m 2 − m 1 ). As a result, with the assumption of equally probable hypotheses, the average misclassification rate based on MEP criterion can be derived asP where . When x increases, Φ(−x) decreases. As for the case where the estimations of the nonlinear coefficients are unbiased, the mean values of the RFF feature for the two emitters are readily as where α (i) p,j is the nonlinear coefficient of the i-th emitter. However, for the case of biased estimations, being the mean of the biases with i = 1, 2 in the estimation of RF fingerprint features. Therefore, according to Equation (31), the average misclassification rate for this case can be obtained by: where ∆β k is the k-th element of (∆m 2 − ∆m 1 ) and Therefore, we can conclude as follows: (1) TheP e decreases as the variance of the additive noise decrease.
(2) TheP e decreases as the difference between the mean values of the RF fingerprint feature for the two emitters increases, which indicates that the PA parameters of each emitter should be designed to be as different as possible in a bid to achieve better performance. (3) According to Equation (31), as for a fixed P, more nonlinear coefficients are used as features will make the (m 2 − m 1 ) T (m 2 − m 1 ) larger, which obviously leads to better classification performance. (4) According to Equation (36), when there are biases in the estimations, the additional terms θ 2 and θ 3 may causeP e to decrease compared to the case where there is no bias.
More generally, as for K (K > 2) emitters, with the assumption of equally probable hypotheses, the average misclassification rate based on MEP criterion can be represented as whereP c is the average correct classification rate; D k (k = 1, 2, . . . , K) indicates the discriminant domain of the kth class and it shows as Equation (39) Therefore, the P r (D k |C k ) can be obtained by solving the multiple integrals in Equation (40) is a standard deviation similar to the δ ts in Equation (31).

Simulation Setting
In the following simulations, if not specified, we use a 2 × 2 MIMO channel with Rayleigh multipath fading model, and the number of paths L is set to 2. The training sequences are QAM symbols with N = 35 for proper complexity and performance trade-off. The regularization factor γ is set to 0.05 for the trade-off between the variance and bias in the estimation of NLS approach. The complex nonlinear coefficients lead to AM-AM and AM-PM distortion, where the AM-AM distortion is mainly caused by the IP 3 and IP 5 , and higher order intermodulation distortion is usually ignored [28,31]. Generally, the actual communication system has specific requirements for the out-of-band spectral emission level of RF signals. Therefore, for the rationality of PA parameters in reality, we select the PA parameters that AM-AM characteristic obeys the method in [28,31] and the AM-PM is obtained by slightly adjusting the phase of the parameter in [20]. The nonlinear PA coefficients are displayed in Table 1, and it has been verified that the out-of-band emission level of the amplified signals is about 40 dBr, which meets the requirement of general agreement. In all experiments, we use the Normalized Mean Squared Error (NMSE) to evaluate the estimation accuracy of the nonlinear coefficients, i.e., where α true = [α 3,1 , α 3,2 , α 5,1 , α 5,2 , . . . , α P,1 , α P,2 ] T , α est is the estimation of the α true , and E{·} denotes the Mathematical Expectation.

The Impact of SNR
Note that Liu's algorithm in [19] assumes all PAs in one emitter are the same. To facilitate the comparison of the proposed approaches with Liu's algorithm, we extend Liu's algorithm to be suitable for our MIMO scenario. For brevity, we use the name Modified Liu Algorithm (MLA) to represent the modified version in the following. Here, we set P to 5, and the training sequence is 16-QAM symbols. At first, we give comparisons among the MLA, LMM, and NLS in Figures 1 and 2, where the third-order and fifth-order coefficients are combined as a classification feature, from which one can see that the performance of the LMM and NLS are apparently better than that of MLA, especially at low signal to noise ratio (SNR) regime. One possible reason lies in that when P equals 5 and the training sequence is composed of 16-QAM symbols, the D s and J(z) are both full column rank, whereas the matrix populated by the training sequence for the initial estimation of MLA method is rank-deficient regardless of how long the training sequence is. In addition, we notice that the NLS approach also performs better than the LMM one, which is because the introduction of regularization during the process of optimization. Furthermore, in order to explore the impact of iterations on performance of NLS approach, when SNR is 25 dB, we give some results in Figures 3 and 4 to demonstrate the convergence speed of NLS approach, where 19 iterations are needed to obtain the optimal solution.    In Figure 5, we fit a PA with P = 5, when third-order and fifth-order coefficients are combined as a classification feature, the classification performance is better than when only the third-order coefficient is used. This confirms the analysis that, for a fixed P, features with more nonlinear coefficients can effectively improve classification performance in Section 5. In the next simulations, if not specified, we use the estimated third-order and fifth-order coefficients together as classification features. Moreover, in Figures 6 and 7, we present results of the proposed approaches under a 4 × 4 MIMO scenario; as you can see, more RF channels bring more nonlinear coefficients, which is beneficial to improve classification performance. Here, for convenience, we take the nonlinear coefficients of Emitter2 and Emitter4 in Table 1 as those of the first 4-antenna emitter, and take the nonlinear coefficients of Emitter3 and Emitter6 in Table 1 as those of the second 4-antenna emitter.

Identification for Multiple Emitters
In this simulation, we test the performance of proposed methods under the multi-user case. Here, we set the maximum order of PAs in individual emitters to 5, and 16-QAM symbols are used as the training sequence. The detailed nonlinear coefficients of each emitter are displayed in Table 1. The results are shown in Figures 8 and 9, where the SNR is set to 25 dB, and each method is performed by 1000 Monte Carlo simulations.
As we can see from the Figure 8 and Figure 9, our proposed approaches are also robust and applicable to the multi-user case, and average misclassification rates of them are both increased as the number of emitters increased. Obviously, the greater the difference in nonlinear characteristics between emitters, the higher the resolution of the proposed approaches. In this simulation, as far as the nonlinear coefficients in Table 1 are concerned, the proposed approaches can resolve up to seven emitters at most. The number of emitters

The Confirmation of Practical Discussions
Hereafter, we set the number of emitters to 2 (i.e., emitter1 and emitter2), the maximum order of the PA model P to 5. The first to fifth order parameters are selected from Table 1. When we use a PA model with P 0 = 3 to fit the nonlinearity of the emitters, the results of Figure 10 confirm the conclusion in the Section 4.2 that the underdetermined order of the PA model indeed leads to the biases in the estimations of PA parameters. However, the results in Figure 11 show that the classification performance can be improved by using the biased estimations as features, which reveals that the biases in Equation (36) can make the difference between the classification features of emitters larger. Where the training sequence is 16-QAM symbols, the estimated third-order are used as classification features, and the legends "Well-estimated LMM(NLS)" in Figures 10 and 11 denote that results obtained from a well-determined order of PA model with P 0 = P = 3.  Then, according to the Proposition 2, when we use a PA model with P 0 = P = 5 to fit the nonlinearity of emitters, if 4-QAM, which is a constant modulus modulation, is adopted by each emitter, then it is clear that the D s and J(z) are both rank-deficient, and there is no unique solution in this case. Here, we compare the estimation accuracy of when taking 4-QAM and 16-QAM symbols as training sequence; the results are shown in Figure 12, which obviously corroborates the correctness of Proposition 2.

Experimental Results
In this subsection, we design a preliminary verification experiment to explore the effectiveness of our proposed approaches in reality. Due to the limitation of single-channel acquisition, we limit the proposed approaches in the SISO-OFDM system and validate them in a 802.11.g-based wireless local area network (WLAN) that is very common in real life. Therefore, we build an experiment platform, which is shown in Figure 13, to collect the measured router data. In this platform, we first use a LeCroy WaveMaster 813Zi-A Oscilloscope (Chestnut Ridge, NY, USA) equipped with a single antenna to acquire the RF signals from three TL-WR740 routers communicating with a smart phone on the 2.412 GHz channel, respectively. In this experiment, we use the extended NLS approach as an example to estimate nonlinear coefficients of each router, here we use the "wlan toolbox" in MATLAB R2019a (MathWorks, Natick, Massachusetts, USA) to perform pre-processing such as timing, synchronization, and de-frequency offset on the acquired RF signals. Since the nonlinearity of PA mainly caused by IP 3 , we set P = 3. Finally, a MEP-based classifier is used to identify the individual router. Note that the memoryless polynomial model may not be able to describe the nonlinearity of the PA in a broadband WLAN system, whereas Table 2 indicates that, according to the estimated PA coefficients in Figure 14, the mean is far greater than the variance for each router, thus the three routers are identifiable based on the extended NLS approach. Moreover, the average misclassification rates of the three routers are all 0. Therefore, we can conclude that the mismath of PA model does not affect classification performance. In addition, we also compare the power spectral density among measured data and simulated data for three routers in Figures 15-17, respectively, where the legend "Measured baseband OFDM symbols" denotes downconverted acquired RF signals, the legend "Amplified simulated baseband OFDM symbols" denotes simulated baseband OFDM signals amplified by the PA with measured nonlinear coefficients, and the legend "Raw simulated baseband OFDM symbols" denotes raw simulated baseband OFDM signals, all of their power being normalized. In order to explain the results in Figures 15-17, we calculate the NMSE between the PSDs of "Amplified simulated baseband OFDM symbols" and that of "Measured baseband OFDM symbols" for each router, and the NMSEs of three routers are, respectively, −7.6472, −7.5301 and −7.3589dB, which reveal that the PSD of the signal reconstructed by using the estimated PA coefficient can well fit that of the measured signal.  Power/Frequency (dB/(rad/sample)) Raw simulated baseband OFDM symbols Amplified simulated baseband OFDM symbols Measured baseband OFDM symbols Figure 15. Comparison of power spectral density between measured data and simulated data for router 1.    Figure 14 for the three routers.

Conclusions
This paper investigates the SEI scheme for multiple-antenna communication emitters, using PA nonlinearity as RFF features with the assumption that all PAs of a multiple-antenna emitter are independent from each other. Both the LMM and the NLS approaches are proposed to estimate the nonlinear coefficients in association with the memoryless polynomial PA model, where a closed-form solution can be obtained by the LMM approach, and the alternative NLS approach achieves better performance by adopting a regularized Newton-Gauss scheme. Practical discussion on the PA model mismatch is presented, and some theoretical results about the estimate bias and rank-deficient condition are provided to guide the design and implementation of the SEI over MIMO channels. In addition, an error rate analysis is also introduced for the MEP classifier. Simulation results demonstrate that the proposed approaches outperform the other existing schemes, especially in the rank-deficient case, and are effective to deal with SEI in MIMO communication systems. Moreover, the proposed approaches are verified to be effective on a 802.11.g-based experiment platform.

Acknowledgments:
The authors would like to thank Yingke Lei from the National University of Defense Technology for his valuable discussion on PA modeling introduced in this paper. The authors also appreciate anonymous reviewers for their helpful comments.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
In this appendix, we give a simplified example for the composition of the matrix in the proposed LMM approach. Here, the parameters are assigned as R = 2, J = 2, L = 2 and P = 7. Therefore, the matrix d s should be as where 0 1×8 is a 1 × 8 zero vector, d temp = [G 1 , G 2 ], and G j ∈ C N×(P+1)/2 , j = 1, 2, is as follows: which is actually the same as D s = blkdiag d (7) s , d where 10 , e 10 , e (1) 01 , e 10 , f and e (j) 10 , and f (j) 01 (j = 1, 2) are respectively expressed by: Similarly, as for the case that the order of the PA model is underdetermined, be aware that the parameters are assigned as R = 2, J = 2, L = 2 and P = 7, whereas P 0 equals 5, then D The examples for h (1,3,5) α and h (7) α are shown as 1,1 α 2,1 α where diag(·) denotes a diagonal function, I 3 is a 3 × 3 unit matrix, r = 1, 2 and l h = 0, 1.

Appendix B
In this appendix, we first give the basis of Guess 1.
Proof. At first, we discuss the case where the parameters are set to be J = 1, L = 2 and P = 7, respectively. As mentioned in Appendix A, d ( where with (·) −1 being the inverse of a matrix. Then, according to the mathematic formula for the inverse of the block matrix, we get with Therefore, if we guess that

Appendix C
In this appendix, we give the proof for Proposition 2. At first, we prove that, when the training sequence, consisting of QAM symbols, has M different modulus values and the maximum order of the PA model is P, if M < (P + 1)/2, then the matrix D s is rank deficient.
Proof. At first, we consider a matrix B∈ C l B ×c B , which is When l B = c B , B is actually a Vandermonde matrix and its determinant can be expressed as Therefore, if the level of the amplitude of b i (i = 1, . . . , l B ) is less than c B , then det(B) equals 0 and B is definitely singular.
When l B > c B , if the level of the amplitude of b i (i = 1, . . . , l B ) is less than c B , then any c B × c B sub-matrix of B is a singular Vandermonde matrix, that is, any c B × c B sub-matrix of B is column linear correlation, then B is also column linear correlation, which means B is a column rank-deficient matrix. Now, we consider the matrix G j ∈ N×(P+1)/2 in Equation (A2), here j = 1, . . . , J, and it can be rewritten by where G (1) j = diag(s j (0),s j (1), . . . ,s j (N − 1)) is a reversible diagonal matrix. In addition, G (2) j is equivalent to the matrix B when we set b i = s j (i − 1) 2 , i = 1, 2, . . . , N. Since the training sequence is composed of QAM symbols that have only M different moduli values, then any P+1 2 × P+1 2 sub-matrix of G (2) j is a singular Vandermonde matrix when M < (P + 1)/2. Thus, G j is column rank-deficient, and, according to Appendix A, d temp = G 1 , G 2 , . . . , G J is also column rank-deficient. Eventually, d s and D s are both column rank-deficient due to the linear correlation among their column vectors. However, when M ≥ (P + 1)/2, as long as N is large enough, then at least one of P+1 2 × P+1 2 sub-matrix of the G (2) j is a non-singular Vandermonde matrix. Thus, the G (2) j is a column full-rank matrix, and D s is also a column full-rank matrix.
Next, we prove that when the training sequence, QAM symbols, has M different modulus values and the maximum order of the PA model is P, if M < (P + 1)/2, then the matrix J(z) is rank-deficient at every z.
Similar to the discussion of the matrix B in Equation (A23), as for the matrix Ω, when the training sequence is composed of QAM symbols that have only M different moduli values and N ≥ (P + 1)/2, if M < (P + 1)/2, then there are at least (P + 1)/2 − M + 1 identical row vectors in any P+1 2 × P+1 2 sub-matrix of Ω, which means that any P+1 2 × P+1 2 sub-matrix of Ω is a singular matrix, and the column vectors of it are also considered to be linear correlation. As a result, Ω is also column linear correlation, that is, the Ω is a column rank-deficient matrix.
Since the matrix Ω has been proved to be a column rank-deficient matrix, the matrix [O rj ] is obviously column rank-deficient, and there must be a set of coefficients η 1 , η 2 , . . . , η (P+1)/2 that are not all equal to 0. Consequently, the following formula holds where W Then, we can conclude that the matrix Θ j is rank-deficient and the matrix Θ 1 , Θ 2 , . . . , Θ J is also rank-deficient. Thus, the Jacobian matrix J r is also rank-deficient because J r is the elementary column transformation of the matrix Θ 1 , Θ 2 , . . . , Θ J and the elementary column transformation does not change the rank of the matrix. Eventually, the the Jacobian matrix J(z) in Equation (A26) is rank-deficient.