PCA-Aided Linear Precoding in Massive MIMO Systems with Imperfect CSI

In this paper, a low-complexity linear precoding algorithm, called the Principal Component Analysis Linear Precoder (PCA-LP), is proposed for massive MIMO systems. It combines the principal component analysis technique with conventional linear precoders. The proposed precoder consists of two components: the first one minimizes the interference among neighboring users and the second one improves the system performance by utilizing the Principal Component Analysis (PCA) technique. Numerical and simulation results show that the proposed precoder has remarkably lower computational complexity than the low-complexity lattice reduction-aided regularized block diagonalization using zero forcing precoding (LC-RBD-LR-ZF) and lower computational complexity than the PCA-aided Minimum Mean Square Error combination with Block Diagonalization (PCA-MMSE-BD), while its bit error rate (BER) performance is comparable to those of the LC-RBD-LR-ZF and PCA-MMSE-BD precoders.


Introduction
In order to combat the fading phenomenon in wireless communication, the Multiple-Input Multiple-Output (MIMO) technique has been proposed and already applied in 4th generation (4G) cellular networks [1]. MIMO systems can significantly improve the channel capacity, the BER performance, and the reliability of wireless systems by increasing the number of antennas at both the transmitter and receiver sides. The signal processing techniques for the uplink and downlink of both single-user MIMO, i.e., point-to-point MIMO, and multiuser MIMO (MU-MIMO) systems have been extensively researched in recent years. However, in practice, the number of antennas at the base station (BS) side in the MU-MIMO system is limited (normally fewer than 10) [2]. Therefore, the spectrum efficiency and system capacity are still relatively modest.
In order to cope with these issues, Massive MIMO systems have recently been proposed [1, 3-5]. In Massive MIMO, the number of antennas at the BS can be up to hundreds (or even thousands) to simultaneously serve dozens of users using the same frequency resource. By increasing the number of antennas at the BS side, Massive MIMO systems can significantly improve the channel capacity and enhance the spectrum utilization efficiency and the BER performance of the system [4]. However, Massive MIMO systems also face many challenges such as hardware complexity, power consumption, and system cost due to the large number of antennas deployed at the BS [1,2]. Basically, the Massive MIMO system can work in Time Division Duplex (TDD) or Frequency Division Duplex (FDD) mode. However, TDD operation is preferable to FDD operation because the TDD system can increase the number of antennas at the BS side to expand the system capacity without being affected by the coherence interval [1]. Massive MIMO is expected to be a key candidate for the next generation wireless networks (e.g., 5G networks) [1,4,6].
Undoubtedly, Massive MIMO systems become more complex as the number of antennas at the BS side gets very large. Therefore, reducing the complexities of the signal processing algorithms for both uplink and downlink in Massive MIMO systems is necessary. In Massive MIMO systems, the complex signal processing is performed at the BS side. Therefore, precoding algorithms with low complexity, such as Zero Forcing (ZF), Minimum Mean Square Error (MMSE), and Maximum Ratio Transmission (MRT), are considered suitable solutions for the downlink in the Massive MIMO system [7-9]. Besides, to eliminate interference from neighboring users and hence improve the system performance, the Block Diagonalization (BD) algorithm is adopted [10]. However, the computational complexity of the BD algorithm is very high. In this context, there exist different proposals to reduce the computational complexity based on the BD algorithm. For example, the QR decomposition based Block Diagonalization (QR-BD) algorithm and the Pseudoinverse Block Diagonalization (PINV-BD) algorithm are proposed in [11,12]. Nevertheless, the application of these algorithms to Massive MIMO systems remains a challenging task due to their excessive complexities.
In [13], Wang et al. proposed a precoder consisting of two components that utilize the LQ decomposition and Singular Value Decomposition (SVD) of the channel matrix. Building on the approach in [14], the authors of [15] proposed low-complexity Lattice Reduction-(LR-) aided precoding algorithms for the MU-MIMO system, referred to as LC-RBD-LR-ZF and LC-RBD-LR-MMSE. In the LC-RBD-LR-ZF and LC-RBD-LR-MMSE algorithms, the first precoding matrix is created by applying the QR decomposition to the extended channel matrix. The second precoding matrix is obtained using the conventional ZF and MMSE precoding algorithms in combination with the LR technique to provide the corresponding LC-RBD-LR-ZF and LC-RBD-LR-MMSE precoders. It was shown in [15] that the precoders significantly improved the system performance while reducing the computational complexity when compared to the one in [14]. In [16], the authors proposed a low-complexity linear precoding scheme for MU-MIMO systems based on the principal component analysis technique. However, the computational complexities of the precoders in [13-16] are still very high due to the QR and LQ decomposition operations. Therefore, these algorithms could hardly be applied to Massive MIMO systems. Moreover, the systems are investigated under the assumption that perfect channel state information (CSI) is available at the BS side.
Based on the principal component analysis technique and the linear precoding algorithms, this paper proposes a low-complexity precoder for Massive MIMO systems. In our proposed method, the precoding matrix is designed to consist of two components. The first one is created by the MMSE precoding algorithm to minimize the interference among neighboring users. The second one is designed based on the principal component analysis technique to improve the system performance. Numerical and simulation results show that the proposed precoder has remarkably lower computational complexity than the LC-RBD-LR-ZF in [15] and lower computational complexity than the PCA-MMSE-BD in [16], while its BER performance is comparable to those of the LC-RBD-LR-ZF and PCA-MMSE-BD precoders in both perfect and imperfect CSI scenarios. In addition, simulation results show that the channel estimation error adversely affects the system performance no matter which precoder is adopted. For all precoders, the system performance degrades as the channel estimation error increases and improves as it decreases. The rest of this paper is organized as follows. In Section 2, we present the Massive MIMO system model with imperfect CSI. The principal component analysis technique is reviewed in Section 3. In Section 4, we propose the linear precoding algorithm for Massive MIMO systems that adopts the principal component analysis technique. Simulation results are shown in Section 5. Finally, conclusions are drawn in Section 6.

Notation.
The notations are defined as follows. Matrices and vectors are represented by symbols in bold; $(\cdot)^T$ and $(\cdot)^H$ denote the transpose and conjugate transpose, respectively, $\mathbf{I}_{N_R}$ denotes the $N_R \times N_R$ identity matrix, and $\mathrm{trace}\{\cdot\}$ is the trace of a square matrix.

Downlink Channel Model in Massive MIMO System
Let us consider a TDD-based Massive MIMO system with $N_T$ antennas at the BS to simultaneously serve $K$ users as illustrated in Figure 1. Each user is equipped with $N_u$ antennas. Let $N_R$ be the total number of antennas for the $K$ users, then we have $N_R = K N_u$. Let $\mathbf{x}_u \in \mathbb{C}^{N_u \times 1}$, $u = 1, \ldots, K$, represent the transmitted signal vector for the $u$th user. The received signal at the $u$th user, $\mathbf{y}_u \in \mathbb{C}^{N_u \times 1}$, can be expressed as follows:
$$\mathbf{y}_u = \mathbf{H}_u \mathbf{W}_u \mathbf{x}_u + \mathbf{H}_u \sum_{j=1,\, j \neq u}^{K} \mathbf{W}_j \mathbf{x}_j + \mathbf{n}_u, \tag{1}$$
where $\mathbf{H}_u \in \mathbb{C}^{N_u \times N_T}$, $\mathbf{W}_u \in \mathbb{C}^{N_T \times N_u}$, and $\mathbf{n}_u \in \mathbb{C}^{N_u \times 1}$ are the channel matrix from the BS to the $u$th user, the precoding matrix, and the noise vector for the $u$th user, respectively. The entries of the channel matrix are assumed to be independent and identically distributed (i.i.d.) random variables with zero mean and unit variance.
Let $\mathbf{y} = [\mathbf{y}_1^T\ \mathbf{y}_2^T\ \cdots\ \mathbf{y}_K^T]^T \in \mathbb{C}^{N_R \times 1}$ be the overall received signal vector for all users. Then, based on (1), $\mathbf{y}$ is given by
$$\mathbf{y} = \mathbf{H}\mathbf{W}\mathbf{x} + \mathbf{n}, \tag{2}$$
where $\mathbf{W} \in \mathbb{C}^{N_T \times N_R}$ is the precoding matrix for all users, $\mathbf{x} = [\mathbf{x}_1^T\ \mathbf{x}_2^T\ \cdots\ \mathbf{x}_K^T]^T \in \mathbb{C}^{N_R \times 1}$ is the transmitted signal vector for the $K$ users, and $\mathbf{n} \in \mathbb{C}^{N_R \times 1}$ is the noise vector at the $K$ users. The entries of $\mathbf{n}$ are assumed to be i.i.d. random variables with zero mean and variance $\sigma_n^2$. In reality, it is almost impossible for the BS to obtain the CSI perfectly due to the effects of thermal noise and pilot contamination. In other words, the system has to operate under imperfect CSI conditions. The accuracy of the CSI available at the BS side depends on the channel estimator used. The imperfect channel matrix obtained by MMSE channel estimation can be modeled as follows [17,18]:
$$\hat{\mathbf{H}} = \sqrt{1 - \phi^2}\,\mathbf{H} + \phi\,\mathbf{E}_{err}, \tag{3}$$
where $\mathbf{H} = [\mathbf{H}_1^T\ \mathbf{H}_2^T\ \cdots\ \mathbf{H}_K^T]^T \in \mathbb{C}^{N_R \times N_T}$ is the Rayleigh fading channel from the BS to all users, in which the entries $h_{ij}$ are assumed to be complex Gaussian random variables with zero mean and unit variance. $\mathbf{E}_{err} \in \mathbb{C}^{N_R \times N_T}$ is the channel estimation error matrix, whose entries are assumed to be normalized i.i.d. zero-mean complex Gaussian random variables. $\phi \in [0, 1]$ is a parameter that indicates the accuracy of the channel estimator. From (3), we can see that $\phi = 0$ means that there is no channel estimation error and the CSI at the BS is perfect. Conversely, $\phi = 1$ indicates a complete failure of the channel estimator.
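As a minimal sketch of the estimation model in (3), the following Python snippet draws an i.i.d. complex Gaussian channel and perturbs it with a normalized error matrix. The helper names `complex_gaussian_matrix` and `imperfect_channel`, the toy matrix sizes, and the fixed seed are illustrative assumptions and are not taken from the original simulation setup.

```python
import math
import random

def complex_gaussian_matrix(rows, cols, rng):
    """Matrix (list of lists) of i.i.d. CN(0, 1) entries: zero mean, unit variance."""
    s = math.sqrt(0.5)  # real and imaginary parts each have variance 1/2
    return [[complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(cols)]
            for _ in range(rows)]

def imperfect_channel(H, phi, rng):
    """Return H_hat = sqrt(1 - phi^2) * H + phi * E_err, with 0 <= phi <= 1."""
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1]")
    E_err = complex_gaussian_matrix(len(H), len(H[0]), rng)  # estimation error
    a = math.sqrt(1.0 - phi ** 2)
    return [[a * h + phi * e for h, e in zip(h_row, e_row)]
            for h_row, e_row in zip(H, E_err)]

rng = random.Random(0)                       # fixed seed (assumption)
H = complex_gaussian_matrix(4, 8, rng)       # toy sizes: N_R = 4, N_T = 8
H_perfect = imperfect_channel(H, 0.0, rng)   # phi = 0: estimate equals H
H_half = imperfect_channel(H, 0.5, rng)      # phi = 0.5: partially corrupted
```

Because the entries of the channel and of the error matrix are independent with unit variance, the entries of the estimate keep unit variance for every $\phi$, since $(1 - \phi^2) + \phi^2 = 1$.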

Review of the Principal Component Analysis Technique
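The Principal Component Analysis technique was presented in [16, 19-21]. It is a mathematical tool that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables, which are called principal components. Let $\mathbf{U} \in \mathbb{C}^{M \times N}$ be the original data set and $\mathbf{Y} \in \mathbb{C}^{M \times N}$ be the re-representation of that data set. Based on the eigenvalue decomposition of $\mathbf{U}\mathbf{U}^T$ in linear algebra, the relationship between $\mathbf{Y}$ and $\mathbf{U}$ is given by [16,19]
$$\mathbf{Y} = \mathbf{B}\mathbf{U}. \tag{4}$$
Herein, $\mathbf{B} \in \mathbb{C}^{M \times M}$ denotes the principal components of $\mathbf{U}$. Then, (4) can be represented as follows:
$$\mathbf{Y} = \begin{bmatrix} \mathbf{b}_1 \\ \vdots \\ \mathbf{b}_M \end{bmatrix} \begin{bmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_N \end{bmatrix} = \begin{bmatrix} \mathbf{b}_1\mathbf{u}_1 & \cdots & \mathbf{b}_1\mathbf{u}_N \\ \vdots & \ddots & \vdots \\ \mathbf{b}_M\mathbf{u}_1 & \cdots & \mathbf{b}_M\mathbf{u}_N \end{bmatrix}, \tag{5}$$
where $\mathbf{b}_i$ is the $i$th row of $\mathbf{B}$ and $\mathbf{u}_j$ is the $j$th column of $\mathbf{U}$. The rows of $\mathbf{B}$ are the basis vectors, corresponding to the eigenvectors of the covariance matrix $\mathbf{U}\mathbf{U}^T$, that are used to represent the columns of $\mathbf{U}$. The matrix $\mathbf{B}$ is defined in such a way that the first principal component $\mathbf{b}_1$ has the largest variance, and each succeeding component in turn has the highest variance under the constraint that it is orthogonal to the previous components. According to [16,19,20], the PCA technique based on the eigenvalue decomposition of $\mathbf{U}\mathbf{U}^T$ is summarized as follows.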
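First, each row of $\mathbf{U}$ is normalized to have zero mean as follows:
$$\mathbf{U} \leftarrow \mathbf{U} - \mathbf{U}_{mean}, \tag{6}$$
where $\mathbf{U}_{mean} \in \mathbb{C}^{M \times N}$ is the matrix whose entries in each row are equal to the mean of the corresponding row of $\mathbf{U}$.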
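Next, consider the set of eigenvectors associated with the eigenvalues $\mu_k$ of the symmetric matrix $\mathbf{U}\mathbf{U}^T$, which satisfy the eigenvalue relation (7), where the eigenvalues are collected as $(\mu_1, \mu_2, \ldots, \mu_M) \in \mathbb{R}^{M \times 1}$. Applying the QR decomposition to $\mathbf{U}$, we obtain
$$\mathbf{U} = \mathbf{Q}\mathbf{R}, \tag{8}$$
where $\mathbf{R} \in \mathbb{C}^{M \times N}$ is an upper triangular matrix and $\mathbf{Q} \in \mathbb{C}^{M \times M}$ is a unitary matrix. Therefore, taking the transpose in $\mathbf{U}\mathbf{U}^T$ as the conjugate transpose for complex-valued data, we can write
$$\mathbf{U}\mathbf{U}^H = \mathbf{Q}\mathbf{R}\mathbf{R}^H\mathbf{Q}^H. \tag{9}$$
By using the SVD, the matrix $\mathbf{R}^H$ can be factored into a matrix of left singular vectors, a diagonal matrix $\mathbf{\Sigma}$ of singular values, and the conjugate transpose of a unitary matrix $\mathbf{V}$ of right singular vectors (10), so that $\mathbf{R}\mathbf{R}^H = \mathbf{V}\mathbf{\Sigma}^2\mathbf{V}^H$. From (9) and (10), it follows that
$$\mathbf{U}\mathbf{U}^H = (\mathbf{Q}\mathbf{V})\,\mathbf{\Sigma}^2\,(\mathbf{Q}\mathbf{V})^H. \tag{11}$$
Or equivalently, we have
$$(\mathbf{U}\mathbf{U}^H)(\mathbf{Q}\mathbf{V}) = (\mathbf{Q}\mathbf{V})\,\mathbf{\Sigma}^2. \tag{12}$$
From (7) and (12), we can see that the eigenvectors and eigenvalues of the covariance matrix are contained in the matrices $\mathbf{Q}\mathbf{V}$ and $\mathbf{\Sigma}^2$, respectively. Therefore, the principal component matrix $\mathbf{B}$, whose rows are these eigenvectors, is given by
$$\mathbf{B} = (\mathbf{Q}\mathbf{V})^H. \tag{13}$$
[Figure 1: BS with $N_T$ antennas; users with $N_u$ antennas each.]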
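The QR-plus-SVD route to the principal components can be checked numerically. The sketch below uses NumPy and assumes that the principal component matrix is the conjugate transpose of $\mathbf{Q}\mathbf{V}$, so that the transformed data have a diagonal covariance; the function name `pca_via_qr_svd`, the toy data, and the seed are hypothetical choices made only for illustration.

```python
import numpy as np

def pca_via_qr_svd(U):
    """PCA of the rows of U via QR of the centered data and SVD of R^H.

    Returns (B, s, U0): principal component matrix, singular values, and the
    zero-mean data. Assumes M <= N so that B is M x M.
    """
    U0 = U - U.mean(axis=1, keepdims=True)       # zero-mean rows
    Q, R = np.linalg.qr(U0, mode="complete")     # Q: M x M unitary, R: M x N
    # R^H = W diag(s) V^H, hence R R^H = V diag(s^2) V^H
    _, s, Vh = np.linalg.svd(R.conj().T, full_matrices=False)
    V = Vh.conj().T
    B = (Q @ V).conj().T                         # rows: eigenvectors of U0 U0^H
    return B, s, U0

rng = np.random.default_rng(0)                   # fixed seed (assumption)
M, N = 3, 50
mix = rng.standard_normal((M, M))                # introduces row correlation
U = mix @ (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))

B, s, U0 = pca_via_qr_svd(U)
Y = B @ U0                                       # re-represented data
cov_Y = Y @ Y.conj().T                           # should equal diag(s**2)
```

NumPy returns the singular values in descending order, so the first row of `B` corresponds to the component with the largest variance, in line with the ordering described above.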

Proposed PCA-LP Precoder.
In this section, we construct a linear precoding algorithm based on the principal component analysis technique, called the PCA-LP precoder. In our proposal, the precoding matrix is given by
$$\mathbf{W} = \beta\,\mathbf{W}_a\mathbf{W}_b, \tag{14}$$
where $\mathbf{W}_a \in \mathbb{C}^{N_T \times N_R}$, $\mathbf{W}_b \in \mathbb{C}^{N_R \times N_R}$, and $\beta$ is the normalized power factor, which is given by (15). The first precoding matrix $\mathbf{W}_a$ is obtained using the conventional MMSE technique as in (16), where $\sigma^2 = \sigma_n^2 / E_s$, $E_s$ is the transmit symbol energy, and $\mathbf{W}_a^k \in \mathbb{C}^{N_T \times N_u}$ $(k = 1, 2, \ldots, K)$ is the precoding matrix for the $k$th user. The second precoding matrix $\mathbf{W}_b$ is constructed based on the PCA transformation as follows.
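First, using $\mathbf{W}_a^k$ in (16), the effective channel matrix for the $k$th user is computed to be
$$\mathbf{H}_{eff}^k = \mathbf{H}_k\mathbf{W}_a^k, \tag{17}$$
where $\mathbf{H}_k \in \mathbb{C}^{N_u \times N_T}$ is the channel matrix from the BS to the $k$th user. Next, the channel matrix $\mathbf{H}_{eff}^k$ is normalized to give $\mathbf{H}_{nor}^k \in \mathbb{C}^{N_u \times N_u}$ with zero mean as follows:
$$\mathbf{H}_{nor}^k = \mathbf{H}_{eff}^k - \mathbf{H}_{mean}^k, \tag{18}$$
where $\mathbf{H}_{mean}^k \in \mathbb{C}^{N_u \times N_u}$ denotes the mean matrix, whose entries are the means of the rows of the matrix $\mathbf{H}_{eff}^k$. Applying the QR decomposition to $\mathbf{H}_{nor}^k$, we obtain
$$\mathbf{H}_{nor}^k = \mathbf{Q}_{nor}^k\mathbf{R}_{nor}^k, \tag{19}$$
where $\mathbf{R}_{nor}^k \in \mathbb{C}^{N_u \times N_u}$ is an upper triangular matrix and $\mathbf{Q}_{nor}^k \in \mathbb{C}^{N_u \times N_u}$ is a unitary matrix. After that, the SVD operation is applied to $(\mathbf{R}_{nor}^k)^H$ to give
$$(\mathbf{R}_{nor}^k)^H = \mathbf{U}_{nor}^k\mathbf{\Sigma}_{nor}^k(\mathbf{V}_{nor}^k)^H, \tag{20}$$
where $\mathbf{U}_{nor}^k \in \mathbb{C}^{N_u \times N_u}$ and $\mathbf{V}_{nor}^k \in \mathbb{C}^{N_u \times N_u}$ are unitary matrices and $\mathbf{\Sigma}_{nor}^k \in \mathbb{R}^{N_u \times N_u}$ is a diagonal matrix. The principal component factor matrix $\mathbf{A}_{PCA}^k \in \mathbb{C}^{N_u \times N_u}$ for the $k$th user is obtained from the product $\mathbf{Q}_{nor}^k\mathbf{V}_{nor}^k$ in the same way as the principal component matrix in (13), which gives (21). Using $\mathbf{A}_{PCA}^k$ and $\mathbf{H}_{eff}^k$, the combination channel matrix $\mathbf{H}_{com}^k \in \mathbb{C}^{N_u \times N_u}$ for the $k$th user is computed as follows:
$$\mathbf{H}_{com}^k = \mathbf{A}_{PCA}^k\mathbf{H}_{eff}^k. \tag{22}$$
From (22), the precoding matrix $\mathbf{W}_b^k \in \mathbb{C}^{N_u \times N_u}$ for the $k$th user is obtained by applying the conventional ZF algorithm to $\mathbf{H}_{com}^k$, which gives (23). Finally, the second precoding matrix $\mathbf{W}_b$ and the principal component factor matrix $\mathbf{A}_{PCA} \in \mathbb{C}^{N_R \times N_R}$ for all users are block diagonal and are represented as follows:
$$\mathbf{W}_b = \mathrm{blkdiag}\left(\mathbf{W}_b^1, \mathbf{W}_b^2, \ldots, \mathbf{W}_b^K\right), \tag{24}$$
$$\mathbf{A}_{PCA} = \mathrm{blkdiag}\left(\mathbf{A}_{PCA}^1, \mathbf{A}_{PCA}^2, \ldots, \mathbf{A}_{PCA}^K\right). \tag{25}$$
The proposed PCA-LP algorithm is summarized in Algorithm 1.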
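The per-user construction described above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the authors' reference implementation: the first stage is taken to be the textbook MMSE form $\hat{\mathbf{H}}^H(\hat{\mathbf{H}}\hat{\mathbf{H}}^H + \sigma^2\mathbf{I})^{-1}$, the principal component factor is taken as the conjugate transpose of $\mathbf{Q}_{nor}^k\mathbf{V}_{nor}^k$, the ZF step is taken as a pseudo-inverse, and the power factor is assumed to normalize the total precoder power to $N_R$. The function name `pca_lp_precoder` and the toy dimensions are hypothetical.

```python
import numpy as np

def pca_lp_precoder(H_hat, K, N_u, sigma2):
    """Two-stage PCA-LP sketch for K users with N_u antennas each.

    H_hat: (K*N_u) x N_T channel available at the BS. Returns (W, A_pca, beta).
    """
    N_R, N_T = H_hat.shape
    HH = H_hat.conj().T
    # Stage 1 (assumed MMSE form): W_a = H^H (H H^H + sigma2 I)^-1
    W_a = HH @ np.linalg.inv(H_hat @ HH + sigma2 * np.eye(N_R))
    W_b = np.zeros((N_R, N_R), dtype=complex)
    A_pca = np.zeros((N_R, N_R), dtype=complex)
    for k in range(K):
        idx = slice(k * N_u, (k + 1) * N_u)
        H_eff = H_hat[idx, :] @ W_a[:, idx]                # effective channel
        H_nor = H_eff - H_eff.mean(axis=1, keepdims=True)  # zero-mean rows
        Q, R = np.linalg.qr(H_nor)                         # H_nor = Q R
        _, _, Vh = np.linalg.svd(R.conj().T)               # R^H = U S V^H
        A_k = (Q @ Vh.conj().T).conj().T                   # assumed (Q V)^H
        H_com = A_k @ H_eff                                # combination channel
        W_b[idx, idx] = np.linalg.pinv(H_com)              # ZF on H_com
        A_pca[idx, idx] = A_k
    W_pca = W_a @ W_b
    # Assumed normalization: total precoder power equals N_R
    beta = np.sqrt(N_R / np.trace(W_pca @ W_pca.conj().T).real)
    return beta * W_pca, A_pca, beta

rng = np.random.default_rng(0)                   # fixed seed (assumption)
K, N_u, N_T = 2, 2, 8
N_R = K * N_u
H = (rng.standard_normal((N_R, N_T))
     + 1j * rng.standard_normal((N_R, N_T))) / np.sqrt(2)
W, A_pca, beta = pca_lp_precoder(H, K, N_u, sigma2=0.01)   # perfect CSI case
```

Under these assumptions, multiplying the $k$th user's precoded channel by its principal component factor returns the identity scaled by the power factor, which is why the factor matrix is also needed at the user side; the interference from other users is only suppressed by the first stage, not removed.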
At the user side, the received signal vector for all users is expressed in (26). Using $\mathbf{y}$ in (26), the estimated signal vector is given by (27). From (27), with $\mathbf{W}_{PCA} = \mathbf{W}_a\mathbf{W}_b$, the error covariance matrix $\mathbf{Q}$ is obtained in (28). The Empirical Cumulative Distribution Functions (ECDFs) of the maximum diagonal element of $\mathbf{Q}$ in (28) for the PCA-LP and PCA-MMSE-BD precoders at different accuracy levels of the channel estimator are illustrated in Figure 2. It can be observed from the figure that the largest elements on the diagonals of the error covariance matrices for both precoders increase as the channel estimation error increases. In addition, the PCA-MMSE-BD precoder provides slightly smaller maximum errors than the PCA-LP one in all cases. Figure 3 shows the ECDFs of the sums of diagonal elements of the error covariance matrices for the PCA-MMSE-BD precoder and the proposed PCA-LP precoder. As can be seen from the figure, the sums of the diagonal elements for both precoders exhibit the same behavior as the maximum diagonal elements. Similar to the results in Figure 2, the mean square errors for both precoders increase when the channel estimation error increases. Besides, the PCA-MMSE-BD precoder provides slightly smaller summation errors than the PCA-LP one in all cases.
Since the diagonal elements of the error covariance matrices determine the mean square errors (MSEs) between the transmitted symbols and the recovered ones, the BER performances of the PCA-MMSE-BD and the proposed PCA-LP precoders degrade as the channel estimation error increases. Fortunately, the slightly higher MSE of the proposed precoder does not incur significant BER performance degradation as compared to the PCA-MMSE-BD precoder.
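For readers who want to reproduce curves of this kind, the sketch below shows one way to turn per-trial error covariance matrices into ECDF points using only the Python standard library. The helper names `diag_stats` and `ecdf` and the toy matrices are hypothetical; the matrices themselves would come from a simulation that is not reproduced here.

```python
def diag_stats(Q):
    """Return (max diagonal element, sum of diagonal elements) of a square matrix."""
    d = [Q[i][i] for i in range(len(Q))]
    return max(d), sum(d)

def ecdf(samples):
    """Empirical CDF: sorted samples and the fraction of samples <= each value."""
    xs = sorted(samples)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

# Toy error covariance matrices from three hypothetical channel realizations
trials = [
    [[0.10, 0.01], [0.01, 0.30]],
    [[0.20, 0.00], [0.00, 0.05]],
    [[0.15, 0.02], [0.02, 0.12]],
]
max_diag = [diag_stats(Q)[0] for Q in trials]    # maximum diagonal element
sum_diag = [diag_stats(Q)[1] for Q in trials]    # trace-like total error
x_max, p_max = ecdf(max_diag)
x_sum, p_sum = ecdf(sum_diag)
```

Plotting `p_max` against `x_max` (and likewise for the sums) for each value of $\phi$ gives step curves of the kind described for Figures 2 and 3.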

Computational Complexity Analysis.
In this section, the computational complexity of the proposed PCA-LP precoder is evaluated and compared with those of the LC-RBD-LR-ZF in [15] and the PCA-MMSE-BD in [16]. The complexities are evaluated by counting the necessary floating point operations (flops). We assume that each real operation (an addition, a multiplication, or a division) is counted as a flop. Hence, a complex multiplication and a complex division cost 6 flops and 11 flops, respectively. It is worth noting that the QR decomposition of an $m \times n$ complex matrix requires $6mn^2 + 4mn - n^2 - n$ flops. According to [22], the SVD of an $m \times n$ $(m \geq n)$ complex matrix requires $4m^2n + 8mn^2 + 9n^3$ flops. Based on the abovementioned assumptions, the computational complexity of the proposed PCA-LP algorithm is computed to be
$$F = F_1 + F_2 + F_3, \tag{29}$$
where $F_1$ and $F_2$ are the numbers of flops to find the matrices $\mathbf{W}_a$ and $\mathbf{W}_b$, respectively, and $F_3$ is the number of flops for the multiplication of the two matrices $\mathbf{W}_a$ and $\mathbf{W}_b$.
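The flop counts quoted above are easy to evaluate programmatically. The following sketch encodes only the counts stated in the text; the function and constant names are illustrative assumptions.

```python
COMPLEX_MULT_FLOPS = 6    # one complex multiplication
COMPLEX_DIV_FLOPS = 11    # one complex division

def qr_flops(m, n):
    """Flops for the QR decomposition of an m x n complex matrix."""
    return 6 * m * n**2 + 4 * m * n - n**2 - n

def svd_flops(m, n):
    """Flops for the SVD of an m x n complex matrix with m >= n."""
    if m < n:
        raise ValueError("the stated count assumes m >= n")
    return 4 * m**2 * n + 8 * m * n**2 + 9 * n**3

# Per-user QR and SVD cost on an N_u x N_u matrix, here with N_u = 2
N_u = 2
per_user = qr_flops(N_u, N_u) + svd_flops(N_u, N_u)
```

Because the QR and SVD steps of the second stage act on $N_u \times N_u$ matrices, their per-user cost does not depend on $N_T$, which is consistent with the low complexity claimed for the proposed precoder.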
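Algorithm 1 (final steps):
(8) Repeat Step 3 to Step 7 until the precoding matrices $\mathbf{W}_b^k$ for all users are obtained.
(9) Create the matrices $\mathbf{W}_b$ and $\mathbf{A}_{PCA}$ by arranging $\mathbf{W}_b^k$ and $\mathbf{A}_{PCA}^k$ on the main diagonals of $\mathbf{W}_b$ and $\mathbf{A}_{PCA}$ as in (24) and (25), respectively.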
Wireless Communications and Mobile Computing
The number of flops $F_1$ to find the matrix $\mathbf{W}_a$ is determined by the MMSE computation in (16). The number of flops $F_2$ to find $\mathbf{W}_b$ accumulates the following per-user items over the $K$ users: $F_4$ is the number of flops for the multiplication of the two matrices $\mathbf{H}_k$ and $\mathbf{W}_a^k$; $F_5$ is the number of flops for the QR decomposition of $\mathbf{H}_{nor}^k$; $F_6$ is the number of flops for the SVD of $(\mathbf{R}_{nor}^k)^H$; $F_7$ is the number of flops for the multiplication of the two matrices $\mathbf{Q}_{nor}^k$ and $\mathbf{V}_{nor}^k$; $F_8$ is the number of flops for the multiplication of the two matrices $\mathbf{A}_{PCA}^k$ and $\mathbf{H}_{eff}^k$; finally, $F_9$ is the number of flops to find the matrix $\mathbf{W}_b^k$. Together with $F_3$, these items determine the number of flops of the proposed PCA-LP precoder. Using similar complexity analysis steps, we are able to obtain the complexities of both the LC-RBD-LR-ZF and the PCA-MMSE-BD precoders, which are summarized in Table 1.

Simulation Results
In this section, we compare both the computational complexity and the system performance of the proposed PCA-LP precoder with those of its LC-RBD-LR-ZF and PCA-MMSE-BD counterparts. Figure 4 demonstrates the computational complexities of the PCA-LP, LC-RBD-LR-ZF, and PCA-MMSE-BD precoders. In this scenario, $N_T = N_R$ is varied from 40 to 100 antennas, $N_u = 2$, and $K = N_R/2$. Numerical results show that the computational complexity of the proposed PCA-LP precoder is significantly lower than the complexity of the LC-RBD-LR-ZF precoder and lower than that of the PCA-MMSE-BD precoder. For example, at $N_R = N_T = 80$ antennas, the complexity of the proposed PCA-LP precoder is approximately equal to 2.58% and 84.07% of the complexities of the LC-RBD-LR-ZF and PCA-MMSE-BD precoders, respectively. The complexity of the LC-RBD-LR-ZF precoder is very large for two reasons. Firstly, the number of QR operations applied to the extended channel matrix is too large. Secondly, the sizes of the precoding matrices $\mathbf{P}_a \in \mathbb{C}^{N_T \times KN_T}$ and $\mathbf{P}_b \in \mathbb{C}^{KN_T \times N_R}$ increase proportionally to $N_R$ and $N_T$. Therefore, the number of flops required for the multiplication of the two matrices $\mathbf{P}_a$ and $\mathbf{P}_b$ also increases.
The BER curves of the proposed PCA-LP as well as those of the LC-RBD-LR-ZF and PCA-MMSE-BD precoders are illustrated in Figures 5-8. In Figure 5, the system is assumed to work in the perfect CSI condition (i.e., $\phi = 0$ and $\hat{\mathbf{H}} = \mathbf{H}$) with the following parameters: $N_R = N_T = 64$, $N_u = 2$, $K = N_R/2$, and 4-QAM modulation. It can be seen from Figure 5 that the BER curves of the three precoders are almost identical when the SNR is at most 27 dB. For larger SNR, the LC-RBD-LR-ZF precoder outperforms the remaining ones.
In Figure 6, we simulate the system performance in the imperfect CSI condition (i.e., $\hat{\mathbf{H}} = \sqrt{1 - \phi^2}\,\mathbf{H} + \phi\,\mathbf{E}_{err}$) for $\phi = 0.5$ and $\phi = 0.7$. Other parameters are the same as those used to generate Figure 5. Similar to the results in Figure 5, the results in Figure 6 show that the three precoders provide nearly the same system performance. When the SNR is sufficiently large, the LC-RBD-LR-ZF precoder is able to provide better BER performance. The results from Figures 4-6 show that the LC-RBD-LR-ZF precoder can marginally outperform the proposed PCA-LP and PCA-MMSE-BD precoders in the sufficiently large SNR regions. However, it suffers from noticeably higher computational complexity.
In Figure 7, the BER performances of the system with $N_T = 128$, $N_u = 2$, and $K = 64$ are illustrated for the three precoders in the perfect and imperfect CSI conditions at the BS side. Here, 4-QAM modulation is also adopted. The simulation results in Figure 7 show that the BER curves of the proposed PCA-LP precoder almost coincide with those of the remaining precoders in all scenarios. Moreover, one can observe from Figures 5-7 that as $N_T$ increases from 64 to 128, the LC-RBD-LR-ZF precoder no longer outperforms the proposed PCA-LP one in the high SNR regions. This suggests that the more antennas are deployed at the BS, the better the system performance the proposed PCA-LP precoder can achieve. Figure 8 illustrates the BER curves of the three precoders as functions of $\phi$ at SNR = 24 dB and 27 dB. Other simulation parameters are kept the same as the ones used for Figure 5, i.e., $N_T = N_R = 64$, $K = 32$, $N_u = 2$, and 4-QAM modulation. We again see that for the same parameters, the three precoders provide nearly the same BERs, particularly when $\phi$ becomes larger. Furthermore, the channel estimation error has an adverse effect on the system performance no matter which precoder is employed. Larger values of $\phi$ make the system performance deteriorate more rapidly.
It is noteworthy that the simulation results in Figures 5-8 are obtained for the worst case $N_T = K \times N_u$, i.e., when the systems work in full-load conditions. As the number of users gets smaller, the system performance is expected to improve.

Conclusions
This paper proposes the low-complexity PCA-LP precoder for Massive MIMO systems with imperfect CSI at the BS side. The proposed precoder consists of two components. The first one is designed to suppress interference from neighboring users and the second one is created based on the Principal Component Analysis technique and the linear precoding algorithm to improve the system performance. Numerical and simulation results show that the computational complexity of the proposed PCA-LP precoder is significantly lower than the complexity of the LC-RBD-LR-ZF precoder and lower than that of the PCA-MMSE-BD precoder, while they all provide nearly the same system performance in all simulation scenarios. Simulation results also show that the channel estimation error has an adverse effect on the system performance no matter which precoder is adopted. For all precoders, the system performance degrades as the channel estimation error increases and improves as it decreases. Moreover, the larger the channel estimation error is, the more rapidly the system performance deteriorates. Taking both system performance and complexity into consideration, the proposed PCA-LP precoder can be a potential digital beamforming technique for the downlinks of Massive MIMO systems.

Data Availability
All results included in this paper are generated by simulation in Matlab.

Conflicts of Interest
The authors declare that they have no conflicts of interest.