Simplified QR-decomposition-based and lattice reduction-assisted multi-user multiple-input multiple-output precoding scheme

Future wireless communication systems require more and more antennas at the transceiver to improve the achievable rates. The multi-user multiple-input multiple-output (MU-MIMO) technique is regarded as a promising way to serve a large number of users simultaneously and thereby further increase the achievable rates of MIMO systems. When the number of antennas at the transceiver is large, the computational complexity of precoding becomes the bottleneck and a major challenge in MU-MIMO systems. In this paper, a simplified QR-decomposition and lattice reduction-assisted MU-MIMO precoding scheme, named S-QR-LR, is proposed as a low-complexity MU-MIMO transmission scheme. The simplified QR-decomposition method is carefully designed and applied twice in the proposed precoding scheme, both to achieve good performance and to reduce the computational complexity significantly. The proposed S-QR-LR scheme first uses the simplified QR-decomposition to balance the multi-user interference and the noise. It then applies the simplified QR-decomposition again, assisted by lattice reduction, to obtain precoding gain and further improve the performance at low computational complexity. Analytical and simulation results show that the proposed S-QR-LR precoding scheme achieves the best performance among the existing precoding schemes considered while requiring the lowest computational complexity.


Introduction
The multi-user multiple-input multiple-output (MU-MIMO) technique, a promising method to improve data rates and achieve high capacity, has been extensively studied in recent years [1,2]. The vertical Bell Laboratories layered space-time (VBLAST) MIMO technique was first designed in [3] for high-data-rate transmission. The VBLAST architecture exploits the MIMO configuration to provide parallel layered transmission in the spatial domain and thereby improve spectral efficiency. The drawback of VBLAST is that it requires at least as many receive antennas as transmit antennas, a condition that often does not hold in downlink cellular systems. Consequently, MU-MIMO precoding [4] for downlink MIMO systems, which requires channel state information (CSI) at the transmitter, was developed to solve this problem. Spatial division multiple access-based MU-MIMO improves the achievable rate by transmitting the data of multiple users simultaneously and pre-cancelling the multi-user interference (MUI) at the transmitter using the CSI. To meet the continuously growing demand for high data rates, MU-MIMO systems [5] with more transmit antennas have attracted much attention for future wireless communication systems. LTE-Advanced [6] and IEEE 802.11ac [7] suggest configurations of up to eight transmit antennas at the base station (BS). Massive MIMO (also known as large-scale antenna systems, very large MIMO and hyper MIMO) is an emerging technology that uses a few hundred antennas to serve tens of terminals simultaneously in the same time-frequency resource [8,9]. MU-MIMO is therefore regarded as a promising technique for serving a large number of users simultaneously in future massive MIMO systems. With so many antennas at the transceiver, however, the computational complexity of precoding becomes the bottleneck and a major challenge in MU-MIMO systems.
In MU-MIMO systems, MUI is the major impairment that severely degrades performance. The objective of MU-MIMO precoding is to cancel the MUI at the BS before transmission. Zero-forcing channel-inversion (ZF-CI) precoding was first proposed in [10]; it moves the ZF equaliser from the receiver to the transmitter to pre-cancel the MUI. However, ZF-CI precoding performs poorly because the precoding vectors obtained by the ZF operation are non-unitary and therefore amplify the noise, especially when the channel is ill-conditioned. The well-known block diagonalisation (BD) precoding scheme, which can be regarded as a generalisation of ZF-CI precoding, was proposed in [11,12]. BD precoding performs two singular value decomposition (SVD) operations for each user. The first SVD, applied to the MUI channel (the stacked channel matrix of all other users), eliminates the MUI from other users completely, so that the MU-MIMO broadcast channel is decomposed into multiple parallel single-user MIMO (SU-MIMO) channels. The second SVD then parallelises each user's streams and obtains the maximum precoding gain for each sub-stream to further improve the performance. BD precoding outperforms ZF-CI precoding because its unitary precoding vectors do not amplify the noise. However, the computational cost of BD precoding is very high because it requires two SVD operations per user.
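The two-SVD structure of BD precoding can be sketched in Python with NumPy as follows. This is a minimal illustration rather than the implementation of [11,12]: the helper name `bd_precoder`, the channel dimensions and the use of `np.linalg.matrix_rank` to locate the null space are assumptions made for the example.

```python
import numpy as np

def bd_precoder(H_list):
    """Block diagonalisation (BD) precoding with two SVDs per user.

    H_list: K channel matrices H_k of shape (N_k, N_t).
    Returns K precoding matrices W_k of shape (N_t, N_k).
    """
    W_list = []
    for k, H_k in enumerate(H_list):
        # MUI channel: stack the channels of all other users
        H_tilde = np.vstack([H for j, H in enumerate(H_list) if j != k])
        rank = np.linalg.matrix_rank(H_tilde)
        # First SVD: right singular vectors beyond the rank span null(H_tilde)
        _, _, Vh = np.linalg.svd(H_tilde)
        V0 = Vh[rank:].conj().T                      # (N_t, N_t - rank)
        # Second SVD: diagonalise user k's effective channel inside that null space
        _, _, Vh_eff = np.linalg.svd(H_k @ V0)
        n_k = H_k.shape[0]
        W_list.append(V0 @ Vh_eff[:n_k].conj().T)    # (N_t, N_k)
    return W_list

# Example: K = 2 users, N_k = 2 receive antennas each, N_t = 4 transmit antennas
rng = np.random.default_rng(0)
K, N_k, N_t = 2, 2, 4
H_list = [(rng.standard_normal((N_k, N_t)) + 1j * rng.standard_normal((N_k, N_t))) / np.sqrt(2)
          for _ in range(K)]
W_list = bd_precoder(H_list)
leakage = max(np.abs(H_list[j] @ W_list[k]).max() for j in range(K) for k in range(K) if j != k)
print(f"max MUI leakage: {leakage:.1e}")
```

The leakage term $\mathbf{H}_j\mathbf{W}_k$ ($j \neq k$) is zero up to rounding error, and each $\mathbf{W}_k$ has orthonormal columns, which is the unitary property the paragraph refers to.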
Recent research on MU-MIMO [13,14] focuses on designing BD-type precoding schemes with lower computational complexity. By replacing the first SVD with a less complex operation for mitigating the MUI, a QR-decomposition-based BD (QR-BD) precoding scheme is presented in [13]. QR-BD applies a QR-decomposition to the MUI channel to obtain the null space of the MUI, so the costly first SVD of BD precoding is replaced by a cheaper QR operation. Similar to QR-BD, a generalised ZF-CI (GZI) precoding method is developed in [14], in which the MUI channel is first pseudo-inverted and then QR-decomposed to mitigate the MUI. Both QR-BD and GZI retain the second SVD of the original BD precoding. Hence, although QR-BD and GZI reduce the complexity of BD precoding, their computational complexity remains comparatively high. Moreover, the BD, QR-BD and GZI schemes attempt only to eliminate the MUI completely, without considering the impact of noise. These BD-type MU-MIMO precoding schemes therefore perform poorly at low signal-to-noise ratios (SNRs), where noise is the dominant factor limiting performance. To overcome this disadvantage, the generalised minimum mean square error (MMSE) CI precoding scheme, denoted GMI, is proposed in [14] to balance the MUI and the noise for each user effectively. GMI outperforms the BD-type MU-MIMO precoding schemes because its design trades off the MUI against the noise to maximise the signal-to-interference-plus-noise ratio.
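As a rough sketch of the QR-BD idea, the first SVD can be replaced by a full QR-decomposition of $\tilde{\mathbf{H}}_k^H$: if $\tilde{\mathbf{H}}_k^H = \mathbf{Q}[\mathbf{R};\mathbf{0}]$, the trailing columns of $\mathbf{Q}$ are orthogonal to every row of $\tilde{\mathbf{H}}_k$. The function name `qr_bd_precoder` and the assumption that each stacked MUI channel has full row rank are illustrative choices, not details taken from [13]; the second SVD is kept, as the text notes.

```python
import numpy as np

def qr_bd_precoder(H_list):
    """QR-BD sketch: a QR-decomposition replaces the first SVD of BD; the second SVD is kept.

    Assumes each stacked MUI channel H_tilde (M x N_t) has full row rank M < N_t.
    """
    W_list = []
    for k, H_k in enumerate(H_list):
        H_tilde = np.vstack([H for j, H in enumerate(H_list) if j != k])
        M = H_tilde.shape[0]
        # Full QR of H_tilde^H = Q [R; 0]: columns M: of Q span null(H_tilde)
        Q, _ = np.linalg.qr(H_tilde.conj().T, mode="complete")
        V0 = Q[:, M:]
        # Second SVD, as in BD, to parallelise user k's streams
        _, _, Vh_eff = np.linalg.svd(H_k @ V0)
        W_list.append(V0 @ Vh_eff[:H_k.shape[0]].conj().T)
    return W_list

rng = np.random.default_rng(1)
K, N_k, N_t = 2, 2, 4
H_list = [(rng.standard_normal((N_k, N_t)) + 1j * rng.standard_normal((N_k, N_t))) / np.sqrt(2)
          for _ in range(K)]
W_list = qr_bd_precoder(H_list)
```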
Recently, lattice reduction (LR)-assisted precoding techniques have been introduced in MU-MIMO systems to improve the performance and reduce the complexity of linear precoders. Applying LR to the MIMO channel matrix improves the orthogonality between its columns and reduces the length of each column. The best-known LR algorithm is the so-called LLL algorithm proposed by Lenstra et al. in [15], which operates on an extended real-valued model. Subsequently, a complex LLL (CLLL) algorithm based on Gram-Schmidt orthonormalisation was proposed in [16], and an equivalent CLLL algorithm based on QR-decomposition was proposed in [17]. Compared with the LLL algorithm, the CLLL algorithm reduces the overall complexity by nearly half without sacrificing performance. Several recent works combine the LR technique with ZF, MMSE, BD or other precoding algorithms in MU-MIMO systems. An LR-aided and simplified GMI (LR-S-GMI) precoding scheme is proposed in [18]; it further reduces the computational complexity and achieves better bit error rate (BER) performance than GMI. Like conventional GMI precoding, the LR-S-GMI scheme first adopts the MMSE inversion method to balance the MUI and the noise. It then employs LR-aided MMSE linear precoding instead of the SVD-based method of conventional GMI, achieving better performance and lower complexity. LR-S-GMI performs better than conventional GMI because its second precoding matrix is derived from the MIMO channel after LR. It also reduces the complexity of conventional GMI, because the linear MMSE precoding used by LR-S-GMI to obtain the second precoding matrix requires less computation than the SVD used by GMI.
In this paper, a simplified QR-decomposition and LR-assisted MU-MIMO precoding scheme, named S-QR-LR, is proposed as a new approach to low-complexity MU-MIMO transmission. The proposed S-QR-LR scheme combines a simplified QR-decomposition with an improved LR algorithm to obtain good performance and reduce the computational complexity dramatically. It first uses the designed simplified QR-decomposition to find precoding vectors that balance the MUI and the noise. Because these vectors are not orthonormal, the Gram-Schmidt orthogonalisation (GSO) method is applied to them to find an orthonormal basis, which serves as the first precoding matrix. The second precoding matrix is then obtained by applying the simplified QR-decomposition again, assisted by LR, to further improve the performance and reduce the computational complexity. We also propose an improved LR algorithm that directly obtains the inverse of the transformation matrix, further reducing the computational complexity of LR. Analytical and simulation results show that the proposed S-QR-LR precoding scheme achieves the best performance among the existing precoding schemes considered while requiring the lowest computational complexity.
This paper is organised as follows. Section 2 describes the MU-MIMO system model. To better illustrate our scheme, Section 3 reviews the conventional LR-S-GMI precoding scheme. Section 4 presents the proposed S-QR-LR precoding scheme and the improved LR algorithm for MU-MIMO systems. In Section 5, the computational complexities of the conventional LR-S-GMI precoding scheme and the proposed S-QR-LR precoding scheme are analysed. Performance evaluation results are presented in Section 6, and conclusions are drawn in Section 7. Notation: $(\cdot)^T$, $(\cdot)^H$, $(\cdot)^*$, $(\cdot)^{-1}$ and $(\cdot)^\dagger$ denote the transpose, Hermitian transpose, conjugate, inverse and pseudo-inverse operations, respectively. $\mathcal{CN}(0,\sigma^2)$ denotes the complex Gaussian distribution of a random variable whose real and imaginary parts are independent and distributed as $\mathcal{N}(0,\sigma^2/2)$, that is, with mean 0 and variance $\sigma^2/2$.

System model
We consider a downlink MU-MIMO system with one BS and K users, where the BS is equipped with $N_t$ transmit antennas and each user with $N_k$ receive antennas, as shown in Fig. 1. Without loss of generality, we assume that every user has the same number of receive antennas $N_k$, and that perfect instantaneous CSI of every user is available at the BS. The $N_k$ data streams $\mathbf{s}_k$ of user k are precoded by the corresponding precoding matrix $\mathbf{W}_k$. The precoded streams of all K users are then combined and transmitted from the $N_t$ antennas of the BS. The channel from the BS to user k is modelled as a flat Rayleigh fading MIMO channel $\mathbf{H}_k \in \mathbb{C}^{N_k\times N_t}$, whose element $h_{i,j}$ is the channel impulse response from the jth transmit antenna to the ith receive antenna. The elements are independent and identically distributed according to $\mathcal{CN}(0,1)$. The received signal of user k is given by
$$\mathbf{y}_k = \mathbf{H}_k\mathbf{W}_k\mathbf{s}_k + \mathbf{H}_k\sum_{j\neq k}\mathbf{W}_j\mathbf{s}_j + \mathbf{n}_k,$$
where $\mathbf{W}_k \in \mathbb{C}^{N_t\times N_k}$ is the precoding matrix and $\mathbf{s}_k$ the transmit signal vector of user k. Since $N_k$ data streams are transmitted to each user, $\mathbf{s}_k \in \mathbb{C}^{N_k\times 1}$. $\mathbf{n}_k \in \mathbb{C}^{N_k\times 1}$ is the Gaussian noise of user k, with independent and identically distributed entries of zero mean and variance $\sigma^2$. The average transmit power of each user is normalised to one. Equivalently, stacking the received signals and noise of all users into $\mathbf{y}$ and $\mathbf{n}$,
$$\mathbf{y} = \mathbf{H}_m\mathbf{W}\mathbf{s} + \mathbf{n},$$
where $\mathbf{W} = [\mathbf{W}_1,\ldots,\mathbf{W}_K] \in \mathbb{C}^{N_t\times N_r}$ and $\mathbf{s} \in \mathbb{C}^{N_r\times 1}$ are the precoding matrix and the transmit signals of the K users, respectively, and $N_r = KN_k$ since every user has the same $N_k$. For simplicity, equal power allocation is applied and the power of each data stream is normalised to one; thus $E[\mathbf{s}\mathbf{s}^H] = \mathbf{I}_{N_r}$. The MIMO channel matrix of the K users is $\mathbf{H}_m = [\mathbf{H}_1^T,\ldots,\mathbf{H}_K^T]^T \in \mathbb{C}^{N_r\times N_t}$, and the channel matrix excluding user k is
$$\tilde{\mathbf{H}}_k = [\mathbf{H}_1^T,\ldots,\mathbf{H}_{k-1}^T,\mathbf{H}_{k+1}^T,\ldots,\mathbf{H}_K^T]^T \in \mathbb{C}^{(N_r-N_k)\times N_t}.$$
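A small NumPy sketch of this signal model, assuming i.i.d. $\mathcal{CN}(0,1)$ channel entries and the standard per-user model (desired signal, MUI and noise). Random placeholder precoders and Gaussian symbols stand in for a real precoding design and constellation. The sketch checks that the per-user expression for $\mathbf{y}_k$ matches the corresponding block of the stacked model $\mathbf{y} = \mathbf{H}_m\mathbf{W}\mathbf{s} + \mathbf{n}$, and it builds $\tilde{\mathbf{H}}_k$; the helper names `cn` and `blk` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
K, N_k, N_t, sigma2 = 2, 2, 4, 0.1   # illustrative sizes and noise variance
N_r = K * N_k

def cn(shape, var=1.0):
    """CN(0, var) samples: independent real and imaginary parts, each N(0, var/2)."""
    return np.sqrt(var / 2) * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

def blk(x, k):
    """Rows belonging to user k in a stacked vector or matrix."""
    return x[k * N_k:(k + 1) * N_k]

H_list = [cn((N_k, N_t)) for _ in range(K)]     # H_k with i.i.d. CN(0, 1) entries
H_m = np.vstack(H_list)                           # (N_r, N_t)
W_list = [cn((N_t, N_k)) for _ in range(K)]      # placeholder precoders W_k
W = np.hstack(W_list)                             # (N_t, N_r)
s = cn((N_r, 1))                                  # placeholder symbols, E[s s^H] = I
n = cn((N_r, 1), sigma2)                          # noise with variance sigma2
y = H_m @ W @ s + n                               # stacked model

# Per-user model: desired signal + MUI + noise
y_users = []
for k in range(K):
    own = H_list[k] @ W_list[k] @ blk(s, k)
    mui = sum(H_list[k] @ W_list[j] @ blk(s, j) for j in range(K) if j != k)
    y_users.append(own + mui + blk(n, k))

# H~_k: all users' channels except user k, shape (N_r - N_k, N_t)
H_tilde = [np.vstack([H_list[j] for j in range(K) if j != k]) for k in range(K)]
```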

Review of the conventional LR-assisted precoding scheme
To provide a baseline for the proposed S-QR-LR precoding scheme, this section first reviews the conventional LR-assisted LR-S-GMI precoding scheme. LR-S-GMI uses an LR method to reduce the computational complexity. Like the BD-type precoding schemes, LR-S-GMI precoding is implemented in two steps.
Step 1: Obtain the first precoding matrix $\mathbf{W}_o$ to balance the MUI and the noise.
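The LR-S-GMI precoding algorithm takes the noise into account and adopts the MMSE inversion method proposed in [10] to generate the null space of $\tilde{\mathbf{H}}_k$ with the noise taken into consideration. Applying MMSE inversion to $\mathbf{H}_m$ gives
$$\mathbf{H}_m^\dagger = \mathbf{H}_m^H(\mathbf{H}_m\mathbf{H}_m^H + \alpha\mathbf{I}_{N_r})^{-1} = [\hat{\mathbf{H}}_{m_1},\ldots,\hat{\mathbf{H}}_{m_K}],$$
where $\alpha = N_r\sigma^2/P_{\text{total}}$ denotes the ratio of the noise variance to the transmit power and $\hat{\mathbf{H}}_{m_k}\in\mathbb{C}^{N_t\times N_k}$ is the block associated with user k. By the well-known relation between ZF and MMSE in filter theory, $\mathbf{H}_m\mathbf{H}_m^\dagger \simeq \mathbf{I}$ in the high-SNR regime, where $\alpha \simeq 0$, so the off-diagonal blocks $\mathbf{H}_j\hat{\mathbf{H}}_{m_k}$, $j\neq k$, approximately vanish. Thus, $\hat{\mathbf{H}}_{m_k}$ lies approximately in the null space of $\tilde{\mathbf{H}}_k$.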
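The QR-decomposition is then applied to $\hat{\mathbf{H}}_{m_k}$ for each user as $\hat{\mathbf{H}}_{m_k} = \mathbf{Q}_{m_k}\mathbf{R}_{m_k}$. Since $\tilde{\mathbf{H}}_k\hat{\mathbf{H}}_{m_k} \simeq \mathbf{0}$, we have $\tilde{\mathbf{H}}_k\mathbf{Q}_{m_k}\mathbf{R}_{m_k} \simeq \mathbf{0}$; moreover, because $\mathbf{R}_{m_k}$ is invertible, $\tilde{\mathbf{H}}_k\mathbf{Q}_{m_k} \simeq \mathbf{0}$. As in the GZI precoding scheme, $\mathbf{Q}_{m_k}$ forms the first precoding matrix of user k, $\mathbf{W}_{o_k} = \mathbf{Q}_{m_k}$, which balances the MUI and the noise. After traversing all users, the first precoding matrix for the K users is $\mathbf{W}_o = [\mathbf{W}_{o_1},\ldots,\mathbf{W}_{o_K}]$. Hence, LR-S-GMI requires one pseudo-inverse of an $N_r\times N_t$ matrix and K QR-decompositions of an $N_k\times N_t$ matrix to obtain the first precoding matrix.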
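Step 1 of LR-S-GMI (MMSE inversion followed by a per-user QR-decomposition) can be sketched as below. The function names `gmi_first_precoder` and `residual_mui` are hypothetical, and $P_{\text{total}} = N_r$ (unit power per stream) is an assumption used only to turn an SNR into α. The printed residual $\max|\tilde{\mathbf{H}}_k\mathbf{W}_{o_k}|$ illustrates that MUI nulling becomes exact only as α → 0.

```python
import numpy as np

def gmi_first_precoder(H_list, alpha):
    """First precoding matrices W_{o_k} via MMSE inversion followed by per-user QR."""
    H_m = np.vstack(H_list)                     # (N_r, N_t)
    N_r = H_m.shape[0]
    N_k = H_list[0].shape[0]
    # MMSE (regularised) inversion: H_m^H (H_m H_m^H + alpha I)^{-1}, shape (N_t, N_r)
    H_hat = H_m.conj().T @ np.linalg.inv(H_m @ H_m.conj().T + alpha * np.eye(N_r))
    W_o = []
    for k in range(len(H_list)):
        # Reduced QR of the user-k block column; Q_k has orthonormal columns
        Q_k, _ = np.linalg.qr(H_hat[:, k * N_k:(k + 1) * N_k])
        W_o.append(Q_k)
    return W_o

def residual_mui(H_list, W_o):
    """Largest |entry| of H~_k W_{o_k} over all users (0 means perfect MUI nulling)."""
    K = len(H_list)
    return max(np.abs(np.vstack([H_list[j] for j in range(K) if j != k]) @ W_o[k]).max()
               for k in range(K))

rng = np.random.default_rng(2)
K, N_k, N_t = 2, 2, 4
N_r = K * N_k
P_total = N_r                                   # assumption: unit power per stream
H_list = [(rng.standard_normal((N_k, N_t)) + 1j * rng.standard_normal((N_k, N_t))) / np.sqrt(2)
          for _ in range(K)]
for snr_db in (0, 20, 80):
    sigma2 = 10 ** (-snr_db / 10)
    alpha = N_r * sigma2 / P_total
    print(snr_db, residual_mui(H_list, gmi_first_precoder(H_list, alpha)))
```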
Step 2: Obtain the second precoding matrix $\mathbf{W}_g$ to further improve the performance.
After the first precoding operation, the effective channel matrix of user k is $\mathbf{H}_{s_k} = \mathbf{H}_k\mathbf{W}_{o_k}$. LR-aided MMSE linear precoding is then employed to obtain the second precoding matrix $\mathbf{W}_{g_k}$ of user k with low computational overhead.
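The extended effective channel matrix is $\bar{\mathbf{H}}_{s_k} = [\mathbf{H}_{s_k}, \sqrt{\alpha}\,\mathbf{I}_{N_k}]$. For the LR-aided MMSE precoder, the LR transformation is performed on the transpose $\bar{\mathbf{H}}_{s_k}^T$ of the extended channel matrix [19], yielding the reduced matrix $\tilde{\mathbf{H}}_{s_k}$ and the transformation matrix $\mathbf{T}_k$, a unimodular matrix with complex integer elements and $|\det(\mathbf{T}_k)| = 1$. The LR-aided MMSE precoding matrix of user k is then obtained as $\mathbf{W}_{g_k} = \mathbf{A}_k\tilde{\mathbf{H}}_{s_k}^\dagger$, where $\mathbf{A}_k = [\mathbf{I}_{N_k}, \mathbf{0}_{N_k\times N_k}]$ extracts the upper sub-matrix of $\tilde{\mathbf{H}}_{s_k}^\dagger$. After traversing all users, the second precoding matrix for the K users is the block-diagonal matrix $\mathbf{W}_g = \mathrm{diag}(\mathbf{W}_{g_1},\ldots,\mathbf{W}_{g_K})$. Hence, LR-S-GMI requires K pseudo-inverse operations on an $N_k\times N_t$ matrix to obtain the second precoding matrix.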
To meet the total transmit power constraint, a transmit power normalisation factor β is introduced before precoding. Through the above two steps, the overall precoding matrix is obtained as $\mathbf{W} = \mathbf{W}_o\mathbf{W}_g^{\text{norm}}$, where $\mathbf{W}_g^{\text{norm}}$ denotes the power-normalised second precoding matrix.

Proposed S-QR-LR precoding scheme
In this section, we propose a simplified QR-decomposition-based and LR-assisted MU-MIMO precoding scheme, denoted S-QR-LR. Like the BD-type precoding schemes, the proposed S-QR-LR precoding algorithm is implemented in two steps.
Step 1: Obtain the first precoding matrix $\mathbf{W}_o$ to balance the MUI and the noise.
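We extend the channel matrix $\mathbf{H}_m$ of the K users with the noise term as $\bar{\mathbf{H}} = [\mathbf{H}_m, \sqrt{\alpha}\,\mathbf{I}_{N_r}]$, where $\mathbf{I}_{N_r}$ is the $N_r\times N_r$ identity matrix. Applying the pseudo-inverse to the extended channel matrix $\bar{\mathbf{H}}$ gives (18), and the QR-decomposition (19) of $\bar{\mathbf{H}}^H$ yields $\mathbf{Q}$ and $\mathbf{R}$, where $\mathbf{Q}_{N_t}\in\mathbb{C}^{N_t\times N_r}$ consists of the first $N_t$ rows of $\mathbf{Q}$ and $\mathbf{Q}_{N_r}\in\mathbb{C}^{N_r\times N_r}$ consists of the last $N_r$ rows of $\mathbf{Q}$. The inverse matrix $\mathbf{R}^{-1}$ can then be obtained simply from $\mathbf{Q}_{N_r}$, as in (20), without an explicit matrix inversion. Therefore, (18) can be rewritten with the QR-decomposition as (21), where $\bar{\mathbf{R}}_k\in\mathbb{C}^{N_r\times N_k}$ is the sub-matrix of $(\mathbf{R}^H)^{-1}\in\mathbb{C}^{N_r\times N_r}$ associated with user k, and (20) gives $\bar{\mathbf{R}}_k$ in closed form as (22). From (8), (18) and (21), $\hat{\mathbf{H}}_{m_k} = \mathbf{Q}_{N_t}\bar{\mathbf{R}}_k$ is the precoding matrix that balances the MUI and the noise for user k. The product $\tilde{\mathbf{H}}_k\mathbf{Q}_{N_t}\bar{\mathbf{R}}_k$ converges to zero as the regularisation factor α approaches zero at high SNR. Thus, with the noise taken into account, $\mathbf{Q}_{N_t}\bar{\mathbf{R}}_k$ lies approximately in the null space of $\tilde{\mathbf{H}}_k$, where $\mathbf{Q}_{N_t}$ is obtained from (19) and $\bar{\mathbf{R}}_k$ is obtained in simplified form from (19) and (22).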
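The chain of identities behind this simplification can be written out as follows. It is a reconstruction from the definitions in the text, assuming that the QR-decomposition is applied to $\bar{\mathbf{H}}^H \in \mathbb{C}^{(N_t+N_r)\times N_r}$ (consistent with the row partition of $\mathbf{Q}$ described above) and that α > 0 so that $\mathbf{R}$ is invertible; it is not a copy of the paper's numbered equations.

```latex
\begin{aligned}
\bar{\mathbf{H}}^H &= \begin{bmatrix}\mathbf{H}_m^H\\ \sqrt{\alpha}\,\mathbf{I}_{N_r}\end{bmatrix}
  = \mathbf{Q}\mathbf{R} = \begin{bmatrix}\mathbf{Q}_{N_t}\mathbf{R}\\ \mathbf{Q}_{N_r}\mathbf{R}\end{bmatrix},
  \qquad \mathbf{Q}^H\mathbf{Q} = \mathbf{I}_{N_r},\\
\bar{\mathbf{H}}^\dagger &= \bar{\mathbf{H}}^H\big(\bar{\mathbf{H}}\bar{\mathbf{H}}^H\big)^{-1}
  = \mathbf{Q}\mathbf{R}\big(\mathbf{R}^H\mathbf{R}\big)^{-1} = \mathbf{Q}\big(\mathbf{R}^H\big)^{-1},\\
\sqrt{\alpha}\,\mathbf{I}_{N_r} &= \mathbf{Q}_{N_r}\mathbf{R}
  \;\Longrightarrow\; \mathbf{R}^{-1} = \tfrac{1}{\sqrt{\alpha}}\,\mathbf{Q}_{N_r},
  \qquad \big(\mathbf{R}^H\big)^{-1} = \tfrac{1}{\sqrt{\alpha}}\,\mathbf{Q}_{N_r}^H,\\
\mathbf{H}_m^H\big(\mathbf{H}_m\mathbf{H}_m^H + \alpha\mathbf{I}_{N_r}\big)^{-1}
  &= \mathbf{Q}_{N_t}\big(\mathbf{R}^H\big)^{-1} = \tfrac{1}{\sqrt{\alpha}}\,\mathbf{Q}_{N_t}\mathbf{Q}_{N_r}^H,\\
\hat{\mathbf{H}}_{m_k} &= \mathbf{Q}_{N_t}\bar{\mathbf{R}}_k,
  \qquad \bar{\mathbf{R}}_k = \tfrac{1}{\sqrt{\alpha}}\,\big[\mathbf{Q}_{N_r}\big]_k^H .
\end{aligned}
```

Here $[\mathbf{Q}_{N_r}]_k$ denotes rows $(k-1)N_k+1$ to $kN_k$ of $\mathbf{Q}_{N_r}$. The fourth line uses $\bar{\mathbf{H}}\bar{\mathbf{H}}^H = \mathbf{H}_m\mathbf{H}_m^H + \alpha\mathbf{I}_{N_r}$, and the third line shows why no matrix inversion is needed: $(\mathbf{R}^H)^{-1}$ is read directly off the lower block of $\mathbf{Q}$.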
Since the columns of $\mathbf{Q}_{N_t}\bar{\mathbf{R}}_k$ are not orthonormal, we find an orthonormal basis of $\mathbf{Q}_{N_t}\bar{\mathbf{R}}_k$ to serve as the precoding matrix of user k and improve the performance. Applying the GSO method to $\mathbf{Q}_{N_t}\bar{\mathbf{R}}_k$ yields this orthonormal basis, which is the first precoding matrix $\mathbf{W}_{o_k}$ of user k. After traversing all users, the first precoding matrix for the K users is $\mathbf{W}_o = [\mathbf{W}_{o_1},\ldots,\mathbf{W}_{o_K}]$. Hence, the proposed S-QR-LR scheme requires just one QR-decomposition of an $N_r\times(N_t+N_r)$ matrix and K GSO operations on an $N_t\times N_k$ matrix to obtain the first precoding matrix.
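Putting Step 1 together, a NumPy sketch of the first S-QR-LR precoder: one reduced QR-decomposition of $\bar{\mathbf{H}}^H$, the blocks $\bar{\mathbf{R}}_k$ read from $\mathbf{Q}_{N_r}$ without any matrix inversion, and a modified Gram-Schmidt pass per user. The function names and test dimensions are hypothetical, and the sketch assumes α > 0 and that the QR-decomposition is applied to $\bar{\mathbf{H}}^H$. The returned `H_hat` can be compared against the direct MMSE inversion.

```python
import numpy as np

def gram_schmidt(A):
    """Modified Gram-Schmidt: orthonormal basis of the columns of A (full column rank assumed)."""
    A = A.astype(complex)
    Q = np.zeros_like(A)
    for j in range(A.shape[1]):
        v = A[:, j].copy()
        for i in range(j):
            v -= (Q[:, i].conj() @ v) * Q[:, i]   # remove the component along q_i
        Q[:, j] = v / np.linalg.norm(v)
    return Q

def s_qr_lr_first_precoder(H_list, alpha):
    """Step 1 sketch: one QR of [H_m, sqrt(alpha) I]^H, then GSO per user."""
    H_m = np.vstack(H_list)
    N_r, N_t = H_m.shape
    N_k = H_list[0].shape[0]
    H_bar_h = np.vstack([H_m.conj().T, np.sqrt(alpha) * np.eye(N_r)])  # (N_t + N_r, N_r)
    Q, _ = np.linalg.qr(H_bar_h)                     # reduced QR, Q is (N_t + N_r, N_r)
    Q_Nt, Q_Nr = Q[:N_t], Q[N_t:]                    # first N_t rows, last N_r rows
    R_h_inv = Q_Nr.conj().T / np.sqrt(alpha)         # (R^H)^{-1}, read off Q_Nr
    W_o = [gram_schmidt(Q_Nt @ R_h_inv[:, k * N_k:(k + 1) * N_k]) for k in range(len(H_list))]
    return W_o, Q_Nt @ R_h_inv                       # W_o blocks and H_hat = Q_Nt (R^H)^{-1}

rng = np.random.default_rng(3)
K, N_k, N_t, alpha = 2, 2, 4, 0.05
H_list = [(rng.standard_normal((N_k, N_t)) + 1j * rng.standard_normal((N_k, N_t))) / np.sqrt(2)
          for _ in range(K)]
W_o, H_hat = s_qr_lr_first_precoder(H_list, alpha)
H_m = np.vstack(H_list)
H_hat_direct = H_m.conj().T @ np.linalg.inv(H_m @ H_m.conj().T + alpha * np.eye(K * N_k))
print(np.abs(H_hat - H_hat_direct).max())
```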
Step 2: Obtain the second precoding matrix $\mathbf{W}_g$ to further improve the performance.
After the first precoding operation, the MU-MIMO channel is transformed into K approximately parallel SU-MIMO channels, and we define the effective channel matrix of user k as $\mathbf{H}_{s_k} = \mathbf{H}_k\mathbf{W}_{o_k}$. The simplified QR-decomposition is then applied again to $\mathbf{H}_{s_k}$, assisted by the proposed improved CLLL algorithm, to obtain the second precoding matrix $\mathbf{W}_g$, further improving the performance and reducing the computational complexity. The proposed improved CLLL algorithm is introduced in the following section.
For user k, we extend the effective SU channel matrix as $\bar{\mathbf{H}}_{s_k} = [\mathbf{H}_{s_k}, \sqrt{\alpha}\,\mathbf{I}_{N_k}]$, where $\mathbf{I}_{N_k}$ is the $N_k\times N_k$ identity matrix. The LR transformation is then performed on the transpose $\bar{\mathbf{H}}_{s_k}^T$ of the extended channel matrix, as in the precoding scenario of [19], where $\tilde{\mathbf{H}}_{s_k}$ is the channel matrix after the LR transformation and $\mathbf{T}_k$ is a unimodular matrix with complex integer elements and $|\det(\mathbf{T}_k)| = 1$. Applying the QR-decomposition to $\tilde{\mathbf{H}}_{s_k}^H$, the MMSE inversion precoding matrix can be obtained with the simplified QR-decomposition and the assistance of LR, where $\mathbf{A}_k = [\mathbf{I}_{N_k}, \mathbf{0}_{N_k\times N_k}]$ extracts the $N_k\times N_k$ upper sub-matrix. From (25) and (26), this matrix can be expressed in terms of $\tilde{\mathbf{Q}}^1_{s_k}$, which consists of the first $N_k$ rows of $\tilde{\mathbf{Q}}_{s_k}$, and $\tilde{\mathbf{Q}}^2_{s_k}$, which consists of the last $N_k$ rows of $\tilde{\mathbf{Q}}_{s_k}$. The second precoding matrix $\mathbf{W}_{g_k}$ of user k in the proposed S-QR-LR algorithm is thus obtained from these blocks and $\mathbf{T}_k^{-1}$, where $\mathbf{T}_k^{-1}$ is obtained from the proposed improved CLLL algorithm described in the following section rather than by direct inversion of $\mathbf{T}_k$, further reducing the computational complexity. After traversing all users, the second precoding matrix for the K users is $\mathbf{W}_g = \mathrm{diag}(\mathbf{W}_{g_1},\ldots,\mathbf{W}_{g_K})$. Hence, the proposed S-QR-LR scheme requires K QR-decompositions of an $N_k\times(N_t+N_k)$ matrix to obtain the second precoding matrix.
IET Commun., 2016, Vol. 10, Iss. 5, pp. 586-593
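The following sketch of the second-step computation rests on two assumptions that the text does not fix. First, the reduced matrix is taken as $\tilde{\mathbf{H}}_{s_k} = \mathbf{T}_k^T\bar{\mathbf{H}}_{s_k}$, that is, LR of $\bar{\mathbf{H}}_{s_k}^T$ returns $\bar{\mathbf{H}}_{s_k}^T\mathbf{T}_k$. Second, $\mathbf{T}_k$ is a hand-picked unimodular matrix standing in for the CLLL output. Under these assumptions, the last $N_k$ rows of $\tilde{\mathbf{H}}_{s_k}^H$ equal $\sqrt{\alpha}\,\mathbf{T}_k^*$, so $\tilde{\mathbf{R}}^{-1} = (\mathbf{T}_k^{-1})^*\tilde{\mathbf{Q}}^2_{s_k}/\sqrt{\alpha}$. It follows that $\mathbf{W}_{g_k} = \mathbf{A}_k\tilde{\mathbf{H}}_{s_k}^\dagger = \tfrac{1}{\sqrt{\alpha}}\tilde{\mathbf{Q}}^1_{s_k}(\tilde{\mathbf{Q}}^2_{s_k})^H(\mathbf{T}_k^{-1})^T$, which needs only $\mathbf{T}_k^{-1}$ and no general matrix inversion. The function name `second_precoder` is hypothetical.

```python
import numpy as np

def second_precoder(H_s, T, T_inv, alpha):
    """W_{g_k} = A_k pinv(H~_{s_k}) from one QR, assuming H~_{s_k} = T^T [H_s, sqrt(alpha) I]."""
    n_rows, n_cols = H_s.shape
    H_bar = np.hstack([H_s, np.sqrt(alpha) * np.eye(n_rows)])   # extended channel
    H_red = T.T @ H_bar                                          # assumed reduced matrix
    Q, _ = np.linalg.qr(H_red.conj().T)                          # reduced QR of H~^H
    Q1, Q2 = Q[:n_cols], Q[n_cols:]                              # upper / lower row blocks
    # (R^H)^{-1} = Q2^H (T^{-1})^T / sqrt(alpha): uses T^{-1}, no general inverse
    return Q1 @ Q2.conj().T @ T_inv.T / np.sqrt(alpha)

# Hypothetical unimodular T = E1 @ E2 built from elementary integer operations,
# so its inverse is known in closed form: T^{-1} = E2^{-1} @ E1^{-1}
E1 = np.array([[1, 1 + 1j], [0, 1]])
E2 = np.array([[1, 0], [-1j, 1]])
T = E1 @ E2
T_inv = np.array([[1, 0], [1j, 1]]) @ np.array([[1, -(1 + 1j)], [0, 1]])

rng = np.random.default_rng(4)
alpha = 0.1
H_s = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
W_g = second_precoder(H_s, T, T_inv, alpha)

# Reference: upper N_k x N_k block of the pseudo-inverse, A_k = [I, 0]
A = np.hstack([np.eye(2), np.zeros((2, 2))])
H_red = T.T @ np.hstack([H_s, np.sqrt(alpha) * np.eye(2)])
W_ref = A @ np.linalg.pinv(H_red)
```

Under the same assumption, the result also equals the plain MMSE precoder $\mathbf{H}_{s_k}^H(\mathbf{H}_{s_k}\mathbf{H}_{s_k}^H + \alpha\mathbf{I})^{-1}$ multiplied by $(\mathbf{T}_k^{-1})^T$, which the test below checks.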
To meet the total transmit power constraint, a transmit power normalisation factor β is introduced before precoding, and the overall precoding matrix obtained through the above two steps is $\mathbf{W} = \mathbf{W}_o\mathbf{W}_g^{\text{norm}}$, where $\mathbf{W}_g^{\text{norm}}$ denotes the power-normalised second precoding matrix. At the receiver, automatic gain control compensates for the scaling by β applied at the transmitter, and the received signal of user k is divided by β. Finally, the receiver quantises the received signal $\mathbf{y}_k$ to the nearest data vector $\lfloor\mathbf{y}_k\rceil$, and the detected data vector is recovered as $\hat{\mathbf{s}}_k = \mathbf{T}_k\lfloor\mathbf{y}_k\rceil$.
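The final detection step can be sketched as follows. Two simplifying assumptions are made for illustration: the data symbols are represented as Gaussian-integer points (for example, unscaled and unshifted QAM points), and after AGC the received vector is a noisy copy of $\mathbf{T}_k^{-1}\mathbf{s}_k$. `T` and `T_inv` are a hand-picked unimodular pair rather than CLLL output, and the helper name `round_gaussian_int` is hypothetical.

```python
import numpy as np

def round_gaussian_int(y):
    """Element-wise rounding to the nearest Gaussian integer (the quantiser)."""
    return np.round(y.real) + 1j * np.round(y.imag)

# Hypothetical unimodular pair (T, T^{-1}) with complex integer entries
T = np.array([[1, 1 + 1j], [0, 1]])
T_inv = np.array([[1, -(1 + 1j)], [0, 1]])

s = np.array([1 - 1j, -1 + 1j])                 # data vector on the integer lattice
rng = np.random.default_rng(5)
noise = 0.05 * (rng.standard_normal(2) + 1j * rng.standard_normal(2))
y = T_inv @ s + noise                           # assumed post-AGC received signal
s_hat = T @ round_gaussian_int(y)               # s_hat = T_k * round(y_k)
print(s_hat)
```

Because $\mathbf{T}_k^{-1}$ has integer entries, $\mathbf{T}_k^{-1}\mathbf{s}_k$ stays on the integer lattice, so rounding removes small noise before $\mathbf{T}_k$ maps the result back.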

Improved CLLL algorithm
In this section, we propose an improved LR algorithm based on the CLLL algorithm [16] that obtains both the transformation matrix $\mathbf{T}$ and its inverse $\mathbf{T}^{-1}$ with low complexity. $\mathbf{T}$ is used to recover the transmit signal at the receiver, and $\mathbf{T}^{-1}$ is used to obtain the precoding matrix at the transmitter. The detailed pseudo-code of the improved CLLL-based LR algorithm is given in Table 1. $\Re(\cdot)$ and $\Im(\cdot)$ denote the real and imaginary parts of a complex number, respectively, and for a complex number x, $|x| = \sqrt{\Re(x)^2 + \Im(x)^2}$ denotes its modulus. The aim of LR is to find, for a given lattice, a reduced basis with shorter basis vectors and better orthogonality than the original matrix. The extended CLLL algorithm mainly consists of size reduction and column swapping:
• Size reduction: a basis $\mathbf{H}$ with QR-decomposition $\mathbf{H} = \mathbf{Q}\mathbf{R}$ is called size reduced if $|\Re(\mu_{n,k})| \le 1/2$ and $|\Im(\mu_{n,k})| \le 1/2$ for every entry pair (n, k) with $n < k$, where $\mu_{n,k} = r_{n,k}/r_{n,n}$ (35). If this condition is not satisfied for an entry pair (n, k), the linear combination operations in steps 17 to 22 of Table 1 are performed until it is satisfied. Size reduction makes the basis vectors shorter and more orthogonal by enforcing condition (35).
• Column swapping: columns k − 1 and k are swapped if the Lovász condition $\delta|r_{k-1,k-1}|^2 \le |r_{k,k}|^2 + |r_{k-1,k}|^2$ is not satisfied for the given parameter δ. This process is carried out in steps 27 to 38 of Table 1. Together with size reduction, swapping further shortens the basis vectors.
The conventional CLLL algorithm obtains only the transformation matrix $\mathbf{T}$. The improved CLLL algorithm also obtains its inverse $\mathbf{T}^{-1}$ with low complexity, avoiding a direct inversion of $\mathbf{T}$. Since $\mathbf{T}\mathbf{T}^{-1} = \mathbf{I}$, whenever a size-reduction step updates $\mathbf{T}$ by a linear combination of columns (subtracting an integer multiple of one column from another), $\mathbf{T}^{-1}$ can be updated by the corresponding linear combination of rows (adding the same multiple of the other row); likewise, swapping two columns of $\mathbf{T}$ corresponds to swapping the same two rows of $\mathbf{T}^{-1}$. Therefore, $\mathbf{T}^{-1}$ is initialised to the identity matrix, and whenever condition (35) is not fulfilled for an entry pair (n, k), the corresponding row combination is applied to it, as steps 13 and 20 of Table 1 show.
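A compact NumPy sketch of a QR-based complex LLL that updates $\mathbf{T}^{-1}$ alongside $\mathbf{T}$ with the row operations described above. It follows the textbook CLLL structure (full size reduction of column k, then the Lovász test, with a Givens rotation after each swap) rather than the exact step numbering of Table 1, which is not reproduced here. δ = 0.75 and the function name `clll_with_inverse` are assumptions.

```python
import numpy as np

def cround(x):
    """Nearest Gaussian integer to the complex number x."""
    return complex(np.round(x.real), np.round(x.imag))

def clll_with_inverse(H, delta=0.75):
    """QR-based complex LLL on the columns of H that also tracks T^{-1}.

    Returns Q, R, T, T_inv with H @ T = Q @ R (the reduced basis) and T @ T_inv = I.
    """
    H = np.asarray(H, dtype=complex)
    Q, R = np.linalg.qr(H)                  # reduced QR; H assumed full column rank
    n = H.shape[1]
    T = np.eye(n, dtype=complex)
    T_inv = np.eye(n, dtype=complex)
    k = 1
    while k < n:
        # Size reduction of column k against columns k-1, ..., 0
        for j in range(k - 1, -1, -1):
            mu = cround(R[j, k] / R[j, j])
            if mu != 0:
                R[:j + 1, k] -= mu * R[:j + 1, j]
                T[:, k] -= mu * T[:, j]           # column operation on T
                T_inv[j, :] += mu * T_inv[k, :]   # matching row operation on T^{-1}
        # Lovasz condition; swap columns k-1 and k if it fails
        if delta * abs(R[k - 1, k - 1]) ** 2 > abs(R[k, k]) ** 2 + abs(R[k - 1, k]) ** 2:
            R[:, [k - 1, k]] = R[:, [k, k - 1]]
            T[:, [k - 1, k]] = T[:, [k, k - 1]]
            T_inv[[k - 1, k], :] = T_inv[[k, k - 1], :]  # column swap of T = row swap of T^{-1}
            # Givens rotation restores the upper-triangular form of R
            a, b = R[k - 1, k - 1], R[k, k - 1]
            r = np.sqrt(abs(a) ** 2 + abs(b) ** 2)
            G = np.array([[np.conj(a), np.conj(b)], [-b, a]]) / r
            R[k - 1:k + 1, :] = G @ R[k - 1:k + 1, :]
            Q[:, k - 1:k + 1] = Q[:, k - 1:k + 1] @ G.conj().T
            R[k, k - 1] = 0
            k = max(k - 1, 1)
        else:
            k += 1
    return Q, R, T, T_inv

rng = np.random.default_rng(7)
H = (rng.standard_normal((6, 3)) + 1j * rng.standard_normal((6, 3))) / np.sqrt(2)
Q, R, T, T_inv = clll_with_inverse(H)
n = H.shape[1]
print(T)
```

Every update to `T` is mirrored by its inverse elementary operation on `T_inv`, so the product stays the identity without ever calling a matrix inverse.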

Achievable sum-rate analysis
In this section, we analyse the achievable sum-rate of the proposed S-QR-LR precoding scheme. After the first precoding step mitigates the MUI, equivalent SU-MIMO channels $\mathbf{H}_{s_k}$ are obtained. With equal power allocation, the capacity of user k after precoding is given by (37), where $\mathrm{SNR}_k = P_k/\sigma^2$ and $P_k$ is the transmit power of user k, which is normalised in this paper. The SVD of $\mathbf{H}_{s_k}$ is denoted as $\mathbf{H}_{s_k} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$, where $\mathbf{U}\in\mathbb{C}^{N_k\times N_k}$ and $\mathbf{V}\in\mathbb{C}^{N_k\times N_k}$ are unitary matrices and $\boldsymbol{\Sigma}\in\mathbb{C}^{N_k\times N_k}$ is a diagonal matrix whose diagonal elements are the singular values $\lambda_1, \lambda_2, \ldots, \lambda_{N_k}$ of $\mathbf{H}_{s_k}$. Then $\mathbf{H}_{s_k}\mathbf{H}_{s_k}^H = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H$, where $\boldsymbol{\Lambda} = \boldsymbol{\Sigma}\boldsymbol{\Sigma}^H = \mathrm{diag}(\lambda_1^2,\ldots,\lambda_{N_k}^2)$. Using (39) and the identity $\det(\mathbf{I}+\mathbf{AB}) = \det(\mathbf{I}+\mathbf{BA})$, (37) can be rewritten as a sum over the singular values (40). Hence, the capacity $C_k$ of user k is a function of the singular values of the channel matrix $\mathbf{H}_{s_k}$. By Jensen's inequality, we obtain the upper bound (41), where equality holds if and only if all singular values are equal. Therefore, the capacity $C_k$ is maximised when the channel matrix $\mathbf{H}_{s_k}$ is statistically well conditioned, that is, more orthogonal. The condition number of a matrix $\mathbf{H}$ is defined as $\lambda_{\max}/\lambda_{\min}$, where $\lambda_{\max}$ and $\lambda_{\min}$ are the maximum and minimum singular values of $\mathbf{H}$; for an orthogonal matrix, the condition number is one.
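The following is a numerical illustration of this argument. It assumes equal power $P_k/N_k$ per stream, so that $C_k = \log_2\det(\mathbf{I}_{N_k} + \tfrac{\mathrm{SNR}_k}{N_k}\mathbf{H}_{s_k}\mathbf{H}_{s_k}^H)$; this normalisation is an assumption of the sketch rather than a transcription of (37). The sketch checks the singular-value form of the capacity and the Jensen upper bound, and it shows that a perfectly conditioned matrix outperforms an ill-conditioned one with the same Frobenius norm. The helper names are hypothetical, and `np.linalg.cond` uses the 2-norm, which gives $\lambda_{\max}/\lambda_{\min}$.

```python
import numpy as np

def capacity_det(H, snr):
    """log2 det(I + snr/N * H H^H), equal power over N = H.shape[1] streams (assumption)."""
    n = H.shape[1]
    M = np.eye(H.shape[0]) + (snr / n) * (H @ H.conj().T)
    _, logdet = np.linalg.slogdet(M)
    return float(logdet / np.log(2))

def capacity_svd(H, snr):
    """The same capacity written as a sum over the singular values lambda_i."""
    lam = np.linalg.svd(H, compute_uv=False)
    return float(np.sum(np.log2(1 + (snr / H.shape[1]) * lam ** 2)))

def jensen_bound(H, snr):
    """N log2(1 + snr/N * mean(lambda_i^2)): tight iff all singular values are equal."""
    lam = np.linalg.svd(H, compute_uv=False)
    return float(len(lam) * np.log2(1 + (snr / H.shape[1]) * np.mean(lam ** 2)))

snr = 10.0
rng = np.random.default_rng(6)
H_rand = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
H_good = np.eye(2)                                  # condition number 1
H_bad = np.diag([np.sqrt(1.9), np.sqrt(0.1)])       # same Frobenius norm, cond = sqrt(19)
for name, H in [("random", H_rand), ("good", H_good), ("bad", H_bad)]:
    print(name, np.linalg.cond(H), capacity_det(H, snr), jensen_bound(H, snr))
```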

Computational complexity analysis
In this section, we quantify the computational complexity in terms of flops, where a flop denotes a floating point operation. Following [18,21], the numbers of flops required by the LR-S-GMI and the proposed S-QR-LR precoding algorithms are given in Tables 2 and 3, respectively. The flop count of the CLLL algorithm is taken from [16]. Although the proposed improved CLLL algorithm obtains $\mathbf{T}^{-1}$ directly, its complexity is almost the same as that of the conventional CLLL algorithm. Therefore, the complexity of the conventional CLLL algorithm is used for both schemes.

Performance evaluation
In this section, the computational complexities of the conventional BD, QR-BD, GZI and LR-S-GMI precoding schemes and the proposed S-QR-LR precoding scheme are simulated and compared. We also compare the sum-rate and BER performance of the conventional precoding schemes with that of the proposed S-QR-LR scheme. We assume that full CSI of every user is available at the transmitter. The sum-rate of the MU-MIMO system after precoding is calculated as in [22]. Fig. 2 shows the probability density function (PDF) of the condition numbers (on a log scale) of the effective channel matrices, where $\mathrm{cond}(\mathbf{H}) = \lambda_{\max}/\lambda_{\min}$ denotes the condition number of the matrix $\mathbf{H}$. For simplicity, the horizontal axis of Fig. 2 uses the variable x to denote the condition number, and the vertical axis shows the PDF of its logarithm. The results show that the condition number of the original matrix is always larger than 1. They also show that, after LR, the average condition number of $\mathbf{H}$ is about 1 and its spread is much smaller than that of the original $\mathbf{H}$. That is, after LR the channel is well conditioned, which according to the sum-rate analysis maximises the capacity. Therefore, a much higher capacity can be achieved with $\mathbf{H}$ after LR than with the original $\mathbf{H}$, which is consistent with the results in Figs. 3 and 4.
Table 3 Computational complexity of S-QR-LR (columns: Operations, Flops)
In Fig. 3, we present the simulated sum-rates of the conventional BD, QR-BD, GZI and LR-S-GMI precoding schemes and the proposed S-QR-LR scheme with the same antenna configuration, $N_t = 4$, $N_k = 2$ and K = 2. As expected, the LR-assisted precoding schemes, LR-S-GMI and the proposed S-QR-LR, achieve higher sum-rates than the conventional precoding schemes without LR. In Fig. 4, we compare the BER performance of the five precoding schemes with $N_t = 4$, $N_k = 2$ and K = 2.
As expected, the proposed S-QR-LR precoding scheme achieves the same BER performance as the LR-S-GMI precoding algorithm and outperforms the conventional BD, QR-BD and GZI precoding algorithms. Fig. 5 shows the computational complexity, in flops, of the conventional BD, QR-BD, GZI and LR-S-GMI precoding schemes and the proposed S-QR-LR scheme as the number of users K increases, with $N_t = KN_k$ transmit antennas and $N_k = 2$ receive antennas per user. BD precoding performs two SVD operations per user and is therefore the most complex of these schemes. QR-BD replaces the first SVD of BD precoding with a QR-decomposition and thus reduces the computational complexity of BD. Similarly, GZI avoids the first SVD of BD precoding and requires fewer flops than BD. However, both QR-BD and GZI retain the second SVD and still have comparatively heavy computational complexity. GMI precoding has the same computational complexity as GZI, and LR-S-GMI requires fewer flops than GMI. The proposed S-QR-LR precoding algorithm requires significantly fewer flops than BD and also fewer than LR-S-GMI. This is because LR-S-GMI requires multiple pseudo-inverse operations to obtain the precoding matrix, whereas the proposed S-QR-LR scheme replaces these pseudo-inverse operations with simplified QR-decompositions, and the computational complexity of a QR-decomposition is much lower than that of a pseudo-inverse of a matrix of the same dimensions. Fig. 6 shows the computational complexity, in flops, of the conventional BD, QR-BD, GZI and LR-S-GMI precoding schemes and the S-QR-LR scheme as $N_k$ increases, with K = 4 and $N_t = KN_k$. The proposed S-QR-LR precoding scheme still requires the lowest computational cost among these schemes as $N_k$ increases.

Conclusions
In this paper, a low-complexity linear precoding scheme named S-QR-LR is proposed for downlink MU-MIMO systems. The proposed S-QR-LR precoding scheme uses a carefully designed QR-decomposition, first to mitigate the interference and then, with the assistance of LR, to obtain precoding gain. The achievable sum-rate and the computational complexity of the proposed scheme are analysed and compared with those of the conventional BD, QR-BD, GZI and LR-S-GMI schemes. Analytical and simulation results show that the proposed S-QR-LR precoding scheme achieves the best performance among these schemes while requiring the lowest computational complexity.

Acknowledgments
This work was partially supported by the National Science