Lattice reduction-ordered successive interference cancellation detection algorithm for multiple-input multiple-output system

Lattice reduction (LR) is a powerful technique for improving the performance of linear multiple-input multiple-output detection methods, and efficient LR algorithms can substantially improve the performance of linear detectors (LDs). The ordered successive interference cancellation (OSIC) system decreases the interference between antennas and provides a performance gain over the LDs. In this paper, a novel LR-aided algorithm called NLR-OSIC is proposed to improve the performance of the OSIC system. Most existing LR algorithms are designed to improve the orthogonality of channel matrices, which is not directly related to the error performance of the OSIC system. In contrast, the authors' algorithm maximises the signal-to-interference-plus-noise ratio (SINR) of the detected symbol in each stage of the OSIC system, and thus exhibits a lower error rate than the previous LR-aided LDs and their corresponding OSIC algorithms. The authors verify that, in each stage, maximising the SINR of the detected symbol can be formulated as a shortest vector problem, which is solved by a suboptimal algorithm in this study. Finally, the error rate performance of the proposed algorithm and its required complexity are demonstrated through extensive computer simulations.


Introduction
Multiple-input multiple-output (MIMO) technology is maturing and is being incorporated into emerging wireless broadband standards such as long-term evolution [1]. The more antennas the transmitter or the receiver is equipped with, the more degrees of freedom the system can provide, and the better the performance in terms of data rate or link reliability [2]. The price to pay for MIMO is the increased complexity of the hardware and the complexity and energy consumption of the signal processing at both the transmitters and the receivers [2]. In MIMO systems, the detector is a major concern at the receiver, and designing reliable and computationally efficient detectors has become a critical challenge. For example, the well-known maximum-likelihood detector (MLD) [3,4] provides optimal bit error performance, but it has exponential complexity. Other performance-optimal signal detectors, such as the sphere detector, also suffer from exponential complexity [3,5,6]. On the other hand, the linear detector (LD) algorithms, i.e. the zero forcing (ZF) and the minimum mean-square error (MMSE) detectors, have polynomial complexity, but they collect only poor diversity in MIMO systems [4,7]. Many improved algorithms have been proposed with various complexity and performance tradeoffs. In [8,9], the MMSE-SIC (successive interference cancellation) and the block-iterative generalised decision feedback equaliser algorithms were proposed to achieve high performance, but their complexity is still very high. Two local neighbourhood search methods, likelihood ascent search (LAS) [10] and reactive tabu search [11], obtain near-optimal performance for lower-order quadrature amplitude modulation (QAM) but exhibit considerable performance degradation for higher-order QAM. To further improve performance for higher-order QAM, layered tabu search was proposed in [12]; however, this algorithm involves considerably high complexity when the problem size and/or the constellation size is large.
Recently, an efficient scheme called lattice reduction (LR) [13–15] has been applied to MIMO detection. It is typically used in conjunction with the LDs, and the LR-aided technique can enhance the performance of MIMO digital communication systems. It attempts to find a more orthogonal basis for the channel matrix, as the orthogonality of the channel matrix largely affects the performance of the wireless system [16]. The Lenstra, Lenstra, and Lovasz (LLL) algorithm [17] was the first to be considered for LR-aided data detection. It allows suboptimum detectors to exploit all the available diversity while retaining polynomial complexity. As an alternative to the LLL algorithm, the dual LLL (D-LLL) algorithm, which reduces the lattice in the dual space, was studied in [18,19]. The element-based LR (ELR) algorithm [20] was proposed to reduce the diagonal elements of the noise covariance matrix, which enhances the asymptotic performance of LDs. The ELR provides better bit-error-rate (BER) performance than the LLL algorithm, especially for large MIMO systems [20]. However, although the LR-aided detection algorithms can achieve full diversity, their performance is still far from that of the MLD. Most recent works, such as [18–21], aim to develop LR-aided detectors with better performance, although some of them incur larger complexity. As the polynomial complexity of the LR-aided detection algorithms is acceptable in MIMO systems, moderate increases in complexity are affordable. The algorithm proposed in this paper also improves the performance of the LR-aided detectors with affordable complexity.
In the OSIC system, the interference from already-detected components is subtracted from the received signal vector, which results in better performance than the LDs. In this paper, we propose a novel LR-aided algorithm called novel lattice reduction (NLR)-OSIC for the OSIC system. Compared with the LLL and ELR algorithms, the proposed algorithm has a different design goal: to maximise the SINR of the detected symbol in each stage of the OSIC system. Note that the SINR of the detected symbol in each stage is proportional to the corresponding diagonal element of $\mathbf{R}$, where $\mathbf{H} = \mathbf{Q}\mathbf{R}$ is the QR factorisation of the channel matrix $\mathbf{H}$, and we aim to use the LR scheme to maximise these diagonal elements. We verify that maximising the diagonal elements can be formulated as solving the shortest vector problem (SVP). As the SVP is a well-known problem, many algorithms are available to solve it. By taking the decision region into account, we show that a substantial improvement in the error rate over the LLL- and ELR-aided OSIC algorithms can be obtained with only a moderate increase in complexity.
Notation: Matrices are set in boldface capital letters, vectors in boldface lowercase letters. $(\cdot)^T$ and $(\cdot)^H$ are the transpose and the Hermitian transpose of $(\cdot)$, respectively. $(\cdot)^{\dagger}$ and $(\cdot)^{-H}$ are the pseudo-inverse and the Hermitian transpose of the pseudo-inverse of $(\cdot)$, respectively. $\|\cdot\|_2$ denotes the 2-norm, and $(\cdot)^*$ is the conjugate of $(\cdot)$. We write $A_{i,j}$ for the entry in the $i$th row and the $j$th column of the matrix $\mathbf{A}$, $a_i$ for the $i$th entry of $\mathbf{a}$, and $\mathbf{A}_i$ for the $i$th column of the matrix $\mathbf{A}$.

System model
Now consider a MIMO system with $N_t$ transmit antennas and $N_r$ receive antennas as
$$\mathbf{y}^c = \mathbf{H}^c \mathbf{x}^c + \mathbf{z}^c \tag{1}$$
where $\mathbf{y}^c \in \mathbb{C}^{N_r}$ is the received signal, $\mathbf{H}^c \in \mathbb{C}^{N_r \times N_t}$ is a Rayleigh fading channel, $\mathbf{x}^c \in \mathbb{C}^{N_t}$ is the transmitted signal, and $\mathbf{z}^c \in \mathbb{C}^{N_r}$ is the Gaussian noise. We assume that the number of receive antennas is no less than the number of transmit antennas.
The following assumptions are made throughout this paper.
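(AS1) We write the channel matrix $\mathbf{H}^c$ as $[\mathbf{h}^c_1, \mathbf{h}^c_2, \ldots, \mathbf{h}^c_{N_t}]$, and the channel vectors $\mathbf{h}^c_i$, $i = 1, 2, \ldots, N_t$, can be represented as
$$\mathbf{h}^c_i = [H^c_{1i}, H^c_{2i}, \ldots, H^c_{N_r i}]^T \tag{2}$$
where $H^c_{ji}$ are i.i.d. zero-mean complex Gaussian variables with variance $1/N_t$ for $j = 1, 2, \ldots, N_r$. Then, the signal-to-noise ratio (SNR) is defined as [7]
$$\rho = \frac{\sigma_x^2}{\sigma_z^2} \tag{3}$$
with $\sigma_x^2$ and $\sigma_z^2$ the signal and noise variances. Note that (1) is equivalent to the real MIMO input-output linear model [3]
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{z} \tag{4}$$
where
$$\mathbf{y} = \begin{bmatrix} \Re(\mathbf{y}^c) \\ \Im(\mathbf{y}^c) \end{bmatrix}, \quad \mathbf{H} = \begin{bmatrix} \Re(\mathbf{H}^c) & -\Im(\mathbf{H}^c) \\ \Im(\mathbf{H}^c) & \Re(\mathbf{H}^c) \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} \Re(\mathbf{x}^c) \\ \Im(\mathbf{x}^c) \end{bmatrix}, \quad \mathbf{z} = \begin{bmatrix} \Re(\mathbf{z}^c) \\ \Im(\mathbf{z}^c) \end{bmatrix}.$$
Then, $\mathbf{y} \in \mathbb{R}^K$, $\mathbf{z} \in \mathbb{R}^K$, $\mathbf{x} \in \mathbb{R}^N$, and $\mathbf{H} \in \mathbb{R}^{K \times N}$, where $K = 2N_r$ and $N = 2N_t$.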
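The equivalence between the complex model (1) and the real model (4) can be checked numerically. The following is a minimal sketch, assuming the usual stacking of real and imaginary parts; the function name `to_real_model`, the antenna counts, and the 16QAM-like symbol set are illustrative choices, not taken from the paper.

```python
import numpy as np

def to_real_model(Hc, yc):
    """Stack real and imaginary parts so that y = H x (+ z) is real-valued."""
    H = np.block([[Hc.real, -Hc.imag],
                  [Hc.imag,  Hc.real]])
    y = np.concatenate([yc.real, yc.imag])
    return H, y

rng = np.random.default_rng(0)
Nr, Nt = 4, 3  # illustrative sizes with Nr >= Nt

# i.i.d. zero-mean complex Gaussian entries with variance 1/Nt, as in (AS1)
Hc = (rng.standard_normal((Nr, Nt))
      + 1j * rng.standard_normal((Nr, Nt))) * np.sqrt(0.5 / Nt)

# hypothetical 16QAM-like symbols: odd integers on each real dimension
pam = np.array([-3, -1, 1, 3])
xc = rng.choice(pam, Nt) + 1j * rng.choice(pam, Nt)

yc = Hc @ xc                      # noiseless complex observation
H, y = to_real_model(Hc, yc)      # real K x N model, K = 2*Nr, N = 2*Nt
x = np.concatenate([xc.real, xc.imag])
```

In this noiseless setting, `H @ x` reproduces `y`, which is the content of the statement that (1) and (4) are equivalent.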

Review of the ZF and the MMSE OSIC systems
The ZF-OSIC system includes the ordering preprocess and the successive interference cancellation (SIC) scheme. The SIC scheme decreases the interference between antennas: once some components of the transmitted signal have been detected, their contribution is cancelled from the received signal, which eliminates the interference caused by these components. Here we introduce the details of the SIC scheme, which can be found in [22].
If a specific detection ordering is used, the input-output relation of (4), along with the transmitted signal $\mathbf{x}$, the received signal $\mathbf{y}$, and the channel matrix $\mathbf{H}$, needs to be rearranged. Hence, without loss of generality, we assume that the transmit symbols $x_i$, $i = 1, 2, \ldots, N$, are sequentially detected at the receiver within $N$ stages, in the order from the $N$th component to the first component, given that $\mathbf{y}$, $\mathbf{H}$, and $\mathbf{x}$ have been rearranged according to the detection ordering method.
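Assume the QR factorisation of the channel matrix $\mathbf{H}$ is
$$\mathbf{H} = \mathbf{Q}\mathbf{R} \tag{5}$$
where $\mathbf{Q}$ is a unitary matrix and $\mathbf{R}$ is an upper triangular matrix. Substituting (5) into (4), we have
$$\mathbf{y} = \mathbf{Q}\mathbf{R}\mathbf{x} + \mathbf{z} \tag{6}$$
Multiplying (6) by the matrix $\mathbf{Q}^H$, (4) is equivalent to
$$\tilde{\mathbf{y}} = \mathbf{R}\mathbf{x} + \tilde{\mathbf{z}} \tag{7}$$
where $\tilde{\mathbf{y}} = \mathbf{Q}^H \mathbf{y}$ and $\tilde{\mathbf{z}} = \mathbf{Q}^H \mathbf{z}$.
Following the above discussion, the system detects the signal $\mathbf{x}$ from the $N$th component to the first component. At the first stage, we have
$$x'_N = \frac{\tilde{y}_N}{R_{N,N}} \tag{8}$$
and the estimate of $x_N$ is obtained as $\hat{x}_N = \mathcal{Q}(x'_N)$, where $\mathcal{Q}(\cdot)$ denotes the quantisation (slicing) operation appropriate to the constellation $\mathcal{D}$. Then, at stage $i$ ($i \geq 2$), we have
$$x'_{N-i+1} = \frac{\tilde{y}_{N-i+1} - \sum_{j=N-i+2}^{N} R_{N-i+1,j}\,\hat{x}_j}{R_{N-i+1,N-i+1}} \tag{9}$$
and $\hat{x}_{N-i+1} = \mathcal{Q}(x'_{N-i+1})$.
Now we briefly discuss the ordering process. A detection order which maximises the instantaneous output SINR of the detected symbol at each stage of the SIC system substantially improves the error performance compared to any fixed detection order.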
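The SIC recursion described in this section can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' code: it uses a reduced QR factorisation from NumPy, a real-valued model, and a hypothetical helper `slice_pam` that quantises to the nearest level of a given real constellation.

```python
import numpy as np

def slice_pam(v, levels):
    """Quantise a real value to the nearest constellation level."""
    levels = np.asarray(levels, dtype=float)
    return levels[np.argmin(np.abs(levels - v))]

def sic_detect(H, y, levels):
    """Detect x from y = H x + z, last component first, using H = Q R."""
    Q, R = np.linalg.qr(H)        # reduced QR: Q is K x N, R is N x N
    y_t = Q.T @ y                 # rotated observation, y_t = R x + Q^T z
    N = H.shape[1]
    x_hat = np.zeros(N)
    for i in range(N - 1, -1, -1):
        # cancel the interference of the already-detected components
        resid = y_t[i] - np.dot(R[i, i + 1:], x_hat[i + 1:])
        x_hat[i] = slice_pam(resid / R[i, i], levels)
    return x_hat
```

With a noiseless observation `y = H @ x` and `x` drawn from `levels`, the function returns `x` exactly, since every stage then sees an interference-free scalar.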
As a preprocessing and ordering approach, the Vertical Bell Labs Layered Space Time (VBLAST) system has been proved to have the optimal detection ordering [22]. It maximises the signal-to-interference-plus-noise ratio (SINR) of the detected symbol at each stage. At stage $i$, the SINR of the detected symbol is proportional to the diagonal element $R_{N-i+1,N-i+1}$, $i = 1, 2, \ldots, N$ [22]. The goal of this ordering is to find the permutation matrix $\mathbf{P}$ such that, keeping $R_{j,j}$ ($j > N-i+1$) unchanged, $R_{N-i+1,N-i+1}$ is maximised at the $i$th stage of the SIC system. The column ordering algorithm is recursive and yields the optimal permutation $\pi$ in $N$ steps, which means that the components of $\mathbf{x}$ are detected in the order $\pi(1), \pi(2), \ldots, \pi(N)$.
The description of the ordering in the ZF-VBLAST system is indicated in Fig. 1.
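A greedy ordering of this kind can be sketched as follows. This is an illustration that assumes the well-known pseudo-inverse formulation of the VBLAST ordering (select the remaining column whose pseudo-inverse row has the smallest norm); it is not a transcription of Fig. 1, and the function name `vblast_order` is hypothetical.

```python
import numpy as np

def vblast_order(H):
    """Return a column permutation so that SIC on H[:, perm], detecting the
    last column first, follows the greedy max-SINR ordering."""
    remaining = list(range(H.shape[1]))
    detect_order = []
    while remaining:
        G = np.linalg.pinv(H[:, remaining])
        # the row of the pseudo-inverse with the smallest norm belongs to the
        # column farthest from the span of the other remaining columns
        k = int(np.argmin(np.sum(G ** 2, axis=1)))
        detect_order.append(remaining.pop(k))
    # the first-detected symbol must sit in the last column
    return detect_order[::-1]
```

After permuting the columns with `H[:, perm]`, the last diagonal entry of the triangular factor is the largest achievable over all choices of the last column, and the same holds recursively for the remaining stages.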
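The MMSE-VBLAST system has the equivalent channel matrix $\mathbf{H}_{\mathrm{MMSE}}$ and received signal $\mathbf{y}_{\mathrm{MMSE}}$ given by
$$\mathbf{H}_{\mathrm{MMSE}} = \begin{bmatrix} \mathbf{H} \\ \frac{\sigma_z}{\sigma_x}\mathbf{I} \end{bmatrix}, \quad \mathbf{y}_{\mathrm{MMSE}} = \begin{bmatrix} \mathbf{y} \\ \mathbf{0} \end{bmatrix} \tag{10}$$
where $\mathbf{I} \in \mathbb{R}^{N \times N}$ is the identity matrix and $\mathbf{0} \in \mathbb{R}^{N}$ is the zero vector. Replacing $\mathbf{H}$ and $\mathbf{y}$ in the ZF-VBLAST system by $\mathbf{H}_{\mathrm{MMSE}}$ and $\mathbf{y}_{\mathrm{MMSE}}$, we obtain the MMSE-VBLAST system. It has been verified that the MMSE-VBLAST system has better performance than the ZF-VBLAST system because it provides a larger SINR.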
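The effect of the extended system can be illustrated numerically. The sketch below assumes the standard extended form, in which the identity block is scaled by the noise-to-signal standard deviation ratio; the function name `mmse_extended` and the parameters `sigma_x` and `sigma_z` are illustrative.

```python
import numpy as np

def mmse_extended(H, y, sigma_x, sigma_z):
    """Build the extended channel and observation so that ZF processing on
    the extended system corresponds to MMSE processing on the original one."""
    N = H.shape[1]
    H_ext = np.vstack([H, (sigma_z / sigma_x) * np.eye(N)])
    y_ext = np.concatenate([y, np.zeros(N)])
    return H_ext, y_ext
```

Under this assumption, the least-squares (ZF) solution of the extended system coincides with the regularised solution $(\mathbf{H}^T\mathbf{H} + (\sigma_z^2/\sigma_x^2)\mathbf{I})^{-1}\mathbf{H}^T\mathbf{y}$ of the original system, which is why the ZF-VBLAST steps can be reused unchanged.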

Previous LR-OSIC algorithm
LR is a technique for obtaining a near-orthogonal basis matrix. The LR-aided LDs are based on the fact that the more orthogonal the channel matrix is, the closer the decision region of the LDs is to that of the MLD [15]. Therefore, to improve the error performance of LDs, we employ LR to find a near-orthogonal channel matrix. As a result, LR-aided LDs yield error performance 'close' to the MLD and have the same error performance as the MLD if the reduced matrix is orthogonal. The orthogonality of the matrix $\mathbf{H}$ can be measured in terms of the orthogonality defect, which is defined as [20,21,23]
$$\delta = \frac{\prod_{i=1}^{N} \|\mathbf{h}_i\|_2}{\sqrt{\det(\mathbf{H}^H \mathbf{H})}} \tag{11}$$
where $\mathbf{h}_i$ is the $i$th column of $\mathbf{H}$, and $\delta \geq 1$. Decreasing the orthogonality defect $\delta$ improves the orthogonality of the matrix. Given the basis matrix $\mathbf{H}$, LR algorithms find a reduced basis matrix $\tilde{\mathbf{H}}$ that leaves the lattice unchanged. Accordingly, reducing the basis $\mathbf{H}$ is equivalent to finding a unimodular transformation $\tilde{\mathbf{H}} = \mathbf{H}\mathbf{T}$ [14], where $\mathbf{T}$ is a unimodular matrix (i.e. all the entries of $\mathbf{T}$ are integers, and the determinant of $\mathbf{T}$ is $\pm 1$), such that $\tilde{\mathbf{H}}$ is near orthogonal.
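As a small numerical illustration of how a unimodular transformation changes the orthogonality defect, consider the sketch below. It assumes the common definition of the defect as the product of the column norms divided by the lattice volume; the 2 × 2 matrices are hypothetical examples chosen only for illustration.

```python
import numpy as np

def orthogonality_defect(H):
    """Product of column norms divided by sqrt(det(H^T H)); equals 1 for
    orthogonal columns and grows as the columns become more aligned."""
    col_norms = np.linalg.norm(H, axis=0)
    volume = np.sqrt(np.linalg.det(H.T @ H))
    return np.prod(col_norms) / volume

# hypothetical, nearly collinear basis
H = np.array([[1.0, 0.9],
              [0.0, 0.1]])

# unimodular T: integer entries, determinant +1 (subtract column 1 from column 2)
T = np.array([[1, -1],
              [0,  1]])

H_red = H @ T   # same lattice, shorter second basis vector
```

Here `orthogonality_defect(H_red)` is smaller than `orthogonality_defect(H)` although both matrices generate the same lattice, which is the property the LR-aided detectors exploit.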
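The previous LR-OSIC in the ZF case, based on the reduced basis $\tilde{\mathbf{H}}$, is performed as follows [20]. We first write (4) as
$$\mathbf{y} = \tilde{\mathbf{H}}\tilde{\mathbf{x}} + \mathbf{z} \tag{12}$$
where $\tilde{\mathbf{H}} = \mathbf{H}\mathbf{T}$ and $\tilde{\mathbf{x}} = \mathbf{T}^{-1}\mathbf{x}$. Applying scaling and shifting to $\mathbf{x}$, i.e. $(\mathbf{x} - \mathbf{1}_{N \times 1})/2$, where $\mathbf{1}_{N \times 1} \in \mathbb{R}^{N \times 1}$ is the vector with all entries equal to 1, we transfer the domain of $\mathbf{x}$ to consecutive integer sets, which are further transferred to the lattice-reduction domain as $S' = \mathbf{T}^{-1}(\mathbf{x} - \mathbf{1}_{N \times 1})/2$, whose elements are in consecutive integer sets as well. Hence, we first use the OSIC algorithm of the last subsection to detect $\tilde{\mathbf{x}}$. However, instead of using the quantisation operation $\mathcal{Q}(\tilde{x}'_i)$ ($i = 1, 2, \ldots, N$) to get the estimate of $\tilde{x}_i$ as in the last subsection, where $\tilde{x}'_i$ is obtained from (8) and (9), we obtain the estimate $\hat{\tilde{x}}_i$ of $\tilde{x}_i$, $i = 1, 2, \ldots, N$, as [20]
$$\hat{\tilde{x}}_i = 2\left\lfloor \frac{\tilde{x}'_i - \beta_i}{2} \right\rceil + \beta_i \tag{13}$$
where $\beta_i$ is the $i$th entry of $\mathbf{T}^{-1}\mathbf{1}_{N \times 1}$, and $\lfloor \cdot \rceil$ is a rounding function. Once we get the estimate of $\tilde{\mathbf{x}}$, the final detection result is obtained by
$$\hat{\mathbf{x}} = \mathcal{Q}(\mathbf{T}\hat{\tilde{\mathbf{x}}}) \tag{14}$$
where $\mathcal{Q}(\cdot)$ denotes componentwise quantisation as defined above, and $\hat{\tilde{\mathbf{x}}}$ is the estimate of $\tilde{\mathbf{x}}$. The LR-OSIC in the MMSE case replaces $\mathbf{H}$ and $\mathbf{y}$ above with $\mathbf{H}_{\mathrm{MMSE}}$ and $\mathbf{y}_{\mathrm{MMSE}}$, similarly to the MMSE-VBLAST system.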
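The scaling and shifting step can be made concrete with a short example. This is a sketch under the assumption that each real dimension carries odd-integer symbols (as for square QAM); the symbol vector `x` and the unimodular matrix `T` below are hypothetical values chosen for illustration.

```python
import numpy as np

# odd-integer symbols on each real dimension (assumed constellation)
x = np.array([-3, 1, 3, -1])

# hypothetical unimodular matrix: integer entries, determinant 1
T = np.array([[1, 1, 0,  0],
              [0, 1, 0,  0],
              [0, 0, 1, -2],
              [0, 0, 0,  1]])
T_inv = np.rint(np.linalg.inv(T)).astype(int)   # inverse is integer as well

s = (x - 1) // 2          # scaled and shifted symbols: consecutive integers
s_lr = T_inv @ s          # lattice-reduction domain: still integers

# undo the mapping: back to the original symbols
x_rec = 2 * (T @ s_lr) + 1
```

Because `T_inv` has integer entries, `s_lr` remains integer-valued, so plain rounding is a valid quantiser in the lattice-reduction domain; `x_rec` recovers `x`.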

Model of the NLR-OSIC algorithm
Previous LR detection work mainly aims to obtain better performance of the LDs. Since the OSIC system provides better performance than the LDs, our algorithm uses the LR scheme to improve the performance of the OSIC system instead. We improve the performance by maximising the SINR of the detected signal at each stage of the OSIC system. Specifically, our algorithm maximises the SINR of the detected symbol at the $i$th stage by maximising the corresponding diagonal element of the triangular factor of the reduced channel matrix. This means that we need to find a unimodular matrix $\mathbf{T}$ and the reduced lattice basis $\tilde{\mathbf{H}} = \mathbf{H}\mathbf{T}$ such that this diagonal element is maximised at the $i$th stage. The theorem below gives an equivalent target for our NLR-OSIC algorithm.
where $U_{a,b}$ and $L_{a,b}$ are the elements in the $a$th row and the $b$th column of $\mathbf{U}$ and $\mathbf{L}$, respectively, and $i = 1, 2, \ldots, N$.
Proof: See Appendix 1. □
As our target is to find a unimodular matrix $\mathbf{T}$ that maximises this diagonal element, the problem can be transformed into finding a matrix $\mathbf{T}$ such that $L_{K-i+1,N-i+1}$ is minimised at the $i$th stage. Note that $\mathbf{V} = \mathbf{T}^{-H}$ is still a unimodular matrix. The problem above is therefore equivalent to finding a unimodular matrix $\mathbf{V}$ such that $L_{K-i+1,N-i+1}$ is minimised at the $i$th stage.

Algorithm in each stage
We have shown the target of the NLR-OSIC algorithm in the last subsection. Our NLR-OSIC algorithm has $N$ stages, i.e. we find a unimodular matrix that minimises $L_{K-i+1,N-i+1}$ at the $i$th stage, and then multiply these unimodular matrices to get the matrix $\mathbf{V}$; the details can be found in the following subsection. Now we give an algorithm to find a unimodular matrix that minimises $L_{K-i+1,N-i+1}$ at the $i$th stage. In the following subsection, it will be shown that, before each stage, the reduced matrix has the same form as $\mathbf{M} \in \mathbb{R}^{(l+n) \times (l+m)}$. Before deriving our algorithm, Lemma 1 gives us a method to find the above unimodular matrix. Since the SVP has been studied for a long time, there are many algorithms to find the shortest vector. However, the optimal solution of the SVP has very large complexity, which is not acceptable in MIMO detection [23], whereas suboptimal algorithms with polynomial complexity can achieve good performance. Here we modify the algorithm in [20] so that it can be applied to our problem with smaller computational complexity. Our algorithm is suboptimal, but it has smaller complexity than the alternatives.
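Note that the product of a series of column-addition matrices is a unimodular matrix [20]. Our algorithm factorises $\mathbf{T}'(m)$ into column-addition matrices and then solves for the column-addition matrices iteratively. Each column-addition matrix can be obtained by updating the $k$th column of the identity matrix $\mathbf{I}$ as
$$\mathbf{I}_k \leftarrow \mathbf{I}_k + \mu_{ik}\mathbf{I}_i \tag{19}$$
where $\mathbf{I}_i$ and $\mathbf{I}_k$ are the $i$th and the $k$th columns of the matrix $\mathbf{I}$, respectively, and $\mu_{ik}$ is an integer. Set $\mathbf{W} = \mathbf{M}(1)$; multiplying this matrix by the above column-addition matrix, we update the $k$th column of $\mathbf{W}$ as
$$\tilde{\mathbf{W}}_k = \mathbf{W}_k + \mu_{ik}\mathbf{W}_i \tag{20}$$
which gives the new matrix $\tilde{\mathbf{W}}$. After one column-addition operation, the squared 2-norm of the $k$th column of $\tilde{\mathbf{W}}$ is
$$\|\tilde{\mathbf{W}}_k\|_2^2 = \|\mathbf{W}_k\|_2^2 + 2\mu_{ik}\mathbf{W}_i^T\mathbf{W}_k + \mu_{ik}^2\|\mathbf{W}_i\|_2^2 \tag{21}$$
while the norms of the other columns remain unchanged. The reduction $\Delta_{ik}$ is defined as
$$\Delta_{ik} = \|\mathbf{W}_k\|_2^2 - \|\tilde{\mathbf{W}}_k\|_2^2 = -2\mu_{ik}\mathbf{W}_i^T\mathbf{W}_k - \mu_{ik}^2\|\mathbf{W}_i\|_2^2 \tag{22}$$
Considering (22), the partial derivative of $\Delta_{ik}$ can be written as
$$\frac{\partial \Delta_{ik}}{\partial \mu_{ik}} = -2\mathbf{W}_i^T\mathbf{W}_k - 2\mu_{ik}\|\mathbf{W}_i\|_2^2 \tag{23}$$
and $\Delta_{ik}$ is maximised over the integers when
$$\mu_{ik} = -\left\lfloor \frac{\mathbf{W}_i^T\mathbf{W}_k}{\|\mathbf{W}_i\|_2^2} \right\rceil \tag{24}$$
where $\lfloor \cdot \rceil$ is a rounding function. From the above, given the pair $(i, k)$, we use $\mu_{ik}$ in (24) to update $\mathbf{W}$ as in (20), so that $\Delta_{ik}$ is maximised in the column-addition operation. The problem then becomes how to choose the pairs $(i, k)$ sequentially to get the shortest vector.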
Zhou and Ma [20] give one efficient way to choose the pairs $(i, k)$ sequentially. We call $\|\mathbf{W}_k\|_2^2$ reducible if there exist one or more positive $\Delta_{ik}$ ($i \neq k$) in (22). In each iteration, our algorithm finds the largest reducible $\|\mathbf{W}_k\|_2^2$. Then, we calculate all $\Delta_{ik}$, $i \neq k$, and choose $i = \arg\max_{i \neq k} \Delta_{ik}$. We use $\mu_{ik}$ in (24) to update the matrix $\mathbf{W}$. The iterations continue until all $\|\mathbf{W}_k\|_2^2$ are irreducible. Finally, we find the shortest column vector and exchange it with the last column of $\mathbf{W}$, so that the last column is the shortest and $L_{n,m}$ is minimised. Then, according to Lemma 1, we can use $\mathbf{T}'(m)$ to get $\mathbf{T}(m)$.
A detailed description of the algorithm is given in Fig. 2.
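The greedy selection of pairs $(i, k)$ described above can be sketched as follows. This is an illustrative reading of the text, not a transcription of Fig. 2: it operates on a generic real matrix `W`, ignores the block structure of $\mathbf{M}$ and Lemma 1, and uses a hypothetical iteration cap `max_iter` and tolerance as safeguards.

```python
import numpy as np

def greedy_column_reduction(W, max_iter=1000):
    """Shorten the columns of W by integer column additions.
    Returns (W_red, T) with W_red = W @ T and T unimodular."""
    W = np.array(W, dtype=float)
    n = W.shape[1]
    T = np.eye(n, dtype=int)
    for _ in range(max_iter):
        G = W.T @ W                          # G[i, k] = W_i^T W_k
        best = None
        for k in np.argsort(-np.diag(G)):    # longest column first
            gains = np.full(n, -np.inf)
            mus = np.zeros(n, dtype=int)
            for i in range(n):
                if i == k:
                    continue
                mu = -int(np.rint(G[i, k] / G[i, i]))               # optimal integer step
                mus[i] = mu
                gains[i] = -2 * mu * G[i, k] - mu ** 2 * G[i, i]    # norm reduction
            i = int(np.argmax(gains))
            if gains[i] > 1e-12:             # column k is reducible
                best = (i, k, mus[i])
                break
        if best is None:                     # every column is irreducible
            break
        i, k, mu = best
        W[:, k] += mu * W[:, i]
        T[:, k] += mu * T[:, i]
    # move the shortest column to the last position
    s, last = int(np.argmin(np.sum(W ** 2, axis=0))), n - 1
    W[:, [s, last]] = W[:, [last, s]]
    T[:, [s, last]] = T[:, [last, s]]
    return W, T
```

Each accepted update strictly decreases the sum of the squared column norms, so the loop terminates on a lattice basis; the final swap is a column permutation, so `T` stays unimodular.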

Description of the NLR-OSIC algorithm
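Firstly, we introduce one technique which is used in our algorithm. Suppose $\mathbf{x} \in \mathbb{R}^N$; we define the Householder vector [24]
$$\mathbf{v} = \mathbf{x} - \|\mathbf{x}\|_2\,\mathbf{e}_N$$
where $\mathbf{e}_N$ is the $N$th column of the identity matrix. The Householder reflection (HR) [24] is defined as
$$\mathbf{P} = \mathbf{I} - \frac{2\,\mathbf{v}\mathbf{v}^T}{\mathbf{v}^T\mathbf{v}}$$
and $\mathbf{P}$ is a unitary matrix. Then, we have $\mathbf{P}\mathbf{x} = \|\mathbf{x}\|_2\,\mathbf{e}_N$.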
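The property $\mathbf{P}\mathbf{x} = \|\mathbf{x}\|_2\,\mathbf{e}_N$ can be verified with a few lines of code. This is a minimal sketch assuming the standard Householder construction; the function name `householder_to_last` is illustrative.

```python
import numpy as np

def householder_to_last(x):
    """Return an orthogonal, symmetric P with P @ x = ||x||_2 * e_N."""
    x = np.asarray(x, dtype=float)
    N = x.size
    e_N = np.zeros(N)
    e_N[-1] = 1.0
    v = x - np.linalg.norm(x) * e_N      # Householder vector
    if np.allclose(v, 0.0):              # x is already a multiple of e_N
        return np.eye(N)
    return np.eye(N) - 2.0 * np.outer(v, v) / (v @ v)
```

For example, with `x = [3, 4, 0, 12]` (norm 13) the reflection maps `x` onto `[0, 0, 0, 13]`, concentrating all the energy of the vector in its last entry.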
From the above analysis, we now derive the unimodular matrix $\mathbf{V}$ in (17). Our algorithm has $N$ stages, and in the $i$th stage, we use the LR scheme to find one unimodular matrix that minimises $L_{K-i+1,N-i+1}$. Once we have these unimodular matrices, we multiply them to get the matrix $\mathbf{V}$. The details are as follows. where and define $\hat{\mathbf{B}}$ .., $N$; $j$, $K - N + k$, and $\mathbf{B}_{N-i-1}$ has the same form as $\mathbf{M}$ in the last subsection.
Note that $\hat{\mathbf{B}}_{N-1}$ has the form of (25); using the method above, we can get $\hat{\mathbf{B}}_{N-2}$, and use $\hat{\mathbf{B}}_{N-2}$ to get $\hat{\mathbf{B}}_{N-3}$. This procedure continues until we get $\hat{\mathbf{B}}_1$. Finally, we have Manipulating (28) as Then, is a unimodular matrix, and $\hat{\mathbf{B}}_1$ is a lower triangular matrix; comparing with (17), we have and from $\mathbf{V} = \mathbf{T}^{-H}$ as shown above, we have $\mathbf{T} = \mathbf{V}^{-H}$. The steps of our NLR-OSIC algorithm are shown in Fig. 3.
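Once we have $\mathbf{T}$, substituting it into (12), we get
$$\mathbf{y} = \tilde{\mathbf{H}}\tilde{\mathbf{x}} + \mathbf{z}$$
where $\tilde{\mathbf{H}} = \mathbf{H}\mathbf{T}$, and $\tilde{\mathbf{x}} = \mathbf{T}^{-1}\mathbf{x}$ is in $S'$ as described in Section 2. Then, we use the OSIC algorithm in Section 2 to derive the estimate of $\tilde{\mathbf{x}}$, and we obtain the final detected signal through (14). Our NLR-OSIC in the MMSE case also replaces $\mathbf{H}$ and $\mathbf{y}$ with $\mathbf{H}_{\mathrm{MMSE}}$ and $\mathbf{y}_{\mathrm{MMSE}}$.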

Performance analysis and simulations
In the OSIC system, we expand (7) as
$$\tilde{y}_i = R_{i,i}x_i + \sum_{j=i+1}^{N} R_{i,j}x_j + \tilde{z}_i \tag{31}$$
Suppose we detect $x_j$ ($j > i$) as $\hat{x}_j$, and let $\Delta x_j = x_j - \hat{x}_j$; then (31) can be written as
$$\tilde{y}_i = R_{i,i}x_i + \sum_{j=i+1}^{N} R_{i,j}\hat{x}_j + \sum_{j=i+1}^{N} R_{i,j}\Delta x_j + \tilde{z}_i \tag{32}$$
The bit error probability of $x_i$ is affected by the SINR of $x_i$. From (33), reducing the error propagation of $x_j$ ($j > i$) and enlarging $R_{i,i}$ both help to improve the SINR of $x_i$. In our algorithm, the target is to maximise $R_{i,i}$ from $R_{N,N}$ to $R_{1,1}$. In the first stage, maximising $R_{N,N}$ is equivalent to maximising the SINR of $x_N$, so the error probability of $x_N$ is minimised, which means that the error propagation of $x_N$ is minimised. Then, in the second stage, as the error propagation of $x_N$ is minimised, maximising $R_{N-1,N-1}$ results in maximising the SINR of $x_{N-1}$. We can then sequentially maximise the SINR of $x_{N-2}$, $x_{N-3}$ and so on. In contrast, the previous LR algorithms do not aim to maximise the SINR of $x_i$ in each stage of the OSIC system, which results in worse performance than our algorithm in the OSIC system.
Next, we validate the performance of the NLR-OSIC algorithm through computer simulations over various MIMO systems. We compare the performance of the NLR-OSIC algorithm with the LLL-OSIC [17,20], the ELR-OSIC [20], the ZF (MMSE)-OSIC [22], and the ML algorithm [3,4]. The LLL-OSIC and the ELR-OSIC are briefly described in Section 2, and the ordering processes of the LLL-OSIC and ELR-OSIC are the same as in the VBLAST system. The ML algorithm has the optimal performance but exponential complexity. Fig. 4 compares the error performance of the algorithms in the ZF case with 16QAM and $N_r = N_t = 8$. Figs. 4 and 5 show that the LR-aided algorithms perform better than the VBLAST system but worse than the ML algorithm. Fig. 4 shows that the LLL-OSIC algorithm performs better than the ELR-OSIC in the ZF case in the 8 × 8 system, whereas Fig. 5 shows that the ELR-OSIC algorithm performs better than the LLL-OSIC in the MMSE case. Moreover, both figures confirm that our proposed algorithm achieves better error performance than the previous LR-OSIC detectors in the MIMO systems.
To probe a little further, we simulate the BER performance for the correlated fading channel. We use the Kronecker-structure-based channel model [20,25], in which the channel matrix is generated as
$$\mathbf{H}_{\mathrm{corr}} = \mathbf{C}_r^{1/2}\,\mathbf{H}^c\,\mathbf{C}_t^{1/2}$$
where $\mathbf{H}^c$ is the channel matrix defined in Section 2, $\mathbf{C}_r$ and $\mathbf{C}_t$ denote the channel covariance matrices at the receiver and transmitter, and $\mathbf{A}^{1/2}\mathbf{A}^{1/2} = \mathbf{A}$. We assume a uniform linear array antenna configuration [20,25], for which $\mathbf{C}_r$ and $\mathbf{C}_t$ have Toeplitz structures, where $c_r$ and $c_t$ denote the values of the correlation between adjacent antennas and $(\cdot)^*$ denotes the conjugate of $(\cdot)$. Fig. 6 shows the attainable BER performance as a function of $\sigma_x^2/\sigma_z^2$ for $c_r = c_t = 0.3$ in the 8 × 8 system with 16QAM. We simulate the performance in the MMSE case, as the MMSE provides better performance and is more widely used in MIMO systems. From Fig. 6, we find that our proposed algorithm also achieves better error performance than the previous LR-OSIC detectors in MIMO systems with correlated fading channels.
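A Kronecker-structured correlated channel can be generated as in the sketch below. The exact Toeplitz entries used in the paper are not recoverable here, so the code assumes a real exponential correlation profile in which entry $(i, j)$ equals $c^{|i-j|}$; the function names `exp_toeplitz`, `psd_sqrt`, and `correlated_channel` are illustrative.

```python
import numpy as np

def exp_toeplitz(n, c):
    """Assumed correlation matrix: entry (i, j) equals c ** |i - j|."""
    idx = np.arange(n)
    return c ** np.abs(idx[:, None] - idx[None, :])

def psd_sqrt(A):
    """Symmetric square root S of a positive semidefinite A, so S @ S = A."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def correlated_channel(Hc, c_r, c_t):
    """Apply receive and transmit correlation to an i.i.d. channel Hc."""
    Cr = exp_toeplitz(Hc.shape[0], c_r)
    Ct = exp_toeplitz(Hc.shape[1], c_t)
    return psd_sqrt(Cr) @ Hc @ psd_sqrt(Ct)
```

With `c_r = c_t = 0` both covariance matrices reduce to the identity and the uncorrelated channel is returned unchanged; larger correlation values make the columns of the channel more aligned, which is the regime where lattice reduction is expected to matter most.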

Complexity analysis of NLR-OSIC algorithm
To a large extent, the complexity of the LR-aided algorithms is affected by the required number of iterations, i.e. the number of updates of the matrix $\mathbf{H}$ (which is assessed by means of simulation results in Figs. 7 and 8). We now give a brief analysis of the per-iteration complexity. At the $i$th stage of our algorithm, each iteration needs to update a $(K-i+1) \times (N-i+1)$ matrix, whereas both the LLL and ELR algorithms update a $K \times N$ matrix in each iteration [17,20]. The computational complexity is compared in Figs. 9 and 10, which show the empirical CDFs of the total number of arithmetic operations. The simulation environment of Fig. 9 (Fig. 10) is the same as that of Fig. 7 (Fig. 8). The simulation results show that the LLL also has the largest computational complexity, whereas the ELR-OSIC has the lowest computational complexity, and our proposed algorithm has moderate computational complexity.

Conclusion
The LR-aided algorithms have become promising detection algorithms in MIMO systems because of their excellent tradeoff between performance and complexity. In this paper, we propose a novel LR-aided algorithm which uses the LR scheme to improve the performance of the OSIC system. Like the VBLAST system, we improve the performance by maximising the SINR of the detected symbol at each stage of the OSIC system. We show that maximising the SINR is equivalent to a shortest vector problem. In addition, through computer simulations, we demonstrate that our proposed algorithm obtains better BER performance than the previous LR-OSIC algorithms with a moderate increase in complexity.

Acknowledgment
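This work was supported by the Natural Science Foundation of China under Grants 61372126 and 61302101.

Appendix 1: Proof of Theorem 1
empty, and $\mathbf{U} = \mathbf{X}$. From [24], $\mathbf{U}^{-H}$ has the form as follows where $\mathbf{Y} = \mathbf{X}^{-1}$ is also an upper triangular matrix. As $\mathbf{X}\mathbf{Y} = \mathbf{I}$, we know
$$\sum_{k=1}^{N} X_{i,k}Y_{k,i} = 1, \quad i = 1, 2, \ldots, N. \tag{41}$$
As both $\mathbf{X}$ and $\mathbf{Y}$ are upper triangular matrices, we know $X_{i,k} = 0$ for $i > k$ and $Y_{k,i} = 0$ for $k > i$. Then
$$X_{i,k}Y_{k,i} = 0, \quad k \neq i. \tag{42}$$
Substituting (42) into (41), we have $X_{i,i}Y_{i,i} = 1$, $i = 1, 2, \ldots, N$. So,
$$Y_{i,i} = \frac{1}{X_{i,i}}, \quad i = 1, 2, \ldots, N. \tag{43}$$
Multiplying (40) by the row permutation matrix $\mathbf{L}$, we have As $\mathbf{Y}$ is an upper triangular matrix, $\mathbf{Y}^H$ is a lower triangular matrix, and then $\mathbf{L}\mathbf{U}^{-H}$ is a lower triangular matrix. From (38), we know Substituting (46) into (43), we can get (15). □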

Appendix 2: Proof of Lemma 1
Proof: Firstly, as $\mathbf{T}'(m)$ is a real unimodular matrix, the determinant of $\mathbf{T}'(m)$ is $\pm 1$, and all the elements of $\mathbf{T}'(m)$ are integers. It then follows that all the elements of $\mathbf{T}(m)$ are integers. Moreover, from [24], we know that where $|\cdot|$ is the determinant. Then $|\mathbf{T}(m)| = \pm 1 \cdot 1 = \pm 1$.