Weighted Gauss-Seidel Precoder for Downlink Massive MIMO Systems

In this paper, a novel precoding scheme based on the Gauss-Seidel (GS) method is proposed for downlink massive multiple-input multiple-output (MIMO) systems. The GS method iteratively approximates the matrix inversion and reduces the overall complexity of the precoding process. In addition, the GS method converges quickly to the zero-forcing (ZF) method, which requires an exact matrix inversion. However, to satisfy the demanded error performance and converge to the error performance of the ZF method under practical conditions such as spatially correlated channels, the GS method needs more iterations, which increases the overall complexity. For efficient approximation with fewer iterations, this paper proposes a weighted GS (WGS) method that improves the approximation accuracy of the GS method. The optimal weights, which accelerate convergence by improving accuracy, are computed by the least squares (LS) method. After the weights are computed, a different weight is applied at each iteration of the GS method. In addition, an efficient weight computation method is proposed to reduce the complexity of the LS method. The simulation results show that the proposed scheme with fewer iterations achieves better bit error rate (BER) performance than the GS method in spatially correlated channels.

intra-cell interference (ICI) in a single-cell system [1][2][3][4][5][6]. Indeed, various non-linear processing methods exist, and they provide better error performance than linear processing methods [7]. However, the complexity of non-linear processing, already a known problem, becomes even higher in large-scale MIMO systems. Therefore, the ZF method has attracted attention for massive MIMO systems as a simple linear processing method with optimal performance [5]. However, the ZF method requires a direct matrix inversion whose computational complexity in multiplications is $O(N_u^3)$, where $N_u$ is the number of users. This complexity is unaffordable for the practical implementation of massive MIMO systems, which serve tens of users as the number of base station (BS) antennas increases [6]. Several low-complexity precoding and detection schemes were surveyed in [8][9][10][11]. To balance complexity and performance, many studies have approximated the matrix inversion to reduce the complexity [12][13][14][15][16][17][18][19][20][21][22][23][24]. The approximate methods can be classified into two categories: the Neumann series expansion (NSE) and the iterative method.
The NSE method approximates the matrix inversion by the expansion terms of a matrix polynomial. An invertible initial matrix is required to use the NSE method. In [12,13], a diagonal matrix was used as the initial matrix of the NSE method to achieve error performance equivalent to the ZF method with low complexity. However, the NSE method has two limitations in its practical application to massive MIMO systems. Firstly, the expansion of the NSE method should not exceed 2 terms, since an expansion of 3 terms requires the same $O(N_u^3)$ complexity as direct matrix inversion. Secondly, the NSE method can only achieve a high convergence probability for a large ratio $\rho > 5.83$, where the ratio is defined as $\rho = N_b/N_u$ and $N_b$ is the number of BS antennas [13]. Thus, it is difficult for the NSE method to achieve both low complexity and high approximation accuracy at small $\rho$.
To obtain better approximation accuracy than the NSE method at the same complexity, the iterative method approximates the product of the matrix inverse and a vector instead of the inverse itself. Since the iterative method only requires matrix-vector multiplications, each iteration maintains a complexity of $O(N_u^2)$. The error performance of iterative methods depends on the initial solution and the iteration matrix. The GS method requires fewer iterations to reach the demanded error performance than other iterative methods such as the Jacobi method [9].
To improve the error performance of the GS method, various initial solutions were defined in [15][16][17][18][19]. The stair matrix was proposed as the initial solution in [15]. One expansion term of the NSE method, with the diagonal matrix as the initial matrix, was proposed as the initial solution in [16]. Also, two expansion terms of the NSE method were proposed as the initial solution in [17][18][19]. In addition to these initial solutions, successive over-relaxation (SOR) methods, which converge faster than the GS method with the aid of a relaxation parameter, were proposed in [20][21][22]. The GS and SOR methods can converge to the error performance of the ZF method with few iterations under ideal conditions such as a large ρ. However, the convergence rate and approximation accuracy are reduced under practical conditions such as a small ρ and spatially correlated channels. To solve these problems, GS methods based on the soft-output detector for uplink systems were proposed in [16][17][18], and an SOR method applying adaptive relaxation parameters at each iteration was proposed in [22]. Another way to solve these problems is to apply different weights to each expansion term or iteration to improve the error performance. NSE methods applying different weights were proposed in [23,24], and approximate methods with properly computed weights can achieve better error performance. However, References [12][13][14][15][16][17][18][19][20][21][22] did not consider the performance improvement by weighting, and the weight for each iteration was simply set to the default value of 1. In addition to weighting the approximate method, considering a practical channel environment is important for performance verification [6]. In particular, this paper verifies the error performance of iterative methods in spatially correlated channels.
In this paper, a WGS method that implements the ZF precoder is proposed, and it provides better error performance in spatially correlated channels. The weights that minimize the error between the exact and approximate matrix inversion are computed by the LS method. To make the LS method feasible, the process of the GS method is transformed into the form of the NSE. The optimal weights are computed from the channel response in each channel coherence interval. However, the proposed scheme uses only one weight vector among the optimal weights, since this is more efficient in terms of computational complexity. Also, this paper proposes an efficient method to reduce the complexity of the weight computation. The simulation results show that the WGS precoder provides better error performance than the GS precoder. This paper is organized as follows. Section 2 presents the models for the downlink massive MIMO system and spatially correlated channels. Section 3 explains iterative methods, the GS precoder, and improvements of the GS method. In Section 4, the WGS precoder and the efficient weight computation method are proposed. Section 4 also provides the computational complexity of the iterative methods. Section 5 presents the simulation results comparing the BER performance of the iterative methods. Finally, Section 6 gives brief conclusions.

System Model
This section presents the downlink massive MIMO system model and the channel model assumed in this paper. Fig. 1 shows the system configuration for the downlink massive MIMO, where the BS with $N_b$ transmit antennas simultaneously communicates with $N_u$ single-antenna users, and $N_b \gg N_u$ is assumed. The channel matrix between all transmit antennas and all users is $H = [h_1^T\ h_2^T\ \cdots\ h_{N_u}^T]^T$, whose $m$-th row vector is $h_m = [h_{m1}\ h_{m2}\ \cdots\ h_{mN_b}]$. The received symbol $y_m$ at the $m$-th user is as follows:

$$y_m = \sqrt{P}\, h_m g_m s_m + \sqrt{P} \sum_{i=1,\, i \neq m}^{N_u} h_m g_i s_i + z_m, \tag{1}$$

where $P$ is the downlink transmit power, $G$ is an $N_b \times N_u$ precoding matrix, $g_m$ is the $m$-th column of $G$, $s_m$ is the $m$-th complex transmit symbol with zero mean and unit variance, $z_m$ is the $m$-th additive white Gaussian noise (AWGN) with zero mean and unit variance, and $\|\cdot\|_F$ is the Frobenius norm operator.

Channel Model
For spatially correlated channels, this paper considers the exponential correlation matrix model, which is expressed as [25]

$$r_{i,j} = \begin{cases} a^{\,j-i}, & i \le j, \\ r_{j,i}^{*}, & i > j, \end{cases} \tag{2}$$

where $r_{i,j}$ is the element in the $i$-th row and $j$-th column of the spatial correlation matrix $R$, $(\cdot)^{*}$ is the complex conjugate, $a = \zeta e^{j\theta}$ is the correlation coefficient, $\zeta$ is the correlation coefficient magnitude with $0 < \zeta \le 1$, and $\theta$ is randomly determined in the interval $[0, \pi/2]$. The channel matrix $H$ is expressed as follows:

$$H = R_r^{1/2} H_w R_t^{1/2}, \tag{3}$$

where $H_w$ is an $N_u \times N_b$ Rayleigh fading channel matrix whose elements are independent and identically distributed (i.i.d.) circularly symmetric complex Gaussian random variables with zero mean and unit variance, $R_r$ is an $N_u \times N_u$ complex spatial correlation matrix for the receive antennas, and $R_t$ is an $N_b \times N_b$ complex spatial correlation matrix for the transmit antennas. The distance between adjacent antennas determines the correlation coefficient magnitude, and the spatial correlation can be neglected when the distance is larger than half a wavelength [22]. This paper considers single-antenna users and assumes that the distance between users is larger than half a wavelength. Therefore, the correlation coefficient magnitude on the user side is assumed to be zero in this paper.
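As an illustration of the channel model, the following NumPy sketch builds an exponential correlation matrix and a correlated channel. It assumes the common form r_ij = a^(j-i) for i <= j with r_ij = conj(r_ji) otherwise, and the Kronecker-type product H = R_r^(1/2) H_w R_t^(1/2) with the user-side correlation set to the identity. The function names and the use of the Hermitian square root for R^(1/2) are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def exp_corr_matrix(n, zeta, theta):
    """Exponential correlation matrix with coefficient a = zeta * exp(j*theta)."""
    a = zeta * np.exp(1j * theta)
    idx = np.arange(n)
    diff = idx[None, :] - idx[:, None]      # entry (i, j) holds j - i
    upper = a ** np.abs(diff)               # a^(|j - i|)
    # On and above the diagonal use a^(j-i); below it use the conjugate.
    return np.where(diff >= 0, upper, np.conj(upper))

def herm_sqrt(R):
    """Hermitian square root of a positive semidefinite matrix (assumed meaning of R^(1/2))."""
    vals, vecs = np.linalg.eigh(R)
    vals = np.clip(vals, 0.0, None)         # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def correlated_channel(n_u, n_b, zeta_t, rng):
    """Draw H = R_r^(1/2) H_w R_t^(1/2) with uncorrelated users (R_r = I)."""
    theta = rng.uniform(0.0, np.pi / 2)
    R_t = exp_corr_matrix(n_b, zeta_t, theta)
    R_r = np.eye(n_u)
    H_w = (rng.standard_normal((n_u, n_b))
           + 1j * rng.standard_normal((n_u, n_b))) / np.sqrt(2)
    return herm_sqrt(R_r) @ H_w @ herm_sqrt(R_t)
```

For example, `correlated_channel(20, 200, 0.5, np.random.default_rng(0))` draws one channel realization with 200 BS antennas, 20 users, and a transmit correlation magnitude of 0.5.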

Approximate Methods
In this section, iterative methods and the relationship between iterative method and NSE are briefly explained, and the GS precoders are presented.

Iterative Methods
The iterative method approximates the signal vector $x$ that solves the linear equation $Wx = s$. The iterative method can be expressed as follows:

$$M x_k = N x_{k-1} + s, \quad k = 1, 2, \ldots, \tag{4}$$

where $W = M - N$, $M$ is a nonsingular matrix, the subscript $k$ is the iteration number, and $x_0$ is an initial solution that is generally assumed to be the zero vector. Solving Eq. (4) for $x_k$ gives

$$x_k = B x_{k-1} + M^{-1} s, \tag{5}$$

where $B = M^{-1}N$ is the iteration matrix. For the convergence $\lim_{k \to \infty} x_k = W^{-1}s$, the condition $\rho(B) < 1$ should be satisfied, where $\rho(\cdot)$ is the spectral radius of a matrix. In comparison to the iterative method, the NSE method is expressed as follows:

$$W^{-1} = \sum_{n=0}^{\infty} \left(I_N - M^{-1}W\right)^{n} M^{-1}, \tag{6}$$

where $I_N$ is the $N \times N$ identity matrix. Since $I_N - M^{-1}W = M^{-1}N = B$, the iterative method can be transformed into the form of the NSE as follows:

$$x_k = \sum_{n=0}^{k-1} B^{n} M^{-1} s, \tag{7}$$

where the initial solution $x_0$ is the zero vector, and $M$ is called the preconditioner, which determines the error performance of approximate methods.
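The equivalence between the iterative method and the truncated series can be checked numerically. The sketch below assumes the splitting W = M - N, the recursion M x_k = N x_{k-1} + s with x_0 = 0, and the truncated sum of B^n M^{-1} s with B = M^{-1} N. The small test matrix and the choice of a diagonal preconditioner (the Jacobi splitting) are illustrative assumptions only.

```python
import numpy as np

def iterate(M, N, s, k):
    """Run k steps of M x_k = N x_{k-1} + s starting from the zero vector."""
    x = np.zeros_like(s)
    for _ in range(k):
        x = np.linalg.solve(M, N @ x + s)
    return x

def neumann_form(M, N, s, k):
    """Evaluate sum_{n=0}^{k-1} B^n M^{-1} s with B = M^{-1} N."""
    term = np.linalg.solve(M, s)               # n = 0 term: M^{-1} s
    total = term.copy()
    for _ in range(1, k):
        term = np.linalg.solve(M, N @ term)    # multiply the previous term by B
        total += term
    return total

def spectral_radius(M, N):
    """Largest eigenvalue magnitude of the iteration matrix B = M^{-1} N."""
    B = np.linalg.solve(M, N)
    return np.max(np.abs(np.linalg.eigvals(B)))

# Illustrative example: a diagonally dominant W with a diagonal preconditioner.
W = np.array([[4.0, 1.0, 0.0], [1.0, 5.0, 2.0], [0.0, 2.0, 6.0]])
M = np.diag(np.diag(W))
N = M - W
s = np.array([1.0, 2.0, 3.0])
print(spectral_radius(M, N) < 1)                                        # convergence condition
print(np.allclose(iterate(M, N, s, 5), neumann_form(M, N, s, 5)))       # both forms agree
```

Both printed checks are expected to hold for this matrix, since strict diagonal dominance keeps the spectral radius of the Jacobi iteration matrix below one.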

GS Precoder
To express the GS precoder, the Gram matrix $W$ is defined as follows:

$$W = H H^{H}. \tag{8}$$

The preconditioner of the GS method is expressed as follows:

$$M = D + L, \tag{9}$$

where $W$ is decomposed as $W = D + L + U$, $D$ is the diagonal part, $L$ is the strictly lower triangular part, and $U$ is the strictly upper triangular part. The $k$-th GS solution $x_k$ is calculated as follows:

$$x_k = (D + L)^{-1}\left(s - U x_{k-1}\right), \tag{10}$$

where $s = [s_1\ s_2\ \cdots\ s_{N_u}]^T$ is an $N_u \times 1$ modulated symbol vector. The downlink transmit symbol vector $t$ is calculated with the matched filter as follows:

$$t = \gamma H^{H} x_k, \tag{11}$$

where $\gamma$ is a scaling factor that normalizes the total transmit power.
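A minimal NumPy sketch of the GS precoder is given below. It assumes the Gram matrix W = H H^H, the GS update x_k = (D + L)^{-1}(s - U x_{k-1}) from the zero vector, and the matched-filter output t = gamma H^H x_K. The specific choice gamma = 1/||H^H x_K|| (unit transmit power per symbol vector) is an assumption made for this example, since the text only states that gamma normalizes the total transmit power.

```python
import numpy as np

def gs_precoder(H, s, K):
    """Approximate the ZF precoder output with K Gauss-Seidel iterations."""
    W = H @ H.conj().T                  # Gram matrix, N_u x N_u
    M = np.tril(W)                      # preconditioner D + L
    U = np.triu(W, k=1)                 # strictly upper triangular part
    x = np.zeros(W.shape[0], dtype=complex)
    for _ in range(K):
        # A dedicated triangular solver would exploit the structure of M;
        # np.linalg.solve is used here only to keep the sketch short.
        x = np.linalg.solve(M, s - U @ x)
    t = H.conj().T @ x                  # matched filter
    gamma = 1.0 / np.linalg.norm(t)     # assumed normalization to unit power
    return gamma * t, x, gamma

rng = np.random.default_rng(1)
H = (rng.standard_normal((4, 64)) + 1j * rng.standard_normal((4, 64))) / np.sqrt(2)
s = np.array([1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j]) / np.sqrt(2)
t, x, gamma = gs_precoder(H, s, 30)
# With enough iterations the users receive a scaled copy of s (zero forcing).
print(np.allclose(H @ t, gamma * s))
```

Because W is Hermitian positive definite whenever H has full row rank, the GS iteration converges, and the received vector H t approaches gamma s as K grows.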

Improvement of GS Method
The initial solution $x_0$ is defined in various ways according to the system requirement, which can improve the convergence rate and approximation accuracy of iterative methods. Among the various initial solutions, a recent initial solution for the improvement of the GS method is expressed as [18]

$$x_0 = \left(D^{-1} - D^{-1} E D^{-1}\right) s, \tag{12}$$

where the matrix $E$ is defined as $E = L + U$. Another way to improve the GS method is to use a relaxation parameter. The relaxation parameter $\alpha$, which allows the GS method to work well at any $\rho$, is applied as [19]

$$x_k = (D + \alpha L)^{-1}\left(\alpha s + \left((1 - \alpha) D - \alpha U\right) x_{k-1}\right). \tag{13}$$

Eq. (13) is called the SOR method, and the GS method is the special case of the SOR method with $\alpha = 1$. The GS precoder with the initial solution of Eq. (12) and the SOR precoder with an appropriately selected $\alpha$ provide better error performance than the GS precoder with the zero-vector initial solution.
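The two improvements can be sketched as follows. The code assumes the two-term NSE initial solution x_0 = (D^{-1} - D^{-1} E D^{-1}) s with E = L + U, and the standard SOR update x_k = (D + alpha L)^{-1}(alpha s + ((1 - alpha) D - alpha U) x_{k-1}). The function names and the value alpha = 1.1 in the example are illustrative assumptions, not tuned values from the paper.

```python
import numpy as np

def two_term_initial(W, s):
    """Initial solution from two NSE terms with a diagonal initial matrix."""
    d = np.diag(W)                      # diagonal entries of W
    E = W - np.diag(d)                  # off-diagonal part L + U
    y = s / d                           # D^{-1} s
    return y - (E @ y) / d              # D^{-1} s - D^{-1} E D^{-1} s

def sor_step(W, s, x_prev, alpha):
    """One successive over-relaxation step; alpha = 1 gives Gauss-Seidel."""
    D = np.diag(np.diag(W))
    L = np.tril(W, k=-1)
    U = np.triu(W, k=1)
    rhs = alpha * s + ((1.0 - alpha) * D - alpha * U) @ x_prev
    return np.linalg.solve(D + alpha * L, rhs)

rng = np.random.default_rng(2)
H = (rng.standard_normal((4, 32)) + 1j * rng.standard_normal((4, 32))) / np.sqrt(2)
W = H @ H.conj().T
s = np.array([1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j]) / np.sqrt(2)
x = two_term_initial(W, s)
for _ in range(100):
    x = sor_step(W, s, x, alpha=1.1)    # alpha chosen arbitrarily in (0, 2)
print(np.allclose(x, np.linalg.solve(W, s)))
```

For a Hermitian positive definite W, SOR converges for any relaxation parameter between 0 and 2, so the example is expected to reach the exact solution regardless of the starting vector.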

Proposed Downlink Precoding Scheme
In Section 2, the spatially correlated channel is considered. The problem is that the spatial correlation degrades the error performance of approximate methods. Although direct elimination of the spatial correlation is difficult, the improvement of error performance can alleviate this problem to some extent. In this section, a WGS method is designed to improve the error performance of the GS precoder for downlink massive MIMO systems. The WGS method is inspired by the weighted Neumann series in [23,24]. Also, an efficient method to reduce the complexity of weight computation is proposed.

Weighted GS Precoder
The proposed scheme, which applies a different weight at each iteration of the GS method, is expressed as follows:

$$x_k = (D + L)^{-1}\left(u_{k-1} s - U x_{k-1}\right), \tag{14}$$

where $\{u_n\}_{n=0}^{k-1}$ are complex weights that decrease the approximation error, and the initial solution $x_0$ is the zero vector. For the computation of the optimal weights, the WGS method is expressed similarly to Eq. (6) as follows:

$$x_k = \sum_{n=0}^{k-1} u_{k-1-n} B^{n} M^{-1} s, \tag{15}$$

where the weights are applied in reverse order to each term of the matrix polynomial, and the weighted $W_k^{-1}$ of the GS method is expressed as follows:

$$W_k^{-1} = \sum_{n=0}^{k-1} a_n B^{n} M^{-1}, \tag{16}$$

where $a_n = u_{k-1-n}$ is the weight in reverse order, introduced to simplify the subscript. The weights can be computed by the LS method, which determines the solution of a linear model in which the given data is the channel response [26]. Firstly, the optimal weights should minimize the following problem:

$$\min_{a_0, \ldots, a_{k-1}} \left\| W W_k^{-1} - I_{N_u} \right\|_F^2. \tag{17}$$

For the computation of the optimal weights, the $j$-th term of the matrix polynomial of $W W_k^{-1}$, excluding the weight parameter, is defined as follows:

$$C^{(j)} = W B^{j} M^{-1} = \left[c_1^{(j)}\ c_2^{(j)}\ \cdots\ c_{N_u}^{(j)}\right], \quad j = 0, 1, \ldots, k-1, \tag{18}$$

where $c_\ell^{(j)}$ is the $\ell$-th column vector of $C^{(j)}$ and the $\ell$-th column vector of $I_{N_u}$ is defined as $e_\ell$.
Secondly, the optimization problem in Eq. (17) can be rewritten in a form that computes the weights as the solution of a linear equation. The modified optimization problem is defined as follows:

$$\min_{a} \sum_{\ell=1}^{N_u} \left\| \sum_{j=0}^{k-1} a_j c_\ell^{(j)} - e_\ell \right\|_2^2 = \min_{a} \left\| Z a - b \right\|_2^2, \tag{19}$$

where $a = [a_0\ a_1\ \cdots\ a_{k-1}]^T$ is the weight vector, and the $N_u^2 \times k$ matrix $Z$ and the $N_u^2 \times 1$ vector $b$ are defined as follows:

$$Z = \begin{bmatrix} c_1^{(0)} & \cdots & c_1^{(k-1)} \\ \vdots & & \vdots \\ c_{N_u}^{(0)} & \cdots & c_{N_u}^{(k-1)} \end{bmatrix}, \quad b = \begin{bmatrix} e_1 \\ \vdots \\ e_{N_u} \end{bmatrix}. \tag{20}$$

Finally, the weight vector computed by the LS method is expressed as follows:

$$a_{opt} = \left(Z^{H} Z\right)^{-1} Z^{H} b. \tag{21}$$

In this way, the computation of the optimal weights should be performed at every channel coherence interval, and the computational complexity is higher than that of direct matrix inversion. To reduce the complexity, the WGS precoder uses only one optimal weight vector, computed early in the system operation. The computation of the weights is executed only once and is not included in the approximation complexity. Since the optimal weights fall within a certain distribution, any weights in that distribution can sufficiently improve the error performance of the GS precoder. Fig. 2 shows the distribution of the optimal weights for 5 expansion terms of the matrix polynomial in the complex plane. To illustrate the effect of the spatial correlation and the ratio $\rho = N_b/N_u$ on the weights, three different cases are presented. In Fig. 2, $n$ is the order of the weights, and $N_b$ is fixed at 200. In case (a), $N_u$ is 20 and $\zeta$ is 0; in case (b), $N_u$ is 40 and $\zeta$ is 0; and in case (c), $N_u$ is 40 and $\zeta$ is 0.3. The number of optimal weights representing the distribution is 200 for each term of the matrix polynomial. The distribution in case (c) is the largest in Fig. 2, but it is still small enough, since the width of the distribution in both the real and imaginary directions is close to 1. The optimal weights change according to $\rho$ and $\zeta$, since the correlation degree of the channels is different. The weights of the first and second orders overlap, since their values are close to 1. In contrast, the absolute value of the higher-order weights clearly increases as the correlation degree of the channels increases.
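The LS weight computation can be sketched in NumPy as follows. The sketch assumes the polynomial terms C^(j) = W B^j M^{-1} with M = D + L and B = -M^{-1} U, stacks the columns of each C^(j) into one column of Z, stacks the columns of the identity into b, and solves the least squares problem min ||Z a - b||. The helper names are hypothetical, and `np.linalg.lstsq` is used in place of forming the normal equations explicitly.

```python
import numpy as np

def poly_terms(W, k):
    """Return C^(j) = W B^j M^{-1} for j = 0..k-1 (GS splitting of W)."""
    M = np.tril(W)                          # D + L
    U = np.triu(W, k=1)
    T = np.linalg.inv(M)                    # B^0 M^{-1}
    terms = []
    for _ in range(k):
        terms.append(W @ T)
        T = -np.linalg.solve(M, U @ T)      # next power: B T with B = -M^{-1} U
    return terms

def ls_weights(W, k):
    """Weights a_0..a_{k-1} minimizing ||sum_j a_j C^(j) - I||_F."""
    terms = poly_terms(W, k)
    # Column-stack each C^(j) into one column of Z (N_u^2 x k).
    Z = np.stack([C.reshape(-1, order="F") for C in terms], axis=1)
    b = np.eye(W.shape[0], dtype=complex).reshape(-1, order="F")
    a, *_ = np.linalg.lstsq(Z, b, rcond=None)
    return a

def approx_error(W, a):
    """Squared Frobenius error between the weighted polynomial and the identity."""
    terms = poly_terms(W, len(a))
    approx = sum(a_j * C for a_j, C in zip(a, terms))
    return np.linalg.norm(approx - np.eye(W.shape[0]), "fro") ** 2

rng = np.random.default_rng(3)
H = (rng.standard_normal((4, 16)) + 1j * rng.standard_normal((4, 16))) / np.sqrt(2)
W = H @ H.conj().T
a = ls_weights(W, 3)
# The all-ones weight vector corresponds to the unweighted GS method.
print(approx_error(W, a) <= approx_error(W, np.ones(3)))
```

Since the all-ones vector is a feasible point of the same least squares problem, the LS weights can never give a larger approximation error than the unweighted GS polynomial of the same order.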

Efficient Weight Computation Method
As seen in the above process, the LS method can compute the optimal weights. However, this computation requires a very high computational complexity of $O(N_u^4)$. This paper proposes an efficient method to compute the weights with less complexity. The complexity can be reduced by modifying $Z$ and $b$ in Eq. (21). The modified $Z$ is defined as the $N_u \times k$ combined matrix $X$ as follows:

$$X = \left[\mathrm{diag}\left(C^{(0)}\right)\ \mathrm{diag}\left(C^{(1)}\right)\ \cdots\ \mathrm{diag}\left(C^{(k-1)}\right)\right], \tag{22}$$

where $\mathrm{diag}(\cdot)$ forms a column vector from the diagonal elements of a matrix, and the modified $b$ is defined as the $N_u \times 1$ vector $d$ whose elements are all 1. The efficient weight computation method is expressed similarly to Eq. (21) as follows:

$$a_{diag} = \left(X^{H} X\right)^{-1} X^{H} d. \tag{23}$$

Additionally, only the real part of the weights is used, to exclude the influence of the imaginary values when the weights are determined close to the distribution boundary on the imaginary axis. In terms of the number of users, the weight computation method using only the diagonal elements of each term of the matrix polynomial reduces the complexity to $O(N_u^2)$. The weights computed by Eq. (23) are close enough to the optimal weights and significantly improve the error performance of the GS method. Tab. 1 lists the meaning of the parameters in the WGS algorithm, and the WGS precoder using the real part of one $a_{diag}$ is summarized as the following algorithm:

Step 1: Input parameters s, H, K
Step 2: Initialization
Step 3: Weight computation
Step 4: GS iteration for k = 1 : 1 : K
Step 5: GS precoder
Step 6: Output parameter t

Tab. 1: Parameters in the WGS algorithm
d: the $N_u \times 1$ vector whose elements are all one
real( ): the real part of a complex value
trace( ): the sum of the diagonal elements
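The algorithm steps can be sketched end to end as below. The sketch assumes that the diagonal-only weights solve the least squares problem X a = d, where column j of X holds the diagonal of W B^j M^{-1} and d is the all-ones vector, and that iteration k of the weighted update x_k = (D + L)^{-1}(u_{k-1} s - U x_{k-1}) uses u_{k-1} = a_{K-k}. For clarity, the code forms each full matrix term before taking its diagonal, so it does not reproduce the complexity reduction described above. All function names are hypothetical.

```python
import numpy as np

def diag_weights(W, K):
    """Real part of the LS weights fitted on the diagonals of W B^j M^{-1}."""
    M = np.tril(W)
    U = np.triu(W, k=1)
    T = np.linalg.inv(M)                       # B^0 M^{-1}
    cols = []
    for _ in range(K):
        cols.append(np.diag(W @ T))            # diagonal of C^(j)
        T = -np.linalg.solve(M, U @ T)         # multiply by B = -M^{-1} U
    X = np.stack(cols, axis=1)                 # N_u x K
    d = np.ones(W.shape[0], dtype=complex)     # all-ones target vector
    a, *_ = np.linalg.lstsq(X, d, rcond=None)
    return a.real                              # keep only the real part

def wgs_precoder(H, s, K, a):
    """Weighted GS iterations followed by the matched filter (unnormalized)."""
    assert len(a) == K
    W = H @ H.conj().T
    M = np.tril(W)
    U = np.triu(W, k=1)
    u = a[::-1]                                # u_n = a_{K-1-n}: reverse order
    x = np.zeros(W.shape[0], dtype=complex)
    for k in range(1, K + 1):
        x = np.linalg.solve(M, u[k - 1] * s - U @ x)
    return H.conj().T @ x, x

rng = np.random.default_rng(4)
H = (rng.standard_normal((4, 32)) + 1j * rng.standard_normal((4, 32))) / np.sqrt(2)
s = np.array([1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j]) / np.sqrt(2)
a = diag_weights(H @ H.conj().T, 3)
t, x = wgs_precoder(H, s, 3, a)
```

Passing `np.ones(K)` as the weight vector reduces `wgs_precoder` to the plain GS iteration, which gives a simple way to compare the two methods on the same channel.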

Complexity Analysis
Tab. 2 lists the complexity of the iterative precoders compared in the simulation results. Only the number of complex multiplications for calculating $x_k$ is considered, and the complexity of the weight computation is excluded, since the weight vector is not iteratively computed during the precoding process. The difference between the WGS and the GS method is the application of the weights as a coefficient of the symbol vector in Eq. (14). The $x_k$ of the WGS method is expressed element-wise as follows:

$$x_i^{(k)} = \frac{1}{w_{i,i}} \left( u_{k-1} s_i - \sum_{j=1}^{i-1} w_{i,j} x_j^{(k)} - \sum_{j=i+1}^{N_u} w_{i,j} x_j^{(k-1)} \right), \tag{24}$$

where $x_i^{(k)}$, $x_i^{(k-1)}$, and $s_i$ are the $i$-th elements of $x_k$, $x_{k-1}$, and $s$, respectively, and $w_{i,j}$ is the element in the $i$-th row and $j$-th column of $W$. The multiplications required for one element of $x_k$ can be divided into 4 parts. The number of multiplications outside the parentheses is 1. Inside the parentheses, the number of multiplications for the first term is 1, the number for the second term is $i - 1$, and the number for the third term is $N_u - i$. Since the size of $x_k$ is $N_u \times 1$, the total number of multiplications for one iteration is $N_u^2 + N_u$.
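The multiplication count can be verified with a short tally. The sketch below assumes the four parts described above for element i: one multiplication outside the parentheses, one for the weighted symbol, i - 1 for the already updated elements, and N_u - i for the elements of the previous iteration. The unweighted count is an inference obtained by dropping the weight multiplication, not a figure quoted from Tab. 2.

```python
def wgs_multiplications_per_iteration(n_u):
    """Count complex multiplications for one weighted GS iteration."""
    total = 0
    for i in range(1, n_u + 1):
        outside = 1            # scaling by 1 / w_ii
        weighted_symbol = 1    # u_{k-1} * s_i
        lower = i - 1          # products with already updated elements
        upper = n_u - i        # products with elements of the previous iterate
        total += outside + weighted_symbol + lower + upper
    return total

def gs_multiplications_per_iteration(n_u):
    """Same tally without the weight multiplication (inferred, see lead-in)."""
    return wgs_multiplications_per_iteration(n_u) - n_u

for n_u in (20, 30, 40):
    # Compare the tally with the closed form N_u^2 + N_u.
    print(n_u, wgs_multiplications_per_iteration(n_u), n_u ** 2 + n_u)
```

The weighting therefore adds only N_u multiplications per iteration on top of the N_u^2 multiplications of the matrix-vector work.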

Tab. 2: Complexity of the iterative precoders (number of complex multiplications)
GS with the initial solution of zero vector
GS with the initial solution of Eq. (12) [14]
$H^{H} W^{-1} s$: $N_u^2 + N_b N_u$

Performance Evaluation and Discussion
In this section, the BER performances of the iterative precoders are presented as a function of the signal-to-noise ratio (SNR). The BER performance of the exact ZF precoder is provided as a benchmark. To confirm the effect of the weights on the BER performance, the GS precoder with the zero-vector initial solution is compared with the WGS precoder. Also, the BER performances of the GS precoder with the initial solution of Eq. (12) and of the SOR precoder are provided for comparison with the WGS precoder. In all simulations, the number of BS antennas is fixed at 200, the number of users is 20, 30, or 40, and 64-quadrature amplitude modulation (QAM) is used. The downlink channels are modelled as Rayleigh fading channels with or without spatial correlation. In the simulation results, $K$ indicates the number of iterations. Fig. 3 compares the use of different weight vectors for the WGS method. The number of users is 30, the correlation magnitude is 0.5, and the number of iterations is 5. In Fig. 3, "using all elements" means the use of one $a_{opt}$, "using diagonal elements" means the use of the real part of one $a_{diag}$, and "optimal weights" means the use of the optimal weight vector that changes at each coherence interval. Fig. 3a presents the distance in the complex plane between the weight values computed once by the other methods and the optimal weight values, according to the weight order. In Fig. 3a, the distance is averaged over 1000 optimal weight vectors. The graph shape can differ depending on the weight vector that is computed once. However, the graph shape does not change significantly, since the single weight vector is determined close to the certain distribution shown in Fig. 2. The distance obtained with the one $a_{opt}$ is smaller than that obtained with $a_{diag}$, and the distance of $a_{diag}$ is smaller than 1 for all weight orders. Furthermore, the BER performances are nearly the same for all weight vectors in Fig. 3b.
Thus, the use of $a_{diag}$ for the WGS precoder is reasonable in terms of complexity and BER performance, and the weight vector of the WGS precoder in all of the following simulation results is based on the real part of $a_{diag}$.
where $w_{i,j}$ is the element in the $i$-th row and $j$-th column of $W$. Since a low ratio $\rho$ and a high correlation magnitude weaken the channel hardening property, the average magnitude of the diagonal dominance of $W$ is decreased. Fig. 5 shows the BER performances of the GS precoder with the zero-vector initial solution and the WGS precoder in an ideal Rayleigh fading channel with no spatial correlation. To compare the WGS precoder with the GS precoder only with respect to the number of users, the number of users is chosen as 20, 30, and 40. For every number of users, the BER performance of the WGS precoder is close to that of the exact ZF precoder within 3 iterations. However, the WGS and GS precoders both provide nearly the same BER performance as the exact ZF precoder in Fig. 5a, since the ratio $\rho$ is large. In Figs. 5b and 5c, a difference of 2 dB is shown in the range of 18-20 dB. The difference in the BER performances becomes increasingly clear as the number of users increases. Fig. 6 shows the BER performances of the GS precoder with the zero-vector initial solution and the WGS precoder in a spatially correlated channel. The number of users is 20, and the correlation magnitude is 0.5. For $K = 4$, the BER performance of the WGS precoder is close to that of the exact ZF precoder. The WGS precoder provides better BER performance than the GS precoder for the same number of iterations, since the weights accelerate the convergence rate. The difference in the BER performances is more obvious than in Fig. 5, since the spatial correlation significantly degrades the BER performance of the GS precoder. Fig. 7 shows the difference in approximation accuracy and convergence rate of the iterative methods according to the iteration number. In Fig. 7a, the approximation error between $W W_k^{-1}$ and $I_{N_u}$ for the iterative methods is shown. The approximation error is calculated as follows:

$$\left\| W W_k^{-1} - I_{N_u} \right\|_F^2.$$

In Fig. 7b, the comparison of BER performances is shown for an SNR of 20 dB.
The number of users is 40, the correlation magnitude is 0.5, and the range of iterations is from 2 to 8, to clearly confirm that the error gradually decreases and the WGS method completely converges to the ZF method. The nearly optimal relaxation parameter of the SOR method is empirically chosen for each configuration and correlation magnitude in Fig. 7 and the following simulation results. The graph shape in Fig. 7a is similar to that in Fig. 7b. Therefore, the decrease in the approximation error is directly related to the improvement in BER performance. The WGS method shows the fastest convergence rate among the compared methods, since the weights decrease the approximation error. Also, the approximation error and BER of the WGS method with $K = 5$ are lower than those of the other methods with $K = 6$. Fig. 8 shows the BER performance of the WGS precoder, the GS precoder with the zero-vector initial solution, the GS precoder with the initial solution of Eq. (12), and the SOR precoder. To clarify the difference in BER performance among the iterative precoders, the SNR range is extended by 2 dB and 4 dB, respectively. The correlation magnitude is fixed at 0.5, and the number of users is 30 and 40. In this situation, the WGS precoder requires more iterations to approach the BER performance of the exact ZF precoder, since the convergence rate is considerably decreased. However, the WGS precoder outperforms the GS precoder with the zero-vector initial solution. In Fig. 8a, the WGS precoder is close to the exact ZF precoder within 5 iterations, and the WGS precoder provides better BER performance than the other iterative precoders for the same number of iterations. In Fig. 8b, the WGS precoder is close to the exact ZF precoder within 6 iterations, and even the WGS precoder with 5 iterations provides better BER performance than the other iterative precoders with 6 iterations.
The improvement in BER performance of the WGS precoder is noticeable for a small ratio $\rho$ and in spatially correlated channels, where the unweighted GS precoder converges slowly. The WGS precoder maintains a faster convergence rate than the other iterative precoders, but a large number of iterations can still be required to converge to the BER performance of the exact ZF precoder, as shown in Fig. 8. However, the overall complexity is decreased by reducing the number of iterations needed to satisfy the demanded BER performance. In addition, the WGS precoder still maintains lower complexity than the exact ZF precoder according to the complexity analysis in Tab. 2.

Conclusion
Approximate methods based on iterative algorithms significantly reduce the computational complexity of direct matrix inversion and are efficient for the practical implementation of massive MIMO systems. In this paper, a weighted approximate method based on an iterative algorithm is proposed to improve the approximation accuracy of the GS method. The WGS method uses a different weight for each iteration of the GS method. To avoid computing the optimal weights in every channel coherence interval, the weights are computed only once. Only the diagonal elements of the matrix polynomial terms, which are sufficient for the weight computation, are used to reduce the computational complexity. The WGS method maintains a fast convergence rate in spatially correlated channels. The simulation results show that the WGS method outperforms the GS method and provides better BER performance than the other iterative methods.