A General Hybrid Precoding Method for mmWave Massive MIMO Systems

Hybrid precoding architectures have recently been proposed to make massive Multiple-Input Multiple-Output (MIMO) systems practical to implement in Fifth Generation (5G) networks. In this paper, a general precoding method is investigated for Millimeter Wave (mmWave) multi-user systems; it comprises the design of the analog Radio Frequency (RF) precoding matrix and of the digital baseband precoding matrix. In the general hybrid architecture, the analog part consists of independent analog sub-arrays, each fully connected internally. The analog precoding matrix is designed by maximizing the Signal-to-Leakage-plus-Noise Ratio (SLNR) using only the long-term statistics of the user groups. Because of the constant-modulus constraint of the RF chains, a supplemental matrix is introduced to reduce the performance loss. The digital precoding matrix performs Regularized Zero-Forcing (RZF) on the reduced-dimension effective channels. Simulation results demonstrate the performance improvement of the proposed precoding method and show that it handles the trade-off between performance and complexity well.


Introduction
To meet the dramatically increasing spectral-efficiency requirements of fifth generation (5G) systems, massive multiple-input multiple-output (MIMO) has been well studied as a promising technology. Massive MIMO systems are based on a large antenna array at the base station (BS). In this way, signal energy can be focused into small zones in the direction of the desired users by beamforming techniques [1], [2]. In traditional MIMO systems, every antenna is linked to its own radio frequency (RF) chain so that the degrees of freedom (DoF) are fully utilized and multiple users are served simultaneously. However, installing such a large number of RF chains at physically constrained transceivers is too difficult because of the limited space, the unaffordable cost and the tremendous energy consumption [3]. In millimeter wave (mmWave) massive MIMO systems in particular, it is impractical to have one RF chain for each antenna. Therefore, hybrid analog and digital array architectures are considered as potential solutions.
To exploit the full benefits of multiple antennas in the hybrid architectures, channel state information (CSI) is usually necessary at the BS. Unfortunately, the huge overhead of full CSI acquisition is hardly affordable for either frequency division duplex (FDD) or time division duplex (TDD) massive MIMO systems, even with channel estimation and channel reciprocity, respectively. Consequently, low-complexity hybrid precoding is encouraged in the hybrid architecture for massive MIMO systems. Long-term statistics, for example the spatial covariance properties, are often used to reduce the complexity of CSI acquisition. Such properties remain the same for a relatively long time and need to be obtained far less frequently than the instantaneous CSI. In addition, users who share common spatial scatterers are placed in the same group, so that the common parameters of the group can be obtained with lower overhead. In hybrid precoding, the analog precoding is performed in the light of the long-term channel statistics, while the digital precoding is carried out with a much-reduced amount of instantaneous CSI.
Two state-of-the-art hybrid precoding architectures, named the fully-connected architecture and the sub-connected architecture, are shown in Fig. 1. The performance of the fully-connected architecture in Fig. 1(a) with reduced CSI feedback has been investigated in [4][5][6] for serving multiple users. The analog part performs dimension reduction and is determined by the channel covariance matrix, while the digital part is designed by traditional MIMO precoding methods. The sub-connected hybrid architecture in Fig. 1(b), on the other hand, reduces the hardware complexity [7], [8], but is usually based on full CSI. Furthermore, a more general hybrid architecture is given in [9][10][11], aimed at striking a balance between complexity and performance. However, it requires full CSI, which limits its application in actual networks.
The related work is listed in Tab. 1. In summary, several challenges still need to be solved: 1) a flexible precoding method is needed for the general hybrid architecture; 2) the mmWave frequency band and multi-user interference elimination should be considered for future networks; 3) a reduced amount of CSI is important for systems with large-scale antenna arrays. Therefore, for mmWave massive MIMO systems, a more general multi-user hybrid precoding method should be investigated.
In this paper, the goal is to provide a flexible precoding method with reduced CSI for the general hybrid architecture in multi-user massive MIMO systems. Based on long-term channel and interference statistics, the analog part is designed by maximizing the signal-to-leakage-plus-noise ratio (SLNR). Afterwards, a constant-modulus matrix and a supplemental matrix are extracted iteratively from the optimal result. The former is used as the analog precoding matrix because of the phase shifter constraint, while the latter is placed ahead of the digital part. In addition, the digital precoding matrix is formulated from the low-dimensional instantaneous CSI. The contributions are summarized as follows.
• To reduce the complexity of the analog part, the long-term channel and interference statistics, which remain unchanged for a long time and can be obtained from previous channel properties, are used to design the analog precoding. SLNR maximization is used to jointly consider the effects of signal and interference between different user groups.
• In the digital part, a supplemental matrix is added to each sub-array to reduce the performance degradation caused by the constant-modulus constraint of the RF chains. The digital part is then performed with reduced CSI, which needs less overhead in TDD and FDD systems.
• The simulations show that the proposed precoding method achieves a performance enhancement over other work, and that it strikes a balance between complexity and performance compared with the fixed hybrid architectures.
The rest of this paper is organized as follows. The system model is formulated in Sec. 2. In Sec. 3, the hybrid precoding design is described in detail, together with the complexity comparison for the architecture. Simulation results are presented in Sec. 4 to demonstrate the performance of the proposed hybrid precoding method. Finally, conclusions are summarized in Sec. 5.

System Model
In this section, a multi-user massive MIMO system with hybrid precoding at the BS is considered, as shown in Fig. 2. The BS is equipped with $M$ sub-arrays, each with $K$ antenna elements, $N$ RF chains ($N \le K$) and $NK$ phase shifters (PSs). In total there are $NM$ RF chains, $KM$ antenna elements, and $NKM$ PSs. The fully-connected and the sub-connected architectures are special cases of the general model, corresponding to $M = 1$ and to $N = 1$ (one RF chain per sub-array, so that $M$ equals the total number of RF chains), respectively. For simplicity, we assume that all the sub-arrays have the same structure. The extension to the irregular case is straightforward.
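The BS serves $S$ ($S \le NM$) users, each with a single antenna. Let $\mathbf{F}_{RF} \in \mathbb{C}^{KM \times NM}$ be the analog RF precoding matrix, $\mathbf{F}_{BB} \in \mathbb{C}^{NM \times S}$ the digital baseband precoding matrix, and $\mathbf{P} \in \mathbb{C}^{S \times S}$ the diagonal power allocation matrix. Let $\mathbf{y} = [y_1, \ldots, y_s, \ldots, y_S]^T$ denote the received signal vector, where $y_s$ is the received signal of user $s$. The received signal vector can be written as
$$\mathbf{y} = \mathbf{H}\mathbf{F}_{RF}\mathbf{F}_{BB}\mathbf{P}\mathbf{x} + \mathbf{n}, \tag{1}$$
where $\mathbf{x} \in \mathbb{C}^{S \times 1}$ is the transmitted signal vector with $E[\mathbf{x}\mathbf{x}^H] = \mathbf{I}$ and $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I})$ is the complex additive white Gaussian noise (AWGN) vector. Let $\mathbf{H} \in \mathbb{C}^{S \times KM}$ be the downlink channel matrix with $\mathbf{H} = [\mathbf{h}_1, \ldots, \mathbf{h}_S]^T$, where $\mathbf{h}_s$ is the channel vector of the $s$-th user.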
Denote the antenna indexes by $\{1, \ldots, MK\}$ and let $T_m = \{(m-1)K+1, \ldots, mK\}$, for $m \in \{1, \ldots, M\}$, be the subset of antenna indexes connected to the $m$-th sub-array. For the hybrid architecture in Fig. 2, the analog RF precoding matrix $\mathbf{F}_{RF}$ is block-diagonal and can be expressed as
$$\mathbf{F}_{RF} = \mathrm{diag}\{\mathbf{F}_1, \ldots, \mathbf{F}_M\}, \tag{2}$$
where $\mathbf{F}_m \in \mathbb{C}^{K \times N}$, $m \in \{1, \ldots, M\}$, is a fully-connected sub-matrix. This structure differs from both the fully-connected architecture and the sub-connected one.
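The index partition and the block-diagonal structure of the analog precoder can be illustrated with a short sketch. The helper names `antenna_subsets` and `block_diag` and the toy sub-matrices are hypothetical and chosen only for illustration; the sketch assumes equally sized sub-arrays, as in the model above.

```python
def antenna_subsets(M, K):
    """Return T_1..T_M: the 1-based antenna indexes wired to each sub-array."""
    return [list(range((m - 1) * K + 1, m * K + 1)) for m in range(1, M + 1)]


def block_diag(blocks):
    """Place the K x N sub-matrices F_1..F_M on the diagonal of a (K*M) x (N*M) matrix."""
    M = len(blocks)
    K, N = len(blocks[0]), len(blocks[0][0])
    F_RF = [[0j] * (N * M) for _ in range(K * M)]
    for m, F_m in enumerate(blocks):
        for i in range(K):
            for j in range(N):
                # Sub-array m only drives its own K antennas and N RF chains.
                F_RF[m * K + i][m * N + j] = F_m[i][j]
    return F_RF


# Toy example (assumed values): M = 2 sub-arrays, K = 3 antennas, N = 1 RF chain each.
T = antenna_subsets(M=2, K=3)
F_RF = block_diag([[[1j], [1], [-1j]], [[1], [1], [1]]])
```

Every entry outside the diagonal blocks stays zero, which reflects that an RF chain of one sub-array is not connected to the antennas of another sub-array.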
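Based on the long-term statistics of massive MIMO channels, namely the spatial covariance properties, all users are clustered into $M$ groups according to their similarity. The user groups are denoted by $U_m$, $m \in \{1, \ldots, M\}$. The user grouping algorithms in [12], [13] help divide all users into $M$ groups, each with $\bar{S}$ users, where $\bar{S}M = S$. Let $\mathbf{H}_m \in \mathbb{C}^{\bar{S} \times KM}$ be the channel matrix of the $m$-th group, so that the corresponding whole channel matrix is $\mathbf{H} = [\mathbf{H}_1^T, \ldots, \mathbf{H}_M^T]^T$. Denote by $\mathbf{R}_s = E\{\mathbf{h}_s^H \mathbf{h}_s\}$ the long-term spatial covariance matrix of user $s$, where $\mathbf{h}_s$ is the channel vector. The spatial covariance of group $m$ is then computed from the covariance matrices of the users in this group. When the hybrid array architecture and the user groups are given, the channel matrix can be written as
$$\mathbf{H} = \begin{bmatrix} \mathbf{H}_{11} & \cdots & \mathbf{H}_{1M} \\ \vdots & \ddots & \vdots \\ \mathbf{H}_{M1} & \cdots & \mathbf{H}_{MM} \end{bmatrix}, \tag{4}$$
where $\mathbf{H}_{gm}$, $g, m \in \{1, \ldots, M\}$, is the $\bar{S} \times K$ channel matrix between the $m$-th sub-array and the $g$-th user group.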
In addition, the accuracy of the baseband precoding $\mathbf{F}_{BB}$ in (1) depends on the amount of CSI available at the BS. In our model, to make sure that the data streams of one sub-array are routed to a selected user group, the baseband precoding matrix has the block-diagonal form
$$\mathbf{F}_{BB} = \mathrm{diag}\{\mathbf{W}_1, \ldots, \mathbf{W}_M\}, \tag{5}$$
where $\mathbf{W}_m \in \mathbb{C}^{N \times \bar{S}}$, with $\bar{S} = S/M$ users per group, is the baseband precoding sub-matrix for the $m$-th user group. Furthermore, $\mathbf{W}_m$ is a function of the corresponding effective channel sub-matrix.
The power allocation matrix $\mathbf{P} = \mathrm{diag}\{P_1, \ldots, P_S\}$ in (1) is chosen to satisfy the transmit power constraint $\|\mathbf{F}_{RF}\mathbf{F}_{BB}\mathbf{P}\|_F^2 \le P_T$, where $P_T$ is the transmit power limit.

Hybrid Precoding Design
As described in Sec. 2, the hybrid precoder can be divided into two parts, an analog precoder and a digital precoder. Based on channel statistics, the analog precoder is used to eliminate inter-group interference. The digital precoder, based on the instantaneous effective channel, then cancels multi-user interference and is obtained from real-time CSI. In addition, a supplemental matrix is inserted between these two parts to compensate for the loss caused by the RF constant-modulus constraint. Their designs are discussed in this section.

Analog Precoding
Ideally, the beamforming matrices would be chosen to optimize the signal-to-interference-plus-noise ratio (SINR) of each user. However, this optimal approach generally has to deal with $M$ coupled variables and has high complexity. Instead, we design the analog beamforming to maximize the SLNR [14]. In our model, the $M$ sub-arrays are assigned to the $M$ groups by an allocation index $\delta_{g,m}$, $g, m \in \{1, \ldots, M\}$. If sub-array $m$ is assigned to serve user group $g$, then $\delta_{g,m} = 1$; otherwise $\delta_{g,m} = 0$. Assume that each user group is exclusively connected to one sub-array. The data streams of a user group are therefore routed only to the selected sub-array rather than to all the sub-arrays, which reduces both the complexity of the baseband-to-RF processing and the power consumption. The SLNR of group $g$ is given by (6), in which step (a) follows from Jensen's inequality and step (b) follows from the definition $\mathbf{R}_{g,m} = E(\mathbf{H}_{g,m}^H \mathbf{H}_{g,m})$, where $\mathbf{R}_{g,m}$ is the correlation matrix between sub-array $T_m$ and group $U_g$.
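The optimal precoding matrix $\mathbf{F}_m^{(o)}$ is designed by maximizing the SLNR in (6), which is the problem stated in (7). From the Rayleigh-Ritz quotient result [14], the optimal solution is proportional to a generalized eigenvector of the matrix pair $\{\delta_{g,m}\mathbf{R}_{g,m},\ \sigma^2\mathbf{I} + \sum_{g' \neq g}(1-\delta_{g',m})\mathbf{R}_{g',m}\}$ and can be written as
$$\boldsymbol{\Gamma}_m^{(o)} = \mathrm{EV.}\Big\{\delta_{g,m}\mathbf{R}_{g,m},\ \sigma^2\mathbf{I} + \sum_{g' \neq g}(1-\delta_{g',m})\mathbf{R}_{g',m}\Big\}, \tag{8}$$
where EV. denotes the generalized eigenvector operation. The optimal precoding matrix is then obtained from the $N$ leading columns of $\boldsymbol{\Gamma}_m^{(o)}$, which correspond to the largest eigenvalues,
$$\mathbf{F}_m^{(o)} = \varsigma\,[\boldsymbol{\Gamma}_m^{(o)}]_{:,1:N}, \tag{9}$$
where $\varsigma$ is a normalization factor such that $\mathrm{Tr}(\mathbf{F}_m\mathbf{F}_m^H) = 1/M$. Furthermore, because of the constant-modulus constraint of the RF chains, the optimization problem (10), which seeks the constant-modulus matrix $\mathbf{F}_m$ closest to $\mathbf{F}_m^{(o)}$, has to be solved. On the basis of the demonstration in [7], the optimal precoding sub-matrix (11) retains only the phases of $\mathbf{F}_m^{(o)}$, i.e., its entries are proportional to $e^{j\angle\mathbf{F}_m^{(o)}}$, where $\angle\mathbf{F}_m^{(o)}$ represents the element-wise angle of $\mathbf{F}_m^{(o)}$.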
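As a rough illustration of this design, the following NumPy sketch computes the generalized eigenvectors of the signal and leakage-plus-noise pair by whitening with a Cholesky factor, keeps the $N$ dominant ones, and then retains only their phases. The function name, the random test covariances and the amplitude $1/\sqrt{KNM}$ used for the constant-modulus entries are assumptions made here for illustration; the exact normalization used by the authors is not reproduced.

```python
import numpy as np


def slnr_analog_precoder(R_sig, R_leak, sigma2, N, M):
    """Return (F_opt, F_cm) for one sub-array.

    R_sig  : K x K covariance between the sub-array and its own user group.
    R_leak : K x K sum of the covariances towards the other user groups.
    F_opt  : K x N unconstrained precoder with Tr(F F^H) = 1/M.
    F_cm   : K x N constant-modulus precoder built from the phases of F_opt.
    """
    K = R_sig.shape[0]
    B = sigma2 * np.eye(K) + R_leak              # leakage-plus-noise matrix
    L = np.linalg.cholesky(B)                    # B = L L^H
    L_inv = np.linalg.inv(L)
    C = L_inv @ R_sig @ L_inv.conj().T           # whitened signal covariance
    C = (C + C.conj().T) / 2                     # enforce Hermitian symmetry
    _, U = np.linalg.eigh(C)                     # eigenvalues in ascending order
    V = L_inv.conj().T @ U[:, ::-1][:, :N]       # N dominant generalized eigenvectors
    F_opt = V / (np.linalg.norm(V, "fro") * np.sqrt(M))
    # Assumed amplitude: every entry has modulus 1/sqrt(K*N*M).
    F_cm = np.exp(1j * np.angle(F_opt)) / np.sqrt(K * N * M)
    return F_opt, F_cm


rng = np.random.default_rng(0)
K, N, M = 8, 2, 4


def random_cov(rank=3):
    """Random low-rank Hermitian covariance, standing in for measured statistics."""
    G = rng.standard_normal((K, rank)) + 1j * rng.standard_normal((K, rank))
    return G @ G.conj().T


R_sig, R_leak = random_cov(), random_cov()
F_opt, F_cm = slnr_analog_precoder(R_sig, R_leak, sigma2=1.0, N=N, M=M)
```

Whitening turns the generalized problem into an ordinary Hermitian eigenproblem, so only `numpy.linalg` routines are needed.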

Supplemental Matrix
Due to the constant-modulus constraint in (11), the optimal solution in (8) and (9) for the problem in (7) is no longer valid. Therefore, a supplemental matrix $\mathbf{C}$ is added between the analog part and the digital part to compensate for the performance loss, as shown in Fig. 3. The matrix has a block-diagonal form, $\mathbf{C} = \mathrm{diag}\{\mathbf{C}_1, \ldots, \mathbf{C}_M\}$, where $\mathbf{C}_m \in \mathbb{C}^{N \times N}$ is the supplemental sub-matrix for sub-array $m$. An orthonormal Procrustes problem is then constructed, and the supplemental matrix is obtained by solving the optimization problem
$$\min_{\mathbf{C}_m} \big\|\mathbf{F}_m^{(o)} - \mathbf{F}_m\mathbf{C}_m\big\|_F \quad \text{s.t.} \quad \mathbf{C}_m^H\mathbf{C}_m = \mathbf{I}_N. \tag{12}$$
Applying the singular value decomposition (SVD) $\mathbf{F}_m^{(o)H}\mathbf{F}_m = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$, the optimal solution is
$$\mathbf{C}_m = \mathbf{V}\mathbf{U}^H. \tag{13}$$
Based on the dictionary learning theory and the alternating optimization theory in [17], the optimal analog precoding matrix and supplemental matrix are obtained by iterative refinement. The optimization algorithm is summarized as follows.
• Iterations for groups:
  - Replace the unconstrained analog precoding matrix with $\mathbf{F}^{(o)} = \mathbf{F}^{(o)}\mathbf{C}^H$.
  - Replace the constrained analog precoding matrix $\mathbf{F}$ by (11).
  - Replace the supplemental matrix $\mathbf{C}$ by (13).
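The alternating refinement can be sketched in NumPy for a single sub-array as follows. This is one possible reading of the steps above rather than the authors' exact algorithm: the unconstrained matrix is kept fixed and rotated by $\mathbf{C}^H$ in every pass, the iteration count is fixed, and the constant-modulus amplitude is chosen so that the constrained matrix has the same Frobenius norm as the unconstrained one. All names are hypothetical.

```python
import numpy as np


def refine_analog_and_supplemental(F_opt, num_iters=20):
    """Alternate between a phase-only matrix F and a unitary matrix C so that F C ~ F_opt."""
    K, N = F_opt.shape
    scale = np.linalg.norm(F_opt, "fro") / np.sqrt(K * N)   # assumed common amplitude
    C = np.eye(N, dtype=complex)
    for _ in range(num_iters):
        # Constant-modulus step: keep only the phases of F_opt C^H.
        F = scale * np.exp(1j * np.angle(F_opt @ C.conj().T))
        # Procrustes step: with F_opt^H F = U S V^H, the unitary minimizer is C = V U^H.
        U, _, Vh = np.linalg.svd(F_opt.conj().T @ F)
        C = Vh.conj().T @ U.conj().T
    return F, C


rng = np.random.default_rng(0)
F_opt = rng.standard_normal((8, 2)) + 1j * rng.standard_normal((8, 2))
F, C = refine_analog_and_supplemental(F_opt)
residual = np.linalg.norm(F_opt - F @ C, "fro")
```

Each of the two steps minimizes the same Frobenius distance over its own variable, so `residual` can never exceed the error of plain phase extraction without a supplemental matrix.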

Digital Precoding
After computing the analog precoding part and the supplemental part, the effective channel matrix $\mathbf{H}^{(e)}$ can be obtained. From (2) and (4), the effective channel matrix between group $g$ and sub-array $m$ can be written as
$$\mathbf{H}^{(e)}_{g,m} = \mathbf{H}_{gm}\mathbf{F}_m\mathbf{C}_m. \tag{14}$$
With the instantaneous CSI acquired at the BS, the baseband precoding reduces to a multi-user precoding problem that can be solved by existing algorithms. By applying the regularized zero-forcing (RZF) method in the digital baseband part, the user groups are addressed separately at the baseband through their effective channel matrices $\mathbf{H}^{(e)}_{g,m} = \mathbf{H}_{gm}\mathbf{F}_m\mathbf{C}_m$, where $m \in \{1, \ldots, M\}$. The baseband sub-matrix is then given by the RZF solution (15), where $\beta$ is the regularization factor.
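For reference, a minimal NumPy sketch of a regularized zero-forcing precoder for one user group is given below. It uses the textbook form $\mathbf{W} = \mathbf{H}^H(\mathbf{H}\mathbf{H}^H + \beta\mathbf{I})^{-1}$ applied to an effective channel; any additional power scaling in the paper's own expression is not reproduced, and the function name and test channel are assumptions.

```python
import numpy as np


def rzf_precoder(H_eff, beta):
    """Textbook RZF: W = H^H (H H^H + beta I)^{-1} for an S_bar x N effective channel."""
    S_bar = H_eff.shape[0]
    gram = H_eff @ H_eff.conj().T + beta * np.eye(S_bar)
    # gram is Hermitian, so (gram^{-1} H)^H equals H^H gram^{-1}.
    return np.linalg.solve(gram, H_eff).conj().T


rng = np.random.default_rng(1)
H_eff = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))  # 3 users, 4 RF chains
W = rzf_precoder(H_eff, beta=0.1)
W_zf = rzf_precoder(H_eff, beta=1e-12)   # nearly zero regularization
```

With a vanishing regularization factor the product `H_eff @ W_zf` approaches the identity, i.e., plain zero-forcing; a larger `beta` trades residual interference for robustness against noise.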

Complexity Comparison
With the total numbers of antenna elements and RF chains fixed to $MK$ and $MN$, the hardware complexities of the different architectures are compared in Tab. 2. As $M$ is raised, the numbers of antenna elements, PSs and RF chains in each sub-array decrease. Meanwhile, the numbers of deployed combiners and splitters connected to the antenna elements are reduced: they are $MK$ and $MN$ when $M = 1$ and become zero when $M = S$. Hence, the hardware complexity and cost are reduced as the number of sub-arrays increases. On the other hand, the effective RF information shrinks as $M$ grows, which degrades performance. Therefore, a proper number of sub-arrays should be chosen according to the trade-off among performance requirements, hardware complexity, and deployment cost in different areas.
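The dependence of the phase-shifter count on $M$ can be checked with a few lines of arithmetic. The sketch below only uses the counts stated in the system model ($K$ antennas, $N$ RF chains and $NK$ PSs per sub-array); the function name and dictionary keys are hypothetical, and the totals of 64 antennas and 16 RF chains are taken from the simulation setup.

```python
def hardware_counts(total_antennas, total_rf_chains, M):
    """Per-sub-array sizes and total phase shifters when the totals are split over M sub-arrays."""
    if total_antennas % M or total_rf_chains % M:
        raise ValueError("M must divide both the antenna and the RF chain totals")
    K = total_antennas // M        # antennas per sub-array
    N = total_rf_chains // M       # RF chains per sub-array
    return {"K": K, "N": N, "phase_shifters": M * N * K}


# 64 antennas and 16 RF chains in total.
counts = {M: hardware_counts(64, 16, M) for M in (1, 4, 16)}
```

For these totals the number of phase shifters falls from 1024 (fully-connected, $M = 1$) to 256 ($M = 4$) and to 64 ($M = 16$, one RF chain per sub-array), i.e., it scales as $1/M$.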

Tab. 2. Hardware complexity comparison with different M.

Numerical Results
Numerical results are presented in this section to verify the performance of the proposed hybrid precoding design. MATLAB V8.0 [18] is used as the simulation tool. Users are distributed according to a Poisson Point Process (PPP) in a sector with an angular range of 120° and a radius of 50 m. The total number of antenna elements is set to $KM = 64$. A Uniform Linear Array (ULA) is used, with a spacing of $\lambda/2$ between adjacent antennas inside each sub-array and $10^2\lambda$ between adjacent sub-arrays, where $\lambda$ is the wavelength. The number of users is set equal to the number of RF chains, $S = MN$. We use $M = 4$ as an example.
The mmWave propagation channels have a limited number of scattering clusters. The classic Saleh-Valenzuela geometric channel model is adopted here [9]. The channel vector of user $s$ is obtained by (16), where $t$, $L$, $\alpha_l$ and $\mathbf{a}(\theta_l)$ are the time index, the limited number of scattering clusters, the $l$-th channel gain and the $l$-th array response at the BS with angle of departure $\theta_l$, respectively. In this paper, we adopt $L = 8$, $\alpha_l \sim \mathcal{CN}(0, 1)$ and a carrier frequency of 28 GHz. The array response for the ULA is $\mathbf{a}(\theta_l) = [1, e^{j\frac{2\pi}{\lambda}d_t\sin(\theta_l)}, \ldots, e^{j\frac{2\pi}{\lambda}d_t(MK-1)\sin(\theta_l)}]^T$, where $d_t$ is the spacing between two antenna elements. Through the uplink pilots, the BS estimates the statistical information $\mathbf{R}_s$ of each user [13].
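A small standard-library sketch of this channel model is shown below. It is only an illustration under stated assumptions: a single uniform array with half-wavelength spacing is used (the larger spacing between sub-arrays is ignored), the angles of departure are drawn uniformly within the 120° sector, and the sum over paths is scaled by $1/\sqrt{L}$, which may differ from the normalization in the paper. The function names are hypothetical.

```python
import cmath
import math
import random


def ula_response(theta, num_antennas, spacing_in_wavelengths=0.5):
    """Array response [1, e^{j 2 pi (d/lambda) sin(theta)}, ...] of a uniform linear array."""
    phase_step = 2 * math.pi * spacing_in_wavelengths * math.sin(theta)
    return [cmath.exp(1j * phase_step * n) for n in range(num_antennas)]


def sv_channel(num_antennas, num_paths=8, rng=None):
    """One channel realization: sum of num_paths paths with CN(0, 1) gains."""
    rng = rng or random.Random()
    h = [0j] * num_antennas
    for _ in range(num_paths):
        # Complex Gaussian gain with unit variance (0.5 per real dimension).
        alpha = complex(rng.gauss(0.0, math.sqrt(0.5)), rng.gauss(0.0, math.sqrt(0.5)))
        theta = rng.uniform(-math.pi / 3, math.pi / 3)   # assumed 120-degree sector
        a = ula_response(theta, num_antennas)
        h = [h_n + alpha * a_n for h_n, a_n in zip(h, a)]
    return [h_n / math.sqrt(num_paths) for h_n in h]     # assumed 1/sqrt(L) scaling


broadside = ula_response(0.0, 4)              # all-ones vector at broadside
h_s = sv_channel(64, num_paths=8, rng=random.Random(0))
```

Averaging the outer products of many such realizations for a fixed set of angles would give an estimate of the long-term covariance $\mathbf{R}_s$ used by the analog precoder.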
In Fig. 4, the achievable rates of the proposed method and the existing methods are compared, where the number of RF chains is set to $NM = 16$. In the figure, the existing precoding methods for the two architectures in [4] and [15] are denoted by Ref. 1 and Ref. 2, respectively. The modified MMSE method proposed in [16] is used for the general architecture and denoted by Ref. 3. The figure shows that the achievable rate of the proposed method is consistent with that of the existing method in the fully-connected architecture. This is because the proposed method treats all users as one group and exploits the second-order statistics of the channels. In addition, the proposed method achieves a higher rate with the general architecture and the sub-connected architecture. The improvements are obtained by jointly considering the signal, leakage, and noise power, instead of only the leakage power or the signal power as in other methods. Furthermore, together with Tab. 3 for the complexity comparison, we can see that the flexible hybrid architecture is a compromise among performance, complexity, and cost.
Figure 5 illustrates the achievable rate of the proposed method for different numbers of RF chains. The transmit SNR is $P_T/\sigma^2 = 40$ dB. Due to the high spatial correlation, the dominant rank of the spatial correlation matrix is far less than the number of antennas. According to the calculation in [10], the average dominant rank of the given system is about 24. In the figure, the ideal case stands for the performance without the RF constant-modulus constraint, the "w. sup." cases stand for the performance with supplemental matrices, and the "w/o. sup." cases stand for the performance without supplemental matrices. The figure shows that, as the number of RF chains is raised towards the DoF, the number of served users grows and so does the achievable rate. In addition, the performance is improved by adding the supplemental matrix. Furthermore, the proposed method outperforms the existing method.
In Fig. 6, the performance with and without the constant-modulus constraint is compared, where $M = 2$ and $N = 8$. The coordinated analog precoding in [11] is used for the performance comparison and denoted by Ref. 4. The classical block diagonalization (BD) in [12] for a full connection is denoted by BD and is used as a reference for the performance of our method. Compared to Ref. 4, the proposed method performs better both with the constant-modulus constraint (w. constraint) and without it (w/o. constraint). Without the constraint, the RF precoding keeps its orthogonality and does not lose any gain. With the constraint, however, the RF orthogonality is broken, and the performance therefore deteriorates.
In Fig. 7, the achievable rate of the proposed method is shown for different numbers of users. The transmit SNR is $P_T/\sigma^2 = 40$ dB. Due to the spatial correlation statistics, the maximal number of users is set to 24. Two numbers of sub-arrays are considered here, $M = 2$ and $M = 4$. The BD method is used as a reference. As the number of users is raised, the achievable rates increase. The proposed method performs better than the previous work in both the $M = 2$ and the $M = 4$ cases. This is because the eigen-beamforming in Ref. 4 causes interference between user groups, while the proposed method considers the signal and the interference together. As the BD curve shows, the rate rises at first and then flattens because of the rank of the channel matrix. If more users are added to the model, the performance will eventually go down.

Conclusions
In this paper, a general hybrid precoding method is designed for mmWave massive MIMO systems. All users are grouped and served by independent analog sub-arrays. The high-dimensional analog precoding matrix is designed with knowledge of the long-term channel statistics, namely the spatial covariance matrices. The optimal precoding matrix is obtained by maximizing the SLNR function and is then split into an analog part and a supplemental part. A low-dimensional baseband digital precoding matrix is then obtained from knowledge of the instantaneous effective channel matrix. Finally, simulation results demonstrate the feasibility and superiority of the proposed method compared to the existing ones. Meanwhile, the proposed method also reduces the complexity.

Fig. 7. Performance comparison with the number of users.
LI received the B.E., M.S., and Ph.D. degrees in Telecommunication Engineering from Xidian University in 1994, 1997, and 2002, respectively. From 1997 to 1998, he was a visiting scholar in the Department of Engineering, Shizuoka University. From 2002 to 2004, he was a postdoctoral researcher at the University of Trento. From July 2007 to Dec. 2007, he was a visiting Professor at the Institut National des Sciences Appliquees (INSA), Lyon. He is currently a Professor in the School of Electronics and Information, Northwestern Polytechnical University, Xi'an, China. His research interests are in the area of wireless networking and communications, including the next generation cellular network (5G) and WLAN (e.g., IEEE 802.11ax and 11ay), the MAC and higher layer technologies, and non-orthogonal multiple access for 5G.

Zhongjiang YAN (corresponding author) received the B.E. and Ph.D. degrees in Telecommunication Engineering from Xidian University in July 2006 and 2011, respectively. From Sept. 2010 to Dec. 2011, he was a visiting Ph.D. student in the Department of Electrical and Computer Engineering, University of Alberta. In Dec. 2011, he joined the School of Electronics and Information, Northwestern Polytechnical University, Xi'an, China, where he is currently an Associate Professor. His research interests are in the area of wireless networking and communication, including protocol design and FPGA implementation of the media access control layer, radio resource management, and traffic scheduling strategies for wireless networks, e.g., 5G and WLAN.

Jiancun FAN received the B.S. and Ph.D. degrees in Electrical Engineering from Xi'an Jiaotong University, Xi'an, China, in 2004 and 2012, respectively. From 2009 to 2011, he was a Visiting Scholar with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA. He is currently an Associate Professor with the Department of Information and Communications Engineering, Xi'an Jiaotong University. His general research interests include statistical signal processing and wireless communications, with emphasis on cross-layer optimization for spectral- and energy-efficient networks, multiple antenna MIMO communication systems, and practical issues in LTE and 5G systems.

Mao YANG received the B.E. and M.S. degrees in Information and Telecommunication Engineering from Xidian University, China, in 2006 and 2009, and the Ph.D. degree in Electronic Engineering from Tsinghua University, China, in 2014. He is currently an Associate Professor in the School of Electronics and Information at Northwestern Polytechnical University, China. His research interests are in the area of wireless networking and communications, including the next generation cellular network (5G) and WLAN (e.g., IEEE 802.11ax and 11ay), the MAC and higher layer technologies, non-orthogonal multiple access for 5G, software-defined wireless networking, and wireless network virtualization.