Channel estimation using variational Bayesian learning for multi-user mmWave MIMO systems

This paper presents a novel variational Bayesian learning-based channel estimation scheme for hybrid pre-coding-employed wideband multiuser millimetre wave multiple-input multiple-output communication systems. We ﬁrst propose a frequency variational Bayesian algorithm, which leverages common sparsity of different sub-carriers in the frequency domain. The algorithm shares all the information of the support sets from the measurement matrices, signiﬁcantly improving channel estimation accuracy. To enhance robustness of the frequency variational Bayesian algorithm, we develop a hierarchical Gaussian prior channel model, which employs an identify-and-reject strategy to deal with random outliers imposed by hardware impairments. A support selection frequency variational Bayesian channel estimation algorithm is also proposed, which adaptively selects support sets from the measurement matrices. As a result, the overall computational complexity can be reduced. Validated by the Bayesian Cramér-Rao bound, simulation results show that, both frequency variational Bayesian and support selection-frequency variational Bayesian algorithms can achieve higher channel estimation accuracy than existing methods. Furthermore, compared with frequency variational Bayesian, support selection-frequency variational Bayesian requires signiﬁcantly lower computational complexity, and hence, it is more practical for channel estimation applications.


INTRODUCTION
In recent years, millimetre wave (mmWave) communication, which can provide gigabit-per-second data rates, has received considerable attention [1]. Combined with hybrid large array techniques, applying mmWave has become crucial for successful development of the forthcoming fifth-generation (5G) mobile networks [2]. Nevertheless, estimating channel state information (CSI) is challenging for hybrid pre-coding-employed wideband multi-user mmWave systems due to large and compressed channel matrices [3]. Random outliers imposed by hardware impairments also need to be considered, because they will degrade the channel estimation (CE) 1 performance of CE algorithms [4]. Therefore, it is necessary to devise innovative CE algorithms to address the challenges in mmWave communication systems.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2020 The Authors. IET Communications published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology 1 Note that the meaning of term "CE" is identical with the estimation of the CSI. Therefore, throughout this paper these two terms will be used interchangeably.

Prior work
Most of CE algorithms in mmWave systems concentrate on exploiting the channel sparsity [5]. By using compressed sensing theory, a sparse CE problem can be formulated as a sparse signal recovery problem. Then, the required pilot overhead can be significantly reduced as compared to conventional algorithms, which uses least square (LS) [6] or minimum mean squared error methods [7]. Assuming that the estimated amplitudes of non-zero coefficients are channel gains of each path, applying sparse exploration is to compare the pairs of angle of departure (AoD) and angle of arrival (AoA) of each path in mmWave systems [8]. Most existing CE methods use narrow-band flat channel models [8][9][10]. However, in practice, mmWave channels generally have wideband and frequency-selective fading characteristics.
There are some publications that propose CE algorithms for wideband mmWave channels [11][12][13][14][15][16]. By leveraging joint sparsity of angular and delay domains, a time-domain CE algorithm using quantised sparse recovery is proposed in [11]. However, this algorithm is only applicable for single-carrier systems. Based on a multi-resolution codebook, some alternative CE algorithms are presented. A distributed grid matching pursuitbased CE algorithm is proposed in [12] to estimate the channel parametres of frequency-selective fading channel. By exploiting the codebook structure, a combined time-frequency approach is proposed, which can estimate the wideband CSI with low pilot overhead [13]. Authors in [14] propose a support detection (SD)-based CE algorithm. By observing the structural characteristics of mmWave beamspace channel, this algorithm can detect the support of the sparse beamspace [14]. Moreover, a simultaneous weighted orthogonal matching pursuit (SW-OMP) algorithm is proposed in [15]. Since this algorithm considers realistic frequency-selective fading channel models, it can exploit spatially common sparsity of mmWave channels. Nevertheless, all these algorithms [12][13][14][15] assume that there is no multi-user interference. Therefore, they can only estimate one user CSI at a time. In addition, when applying these algorithms, grid errors would be induced to channel coefficients, because multi-resolution codebook-based CE strategies are employed [17,18].
In order to improve the overall CE accuracy, sparse Bayesian learning [19][20][21] is widely used to estimate the channel parameters. This method can avoid the grid errors, and hence, improve the estimation performance. Specifically, an expectation maximisation-based sparse Bayesian learning is proposed in [19]. By using Gaussian mixture prior models, this method can achieve high estimation accuracy with low pilot overhead. [20,21] use generalised approximate message passing-based methods [20,21] to estimate CSI of mmWave multiple-input multiple-output (MIMO) systems. However, all these methods do not consider hardware impairments [4], which affects the performance of CE [22]. To solve this problem, [23] proposes a Bayesian compressed sensing-based CE algorithm for mmWave channels with random outliers. However, this algorithm [23] is only applicable for point to point narrow-band flat channels.

Contributions
Motivated by above, we propose a novel CE scheme based on variational Bayesian inference in the frequency domain. This scheme is designed for wideband multi-user mmWave MIMO communication systems, where hybrid pre-coding architectures and frequency-selective fading channels are considered. A frequency variational Bayesian (FVB) algorithm is presented, which exploits the common sparsity of different subcarriers. The algorithm formulates the CE problem as a sparse Bayesian problem, and thus, it can be solved by using variational Bayesian inference. By simplifying the inverse derivations, a support selection-FVB (SS-FVB) algorithm is also proposed to reduce the computational complexity. In addition, we provide a Bayesian Cramér-Rao bound (BCRB) to evaluate the estimation performance of the proposed algorithms. To the best of our knowledge, our paper is the first publication that considers the effect of the random outliers imposed by hardware impairments for the multi-user wideband CE with hybrid architectures. We summarise the specific contributions as follows.
1. We propose a method that incorporates the machine learning technique into CE of wideband mult-iuser mmWave MIMO systems. Specifically, the sparse signal recovery problem is formulated as a sparse Bayesian learning problem, which is extensible and robust to inference. Considering the random outliers, a hierarchical Gaussian prior channel model is developed based on the identify-and-reject strategy, and thus, the sparsity of the vectorised CSI is encouraged by hyperparameters. As a result, the robustness against random outliers can be significantly enhanced. 2. The variational Bayesian inference is employed to solve the sparse Bayesian learning problem. The Kullback-Leibler divergence is also introduced to tack CSI. We calculate all the posterior distributions of the latent variables from the parameter. Hence, the estimation accuracy of the proposed CE algorithms is significantly improved. 3. The proposed scheme achieves an excellent performance in terms of CE accuracy. Specifically, the FVB algorithm shares all the information of support sets from the measurement matrices, and provides a better estimation performance. For the SS-FVB algorithm, by setting some columns of measurement matrices to zeros, the computational complexity can be reduced.
The work in paper substantially extends our previous work [25], where a robust frequency-distributed variational Bayesian estimation algorithm is presented, and its performance is assessed by comparing the estimation accuracy with some existing algorithms. However, [25] does not provide the analysis of algorithm complexity, and does not solve the problem of high computational complexity caused by using the proposed algorithm. The computation of BCRB is also omitted in [25]. To make our work complete and deep, in this paper, we propose an SS-FVB algorithm to reduce the overall computational complexity. We analyse the computational complexity of the proposed two CE algorithms in detail and summarise it. We also provide the detailed deducing of BCRB in Section IV.
The rest of this paper is organised as follows. Section II presents the wideband multi-user mmWave MIMO system model including a frequency-selective fading geometric channel model and a hybrid pre-coding architecture. In Section III, the CE problem in the frequency domain is further formulated as a sparse Bayesian learning framework. Section IV proposes a CE scheme based on variational Bayesian learning in detail, along with the derivations of the computational complexity and the BCRB. The numerical results in Section V are presented to compare the estimation performance of the two proposed algorithms. Finally, Section VI concludes the main conclusions derived from the simulation results with some ideas for further research work.

Notation
The notations used throughout this paper are denoted as follows. A, a and a denote matrix, column vector and scalar value, respectively. A ij is used to denote the i, j -th element of the matrix A. A :,j represents the j -th column and A i,: denotes the i-th row of A. ℂ M×ℕ represents the space of M × N complexvalued matrices. (.) T , (.) H , ||.|| 2 , ||.|| F and ⟨.⟩ denote the conjugate transpose, transpose, l 2 -norm, Frobenius norm, and expectation, respectively. ⊗ is the Kronecker product and vec(⋅) is the vectorised operation by the columns of the matrix. ⟨x⟩ p(x) is to take the expectation of x according to the distribution of p(x).

SYSTEM MODEL
As illustrated in Figure 1, we consider an orthogonal frequency division multiplexing (OFDM)-based multi-user mmWave MIMO communication system, which employs hybrid precoding architectures and frequency-selective fading geometric channel models. In this system, we assume that there are K user equipments (UEs) equipped with N UE antennas and M UE radio frequency (RF) chains. The UEs transmit pilot sequences to a base station (BS) with N BS antennas and M BS RF chains.

Channel model
We consider a frequency-selective fading geometric channel model according to the IEEE 802.11ad wireless standard [24]. The channel matrix H d,k ∈ ℂ N ×ℕ of the d -th delay tap and the k-th user can be expressed as [26] where L k represents the number of paths; l,k ∈ ℂ and l,k ∈ ℂ are the complex gain and the time-domain delay of the lth path, respectively; and a BS ( l,k ) and a UE ( l,k ) are antenna steering vectors of the BS and the UEs with l,k ∈ [0, 2 ) and l,k ∈ [0, 2 ) being AoD and AoA, respectively. B( l,k ) = rc (dT s − l,k ), where T s is the sampling period and rc ( ) denote the pulse-shaping filter at time . Typical uniform linear arrays (ULA) with half-wavelength separation are deployed at the BS and the UEs. Hence, the elements of a BS ( l,k ) and a UE ( l,k ) can be expressed as An extended virtual channel model [24] is employed to approximate the channel matrix H d,k as are the dictionary matrices with sizes G BS and G UE for AoA and AoD, respectively. To exploit the common sparsity of different sub-carriers, the frequency-domain channel matrix with the length of the delay taps N C at the subcarrier p (0 ≤ p ≤ P − 1) can be written as denotes a sparse matrix for the k-th user at the p-th sub-carrier and P is the number of the subcarriers.

Hybrid pre-coding
Considering that the BS uses a hybrid combining scheme, W denote the analog and digital combining matrices at the t -th time frame, respectively. W  [11], and we employ zero padding (ZP) [13] instead of the cyclic prefix to avoid corrupting the pilot at the symbol edges. The received signal at the p-th sub-carrier

Antenna
Phase shifter Adder FIGURE 2 Graphical model of the hierarchical Gaussian prior channel model using identify-and-reject strategy can be written as p ∈ ℂ M × are the effective noises from the hybrid combining procedure; w (t ) p denotes the vector of the random outliers including G ≤ M BS non-zero entries; and n (t ) p is the additive multi-variate Gaussian noise vector. In the above equation, x (t ) p,k is the effective pilot signal transmitted by the k-th user at the p-th subcarrier, which can be expressed as BB,p,k ∈ ℂ M × being the analog and digital pre-coding matrices, respectively.

PROBLEM FORMULATION
In this section, we propose a two-step approach to identify the CE problem. We first formulate the CE process as a sparse signal recovery problem in the frequency domain. Then, we propose a sparse Bayesian learning problem based on a hierarchi-cal Gaussian prior channel model, which employs the identifyand-reject strategy. Using this model for CE, the robustness and estimation accuracy of the proposed algorithm can be improved.

CE formulation
As the scatters of BS are much smaller than those of the UEs in the single-cell mmWave uplink MIMO system, a small angular spread will appear at the receiver, resulting in channel sparsity [28]. In particular, the number of effective paths generally is less than 30 [28]. Therefore, in this system the vectorised channel vector is sparse, when the sizes of dictionary matrices G BS and G UE are lager than 64 [29]. According to the sparsity of mmWave channels, Equation (5) can be rewritten as where × is a sparse angular vector with KL non-zero values. Since the available signal-to-noise ratio (SNR) of mmWave communication systems is generally very low, M successive OFDM symbols within the channel coherence time are required to estimate the CSI. By stacking the M measurements into a vector, Equation (7) can be expressed as Note that the number of the required pilot dramatically increases when the traditional LS algorithm is used. In order to reduce the required pilot overhead, compressed sensing-based methods, i.e., orthogonal matching pursuit (OMP) [30], compressive sampling matching pursuit (CoSaMP) [31] and the iterative re-weighted 1 and 2 algorithms [32], are used to recover the sparse vector h p from y p [see Equation (8)].
On the other hand, we can observe that the frequencydomain channel model in (4) exhibits the same sparse feature for any sub-carrier p [24]. Therefore, h p can be seen as a vectorised is the length of the vector common of each sub-carrier. Moreover, the number of required training frames can be substantially reduced due to the common sparse feature of different sub-carriers in the mmWave channels.

Sparse Bayesian learning
Compared to the signal recovery algorithms based on compressed sensing theory, a robust and extensible inference framework based on sparse Bayesian learning, can provide better estimation performance for sparse signal recovery problems [33]. By using this method, the sparsity of the estimated signal is encouraged by employing a sparsity-promoting prior, and all the corresponding components, such as hybrid pre-coding, additive Gaussian noise and random outliers, are presented following a sparse Bayesian learning framework. As a result, a hierarchical Gaussian prior model [34] based on two layers can be constructed.
We assume that non-zero values of channel vector h in the first layer are independent and identically distributed (i.i.d.) and the complex Gaussian distribution [34] where = [ 1 , 2 , … , N ] T denotes the sparsity of the channel h.
In the second layer, the prior precision is also considered as i.i.d. random variables following the Gamma distribution. Therefore, the distribution of prior precision is given by with n ≥ 0, where is the Gamma function. a and b are set as small constant values to make these priors non-informative over { n }. By using the same method as Equation (11), the noise variance can be expressed as where c and d are small values [34].
Considering the random outliers caused by hardware impairments, a set of indicator variables { m } is assigned to detect the outliers by employing the identify-and-reject strategy [35]. Therefore, the received signal can be expressed as where m,: denotes the mth row of , u m , v m and m are the m -th entry of u,v and , respectively. Thus, the probability density function (PDF) of the received signal can be rewritten as This operation ensures that, in the presence of random outliers, the received signal y m is independent to the PDF p(y|h, , ). Since m ∈ {0, 1}, a beta-Bernoulli hierarchical prior placed on m is where m denotes the m-th entry of . In addition, the distribution of prior precision can be written as where e and f follow the Beta distribution. For clarity, Figure 2 gives the graphical model of the hierarchical prior channel model based on the identify-and-reject strategy, where the constants, the hidden variables and the observation are denoted as squares, circles and a shaded circle, respectively. Now the considered CE problem is formulated as a sparse Bayesian problem, which can be solved by the proposed CE scheme in Section IV.

CE SCHEME DEVELOPMENT
In this section, we propose and analyse the novel CE scheme to solve the sparse Bayesian learning problem. Generally, h can be estimated by the posterior distribution p(h|y) [36]. As it will be shown in Section V, by employing the variational Bayesian inference, the FVB CE algorithm provides an excellent estimation accuracy, because it takes advantage of overall support sets of the measurement matrices. Nevertheless, the computational complexity of the FVB algorithm is relatively high since the inverse of the measurement matrix is complicated. To reduce the computational complexity, the SS-FVB algorithm is proposed by using an adaptive selection operation based on the correlations between different support sets. We also derive the computational complexity and the BCRB in this section to analyse the performance of the proposed algorithms.

FVB CE algorithm
We apply variational Bayesian inference to solve the sparse Bayesian learning problem, and thus, the convergence to the local optimum can be guaranteed [37] 2 For simplicity, the set of the unobserved variables { , h, , , } is denoted as {z}. Since the derivation of p(z|y) is difficult to compute directly [36], we decompose the logmarginal probability of the received signal into two terms lnp(y) = L(q) + KL(q||p) (18) where L(q) = ∫ q(z)lnp((y, z)∕q(z))d z, KL(q||p) = − ∫ q(z)ln(p(z|y)∕q(z))d z is the Kullback-Leibler divergence which describes the distance between p(z|y) and q(z) [38], and q(z) can be any PDF close to p(z|y). As p(z|y)≤q(z) and KL(q||p) ≥ 0, the nearest analytical expression of the posterior distribution p(z|y) can be obtained by minimising the Kullback-Leibler divergence. As lnp(y) is only dependent to y, the problem of minimising the Kullback-Leibler divergence is equivalent to maximising L(q). Consequently, our objective becomes to approximate the p(z|y) by maximising L(q) with respect to q(z), i.e., q(h), q( ), q( ),q( ) and q( ) [36]. Details of the derivations are given as follows.

Derivations of q(h) and q(α)
Based on Equations (11) and (15), lnq(h) can be derived as Since Equation (22)  We can also obtain lnq( ) As Equation (20) is the exponent of Gamma distribution, q( ) can be expressed as and n,n is the n-th diagonal element of in Equation (22).

ALGORITHM 1 Proposed FVB Algorithm
Input: y, x K ,W , A BS , A UE,K and tolerance .

Output:
The channel estimatorĥ(k) of the k-th user.

12:
End for

Derivations of q(θ) and q(β)
According to Equations (15) and (16), the derivation of lnq( ) is obtained as Note that m still obeys a Bernoulli distribution, i.e., where C is a normalisation constant. We have Then, q( ) can be derived as Equation (29) can be rewritten as which indicates that the posterior q( ) follows a Beta distribution Using the previously derived expressions for the variational distributions q(h), q( ) and q( ), the posterior distributions p(h|y), p( |y) and p( |y) can be calculated in the same way [39]. Then, the channel vector h can be estimated by the posterior mean, i.e.,ĥ The details of the algorithmic implementation of the proposed FVB CE algorithm can be found in Algorithm 1.

SS-FVB CE algorithm
The FVB CE algorithm requires high computational complexity for the necessary matrix inversion. In addition, the high complexity of the measurement matrix will significantly increase the overall computational complexity of the algorithm.
To reduce the algorithmic complexity, here we propose the reduced complexity CE algorithm, SS-FVB, which is based on the property that a small number of the relevant support sets occupies most of the energy of the measurement matrices [18]. This algorithm exploits the correlation between the residuals and the columns of the measurement matrices.
We first propose a convenient reconstruction of the measurement matrix by making a proper adaptive selection operation of the support set. We define the column of the measurement matrix as a "support set", which has the strongest correlation with the received signal. The index number of the corresponding support set, 1 , can be obtained as Then, a new measurement matrix is constructed as By subtracting the contribution of 1 from y, the residual r 1 can be obtained by In the same way, the correlation between the residual and the columns of the original measurement matrix can be exploited to find the new index number as follows: Without loss of generality, we assume 1 < 2 , and then, the new measurement matrix 2 is constructed as The residual r 2 can be calculated as Equation (36)  can be employed to learn the latent parameters. By recalling (9), the CE problem can be expressed as The same as the FVB CE algorithm, the sparse channel vector h can be estimated by computing the posterior distribution p(h|y). Since the procedure in approximating p(h|y) with q(h) is identical to the one presented for the FVB algorithm, this procedure is omitted. Following Equation (33), the sparse channel vector h can be estimated aŝ

ALGORITHM 2 SS-FVB CE Algorithm
Input: y, x K ,W , A BS , A UE,K and tolerance .

Output:
The channel vector of the k-th userĥ(k).

10:
Compute the estimatorĥ of the channel vector.

11:
Computeã∕b n andc∕d to update the hyper-parameters n and , respectively.

12:
Detect the indicators and reject the outliers.

16:
Findĥ(k) of the k-th user by detecting the non-zero values.

18: End for
Algorithm 2 presents the detailed algorithmic implementation of the proposed SS-FVB algorithm. The following theorem shows that the proposed adaptive selection operation always converges to a local optimum.

Theorem 1. After KL iterations, the selection operation of the support sets converges to a local optimum.
Proof. See Appendix A. □

Computational complexity
Considering that the inverse operations of and KL require many computational overheads in Algorithms 1 and 2, we derive the computational complexity of the inverses of and KL at every step. More specifically, the measurement matrices are singular matrices, and therefore we discuss the pseudoinverses for and KL , i.e., The computational complexity of different mathematical operations are summarised in Table 1. From this table, it can be observed that the computational complexity of the Outliers (G) ) compared with those of T in Equation (42). The SS-FVB CE algorithm simplifies the inverse derivations in the covariances, because (N − KL) columns in are set to zeros. Therefore, the low computational complexity is achieved by the SS-FVB algorithm at the cost of reduced estimation accuracy. We plot the running time versus the number of pilot for the proposed two algorithms in Figure 3, where K = 4, L = 4, M BS = 4, G BS = 128 and G UE = 128. We can see that as the number of the pilot frames increases, the run time of the proposed two algorithms improves. The SS-FVB algorithm has a much smaller run time than that of the FVB algorithm because of its lower computational complexity.

BCRB
BCRB 3 [40] is conveniently used to assess the estimation error of CE algorithms. BCRB can be calculated by where E[⋅] represents the expectation operation and J is a FIM. Assume that the derivatives of the logarithmic function of joint  PDF p(y, h) exist. The FIM can be expressed as [41] Since Equation (45) is complex to be computed, we propose the following theorem to obtain the expression of BCRB.

Theorem 2. It is assumed that p(y|h) satisfies the regularity condition
Then, the complex FIM is given by the matrix where Proof. See Appendix B. □ According to Equations (11) and (12), it is clear that the true prior distribution p(h; a, b) 4 of the sparse channel vector h reduces to where St(h n | , , ) is a Student-t distribution with = 0, = a∕b and = 2a. Moreover, the prior distribution of h n can be obtained as Taking the log-likelihood function on both sides of Equation (47), we can obtain The log-likelihood function of p(h n ) can be calculated as The first-order partial derivative of L(h n ) is .
Based on Equation (48), the first-order partial derivative of lnp(h) is the sum of the U (h n ). Thus, U (h) can be calculated as Using Theorem 2, J h can be calculated for both FVB and SS-FVB algorithms, as given by Using the same way as [42], the measurement matrix is added as the additional observation, because it will impact the estimation of h. Then, J y|h for the proposed two algorithms can be expressed as [43] It can be observed that the derivations of J y|h and J h use the same variables for our proposed CE algorithms, so their lower bounds are equal, denoted as J. The bound J will be used in the next section for analysing the CE performances of algorithms.

NUMERICAL RESULTS
In this paper, the wideband multi-user mmWave communication system has been implemented in software and its performance has been obtained by means of computer simulations. The simulation parameters in this paper are set according to references [3], [23], [24]. There are K = 4 UEs, each with N UE = 32 antennas, and a BS with N BS = 128 antennas. We set M BS = 4 at the receiver and M UE = 1 at the transmitter. The dictionary matrices are constructed with G BS = 128 and G UE = 128 for the receiver and transmitter, respectively. Both FVB and SS-FVB have been implemented, and their performance is compared with some other well-known CE algorithms, including OMP [30], CoSaMP [31], SW-OMP [15] and the SD-based CE algorithms [14]. In addition, the performance evaluation results for all these algorithms have been obtained by using Monte Carlo error counting techniques. The BRCB is also provided.
In accordance with the IEEE 802.11ad wireless standard, we also generate the channel (4) with the following parameters. To compare the performance of various CE algorithms, the Normalised Mean Squared Error (NMSE) ofĥ(k) is introduced, which is expressed as The performance of different CE algorithms versus the required pilot overhead is shown in Figure 4. We can see the superiority of the proposed FVB and SS-FVB CE algorithms. The poor performance of the OMP algorithm is caused by its inability to exploit the joint sparsity shared by the delay and frequency domains. The CoSaMP method, which uses OMP, exhibits a slightly lower error when M takes small values. Although the SD-based algorithm takes advantage of the structural characteristic of mmWave beamspace channel, estimating only one path at each iteration degrades its estimation performance. The FVB and SS-FVB algorithms achieve the best CE performance because both of them make full use of the common sparsity of different sub-carriers in the frequency domain. Figure 5 shows the NMSE performance versus SNR for various CE algorithms. The SNR is ranged from -15 to 5 dB in accordance with the realistic mmWave systems. It is noted that the OMP and CoSaMP algorithms have a poor performance for the whole SNR range as they ignore the joint sparse vector basis. In particular, the SD-based algorithm is the worst, and it cannot work when the SNR reduces to -10 dB. Compared with other four CE schemes, both of the proposed algorithms have much higher estimation accuracy even at low SNR range, because they can effectively exploit the variational Bayesian inference. Figure 6 shows the performance of the different competing algorithms versus the number of the random outliers. It can be observed that the compared algorithms are very sensitive to random outliers and their performance has a large degrada- tion. More specifically, the OMP, CoSaMP and SD-based algorithms cannot guarantee the estimation accuracy when the number of the random outliers increases to 10. According to the identify-and-reject strategy, the NMSE of the proposed algorithms increases very slowly as the number of the random outliers increases. Therefore, the variational Bayesian inference and the identify-and-reject strategy are the proper means to improve the estimation accuracy and overcome the random outliers in wideband mmWave systems. In Figure 7, we compare the NMSE versus the number of the pilot for the proposed CE algorithms as well as the BCRB. It can be seen that the NMSE of FVB algorithm is closer to Comparison of the NMSE versus number of random outliers for the proposed CE algorithms and the BCRB. The SNR is assumed as -5 dB. The number of the training frames is set as 80. The number of the outliers is ranged from 0 to 10 the BCRB than that of SS-FVB algorithm. This is because the SS-FVB algorithm does not exploit all the information from the support sets to estimate the common sparse channel vectors. Actually, employing a larger number of pilot can increase the estimation performance at the expense of both higher training overhead and computational complexity. Therefore, future research should consider ways of narrowing this gap and optimize the trade-offs between computational complexity and estimation performance by exploiting better recovery algorithms. Figures 8 and 9 show the NMSE of the proposed CE algo-rithms versus the SNR and the random outliers, respectively. Figure 7 shows that the NMSE values obtained by FVB and SS-FVB algorithms approximate closely to the BCRB within the whole SNR range. As shown in Figure 8, the identify-and-reject strategy enhances the robustness against hardware impairments for the proposed scheme, so that their performance has a slight degradation with the variation of the random outliers.

CONCLUSION
In this paper, we have proposed a novel CE scheme for wideband multi-user mmWave MIMO-OFDM systems. We have first proposed the FVB CE algorithm by exploiting the common sparsity of different sub-carriers. Considering the random outliers imposed by hardware impairments, a hierarchical channel model based on sparse Bayesian learning has been developed to improve the estimation accuracy. Besides, the power of the identify-and-reject scheme has enhanced the robustness of the FVB CE algorithm significantly. Then, the SS-FVB CE algorithm, which has developed the adaptive selection operation of the measurement matrices, has been proposed to reduce the computational complexity. This algorithm has provided the trade-off between computational complexity and estimation accuracy. Simulation results have verified that the proposed CE scheme is capable of achieving high estimation accuracy even at low SNR and pilot overhead. Future research can focus on balance the trade-offs between the computational complexity and estimation performance. It is also interesting to develop an inverse-free algorithm to reduce the computation complexity imposed by sparse Bayesian learning.