Beamspace Channel Estimation with Beam Squint Effect for the Millimeter-Wave MIMO-OFDM Systems

In this paper, we investigate two novel beamspace channel estimation schemes for millimeter-wave MIMO-OFDM systems relying on a uniform linear antenna array. Based on the sparse nature of the beamspace channel, compressive sensing-based methods can be adopted. Most of the existing wideband beamspace channel estimation schemes take the common support assumption that the support of all sub-carriers from a single path component keeps unchanged. However, due to the beam squint effect caused by time delays among different antennas, this assumption is no longer satisfied. Based on the above, we analyze the frequency-dependent sparse structure of the beamspace channel due to beam squint and propose the block simultaneous orthogonal matching pursuit algorithm where support is jointly estimated by using beamspace windows. Moreover, considering the power focusing effect of beam squint, we propose the two-stage beamspace channel estimation scheme, where the first stage is for the angle of arrival estimation, and the second stage is for updating path gain to reconstruct the beamspace channel. Simulation results demonstrate that proposed schemes can estimate the channel under the beam squint effect with high accuracy but low pilot overheads.


I. INTRODUCTION
W ITH the rapid popularization of mobile Internet applications and the widespread use of smart terminals, the demand for mobile data rates has increased exponentially. The massive multiple-input multiple-output (MIMO) technique is designed to improve spectrum efficiency in wireless communication due to its degree of freedom in the spatial domain [1]- [3]. Meanwhile, the millimeter-wave (mmWave) at 30-300 GHz can provide rich spectrum resources to support wideband transmission and achieve high data rates [4]- [7]. By combining mmWave wideband communication and the massive MIMO technique, both spectrum efficiency and data rates can be improved, which has attracted humongous attention in academia and industry communities [6].
For wireless communication systems, acquiring the instantaneous channel state information (CSI) has a decisive impact on system performance. However, channel estimation becomes a challenging problem when massive MIMO and mmWave technologies are introduced. On the one hand, the dimension of the channel matrix becomes significant with a large number of antennas at the base station (BS), which leads to colossal pilot overheads and high complexity if classic algorithms are adopted, e.g., least squares (LS) and minimum mean square error (MMSE) [8]. On the other hand, the severe path loss of the mmWave channel and the mobility of users bring difficulties to channel estimation.
To reduce the complexity of channel estimation, some authors propose various channel estimation schemes on the aspect of exploiting the sparsity in the angle domain and delay domain of mmWave channel [9]- [15], which are developed by compressive sensing (CS)-based methods. Particularly, [9] provides an orthogonal matching pursuit (OMP) solution for hybrid massive MIMO systems by using the angle-domain channel sparsity. The authors in [10] propose a joint chan-nel estimation and tracking scheme based on compressive sensing. In [11], a two-stage method is proposed for timevarying mmWave channels by exploiting the sparsity in the angle domain.
However, the methods in [9]- [11] are designed for narrowband systems. Although these methods can be extended to wideband systems via sub-carrier by sub-carrier processing, the complexity will increase due to the large number of subcarriers, especially in mmWave wideband systems. Recently, efficient wideband channel estimation schemes have been proposed for massive MIMO systems [12]- [15]. Specifically, an OMP based scheme is proposed in [12] by leveraging the sparsity of the mmWave channels in both time and frequency domains. [13] utilizes the basis expansion model to reduce the number of the coefficients to be estimated and then proposes a quasi-block simultaneous orthogonal matching pursuit algorithm to recover the channel. [16] investigates wideband beamspace channel estimation with a lens antenna array. An adaptive selecting network is designed at the first stage, and then a support detection-based channel estimation scheme is proposed by exploiting the structural characteristics of the mmWave beamspace channel. Besides, the authors in [14] propose a matrix-completion-based estimation approach based on a random spatial sampling structure. By exploiting the channel sparsity in both angle and delay domains, [15] adopts the maximum likelihood approach to obtain the super-resolution estimations of the AoAs.
However, the above-mentioned sparse estimation schemes have two vital drawbacks. On the one hand, they approximate the continuous parameter space to a finite set of grid points, introducing quantization error and degrading channel estimation performance. To deal with the grid mismatch problem, researchers propose off-grid schemes in [17], [18]. Specifically, [17] combines the smart perturbation mechanism and the simultaneous orthogonal matching pursuit (SOMP) algorithm to jointly solve off-grid parameters. In [18], the beamspace multiple signal classification (MUSIC) algorithm is implemented to estimate the angle of arrival (AoA) after antenna grouping. Although these methods have better performance than traditional ones, their complexity is generally high.
On the other hand, previous schemes [9]- [15] neglect the difference between the traditional MIMO channel model and the massive MIMO channel model. For large-scale antenna mmWave systems, time delay among antennas of the same physical path cannot be ignored due to a large number of antennas and a high sampling rate. The conventional massive MIMO channel model ignores delay differences among antennas due to a small number of antennas. Consequently, the channel estimation and precoding algorithms need to be revised. This phenomenon causes beam squint effect [19], [20] in the frequency domain, which is more obvious in wideband mmWave massive MIMO systems. The wideband solutions [12]- [15] suffer from obvious performance loss due to the common sparse support assumption is no longer satisfied. To deal with this problem, some solutions are pro-posed [21]- [26]. Specifically, [21] and [22] adopt the off-grid sparse Bayesian learning scheme to get continuous angledelay parameters, and [23] divides the entire bandwidth into multiple subbands, then obtains the joint estimation of the path directions by MUSIC algorithm. But the complexity of these two schemes is relatively high. Besides, authors in [24] propose a CS-based scheme that exploits the shiftinvariant block-sparsity of the beam squint channel model. In [25], with the help of the adaptive-updating dictionary, physical parameters of the channel can be extracted by a super-resolution CS algorithm. [26] proposes a simultaneous weighted OMP algorithm to deal with beam squint. However, none of these methods use the special sparse structure caused by the beam squint effect to reduce channel estimation complexity.
In this paper, we propose two schemes of beamspace channel estimation with beam squint effect for wideband mmWave massive MIMO systems: a CS-based algorithm and a two-stage framework. The main contributions of this paper are summarized as follows: 1) Based on the effect of beam squint, we firstly analyze a unique sparse structure of each path component of the beamspace channel, which is frequency-dependent. Besides, we define the beam distance as the support difference among different sub-carriers of each component, which can be mathematically calculated. 2) With the frequency-dependent sparse structure, we propose a block simultaneous orthogonal matching pursuit (BSOMP) scheme. For each path component, we jointly estimate sparse support to improve accuracy using the frequency-dependent sparse structure and then remove its influence to estimate the remaining path components. 3) We also propose a two-stage scheme, which decomposes the beamspace channel estimation into two subproblems, i.e., AoA estimation and path gain estimation. Due to the beam squint effect, we discover the property of power focusing, and then propose the off-grid AoA estimation algorithm by finding the index of antenna and sub-carrier with no power leakage. In the second stage, the channel gains are obtained by LS algorithm for reconstructing the beamspace channel.
The following of this paper is organized as follows. Section II introduces the channel model for the mmWave MIMO-OFDM system on a uniform linear antenna (ULA) array and introduces the effect of beam squint. Besides, the problem of beamspace channel estimation is formulated for singleantenna users. Section III proposes the BSOMP scheme to solve the problem of beamspace channel estimation with beam squint. In section IV, the two-stage scheme is proposed with the performance analysis. In Section V, simulation results are provided to demonstrate the advantages of our proposed scheme. Finally, the conclusion is given in Section VI.
Notations: Scalar is denoted in italics, while the lower-case and upper-case boldface denote a vector and a matrix, respec- An illustration of the signal delay a ULA with N antennas.
tively. For a matrix A, A T , A H , A −1 and A † denote the transpose, conjugate transpose, inverse, and pseudo inverse, respectively. A 2 and A F denote the spectral norm and Frobenius norm, respectively. For a vector a, a 2 denots the l 2 -norm. The cardinality of set S is denoted as |S|. A (S, :) and A (:, S) the sub-matrices of A consisting of the rows and columns indexed by S, respectively. Besides, I N denotes the N × N identity matrix. CN (a, R) denotes the complex Gaussian distribution with mean a and covariance R. E (·) denotes the expected value.

II. SYSTEM MODEL
We consider an uplink broadband time division duplexing (TDD) mmWave MIMO-OFDM system that consists of a BS equipped with N antennas and K users with a single antenna. Besides, OFDM modulation technology is adopted with M sub-carriers. In the following sub-sections, we will introduce the channel model of the beamspace domain and formulate the channel estimation problem.

A. BEAMSPACE CHANNEL MODEL
As shown in Fig.1, we take the time delay among antennas into consideration [27]. According to the far-field assumption that the distance between transmitter and receiver is much larger than the size of antenna array at BS, we have where τ k,l,n denotes the time delay from user k to n-th antenna of the BS for k ∈ {1, . . . , K}, l ∈ {1, . . . , L k } and n ∈ {1, . . . , N }, θ k,l denotes the AoA of l-th component from user k, and ϕ k,l = d sin θ k,l λc is defined as normalized AoA, τ k,l τ k,l,1 for simplicity. Besides, d is the antenna spacing of ULA and c denotes the speed of light.
Then the time-domain response of the uplink channel between user k and n-th antenna can be expressed as k,l e −j2π(n−1)ϕ k,l δ (t − τ k,l,n ), (2) whereg k,l g k,l e −j2πfcτ k,l is the equivalent complex gain of l-th path for user k and L k denotes the number of paths from user k to the BS. The frequency-domain response can be given as The channel response in frequency-domain of N antennas is represented by h k,m , i.e., where represents the spatial-domain steering vector which is relevant to the frequency of sub-carriers, and f m denotes the frequency of m-th sub-carrier. The widely-used MIMO channel model [28], [29] ignores the time delay among antennas, and the spatial-domain steering vector can be expressed as a(ϕ k,l ) = [1, e −j2πϕ k,l , . . . , e −j2π(N −1)ϕ k,l ], which is frequnecy-independent. But with the number of antennas and transmission bandwidth going large, the relationship max(τ k,l,n − τ k,l ) T s does not hold any more where T s refers to sampling interval. For example, in the mmWave massive MIMO systems, the BS equipped with 128-antenna ULA of half-wavelength spacing and operate at 28GHz with 4GHz bandwidth can induce the maximum delay of 9.07T s that is not negligible.
According to [30], we can get beamspace channel expression based on the discrete Fourier transform (DFT) matrix U, which consists of N orthogonal array response vectors whereφ n = 1 N (n − N +1 2 ) for n = 1, 2, · · · , N refers to the pre-defined direction. Consider the effect of frequencyselective fading, the beamspace channel of the m-th subcarrier can be expressed as k,l e −j2πτ k,l fmc k,l,m , (7) VOLUME 4, 2016 wherec k,l,m = U H Ξ m (ϕ k,l ) represents the l-th beamspace channel at sub-carrier m for user k. The channel model in (7) can be rewritten as a matrix form whereC k,m = [c k,1,m ,c k,2,m , . . . ,c k,L,m ] ∈ C N ×L , β k,m = [β k,1,m ,β k,2,m , . . . ,β k,L,m ] T ∈ C L×1 is the vector of path gains withβ k,l,m = N L kg k,l e −j2πτ k,l fm .

B. PROBLEM FORMULATION
In TDD systems, each user transmits their pilot in an orthogonal resource block. Therefore, the channel estimation of each user is independent of each other, and we omit subscript k in the following sections for notational simplicity. It should be noted that we focus on uplink channel estimation, and the downlink channel can be obtained by leveraging the channel reciprocity.
We denote x m,p as the transmitted pilot at the m-th subcarrier in the time slot p. Then, after receiver combining, cyclic prefix removal, and M -point DFT, the received signal at BS can be obtained as where W p ∈ C N RF ×N is the receiver hybrid combining matrix, and n m ∼ CN (0, σ 2 I N ) of size N × 1 is the noise vector with σ 2 representing the noise power.
Let P denotes the length of pilots and assume x m,p = 1 for p = 1, . . . , P . The overall received pilot signals at m-th sub-carrierȳ = [y T m,1 , y T m,2 , . . . , y T m,P ] T ∈ C N RF P ×1 can be obtained asȳ where Now, our target is the reconstruction of theh m based on y m andW for m = 1, 2, · · · , M .

III. THE BSOMP SCHEME FOR BEAMSPACE CHANNEL ESTIMATION WITH BEAM SQUINT
In this section, we first analyze the beamspace channel's unique sparse structure with beam squint effect. Then, the BSOMP algorithm is proposed for beamspace channel estimation with beam squint.

A. SPARSE STRUCTURE OF BEAMSPACE CHANNEL WITH BEAM SQUINT
In this sub-section, we analyze the sparsity ofh m and then prove that it exhibits a unique sparse structure, which is frequency-dependent.
We rewritec l,m as where sinc(x) = sinN πx sinπx represents the Dirichlet sinc function [31]. Most of the power ofc l,m is focused on a small number of elements according to the power-focusing property of sinc(x). Consequently,h m is a sparse vector due to the limited scattering in mmWave wireless systems, i.e., L is small [5], [32].
According to (12), the focused power is related to the frequency, which means the support ofh m is different among sub-carriers, i.e., where supp(x) represents index set of non-zero elements. In the narrowband or small scale of antenna array systems, the support ofh m is frequency-independent and usually takes the common support assumption that is no longer satisfied based on (13). For example, considering a system with AoA θ k,l = π/8, the number of ULA antennas N = 128, carrier frequency f c = 28 GHz and bandwidth B = 4 GHz, the normalized beamspace channel power distributions with antenna index over different sub-carriers is shown in Fig.2. It can be observed that the index of the antenna with the most power changes with sub-carriers. Specifically, the first sub-carrier has a strong beam at 116-th antenna, and the middle subcarrier at 119-th antenna, while the last sub-carrier at 123th antenna, which is different. Fig.3 depicts the relationship between the index of the antenna with the strongest power and the sub-carriers. It shows that the index of the strongest antennas varies with sub-carriers, which means the support of the beamspace channel is related to frequency.
Both figures show the impact of beam squint, which needs to be considered when we design schemes for beamspace channel estimation.
Based on the above analysis, we know that the assumption of common support is no longer satisfied among sub-carriers because of the beam squint effect. However, the beamspace channel still shows a unique frequency-dependent sparse structure which is the basis of the proposed BSOMP scheme. Lemma 1: With the l-th path component of the beamspace channel, the beam distance from m 1 -th sub-carrier to m 2 -th sub-carrier can be calculated as d l,m1,m2 = (m 1 − m 2 ) ηN ϕ l /f c where η represents the sub-carrier spacing. Proof 1: Firstly, we define the beam distance d l,m1,m2 as the index difference of the antenna that has the most power of l-th beamspace channel path between m 1 -th sub-carrier and m 2 -th sub-carrier, which can be expressed as  where n * l,m1 and n * l,m2 represents the index of antenna with largest power for m 1 -th, m 2 -th sub-carrier respectively.
According to the sparsity ofh m , we can find the n * l,m by n * l,m = arg min From (15), we can learn that the index n * l,m is relevant to the frequency of sub-carrier. It moves with f m , which indicates the common support assumption is no longer satisfied.
If the largest power antenna moves d l,m1,m2 with the subcarriers, the value of pre-defined direction changes as: So we have According to (17), we have the conclusion d l,m1,m2 = (m 1 − m 2 ) ηN ϕ l /f c .
Consider the effect of beam squint and Lemma 1, the d l,m1,m2 doesn't have to be an integer, and the fractional part d l,m1,m2 − d l,m1,m2 represents not all the beamspace channel power focus on the antenna of the index n * l,m where x represents that the largest integer smaller than x. Due to the limited resolution of pre-defined directions, some power will distribute on adjacent antennas, which is called the power leakage effect from another perspective. On the other hand, we can calculate the support of all sub-carriers if one specific sub-carrier's support is known.

B. PROPOSED BSOMP SCHEME
Sinceh m is a sparse vector, we intuitively adopt CS-based algorithms to solve the problem with a significantly reduced length of pilot signals, i.e., P (N/N RF ) [33], [34] . For the wideband systems, the CS-based schemes are proposed, such as SOMP [35] and OMP-based [12] algorithm. However, due to the beam squint effect, lthe common support among sub-carriers assumptipon of SOMP does not hold anymore, as shown in Fig.2 and Fig.3.
In this sub-section, we propose the BSOMP scheme to solve the beamspace channel estimation problem. Due to the frequency-dependent sparse structure of the beamspace channel, the BSOMP scheme jointly estimates the support of different sub-carriers for each path component to improve the accuracy. Although both SOMP and BSOMP estimate support jointly, the BSOMP scheme calculates each subcarrier's support after the jointly estimation result, and the SOMP scheme ignores the support difference among subcarriers, which leads to performance degradation. After support estimation, the LS algorithm is adopted to get beamspace channel response.
According to (12), the index of the strongest element n * l,m can be found as Based on Lemma 1, the index of the strongest element of the remaining sub-carriers can be calculated. For example, if we get n * l,1 , the others can be obtained as where we assume ϕ l ≈φ n * l,m . Then the support ofh m for l-th path component is where Θ N (x) = mod N (x − 1) + 1 represents the mod function whose target is to guarantee all elements in Ω l,m belong to {1, 2, ..., N }. It should be noted that the size of VOLUME 4, 2016 the support is determined by the parameter ω, and 96% beamspace channel power contained for ω = 4 and N = 256 scenario under common support assumption [16].
Removing the influence of former l paths, i.e.,h m (Ω) = 0, the support of (l + 1)-th path component can be obtained by repeating (18) and (20). Finally, the support ofh m is Ω = Ω 1 ∪ · · · ∪ Ω L . ConsiderH = [h 1 , . . . ,h M ], we design a beamspace window (BWin) to capture the beamspace channel power of l-th component, which is the region enclosed by several antennas and all M sub-carriers that expressed as The purpose of BWin is to determine the antenna region that most beamspace channel power-focused of l-th path component. Due to the beam squint effect, we use the average power focused on each antenna to measure the power of BWin captured, i.e., The key of BWin is to design d n . Based on Lemma 1, the maximal beam diatance happens when m 1 = M and m 2 = 1, i.e., It indicates the optimal d n is The relationship between target in (22) and d n is shown in Fig.4 when N = 128, M = 512, f c = 28 GHz and B = 4 GHz. It can be observed that target in (22) shows a good performance when d n satisfies (24). Besides, (24) and Fig.4 both show the optimal d n varying with different values of normalized AoA, which is caused by reason that different values of normalized AoA lead to different levels of power leakage beacause of beam squit.
After getting n * l , the sparse channel support of l-th component of other sub-carriers is obtained by (19). Then, the LS algorithm is adopted to get beamspace channel response. The pseudo-code of the BSOMP scheme is summarized in Algorithm 1.
Firstly, we initialize R = [r 1 , r 2 , . . . , r M ] = Y, where r m is the residual at sub-carrier m.
For l-th path component, we firstly adopt the BWins to capture beamspace channel power. Specifically, we calculate the the correlation matrix as A l =W H R in step 1. Note that the combine matrixW need to satisfied the condition of restricted isometry property (R.I.P), i.e.,W HW ≈ I N . Then, we generate N BWins Γ n = Θ N {n, ..., n + d n − 1} based on Lemma 1 in step 4, and obtain the index of strongest element of the l-th component at the first sub-carrier in step 6, i.e., The procedure above will be repeated until all L path components have been estimated. Finally, we combine the support of all L path components and independently estimate the beamspace channel, which is executed in steps 15 and 16.

C. ALGORITHM SUMMARY AND PERFORMANCE ANALYSIS
In this subsection, we summarize the proposed BSOMP scheme and analyze its Cramer-Rao lower bound (CRLB) and the computational complexity.
The main difference between the BSOMP scheme and conventional CS-based schemes is the support estimation. For instance, the OMP-based scheme estimates the support of different sub-carriers independently [12], which is vulnerable to noise and shows a poor performance at low signal-noise ratio (SNR) [36]. And the SOMP-based scheme assumes all sub-carriers share common support [35], which leads to serious performance loss due to the beam squint effect. By contrast, the BSOMP scheme jointly detects the support with beam squint by exploiting the frequency-dependent sparse structure of the wideband beamspace channel, which achieves a better performance. In addition, we assume the number of paths L is known as prior information that channel measurements can obtain [5]. For the cases where L is unknown, we can modify the loop condition of the algorithm [34]. Specifically, we define the channel estimation differ- Get sparse channel support of m-th sub-carrier Ω l,m based (19) and (20). 10 the number of loops. If the difference is smaller than a threshold ε, the procedure will be stopped.
Then, we discuss the the CRLB of the BSOMP scheme. We know that the noise in (10) after combining processing is still subject to Gaussian distribution, i.e.,n m ∼ CN 0 P N RF ×1 , σ 2 I P N RF [11]. Therefore, we give the conditional probability density function ofȳ m with givenh m , which is According to [37], the Fisher information matrix of (27) can be calculated as Based on the estimation theory [37], we have As for complexity analysis, we observe that the complexity of BSOMP is dominated by steps 2, 6, 10, 11 and 16 according to Algorithm 1. In step 2, the complexity of correlation matrix calculation is O (N N RF M P ). In step 6, the optimal n * l is found, which has the complexity in order of O (N M ). In step 10, the size ofW(:, Ω l,m ) is P N RF × (2ω + 1) and the size of r m is P N RF × 1. Hence, the multiplication gives a complexity in order of O N RF P ω 2 . Same as step 10, the step 11 calculate the multiplication between W(:, Ω l,m ) andŝ l,m , which gives a complexity in order of O (N RF P ω). In step 16, we compute the multiplication between W † (:, Ω m ) andȳ m , where the size of W † (:, Considering that there are L path components, steps 2,6,10 and 11 are executed for L times, while step 16 is executed once. So the total complexity of the BSOMP scheme is given as By contrast, the complexity of both the OMP-based and SOMP-based schemes can be presented as O M N RF P L 3 ω 3 + O (N M N RF P Lω) [11], [35], which is higher than that of the BSOMP scheme.

IV. THE PROPOSED TWO-STAGE BEAMSPACE CHANNEL ESTIMATION WITH BEAM SQUINT
For the BSOMP scheme, the size of support of all subcarriers are same, i.e., ω = ω 1 = · · · = ω M . According to Dirichlet sinc form of elements in (12), the level of power focusing is different among sub-carriers. Hence, ω is not suitable for all sub-carriers, which causes performance degradation. Considering this issue, in this section, we will introduce another property of beam squint, i.e., power focusing effect. Combining with the matrix form of beamspace channel expression, we propose a novel two-stage framework for beamspace channel estimation, including AoA estimation followed by path gain estimation.

A. THE POWER FOCUSING EFFECT OF BEAM SQUINT
In this subsection, we demonstrate the effect of power focusing, which can help us estimate the normalized AoA of l-th path component.
Due to the limited number of antennas, the pre-defined directionφ n are discrete values. In most cases, the value of 1 + fm fc ϕ l −φ n is not equal to zero, which causes the power leakage phenomenon. In traditional MIMO channel models, the level of power leakage of all sub-carriers is same. However, we can observe that the level of power leakage varies with frequency when the beam squint effect is taken into consideration.
Based on (31), we define the beam squint matrix Q l , whose column is [n * l,m , m] T for m = 1, 2, . . . , M . The value of 1 + fm fc ϕ l −φ n is different for different Q l (:, m). This variation rule is consistent with the Dirichlet sinc function. As shown in Fig. 5, 1 + fm fc ϕ l is located at the red region when n * l,m is found. The larger f m is, the value of 1 + fm fc ϕ l goes more closer toφ n if 1 + fm fc ϕ l <φ n . The step size depends on sub-carrier spacing η and the resolution of pre-defined direction that relies on the number of antennas N . When N and M are large, we will get the point where the entire channel power of the l-th path component is focused on one element with no power leakage, i.e., existing fm satisfies For the sub-carriers whose index satisfies m >m, the channel power of the l-th path component will gradually focus on the (n * l,m + 1)-th element. If M is large enough, we will get the next point with the whole channel power of the l-th path component, i.e., where fm 2 refers to the frequency of sub-carrier that entire channel power is focused for the second time.
For example, Fig.6 shows the normalized beamspace channel power distribution of l-th path component when the Besides, the entire beamspace channel power is concentrated at sub-carrier 22 for the first time. The same phenomenon is observed from antenna 117 to antenna 123, and the whole process presents the periodicity, which is defined as beam squint period (BSP). The BSP F bsp refers to the number of sub-carriers that satisfies d l,m1,m2 = 1, which can be calculated as Besides, the Fig.6 also shows the last entire beamspace channel power focuses at sub-carrier 512, antenna 123, and there are 7 complete BSPs when ϕ l = 0.4.

B. THE PROPOSED TWO-STAGE SCHEME
In this subsection, we propose a two-stage scheme to estimate the beamspace channel. The first phase is the normalized AoA estimation based on the power focusing effect, and the channel gain is obtained by the LS algorithm in phase two. For the l-th path component, the normalized AoA estimation is tightly related to the index of antennas that beamspace channel power focused. Meanwhile, the values of practical AoAs are continuous, resulting in beamspace channel power distribute more than one beam (antenna). Hence, the results of on-grid angle estimation approaches are inaccurate.
However, due to the power focusing effect of beam squint, the level of power leakage is not the same for different subcarriers, but it shows a tendency that the beamspace channel power slowly focuses on one antenna and then diverges with sub-carriers. The key idea of normalized AoA estimation is to find the index of antenna and sub-carrier that the entire channel power of l-th path component is focused on. Based on the analysis of the power focusing effect, we know that at least one complete BSP is needed to obtain the entire channel power and the corresponding normalized AoA, and with a larger number of sub-carriers or BSPs, the result becomes more accurate. The pseudo-code of the proposed two-stage beamspace channel estimation scheme is summarized in Algorithm 2.
Specifically, we firstly get the beam squint matrix Q l of the l-th path component based on Algorithm 1. Then, the sliding window (SWin) is adopted to capture the largest power point based on the effect of power focusing.
The SWin is defined as Υ m = Θ M {m − ∆ m , ..., m + ∆ m }, where ∆ m represents the size of SWin. For each complete BSP, we will get one point that the entire channel power of the l-th path component is focused on. Based on beam squint matrix Q l , we assume the range of n * l,m is from n l,s to n l,e (n l,e > n l,s ), and at least have n l,e − n l,s − 1 complete BSPs. For n l,s < n l,i < n l,e , we find the center point (n l,i , m l,i ) by After getting (n l,i , m l,i ), the normalized AoA can be calculated based on the following equations: where n 1 represents the index of the antenna at the first subcarrier with beam squint effect. By simplifying (36), the AoA can be expressed aŝ The final result for the l-th path component is obtained by average function, i,e., ϕ l = φ l,i n l,e − n l,s − 1 .
After the AoA estimation, we can getc l,m bŷ Then, the channel path gain can be obtained by the LS algorithm asβ Then, the influence of the l-th path components is removed in step 11. The beamspace channel can be recovered based on (8) after all path components have been estimated.

C. ALGORITHM SUMMARY AND PERFORMANCE ANALYSIS
In this subsection, we summarize the proposed two-stage scheme and analyze its CRLB and computational complexity. Note that the proposed two-stage scheme is different from conventional CS-based schemes. The CS-based schemes estimate the support by exploiting the sparsity in the angle or the time domain. By contrast, the proposed two-stage scheme aims to find out the largest element with no power leakage by using the power focusing effect of beam squint, which results in the normalized AoA can be estimated accurately. The advantage of the proposed two-stage scheme can be verified in the simulation results section.
Next, we give the the CRLB of the two-stage scheme when the prior knowledge of AoA is known.
Firstly, we rewrite (10) as whereC m is the ideal receive steering vector, andW is the combining matrix at receiver. The CRLB of the two-stage scheme can be given as follows. Lemma 2: By assuming the prior knowledge of AoA at BS, the CRLB of the two-stage scheme can be expressed as where D =C H mCm , G m =WC m and λ min (D) denotes the minimum eigenvalue of D. Proof 2: Please refer to Appendix A.
For the complexity analysis, the complexity of the proposed two-stage scheme is dominated by steps 2, 5, 9, 10, 11 and 17.

VOLUME 4, 2016
Step 2 has the complexity in order of O (N N RF M P ). In step 5, the complexity is dominated by (35), which has complexity in order of O (M ). The complexity of step 9, 10, 11 is O N 2 , O N P 2 N 2 RF and O (N P N RF ), respectively. In the end, the multiplication betweenĈ m of size N ×L andβ m of size L × 1 is required. Therefore, step 17 has complexity in order of O (N L).
Consider steps 2, 5, 9, 10 and 11 are executed L times, and step 17 is executed once. So the overall complexity of the two-stage scheme is Although the complexity of the two-stage scheme is higher than that of the other three schemes, it has the best performance among those schemes, which is demonstrated in the simulation section.

V. SIMULATION RESULTS
In this section, the simulation results are presented to evaluate the performance of the proposed beamspace channel estimation schemes. For the wideband mmWave systems, the BS equips the N = 128 ULA array and N RF = 8 RF chains to serve K = 8 single-antenna users with OFDM modulation, which has M = 512 sub-carriers. The carrier frequency is f c = 28 GHz, and the bandwidth is B = 4 GHz. For the channel parameters, the number of scattering paths is L = 3, the AoA is uniformly distributed in − π 3 , π 3 , the path gain β l is based on Gaussian distribution CN (0, 1), and the time delay is uniformly distributed in (0, 20 ns). While the SNR is defined as SNR = 1/σ 2 . Besides, we use P = 10 time slots for pilot transmission and set ω = 4 for the CSbased schemes. All simulation results are calculated over 1000 Monte Carlo trials.

A. AOA ESTIMATION PERFORMANCE COMPARISON
Firstly, we evaluate the performance of normalized AoA estimation. The Euclidean distance between the actual steering vector and estimated steering vector is defined to evaluate the accuracy of normalized AoA estimation [11]. Fig.7 shows the performance of normalized AoA estimation at different SNR levels. It can be observed that the performance of the proposed scheme is better than all considered schemes at all SNR regions, which shows the advantage of the AoA estimation scheme. Note that the proposed scheme in [11] and MUSIC algorithm [18] are designed for narrowband without the parameter for the number of sub-carriers. Fig.7 also shows that the performance of the proposed scheme becomes more accurate with N or M going larger. The reason is that the quantization error decreases with the increase of the number of antennas and sub-carriers, which is caused by the calculation of pre-defined directions and beam distance. Moreover, the proposed scheme has a preformance  floor when SNR is high, caused by the quantization error due to the limited number of antennas and sub-carriers. Fig.8 illustrates the d AoA against pilot length P of the proposed normalized AoA estimation scheme when SNR is set to 0 dB. It can be shown that systems of considered parameters have lower d AoA as the pilot of length P going larger. Besides, the accuracy of the proposed normalized AoA estimation scheme becomes more precise with the increase of the number of antennas and sub-carrier, which is consistent with the previous analysis. Moreover, Fig.8 indicates that we can increase the number of antennas and sub-carriers to get more accurate results instead of increasing the pilot length.
In addition to the Euclidean distance as evaluation criteria, we also evaluate the performance of normalized AoA estimation from another perspective. The probability of successful normalized AoA estimation is defined as the ratio of the number of successful experiments to the total number of trials. Note that we consider the normalized AoA estimation is successful when |ϕ l −φ l | < 0.001 due to the limited number of antenna and sub-carriers. According to Fig.9, it can be observed that the performance of the proposed AoA estimation scheme is better than all considered schemes, especially in the low SNR region. The reason is that we adopt the sliding window method, which can lessen the impact of noise. Moreover, the performance of the MUSIC algorithm is poor, and the success probability is only 42% when the SNR is high. This is because the step size of the peak search limits performance. Generally speaking, the smaller the step size is, the better the performance and the higher complexity.

B. CHANNEL ESTIMATION PERFORMANCE COMPARISON
In this subsection, we demonstrate the performance comparisons of beamspace channel estimation. The normalized mean square error (NMSE) is adopted as performance evaluation criteria, which is defined as Fig.10 shows the performance of all considered schemes in wideband wireless communication systems. From Fig.10, we can see that the performance of the proposed two-stage scheme is better than other schemes in all SNR regions, and the performance of the BSOMP scheme is better than other CS-based schemes and close to the oracle LS scheme. Specifically, the NMSE of the OMP-based scheme is not satisfying when SNR is low because it does not use the unique sparse structure of the wideband beamspace channel. For the SOMP-based scheme, the performance decline when SNR goes high, which is caused by the reason the common support assumption is no longer valid because of the beam squint effect. By contrast, the BSOMP scheme has better accuracy than both OMP and SOMP based schemes since it considers the sparse structure of the wideband beamspace channel. The BSOMP scheme can achieved the NMSE quite close to the oracle LS scheme, which means the support detection is accurate. Besides, the proposed two-stage scheme has the best performance since it exploits the sparse structure in the normalized AoA estimation phase and reconstructs the channel according to the channel model. The four CS-based schemes reach the NMSE floor when SNR is high since they only consider the wideband beamspace channel's nonzero elements and ignore low power elements, which induce the error. And the proposed two-stage scheme also reaches the NMSE floor when SNR is high. This can be explained by the fact that there is quantization error at the normalized AoA estimation phase due to the limited number of antennas and sub-carriers, which cannot be eliminated. Fig.11 depicts NMSE comparison against the bandwidth B when the SNR is 20 dB, and other simulation parameters remain the same as Fig.10. When B is small, the NMSE of SOMP is close to the BSOMP scheme. However, the performance of SOMP decreases as B increases and is even worse than the OMP scheme when B is larger than 3GHz. This can be explained by the fact that for small B, the level of beam squint is less pronounced and the support at different sub-carriers is similar. As B increases, the support of the wideband beamspace channel at different sub-carriers will be more divergent, and the common support assumption will be more unreliable. However, the performance of BSOMP and two-stage scheme do not degrade with B, which means both our schemes are robust to bandwidth even if the level of beam squint is not pronounced. Fig.12 provides NMSE performance against pilot length P when SNR is 20dB, and other parameters remain the same as in Fig.10. It can be observed that all considered schemes have better NMSE as pilot length becomes longer.  For the same pilot length P , the two-stage scheme always has better NMSE than all considered schemes, and the BSOMP scheme is better than other CS-based schemes. For different lengths of the pilot, the NMSE gap between the proposed two-stage scheme and other schemes is quite large. This suggests that the two-stage scheme can efficiently reduce the pilot overheads for the beamspace channel estimation. Fig.13 shows the achievable sum-rate with different channel estimation schemes, where the hybrid precoding method in [38] is adopted. From Fig.13, we learn that the system achieves a higher sum rate by utilizing the proposed twostage scheme, which is very close to one with the perfect channel when SNR is low (e.g., 0 dB) and moderate (e.g., 15 dB). The system using the BSOMP scheme has a higher sum rate than that uses other CS-based schemes. Besides, the sum rate of using a two-stage scheme at 0 dB is greater than that of other CS-based schemes at 15 dB, which indicates the two-stage scheme performs well, especially at low SNR. Moreover, the performance of all schemes varies with the channel estimation SNR. This can be explained by the fact that the sum rate is dominated by channel estimation error when the SNR for data transmission is high.

VI. CONCLUSION
In this paper, we investigate the beamspace channel estimation problem for the mmWave MIMO systems relying on the ULA array. We propose a BSOMP scheme and a novel two-stage beamspace channel estimation framework. For the BSOMP scheme, the support is detected by selecting the BWins generated to capture a single path beamspace channel power. For the proposed two-stage scheme, the normalized AoA is estimated by utilizing frequency-dependent sparse structure and power focusing effect of beamspace channel, aiming to find the no power leakage antenna among all sub-carriers. After getting normalized AoA, the path gain is updated in the second phase. The simulation results show that the performance of the BSOMP scheme is better than other CS-based schemes, and the two-stage scheme is better than other schemes in all considered SNR regions at the same pilot overheads. In future work, we will consider the beam squint effect when the reconfigurable intelligent surface (RIS) is introduced and propose a suitable algorithm for channel estimation and phase design of RIS. . where inequality (a) obtained by Rayleigh-Ritz ratio [9].
Same as in performance analysis of the BSOMP scheme, the Fisher matrix is where G m =WC m . Therefore, the CRLB ofβ is obtained by can be calculated as where the inequality (a) derives from the arithmeticharmonic means inequality [39] and λ i (·) denotes the i-th eigenvalue. Substituting (49) and (48) into (46), we finally get the CRLB of the two-stage scheme.