Next Article in Journal
Monitoring Movements of Ataxia Patient by Using UWB Technology
Next Article in Special Issue
Design and Optimization for 77 GHz Series-Fed Patch Array Antenna Based on Genetic Algorithm
Previous Article in Journal
On-Line Visual Tracking with Occlusion Handling
Previous Article in Special Issue
Iterative Analog–Digital Multi-User Equalizer for Wideband Millimeter Wave Massive MIMO Systems
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

FDD Channel Estimation Via Covariance Estimation in Wideband Massive MIMO Systems

Department of Computer Engineering & CITIC Research Center, University of A Coruña, 15001 Galicia, Spain
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Sensors 2020, 20(3), 930; https://doi.org/10.3390/s20030930
Submission received: 10 January 2020 / Revised: 26 January 2020 / Accepted: 6 February 2020 / Published: 10 February 2020
(This article belongs to the Special Issue Millimeter-Wave Antenna Arrays: Design, Challenges, and Applications)

Abstract

:
A method for channel estimation in wideband massive Multiple-Input Multiple-Output systems using hybrid digital analog architectures is developed. The proposed method is useful for Frequency-Division Duplex at either sub-6 GHz or millimeter wave frequency bands and takes into account the beam squint effect caused by the large bandwidth of the signals. To circumvent the estimation of large channel vectors, the posed algorithm relies on the slow time variation of the channel spatial covariance matrix, thus allowing for the utilization of very short training sequences. This is possibledue to the exploitation of the channel structure. After identifying the channel covariance matrix, the channel is estimated on the basis of the recovered information. To that end, we propose a novel method that relies on estimating the tap delays and the gains as sociated with each path. As a consequence, the proposed channel estimator achieves low computational complexity and significantly reduces the training overhead. Moreover, our numerical simulations show better performance results compared to the minimum mean-squared error solution.

1. Introduction

Massive Multiple-Input Multiple-Output (MIMO) and millimeter wave (mmWave) technologies are promising candidates to satisfy the demands of future wideband wireless communication systems. A common feature of these technologies is the deployment of antenna arrays with a large number of elements while keeping the size of the antenna aperture small. This way, advantages such  as  large beamforming gains necessary to compensate the propagation losses at mmWave frequencies, or the ability to support many spatially multiplexed streams, are efficiently achieved [1,2]. In addition, it has been shown that in multi-cell massive MIMO systems the capacity increases without bound as  the number of antennas increases, even under pilot contamination, if the channel covariance matrices of the contaminating users are asymptotically linearly independent [3]. Measurements campaigns have proved that this is usually the case in practice [4].
Deploying massive antenna arrays raises concerns in terms of hardware cost and power consumption, especially at mmWave frequencies. To alleviate these requirements, hybrid architectures with an analog preprocessing network operating in the Radio Frequency (RF) domain have been proposed to reduce the number of complete RF chains [2]. Hybrid approaches imply stringent limitations, such as the reduced spatial dimensionality and the lack of flexibility of the analog part, usually implemented with a Phase Shifter (PS) network. Thus, results obtained for fully digital scenarios are not applicable to the hybrid case in general, and new precoding or combining solutions are needed.
The benefits of large antenna arrays rely on the accuracy of the Channel State Information (CSI). Considering wideband systems with multicarrier modulations, this information should be acquired for the whole frequency band. However, acquiring the CSI is challenging in this  scenario, especially when the system operates in Frequency-Division Duplex (FDD) mode. In this  case, if the channel is sounded in the downlink, the length of the training sequence depends on the number of transmit antennas [5]. Thus, most prior art on channel estimation focused on systems operating on Time-Division Duplex (TDD) mode [6]. Moreover, the aforementioned restrictions related to hybrid architectures are even more challenging on wideband scenarios, since the analog network is frequency-flat [7]. This‘is incompatible with methods independently designing the spatial signature of the training sequences for each subcarrier, such as those employed in [8,9], or fully digital narrowband solutions. Therefore, hybrid approaches usually compensate these limitations with longer training overheads [10]. This problem is even more relevant if large bandwidths are considereddue to  the beam squint effect [11], which causes that the channel statistics change in each subcarrier.
A common approach to reduce the training overhead is to as sume that the channel is sparse [12,13,14,15]. Some authors conducting measurement experiments reported sparsity at mmWave frequencies, with a few channel propagation paths and small angular spreads [16,17]. Nevertheless, measurement campaigns on sub-6 GHz frequencies with local scatterers around the Base Station (BS) show that these channels are not sparse in general [10,18]. As a consequence, dropping the sparsity as sumption enables a wider applicability, and allows us to circumvent the power leakage typically arisingdue to  basis mismatching.
Other authors consider that the channel second order statistics are known in a dvance. Unfortunately, these channel statistics have to be acquired too in practical settings. In case the channel is almost spatially uncorrelated or Rician-distributed, it has been proved that it is a good approximation to estimate only the diagonal elements of the covariance matrix of the channel [19,20]. Under correlated channels with Rayleigh fading, the training sequences are designed according to the main eigenvectors of the downlink channel covariance [21,22]. However, these training sequences cannot be used in hybrid architectures given their hardware constraints. In TDD scenarios, this covariance can be computed from the uplink [23,24] and extrapolated to the downlink by means of channel reciprocity. However, uplink and downlink channels are different in FDD mode, and in that case angular reciprocity is assumed [25,26,27,28,29]. However, the computational complexity of these solutions for fully-digital narrowband scenarios is high, and they require an additional training period in the downlink to obtain the instantaneous channel realizations.
Channel covariance estimation techniques usually rely on the assumption of Toeplitz covariance matrices, a circumstance that occurs in usual antenna arrangements, such as Uniform Linear Array (ULA), and channel as sumptions, such as Rayleigh fading and the one-ring channel model [19,21,27]. This structure can be exploited to design short training sequences based on sparse rulers. Previous works designed these rulers using coprime arrays [30,31], but this strategy cannot attain the shortest possible training lengths for any number of antennas. There are works that consider more efficient ruler designs for small covariance matrices [32,33], but they lack an efficient method to design these training sequences for an arbitrary number of antennas.
Regarding frequency selective channels, solutions have been proposed [15,34,35]. In [34], authors address covariance estimation in TDD mode for hybrid architectures. The proposed algorithm is an application of Orthogonal Matching Pursuit (OMP) and solves the Multiple Measurement Vector (MMV) problem by allowing for a dynamic sensing matrix. In addition, this method is extended to wideband scenarios under the assumption of common subcarrier support, as  done in [8,14,15]. Conversely, authors consider approximately frequency flat second order channel statistics in [35]. These  as sumptions are impractical for scenarios where the signal bandwidth is as large as  several GHz [10] because of the beam squint effect. Recently, refs. [28,29] considered the wideband scenario under beam squint and relied on angle reciprocity and channel sparsity to estimate the downlink channel in FDD. However, this approach shares the same drawbacks as  other works based on similar as sumptions, namely, large computational complexity and additional training in the downlink, as  explained above.
In this work, we propose a downlink channel estimation method for wideband massive MIMO systems over hybrid architectures based on the explicit acquisition of the channel covariance matrix. With the aim of achieving low training overheads, the proposed method employs training sequences based on sparse rulers. The considered setup typically arises in mmWave frequencies, but the proposed framework is general enough to be applicable in sub-6 GHz bands since channel sparsity is not required. Indeed, our approach only as sumes (1) the channel is wide-sense stationary over a period of time, and (2) the channel covariance matrix has a Toeplitz structure. Moreover,  we consider the nonlinear spatial transformation caused by the beam squint effect. In the following, we summarize the contributions of our work:
  • We propose a wideband downlink channel covariance estimation method for hybrid FDD massive MIMO systems that can be utilized for mmWave frequencies. The covariance matrices for different subcarriers lie in distinct subspacesdue to  the beam squint effect. The proposed covariance estimation algorithm implements dictionary rotations to overcome this difficulty and exploit the common structure of the wideband covariance matrices.
  • Differently from the prior art, and with the aim of reducing the training overhead, Wichmann sparse rulers [36] are used to design the training sequences. Prior work made use of coprime arrays, but this solution cannot attain the lowest training lengths for a given number of transmit  antennas. Other works that relied on sparse rulers did not provide methods to design the training sequences for a number of antennas arbitrarily large.
  • We analyze the limitations of the proposed method based on MUltiple SIgnal Classification (MUSIC): (1) The low rank characteristic of the sample covariance matrices limits the applicability of MUSIC-like techniques. The use of sparse rulers make it possible to extend the proposed method using spatial smoothing techniques and allows for even shorter training sequences;  (2) Prior knowledge on the number of channel paths. We show numerically that the number of channel paths can be estimated without significant performance losses and negligible computational cost. (3) The dictionary dependency. We provide a discussion on dictionary sizes, posing that moderated sizes provide enough angular resolution. Furthermore, since sparsity is not a requirement for the proposed method, we show empirically how to avoid the power leakage arising when the Angle of Departures (AoDs) do not lie on the dictionary.
  • We propose a low complexity channel estimator, leveraging on the structure of the wideband channels to avoid the individual per-subcarrier procedure. This estimator obtains the delays and gains corresponding to each channel propagation path exploiting underlying the common information for all the subcarriers. Moreover, the estimation is performed using the training sequence employed for the acquisition of the covariance matrices, and it does not require an additional training stage.

Notation

In this work, scalars are represented by lower case letters, vectors by bold lower case letters, and matrices by bold capital letters. The superscripts T and  H are the transpose and the Hermitian operators, respectively. The operator | · | with a scalar argument represents the absolute value, · with a vector argument represents its norm and  · F with a matrix argument the Frobenius norm. The operator · represents the element-wise floor operation. A B is the Kronecker product of matrices A and  B , and  A B the Khatri–Rao product (or the column-wise Kronecker product) of matrices A and  B . diag · is the diagonal matrix with the arguments in its main diagonal, and  blockdiag · constructs a diagonal supermatrix in which the diagonal elements are given by the matrices in the argument, and the off-diagonal elements are zero matrices. 0 and  1 denote the vectors of zeros and ones of the right dimension, respectively. The constant c is the speed of light.

2. System Model

We consider a hybrid FDD massive MIMO system with M transmit antennas and  N R F RF chains communicating with several single-antenna users over a multipath channel. To overcome the channel frequency selectivity, we assume an Orthogonal Frequency-Division Multiplexing (OFDM) modulation with  N c subcarriers and a cyclic prefix large enough to avoid intercarrier and intersymbol interference. One of the aims of this work is to estimate the downlink channel covariance; hence, as suming that the channel is locally stationary, we will consider several blocks k { 1 , , K } with the duration of the channel coherence time. At the beginning of each block, the BS transmits a training sequence of length T tr OFDM symbols to sound both the covariance and the channel. The remaining time is devoted to data transmission. That is, we collect K wideband observations of size T tr during K training periods. Since the block duration is limited by the channel coherence time, the training sequence length must satisfy T tr M due to  the large antenna array deployed at the BS. Indeed, we circumvent the requirement of training sequence lengths larger than the number of paths L, i.e., we propose solutions to the scenario T tr L .
Finally, we will as sume that the users feed the observations back to the BS using, e.g., scalar quantization [8,37]. With the received feedback, the BS estimates the channel covariance matrices for all users and frequency subcarriers. This scheme allows for tracking the channel covariance matrices using subsequent received feedbackdue to  the slow variation of the channel statistics. As previously mentioned, these slow changes make channel covariance acquisition simpler than estimating the channel itself. Next, with the aid of the obtained channel covariance matrices, we estimate the wideband channels employing the  K aforementioned observations.

2.1. Channel Model

We represent the channel response to an arbitrary user using a model similar to that in [34,38]. We as sume a multipath channel model with N taps, where the gains for all antennas in the f-th subcarrier during the k-th coherence block are given by
h k ( f ) = n = 0 N 1 h n , k ( f ) e j 2 π f n / N c , f = 0 , , N c 1 ,
where h n , k ( f ) C M are the channel coefficients of the n-th tap of the channel. Each tap, in turn, is decomposed into L paths as 
h n , k ( f ) = l = 1 L g l , k p rc ( n T s τ l ) a ( ϑ l , f ) ,
where the  a ( ϑ l , f ) C M are the steering vectors for the AoD of the l-th path ϑ l , g l , k N C ( 0 , σ l 2 ) and  τ l are the complex channel gain and the relative delay corresponding to the l-th path, respectively, and  p rc ( t ) is the transmit filter as sociated with the Digital to Analog Converter (DAC) sampled with time interval T s . We as sume that the channel gains for different paths are statistically independent, i.e.,  E [ g l g l * ] = σ l 2 δ ( l l ) .
Regarding the steering vectors, we consider a ULA configuration together with the so-called beam squint effect that arises for the typical antenna apertures in massive MIMO and large relative signal bandwidths B / f c [11]. We express this effect as  a rotation of the steering vectors with respect to the central frequency f c , i.e.,
a ( ϑ l , f ) = Γ ( ϑ l , f ) a ( ϑ l ) ,
where a ( ϑ l ) C M is the steering vector for  ϑ l in a  ULA, whose elements are [ a ( ϑ l ) ] m = e j 2 π λ d ( m 1 ) sin ϑ l , m = 1 , , M , with d is the distance between two consecutive array elements, λ is the carrier wavelength, and  Γ ( ϑ l , f ) is a diagonal matrix with the beam squint rotation coefficients. Under the common as sumption of inter-antenna distance d = c 2 f c , the elements of  Γ ( ϑ l , f ) read as 
[ Γ ( ϑ l , f ) ] m , m = e j π sin ϑ l ( m 1 ) 1 + Δ f N c B f c , m = 1 , , M ,
where Δ f B N c is the frequency offset of the f-th subcarrier with respect to  f c , and  Δ f = f N c 1 2 is its relative position to the central one. This frequency-dependent rotations allows for exploiting the common structure of the wideband covariance matrices, since the beam squint effect for the subcarriers is isolated as  a rotation of the frequency-independent vectors a ( ϑ l ) .
Combining Equations (1) and (2), the channel frequency response for the f-th subcarrier reads as 
h k ( f ) = l = 1 L g l , k β l ( f ) a ( ϑ l , f )
where N is the number of delay taps and  β l ( f ) = n = 0 N 1 p rc ( n T s τ l ) e j 2 π f n / N c .
Since the actual AoD of the channel are unknown, the linear combination of steering vectors in (5) can be approximated by means of the steering vectors as sociated with G predefined angles θ 1 , , θ G comprised in a  frequency-dependent dictionary A ( f ) = [ a ( θ 1 , f ) , a ( θ G , f ) ] , that is,
h k ( f ) = l = 1 L g l , k β l ( f ) a ( ϑ l , f ) A ( f ) B ( f ) g k ,
where g k C G is a vector with L non-zero entries in the positions corresponding to the AoDs, i.e.,  g k N C ( 0 , G ) with  G = diag ( σ 1 2 , , σ G 2 ) , and  B ( f ) = diag ( β 1 ( f ) , , β G ( f ) ) . Note that equality holds for (6) if the AoDs lie on the dictionary. According to the assumptions regarding the channel statistics, we obtain h k ( f ) N C ( 0 , C h ( f ) ) . Moreover, we use the approximation in (6) yielding
C h ( f ) A ( f ) B ( f ) G B H ( f ) A ( f ) H = A ( f ) G ˜ ( f ) A ( f ) H ,
with C h ( f ) being Toeplitz, and  G ˜ ( f ) = B ( f ) G B H ( f ) the effective gains for the f-th subcarrier. In the following, we will consider that the AoDs belong to the dictionary, i.e., we as sume C h ( f ) = A ( f ) G ˜ ( f ) A ( f ) H . The consequences of this as sumption are discussed in Section 4.4.

2.2. Channel Observations

We aim to estimate the channel response using hybrid architectures to reduce the power consumption and implementation costs. This is achieved by using a number of RF chains significantly lower than the number of antennas, i.e.,  N RF M . As a counterpart, the feasible hybrid precoding vectors are restricted to the subspace imposed by the hardware limitations. Considering hybrid precoding, the signal received by the users during the k-th training block in the f-th subcarrier is
φ k ( f ) = T ( f ) h k ( f ) + v k ( f ) = X ( f ) P h k ( f ) + v k ( f ) , f = 0 , , N c 1 , k = 1 , , K ,
where v k ( f ) N C ( 0 , σ v ( f ) 2 I ) is the noise, T ( f ) = X ( f ) P C T tr × M is a matrix whose rows are the training vectors for each of the  T tr training channel uses, such that T T ( f ) = [ t 1 ( f ) , , t T tr ( f ) ] , t t ( f ) C M , X ( f ) = diag ( x 1 ( f ) , , x T tr ( f ) ) contains the transmitted pilot symbols in the f-subcarrier satisfying | x t ( f ) | = 1 , and  P T = [ p 1 , , p T tr ] is the hybrid precoder. Therefore, each of these training vectors is the result of the product
t t ( f ) = p t x t ( f ) = P A , t p D , t x t ( f ) , t = 1 , , T tr ,
where p t is the configuration of the precoders during the t-th training period of k-th training block, and  P A , t C M × N RF and  p D , t C N RF denote the analog and digital precoding matrices. Recall that  P A , t is flat in the frequency domain. Hence, it is not possible to design independent training sequences for each subcarrier.
After the transmission of the  T tr vectors and the subsequent feedback stage, the channel observation φ k ( f ) of the k-th training period will be available at the BS. Since the channel and the noise vectors are statistically independent, the covariance of the observations is given by
C φ ( f ) = T ( f ) C h ( f ) T ( f ) H + σ v ( f ) 2 I .
By considering the channel covariance matrix model in (7), we rewrite the previous equation as 
C φ ( f ) = Ψ ( f ) G ˜ ( f ) Ψ H ( f ) + σ v ( f ) 2 I ,
where we introduce the measurement matrices Ψ ( f ) = [ ψ 1 ( f ) , , ψ G ( f ) ] = T ( f ) A ( f ) . When the K samples of the random variable of (8) are available at the BS, C φ ( f ) is estimated. Furthermore, given that both Ψ ( f ) and  σ v ( f ) are known, determining C φ ( f ) is equivalent to estimating G ˜ ( f ) .

3. Training Design

This section is devoted to the design of training sequences to estimate the channel covariance, together with bounds on their length to guarantee its correct identification under the assumption of a  Toeplitz structure.
Toeplitz covariance matrices of size M × M have M different coefficients and a training sequence of length T tr allows for defining ( T tr 2 + T tr ) / 2 equations. This establishes the following trivial lower bound on the training length: T tr 2 M + 1 / 4 1 / 2 . A necessary condition to achieve this bound is that the training sequence should allow for measuring the correlation between pairs of antennas for any separation. This has a strong connection with complete sparse rulers, defined mathematically as  an ordered integer sequence of with T elements R = ( r 1 , , r T ) , such that r 1 = 0 and any z r T can be written as  z = r i r j for some r i , r j R . In this case, we speak of a complete sparse ruler of length r T , and a perfect ruler is a complete ruler that contains the minimum possible number of elements for a given r T . It has been shown that a training sequence built from a perfect sparse ruler is sufficient for determining Toeplitz covariance matrices [32,33]. In that case, the length of the training sequence is equal to the number of elements of the ruler, i.e.,  T tr = T , and the last element of the ruler is  r T = M 1 .
In the following, we summarize some results found in the literature for sparse rulers. First, there exist bounds on the size of perfect rulers for a given length [39].
Theorem 1.
(see [39]) Denoting l ( r ) as  the minimum number of marks of a complete sparse ruler of length r, lim r l 2 ( r ) r exists and
2.434 lim r l 2 ( r ) r 375 112 3.348 .
Hence, it is possible to identify a Toeplitz covariance matrix using sparse rulers with a training length that satisfies 2.434 M T tr 3.348 M when M approaches infinity.r.
Utilizing the bounds provided by (12), we evaluate the efficiency of the training sequences employed in previous works on AoD estimation [30,31]. Let us start by introducing a trivial example of a complete sparse ruler of length m 2 , for any arbitrary m. Using the  2 m 1 marks ( 0 , 1 , 2 , , m 1 , 2 m 1 , 3 m 1 , , m 2 1 ) provides a length of  m 2 1 different differences. This complete ruler achieves a value of  lim m ( 2 m 1 ) 2 m 2 = 4 , larger than the bound of (12). A similar result is obtained with the use of coprime arrays [30,31]. This method is based on two co-prime numbers p and q, i.e.,  mcd ( p , q ) = 1 , and on generating the ruler as  the set { m p : 0 m < q } { n q : 0 n < p } . This method achieves a training length of  T tr = p + q and allows for identifying p q different lags. Thus, the limiting function reads as  ( p + q ) 2 p q . Setting p = q = M provides the bound lim M ( 2 M ) 2 M = 4 . Hence, coprime arrays cannot improve the asymptotic efficiency of the previous trivial complete ruler.
In the literature, we can find some techniques to build sparse rulers for arbitrary lengths satisfying the bound in (12). The following section summarizes a method to obtain rulers that are close to that bounds with very low computational complexity.

3.1. Wichmann Rulers

Wichmann [36] proposed a direct method to obtain sparse rulers of any size that, denoted as  the differences between consecutive terms, have the form
W = ( 1 , , 1 r , r + 1 , 2 r + 1 , , 2 r + 1 r , 4 r + 3 , , 4 r + 3 s , 2 r + 2 , , 2 r + 2 r + 1 , 1 , , 1 r )
for some positive integers r and s such that 2 r 2 s 2 r + 4 . The number of elements of this ruler is  l ( r , s ) = 4 r + s + 3 and its length n ( r , s ) = 4 r 2 + 8 r + 3 + ( 4 r + 3 ) s . Hence, if the elements of  W are denoted as  w i , i = 1 , , l ( r , s ) , the sparse ruler is  R = ( r 1 , , r l ( r , s ) + 1 ) with  r i = r i 1 + w i , i = 2 , , l ( r , s ) + 1 and  r 1 = 0 .
With a change of variable s = s + ( 2 r 2 ) , the length expression changes to
n ( r , s ) = 12 r 2 + 6 r 3 + ( 4 r + 3 ) s = n lo ( r ) + ( 4 r + 3 ) s , s = 0 , , 6 .
The parameter r roughly adjusts the length to intervals starting at n lo ( r ) . Then, for each r value, the  s parameter finely adjusts length in steps of size 4 r + 3 . This allows for finding parameters r and  s that generate a sparse ruler whose length is as close as  possible to a given value.
Finally, the ruler can be completed up to the desired length, relying on the following observation: it  is always possible to build a ruler that contains all differences in the interval [ a , b ] with a set of the form R = ( 0 , 1 , 2 , , c , b c ( c + 1 ) , , b 2 c , b c , b ) , with  c = b a . The cardinality of this set is at most 2 c + 3 . In this case, given a Wichmann ruler R = ( r 1 , , r l ( r , s ) + 1 ) of length n ( r , s ) , and a set R that contains all differences in the interval [ n ( r , s ) , M ] , a new ruler extended to length M can be found by  R R .

3.2. Training Based on Sparse Rulers

A training sequence to estimate a Toeplitz covariance matrix can be built from a sparse ruler. In that case, the length of the ruler must be equal to  M 1 , the number of correlation coefficients. Given a sparse ruler of length M 1 with elements ( r 1 , , M 1 ) Z T tr , a training sequence for a Toeplitz covariance matrix can be designed as  T T ( f ) = [ e r 1 + 1 , , e M ] , where e i is the M-length indicator column vector of zeros and a one in the i-th element.
In the case of Wichmann rulers, it is not possible to obtain them for arbitrary lengths, but instead a length close to a given value. Hence, to obtain a ruler close to length M 1 , the r parameter can be  found as  the largest such that n lo ( r ) M 1 , which is  r * = 12 M + 33 3 12 . The other parameter is the largest s such that n ( r * , s ) M 1 , implying that s * = M 1 n lo ( r * ) 4 r + 3 . The length of the ruler can be completed up to  M 1 by the method detailed in the previous section. Since the Wichmann rulers always contain entries from 0 to r, and the s parameter increases length in steps of  4 r + 3 , this completion procedure will add always at most 4 r + 3 r = 5 marks to the ruler, and the minimum training period will be in the interval T tr [ l ( r * , s * ) , l ( r * , s * ) + 5 ] . Figure 1 shows the training length obtained with this strategy compared to the upper and lower bounds in (12). It also shows the length achieved with the coprime arrays strategy.
In the following, we show that the mild as sumption N RF 2 is enough to obtain the sparse ruler training sequence T ( f ) according to (9). Let ϕ r , r = 1 , , N RF be blueeligible arbitrary phases for the PSs in the analog processing network, and  P A , t T = [ p A , t , 1 , , p A , t , M ] , with the  p A , t , m T C N RF is the m-th row of the analog precoding matrix for the t-th channel training use. Since the desired training based on sparse rulers must satisfy t t ( f ) = e r t x t ( f ) , we obtain P A , t p D , t ( f ) = e r t as 
p A , t , m T = [ e j ϕ 1 , e j ϕ 1 , e j ϕ 2 , , e j ϕ N RF 1 ] , if [ e r t ] m = 1 e j ϕ 1 , e j ( ϕ 1 + π ) , e j ϕ 2 , , e j ϕ N RF 1 ] , otherwise ,
with the digital part as  p D , t ( f ) = 1 2 e j ϕ 1 [ 1 , 1 , 0 , , 0 N RF 2 ] T . Note that only two elements of  p A , t , m T are  employed to achieve P A , t p D , t ( f ) = e r t . Indeed, N RF = 2 is enough to apply (15).

4. Covariance Matrix Estimation

In this section, we propose algorithmic solutions to estimate the downlink covariance in wideband massive MIMO systems under the beam squint effect. It is apparent that per-subcarrier (or per-sub-band) covariance estimation is not a practical approach since it does not exploit the structure of the wideband channel. Furthermore, system scalability is compromised when the number of subcarriers is large. The proposed method is based on the estimation of the subspace spanned by the steering vectors corresponding to the L AoD of the channel propagation paths. Well-known estimation methods like MUSIC have been employed to estimate a single signal subspace. However, the beam squint effect results in different subspaces for the channel covariance matrices at distinct frequencies. Resorting to the rotation matrices in (4), we extend MUSIC to the considered setup.

4.1. Subspace Estimation

To estimate the subspace, let us start by computing the eigendecomposition of the sample covariance matrices C ^ φ ( f ) = U ( f ) Λ ( f ) U H ( f ) for all the frequencies f. Given that the covariance converges to the second order moment of the channel for a large number of samples, we have C ^ φ ( f ) C φ ( f ) when K . Next, the eigenvectors corresponding to the L largest eigenvalues are discarded to obtain the basis U ¯ ( f ) that spans the noise subspace of the f subcarrier. Due  to the beam squint effect, the relationship among dictionary matrices for different frequencies is nonlinear. Therefore, it is not feasible to perform this step jointly for all the subcarriers unless the relative signal bandwidth B / f c is very small. A remarkable feature of this subspace discrimination is its lack of dependency with the SNR, as  long as  the noise is spatially white. Under such as sumption, for the approximation in (11), we get
C φ ( f ) = Ψ ( f ) G ˜ ( f ) Ψ H ( f ) + σ v ( f ) 2 I T tr = U ( f ) ( Λ ( f ) + σ v ( f ) 2 I T tr ) U H ( f ) .
Therefore, it is crucial to force this condition by applying a pre-whitening filter, if necessary. According to this observation, the search is performed in the effective subspace, contrarily to OMP-based methods like Covariance OMP (COMP) [34]. Thus, we identify the support S = { θ s 1 , , θ s L } , with  s i { 1 , , G } i , with the following wideband estimator function
J i = f = 1 N c 1 ψ i H ( f ) U ¯ ( f ) 2 2 , i = 1 , , G ,
where ψ i ( f ) = T ( f ) a ( θ i , f ) . Note that this function greatly increases the robustness of the decision, compared to a per-carrier estimator because it exploits the common structure for all the frequencies, given that the vectors a ( θ i , f ) are generated from a common dictionary as  shown in (3). Finally, the support S is identified as  the L largest values of  J i .

4.2. Estimating Gain Variances

Once the angles are determined, we construct the matrices Ψ S ( f ) = T ( f ) [ a ( θ s 1 , f ) , , a ( θ s L , f ) ] . Recall that the path delays τ l of (2) are unknown, as  well as  the matrices B ( f ) in (7). Therefore, we aim at estimating the per-carrier gain variances G ˜ ( f ) in (7). In particular, we denote by  G ^ S ( f ) the estimated gain variances as sociated with the angles in the support S . Then, it is possible to use the estimator in [40], i.e., G ^ S ( f ) = Ψ S ( f ) ( C ^ φ ( f ) σ v ( f ) 2 I ) ( Ψ S ( f ) ) H . This solution clearly depends on the rank of  Ψ S ( f ) , and cannot be applied in the following situations:
  • L > T tr . This scenario corresponds to short training sequences compared to the number of propagation paths. Under this as sumption, the pseudo-inverse Ψ S ( f ) cannot recover the L variance gains.
  • L T tr . In this case, the rank of  Ψ S ( f ) depends on the choice of the dictionary matrix, A , for a proper training matrix T . Recall that the dictionary contains non-orthogonal vectors when M is finite. When the distance between two consecutive angles in the dictionary is small, a situation that arises when G is large, this may lead to dictionary matrices such that rank ( A ) min { M , G } , thus making Ψ S H ( f ) Ψ S ( f ) ill-conditioned.
To circumvent (2), one possibility is to reduce the dictionary size (see Section 4.4). Alternatively, we propose to exploit the diagonal structure of  G ˜ ( f ) and estimate it as
vec ( G ^ S ( f ) ) = ( Ψ S * ( f ) Ψ S ( f ) ) vec ( C ^ φ ( f ) σ v ( f ) 2 I T tr ) ,
with ∘ the Khatri–Rao product (or the column-wise Kronecker product). The product Ψ S * ( f ) Ψ S ( f ) produces L-rank matrices when the columns of  Ψ S ( f ) are not co-linear (see Appendix A).
Regarding the channel gain variances, it is again possible to exploit the channel structure to increase the estimation accuracy. Consider the relationship between the values of  β l ( f ) across frequencies, in particular,
β l ( f ) = n = 0 N 1 p rc ( n T s τ l ) e j 2 π f n / N c = n = 0 N 1 p rc ( n T s τ l ) e j 2 π ( f ) n / N c * = β l * ( f ) ,
and, denoting f = f + N c , we obtain β l ( f ) = β l * ( f ) . Recall now that the L non-zero diagonal elements of  G ˜ ( f ) are  β l ( f ) σ l β l * ( f ) = | β l ( f ) | 2 σ l , l = 1 , , L . This implies that G ˜ ( f ) = G ˜ ( f ) , and  we can increase the robustness of the estimated gain variances in (18) with
vec ( G ^ S ( f ) ) = 1 2 ( Ψ S * ( f ) Ψ S ( f ) ) vec ( C ^ φ ( f ) σ v ( f ) 2 I ) + ( Ψ S * ( f ) Ψ S ( f ) ) vec ( C ^ φ ( f ) σ v ( f ) 2 I ) .
Observe that not all the gains can be calculated by means of the latter expression. In the case of odd N c , f = 0 has to be determined using (18), whereas f = 0 and  f = N c / 2 have to be computed via (18) for  N c even.
The proposed method, labeled as  DR-W-MUSIC (Dictionary-Rotation Wideband MUSIC), is summarized in Algorithm 1.
Algorithm 1 Dictionary-Rotation Wideband MUSIC
1:
g [ σ 1 2 , , σ G 2 ] T
2:
for all f { 1 , , N c } do
3:
    C ^ φ ( f ) 1 K k = 1 K φ k ( f ) φ k H ( f )
4:
    U ¯ ( f ) compute noise basis (16)
5:
end for
6:
for all i { 1 , , G } do
7:
    J i 0
8:
for all     f { 1 , , N c } do
9:
        a ( θ i , f ) Γ ( θ i , f ) a ( θ i ) rotate dictionary
10:
        ψ i ( f ) T ( f ) a ( θ i , f )
11:
        J i J i + ψ i H ( f ) U ¯ ( f ) 2 2
12:
  end for
13:
end for
14:
S L larger values on  J i
15:
for all f { 1 , , N c / 2 } do
16:
    Ψ S ( f ) T ( f ) [ a ( θ s 1 , f ) , , a ( θ s L , f ) ]
17:
    Ψ S ( f ) T ( f ) [ a ( θ s 1 , f ) , , a ( θ s L , f ) ]
18:
    vec ( G ^ S ( f ) ) , vec ( G ^ S ( f ) ) compute using (20)
19:
end for

4.3. Dictionary-Rotation Wideband Spatial Smoothing

Note that Algorithm 1 requires rank ( C ^ φ ( f ) ) > L , i.e., the number of estimated paths (or angles) is bounded by the number of training periods and the number of observations, i.e., L min { T tr , K } . Solutions to address rank deficient scenarios have been proposed in [41,42]. In particular, these solutions apply when rank ( C ^ φ ( f ) ) < L . Nevertheless, in the proposed scenario, it is desirable to reduce the training sequence length T tr as much as possible. Unfortunately, we obtain a more involved setup for rank ( C ^ φ ) = T tr < L . This scenario can be addressed by using spatial smoothing.
Spatial Smoothing (SS) is an array processing technique based on nested arrays with different antenna separation [43]. An effect similar to that of the nested arrays is obtained thanks to the use of sparse rulers [44]. Again, the proposed methods addressed the identification of signals lying in a single subspace, and we have to resort to dictionary rotations to exploit the wideband channel structure.
We start by introducing the sample covariance matrix vectorization y ( f ) = vec ( C ^ φ ( f ) ) , i.e,
y ( f ) = ( Ψ * ( f ) Ψ ( f ) ) diag ( G ˜ ( f ) ) + σ v ( f ) 2 e ,
where e = [ e 1 T , , e T tr T ] T . Observe that, due to the utilization of a perfect sparse ruler training T ( f ) , ( Ψ * ( f ) Ψ ( f ) ) contains the products corresponding to the spatial differences z { M + 1 , M 1 } . Conventional SS in [44] discards the repeated distances. However, this means employing 2 M 1 elements out of T tr 2 from y ( f ) in (22). Hence, we propose to refine the estimation by averaging over all the samples from the same inter-antenna distance. To that end, let us introduce y z j ( f ) , j = 1 , , J z , z = M + 1 , , M 1 as the j-th sample of the z-th difference in y ( f ) . Note that J z represents the number of samples available for the z-th difference, and its values depend on the sparse ruler structure and T tr . Next, we introduce the reduced vector y ˇ ( f ) C 2 M 1 with sorted elements [ y ˇ ( f ) ] z = 1 J z j = 1 J z y z j ( f ) corresponding to the 2 M 1 distances. Accordingly, we define the matrix Z ( f ) containing 2 M 1 sorted rows from ( Ψ * ( f ) Ψ ( f ) ) . Thus, the reduced vector is
y ˇ ( f ) = Z ( f ) diag ( G ˜ ( f ) ) + σ v ( f ) 2 e M .
These differences in y ˇ ( f ) are next seen as a phase shift of the M differences to be estimated. Considering M overlapping subarrays, such that the m-th subarray comprises the differences { M + m , m 1 } in y ˇ m ( f ) C M , the spatial smoothed matrix is obtained as follows: Y ˇ ( f ) = 1 M m = 1 M y ˇ m ( f ) y ˇ m H ( f ) . As shown in [43], Y ˇ ( f ) = Y ˇ 1 / 2 ( f ) Y ˇ 1 / 2 ( f ) where the matrix Y ˇ 1 / 2 ( f ) can be rewritten as
Y ˇ 1 / 2 ( f ) = 1 M A ( f ) G ˜ ( f ) A H ( f ) + σ v ( f ) 2 I M .
Due to the beam squint effect, (23) has to be computed individually for each subcarrier f. Nevertheless, we desire to achieve the benefits of exploiting the structure of the wideband channel. In this regard, we employ M Y ˇ 1 / 2 ( f ) = U ( f ) Λ ( f ) U H ( f ) as the sample covariance matrix in line 3 of Algorithm 1. Then, following Algorithm 1 steps, the L eigenvectors corresponding to the L largest eigenvalues are discarded to obtain an incomplete basis U ¯ ( f ) spanning the noise subspace. Hence, the wideband estimator incorporating the dictionary rotation capability of line 11 is updated as J i = J i + a H ( θ i , f ) U ¯ ( f ) 2 2 , where θ i is the i-th angle contained in the dictionary matrix A . To estimate the variance gains, we follow a similar procedure as that in Section 4.2 to derive the robust estimator
vec ( G ^ S ( f ) ) = 1 2 ( A S * ( f ) A S ( f ) ) vec ( Y ˘ ( f ) ) + ( A S * ( f ) A S ( f ) vec ( Y ˘ ( f ) ) ,
with A S ( f ) = [ a ( θ s 1 , f ) , , a ( θ s L , f ) ] and Y ˘ ( f ) = M Y ˇ 1 / 2 ( f ) σ v ( f ) 2 I M . This procedure replaces lines 16 to 18, and completes the posed DR-W-SS method.

4.4. Dictionary Size

In this section, we provide some insight regarding the choice of the dictionary size G. Let us start by determining a sufficient condition for unique angular identification. To that end, we first define the Kruskal rank of a matrix X , krank ( X ) . If krank ( X ) = x , then x is the maximum number such that any x columns of X are linearly independent.
Theorem 2.
In the noiseless scenario, and assuming AoD lying on the dictionary A ( f ) and rank ( C ^ φ ( f ) ) = L , the inequality L < krank ( Ψ ( f ) ) guarantees the unique identification of the L AoD [41]. See [45] for the proof.
Therefore, when krank ( A ) is small, the probability of identifying false directions increases. To illustrate the dependence with G, we compute in Figure 2 an upper bound of the Kruskal rank for G and M. This upper bound is computationally feasible and takes into account only z consecutive columns, a reasonable approximationdue to the structure of A . As shown in the figure, large G values complicate the AoD detection, which suggests to reduce the dictionary size. However, this strategy entails a power leakagedue to basis mismatching, that is, a performance lossdue to off-grid AoDs. However, the basis mismatching is moderate in practical settingsdue to the finite angular resolution of the ULA. For instance, a typical inter-antenna spacing d = λ / 2 entails an angular resolution of about 2 M radians [46], leading to G Δ ϑ π 360 º M for an angular range Δ ϑ . That is, G M provides enough resolution to cover the sector Δ ϑ = 120 º . In addition, an alternative to increasing the dictionary size is to overestimate the number of channel propagation paths L. This strategy is explored in Section 6.2.

5. Delay-Gain Channel Estimator

As a means to estimate the channel, we propose a (DG) Delay-Gains estimator that exploits the structure of the wideband channels. This method employs the observations for all the subcarriers φ k ( f ) to estimate the frequency independent gains of each propagation path g k (cf. (6)). Thus, even in the case where certain frequencies present very poor SNR, their corresponding channels can be estimated by using the information from the remaining frequencies.
The covariance estimation procedure provides the measurement matrix Ψ S ( f ) and the estimated gain variances G ^ S ( f ) for all the frequencies. Using this information, we estimate the delay matrices B ( f ) and the gain vector g k from the observations gathered in the covariance estimation stage. Next, a robust low complexity estimator is calculated for the frequency selective channels.
First, notice that the observations in (8) can be rewritten as φ k ( f ) = Ψ S ( f ) B S ( f ) g S , k + v k ( f ) , where Ψ S ( f ) was introduced in Section 4.2, B S ( f ) = diag ( β s 1 ( f ) , , β s L ( f ) ) and g S , k contains the gains for the positions s 1 , , s L of g k . Then, we focus on the unknown vector ζ k ( f ) = B S ( f ) g S , k that contains the channel gains affected by the shaping pulse. Accordingly, the linear Minimum Mean Squared Error (MMSE) estimator of ζ k ( f ) reads as
ζ ^ k ( f ) = G ^ S ( f ) Ψ S H ( f ) Ψ S ( f ) G ^ S ( f ) Ψ S H ( f ) + σ v ( f ) 2 I T tr 1 φ k ( f ) .
Once ζ ^ k ( f ) is determined, we have to identify the matrices B S ( f ) before performing the estimation of g S , k jointly using the vectors ζ ^ k ( f ) for all the frequencies. Since B S ( f ) contains samples of the shaping filter p rc ( · ) in the frequency domain, we obtain B S ( f ) by estimating the delays corresponding to the L channel paths. In particular, to determine delay for the l-th path τ l , we collect the entries as sociated with the l-th path for all the frequencies, i.e., [ ζ ^ k ( f ) ] l for f = 1 , , N c , in a vector. Then, we multiply this vector times the first N columns of the Discrete Fourier Transform (DFT) matrix, F C N c × N , that is
ζ l , k = F H [ ζ ^ k ( 1 ) ] l , , [ ζ ^ k ( N c ) ] l T .
Observe now that ζ l , k = g l , k p rc ( τ l ) + w k contains the aforementioned N stacked pulse samples p rc ( τ l ) = [ p rc ( τ l ) , p rc ( T s τ l ) , , p rc ( N 1 ) T s τ l ] T , and the estimation noise w k N C ( 0 , σ w 2 I N c ) . Unfortunately, these samples are scaled by the unknown complex-valued channel gain g l of (2). Note that we assume that the error is statistically independent for different frequencies. Hence, for given shaping filter p rc ( t ) , we propose the following estimator function for τ
D ( τ ) = k = 1 K | p rc T ( τ ) ζ l , k | p rc ( τ ) 2 ζ l , k 2 ,
where p rc ( τ ) = [ p rc ( τ ) , p rc ( T s τ ) , , p rc ( N 1 ) T s τ ] T contains the pulse samples for the delay τ . Unfortunately, D ( τ ) is a non-monotonic function. Thus, we minimize D ( τ ) over τ , e.g., by linear search over [ 0 , ( N 1 ) T s ] to obtain the delay estimate τ ^ l . In the high SNR regime, it is enough to sound within the interval corresponding to the two samples of p rc ( τ ) with larger absolute values.
Recall that the channel gains are multiplied times B S ( f ) in the unknown vector ζ k ( f ) of (24). Thus, employing the estimated delays τ ^ l , we compute B ^ S ( f ) for all frequencies (see (6)). To eventually estimate the channel gains g S , k , we compute the minimum variance unbiased estimator [47] using information from all the frequencies, i.e., g ^ S , k = B ^ ζ ^ k , where B ^ = [ B ^ S T ( 1 ) , , B ^ S T ( N c ) ] T and ζ ^ k = [ ζ ^ k T ( 1 ) , , ζ ^ k T ( N c ) ] T . Finally, by invoking (6), the estimated channel yields
h ^ k ( f ) = A S ( f ) B ^ S ( f ) g ^ S , k .

6. Simulation Results

The following setup is considered for the numerical experiments. We as sume a training sequence T generated from a Wichmann ruler of length T tr . The number of transmit antennas is M = 200 , while we consider a hybrid architecture using N RF = 2 RF chains, N c = 64 subcarriers, and a dictionary of size G = 400 with equally spaced angles within the range [ 0 , π ] . We consider the channel covariance model of (6) with L propagation paths uniformly distributed in [ 0 , π ] ; different number of training periods K, and N = 8 random delay taps uniformly distributed in [ 0 , N 1 ] T s , with T s = 1 / B . The numerical results are averaged over 200 channel covariance realizations for independent users, and over the different subcarriers N c . To evaluate the accuracy of the covariance identification, we employ the Normalized Mean Squared Error (NMSE) metric defined as NMSE = C ^ h C h F 2 C h F 2 with the Frobenius norm · F . In the case of channel estimation, we use the efficiency η [ 0 , 1 ] given by η = | h ^ H h | h ^ h , which is similar to the figures of merit in e.g., [26,34,48]. This insightful metric evaluates the signal power lost by beamforming using the estimated channels. Note that this metric is averaged over the number of channel blocks.
As a benchmark, we include an LS (known θ ) curve that individually estimates the gain variances for each subcarrier using (18) as suming that the channel angles are known. That is, it does not exploit the symmetry with respect to the central frequency of (19) to estimate the variance gain.

6.1. Wideband Covariance Estimation

Figure 3a shows the NMSE obtained with the proposed algorithms. The training sequence length is set to T tr = 25 and the channel parameters are adjusted according to massive MIMO settings with f c = 5 GHz, B = 20 MHz, L = { 15 , 30 } channel propagation paths, and a SNR of 15 dB. Using the same training sequence, we compare our methods with COMP in [34], which supports wideband estimation in the absence of beam squint. For L = 15 , the proposed methods achieve better performance than COMP when the number of training blocks K is large, but the accuracy of the estimation deteriorates when K < 20 . This behavior is motivated by the lack of similarity between the sample covariance matrix C ^ φ ( f ) and the actual channel covariance matrix C φ ( f ) that makes difficult the discrimination of the signal and noise subspaces (see Algorithm 1). Nevertheless, values of K above 50 are feasible in practical settings, since K is not restricted by the channel coherence time. We also include the case of L = 30 channel paths in Figure 3a. Under such as sumption, COMP and DR-W-MUSIC do not applydue to the lack of angular sparsity, and the limited rank of the sample covariance matrices, respectively. Conversely, the DR-W-SS approach performs close to the benchmark with known angles, denoted by LS (known θ ).
Figure 3a shows the NMSE for a mmWave setup with L = 5 , SNR = 0 dB, a small signal bandwidth B = 200 MHz and f c = 28 GHz. The robustness against the noise of the proposed strategies is remarkable, especially in the case of DR-W-SS. On the contrary, COMP does not achieve good performance results in this SNR regime compared to the proposed methods.
To evaluate the performance impact of the training sequence length T tr , we fix the parameters to M = 100 , L = 50 , SNR = 15 dB and K = 50 in Figure 4a. The training sequence is shorter than the number of paths and takes the values 19, 25, 32, 40, 50, to show the capability of DR-W-SS in the challenging scenario L > T tr . As shown, the proposed method performs better than the LS benchmark with known support S .
We now evaluate the consequences of neglecting the beam squint effect for the setup considered in Figure 3a, that is, f c = 28 GHz and different signal bandwidths, B = 200 , 800 , 1200 , 2000 MHz. The methods referred to as W-MUSIC consist of Algorithm 1 ignoring the dictionary rotation. Figure 4b exhibits the performance impact of beam squintdue to the lack of a common subspace for different frequencies. As shown, the proposed dictionary rotation is able to exploit the structure even in the case of B = 2000 MHz.

6.2. Covariance Subspace Estimation

In the previous numerical examples, the number of paths L was as sumed to be known. Subspace estimation methods, like the one in [49], can be used to estimate L by comparing the differences between consecutive eigenvalues of the sample covariance matrix with a threshold ϵ . In the wideband case, we employ the information of all frequencies jointly, to increase the detection accuracy. Moreover, the computational cost over DR-W-MUSIC and DR-W-SS is negligible while it is significant for other approaches not performing a subspace discrimination.
Figure 5a shows the robustness against uncertainty in the number of channel propagation paths L. We employ Algorithm 2 in [49] and empirically adjust the parameter ϵ for the different estimation methods. For the considered scenario with SNR = 0 dB, we set the threshold as ϵ = 0.25 for DR-W-MUSIC and ϵ = 0.01 for DR-W-SS. Remarkably, the performance obtained with subspace estimation for DR-W-SS is similar to that of the strategies with known L. In particular, for DR-W-MUSIC, overestimating the number of channel paths L improves the performance. This observation reveals that the support identified by Algorithm 1 with known L was incorrect. Recall that the support S comprises the angles corresponding to the L larger values on the estimator function J i . Under the strong noise conditions considered in Figure 5a, some angles belonging to the covariance support might be ignored (see line 14 of Algorithm 1). Instead, step 14 selects angles in the dictionary whose steering vectors are linearly dependent. Thus, overestimating L enables a better detection of the channel covariance subspace where angles with lower J i are included in S . In addition, we compare the effect of discarding the repeated distances in DR-W-SS, as performed in conventional SS [44]. In Figure 5a, this strategy is labeled DR-W-SS Discard. As shown, for T tr = 50 , the performance loss is significant.
In Section 4.4, we discussed the advantages and practical orientation of moderate dictionary sizes. Nevertheless, we investigate the effect of off-grid AoDs in Figure 5b for the mmWave setup and B = 200 MHz. The error floordue to basis mismatch is clear. As an alternative to increase G, we propose to compensate the basis inaccuracy by overestimating the number of channel paths L. This way, part of the power leakage is recovered with the additional AoDs. These approaches are denoted DR-W-MUSIC Overestimated L and DR-W-SS Overestimated L. For the LS estimator with known angles, we choose the closest dictionary angles to the real ones.

6.3. Wideband Channel Estimation

Figure 6a shows the performance of the channel estimator with the covariance identified and the two different channel estimation methods: LMMSE [21,25,47]
h ^ k ( f ) = C ^ h ( f ) T H T C ^ h ( f ) T H + σ v ( f ) 2 I T tr 1 φ k ( f )
and the proposed DG estimator of (27). The number of channel blocks is set to K = 100 , and the number of delay candidates τ considered in (26) is 20 N. Since the proposed DG method uses the information of all the subcarriers to produce the estimates, it is very robust against noise and results in a considerable performance gain. Indeed, more than 80 % of the signal power is captured at 10 dB with the proposed strategies.

6.4. Computational Complexity

Table 1 summarizes the computational complexity of the proposed methods. Recall that a fast covariance estimation is important, but not as critical as the actual channel estimation. Indeed, we only employ a training stage contrarily to other approaches like [21,37].
For the favorable scenario, DR-W-MUSIC(a) and DR-W-SS(a), the gain variances are estimated using well-conditioned measurement matrices, while (18) with the Khatri–Rao product is considered for MUSIC(b) and DR-W-SS(b). The complexities have been calculated as suming T tr > L for DR-W-MUSIC and COMP. It is noticeable that neither DR-W-MUSIC nor COMP depend on the number of antennas, although both of them linearly depend on the dictionary size G. The number of channel taps N is also relatively small, and T is the number of τ candidates evaluated to estimate the delays using (26). Since COMP and DR-W-MUSIC apply to scenarios where L and T tr are very small compared to M or G, the complexity of the latter is smaller in general. Furthermore, the number of steps performed by COMP is larger, and its computational burden depends on the number of training periods K.
When L < M , the computational complexity virtually depends on the number of antennas. Hence, DR-W-SS and DR-W-MUSIC establish a trade-off among T tr , L and M, since we could use the less complex option by adjusting T tr and K. The effect of increasing L is depicted in Figure 6b. Interestingly, in the region where the channel is sparse and Dynamic COMP (DCOMP) applies, L < 10 , the computational complexity of the proposed approaches is lower than that of DCOMP.
We now differentiate between the operations computed once using the estimated covariance matrix, referred to as “Computing Estimator”, and the channel estimation for each observation, referred to as “Channel Estimation”. Remarkably, the complexity of computing the proposed estimator (DG) linearly depends on M, cf. (24)–(26) whereas for the LMMSE estimator this dependency is quadratic.
Regarding the actual channel estimation, in scenarios suitable for DR-W-MUSIC with L < T tr , DG channel estimation is more efficient. This applies to mmWave frequencies and other massive MIMO scenarios, e.g., [8,37,50,51,52,53]. Conversely, LMMSE estimator might be less complex for large number of channel propagation paths, i.e., L > M .

7. Conclusions

In this work, we proposed a covariance and channel estimation method for FDD wideband hybrid massive MIMO systems. The numerical experiments showed the good performance of the posed covariance estimation approaches compared to other methods in the literature, even when considering very short training sequence lengths. In addition, we developed a robust channel estimator with computational complexity depending linearly on the number of antennas. Both facts contributed to a reduction in the channel training overhead. Finally, we evaluated the performance of the proposed scheme for a wide variety of scenarios. The extension to Uniform Planar Array (UPA) antenna arrangements, and the use of shorter training sequences is an interesting research line for future works.

Author Contributions

Conceptualization, J.P.G.-C. and P.S.-C.; formal analysis, J.P.G.-C. and P.S.-C.; investigation, J.P.G.-C. and P.S.-C.; writing—original draft preparation, J.P.G.-C., P.S.-C., P.M.C. and L.C.; writing—review and editing, P.M.C. and L.C.; funding acquisition, P.M.C. and L.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work has been funded by Xunta de Galicia (ED431G2019/01), Agencia Estatal de Investigación of Spain (TEC2016-75067-C4-1-R, RED2018-102668-T, PID2019-104958RB-C42), and ERDF funds of the EU (AEI/FEDER, UE).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

To prove the linear independence, we consider two cases: (1) If the columns of Ψ S ( f ) are linearly independent, the same property holds for the columns of Ψ S * ( f ) Ψ S ( f ) . (2) Consider that the i-th column of Ψ S ( f ) , ψ S i for notation simplicity, is a linear combination of the columns whose indices are in the set T S , i.e., ψ S i = j = 1 | T | c j ψ T j with ψ T j 0 , j T . Then, for the i-th column of ( Ψ S * ( f ) Ψ S ( f ) ) , we obtain
j = 1 | T | c j * ψ T j * n = 1 | T | c n ψ T n = j = 1 | T | n = 1 | T | c j * ψ T j * c n ψ T n .
Notice that it is not possible to write this column as a linear combination of the form j = 1 | T | b j ψ T j * ψ T j unless ψ T j = n = 1 | T | c n ψ T n , j . This condition holds only if ψ S i = c j ψ T j .

References

  1. Rusek, F.; Persson, D.; Lau, B.K.; Larsson, E.G.; Marzetta, T.L.; Edfors, O.; Tufvesson, F. Scaling Up MIMO: Opportunities and Challenges with Very Large Arrays. IEEE Signal Process. Mag. 2013, 30, 40–60. [Google Scholar] [CrossRef] [Green Version]
  2. Alkhateeb, A.; Mo, J.; González-Prelcic, N.; Heath, R.W. MIMO Precoding and Combining Solutions for Millimeter-Wave Systems. IEEE Commun. Mag. 2014, 52, 122–131. [Google Scholar] [CrossRef]
  3. Björnson, E.; Hoydis, J.; Sanguinetti, L. Massive MIMO Has Unlimited Capacity. IEEE Trans. Wirel. Commun. 2018, 17, 574–590. [Google Scholar] [CrossRef] [Green Version]
  4. Gao, X.; Edfors, O.; Tufvesson, F.; Larsson, E.G. Massive MIMO in Real Propagation Environments: Do All Antennas Contribute Equally? IEEE Trans. Commun. 2015, 63, 3917–3928. [Google Scholar] [CrossRef] [Green Version]
  5. Hassibi, B.; Hochwald, B.M. How much training is needed in multiple-antenna wireless links? IEEE Trans. Inf. Theory 2003, 49, 951–963. [Google Scholar] [CrossRef] [Green Version]
  6. Marzetta, T.L. Noncooperative Cellular Wireless with Unlimited Numbers of Base Station Antennas. IEEE Trans. Wirel. Commun. 2010, 9, 3590–3600. [Google Scholar] [CrossRef]
  7. Mailloux, R. Phased Array Antenna Handbook; Artech House: Norwood, MA, USA, 2018. [Google Scholar]
  8. Gao, Z.; Dai, L.; Wang, Z.; Chen, S. Spatially Common Sparsity Based Adaptive Channel Estimation and Feedback for FDD Massive MIMO. IEEE Trans. Signal Process. 2015, 63, 6169–6183. [Google Scholar] [CrossRef]
  9. Luo, X.; Zhang, X.; Cai, P.; Shen, C.; Hu, D.; Qian, H. DL CSI acquisition and feedback in FDD massive MIMO via path aligning. In Proceedings of the Ninth International Conference on Ubiquitous and Future Networks (ICUFN), Milan, Italy, 4–7 July 2017; pp. 349–354. [Google Scholar] [CrossRef]
  10. Björnson, E.; Der Perre, L.V.; Buzzi, S.; Larsson, E.G. Massive MIMO in Sub-6 GHz and mmWave: Physical, Practical, and Use-Case Differences. CoRR 2018. [Google Scholar] [CrossRef] [Green Version]
  11. Van Trees, H.L. Optimum Array Processing; Wiley: Hoboken, NJ, USA, 2002. [Google Scholar] [CrossRef]
  12. Neumann, D.; Wiese, T.; Utschick, W. Learning the MMSE Channel Estimator. IEEE Trans. Signal Process. 2018, 66, 2905–2917. [Google Scholar] [CrossRef] [Green Version]
  13. Gao, Z.; Hu, C.; Dai, L.; Wang, Z. Channel Estimation for Millimeter-Wave Massive MIMO with Hybrid Precoding Over Frequency-Selective Fading Channels. IEEE Commun. Lett. 2016, 20, 1259–1262. [Google Scholar] [CrossRef] [Green Version]
  14. González-Coma, J.P.; Rodríguez-Fernández, J.; González-Prelcic, N.; Castedo, L.; Heath, R.W. Channel Estimation and Hybrid Precoding for Frequency Selective Multiuser mmWave MIMO Systems. IEEE J. Sel. Top. Signal Process. 2018, 12, 353–367. [Google Scholar] [CrossRef]
  15. Haghighatshoar, S.; Caire, G. Massive MIMO Pilot Decontamination and Channel Interpolation via Wideband Sparse Channel Estimation. IEEE Trans. Wirel. Commun. 2017, 16, 8316–8332. [Google Scholar] [CrossRef] [Green Version]
  16. Sayeed, A.M. Deconstructing multiantenna fading channels. IEEE Trans. Signal Process. 2002, 50, 2563–2579. [Google Scholar] [CrossRef] [Green Version]
  17. Sun, G.R.M.J.S.; Rappaport, T.S. A novel millimeter-wave channel simulator and applications for 5G wireless communications. In Proceedings of the International Conference on Communications (ICC), Washington, DC, USA, 2–7 July 2017. [Google Scholar]
  18. Gao, X.; Tufvesson, F.; Edfors, O.; Rusek, F. Measured propagation characteristics for very-large MIMO at 2.6 GHz. In Proceedings of the Asilomar Conference on Signals, Systems and Computers (ASILOMAR), Pacific Grove, CA, USA, 4–7 November 2012; pp. 295–299. [Google Scholar] [CrossRef] [Green Version]
  19. Sanguinetti, L.; Björnson, E.; Hoydis, J. Towards Massive MIMO 2.0: Understanding spatial correlation, interference suppression, and pilot contamination. arXiv 2019, arXiv:1904.03406. [Google Scholar] [CrossRef] [Green Version]
  20. Özdogan, O.; Björnson, E.; Larsson, E.G. Massive MIMO With Spatially Correlated Rician Fading Channels. IEEE Trans. Commun. 2019, 67, 3234–3250. [Google Scholar] [CrossRef] [Green Version]
  21. Adhikary, A.; Nam, J.; Ahn, J.Y.; Caire, G. Joint Spatial Division and Multiplexing: The Large-Scale Array Regime. IEEE Trans. Inf. Theory 2013, 59, 6441–6463. [Google Scholar] [CrossRef]
  22. Zhang, B.; Cimini, L.J.; Greenstein, L.J. Efficient Eigenspace Training and Precoding for FDD Massive MIMO Systems. In Proceedings of the IEEE Global Communications Conference (GLOBECOM), Singapore, 4–10 December 2017; pp. 1–6. [Google Scholar] [CrossRef]
  23. Neumann, D.; Joham, M.; Utschick, W. Covariance Matrix Estimation in Massive MIMO. IEEE Signal Process. Lett. 2018, 25, 863–867. [Google Scholar] [CrossRef]
  24. Upadhya, K.; Vorobyov, S.A. Covariance Matrix Estimation for Massive MIMO. IEEE Signal Process. Lett. 2018, 25, 546–550. [Google Scholar] [CrossRef] [Green Version]
  25. Xie, H.; Gao, F.; Jin, S.; Fang, J.; Liang, Y. Channel Estimation for TDD/FDD Massive MIMO Systems with Channel Covariance Computing. IEEE Trans. Wirel. Commun. 2018, 17, 4206–4218. [Google Scholar] [CrossRef] [Green Version]
  26. Zhang, X.; Zhong, L.; Sabharwal, A. Directional Training for FDD Massive MIMO. IEEE Trans. Wirel. Commun. 2018, 17, 5183–5197. [Google Scholar] [CrossRef]
  27. Khalilsarai, M.B.; Haghighatshoar, S.; Yi, X.; Caire, G. FDD Massive MIMO via UL/DL Channel Covariance Extrapolation and Active Channel Sparsification. CoRR 2018. [Google Scholar] [CrossRef]
  28. Jian, M.; Gao, F.; Tian, Z.; Jin, S.; Ma, S. Angle-Domain Aided UL/DL Channel Estimation for Wideband mmWave Massive MIMO Systems With Beam Squint. IEEE Trans. Wirel. Commun. 2019, 18, 3515–3527. [Google Scholar] [CrossRef]
  29. Wang, B.; Jian, M.; Gao, F.; Li, G.Y.; Lin, H. Beam Squint and Channel Estimation for Wideband mmWave Massive MIMO-OFDM Systems. IEEE Trans. Signal Process. 2019, 67, 5893–5908. [Google Scholar] [CrossRef] [Green Version]
  30. Pal, P.; Vaidyanathan, P.P. Coprime sampling and the music algorithm. In Proceedings of the Digital Signal Processing and Signal Processing Education Meeting (DSP/SPE), Sedona, AZ, USA, 4–7 January 2011; pp. 289–294. [Google Scholar] [CrossRef]
  31. Haghighatshoar, S.; Caire, G. Low-Complexity Massive MIMO Subspace Estimation and Tracking From Low-Dimensional Projections. IEEE Trans. Signal Process. 2018, 66, 1832–1844. [Google Scholar] [CrossRef]
  32. Romero, D.; Leus, G. Compressive covariance sampling. In Proceedings of the Information Theory and Applications Workshop (ITA), San Diego, CA, USA, 2–7 February 2013; pp. 1–8. [Google Scholar] [CrossRef] [Green Version]
  33. Shakeri, S.; Ariananda, D.D.; Leus, G. Direction of arrival estimation using sparse ruler array design. In Proceedings of the Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Cesme, Turkey, 17–20 June 2012; pp. 525–529. [Google Scholar] [CrossRef]
  34. Park, S.; Heath, R.W. Spatial Channel Covariance Estimation for the Hybrid MIMO Architecture: A Compressive Sensing-Based Approach. IEEE Trans. Wirel. Commun. 2018, 17, 8047–8062. [Google Scholar] [CrossRef] [Green Version]
  35. Haghighatshoar, S.; Khalilsarai, M.B.; Caire, G. Multi-Band Covariance Interpolation with Applications in Massive MIMO. CoRR 2018. [Google Scholar] [CrossRef] [Green Version]
  36. Wichmann, B. A Note on Restricted Difference Bases. J. Lond. Math. Soc. 2016, s1-38, 465–466. [Google Scholar] [CrossRef]
  37. Huang, W.; Huang, Y.; Xu, W.; Yang, L. Beam-Blocked Channel Estimation for FDD Massive MIMO With Compressed Feedback. IEEE Access 2017, 5, 11791–11804. [Google Scholar] [CrossRef]
  38. Schniter, P.; Sayeed, A. Channel estimation and precoder design for millimeter-wave communications: The sparse way. In Proceedings of the 48th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 2–5 November 2014; pp. 273–277. [Google Scholar] [CrossRef]
  39. Leech, J. On the Representation of 1, 2, …, n by Differences. J. Lond. Math. Soc. 1956, s1-31, 160–169. [Google Scholar] [CrossRef]
  40. Schmidt, R. Multiple emitter location and signal parameter estimation. IEEE Trans. Antennas Propag. 1986, 34, 276–280. [Google Scholar] [CrossRef] [Green Version]
  41. Lee, K.; Bresler, Y.; Junge, M. Subspace Methods for Joint Sparse Recovery. IEEE Trans. Inf. Theory 2012, 58, 3613–3641. [Google Scholar] [CrossRef] [Green Version]
  42. Kim, J.M.; Lee, O.K.; Ye, J.C. Compressive MUSIC: Revisiting the Link Between Compressive Sensing and Array Signal Processing. IEEE Trans. Inf. Theory 2012, 58, 278–301. [Google Scholar] [CrossRef]
  43. Pal, P.; Vaidyanathan, P.P. Nested Arrays: A Novel Approach to Array Processing With Enhanced Degrees of Freedom. IEEE Trans. Signal Process. 2010, 58, 4167–4181. [Google Scholar] [CrossRef] [Green Version]
  44. Ariananda, D.D.; Leus, G. Direction of arrival estimation for more correlated sources than active sensors. Signal Process. 2013, 93, 3435–3448. [Google Scholar] [CrossRef]
  45. Wax, M.; Ziskind, I. On unique localization of multiple sources by passive sensor arrays. IEEE Trans. Acoust. Speech Signal Process. 1989, 37, 996–1000. [Google Scholar] [CrossRef]
  46. Richards, M. Fundamentals of Radar Signal Processing; Professional Engineering; Mcgraw-Hill: New York, NY, USA, 2005. [Google Scholar]
  47. Kay, S. Fundamentals of Statistical Signal Processing; Prentice-Hall PTR: Englewood Cliffs, NJ, USA, 1993. [Google Scholar]
  48. Haghighatshoar, S.; Caire, G. Massive MIMO Channel Subspace Estimation From Low-Dimensional Projections. IEEE Trans. Signal Process. 2017, 65, 303–318. [Google Scholar] [CrossRef]
  49. Tsai, C.R.; Liu, Y.H.; Wu, A.Y. Efficient Compressive Channel Estimation for Millimeter-Wave Large-Scale Antenna Systems. IEEE Trans. Signal Process. 2018, 66, 2414–2428. [Google Scholar] [CrossRef]
  50. Shen, J.; Zhang, J.; Alsusa, E.; Letaief, K.B. Compressed CSI Acquisition in FDD Massive MIMO: How Much Training is Needed? IEEE Trans. Wirel. Commun. 2016, 15, 4145–4156. [Google Scholar] [CrossRef]
  51. Zhao, P.; Wang, Z.; Sun, C. Angular domain pilot design and channel estimation for FDD massive MIMO networks. In Proceedings of the International Conference on Communications (ICC), Washington, DC, USA, 2–7 July 2017; pp. 1–6. [Google Scholar] [CrossRef]
  52. Rao, X.; Lau, V.K.N. Distributed Compressive CSIT Estimation and Feedback for FDD Multi-User Massive MIMO Systems. IEEE Trans. Signal Process. 2014, 62, 3261–3271. [Google Scholar] [CrossRef] [Green Version]
  53. Cheng, X.; Sun, J.; Li, S. Channel Estimation for FDD Multi-User Massive MIMO: A Variational Bayesian Inference-Based Approach. IEEE Trans. Wirel. Commun. 2017, 16, 7590–7602. [Google Scholar] [CrossRef]
Figure 1. Training length vs. number of antennas with Wichmann rulers. Dashed lines show as ymptotic upper and lower bounds on the minimum training size based on (12).
Figure 1. Training length vs. number of antennas with Wichmann rulers. Dashed lines show as ymptotic upper and lower bounds on the minimum training size based on (12).
Sensors 20 00930 g001
Figure 2. Kruskal rank upper bound for different array and dictionary sizes.
Figure 2. Kruskal rank upper bound for different array and dictionary sizes.
Sensors 20 00930 g002
Figure 3. NMSE of different covariance estimation strategies for M = 200 antennas: (a) SNR = 15 dB, and L = { 15 , 30 } propagation paths; (b) SNR = 0 dB, and L = 5 paths.
Figure 3. NMSE of different covariance estimation strategies for M = 200 antennas: (a) SNR = 15 dB, and L = { 15 , 30 } propagation paths; (b) SNR = 0 dB, and L = 5 paths.
Sensors 20 00930 g003
Figure 4. NMSE of covariance estimation for (a) M = 100 antennas, L = 50 propagation paths, K = 50 , SNR = 15 dB, and different training sequence lengths; (b) M = 200 antennas, L = 5 propagation paths, and different signal bandwidths B for f c = 28 GHz.
Figure 4. NMSE of covariance estimation for (a) M = 100 antennas, L = 50 propagation paths, K = 50 , SNR = 15 dB, and different training sequence lengths; (b) M = 200 antennas, L = 5 propagation paths, and different signal bandwidths B for f c = 28 GHz.
Sensors 20 00930 g004
Figure 5. NMSE of covariance estimation for (a) M = 200 antennas, L = 15 propagation paths, T tr = 50 , SNR = 0 dB, and subspace estimation; (b) M = 200 , SNR = 0 dB, L = 5 propagation paths, and off-grid AoDs.
Figure 5. NMSE of covariance estimation for (a) M = 200 antennas, L = 15 propagation paths, T tr = 50 , SNR = 0 dB, and subspace estimation; (b) M = 200 , SNR = 0 dB, L = 5 propagation paths, and off-grid AoDs.
Sensors 20 00930 g005
Figure 6. (a) efficiency of channel estimation for M = 200 antennas, L = 5 propagation paths, T tr = 25 , and K = 100 training periods; (b) normalized computational complexity for M = 200 antennas, dictionary size G = 2 M , K = 150 training periods, training length T tr = 50 , N c = 64 subcarriers, and different number of paths.
Figure 6. (a) efficiency of channel estimation for M = 200 antennas, L = 5 propagation paths, T tr = 25 , and K = 100 training periods; (b) normalized computational complexity for M = 200 antennas, dictionary size G = 2 M , K = 150 training periods, training length T tr = 50 , N c = 64 subcarriers, and different number of paths.
Sensors 20 00930 g006
Table 1. Complexitiesof estimation methods.
Table 1. Complexitiesof estimation methods.
Covariance Estimation
DR-W-MUSIC(a) O N c ( T tr 3 + ( T tr 2 + T tr ) G + L 2 T tr + 2 L T tr 2 )
DR-W-MUSIC(b) O N c ( T tr 3 + ( T tr 2 + T tr ) G + L 4 T tr 2 + L 2 T tr 2 )
DCOMP O ( N c ( 2 K T tr 2 G + K L 3 T tr + 2 K T tr 2 ( L 2 + L ) + K L T tr 3 ) )
DR-W-SS(a) O N c ( M 2 + M 3 + ( M 2 + M ) G + L 2 M + 2 L M 2 )
DR-W-SS(b) O N c ( M 2 + M 3 + ( M 2 + M ) G + L 4 M 2 + L 2 M 2 )
Computing Estimator
DG O N c ( 3 L 2 T tr + T tr 3 + N L + L 3 + M L 2 + M L ) + K N T L
LMMSE (28) O ( N c ( 3 M 2 T tr + T tr 3 ) )
Channel Estimation
DG O N c ( L T tr + M L + L 2 )
LMMSE (28) O ( N c M T tr )

Share and Cite

MDPI and ACS Style

González-Coma, J.P.; Suárez-Casal, P.; Castro, P.M.; Castedo, L. FDD Channel Estimation Via Covariance Estimation in Wideband Massive MIMO Systems. Sensors 2020, 20, 930. https://doi.org/10.3390/s20030930

AMA Style

González-Coma JP, Suárez-Casal P, Castro PM, Castedo L. FDD Channel Estimation Via Covariance Estimation in Wideband Massive MIMO Systems. Sensors. 2020; 20(3):930. https://doi.org/10.3390/s20030930

Chicago/Turabian Style

González-Coma, José P., Pedro Suárez-Casal, Paula M. Castro, and Luis Castedo. 2020. "FDD Channel Estimation Via Covariance Estimation in Wideband Massive MIMO Systems" Sensors 20, no. 3: 930. https://doi.org/10.3390/s20030930

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop