Millimeter Wave Beamforming Codebook Design via Learning Channel Covariance Matrices Over Riemannian Manifolds

Covariance matrices of spatially-correlated wireless channels in millimeter wave (mmWave) vehicular networks can be employed to design environment-aware beamforming codebooks. Such covariance matrices can be represented over non-Euclidean (i.e., curved-surface) manifolds, thanks to their symmetric positive definite (SPD) structure. In this paper, we propose three learning models for channel covariance estimation over Riemannian manifolds. First, we propose an unsupervised Riemannian K-Means clustering approach, where the Log-Euclidean metric (LEM) is utilized as the distance metric among channel covariance matrices over the Riemannian manifold. Second, we propose a Riemannian Competitive Learning (RCL) model, an online clustering solution intended to reduce the learning time of the offline K-Means model. Third, we apply a Riemannian Dictionary Learning (RDL) model that leverages the sparsity properties of mmWave channels while estimating their channel covariance matrices. Simulation results show that the proposed Riemannian K-Means, online RCL, and RDL models achieve 41%, 30%, and 44% higher data rates, respectively, than their Euclidean-based counterparts. Furthermore, they require few training samples and hence enable fast codebook construction in dynamic environments, which leads to low-latency communication.


I. INTRODUCTION
High data rate and low latency are crucial to support emerging applications in vehicular networks, such as infotainment, safety information, and online route-mapping messages [2], [3], [4]. While vehicular communication over millimeter wave (mmWave) spectrum bands [5] offers large bandwidths [6], a key enabler for high data rates, it also experiences large path loss. Dynamic deployment of mobile road side units (RSUs), or generally relays with large antenna arrays, in zones of temporarily high data demand can partially compensate for such large path loss.
Equally important, constructing environment-aware beamforming codebooks at such mobile relays, matched to the spatial correlation [7] of the mmWave user channels within each zone, is fundamental to enabling faster (i.e., low-latency) [8] high-rate communication. Otherwise, fixed or pre-defined beamforming codebooks for all relays will result in lower data rates than can potentially be achieved. Therefore, high-rate and low-latency vehicular communication can be accomplished through relay-assisted mmWave communication with environment-aware beamforming codebooks, which is the motivation of this work.

The associate editor coordinating the review of this manuscript and approving it for publication was Olutayo O. Oyerinde.
A beamforming codebook for spatially-correlated channels can be designed to match their second-order statistics (i.e., covariance matrices) [9], and hence there is a need to estimate channel covariance matrices. The problem of channel covariance estimation has been discussed in several studies. For example, the authors in [10] developed a low-rank matrix structure for covariance estimation in hybrid mmWave systems. The work in [11] considered a maximum-likelihood technique, based on pilot allocations, to estimate channel covariance matrices. The work in [12] studied the impact of imperfect channel covariance information at the receiver on the overall system performance. Also, a channel covariance estimation scheme for downlink channels in massive MIMO was proposed in [13] based on the uplink covariance information. In addition, a compressive-sensing-based covariance estimation for sparse mmWave channels was introduced in [14].

VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In an attempt to avoid the assumptions of the aforementioned algorithms on the exact modeling (e.g., sparsity modeling) of the correlated channels, supervised deep learning models were utilized to learn channel covariance matrices in [15] and [16], or to directly learn the best beamforming codebook [17]. Moreover, a situation-aware channel covariance estimation was proposed in [18] for beamforming design in downlink MIMO systems. However, such deep learning models are a) supervised, requiring labeled training samples of the best codebooks, which are hard to acquire in new and dynamic relay-assisted environments, and b) multi-layered and complex, requiring many training samples, and hence longer latency, before establishing the best codebook. Therefore, we focus in this work on unsupervised (i.e., no labeled training [19]) and low-complexity (e.g., single-layer network) learning models to estimate the channel covariance matrices.
Intuitively, the complexity of any machine learning (ML) model can be reduced if the underlying structure of its dataset is known. Since practical multiple-antenna channels are spatially correlated, this structure can be exploited to reduce the model complexity. More specifically, the covariance matrices of such channels are symmetric positive semi-definite. Applying positive shifts to these semi-definite matrices transforms them into symmetric positive definite (SPD) ones, which can be modeled as points over Riemannian manifolds [1], [20], [21], specifically conic ones. Representing channel covariance matrices over Riemannian manifolds is the first novelty of this work, and it paves the way to employing simple learning algorithms, which we denote as Riemannian-geometric learning models, for estimating such covariance matrices over Riemannian manifolds.
Riemannian geometry [22] has recently been considered in designing beamforming vectors [23], [24] and in other wireless problems such as link scheduling [25], [26]. In addition, non-Euclidean methods have been utilized in codebook design over non-conic Riemannian manifolds (i.e., unitary, fixed-rank) as in [27] and [28]. While these studies present novel non-Euclidean geometric concepts for beamforming and codebook design, they have not been utilized in channel covariance estimation, which is the novel solution approach of this paper.
In this paper, we propose three Riemannian-geometric learning models for channel covariance estimation as follows.
First, we propose an unsupervised Riemannian K-Means model to cluster the instantaneous channel covariance matrices, collected at the mobile relay during initial transmissions with its associated users in its newly-deployed area, into multiple distinct groups. Unlike the traditional Euclidean-based K-Means clustering model that takes only vector inputs [29], [30], [31], [32], the proposed Riemannian K-Means approach clusters the SPD matrix samples (i.e., instantaneous channel covariance matrices) over the Riemannian manifold. An appropriate metric, such as the Log-Euclidean metric (LEM) [33], is used to calculate the geodesic distance, i.e., the shortest distance between the SPD covariance matrices over the manifold. We show that, compared to a Euclidean-based benchmark, the proposed Riemannian K-Means can provide more accurate estimation of the covariance matrices, resulting in a higher achievable data rate.
As the aforementioned K-Means clustering model requires storing all needed data samples of instantaneous channel covariance matrices prior to clustering, it may incur large latency before constructing its best-learned beamforming codebook. Therefore, and second, we propose another Riemannian-geometric solution, namely online Riemannian Competitive Learning (RCL), which performs fast real-time clustering of channel covariance matrices over the Riemannian manifold. The online RCL solution is a quantization algorithm [34] that learns on the fly and avoids waiting for a large number of channel samples by processing a single channel estimate at a time. Our study reveals that the faster estimation of online RCL comes at the cost of some estimation accuracy compared to the offline Riemannian K-Means.
While the above two solutions present novel Riemannian-geometric perspectives, they do not explicitly consider the sparse scattering nature of mmWave channels [14], [35]. In that regard, we point out that dictionary learning (DL) has been utilized as a powerful tool to represent sparse signals with few non-zero elements in many scientific fields [36], [37]. Hence, and third, we propose an unsupervised Riemannian Dictionary Learning (RDL) algorithm to leverage the sparsity of mmWave channels while learning their SPD matrices over Riemannian manifolds. Our results show that the RDL scheme is a highly efficient learning model compared to the other solutions.
While traditional Euclidean-based counterparts of the above-mentioned approaches exist in the literature [38], [39], we propose their application to channel covariance clustering based on non-Euclidean geometry over the Riemannian manifold. Overall, the three proposed models leverage their clustering accuracy to construct the best environment-aware beamforming codebook, which provides a higher data rate than a fixed or uniform one, while requiring as few learning samples as possible and hence low latency. As stated earlier, both high data rate and low latency are needed in vehicular communication.
We highlight the contributions of this paper as follows:
• We model instantaneous channel covariance matrices, thanks to their SPD structure, as points over Riemannian manifolds, which enables the employment of unsupervised clustering approaches towards finding the statistical channel covariance matrices of a specific environment and hence constructing environment-aware beamforming codebooks for high-rate communication.
• We employ Riemannian K-Means clustering scheme for estimating statistical channel covariance matrices, which can achieve 41% higher data rate compared to its Euclidean K-means counterpart.
• We employ an online Riemannian Competitive Learning clustering, which makes faster estimation of the channel covariance matrices than the offline Riemannian K-Means, and provides 30% more data rate than a Euclidean K-means clustering solution.
• We also employ a Riemannian dictionary learning algorithm that exploits the sparse mmWave channel properties and achieves a data rate 44% higher than the Euclidean K-Means benchmark.
The abbreviations used in this paper are listed in Table 1. The rest of the paper is organized as follows. Section II provides preliminary concepts on Riemannian geometry, competitive learning, and dictionary learning. Section III introduces the system model and problem formulation. Section IV describes the proposed solutions. Section V explains our simulation setup and the results for the proposed solutions. Finally, we conclude the paper in Section VI.

II. PRELIMINARIES
This section introduces some preliminary concepts on Riemannian geometry, competitive learning and dictionary learning solutions.

A. RIEMANNIAN MANIFOLDS
Topological manifolds represent non-Euclidean curved surfaces, as shown in Fig. 1. The tangent space T_χ at a point χ consists of all tangent vectors ν through χ. The exponential map at χ is defined as the function mapping from the tangent space T_χ of the manifold M at point χ to the manifold itself, i.e., exp_χ : T_χ → M [40]. Conversely, log_χ is defined as the inverse of the exponential map exp_χ. Riemannian manifolds [22], [28] are equipped with Riemannian metrics. Riemannian metrics, such as the LEM [33], are inner products on the tangent space T_χ and measure the length of geodesics between any two points over the manifold. SPD matrices are modeled as points over the interior surface of conic (i.e., Riemannian) manifolds [21]. Leveraging the intrinsic Riemannian-geometric structure of data given by SPD matrices opens up possibilities for numerous research studies [40], [41]. In particular, applying geometric learning solutions to SPD matrices can characterize data in non-Euclidean domains, which is not possible with traditional learning algorithms that deal with Euclidean-structured data samples (i.e., vectors) [42], [43].
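For illustration, the LEM geodesic distance between two SPD matrices is the Frobenius norm of the difference of their matrix logarithms. The following sketch (real symmetric case, plain NumPy; function names are ours, not from the paper) makes this concrete:

```python
import numpy as np

def spd_logm(X):
    """Matrix logarithm of a real SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

def lem_distance(X, Y):
    """LEM geodesic distance between two SPD matrices."""
    return np.linalg.norm(spd_logm(X) - spd_logm(Y), "fro")

# Two SPD points on the manifold
A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.eye(2)

# Since log(I) = 0, the LEM distance from A to the identity equals ||log(A)||_F
print(lem_distance(A, B))
```

Note that for complex Hermitian covariance matrices the transpose above would be replaced by a conjugate transpose; the real case suffices to illustrate the metric.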

B. COMPETITIVE LEARNING
In a competitive learning (CL) framework, the output units of the algorithm compete with each other to win a specific input vector; hence the name competitive learning. Such competition-oriented solutions have been studied in many vector classification problems [44], [45], [46]. The work in [44] provided a survey on CL algorithms for optimal vector quantization with applications in numerical probability and data mining. The authors in [45] presented an overview of several CL algorithms and their possible hardware implementations for computer architectures. Ahalt et al. in [46] proposed a CL solution and showed its application to designing a vector quantizer for a speech database. Also, the work in [47] studied the interpretation of a gradient-descent-based CL framework and investigated its relationship to fuzzy learning vector quantization.
Considering the unique capability of SPD matrices to represent the underlying non-Euclidean data structure, CL has recently been applied for clustering over Riemannian manifolds. For example, CL for clustering SPD matrices over Riemannian manifolds was studied in [34], where a neural network is applied to perform the clustering task. Also, the authors in [48] presented a novel CL-based quantization algorithm over the Riemannian manifold for air traffic congestion analysis. However, we note that CL has not been previously considered for channel covariance estimation over Riemannian manifolds, as proposed in this paper.

C. DICTIONARY LEARNING
In the context of DL, an overcomplete set of basis elements, known as a dictionary, is used to represent the essential information within a sparse signal with few nonzero elements [49]. In particular, DL is an efficient tool to represent sparse data as a combination of basis vectors (or atoms) from a learned dictionary. Towards doing so, the DL solution finds a set of dictionary atoms that can accurately reconstruct the sparse data with minimum error. In other words, the algorithm searches for the optimal vectors, known as atoms, that can be sparsely combined to represent each input data point of the given dataset. These atoms are obtained by factorizing the training data as a product of a dictionary and a sparse coefficient matrix [50]. The sparser the representation, the better the learned dictionary atoms. Consequently, the learned dictionary matrix is used to represent the signal as a linear combination of a few nonzero dictionary atoms [51]. A DL framework was studied in [36] to solve a pattern classification problem, where the locality information of the atoms was preserved by employing a graph Laplacian matrix. In [37], the authors solved the DL problem for matrix factorization using alternating minimization. Motivated by the huge success of DL algorithms over Euclidean data samples [52], such solutions were applied to SPD matrices over Riemannian spaces in [41], [53], [54]. The authors in [41] represented non-Euclidean data in the form of SPD matrices as a sparse combination of SPD atoms by applying DL over Riemannian manifolds. The authors in [53] and [54] provided kernel-based DL solutions considering Riemannian geometry for sparse coding in computer vision applications. Nevertheless, DL has not been previously studied for channel covariance matrix estimation, which differentiates our work from the existing literature.
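To make the "few atoms per signal" idea concrete, the following is a minimal greedy sparse-coding sketch in NumPy. It is an OMP-style illustration of selecting at most s dictionary atoms for a vector, not an algorithm from the paper; the function name and the sparsity parameter s are our own:

```python
import numpy as np

def sparse_code(x, D, s):
    """Greedily pick at most s atoms (columns of D, ideally unit norm)
    to approximate x; return the sparse coefficient vector."""
    r, idx = x.astype(float).copy(), []
    for _ in range(s):
        idx.append(int(np.argmax(np.abs(D.T @ r))))   # most correlated atom
        cols = sorted(set(idx))
        # least-squares fit on the selected atoms
        coef, *_ = np.linalg.lstsq(D[:, cols], x, rcond=None)
        r = x - D[:, cols] @ coef                      # update the residual
    alpha = np.zeros(D.shape[1])
    alpha[cols] = coef
    return alpha

D = np.eye(4)                       # trivial dictionary: one atom per axis
x = np.array([3.0, 0.0, 2.0, 0.0])  # a 2-sparse signal
alpha = sparse_code(x, D, s=2)
```

With this trivial dictionary the 2-sparse code recovers the signal exactly, illustrating the dictionary/sparse-coefficient factorization described above.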

III. SYSTEM MODEL AND PROBLEM FORMULATION
In this section, we briefly describe the system model along with the problem formulation for channel covariance estimation in a mmWave vehicular network.

A. SYSTEM DESCRIPTION
We consider a typical vehicular setting where a mobile road side unit (RSU), or relay, moves to new vehicular environments over time and collects the wireless channels of its dynamic UEs. We assume these channels between the RSU and the existing UEs are collected over N communication sessions, indexed i = 1, · · · , N, that resemble possible locations of the UEs. The covariance matrices of the newly collected channels are estimated at the RSU by applying the clustering solutions. Finally, we apply matrix eigenvalue decomposition to construct an environment-aware beamforming codebook using the estimated covariance matrices.
Note that knowing the channel covariance matrices beforehand aids the design of efficient beamforming codebooks with higher data rates [9]. However, in a high-mobility vehicular scenario with a dynamically deployed RSU, as shown in Fig. 2, the covariance matrices of the spatially-correlated channels are not known in advance because they vary with the relative positions of the RSU and its moving UEs. As a result, constructing fixed and predefined codebooks will be inefficient for such scenarios. Therefore, the codebook needs to be learned based on the users' correlated channels within each relay's cluster, and hence an adaptive codebook is required [55]. To construct an adaptive codebook via channel covariance estimation, we consider an RSU equipped with m antennas and a single antenna at each UE. Let h_i ∈ C^{m×1} represent the m × 1 wireless channel vector between the RSU and one of its UEs at the i-th location spot. The considered channel vectors h_i are generated using the ray tracing model of an outdoor vehicular environment at 60 GHz [56]. No statistical model (e.g., circularly-symmetric complex Gaussian) was used to generate these vehicular channels.
A multiple-input single-output (MISO) channel between a given RSU and its UE is described as a correlated Rayleigh fading channel vector [20]. Calculating the spatially-correlated sample covariance matrices h_i h_i^H of these multi-antenna vehicular channels results in symmetric positive semi-definite matrices. Applying positive shifts to these matrices transforms them into SPD ones according to

Q_i = h_i h_i^H + λI,  (1)

where I is the m × m identity matrix, λ is an arbitrary small scalar, and (·)^H denotes the matrix Hermitian. We consider λ = 0.5 in this paper. The resulting positively-shifted sample covariance matrices (or simply SPD matrices) in (1) can be modeled as points on the conic Riemannian manifold [21] over the locations i = 1, 2, . . . , N. Consequently, 𝒬 = {Q_1, Q_2, . . . , Q_N} represents the set of observations for all the given SPD matrices that are to be estimated, where each observation within 𝒬 is an m × m matrix.
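The shift in (1) can be sketched in a few lines of NumPy. The channels below are random placeholders (the paper's channels come from ray tracing), and the variable names are our own:

```python
import numpy as np

rng = np.random.default_rng(0)
m, N, lam = 4, 6, 0.5   # antennas, samples, shift (the paper uses lambda = 0.5)

# Illustrative random complex channels h_i, one row per location spot
H = (rng.standard_normal((N, m)) + 1j * rng.standard_normal((N, m))) / np.sqrt(2)

# Rank-one sample covariances h_i h_i^H are only positive SEMI-definite;
# adding lam * I as in (1) makes them strictly positive definite (SPD)
Q = [np.outer(h, h.conj()) + lam * np.eye(m) for h in H]

# Smallest eigenvalue across all shifted samples is now lam > 0
print(min(np.linalg.eigvalsh(Qi).min() for Qi in Q))
```

Each unshifted matrix h_i h_i^H has eigenvalues {||h_i||², 0, …, 0}, so after the shift the smallest eigenvalue is exactly λ, confirming the SPD property.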

B. PROBLEM FORMULATION
In order to design an environment-aware beamforming codebook through channel covariance estimation, we aim to partition the collected samples of the shifted covariance matrices into K ≤ N disjoint groups R_1, R_2, . . . , R_K, based on their distinct spatial-correlation characteristics. The set of indices of the location points that belong to any group R_k is denoted as l_k = {1, 2, . . . , n_k}, where k = 1, 2, . . . , K. Also, we denote the channel vectors within group R_k as h_{i,k} = h_i, ∀i ∈ l_k. For estimating the covariance matrices, the collected samples {Q_1, Q_2, . . . , Q_N} over the i = 1, 2, . . . , N location spots form the input to our non-Euclidean clustering algorithms. The mean (i.e., centroid) is then computed for each group of samples based on their unique spatial-correlation characteristics. More specifically, for the considered groups of covariance matrices, the proposed clustering solutions seek to find the set of estimated means Ĝ = {Ĝ_1, Ĝ_2, . . . , Ĝ_K} over w iterations (where w = 1, · · · , W). Matrix eigenvalue decomposition is applied over these estimated means of the covariance matrices to construct the beamforming codebook.

IV. LEARNING CHANNEL COVARIANCE MATRICES
In this section, we explain our proposed clustering solutions for estimating channel covariance matrices in a vehicular environment.

A. RIEMANNIAN K-MEANS CLUSTERING
We model the observations of instantaneous channel covariance matrices as points over the Riemannian manifold, and consider the LEM Riemannian metric [20], [33] to measure the geodesic distances among them. The proposed Riemannian K-Means clustering algorithm assigns an input sample Q_i ∈ 𝒬 to its nearest cluster R_k* by calculating the optimal cluster index as

k* = argmin_k d²(Q_i, Ĝ_k),  (2)

where Ĝ_k is the mean (i.e., centroid) of cluster R_k, k = 1, 2, . . . , K, and d in (2) denotes the LEM geodesic distance. The centroids Ĝ_k of the channel covariance matrices are given by the Fréchet mean (also known as the Riemannian mean [57]), defined as the minimizer of the sum of squared distances according to [58]

Ĝ_k = argmin_G Σ_{i∈l_k} d²(Q_i, G).  (3)

The minimizer in (3) is obtained with a gradient descent procedure: starting from an initial value at t = 0, the optimal centroid for group R_k is reached by repeatedly applying the following update formula according to [59]

Ĝ_k(t+1) = exp_{Ĝ_k(t)} ( (1/n_k) Σ_{i∈l_k} log_{Ĝ_k(t)}(Q_i) ),  (4)

where n_k denotes the number of samples within group R_k. Feeding more samples to the algorithm within a particular region of the RSU's coverage allows a more precise identification of that cluster center. Finally, the induced positive shift is subtracted to obtain the desired estimated centroid of the channel covariance matrices for cluster R_k:

Ĝ_k ← Ĝ_k − λI.  (5)

Fig. 3 illustrates the proposed Riemannian K-Means solution for channel covariance clustering, where the collected SPD covariance matrices are grouped into different clusters over a conic Riemannian manifold. The different shapes of the SPD points in Fig. 3 represent a vehicular environment with distinct characteristics.
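A compact sketch of this clustering loop is given below. Under the LEM, the Fréchet mean of (3) admits a closed form, namely the matrix exponential of the arithmetic mean of the matrix logarithms, which we use in place of the iterative update. This is a real-valued NumPy toy, not the paper's implementation (which uses the geomstats package), and all names are ours:

```python
import numpy as np

def spd_logm(X):
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

def spd_expm(S):
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def riemannian_kmeans(Q, K, iters=20, seed=0):
    """K-Means over SPD matrices under the LEM: assignment by shortest
    log-domain distance, update by the log-domain arithmetic mean
    (the closed-form LEM Frechet mean)."""
    rng = np.random.default_rng(seed)
    logs = np.array([spd_logm(Qi) for Qi in Q])           # map to tangent space
    centers = logs[rng.choice(len(Q), K, replace=False)]  # log-domain centroids
    for _ in range(iters):
        # assignment step: nearest centroid in LEM distance
        d = ((logs[:, None] - centers[None]) ** 2).sum(axis=(2, 3))
        labels = d.argmin(axis=1)
        # update step: mean of the assigned matrix logs
        for k in range(K):
            if (labels == k).any():
                centers[k] = logs[labels == k].mean(axis=0)
    return [spd_expm(C) for C in centers], labels

# Toy data: two well-separated groups of SPD matrices
Q = [a * np.eye(2) for a in (1.0, 1.1, 1.2, 8.0, 9.0, 10.0)]
centers, labels = riemannian_kmeans(Q, K=2)
```

On this toy set the algorithm separates the two groups regardless of initialization; the subtraction of the λI shift from the returned centroids is omitted for brevity.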

B. ONLINE RIEMANNIAN COMPETITIVE LEARNING
The Riemannian K-Means algorithm requires a large waiting time prior to conducting the clustering step, as it requires the RSU to collect all needed channel estimates from different locations within its coverage. To reduce such waiting time, we propose to apply online RCL to cluster the channel covariance matrices over the Riemannian manifold. The online RCL is also an unsupervised clustering approach, but it processes one channel estimate at a time (i.e., online) without needing to wait for all the channel estimates together [34], [48].
In order to cluster the overall observations {Q_1, Q_2, . . . , Q_N}, the online RCL algorithm computes the mean geodesic error (MGE) [34] as

MGE = (1/N) Σ_{i=1}^{N} min_k d²(Q_i, Ĝ_k).  (6)

The mean of the proposed online RCL algorithm is calculated using a gradient descent approach, where the update [48] is given by

Ĝ_k(t+1) = exp_{Ĝ_k(t)} ( ε(t) log_{Ĝ_k(t)}(Q_i) ),  (7)

where ε(t) denotes the learning rate (step size) at iteration t. The solution follows the direction given in (7) at each step to update the cluster center Ĝ_k for any new input Q_i. Instead of moving along straight lines, in the Riemannian context we use the exponential map to follow the geodesic over the Riemannian manifold.
To make the clustering real-time by processing the SPD inputs one at a time, the online RCL method employs a competition among the cluster centers to win a particular input. The LEM is used to decide the winner, which is then updated by advancing towards that input. More specifically, the algorithm calculates the LEM distance from each cluster center to the input, and the cluster center with the shortest LEM distance wins the competition and is updated by moving towards that SPD point. This differentiates the online RCL clustering scheme from the one explained in Algorithm 1, where the offline Riemannian K-Means considers all the samples together for calculating the centroids.
It is worth noting that at each step, the online RCL algorithm only requires as many distance computations as the number of desired clusters, and solely the winning output is updated by advancing towards the input, while the rest remain unchanged. Finally, the cluster center that wins the competition for a given set of input points is used to represent those input SPD matrices. The algorithm for the online RCL clustering scheme is provided in Algorithm 2.
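A minimal one-step sketch of this winner-take-all update is shown below, using a log-domain simplification of the geodesic step in (7). The step size eps and all function names are our own assumptions, not values from the paper:

```python
import numpy as np

def spd_logm(X):
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

def spd_expm(S):
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def rcl_step(centers, Q_i, eps=0.1):
    """One online RCL step: the center with the shortest LEM distance wins
    and moves a fraction eps along the geodesic towards the input;
    all other centers remain unchanged."""
    L = spd_logm(Q_i)
    d = [np.linalg.norm(spd_logm(C) - L, "fro") for C in centers]
    k = int(np.argmin(d))                        # winner-take-all competition
    Ck = spd_logm(centers[k])
    centers[k] = spd_expm(Ck + eps * (L - Ck))   # geodesic step of size eps
    return k

# One new sample arrives; only the nearer of the two centers is updated
centers = [np.eye(2), 9.0 * np.eye(2)]
winner = rcl_step(centers, 4.0 * np.eye(2), eps=0.1)
```

Note that exactly K distance computations are performed per step, matching the per-sample cost described above.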

C. DICTIONARY LEARNING FOR SPARSE CHANNELS
Through fewer RF chains than antennas, the mmWave channel estimator observes a reduced-dimensional representation of the entire signal. Such sparse nature of mmWave channels has been indicated in [14]. Consequently, the covariance matrices resulting from these mmWave channels develop into sparse SPD matrices. Therefore, we point out the necessity of a learning model that incorporates sparse mmWave channel properties while estimating their statistical covariance matrices.
Algorithm 2 Online RCL Clustering Algorithm
Input: {Q_1, Q_2, · · · , Q_N} collected from 𝒬
Output: {Ĝ_1, Ĝ_2, · · · , Ĝ_K} for K clusters
t ← 0, select {Ĝ_1(0), Ĝ_2(0), · · · , Ĝ_K(0)} at random
repeat
  k ← argmin_k d²(Q_i(t+1), Ĝ_k(t)) for the winner cluster center
  update the winner Ĝ_k via (7)
until all inputs are processed
{R_1, · · · , R_K} ← cluster centers {Ĝ_1, · · · , Ĝ_K}

Inspired by the success of DL algorithms for representing sparse data [41], [52], we propose a Riemannian Dictionary Learning (RDL) algorithm to address the sparsity of spatio-temporal mmWave channels for estimating their statistical covariance matrices. Our proposed RDL solution clusters the instantaneous covariance matrices by learning a dictionary of SPD atoms in a way that the distinct features of the covariance matrices are preserved within a few atoms. More specifically, the given solution seeks a dictionary D = {D_1, D_2, . . . , D_K} of few basis atoms such that each data point Q_i ∈ 𝒬 can be represented as a combination of these atoms. This implies that each Q_i can be approximated as Q_i ≈ Σ_{k=1}^{K} α_{i,k} D_k, where α_i = [α_{i,1}, . . . , α_{i,K}]^T is a sparse coefficient vector. We formulate the objective of the proposed RDL model as

min_{D,α} Σ_{i=1}^{N} [ d²(Q_i, Σ_{k=1}^{K} α_{i,k} D_k) + Sp(α_i) ],  (8)

where Sp(·) is a sparsity-inducing penalty (e.g., the ℓ1-norm). The dictionary atoms D obtained by the proposed RDL solution are treated as the centroids Ĝ of the channel covariance clusters. We summarize the proposed RDL algorithm in Algorithm 3.
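The RDL cost described above can be evaluated directly: LEM reconstruction error plus a sparsity penalty on the coefficients. The sketch below (real symmetric case; the penalty weight beta and all names are our assumptions, and nonnegative coefficients are assumed so the atom combination stays SPD) illustrates the objective, not the paper's solver:

```python
import numpy as np

def spd_logm(X):
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

def lem_dist2(X, Y):
    return np.linalg.norm(spd_logm(X) - spd_logm(Y), "fro") ** 2

def rdl_objective(Q, D, alpha, beta=0.1):
    """RDL cost for SPD samples Q, SPD atoms D, and sparse codes alpha
    (N x K, nonnegative): LEM reconstruction error plus an l1 penalty
    weighted by the assumed parameter beta."""
    cost = 0.0
    for Qi, ai in zip(Q, alpha):
        recon = sum(a * Dk for a, Dk in zip(ai, D))  # conic combination of atoms
        cost += lem_dist2(Qi, recon) + beta * np.abs(ai).sum()
    return cost

# A 1-sparse code that reconstructs the sample exactly: only the penalty remains
cost = rdl_objective([2.0 * np.eye(2)],
                     [np.eye(2), 4.0 * np.eye(2)],
                     np.array([[0.0, 0.5]]), beta=0.1)
```

A solver would alternate between minimizing this cost over the codes alpha (sparse coding) and over the atoms D (dictionary update).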

D. EUCLIDEAN K-MEANS CLUSTERING
In order to show the potential advantages of representing channel covariance matrices over Riemannian manifolds, we also implement a Euclidean-based K-Means clustering approach as a benchmark. The Euclidean K-Means algorithm treats the channel vectors h_i as inputs and uses their linear Euclidean distances to compute the centroids, given by the arithmetic mean [60]. Unlike our proposed solutions that cluster channel covariance matrices over the Riemannian manifold, the Euclidean-based K-Means attempts to make a K-partition of the channel vectors by applying the following minimization [61]

argmin_{R_1,...,R_K} Σ_{k=1}^{K} Σ_{i∈l_k} ||h_i − ĥ_k||²,  (9)

where ĥ_k denotes the centroid obtained by the Euclidean-based K-Means clustering model for group R_k. The algorithm then calculates the covariance matrix of the estimated channel for each group R_k as

Ĝ_k^e = ĥ_k ĥ_k^H,  (10)

where Ĝ_k^e is the covariance matrix for centroid ĥ_k. Finally, the sample covariance matrices are estimated from the ones given in (10). The Euclidean-based K-Means algorithm follows the same methodology provided in Algorithm 1, with the exception that it computes linear distances between the channel vectors over the Euclidean space.
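This benchmark can be sketched as plain Lloyd's K-Means on channel vectors, followed by the rank-one covariance step of (10). The toy channels and names below are our own illustration:

```python
import numpy as np

def euclidean_kmeans(H, K, iters=20, seed=0):
    """Baseline: Lloyd's K-Means on channel VECTORS under the Euclidean
    distance; centroids are arithmetic means, and each centroid is turned
    into a rank-one covariance estimate as in (10)."""
    rng = np.random.default_rng(seed)
    C = H[rng.choice(len(H), K, replace=False)].copy()
    for _ in range(iters):
        d = np.abs(H[:, None] - C[None]) ** 2      # |.|^2 also covers complex h
        labels = d.sum(axis=2).argmin(axis=1)      # nearest centroid
        for k in range(K):
            if (labels == k).any():
                C[k] = H[labels == k].mean(axis=0)  # arithmetic mean
    G = [np.outer(c, c.conj()) for c in C]          # eq. (10)
    return G, labels

# Two well-separated groups of toy channel vectors
H = np.array([[1.0, 0.0], [1.1, 0.0], [5.0, 5.0], [5.2, 5.0]])
G, labels = euclidean_kmeans(H, K=2)
```

The key contrast with the Riemannian schemes is that averaging happens on the channel vectors in flat Euclidean space rather than on the covariance matrices over the manifold.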

V. SIMULATION RESULTS
In this section, we introduce the different datasets that are considered for simulation along with the performance evaluation.

A. SIMULATION SETUP
Our work assumes a vehicular network operating at 60 GHz with 0.5 GHz of bandwidth. A 4×4 phased array is considered at the RSU. We adopt three different vehicular datasets from the ''O1 − 60 ray tracing scenario'' [56] along with their channel models. The purpose of experimenting with different datasets is to observe the efficacy of the proposed schemes in various vehicular settings.
The area of interest for Dataset-1 comprises 5460 user location points from user grid-1 of the ''O1 − 60 ray tracing scenario,'' as demonstrated in Fig. 4 (a). For Dataset-2, our work assumes a total of 10860 location points from user grid-1, as shown in Fig. 4 (b). The area considered under Dataset-3 is the largest one, consisting of a total of 16800 samples from user grid-1, as shown in Fig. 4 (c). The distance between any two adjacent points is 20 cm [56], which is constant for all the considered scenarios. The time required for a vehicular UE to cover any of these regions can be determined with simple calculations. For instance, Dataset-3 with 16800 location spots covers a distance of 3360 meters. A vehicle travelling at a speed of 30 mph (i.e., 48.2 km/h) will require almost 250 seconds, or 4.2 minutes, to cover this region. On the other hand, a vehicle moving at a speed of 63 mph (or 101.3 km/h), supported by 5G NR C-V2X [62], will require approximately 120 seconds, or 2 minutes, to cover the area considered under Dataset-3. Such a dynamic vehicular environment is difficult to model using a specific statistical model or with specific parameters (e.g., circularly-symmetric zero-mean complex Gaussian with a specific variance).
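The quoted traversal times follow from simple kinematics; a quick check of the Dataset-3 numbers:

```python
# 16800 points spaced 20 cm apart span 16800 * 0.2 = 3360 m
distance_m = 16800 * 0.2

def traverse_seconds(speed_kmh):
    """Time to cover the region at a constant speed (km/h converted to m/s)."""
    return distance_m / (speed_kmh / 3.6)

print(round(traverse_seconds(48.2)))    # ~251 s at 30 mph
print(round(traverse_seconds(101.3)))   # ~119 s at 63 mph
```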

B. CODEBOOK DESIGN
The spatial-correlation characteristics of the multiple-antenna channels can be utilized to construct matched beamforming codebooks [9], [63]. Matrix eigenvalue decomposition, or spectral decomposition, has been used previously for optimal precoder design [64]. Following these principles, the cluster centers Ĝ_k obtained by our proposed solutions are factorized using their spectral decomposition

Ĝ_k = U_k A_k U_k^H,  (11)

where U_k and A_k are the m × m unitary and diagonal matrices, respectively. Then, similar to the previous studies in [64] and [65], we simply choose the eigenvector corresponding to the largest eigenvalue from the unitary matrix U_k. More specifically, for a given cluster center Ĝ_k of group R_k, we select the first column of U_k, which refers to the strongest vector of the estimated covariance matrix.
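The codeword selection above amounts to extracting the dominant eigenvector of each estimated centroid. A minimal sketch (our own names; a toy covariance built around a known dominant direction):

```python
import numpy as np

def beamforming_vector(G_hat):
    """Return the dominant eigenvector of a Hermitian covariance centroid,
    i.e., the codeword for that cluster."""
    w, U = np.linalg.eigh(G_hat)   # eigenvalues in ascending order
    return U[:, -1]                # column for the largest eigenvalue

# Toy centroid dominated by the direction u = [1, 1]/sqrt(2)
u = np.array([1.0, 1.0]) / np.sqrt(2)
G = 5.0 * np.outer(u, u) + 0.5 * np.eye(2)
f = beamforming_vector(G)
```

Note that np.linalg.eigh sorts eigenvalues in ascending order, so the strongest vector is the last column; the paper's "first column of U_k" presumes a descending ordering of the decomposition.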

C. PERFORMANCE EVALUATION
We use the normalized mean square error (NMSE) as the loss function to evaluate the performance of the proposed solutions. For any learning scheme, the NMSE for learning the mmWave channel covariance matrices is calculated as

NMSE_k = ||Ĝ_k − G_k||²_F / ||G_k||²_F,  (12)

where G_k is the true expected value of the covariance matrices over the locations i ∈ l_k for group R_k, and ||·||_F denotes the matrix Frobenius norm. It is obvious from (12) that accurate estimation of the cluster centers Ĝ_k will make them close to the expected values G_k, and hence lower the NMSE, which is the goal of our non-Euclidean clustering solutions towards designing the codebook. The average NMSE over all the given clusters is

NMSE_avg = (1/K) Σ_{k=1}^{K} NMSE_k.  (13)

The LEM-based Riemannian K-Means and online RCL clustering algorithms were applied over the Riemannian manifold using the ''geomstats'' Python package available in [66]. The simulations for the RDL scheme considering the Riemannian geometry were performed in MATLAB following [41]. We applied the Euclidean-based K-Means clustering model using [67]. We first evaluate the NMSE performance of our proposed schemes and then observe the impact of NMSE on the achievable throughput. Fig. 5 compares the NMSE of the proposed Riemannian K-Means, online RCL, and RDL schemes with the Euclidean K-Means benchmark. The proposed Riemannian K-Means clustering, an offline quantization algorithm for estimating channel covariance matrices, yields slightly higher NMSE than the RDL scheme, but performs better than the online RCL and the Euclidean K-Means benchmark. This is because the online RCL algorithm can only update the cluster centers in a localized manner, whereas the offline Riemannian K-Means algorithm updates the cluster centers with full knowledge of the dataset in a global manner, resulting in higher accuracy. The result in Fig. 5 is supported by previous studies in which offline solutions outperform the corresponding online clustering solutions [68], [69], but can only make the estimation after collecting all samples. Furthermore, Fig.
5 shows that the proposed RDL solution achieves the lowest NMSE among the solutions, as it can efficiently exploit the sparse mmWave channel properties. This is in agreement with [41], which shows that the Riemannian DL solution slightly outperforms the Riemannian K-Means method in terms of accuracy. As shown in Fig. 5, all of the non-Euclidean Riemannian-geometric strategies outperform the Euclidean K-Means benchmark for any given number of clusters. For example, while the Euclidean K-Means approach produces an average NMSE of 3.5 × 10^-3 with 4 clusters, the Riemannian K-Means provides a substantially lower NMSE of 7 × 10^-4. In other words, the proposed Riemannian K-Means provides 80% higher accuracy than the corresponding Euclidean-based benchmark. For the same number of clusters, our online RCL achieves an NMSE of 2 × 10^-3, making it 42% more accurate than the Euclidean-based K-Means strategy. In addition, our proposed RDL solution has an NMSE of 4.8 × 10^-4 for four dictionary atoms, resulting in 86% higher accuracy than the Euclidean K-Means benchmark for learning channel covariance matrices in a mmWave network.
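The NMSE loss of (12), and its average over clusters, can be computed directly; the helper names below are ours:

```python
import numpy as np

def nmse(G_hat, G_true):
    """Per-cluster NMSE of eq. (12): squared Frobenius error normalized by
    the energy of the true covariance matrix."""
    return (np.linalg.norm(G_hat - G_true, "fro")
            / np.linalg.norm(G_true, "fro")) ** 2

def avg_nmse(G_hats, G_trues):
    """Average NMSE over all clusters."""
    return float(np.mean([nmse(a, b) for a, b in zip(G_hats, G_trues)]))

G = np.eye(2)
print(nmse(1.1 * G, G))   # a uniform 10% over-estimate gives an NMSE of about 0.01
```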
In comparison with recent literature (e.g., [15]), we consider Dataset-2 along with the scenario of a single cluster (i.e., K = 1). Table 2 shows that the Euclidean-based supervised Generative Adversarial Network (GAN) model of [15] (with an NMSE of 0.004) achieves a lower NMSE than the Euclidean-based unsupervised K-Means model (with an NMSE of 0.063), which is due to the use of labeled data while training the GAN model. Table 2 also shows that all of the proposed non-Euclidean unsupervised models (i.e., Riemannian K-Means, online RCL, and RDL) achieve a lower NMSE than the Euclidean-based GAN. This comparison demonstrates the advantage of our proposed non-Euclidean, unsupervised clustering approaches, which estimate channel covariance matrices over SPD manifolds as opposed to a Euclidean space. One of the main differences between clustering over a Riemannian manifold and clustering over a Euclidean space is the mathematical expression of the clusters' centroids. On one hand, for the proposed non-Euclidean (i.e., Riemannian) approach, the centroid of each cluster is computed via the Fréchet mean of the instantaneous covariance matrices within it. Such a Fréchet mean is in fact the corresponding geometric mean [70] of the instantaneous covariance matrices. On the other hand, the centroid of each cluster in the conventional Euclidean-based clustering approach is the arithmetic mean of the instantaneous channel vectors within it. Generally, the geometric mean of a group of numbers is less impacted by extreme values within the group than their arithmetic mean [71]. We believe that this intrinsic advantage, namely the low impact of out-of-range values on the geometric (or Fréchet) mean, contributes to the better performance (i.e., lower clustering error) of Riemannian-based clustering approaches compared to Euclidean-based benchmarks.
In other words, Riemannian-based clustering is less susceptible to out-of-range instantaneous channels (i.e., outliers) than Euclidean-based clustering. Consequently, Riemannian clustering results in a lower NMSE, thanks to finding centroids (i.e., geometric means) that are well-positioned within the close-by instantaneous channels that occur with high frequency. Fig. 6 shows the convergence of the three proposed clustering algorithms that are used to estimate the channel covariance matrices. As shown, the three clustering models converge to the low NMSE values previously shown in Fig. 5 within a few iterations.
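The robustness argument above can be illustrated with a toy scalar example (the values are arbitrary, chosen only to expose the effect of a single outlier):

```python
import numpy as np

# A group of similar values plus one out-of-range outlier.
samples = np.array([1.0, 1.1, 0.9, 1.05, 100.0])

arith = samples.mean()                 # arithmetic mean, dragged to ~20.8
geom = np.exp(np.log(samples).mean())  # geometric mean, stays near ~2.5
```

Under the log-Euclidean metric, the Fréchet mean of SPD matrices is the exact matrix analogue of the geometric mean, namely the matrix exponential of the averaged matrix logarithms, so the same damping of outliers carries over to covariance centroids.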
The results comparing the average NMSE between the Riemannian K-Means and the traditional Euclidean-based K-Means for all of the considered Dataset-1, 2, and 3 are demonstrated in Fig. 7. This result analyzes in depth the performance of the non-Euclidean Riemannian-geometric solution against the Euclidean-based benchmark for estimating channel covariance matrices. It can be noted that the average NMSE is lowest for Dataset-1 with a total of 5460 location points, and increases with the number of observations for Dataset-2 and 3. This is intuitive, since covering more location points with the same clustering procedure increases the quantization error. However, the proposed Riemannian K-Means clustering approach always performs better than its corresponding Euclidean-based strategy for any given data size. These results indicate that clustering channel covariance matrices directly over non-Euclidean Riemannian manifolds is more accurate than clustering the channel vectors based on their linear Euclidean distances. Note that both of the simulated cases in Fig. 7 apply unsupervised learning and require no prior labelled training for clustering channel covariance matrices. One more observation can be extracted from Fig. 7: the lowest NMSE is achieved with a different number of clusters for each of the three datasets. We show the required number of clusters for different data samples in Table 3. Generally, dividing the considered region into more clusters is equivalent to adding more beamforming vectors to the designed codebook. As such, we need to estimate the optimal number of clusters to design an efficient environment-aware beamforming codebook. Our results show that the codebook for Dataset-1 should have two unique beamforming vectors, while Dataset-2 and Dataset-3 require 4 and 6 beam vectors, respectively. We further analyze the performance of the proposed non-Euclidean solutions with different group sizes for estimating channel covariance matrices in Fig. 8.
Towards doing so, we look at Dataset-2 and separate it into four disjoint groups. First, we consider 500 location points in each group (i.e., 4 × 500 = 2000 samples from Dataset-2), and then gradually increase the group size to 2715 location points per group, which is equivalent to considering the entire Dataset-2 with 10860 location points. As shown, reducing the group size from 2715 to 2000 samples increases the NMSE of the Riemannian K-Means clustering solution from 7×10⁻⁴ to 8×10⁻⁴. Similar behavior is observed for the other two clustering schemes. In particular, considering enough samples from the considered region allows the proposed solutions to make a precise estimation of the cluster centers, which leads to a lower NMSE.
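The LEM-based clustering evaluated in these experiments can be sketched as follows (this is an illustrative re-implementation, not the paper's geomstats code, and the sample-based initialization is an assumption): each SPD covariance matrix is mapped to the tangent space via the matrix logarithm, Lloyd's iterations run there under the Frobenius distance, and the centroids are mapped back with the matrix exponential.

```python
import numpy as np

def spd_log(S):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def spd_exp(L):
    """Matrix exponential of a symmetric matrix."""
    w, V = np.linalg.eigh(L)
    return (V * np.exp(w)) @ V.T

def lem_kmeans(covs, K, iters=20, seed=0):
    """Log-Euclidean K-Means over SPD matrices: map each covariance to the
    tangent space with the matrix log, run Lloyd's iterations there under
    the Frobenius (i.e., LEM) distance, and map the centroids back."""
    logs = np.array([spd_log(C) for C in covs])          # shape (N, m, m)
    rng = np.random.default_rng(seed)
    centers = logs[rng.choice(len(logs), K, replace=False)].copy()
    labels = np.zeros(len(logs), dtype=int)
    for _ in range(iters):
        # LEM distance between SPD matrices = Frobenius distance of logs.
        dists = np.array([[np.linalg.norm(L - c, 'fro') for c in centers]
                          for L in logs])
        labels = dists.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):                      # skip empty clusters
                centers[k] = logs[labels == k].mean(axis=0)
    return [spd_exp(c) for c in centers], labels
```

With the centroids mapped back to the SPD manifold, each one can serve as the estimated channel covariance of its cluster when building the codebook.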

2) ACHIEVABLE THROUGHPUT PERFORMANCE
We compare the achievable throughput of the proposed learning solutions with two benchmarks. First, the uniform codebook picks the best beam for every UE location from a fixed predefined codebook. Second, the Euclidean K-Means serves as the traditional Euclidean benchmark that was considered in Fig. 5 and Fig. 7. The transition from estimation accuracy to codebook performance is presented in two steps: (i) first, we show the trade-off between low latency (faster design) and achievable rate; then (ii) we consider fixed latency and compare the achievable throughput for different numbers of codewords (i.e., beamforming vectors).
The methodology explained in Section V-B is adopted to construct an adaptive environment-aware beamforming codebook from the estimated cluster centers. Fig. 9 demonstrates the trade-off between the number of available samples (i.e., codebook design speed) and the achievable data rate for Dataset-2. We consider 4 codewords in the designed codebook for each of the schemes shown in Fig. 9. As shown in Fig. 9, reducing the data size sacrifices some of the achievable rate of the given learning solutions, which follows from the estimation accuracy in Fig. 8. However, the rates achieved by our three non-Euclidean clustering approaches outperform the Euclidean benchmark for any sample size. In addition, our proposed solutions achieve a better data rate than the uniform codebook benchmark that uses fixed codewords.
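One common way to turn estimated cluster-center covariances into codewords (a hedged sketch, not necessarily the exact procedure of Section V-B) is to take the unit-norm dominant eigenvector of each center, i.e., the direction capturing most of the channel energy within that cluster:

```python
import numpy as np

def codebook_from_centers(centers):
    """Build a beamforming codebook by taking, for each estimated
    cluster-center covariance matrix, its unit-norm dominant
    eigenvector as the codeword."""
    codebook = []
    for G in centers:
        w, V = np.linalg.eigh(G)   # eigenvalues in ascending order
        f = V[:, -1]               # principal eigenvector
        codebook.append(f / np.linalg.norm(f))
    return np.array(codebook)
```

Each cluster then contributes one beam, so the codebook size equals the number of clusters, matching the cluster-count discussion around Table 3.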
Considering Dataset-2, Fig. 10 compares the achievable throughput of each strategy for different numbers of codewords. As previously demonstrated in Fig. 5, the segmentation of a given dataset into a particular number of clusters determines the number of codewords (or simply beams) in the designed codebook. As shown in Fig. 10, the proposed Riemannian-geometric solutions provide higher throughput than the Euclidean-based K-Means and uniform codebook benchmarks for any number of codewords. Moreover, the proposed RDL scheme always provides the highest throughput among the considered scenarios, thanks to its higher accuracy in estimating channel covariance matrices, as shown earlier.
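The throughput comparisons amount to evaluating, for each user channel, the best beamforming gain the codebook offers; a minimal sketch, with an illustrative SNR normalization and a hypothetical function name:

```python
import numpy as np

def achievable_rate(h, codebook, snr=1.0):
    """Spectral efficiency (bps/Hz) of the best codeword for channel h:
    max over codewords f of log2(1 + SNR * |f^H h|^2)."""
    gains = np.abs(codebook.conj() @ h) ** 2   # beamforming gains |f^H h|^2
    return np.log2(1.0 + snr * gains.max())
```

A codebook whose beams align better with the dominant channel directions of each cluster yields larger gains, and hence the higher rates reported for the Riemannian-based designs.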
For a detailed comparison of the achievable throughput, we focus on the case with 4 codewords for all the considered scenarios in Fig. 10. With 4 unique codewords, our RDL, Riemannian K-Means, and online RCL clustering solutions yield achievable rates of 4.62 bps/Hz, 4.53 bps/Hz, and 4.17 bps/Hz, respectively. For the Euclidean K-Means and uniform codebook benchmarks, the maximum data rates are 3.20 bps/Hz and 4.10 bps/Hz, respectively. As a result, our findings reveal that, compared to the Euclidean-based K-Means strategy, our proposed Riemannian K-Means enhances the data rate by more than 41%. Compared to the uniform codebook solution, the Riemannian K-Means increases the throughput by approximately 10%. The online RCL solution can also provide approximately 30% higher data rate than the Euclidean-based K-Means benchmark. Finally, the proposed RDL solution provides a 44% higher rate than the Euclidean K-Means and a 13% higher rate than the uniform codebook, which is a result of its highest estimation accuracy in Fig. 5. Consequently, accurate estimation of channel covariance matrices over the Riemannian manifold leads to increased throughput in the codebook performance.

3) RUNNING TIME AND COMPUTATIONAL COMPLEXITY
Fig. 11 depicts the running time of the different clustering approaches as it varies with the number of clusters. All of the shown clustering approaches were implemented on the same Python platform [66], [67] with an Intel(R) i7-4770 CPU at 3.4 GHz and 16 GB of RAM. As shown, the online RCL consumes less running time than the offline Riemannian K-Means. For example, at 4 clusters, the online RCL saves 4.6% of the running time compared to the Riemannian K-Means approach. Fig. 11 also shows that the Euclidean K-Means requires less running time than the Riemannian-based approaches. Contrasting Fig. 11 with Fig. 5, we find that the approaches requiring more running time achieve a lower NMSE (i.e., higher accuracy).
Table 4 shows the computational complexity of the various clustering approaches for each input sample (i.e., assuming one channel input). For each of the clustering approaches, K denotes the number of clusters, W is the number of iterations needed to find the centroids, and m is the dimension of the channel vectors. For a total of N samples, the complexities of all the clustering models in Table 4 are scaled by the factor N. On one hand, as presented in Table 4, the Riemannian K-Means has a complexity of O(WKm²), which is mainly due to handling m × m dimensional SPD matrices [72]. Similarly, the online RCL and RDL have the same computational complexity [75] of O(WKm²), owing to clustering m × m SPD matrices. On the other hand, the Euclidean K-Means has a lower computational complexity of O(WKm) [74], thanks to considering only m-dimensional channel vectors. In summary, the Riemannian-based clustering approaches (i.e., Riemannian K-Means, online RCL, and RDL) achieve a lower NMSE (Fig. 5) and a higher data rate (Fig. 9) than the Euclidean-based K-Means, at the cost of more running time (Fig. 11) and higher computational complexity (Table 4).
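The orders in Table 4 translate into a per-sample gap that grows linearly with the array size; a toy count with illustrative values of W, K, and m (not taken from the paper's setup):

```python
# Per-sample operation counts implied by the complexity orders in Table 4.
# W, K, m are illustrative values, not the paper's simulation parameters.
W, K, m = 20, 4, 16

riemannian_ops = W * K * m ** 2   # O(WKm^2): handling m x m SPD matrices
euclidean_ops = W * K * m         # O(WKm): length-m channel vectors

# The Riemannian approaches pay a factor-of-m penalty per sample.
ratio = riemannian_ops // euclidean_ops   # equals m
```

This factor-of-m gap is the price of the accuracy gains reported in Fig. 5, consistent with the running-time trend in Fig. 11.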

4) IMPACT OF ANTENNA PARAMETERS
We further investigate the effect of various antenna parameters on the NMSE in Fig. 12. We choose the Riemannian K-Means as a representative of our Riemannian-geometric approaches and study how the antenna configuration affects the estimation accuracy for different numbers of antenna elements at the transmitter. As shown in Fig. 12, with a phased array of 2 × 2 antenna elements, the proposed Riemannian K-Means solution provides an NMSE of 1.6×10⁻³ for 4 clusters, while the scheme provides an NMSE of 1×10⁻³ with 4 × 2 antenna elements, and performs even better with 4 × 4 phased arrays, reaching an NMSE of 7×10⁻⁴. This implies that the proposed solution achieves approximately 128% higher accuracy when the number of antenna elements at the transmitter is increased from 4 to 16. Therefore, adopting larger phased arrays at the transmitter helps estimate channel covariance matrices more accurately.

VI. CONCLUSION
In this paper, we have proposed an unsupervised Riemannian K-Means clustering scheme for estimating channel covariance matrices in a mmWave vehicular environment. The proposed solution is novel in that it exploits the underlying Riemannian-geometric structures of spatially-correlated wireless channels while learning their covariance matrices, and it does not require any prior labelled data for training. We have shown that the proposed non-Euclidean Riemannian K-Means approach provides 80% higher accuracy than the corresponding Euclidean-based K-Means benchmark. In addition, the rate achievable by the codebook designed with the Riemannian K-Means solution outperforms the Euclidean-based benchmark by more than 41%. We have provided another non-Euclidean clustering solution, namely the online RCL, which makes faster estimations by applying real-time clustering and provides a 30% higher rate than the Euclidean-based scheme. Finally, we have shown that our third proposed Riemannian-geometric solution, the RDL, exploits the sparse mmWave channel properties while learning the covariance matrices and provides a 44% higher achievable data rate than the Euclidean-based K-Means solution. Furthermore, our proposed clustering solutions use few training samples, enabling faster codebook construction and hence low-latency communication, with minimal loss in throughput.