DoA Estimation for Lens Antenna Array via Root-MUSIC, Outlier Detection, and Clustering

Lens antenna arrays have become attractive for mmWave MIMO systems due to their energy-focusing and path-division properties. However, when the signals cannot be well differentiated at different array elements due to their similar incident angles, we cannot estimate their DoAs separately. In this paper, we present a DoA estimation algorithm for this situation. Our algorithm has three main steps: a special version of root multiple signal classification (MUSIC) for lens antenna arrays, outlier detection, and clustering. Numerical results show that our algorithm can achieve good performance even with a large number of signal sources, a large number of array elements, and a small number of snapshots.


I. INTRODUCTION
Alens antenna array is an advanced antenna design for millimeter wave (mmWave) multiple-input multiple-output (MIMO) systems [1]- [3]. As Fig. 1 shows, it is composed of two main components: an electromagnetic (EM) lens and a matching antenna array. The EM lens can focus the energy of the incident EM wave. After passing through the EM lens, the energy of the incident EM wave is spatially redistributed following a 'sinc' function with its center in the focal point. Thus, most of the energy is focused onto the close vicinity of the focal point. All the possible focal points form a focal arc and matching antenna elements are placed on it separately. Since the position of the focal point is determined by the direction-of-arrival (DoA) of the incident EM wave, the waves which come from sufficiently separated directions have distinct energy distributions on the lens antenna array. By the energy-focusing property, the signals with separated DoAs can be regarded as received by different array elements, as Fig. 1 shows. Thus, we can easily distinguish these signals and handle them separately. Because most of the energy is received by a small group of array elements, we can only connect them to radio frequency (RF) chains for further processing and ignore the rest. This makes a lens antenna The associate editor coordinating the review of this manuscript and approving it for publication was Usama Mir . array a good candidate for mmWave MIMO systems because both the hardware complexity and the power consumption are significantly reduced.
However, when the incident angles of the signals are similar and the groups of array elements receiving these signals overlap, as Fig. 2 shows, we cannot handle these signals separately. Specifically, we cannot use simple methods (such as algorithm A1 provided in Section VI) to estimate their DoAs independently. This situation can arise when the devices are deployed densely and communicate simultaneously.
In this paper, we provide a DoA estimation algorithm for the scenario that the incident angles of the signals are similar, as described above. Our algorithm has three main steps: (i) a special version of root multiple signal classification (MUSIC) [4] for lens antenna array, (ii) outlier detection, and (iii) clustering. The numerical results show that our algorithm can achieve a good performance even with a large number of signal sources (more than 10), a large number of array elements (more than 40), and a small number of snapshots (no more than 10).
We note that DoA estimation is considered in some papers studying channel estimation [1]- [3]. However, these papers study strategies which have explicit training overhead. This is not the case in our paper, in which the receiver does not know the transmitted symbols and treats them as Gaussian random variables, which is true for applications like localization. VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ FIGURE 1. For lens antenna array, the signals with separated DoAs can be regarded as received by different array elements.

II. LENS ANTENNA ARRAY MODEL
The lens antenna array is described in [1]- [3]. As shown in Fig. 1 and Fig. 2, it consists of a planar EM lens and an antenna array. The EM lens is placed on the y-z plane and its size is D y × D z . The thickness of this lens is negligible compared with the focal length. The center of this lens is placed at the origin. We only consider azimuth DoAs, so the focal arc of the lens is on the x-y plane. The lens is well designed so that the focal arc is a semicircle around the center of the lens. The focal length of the lens, as well as the radius of the semicircle, is denoted as F.
To receive the focused energy, the array elements should be placed on the focal arc. Let (0 ≤ ≤ π ) and λ denote the azimuth coverage angle by the lens antenna array and the signal wavelength, respectively. According to [3], the array elements are placed based on the angle resolution of the lens such that where m is the index of antenna, M = 0, ±1, . . . , ±M is the set of indices,M = D y sin ( /2) /λ , andφ m ∈ [− /2, /2] is the angle of antenna m relative to the xaxis. Here a means the largest integer no greater that a. The number of array elements, denoted by M , is given as According to [1,Lemma 1], for the lens antenna array described above, the array response to a signal from direction where A = D y D z /λ 2 . 0 is a phase shift from lens' aperture to the array and it is common for all array elements. Since DoA estimation is based on the relative values of the array response, 0 will not influence our further analysis so we ignore this term by setting 0 = 2nπ for some integer n, just like [1], [2]. L (L < M ) non-coherent narrowband signal sources impinge the antenna array [·] T means the transpose. For snapshot t ∈ N, the received vector x(t) ∈ C M ×1 is given by where s l (t) is the signal of the lth source and n(t) ∈ C M ×1 is the noise vector at time t. Let us define Then, we can rewrite (3) as As [5], we assume both the signals and the noise are stationary, second-order ergodic, spatially and temporally white complex Gaussian processes. s(t) and n(t) are circularly-symmetric complex joint-Gaussian with distributions N (0, P) and N (0, σ 2 N I M ), respectively. I M is the identity matrix of size M . Since the signal sources are non-coherent, P is assumed to be non-singular and diagonal. The ith element on the diagonal is P i , which is the signal power of the ith source. The rank of P is L.
The array covariance matrix is given as When i = j, the (i, j)th element of this matrix, can be expressed as and the (i, i)th element can be expressed as We can observe that the (i, j)th element of the array covariance matrix is a weighted sum of all the signal power (added by the power of noise if the element is on diagonal). The weight of the kth signal power is dependent on the distances between its focal point and antenna i −M − 1, antenna j−M −1. The nearer their distances are, the larger the weight is. This observation coincides with the energy-focusing property mentioned in Section I. Assuming the number of snapshots is N , the goal of this paper is to provide an algorithm which estimates φ from x(1), . . . , x(N ).

III. OVERVIEW OF OUR ALGORITHM
In this section, we introduce the idea of our algorithm. In general, our algorithm can be divided into three main steps. First, DoA estimation has been widely investigated for a long time and many classical algorithms, e.g., MUSIC [6], estimation of signal parameters via rotational invariance techniques (ESPRIT) [7], and Iterative Quadratic Maximum Likelihood (IQML) [8] are provided to obtain an accurate estimation of DoA. Interested readers may refer to relevant textbooks and surveys, e.g., [9]- [11]. However, these algorithms are not suitable for lens antenna arrays. Algorithms like IQML are only suitable for conventional antenna array where every array element receives the incident wave with the same energy, and only the phases of the wave at array elements are different. Actually, these algorithms are designed by exploiting the DoA information implicit in the difference of the phases. However, due to the unique angle-dependent energy focusing capability of lens antenna array, its array responses are quite different from the conventional ones. Hence, algorithms like IQML cannot be directly used. Hence, modifying these algorithms for lens antenna array should be a good idea. Thus, in the first step of our algorithm, we provide a special version of root-MUSIC for lens antenna arrays.
Compared with conventional root-MUSIC, our version has two main differences.
First, in conventional root-MUSIC, the Vandermonde structure, which plays an important part in the algorithm, is obtained by nature. However, the array response of a lens antenna array does not have a Vandermonde structure, which makes direct application of root-MUSIC and its variants (e.g., unitary root-MUSIC [12]) impossible. Here, by reexpressing the 'sinc' function inside the array response in Second, in conventional root-MUSIC, the estimated DoAs are obtained by solving a (2M − 2)th degree equation (see equation (27) provided later) and find the desired L ones of all 2M −2 roots. It can be done because the L roots containing the DoA information must be with the largest magnitude inside the unit circle. However, in the root-MUSIC for lens antenna array, where is no such rule to help us distinguish the desired roots from the others.
To find the desired roots, in the second step of our algorithm, we use outlier detection. Here, we solve M − L (M − 1)th degree equations. For each equation, there are L desired roots containing the DoA information and M − 1 − L useless ones. Ideally, these desired roots should be the same for all M − L equations and the others are quite different. Hence, we can treat the undesired roots as isolated points and remove them by outlier detection.
Due to the noise and the limited number of snapshots, the DoAs obtained from the different equations cannot be the same but are perturbed around the true value. Hence, in the third step, we use clustering to divide all DoAs obtained into L clusters, and treat the center of each cluster as the estimated DoA.
The proposed method is summarized as Algorithm 1. There are three main steps in the algorithm: 1) A specially modified version of the Root-MUSIC algorithm for lens antenna arrays. The details of the algorithm are presented in Section IV. 2) Outlier detection method. This is presented in Section V. 3) Clustering method. For clustering, we use the K-medoids clustering algorithm [13]- [15]. Briefly, the K-mediods alogithm works as follows: First, we greedily select L of all the data points as the medoids to minimize the cost and associate each data point to the closest medoid. Then, for each medoid and for each non-medoid data point, we consider the swap of these two points, and compute the cost change.
Finally, we perfom the best swap. If it decreases the cost function, we do the swap again. Otherwise, the algorithm terminates. Since it is a standard method, we refer to the reader to [13]- [15] for a more detailed explanation.

IV. ROOT-MUSIC FOR LENS ANTENNA ARRAY
The sample data covariance matrix is given aŝ Let {λ 1 , . . . ,λ M } be the ordered eigenvalues ofR, such that λ 1 ≥ · · · ≥λ M > 0, and letê i be the eigenvectors associated withλ i . According to [4], [16], [17], we can partition these eigenvectors into signal eigenvectors and noise eigenvectors. Ideally, the signal eigenvectors span the range space of A whereas the noise eigenvectors span its orthogonal complement, i.e., the nullspace of A H . Hence, DoAs are estimated using , which is made up of the noise eigenvectors.

A. CONSTRUCTING VANDERMONDE STRUCTURE
In conventional root-MUSIC, the array response is a vector with Vandermonde structure and can be written as (9), we can obtain 2M − 2 roots. Among them, the L number of the roots which are inside and closest to the unit circle are picked. Then, the estimated DoA are derived using these picked roots. For lens antenna array, the array response is not a Vandermonde vector any more, so the conventional root-MUSIC cannot work. Hence, we construct a new vector with Vandermonde structure in this subsection. First, as N → +∞,R converges to APA H + I M . Since A, P, and I M are all real,R should be real as N → +∞. Hence, forR, we only consider its real part, because its imaginary part is caused by the limited number of snapshots. We definẽ R as the real part ofR, with eigenvaluesλ m and eigenvectors , and we useG instead ofĜ in (9).
The ith element ofẽ m is denoted by e i−1−M ,m , soẽ m can be expressed as [e −M ,m , . . . , eM ,m ] T . In the following, we define and express the response of antenna m in terms of z, as To construct the Vandermonde structure, we first give the following Lemma.
can be expanded to and and θ 0 = 1.
Proof: According to the property of elementary symmetric polynomials [18], we can obtain In addition, we define that From (16) and (17), we can observe that The corresponding coefficients of the polynomials on the left and right sides of (18) should be equal, so we can obtain (14) and χ 0,i = θ 0 = 1. Substituting (17) into (12), we can obtain (13) and the lemma is proven. Now, we give the following theorem. Theorem 1: a(z) Tẽ m = 0 is equivalent to the following equations.

1) SUFFICIENCY
In this part, we prove any z satisfying a(z) Tẽ m = 0 must satisfy (19). If there exists a z 0 ∈ −M , . . . ,M such that a(z 0 ) Tẽ m = 0, since we have we can obtain that e z 0 ,m = 0. Thus, when z 0 ∈ −M , . . . ,M , we can derive that Since i is an integer, we have sin(π i) = 0 and cos(π i) = (−1) i . Then, we can further simplify the equation above as Note that since z 0 is not an integer, sin(π z 0 ) = 0. Then, from a(z 0 ) Tẽ m = 0, we can derive that Multiplying both sides of (24) by M n=−M (z 0 − n), we can obtain the term in (12) is also 0. Summarizing the above results, according to Lemma 1, we can obtain (19). Thus, the sufficiency is proven.

2) NECESSITY
In this part, we prove any z satisfying (19)  A sin(πz 0 )/π and using (22) and (23), we can obtain that a(z 0 ) Tẽ m = 0. Thus, the necessity is proven. From the above theorem, (9) becomes where . Compared with (9), we use transpose instead of Hermitian transpose because both z and F are real. Thus, we successfully construct a vector with Vandermonde structure.

B. OBTAINING THE ROOT CANDIDATES
However, different from the conventional case, it is difficult to find our desired L roots containing the information of the DoAs among all the 2M − 2 roots in (27), so we here use However, due to the noise and the limited number of snapshots, the desired roots derived from different equations above cannot be exactly the same but perturb around the true value. In addition, for each equation, we will also obtain M − 1 − L useless roots containing no information about the DoAs. Here, we construct a matrix Z ∈ C (M −L)×(M −1) to store all the root candidates obtained from (29), where the (m − L)th row of Z is the vector of roots derived from solving z M −1 , . . . , 1 · f m = 0. We leave the task of distinguishing the desired roots from all the root candidates to the next step. Even though there is no analytical solution for (29) when M − 1 is equal to or larger than 5, there are lots of methods (please refer to [19]) available to find the roots of a polynomial quickly and accurately, such as Newton's method [20].
The special version of Root-MUSIC for lens antenna array provided in this section is summarized as Algorithm 2.

V. OUTLIER DETECTION
In this section, we treat the roots containing no information about the DoAs as outliers and remove them using outlier detection. From IV-B, we know among the roots obtained from each equation of (29), there are L ones containing the information about the DoAs. Thus, for the root candidates obtained from each equation, we leave no more than L of them. The rule of selection is given as follows: First, from (11) we know that the desired rules should be real and not larger than ≤ D y /λ, so we remove all the root candidates which do not satisfy these two conditions. For many cases, we can further remove the root candidates by roughly limiting the range of DoAs using the energy received. For example, if starting with one element, the array elements have continuous drops in energy all the way to the edge, it means no incident waves arrive at this part. Second, ideally, the L desired roots obtained from each equation of (29) should be the same and satisfy (28). Due to the noise and the limited number of snapshots, and put them into one big set. We call this set DoA dataset, and each element inside is one DoA data. The DoA dataset can be illustrated as Fig. 3. First, it contains some groups of points that very close to each other, as the solid black circles. The points in the same group actually correspond to one incident wave, and they will later be divided into the same cluster, of which the center will be the estimated DoA of this incident wave. However, besides these points, there are still some isolated points which will influence the results of clustering and should be removed. We set a threshold D t1 , if the distance from one DoA data to its nearest point is larger than D t1 , we believe this DoA data is isolated and remove it. Using this method, we can successfully remove the points like the hollow circles in Fig. 3. However, this method cannot remove the points like the hollow squares. To solve this problem, we calculate the local outlier factor (LOF) [21] of each DoA data and remove the data point whose LOF is larger than a given threshold D t2 . LOF is a score that tells how likely a certain data point is an outlier. The higher the LOF of a data point, the more likely it is to be an outlier. Note that LOF requires a new parameter, K near , which is the number of neighbors that the LOF calculation is considering, and in our following simulations we will assign specific values to it.
In summary, the outlier detection is given as Algorithm 3. After the outlier detection, we perform K-medoids clustering, and the center of one partitioned cluster is the estimated DoA of one incident wave.

Algorithm 3 An Outlier Detection Method to Find and Clean the Undesired Points, outlier_detection(L, Z)
Inputs: L -the number of signal sources, Z -the matrix of root candidates Output: φ data -the desired DoA data points after data clean 1: Initialize φ data as an empty set {}. Initialize z p as z this (j) 6: if z p / ∈ R or z p > D y /λ then 7: z this ← z this \{z p } end for 13: if length(z this ) ≤ L then 14: φ data ← φ data ∪ mapping (z this ) 15: else 16: v index ← indices of z value with minimal L values 17: φ data ← φ data ∪ mapping (z this (v index )) 18: end if 19: end for 20: Derive φ Dmin , where the ith element is the distance between φ data (i) and its nearest point. 21: for i = 1 to length φ data do 22: if φ Dmin (i )>D t1 then φ data ← z data \{φ data (i )} Remove the isolated point whose distance from the nearest point is larger than a given threshold. if φ LOF (j )>D t2 then φ data ← φ data \{φ data (j )} Remove the isolated point whose local outlier factor is larger than a given threshold.

VI. NUMERICAL RESULTS
In this section, we give some numerical results to show the performance of our method. We set D y /λ = 20 and A = 1. The azimuth coverage angle = π , so the DoA can come from −90 • to 90 • .M = D y sin ( /2) /λ = 20. Hence, 41 array elements placed on the focal arc of the lens,  where the angle of antenna m relative to the x-axis isφ m = arcsin(0.05m), m = 0, ±1, . . . , ±20.
To simulate a large number of signals arriving at the receiver simultaneously, we assume there are L = 14 equally powered uncorrelated narrowband sources with DoAs {±35 • , ±30 • , . . . , ±5 • }. Thus, P can be given as σ 2 P I L , where σ 2 P is the average power of the signals. Fig. 4 shows array response of array elements with indices 3, 4, and 5 as an example, and the situations for other array elements are similar. We can observe that since the incident angles of the adjacent signals are similar, the groups of array elements receiving these signals overlap. Hence, it is impossible to divide the array elements into some separated subsets and make each one handle one incident wave. Table 1 is the result of our method for one simulation. Here we set the number of snapshots N = 10 and the signalto-noise ratio (SNR) as 20dB. For outlier detection, we use D t1 = 4.5 • , D t2 = 3.5, and K near = 5. We roughly limit the range of DoAs by the energy received at first. From this table, we can find the error between the estimated and the true values come from two parts. One is that the true DoA fails to be detected and another angle is chosen instead by mistake. In table 1, the receiver fails to detect the incident wave with DoA 10 • , but believes that there exists a signal coming from 0.25 • . Usually, the difference between the estimated and the true values can be large in this case (9.75 • in table 1). We call this kind of error ''detection error''. The other one is the small deviation from true DoA due to the noise and limited number of snapshots, which exists for every incident wave. We call this error ''estimation error''.
We analyze the performance of our provided method from these two kinds of errors. For detection error, we use the probability that a detection error occurs as the performance metric. In a simulation, if we run T run independent runs, and the detection error occurs T l times for the lth incident wave, we can say the detection error probability (DEP) of the lth incident wave, DEP l , is approximated as Here we give a strict requirement that a detection error occurs as long as the difference between the estimated DoA and true DoA is larger than 2 • . For the DoAs which are successfully detected, we use root mean square error (RMSE) to study estimation error. Let D l be the set of simulation indices where the lth incident wave is successfully detected. Then, the RMSE of the lth incident wave is given by whereφ l (n) denotes the estimated DoA of the lth incident wave at the nth run. Now, we investigate these two performance metrics. Since the figures will become messy if we draw the curves of all incident waves, in the following figures, we only give the curves of incident waves with DoA 5 First, we make N change from 1 to 10. Other parameter settings are the same with table 1. Fig. 5 shows the DEP versus the number of snapshots N . It is observed that in general, the DEP decreases as more snapshots are available, as expected. There are some small fluctuations because we do not dynamically choose D t1 , D t2 , and K near for different simulation points. In addition, we can find that the waves with smaller incident angles generally performs worse than those with larger incident angles. It is because that the waves with smaller incident angles are focused onto the array elements near the center, which are heavily influenced by other signals on both sides, while the waves with larger incident angles are influenced by the signals only from one side. It is worthy to note that our algorithm can provide a good performance with only a small number of snapshots. Moreover, even in the extreme case where N = 1 or N = 2, the results of our algorithm are still acceptable. Fig. 6 shows the RMSE versus the number of snapshots N . We can find that the RMSE also decreases as we use more snapshots. Again, we can find that the waves with smaller incident angles generally performs worse than those with larger incident angles. Finally, our algorithm also provides a good RMSE performance with only a small number of snapshots.
Then, we investigate the DEP and the RMSE under different SNRs. We let the SNR change from 2dB to 20dB and set other parameters same with table 1. Fig. 7 shows the DEP versus the SNR. We can observe that as the SNR increases, the DEPs of all incident waves decrease, as expected. Also, the same with Fig. 5, we can still find that the waves with smaller incident angles have worse performance than those with larger incident angles. A similar situation can be seen in Fig. 8, which illustrates the RMSE performance. Finally, Fig. 7 and Fig. 8 show that our algorithm can provide a good performance even in a low-SNR regime.
To further show that our algorithm can achieve a good performance, here we provide two other algorithms for comparison.
In algorithm A1, we simply use the two array elements closest to the focal point. Assuming these two elements are with index m c1 and m c2 , the estimation is derived by solving where x m c1 (t) and x m c2 (t) are the signals received by elements m c1 and m c2 at snapshot t, respectively. Notice that this algorithm is feasible when the signals have separated DoAs, as the case in Fig. 1. Since there is little overlap in energy distribution, the array elements where the energy of the received signals are larger must be closer to the focal points. Thus, for one focal point, we can find the two array elements closest to it.  However, for the case investigated in this paper, when there are a large number of incident waves with similar DoAs, the groups of array elements receiving these signals overlap. Thus, as Fig. 4 shows, it is quite difficult to distinguish which array elements are closest to the focal points, since each element is influenced by multiple signals. Here, for the purpose of comparison, we assume the information about the closest array elements is known in advance, perhaps via an omniscient genie. We refer to this version of algorithm A1 as A1 with a genie. If we do not have such genie, we simply believe the L incident waves arrive at the antenna elements with largest receiving energy. We refer to this as A1 without a genie.
Fast root-MUSIC was provided in [22] as an alternative method to find the desired roots. In algorithm A2, we modify this method to lens antenna array and use it instead of roots cleaning and clustering to solve (29). According to [22], we divide F into two sub-matrix that where c ∈ R L×1 can be obtained by and the estimated DoAs can be derived by (30).  For comparison, we use the DEP and the RMSE over all antenna elements. Thus, (37) Fig. 9 and Fig. 10 show the DEP and the RMSE performance versus the SNR for different algorithms, respectively. Here we still assume the number of snapshots N = 10. From both figures we can observe that the algorithm A2, which is based on the fast root-MUSIC method, performs much worse than other two algorithms in the situation where there are a large number of incident waves and array elements. Briefly, it is because a large amount of information is lost when we 'compress' M − L equations in (29) into only one equation (34). Though the algorithm A1 with a genie performs slightly better than our algorithm, do not forget that it is based on a impractical assumption that we know which array elements are closest to the focal point of the incident wave. It actually considerably narrows the range because the DoA must be inside the range where its corresponding focal point is between these two array elements. For the case of A1 without a genie, we can observe that our algorithm performs better both in DEP and RMSE. Thus, these results verify that our algorithm is competitive and has good performance.  For supplement, we consider another scenario. Here, the number of array elements M = 13, and there are L = 6 more closely spaced non-coherent sources with DoAs {20 • , 23 • , 26 • , 29 • , 32 • , 35 • }. We fix the number of snapshots N = 10. To test the performance of our algorithm when the SNR is very low, we let the SNR change from −20dB to 20dB. In this scenario, we still compare our algorithm with A1 with a genie, A1 without a genie, and A2. For our algorithm, we set D t1 = 3.5 • , D t2 = 7, and K near = 8. From Fig. 11 and Fig. 12, we can observe that our algorithm performs better than other three algorithms. In this case, A1 with a genie has a worse performance, which differs from the case shown in Fig. 9 and Fig. 10 . This is because the array elements become sparse while DoAs of the sources become close. Thus, different incident waves may share the same closest array elements. However, from (33), we can find that only one DoA can be estimated using the signals received by two adjacent array elements, making A1 with a genie inaccurate in this case.

VII. CONCLUSION
In this paper, we provide a DoA estimation algorithm for the lens antenna array, which is suitable for the situation where the groups of array elements receiving the signals overlap so we cannot handle these signals separately. Our algorithm has three main steps: a special version of root multiple signal classification (MUSIC) for lens antenna array, outlier detection, and clustering. In numerical results, our algorithm is shown to be a good candidate for DoA estimation even when the numbers of signal sources and array elements are large and the number of snapshots is small.