1 Introduction

It is well known that the physical nature of wave propagation and diffraction imposes a fundamental barrier on the resolution of imaging systems, termed the diffraction limit or resolution limit. Since the seminal works of Abbe [1] and Rayleigh [57] on quantifying the resolution limit, it has been widely accepted in practice that the resolution limit is near half of the wavelength (see, for instance, [5, 6]). However, this kind of resolution limit lacks a rigorous mathematical foundation and is not very applicable to some modern imaging modalities [15, 56]. From the mathematical perspective, a resolution limit can only be set when taking the noise into account [12, 21, 23], and surpassing the classical limits is very promising for imaging modalities with high signal-to-noise ratio (SNR). This understanding has motivated new works on deriving more rigorous resolution limits [30, 31, 49, 50]. At the beginning of this century, the dependence of the two-point resolution on the noise level was thoroughly investigated from the perspective of statistical inference [63,64,65], but resolution estimates for resolving multiple sources achieved substantial breakthroughs only in recent years, owing to the nonlinearity of the problem.

To understand the resolution in resolving multiple sources, in the earlier works [44,45,46] we defined “computational resolution limits” for number detection and location recovery in one- and multi-dimensional super-resolution problems and characterized them by the signal-to-noise ratio, the cutoff frequency, and the number of sources. In [45], we derived sharp estimates for the computational resolution limits in the one-dimensional super-resolution problem. We extended these estimates to multi-dimensional cases in [44], but the resulting bounds are not sharp owing to the projection techniques used there. Specifically, the upper bound for the resolution increases rapidly as the source number n and the space dimension k increase. To address this issue, this paper derives better estimates for the computational resolution limits in two-dimensional super-resolution problems and provides a new idea with the potential to tackle general multi-dimensional cases. The main contributions of our work are fourfold: (1) we improve the resolution estimates of [44] for number detection and location recovery in two-dimensional super-resolution problems; (2) as a consequence, we derive a stability result for a sparsity-promoting algorithm in two-dimensional super-resolution problems [or direction of arrival (DOA) problems]. Although it is well known that total variation optimization [11] and many other convex-optimization-based algorithms [70] have a resolution limit near the Rayleigh limit [16, 22, 69], our stability result exhibits the optimal super-resolution ability of \(l_0\)-minimization in solving such problems; (3) inspired by the techniques used in the proofs, we propose a new coordinate-combination-based model order detection algorithm for two-dimensional DOA problems and demonstrate its optimal performance both theoretically and numerically; and (4) we propose a new coordinate-combination-based MUSIC (MUltiple SIgnal Classification) algorithm for super-resolving sources in two-dimensional DOA estimation. Our algorithm enjoys certain advantages over conventional DOA algorithms. We also exhibit numerically the phase transition phenomenon of the algorithm, which demonstrates its excellent resolving capacity.

1.1 Existing Works on the Resolution Limit Problem

The first theory for quantifying the resolution limit was derived by Ernst Abbe [1, 72]. Since then, there have been various proposals for the resolution limit [33, 57, 61, 66], among which the most famous and widely used are the Rayleigh limit [57] and the full width at half maximum (FWHM) [20]. However, these classical resolution limits neglect the effect of noise and hence are not mathematically rigorous [12, 21, 23]. From a mathematical perspective, there is no resolution limit when one has perfect access to the exact intensity profile of the diffraction images. Therefore, the resolution limit can only be rigorously set when taking into account the measurement noise or aberration that precludes perfect access to the diffraction images. Based on this understanding, many works were devoted to characterizing the dependence of the two-point resolution on the signal-to-noise ratio from the perspective of statistical inference [30, 31, 49, 50, 63,64,65]. These classical and semi-classical limits of two-point resolution have been well studied, and we refer the reader to [12, 17, 21, 41] for more detailed introductions.

For the resolution limit in super-resolving multiple point sources, the problem becomes much more difficult due to its high degree of nonlinearity. To our knowledge, the first breakthrough was achieved by Donoho in 1992 [24]. He considered a grid setting in which a discrete measure is supported on a lattice and regularized by a so-called Rayleigh index. The problem is to reconstruct the amplitudes of the grid points from their noisy Fourier data in \([-\Omega , \Omega ]\), with \(\Omega \) being the band limit. He derived both lower and upper bounds for the minimax error in the amplitude reconstruction, emphasizing the importance of sparsity and the signal-to-noise ratio in super-resolution; these estimates were not improved until recent years. Owing to the enormous development of super-resolution modalities in biological imaging [10, 29, 32, 59, 77] and the popularity of research on super-resolution algorithms in applied mathematics [7, 11, 22, 25, 38, 40, 52, 53, 55, 70, 71], the inherent super-resolving capacity of the imaging problem has been drawing increasing interest and is now well studied in the one-dimensional case. In [19], the authors considered resolving n-sparse point sources supported on a grid and improved the results of Donoho. They showed that, in the presence of noise of intensity \(\sigma \), the minimax error in the amplitude recovery scales like \(\textrm{SRF}^{2n-1}\sigma \), where \(\textrm{SRF}:= \frac{1}{\Delta \Omega }\) is called the super-resolution factor. The case of multi-clustered point sources was considered in [8, 37], and similar minimax error estimates were derived. In [4, 9], the authors considered the minimax error for recovering off-the-grid point sources. Based on an analysis of the “Prony-type system”, they derived bounds for both the amplitude and location reconstructions of the point sources. More precisely, they showed that for \(\sigma \lessapprox (\textrm{SRF})^{-2p+1}\), where p is the number of point sources in a cluster, the minimax errors for the amplitude and location recoveries scale respectively as \((\textrm{SRF})^{2p-1}\sigma \) and \((\textrm{SRF})^{2p-2} {\sigma }/{\Omega }\), while for an isolated, non-clustered source, the corresponding minimax errors scale respectively as \(\sigma \) and \({\sigma }/{\Omega }\). We also refer the reader to [12, 51] for understanding the resolution limit from the perspective of sample complexity, and to [16, 69] for the resolving limit of specific algorithms.

On the other hand, in order to characterize the exact resolution rather than the minimax error in recovering multiple point sources, in the earlier works [42,43,44,45,46] we defined “computational resolution limits”, which characterize the minimum distance between point sources required so that their number and locations can be stably resolved under a certain noise level. By developing a nonlinear approximation theory in a so-called Vandermonde space, we derived sharp bounds for the computational resolution limits in the one-dimensional super-resolution problem. In particular, we showed in [45] that the computational resolution limits for the number and location recoveries are respectively \(\frac{C_{\textrm{num}}}{\Omega }(\frac{\sigma }{m_{\min }})^{\frac{1}{2n-2}}\) and \(\frac{C_{\textrm{supp}}}{\Omega }(\frac{\sigma }{m_{\min }})^{\frac{1}{2n-1}}\), where \(C_{\textrm{num}}\) and \(C_{\textrm{supp}}\) are constants and \(m_{\min }\) is the minimum strength of the point sources. We extended these estimates to multi-dimensional cases in [44], but the resulting bounds are not sharp due to the projection techniques used there. In this paper, we improve the estimates for the two-dimensional super-resolution problem by a new technique. The improvements are discussed in detail in Sect. 2.

1.2 Direction of Arrival Estimation

Our work also inspires new ideas for two-dimensional direction of arrival estimation. Direction of arrival (DOA) estimation refers to the process of retrieving the direction information of several electromagnetic waves/sources from the data received by a number of antenna elements in a specific array. It is an important problem in array signal processing and finds wide applications in radar, sonar, wireless communications, etc.; see, for instance, [6].

In one-dimensional DOA estimation, if the antenna elements are uniformly spaced on a line, the well-known MUSIC and ESPRIT algorithms and other subspace methods can resolve the direction of each incident signal/source with high resolution. But for two-dimensional DOA estimation with a uniform rectangular array (URA), where both azimuth and elevation angles must be determined, these subspace methods cannot simply be extended to directly determine the azimuth and elevation angle of each source. A major idea for solving the two-dimensional DOA problem is to decompose it into two independent one-dimensional DOA estimations, in which the subspace methods can be leveraged to efficiently restore the direction components of the sources corresponding to the x-axis and the y-axis. For convenience of discussion, throughout the paper we refer to methods based on this decoupling idea as one-dimensional-based algorithms. It is worth emphasizing that other ways of directly obtaining the azimuth and elevation angles of each source have also been considered [39, 78, 80], but the signal processing in a higher-dimensional space reduces their computational efficiency.

Although the one-dimensional-based algorithms are usually much more computationally efficient, they still suffer from two issues: (1) the loss of separation distance in the x-axis or y-axis components; (2) the pair matching of the estimated elevation and azimuth angles. For the first issue, the x-axis (or y-axis) components of two sources may be closely spaced even though the two sources are far apart in the two-dimensional space. This causes very unstable reconstruction of the one-dimensional components and hence of the sources. Most existing works ignore this issue, and the papers that do propose ways to enhance the reconstruction rely on rather complicated methods [73, 74]. For example, in [74], the authors utilized a Taylor expansion, subspace projection, and a tree structure to enhance the reconstruction when the recovered one-dimensional components are unstable. The second issue is that the pair matching of the estimated elevation and azimuth angles is very time-consuming when dealing with multiple components of sources; it usually requires a complex process or a two-dimensional search [18, 35, 48, 68, 80].

In this paper, we propose a new efficient one-dimensional-based algorithm for two-dimensional DOA estimation which solves the above two issues in a simple way. First, our algorithm employs a new idea named coordinate-combination to avoid a severe loss of separation distance between sources in certain regions; see Sect. 5.4 for a detailed discussion. Second, unlike conventional one-dimensional-based algorithms, the pair matching problem of our algorithm is a simple balanced assignment problem [54], which can be solved efficiently by many algorithms, such as the Hungarian algorithm; a minimal sketch is given below.
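For illustration, the pair-matching step can be sketched as follows. This is a minimal sketch assuming a user-supplied misfit function `pairing_cost` (a hypothetical ingredient, not from the paper, measuring how well a candidate pair (x, y) explains the data); SciPy's `linear_sum_assignment` implements a Hungarian-type solver for the balanced assignment problem.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Sketch of the pair-matching step: given estimated x-components xs and
# y-components ys of n sources, build a cost matrix from a user-supplied
# misfit pairing_cost(x, y) (hypothetical), then solve the balanced
# assignment problem with a Hungarian-type algorithm.
def pair_match(xs, ys, pairing_cost):
    cost = np.array([[pairing_cost(x, y) for y in ys] for x in xs])
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one pairing
    return [(xs[i], ys[j]) for i, j in zip(rows, cols)]
```

Since the assignment is balanced, the solver returns a one-to-one pairing in polynomial time, avoiding any two-dimensional search.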

1.3 Organization of the Paper

The rest of the paper is organized as follows. In Sect. 2, we present the main results on computational resolution limits for the number detection and the location recovery in the two-dimensional super-resolution problem. We also provide a stability result for a sparsity-promoting algorithm. In Sect. 3, we prove the main results of Sect. 2 and discuss the generalization to higher dimensions. Inspired by the techniques in the proofs, in Sects. 4 and 5 we introduce respectively the coordinate-combination-based number detection and source recovery algorithms for two-dimensional DOA estimation. We also conduct numerical experiments to demonstrate their super-resolution capability. Section 6 presents a nonlinear approximation theory in Vandermonde space, which is a main ingredient in the proofs of our main results. Section 7 is devoted to some conclusions and future work. In the “Appendix”, we prove a technical lemma.

2 Main Results

2.1 Model Setting

We consider the following model of a linear combination of point sources in a two-dimensional space:

$$\begin{aligned} \mu =\sum _{j=1}^{n}a_{j}\delta _{{\textbf{y}}_j}, \end{aligned}$$

where \(\delta \) denotes Dirac’s \(\delta \)-distribution in \(\mathbb R^2\), \({\textbf{y}}_j \in {\mathbb {R}}^2,1\le j\le n\), which form the support of the measure, represent the locations of the point sources, and \(a_j\in {\mathbb {C}}, 1\le j \le n\), their amplitudes. We say that the measure \(\mu \) is n-sparse if all the \(a_j\)’s are nonzero. We denote the coordinates of each \({\textbf{y}}_j\) by \({\textbf{y}}_{j,1}, {\textbf{y}}_{j,2},\) and

$$\begin{aligned} m_{\min }=\min _{j=1,\ldots ,n}|a_j|, \quad D_{\min }=\min _{p\ne j}\left| \left| {{\textbf{y}}_p-{\textbf{y}}_j}\right| \right| _1. \end{aligned}$$
(2.1)

We assume that the available measurement is the noisy Fourier data of \(\mu \) on the continuous bounded domain \([0,\Omega ]^2\), that is,

$$\begin{aligned} {\textbf{Y}}(\varvec{\omega }) = {\mathscr {F}} \mu (\varvec{\omega }) + \textbf{W}(\varvec{\omega })= \sum _{j=1}^{n}a_j e^{i {\textbf{y}}_j^\top \varvec{\omega }} + {\textbf{W}}(\varvec{\omega }), \ \varvec{\omega }\in [0, \Omega ]^2, \end{aligned}$$
(2.2)

where \({\mathscr {F}} \mu \) denotes the Fourier transform of \(\mu \), \(\Omega \) is the cutoff frequency, and \({\textbf{W}}\) is the noise. We assume that

$$\begin{aligned} \left| {{\textbf{W}}(\varvec{\omega })}\right| < \sigma , \quad \varvec{\omega }\in [0, \Omega ]^2, \end{aligned}$$

where \(\sigma \) is the noise level. We remark that, throughout the paper, we use bold symbols for vectors, matrices, and functions, and ordinary ones for scalar values. In particular, the measurement \(\textbf{Y}\) and the noise \(\textbf{W}\) are viewed as functions in all sections. Also, we will only use \(||\cdot ||_{\infty }, ||\cdot ||_{1}, ||\cdot ||_{2}\) for vectors.
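For concreteness, the measurement model (2.2) can be simulated as in the following sketch; the grid size, source configuration, and bounded-noise draw are illustrative choices, not prescribed by the paper.

```python
import numpy as np

# Minimal simulation of (2.2): noisy Fourier data of an n-sparse measure
# sampled on a grid of frequencies in [0, Omega]^2 (illustrative values).
rng = np.random.default_rng(0)
Omega, sigma = 10.0, 1e-3
locations = np.array([[0.02, 0.05], [-0.03, 0.01]])   # y_j in R^2
amplitudes = np.array([1.0, 1.0 + 0.5j])              # a_j in C

w = np.linspace(0.0, Omega, 64)
W1, W2 = np.meshgrid(w, w, indexing="ij")

# F mu(omega) = sum_j a_j exp(i y_j . omega)
Y = sum(a * np.exp(1j * (y[0] * W1 + y[1] * W2))
        for a, y in zip(amplitudes, locations))

# bounded noise |W(omega)| < sigma: random modulus below sigma, random phase
noise = (sigma * rng.uniform(0, 1, W1.shape)
         * np.exp(1j * rng.uniform(0, 2 * np.pi, W1.shape)))
Y = Y + noise
```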

We are interested in the resolution limit in superresolving a cluster of tightly spaced point sources. To be more specific, we denote by

$$\begin{aligned} B_{\delta , \infty }(\textbf{x}):= \Big \{ {\textbf{y}} \ \Big |\ \textbf{y}\in {\mathbb {R}}^2,\ ||\textbf{y}- \textbf{x}||_{\infty }<\delta \Big \}, \end{aligned}$$

and assume that \({\textbf{y}}_j \in B_{\frac{(n-1)\pi }{6\Omega }, \infty }(\textbf{0}), j=1,\ldots ,n\), or equivalently \(\left| \left| {{\textbf{y}}_j}\right| \right| _{\infty }<\frac{(n-1)\pi }{6\Omega }\). This assumption is made because our techniques rely on the approximation theory in Vandermonde space (Sect. 6), where the stability is governed by the distances between the elements \(e^{i {\textbf{y}}_{j,1}\omega ^*}+e^{i {\textbf{y}}_{j,2}\omega ^*}, j=1, \ldots , n,\) for certain \(\omega ^*\). Without this assumption, even though the \({\textbf{y}}_j\)’s are well-separated in \({\mathbb {R}}^2\), the elements \(e^{i {\textbf{y}}_{j,1}\omega ^*}+e^{i {\textbf{y}}_{j,2}\omega ^*}\) may be very close to one another. This is a common type of assumption when tackling the super-resolution problem, see for instance [9, 44, 45]. Since we are mainly interested in the case when the \({\textbf{y}}_j\)’s are tightly spaced, the assumption is also reasonable.

2.2 Computational Resolution Limit for Number Detection in the Two-Dimensional Super-Resolution Problem

In this section, we estimate the super-resolving capacity of source number detection in two-dimensional super-resolution problems. To be specific, we define and characterize a computational resolution limit for the corresponding number detection problem. Our main results are built upon a delicate analysis of \(\sigma \)-admissible measures, defined as follows:

Definition 2.1

Given a measurement \({\textbf{Y}}\), we say that \(\hat{\mu }=\sum _{j=1}^{m} {\hat{a}}_j \delta _{ {\hat{\textbf{y}}}_j}, \ {\hat{\textbf{y}}}_j\in {\mathbb {R}}^2\) is a \(\sigma \)-admissible discrete measure of \({\textbf{Y}}\) if

$$\begin{aligned} \left| {{\mathscr {F}}{\hat{\mu }} (\varvec{\omega })-\textbf{Y}(\varvec{\omega })}\right| < \sigma , \ \text {for all}\ \varvec{\omega } \in [0, \Omega ]^2. \end{aligned}$$

Note that the set of \(\sigma \)-admissible measures of \({\textbf{Y}}\) characterizes all possible solutions to the inverse problem with the given measurement \({\textbf{Y}}\). If all \(\sigma \)-admissible measures have at least n supports, then detecting the correct source number is possible, for example by targeting the sparsest admissible measures. However, if there exists one \(\sigma \)-admissible measure with fewer than n supports, detecting the source number n is impossible without additional prior information. This leads to the following new definition of resolution limit, named the computational resolution limit.

Definition 2.2

The computational resolution limit to the number detection problem in two dimensions is defined as the smallest nonnegative number \({\mathscr {D}}_{2,\textrm{num}}\) such that for all n-sparse measures \(\sum _{j=1}^{n}a_{j}\delta _{{\textbf{y}}_j}, {\textbf{y}}_j \in B_{\frac{(n-1)\pi }{6\Omega }, \infty }(\textbf{0})\) and the associated measurement \(\textbf{Y}\) in (2.2), if

$$\begin{aligned} \min _{p\ne j} \left| \left| {{\textbf{y}}_j-{\textbf{y}}_p}\right| \right| _1 \ge {\mathscr {D}}_{2, \textrm{num}}, \end{aligned}$$

then there does not exist any \(\sigma \)-admissible measure of \(\ {\textbf{Y}}\) with fewer than n supports.

The above resolution limit is termed a “computational resolution limit” to distinguish it from the classic Rayleigh limit. Compared to the Rayleigh limit, the computational resolution limit is defined in a mathematically rigorous way. It also takes the noise into account, which makes it more applicable to modern imaging techniques. In [44,45,46], the authors defined similar computational resolution limits and presented rigorous estimates for them. In the following theorem, we derive an estimate of \({\mathscr {D}}_{2, \textrm{num}}\) which substantially improves the two-dimensional estimate in [44].

Theorem 2.1

Let \(n\ge 2\) and the measurement \({\textbf{Y}}\) in (2.2) be generated by an n-sparse measure \(\mu =\sum _{j=1}^{n}a_j\delta _{{\textbf{y}}_j}, {\textbf{y}}_j \in B_{\frac{(n-1)\pi }{6\Omega }, \infty }(\textbf{0})\). Assume that the following separation condition is satisfied

$$\begin{aligned} \min _{p\ne j, 1\le p, j\le n}\Big |\Big |{\textbf{y}}_p- \textbf{y}_j\Big |\Big |_1\ge \frac{16.6\pi (n-1)}{\Omega }\Big (\frac{\sigma }{m_{\min }}\Big )^{\frac{1}{2n-2}} \end{aligned}$$
(2.3)

with \(m_{\min }\) being defined in (2.1). Then there does not exist any \(\sigma \)-admissible measure of  \({\textbf{Y}}\) with fewer than n supports.

Theorem 2.1 reveals that when \(\min _{p\ne j, 1\le p, j\le n}\Big |\Big |{\textbf{y}}_p- \textbf{y}_j\Big |\Big |_1\!\ge \!\frac{16.6\pi (n-1)}{\Omega }\Big (\!\frac{\sigma }{m_{\min }}\!\Big )^{\frac{1}{2n-2}}\), exactly recovering the source number n is possible. Since the Rayleigh limit is \(\frac{c_2\pi }{\Omega }\) for a constant \(c_2\), Theorem 2.1 also indicates that resolving the source number in the sub-Rayleigh regime is theoretically possible when the SNR is sufficiently large.

Moreover, the estimate in Theorem 2.1 substantially improves the result in [44]. To be more specific, in [44], the authors considered recovering the source number and locations from the noisy measurement,

$$\begin{aligned} {\textbf{Y}}(\varvec{\omega }) = {\mathscr {F}} \mu (\varvec{\omega }) + \textbf{W}(\varvec{\omega })= \sum _{j=1}^{n}a_j e^{i {\textbf{y}}_j^\top \varvec{\omega }} + {\textbf{W}}(\varvec{\omega }),\quad \left| \left| {\varvec{\omega }}\right| \right| _2\le \Omega , \end{aligned}$$

where the noise \({\textbf{W}}(\varvec{\omega })\) is bounded as \(\left| {{\textbf{W}}(\varvec{\omega })}\right| <\sigma , \left| \left| {\varvec{\omega }}\right| \right| _2\le \Omega \). The upper bound estimation for the two-dimensional computational resolution limit in number detection is

$$\begin{aligned} \min _{p\ne j, 1\le p, j\le n}\Big |\Big |{\textbf{y}}_p- \textbf{y}_j\Big |\Big |_2 \ge \frac{2.2e\pi n(n-1)}{\Omega }\Big (\frac{\sigma }{m_{\min }}\Big )^{\frac{1}{2n-2}}. \end{aligned}$$
(2.4)

Therefore, neither the model nor the norm used for the separation distance in [44] differs from ours in a way that would make the problem in [44] inherently harder. Leveraging the new techniques of this paper, however, the constant factor of order \(n^2\) in the previous estimate (2.4) is improved to order n in (2.3). In particular, a simple calculation (taking the model difference into account) shows that the new estimate is already better for all \(n\ge 2\).
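To make the comparison concrete, one may tabulate both bounds. Note that (2.3) is stated for the \(l_1\) distance with measurements on \([0,\Omega ]^2\), while (2.4) is stated for the \(l_2\) distance with measurements on the disk \(\left| \left| \varvec{\omega }\right| \right| _2\le \Omega \), so the raw numbers are comparable only after a norm conversion; the essential point is the O(n) versus \(O(n^2)\) growth of the constant factors.

```python
import numpy as np

# Growth of the two upper bounds for an illustrative noise-to-signal ratio;
# (2.3) grows like O(n) in the constant factor while (2.4) grows like O(n^2).
def bound_new(n, ratio, Omega=1.0):   # Eq. (2.3), l1 distance
    return 16.6 * np.pi * (n - 1) / Omega * ratio ** (1 / (2 * n - 2))

def bound_prev(n, ratio, Omega=1.0):  # Eq. (2.4), l2 distance, from [44]
    return 2.2 * np.e * np.pi * n * (n - 1) / Omega * ratio ** (1 / (2 * n - 2))

for n in (2, 3, 5, 10):
    print(n, bound_new(n, 1e-6), bound_prev(n, 1e-6))
```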

On the other hand, it is already known from [44] that the computational resolution limit for the number detection in the general k-dimensional super-resolution problem is bounded below by \(\frac{C_{1}}{\Omega }\Big (\frac{\sigma }{m_{\min }}\Big )^{\frac{1}{2n-2}}\) for some constant \(C_1\). Thus the \({\mathscr {D}}_{2,\textrm{num}}\) is bounded by

$$\begin{aligned} \frac{C_{1}}{\Omega }\Big (\frac{\sigma }{m_{\min }}\Big )^{\frac{1}{2n-2}} \le {\mathscr {D}}_{2,\textrm{num}} \le \frac{C_{2}n}{\Omega }\Big (\frac{\sigma }{m_{\min }}\Big )^{\frac{1}{2n-2}}, \end{aligned}$$
(2.5)

and our estimate for the upper bound is already very sharp.

The above estimates further indicate a phase transition phenomenon in the two-dimensional number detection problem. Specifically, by (2.5) we expect the presence of a line of slope \(2n-2\) in the parameter space \(\log (\textrm{SRF})-\log (\textrm{SNR})\) above which the source number can be correctly detected in each realization. This phenomenon is confirmed by the number detection algorithm (Algorithm 2) later in Sect. 4.5 and illustrated in Fig. 2.

2.3 Computational Resolution Limit for Location Recovery in the Two-Dimensional Super-Resolution Problem

We next present our results on the resolution limit for the location recovery problem in two dimensions. We first introduce the following concept of \(\delta \)-neighborhood of discrete measures. Define

$$\begin{aligned} B_{\delta , 1}(\textbf{x}):= \Big \{ {\textbf{y}} \ \Big |\ {\textbf{y}}\in {\mathbb {R}}^2,\ ||\textbf{y}- \textbf{x}||_{1}<\delta \Big \}. \end{aligned}$$

Definition 2.3

Let \(\mu =\sum _{j=1}^{n}a_j \delta _{{\textbf{y}}_j}\) be an n-sparse discrete measure in \({\mathbb {R}}^2\) and let \(\delta >0\) be such that the n balls \(B_{\delta , 1}({\textbf{y}}_j), 1\le j \le n\) are pairwise disjoint. We say that \({\hat{\mu }}=\sum _{j=1}^{n}{\hat{a}}_{j}\delta _{{\hat{\textbf{y}}}_j}\) is within \(\delta \)-neighborhood of \(\mu \) if each \({\hat{\textbf{y}}}_j\) is contained in one and only one of the n balls \(B_{\delta , 1}({\textbf{y}}_j), 1\le j \le n\).

According to the above definition, a measure \({\hat{\mu }}\) in a \(\delta \)-neighborhood of \(\mu \) preserves the inner structure of the collection of point sources. For a stable location (or support of measure) recovery algorithm, the output should be a measure in some \(\delta \)-neighborhood of the underlying sources. Moreover, \(\delta \) should tend to zero as the noise level \(\sigma \) tends to zero. We now introduce the computational resolution limit for the support recovery problem. For ease of exposition, we only consider measures supported in \(B_{\frac{(2n-1)\pi }{12\Omega }, \infty }(\textbf{0})\), where n is the source number.
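The membership test of Definition 2.3 is straightforward to implement; the following sketch (an illustrative implementation with the \(l_1\) balls of the definition) checks that each recovered location lies in exactly one ball and each ball receives exactly one recovered location.

```python
import numpy as np

# Sketch of the membership test in Definition 2.3: mu_hat is within a
# delta-neighborhood of mu iff each recovered location falls in exactly one
# of the pairwise-disjoint l1-balls B_{delta,1}(y_j), and each ball receives
# exactly one recovered location.
def in_delta_neighborhood(y_true, y_hat, delta):
    y_true, y_hat = np.asarray(y_true), np.asarray(y_hat)
    # l1 distances between every recovered and every true location
    dist = np.abs(y_hat[:, None, :] - y_true[None, :, :]).sum(axis=2)
    hits = dist < delta
    return bool(np.all(hits.sum(axis=1) == 1) and np.all(hits.sum(axis=0) == 1))
```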

Definition 2.4

The computational resolution limit in the two-dimensional location recovery problem is defined as the smallest non-negative number \({\mathscr {D}}_{2,\textrm{supp}}\) so that for any n-sparse measure \(\mu =\sum _{j=1}^{n}a_j \delta _{{\textbf{y}}_j}, {\textbf{y}}_j \in B_{\frac{(2n-1)\pi }{12\Omega }, \infty }(\textbf{0})\) and the associated measurement \(\textbf{Y}\) in (2.2), if

$$\begin{aligned} \min _{p\ne j, 1\le p,j \le n} \left| \left| {{\textbf{y}}_p-\textbf{y}_j}\right| \right| _1\ge {\mathscr {D}}_{2,\textrm{supp}}, \end{aligned}$$

then there exists \(\delta >0\) such that any \(\sigma \)-admissible measure of \({\textbf{Y}}\) with n supports in \(B_{\frac{(2n-1)\pi }{12\Omega }, \infty }({\textbf{0}})\) is within \(\delta \)-neighborhood of \(\mu \).

We have the following estimate for the upper bound of \({\mathscr {D}}_{2,\textrm{supp}}\).

Theorem 2.2

Let \(n\ge 2\). Let the measurement \(\textbf{Y}\) in (2.2) be generated by an n-sparse measure \(\mu =\sum _{j=1}^{n}a_j \delta _{{\textbf{y}}_j}, {\textbf{y}}_j \in B_{\frac{(2n-1)\pi }{12\Omega }, \infty }(\textbf{0})\) in the two-dimensional space. Assume that

$$\begin{aligned} D_{\min }:=\min _{p\ne j}\Big |\Big |{\textbf{y}}_p-\textbf{y}_j\Big |\Big |_1\ge \frac{15.3\pi (n-\frac{1}{2})}{\Omega }\Big (\frac{\sigma }{m_{\min }}\Big )^{\frac{1}{2n-1}}. \end{aligned}$$
(2.6)

If \({\hat{\mu }}=\sum _{j=1}^{n}{\hat{a}}_{j}\delta _{{\hat{\textbf{y}}}_j}\) supported on \(B_{\frac{(2n-1)\pi }{12\Omega }, \infty }(\textbf{0})\) is a \(\sigma \)-admissible measure of \(\textbf{Y}\), then \({\hat{\mu }}\) is in a \(\frac{D_{\min }}{2}\)-neighborhood of \(\mu \). Moreover, after reordering the \({\hat{\textbf{y}}}_j\)’s, we have

$$\begin{aligned} \Big |\Big |{\hat{\textbf{y}}}_j-{\textbf{y}}_j\Big |\Big |_1\le \frac{C(n)}{\Omega }\textrm{SRF}^{2n-2}\frac{\sigma }{m_{\min }}, \quad 1\le j\le n, \end{aligned}$$
(2.7)

where \(\textrm{SRF}:=\frac{\pi }{D_{\min }\Omega } \) is the super-resolution factor and

$$\begin{aligned} C(n)=\frac{(1+\sqrt{3})^{2n-1}2^{5n-1}(2n-1)^{2n-1}\pi }{3^{2n-0.5}}. \end{aligned}$$

Theorem 2.2 demonstrates that when \(\min _{p\ne j, 1\le p, j\le n}\Big |\Big |{\textbf{y}}_p- \textbf{y}_j\Big |\Big |_1\ge \frac{15.3\pi (n-\frac{1}{2})}{\Omega }\Big (\frac{\sigma }{m_{\min }}\Big )^{\frac{1}{2n-1}}\), it is possible to stably recover the source locations. For sufficiently large SNR, the limit in Theorem 2.2 is below the Rayleigh limit. This indicates that super-resolution is possible for two-dimensional imaging problems. Moreover, for all \(n\ge 2\), the estimate here is better than the one obtained in [44], which is

$$\begin{aligned} \frac{11.76e\pi n(n-1)}{\Omega }\Big (\frac{\sigma }{m_{\min }}\Big )^{\frac{1}{2n-1}}. \end{aligned}$$

In particular, the constant factor in our new estimate is of order n, while in the previous one it is of order \(n^2\).
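For reference, the bound (2.7) is easy to evaluate numerically; the following helper (with illustrative inputs) computes C(n) and the resulting location-error bound.

```python
import numpy as np

# Evaluating the location-error bound (2.7) of Theorem 2.2, where
# SRF := pi / (D_min * Omega); all input values below are illustrative.
def C(n):
    return ((1 + np.sqrt(3)) ** (2 * n - 1) * 2 ** (5 * n - 1)
            * (2 * n - 1) ** (2 * n - 1) * np.pi / 3 ** (2 * n - 0.5))

def location_error_bound(n, Omega, D_min, sigma_over_mmin):
    srf = np.pi / (D_min * Omega)
    return C(n) / Omega * srf ** (2 * n - 2) * sigma_over_mmin

print(location_error_bound(n=2, Omega=20.0, D_min=1.0, sigma_over_mmin=1e-4))
```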

It has been already shown in [44] that the computational resolution limit for the location recovery in the k-dimensional super-resolution problem is bounded below by \(\frac{C_{3}}{\Omega }\Big (\frac{\sigma }{m_{\min }}\Big )^{\frac{1}{2n-1}}\) for some constant \(C_3\). Thus the \({\mathscr {D}}_{2,\textrm{supp}}\) is bounded by

$$\begin{aligned} \frac{C_{3}}{\Omega }\Big (\frac{\sigma }{m_{\min }}\Big )^{\frac{1}{2n-1}} \le {\mathscr {D}}_{2,\textrm{supp}} \le \frac{C_{4}n}{\Omega }\Big (\frac{\sigma }{m_{\min }}\Big )^{\frac{1}{2n-1}}, \end{aligned}$$
(2.8)

and our estimate for the upper bound is sharp.

On the other hand, (2.8) indicates a phase transition in the location recovery problem. From (2.8) we expect that there exists a line of slope \(2n-1\) in the parameter space \(\log \textrm{SRF}-\log \textrm{SNR}\) such that the location recovery is stable at every point above the line. This is confirmed by Algorithm 4 in Sect. 5.4.2 and illustrated in Fig. 4.

Note that the bounds for the minimum separation distance that ensures correct number detection scale like \(\frac{1}{\Omega }\left( \frac{\sigma }{m_{\min }}\right) ^{\frac{1}{2n-2}}\), while those for the minimum separation distance that guarantees stable location recovery scale like \(\frac{1}{\Omega }\left( \frac{\sigma }{m_{\min }}\right) ^{\frac{1}{2n-1}}\). Understanding the difference between the two exponents is an interesting question in itself. One way to see it is through the derivation of the lower bounds of the two resolution limits [43, 45]. In the analysis of the resolution limit for number detection, we consider when the measurement from n sources can be approximated by the measurement generated by \(n-1\) sources, whereas for location recovery, we consider when the measurement from n sources can be approximated by the measurement from another n sources. Thus, for number detection, we can use at most \(2n-1\) sources to construct the worst-case scenario, i.e., a measurement generated by specific \(n-1\) sources that is close to the measurement from specific n sources, while for location recovery we can use 2n sources. Hence the worst case of location recovery has more degrees of freedom and is worse, which results in the difference between the exponents of the noise-to-signal ratio in the two resolution estimates. The interested reader is referred to [43, 45] for detailed proofs of the lower bounds.

2.4 Stability of a Sparsity-Promoting Algorithm

Sparsity-promoting algorithms are popular methods in image processing and many other fields. Using the above results on resolution limits, we can derive a stability result for \(l_0\)-minimization in the two-dimensional super-resolution problem. We consider the following \(l_0\)-minimization problem:

$$\begin{aligned} \min _{ \rho \text { supported on } {\mathscr {O}}} ||\rho ||_{0} \quad \text {subject to} \quad |{\mathscr {F}}\rho (\varvec{\omega }) -\textbf{Y}(\varvec{\omega })|< \sigma , \quad \varvec{\omega } \in [0,\Omega ]^2, \end{aligned}$$
(2.9)

where \(||\rho ||_{0}\) is the number of Dirac masses representing the discrete measure \(\rho \). As a corollary of Theorems 2.1 and 2.2, we have the following stability result.

Theorem 2.3

Let \(n\ge 2\) and \(\sigma \le m_{\min }\). Let the measurement \(\textbf{Y}\) in (2.2) be generated by an n-sparse measure \(\mu =\sum _{j=1}^{n}a_j \delta _{{\textbf{y}}_j}, {\textbf{y}}_j \in B_{\frac{(n-1)\pi }{6\Omega }, \infty }(\textbf{0})\) in the two-dimensional space. Assume that

$$\begin{aligned} D_{\min }:=\min _{p\ne j}\Big |\Big |{\textbf{y}}_p-\textbf{y}_j\Big |\Big |_1\ge \frac{16.6\pi (n-\frac{1}{2})}{\Omega }\Big (\frac{\sigma }{m_{\min }}\Big )^{\frac{1}{2n-1}}. \end{aligned}$$
(2.10)

Let \({\mathscr {O}}\) in the minimization problem (2.9) be \(B_{\frac{(n-1)\pi }{6\Omega }, \infty }(\textbf{0})\). Then every solution to (2.9) contains exactly n point sources, and any solution \(\hat{\mu }=\sum _{j=1}^{n}{\hat{a}}_{j}\delta _{{\hat{\textbf{y}}}_j}\) is in a \(\frac{D_{\min }}{2}\)-neighborhood of \(\mu \). Moreover, after reordering the \({\hat{\textbf{y}}}_j\)’s, we have

$$\begin{aligned} \Big |\Big |{\hat{\textbf{y}}}_j-{\textbf{y}}_j\Big |\Big |_1\le \frac{C(n)}{\Omega }\textrm{SRF}^{2n-2}\frac{\sigma }{m_{\min }}, \quad 1\le j\le n, \end{aligned}$$
(2.11)

where \(\textrm{SRF}:=\frac{\pi }{D_{\min }\Omega } \) and

$$\begin{aligned} C(n)=\frac{(1+\sqrt{3})^{2n-1}2^{5n-1}(2n-1)^{2n-1}\pi }{3^{2n-0.5}}. \end{aligned}$$

Theorem 2.3 reveals that promoting sparsity over admissible solutions can resolve the source locations down to the resolution-limit level. This provides theoretical insight that sparsity-promoting algorithms can perform excellently on two-dimensional super-resolution problems. In particular, under the separation condition (2.10), any tractable sparsity-promoting algorithm (such as a total variation minimization algorithm [11]) that renders the sparsest solution can stably reconstruct all the source locations.
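Although (2.9) is posed over continuous measures and is not directly computable, the idea behind it can be sketched numerically: increase the candidate sparsity until the best-fitting measure matches the data within \(\sigma \) on a sampled frequency grid. The sketch below is purely illustrative and is not the paper's algorithm; the least-squares fit with random restarts is an \(l_2\) surrogate for the \(l_\infty \) constraint.

```python
import numpy as np
from scipy.optimize import least_squares

# Conceptual sketch of (2.9): search for the smallest k whose best k-source
# fit satisfies the sigma constraint on a sampled grid (W1, W2) of [0,Omega]^2.
def forward(params, k, W):            # params: [Re a, Im a, y_1, y_2] per source
    a = params[:k] + 1j * params[k:2 * k]
    y = params[2 * k:].reshape(k, 2)
    return (a[:, None] * np.exp(1j * (y[:, 0:1] * W[0].ravel()
                                      + y[:, 1:2] * W[1].ravel()))).sum(axis=0)

def sparsest_fit(Y, W, sigma, box, k_max, trials=20,
                 rng=np.random.default_rng(1)):
    data = Y.ravel()
    for k in range(1, k_max + 1):
        for _ in range(trials):       # random restarts against local minima
            p0 = np.concatenate([rng.standard_normal(2 * k),
                                 rng.uniform(-box, box, 2 * k)])
            res = least_squares(
                lambda p: np.concatenate([(forward(p, k, W) - data).real,
                                          (forward(p, k, W) - data).imag]), p0)
            if np.max(np.abs(forward(res.x, k, W) - data)) < sigma:
                return k, res.x       # sparsest admissible measure found
    return None
```

Under the separation condition (2.10), Theorem 2.3 guarantees that the sparsest admissible measure has exactly n supports and stably approximates the true locations.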

3 Proofs of the Main Results

The idea for proving the main results of the paper is to use new techniques to reduce the two-dimensional super-resolution problem to an approximation problem for certain complex vectors, for which we develop a nonlinear approximation theory in Vandermonde space in Sect. 6. The reduction techniques are mainly based on the three crucial observations presented in the following subsection.

3.1 Three Crucial Observations

We here introduce three crucial observations that reduce the two-dimensional super-resolution problem to its one-dimensional analog, by which we are able to derive the resolution limit theory of this paper. These ideas also inspire a new direction for the DOA algorithms; see Sects. 4 and 5.

Translation invariance By translation invariance we mean that if a measure \({\hat{\mu }} = \sum _{j=1}^q {\hat{a}}_j \delta _{{{\hat{\textbf{y}}}}_j}\) is a \(\sigma \)-admissible measure for the measurement \({\textbf{Y}}\), then the translated measure \(\sum _{j=1}^q {\hat{a}}_j \delta _{{{\hat{\textbf{y}}}}_j+\textbf{v}}\) is a \(\sigma \)-admissible measure for the measurement \(e^{i{\textbf{v}}^\top \varvec{\omega }} {\textbf{Y}}(\varvec{\omega })\), for any vector \({\textbf{v}}\in {\mathbb {R}}^2\). More precisely, we have

$$\begin{aligned} \left| {\sum _{j=1}^q {\hat{a}}_j e^{i ({{\hat{\textbf{y}}}}_j+\textbf{v} )^\top \varvec{\omega }} - e^{i{\textbf{v}}^\top \varvec{\omega }} {\textbf{Y}}(\varvec{\omega })}\right| = \left| {\sum _{j=1}^q {\hat{a}}_j e^{i {{\hat{\textbf{y}}}}_j^\top \varvec{\omega }} - {\textbf{Y}}(\varvec{\omega })}\right| < \sigma , \quad \varvec{\omega } \in [0, \Omega ]^2. \end{aligned}$$
(3.1)

In addition, if for certain \(\delta \ge 0\),

$$\begin{aligned} \left| {\sum _{j=1}^q {\hat{a}}_j e^{i {{\hat{\textbf{y}}}}_j^\top {\varvec{\omega }}} - \sum _{j=1}^n a_j e^{i {\textbf{y}}_j^\top \varvec{\omega }}}\right| < \delta , \quad \varvec{\omega } \in [0, \Omega ]^2, \end{aligned}$$
(3.2)

then for any vector \({\textbf{v}}\in {\mathbb {R}}^2\),

$$\begin{aligned} \left| {\sum _{j=1}^q {\hat{a}}_j e^{i ({{\hat{\textbf{y}}}}_j+ {\textbf{v}} )^\top \varvec{\omega }} - \sum _{j=1}^n a_j e^{i ({\textbf{y}}_j+{\textbf{v}})^\top \varvec{\omega }}}\right| < \delta , \quad \varvec{\omega } \in [0, \Omega ]^2. \end{aligned}$$
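The identity (3.1) is elementary and can be verified numerically; a minimal sketch with illustrative parameters follows.

```python
import numpy as np

# Numerical check of translation invariance (3.1): modulating the
# measurement by exp(i v.omega) leaves the fit residual modulus unchanged.
rng = np.random.default_rng(2)
W1, W2 = np.meshgrid(*(2 * [np.linspace(0, 10.0, 32)]), indexing="ij")
y, a, v = np.array([0.05, -0.02]), 1.0 + 0.3j, np.array([0.4, 0.7])

F = lambda loc: np.exp(1j * (loc[0] * W1 + loc[1] * W2))
Y = a * F(y) + 1e-3 * rng.standard_normal(W1.shape)   # illustrative noisy data

res_before = np.abs(a * F(y) - Y)
res_after = np.abs(a * F(y + v) - np.exp(1j * (v[0] * W1 + v[1] * W2)) * Y)
assert np.allclose(res_before, res_after)
```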

Combination of coordinates

The second observation is that if (3.2) holds, then a similar estimate holds for sums of combinations of \(e^{i \tau {{\hat{\textbf{y}}}}_{j,1}}, e^{i \tau {{\hat{\textbf{y}}}}_{j,2}}\) and \(e^{i \tau {\textbf{y}}_{j,1}}, e^{i \tau {\textbf{y}}_{j,2}}\) for suitable \(\tau \). Specifically, we have the following lemma. Throughout the paper, we denote the binomial coefficients by \(\binom{t}{t_1}\) for integers \(t\ge t_1\ge 0\).

Lemma 3.1

For any integer \(t\ge 0\) and any \(\tau >0\) with \(t\tau \le \Omega \), the measurement constraint (3.2) implies

$$\begin{aligned} \left| {\sum _{j=1}^q {\hat{a}}_j (e^{ir_1}e^{i \tau {{\hat{\textbf{y}}}}_{j,1}}\!+\!e^{ir_2}e^{i \tau {{\hat{\textbf{y}}}}_{j,2}})^t \!-\! \sum _{j=1}^n a_j (e^{ir_1}e^{i \tau {\textbf{y}}_{j,1}}\!+\!e^{ir_2}e^{i \tau {\textbf{y}}_{j,2}})^t}\right| \!< \!2^{t}\delta , \quad r_1, r_2\in {\mathbb {R}}. \end{aligned}$$

Proof

Let \({\hat{d}}_j = e^{ir_1}e^{i\tau {{\hat{\textbf{y}}}}_{j,1}}+e^{ir_2}e^{i\tau {\hat{\textbf{y}}}_{j,2}}\) and \(d_j = e^{ir_1}e^{i\tau {\textbf{y}}_{j,1}}+e^{ir_2}e^{i\tau {\textbf{y}}_{j,2}}\). We have

$$\begin{aligned}&\left| {\sum _{j=1}^q \hat{a}_j \hat{d}_j^{t} - \sum _{j=1}^n a_j d_j^{t}}\right| = \left| \sum _{j=1}^q \hat{a}_j (e^{ir_1}e^{i \tau {\hat{\textbf{y}}}_{j,1}}+e^{ir_2}e^{i\tau {\hat{\textbf{y}}}_{j,2}})^{t}\right. \\&\qquad \left. - \sum _{j=1}^n a_j (e^{ir_1}e^{i \tau {\textbf{y}}_{j,1}}+e^{ir_2}e^{i\tau {\textbf{y}}_{j,2}})^{t}\right| \\&\quad =\left| \sum _{t_1+t_2=t, 0\le t_1,t_2\le t} \binom{t}{t_1} \Big (\sum _{j=1}^q \hat{a}_j e^{ir_1t_1}e^{ir_2t_2}e^{i \tau {\hat{\textbf{y}}}_{j,1} t_1}e^{i\tau {\hat{\textbf{y}}}_{j,2} t_2}\right. \\&\qquad \left. - \sum _{j=1}^n a_je^{ir_1t_1}e^{ir_2t_2} e^{i \tau {\textbf{y}}_{j,1} t_1} e^{i\tau {\textbf{y}}_{j,2}t_2}\Big )\right| \\&\quad \le \sum _{t_1+t_2=t, 0\le t_1,t_2\le t} \binom{t}{t_1} \left| {\sum _{j=1}^q \hat{a}_j e^{i \tau {\hat{\textbf{y}}}_{j,1} t_1}e^{i\tau {\hat{\textbf{y}}}_{j,2} t_2}- \sum _{j=1}^n a_j e^{i \tau {\textbf{y}}_{j,1} t_1} e^{i\tau {\textbf{y}}_{j,2}t_2}}\right| \\&\quad = \sum _{t_1+t_2=t, 0\le t_1,t_2\le t} \binom{t}{t_1} \left| {\sum _{j=1}^q \hat{a}_j e^{i (t_1\tau , t_2\tau ) {\hat{\textbf{y}}}_{j}}- \sum _{j=1}^n a_j e^{i (t_1\tau , t_2\tau ) {\textbf{y}}_{j}}}\right| \\&\quad < \sum _{t_1+t_2=t, 0\le t_1,t_2\le t} \binom{t}{t_1} \delta \quad \Big (\text {by } t\tau \le \Omega \text { and } (3.2)\Big )\\&\quad = 2^{t}\delta . \end{aligned}$$

\(\square \)

This is the key observation of the paper: it reduces the two-dimensional super-resolution problem to nearly a one-dimensional one. Since it concerns the difference between sums of combinations of \(e^{i \tau {{\hat{\textbf{y}}}}_{j,1}}, e^{i \tau {\hat{\textbf{y}}}_{j,2}}\) and \(e^{i \tau {\textbf{y}}_{j,1}}, e^{i \tau {\textbf{y}}_{j,2}}\), we refer to this observation as combination of coordinates and call the elements \(e^{i \tau {\textbf{y}}_{j,1}}+e^{i \tau {\textbf{y}}_{j,2}}\) coordinate-combined elements. This coordinate-combination technique will be used to derive new algorithms for the DOA problem in Sects. 4 and 5.
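A quick numerical sanity check of Lemma 3.1, with randomly drawn illustrative parameters, can be set up as follows; here \(\delta \) is taken to be the largest residual over the finitely many frequencies \((t_1\tau , t_2\tau )\) actually used in the proof.

```python
import numpy as np

# Sanity check of Lemma 3.1 with random illustrative data: the
# coordinate-combined power sums inherit the bound 2^t * delta from the
# two-dimensional measurement constraint (3.2).
rng = np.random.default_rng(3)
n, q, t, tau = 3, 2, 4, 0.5
r1, r2 = rng.uniform(0, 2 * np.pi, 2)
y, a = rng.uniform(-0.5, 0.5, (n, 2)), rng.standard_normal(n) + 0j
yh, ah = rng.uniform(-0.5, 0.5, (q, 2)), rng.standard_normal(q) + 0j

d = np.exp(1j * (r1 + tau * y[:, 0])) + np.exp(1j * (r2 + tau * y[:, 1]))
dh = np.exp(1j * (r1 + tau * yh[:, 0])) + np.exp(1j * (r2 + tau * yh[:, 1]))

# delta: largest residual over the frequencies (t1*tau, t2*tau), t1, t2 <= t
delta = max(abs(ah @ np.exp(1j * tau * (yh @ [t1, t2]))
                - a @ np.exp(1j * tau * (y @ [t1, t2])))
            for t1 in range(t + 1) for t2 in range(t + 1))
lhs = abs(ah @ dh ** t - a @ d ** t)
assert lhs <= 2 ** t * delta + 1e-12
```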

Compared to the projection techniques in [12, 44] which utilize the measurement constraint only in several one-dimensional spaces to derive stability results, our formulation utilizes more measurement constraints and consequently yields better estimates.

Preservation of the separation distance for the coordinate-combined elements The last observation is that, for \({\varvec{\theta }}_j\)’s in \([0, \frac{2\pi }{3}]^2\), the coordinate-combined elements \(e^{i {\varvec{\theta }}_{j,1}}+ e^{i {\varvec{\theta }}_{j,2}}\) still preserve the separation distance between the \({\varvec{\theta }}_j\)’s. This is revealed by Lemma 3.2. Note that the projection trick in [12, 44] and many conventional two-dimensional DOA algorithms do not preserve the separation distance between the original sources, which causes many issues in the reconstruction and in the resolution estimation. Lemma 3.2 is the key technical result of this paper, by which we can overcome the above issues and hence find a new way to solve two-dimensional DOA problems.

Lemma 3.2

For two different vectors \({\varvec{\theta }}_j \in [0, \frac{2\pi }{3}]^2, j=1,2\) with \(\frac{\pi }{3}\le {\varvec{\theta }}_{j,2} - {\varvec{\theta }}_{j,1}\le \frac{2}{3}\pi , j=1,2\), if \(||{\varvec{\theta }}_1 - {\varvec{\theta }}_2||_1\ge \Delta \), then

$$\begin{aligned} \left| {e^{i {\varvec{\theta }}_{1,1}}+ e^{i {\varvec{\theta }}_{1,2}}- (e^{i {\varvec{\theta }}_{2,1}} + e^{i {\varvec{\theta }}_{2,2}})}\right| \ge \frac{3}{2\pi }\Delta . \end{aligned}$$

Proof

Note that \(0\le {\varvec{\theta }}_{j,1}<{\varvec{\theta }}_{j,2}\le \frac{2\pi }{3}, j=1,2\). We prove the lemma by considering the following two cases.

Case 1 \(0\le {\varvec{\theta }}_{1,1} \le {\varvec{\theta }}_{2,1} < {\varvec{\theta }}_{2,2}\le {\varvec{\theta }}_{1,2}\le \frac{2\pi }{3}\).

In this case,

$$\begin{aligned} \left| {e^{i {\varvec{\theta }}_{1,1}}+ e^{i {\varvec{\theta }}_{1,2}}- (e^{i {\varvec{\theta }}_{2,1}} + e^{i {\varvec{\theta }}_{2,2}})}\right| \ge&\left| {e^{i {\varvec{\theta }}_{2,1}}+ e^{i{\varvec{\theta }}_{2,2}}}\right| - \left| {e^{i {\varvec{\theta }}_{1,1}} + e^{i{\varvec{\theta }}_{1,2}}}\right| \\ =&2\Bigg (\cos \left( \frac{\phi _2}{2}\right) - \cos \left( \frac{\phi _1}{2}\right) \Bigg ), \end{aligned}$$

where \(\phi _j = {\varvec{\theta }}_{j,2} - {\varvec{\theta }}_{j,1}, j=1,2\). Since \(\left| \left| {{\varvec{\theta }}_1 - {\varvec{\theta }}_2}\right| \right| _1 = \phi _1-\phi _2\) in this case, the assumptions of the lemma give \(\Delta \le \phi _1 -\phi _2 \le \frac{\pi }{3}\). Note also that \(\frac{\pi }{6}\le \frac{\phi _1+\phi _2}{4}\le \frac{\pi }{3}\). Thus

$$\begin{aligned} 2\Bigg (\cos \left( \frac{\phi _2}{2}\right) - \cos \left( \frac{\phi _1}{2}\right) \Bigg )= & {} 4\sin \left( \frac{\phi _1+\phi _2}{4}\right) \sin \left( \frac{\phi _1-\phi _2}{4}\right) \\\ge & {} 4 \sin \left( \frac{\pi }{6}\right) \sin \left( \frac{\Delta }{4}\right) \ge \frac{3\Delta }{2\pi }, \end{aligned}$$

where the last inequality uses \(\sin (\frac{\Delta }{4})\ge \frac{3}{\pi } \frac{\Delta }{4}\) for \(0<\frac{\Delta }{4}\le \frac{\pi }{12}\).

Case 2 \(0\le {\varvec{\theta }}_{1,1} \le {\varvec{\theta }}_{2,1} \le {\varvec{\theta }}_{1,2}\le {\varvec{\theta }}_{2,2}\le \frac{2\pi }{3}\).

The idea is to estimate the angle between \(e^{i {\varvec{\theta }}_{1,1}}+ e^{i {\varvec{\theta }}_{1,2}}\) and \(e^{i {\varvec{\theta }}_{2,1}} + e^{i {\varvec{\theta }}_{2,2}}\). By a simple analysis of the angle relations between \(e^{i {\varvec{\theta }}_{1,1}}, e^{i {\varvec{\theta }}_{1,2}}, e^{i {\varvec{\theta }}_{2,1}}, e^{i {\varvec{\theta }}_{2,2}}\), we obtain that the angle between \(e^{i {\varvec{\theta }}_{1,1}}+ e^{i {\varvec{\theta }}_{1,2}}\) and \(e^{i {\varvec{\theta }}_{2,1}} + e^{i {\varvec{\theta }}_{2,2}}\) is \(\frac{{\varvec{\theta }}_{2,1}-{\varvec{\theta }}_{1,1}+ {\varvec{\theta }}_{2,2}-{\varvec{\theta }}_{1,2}}{2}\), which is at least \(\frac{\Delta }{2}\). Thus

$$\begin{aligned} \left| {e^{i {\varvec{\theta }}_{1,1}}+ e^{i {\varvec{\theta }}_{1,2}}- (e^{i {\varvec{\theta }}_{2,1}} + e^{i {\varvec{\theta }}_{2,2}})}\right| \!\ge \! \max \Big (\left| {e^{i {\varvec{\theta }}_{1,1}}\!+\! e^{i {\varvec{\theta }}_{1,2}}}\right| , \left| {e^{i {\varvec{\theta }}_{2,1}} \!+\! e^{i {\varvec{\theta }}_{2,2}}}\right| \Big )\sin \left( \frac{\Delta }{2}\right) . \end{aligned}$$

Since \(\frac{\pi }{3}\le {\varvec{\theta }}_{j,2} - {\varvec{\theta }}_{j,1}\le \frac{2}{3}\pi , j=1,2\), we have

$$\begin{aligned} \max \Big (\left| {e^{i {\varvec{\theta }}_{1,1}}+ e^{i {\varvec{\theta }}_{1,2}}}\right| , \left| {e^{i {\varvec{\theta }}_{2,1}} + e^{i {\varvec{\theta }}_{2,2}}}\right| \Big ) \ge 1. \end{aligned}$$

Therefore,

$$\begin{aligned} \left| {e^{i {\varvec{\theta }}_{1,1}}+ e^{i {\varvec{\theta }}_{1,2}}- (e^{i {\varvec{\theta }}_{2,1}} + e^{i {\varvec{\theta }}_{2,2}})}\right| \ge \sin \left( \frac{\Delta }{2}\right) \ge \frac{3\Delta }{2\pi }, \end{aligned}$$

where the last inequality uses \(\sin (\frac{\Delta }{2})\ge \frac{3}{\pi }\frac{\Delta }{2}\) for \(0<\frac{\Delta }{2}\le \frac{\pi }{6}\). \(\square \)
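Lemma 3.2 can also be probed numerically; the following Monte Carlo sketch (with illustrative sampling) draws angle pairs satisfying the hypotheses and verifies the claimed separation bound.

```python
import numpy as np

# Monte Carlo check of Lemma 3.2: for angles in [0, 2pi/3]^2 with
# pi/3 <= theta_{j,2} - theta_{j,1} <= 2pi/3, the coordinate-combined
# elements separate at least as (3 / (2 pi)) * ||theta_1 - theta_2||_1.
rng = np.random.default_rng(4)
for _ in range(10_000):
    t1 = rng.uniform(0, np.pi / 3, 2)                  # theta_{j,1}, j = 1, 2
    t2 = t1 + rng.uniform(np.pi / 3, 2 * np.pi / 3, 2) # theta_{j,2}
    if np.any(t2 > 2 * np.pi / 3):
        continue                                       # enforce the hypotheses
    combo = np.exp(1j * t1) + np.exp(1j * t2)          # one element per source
    gap = abs(combo[0] - combo[1])
    Delta = abs(t1[0] - t1[1]) + abs(t2[0] - t2[1])    # l1 separation
    assert gap >= 3 / (2 * np.pi) * Delta - 1e-12
```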

3.2 Proof of Theorem 2.1

Proof

The proof of this theorem is by contradiction. Suppose that there exists a measure \({\hat{\mu }} = \sum _{j=1}^q {\hat{a}}_j \delta _{{{\hat{\textbf{y}}}}_j}\) with \(q<n\) which is a \(\sigma \)-admissible measure of \(\textbf{Y}\). Then, by the measurement constraint (2.2) and \(|\textbf{W}(\varvec{\omega })|<\sigma \), we have

$$\begin{aligned} \left| {\sum _{j=1}^q {\hat{a}}_j e^{i {\hat{\textbf{y}}}_j^\top \varvec{\omega }} - \sum _{j=1}^n a_j e^{i {\textbf{y}}_j^\top \varvec{\omega }}}\right| < 2\sigma , \quad \varvec{\omega } \in [0, \Omega ]^2. \end{aligned}$$
(3.3)

Since \({\textbf{y}}_j\in [-\lambda , \lambda ]^2\) with \(\lambda = \frac{(n-1)\pi }{6\Omega }\), by letting \({\textbf{v}}=(0,6\lambda )^\top \), we obtain

$$\begin{aligned} {\textbf{y}}_j+{\textbf{v}}\in [-\lambda ,\lambda ]\times [5\lambda , 7\lambda ]. \end{aligned}$$
(3.4)

On the other hand, by (3.3) we also get

$$\begin{aligned} \left| {\sum _{j=1}^q {\hat{a}}_j e^{i ({\hat{\textbf{y}}}_j+{\textbf{v}} )^\top {\varvec{\omega }}} - \sum _{j=1}^n a_j e^{i ({\textbf{y}}_j+{\textbf{v}})^\top {\varvec{\omega }}}}\right| < 2\sigma , \quad \varvec{\omega } \in [0, \Omega ]^2. \end{aligned}$$

Thus with a slight abuse of notation, we still denote those \({\hat{\textbf{y}}}_j+{\textbf{v}}\) and \({\textbf{y}}_j+{\textbf{v}}\) by \( {\hat{\textbf{y}}}_j\), \({\textbf{y}}_j\) respectively and consider them in the rest of the proof. Note that we have

$$\begin{aligned} {\textbf{y}}_j \in [-\lambda ,\lambda ]\times [5\lambda , 7\lambda ], \quad j =1, \ldots , n. \end{aligned}$$

Let \(\tau = \frac{\Omega }{2(n-1)}\); together with \(\lambda = \frac{(n-1)\pi }{6\Omega }\), this gives \(\tau {\textbf{y}}_j \in [-\frac{\pi }{12},\frac{\pi }{12}]\times [\frac{5\pi }{12}, \frac{7\pi }{12}]\). This yields

$$\begin{aligned} -\frac{\pi }{12}\le \tau {\textbf{y}}_{j,1}\le \frac{\pi }{12},\quad \frac{5\pi }{12}\le \tau {\textbf{y}}_{j,2}\le \frac{7\pi }{12}, \quad \frac{\pi }{3}\le \tau {\textbf{y}}_{j,2}- \tau {\textbf{y}}_{j,1} \le \frac{2\pi }{3}. \end{aligned}$$
(3.5)

On the other hand, let \({\hat{d}}_j = e^{i\tau {\hat{\textbf{y}}}_{j,1}}+e^{i\tau {\hat{\textbf{y}}}_{j,2}}\) and \(d_j = e^{i\tau {\textbf{y}}_{j,1}}+e^{i\tau {\textbf{y}}_{j,2}}\). By Lemma 3.1 and (3.3) we have that

$$\begin{aligned} \left| {\sum _{j=1}^q {\hat{a}}_j {\hat{d}}_j^{t} - \sum _{j=1}^n a_j d_j^{t}}\right| < 2^{t+1} \sigma , \quad t =0, 1, \ldots , 2n-2. \end{aligned}$$
(3.6)

Let

$$\begin{aligned} {\textbf{b}}= & {} \Bigg (\sum _{j=1}^q {\hat{a}}_j {\hat{d}}_j^{0} - \sum _{j=1}^n a_j d_j^{0}, \quad \sum _{j=1}^q {\hat{a}}_j {\hat{d}}_j^{1} - \sum _{j=1}^n a_j d_j^{1}, \quad \ldots , \\{} & {} \quad \sum _{j=1}^q {\hat{a}}_j {\hat{d}}_j^{2n-2} - \sum _{j=1}^n a_j d_j^{2n-2}\Bigg )^\top . \end{aligned}$$

Since (3.5) holds, Lemma 3.2 yields

$$\begin{aligned} d_{\min }&:=\min _{p\ne q}\left| {d_p - d_q}\right| \ge \frac{3}{2\pi } \min _{p\ne q}\tau \Big |\Big |{\textbf{y}}_p - {\textbf{y}}_q\Big |\Big |_1> 12.4 \Big (\frac{\sigma }{m_{\min }}\Big )^{\frac{1}{2n-2}}\\&> 2\sqrt{6(1+\sqrt{3})}\Big (\frac{4}{\sqrt{3}} \frac{\sigma }{m_{\min }}\Big )^{\frac{1}{2n-2}}, \end{aligned}$$

where the second-to-last inequality is due to the separation condition (2.3). On the other hand, since (3.5) holds, we have \(|{\hat{d}}_p|\le 2, p=1, \ldots , q\) and \(|d_j|\le \sqrt{3}, j=1, \ldots , n\). Thus we can apply Theorem 6.2 and get

$$\begin{aligned} ||{\textbf{b}}||_2 \ge \frac{m_{\min }(d_{\min })^{2n-2}}{(2(1+2)(1+\sqrt{3}))^{(n-1)}} > \frac{4^n \sigma }{\sqrt{3}}. \end{aligned}$$

Note that the statement and proof of Theorem 6.2 are presented in Sect. 6; the reader may take the statement for granted on a first reading. On the other hand, since each component of \({\textbf{b}}\) is bounded as in (3.6), we have \(||{\textbf{b}}||_2< \sigma \sqrt{\sum _{t=0}^{2n-2}4^{t+1}}< \frac{4^n \sigma }{\sqrt{3}}\), which is a contradiction. This proves the theorem. \(\square \)

3.3 Proof of Theorem 2.2

Proof

Note that the \({\textbf{y}}_j, {\hat{\textbf{y}}}_j\)’s are in \([-\lambda , \lambda ]^2\) with \(\lambda = \frac{(2n-1)\pi }{12\Omega }\) and \(\hat{\mu }= \sum _{j=1}^n {\hat{a}}_j \delta _{{\hat{\textbf{y}}}_j}\) is a \(\sigma \)-admissible measure of \({\textbf{Y}}\). Let \(\tau = \frac{\Omega }{2n-1}\). Similarly to the proof in the previous subsection, we can construct \({\textbf{x}}_j = {\textbf{y}}_j+{\textbf{v}}, {\hat{\textbf{x}}}_j = {\hat{\textbf{y}}}_j+{\textbf{v}}\) so that \(\tau {\hat{\textbf{x}}}_j, \tau {\textbf{x}}_j \in [-\frac{\pi }{12},\frac{\pi }{12}]\times [\frac{5\pi }{12}, \frac{7\pi }{12}]\) and

$$\begin{aligned} \left| {\sum _{j=1}^n {\hat{a}}_j e^{i {\hat{\textbf{x}}}_j^\top \varvec{\omega }} - \sum _{j=1}^n a_j e^{i {\textbf{x}}_j^\top {\varvec{\omega }}}}\right| < 2\sigma , \quad \varvec{\omega } \in [0, \Omega ]^2. \end{aligned}$$
(3.7)

Thus we have

$$\begin{aligned} -\frac{\pi }{12}\le \tau {\textbf{x}}_{j,1}\le \frac{\pi }{12},\quad \frac{5\pi }{12}\le \tau {\textbf{x}}_{j,2}\le \frac{7\pi }{12}, \quad \frac{\pi }{3}\le \tau {\textbf{x}}_{j,2}- \tau {\textbf{x}}_{j,1} \le \frac{2\pi }{3}, \end{aligned}$$
(3.8)
$$\begin{aligned} -\frac{\pi }{12}\le \tau {\hat{\textbf{x}}}_{j,1}\le \frac{\pi }{12},\quad \frac{5\pi }{12}\le \tau {\hat{\textbf{x}}}_{j,2}\le \frac{7\pi }{12}, \quad \frac{\pi }{3}\le \tau {\hat{\textbf{x}}}_{j,2}- \tau {\hat{\textbf{x}}}_{j,1} \le \frac{2\pi }{3}. \end{aligned}$$
(3.9)

Moreover, it follows that

$$\begin{aligned} -\frac{\pi }{12}\le \tau {\textbf{x}}_{j,1}\le \frac{\pi }{12},\quad -\frac{7\pi }{12}\le \tau {\textbf{x}}_{j,2}-\pi \le -\frac{5\pi }{12}, \quad \frac{\pi }{3}\le \tau {\textbf{x}}_{j,1}- (\tau {\textbf{x}}_{j,2}-\pi ) \le \frac{2\pi }{3}, \end{aligned}$$
(3.10)
$$\begin{aligned} -\frac{\pi }{12}\le \tau {\hat{\textbf{x}}}_{j,1}\le \frac{\pi }{12},\quad -\frac{7\pi }{12}\le \tau {\hat{\textbf{x}}}_{j,2}-\pi \le -\frac{5\pi }{12}, \quad \frac{\pi }{3}\le \tau {\hat{\textbf{x}}}_{j,1}- (\tau {\hat{\textbf{x}}}_{j,2}-\pi ) \le \frac{2\pi }{3}. \end{aligned}$$
(3.11)

Let \({\hat{d}}_j = e^{i\tau {\hat{\textbf{x}}}_{j,1}}+e^{i\tau {\hat{\textbf{x}}}_{j,2}}, d_j = e^{i\tau {\textbf{x}}_{j,1}}+e^{i\tau {\textbf{x}}_{j,2}}\) and \({\hat{g}}_j = e^{i\tau {\hat{\textbf{x}}}_{j,1}}+e^{i(\tau {\hat{\textbf{x}}}_{j,2}-\pi )}, g_j = e^{i\tau {\textbf{x}}_{j,1}}+e^{i(\tau {\textbf{x}}_{j,2}-\pi )}\). By (3.7) and Lemma 3.1, we arrive at

$$\begin{aligned}&\left| {\sum _{j=1}^n {\hat{a}}_j {\hat{d}}_j^{t} - \sum _{j=1}^n a_j d_j^{t}}\right| < 2^{t+1} \sigma , \quad t=0,1, \ldots , 2n-1, \end{aligned}$$
(3.12)
$$\begin{aligned}&\left| {\sum _{j=1}^n {\hat{a}}_j {\hat{g}}_j^{t} - \sum _{j=1}^n a_j g_j^{t}}\right| < 2^{t+1} \sigma , \quad t =0, 1, \ldots , 2n-1. \end{aligned}$$
(3.13)

Let

$$\begin{aligned} {\textbf{d}}= & {} \Bigg (\sum _{j=1}^n {\hat{a}}_j {\hat{d}}_j^{0} - \sum _{j=1}^n a_j d_j^{0}, \quad \sum _{j=1}^n {\hat{a}}_j {\hat{d}}_j^{1} - \sum _{j=1}^n a_j d_j^{1}, \quad \ldots ,\\{} & {} \quad \sum _{j=1}^n {\hat{a}}_j {\hat{d}}_j^{2n-1} - \sum _{j=1}^n a_j d_j^{2n-1}\Bigg )^\top , \end{aligned}$$

and

$$\begin{aligned} {\textbf{g}}= & {} \Bigg (\sum _{j=1}^n {\hat{a}}_j {\hat{g}}_j^{0} - \sum _{j=1}^n a_j g_j^{0}, \quad \sum _{j=1}^n {\hat{a}}_j {\hat{g}}_j^{1} - \sum _{j=1}^n a_j g_j^{1}, \quad \ldots ,\\{} & {} \quad \sum _{j=1}^n {\hat{a}}_j {\hat{g}}_j^{2n-1} - \sum _{j=1}^n a_j g_j^{2n-1}\Bigg )^\top . \end{aligned}$$

Equations (3.12) and (3.13) imply respectively

$$\begin{aligned} ||{\textbf{d}}||_2< \frac{2^{2n+1} \sigma }{\sqrt{3}}, \quad ||{\textbf{g}}||_2 < \frac{2^{2n+1} \sigma }{\sqrt{3}}. \end{aligned}$$

Note also that by (3.8), (3.9), (3.10), and (3.11), we get

$$\begin{aligned} |{\hat{d}}_j|, |d_j|, |{\hat{g}}_j|, |g_j|\le \sqrt{3},\quad j=1, \ldots , n. \end{aligned}$$

Define \(d_{\min }:=\min _{p\ne q}\left| {d_p - d_q}\right| \) and \(g_{\min }:=\min _{p\ne q}\left| {g_p - g_q}\right| \). Applying Theorem 6.2, we thus have that

$$\begin{aligned} \Big |\Big |\eta _{n,n}(d_1,\ldots , d_n, {\hat{d}}_1, \ldots , {\hat{d}}_n)\Big |\Big |_{\infty }<\frac{(1+\sqrt{3})^{2n-1}}{ d_{\min }^{n-1}} \frac{2^{2n+1} }{\sqrt{3}}\frac{\sigma }{m_{\min }}, \end{aligned}$$
(3.14)

and

$$\begin{aligned} \Big |\Big |\eta _{n,n}(g_1,\ldots , g_n, {\hat{g}}_1, \ldots , {\hat{g}}_n)\Big |\Big |_{\infty }<\frac{(1+\sqrt{3})^{2n-1}}{ g_{\min }^{n-1}} \frac{2^{2n+1} }{\sqrt{3}}\frac{\sigma }{m_{\min }}, \end{aligned}$$
(3.15)

where \(\eta _{n,n}(\ldots )\)’s are vectors defined as in (6.10). We now demonstrate that we can reorder \({\hat{d}}_j, {\hat{g}}_j\) to have \(|{\hat{d}}_j -d_j|< \frac{d_{\min }}{2}\) and \(|{\hat{g}}_j -g_j|< \frac{g_{\min }}{2}, j=1, \ldots , n\). First, since (3.8) and (3.10) hold, by Lemma 3.2 we have

$$\begin{aligned} d_{\min }\ge \frac{3}{2\pi } \min _{p\ne q}\tau \Big |\Big |{\textbf{y}}_p - {\textbf{y}}_q\Big |\Big |_1 \ge 11.475 \Big (\frac{\sigma }{m_{\min }}\Big )^{\frac{1}{2n-1}}> 2^{3/2}(1+\sqrt{3})\Big (\frac{2^{5/2}}{\sqrt{3}} \frac{\sigma }{m_{\min }}\Big )^{\frac{1}{2n-1}}, \end{aligned}$$
(3.16)

and

$$\begin{aligned} g_{\min }\ge \frac{3}{2\pi } \min _{p\ne q}\tau \Big |\Big |{\textbf{y}}_p - {\textbf{y}}_q\Big |\Big |_1 \ge 11.475 \Big (\frac{\sigma }{m_{\min }}\Big )^{\frac{1}{2n-1}}> 2^{3/2}(1+\sqrt{3})\Big (\frac{2^{5/2}}{\sqrt{3}} \frac{\sigma }{m_{\min }}\Big )^{\frac{1}{2n-1}}, \end{aligned}$$
(3.17)

where we also used the separation condition (2.6) in the above derivations. Let

$$\begin{aligned} \epsilon _d = \frac{(1+\sqrt{3})^{2n-1}}{ d_{\min }^{n-1}} \frac{2^{2n+1}}{\sqrt{3}}\frac{\sigma }{m_{\min }},\quad \epsilon _g = \frac{(1+\sqrt{3})^{2n-1}}{ g_{\min }^{n-1}}\frac{2^{2n+1} }{\sqrt{3}}\frac{\sigma }{m_{\min }}. \end{aligned}$$

By (3.16),

$$\begin{aligned} d_{\min }^{2n-1}\ge \frac{(1+\sqrt{3})^{2n-1}2^{3n+1}}{\sqrt{3}} \frac{\sigma }{m_{\min }}, \ \text {or equivalently}, d_{\min }^{n}\ge 2^n \epsilon _d, \end{aligned}$$

and by (3.17),

$$\begin{aligned} g_{\min }^{2n-1}\ge \frac{(1+\sqrt{3})^{2n-1}2^{3n+1}}{\sqrt{3}} \frac{\sigma }{m_{\min }}, \ \text {or equivalently}, g_{\min }^{n}\ge 2^n \epsilon _g. \end{aligned}$$

Thus the conditions of Lemma 6.8 are satisfied. By Lemma 6.8, we have that after reordering \({\hat{d}}_j, {\hat{g}}_j\),

$$\begin{aligned} \Big |{\hat{d}}_j- d_j\Big |< \frac{d_{\min }}{2},\quad \Big |{\hat{g}}_j- g_j\Big |< \frac{g_{\min }}{2}, \end{aligned}$$

and

$$\begin{aligned}&\left| {{\hat{d}}_j -d_j}\right| \le \Big (\frac{2}{d_{\min }}\Big )^{n-1}\epsilon _d = \Big (\frac{1}{d_{\min }}\Big )^{2n-2}\frac{(1+\sqrt{3})^{2n-1}2^{3n}}{\sqrt{3}}\frac{\sigma }{m_{\min }},\\&\left| {{\hat{g}}_j -g_j}\right| \le \Big (\frac{2}{g_{\min }}\Big )^{n-1}\epsilon _g= \Big (\frac{1}{g_{\min }}\Big )^{2n-2} \frac{(1+\sqrt{3})^{2n-1}2^{3n}}{\sqrt{3}}\frac{\sigma }{m_{\min }}. \end{aligned}$$

Observing

$$\begin{aligned} e^{i\tau {\hat{\textbf{x}}}_{j,1}}- e^{i\tau {\textbf{x}}_{j, 1}}= & {} \frac{1}{2} \Big ({\hat{d}}_j -d_j+ {\hat{g}}_j -g_j\Big ),\nonumber \\ e^{i\tau {\hat{\textbf{x}}}_{j,2}}- e^{i \tau {\textbf{x}}_{j, 2}}= & {} \frac{1}{2} \Big ({\hat{d}}_j -d_j- ({\hat{g}}_j -g_j)\Big ), \end{aligned}$$
(3.18)

we conclude that

$$\begin{aligned} \Big |e^{i\tau {\hat{\textbf{x}}}_{j,1}}- e^{i\tau {\textbf{x}}_{j, 1}}\Big |+\Big |e^{i\tau {\hat{\textbf{x}}}_{j,2}}- e^{i\tau {\textbf{x}}_{j, 2}}\Big |\le \Big (\Big (\frac{1}{d_{\min }}\Big )^{2n-2} + \Big (\frac{1}{g_{\min }}\Big )^{2n-2}\Big ) \frac{(1+\sqrt{3})^{2n-1}2^{3n}}{\sqrt{3}}\frac{\sigma }{m_{\min }}. \end{aligned}$$
(3.19)

On the other hand, by (3.8) and (3.9),

$$\begin{aligned} \tau |{\hat{\textbf{x}}}_{j,1}- {\textbf{x}}_{j, 1}|\le \frac{\pi }{6} \quad \text{ and } \quad \tau |{\hat{\textbf{x}}}_{j,2}- {\textbf{x}}_{j, 2}|\le \frac{\pi }{6}. \end{aligned}$$

We further have

$$\begin{aligned}&\tau \Big |{\hat{\textbf{x}}}_{j,1}- {\textbf{x}}_{j, 1}\Big |+ \tau \Big |{\hat{\textbf{x}}}_{j,2}- {\textbf{x}}_{j, 2}\Big | \le \frac{\pi }{3}\Big (\Big |e^{i\tau {\hat{\textbf{x}}}_{j,1}}- e^{i\tau {\textbf{x}}_{j, 1}}\Big |+\Big |e^{i\tau {\hat{\textbf{x}}}_{j,2}}- e^{i\tau {\textbf{x}}_{j, 2}}\Big |\Big )\\&\quad \le \Big (\Big (\frac{1}{d_{\min }}\Big )^{2n-2} + \Big (\frac{1}{g_{\min }}\Big )^{2n-2}\Big ) \frac{(1+\sqrt{3})^{2n-1}2^{3n}\pi }{3\sqrt{3}}\frac{\sigma }{m_{\min }}. \end{aligned}$$

Recalling that \(\tau =\frac{\Omega }{2n-1}\), we have

$$\begin{aligned} \Big |{\hat{\textbf{x}}}_{j,1}- {\textbf{x}}_{j, 1}\Big |+ \Big |{\hat{\textbf{x}}}_{j,2}- {\textbf{x}}_{j, 2}\Big |&\le \frac{2n-1}{\Omega } \Big (\Big (\frac{1}{d_{\min }}\Big )^{2n-2} + \Big (\frac{1}{g_{\min }}\Big )^{2n-2}\Big )\\&\quad \frac{(1+\sqrt{3})^{2n-1}2^{3n}\pi }{3\sqrt{3}}\frac{\sigma }{m_{\min }}. \end{aligned}$$

Note that by (3.16) and (3.17), we obtain that

$$\begin{aligned} D_{\min }\le \frac{2\pi (2n-1)}{3\Omega }d_{\min } \quad \text{ and } \quad D_{\min }\le \frac{2\pi (2n-1)}{3\Omega }g_{\min }. \end{aligned}$$

Thus

$$\begin{aligned} \left| \left| {{\hat{\textbf{x}}}_j-{\textbf{x}}_j}\right| \right| _1 \le&\frac{(1+\sqrt{3})^{2n-1}2^{3n+1}\pi (2n-1)}{3\sqrt{3}\Omega } \Big (\frac{2(2n-1)}{3}\Big )^{2n-2} \Big (\frac{\pi }{\Omega D_{\min }}\Big )^{2n-2} \frac{\sigma }{m_{\min }}\\ =&\frac{(1+\sqrt{3})^{2n-1}2^{5n-1}(2n-1)^{2n-1}\pi }{3^{2n-0.5}\Omega } \Big (\frac{\pi }{\Omega D_{\min }}\Big )^{2n-2} \frac{\sigma }{m_{\min }}. \end{aligned}$$

Since \(||{\hat{\textbf{y}}}_{j}- {\textbf{y}}_{j}||_1 = ||{\hat{\textbf{x}}}_{j}- {\textbf{x}}_{j}||_1 \), we further get

$$\begin{aligned} \left| \left| {{\hat{\textbf{y}}}_j-{\textbf{y}}_j}\right| \right| _1 \le \frac{(1+\sqrt{3})^{2n-1}2^{5n-1}(2n-1)^{2n-1}\pi }{3^{2n-0.5}\Omega } \Big (\frac{\pi }{\Omega D_{\min }}\Big )^{2n-2} \frac{\sigma }{m_{\min }}. \end{aligned}$$

Since \(D_{\min }\ge \frac{15.3\pi (n-0.5)}{\Omega } \Big ( \frac{\sigma }{m_{\min }}\Big )^{\frac{1}{2n-1}}\), combining this with the above estimate, we can also show that

$$\begin{aligned} \left| \left| {{\hat{\textbf{y}}}_j-{\textbf{y}}_j}\right| \right| _1< \frac{D_{\min }}{2}. \end{aligned}$$

This completes the proof. \(\square \)

3.4 Discussion on the Generalization to Higher Dimensions

The techniques in this paper seem to have the potential to improve the estimates of resolution limits in higher dimensions in [44]. First, it is not difficult to see that the translation invariance and the combination of coordinates discussed in Sect. 3.1 generalize to any k-dimensional space. The k-dimensional super-resolution problems can then be transformed into the following approximation:

$$\begin{aligned} \left| \sum _{j=1}^q \hat{a}_j \hat{d}_j^t-\sum _{j=1}^n a_j d_j^t\right| <C \sigma , \quad t=0,1, \ldots , s(n), \end{aligned}$$
(3.20)

where \(\hat{a}_j \in {\mathbb {C}}, \hat{d}_j \in {\mathbb {C}}, a_j \in {\mathbb {C}}, d_j\in {\mathbb {C}}\), and \(s(n)\) is either \(2n-2\) or \(2n-1\). Note that the \(d_j, {\hat{d}}_j\)’s come from the combination of coordinates as defined in Sect. 3.2 (after Eq. (3.5)). Problem (3.20) is exactly of the form considered in Sect. 6, so the results there can be applied directly. The only requirement is that the separation distance between the \(d_j\)’s be large enough to ensure that we can detect the correct source number or stably recover the \(d_j\)’s from constraint (3.20). Thus, the main obstacle is to generalize Lemma 3.2 to the k-dimensional space; that lemma says that when the sources \({\textbf{y}}_j\)’s are well-separated, the \(d_j\)’s also preserve the separation distance to some extent. Such a generalization would extend Theorems 2.1 and 2.2 to the k-dimensional space and significantly improve the estimates of resolution limits in higher dimensions.

However, even a direct generalization of Lemma 3.2 is hard with the current method, which requires analyzing the geometry of the problem case by case. In the future, we will seek new ways to circumvent this delicate issue and treat all dimensions in a unified way.

On the other hand, the problem in k-dimensional space is actually more than just a direct generalization of Lemma 3.2. In particular, in order to analyze the stability of the location recovery in the k-dimensional space, we must not only prove that

$$\begin{aligned} \left| e^{i \varvec{\theta }_{1,1}}+e^{i \varvec{\theta }_{1,2}}+\cdots +e^{i \varvec{\theta }_{1,k}}-\left( e^{i \varvec{\theta }_{2,1}}+e^{i \varvec{\theta }_{2,2}}+\cdots + e^{i \varvec{\theta }_{2,k}}\right) \right| \ge c(k) \Delta \end{aligned}$$

holds for some constant c(k) when \(\left\| \varvec{\theta }_1-\varvec{\theta }_2\right\| _1 \ge \Delta \), but also show that

$$\begin{aligned} \begin{array}{c} \left| e^{i \varvec{\theta }_{1,1}}-e^{i \varvec{\theta }_{1,2}}+\cdots +e^{i \varvec{\theta }_{1,k}}-\left( e^{i \varvec{\theta }_{2,1}}-e^{i \varvec{\theta }_{2,2}}+\cdots + e^{i \varvec{\theta }_{2,k}}\right) \right| \ge c(k) \Delta ,\\ \left| e^{i \varvec{\theta }_{1,1}}+e^{i \varvec{\theta }_{1,2}}-\cdots +e^{i \varvec{\theta }_{1,k}}-\left( e^{i \varvec{\theta }_{2,1}}+e^{i \varvec{\theta }_{2,2}}-\cdots + e^{i \varvec{\theta }_{2,k}}\right) \right| \ge c(k) \Delta , \\ \vdots \\ \left| e^{i \varvec{\theta }_{1,1}}+e^{i \varvec{\theta }_{1,2}}+\cdots -e^{i \varvec{\theta }_{1,k}}-\left( e^{i \varvec{\theta }_{2,1}}+e^{i \varvec{\theta }_{2,2}}+\cdots - e^{i \varvec{\theta }_{2,k}}\right) \right| \ge c(k) \Delta \\ \end{array} \end{aligned}$$

under the same condition. This is because we need to construct more complex values like \(d_j, g_j\) (defined after Eq. (3.11)) and use the above results to obtain estimates such as (3.16) and (3.17) in order to get the stability of the reconstruction of each coordinate by treatments like (3.18) and (3.19). Therefore, the generalization of our results to higher dimensions is quite complicated. This definitely deserves further researches. It may inspire new algorithms in solving imaging and DOA problems in k-dimensional spaces. As discussed, the generalization and treatment in the k-dimensional space are still unclear, so we can not comment further on the extensions of our algorithms (proposed in Sects. 4 and 5) in higher dimensions.

4 An Algorithm for the Model Order Detection in Two-Dimensional DOA Estimation

In this section, based on the observations made in Sect. 3.1, we propose a new algorithm, named the coordinate-combination-based sweeping singular-value-thresholding number detection algorithm, for the model order detection in two-dimensional DOA estimation.

Fig. 1 The geometry of a uniform rectangular array

4.1 Problem Formulation

The existing two-dimensional DOA algorithms usually try to estimate the azimuth and elevation angles \((\theta _j, \phi _j)\) shown in Fig. 1. More precisely, we consider n narrowband signals/sources impinging on an \((\Omega +1)\times (\Omega +1)\) uniform rectangular array (URA) with \((\Omega +1)^2\) well-calibrated and identically polarized antenna elements. The signal received by these antenna elements in a single snapshot can be expressed as

$$\begin{aligned} {\textbf{Y}}(\varvec{\omega }) = \sum _{j=1}^n s_j p_j e^{i k d_x{\varvec{\omega }}_1{\textbf{y}}_{j,1}}e^{i k d_y{\varvec{\omega }}_2{\textbf{y}}_{j,2}} +{\textbf{W}}({\varvec{\omega }}), \quad \varvec{\omega }\in \left\{ 0,1, \ldots , \Omega \right\} ^2, \end{aligned}$$
(4.1)

where \(s_j\) is the jth incident signal, \(p_j\) is a complex constant denoting the signal/antenna polarization mismatch, k represents the wavenumber of the carrier frequency, and \(d_x\) and \(d_y\) denote the distances between adjacent antenna elements along the x-axis and y-axis, respectively. \({\textbf{y}}_{j,1}=\sin \phi _j \cos \theta _j\) is the direction component of signal \(s_j\) propagating along the x-axis and \({\textbf{y}}_{j,2}=\sin \phi _j \sin \theta _j\) is the one propagating along the y-axis. Here \(\phi _j\) and \(\theta _j\) denote respectively the elevation and azimuth angles of \(s_j\). \({\textbf{W}}(\varvec{\omega })\) is the additive noise, which is usually assumed to be white Gaussian noise. Note that the \(p_j, k, d_x, d_y\)’s are known in practical applications.

For convenience, we consider the following simplified form of (4.1):

$$\begin{aligned} {\textbf{Y}}(\varvec{\omega }) = \sum _{j=1}^{n}a_j e^{i {\textbf{y}}_j^\top \varvec{\omega }} + {\textbf{W}}(\varvec{\omega }),\qquad \ \varvec{\omega }\in \left\{ 0,1, \ldots , \Omega \right\} ^2, \end{aligned}$$
(4.2)

where \({\textbf{W}}\) is the noise with \(\left| \textbf{W}(\varvec{\omega })\right| < \sigma , \ \varvec{\omega }\in \left\{ 0,1, \ldots , \Omega \right\} ^2,\) and \(\sigma \) is the noise level. We aim to stably recover the number of signals and the \({\textbf{y}}_j\)’s, by which the elevation and azimuth angles are stably resolved. For a better exposition, we still consider a discrete measure \(\mu = \sum _{j=1}^n a_j \delta _{{\textbf{y}}_j}\) and refer to the \(a_j \delta _{{\textbf{y}}_j}\)’s as sources. The measurement (4.2) can be viewed as the noisy Fourier data of the measure \(\mu \) at some discrete points. To simplify the notation, we still denote the measurements and noise by \({\textbf{Y}}\) and \({\textbf{W}}\), respectively; they can now be viewed as functions that take nonzero values only at \(\varvec{\omega }\in \left\{ 0,1, \ldots , \Omega \right\} ^2\). This will not result in any confusion.

In this section and the next one, we propose new algorithms for detecting the model order and recovering the supports of \(\mu \) from the measurement (4.2). Our number detection method is based on thresholding the singular values of a Hankel matrix assembled from modified versions of the data (4.2). The following subsection introduces the details of the Hankel matrix formulation. We refer to [2, 3, 13, 27, 28, 36, 44, 45, 58, 62, 75, 76] for other model order detection algorithms.

4.2 Hankel Matrix Construction

The Hankel matrix is constructed by the following three steps.

Measurement modification by source translation

By translation invariance, supposing the sources are supported in \([-\lambda , \lambda ]^2\), we may displace them by a vector \({\textbf{v}}\) and set \({\textbf{x}}_j = {\textbf{y}}_j +{\textbf{v}}\). Using a simple measurement modification technique, we obtain the measurement for the new source \({\tilde{\mu }} = \sum _{j=1}^n a_j \delta _{{\textbf{x}}_j}\). Specifically, we consider

$$\begin{aligned} \begin{aligned} {\textbf{X}}(\varvec{\omega }) =&e^{i {\textbf{v}}^\top \varvec{\omega }} {\textbf{Y}}(\varvec{\omega }) = \sum _{j=1}^{n}a_j e^{i ({\textbf{y}}_j +{\textbf{v}})^\top \varvec{\omega }} + e^{i {\textbf{v}}^\top \varvec{\omega }} {\textbf{W}}(\varvec{\omega })\\ =&\sum _{j=1}^{n}a_j e^{i {\textbf{x}}_j^\top \varvec{\omega }} + {\tilde{\textbf{W}}}(\varvec{\omega }), \quad \varvec{\omega }\in \{0,1, \ldots , \Omega \}^2, \end{aligned} \end{aligned}$$
(4.3)

with \(|\tilde{\textbf{W}}(\varvec{\omega })|< \sigma \).

Measurement modification by coordinate-combination

The second procedure consists in modifying the measurement based on coordinate-combination. For \(s >0\), let \(r = \frac{\Omega }{2s}\). From the measurement \({\textbf{X}}\), we construct a list of new data given by

$$\begin{aligned} {\textbf{D}}(t) = \sum _{t_1+t_2 =t, 0\le t_1, t_2\le t} {t\atopwithdelims ()t_1} {\textbf{X}}({\varvec{\omega }}_{rt_1, rt_2}), \quad t=0, \ldots , 2s, \end{aligned}$$

where \({\varvec{\omega }}_{rt_1, rt_2} = (rt_1, rt_2)^\top \). Note that

$$\begin{aligned} {\textbf{D}}(t) =&\sum _{j=1}^n a_j(e^{i {\textbf{x}}_{j,1}r}+ e^{i {\textbf{x}}_{j,2}r})^{t} + \sum _{t_1+t_2 =t, 0\le t_1, t_2\le t} {t\atopwithdelims ()t_1} {\tilde{\textbf{W}}}({\varvec{\omega }}_{rt_1, rt_2})\\ =&\sum _{j=1}^na_j(e^{i {\textbf{x}}_{j,1}r}+ e^{i {\textbf{x}}_{j,2}r})^{t} + {{\hat{\textbf{W}}}}(t), \end{aligned}$$

where \( {{\hat{\textbf{W}}}}(t) = \sum _{t_1+t_2 =t, 0\le t_1, t_2\le t} {t\atopwithdelims ()t_1} {\tilde{\textbf{W}}}({\varvec{\omega }}_{rt_1, rt_2})\).

Hankel matrix construction and singular value decomposition

Finally, from these \(\textbf{D}(t)\)'s, we assemble the following Hankel matrix

$$\begin{aligned} {\textbf{H}}(s)=\begin{pmatrix} {\textbf{D}}(0) &{}{\textbf{D}}(1)&{}\cdots &{} {\textbf{D}}(s)\\ {\textbf{D}}(1)&{}{\textbf{D}}(2)&{}\cdots &{}{\textbf{D}}(s+1)\\ \cdots &{}\cdots &{}\ddots &{}\cdots \\ {\textbf{D}}(s)&{}{\textbf{D}}(s+1)&{}\cdots &{}{\textbf{D}}(2s) \end{pmatrix}. \end{aligned}$$
(4.4)

We observe that \({\textbf{H}}(s)\) has the decomposition

$$\begin{aligned} {\textbf{H}}(s)= BAB^T+\Delta , \end{aligned}$$
(4.5)

where \(A=\text {diag}(a_1, \ldots , a_n)\) and \(B=\big (\phi _{s}(e^{i {\textbf{x}}_{1,1}r}+ e^{i {\textbf{x}}_{1,2}r}), \ldots , \phi _{s}(e^{i {\textbf{x}}_{n,1}r}+ e^{i {\textbf{x}}_{n,2}r})\big )\) with \(\phi _{s}(\omega )\) being defined as

$$\begin{aligned} \phi _{s}(\omega ) = (1, \omega , \ldots , \omega ^s)^\top , \end{aligned}$$
(4.6)

and

$$\begin{aligned} \Delta = \begin{pmatrix} {\hat{\textbf{W}}}(0)&{}{\hat{\textbf{W}}}(1)&{}\cdots &{} {\hat{\textbf{W}}}(s)\\ {\hat{\textbf{W}}}(1)&{}{\hat{\textbf{W}}}(2)&{}\cdots &{}{\hat{\textbf{W}}}(s+1)\\ \vdots &{}\vdots &{}\ddots &{}\vdots \\ {\hat{\textbf{W}}}(s)&{}{\hat{\textbf{W}}}(s+1)&{}\cdots &{}{\hat{\textbf{W}}}(2s) \end{pmatrix}. \end{aligned}$$
(4.7)

We denote the singular value decomposition of \({\textbf{H}}(s)\) as

$$\begin{aligned} {\textbf{H}}(s)={\hat{U}}{\hat{\Sigma }} {\hat{U}}^*, \end{aligned}$$

where \({\hat{\Sigma }} =\text {diag}({\hat{\sigma }}_1,\ldots , {\hat{\sigma }}_n, {\hat{\sigma }}_{n+1},\ldots ,{\hat{\sigma }}_{s+1})\) with the singular values \({\hat{\sigma }}_j\), \(1\le j \le s+1\), ordered in a decreasing manner. The source number n is then detected by thresholding these singular values. In the next subsection we provide the theoretical guarantee for the threshold.
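To make the three steps concrete, we include a short numerical sketch below. It is a minimal illustration added here, not an optimized implementation; the data array `Y`, the translation vector `v`, and the parameter `s` are assumed given, and the integer sampling step `r` is a simplification of \(r=\frac{\Omega }{2s}\) that keeps the sampling points \((rt_1, rt_2)\) on the measurement grid.

```python
import numpy as np
from math import comb  # binomial coefficients

def hankel_singular_values(Y, v, s):
    """Steps 1-3 of Sect. 4.2: translate the sources, coordinate-combine
    the data, assemble H(s), and return its singular values.

    Y: (Omega+1) x (Omega+1) array of noisy Fourier data as in (4.2);
    v: translation vector; s: size parameter with 2*s <= Omega."""
    Omega = Y.shape[0] - 1
    assert 2 * s <= Omega
    # integer sampling step (a simplification of r = Omega / (2s))
    r = Omega // (2 * s)

    # Step 1: measurement modification by source translation,
    # X(w) = e^{i v.w} Y(w), so that the sources become x_j = y_j + v.
    w = np.arange(Omega + 1)
    w1, w2 = np.meshgrid(w, w, indexing="ij")
    X = np.exp(1j * (v[0] * w1 + v[1] * w2)) * Y

    # Step 2: coordinate-combination,
    # D(t) = sum_{t1+t2=t} binom(t, t1) X(r*t1, r*t2), t = 0, ..., 2s.
    D = [sum(comb(t, t1) * X[r * t1, r * (t - t1)] for t1 in range(t + 1))
         for t in range(2 * s + 1)]

    # Step 3: the (s+1) x (s+1) Hankel matrix H[p, q] = D(p + q).
    H = np.array([[D[p + q] for q in range(s + 1)] for p in range(s + 1)])
    return np.linalg.svd(H, compute_uv=False)  # decreasing order
```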

4.3 Theoretical Guarantee

Note that when there is no noise, \({\textbf{H}}(s)=BAB^{\top }\). We have the following estimate for the singular values of \(BAB^{\top }\).

Lemma 4.1

Let \(n\ge 2\), \(s\ge n\), \({\textbf{y}}_j\in [-\frac{s\pi }{6\Omega }, \frac{s\pi }{6\Omega }]^2, 1\le j\le n,\) and \({\textbf{v}}\) in (4.3) be \((0, \frac{s\pi }{\Omega })^\top \). Let

$$\begin{aligned} \sigma _1,\ldots ,\sigma _n,0,\ldots ,0 \end{aligned}$$

be the singular values of \(BAB^T\) in (4.5) ordered in a decreasing manner. Then the following estimate holds

$$\begin{aligned} \sigma _n\ge \frac{m_{\min }\big (3\theta _{\min }(\Omega ,s)\big )^{2n-2}}{n(2(1+\sqrt{3})\pi )^{2n-2}}, \end{aligned}$$
(4.8)

where \(\theta _{\min }(\Omega , s)=\min _{p\ne j}\left| \left| {{\textbf{y}}_p\frac{\Omega }{2s}-{\textbf{y}}_j \frac{\Omega }{2s}}\right| \right| _1\).

Proof

Recall that \(\sigma _n\) is the minimum nonzero singular value of \(BAB^\top \). Let \(\ker (B^\top )\) be the kernel space of \(B^\top \) and \(\ker ^{\perp }(B^\top )\) be its orthogonal complement. Then we have

$$\begin{aligned}&\sigma _n=\min _{||x||_2=1,x\in \ker ^{\perp }(B^\top )}||BAB^\top x||_2\ge \sigma _{\min }(BA)\sigma _n(B^\top )\\&\quad \ge \sigma _{\min }(B)\sigma _{\min }(A)\sigma _{\min }(B). \end{aligned}$$

On the other hand, since by the condition of the lemma \({\textbf{x}}_j = {\textbf{y}}_j +{\textbf{v}} \in [-\frac{s\pi }{6\Omega }, \frac{s\pi }{6\Omega }]\times [\frac{5s\pi }{6\Omega }, \frac{7s\pi }{6\Omega }]\), we have \( \frac{\Omega {\textbf{x}}_j}{2s} \in [-\frac{\pi }{12}, \frac{\pi }{12}]\times [\frac{5\pi }{12}, \frac{7\pi }{12}]\). Thus, by Lemma 3.2, for \(r= \frac{\Omega }{2s}\),

$$\begin{aligned} \min _{p\ne q}\left| {e^{i {\textbf{x}}_{p,1}r}+ e^{i {\textbf{x}}_{p,2}r} -(e^{i {\textbf{x}}_{q,1}r}+ e^{i {\textbf{x}}_{q,2}r})}\right| \ge \frac{3}{2\pi } \theta _{\min }(\Omega , s). \end{aligned}$$

Note also that \(|e^{i {\textbf{x}}_{p,1}r}+ e^{i {\textbf{x}}_{p,2}r}|\le \sqrt{3}\). Thus applying Lemma 6.3 and Corollary 6.2, we have

$$\begin{aligned} \sigma _{\min }(B)\ge \frac{1}{\sqrt{n}}\frac{\Big (\frac{3}{2\pi }\theta _{\min }(\Omega ,s)\Big )^{n-1}}{(1+\sqrt{3})^{n-1}}. \end{aligned}$$

Then, it follows that

$$\begin{aligned} \sigma _n\ge \sigma _{\min }(A)\Bigg (\frac{\big (\frac{3}{2\pi }\theta _{\min }(\Omega ,s)\big )^{n-1}}{(1+\sqrt{3})^{n-1}}\Bigg )^2 \ge \frac{m_{\min }\big (3\theta _{\min }(\Omega ,s)\big )^{2n-2}}{n(2(1+\sqrt{3})\pi )^{2n-2}}. \end{aligned}$$

\(\square \)

We now present the main result on the threshold for the singular values of the matrix \({\textbf{H}}(s)\).

Theorem 4.1

Let \(n\ge 2, s\ge n\) and \(\mu =\sum _{j=1}^{n}a_j \delta _{{\textbf{y}}_j}\) with \({\textbf{y}}_j\in [-\frac{s\pi }{6\Omega }, \frac{s\pi }{6\Omega }]^2, 1\le j\le n\). Let \({\textbf{v}}\) in (4.3) be equal to \((0, \frac{s\pi }{\Omega })^\top \). Then for the singular values of \(\textbf{H}(s)\) in (4.4), we have

$$\begin{aligned} {\hat{\sigma }}_j< \frac{4^{s+1}\sigma }{3},\quad j=n+1,\ldots ,s+1. \end{aligned}$$
(4.9)

Moreover, if the following separation condition is satisfied

$$\begin{aligned} \min _{p\ne j}\left| \left| {{\textbf{y}}_p-{\textbf{y}}_j}\right| \right| _1\ge \frac{4(1+\sqrt{3})\pi s }{3\Omega }\Big (\frac{2n4^{s+1}}{3}\frac{\sigma }{m_{\min }}\Big )^{\frac{1}{2n-2}}, \end{aligned}$$
(4.10)

then

$$\begin{aligned} {\hat{\sigma }}_{n}\ge \frac{4^{s+1}\sigma }{3}. \end{aligned}$$
(4.11)

Proof

We first estimate \(||\Delta ||_2\) for \(\Delta \) in (4.7). By the definition of \({{\hat{\textbf{W}}}}(t)\), we have \(\left| {{\hat{\textbf{W}}}}(t)\right| < 2^{t} \sigma \). Thus \(||\Delta ||_2\le ||\Delta ||_F< \big (\sum _{t_1,t_2=0}^{s}4^{t_1+t_2}\big )^{\frac{1}{2}}\sigma = \frac{(4^{s+1}-1)\sigma }{3}< \frac{4^{s+1}\sigma }{3}\). By Weyl’s theorem, we have \(|\hat{\sigma }_j-\sigma _j|\le ||\Delta ||_2, j=1,\ldots ,s+1\). Together with \(\sigma _j=0, n+1\le j \le s+1\), we get \(|{\hat{\sigma }}_j|\le ||\Delta ||_2< \frac{4^{s+1}\sigma }{3}, n+1\le j \le s+1\). This proves (4.9).

Let \(\theta _{\min }(\Omega ,s)=\frac{\Omega }{2s}\min _{p\ne q}\left| \left| {{\textbf{y}}_p-{\textbf{y}}_q}\right| \right| _1\). The separation condition (4.10) implies that

$$\begin{aligned} \theta _{\min }(\Omega ,s) \ge \frac{2(1+\sqrt{3})\pi }{3} \Big (\frac{2n4^{s+1}}{3}\frac{\sigma }{m_{\min }}\Big )^{\frac{1}{2n-2}}. \end{aligned}$$

By Lemma 4.1, we have

$$\begin{aligned} \sigma _n \ge \frac{m_{\min }\big (3\theta _{\min }(\Omega ,s)\big )^{2n-2}}{n(2(1+\sqrt{3})\pi )^{2n-2}}>2\frac{4^{s+1}\sigma }{3}. \end{aligned}$$
(4.12)

Similarly, by Weyl’s theorem, \(|{\hat{\sigma }}_n-\sigma _n|\le ||\Delta ||_2\). Thus, \({\hat{\sigma }}_n\ge \sigma _n - ||\Delta ||_2 > 2\frac{4^{s+1}\sigma }{3}-\frac{4^{s+1}\sigma }{3}= \frac{4^{s+1}\sigma }{3}\). The conclusion (4.11) then follows. \(\square \)

4.4 Coordinate-Combination-Based Sweeping Singular-Value-Thresholding Number Detection Algorithm

Based on Theorem 4.1, we propose a simple thresholding algorithm, Algorithm 1, for number detection.

Algorithm 1: Coordinate-combination-based singular-value-thresholding number detection algorithm

Note that if the minimum separation distance \(d_{\min }\) in (4.10) increases to \(cd_{\min }\) for some \(c>1\), then we have

$$\begin{aligned} {\hat{\sigma }}_{n}>c^{2n-2}\frac{4^{s+1}\sigma }{3}. \end{aligned}$$

Thus even a small increase in the separation distance produces a large gap between \({\hat{\sigma }}_n\) and \(\hat{\sigma }_{n+1}\). This indicates that our algorithm is very stable and not sensitive to the noise level estimate. In practical applications, the noise level is usually not known precisely, but thanks to this robustness one can use a rough (not necessarily tight) estimate of the noise level and still detect the source number with high resolution.

On the other hand, if \(s<n\) in an application, then by (4.11) we have

$$\begin{aligned} {\hat{\sigma }}_j \ge \frac{4^{s+1}\sigma }{3}, \quad j=1, \ldots , s+1, \end{aligned}$$

when the separation condition (4.10) holds, and the source number is deduced to be \(s+1\) by Algorithm 1. When \(s\ge n\) and (4.10) holds, Algorithm 1 gives the exact n. Thus in applications, one should choose a suitable, large enough s. However, a suitable s is not easy to estimate, and a large s may deteriorate the resolution, as indicated by (4.10). To remedy this issue, we propose a sweeping singular-value-thresholding number detection algorithm (Algorithm 2) below. In short, we detect the number \(n_{\textrm{recover}}\) by Algorithm 1 for all s from 2 to \(\lfloor \frac{\Omega -1}{2}\rfloor \) and choose the greatest one, \(n_{\max }\), as the number of point sources. When the detected \(n_{\textrm{recover}}\) stays smaller than \(n_{\max }\) for a large number of iterations, we stop the loop. The details are summarized in Algorithm 2 below; a schematic transcription follows.
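Since Algorithms 1 and 2 are displayed only as figures, we also give a schematic Python transcription of our reading of them, using `hankel_singular_values` from the sketch in Sect. 4.2; the patience-based stopping rule is a hypothetical stand-in for the stopping criterion described above.

```python
import numpy as np

def detect_number(Y, v, s, sigma):
    """Algorithm 1 (sketch): count the singular values of H(s) that
    exceed the theoretical threshold 4^(s+1) * sigma / 3 of Theorem 4.1."""
    svals = hankel_singular_values(Y, v, s)
    return int(np.sum(svals > 4 ** (s + 1) * sigma / 3))

def sweeping_detect_number(Y, v, sigma, patience=5):
    """Algorithm 2 (sketch): run Algorithm 1 for s = 2, ..., floor((Omega-1)/2)
    and return the largest detected number n_max, stopping early once the
    detected number has stayed below n_max for `patience` consecutive
    iterations (a hypothetical stand-in for the stopping rule above)."""
    Omega = Y.shape[0] - 1
    n_max, stall = 0, 0
    for s in range(2, (Omega - 1) // 2 + 1):
        n_rec = detect_number(Y, v, s, sigma)
        if n_rec > n_max:
            n_max, stall = n_rec, 0
        else:
            stall += 1
            if stall >= patience:
                break
    return n_max
```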

We remark that when \(s=n\) and the point sources satisfy

$$\begin{aligned} \min _{p\ne q}\left| \left| {{\textbf{y}}_p-{\textbf{y}}_q}\right| \right| _1 \ge \frac{Cn}{\Omega }\Big (\frac{\sigma }{m_{\min }}\Big )^{\frac{1}{2n-2}}, \end{aligned}$$
(4.13)

for some constant C, then (4.10) is satisfied. Thus by Theorem 4.1, for a suitable choice of \({\textbf{v}}\), Algorithm 1 can exactly detect the number n when \(s=n\). As s increases beyond n, (4.9) implies that the number detected by Algorithm 1 will not exceed n. Therefore, the sweeping singular-value-thresholding algorithm (Algorithm 2) can detect the exact number n when \(\Omega \) is greater than \(2n+1\) and the point sources are separated by the minimal separation distance we derived in Theorem 2.1. This demonstrates the optimal performance of Algorithm 2. We also remark that the theoretical threshold derived in Theorem 4.1 seems to be larger than needed; one can improve the algorithm by choosing a smaller threshold. Deriving new estimates for the thresholds in different cases is another interesting problem.

Algorithm 2: Coordinate-combination-based sweeping singular-value-thresholding number detection algorithm

4.5 Phase Transition and Performance of Algorithm 2

In this subsection, we conduct numerical experiments to demonstrate the phase transition phenomenon regarding the super-resolution factor (\(\textrm{SRF}\)) and the SNR using Algorithm 2. We consider detecting the number of three and of four sources. We fix \(\Omega =10\) and detect the source number from the noisy Fourier data at \(\{0,1,\ldots , \Omega \}^2\). We consider sources in \([0, \frac{\pi }{2}]^2\), and the translation vector in Algorithm 1 is \({\textbf{v}} = (0, \frac{\pi }{2})^\top \). The noise level is \(\sigma \) and the minimum separation distance between sources is \(D_{\min }\). We perform 10,000 random experiments (the randomness is in the choice of (\(D_{\min }\), \(\sigma \), \({\textbf{y}}_j\), \(a_j\))) and detect the source number by Algorithm 2; one trial is sketched in the code below. We record each successful detection (the source number is detected exactly) and each failed detection. Figure 2 shows the successful and unsuccessful recoveries in the parameter space of \(\log (\textrm{SNR})\) versus \(\log (\textrm{SRF})\). It is observed that there is a line with slope \(2n-2\) in the \(\log (\textrm{SRF})\)-\(\log (\textrm{SNR})\) parameter space above which the number detection is always successful. This phase transition phenomenon is exactly the one predicted by our theoretical results in Theorems 2.1 and 4.1. It also illustrates the efficiency of Algorithm 2, as it can resolve the source number correctly in the regime where the source separation distance is of the order of the computational resolution limit.
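For concreteness, one trial of this experiment can be sketched as follows. This is a simplified version of our setup, using `sweeping_detect_number` from Sect. 4.4; the particular random distributions below are illustrative assumptions rather than the exact ones used for Fig. 2.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_number_detection_trial(n=3, Omega=10):
    """Draw n random sources in [0, pi/2]^2 with l1-separation >= D_min,
    form the noisy data (4.2), and check Algorithm 2's answer."""
    D_min = rng.uniform(0.05, 0.5)
    ys = [rng.uniform(0, np.pi / 2, size=2)]
    while len(ys) < n:  # rejection sampling for the separation condition
        y = rng.uniform(0, np.pi / 2, size=2)
        if all(np.abs(y - z).sum() >= D_min for z in ys):
            ys.append(y)
    a = rng.uniform(0.5, 1.0, n) * np.exp(1j * rng.uniform(0, 2 * np.pi, n))
    sigma = 10.0 ** rng.uniform(-6, -1)

    w = np.arange(Omega + 1)
    w1, w2 = np.meshgrid(w, w, indexing="ij")
    Y = sum(a[j] * np.exp(1j * (ys[j][0] * w1 + ys[j][1] * w2))
            for j in range(n))
    # bounded noise with |W(w)| < sigma
    W = rng.uniform(-1, 1, Y.shape) + 1j * rng.uniform(-1, 1, Y.shape)
    Y = Y + sigma * W / (np.sqrt(2) * 1.0001)

    v = np.array([0.0, np.pi / 2])
    return sweeping_detect_number(Y, v, sigma) == n
```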

Fig. 2 Plots of the successful and the unsuccessful number detection by Algorithm 2 depending on the relation between \(\log (\textrm{SRF})\) and \(\log (\frac{1}{\sigma })\). a illustrates that three sources can be exactly detected if \(\log (\frac{1}{\sigma })\) is above a line of slope 4 in the parameter space. b illustrates that four sources can be exactly detected if \(\log (\frac{1}{\sigma })\) is above a line of slope 6 in the parameter space

5 An Algorithm for the Source Reconstruction in Two-Dimensional DOA Problems

In this section, based on the idea of coordinate-combination, we propose a new MUSIC algorithm for resolving the sources in two-dimensional DOA estimation. We name it the coordinate-combination-based MUSIC algorithm; see Algorithm 4.

5.1 Hankel Matrix Construction

Similarly to the number detection algorithm in the previous section, the MUSIC algorithm also relies on the singular value decomposition of a certain Hankel matrix. Compared to conventional MUSIC-based DOA algorithms, the main novelty of our algorithm lies in a different way of assembling the Hankel matrices. As in Sect. 4.2, the construction here is based on the observations in Sect. 3.1; the details are presented below.

Measurement modification by source translation

We consider the same model setting as (4.2) for the available measurement. We also perform the source translation and modify the measurement to get

$$\begin{aligned} \begin{aligned} {\textbf{X}}({\varvec{\omega }})&= e^{i {\textbf{v}}^\top {\varvec{\omega }}} {\textbf{Y}}({\varvec{\omega }}) = \sum _{j=1}^{n}a_j e^{i ({\textbf{y}}_j+{\textbf{v}})^\top \varvec{\omega }} + e^{i {\textbf{v}}^\top {\varvec{\omega }}} {\textbf{W}}(\varvec{\omega })\\&= \sum _{j=1}^{n}a_j e^{i {\textbf{x}}_j^\top \varvec{\omega }} + {\tilde{\textbf{W}}}(\varvec{\omega }), \quad \varvec{\omega }\in \{0,1, \ldots , \Omega \}^2, \end{aligned} \end{aligned}$$
(5.1)

where \({\textbf{x}}_j = {\textbf{y}}_j +{\textbf{v}}\) for a suitable \({\textbf{v}}\in {\mathbb {R}}^2\) and \(|\tilde{\textbf{W}}(\varvec{\omega })|< \sigma \).

Measurement modification by the coordinate-combination technique

Let \(s=\lfloor \frac{\Omega }{2} \rfloor \). From the modified measurement \({\textbf{X}}(\varvec{\omega })\), we construct the following two lists of data:

$$\begin{aligned}&\textbf{D}(t) = \sum _{t_1+t_2 =t, 0\le t_1, t_2\le t} {t\atopwithdelims ()t_1} {\textbf{X}}({\varvec{\omega }}_{t_1, t_2}), \quad t=0, \ldots , 2s, \\&\textbf{G}(t) = \sum _{t_1+t_2 =t, 0\le t_1, t_2\le t} (-1)^{t_2}{t\atopwithdelims ()t_1} {\textbf{X}}({\varvec{\omega }}_{t_1, t_2}), \quad t=0, \ldots , 2s, \end{aligned}$$

where \({\varvec{\omega }}_{t_1, t_2} = (t_1, t_2)^\top \). Note that

$$\begin{aligned} \textbf{D}(t) =&\sum _{j=1}^n a_j(e^{i {\textbf{x}}_{j,1}}+ e^{i {\textbf{x}}_{j,2}})^{t} + \sum _{t_1+t_2 =t, 0\le t_1, t_2\le t} {t\atopwithdelims ()t_1} {\tilde{\textbf{W}}}({\varvec{\omega }}_{t_1, t_2})\\ =&\sum _{j=1}^na_j(e^{i {\textbf{x}}_{j,1}}+ e^{i {\textbf{x}}_{j,2}})^{t} + {{\hat{\textbf{W}}}}_d(t), \\ \textbf{G}(t) =&\sum _{j=1}^n a_j(e^{i {\textbf{x}}_{j,1}}-e^{i {\textbf{x}}_{j,2}})^{t} + \sum _{t_1+t_2 =t, 0\le t_1, t_2\le t} (-1)^{t_2}{t\atopwithdelims ()t_1} {\tilde{\textbf{W}}}(\varvec{\omega }_{t_1, t_2})\\ =&\sum _{j=1}^na_j(e^{i {\textbf{x}}_{j,1}}- e^{i {\textbf{x}}_{j,2}})^{t} + {{\hat{\textbf{W}}}}_g(t), \end{aligned}$$

where \({{\hat{\textbf{W}}}}_d(t) = \sum _{t_1+t_2 =t, 0\le t_1, t_2\le t} {t\atopwithdelims ()t_1} \tilde{\textbf{W}}(\varvec{\omega }_{t_1, t_2})\) and \({\hat{\textbf{W}}}_g(t) = \sum _{t_1+t_2 =t, 0\le t_1, t_2\le t} (-1)^{t_2}{t\atopwithdelims ()t_1} \tilde{\textbf{W}}({\varvec{\omega }}_{t_1, t_2})\).

Hankel matrix construction

Finally, from these \({\textbf{D}}(t), {\textbf{G}}(t)\)’s, we assemble the following Hankel matrices:

$$\begin{aligned} {\textbf{H}}_d(s)= & {} \begin{pmatrix} {\textbf{D}}(0) &{}{\textbf{D}}(1)&{}\cdots &{} {\textbf{D}}(s)\\ {\textbf{D}}(1)&{}{\textbf{D}}(2)&{}\cdots &{}{\textbf{D}}(s+1)\\ \cdots &{}\cdots &{}\ddots &{}\cdots \\ {\textbf{D}}(s)&{}{\textbf{D}}(s+1)&{}\cdots &{}{\textbf{D}}(2s) \end{pmatrix}, \nonumber \\ {\textbf{H}}_g(s)= & {} \begin{pmatrix} {\textbf{G}}(0) &{}{\textbf{G}}(1)&{}\cdots &{} {\textbf{G}}(s)\\ {\textbf{G}}(1)&{}{\textbf{G}}(2)&{}\cdots &{}{\textbf{G}}(s+1)\\ \cdots &{}\cdots &{}\ddots &{}\cdots \\ {\textbf{G}}(s)&{}{\textbf{G}}(s+1)&{}\cdots &{}{\textbf{G}}(2s) \end{pmatrix}. \end{aligned}$$
(5.2)
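In code, the two data lists differ from the construction in Sect. 4.2 only by the unit sampling step and the alternating sign in \({\textbf{G}}(t)\). A minimal sketch (same conventions as the snippet in Sect. 4.2, not an optimized implementation; `X` is the translated measurement of (5.1) and \(s=\lfloor \frac{\Omega }{2} \rfloor \)):

```python
import numpy as np
from math import comb

def build_hankels(X, s):
    """Assemble the Hankel matrices H_d(s) and H_g(s) of (5.2) from the
    translated measurement X of (5.1); requires 2*s <= Omega."""
    # D(t) = sum_{t1+t2=t} binom(t, t1) X(t1, t2)
    D = [sum(comb(t, t1) * X[t1, t - t1] for t1 in range(t + 1))
         for t in range(2 * s + 1)]
    # G(t) = sum_{t1+t2=t} (-1)^{t2} binom(t, t1) X(t1, t2)
    G = [sum((-1) ** (t - t1) * comb(t, t1) * X[t1, t - t1]
             for t1 in range(t + 1))
         for t in range(2 * s + 1)]
    Hd = np.array([[D[p + q] for q in range(s + 1)] for p in range(s + 1)])
    Hg = np.array([[G[p + q] for q in range(s + 1)] for p in range(s + 1)])
    return Hd, Hg
```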

5.2 Standard MUSIC Algorithm

In this subsection, we apply the standard MUSIC algorithm [40, 47, 60, 67] to the Hankel matrices \({\textbf{H}}_d(s), {\textbf{H}}_g(s)\) in (5.2). For ease of presentation, we only introduce the MUSIC algorithm for \({\textbf{H}}_d(s)\); the one for \({\textbf{H}}_g(s)\) can be developed in the same manner. Our algorithm first performs the singular value decomposition of \({\textbf{H}}_d(s)\),

$$\begin{aligned} {\textbf{H}}_d(s) = {\hat{U}}{\hat{\Sigma }} {\hat{U}}^*=[{\hat{U}}_1\quad {\hat{U}}_2]\text {diag}({\hat{\sigma }}_1, {\hat{\sigma }}_2,\ldots ,\hat{\sigma }_n,{\hat{\sigma }}_{n+1},\ldots ,{\hat{\sigma }}_{s+1})[{\hat{U}}_1\quad {\hat{U}}_2]^*, \end{aligned}$$

where \({\hat{U}}_1=({\hat{U}}(1),\ldots ,{\hat{U}}(n)), {\hat{U}}_2=({\hat{U}}(n+1),\ldots ,{\hat{U}}(s+1))\) with n being the estimated source number (model order). The source number n can be detected by Algorithm 2 or by many other algorithms such as those in [2, 13, 27, 28, 44, 45, 62, 75, 76]. Denote the orthogonal projection onto the space spanned by the columns of \({\hat{U}}_2\) by \({\hat{P}}_2x={\hat{U}}_2({\hat{U}}_2^*x)\). For a test vector \(\Phi (d)=(1, d,\ldots ,d^s)^\top \), one defines the MUSIC imaging functional

$$\begin{aligned} {\hat{J}}(d)=\frac{||\Phi (d)||_2}{||{\hat{P}}_2\Phi (d)||_2}=\frac{||\Phi (d)||_2}{||{\hat{U}}_2^*\Phi (d)||_2}. \end{aligned}$$

The local maximizers of \({\hat{J}}(d)\) indicate the supports of the sources. In practice, one can evaluate the imaging functional at evenly spaced test points in a specified region, plot the discrete imaging functional, and determine the sources by detecting the peaks. In our case, we only need to test discrete points \(d\in {\mathbb {C}}\) with \(|d|\le 2\) and select the peaks by certain algorithms (such as the one in [47] or its two-dimensional analog). Finally, we summarize the standard MUSIC algorithm in Algorithm 3 below.

Algorithm 3: Standard MUSIC algorithm
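As an illustration of this step (the pseudocode of Algorithm 3 appears only as a figure above), the following sketch evaluates \({\hat{J}}(d)\) on a uniform grid of test points with \(|d|\le 2\), with `H` one of the matrices returned by `build_hankels` above; the grid spacing is an illustrative choice, not a prescription.

```python
import numpy as np

def music_functional(H, n, step=0.01):
    """Evaluate J(d) = ||Phi(d)||_2 / ||U2* Phi(d)||_2 on a grid of
    complex test points d with |d| <= 2.

    H: (s+1) x (s+1) Hankel matrix; n: estimated source number."""
    s = H.shape[0] - 1
    U, _, _ = np.linalg.svd(H)
    U2 = U[:, n:]  # noise-space singular vectors

    xs = np.arange(-2, 2 + step, step)
    D = xs[None, :] + 1j * xs[:, None]  # complex test grid
    # test vectors Phi(d) = (1, d, ..., d^s)^T, stacked over the grid
    Phi = D[None, :, :] ** np.arange(s + 1)[:, None, None]
    num = np.linalg.norm(Phi, axis=0)
    den = np.linalg.norm(np.tensordot(U2.conj().T, Phi, axes=(1, 0)), axis=0)
    J = num / np.maximum(den, 1e-12)
    J[np.abs(D) > 2] = 0.0  # restrict to the disk |d| <= 2
    return D, J
```

The n largest local maxima of `J` over the grid then serve as the estimates \({\hat{d}}_j\) (and, applied to \({\textbf{H}}_g(s)\), as the \({\hat{g}}_j\)).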

5.3 Coordinate-Combination-Based MUSIC Algorithm

After applying the MUSIC algorithm to both \({\textbf{H}}_d(s), {\textbf{H}}_g(s)\), we expect to reconstruct n values \({\hat{d}}_j\) close to \(d_j = e^{i {\textbf{x}}_{j,1}}+ e^{i {\textbf{x}}_{j,2}}\) and n values \({\hat{g}}_j\) close to \(g_j = e^{i {\textbf{x}}_{j,1}}- e^{i {\textbf{x}}_{j,2}}\). The next question is how to link the pairs \({\hat{d}}_j, {\hat{g}}_j\) that correspond to the same source. This is an inevitable pair matching issue in most two-dimensional DOA algorithms [44], where ad hoc schemes [14, 34, 79, 81] were derived to associate the estimated azimuth and elevation angles. Here, in contrast with conventional DOA algorithms, we do not need to link the azimuth and elevation angles but rather \({\hat{d}}_j\) and \({\hat{g}}_j\).

Observe that \(|d_j+g_j| = |2 e^{i {\textbf{x}}_{j,1}}|=2\) and \(|d_j-g_j| = |2 e^{i {\textbf{x}}_{j,2}}|=2\). We can use this as a matching criterion: the correctly paired \({\hat{d}}_j, {\hat{g}}_j\) should satisfy

$$\begin{aligned} |{\hat{d}}_j + {\hat{g}}_j|\approx 2, \quad |{\hat{d}}_j - {\hat{g}}_j|\approx 2. \end{aligned}$$
(5.3)

For example, we could consider the following minimization problem:

$$\begin{aligned} \min _{\pi \in \zeta (n)} \sum _{j=1}^n\left| {|{\hat{d}}_j + {\hat{g}}_{\pi _j}|-2}\right| +\left| {|{\hat{d}}_j - {\hat{g}}_{\pi _j}|-2}\right| , \end{aligned}$$
(5.4)

where \(\zeta (n)\) is the set of all permutations of \(\{1, \ldots , n\}\). This can be viewed as a balanced assignment problem [54], which can be solved efficiently by many algorithms such as the Hungarian algorithm.
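For illustration, (5.4) can be solved with an off-the-shelf assignment solver. The sketch below uses SciPy's `linear_sum_assignment` (a Hungarian-type solver) on the cost matrix of (5.4); this is one standard way to implement the step, not the only one.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_pairs(d_hat, g_hat):
    """Solve (5.4): find the permutation pi minimizing
    sum_j | |d_j + g_pi(j)| - 2 | + | |d_j - g_pi(j)| - 2 |."""
    d = np.asarray(d_hat)[:, None]  # column of recovered d-values
    g = np.asarray(g_hat)[None, :]  # row of recovered g-values
    cost = np.abs(np.abs(d + g) - 2) + np.abs(np.abs(d - g) - 2)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))  # matched index pairs (j, pi_j)
```

Each matched pair then yields the coordinates via \(e^{i{\hat{\textbf{x}}}_{j,1}}\approx ({\hat{d}}_j+{\hat{g}}_{\pi _j})/2\) and \(e^{i{\hat{\textbf{x}}}_{j,2}}\approx ({\hat{d}}_j-{\hat{g}}_{\pi _j})/2\).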

We remark that our pair matching problem is not the one usually encountered in other one-dimensional-based DOA algorithms. Unlike ours, that pair matching problem is not an assignment problem, so solving it is usually time-consuming, or complex processing must be conducted to reduce the computational cost.

Algorithm 4: Coordinate-combination-based MUSIC algorithm for two-dimensional DOA

5.4 Superiority of the Algorithm

5.4.1 Overcoming the Issue of Separation Distance Loss in Conventional Two-Dimensional DOA Algorithms

Although various recovery methods have been proposed for two-dimensional DOA estimation, the conventional way of tackling the problem has hardly exceeded the scope of recovering the two direction components (along the x- and y-directions) of the sources individually. Thus, as illustrated in Fig. 3a, a severe loss of the source separation distance in one dimension is an inevitable issue that causes unstable recovery of the direction components. Most previous works ignored this issue, and some papers [73, 74] proposed ad hoc, rather complex schemes to enhance the reconstruction. We also present an example in Fig. 3b comparing the recoveries of our Algorithm 4 and a projection-based MUSIC algorithm (similar to the one in [44]). It clearly illustrates the recovery instability caused by the projection.

Fig. 3 a shows the severe loss of distance when considering the direction components of the sources. b illustrates an example of location recovery. The black points are the underlying sources and the red points are the locations recovered by the coordinate-combination-based MUSIC algorithm. The green points are the locations recovered by a projection-based MUSIC algorithm. Although three sources are present, the projection-based MUSIC can only recover two of them (Color figure online)

Our method is a new one-dimensional-based algorithm in which the issue of severe source separation distance loss is avoided in a simple way: the separation distance between direction components of sources is still preserved. This has been demonstrated by Lemma 3.2 for \({\varvec{\theta }}_j \in [0, \frac{2}{3}\pi ]^2, j=1,2,\) with \(\frac{\pi }{3}\le {\varvec{\theta }}_{j,2} - {\varvec{\theta }}_{j,1}\le \frac{2}{3}\pi , j=1,2\). Furthermore, Theorem 5.1 shows that, for \({\textbf{y}}_j \in [0, \frac{\pi }{2}]^2\) and \({\textbf{v}} = (0, \frac{\pi }{2})^\top \), the separation distance between the \({\textbf{x}}_j = {\textbf{y}}_j+{\textbf{v}}\)'s is preserved after the coordinate-combination. By Theorem 5.1, if the distance between the \({\textbf{x}}_j\)'s is a certain constant C, then the distance between the \(e^{i {\textbf{x}}_{j,1}}+ e^{i {\textbf{x}}_{j,2}}\)'s is larger than \(\frac{2C}{\pi ^2}\) times the original distance. For better preservation of the distance, as indicated by Theorems 2.1 and 2.2, one could consider sources in a smaller region with a specified translation. In the numerical experiments presented in this paper, for ease of discussion and presentation, we consider sources in \([0,\frac{\pi }{2}]^2\) and the translation vector \({\textbf{v}} = (0, \frac{\pi }{2})^\top \). We leave recovery strategies for the whole region \([0,2\pi ]^2\) and other enhancements for future work.

Theorem 5.1

For two different vectors \({\textbf{x}}_j \in [0, \frac{\pi }{2}]\times [\frac{\pi }{2}, \pi ], j=1,2\), if \(||{\textbf{x}}_1 - {\textbf{x}}_2||_1\ge C\) for a constant C, then

$$\begin{aligned} \left| {e^{i {\textbf{x}}_{1,1}}+ e^{i {\textbf{x}}_{1,2}}- (e^{i {\textbf{x}}_{2,1}} + e^{i {\textbf{x}}_{2,2}})}\right| \ge \frac{2C^2}{\pi ^2}. \end{aligned}$$

Proof

We prove the theorem by considering the following two cases.

Case 1 \(0\le {\textbf{x}}_{1,1} \le {\textbf{x}}_{2,1} \le {\textbf{x}}_{2,2}\le {\textbf{x}}_{1,2}\le \pi \).

In this case,

$$\begin{aligned} \left| {e^{i {\textbf{x}}_{1,1}}+ e^{i {\textbf{x}}_{1,2}}- (e^{i {\textbf{x}}_{2,1}} + e^{i {\textbf{x}}_{2,2}})}\right|&\ge \left| {e^{i {\textbf{x}}_{2,1}}+ e^{i{\textbf{x}}_{2,2}}}\right| - \left| {e^{i {\textbf{x}}_{1,1}} + e^{i{\textbf{x}}_{1,2}}}\right| \\&\ge 2\Bigg (\cos \left( \frac{\phi _2}{2}\right) - \cos \left( \frac{\phi _1}{2}\right) \Bigg ), \end{aligned}$$

where \(\phi _j = {\textbf{x}}_{j,2} - {\textbf{x}}_{j,1}, j=1,2\). By the assumption of the theorem, we have \(C\le \phi _1 -\phi _2 \le \pi \) and \(C\le \phi _1+\phi _2\le 2\pi \). Thus

$$\begin{aligned} 2\Bigg (\cos \left( \frac{\phi _2}{2}\right) - \cos \left( \frac{\phi _1}{2}\right) \Bigg ) = 4\sin \left( \frac{\phi _1+\phi _2}{4}\right) \sin \left( \frac{\phi _1-\phi _2}{4}\right) \ge 4 \sin \left( \frac{C}{4}\right) \sin \left( \frac{C}{4}\right) \ge \frac{2C^2}{\pi ^2}, \end{aligned}$$

where the last inequality uses \(\sin (\frac{C}{4}) \ge \frac{2\sqrt{2}}{\pi }\frac{C}{4}\) for \(0\le \frac{C}{4}\le \frac{\pi }{4}\).

Case 2 \(0\le {\textbf{x}}_{1,1} \le {\textbf{x}}_{2,1} \le {\textbf{x}}_{1,2}\le {\textbf{x}}_{2,2}\le \pi \).

The idea is to estimate the angle between \(e^{i {\textbf{x}}_{1,1}}+ e^{i {\textbf{x}}_{1,2}}\) and \(e^{i {\textbf{x}}_{2,1}} + e^{i {\textbf{x}}_{2,2}}\). By a simple analysis of the angle relations between \(e^{i {\textbf{x}}_{1,1}}, e^{i {\textbf{x}}_{1,2}}, e^{i {\textbf{x}}_{2,1}}, e^{i {\textbf{x}}_{2,2}},\) we obtain that the angle between \(e^{i {\textbf{x}}_{1,1}}+ e^{i {\textbf{x}}_{1,2}}\) and \(e^{i {\textbf{x}}_{2,1}} + e^{i {\textbf{x}}_{2,2}}\) is \(\frac{{\textbf{x}}_{2,1}-{\textbf{x}}_{1,1}+ {\textbf{x}}_{2,2}-{\textbf{x}}_{1,2}}{2}\), which is at least \(\frac{C}{2}\). Thus

$$\begin{aligned} \left| {e^{i {\textbf{x}}_{1,1}}+ e^{i {\textbf{x}}_{1,2}}- (e^{i {\textbf{x}}_{2,1}} + e^{i {\textbf{x}}_{2,2}})}\right| \ge \max \Big (\left| {e^{i {\textbf{x}}_{1,1}}+ e^{i {\textbf{x}}_{1,2}}}\right| , \left| {e^{i {\textbf{x}}_{2,1}} + e^{i {\textbf{x}}_{2,2}}}\right| \Big )\sin \left( \frac{C}{2}\right) . \end{aligned}$$
(5.5)

We next claim that

$$\begin{aligned} \max \Big (\left| {e^{i {\textbf{x}}_{1,1}}+ e^{i {\textbf{x}}_{1,2}}}\right| , \left| {e^{i {\textbf{x}}_{2,1}} + e^{i {\textbf{x}}_{2,2}}}\right| \Big ) \ge 2 \cos \Big (\frac{\pi -C/2}{2}\Big ). \end{aligned}$$

Otherwise, \(\pi - \frac{C}{2}<{\textbf{x}}_{1,2}-{\textbf{x}}_{1,1}\le \pi \) and \( \pi - \frac{C}{2}<{\textbf{x}}_{2,2}-{\textbf{x}}_{2,1}\le \pi \), which is impossible when \(||{\textbf{x}}_1 -{\textbf{x}}_2||_1\ge C\). Thus the claim is proved. Together with (5.5), we arrive at

$$\begin{aligned} \left| {e^{i {\textbf{x}}_{1,1}}+ e^{i {\textbf{x}}_{1,2}}- (e^{i {\textbf{x}}_{2,1}} + e^{i {\textbf{x}}_{2,2}})}\right| \ge 2 \sin \left( \frac{C}{4}\right) \sin \left( \frac{C}{2}\right) \ge \frac{2C^2}{\pi ^2}. \end{aligned}$$

This completes the proof. \(\square \)

Fig. 4 Plots of the successful and the unsuccessful location recoveries by Algorithm 4 in terms of \(\log (\frac{1}{\sigma })\) versus \(\log (\textrm{SRF})\). a illustrates that the locations of three point sources can be stably recovered if \(\log (\frac{1}{\sigma })\) is above a line of slope 5 in the parameter space. Similarly, b shows that the locations of four point sources can be stably recovered if \(\log (\frac{1}{\sigma })\) is above a line of slope 7 in the parameter space

5.4.2 Phase Transition and Performance of Algorithm 4

Most conventional two-dimensional DOA algorithms consider multiple snapshots of measurements from coherent or incoherent signals. Also, the noise is usually assumed to be white Gaussian, so that the expectation of the covariance matrix of the measurement vector is a sum of two terms, the first coming from the correlation of the signals and the second being the noise correlation matrix. Based on this crucial observation, many algorithms were derived to tackle the problem. In contrast to the above model, we consider recovering the sources from a single measurement with deterministic noise. Thus we do not compare the performance of our algorithm with algorithms based on the statistical model. We demonstrate the super-resolution capacity of our algorithm in the single-snapshot case by showing the phase transition of the algorithm. We will derive a coordinate-combination-based MUSIC algorithm for the multiple-snapshot case in a forthcoming work.

We now describe the numerical experiments demonstrating the phase transition phenomenon of our algorithm in terms of the SNR versus the super-resolution factor. We fix \(\Omega =10\) and consider three and four sources separated by the minimum separation distance \(D_{\min }\), i.e., \(\min _{p\ne q}||{\textbf{y}}_p - {\textbf{y}}_q||_1\ge D_{\min }\). We perform 10,000 random experiments (the randomness is in the choice of \((D_{\min },\sigma , {\textbf{y}}_j, a_j)\)) to recover the sources using Algorithm 4. The reconstruction is viewed and recorded as successful if each recovered source is in a \(\frac{D_{\min }}{3}\)-neighborhood of the underlying source; otherwise it is unsuccessful. See Algorithm 5 for the details of a single experiment. The results of the experiments are summarized in Fig. 4, which shows each successful and unsuccessful recovery with respect to \(\log (\textrm{SRF})\) and \(\log (\textrm{SNR})\). It is observed that there is a line with slope \(2n-1\) in the parameter space of \(\log (\textrm{SRF})\) versus \(\log (\textrm{SNR})\) above which the sources are stably reconstructed for every realization. This phase transition phenomenon is exactly the one predicted by our theoretical result in Theorem 2.2. It also demonstrates the efficiency of Algorithm 4, as it can resolve the sources in the regime where the source separation distance is of the order of the computational resolution limit.

Algorithm 5: A single experiment

6 A Nonlinear Approximation Theory in Vandermonde Space

Recall that in the proofs of Theorems 2.1 and 2.2, the three observations in Sect. 3.1 are used to transform the two-dimensional super-resolution problems into the following approximation

$$\begin{aligned} \left| {\sum _{j=1}^q {\hat{a}}_j {\hat{d}}_j^{t} - \sum _{j=1}^n a_j d_j^{t}}\right| < C \sigma , \quad t=0,1, \ldots , s(n), \end{aligned}$$
(6.1)

where \({\hat{a}}_j\in {\mathbb {C}}, {\hat{d}}_j\in {\mathbb {C}}\), \(a_j\in {\mathbb {C}}\), the \(d_j\)'s are complex numbers with a minimum separation, \(q\le n\), and \(s(n)=2n-2\) or \(2n-1\). We are interested in the stability of recovering the \(d_j\)'s by the \({\hat{d}}_j\)'s from the above approximation and call these problems the approximation problems in Vandermonde space. As already shown, they play a crucial role in the proofs of Theorems 2.1 and 2.2. The main motivation for transforming the two-dimensional super-resolution problem into the above problems is that we already have tools to analyze them. To be specific, in [45, 46], we introduced a nonlinear approximation theory in Vandermonde space for certain cases to deal with one-dimensional super-resolution problems: in [46] we derived the theory for real numbers and in [45] for complex numbers on the unit circle. Here, we derive a different theory for arbitrary bounded complex numbers (\(d_j\in {\mathbb {C}}, 1\le j\le n\)), which is used to prove Theorems 2.1 and 2.2.

Specifically, for a given positive integer s and \(\omega \in {\mathbb {C}}\), we define

$$\begin{aligned} \phi _s(\omega )=(1,\omega ,\ldots ,\omega ^{s})^\top \end{aligned}$$
(6.2)

and call \(\phi _s\) a Vandermonde vector. We first study the following nonlinear approximation problem in the Vandermonde space,

$$\begin{aligned} \min _{{\hat{a}}_j, {\hat{d}}_j\in {\mathbb {C}},|{\hat{d}}_j|\le d,j=1,\ldots ,k}\Big |\Big |\sum _{j=1}^k {\hat{a}}_j\phi _{2k}({\hat{d}}_j)- \sum _{j=1}^{k+1}a_j\phi _{2k}(d_j)\Big |\Big |_2, \end{aligned}$$
(6.3)

where \(d_j\)’s are bounded complex numbers with a minimum separation distance. We derive a sharp lower bound on the above approximation in Theorem 6.2. In addition, we also investigate the stability of recovering \(d_j\)’s by \({\hat{d}}_j\)’s from the condition

$$\begin{aligned} \Big |\Big |\sum _{j=1}^k {\hat{a}}_j\phi _{2k-1}({\hat{d}}_j)- \sum _{j=1}^{k}a_j\phi _{2k-1}(d_j)\Big |\Big |_2<\sigma , \end{aligned}$$
(6.4)

and establish Theorem 6.3. These results are closely related to problem (6.1) and hence to the proofs of the main results of this paper.

6.1 Notation and Preliminaries

We introduce some notation and preliminaries. We denote the Vandermonde matrix by

$$\begin{aligned} V_{s}(k)=\begin{pmatrix} 1&{}\cdots &{}1\\ d_1&{}\cdots &{}d_{k}\\ \vdots &{}\ddots &{}\vdots \\ d_1^s&{}\cdots &{}d_{k}^{s} \end{pmatrix}= \Big ( \phi _s(d_1)\ \ \phi _s(d_2)\ \ \cdots \ \ \phi _s(d_k) \Big ). \end{aligned}$$
(6.5)

For a matrix or a vector A, we denote by \(A^\top \) its transpose and by \(A^*\) its conjugate transpose.

We first present some basic properties of Vandermonde matrices.

Lemma 6.1

For k distinct complex numbers \(d_j\)’s, we have

$$\begin{aligned} ||V_{k-1}(k)^{-1}||_{\infty }\le \max _{1\le i\le k}\Pi _{1\le p\le k,p\ne i}\frac{1+|d_p|}{|d_i-d_p|}, \end{aligned}$$

where \(V_{k-1}(k)\) is the Vandermonde matrix defined in (6.5).

Proof

See Theorem 1 in [26].\(\square \)

As a consequence, we directly have the following corollary.

Corollary 6.2

Let \(d_{\min }=\min _{i\ne j}|d_i-d_j|\) and assume that \(\max _{i=1,\ldots ,k}|d_i|\le d\). Then

$$\begin{aligned} ||V_{k-1}(k)^{-1}||_{\infty }\le \frac{(1+d)^{k-1}}{(d_{\min })^{k-1}}. \end{aligned}$$
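This bound is easy to check numerically. The following sketch is an illustration we add here, with an arbitrary example choice of nodes; it compares \(||V_{k-1}(k)^{-1}||_{\infty }\) with the bound of Corollary 6.2.

```python
import numpy as np

# arbitrary distinct complex nodes with |d_j| <= d (an example choice)
nodes = np.array([0.9, -0.5 + 0.6j, 0.1 - 0.8j, -0.9j])
k = len(nodes)
d = np.abs(nodes).max()
d_min = min(abs(p - q) for i, p in enumerate(nodes) for q in nodes[i + 1:])

# V_{k-1}(k): entry (i, j) = d_j^i, i.e., columns are phi_{k-1}(d_j)
V = np.vander(nodes, k, increasing=True).T
lhs = np.linalg.norm(np.linalg.inv(V), np.inf)  # max absolute row sum
rhs = (1 + d) ** (k - 1) / d_min ** (k - 1)
print(lhs <= rhs)  # True: consistent with Corollary 6.2
```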

Lemma 6.3

For distinct \(d_1,\ldots , d_k \in {\mathbb {C}}\), define the Vandermonde matrices \(V_{k-1}(k), V_s(k)\) as in (6.5) with \(s\ge k-1\). Then the following estimate on their singular values holds:

$$\begin{aligned} \frac{1}{\sqrt{k}} \frac{1}{||V_{k-1}(k)^{-1}||_{\infty }} \le \frac{1}{||V_{k-1}(k)^{-1}||_{2}}\le \sigma _{\min }(V_{k-1}(k))\le \sigma _{\min }(V_{s}(k)). \end{aligned}$$

Proof

The result holds by using properties of matrix norms. \(\square \)

Denote by

$$\begin{aligned} S_{1k}^j:=\Big \{\{\tau _1,\ldots ,\tau _j\}: \tau _p\in \{1,\ldots ,k\},p=1,\ldots ,j \text { and } \tau _p\ne \tau _q, \text { for } p\ne q\Big \}. \end{aligned}$$

Note that the sets \(\{\tau _1,\ldots ,\tau _j\}\) are unordered, i.e., \(\{1,2\}\) and \(\{2,1\}\) are the same set. We then have the following decomposition of the Vandermonde matrix.

Proposition 6.4

The Vandermonde matrix \(V_{k}(k)\) defined as in (6.5) can be reduced to the following form by using elementary column-addition operations, i.e.,

$$\begin{aligned} V_{k}(k)G(1)\cdots G(k-1)DQ(1)\cdots Q(k-1)= \begin{pmatrix} 1&{}0&{}\cdots &{}0\\ 0&{}1&{}\cdots &{}0\\ \vdots &{}\vdots &{}\ddots &{}\vdots \\ 0&{}0&{}\cdots &{}1\\ v_{(k+1)1}&{}v_{(k+1)2}&{}\cdots &{}v_{(k+1)k} \end{pmatrix}, \end{aligned}$$
(6.6)

where \(G(1),\ldots ,G(k-1),Q(1),\ldots ,Q(k-1)\) are elementary column-addition matrices,

$$\begin{aligned} D=\text {diag}\left( 1,\frac{1}{(d_2-d_1)},\ldots ,\frac{1}{\Pi _{p=1}^{k-1}(d_k-d_p)}\right) \end{aligned}$$

and

$$\begin{aligned} v_{(k+1)j}=(-1)^{k-j}\sum _{\{\tau _1,\ldots ,\tau _{k+1-j}\}\in S_{1k}^{k+1-j}}d_{\tau _1}\cdots d_{\tau _{k+1-j}}. \end{aligned}$$
(6.7)

Proof

See “Appendix B” in [46]. \(\square \)

Lemma 6.5

For an \(s\times k\) complex matrix A of rank k with \(s>k\), let V be the space spanned by columns of A and \(V^{\perp }\) be the orthogonal complement of V. Denote by \(P_{V^{\perp }}\) the orthogonal projection to \(V^{\perp }\), and set \(D=(A,v)\). We have

$$\begin{aligned} \min _{a\in {\mathbb {C}}^{k}}||Aa-v||_2=||P_{V^{\perp }}(v)||_2= \sqrt{\frac{\det (D^*D)}{\det (A^*A)}}. \end{aligned}$$

Proof

See Lemma 1 in [45]. \(\square \)

Lemma 6.6

We have

$$\begin{aligned} \sqrt{\frac{\det (V_{k}(k)^{*}V_k(k))}{\det (V_{k-1}(k)^{*}V_{k-1}(k))}}= \sqrt{\sum _{j=0}^{k}|v_{j}|^2}, \end{aligned}$$
(6.8)

where \(V_{s}(k)\) is defined as in (6.5) and \(v_{j}=\sum _{\{\tau _1,\ldots ,\tau _j\}\in S_{1k}^j}d_{\tau _1}\cdots d_{\tau _j}\). In particular, if \(|d_j|<d,j=1,\ldots ,k\), then

$$\begin{aligned} \sqrt{\frac{\det (V_{k}(k)^{*}V_k(k))}{\det (V_{k-1}(k)^{*}V_{k-1}(k))}}\le (1+d)^k. \end{aligned}$$
(6.9)

Proof

Note that in Proposition 6.4, all the elementary column-addition matrices have unit determinant. As a result, \( \det (V_{k}(k)^{*}V_k(k)) = \frac{ \det (F^{*}F)}{\det ( D^*D)}\), where F is the matrix in the right-hand side of (6.6), and D is the diagonal matrix in Proposition 6.4. A direct calculation shows that \(\det (F^{*}F)= \sum _{j=0}^{k}|v_j|^2\), where we use (6.7). On the other hand, \(V_{k-1}(k)\) is a standard Vandermonde matrix and we have \(\det (V_{k-1}(k)^*V_{k-1}(k)) =\frac{1}{\det (D^*D)}\). Combining these results, (6.8) follows. The last statement can be derived from (6.8) and the estimate that

$$\begin{aligned} \sqrt{\sum _{j=0}^{k}|v_j|^2}\le \sum _{j=0}^{k}|v_j|\le \sum _{j=0}^{k} \begin{pmatrix} k\\ j \end{pmatrix} d^j=(1+d)^k. \end{aligned}$$

\(\square \)

For the reader's convenience, we finally present two auxiliary lemmas. For positive integers p, q and complex numbers \(z_1, \ldots , z_p, {\hat{z}}_1, \ldots , {\hat{z}}_q\), we define the vector

$$\begin{aligned} \eta _{p,q}(z_1, \ldots , z_p, {\hat{z}}_1, \ldots , {\hat{z}}_q)= \begin{pmatrix} |z_1-{\hat{z}}_1|\cdots |z_1-{\hat{z}}_q|\\ |z_2-{\hat{z}}_1|\cdots |z_2-{\hat{z}}_q|\\ \vdots \\ |z_p-{\hat{z}}_1|\cdots |z_p-{\hat{z}}_q| \end{pmatrix}. \end{aligned}$$
(6.10)

The following two properties of \(\eta _{p,q}\) hold.

Lemma 6.7

For complex numbers \(d_j, {\hat{d}}_j\)’s, we have the following estimate

$$\begin{aligned} \Big |\Big |\eta _{k+1,k}(d_1, \ldots , d_{k+1}, {\hat{d}}_1, \ldots , {\hat{d}}_k)\Big |\Big |_{\infty }\ge \left( \frac{d_{\min }}{2}\right) ^k, \end{aligned}$$

where \(d_{\min }=\min _{j\ne p}|d_j-d_p|\) and \(\eta _{k+1,k}(d_1, \ldots , d_{k+1}, {\hat{d}}_1, \ldots , {\hat{d}}_k)\) is defined as in (6.10).

Proof

Because we have \(k+1\) \(d_j\)’s and only k \({\hat{d}}_j\)’s, there must exist one \(d_{j_0}\) so that

$$\begin{aligned} |d_{j_0}-{\hat{d}}_j|\ge \frac{d_{\min }}{2}, \quad j =1, \ldots , k. \end{aligned}$$

Then the estimate in the lemma follows. \(\square \)

Lemma 6.8

Let \(d_j, {\hat{d}}_j\in {\mathbb {C}}, j=1,\ldots , k\) satisfy \(|d_j|, |{\hat{d}}_j|\le d\). Assume that

$$\begin{aligned} ||\eta _{k,k}(d_1, \ldots , d_k, {\hat{d}}_1, \ldots , {\hat{d}}_k)||_{\infty }< \epsilon , \end{aligned}$$
(6.11)

where \(\eta _{k,k}(\cdots )\) is defined as in (6.10), and that

$$\begin{aligned} d_{\min } = \min _{p\ne q}|d_p-d_q| \ge 2\epsilon ^{\frac{1}{k}}. \end{aligned}$$
(6.12)

Then after reordering \(d_j\)’s, we have

$$\begin{aligned} \left| {{\hat{d}}_j -d_j}\right| < \frac{d_{\min }}{2}, \quad j=1,\ldots ,k, \end{aligned}$$
(6.13)

and moreover

$$\begin{aligned} \left| {{\hat{d}}_j -d_j}\right| \le \Big (\frac{2}{d_{\min }}\Big )^{k-1}\epsilon , \quad j=1,\ldots , k. \end{aligned}$$
(6.14)

Proof

See “Appendix A”. \(\square \)

6.2 Lower-Bound for the Approximation Problem (6.3)

In this section, we derive a lower-bound for the nonlinear approximation problem (6.3). We first consider a special case.

Theorem 6.1

Let \(k\ge 1\) and \({\hat{d}}_1,\ldots , {\hat{d}}_{k}\) be k distinct complex numbers with \(|{\hat{d}}_j|\le {\hat{d}}, 1\le j\le k\). Define \(A:= \big (\phi _{k}({\hat{d}}_1), \ldots , \phi _k({\hat{d}}_k)\big )\), where \(\phi _k({\hat{d}}_j)\)’s are defined as in (6.2). Let V be the k-dimensional space spanned by the column vectors of A, and let \(V^{\perp }\) be the one-dimensional orthogonal complement of V in \({\mathbb {C}}^{k+1}\). Let \(P_{V^{\perp }}\) be the orthogonal projection onto \(V^{\perp }\) in \({\mathbb {C}}^{k+1}\). Then we have

$$\begin{aligned} \min _{a\in {\mathbb {C}}^k}\left| \left| {Aa- \phi _{k}(x)}\right| \right| _2 = \left| \left| {P_{V^{\perp }}(\phi _k(x))}\right| \right| _2 =\left| {v^{*}\phi _{k}(x)}\right| \ge \frac{1}{(1+{\hat{d}})^k}\left| {\Pi _{j=1}^k (x- {\hat{d}}_j)}\right| , \end{aligned}$$

where v is a unit vector in \(V^{\perp }\) and \(v^{*}\) is its conjugate transpose.

Proof

By Lemma 6.5, it follows that

$$\begin{aligned} \min _{a\in {\mathbb {C}}^k}\left| \left| {Aa- \phi _{k}(x)}\right| \right| _2 = \sqrt{\frac{\text {det}(D^{*} D)}{\text {det}(A^{*}A)}}, \end{aligned}$$

where \(D=\big (\phi _{k}({\hat{d}}_1),\ldots , \phi _{k}({\hat{d}}_k), \phi _{k}(x)\big )\). Denote \(\tilde{A} = \big (\phi _{k-1}({\hat{d}}_1),\ldots , \phi _{k-1}({\hat{d}}_k)\big )\). By (6.9), we have

$$\begin{aligned} \sqrt{\frac{\text {det}(A^{*}A)}{\text {det}(\tilde{A}^{*}\tilde{A})}} \le (1+{\hat{d}})^k. \end{aligned}$$

Therefore,

$$\begin{aligned} \min _{a\in {\mathbb {C}}^k}\left| \left| {Aa- \phi _{k}(x)}\right| \right| _2 \ge \frac{1}{(1+{\hat{d}})^{k}}\sqrt{\frac{\text {det}(D^{*}D)}{\text {det}(\tilde{A}^{*}\tilde{A})}}. \end{aligned}$$

Note that D and \(\tilde{A}\) are square Vandermonde matrices. We can use the determinant formula to derive that

$$\begin{aligned} \min _{a\in {\mathbb {C}}^k}\left| \left| {Aa- \phi _{k}(x)}\right| \right| _2 \ge \frac{1}{(1+{\hat{d}})^{k}} \frac{|\Pi _{1\le t<p\le k} ({\hat{d}}_t - {\hat{d}}_p)\Pi _{q=1}^k(x-{\hat{d}}_q)|}{|\Pi _{1\le t<p\le k} ({\hat{d}}_t - {\hat{d}}_p)|} = \frac{1}{(1+{\hat{d}})^k}|\Pi _{j=1}^k (x- {\hat{d}}_j)|. \end{aligned}$$

This completes the proof of the theorem. \(\square \)

We now consider the approximation problem (6.3).

Theorem 6.2

Let \(k\ge 1\). Assume \((k+1)\) different complex numbers \(d_j\in {\mathbb {C}}, j=1, \ldots , k+1\) with \(|d_j|\le d\) and \((k+1)\) \(a_j\in {\mathbb {C}}\) with \(|a_j|\ge m_{\min }\). Let \(d_{\min }:=\min _{j\ne p}|d_j-d_p|\). For \(q\le k\), let \({\hat{a}}(q)=({\hat{a}}_1, {\hat{a}}_2, \ldots , {\hat{a}}_{q})^{\top }, a=(a_1, a_2, \ldots , a_{k+1})^{\top }\), and

$$\begin{aligned} {\hat{A}}(q) = \big (\phi _{2k}({\hat{d}}_1),\ \ldots ,\ \phi _{2k}({\hat{d}}_q)\big ),\ A = \big (\phi _{2k}(d_1),\ \ldots ,\ \phi _{2k}(d_{k+1})\big ), \end{aligned}$$

where \(\phi _{2k}(z)\) is defined as in (6.2). Then

$$\begin{aligned} \min _{{\hat{a}}_p,{\hat{d}}_p\in {\mathbb {C}}, |{\hat{d}}_p|\le {\hat{d}}, p=1,\ldots ,q}||{\hat{A}}(q){\hat{a}}(q)-Aa||_2\ge \frac{m_{\min }(d_{\min })^{2k}}{2^k(1+d)^{k}(1+{\hat{d}})^k}. \end{aligned}$$

Proof

Step 1 Note that for \(q<k\), we have

$$\begin{aligned} \min _{{\hat{a}}_p, {\hat{d}}_p\in {\mathbb {C}}, |{\hat{d}}_p|\le {\hat{d}}, p=1,\ldots ,q}||{\hat{A}}(q){\hat{a}}(q)-Aa||_2 \ge \min _{{\hat{a}}_p,{\hat{d}}_p\in {\mathbb {C}}, |{\hat{d}}_p|\le {\hat{d}}, p=1,\ldots ,k}||{\hat{A}}(k){\hat{a}}(k)-Aa||_2. \end{aligned}$$

Hence we need only to consider the case when \(q=k\). It then suffices to show that for any given \({\hat{d}}_j\in {\mathbb {C}}, |{\hat{d}}_j|\le {\hat{d}}, j=1, \ldots , k\), the following holds

$$\begin{aligned} \min _{{\hat{a}}_p\in {\mathbb {C}}, p=1,\ldots ,k}||{\hat{A}}(k){\hat{a}}(k)-Aa||_2\ge \frac{m_{\min }(d_{\min })^{2k}}{2^k(1+d)^{k}(1+{\hat{d}})^k}. \end{aligned}$$
(6.15)

So we fix \({\hat{d}}_1,\ldots ,{\hat{d}}_k\) in our subsequent argument.

Step 2 For \(l=0, \ldots , k\), we define the following partial matrices

$$\begin{aligned} {\hat{A}}_{l}= \left( \begin{array}{ccc} {\hat{d}}_1^l&{}\cdots &{}{\hat{d}}_k^l\\ {\hat{d}}_1^{l+1}&{}\cdots &{}{\hat{d}}_k^{l+1}\\ \vdots &{}\vdots &{}\vdots \\ {\hat{d}}_1^{l+k}&{}\cdots &{}{\hat{d}}_k^{l+k} \end{array} \right) , \quad A_{l}= \left( \begin{array}{ccc} (d_1)^l&{}\cdots &{}(d_{k+1})^l\\ (d_1)^{l+1}&{}\cdots &{}(d_{k+1})^{l+1}\\ \vdots &{}\vdots &{}\vdots \\ (d_1)^{l+k}&{}\cdots &{} (d_{k+1})^{l+k} \end{array} \right) . \end{aligned}$$

It is clear that for all l,

$$\begin{aligned} \min _{{\hat{a}}(k)\in {\mathbb {C}}^{k}}||{\hat{A}}(k){\hat{a}}(k)-Aa||_2 \ge \min _{{\hat{a}}\in {\mathbb {C}}^k}||{\hat{A}}_l {\hat{a}}- A_l a||_2. \end{aligned}$$
(6.16)

Step 3 For each l, observe that \({\hat{A}}_l = {\hat{A}}_0 \text {diag}({\hat{d}}_1^l,\ \ldots ,\ {\hat{d}}_k^l), A_l=A_0\text {diag}(d_1^l, \ldots , d_{k+1}^l)\), and thus

$$\begin{aligned} \min _{{\hat{a}}\in {\mathbb {C}}^{k}}||{\hat{A}}_l{\hat{a}}-A_la||_2\ge \min _{{\hat{\alpha }}_l\in {\mathbb {C}}^{k}}||{\hat{A}}_0\hat{\alpha }_l-A_0\alpha _l||_2, \end{aligned}$$
(6.17)

where \(\alpha _l=\left( a_1(d_1)^l,\ldots ,a_{k+1}(d_{k+1})^l\right) ^{\top }\). Let V be the space spanned by the column vectors of \({\hat{A}}_0\). Then the dimension of V is k, and the dimension of \(V^\perp \), the orthogonal complement of V in \({\mathbb {C}}^{k+1}\), is one. Let \(P_{V^{\perp }}\) be the orthogonal projection onto \(V^{\perp }\). Note that \(||P_{V^{\perp }}u||_2=|v^{*}u|\) for \(u\in {\mathbb {C}}^{k+1}\), where v is a unit vector in \(V^{\perp }\) and \(v^{*}\) is its conjugate transpose. We have

$$\begin{aligned} \min _{{\hat{\alpha }}_l\in {\mathbb {C}}^k}||{\hat{A}}_0\hat{\alpha }_l-A_0\alpha _l||_2&=||P_{V^{\perp }}(A_0\alpha _l)||_2=|v^{*}A_0\alpha _l |\nonumber \\&=\Big |\sum _{j=1}^{k+1}a_j(d_j)^l v^{*} \phi _{k}(d_j)\Big | = |\beta _l|, \end{aligned}$$
(6.18)

where

$$\begin{aligned} \beta _l = \sum _{j=1}^{k+1}a_j(d_j)^l v^{*} \phi _{k}(d_j), \quad \text {for } l=0, 1, \ldots , k. \end{aligned}$$

Step 4 Denote \(\beta = (\beta _0, \ldots , \beta _k)^{\top }\). We have \(B{\hat{\eta }}=\beta \), where

$$\begin{aligned}B=\left( \begin{array}{cccc} a_1&{}a_2&{}\cdots &{}a_{k+1}\\ a_1d_1&{}a_2d_2&{}\cdots &{}a_{k+1}d_{k+1}\\ \vdots &{}\vdots &{}\vdots &{}\vdots \\ a_1(d_1)^{k}&{}a_2(d_2)^{k}&{}\cdots &{}a_{k+1}(d_{k+1})^k \end{array}\right) ,\quad {\hat{\eta }}= \left( \begin{array}{c} v^{*} \phi _{k}(d_1)\\ v^{*} \phi _{k}(d_2)\\ \vdots \\ v^{*} \phi _{k}(d_{k+1}) \end{array}\right) . \end{aligned}$$

Corollary 6.2 yields

$$\begin{aligned} ||{\hat{\eta }}||_{\infty }=||B^{-1}\beta ||_{\infty }\le ||B^{-1}||_{\infty }||\beta ||_{\infty }\le \frac{(1+d)^{k}}{m_{\min }(d_{\min })^{k}}||\beta ||_{\infty }. \end{aligned}$$

On the other hand, applying Theorem 6.1 to each term \(|v^{*} \phi _{k}(d_j)|\), \(j=1, 2, \ldots k+1\), we have

$$\begin{aligned} ||{\hat{\eta }}||_{\infty }\ge \frac{1}{(1+{\hat{d}})^k} ||\eta _{k+1, k}(d_1, \ldots , d_{k+1}, {\hat{d}}_1, \ldots , {\hat{d}}_{k} )||_{\infty }, \end{aligned}$$

where \(\eta _{k+1, k}(\cdots )\) is defined as in (6.10). Combining this inequality with Lemma 6.7, we get

$$\begin{aligned} ||{\hat{\eta }}||_{\infty }\ge \frac{(d_{\min })^{k}}{2^k(1+{\hat{d}})^k}. \end{aligned}$$

Then it follows that

$$\begin{aligned} ||\beta ||_{\infty } \ge \frac{m_{\min }(d_{\min })^{2k}}{2^k(1+d)^{k}(1+{\hat{d}})^{k}}. \end{aligned}$$

Therefore, recalling (6.16)–(6.18), we arrive at

$$\begin{aligned} \min _{{\hat{a}}(k)\in {\mathbb {C}}^{k}}||{\hat{A}}(k){\hat{a}}(k)-Aa||_2 \ge \max _{0\le l\le k} \min _{{\hat{a}}\in {\mathbb {C}}^k}||{\hat{A}}_l{\hat{a}}-A_la||_2 = \max _{0\le l\le k} |\beta _l|=||\beta ||_{\infty }\ge \frac{m_{\min }(d_{\min })^{2k}}{2^k(1+d)^{k}(1+{\hat{d}})^k}. \end{aligned}$$

This proves (6.15) and hence the theorem. \(\square \)

6.3 Stability of the Approximation Problem (6.4)

In this section, we present a stability result for the approximation problem (6.4).

Theorem 6.3

Let \(k\ge 1\). Assume that \(d_j\in {\mathbb {C}}\), \(j=1, \ldots , k\), are k distinct complex numbers with \(|d_j|\le d\), and that \(a_j\in {\mathbb {C}}\), \(j=1, \ldots , k\), satisfy \(|a_j|\ge m_{\min }\). Let \(d_{\min }:=\min _{p\ne q}|d_p-d_q|\). Assume that \({\hat{d}}_j\in {\mathbb {C}}\), \(j=1, \ldots ,k\), with \(|{\hat{d}}_j|\le d\) satisfy

$$\begin{aligned} ||{\hat{A}}{\hat{a}}-Aa||_2< \sigma , \end{aligned}$$

where \({\hat{a}}=({\hat{a}}_1,\ldots , {\hat{a}}_k)^{\top }\), \(a = (a_1,\ldots , a_k)^{\top }\), and

$$\begin{aligned} {\hat{A}} = \big (\phi _{2k-1}({\hat{d}}_1),\ \ldots ,\ \phi _{2k-1}({\hat{d}}_k)\big ),\ A = \big (\phi _{2k-1}(d_1),\ \ldots ,\ \phi _{2k-1}(d_{k})\big ). \end{aligned}$$

Then

$$\begin{aligned} \Big |\Big |\eta _{k,k}(d_1,\ldots , d_k, {\hat{d}}_1, \ldots , {\hat{d}}_k)\Big |\Big |_{\infty }<\frac{(1+d)^{2k-1}}{ d_{\min }^{k-1}}\frac{\sigma }{m_{\min }}. \end{aligned}$$

Proof

Since \(||{\hat{A}}{\hat{a}}-Aa||_2<\sigma \), we have

$$\begin{aligned} \min _{{\hat{\alpha }} \in {\mathbb {C}}^k}||{\hat{A}}{\hat{\alpha }}-Aa||_2<\sigma , \end{aligned}$$

and hence

$$\begin{aligned} \max _{0\le l\le k-1}\min _{{\hat{\alpha }} \in {\mathbb {C}}^k}||{\hat{A}}_l {\hat{\alpha }} -A_la||_2\le \min _{{\hat{\alpha }} \in {\mathbb {C}}^k}||{\hat{A}}{\hat{\alpha }}-Aa||_2<\sigma , \end{aligned}$$
(6.19)

where

$$\begin{aligned} {\hat{A}}_{l}= \left( \begin{array}{ccc} {\hat{d}}_1^l&{}\cdots &{}{\hat{d}}_k^l\\ {\hat{d}}_1^{l+1}&{}\cdots &{}{\hat{d}}_k^{l+1}\\ \vdots &{}\vdots &{}\vdots \\ {\hat{d}}_1^{l+k}&{}\cdots &{}{\hat{d}}_k^{l+k} \end{array} \right) , \quad A_{l}= \left( \begin{array}{ccc} d_1^l&{}\cdots &{}d_{k}^l\\ d_1^{l+1}&{}\cdots &{}d_{k}^{l+1}\\ \vdots &{}\vdots &{}\vdots \\ d_1^{l+k}&{}\cdots &{} d_{k}^{l+k} \end{array} \right) . \end{aligned}$$

For each l, from the decompositions \({\hat{A}}_l = {\hat{A}}_0 \text {diag}({\hat{d}}_1^l,\ \ldots ,\ {\hat{d}}_k^l)\) and \(A_l=A_0\text {diag}((d_1)^l, \ldots , (d_{k})^l)\), we get

$$\begin{aligned} \min _{{\hat{\alpha }} \in {\mathbb {C}}^{k}}||{\hat{A}}_l\hat{\alpha }-A_la||_2\ge \min _{{\hat{\alpha }}_l\in {\mathbb {C}}^{k}}||{\hat{A}}_0{\hat{\alpha }}_l-A_0\alpha _l||_2, \end{aligned}$$
(6.20)

where \(\alpha _l=(a_1(d_1)^l,\ldots ,a_{k}(d_{k})^l)^{\top }\). Let V be the space spanned by the column vectors of \({\hat{A}}_0\). Then the dimension of V is k, and \(V^\perp \), the orthogonal complement of V in \({\mathbb {C}}^{k+1}\), is of dimension one. We let v be a unit vector in \(V^{\perp }\) and let \(P_{V^{\perp }}\) be the orthogonal projection onto \(V^{\perp }\). Similarly to (6.18), we have

$$\begin{aligned} \min _{{\hat{\alpha }}_l\in {\mathbb {C}}^k}||{\hat{A}}_0\hat{\alpha }_l-A_0\alpha _l||_2=||P_{V^{\perp }}(A_0\alpha _l)||_2=|v^{*}A_0 \alpha _l|=\Big |\sum _{j=1}^{k}a_j(d_j)^lv^{*}\phi _{k}(d_j)\Big |=|\beta _l|, \end{aligned}$$
(6.21)

where \(\beta _l = \sum _{j=1}^{k}a_j(d_j)^lv^{*}\phi _{k}(d_j)\). Let \(\beta = (\beta _0,\ldots , \beta _{k-1})^{\top }\). Moreover, similarly to Step 4 in the proof of Theorem 6.2, we have

$$\begin{aligned} \Big |\Big |\eta _{k,k}(d_1,\ldots , d_k, {\hat{d}}_1, \ldots , {\hat{d}}_k)\Big |\Big |_{\infty } \le \frac{(1+d)^{2k-1}}{m_{\min }(d_{\min })^{k-1}} ||\beta ||_{\infty }. \end{aligned}$$

On the other hand, (6.19)–(6.21) indicate that \(||\beta ||_{\infty }<\sigma \). Hence, we obtain that

$$\begin{aligned} \Big |\Big |\eta _{k,k}(d_1,\ldots , d_k,{\hat{d}}_1, \ldots ,{\hat{d}}_k)\Big |\Big |_{\infty } < \frac{(1+d)^{2k-1}}{(d_{\min })^{k-1}}\frac{\sigma }{m_{\min }}. \end{aligned}$$

This completes the proof. \(\square \)
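
As a sanity check, the chain (6.19)–(6.21) involves only quantities defined above and can be verified numerically. The following sketch is a minimal illustration (the perturbation level and the random instances are our own choices); it confirms that \(\max _{0\le l\le k-1}|\beta _l|\) never exceeds \(||{\hat{A}}{\hat{a}}-Aa||_2\).

```python
import numpy as np

rng = np.random.default_rng(3)
k = 3
phi = lambda x, n: x ** np.arange(n + 1)[:, None]  # columns phi_n(x_j)

d = rng.standard_normal(k) + 1j * rng.standard_normal(k)                   # true nodes
d_hat = d + 1e-2 * (rng.standard_normal(k) + 1j * rng.standard_normal(k))  # perturbed nodes
a = rng.standard_normal(k) + 1j * rng.standard_normal(k)                   # true amplitudes
a_hat = a + 1e-2 * (rng.standard_normal(k) + 1j * rng.standard_normal(k))

A = phi(d, 2 * k - 1)          # 2k x k, columns phi_{2k-1}(d_j)
A_hat = phi(d_hat, 2 * k - 1)
sigma = np.linalg.norm(A_hat @ a_hat - A @ a)   # the residual ||A_hat a_hat - A a||_2

A0_hat = phi(d_hat, k)         # (k+1) x k
U, _, _ = np.linalg.svd(A0_hat)
v = U[:, -1]                   # unit vector spanning the orthogonal complement of ran(A0_hat)
for l in range(k):
    beta_l = v.conj() @ (phi(d, k) @ (a * d ** l))  # beta_l = v^* A_0 alpha_l as in (6.21)
    assert abs(beta_l) <= sigma + 1e-12             # consistent with (6.19)-(6.21)
print("max_l |beta_l| <= sigma verified")
```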

7 Conclusions and Future Work

In this paper, we have improved the estimates of the computational resolution limits in two-dimensional super-resolution problems. We have also theoretically demonstrated the optimal super-resolution performance of a sparsity-promoting algorithm. Leveraging the new techniques in the proofs, we have proposed a coordinate-combination-based model order detection algorithm and a coordinate-combination-based MUSIC algorithm for DOA estimation in two dimensions. The superiority of the introduced algorithms has been demonstrated both theoretically and numerically.

Our work also opens up several new directions. First, one could extend the techniques to three- and general k-dimensional spaces to improve the resolution estimates in higher-dimensional super-resolution problems. Second, the idea of coordinate combination could inspire new algorithms for two-dimensional DOA estimation in the case of multiple snapshots. These developments will be presented in future work.