Subdiffraction incoherent optical imaging via spatial-mode demultiplexing

I propose a spatial-mode demultiplexing (SPADE) measurement scheme for the far-field imaging of spatially incoherent optical sources. For any object too small to be resolved by direct imaging under the diffraction limit, I show that SPADE can estimate its second or higher moments much more precisely than direct imaging can fundamentally do in the presence of photon shot noise. I also prove that SPADE can approach the optimal precision allowed by quantum mechanics in estimating the location and scale parameters of a subdiffraction object. Realizable with far-field linear optics and photon counting, SPADE is expected to find applications in both fluorescence microscopy and astronomy.

The use of coherent optical processing to improve the lateral resolution of incoherent imaging has thus far received only modest attention, as prior proposals [13,[20][21][22][23][24][25] either did not demonstrate any substantial improvement or neglected the important effect of noise. Using quantum optics and parameter estimation theory, here I show that, for any object too small to be resolved by diffraction-limited direct imaging, SPADE can estimate its second or higher moments much more precisely than direct imaging can fundamentally do in the presence of photon shot noise. Moreover, I prove that SPADE can approach the optimal precision allowed by quantum mechanics in estimating the location and scale parameters of a subdiffraction object. Given the usefulness of moments in identifying the size and shape of an object [26], the proposed scheme, realizable with far-field linear optics and photon counting, should provide a major boost to incoherent imaging applications that are limited by diffraction and photon shot noise, including not only fluorescence microscopy [27][28][29][30] and space-based telescopes [31] but also modern ground-based telescopes [32][33][34][35].
This paper is organized as follows. Section II introduces the background theory of quantum optics and parameter estimation for incoherent imaging. Section III describes the SPADE scheme for general imaging. Section IV presents the most important results of this paper, namely, a comparison between the statistical performances of direct imaging and SPADE in the subdiffraction regime, showing the possibility of giant precision enhancements for moment estimation, while Appendix A justifies an approximation made in Sec. IV in more detail. Section V presents a numerical example to illustrate the theory, comparing the errors in estimating the first and second moments of subdiffraction objects using direct imaging and SPADE. Section VI proves that SPADE is close to the quantum precision limits to location and scale estimation in the subdiffraction regime. Section VII discusses other practical and open issues.

II. BACKGROUND FORMALISM
A. Quantum optics I begin with the quantum formalism established in Ref. [1] to ensure correct physics. The quantum state of thermal light with M temporal modes and a bandwidth much smaller than the center frequency can be written as ρ ⊗M , where is the average photon number per mode assumed to be 1 [36,37], ρ 0 = |vac vac| is the vacuum state, ρ 1 is the one-photon state with a density matrix equal to the mutual coherence function, and O( 2 ) denotes second-order terms, which are neglected hereafter. It is standard to assume that the fields from incoherent objects, such as stellar or fluorescent emitters, are spatially uncorrelated at the source [37]. In a diffraction-limited imaging system, the fields then propagate as waves; the Van Cittert-Zernike theorem is the most venerable consequence [37]. At the image plane of a conventional lens-based two-dimensional imaging system in the paraxial regime [37,38], this implies where R = (X, Y ) is the object-plane position, the notation (u 1 , u 2 , . . . ) denotes a column vector, F (R) is the source intensity distribution with normalization d 2 RF (R) = 1, |r = a † (r) |vac is a one-photon position eigenket on the image plane at position r = (x, y) with [a(r), a † (r )] = δ 2 (r − r ) [39], and ψ(r) is the field point-spread function (PSF) of the imaging system. Without loss of generality, the image-plane position vector r has been scaled with respect to the magnification to follow the same scale as R [38]. For convenience, I also normalize the position vectors with respect to the width of the PSF to make them dimensionless.
Consider the processing and measurement of the image-plane field by linear optics and photon counting. The counting distribution for each ρ can be expressed as n 0 , n 1 , . . .| ρ |n 0 , n 1 , . . . , where |n 0 , n 1 , . . .
is the optical mode function that is projected to the jth output, and With the negligence of multiphoton coincidences, the relevant projections are {|vac , |φ j }, with |φ j ≡ |0, . . . , n j = 1, . . . , 0 = b † j |vac = d 2 rφ j (r) |r . The zero-photon probability becomes 1 − and the probability of one photon being detected in the jth mode becomes p(j), where is the one-photon distribution. A generalization of the measurement model using the concept of positive operator-valued measures is possible [1,3] but not needed here. For example, direct imaging can be idealized as a measurement of the position of each photon, leading to an expected image given by which is a basic result in statistical optics [37,38]. While Eq. (2.4) suggests that, similar to the coherentimaging formalism, the PSF acts as a low-pass filter in the spatial frequency domain [38], the effect of more general optical processing according to Eq. where B(L) is the binomial distribution for detecting L photons over M trials with single-trial success probability and M(m|L) = δ L, j m j L! j [p(j)] m j /m j ! is the multinomial distribution of m given L total photons [40]. The average photon number in all modes becomes N ≡ M . Taking the limit of → 0 while holding N constant, B(L) becomes Poisson with mean N , and P (m) → exp(−N ) j [N p(j)] m j /m j !, which is the widely used Poisson model of photon counting for incoherent sources at optical frequencies [3,18,[27][28][29][30][31]36].

B. Parameter estimation
The central goal of imaging is to infer unknown properties of the source distribution F (R) from the measurement outcome m. Here I frame it as a parameter estimation problem, defining θ = (θ 1 , θ 2 , . . . ) as a column vector of unknown parameters and assuming the source distribution F (R|θ) to be a function of θ. Denote an estimator asθ(m) and its error covariance matrix as For any unbiased estimator ( mθ (m)P (m|θ) = θ), the Cramér-Rao bound (CRB) is given by [40,41] where J(θ) is the Fisher information matrix and the matrix inequality implies that Σ − J −1 is positivesemidefinite, or equivalently u (Σ − J −1 )u ≥ 0 for any real vector u. Assuming the model given by Eq. (2.5) and a known N , it can be shown [3] that which is a well known expression [16-18, 28, 30, 36]. For example, the direct-imaging information, given Eq. (2.4) and the limit p(j|θ) → d 2 rf (r|θ), is For large N , the maximum-likelihood estimator is asymptotically normal with mean θ and covariance Σ(θ) = J −1 (θ), even though it may be biased for finite N [40,41]. Bayesian and minimax generalizations of the CRB for any biased or unbiased estimator are possible [5,41] but not considered here as they offer qualitatively similar conclusions. The Fisher information is nowadays regarded as the standard precision measure in incoherent imaging research, especially in fluorescence microscopy [18,[28][29][30], where photon shot noise is the dominant noise source and a proper statistical analysis is essential. Apart from the CRB, another useful property of the Fisher information is the data-processing inequality [42,43], which mandates that, once the measurement is made, no further processing of the data can increase the information. For example, direct imaging with large pixels can be modeled as integrations of photon counts over groups of infinitesimally small pixels, so the information can never exceed Eq. (2.8). More generally, the data-processing inequality rules out the possibility of improving the information using any processing that applies to the direct-imaging intensity, such as the proposal by Walker et al. for incoherent imaging in Ref. [20], even if the processing is done with optics. Hence, as argued by Tham et al. [14], coherent processing that is sensitive to the phase of the field is the only way to improve upon Eq. (2.8). The information for any coherent processing and measurement is in turn limited by quantum upper bounds in SPADE is a technique previously proposed for the purpose of superresolving the separation between two incoherent point sources [1,6,7,9,[13][14][15]. I now ask how SPADE can be generalized for the imaging of an arbitrary source distribution. Consider the transverse-electromagnetic (TEM) basis and He q is the Hermite polynomial [48,49]. Assuming a Gaussian PSF given by ψ(r) = φ 00 (r), which is a common assumption in fluorescence microscopy [28,30], |ψ R is a coherent state [50], and the one-photon density matrix in the TEM basis becomes To investigate the imaging capability of SPADE measurements, define with µ = (µ X , µ Y ), leading to a linear parameterization of g given by Notice that each Θ µ is a moment of the source distribution filtered by a Gaussian. In particular, if the object is much smaller than the PSF width, the Gaussian can be neglected, and Θ µ becomes a moment of the source distribution itself. This subdiffraction regime is of central interest to superresolution imaging and, as shown in Sec. IV, also a regime in which direct imaging performs relatively poorly. Since a distribution is uniquely determined by its moments [51], F (R|θ) exp[−(X 2 + Y 2 )/4] and therefore F (R|θ) can be reconstructed given the moments, at least in principle. Note also that the object-moment order µ is nontrivially related to the order of the matrix element via µ = q + q , which is a peculiar feature of incoherent imaging. A measurement in the TEM basis yields which is sensitive only to moments with even µ X and µ Y , as also recognized by Yang et al. in Ref. [13]. This measurement is realized by demultiplexing the image-plane optical field in terms of the TEM basis via linear optics before photon counting for each mode and can be implemented by many methods, most commonly found in optical communications [1,6,15,[52][53][54]. To access the other moments, consider interferometry between two TEM modes that implements the projections This two-channel interferometric TEM (iTEM) measurement leads to The dependence on Θ q+q is the main interest here, as it allows one to access any moment parameter. For multiparameter estimation and general imaging, multiple TEM and iTEM measurements are needed. To be specific, Table I lists a set of schemes that can be used together to estimate all the moment parameters, while Fig. 1 shows a graphical representation of the schemes in the (q x , q y ) space. Neighboring modes are used in the proposed iTEM schemes because they maximize the Fisher information, as shown later in Sec. IV. The bases in different schemes are incompatible with one another, so the photons have to be rationed among the 7 schemes, by applying the different schemes sequentially through reprogrammable interferometers or spatial-light modulators [15,[52][53][54] for example.

TABLE I.
A list of measurement schemes, their projections, and the orders µ = (µ X , µ Y ) of the moment parameters Θ µ to which they are sensitive.

A. Direct imaging
Although the proposed SPADE method can in principle perform general imaging, its complexity would not be justifiable if it could not offer any significant advantage over direct imaging. To compare their statistical performances, consider first direct imaging with a Gaussian PSF. Expanding |ψ(r − R)| 2 in a Taylor series, I obtain in terms of the moment parameters defined as In terms of this parameterization, the Fisher information becomes Each dot corresponds to a TEM mode in the (q x , q y ) space, and each line connecting two dots denotes an interferometer between two modes in an iTEM scheme. The bracketed numbers are the orders (µ X , µ Y ) of the moment parameters to which the projections are sensitive. The unconnected dots in some of the iTEM schemes denote the rest of the modes in a complete basis, which can be measured simultaneously to provide extra information.
Assume now that the support of the source distribution is centered at the origin and has a maximum width ∆ much smaller than the PSF width. Since the spatial dimensions have been normalized with respect to the PSF width, the PSF width is 1 in the dimensionless unit, and the assumption can be expressed as which defines the subdiffraction regime. The parameters are then bounded by and the image is so blurred that it resembles the TEM 00 mode rather than the object, viz., f (r|θ) = |φ 00 (r)| 2 [1 + O(∆)]. Writing the denominator in Eq. (4.4) as 1 + O(∆) and applying the orthogonality of Hermite polynomials [48,49], I obtain CRB This is a significant result in its own right, as it establishes a fundamental limit to superresolution algorithms for shot-noise-limited direct imaging [20,[55][56][57], generalizing the earlier results for two sources [16][17][18] and establishing that, at least for a Gaussian PSF, the moments are a natural, approximately orthogonal [58] set of parameters for subdiffraction objects.

B. SPADE
To investigate the performance of SPADE for moment estimation, note that, in the subdiffraction regime, Eq. (3.6) can be expressed as where O(∆ |µ| 1 +2 ) is a linear combination of moments that are at least two orders above µ and therefore much smaller than θ µ . Approximating Θ µ with θ µ greatly simplifies the analysis below; Appendix A contains a more detailed justification of this approximation. For the TEM scheme, taking Θ 2q = θ 2q in Eq. (3.8) makes the information matrix diagonal, with the nonzero elements given by where N (TEM) is the average photon number available to the TEM scheme. The relevant CRB components are hence , µ = 2q. is unbiased and achieves the error given by Eq. (4.11) under the assumption Θ 2q = θ 2q . A precision enhancement factor can be defined as the ratio of Eq. (4.8) to Eq. (4.11), viz., Apart from a factor N (TEM) /N determined by the different photon numbers detectable in each method, the important point is that the factor scales inversely with θ µ = O(∆ |µ| 1 ), so the enhancement is enormous in the ∆ 1 subdiffraction regime. The prefactor in Eq. (4.13) also increases with increasing µ. To investigate the errors in estimating the other moments via the iTEM schemes, assume Θ q+q = θ q+q in Eqs. (3.10). The dependence of Eqs. (3.10) on θ q+q is the main interest, while I treat β(q, q ) as an unknown nuisance parameter [59]; the TEM scheme can offer additional information about β(q, q ) via θ 2q and θ 2q but it is insignificant and neglected here to simplify the analysis. The information matrix with respect to {θ q+q , β(q, q )} is block-diagonal and consists of a series of two-by-two matrices, each of which can be determined from Eqs. (3.10) for two parameters (θ q+q , β(q, q )) and is given by 14) where N (iTEM) is the average photon number available to the iTEM scheme. The CRB component with respect to θ q+q is hence obtained by taking the inverse of Eq. (4.14) and extracting the relevant term; the result is Defining the two photon counts of the (q, q ) iTEM channels as m (q,q ) + and m (q,q ) − with expected values N (iTEM) p (q,q ) (+|θ) and N (iTEM) p (q,q ) (−|θ), respectively, it can be shown that the estimatoř is unbiased and achieves the error given by Eq. (4.15) under the assumption Θ q+q = θ q+q . The iTEM schemes can also offer information about θ 2q and θ 2q via the background parameter β(q, q ), but the additional information is inconsequential and neglected here. An enhancement factor can again be expressed as . (4.17) With the background parameter β(q, µ − q) on the order of ∆ min[|2q| 1 ,|2(µ−q)| 1 ] , both 1/β and the µ!/[q!(µ − q)!] coefficient can be maximized by choosing q to be as close to µ/2 as possible. This justifies the pairing of neighboring modes in the iTEM schemes listed in Fig. 1 and Table I. With iTEM1, iTEM2, iTEM4, and iTEM5, |µ| 1 is odd, and With iTEM3 and iTEM6, |µ| 1 is even, and The enhancements, being inversely proportional to β, can again be substantial for higher moments. The only exception is the estimation of the first moments θ 10 and θ 01 , for which the right-hand side of Eq. (4.17) becomes N (iTEM) /N and the iTEM schemes offer no advantage. These results can be compared with Refs. [1,6] for the special case of two equally bright point sources. If the origin of the image plane is aligned with their centroid and their separation along the X direction is d, θ 20 = d 2 /4, and a reparameterization leads to a transformed Fisher information J (direct) (d) ≈ N d 2 /8 and J (TEM) (d) ≈ N/4 for the estimation of d, in accordance with the results in Refs. [1,6] to the leading order of d. The experiments reported in Refs. [13][14][15] serve as demonstrations of the proposed scheme in this special case.

C. Elementary explanation
The enhancements offered by SPADE can be understood by considering the signal-to-noise ratio (SNR) of a measurement with Poisson statistics. Suppose for simplicity that the mean count of an output can be written as N p(j|θ) = Aθ + B, which consists of a signal component Aθ and a background B. The variance is Aθ + B, so the SNR can be expressed as (Aθ) 2 /(Aθ + B). To maximize it, the background B should be minimized to reduce the variance. For direct imaging, the background according to Eq. (4.1) is dominated by the TEM 00 mode, whereas each output of SPADE is able to filter out that mode as well as other irrelevant low-order modes to minimize the background without compromising the signal. To wit, Eq. (3.8) for TEM measurements has no background, while Eqs. (3.10) for iTEM also have low backgrounds in the subdiffraction regime. The Fisher information given by Eq. (2.7) is simply a more rigorous statistical tool that formalizes the SNR concept and provides error bounds; reducing the background likewise improves the information by reducing the denominator in Eq. (2.7).
In this respect, the proposed scheme seems to work in a similar way to nulling interferometry for exoplanet detection [60]. The nulling was proposed there for the special purpose of blocking the glare of starlight, however, and there had not been any prior statistical study of nulling for subdiffraction objects to my knowledge. The surprise here is that coherent processing in the far field can vastly improve general incoherent imaging even in the subdiffraction regime and in the presence of photon shot noise, without the need to manipulate the sources as in prior superresolution microscopic methods [61][62][63][64][65][66] or detect evanescent waves via lossy or unrealistic materials [67,68].

V. NUMERICAL DEMONSTRATION
Here I present a numerical study to illustrate the proposal and confirm the theory. Assume an object that consists of 5 equally bright point sources with random positions within the square −0.3 ≤ X ≤ 0.3 and −0.3 ≤ Y ≤ 0.3. The average photon number is assumed to be N = 5×10, 000 in total. Figure 2 shows an example of the generated source positions and a direct image with pixel size δxδy = 0.1 × 0.1 and Poisson noise. I focus on the estimation of the first and second moments of the source distribution given by {θ µ ; µ = (1, 0), (0, 1), (2, 0), (0, 2), (1, 1)}. For direct imaging, I use the estimatorθ µ = µ! j D µ (r j )m(r j )/N , where m(r j ) is the photon count at a pixel positioned at r j . It can be shown that, in the small-pixel limit, this estimator is unbiased and approaches the CRB given by Eq.  For SPADE, I consider only the TEM 00 , TEM 10 , and TEM 01 modes, and the photons in all the other modes are discarded. As illustrated in Fig. 3, the iTEM1, iTEM2, and iTEM3 schemes suffice to estimate the parameters of interest. Table II lists the projections, and Fig. 4 plots the spatial wave functions for the projections. The light is assumed to be split equally among the three schemes, leading to 9 outputs; Fig. 5 shows a sample of the photon counts drawn from Poisson statistics. For the estimators, I use Eq. (4.12) and (4.16). Compared with the large number of pixels in direct imaging, the compressive nature of SPADE for moment estimation is an additional advantage. FIG. 3. A graphical representation of the iTEM1, iTEM2, and iTEM3 schemes involving the three TEM modes to be measured. Each line denotes an interferometer between two modes, and each unconnected dot denotes a TEM mode to be measured. The modes are also denoted by the parameters θ µ to which they are sensitive.  Fig. 3. |0, 0 corresponds to the TEM 00 mode, |1, 0 corresponds to the TEM 10 mode, and |0, 1 corresponds to the TEM 01 mode.
FIG. 4. The spatial wave functions r|φ j for the projections listed in Table II. x and y are image-plane coordinates normalized with respect to the PSF width and the color code corresponds to amplitudes of normalized wave functions. Figure 6 plots the numerically computed mean-square errors (MSEs) for 100 randomly generated objects versus true parameters in log-log scale. Each error value for a given object is computed by averaging the squared difference between the estimator and the true parameter over 500 samples of Poissonian outputs.  Table II and Fig. 4. Note how the counts for the antisymmetric modes are much lower as a result of filtering out the lowerorder modes. As argued in Sec. IV C, such dark ports enable a higher SNR by reducing the background without compromising the signal.
For comparison, Fig. 6 also plots the CRBs given by Eqs. (4.8), (4.11), and (4.15), assuming Θ µ = θ µ and neglecting the O(∆) term in Eq. (4.8). A few observations can be made: 1. As shown by the plots in the first row of Fig. 6, SPADE is 3 times worse than direct imaging at estimating the first moments. This is because SPADE uses only 1/3 of the available photons to estimate each first moment.
2. The theory suggests that the advantage of SPADE starts with the second moments, and indeed the other plots show that SPADE is substantially more precise at estimating them, even though SPADE uses only a fraction of the available photons to estimate each moment. This enhancement is a generalization of the recent results on two sources [1-6, 8, 9, 12-15].
3. The errors are all remarkably tight to the CRBs, despite the simplicity of the estimators and the approximations in the bounds. In particular, the excellent performance of the SPADE estimator in the subdiffraction regime justifies its assumption of Θ µ = θ µ .

VI. QUANTUM LIMITS
In the diffraction-unlimited regime, it is not difficult to prove that direct imaging achieves the highest Fisher information allowed by quantum mechanics. To be precise, that regime can be defined as one in which the PSF is so sharp relative to the source distribution that {|ψ R ; R ∈ supp(F )} can be approximated as the orthogonal position basis {|r }. ρ 1 becomes diagonal in that basis, and the quantum Fisher information [1,3,[43][44][45][46] is equal to the direct-imaging information given by Eq. (2.8). The physics in the opposite subdiffraction regime is entirely different, however, as diffraction causes {|ψ R } to have significant overlaps with one another, and more judicious measurements can better deal with the resulting indistinguishability, as demonstrated in Secs. IV and V. I now prove that SPADE is in fact near-quantum-optimal in estimating location and scale parameters of a source distribution in the subdiffraction regime. Suppose that the distribution has the form such that θ parameterizes a coordinate transformation R = R(ξ|θ), and the transformation leads to a reference measure F 0 (ξ) that is independent of θ. Taking R as a column vector, I can rewrite Eq. (2.2) as where E 0 (·) ≡ dξF 0 (ξ)(·) and k is the momentum operator in a column vector. I can now use the quantum upper bound on the Fisher information [1,43,46] and the convexity of the quantum Fisher information [69,70] to prove that the Fisher information for any measurement is bounded as where K is the quantum Fisher information proposed by Helstrom [44]. For the pure state, K can be computed analytically to give for the gaussian PSF. For example, a location parameter can be expressed as R = ξ − (θ, 0). Equation (6.7) then givesK = N. (6.8) This can be attained by either direct imaging or iTEM1 in the subdiffraction regime. The advantage of SPADE starts with the second moments, which are particularly relevant to scale estimation. Let R = θξ, which results inK For the TEM measurement, on the other hand, and using the lower bound by Stein et al. [71], I obtain which approaches the quantum limit given by Eq. (6.9) for θ → 0. The argument can be made more precise if the form of F 0 (ξ) is known, as the extended convexity of the quantum Fisher information can be used to obtain a tighter upper bound [70,72], while the O(θ 4 ) term in Eq. (6.14) can be computed to obtain an explicit lower bound for any θ.

VII. DISCUSSION
Though promising, the giant precision enhancements offered by SPADE do not imply unlimited imaging resolution for finite photon numbers. The higher moments are still more difficult to estimate even with SPADE in terms of the fractional error, which is ∼ CRB µµ /θ 2 µ = 1/O(N ∆ |µ| 1 ) for even |µ| 1 and 1/O(N ∆ |µ| 1 +1 ) for odd |µ| 1 , meaning that more photons are needed to attain a desirable fractional error for higher-order moments. Intuitively, this is because of the inherent inefficiency of subdiffraction objects to couple to higher-order modes, and the need to accumulate enough photons in those modes to achieve an acceptable SNR. A related issue is the reconstruction of the full source distribution, which requires all moments in principle. A finite number of moments cannot determine the distribution uniquely by themselves [51], although a wide range of regularization methods, such as maximum entropy and basis pursuit, are available for more specific scenarios [51,55,57,73,74].
Despite these limitations, the fact remains that direct imaging is an even poorer choice of measurement for subdiffraction objects and SPADE can extract much more information, simply from the far field. For example, the size and shape of a star, a planetary system, a galaxy, or a fluorophore cluster that is poorly resolved under direct imaging can be identified much more accurately through the estimation of the second or higher moments by SPADE. Alternatively, SPADE can be used to reach a desirable precision with far fewer photons or a much smaller aperture, enhancing the speed or reducing the size of the imaging system for the more special purposes. In view of the statistical analysis in Refs. [2,4,6], the image-inversion interferometers proposed in Refs. [2,4,6,[23][24][25] are expected to be similarly useful for estimating the second moments. For larger objects, scanning in the manner of confocal microscopy [27] or adaptive alignment [1] should be helpful.
Many open problems remain; chief among them are the incorporation of prior information, generalizations for non-Gaussian PSFs, the derivation of more general quantum limits, the possibility of even better measurements, and experimental implementations. The quantum optimality of SPADE for general imaging is in particular an interesting question. These daunting problems may be attacked by more advanced methods in quantum metrology [43][44][45][46][75][76][77][78], quantum state tomography [79][80][81], compressed sensing [55][56][57]81], and photonics design [52][53][54]. information, but the assumption also means that I do not need to consider any channel with order lower than q to compute the CRB with respect to θ 2q , simplifying the analysis below.
In practice, such a careful treatment of the nuisance parameters is unlikely to be necessary in the subdiffraction regime, as the numerical analysis in Sec. V shows that excellent results can be obtained simply by taking Θ µ = θ µ without any correction.