Semiparametric estimation for incoherent optical imaging

The theory of semiparametric estimation offers an elegant way of computing the Cramér-Rao bound for a parameter of interest in the midst of infinitely many nuisance parameters. Here I apply the theory to the problem of moment estimation for incoherent imaging under the effects of diffraction and photon shot noise. Using a Hilbert-space formalism designed for Poisson processes, I derive exact semiparametric Cramér-Rao bounds and efficient estimators for both direct imaging and a quantum-inspired measurement method called spatial-mode demultiplexing (SPADE). The results establish the superiority of SPADE even when little prior information about the object is available.


I. INTRODUCTION
Two fundamental problems confront incoherent optical imaging: the diffraction limit [1,2] and photon shot noise [3,4]. To quantify their effects on the resolution rigorously, the Cramér-Rao bound (CRB) on the error of parameter estimation [5] has been widely used, especially in astronomy and fluorescence microscopy [6-17]. Most previous studies, however, assume that the object has a simple specific shape, such as one or two point sources, and that only one or a few parameters of the object are unknown. Such parametric models may not be justifiable when there is little prior information about the object. Without a parametric model, the CRB seems intractable: infinitely many parameters are needed to specify the object distribution, leading to a Fisher information matrix with infinitely many entries, and the infinite-dimensional matrix then has to be inverted to give the CRB. While there also exist many studies on superresolution that can deal with more general objects [17-20], they either ignore noise or use noise models that are too simplistic to capture the signal-dependent nature of photon shot noise.
To compute the CRB and to evaluate the efficiency of estimators for general objects, here I propose a theory of semiparametric estimation for incoherent optical imaging. Semiparametric estimation refers to the estimation of a parameter of interest in the presence of infinitely many other unknown "nuisance" parameters [21,22]. The method has found many applications in econometrics, biostatistics, and astrostatistics [21]. A typical example is the estimation of the mean of a random variable when its probability density is assumed to have finite variance but otherwise arbitrary. Thanks to a beautiful Hilbert-space formalism [21,22], the semiparametric theory is able to compute the CRB for such problems despite the infinite dimensionality and also evaluate the existence and efficiency of semiparametric estimators. Such problems are exactly the type that bedevil the study of imaging thus far, and here I show how the semiparametric theory can be used to yield similarly elegant results for optical imaging.
The optics problem of interest here is the far-field imaging of an object emitting spatially incoherent light [2,4], with the most important applications being optical astronomy [6-9] and fluorescence microscopy [10-14]. With a finite numerical aperture, the imaging system introduces a spatial bandwidth limit to the waves, otherwise known as the diffraction limit [1,2]. The standard measurement, called direct imaging, records the intensity of the light on the image plane. Recently, quantum information theory inspired the invention of an alternative measurement called spatial-mode demultiplexing (SPADE) [23], which has been shown theoretically and experimentally [52-58] to be superior to direct imaging in resolving two sub-Rayleigh sources and estimating the size and moments of a subdiffraction object. Most of the aforementioned studies, however, assume parametric models for the object. Exceptions include Refs. [24-27, 50, 55], which consider the estimation of object moments, but the results there are not conclusive: only the CRB for direct imaging was computed exactly [25], while the CRB for SPADE was evaluated only approximately [24,25,27]. Another open problem is the existence and efficiency of unbiased moment estimators; again, only approximate results have been obtained so far [24,25]. Building on the established semiparametric theory [21,22], here I compute the exact semiparametric CRBs and also propose unbiased and efficient moment estimators for both direct imaging and SPADE. These results enable a fair and rigorous comparison of the two measurement methods, which proves the fundamental superiority of SPADE for moment estimation.

* mankei@nus.edu.sg; https://www.ece.nus.edu.sg/stfpage/tmk/

This paper is organized as follows. Section II introduces the Fisher information and the CRB for Poisson processes. Section III presents the semiparametric CRB in terms of a Hilbert-space formalism designed for such processes.
Section IV introduces the models of direct imaging and SPADE. Section V computes the CRB for moment estimation with direct imaging and proposes an efficient estimator. Section VI shows how the CRB should be modified for a normalized object distribution. Section VII computes the CRB for SPADE and also proposes an efficient estimator. Section VIII uses the CRBs to compare the performances of direct imaging and SPADE, demonstrating the superiority of SPADE for subdiffraction objects. Section IX concludes the paper and points out open issues, while the Appendices detail the technical issues that arise in the main text.

II. CRAMÉR-RAO BOUND FOR POISSON PROCESSES
For optical astronomy [4,6,8,9], fluorescence microscopy [10-14], and even electron microscopy [15,16], Poisson noise can be safely assumed. Suppose that each detector in a photodetector array is labeled by x ∈ X, where X denotes the detector space. Assume that the observed process, such as the image recorded by a camera, is a Poisson random measure n on X and its σ-algebra Σ, with a mean given by the intensity measure n̄ on the same (X, Σ) [59]. n(A) for any A ∈ Σ is then a Poisson variable with mean n̄(A). For example, if X ⊆ R² is a two-dimensional surface, then Σ is the set of all subareas that can be defined on the surface, and n(A) = ∫_{x∈A} dn(x) is the detected photon number over the area A. For any vectoral function h : X → R^q on the detector space,

ȟ ≡ ∫ h(x) dn(x),  (2.1)

a linear functional of n, is a random variable with statistics

E(ȟ) = ν(h) ≡ ∫ h(x) dn̄(x),  (2.2)

V(ȟ) ≡ E{[ȟ − E(ȟ)][ȟ − E(ȟ)]^⊤} = ν(hh^⊤),  (2.3)

where E denotes the statistical expectation, V denotes the covariance, ν denotes the average with respect to the intensity measure n̄, ⊤ denotes the matrix transpose, and all vectors in this paper are column vectors. Suppose that n̄ depends on an unknown vectoral parameter θ ∈ Θ ⊂ R^p with p entries and has a density f(x|θ) with respect to a dominating measure µ such that f(x|θ) = dn̄(x|θ)/dµ(x). The log-likelihood derivatives are given by [60]

Š_j ≡ ∂ ln L(n|θ)/∂θ_j = ∫ S_j(x) dn(x) − ν(S_j).  (2.4)

As Š is a linear functional of n, its covariance, called the Fisher information matrix, can be simplified via Eq. (2.3) and is given by [60]

J ≡ V(Š) = ν(SS^⊤),  (2.5)

where S is a vector of detector-space functions given by

S_j(x) ≡ ∂ ln f(x|θ)/∂θ_j.  (2.6)

Here V, Š, n̄, S, and ν are all evaluated at the same θ, and I assume hereafter that all functions of θ are evaluated implicitly at the same θ. Each S_j is hereafter called a score function, borrowing the terminology for Š in statistics [21,22]. An important distinction is that, whereas E(Š_j) = 0, ν(S_j) does not have to be zero, since n̄ does not have to be normalized. Let β(θ) be a scalar parameter of interest. If β(θ) = θ_k for example, then all the other parameters in θ are called nuisance parameters.
For any unbiased estimator β̌(n), the CRB on its variance is [5]

V(β̌) ≥ CRB ≡ u^⊤J⁻¹u,  u_j ≡ ∂β(θ)/∂θ_j.  (2.7)

Computing J⁻¹ seems intractable if θ is infinite-dimensional. The next section introduces a cleverer method.
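As a numerical sanity check on the Fisher information and CRB of this section, the detector space can be discretized and the integrals replaced by sums. The following sketch uses a hypothetical model that is not from this paper: a Gaussian spot of unit width with unknown total flux N₀ and unknown centroid µ; for this toy model the two scores are orthogonal and the centroid CRB is 1/N₀.

```python
import numpy as np

# Toy Poisson-process model (hypothetical): intensity f(x) = N0 * Gaussian(x; mu, 1),
# with theta = (N0, mu) both unknown.
x = np.linspace(-20.0, 20.0, 200001)         # discretized detector space X
dx = x[1] - x[0]
N0, mu = 1000.0, 0.3                          # assumed true parameter values
f = N0 * np.exp(-(x - mu) ** 2 / 2) / np.sqrt(2 * np.pi)

# Score functions S_j = d ln f / d theta_j.
S = np.stack([np.ones_like(x) / N0,           # score for the flux N0
              x - mu])                        # score for the centroid mu (unit width)

# Fisher information J_jk = nu(S_j S_k) = \int S_j(x) S_k(x) f(x) dx.
J = (S * f) @ S.T * dx

# CRB for beta = mu, i.e., u = (0, 1)^T.
u = np.array([0.0, 1.0])
crb = u @ np.linalg.solve(J, u)
```

Because the flux and centroid scores are ν-orthogonal here, the nuisance parameter N₀ does not inflate the centroid CRB, which converges to 1/N₀ as the grid is refined.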

III. SEMIPARAMETRIC CRAMÉR-RAO BOUND
The key to the semiparametric theory is to treat random variables as elements in a Hilbert space [21,22]. Here I introduce another Hilbert space for detector-space functions, on top of the statistical one, for the purpose of computing the CRB for Poisson processes. Define an inner product between two scalar functions h₁, h₂ : X → R as

⟨h₁, h₂⟩ ≡ ν(h₁h₂)  (3.1)

and the norm as

‖h‖ ≡ √⟨h, h⟩.  (3.2)

With the inner product, a Hilbert space H can be defined as the set of all square-summable functions, viz.,

H ≡ {h : ‖h‖ < ∞}.  (3.3)

Denote the set of score functions {S_j} as S in a slight abuse of notation. If the Fisher information J_jj = ν(S_j²) < ∞ for all j, S ⊂ H. Define the tangent space T ⊆ H of a parametric model as the linear span of S, or

T ≡ {w^⊤S : w ∈ R^p} = span(S).  (3.4)

Define also an "influence" function as any β̃ ∈ H that satisfies

ν(β̃S) = u,  (3.5)

borrowing the name of a similar concept in statistics [21,22]. The Cauchy-Schwarz inequality gives

ν(β̃²) ≥ u^⊤J⁻¹u,  (3.6)

the right-hand side of which coincides with the CRB given by Eq. (2.7). Define the efficient influence β̃_eff as the influence function that saturates Eq. (3.6), viz.,

ν(β̃_eff²) = u^⊤J⁻¹u,  (3.7)

such that

CRB = ν(β̃_eff²).  (3.8)

The efficient influence can be computed as the projection of any influence function into the tangent space,

β̃_eff = Π(β̃|T),  (3.9)

where Π(·|T) denotes the projection into T. Consider now the semiparametric scenario. For the purpose of this paper, it suffices to assume that the dimension of θ is infinite but countable (p = ∞). The score functions are still defined in the same way, but now there are infinitely many of them. The tangent space should be modified to be the closed linear span

T = span(S),  (3.10)

so that projection into it is well defined [61], and the semiparametric CRB is still given by Eqs. (3.5), (3.8), and (3.9); see Appendix A for a proof. This Hilbert-space approach is tractable when finding a candidate influence function is straightforward and the tangent space is so large that the candidate is already in it or at least very close to it. If the dimension of θ is uncountable, the tangent space and the CRB can still be defined via the concept of parametric submodels [21,22], although that generality is not needed here.
If β can be expressed as a functional β(f), a useful way of finding an influence function is to consider a functional derivative of β(f) with respect to a perturbation h(x), defined as

lim_{ε→0} [β(f + εh) − β(f)]/ε ≡ ∫ β̇(x) h(x) dµ(x),  (3.11)

which leads to

u_j = ∂β/∂θ_j = ∫ β̇(x) [∂f(x|θ)/∂θ_j] dµ(x) = ν(β̇S_j),  (3.12)

and the β̃(x) = β̇(x) function obtained from the functional derivative is an influence function that satisfies Eq. (3.5). The simplest example is the linear functional

β(f) = ∫ β̃(x) f(x) dµ(x) = ν(β̃),  (3.15)

whose functional derivative is β̃(x) itself, and β̃(x) is an influence function.
If the tangent space is so large that T = H, then a square-summable influence function is already in H = T and therefore efficient. There are often some constraints that make T smaller, however, and the CRB is reduced as a result. For example, if a constraint can be expressed as

γ(f) = 0,  (3.16)

and its functional derivative is

γ̇(x) ≡ δγ(f)/δf(x),  (3.17)

then

0 = ∂γ/∂θ_j = ν(γ̇S_j),  (3.18)

and it follows that γ̇ should be placed in the set R that spans T^⊥, the orthocomplement of T in H. In terms of T^⊥, the efficient influence can be evaluated as

β̃_eff = β̃ − Π(β̃|T^⊥),  (3.19)

where the projection can be computed explicitly from the spanning set; for instance, if T^⊥ is spanned by a single γ̇,

Π(β̃|T^⊥) = γ̇ ν(β̃γ̇)/ν(γ̇²),  (3.20)

which is still tractable if R has a low dimension.
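The projection in Eqs. (3.19) and (3.20) can be sketched numerically with a discretized detector space and the ν-weighted inner product. The example below is hypothetical: a Gaussian mean intensity, a second-moment-like candidate influence β̃(x) = x², and a constant γ̇ arising from a normalization constraint.

```python
import numpy as np

# Discretized detector space and a toy mean intensity f (hypothetical example).
x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]
f = 500.0 * np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

def nu(h):
    # nu(h) = \int h(x) f(x) dx; <h1, h2> = nu(h1 * h2)
    return np.sum(h * f) * dx

beta_t = x ** 2                 # candidate influence for a second-moment-like beta
gamma_d = np.ones_like(x)       # gamma-dot of a normalization constraint (constant)

# Eqs. (3.19)-(3.20): subtract the projection onto T-perp = span(gamma_d).
proj = gamma_d * nu(beta_t * gamma_d) / nu(gamma_d * gamma_d)
beta_eff = beta_t - proj
```

The corrected influence β̃_eff is ν-orthogonal to γ̇ and has a strictly smaller squared norm, illustrating how a constraint lowers the CRB.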

IV. INCOHERENT OPTICAL IMAGING
Consider a distribution of spatially incoherent sources described by the measure F on the object plane with coordinate y, a far-field paraxial imaging system with point-spread function ψ(z − y) for the field [2], further processing of the field on the image plane with coordinate z via passive linear optics with Green's function κ(x, z), and Poisson noise at the output detectors labeled by x ∈ X, as depicted by Fig. 2. For simplicity, assume one-dimensional imaging such that y, z ∈ R; generalization to two-dimensional imaging is possible [24,25] but not very interesting. The intensity can be described by the mixture model [4,23-25,30]

f(x) = ∫ dF(y) |∫ dz κ(x, z) ψ(z − y)|²,  (4.1)

where the image-plane coordinate z is normalized with respect to the magnification factor, both y and z are normalized with respect to the width of the point-spread function such that they are dimensionless, and ψ is normalized as ∫ |ψ(z)|² dz = 1. This semiclassical Poisson model can be derived from standard quantum optics [23,51,62].
For direct imaging with infinitesimal pixels, κ(x, z) = √τ δ(x − z), where τ is a positive conversion factor, x ∈ X = R denotes the position of each pixel, dµ(x) = dx, and the image intensity obeys the convolution model

f(x) = ∫ H(x − y) dF(y),  H(x) ≡ τ|ψ(x)|²,  (4.2)

which will be studied in Sec. V. The most remarkable physics of the problem lies in the possibility of improving the measurement via optics with a different Green's function κ. Quantum information theory has shown that substantial improvement is possible for subdiffraction objects, and SPADE has been found to be quantum-optimal in many special cases. In one version of SPADE, κ*(q, z) is the qth mode function in the point-spread-function-adapted (PAD) basis [25,36], such that the output intensity is given by

f(q) = ∫ H(q|y) dF(y),  (4.3)

H(q|y) ≡ |∫ dz κ(q, z) ψ(z − y)|²,  (4.4)

where µ is simply the counting measure and κ and H obey special properties, as further discussed in Sec. VII. For a fair comparison, the quantum efficiencies of direct imaging and SPADE are assumed to be the same, meaning that [25]

Σ_{q=0}^∞ H(q|y) = τ,  (4.5)

where τ is the same factor as that for direct imaging. Then the expected photon number received in total,

N ≡ n̄(X) = τ ∫ dF(y),  (4.6)

is also the same.

V. MOMENT ESTIMATION WITH DIRECT IMAGING
Consider the direct-imaging model given by Eq. (4.2). Assume that H can be expanded in a Taylor series as

H(x − y) = Σ_{j=0}^∞ [(−y)^j/j!] H^(j)(x),  H^(j)(x) ≡ d^j H(x)/dx^j,  (5.1)

which leads to

f(x) = Σ_{j=0}^∞ θ_j (−1)^j H^(j)(x)/j!,  (5.2)

where the unknown parameters are the object moments defined by

θ_j ≡ ∫ y^j dF(y),  j ∈ N₀.  (5.3)

For the CRB to hold, the parameter space should correspond to the condition that F contains an infinite number of point sources with different positions, as discussed in Appendix B. Appendix C shows a way of reconstructing F from θ via an orthogonal-series expansion, following Ref. [51]. The score function for each θ_j is

S_j(x) = ∂ ln f(x)/∂θ_j = (−1)^j H^(j)(x)/[j! f(x)].  (5.4)

It turns out that the tangent space T for this problem is equal to the whole Hilbert space H under certain technical conditions, as shown in Appendix D.
Let the parameter of interest be

β = u^⊤θ,  (5.5)

where u is independent of θ. To find a candidate influence function, a trick [63] is to consider the image moments φ given by

φ_j ≡ ν(φ̃_j) = ∫ φ̃_j(x) f(x) dx,  (5.6)

where

φ̃_j(x) ≡ x^j  (5.7)

are the monomials. Assuming that all the moments of F and H are finite such that all the moments of f are also finite, φ can be related to the object moments via

φ = Cθ,

where

C_jk = (j choose k) ∫ x^{j−k} H(x) dx for k ≤ j,  C_jk = 0 for k > j,  (5.12)

and C is a lower-triangular matrix; with C_jj = ∫ H(x) dx = τ > 0, C⁻¹ is well defined and also lower-triangular even if the dimension of C is infinite, as shown in Appendix E. The object moments can then be related to the image moments by

θ = C⁻¹φ,  θ̃(x) ≡ C⁻¹φ̃(x),  (5.17)

and β can be expressed as

β = u^⊤C⁻¹φ = ν(β̃),  β̃(x) ≡ u^⊤θ̃(x).

According to Eq. (3.15), this β̃(x) is an influence function, and with T = H as shown in Appendix D, it is efficient if square-summable, giving

CRB = ν(β̃²) = u^⊤C⁻¹ν(φ̃φ̃^⊤)C⁻⊤u.  (5.18)

This result coincides with that derived in Ref. [25] via a more direct but less rigorous method, which is repeated in Appendix F for completeness. An unbiased and efficient estimator β̌(n) in terms of the observed process n can be constructed from the efficient influence as

β̌(n) ≡ ∫ β̃(x) dn(x) = u^⊤θ̌(n),  (5.19)

where the object moment estimator is

θ̌(n) ≡ C⁻¹φ̌(n),  φ̌_j(n) ≡ ∫ x^j dn(x).

β̌ is a linear filter of n, so its variance is V(β̌) = ν(β̃²), which coincides with the CRB. It is important to note that this estimator does not require any knowledge of the unknown parameters, as φ̌(n) is simply the empirical moments of the observed image, and C⁻¹ is a lower-triangular matrix that depends only on the moments of the point-spread function H. The estimator still works even if the object happens to consist of a finite number of point sources and θ is on the boundary of the parameter space, although its efficiency in that case is a more difficult question, as explained in Appendix B. Unlike some of the previous studies on superresolution [17-20], the results here place no restriction on the separations of the point sources and also account for Poisson noise explicitly.
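The relation between image moments and object moments can be sketched numerically. The example below assumes a Gaussian point-spread function of unit width with τ = 1 and a hypothetical two-point object; the binomial form C_jk = (j choose k)∫x^{j−k}H(x)dx, which follows from expanding (t + y)^j under the convolution model, is an assumption spelled out in the code. Applying C⁻¹ to the expected image moments returns the object moments exactly.

```python
import numpy as np
from math import comb

tau, p = 1.0, 6                 # conversion factor and moment cutoff (assumed)

# PSF moments m_i = tau * \int x^i |psi(x)|^2 dx for a unit-width Gaussian:
# m_0 = tau, odd moments vanish, and m_i = (i - 1) m_{i-2} for even i.
m = np.zeros(p)
m[0] = tau
for i in range(2, p, 2):
    m[i] = (i - 1) * m[i - 2]

# Lower-triangular C relating image moments to object moments, phi = C theta.
C = np.array([[comb(j, k) * m[j - k] if k <= j else 0.0
               for k in range(p)] for j in range(p)])

# Hypothetical object: two point sources with positions ys and weights Fs.
ys, Fs = np.array([-0.2, 0.3]), np.array([0.6, 0.4])
theta = np.array([np.sum(Fs * ys ** j) for j in range(p)])

phi = C @ theta                           # expected image moments
theta_hat = np.linalg.solve(C, phi)       # moment estimator, theta = C^{-1} phi
```

With noisy data, φ would be replaced by the empirical moments φ̌_j(n) = ∫x^j dn(x); the estimate stays unbiased because the estimator is linear in n.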

VI. CONSTRAINED CRAMÉR-RAO BOUND
In imaging, the parameters of interest are often the moments with respect to a normalized object distribution with ∫ dF(y) = 1. A simple way of modeling this is to assume that θ₀ = 1 is known. This constraint also makes the model comparable to those in Refs. [24,26,51]. Then

γ(f) = (1/τ) ∫ f(x) dµ(x) − 1 = θ₀ − 1 = 0,  γ̇(x) = 1/τ.  (6.1)

The new efficient influence, according to Eqs. (3.19) and (3.20), should therefore be

β̃'(x) = β̃(x) − γ̇(x) ν(β̃γ̇)/ν(γ̇²) = β̃(x) − β/N.  (6.2)

The constrained CRB is now

CRB' = ν(β̃'²) = ν(β̃²) − β²/N.  (6.3)

The CRB is necessarily lowered by the constraint. Other approaches to the constrained CRB yield the same result, as discussed in Appendix G.
To construct a near-efficient estimator, suppose that n(X) = ∫ dn(x) = L > 0 photons have been detected. Then dn(x) = Σ_{l=1}^L δ(x − X_l) dµ(x), and the photon positions {X₁, X₂, ..., X_L} are independent and identically distributed according to the probability measure n̄/N. Consider the estimator

β̌'(n) ≡ (N/L) ∫ β̃(x) dn(x) = (N/L) Σ_{l=1}^L β̃(X_l).  (6.4)

It is straightforward to show that

E(β̌'|L) = β,  V(β̌'|L) = (N/L)[ν(β̃²) − β²/N],  (6.5)

which is close to the CRB given by Eq. (6.3) if L is close to its expected value N. The results are then consistent with standard results in semiparametric estimation concerning the moments of a normalized probability measure [21].

VII. EVEN-MOMENT ESTIMATION WITH SPADE
Now consider the SPADE model given by Eqs. (4.3) and (4.4).
Suppose that Φ = {Φ_q(k)} is the PAD basis [25,36] given by

Φ_q(k) ≡ b_q(k)Ψ(k),  (7.1)

where Ψ(k) is the Fourier transform of the point-spread function ψ and b = {b_q(k) : q ∈ N₀} is the set of orthonormal polynomials defined by

∫ b_q(k) b_{q'}(k) |Ψ(k)|² dk = δ_{qq'},  deg b_q = q.  (7.2)

The polynomials exist for all q ∈ N₀ as long as the support of |Ψ(k)|² is infinite [64], and the orthonormality of Φ ensures that the measurement can be implemented by passive linear optics [23,30,36]. Equation (4.4) becomes

H(q|y) = τ |∫ dk b_q(k) |Ψ(k)|² e^{−iky}|².  (7.3)

As the b polynomials are derived by applying the Gram-Schmidt procedure to the monomials (1, k, k², ...), their basic properties include

∫ k^j b_q(k) |Ψ(k)|² dk = 0 for j < q,

and b_q(k) has the same parity as q if |Ψ(k)|² is even, as is often the case in optics. These properties lead to

∫ dk b_q(k) |Ψ(k)|² e^{−iky} = Σ_{j=q}^∞ c_qj y^j,  c_qj = 0 for j − q odd,  (7.5)

and hence

H(q|y) = Σ_{j=q}^∞ C_qj y^{2j},  (7.7)

where C is an upper-triangular matrix (C_qj = 0 if j < q) with positive diagonal entries (C_qq > 0). Equation (4.3) becomes

f(q) = Σ_{j=q}^∞ C_qj θ_2j,  (7.8)

which depends on the even moments

θ_2j ≡ ∫ y^{2j} dF(y),  j ∈ N₀.  (7.9)

The score function with respect to each θ_2j becomes

S_2j(q) = ∂ ln f(q)/∂θ_2j = C_qj/f(q).  (7.10)

Appendix H proves that T = span(S) = H.
To find a candidate influence function, suppose that Eq. (7.8) can be inverted to give

θ_2j = Σ_{q=0}^∞ θ̃_2j(q) f(q),  θ̃_2j(q) ≡ (C⁻¹)_jq.  (7.11)

An influence function for β = u^⊤θ according to Eq. (3.15) is therefore

β̃(q) = Σ_j u_{2j} θ̃_2j(q).  (7.12)

Since T = H, this β̃ belongs to T and is efficient as long as it is square-summable. The CRB is hence

CRB^(SPADE) = ν(β̃²) = u^⊤C⁻¹DC⁻⊤u,  (7.13)

D_jk ≡ f(j)δ_jk.  (7.14)

A more direct but heuristic way of deriving Eq. (7.13) is shown in Appendix I. An unbiased and efficient estimator in terms of the observed detector counts n is

β̌(n) ≡ Σ_{q=0}^∞ β̃(q) n(q).  (7.15)

This estimator has a variance V(β̌) = ν(β̃²) = CRB^(SPADE), requires no knowledge of any unknown parameter, and still works even if the object happens to consist of a finite number of point sources, with no restriction on their separations. If θ₀ = 1, the constrained CRB can be derived in ways similar to Sec. VI and Appendix G.
To estimate the odd moments of F via SPADE, variations of the PAD basis are needed [24,25]. The model is much more complicated and a derivation of the CRB and the efficient estimator is too tedious to work out here, but the upshot is the same: it can be shown that the tangent space for the problem encompasses the whole Hilbert space H, the efficient influence β̃ can be retrieved from the relation β = ν(β̃), and an unbiased and efficient estimator is β̌(n) = ∫ β̃(x) dn(x).

A. Gaussian point-spread function
More explicit results can be obtained, and the assumptions can be checked more carefully, by assuming the Gaussian point-spread function

ψ(x) = (2π)^{−1/4} exp(−x²/4).  (7.16)

The PAD basis becomes the Hermite-Gaussian basis, and it can be shown that [23,24,55]

H(q|y) = τ exp(−y²/4) (y/2)^{2q}/q!.  (7.17)

The C matrix in Eq. (7.7) can be determined by expanding exp(−y²/4) = Σ_{j=0}^∞ (−y²/4)^j/j!, giving

C_qj = τ(−1)^{j−q}/[4^j q! (j−q)!],  j ≥ q.  (7.18)

It is not difficult to check that the matrix inverse of C is

θ̃_2j(q) = (C⁻¹)_jq = (4^j/τ) q!/(q−j)! = (4^j/τ) q(q−1)···(q−j+1),  q ≥ j,  (7.19)

which is a degree-j polynomial of q.
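The Gaussian-PSF expressions can be checked numerically: since H(q|y)/τ is a Poisson distribution in q with mean y²/4 [as in Eq. (7.17)], applying the falling-factorial weights θ̃_2j(q) = (4^j/τ)q(q−1)···(q−j+1) to H(q|y) should return y^{2j} exactly, by the factorial-moment identity for Poisson distributions. A minimal sketch, assuming τ = 1 and a truncation at Q modes:

```python
import numpy as np
from math import factorial

tau, y, Q = 1.0, 0.7, 50            # conversion factor, source position, mode cutoff
lam = y ** 2 / 4
q = np.arange(Q)
fact = np.array([float(factorial(i)) for i in range(Q)])
H = tau * np.exp(-lam) * lam ** q / fact        # tau * Poisson(q; y^2/4)

def theta_tilde(j):
    """Falling-factorial weights (4^j / tau) q(q-1)...(q-j+1)."""
    ff = np.ones(Q)
    for m in range(j):              # falling factorial of q, degree j
        ff = ff * (q - m)           # vanishes automatically for q < j
    return 4.0 ** j * ff / tau

# sum_q theta_tilde_2j(q) H(q|y) should recover y^(2j) for each j.
recovered = np.array([np.sum(theta_tilde(j) * H) for j in range(4)])
```

The truncation error is negligible here because the Poisson mean y²/4 is far below the cutoff Q.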
Σ_{q=0}^∞ θ̃_2j(q) f(q) is proportional to the jth factorial moment of f and indeed equal to θ_2j, since H(q|y)/τ is a Poisson distribution with mean y²/4 and its jth factorial moment gives Σ_{q=0}^∞ θ̃_2j(q) H(q|y) = y^{2j} [65]. In general, each degree-j moment of H(q|y) is a degree-j polynomial of y², so each degree-j moment of f(q) is a linear combination of the moments of F up to degree 2j. All the moments of f are therefore finite as long as all the moments of F are finite. If u has a finite number of nonzero entries, the influence function given by Eq. (7.12) is a polynomial of q, so ν(β̃²) < ∞, and β̃ ∈ H is ensured.

B. Bandlimited point-spread function
Another important example is the bandlimited point-spread function given by

ψ(x) = sin(x)/(√π x),  Ψ(k) = (1/√2) 1_{|k|≤1}.  (7.20)

b is then the well-known set of (suitably normalized) Legendre polynomials [66]. Appendix J shows the detailed calculations; here I list the results only. Equation (7.5) then involves the double factorial !! [66], and θ̃_2j(q) turns out to be a degree-2j polynomial of q, so β̃(q) is also a polynomial of q if u contains a finite number of nonzero entries. As long as all the moments of F are finite, all the moments of f can also be shown to be finite, and ν(β̃²) < ∞ is ensured. Notice that the direct-imaging theory in Sec. V breaks down for this bandlimited point-spread function, as the second and higher even moments of H(x) = τ|ψ(x)|² = (τ/π) sinc²(x) are all infinite. The CRB in that case remains an open problem, although it is possible to apodize the point-spread function optically such that all its moments become finite and the semiparametric estimator given by Eq. (5.19) has a finite variance. For example, if Ψ(k) is chosen to be a smooth bump function that vanishes at the band edges together with all its derivatives, then Ψ(k) is infinitely differentiable despite the hard bandwidth limit [67] and all the moments of |ψ(x)|² are finite [25].

VIII. COMPARISON OF IMAGING METHODS
The advantage of SPADE over direct imaging occurs in the subdiffraction regime, where the width ∆ of the object distribution F with respect to the origin is much smaller than the width of the point-spread function ψ [24-26,51]. As the width of ψ is normalized as 1, the regime is defined as

∆ ≪ 1,  (8.1)

in which the moments scale as

θ_j = O(∆^j) θ₀.  (8.2)

For direct imaging in the subdiffraction regime, the image becomes close to the point-spread function, viz.,

f(x) = θ₀H(x) + O(∆),  (8.4)

where N, the expected photon number received in total, is given by Eq. (4.6). With C_jk = τO(1) and ν(φ̃φ̃^⊤) = NO(1), the CRB given by Eq. (5.18) for estimating θ_k becomes

CRB^(direct) = O(1)/N.  (8.5)

For SPADE on the other hand, notice that the C and C⁻¹ matrices are upper-triangular, meaning that

f(q) = Σ_{j≥q} C_qj θ_2j = NO(∆^{2q}),  (8.6)

and the CRB for estimating θ_k, where k is even, becomes

CRB^(SPADE) = O(∆^k)/N,  (8.7)

which is much lower than Eq. (8.5) when ∆ ≪ 1 and k ≥ 2. This is consistent with earlier approximate results [24,25]. An intuitive explanation of the enhancement, as well as a discussion of the limitations of SPADE, can be found in Ref. [51]. The constrained CRB with θ₀ = 1 becomes [O(∆^k) − θ_k²]/N = O(∆^k)/N, which is on the same order of magnitude as the fundamental quantum limit [26].
More exact and pleasing results can be obtained if ψ is Gaussian and given by Eq. (7.16). Consider for example the estimation of the second moment θ₂. For direct imaging, it can be shown that

CRB^(direct) = (2θ₀ + 4θ₂ + θ₄)/τ.  (8.8)

For SPADE on the other hand,

CRB^(SPADE) = (4θ₂ + θ₄)/τ,  (8.9)

which not only beats direct imaging by a significant amount in the subdiffraction regime but is in fact superior for all parameter values. To further illustrate the difference between the two methods, suppose that the object happens to be a flat top given by

dF(y) = θ₀∆⁻¹ 1_{|y|<∆/2} dy.  (8.10)

With the constraint θ₀ = 1, the CRBs become

CRB'^(direct) = (2 + ∆²/3 + ∆⁴/180)/N,  (8.11)

CRB'^(SPADE) = (∆²/3 + ∆⁴/180)/N.  (8.12)

It is noteworthy that Eq. (8.12) is exactly equal to the quantum limit given by [51, Eq. (E15)], meaning that SPADE is exactly quantum-optimal, at all parameter values, for estimating the second moment. This is consistent with previous results concerning the estimation of two-point separation [23] and object size [24,28], but note that the previous results assume that the object shape is known, whereas the CRBs and the estimators here assume the opposite.
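The gap between the two methods can be reproduced numerically without any of the closed-form bounds. The sketch below assumes τ = 1, the Gaussian PSF of Eq. (7.16), and a flat-top object of width ∆, and evaluates the variances ν(β̃²) of the two second-moment estimators directly: for direct imaging, β̃(x) = (x² − 1)/τ follows from θ₂ = (φ₂ − φ₀)/τ for a unit-width Gaussian PSF, while for SPADE, β̃(q) = 4q/τ is the j = 1 falling-factorial weight of Sec. VII A (both forms are assumptions of this sketch).

```python
import numpy as np
from math import factorial

tau, Delta = 1.0, 0.5
# Flat-top object of width Delta with theta_0 = 1 (discretized weights).
ny = 201
y = np.linspace(-Delta / 2, Delta / 2, ny)
w = np.full(ny, 1.0 / ny)

# Direct imaging: f(x) = \int tau * Gaussian(x; y, 1) dF(y); influence (x^2 - 1)/tau.
x = np.linspace(-12.0, 12.0, 4801)
dx = x[1] - x[0]
fx = tau * (w[None, :] * np.exp(-(x[:, None] - y[None, :]) ** 2 / 2)
            / np.sqrt(2 * np.pi)).sum(axis=1)
var_direct = np.sum(((x ** 2 - 1) / tau) ** 2 * fx) * dx

# SPADE: f(q) = \int H(q|y) dF(y) with H(q|y)/tau = Poisson(q; y^2/4);
# influence 4q/tau.
Q = 40
q = np.arange(Q)
fact = np.array([float(factorial(i)) for i in range(Q)])
lam = y ** 2 / 4
fq = tau * (w[None, :] * np.exp(-lam)[None, :] * lam[None, :] ** q[:, None]
            / fact[:, None]).sum(axis=1)
var_spade = np.sum((4 * q / tau) ** 2 * fq)
```

Here the gap evaluates to 2θ₀/τ independently of ∆, consistent with SPADE being superior for all parameter values.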

IX. CONCLUSION
The semiparametric theory set forth solves an important and difficult problem in incoherent optical imaging: the evaluation of the CRB and the efficient estimation of object parameters when little prior information about the object is available. The theory gives exact and achievable semiparametric CRBs for both direct imaging and SPADE, establishing the superiority and versatility of SPADE beyond the special parametric scenarios considered by previous studies.
Despite the elegant results, the theory has a few shortcomings. On the mathematical side, the conditions for the theory to hold seem difficult to check in the case of direct imaging with a non-Gaussian point-spread function, especially when the point-spread function has infinite moments. It is an open question whether this is merely a technicality or a hint at a whole new regime of statistics. On the practical side, the theory may be accused of assuming ideal conditions for both measurements, such as infinitesimal pixels for direct imaging, the availability of infinitely many modes for SPADE, perfect specification and knowledge of the optical systems, and the absence of excess noise. Reality is necessarily uglier, but the results here remain relevant by serving as fundamental limits (via the data-processing inequality [51,68]) and offering insights into the essential physics. The theoretical and experimental progress on SPADE and related methods so far [69-71] has provided encouragement that the theory has realistic potential, and the general results here should motivate further research into the wider applications of quantum-inspired imaging methods.
An interesting future direction is to generalize the semiparametric formalism for quantum estimation [72,73]. By treating the symmetric logarithmic derivatives of the quantum state ρ as the scores in the L²_h(ρ) space proposed by Holevo [73] and adopting a geometric picture [74], a quantum generalization of the semiparametric CRB can be envisioned, but whether it can solve any important problem, such as the quantum limit to incoherent imaging [26,27], remains to be seen.

Appendix A: The semiparametric Cramér-Rao bound

Define the inner product between two random variables ř₁ and ř₂ as

⟨ř₁, ř₂⟩ ≡ E(ř₁ř₂)  (A1)

and the norm as

‖ř‖ ≡ √⟨ř, ř⟩.  (A2)

Let the Hilbert space of zero-mean random variables be

Ř ≡ {ř : E(ř) = 0, ‖ř‖ < ∞},  (A3)

and define the closed linear span

Ť ≡ span(Š),  (A4)

where Š is defined by Eq. (2.4). Let δ̌ ∈ Ř be any random variable that satisfies

E(δ̌Š) = u.  (A5)

The semiparametric CRB is [21,22]

E(δ̌²) ≥ CRB = E(δ̌_eff²),  (A6)

δ̌_eff ≡ Π(δ̌|Ť).  (A7)

The proof can be done via a Pythagorean theorem [22] without recourse to the Cauchy-Schwarz inequality or the existence of J⁻¹. Š_j is called a score and δ̌ an influence in statistics [21,22]; this paper uses the same terminology for S and β̃ in light of their resemblance to the statistical quantities. The resemblance can be turned into a precise correspondence for a Poisson random measure by considering the subspace Ȟ ⊂ Ř of random variables that are linear with respect to n. Any element ȟ ∈ Ȟ can be expressed as

ȟ = Uh ≡ ∫ h(x) dn(x) − ν(h),  (A8)

where U is a surjective linear map from H to Ȟ. Since

⟨Uh₁, Uh₂⟩ = ν(h₁h₂) = ⟨h₁, h₂⟩  (A9)

by virtue of Eq. (2.3), Ȟ is isomorphic to H and U is unitary [61], and since Ť ⊆ Ȟ and Š = US, Ť is isomorphic to T. Picking a δ̌ ∈ Ȟ with

δ̌ = Uβ̃  (A10)

leads to

δ̌_eff = Π(δ̌|Ť) = Uβ̃_eff,  (A11)

where β̃_eff is given by Eq. (3.9) because of Eq. (A7) and the isomorphisms. The CRB becomes

CRB = E(δ̌_eff²) = ν(β̃_eff²),  (A12)

which is Eq. (3.8).

Appendix B: The moment parameter space
Define an s×s Hankel matrix with respect to a real-number sequence θ = (θ₀, θ₁, ...) as

M^(s)_jk(θ) ≡ θ_{j+k},  j, k ∈ {0, 1, ..., s − 1}.  (B1)

If θ is a moment sequence that arises from a nonnegative measure F, the quadratic form

w^⊤M^(s)(θ)w = ∫ [Σ_{j=0}^{s−1} w_j y^j]² dF(y)  (B2)

is nonnegative for any real vector w, and all Hankel matrices are positive-semidefinite, viz.,

M^(s)(θ) ≥ 0 for all s ∈ N.  (B3)

Conversely, any sequence with Hankel matrices that obey Eq. (B3) can be expressed in the form of Eq. (5.3) with a nonnegative F by virtue of Hamburger's theorem [75].
For the CRB to hold for a p-dimensional θ, the parameter space Θ should be an open subset of R^p [68,76]. Intuitively, the requirement makes sense because all the parameters in θ are unknown and θ should be allowed to vary in a neighborhood; otherwise the problem would be overparametrized. If Θ is not an open subset, the parameter space would be constrained and the CRB may be modified [76]. When all the moments are unknown parameters, consider the set

Θ ≡ {θ : M^(s)(θ) > 0 for all s ∈ N}.  (B4)

Each θ ∈ Θ corresponds to a measure with infinite support r = ∞ [75]. The proof can be done by observing that the polynomial in Eq. (B2) has at most s − 1 zeros and the integral is strictly positive for any w ≠ 0 if and only if r ≥ s, and therefore the constraint for Θ is satisfied if and only if r = ∞. For r < ∞, F can be expressed in terms of its support {y_l : 0 ≤ l ≤ r − 1, y_l < y_{l+1}} as

dF(y) = Σ_{l=0}^{r−1} F_l δ(y − y_l) dy,  F_l > 0.  (B5)

In the context of optics, r is the minimum number of point sources that can describe the object distribution. The assumption of Eq. (B4) as the parameter space is consistent with the infinite-support assumption for semiparametric estimation with mixture models [21, Sec. 6.5], and it also makes intuitive sense, at least as a necessary condition: with r point sources, there are only 2r unknown parameters, and the problem would be overparametrized if all the moments are assumed to be unknown. Further inequality constraints on θ may be needed to ensure the convergence of the Taylor series in Eqs. (5.2) and (7.8), although they should not affect the CRB as long as θ obeys and stays away from them [76]. The boundary of Θ corresponds to measures with finite support r < ∞. If s ≤ r, then M^(s) > 0 and M^(s) is full-rank (rank = s), but if s > r, I can write

M^(s) = V diag(F₀, ..., F_{r−1}) V^⊤,  V_jl ≡ y_l^j,  (B6)

where V is the s × r Vandermonde matrix, which has full column rank since {y_l} are assumed to be distinct [77]. With M^(s) ≥ 0 and diag(F) ≥ 0, Sylvester's law of inertia [77] implies that the rank of M^(s) is the same as the rank of diag(F), which is r.
In other words, the rank of M^(s) is min(r, s), and any finite r means that M^(s) is rank-deficient and does not satisfy the strict M^(s) > 0 for s > r. Whether the CRB still holds for θ on the boundary is a difficult question, considering that the parameter space here is infinite-dimensional and it is not obvious how existing finite-dimensional results regarding the CRB on a boundary [76] can be applied.
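The rank statement min(r, s) is easy to illustrate numerically. The sketch below uses a hypothetical three-point object, builds the Hankel matrices of Eq. (B1) from its moment sequence, and checks that they are full-rank up to s = r = 3 and rank-deficient afterwards.

```python
import numpy as np

# Hypothetical object: r = 3 point sources with distinct positions.
ys = np.array([-0.5, 0.1, 0.8])
Fs = np.array([0.2, 0.5, 0.3])
theta = np.array([np.sum(Fs * ys ** j) for j in range(12)])   # moment sequence

def hankel(s):
    """M^(s)_{jk} = theta_{j+k}, as in Eq. (B1)."""
    return np.array([[theta[j + k] for k in range(s)] for j in range(s)])

# rank M^(s) = min(r, s): full rank up to s = 3, then stuck at 3.
ranks = [np.linalg.matrix_rank(hankel(s)) for s in range(1, 7)]
```

The factorization M^(s) = V diag(F) V^⊤ makes the result transparent: the rank is limited by the number of point sources once s exceeds r.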

Appendix C: Series expansion of the object distribution
Assume that the object measure F can be expressed as the orthogonal series

dF(y) = Σ_{j=0}^∞ ξ_j g_j(y) dF^(0)(y)  (C1)

with respect to a reference measure F^(0), where {g_j(y) = Σ_k G_jk y^k : j ∈ N₀} are the orthogonal polynomials defined by

∫ g_j(y) g_k(y) dF^(0)(y) = δ_jk,  (C2)

and G is a lower-triangular matrix with nonzero diagonal entries that can be obtained by the Gram-Schmidt procedure. Thus each "Fourier" coefficient ξ_j can be expressed in terms of the moment parameters as

ξ_j = ∫ g_j(y) dF(y) = Σ_k G_jk θ_k,  (C3)

which can be written as

ξ = Gθ.  (C4)

Hence each θ corresponds to a set of coefficients ξ that can be used to represent F via Eq. (C1). It is straightforward to transform the CRBs and the efficient estimators derived in this paper for θ to those for ξ via Eq. (C4).

Appendix D: Tangent space for the direct-imaging model
Consider the tangent space T given by Eq. (3.10) and the score functions given by Eq. (5.4) for direct imaging. First note that S ⊂ H, as the Fisher information J_jj = ⟨S_j, S_j⟩ = ν(S_j²) is assumed to be finite for all j. Recent calculations in quantum estimation theory suggest that this assumption is reasonable for any measurement [26]. To prove T = span(S) = H, the standard method is to show that the only element in H orthogonal to S is 0 [61], that is,

ν(hS_j) = 0 for all j ∈ N₀  (D1)

implies h = 0 (almost everywhere with respect to n̄). Here I list a few approaches with various levels of rigor. The first approach is to consider the set of orthogonal polynomials

a_j(x) ≡ Σ_k A_jk φ̃_k(x),  (D2)

where A is a lower-triangular matrix with nonzero diagonal entries and can be determined by applying the Gram-Schmidt procedure to the monomials φ̃(x) [64]. Under rather general conditions on f, the polynomials form an orthonormal basis of H [64], viz.,

ν(a_j a_k) = δ_jk,  (D3)

and I can write Eq. (D1) as

0 = ν(hS_k) = Σ_j w_j ν(a_j S_k),  h = Σ_j w_j a_j,  (D4)

or, more compactly,

w^⊤B = 0^⊤,  B_jk ≡ ν(a_j S_k).  (D5)

If the only solution to Eq. (D5) is w = 0, then the only solution to Eq. (D4) is h = 0. This is equivalent to the condition that B is injective. Integration by parts yields

ν(φ̃_j S_k) = C_jk,  B = AC,  (D6)

where C is the same as Eq. (5.12). Since both A and C are lower-triangular with nonzero diagonal entries, B = AC is also lower-triangular with nonzero diagonal entries, and B has a well defined matrix inverse

B⁻¹B = BB⁻¹ = I,  (D7)

where I is the identity matrix; see Appendix E for details. If the matrices were finite-dimensional, the existence of a matrix inverse would imply

w^⊤ = w^⊤(BB⁻¹) = (w^⊤B)B⁻¹ = 0^⊤,  (D8)

and the only solution to Eq. (D5) would be w = 0. This proof is not entirely satisfactory, however, as Eq. (D8) assumes that the product of the infinite-dimensional matrices is associative. Associativity assumes that the order of the sums involved in the matrix product can be interchanged, but that cannot be guaranteed for infinite-dimensional matrices. In other words, the existence of a matrix inverse for B may not imply that B is injective. A more rigorous approach is to define

χ_y(x) ≡ H(x − y)/f(x)  (D9)

and notice that Eq. (D1) implies, via the Taylor series in Eq. (5.1),

⟨h, χ_y⟩ = ∫ h(x)H(x − y) dx = 0 for all y ∈ Y.  (D11)

The unique solution to Eq. (D11) is h = 0 if the family {H(x − y) : y ∈ Y} satisfies a property called completeness in statistics [5]. For example, if H is Gaussian, {H} is a full-rank exponential family for any open subset Y ⊂ R and therefore complete [5]. An even more rigorous formulation of this approach [21] is to treat ⟨h, χ_y⟩ as an operator that maps h ∈ H to a function of y in another Hilbert space, and then show that the null space of the operator consists of only h = 0. The proof again boils down to the requirement that {H} should be complete; see Ref. [21, Sec. 6.5].
Appendix E: Inverse of an infinite-dimensional triangular matrix

Let C be an infinite-dimensional matrix with entries indexed by (j, k) ∈ N_0². Define its formal matrix inverse C^{-1} as another infinite-dimensional matrix that satisfies

C^{-1} C = C C^{-1} = I. (E1)

If C is lower-triangular with nonzero diagonal entries, viz.,

C_jk = 0 for k > j, C_jj ≠ 0, (E2)

then C^{-1} can be found by a recursive relation as follows. Define C^{(s)} as the s × s upper-left submatrix of C. Write C^{(s+1)} and (C^{-1})^{(s+1)} as the partitions

C^{(s+1)} = ( C^{(s)}, 0 ; c^⊤, C_ss ), (C^{-1})^{(s+1)} = ( (C^{(s)})^{-1}, 0 ; −c^⊤ (C^{(s)})^{-1}/C_ss, 1/C_ss ), (E3)

where c^⊤ ≡ (C_s0, . . . , C_s,s−1), and the recursion starts from (C^{-1})^{(1)} = (C^{(1)})^{-1} with the single element (C^{-1})_00 = 1/C_00. The matrix inverse of an infinite-dimensional upper-triangular matrix can be defined in a similar way.
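The recursion above is straightforward to implement for a finite truncation. The following sketch (a minimal illustration assuming only NumPy; the function name is mine) builds the inverse row by row from the stated block partition and checks it against the defining relation:

```python
import numpy as np

def triangular_inverse(C):
    """Invert a lower-triangular matrix with nonzero diagonal entries
    using the recursive block partition of Appendix E."""
    s = C.shape[0]
    inv = np.zeros_like(C, dtype=float)
    inv[0, 0] = 1.0 / C[0, 0]  # recursion starts from (C^{-1})_00 = 1/C_00
    for n in range(1, s):
        c = C[n, :n]                      # bottom-left row block c^T of C^{(n+1)}
        inv[n, n] = 1.0 / C[n, n]         # new diagonal entry 1/C_nn
        inv[n, :n] = -c @ inv[:n, :n] / C[n, n]  # bottom-left row -c^T (C^{(n)})^{-1} / C_nn
    return inv

rng = np.random.default_rng(0)
C = np.tril(rng.standard_normal((6, 6))) + 2 * np.eye(6)  # nonzero diagonal
Cinv = triangular_inverse(C)
print(np.allclose(Cinv @ C, np.eye(6)))  # True
```

Because each row of the inverse depends only on the rows above it, truncating the recursion at any size gives the exact upper-left block of the infinite-dimensional inverse.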
Although the product of infinite-dimensional matrices may not be associative, it can still be proved by induction that D(Cu) = (DC)u for any vector u if D and C are lower-triangular, because

[D(Cu)]_j = Σ_{k=0}^{j} D_jk Σ_{l=0}^{k} C_kl u_l = Σ_{l=0}^{j} ( Σ_{k=l}^{j} D_jk C_kl ) u_l = [(DC)u]_j

involves finite sums only. Thus it is safe to assume that C^{-1}(Cu) = (C^{-1}C)u = u and C(C^{-1}u) = (CC^{-1})u = u if C is lower-triangular with nonzero diagonal entries.
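The finite-sum property also means that any entry of D(Cu) is computed exactly from a sufficiently large truncation when D and C are lower-triangular. A quick numerical illustration (with random finite matrices standing in for the infinite-dimensional ones):

```python
import numpy as np

rng = np.random.default_rng(1)
full = 50  # finite stand-in for the infinite-dimensional setting
D = np.tril(rng.standard_normal((full, full)))
C = np.tril(rng.standard_normal((full, full)))
u = rng.standard_normal(full)

j = 4  # entry of interest
# Entry j of D(Cu) involves only indices 0..j when D and C are
# lower-triangular, so any truncation of size s >= j+1 is exact.
vals = [(D[:s, :s] @ (C[:s, :s] @ u[:s]))[j] for s in (j + 1, 10, full)]
print(np.allclose(vals, vals[0]))  # True: the truncation size does not matter
```

For a dense (non-triangular) matrix the same entry would keep changing as the truncation grows, which is exactly why associativity can fail there.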
Appendix F: An alternative derivation of the Cramér-Rao bound for direct imaging

Consider the problem described in Sec. V. Since the polynomials a given by Eq. (D2) are an orthonormal basis, the information matrix for the moment parameters can be expressed as

J_jk = ⟨S_j, S_k⟩ = Σ_q ⟨S_j, a_q⟩ ⟨a_q, S_k⟩, (F1)

meaning that J = B^⊤ B, where B = AC is given by Eq. (D6). Ignoring the complications of multiplying infinite-dimensional matrices, the CRB becomes

J^{-1} = B^{-1} B^{-⊤} = C^{-1} A^{-1} A^{-⊤} C^{-⊤}. (F2)

To evaluate A^{-1} A^{-⊤}, notice that the orthonormality of a can be written as

⟨a_j, a_k⟩ = Σ_{l,m} A_jl A_km ν(φ_l φ_m) = δ_jk, (F3)

where φ is the set of monomials given by Eq. (5.7). In other words,

A Θ A^⊤ = I, Θ_lm ≡ ν(φ_l φ_m), (F4)

giving

A^{-1} A^{-⊤} = Θ. (F5)

This leads to Eq. (5.18) for the parameter β = u^⊤ θ.
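The relation A^{-1}A^{-⊤} = Θ can be checked numerically: Gram-Schmidt on the monomials with respect to a density f is equivalent to taking A = L^{-1}, where Θ = LL^⊤ is the Cholesky factorization of the moment matrix. A sketch using a standard-Gaussian f (an illustrative choice, not the paper's specific model):

```python
import numpy as np

def gaussian_moment(n):
    """n-th moment of a standard Gaussian: (n-1)!! for even n, 0 for odd n."""
    if n % 2:
        return 0.0
    return float(np.prod(np.arange(n - 1, 0, -2))) if n else 1.0

s = 5
# Moment (Hankel) matrix Theta_lm = nu(x^{l+m}).
Theta = np.array([[gaussian_moment(l + m) for m in range(s)] for l in range(s)])

L = np.linalg.cholesky(Theta)  # Theta = L L^T
A = np.linalg.inv(L)           # lower-triangular with positive diagonal

# Orthonormality of the polynomials a = A phi: A Theta A^T = I,
# which is equivalent to A^{-1} A^{-T} = Theta.
print(np.allclose(A @ Theta @ A.T, np.eye(s)))  # True
```

For the Gaussian weight this construction reproduces the orthonormal Hermite polynomials, but the identity A Θ A^⊤ = I holds for any density with a positive-definite moment matrix.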
Appendix G: Alternative approaches to the constrained Cramér-Rao bound

One way of deriving the constrained CRB when θ_0 is known is to consider the information matrix J̃ with respect to the parameters ϑ = (θ_1, θ_2, . . . )^⊤ without θ_0. Then θ = (θ_0, ϑ^⊤)^⊤, and J̃ can be written as a submatrix of J, viz.,

J = ( J_00, j^⊤ ; j, J̃ ), (G1)

where j is a column vector. Ignore the complications of dealing with infinite-dimensional matrices and partition J^{-1} similarly as

J^{-1} = ( V_00, v^⊤ ; v, Ṽ ). (G2)

Then it is straightforward to show that

J̃^{-1} = Ṽ − v v^⊤ / V_00. (G3)

Let θ̃ = (θ̃_1, θ̃_2, . . . )^⊤, and observe that θ̃_0 = 1/C_00 from Eqs. (5.17), (5.12), and (5.6). Then Eq. (5.18) leads to Eq. (6.3) if the parameter of interest is defined as β = u^⊤ θ with u_0 = 0. Yet another way of deriving the constrained CRB can be found in Ref. [76], which can be shown to lead to the same result here.
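The block-inverse identity behind the constrained CRB is easy to verify numerically on a finite information matrix (a random positive-definite stand-in here, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
M = rng.standard_normal((n, n))
J = M @ M.T + n * np.eye(n)  # random symmetric positive-definite "information matrix"

Jinv = np.linalg.inv(J)
V00, v, Vt = Jinv[0, 0], Jinv[1:, 0], Jinv[1:, 1:]  # partition of J^{-1}
Jt = J[1:, 1:]  # submatrix of J with the theta_0 row and column removed

# Constrained CRB: inverting the submatrix equals Vt - v v^T / V00.
print(np.allclose(np.linalg.inv(Jt), Vt - np.outer(v, v) / V00))  # True
```

The identity is a standard consequence of the Schur-complement formulas for a partitioned inverse: knowing θ_0 removes its row and column from J, and the resulting bound is never larger than the unconstrained block Ṽ.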
The proof is similar to the first approach in Appendix D. Consider H = span(a) in terms of the obvious orthonormal basis

a_j(q) = δ_jq / √f(j), j ∈ N_0. (H1)

Any h ∈ H orthogonal to the S given by Eq. (7.10) obeys

⟨h, S_j⟩ = 0 for all j, (H2)

which can be written as

B w = 0, w_j ≡ ⟨a_j, h⟩. (H3)

As C is upper-triangular with nonzero diagonal entries and f > 0 is assumed, B is lower-triangular with nonzero diagonal entries, and induction can be used to prove that the only solution to B w = 0 is w = 0, or equivalently h = 0. Hence T = span(S) = H. The proof is easier than the one in Appendix D because the system matrix here is lower-triangular rather than upper-triangular.

An alternative proof, similar to the second approach in Appendix D and Ref. [21, Sec. 6.5] but less fruitful in this case, is to define

χ_y(q) ≡ H(q|y)/f(q)

and use the completeness of {H(q|y) : y ∈ Y} to prove the unique solution h = 0. If H is Poisson, for example, then {H} is a full-rank exponential family and therefore complete [5].

The matrix D is given by Eq. (7.14), and the CRB for β = u^⊤ θ coincides with Eq. (7.13).

Consider the point-spread function given by Eq. (7.20). The standard Legendre polynomials are defined by the Rodrigues formula

P_q(k) = 1/(2^q q!) d^q/dk^q (k² − 1)^q,

such that the orthonormal version is b_q(k) = √(2q + 1) P_q(k).
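The orthonormality of b_q can be checked with Gauss-Legendre quadrature; the weight dk/2 on [−1, 1] below is an assumption corresponding to a flat momentum-space density, as appropriate for a sinc-type point-spread function:

```python
import numpy as np
from numpy.polynomial import legendre

# b_q(k) = sqrt(2q+1) P_q(k) should be orthonormal w.r.t. the measure dk/2 on [-1, 1].
nodes, weights = legendre.leggauss(20)  # exact for polynomial integrands up to degree 39

def b(q, k):
    return np.sqrt(2 * q + 1) * legendre.Legendre.basis(q)(k)

# Gram matrix of the first six basis functions under the dk/2 measure.
G = np.array([[np.sum(weights * b(q, nodes) * b(r, nodes)) / 2
               for r in range(6)] for q in range(6)])
print(np.allclose(G, np.eye(6)))  # True
```

The quadrature is exact here because every integrand is a polynomial of degree at most 10, well within the order-20 rule.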
To bound the moments of H̃ and f, consider the lower bound on the binomial coefficient (2j choose j) for j ≥ 1 given in Ref. [78, p. 1186], such that each even moment of H̃ can be bounded by splitting the sum as

Σ_q H̃(q|y) q^{2j} = Σ_{q=0}^{j−1} H̃(q|y) q^{2j} + Σ_{q=j}^{∞} H̃(q|y) q^{2j}. (J16)

This means that each even moment of f(q) is bounded accordingly. With ν(q⁰) = ν(1) = τ θ_0, ν(q^{2j}) < ∞ for all j ∈ N_0 as long as θ_0 and θ_{2j} are finite. Odd moments can be bounded via the Cauchy-Schwarz inequality [ν(q^j)]² ≤ ν(1) ν(q^{2j}). Hence all the moments of f are finite as long as all the moments of F are finite.
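The Cauchy-Schwarz bound on the odd moments can be sanity-checked numerically. The sketch below uses a normalized Poisson distribution (so that ν(1) = 1 here) with an arbitrary illustrative mean:

```python
import numpy as np
from math import lgamma

lam = 3.5              # illustrative Poisson mean
q = np.arange(0, 200)  # truncation; the tail is negligible for these moments
# Poisson pmf computed in log space for numerical stability.
logpmf = -lam + q * np.log(lam) - np.array([lgamma(n + 1) for n in q])
pmf = np.exp(logpmf)

def moment(j):
    """j-th moment nu(q^j) under the truncated Poisson distribution."""
    return float(np.sum(pmf * q.astype(float) ** j))

# Cauchy-Schwarz: [nu(q^j)]^2 <= nu(1) nu(q^{2j}) for every j.
print(all(moment(j) ** 2 <= moment(0) * moment(2 * j) for j in range(1, 6)))  # True
```

The same check applies verbatim to the unnormalized measure ν, since multiplying the pmf by τθ_0 rescales both sides of the inequality identically.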