Asymptotics of Bayesian Error Probability and Rotating-PSF-Based Source Super-Localization in Three Dimensions

We present an asymptotic analysis of the minimum probability of error (MPE) in inferring the correct hypothesis in a Bayesian multi-hypothesis testing (MHT) formalism using many pixels of data that are corrupted by signal dependent shot noise, sensor read noise, and background illumination. We perform this error analysis for a variety of combined noise and background statistics, including a pseudo-Gaussian distribution that can be employed to treat approximately the photon-counting statistics of signal and background as well as purely Gaussian sensor read-out noise and more general, exponentially peaked distributions. We subsequently apply the MPE asymptotics to characterize the minimum conditions needed to localize a point source in three dimensions by means of a rotating-PSF imager and compare its performance with that of a conventional imager in the presence of background and sensor-noise fluctuations. In a separate paper, we apply the formalism to the related but qualitatively different problem of 2D super-resolution imaging of a closely spaced pair of point sources in the plane of best focus.


Introduction
Spatial localization of single molecules well beyond the diffraction limit is an important task of many flourescence-based bioimaging techniques that are presently available [2,3]. This includes both full 3D localization as well as motion tracking. The unprocessed image data typically provide initial information about the coarser ranges to which the position coordinates of a single molecule in question are confined, but a more careful post-processing of the image data involving either a point-spread function (PSF) fitting [4] or a statistical approach [5] or a combined approach using Bayesian inference [6] and mean-squared displacement analysis [7,8] can reveal the location of and track [9] one or more fluorescing molecules with much finer precision. It is possible by these methods to achieve a precision 10-20 times smaller -occasionally even 100 times smaller [10] -than the standard Abbe diffraction limit, δ x ≥ 0.61λ /NA, of a microscope operating at an observing wavelength λ and with numerical aperture NA.
The typical error analysis for biomolecular localization has considered the mean-squared error (MSE) in estimating the location, either via a centroid-based PSF model fitting [11] or a Cramer-Rao-bound (CRB) based minimum-estimator-variance analysis [12,13,14]. These analyses are local and incomplete, however, since they fail to account for any prior knowledge one may have about the location based on low-resolution image data, which may be critical to the localization accuracy at low signal strengths. A local analysis also excludes, by its very definition, any higher-order sensitivity criteria that would permit a more accurate treatment of certain problems for which a first-order sensitivity metric like the Fisher information (FI), on which the CRB is based [15], vanishes identically. An important problem of this kind is the quantification of the error in the axial localization of a molecule at zero defocus using any defocus-symmetric PSF, such as the conventional Airy-disk PSF [14]. It seems to us more sensible, although theoretically less tractable, to pose the problem of super-localizing single molecules to sub-diffractive precision in a Bayesian framework where any prior knowledge may be incorporated in a statistical formalism in a non-local manner.
In the analysis presented here, the acquired image data automatically satisfy a Bayesian protocol in which the coarse spatial ranges bounding the source coordinates serve to provide an initial, or prior, statistical distribution of these coordinates, one that without additional spatial information can be regarded as being uniformly random over these ranges. Under a sufficiently high signal-to-noise ratio (SNR) of fluorescence detection, the data X contain, via their specific statistical noise model, detailed information about the source location. Their conditional statistics, specified by a probability density function (PDF), P(x | m), conditioned on the knowledge of a specific source location m, can be exploited to improve upon the location uncertainty to a degree that depends on directly on the source flux and background levels.
The problem of localization with sub-diffractive errors amounts, in our multi-hypothesistesting (MHT) based Bayesian view [16], to determining the source location by means of the posterior distribution, P(m|X), with a greater precision than that contained in the uniform spatial distribution of the prior p m , m = 1, . . . , M. Indeed, the mean and mode of the posterior provide two different excellent estimates of source location. The mean-squared error (MSE) of the first of these estimators, the so-called minimum MSE (MMSE), upper-bounds the best achievable precision theoretically possible with any image processing protocol. The maximum a posteriori (MAP) estimator provides a second fundamental metric of minimum error, namely the minimum probability of error (MPE) in estimating the correct source position from among a set of M a priori random possible positions, While the two metrics are in general not related to each other, the MMSE is lower-bounded by the so-called Ziv-Zakai bound [17] which is expressed in terms of the MPE of a related binary-hypothesis testing problem. The MPE metric also provides a useful relationship between Bayesian inference and statistical information via the Fano bound and its generalizations [18]. The present work extends our earlier study [19] of localization accuracy based on both the MPE and MMSE metrics, which were shown to be related closely for a highly sensitive Bayesian detector, from a few sensor pixels to the asymptotic domain of many sensor pixels. It differs, however, from the more standard asymptotic analyses of MHT [20] in which one assumes that many statistically identical data frames are present, by addressing the experimentally more realistic context of many image pixels that sample a spatially varying PSF, typically with a single peak so the pixels progressively farther away from the peak contain progressively less signal. The work reported here is a comprehensive analysis of the MPE within the MHT protocol, as applicable to problems involving imaging in the presence of photon and sensor noise. We have also developed approximate treatments of the MMSE metric, but the use of that metric to characterize the error of point-source super-localization will be presented elsewhere. We focus here exclusively on an MPE-based Bayesian error analysis of this problem.
In Sec. 2, we present a brief review of the general MHT error analysis. In Sec. 3, we discuss the problem of localization of a single point source from the MHT perspective, focusing on the calculation of the MPE in a MAP-based detection protocol for localization under purely Gaussian additive sensor noise statistics. For such noise statistics, it is possible to perform a simple asymptotic analysis of the MPE of localization and show that the MPE may be expressed as a sum of complementary error functions, in which at high SNR only one term tends to dominate all others that can thus be dropped. Section 4 generalizes the calculations of the previous section to more general noise statistics that include the fluctuations of a spatially uniform mean background and photon noise from the signal itself, both described by Poisson statistics. Under a simplifying approximation that all three noise statistics can be combined at each image pixel into a single pseudo-Gaussian statistics with the mean being equal to the sum of the (spatially varying) signal and (spatially uniform) background mean values, and variance being equal to the sum of the sensor noise variance and the signal and background mean values. Under Poisson statistics, since the two mean values also furnish the corresponding variances, this simplifying approach may be justified under conditions of either a large signal mean value or a large background mean value per pixel. A continuity correction [21], not implemented here, can improve the pseudo-Gaussian approximation still further for the combined noise statistics. In Sec. 5, these considerations are generalized further to include any exponentially peaked statistical distributions, for which asymptotically valid approximations of the kind covered in Sec. 4 are once again possible. The next section presents in graphical form our numerically obtained exact and asymptotic results for the MPE for the problem of super-localization of single point sources by two different classes of imaging microscopes, one based on conventional clear-aperture imaging and the other on a recently proposed [22] pupil-phase-engineered PSF that encodes defocus via its rotation in the image plane and is thus naturally suited to perform axial localization. Detailed comparisons of the performances of the two imagers with respect to the problem of full 3D localization of point sources are presented and discussed in this section. Some concluding remarks appear in Sec. 7.

Minimum Probability of Error for M-ary Hypothesis Testing
For Bayesian discrimination among M different hypotheses, labeled by the index m, which takes values m = 1, . . . , M, expression (1) for the MPE in inferring the correct hypothesis from data X, drawn from the set S X , may be reformulated by means of the Bayes rule as where R m is the decision region in the data set S X for the mth hypothesis. The MAP criterion, namely defines the decision regions for which the error probability (3) takes its minimum value. The M different hypotheses exhaust all possible decisions from any data outcome, i.e., S X = ∪ M m=1 R m . Since S X dx P(x | m) = 1 for any m, we may express Eq. (3) more conveniently as We shall assume that the data space is real and multi-dimensional, as the image data are typically collected at a number of pixels, say N, in the sensor plane. Thus, S X ⊂ R N . For N >> 1, as is typically the case, P (min) e may be evaluated approximately via an asymptotic analysis. In fact, one may show quite easily that for a fixed value of m, the sum over m ′ in Eq. (5) may be replaced in this case by a single termm, which labels the decision region that is the "closest" to R m in the following sense: The MPE is thus accurately approximated by the asymptotic expression

Gaussian Conditional Data Statistics
For Gaussian conditional data statistics, namely where x m denotes the mean value of the data vector under hypothesis m and σ 2 the variance of data at each pixel, the definition ofm may equivalently be stated as where up to an additive constant E m (x) is simply proportional to − ln[P(x|m) p m ], For a given m, we determinem by first mapping out the boundary between the decision region R m and other decision regions and then finding that decision region for which E m (x) takes the smallest possible value at the boundary. This is an optimization problem that is easily solved by requiring that at the boundary between R m and R where δ x mm ′ simply denotes the separation vector between the mean data values for the mth and m ′ th hypotheses, Expanding the squared norm on the right hand side (RHS) of Eq. (11) using the identity where a and b are two column vectors, we see that the boundary between R m and R m ′ is described by the equation where γ is defined by the relation Equation (13) defines a hyperplane that passes through the point x m + γδ x mm ′ /2 on the meanto-mean separation vector δ x m ′ m and is orthogonal to that vector. Clearly, over this hyperplane the common value of E m and E m ′ has its minimum at this point, namely at The index of the "closest" decision region to R m , as defined by Eq. (9), is then the argument of the minimum value of min x∈R m ′ (E m ) over all m ′ = m, i.e., m = arg min In view of the PDF (11), the asymptotically correct expression (7) may be evaluated approximately by transforming the integral on its RHS, for each m value, to a coordinate system in the N-dimensional data space for which one of the coordinate unit vectors, sayt, is chosen to be along the separation vector xm − x m and the remaining (N − 1) coordinate axes are chosen to span the hyperplane, orthogonal tot, that separates the decision region R m from its closest neighbor Rm. The deviation of a data vector from its mean value, (x − x m ), may then be expressed in the new coordinate basis as x − x m = x tt + x ⊥ , where x ⊥ is the projection of x in the (N − 1) dimensional hyperplane orthogonal tot, i.e., x T ⊥t = 0. This transformation allows us to express the squared norm (10) as and the PDF (8) as On substitution of this form for P(x | m) inside the integral in the asymptotic expression (7) and integrating the variable x t from F mm to ∞ and the orthogonal projection x ⊥ over the full hyperplane containing it, we have the simple result in which we have used the simple integral identity, (N − 1) times to integrate over the (N − 1) nonzero, mutually orthogonal components of the vector x ⊥ . The remaining integral in expression (19) may be evaluated in terms of the complementary error function as with erfc defined as erfc(u) For sensitive detectors, the quantity, F mm / √ 2σ 2 , is likely to be large, as it is proportional to the SNR for any m. In that case, expression (20) simplifies still further since the asymptotically valid approximation for erfc may then be used,

Pseudo-Gaussian Conditional Data Statistics
A similar but considerably more involved treatment of the MPE may be given for a pseudo-Gaussian conditional data PDF that accurately describes the statistics of image data acquired under combined photon-number fluctuations and sensor read-out noise, at least at large photon numbers. Let the data x, given hypothesis m, be distributed according to the PDF where, under the condition of statistically independent data pixels, the data covariance matrix is a diagonal matrix of the form 1 where σ 2 and x m denote, as before, the variance of sensor read-out noise and the mean data vector, respectively, given the hypothesis m. The variance of the pseudo-Gaussian PDF (24) is the sum of the Gaussian read-out noise variance and the variance of the shot noise corresponding to Poisson photon-number fluctuations, the latter being equal to the mean photon number at any pixel. Under asymptotic conditions, as for Gaussian data statistics, the most significant contributions to the MPE from the mth term in the sum (7) come from the vicinity of the point, x * , on the boundary between R m and Rm where P(x | m) has its largest value. This point does not, however, lie on the line joining the centers of the two decision regions nor is the boundary between two decision regions a hyperplane in general. Rather one must perform a constrained maximization of P(x | m), or equivalently a minimization of − ln P(x | m), subject to the constraint that x be on the boundary, according to the MAP decision rule underlying the MPE expression. In view of the form (25) for the negative log-likelihood function (LLF), this amounts, via the use of a Lagrange multiplier λ , to the minimization, from which, for brevity, we have dropped certain logarithmic terms that do not depend on x.
The minimum of the quadratic form (26) is easily determined by taking its gradient wrt x and setting it to zero at x = x * , This equation is readily solved as a simple matrix equation (12) and then combining the two terms containing it. The solution may be simplified by employing the definition (24) for the diagonal covariance matrices Σ m and Σm and then performing simple algebra, A similar expression for (x * − xm) also follows from (28), These expressions, when plugged into the constraint, − ln[P(x | m) p m ] = − ln[P(x |m) pm], with the negative LLF given by expression (25), yield an equation for λ , which may be simplified to the form Equation (30) is sufficiently complicated that its solution, λ , cannot be evaluated in closed form. However, under asymptotic conditions of many pixels, N >> 1, and large source flux, x m 2 >> Nσ 2 >> 1, for which we expect the inequalities, 1 << δ xm m 2 << σ 2 + x m 2 , to hold, we may write (σ 2 + xm) = (σ 2 + x m ) + δ xm m and expand the element-wise quotient, r m , of the two vectors inside the curly braces in Eq. (30) in powers of δ xm m /(σ 2 + x m ) to obtain where δ is a shorthand notation for the order of the asymptotically small elements of the element-wise quotient vector δ xm m /(σ 2 + x m ). The RHS of Eq. (30) may be similarly expanded by writing the logarithm of the ratio of the determinants of the diagonal covariance matrices, Σm and Σ m , as a sum of the logarithms of the ratios of their diagonal elements, ratios that, in the asymptotic limit, differ from 1 by the small ratios of the corresponding elements of δ xm m and (σ 2 + x m ). On performing this expansion to the lowest order in these ratios and substituting expression (31) in Eq. (30), we obtain a simple equation for λ in this order, from which λ may be determined as The contribution of the mth term in the sum (7) for the MPE may now be calculated in the asymptotic regime by recognizing that the integral of P(x | m) over Rm is dominated by its maximum value, P(x * | m), on the boundary between R m and Rm. The calculation can be made more precise by first determining the unit normal n * to the boundary at the point x * , and then resolving the vector, v whose negative squared norm occurs in the exponent of the Gaussian form of P(x | m), along n * and orthogonal to it. The integral of P(x | m) over Rm may then be performed by fixing the origin of an orthonormal coordinate system at x * , with coordinate axes that are along n * and orthogonal to it. The N-dimensional integral in expression (7) then reduces, approximately, to the product of a Gaussian integral along n * from 0 to ∞ and the remaining (N − 1) Gaussian integrals along mutually orthogonal directions in the orthogonal subspace of n * , each from −∞ to ∞. We do this next.
The unit normal n * on the boundary between regions R m and Rm is along the gradient of the where Eq. (25) was used to reach the second inequality. From Eq. (27) and the fact that λ ≈ 1/2 from expression (33) under conditions used to derive that expression, we see that expression (35) for n * simplifies approximately to the form where we used the definition of the norm, v 2 2 = v T v, to arrive at the second equality. We now are the projections of v along n * and in its orthogonal complement, respectively, with the former subdivided further into two parts, u m and U m , in which U m defines the shift vector between the mean data point x m within R m to the origin, at x * , of the new coordinate system whose axes are individually scaled by the diagonal elements of the diagonal matrix Σ In view of the definition (34) and the decomposition v = v + v ⊥ , as given in Eq. (37), expression (25) may now be exponentiated to arrive at a simplified form for P(x | m), The integral of P(x | m) over x in the decision region Rm, as we indicated earlier, can now be performed approximately as the product of the integral over the variable v 2 from U m 2 to ∞ and (N − 1) integrals over the remaining (N − 1) orthogonal components of v, each of the latter integrals having its limits extended to ±∞. The scaling of the coordinate axes in going from the x space to the v space, according to definition (34), exactly cancels out the determinantal factor in the denominator of expression (39), while the (N − 1) infinite integrals over the orthogonal complement of n * produce merely an overall factor (2π) (N−1)/2 , leaving just a single Gaussian integral over v to be done. In other words, in the asymptotic limit the following approximate value may be obtained for the overall multiple integral: This result is readily expressed in terms of the complementary error function, as done in the previous section for the case of purely additive Gaussian data, which leads to the following asymptotically valid result for the MPE (7): Numerically a somewhat more accurate form of the asymptotic expression is provided by including two terms, rather than one, for each value of m in the sum (41), the second term corresponding to the next nearest decision region at whose boundary P(x | m) takes its next highest boundary value. We employ such an improved approximation for all our numerical results presented in Sec. 6.
Detailed Expression for U m 2 According to Eq. (38), the quantity U m 2 is simply the inner product n T * Σ Using expression (36) for n * , we may write this inner product as In view of the relation (28) between (x * − x m ) and δ xm m and since λ ≈ 1/2 according to result (33), we may express U m 2 in Eq. (42) as is the arithmetic mean of the mean data vectors corresponding to the two decision regions R m and Rm. Since all the matrices that are sandwiched between δ x T mm and δ xm m in this expression are diagonal, they commute with one another and multiply together to yield other diagonal matrices. With the help of definition (24) for the covariance matrix Σ m , we may thus express the above result as a ratio of two single sums over the data pixels, In the asymptotic limit, the arithmetic mean,x m , of the mean data vectors in the two decision regions that occurs in this expression may be replaced by either mean data vector, say x m , without incurring significant error. This would simplify expression (44) somewhat.

Extreme Asymptotic Limit
As the number of data pixels and the sensitivity of the detector grow large, the MPE sum (41) will be dominated by a single term, namely that value of m for which the argument U m 2 / √ 2 of the complementary error function is the smallest. In this case the behavior of MPE is asymptotically exponential with a characteristic exponent, ν ∞ , that takes the value in which all additive terms in ln P (min) e that scale sub-linearly with N tend to 0 in the limit. In the pure Gaussian limit of additive noise alone, our results, including those derived earlier in the section, agree with the corresponding results derived in the previous section, as seen by setting terms like σ 2 + x m to just σ 2 in all our expressions of the present section. In particular, the exponent ν ∞ reduces to the value in this purely Gaussian noise limit.

Exponentially Peaked Conditional Data Statistics
The preceding asymptotic analysis of the MPE may be easily extended to any data statistics that may be expressed naturally in the exponential form, where L m (x; N) is the log-likelihood function (LLF) for the mth outcome. The exponential family of conditional data PDF [21] is an example of such distributions. We shall assume the property of exponential peaking for the PDF in the sense that the LLF is approximately an extensive variable in the asymptotic limit, N → ∞, i.e. lim N→∞ L m (x; N)/N is finite. This assumption must break down in a practical setting since as more and more pixels of data are included around and outward from the maximum-intensity pixel in the localized image PSF, the less signal per pixel is expected to be present on average. Depending directly on the extent of the spatial footprint of the PSF, this must imply, in general, an optimum number of data pixels beyond which the MPE is expected to show rapidly diminishing improvement and thus essentially to saturate with N. This saturation of the MPE with growing N is evidently present in expressions like (22), (45), and (46) that tend to saturate with growing N. The boundary of the decision region R m with a neighboring decision region R m ′ under the MAP hypothesis is simply the hypersurface on which P(x | m) p m = P(x | m ′ ) p m ′ , which in view of the form (47) of the PDF is equivalent to the equation We now determine that point x * on the boundary (48) at which P(x * | m) p m , or equivalently L m (x; N) + ln p m , is maximized. This is given by a constrained maximization of this quantity, subject to x * being on the boundary (48). This amounts, via a Lagrange multiplier λ , to maximizing the auxiliary function, which yields the following vanishing-gradient condition: This is a vector relation from which one can, in principle, evaluate the N components of the maximizer x * = x * (λ ; m, m ′ ). The multiplier λ must be evaluated self-consistently, however, by substituting the so obtained "solution" for x = x * back into the boundary equation (48). This procedure can be implemented via an iterative algorithm in which one first starts with an initial guess for λ , say λ (0) , then uses a numerical solver to solve Eq. (49) for x * , then substitutes the result in place of x back into Eq. (48), solving it for an improved value of λ , say λ (1) . This process is repeated as the alternate evaluations of x * and λ are refined from one iteration to the next. If x (n) * and λ (n) are their values at the nth iteration, one expects them to converge to the correct solution, i.e., λ (n) → λ , x (n) * → x * , as n → ∞. Depending on the form of the LLFs, more than one solution x * is possible. If we label the multiple solutions by the subscript s, running from 1 to S for S distinct solutions, then asymptotically the integral of P(x | m) over the neighboring decision region R m ′ will, in general, be dominated exponentially by Under very general conditions and in the extreme asymptotic limit, then, the MPE, as given by the sum (9), is dominated exponentially by a single term in that sum, that for which the expression (50) is maximized for all possible pairs (m, m ′ ), namely by This expression is expected to be equal logarithmically to the MPE in the asymptotic limit of infinitely many pixels, which via the exponential form of the PDF (47) is equivalent to the asymptotic limit − lim a limit that by our very assumption of extensivity of the LLF is well defined. The quantitiesx andm refer to the values of the boundary point x * and the hypothesis label for which the double maximization in Eq. (51) is attained. We shall now apply our MPE analysis to treat the important task of 3D super-localization in single-molecule imaging. Specifically, we shall characterize the MPE incurred in performing this task using a rotating-PSF imager [22] and compare its performance with the conventional clear-aperture imager for which the PSF in the plane of best focus is of the Airy disk variety. Since our preceding analysis is completely general, and makes no reference to a specific application or algorithm, even whether imaging based or not, we are in a position to compare the best achievable performance, from the Bayesian MPE perspective, of different protocols to achieve such localization.

3D Point-Source Localization Using Conventional and Rotating-PSF Imagers
The image of a point source produced by an imager in the absence of any sensor or photon noise is simply its point-spread function (PSF), which we denote as H( s), s being the position vector in the 2D sensor plane. The PSF must be discretized to reflect image-plane pixelation, requiring integration of image intensity over the area of each pixel during the recording time to generate what is the mean count of that pixel. In the presence of photon-number fluctuations, typically described as a Poisson random process, the count at a pixel fluctuates correspondingly. When detected by a sensor array like a CCD sensor, however, the read-out process adds further noise, known as additive read-out noise, that may be well approximated, after calibration, as zero-mean Gaussian noise with a fixed noise variance σ 2 d . The recorded count at a pixel then exhibits combined Poisson-Gaussian fluctuations from the two sources of noise, which, as we noted in Sec. 3, may be described accurately by a pseudo-Gaussian PDF (PGP) with a mean equal to the mean count at that pixel and a variance equal to the sum of the read-out noise variance and the mean pixel count. The presence of any background fluctuations, e.g., those arising from uncontrolled, largely randomly excited fluorophores in the imaging volume that add to the fluorescence of the specific molecule of interest in a bioimaging problem, is yet another source of noise that may be easily accounted for by adding a mean background count, b, to the mean signal count of the PGP at a pixel and including a corresponding Poisson-noise variance,b, in the overall variance. Assuming that the sum total of fluctuations at the different sensor pixels are statistically uncorrelated, given a specific hypothesis m, the PGP that describes this overall random process is then given by expression (23) with the following values for the mean and variance: x m = s m +b, Σ m = diag(σ 2 d + s m +b), where s m is the vector 2 of conditional mean values of the data pixels, given the hypothesis m. For simplicity, we assume the mean background count to be spatially uniform, sob is simply proportional to a vector of all 1's. A rotating-PSF imager is a specific imaging protocol in which a superposition of pure angular-momentum states of light can be effected through a Fresnel-zone partitioning of the imaging pupil [22]. This yields a PSF that rotates uniformly, in a nearly shape-and sizeinvariant manner, with defocus away from its Gaussian image plane. The amount of rotation of the PSF then encodes the depth (z) information of a point source and its transverse position the (xy) coordinates of the point source. The rotating PSF, while somewhat more extended spatially in the transverse plane than the conventional clear-aperture diffraction-limited Airy-pattern PSF, does not, unlike the latter, spread and lose sensitivity even when the defocusdependent phase at the edge of the pupil is many waves in magnitude. This represents a trade-off between transverse and axial (depth) encoding, which we can study and compare for both kinds of imagers under varying strengths of sensor and photon noise using our Bayesian MPE based metric of performance.
In the formal developments of this section, we keep the PSF as being general that we may choose, as needed, to be either the rotating PSF or the conventional PSF of a clear aperture without any engineered pupil phase. The defocus adds a quadratically varying radial phase in the pupil of form π(δ z/λ )(ρ/l) 2 , where λ is the mean illumination wavelength, δ z the defocus distance from the in-focus object plane a distance l from the imaging pupil, and ρ is the radial distance in the pupil. The phase at the edge of the pupil of radius R, namely ζ def = π(δ z/λ )(R/l) 2 , defines a defocus phase parameter that we shall henceforth use instead of the actual defocus distance δ z. Unlike the rotating PSF, the rapid spatial dispersal of the conventional PSF with increasing defocus should rapidly reduce its sensitivity to encode the depth coordinate of a point source away from the plane of Gaussian focus, although its tighter spatial footprint in that plane should endow it with a greater sensitivity and resolution to encode the transverse position of the source, at least in that plane. It is this fundamental trade-off between the decreased source-localization sensitivity and increased depth of field for the rotating PSF and their reversal for the conventional PSF that we expect to capture with the Bayesian MPE analysis. An alternative, minimum mean-squared error (MMSE) based analysis may also be given to describe this trade-off, but the two Bayesian error analyses are expected to produce similar conclusions, at least in the highly-sensitive detection limit, as we showed in Ref. [19].
Let I 0 H( s − s m ; z m ) be the mean intensity in the image of a point source of intensity I 0 located at position ( s m , z m ) in the object space. For our discrete representation, we evaluated our rotating PSF on a finer grid of subpixels than the actual sensor pixel grid, shifted it by the vector amount s m in the pixel plane, and finally summed over the subpixels constituting each sensor pixel to determine the mean count recorded by the pixel. We denote such a discrete version of the shifted PSF H( s − s m ; z m ) by h in which for a given value of m,m labels that source position m ′ for which x m ′ − x m 2 is the smallest for all m ′ = m. Note that a substitution of expression (55) in the expression (22) for the MPE immediately exhibits its dependence on the source flux-to-noise ratio (FNR), K 0 /σ . We also note that this asymptotic expression for the MPE is dominated typically by a single term in the m sum, that for which the norm x m ′ − x m 2 is the smallest among all possible distinct pairs of source positions, m = m ′ , all other terms being exponentially small in the asymptotic limit of many detected pixels, N >> 1. As we noted earlier, this result is quite analogous to the pairwise minimum Chernoff distance exponent that characterizes the MPE for M-ary hypothesis testing under asymptotically many IID measurements [20].
For the more general case of combined Poisson noise of photon fluctuations and Gaussian noise of the CCD read-out process, the asymptotic form of the MPE is given by a numerically improved modification of expression (41), as described in the text following that expression. To determine the decision region Rm "closest" to the decision region R m for a given value of m, we required that U m be the smallest of all (M − 1) quantities of the same form as the RHS of (44) in whichm is replaced by m ′ and all m ′ = m are considered. The contribution from the next nearest decision region was also included for each value of m, as needed for the numerically improved version of the asymptotic expression (41). This procedure was easily implemented in our Matlab computer code. In our calculations, we fixed the sensor read-out noise variance at σ 2 s = 1, and the mean background level, taken to be spatially uniform, was allowed to float with the mean signal strength at the brightest pixel in the conventional in-focus Airk-disk image, the ratio of the two fixed at 0.1. For such a uniform mean background level, pixels increasingly farther from the brightest pixel, even for the conventional in-focus image, see signal levels that get progressively weaker, with the background eventually dominating the signal. This situation only gets worse when the rotating PSF with its somewhat larger footprint but far better depth sensitivity is employed for localization. For this case, even with the most compact rotating PSF, the mean background level was roughly 0.55 of the signal level at the brightest image pixel. For these reasons, it is sensible to limit attention to only a small sub-image around centered at the brightest pixel, which we chose to be of area 12 × 12 square pixels.
We divide our 3D localization error analysis into two parts. In the first part, we calculate the MPE and its asymptotics for 2D transverse super-localization by factors 2x, 4x, 8x, and 16x at a range of focal depths, corresponding to the defocus phase ranging from 0 to 16 radians in steps of 2 radians, and the reference depth resolution set at a nominal 1 radian in that phase. The second part, by contrast, computes the MPE for achieving depth localization enhancements of 2x and 4x, corresponding to 1/2 and 1/4 radian of defocus phase, respectively, at two different defocus phases, 0 and 16 radians, for the same 4 transverse super-localization factors, 2x, 4x, 8x, and 16x. These localization error probabilities were computed for the pseudo-Gaussian statistics describing the combined fluctuations of photon number, CCD read-out, and background illumination. The nominal 3D localization cell requiring no detailed image data processing defines, for our purposes, the volume over which the prior may be defined to be uniformly random. Its two transverse dimensions were chosen to be 4 pixels× 4 pixels, while its depth dimension, as we have stated before, was taken to be 1 radian. These somewhat arbitrary choices can be shown to be consistent with the diffraction limited imaging criterion with respect to the parameters chosen for our PSFs. Note that much as in actual microscope observations of biological samples, all our images are thus assumed to be oversampled by the sensor array.

Transverse Localization in 2D
In our studies, we varied the FNR over eight different values, namely 100, 500, 1000, 2000, 3000, 4000, 5000, and 10000, which incidentally, since σ s is held fixed at 1, are also the values over which the source photon flux is varied. The error criterion we employed for reliable localization was that the MPE be less than 5%, corresponding to a statistical confidence limit (CL) for a match of better than 95%. Both exact and asymptotic numerical evaluations of the MPE, namely of relations (3) and (41), were performed, the former via a Monte-Carlo sampling approach in which N s data-vector samples {x (1) , . . . , x (N s ) }, with N s sufficiently large, were drawn according to the pseudo-Gaussian PDF (23) that approximates P(x | m), with the mean and variance given by relation (54) for each source location index m. The numerically exact MPE evaluation is predicated on the acceptance or rejection of each data sample x by testing whether P(x | m) is larger than all other P(x | m ′ ), m ′ = m, or not. Subtracting from 1 the acceptance rate averaged over all M values of the hypothesis label m then yields the exact MPE acoording to Eq. (3). The numerically exact results we present here were obtained for N s = 5000, but our evaluations changed by no more than a percent even when we increased N s to 20000 and still larger values in certain cases, giving us excellent confidence in the accuracy of the numerical results presented here.

Min. Prob. Error
Rotating PSF Based 2D Source Localization

Min. Prob. Error
Rotating PSF Based 2D Source Localization We present in Fig. 1 both the exact and asymptotic values of the MPE computed numerically from expressions (3) and (41) for a number of source FNRs and for varying degrees of 2D subdiffractive localization for the conventional imager. No depth localization was sought in the results displayed in Fig. 1, but transverse localization at two different depths, namely ζ = 0 and ζ = 16 radians, was considered. Since the conventional imager loses sensitivity quickly with increasing depth, the two depth values are attended by a large disparity in the corresponding values of the MPE. By contrast, the rotating-PSF imager has rather similar values of the MPE over a considerable depth range, necessitating two different figures, Figs. 2 (a) and 2 (b), to display them properly for the same two values of the depth. For the in-focus case (ζ = 0), it is clear that the conventional PSF based imager yields significantly smaller error probabilities than the rotating PSF based imager for the transverse localization task, regardless of the value of the enhancement factor that we considered here, namely 2x, 4x, 8x, and 16x. This is consistent with the fact that the conventional PSF has a more compact form than the rotating PSF when in focus. However, the behavior reverses dramatically at the larger defocus phase, ζ = 16 radians, since the rotating PSF maintains its shape and compactness over such defocus values while the conventional PSF spreads dramatically. Indeed, in accordance with the approximate depth invariance of the shape and size of the rotating PSF with increasing defocus, the MPE curves for the rotating-PSF imager are quite comparable at both defocus values.
Based on the comparisons presented in Figs. 1 and 2, it is clear that with increasing defocus the rotating PSF based imager must outstrip the 2D localization performance of the conventional imager at some fairly low value of the defocus phase, ζ , but in a manner that depends on the specific values of FNR and localization enhancement factors chosen. In Figs. 3(a)-(c), we present the comparison of the MPE based performance of the two imagers as a function of the defocus phase for a low and a high value of the FNR. The reversal of the behavior of the error probabilities for 2D super-localization with increasing defocus is easily seen in the plots of the MPE vs. the defocus phase in these figures. The cross-over of the error curves for the two kinds of imagers occurs at fairly low defocus values, providing evidence for the fact that the conventional PSF blurs rapidly with defocus and thus must fail to provide transverse superlocalization at all but the lowest values of the defocus with any reliability. For both imagers, the error probabilities rise, as expected, with the degree of 2D super-localization sought, requiring increasingly larger FNR to keep the MPE acceptably low, but the rotating-PSF based imager continues to provide excellent localization enhancement at fairly modest values of the FNR.
From our graphs of the MPE in Figs. 1 and 2 and other similar graphs, not shown here, for other values of the defocus phase ζ , we can also read off the minimum requirements on the source photon number to achieve, at the 95% statistical confidence limit, or 5% MPE, a specific 2D super-localization factor M ⊥ . We plot in Fig. 4 the minimum source photon number, K min , as a function of M 2 ⊥ for both the conventional and rotating-PSF imager for four different values of the defocus phase, namely 0, 4, 8, and 16 radians. As expected, for the rotating-PSF imager, K min , for each value of M ⊥ , increases rather modestly as it is defocused more and more over the full 16-radian range, while for the conventional imager defocused localization even for ζ = 4 requires roughly double the K min needed for the former imager operating at ζ = 16. Only at best focus, ζ = 0, does the conventional imager deliver a better localization performance at more modest photon numbers. Its PSF spreading is simply too dramatic, as it is defocused from its best focus, for it to stay competitive with the rotating-PSF imager. Although our results that we display here were obtained under the conditions of the mean background to the peak brightness of the conventional in-focus image ratio at 10% and negligible sensor noise, we have verified this comparative performance of the two imagers more generally.
The approximately linear form of the plots in Fig. 4 for both imagers confirms the approximately quadratic dependence of the minimum source strength on the degree of super- localization sought at any value of the defocus. This conclusion is quite consistent with previous mean-squared-error (MSE) based analyses [11,12] in the signal dominated regime of operation.

Full 3D Localization
We now address the problem of localizing the spatial position of a point source in all three dimensions by means of an imager that employs either a rotating PSF or the conventional PSF. As is well appreciated [14], the axial, or depth, localization provided by the conventional imager is rather poor for a source at the plane of best focus because of a lack of first-order sensitivity of the PSF relative to the defocus at this plane. The rotating PSF imager, however, has no such insensitivity, and is helped further by its ability, in sharp contrast with the conventional imager, to maintain a tight PSF profile over a large defocus range. We examine, via our present MPE analysis, these attributes of the rotating-PSF imager to achieve full 3D localization.
In Fig. 5, we display, for the rotating PSF imager, the MPE as a function of the source photon number for the same 10% background level used in the previous figures, but now for different degrees of transverse and depth localizations at two different depths, ζ = 0 [ Fig. 5(a)] and ζ = 16 [ Fig. 5(b)]. As expected, the MPE increases with increasing demand on the degree of depth localization from 2x to 4x for each of the transverse localization enhancement factors, 2x, 4x, 8x, and 16x, at each of the two reference depths. The rather modest differences between the overall behaviors presented in the two figures are a result of the approximate invariance of the rotating PSF shape and size across a large range of defocus. By contrast, the next two figures, Figs. 6(a) and 6(b), which present the corresponding results for the conventional imager, indicate a much larger error profile, even at zero defocus for the reason we have stated earlier, namely the first-order insensitivity of such imager in providing any depth discrimination at the in-focus plane. However, since the Bayesian error-probability analysis provides a global sensitivity metric, the higher order sensitivity of conventional imagers to yield depth discrimination is fully accounted for in our results.
The competition between axial and transverse localizations is also evident in these figures.  With an increasing signal strength, the behavior of the MPE is strongly influenced by any changes in the requirement of transverse localization, rising as it does with increasing values of the latter, but ultimately at sufficiently high strengths the MPE is limited by the requirement of depth localization alone, as seen in the asymptotic behavior of the various MPE curves at a fixed depth localization but for different transverse localization factors. This behavior is quite universal for both imagers at sufficiently large signal strengths for each depth localization, but it is particularly noticeable for the in-focus (ζ = 0) conventional imager for which the 2x and 4x transverse super-localization curves are essentially indistinguishable at the higher (4x) depth super-localization demanded over the full range of signal strengths considered here.

Concluding Remarks
The present paper has considered an asymptotic analysis of the Bayesian problem of multihypothesis testing, designed specifically to treat the fidelity of point-source localization with sub-diffractive error that is based on image data. We apply our exact and approximate analyses of the minimum probability of error (MPE) in discriminating among M 2 ⊥ × M possible outcomes of the source position inside an elementary base resolution volume that was subdivided uniformly into M 2 ⊥ × M subvolumes. The transverse and axial localization enhancement factors, M ⊥ and M , were chosen to have values 2,4,8,16 and 2,4, respectively, The image data were drawn from a small sub-image, here 12 × 12 square pixels, centered around the brightest pixel in the full image. Two different imaging systems, one based on conventional clear-aperture imaging and the other on a phase engineered rotating PSF imaging, were compared for their MPE-based potential to achieve 3D source super-localization The MPE was calculated for a number of different signal strengths of the source, ranging from 100 to 10 6 photons, at a mean background level that was pegged at 10% relative to the brightest pixel in the conventional in-focus image.
In the signal-dominated regime for which we have presented our detailed calculations here, we confirmed a number of conclusions drawn by previous researchers using mean-squared error (MSE) based analyses about the minimum source signal strength needed for localizing a point source spatially with precision exceeding what is nominally possible, namely of order λ 2 /NA 2 × λ /NA 2 which we regard as our base resolution volume. In particular, we showed a quadratic (M 2 ⊥ ) dependence of the minimum source strength needed to achieve a transverse localization improvement factor of M ⊥ at a statistical confidence limit of 95% or better. The agreement in the predictions of MSE and MPE based analyses in the high-sensitivity asymptotic limit is a consequence of the equivalence of the two metrics in that limit [19].
From our calculations of the MPE for combined 3D transverse-axial localization, we demonstrated an interplay between axial and transverse localization enhancements. We found that the sought axial localization enhancement typically serves to limit the 3D localization at sufficiently high signal strengths, at which all of the transverse localization improvements from 2x to 16x entail no added error cost. The reduced sensitivity of the conventional imager to perform any depth resolution at zero defocus, noted previously in a local error-bound analysis [14], is also borne out well in our global MPE based Bayesian analysis.
It is worth noting that the MPE is the best error performance one can achieve from the view point of error probability in a Bayesian detection protocol, and most actual localization algorithms are likely to be sub-optimal from this perspective. Reasons for this sub-optimality are many, not the least of which are both an incomplete identification of statistical sources of error and their imperfect statistical characterization. Effort should be devoted primarily in mitigating these systematic sources of error before employing the MAP estimate for localization. Without such mitigation, the performance bound presented by our MPE analysis may seem overly optimistic under realistic FNR conditions.