Co-designed annular binary phase masks for depth-of-field extension in single-molecule localization microscopy

: Single-molecule localization microscopy has become a prominent approach to study structural and dynamic arrangements of nanometric objects well beyond the diﬀraction limit. To maximize localization precision, high numerical aperture objectives must be used; however, this inherently strongly limits the depth-of-ﬁeld (DoF) of the microscope images. In this work, we present a framework inspired by “optical co-design” to optimize and benchmark phase masks, which, when placed in the exit pupil of the microscope objective, can extend the DoF in the realistic context of single ﬂuorescent molecule detection. Using the Cramér-Rao bound (CRB) on localization accuracy as a criterion, we optimize annular binary phase masks for various DoF ranges, compare them to Incoherently Partitioned Pupil masks and show that they signiﬁcantly extend the DoF of single-molecule localization microscopes. In particular we propose diﬀerent designs including a simple and easy-to-realize two-ring binary mask to extend the DoF. Moreover, we demonstrate that a simple maximum likelihood-based localization algorithm can reach the localization accuracy predicted by the CRB. The framework developed in this paper is based on an explicit and general information theoretic criterion, and can thus be used as an engineering tool to optimize and compare any type of DoF-enhancing phase mask in high resolution microscopy on a quantitative basis.


Introduction
Over the last decades, a variety of super-resolution fluorescence microscopy techniques have allowed to obtain images with higher resolutions than the diffraction limit [1]. Among these techniques, single-molecule localization microscopy consists in detecting single emitters [2][3][4][5], which, when optically isolated, can easily be super-localized, i.e. with nanometric precision [6]. On the one hand, taking advantage of such localization precision, single molecule/particle tracking allows the precise dynamic behavior of molecules to be revealed in complex environments including in live biological cells [7] or in structured materials [8][9][10]. On the other hand, the ultrastructure of densely labelled entities (e.g. biological specimens [11][12][13][14] or nanomaterials [15][16][17]) can be revealed by controlling the emission properties of the emitters used.
Because molecule diffusion and molecular assemblies are generally not confined within the two dimensions of the imaging microscope plane, several approaches have been designed to extend the super-localization concept to the third dimension [18,19]. 3D single-molecule localization proved to be very efficient within the depth-of-field (DoF) of the microscope, even in thick samples [20,21]. These techniques require sophisticated devices, calibration and processing techniques, and often lead to PSF displaying broadened transverse shapes which might penalize 2D localization accuracy.
Yet, for some applications, extending the DoF without aiming at superlocalizing molecules along the axial direction might be useful, for minimal instrumental and processing complexity or in the case of low photon numbers, for example when imaging at high speed. To this aim, several approaches have been proposed, consisting in extending the DoF by making the PSF of the microscope invariant along the imaging axis to generate volumetric images formed of 2D projections of the 3D imaging volume [22,23]. This concept was also used in 2-photon excitation fluorescence microscopy for fast volumetric imaging of brain function [24].
In this work, we propose an alternative methodology to optimize phase masks designed to increase the DoF of localization microscopes. Our approach is inspired by "optical co-design" [25][26][27][28][29][30], which not only takes into account the image formation model and the properties of the optical system to design the phase mask, but also the method of localization extraction to maximize the quality of the final information delivered by the system. Building on this, we propose a rigorous framework to optimize and benchmark phase masks aimed at optimizing 2D location accuracy within a prescribed DoF range. The potential of this framework is illustrated by optimizing and comparing the performance of annular binary phase masks [31] and Incoherently Partitioned Pupil masks [22].
More precisely, this optimization framework is based on the Cramér-Rao Bound (CRB) that represents the fundamental limit of single-molecule localization accuracy [32]. The CRB has already been used in the literature to evaluate the 2D and 3D location capabilities of localization microscopy strategies, and to compare the performance of different strategies [33][34][35]. However, to the best of our knowledge, it has never been used to design optimized DoF-enhancing phase masks. In order to efficiently localize the particle in practice from the images, we also propose a localization algorithm based on the maximum likelihood (ML) and adapted to the optical characteristics of the optimal masks. We show that contrary to the ML-based methods used in standard focused localization microscopy [33], this algorithm requires segmenting the DoF range in a sufficient number of segments to reach the CRB.
Our approach is in sharp contrast with the works in [31] and [36], which address the problem of DoF extension in classical imaging. In these publications, the optimization criterion is not a location performance expressed by a CRB or a Fisher information. It is the image quality obtained after deconvolution with an averaged Wiener filter. This quality is expressed in terms of mean square error (MSE) between the ideal and deconvolved images. We show in the present paper that since the CRB and MSE-based optimization criteria are different, they lead to optimal masks that are different and that have significantly different localization performance.
This article is organized as follows: Section 2 describes the imaging and noise models, introduces the Fisher information matrix and the CRB to calculate the fundamental limit of single-molecule localization accuracy, and describes annular binary phase masks aimed at improving the localization accuracy of defocused imaging systems. In Section 3, we present a co-design approach for optimizing DoF-extending phase masks. Using the CRB as a criterion, we optimize annular binary phase masks for various DoF ranges. In Section 4, we design a ML based localization algorithm and demonstrate that it reaches the CRB at the price of a moderate increase of the computational complexity compared to the case when the particle is in focus. In Section 5, we use the developed framework to benchmark the performance of two different types of optimized DoF-extending phase masks, with a particular focus on a simple and easy to realize binary mask composed of two rings only. Conclusions and perspectives are drawn in Section 6.

Single-molecule localization microscopy and DoF extension
Our goal is to improve the DoF of single-molecule fluorescence microscopes by using phase masks and adapted image processing algorithms. In this section, we first define the signal and noise models considered in this article, then the single-molecule localization accuracy criterion we have chosen. We then describe the type of phase masks we shall use for DoF extension and illustrate their capacity to make localization accuracy nearly invariant to defocus.

Signal and noise models
Since the emitter is unresolved by the microscope, we observe, in the image plane, the point spread function (PSF) of the microscope objective centered on the geometric image position of the emitter. The irradiance is proportional to where M is the lateral magnification of the imaging setup and θ 0 = (θ 0x , θ 0y ) t is the position of the emitter in the object plane. The superscript t denotes transposition. The function f ψ (x, y) represents the 2D spatial distribution of the PSF and is normalized so that ∬ f ψ (x, y) dx dy = 1. It is proportional to the squared modulus of the Fourier transform of the normalized complex pupil function defined as exp[iΦ(ξ, η)], when ξ 2 + η 2 <1, and 0 otherwise, with the phase function Φ(ξ, η) expressed, for pure defocus wavefront error, by In this equation, the defocus parameter ψ is defined as where NA o denotes the object NA of the imaging system, ∆z o the longitudinal distance between the observed particle and the focus point, and n o the refractive index of the object space. This parameter represents the first non-zero expansion of the optical path difference between the focused and defocused spherical wavefronts at the edge of the exit pupil. This approximation is valid for typical microscope configurations [37,38]. According to this model, if ψ = 0 (the object is in focus), the PSF is the classical Airy pattern with size defined by the object NA of the microscope objective, the absolute value of M, the lateral magnification of the imaging setup and the wavelength λ of the collected light. When ψ increases, the PSF progressively gets wider [39,40]. It is classically considered that defocus is not critical when |ψ|<λ/4 (Rayleigh criterion). The digital image delivered by the sensor is a version of Eq. (1) that has been sampled, filtered by the finite size pixel and corrupted by noise. Let us assume that this image has a width of 2P + 1 pixels, with P ∈ N + . We call s ij the number of photo-electrons observed at pixel (i, j) ∈ N 2 with |i| and |j| ≤ P. Assuming that the measurement noise is additive, Gaussian, spatially white, of mean zero and variance σ 2 n , the number s ij is a Gaussian random variable of probability density function is the mean value of s ij . In Eq. (5), N 0 denotes the total number of photo-electrons expected in the whole image and ∆ xy the length of the side of the square pixels. We have represented the photo-electron detection steps in Fig. 1: Fig. 1(a) represents the continuous PSF f ψ,θ 0 (x, y) of a defocused optical system with ψ = ±0.4λ; Fig. 1(b) represents its sampled/integrated version   For the sake of simplicity, we do not take into account in our model the signal-dependent photon noise, that is Poisson distributed and not additive. Also, we use Fourier optics for modeling image formation of high-aperture microscope objectives and nanometric targets (see [37,41] for a more accurate electromagnetic-based model). These simplifications ease the proof of the DoF-enhancing potential of binary phase masks. However, the results obtained in this paper can be generalized without difficulty to more thorough imaging and noise models.

Fundamental limit of localization accuracy
Locating a molecule consists in estimating the particle coordinates θ 0 with high accuracy from the measured data s ij . According to [33], the fundamental limit of localization accuracy can be obtained from the Fisher information matrix, which indicates how the likelihood of the observed data is affected by changes in the values of the parameters of interest. It is defined by where p s ij | µ ψ,θ 0 ij is defined in Eq. (4) and the symbol E[.] denotes the mathematical expectation operation.
The diagonal values of the inverse of the Fisher matrix are the Cramér-Rao bounds (CRB) of the estimates of θ 0x and θ 0y . They represent a lower bound on the estimation error variance of these parameters that can be obtained with an unbiased estimator. They thus represent the intrinsic difficulty of an estimation problem, independently of the (unbiased) method used to solve it. We consider here equivalently the square root of the CRB, denoted RCRB and which has the dimension of a distance, as the limit of localization accuracy.
Due to the circular symmetry of the PSF f ψ,θ 0 (x, y) for any value of ψ, the off-diagonal terms of the Fisher information matrix are zero and the RCRB is the square root of the inverse of the diagonal elements of the Fisher information. Thus, the RCRB along the x and y axes have the following expressions: (1) and (5). These expressions show this standard result that the fundamental limit on localization is inversely proportional to the signal-to-noise ratio (SNR), defined as SNR = N 0 /σ n and depends on the spatial sampling of the PSF. Moreover, when the PSF is circularly symmetric, which is the case here, one has RCRB x = RCRB y . Thus in the following, we will use the normalized RCRB in order to characterize the influence of defocus for any value of the SNR. The normalized RCRB is defined by dividing RCRB x by its value when the object is in focus (i.e. ψ = 0). It is independent of the value of the SNR.
As an illustration, we have represented in Fig. 2(a) the variation of the PSF profile along the x axis as a function of the defocus parameter ψ for an example of microscope configuration defined in Table 1. We express the defocus parameter ψ in units of central wavelength λ of the collected light. We note that the larger the defocus parameter, the more the PSF spreads out and its central lobe gets fainter. We have represented in Fig. 2(c) the variation of the normalized RCRB of this optical system as a function of defocus (blue dotted line). It is observed that the RCRB increases very slowly until ψ ±λ/4, which corresponds to the Rayleigh criterion, then the increase gets much sharper. At a defocus parameter of ψ = ±1λ, the RCRB is more than ten times larger than for ψ = 0 where, for example, the RCRB is equal to 0.04 pixel (or 2.7 nm) when N 0 = 500 photons and σ 2 n = 6 photons 2 /pixel. The case with a phase mask, in Fig. 2(b), is discussed in the next section.

DoF extension using annular binary phase masks
Our goal is to minimize the variation of localization accuracy with defocus by placing an optimized phase mask in the exit pupil of the optical system and by adapting the localization algorithm.
There exists many types of phase masks that may improve the DoF [42][43][44][45][46][47]. We consider in this article annular binary phase masks since they are easy to manufacture and have proven their efficiency in classical imaging applications [36,[48][49][50][51]. These masks consist of concentric rings defined by their normalized outer radii. Each ring implements a phase modulation of alternatively 0 or π radians at a nominal wavelength λ. For instance, we have represented in Fig. 3 a four-ring annular binary phase mask defined by the parameter vector ρ = (ρ 1 , ρ 2 , ρ 3 ) t , with 0 ≤ ρ 1 ≤ ρ 2 ≤ ρ 3 ≤ 1, where ρ n is the outer radius of the n-th ring. This mask defines a binary phase function that we denote by Φ mask (ξ, η, ρ). If such a mask is placed in the exit pupil of a defocused optical system, the phase function in the exit pupil, as defined in Eq. (2), has the Fig. 2. PSF variability of the x axis profile of (a) an aberration-free diffraction-limited system with a circular aperture, (b) the same optical system using a two-ring binary phase mask with ρ 1 = 0.59 and (c) the normalized limit of the localization accuracy, denoted normalized RCRB, along x coordinate axis as a function of defocus. The simulation parameters are defined in Table 1. following expression: where ρ is the mask parameter vector. We have represented in Fig. 2(b) the variation of the PSF profile as a function of the defocus parameter ψ when a two-ring mask defined by ρ 1 = 0.59 is placed in the exit pupil (the reason for choosing this value of ρ 1 will become apparent in the next section). The presence of the mask significantly alters the optical properties of the system through defocus when compared with the aberration free PSF in Fig. 2(a). The minimum spread of the PSF is not obtained for perfect focus (i.e. ψ = 0), but for ψ ±0.7λ. On the other hand, for ψ = 0, the PSF profile is not concentrated but divided in three main lobes. It is also noticed from Fig. 2(a) and 2(b) that whether or not the mask is present, the PSF is identical for ψ and −ψ (that is, on either side of the focus point). This would not be the case for most other types of DoF-enhancing phase masks, such as the cubic mask [42]. This is an interesting property of binary phase masks with π-phase modulation [31].
We have represented in Fig. 2(c) the variation of the normalized RCRB as a function of ψ with this mask placed in the exit pupil (red solid line). The normalized RCRB for a given scenario is equal to the RCRB in this scenario divided by the value of the RCRB without mask and in focus (i.e. ψ = 0). By comparison with the curve obtained without mask (blue dotted line), it is seen that the mask allows to get much lower values of the RCRB over the whole defocus range. With the mask, the maximal value of the RCRB over the defocus range is three times smaller than without it. Of course, this value is also three times larger that the value obtained without mask and in focus: the price to pay to extend the DoF is to degrade the localization accuracy in focus. It is interesting to note that the value of RCRB is good at ψ = 0, even if for this defocus parameter, the PSF is spread into three main lobes (see Fig. 2(b)). This means that such a PSF, albeit not concentrated in a single lobe, still contains enough information to ensure accurate localization.
Our goal will be to determine the binary phase masks parameters that optimize the RCRB for different numbers of rings and different values of the defocus range, and to determine the algorithms that make it possible to reach a localization accuracy equal to the RCRB in practice.

Binary phase mask optimization for localization applications
In this section, we define an optimization criterion for binary phase masks aimed at DoF extension. This criterion is highly non-convex, and we describe a method to perform its optimization. Then, we apply this method to optimize annular binary phase masks composed of two to five rings for various defocus ranges. In each case, we evaluate the obtained trade-off between localization accuracy and DoF extension.

Optimization method
Using Eq. (7) and (8), we are able to calculate the RCRB of an imaging system equipped with a phase mask for any value of the defocus parameter ψ. Since RCRB = RCRB x = RCRB y and the PSF is identical for ψ and −ψ using binary phase masks with π-phase modulation, a reasonable criterion for phase mask optimization is therefore to minimize the value of the RCRB for the worst value of ψ, that is, to define the optimal mask parameters as where [0, ψ max ] is the defocus range on which we want the localization to be accurate. Solving this minimax optimization problem is not simple since the function max ψ RCRB(ρ, ψ) is non-convex and presents several local minima. In Fig. 4(a), the value of max ψ RCRB(ρ 1 , ψ) is represented for a two-ring mask as a function of the radius ρ 1 and for different values of ψ max . We notice that there are several local minima. Similarly, Fig. 4(b) represents the value of max ψ RCRB(ρ 1 , ρ 2 , ψ) for a three-ring mask as a function of the radii ρ 1 and ρ 2 for ψ max = 2.5λ. We also can see several local minima (the global minimum is marked with a white cross). In these two cases, mask optimization can be solved by an exhaustive search. To achieve the non-convex optimization defined in Eq. (10) with a larger number of rings, we use the particle swarm optimization algorithm defined in [52]. This algorithm, developed by J. Kennedy and R. Eberhart in the late 1980s, relies on the collaboration of individuals. Based on simple displacement rules, a set of particles explores the landscape of the phase mask parameters to be optimized and gradually converges towards a local minimum. Because we have no guarantee to find the global minimum in a single optimization run, we perform a large number of optimization runs with randomly chosen initial values to find a reliable value of the global optimum, similarly to what is done in [31].  Table 1.

Performance and limits of optimized binary phase masks
By applying the optimization method described in the previous section, we have optimized multi-ring binary phase masks for various defocus ranges. Figure 5(a) represents the maximal value of the normalized RCRB over the defocus range obtained with the optimal mask parameters ρ opt , defined as and plotted as a function of the number of rings. Each curve corresponds to a different value of the defocus range ψ max . The leftmost point of the curves corresponds to an optical system without mask. In this case, RCRB max naturally increases with ψ max . When a mask is used, RCRB max first sharply decreases with the number of rings, then levels off. For example, on the blue curve corresponding to ψ max = 1λ, RCRB max is divided by three by using a two-ring mask, but is not reduced as the number of rings is further increased. This means that to solve the DoF extension problem over this defocus range with an annular binary phase mask, two rings are sufficient. It has to be noted that the optimal two-ring mask is defined by ρ 1 = 0.59, which corresponds to the mask used to plot Fig. 3.
If we now consider the orange curve that corresponds to ψ max = 1.5λ, we first see that it is above the blue one, since the problem to solve is more difficult. Moreover, for this defocus range, using three rings instead of two brings some improvement in localization accuracy. Globally speaking, it is observed that as the defocus range widens, a larger number of rings is needed to reach the optimal localization accuracy. For the maximal considered value of the defocus range ψ max = 3λ, four rings are necessary and sufficient to reach the minimal RCRB. Figure 5(b) represents the profiles of the optimized binary phase masks obtained for this defocus range for two to five rings. We can see that the optimal mask with five rings is quite similar to that with four rings, which is consistent with their similar localization performance.
The main conclusion of these results is that optimal binary masks significantly improve localization accuracy for all the considered defocus ranges, and that a limited number of rings is sufficient to obtain optimal performance. These conclusions are similar to those obtained in the case of DoF enhancement of classical imaging systems, where the optimization criterion is the quality of the deconvolved image [31]. However, as shown in Appendix A, the optimal masks are different.  Table 1.
In order to gain a deeper insight into the obtained results, let us analyze how the optimal masks modify the PSF of the optical system and how these modifications make it possible to extend the DoF. Figure 6(a) represents the variation of the PSF profiles as a function of the defocus parameter for four different configurations: without mask and with masks optimized, respectively, for ψ max = {1λ, 1.5λ, 3λ}. It can be seen that the use of a binary phase mask significantly reduces the fading of the PSF over the defocus range. Figure 6(b) illustrates the variation of the normalized RCRB as a function of ψ for the same four imaging systems. Comparing Fig. 6(a) and 6(b) is insightful. It is observed that the local minima of the RCRB in Fig. 6(b) correspond, for all defocus ranges, to two different types of PSF profiles that are appropriate for localization in Fig. 6(a). The first one is characterized by an important central lobe. It occurs for example Fig. 6. Evolutions of (a) the PSF profile along x axis and (b) the normalized limit of the localization accuracy, denoted normalized RCRB, as a function of defocus for an optical system using a binary phase mask optimized for ψ max = {1λ, 1.5λ, 3λ}. The simulation parameters are defined in Table 1. at ψ 0.7λ for ψ max = 1λ and ψ = 0 for ψ max = 1.5λ. The second type is characterized by important secondary lobes. It occurs for example at ψ 0 for ψ max = 1λ and ψ 0.7λ for ψ max = 1.5λ. The PSF profiles corresponding to the transition between these two types yield higher values of the RCRB. For example, for ψ max = 1.5λ, the worst localization accuracy is obtained for ψ 0.4λ. In Fig. 6(a), this value of ψ corresponds to a low contrast PSF profile with no distinct central or secondary lobes.

Localization algorithm
The phase masks have been optimized using the RCRB criterion defined in Eq. (11), which represents a lower bound on the localization standard deviation of unbiased estimation. It is thus a "potential" performance level, and one has to specify actual estimators able to reach this performance in practice. In the case of well focused images, it has been shown that for sufficient SNR, ML algorithms are able to reach the CRB [33]. However, in our case, the problem is more involved since it depends on another parameter, the defocus parameter ψ, which is unknown and will not be retrieved. Thus, the parameter ψ can be considered as a nuisance parameter for our localization problem. If one does not have any a priori information on the actual value of ψ, one has to maximize, with respect to θ and ψ, the log-likelihood defined as where s ij and µ ψ,θ ij are as defined in Eq. (5) and above, and p s ij | µ ψ,θ ij is defined in Eq. (4). It amounts to a joint ML estimation of θ and ψ. The problem is that the defocus parameter ψ is a continuous variable, so that this problem is intractable. In the next section, we propose a first method based on a single averaged kernel, which has the advantage of being as fast as in the standard case of a well-focused particle. After analyzing the performance and drawbacks of this method, we propose another solution based on the division of the defocus range in a limited number of subdomains.

Estimation based on a global kernel
When the defocus parameter ψ is known, the ML estimate of the position θ can be written as a correlation product: arg max This estimator requires the knowledge of µ ψ,θ ij and thus, of ψ. In practice, we have no information on ψ. A simple way to cope with this problem is to replace µ ψ,θ ij in Eq. (15) with a defocus invariant kernel r θ ij such that the estimator has the following expression: The question is how to construct this kernel. First, we show in Appendix B that it is possible, with some approximations, to get a closed-form expression of the variance var ψ [θ] of the estimator in Eq. (16) for a given value of ψ. We then choose a set of values ψ k , k ∈ [1, K], evenly spaced within the defocus range [0, ψ max ] and determine the kernel r θ ij that maximizes k (var ψ k [θ]) −1 , that is, the sum of inverses of the estimation variance for each value of ψ k . We show in Appendix C that this kernel has the following expression: where the coefficients α k are the components of the eigenvector associated with the greatest eigenvalue of the matrix W defined as where the superscript ∼ denotes the Fourier transform, the superscript * denotes the complex conjugate and f ψ (x, y) is as defined in Eq. (1). It can be noted that this approach consisting in defining a single "averaged" kernel is similar to that used for DoF extension in classical imaging, where an averaged Wiener filter is used to deconvolve the image whatever the value of ψ [28]. Figure 7(a) compares the empirical normalized standard deviation of the estimator defined in Eq. (16) with the normalized RCRB as a function of the defocus parameter ψ. The empirical normalized standard deviation of the estimator for a given scenario is equal to the empirical standard deviation of the estimator in this scenario divided by the value of the RCRB without mask and in focus (i.e. ψ = 0). The simulated optical system uses a two-ring binary phase mask optimized for the DoF range ψ max = 1λ. The estimator standard deviation is estimated using Monte-Carlo simulations based on 4000 realizations. This curve shows that the empirical normalized standard deviation of the estimator reaches the normalized RCRB only for ψ>0.4λ. For values of ψ below this limit, the variance of the estimator is much larger than the RCRB (the values are outside the graph limits in Fig. 7(a)).
The reason for this failure is the following. From Fig. 6, it can be seen that the shape of the PSF strongly varies with the defocus. For the defocus range ψ max = 1λ, two regimes can be distinguished: one where the shape of the PSF is spread with important secondary lobes (see Fig. 7(c)) and the other where the PSF is more concentrated around a major main lobe (see Fig. 7(d)). The optimal defocus invariant kernel r θ ij , represented in Fig. 7(b), is clearly much more similar to the PSF for ψ 0.7λ than to the PSF for ψ = 0. This explains why its accuracy is good for larger values of the defocus parameter but dramatically fails for smaller values. In conclusion, it is impossible for a unique kernel -even an optimal one -to encompass the PSF variability over the defocus range [0, 1λ]. It is therefore necessary to split the defocus range into sub-ranges on which the shape of the PSF is more invariant to apply the estimator defined in Eq. (16). It is important to specify that these sub-ranges do not allow to estimate the defocus parameter, but only to improve the estimation algorithm of the 2D coordinates of the single molecule observed.

Estimation based on multiple kernels
Suppose that the DoF range [0, ψ max ] is split into M distinct sub-ranges. Using the method described in the previous section, we can define M optimal defocus invariant kernels over each where β is a scale factor. Indeed, to perform the fit with the data actually present in the image s ij , it is necessary to scale the amplitude of the kernel to the amplitude of s ij with help of the parameter β defined as Let us consider again the imaging system with a two-ring binary phase mask ρ 1 = 0.59 studied in Fig. 7. We can define four sub-ranges in which the shape of the PSF is almost invariant: the sub-range m = 1 is ψ ∈ [0, 0.2λ], m = 2 is ψ ∈ [0.2λ, 0.35λ], m = 3 is ψ ∈ [0.35λ, 0.5λ], and m = 4 is ψ ∈ [0.5λ, 1λ]. For each of these sub-ranges, we compute the optimal defocus invariant kernel r m,θ ij as defined in Eq. (17). In practice, each of these four kernels is built using six values ψ k evenly spaced within each sub-range. They are respectively illustrated in Fig. 8(b), 8(c), 8(d), and 8(e). In particular, it is noticed that the kernel corresponding to the first sub-range (see Fig. 8(b)) is now similar to the PSF for ψ = 0 (see Fig. 7(c)). In Fig. 8(a), we have represented with blue dots the empirical normalized standard deviation of the estimator defined in Eq. (19) (estimated with Monte Carlo simulations). If we compare them to the RCRB (red dotted line), we note that in contrast to Fig. 7(a), the actual standard deviation reaches the RCRB for all the defocus parameters. Dividing the defocus range [0, ψ max ] into four parts is thus enough to encompass the PSF variability within this range. The price to pay is to perform four correlations instead of a single one in the case of well-focused systems.  Table 1. Of course, when the defocus range increases, the PSF variability within this range also increases (see Fig. 6(a)). As a consequence, one has to use a larger number of sub-ranges to encompass this variability. For example, for a range ψ max = 1.5λ, we have used five sub-ranges. We have represented in Fig. 9(a) the standard normalized deviation of this estimator, estimated with Monte-Carlo simulations, and we see that it fits the normalized RCRB. Figure 9(b) represents the same values for ψ max = 2λ. In this case, we had to use six sub-ranges to fit the normalized RCRB.
As a summary, contrary to well-focused systems, the RCRB of DoF-enhanced systems cannot be reached with an estimator consisting of a single correlation kernel. This is due to the fact that even with the optimal masks, the PSF varies within the defocus range. There is thus a price to pay in terms of computational complexity for DoF extension. We have proposed a method based on the subdivision of the defocus range in a sufficient number of sub-ranges, each one being associated with a single correlation kernel. We have shown that this method makes it possible to reach the RCRB. Its computation complexity is simply proportional to the number M of necessary sub-ranges, which increases with the width of the defocus range. Indeed, for M subranges, localization requires M correlations instead of a single one in the standard case of focused localization.  Table 1.

Comparison with other previously proposed DoF-extending mask
The mask optimization approach presented in this article is based on a general and objective localization criterion: the RCRB. This allows the comparison of any type of DoF-extending strategy on a quantitative basis. In order to illustrate this potential, we compare in this section the annular binary masks and the masks introduced in [22]. These masks consist of a series of concentric annular sub-apertures introducing phase delays that are much larger than the coherence length of illumination. Hence, the light beams emerging for each sub-aperture are mutually incoherent and the PSF of the mask is simply the incoherent addition of the PSFs produced by each sub-aperture. In the following, these masks will be denoted Incoherently Partitioned Pupil (IPP) masks. Their parameters are the widths of the rings.
To compare these two strategies, we optimize the masks, that is, the ring widths, with the minimax criterion defined in Eq. (10) for different values of the defocus range ψ max . Figure 10 represents the normalized value of RCRB max obtained with different imaging strategies for discrete values of ψ max ranging from 0 to 1λ with a step of 0.1λ. The black line represents the normalized RCRB max obtained with a localization microscope with no mask and only limited by diffraction. It will serve as a baseline for comparison. The dotted blue curve is obtained with the optimized annular binary phase mask and the dash-dotted red one with the optimized IPP mask. Note that for each value of ψ max , the optimal mask may be different. Interestingly, it is first observed that the phase masks do not improve performance for small defocus parameters ψ max <0.4λ, whatever the used strategy, since the three curves are superposed. In this case, DoF is too small for improving localization performance using masks. On the other hand, when the DoF range becomes larger, phase masks yield significant improvement, and this improvement is larger with annular binary phase masks than with IPP masks for any value of ψ max . For instance, when the defocus range is equal to ψ max = 1λ, the optimal annular binary phase mask yields a RCRB max three times smaller than that obtained without mask, while the optimal IPP mask reduces RCRB max only by a factor of two.  Table 1.
This result is an illustration of how the framework proposed in the article allows any type of DoF-enhancing strategy to be compared. Of course, localization accuracy may not be the only criterion for the choice of a mask in a given application. For example, manufacturability is also an important criterion. In that regard, annular binary phase masks may be easier to manufacture with photolithographic techniques since they only require one shallow etching level whereas the IPP masks necessitate deep tiers between each ring.

Conclusion
We have investigated the problem of DoF extension in the context of single-molecule localization microscopy. We have shown that placing an optimized phase mask in the exit pupil of the microscope and using an adapted processing algorithm allows to significantly increase the localization performance within the required defocus range. We have proposed different binary mask designs to enhance the DoF including a two-ring only solution that is easy to manufacture. Of course, there is a price to pay for DoF extension since the localization accuracy is always lower than for well-focused systems.
A strong asset of the framework developed in this article is to be based on an explicit and general information theoretical criterion. It thus makes it possible to compare any type of DoF extending masks on a realistic and quantitative basis. We have illustrated this potential by comparing optimized annular binary phase masks with another type of optimized phase mask proposed in the literature.
The present work is based on a simple imaging model, and in this sense, the orders of magnitude given in this article can be considered as upper limits on the DoF improvement that can be obtained in practice with phase masks. Thus, this method lays the basis on which more sophisticated and application-dependent strategies can be built. In particular, the imaging model could be improved in several ways, in terms of PSF modeling, optical aberrations or noise model. Another interesting perspective is to apply this framework to more general types of masks, such as continuous pure-phase masks or masks with combined amplitude and phase modulation, and to compare them on a realistic basis.

A. Comparison between CRB and MSE-based optimization criteria
In this Appendix, we compare the binary annular DoF-enhancing phase masks optimal for classical imaging systems and for single-molecule localization microscopy.
We have represented in Fig. 11 the optimal masks for DoF extension of ψ max = 1λ in classical imaging [31] (MSE optimization, Fig. 11(a)) and in localization microscopy (CRB optimization, Fig. 11(b)). It is seen that these mask are very different. The question is whether they yield different localization performance. Fig. 11. Annular binary phase masks optimized for DoF extension ψ max = 1λ in (a) classical imaging and in (b) localization microscopy. The mask parameters are respectively equal to ρ = (0.7684, 0.9272) t and ρ = 0.59. Each ring is defined as an annular region with constant phase modulation such as dark gray areas induce a phase of 0 radians and light gray areas induce a phase of π radians at a nominal wavelength λ.
To answer this question, we have plotted in Fig. 12(a) and Fig. 12(b) the x-axis profile of the PSF of a single-molecule microscope as a function of defocus when using these two masks. It is observed that these profiles are quite different. The profile of the mask optimized with the MSE Fig. 12. x axis profile of the PSF for a co-designed optical system using binary phase masks optimized for (a) image quality (MSE criterion) or (b) localization accuracy (CRB criterion). (c) normalized limit of localisation accuracy, denoted normalized RCRB, along the x axis as a function of the defocus parameter ψ. The simulation parameters are defined in Table 1. criterion ( Fig. 12(a)) is smooth since the quality of the deconvolved image has to be constant along the required DoF range. In contrast, the profile of the mask optimized with the CRB criterion ( Fig. 12(b)) varies more sharply. This is because, as explained in the section 3.2, the three-lobe profile around ψ = 0 and the "blob-like" profile around ψ = 0.7λ are both appropriate for localization (they yield low values of the RCRB).
Finally, Fig. 12(c) represents the localization performance of the two masks represented in Fig. 11, expressed in terms of normalized RCRB, as a function of the defocus ψ. It is seen that this localization performance differ significantly. We can conclude that the DoF enhancing masks optimal for classical imaging [31,36] are not optimal for localization microscopy.

B. Closed-form expression of the variance var ψ [θ]
In this Appendix, we show that it is possible, with some approximations, to get a closed-form expression of the variance var ψ [θ] of the estimator in Eq. (16) for a given value of ψ. For the sake of simplicity, we model the single-molecule localization problem as a one-dimensional problem. However, the method is also valid for 2D localization. To facilitate the mathematical developments, we assume that the observed signal s(x) is continuous and modeled as where θ 0 is the position of the emitter in the object plane, N 0 is the total number of photo-electrons expected in the whole 1D image, n(x) is a Gaussian zero-mean white noise of power spectral density S nn (ν) = q and f ψ,θ 0 (x) = f ψ (x − θ 0 ) is proportional to the 1D spatial distribution of irradiance over the sensor for a given defocus parameter ψ. We want to estimate the coordinate θ 0 of the emitter in the object plane by maximizinĝ where r(x) is a defocus invariant kernel.
To quantify the performance of the estimator defined in Eq. (22), we can calculate its bias and variance. Using Eq. (21), the estimator has the following expression: where Ω(θ) = ∫ f ψ,θ 0 (x)r(x − θ) dx is a correlation product and n (θ) = ∫ n(x)r(x − θ) dx is a filtered noise. Let us consider the second order Taylor expansion of Ω(θ) when θ is close to the true value θ 0 (i.e. when the variance is low) We suppose that the correlation function Ω(θ) reaches its maximum when θ = θ 0 . So using expressions (23) and (24), we can show that The estimated positionθ of the emitter is therefore a random variable fluctuating around θ 0 . Using the theorem of derivation under the integral sign, we can write This random process is zero mean, so the estimatorθ is unbiased. Moreover, the power spectral density of the filtered noise n (θ) is by definition equal to |2iπνr(ν)| 2 S nn (ν) where S nn (ν) = q and the superscript ∼ denotes the Fourier transform, leading to var ∂n (θ) ∂θ θ=θ 0 = 4π 2 q ∫ R ν 2 |r(ν)| 2 dν .
So taking into account the previous simplifications, we obtain the following closed-form expression of the variance var ψ [θ] for a given value of the defocus parameter ψ: (30)

C. Invariant kernel based on variance minimization
In this Appendix, we determine the optimal defocus invariant kernel r(x) such as the variance var ψ [θ] of the estimator defined in Eq. (22) is minimal. Considering a discrete set of values ψ k , such as k ∈ [1, K], the optimal defocus invariant kernel can be defined as the one which maximizes This criterion is reasonable. Of course, other criteria could have been considered, but one of its advantages is that its solution is closed-form. Indeed, optimization of this criterion can be performed by solving the following constrained optimization problem using the method of Lagrange multipliers: maximize function k ∫ ν 2f ψ k * (ν)r(ν) dν 2 subject to ∫ ν 2 |r(ν)| 2 dν = A, where A is a constant. The Lagrangian has the following expression: with λ a Lagrange multiplier. By annulling its functional derivative, it is easily shown that the Fourier transform of the optimal defocus invariant kernel is a linear combination of the functionsf ψ k (x) such as (var ψ k [θ]) −1 = 4π 2 N 2 0 q α t W t Wα α t Wα (34) where α = [α 1 , α 2 , . . . , α k ] is the vector of linear combination coefficients and W is a matrix defined by [W] ij = ∫ ν 2f ψ i * (ν)f ψ j (ν) dν. We can conclude that the optimal defocus invariant kernel that maximizes k (var ψ k [θ]) −1 is a linear combination of the functions f ψ k (x) which coefficients are the components of the eigenvector associated with the greatest eigenvalue of the matrix W.