Super-resolution microscopy of single atoms in optical lattices

We report on image processing techniques and experimental procedures to determine the lattice-site positions of single atoms in an optical lattice with high reliability, even for limited acquisition time or optical resolution. Determining the positions of atoms beyond the diffraction limit relies on parametric deconvolution in close analogy to methods employed in super-resolution microscopy. We develop a deconvolution method that makes effective use of the prior knowledge of the optical transfer function, noise properties, and discreteness of the optical lattice. We show that accurate knowledge of the image formation process enables a dramatic improvement on the localization reliability. This allows us to demonstrate super-resolution of the atoms' position in closely packed ensembles where the separation between particles cannot be directly optically resolved. Furthermore, we demonstrate experimental methods to precisely reconstruct the point spread function with sub-pixel resolution from fluorescence images of single atoms, and we give a mathematical foundation thereof. We also discuss discretized image sampling in pixel detectors and provide a quantitative model of noise sources in electron multiplying CCD cameras. The techniques developed here are not only beneficial to neutral atom experiments, but could also be employed to improve the localization precision of trapped ions for ultra precise force sensing.


Introduction
Detection and manipulation of individual atoms on neighboring sites of an optical lattice have attracted great interest in recent years for applications in quantum information processing [1][2][3][4][5][6][7][8], quantum simulations [9][10][11][12][13], and very recently for studying strongly correlated Fermi systems at the single particle level [14][15][16][17]. Resolving atom positions with single-site resolution represents a technological challenge, since in optical lattices the distance between two lattice sites is on the order of the optical lattice wavelength. In fact, experiments relying on atoms tunneling between lattice sites require short lattice constants since the tunneling rate decreases exponentially with larger intersite separation.
Previously, we demonstrated that the number of lattice sites between well-isolated atoms in a one-dimensional (1D) optical lattice can be resolved with high reliability even with objective lenses of moderate numerical aperture (NA) [4]. Single-atom localization methods are employed in our laboratories ever since to measure the spatial probability distribution of atoms performing discrete-time quantum walks [12,18,19]. Recent experiments in our laboratory beyond single particle physics require resolving the position of each individual atom in small clusters at high filling factors, even when each lattice site is occupied [20]. By exploiting the discreteness of the atoms' positions in the lattice, we demonstrate in this manuscript new methods that enable resolving clusters of atoms with high reliability. Superresolving a cluster of atoms constitutes a much bigger challenge than resolving the distance between exactly two atoms, as we originally demonstrated [4].
Besides presenting a conceptually simple introduction to super-resolved fluorescence imaging of atoms and to the related deconvolution problem, this work develops several new methods with respect to our original work [4], which include: Wiener deconvolution of fluorescence images combined with a robust spectral-density-estimation algorithm for a first estimation of the atoms' positions (Sec. 6.3), a weighted non-linear least squares estimation of positions accounting for the experimentally characterized noise (Sec. 5.2), and inclusion of constraints of the atoms' positions on the periodic lattice (Sec. 6.4), as well as an optimal algorithm for the iterative reconstruction of the line spread function of the imaging system (the analogue of the point spread function for 1D imaging) (Sec. 4.1) with mathematical treatment of the convergence limit (App. C), and a measurement of optical aberrations from single atom images (Sec. 4.2).
In general, our methods for the analysis of fluorescence images are closely related to those employed in superresolution microscopy of biological structures [21][22][23], or in astronomy, where stars appear as point-like radiation sources [24]: Knowing the exact number of emitters in an observed region of interest allows us to determine the center position of each emitter with an uncertainty much smaller than the width of the point spread function (PSF) of the imaging system. Super-resolution imaging relies on the precise knowledge of the properties of point-like atomic emitters trapped in an optical lattice as well as the detailed properties of background noise.
Light sources separated by less then an Abbe radius, r A = λ f /(2NA) (λ f is the fluorescence radiation wavelength and NA is the numerical aperture of the imaging system) form blurred, overlapping images, which cannot be optically distinguished and require superresolution techniques to be resolved. In the past few years, the Abbe diffraction limit has prompted ultracold atom experiments to develop objective lenses with larger numerical apertures in the range of 0.6<NA<0.8 to significantly enhance the optical resolution, thus allowing single lattice sites to be optically resolved [10,11]. These experiments have developed different solutions to the deconvolution problem: for example, linear least-squares fit of fluorescence intensities by a sum of reconstructed PSFs on a fixed lattice combined with threshold criterion [10,13,[15][16][17], deconvolution through a kernel function containing the PSF information (though not noise information) and threshold criterion [25], deconvolution through a Gaussian kernel combined with a threshold criterion [26], deconvolution by fitting different configurations of occupancies on a fixed lattice (though without specifically mentioning methods to account for noise) [11], or simply using a threshold criterion on the integrated fluorescence in the pixels corresponding to each lattice site [14]. A different approach based on electron microscopes has demonstrated even higher spatial resolution to resolve atoms in an optical lattice, though without reaching the sensitivity level needed for detecting single atoms yet [27].
However, experiments working under conditions of a low signal-to-noise ratio or employing a moderately large numerical aperture (as is our case NA = 0.23) require methods that can extract the maximum physical information on the positions of atoms, especially when bunched in closely packed clusters. The Fisher information matrix provides the mathematical instrument to identify the fundamental limit on the information that an estimator of positions can extract from a fluorescence image (Cramér-Rao information bound) [28]: If such an estimator exists, than this estimator maximizes the likelihood function associated with the estimated quantity (i.e. positions). In this sense, the maximum-likelihood estimator defines the gold standard for any image analysis technique. As argued in Sec. 6.3, we can approach this limit relying on a accurate knowledge of the line spread function and noise characteristics [29].
Furthermore, we would like to remark that the techniques and results demonstrated in this work could have an impact even beyond neutral atom experiments. For example, the majority of techniques for photoactivated localization microscopy have been optimized for the situation of fully separated fluorophore, while we here deal specifically with the opposite situation of bunched emitters (atoms) [30]. Our methods can also find application to improve the localization precision of trapped ions, where displacements of the equilibrium position are recorded to sense extremely tiny forces on the yoctonewton scale [31][32][33][34]. Recently, the same techniques presented in section 4.2 to quantitatively reconstruct optical aberrations have been employed, concurrently with this work, to characterize the imaging system of single trapped ions [34].

The deconvolution problem
The steps involved in the image formation from the point-like atomic radiation sources to the final image on the digital camera are schematically depicted in figure 1 and summarized below: (i) Optical diffraction by the imaging microscope transforms all radiation sources into blurred spatial distributions. This process can be mathematically expressed as the convolution of the original distribution O(x, y) with the point spread function (PSF) P(x, y), which represents the characteristic intensity distribution of an ideal point source recorded by the imaging system. In case of a system whose optical response is translationally invariant (isoplanatic behavior), the fluorescence image distribution reads In case of a hard circular aperture, for example, the PSF of diffraction-limited aberrationfree optical system is the well-known Airy disc pattern [35], whose first minimum corresponds to 1.22 r A , with r A being the Abbe radius defined above.

Optical imaging Digital detection
Parametric deconvolution Figure 1. Schematic representation of image formation and information extraction. We retrieve the original atomic distribution O(x, y) from the measured fluorescence image S [x i , y j ] by parametric deconvolution with the point spread function P(x, y) of the imaging system.
Here we use the model assumption that atoms trapped in a 1D optical lattice are line-like radiation sources: Their motion is tightly confined along the longitudinal direction (horizontal direction in the images) and optically not resolved (see also section 4), while it is only loosely confined along the transverse direction (vertical).
(ii) A CCD detector samples and digitizes the image distribution I(x, y) → I[x i , y j ], where x i and y j denote the integer-valued horizontal and vertical positions of a sampling bin and the squared brackets in our notation distinguish discrete from continuous distributions because in general I(x, y) I[x i , y j ]. In fact, the discrete and continuous distribution are mathematically related through (2) where ∆ s represents the sampling spacing in the object plane, R p (u, v) is the CCD pixel rectangular function equal to 1/∆ 2 p for u and v in the interval [−∆ p /2, ∆ p /2] and zero outside, with ∆ p < ∆ s being the size of the pixel projected onto the object plane (today's CCD detector have ∆ p ∼ ∆ s ). Likewise, R s (u, v) is the sampling rectangular function equal to one for both u and v in the interval [−∆ s /2, ∆ s /2] and zero elsewhere. Equation (2) represents the convolution of the continuous intensity distribution with the pixel response function R p (u, v) (digitization), which is multiplied by the 2D comb function with spacing ∆ s (discrete sampling) [36][37][38]. In order to prevent information loss by discrete sampling, the Nyquist-Shannon sampling theorem shows that the PSF or, more precisely, the Abbe radius must be imaged onto more than two CCD pixels, i.e., r A > 2 ∆ s [39,40].
(iii) Physical information contained in the recorded signal S [x i , y j ] is partially lost due to diverse noise sources affecting the image formation process. The effect of noise sources (see Tab. 2 for a complete list) is taken into accounted through an additive stochastic noise term [x i , y j ], which is added to the fluorescence intensity distribution: Here we assumed that the homogeneous background (by digitization offset, stray light, and dark currents discussed in Sec. 5.1) is subtracted from the signal so that the average value of the noise vanishes, [x i , y j ] = 0.
In order to retrieve the original information O(x, y) from S [x i , y j ], we need to invert equation (3) through deconvolution. However, deconvolution problems are in general illconditioned, especially in the presence of noise. A physical model assumption known as regularization must be employed to constrain the solutions. For example, atoms which are strongly confined in an optical lattice are modeled as identical, isolated emitters characterized by only two parameters: positions and fluorescence intensity. Numerous deconvolution strategies exist in the literature, differing in their effectiveness in constraining solutions and in the computational resources required [41]. Our specific parametric deconvolution approach mainly relies on a maximum-likelihood estimation constrained on a discrete lattice, for which a first estimation of the atoms' positions is provided by the so-called MUSIC (multiple signal classification) algorithm (see Sec. 6).

The detection apparatus
3.1. The optical microscope The imaging system depicted in figure 2 realizes an infinity-corrected microscope: The fluorescence light emitted by the atoms at λ f = 852 nm is collimated by a diffraction-limited objective lens (effective focal length f 1 = 36 mm) [42] and imaged onto an EMCCD detector by a plano-convex tube lens (focal length f 2 = 2 m). The magnification of the microscope is f 2 / f 1 ∼ 55, so that the Abbe radius (r A = 1.9 µm) of the point spread function is imaged over > 6 pixels of the CCD camera, thus fulfilling the requirement by the Nyquist-Shannon sampling theorem to avoid information loss [39,40]. We assembled the microscope objective from four off-the-shelf spherical lenses, which are stacked into a one-inch aluminum holder. By design the objective lens operates at the diffraction limit with a numerical aperture of NA obj. lens = 0.29. The objective lens was characterized with a shear-plate interferometer resulting in a peak-valley wavefront distortion of less than λ f /4 over 90% of the clear aperture fulfilling Rayleigh's quarter wavelength rule [43]. The long working distance (36.5 mm) allows the objective lens to be mounted outside of the vacuum sufficiently far away from the vacuum cell to prevent reflected light from molasses laser beams from reaching the camera. The microscope objective is mounted on a three-axis translation stage and connected through blackened tubes to the EMCCD detector. Inside the tubes five sooted knife-edge apertures are placed with gradually decreasing inner diameters to block stray light. To further suppress the remaining stray light a narrow-band (3 nm FWHM) optical filter with a transmission of 98% at the wavelength λ f is placed in front of the EMCCD detector. Over the past few years, single aspheric lenses  Figure 2. Illustration of the single-atom microscope. A 1D optical lattice produced by two counterpropagating laser beams (a) traps atoms (b) in the object plane of a infinity-corrected microscope objective (c). Beam tubes (d) block stray light and reflections of molasses beams (e) off the glass cell (f). A three-axis translation stage (g) allows precise alignment of the microscope objective. A tube lens (h) focuses the image onto an EMCCD detector (i). Tubes and cube-mounted turning mirrors (j) bridge the distance between the tube lens and the detector, while several built-in knife-edge apertures (k) and a narrow-band optical filter (l) further suppress remaining stray light. The separation between two adjacent lattice sites in the object plane is imaged to 24 µm in the image plane, which amounts to about 1.5 CCD pixels (m).
with long working distance (a few cm) have become widely available, and their utilization represents a good alternative to build an imaging system with a moderate numerical aperture (0.2 < NA < 0.5).

Localization of trapped atoms with small number of photons
According to Abbe's diffraction limit, the optical resolution of our imaging system is r A = 1.9 µm. However, to achieve single-site resolution we need to extract the position of single trapped atoms with an uncertainty smaller than the lattice spacing a (433 nm). In analogy to super-resolution imaging in biological systems, we can determine the position of our atoms beyond the optical resolution by precisely knowing its point spread function and the underlying noise. Following reference 44, in one dimension the localization precision of the fluorescence peak produced by a single atom can be estimated by where it is assumed that the fluorescence signal is integrated over n ⊥ pixels in the direction transverse to the lattice, and that RMS PSF is the RMS width of a Gaussian point spread function, ∆ p is the size of a camera pixel in the object plane, N is the average number of recorded photons per atom, and σ b is the RMS background noise (see Eq. (9) in Sec. 5.1).
In the literature, extensions of the result in equation (4) can be found for two dimensions [45] and, using the statistical theory based on the Fisher information matrix, for a generic disc point-spread function (e.g., Airy disc) [28]. The Fisher information approach, which for a Gaussian point spread function yields exactly equation (4), produces the fundamental theoretical localization limit that can be attained (Cramér-Rao information bound). Note also that the localization precision in equation (4) concerns only a single localized emitter, which is the case, for example, of an isolated fluorophore in photoactivated localization microscopy or of a very sparsely filled optical lattice. Section 6 is in particular concerned to super-resolve the position of emitters (atoms) clustered in small ensembles, which constitutes a significantly more demanding task. In addition, it should be noticed that, when employing an electron multiplying CCD camera (as is the case of the present work), a factor 2 must to be added in front of RMS 2 PSF in equation (4) to account for the effectively halved quantum efficiency due to the electron multiplying excess noise (see Sec. 5.1) [46].
In the following, we intend to give an estimate of the localization precision of our imaging system based on equation (4): The RMS PSF of our imaging system is ∼ 1.5 µm (see Sec. 4) and the parameter ∆ p can be calculated by dividing the pixel size (16 µm) by the magnification (∼ 55, see Sec. 3.1). The number of photoelectrons (ph. e − ) recorded on the EMCCD sensor per single atom can be estimated by knowing the photon scattering rate, the solid angle of the microscope objective into which photons are emitted, and the exposure time. Atoms illuminated with nearly resonant light at λ f emit photons at the maximal rate of Γ/2 for strong saturation, with Γ ∼ 2π × 5 MHz being the radiative decay rate for cesium. However, to prevent atoms from hopping along the lattice during imaging, the saturation parameter is typically chosen much smaller [47,48] (s ∼ 0.01), which reduces the scattering rate by a factor of 10 or more [49]. The solid angle directly depends on the NA of the imaging system according to the formula Ω/4π = (1− √ 1 − NA 2 )/2 ∼ 1 %. By additionally taking into account the finite quantum efficiency of the CCD camera QE(λ f ) ∼ 30 % (see Sec. 3.3) as well as photon losses (∼ 6 %) due to both reflections from optical surfaces (e.g. the vacuum glass cell) and the transmission of the narrow-band optical filter (see Sec. 3.1), we expect to detect about 1000 ph. e − per atom for a single fluorescence image with an exposure time of T = 1 s. For comparison, in our experiments we record about 1300 ph. e − s −1 per atom as discussed in section 6.1. The measured background-noise distribution, which is analyzed in section 5.2, has a RMS width σ b of about 0.6 ph. e − per camera pixel. Since we integrate the fluorescence images along the direction transverse to the 1D optical lattice (see Sec. 4.1), the variance of the background noise σ 2 b is multiplied by the number of transverse pixels n ⊥ (typically n ⊥ ∼ 40). Hence, based on equation (4) we expect a localization precision of ∆x ∼ 60 nm, which is sufficiently smaller than the separation between two lattice sites. By using longer exposure times it is possible to improve the resolution even further, however, at the cost of decreasing the duty cycle and increasing the probability for atoms to either hop to adjacent lattice site or to be lost because of heating and background gas collisions. Moreover, a slow drift of the entire lattice with respect to the imaging system (≤ 20 nm/s [5]) is responsible for the existence of an optimal exposure time (estimated around 2 s for our system) beyond which the localization precision deteriorates instead of improving, if the lattice drift is not suitably tracked and accounted for. Such drifts are especially notable in case of imaging systems with very high optical resolution, as recently demonstrated through the measurement of the Allan variance associated with the position uncertainty of trapped ions [34].

The EMCCD detector
In our experiment, the fluorescence signal of the atoms is detected using an electron multiplying CCD (EMCCD) camera (Andor iXon DV897DCS-FI), whose read-out noise is more than one order of magnitude smaller compared to that of scientific-grade CCD sensors. In fact, scientific-grade CCD sensors are at present limited by a background noise σ b > 6 ph. e − per pixel. The increased noise would deteriorate the localization precision estimated for our system by a factor of 6 (see Eq. (4)), therefore preventing reliable singlesite resolution. To detect signals of few photons, such as fluorescence of single atoms, alternative types of imaging sensors exist, which include intensified CCD sensors and CMOS sensors. In appendix A we provide a review of sensors suited for few-photon-signal detection, and discuss technical noise sources inherent the different technologies are discussed. Our EMCCD camera employs a front-illuminated, frame-transfer, buried channel CCD sensor (L3Vision CCD97) containing 512 × 512 active pixels with a pixel size of 16 × 16 µm 2 . A measurement comparing front-with back-illuminated sensors is given in appendix B. The quantum efficiency at the imaging wavelength λ f with the EMCCD sensor cooled to −70 • C is QE(λ f ) ∼ 30 %. It should be noted that the efficiency decreases at lower temperatures for wavelengths > 800 nm due to a temperature dependence of the silicon band gap [50].
In EMCCD sensors, suppression of read-out noise is achieved through the serial electron multiplying (EM) register, which amplifies charges by impact ionization at each shift step, similar to a staircase avalanche photodiode (see App. A). The EM register of our camera comprises N = 536 shift steps. Even though the probability of impact ionization at each individual step is only p imp ∼ 1.5%, due to the large number of steps, the mean gain of the cascaded multiplication process g = (1 + p imp ) N can reach values well above 1000. The effect of its stochastic nature on the detection noise is discussed in section 5.1.

The atomic sources
To acquire fluorescence images with the detection apparatus described in section 3 we trap atoms in a deep lattice potential with lattice constant a = 433 nm and illuminate them with a three-dimensional optical molasses. The saturation parameter of the optical molasses and the lattice trap depth (U/k B = 0.4 mK) are chosen as such to prevent atoms from hopping along the lattice direction. Figure 3(a) exemplarily shows a fluorescence image of eight trapped atoms, which are loaded into a 1D optical lattice in stochastic positions and subsequently imaged with an illumination time of 1 s. The intensity distribution for each atom exhibits a characteristic elliptical shape elongated along the radial direction of the optical lattice with an aspect ratio of about 6:1 (FWHM along the axial direction of 1 µm). The elongated shape originates from the thermal motion of trapped atoms (∼ 30 µK by sub-Doppler cooling in the optical molasses) in the radial direction, along which the confinement is weaker. Along the lattice direction, instead, trapped atoms can be regarded as localized point sources with a Dirac-delta longitudinal distribution with a certain radial intensity distribution O(y), because the extent of the axial thermal motion (FWHM ∼ 60 nm) as well as the drift of the optical lattice (≤ 20 nm/s [5]) is one order of magnitude smaller than the optical resolution. Being primarily interested in extracting the precise position of atoms along the optical lattice, we integrate the acquired images along the radial direction (I[x i ] = j I[x i , y j ]) as depicted in figure 3(b), which reduces the deconvolution problem to a one-dimensional one. The continuous curve overlapped with the integrated fluorescence signal shows the end result of the parametric deconvolution problem presented in section 6, which yields the atoms' positions with single lattice-site precision. Figure 3(c) shows the residuals between the reconstructed distribution and the measured signal, normalized to the expected noise strength. The uniform distribution of residuals with RMS spread around one attests the quality of the parametric deconvolution, which is ensured by an accurate knowledge of the LSF of the imaging system as well as of the noise model. The methods to reconstruct the LSF are presented in following section 4.1, while the physical noise model is presented in section 5 and the parametric deconvolution process is illustrated in the last section 6.

Reconstructing the line spread function with sub-pixel resolution
One key element to achieve a resolution beyond the diffraction limit is the accurate knowledge of the response of our imaging apparatus. More precisely, it is important for the parametric deconvolution problem to know exactly the imaged fluorescence intensity distribution of a single illuminated atom. The importance of an accurate knowledge of this distribution is quantitatively demonstrated in section 6.4. In a 1D optical lattice, the problem of reconstructing the positions of atoms can be reduced to one dimension by exploiting the factorized form of the single-atom distribution in equation (5). In fact, a single atom positioned at x = 0 yields (see Eq. (1)) a fluorescence distribution that integrated along the radial direction reads where L(x) = ∞ −∞ P(x, y) dy represents the so-called line spread function. As argued in section 2, the response function required in the deconvolution problem is the convolution of the optical line spread function L(x) with the 1D CCD pixel function R p (x) [51], In the following we present our method to reconstruct the L CCD function with increased signalto-noise ratio and sub-pixel resolution, which is based on superimposing multiple intensity distributions of sufficiently isolated atoms (for example, the rightmost atom in Fig. 3(b)). The superimposing process is generally referred to as image registration in digital signal processing.
We make use of a recursive algorithm to process single-atom images, whose end result should ideally converge to L CCD in equation (7). The algorithm is composed of a preparatory procedure followed by an iterative one. The first step of the preparatory procedure consists in identifying those regions of interest containing exactly one atom well separated from other atoms by several Abbe radii (typically 10) in order to allow us not only to reconstruct the central peak of the LSF but also the wings containing the diffraction fringes. In the next step, we apply a Fourier filter to each single-atom image to remove high-spatial-frequency noise. The filter utilizes the fact that every optical system with a hard aperture has a cutoff in the optical transfer function (OTF), defined as the Fourier transform of L CCD , exactly at the Abbe frequency 1/r A = 2NA/λ f . After discrete Fourier transformation (DFT) of the integrated intensity distributions, the filter sets the amplitude of all frequencies beyond the Abbe cut-off (typically > 1.2/r A to reduce Fourier artifacts) to zero because these frequencies components do not carry physical information (OTF = 0 in this region). The effect of Fourier filtering is significant for our imaging system because the Abbe frequency is three times smaller than the Nyquist frequency of 0.5 pixel −1 the frequency up to which noise appears if not filtered out. The last step of the preparatory procedure to reconstruct the LSF consists in interpolating the noise-filtered single-atom distributions with sub-pixel resolution, which allows us to reposition them in the subsequent iterative procedure with high precision. Because of the finite bandwidth of the OTF, the integrated fluorescence signal can be interpolated with an arbitrary spatial resolution using the Whittaker-Shannon interpolation formula: We extend the DFT fluorescence distribution in Fourier space beyond the Abbe cut-off with zero values (zero padding), so that the number of points in the Fourier space is increased by an integer factor s with respect to the original number. The inverse DFT of the zero-padded signal results in an upsampled distribution, where the width of a sub-pixel is equal 1/s of the original pixel's width. The size of the sub-pixel is chosen smaller than the estimated localization precision (typically s = 8 so that 1/8 pixel = 37 nm < ∆x, see Sec. 3.2). An alternative yet equivalent application of the Whittaker-Shannon interpolation formula operates directly in position space by convolving the spatial distribution with a sinc function.
The iterative part of the reconstruction algorithm consists chiefly of two steps. In the first one, we obtain the position of each atom by a non-linear least squares fit of the model distribution L CCD to the recorded fluorescence signal (see Sec. 6 for more details). The precise (unrounded) value of the atom position is used to shift and align all noise-filtered sub-pixelinterpolated intensity distributions. Hence, superimposing all images gives a reconstruction of the fluorescence distribution of a single atom with a signal-to-noise ratio enhanced by a factor √ N at , where N at is the number of superimposed single atoms (typically a few hundreds). The reconstructed distribution L guess provides us with a new estimate of L CCD . The iterative algorithm stops when no change is observed (typically after 5 to 10 iterations). For the first iteration, we use a Gaussian function to determine the position of single atoms in case no LSF function is a priori known.
A mathematical derivation (see App. C) shows that this algorithm converges to instead of the desired expression in equation (7), where R x is the probability distribution of the non-linear least squares estimator of the single-atom position for an atom ideally positioned in the origin x = 0 (with a RMS width ∆x ∼ 60 nm, see Sec. 3.2), and R sp the sub-pixel function equivalent to the pixel function R p but s times narrower. Because the "blurring" effect by both additional convolutions in equation (8) is on the order of a few tens of nanometers, we conclude that L guess (x) ∼ (R p * L)(x) to a good approximation. For precise reconstruction of the L CCD function, equation (8) shows that it is advantageous to increase the illumination time in order to decrease ∆x. The reconstructed LSF and the related modulation transfer function (MTF = |OTF|) are displayed in figure 4 and analyzed in detail in the following section.

Analysis of the reconstructed line spread function
Besides its importance to retrieve the atoms' positions with the maximum localization precision (see Sec. 6.4), the line spread function contains valuable information about the performance of the optical system. Since the influence of R p in equation (7) is small (ensured by the Nyquist-Shannon condition r A > 2∆ s ), the optical line spread function L is well approximated by L CCD . Figure 4(a) shows the reconstructed LSF obtained with the algorithm outlined in the foregoing section. In case of an aberration-free imaging system, the point spread function is described by the well-known Airy disk, whose corresponding LSF is displayed for comparison in the same figure. Besides an overall agreement, the reconstructed LSF exhibits a lower maximum and a distinct asymmetry such that the higherorder diffraction peaks are only visible on the left-hand side. These differences arise from wavefront distortion caused by optical aberration. Mathematically, the point spread function is defined by computing the modulus square of the Fourier transform of the electric field (wavefront) at the pupil (Fraunhofer diffraction). The wavefront contains all information about optical aberrations and can be expressed in the basis of Zernike polynomials [43]. To Table 1. Result of the wavefront fitting to the measured LSF expressed in terms of low-order Zernike polynomials. The overall wavefront distortion is obtained by adding the different contributions in quadrature. 1D fitting of our model to the LSF cannot prevent a certain ambiguity on the identification of wavefront distortion angles (not displayed).

Defocus Astigmatism
Coma Trefoil Spherical (1) gain insight into the nature and amount of the optical aberrations affecting our optical system, we fitted to the reconstructed LSF the one obtained from a wavefront expansion in low-order Zernike polynomials up to spherical aberration. The fitted LSF is displayed in the same figure, demonstrating a remarkable agreement with the experimental curve. The fit analysis reveals that the leading aberration contribution arises from astigmatism. The detailed list of Zernike coefficients is given in Tab. 1. Combining all contributions in the table yields an overall RMS wavefront error of ∼ λ/17 (whereas the peak-valley deviation is λ/3), which corresponds to a Strehl ratio of 0.87 defined as the ratio between the maxima of the measured PSF and the ideal one. Note that the Strehl ratio, in general, differs from the ratio obtained analogously for the 1D LSF (about 0.92, see Fig. 4). In addition, the wavefront analysis gives an estimate of the actual numerical aperture of the optical system, NA = 0.228 (3). The deviation between the estimated numerical aperture and the one of the objective lens design (NA = 0.29) is most likely caused by clipping at the knife-edge apertures along the imaging path, see figure 2. Concurrently with this work, a similar wavefront analysis based on Zernike polynomials has been carried out to characterize the aberrations affecting two-dimensional fluorescence images of single trapped ions [34]. Figure 4(b) shows the modulation transfer function of the reconstructed LSF compared to that of an aberration-free optical system and of the fitted wavefront model. The MTF of an optical system with a hard aperture has a simple, direct geometrical interpretation, which explains the shape as well as the hard cut-off. In general, it can be shown that the MTF is directly computed by convolving the pupil function with itself, where displacements of the electric field distribution in the convolution integral directly translate into spatial frequency units of the MTF [35]. Therefore, an optical system with a hard aperture, outside of which the pupil function is strictly zero, results in a cut-off of the MTF at the Abbe frequency. This cut-off also provides a direct method to extract the actual NA of the optical system without resorting to fitting wavefront distortions.

Isoplanatic regions
The deconvolution problem described in section 2 relies on a translationally invariant response of the optical system. However, in real systems the LSF varies over the field of view because of optical aberrations primarily due to coma. Due to spatial variations, the localization precision of the atoms' positions is reduced if a single LSF is employed over the full field of view. In the literature, regions over which the LSF remains effectively unchanged are known as isoplanatic regions. To characterize the homogeneity of the LSF of our imaging system, we divide the full CCD region into five patches, each spanning over 100 pixels, where we reconstructed the LSF individually for each patch using the algorithm presented in section 4.1, see figure 5(a-e). The first three patches exhibit minor changes in the optical response, while the rightmost one shows a clearly visible broadening and decreased peak hight. Fluorescence images of atoms analyzed in this manuscript are from the three leftmost regions only. To even account for minor spatial deviations, the first three regions are further divided into sub-patches, each of them with an individual local LSF used to reconstruct atoms' position. The result of the wavefront-deviation analysis of the sub-patches, according to the method presented in the previous section, is illustrated in figure 6.

Modeling the noise sources
In order to identify the exact lattice-site locations of individual atoms with high reliability, not only the optical response of the imaging system should be precisely known (see previous Sec. 4) but also an accurate model of all relevant noise sources should be constructed. A well-designed imaging apparatus should strive to attain a RMS noise limited by fluorescence photon shot noise, which represents the true fundamental noise contribution. In general, this can be reached (as is the case of this work) by understanding, and suppressing where required, the technical noise contributions affecting the image formation. Moreover, one should also be aware of the excess noise adding on top of shot noise, which is caused by the EM register in EMCCD cameras. This noise contribution effectively decreases the signal-to-noise ratio by a factor √ 2 and cannot be simply eschewed as it is intrinsic to the technology of EMCCD cameras. Alternative detectors such as CMOS cameras, which also feature small read-out   noise, are discussed in Appendix A. In this section, we discuss the relevant noise sources and show that the next noise contribution after fluorescence photon shot noise is photon shot noise caused by background stray light (≈ 0.5 ph. e − / pixel). A summary of individual noise components with their scaling and quantitive estimates is provided in table 2. Photo-response non-uniformity (PRNU) is caused by variations in the pixel geometry and in the substrate material across the CCD chip. In back-illuminated EMCCD sensors, this also includes the so-called etaloning effect due to interference fringes in the a back- Table 2. Noise contributions affecting single-atom imaging, a EMCCD detector cooled to −70 • C, and 1 s exposure time. The overall noise σ is obtained from equation (9), which takes into account the EM amplification factor g and the excess noise factor F. RMS noise values extracted from: (a) figure 8, (b)  Read-out noise occurs in the charge-to-voltage amplification and analog-to-digital conversion process. Because this noise component σ readout is not amplified by the EM register, it is effectively suppressed by setting the multiplication gain to large values.

Noise sources in the detection process
Dark current noise arises from thermally generated charges. Buried-channel sensors are affected by two contributions bulk and surface dark currents depending whether electron-hole pairs are generated in the depletion region or at the silicon-silicon dioxide interface. Midgap edge states, either localized in the proximity of bulk impurities or at the front interface, strongly enhance the probability of electrons to thermally hop from the valence to the conduction band through a two-step trap-assisted process [52]. A third contribution, in general negligible at low temperatures, comes from diffusion of minority carriers (electrons) from the p-doped silicon substrate into the depletion region. Because of the large continuum of edge states at the interface with the silicon dioxide layer, the surface contribution dominates by about two orders of magnitude. Since all dark currents are amplified by the EM register, cooling of the CCD sensor is necessary to detect signals with few photons. For Peltier-cooled sensors (T > −100 • C), the contribution by surface dark currents is suppressed in inverted-mode EMCCD chips by applying a large negative bias voltage (multi-pinned phase mode [53]), which creates an inversion layer of holes at the surface preventing electrons from filling the midgap states, and thus suppressing the charge generation process. Fluctuations of the number of thermally generated electrons in the bulk, S therm , adds a Poissonian noise component with RMS noise σ therm = S therm 1/2 .
Clock-induced charges are a spurious electronic signal, S CIC , generated during the charge transfer process in the CCD sensor when the clock voltage switches the pixel from inversion to non-inversion mode. The process accelerates holes at the inversion layer back to the heavily doped p-type regions (channel stops), which produce charge carriers through impact ionization. Despite their ubiquitous presence in every CCD sensor, Clock induced charges (CICs) can only be detected in EMCCD sensors due to the extremely low read-out noise. CICs are reduced by high parallel transfer rates, small slew rates, and small clock swing [54], while they cannot be suppressed by cooling the EMCCD chip (the probability of impact ionization even increases with lower temperatures). By advanced clock-waveform shaping, modern EMCCD cameras can strongly reduce CICs produced by the vertical transfer process [55]. CICs produced in the serial and multiplication register cannot be simply explained in terms of impact ionization by the accelerated holes. In modern EMCCD cameras, the generation probability of horizontal CICs per shift step results similar to that of vertical CICs [56] in spite of what was originally deemed [57]. The stochastic generation of CICs is Poissonian distributed with a RMS noise denoted by σ CIC = √ S CIC [58].
Excess noise arises from the stochastic nature of the gain process in the EM register of EMCCD cameras, which causes an asymmetric broadening of noise distributions. The resulting noise distribution after amplification by the EM register has an RMS noise increased by the so-called excess noise factor (denoted by F), which tends to √ 2 for a large number of multiplication stages (g 1), as can be shown [59].
The overall signal measured by the EMCCD camera is the sum of all components, S [x i , y j ] = S fluo [x i , y j ] + S stray + S therm + S CIC , whose variance σ 2 is the quadrature sum of all individual noise components, Note that σ 2 [x i , y j ] is also the variance of [x i , y j ], which is defined as [ (3). Equation (9) shows that high EM gains g improve the signal-tonoise ratio for signals of few photons by effectively removing the read-out noise component σ readout , which in CCD sensors dominates over σ fluo , instead. In turn, EMCCD cameras are affected by excess noise through the factor F, which effectively halves the quantum efficiency.

Noise model
The knowledge of each individual noise contribution and their physical scaling with respect to the spatially-dependent fluorescence signal S fluo [x i , y j ] (see Tab. 2) permits to construct a noise model, which is used in the parametric deconvolution process in section 6.3. Based on the functional dependence of σ on S fluo , we rewrite equation (9) in the form where the RMS background noise σ b as well as the coefficients c 1 and c 2 are parameters to be experimentally determined. When the signal S fluo is given in photoelectron units, the coefficient c 1 is directly given by the excess noise factor F = √ 2. The coefficient c 2 is compatible with zero for our experimental apparatus as shown in the following.
We determine the coefficient σ b by analyzing the background noise of the imaging system from a series of images recorded without atoms in the optical lattice (see inset of figure 7). The histogram in figure 7 shows the background signal per pixel binned by signal strength in units of photoelectrons. The characteristic shape of the histogram mainly arises from read-out noise (Gaussian peak) and stray light (exponential tail). Calculating the RMS width of the recorded histogram yields the desired coefficient σ b = 0.6 ph. e − / pixel for 1 s illumination time. To obtain quantitative insight into the different noise components of the background signal, we model it in terms of Poisson-distributed clock-induced charges, which are stochastically amplified through the EM register, on top of which Gaussian-distributed read-out noise is added [60]. The red line in figure 7 shows the fitted model reproducing closely the recorded background-noise histogram, whereas the dashed blue line shows the same model fitted to a background-noise histogram for images recorded with the camera shutter closed. Due to the blocked stray light, the exponential tail is reduced by more than one order of magnitude. However, it does not fully vanish because of dark currents and primarily clock induced charges (see Sec. 5.1). Fitting the background-noise histogram allows not only a more accurate estimate of the RMS background noise (σ b ), but also a precise calibration of the EM gain g, which is a free fitting parameter. In fact, the parameter g enters the model through the probability to record y electrons after the EM register for x initial electrons comprising both spurious electrons and photoelectrons which is given by [61] To validate the noise model given in equation (10), we perform a measurement of the signal-to-noise relationship [62]. Based on > 1000 sets of five consecutive fluorescence images (3 s exposure time each) containing a small ensemble of atoms, we estimate for every CCD pixel the average signal as well as the standard deviation (RMS noise) in each set of images. Note that we consider only image sets where the distribution of atoms remains constant (neither atom hopping nor losses). The cloud of dots in figure 8 shows the correlation between the estimated average signal and the estimated RMS noise, both expressed in photoelectron units. By binning the signal-to-noise data points by their signal strength, we obtain a precise reconstruction of the signal-to-noise relationship, as shown in the figure. The measurement is in good agreement with the noise model calculated using the coefficients σ b , c 1 and c 2 given above, therefore confirming the square-root dependence of the RMS noise on the fluorescence signal strength S fluo . Furthermore, because no linear dependency is discernible in the recorded signal-to-noise relationship, we set c 2 to zero.

Localization of atoms by parametric deconvolution
We retrieve the position of atoms in the optical lattice using a parametric deconvolution process, which comprises several stages:

Counting atoms in regions of interest
The knowledge of the exact number of atoms is a necessary prerequisite in order to determine the positions. While the identification of the number of atoms is relatively straightforward when the atoms are well separated from each other, e.g. by counting the number of peaks in the intensity distribution, it is more difficult when the atoms cluster together with separations smaller then the optical resolution. In such a situation, the peaks of individual atoms are no longer discernible. Hence, instead of counting peaks, we estimate the number of atoms from the total number of recorded photoelectrons normalized to that of a single atom. Figure 9 shows the histogram of the integrated photoelectrons of a large number of ROIs, exhibiting well-separated equidistant peaks. Each peak corresponds to a different number of atoms m, with the leftmost peak marking the number of photoelectrons per atom (about 1300 ph. e − /s). The continuous curve in the figure shows the best fit to the recorded histogram based on the sum of seven Gaussian distributions combined with a homogeneous background, which is added to account for atom losses during the exposure time corresponding to the flat background. The width of the peaks obtained from the fitting model (see inset in figure 9) increases with √ m. This broadening implies that adjacent peaks with m > 5 overlap significantly, thus preventing an unambiguous identification of the exact number of atoms m. This is one reason why we divide every integrated fluorescence distribution into smaller, well-separated ROIs, each containing at least one atom, as shown in figure 3(b). The width of each ROI is determined by thresholding method known as image segmentation algorithm. Besides, the subdivision in ROIs with a small number of atoms is also beneficial to reduce the computation time of the non-linear last squares estimation of positions, which scales quadratically with the number of atoms in a ROI.
The parametric deconvolution problem requires as a precondition that the number of atoms is correctly determined. Therefore, it is important to identify acceptance regions R i of the integrated photoelectron signal where the statistical hypothesis H i the analyzed ROI contains precisely i atoms is verified with a probability higher than certain desired confidence levels. Moreover, we should also take into account the additional statistical hypothesis H 0 that one or more atoms are lost during the exposure time. Referring to the fitting model in figure 9, the i-th Gaussian function describes the probability distribution when H i occurs, while the homogenous background models the probability distribution when H 0 occurs. The acceptance regions R i (shaded regions in the figure) are obtained by maximizing their width (i.e., the probability P(R i |H i ) defining the power of the statistical test) under the provision that the statistical confidence remains higher than a certain desired confidence level (percentage numbers in the figure). The confidence level for the region R i is defined as the probability that the hypothesis H i is verified when the integrated photoelectron signal is detected in R i , namely P(H i |R i ) [63]. From the analysis of acceptance regions in figure 9 we find that, given a certain confidence level, the width of the acceptance region is strongly determined by atom losses (homogeneous background). This result shows that is beneficial to minimize the atom loss probability during the illumination process (∼ 1 % in our case). By post-selection analysis, we only retain ROIs for which the integrated photoelectron signal lies inside one of the acceptance regions. Higher confidence levels can be chosen, but this implies 97  narrower regions R i and, therefore, more ROIs post-selected out. In order to achieve for the one-atom peak confidence levels > 99 % with small rejection rates (< 1 %), we segment the fluorescence image in smaller ROIs than those considered in the example displayed in figure 9.
The described method to determine the number of atoms, which relies on the total number of photoelectrons in each ROI, is relatively robust and simple to implement. However, it does not make optimal use of physical information contained in the image since spatial information is lost after the integrating photoelectrons for each ROI. It is possible to improve the accuracy by incorporating the spatial information of the recorded fluorescence images through a Bayesian updating algorithm [64].

The model distribution
When illuminated by nearly resonant light, atoms in a deep optical lattice behave like identical light sources positioned at certain sites of the lattice. Supposing that the number of atoms m is exactly known (see Sec. 6.1), we model the integrated signal of equation (3) as where ξ l are the positions of the atoms, and the amplitude factors A l account for inhomogeneities from the the illumination lasers as well as from atom losses during the exposure time. L CCD is the response function of the imaging system defined in equation (7) representing the fluorescence distribution of a single atom with sub-pixel resolution (numerical interpolation between sub-pixels permits its evaluation for any realvalued argument). Because we perform background subtraction on all integrated intensity distributions, the model in equation (12) does not require an additional constant offset. In addition, relying on the discreteness of positions in the optical lattice, we can express ξ l as where p l are the desired integer positions in lattice-site units, a is the separation between lattice sites in CCD pixel units, and δ L is the position offset of the optical lattice. For small optical aberrations (see Secs. 4.2 and 4.3), it is sufficient to consider a single value of a (1.47 pixels per lattice site) for the entire field of view. Moreover, losses by light-induced collisions prevents the detection of two (or more) atoms in the same lattice site in deep optical lattices [65,66]. Hence, p l p l for any pair of atoms l and l .

The parametric deconvolution process
To retrieve the atoms' positions, we employ a non-linear least squares optimization algorithm, which fits the model distribution I M in equation 12 to the recorded fluorescence distributions. This parametric deconvolution approach allows us to make optimal use of physical information contained in the response function of the imaging system and in the noise model. However, non-linear least squares optimizations require well-chosen starting parameters in order to guarantee the convergence to the global optimum. The parameters of our model are the amplitudes A l and positions ξ l of atoms. While an initial estimate of A l can be directly obtained from the average number of photoelectrons per atom (see Sec. 6.1), an estimate of positions ξ l demands a separate procedure. Hence, to obtain the first estimate of positions for the non-linear least squares optimization, we make use of the Wiener deconvolution combined with a spectral density estimation algorithm (MUSIC algorithm).
Wiener deconvolution. The main idea underlying our approach to obtain an estimate of ξ l is understood by considering the Fourier transform of the model distribution in equation (12), where In order to obtain f [k] avoiding noise amplification, we employ the Wiener deconvolution algorithm [67], which computes where SNR[k] = f [k] 2 /σ 2 is the signal-to-noise ratio defined as the ratio between the estimated deconvolved signal f [k], which is obtained by applying the filter iteratively (typically 10 iterations), and the integrated noise power in the analyzed ROI, which is estimated as σ 2 = n ⊥ n σ 2 b + F 2 N (see also Sec. 5.2). We recall that n ⊥ and n represent the number of CCD pixels in the ROI in the direction transverse and parallel to the lattice, respectively, and N is the integrated number of photoelectrons. The Wiener filter factor in equation (14) is relevant only at higher frequencies, while it is approximately 1 at lower frequencies because SNR[k] is very large (typically > 100) and MTF[k] ∼ 1 for k r A 1.
Spectral density estimate (MUSIC algorithm). Several methods exist in the literature to estimate the spectral density from a noisy signal f [k]. The simplest method known as periodogram method employs a discrete Fourier transform of f [k] to determine the n dominant Fourier components, whose frequencies yield an estimate of the positions ξ l . However, this method suffers from known deficiencies such as being a biased estimator and exhibiting spectral leakage. More refined methods known as subspace methods have been developed for the estimation of the spectral components when the signal contains exactly m dominant Fourier components (i.e. m atoms) with amplitudes well above the noise background. Among the subspace methods, the so-called MUSIC algorithm (multiple-signal classification) has been identified as the one exhibiting the highest spectral resolution [68]. MUSIC yields a pseudospectrum f [k] exhibiting a negligible bias in case of sufficient signal-to-noise ratio [69] and not suffering from spectral leakage in contrast to non-parametric methods (e.g., periodogram). In particular, the strength of MUSIC algorithm for the first estimation of the atoms' position resides in its robustness against noise disturbances and in the fact that no prior knowledge of the parameters (i.e. the atoms' positions) is required. This differs from leastsquares minimization procedures, which require good starting parameters to ensure a rapid converge to a global minimum. While our implementation of MUSIC algorithm requires the prior knowledge of the number of atoms, m, in the ROI, extensions of the algorithm exist in the literature that also estimate the number of sources [70], which could be helpful to handle very large ROIs with high filling factors. The solid line in figure 10 displays the estimated power spectral density obtained from the MUSIC algorithm applied to the fluorescence distribution shown in figure 3. The pseudospectrum exhibits eight sharp peaks much narrower than the diffraction-limit width, each approximately centered on the atoms' positions (note the logarithmic scale in the figure). The figure shows that the positions estimated by the MUSIC algorithm are very close to those determined by the more accurate non-linear least squares estimator, which takes into account the dependence of noise on the signal.
Non-linear least squares estimator. We use the position of the atoms estimated by the MUSIC algorithm as input parameters of the non-linear least squares minimization where S fluo [x i ] is the background-subtracted fluorescence distribution, I M [x i ] the model distribution given in equation (12), σ the noise model presented in equation (10), and n is the number of pixels in the 1D ROI. In the minimization problem of equation (16), the positions ξ l are treated as real-valued free parameters (compare Sec. 6.4). Furthermore, we use the model distribution I M instead of the measured signal S to estimate the noise variance σ 2 at the pixel x i because the model function provides a better estimate of the fluorescence signal after a few iterations of the least squares minimization. Several algorithms exist to carry out the minimization in equation 16 such as the Levenberg-Marquardt method. In our case we employ a trust-region algorithm, which allows us to constrain the amplitude parameters A l to physical boundaries (typically five times the width of the one-atom peak in Fig. 9). An example of the least squares parametric deconvolution is shown by the solid red line in figure 3(b). The accurate model in equation (12) constructed from the measured LSF and the weighting factors in equation (16) accounting for the correct variance of noise in each pixel ensure flat residuals with a variance around 1, as displayed in figure 3(c). For normal distributed residuals, the nonlinear least squares fit is equivalent to a maximum-likelihood estimator of positions [30], which defines the gold standard concerning the extraction of physical information from fluorescence images, as argued in section 1. Because each 1D pixel of the integrated signal carries a large number of fluorescence photoelectrons (see Fig. 3(b)), the dominating Poisson-distributed shot noise is well approximated by a Gaussian distribution. However, excess noise in the EMCCD camera causes non-Gaussian deviations, which can be seen, for example, in the logarithmic graph of figure 7. Even neglecting this super-Poissonian noise characteristic, previous work using EMCCD cameras reports localization of single emitters with a precision attaining the Cramér-Rao information bound [71]. It is the purpose of future work to refine our estimation of the atoms' positions by maximizing the appropriate likelihood function in order to account for the EM excess noise [72] as well as for the Poissonian statistics in the limit of very small signals [73]. In addition, we find that the distribution of the sum of squared residuals obtained by analyzing the positions of atoms in > 5000 ROIs is well described by a χ 2 distribution with n − 2m degrees of freedom. This result suggests that the minimization procedure of equation (16) approaches the limit of the maximum-likelihood estimator of the atoms' positions.

Enhancing the localization precision at higher filling factors
The parametric deconvolution method outlined in section 6.3 works very reliably in case of ROIs containing only a few atoms separated by several lattice sites. This is the situation, for example, of single-particle experiments such as quantum walks [12,18,19]. However, the determination of the atoms' positions is less reliable for atoms clustered in small ensembles, where the lattice filling factor approaches unity [4]. In experiments investigating strongly interacting particles, it is particularly important to reconstruct the atoms' positions with a high reliability also when the spacing between particles is close to, or is even less than, the optical resolution of the imaging system [13]. By taking the discreteness of the lattice into account, we demonstrate that the previously presented parametric deconvolution method can be extended to achieve high success rates also for small ensembles of atoms that are closely packed. As argued in section 6.2, the fact that atoms are trapped in an optical lattice provides us with two pieces of information: (1) the distance between two atoms can only be a multiple of the intersite separation a (see Eq. (13)) and (2) two or more atoms cannot occupy the same lattice site. To exploit the discreetness of the optical lattice, the lattice constant a (433 nm) needs to be precisely known in units of CCD pixels. Its value can directly be computed from the magnification factor (see Sec. 3.1) and the pixel size (see Sec. 3.3) or more accurately measured by analyzing the distribution of distances between two atoms, which are obtained using the deconvolution method presented in the previous section.
To include the constraints (1) and (2), we adopt the following procedure: the positions of the atoms are initially determined by the parametric deconvolution process described in the previous section, and rounded to an integer multiple of lattice sites. We subsequently produce an array of all combinations of distances between atoms, where each distance is let vary by ±1 lattice site with respect to the initial estimate. Furthermore, we exclude all combinations where two atoms occupy the same lattice site. For each combination of distances, we perform a non-linear least squares minimization of equation (16) with the amplitudes A l and the lattice position offset δ L as the only free parameters. We finally choose the combination of distances with the maximum likelihood, which provides us with the best guess of the positions of atoms. Moreover, the χ 2 distribution of the sum of squared residuals (see Eq. (16)) allows us to perform a likelihood-ratio test (e.g., Neyman-Pearson lemma) that rejects the reconstructed positions if the statistical confidence lies below a certain specified value. A related approach has also been reported in reference 11, however, without discussing how noise contributions are handled in the deconvolution problem.
In order to quantitatively benchmark the reliability of the discrete deconvolution method against the continuous one presented in the previous section, we employ Monte Carlo simulations of fluorescence images, which provide large statistics and the exact knowledge of the true positions. To simulate a pattern of atoms in the lattice, the model distribution in equation (12) is used to construct the fluorescence images, where the noise is randomly drawn to reproduce shot noise fluctuations of the fluorescence signal and the background noise distribution shown in figure 7. Our Monte Carlo simulations also incorporate the stochastic fluctuations produced by the EM register. In particular, we simulated four atoms equally spaced with the spacing varying from one to nine lattice sites. Figure 11 shows the success rate in determining the correct distance between the four atoms. The analysis of simulated images shows that the success rate of the continuous parametric deconvolution rapidly decreases for separations smaller than the Abbe radius r A (diffraction limit). The drop in the success rate is even more evident when a Gaussian function is used instead of the precisely reconstructed LSF L CCD , see section 4.1. In contrast, it is remarkable that the success rate of the discrete parametric deconvolution remains above 90 % for almost all lattice filling factors. Moreover, the success rate even increases at unity filling since the number of possible configurations of the four atoms is strongly reduced.
We recently developed a new atom resorting technique that allows us to deterministically position a small ensemble of atoms in any arbitrary pattern on the 1D optical lattice. The experimental details of the resorting technique are the subject of a future publication [20]. We employed this technique to reposition, on demand, four atoms along the lattice, thus reproducing the same distributions studied in figure 11, with the atoms separated by an equal number of sites (10, 5, 2 or even 1 lattice sites). Though based on a significantly smaller statistics, the experimental results are consistent with the theoretical findings obtained with a large number of simulated images, confirming an enhancement in the reliability of the parametric deconvolution if the positions are constrained onto the lattice.

Outlook
Optical diffraction imposes a stringent limit on the bandwidth of an optical system: physical information contained in the spatial frequency components above the Abbe frequency 1/r A is not captured by the imaging system. However, this physical information is not irremediably lost as long as prior knowledge of the structure of the imaged object is available. Advances in image processing techniques especially driven by the field of super-resolved fluorescence microscopy have demonstrated that in this case the higher-frequency components of the imaged object's spectrum can be extrapolated from the imaged distribution. This principle is what enables information retrieval with spatial resolution beyond the diffraction limit. The prior knowledge is necessary to solve the deconvolution problem, which ideally reconstructs the original object's distribution (and its entire spectrum) by eliminating diffraction-induced blurring and noise effects. In this article, we presented state-of-the-art methods that solve the deconvolution problem for fluorescence images of neutral atoms in optical lattices making optimal use of the prior information on the physical system (sub-pixel-reconstructed line spread function, accurate noise model, discreteness of the optical lattice). The image processing methods we developed are applicable to any experimental apparatus for fluorescence imaging of trapped atoms and ions, and can be directly generalized to two dimensions.
Our methods are particularly beneficial to improve the localization reliability in experiments with constraints on the number of scattered photons, or on the numerical aperture of the objective lens. For example, experiments imaging light fermions like lithium and potassium atoms suffer from low fluorescence scattering rates, which are generally more than one order of magnitude smaller than those achieved with heavier atoms like Cs and Rb. Recent experiments with light fermions have shown that even for large numerical apertures NA > 0.8 the number of photoelectrons recorded per atom is around 1000 for an exposure time of 1 s [14][15][16]. Similar yields of photoelectrons are obtained in our apparatus where we employ an objective lens with much smaller numerical aperture (NA = 0.23). By taking advantage of our deconvolution methods, these experiments can minimize the exposure time while ensuring a high reliability to localize each individual atom in the lattice. Short exposure times reduce the probability of atoms to hop to neighboring sites, thus avoiding localization errors as well as losses of atoms colliding inelastically with a second neighboring atom.
In our laboratory, the construction of a new experimental apparatus for imaging single atoms with much higher numerical aperture (NA > 0.9) is underway, which ensures a twenty times higher collection efficiency and a four times narrower point spread function. The analysis methods demonstrated in this article, when applied to the new imaging apparatus, should enable single-site resolution with unprecedentedly short exposure times (< 10 ms) [74], allowing us to directly discriminate between the two internal hyperfine states of Cs atoms (qubit states) [48,64,75]. Moreover, it is the goal of future work to investigate whether compressed sensing techniques, which rely on a completely different principle than our parametric deconvolution method, provide advantages for information retrieval beyond the diffraction limit [76,77].

A. Single-photon cameras
With current technology, the sensitivity limit of conventional CCD sensors is determined by read-out noise, which is produced during the processes of charge-to-voltage and analog-todigital conversion [78]. Low-noise CCD cameras cooled to low temperatures (< −50 • C) and operating at high read-out rates (> 10 Mpixel/s) have RMS values of read-out noise equivalent to 6 ph. e − . When operating at low read-out rates (< 20 kHz), commercial state-of-the-art CCD sensors can attain lower read-out noise around 2 e − , however, still above the value of one electron per pixel the ultimate limit for single photon imaging. Recently, research prototypes showed that sub-electron read-out noise could be realized in the near future [79].
For weak radiation sources (e.g. fluorescing single atoms) the amount of shot noise can amount to very few electrons. One would ideally need CCD sensors with sub-electron sensitivity to avoid that the read-out noise dominates over shot noise. To overcome the technical limit imposed by read-out noise, three major technologies have been developed over the years and found widespread application: electron multiplying CCD cameras and intensified CCD cameras, and more recently CMOS sensors. All these technologies rely on the preamplification of the physical signal the number of photoelectrons prior to the readout amplification stage. These three technologies are shortly reviewed in the following.
The basic idea underlying intensified CCD (ICCD) cameras dates back to the first half of the 20th century: It consists in employing a photocathode to convert the incoming photons into photoelectrons, whose number can then be multiplied by avalanche amplification. A gating electric field is used to precisely control the access of photoelectrons into a microchannel plate, where the avalanche multiplication process takes place. The electrons exiting the microchannel plate are accelerated towards a phosphor screen, upon which they recreate the same distribution of photons impinging at the photocathode. Secondary photons emitted by the phosphor screen are eventually imaged onto a low-noise CCD detector. Electron multiplication in the microchannel plate can readily reach amplification factors up to 10 4 , which allow read-out noise to be effectively suppressed down to values as low as 10 −3 e − /pixel. For a more detailed account of ICCD cameras, please refer to Ref. 80 and references therein.
The first practical demonstration of electron multiplying CCD (EMCCD) cameras has been provided in 2001 [81][82][83]. In essence, EMCCD sensors are CCD sensors equipped with a low-noise electron multiplying (EM) register in addition to the conventional register used to transport electron charges. In comparison to conventional CCD registers, the EM register uses a higher clocking voltage to provide the electrons with sufficient kinetic energy to cause impact ionization. Through avalanche multiplication, the EM register is able to stochastically multiply the number of electrons by factors in the range of a few thousands. The read-out noise is thereby effectively reduced on the order of 10 −2 e − /pixel, which is smaller than noise produced by dark counts and clock induced charges (CIC) [84]. For a more detailed account, please refer to Ref. [85] and references therein. In addition, both CIC and read-out noise can be further suppressed by hardware binning the CCD pixels along the vertical direction [86]. It is understood that spatial resolution along the binned direction is reduced depending on how many pixels are vertically binned together. Note that hardware binning is not exploited in this work.

A.3. Comparison between ICCD and EMCCD cameras
In contrast to EMCCD sensors, ICCD sensors are virtually insensitive to spurious CIC and thermal dark electrons since these processes occur after the amplification process. They are, however, sensitive to dark electrons generated in the photocathode (so-called equivalent background illumination), whose rate is generally small 1 e − /(pixel s) already at room temperature. Therefore, cooling of the sensor to low temperatures is not needed for short exposure times lasting only a few seconds. On the other hand, ICCD cameras have a lower quantum efficiency than EMCCD cameras (see also discussion in App. B) because of the lower sensitivity of photocathode materials, especially at longer wavelengths. For instance, at our fluorescence wavelength λ f = 852 nm, QE(λ f ) does not exceed 20 % at the present time. ICCDs allow much shorter gate times than EMCCDs, though this feature is not particularly relevant for imaging single atoms trapped in optical potentials. In addition, the finite radius of the microchannel plate in ICCD sensors limits the resolution to about 50 line pairs/mm; for the Abbe radius r A of our microscope objective, for instance, a magnification factor of the order of 50 is required to avoid affecting the overall optical resolution. A detailed comparison of noise properties has been published elsewhere [87][88][89].

A.4. CMOS sensors
Due to significant advances over the past two decades in manufacturing microscale, ultralownoise MOSFET devices, CMOS image sensors represent today an appealing alternative to conventional CCD detectors in low light imaging applications [90,91]. The basic element consists here of an active pixel sensor, which provides the charge-to-voltage conversion electronics and the transistors needed for voltage buffering and pixel addressing [92]. The absence of the CCD shift register enables faster parallel read-out rates and excludes the noise contribution caused by CIC. Read-out noise in CCD sensors is dominated by Johnson noise at the charge amplifier, whose white spectrum is inevitably fed into the large video bandwidth. In contrast, parallel amplification in CMOS sensors makes it possible to directly amplify the signal at the active pixel location, where the signal is formed. In such a way, the bandwidth of the source-follower and column amplifier used to amplify the charge signal into a voltage signal can be limited through a low-pass filter. This prevents high-frequency noise components to be amplified and fed through the high-bandwidth analog-to-digital conversion CCD counts (arb. un.) Figure 12. Interference fringes in a back illuminated chip under coherent illumination at λ f . The average peak-to-valley variation amounts to about 40% of the signal. A peak-to-valley fringe corresponds to a thickness change of the back-thinned silicon layer by less than 100 nm. The axes shows the spatial scale corresponding to the CCD chip.
circuitry [93,94]. Commercially available CMOS cameras exhibit read-out noise as low as 1 e − /pixel, which is nearly independent of the video frame rate. The presence of transistors in the pixel area significantly screen the silicon photosensitive area from the impinging photons, thus reducing the pixel's fill factor. To circumvent this problem, an array of microlenses is usually employed in scientific-grade sensors to efficiently collect photons in front of each pixel. Alternatively, back-illuminated CMOS sensors (see e.g. OmniVision Technologies, Inc.) can also be employed. A detailed comparison of CMOS cameras with CCD, ICCD, and EMCCD cameras has been carried out elsewhere [95,96].

B. Front vs. back illuminated CCD detectors
In silicon-based detectors, the quantum efficiency (QE) of CCD detectors is determined by the probability that an incoming photon is converted into an electron-hole pair inside the photosensitive region a region that is completely depleted of mobile charge carriers where electrons are efficiently collected into the pixel by means of a built-in electric field. For CCD detectors that are frontally illuminated, photons must first transverse the polycrystalline silicon structure of electrodes and a silicon-oxide insulating layer before they can reach the photosensitive region. Reflections and absorptions by the electrodes cause a reduction of QE, which can be avoided if the back side, i.e. the one opposite to the CCD electrodes, is turned towards the radiation source. This can be achieved by etching the chip to a thin layer around 10 − 20 µm thick. Back-illuminated back-thinned CCD detectors have thereby doubled the QE with peak values above 90 %. We have tested a commercial, state-of-the-art back-illuminated EMCCD sensor with read-out noise specified at < 0.05 e − /pixel and dark current noise at < 5 × 10 −3 e − /(pixel s) for a temperature below −65 • C. This sensor provides a QE of 60 % at the fluorescence wavelengths of 852 nm, which is about the double of that achieved by our front-illuminated camera. As a downside, the sensor displays interference fringes caused by multiple reflections of monochromatic photons at the interface between the substrate and the silicon-oxide layer and at the interface between the substrate and the air. In fact, the silicon substrate behave like an etalon plate at longer wavelengths where the absorption depth of silicon increases. The etaloning effect is evidenced in figure 12, where variations of the substrate thickness results in visible fringes with a contrast ∼ 40 %. We have verified that the interference pattern is stable and has only a weak dependence on temperature, as the interference pattern shifted by about half a fringe for a temperature change of 20 • C. Despite the large intensity variations across the sensor, the stability of interference pattern allows us to filter it out by dividing the recorded signal by a calibrated spatial mask.
Larger reverse-bias voltages (∼ 100 V) together with thicker substrates (∼ 100 µm) of high-resistivity silicon results in much wider photosensitive regions (deep depletion) [97]. Thicker depletion regions permit to enhance the QE especially in the near-infrared, where the absorption depth of silicon is >10 µm. Because of the higher thickness, photons are converted into electron-hole pairs before reaching the silicon-oxide layer, with the result that the etaloning effect is strongly suppressed. However, the suppressed etaloning effect comes along with about a one hundred-fold increase of dark counts, which can be suppressed by cooling to low temperatures < −50 • C. Up to the present, there exists neither CMOS nor EMCCD cameras that are based on a deep-depletion back-thinned sensor, which would be ideal for detecting small signals in the near-infrared.

C. Asymptotic limit of the iterative point-spread-function reconstruction
We study the LSF produced by the iterative reconstruction algorithm in the limit of infinitely many single-atom images that are overlapped with sub-pixel resolution according to the algorithm presented in section 4.1. Although the following discussion specifically focuses on the reconstruction of LSF in one dimension, analogous results can be demonstrated for the PSF in two dimensions.
The algorithm first constructs from the image of an atom positioned in x 0 a new realvalued distribution with sub-pixel resolution, where L CCD is the response of the imaging system defined in equation (7), is the additive noise, ∆ s is the spacing between CCD pixels, and R sp (x) is the sub-pixel function (one for −∆ sp /2 < x < ∆ sp /2 and zero elsewhere, where ∆ sp = ∆ s /s for some integer s). In essence, equation (17) maps the signal L CCD (n ∆ s − x 0 ) recorded at the n-th CCD pixel to a s-fold narrower signal R sp (x − n ∆ s ), yielding a continuous distribution in the real-valued variable x. The function L (k−1) guess , which is produced by the algorithm at the iteration k − 1, is fitted to the recorded fluorescence distribution to provide a maximum-likelihood estimatorx 0 of the atom position x 0 . This estimator is a stochastic variable due to the noise term in equation (17).
Because of symmetry reasons, its probability distribution is symmetrically centered on x 0 (unbiased estimator). Subsequently, the reconstruction algorithm translates with sub-pixel resolution the distribution in equation (17) by −x 0 , producing a new distribution I x 0 [x +x 0 ]. The algorithm adds all repositioned single-atom intensity distributions to yield a new estimate of the LSF where P(x 0 ) is the probability of the single atom to be in x 0 , and P(x 0 |x 0 ) is the conditional probability expressing the uncertainty distribution of the maximum-likelihood estimatorx 0 .
In the asymptotic limit of infinitely many images, the noise contribution averages out so that the expression of equation (18) takes the form This expression can be further simplified by assuming an isoplanatic response function of the imaging system such that P(x 0 |x 0 ) = R x (x 0 −x 0 ), where R x is the uncertainty distribution of the maximum-likelihood estimator of the atom's position. In addition, we make the physical assumption that P(x 0 ) is uniformly distributed, which is ensured by the incommensurability of the optical lattice with respect to the CCD array and by small drifts in the time of the optical lattice. The latter condition is particularly important to guarantee that all sub-pixels of the reconstructed LSF are equally sampled. After some algebra, equation (19) can be recast in the form which proves the expression used in equation (8).