Noise models for low counting rate coherent diffraction imaging

Abstract: Coherent diffraction imaging (CDI) is a lens-less microscopy method that extracts the complex-valued exit field from intensity measurements alone. It is of particular importance for microscopy imaging with diffraction set-ups where high quality lenses are not available. The inversion scheme allowing the phase retrieval is based on the use of an iterative algorithm. In this work, we address the question of the choice of the iterative process in the case of data corrupted by photon or electron shot noise. Several noise models are presented and further used within two inversion strategies, the ordered subset and the scaled gradient. Based on analytical and numerical analysis together with Monte-Carlo studies, we show that any physical interpretations drawn from a CDI iterative technique require a detailed understanding of the relationship between the noise model and the used inversion method. We observe that iterative algorithms often assume implicitly a noise model. For low counting rates, each noise model behaves differently. Moreover, the used optimization strategy introduces its own artefacts. Based on this analysis, we develop a hybrid strategy which works efficiently in the absence of an informed initial guess. Our work emphasises issues which should be considered carefully when inverting experimental data.


Introduction
Coherent diffraction imaging (CDI) is a class of microscopy method that circumvents the need of high quality optics.It is based on the calculation of a numerical lens to retrieve the quantitative sample image from coherently diffracted intensity measurements.The information obtained contains both the amplitude and phase distributions of the exit-wave field.This quantity can be related to various structural parameters such as absorption, dispersion, magnetization state, crystalline structure, etc. [1].Among the proposed CDI approaches, ptychography is particularly attractive since it allows the reconstruction of non-isolated objects, without a priori restrictions on the field of view [2] and without requiring any specific sample preparation.The ptychographic approach consists in scanning a sample across a finite-support beam and recording a diffraction intensity pattern for each probe position; assuming that the scan step is small enough, each point of the sample is encoded several times and in a different way.This redundancy ensures that the phase retrieval of the complex-valued diffracted field can be achieved.It is usually performed by iterative algorithms that combine the intensity patterns.
Successful results have been obtained with visible light [3,4], with soft [5] and hard [6][7][8][9] xrays, and in electron microscopy [10][11][12].Major advantages of the ptychography approach are linked to the absence of serious physical aberrations: the method is lens-less, does not require any reference beam or sample [13,14], and is robust to inaccurately known parameters that can be retrieved simultaneously with the object image.Examples of this last issue include the illumination function [7,15,16], the probe positions [16,17] and intensity fluctuations in the incoming beam [18].
However, as the approach is based on an iterative algorithm, it can face problems with convergence, uniqueness of the solution, etc.The successive iterations lead to a solution which is reached when the constraints resulting from the overlapping condition and the intensity measurements are satisfied simultaneously.In the presence of shot noise, such a solution does not exist as the different intensity patterns are not anymore consistent one with another.Low counting statistics are of key importance in the study for instance of radiation-sensitive objects (especially biological structures), or when the object scatters weakly, or when one attempts to obtain very high-resolution images although only few photons are scattered at the needed high angles.
In this work, we address precisely the question of the degradation of the solution that is obtained in a phase retrieval approach in presence of photon noise.While we study specifically the ptychographic scheme as an example, our methods and conclusions can be extended to the other iterative algorithm based lens-less imaging microscopies.We believe that the interested reader will find herein the material needed to adapt our approach to the case of support-based phase retrieval algorithm.
We begin by defining some common noise models in order to derive a fitting function by the mean of the maximum likelihood principle [19,20].A noise-model dependent reconstruction is thereby obtained by the minimization of the corresponding fitting functions.For this purpose, two different optimization strategies are examined, namely the ordered subset (OS) and the scaled gradient (SG).The former strategy is equivalent to the well known ptychographical iterative engine (PIE) when the additional assumption of a Gaussian noise model is considered.It has the advantage of fast convergence in the early iterations, but its final convergence is precluded by the inconsistencies in the different diffraction patterns.In contrast, the latter is slower in the early iterations, but its asymptotic convergence remains in presence of noise.For the different inversion schemes, a Monte-Carlo analysis is conducted for different noise levels, allowing a direct comparison of the solutions.The quantitative evaluation of each pair "noise model/optimization strategy" is done through quality indicators like the bias and standard deviation.Our results demonstrate the large variety of trade-offs resulting directly from the use of inversion schemes and from the implicit physical models.These are discussed in detail.The conclusions we reach have important implications for experimental applications of diffractive imaging.
The next section of this article presents the noise models that are considered for a CDI experiment.Section 3 gives the fitting functions that are derived from the maximum likelihood principle.Then, two iterative strategies that can be used for retrieving the object from the chosen fitting function are described.Section 4 presents the main results of this study: first, the definition of some performance indicators together with the description of the numerical sample are given; second, the convergence properties of the iterative strategies are briefly discussed; finally, Monte-Carlo analysis of the reconstruction algorithms are considered with regard to the selected noise models.

Noise models for ptychographic data sets
The ptychography approach requires the description of the exit field as a function of the probe p(r) and the sample scattering function ρ(r), named the object in the following.In the multi- where ρ is unknown and p j (r) := p(r − r j ) is the probe function shifted in r j .From a practical viewpoint, the reconstruction from ptychographic data requires the object and the probe to be discretised.In what follows, we denote by the object to be retrieved; N is thus the number of pixels in the object plane.This object is illuminated by a support-limited probe , where M is the number of pixels in the camera.This vector is converted into an M × N matrix P j so that the exit field is expressed in vector form by ψ j := P j ρ where the index j refers to the position of the probe.The corresponding far-field Ψ j := {Ψ m, j } M m=1 is computed from the exit-field by where W is the discrete Fourier transform operator.Provided that the size of the camera pixel or detector is small enough, the expected number of photons in the m-th detector reads where b m, j is the expected number of background events and A is the area of the detector.Since A can be incorporated into the probe, one can set A = 1 without loss of generality, so that h m, j = Ψ m, j 2 + b m, j is the expected number of events for the m-th measurement in the j-th illumination.
The above relations give a deterministic relationship between the object ρ and the expected (noise-free) data set {h m, j } that is at the basis of any reconstruction numerical scheme.However, when realistic data are considered, the presence of photon noise results in a substantial degradation of the measured data set y := {y m, j } relative to {h m, j }.In order to take into account the noise issue in a ptychographic experiment, three distinct noise models are introduced in the following sections.Each of them leads to a specific criterion that links the unknown object to the measured data.We will show that these criteria are fitting functions that provide an estimate of the object further obtained via a minimization algorithm.

Noise Model P: The standard photon counting model
The far-field intensity is a quantity with nonnegative real number values; however, a detector collects a finite number of photons: this number takes integer values that can be considered as a random variable.The standard probability distribution function (PDF) considered for particle counting is the Poisson probability law.Assuming independent measurements y m, j , the probability that the entire data set y is collected reads For experiments performed with a single photon counting detector, like a cooled charge coupled device camera [21] or a pixel camera (e.g. the Maxipix [22] or the Pilatus [23]), the main noise encountered during the measurement is indeed the Poisson shot noise.
The PDF given in Eq. ( 2) is standard in many applications dealing with low counting rates: for instance in transmission or emission tomography [24,25] or in astronomy [26].Although Poisson shot noise is sometimes used in the CDI community for testing algorithms [15,27,28], the noise model given in Eq. (2) has only recently been introduced in a phase retrieval algorithm [18,29].

Noise Model G : stabilizing the variance of the counting process
Even if one deals with counting statistics, it is often convenient to consider that the data are corrupted by an additive Gaussian (thermal) noise.Such a (standard) noise model is built with the following observation equation with ε m, j an independent centered fluctuation drawn from Gaussian random vector with constant variance σ 2 , ∀(m, j).Under these hypothesis, it is deduced that the PDF of the transformed data set √ y m, j is also Gaussian and reads With this model, the transformed measurement y 1/2 m, j has a standard deviation σ independent from its expected value h 1/2 m, j , while these two quantities should be linked for a photon counting process [30].Therefore, it is clear that a model mismatch exists in the noise model f G .In practice, however, this Gaussian approximation works well.The proof is given by the presence of several ptychographic reconstruction algorithms in the literature [16,31], which are related to this simple noise model, as shown in the section 3.2.This results from the fact that the square-root transformation applied to the photon noise is known as a "variance stabilization" transform that allows, in a first order approximation, the variance and the expected value of the transformed data to be independent parameters [32].A proof of the variance stabilization of the photon noise by the square-root transform is provided in appendix A.

Noise Model Q: An approximation of the counting model
Finally, the following observation equation is considered where ε m, j is an independent centred fluctuation drawn from Gaussian random vector with variance σ 2 m, j .As we are considering photon counting, the fluctuation variance σ 2 m, j should be set to the unknown expected-value h m, j .It leads to the following PDF for the data set y Provided that the number of expected counts {h m, j } is "large enough", the central limit theorem (Ref.[20], Sec.8.47) ensures that the Gaussian PDF in Eq. ( 6) is a good approximation of its Poissonian counterpart given in Eq. (2).Hence, from the ptychographic image reconstruction viewpoint, Eq. ( 6) is a fair noise model that could be used for the design of a reconstruction Note that data with no detected photon have been suppressed to avoid division by zero.Since the standard deviation depends on the data, this last noise model is no longer Gaussian.It is used for imaging reconstruction issues with photon noise in e.g., Ref. [33][34][35].

Ptychographic image reconstruction by the maximum likelihood principle
The estimation of the unknown object ρ from a noisy data set is now introduced.Following the standard statistical inference literature, the so-called maximum likelihood (ML) principle can be used to estimate the object.It derives directly from the noise model.In the case of the ptychographical reconstruction problem, the ML estimator for ρ is the quantity that maximizes (with respect to ρ) the PDF of the chosen noise model.In more formal terms, this ML estimate reads ρ • = arg min where "•" stands for P, G or Q (i.e., the noise model under consideration), and with the neg-loglikelihood [36], which is a fitting function adapted to the noise model f P , f G or f Q ; for more details concerning the ML principle the reader is referred to e.g., Ref. [20] (Chap.18).For the noise models considered in this article, these fitting functions split as a sum over all the probe positions: where L •; j is given by (up to irrelevant constant terms) where the dependencies with respect to (w.r.t.) the unknown object ρ are made explicit.From these expressions, one notes that the value y m, j = 0 leads to a contribution h m, j (ρ) in the summands of both Eq.(10b) and Eq.(10c).As a result, the fitting functions L P and L G are equivalent w.r.t. the camera pixels that do not detect any photon.On the contrary, zero intensity camera pixels are discarded from L Q (Eq.(10d)) which is expected consequently to lead to very noisy solutions since these pixels are legitimate constraints for the final solution (see Sec. 4.5 for an example).This problem is clearly circumvented if Eq. ( 10d) is modified so that the empty pixels are accounted for, i.e., The accuracy of this approximation [37] w.r.t. the Poissonian fitting function L P is studied in [33].When the counting process is Poissonian, L P is expected to be the "best" fitting function since it is perfectly adapted to the data fluctuations.With photon noise, the ML estimator drawn from L P is attractive because it benefits from good asymptotic properties: for high counting rates, the ML estimator is free of systematic errors and presents the best variance estimation (Ref.[20], p. 56).For limited counting rates, however, the situation can be different and another fitting function may be more appropriate.We also stress that (by definition) the ML does not account for any additional a priori constraints concerning the electronic density to be retrieved (e.g., support constraint, positivity).If the oversampling is too low and/or the number of diffraction patterns is limited (possibly equal to one), the ML may perform poorly and such additional constraints may be desirable (or even mandatory).This situation appears in support-based phase retrieval problems.However, since the present study aims at evaluating noise models for diffraction-pattern information only, the addition of object constraints has to be avoided because it may most probably blur the analysis.Hence, the ML is the appropriate tool to be considered.For sake of completeness, we also note that one can resort to the maximum a posteriori principle [38, p. 183] to introduce additional constraints within a statistical framework.
Finally, we note that the assumption h m, j > 0, ∀(m, j), is mandatory in order to ensure that L P given in Eq. (10b) is always defined.Indeed, the same condition results in the existence of the L P and L G gradients, allowing the iterative algorithms introduced in the next section to be defined.For the sake of simplicity, we assume in the following that the assumption h m, j > 0 holds [39].

Computing the ML estimate
From a practical viewpoint, computing a solution defined by Eq. ( 8) requires an iterative algorithm in order to minimize one fitting function among the ones given by Eq. ( 10).This computation reduces to an unconstrained optimization problem, the aim being to find a solution that makes the gradient of L • vanish.As a result, gradient-based algorithms are natural candidates for the optimization of the chosen likelihood.The gradient of the likelihoods given in Eq. ( 10) reads where the gradient for the j-th probe position ∂ •; j is given by with " †" the conjugate-transpose operator, and where Ψ •; j := {Ψ •;m, j } M m=1 is the corrected far-field that depends on the chosen fitting function The functions given by Eq. ( 10) being not strictly convex, local minima may exist and can trap gradient algorithms.Moreover, it is well known that ambiguous solutions exist so that a unique ML cannot be defined for the ptychographical problem [16,40].From a computational viewpoint, the gradients given in Eq. ( 11) are the basic ingredients in the design of iterative reconstruction algorithms dedicated to the noise models.Two different classes of iterative algorithms are considered in the next subsections.In section 4.5 we also consider a hybrid algorithm that uses the best properties of both strategies.

Ordered-subset optimization strategies
Within the ptychographic experiments, the successive acquisition of intensity patterns for different but overlapping illumination positions on the sample naturally defines a partitioning in the data set.Ordered-subset (OS) algorithms [41] [44,45] rest upon such a partitioning in order to update the object in a two nested loop process.Whereas the inner loop runs over the probe position j = 1 • • •J, updating consecutively the illuminated portion of the object, one full iteration k → k + 1 occurs once the J probes are considered.Thus, for a given initial guess ρ (0) , the algorithm is defined by the following updates for k = 0, 1, with D j a diagonal scaling matrix and where β > 0 is the step-length.One may note that the classical ptychographical iterative engine (PIE) is a special case of this generic OS strategy.For instance, the choice (where I is the identity matrix) is precisely [46] the version of the PIE introduced in Ref. [15] for the object update, stressing that the PIE is a reconstruction algorithm relying (implicitly) on the noise model given in Sec.In practice, OS strategies (like the PIE) have appealing properties for several reasons.Firstly, image reconstructions from large data sets are made computationally more compact because each object update (in the nested loop) uses a subset of the data set; this means also that the field of view can be changed dynamically in the case of a real-time reconstruction.Secondly, the unique 3D propagation and evolution of the probe as it passes through a thick object at each position can be modelled and inverted easily [47].Finally, these algorithms usually benefit from a fast convergence in the early iterations, hence providing an efficient means for the object estimate to "get into the right ball park" (see for instance Ref. [45]).However, some convergence issues exist for these algorithms.For instance, following [45], Godard et al. [28] show that the iteration (13) is not convergent to a local minimum of the fitting function L • (Eq.( 10)) if the scaling matrix D j depends on the probe position.Moreover, even when D j is constant over the probe position (i.e., if D j ≡ D, ∀ j), the authors find that the convergence toward a minimum of L • occurs only with noise-free data set.For noisy data, OS algorithms quickly find a relatively correct solution, but start to loop around after some iterations because the set of diffraction patterns is inconsistent, as a consequence of the presence of noise (see Sec. 4.4).Hence, at a given probe position, the algorithm "undoes" what it just did at the preceding probe position, reintroducing fully the noise within the associated diffraction pattern.In the next subsection, an iterative strategy that solves this convergence problem is considered.

Scaled-gradient optimization strategies
Given an initial guess ρ (0) , the following scaled-gradient (SG) strategy is defined for where the gradient ∂ • given by Eq. (11a) accounts for all the probes, and Λ ∈ R N×N is a diagonal scaling matrix chosen as As underlined in Ref. [48], the iteration ( 15) is a natural extension of the Error Reduction algorithm to the ptychographical approach.Since β and Λ are not dependent on the iteration number, the condition ρ (k) → ρ ∞ implies ||∂ • ρ (k) || → 0: the convergence toward a limit point implies that this point is a local optimum of L • .In practice, the step-length β > 0 is adjusted in order to generate a sequence converging toward a global or (at least) a local minimum of the fitting function.To our best knowledge, no result exists that gives admissible values for β ensuring the (local) convergence of Eq. ( 15).However, the tuning β ≈ 1 was found to ensure convergence in most cases investigated in the present study.Indeed, provided that β is properly tuned, the SG iteration was always found to be a convergent algorithm.
For OS algorithms, the order in which the subsets are treated is often critical.This is clearly not the case with the SG strategy since the update (15) requires the full set of probe positions.The SG strategy is usually slower than the PIE in the early iterations because the latter performs J updates when the former performs only one.However, the SG strategy converges to a (local or global) minimum of L • , even with a noisy data set.An illustration of these distinct convergence behaviors is given in Sec.4.4.

Data inversion: a resolution vs. robustness trade-off
Clearly, the ML solution (Eq. ( 8)) is subject to fluctuations induced by the random nature of the measurements y.Because four distinct fitting functions are discussed here, it is appropriate to search for the "best" model for the reconstruction purpose.This task requires to first define how the estimators will be compared. where aims at compensating a global phase ambiguity.For complex random variables, the standard deviation of the estimation is defined by Note that Eq. ( 17) and Eq. ( 20) are intuitive quality indicators for the (noise model dependent) estimator ρ • : whereas the bias (17) gives the systematic error, the variance (20) tells if the estimator is robust w.r.t. the noise.A third indicator is interesting to introduce: the mean square error (MSE) that combines conveniently the preceding indicators.While a general closed-form expression for the bias and the standard deviation is not available, the computation of the averaged quantities given by Eq. ( 17) and Eq. ( 20) can however be achieved via Monte-Carlo simulations.

Some implicit effects induced by the noise models
This section aims at deriving typical features contained in the calculated solutions resulting from the choice of the noise model itself.
The asymptotic case of an arbitrary large signal to noise ratio (SNR) is first investigated: since we are dealing with photon noise, this results in y m, j → h m, j (ρ ).Consequently from Eq. ( 11), the gradient evaluated in ρ vanishes whatever the noise model is.In this context, the true object ρ minimizes the four fitting functions and the bias vanishes, i.e. the four noise models are equivalent.Therefore, consideration of the four noise models is only relevant at low SNR.In particular, as Eq. ( 12) gives the following relation between the corrected exit-field drawn from the models P and G , it shows that the contribution in the final solution of a low SNR measurement [49] y m, j ∼ 1 is enhanced with the noise model P because its typical expected value is then h m, j < 1 ≤ y m, j .Such measurements being spread over the borders of the intensity pattern, one expects that the noise model P enhances i.e. the model R should lead to higher biases and to lower variances as compared to the noise model G .In summary, we can see that the specific behavior of each model is dominated by the set of pixels that collects the lowest number of photons.Finally, we also note that a photon noise, in the low SNR regime, produces very sparse intensity patterns.As these empty pixels are usually at the very edge, high-frequency components are missing in each intensity pattern and one can assume that the retrieved object is, more or less, a low-pass filtered version of the original object (with a loss in resolution being driven by the SNR).This result should hold whatever is the considered noise model.

A test-chart that highlights the predicted effects
To highlight these specific behaviors, a numerical test is now presented, which involves the evaluations of the estimation bias and standard deviation from Monte-Carlo simulations.The choice of the object is primarily motivated by its ability to illustrate the predicted "cut-off frequency" effect of each noise model explained above.For instance, a suitable object is a part of a Fresnel Zone Plate (see Fig. 1).The transmission coefficient of the object is set to one, while window of 260 × 260 pixels.The phase shift encountered by the beam is 1.72 radians and the radial frequency varies from 0.07 to 0.3 pixel −1 .The ptychographical data-set is composed of a total of 81 diffraction patterns, each one of size 100 × 100 pixels.The choice of a step size of about 20 pixels in both directions leads to an overlap ratio of 65%.In addition, two SNRs are considered in these simulations: the highest SNR provides a maximum of 10 6 expected counts over the 2D camera; the lowest provides 10 3 expected photons over the camera.The Monte-Carlo analysis presented below is based on a set of 100 noisy (photon noise) ptychographic data sets.

Some issues concerning the iterative strategy
It is clear from Sec. 3 that distinct iterative strategies can be derived for the minimization of the same fitting function.It is therefore appropriate to investigate the impact of the iterative strategy on the retrieved object.For that purpose, the inversion of a single ptychographic data set by either the OS or the SG strategy is now considered.For sake of simplicity, the fitting function L G is considered, but similar results are obtained with the other fitting functions.
When a noise-free data set is considered, Fig. 2(b) shows that the OS strategy converges toward a minimum of the fitting function since the gradient norm decreases toward zero.With noisy data (i.e., corrupted by photon noise) however, the gradient norm starts to decrease before it reaches a stagnation, such that the convergence does not occur.Furthermore, Fig. 2(b, c) shows that the OS strategy should be stopped early in the iteration process [50] in order to pick the best solution w.r.t. the relative error (in the object plane) defined by where ρ is the true object and where When the SG strategy is considered, Fig. 2(a, b) shows that the iterations converge toward a (local or global) minimum of L • , even with a noisy data set.This minimum defines an estimate which is a global trade-off over the set of inconsistent diffraction patterns, leading to a lower relative error than the best relative error reached by the OS strategy (see Fig. 2(c) and the reconstructions shown in Fig. 2(d, e, f)).

Investigation of the figure of merit for each noise model
In an attempt to define the intrinsic merit of each noise-model, the impact of the minimization method has to be as low as possible.In other words, the minima of L • can be only compared w.r.t. the noise models if one ensures that the reconstruction quality is not affected by the way the data are handled along the iterative process.Therefore, the use of the (convergent) SG strategy is mandatory, with the additional condition of an initialization as close as possible to the minima.For that purpose, the true solution is chosen as initial guess, i.e., ρ (0) = ρ .The algorithms are stopped when the norm of the gradient reaches a conveniently small value.
For each fitting function, and for the lowest SNR, the numerical evaluation of the averaged solution ρ • (both modulus and phase) given in Eq. ( 18) and the standard deviation given in Eq. ( 20) are provided in Fig. 3 and Fig. 4, respectively.The predicted cut-off frequency effect is clearly visible in Fig. 3: for L R , the edges of the modulus are smoothed and the phase is damped whereas they remain much more resolved (undamped) for L P .For this low SNR, the modulus is contaminated by fluctuations that come from the object phase function.The relative amplitude of these fluctuations is 8, 10, 17 and 10 % for L P , L G , L Q and L R , respectively.They reduce when the SNR increases and they become negligible (around 1%) for 10 6 photons.As explained in Sec.4.2, such artifacts appear because the retrieved object is essentially a lowpass filtered version of the original object (see Fig. 5).In the case of the present object, it results from a mixing between the real and imaginary parts of the object, leading to the observation of a phase-like structure in the modulus component.Finally, the standard deviation depicted in Fig. 4 confirms that L R has the highest robustness w.r.t. the photon noise whereas L P has the lowest.For all the fitting functions, the standard deviation grows with the collected number of photons, which is a standard result when one deals with photon noise (Ref.[51], p. 181).As expected, the fitting function L G reaches a tradeoff w.r.t.these two fitting functions, as expected from Sec. 4.2.Quantitatively, the numerical evaluation of the quality indicators defined in Sec.4.1 are reported in Table 1.For both SNRs, although the variance is lower with L G or L R , one notes that L P gives however the best results w.r.t. the MSE and the error in the object plane.The fitting function L Q being much less robust to the noise than the other three fitting functions (see Sec. 3), it is not considered as a valuable alternative for CDI in the low counting rate regime.Starting from a coarse initial guess In practice, the chosen initial guess is often a rough estimate and the iterative strategy adds its own bias and variance.For this reason, it is appropriate to investigate how the reconstruction quality deteriorates when one uses a coarse initialization with either the OS or the SG strategy.We further assume that no a priori object information can be used, resulting in the choice of a free-space estimate for the initial guess.The quality indicators achieved with 10 3 photons are reported in the Table 2; as the OS strategy does not lead to converging iterations (see Sec. 3.2), the iteration that gives the best (i.e., the smallest) error in the object plane is selected for each data set.
To summarize, the behaviors exhibited in the preceding section are still valid here: the fitting function L P offers the lowest bias but the worst variance while the fitting function L R has the opposite characteristics.Moreover, every criteria is improved when using the SG strategy, the gain being more clearly evidenced with the noise model P.For the sake of completeness, the difference-map (DM) iteration [52] for ptychographic image reconstruction, as described in Ref. [53] is also implemented and compared.In all the tests performed, the DM iteration and the OS strategy with L G perform equivalently [54].In Fig. 6, the results of the SG strategy obtained from a single noisy data set is shown for the various fitting functions; these illustrate how a typical reconstruction looks like for each noise model when the SG strategy is used.Finally, the algorithms presented in this work have been tested on several object classes: phase objects, absorption objects, objects with low or high contrasts, etc.It is always the case that the Poissonian noise model P presents the least systematic errors, whereas the noise model R is the most robust, the Gaussian model G reaching a trade-off between the two others.It is also observed that the differences between all these algorithms tend to vanish when the SNR increases.
The minimization of L P : a hybrid optimization strategy.It is clear from Tables 1 and 2 that L P is the fitting function that undergoes the strongest degradation if the initialization is far from the final solution.On the contrary, the minimization of L R is very robust w.r.t. the starting point.Moreover, the OS and the SG strategies are mostly equivalent for that fitting function.Hence, it is appropriate to search for a hybrid strategy that profits from both fitting functions.Therefore, we propose to use the OS strategy starting with the fitting function L R in order to get quickly to a first estimate which is subsequently introduced as an initial guess for the further minimization of the fitting function L P .As an example, one can perform 1000 OS iterations with L R followed by 1000 SG iterations with the fitting function L P .The quality indicators obtained with this strategy are shown in Table 2.One notes that these indicators are improved: they reach values similar to the ones obtained with the true object (see Table 1).Figure 7 also shows the reconstructed phase obtained by either the "hybrid" strategy or by the SG strategy with the true object as an initial guess.These phases are similar, showing that the hybrid strategy is a valuable technique for the optimization of the Poissonian fitting function.

Conclusion
In summary, we have addressed the question of the choice of the iterative process for coherent diffraction imaging in the case of data corrupted by noise.Several noise models compatible with photon (or electron) shot noise have been presented and further used within two inversion strategies, the OS and the SG.We have shown that any physical interpretation drawn from a CDI iterative technique requires a detailed understanding of this iterative technique.Our analysis emphasizes that iterative reconstruction algorithms for CDI often assume implicitly a noise model that may be more or less a coarse approximation of the data fluctuations.While standard asymptotic results for photon noise foresee that high SNR measurements should be handled in the same way by any model, each model has the ability to enhance or inhibit the weight of low SNR measurements in the final reconstruction.From this viewpoint, the noise models presented in this paper reach their own resolution vs. robustness trade-off.The merit of each noise model may be user and/or object dependent and, from an experimental perspective, the impact of the intensity fluctuations w.r.t. the noise model has to be tested on numerical samples prior to the inversion.An efficient strategy to circumvent the problem in the case of experimental intensity analysis consists in building a set of data for a model sample, designed as close as possible to the available experimental data set (Fourier space resolution, number of probe positions, SNR, etc.).This numerical data set can then be used to test the different noise model approaches and emphasizes the possible reconstruction artifacts.Whereas it is not a surprise that in presence of shot noise the initial object guess has a strong impact on the final solution obtained with CDI, the employed optimization strategy (OS or SG) generates its own artifacts.Clearly, algorithms that reach the minimum of the fitting function defined by the noise model should be used.On the contrary, if non-converging algorithms are employed, some additional reconstruction degradation is expected.Finally, based on this detailed study, a hybrid strategy has been presented that improves the convergence towards the minimum of the Poissonian fitting function when a good initial guess is missing.
The ML principle adopted in this work does not rely on any prior model related to the unknown object.However, when such information is available, such models provide additional constraints that may enhance the resolution and the robustness of the reconstructions.In this context, the maximum a posteriori would be the natural extension of the ML principle when prior models are accessible.It would lead to a penalized fitting function as discussed elsewhere in e.g., Ref. [9,18].Projective algorithms (like ER or HIO) are also natural means to handle additional constraints concerning the unknown object.Another interesting perspective consists in the adaptation of these standard algorithms in order to cope with the various noise models presented in this article.

A. The variance stabilization transform
Let y be a random variable with mean y = μ and variance VAR(y) = σ 2 , and suppose that σ and μ are related by σ = f (μ) for some function f .A variance-stabilization transform aims at constructing a function h such that the random variable h(y) has an almost constant variance, without loosing any information (i.e., h has to be injective in the range of y).
The Taylor expansion of h around μ in the first order is h(y) − h(μ) = (y − μ)h (μ) + R (27) where R stands for higher order terms.One then has VAR h(y) = VAR h(y) − h(μ) where (y − μ) h (μ) = 0 is used in the last line.Neglecting all the contributions from the terms higher than the first order gives: Thus, within the first order approximation, the variance of h(y) is independent of μ if a function h is exhibited such that h (x) x=μ = b/ f (μ) for a constant b in R. The obvious candidate h(x) = bx/ f (μ) is of no interest, being a linear function.A suitable choice for the Poissonian case in which f (x) = √ x is the function h(x) = √ x; we then find b = 1/4.This is the variance stabilization used in the Section 2.2.Anscombe, in [55], showed that the function h(x) = (x + 3/8) 1/2 has a better variance-stabilization capability than the square-root transform.

Fig. 1 .
Fig. 1.The test object is a (support-limited) quadrant of a Fresnel zone-plate extending over 100×100 pixels within a 260×260 pixel image.The modulus (a) is 1 within the support of the object and the phase (b) ranges from 0 to 1.72 rad.The corresponding cross-sections are plotted along the 86-th column of the image.A real probe function (c) is chosen so that it extends over 58×58 pixels (full-width at half maximum) within a 100×100 pixel image, corresponding in an oversampling ratio of 1.7; the corresponding cross-section is plotted along the 50-th column of the image.itvanishes outside the object window.The object extends over 100 × 100 pixels in a numerical

#Fig. 2 .
Fig. 2. A ptychographical reconstruction illustrates the convergence behavior of the OS and the SG strategies.In these examples, the fitting function is L G given in (10c), such that the OS strategy corresponds to the standard PIE.Top line: for the OS (dashed line) and the SG (solid line), (a) evolution of the fitting function L G w.r.t. the iteration k for a noise-free data set (thick line) and an example of a noisy data set (thin line) with a maximum of 10 3 photons on the camera; (b) idem for the gradient norm ||∂ G (ρ (k) )||; (c) idem for the error Err(ρ (k) ) defined by (25).Second and third lines: reconstruction from a noisy data set; with the OS strategy, ( ) is the estimate that minimizes the error depicted in (c) and ( ) is the estimate obtained after k =2000; with the SG strategy, (×) is the estimate after k =2000.The shown results correspond to a 180 × 180 window centered around the object central pixel.The respective color scales are indicated on the figure.

Fig. 3 .Fig. 4 .Fig. 5 .
Fig. 3.The average solution for each noise model evaluated over a series of 100 noisy data sets.The initial guess is the true object and the SG strategy is used for the optimization of the fitting function.For each noisy data set, no more than 10 3 photons impinge on the detector.The grey level scaling in each column shares the same linear scale.The shown results correspond to a 180 × 180 window centered around the object central pixel.

#Fig. 6 .Fig. 7 .
Fig. 6.Object reconstruction by means of the minimization of the fitting functions L P , L G and L R given in Eq. (10).The optimization is performed with the SG strategy presented in Sec.3.3 and an initial guess defined as an uniform object.The phase is set to zero outside of the object support for visualization purpose.

#
173585 -$15.00USD Received 31 Jul 2012; revised 11 Sep 2012; accepted 11 Sep 2012; published 1 Nov 2012 (C) 2012 OSA 5 November 2012 / Vol. 20, No. 23 / OPTICS EXPRESS 25933 In the statistical literature, the accuracy of estimation is evaluated via two standard indicators, the estimation bias and the estimation standard deviation.Let • be the expectation (i.e., average over several realizations of the noise) operator, and ρ •;n and ρ ;n denote the n-th component of the ML solution ρ • and the true object ρ , respectively.Then, the estimation bias reads November 2012 / Vol. 20, No. 23 / OPTICS EXPRESS 25924 the spatial resolution (i.e.reduces the bias) w.r.t. the noise model G .However, this gain has necessarily a cost: because these low SNR measurements are plagued by large fluctuations, the model P should also lead to larger estimation variance.The opposite arguments holds for the noise model R since the condition h m, j 1 leads to #173585 -$15.00USD Received 31 Jul 2012; revised 11 Sep 2012; accepted 11 Sep 2012; published 1 Nov 2012 (C) 2012 OSA 5

Table 1 .
Figure of merit of each noise model.The l 2 -norms of the bias, standard deviation (STD) and mean-square error (MSE) as well as the error (Err) in the object plane for the fitting functions defined in section 2 are given.The SG strategy is used with the true object as initial guess.

Table 2 .
The l 2 -norms of the bias, STD and MSE as well as Err in the object plane achieved by the fitting functions L P , L G and L R when either the SG or the OS strategy is used with a free-space as initial guess.The results achieved by the DM and the hybrid method are also presented.