Solution of the phase problem for coherent scattering from a disordered system of identical particles

While the implementation of single particle coherent diffraction imaging for non-crystalline particles is complicated by current limitations in photon flux, hit rate, and sample delivery a concept of many-particle coherent diffraction imaging offers an alternative way to overcome these difficulties. Here we present a direct, non-iterative approach for the recovery of the diffraction pattern corresponding to a single particle using coherent x-ray data collected from a two-dimensional (2D) disordered system of identical particles, that does not require a priori information about the particles and can be applied to a general case of particles without symmetry. The reconstructed single particle diffraction pattern can be directly used in common iterative phase retrieval algorithms to recover the structure of the particle.


Introduction
It was recently realized [1,2,3] that analysis of diffraction patterns from disordered systems based on intensity cross-correlation functions (CCFs) can provide information on the local symmetry of these systems. A variety of disordered systems, for example, colloids and molecules in solution, liquids, or atomic clusters in the gas phase can be studied by this approach. It becomes especially attractive with the availability of xray free-electron lasers (FELs) [4,5,6]. Essentially, scattering from an ensemble of identical particles can provide, in principle, the same structural information as single particle coherent imaging experiments [7] that are presently limited in the achievable resolution [8]. Clearly, a large number of particles in the native environment can scatter up to a higher resolution, at the same photon fluences, comparing to experiments on single particles injected into the FEL beam. However, it is still a challenge to recover the structure of individual particles composing a system using the CCF formalism. In the pioneering work of Kam [9,10] it was proposed to determine the structure of a single particle using scattered intensity from many identical particles in solution. However, this approach, based on spherical harmonics expansion of the scattered amplitudes, was not fully explored until now. Recently, it has been revised theoretically [11,12,13] and experimentally [14]. The possibility to recover the structure of individual particle was demonstrated in systems in two-dimensions (2D) [11,12,14] and in threedimensions (3D) [13] using additional a priori knowledge on the symmetry of the particles. Unfortunately, these approaches, based on optimization routines [11,13] and iterative techniques [12,14], do not guarantee the uniqueness of the recovered structure unless strong constraints are applied. This all leads to a high demand in finding direct, non-iterative approaches for recovering the structure of an individual particle in the system.
In this paper we further develop Kam's ideas and propose an approach enabling unambiguous reconstruction of single-particle intensity distributions using algebraic formalism of measured two-and three-point CCFs without additional constraints. Once the single-particle intensity is obtained, conventional phase retrieval algorithms [15,16] can recover the projected electron density of a single particle. Our approach is developed for 2D systems of particles which is of particular interest for studies of membrane proteins. It can be also used to study 3D systems provided that particles can be aligned with respect to the incoming x-ray beam direction.

Basic equations
We consider a scattering experiment in transmission geometry [see figure 1], where the direction of the incoming x-ray beam is perpendicular to the 2D sample plane, and simulate a set of diffraction patterns. Our sample consists of an arbitrary small number N of identical, spatially disordered particles [see figure 2(a)]. Also, we assume a uniform distribution of orientations of particles in the system. First, for simplicity, we consider a model where the total scattered intensity I(q) is represented as an incoherent sum of intensities I ψ i (q) corresponding to individual particles in the system, where q is the momentum transfer vector and ψ i is the orientation of the i-th particle. This model is a good approximation for dilute systems when the mean distance between particles is much larger than their size [2,3]. In this approximation we neglect the interparticle correlations due to coherent interference of scattered amplitudes from individual particles. Later, in simulations, we generalize our approach to the case of coherent scattering from a system of particles in the presence of Poisson noise and demonstrate the applicability of our approach. The incident x-ray beam coherently illuminates a 2D disordered sample and produces a diffraction pattern on a detector. The direction of the incident beam is defined along the z axis of the coordinate system.
In the frame of kinematical scattering, the intensity I ψ 0 (q) scattered from a single particle in some reference orientation ψ 0 is related to the electron density of the particle ρ ψ 0 (r) by the following relation Once the single-particle intensity I ψ 0 (q) is determined, conventional phase retrieval algorithms [15,16] can recover the projected electron density ρ ψ 0 (r) of a single particle. Our goal is to determine the scattering pattern of a single particle I ψ 0 (q) using a large number of diffraction patterns I(q) corresponding to different realizations of the system. The intensity I ψ 0 (q) scattered from a single particle can also be described in a polar coordinate system q = (q, ϕ) as a function of the angular position (0 < ϕ ≤ 2π) on the resolution ring q [see figure 2(b)]. For each resolution ring q, one can represent the intensity function I ψ 0 (q, ϕ) as a Fourier series expansion, where I n q,ψ 0 are the Fourier components of I ψ 0 (q, ϕ). For a 2D system of identical particles, the intensity I ψ 0 (q, ϕ) scattered from a particle in the reference orientation ψ 0 = 0 is related to the intensity I ψ i (q, ϕ) scattered from a particle in an arbitrary orientation ψ i as I ψ i (q, ϕ) = I ψ 0 (q, ϕ − ψ i ). Applying the shift theorem for the Fourier transforms [17] we obtain for the corresponding Fourier components of the intensities, I n q,ψ i = I n q,ψ 0 exp(−inψ i ). Using these relations we can write for the Fourier components I n q of the intensity I(q, ϕ) scattered from N particles where A n = N i=1 exp(−inψ i ) is a random phasor sum [18]. According to (3) the intensity scattered from a single particle can be uniquely determined by the set of complex coefficients {I n q,ψ 0 } = {|I n q,ψ 0 |, φ n q,ψ 0 = arg(I n q,ψ 0 )}. Here we propose a direct approach for determination of these Fourier components {I n q,ψ 0 } applying two-and threepoint CCFs to the measured intensities I(q, ϕ) scattered form N particles.
We start with the two-point CCF defined at two resolution rings q 1 and q 2 [9,11,2] C(q 1 , q 2 , ∆) = I(q 1 , ϕ) I(q 2 , ϕ + ∆) where 0 ≤ ∆ ≤ 2π is the angular coordinate [see figure 2(c)], I(q, ϕ) = I(q, ϕ) − I(q, ϕ) ϕ is the intensity fluctuation function, and . . . ϕ denotes the average over the angle ϕ. It can be directly shown [2] that the Fourier components C n q 1 ,q 2 of the CCF C(q 1 , q 2 , ∆) for n = 0 are defined by the Fourier components I n q of the intensities I(q, ϕ) § C n q 1 ,q 2 = I n * q 1 · I n q 2 , and for n = 0 Fourier components C n q 1 ,q 2 = 0 according to the definition of the intensity fluctuation function I(q, ϕ). Using (4) in (6) and introducing statistical averaging . . . M over a large number M of diffraction patterns one can get Here, we used the fact that for a uniform distribution of orientations of N particles |A n | 2 M asymptotically converges to N for a sufficiently large number M of diffraction patterns .
Equation (7) can be used to determine both, the amplitudes |I n q,ψ 0 | and phases φ n q,ψ 0 (for n > 0) of the Fourier components I n q,ψ 0 associated with a single particle. For example, to determine the amplitudes equation (7) can be applied successively to three different resolution rings q 1 , q 2 and q 3 connecting each time a pair of q-values. Direct evaluation gives for the amplitudes of the Fourier components of a single particle on the ring q 1 where is an experimentally determined quantity. Applying (8) to different resolution rings q and orders n all required amplitudes |I n q,ψ 0 | can be determined ¶. Equation (8) should be used with care to avoid possible instabilities due to division by zero. For this purpose one should exclude from consideration the cases when C n q 3 ,q 2 M is close to zero. Since the Fourier components of the intensities obey the symmetry condition I n * q,ψ 0 = I −n q,ψ 0 it is sufficient to determine I n q,ψ 0 for n ≥ 0. The 0-th order Fourier component by its definition [2,3] is a real-valued quantity, I 0 q = I(q, ϕ) ϕ , and can be determined from the experiment as well. Using (4) one can readily find that I 0 q,ψ 0 = I 0 q /N . Equation (7) also determines the phase difference between two Fourier components I n q 1 ,ψ 0 and I n q 2 ,ψ 0 of the same order n, defined at two different resolution rings q 1 and q 2 , Notice, that one can freely assign an arbitrary phase to one of the Fourier components I n q j ,ψ 0 , which corresponds to an arbitrary initial angular orientation of a particle. § We note that the Fourier components of the intensity fluctuation function I n q can be expressed through the Fourier components of intensity I n q using relation I n q = I n q − I 0 q · δ n,0 , where δ n,0 is the Kronecker symbol.
Note, that in this paper, contrary to [2,3], we use a not normalized CCFs which, in particular, results in different asymptotic values of |A n | 2 M . ¶ If number of particles in the system is not known a priori then N in (8) has to be considered as a scaling factor that is obtained on the final stage of reconstruction of a single particle intensity (see section 3).
Assuming, for example, φ n q 1 ,ψ 0 = 0 and using (9) one can directly determine the phases of the Fourier components with the same n-value on all other resolution rings q 2 = q 1 [12].
To completely solve the phase problem, it is required to obtain additional phase relations between Fourier components with different n values on different resolution rings q. These relations can be determined using a three-point CCF introduced by Kam [9,10]. The important aspect of our approach is to use the three-point CCF defined on three different resolution rings [see figure 2(d)], contrary to [11,12] where the three-point CCF was defined on two resolution rings. Similar to C n q 1 ,q 2 it can be shown (see Appendix) that for the Fourier components of this CCF the following relation is valid, for n 1 = 0, n 2 = 0, n 1 = −n 2 . Using (4) in (11) and performing statistical averaging we obtain for the averaged Fourier components of the three-point CCF where Our analysis shows (see Appendix) that the statistical average A n 1 ,n 2 M converges to N for a sufficiently large number M of diffraction patterns, i.e. arg[ A n 1 ,n 2 M ] = 0, and we get from (12) the following phase relation Equation (13) determines the phase shift between three Fourier components I (n 1 +n 2 ) q 1 ,ψ 0 , I n 1 q 2 ,ψ 0 , and I n 2 q 3 ,ψ 0 of different order n defined on three resolution rings. If n 1 = n 2 = n and n 3 = 2n, equation (13) reduces to a particular form, giving the phase relation between Fourier components of only two different orders n and 2n, Equations (8), (9) and (13) constitute the core of our approach and allow us to directly and unambiguously determine the complex Fourier components I n q,ψ 0 using measured x-ray data from a disordered system of N particles. The obtained Fourier components I n q,ψ 0 can be used in (3) to recover the scattered intensity I ψ 0 (q, ϕ) corresponding to a single particle.

Recovery of the projected electron density of a single particle
We demonstrate our approach for the case of a coherent illumination of a disordered system of particles, in the presence of Poisson noise in the scattered signal. We recover diffraction patterns and projected electron densities for two different particles, a centered pentagonal cluster [ figure 3(a)] that has 5-fold rotational symmetry and an asymmetric  In both cases we consider kinematical coherent scattering of x-rays with wavelength λ = 1Å from a system of N = 10 clusters in random positions and orientations, distributed within a sample area of 5 × 5 µm 2 [see figure 2(a)]. Diffraction patterns are simulated for a 2D detector of size D = 24 mm (with pixel size p = 80 µm), positioned in the transmission geometry at L = 3 m distance from the sample [see figure 1]. This experimental geometry corresponds to scattering to a maximum resolution of 0.25 nm −1 . For given experimental conditions the speckle size corresponding to the illuminated area is below the pixel size of the detector. At the same time the speckle size corresponding to the size of a single particle is about 12 pixels, that provides sufficient sampling for phase the retrieval.
The coherently scattered intensities simulated for single realizations of the systems + are shown in figures 3(c) and 3(d). We note, that in simulations the intensity I(q, ϕ) scattered from a system of N particles was calculated as a coherent sum of the scattered amplitudes A ψ i (q, ϕ) from each particle, i.e, I(q, ϕ) = | N i=1 A ψ i (q, ϕ)| 2 . The incident fluence was considered to be 10 12 and 10 13 photons/25 µm 2 for pentagonal and asymmetric clusters, respectively. In the case of asymmetric clusters additional Poisson noise was included in the simulations. The Fourier components of two-point and threepoint CCFs [equations (7) and (12)] were averaged over M = 10 5 diffraction patterns * .
Once all amplitudes I n q,ψ 0 and phases φ n q,ψ 0 of the Fourier components I n q,ψ 0 are determined, the diffraction pattern corresponding to a single particle can be recovered by performing the Fourier transform (3). Using experimentally accessible quantities I(q, ϕ) ϕ M = N I 0 q,ψ 0 and I n q,ψ 0 = √ N I n q,ψ 0 , we can rewrite (3) as Here we discuss determination of the amplitudes and phases of the Fourier components I n q,ψ 0 associated with a single particle, using x-ray data from 2D system of pentagonal clusters [figure 3(a)] described above. Applying expression (8) to different q we find all required amplitudes √ N |I n q,ψ 0 | (for n > 0) scaled by the factor √ N . The 0-th order Fourier component is determined as I 0 q,ψ 0 = I(q, ϕ) ϕ M /N . In figure 4 the values √ N |I n q,ψ 0 | derived from (8) normalized by I(q, ϕ) ϕ M are shown for n ≤ 40 at three different resolution rings q 1 = 0.21 nm −1 , q 2 = 0.23 nm −1 and q 3 = 0.25 nm −1 . Due to 5-fold symmetry of the pentagonal cluster, in the q-range accessible in our model + All simulations of diffraction patterns were performed using the computer code MOLTRANS. * According to our simulations, two-point and three-point CCFs defined on different resolution rings q 1 , q 2 and q 3 [see (5) and (10)] even in the case of coherent illumination of a disordered system do not contain the inter-particle contribution. Contrary to that, as it was shown in [2,3], this is not the case for two-point CCFs defined on the same resolution ring q 1 = q 2 , when inter-particle contributions and noise can give a substantial contribution. Therefore, in our present work we avoided this last case for the unique determination of amplitudes and phases of the Fourier coefficients of single particles. experiment we need to determine the phases of the Fourier components I n q,ψ 0 only with n = 10 · l, 1 ≤ l ≤ 4, where l is an integer. Below we present the algorithm for the phase determination using equations (9), (13) and (14). (i) First, we assign an arbitrary phase to one of the Fourier components I n q,ψ 0 , which corresponds to an arbitrary initial angular orientation of a particle. It is convenient to assign this phase to the Fourier component with the smallest available n value. In our case we assign a zero phase φ 10 q 1 ,ψ 0 = 0 to the Fourier component I 10 q 1 ,ψ 0 . (ii) The phases of the Fourier components with n = 10 on other resolution rings q j (j = 1, 2) were obtained by using (9), i.e., φ 10 q j ,ψ 0 = φ 10 q 1 ,ψ 0 + arg[ C 10 ].
(v) The phase of the Fourier component I 30 q 1 ,ψ 0 on the resolution ring q 1 was determined by using (13) with n 1 = 10 and n 2 = 20, φ 30 The process was continued in a similar way until all phases at each q-value were determined. The same procedure has been applied for the case of 2D system of asymmetric clusters [ figure 3(b)]. Due to asymmetric structure of particles, in the same q-range we need to determine the phases of a significantly larger set of the Fourier components with n = 2 · k, 1 ≤ k ≤ 24, where k is an integer. Assigning a zero phase φ 2 q 1 ,ψ 0 = 0 to the Fourier component with n = 2 we successively determine the phases of the Fourier components of the higher orders up to n = 48. We note, that the data redundancy intrinsic to C n q 1 ,q 2 M and C n 1 ,n 2 q 1 ,q 2 ,q 3 M offers a lot of flexibility in determination the possible ways of solving the phases of the Fourier components I n q,ψ 0 . If the number of particles N in the system is not known [see equation (15)], it can be determined using the positivity constraint I ψ 0 (q, ϕ) ≥ 0 for the recovered intensity. One can slightly relax this constraint and allow a small fraction of pixels with negative values in order to get an image with a better contrast. These negative values can be substituted by zeros before using this diffraction pattern in the iterative phase retrieval algorithms for the reconstruction of the particle structure. This procedure can be justified by inaccuracies which arise due to statistical estimate of the CCF and also different errors in calculations. For example, for the case of pentagonal clusters in our simulations we applied the value of N = 9.8, which corresponds to 0.27% of negative pixes on the detector [see figure 5]. This value is very close to the theoretically predicted N = 10 for a uniform distribution of orientations of particles.

Results and discussion
The diffraction patterns corresponding to single particles recovered by our approach are presented in figures 3(e) and 3(f). As one can see from these figures, the recovered single particle diffraction patterns reproduce the diffraction patterns of individual clusters shown in figures 3(a) and 3(b) very well. The structure of the clusters reconstructed from these recovered diffraction patterns was obtained by a standard phase retrieval approach and is presented in figures 3(g) and 3(h). In our reconstructions we used an alternative sequence of hybrid input-output (HIO) and error-reduction (ER) algorithms [15]. The comparison of the single cluster structures obtained by our approach [figures 3(g) and 3(h)] with the initial model shown in the insets of figures 3(a) and 3(b) confirms the correctness of our reconstruction. Both clusters were reconstructed with a resolution up to 25 nm. These results clearly demonstrate the ability of our approach to recover the single-particle structure from noisy data obtained in coherent x-ray scattering experiments.
The ability of the presented approach to recover the diffraction pattern of a single particle relays on the accuracy of the two-and three-point CCFs determination. The statistical properties of the CCFs and their convergence to the average values strongly depend on the number of particles N in the system, their density and the distribution of their orientations [2,3]. In particular, the value of the scaling factor N in the expression of the Fourier components of the CCF is defined by the statistical distribution of particles in the system. We performed a few simulations with varying number N of particles in the system: a) for a Gaussian distribution of the number of particles with an average value N = 20 and standard deviation σ = 2; b) for a Poisson distribution with an average number of particles N = 30. In both cases the CCFs statistically converge to the same average values as for the systems with a fixed number of particles, N = 20 and N = 30 (for Gaussian and Poisson distributions, correspondingly). The only difference we observed is a slightly slower convergence of CCFs in the case of statistical distribution of particles compared to the case with a fixed number N . This means, that in the case of systems with a varying number of particles one needs to measure a larger number of diffraction patterns.
Particle density also affects the statistical behaviour of the CCFs. In the limiting case of a very dilute system the CCFs calculated for a single diffraction pattern are defined by independent structural contributions of N individual particles. In this case statistical averaging over diffraction patterns is, in fact, acts on the fluctuating terms |A n | 2 in (7) and A n 1 ,n 2 in (12) and convergence of these terms to their statistical estimates determines the number M of diffraction patterns required for averaging [3]. In the case of a dense system, the CCFs contain a significant inter-particle contribution in the same range of q-values, where the structural contribution of individual particles is observed [2,3]. This additional contribution slows down the convergence of CCFs (especially of the three-point CCF) and the number of measured diffraction patterns should be increased. In the presence of Poisson noise in the scattered signal one needs to accumulate even more diffraction patterns. Additional effects of resolution, solution scattering, etc. on the measured CCFs were considered in detail in [19]. Results of our simulations with PMMA clusters and asymptotic estimates presented in [3] show that, in practice, the number of particles in the system should be less than few dozens. This allows to achieve statistical convergence of the CCFs using 10 5 − 10 6 diffraction patterns. In the disordered systems considered in this work the particle density is characterized by R /d ≈ 5.0, where R is the average inter-particle distance. At these conditions it was sufficient to use 10 5 diffraction patterns with Poisson noise to achieve convergence of the two and three-point CCFs and perform successful reconstruction of the particle structure. In practical applications the convergence of the CCFs can be directly controlled as a function of the number M of the diffraction patterns considered in the averaging [3].

Conclusions
In conclusion, we present here a direct, non-iterative approach for the unambiguous recovery of the scattering pattern corresponding to a single particle (molecule, cluster, etc) using scattering data from a disordered system of these particles including noise. Our simulations demonstrate the successful application of this approach to 2D systems composed of particles with and without rotational symmetry. It can be of particular interest for studies of membrane proteins which naturally form 2D systems. Our approach can be also applied to 3D systems of particles to recover their projected electron density, if a specific alignment of the particles along the direction of the incoming x-ray beam can be achieved. We have shown that our approach is robust to noise and intensity fluctuations which arise due to coherent interference of waves scattered from different particles. We foresee that this method will find a wide application in the studies of disordered systems at newly emerging free-electron lasers.
where C n 1 ,n 2 q 1 ,q 2 ,q 3 are the Fourier components of the CCF C(q 1 , q 2 , q 3 , ∆ 1 , ∆ 2 ). In general, (A.2) determines a relation between three different Fourier components of intensity I n q of the order n 1 , n 2 and n 1 + n 2 , defined on three resolution rings, q 1 , q 2 and q 3 .
Considering incoherent scattering from N particles, we can rewrite (A.2) in terms of the Fourier components of intensity I n q,ψ 0 associated with a single particle, C n 1 ,n 2 q 1 ,q 2 ,q 3 M = I (n 1 +n 2 ) * q 1 ,ψ 0 I n 1 q 2 ,ψ 0 I n 2 q 3 ,ψ 0 · A n 1 ,n 2 M , where . . . M denotes the statistical averaging and we introduce a new random phasor sum Now we determine the statistical estimates of the terms C n 1 ,n 2 M and D n 1 ,n 2 M . We consider an uniform distribution of orientations of particles ψ on the interval (−π, π), where two different orientations ψ i and ψ j are independent and equally probable. We also consider that the sum (or difference) of two angles ψ, as well as a product nψ (where n is an arbitrary number) is also distributed on the interval (−π, π). Another words, we deal with a wrapped angular distribution [20]. In this case, we can use two following arguments. First, the differenceψ = ψ i − ψ j of two uniformly distributed independent variables ψ i and ψ j also has a uniform distribution. In this case the probability density function (PDF) ofψ is a convolution of the corresponding PDFs of ψ i and ψ j [21], which is just a constant after wrapping to the interval (−π, π). Second, using the rules of probability theory for transformation of random variables [21,22], it can be shown that the product nψ is also a uniformly distributed variable. Using a combination of these two arguments in (A.7) and (A.8) we find that C n 1 ,n 2 M = 0 and D n 1 ,n 2 M = 0. Therefore, the statistical estimate in (A.5) becomes a real number, A n 1 ,n 2 M = N . Using this result in (A.3) we obtain (13).