Phase retrieval from single biomolecule diffraction pattern

In this paper, we propose the SPR (sparse phase retrieval) method, which is a new phase retrieval method for coherent x-ray diffraction imaging (CXDI). Conventional phase retrieval methods effectively solve the problem for high signal-to-noise ratio measurements, but would not be sufficient for single biomolecular imaging which is expected to be realized with femto-second x-ray free electron laser pulses. The SPR method is based on the Bayesian statistics. It does not need to set the object boundary constraint that is required by the commonly used hybrid input-output (HIO) method, instead a prior distribution is defined with an exponential distribution and used for the estimation. Simulation results demonstrate that the proposed method reconstructs the electron density under a noisy condition even some central pixels are masked.

1 Introduction X-ray free electron lasers (XFELs) can potentially provide us a novel mean to determine the threedimensional (3D) structure of biomolecules from the diffraction images of single molecules [1,2]. Conventionally, x-ray crystallography has been the principal tool to determine high-resolution 3D structures of proteins, nucleic acids, and their complexes. However, a critical process is the crystallization, where a sufficiently large crystal must be prepared. It is known that about 40% of biomolecules, particularly membrane proteins, do not crystallize. The coherent x-ray diffraction imaging (CXDI) by XFELs does not require a crystal and will change the situation.
In order to realize the single molecule imaging, a straightforward scenario has been proposed [3,4] and demonstrated with single mimivirus particles [5]. Firstly, a series of single molecules with the same structure are injected into vacuum and exposed to a very short (less than 5 fs) pulse x-ray beam. The diffraction image of the molecule with unknown orientation is recorded before radiation-induced structural degradation. This process is repeated until a sufficient number of images are recorded. The diffraction data are then classified into groups according to the orientations of the molecule and averaged to improve the signal-to-noise ratio [6,7]. Next, the relative orientations of the classes are determined to assemble a 3D coherent diffraction pattern in reciprocal space. Finally, the 3D electron density of the molecule is obtained by a phase retrieval method.
The above scenario was proved to be effective for the 3D reconstruction of mimivirus particles [5]. However, the condition for a single molecule CXDI will be more severe since the number of the photons received by each pixel will be very limited. Some methods of reconstructing 3D diffraction intensity have been proposed for this problem [8][9][10]. They are likely to be computationally intensive and the feasibility remains to be seen.
Another scenario might be possible. After collecting a sufficient number of diffraction images with unknown orientations, 2D electron density images are first reconstructed from the 2D diffraction data, followed by the 3D reconstruction method used in transmission electron microscopy [11]. In either scenario, the difficulty is due to the small number of photons scattered by a single molecule even though x-ray beams that linac coherent light source (LCLS) has started to commission give us illumination which is around 10 9 times brighter (∼10 19 photons/pulse/mm 2 in the unfocused beam) than that of current synchrotrons [12].
We propose a new method for retrieving phases of a diffraction image which will be obtained by XFELs. The method is based on the maximum a posteriori (MAP) estimation of the Bayesian statistics. The posterior distribution is computed from the likelihood and the prior. For the present problem, the Poisson distribution is appropriate for the likelihood. The prior reflects the assumption of the electron density. Since the electron density is known to have a lot of zeros, we employ the exponential distribution which encourages the MAP estimate to have a lot of zeros (this is called a sparse solution [13,14]. A similar idea is employed in [15]). Therefore, we call our method, sparse phase retrieval (SPR) method.
This combination is effective for the problem and is distinct from other Bayesian approaches proposed for different phase retrieval problems [16,17]. Compared to the widely used hybrid input-output (HIO) method [18][19][20], the SPR method does not require the support region and is robust against the limited photon counts and the missing central pixels. In this paper, we focus on 2D phase retrieval problem. The resulting SPR method can be used directly for the second scenario. If 3D diffraction data are available, it is also possible to apply the SPR method. But reconstruction of 3D diffraction data from 2D images with unknown orientations is beyond the scope of the present paper.

Proposed method 2.1 Diffraction image and noise
We first formulate the phase retrieval problem of a single diffraction image. Let f (x, y) ≥ 0 be the electron density of a molecule projected onto a two-dimensional (2D) plane. The coordinate is discretized, that is, x, y = 1, · · · , M . We introduce the following notation where, F is the Fourier transform of f defined as follows, The coefficient of the Fourier transform differs from the commonly used 1/M 2 . We prefer the above definition because the power of F and f is the same. We assume that the frame size is sufficiently larger than the molecule in the image and many components of f are zero.
In the ideal situation, each pixel of a diffraction image receives the number of photons proportional to the power spectrum |F uv | 2 , where (u, v) is the 2D index of the pixel. However, the XFEL measurement of a single molecule diffraction will be different because the flux of the x-ray laser is not sufficiently large. The number of the photons detected by each sensor will be small and behave stochastically.
Let N uv be the number of photons detected by the sensor at (u, v). We study the distribution of N uv . The total number N all = uv N uv is a stochastic variable which follows the Poisson distribution. The expected value of N all is proportional to the intensity I X of x-ray. Assuming each N uv is independent and the distribution belongs to the same family, N uv also follows a Poisson distribution. Let the expected value of N uv be S uv and the distribution of N uv becomes We only consider this counting process and do not consider other observation noise. Approximately, S uv is denoted as S uv = α|F uv | 2 cos 3 θ, where α is a positive constant and θ is the scattering angle (see Fig. 1). Because θ is a function of u and v, we further rewrite it as α|F uv | 2 c uv , where c uv = c(u, v) = cos 3 θ.
Note that the scaling of |F uv | 2 cannot be determined only from the observed scattering pattern and α is set to 1. This scaling indeterminacy is ignored here. It would be recovered from other knowledge, such as the total energy. In the following, we consider the density reconstruction as an estimation of f . Commonly used estimate is the maximum likelihood estimate (MLE) which maximizes p(N |f ). The likelihood in Eq. (2) is maximized by setting |F uv | 2 = N uv /c uv , but this is not sufficient for the estimation of f because the phase is lost. Some additional information is needed.

SPR method
Instead of the MLE, the MAP estimate of the Bayesian statistics is used for the SPR method. The MAP estimate maximizes the posterior distribution, where the posterior distribution is defined as the product of the likelihood function and the prior distribution. Similar framework has been proposed for different phase retrieval problems [16,17].
The likelihood function is defined as a Poisson distribution as in Eq.
(2) and we discuss the prior distribution in this subsection. For a 2D image of a single molecule diffraction, it is natural to assume that the electron density is high in the central part and becomes zero in the peripheral part. But we do not know which pixel is zero. The prior should reflect this knowledge.
It is known that the prior distribution with an exponential distribution encourages the MAP estimate to have a lot of zeros [13,14] and we use this for the SPR method where each f xy is assumed to be independent. The hyperparameter ρ xy reflects the prior belief of f xy = 0. Larger the value of ρ xy , stronger the belief. In order to reflect the prior knowledge, we define the following one parameter function ρ(µ) xy The two parameters a and b were adjusted to make w xy equal to 0 at the center and 1 at the corners (see Fig. 2) and µ adjusts the overall strength of the prior. We have employed a parabolic function, but if there is any prior knowledge, ρ(µ) xy should reflect it.
The posterior distribution of f observing N has the following relation, It is convenient to take the logarithm of p(f |N ) in order to compute the MAP estimate. By taking the logarithm and collecting terms related to f , the following function is obtained The first summation corresponds to the likelihood with the Poisson distribution (similar idea is used in [16]), the second summation represents the prior with the exponential distribution, and µ controls the balance between two terms. This combination is the key idea of the proposed method. The SPR method is summarized as follows.
We start with a large µ and decrease it. The MAP estimate is computed for each µ, and the best value of µ is selected according to some criterion. This issue will be discussed later. We note that our formulation is related to the compressed sensing (CS) [21], and the idea of CS has been applied to the phase retrieval problem [15]. Our approach is different from the work since the Poisson likelihood is included.

Algorithm
A gradient based algorithm is used to compute the MAP estimate. The algorithm is shown below starting with the following relation, By defining the inverse Fourier transform as In order to compute the derivative of ℓ µ (f |N ), F is applied to f , then F −1 is applied to g(F ; N ). Thus, the computational cost is similar to the HIO method [19]. We employed a naïve iterative updating rule for the MAP estimate of f xy , x, y = 1, · · · , M .
where t is the iteration number. The "max" operation keeps f xy nonnegative. The positive variable η t controls the step size and a simple line search of η t accelerated the convergence largely. The initial value is generally important for gradient based algorithms. But for the SPR method, the influence is effectively reduced by controlling µ. The MAP estimate is first computed for a large µ and then for smaller µ's successively. For a very large µ, most components of the optimal f are 0 and the influence of the initial value is negligible. The MAP estimate is then used for the initial value of a slightly smaller µ, which is a good starting point in general.

HIO method and Bayesian framework
In this subsection, we discuss the relation between the widely used HIO method and the SPR method.
This problem can be formulated as the MAP estimation problem by setting the Gaussian likelihood function and the uniform prior distribution, that is, where C is a positive number. If f xy is unbounded, abovep u (f xy ) is improper. The MAP estimate is computed by solving Eq. (7). Formally, the difference between the HIO method and the SPR method is the likelihood and the prior. But more importantly, the proposed SPR method does not require the support region. In the SPR method, a lot of components of the estimated electron density f become 0 automatically.

Simulated data
The SPR method was tested with simulated diffraction data. The electron density of lysozyme, a 14 kD protein, was projected onto a 2D plane. The projected density was then Fourier transformed into reciprocal space (u, v) to simulate a diffraction image. The density and the ideal diffraction image are shown in Figs. 3a and 3b, respectively.
Experimental diffraction images were simulated by converting the diffraction intensity into the number of photons per effective pixel (λ/σL) 2 according to the Poisson distribution with the mean, S uv , defined as follows,  Here, I is the incident x-ray flux (photons/pulse/mm 2 ), r c is the classical electron radius (2.82×10 −12 mm), λ is the x-ray wave length (0.1 nm), σ is the oversampling ratio per one dimension (2), and L is the molecule diameter (6.16 nm) (see Fig. 1). The Ewald sphere curvature was ignored but this is a good approximation for small angles.
Two examples for scattered photons of x-ray fluxes with I = 5.0×10 21 and 1.0×10 21 photons/pulse/mm 2 are shown in Figs. 3c and 3d, respectively. Although the fluxes in the above simulation are around 100 times larger than that of the current LCLS, the sizes of the molecules of interest are generally larger than lysozyme. When the size is around 1000 kD, the total number of photons arriving at the 2D plane will be in the similar range as Figs. 3c and 3d.  The SPR method was applied to the data to reconstruct the density. Figure 4 shows the results of the SPR method applied to the ideal diffraction image. As µ decreases, more components become positive. The reconstructed electron density becomes similar to the true density when µ = 1. Thus, it is important to select the best µ. A simple strategy is to rank the results with a criterion. Many criteria have been proposed in statistics, such as, Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) [22], but here we use a different one. Letf (µ) be the maximizer of ℓ µ (f |N ) and the following error is used as the criterion

Results
whereF (µ) uv = F (f (µ)) uv . This error function has been widely used for the phase retrieval problem and is employed here as well in order to rank the results. Roughly speaking, the error decreases as µ shrinks since it controls the balance between the prior and the likelihood. But the error does not necessarily decrease monotonically because the log posterior in Eq. (4) is different from the error. Thus, we choose the µ which minimizes Error F (µ). Figure 5 shows the Error F (µ) for diffraction images in Fig. 3. The value of µ was varied from 1×10 4 to 1×10 −4 . Note that although the horizontal axis of Fig. 5 increases from left to right, the value of µ was decreased in the experiments. The best µ was selected for each diffraction image by Error F (µ) and the reconstructed electron densities are shown in Figs. 6a, 6b, and 6c.
We also applied the HIO method for the data. The central square (154 × 154 pixels) was set as the support region. The method has a problem with convergence, and we stopped the iteration loop when the incremental difference of the estimated density becomes small enough. Figure 7 shows the reconstructions   The black circle and triangles on the curves correspond to the reconstructed density images of Fig. 9.
obtained by the HIO method for the diffraction images in Fig. 3. The reconstructed electron densities are similar to Fig. 6, but the SPR method obtained better Error F (µ) for all of the diffraction images. We note the computational time of the SPR method. The method was implemented with C and run on a desktop machine with an Intel Xeon X5570 (2.93 GHz 4 cores). The optimal electron density was computed for all the values of µ's (33 different values of µ) within 10 minutes.

Missing centers
One of the problems of the phase retrieval from diffraction images is missing centers. The intensities around the center cannot be measured experimentally because the scattered beam is superimposed by the direct beam around the center and a beam stop is used to block the direct beam.
The SPR method can be applied to the case easily by modifying eq. (4) as follows, where α is the active pixel set where the central pixels are excluded. The error function is also modified as Firstly, we masked the central pixel from each diffraction image in Fig. 3 and applied the SPR method. By masking the central pixel, Fig. 3b lost around 5.7% of the photons, Fig. 3c lost 294 out of 4630 photons and Fig. 3d lost 56 out of 957 photons. Figure 8 shows the errors. Note that the error values are better than those in Fig. 5, but it does not mean the reconstructed densities in Fig. 9 are better than those in Fig. 6, because the errors are computed only for the active pixel set. Figure 10 shows the results of the HIO method. Since the central pixel corresponds to a scaling factor of the electron density of the protein, the reconstructed images are quite similar to those reconstructed from the complete diffraction images. We see that the error values of the SPR method are better than those of the HIO method.
Next, we further masked the central 3 × 3 pixels from each diffraction image and applied the SPR method. Note that Fig. 3b lost around 35% of the photons, Fig. 3c lost 1595 out of 4630 photons and Fig. 3d lost 312 out of 957 photons. More than 30% of the photons are lost and the results are expected to be degraded largely. Figure 11 shows the errors. The reconstructed densities are shown in Fig. 12. Figure 13 shows the results of HIO method. We see that the error values of the SPR method are better and the reconstructed results are more stable than the HIO results. It is worth noting that the SPR

Discussion
We have proposed a new phase retrieval method. The SPR method is based on the MAP estimation of the Bayesian statistics, it will be effective for single molecule diffraction images which will be obtained by XFELs.
From a Bayesian viewpoint, the difference between other phase retrieval methods including the HIO method and the SPR method is summarized in the likelihood and the prior. Many methods use the squared loss, which corresponds to a Gaussian likelihood, while the present method employs a Poisson distribution. This is suitable for CXDI data because the number of the scattered photons detected at each pixel is a counting process. Similar idea has been found in a related work [16]. The prior in HIO method corresponds to a uniform distribution with a bounded support, while the proposed method uses the exponential distribution. Similar prior has been widely used in statistics [13]. This combination shows a new promising direction for phase retrieval.    The proposed method has been tested with simulated data. The electron densities are reconstructed with a reasonable computational cost, and errors of the results are smaller than the HIO method. We further tested the method with the diffraction data where some central pixels are masked. The proposed method is applied without any major modification, and the results are quite stable even some pixels are masked.
The SPR method reconstructs the density from an incomplete observation utilizing the sparsity. This idea is similar to CS [21]. In [15], CS is applied to the phase retrieval problem, however, in our problem, the number counts are very limited, and we needed to extend the model.
Finally, we list some of our future works to improve the SPR method. One is the hyperparameter. The hyperparameter ρ(µ) xy can be modified to reflect the knowledge of the true density. This may result in a better estimate. Another issue is the algorithm. The densities for the whole sequence of µ in Fig. 5 are computed with a simple line search algorithm but advanced optimization methods may speed it up.