Perspectives for imaging single protein molecules with the present design of the European XFEL

The Single Particles, Clusters and Biomolecules & Serial Femtosecond Crystallography (SPB/SFX) instrument at the European XFEL is located behind the SASE1 undulator and aims to support imaging and structure determination of biological specimens between about 0.1 μm and 1 μm in size. The instrument is designed to work at photon energies from 3 keV up to 16 keV. Here, we propose a cost-effective proof-of-principle experiment, aiming to demonstrate the actual feasibility of a single molecule diffraction experiment at the European XFEL. To this end, we assume self-seeding capabilities at SASE1 and we suggest using the baseline European XFEL accelerator complex (with the addition of a slotted-foil setup) together with the SPB/SFX instrument. As a first step towards the realization of an actual experiment, we developed a complete package of computational tools for start-to-end simulations predicting its performance. Single biomolecule imaging capabilities at the European XFEL can be reached by exploiting special modes of operation of the accelerator complex and of the SASE1 undulator. The output peak power can be increased to more than 1.5 TW, which relaxes the requirements on the focusing efficiency of the optics and allows the required fluence to be reached without changing the present design of the SPB/SFX instrument. Explicit simulations are presented using the 15-nm size RNA Polymerase II molecule as a case study. Noisy diffraction patterns were generated and processed to obtain the 3D intensity distribution. We discuss the requirements on the signal-to-background ratio needed to obtain a correct pattern orientation. When these are fulfilled, our results indicate that one can achieve diffraction without destruction with about 0.1 photons per Shannon pixel per shot at 4 Å resolution with 10¹³ photons in a 4 fs pulse at 4 keV photon energy and in a 0.3 μm focus, corresponding to a fluence of 10¹⁴ photons/μm². We assume negligible structured background.
At this signal level, one needs only about 30 000 diffraction patterns to recover full 3D information. At the highest repetition rate manageable by detectors at the European XFEL, one will be able to accumulate these data within a fraction of an hour, even assuming a relatively low hit probability of about one percent.


I. INTRODUCTION AND REQUIREMENTS
Imaging of single molecules at near-atomic resolution is expected to result in a significant advance in structural biology. One could obtain structural information on large macromolecular assemblies that cannot crystallize, such as membrane proteins. In order to perform single molecule imaging, a straightforward "diffraction before destruction" method has been proposed [1-4]. A great number of single molecules with the same structure are injected into vacuum and interact with ultrashort X-ray pulses before being completely destroyed. A sufficient number of diffraction patterns is recorded, with unknown orientation. Next, the relative orientations of the different images are determined, so that the 3D intensity distribution of the particle can be obtained [5,6]. The 3D electron density of the molecule is obtained from the 3D diffraction pattern with the help of a phase retrieval method. An important parameter of the problem is the number of scattered photons per effective Shannon pixel. For biological material, the photon count per shot per pixel of solid angle Ω_p, averaged over shells of wavenumber q, is proportional [7] to the square of the wavelength, λ². Lower photon energies result in a stronger diffraction signal, but the resolution that one needs to achieve sets a lower limit on the photon energy; a good compromise lies in the range between 3 keV and 5 keV. The FWHM focal spot size should be roughly between 5 and 10 times larger than the sample size to grant good photon beam quality within the interaction area [8]. We find therefore that a biomolecule of around 15 nm diameter, with a number of non-hydrogen atoms in the molecule N_atom ≈ 30 000, requires a pulse fluence of about 10¹³ photons/(300 nm)² for an average of ⟨N_p⟩ ≈ 0.1 photons per Shannon pixel at a photon energy of 4 keV. This signal level is higher than what is required by usual methods of pattern orientation determination.
Photons have to be delivered in extremely short X-ray pulses to limit radiation-induced changes during the exposure. Estimates indicate that an X-ray pulse duration shorter than about 5 fs is needed [9-13]. The key parameter for optimizing a photon source for single biomolecule imaging is then the peak power. Ideally, the peak power in our case of interest should be more than 1 TW. For example, we note that 10¹³ photons at 4 keV correspond to an energy of about 6 mJ, which yields, in 4 fs, a peak power of about 1.5 TW. It is worthwhile to mention that 1 TW at 4 keV gives the same signal per Shannon pixel as 27 TW at 12 keV (assuming a fixed pulse duration). In this article, we study possibilities and opportunities for single biomolecule imaging, which will be enabled by applying advanced FEL techniques to the SPB/SFX (Single Particles, Clusters and Biomolecules & Serial Femtosecond Crystallography) instrument to be installed [14,15] in the European XFEL baseline.
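The arithmetic in this paragraph can be checked with a short sketch. It assumes a flat-top pulse (energy divided by duration) and takes the signal per Shannon pixel to scale as the photon number times the wavelength squared, as stated above; the function names are illustrative and do not come from the original work.

```python
# Pulse energy, peak power, and equal-signal power scaling (illustrative sketch).
EV_TO_J = 1.602176634e-19  # joules per electronvolt


def pulse_energy_j(n_photons, photon_energy_ev):
    """Total pulse energy in joules."""
    return n_photons * photon_energy_ev * EV_TO_J


def peak_power_w(n_photons, photon_energy_ev, duration_s):
    """Peak power for an assumed flat-top pulse of the given duration."""
    return pulse_energy_j(n_photons, photon_energy_ev) / duration_s


def equivalent_power_w(power_w, energy_from_ev, energy_to_ev):
    """Power needed at energy_to_ev for the same signal per Shannon pixel.

    Assumption: signal ~ N_photons * wavelength^2 at fixed pulse duration.
    With N_photons ~ P / E and wavelength ~ 1 / E, the signal scales as P / E^3.
    """
    return power_w * (energy_to_ev / energy_from_ev) ** 3


energy = pulse_energy_j(1e13, 4000.0)
power = peak_power_w(1e13, 4000.0, 4e-15)
power_12kev = equivalent_power_w(1e12, 4000.0, 12000.0)

print(f"pulse energy: {energy * 1e3:.1f} mJ")
print(f"peak power: {power / 1e12:.1f} TW")
print(f"equal-signal power at 12 keV: {power_12kev / 1e12:.0f} TW")
```

The cubic dependence on photon energy is what makes the 3 keV to 5 keV range attractive: tripling the photon energy costs a factor of 27 in peak power for the same signal.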

II. TW SOURCE FOR THE SPB/SFX INSTRUMENT
The SPB/SFX instrument at the European XFEL will be located at the SASE1 undulator line [14,15]. Fig. 1 shows this line from the injector up to the SASE1 undulator. Our scheme for an X-ray source suitable for the SPB/SFX instrument is heavily based on the use of a slotted spoiler foil in the last bunch compressor chicane, a method devised and experimentally demonstrated at the LCLS [16-18]. The last linac section before the third bunch compressor BC2 is set at an off-crest accelerating rf phase, so that a y-t bunch tilt is present at the center of BC2. A thin foil with a narrow slot at its center is placed in the beam path. Coulomb scattering of the electrons passing through the foil increases the emittance of most of the beam, but leaves a thin unspoiled slice where the beam passes through the slot, thus allowing for an X-ray FEL pulse much shorter than the FWHM electron bunch duration. The minimum duration of the unspoiled slice of the electron bunch measured at the LCLS is about 3 fs. A design of a self-seeding setup based on the undulator system for the European XFEL is sketched in Fig. 2. We exploit a combination of a self-seeding scheme [19-21] with an undulator tapering technique [22-33], which consists of a slow reduction of the field strength of the undulator in order to preserve the resonance wavelength while the kinetic energy of the electrons decreases due to the FEL process. Highly monochromatic pulses generated with the self-seeding technique make the tapering more efficient than in the SASE case. Here, we study a scheme for generating TW-level X-ray pulses in the SASE1 tapered undulator. We optimize our setup based on simulations for a 14 GeV electron beam with 1 nC charge compressed up to 10 kA peak current. In this way, the output power of the SASE1 undulator could be increased from the value of 100 GW in the SASE regime to about 1.5 TW in the photon energy range around 4 keV.
For self-seeding, we consider a single-crystal scheme, with a crystal identical to that installed at the LCLS, allowing for exploitation of different reflections. In Fig. 3, we show the amplitude and phase of the transmittance for the C(111) asymmetric Bragg reflection at 4.1 keV [21,34]. The monochromatic seed signal is exponentially amplified while passing through the first 7 uniform cells of the output undulator and reaches saturation at about 100 GW power. In the second part of the output undulator, the monochromatic FEL signal is enhanced up to 1.5 TW by tapering the undulator magnetic field over the last 22 cells.
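A rough feel for the size of the taper can be obtained from the standard planar-undulator resonance condition, in which the wavelength is proportional to (1 + K²/2)/γ². The sketch below is only an order-of-magnitude illustration: it compares the 1.5 TW output with the peak electron beam power (14 GeV, 10 kA) and asks how far K must drop to stay on resonance if the electrons lost that fraction of their energy on average. The initial value `k_start` is a hypothetical placeholder, not a SASE1 setting taken from this work, and the actual taper profile is obtained from the FEL simulations.

```python
import math


def beam_power_w(energy_gev, peak_current_a):
    """Peak electron beam power: energy in eV times current in A gives watts."""
    return energy_gev * 1e9 * peak_current_a


def extraction_efficiency(radiation_power_w, energy_gev, peak_current_a):
    """Fraction of the peak beam power converted into radiation."""
    return radiation_power_w / beam_power_w(energy_gev, peak_current_a)


def tapered_k(k_start, relative_energy_loss):
    """K that keeps (1 + K^2/2) / gamma^2 constant after a relative energy loss."""
    target = (1.0 + k_start ** 2 / 2.0) * (1.0 - relative_energy_loss) ** 2
    return math.sqrt(2.0 * (target - 1.0))


efficiency = extraction_efficiency(1.5e12, 14.0, 1.0e4)
k_start = 3.9  # hypothetical initial undulator parameter, for illustration only
k_end = tapered_k(k_start, efficiency)

print(f"extraction efficiency: {efficiency * 100:.2f} %")
print(f"relative change in K: {(k_end / k_start - 1.0) * 100:.2f} %")
```

Under these assumptions the required change in K is of the order of one percent, which is consistent with describing the taper as a slow reduction of the field strength.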

III. OPTICS LAYOUT FOR THE SPB/SFX INSTRUMENT
The SPB/SFX optical layout [14,15] is sketched in Fig. 4. The first upstream optical element is a Horizontal Offset Mirror (HOM) pair with a clear aperture along the mirror surface of 800 mm [35]. For the maximal incidence angle θ = 3.6 mrad, one achieves an overall high reflectivity close to 100% over the photon energy range between 3 keV and 5 keV. It can be shown that in our case of interest the HOMs are expected to preserve the radiation wavefront. Once the radiation pulse enters the experiment area, it is focused by a Kirkpatrick-Baez (KB) mirror system [15]. The layout for the KB system at SPB/SFX is shown in Fig. 4. Two elliptical mirrors with a 950 mm clear aperture along the mirror surface and a fixed incidence angle of 3.5 mrad are assumed in the vertical and horizontal direction in order to achieve high efficiency at high photon energies. Considering a 950 mm clear aperture and a 3.5 mrad reflection angle, one obtains [36] a lateral aperture of 3.3 mm. However, the ~900 m-long propagation distance from source to sample leads to a large lateral beam size at the focusing optics. In fact, accepting 4σ of the beam, for a photon energy of 4 keV, the desired lateral aperture for the ultra-short pulse case is about 8 mm. As a result, due to the large divergence of a nominal X-ray pulse shorter than 10 fs, one suffers major diffraction effects from the KB mirror aperture, leading to about a hundred-fold decrease in fluence at photon energies around 4 keV. However, it is possible to obtain an X-ray source capable of producing X-ray pulses with a smaller angular divergence of about 2 μrad and, simultaneously, about 4 fs duration. This can be achieved by preparing an electron beam with the characteristics described in Sec. IV and by introducing a slotted foil setup in the last electron bunch compressor of the accelerator complex.
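The aperture argument can be reproduced with a few lines of geometry. The sketch below assumes a Gaussian angular profile, a point-like source, and that the quoted divergences are FWHM values; the 5 μrad figure is the nominal low-charge SASE divergence quoted in Sec. IV. Function names are illustrative.

```python
import math

# Conversion from FWHM to rms width for a Gaussian profile
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def mirror_lateral_aperture_mm(clear_aperture_mm, grazing_angle_rad):
    """Aperture seen by the beam: mirror length projected onto the beam cross section."""
    return clear_aperture_mm * math.sin(grazing_angle_rad)


def beam_4sigma_mm(fwhm_divergence_urad, distance_m):
    """4-sigma beam size at a given distance from an assumed point-like source."""
    sigma_rad = fwhm_divergence_urad * 1e-6 * FWHM_TO_SIGMA
    return 4.0 * sigma_rad * distance_m * 1e3


kb_aperture = mirror_lateral_aperture_mm(950.0, 3.5e-3)
nominal_beam = beam_4sigma_mm(5.0, 900.0)  # nominal 20 pC SASE mode (Sec. IV)
slotted_beam = beam_4sigma_mm(2.0, 900.0)  # slotted-foil, self-seeded mode

print(f"KB lateral aperture: {kb_aperture:.2f} mm")
print(f"4-sigma beam size, 5 urad FWHM: {nominal_beam:.1f} mm")
print(f"4-sigma beam size, 2 urad FWHM: {slotted_beam:.1f} mm")
```

With these assumptions the nominal beam overfills the KB aperture by more than a factor of two, whereas the 2 μrad beam fits within it, which is the motivation for the special mode of operation.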

IV. RADIATION FROM SASE1
The source divergence is the most important parameter for the X-ray beam transport system and is largest for the lowest photon energies and the lowest electron charge. At any given photon energy, in the nominal SASE mode of operation, a shorter photon pulse duration corresponds to a lower electron charge and, therefore, directly translates into a larger divergence of the radiation pulse. In particular, at a photon energy of 4 keV, X-ray pulses with a duration shorter than 10 fs correspond to the lowest charge for the baseline SASE mode of operation, that is, 20 pC. This also corresponds to the largest FWHM divergence of 5 μrad. Such a divergence is too large to fit the present acceptance of the SPB/SFX instrument.
We propose to overcome this issue with a special mode of operation of the accelerator complex, which will allow the required fluence to be reached with the present design of the SPB/SFX instrument optics. In the following, we describe the operation of SASE1 driven by an electron beam with specially optimized parameters. We use an electron bunch of 1 nC, a peak current of 10 kA, and a normalized rms emittance of about 1 mm mrad at 14 GeV. Full start-to-end simulations for the electron bunch from the photocathode to the SASE1 undulator entrance can be found in Ref. 37. The slotted foil method [16-18], which is used routinely at the LCLS, provides X-ray pulses down to 3 fs duration [18]. Detailed computer simulations with 2 × 10⁵ macroparticles have been carried out to evaluate the performance of the slotted spoiler for our case of interest using the tracking code ELEGANT [38]. They include multiple Coulomb scattering in a 2 μm thin aluminum foil. The FEL process is simulated with the code GENESIS [39]. Filtering through the self-seeding monochromator was performed with the help of in-house routines. The output power and spectrum for the case of the 4 fs pulse mode of operation at 4.1 keV photon energy are shown in Fig. 5. A complete data set of simulations of the radiation pulse is available in Ref. 40, where the same mode of operation discussed here is studied in more detail. The present study builds on the results in Ref. 40, including data processing up to orientation recovery and presenting first requirements on uniform background, as discussed in Sec. VII.

FIG. 4. Sketch of optical components for the SPB/SFX instrument [14,15].

V. NANO-SCALE FOCAL SPOT
We carried out wavefront propagation simulations to investigate the evolution of the radiation beam profile through the SPB/SFX optics. Our wave optics analysis takes into account aberrations and errors from each optical element. In our case of interest, a reflection from the mirror becomes similar to the propagation through a transparency at the mirror position, which just changes the phase of the reflected beam without changing its amplitude. Applying the Maréchal criterion, i.e., requiring a Strehl ratio larger than 0.8, and treating the errors from the different optics independently, we conclude that a height error h_rms < 1.5 nm should be sufficiently small for diffraction-limited propagation through the SPB beamline at a photon energy of 4 keV. In fact, the SPB/SFX instrument designers are planning to use mirrors capable of preserving the geometrical focus properties at much shorter wavelengths. The effects of the horizontal offset mirrors in the X-ray beam transport are modeled using the code SRW [41] as a combination of two apertures with sizes determined by the mirror length (800 mm in our case) and two phase shifters describing the mirror surface errors. The SRW code further has the capability of modeling the KB optics by elliptical mirrors (with a length of 950 mm in our case) and of accounting for all aberrations. The KB mirror surface errors are simulated by two phase shifters, similar to the case of the offset mirrors. The plot in Fig. 6 shows the intensity profile at the focus, integrated over the radiation pulse. This is thus a simulation of the energy profile per unit surface that can be measured by a detector that integrates over a single radiation pulse, placed in the plane of interest. The maximal fluence in the focus may now be obtained from F = N_p/S, where N_p is the number of photons in the radiation pulse and S is the effective area of the focal spot. We found that 1/S ≈ 1.5 × 10⁹ cm⁻².
For 10¹³ photons/pulse, which can be achieved as discussed previously, this amounts to a fluence of about 1.5 × 10²² photons/cm². This result can be achieved without additional cost for the baseline optical layout of the SPB/SFX instrument and with very moderate costs for the installation of the slotted foil setup into the beam formation system.
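The fluence estimate is a single multiplication, but the unit conversions are easy to get wrong, so a small sketch may help. It uses only the values quoted above (10¹³ photons and an inverse effective focal area of 1.5 × 10⁹ cm⁻²); the helper names are illustrative.

```python
import math

CM2_PER_UM2 = 1e-8  # one square micrometre expressed in square centimetres


def fluence_per_cm2(n_photons, inverse_area_per_cm2):
    """Peak fluence F = N_p / S, with 1/S given in cm^-2."""
    return n_photons * inverse_area_per_cm2


def per_cm2_to_per_um2(fluence_cm2):
    """Convert a fluence from photons/cm^2 to photons/um^2."""
    return fluence_cm2 * CM2_PER_UM2


def effective_spot_side_um(inverse_area_per_cm2):
    """Side of a square with the same effective area, in micrometres."""
    return math.sqrt(1.0 / inverse_area_per_cm2) * 1e4


fluence = fluence_per_cm2(1e13, 1.5e9)
fluence_um2 = per_cm2_to_per_um2(fluence)
spot_side = effective_spot_side_um(1.5e9)

print(f"fluence: {fluence:.2e} photons/cm^2 = {fluence_um2:.2e} photons/um^2")
print(f"effective spot side: {spot_side:.2f} um")
```

The result is of the same order as the 10¹⁴ photons/μm² quoted in the abstract, and the effective spot side is close to the 0.3 μm focus assumed there.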

VI. NOISY X-RAY DIFFRACTION PATTERNS
Simulations of diffraction patterns were performed using the program Moltrans [43]. This program calculates the coherent sum of the scattering contributions from each atom in terms of spherical waves, yielding the intensity in every pixel of the detector. The simulations were carried out for a photon energy of 4.1 keV, corresponding to a wavelength λ = 0.3 nm. The sample-detector distance [42] was chosen to be 100 mm and the detector size was set to 256 × 256 mm, which is close to the size of the 1 megapixel Adaptive Gain Integrating Pixel Detector (AGIPD) [44]. These simulation parameters give a best achievable resolution in real space of 0.342 nm, corresponding to the edge of the detector. Further discussion of the resolution limit can be found in Ref. 40. For the simulations we used the complete 12-subunit RNA Polymerase II structure (PDB entry 1WCM). This structure is approximately 15 nm in diameter and consists of about 31 000 atoms.
In order to increase the signal in each pixel and to speed up all calculations, we binned the AGIPD 200 μm pixels 5 × 5 times. Therefore, all calculations were performed with 256 × 256 pixels of 1 × 1 mm² each. For the wavelength and sample-detector distance used here, we have a sampling rate, i.e., a number of pixels per speckle in each direction, of two. Such a sampling rate should allow for successful reconstruction of the combined reciprocal space. The Shannon pixel size for our calculation is therefore 2 × 2 mm².
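The detector geometry quoted in this section can be cross-checked with the usual small-sample diffraction relations: the resolution at scattering angle 2θ is d = λ/(2 sin θ), and the speckle size at the detector is λL/D for a particle of diameter D at distance L. The sketch below uses the rounded wavelength of 0.3 nm from the text and evaluates the resolution at the middle of a detector edge (not at the corner); the function names are illustrative.

```python
import math


def edge_resolution_nm(wavelength_nm, half_width_mm, distance_mm):
    """Real-space resolution d = lambda / (2 sin(theta)) at the detector edge."""
    theta = 0.5 * math.atan(half_width_mm / distance_mm)  # half the scattering angle
    return wavelength_nm / (2.0 * math.sin(theta))


def speckle_size_mm(wavelength_nm, distance_mm, particle_diameter_nm):
    """Speckle (Shannon pixel) size on the detector: lambda * L / D."""
    return wavelength_nm * distance_mm / particle_diameter_nm


wavelength = 0.3        # nm, rounded value used in the text for 4.1 keV
distance = 100.0        # mm, sample-detector distance
detector_width = 256.0  # mm
binned_pixel = 0.2 * 5  # mm, 200 um AGIPD pixels binned 5 x 5

resolution = edge_resolution_nm(wavelength, detector_width / 2.0, distance)
speckle = speckle_size_mm(wavelength, distance, 15.0)
sampling = speckle / binned_pixel
pixels_per_side = detector_width / binned_pixel

print(f"resolution at detector edge: {resolution:.3f} nm")
print(f"speckle size: {speckle:.1f} mm, sampling: {sampling:.1f} pixels per speckle")
print(f"binned pixels per side: {pixels_per_side:.0f}")
```

The same speckle size gives the conversions used later in the text: a 6 mm module gap spans three Shannon pixels and a 3 mm central hole spans 1.5.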
FIG. 6. Distribution of the radiation pulse energy per unit surface in the plane placed in the focus, integrated over the radiation pulse.

Struct. Dyn. 2, 041702 (2015)
The intensity in each pixel of the detector was converted into a number of photons for two different incident fluences: 10²¹ and 10²² photons/cm². Poisson noise was added to the calculated intensity. A quantum efficiency of about 85% is assumed for the AGIPD, for the standard window and at a photon energy of 4 keV. From the AGIPD design [44], one can evaluate the expected number of false hits per detector pixel (0.2 mm × 0.2 mm). Assuming an Equivalent Noise Charge (ENC) of 300 electrons and a threshold of 0.9, one obtains 10⁻⁴ false photon counts per pixel, or 100 photons/frame. Although there is no true single-photon sensitivity, for our purposes it is sufficient that the false positive level be small enough, compared to the average photon count at the detector edge, to be handled. In Sec. VII, we quantify this statement. For a fluence of 10²² photons/cm², a false positive level of 100 photons/frame does not constitute an issue for the retrieval of the 3D intensity distribution.
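As an illustration of the noise model described here, the following sketch turns an array of expected photon counts into integer counts. It is not the Moltrans implementation: the quantum efficiency is applied by scaling the Poisson mean (valid because thinning a Poisson process gives another Poisson process), the sampler is Knuth's simple multiplication method, which is adequate for the small means relevant here, and a nominal 10⁶ pixels is assumed for the false-count estimate.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw one Poisson-distributed integer (Knuth's method, fine for small means)."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def add_detector_noise(expected_photons, quantum_efficiency, rng):
    """Apply quantum efficiency and Poisson noise to a 2D list of expected counts."""
    return [
        [poisson_sample(quantum_efficiency * value, rng) for value in row]
        for row in expected_photons
    ]


def false_counts_per_frame(n_pixels, false_rate_per_pixel):
    """Expected number of false photon counts in one frame."""
    return n_pixels * false_rate_per_pixel


# Toy frame: 8 x 8 pixels with 0.1 expected photons each (hypothetical values)
expected = [[0.1] * 8 for _ in range(8)]
noisy = add_detector_noise(expected, 0.85, random.Random(1))
false_counts = false_counts_per_frame(1_000_000, 1e-4)

print("photons recorded in toy frame:", sum(sum(row) for row in noisy))
print(f"expected false counts per frame: {false_counts:.0f}")
```

At signal levels of a few hundred to a few thousand photons per pattern, most pixels record zero photons, which is why the per-pixel false-count rate matters.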
We simulated 30 000 randomly oriented diffraction patterns for the RNA Pol II structure with a fluence of 10²² photons/cm² and 300 000 patterns with a fluence of 10²¹ photons/cm². The plot in Fig. 7 shows the radial average of the photon count. This plot demonstrates that a signal of the order of 0.1 photons per Shannon pixel at a resolution of about 0.4 nm (15.7 nm⁻¹) can be achieved. A typical diffraction pattern from a single FEL pulse as seen by the AGIPD is shown in Fig. 8. For the present detector layout, almost 30% of the detector area is actually insensitive or "dead." A central hole with a size of about 3 mm, covering 1.5 speckles (or Shannon pixels), was introduced in all simulated diffraction patterns, and this region was excluded from the calculations. We also took into account the 6 mm wide gaps between different detector modules, corresponding in our case to the size of three Shannon pixels (Fig. 8).

VII. DATA PROCESSING AND BACKGROUND
The main step in processing the noisy diffraction patterns is to obtain the 3D intensity distribution of the molecule (a slice is shown in Fig. 9(a)) by assigning an orientation to each individual image. We performed this using the Expansion-Maximization-Compression (EMC) algorithm [45]. Results are shown in Fig. 9(b) for the data generated in Sec. VI (30 000 patterns with 10²² photons/cm², giving 4000 photons/pattern). The 3D rotation space was discretized into 691 440 groups.
To demonstrate that the signal level is not limiting in terms of orientation recovery, we also generated a data set with an incident fluence of 10²¹ photons/cm² (400 photons/pattern) and 300 000 patterns. The result is shown in Fig. 9(c).
We believe that the principal factor affecting the feasibility of the experiment at the predicted signal levels will be background. As a first step towards understanding these effects, we simulated the effect of uniform background at an intermediate signal level of 800 photons/pattern (corresponding to an incident fluence of 2 × 10²¹ photons/cm²). Data were generated with different background levels from 40 to 600 photons/pattern. The expectation-maximization update rule in the EMC algorithm was modified to take into account this known background level at each pixel. The results are shown in Fig. 10. As one can see, for more than 400 background photons/pattern, the algorithm is unable to accurately assign orientations. The analysis presented in this section includes only a uniform background distribution and should be considered as a first study towards the understanding of requirements on the background level for the 3D intensity reconstruction. A more thorough analysis is required at different signal levels as well as with non-uniform background distributions.
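To make the role of a known background concrete, the sketch below shows one plausible form of the orientation-probability (expectation) step for Poisson statistics, where the expected count in each pixel is the model intensity plus a uniform background. This is a toy illustration under stated assumptions and not the modified EMC code used for the results above: the three "orientations" are hand-made four-pixel slices, a flat prior over orientations is assumed, and all names and values are hypothetical.

```python
import math


def log_likelihood(counts, model_slice, background):
    """Poisson log-likelihood of measured counts given model + background.

    The count-dependent factorial term is omitted because it is the same
    for every orientation and cancels in the normalization.
    """
    total = 0.0
    for count, intensity in zip(counts, model_slice):
        mean = intensity + background
        total += count * math.log(mean) - mean
    return total


def orientation_probabilities(counts, slices, background):
    """Normalized probability of each candidate orientation for one pattern."""
    logs = [log_likelihood(counts, s, background) for s in slices]
    peak = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in logs]
    norm = sum(weights)
    return [w / norm for w in weights]


# Toy model: each candidate orientation is bright in a different pixel
slices = [
    [2.0, 0.1, 0.1, 0.1],
    [0.1, 2.0, 0.1, 0.1],
    [0.1, 0.1, 2.0, 0.1],
]
pattern = [0, 3, 0, 0]  # measured photons per pixel

low_background = orientation_probabilities(pattern, slices, 0.05)
high_background = orientation_probabilities(pattern, slices, 50.0)

print("low background: ", [round(p, 3) for p in low_background])
print("high background:", [round(p, 3) for p in high_background])
```

In this toy case the same three photons identify the orientation almost unambiguously at low background, while at high background the probabilities approach a flat distribution, which mirrors qualitatively the loss of orientation recovery reported above.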

VIII. DISCUSSIONS AND CONCLUSIONS
The "diffraction before destruction" imaging method promises to be a revolutionary technique for structural biology, capable of resolving the structure of molecules that cannot crystallize. Here, we propose a cost-effective proof-of-principle experiment, aiming to demonstrate the actual feasibility of a single molecule diffraction experiment using the baseline European XFEL accelerator complex and the SPB/SFX hardware. More specifically, we want to determine the structure of a relatively small (about 30 000 non-hydrogen atoms), well-known protein molecule and compare it with results in the Protein Data Bank. We developed a complete package of computational tools for start-to-end simulations predicting the performance of this experiment. Its composition is sketched in Fig. 11. In this paper, we reported on detailed simulations from the photocathode of the European XFEL injector up to data processing.
We found that with some relatively inexpensive modifications to the accelerator and undulator setup, we can obtain the required photon flux to determine the structure of a 15 nm sized RNA polymerase II molecule. This was contingent on having little or no background. Some preliminary simulations suggest that the understanding and reduction of such non-sample scatter is very important for the future of single protein imaging.