Reconstruction of three-dimensional molecular structure from diffraction of laser-aligned molecules

Diffraction from laser-aligned molecules has been proposed as a method for determining 3-D molecular structures in the gas phase. However, existing structural retrieval algorithms are limited by the imperfect alignment in experiments and the rotational averaging in 1-D alignment. Here, we demonstrate a two-step reconstruction comprising a genetic algorithm that corrects for the imperfect alignment followed by an iterative phase retrieval method in cylindrical coordinates. The algorithm was tested with simulated diffraction patterns. We show that the full 3-D structure of trifluorotoluene, an asymmetric-top molecule, can be reconstructed with atomic resolution.

Diffraction from laser-aligned molecules has been proposed as a method for determining 3-D molecular structures in the gas phase. However, existing structural retrieval algorithms are limited by the imperfect alignment in experiments and the rotational averaging in 1-D alignment. Here, we demonstrate a two-step reconstruction comprising a genetic algorithm that corrects for the imperfect alignment followed by an iterative phase retrieval method in cylindrical coordinates. The algorithm was tested with simulated diffraction patterns. We show that the full 3-D structure of trifluorotoluene, an asymmetric-top molecule, can be reconstructed with atomic resolution. Most known molecular structures have been determined using X-ray diffraction from crystallized samples. For molecules that cannot be crystallized, structural determination in the gas phase becomes important. Gas electron diffraction (GED) has been a successful method for structure determination for molecules in the gas phase. However, it provides only 1-D information (interatomic distances), which limits the complexity of structures that can be retrieved. Laser alignment breaks the spherical symmetry of randomly oriented molecules in the gas phase, creating a possible solution for 3-D imaging without crystallization. 1,2 In 1-D alignment, a linearly polarized laser pulse is used to align the most polarizable axis of the molecule. For symmetric-top molecules, the molecule is free to spin with respect to this axis, while for asymmetric-top molecules, the laser affects the angular distribution about two molecular axes. 3 In 3-D alignment, either two laser pulses or an elliptically polarized laser pulse are used to align all three axes. 4,5 Most studies of structure determination from diffraction of aligned molecules have focused on 1-D alignment because it provides a higher degree of alignment. Existing reconstruction algorithms [6][7][8][9][10][11][12] are limited by several factors. First, they require an extremely high degree of alignment, which is not compatible with current diffraction experiments. In order to reach a high degree of alignment it is necessary to start with very cold molecules that can only be produced at densities that are several orders of magnitude below those needed for diffraction experiments. Second, each method is limited to a specific type of molecule. Third, only a 2-D projection of the 3-D structure can be reconstructed unless the symmetry of the molecule can be assumed to be known. Diffraction from a single molecule could in principle be used for 3-D imaging with atomic resolution; however, in general it is not possible to accumulate enough scattering events before the molecule is damaged (either with X-rays or electrons). X-ray free-electron lasers (X-FELs) could potentially be operated in single-molecule diffraction mode if sufficiently short and intense pulses become available. 13 Structure determination from diffraction of aligned molecules is currently an area of great interest. [6][7][8][9][10][11][12][14][15][16] Even for the case of perfect 1-D alignment, a general method for structural reconstruction has not been demonstrated. The difficulty lies in the rotational averaging over all azimuthal angles with respect to the alignment axis. 3 Iterative phase retrieval in cylindrical coordinates can be used to reconstruct an azimuthal projection of molecules with a high degree of rotational symmetry. 6,7 A holographic method has been applied to symmetric-top molecules in which one of the atoms has a scattering cross section that is much higher than the others. 8 An angular-correlation-function method [9][10][11][12] can be used to reconstruct the axial projection of an object 11,12 and the 3-D structure of objects of cylindrical symmetry 9,10 with 1-D perfect alignment. It has been recently shown that angular correlation functions are able to reconstruct the 3-D structure from randomly oriented objects with helical symmetry 14 or icosahedral symmetry, 17 provided that singleshot diffraction patterns from a single molecule or a few molecules are available. Recently, the holographic method, combined with a genetic algorithm, has been used to reconstruct an azimuthal projection of a symmetric-top molecule from experimental data. 2 In this Letter, we demonstrate a two-step method for retrieving the 3-D structure of trifluorotoluene (C 6 H 5 CF 3 ), an asymmetric-top molecule, using two diffraction patterns: one corresponding to partial 1-D alignment and the other to randomly oriented molecules. Perfect alignment corresponds to a delta-function-like angular distribution, while partially aligned molecules have a continuous angular distribution. The diffraction pattern of partially aligned molecules can be treated as a convolution of the diffraction pattern of perfectly aligned molecules with the angular distribution. The first step in our method is a deconvolution to retrieve the diffraction pattern corresponding to perfectly aligned molecules from the two input diffraction patterns, using a genetic algorithm. 2,18 The second step is to reconstruct the molecular structure from the deconvolved diffraction pattern. Briefly, an iterative phase retrieval algorithm in 3-D cylindrical space is used to reconstruct the full 3-D molecular structure. The genetic algorithm used in the first step is applicable for the case where a single axis of the molecule is aligned. For asymmetric-top molecules, 1-D alignment affects the angular distribution about two molecular axes. We were able to apply the genetic algorithm by selecting a time at which the angular distribution is narrow along the most polarizable axis and approximately random along the second-most polarizable axis.
The genetic algorithm converts the problem of achieving perfect alignment experimentally into a problem that is much easier to solve-achieving partial alignment with a known angular distribution. The genetic algorithm deconvolution requires the diffraction patterns and the angular distribution as inputs. The angular distribution can be measured experimentally or calculated if the moments of inertia and polarizability of the molecule are known. 16,19 If the angular distribution is not known, an estimated distribution can be used as a starting point, and then it is optimized by the genetic algorithm until the distribution that best matches the data is found. The result with lowest final error gives the best retrieval pattern. 19 Since most published reconstruction algorithms assume a pattern corresponding to perfect alignment to begin with, their functions are comparable to our second step, the iterative phase retrieval. Our iterative phase retrieval is ab initio-it does not require any structural information of the molecule, as opposed to standard GED and ultrafast electron diffraction (UED) methods were an initial model of the structure is needed. 20 This advantage will be particularly important for determining the structure of intermediate states in ultrafast reactions, for which it is difficult to obtain accurate theoretical models. In this algorithm, multiple cylindrical harmonics are used to resolve the symmetry of different groups in the molecule, and a 3-D atomicity constraint is applied to retrieve the full 3-D structure with atomic resolution. The atomicity constraint, first implemented in 1952, 21 takes advantage of the fact that the scattering arises from a limited number of point-like objects.
The temporal evolution of the angular distribution of the molecules was calculated in order to simulate the diffraction pattern corresponding to partial alignment. The calculation of the time-dependent rotational wave function for an asymmetric top molecule following an aligning laser pulse has been detailed in prior work. 16 Briefly, we solve the time-dependent Schrodinger equation (TDSE) for a rigid rotor interacting with a non-resonant laser pulse. The laser pulse excites only rotational transitions through the polarizability interactions from the initial ground vibronic state, and centrifugal distortion is not taken into account. A thermal distribution of initial asymmetric top states is used, and the lack of rotational symmetry of trifluorotoluene implies that each rotational state is equally weighted. Focal volume averaging is neglected.
The time-dependent Hamiltonian is where c is the field-free Hamiltonian for an asymmetric rigid rotor, a is the lab frame polarizability tensor of the molecule, and E(t) is the envelope of the laser pulse in the polarization direction, assumed to be Gaussian. The polarizabilities for trifluorotoluene are assumed to be same as that of toluene found in Ref. 22, and the rotational constants A, B, and C can be found in Ref. 23. Since the laboratory frame polarizability tensor contains both the Euler angles h and v, the wave function also depends on both angles and is calculated in the symmetric top basis. We follow the convention in Zare, 24 where h, v, and u are the Euler angles, denoting the polar angle, the azimuthal angle in the molecular frame, and the azimuthal angle in the laboratory frame, respectively. The Boltzmann-weighted squares of the propagated wave functions for each initial state are summed to give the angular distribution q as a function of time during and after the pulse where J 0 , s 0 , and M 0 are the asymmetric top angular momentum indices for the initial states in the thermal distribution, E J 0 ;s 0 ;M 0 is the field-free energy for a particular initial state, and W J 0 ;s 0 ;M 0 ðh; v; tÞ is the time-dependent wave function for the same initial state. k is Boltzmann's constant and T is the rotational temperature. The distribution function q is calculated numerically on a 20 Â 20 grid in h and v from the wave function.
The simulated laser pulse duration of 50 fs with a peak intensity of 15 TW/cm 2 and linear polarization lies in the regime of impulsive alignment. An initial rotational temperature of 0.1 K results in a peak alignment of hcos 2 hi ¼ 0.58, where "hi" indicates averaging over all molecules. While the simulations were performed for impulsive alignment, it is expected that with adiabatic alignment a comparable degree of alignment could be reached with initial temperatures higher than 10 K, although in this case the distribution along v might be less uniform. The temporal evolution of hcos 2 hi and hcos 2 vi is shown in Fig. 1(a). We simulate the diffraction patterns using the angular distribution at a time of 1.5 ps before the peak alignment, where hcos 2 hi ¼ 0.56 and hcos 2 vi ¼ 0.45, indicated by the vertical line in Fig. 1(a). hcos 2 hi ¼ 1 and hcos 2 vi ¼ 1 correspond to perfect alignment, while hcos 2 hi ¼ 1/3 and hcos 2 vi ¼ 1/2 correspond to a random distribution of the corresponding angles. This difference is because h is a polar angle and v is an azimuthal angle. The angular distribution qðh; vÞ at this time is shown in Fig. 1(b). At this time, the distribution approximates single-axis alignment, to which the genetic algorithm can be applied. Even though the angular distribution along v is not quite random, the reconstruction was successful. The diffraction patterns of C 6 H 5 CF 3 were simulated using a structure that was calculated using ab initio molecular orbital techniques. 25 The atomic scattering cross sections and scattering phases are listed in Ref. 26. Simulation of diffraction patterns from imperfectly aligned molecules is described in detail in Appendix A. In this manuscript, I random , I perf ect , and I partial are used to represent the normalized diffraction intensities corresponding to randomly oriented, perfectly single-axis-aligned and partially single-axis-aligned molecules, respectively. All patterns are normalized to the sum of the scattering intensity over all the atoms in the molecule. For the simulation of I perf ect , I partial , and I random , a large set of single-molecule diffraction patterns are calculated for each possible orientation (h, u, v) of the molecule, and the patterns are combined according to the angular distribution. The simulated I perf ect , I partial À I random , and I random are shown in Fig. 2. We use a diffraction-difference pattern I partial À I random that is commonly used in experiments to remove background noise. 20 The alignment axis is vertical in Fig. 2. The diffraction patterns in Figs. 2(a)-2(c) are displayed as a function of s, where s ¼ 4p k sin a 2 À Á is the momentum change of diffracted electrons, k is the electron wavelength, and a is the scattering angle. The diffraction pattern for partial alignment (Fig. 2(b)) shows significant blurring compared to the one from perfect alignment (Fig. 2(a)). We assume a 19 Å À1 s coverage in all patterns, which can be achieved experimentally with an electron beam with a kinetic energy of 100 keV and a detector with an angular aperture of 6.4 .
We have used a genetic algorithm that combines the patterns I partial and I random to retrieve a pattern I retrieved that approximates I perf ect . For single-axis alignment, if the angular distribution qðhÞ and I perf ect are known, the corresponding I partial can be obtained by convolving I perf ect with a point-spread function (PSF). 18 The PSF uses a known qðhÞ to map each pixel in I perf ect onto multiple pixels in I partial , which causes the blurring seen in I partial . It is impossible to find I perf ect from I partial via a simple deconvolution with the PSF, since the PSF here is not shift-invariant. The PSF mapping depends only on qðhÞ, i.e., it does not depend on the structure.
In the genetic algorithm, the simulated patterns I partial and I random with the corresponding PSFs are taken as the input. We start with a uniform pattern as the first guess of I retrieved . In each iteration, a small random change is made in I retrieved . The PSF mapping is then used to generate an I 0 partial and I 0 random from the I retrieved . The error is defined as the difference between the measured and the guessed patterns. Only changes that reduce the error are kept. The solution converges after about 10 5 iterations. The algorithm was run 20 times and the results were averaged to produce I retrieved . The retrieved scattering amplitude ffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi I retrieved p is shown in Fig. 3(a), and the simulated ffiffiffiffiffiffiffiffiffiffiffiffi I perf ect p is shown in Fig. 3(b) for comparison. Figure 3 illustrates that the main features in I perf ect , such as the brightness and position of the local maxima, are retrieved accurately.
We have tested the genetic algorithm with different levels of shot noise. The same pattern is retrieved if there are at least 40 scattering events per pixel in a 70 Â 70 pixels diffraction pattern shown in Figure 3. For example, using an electron source with an energy of 100 keV, the minimum number of scattered electrons in the diffraction pattern is 4 Â 10 7 . For a gas jet with a molecular density of 10 15 cm À3 and a diameter of 200 lm, 10 10 incident electrons are required. For an experiment operating at a repetition rate of 10 kHz and 2 Â 10 3 electrons per pulse this means a total acquisition time of 500 s. The quality of the retrieved pattern starts to degrade for lower signal levels.
Once I retrieved was found, an iterative phase retrieval algorithm was used to determine the structure. For single-axis alignment, it is simpler to use cylindrical coordinates. For a single where G m ðR; fÞ is the m th -order expansion in cylindrical harmonics. 6,7 For single-axis-aligned molecules, the rotational averaging removes the dependence of the diffracted intensity IðR; fÞ on the azimuthal angle w For a given structure, G m ðR; fÞ can be calculated by where f k is the form factor of the k th atom, ðr k ; u k ; z k Þ gives the coordinates of the k th atom in cylindrical coordinates, and J m is the Bessel function of order m. 6,7 The reciprocal and real space expansion functions are connected by Fourier-Hankel transforms where ðr; u; zÞ are the coordinates in the molecular frame and g m ðr; zÞ is the m th -order expansion of the object in cylindrical harmonics. We selected a square area in I retrieved to compute the fast Fourier-Hankel transform. The selected area corresponds to 13.5 Å À1 Â 13.5 Å À1 in reciprocal space, which results in a 0.23 Å Â 0.23 Å pixel size in real space. A central circle with a radius of s ¼ 0.6 Å À1 is zeroed to simulate experimental conditions. The full 3-D object in real space f ðr; u; zÞ can be reconstructed by summing over all the g m f ðr; u; zÞ ¼ X m g m ðr; zÞ expðimuÞ: Each g m can be obtained from the full 3-D object f ðr; u; zÞ Each g m contains different information about the structure. g 0 is the projection of the molecule on the z-r plane. For m 6 ¼ 0, each g m contains the part of the object with m-fold rotational symmetry. Trifluorotoluene contains a benzene ring (2-fold rotational symmetry) and a trifluoromethyl group (3-fold rotational symmetry). The m ¼ 62, 4, 6,. orders contain the structure of the benzene ring, while the m ¼ 63, 6, 9,… orders contain the structure of CF 3 . Other orders, such as m ¼ 61, 5, 7, are zero. In a conventional phase retrieval algorithm, the amplitude is known and only the phase needs to be retrieved. Here, only the total diffracted intensity IðR; fÞ is known, and the amplitude and phase of each G m must be retrieved. We include all orders up to m ¼ 63 in the algorithm. In order to construct the full 3-D structure, at least the m ¼ 0, 62, 63 orders need to be included. We also included the m ¼ 61 orders, which are zero for trifluorotoluene, to ensure that the algorithm is run without any assumptions on the symmetry of the molecule.
The steps in the retrieval algorithm are similar to those found in Ref. 7. For the initial guess, we distributed the pattern uniformly over all m orders, jG m ðR; fÞj ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi IðR; fÞ=7 p , and associated each pattern with a random phase. In each iteration of the loop, an inverse Fourier-Hankel transform is applied on each G m to calculate g m using Eq. (7), and a 3-D object f ðr; u; zÞ is generated by combining all g m using Eq. (8). The real space constraints (described below) are applied on f ðr; u; zÞ, and then a new set of g m are computed using Eq. (9). Finally, the G 0 m are computed by Fourier-Hankel transform of g m (Eq. (6)), and a Fourier space constraint is applied on G 0 m to calculate G m . The flowchart of the algorithm is shown in Appendix B.
The Fourier space constraint is The algorithm is run for 1600 iterations, separated in 20 cycles with 30 iterations of the Hybrid Input-Output algorithm (HIO) 27 where f 0 n is the 3-D object in the n th iteration obtained using Eq. (8). We have used three real space constraints: a loose support, positivity in the real part of f, and atomicity. It has been shown that using the support and positivity constraints, the azimuthal projection of the object g 0 can be retrieved. 7 We have introduced the additional constraint of atomicity, which has allowed us to reconstruct all 7 g m . We used a fixed loose support of 30 Â 9 (z Â r) pixels. The size of the object (excluding the hydrogen atoms) in g 0 is 25 Â 6 (z Â r) pixels. The hydrogen atoms are not expected to be reconstructed because their scattering amplitude is much lower than that of carbon and fluorine. The third dimension u is set to be 20 pixels deep. The second constraint, positivity of the real part of the object, was verified numerically using the known atomic scattering phases. 26 The atomicity constraint is applied by converting the object from cylindrical coordinates f ðr; u; zÞ to Cartesian coordinates f ðx; y; zÞ. The local maxima are considered to be the atoms, i.e., a pixel is identified as an atom if its value is larger than the value of all 6 adjacent pixels and above a threshold that is set at 7% of the brightest pixel. A 3 Â 3 Â 3 voxel area centered at the identified atom is considered to be satisfying the atomicity constraint. Finally, the selected voxels are converted back to cylindrical coordinates. The atomicity constraint is updated every 20 iterations. To make the convergence faster, we apply a 2-D atomicity constraint in the last 10 out of every 100 iterations. The 2-D atomicity is a local-maxima search in g 0 , the projection of molecule in the z-r plane. In 2-D atomicity, a pixel that is brighter than all 4 adjacent pixels is considered a local-maximum. The brightest 15 local maxima are identified as 2-D atoms. For each 2-D atom, an adjacent 3 Â 3 pixels area is considered as satisfying the 2-D atomicity. The 2-D atomicity applies uniformly on the third dimension u. More 2-D atoms are kept than the actual number of 2-D atoms in g 0 , which is six. Note that an off-axis 2-D atom might represent one or more atoms in 3-D, depending on the rotational symmetry. Applying the atomicity constraint requires the atoms to have a separation of at least 2 pixels. In our case, the minimum bond length excluding hydrogen atoms is 1.32 Å , which corresponds to more than 5 pixels.
We define an error in real space E Real that is used for monitoring the convergence of the algorithm where S, P, and A represent the loose support, positivity, and atomicity constraints, respectively. P ðr;u;zÞ3X jf ðr; u; zÞj is the sum of the absolute value of f over all pixels that satisfy the constraint X. The g m corresponding to the minimum of E Real is returned as the reconstruction result. The algorithm was run 20 times, and the 2-D projection of the structure g 0 was reconstructed successfully 15 out of 20 times. Out of the 15 successful 2-D reconstructions, 9 exhibit the same rotational symmetries, and those are considered successful 3-D reconstructions. The average of the 9 successful reconstructions is taken as the final reconstruction result.
Figures 4(a)-4(d) show the absolute value of the reconstructed g m for m ¼ 0, 1, 2, and 3, respectively. For g m6 ¼0 , the discrete Hankel transform limits the reconstruction to r > 0. 30 Each off-axis 2-D atom in g 0 holds a certain rotational symmetry that can be retrieved from the g m6 ¼0 . For example, at the location of the upper off-axis atom, g 3 is brighter than g 1 and g 2 , indicating a 3-fold rotational symmetry. At the location of the two lower off-axis atoms g 2 is brighter than g 1 and g 3 , indicating a two-fold rotational symmetry. In the 3-D reconstruction, for the on-axis atoms, only g 0 is used, and, for each off-axis atom, g 0 and the g m6 ¼0 that is brightest at that position are used. Only an area of 3 Â 3 pixels is kept around the atom for the g m6 ¼0 . Figure 5 shows the retrieved 3-D object, along with the ball-and-stick model of the molecule shown in inset. The relative orientation between the benzene ring and the trifluoromethyl group depends on the relative phase between g 2 and g 3 . It was not possible to accurately retrieve this relative orientation, most likely because the diffraction pattern is not very sensitive to this angle. Once the angle is fixed, the position of each atom deviates by less than 0.1 Å from the theoretical values. The structure is clearly atomized, and both the benzene ring and the trifluoromethyl group can be easily identified. For this reconstruction, including rotational symmetries up to m ¼ 63 was sufficient. More complex molecules might require more orders to be included and, thus, more information to be retrieved from a single diffraction pattern. Extending the method to more complex molecules might require providing additional information, for example, by increasing the coverage of the diffraction pattern or using inputs from theoretical models of the structure.
In summary, we have demonstrated a two-step method for retrieving the full 3-D molecular structure from diffraction patterns of partially aligned asymmetric top molecules. A genetic algorithm was used to retrieve a diffraction pattern corresponding to perfect alignment from patterns corresponding to partially aligned molecules. Then an iterative phase retrieval algorithm in cylindrical coordinates was used to reconstruct the full 3-D molecular structure from the retrieved diffraction pattern. We have successfully tested this algorithm on a small asymmetric-top molecule with simulated diffraction patterns. The data were simulated using a 50 fs alignment pulse with 15 TW/cm 2 peak intensity and molecules with a 0.1 K initial rotational temperatures. We would expect the algorithm work for shot noise level of no less than 40 scattering events per pixel, on a 70 Â 70 pixel detector. Our method has the advantage that it does not require perfect alignment, which makes it compatible with diffraction experiments. In addition, our iterative phase retrieval is ab initio-it does not require any structural information of the molecule, this will be particularly important for determining the structure of intermediate states in ultrafast reactions. Moreover, the full 3-D structure can be reconstructed.
The work at UNL was supported by Chemical Sciences, Geosciences, and Biosciences Division, Office of Basic Energy Sciences, Office of Science, U.S. Department of Energy under    Figure 7 shows the flowchart of the iterative phase retrieval algorithm.