Efficient electronic structure calculation for molecular ionization dynamics at high x-ray intensity

We present the implementation of an electronic-structure approach dedicated to ionization dynamics of molecules interacting with x-ray free-electron laser (XFEL) pulses. In our scheme, molecular orbitals for molecular core-hole states are represented by linear combination of numerical atomic orbitals that are solutions of corresponding atomic core-hole states. We demonstrate that our scheme efficiently calculates all possible multiple-hole configurations of molecules formed during XFEL pulses. The present method is suitable to investigate x-ray multiphoton multiple ionization dynamics and accompanying nuclear dynamics, providing essential information on the chemical dynamics relevant for high-intensity x-ray imaging.

We present the implementation of an electronic-structure approach dedicated to ionization dynamics of molecules interacting with x-ray free-electron laser (XFEL) pulses. In our scheme, molecular orbitals for molecular core-hole states are represented by linear combination of numerical atomic orbitals that are solutions of corresponding atomic core-hole states. We demonstrate that our scheme efficiently calculates all possible multiple-hole configurations of molecules formed during XFEL pulses. The present method is suitable to investigate x-ray multiphoton multiple ionization dynamics and accompanying nuclear dynamics, providing essential information on the chemical dynamics relevant for high-intensity x-ray imaging. The advent of x-ray free-electron lasers (XFELs) 1,2 opens up a new era in science and technology. [3][4][5] Unprecedentedly ultraintense and ultrafast hard x-ray pulses generated from XFELs enable us to measure molecular structures on the atomic scale and to explore the structural dynamics on the femtosecond scale. One of the most prominent XFEL applications is imaging of biological macromolecules. X-ray crystallography with XFELs, after demonstration of the proof-of-principle, 6 has started to reveal previously unknown protein structure, 7 promising a breakthrough in structural biology (see reviews in Refs. [8][9][10][11]. Recent advances in time-resolved serial femtosecond crystallography [12][13][14][15][16] enable us to take a step forward towards molecular movies. To investigate molecular structure and structural dynamics with XFELs, one needs to understand radiation damage dynamics-x-ray multiphoton ionization dynamics and accompanying nuclear dynamics. Furthermore, the phase problem 17 is the bottleneck to reconstruct molecular structures in serial femtosecond crystallography as much as in conventional x-ray crystallography. To overcome the phase problem for x-ray crystallography with XFELs, one uses conventional phasing technique at intermediate x-ray intensity, 18 or one takes an advantage of the large degree of ionization at high x-ray intensity. The latter brings in highintensity phasing (HIP) methods, 19 including high-intensity multiwavelength anomalous a) haoyj@ustb.edu.cn b) ludger.inhester@cfel.de c) kota.hanasaki@cfel.de d) sangkil.son@cfel.de e) robin.santra@cfel.de 2329-7778/2015/2(4)/041707/17 V C Author(s) 2015 2, 041707-1 diffraction 20,21 and high-intensity radiation damage induced phasing. 22 The HIP techniques require detailed description of deep-inner-shell decay dynamics of heavy atoms influenced by the molecular environment. Therefore, understanding of radiation damage dynamics is the key for successful molecular imaging.
Modeling of biological macromolecules exposed to XFEL radiation was pioneered by the seminal work of Neutze et al. 23 Since then, there have been several computational tools to address molecular imaging problems, for example, classical molecular dynamics model, 24,25 particle-in-cell approach, 26,27 transport model, 28,29 and Coulomb complex model. 30,31 Some of these methods have been recently applied to ion fragment spectra 32 and electron spectra 33 from C 60 molecules interacting with intense x-ray pulses. So far, most computational methods have been based on the independent-atom model or the plasma model. The description of the molecular environment is omitted by assumption or incorporated in an ad hoc manner. When a molecule absorbs x-ray photons, inner-shell multiple ionization induces fragmentation dynamics. 34,35 Chemical bonds are weakened, and electrons and holes rearrange before the molecule breaks apart. [36][37][38] Detailed electronic structure calculations for chemical bonding and charge rearrangement are thus crucial to describe molecular effects in modeling of the XFEL-matter interaction.
The electronic response of atoms and molecules to XFEL pulses is in essence characterized by multiphoton multiple ionization dynamics. [39][40][41] The XATOM toolkit 42,43 has been developed to simulate the XFEL-atom dynamical interaction, and the ionization dynamics model has been tested with a series of experiments. [44][45][46][47][48] The unprecedentedly large number of x-ray photons delivered by an XFEL pulse induces sequential x-ray absorptions, creating a variety of different q-hole configurations for each charge state þq. To simulate ionization dynamics, one needs to calculate photoionization cross section, Auger rate, and fluorescence rate for each configuration and solve a set of coupled rate equations for the time-dependent populations of the configurations. 39,49,50 The q-hole electronic configurations are energetically highly excited, and theoretical treatment of such highly-excited states is challenging. Another challenge is the complexity of the ionization dynamics. Even for the atomic case, one must solve more than 2 Â 10 7 coupled rate equations for Xe L-shell-initiated ionization dynamics. 46 To address this formidable problem, a Monte-Carlo approach has been proposed for solving the rate equations 43,45 and sampling the most probable configurations. 46 In this scheme, the electronic structure is calculated for every single configuration selected on the fly as part of the Monte Carlo sampling. For the molecular case, the complexity increases even further because of the degrees of freedom associated with atomic motions, so the Monte Carlo approach seems to be the only way to simulate molecular response to an intense XFEL pulse. However, the most important question still remains: How to describe the electronic structure of molecules on the fly for the Monte Carlo method?
Here, we present a new efficient method for electronic structure calculation of polyatomic molecules and implement a dedicated toolkit, XMOLECULE. The proposed method is well suited for calculations of molecular multiple-hole configurations that are formed during x-ray multiphoton ionization dynamics. To efficiently describe molecular orbitals of core-hole configurations, the method employs atomic orbitals as basis functions that are numerical solutions of atomic core-hole states, calculated by XATOM. 42 For any given molecular electronic configuration and any given molecular geometry, XMOLECULE calculates molecular orbitals and orbital energies, which are essential components for dynamical simulations of x-ray multiphoton multiple ionization. We demonstrate that XMOLECULE is capable to calculate the whole spectrum of multiple-hole configurations at a given molecular geometry and potential energy surfaces for given multiple-hole configurations of molecules. Also, performance scalability with the system size is discussed. In this paper, we focus on the implementation of a molecular electronicstructure approach. Calculating cross sections and rates and solving coupled rate equations to simulate ionization dynamics will be described elsewhere. Having achieved these results, XMOLECULE aims to play a key role in molecular imaging at high x-ray intensity.
The paper is organized as follows. Section II formulates our scheme to calculate molecular multiple-hole configurations. It includes theoretical and computational schemes for basis function generation with numerical atomic orbitals, multicenter integration on a molecular grid, and direct Coulomb integral evaluation. In Sec. III, we show benchmark calculations for XMOLECULE, and then numerical results for the potential energy curves of various electronic configurations of carbon monoxide, and single-and double-core ionization potentials of several polyatomic molecules. We discuss the scalability of our scheme to a molecular size of hundreds of atoms. This is followed by the conclusion in Sec. IV.

A. The Hartree-Fock-Slater method
We consider a molecular system composed of N atom atoms with N elec electrons. The Ath nuclear charge and coordinates are denoted by Z A and R A , respectively. The molecular charge state þq is given by q ¼ P A Z A À N elec . We use the Hartree-Fock-Slater (HFS) method in which molecular orbitals (MO), w i ðrÞ, and orbital energies, e i , are obtained by solving the effective single-electron Schrödinger equation (atomic units are used unless specified otherwise), where V ext ðrÞ is the external potential due to the nuclei, and the Hartree potential V H ðrÞ represents the classical Coulomb interaction among the electrons, and the last term V X ðrÞ represents the exchange interaction, which is approximated by the Slater exchange potential, 51 The electronic density qðrÞ is obtained by the sum of squared MO's weighted by the occupation numbers fn i g as where n i 2 f0; 1; 2g. In contrast to conventional ground-state electronic structure calculations, in which the N elec spin-orbitals with the lowest energies are filled, we consider all possible fn i g subject to P i n i ¼ N elec , in order to take account of electronic excited states representing q-hole configurations.
The total energy within the HFS method is given by the sum of the nucleus-nucleus repulsion energy and the electronic energy,

B. Linear combination of numerical atomic orbitals
For atomic systems, the orbital is represented with spherical harmonics as where n, l, and m are the principal quantum number, the orbital angular momentum quantum number, and the associated projection quantum number, respectively. The radial wavefunction u nl ðrÞ can be solved by a numerical grid-based method. The XATOM toolkit 42 has been developed to solve the atomic HFS equation. By employing the generalized pseudospectral (GPS) method 52,53 and imposing a spherically symmetric potential, XATOM accurately calculates u nl ðrÞ for a given (n, l)-subshell, and accordingly / l ðrÞ for a given l ðn; l; mÞ. This numerical atomic orbital has been used to successfully calculate multiple-hole configurations formed during x-ray multiphoton ionization dynamics in the atomic case. 41,42 For molecular systems, we employ the linear combination of atomic orbitals (LCAO) scheme to construct molecular orbitals, where / l ðrÞ is the lth atomic orbital (AO) and C li is the coefficient of the lth AO for the ith MO. Using Eq. (8) transforms the self consistent field (SCF) Eq. (1) into the corresponding Roothaan-Hall equation, 54 HC ¼ SCE; where E is a diagonal matrix of MO energies and C is the MO coefficient matrix. The elements of the Hamiltonian matrix H and the overlap matrix S are given as where the effective potential V eff ðrÞ V ext ðrÞ þ V H ðrÞ þ V X ðrÞ. Equation (9) is solved in a selfconsistent manner. To accelerate convergency, we employ the direct inversion in the iterative subspace (DIIS) method. 55,56 When we encounter convergence problems at large bond distances, where the energy gap between the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO) is very small, we apply level shifts 57 in the SCF iterations.
Here, our choice of basis set for the LCAO scheme is the numerical atomic orbitals (NAOs) obtained by XATOM described above. In Fig. 1, we plot the squared radial function  ju nl ðrÞj 2 for the 1s, 2s, and 2p orbitals of the ground state of the neutral nitrogen (N) atom, the single-core-hole (SCH) state of N þ , and the double-core-hole (DCH) state of N 2þ , respectively. Comparison among different core-hole states shows significant deformation of valence orbitals in states with core holes. To cover these effects efficiently in the molecular calculation, we use NAOs that are numerical solutions of the corresponding atomic core-hole states. For instance, N 2þ 2 with one core hole at each atomic site (a DCH state) is calculated with basis functions optimized for N þ ð1s À1 Þ on both N atoms, whereas N 2þ 2 with a single-site DCH state is calculated with basis functions optimized for N 2þ ð1s À2 Þ on which the core hole is located and basis functions optimized for neutral N on the other side. In this way, we expect core-hole MOs are well described by core-hole-adapted NAOs.
To achieve utmost efficiency towards complex ionization dynamics, we employ the minimal basis set. Each AO with (n, l, m) in Eq. (7) corresponds to a single basis function. Fully or partially occupied (n, l)-subshells contribute to a set of basis functions and each l gives ð2l þ 1Þ basis functions (jmj l). For example, the N atom has 1s, 2s, and 2p (partially) occupied subshells, which constitute 5 basis functions (/ 1s ; / 2s ; / 2p x ; / 2p y , and / 2p z ) in total. This basis set is denoted as [2s1p]. According to the minimal-basis-set scheme, the chemical elements from B to Ne have the same number of basis functions (N basis ¼ 5). In Sec. III A, we will discuss limitations and extensions of the minimal-basis-set scheme.

C. Molecular grid and multicenter integration
Equations (10) and (11) require evaluation of the corresponding integrals in three dimensions. In our case, the / l ðrÞ and / ðrÞ are represented with a radial grid and spherical harmonics. To perform 3D integrals involving many atomic centers, we employ the multicenter integration proposed by Becke. 58 Molecular grid points are constructed as a combination of sets of atomic grid points. Each set of atomic grid points, centered at one of the nuclei, consists of N r radial grid points and N ang angular grid points. The radial grid points are exactly the same as those used for NAO calculations with the GPS method. 52,53 The angular grid points are obtained by the Lebedev grid scheme 59 with an angular momentum cutoff at l max . The number of angular grid points is approximately given by N ang % 4ðl max þ 1Þ 2 =3. A detailed description of constructing multicenter molecular grid points is found in Refs. 60 and 61. We use an atomic radial grid size (r max ) large enough ($10 Å ) so that the atomic grids of many neighboring atoms overlap with each other. In principle, different atomic grid parameters can be used for individual atoms in a molecule. For convenience, however, we use the same grid parameters for all atoms. Then, the total number of molecular grid points is given by Becke's multicenter integration scheme 58 introduces a set of smooth nuclear weight functions fw A ðrÞg, subject to the constraint P A w A ðrÞ ¼ 1. The nuclear weight functions are generated by the third-order polynomial cutoff profile in the fuzzy cell scheme. 58 Then, any integral of a given function f can be evaluated by the sum of individual atomic integrals, where r A r À R A . Each atomic integral can be readily performed using the spherical coordinate system of r A , centered at the Ath atom, where k is the index of the grid points of the Ath atom and w k is defined as a product of the radial Legendre-Gauss-Lobatto quadrature weights 62,63 and the angular Lebedev quadrature weights. 59

D. Implementation of direct Coulomb integrals
In electronic structure calculations, one of the most time-consuming parts is the evaluation of electron repulsion integrals. In order to achieve fast calculation within a desired accuracy, we have developed a multipole expansion scheme with an adaptive cut off. First, the integral involved in the Hartree potential in Eq. (3) can be decomposed into individual atomic integrals, where q A ðrÞ qðrÞw A ðrÞ. Each single-center density q A ðrÞ can then be regarded as the atomic contribution to the total electronic density. To implement the integral, we expand the singlecenter density with real spherical harmonics Y lm ðh; uÞ as where q A lm ðrÞ is the (l, m)-component of the spherical expansion, With this single-center decomposition and spherical harmonic expansion of the electronic density, q A lm ðrÞ, the Hartree potential in Eq. (3) is obtained as where V A lm is given by where r < ¼ minðr 0 A ; r A Þ and r > ¼ maxðr 0 A ; r A Þ. This radial integral is numerically evaluated in combination with various truncation methods (see the Appendix).

E. Molecular electronic configuration
Keeping the energetically lowest orbitals doubly occupied, the SCF procedure obtains the HFS solution for the electronic ground state. In order to obtain a solution for an excited electronic state of a q-hole configuration, each molecular orbital has to be assigned a specific occupation number. This can be done, as in the ground state calculation, by identifying the orbitals by their HFS energy eigenvalue. However, during the SCF iterations, the energetic order of MOs may change. Thus, identifying the orbitals by ordering them according to their HFS energy eigenvalue may lead to failure of the above SCF procedure or yield a solution for a different electronic state than required. This is called variational collapse. [64][65][66] To prevent this situation, we employ a variant of the maximum overlap method. 66 In the maximum overlap method, the desired excited electronic state is specified by a set of initial guess orbitals fw guess j g in combination with a set of occupation numbers fn j g. In each SCF iteration, the occupation number n i of the calculated orbital w i is chosen according to its projection onto the subspace spanned by the guess orbitals with respective occupation number. Specifically, we calculate the overlap of the ith current MO with the jth guess MO, whereS l ¼ Ð d 3 r / l ðrÞ/ guess ðrÞ. Note that the basis set for the initial guess orbitals is not necessarily the same as the one used for the expansion of the actual molecular orbitals, because different NAOs can be used for different q-hole configurations. Therefore,S l can be different from the overlap matrix S l defined in Eq. (11). Then, the projections of the ith orbital onto the span of the guess orbitals for the unoccupied (n ¼ 0), singly occupied (n ¼ 1), and doubly occupied (n ¼ 2) cases are given by where j runs over all initial guess orbitals whose occupation number n j equals n. To preserve the character of the required electronic configuration during the SCF procedure, we choose the set of the occupation numbers of the current orbitals, fn i g, such that P i P ðn i Þ i is maximized, while the total number of doubly and singly occupied orbitals is maintained.
This procedure to determine the orbital occupation critically depends on the initial guess MOs. Thus, it is essential that the provided guess MOs fw j g together with the provided occupation numbers fn j g describe a wavefunction that is close to the required solution. For the calculations performed here, we choose initial guess MOs obtained from a previous calculation for a lower ionized electronic state or for the same electronic state with an altered molecule geometry. For the single-core-hole state in N 2 , we obtain a localized core hole on a specific nucleus by performing a Boys-orbital-localization procedure 67 of the two guess core orbitals. Having obtained a converged solution, we verify that the obtained set of MOs is indeed close to the initial guess, by inspecting the individual overlap O ij .

A. Benchmark calculations
We first estimate the accuracy of our calculations using the numerical multicenter integration in comparison with conventional calculations using the analytic Gaussian integration by GAMESS. 68 Here, we employ the 6-31G Gaussian basis set 69 to calculate the SCF-level groundstate energy of a water molecule. The internuclear distance of RðOHÞ ¼ 0:957 Å and the bond angle of ffðHOHÞ ¼ 104:48 are used. Only in this test, we employ the restricted Hartree-Fock (RHF) method instead of the HFS method, in order to directly compare with the GAMESS results. Figure 2 shows that our numerical calculations converge to the GAMESS results as the number of radial grid points per atom (N r ) and the number of angular grid points per atom (determined by l max ) are increased. The total number of molecular grid points for N r ¼ 50 and l max ¼ 8 is 3 Â 50 Â 110 ¼ 16500. The maximum radius r max ¼ 20 a.u., and the GPS mapping parameter 52,53 L ¼ 1 a.u. is used. Note that all grid parameters utilized here provide a numerical accuracy jDEj < 1:5 eV. If chemical accuracy is required (typically 1 kcal/mol % 0.04 eV), our study for the water molecule shows that it is achievable with N r ! 200 and l max ! 11, keeping the same L and r max . As to be shown in Sec. III B, the energy scale of x-ray-induced dynamics of highly-charged molecules will extend into the keV regime. Therefore, the worst grid parameters (for example, N r ¼ 30 and l max ¼ 4) shown in Fig. 2 would be sufficient to describe the molecular ionization dynamics at high x-ray intensity.
We next examine the performance of our NAO basis set scheme. In Fig. 3, we show the calculated HFS energy of (a) the ground state of neutral N 2 molecule with NAOs optimized for neutral N atom and (b) the quadruple-core-hole (QCH) state of N 4þ 2 ion with NAOs optimized for the DCH state of N 2þ . The internuclear distance R ¼ 1.096 Å is fixed. N r ¼ 200, L ¼ 1 a.u., r max ¼ 20 a.u., and l max ¼ 11 are used. The results are shown together with those obtained by the equivalent calculations using conventional Gaussian-type-orbital (GTO) basis sets of different sizes (STO-3G 70 and a series of Dunning's correlation-consistent basis sets; 71 All GTO basis sets are obtained from the EMSL Basis Set Library 72 ). DE is the energy difference from the total energy calculated with the uncontracted version of cc-pV6Z, [16s10p5d4f3g2h1i] with 161 basis functions, which is considered here the complete basis set limit. Thus, DE indicates the numerical error due to lack of basis functions.
In both Figs. 3(a) and 3(b), one can see that the minimal NAO basis set is superior to the conventional minimal basis set of STO-3G, illustrating that fully optimized NAOs are a practical choice for the basis set in the LCAO scheme. Also, Fig. 3 shows convergency of GTOs with respect to the number of basis functions. Interestingly, the conventional GTOs for QCH N 4þ 2 perform almost one order of magnitude less accurate than GTOs for neutral N 2 . The reason is that GTOs are optimized to be used for neutral ground-state calculations. In contrast, NAOs optimized for corresponding atomic q-hole configuration provide similar accuracy for both neutral N 2 and QCH N 4þ 2 . Thus, NAO functions provide an ideal basis set for our minimal-basisset HFS scheme.
To improve accuracy, we try to increase the number of NAOs in a systematic manner by including unoccupied atomic orbitals with higher (n, l) such as 3s, 3p, and so on. As shown in Fig. 3, the NAOs are somewhat inefficient to achieve higher accuracy by simply extending to higher (n, l), as previously reported in Ref. 73. This is attributed to the fact that additional series of higher (n, l)-orbitals, whose mean square radius is far from the atomic center, are inefficient for representing bonding molecular orbitals. Instead, we propose a scheme for adding compact p-type and d-type functions to the minimal NAO basis set in order to improve the  description of chemical bonding. Additional functions are constructed by use of radial wavefunctions of occupied subshells multiplied by r, where r is the radial coordinate in the atomic system. For the chemical elements from B to Ne, the p-type functions are u 2s ðrÞY 1m ðh; uÞ, where m ¼ 0; 61, and the d-type functions are u 2p ðrÞY 2m ðh; uÞ, where m ¼ 0; 61; 62. By adding these functions, as denoted by extended NAO (NAO[e]) and as marked with the black rectangle in Fig. 3, the accuracy is much improved; the total energy of neutral N 2 is close to the cc-pVDZ level, and the total energy of QCH N 4þ 2 is close to the cc-pVQZ level. The number of basis functions for NAO[e] is only 13 per atom, whereas cc-pVQZ has 55 basis functions. There have been several approaches for extension of the minimal NAO basis set, 73,74 where additional basis functions are constructed in a systematic way. Figure 4 shows the HFS total energies in Eq. (6) using core-hole-adapted NAO basis functions for all possible q-hole configurations that can be accessed by x-ray multiphoton ionization of the neutral carbon monoxide molecule. The internuclear distance R ¼ 1.128 Å is fixed, and the grid parameters of N r ¼ 50, L ¼ 1 a.u., r max ¼ 20 a.u., and l max ¼ 8 are used. For convenience, the figure shows these configurations grouped into charge states. The lowest horizontal line for each charge state indicates the ground-state energy for a given charge þq. This figure then illustrates how much energetically excited the q-hole configurations are. For example, the energy of DCH CO 2þ (O1s À2 ) is about 1 keV higher than the ground-state energy of CO 2þ . Ionization dynamics induced by intense x-ray pulses may occur step by step, visiting lots of these electronic states. Therefore, it is crucial to efficiently calculate this set of electronic states of q-hole configurations.

B. Potential energy curves for various hole configurations
We further investigate the behavior of the potential energy curves (PEC) obtained using the NAO basis set. In Fig. 5 converged. 75,76 Our NAO[e] scheme reproduces well the cc-pVTZ results, even though the size of NAO[e] (N basis ¼ 13) is much smaller than that of cc-pVTZ (N basis ¼ 30). For comparison, we also plot PECs with NAOs optimized for neutral ground-state atoms, denoted by NAO[n], which shows a trend similar to what a conventional STO-3G minimal basis set would be. The NAO[n] results represent a poor estimate of PECs due to missing the core-hole effect on orbitals. On the other hand, PECs from NAOs, which are optimized for atomic core-hole states, show dramatic improvement over NAO[n], even though NAO and NAO[n] have the same number of basis functions (N basis ¼ 5).

C. Single-and double-core ionization potentials of molecules
To further test the accuracy of our calculation scheme, we compare core ionization potentials for a series of small molecules obtained from the HFS calculation. The molecular geometries are taken from Ref. 77 and the grid parameters are the same as those used in Sec. III B. We derive the single-core ionization potential from the HFS orbital energy of a neutral groundstate calculation using NAOs with and without additional basis functions. The double-core ionization potential is calculated as the sum of the first and the second core ionization potential, where the second core ionization potential is taken from the orbital energy of the SCH state calculation. For the DCH states with core holes on different nuclear sites, thus, two values are obtained for the two different ionization sequences. Table I lists the ionization potentials compared with the values obtained from completeactive-space SCF (CASSCF) calculations 75 and experimental values. [78][79][80][81][82][83] For our calculations of two-site DCH states, a mean value and a deviation are listed for two values from the different ionization sequences. For the CASSCF results of two-site DCH states, only triplet spin states are listed and the difference between singlet and triplet states is smaller than 0.7 eV. Note that N 2 O is a linear molecule N t -N c -O, where N t indicates the terminal N atom and N c means N at the center. As can be seen, the CASSCF results 75 show agreement within less than 4 eV with the available experimental values. The single ionization potentials we extract from the much simpler HFS calculation using the minimal NAO basis set show for all molecules a similar agreement within 5.1 eV, except F1s À1 in LiF (28 eV). For the DCH states, where the core holes are located on different nuclear sites, with the minimal NAO basis set, we also see a similar agreement within 7 eV to the CASSCF values and, where available, the experimental values. Again, LiF is an exception showing a much larger discrepancy of ' 30 eV. For the DCH state with core holes on the same nucleus, we find a systematically larger disagreement of about 20-30 eV (for F1s À2 in LiF 78 eV).
The inclusion of the p-type and d-type functions in the basis set leads in most cases to a larger deviation to the literature values than the results obtained with the minimal basis set. For these calculations, we get ionization potentials that tend to be lower than the literature values (from 3.4 eV for Li1s À1 to 60.5 eV for F1s À2 in LiF). Clearly, the extended NAO basis set should improve the quality of the electronic structure model, as the electronic wave function FIG. 5. Potential energy curves of CO 2þ double-core-hole states as a function of the internuclear distance R. Energy is given relative to the ground-state energy of neutral CO. has more flexibility. Thus, we conclude that the good agreement with the minimal basis set might be an artifact due to cancellation of errors.
For the results obtained with the larger basis set, we attribute the remaining deviations to the literature values mainly to relaxation energy contributions associated with the core hole electron removal. The applied scheme of taking orbital energies as ionization potentials cannot account for these effects. For core holes on the same nuclear site, where the core hole relaxation contributions are particularly strong, we see the strongest deviations (18.2-60.5 eV). Also, the extreme deviations for LiF may be explained from these contributions: The core hole on the F atom in LiF shows a particular large core hole relaxation effect, whereas for the core hole on Li it is very small. 75

D. Performance scaling
Our implementation of XMOLECULE aims for large-scale molecular calculations, especially for a large number of repeated calculations where time and resources available for each calculation are severely limited. At the same time, it requires the capability of calculating moderatesize systems in order to describe molecular-environment effects. Here, we demonstrate the performance scalability of our scheme toward molecular calculations with a few hundred atoms. Our grid-based method has the potential to achieve linear scaling in the number of atoms. [85][86][87][88] In the HFS method, the two-body interaction is divided into the exchange interaction and the direct Coulomb interaction. The former is replaced with the local density approximation, and the latter is treated with the Hartree potential as described in Sec. II D. The computational complexity of the Hartree potential is OðN 2 grid Þ, where N grid is linearly proportional to N atom , because the potential V H ðrÞ in Eq. (3) contains the integral over molecular grid points and has to be evaluated at every single molecular grid point. By introducing the truncation methods described in the Appendix, this complexity can be reduced to OðN grid N atom Þ. These truncation schemes do not change the quadratic scaling behavior with respect to N atom , but reduce the actual computational time by several times (for example, a factor of two in our following calculations).
Another truncation can be made in the evaluation of one-body matrix elements in Eqs. (10) and (11). Both H l and S l are decomposed into atomic contributions by the multicenter integration: H l % P A H A l and S l % P A S A l . We define an AO pair / l ðrÞ/ ðrÞ and its contribution to each atomic grid, Then we set H A l and S A l to zero if Q A l < e, where e is a control parameter. The complexity of the integrals in Eqs. (10) and (11) is OðN 2 basis N grid Þ, where both N basis and N grid are linearly proportional to N atom . By using our truncation scheme described above, we can reduce it to a quadratic behavior with respect to N atom . Figure 6 shows the size dependence of the computation time of XMOLECULE with the current truncation schemes. We calculate the HFS ground state of C 24 H 12 molecule (coronene) in its equilibrium molecular geometry taken from Ref. 77. Then, we perform calculations for n such molecules (n ¼ 1; …; 7) stacked in the vertical direction with an interlayer separation of 3.3 Å . The minimal NAO basis set is used with N r ¼ 20, L ¼ 1 a.u., r max ¼ 10 a.u., and l max ¼ 4. The y axis is the CPU time per SCF iteration in seconds on a lab workstation (Intel Xeon X5660 2.80 GHz), and the x axis indicates the number of atoms in the stacked (C 24 H 12 ) n molecule. When all truncations are off (blue curve), the computational performance shows close to a cubic dependence. On the other hand, when the truncation method of Eq. (21) is applied with e ¼ 10 À3 (red curve), the scaling shows a quadratic dependence on the system size. Note that when the truncation of Eq. (21) is used, the complexity of the matrix element calculations is FIG. 6. Performance scaling with respect to the molecular size. The y axis is the CPU time per SCF iteration in seconds, and the x axis is the number of atoms in stacked (C 24 H 12 ) n molecules. The dotted lines with OðN 2 atom Þ and OðN 3 atom Þ indicate a quadratic behavior and a cubic behavior, respectively, with respect to the number of atoms, N atom . reduced to a quadratic relation, while the Hartree potential calculation becomes the most timeconsuming step, which is also governed by a quadratic scaling. The difference in the total energy between the calculations with and without this truncation is less than 0.14 eV/atom, whereas the truncated calculation is about 7.5 times faster than the calculation with no truncation. The calculation with 216 atoms (n ¼ 6) takes 40 second per single SCF iteration on the lab workstation. The whole computation time takes about 14 min including the overhead costs for numerical grid construction and 12 SCF iterations. When additional truncation schemes for the Hartree potential (see the Appendix) are applied with e 0 ¼ 0:1 and e 1 ¼ 0:01 (green curve), the complexity is a bit reduced towards a linear relation and the errors in the total energy are less than 0.93 eV/atom. The actual computational time per iteration is improved by a factor of two for the 216-atom case.

IV. CONCLUSION
In summary, we present a new method to calculate various multiple-hole electronic states for polyatomic molecules that may be formed by x-ray multiphoton ionization dynamics at high x-ray intensity. The method is based on the Hartree-Fock-Slater method, employing the LCAO scheme, where NAOs are used as a minimal basis set for molecular orbital calculations. Usage of NAOs has two advantages over conventional Gaussian-type basis functions. First, NAOs are obtained from numerical solutions for atomic core-hole states at the same computational level. Second, accuracy and efficiency of numerical integration with NAOs are controllable by grid parameters and truncation schemes. The NAOs presented here are accurately solved by using the numerical grid-based method that is implemented in the XATOM toolkit.
Using core-hole-adapted NAOs, molecular orbitals for core-hole states are efficiently calculated. We present benchmark calculations for multiple-core-hole states of N 2 . The NAO results show consistent accuracy for different charge states, which is not the case for conventional basis sets that are optimized for neutral systems. We demonstrate that our scheme is able to calculate all possible configurations that may be formed by removing zero, one or more electrons from the ground-state configuration of neutral CO molecule. The electronic state during x-ray multiphoton ionization dynamics may visit several of these multiple-hole configurations, which are energetically excited by about 4 keV with respect to the ground-state configuration of neutral CO. For molecular and ionization dynamics during XFEL pulses, we need not only all different multiple-hole states but also potential energy surfaces for individual electronic states. For double-core-hole states of CO 2þ , we calculate potential energy curves with core-hole-adapted NAOs, in good agreement with converged results with respect to the basis-set size. Also, we present single-and double-core-hole ionization potentials for several molecules in comparison with available theoretical and experimental data.
Efficient electronic structure calculations for molecules are essential for dynamical modeling of molecules at high x-ray intensity. We have implemented XMOLECULE to make a step toward dynamical simulation of molecular imaging with XFELs. Calculations of photoionization cross sections, fluorescence rates, and Auger rates for all possible configurations formed during molecular ionization dynamics are in progress.
Then, we define to be used as a truncation indicator. Note that the number of electrons in the Ath atomic electronic density is given by Within the atom, we consider higher multipole moments of the density to be less relevant. Thus, if d A lm is small enough in comparison with d A 00 , then the contribution of l and m is truncated, i.e., where e 1 is a truncation control parameter. Another truncation is that if the distance from the origin of the Ath atom is large enough, the Hartree potential contributed from A is approximately evaluated by the monopole only, and all l > 0 contributions are truncated, i.e., V A lm ðr A Þ ! 0 when r A > r c ; where r c is a cut-off radius given by r c ¼ Q A =e 0 ¼ ffiffiffiffiffi ffi 4p p d A 00 =e 0 . Here, e 0 is another truncation control parameter.