Simulation of electron energy loss spectra of nanomaterials with linear-scaling density functional theory

Experimental techniques for electron energy loss spectroscopy (EELS) combine high energy resolution with high spatial resolution. They are therefore powerful tools for investigating the local electronic structure of complex systems such as nanostructures, interfaces and even individual defects. Interpretation of experimental electron energy loss spectra is often challenging and can require theoretical modelling of candidate structures, which themselves may be large and complex, beyond the capabilities of traditional cubic-scaling density functional theory. In this work, we present functionality to compute electron energy loss spectra within the onetep linear-scaling density functional theory code. We first demonstrate that simulated spectra agree with those computed using conventional plane wave pseudopotential methods to a high degree of precision. The ability of onetep to tackle large problems is then exploited to investigate convergence of spectra with respect to supercell size. Finally, we apply the novel functionality to a study of the electron energy loss spectra of defects on the (1 0 1) surface of an anatase slab and determine concentrations of defects which might be experimentally detectable.

Experimental techniques for electron energy loss spectroscopy (EELS) combine high energy resolution with high spatial resolution. They are therefore powerful tools for investigating the local electronic structure of complex systems such as nanostructures, interfaces and even individual defects. Interpretation of experimental electron energy loss spectra is often challenging and can require theoretical modelling of candidate structures, which themselves may be large and complex, beyond the capabilities of traditional cubicscaling density functional theory. In this work, we present functionality to compute electron energy loss spectra within the onetep linearscaling density functional theory code. We first demonstrate that simulated spectra agree with those computed using conventional plane wave pseudopotential methods to a high degree of precision. The ability of onetep to tackle large problems is then exploited to investigate convergence of spectra with respect to supercell size. Finally, we apply the novel functionality to a study of the electron energy loss spectra of defects on the (1 0 1) surface of an anatase slab and determine concentrations of defects which might be experimentally detectable.
Keywords: EELS, ELNES, linear scaling, defects, titanium dioxide, theoretical spectroscopy, electron energy loss spectroscopy (Some figures may appear in colour only in the online journal) Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. challenging, due to the poor scaling of traditional density functional theory methods with system size. This is espe cially frustrating as many technologically and scientifically interesting systems can only be modelled using hundreds to thousands of atoms-examples include whole nanoparticles, grain boundaries, wellconverged isolated defects, and thin film surfaces.
In this work, we propose a means to overcome this systemsize barrier by implementing functionality for EELS simulation within the framework of a code suitable for very largescale calculations, onetep [8]. Total energy and force calculations are available in onetep with linearscaling com putational effort, due to a combination of methods based on optimisation of a representation of the single electron density matrix using a highlyefficient set of in situ optimized local orbitals and sparse matrix techniques making use of a hybrid OpenMPMPI parallel stratergy [9]. While our proposed method for EELS simulations requires a oneoff ( ) O N 3 diago nalization to obtain Kohn-Sham eigenstates, the minimal basis set means computational costs are reasonable for sys tems up to around three to four thousand atoms.

Methods
Simulation of EELS using density functional theory is most commonly achieved using Fermi's Golden Rule to obtain the imaginary part of the dielectric function in terms of dipole matrix elements between core and conduction band states. In atomic units, this gives: where ω is the transition energy, Ω is the volume of the unit cell, ψ i are (allelectron) conduction band states, ψ c is a core state, with respective energies E i and E c . r is a position oper ator defined as the displacement from the nucleus whose core electrons are being excited, and q is the momentum transfer. The δfunction conserves energy and can be replaced with a Gaussian to introduce appropriate broadening.
In the dipole approximation this expression becomes: where we have expanded the complex exponential to first order, noting that the orthogonality of wavefunctions removes the constant term in the Maclaurin series. This form is espe cially useful as the momentum transfer can be supplied during postprocessing (see section 2.4) and thus many different (small) momentum transfers can be investigated using the results of a single DFT calculation. The onetep code implements a linear scaling density func tional theory [10,11] (LSDFT) scheme based on the density matrix formalism [8]. The density matrix is represented in terms of a set of localized basis functions [12] referred to as nonorthogonal generalized Wanier functions-NGWFs [13], φ α , and a density kernel αβ K : It is known that materials with a band gap exhibit 'near sightedness' [14]: their density matrix decays exponentially with | − | ′ r r . It is therefore possible to impose a rangebased truncation on the density kernel so that it becomes a sparse matrix [15].
The NGWFs are expressed in terms of an underlying basis of periodic sinc (psinc) functions, which have been shown to be equivalent to a planewave basis [16]. The NGWFs are strictly localized within a sphere of a chosen cutoff radius cen tred on the atom to which they are attached. The onetep code uses a nested loop optimisation method: in the outer loop, NGWFs are optimized using a conjugate gradient algorithm to minimize the total energy; for each outer loop step, the den sity kernel is optimized to minimize the energy for the cur rent NGWFs subject to the conditions that the density matrix remains idempotent and electron number is conserved.
The underlying psinc basis permits use of fast Fourier transforms (FFTs) to obtain reciprocal space representations, such as for nonlocal projectors and for the kinetic energy operator. To increase the efficiency of FFTs in large systems, we make use of structures called FFT Boxes. These are small subspaces of the simulation cell, centred on a given NGWF and large enough to completely contain all NGWFs which overlap with it [13].

Conduction optimisation
The nested loop optimisation method produces a kernel and NGWF set which are optimized to represent the valence manifold accurately and efficiently. However, these NGWFs often represent unoccupied conduction states rather poorly. To obtain an accurate representation of the lowlying conduction band states, we follow the procedure described in Ratcliff et al and introduce a second kernel and a second set of NGWFs: ( ) χ α r . These conduction NGWFs are optimized to represent the lowlying conduction states [17]. They can be combined with the valence NGWFs to produce a joint representation in which all valence and conduction eigenstates can be accu rately represented.

Projector augmented wave
In onetep the projector augmented wave (PAW) formalism of Blöchl can be used [18,19] to recover allelectron results from calculations including only valence electrons explicitly. PAW enables calculations with a much smaller planewave basis (or, equivalently, a smaller underlying psinc basis) than would be required for either an allelectron or normcon serving pseudopotential approach.
Allelectron matrix elements of the dipole operator between conduction band eigenstates and core states are required for simulated EELS. In PAW these take the form [20] Here ψ i is an allelectron conduction band wavefunction, ψ i is the corresponding pseudowavefunction, ψ c is a core wavefunction associated with a particular atom, ν p is a pro jector and ϕ ν and ϕ ν are pseudo and allelectron partial waves respectively.

Implementation of EELS simulation
In a postprocessing step after a converged calculation, the Kohn-Sham Hamiltonian matrix expressed in the NGWF rep resentation is diagonalized to obtain the Kohn-Sham (pseudo) wavefunctions ψ i in terms of NGWF coefficients ( ) † α M i : The calculation of matrix elements then proceeds according to (3), via three steps: (i) Matrix elements are computed on the Cartesian grid between NGWFs and the core state ⟨ ⟩ φ ψ | | α r c , and between NGWFs and projectors, ⟨˜⟩ φ | α ν p .
(ii) The PAW correction term is calculated, taking the form calculating the partial wave terms on a logarithmic radial grid to ensure high accuracy. (iii) The above two terms are combined and the result is mul tiplied by the NGWF wavefunction coefficient matrix to produce matrix elements between allelectron conduction wavefunctions and core wavefunctions.
The first step requires the generation of kets of the form ⟩ ψ | r x c on a regular real space grid. Note that the PAW formalism means that is not necessary to accurately reproduce the part of the core orbital which lives within the PAW sphere on the reg ular grid: the radial grid terms will account for that part of the matrix element. This means that the method remains suitable even for tightlyconfined core orbitals of higher Z elements, for which the PAW radial grid terms account for almost all of the matrix element. For firstrow and secondrow elements, however, the confinement even of the 1s orbitals is not so tight, and the first term in (3) must be reproduced accurately.
The most straightforward approach to generating ⟩ ψ | r x c would be to transform the core orbitals directly to the real space grid in an FFT box centered on the atom, and multiply by the position operator before integrating the product of this function and the NGWF. However, it was determined that due to the high spatial frequencies of core orbitals, this approach is not sufficiently accurate on a Cartesian grid of feasible spacing.
Instead, we use a Fourier space method for applying the position operator: This approach gives considerably higher accuracy in repro duction of the core orbitals since it calculates directly the Fourier transform of the product ( ) ψ r r c Note that the diagonalisation of the Hamiltonian in the basis of NGWFs introduces a cost of ( ) O N 3 to an EELS calcul ation. This diagonalisation is, however, a oneoff calculation per system and its cost will only become significant compared to the cost of NGWF optimisation for very large systems, well over the 2000-4000 atom systems we aim to target with this methodology.

Calculation of spectra
Using (3) provides matrix elements which can be combined with (1) to provide spectra, subject to appropriate broad ening via convolution with a suitable function. This is usu ally a Gaussian and/or a Lorentzian, whose widths are usually chosen so as to approximately match the broadening in a corresponding experiment, due to lifetime and instrumental effects. For this operation we rely on the OptaDoS code [21]. This code supports a number of different broadening schemes: here we will use fixed broadening in most cases, with energy dependent lifetime broadening in selected cases, as indicated in the figure captions. The OptaDoS code also accepts a momentum transfer parameter, a unit vector in the direction of the momentum transfer. For our simulated spectra an isotropic average over directions was taken.
In the prediction of spectra for solids, it is often necessary to use a high density of kpoints for Brillouin zone integration to achieve a wellconverged spectrum. In linearscaling DFT approaches it is more common simply to use a larger supercell with periodic replicas of the primitive cell, which produces an effective kpoint sampling equal to the number of repeats of the primitive cell in each direction.

Core holes
The simulation of the electron energy loss process using Fermi's Golden Rule within KSDFT neglects the interaction between the excited electronhole pair. A reasonable approx imation which is widely used to improve this is to introduce a core hole, i.e. a missing electron in the appropriate core level of the atom whose spectrum is required. Within pseudopo tential and PAW methods, this is achieved by assigning this atom a modified pseudopotential, which takes into account the vacant core orbital. Several methods exist for this: the simplest is to use a pseudopotential for an atom with an atomic number one greater than the actual species (the 'Z+1' method). Greater accuracy can be obtained by regenerating the appropriate PAW data set with fixed occupancies corresp onding to the promotion of an electron from the core level to the lowest previouslyunoccupied state. For example, for a core hole in the 1s orbital of a carbon atom, the configuration solved for would be 1s 1 2s 2 2p 3 . The method, while somewhat empirical in nature, has been widely shown to significantly improve agreement of predicted spectra with experimental results. However, it comes with the disadvantage that calcul ations must be repeated for each atom for which predicted EEL spectra are required.

Absolute energy offset
In many applications, the ability to predict changes in the spec trum for a particular element in different local environments is of more significance than to predict the absolute energy of the spectrum. Nevertheless, manual alignment of the offset of the edge by comparison to experiment is clearly undesirable. Mizoguchi et al [22] proposed a method to compute absolute offsets in the context of pseudopotential methods by compar ison of the energies of valence pseudopotential and allelectron calculations for groundstate and excitedstate atoms. In this approach, one computes three excitation energies: (i) the dif ference in total energy of the full system between the ground state and a state with the core hole potential present and an extra electron placed in the lowest energy conduction state; (ii) the change in total energy of the isolated allelectron atom under a similar promotion of an electron from the core orbital to the lowest unoccupied state; and (iii) the difference in total energy between the isolated ground state pseudoatom and the core hole potential with a promoted electron. Essentially, one is taking the excitation energy in the context of the real system, subtracting off the response of the pseudised atom, and adding back on the response of the allelectron atom, in an attempt to take into account the response of the allelectron atom in the real environment.
Where + + E sys ch e is the total energy of the system as calcu lated with a core hole potential and an electron in the lowest state of the conduction band. E sys,gs is the ground state energy of the system (no core hole, no electron in the conduction band).
+ + E aeatom ch e and E aeatom,gs are the all electron total ener gies of the isolated atom under consideration with the core hole (and excited electron) and in the ground state respec tively. Finally + + E psatom ch e and E psatom,gs are the pseudoatom total energies of the isolated atom under consideration with the core hole (and excited electron) and in the ground state respectively.
One then uses this excitation energy as the offset of the lowest energy state in the conduction band. Whilst there is not perfect agreement with experimental edge onset energies, values computed using (6) are correct to approximately 1-2%, and the method has met with widespread success in predicting chemical shifts for a given element between different mat erials [28][29][30][31].

Demonstration of methodology
Our first task is to demonstrate that the implementation of simulated electron energy loss spectroscopy, within the con text of linearscaling DFT with local orbitals, is capable of generating results systematically equivalent to widely used simulated EELS methodology. We first compare the output of the current implementation to planewave pseudopoten tial (PWP) methods, utilising the widelyused PWP package, We have chosen a range of simple systems to span wide and narrowbandgap materials. In each case we generate an equivalent supercell within onetep and castep, resulting in the set of systems shown in table 1. For the purposes of sec tions 3.1 and 3.2 we are primarily interested in the capacity of our implementation to produce predicted spectra for a given input geometry which match closely those produced by other methods. For this reason the simulation cells used were not subject to relaxation of the lattice constant. In the interests of consistency the experimental value of the lattice constants are used throughout.
For this preliminary investigation no core holes were used. onetep and castep calculations were performed at a kinetic energy cutoff of 800 eV, which is wellconverged for all materials studied here. We utilize the PBE functional [33], which as is widelyunderstood, would be expected to underestimate bandgaps but otherwise produce geometry and electronic structure in good agreement with experiment. Only the Γ point is sampled for the supercell ground state calcul ations. For the onetep calculations, we use the PAW data sets of Jollet, Torrent and Holzwarth [34]. For castep the on the fly pseudopotential generator was used. Both sets have been shown to be highly accurate through comparisons made as part of the 'Delta' project [35].
Valence and conduction NGWFs were truncated in onetep to a radius of 10.0 a 0 (5.3 Å) for all materials, which we veri fied was able to produce wellconverged densities of states for all systems in the valence and conduction bands. Kernel trun cation was not applied in these systems as they are too small for this to be worthwhile.
The allelectron calculations were performed using the ELK code [36]. The parameter rgkmax, which controls basis set size, was set to 7. Muffin tin radii for the species simulated were (Å): carbon: 0.95 oxygen: 0.95 magnesium: 1.16. An LDA functional was used. Core hole effects were included using a 'Z + 1' approximation as described in the ELK documentation.

Comparison to plane-wave methods
As the underlying basis of psinc functions used to express the local orbitals in a onetep calculation is equivalent to plane waves, we expect a very high degree of agreement between predicted spectra and those produced using a plane wave Note: Structures were obtained via the inorganic crystal structure database [27].
code. This is seen in the case of the tested systems, as long as the low energy conduction states can be well converged. Using a low smearing to compute spectra (far lower than the broadening usually observed in experiment) permits detailed comparison of fine structure between the two simulation methods. We do not calculate absolute energy offsets at this stage, but rather align the energy axis to the first peak of the spectrum, to aid detailed comparison of the shape of the pre dicted spectrum. Note that we will not perform any rescaling for this comparison, providing a powerful test of how robust our method is across different PAW data sets. Figure 1 shows this comparison in the case of magnesium oxide, graphite, diamond and silicon. In all cases we see almost perfect agreement in terms of relative peak position, peak height, and relative peak heights for at least the first 10 eV above the onset. Beyond this, the quality of the representation of the conduction band states in onetep is somewhat reduced, and there are minor discrepancies in peak heights, though these would not impair qualitative comparisons.

Comparison to all-electron methods
Simulations of the diamond and magnesium oxide systems were undertaken using the allelectron ELK code to provide a further point of comparison for our method. Given the com putationally demanding nature of allelectron calculations, smaller supercells were used. The diamond simulation was conducted in a × × 2 2 2 supercell and the magnesium oxide simulation in an unreduced eight atom unit cell. Monkhorst-Pack kpoint meshes of × × 10 10 10 and × × 8 8 8 respec tively were used. Figure 2 shows a comparison between onetep results with one core hole and broadened with a 1.5 eV width Gaussian and allelectron results. Once again, we see a very good agree ment, validating the PAW methodology in general and our Fourier space method for displacement core kets ( ⟩ ψ | r x c ) in particular. It should further be noted that the ELK does not use the dipole approximation and the close agreement of our results validates the use of (1) in this work. Note also that even though conduction NGWFs in the ONETEP calculation have only been optimised for the first roughly 10-20 eV beyond the conduction band edge, there is nevertheless relatively good agreement with allelectron methods over the whole range of 10-50 eV.

Comparison to experimental spectra
Having established that the methodology is in excellent agree ment with existing stateoftheart techniques for electron energy loss spectroscopy based on KSDFT, we are now in a position to compare directly with experimental spectra. For this comparison we will show that it becomes considerably more important to include the effects of core holes, so we show results both with and without a core hole included for a chosen atom. Note that the excellent agreement between the current methodology and the welltested planewave pseudo potential formalism, shown in figure 1, can be shown to be retained fully when using a PAW dataset with a core hole included.
The experimental spectra we reproduce from the litera ture [37,38] were obtained using transmission electron microscopy at a variety of facilities: see the individual ref erences for more detail. Our simulated spectra are computed under the assumption of zero momentum transfer. To facili tate comparison to experimental results we apply a 1.5 eV Gaussian broadening, which roughly matches the effective resolution of older spectra (though current stateoftheart facilities can improve upon this resolution). In the case of graphite and the magnesium K edge in MgO, lifetime broad ening effects were also included, since it is clear that there is increasing broadening at higher energies. In all cases, since both experimental and computed spectra are measured in arbitary units, we rescale the experimental results verti cally for ease of comparison, based on best agreement of the first peak or the first and second peaks. A test of simulated spectra for the carbon K edge in diamond and the oxygen K edge in MgO indicated that there is a minimal difference between spectra computed using relaxed instead of unre laxed lattices once a physically reasonable broadening has been applied.
Our simulated spectra have been offset by an energy shift which places the lowest conduction band state at the energy computed using the Mizoguchi method described in sec tion 2.6. The same offset was applied to spectra simulated with and without core holes (for a given system). This offset method has been used in all our simulated spectra other than those shown in figures 1 and 2.  Comparison of predicted spectra from the current method with experimental spectra for carbon Kedges in diamond (a) and graphite (b). The inclusion of a core hole dramatically improves the agreement of the predicted diamond spectrum with experiment. Graphite, however, has greater screening and less of a change is seen. Upper energy axis for simulated spectra. Lower energy axis experimental spectra. A lifetime broadening scheme was used for the carbon K edge in graphite. Experimental data reproduced from [38] with permission.  Figure 3 shows results for Mg and O Kedges in bulk crys talline magnesium oxide. Comparing the spectra without a core hole (green) and experimental (blue) lines, we see ini tially a poor agreement between computed and experimental spectra. Given the large band gap of the material it is likely that the core hole potential is rather weakly screened. Thus a core hole potential must be included to reproduce the experimental spectrum (red line, see next section for further discussion).
In the case of carbonbased materials, diamond and graphite, figure 4 shows that there is already a quite impres sive similarity between experimental results and simulation even without core holes. Relative peak positions match well, and with the exception of the first and second peaks there is a good agreement in relative intensities.

Core holes
In order to account for the effect of the hole left when a core electron is excited in the electron energy loss process, a modi fied PAW data set can be used. These 'core hole' potentials are created for atoms with an empty (or fractionallyoccupied) core orbital. Since these data sets result in a net charge being added to the simulation cell, care must be taken to converge results with respect to cell size due to the long range nature of the Coulomb force. Here linearscaling DFT has particular strength as large cells, which might be infeasible with conven tional plane wave codes, can be simulated.
As discussed in section 3.3, materials with wide band gaps only weakly screen the core hole charge. To achieve good agreement between simulated and experimental spectra in such materials, it is necessary to include the core hole [20]. For the wide band gap materials in section 3.1 a second set of simulations were conducted including a whole core hole in the 1s orbital.
In MgO the inclusion of a core hole is clearly beneficial in terms of improved agreement with experiment. The oxygen K edge shows a shift of peaks to higher energies relative to the first peak, correcting the peak energy underestimate seen in the noncorehole spectrum and resulting in the impressive agreement seen in figure 3. Particularly encouraging results are seen for the magnesium K edge, where a significant increase in the intensity of the first peak relative to the second leads to a convincing match between theoretical and exper imental spectra.
The remaining discrepancy between our predicted Mg K edge and the experimentally observed edge is due primarily to our choice of broadening scheme. We have elected to adopt a simple energy dependent Lorentzian broadening, which has the effect of reducing the intensity of peaks at higher energies relative to those at lower energies. As a result of this the rela tive intensity of the second and third peaks in the structure at 1320 eV is reversed.
In the case of diamond ( figure 4) there is a change in the relative intensities of the first two peaks, which now show the correct intensity ordering with respect to experiment. Note also that, the spacing of the first and second peaks is increased from 5.37 eV to 6.35 eV, meaning that the position of the second peak with respect to the experimental spectrum (spacing around 6.1 eV) changes from being slightly underes timated to slightly overestimated.
For graphite, in figure 4, the increased screening effects reduce the impact of including a core hole on the computed spectrum. An improvement in the relative spacing of the π * and σ * peak onsets is seen, which when combined with energy dependent broadening (taking into account the short lifetime of excitations to high energy conduction band states) a very good agreement with experiment is expected.

Convergence with system size
The inclusion of a core hole raises the issue of convergence with respect to system size, as in insulating materials the Coulomb interaction between periodic images is very long ranged. To investigate how large a simulation cell would be needed to obtain a well converged spectrum the magnesium oxide system was selected. Starting with the 216 atom sim ulation cell of MgO used previously, we construct an eight fold replica of this simulation cell, containing 1728 atoms. A smaller 64 atom cell was also constructed and used with castep with a × × 6 6 6 kpoint grid. We compute the oxygen K edge electron energy loss spectrum for the two larger cells and compute the Mizoguchi edge offset energy for all three.
Examining the Mizoguchi edge offset energies we see that there is a significant under convergence in the 64 atom cell with respect to the 216 atom cell. The computed energy for this system is 541.1 eV, differing by 485 meV from the offset computed for the 216 atom with Γ point sampling (540.6 eV). Going from the 216 atom cell to the 1728 atom cell we see that the former is close to converged, with a computed offset of 520.8 eV compared to 521.1 eV for the larger system (dif ference 240 meV). The computed spectra in figure 5 also con firm that the 216 atom system is well converged both with respect to electrostatics and kpoint sampling. While the dif ferences in computed edge offset energies may seem small we stress that when combining spectra of multiple atoms to produce a simulated spectrum of a sample of finite thickness these small differences could greatly alter the predicted peak  widths in the resulting spectrum. We therefore propose that when performing calculations with the intent of combining spectra from multiple atoms it is necessary to use simulation cells containing on the order of at least two hundred atoms in order to correctly converge the offsets which must be applied to those spectra prior to their combination.

Anatase surfaces
Finally, we present a practical example of the use of the cur rent methodology, namely to predict the influence of surfaces and defects on the EEL spectra of anatase. This system pro vides an excellent demonstration of the utility of onetep, since in order to fully relax defect geometries, very large cells are needed. This is particularly true for charged defects, which produce longranged electrostatic and strain fields.
First, we construct a 720 atom slab of pristine anatase with (1 0 1) surfaces exposed on both sides, surrounded by a 36 Å vacuum gap. The slab geometry was relaxed using the onetep implementation of the BFGS algorithm [39] so that all forces were below 0.1 eV − Å 1 . We refer to this system as the 'pris tine' slab.
A second surface cell was then prepared, containing a doubly positive oxygen vacancy formed by removal of one of the surface bridging oxygen atoms. The geometry of this cell was also relaxed, leading to the simulation cell shown in figure 6. We refer to this as the 'defective' system.  Predicted oxygen K edge spectra of surface and bulk atoms in a perfect anatase (1 0 1) slab. The differences between these spectra may be sufficient to resolve the surface signal experimentally using a method similar to that described in [6]. The subsurface atom used was one of those directly below a surface bridging oxygen: it occupied the same position as the 'def' atom did prior to relaxation.   Here we see that all atoms not directly adjacent to the defect produce very similar spectra. The spectrum produced by the atom closest to the defect ('def') produces a spectrum with a different shape and edge onset energy; these two features could be used to identify the presence of a defect. We show in figure 9 that these differences stand out even against a modest background signal for other atoms. Oxygen K edge for atoms in a defective surface def far n-a n-r nn-a nn-r sub The oxygen K edge energy loss spectrum for a bridging surface oxygen atom for the pristine slab was computed and is shown is figure 7. In all cases, a whole core hole in the oxygen 1s orbital was used. One of the most recognizable fea tures of the anatase oxygen K edge is reproduced, namely the double peak separated by 2.26 eV. The relative intensity of the two peaks differs somewhat from experimental spectra, where they have an approximate 1 : 1 ratio. This may be expected for undercoordinated surface bridging oxygen atoms, as a similar intensity ratio is seen in xray absorption spectra of anatase (1 0 1) surfaces [40]. Also shown in figure 7 is the spectrum for a subsurface oxygen atom, this spectrum shows significant differences from the spectrum of the bridging atom: there is a reduction in intensity of the first peak relative to the second and an increase in peak separation. Together these changes should make it possible to resolve between surface and bulk spectra using a method like that in [6]. In [6] a series of elec tron energy loss spectra were taken through areas of a sample with differing thickness and therefore differing contributions of the bulk to the recorded spectrum. A principal components method was then used separate the surface contribution to the spectra.
Electron energy loss spectra were computed for a selection of six oxygen atoms at various distances from the defect in the defective system, as indicated in figure 6. Distances of these atoms to the defect are given in table 2. For each position, the edge offset was calculated according to the method of Mizoguchi. The edge offset for the defec tive atom was found to be 518.9 eV and for the subsurface atom was 518.7 eV. The other atoms have offsets of between 518.3 and 518.4 eV. These are measurable differences given sufficient energy resolution, but it is worth noting that the uncertainty in the calculated values due to convergence with respect to system size could be of similar or greater magnitude as described in section 3.5.
Examining figure 8, we see that the expectation that elec tron energy loss spectroscopy is sensitive only to shortranged effects is clearly borne out for this system. The oxygen K edge for the atom far from the defect is effectively identical to an equivalent atom in the pristine slab. We may conclude that a high concentration of defects must be present to significantly alter an spectrum which is averaged over a large area, as only atoms very close to a defect will produce contributions to the spectrum which differ from that of a pristine slab.
Although EELS is expected to be a surfacesensitive method, an electron beam nevertheless penetrates a certain distance into a slab. In a real experimental measurement for an anatase slab, even if the lateral resolution of a beam is very high, spectra from multiple atoms at different depths into the slab are likely to be mixed, leading to an averaged spectrum. In figure 9 we have simulated this mixing effect by taking a weighted combination of spectra for two atoms lying on a ver tical line through the sample and thus likely to be excited by the same electron beam. The mixing ratios have been chosen to reflect slabs of varying thickness, with the 1 : 3 defect:sub surface ratio approximating the slab depicted in figure 6. Figure 9 highlights the challenges faced in identifying a defect using EELS. We can see however that the structure of the first peak, at approximately 520 eV, changes considerably between the defect spectrum and that of atoms in the layers below. This change in structure is visible even with considerable broad ening and thus there is some hope that in sufficiently thin sam ples the presence of intrinsic defects would be detectable.

Conclusions
We have demonstrated an efficient method for the computa tion of electron energy loss spectra for large, complex nano materials systems. This approach has been implemented in the linear scaling code onetep. We have tested our method against both experimental spectra and other wellestablished simula tion methods (both planewave and allelectron methods); plane wave and allelectron. We have also demonstrated suc cessful implementation of corehole and absolute energy shift calculations. In all cases convincing agreement is obtained, with core holes being required in the case of comparisons to experiment, particularly in wide bandgap materials. Predicted spectra for a defective surface with contributions from multiple atoms: oxygen K edge spectra for the sub surface atom combined with that for the defect atom (a) and far atom (b). The objective is to simulate taking a spectrum for a sample of finite thickness. The spectra shown in (b) are intended to represent those of pristine slabs of various thicknesses. It can be seen that only in a thin sample would the contribution of the defect be resolvable at realistic energy resolutions: a 0.7 eV Gaussian broadening is used here. This represents a robust test of both the matrix element gen eration method and the conduction optimisation method used by onetep which has not previously been extensively tested with bulk solid systems.
As onetep does not yet support spin-orbit coupling, we expect poor reproduction of edges where this effect is signifi cant, such as L 2, 3 edges of heavier elements. Our methodology is, however, readily adaptable to include spin-orbit coupling, which can be implemented relatively straightforwardly into the PAW framework, so we are confident that edges with sig nificant splitting could also be reproduced in principle.
The convergence of predicted spectra with respect to system size in large band gap materials was investigated using the prototypical insulating system magnesium oxide. We found that for the purposes of comparison to experiment, simulations are well converged within supercells of manage able size, namely in the region of 200-1000 atoms.
As an application of our method to a system of high technolog ical relevance, we investigated the (1 0 1) surface of anatase, and the impact of defects on the spectra of that surface. We have con firmed that since the electron energy loss technique is very local, surface point defects are likely to only be identifiable in thin speci mens, in experiments with atomic resolution. However, extended defects such as the columns of atoms in a grain boundary, which have different coordination number from their bulk counterparts, should be readily distinguishable with the current tools.