Questaal: a package of electronic structure methods based on the linear muffin-tin orbital technique

This paper summarises the theory and functionality behind Questaal, an open-source suite of codes for calculating the electronic structure and related properties of materials from first principles. The formalism of the linearised muffin-tin orbital (LMTO) method is revisited in detail and developed further by the introduction of short-ranged tight-binding basis functions for full-potential calculations. The LMTO method is presented in both Green’s function and wave function formulations for bulk and layered systems. The suite’s full-potential LMTO code uses a sophisticated basis and augmentation method that allows an efficient and precise solution to the band problem at different levels of theory, most importantly density functional theory, LDA+U , quasi-particle self-consistent GW and combinations of these with dynamical mean field theory. This paper details the technical and theoretical bases of these methods, their implementation in Questaal, and provides an overview of the code’s design and capabilities.


PROGRAM SUMMARY
Program Title: Questaal Licensing provisions: GNU General Public License, version 3 Programming language: Fortran, C, Python, Shell Nature of problem: Highly accurate ab initio calculation of the electronic structure of periodic solids and of the resulting physical, spectroscopic and magnetic properties for diverse material classes with different strengths and kinds of electronic correlation. Solution method: The many electron problem is considered at different levels of theory: density functional theory, many body perturbation theory in the GW approximation with different degrees of self consistency (notably quasiparticle self-consistent GW ) and dynamical mean field theory. The solution to the single-particle band problem is achieved in the

Introduction
Different implementations of DFT are distinguished mainly by their basis set, which forms the core of any electronic structure method, and how they orthogonalise themselves to the core levels. Using these classifications most methods adopt one of four possible combinations shown in Fig. 1. In the vast majority of cases, basis sets consist of either atom-centred spatially localised functions (lower panel of Fig. 1), or plane waves (PW) (upper). As Nuclei are shown as dots. The all-electron methods APW and KKR on the right substitute (augment) the envelope function (green) with numerical solutions of partial waves inside augmentation spheres (blue and red). Parts inside augmentation spheres are called "partial waves." The two figures on the left use a pseudopotential allowing their envelope functions to be smooth, with no augmentation needed. A pseudopotential's radius corresponds to a characteristic augmentation radius. The top two figures use plane waves for envelope functions; the bottom two use atom-centred local basis sets. We denote localised basis sets as "KKR", for the Korringa-Kohn-Rostocker method [1] as it plays a central role in this work; but there are other kinds, for example the Gaussian orbitals widely favoured among quantum chemists.
for treatment of the core, it is very common to substitute an effective repulsive (pseudo)potential to simulate its effect, an idea initially formulated by Conyers Herring [2]. Pseudopotentials make it possible to avoid orthogonalisation to the core, which allows the (pseudo)wave functions to be nodeless and smooth. For methods applied to condensed matter, the primary alternative method, formulated by Slater in 1937 [3], keeps all the electrons. Space is partitioned into non-overlapping spheres centred at each nucleus, with the interstitial region making up the rest. The basis functions are defined by plane waves in the interstitial, which are replaced ("augmented") by numerical solutions of the Schrödinger equations (partial waves) inside the augmentation spheres. The two solutions must be joined smoothly and differentiably on the augmentation sphere boundary (minimum conditions for a non-singular potential). Slater made a simplification: he approximated the potential inside the augmentation spheres with its spherical average, and also the interstitial potential with a constant. This is called the Muffin Tin (MT) approximation; see Solutions to spherical potentials are separable into radial and angular parts, φ (ε, r)Y L (r). The φ are called partial waves and Y L are the spherical harmonics. Here and elsewhere, angular momentum labelled by an upper case letter refers to both the and m parts. A lower case symbol refers to the orbital index only ( is the orbital part of L=( , m)). Right: a muffin-tin potential. Potential is flat in the interstitial region between sites, and spherically symmetric in a volume around each site. Blue depicts an augmentation radius s R around a sphere centred at some nuclear position R where the interstitial and augmented regions join.
The φ are readily found by numerical integrating a one-dimensional wave equation (Sec. 2), which can efficiently be accomplished.
An immense amount of work has followed the original ideas of Herring and Slater. The Questaal package is an all-electron implementation in the Slater tradition, so we will not further discuss the vast literature behind the construction of a pseudopotential, except to note there is a close connection between pseudopotentials and the energy linearisation procedure to be described below. Blöchl's immensely popular Projector Augmented-Wave method [4] makes a construction intermediate between pseudopotentials and APWs. Questaal uses atom-centred envelope functions instead of plane waves (Sec. 3), and an augmentation scheme that resembles the PAW method but can be converged to an exact solution for the reference potential, as Slater's original method did. The spherical approximation is still almost universally used to construct the basis set, and thanks to the variational principle, errors are second order in the nonspherical part of the potential. The nonspherical part is generally quite small, and this is widely thought to be a very good approximation, and the Questaal codes adopt it.
For a MT potential (V M T taken to be 0 for simplicity), the Schrödinger equation for energy ε has locally analytical solutions: in the interstitial the solution can expressed as a plane wave e i k·r , with ε= 2 k 2 /2m. (We will use atomic Rydberg units throughout, =2m=e 2 /2=1). In spherical coordinates envelope functions can be Hankel functions H L (E, r)=h (kr)Y L (r) or Bessel func-tions j (kr)Y L (r), except that Bessel functions are excluded as envelope functions because they are not bounded in space. Inside the augmentation spheres, solutions consist of some linear combination of the φ .
The all-electron basis sets "APW" and "KKR" [1] are both instances of augmentedwave methods: both generate arbitrarily accurate solutions for a muffin-tin potential. They differ in their choice of envelope functions (plane waves or Hankel functions), but they are similar in that they join onto solutions of partial waves in augmentation spheres. Both basis sets are energy-dependent, which makes them very complicated and their solution messy and slow. This difficulty was solved by O.K. Andersen in 1975 [5]. His seminal work paved the way for modern "linear" replacements for APW and KKR, the LAPW and Linear Muffin Tin Orbitals (LMTO) method. By making a linear approximation to the energy dependence of the partial waves inside the augmentation spheres (Sec. 2.4) the basis set can be made energy-independent and the eigenfunctions and eigenvalues of the effective one-particle equation obtained with standard diagonalisation techniques. LAPW, with local orbitals added to widen the energy window over which the linear approximation is valid (Sec. 3.7.3) is widely viewed to be the industry gold standard for accuracy. Several well known codes exist: WEIN2K (http://susi.theochem.tuwien.ac.at) and its descendants such as the Exciting code (http://exciting-code.org) and FLEUR (http://www.flapw.de/site. A recent study [6] established that these codes all generate similar results when carefully converged. Questaal's main DFT code is a generalisation of the LMTO method (Sec. 3), using the more flexible smooth Hankel functions (Sec. 3.1) instead of standard Hankels for the envelope functions. Accuracy of the smooth-Hankel basis is also high (Sec. 3.13), and though not quite reaching the standard of the LAPW methods, it is vastly more efficient. If needed, Questaal can add APW's to he basis to converge it to the LAPW standard (Sec. 3.10).

Questaal's History
Questaal has enjoyed a long an illustrious history, originating in O.K. Andersen's group in the 1980's as the standard "Stuttgart" code. It has gone many subsequent evolutions, e.g. an early allelectron full-potential code [7], which was used in one of the first ab initio calculations of the electronphonon interaction for superconductivity [8], an efficient molecules code [9] which was employed in the first ab initio description of instanton dynamics [10] one of the first noncollinear magnetic codes and the first ab initio description of spin dynamics [11], first implementation of exact exchange and exact exchange+correlation [12], one of the first all-electron GW implementations [13], and an early density-functional implementations of non-equilibrium Green's functions for Landauer-Buttiker transport [14]. In 2001 Aryasetiawan's GW was extended to a full-potential framework capability being the first all-electron GW code [15]. Soon after the concept of quasiparticle self-consistency was developed [16], which has dramatically improved the quality of GW. Its most recent extension is to combine QSGW with DMFT. It and the code of Kutepov et al. [17] are the first implementations of QSGW +dynamical mean field theory (DMFT); and to the best of our knowledge it has the only implementation of response functions (spin, charge, superconducting) within QSGW +DMFT.

Main Features of the Questaal Package
Ideally a basis set is complete, minimal, and short ranged. We will use the term compact to mean the extent to which a basis can satisfy all these properties: the faster a basis can yield converged results for a given rank of Hamiltonian, the more compact it is. It is very difficult to satisfy all these properties at once. KKR is by construction complete and minimal for a "muffin-tin" potential, but it is not short-ranged. In 1984 it was shown (by Andersen once again! [18]) how to take special linear combinations of them ("screening transformation") to make them short ranged. Andersen's screening transformation was derived for LMTOs, in conjunction with his classic Atomic Spheres Approximation, (ASA, Secs 2.7), and screening has subsequently been adopted in KKR methods also. The original Questaal codes were designed around the ASA, and we develop it first 2.5. Its main code no longer makes the ASA approximation, and generalises the LMTOs to more flexible functions; that method is developed in Sec. 3. These functions are nevertheless long-ranged and cannot take advantage of very desirable properties of short-ranged basis sets. Very recently we have adapted Andersen's screening transformation to the flexible basis of full-potential method (Sec. 2.9). Screening provides a framework for the next-generation basis of "Jigsaw Puzzle Orbitals" (JPOs) that will be a nearly optimal realisations of the three key properties mentioned above.
Most implementations of GW are based on plane waves (PWs), in part to ensure completeness, but also implementation is greatly facilitated by the fact that the product of two plane waves is analytic. GW has also been implemented recently in tightbinding forms using e.g., a numerical basis [19], or a Gaussian basis [20]. None of these basis sets is very compact. The FHI AIMS code can be reasonably well converged, but only at the expense of a large number of orbitals. Gaussian basis sets are notorious for becoming over-complete before they become complete. Questaal's JPO basis-still under development-should bring into a single framework the key advantages of PW and localised basis sets.
Questaal implements: 1. Density-functional theory (DFT). There is a standard DFT code, lmf (Sec. 3), and also three codes (lm, lmgf, lmpg) that implement DFT in the classical Atomic Spheres Approximation [5,18], presented in Sec. 2.7. All use the screened, tight-binding form (Sec. 2.10) lm, a descendant of Andersen's standard ASA package (Stuttgart code), is an approximate, fast form of lmf, useful mainly for close-packed magnetic metals; lmgf (Sec. 2.12) is a Green's function implementation closely related to the KKR Green's function, parameterised so that it can be linearised. Sec. 2.13 shows how this is accomplished, resulting in an efficient, energyindependent Hamiltonian lm uses. lmgf has two useful extensions: the coherent potential approximation (CPA, Sec. 2.18) and the ability to compute magnetic exchange interactions. lmpg (Sec. 2.14) is a principal layer Green's function technique similar to lmgf but designed for layer geometries (periodic boundary conditions in two dimensions). lmgf is particularly useful for Landauer-Buttiker transport [21,22,23,24], and it includes a nonequilibrium Keldysh technique [14]. 2. The GW approximation based on DFT. Questaal's GW package is separate from the DFT code; there is an interface (lmfgwd) that supplies DFT eigenfunctions and eigenvalues to it. It was originally formulated by Aryasetiawan in the ASA [25], derived from the Stuttgart ASA code; and it was the first all-electron GW implementation. Kotani and van Schilfgaarde extended it to a full-potential framework in 2002 [15]. A shell script lmgw runs the GW part and manages links between it and lmf. 3. The Quasiparticle Self-Consistent GW (QSGW ) approximation, first formulated by Faleev, Kotani and van Schilfgaarde [16]. Questaal's QSGW is a descendent of Kotani's original code, which with some modest modifications can be found at https://github.com/tkotani/ecalj/. QSGW also works synchronously with lmf, yielding either a static QSGW self-energy Σ 0 which lmf reads as an effective external potential added to the Hamiltonian, or a dynamical self-energy (diagonal part) which a post-processing code lmfgws, uses to construct properties of an interacting Green's function, such as the spectral function. See Sec. 4. 4. Extensions to Dynamical Mean Field Theory using a bath based on either DFT or QSGW. Questaal does not have its own implementation of DMFT, but has interfaces to Haule's CTQMC solver [26], and to the TRIQS library [27]. See Sec. 5. 5. An empirical tight-binding model (tbe). The interested reader is referred to Refs [28,29] and the Questaal web site. 6. A large variety of other codes and specialpurpose editors to supply input the electronic structure methods, or to process the output.

Outline of the Paper
The aim of the paper is to provide a consistent presentation of the key expressions and ideas of the LMTO method, and aspects of electronic structure theory, as they are implemented in the different codes in the Questaal suite. Necessarily this presentation is rather lengthy; the paper is organised as follows. Section 2 describes the LMTO basis, assuming the muffin tin and atomic sphere descriptions of the potential, together with tail cancellation and linearisation this comprises therefore a derivation of the traditional ("second generation") LMTO method. The transformation to a short range tight-binding-like basis, and other representations is described. The formulation of the crystal Green's function in terms of the LMTO potential parameters is derived, allowing the use of coherent potential approximation alloy theory. Non-collinear magnetism and fully-relativistic LMTO techniques are presented. Section 3 describes the full-potential code: its basis, augmentation method, core treatment and other technical aspects are described in detail. The LDA+U method, relativistic effects, combined LMTO+APW, and numerical precision are also discussed. Section 4 describes the GW method: the importance of self-consistency, the nature of QSGW in particular and its successes and limitations. Section 5 describes Questaal's interface to different DMFT solvers; section 6 discusses calculation of spin and charge susceptibilities. Finally, software aspects of the Questaal project are outlined in section 8. Several appendices are provided with a number of useful and important relations for the LMTO methodology.

The Muffin-Tin Potential and the Atomic Spheres Approximation
As noted earlier, the KKR method solves the Schrödinger equation in a MT potential to arbitrarily high accuracy. We develop this method first, and show how the LMTO method is related to it. In Sec. 3.1 we show how the basis in lmf is related to LMTOs. The original KKR basis set consists of spherical Hankel (Neumann) functions H RL (E, r) = h R (kr)Y L (r) as envelope functions with k 2 = E, augmented by linear combinations of partial waves φ R (ε, r)Y RL (r) inside augmentation spheres (Fig. 3). (See the Appendix for definitions and the meaning of subscripts R and L.) The envelope must be joined continuously and differentiably onto augmentation parts, since the kinetic energy cannot be singular. Note that for large , φ R → const×r . This is because the angular part of the kinetic energy becomes dominant for large . v MTZ Figure 3: Schematic of muffin-tin potential (black) and a solution (red) in a three-atom chain.

One-centre Expansion of Hankel Functions
An envelope function H L (E, r) has a "head" centred at the origin where it is singular and tails at sites around the origin. In addition to the head, tails must be augmented by linear combinations of φ (ε, r) centred there, which require H L be expanded in functions centred at another site. If the remote site is at R relative to the head, the onecentre expansion can be expressed as a linear combination of Bessel functions This follows from the fact that both sides satisfy the same second order differential equation (∇ 2 + E) = 0. The two functions centred at the origin satisfying this equation are the Hankel and Bessel functions.
Hankel functions have a pole at r=0, whereas Bessel functions are regular there, so this relation must be true for all r < |R|. The larger the value of r, the slower the convergence with M . Expressions for the expansion coefficients S can be found in various textbooks; they of course depend on how H and J are defined. Our standard definition is The C KLM are Gaunt coefficients (Eq. A.8).
To deal with solids with many sites, we write the one-centre expansion using subscript R to denote a nucleus and r relative to the nuclear coordinate: Envelope functions then have two ell cutoffs: b for the head at R, and a for the one-centre expansion at R . These need not be the same: b determines the rank of the Hamiltonian, while a is a cutoff beyond which φ R × φ R is well approximated by const×r + . A reasonable rule of thumb for reasonably well converged calculations is to take b one number larger than the highest character in the valence bands. Thus b = 2 is reasonable for sp elements, b = 3 for transition metals elements, b = 4 for f shell elements. In the ASA, reasonable results can be obtained for b = 2 for transition metals elements, and b = 3 for f shell elements. As for a , traditional forms of augmentation usually require a = 2 b for comparable convergence: lmf does it in a unique way that converges much faster than the traditional form and it is usually enough to take a = b + 1 [30].

Partial Waves in the MT Spheres
Partial waves in a sphere of radius s about a nucleus must be solved in the four coordinates, r and energy ε. Provided r is not too large, v(r) is approximately spherical, v(r)≈v(r); Even if the potential is not spherical, it is assumed to be for construction of the basis set, as noted earlier. Solutions for a spherical potential are partial waves φ (ε, r)Y L (r), where Y L are the spherical harmonics. Usually Questaal uses real harmonics Y L (r) instead (see Appendix for definitions).
φ (ε, r) satisfies Schrödinger's equation We are free to choose the normalisation of φ and use s 0 φ 2 (ε, r)r 2 dr = 1 (5) One way to think about solutions of Schrödinger's equation is to imagine each nucleus, with some v R (r) around it, as a scatterer to waves incident on it. Scattering causes shifts in the phase of the outgoing wave. The condition that all the sites scatter coherently, which allows them to sustain each other without any incident wave, gives a solution to Schrödinger's equation. This condition is explicit in the KKR method, and forms the basis for it. Imagine a "muffin tin" potential -flat in the interstitial but holes carved out around each nucleus of radius s. The phase shift from a scatterer at R is a property of the shape of φ R at its MT boundary, φ R (ε, s). Complete information about the scattering properties of the sphere, if v(r)=v(r), can be parameterised in terms of φ R (ε, s) and its slope, as we will see.
For a small change in energy of the incident wave, there can be a strong change in the phase shift. In the region near an eigenstate of the free atom the energy dependence is much stronger for the partial wave than it is for the incident waves striking it, so electronic structure methods focus on the scattering properties of the partial waves φ . This information is typically expressed through the logarithmic derivative function Consider the change in φ (ε, r) with ε for a given v(r). As ε increases φ (ε, r) acquires more curvature (Fig. 4). In the interval between ε=ε 0 where φ (ε 0 , s)=0 so that D = 0, and ε 2 where φ (ε 2 , s)=0, φ (ε, s) is positive. D thus decreases monotonically, with D → −∞ as ε→ε 2 . At some ε 1 in this region D = − − 1, which is the logarithmic derivative of a Hankel function of energy 0. ε 1 is called the "band centre" C for reasons to be made clear in Sec. 2.5: it is close to an atomic level and in tight-binding theory would correspond to an on-site matrix element. Increasing from ε 2 , D decreases monotonically from +∞ as shown in Fig. 4, reaching 0 once more, passing through some ε 3 where D =+ . This is the logarithmic derivative of a Bessel function of energy 0, and is traditionally called V .
Thus D is a monotonically decreasing cotangentlike function of ε, with periodic poles. At each pole φ (ε, r) acquires an additional node, incrementing the principal quantum number n. Between poles n is fixed and there are parameters C and V for each n. The linear method approximates D with a simple pole structure, which is accurate over a certain energy window. Similarly pseudopotentials are constructed by requiring the pesudofunction to match D of the free atom, in a certain energy window.
For principal quantum number n, φ has n− nodes and may vary rapidly to be orthogonal to deeper nodes. This poses no difficulty: we use a shifted logarithmic radial mesh, with point i given by Typically a few hundred points are needed for accurate integration. Core and valence waves use the same mesh.

Energy Derivative of D
Consider the matrix element integrated over a sphere of radius s: ε ν is some fixed energy. Taking into account the boundary condition at s, we obtain From this we obtain the energy derivative of D as [5] 2.4. Linearisation of Energy Dependence in the Partial Waves An effective way to solve Schrödinger's equation is to linearise the energy-dependence of the partial waves φ (ε, r), as This was Andersen's most important contribution to electronic structure theory [5]: it had a dramatic impact on the entire field. We will make extensive use of the linear approximation here.
In the linear approximation, four parameters (φ(s),φ(s), and their logarithmic derivatives) completely characterise the scattering properties of a sphere with v = v(r). Only three of them turn out to be independent. To see this, obtain an equation forφ by differentiating Eq. (4) w.r.t. ε: With the normalisation Eq. (5), φ andφ are orthogonal Of the four parameters φ(ε ν , s),φ(ε ν , s), D{φ}, D{φ}, only three are independent. Using the normalisation Eq. (5), we can establish the following relation between them: The third line follows from Eq. (4), and the second from Green's second identity which adds a surface term when φ andφ are interchanged. The Wronskian W {a, b} is defined as for a pair of functions a(r) and b(r) evaluated at point s.

The Traditional LMTO Method
For historical reasons Anderson constructed the original LMTO formalism with non-standard definitions for the Hankel and Bessel functions. We follow those definitions in order to be consistent with the historical literature. In this paper they are named H andĴ and are defined in the first Appendix.
Here we follow Andersen's development only in the context of E≤0, with κ 2 = − E. Note that E can be chosen freely and need not be connected to the eigenvalue ε. But for exact solutions in a MT potential, the energy ofĤ L (E, r) andĴ (E, r) must be chosen so E=ε−v MTZ .
LMTO and KKR basis sets solve Schrödinger's equation in a muffin-tin potential, which in the interstitial reduces to the Helmholtz equation, and linear combinations of Hankel functions as solutions that satisfy appropriate boundary conditions. We defer treatment of the boundary conditions to the next section and continue the analysis of partial waves for a single scattering centre for now.
As noted, the scattering properties depend much more strongly on the partial waves than the energy dependence of the envelopes. In keeping with the traditional LMTO method, we assume that the kinetic energy of the envelopes vanishes (the energy is taken to be close to the MT potential). Then Schrödinger's equation reduces further to Laplace's equation, whose solutions are Hankel and Bessel functionsĤ andĴ at E=0, which we denote aŝ H (r) andĴ (r). In close-packed solids there is reasonable justification for this choice: the spacing between spheres is much smaller than the wavelength of a low-energy solution to the wave equation (one not too far from the Fermi level).Ĥ (r) andĴ (r) are proportional to r − −1 and r respectively (see Appendix) and so φ can be continued into the in-terstitial in the vicinity of s, φ (ε, r∼s) =φ (ε, s) The first term is proportional toĴ (r), the second toĤ (r). In the remainder of this section we will develop expressions for the general κ case in part, showing also the κ→0 limit, and finally focus on constructing Hamiltonians and Green's functions with κ=0.

Energy-Dependent Muffin Orbitals
Eq. (13) is not yet a suitable basis because it diverges as r→∞ because of the J term. However, we can construct a family of "muffin-tin" orbitals that are continuous and differentiable and that do not diverge for large r R . N R and P R are coefficients fixed by requiring that the value and slope continuous at s R . Thus for r<s, χ RL (ε, E, r) consists of a linear combination of φ R (r) and J R (E, r) that matches smoothly and differentiably ontoĤ R (E, r). Apart from the "contaminating" J R term, χ RL is a solution for a single MT potential at sphere R. It vanishes at ε corresponding to the eigenvalue of the MT "atom," which occurs at ε=C . (Taking E=0, ε=C when D = − − 1; see Eq. (13).) In a latticeĤ must also be augmented at all R =R. To form an eigenstate, the contaminating term must be cancelled out by tails from χ R L centred elsewhere. Since anyĤ R L (E, r) can be expanded as linear combinations ofĴ RL (E, r), Eq. (3), it is easy to anticipate how the "tail cancellation condition," which forms the basis of the KKR method (2.6.1), comes about. The P are called "potential functions" and play a central role in constructing eigenfunctions. Expressions for P and N are developed in Sec. 2.8. Combined with the linearisation of the partial waves, (Sec. 2.4), information about P can be encapsulated in a small number of parameters; see Sec. 2.8.1.
Note the similarity between the χ RL and the partial waves. There is a difference in normalisation, but more importantly the term proportional toĴ RL for r R >s in Eq. (13) must be taken out of the MTO because it diverges for large r, as noted. SinceĴ is present for r<s, χ RL is not a solution of Schrödinger's equation for r R <s. However any linear combination of the χ RL can be taken as a trial solution to Schrödinger's equation. The z R L are expansion coefficients, which become the eigenvector if Ψ is an eigenstate. For any z R L , Ψ solves the interstitial exactly if the potential is flat and −E is chosen to correspond to the kinetic energy in the interstitial, E=ε − v MTZ , since each χ RL individually satisfies Schrödinger's equation.

Tail Cancellation
Inside sphere R there are three contributions to Ψ(ε, r): partial waves from the "head" function χ RL , the Bessel part of that function, and contributions from the tails of χ R L centred at other sites, which are also Bessel functions, Eq. (3). Thus, all the contributions to Ψ inside some sphere R, in addition to the partial wave, consist of some linear combination of Bessel functions. We can find exact solutions for the MT potential by finding particular linear combinations z RL that cause all theĴ RL inside each augmentation sphere to cancel.
From the definition Eq. (14), Eq. (15), has a onecentre expansion inside sphere R The one-centre expansion satisfies Schrödinger's equation provided that the second and third terms cancel. This leads to the "tail cancellation" theorem (16) and is the fundamental equation of KKR theory. For non-trivial solutions (|z RL | =0), the determinant of the matrix P −S must be zero. This will only occur for discrete energies ε i (P ) for which P −S has a zero eigenvalue. The corresponding eigenvector z RL yields an eigenfunction, Eq. (15), which exactly solves Schrödinger's equation the MT potential in the limit L→∞ and if E is taken to be the proper kinetic energy in the interstitial, E=ε i −V MTZ . In general there will be a spectrum of eigenvalues ε i that satisfy Eq. (16). The quantity (17) is called the "auxiliary Green's function" and is closely related to the true Green's function G [31] (poles of g and G coincide). We will develop the connection in Sec. 2.12. In KKR theory g is called the "scattering path operator."

The Atomic Spheres Approximation
Andersen realised very early that more accurate solutions could be constructed by overlapping the augmentation spheres so that they fill space. There is a trade-off in the error arising from the geometry violation in the region where the spheres overlap and improvement to the basis set by using partial waves in this region rather than envelope functions. The Atomic Spheres approximation, or ASA, is a shorthand for three distinct approximations: • v(r) is approximated by a superposition of spherically symmetric v R (r), with a flat potential in the interstitial.
• The MT spheres are enlarged to fill space, so that the interstitial volume is zero. The resulting geometry violations are ignored, except that the interstitial can be accounted for assuming a flat potential (the "combined correction" term). Errors associated with the geometry violation were carefully analysed by Andersen in his NMTO development [32].
• The envelope functions are Hankel functions with κ=0, augmented by partial waves inside MT spheres. There is no difficulty in working with κ =0, but κ=0 is a good average choice, as noted above. As ASA is an approximate method, little is gained by trying to improve it in this way. Real potentials are not muffintins, and the loss of simplicity does not usually compensate for limited gain in precision.
The full-potential methods use better envelope functions (Sec. 3).

Tail Cancellation in the ASA
In the ASA the spheres fill space, making the interstitial volume null. By normalising the χ of Eq. (14) as defined there, the eigenvectors z RL of P − S ensure that Ψ is properly normalised if This is because the Bessels all cancel, and the wave function inside sphere R is purely φ R N R z RL . Normalisation of z RL with the normalisation of φ (Eq. (5)) ensures that Ψ|Ψ =1. Making the spheres fill space is a better approximation than choosing spheres with touching radii, even with its geometry violation. This is because potentials are not flat and partial waves φ are better approximations to the eigenfunctions than theĤ . Moreover, it can be shown [33] that the resulting wave function is equivalent to the exact solution of the Schrödinger equation for v(r) equal to the sum of overlapping spherical potentials v(r)= R v R (r). This means v(r) is deeper along lines connecting atoms with a corresponding reduction of v(r) along lines pointing into voids, corresponding to the accumulation (reduction) of density in the bonds (voids).
Ψ varies in a nonlinear way with ε, so the tail cancellation condition entails a nonlinear problem. Once P (ε) (more precisely 1/P ) is linearised (Sec.2.8), the cancellation condition simplifies to a linear algebraic eigenvalue problem. This provides a framework, through the linear approximation Eq. (8), for constructing an efficient, energyindependent basis set that yields solutions from the variational principle, without relying on tail cancellation.

Potential and Normalisation functions
The tail cancellation condition Eq. (16), is conveniently constructed through the "potential function" P (ε), P is closely related to the logarithmic derivative D (ε), Eq. (6), which parameterises the partial wave in isolation. P depends on both D and the boundary conditions, which depend on what we select for the interstitial kinetic energy E. P (ε) is an always increasing tangent like function of energy, and in the language of scattering theory, it is proportional to the cotangent of the phase shift.
If the potential were not spherical, a more general tail cancellation theorem would still be possible, but P would need be characterised by additional indices: P =P L,L (ε) while for a spherical potential P depends on only. This is the only case we consider here.
N and P are fixed by requiring that χ (ε, κ, r) be continuous and differentiable at s. Expressions for N and P are conveniently constructed by recognising that any function f (r) can be expressed as a linear combination of a(r) and b(r) near s (meaning it connects smoothly and differentiably at r=s) through the combination Thus the matching conditions require N and P to be w is an arbitrary length scale, typically set to the average value of s. With the help of Eq. (7) the energy derivative of P is readily shown to be (Eq. A20, Ref. [31]) Using the following relation between Hankels and Bessels, it is readily seen thaṫ 2.8.1. Linearisation of P In this section we consider a single only and drop the subscript. By linearising the energy dependence of φ, Eq. (8), it is possible to parameterise P (ε) in a simple manner. First, we realise P is explicitly a function of D, and implicitly depends on ε through D≡D{φ (ε)}. Writing Eq. (20) in terms of D we see that The linearised φ(ε, r) (Eq. (8)) may be re-expressed using D in place of ε: where Eqs. (8) and (25)  To obtain an explicit form for P (ε) a relation between ε and ω is required. Matrix elements and overlap for Φ[D] are readily obtained from Schrödinger's equation, Eq. (4) and (9). With these equations and normalisation relations Eq. (5) and (10), we can find that ε[D] can be obtained from the variational principle Linearisation of φ or Φ has errors of second order in ε − ε ν , which means the variational estimate for the energy has errors of fourth order. From inspection of Eq. (27) we can deduce that the following linear approximation to ε has errors of second order in ε − ε ν . Thus, P can be parameterised to second order in ε as where the "potential parameters" γ, C and V are defined as A simple pole structure can be parameterised in several forms. A particularly useful one is where ∆≡(C − V )γ. ∆ can be expressed in terms of Wronskians as This last equation defines the sign of ∆ 1/2 . Eq. (32) is accurate only to second order because of the linear approximation Eq. (28) for ω(ε). From the structure of Eq. (27), it is clear that ω(ε), and thus P (ε) can be more accurately parameterised (to third order) by the substitution The third-order parameterisation requires another parameter, sometimes called the "small parameter" Thus P is parameterised to third order by four independent parameters. It is sometimes convenient in a Green's function context to use P and N (or Name Interpretation C band centre eigenvalue of MT "atom" and resonance in extended system, Eq. (30) ∆ bandwidth bandwidth in the absence of hybridisation with other orbitals, Eq. (33) p small parameter 3 rd order correction to second-order potential function P , Eq. (32) V "bottom" ε where D = + (free electrons) γ "transformation to orthogonal basis" see Sec. 2.9 equivalentlyṖ ; see Eq. (23)) in place of C and ∆. Green's functions can be constructed without linearising φ; this is the KKR-ASA method. How linearisation of φ resolves G in an energy-independent Hamiltonian is described in Sec. 2.8.3.

Spin Orbit Coupling as a Perturbation
The formalism of the preceding sections can be extended to the Pauli Hamiltonian. If we include the spin-orbit coupling perturbatively, and include the mass-velocity and Darwin terms the Hamiltonian becomes The 2×2 matrix refers to spin space. In orbital space, L ·Ŝ mixes spin components; also matrix elements of Eq. (25) andL ·Ŝ depend on both and m. We require matrix elements of the Φ(D), analogous to Eq. (26): How the ASA-Tail Cancellation Reduces to a Linear algebraic eigenvalue problem The eigenvalue condition is satisfied when ε is varied so that P −S is a matrix (P − S) RL,R L with P diagonal in RL, and S hermitian. If P is parameterised by Eq. (32).
Multiply on the right by S −1 and the left by Rearrange, keeping in mind that C, ∆, and γ are real and diagonal in RL This has the form of the linear algebraic eigenvalue problem hψ = εψ h is a hermitian matrix, with eigenvalues corresponding to the zeros in | P −S|. The tilde indicates that h is obtained from P and is thus accurate to second order in ε − ε ν ) (C and ∆ are calculated at ε ν , Eq. (30,33)). Linear MTO's will be constructed in Sec. 2.13, and h can be identified with h α − ε ν , Eq. (62), for α=γ and if P parameterised by P .
2.9. "Screening" Transformation to short-ranged, tight-binding basis Formally, Eq. (37) is a "tight-binding" Hamiltonian in the sense that it is a Hamiltonian for a linear combination of atom-centred (augmented) envelopes χ, Eq. (14) taken at some fixed (linearisation) energy ε ν . But the Hamiltonian is longranged becauseĤ (r) is long ranged (see Eq. (C.12) for κ=0) unless E = −κ 2 is −1 Ry or deeper. But such a basis is not accurate: the optimal E falls somewhere in the middle of the occupied part of the bands, and is roughly +0.3 Ry in close-packed systems; E=0 is a compromise.
The idea behind the "screening" or "tightbinding" transformation is to keep E= − κ 2 near zero but render the Hilbert space of the χ RL (ε, r) short range by rotating the basis into an equivalent set {χ RL (ε, r)}→{χ α RL (ε, r)}, by particular linear combinations which render χ α RL (ε, r) short ranged (or acquire another desirable property, e.g. be orthogonal). Figure 5: Contour of a screened s envelope function in a bcc lattice. Each contour represents a reduction in amplitude by a factor of 10. Dashed lines show contours with negative amplitude. At the "hard core" radius (Sec. 2.10.5) on the head site a screened function has pure character; at the "hard core" radius on all the tail sites the one-centre expansion of the envelope function vanishes for < max.
Andersen sometimes called the change a "screening transformation" because it is analogous to screening in electrostatics. Note thatĤ (E=0, r) ∝ 1/r, has the same form as a charge monopole, with long range behaviour. It becomes short ranged if screened by opposite charges in the neighbourhood. The same applies to higher order multipoles: they can become short ranged in the presence of multipoles of opposite sign.
The transformation can be accomplished in an elegant manner by admixing to the original ("bare") envelope (renamed fromĤ RL toĤ 0 RL to distinguish it from the "screened" one) with amounts ofĤ 0 R L in the neighbourhood of R: 12 Whatever prescription determines B α R L ;RL , it is evident that the Hilbert space is unchanged, and thatĤ α RL →Ĥ 0 RL if B α R L ;RL →δ R L ;RL The one-centre expansion ofĤ α RL in any channel R L is some linear combination of Hankel and Bessel functions, becauseĤ 0 RL are Hankels in their head channel R L =RL, and linear combinations of Bessels in other channels R L =RL (see Eq. (3)). Thus everyĤ 0 RL is expanded in R L by some particular linear combination of Bessel and Hankel functions B α R L ;RL determines α R L ;RL , and vice-versa. α used as a superscript indicates that it determines the screening.
A simple and elegant way to choose the screening transformation is to expand everyĤ 0 RL in channel R L =RL, by the same function J α (κ, r R )Y L (r R ). Then α need only be specified by two indices, α R L ;RL →α R .
To sum up, we specify the transformation through α R . EnvelopesĤ α RL are expanded at site R L =RL, as linear combinationŝ The structure constants S α R L ;RL are expansion coefficients that will be determined next, but already it should be evident that S α α→0 − −− →S 0 , where S 0 are the structure constants of Eq. (3).
In its own "head"Ĥ α RL must have an additional irregular part. By expressing the B α R L ;RL in Eq. (38) in terms S α R L ;RL aŝ we can see indeed thatĤ α RL (r) has the required one-centre expansion, Eqs. (40,41), provided that S α obey a Dyson-like equation In practice S α is calculated from It follows immediately from Eq. (43) that if there are two screening representations α and β, the structure constants connecting them are related by  (16). Here we mostly concern ourselves with the ASA with κ=0.

Redefinitions of Symbols
We have defined a number of quantities in the context of the original MTO basis set, Eq. (14) that will have a corresponding definition in a screened basis set. Several previously defined quantities are now labelled with a superscript 0 to indicate that their definitions correspond to the unscreened α=0 representation: RL and S R L ;RL ≡S 0 R L ;RL . The "potential parameters" C, ∆, and p, Eq. (30)(31)(32)(33)(34)(35)(36) and Table 1, can also be relabelled with representation-dependent definitions. It is unfortunately rather confusing, but the original definitions without superscripts Eq. (30)(31)(32)(33)(34)(35)(36) correspond not to C 0 , ∆ 0 , and p 0 , but to the particular screening representation α=γ (dubbed the "γ representation"). To be consistent with the new superscript convention for P 0 and N 0 , the appropriate identifications are φ(ε ν )≡φ γ (ε ν ),φ(ε ν )≡φ γ (ε ν ), and Another unfortunate artefact of the evolution in LMTO formalism is that the meaning of many symbols changed over time. γ is called Q −1 in Ref. [31].
In the most recent NMTO formalism, S and B have exchanged meanings. Questaal's ASA codes use definitions that most closely resemble the "second generation" LMTO formalism perhaps most clearly expressed in Ref. [34]. Reference 16 of that paper makes correspondences to definitions laid out in earlier papers.

Screened Muffin-Tin Orbitals
To define the analogue of the MTO, Eq. (14), in a screened representation we write The partial wave φ RL (ε, r) is understood to vanish outside its own head sphere, and P is a matrix diagonal in RL: P α R L ;RL (ε) = P α R L (ε) δ R L ;RL . J α R L is the linear combination of φ andφ that matches continuously and differentiablyĴ α R L defined in Eq. (39). Eq. (51) uses it instead ofĴ α in because when we later construct energyindependent MTO's we can generate basis sets that accurately solve Schrödinger's equation in the augmentation spheres. Since P α and N α depend only on values and slopes at the {s R }, the substitution has no effect on them.
that satisfy the set of linear equations and the normalisation RL |z α RL | 2 = 1 (53) simplifies to which is a normalised solution to the SE for v(r) = R v R (r).
In practice the solution is inexact because L summations are truncated. The solution is rapidly convergent in the L-cutoff, however; see Ref. [34] for an analysis.

Hard Core Radius
α can be physically interpreted as equivalent to specifying a "hard core" radius where the onecentre expansionĤ α RL vanishes in a sphere centred at R . This is evident from the one-centre expansion Eq. (40) and the form ofĴ α , Eq. (41).Ĵ α vanishes at the radius where In his more recent developments, Andersen defined the screening in terms of the hard core radius a instead of α, because nearly short-ranged basis functions can be obtained for a fixed a R =0.7s R , independent of κ and .
It is easy to see how such a transformation can render envelope functions short-ranged. The value ofĤ α R is forced to be zero in a sea of R L channels 14 surrounding it. Provided the a R are suitably adjusted, it quickly drivesĤ α R (r)→0 everywhere for increasing r. If the a R →0, the screening vanishes andĤ α R returns to the long-rangedĤ 0 R ; while if the a R becomes comparable to s R the value on the head must be something like 0 and 1 at the same time (heads and tails meet). The damping is too large and theĤ α R (r) "rings" with increasing r. For a R =0.7r s or thereabouts the ringing is damped andĤ α R (r) decays exponentially with r, even for κ=0.

MTO's and Second Order Green's Function
Through the eigenvectors z α RL of Sec. 2.10.4, we can construct the Green's function. This was done in Appendix A of Ref. [31], where the full Green's function, including the irregular parts, are derived. Here we will adopt a simpler development along the lines of Ref. [5], after linearising the χ α RL (ε, r). One way to see why P α −S α have similar eigenvalues for any α is to note that [P α ] −1 −[S α ] −1 does not depend on α, since [P α ] −1 and [S α ] −1 are shifted by the same amount (compare Eq. (43) and (48)). In scattering theory [P 0 (ε)] −1 is proportional to tangent of the phase shift, and we realise that the transformation (P 0 , S 0 )→(P α , S α ) corresponds merely to a shift of the scattering background. The pole structures of P α −S α can depend on α because of the irregular parts: P α −S α have the same poles as [P α ] −1 −[S α ] −1 only where P α and S α have no zeros or poles. Some care must be taken when generating the Green's function G.
The MTO's Eq. (51) form a complete Hilbert space for any α, but α can be chosen to satisfy varying physical requirements. To make short-ranged Hamiltonians it has been found empirically that the following universal choice α s = 0.34857 α p = 0.05303 yields short-ranged basis functions χ α RL (r) for κ=0, for any reasonably close-packed system.
Another choice is α=γ. In Sec. 2.8.3 it was shown how the tail cancellation condition had the same eigenvalues as a fixed Hamiltonian, Eq. (37). We are now equipped make a connection with the χ γ basis and Eq. (37). Moreover, this connection enables us to construct the second order Green's function. First, Eq. (43) enables us to recognise the quantity S −1 −γ −1 as S γ . Eq. (37) then has the simple two-centre form h = C + ∆ 1/2 S γ ∆ 1/2 The Green's function corresponding to some fixed h has the simple form for complex energy z. It is easy to see that G(z) can be expressed in the following form: This is the analogue of Eq. 17 in the γ representation. P γ (z)−S γ has a direct connection with z − h because of P γ (z) takes the simple form (z−C)/∆.
2.11.1. Scattering Path Operator in Other Representations To build G(ε) from general screening representations β we need to transform the scattering path operator g γ →g β . This can be accomplished using Eqs. (49) and (44). Following Ref. [34]: These transformations require only Eqs. (49) and (44); they do not depend on parameterisation of P .
Since Eq. (50) is representation-independent, (P α /P β ) can equally be written in the following forms: 2.12. The ASA Green's Function, General Representation As shown in Sec. 2.11 the relation between the Green's function G(E) and the scattering path operator g is particularly simple when potential functions are parameterised to second order. The relation between the ASA approximation to G and g can be written more generally as follows. In the ASA, every point r belongs to some sphere R with partial waves φ RL (ε, r) so that r and r (or R and R ) are the field and source points, respectively. It was first shown in Ref. [31], for the "bare" representation α = 0, and for a screened representation α in Ref. [34]. The first term cancels a pole appearing in the second term, connected to the irregular part of G (which we do not consider here). P α (ε) can be computed by integration of the radial Schrödinger equation for any ε. If this is done, and the structure constants S are taken as energy-dependent, this is the screened KKR method. Questaal's lmgf and lmpg parameterise P α (ε), to second order (Eq. (48)), or to third order (Sec. 2.13.4) in ε.
Note that G does not depend on choice of α; it can be used to as a stringent test of the correctness of the implementation.

The ASA Hamiltonian: Linearisation of the Muffin-Tin Orbitals
The energy-dependent MTO, Eq. (51), exactly solves the ASA potential for a fixed ε. To make a fixed, energy-independent basis set, we constrain the energy-dependent MTO, Eq. (51), to be independent of ε. As we saw in Secs. 2.8.3 and 2.11, there is a simple energy-independent Hamiltonian Eq. (37) that has the same eigenvalue spectrum as the second order G; this is χ γ RL (ε ν , r). We can construct an energy-independent basis χ α RL (r)=χ α RL (ε ν , r) for any α, by choosing the normalisation N α R (ε) in such a way that ∂χ α RL (ε, r)/∂ε=0 at ε = ε ν . Thus we require Then the condition thatχ α RL (r) vanish for all r be-comesJ Henceforth, when the energy index is suppressed it means that the energy-dependent function or parameter is to be taken at the linearisation energy ε ν . IfJ is replaced by Eq. (60), Eq. (51) becomes The Hilbert space of the {χ α RL (r)} consists of the pair of functions φ R L andφ R L inside all augmentation channels R L . Changing α merely rotates the Hilbert space, modifying how much φ R L anḋ φ R L each χ α RL contains. Eq. (61) may be regarded as a Taylor series of χ α RL (ε, r), Eq. (51), to first order in ε − ε ν , withφ α (r) h α playing the part ofχ α (r) (ε − ε ν ). The eigenvalues of h α are fact the eigenvalues of χ α RL (ε, r) to first order in ε − ε ν . If P is parameterised to second order, P (ε)→ P (ε), h α in the γ representation becomes the second order ASA Hamiltonian, Eq.(37), when α=γ.
2.13.1. Potential Parameters C α , ∆ α , and o α The LMTO literature suffers from an unfortunate proliferation of symbols, which can be confusing. Nevertheless we introduce yet another group because they offer simple interpretations of what is happening as the basis changes with representation α, and also to make a connection with the Green's function. It is helpful to remember there is single potential function P α (ε), which determines the normalisation N α (ε) though Eq. (47). P α (ε) can be parameterised to second order with three independent parameters C, ∆, and γ (Eq. (48)) and to third order with the "small parameter" p (Eq. (36)).
The energy derivative of φ α (ε, r) at ε ν iṡ The last form applies when P is parameterised to second order. We have introduced the "overlap" potential parameter o α . It vanishes in the γ representation and consequentlyφ γ =φ. This fact provides a simple interpretation of χ α RL (r), Eq. (61) in the γ representation. χ γ RL acquires pure φ character for its own head, and pureφ in spheres where R =R. This implies that the {χ γ } basis are orthogonal apart from interstitial contributions (neglected in the ASA) and small terms proportional to p (Eq. (36)). φ andφ combine in every sphere in the exact proportion Eq. (8) at each eigenvalue ε i of h γ . Thus eigenvalues of h γ are correct to one order in ε i −ε ν higher than h α =γ .
In the LMTO literature two other parameters are introduced to characterise h α in a suggestive form: where [. (37)) when α=γ.

How h α Changes with Representation
From Eq. (64) implies thatφ α transforms aṡ Dividing χ α in to φ andφ parts, we realise that o α + [h α ] −1 is independent of representation and therefore  (61)) is an energy-independent basis set and has an eigenvalue spectrum. Within the ASA (Sec. 2.7) matrix elements of the Hamiltonian and overlap are readily obtained. Using normalisation Eq. (5) and (36), Schrödinger's equation for partial waves Eq. (4), the parameterisation ofφ α Eq. (63), and neglecting the interstitial parts, 2.13.4. Third Order Green's Function G(z) depends on potential parameters C and ∆. These can be replaced with the following: which follows from Eq. (50) with α=γ. Then the substitution P →P through the replacement ε→ε = ε + (ε − ε ν ) 3 p, Eqs. (35) and (36). This yields an expression for G to third order -more accurate than the 2nd order G [34,31]. Questaal codes lmgf and lmpg permit either second or third order parameterisation.

Principal Layer Green's Functions
Questaal has another implementation of ASA-Green's function theory designed mainly for transport. lmpg is similar in most respects to the crystal package lmgf, except that is written as a principallayer technique. lmpg has a 'special direction', which defines the layer geometry, and for which G is generated in real space. In the other two directions, Bloch sums are taken in the usual way; thus for each q in the parallel directions, the Hamiltonian becomes one-dimensional and is thus amenable to solution in order-N time in the number of layers N.
The first account of this method was presented in Ref. [35], and the formalism is described in detail in Ref. [14], including its implementation of the nonequilibrium case. Here we summarise the basic idea and the main features.
lmpg is similar in many respects to lmgf except for its management of the layer geometry. The material consists of an active, or embedded region, which is cladded on the left and right by left and right semi-infinite leads.
The end regions are half-crystals with infinitely repeating layers in one direction. All three regions are partitioned into slices, or principal layers (PL), along the 'special direction'. The left-and rightend regions consist of a single PL, denoted −1 and n, which repeat to ∓∞. Thus the trilayer geometry is defined by five lattice vectors: two defining the plane normal to the interface (the potential is periodic in those vectors); one vector PLAT for the active region and one each (PLATL and PLATR) defining the periodically repeating end regions.
Far from the interface the potential is periodic and states are Bloch states. It is assumed that the potential in each end layer is the same as the bulk crystal (apart from a constant shift) and repeats periodically in lattice vectors PLATL and (PLATR) to ∓∞.
Partitioning into PL is done because E − H = G −1 is short-ranged. It is requirement that a PL is thick enough so that H only connects adjacent PL. Then H is tridiagonal in the PL representation and the work needed to construct G scales linearly with the number of PL. Moreover it is possible in this framework to construct G for the end regions without using Bloch's theorem.
Principal layers are defined by the user; they should be chosen so that each PL is thick enough so that H connects to only nearest-neighbour PL on either side. (Utility lmscell has a facility to partition the active region into PL automatically.) 2.14.1. Green's Function for the Trilayer lmpg constructs the auxiliary g, and if needed builds G from g by scaling (Sec. 2.12). In many instances g is sufficient (e.g. to calculate transmission and reflection probabilities [14]), although G is needed to make the charge density. Note there is a g (or G) connecting every layer to every other one; thus g has two layer indices, g ij (i and j refer to PL here).
g=(P − S) −1 for the entire trilayer can be constructed in one of two ways. The first is a differenceequation method, described in Ref. [35]. The second is simply to invert (P − S) using sparse matrix techniques. Both methods require as starting points the diagonal element g s,L −1−1 for semi-infinite system (consisting of all layers between ∞ and −1, with vacuum for all layers to the right of the L-region) and the corresponding g s,R nn for the R-region. The sparse-matrix method is simple to describe. Supposing the active region is considered in isolation; denote it as I. Then g −1 I =(P −S) I . The effect of the leads is to modify g −1 I by adding a self-energy to layers 0 and n: lmpg implements both the difference-equation and sparse-matrix techniques: both scale linearly with the number of layers, in memory and in time. It has been found empirically that they execute at similar speed for small systems, while the difference-equation method is significantly faster for large systems.

Green's Functions for the End Regions
To make g, the diagonal surface Green's functions g s,L −1−1 and g s,R nn are required. lmpg implements two schemes to find them: a "decimation" technique [36] and a special-purpose difference-equation technique applicable for a periodic potential [37].
The latter method requires solution of a quadratic algebraic eigenvalue problem, which yields eigenvalues r: they correspond physically to wave numbers as r = e ika . a is the thickness of the PL and k the wave number in the plane normal to the interface. k is in general complex since no boundary conditions are imposed; it is real only for propagating states. Eigenvalues occur in pairs, r 1 and r 2 , and in the absence of spin-orbit coupling, There is a boundary condition on the end leads for the trilayer, which excludes states that grow into an end region. The surface Green's function g s can be constructed from the same eigenvectors that make the "bulk" g [37].
A great advantage of this method is that its solution provides the eigenfunctions of the system; thus the Green's function can be resolved into normal modes. A large drawback is the practical problem of finding a solution to the eigenvalue problem. It can be converted into a linear algebraic eigenvalue equation of twice the rank; however, the resulting secular matrix can be nearly singular (especially if S is short-ranged). Also, the pairs r 1 /r 2 can range over a very large excursion, of unity for propagating states and many orders of magnitude for rapidly decaying ones. Capturing them by solving a single eigenvalue problem imposes severe challenges on the eigenvalue solver.
Decimation is recursive and generally efficient; however problems can appear at special values of k energy where the growing and decaying pair r 1 and r 2 become very close to unity. Unfortunately, those "hot spots" are often the physically interesting ones.
At present Questaal's standard distribution does not have a fully satisfactory, all-purpose method to determine g s , though one has been developed and will be reported in a future work.

Contour Integration over Occupied States
Re z Im z Figure 6: Representative contour integration for occupied states. Twelve points are taken for the interval (−1, 0). The contour deformation has eccentricity = 1/2, and a bunching parameter 1/2, giving more weight to the points near E F = 0.
Many properties of interest involve integration over the occupied states. In contrast to band methods which give eigenfunctions for the entire energy spectrum at once, Green's functions are solved at a particular energy, and must be numerically integrated on an energy mesh for integrated properties. The spectral function or density of states is related to G as and in the noninteracting case is comprised of a superposition of δ-functions at the energy levels.
Thus G(ε) has lots of structure on the real axis, which makes integration along it difficult. However, since below the Fermi level G should have no poles in the upper half of the complex plane, the Cauchy theorem can be used to deform the integral from the real axis to a path in the complex plane (Fig. 6). lmgf and lmpg use an elliptical path, with upper and lower bounds on the real axis respectively at the Fermi level and some energy below the bottom of the band. A Legendre quadrature is used, but the weights can be staggered to bunch points near E F where G has lots of structure. Thus five parameters define the mesh: the number of points, the upper and lower bounds, the eccentricity of the ellipse (between 0 for circle and 1 for a line on the real axis) and bunching parameter which also ranges between 0 and 1. The integrand on the contour is smooth, except near the endpoint z→E F . Good results can be obtained with a modest number of points, typically 12-20.
The exact potential function P α has no poles in this half plane; nor does the second order parameterisation. However, spurious poles may appear in the third order parameterisation. These may be avoided by working in the orthogonal representation (α = γ) and/or by choosing fairly large elliptical eccentricities for the contour.

Spin-orbit Coupling in the Green's Function
It has been shown in Sec. 2.8.2 that spin-orbit coupling (SOC) can be added perturbatively to the Hamiltonian, resulting in the matrix elements containing ξ [D , D]. These matrix elements are added to the right-hand-side of the first line in Eq. (26), while the overlap integrals (second line in Eq.( 26)) remain unchanged. In order to construct the Green's function with SOC, the resulting modification of the variational energy in Eq. (27) needs to be reformulated as a perturbation of the potential parameters.
Because the SOC operator is a matrix (see Sec. 2.8.2), the exact solutions φ νljκ (r) of the radial Pauli equation are linear combinations of spherical waves |lmσ and |lm σ with m + σ = m + σ = j. However, our perturbative treatment is still based on basis functions with definite spin that are calculated without SOC. The energy dependence, however, is modified by allowing the ω parameter to become a matrix, so that Eq. (25) is replaced by where we dropped the common superscript because the SOC operator is diagonal in . The summation in (72) involves at most two terms with m + σ = m + σ .
Instead of the simple variational estimate of (D) in Eq. (27), we now construct a generalised eigenvalue equation, which leads tô whereω is the matrix from (72), bothω andV SO are functions of D ↑ and D ↓ , andp andˆ ν are diagonal matrices with elements φ2 ν σ and νσ , respectively. The matrices are assumed to include the full basis set on the given site, i.e., their dimension is 2(2 + 1).
Theω matrix is found by solving Eq. (73). To first order in − ν , it givesω = 1 −ˆ ν −V SO , so that the matrix elements ofV SO are effectively added toˆ ν . Promoting the potential function P ( ) to a matrix and using the representation Eq. (32) and the definitions (29-31), we find that the parameters ∆ and γ are unaffected byV SO while C is promoted as C →Ĉ = C1 +V SO . However, in order to make the definition ofP ( ) unambiguous, we need to fix the correct order of matrix multiplication. We also need to ensure that the poles of G(z) have unit residues. We use the following definitions: whereV SO = dV SO /d comes from the energy dependence of the SOC parameters ξ [D , D]. Note thatṖ = µ R µ L , and the structure of G(z) guarantees that the poles of G(z) have unit residues. It is straightforward to check that, with energy-independentV SO , G(z) becomes the resolvent of the second-order Hamiltonian Eq. 37, with addedV SO , as expected.
To third order in − ν and to first order inV SO , we find from Eq. (73): where is defined in Eq. 35, and with Â ,B =ÂB +BÂ. BecauseV SO is defined at the fixed values of the logarithmic derivatives, the spin-orbit coupling parameters ξ ,σσ inV SO are calculated with replaced by : Of course, just as in the non-relativistic case, is also replaced by where it appears explicitly in Eqs. (75)(76)(77).V SO is always calculated as the exact energy derivative ofV SO . Figure 7 shows the comparison of the magnetocrystalline anisotropy calculated using lm and lmgf for two benchmark systems. Two cases are displayed: lmgf with second-order potential functions compared with the corresponding two-centre approximation in lm, and lmgf with third-order potential functions compared with the full threecentre lm calculation. The agreement in both cases for FePt [panel (a)] is very good, while for the (Fe 1−x Co x ) 2 B alloy in the virtual crystal approximation it is essentially perfect.

Fully Relativistic LMTO-ASA
We have developed a fully relativistic extension of the LMTO-ASA code within a relativistic generalisation of the density functional formalism [38,39,40,41,42]. In the most general case, one needs to solve the Kohn-Sham Dirac equation with where Here, σ is the vector of Pauli matrices, p is the momentum operator, B eff (r) is an effective spindependent potential acting on electrons. It should be noted that this is a simplified form of the relativistic Kohn-Sham equation in which the orbital contribution to the 4-component relativistic current is neglected. This simplification is necessary in order to avoid the significant formulaic and computational complications that arise in the relativistic current density formulation of the density functional theory [42].
For a spherically symmetric potential V (r) = 1/2[V ↑ (r) + V ↓ (r)] inside a single MT sphere, the direction of the magnetic field can be assumed to point along the z direction, . Then, Eq. (83) can be written as The solutions of the Kohn-Sham Dirac equation (85) are linear combinations of bispinors: Here, Ω κµ (r) are the spin spherical harmonics, µ is the projection of the total angular momentum, and κ is the relativistic quantum number, κ 2 = J(J + 1) + 1/4. The radial amplitudes g κµ (ε, r) and f κµ (ε, r) satisfy the following set of coupled differential equations: where In the general case, in the presence of a magnetic field, one has to solve a system of two infinite sets of mutually coupled differential equations because the magnetic field couples radial amplitudes with different relativistic quantum numbers κ. Specifically, states with κ 1 to those with κ 2 = κ 1 and κ 2 = −κ 1 − 1, i.e., states of the same , but also states with κ 1 to those with κ 2 = 1 − κ 1 , i.e., state with different 's, ∆ = ±2. To avoid this complication, this coupling is neglected. In such case, the set blocks into coupled equations for each pair µ. For |µ| = + 1/2, there is no coupling, so similar to the non-relativistic case, there is only regular solution with quantum numbers κµ, while for |µ| < + 1/2, we need to solve the set of four coupled equations (88) for the four unknown radial functions g κ1µ , f κ1µ , g κ2µ , and f κ1µ . The coefficients u, u , and u in Eq. (88) result from matrix elements of the type κµ|σ z |κ µ [42].
Physically, the neglected coupling corresponds to a magnetic spin-orbit interaction given by a term 2c −2 r −1 dB/drL · S in the weak relativistic domain [43]. Since this is proportional to the product of two small quantities (c −2 and dB/dr) its omission is justified in most cases.
The construction of the MT orbitals and the corresponding boundary conditions proceeds along the same principles as in the non-relativistic case described in the previous sections. The tail cancellation condition is conveniently formulated in terms of relativistic extensions of the potential and normalisation functions: The logarithmic derivative matrix being given in terms of the small and large components of the radial amplitude: Note that the indices in this matrix have different physical meaning. Although the values of index α are numerically equal to those of κ, index α stands for different behaviour of the radial function at the origin [41, 42] while κ stands for solutions with different quantum states. Therefore, these matrices are not symmetric and do not commute with each other.
The linearisation can be formulated in a matrix form too [in the following we will leave out the subblock index L; all presented matrices have the form (93)]. Within second-order approximation, the radial amplitudes are expanded around a linearisation energy: Then, a symmetric matrix form of the linearisation of the logarithmic derivative can be written as where By direct substitution of (96) into (91) we obtain the parameterisation of the potential function: or equivalently where and For an arbitrary representation α: which is the relativistic analogue of Eq. (48). In general, the matrices V , R, Q, C, γ and W are nondiagonal, some are symmetric and some are not, so they do not all commute with each other. This is related to the different physical origin of the matrix indices discussed above. The parameter W is the relativistic analogue of √ ∆ introduced earlier [see Eq. (33)], and Eq. (98) is analogous to the nonrelativistic Eq. (32). We should note that in the code, we have also included a third-order parameterisation of the potential function similar to the non-relativistic case, Eq. (35).
The physical Green function and Hamiltonian are written in terms of the potential function and scalar relativistic structure constants matrices after a transformation of the former from κµ to mm s basis: where these are the relativistic analogues of Eq (58) and the scattering path operator. Within the framework of the TB-LMTO and principal layer approach, the fully relativistic version of the green function for layered geometry is constructed straightforwardly from the site diagonal fully relativistic potential function [42].

Coherent Potential Approximation
The coherent potential approximation (CPA) is a Green's function-based method used to describe the electronic structure of disordered substitutional alloys. Questaal lmgf code implements the CPA in the Atomic Spheres Approximation, following the formulation of Refs. [44,42]. Any lattice site i can be occupied by any number of components a with probabilities (concentrations) c a i , which must be supplied by the user. These components can have different atomic sphere radii, and each has its own charge density, atomic potential, and diagonal matrix of potential functions P a i ( ). Each site with substitutional disorder ("CPA site") is also assigned a coherent potential matrix P i ( ) that has the same orbital structure as P a i ( ) but is off-diagonal with the restriction that it must be invariant under its site's point group. The elements of the P i ( ) matrix are complex even if is real, and it is fixed by the CPA self-consistency condition.
The configurational average of the scattering path operator is given bȳ where the site-diagonal matrix P( ) absorbs the coherent potential matrices for the CPA sites and the diagonal matrices of the conventional potential functions for the sites that are occupied deterministically ("non-CPA sites"). The matrix in Eq. (107) is integrated over the Brillouin zone in the usual way, and its site-diagonal blocksḡ ii are extracted. For non-CPA sites, the full Green's function, density matrix, and the charge density are obtained fromḡ ii in the usual way. For each CPA site i, we define the scattering path operatorḡ a i (separately for each component a), which is the statistical average under the restriction that site i is occupied deterministically by component a, while all other sites in the infinite crystal are occupied statistically, according to their average concentrations. Such quantities are called "conditionally averaged." The charge density for the component a on site i is obtained from the site-diagonal blockḡ a ii on site i of the conditionally averagedḡ a i . This site-diagonal block can be found from the matrix equation on that site: and the CPA self-consistency condition reads Iteration to self-consistency is facilitated [44,42] by introducing the coherent interactor matrix Ω i , for each CPA site, defined throughḡ ii = (P i − Ω i ) −1 . The conditionally averagedḡ a ii is thenḡ a ii = (P a i − Ω i ) −1 . The latter two equations can be used to re-express the self-consistency condition (109) in terms of P i , Ω i , and P a i without an explicit reference to the scattering path operator: The Ω i matrices (for each required complex energy point) are converged to self-consistency using the following procedure. At the beginning of the CPA iteration, Eq. (110) is used to obtain P i for each CPA site from Ω i , and then P i is inserted in Eq. (107). After integration over k, the site-diagonal blockḡ ii is extracted for each CPA sites and used to obtain the next approximation for ii , closing the self-consistency loop. The output Ω i matrices are linearly mixed with their input values. The mixing parameter can usually be set to 1 for energy points that are not too close to the real axis. The CPA loop is repeated until the Ω i matrices converge to the desired tolerance, after which a charge iteration is performed. At the beginning of the calculation, the Ω i matrices are initialised to zero unless they have already been stored on disk. The CPA loop can sometimes converge to an unphysical symmetry-breaking solution; this can usually be avoided by symmetrising the coherent potentials using the full space group of the crystal. Fig. 2.18 illustrates one recent application of Questaal's implementation of the CPA. A number of new, potentially high-impact technologies, Josephson MRAM (JMRAM) in particular, performs write operations by rotating a patterned magnetic bit. These operations consume a significant portion of a system's power. One way to reduce the power consumed by write operations is to substitute materials with smaller saturation magnetisation M s . The minimum usable M s is limited by the need to maintain large enough energy barrier to prevent data loss due to thermal fluctuations. Low M s can greatly reduce power particularly for JMRAM [47], which operates at around 4 K. One way to reduce M s in permalloy (Ni 80 Fe 20 alloys, the most commonly used JMRAM material), is to admix Cr into it, as Cr aligns antiferromagnetically to Fe and Ni. Fig. 2.18 shows some results adapted from a recent joint experimental and theoretical study of Py 1−x Cr x , [45]. The CPA calculations of M s track measured values fairly well. Also shown is the CPA energy band structure for the majority spin. Alloy scattering causes bands to broaden out, and scattering lifetimes can be extracted. A detailed account can be found in Vishina's PhD the-sis [46].

Noncollinear Magnetism
The ASA codes lm, lmgf and lmpg are fully noncollinear in a rigid-spin framework, meaning that the spin quantisation axis is fixed within a sphere, but each sphere can have its own axis. The formulation is a straightforward generalisation of the nonmagnetic case. Potential and Hamiltonian-like objects become 2×2 matrices in spin space. The structure matrix S is diagonal matrix in this space and independent of spin, while potential functions P and potential parameters of Sec. 2.8.1, become 2×2 matrices that are diagonal in but off-diagonal in spin. (Alternatively, a local spin quantisation axis can be defined which makes P diagonal in spin; then S is no longer diagonal.) How Questaal constructs the Hamiltonian in the general noncollinear case, and also for spin spirals, is discussed in Ref. [48].  These codes have implemented spin dynamics [11], integrating the Landau-Lifshitz equation using a solver from Bulgac and D. Kusnezov [50], the spin analogue of molecular dynamics and molecular statics. The formulation is described in detail in Ref. [48]. Codes also implement spin statics.
Torques needed for both are obtained in DFT from the off-diagonal parts of the spin-density matrix. A classic application is the study of INVAR. Fe-Ni alloys with a Ni concentration around 35 atomic % (INVAR) exhibit anomalously low, almost zero, thermal expansion over a considerable temperature range. In Ref. [49] it was shown that at 35% composition, Fe-Ni alloys are on the cusp of a collinearnoncollinear transition, and this adds a negative contribution to the Grüneisen parameter. Fig. 2.19 is reproduced from that paper.

Full Potential Implementation
Questaal's primary code lmf is an augmentedwave implementation of DFT without shape approximations. It also handles the one-body part of the quasiparticle self-consistent GW approximation, by adding the (quasiparticlised) GW selfenergy to the DFT part. Closely related codes are lmfgwd, a driver supplying input for the GW, lmfdmft, a driver supplying input for a DMFT solver, and lmfgws, a post-processing tool that generates quantities using the one-body part from lmf and the dynamical self-energy from GW or DMFT. This code uses different definitions for classical functions (see Appendix), and we shift to those definitions in what follows.
As for the DFT part, Questaal's unique features are: • it uses a three-component augmentation, following most of the original method of Ref. [30]. The augmentation is reviewed in Sec. 3.6.
Ref. [51] offers a slightly different presentation and shows how it is connected with the PAW method [4].
• it has a more general basis set. Ordinary Hankel functions solve Schrödinger's equation for a MT potential, but real potentials vary smoothly into the interstitial (Fig. 10). The envelope function of a minimal basis set must adapt to this potential. The traditional Questaal basis uses smooth Hankel functions [52], which may be thought of as a convolution of a Gaussian function and a traditional Hankel function; they are developed in Sec. 3.1. This traditional basis works very well for most systems; however when the system is very open, it is slightly incomplete (Sec. 3.13). One way to surmount the incompleteness is to combine smooth Hankel functions with plane waves; this is the 'Planar Muffin Tin" (PMT) basis [53] (see also Ref. [51]). While it would seem appealing, PMT suffer from two serious drawbacks: first, it tends to become over-complete even with a relatively small number of plane waves, and second, it is not compact (minimal, short ranged and complete as possible for the relevant energy window).
• Questaal's most recent development is the "Jigsaw Puzzle Orbital" (JPO) basis, which uses information from the augmentation to construct an optimal shape for the envelope functions. To the best of our knowledge, JPO's are the closest practical realization of compactness. This is particularly important in manybody treatments where the efficacy of a theory hinges critically on compactness. This new basis will be presented more fully elsewhere. In Sec. 3.12 we show how a transformation to a tight-binding representation can be carried out in a full-potential framework. In the present version the basis set is merely a unitary transformation of the original one.

Smooth Hankel Functions
In his PhD dissertation Michael Methfessel introduced a class of functions H kL (E, r s , r). As limiting cases they encompass both ordinary Hankel functions and Gaussian functions (see Eq. (125) and (127) below). They are explained in detail in Ref. [52]; here we present enough information for development of the Questaal basis set. They are all connected to the smooth Hankel function for =0. Its Fourier transform is which is a product of Fourier transforms of ordinary Hankel function for =0 with a Gaussian function of width 2/r s .
We use standard definitions of Fourier transforms Thus in real space, h 0 (E, r s ; r) is a convolution of a ordinary Hankel function and a Gaussian function. It has an analytic representation For small r, h 0 behaves as a Gaussian and it evolves smoothly into an ordinary Hankel when r r s . Questaal's envelope functions are generalised Hankel functions H L (E, r s ; r). Their Fourier representation has a closed form Y L (r) is an ordinary spherical harmonic scaled by r (Appendix A). Following the usual rules of Fourier transforms this function has a real-space representation so is meaningful to talk about Y L (−∇). The extended family H kL (r) is defined through powers of the Laplacian operator: A recursion relation for H L (E, r s ; r) for >0 can be derived from properties of Fourier transforms in spherical coordinates. Any function whose Fourier transform factors aŝ has a real-space form where f andf are related by [52] f (r) = 4πi and j is the spherical Bessel function. We can therefore express H L in a Slater-Koster form For envelope functions used in basis sets, e.g. H L (E, r s ; r), the radial portion of the Fourier transform does not depend on ; therefore corresponding real-space part depends on only through the j . This is a very useful fact. In particular it is possible to derive a recurrence relation to obtain χ for >1; this provides an efficient scheme for calculating them. The recurrence is derived in Ref. [52] (see Eq. 6.21): χ 0 is written in Eq. (113); and By taking limiting cases we can see the connection with familiar functions, and also the significance of parameters E and r s .
This is the Fourier transform of H 0 (E, 0; r) = Y 00 (r) exp(−κr)/r, and is proportional to the . This is the Fourier transform of a Gaussian function of width r s . For general L we can define the family as Evidently H L (q) is proportional to the product of the Fourier transforms of a conventional spherical Hankel function of the first kind, and a Gaussian. By the convolution theorem, H L (r) is a convolution of a Hankel function and a Gaussian. For r r s , H L (r) behaves as a Hankel function and asymptotically tends to . For r r s it has structure of a Gaussian; it is therefore analytic and regular at the origin, varying as r Y L (r). Thus, the r − −1 singularity of the Hankel function is smoothed out, with r s determining the radius for transition from Gaussian-like to Hankellike behaviour. Thus, the smoothing radius r s determines the smoothness of H L , and also the width of G L .
By analogy with Eq. (120) we can extend the G L family with the Laplacian operator: Eq. (130) shows that G kL has the structure (polynomial of order k in r 2 )×G L . Specifically [52] G kL (E, r) = 2 k+ (2k + 2 + 1)!! r 2k+ and Laguerre polynomials [52], which have the following orthogonality relation and as a consequence the G kL are orthogonal in the following sense: This can be also written as where We will need this relation for one-centre expansion of the H L around remote sites (Sec. 3.6) since the simple expansion theorem Eq. (1) does not apply to them. Comparing the last form Eq. (131) to Eq. (120) and the definition of H kL Eq. (120), we obtain the useful relations H k+1,L (E, r s ; r)+EH kL (E, r s ; r) This shows that H kL is the solution to the Helmholtz operator −∇ 2 +E in response to a source term smeared out in the form of a Gaussian. A conventional Hankel function is the potential from a point multipole at the origin (see Eq. 6.14 in Ref. [52]): smearing out the singularity makes H L regular at the origin, varying as r for small r instead of r − −1 . H kL is also the solution to the Schrödinger equation for a potential that has an approximately Gaussian dependence on r (Ref. [52], Eq. 6.30).

Gradients of Smooth Hankel Functions
Gradients of the H L are needed in several contexts, e.g. for forces and for matrix elements of the momentum operator, which enters into the dielectric function and velocity operator. Also, the energy derivative is needed in several contexts, e.g. for integration of some matrix elements (Sec. D).
The form of Eq. (117) suggests that gradient and position operators acting on H return functions of the same family. This turns out to be the case, and it makes possible matrix elements of these operators with two such functions. Consider the energy derivative first.
In real space, the energy derivative is most easily derived from an integral representation of h (Ref. [52], Eq. A5).
It is readily seen that Noting that the three components of the real harmonic polynomials Y 1m (r), Eq. (A.6) for =1, are (y,z,x), the gradient operator is simply with the appropriate permutation of p. Using Eq. (118), The operator product Y L (−∇)Y K (−∇) can be expanded in the same manner as the product of polynomials Y L (r)Y K (r) (Appendix A). More generally, for a function of the form Eq. (122), it is possible to show that The gradient operator maps Y ,m into a linear combination of at most two Y +1,m i and two Y −1,m i , with m i given in the following table and Table 2: The position operator (needed when the effective one-electron potential is nonlocal) also has the form Eq. (142), but with radial functions f (±) →rf (r). Note the close similarity with the gradient operator and in particular Functions of type Eq. (122) have the Slater-Koster form with another special property, namely that the Fourier transformf (q), Eq. (121), is independent of , from which it follows that f (r) satisfies the following differential equations (Ref. [52], Eq. 4.7 and 4.20) The H kL (r) family is of this type, and moreover, ∇ 2 H kL (r) merely maps H kL to another member of the family (Eq. (120)). Another useful instance is the real harmonic polynomials Y L (Eq. (A.6)).
Explicit forms for f (+) and f (−) for these functions are given in the table below.
Same as Y L In general, e.g. for partial waves φ inside augmentation spheres, f (+) and f (−) must be determined by numerical differentiation. For the pair of functions {r , r − −1 }, f (+) (f (−) ) for the first (second) function vanishes. Any function can thus be matched at some r to this pair (Eq. 12), which determines the projection of its gradient onto Y ∓1m .

Two-centre Integrals of Smoothed Hankels
One extremely useful property of the H kL is that the product of two of them, centred at different sites R 1 and R 2 , can be integrated in closed form.
The result a sum of other H kL , evaluated at the connecting vector R 1 − R 2 . This follows from the power theorem of Fourier transforms and the fact that H * k1L1 (q) H k2L2 (q) can be expressed as a linear combination of other H kL (q), or their energy derivatives. Derivations of these and related integrals are taken up in Appendix D.

Smoothed Hankels of Positive Energy
The smooth Hankel functions defined in (Ref. [52]) for negative energy also apply for positive energy. We demonstrate that here, and show that the difference between the conventional and smooth Hankel functions are real functions.
To extend the definition to any energy we define U ± as: The following relations are useful: Then for ε < 0, iκ is real and are also real. The difference in unsmoothed and smoothed Hankels is for = 0, −1 For ε > 0, κ is real and U + = U * − . Then the difference in unsmoothed and smoothed Hankels for = 0, −1: are real, though h 0 and h −1 are complex.

One-Centre Expansion
Ordinary Hankel functions have a one-centre expansion in terms of Bessel functions, Eq. (1): the shape of the radial function does not change as the vector R connecting source and field point changes, only the expansion coefficients S. This is also true for a plane wave. The H are more complicated: the r-dependence depend on R, tending to Bessel functions only when |R| r s . This complicates the one-centre of H. They can, however, be expanded in the polynomials P kL (Eq. (135)) using the biorthogonality relation Eq. (134). Any smooth function F its one-centre expansion can be written in a one-centre expansion where Expressions for integrals G kL with the H are written in Sec. 3.3.
Eq. (145) expands H L , which generalises Eq. (1) which expands H L (plane waves can similarly be expanded in Bessel functions). Eq. (145) is more cumbersome, and it introduces another cutoff k max . These are drawbacks of this method, but it also brings with it a significant advantage: envelope functions of very different types can be constructed. This adds a considerable amount of flexibility to the method, as smooth Hankels H L and plane waves can be combined in a uniform manner; this is the PMT basis [53]. Perhaps more important, basis functions can be tailored to the potential. This enables them to be very accurate whilst staying minimal. This is the JPO basis (Sec. 3.12).
If r s becomes small G kL becomes sharply peaked and the polynomial expansion becomes a Taylor expansion about the origin. But by allowing r s to be a significant fraction of the MT radius s R , the error gets distributed over the entire sphere and k max can be set relatively low. Note that r s used for this expansion need not be (and in practice is not) the same as r s used to make the envelope functions.

Three-component Augmentation
As noted in the introduction, the pseudopotential method shares many features in common with augmentation. Blöchl's PAW method [4] makes the connection explicit. Questaal starts from an envelope function F (0) RLi (r) that extends everywhere in space, augmented as follows: RLi (r) (146) Index RLi labels the envelope function: R and L mark the channel where functions are centred, and the angular momentum, while i marks the kind of envelope function at R. General envelope functions (e.g. plane waves) need not be atom-centred, for such functions we adopt the labelling convention R=L=0. Unlike the ASA, there is typically more than radial envelope function per R and L. (We will sometimes label F with a single Roman index, e.g. using i to implicitly refer to the entire label RLi.), F (1) is some linear combination of partial waves matching value and slope of F (0) (r), and F (2) (r) is the one-centre expansion of F (0) (r). F (0) , and F (2) are truncated at some max .
Conventional augmented methods construct matrix elements of a local or semilocal operator X straightforwardly as Evaluating the cross terms is unwieldy and for that reason conventional augmentation methods require F i be expanded to very high L. However, it has been observed independently by several authors, the first being Soler and Williams [54] that the integrand can be approximated in the following threecomponent form: Note the minus sign in the last term. The missing terms may be written as Write the L projection of F as P L F . By construction P L F (0) = P L F (2) so if the augmentation is carried out to L→∞, F (0) and F (2) are identical and the missing terms vanishes identically. This form makes it clear that not only does Eq. (147) converge to the exact result in the limit max →∞, but also that the error converges much more rapidly in max than does conventional augmentation.
This occurs for two reasons that work synergistically. The operator X is in practice nearly spherical, coupling parts of F with unequal L only weakly. Consider the one-centre expansion of a particular L of one of these components, and consider for simplicity the unit overlap operator. Near the augmentation boundary all components have the same value and slope by construction, and the ( -projected) F (0) −F (2) or F (1) −F (2) vanishes to second order in r−s. Second, the ( -projected) F varies as r for large for all three components of F , because the angular momentum becomes the dominant contribution to the differential operator. Thus for large the product of two projected functions varies as r 2 for all components, which is heavily weighted to the outer parts of the sphere, in the region where the projections of differences F (0) −F (2) and are then a good approximation to the j . By truncating both local components, only the (0) component remains, which as we have seen accurately represents the (exact) basis function, especially near s R where it is dominant. In other words, the projections of the envelope functions contain projections from max +1 to ∞, and do so with high accuracy starting at a relatively low max +1; indeed convergence is very rapid as shown in Fig. 11 (see also Fig. 3 of Ref. [30]). This explains why the pseudopotential and PAW approximations can be truncated at much lower than the conventional augmented wave methods. Indeed, the present approach has much in common with PAW, but does not make approximations or involve pseudo-partial waves or projector functions.

Secular Matrix
To construct the secular matrix, we need matrix elements of Hamiltonian and overlap: At this stage we need not specify what χ i is, except to note that it has an envelope part χ (0) i that extends everywhere in space, augmented in spheres taking the form Eq. (146). Specifically kL are the coefficients (Eq. (145)) that expand the smooth envelope in χ as polynomials P kL (r) (Eq. (135)) and where The local kinetic energy matrix τ kk l is symmetric, even while the individual terms in Eq. (152) are not, because the integral is confined to a sphere. However the surface terms from the true and smooth parts cancel becauseP kL and P kL match in value and slope there.
Note also that if φ andφ are kept frozen, e.g. computed from a free atom as we might do to mimic a PAW or a pseudopotential method, τ kk and σ kk are independent of environment. Though we won't pursue it here, it suggests a path to constructing a unique and transferable pseudopotential.
Matrix elements of the potential are more complicated and will be discussed in the next section. For now we take them as given, along with the overlap and kinetic energy. We can diagonalise the secular matrix and obtain eigenfunctions and the output density as a bi-linear combination of basis functions where the sum runs over the occupied eigenstates with w n the occupation probability. The bi-linear product may be written in a three-component form The first term yields a smooth n 0 which extends everywhere, and then two local terms, true and onecentre expansion of n 0 to some L cutoff As pointed out before, we expect the higher L components be carried accurately by n val 0 even for low max , which provides a simple justification for the pseudopotential and PAW approximations.

Core
The core levels and core density can be computed either scalar-relativistically, or fully relativistically, from the Dirac equation (Sec. 2.17). The core density is constructed in one of two ways: Selfconsistent core: a core partial wave φ c is integrated in the true potential of subject to the boundary condition φ c (s R ) = φ c (s R ) = 0. It was once thought that the advantage of determining the core in the true potential outweighed the inaccuracy originating from the artificial boundary condition at s R . Frozen overlapping core approximation: φ c is computed in the free atom and kept frozen-experience shows this to be a better approximation. We develop this case here.
The core density has three components, which add to the valence density: where The second term is a smooth Hankel function with C H fixed to fit the spill-out of the core density into the interstitial. It is important that this be taken into account when constructing the total energy and potential. The alternative, to renormalise the core and confine it to the sphere, is a much cruder approximation. This makes it possible for shallow cores to be treated accurately, without needing to include them in the secular matrix. The pseudocores g cn R inñ 0 andñ 2R nearly cancel, but they need to be included to construct V eff and the total energy described below. Also the two kinds of g cn R do not exactly cancel because it extends into the interstitial in the former case, but is truncated in the latter.
Smoothing radius r g must be small, so the core density is almost completely confined to the sphere. In practice we use the same Gaussians as for g val R entering into the valence multipole moments (see Eq. 159 below). In that way g cn R and g val R can be merged together. The Hankel smoothing radius r h is chosen independently, but it is also typically small, of order s R /2. The result should depend minimally on the choice of either.

Local Orbitals
Inside augmentation spheres, the usual Hilbert space consists of partial waves {φ ,φ }. Questaal has the ability to extend this space by one local orbital (LO), to {φ (ε ν , r),φ (ε ν , r), φ z (ε z , r)}. φ z consists of a partial wave evaluated at some energy ε z far above or far below the linearisation energy ε ν for φ . Addition of a LO greatly extends the energy window over which the band structure is valid; see for example Fig. of Ref. [55], where the band structure for Si was compared to LAPW, and to a full APW. Properties of Questaal's LO are explained in that reference in a GW context; here we outline the main features.
LO have one each of the following attributes: • Either core-like or high in energy, with principal quantum number ∓ 1 that of valence φ , respectively.
• Either a conventional LO, which consists φ z with some φ andφ admixed to make the value and slope vanish at s R [56], or an extended LO, which consists of φ z with a smooth Hankel function tail that extends continuously and differentiably into the interstitial. The extended LO can be applied only to the core-like case.
• Either a scalar relativistic LO, or a Dirac LO with µ=1/2. This is not usually important, but it can have consequences for heavy elements; see how it affects LDA and GW gaps in Sec. 3.9.
High-lying states are not usually important for DFT, because states above E F do not contribute to the potential. ores of order E F −2 Ry and deeper can usually be treated quite adequately as core levels integrated independently of the secular matrix (Sec. 3.7.2). There are mild exceptions: for high resolution such as needed for the Delta Codes project (Sec. 3.13), both deep and high-lying LO were used (in the latter case, particularly adding 4d LO to the 3d transition metals).
GW is more sensitive to states far from E F than DFT. Including low-lying cores at E F −2 Ry or even somewhat deeper with LO can modify valence QP levels by ∼0.1 eV. High-lying states are also more important in the GW context. ZnO is an extreme case where they were found to shift the gap by several tenths of an eV [57].

Effective potential
We seek a three-component form of the total energy functional. Its variational derivative with respect to the density yields an effective potential V eff , which should give the first-order variation of the electrostatic and exchange-correlation energy with respect to the trial density. The density has three components, the first interacts with a smooth potentialṼ 0 , the second with the local true potential V 1R , and the third with the local projection of the potential V 2R . Correspondingly, the accumulated sums over the first, second, and third terms produce n out 0 , n out 1R , and n out 2R , respectively. Construction of V eff is complicated by the fact that the electrostatic potential at some point depends on the density everywhere. Therefore it is not possible to construct the potential in a sphere solely from the density inside this sphere, or the smooth potential from n val 0 alone. We can solve this difficulty by defining a pseudodensityñ val 0 which has the same multipole moments inside augmentation sphere R as the true density n val 1R . Suppose the local sphere density n val 1R − n val 2R has multipole moments q M , where M = angular momentum. Then where G RM is a Gaussian of angular momentum M centred at R, with unit moment. G M must be sufficiently localised inside s R that the potential from the second term at s R is negligibly different from the potential of a unit point multipole at R (typically s R /4). Results should not depend on r g , but in practice if it is made too small, numerical integration of the G become inaccurate, while if too large charge is not confined to s R . Provided it is confined, the compensating Gaussians ensure that the electrostatic potential V es [ñ val 0 ] is exact in the interstitial.
The local representation n val 2R of the smoothed density must be equally compensated The q RM can be decomposed into a sum of partial moments By decomposing q RM this way, we can make any possible variation in the local density in the Hilbert space spanned by Eq. 155, and obtain the potential from the electrostatic corresponding to the variation. Matrix elements of the local potential are determined from variation of the electrostatic energy (Sec. 3.7.4) The true potential V 1R is the response to true partial densityP RkLPRk L , andṼ 2R the response to the smooth partial density P RkL P Rk L + Q Rkk LL M G RM . These two partial densities have the same multipole moments by construction.
We solve the Poisson equation for each of the three components All density components must include core (or pseudocore) contributions (Sec. 3.7.2). Poisson's equation forṼ es 0 is most easily solved by fast Fourier transformingñ 0 to reciprocal space, scaling each Fourier component n G by −8π/G 2 and transforming back. In principle the electrostatic potential can be calculated this way but in practice the compensation Gaussians inñ 0 can be fairly sharply peaked. Some parts are separated out and evaluated analytically to keep the plane wave cutoff low, as discussed in the next section.Ṽ es 0 (r) is exact in the interstitial region and extends smoothly through the spheres. V es 1R andṼ es 2R can be solved up to an arbitrary homogeneous solution which is determined by the boundary condition at s R . This adds a term proportional to r m Y M (r), which is determined by resolvingṼ es 0 into a one-centre expansion at s R .
PseudopotentialsṼ es 0 andṼ es 2R approximately cancel each other, but not exactly. This is in part because the latter has a finite L cutoff, and in part becauseṼ es 0 carries the tails of the core densities spilling into the interstitial. These incomplete cancellations are the key to rapid convergence of this method, and also allow relatively shallow cores not to be included in the secular matrix, with minimal loss in accuracy.

Three Component Total Energy in DFT
The total energy in DFT is comprised of the kinetic energy, electrostatic energy, and exchangecorrelation energy Questaal implements two functionals for E tot . The traditional Kohn-Sham functional is evaluated at the output density: U and E xc are evaluated from n out generated by the secular matrix, and T HKS = occ i ψ i | − ∇ 2 |ψ i + T core . T core = core i − n core V eff is the kinetic energy of the core. If the core is kept frozen, it can be evaluated from the free atom.
Harris [58] and independently Foulkes and Haydock [59] showed that the kinetic energy can be expressed in terms of n in : U =U [n in ] and E xc =E xc [n in ] and Both E HKS and E HF are calculated, though usually the latter is preferred because it converges more rapidly with deviations from self-consistency. It is nevertheless useful to have both energies, because they are calculated differently in Questaal.
U and E xc are computed from different densities, but more, T HKS and T HF are calculated in different ways. It is not trivial that at (nominal) self-consistency E HKS =E HF . The two always differ slightly; when they differ more than a small amount, it is an indication that some parameter (e.g. plane-wave cutoff) is set too low. Thus, E HKS −E HF is one measure of the convergence of the basis representing the charge density. In Questaal's three-component framework, nV eff a sum of three terms bilinear in the three componentsñ 0Ṽ es 0 , n 1R V es 1R andñ 2RṼ es 2R . T , U and E xc each have independent contributions from three components, for example the contribution to the electrostatic energy from the valence electrons is The last term cancels most of first inside the augmentation spheres, except the first retains L components above max to all orders. This is becausẽ V 2R should be a one-centre expansion ofṼ 0 up to max , andñ val 2R should be a one-centre expansion ofñ val 0 . Because V es [ñ val 0 ] is exact in the interstitial, including at s R , and it continued inside s R by adding by V es 1R d 3 r −Ṽ es 2R , the true potential is correct everywhere. Note that the derivative of E es wrt coefficients C (i) Rk L in Eq. 155 yield the local potentials Eq. 163, so that the effective one-body Hamiltonian corresponds to the functional derivative of the energy.
The total electrostatic energy must include the core. Write it analogously to Eq. 167 but add the (pseudo) core and nuclear density to n val (Sec. 3.7.2) The first term is the electrostatic energy of the smooth density, evaluated on a uniform mesh. It involves a valence and pseudo-(core+nucleus) term Besides being needed for U c+n U R andŨ R are readily integrated in the sphere together with the valence parts in Eq. 167.
The exchange-correlation potential is made analogously with the electrostatic potential: It is also divided into valence + core contributions. We reiterate that differences n 1R V es 1R d 3 r − n 2RṼ es 2R and n 1R xc (n 1R ) − n 2R xc (n 2R ) converge much more rapidly with -cutoff than either term separately, and as a consequence the threecomponent augmentation is much more efficient than the standard one.

Forces
Our original formulation [30] derived a simple expression for the derivative of H HF when a nucleus changes from R to R+δR, assuming that the partial waves φ R shift rigidly with δR. (lmf has a "frozen φ" switch, which if imposed, guarantees φ R is rigid. In practice we have found that the differences are small.) The original expression yielded forces that converge only linearly with deviations ∆n=n out −n in from self-consistency. It is possible to achieve faster convergence in ∆n, by adding a correction δV ∆n, where δV is determined from a change δn in in n in , but there is no unambiguous way to determine δn in . If the static susceptibility (Sec. 6) were calculated, a good estimate could be made. But χ(r, r , 0) is a second derivative of H HF : it is far more cumbersome to make than H HF itself, and approximation of χ by models such as the Lindhard function worked less well than hoped. A practicable, and simple alternate ansatz is to assume that δn in is given by a change in the Mattheis density construction (superposition of free-atomic densities), arising δR. This has worked very well in practice: it dramatically improves the convergence of the forces with iterations to self-consistency. Here we omit the derivation but refer to the original paper [30], and also to Appendix B of Ref. [51] for an alternate derivation including the correction term. The final expression for shift of nucleus R from R→R+δR is V 0 is the sum of the smooth electrostatic and exchange-correlation potential. This term describes the force of the smooth density on the Gaussian lumps which represent the core and the nucleus at each site, and is the analogue of the Helman-Feynman theorem. The eigenvalue shifts account for a change in the atom-centred basis set in the shift in R. This the price for taking a tailored basis, but one whose Hilbert space changes when R shifts. Here the smooth mesh potential is fixed and augmentation matrices τ , σ, and π (Eqns. 152,153,163) shift rigidly with the nucleus. Gradients only include redistribution of plane waves owing to the shifts in envelope functions, and in the expansion coefficients C The last term is the correction noted above. In future, it will probably be replaced by an efficient calculation of χ, the Mattheis-shift ansatz works rather well. Fig. 12 shows the force F Ox on an O neighbouring Co substituted for Ti in a 12-atom unit cell of TiO 2 , as lmf iterates to self-consistency. The deviation ∆F from the self-consistent force is vastly smaller with the correction term (compare solid and dashed lines), and F Ox is reasonable already for the Mattheis construction (first iteration). The inset compares the two kinds of ∆F with iteration to self-consistency, to show that without and with the correction, F Ox ∼∆ 2 Q and F Ox ∼(∆ 2 Q) 2 , the latter being the same rate of convergence as the energy itself.

LDA+U
The LDA+U method, introduced originally by Anisimov et al. [60,61,62] for open shell d-or fshell materials including on-site Coulomb and exchange interactions by means of the Hubbard model is implemented in both the lm and the lmf codes following the rotationally invariant approach described in [63]. Rotational invariance means that the occupations of the different d m orbitals are specified by a density matrix n σ = n σ mm independent of the specific choice of Cartesian axes defining the spherical harmonics. For brevity we refer to the orbitals on which U is applied as the d orbitals although the code is written sufficiently general that U terms can be applied to any nl of choice and even on multiple sets of orbitals.
The on-site Coulomb interactions are added to the LSDA total energy and a double-counting term is subtracted.
Several schemes for the double counting have been proposed [60,61,63,64,65] and are implemented in lmf but the prevailing approach is the fully localised limit (FLL) in which the double-counting term and U -terms cancel for the fully localised (atomic) limit in which the density matrix becomes a diagonal matrix with integer occupations. This means that in principle the total energy is already well described in the LSDA in the atomic limit but the orbital energies are not. The task of the LSDA+U approach is to optimise the density matrix when the orbitals on which U is applied are allowed to hybridise with the other orbitals in the system. It hence describes strictly speaking the orbital polarisation in the system. Note that the total energy is then separately a functional of the spin-dependent electron density and the local density matrix on the orbitals for which U terms are added. The density matrix or orbital occupations also influence the spin-charge densities and so the two contributions are not really independent. The U terms in LSDA+U are treated in the static mean-field Hartree-Fock approximation in contrast to the DMFT method where they are treated dynamically. The U part of the total energy is then written where {m} = m, m , m , m and V ee is the screened Coulomb interaction. The screening is taken into account by the choice of the U and J parameters describe below. The double counting term is where n σ = Tr(n σ ) and n = n ↑ + n ↓ . U and J are the screened Coulomb and exchange parameters, which can for example, be estimated by a separate self-consistent supercell calculation treating the atoms in which the d-orbital occupation is changed as an impurity problem [66]. or by means of the linear response approach [67]. Most often they are treated empirically and U is adjusted to spectroscopic splittings, for example. The J are often estimated from the unscreened atomic F k Slater integrals discussed below. The bare Coulomb interaction matrix elements can be evaluated exactly in terms of the Slater F k (ll) integrals and combinations of Gaunt coefficients as worked out for instance in Condon and Shortley [68] m, m |V ee |m , m = k a k (m, m , m , m )F k (ll) and F k (ll) = r 2 1 dr 1 r 2 2 dr 2 R (r 1 ) 2 R (r 2 ) 2 (r k < /r k+1 > ) where r < , r > are the smaller and larger or r 1 and r 2 . One could easily calculate these Slater integrals in terms of the partial waves inside the spheres. However, this would then not take into account the screening of the Coulomb interaction by the other orbitals in the system. Therefore it is customary to use the relation between the average U and J to the U mm and J mm , (177) given by Anisimov et al. [61] and where U mm = mm |V ee |mm , J mm = mm |V ee |m m . Using the tables of Condon and Shortley [68] one finds for d electrons 14J = F 2 (dd) + F 4 (dd) (178) and for f electrons For p-orbitals J = F 2 (pp)/5 and for s-orbitals J = 0. Examining the tabulated values of the Slater F k integrals from Hartree-Fock calculations, one finds that the ratios F 4 (dd)/F 2 (dd) ≈ 0.625 and 668 are approximately constant. Using these fixed ratios and the relation to the average exchange integral J one then can fix the F k values. The advantage of doing it this way is that the screening of the J and U can then be taken into account and there are only two empirical parameters. On the other hand one finds in practice that while U is strongly reduced from the atomic bare F 0 , the J is approximately unscreened, so one might as well use the unscreened directly calculated F k for k ≥ 2. From LDA calculated atomic wave functions, we find F 4 (dd)/F 2 (dd) = 0.658 ± 0.004 for 3d atoms, .001 for rare-earth RE 3+ . (Until recently, Questaal used a different convention; see note [69].) By minimising the LDA+U total energy with respect to the density matrix elements, one obtains a non-local potential In the ASA lm-version it is straightforward to add this nonlocal potential directly to the Hamiltonian inside the spheres because the orbitals defining the matrix V σ mm are just the single lm channel partial waves inside the sphere. It is a little more complex in lmf. Essentially, we now add the operator |φ σ lm V σ mm φ σ lm |. We add the corresponding augmentation matrix elements both for the φ andφ and for local orbitals if these are present for this channel.
The calculation within LDA+U starts from a set of initial occupation numbers of the m orbitals for each channel for which U and J parameters are provided. These define the initial density matrix. The calculation then proceeds by making this density matrix self-consistent simultaneously with making the standard spin density ρ σ (r) selfconsistent. We note that the configuration one converges to may depend on the starting density matrix and hence one should in principle consider different starting points and find the one which minimises the energy or use guidance from physical insight. In rare-earth (RE) based systems one finds for example that the lowest energy corresponds to the configuration which obeys Hund's rules [70].
We have here described only the FLL. The around-mean-field approach [64,60] and a mixture of the two [65] are also implemented and make use of the same ingredients. Recently, Keshavarz et al. [71] argued that adding U to spinunpolarised LDA, thus LDA+U may have advantages over adding them to LSDA, for example to extract inter-atomic exchange interactions independent of the reference system used (AFM or FM). This is presently not supported in the lm and lmf codes but might be useful to add in the future. Note that in Ref. [61] also a LDA+U rather LSDA+U approach was used.
A simpler version of the LDA+U is described by Dudarev et al. [72]. In that case, only a single parameter, namelyŨ = U − J comes into play. One can use this scheme in the codes by setting J = 0 and adjusting the providedŨ parameter. This means essentially that we neglect the orbital polarisation due to the anisotropic Coulomb interaction exchange terms but keep the Hartree-Fock like feature that empty states feel a different potential from occupied states. In fact, in this case the additional potential is Or if the density matrix is actually diagonal V σ m = U [ 1 2 −n σ m ]. This means that when a spin-orbital mσ on a given site is empty it is shifted up byŨ /2 and when it is filled, it is shifted down byŨ /2. This corresponds to the simplest form of the LDA+U formalism. It allows one for example to include a shift of a fully occupied d band. This is useful for example in Zn-containing systems. In LDA, the binding energy of these orbitals is underestimated. The deeper a band lies below the Fermi level, the higher its downward shift by the self-energy and this can be simply mimicked by the LDA+U method in this form. It has the effect for example of reducing the hybridisation of the Zn-3d with the valence band maximum and slightly increases the gap, affects things like band-offsets and the valence band maximum crystal field and spin-orbit induced fine splitting.
One can even use this approach for s or p electrons and thereby shift up the mostly cation-s like conduction band minimum in a semiconductor to adjust the gap. While this is of course purely empirical and shifting the gaps for the wrong reason, and hence far less accurate than the QSGW approach, it is sometimes useful in defect calculations because a defect level that would otherwise lie as a resonance in the conduction band can now become a defect level in the gap and allow one to properly control its occupation for different charge states of the defect. Using a U on p-orbitals of anions like O or N, this can also have the effect that the empty defect levels pushed out the valence band and localising on a single p orbital are pushed deeper into the gap. The essential function of the LDA+U terms in these applications is to introduce a Hartree-Fock like orbital dependence of the potential. This is important to reduce the self-interaction error of LDA or GGA and allows one in a simpler and much less expensive way to simulate what a hybrid functional would do. If one picks these shifts carefully to adjust the bulk band structure to the QSGW bands, it provides a rather efficient way to study defects with a corrected band gap and specific orbital selfinteraction, two of the main errors of LDA plaguing defect calculations. An example of this can be found in Boonchun et al. [73].

Relativistic Effects
The radial solvers all solve by default a scalar Dirac equation, which incorporates the dominant relativistic effects. All the DFT codes (lmf, lm, lmgf and lmpg) have the ability to incorporate spinorbit coupling perturbatively. In the band codes the perturbation is straightforward (See Sec. 2.8.2); the Green's function scheme is more subtle, particularly with respect to third order parameterisation, but a formulation is possible (Sec. 2.16). For lmf, the L · S term is added to the true local potential V 1R (see Sec. 3.7.4). As of this writing, only the spin diagonal part of the output density is kept in the self-consistency cycle.
Additionally the ASA codes lm, lmgf and lmpg) have a fully relativistic implementation (Sec. 2.17). The full-potential code lmf does not as yet; however it does have a facility to compute the core levels fully relativistically. It follows a method similar to that developed by Ebert [41].
As of this writing, the QSGW code allows spinorbit coupling only in a very restricted manner. Typically SOC has only a minor effect on the self-energy Σ: this is because SOC is a modification to the kinetic energy and only affects the potential in a higher order perturbation. We have found very good results by generating QSGW selfenergies leaving out SOC all together, and then as a post-processing step include it in the band structure.
Additionally, lmf has a special mode where the only diagonal parts of L·S (see Sec. 2.8.2) are added to the one-particle Hamiltonian. After diagonalisation the spin off-diagonal parts are used to modify the one-particle levels in a special kind of perturbation theory, as described in the Appendix of Ref. [74]. This has the advantage that the eigenvectors are kept spin-diagonal, which significantly reduces the cost of the GW code. Tests show that at the LDA level the modification to the band structure is very similar to that of the full L · S, and for purposes of determining the effect on Σ it should be quite reliable except in very special cases. Once a modified Σ has been found (as a one-shot correction to QSGW computed without SOC), lmf can be run with the full L·S, including the modification of Σ approximately in the manner just described. We have found the difference to be negligible except when elements are very heavy: CH 3 NH 3 PbI 3 is the most extreme case we have found so far: the effect of SO on the band structure through changes in Σ reduced the gap by 0.1 eV [74] Finally lmf has an ability to fold in approximately the effects of the proper Dirac Hamiltonian on the valence states. The difference between the full Dirac and scalar Dirac partial waves can be important for very small r, where the SOC contributions are the largest (see ξ(r) in Sec. 2.8.2). For small r, the true Dirac wave function varies as r γ , γ 2 = κ 2 − (2Z/c) 2 , whereas in the scalar Dirac case it varies as r . As a result, matrix elements ξ(r) are underestimated. The effect is typically very small, but it becomes non-negligible in heavy semiconductors such as CdTe. To ameliorate this error, lmf has the ability to use a fully relativistic partial wave in place of a standard local orbital, choosing µ = −1/2. Most important is the p 1/2 state with κ = 1. SO coupling is well described for light semiconductors such as GaAs and ZnSe (see Table 4), as Z increases γ deviates progressively from unity and the discrepancy with experiment increases. It is already significant for CdTe, and in PbTe the effect exceeds 0.1 eV. Replacing the scalar Dirac LO with the Dirac p 1/2 LO significantly improves this error in CdTe. In PbTe, the valence band edges are both at L: the two band edges have symmetry L + 6 and L − 6 , respectively (Fig. 13) Table 4: Spin orbit splitting ∆ 0 of the valence band at Γ, and energy bandgaps computed in the LDA and by QSGW, with a scalar relativistic (SR) local orbital on the cation and anion p states, and a Dirac p 1/2 local orbital. In the zincblende semiconductors the two band edges are at Γ, but in PbTe case they are both at L (Fig. 13. n are the principal quantum numbers of the cation and anion local p orbital; inally positive, but L + 6 and L − 6 are inverted very near L as can be seen by the change in colours at the band edges; thus the gap is actually inverted. Anticipating the discussion in Sec. 4, if a diagonalonly GW were carried out on top of this, the L + 6 and L − 6 would order properly, but the wrong topology would manifest itself as a band crossing near L, as shown for Ge in Fig. 6 of Ref. [55]. QSGW orders L + 6 and L − 6 correctly, somewhat overestimating the gap [75]. Adding a Dirac p 1/2 local orbital reduces the gap by nearly 0.2 eV. In the LDA the gap at L widens as a consequence of the fully relativistic partial wave (Table 4), another reflection of the inversion of L + 6 and L − 6 .

The PMT method
In the introduction it was noted that the LAPW and Questaal's generalised LMTO method are essentially the same except for the choice of envelope functions. LMTO's are much more compact, and require a much smaller Hilbert space, but they suffer from the problems of basis set completeness. An obvious alternative is to combine the two basis sets; this is the "PMT basis" [53]. This can be accomplished in practice quite neatly, because Questaal's standard H L require a more general one-centre expansion, Eq. 145 than simple LMTO's do (Eq. 1). Plane waves can be expanded in the polynomials P kL in a similar manner, as can other kinds of envelope functions. Thus the method provides a unified framework to seamlessly mix different kinds of envelope functions. Expressions for total energy, forces, assembling output density, etc, remain unchanged. The procedure is described in more detail in Ref. [53]; here we focus on the primary strengths and weaknesses of the method, using the total energy calculated for SrTiO 3 shown in Fig. 14 for discussion. It was redrawn from Ref. [53]. Four basis sets were chosen, ranging from pure LAPW ('null') to a moderately large basis of 75 smooth Hankels N (H L ) + 6 local orbitals.

Advantages of PMT
As expected, Questaal's purely atom-centred basis is vastly more compact than a pure LAPW basis (Fig. 14). Even a tiny basis of just O sp states dramatically improves on the convergence of the LAPW basis. PMT offers a marked advantaged from LAPW perspective: by augmenting that basis by a few functions. From the atom-centred perspective Questaal's standard moderately large basis misses about 5 mRy/atom in total energy, showing that it is not complete. Finally the figure shows that adding H L continues to dramatically improve the convergence to the point a standard minimal basis of H L (single spd function on each atom). Beyond that initial rapid gain, it doesn't seem to matter much whether the basis is increased by adding PW or more H L (compare blue-solid and greendashed lines).

Drawbacks of PMT
While they are far more efficient than ordinary LAPWs, PMTs show less promise than was initially hoped for.
• Compactness: LAPWs are not "intelligent", i.e. tailored to the potential, and therefore convergence is slow in the basis size. Also their extended range makes them ill-suited to advantages short-range functions have.
• Over-completeness: PMT has difficulties when H L and PWs are both sizeable. This is because they span much the same Hilbert space. Workarounds (reducing the Hilbert space by projecting out parts that contribute to a small 40 overlap) have been implemented, but they are inefficient and somewhat finicky -e.g. energy may not vary smoothly with the change in a nuclear coordinates.
• Interpolating Σ 0 : in QSGW context, the ability to interpolate the QSGW self-energy Σ 0 is degraded. A workaround has been made for this problem also (Ref. [76]) by projecting Σ 0 onto the MTO part of the eigenfunctions, interpolating it, and re-embedding. Information is lost in the projection/re-embedding step, resulting in ambiguities. The best way to address difficulties with interpolation is to construct very short-ranged, compact functions. This is shown clearly in Sec. 3.12.
In summary, PMT does best when treated as a slight augmentation of either the LAPW limit, or the MTO limit. Adding a few PWs with a low cutoff (2 Ry) was useful in our participation in the Delta Codes project (Sec. 3.13) particularly for the molecular solids such as N 2 which consist of 90% interstitial, or even more. Questaal's pure MTO basis does not handle those compounds that accurately.

Floating Orbitals
While the PMT method can make the basis nearly complete, it causes severe difficulties when interpolating the QSGW Σ 0 to an arbitrary k. An alternative is to add "empty" sites -points outside augmentation spheres where extra H L can be added to the basis. These sites have no augmentation spheres; they only enhance completeness in the interstitial. They are more ad hoc than plane waves (Sec. 3.10) particularly because there is no systematic path to convergence. Nevertheless they can be implemented efficiently; and Questaal has an automatic procedure to locate points to fill voids in the interstitial. In practice we use floating orbitals to converge QSGW self-energies with respect to the single-particle basis, in open systems. Their effect at the LDA level is usually not large, unless systems are very open or very accurate total energies are required. But for QSGW, where the potential depends on unoccupied states as well as occupied ones, it makes more of a difference, e.g correcting the fundamental in Si by of order 0.1 eV. We expect that the need for floating orbitals in such systems will be obviated by the new JPO basis. A precursor to it is discussed next. Building on the success of the "screening" of LM-TOs in the ASA (Sec.2.9), we have developed an analogue within the full-potential LMTO framework. As in the ASA case, "screening" does not alter the Hilbert space but renders basis functions short ranged. Screening is a precursor to the nextgeneration JPO basis, which will alter the Hilbert space, rendering it significantly more accurate for the same rank of Hamiltonian. JPOs are still in development; the method will be presented in a future work. They have the following properties:

Screened, Short-ranged Orbitals in the Full-Potential Framework
• They solve the SE with a (nearly) optimal number of basis functions for a given accuracy in the four dimensions (r, E).
• They are atom centred with a definite L character on their own MT sphere.
This provides a framework for exploiting the many advantages inherent in a short ranged basis sets. They can be used as projectors, replacing Wannier functions; they can be used to construct minimal Hamiltonians; near-sightedness can be exploited to make very efficient O(N ) solvers in DFT and O(N 3 ) solvers in GW and GW +BSE. That the basis functions are associated with a definite L character is important, for example, when singling out a correlated subspace. Screening of the H L works in a manner similar to screening traditional H L because H L asymptotically approaches a H L for large r (Fig. 10). H L − H L decays approximately as a Gaussian of radius r s : for traditional values of r s (r s 2s/3) it becomes negligible beyond first-neighbour distances. The screened H α L employ the expansion coefficients S α in the same manner as in the ASA (Sec. 2.9), so they have essentially the same range. What is sacrificed is the reinterpretation of screening in terms of hard core radius (Sec. 2.10.5), but this has no effect because the Hilbert space is unchanged. JPOs improve on the screened H α by restoring this condition and exploiting it to make the kinetic energy continuous everywhere.
Screening makes it possible to avoid Ewald summation otherwise necessary in the unscreened FP case (Sec. 3.5). Also, one-centre expansions are made more efficient.
By writing H α = {H α − H α } + H α , the first term which involves H, must be expanded in the more cumbersome polynomials P kL (Eq. (148)). In the screened case the difference becomes short-ranged and can be dealt with efficiently in real space. The latter can use the simpler expansion Eq. (42).
A restriction in transformation is that all functions of a given κ must share the same Hankel energy, while the traditional basis allows arbitrary choice for each orbital. However, this restriction is fairly mild because of the extra flexibility H L have through the choice of r s , which the H L do not. The r s strongly affect the kinetic energy near the MT boundary (Fig. 10), and they results are more sensitive to that choice than to the Hankel energy.
One large advantage of the transformation is that H α becomes short enough ranged so that matrix elements are limited by the range of the physical Σ 0 (r, r ), and not by the basis set. Σ α 0;RL,R L turns out not to be very long ranged, as suggested some time ago by Zein et al. [77]. Fig. 16 shows how the bandgap in NiO evolves with range cutoff R cut , defined as follows. Σ α 0;RL,R L is initially computed without range truncation. Then matrix elements of Σ α 0;RL,R L are set to zero for |R − R | > R cut , and the band structure calculated. As Fig. 16 shows, the band gap E G converges slowly with R cut in the traditional Σ α=0 0;RL,R L case, but for Σ α 0;RL,R L , E G is already reasonably converged including first and second neighbours. Note that for R cut →∞, the two methods yield the same result, showing that the screening transformation does not change the Hilbert space.
One powerful feature of lmf is its ability to interpolate Σ 0 . As noted in Sec. 4.2, some compromises must be made because the high-energy parts of Σ nn 0 do not interpolate well, and elements above an energy threshold (of order 2 Ry) must be set to zero. The screening mitigates this difficulty; it appears that the energy threshold can be taken to ∞.
The difference is not large, but in small bandgap cases where the precision of the method is very high, agreement with experiment may be limited by this difference.
The short range of Σ α 0;RL,R L can be used to much advantage: we realise from the properties of Fourier transforms that the number of k points directly translates into the number of neighbours in real space. Because Σ α 0;RL,R L is short ranged, it converges faster in the k mesh than Σ α=0 0;RL,R L , which saves significantly on computational effort. Also explicitly semilocal algorithms can be constructed, greatly reducing the cost to make Σ 0 .
At present the short ranged screened smooth Hankel functions shine the most when used to represent the QSGW self-energy Σ(ω, q) and interpolate it at intermediate q points not directly calculated in the heavy GW step. The interpolation process is significantly simplified and ambiguity is removed because the high energy Σ no longer needs diagonal approximation, there is no need to define energy threshold and there is no loss of information due to discarding the off-diagonal parts. This does add to the computing effort for directly computed QSGW Σ(ω, q) because no states are excluded and matrix sizes increase respectively. To counter that, significantly fewer Σ(q) need to be computed exactly, in our experience an equivalent result can be achieved by using only a half or in cases even a quarter of the original BZ sampling in each direction, for cells with few symmetry operations the savings add up quickly considering that the GW walks over pairs of q points (quadrupling the total number of points for low symmetry cells). Performance can be further improved by still utilising the diagonal approximation to Σ(ω, q) because the interpolation is excellent and the high energy part only weakly affects states near E F . To show the quality of the QSGW Σ(ω, q) expansion in the new, screened basis and the conventional unscreened, we plot the dependence of the NiO band gap as a function of the truncated self energy in Fig. 16.
At the LDA level, the better spatial localisation of screened basis function paves the way to efficient, real-space assembly of matrices (1 and 2 particle) while maintaining, and in cases improving, accuracy. The screened functions are also significantly less linearly dependent. The implementation is heavily vectorised and at present can handle a couple hundred atoms on a single node, with the scalability properly utilised the numbers can be an order of magnitude larger. For very large systems, it will be possible to construct O(N ) methods, but empirically it appears so far that systems of a few hundred atoms are probably still too small for such methods to be advantageous.
To summarize, our current screening transformation offers advantages of short-ranged basis sets. It is not yet optimally compact. However, the JPO basis, to be reported on soon, we believe we will be nearly the most compact (optimal convergence for a given rank of Hamiltonian, and short range). We believe these advantages will be very significant, and obviate the need for enhancements such as plane waves (Sec. 3.10) or floating orbitals (Sec. 3.11) to obtain high accuracy.

Delta Codes Validation Exercise
The Delta Codes project [6] is a continuing effort to mutually validate the different electronic structure codes used in condensed matter and materials modelling. The first comparison has focused on the equations of state calculated for a selection of elemental solids. The relative agreement of different codes, including lmf, can be examined on the project site https://molmod.ugent. be/deltacodesdft. Agreement is quantified using an average "Delta value"-the integrated difference between equation of state curves over a specified volume range.
The calculated equations of states (scalar relativistic and PBE-GGA exchange-correlation are specified) agree extremely closely, particularly  Figure 17: "Delta Codes" tests: integrated energy difference between the energy-volume curves calculated by lmf and Wien2k [78], expressed in meV/atom.
among the all-electron codes. The pseudopotential projects have also demonstrated precision very similar to the all-electron methods. The Delta Codes test set has been carried out for lmf with an automatic setup, avoiding any systemspecific modifications. The main settings defining the calculations are: Sphere radii: s R corresponding to touching spheres are used, evaluated at the smallest volume tested. max : The maximum L for the basis functions is 1 for H and He, 2 for Z < 18 and 3 otherwise. In each case, one higher is included in the partial wave expansion.
Semi-core states: Local orbitals are effective for including semicore states in the valence. We recommend treating states as semicore whenever the leaked charge exceeds 0.002e, or the eigenvalue is higher than ∼ −2 Ry. For transition metals and felectron systems, the inclusion of high-lying = 2 or 3 conventional local orbitals can significantly improve accuracy and these are added by default for these groups of elements.
Molecular cases: For cases where the touching muffin-tin spheres fill less than 30% of the cell volume, an additional LAPW basis has been included. The LMTO basis was not designed for molecular systems -but the shortcomings of the LMTO basis can be easily rectified by the addition of small number of plane-waves.
Basis setup: Basis parameters determined automatically as described below. Figure 17 shows the "Delta values" for Questaal with respect to the LAPW code Wien2k. The level of agreement is typical (or better) than most all electron methods involved in the validation exercise. In particular, the transition elements are extremely well represented by the smooth Hankel basis. Some of the first period and group VII cases involve molecular problems for which the muffin tins fill a small fraction of the unit cell: the performance of the LMTO basis in these cases becomes sensitive to r s and E choices. We do not attempt to choose optimal parameters, instead we include an additional LAPW (see Sec. 3.6), with cut-off 2 Ry or 4 Ry depending on the packing fraction. In addition to confirming that the smooth Hankel basis is capable of high precision, the testing procedure also demonstrates that the automatic procedure for setting up the basis and setting various numerical parameters is highly reliable.

Choice of basis parameters
Questaal's full-potential code allows two LMTO basis functions to be used per L. Each of these is defined by two parameters which must be chosen: the Hankel energy and the smoothing radius. Because the size of the LMTO basis is small, the choice of basis parameters can have a significant impact on accuracy. Different schemes for choosing basis parameters have evolved, of which the simplest is to fit the basis functions to the free atom wave functions. This provides one set of basis parameters for each L. The second set is obtained from the first by shifting the Hankel energy (typically by −0.5 to −0.8 Ry). This is a heuristic approach but it is automatic and effective and is the default.
An alternative method chooses smoothing radii directly from the potential. The gradient of the smoothed Hankel functions at typical muffin-tin sizes is sensitive to r s and variational freedom is obtained by differentiating the two sets of basis functions by r s instead of the Hankel energy. The smoothing radius describes the point where the basis function begins to deviate from the exponential, Hankel-like tail: this is similar to the behaviour of atomic eigenfunctions at their classical turning points.
Associating the smoothing radius with the classical turning point allows basis functions to be constructed that resemble atomic-like states at energies different to the atomic eigenvalues and the question of choosing suitable r s is translated into a determining a pair of reference energies-one each for the two basis sets per L. Energies of the valence states are typically in the range 0 to −2 Ry, and choices for the reference energies of E 1 = −0.5 and E 2 = −2.0 result in accurate basis sets for most materials. In this scheme, the Hankel energy remains a free parameter but similar results are found for Hankel energies in the range 0.1 to 1 Ry.
The choice of reference energies can be automated by expressing them in terms of the atomic potential at the muffin-tin radius. One basis set is chosen with a higher reference energy, giving a more diffuse basis function, the other is setup at a smaller, more negative, energy which yields a smaller r s and a more tightly bound basis function.
Because v(s R ) varies significantly across elements and materials, a proportional scheme is more suitable: e.g., v(r 1 s ) = 2v(s R ) and v(r 2 s ) = 1 2 v(s R ), which gives reliably accurate basis functions.

GW and QSGW
The GW approximation (G=Green's function, W =screened Coulomb interaction) is the first-order term in the formally exact diagrammatic expansion around some one-particle Hamiltonian H 0 [79].
Questaal's GW formalism is explained in some detail in Ref. [80], and its implementation of quasiparticle self-consistent GW (QSGW ) is explained in Ref. [81]. Here we recapitulate only the main points. From an implementation point of view, GW requires two-point functions, such as the susceptibility from which W is made (Sec. 6). This entails an auxiliary product basis of wave function products. Most commonly plane waves are used to implement GW. In such a case the same basis for two-particle objects can be used, because a product of plane waves is another plane wave, but for all-electron methods this is not possible. Representation of these objects requires a basis spanning the Hilbert space of wave function products. Fourcentre integrals (e.g. Eq. (193)) can be evaluated as integrals of pairs of product basis functions. This was first accomplished by Aryasetiawan [82] in an ASA framework, and extended by us [15] into a full-potential scheme.
In the ab initio context, GW has traditionally referred to a perturbation around the LDA: i.e. H 0 = H LDA so that GW = G LDA W LDA , though in recent years hybrid functionals or LDA+U have become popular. It has become increasingly recognised that results depend rather strongly on the choice of H 0 , or equivalently the V xc entering into it, by which we mean the beyond-Hartree parts of the effective potential. Thus the self-energy of H 0 is Σ = V xc in the language of many-body perturbation theory. The implications of starting point dependence are profound: the quality of GW depends on the quality of the reference. Moreover it is increasingly accepted that H LDA is only a good choice for H 0 for a very restricted materials class.
GW is a perturbation theory, and usually only the lowest order perturbation (diagonal part of Σ in the basis of the reference Hamiltonian) is kept. As a result eigenfunctions are not perturbed and the change in QP level ε kn is Z kn is the quasi-particle (QP) renormalisation factor and accounts for the fact that Σ is evaluated at ε kn rather than at the QP energy ε kn +δε kn . Eq. (182) is the customary way QP energies are evaluated in GW calculations. In Sec. VI of Ref. [55], we show that omitting the Z factor is an approximate way to incorporate self-consistency, and thus should be a better choice than including it. As a practical matter it is also the case that results improve using Z=1.
Ref. [55] notes a serious drawback in the diagonal-only approximation: when levels in the reference Hamiltonian are wrongly ordered as they are, e.g. at the Γ point in Ge, InN, and CuInSe 2 , and the L point in PbTe [75] the wrong starting topology results in an unphysical band crossing in the GW QP levels. (See, for example an illustration for Ge in Fig. 6 of Ref. [55]). Something similar will happen for PbTe (Fig. 13) Moreover, the off-diagonal parts of Σ can significantly modify the eigenfunctions and resulting charge density. This reflects itself in many contexts, e.g. strong renormalisation of the bandgaps in polar insulators such as TiSe 2 and CeO 2 [55]. It modifies orbital character of states near the Fermi energy in La 2 CuO 4 , and significantly affects how the metal-insulator transition comes about, when GW is combined with DMFT [83].

Need for self-consistency
The arbitrariness in the starting point means that there is no unique definition of the GW approximation. Errors in the theory (estimated by deviations from experiment) can be fairly scattered for a particular choice of reference, e.g. LDA, making it unclear what the shortcomings of GW actually are. Arbitrariness in the starting point can be surmounted by iterating G to self-consistency, that is, by finding a G generated by GW that is the same as the G that generates it (G out =G in ). But it has long been known that full self-consistency can be quite poor in solids [84,85]. A recent reexamination of some semiconductors [86] confirms that the dielectric function (and concomitant QP levels) indeed worsen when G is self-consistent, for reasons explained in Appendix A in Ref. [81]. Fully scGW becomes more problematic in transition metals [87]. Finally, even while scGW is a conserving approximation in the Green's function G, in W it is not: it violates the sum rule [88] and loses its usual physical meaning as a response function. As a result it washes out spectral functions in transition metals [87], often yielding worse results than the LDA.
A simple kind of self-consistency is to update eigenvalues but retain LDA eigenfunctions, as was first done by Aryasetiawan and Gunnarsson [89]. This greatly improves GW, especially in systems such as NiO where the LDA starting point is very poor. But it is only adequate in limited circumstances. In TiSe 2 it strongly overestimates the bandgap (see below), as it does in NiO [89]. An extreme case is CeO 2 where the position of the Ce 4f levels is severely overestimated; see Fig. 3 in Ref. [80]. The off-diagonal elements in Σ, which are needed to modify the eigenfunctions, can be very important.
Questaal employs quasiparticle selfconsistency [16,80,81]: by construction, the reference H 0 is determined within GW as in the fully self-consistent case. But the proper Σ=iGW cannot be used because it is energy dependent and non-hermitian, thus falling outside of an independent particle picture. This causes GW to degrade, and higher-order diagrams are needed [84] to restore the quality of GW. But by quasiparticlising Σ we can stay within a framework of perturbation around H 0 but choose H 0 in an optimal manner. QSGW quasiparticlises Σ(ε) by approximating it by a static hermitian potential as which is the QSGW form for a static V xc . This process is carried out to self-consistency, until Σ out 0 = Σ in 0 . Given Σ 0 , a new effective Hamiltonian can be 45 made, which generates a new Σ 0 : Σ 0 is nonlocal with off-diagonal components; both of these features are important. TiSe 2 is an interesting case where self-consistency becomes critical. It is a layered diselenide with space group P3m1. Below T c =200K, it undergoes a phase transition to a charge density wave, forming a commensurate 2×2×2 superlattice (P3c1) of the original structure. It has attracted a great deal of interest in part because the CDW may to be connected to superconductivity. How the CDW affects the energy band structure is a source of great controversy. It is accepted experimentally that in the P3c1 phase, TiSe 2 is semiconductor with a gap of ∼0.15 eV. Above T c whether intrinsic TiSe 2 has a gap is not settled. On the theoretical side, the CDW makes TiSe 2 an unusual materials system: DFT predicts a metal in both P3m1 and P3c1, as expected, but a recent G LDA W LDA calculation [90] predicts P3m1 to have a small positive gap. We have revisited this problem and confirmed the findings of Ref. [90] at the G LDA W LDA level. However, at the QSGW level, TiSe 2 is a metal in the ideal P3m1 phase. This is atypical for insulators: usually self-consistency widens the gap, as has long been known (see Fig. 1 in Ref. [80]). The origin can be traced to the modification of the LDA eigenfunctions by off-diagonal elements Σ n =n , which modify the density n(r). To see approximately the effect of the density change, assume LDA adequately describes δV eff /δn (this is the inverse susceptibility, the charge analogue of Eq. (190)). For a fixed Σ 0 , we can estimate the effect of the change δn renormalising the V eff through the change This is accomplished in a natural way with the Questaal package. The quantity in curly brackets is generated by GW in the first QSGW cycle, and lmf treats this term as an external perturbation. Running lmf to self-consistency allows the system to respond to the potential and screen it, yielding n GW .
The result of this process is shown as a dashed line in Fig. 18(b). It bears a close resemblance to the full QSGW result, including the negative gap. We will show elsewhere that TiSe 2 is an insulator in the P3c1 phase only as a consequence of lattice displacements relative to the symmetric P3m1 phase. GW is used less often in metals. As in the insulating case, GW based on DFT can sometimes work well, as it apparently does for SrVO 3 [91]. But the range of applicability is limited, and indeed G LDA W LDA can yield catastrophically bad results. Two cases in point are MnAs and FeTe. Fig. 19 shows LDA, G LDA W LDA , and QSGW energy bands near E F ; they are mostly of Mn or Fe 3d character. In the G LDA W LDA case, E F must be adjusted to conserve charge. (The shift is large, of order 1 eV). Consider MnAs first. The majority and minority 3d bands lie at about −1.5 eV and +1 eV in the LDA. This exchange splitting is underestimated: QSGW increases the splitting, putting bands at about (−2.5,1.5) eV while G LDA W LDA does the opposite, reducing the splitting relative to LDA. Also, the Fermi surface topology is poor: the hole pocket at Γ disappears, and the one at K is similar to the LDA. FeTe fares no better. G LDA W LDA is somewhere intermediate between LDA and QSGW. Most important is the unphysical, dispersionless band at E F on the X-M line, which yields a nonsensical Fermi surface. In both of these materials, the LDA describes the electronic structure better than G LDA W LDA . Ambiguities in the starting point make it difficult to know what errors are intrinsic to the theory and which are accidental. The literature is rife with manifestations of this problem; see e.g. Ref. [92]. Self-consistency (mostly) removes the starting-point dependence so that the errors intrinsic to the GW approximation can be best elucidated. Moreover, QSGW should yield better RPA total energies on averages than those calculated from other starting points, because the path of adiabatic connection is optimally described through QSGW [81]. To better illustrate these points Fig. 20 shows the ionisation potential of 3d tran-sition metal atoms and the heat of formation of 3d-O dimers, computed by the molgw code [93]. We focus on these properties because it is known, e.g. from Ref. [94], that the RPA tends to systematically overbind, and the error is connected with shortranged correlations. The ionisation potential and the dimer formation energy, both of which benefit from partial cancellation of such errors, are much better described.
Note the dramatic difference between choice of a Hartree-Fock or the PBE functional. As for the superiority of QSGW, Fig. 20 generally bears this argument out, though there are some surprises, e.g. the heat of formation from RPA@PBE deviates less from CCSD(T) results than QSGW. QSGW does not describe Cr well, probably because spin fluctuations are important there (Sec. 4.4), which skews the statistics with this small sample. The discrepancies are small enough that incomplete convergence in the basis set (results in Fig. 4.4 were obtained with a triple-ζ basis) are of the same order and could account for some of the results. The ionisation potential of simple sp atoms was analysed in Ref. [95], where it was computed from RPA total energy differences, compared to the eigenvalue of the GW one-particle Hamiltonian. There also it was shown QSGW significantly improves on RPA@HF, and also that QSGW is slightly biased towards Hartree-Fock, reflecting a slight tendency to underestimate screening. This is consistent with the tendency for QSGW to overestimate bandgaps. Koval et al. [96] found somewhat different results for small, second-row molecules.
QSGW -RPA is not accurate enough to reach quantum chemical accuracy, ∼1 kcal/mol. The addition of ladders should considerably improve on the RPA's known inadequacy in describing shortranged correlations. It has long been known that ladders dramatically improve the dielectric function (ω) ( (ω) and G enter in the coupling constant integration for total energy) and it would be interesting to explore if such low-order diagrams in a QSGW framework are sufficient to capture total energy to this accuracy.
While self-consistency does largely surmount starting-point dependence (it was recently shown to be the case for some insulators using Hartree-Fock and LDA as starting points [97]), but it is not strictly true that it does. The QSGW H 0 or G 0 has a Hartree Fock structure, and it can get stuck in metastable valleys in the same way as Hartree Fock; an example are the dual low-spin and high-spin states found for FeSe (Sec. 4.4). Also, in strongly correlated systems Σ(ε) can vary rapidly with energy. There can be more than one i that forms a stationary point. This happens, for example in CuO. It is usually an indication that the ground state is not well described by a single Slater determinant. QSGW restricts G 0 to a single determinant, and there is no longer an unambiguously optimal choice when there are two or more equally strongly competing ones.
There is no unique prescription for quasiparticlisation, but Eq. (184) minimises the difference between the full G and G 0 according a particular choice of norm. Recently it was shown [98] that Eq. (184) minimises the absolute value of the gradient of the Klein functional over all possible spaces of non-interacting Green's functions. There are formal reasons (Z factor cancellation [81] and conserving sum rule [88]) why quasiparticlisation should be better; there is also abundant empirical evidence, that quasiparticle self-consistency performs better than full self-consistency [87,99,86], and by construction this particular form is an optimum one.
QSGW has been implemented in numerous electronic structure packages, e.g. the spex code [100], the Abinit code [101], VASP [102], recently in the Exciting code [97], and also in molecules codes such as molgw [93] and NWChem [96]. For solids perhaps the most rigorous implementation is the spex code, which can use a local Sternheimer method to rapidly converge the calculation of the dielectric function [103]. As attempts to make efficient schemes that are also well converged, e.g. to calculate the RPA total energy, the local Sternheimer method will likely emerge as being very significant over time.

Questaal's QS GW implementation
Questaal's implementation of QSGW evolved from the ecalj package, developed by Kotani and coworkers in the early 2000's out of Aryasetiawan's GW -ASA code [13], which in turn was developed from the "Stuttgart" LMTO code. The original ecalj package can now be found at https: //github.com/tkotani/ecalj/. As of this writing we maintain the GW code as a separate branch (Fig. 21). In its present form the GW part of Questaal's QSGW implementation operates independently, receiving information about eigenfunctions and eigenvalues, and returning response function or a self-energy. As for the QSGW cycle, one cycle occurs in three parts. lmfgwd generates information about the eigenfunctions Ψ corresponding to H 0 , and lmgw receives the output of lmfgwd and to make a new Σ 0 . lmgw can also generate response functions and the fully dynamical self-energy Σ nn (ω). lmfgws is a post-processing code that yields observables computed from the interacting G, where Rather than storing Σ 0 on disk, Σ 0 − V xc is stored (V xc usually being the LDA potential). Thus lmf generates its customary LDA Hamiltonian, and adds Σ 0 − V xc to it. In practice this is accomplished by reading Σ 0 −V xc on the mesh of k points where it was generated, and inverse Bloch-summing to make Σ 0 − V xc in real space. Then Σ 0 − V xc can be interpolated to any k by a forward Bloch sum. This is a unique and powerful feature, as it makes it possible to generate QSGW eigenfunctions and eigenvalues at any k. There are some subtleties in this step: the interpolation does not work well if all the n|Σ nn 0 |n are included. It is solved by rotating to the LDA eigenfunctions, and zeroing out Σ n =n 0 for n or n whose energy exceeds a threshold, E cut ; i.e. diagonal-only approximation for high-energy subblocks of Σ 0 . E cut is typically ∼2 Ry; convergence can be checked by varying E cut . The prescription is explained in detail in Sec. IIG of Ref. [81]. The error is connected to the relatively long range of the smooth Hankel envelopes. Preliminary tests using short-ranged, screened Hankels indicate that this truncation is no longer needed.
Typically Σ 0 is a smoother function of k than the kinetic energy. Thus the k mesh on which Σ 0 is made can usually be coarser than the k mesh. Because of its ability to interpolate, lmf and lmgw operate with independent meshes.

Successes of QS GW
QSGW has the ability to calculate properties for a wide variety of materials classes in a man-ner that no other single theory can equal. It consistently shows dramatic improvement relative to DFT and extensions such as hybrid functionals or LDA-based GW. Atomic ionisation potentials [95], quasiparticle (QP) levels, Dresselhaus coefficients in semiconductors [104,105,106]; band offsets at the Si/SiO 2 interface [101], magnetic moments and spin wave excitations [107,108]; tunnelling magnetoresistance [109]; impact ionisation [110]; electric field gradients and deformation potentials [111], spectral functions [112], and dielectric response [113]. Its superior ability to yield QP levels and DOS made it possible to accurately determine valence maximum in SrTiO 3 [114]. In contrast to G LDA W LDA , QSGW is uniformly reliable with systematic errors (see Sec. 4.4). Fundamental gaps are usually in very good agreement with experiments, though they are systematically overestimated (see Fig. 1 in Ref. [80]), even in the strongly correlated M1 phase of VO 2 [115,116], where spin fluctuations are not important. It properly narrows the bandwidth of localised d bands and widens it in wide-gap semiconductors [114] and graphene [117].
QSGW can predict properties inaccessible to G LDA W LDA , such as magnetic ground state [112] and charge density. Fig. 22 shows the density calculated by QSGW in the plane normal to [001] for antiferromagnetic NiO and CoO. Two prominent features are seen: (1) NiO is much more spherical than CoO, and the density contours for CoO are elongated along the [110] line. Both of these findings are consistent with γ ray measurements [118,119]. It was noticed early on [80] that QSGW has a systematic tendency to overestimate bandgaps in semiconductors and insulators. To what extent dispersions are well described is much less discussed, in part because there is a limited amount of exper-imental data reliable to the precision needed. Perhaps the best materials family to serve as a benchmark are the zincblende semiconductors, where critical-point analysis of the dielectric function has been used to accurately measure in most zincblende semiconductors, not only the Γ − Γ (E 0 ) transition but also the L-L (E 1 ) and X-X (E 2 ) transitions (top left panel, Fig. 23). Also, reliably measured are the electron effective mass m * e (hole masses are much less well known), and for some systems, the nonparabolicity parameter α, characterising deviations from parabolicity near Γ, and defined as where ε is the band energy relative to the conduction band minimum. α is an important quantity in several contexts, particularly hot-electron semiconductor electronics. (Unfortunately α is difficult to measure, and values reported are usually some scalar average of the second rank tensor.)  186)), respectively. Fig. 23 shows that tendency to overestimate E 0 and E 1 is systematic and uniform, but there is less uniformity in E 2 . Usually E 0 can be precisely measured; similarly for E 1 because there is a sizeable volume of k near L where valence and conduction bands disperse in a parallel manner. E 2 is more difficult, and values are less certain. That being said, the data suggest E 0 −E 2 is, on average, about 0.1 eV below experiment. We have found that this error is, at least in part, due to the necessity of truncating high-lying parts of the off-diagonal Σ 0 ; see E cut , Sec. 4.1. It has recently become possible to re-evaluate this because a short-range basis was recently developed (Sec. 3.12) appears to interpolate Σ 0 allowing E cut →∞. Conduction band masses are also well described, though there is a systematic tendency to underestimate it, connected to the tendency to overestimate E G . Most of this error can be traced to the RPA approximation to the dielectric function. The optical dielectric constant ∞ (Eq. 199) in the ω→0 limit) is uniformly too small by a factor ∼0.8 for many kinds of semiconductors and insulators (Fig. 24). Plasmon peaks in Im (ω) are almost universally blue shifted [120,121] because the RPA omits the attraction between electron-hole pairs in their virtual excitations. A simple Kramers-Kronig analysis shows that as a consequence, ∞ =Re (ω→0) should be underestimated. Indeed we find that to be the case, but remarkably ∞ is consistently underestimated, by a nearly universal factor of 0.8. This is what motivated a hybrid of LDA and QSGW functionals [104] Σ scaled The reasoning is since W is dominated by the q→0, ω→0 limit, scaling W by 0.8 justifies Eq. (186). In practice, it seems that the tendency to overestimated bandgaps is almost completely ameliorated in these systems (compare red and blue circles, Fig. 23, as is the tendency to underestimate m * c . Exceptions are the small-gap InAs in InSb, where relative errors in bandgaps are still not small even with the 0.8 scaling. The hybrid scheme does a stellar job in many contexts: it improves on bandgaps in semiconductors (Fig. 23) and predicting the Dresselhaus splitting in sp semiconductors [104]. But the scheme is empirical, and it is limited. It does not properly correct the blue shifts in (ω) or take into account other excitonic effects; and fails to bring bandgaps in close agreement with experiment in systems with strong spin fluctuations such as CoO or La 2 CuO 4 , and deep states such as the Ga 3d shift farther from experiment (see Table below). Recent work [99,122] shows that the great majority of the error in charge susceptibility can be accounted for by ladder diagrams. (In the case of polar semiconductors there can also be a significant renormalisation of the gap from the Frölich interaction [123,124]; the electron-phonon interaction has a modest effect in other semiconductors, especially compounds with second row elements [125].) Without ladders the low-frequency dielectric constant is -almost universally -about 80% of experiment ( [126]). As we will show elsewhere [122] the addition of ladders in W seem to dramatically improve on the charge channel even in strongly correlated cases such as CoO. VO 2 is an excellent test bed for optics: it has strong correlations in the monoclinic phase; yet, the conductivity is well described by nonmagnetic QSGW, provided ladders are taken into account [116].
Below we present more detailed properties of the band structure obtained from classical QSGW on a single system, selecting GaAs because reliable measurements are available and it is representative of results obtained for the entire family of zincblende semiconductors (Table 4.3). We observe the following: 1. Dispersions Γ-L and Γ-X are not mostly well described, and vastly better than the LDA (which predicts the Γ-X dispersion to be 1 eV). 2. The valence bands match photoemission data to within experimental resolution. 3. The Ga 3d (see Table below) is pushed down relative to the LDA and also relative to  Table 5: Fundamental gap, effective masses, nonparabolicity α and location of 5/2 and 3/2 Ga 3d core states relative to the valance band maximum. Units are eV. The measurement of α was taken from Ref. [128], which is within 1% of a subsequent measurement [129]. Photoemission data taken from Ref. [130].
Binding of shallow core-like states on e.g. Cd, Zn, and Cu is underestimates by ∼0.4 eV; for deeper states such as Ga 3d it is larger. It was shown in Ref. [127] that a low-order vertex correction to GW accounts for most of this discrepancy, at least for shallow core-like levels. 4. Conduction bands are slightly overestimated and the mass slightly too large. 5. The nonparabolicity parameter α was calculated along the [001], [110], and [111] directions. It varies ∼30% for different directions, but since only scalar quantities are reported, we take a geometrical average. α is very sensitive to m * , so we can expect it to be underestimated a little; this turns out to be the case. Using Eq. (186) all three quantities align well with experiment: Other discrepancies become apparent in the rare earths: splitting between occupied and unoccupied 4f levels is too large [131], and multiplet splittings, which are significant for 4f , lie beyond the scope QSGW.
As we will show elsewhere [122] most of the systematic errors noted above are largely corrected when ladders are included in W . Kutepov [99] showed that the bandgaps significantly improve, but the improvement extends to a wide range of properties and very diverse kinds of materials systems. See for example, the excellent description of optical conductivity in VO 2 [116].
Metals and local-moment magnetic systems are similarly well described; see for example the excellent description of known properties of Fe in the Fermi liquid regime in Ref. [112], and the electronic structure of NiO and MnO in Ref. [16].
In summary, absent strong spin fluctuations, and a few other mild exceptions, QSGW with some low-order extensions (ladders and a low-order ver-tex for narrow semicore states), describe a wide range of materials throughout the periodic table with uniform accuracy and reliability that cannot be matched by any other method. The greatest failings appear in QSGW for systems where spin fluctuations are large. This is not surprising since the main many-body effect in GW are plasmons. GW has no diagrams in spin beyond the Fock exchange. When spin fluctuations become important, the first effect is to reduce the average moment [132,133]. As they increase, states begin to lose their coherence: the band picture and QSGW both begin to break down. A good example of this is La 2 CuO 4 (Fig. 25). At room temperature, undoped La 2 CuO 4 is a paramagnetic insulator, undergoing a transition to an antiferromagnetic insulator at 325 K [134]. Fig. 25 shows QSGW calculations in the nonmagnetic state, and the antiferromagnetic one. Approximating the paramagnetic state by nonmagnetic state is an inadequate approximation, though it is often done. In this approximation La 2 CuO 4 is predicted to be metallic, with a single Cu d x 2 −y 2 band crossing E F . In the antiferromagnetic state La 2 CuO 4 is found to be an insulator with a 4 eV bandgap. Experimentally La 2 CuO 4 is an insulator in both paramagnetic and antiferromagnetic phases, and with a gap not precisely known but on the order of 1 eV.

Limitations of QS GW
QSGW severely overestimates the optical gap in La 2 CuO 4 . There is a similar mismatch for CoO: the gap is overestimated by ∼2 eV in these cases, but not in NiO. Particularly telling is how for VO 2 nonmagnetic QSGW does describe the M1 phase very well when both V-V pairs dimerise [116], but not the M2 phase where only one of them does. (A gap opens up when M2 is calculated antiferromagnetically, but M2 is probably paramagnetic at room temperature.) M1 and M2 differ mainly through the V-V dimerisation, which suppresses spin fluctuations. As we will show elsewhere [122], including ladders considerably improves all of these antiferromagnetic insulators, but discrepancies remain particularly for La 2 CuO 4 . We believe this to be an artefact of the interaction between spin and charge fluctuations.  FeSe is a heavily studied superconductor with large spin fluctuations: it provides an excellent testbed to compare LDA and QSGW. Nonmagnetic calculations for LDA and QSGW are shown in Table 6. QSGW improves on the LDA, but discrepancies with ARPES are much larger than for, e.g. elemental Fe [112]. That LDA and QSGW should be different is readily seen from the k dependence of the Z factor, shown in Fig. 26. Local potentials such as the LDA cannot incorporate such an effect.
Spin fluctuations are important, but QSGW does not adequately capture them. They can be incorporated in a mean-field manner by constructing a Special QuasiRandom Structure [135] with, in this case, three spin-up and three spin-down Fe atoms. A symmetry is imposed that each spin-up site is linked to a spin-down site with equal and opposite moment (Fig. 26). Two magnetic solutions can be stabilised: a low-moment and high-moment solution. Quasiparticle spectra in the latter case are far removed from experiment, so we consider only the low-moment solution. The moments on each Fe site are different, with a range |m|=0.2±0.15µ B . The addition of magnetic terms shift QP levels closer to ARPES measurements, but a significant discrepancy with experiment remains. In Sec. 5 these results are revisited where local, high-order diagrams in spin are included.
These two systems show concretely how the shortcomings of GW become apparent when spin fluctuations are important. The simplest diagram beyond GW, the T matrix, was first considered in an ab initio framework by Aryasetiawan and Karlsson [136]; more recently it was implemented in the spex code [137]. If spin fluctuations are not strong, as seems to be the case for Fe and Ni, such an approach seems to describe spin excitations fairly well, though comparisons with experiments are too sparse to draw any strong conclusions yet. For strong spin fluctuations, e.g. in La 2 CuO 4 or other unconventional superconductors such as FeSe, it seems probable that many kinds of diagrams are needed, though the effective vertex entering into those diagrams is expected to be mostly local. Dynamical Mean Field Theory is ideally situated to address such situations, and is discussed in Sec. 5.
Our view is that QSGW with ladders in W are sufficient for reliable description of susceptibilities within the charge channel, especially when ladders are included in the self-consistency cycle [99,122] to make the QSGW H 0 . In the spin channel DMFT contributes the dominant missing diagrams to the self-energy, and the susceptibilities can be reliably determined from local vertices, connected to kdependent bubble diagrams.

DFT+DMFT and QSGW +DMFT
Hartree Fock and DFT are a band structure theories: electrons are treated as though they are independent; in solids electron states are Bloch waves. GW includes correlations that go beyond the Bloch picture; still it is a perturbation around the Bloch description. In Landau Fermi-liquid theory, electrons are replaced by 'quasi-particles' which are adiabatically connected to the single-particle Blochwave representation of electrons. This, by construction, is a theory for excited states that adds perturbative corrections to the band picture. In Landau adiabatic theory electrons can have an effective mass, which is different from its free mass, and additionally, lifetime broadening effects far from the Fermi energy. One extreme case of this scenario is when the low-energy quasi-particle vanishes (without magnetic ordering) and the atomic-like, highenergy excitation emerges and effective mass at the Fermi energy tends to ∞. In effect, this is an emergent, non-adiabatic, feature where single-particle spectral weight flows to higher energies to conserve the total spectral weight and subsequently leads to suppression of low-energy quasi-particles. Popular rigid-band techniques [138,139,140,141,142,143,144,145] would catch the physics of suppression of quasi-particles at Fermi energy, however, would fail to conserve the spectral weight since a dynamic self-energy (Σ(ω)) is absent from those theories.
DMFT was formulated as an effective theory that smoothly interpolates between the Bloch-wave limit and atomic limit. It is formulated as a local impurity embedded self-consistently in a medium or bath, and was introduced in a seminal work by Metzner and Vollhardt [146] in late 80's. Several other works from Mueller-Hartmann [147], Brandt and Mielsh [148], Jarrell et al. [149] and Georges et al. [150] helped to build a strong foundation for DMFT. The formulation and the implementation became popular through the seminal work by Georges et al. [151] and later emerged as a prospective candidate for bettering the ab-initio LDA description [152] of the bath.
An implementation of DMFT has three components: the partitioning of the Hamiltonian into impurity + bath, a means to solve the impurity problem, and the bath-impurity self-consistency. The bath is essential because correlated states of interest have both atomic-like and Bloch-like properties with interesting low-and high-energy excitations. Through self-consistency the impurity feeds back on the bath, and ensures that all the symmetries of the entire system are preserved. In Questaal, QSGW or DFT can act as the bath. We have interfaces to two local solvers: the hybridisation expansion flavor of continuous time Quantum Monte Carlo (CT-QMC) from TRIQS [27] and the CT-QMC written by Kristjan Haule [26].
For the partitioning, some correlated subspace must be identified and this is done using projectors. Usually the subspace is derived from atomiclike d-or f -states of a transition metal or f shell element. For projectors maximally-localised Wannier orbitals have been widely used [153,154]. We have so far adopted the projection into partial waves of a particular in an augmentation sphere [155]. There are strong reasons to prefer this scheme: the basis set is atom-centred and very localised with a definite , which is the Hilbert space where the correlations are strong. The less localised the basis, the more it contains weakly correlated components (e.g. O-p states in MTOs). Too much population of the correlated subspace with uncorrelated parts conflicts with the physical interpretation of separation of bath and interacting subspace and can reduce the reliability of the method [156]. Moreover, very localised basis functions have a much weaker energy-dependence. As a consequence, with a very local projector it is possible to construct an effective interaction over a wide energy window, reducing the frequency-dependence of the effective interaction, and essentially recovering the partial waves in augmentation spheres [17]. Partial waves closely resemble atomic orbitals, so solving a purely locally correlated Hamiltonian with the approximation of single-site DMFT (self energy is local) is a good approximation.
The thorniest issue is how to construct the effective local Hamiltonian. It requires three related quantities: definition of the local Green's function, an effective coulomb interaction U , which is partially screened by the bath, and the double counting (DC) correction.
In a fully ab-initio QSGW + DMFT theory, these quantities must be computed from the theory itself, but how to accomplish this is far from a settled issue. Questaal can estimate the Hubbard U and J using the constrained random-phase approximation [157] where the internal transition between states included in the DMFT subspace are removed. A fully internally consistent theory requires a frequency-dependent U (ω), but available solvers can only solve the local impurity problem with such a U in the density-density channel. Haule's prescription for excluding states in a large energy window partially mitigates this issue, since U is closer to bare and nearly energy-independent [17]. (It makes another approximation, namely that not all the states excluded from screening are included in the impurity problem; these states are effectively treated only at the bath level.)

Questaal's implementation of DMFT
Questaal does not have its own DMFT solver. It has an interface to Haule's CTQMC solver [26] and to the TRIQS library [27]. Haule's CTQMC solver uses singular value decomposition (SVD) and TRIQS uses Legendre polynomial basis as compact representations for single-particle propagators.
A fully ab initio implementation is a work in progress, that will be reported on elsewhere. Within Questaal U, J can be calculated within constrained RPA, following Ersoy et al. [158]. It has been tested on several materials; for instance, the Hubbard U and Hund's J for Ni are shown in Fig.  27 which agrees well with Ref. [158]. The Hubbard U for FeSe using QSGW gives U (0) = 3.4 eV, also close to what has been found in the literature [157]. In practical calculations today, we use a static U and adopt the fully localised limit (FLL) for double counting (Sec.3.8).
We have used QSGW +DMFT to study singleand two-particle properties of hole doped cuprate, La 2−x Sr x CuO 4 . The parent compound at x=0 is a Mott insulator; while QSGW calculated nonmagnetically leads to a metallic band structure, and an insulator with a wide gap when done antiferromagnetically (Fig. 25). The spins on the Cu in La 2 CuO 4 fluctuate strongly and corresponding spin fluctuation diagrams are missing from  [83,17] (charge blocking without magnetic ordering). The five Cu-3d partial waves in the augmentation spheres comprise the correlated subspace, and the 3d x 2 −y 2 splits off to form the gap (Fig. 28), which is the paramamagnetic analogue to the antiferromagnetic QSGW band structure (Fig. 25). We solve the correlated impurity Hamiltonian in the Cu-3d orbitals using the singlesite CT-QMC solver within paramagnetic DMFT. Embedding local Green's functions into the QSGW bath, we can compute the non-local Green's functions dressed by the local DMFT self energy. They can be used to make spectral functions and the bubble diagrams entering into response functions (Sec. 6.4).
On doping, La 2−x Sr x CuO 4 undergoes a metalinsulator transition when x is sufficiently large. At high temperatures, the x=0.12 system is known to be metallic. We recover a metallic solution in our QSGW +DMFT for x=0.12, using a virtual crystal approximation for x. We find the classical three peak spectral feature with atomic-like upper and lower Hubbard bands and a low energy incoherent quasi-particle peak at E F (Fig. 28). This structure is typical to strongly correlated metals in proximity of a localisation-delocalisation transition [151,159]. Comparing with the insulating case at x=0, the spectral features for x=.12 has a natural explanation in the fact that spectral weight flows to low energies under doping.
Single-site DMFT incorporates the local spin fluctuation diagrams through a local self energy. It can not explain Fermi-arc features of weakly holedoped cuprates and momentum-dependent suppres-sion of electron pockets in de-twinned FeSe. In weakly hole-doped cuprates Fermi arc emerges, most likely, due to spin fluctuations which are longer ranged in nature and the corresponding diagrams are absent from single-site DMFT. Similarly, in FeSe, there are nematic fluctuations (which are inherently nonlocal), and the corresponding diagrams are absent from our theory. To incorporate such diagrams one needs to go beyond the singlesite DMFT; cluster DMFT [160] or diagrammatic techniques like Dual Fermions [161].

Susceptibilities
Susceptibilities play a central role in many kinds of material properties. In the static limit they reduce to the second derivative of the total energy with respect to the corresponding perturbation. This section presents some general properties of linear susceptibilities, focusing on the transverse spin susceptibility χ +− , which is the magnetic analogue of the better known charge susceptibility. The formalism of the latter is very similar, but it is simpler as it concerns the response to a scalar potential. Also in the spin case, there is no analogue to the lowest order diagram (time-dependent Hartree approximation) in the charge case. To characterise the magnetic susceptibility beyond the noninteracting case requires a vertex such as the T matrix [136].
Here we outline some general features; other sections explain how they are implemented in Questaal. The spin susceptibility relates the induced moment δm to a perturbation δB: δm(r, t) = − dt d 3 r χ(r, r , t − t )B(r , t ) (187) χ is a three-component tensor, χ αβ with α and β Cartesian coordinates. The charge susceptibility has the same form, only the perturbing potential is the scalar electric potential, which induces a (scalar) charge density. In general, there can be cross-coupling between spin and charge, so that the full χ is a 4×4 matrix connecting spin and magnetisation.
We restrict ourselves to linear perturbations around an equilibrium point, where the unperturbed system is time invariant. χ is a function only of the difference t − t , so its Fourier transform depends on a single frequency ω. If in addition the reference system is in a periodic lattice, its Fourier transform space depends only on the difference in translation vectors, whose Fourier transform has a single q within the Brillouin zone with r and r limited to a unit cell. Denotingχ(q, r, r , ω) as χ Fourier transformed in both space and time,χ is typically what is measured by spectroscopy.
A perturbation causes the system to generate an internal field B xc , which adds to the total field, In collinear spin structures (all spins parallel to z), transverse and longitudinal components are decoupled. Moreover, by rotating from (x, y) to x +− = x ± iy, the three component χ becomes diagonal with transverse elements χ +− , χ −+ , and longitudinal element χ zz . Eq. (187) also applies to the charge channel with the substitution B→V and m→n, so that the full χ becomes a 4 × 4 matrix, as noted above. In general χ zz is coupled to the charge channel, while χ +− and χ −+ are not. Also if the equilibrium system is nonmagnetic, χ +− =χ zz . In this work we are concerned only with χ +− and the charge susceptibility χ Q , or alternatively the dielectric function. The "magnetic" kernel I xc = ∂B xc /∂m is in general a function of (r, r , ω), though it must satisfy a sum rule [107]. In many-body perturbation theory, the analogue of I is the most challenging quantity to obtain, but in the adiabatic time-dependent LDA it is static and local, and is written in the transverse case as Eq. (187) relates m to either B ext or B tot δm = χ 0 δB tot = χ δB ext (190) when χ 0 and χ are the bare (noninteracting) and full susceptibilities, respectively. The preceding equations imply the relation The noninteracting susceptibility can be computed by linearising Dyson's equation for the perturbation δB ext −χ +− 0 (r 1 , r 2 , ω) = G ↑ (r 1 , r 2 , ω) ⊗ G ↓ (r 1 , r 2 , −ω) (192) ⊗ indicates frequency convolution: . We use ↓ for spin 1 and ↑ for spin 2. Similar equations with permutations of ↑ and ↓ may be written for the other components of χ 0 . In the Lehmann representation χ +− 0 involves the product of four eigenfunctions (193) where k = q + k. We note in passing, that in a proper MBPT formulation of susceptibility, e.g. the T-matrix [136,137,162], is a four-point quantity: χ is solved by a Bethe-Salpeter equation involving W and a four-point analogue of Eq. (193), which at the end is contracted to two coordinates.
The Questaal code at present does not yet have a perturbative (T matrix) approach for the spin susceptibility, although one was recently reported in a QSGW framework [162]. It does have the ability to include the charge analogue of ladder diagrams to solve a Bethe-Salpeter equation for the dielectric function [113]. In the DMFT context, a twoparticle Green's function, where all local diagrams are included, make a local four-point vertex that replaces I.
The first QSGW implementation of magnetic response functions [107] was designed for local moment systems, by which is meant systems with rigid spins (RSA): when perturbed by a transverse δB ⊥ ext m rotates rigidly without changing shape. (Archetypal examples are Fe, NiO, and MnAs [107].) In such a case the representation of χ +− (r, r , ω) simplifies to m(r) χ +− (ω) m(r ) (194) and I can be completely determined by the sum rule [107], thus avoiding a diagrammatic calculation for it. This is Questaal's present perturbative approach to computing χ.
In the RSA χ +− (r, r , ω) is discretised to a lattice model and can be written as χ +− R,R (ω). This makes it convenient to construct a Heisenberg model 195) and extract J RR from χ as The δm are understood to be transverse to m R . Thus QSGW can be used to determine coefficients J RR entering into the Heisenberg model. This basic idea, with some enhancements, is what is described in Ref. [107].

Optical Response Functions in Questaal
The linear optical response is described by the imaginary part of the dielectric response function 2 (ω) or equivalently the real part of the frequency dependent conductivity σ 1 (ω) = −iω 2 (ω)/4π at frequencies in the ultraviolet-visible (UV-VIS) range. From it one can obtain the real part 1 (ω) by the Kramers-Kronig relation and hence other relevant functions such as the complex index of refractionñ(ω) = 1 (ω) + i 2 (ω) = n(ω) + iκ(ω) with κ(ω) the extinction coefficient and n(ω) the real part of the index of refraction. The reflectivity R(ω) is then obtained via the Fresnel equations. For instance, the normal incidence reflectivity is a directly measurable quantity. The absorption coefficient is measurable in transmission and spectroscopic ellipsometry allows one to measure directly 1 (ω) and 2 (ω). There are several levels of theory at which one can obtain the optical dielectric function supported in the Questaal package. Typically we need the macroscopic dielectric function: This is essentially obtained as a byproduct of the GW method (see Sec. 4). One way to obtain this limit is to take a small but finite q and use lmgw to obtain the plane wave matrix elements of the inverse dielectric function. The direction of the finite q then defines the specific component of [ 2 (ω)] αα which is actually a tensor. If one simply calculates instead one obtains the value without local-field corrections.
Alternatively, however, one can take the limit q → 0 analytically and arrive at the Adler-Wiser formula in the RPA, n n k∈BZ Here Ω is the unit cell volume and f nk are the occupation numbers, which in principle are given by the Fermi function at finite temperature but are in practice taken to be 0 or 1 for empty and occupied states respectively. The v α are the components of the velocity operator v =ṙ = (i/ )[H, r].
For systems with at least orthorhombic symmetry the 2 (ω) tensor is diagonal in the Cartesian components. This well-known equation in the independent particle and long-wavelength form gives the 2 (ω) in terms of matrix-elements of the velocity operator and vertical interband transitions and can also be obtained from the Kubo formula for the optical conductivity. Within the all-electron methods (no non-local pseudopotentials) and if GW -selfenergies are not included, the velocity matrix elements are equivalent to the momentum matrix elements v = p/m and can be obtained straightforwardly from the ∇ operator matrix elements within the muffin-tin spheres and using a Fourier transform for the smooth part of the wave functions in the full-potential implementation. The matrix elements of the ∇ operator between spherical harmonics times radial functions inside the muffin-tin spheres is obtained using the well-known gradient formula [163]; more generally, see Sec. 3.2. When the non-local self-energy operator or its Hermitian version are included, however, (im/ )[H, r] is no longer equal to the momentum operator. To correct for this, the approximation proposed by Levine and Allan [164,165] can be used, which consists in rescaling the matrix elements by a factor (E n k − E nk )/(E LDA n k − E LDA nk ). This is exact when the LDA and GW eigenfunctions are the same, and it works well in weakly correlated systems. However, it breaks down when correlations become strong, as in NiO. Questaal also has an approximate form for the proper nonlocal contribution to the velocity operator. As of this writing, it takes into account only intercell contributions. This approximation apparently does not work well in strongly correlated systems, and results with full matrix element will be reported in due course [122].
The long-wave length approximation without local field (or excitonic) effects of ε 2 (ω) can thus be obtained both in the ASA lm and the full-potential lmf codes and, when reading in the QSGW selfenergy, can use the corrected quasiparticle energies rather than the Kohn-Sham eigenvalues. While approximate, it has the advantage that the q → 0 limit is taken analytically and that one can decompose the optical response in contributions from each occupied-empty band pair. By furthermore plotting vertical interband transition energies for any given pair, one can analyse where the bands are parallel and give their largest contribution to the joint density of states. This provides a way to relate the critical points or van-Hove singularities in the optical response to particular interband transitions. lmf and lm use this to enable decomposition into band-pair contributions if desired; it can also resolve contributions to by k.
The code also allows one to calculate dipole optical matrix elements between core states and valence or conduction band states and this can be used to model X-ray absorption (XAS) and emission (XES) spectroscopies. Furthermore, one can calculate Resonant X-ray Emission Spectroscopy (RXES) also known as Resonant Inelastic X-ray Scattering (RIXS) using the Kramers-Heisenberg equations [166]. The interesting point about this spectroscopy is that it allows to probe transitions between valence and conduction band states of the same angular momentum on a given site. In fact, let's say one considers as X-ray edge the K-edge or 1s core level, then the RXES probes transitions between valence bands and conduction bands of p angular momentum on the site of core hole that are in resonance with the difference of the absorbed and subsequently coherently emitted X-ray photon transitions from these conduction and valence states to the same core-hole. In optical measurements in the VIS-UV region, on the other hand, one probes only to ±1 dipole allowed transitions. Using p-levels as core hole RIXS thus probes d − d transitions, which often may be strongly influenced by many-body effects. The Kramers-Heisenberg formula assumes the wavevector of the X-ray can be neglected so that only direct transitions at the same k-point are allowed and within this approximation it allows one in principle to extract banddispersions in a manner complementary to ARPES.
It was established 20 years ago that RPA approximation to the dielectric function can be dramatically improved by adding ladder diagrams via a Bethe-Salpeter equation [120,121] (BSE) in simple sp semiconductors. Traditionally the method is used by substituting GW eigenvalues for the DFT ones obtained from one-shot GW (Sec. 4). For simple sp semiconductors where LDA and GW eigenfunctions are similar, this works well. It does not, however, when correlations become strong, e.g. in CuO 2 , where it was shown that (ω) calculated from the BSE starting from a QSGW reference described the measured spectrum quite well [167]. Recently Cunningham et al. studied optics from BSE using QSGW a reference for a number of systems, and found very good agreement with experiment even in correlated systems such as NiO [113] and the monoclinic phase of VO 2 [116]. However, when spin fluctuations are large and QSGW does not describe the underlying band structure well, as in La 2 CuO 4 (see Sec. 4.4), the description begins to break down. Recently Cunningham has added ladders to W in the QSGW self-consistency cycle. This seems to dramatically improve on the description of (ω), even in an extreme case such as La 2 CuO 4 . These capabilities are embedded in the standard Questaal distribution.
In the ASA code lm second harmonic generation coefficients can also be calculated but this portion of the code has not recently been updated to become applicable in the full-potential lmf implementation. The calculation of NLO coefficients is a bit more complex even within the long-wavelength independent particle approximation because of the need to include both intra-and inter-band transitions and disentangle them in a careful manner. The way to obtain divergence free expression was described in a series of papers by John Sipe and coworkers [168,169,170]. The version implemented in the code [171] uses Aversa's length gauge formalism. The intra-band matrix elements r i and interband r e in this formalism are in which Ω is the volume of the unit cell and u nk (r) is the periodic part of the Bloch function. The ξ nn can be recognised to be a Berry connection. The manipulation of the matrix elements of r i in commutators is non-trivial but described in detail in Aversa and Sipe [168]. An update of these parts of the code to incorporate them in lmf and to be made compatible with QSGW band structures as input is intended. See also Sec. 6.4.

Spin Susceptibility and Magnetic
Exchange interactions in the ASA Sec. 6 presents general formulations of the spin and charge susceptibility. In the ASA, the static transverse susceptibility is implemented in the lmgf code in a formulation essentially similar to the rigidspin approximation described there, with an additional "long-wave" approximation [172] Using the sum rule, I need not be calculated but inferred from χ 0 . It is possible to compute χ 0 from the full Green's function, Eq. (58). But lmgf implements the classic Lichtenstein formula [173], in which exchange parameters J RR are derived in terms of the auxiliary g and the "magnetic force" theorem. This latter says that the change in total energy upon spin rotation simplifies to the change in eigenvalue sum; it is a special instance of the Helman-Feynman theorem. It was realised sometime later [172] that the Lichtenstein formula is correct only in the q→0 limit (the long-wave approximation), beginning to deviate at around k=0.25·2π/a in elemental 3d magnets. In the notation of this paper, Lichtenstein's formula reads Questaal implements it for crystals, and for alloys within the CPA. See Refs. [48,174,175] for a detailed description.
6.3. Comparing QS GW and LDA Spin response functions In addition to the dielectric function, The GW code has an implementation of spin response functions within the rigid spin approximation as described Sec.6. This avoids direct calculation of the vertex, and is very simple, but is restricted to localmoment systems. Ref. [107] applied this approach to NiO, MnO and MnAs, with excellent results.
Questaal has the ability to compute spin waves (SWs) directly, or to extract parameters J entering into the Heisenberg model, Eq. 195. One application -Mn x Ga 1−x As alloys -are particularly instructive because they continue to attract interest as a spintronics material. Here we will compare the (ASA) Lichtenstein formula against fullpotential, rigid-spin result using both LDA and QSGW Hamiltonians. Computing properties for random alloys of any x is feasible within the ASA, via either the Lichtenstein formula or relaxing large structures using spin statics, Sec. 2.19. But a similar study is not feasible today in QSGW, so instead we consider four-atom Special QuasiRandom Structures [135] to simulate Mn 0.25 Ga 0.75 As and Mn 0.75 Ga 0. 25 As, and also the x=1 case. Mn's large local moment of around 4µ B makes the approximations in Questaal's QSGW magnetic response good ones.
DFT predicts Mn x Ga 1−x As to become a spin glass when x 0.35, though T c depends on what fraction of Mn go into interstitial sites, and how the Mn are ordered on the Ga sublattice. One application of the Questaal code was to show how Mn in alloys that favour ordering on the (201) orientation can optimise T c [176]. In the pure Zb-MnAs case, x=1, LDA predicts to be strongly antiferromagnetic, with negative spin wave frequencies everywhere in the Brillouin zone (left panel, Fig. 29). The figures show SWs calculated in the ASA using the Lichtenstein formula, Eq. (205). Its long-wave approximation should be accurate for small q, but it underestimates the strength of J for large q [172]. This is apparent in Fig. 29, comparing the LDA and Lichtenstein formulas.
This finding seems to contradict a measurement on a quantum dot of Zb-MnAs (Ref. [177]), which predicts it to be ferromagnetic. Indeed, QSGW calculations of the same system show that spin wave frequencies are everywhere positive (left panel, Fig. 29). T c based on Heisenberg parameters extracted magnon peaks in the SW spectrum is positive, of order 600K (somewhat larger than what Okabayashi observed).
Exchange parameters J were extracted for Zbthree compositions, in LDA and for the pure MnAs case, in QSGW (right panel, Fig. 29). At 25% composition J is still positive, but the nearestneighbour J is already very small. At 50%, J is (R/a) 2 [110] [001] [110] [111] Figure 29: Left: spin waves in Zb-MnAs calculated in the GW code with QSGW potential (green) and LDA potential (blue), and also in the negative on average, and a spin glass is predicted. In pure Zb-MnAs, J oscillate in sign with neighbour distance, but the nearest neighbours (particularly along [001]) are strongly antiferromagnetic, leading to negative frequencies in the SW spectrum. QSGW shows a marked contrast. Only the NN is important, with J > 0. QSGW and LDA are so different because the LDA underestimates the exchange splitting between spin-up and spin-down 3d states, by ∼1 eV. This pushes the minority d too close to E F , and gives rise to long-range, antiferromagnetic interactions.

Response functions within DMFT
With a converged self-energy, the CTQMC can sample the two-particle Green's function [178] to obtain local spin and charge vertices. Finally, the non-local Bethe-Salpeter equations (BSE) in spin, charge and superconducting channels [178] can be solved from a local vertex and a non-local polarisation bubble in the respective channels. This allows us to compute the corresponding real [83] and imaginary part of susceptibilities [83,179] in those channels. In Fig. (30) we show the real part of the spin susceptibilities in x=0 La 2−x Sr x CuO 4 , adapted from Ref. [83]. There is a peak at q=(π, π) suggesting dominant Néel anti-ferromagnetic spin fluctuations 30.

Towards a High-Fidelity solution of the Many-Body Problem
Solving the many-electron problem with highfidelity is a formidably difficult task. Our strategy to accomplish this relies on the following premises: (i) Many-body solutions are best framed around a non-interacting starting point, and we believe that QSGW is the best choice among them, by construction.
(ii) Charge fluctuations governed by long-range interactions, but they can be treated accurately with low-order perturbation theory (iii) Spin fluctuations are governed mostly by single-site effective Hamiltonian (or action). The short-range interactions can be too strong to handle perturbatively, as can be seen by the non-analytic behaviour of one-and two-particle quantities, and their high degree of sensitivity to small perturbations, e.g. change in temperature. That being said, nonlocal contributions are much weaker and can be treated in perturbation theory. We have framed a hierarchical strategy around these premises (Fig. 31). At the lowest level is QSGW, which depends only minimally on DFT. We think this is of central importance to frame a consistently reliable theory, for reasons we have pointed out in Sec. 4.1.
Referring to the figure, QSGW is adequate for many purposes. When not, there are two routes: a perturbative, nonlocal path (1 , 2 , 3 ) : low-order diagrams are added to W to make (W → W ). To date we have added ladders to the charge channel; in progress is a project to add the electron-phonon interaction perturbatively [181], effectively adding another bosonic contribution to W . Also possible, but not yet accomplished in Questaal, is to add low order spin fluctuation diagrams such as the T matrix. All of them make a better G (G ++ in the Figure). The second path begins as nonperturbative, local path (1). From a local interaction nonlocal susceptibilities can be constructed (2). We have shown a few instances of this in Sec. 6.4, and the results are remarkably good in cases we have studied so far, e.g. Ref. [179]. Finally, the susceptibilities can be used to add a new diagrammatic contribution (3) to G (G++). The simplest addition is Dual Fermions [161].
Perhaps more satisfactory would be to have a single, unified approach, for example the Diagrammatic Monte Carlo (DiagMC) method [182]. But it is formidably difficult to make this method work practically in real systems, because of the enormous time and memory costs that realistic (as opposed to model) Hamiltonians require. We think that the JPO basis alluded to in Sec. 3.12 is perhaps the most promising framework to realise DiagMC for realistic Hamiltonians. Even in such a case an optimal approach would likely closely resemble Fig. 31, but with some cross-linking between paths (1) and (1 ), using a diagrammatic Monte Carlo solver only to include nonlocal diagrams beyond the RPA.

Software aspects
The software package has a long history and has featured a high level of modularisation from its early days. The different parts interoperate through shared interfaces and file data formats.
The implementation is mainly in procedural Fortran 77. Fixes and improvements as well as new feature code uses modern Fortran features (f2008+) for convenience as well as improved reliability and interoperability. Despite f77's shortcomings and questionable reputation, sound software engineering practices have been observed, there is (1) extensive code documentation in an uniform format, (2) almost no shared mutable state (i.e. common blocks or module variables); (3) action through side effect is avoided and data is passed through arguments, (4) unnecessary temporary data copies are avoided, (5) data structures are organised with performance in mind and most heavy number crunching is outsourced to performance libraries implementing the BLAS, LAPACK, FFTW3 APIs, (6) the code is written in an uniform style with consistent flow patterns. This practice has been extended through the use of modern version tracking and continuous integration pipelines incorporating regression, coverage and performance testing, and minimal style policy enforcement.

Release policy
We use the popular distributed version tracking system git. The public online hosted repository is hooked to continuous integration pipelines executing the steps above on each push event. New features are developed in separate feature branches, if pushed to the public online repository, these are visible to all users. When judged safe and reasonably usable, feature branches are merged to the main development branch ("lm"). After extensive use, the development branch is merged to the release branch ("master") and a new release is tagged with a numerical version in the format vmajor.minor.patchlevel. If/when issues are discovered and fixed the patchlevel is incremented in a new tag and the new commits are merged back to the development branch and from there to the feature branches. This approach reduces the maintenance burden significantly, however it does mean that once the minor version is incremented there are no more patches offered to the older versions and users are strongly encouraged to update to the latest version available. Since new developments are effectively done by interested users, there is as of yet no contractual support offered, we feel this is a justified arrangement and in the long term offers users new features and improved performance with effectively close to no risk. The major version is only incremented when a very significant, possibly incompatible rewrite of core components or the input system has been performed.
New code and changes to existing code require: (1) coding standards pass; (2) regression tests suite pass (approximately 350 tests as of this writing); and (3) test coverage of at least 90% of code.

Parallelisation
The programmes of the package support multilevel parallelisation multiprocessing through the message passing interface (MPI) and simultaneous multithreading through performance libraries and (when necessary) manual OpenMP directives. The full-potential program uses multiprocessing mostly during the Brillouin zone sampling, outside of the q-loop the MPI processes cooperate on the local potential generation, only multithreading is available within the processing of each q-point. The empirical polarisable ion tight binding program offers more flexibility in this regard by allowing simultaneous assignment of groups of processes to each q-point as well as spreading different blocks of qpoints over groups of processes [183]. A somewhat similar approach has been recently developed by Martin Lüders for the GW code. While it is available in a feature branch it is not yet considered production ready and has not been merged to the main branch. The GW code is fairly multithreaded and for systems of at least 15-20 atoms performs well on many-core wide vector architectures like Intel's Xeon Phi family of processors (tested on x200, Knight's Landing). A small but performancecritical part was also ported to Nvidia's CUDA parallel platform and shows promising performance on hardware with good double precision floating point capabilities (for example Titan, P100, V100 compute cards). The polarisable ion tight binding program makes more extensive use of CUDA; it is described in [183].
To ease parallel IO, classic fortran binary files are being moved to HDF5 based data files. This also improves interoperability with various postprocessing environments because the HDF5 libraries offer bindings to a variety of popular programming languages.

Usability
Full-potential blm automatic input file generator lmf main band code lmfa free atom solver lmfgwd interface to GW lmfdmft interface to DMFT Atomic sphere approximation lm ASA band code lmgf ASA crystal Green's function code lmpg ASA principal layer Green's function lmstr ASA structure constants tbe empirical tight-binding Post-processing and utilities fplot a graphics package pldos a DOS postprocessor plbnds a bands postprocessor lmchk checks structure-related quantities lmdos assembles partial DOS lmfgws dynamical self-energy postprocessing lmscell supercell maker mcx a matrix calculator Editors lmf|lm --rsedit restart file lm --popted Optics lmf --wsig edit Static Σ 0 lmfgws --sfuned Dynamic Σ lmf --chimedit Magnetic susceptibility lmscell --stack Superlattice An ordinary text command line is all one needs to be able to use the programmes from the package. The main executables (see Table 7) support a number of flags, notable among which is "--input", it causes the full input understood by the executable to be printed together with default values and short documentation for each token. Another very valuable and possibly unique feature is the ability to override almost any input file value through the command line through the small embedded preprocessor which renders the input transparent internally (Fig. 32). A third is the suite of special-purpose editors (see Table). There is extensive documentation online together with many tutorials and ready made example script snippets. Users are encouraged to report issues and offer ideas through the online ticketing system and contribute improvements or fixes through pull requests or inline patches.

Selected Publications from Questaal
In this section we point to original papers where new capabilities were developed within Questaal. Some the concepts are not original with Questaal, though many are.
• An early DFT calculation of the Schottky barrier height at a metal/semiconductor contact, showing the importance of screening at the interface [184] • An early calculation of alloy phase diagrams combining statistical theory and DFT [185].
• A systematic technique for deriving force theorems within DFT [9] • The first formulation of adiabatic spin dynamics within DFT [11], and its application to explain the Invar effect in permalloy [186] • A formulation of self-consistent empirical tightbinding theory [28] • A formulation of Electron Energy Loss Spectroscopy in DFT [187] • The original description of Questaal's fullpotential method, Sec. 3 [30] • The original description of Questaal's allelectron GW approximation [188], and first calculation of RPA total energy in a solid [189] • Solution of the Boltzmann transport equation combined with DFT [190] Figure 32: Basic input file: a full-potential GGA calculation for HCP Gd with 7µ B l = 3 starting moment. A preprocessor directive (nk) is used to specify the k-sampling, requesting 1000 points in the full Brillouin zone: the generated grid (12,12,7) has components corresponding to the relative BZ vector lengths. nk, like pwemax-which flags the inclusion of LAPW basis functions-can be changed from the command line. The LMTO basis is to be setup automatically, caused by the AUTOBAS token in the HAM (Hamiltonian) section.
• The original formulation of Quasiparticle Self-Consistent GW [16], demonstration of its wide applicability [80], and a detailed description [81] • Questaal's implementation of nonequilibrium electron transport in nanosystems [14], and significant applications to magnetic transport [191,21,192,23,24] • A fusion of genetic algorithms and exchange interactions to predict and optimise critical temperatures in multi-component systems [176] • Dressselhaus terms calculated ab initio with high fidelity in zincblende semiconductors [104,193,106] • First GW description of 4f systems, showing the tendency to overestimate splitting between occupied and unoccupied f states [131] • Spin wave theory within QSGW [107] • Prediction of Impact Ionisation rates with QSGW [110] • The PMT method of Sec. 3.10 [53] • A formulation of tunnelling transport within QSGW [109] • Effect of spin-orbit coupling on the QSGW selfenergy [74] • Approximate description of transient absorption spectroscopy within QSGW [194] • Questaal's first application of QSGW +DMFT, showing the need to go beyond GW when spin fluctuations are important [112].

Distribution and Licensing
Historically the code has been distributed as a set of source tarballs, in addition registered users can git-clone our online repository. The code is distributed under the terms of the General Public license version 3. Contributions under a compatible license are welcome.

Conclusions
Questaal is a descendant of one of the early allelectron methods developed in Stuttgart. It has gradually evolved to its present form, but as no paper has been written for any of its stages the present work is intended to summarise much of this evolution. Our aim was to present the many expressions from classic works, combined with previously unpublished recent developments, in a unified way to show the connections between the parts.
We made an attempt to show Questaal's strengths as well as its limits, within varying levels of theory. We think Questaal provides a very promising path to efficiently solve the electronic structure problem with high fidelity. It is unique in its potential to span such a wide range of properties and materials, and with varying levels of approximation.

Acknowledgements
Many people have contributed to the Questaal project. Much of the mathematics and basic algorithms in the lmf implementation were largely the work of Michael Methfessel. GW was formulated and implemented largely by Ferdi Aryasetiawan and Takao Kotani, and QSGW by Sergey Faleev. Martin Lüders wrote a parallel version of this code and was involved in some validation aspects. Derek Stewart wrote the original code for Landauer-Buttiker transport, and Faleev extended it to the nonequilibrium case. Pooya Azarhoosh improved on the optics and performed TAS studies. Alena Vishina redesigned the fully relativistic Dirac code, making it self-consistent. Herve Ness has redesigned significant parts of the layer Green's function technique.
Brian Cunningham and Myrta Grüning added ladder diagrams to W , and Savio Laricchia is close to completing a QSGW -based approach to the electron-phonon interaction and related quantities. Both of these additions will have significant impact in the future. Also, many thanks to Kristjan Haule and Gabi Kotliar for helping us get started in DMFT, and to Ole Andersen for being the founder and inspiration for so many of the ideas behind Questaal.
Finally, many thanks to Andy Millis and the Simons Foundation, who enriched Questaal in fundamental ways, and to EPRSC which made it possible to become a community code (CCP9 flagship project: EP/M011631/1).
The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gausscentre.eu) for funding this project by providing computing time on the GCS Supercomputer Su-perMUC at Leibniz Supercomputing Centre (www. lrz.de), the UK Materials and Molecular Modelling Hub for computational resources, which is partially funded by EPSRC (EP/P020194/1) and STFC Scientific Computing Department's SCARF cluster (www.scarf.rl.ac.uk). J.J. acknowledges support under the CCP9 project "Computational Electronic Structure of Condensed Matter", part of the Computational Science Centre for Research Communities (CoSeC).

Appendices
An augmentation radius is labelled by symbol s. Subscript R is used denote a site-dependent quantity, such as the augmentation radius s R . When R appears in a subscript of a function of r such as H RL (r), it implies that r is relative to R, i.e. r − R. When a symbol refers to angular momentum, an upper case letter such as L, refers to both angular quantum number and magnetic quantum number parts, and m.

A. Definition of Real and Spherical Harmonics
Where spherical harmonics are used, Questaal uses the same definitions as Jackson [195]: which are real polynomials in x, y, and z.
The product of two spherical harmonic polynomials can be expanded as a linear combination of these functions in the same manner as ordinary ones The Gaunt coefficients C KLM are nonzero only when k + − m is an even integer, so the r.h.s. is also a polynomial in (x, y, z).

B. On Matrix Elements of Position and Gradient Operators
Matrix elements of position and gradient operators sandwiched between two functions of the form Eq. 122 are required for optics (see Secs. 3.2 and D).
with p=−1,0,1. F p and F p differ only in their radial parts; the coupling of the three angular momenta is the same. They are special instances of Wigner 6j symbols but, because of the gradient, expansion coefficients ± must be kept separate to account for radial parts that depend on whether Y 1p Y 2,m is mapped to Y 2+1,m or Y 2−1,m (Eq. (142)).

C (±)
i; 2,m 2 ;p is a compact representation of a pdependent linear transformation mapping a linear combination Y L2 (r) to a different combination. For purposes of exposition, write it as a p-dependent matrix C

C. Definition of Hankel and Bessel Functions
To distinguish radial parts of spherical functions from solid versions, we denote spherical parts with subscript , and solid functions with subscript L, which refers to both the and m parts.
The Questaal codes generally follow Methfessel's definitions for Hankel and Bessel functions. Writing k= √ E with Im k≥0, they are related to standard spherical Neumann functions n and Bessel functions j as follows: where h 0 = Re e ikr /r and j 0 = sin(kr)/kr (C.11) H L and J L are real for any real energy, and structure constants Eq. (1) have the property S KL (R) = S * LK (−R). They can generated from the =0 functions using the operator Y L (−∇); see, e.g. Eq. (118). The latter are useful for derivations and also practically, for instance, to express the gradient of ∇H L as a linear combination of other H L .
Eq. (125) shows the explicit relation to the usual spherical Hankel function of the first kind, h (1) (z).
Spherical harmonics Y L can be substituted for the Y L in Eq. (C.11).
For historical reasons, Andersen made a different set of definitions, which are convenient for the ASA when E=0. In this paper we distinguish them from Questaal's standard definitions Eq. (C.11) with a caret. For w is a length scale which can be chosen arbitrarily (usually taken to be some average of the MT radii).