Electronic-structure methods for materials design

The accuracy and efficiency of electronic-structure methods to understand, predict and design the properties of materials have driven a new paradigm in research. Simulations can greatly accelerate the identification, characterization and optimization of materials; this acceleration is driven by continuous progress in theory, algorithms and hardware, and by the adaptation of concepts and tools from computer science. Nevertheless, the capability to identify and characterize materials relies on the predictive accuracy of the underlying physical descriptions, and on the ability to capture the complexity of realistic systems. We provide here an overview of electronic-structure methods, of their application to the prediction of materials properties, and of the different strategies employed towards the broader goals of materials design and discovery. In this Review, we discuss how electronic-structure methods such as density functional theory work, which properties they can be used to predict, and how they can be used to design materials.

INSIGHT | Review Article | Nature Materials

exact ground-state wavefunction). F[ρ] is universal, since it does not depend on the external potential; adding to it ∫ v_ext ρ dr provides a total-energy functional E_vext[ρ] that is minimized by the exact ground-state density and gives the exact ground-state energy 7. The original requirements of having non-degenerate ground states and of v-representability (that is, that the functional be defined only on charge densities that are actual ground states of local external potentials) have since been relaxed by the Levy and Lieb formulations 16.
While F[ρ] is formally well defined, it is not known. To make progress, Kohn and Sham (KS) suggested decomposing it into the sum of three functionals 17 (T_s[ρ], E_H[ρ] and E_xc[ρ]), where the first two can be explicitly defined and calculated, and the unknown leftovers are pushed into the last one. To do so, they introduced an auxiliary system of electrons not interacting with each other (the KS particles) that, when subjected to a local potential v_KS(r), yields the same ground-state charge density as the real system of interacting electrons in the external potential v_ext(r). This construction provides a definition of what v_KS is, and allows one to define T_s as the kinetic energy of the non-interacting KS particles, computed directly from the Laplacians of their single-particle orbitals (thus, T_s is an implicit functional of the density but an explicit functional of the KS orbitals). E_H[ρ] is then set to be the classical, electrostatic ('Hartree') energy of the charge density ρ. This leaves to the 'exchange-correlation' (xc) functional E_xc[ρ] the task of recovering the exact energy by adding the missing parts of the kinetic energy of the interacting electrons not captured by the KS particles, and of the electron-electron interactions not captured by Hartree electrostatics. Crucially, T_s and E_H capture a large fraction of F[ρ], albeit at the price of reintroducing orbitals to calculate T_s; the remaining xc functional E_xc has to be approximated, and its quality determines the accuracy of the calculations.
The ground-state energy can then be obtained by direct minimization of the total-energy functional E_vext[ρ], or equivalently by solving the Euler-Lagrange equations associated with the variational principle. These are known as the Kohn-Sham equations and, assuming discrete occupied states, read:
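In standard notation (Hartree atomic units), with the functional derivatives v_H = δE_H/δρ and v_xc = δE_xc/δρ defined in the following paragraph, the KS equations (1) and (2) take the form:

```latex
\left(-\tfrac{1}{2}\nabla^{2} + v_{\mathrm{ext}}(\mathbf{r}) + v_{\mathrm{H}}(\mathbf{r}) + v_{\mathrm{xc}}(\mathbf{r})\right)\varphi_{n}(\mathbf{r}) = \varepsilon_{n}\,\varphi_{n}(\mathbf{r}) \tag{1}

\rho(\mathbf{r}) = \sum_{n}^{\mathrm{occ}} \left|\varphi_{n}(\mathbf{r})\right|^{2} \tag{2}
```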

Box 1 | Hierarchies of electronic-structure methods
The wavefunction domain. Greatly developed in quantum chemistry, the wavefunction domain starts from Hartree-Fock (HF) and post-Hartree-Fock approaches (a broad class of methods 184 ranging from Møller-Plesset (MP) perturbation theory to coupled cluster (CC) and configuration interaction (CI)) to deliver improved accuracy. Some of these approaches have been extended to treat solid-state systems, also in combination with stochastic sampling 185, with quantum Monte Carlo (QMC) having a long history of delivering accurate results for materials 186,187. Remarkably, the total energy can also be written as an explicit functional of the second-order reduced density matrix γ2, but the conditions under which γ2 is the contraction of a proper wavefunction are not known, precluding variational searches. These conditions are known for the first-order γ1, but the correlation functional of reduced-density-matrix functional theory (RDMFT) 188 is unknown and has to be approximated.
The spectral domain. The spectral domain targets spectral properties with many-body Green's function methods; these introduce diagrammatic approximations for the non-local (that is, a function of two space variables r and r′) and dynamical (that is, a function of the frequency ω) self-energy Σ(r, r′, ω), allowing one to obtain the one-particle Green's function and thus the spectral function and the total energy. Beyond, there is a hierarchy of equations of motion for the n-particle Green's function G_n in terms of the G_{n+1}. Dynamical mean-field theory (DMFT) 57,58 builds a self-energy for a localized manifold (typically for some d or f electrons) through the infinite-dimensional limit of the Anderson impurity model, obtaining an orbital-local or short-ranged G^DMFT_loc(ω) that can describe strong correlations. Extensions include dynamical cluster approximations or the combination of DMFT and the GW approximation 59.
In spectral formulations, the quasiparticle weights (determined from the derivatives of the self-energy with respect to frequency) do not have to be integers, and complex features such as satellites and side bands can emerge. Functionals dependent on orbital densities ρ(r, i), where i labels the different orbitals, can give rise to local and orbital-dependent self-energies 189 aimed at reproducing the spectral properties 190 of the interacting system. The resulting approaches are then flexible enough to address both total energies and spectral properties 25,191 .
The time-dependent and non-equilibrium domain. This domain is under very active development, also driven by the experimental capabilities to probe electron dynamics at ultrashort timescales. In time-dependent DFT (TDDFT) 64,192 , a correspondence is established between the time-dependent charge density and the external time-dependent potential and initial wavefunction; a KS picture then leads to a time-dependent exchange-correlation potential that depends on the entire history of the system. TDDFT can target the real-time evolution of a system of interacting electrons, but can also describe neutral excitations, since in exact TDDFT these are given by the poles of the density-response function 193 .
While practical TDDFT is still approximate, it asks the right question with respect to excitations (where are the poles?), and even adiabatic functionals (which are not history dependent) provide very good results for molecules. In solids, simple xc approximations do not bind excitons, but progress can be made by directly approximating (in linear response) the f_xc kernel 64, or by using potentials from hybrid functionals 49 or DFT+U 194. The latter lead to time-dependent Hamiltonians with an effective non-local screened exchange, mirroring real-time approaches 66,175 to the solution of the Bethe-Salpeter equation (see Box 2). Last, non-equilibrium Green's functions (NEGF) and the Kadanoff-Baym equations 65 allow for time-dependent generalizations of many-body perturbation theory that can, for example, address transients and time-resolved spectroscopies 63.
Box 1 figure | Electronic-structure methods. The complex landscape of electronic-structure methods is captured here, grouping hierarchies of methods (across the wavefunction, spectral, and time-dependent/non-equilibrium domains) that progressively extend scope and accuracy while increasing cost and complexity.

where v_H and v_xc are the functional derivatives with respect to ρ(r) of E_H and E_xc, φ_n and ε_n are the nth KS eigenstate and eigenvalue, and the sum runs over the occupied states. The solution of the KS equations needs to be self-consistent, since v_H and v_xc depend on the charge density ρ(r).
Exact properties. If the exact xc functional were known, ground-state total energies and charge densities would be exact. Quantities directly related to these, namely first-order and higher-order derivatives of

Box 2 | Green's function methods
Green's function (GF) methods 65,66 provide a systematic approach to address electronic excitations (for example, band structures or optical spectra) and, in principle, total energies. Here the fundamental quantity is G(rt, r′t′), the one-particle GF describing the time propagation of an interacting system when one electron is added at r′t′ and then removed at rt (vice versa for the hole; either case addresses a charged excitation). The GF obeys the Dyson equation G = G0 + G0 Σxc G, where Σxc is the electronic self-energy capturing the many-body complexity and G0 is the GF of the Hartree system; equation (3) is an eigenvalue problem for the Dyson orbitals f_n(r) that, through equation (4), provide the Fourier transform of the one-particle GF. It is remarkable that these coupled equations actually mirror the KS equations (1) and (2) for orbitals and density. The key difference lies in having a non-local and dynamical self-energy Σxc(r, r′, ω) rather than a local exchange-correlation potential.
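Equations (3) and (4) referenced here can be written, in a standard form consistent with the Hartree reference system and the Dyson orbitals f_n(r) and energies ϵ_n discussed in the text, as:

```latex
\left(-\tfrac{1}{2}\nabla^{2} + v_{\mathrm{ext}}(\mathbf{r}) + v_{\mathrm{H}}(\mathbf{r})\right) f_{n}(\mathbf{r})
+ \int \Sigma_{\mathrm{xc}}(\mathbf{r},\mathbf{r}',\epsilon_{n})\, f_{n}(\mathbf{r}')\, \mathrm{d}\mathbf{r}'
= \epsilon_{n}\, f_{n}(\mathbf{r}) \tag{3}

G(\mathbf{r},\mathbf{r}',\omega) = \sum_{n} \frac{f_{n}(\mathbf{r})\, f_{n}^{*}(\mathbf{r}')}{\omega - \epsilon_{n} \pm i\eta} \tag{4}
```

with the sign of the infinitesimal iη chosen above or below the chemical potential for addition and removal, respectively.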
The eigenvalues ϵ_n, the poles of the GF, are total-energy differences for the addition or removal of an electron from the interacting system. The Dyson orbitals f_n(r) are neither orthogonal nor normalized; for the occupied ones (which can be many more than N), the sum of their weights integrates to N. In the thermodynamic limit, the poles of the GF merge into a branch-cut continuum, giving rise to more complex spectral features, such as the broadening of the non-interacting peaks (hence, 'quasiparticles') and the emergence of satellites and side bands (see Fig. 7.2 in ref. 66). These can be experimentally detected by means of photoemission spectroscopy 61.
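As a small illustration of how a complex self-energy broadens a pole of the GF into a quasiparticle peak, the sketch below (hypothetical numbers, not from the Review) evaluates the spectral function A(ω) = −(1/π) Im G(ω) for a constant self-energy Σ = Δ − iΓ:

```python
import numpy as np

def spectral_function(omega, eps0, sigma):
    """A(w) = -(1/pi) Im G(w), with G(w) = 1 / (w - eps0 - Sigma)."""
    g = 1.0 / (omega - eps0 - sigma)
    return -g.imag / np.pi

# Toy constant self-energy: a real shift Delta and a broadening Gamma
# (hypothetical numbers, for illustration only)
eps0, delta, gamma = 0.0, 0.5, 0.1
omega = np.linspace(-2.0, 3.0, 5001)
a = spectral_function(omega, eps0, delta - 1j * gamma)

peak = omega[np.argmax(a)]   # quasiparticle energy ~ eps0 + Delta
weight = np.trapz(a, omega)  # close to 1: a single, broadened pole
```

The shift Δ moves the quasiparticle energy away from the bare ε0, while Γ sets the Lorentzian linewidth (inverse lifetime); a frequency-dependent Σ would additionally redistribute weight into satellites.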
Many-body perturbation theory constructs approximations to the self-energy by means of Feynman diagrams 66; only certain classes are included, giving rise to many, ever-more-complex approaches. The most widely used is the GW approximation 114,183,196, Σxc = iGW, where W is the screened Coulomb interaction in the random-phase approximation 66. Most of the time, GW is applied as a one-shot correction (G0W0) on top of KS states and energies. Self-consistency can improve the accuracy of excitation energies 197 and, by recovering the missing particle conservation 198, also of the total energies. Extensions beyond GW add further diagrams to Σxc, from vertex corrections 199 to cumulant expansions 200. Diagrams describing the interaction with other excitations can also be added, and can be very relevant to predict or interpret experiments, as shown in the figure below.
The two-particle Green's function G2 describes the dynamics of two particles (electrons and holes) added to the system and contains, among other things, information about neutral excitations and the response functions required to address optical properties and other spectroscopies. Taking the variation of G with respect to an arbitrary external perturbation leads 65,66 to the Bethe-Salpeter equation (BSE), a Dyson-like equation for the two-particle correlation function L = −G2 + GG. Besides the bare exchange coming from the Hartree potential, the kernel of the BSE contains K_xc = δΣxc/δG, paralleling the exchange-correlation kernel f_xc = δv_xc/δρ of TDDFT 193. In the GW approximation, and neglecting the dependence of W on G, the BSE kernel becomes K_xc = iW, accounting for electron-hole interactions. The BSE can then be cast in the form of an eigenvalue problem for a two-particle Hamiltonian that can address excitonic effects 66. In practical calculations 172,173, it requires as a preliminary step the GW orbital energies and the screened Coulomb interaction W.

the energy, or expectation values of local single-particle operators, would also be exact. So, at the Born-Oppenheimer level, structural, elastic, dielectric, vibrational and thermodynamical properties of materials 13,18 could all be described correctly, provided zero-point motion 19 and finite-temperature anharmonicity 20 were accounted for. In contrast, only a single spectral property would be explicitly reproduced by the KS particles: in a molecule (and in a solid, provided the vacuum level could be accessed 21), exact KS DFT would recover the true highest occupied molecular orbital (HOMO) as its highest occupied KS eigenvalue, since this latter determines the asymptotic decay of the charge density, which needs to be exact. The exact KS HOMO is also the opposite of the exact ionization potential, owing to Janak's theorem and the piecewise linearity of the total energy with respect to the total number of electrons 21. No other KS level has a direct physical interpretation: even the position of the exact KS lowest unoccupied molecular orbital (LUMO) is incorrect, making the KS gap different from the physical one 22 by the so-called derivative discontinuity (see below). Nevertheless, the optimized-effective-potential approach 23 shows that the KS potential is the variationally best local and static approximation to the electronic self-energy (Box 2); this lends support to using the KS eigenvalues as orbital energies and band structures (since we do not have the exact KS functional, this is, except for the HOMO, a double approximation). In a finite system, one can take the alternative Δ-SCF approach, where the ionization potential, electron affinity and HOMO-LUMO gap are obtained (exactly, in exact DFT) as total-energy differences from separate self-consistent field (SCF) calculations performed with N − 1, N and N + 1 electrons, rather than as eigenvalues of a single calculation.
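The Δ-SCF prescription reduces to simple total-energy arithmetic; a minimal sketch, with hypothetical total energies in place of actual SCF results:

```python
def delta_scf(e_nm1, e_n, e_np1):
    """Ionization potential, electron affinity and fundamental gap from
    three separate SCF total energies with N-1, N and N+1 electrons."""
    ip = e_nm1 - e_n   # IP = E(N-1) - E(N)
    ea = e_n - e_np1   # EA = E(N) - E(N+1)
    return ip, ea, ip - ea

# Hypothetical total energies in eV (illustrative only, not real data)
ip, ea, gap = delta_scf(-100.0, -110.5, -111.2)
# ip ~ 10.5 eV, ea ~ 0.7 eV, gap ~ 9.8 eV
```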
For commonly used functionals, Δ-SCF breaks down in the thermodynamic limit of infinite systems, and the results asymptotically approach the KS gap 21,24,25. The exact functional leads instead to a potential that has a discontinuous (uniform in space) jump when an infinitesimal fraction of an electron is added to the system 21. Intuitively, this derivative discontinuity is present because, in exact KS DFT, the incorrect LUMO needs to become the correct HOMO as soon as it is infinitesimally filled; this behaviour is very difficult to capture in approximate functionals, and the discontinuity is exactly the missing quantity that must be added to the exact but unphysical KS gap to retrieve the correct fundamental one 21,22. This requirement makes exact KS DFT highly non-trivial 21 and challenging to approximate or improve. Generalized KS schemes where the KS potential itself becomes non-local, such as meta-generalized-gradient approximations (meta-GGAs) 26, DFT+U 27 or hybrid functionals 28, mitigate some of these failures 14,24. Last, we note that, owing to the first Hohenberg-Kohn theorem, there exist in principle functionals of the density for any physical observable of the ground-state wavefunction, and not only those mentioned earlier. Nevertheless, these functionals are in general not known, and an independent variational principle for them might not exist.

Density functional practice.
To perform an actual calculation, one needs to address the numerical solution of the KS equations and to approximate the xc functional. The numerical solution 29 involves algorithmic choices (for example, direct minimization or iterative diagonalization); representation of the orbitals, with a wide variety of basis sets possible (atomic or numerical orbitals, wavelets, plane waves, linearized augmented plane waves, real-space grids), either with an all-electron or a pseudopotential treatment of the electron-ion interactions 30; and numerical methods (for example, Monkhorst-Pack sampling and finite-temperature smearing). This variety translates into the many widely used electronic-structure codes available nowadays (see, for example, Table 1 of ref. 31). For materials simulations, plane-wave and augmented-wave approaches, naturally embodying periodicity, are among the most common. Localized basis sets become advantageous when dealing with large-scale applications, for which linear-scaling methods have been developed and applied 32 (standard DFT calculations scale cubically, owing to the orthonormality constraint on the KS orbitals in equation (1)). Linear scaling is also achieved if the non-interacting kinetic energy T_s[ρ] is approximated directly 33. Finally, stochastic approaches 34, even with sublinear scaling, have been proposed and extended to excited-state properties.
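The self-consistency requirement mentioned above can be illustrated with a toy fixed-point iteration; here a scalar stands in for the density ρ(r), and the 'KS map' is a hypothetical contraction, with the simple linear density mixing used in real codes:

```python
def scf(ks_map, rho0, alpha=0.3, tol=1e-10, max_iter=500):
    """Self-consistent fixed point rho = ks_map(rho) via linear mixing:
    rho <- (1 - alpha) * rho + alpha * rho_out."""
    rho = rho0
    for _ in range(max_iter):
        rho_out = ks_map(rho)
        if abs(rho_out - rho) < tol:
            return rho_out
        rho = (1.0 - alpha) * rho + alpha * rho_out
    raise RuntimeError("SCF did not converge")

# Toy 'KS map' (hypothetical; a scalar stands in for the density):
# fixed point at rho = 2
rho_star = scf(lambda r: 0.5 * r + 1.0, rho0=0.0)
```

Production codes replace the scalar with the full density and use more sophisticated mixers (for example, Pulay/DIIS), but the self-consistency logic is the same.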
The quest for approximations to the xc functional is a more fundamental one. The local-density approximation (LDA) was already made in the original KS formulation 17, applying Fermi's suggestion to the xc functional rather than to the kinetic-energy functional. Although unknown at the time, LDA satisfies an exact sum rule for the xc hole 35; it is this exact constraint, together with the intuition of building the non-interacting KS system to approximate the kinetic energy, that has allowed DFT to describe real materials with predictive accuracy. The treatment of spin polarization and magnetism is very challenging, and extensions that introduce spin-resolved densities, functionals and spinors for non-collinear magnetism are then used 16. Improving on LDA and its spin-resolved extension, which get so many things correct in the first instance, is a challenge. One approach is to introduce the gradients of the charge density, and then the kinetic-energy density, while satisfying as many exact and asymptotic constraints as possible. These GGAs 36 and meta-GGAs 26 currently represent, especially with PBEsol and SCAN, some of the most used all-purpose functionals for materials simulations 13; still, challenges remain 37. Alternatively, one can train on higher-accuracy calculations, as is done for B3LYP 38, the Minnesota functionals 39 or those based on machine learning (for example, Bayesian error estimation or neural networks). Direct approximations of the KS potential, rather than the functional, have also been attempted: these include mBJ 40 (a meta-GGA) and GLLB-SC 41 (estimating the derivative discontinuity); both are used to improve bandgap predictions. Also, the quality of xc functionals can be improved, at substantial computational cost, through the adiabatic-connection fluctuation-dissipation theorem 23,42.
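For concreteness, the exchange part of LDA has a closed form, the Dirac/Slater expression E_x^LDA[ρ] = −(3/4)(3/π)^(1/3) ∫ ρ^(4/3) dr (Hartree atomic units); a minimal numerical sketch on a grid:

```python
import numpy as np

C_X = -(3.0 / 4.0) * (3.0 / np.pi) ** (1.0 / 3.0)  # Dirac exchange constant

def lda_exchange_energy(rho, dv):
    """E_x^LDA = C_x * sum_i rho_i^(4/3) * dV (Hartree atomic units),
    for a density sampled on a uniform grid with volume element dV."""
    return C_X * np.sum(np.asarray(rho) ** (4.0 / 3.0)) * dv

# Sanity check on a uniform density of 1 in a unit volume:
# the result is just C_x ~ -0.7386 Hartree
rho = np.full(1000, 1.0)
e_x = lda_exchange_energy(rho, dv=1.0 / 1000)
```

The correlation part of LDA has no closed form and is parametrized from QMC data for the homogeneous electron gas; it is omitted here for brevity.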
Remarkably, most functionals are incorrect in one of the simplest cases possible, one-electron systems, where the Hartree and xc terms should cancel exactly, as happens in Hartree-Fock, to deliver the same solution as the Schrödinger equation 43. The orbital-density-dependent Perdew-Zunger correction 44 addresses this self-interaction error (SIE) in many-electron systems, but may nevertheless require downscaling 45. A number of authors have connected SIEs to the lack of piecewise linearity of the approximate total energy as a function of the number of electrons 21, often heuristically extended to fractional occupations of deeper orbitals or manifolds 43,46,47. Broadly speaking, short-ranged hybrids or Hubbard functionals 27,28,46 improve total energies, while other range-separated or Koopmans functionals 25,48,49 improve spectral properties. In fact, the success of hybrid functionals 28,50,51, obtained by adding an appropriate fraction of full or range-separated Fock exchange, can also be rationalized as correcting SIEs and imposing some piecewise linearity.
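The piecewise-linearity condition can be checked numerically; the sketch below measures the deviation of a toy, hypothetical E(N) with spurious quadratic curvature (mimicking an SIE-affected functional) from the straight line that the exact functional would follow between integer occupations:

```python
def linearity_deviation(e_of_n, n0, n1, samples=101):
    """Maximum deviation of E(N) from the straight line joining E(n0)
    and E(n1) for fractional N; the exact functional would give zero."""
    dev = 0.0
    for i in range(samples):
        f = i / (samples - 1)
        n = n0 + f * (n1 - n0)
        linear = (1.0 - f) * e_of_n(n0) + f * e_of_n(n1)
        dev = max(dev, abs(e_of_n(n) - linear))
    return dev

# Toy 'approximate functional' with spurious curvature (hypothetical):
# E(N) = -5 N + 0.2 N^2, so the maximum deviation is 0.2 / 4 = 0.05
dev = linearity_deviation(lambda n: -5.0 * n + 0.2 * n * n, 10, 11)
```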
The local nature of many approximations (the exact xc potential is local, but depends non-locally on the density everywhere) affects other key properties, from image potentials at metal surfaces, to the 1/r decay of v_KS(r) and the Rydberg series in atoms and molecules, to long-range van der Waals interactions. The latter interactions are particularly important for materials, from two-dimensional heterostructures to molecular and organic solids, but notable progress has been made, and empirical or first-principles formulations have emerged 52 that have greatly improved predictive accuracy.
The challenges. Some of the most important materials for scientific advance and technological innovation remain challenging. Examples include mixed-valence systems 53. In addition, new experimental capabilities (from time-resolved spectroscopies 62,63 to free-electron lasers) call for additional theoretical developments and implementations 64-67. To provide some guidance in this complex landscape, we summarize in Box 1 many of the electronic-structure methods in use today, and in Box 2 Green's function methods, of central relevance for the calculation of spectroscopic properties.

Predictions of materials properties and spectroscopies
The power of first-principles computational materials science has now become apparent owing to the wide range of properties and spectroscopies that can be studied for systems of increasing complexity 66,68 . As guidance, we sketch in Tables 1 and 2 the very rich phenomenology that can be addressed nowadays, providing pointers to the models, theories and electronic-structure toolbox that can be used to make these predictions. In the following, we present some selected subjects that are most relevant to materials, and highlight in Figs. 1 and 2 some milestone approaches that are now widely used.
Thermodynamic ensembles and realistic conditions. Volume and pressure are not only the simplest thermodynamic variables to control in periodic-boundary conditions but also some of the most powerful, owing to the relevance and phenomenological richness of high-pressure physics. In fact, first-principles simulations have had some of their early and notable impact in the field of materials under high pressure 69,70, effortlessly imposing extreme conditions that might not even be possible in earthly experiments.

Table caption: If the exact functional were known, these properties would be exactly reproduced, provided all the microscopic phenomenology were included. The second column points to the broader concepts, models and phenomenology from physics, chemistry and materials science; the third column highlights the electronic-structure quantities, algorithms and theories that are typically coded. Band-structure properties require more advanced methods, but are often approximated using KS states. STM, scanning tunnelling microscopy; AFM, atomic force microscopy; TEM, transmission electron microscopy; QM-MM, quantum mechanics-molecular mechanics; el, electron; ph, phonon; PAW, projector augmented wave.

Temperature represents the first major computational challenge, since it requires probing atomic vibrations and motion, and a breakthrough came with Car-Parrinello molecular dynamics 71 , evolving simultaneously the nuclear and electronic degrees of freedom through an effective Lagrangian 72 . Other approaches followed, based on robust direct-minimization algorithms 73 or efficient iterative ones 74 ; these could be applied equally well to insulators or metals 74,75 . Thermostats and barostats can also couple simulations 76 to reservoirs to maintain the desired temperature or pressure. First-principles molecular dynamics considers every nucleus as a distinguishable entity, and so the resulting statistics is the Maxwell-Boltzmann one of classical particles; imposing the correct Bose-Einstein statistics requires path-integral sampling 77 or coloured-noise thermostats 78 . In the harmonic approximation, the dynamics of crystalline solids is that of an ensemble of independent oscillators, for which the Bose-Einstein partition function can be calculated analytically. The capability to calculate phonon dispersions with density functional perturbation theory 18,79,80 or with finite differences allows one to derive the Helmholtz free energy and, in the quasi-harmonic approximation, the temperature dependence of the lattice parameter and the elastic constants 18 . Stronger anharmonicity can be tackled with molecular dynamics, where free energies can be obtained from thermodynamic integration 81 , with higher-order interatomic force constants or by sampling the phonon Green's function 20 . Collective variables and metadynamics 82 allow one to capture the most relevant degrees of freedom and determine their free energy landscape.
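In the harmonic approximation mentioned above, the vibrational Helmholtz free energy has the analytic form F_vib(T) = Σ_i [ħω_i/2 + k_B T ln(1 − e^(−ħω_i/k_B T))]; a minimal sketch, with a hypothetical set of mode energies standing in for a sampled phonon spectrum:

```python
import numpy as np

K_B = 8.617333262e-5  # Boltzmann constant in eV/K

def harmonic_free_energy(mode_energies_ev, temperature):
    """F_vib(T) = sum_i [ hw_i / 2 + kT ln(1 - exp(-hw_i / kT)) ], in eV,
    for a set of harmonic-mode energies hw_i (in eV)."""
    hw = np.asarray(mode_energies_ev)
    zero_point = 0.5 * np.sum(hw)
    if temperature == 0.0:
        return zero_point
    kt = K_B * temperature
    return zero_point + kt * np.sum(np.log1p(-np.exp(-hw / kt)))

# Hypothetical mode energies in eV, standing in for a phonon spectrum
modes = [0.010, 0.020, 0.030]
f0 = harmonic_free_energy(modes, 0.0)      # zero-point energy, 0.030 eV
f300 = harmonic_free_energy(modes, 300.0)  # thermal term lowers F
```

Evaluating this free energy on a grid of volumes, using volume-dependent frequencies, is the essence of the quasi-harmonic approximation.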
In alloys, the dependence on composition can be studied by calculating free energies as a function of chemical potential to determine phase coexistence and composition-temperature phase diagrams (see 'Building model Hamiltonians' below); if one of the phases is liquid, full molecular dynamics simulations are required 69 . Ensemble simulations can also be applied to paramagnetic phases 83 . Surface science has been another early driver towards the exploration of materials under realistic conditions: first rationalizing the descriptors driving reactivity 84 , exploring the complex potential energy surfaces of molecular chemisorption 85,86 and surface reconstructions 87 , and extending the dependence on chemical potentials to the partial pressure of a gas in equilibrium with a bulk or with a surface 88 or to electrochemical potentials 86,89 and pH 90 . Fittingly, the first high-throughput search for novel materials was driven by the quest for heterogeneous catalysts 91 .
Multiscale and multiphysics. The scope of materials properties that can be predicted from first principles can be broadened further by multiscale or multiphysics simulations. The potential-energy surface of crystals can be fitted to first-principles calculations, to study phase transitions, for example, in bulk ferroelectrics 92 or in complex geometries 93. Such an approach has now been greatly expanded owing to progress in machine learning, where the potential-energy surface can be learnt on-the-fly 94,95, fit to neural networks 96 or featurized in kernel-regression methods 97 building on vast amounts of consistent calculations. Wannier Hamiltonians 98 can capture the building blocks of electronic structure, providing tight-binding parametrizations from first principles that can be used, for example, to study device operations; state-of-the-art methods can also directly access the properties of realistic nanostructures 99-101 and interfaces 62,102.

Table 1 footnote: SERS, surface-enhanced Raman spectroscopy; SEIRAS, surface-enhanced infrared absorption spectroscopy; SFG, sum-frequency generation; IETS, inelastic electron tunnelling spectroscopy; HREELS, high-resolution electron energy loss spectroscopy; INS, inelastic neutron scattering; ARPES, angle-resolved photoemission spectroscopy; UPS, ultraviolet photoemission spectroscopy; STS, scanning tunnelling spectroscopy; IXS, inelastic X-ray scattering; EELS, electron energy loss spectroscopy; NMR, nuclear magnetic resonance; EPR, electron paramagnetic resonance; MCD, magnetic circular dichroism; XPS, X-ray photoelectron spectroscopy; XAS, X-ray absorption spectroscopy; XANES, X-ray absorption near-edge structure; EXAFS, extended X-ray absorption fine structure; RIXS, resonant inelastic X-ray scattering; XMCD, X-ray magnetic circular dichroism; QED, quantum electrodynamics; q, quasi-momentum transfer.
A notable example of multiphysics simulations is that of transport theories, where the microscopic dynamics of excitations is abstracted into the dynamics of their populations through master equations 103, such as the Boltzmann 104 or Wigner 105 transport equations, extended also to non-equilibrium dynamics 106. The finite excitation lifetimes due to the scattering mechanisms that limit transport, either extrinsic (such as defects) or intrinsic (most commonly, electron-phonon and phonon-phonon scattering), can be obtained from Fermi's golden rule, for example, through the evaluation of the probability amplitudes for three-body events (these can be obtained from the third derivatives of the energy functional 19,107). Green-Kubo relations can also provide transport coefficients directly from molecular dynamics 108.
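A Green-Kubo coefficient is the time integral of an equilibrium flux autocorrelation function; the sketch below (thermodynamic prefactors omitted) uses a hypothetical exponentially decaying autocorrelation, for which the exact integral C0·τ is known:

```python
import numpy as np

def green_kubo(acf, dt):
    """Transport coefficient (thermodynamic prefactors omitted) as the
    time integral of an equilibrium flux autocorrelation <J(0)J(t)>."""
    return np.trapz(acf, dx=dt)

# Hypothetical autocorrelation C0 * exp(-t / tau); exact integral C0 * tau
c0, tau, dt = 2.0, 0.5, 1e-3
t = np.arange(0.0, 20.0 * tau, dt)
coeff = green_kubo(c0 * np.exp(-t / tau), dt)  # ~ c0 * tau = 1.0
```

In practice the autocorrelation is sampled from a molecular dynamics trajectory, and convergence with respect to the integration window must be checked.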
Structural materials have seen the development of multiscale methods to understand fracture, adhesion, yield strength and other quantities associated with the materials microstructure. First-principles calculations can then provide microscopic parameters such as misfit volumes 109 for phenomenological models at different scales, atomic forces to learn on-the-fly 95 , and stacking faults for dislocation dynamics 110 .

Materials spectroscopies.
Spectroscopies are central to the characterization of materials: matching accurate experimental measurements with first-principles predictions provides both validation and understanding, and a causal relation between electronic and atomic phenomena and their macroscopic effects 111,112. We summarize most of the techniques accessible to first-principles simulations in Table 2, using a classification that tries to follow the electronic-structure methods employed. Broadly speaking, there are spectroscopies that can be addressed by DFT (for example, some vibrational 18,80 and magnetic 113 spectroscopies, often addressed with density functional perturbation theory), and those that require the calculation of electronic excitations (charged excitations 66,114, where the number of electrons in the system varies, as in a photoemission experiment, or neutral excitations, where the number remains constant and electron-hole pairs are created, as in an absorption experiment 66). This latter distinction recalls also the different gaps in materials, where the fundamental or transport gap is determined by the valence and conduction band edges, whereas the optical gap is lowered owing to the presence of excitons. The correct description of excitations is thus also very relevant to transport 115,116, in addition to its formal connection at zero bias to linear response 65. Owing to their importance and the requirement to go beyond DFT, we have dedicated part of Box 1 and the whole of Box 2 to spectral and non-equilibrium (time-dependent) electronic-structure approaches.

Figure caption (partial): c, First-principles thermodynamics: phase diagram of Ga1−xAsxSb (coexistence and spinodal curves, and composition X_MG and temperature T_MG of the miscibility gap) (ref. 129). d, Density functional perturbation theory: phonon dispersions in GaAs (ref. 79). e, High-throughput materials discovery: formation enthalpies of multinary alloys 91. Credit: panels reproduced with permission from: a, ref. 166

Electronic structure and topology. The topological properties of electrons in solids 117,118 have rapidly taken a prominent role in current condensed-matter physics and materials science. While these first manifested themselves with the integer and fractional quantum Hall effects, in electronic-structure methods geometric phases emerged with the modern theory of polarization 119, driven by the need to calculate the electronic polarization, a non-trivial task because the position operator is ill defined in a periodic solid. It was shown that the polarization, when recast as an integral of the linear response to an electric field 120, measures an adiabatic flow of current. This can be computed as the Berry phase acquired by the wavefunctions under parallel transport 121 or, equivalently, from the displacements of the Wannier function centres 98, which represent an exact mapping of the dielectric response onto classical electrostatics.
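On a discretized loop, the Berry phase can be evaluated as φ = −Im ln Π_k ⟨u_k|u_(k+1)⟩; the sketch below applies this to a toy two-component state (a hypothetical spin-1/2-like model, not a real material), for which the accumulated phase has magnitude π:

```python
import numpy as np

def berry_phase(states):
    """Discrete Berry phase, -Im ln prod_k <u_k|u_{k+1}>, over a closed
    loop of normalized states (the last neighbour wraps to the first)."""
    prod = 1.0 + 0.0j
    for k in range(len(states)):
        prod *= np.vdot(states[k], states[(k + 1) % len(states)])
    return -np.angle(prod)

# Toy two-component states tracing a cone (hypothetical model):
# |u(k)> = (cos(a/2), sin(a/2) e^{ik}); for a = pi/2 the Berry phase
# over the loop has magnitude pi
a, n = np.pi / 2, 400
states = [np.array([np.cos(a / 2), np.sin(a / 2) * np.exp(1j * k)])
          for k in 2 * np.pi * np.arange(n) / n]
phi = berry_phase(states)
```

The same product-of-overlaps construction, applied to Bloch functions along a string of k-points, is how the electronic polarization is evaluated in practice.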
A complementary effort has also given rise to the modern theory of magnetization 122,123; notably, an implication of these theories is that the xc functional may also depend on macroscopic quantities such as the polarization 124. The topological properties of materials, from Chern and quantum spin Hall insulators to Weyl semimetals, are nowadays very actively investigated with electronic-structure methods 125, even charting the entire landscape of known materials 126.

Materials' design and discovery
Owing to the accuracy and efficiency of first-principles simulations, and to the exponentially growing availability of computational power, not only has the range of calculated materials properties expanded dramatically, but the use of these properties to design and discover novel materials has also become an extremely active and growing research area. We identify four main approaches to first-principles design and discovery, and briefly outline each one here: model Hamiltonians, structure prediction, high throughput and data driven.

Building model Hamiltonians. In this approach, calculated materials properties are mapped onto an explicit parametric model, which can then be explored more extensively than is possible with direct, first-principles calculations. One of the most famous examples comes from alloy theory in the form of the cluster expansion 127 (CE). The CE can address many problems in materials science that require the total energies of a large number of substitutional structures. Examples include the aforementioned composition-temperature phase diagrams of alloys, but also the search for the lowest-energy structure among substitutional configurations of lattice sites by different species, the calculation of the energetics of random alloys, and the stability of differently oriented superlattices, substitutional impurities or antiphase boundaries. In the CE, the energy of any atomic configuration on a lattice is written in a generalized Ising-like form in terms of products of single-site spin variables over clusters of atoms, where the coefficients of each term represent the effective interaction for that cluster. First-principles energies for selected sets of substitutional arrangements can then be fitted to the CE, yielding the interactions. The resulting parametric CE can then be used in statistical mechanics simulations (such as Monte Carlo) or enumeration algorithms to solve the problems listed above, or inspire the choice of special 'quasirandom' periodic structures that target the correlation functions of more complex disordered materials 128 . The use of first-principles calculations in the cluster expansion was pioneered in the 1980s and 1990s 129,130 and the approach has proven to be applicable to a wide range of systems, including metal alloys 131 , semiconductor alloys 129 and oxides 132 , but also vacancy ordering in superconductors 133 and batteries 134 , surface ordering and precipitate shapes 135 .
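The fitting step can be illustrated with a deliberately small sketch: a one-dimensional binary ring whose 'first-principles' energies are generated from a hidden point and nearest-neighbour pair interaction, and whose effective cluster interactions (ECIs) are then recovered by linear least squares. The lattice size, interaction values and function names are illustrative assumptions, not data from the text; in a real workflow the energies would come from DFT total-energy calculations.

```python
import numpy as np
from itertools import product

# Toy "DFT" energies from a hidden point and nearest-neighbour pair interaction
# on a 1D ring of 6 sites (illustrative values).
N = 6
J_TRUE, H_TRUE, E0_TRUE = 0.25, -0.1, -1.0

def true_energy(spins):
    pair = sum(spins[i] * spins[(i + 1) % N] for i in range(N))
    return E0_TRUE + H_TRUE * sum(spins) + J_TRUE * pair

def correlations(spins):
    """Cluster correlation functions: empty cluster, point, nearest-neighbour pair."""
    point = np.mean(spins)
    pair = np.mean([spins[i] * spins[(i + 1) % N] for i in range(N)])
    return np.array([1.0, point, pair])

# Design matrix over all 2^6 spin configurations; fit the ECIs by least squares.
configs = [np.array(s) for s in product([-1, 1], repeat=N)]
X = np.array([correlations(s) for s in configs])
y = np.array([true_energy(s) for s in configs])
eci, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Because the toy energies are exactly representable by the chosen clusters, the fit recovers the hidden interactions exactly (scaled by the number of sites, since the correlations are per-site averages); with real DFT data, the truncation of the cluster basis makes the choice of clusters and training structures the central practical question.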
The CE is only an example of a much broader class of model-building approaches, but represents well the key idea of leveraging DFT calculations to parametrize a simpler Hamiltonian that can then be more extensively explored.
Structure prediction. The fundamental paradigm of materials science involves the structure-property relation: the structure of a material (at all length scales) ultimately controls its properties. First-principles calculations require structures as input, and hence calculations can be aided by collections of experimental crystal structures. However, in the attempt to predict new, as-yet-undiscovered compounds, one is immediately faced with the challenge of predicting crystal structures a priori. The editor of Nature famously wrote in 1988: "One of the continuing scandals in the physical sciences is that it remains impossible to predict the structure of even the simplest crystalline solids from a knowledge of their composition" 136 . In the years since, dramatic progress has been made, and there now exists a collection of computational methods capable of predicting a wide array of crystal-structure types, often using only first-principles calculations and no other input. The structure-prediction problem is made difficult by the very large dimensionality of the space of possible crystal-structure symmetries, cell shapes and basis vectors. Most structure-prediction methods are based on the idea of minimizing the energy over this high-dimensional space, where the minimization is performed using an efficient optimization algorithm.
Successful methods have used a variety of approaches, such as genetic algorithms, particle swarm methods, molecular dynamics, minima hopping, random structure searches and Monte Carlo techniques. A detailed description of these, along with substantial examples of their successful application to materials discovery, has recently been presented 137 . These approaches can also be applied to the structure-solution problem (that is, the solution of crystal structures given experimental diffraction data) by minimizing a cost function that combines the first-principles total energy and the match to the experimental diffraction pattern 138 .
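As a toy illustration of the energy-minimization idea behind random structure searches, the sketch below relaxes many random starting geometries of a three-atom Lennard-Jones cluster (a cheap stand-in for a first-principles energy surface) and keeps the lowest minimum found. The potential, the crude descent scheme and all parameters are illustrative assumptions; production codes use far more sophisticated relaxations and move classes.

```python
import numpy as np

rng = np.random.default_rng(0)

def lj_energy(pos):
    """Lennard-Jones energy (eps = sigma = 1) of a small cluster; pos has shape (n, 2)."""
    e = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            r = np.linalg.norm(pos[i] - pos[j])
            e += 4.0 * (r ** -12 - r ** -6)
    return e

def relax(pos, step=0.01, iters=2000):
    """Crude normalized steepest descent on numerical gradients
    (a stand-in for the first-principles relaxations used in real searches)."""
    pos = pos.copy()
    h = 1e-6
    for _ in range(iters):
        grad = np.zeros_like(pos)
        e0 = lj_energy(pos)
        for idx in np.ndindex(*pos.shape):
            trial = pos.copy()
            trial[idx] += h
            grad[idx] = (lj_energy(trial) - e0) / h
        pos -= step * grad / max(1.0, np.linalg.norm(grad))  # capped step length
    return pos

# Random structure search: relax many random starting geometries,
# keep the lowest-energy local minimum found.
best_e, best_pos = np.inf, None
for _ in range(15):
    start = rng.uniform(0.0, 2.0, size=(3, 2))
    relaxed = relax(start)
    e = lj_energy(relaxed)
    if e < best_e:
        best_e, best_pos = e, relaxed
```

For this toy cluster the global minimum is the equilateral triangle with all three pairs at the pair-potential minimum (energy −3), while a collinear arrangement is a competing local minimum near −2, so repeated random restarts are exactly what lets the search escape the wrong basin.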
High-throughput first-principles calculations. Another successful and widely used approach to materials design is that of high-throughput (HT) first-principles calculations 1,91 . Here the focus shifts from calculating the property of interest for one compound at a time to computing it at scale for many compounds. In the extreme, one can even attempt to compute properties for all known or predicted compounds. HT calculations are made possible by the automation of workflows for performing first-principles calculations, by the robustness of these calculations, which can now run with little or no manual intervention, and by the applicability of first-principles methods to most or all of the periodic table, owing to the availability of verified libraries of pseudopotentials 10,74 .
Some of the first HT calculations already combined ideas from machine learning and data-driven approaches: from the search, in combination with an evolutionary algorithm, for the most stable four-component alloys out of the 192,016 face-centred cubic (fcc) and body-centred cubic (bcc) structures that can be constructed out of 32 different metals 91 , to the Pareto-optimal set in compressibility, stability and cost for alloys out of a HT dataset of thousands of metallic compounds 139 , to crystal-structure prediction through heuristic-rule extraction on a large library of calculated properties 140 . HT first-principles calculations have been used to build large and powerful databases of materials structures and properties, the earliest examples being the Materials Project 141 , AFLOWlib 142 and the Open Quantum Materials Database (OQMD) 143 . Each of these is based on DFT-calculated properties of experimentally observed and of novel, predicted compounds. These databases include not only properties obtained from standard total-energy calculations (such as energy, density of states, electronic structure and magnetic moments) but have also been increasingly used to curate additional datasets for more complex properties (for example, elastic and piezoelectric tensors, thermoelectric properties, surface energies, phonon dispersions and X-ray absorption near-edge spectroscopy spectra). Their utility and versatility have been demonstrated by their application to the design and discovery of novel materials in a vast array of applications: strengthening precipitates in structural alloys, lithium (and beyond) battery materials, efficient thermoelectric materials and solar thermochemical water-splitting materials, to name a few. These databases are by no means the only examples, and many new efforts are emerging, both delivering 144-146 and consolidating 147 curated data.
In discovery efforts involving large datasets, a screening strategy is often employed: one starts with a large pool of candidate compounds, and applies more and more stringent criteria to downselect the most promising ones. Selection criteria prioritize quantities that are relatively straightforward and inexpensive to calculate, and only later introduce more computationally complex, and hence expensive, quantities for further refinement. A very common criterion is the zero-temperature ground-state phase stability, which is identified through a convex hull approach (Box 3). Such a screening strategy is illustrated in Fig. 3, for the case of robust and synthesizable photocatalysts for CO 2 reduction 148 , but the literature offers many examples that have been summarized in several reviews 1,149,150 .
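The funnel logic of such a screen can be sketched in a few lines. The candidate pool, field names (e_above_hull, band_gap, cost) and thresholds below are entirely hypothetical, chosen only to show cheap criteria being applied before expensive ones; they do not correspond to any specific database schema.

```python
# Hypothetical candidate pool with precomputed cheap descriptors.
candidates = [
    {"formula": "A2BO4", "e_above_hull": 0.00, "band_gap": 2.1, "cost": 1.0},
    {"formula": "ABO3",  "e_above_hull": 0.02, "band_gap": 1.6, "cost": 0.5},
    {"formula": "AB2O6", "e_above_hull": 0.30, "band_gap": 2.4, "cost": 0.8},
    {"formula": "A3BO5", "e_above_hull": 0.01, "band_gap": 0.4, "cost": 0.2},
]

filters = [
    lambda c: c["e_above_hull"] <= 0.05,    # cheap: near the convex hull (stable-ish)
    lambda c: 1.5 <= c["band_gap"] <= 2.8,  # cheap: gap in a visible-light window
]

def screen(pool, criteria):
    """Apply successively stricter criteria; cheapest checks come first."""
    for keep in criteria:
        pool = [c for c in pool if keep(c)]
    return pool

survivors = screen(candidates, filters)
```

Only the survivors of the cheap filters would then be passed to expensive follow-up calculations (defect energetics, excited states and so on), which is what makes the funnel computationally tractable.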
Data-driven approaches. Machine learning and data-driven approaches can greatly aid the search for new materials. Materials informatics approaches are generally based on three distinct components: a resource of materials data, a representation to quantitatively describe each material, and machine learning algorithms to discover patterns within the data or to predict the properties of new materials. The use of HT computations for data collection has the great benefit of systematically and rapidly growing the size of the data pool, and the discovery or prediction portion of the workflow is greatly facilitated by open-source, state-of-the-art algorithms and software packages in both electronic structure and machine learning.
The key challenge, therefore, lies in the construction of a material representation. This representation is a set of quantitative attributes that describe the relevant characteristics for the materials property of interest. Work in this field has generally fallen into two categories of representations: those that are functions of composition only, and those that additionally contain structural attributes of the crystallographic unit cell. Many of the composition-only attribute sets are primarily based on statistics of the properties of the constituent elements. For instance, a material could be described just from the fraction of each element present and various intuitive properties, such as the maximum difference in electronegativity, to model formation energies, bandgaps or other properties 151-154 . Incorporating structure-dependent attributes is an obvious route to improve the material representation. Examples of these 155 include the pair radial distribution function and the Coulomb matrix 156 , with machine learning algorithms then built to predict scalar or tensorial quantities or fields. Representations based on the Voronoi tessellation of crystal structures have been suggested, using the resulting attributes of the Voronoi polyhedral shapes and connectivities 157 . A recent improvement comes from representations based on crystal graphs in a convolutional neural network 158 . These (and recently introduced variants 159,160 ) are highly versatile approaches that can lead to accurate predictions, and involve learning material properties directly from graph-like representations of crystal structures ('crystal graphs'). Data-driven machine learning models of materials properties have seen a large number of applications targeted at the discovery of novel materials with promising properties; many of these models have also been used to direct HT searches, by steering these efficiently towards more promising compounds.
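As one concrete example of a structure-dependent representation, the sketch below builds the Coulomb matrix of a small molecule (diagonal 0.5 Z^2.4, off-diagonal Z_i Z_j / |R_i − R_j|) and uses its sorted eigenvalues as a permutation-invariant descriptor. The water-like geometry is an illustrative assumption, not taken from the text.

```python
import numpy as np

def coulomb_matrix(Z, R):
    """Coulomb matrix: Z nuclear charges, R (n, 3) positions.
    Diagonal 0.5*Z**2.4; off-diagonal Zi*Zj / |Ri - Rj|."""
    n = len(Z)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i, j] = 0.5 * Z[i] ** 2.4
            else:
                M[i, j] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
    return M

def representation(Z, R):
    """Permutation-invariant descriptor: sorted eigenvalues of the Coulomb matrix."""
    return np.sort(np.linalg.eigvalsh(coulomb_matrix(Z, R)))[::-1]

# Water-like geometry (illustrative coordinates, angstrom)
Z = np.array([8, 1, 1])
R = np.array([[0.00, 0.00, 0.0],
              [0.96, 0.00, 0.0],
              [-0.24, 0.93, 0.0]])
desc = representation(Z, R)
```

Because the eigenvalue spectrum is unchanged when rows and columns are permuted together, relabelling the atoms leaves the descriptor invariant, which is exactly the property a learning algorithm needs from a structural fingerprint.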
In fact, the discovery rate for new and stable compounds has been greatly accelerated by the use of machine learning 156,160 . Recent studies 161,162 have shown how ideas from network theory can be applied to HT datasets, by mapping these data onto graph representations and analysing the topological features of the resulting graph. This approach allows for a top-down study of the organizational structure of networks of materials, based on the interactions between the materials themselves. A large-scale network has been defined by the convex hull of essentially all compounds in the OQMD dataset; analysis of this 'complete phase-stability network of all inorganic materials' shows that it is a densely connected complex network of 21,000 thermodynamically stable compounds (nodes) interlinked by 41 million tie-lines (edges) defining their two-phase equilibria. Since the edges of the graph represent two-phase equilibria, the number of edges coming out of a node can be thought of as a measure of the stability of the compound. Hence, the connectivity of nodes in this phase-stability network allows for the derivation of a rational, data-driven metric for material reactivity, the 'nobility index' 162 . Another illustration of the power of this network-theory approach involves assigning a time stamp to each node/compound, associated with the year of its experimental discovery. This allows one to reconstruct the graph at any previous point in time, and to ask whether the topology of the graph shows any indication of the imminent discovery of a compound in the years before its synthesis. Training a machine learning model of this time evolution in terms of the topological features of the graph allows one to predict the likelihood that hypothetical, computer-generated materials will be amenable to successful experimental synthesis 161 .
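The idea of reading stability off the connectivity of the phase-stability network can be sketched with a toy graph; the compounds and tie-lines below are invented for illustration only, and node degree stands in for the far richer topological features used in the studies cited above.

```python
from collections import defaultdict

# Toy phase-stability network: nodes are stable compounds, edges are tie-lines
# (two-phase equilibria). Compositions and edges are illustrative only.
tie_lines = [
    ("A", "AB"), ("AB", "B"), ("A", "A2B"), ("A2B", "AB"),
    ("AB", "AB2"), ("AB2", "B"), ("A2B", "B"),
]

degree = defaultdict(int)
for u, v in tie_lines:
    degree[u] += 1
    degree[v] += 1

# A high tie-line count marks a compound that coexists with many other phases,
# that is, a less reactive ("noble") node in the sense of the text.
most_connected = max(degree, key=degree.get)
```

On real datasets the same bookkeeping is done over tens of thousands of nodes and tens of millions of edges, where graph libraries and sparse representations replace this explicit loop.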

Box 3 | Convex hull construction
The convex hull is a powerful graphical construct for determining the phase stability of compounds (most commonly, at zero temperature). For a compound to be thermodynamically stable, it must not only be lower in energy than all other phases at the same composition but also be lower in energy than all linear combinations of phases. If one identifies the lowest-energy linear combinations of phases for each composition of a system, then the set of all such energies forms a convex hull. Phases that are on the convex hull are thermodynamically stable, and those that have an energy that lies above the convex hull are unstable or metastable. For a binary A-B system, there is only one composition variable, and so the convex hull can be represented as a two-dimensional (2D) object on a composition-energy plot (panel a). For an N-dimensional system, there are N − 1 independent compositions, and so the convex hull becomes an N-dimensional object. In ternary systems, the convex hull is 3D, but it is common to take a 2D projection of the convex hull on a Gibbs triangle, eliminating the energy axis (panel b). Similarly, in quaternary systems, the 4D convex hull is often represented as a 3D projection on a Gibbs tetrahedron (panel c). If one is given a set of DFT-calculated energies for a given system, determination of the convex hull is straightforward using a variety of standard algorithms. The construction of convex hulls is automated in many of the high-throughput DFT databases (for example, the Materials Project 141 , OQMD 143 and AFLOWlib 142 ).
The convex hull also provides a simple, straightforward way to assess the stability of new, predicted phases: their energies may be compared with those on the existing convex hull, and a phase is stable if its energy lies below; the convex hull and all its derivatives must then be updated to include the new, stable phase.
Convex hulls. a, Convex hull for a binary system, connecting the lowest-energy phases at each composition with tie-lines. b,c, Projections of the 3D (b) and 4D (c) convex hulls for a ternary or quaternary system on a Gibbs triangle or tetrahedron 161 . Credit: figure adapted with permission from ref. 161 , under a Creative Commons License CC BY 4.0
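The binary construction of Box 3 can be sketched directly: a monotone-chain lower hull over (composition, energy) points identifies the stable phases, and the distance above the hull quantifies the instability of the rest. The formation energies below are illustrative numbers, not taken from any database.

```python
def lower_hull(points):
    """Lower convex hull (monotone chain) of (composition, energy) points."""
    pts = sorted(points)
    hull = []
    for p in pts:
        # Pop the last vertex while it lies on or above the chord to the new point
        # (non-positive cross product), keeping the hull convex from below.
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            cross = (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def hull_energy(hull, x):
    """Energy of the lowest-energy linear combination of phases at composition x."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            t = 0.0 if x2 == x1 else (x - x1) / (x2 - x1)
            return y1 + t * (y2 - y1)
    raise ValueError("composition outside the hull range")

# Illustrative formation energies (eV per atom) versus composition x_B.
phases = {"A": (0.0, 0.0), "A3B": (0.25, -0.15), "AB": (0.5, -0.40),
          "AB3": (0.75, -0.18), "B": (1.0, 0.0)}
hull = lower_hull(phases.values())
stable = {name for name, p in phases.items() if p in hull}
e_above = {name: p[1] - hull_energy(hull, p[0]) for name, p in phases.items()}
```

Here A3B and AB3 are below zero in formation energy yet still unstable, because a mixture of the hull phases at the same composition is lower still; this is exactly the "lower than all linear combinations" criterion stated in the box.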

Outlook
As simulations can suggest novel materials, in addition to predicting novel properties of experimentally known materials, key challenges become the assessment of thermodynamic stability, synthesis conditions, manufacturability and the tolerance of the predicted properties to intrinsic and extrinsic defects. DFT estimates might ultimately need to be augmented by more advanced electronic-structure methods or machine learning algorithms to improve accuracy, and by seamless deployment of computational materials science methods to address realistic conditions, from vibrational entropies to defect concentrations to applied electrochemical potentials. Materials synthesis will be further driven by the freedom afforded by many combinatorial templates, from perovskites to half-Heuslers to metal-organic frameworks, with automated experiments and automated simulations starting to handshake, and human or artificial intelligence deciding which experiment is required by a simulation, and which simulation is needed by an experiment. On a more fundamental level, achieving predictive accuracy in the simulations and capturing the realistic complexity of materials remain grand challenges. Not only do magnetism and correlations push our ingenuity, but ever-more-advanced experimental techniques also test our capability to address time-resolved or ultrafast processes, to describe correctly the nano- and microstructure of materials, or to sample configurations and processes across length scales and timescales. Some of these latter capabilities are essential to understand synthesis, processing, manufacturing and ultimately failure. Last, simulations can make meaningful predictions only when the underlying theory is understood and coded, although there is also the possibility of observing unexpected behaviour emerging from interactions that are correctly captured, and of data mining pointing to unexpected correlations.
The sustained performance scaling of computing hardware and the emergence of disruptive accelerators for exponentially expensive tasks, such as neuromorphic and quantum computing, together with the artificial-intelligence-driven accelerators that are already becoming established 94,97 , make for one easy prediction. Electronic-structure simulations will keep increasing their relevance and impact in accelerating, streamlining and focusing our efforts in materials design and discovery, delivering ever-more-complex capabilities for predicting and characterizing the properties and performance of materials, and of the devices built out of them, in ever-more-realistic conditions and environments. This acceleration mirrors that of information-and-communication technologies, rather than that of physical infrastructures; it is compounded by the novel ideas and algorithms that enter the field; and, at variance with physical infrastructure, it can be instantaneously shared worldwide. The impact of the field is already apparent in its acceptance and uptake by the experimental community, and in the remarkable predictions made, from ultrafast thermal transport 163 to electron-phonon mediated superconductivity in hydrides 164 to the emergence of flat bands in twisted-bilayer graphene 165 , that have inspired even more remarkable experiments.
Against this background it is then very fitting to note that while physical infrastructures and major installations for research, from synchrotrons to radiotelescopes to supercomputers, are well established, and rightly so, in our scientific society, the support and planning for computational infrastructures are only just, belatedly, emerging: from the widely used scientific software 31 that powers the research presented here, to the verification of codes and the validation of theories, to the dissemination and curation of computational data, tools and workflows, together with the career models these entail and require. And the users and beneficiaries of these capabilities now go well beyond core practitioners, underlining the importance of robustness and reliability in electronic-structure simulations, and of education in their scope and limits, something this Review aims to contribute to.
We conclude by recalling that it is often said that human ages take their names from the materials that characterize them; the computational design and discovery of novel materials, as enablers of the technologies that power our economy and sustain our society, will be firmly at centre stage for the coming decades. What to do with them will remain, for better or for worse, a human decision.