Variance reduction for effective energies of random lattices in the Thomas–Fermi–von Weizsäcker model

In the computation of the material properties of random alloys, the method of ‘special quasirandom structures’ attempts to approximate the properties of the alloy on a finite volume with higher accuracy by replicating certain statistics of the random atomic lattice in the finite volume as accurately as possible. In the present work, we provide a rigorous justification for a variant of this method in the framework of the Thomas–Fermi–von Weizsäcker (TFW) model. Our approach is based on a recent analysis of a related variance reduction method in stochastic homogenization of linear elliptic PDEs and the locality properties of the TFW model. Concerning the latter, we extend an exponential locality result by Nazar and Ortner to include point charges, a result that may be of independent interest.


Introduction
In material science, direct simulations based on density functional theory [14,15,21] are currently limited to hundreds to thousands of atoms and therefore to material samples just about one order of magnitude larger than the atomic length scale (see e. g. [22]).Multiscale approaches -employed for example in the simulation of dislocations [9,19,25] -rely on an extrapolation of the elastic properties of the material from such microscopic samples to larger scales, a concept also known in the context of continuum mechanics as "method of representative volumes".While for materials with a periodic lattice the computational problem on the atomic scale may often be simplified to a problem on a single periodicity cell [3,9,19,22,25], such a simplification is no longer possible for materials with random atomic lattices like random alloys (see Figure 1 for an illustration).As a consequence, for random alloys the atomic-scale samples must be chosen significantly larger, giving rise to a computationally costly problem.
For the computation of the effective properties of random alloys, an approach called "special quasirandom structures" (SQS) has been proposed by Zunger et al. [29] to increase the accuracy of DFT computations without increasing computational effort.The key idea of the method of special quasirandom structures is to construct a periodic configuration of atoms with finite but large periodicity cell ("superlattice") which reflects certain statistical properties of the random atomic lattice particularly well -like the proportion of the atomic species, the proportion of nearest-neighbor contacts of the various atomic species, and so on (see Figure 2 Figure 1.A simple example of a random atomic lattice, the different atomic species being indicated by the colors red and blue (left).An illustration of the method of representative volumes (right): For ab initio computations of material properties, a sample of microscopic extent must be chosen.
for an illustration).Further developments and applications of this method of "special quasirandom structures" may be found in [27,28].Related approaches have been employed in the context of homogenization in continuum mechanics [1,2,23].
Inspired by the method of special quasirandom structures, in the continuum mechanical context of homogenization of random materials a selection approach for representative volumes has been proposed by Le Bris, Legoll, and Minvielle [16]: This selection approach proceeds by considering a large number of microscopic samples of the random material and selecting the sample that is "most representative" for the material as measured by certain statistical quantities, like for example the volume fraction in the case of a two-material composite.The effective material properties are then approximated by numerically evaluating the cell formula provided by homogenization theory on the selected sample.In the context of stochastic homogenization of linear elliptic PDEs −∇ • (a ε ∇u) = f , for the computation of the effective (homogenized) coefficient the selection approach has been shown to yield an increase in accuracy of up to one order of magnitude in a numerical example with ellipticity ratio 5 [16], while requiring negligible computational effort.Recently, a rigorous mathematical analysis of the selection approach by Le Bris, Legoll, and Minvielle in the context of homogenization of linear elliptic PDEs has been provided by the first author [11].
The main goal of the present paper is to show that the selection approach of Le Bris, Legoll, and Minvielle [16] -which is conceptually closely related to the method of special quasirandom structures of Zunger et al. [29] -also allows for an increase of accuracy in the computation of the effective elastic properties of random atomic lattices in the context of orbital-free density functional theory (orbital-free DFT).More precisely, we neglect exchange-correlation energy and consider the approximation of effective energies of random atomic lattices in the framework of the Thomas-Fermi-von Weizsäcker (TFW) model.In the TFW model, for a given nuclear charge distribution m the associated electronic density ρ of the ground state is determined by minimizing the TFW energy ˆCW |∇ √ ρ| 2 + C T F ρ 5/3 + 1 2 (m − ρ)φ dx with the electric potential φ being subject to the Poisson equation By rescaling, we may henceforth assume that C W = 1 and C T F = 1.Recall that it is convenient to reformulate the TFW model in terms of the square root of the electronic density u := √ ρ.With this notation, the Euler-Lagrange equation for the TFW model reads − ∆u + 5 3 u 7 3 − φu = 0, (1a) In orbital-free DFT, further contributions accounting for exchange and correlation energy are typically added to the TFW energy (and, corrrespondingly, to the Euler-Lagrange equation).In the present work, we shall neglect those terms.We will also assume that the positions of the nuclei are given a priori.While in a more realistic model the positions of the nuclei would be determined by energetic relaxation, the question of crystallization in variational models of interacting atoms is a challenging topic on its own, with positive answers currently restricted to rather elementary (mostly non-quantum mechanical) models; see e. g. the review [6].For this reason, we restrict ourselves to the aforementioned setting of fixed nuclei positions.For an overview of the mathematical theory of the TFW model, we refer to [7] and the references therein.
Recall that in the framework of hyperelasticity the deformation of an elastic body is determined by minimization of the total (elastic and potential) energy.In the context of an atomic lattice, the elastic energy is given as the overall energy of electrons and nuclei.In a multiscale approximation, the macroscopic deformation of the elastic body is approximated on the atomic length scale by affine deformations.In many cases, for a macroscopically affine deformation the state of minimal energy of the atomic lattice is given by an approximately affinely deformed atomic lattice (a principle known as the Cauchy-Born rule, see e. g. [8,12]).The associated effective (homogenized) elastic energy density is then given by the thermodynamic limit (i.e. the "infinite-volume average") of the energy of the affinely deformed atomic lattice.In other words, in the context of the TFW approximation the effective elastic energy density is given as the thermodynamic limit of the TFW energy, i. e.
-up to subtracting the self-energy of point charges -by the quantity where the nuclear charge distribution m has been subjected to an appropriate affine change of variables to account for the affine deformation of the lattice.Note that the almost-sure existence of this thermodynamic limit has been established for certain random lattices in [5]; see also [4,7] for an overview and related questions.Under the assumptions (A0)-(A3), the almost sure existence of the limit (2) could also be shown by an argument similar to our proof of Theorem 3.
In practical computations of the effective energy (2), the infinite-volume average in (2) must be replaced by an average over a finite volume, say, a box of the form [0, L] d , an approach also known in the context of continuum mechanics as the method of representative volumes.Note that in this setting one must specify appropriate boundary conditions for ρ on ∂[0, L] d .We shall denote the resulting finite-volume approximation for E ∞ by E RVE L .As boundary layer effects may negatively impact the rate of convergence (in the length L) of the representative volume approximations E RVE L towards the thermodynamic limit E ∞ (see for instance [11] for a brief discussion of the analogous problem in the context of periodic homogenization of elliptic PDEs), it is desirable to work with periodic representative volumes.In the context of nuclear charge distributions m arising from random lattices, this requires the existence of a periodization of the probability distribution of the nuclear charges m, that is an L-periodic variant m of the probability distribution of m (see for instance Figure 2 for an illustration).Note that care must be taken to align the definition of the representative volume with a possible underlying periodic structure.For a more precise explanation of this notion of periodization, see the discussion preceding conditions (A3 a )-(A3 c ) below.From now on and for the rest of the paper, we will assume that the representative volume approximation E RVE L for the effective energy density E ∞ has been obtained by evaluating the averaged TFW energy (see (6) below) on such a periodic representative volume.
Our main result -Theorem 3 -states that the selection approach for representative volumes of Le Bris, Legoll, and Minvielle [16] increases the accuracy of approximations E RVE L , at least for a wide class of random nuclear charge distributions: Instead of choosing a representative volume (that is, an L-periodic nuclear charge distribution) uniformly at random from the (periodized) probability distribution, it is better to preselect the representative volume to be "particularly representative" for the random alloy in terms of certain basic statistical quantities like the proportion of different types of atoms in the representative volume, the proportion of nearest-neighbor contacts of certain types in the representative volume, and so on.We denote the resulting approximation for the effective energy density by E sel-RVE L .In Theorem 3 we show that the approximation E sel-RVE L is typically more accurate than the approximation E RVE L .From a mathematical viewpoint, the interest in our main result is twofold: • It provides a rigorous justification of the method of "special quasirandom structures" in a quantum mechanical model, the setting in which these methods were first developed [29].
• It provides a first example of a nonlinear PDE for which the selection approach for representative volumes of Le Bris, Legoll, and Minvielle [16] can be proven to be successful.
Let us briefly comment on the mechanism for the gain in accuracy achieved by the method of special quasirandom structures.The leading-order contribution to the error in the method of representative volumes consists in fact of fluctuations, while in expectation the method of representative volumes is accurate to much higher order.In fact, in the case of the TFW model the systematic error of the method of representative volumes decays even exponentially in the size of the representative volume An illustration of the method of special quasirandom structures: An L-periodic "superlattice" (with L 1) is built to reflect the statistical properties of the random material particularly well -like the percentage of atoms of the two species, the statistics of nearest-neighbor configurations, the statistics of configurations of three neighboring atoms, and so on.
At the same time, the fluctuations display only CLT scaling behavior that is they behave like the fluctuations of the average of L d i. i. d. random variables.Thus, a variance reduction method -a method to reduce the fluctuations of the approximations E RVE L while mostly preserving the expected value E[E RVE L ] -is expected to lead to an increase in accuracy.
The selection of "particularly representative" material samples may be viewed as such a variance reduction method : In fact, we shall prove that the joint probability distribution of the effective energy E RVE L and statistical quantities like the percentage of atoms of a certain species in the representative volume (and/or quantities like the percentage of nearest-neighbor configurations of two given atomic species, etc.) is close to a multivariate Gaussian.Conditioning on the event that the auxiliary statistical quantity -which we shall denote by F -is close to its expected value then reduces the variance of the computed energies E RVE L , provided that E RVE L and the auxiliary quantity are nontrivially correlated (see Figure 3).At the same time, the expected value E[E RVE L ] is not changed much by selecting only representative volumes subject to the condition that F is close to its expected value.
The main challenge in the proof is the derivation of the quantitative multivariate normal approximation result for the joint probability distribution of the energy E RVE L and the statistical quantities F of the representative volume.Just like in [11], we make crucial use of the locality properties of these quantities of interest, which allow for a quantitative (multivariate) normal approximation.In [11], in the context of the homogenization of the linear elliptic PDE −∇ • (a ε ∇u) = f , a localization result for the effective energies has been established: In [11], the contribution of terms with dependency range ∼ to the overall energy a RVE L ξ • ξ is seen to be essentially of the order −d , which is essentially twice the order of the fluctuation scaling −d/2 .By means of a "multilevel local dependency structure" [10], this allowed for the derivation of a quantitative multivariate normal approximation result for the joint probability distribution of the representative volume approximation a RVE L of the effective coefficient and auxiliary statistical quantities like the averaged coefficient F := − ´[0,L] d a dx [11].
Due to the strong -exponential -localization properties of the TFW model (see [20] for the case without point charges and Theorem 5 below for the general case), we in principle would not even need to appeal to the "multilevel local dependence structure" introduced in [11,10], but could directly work with a multivariate central limit theorem with a standard local dependence structure.However, it will be convenient for us to employ the abstract variance reduction result of Lemma 7, which is established in [11,10].
Notation.We use standard notation for Sobolev spaces: By W 1,p (Ω) we denote the space of functions v ∈ L p (Ω) whose distributional derivative ∇v also belongs to L p (Ω), along with the usual norm ||v|| p W 1,p (Ω) = ´Ω |v| p + |∇v| p dx.As usual, we use the abbreviation By H 1 uloc (R d ) we denote the space of functions v : R d → R whose restrictions v| B1(x) belong to H 1 (B 1 (x)) for all x ∈ R d , with a uniform bound on the local Sobolev norm ||v|| 2 By B r (x) we denote the ball of radius r around x ∈ R d .We also use the shorthand notation B r := B r (0).By C we will denote a generic constant depending only on quantities like ρ, M , and ω 0 (see the assumptions (A1) and (A3) below), whose precise value may vary from occurrence to occurrence.
For a set M , we denote by M the number of its elements.
For two vector-valued random variables X and Y , we denote the covariance matrix as usual by Cov[X, Y ].We also use the notation Var X as a shorthand notation for Cov[X, X].

Main Results
In this article, we prove that the selection approach for representative volumes of Le Bris, Legoll, and Minvielle [16] leads to an increase in accuracy when calculating effective energies for random lattices in the context of the Thomas-Fermi-von Weizsäcker model (1), at least for a wide class of random nuclear charge distributions.For a precise statement of our assumptions and our main result, see (A0)-(A3) and Theorem 3 below.
Under more general conditions, we establish an exponential locality result for the TFW model, an auxiliary result that generalizes a corresponding result by Nazar and Ortner [20] and that may also be of independent interest.For a more precise statement of the assumptions and the result, see (A1) and Theorem 5 below.
Consider any Bravais lattice and denote by F ∈ R 3×3 a matrix whose columns are given by the corresponding three primitive vectors.Our key assumptions on the nuclear charge distribution m are as follows.
(A0) Let m be a random nuclear charge distribution (a -random -locally finite nonnegative Radon measure) on R 3 .In other words, let a probability space (Ω, F, P) be given along with a random variable m taking values in the space of locally finite nonnegative Radon measures on R 3 .(A1) Suppose that uniform local finiteness of the nuclear charge distribution m holds in the following sense: There exist constants ρ > 0, M ≥ 0, and ω 0 > 0 such that m is of the form ) with m c ≥ 0, some c y = c y (m) > 0, and some set P = P(m) ⊂ R 3 satisfying |x − y| ≥ 4ρ for all x, y ∈ P with x = y, and in addition the estimate holds.Furthermore, suppose that an averaged lower bound for the nuclear charge density of the form m dy ≥ ω 0 holds for all R ≥ ω −1 0 for some ω 0 > 0. (A2) Let m be stationary, i. e. suppose that the law of the shifted charge distribution m(• + x) coincides with the law of m for every x ∈ F Z 3 .(A3) Let m have a finite range of dependence r ≥ 1, i. e. suppose that for any two Borel sets A, B ⊂ R 3 with dist(A, B) ≥ r the restrictions m| A and m| B are stochastically independent.
We shall also use the concept of a periodization of an ensemble of nuclear charge distributions m (where an ensemble of nuclear charge distributions is defined as a probability measure on the space of nuclear charge distributions): A periodization of an ensemble of nuclear charge distributions is an ensemble of nuclear charge distributions m which are almost surely LF Z 3 -periodic for some L 1 and for which the probability distribution of m| x+F [0, The condition (A2) imposes a statistical homogeneity assumption on the random lattice.Since we want to include the model case of a periodic lattice like Z 3 whose sites are occupied by random atomic nuclei (i.e. at whose lattice sites there is a random multiple of a Dirac charge; see Figure 1) in our assumptions, we cannot assume translation invariance of the law of the nuclear charge distribution m with respect to arbitrary shifts x ∈ R 3 .Instead, in the case of the lattice Z 3 we have to restrict the translation invariance to discrete shifts x ∈ Z 3 .As we are interested in the effective elastic properties and as most (elastic) affine deformations of Z 3 destroy the Z 3 periodicity, we have to cover the case of an arbitrary Bravais lattice F Z 3 in our assumption (A2).
Let us now give a precise definition of the TFW energy and its thermodynamic limit.
Definition 1.Let m be a nuclear charge distribution satisfying the assumption (A1).For a set Q ⊂ R 3 with finite volume, we introduce the Thomas-Fermi-von Weizsäcker energy where ) is the (unique) solution of the TFW equations (1) (see Theorem 16) and where We define the thermodynamic limit E ∞ of the energy density -the effective energy density -as if the limit exists.
Let m be a nuclear charge distribution satisfying the assumptions (A0)-(A3).Given L ≥ 1, let m be a periodization of the probability distribution of the nuclear charge distribution m subject to (A3 a )-(A3 c ).We define the approximation E RVE L of the effective energy density E ∞ by the representative volume method as where ) denotes the (unique) solution of the TFW equations (1) given the nuclear charge distribution m.Note that both ũ and φ inherit the L-periodicity of the nuclear charge distribution m [7].
Finally, let N ∈ N, let F be a measurable R N -valued function of the (periodized) nuclear charge distribution m, and let δ > 0. We then define E sel-RVE L to denote the approximation of the effective energy density E ∞ using the selection method for representative volumes with the selection performed according to the criterion In other words, the probability distribution of E sel-RVE L is given as the conditional probability distribution of E RVE L given the event (5).
Let us briefly discuss the TFW energy (3).The first two terms in (3) correspond to the kinetic energy of the electrons in the Thomas-Fermi-von Weizsäcker approximation.The third and fourth term correspond to the Coulomb energy.Here, the contribution from the nuclear charges m has been split into two terms, representing the absolutely continuous part m c and the singular part x c x δ x of the nuclear charge distribution.The presence of the difference φ − φ x in the fourth term in (3) corresponds to the usual subtraction of the self-energy of point charges.Note that the difference φ − φ x satisfies the PDE ) ensures that the pointwise evaluation of φ − φ x at the point x in the above definition is indeed meaningful.
We next state additional assumptions and notation which will be needed to formulate our main result on the analysis of the selection approach for representative volumes.
Assumption 2. Consider a probability distribution of nuclear charges m on R 3 satisfying (A0), (A1), (A2), and (A3).Let L ∈ N, L ≥ 2, and assume that there exists an L-periodization m of the probability distribution of m subject to (A1), (A2), and (A3 a ) -(A3 c ).Let F( m) = (F 1 ( m), . . ., F N ( m)) (for some N ∈ N) be a collection of statistical quantities of the nuclear charge density m which are subject to the conditions of Definition 4 with K ≤ C 0 and B ≤ C 0 | log L| C0 for some C 0 > 0. Suppose that the covariance matrix of F( m) is nondegenerate and bounded in the natural scaling in the sense We introduce the condition number κ of the covariance matrix of (E RVE L , F( m)) and the ratio r Var between the expected order of fluctuations and the actual fluctuations of the approximation Let us briefly mention that the following statistical quantities F satisfy the conditions of Definition 4 below and are therefore admissible choices in our main result (i.e. in Theorem 3 below): • The density of nuclei of a specific type • The density of nearest-neighbor contacts of two specified types of nuclei in case that the nuclei are arranged on the lattice F Z 3 .• Similar statistics of configurations of three or more neighboring atoms or corresponding quantities for more general atomic lattices.Note that it is precisely these type of statistics of the random atomic lattice that are considered in the original formulation of the method of special quasirandom structures [29].
We are now in a position to formulate our main result, the gain in accuracy by the selection approach for representative volumes in the context of the TFW model for random alloys.Note that our main result comprises essentially three assertions: • The increase in accuracy of E sel-RVE L (as compared to E RVE L ) (10), which is achieved via the reduction of fluctuations by essentially the fraction of the variance of E RVE L explained by the statistical quantities F. • The higher-order approximation quality (9) of the expected value • The lower bound (12) for the probability that a randomly chosen nuclear charge distribution m meets the selection criterion (8).
Theorem 3. Let Assumption 2 be satisfied.Denote by E RVE L the approximation for the effective energy E ∞ by the standard representative volume element method and by E sel-RVE L the approximation for E ∞ by the selection approach for representative volumes introduced by Le Bris, Legoll, and Minvielle [16] in the case of a representative volume of size L. Further assume that in the selection approach, the representative volumes are selected from the periodized probability distribution according to the criterion for some δ ∈ (0, 1] satisfying δ N ≥ CL −3/2 | log L| C .Then, the selection approach for representative volumes is subject to the following error analysis: (a) The systematic error of the approximation The variance of the approximation E sel-RVE L is bounded from above by where |ρ| 2 is the fraction of the variance of E RVE L explained by the F( m).In other words, |ρ| 2 is the maximal squared correlation coefficient between E RVE L and any linear combination of the F( m).This explained fraction of the variance is given by the expression (c) The probability that a randomly chosen nuclear charge distribution m satisfies the selection criterion (8) is at least We next state our precise assumption on the statistical quantities F.
Definition 4 (similar to [10, Definition 3]).Let L ≥ 2, and consider a probability distribution of LF Z 3 -periodic nuclear charges m on R 3 satisfying (A1), (A2), and be a random variable of the periodized nuclear charge.We say that X is a sum of random variables with multilevel local dependence if there exist random variables 3 , and constants K ≥ 1 and B ≥ 1 with the following properties: • The random variable • The random variables satisfy almost surely |X n y | ≤ BL −3 .Our proof of Theorem 3 makes use of the following exponential locality result for solutions to the TFW equations, which extends a similar locality result of [20] to include point charges and which may be of independent interest.Theorem 5. Let m 1 and m 2 be two nonnegative nuclear charge distributions (i.e. nonnegative locally finite Radon measures) subject to the assumption (A1).Denote by ) the corresponding solutions to the TFW equations (1).Then the perturbations of the electronic density w := u 1 − u 2 and the potential ψ := φ 1 − φ 2 decay exponentially away from the perturbation of the nuclear charge distribution δm := m 1 −m 2 .More precisely, there exist constants C = C(ρ, M, ω 0 ) > 0 and γ = γ(ρ, M, ω 0 ) > 0 such that for all y ∈ R 3 the estimate ˆR3 The following proposition comprises the exponential locality result for the TFW energy.Given two nuclear charge distributions m 1 and m 2 , the difference between the values of the TFW energy evaluated for m 1 and m 2 within the domain Q 1 exponentially decreases with the distance between supp(m 1 − m 2 ) and Q 1 .Corollary 6.Let Assumption 9 be satisfied.Then, there exist constants C = C(ρ, M, ω 0 ) > 0 and c = c(ρ, M, ω 0 ) > 0 such that for any cube Q 1 ⊂ R 3 with unit volume the estimate holds true.

Analysis of the Method of Special Quasirandom Structures
The following lemma serves as the main technical tool to prove Theorem 3. In fact, it is an abstract version of [11,Theorem 2]: One may adapt the proof of [11,Theorem 2] in a one-to-one fashion to establish Lemma 7.
and let C > 0 denote a generic constant which only depends on d, N , and C 0 as well as on K and B from Definition 4. Let Z = (Z 0 , Z 1 , . . ., Z N ) be a vector of random variables.Suppose that each Z i , i ∈ {0, . . ., N }, is a sum of random variables with multilevel local dependence according to Definition 4. Assume that the covariance matrix of (Z 1 , . . ., Z N ) is nondegenerate and bounded in the natural scaling in the sense that Let δ ∈ (0, 1] satisfy δ N ≥ CL −d/2 | log L| C , and let Z 0,sel be a random variable whose law coincides with the probability distribution of the random variable Z 0 conditioned on the event Introduce the condition number κ of the covariance matrix of (Z 0 , . . ., Z N ), and the ratio r Var between the expected order of fluctuations and the actual fluctuations of Z 0 Then, the following estimates hold true: (a) The difference of the expected values of Z 0,sel and Z 0 satisfies The variance of Z 0,sel is bounded from above by where |ρ| 2 is the fraction of the variance of Z 0 explained by (Z 1 , . . ., Z N ).
In other words, |ρ| 2 is the maximal squared correlation coefficient between Z 0 and any linear combination of the Z i , i ∈ {1, . . ., N }.This fraction of the variance of Z 0 explained by Z 1 , . . ., Z N is given by the expression (c) The probability that (Z 1 , . . ., Z N ) satisfies the imposed selection criterion is at least By combining the locality properties of the TFW model established in Theorem 5 with the abstract variance reduction result of Lemma 7, we now establish our main result.
Proof of Theorem 3. Throughout the proof, we will assume F = Id.The case of general F is similar.
The proof of Theorem 3 is an immediate consequence of Lemma 7 as soon as we have established two results: 1) The approximation E RVE L for the thermodynamic limit energy E ∞ by the method of representative volumes (where (ũ, φ) denotes the solution of the TFW equations as stated in Theorem 16 ; note that by the L-periodicity of m, the solution (ũ, φ) is Lperiodic) is a sum of random variables with multilevel local dependence according to Definition 4.
2) The estimate for the systematic error Step 1: Proof of 1).We denote by Q (x) the cube x + [− 2 , 2 ) 3 .In order to establish the property 1), we may write and Here, we employ the notation m| ext Q K log 2 L (yi) to denote the extension of the restriction m| Q K log 2 L (yi) to R 3 by a constant multiple of the Lebesgue measure for any Borel set A ⊂ R 3 .The constant K will be chosen below.Note that m| ext Q K log 2 L (yi) is still subject to uniform bounds of the form (A1).The first of the three conditions on the X n y in Definition 4 is satisfied for the choice (15) as the random variables E 0 yi only depend on m| Q K log 2 L (yi) .The second condition trivially holds true due to the definition of E 0 yi and E 1+log 2 L .The third condition for the E 0 yi -that is, the bound |E 0 yi | ≤ BL −3 -follows from the structure of the Thomas-Fermi-von Weizsäcker energy in (3) and the bounds on u, φ and m from Proposition 14.Finally, to establish 1) it only remains to show the bound |E 1+log 2 L | ≤ CL −3 .As a consequence of Corollary 6 and the equality m = m| ext for the choice K := 6 c + 1, where the positive constants C and c only depend on ρ, M , and ω 0 .Taking the sum over all y ∈ Z 3 ∩ [0, L) d and multiplying by L −3 , we have shown the desired bound Step 2: Proof of 2).In order to establish ( 14), we may write Taking the expectation and estimating the terms in the last line using Corollary 6 and the equality m = m| ext Q L/2 (y) on Q L/2 (y), we obtain Using the equality in law of m| ext Q L/2 (y) and m| ext Q L/2 (y) , we deduce Deriving the same estimate as in ( 16) for m, we obtain (14).

Locality of the TFW Equations Involving Point Charges
An important tool which we will utilize frequently to deal with the Dirac charges and the corresponding singularities of the potential is given by the class of cut-off functions η ρ which we introduce now.Note that given a nuclear charge distribution m, we will define η ρ in such a way that η ρ vanishes in a ρ-neighborhood of the point charges and that η ρ ≡ 1 holds outside of a 2ρ-neighborhood of all point charges.Notation 8.For ρ > 0, we define the cut-off function ).For all ρ > 0, one thus has η ρ ∈ C 1 ([0, ∞)) and there exists a constant Moreover, for any discrete set P ⊂ R 3 satisfying |x − y| ≥ 4ρ for some ρ > 0 and all x, y ∈ P, x = y, define η ρ : R 3 → [0, 1] via η ρ := η ρ (| • − z|) on B 2ρ (z) for all z ∈ P and η ρ := 1 elsewhere.Then, we have η ρ ∈ C 1 (R 3 ) and is valid on {η ρ > 0}.
We define the short-hand notations w where P ⊂ P 1 ∪ P 2 is the set of all x ∈ P 1 ∪ P 2 for which δc x := c 1,x − c 2,x = 0 holds true.Moreover, we will use the notation η to denote the cutoff function η ρ from Notation 8 with P taking the role of P. Finally, we introduce ξ := e −γ|•− y| for some 0 < γ < 1 and y ∈ R 3 .Note that this choice entails |∇ξ| ≤ γξ ≤ ξ.
As a key step towards Theorem 5, we derive an upper bound for the weighted L 2 -norm of w, ∇w, and √ η∇ψ by adapting the strategy in [20] to the more general case of locally finite nonnegative Radon measures representing the nuclear charges.
Lemma 10.Let Assumption 9 be satisfied.Then, there exist positive constants holds, where η and ξ (depending on γ) are defined in Assumption 9.
Proof.Following the argumentation in [20], we have and test (19) with wξ 2 (note that by w ∈ H 1 uloc (R 3 ) and the exponential decay of ξ and ∇ξ, wξ 2 is indeed an admissible test function).This yields The elementary estimate 2 )w ≥ 2 > 0 only depending on ρ, M , and ω 0 , as well as the identities 1 + u (the latter inclusion holding by u i ∈ H 2 uloc (R 3 )).Due to Lemma 12, L 1 , L 2 , and hence L are non-negative operators on H 1 (R 3 ).In fact, wξ ∈ H 1 (R 3 ), wξ, L(wξ) ≥ 0 and wξ, L(wξ) + ν We continue by testing (20) with The representation and, thus, By estimating η|∇ψ| 2 ξ 2 ≤ 2η|∇(ψξ)| 2 + 2ηψ 2 |∇ξ| 2 , by applying an absorption argument together with |∇η| 2 ≤ c η (ρ)η to ∇ψ on the right hand side, and by employing the uniform L ∞ -bound for u 1 and u 2 from Proposition 14, we obtain In order to handle the |a|w 2 ξ 2 -term, we bound the integral over R 3 by the sum over all integrals over all balls of radius 1 located at points with integer coordinates.After applying a Gagliardo-Nirenberg-estimate and Young's inequality, we again arrive at (a multiple of) an integral over R 3 as every x ∈ R 3 can belong to at most eight unit balls around integer points: We now choose τ > 0 -arising from Young's inequality -in such a way that ´R3 |∇w| 2 ξ 2 dx can be absorbed on the left hand side of (23).As L is non-negative, the right hand side of ( 22) already serves as an upper bound for ´R3 w 2 ξ 2 dx.As a consequence, the claim of the lemma follows.
The next lemma establishes an L 2 -bound for ψ, and at the same time improves the bound on the L 2 -norm of w, ∇w, and √ η∇ψ.
Lemma 11.Let Assumption 9 be satisfied.Then, there exist positive constants where η and ξ (depending on γ) are defined in Assumption 9.
We are now in position to prove our locality result Theorem 5.
Proof of Theorem 5.In view of Lemma 11, for proving Theorem 5 it suffices to show that the bound on the L 2 -norm of w, ∇w, ψ, and ∇ψ from Lemma 11 also serves as an upper bound for the L 2 -norm of the second order partial derivatives We first establish a bound for ´R3 η|∆ψ| 2 ξ 2 dx.From ( 20), we derive Using Young's inequality and absorption as well as the bounds from Proposition 14 and Lemma 11, we arrive at (28) We will now employ integration by parts to arrive at i,j ˆR3 We utilize the bounds |∇ξ| ≤ ξ and |∇η| 2 ≤ c η (ρ)η and continue as The assertion of the lemma now follows from (28) and Lemma 11.
We finally establish the locality result for the TFW energy.
Proof of Corollary 6.The difference of the TFW energy for m 1 and m 2 is given by We recall ξ := e −γ|•− y| , where y := − ´Q1 x dx is the center of Q 1 .We further recall the definition of η from Notation 8 and find using also Proposition 14 and Lemma 11 ≤ C e 2ργ e −γ dist(P ,Q1) ˆ{η<1} (w 2 + ψ 2 )ξ dx Consequently, we have which (using also the bounds from Proposition 14) implies the desired estimate for the first term in (29).Concerning the second expression, we derive (using again Proposition 14) ˆQ1 u where v(x) ∈ [u 1 (x), u 2 (x)] for all x ∈ Q 1 .As for the previous term, we obtain from Lemma 11 ˆQ1 u The first part of the Coulomb energy in ( 29) can be estimated via and, hence, arrives at the desired bound by applying Lemma 11 and the same arguments as above.We are left to control the change of the Coulomb energy related to the atomic nuclei in (29), which we may bound by In the case that dist(supp(m 1 − m 2 ), Q 1 ) ≤ 2ρ, we observe that (13) holds true for the right hand side of the previous equation as it is bounded by a constant (due to the uniform bound on φ i − φ i,x ) and as e −c dist(supp(m1−m2), Q1) is bounded from below by a positive constant.And if dist(supp(m and φ 1,x = φ 2,x as well as c 1,x = c 2,x for all x ∈ (P 1 ∪ P 2 ) ∩ Q 1 .As a consequence, in this case we have and we may now employ Theorem 5 and the bound from Proposition 14 to finish the proof.

Uniform Bounds on Solutions to the TFW Equations
For our arguments we need uniform estimates on the solutions to the TFW equations which depend only on the parameters ρ, M , and ω 0 .For this reason, we repeat some of the calculations of [7,20] to show that they do in fact yield uniform estimates.The following lemma and its proof are based on similar considerations in [7,20].
Lemma 12. Let a ∈ L 2 uloc (R 3 ) and suppose there exists some u ∈ H 2 uloc (R 3 ) with inf B R (0) u > 0 for all R > 0 and (−∆ + a)u = 0.Then, w, (−∆ + a)w := Proof.We first note that u ∈ L ∞ (R 3 ) ∩ C(R 3 ) due to the uniform boundedness of u H 2 (B1(x)) for every x ∈ R 3 .Regularizing a ∈ L 2 uloc (R 3 ) (e.g. by convolution with some mollifier), we obtain a family of functions a ε ∈ C ∞ (R 3 ) which converge for ε → 0 to a in L 2 (B R ) for any R > 0. A standard result on differential operators [13] ensures that the minimal eigenvalue λ ε of −∆ + a ε as an operator on B R with Dirichlet boundary conditions is simple and is associated with a non-negative eigenfunction and elliptic regularity theory, we deduce , hence ∇v ε is well-defined and ∂vε ∂n ≤ 0 on ∂B R .Moreover, it holds that We shall now prove that the eigenvalues λ ε are bounded for ε → 0. For any where the last expression is bounded due to a ε → a in L 2 (B R ).This yields an upper bound of the form λ ε ≤ C. In addition, the equation ε dx and the Gagliardo-Nirenberg-Sobolev type estimate where we utilized the normalization of v ε and the boundedness of , which provides both a lower bound for λ ε of the form λ ε ≥ −C and an upper bound for v ε H 1 (B R ) .Up to a subsequence, we thus know that λ ε converges for ε → 0.
Appropriate bounds on the solutions to the Thomas-Fermi-von Weizsäcker equations ( 18) can be constructed with the help of Proposition 13.The proof mainly relies on arguments from [7] and [20] where corresponding estimates have been deduced in similar situations.Proposition 13.Let M > 0 and m = m c + x∈P c x δ x where m c ∈ L 2 uloc (R 3 ), m c ≥ 0, c x > 0 and P ⊂ R 3 such that |x − y| ≥ 4ρ > 0 for all x, y ∈ P with x = y, and Then, there exists some R 0 ≥ 1 such that for each R n ≥ R 0 and m Rn := m χ B Rn (0) , there exists a solution (u Rn , φ Rn ), in the sense of distributions.Using the notation η ρ introduced in Notation 8, this solution satisfies the bounds for all 0 < ρ < ρ, 1 ≤ i ≤ 3, where C, C p , C ρ > 0 are independent of M and R n .
Proof.We assume from now on that m is not identically zero on R 3 .Consequently, there exists some R 0 ≥ 1 such that ´R3 m Rn dx > 0 holds true for all R n ≥ R 0 .
A pointwise upper bound for φ Rn cannot hold due to the point charges, but we may follow the arguments of [7,20] to establish upper bounds for φ Rn in L p uloc (R 3 ), p < 3, which are uniform in R n .
We introduce the set which is open and bounded due to the previous calculations.Furthermore, the constant and positive function h := (C * M ) Thanks to the maximum principle, we arrive at φ Rn * ω 2 ≤ c ω + C with a constant C > 0 independent of M and R n .
In the case that φ Rn ≤ 0 on R 3 , we have due to (37) the pointwise bounds −2Λ ≤ φ Rn ≤ 0. Otherwise, the positive part φ + Rn is not identically zero, and we shall derive appropriate L p -bounds for φ + Rn .We first recall that φ Rn ∈ C(R 3 \(P ∩ B Rn (0))).In particular, φ + Rn is continuous away from the set P ∩ B Rn (0) and with constants C > 0 independent of M and R n .Now, choose some arbitrary x 0 ∈ R 3 \(P ∩ B Rn (0)) satisfying φ Rn (x 0 ) > 0. On the one hand, we obtain the bound On the other hand, we may write φ + Rn (y) ds(y) dτ, and we immediately see that there exists some t ∈ ( ) and we shall derive bounds for the three terms on the right hand side which are independent of R n .
The first bound follows from the mean value property of harmonic functions and the estimates in (39) and (38) via where the constant C > 0 is independent of M and R n .For the second problem, we proceed as in [20] and find a solution Together, we arrive at and as x 0 ∈ R 3 \(P ∩ B Rn (0)) has been chosen arbitrarily, we further obtain a.e. in R 3 where C > 0 is a constant independent of M and R n .For p ∈ [1, 3), we then conclude that φ where C p > 0 denotes a constant depending only on p. Combining this estimate with the lower bound for φ Rn in (37), entails -as a first step -the desired L p -bound on φ Rn in (32).
Step 2: Further bounds.In order to establish the bounds on u Rn in (32), we first utilize (36) to find (40)    Rn + φ Rn u Rn .
As the right hand side belongs to L 7 4 (B 2 (x 0 )) (which will be detailed immediately), a standard result (see e.g.[13,Theorem 8.17]) ensures the following norm estimates on B 2 (x 0 ):  u Rn H 1 (B2(x0)) ) where C > 0 denotes various constants independent of M and R n .As a consequence, u Rn L ∞ (R ) 12 ).
For establishing the remaining bounds on φ Rn , we split Rn where all singularities of φ Rn are collected within the first term.The cut-off function ω ∈ C ∞ c (R 3 ) satisfying ω = 1 on B 1 (0) and ω = 0 on R 3 \B 2 (0) enforces each contribution from the Coulomb potential to have finite range.The non-singular function φ c Rn is then subject to (42) Testing this equation with ω(•−y)φ c Rn on B 2 (x 0 ) for some arbitrary As the expression inside the brackets in the last line is bounded by a constant which only depends on the choice of ω, we deduce where γ > 0 will be chosen subsequently.Besides, we observe that A similar estimate can be derived for the second order derivatives ∂ ij φ c Rn , 1 ≤ i, j ≤ 3. To this end, we first note that  .
The H 2 -type bound on φ on the set R 3 \P can be deduced by a similar reasoning.We start by observing that for all 1 ≤ i, j ≤ 3, R > 0, and 0 < ρ < ρ due to the bounds in (32).By selecting a diagonal sequence φ Rn , we find that φ Rn weakly converges to φ in H 1 (B R (0) ∩ int{η ρ/2 = 1}) for all R > 0 and 0 < ρ < ρ.This fact gives rise to (and an analogous bound on η ρ ∂ ij φ) for all x 0 ∈ R 3 , 0 < ρ < ρ, and 1 ≤ i, j ≤ 3. We subsequently rewrite (31) in the distributional formulation.For all v ∈ C ∞ c (R 3 ), we have ˆR3 − u Rn ∆v + 5 3 u Due to the convergence properties of u Rn and φ Rn derived above, these equations converge to the corresponding distributional formulation of (45) for n → ∞.
Note that in the literature sometimes a condition equivalent to (A1) is used.
Remark 15.We now give an equivalent characterization of the inf-condition for charge distributions m in (A1), which also appears in [7].An analogous statement without Dirac measures has been proven in [20].But as the result only appeals to the mass, the proof is the same.Let m = m c + y∈P c y δ y where m c ∈ L 2 uloc (R 3 ), m c ≥ 0, c y > 0 and P ⊂ R 3 such that |x − y| ≥ 4ρ for all x, y ∈ P, x = y, for some ρ > 0.Then, the following statements are equivalent.

Figure 3 .
Figure 3.The joint probability distribution of the approximations for the effective energy E RVE L and auxiliary statistical quantities F like the percentage of atoms of a certain species is close to a multivariate Gaussian (left).Conditioning on the auxiliary statistical quantity F being close to its expected value then reduces the variance of E RVE L , provided that the two random variables are nontrivially correlated (right).

L 2 ]
3 coincides with the probability distribution of m| x+F [0, L 2 ] 3 for all x ∈ R 3 .Given such a periodization m, we substitute (A3) by (A3 a ) -(A3 c ): (A3 a ) The nuclear charge m is almost surely LF Z 3 -periodic for some L 1. (A3 b ) There exists a finite range of dependence r > 0 such that for any two Borel LZ 3 -periodic sets A, B ⊂ R 3 with dist(A, B) ≥ r the restrictions m| A and m| B are stochastically independent.(A3 c ) There exists a nuclear charge distribution m satisfying (A1), (A2), and (A3) such that for any x 0 ∈ R 3 the law of the restriction m| x0+F [0, L 2 ] 3 coincides with the law of m| x0+F [0, L 2 ] 3 .Let us briefly comment on our main assumptions.The condition (A1) is nothing but a uniform local upper and lower bound on the charge distribution of the nuclei.The condition (A3) is a strong decorrelation assumption restricting all stochastic dependencies to a scale r ≥ 1.
y| dx holds, where the cutoff η (0 ≤ η ≤ 1) is defined in Assumption 9 below and where δm c := m 1,c − m 2,c .Here, we have made use of the decomposition m i = m i,c + x∈Pi c i,x δ x provided by assumption (A1).