Main-group test set for materials science and engineering with user-friendly graphical tools for error analysis: systematic benchmark of the numerical and intrinsic errors in state-of-the-art electronic-structure approximations

Understanding the applicability and limitations of electronic-structure methods needs careful and efficient comparison with accurate reference data. Knowledge of the quality and errors of electronic-structure calculations is crucial to advanced method development, high-throughput computations and data analyses. In this paper, we present a main-group test set for computational materials science and engineering (MSE), that provides accurate and easily accessible crystal properties for a hierarchy of exchange-correlation approximations, ranging from the well-established mean-field approximations to the state-of-the-art methods of many-body perturbation theory. We consider cohesive energy, lattice constant and bulk modulus of a set of materials that representatives for the first- and second-row elements and their binaries with cubic crystal structures and various bonding characters. A strong effort is made to achieve high numerical accuracy for cohesive properties as calculated using the local-density approximation (LDA), several generalized gradient approximations (GGAs), meta-GGAs and hybrids in all-electron resolution, and the second-order Møller–Plesset perturbation theory (MP2) and the random-phase approximation (RPA) both with frozen-core approximation based on all-electron Hartree–Fock, PBE and/or PBE0 references. This results in over 10 000 calculations, which record a comprehensive convergence test with respect to numerical parameters for a wide range of electronic-structure methods within the numerical atom-centered orbital framework. As an indispensable part of the MSE test set, a web site is established http://mse.fhi-berlin.mpg.de. This not only allows for easy access to all reference data but also provides user-friendly graphical tools for post-processing error analysis.

For most existing test sets, the developers provide all essential information for each calculation, including molecular geometry, basis set, and/or code-specific numerical setting [39,40].
Condensed-matter physics and materials science is lacking behind, so far, with respect to comparable benchmark datasets.Two types of accurate reference data are crucial.One is essentially the exact values, obtained from either precise experiments or high-level theoretical calculations.They are prerequisite for an unbiased benchmark of the intrinsic errors associated with, for example, the exchange-correlation (XC) approximations in DFT or more generally the treatment of relativistic effects.The other type of reference data can help to quantify the numerical error in calculated values using any chosen method, which might come from the basis set incompleteness, finite k-point sampling, or other approximations made in the numerical implementation.Unfortunately, for condensed-matter systems, neither of these two types of reference data are easy to obtain.We also mention here the Born-Oppenheimer approximation that decouples the dynamics of nuclei and electrons.This should be assessed as well, but will not be done in this paper.
The paper is structured as follows: After a detailed survey of the test sets that are widely used in quantum chemistry, as well as in materials science, we discuss the arXiv:1808.09780v1[cond-mat.mtrl-sci]29 Aug 2018 underlying challenges to obtain numerically accurate reference data in the latter.In Section II, a representative test set for materials science and engineering (MSE) is established.It is followed by an overview of a dedicated web site http://mse.fhi-berlin.mpg.de.We take three examples to demonstrate that the web site features a multi-mode access framework, versatile visualization, and a linear regression tool for post-processing data analysis.Section IV presents the numerical strategies applied in this work to obtain the numerically wellconverged data on a diverse levels of theory.We make our conclusions and present outlooks in Section V.

A. Test sets in quantum chemistry
The importance of reliable test sets for the success of quantum-chemistry methods was first realized by Pople and co-workers [1,26,[41][42][43][44].Along with the development of the Gaussian-n theories [1,26,[41][42][43][44], a hierarchy of extrapolating levels of correlation and basis sets have been developed to obtain increasingly accurate thermochemistry.A sequence of pioneering test sets for quantum chemistry were also developed and used for methdological validation.These testsets are now widely recognized as the Gn test sets, addressing atomization energies and other energetic properties of molecules with increasing numbers of accurate reference values obtained from experiments [1,26,[41][42][43][44].One of these test sets, G3/99 [26] generated in 1999, has been widely used in the development of density-functional approximations (DFA) to describe covalent bonding in the main group molecules [1,[25][26][27][28][29][30][31][32][33][34][35][36][37][38].Since then, many well-established test sets have been proposed for different, mainly ground state properties of molecules and molecular processes [1-15, 26, 42].For example, Hobza and co-workers designed the S22 set, comprising 22 non-covalent binding complexes of biologic relevance, which can be further divided into 7 systems for hydrogen bonds, 8 for dispersion bonds, and 7 for mixed bonds [7].Moreover as one of the most popular standard benchmark sets, the BH76 set proposed by the Truhlar group consists of forward and reverse barrier heights of 19 hydrogen transfer reactions, 6 heavy atom transfer reactions, 8 nucleophilic substitution reactions, and 5 unimolecular and association reactions [4].Most recently, the Grimme group compiled a comprehensive benchmark test set, GMTKN30, which includes 30 subsets collected from the literature, covering a large section of chemically relevant properties of the main-group molecules [14,15].
It may appear plausible that reference data, i.e. the accurate values of the relevant chemical and physical properties of atoms and molecules, can be acquired from experiments [1-3, 6, 11-13, 26, 42], but the required accuracy is not always achievable.The influence of electronvibrational coupling is often unclear, which, can hamper the comparison with the directly computed values.Thus reference data would ideally be created by accurate quantum chemistry methods such as the coupled-cluster (CC) method with single, double, and perturbative triple excitations, CCSD(T) [45][46][47], or single, double, triple and perturbative quadruple excitations, CCSDT(Q), or full configurational interaction (full-CI) theory extrapolated to the complete basis set (CBS) limit [48][49][50].For the test sets aforementioned, the corresponding reference data are either carefully chosen from accurate experiments [1-3, 6, 11-13, 26, 42] or obtained from high-level first-principles calculations [4,5,[7][8][9][10].
As the most popular choice in quantum chemistry, the atom-centered Gaussian-type orbital (GTO) basis sets provide well-converged total energies of atoms and molecules with a reasonable basis set size for the meanfield methods, including the local-density approximation (LDA), generalized gradient approximations (GGAs) and hybrid functionals in DFT, and the Hartree-Fock method in WFT.For an all-electron approach and for heavier atoms (e.g.Z>18), i.e. when wave-functions oscillate more strongly at the core region, it is more convenient, efficient, and accurate to use numerical atom-centered orbital (NAO) basis sets [51,52].
For advanced correlation methods which require unoccupied single-particle states, e.g. the second-order Møller-Plesset perturbation theory (MP2) [53], the randomphase approximation (RPA) [54][55][56][57], or CCSD(T) [45][46][47]58], the slow basis set convergence is a more serious problem [23,24,[59][60][61][62].It can be ascribed to the inaccurate description of the electron-electron Coulomb cusps using smooth orbital product expansions [63,64].The so-called correlation-consistent basis sets have been proposed [24,59], which allow for an analytic extrapolation to the CBS limit.Alternatively by introducing an explicit dependence on the inter-electronic distance into the wave function (F12 strategies [64][65][66][67][68]), it is possible to consider the cusp explicitely and suppress the basis set incompleteness error at a finite basis set size.Both techniques were proposed to address the basis-set incompleteness errors in advanced correlation methods for both atoms and molecules, which have been demonstrated to be very successful for light elements (e.g.Z<18) and for ground-state properties [24,59,64,68].Therefore, accurate chemical or physical properties, e.g.reaction energies, reaction barrier heights and isomerization energies, can be obtained without concern for the numerical error cancellation originating from a given finite basis set.However, the quantum-chemistry test sets are mainly available for the ground-state properties of light elements and small molecules, as it remains a challenge to obtain accurate reference data with advanced correlation methods for heavy elements (e.g.Z>18), excited states, and large systems.

B. Test sets in materials science
In computational materials science, the situation is more complex and less developed than in the quantum chemistry of molecules, and there is an urgent need of accurate test sets for the developement of advanced DFAs for solids.Staroverov and co-workers considered lattice constants, bulk moduli, and cohesive energies of 18 solids, as well as jellium surface energies to benchmark the TPSS functional, a meta generalized gradient approximation (meta-GGA) [69].They also presented results for LDA, GGA PBE, and meta-GGA PKZB [70].Heyd and Scuseria used lattice constants and bulk moduli for 21 solids and band gaps for 8 semiconductors out of this set to investigate the screened Hartree-Fock-exchange properties in the HSE hybrid functional.Comparison was also made with LDA, PBE, and TPSS [71].This test set (or part of it) has been used to benchmark the HSE and HSEsol as implemented in VASP [72,73], to understand the failure of B3LYP for solids [74], and to develop improved DFAs for solids [75][76][77].A set of 60 cubic solids was used by Haas, Tran, and Blaha to compare the performance of different GGA functionals, LDA and TPSS in describing lattice constants and bulk moduli [78][79][80].To compare the accuracy of different van der Waals (vdW) functionals applied to solids, Klimes and co-workers determined cohesive properties for a set of 23 bulk solids [81], based on a set of materials and properties used by Csonka and co-workers who tested the accuracy of GGA and meta-GGA functionals [82].Recently, Tran et al. performed an extensive test on the lattice constant, bulk modulus, and cohesive energy for a large set of XC approximations belonging to rungs 1 to 4 of Jacob's ladder of DFT [83], performed in a non-self-consistent way using PBE orbitals.The test set used contain 44 strongly and 5 weakly bound solids, most of which were cubic, except for hexagonal graphite and h-BN.These test sets of materials with different properties have already been playing an important role in developing and benchmarking the DFAs for solids.These test sets exclusively rely on reference values from experiment; No quasi-exact calculated results exist, which can be used as reference data for solids.
For reasons of comparability and consistency, reference values from theory provide the unique opportunity to compare calculations based on exactly the same atomic structure and exclude finite-temperature, zero-point vibration, relativistic and electron-vibration coupling effects.Recently, the "gold standard" method in quantum chemistry, CCSD(T), has gained attention in materials science [84][85][86][87][88][89].A significant step towards an exact description of solids was demonstrated by Booth et al. by performing a full-CI quality calculation with the aid of the Quantum Monte Carlo (QMC) stochastic strategy [85].However, the numerical difficulty and computational complexity for describing the large extent of solids has strongly limited the applications of these highly accurate quantum chemistry methods to only model systems with very few electrons, basis functions, and k points [84,85,88,90].Other QMC stochastic strategies, such as Variational Monte Carlo (VMD) [91] and Diffusion Monte Carto (DMC) [92] feature a low computational scaling with respect to the system size and a weak dependence on basis set.QMC has been demonstrated to be as accurate as aforementioned quantum chemistry methods, e.g.CCSD(T), if certain numerical issues could be considered properly, which includes the fixed-node error and the form and optimizaiton of the trial function [93][94][95].
Unlike the molecular systems investigated by quantum chemistry, the extended periodic materials are often simulated in reciprocal space with periodic boundary conditions (PBC) for the sake of fully taking advantage of the periodic symmetry.However, it also introduces extra numerical difficulties: The first Brillouin zone in reciprocal space must be sampled by a finite number of k points, for which different kinds of k-mesh have been proposed to provide an efficient sampling according to different periodic symmetries [96][97][98][99][100].Basis set incompleteness error is another big issue: The atom-centered GTO-type basis sets that are dominant in quantum chemistry become cumbersome for condensed matter systems [101,102].In particular, large and especialy diffuse GTO basis sets might cause severe ill-conditioning problems due to the lack of rigorous orthogonality for GTO basis functions.The plane-wave basis sets are a popular alternative choice for solids as they provide an intrinsically improvable description of the electronic orbitals with only one parameter: the planewave cutoff energy.However, the delocalized nature of plane waves makes it inefficient to describe the localized core electrons surrounding atomic nuclei.In practice, the pseudo-potential approximation, which removes the core electrons from part of calculations, are often needed together with the plane-wave basis sets in the projector augmented wave (PAW) framework [103,104].More recently, it has been demonstrated that the use of compact NAO basis sets [52,105] or the linearized augmented plane wave (LAPW) strategies [106] are able to converge the mean-field approximations in all-electron and full-potential resolution with relativistic effects explicitly included as opposed to implicitly as is the case for pseudopotential methods.
Such diversity of methods leads to a need for a standard data set that is comparable among different numerical implementations.This is indeed the purpose of the ∆-value concept introduced by Cottenier et al. [107,108], which is currently focusing on the implementation of the PBE method for solids.The numerically well-converged PBE results from the all-electron, full-potential LAPWbased program WIEN2k [108,109] are taken as reference.To quantify the numerical errors in the description of equations of state, the ∆-value is defined as the root-mean-squared (RMS) energy difference between the equations of state of two codes, averaged over an exhaustive test set of crystalline solids, containing all groundstate elemental crystals.
Despite the enormous success achieved by the (semi)local density-functional approximations, e.g.LDA and GGA PBE, there are several notorious failures of these methods.For instance, there exist serious self-interaction errors and significant underestimation of vdW interac-tions for these functionals, which demand more sophisticated approximations.The meta-GGAs, in which the orbital kinetic energy density is added to the density functional evaluation, belong to a higher rung of the Jacob's Ladder in DFT.Representative are the nonempirical TPSS [69] and the multiparameter empirical M06-L [110].A recently proposed nonempirical meta-GGA, SCAN, which satisfies 17 known exact constraints appropriate to a semilocal functional [111], has been found to show a big step ahead in accuracy for manifold chemical and physical properties and has attracted increasing interest in computational materials science.The so-called hybrid functionals, e.g.HSE06 [112], incorporate the information of the occupied orbitals in Hartree-Fock-like "exact exchange".The next level of complexity is to derive DFAs using the information of virtual orbitals.Two methods at such level, MP2 [53] and RPA [18,34,35,[54][55][56]113], are state-of-the-art in computational materials science [34,76,77,85,87,[114][115][116][117].The numerical errors in these methods can either be inherited from the aforementioned algorithms to solve the one-electron Kohn-Sham (or Hartree-Fock) equations, or arise from extra algorithms, such as the choice of the self-consistent Kohn-Sham orbitals for the post-processing evaluations [118]; the resolution-of-identity technique to handle the twoelectron four-center integrals [18,[119][120][121], and the localization approximations [114,122,123] to reduce the computational scaling in these advanced correlation methods.
In this context, it becomes imperative for computational materials science to have a representative test set with numerically well-converged reference values at various levels of theory.In spirit of the existing quantumchemistry test sets and the ∆-value concept, we introduce in the following our test set for materials science and engineering, coined MSE, which is based on results acquired using density functional methods from LDA, PBE, PBEsol, SCAN, TPSS, M06-L, HSE06; and state-of-theart MP2 and RPA.The numerical convergence of these methods is investigated in terms of total energy.Cohesive energies, lattice constants and bulk moduli are then reported.A comprehensive understanding of the numerical errors, particularly in MP2 and RPA, is discussed in order to aid the community's pursuit of a numerically stable implementation of CCSD(T) and full-CI for the exact description of solids.

II. TEST SET FOR MATERIALS SCIENCE AND ENGINEERING (MSE)
In this project, we select 7 elemental solids and 12 binaries with cubic structure, as the first step in creating the MSE test set.As illustrated in Fig. 1, the set is composed of elements from the first and second rows of the periodic table, consisting of the body-center-cubic (bcc), face-center cubic (fcc), diamond, rocksalt, and zincblende structures.Thus, the set includes materials with metallic, covalent, ionic, vdW, and mixed bonding characters.The Fritz Haber Institute "ab initio molecular simulations" (FHI-aims) electronic-structure package [52] is used to generate the numerically well-converged reference data in the MSE test set.The reasons for this choice are as follows: • The numerical accuracy of the FHI-aims package has been shown to be equivalent to the accuracy of other high-quality codes [124].In terms of speed, FHI-aims is the clear winner, but this was not considered in Ref [124].
• A range of popular DFAs have been implemented in FHI-aims and can be used routinely for atoms, molecules, clusters, and periodic systems.Besides the conventional (semi)-local functionals, e.g.LDA, PBE, PBEsol, SCAN, TPSS and M06-L, a realspace implementation of the exact exchange operator in PBC by Levchenko et al. has been demonstrated to allow for a practical use of hybrid functionals (including HSE06) and the Hartree-Fock method, as well, for both molecular and periodic systems [105].Furthermore, as will be reported soon, a massively parallel, in-memory implementation of periodic MP2 and RPA methods has also been implemented in FHI-aims using canonical crystalline orbitals.
• FHI-aims employs numerically tabulated NAO basis sets [52].The default basis sets were developed in 2009 starting from a minimal basis that is composed of exact occupied orbitals for spherically symmetric free atoms [52].Such minimal basis captures the essential core-electron behavior in the vicinity of atomic nuclei.Additional "tiers" were defined in a step-wise minimization of the LDA total energies of symmetric dimers for all elements.These hierarchical basis sets, tier-n (n=1-4) for short, can provide the CBS total energies for meanfield approximations (LDA, PBE, PBEsol, SCAN, TPSS, M06-L and HSE06 in this project) in an allelectron description.
• To address the slow basis set convergence of advanced correlation methods (MP2 and RPA in this project), FHI-aims provides a sequence of NAO basis sets with Valence Correlation Consistency [24], namely NAO-VCC-nZ with n=2, 3, 4 and 5.The basis set incompleteness error in the valence correlations of MP2 and RPA can be removed using two-point extrapolation schemes [24].Unless otherwise stated, the MP2 and RPA calculations in this work are valence-only (frozen-core) using allelectron Hartree-Fock and PBE orbitals, respectively.Assuming complete convergence of the kmesh and basis sets, any discrepancy between our results and those obtained with a plane-wave basis should presumably originate from the error of the pseudo-potentials used to generate the valence and virtual orbitals in the self-consistent procedure using Hartree-Fock or PBE.Besides the minimal basis, the NAO-VCC-nZ basis sets comprise an additional group of s, p hydrogen-like functions, named enhanced minimal basis [24].For isolated molecules, such s, p group has been demonstrated to be useful to improve the description of valence correlations [24].However, the densely packed nature of the condensed-matter systems largely alleviates the difficulty of saturating the s, p basis space for valence correlations.In this work, we exclude the enhanced minimal basis and re-optimize the NAO-VCC-nZ basis sets, but do not change the name for simplicity (interested readers are referred to Ref. 24 for the optimization strategy in detail).
At present the NAO-VCC-nZ basis sets in FHI-aims are only available from H to Ar [24].As a consequence, the MSE test set is currently focused on light main group elements (see Fig. 1).In this paper, we report the numerically well-converged results of MP2 and RPA for 14 selected materials, including 4 elemental solids (Ne and Ar in the fcc, and C and Si in the diamond structure) and 10 binaries (LiF, LiCl, LiH, MgO and MgS with rocksalt, BeS, BN, BP, SiC and AlP with zincblende structure).We snapshot in Fig. 2 the MP2 data currently available.In the long term, the MSE test set shall be extended to include heavy elements and non-cubic structures, defects, and surfaces, including representatives for the majority of systems of interest in materials science and engineering.
Relativistic effects were investigated as well, using the atomic zeroth order regular approximation (atomic ZORA) [52].By performing a linear regression comparison, we confirm quantitatively that the relativistic effect has a negligible influence on the materials formed by light elements, regardless of any XC approximation that is used (see the following section for more details).Thus, we report the reference data without consideration of relativistic effects.As a side benefit, this approach excludes the numerical uncertainty arising from different relativistic methods [52].Obviously, for heavy elements, the relativistic effect must be considered carefully.

III. THE WEB SITE OF THE MSE TEST-SET
A key feature of our MSE test set project is to establish a dedicated web site http://mse.fhi-berlin.mpg.de, which allows for easy access and analysis of the data.

A. Multi-mode access to the reference data
At the MSE home page, an overview of the MSE test set project is given along with a table that summarizes all materials available for a given DFA.The materials are sorted by their crystalline structures with a layout similar to Fig. 1.The default DFA is PBE; the table for other DFAs is obtained via a drop-down select box.A search engine allows to access a group of results for a given material, structure, or DFA; and/or combination of the above.Figure 2 shows a snapshot of the filtered table produced by the search engine, listing the MP2 reference data currently available.This multi-mode search framework, together with a well-organized data structure, guarantees easy access to the more than 10,000 calculations in the current MSE test set.

B. Visualization
The MSE web site provides a quick and easy-to-use visual display of the test set data.For any individual reference value calculated for a given material and using a specific DFA, the crystal structure and the wellconverged equation of state are displayed.If available, one can also visualize the numerical convergence towards this reference data; this includes convergence tests for basis set, k-mesh, and internal numerical parameters.Furthermore, a statistic analysis of the basis-set convergence for the whole test set can be visualised, which is derived using the mean absolute deviation (MAD).
As an example of this functionality, Fig. 3 presents the basis set incompleteness error in MP2 cohesive energies per atom for (A) Si diamond and (B) 14 materials representing covalent, ionic, vdW, and mixed bonding characters.Clearly, NAO-VCC-nZ basis sets provide an improved description for advanced correlation methods with the increase of the basis set size, which allows for the extrapolation to the CBS limit from NAO-VCC-3Z and 4Z (i.e.CBS [3,4]).As a recommendation for practical use, we note that the basis set extrapolation from NAO-VCC-2Z and 3Z (CBS [23]) guarantees a near NAO-VCC-4Z quality, but with much less computational cost.

C. Linear regression
For the benchmark studies of the DFAs, the MAD and the root-mean-square deviation (RMSD) are often used to quantify the numerical errors in calculations.For a given group of materials with the number of materials The CBS [3,4] value is taken as the reference data.The convergence test is performed with an 4x4x4 Γ-centered k grid.In Sec.IV, more details about the numerical convergence for different methods will be discussed.
of n and the targeted observables of {Y i }, if the computational data are {y i } with the errors distributed as x i = Y i −y i , we then define the relevant MAD and RMSD as Despite both MAD and RMSD being measures of accuracy, we note that larger errors have a disproportionately large effect on RMSD, making it more sensitive to outliers.
A linear regression by means of a least-squares fit allows us to separate the error into predictable (or systematic) and material-specific (or residual) parts [125,126].The resulting linear model allows the analysis of the material-specific deviations { i } and the corresponding root-mean-square deviation (RMSD) as The systematic error can be determined by the difference between RMSD and RMSD.It was argued that the material-specific deviations represent a true measure of the inadequacy of the method or the numerical incompleteness of the basis set and k-mesh [125,126].To make the most of this analysis method, we have equipped the MSE web site with a linear-regression analysis tool.
In three examples, we will demonstrate how to gain insight into the numerical and intrinsic limitations of a state-ofthe-art DFAs.
(1) Basis set convergence in the MP2 lattice constants: Slow basis set convergence is a well-documented problem for advanced correlation methods like MP2 and RPA. Figure 2 presents the basis set incompleteness errors in MP2 cohesive energies using the valence correlation consistent basis set in the NAO framework.Taking the MP2 results in the CBS limit as the reference, Table I   the basis set size, RMSD and RMSD with NAO-VCC-3Z remain about 0.1 Å, which is unacceptable for real applications.
Figure 4 shows the linear regression analysis.It clearly suggests that there are two outliers, which are Ne and Ar with fcc structure.We note that Ne and Ar crystals are bonded by vdW forces, which are overestimated by the MP2 method; however is an intrinsic error of the MP2 method and thus not addressed here.For isolated molecules, it has been discussed in previous literature [18,24] that an accurate description of weak interactions using advanced DFAs needs either a counterpoise correction scheme or very large basis sets with diffuse basis functions to address the notorious basis set superposition error.As suggested by our results in this work, this conclusion is also true for solids governed largely by vdW interactions.Excluding these two cases from the linear regression analysis results in a much better quantitative accuracy (values in parentheses in Table I).The RMSD for NAO-VCC-3Z and RMSD for NAO-VCC-2Z are below 0.02 Å, which clearly suggests that NAO-VCC-3Z or even 2Z with a systematic correction would be good enough to converge the MP2 lattice constants for strongly bonded systems, while the weak interactions must be treated with special care.
(2) Starting point (SP) influence on meta-GGAs and RPA: The common practice of evaluating meta-GGAs, TABLE II.RMSDs and material-specific RMSD of the starting-point influence on cohesive energies per atom."DFA@SP" denotes which method is based on which starting point orbitals."sc-DFA" denotes the self-consistent study.(unit: eV) Methods to compare RMSD RMSD sc-SCAN vs SCAN@PBE 0.015 0.015 sc-M06-L vs M06-L@PBE 0.049 0.043 sc-TPSS vs TPSS@PBE 0.035 0.033 RPA@PBE vs RPA@PBE0 0.119 0.030 like TPSS, M06-L and SCAN, and advanced many-body perturbation methods, like RPA, is by performing an energy evaluation a posteriori using LDA, GGA or hybrid GGA orbitals [35,83].Meanwhile, the Hartree-Fock orbitals are used for the MP2 method generally in quantum chemistry.The self-consistent implementation of meta-GGAs has been realized in many computational packages, including FHI-aims.In the MSE test set, we provide accurate reference data of TPSS, M06-L and SCAN for both (a) self-consistent and (b) non-self-consistent using PBE orbitals, calculations.We also examine the starting point influence on RPA cohesive energies by using PBE and PBE0 orbitals.For simplicity and consistency, we use "DFA@SP" in this discussion to denote which method is based on which starting point (SP) orbitals.
Table II shows the influence of the starting point on the cohesive energies per atom in terms of the RMSD.The linear regression was performed to extract the materialspecific part of the deviations.Clearly, the starting point influence is mild for meta-GGAs: SCAN shows the smallest influence, leading to a RMSD of only 15 meV.Meanwhile, the linear regression analysis suggests that these errors are almost material-specific.The corresponding systematic errors (RMSD − RMSD) are less than 2 meV.In contrast, RPA is much more sensitive to the choice of the starting point orbitals, though the influence is quite systematic: the material-specific error RMSD is only about 30 meV.This small error indicates that a careful choice of the starting point orbitals can improve the RPA performance because of its systematic underestimation of cohesive energies [118] and weak interactions [34].
For meta-GGAs and RPA, the starting point influences on lattice constant and bulk modulus are similar: the material-specific deviations are mild, though the systematic deviation is quite large for RPA results.The readers interested in this topic can easily access the data and perform the linear regression online by themselves.It is also easy to investigate the influence of relativistic effects on different methods and properties in the same manner.
(3) Cross-over comparison between different methods: Figure 5 summarises the cross-over comparison of cohesive energies between different methods.RMSDs and the material-specific RMSDs are shown in different subtables.In this comparison, RPA@PBE reference data were used and meta-GGAs were calculated self-consistently.The nine DFAs investigated covers all five rungs of the Jacob's ladder of DFT.From the many-body perturbation theory point of view, MP2 and RPA consider the electronic correlations explicitly in the many-body interaction picture, while others are all mean-field approximations.
LDA and RPA are the density functional approximations on the lowest and highest rungs of the Jacob's ladder, respectively, and they show the largest deviation in the cohesive energy calculations, leading to a RMSD of over 1 eV.In fact, LDA shows a large disagreement with all other methods, with the average RMSDs about 0.7 eV.This observation agrees with a widely accepted argument that the local density approximation derived from homogeneous electron gas completely misses the high order density derivative or response information that is important for a proper description of chemical bonding.However, our linear regression analysis suggests that this error is quite systematic: The material-specific RMSD is only 0.16 eV between LDA and RPA.
A similar analysis can be applied to study the intrinsic limitations in MP2 and RPA.It is well-known that the RPA method performs unsatisfactorily in the calculations of cohesive energy for solids [35,37,127,128].From the many-body perturbation point of view, this error is due to the lack of an infinite summation of the second-order exchange diagrams, which is necessary to eliminate the notorious one-electron self-interaction error in the RPA method [35,37].On the other hand, the MP2 method itself is exactly free from such self-interaction error, but completely ignores high-order perturbative diagrams.In consequence, a large RMSD of about 0.68 eV between MP2 and RPA cohesive energies is observed, which can be traced back to the dissimilarity of underlying physical models used in the two methods.Recently, much effort has been devoted to improve RPA and MP2 from different perspectives, but these approaches often exacerbate the computational complexity dramatically.Our linear regression analysis suggests that the difference between MP2 and RPA cohesive energies is also quite systematic, leading to a material-specific deviation RMSD of only about 100 meV.
We also compared the results of different methods with experimental data that were corrected for thermal and zero-point vibrational effects [83,107].In-line with the observation from the above cross-over comparision, most of the errors in different theretical approximations are quite systematic, leading to the material-specific errors RMSDs in 9 methods are lower than 210 meV, with M06-L presenting the largest difference.The visualization of the linear regression analysis suggests that abnormally large errors in M06-L occur in metallic systems (Na and Al), which can be ascribed to an incorrect oscillation of the exchange enhancement factor of M06-L when approaching to the uniform electron gas limit [129].Taking these two points out, the RMSD and RMSD of M06-L are reduced to 131 meV and 95 meV, respecitvely.
To summarise this section, we took three examples to demonstrate the usage of our MSE web site.In the context of ongoing innovation in computational materials science, driven by data technology, there is a growing awareness of the importance of effective data sharing and recycling.Here, we argue that a dedicated test-set web site should be about more than an easy, static access of the reference data.In order to liberate the power of test sets, of key importance is providing friendly analysis interface that facilitate the users to play with the online data, repeat the observation in the original papers, and even gain new insights from the data by themselves.

IV. NUMERICALLY WELL-CONVERGED REFERENCE DATA
In this section, we introduce the numerical strategies applied to obtain the reference data in the MSE test set.Numerically well-converged reference data at various levels of theory are crucial to achieve an unbiased benchmark and discussion in the previous section.
Due to the use of NAO basis sets, the electronicstructure problem is addressed using numerical integration in FHI-aims.The specific technical aspects behind the numerical setting of the grids used for threedimensional integrations are described in Ref. 52 for mean-field approximations, and in Ref. 24 for advanced correlation methods.In this work, very dense grids are employed to ensure an meV accuracy in total energy consistently for all methods.
For hybrid functionals (HSE06 in this work) and more sophisticated treatments of the exchange and correlation (e.g.MP2 and RPA), the evaluation of electron repulsion integrals (ERIs) is a major computational bottleneck.Within FHI-aims, an efficient localized resolutionof-identity (called "RI-LVL") method [121] was developed, which expands the basis-function products in a set of auxiliary basis functions located at the same atoms as the basis functions.Of key importance is the completeness of the auxiliary basis set, which ultimately determines the accuracy of the RI-LVL method.For isolated molecules, Ihrig et al. have demonstrated that the RI-LVL errors in HSE06, MP2 and RPA total energies per atom can be converged to sub-meV using a relatively small, but sufficient auxiliary basis set [121].
In FHI-aims, the RI-LVL method has also been adopted in the periodic implementation of HSE06 [105], MP2 and RPA methods.For all 19 materials in the MSE test set, we have carefully converged the RI-LVL errors in HSE06, MP2 and RPA total energies with respect to the size of the auxiliary basis set.Our results demonstrate that the accuracy of the RI-LVL method developed initially for isolated molecules is completely transferable to extended systems.Since it is a specific numerical issue for FHI-aims, the results are included in the MSE web site (see Sec. III for usage) and we refer the readers to Ref. [105] and [121] for more details and discussions about the RI-LVL implementations.
In contrast, the numerical error in terms of k-mesh and basis set sizes is a universal difficulty for any software implementation.In the following section, we focus on the numerical convergence in both aspects for the total energy per atom of each method and each material in the MSE test set.

A. The k-mesh convergence
The k-mesh convergence for LDA and PBE methods is investigated together with Γ-center (GC) and Monkhorst-Pack (MP) sampling strategies [130][131][132], both of which span the first Brillouin zone in an evenly spaced manner.The MP k-mesh coincides with the GC one for an uneven number of grid points [131,132].In this project, we choose a set of k-meshes with even numbers of grid points.Our observation is consistent with previous investigations where the MP k-mesh shows faster convergence behavior than the GC k-mesh for insulators and semiconductors at the semilocal density functional level of theory [131,132].In order to achieve 1 meV accuracy, the 4×4×4 MP grid is sufficient for rare-gas crystals (Ne and Ar with fcc structure), and the 6 × 6 × 6 MP grid for other ionic-and covalent-bond systems.To achieve the same accuracy, an 8×8×8 (sometimes even denser) grid is often necessary with the GC k-mesh.However, the slow k-mesh convergence in the calculation of metallic systems is more serious, and cannot be improved using the MP k-mesh.Based on LDA, PBE, and three meta-GGAs convergence benchmarks, we summarize the k-mesh setting in Table III, recommended for all semi-local DFAs investigated in this paper: LDA, PBE, PBEsol, TPSS, M06-L and SCAN.
A linear-scaling HSE06 method for condensed matter systems has been implemented in FHI-aims [105].In order to fully utilize the sparsity of the density matrix in real space, this HSE06 implementation needs a projection of density matrix in a Γ-center k-mesh to the corresponding Born-von-Karman (BvK) supercell by means of Fourier transformation.Furthermore, FHIaims features a massively parallel, in-memory implementation of canonical MP2 and RPA methods for periodic systems.At present, HSE06, MP2 and RPA calculations can be performed only with the Γ-center kmesh in FHI-aims.MP2 and RPA calculations have been reported using k-meshes centered at Γ-point for the projector-augmented-wave method and plane wave basis sets [76,77,85,87,88,133].
Table III lists the k-meshes for converged HSE06 calculations: Generally speaking, HSE06 shows a similar convergence behavior as LDA and PBE using the Γcenter k-mesh.Notable differences happen only with very rough grids, e.g. in the rare-gas crystals (Ne and Ar) with 2 × 2 × 2 and ionic crystals with 4 × 4 × 4. Such abnormality can be ascribed to an improper description of the integrable singularity of the Coulomb potential in reciprocal space [72,134], which is mathematically equivalent to a slow decay of Coulomb interaction in condensed matter systems.In FHI-aims, a cut-Coulomb operator is used for HSE06 to generate the Coulomb matrices at every k-point [105,112].However, a reasonably dense kmesh is still required.Taking C diamond and MgO rocksalt as examples, Fig. 6 shows the k-mesh incompleteness error of total energies per atom for five methods: where n k is the number of k points in each direction (n k × n k × n k ).Our results reveal that a Γ-center mesh with 6 × 6 × 6 grid points is enough for HSE06 to address this singularity issue, thus providing a similar k-mesh sampling quality as LDA and PBE.
In the framework of the mean-field approximations, the evaluation of the exchange-correlation energy needs only the electron density and/or occupied orbitals.However, the advanced correlation methods, i.e.MP2 and RPA, explicitly consider the excitations that are between occupied and virtual orbitals and that cross over different k points.Therefore, it is not surprising to see that MP2 and RPA calculations for solids have a much higher demand on the sampling of the first Brillouin zone.The k-mesh convergence of the periodic MP2 method has been investigated with the Γ-center sampling strategy by Grüneis et al. [76,135].In FHI-aims, we adopt a hybrid strategy to generate the Coulomb matrices in MP2 TABLE III.The k-meshes that are used for different methods."GC" denotes an evenly spaced Γ-center k-mesh and "MP" for the Monkhorst-Pack one.For mean-field approximations, the convergence criteria is 1 meV (or tighter) in the calculations of the total energy per atom.For MP2 and RPA, the complete k-mesh extrapolation from 6 × 6 × 6 and 8 × 8 × 8, denoted as CKM(6,8) is adopted with α = 3 (see Eq. 5).and RPA calculations, i.e. using the cut-Coulomb operator only for the Γ point and the full Coulomb operator for the rest of the k points.For MP2 and RPA, this hybrid choice shows a faster k-mesh convergence compared to the cut-Coulomb strategy used in HSE06 calculations [105].As Fig. 6 illustrates, the total energies per atom calculated at MP2 and RPA levels converge dramatically slower than those of LDA, PBE and HSE06.

(semi-)local
The k-mesh incompleteness error in C diamond remains to be 15 meV for both MP2 and RPA total energies with a dense 8 × 8 × 8 k-mesh.We notice that other off-set evenly sampled k meshes [96,97], the so-called meanvalue point strategy [98], and smearing sampling methods, such as the method of Methfessel and Paxton [99], the improved linear tetrahedron method [100], etc., have all been proposed to provide a better sampling with fewer (or even one) k points, but at the semi-local DFT level of theory only.It is interesting to investigate the influence of these sampling strategies on advanced correlation methods.The computational expense of the canonical MP2 (and RPA) scales as N 3 k (and N 2 k ), where N k = n 3 k is the total grid number in a given k mesh.For C diamond and NAO-VCC-2Z basis set (with 28 basis functions per unit cell), the MP2 calculation with the 8 × 8 × 8 k-mesh requires 10 hours to complete using 320 CPU cores of an Infiniband-connected Intel cluster with Intel Xeon E5-2680 v2 cores (2.8GHz, 20 cores per node).The periodic MP2 implementation in FHI-aims is a k-mesh-oriented parallelization in the framework of the Message Passing Interface (MPI), guaranteeing efficient MP2 calculations with thousands of cores.However, with the optimistic assumption of a perfect parallel scalability, it would need over 2000 CPU cores to finish the MP2 calculation with a denser k-mesh (10 × 10 × 10) in a comparable amount of time.While the RPA calculations with such a dense k-mesh and the NAO-VCC-2Z basis set can be carried out at a reasonable cost, the remaining k-mesh error is about 7 meV for C diamond.The RPA calculations with denser k-meshes and larger basis sets become infeasible A practical way to approach the complete k-mesh limit (CKM) for advanced correlation methods is the two-point extrapolation in terms of an inverse relation between ∆E total [n k ] and n k : The exponential "α" determines the speed of the k-mesh convergence of different methods.On a logarithmic scale (see Eq. 5 and Fig. 6), "α" is the negative value of the slope of ∆E total [n k ] towards the CKM limit.Previously, the exponential α = 3.00 has been successfully used to extrapolated MP2 and CCSD(T) cohesive energies for several materials [85].
Table IV shows the performance of such k-mesh extrapolation for five selected materials (two elemental and three binary solids) from the MSE test set.The complete k-mesh extrapolation from 8 × 8 × 8 and 10 × 10 × 10, CKM (8,10), are taken as the reference for the periodic RPA calculations.The deviations of the CKM(6,8) values are smaller than 1.5 meV for all five materials and the mean absolute deviation (MAD) is only 1.1 meV.Our results confirm the validity of the practical extrapolation (Eq.5) with α = 3.00 for advanced methods provided that reasonable dense k-meshes are utilized.However, a notable decrease of the performance can be observed in the combination of CKM (4,6) and α = 3.00.The MAD remains at about 8.0 meV (with the maximum error of 14 meV for C diamond) for the RPA total energies per atom.
The CKM (6,8) values with α = 3.00 are chosen as the reference for MP2.Table IV indicates that the CKM(4,6) extrapolation is not good enough when using the default choice of α = 3.00, and it can be effectively improved with an exponential value α = 3.95 as optimized for RPA.Our results reveal that advanced correlation methods share a similar k-mesh convergence behavior, and thus the CKM (6,8) values with α = 3.00 are good enough to converge the error within 2.0 meV for insulators and semiconductors, which is choosen for both MP2 and RPA calculations in this work (see Table III).
We notice that the periodic MP2 and CCSD(T) results reported in the literature were often extrapolated using CKM (5,6) or even sparser CKM (3,4) with α = 3.00 [85].It is an understandable compromise considering the computational expense with denser k-meshes and larger basis sets for these methods; However, in order to achieve k-mesh-converged values with the numerical uncertainty within 10 meV, an adapted exponential factor for CKM extrapolation is necessary.

B. The basis set convergence
The difficulty of using the GTO basis sets, the de facto choice in quantum chemistry, to extended systems has been studied extensively in different kinds of crystals mainly using the CRYSTAL program [101,136,137].To quote the statement in their User's Manual (Chapter 10) [138], "as a rule, extended atomic basis sets, or 'triple zeta' type basis sets should be avoided (...), because the outer functions are too diffuse."On the other hand, despite the predominate use in calculations on periodic systems, the plane-wave basis set is ineffective to describe the localized core-electron states and usually needs to be used in conjunction with pseudopotentials.
The NAO basis sets of FHI-aims [24,52], including tiern and NAO-VCC-nZ, hold the promise to provide numerically well-converged total energies for mean-field approximations in all-electron description, as the minimal basis of both tier-n and NAO-VCC-nZ is composed of the exact core and valence orbitals of spherically symmetric free atoms.In the meantime, a confining potential can be used to generate the primitive NAOs that are exactly localized in a certain region, so that any numerical instability caused by an unnecessary overlap between the diffuse atom-centered basis functions can be significantly suppressed (see Refs. 24 and 52 for details).
Table V lists the basis set incompleteness errors in HSE06 total energies per atom using the tier-n series, which are the default choice for mean-field approximations in FHI-aims.The tier-2 basis set is recommended, and tier-3 is considered to be good enough to reach the complete basis set limit [52].While the tier-n basis sets are optimized with respect to small molecules, i.e. symmetric dimers for all elements [52], our results demonstrate their transferability to the extended systems with diverse chemical environments.The average (maximum) error in tier-2 and tier-3 are about 8 meV (16 meV) and 2 meV (9 meV) in HSE06 total energies per atom of light elements' materials.Note that the basis size of tier-2 is slightly larger than Dunning's GTO triple-zeta basis set cc-pVTZ [59], which has been recommended to be avoided for calculations on solids [138].In our work, we do not encounter any self-consistent field (SCF) convergence problem with tier-2 in all our calculations with GGAs and hybrid GGAs, including LDA, PBE, PBEsol, and HSE06.For meta-GGAs, the SCF convergence can be achieved with tier-2 as well, but some code-specific numerical parameters must be set carefully.For instance, we should use a finer numerical integration grid for the vdW systems [139], and it is necessary to introduce a threshold for the kinetic-energy related variable τ , which removes values that do not effect the total energy but do cause sigularities when calculating the potential in the NAO framework [140].Both parameters have been tested extensively, with values chosen such that the errors these parameters introduce are in the sub-meV regime.
FHI-aims provides a larger tier basis set (tier-4) for some light elements.We notice that for the binaries with mainly ionic bonding characters (LiF -MgS), the SCF procedure with can fail.In this work, we report HSE06 total energies per atom using tier-4 for C and Si with diamond structure, SiC and AlP with zincblende structure.Compared with tier-3 results, a noticeable change can be observed only for AlP (about 3 meV).It suggests that tier-3 is competent to provide numerically well-converged HSE06 total energies for most of materials.Concerning the two worst cases of tier-3, i.e.Na bcc and Ar fcc, the basis set incompleteness errors are about 5 meV and 9 meV, respectively.For Na and Ar, the tier-4 basis sets are unavailable.We then introduce 2 s-type and 2 p-type hydrogen-like functions from NAO-VCC-3Z as an additional s, p group to the tier-3, denoted as tier-3+.Table V and Fig. 7 reveal that such s, p group effectively compensates for the basis set incompleteness in tier-3, resulting in the HSE06 total energies per atom with CBS quality for Na bcc and Ar fcc.
The NAO basis functions used in the tier-n series are apt to saturate the bonding region in the middle of two atoms, as they are optimized to minimize the LDA total energies of symmetric dimers [52].In contrast, the sequence of NAO-VCC-nZ basis sets is determined by minimizing the RPA total energies of atoms [24], thus introducing more compact NAO basis functions than the tier-n series.With similar basis sizes, NAO-VCC-3Z delivers a larger basis set incompleteness error (18 meV on average) than tier-2.It indicates the necessity of introducing diffuse atom-centered basis functions in small basis sets to balance the aforementioned incompleteness in core and bonding regions, which renders a better performance for mean-field approximations.However, such discrepancy can be reduced with the increase of the index "n" in both sequences, the average error being only 2 meV and 3 meV for tier-3 and NAO-VCC-4Z with similar basis sizes.NAO-VCC-5Z is the largest NAO basis set in FHIaims, and yields the lowest HSE06 total energies per atom for most of the materials in the test set.However, SCF convergence problem occurs at Al fcc, C diamond, LiF and MgO with rocksalt structure, BN, BP and SiC with zincblende structure.For binary crystals "AB", we tried a hybrid basis set strategy N(n 1 /n 2 )Z, which is a shorthand notation of using NAO-VCC-n 1 Z for "A" and NAO-VCC-n 2 Z for "B".Compared with NAO-VCC-4Z results, better HSE06 total energies per atom can be obtained without SCF convergence problem for LiF, MgO, BP and LiH with N(4/5)Z and for SiC with N(5/4)Z.Fig. 7 presents the differences of the HSE06 total energies per atom (∆E total ) between the best NAO-VCC-nZ and tier-n basis sets.The discrepancy ∆E total is very small, about 1 meV on average.Considering that NAO-VCC-nZ and tier-n are designed for completely different purposes, such an agreement with each other clearly indicates that the complete basis set limit has been approached for the calculations of HSE06 total energies per atom in all-electron description.Other (semi-)local DFAs share a similar basis set convergence as HSE06.The observation and discussion here are then transferable.In addition to HSE06, one can find a comprehensive basis set convergence test with LDA, PBE, TPSS, M06-L, and SCAN in the MSE web site.For PBEsol, the best NAO basis sets determined by PBE are used to calculate the reference data directly.
For MP2 and RPA calculations, the results are extrapolated to the CBS limit using a two-point extrapolation formula [24,60] based on NAO-VCC-3Z and NAO-VCC-4Z, namely CBS (3,4): Here E[n] is the RPA or MP2 total energy per atom using NAO-VCC-nZ basis sets (with n=3 and 4).The accuracy of the CBS(3,4) scheme has been demonstrated in the RPA and MP2 calculations of 18 symmetric dimers from H 2 to Ar 2 [24].Taking the CBS extrapolation values from aug-cc-pV5Z and 6Z as the reference, the average basis set error in CBS (3,4) values is about 13 meV.However, for solids, it is impossible to perform the RPA and MP2 calculations using very large and diffuse GTO basis sets, i.e. aug-cc-pV5Z and 6Z.For LDA, PBE and HSE06, we observe a very good transferability of tier-n and NAO-VCC-nZ from isolated molecules to extended solids as shown in Table V.We argue that the performance of the CBS (3,4) strategy in molecules is also transferable to solids, which can be used to converge the RPA and MP2 total energies per atom in the MSE test set with a similar numerical uncertainty on average.While the extrapolation from NAO-VCC-4Z and NAO-VCC-5Z shall further reduce the numerical uncertainty of the reference data [24], NAO-VCC-5Z is too expensive to be used for periodic MP2 and RPA calculations at present.The k-mesh convergence of MP2 and RPA methods tested in the previous section is carried out together with the smallest NAO-VCC-2Z basis sets (see Fig. 6).However, it would be impossible to perform MP2 calculations with 8 × 8 × 8 for NAO-VCC-4Z, and sometimes even a NAO-VCC-3Z calculation is too expensive, because of the unfavorable scaling of the MP2 method with respect to the number of k points.
Taking C diamond as an example, Table VI shows the k-mesh convergence of the MP2 total energies per atom for different basis sets.The most time consuming result in this table is the combination of NAO-VCC-3Z and the 8 × 8 × 8 k-mesh, which requires about 4 days using 320 CPU cores of an Infiniband-connected Intel cluster with Intel Xeon E5-2680 v2 cores (2.8 GHz, 20 cores per node).As illustrated in Table VI, while the k-mesh error of a 4 × 4 × 4 mesh varies with the basis set employed by several meV, it becomes almost independent of the choice of basis sets for 6 × 6 × 6 and 8 × 8 × 8 meshes.Ohnishi et al. [141] and Grüneis et al. [133] had a similar observation in MP2 and CCSD calculations using the GTO-type and plane-wave basis sets, respectively.In consequence, they concluded that the long-range behavior of the correlation energy depends mostly on the low-lying excitations, and proposed their progressive downsampling technique to approach the complete k-mesh and basis set limit for advanced correlation methods.
In this work, we adopt a similar downsampling concept to estimate the MP2 total energies per atom with the NAO-VCC-4Z basis set and the 8 × 8 × 8 k-grid: using the energy change between 6 × 6 × 6 and 8 × 8 × 8 with the NAO-VCC-3Z (or 2Z) basis set ∆E nZ total [n k = 8] (n = 3 or 2) of: In summary, the reference data for the MP2 and RPA total energies (and also valence correlation energies) per atom are obtained by the combination strategy of CKM (6,8), CBS (3,4) and ∆E nZ total [n k = 8] with (n = 3 or 2).The error bar is estimated to be 20 meV for these MP2 and RPA reference data.The numerical uncertainty in the combination strategy is dominated by the CBS(3,4) extrapolation, i.e. 15 meV for CBS (3,4), 2 meV for CKM (6,8), and less than 1 meV for ∆E 3Z total [n k = 8] or ∆E 2Z total [n k = 8], respectively.

C. Lattice constant, bulk modulus, and cohesive energy
Based on the efforts discussed above to achieve the complete k-mesh and basis set convergence in terms of absolute total energy for various levels of theory, we next focus on the calculated lattice constants, bulk moduli and cohesive energies using these parameters.
To find the optimized lattice parameters for each material and each method, we calculate seven points within a range of ±5 % around the initial value of the lattice constant, and fit the respective (volume, energy) data points to the Birch-Murnaghan equation of state to obtain the optimized lattice constant.If all 7 lattice constants used to generate the equation of state lie within a range of ±7 % around the optimized value, the value is taken as the final, optimized lattice constant a 0 .Otherwise, this optimized lattice constant is used as a new starting point and more (volume, energy) data points are calculated, so that finally all materials' optimized lattice constants are obtained from an equation of state fitted to seven points with lattice constant values in a range of ±5% to ±7% around the final, optimized lattice constant a 0 .We have carefully tested the stability of this procedure using the PBE functional.The obtained optimized lattice constants and bulk moduli are accurate within 0.02 Å and 0.1 GPa, respectively.
The cohesive energies E M coh of the materials are then calculated at the optimized lattice constant a 0 : where N atom is the number of atoms in the unit cell, E M the total energy of the unit cell for material M, and the sum is taken over the total energies E atom of the constituent atoms in their spin-polarized symmetry-broken ground state, i.e. no fractional occupancies.For (semi-)local and hybrid DFAs, the best basis sets and k-grid settings recommended in Table III and Fig. 7 are used to produce numerically well-converged lattice constants, bulk moduli and cohesive energies.These reference data are in all-electron description.The postprocessing MP2 and RPA correlations are frozen-core, but evaluated based on the Hartree-Fock, PBE and/or PBE0 orbitals in all-electron description.The basis set convergence of these properties are well-documented for (semi-)local and hybrid DFAs.However, the slow basis set convergence together with an unfavorable computational scaling makes it a big challenge to perform a systematic investigation of the basis set convergence of these materials' properties at MP2 and RPA levels.We present here the MP2 results of five selected materials from the MSE test sets in Table VII.To the best of our knowledge, it is the first report of the basis set convergence of MP2 for condensed matter systems using NAO basis sets.The readers can easily access the MP2 and RPA data on convergence test for all 13 materials in our MSE web site.
Table VII reveals that the basis set convergence is relatively fast in the MP2 calculations of lattice constants.NAO-VCC-3Z is good enough to provide a well-converged lattice constant (the convergence criterion is 0.01 Å) for all five materials (for details see also the first example in Sec III).In agreement with the previous investigations using plane-wave basis sets [76,85] or a Gaussian and plane waves hybrid approach [117], the slow basis set convergence of MP2 cohesive energies is observed with NAO basis sets as well.Compared with CBS(3,4) values, the basis set incompleteness errors at NAO-VCC-4Z still remain about 60-130 meV.Such slow convergence arises from the slower basis set convergence of the MP2 total energies in bulks than that in free atoms.It is welldocumented in quantum chemistry that the accurate geometry information for the MP2 and other advanced correlation methods can be obtained with the basis sets of triple-zeta quality [142], but the converged atomization energy or other energy differences cannot be achieved with finite basis sets [41,143,144].Our results confirm that this conclusion is also valid for solids.While the bulk moduli obtained at the NAO-VCC-4Z basis set are not well-converged, the basis set error remains about 3 -7 GPa.
Inspecting Table VII also reveals the capability of NAO-VCC-nZ to provide a consistently improvable description of cohesive properties.In general, the calculated MP2 cohesive energies increase with the cardinal index (n) of NAO-VCC-nZ.On the one hand, it allows for an accurate extrapolation to the CBS limit; and on the other hand, the NAO-VCC-4Z values set up a quite rigorous lower bound for the converged values of MP2.In other words, any results smaller than the NAO-VCC-4Z values might be unreliable.
The canonical MP2 method was implemented in VASP for periodic systems in 2009 [135].Later, Grüneis et al. calculated the MP2 cohesive properties of 13 materials in 2010 [76].To approach the complete basis set limit in the framework of the projector-augmented-wave (PAW) method using plane-wave basis sets, they proposed an TABLE VII.MP2 cohesive energies per atom, lattice constants, and bulk moduli.All results have been extrapolated to the complete k-mesh limit using CKM (6,8) 4.63 a In [76], the MP2 results with VASP were extrapolated to the CBS limit.The k-mesh was sampled by a composite scheme.The Hatree-Fock orbitals were generated in the pseudo-potential framework.b In [85], the MP2 results with VASP were extrapolated by CKM (5,6) with α = 3.The Hartree-Fock orbitals were generated in the pseudo-potential framework.
inverse relation between the MP2 correlation energy and the energy cutoff E χ that represents the overlap charge densities for the CBS extrapolation.Meanwhile, an efficient composite scheme was proposed to reduce the computational cost due to the unfavorable scaling of the kpoint number.To be specific, the Hartree-Fock total energy is calculated with a dense 10 × 10 × 10 Γ-centered k-mesh, but the MP2 second-order direct and exchange terms are evaluated with 6 × 6 × 6 and 3 × 3 × 3 kmeshes, respectively.In 2013, Booth et al. updated the MP2 cohesive energies of three materials [85], including C diamond, BN and AlP with zincblende structure.The main difference is to replace the composite scheme by an extrapolation scheme from 5 × 5 × 5 and 6 × 6 × 6, i.e.CKM (5,6) with α = 3, to approach the converged k-mesh sampling.Table VII also lists the canonical MP2 cohesive properties calculated with VASP.FHI-aims predicts very similar cohesive energies to the VASP values with k-grid extrapolation scheme.The difference is about 40 meV for C diamond and AlP zincblende, and 100 meV for BN in zincblende.For MgO and Si, the discrepancy of over 170 meV between FHI-aims and VASP values shall be ascribed to the k-grid incompleteness of the composite scheme used in the previous VASP calculations [76].

V. CONCLUSIONS
With the materials science and engineering (MSE) test set, we have accomplished the first step towards a representative test set with well-defined and relevant cohesive and electronic properties and reference values obtained from a hierarchy of the first-principles calculations.The accuracy of the reference values in the MSE test set is mainly determined by the well-defined numerical setting of each applied DFA.A strong effort has been made to provide numerically converged values from state-of-theart theory, including periodic MP2, and RPA calculations in the complete basis set limit and the complete k-space limit as well.At present, we provide 14 accurate data for MP2 and RPA.
A web site of the MSE test set, http://mse.fhi-berlin.mpg.de, is equipped with a multi-mode access framework, versatile visualization, and a linear regression tool for post processing data analysis.In this work, we demonstrate that these features dramatically assist the post-processing data analysis that is necessary to detect the numerical error in calculations and uncover the intrinsic limitations of DFAs.The presented paradigm for the test set construction is applicable to any new material and materials' property.

VI. ACKNOWLEDGMENTS
The authors thank Fawzi Mohamed for his support in setting up the MSE web page, Steffen Kangowski for creating the layout of the MSE web page, and Norina A. Richter and Toktam Morshedloo for their contribution in the preparation stage of the MSE project.AJL is grateful to the Royal Society of Chemistry for the award of a Researcher Mobility Fellowship, and acknowledges the use of the ARCHER high performance computing facilities, and associated support services, via membership of the UK HPC Materials Chemistry Consortium (EP/L000202).

FIG. 1 .
FIG. 1.The MSE test set containing 7 elements and 12 binaries with cubic structures.

FIG. 2 .
FIG.2.Snapshot of the MP2 data currently available from the MSE web site.
FIG. 4. Linear regression of the MP2 lattice constants at different basis sets to the reference values in the CBS limit.NnZ is the shorthand notation of the valence correlation consistent basis set NAO-VCC-nZ with n=2, 3 and 4.

FIG. 5 .
FIG. 5. Cross-over comparison of cohesive energies between different methods and experiment (Exp) in terms of the RMSD.The direct RMSDs are shown in the up triangular part, while the values in the lower triangular part are the material-specific errors (RMSDs) after the linear regression.(unit: eV)

FIG. 6 .
FIG. 6. Dependence of the total energy per atom on the number of k points n k (along one direction).The Γ-center k-mesh is used.C diamond (A) and MgO rocksalt (B) results are presented on a logarithmic scale.The basis sets used are tier-2 for LDA and PBE and NAO-VCC-2Z for RPA and MP2.The reference total energies are calculated with the 12 × 12 × 12 k-mesh for LDA, PBE and HSE06.For RPA and MP2, the references are extrapolated by CKM(8,10) and CKM(6,8) with α = 3.0, respectively.

FIG. 7 .
FIG.7.Energy differences of HSE06 total energies per atom with the best NAO-VCC-nZ (NnZ for short) and the best tier-n (∆E total =E total [NnZ]-E total [tier-n], in meV).The corresponding basis sets employed for each material are listed on the left and right hand side, respectively.tier-3+ is the tier-3 basis set plus 2 s-type and 2 p-type hydrogen-like basis functions from NAO-VCC-3Z.For binary crystals "AB", N(n1/n2)Z is a short-hand notation of using NAO-VCC-n1Z for A and NAO-VCC-n2Z for B. The basis sets in red deliver the lowest HSE06 results for each material, which are taken as the reference data in this work.

TABLE I .
RMSDs and material-specific RMSD of the basis set incompleteness errors for the MP2 lattice constants.The values in parentheses exclude Ne and Ar.(unit: Å)

TABLE IV .
(6,8)) of extrapolated RPA and MP2 total energies per atom for five materials (in meV).The complete k-mesh (CKM) extrapolation is performed from the different combinations of k grids, namely CKM(n k 1 ,n k 2 ), and with different exponentials α.The errors listed are the deviations of E total -E Ref total , where the reference E Ref total for RPA is the extrapolated value from 8 × 8 × 8 and 10 × 10 × 10, CKM(8,10), with α = 3.00.For MP2, the reference is extrapolated by CKM(6,8)and α = 3.00.The mean absolute deviations (MADs) are shown for RPA and MP2 separately.

TABLE V .
Basis set errors in HSE06 total energies per atom (in meV).The reference data are the lowest energies in two sequences of NAO basis sets, i.e. tier-n and NAO-VCC-nZ.HSE06 results of Na bcc and Ar fcc are calculated using tier-3+ which is tier-3 plus 2 s-type and 2 p-type basis functions from NAO-VCC-3Z.b For a binary crystal AB, a hybrid basis set with NAO-VCC-n 1 Z for A and NAO-VCC-n 2 Z for B is denoted as N(n 1 /n 2 )Z.These are N(4/5)Z for LiF, MgO and LiH rocksalts, BP zincblende, and N(5/4)Z for SiC. a

TABLE VI .
Errors in the MP2 total energies per atom using different basis sets for C diamond (in meV).The reference is the CKM(6,8) extrapolated values with α = 3.00 (see Eq. 5).
. The VASP results are listed as well.