Data-driven estimation of transfer integrals in undoped cuprates

Undoped cuprates are an abundant class of magnetic insulators, in which the synergy of rich chemistry and sizable quantum fluctuations leads to a variety of magnetic behaviors. Understanding the magnetism of these materials is impossible without knowledge of the underlying spin model. The typically dominant antiferromagnetic superexchanges can be accurately estimated from the respective electronic transfer integrals. Density functional theory calculations mapped onto an effective one-orbital model in the Wannier basis are an accurate, albeit computationally cumbersome, method to estimate such transfer integrals in cuprates. We demonstrate that, instead, an Artificial Neural Network (ANN) trained on the results of high-throughput calculations can predict the transfer integrals using the crystal structure as the only input. The descriptors of the ANN model encode the spatial configuration and the chemical composition of the local crystalline environment. A virtual toolbox based on our model can be readily used to determine the leading superexchange paths and to rapidly assess the relevant spin model in yet unknown cuprates.

The data-driven approach, accompanied by modern machine-learning (ML) techniques, is becoming an increasingly important tool of scientific investigation across many domains of physics. From quantum to fluid mechanics [1,2], learning from data facilitates descriptions of complex phenomena for which analytical approaches are prohibitively challenging. The adoption of ML in solid-state physics and materials science is particularly appealing: the sheer amount of collected experimental and computed records propels the community to design data-driven frameworks for the prediction of materials properties [3,4].
The application of ML to problems of solid-state physics is not straightforward. One of the key challenges is to represent periodic (crystalline) or finite (molecule, local crystal environment, etc.) atomic systems as descriptors, i.e., data structures amenable to ML methods. Such descriptors must be invariant with respect to the choice of the unit cell (crystals) or to global rotations (finite systems). Several classes of descriptors have been developed for the prediction of material properties: the Coulomb matrix [5], the partial radial distribution function (PRDF) [6], the smooth overlap of atomic positions (SOAP) [7], the diffraction fingerprint (DF) [8], and the three-dimensional (3D) Zernike descriptor (3DZRD) [9]. 3D Zernike descriptors were designed for the characterization of 3D shapes [10,11] and successfully employed for the comparison of molecules [12][13][14]. These descriptors are invariant with respect to the number of chemical species in the dataset, store detailed information about the spatial arrangement, and do not require additional simulation software. These features, as well as the compact size of the resulting data structures, make the 3DZRD ideally suited for the description of diverse, dissimilar crystalline environments.
Another key challenge is the construction of descriptors that represent material properties. For instance, it is widely accepted that the electronic, magnetic, and topological properties of bulk materials are rooted, in a highly nontrivial way, in their electronic structure. However, using the complete band structure of a material as a universal descriptor is not possible for a number of reasons: a varying number of bands, a non-universal discretization of the Brillouin zone, huge dimensionality, etc. A more practical approach is to restrict the description to the states relevant for the physical quantity of interest. Naturally, this is possible only for a certain class of materials and only for a specific physical property.
Following this idea, we apply a data-driven approach to assess spin models in undoped cuprates, i.e., stoichiometric inorganic materials containing divalent copper and oxygen atoms. In contrast to their doped counterparts, the high-temperature cuprate superconductors [15], undoped cuprates are magnetic insulators with the $3d^9$ electronic configuration of Cu$^{2+}$. The sizable Jahn-Teller distortion lifts the orbital degeneracy, giving rise to half filling and localized $S = 1/2$ spins. Owing to the plethora of structure types and the quantum limit assured by $S = 1/2$, undoped cuprates exhibit a variety of magnetic behaviors [16], from simple quantum dimers and spin chains to exotic collective behaviors such as the spin-liquid regime in herbertsmithite γ-Cu$_3$Zn(OH)$_6$Cl$_2$ [17], Bose-Einstein condensation of magnons in Han purple BaCuSi$_2$O$_6$ [18], or bound magnon states in volborthite.
Understanding the magnetic properties of cuprates requires knowledge of the underlying spin model. While exchange anisotropies are generally present and can alter the magnetic properties, the backbone of spin models is formed by isotropic interactions, and the relevant Heisenberg Hamiltonian is the following sum:
$$H = \sum_{\langle ij \rangle} J_{ij}\, \mathbf{S}_i \cdot \mathbf{S}_j ,$$
where $\mathbf{S}_i$ and $\mathbf{S}_j$ are spin operators on sites $i$ and $j$. The set of relevant magnetic exchange integrals $\{J_{ij}\}$ determines the spin model. It is important to note that each individual $J_{ij}$ term is a sum of antiferromagnetic ($J^{\mathrm{AF}}_{ij} > 0$) and ferromagnetic ($J^{\mathrm{FM}}_{ij} < 0$) contributions that are driven by competing processes [20]. Commonly, the antiferromagnetic contribution dominates, with the exception of short-range exchanges, for which the ferromagnetic contribution can become dominant.
The antiferromagnetic contribution is a textbook example of the superexchange mechanism and can be derived via second-order perturbation theory of the Hubbard model in the strong-coupling limit at half filling as
$$J^{\mathrm{AF}}_{ij} = \frac{4 t_{ij}^2}{U_{\mathrm{eff}}}$$
[21][22][23]. Here, $t_{ij}$ is the transfer integral between sites $i$ and $j$, and $U_{\mathrm{eff}}$ is the Coulomb repulsion within an effective molecular-like orbital, which in most cuprates is dominated by the $3d_{x^2-y^2}$ orbital of Cu and the σ-bonded $2p$ orbitals of O. There is empirical evidence that $U_{\mathrm{eff}}$ in the range 4-5 eV gives a proper description of the magnetism of cuprates [24][25][26]. Hence, knowledge of the transfer integrals $t_{ij}$ paves the way to a quantitative assessment of the spin model in the majority of cuprate materials. Yet, extracting $t_{ij}$ directly from the structural information is essentially impossible; instead, it requires first-principles calculations followed by additional modeling.
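To give a feeling for the energy scales involved, the following minimal sketch evaluates the strong-coupling superexchange expression $J^{\mathrm{AF}} = 4t^2/U_{\mathrm{eff}}$ for a few transfer integrals. The function names and the default $U_{\mathrm{eff}} = 4.5$ eV (the middle of the 4-5 eV range quoted above) are illustrative assumptions, not part of the published workflow.

```python
# Boltzmann constant in eV/K, used to express exchange energies in kelvin.
K_B_EV_PER_K = 8.617333262e-5


def j_af_ev(t_ev: float, u_eff_ev: float = 4.5) -> float:
    """Antiferromagnetic superexchange J_AF = 4 t^2 / U_eff (all energies in eV).

    The default U_eff = 4.5 eV is an assumed value inside the 4-5 eV range.
    """
    if u_eff_ev <= 0.0:
        raise ValueError("U_eff must be positive")
    return 4.0 * t_ev ** 2 / u_eff_ev


def j_af_kelvin(t_ev: float, u_eff_ev: float = 4.5) -> float:
    """Same quantity converted from eV to kelvin."""
    return j_af_ev(t_ev, u_eff_ev) / K_B_EV_PER_K


if __name__ == "__main__":
    # Illustrative transfer integrals in eV; the sign of t does not matter.
    for t in (0.05, 0.10, 0.20, -0.20):
        print(f"t = {t:+.2f} eV -> J_AF = {j_af_kelvin(t):7.1f} K")
```

Because $J^{\mathrm{AF}}$ scales with $t^2$, an error in $t$ is roughly doubled in relative terms in the exchange, which is why the accuracy of the predicted transfer integrals matters most for the leading couplings.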
To overcome this challenge, we propose a data-driven approach for the prediction of transfer integrals in cuprates, which requires the crystal structure as the only input. Our approach is based on a description of the local crystal environment utilizing the 3DZRD. The local crystal environment descriptor is used as input for the ML model, which is trained on the results of high-throughput density-functional-theory (DFT) calculations for hundreds of cuprate materials. The DFT calculation for each material is followed by automated Wannierization and manual quality control. The trained model is wrapped into a freely accessible web application that can be used for a quick estimation of relevant transfer paths in new cuprate materials.
The paper is organized as follows. Section 2 describes the high-throughput DFT calculations and the dataset of transfer integrals. Section 3 details the descriptor for the Cu..Cu bonds. In Section 4, we compare three different ML approaches for predicting transfer integrals and estimate their accuracy by a cross-validation (CV) procedure. In the last section, we discuss the performance of our ML model for different classes of cuprates. In particular, for the parent (undoped) compounds of high-temperature superconducting cuprates, we show that the ANN model quantitatively captures the ratio between the nearest- and next-nearest-neighbor transfer integrals.

Dataset generation
We start with a description of the high-throughput DFT calculations employed for the generation of the dataset of transfer integrals. The list of materials contains 672 unique structures of undoped cuprates. These structures were selected from the 10 710 cuprate structures stored in the Inorganic Crystal Structure Database (ICSD) [27]. For this screening, the following criteria were consecutively applied: (i) the presence of Cu$^{2+}$ ions, (ii) electroneutrality (zero total charge), (iii) the absence of sites with fractional occupancies, (iv) a minimal interatomic distance of 0.5 Å, and (v) the absence of magnetic atoms other than Cu [28]. The latter criterion is necessary to filter out compounds with multiple magnetic atoms, where the presence of additional bands in the relevant energy range may render the effective one-orbital model inapplicable and its results misleading. For the analysis of the crystal structures, we used the pymatgen library [29] for Python.
For each structure, we performed DFT calculations to construct Wannier Hamiltonians and subsequently determined the transfer integrals. All DFT calculations were performed using the generalized gradient approximation (GGA) [30] with the full-potential code FPLO, version 18.00-52 [31]. The computational workflow comprised several steps. First, scalar-relativistic nonmagnetic DFT calculations were carried out and the Hellmann-Feynman forces were calculated. Second, for hydrogen-containing compounds whose calculated forces exceeded the threshold of 0.1 eV/Å, we optimized the internal coordinates of the H atoms within GGA. The rationale behind this step is that H positions determined by x-ray diffraction (by far the most common method of structure determination) are largely inaccurate. Since a considerable number of cuprates contain hydrogen, typically as hydroxyl groups or water molecules, the inclusion of such partly optimized structures allowed us to considerably extend the dataset. All other cuprates whose forces exceeded the threshold of 0.1 eV/Å were discarded. Third, we calculated the orbital-resolved density of states (DOS) and the band structure. From the orbital-resolved DOS, we determined the energy interval that contains the copper $3d_{x^2-y^2}$ bands. This interval is selected such that the contribution of the magnetic $3d_{x^2-y^2}$ orbital to the total density of Cu $3d$ states exceeds 5 %. The resulting energy window $[E_{\min}, E_{\max}]$ was adopted for Wannierization in the next step. Fourth, the Wannier transformation was performed to obtain the effective one-orbital Hamiltonian $H$ in the Wannier basis. We used copper $3d_{x^2-y^2}$ orbitals as projectors and the interval $[E_{\min}, E_{\max}]$ as the energy window to construct the Wannier functions (WF). The energy window is necessary to discriminate the target antibonding orbital (crossing the Fermi energy) from its bonding sibling at the bottom of the valence band. The transfer integral between two WFs $w_i$ and $w_j$ placed at copper sites $i$ and $j$ is determined as the (real) Hamiltonian matrix element $t_{ij} = \langle w_i | H | w_j \rangle$. Details on the construction of the Wannier functions and the respective tight-binding models are provided in Refs. [32,33].
After the calculation pipeline was completed, we obtained, for all valid structures, a list of transfer integrals $\{t_{ij}\}$ connecting the $i$-th and $j$-th copper sites situated at the distance $r_{ij}$ from each other [28]. For the construction of the dataset, we selected transfer integrals larger than 5 meV with a Cu..Cu spacing of less than 8 Å. The distribution of the calculated transfer integrals $t_{ij}$ over the Cu..Cu distances is shown in Fig. 1. The crystal chemistry of cuprates sets a natural lower limit for the bond lengths; accordingly, there is no transfer integral with a distance below 2.4 Å in the dataset. Remarkably, for the vast majority of transfer integrals, the absolute values are below 0.2 eV. This natural imbalance of the dataset will inevitably affect the performance of predictions.
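The selection step described above can be sketched as a simple filter. The record layout (`Transfer`) and the sample values are hypothetical, and applying the 5 meV threshold to the absolute value of the transfer integral is an assumption made here, since the sign convention is not spelled out in the text.

```python
from typing import List, NamedTuple


class Transfer(NamedTuple):
    """One Cu..Cu pair: site indices, distance in Å, transfer integral in eV."""
    i: int
    j: int
    r_ang: float
    t_ev: float


def select_for_dataset(records: List[Transfer],
                       t_min_ev: float = 0.005,
                       r_max_ang: float = 8.0) -> List[Transfer]:
    """Keep pairs with |t| > 5 meV and a Cu..Cu spacing below 8 Å.

    Using |t| rather than t is an assumption of this sketch.
    """
    return [rec for rec in records
            if abs(rec.t_ev) > t_min_ev and rec.r_ang < r_max_ang]


if __name__ == "__main__":
    # Made-up values for illustration only.
    raw = [
        Transfer(0, 1, 3.8, -0.450),  # kept: large |t|, short distance
        Transfer(0, 2, 5.4, 0.004),   # dropped: |t| below 5 meV
        Transfer(0, 3, 8.3, 0.020),   # dropped: distance above 8 Å
        Transfer(1, 2, 7.9, 0.012),   # kept
    ]
    for rec in select_for_dataset(raw):
        print(rec)
```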

Crystal Environment Descriptor
To describe the crystal environment, we first determine the midpoint $\vec{p}$ between a given pair of copper atoms and build a sphere with the empirically determined threshold radius $R_{\mathrm{th}} = \max(4, r_{ij}/2 + 0.2)$ Å centered at $\vec{p}$. Next, all atoms in the sphere are included in the crystal environment, along with the nearest neighbors of $i$ and $j$. We consider as nearest neighbors the atoms located not farther than 2.5 Å from $i$ or $j$. After the local crystal environment is assembled, we shift the origin of the coordinate system to the centroid $\vec{p}$ (the midpoint of the copper pair) and normalize the atomic coordinates by $r_0 = 6$ Å to fit the crystal environment into the unit ball. To construct a robust representation of the local crystal environment, we introduce the piecewise function of site positions $I(\vec{r})$. The function $I$ is equal to the oxidation number $O_q$ of the $q$-th atom inside the ball centered at the position $\vec{r}_q$ of the $q$-th atom, with the radius $R_q$ equal to the ionic radius of the atom, and vanishes elsewhere:
$$I(\vec{r}) = \begin{cases} O_q, & \|\vec{r} - \vec{r}_q\| \le R_q, \\ 0, & \text{otherwise}. \end{cases} \qquad (2)$$
The normalization factor $r_0$ is the sum of the maximal considered Cu-Cu distance $\max \|\vec{r}_{ij}\| = 4$ Å and the maximal considered ionic radius of 2 Å. The function $I(x, y, z)$ describes the spatial configuration and chemical composition of the crystal environment placed in the unit ball $x^2 + y^2 + z^2 \le 1$. An example of the local crystal environment defined by Eq. (2) is shown in Fig. 2.
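The construction of the environment function can be illustrated with a short sketch. It assumes that the ionic radii are scaled by $r_0$ together with the coordinates and that, where two ionic balls overlap, the first atom in the list takes precedence; neither detail is stated in the text. The atomic positions and radii in the toy example are illustrative values, not data from the paper.

```python
import math

R0 = 6.0  # normalization length r_0 in Å, as quoted in the text


def threshold_radius(r_ij: float) -> float:
    """R_th = max(4, r_ij / 2 + 0.2) in Å for a Cu..Cu distance r_ij in Å."""
    return max(4.0, r_ij / 2.0 + 0.2)


def to_unit_ball(position, midpoint):
    """Shift a Cartesian position (Å) to the Cu..Cu midpoint and scale by r_0."""
    return tuple((x - p) / R0 for x, p in zip(position, midpoint))


def environment_function(point, atoms):
    """Evaluate I at `point`, given in normalized coordinates.

    `atoms` holds (normalized position, oxidation number, ionic radius in Å).
    Assumptions of this sketch: radii are scaled by r_0, and the first
    matching atom wins where ionic balls overlap.
    """
    for center, oxidation, radius_ang in atoms:
        if math.dist(point, center) <= radius_ang / R0:
            return float(oxidation)
    return 0.0


if __name__ == "__main__":
    # Toy straight Cu-O-Cu bridge along x (illustrative geometry and radii).
    cu_i, cu_j, oxygen = (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (1.9, 0.0, 0.0)
    midpoint = tuple((a + b) / 2.0 for a, b in zip(cu_i, cu_j))
    atoms = [
        (to_unit_ball(oxygen, midpoint), -2, 1.4),
        (to_unit_ball(cu_i, midpoint), +2, 0.7),
        (to_unit_ball(cu_j, midpoint), +2, 0.7),
    ]
    print("R_th =", threshold_radius(3.8), "Å")
    print("I at midpoint:", environment_function((0.0, 0.0, 0.0), atoms))
    print("I at Cu site :", environment_function(to_unit_ball(cu_i, midpoint), atoms))
```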
We describe the selected crystal environment $I$ in the form of a finite-dimensional vector. Such a representation provides a robust way to perform numerical operations on crystal environments, e.g., similarity evaluation and sorting. To obtain the finite vector representation of the crystal environment, we decompose $I(x, y, z)$ in the truncated basis of three-dimensional (3D) Zernike functions (3DZF) $Z^m_{nl}$, which are defined as follows [11,34,35]:
$$Z^m_{nl}(r, \theta, \phi) = R_{nl}(r)\, Y_{lm}(\theta, \phi),$$
where $R_{nl}(r)$ is the radial polynomial; the indices $n$ and $l$ are non-negative integers that satisfy the condition $n \ge l$ with the constraint that $(n - l)$ is an even number; $m$ runs from $-l$ to $l$; $k = (n - l)/2$ enters the radial polynomial; $Y_{lm}(\theta, \phi)$ are spherical harmonics; and $(r, \theta, \phi)$ are spherical coordinates [36]. For convenience, we use the Cartesian $(x, y, z)$ representation of the 3DZF, implying the change of coordinates $Z^m_{nl}(x, y, z) = Z^m_{nl}\left(\sqrt{x^2 + y^2 + z^2},\ \arctan\left(\sqrt{x^2 + y^2}/z\right),\ \arctan(y/x)\right)$. The 3DZF form a complete basis of orthogonal functions in the unit ball, so that the function $I(x, y, z)$ defined in the unit ball $x^2 + y^2 + z^2 \le 1$ can be expanded in this basis [37].
The decomposition coefficients read
$$c^m_{nl} = \frac{1}{V} \int_{x^2 + y^2 + z^2 \le 1} I(x, y, z)\, \overline{Z^m_{nl}(x, y, z)}\, dx\, dy\, dz,$$
where the normalization factor is the volume of the unit ball, $V = 4\pi/3$ [28]. Note that $c^m_{nl}$ is not invariant with respect to rotations of the crystal environment $I$. Rotationally invariant characteristics can be obtained by assembling the vector $\vec{C}_{nl}$ whose components are all $(2l + 1)$ coefficients with different $m$ for a given pair of $n$ and $l$. The norm of this vector, $\|\vec{C}_{nl}\| = C_{nl}$, determined as
$$C_{nl} = \sqrt{\sum_{m=-l}^{l} \left|c^m_{nl}\right|^2},$$
is invariant with respect to rotations of the crystal environment; thus, pre-alignment is not required. We introduce the finite-dimensional vector descriptor of the crystal environment $I$ as
$$\vec{D} = \left(r_{ij},\ \{C_{nl}\}\right), \qquad n \le n_{\max},$$
where the copper-copper distance $r_{ij}$ is incorporated into the descriptor as well.
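The role of the norms $C_{nl}$ can be illustrated for the simplest case of a rotation about the $z$ axis, under which each coefficient $c^m_{nl}$ only acquires a phase factor $e^{\mp i m \alpha}$ (the sign depends on the rotation convention). The sketch below uses made-up coefficients and hypothetical helper names; it demonstrates the invariance for $z$ rotations only, whereas a general rotation mixes the $m$ components through Wigner $D$ matrices while still preserving the norm.

```python
import cmath
import math


def invariant_norms(coeffs):
    """Map {(n, l, m): complex c} to {(n, l): C_nl = sqrt(sum_m |c|^2)}."""
    sums = {}
    for (n, l, m), c in coeffs.items():
        sums[(n, l)] = sums.get((n, l), 0.0) + abs(c) ** 2
    return {key: math.sqrt(value) for key, value in sums.items()}


def rotate_about_z(coeffs, alpha):
    """Apply the phase factor exp(-i m alpha) of a rotation about z."""
    return {(n, l, m): c * cmath.exp(-1j * m * alpha)
            for (n, l, m), c in coeffs.items()}


if __name__ == "__main__":
    # Made-up expansion coefficients, for illustration only.
    coeffs = {
        (1, 1, -1): 0.3 + 0.1j,
        (1, 1, 0): -0.2 + 0.0j,
        (1, 1, 1): -0.3 + 0.1j,
        (2, 0, 0): 0.5 + 0.0j,
    }
    before = invariant_norms(coeffs)
    after = invariant_norms(rotate_about_z(coeffs, alpha=0.7))
    for key in sorted(before):
        print(key, round(before[key], 6), round(after[key], 6))
```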
The size of the descriptor $\vec{D}$ is determined by the cut-off order $n_{\max}$ of the 3D Zernike moments and the corresponding $l_{\max}$ in the truncated basis. The size of the 3DZF basis grows with $n_{\max}$ as the sum of the series $\sum_{n=0}^{n_{\max}} (n^2 + 3n + 2)/2$. In the present work, we chose the cut-off order $n_{\max} = 25$. The vector $\vec{D}$ encodes the information about the spatial configuration and chemical composition of the crystal environment, allowing us to introduce a mapping of $\vec{D}$ onto the transfer integral.
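As a quick consistency check of these counts, the sketch below evaluates the series for the basis size and enumerates the admissible $(n, l)$ pairs. It assumes that every pair with $n \le n_{\max}$, $l \le n$, and even $n - l$ contributes one invariant $C_{nl}$ and that $r_{ij}$ adds one further component; the exact composition of $\vec{D}$ used in the paper may differ.

```python
def basis_size(n_max: int) -> int:
    """Number of 3D Zernike functions up to order n_max (series from the text)."""
    return sum((n * n + 3 * n + 2) // 2 for n in range(n_max + 1))


def invariant_count(n_max: int) -> int:
    """Number of (n, l) pairs with l <= n <= n_max and (n - l) even."""
    return sum(1
               for n in range(n_max + 1)
               for l in range(n + 1)
               if (n - l) % 2 == 0)


if __name__ == "__main__":
    n_max = 25
    print(basis_size(n_max))           # 3276 basis functions
    print(invariant_count(n_max))      # 182 rotational invariants
    print(invariant_count(n_max) + 1)  # 183 components if r_ij is appended (assumption)
```

The compression from thousands of complex coefficients to fewer than two hundred real invariants is what keeps the descriptor compact.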

Transfer Integral Prediction
Our high-throughput DFT calculations yielded $N = 1800$ local crystal environments $\{\vec{D}\}$ with the corresponding transfer integrals $\{t_{ij}\}$. We build an ML model to predict the continuous-valued attribute $t_{ij}$ associated with the local crystal environment descriptor $\vec{D}$. To solve this regression problem, we tested the following models: (i) linear regression (LIR), (ii) random forest regression (RFR) [38], and (iii) an ANN. To achieve robust generalization and stability of the models, we employ the bagging (bootstrap aggregating) ensemble technique [39]. The main idea behind bagging is to train multiple instances of the same model on different subsets of the training data and then combine their predictions to make the final estimation. For each model in the ensemble, a random sample is drawn with replacement from the original training dataset. Thus, some data points may appear multiple times in the sample, while others may be left out. When making predictions, the individual predictions of the models are combined by voting (for classification tasks) or averaging (for regression tasks). In this work, we employ ensemble models with 100 estimators.
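The bagging procedure can be sketched in a few lines of plain Python. The base learner here is a one-dimensional least-squares line, chosen only to keep the example self-contained; it is a stand-in and not one of the LIR, RFR, or ANN models of this work, and all function names and the toy data are hypothetical.

```python
import random
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares line y = a*x + b; returns a prediction function."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # A bootstrap sample may contain a single distinct x; fall back to a constant.
    slope = 0.0 if sxx == 0.0 else sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def bagging_fit(xs, ys, n_estimators=100, seed=0):
    """Train n_estimators base models, each on a bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        # Draw n indices with replacement: some points repeat, others are left out.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[k] for k in idx], [ys[k] for k in idx]))
    return models


def bagging_predict(models, x):
    """Regression ensemble: average the individual predictions."""
    return fmean(model(x) for model in models)


if __name__ == "__main__":
    noise = random.Random(42)
    xs = [0.5 * k for k in range(40)]
    ys = [2.0 * x + 1.0 + noise.gauss(0.0, 0.3) for x in xs]  # toy data
    ensemble = bagging_fit(xs, ys, n_estimators=100, seed=0)
    print(bagging_predict(ensemble, 3.0))
```

Averaging over bootstrap replicas mainly reduces the variance of the individual estimators, which is the reason for using it with flexible models such as the ANN.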
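As metrics for the performance of the ensemble regression model with predictions $\tau$, we use (i) the coefficient of determination
$$R^2 = 1 - \frac{S_{\mathrm{res}}}{S_{\mathrm{tot}}},$$
where $S_{\mathrm{res}} = \sum_{p=1}^{M} (t^p_{ij} - \tau^p_{ij})^2$ is the sum of squared residuals of the regression model and $S_{\mathrm{tot}} = \sum_{p=1}^{M} (t^p_{ij} - \bar{t}_{ij})^2$ is the total sum of squares, with $\bar{t}_{ij}$ being the mean value of the transfer integral in the test dataset with $M$ samples; (ii) the root mean squared error (RMSE)
$$\mathrm{RMSE} = \sqrt{\frac{1}{M} \sum_{p=1}^{M} (t^p_{ij} - \tau^p_{ij})^2};$$
and (iii) the mean absolute error (MAE)
$$\mathrm{MAE} = \frac{1}{M} \sum_{p=1}^{M} \left|t^p_{ij} - \tau^p_{ij}\right|.$$
For the RMSE, the squared errors of the model are included in the average, making this measure more sensitive to outliers: the larger the variance of the prediction errors, the larger the RMSE. The MAE provides a mean of linear scores with all errors weighted equally.
For model selection, we employ two CV strategies: shuffle-split and k-fold. In the shuffle-split CV, the dataset is randomly shuffled and then split into training and test subsets containing a specified percentage of the original data. The procedure is repeated for a specified number of iterations; in each iteration, the model is trained and evaluated accordingly. The shuffle-split procedure was implemented using the scikit-learn library [40] with six splits, a test size of 20 %, and a training size of 80 % of the entire dataset. The results of the CV are presented in Table 1. The model selection procedure shows that the ensemble ANN has the best performance among the selected models. In particular, the ensemble ANN has the lowest average errors, MAE = 18 meV and RMSE = 29 meV, with standard deviations of 1 and 3 meV, respectively.
In the k-fold CV, the entire dataset is split into $k$ approximately equal parts (folds). Each ML model is trained on $k - 1$ folds and evaluated on the remaining fold. The k-fold procedure was implemented using the scikit-learn library [40] with $k = 6$. The results of the k-fold CV are presented in Table 2. Similarly to the shuffle-split CV, the k-fold CV shows that the ensemble ANN has the best scores, MAE = 18 meV and RMSE = 28 meV, with standard deviations of 1 and 3 meV, respectively.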
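For reference, the three metrics and a k-fold split can be written out in plain Python as follows. The paper relies on scikit-learn for cross-validation; this sketch re-implements the textbook definitions only to make them explicit, and the function names and toy numbers are hypothetical.

```python
import math
import random


def r2_score(t, tau):
    """Coefficient of determination R^2 = 1 - S_res / S_tot."""
    mean_t = sum(t) / len(t)
    s_res = sum((a - b) ** 2 for a, b in zip(t, tau))
    s_tot = sum((a - mean_t) ** 2 for a in t)
    return 1.0 - s_res / s_tot


def rmse(t, tau):
    """Root mean squared error; squaring makes it sensitive to outliers."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t, tau)) / len(t))


def mae(t, tau):
    """Mean absolute error; all errors are weighted equally."""
    return sum(abs(a - b) for a, b in zip(t, tau)) / len(t)


def k_fold_indices(n_samples, k=6, seed=0):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[fold::k] for fold in range(k)]


if __name__ == "__main__":
    calculated = [0.1, 0.2, 0.3, 0.4]  # toy "DFT" values in eV
    predicted = [0.2, 0.2, 0.3, 0.3]   # toy model predictions in eV
    print(r2_score(calculated, predicted))
    print(rmse(calculated, predicted))
    print(mae(calculated, predicted))
    folds = k_fold_indices(1800, k=6)
    print([len(fold) for fold in folds])
```

In each k-fold iteration, one fold serves as the test set and the union of the remaining folds as the training set.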

We also evaluated the ANN model on a random train-test split with 20 % of the data allocated to the test subset. The predictions of the transfer integrals for the test set are shown in Fig. 3 as a scatter plot of the calculated values versus the predicted ones.

Discussion
In this section, we apply our ensemble ANN model to different classes of cuprates and discuss its predictive power. The first example is the family of parent compounds of high-temperature superconductors (HTSC). The common structural feature of these antiferromagnets is the cuprate planes formed by corner-sharing CuO$_4$ plaquettes. The Cu-O-Cu angle amounts to 180°, maximizing the electron transfer between nearest neighbors ($t_1$) and giving rise to a sizable antiferromagnetic exchange of about 1500 K [41]. In addition, the favorable mutual orientation of the plaquettes boosts the coupling between second neighbors ($t_2$), as confirmed experimentally [42]. Hence, the magnetic properties of undoped HTSC cuprates are described by the frustrated square-lattice model with competing first- and second-neighbor antiferromagnetic exchanges. Interestingly, the ramifications of this competition go far beyond magnetism: the $t_2/t_1$ ratio shows correlations with the superconducting transition temperature [43]. Thus, an accurate estimation of this ratio is crucial for understanding the physics of HTSC materials.
To test the accuracy of our ensemble ANN model, we consider the following parent HTSC compounds: La… For all compounds, we recover the frustrated square-lattice model with a dominant $t_1$ and a considerably smaller $t_2$. For ease of comparison with Ref. [43], we plot the resulting $t_2/t_1$ ratios as a function of the distance between Cu and the apical oxygen atom ($d_a$) in Fig. 4(a). In the same plot, we show the results of the direct evaluation of these transfer integrals by DFT calculations and Wannierization. Very good agreement is found for all seven cases.
The closely related family of double-perovskite cuprates A$_2$CuTO$_6$ (A = Ba or Sr, T = Te or W) represents a more challenging test case. Here, the $t_2/t_1$ ratio crucially depends on the nature of the T atom: in the two Te-containing compounds, the leading coupling follows the shortest connections ($t_1$), while in the other two compounds the empty $5d$ shell of W boosts the diagonal coupling ($t_2$) [44]. The sensitivity of our ANN model does not suffice to fully account for this trend: it yields a dominant $t_2$ for all four compounds. Despite this shortcoming, the model correctly reproduces the $t_1$-$t_2$ model, and the predicted $t_2/t_1$ ratio is lower for the Te-containing compounds (1.55 for Sr$_2$CuTeO$_6$ and 1.3 for Ba$_2$CuTeO$_6$) than for the W-containing compounds (1.6 for Sr$_2$CuWO$_6$ and 1.95 for Ba$_2$CuWO$_6$).
Next, we turn to quasi-one-dimensional cuprates. The dominance of $t_1$ is correctly reproduced for the quasi-one-dimensional Sr$_2$CuO$_3$ [45], another compound with corner-sharing connections between CuO$_4$ squares. Importantly, this structure features one shorter Cu-Cu connection that is not accompanied by sizable electron transfer, and our model correctly captures this aspect: the respective predicted transfer integral is about 20 times smaller than the leading intra-chain term. For another quasi-one-dimensional compound, linarite PbCuSO$_4$(OH)$_2$ featuring edge-sharing chains, our model correctly recognizes the relevance of the first- and second-neighbor transfer integrals along the chains and correctly identifies the leading interchain coupling [46]. Importantly, in linarite, as in many other edge-sharing cuprates, the nearest-neighbor exchange is ferromagnetic. Such exchanges have a more complex nature and cannot be described within the effective one-orbital model, which is at the core of our approach. However, the presence of an edge-sharing connection essentially implies the relevance of the respective magnetic exchange, making a dedicated estimation of the electron transfer unnecessary.
As a less trivial case, we consider two isostructural natural minerals in which Cu$^{2+}$ atoms form a kagome lattice: kapellasite α-Cu$_3$Zn(OH)$_6$Cl$_2$ and haydeeite α-Cu$_3$Mg(OH)$_6$Cl$_2$. The relevance of the cross-hexagon coupling $t_d$ and the corresponding magnetic exchange was suggested based on DFT results [47,48] and confirmed experimentally [49,50]. (As a side note, the $J_d$ exchange is the principal source of frustration in these systems, because the nearest-neighbor exchange $J_1$ is ferromagnetic.) Here, we consider crystal structures of kapellasite and haydeeite that were determined by neutron diffraction; these structures are not in the ICSD and hence were not used for training. The ANN model yields nearly identical results for both materials, suggesting the leading $t_1 \simeq 78$ meV, plus sizable $t_2$ and $t_d$ of about 25 meV each. The $t_1$ and $t_d$ values are comparable with the first-principles calculations [47]. Given that the latter used a different functional and a different structural input, the agreement is very good. Yet, the relevance of $t_2$ in the ensemble ANN model is a spurious result that is at odds with the DFT calculations and experiments.
In all previous examples except Sr$_2$CuO$_3$, the electronic structure featured several relevant transfer integrals. There are many cuprates whose magnetism is shaped by a single coupling dominating over the other terms, but for which it is unclear a priori which coupling is dominant. An instructive example is the spin-dimer compound Cu$_2$TeO$_5$, where the magnetic dimers do not coincide with the structural ones [51,52]. For this material, our ANN model successfully reproduces the magnitude of the strongest coupling and its dominance over the other terms. Another relevant example is the spin-chain compound CuSe$_2$O$_5$ [26], where electron transfer is facilitated by the [Se$_2$O$_5$]$^{2-}$ anionic group connecting two CuO$_4$ squares that are at an angle to each other. Here, too, the ANN model correctly identifies the leading transfer integral.
Naturally, the predictive power of the model is limited, and in some cases the desired accuracy is not reached. For instance, in Bi$_2$CuO$_4$ the leading coupling operates between the structural chains formed by stacks of CuO$_4$ squares, while the nearest-neighbor coupling within these stacks is three times weaker [53]. Our ANN model correctly reproduces the leading coupling, yet predicts that the nearest-neighbor coupling has a similar strength. While the structure of the ANN does not allow us to unequivocally determine the root cause of this discrepancy, we believe that it stems from the correlation between the magnitude of the transfer integral and the Cu..Cu separation. While shorter distances on average indeed correspond to larger $|t|$, in some materials such as Bi$_2$CuO$_4$ this is not the case.
There are several ways to improve the accuracy of the model. An apparent solution is to extend the dataset by including structures that are not represented in the ICSD. Revisiting the materials that were filtered out due to failed Wannierization can also enlarge the dataset. However, such amendments will lead only to an incremental, moderate improvement of the predictive power. Based on our analysis, we conclude that the main factor limiting the accuracy of the model is the crystalline environment descriptors. Making them more specific to chemical environments, e.g., by taking into account the connectivity of atoms within a chosen sphere or by a more explicit consideration of charge densities, while keeping them as compact as possible, may significantly improve the performance of the model.
To finalize the discussion, we emphasize that the main strength of our model is its ability to identify magnetically relevant couplings. This is particularly useful for involved structures with a large number of short- and middle-range Cu..Cu separations, where the leading electron transfer paths can be highly nontrivial. The performance remains good across different classes of cuprates, which allows for efficient screening: the evaluation of transfer integrals for a single material takes between dozens of seconds and a few minutes. Naturally, our model cannot serve as a complete replacement for full-blown first-principles calculations, because the error bars for the individual terms may be too high for certain quantitative analyses. However, the model's predictive power is sufficient for a qualitative assessment of interactions in spin models. Furthermore, the developed model holds substantial promise for enabling the inverse construction of hypothetical materials with prescribed magnetic topologies. For instance, one can create a Cu-O+X network and manipulate its structure to achieve a particular arrangement of magnetic couplings, as indicated by the transfer integrals generated by the model. Alternatively, one can begin with a known material and ask which alterations are needed to activate or deactivate, as well as strengthen or weaken, specific magnetic connections. This, in turn, opens a plethora of questions on how these enhanced properties may be realized in an actual chemical solid-state structure. This method may thereby offer a novel avenue to engineer materials with distinct magnetic properties and unlock applications, for instance in the fields of magnetic cooling or data storage.

Conclusions
We constructed an ensemble deep learning model that estimates the magnitude of transfer integrals in undoped cuprates. These terms underlie the leading mechanism of the magnetic exchange, and their knowledge is crucial for correctly determining the microscopic magnetic model. We employed a mapping onto a three-dimensional Zernike descriptor to describe the crystalline environments that correspond to individual transfer integrals. The resulting ANN model, trained on the results of our high-throughput DFT calculations, can predict transfer integrals with a reasonable error of MAE = 18 meV. The model efficiently differentiates between weak and sizable transfer integrals, which is most important for estimating the relevant spin model. We discuss the limitations of this approach and outline ways of improving the numerical accuracy.

Fig. 1 (Color online) Transfer integrals obtained from the DFT calculations as a function of the Cu..Cu distance. Region I harbors transfer integrals between edge-sharing (a) and corner-sharing (b) CuO$_4$ plaquettes, while region II is dominated by transfer integrals between CuO$_4$ plaquettes that do not share oxygen atoms (c). The inset shows the distribution of the computed transfer integrals.

Fig. 2 (Color online) Schematics of the workflow: selection of the local crystal environment from the cuprate crystal structure, generation of the rotationally invariant descriptor $\vec{D}$ via decomposition of the local crystal environment function in the truncated basis of 3DZF, and prediction of the transfer integral $t_{ij}$ with an ML algorithm trained on the dataset from DFT calculations. The illustrating example, Ba$_2$CuHgO$_4$ (ICSD Identifier 75724), hosts pairs of corner-sharing CuO$_4$ square-like plaquettes.

Fig. 3 (Color online) Performance of the ensemble ANN model on the testing and training datasets for a random split. The ensemble ANN shows $R^2 = 0.7$, RMSE = 28 meV, and MAE = 18 meV on the testing dataset and $R^2 = 0.9$, RMSE = 19 meV, and MAE = 11 meV on the training dataset. The inset shows the distribution of the ensemble ANN model error for the test dataset, where $\mu$ and $\sigma$ are the mean value and the standard deviation of the errors. The solid line corresponds to the normal distribution with parameters $\mu$ and $\sigma$.

Table 1
Results of shuffle-split CV of the selected ensemble models with six splits, a test size of 20 %, and a training size of 80 % of the entire dataset. The average values of $R^2$, RMSE, and MAE over the six splits are given alongside the standard deviations. The ensemble ANN has the lowest average errors, MAE = 18 meV and RMSE = 29 meV, with standard deviations of 1 and 3 meV, respectively.

Table 2
Results of k-fold CV of the selected ensemble models with six folds. The average values of $R^2$, RMSE, and MAE are given alongside the standard deviations.