Measuring transferability issues in machine-learning force fields: The example of Gold-Iron interactions with linearized potentials

Machine-learning force ﬁelds have been increasingly employed in order to extend the possibility of current ﬁrst-principles calculations. However, the transferability of the obtained potential can not always be guaranteed in situations that are outside the original database. To study such limitation, we examined the very diﬃcult case of the interactions in gold-iron nanoparticles. For the machine-learning potential, we employed a linearized formulation that is parameterized using a penalizing regression scheme which allows us to control the complexity of the obtained potential. We showed that while having a more complex potential allows for a better agreement with the training database, it can also lead to overﬁtting issues and a lower accuracy in untrained systems.

Machine-learning force fields have been increasingly employed in order to extend the possibility of current first-principles calculations. However, the transferability of the obtained potential can not always be guaranteed in situations that are outside the original database. To study such limitation, we examined the very difficult case of the interactions in gold-iron nanoparticles. For the machinelearning potential, we employed a linearized formulation that is parameterized using a penalizing regression scheme which allows us to control the complexity of the obtained potential. We showed that while having a more complex potential allows for a better agreement with the training database, it can also lead to overfitting issues and a lower accuracy in untrained systems.

I. INTRODUCTION
Atomistic modeling is often divided in two different types of simulations. On the one hand, quantum methods including Hartree-Fock and DFT approaches are considered the most accurate and are employed for virtually any types of chemical species [1,2]. On the other hand, classical force fields are used to perform large-scale and long-time simulations with less accuracy [3,4]. However, it is still difficult to connect both approaches and until now, one can hardly perform a simulation involving millions of atoms for nanoseconds while retaining the accuracy of quantum methods.
For all of these techniques, the main procedure consists in using a very universal analytical formulation for the force field which is then parameterized to match a database of DFT calculations including total energy, forces and stress tensors. However, it is admitted that MLIP can sometimes show poor transferability towards systems that are not included in the learning database. In the worst scenario, the MLIP is so-well fitted to its learning database that non-physical behaviors may be observed outside of it. In order to fix this issue, the main proposal is to regularly check the accuracy of the potential as the machine-learning molecular dynamics simulations are carried out and to improve the MLIP "on the fly" [38][39][40]. Yet, to the best of our knowledge, such flaw of the approach has never been quantitatively investigated while being acknowledged by both users and developers.
For our case study, we choose interactions in gold-iron nanoparticles. In principle, such system can be found concurently in three different chemical orderings namely alloy, Janus and core-shell. Yet, recent experiments have shown that the synthesized Au-Fe nanoparticles are made of an iron core wrapped in a gold shell and that the shape of the iron core depends strongly on the amount of surrounding gold [41][42][43][44][45]. These nanoparticles have potential biomedical applications as iron is known for its intrinsic ferromagnetism and gold capping can protect the iron core from oxidation. However, rationalizing the results of the synthesis along with predicting the material properties would require numerical simulations which are sparse for gold-iron nanoparticles [45][46][47][48][49]. Indeed, while full quantum calculations can not be employed to study clusters of more than tens of atoms, the empirical modeling of gold-iron nanoparticles is a also very difficult case because these two metals are non miscible at room temperature on a very large domain of the phase diagram. There are therefore no iron-gold alloy crystal structures on which to adjust the parameters of an empirical potential. Moreover, iron is magnetic and crystallizes in a bcc structure while gold crystallizes in an fcc structure, which makes the development of a potential capable of capturing all the properties of this alloy even more complex. Previous attempts have shown their limits by stabilizing metastable alloys [50] or by not being able to find the most stable Fe/Au interfaces [46,47], leading us to develop potentials specifically dedicated to a particular problem, and hence highly non-transferable [49].
In this article, we begin by describing the methodology including linearized machine-learning potential and a penalizing regression scheme. In the results section, we first studied the influence of the descriptor functions. Then, we showed that the methodology allows one to quickly obtain MLIPs with different degrees of complexity. Afterwards, the transferability of these different potentials was tested on forces in untrained chemical orderings namely Janus and Core-shell. While the error should decrease monotonically when increasing the MLIP complexity, we observed a surprising non-monotonic behavior thus illustrating that more complexity does not necessarily lead to a better MLIP overall. Such transferability issue was reduced by using a more diverse set of descriptors. Finally, we measured some properties of the bulk and investigate the possibilities and the limitations of the obtained MLIP towards bulk simulations even if it was trained on nanostructures.

II. METHODS
A. The Φ-Lassolars machine-learning interaction potential For our MLIP, we employed the analytical formulation originally put forward by Seko et al. [10,17,18,51,52]. In this method, the total potential energy of a configuration made of N atomic positions is first given by we considered a weighted linear combination of descriptors indexed by n: where ω n is the linear coefficient associated with the descriptor X (i) n . Until now, moment tensors [16], grouptheoretical high-order rotational invariants [52] and bispectrum components [11,12,53] were previously proposed as descriptors for such linearized potentials. In this work, we favored a simpler formulation which consists in developing the descriptor space in explicit twobody, three-body and N-body interactions: [3B] [NB] where R ij is the distance between atoms i and j, θ ijk is the angle centered around the atom i, and l and m are two positive integers. For the cut-off function, we chose what was originally proposed by Behler and Parrinello [8]: f c = 1 2 (1 + cos(π(R ij /R cut ))) with R cut being set at 6Å. The switch function denoted f s is employed in order to prevent from non-physical behavior of the N-body contribution at short distances. To do so, two distances r 1 and r 2 are first defined as respectively 95% and 105% of the minimum of the dimer interactions and then a function is constructed to smoothly go to from 0 to 1 in the range of [r 1 : r 2 ]: f s (u) = 6u 5 − 15u 4 + 11 3 where u = (R ij − r 1 )/(r 2 − r 1 ) [54]. Altogether, these expressions allow for a direct computation of the forces as well as the stress tensors by differentiating with respect to the positions. [10] Regarding the basis of functions f n , there are no physical restrictions. In particular, for the twobody interactions, one can tune f n to mimic traditional interatomic potentials as for example Morse, Lennard-Jones, Buckingham or Yukawa potentials or use simple functions like Gaussians, Lorentzian or Asymmetric lognormal functions. Likewise, for the three-body interactions, the current formulation is very similar to what is done in the Stillinger-Webber potential [55]. Finally, the N-body interaction is a generalized form of the EAM potential where the embedding function is a polynomial of the atomic density [56]. In this case, the integer m corresponds directly to the degree of N-body order. The difference between our formulation and the most recent ones proposed by Seko et al. is that, in the N-body interactions, we did not include any explicit angular dependence and did not mix different forms of the atomic density.
For the fitting procedure of such a linear model, previous studies have proposed the use of genetic algorithm [12,53], weighted ordinary least squares [11], Bayesian linear regression [39], ridge regression [18,51,52] and Least absolute shrinkage and selection operator (Lasso) [10,17]. In order to construct a simpler MLIP, we employed the Lasso regression with the Least Angle Regression Scheme (Lars, together denoted LassoLars) [57]. In practice, along with the ordinary least square objective function, χ 2 OLS , the Lasso scheme adds a penalty on the sum over the absolute value of the coefficients ω n and the employed error function is therefore given by: where α is a parameter that controls the degree of penalty. The penalty on the absolute value of the coefficients enforces lots of the linear coefficients to be exactly 0. Additional, using Lars allows us to select the most relevant descriptors by measuring their correlation to the target. Using LassoLars as a regression scheme is at the expense of accuracy and flexibility for the MLIP but it allows for a considerable reduction in the complexity of the potential [Please see the Appendix for further motivation regarding the choice of LassoLars and for additional details on the Φ-LassoLars implementation].

B. DFT database
Building a general purpose potential for Au-Fe nanocrystals would require to model the atomic interactions not only in different structures (crystal polymorph, interfaces and liquid for instance) but also in different chemical orderings including alloyed, Janus and core-shell nanoparticles. Because the focus of this work is to measure the issues related to transferability, we purposely employed an incomplete database made of only three types of nanoparticles: (1) Alloys with almost equimolar compositions, (2) Pure iron in the body-centered cubic (bcc) phase, (3) Pure gold in the face centered cubic (fcc) phase[see Fig. 1]. Then, molecular dynamics (MD) simulations by means of a house code were carried out to melt the constructed nanoparticles. Simulations were performed in the NVT ensemble obtained with Andersen thermostat at 1400 K during 500 ps using a timestep of 1 fs. We used simple pair-wise potentials made of Lennard-Jones and Morse interactions for gold and iron respectively and of Lennard-Jones interaction for the gold-iron cross-interaction. The employed Morse parameterization is the one of Hung et al. [58] while the two Lennard-Jones potentials for gold-gold and iron-gold were simply parameterized in order to match the bulk lattice parameters and the cohesive energy. Along the melting path, we extracted configurations that are representative of the solid to liquid transition. In addition, each initial structure was also manually compressed. The distances are reduced by a factor of 75% and along the compression, we extracted ten configurations in order to sample structures of higher density and better reproduce the repulsion at short distances. For the same reason, diatomic molecules FeFe, AuFe and AuAu with distances down to 1Åwere also added in the database. For each of these configurations, forces were finally computed at the DFTlevel using single-point calculations. Spin-polarized DFT calculations were performed with the VASP code [59], using PAW type pseudopotentials for iron and gold [60], a plane wave cutoff of 650 eV and a Methfessel-Paxton smearing parameter σ of 0.01 eV. All calculations were done at the Γ-point of the Brillouin zone. Altogether, the database is made of 181653 atomic configurations with an almost equal proportion in the three types of nanoparticles (34% of alloy, 34% of pure gold and 32% of pure iron). We note that the database sampling was made with classical interaction potentials which is less satisfying than performing ab initio MD. Yet, the advantage is that it allows us to quickly sample configurations that remain physically valid and it prevents from running very computationally demanding simulations for those relatively large nanoparticles (more than 50 atoms and more than 100 for respectively the alloy and the pure nanoparticles). In addition, by employing configurations that span from the crystalline to the liquid regime, we assure a large variety of atomic neighborhoods with forces ranging from 10 −4 to 5 eV/Å. Finally, during the manual compression of the nanoparticle and when computing the forces for diatomic molcules, it may happen that some atoms are very close thus leading to very large forces. In the χ 2 , those large forces would contribute a lot while being very unlikely to emerge in a realistic dynamics. Therefore, before performing the fitting procedure, the database is filtered out to remove cases where the forces are lower than 5 eV/Å. Such value was chosen large enough to keep some part of the repulsive interaction but not too large to avoid a fitting that is focused only on the repulsive part.
Within the fitting procedure, the database is randomly divided in a training (95%) and a test (5%) sets.

III. RESULTS
A. Influence of the descriptor space First, five different types of descriptors were tested. In particular, we used three functions that are "peak" functions ie. Gaussian, Lorentzian and Log-normal peaks and two functions that are usually employed for orbital calculations and that diverge at short distance ie. Slattertype (STO) and Gaussian-type (GTO) orbital [Please see Table I]. Then, the LassoLars method is employed to obtain five different interaction potentials using α = 10 −7 . For the three-body and the N-body interactions, we used respectively l = [1, 2, 3, 4, 5] and m = [4,5,6,7]. Altogether, we have around a thousand available descriptors which include Au-Au, Fe-Fe and Fe-Au interactions for each type of descriptors. In Fig. 2.a, we show the fitting error measured as the root mean square error (RMSE) for the five different types of descriptors. RMSE measured on training and test data sets are similar which means that at this stage, no overfitting is observed. In addition, it appears that the two orbital functions are not as good as the peak functions although the STO still give an RMSE equal to 0.17 eV/Å. Gaussian and Lorentzian descriptors are able to reproduce the forces with an RMSE respectively equal to 0.13 and 0.14 eV/Å. Such values are similar to what is obtained with the generally employed MLIP methods including neural networks, Gaussian approximation method and linearized potentials [61]. In Fig. 2.b, the number of non-zero coefficient is plotted for the five different types of descriptors. In the cases of GTO and STO functions, much fewer descriptors were selected in comparison to the peak functions. In overall, it remains that the LassoLars algorithm allows one to drastically decrease the number of employed descriptors with respect to the number of available descriptors.
In addition, before studying the transferability issues, we wish to illustrate a second advantage of using the Las-soLars algorithm which consists in having a penalizing parameter that controls both the accuracy and the complexity of the obtained potential. According to Fig. 3, by increasing α, the number of non-zero coefficients and the computational cost can be reduced at the expense of  increasing the RMSE. As such, with the LassoLars algorithm, one can simply choose which degree of accuracy or complexity is required for their usage. Finally, the presence of a plateau for the smallest values of α shows that the LassoLars regression only selects relevant descriptors thus reducing the potential complexity. Similar to what was obtained previously using α = 10 −7 , we note that: (1) the 3 peak functions are the most accurate and behave similarly and (2) the STO function gives slightly higher RMSE yet with much fewer non-zero coefficients. Fig. 3 evidences that the LassoLars algorithm gives the ability to finely control the complexity of the potential at the expense of the accuracy on the DFT database. In the following, we will test how this complexity can influence the MLIP transferability.

B. Complexity vs Transferability
For that purpose, three additional morphologies of gold-iron nanoparticles were designed: two Janus and one core-shell [see Fig. 4.a]. Being able to accurately retrieve the interactions in those structures is a difficult test for the MLIP as the training set did not posses any of those demixed structures. Fig. 4(b) shows the corresponding RMSE on the forces without having trained the potential on these structures and using only Gaussian functions. Fig. 4(c) shows that most of the errors are located at the gold/iron interface which was not included in the training database. Surprisingly, the RMSE behavior is nonmonotonic with a minimum located for the three structures around α = 8 × 10 −6 which corresponds to 90 nonzero coefficients. More specifically, when transferring the obtained potential to Fe100Au100 Janus nanoparticles, the RMSE obtained with α = 10 −7 is 30% higher than with the less complex potential that was obtained with α = 8 × 10 −6 . Therefore, while increasing the complexity of the potential leads to a better agreement within the training database, it does not necessarily lead to an improvement of the RMSE in unlearned structures using in our case different chemical ordering (ie. Janus and Core- shell instead of alloy nanoparticles). Our challenging test demonstrates that precautions should be made when using machine-learning approaches and that increasing the complexity does not automatically lead to a better overall potential.
According to Fig. 5, the non-monotonic behavior that was highlighted when using only Gaussian functions is also observed with the two other peak functions ie. Lorentzian and Log-normal. We also note that, GTO is again very inaccurate and should most probably be avoided for usage as descriptor. Finally, the STO functions while being less accurate on the training database seems to give in overall better results in terms of transferability.
Finally, a combination of three different descriptors (Gaussian, Gaussian-type orbital and Lorentzian peak) is tested in order to employ simultaneously two different types of peak functions along with a function that diverge at short repulsion. STO was chosen for its remarkable ability to decrease the number of non-zero coefficient while Lorentzian peaks also showed slightly less non-monotonocity. On the one hand, regarding its fitting performance, this combination gives an RMSE similar to that obtained previously yet with fewer non-zero coefficients [See Fig. 5]. Having these two additional functions gives more flexibility in the descriptor space and fewer functions can therefore be selected for the same accuracy. On the other hand, for the transferability towards the unlearned structures, the combination gives RMSE that are comparable to the peak functions and even better in two cases (ie. Fe110Au111 and Core-Shell). More importantly, it allows for a further reduction of the nonmonotonicity. We note that being able to combine three different types of descriptors at the same time is an additional advantage of using a constrained linear regression scheme such as LassoLars.

C. Quality of the potential
Even if the aim of the paper was not to obtain an allpurpose MLIP for gold-iron interactions, we still wish to asses the quality of the obtained potential. Regarding the force accuracy on the learned database, Fig. 6(a.b) shows the correlation plot obtained with two different values of α. For the RMSE, we obtained in both cases values of 0.14eV/Å when using the three descriptors at the same time. Such value is on par with most of the currently employed MLIP methods [27] and by comparison, the EAM potential that was recently developed for Au-Fe nanoparticle [62] gives a value of 1.4 eV/Å. Our MLIP is thus already a drastic improvement in force evaluation for the studied nanoparticles [See Fig.6.c].
Furthermore, additional simulations with the MLIP were performed to check some properties in the bulk phase although it was not included in the training set. Simulations were carried out using the large-scale molecular dynamics software LAMMPS [63] in which the proposed MLIP was implemented. Periodic boundary conditions were employed and the different minimization runs were performed down to a net force of 10 −6 eV/Å. We measured eight different lattice constants (pure iron bcc, pure gold fcc, alloys with 25%, 50% and 75% of gold in both bcc and fcc phases). In addition, our fitting did not include any energy. Therefore, to take into account the atomic energy, the MLIP energies are shifted in order to match the cohesive energy of pure iron and gold most stable states ie. bcc and fcc respectively. Then, we also measured the cohesive energy for each alloying structures. Results are compared in Fig. 6.(d,e) to DFT calculations that were previously obtained. [47] The errors are lower than 3% for lattice spacing and 12% for cohesive energy which is already satisfying considering that in the database, we employed alloying proportions that are much closer to 50% and only used nano-crystals. Therefore, being able to reach such small relative errors even for those extreme alloying proportions (25% and 75%) and in addition with bulk structures is an encouraging result for our MLIP.
Finally, we measured phonon dispersion curves using supercell approach implemented in PHONOPY [64] [See Fig.6(e,f)]. In practice, the dynamical matrix was obtained by moving each symmetrically independent atoms by 0.01Å. We used a supercell of size 4×4×4. The agreement between DFT and MLIP curves is not as good as what is usually obtained in MLIP works [17,65] but it remains qualitatively satisfying if one considers that our MLIP was not trained on any bulk structures.
Even if our potential has not been designed to reproduce bulk properties, these results are already very encouraging. Obtaining an accurate MLIP potential that is transferable to any phase and/or structure is a considerable challenge for multi-component systems and would require to carry out additional DFT calculations to build a bigger database including bulk structures but also interfaces. Besides, in order to target a specific application, one should also perform "on-the-fly" optimization of the potential as proposed by Jinnouchi et al. [38,39]. However this is not the main purpose of this paper, which focuses on shedding light onto the relationship between complexity and transferability in machine learning force fields. It remains that our current MLIP potential may be used as a first step when studying bimetallic Fe-Au nanoparticles.

IV. DISCUSSION AND CONCLUSION
To summarize, this work aims at measuring transferability issues that can occur when using MLIP in untrained structures. To begin, we presented the employed MLIP method that includes a linearized potential and a penalizing regression scheme. Then, we discussed the influence of the descriptor space and showed that although the three different peak functions behave similarly in terms of accuracy and number of non-zero coefficients, the repulsive functions lead to a worst accuracy but with a lower number of non-zero coefficients. Next, we demonstrated that by using the LassoLars algorithm instead of the previously employed linear regression scheme, one can finely tune the complexity of the potential along with its accuracy. This is because the penalizing parameter α allows for turning off most of the initially proposed descriptors thus controlling the overall complexity of the potential. With this ability in hand, we measured transferability issues using three unlearned structures that are qualitatively different from what was considered in the training database. We showed that while the accuracy on the trained structures decrease monotonically as the value of α is decreased, it is not the case for those untrained structures. Indeed, when using only one type of descriptor, it exists an optimal value of α that allows for the best transferability. Finally, we introduced a way to overcome this transferability issue which consists in using different types of descriptors simultaneously. This again shows an other advantage of using the LassoLars algorithm which is able to actively select the most appropriate descriptors. Finally, we computed some properties of the Fe-Au in bulk and showed that the obtained potential is already qualitatively satisfying. But, before being able to really our potential from practical applications, we plan to improve further it by adding bulk and interface DFT calculations within the database and by implementing "on the fly" learning. As a perspective, the obtained MLIP will be further improved and then employed to study nucleation in core-shell FeAu nanoparticles as observed in experiments. [41][42][43][44][45] Before closing, we wish to discuss two additional points.
First, the very intuitive MLIP expression that was employed here allows us to give some insights on the nature of the interactions. Here, we focus on the combined MLIP where Gaussian, STO and Lorentzian peak were used simultaneously. We can indeed distinguish between each descriptors (Gaussian, STO, Lorentzian). For that purpose, we compute the ratio between the absolute value of the forces given by each descriptor and the sum of the absolute values of the three descriptors and then perform an average over each force components (x,y,z) of all of the atoms within the training database (alloy, pure iron and pure gold nanoparticles). Fig. 7.a shows that the preponderant functions are different depending on the considered interactions thus highlting the advantage of using simultaneously the three types of descriptors. More interestingly, the same can be done in order to distinguish between two-body, three-body and N-body contributions [see Fig. 7.b]. In our case, the two-body and three-body contributions are more important than the N-body contributions. This may explain why the previously employed EAM potentials that do not possess any explicit angular contributions could not accurately compute the forces.
Moreover, we would like to raise an additional implication of our work in which the transferability issues were measured and connected to the complexity of the potential. Indeed, as previously discussed, when using the LassoLars regression scheme, the complexity of the potential can be adjusted using the penalizing parameter. In the alternative MLIP methods, the same is done by modifying (1) the number of neurons and hidden layers in the case of neural-network potential [5] and (2) the number of selected configurations after sparsification in the case of gaussian approximation model [9,32]. For some users of these techniques, the rule of thumb may be to use these adjusting parameters in order to increase the complexity of the potential which necessarily improves the accuracy on the learning database. Yet, our work indicates that transferability issues should be expected by such operation. In this section, we wish to further motivate the choice of our regression scheme by working on a case study where we will compare LassoLars with two commonly employed linear regression schemes naming Ridge and the coordinate descent Lasso. For this test, the database is generated with an equimolar mixture of binary Lennard-Jones particles thus allowing to directly verify the obtained MLIP. With such a binary system, an additional challenge for the fitting algorithm is to distin-guish self-species and cross-species interactions. In practice, positions and forces were measured for 50 configurations of 64 atoms in the liquid regime. Regarding the basis of descriptors, only two-body interactions were considered and we used 17 Lennard-Jones functions with different distance parameters including those in the original simulations. All of the four employed methods manage to retrieve a linear combination of Lennard-Jones functions that matches the original interactions. However, it appears that only LassoLars can find the correct coefficients ω n setting all coefficients to 0 except those of the original interactions [see Fig. 8]. Such result shows the advantage of using LassoLars instead of the commonly employed linear regression methods.

B. Numerical implementation of the Φ-LassoLars method
In this section, we give some additional details on the described MLIP. First, obtaining an MLIP using the Φ-LassoLars method consists in two steps. In the first step, a homemade C++ parallelized code using OpenMP was developed to construct a matrix: • Each columns of the matrix designates a specific descriptor which can be 2B, 3B or N B for all the functions and considered values for their parameters described in Table 1.
• Each rows of the matrix correspond to the force on a given direction, atom and structure.
In the second step, a python code concatenates the obtained matrices for each structures and read the associ-ated DFT forces. The same python code finally employs the LassoLars method as implemented in the sklearn package to obtain the linear coefficients associated to each columns of the matrix. Then, in order to use the obtained MLIP, the same C++ code is employed to read the obtained coefficients and generate input files for LAMMPS simulations. Those consists on three parts: 1. For the 2B interactions, we directly add all of the selected linear contributions and generate a table file that can be read by LAMMPS using pair style table.
2. For the 3B interactions, we build a homemade routine that is added to LAMMPS and use an personalized input file.
3. For the NB interactions, we use a python code based on atsim.potentials [66] to generate EAMlike files that can be directly read by LAMMPS.
Finally, the pair style hybrid/overlay is employed to combine all the contributions.
Regarding the computational timing for building the MLIP, a matrix for a structure of 64 atoms containing all of the functions (2B, 3B, NB for all values in Table 1) for one type of descriptors is obtained in approximately 3 minutes. This process can be parallelized since each structure can be treated independently. Then, after reading all of the input matrices, a LassoLars fitting takes less than 20 seconds when using only one type of functions and less than 2 minutes when using simultaneously three types of functions. Results are obtained on one Intel E5-2650 processor.
ACKNOWLEDGEMENT JL acknowledges financial support of the Fonds de la Recherche Scientifique -FNRS. Computational resources have been provided by the Consortium des Equipements de Calcul Intensif (CECI), by the Fédération Lyonnaise de Modélisation et Sciences Numériques (FLMSN) and by the Regional Computer Center CALMIP in Toulouse (grant n • p1141). JL thanks David Tew for having introduced him to the machine-learning potential problem and C. Patrick Royall for supervising the early method development. JL is also grateful to Francesco Turci, Joshua F. Robinson and Atsuto Seko for fruitful discussions. Authors are finally grateful to Abdul-Rahman Allouche and Albert P. Bartók for proofreading the manuscript.