PyXtal FF: a Python Library for Automated Force Field Generation

We present PyXtal FF, a package based on Python programming language, for developing machine learning potentials (MLPs). The aim of PyXtal FF is to promote the application of atomistic simulations by providing several choices of structural descriptors and machine learning regressions in one platform. Based on the given choice of structural descriptors (including the atom-centered symmetry functions, embedded atom density, SO4 bispectrum, and smooth SO3 power spectrum), PyXtal FF can train the MLPs with either the generalized linear regression or neural networks model, by simultaneously minimizing the errors of energy/forces/stress tensors in comparison with the data from the ab-initio simulation. The trained MLP model from PyXtal FF is interfaced with the Atomic Simulation Environment (ASE) package, which allows different types of light-weight simulations such as geometry optimization, molecular dynamics simulation, and physical properties prediction. Finally, we will illustrate the performance of PyXtal FF by applying it to investigate several material systems, including the bulk SiO2, high entropy alloy NbMoTaW, and elemental Pt for general purposes. Full documentation of PyXtal FF is available at https://pyxtal-ff.readthedocs.io.


Introduction
Molecular dynamics (MD) simulations have been used routinely to model the physical behaviors of many complex systems [1,2,3]. The accuracy of the simulations is highly dependent on the underlying potential energy surface (PES) of the system. In principle, MD simulations can be based on ab initio quantum-mechanical [4] or classical force field methods. The ab initio MD (AIMD) simulations usually employ density functional theory (DFT) approximation [5], which can provide a reliable representation of the system. Despite the accuracy, DFT simulation can be extendable only to a few hundreds of atoms at a few picoseconds. This is due to solving the Kohn-Sham equation requires thousands of quantum-mechanical calculations that are scaled at O(N 3 ) with respect to the number of atoms N. In consequence, simulating the structural evolution of many of complicated systems in DFT remains demanding in spite of the remarkable progress in computational facilities and efficient algorithms. This bottleneck in the DFT method is likely to persist in the foreseeable future. On the other hand, the classical MD method can model large systems at long-time scale, countering the unwavering issue of the DFT method. A great amount of efforts has been dedicated in developing PES using the classical method [6,7,8,9,10]. The reconstruction of the PES is usually based on simple analytical functions related to the scalar properties of the system. The class force fields can be applied to comprehend the qualitative behavior of the system. However, they are often inadequate to describe the quantitative properties of the system.
Recently, machine learning methods have been widely applied to resolve the dilemma in comprising between accuracy and cost [11]. The machine learning potential (MLP) are trained by minimizing the cost function to attune the model to deliberately describe the ab initio data. The cost of atomistic simulation is orders of magnitude lower than the quantum mechanical simulation, allowing the system to be scaled up to 10 5 -10 6 atoms [12,13]. Among many different ML models, two regression techniques are becoming increasingly popular in the materials modelling community. They include the neural networks and Gaussian process regressions. The neural networks approach has an unbiased mathematical form that can adapt to any set of reference points through an iterative fitting process given "enough" training data. The first well accepted neural networks potential (NNP) was originally applied to elemental silicon system by Behler and Parrinello [14], which demonstrated that the NNP was able to reproduce the energetic sequences of many silicon phases, as well as the radial distribution function of a silicon melt at 3000 K from DFT simulation. To gain a better predictive power, they also proposed to use a series of symmetry functions (see section 2.1.1), instead of the Cartesian coordinates, as the descriptors to represent the atomic environment. Since then, many attempts have been undertaken to improve the capability of neural networks approach [15]. The accomplishments of neural networks approach have been extended to multi-component [16,17] and organic [18] systems. In addition, Gaussian Approximation Potential (GAP), in conjunction with the bispectrum coefficients of atomic neighbour density (see section 2.1.3)), was first introduced to model the carbon, silicon, germanium, iron, and gallium nitride [19]. GAP was further enhanced by replacing bispectrum coefficients with smooth overlapping power spectrum coefficients with explicit radial basis [20]. Similar to GAP, Thompson et al. [21] developed (quadratic) Spectral Neighbor Analysis Potential (SNAP) method based on the Taylor expansion of bispectrum coefficients. In addition, linear regression model based on the moment tensor-comparable to atomic environments inertia tensors-as the descriptor [22] was also demonstrated to be a competitive approach. Many applications based on different MLP models have shown that machine learning potentials work remarkably well in different types of atomistic simulations [23,24,25,26,27,28].
In the recent years, several software packages [16,29,30,31,32,33] were developed to train the MLPs. Among these, the RuNNer [16] is a closed source software for developing NNP, and aenet [34] is mainly written in FORTRAN/C and utilizes atom-centered symmetry functions (see section 2.1.1) as the descriptor. Similar codes, such as the n2p2 package [29] in C++, SIMPLE-NN package [30] in Python/C, and AMP package [31] in Python/FORTRAN, have similar feature as aenet package. SIMPLE-NN leverages the capability of Tensorflow platform-a deep learning GPU-accelerated library, and AMP provides several other descriptors such as Zernike and bispectrum components. Our recent works also suggested that NNP can be developed using bispectrum and power spectrum components as the descriptor while training on energy, forces, and stress simultaneously [35,36]. Moreover, DeepPot-SE [37] and SchNetPack [33] packages introduce additional filters to the descriptor such as distance-chemical-species-dependent filter and continuous convolutional filter, respectively, prior to the deep learning model.
In this paper, we present PyXtal FF-an open-source package in Python scripting language-for developing MLP such as NNP and generalized linear potential (GLP). The objective of PyXtal FF is to provide handy user-interface in developing MLP with training of energy, force, and stress contributions simultaneously. PyXtal FF creates MLP based on atom-centered descriptors such as (weighted) atom-centered symmetry functions [11], embedded atom density [38], SO(4) bispectrum coefficients [19], and smooth SO(3) power spectrum [20]. Finally, we will demonstrate the usage of the current features of the package with SiO 2 [30], high entropy alloy [39], and elemental Pt [37] as examples.

Theory
In this section, we will provide in-depth discussions of the two main ingredients in creating MLP: atom-centered descriptor and regression technique. The construction of the total energy of a crystal structure can be written as the collections of atomic energy contributions, in which is a functional (E ) of the atom-centered descriptor (X i ): Specifically, the functional represents regression techniques such as neural networks or generalized linear regressions. Since neural networks and generalized linear regressions have well-defined functional forms, the analytic derivatives can be derived by applying the chain rule to obtain the force at each atomic coordinate, r m : Force is an important property to accurately describe the local atomic environment especially in geometry optimization and MD simulation. Finally, the stress tensor is acquired through the virial stress relation: where ⊗ is the outer product. According to Eqs. 2 and 3, one needs to compute the energy derivative ∂E ∂X and the derivatives of descriptor X with respect to the atomic positions. For a structure with N atoms and L descriptors per atom, the energy derivative is a 2D array of [N, L]. The force related derivative (dxdr) can be best organized as a 4D array with the dimension of [N, N, L, 3]. Note that dxdr[i, j, :, :] is zero when the i-j atomic pair has a distance larger than the cutoff distance. Thus, it may become a sparse array when the structure has a large number of atoms. Correspondingly, one can easily derive the 5D rdxdr array by multiplying r to each dxdr according to the outer product. In Python, one can simply compute the forces and stresses based on the following Einstein summation.  [3 , 3] """ force = -np . einsum ( " ik , ijkl -> jl " , dedx , dxdr ) stress = -np . einsum ( " ik , ijklm -> lm " , dedx , rdxdr ) Listing 1: Force and stress computation in Python.

Atom-centered Descriptors
Descriptor-a representation of a crystal structure-plays a critical role in constructing reliable MLP. If the MLP is directly mapped from the atomic positions or the Cartesian coordinates, it can only describe systems with the same number of atoms due to the fixed length of the regression input. In addition, Cartesian coordinates are poor descriptors in describing the structural environment of the system, restricted by the periodic boundary conditions. While the total energy of the structure remains the same by translation, rotational, or permutation operations, the atomic positions will change. Several types of descriptors have been developed in the past few years [40]. For example, Coulomb matrix has been widely used due to its simplicity. Coulomb matrix encompasses self interaction based on the nuclear charge and Coulomb repulsion between two nuclei [41,42]. Logically, the Coulomb matrix can be upgraded for periodic crystals through Ewald summation that includes long range interaction calculated in reciprocal space. In addition, many-body tensor representation-derives from Coulomb matrix while related to bag of bonds which corresponds to different types of bonding in molecular systems-can be used for both finite and periodic systems when interpretability/visualization is desirable [43]. These descriptors have been widely used to model the molecules.
In the atom-centered descriptors, one usually needs to consider the neighboring environment for the centered atom within a cutoff radius of R c . To ensure the descriptor mapping from the atomic positions smoothly approaching zero at the R c , a cutoff function ( f c ) is included to every mapping scheme: where R is distance. The cutoff function is zero at R c and the intensity decreases as R approaches R c . Consequently, the derivative of the cutoff function is: One should heed of the importance of the vanishing derivative of cutoff function at R c , which is important in describing the force. By definition, there is no discontinuity as the slope decays to zero at R c . In the following, we will introduce four types of atomcentered descriptors in details. The corresponding derivative terms can be found in Appendix A, Appendix B and our recent work [36].

(Weighted) Atom-centered Symmetry Functions (G)
The atom-centered symmetry functions (ACSFs) are the very first types of descriptors used in the MLP development [14]. In general, there are two classes of ACSFs: radial and angular symmetry functions [11]. The radial symmetry function or G (2) describes the radial distribution of the atomic environment, and the angular symmetry functions, G (4) and G (5) , account for the three-body angular distribution of atoms in the neighborhood. The G (2) is expressed as the sum of the radial distances between the center atom i and the neighbor atoms j as follow: Here, G (2) value is controlled by the width (η) and the shift (R s ). G (4) and G (5) symmetry functions are a few of many ways to capture the angular information via three-body interactions (θ i jk ). As the structures are constraint by the periodic boundary condition, a three-body periodic description such as cos(θ i jk ) is used. The explicit form of G (4) and G (5) are: ζ determines the strength of angular information. The degree of ζ is normalized by 2 1−ζ for unvarying the values of G (4) and G (5) symmetry functions due to ranges of ζ. λ values are set to +1 and -1, for inverting the shape of the cosine function. The difference between G (4) and G (5) symmetry functions is in the interactions between the neighbors j and k. The modification in G (5) symmetry function yields in dampening value of G (5) , which can be beneficial in representing larger atomic separation between the two neighbors. Clearly, the number of ACSFs will grow depending on chemical species as the separations of chemical species are needed. For instance, in a binary AB system, the number of G (2) ACSFs on specie A need to double to distinguish A-A and A-B pair interactions. For G (4) , three different triplets A-A-A, A-A-B, B-A-B (where the middle position denotes the center atom) will be needed. To avoid this unpleasant growth, one can apply a weighting parameter based on the chemical species when counting these atomic pairs and triplets. One popular choice is simply to use the atomic number as the weighting parameters. Hence, Gastegger and coauthors proposed the weighted version of ACSF [44], in which each component of the radial and angular symmetry functions in Eqs. (6,7,8) can be multiplied by the followings: the weighted ACSF: where Z j , Z k represents the atomic number of neighboring atom j and k.
To obtain a satisfactory MLP model, one has to choose a set of parameters to construct the (w)ACSF descriptors, which may require some demanding human intervention [44,37,33]. As mentioned, the choice of λ is straightforward. In general, ζ takes the value of 1. Increasing ζ focuses on the strength of the angular information in region close to 0 • and 180 • , and decreasing it will weaken the contribution of angular information at around 90 • . Since the exponential term has larger effect on the symmetry functions, the selection of η and R s can be more elaborate. Several routines are available in the literature [44,38] by fixing η while varying R s or vice versa.

Embedded Atom Density (ρ)
Embedded atom density (EAD) descriptor [38] is inspired by embedded atom method (EAM)-description of atomic bonding by assuming each atom is embedded in the uniform electron cloud of the neighboring atoms [6,45]. In EAD, the electron density is modified by including the square of the linear combination the atomic orbital components: where Z j represents the atomic number of neighbor atom j. L max is the quantized angular momentum, and l x,y,z are the quantized directional-dependent angular momentum. For example, L max = 2 corresponds to the d orbital. Lastly, the explicit form of Φ is: According to quantum mechanics, ρ follows the similar procedure in determining the probability density of the states, i.e. the Born rule. EAD can be regarded as an alternative version of ACSF without classification between the radial and angular term. The angular or three-body term is implicitly incorporated in when L max > 0 [38]. By definition, the computation cost for calculating EAD is cheaper than angular symmetry functions by avoiding the extra sum of the k neighbors. In term of usage, the parameters η and R s are similar to the strategy used in the Gaussian symmetry functions, and the maximum value for L max is 3, i.e. up to f orbital.

SO(4) Bispectrum (B)
The SO(4) bispectrum components [19,20,21] are another type of atom-centered descriptor based on the harmonic analysis of the atomic neighbor density function on the 3-sphere. The atomic neighbor density function is given by [20]: Where w i is a species dependent weight factor and f c is a cutoff function. The cutoff function is f c is introduced to ensure that the atomic neighbor density function goes smoothly to zero at the cutoff. Then we map the atomic neighbor density function from 3-D euclidean space to another 3-D space, the surface of a four dimensional hypersphere: where r 0 is a parameter and the polar angles are defined by: The Winger-D matrix elements (D j m ,m ) are the harmonic functions on the 3-sphere, therefore an arbitrary function defined on the 3-sphere can be expanded in terms of Wigner-D matrix elements. Here we expand the atomic neighbor density function on the 3-sphere in terms of Wigner-D matrices.
Where the expansion coefficients c j m ,m are given by the following inner product Finally, the SO(4) bispectrum components can then be calculated using third order products of the expansion coefficients: where C is a Clebsch-Gordan coefficient.

Smooth SO(3) Power Spectrum (P)
The Smooth SO(3) Power Spectrum components were been proposed to describe the atomic local environment [20]. In contrast to the SO(4) bispectrum components, the Smooth SO(3) power spectrum is based on an alternative atomic neighbor density while also expanded on the 2-sphere and a radial basis. The alternative atomic neighbor density is defined in terms of Gaussians as follows: Then the atomic neighbor density function is then expanded in terms of spherical harmonics and a radial basis g n (r) as shown in Eq. 15: Where the expansion coefficients c nlm are given by where I l is a modified spherical Bessel function of the first kind. A convenient radial basis for this purpose, g n (r), consisting of cubic and higher order polynomials, orthonormalized on the interval (0, R c ) has been suggested by Bartok [20].
where W n,α are the orthonormalization coefficients given by the relation to the overlap matrix S by W = S −1/2 and And the elements of the overlap matrix S are given by and finally, the Smooth SO(3) power spectrum is given by

Regression Models
Here, we discuss the regression model, i.e., the functional form (E ) presented in Eq. 1. Each regression model is speciesdependent, i.e. as the the number of species increases, the regression parameters will increase. For the sake of simplicity, we will explanation the regression models for the single-species system.
In any regression model, the objective is to minimize a loss function which describes the discrepencies between the prediction and true reference values (including energy, force, and stress tensors) for each atomic configuration in the training data set.
where M is the total number of structures in the training pool, and N atom i is the total number of atoms in the i-th structure. The superscript Ref corresponds to the target property. β f and β s are the force and stress coefficients respectively. They scale the importance between energy, force, and stress contribution as the force and stress information can overwhelm the energy information due to their sizes. Additionally, a regularization term can be added to induce penalty on the entire parameters preventing overfitting: where α is a dimensionless number that controls the degree of regularization.
Clearly, one has to choose differentiable functional as well as its derivative due to the existence of force (F) and stress (S ) contribution along with the energy (E) in the loss function. In the following sections, generalized linear regression and neural network regression will be introduced.

Generalized Linear Regression
This regression methodology is a type of polynomial regression. Essentially, the quantum-mechanical energy, forces, and stress can be expanded via Taylor series with atom-centered descriptors as the independent variables: where N is the total atoms in a structure. γ 0 and γ are the weights presented in scalar and vector forms. Γ is the symmetric weight matrix (i.e. Γ 12 = Γ 21 ) describing the quadratic terms. In this equation, we only restricted the expansion up to polynomial 2 due to to enormous increase in the weight parameters.
In consequence, the force on atom j and the stress matrix can be derived according to Eqs. (2,3), respectively: Note that the energy, force, and stress share the weights parameters {γ 0 , γ 1 , ..., γ N , Γ 11 , Γ 12 , ..., Γ NN }. Once the energy, force and stress tensors are known, the derivative of the loss function can be evaluated. Finding the zero derivative of loss function (Eq. 19) in linear regression is equivalent to solve a set of linear equations of Ax = b. In PyXtal FF, we construct such A matrix and use the numpy.linalg.lstsq solver to obtain the least-squares solution. Compared to the linear regression, neural networks provides more flexible functionals to fit a large data sets. Figure 1 shows a schematic diagram based on neural networks training. Prior to the neural networks architecture, the atom-centered descriptors are mapped based on the atomic environment of a structural configuration as discussed in the previous section. These descriptors serve as the input to the neural networks architecture and are arranged in the first layer as shown in Figure 1b. The next layers are the hidden layers. Neural networks can simply cast more weights parameters as needed through increasing number of hidden layers and/or hidden layers nodes without the increasing number of descriptors. The nodes in the hidden layers carry no physical meaning.

Neural Network Regression
For each of the hidden nodes, activation functions such as Tanh and Sigmoid functions are frequently used in our NNP implementation. While ReLU as an activation function is extremely popular in image processing, we believe ReLU is not an appropriate choice in constructing MLP, due to the function carries discontinuity at zero. These nodes are connected via the weights and biases and propagate in forward direction only. In the end, the output node represents the atomic energy. A mathematical form to determine any node value can be written as: The value of a neuron (X l n i ) at layer l can determined by the relationships between the weights (W l−1,l n j ,n i ), the bias (b l−1 n i ), and all neurons from the previous layer (X l−1 n j ). W l−1,l n j ,n i specifies the connectivity of neuron n j at layer l − 1 to the neuron n i at layer l. b l−1 n i represents the bias of the previous layer that belongs to the neuron n i . These connectivity are summed based on the total number of neurons (N) at layer l−1. Finally, an activation function (a l n i ) is applied to the summation to induce non-linearity to the neuron (X l n i ).X n i at the output layer is equivalent to an atomic energy, and it represents an atom-centered descriptor at the input layer. The collection of atomic energy contributions are summed to obtain the total energy of the structure. At the end, the total energy, forces ans stress tensors are compared to the reference values (see Eq. 19). This process is called forward propagation.
Similar to the linear regression, one needs to obtain a set of weight parameters to minimize the loss function. In NN architecture, the gradient of loss with respect to the weight parameters can be conveniently done by the backpropagation algorithm. Hence, a number of optimization algorithms can be applied here to update the weights iteratively, until the optimal solution is found.

PyXtal FF Workflow
In this section, we discuss about development of PyXtal FF and its philosophy. PyXtal FF is written in Python. Presently, the package is equipped with two regression models and four types atom-centered descriptor, as explained in Section 2. These regression models and atom-centered descriptors are easily extendable without changing the core user-interface features. Figure 2 represents the workflow of PyXtal FF.
First, PyXtal FF utilizes the Atomic Simulation Environment (ASE) package [46] to parse the DFT data assembled in several formats, including ASE database, JSON, extended XYZ and the VASP OUTCAR formats. Further, ASE is also employed to compile the atomic neighborhood of each atom in the unit cell based on the periodic boundary conditions within a cutoff radius. After the neighboring data are gathered, it will compute the user-defined type of descriptor. The computation follows the theory described in Section 2.1 utilizing NumPy-a Python library for scientific computing [47]. For every structure, the descriptor calculator will return the descriptors and the force and stress related derivatives. Eventually, the descriptors represent as the independent variables in the regression models to obtain the energy, and the derivative terms are needed to compute the force and stress values.  For the regression, the Pyxtal FF supports two models, linear (quadratic) regression and neural networks. Here we focus on the latter since it is a more popular workforce for MLP development. The neural network regression is powered by Py-Torch [48]-an open-source deep learning framework based on automatic differentiation [49]. Currently, we support three optimization algorithms for training: Limited Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) [50], adaptive Moment Estimation (Adam) [51], and stochastic gradient descent (SGD) with momentum [52]. The L-BFGS method with approximated line search is the recommended optimizer when the training data is relatively small, as the quasi-Newton method is generally more stable and finds local optima more efficiently. With larger training datasets, however, the L-BFGS method is memory demanding, and one can seek to use first-order methods such as Adam or SGD with momentum. Both SGD and Adam algorithms are usually done in mini-batches, where the the gradients for each weight update are calculated based on a subset of the entire training data set. Training in mini batches can reduce the variance of the parameter updates leading to stable convergence. If needed, the training can also be done in graphical processing units (GPU) mode.
In addition to the force field generation, PyXtal FF also pro-vides the supports to utilize the trained models for several types of atomistic simulations, including geometry optimization, MD simulation, physical properties prediction, and phonon calculation. These features are managed by ASE calculator, in which the MLP potential passes the energy, forces, and stress tensors to the calculator and ASE performs the relevant atomistic simulations. Since these simulation will be powered by Python, we only recommend to use them for light weight simulations. In the near future, we are going to work on interfacing the trained MLP with LAMMPS [53] to enable the truly large scale atomistic simulation.

Example Usage
PyXtal FF can be used as stand-alone library in Python scripts. A PyXtal FF example code to train Pt model is shown in the following listing The atom-centered descriptors and the model are described in dictionary. The dictionary keys determine the necessary command for the code and are made as intuitive as possible. Most of keys follow the hyperparameters in the section 2. By default, PyXtal FF will use neural networks as the regression algorithm. Here, PyXtal FF will look for train.json and test.json files as the training and test data set, respectively.
After the training is complete, the trained model is saved in the result folder (Pt-Bispectrum) with a name of 16-16checkpoint.pth, in which 16-16 denotes two hidden layers with 16 nodes each. PyXtal FF provides a built-in interface with the ASE code [46], in which one can use the model to perform different types of calculations through ASE. Below is a simple example to perform the geometry optimization on a Pt bulk crystal (Pt bulk.cif ) based on the trained model from the listing 2.

Applications
In this section, we choose three different examples to illustrate the power of PyXtal FF and benchmark the performances of different descriptors. While the linear regression scheme is also supporte by PyXtal FF, we will focus on the NNP model as it provides more flexibility. The examples to be investigated mainly differ by the source of datasets, including (1) single SiO 2 from pure MD simulation; (2) collective data set of NbMoTaW from various approaches; (3) elemental Pt consisting of bulk, surfaces and clusters from different runs of MD simulations.

Binary System
The SiO 2 data set [30] was generated by the DFT method within the framework of VASP [57], using the generalized gradient approximation Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional [58]. The kinetic energy cutoff was set to 500 eV, and the energy convergence criterion is within 10 meV/atom. The MD trajectories are taken at different temperatures including liquid, amorphous and crystalline (αquartz, α-cristobalite, and tridymite) configurations. The original data set contain 3,048 SiO 2 configurations (60 atoms per structure). For simplicity, we considered a subset that consists of 1,316 structures. with the goal of to gaining an overview of performances and computation costs for each descriptor. Below gives the parameters to define each descriptor. In short, we choose a universal cutoff value of 4.9 Å for all descriptors. Each descriptors requires some manual selection of hyperparameters in the real (e.g., η, λ, ζ, R s ) or integer (l max , n max ) space. The ACSF parameters were taken from Ref. [30] which lead to 70 descriptors. In its wACSF version, the number is reduced to 26. For EAD, we chose a similar set of parameters for η and R s , which make 30 descriptors when L max = 2. For SO3 and SO4, only the integer type hypeparameters need to be provided. In this work, we set 40 SO3 descriptors with n max = 4 and l max =3, and 30 SO4 descriptors with l max =4. The neural network regression will be used with two hidden layers with 30 nodes each. Table 1 summarizes the performances of each training after 12000 steps. First, the ACSF-70 set yields the best accuracy in both energy (1.3 meV/atom) and forces (81.2 meV/Å), while the errors in its corresponding wACSF-26 set rise by 60-70% in both energy (2.1 meV/atom) and forces (141.8 meV/Å). On the other hand, the weighted EAD-30 descriptor, supposed to mimic G2 and G4 ACSFs, gives the highest errors (4.0 meV/atom for energy and 300 meV/Å for forces). This may be due to lack of optimization on the hyperparameters. However, it should be noted that the computation of EAD is much faster than ACSF. Therefore, it is worth exploring a systematic approach to obtain the optimum set for EAD. For the two spectral descriptors, SO3-40 seems to outperform SO4-30 while it cost about a similar level of CPU time. In terms of accuracy, SO3-40 (1.4 meV/atom in energy MAE and 115.1 meV/Å in force MAE) is in the middle of ACSF-70 and wACSF-26. Another remarkable advantage of the spectral descriptors is that tuning the hyperparameters is much easier. If one does not want to spend too much time on choosing the hyperparameters, SO3 seems to be a better choice than ACSF. We note that all descriptor computations are based on Python. It is expected that the speed will be much faster when they are implemented in FORTRAN or C languages.

High Entropy Alloy
High entropy alloys (HEAs) are systems that encompass four or more equimolar/near-equimolar alloying elements. It has been shown that HEAs carry many interesting properties such as high hardness and corrosion resistance [59,60]. Due to the high computational cost of DFT method, HEA serves as a great example in MLP development with PyXtal FF. Here, we will use NbMoTaW HEA as an example [39], which are comprised of elemental, binary, ternary, and quaternary systems. Each of the elemental systems has their ground state, strain-distorted, surface, and AIMD configurations. The binary alloys are composed of solid solution structures with the size of 2 × 2 × 2 supercell. Lastly, 300 K, 1000 K, and 3000 K AIMD configurations along with special quasi-random structures establish the ternary and quaternary data points. The total structures used in developing the MLP are 5529 configurations for training set and 376 configurations for test set.
In the original work [39], SNAP based on the linear regression predicted that the MAE values for energies are 4.3 meV/atom and 5.1 meV/atom for the training and test sets, and the MAE values in forces are 0.13 eV/Å and 0.14 eV/Å. The results demonstrated a quite satisfactory accuracy comparable to the quantum calculation. However, it needs to be noted that the energy training in Ref. [39] was based on the comparison of formation energy relative to the elemental solids, which spans from -0.193 to 0.934 eV/atom for the entire dataset, whereas the atomic energy spans from -12.960 to -9.502 eV/atom. Training with the energy in a normalized range can surely reduce the error of fitting. However, this fitting method does not fully solve the force field prediction problem since it relies on some DFT reference data. We attempted to employ the linear regression model to fit only the absolute DFT energy based on the same descriptor as used in Ref. [39], the resulting MAE values are 944 meV/atom and 6.329 eV/Å for energy and force when the force coefficient is 10 −4 . The MAE values of formation energy fitting yield remarkable improvement of 22 meV/atom and 0.243 eV/Å for energy and force, respectively. Despite this improvement, the accuracy is insufficient. Perhaps, it is due to lack of fine tuning of hyperparameters, such as atomic weights and cutoff radii for each species.
To obtain a better accuracy, we decided to fit the absolute DFT energy and forces based on the NNP model. We employed the smooth SO(3) power spectrum as the descriptor, which are formed by n max =4 and l max =3 with 40 components in total up to the cutoff radius of 5.0 Å. The NNP training is executed with 2 hidden layers with 20 nodes for each layer while energy, force, and stress contributions are trained simultaneously. The importance coefficients of force and stress are set to 10 −3 and 10 −4 , respectively. The results of the NNP training is illustrated in Figure 3. The NNP energy MAE values for the training and test sets are 6.69 meV/atom and 7.57 meV/atom, and the NNP force MAE values are 0.14 eV/Å and 0.17 eV/Å. In addition, the MAE value of stress for training set is 0.078 GPa. Our results of energy and force yield worse performance compared to the previous report. Nevertheless, our NNP model offers a more general representation of the DFT PES since it does not rely on any prior reference values.
Furthermore, we calculated physical properties such as elastic constants, bulk and shear moduli, and the Poisson's ratio of the cubic elemental crystals (see Table 2). From the table, the overall performances of NNP in predicting the physical properties are reasonable, except that the C 44 value of Nb is negative. However, this is consistent with the fact that the DFT's C 44 is also significantly lower than other terms. Hence, the negative C 44 is acceptable if one considers the noise of stress data in training. Meanwhile, the C 44 value may be remedied by providing additional training data set focusing on the shearing effect of Nb, increasing the importance of the stress coefficient, or increasing the hidden layer size in the NNP training. In addition, SO(3) descriptor can be conveniently expanded in terms of both radial basis (n max ) and angular momentum (l max ) for achieving better overall accuracy. Compared to crystalline systems, surfaces and nanoparticles generally represent the more challenging cases in MLP training as the nanoparticle contain more versatile atomic environments and more complex PES is expected. Here, we applied the NNP model to a Pt data set [37], which consists of three data types: Pt surface, Pt bulk, and Pt cluster. There are 927 clusters of 15 atoms, and the Pt bulk type consists of 1717 configurations which are composed of 256 atoms. Pt surface are constructed from (001), (110), and (111) surfaces. Respectively, there are 949, 819, and 700 structures which consist of 320, 160, and 320 atoms. In our NNP development, we chose 90% of the total structures randomly as the training set, and the remaining 10% as the test set. The SO(3) power pectrum descriptor with l max = 3 and n max = 4 at radius cutoff of 4.9 Å was used to construct the MLP in the NNP model with two hidden layers with 30 nodes each. Unlike the previous examples, the minibatch scheme with the Adam optimizer was employed. In each iteration, the training process was updated in a batch size of 25 configurations.  In the original literature [37], DeepPot-SE includes MoS 2 slab and Pt clusters on MoS 2 substrate (MoS 2 /Pt). The performance of DeepPot-SE yields satisfactory results. Meanwhile, embedded atom neural networks method can achieves outstanding results using a fraction of the same data set [38]. Both of the methods exploit a large number of neural networks parameters, in the order of 10 4 -10 5 , while the current study only adopts 2191 weight parameters for exemplary purpose. As shown in Table 3, the accuracy from our small neural network model is comparable to that of DeepPot-SE results. Not surprisingly, the group of Pt bulk has the lowest errors with only 1.64 meV/atom for the RMSE in energy and 67 meV/Å in force. On the contrary, the errors on Pt clusters are about 2-4 time higher for both energy and forces. This is expected since the local atomic environments in the clusters are more diverse and thus learning the relation is harder. Nevertheless, the values from this exploratory study is comparable to the results from deep learning models. This example also suggests that a small NNP model with the properly constructed features can be a complementary solution for MLP development.

Conclusion
In conclusion, we introduced PyXtal FF a versatile package for developing MLPs that can perform at the DFT level. Currently, the code allows one to construct the MLPs from four different types of atom-centered descriptors: (w)ACSFs, EAD, SO4 bispectrum, or SO3 power spectrum. Two regression models, the generalized linear regression and neurual networks, are supported to train the MLP by simultaneously fitting the data of energy, forces and stresses from the ab-initio simulation. In particular, we focus on the neural networks potential development. Our software package utilizes PyTorch as the main machinery, which is equipped with neural network models, automatic differentiation, as well as various optimization algorithms. We demonstrated the features of the current PyXtal FF version by three examples on SiO 2 , NbMoTaW HEA, and elemental Pt, respectively. In general, the mean absolute error values of each trained MLPs fall into the range of several meVs/atom in energy and several hundred meV/Å in forces. While training on stress is optional, it is helpful to improve the general accuracy of the model. More importantly, this is crucial to yield better prediction on materials' elastic properties. As such, the MLPs can be applied to investigate the materials properties in greater accuracy than the classical potentials built from the empirical model. Finally, the PyXtal FF is an open source code. We welcome anyone who is interested in MLP development to contribute to this project.
Following the Eq. 6 the derivative with respect to an atom m can be written in the following form: For the periodic system, the computation of ∂R i j ∂r m is straightforward except that one needs to consider one additional case. When i = j, the derivative is always zero. In Eqs. (7,8), the cosine function can be defined as: where r i j is the relative position between atom j and atom i. In the following, the expressions for the derivative with respect to an interacting atom m are: λζ(1 + λ cos θ i jk ) ζ−1 ∂ cos θ i jk ∂r m f c (R i j ) f c (R ik ) − 2η(1 + λ cos θ i jk ) ζ R i j ∂R i j ∂r m + R ik ∂R ik ∂r m + R jk ∂R jk ∂r m f c (R i j ) f c (R ik ) ∂R ik ∂r m (A.5) The derivatives of atomic distances, R i j and R ik , carry the same meaning as in Eq. A.2. The expression of the cosine of triple-atom angle is where δ m j is the Kronecker delta between atom m and j.