Differentiable thermodynamic modeling

A new framework for thermodynamic modeling is proposed by introducing the concept of differentiable programming, where all thermodynamic observables, including both thermochemical quantities and phase equilibria, can be differentiated with respect to the underlying model parameters, thus allowing the models to be learned by gradient-based optimization. It is shown that thermodynamic modeling and deep learning can be seamlessly integrated and unified within this framework. A preliminary successful application is demonstrated for the Cu-Rh system. It is expected that thermodynamic modeling in a deep learning style can increase the predictive power of models and provide more effective guidance for the design, synthesis and optimization of multi-component materials with complex chemistry by learning from various types of data.

Various phenomena in materials have their roots in thermodynamics, whose fundamental laws were established long ago. However, the thermodynamics of materials is far from well studied, owing to the vast number of degrees of freedom involved, including composition, temperature, pressure, strain, external fields and dimensionality (e.g., bulk vs. nano). In addition, the underlying microscopic mechanisms contributing to macroscopic thermodynamic observables are diverse, including lattice disorder, atomic vibration, electronic excitation, magnetic excitation, etc., making accurate and comprehensive theoretical descriptions difficult. Developing sufficiently efficient methodologies is therefore crucial for thermodynamic modeling to guide the design and synthesis of materials more effectively. To this end, much encouraging progress has emerged in recent years. One notable direction is the use of machine learning (ML) techniques 1-13, whose generalizability allows predictions based on limited amounts of data. So far, the methods along this direction can be classified into two types according to the training data and predicted quantities of the ML models. Type-I models are trained on and predict thermochemical quantities 1-8, while type-II models are trained on and predict phase equilibria 9-13. Naturally, one may ask whether there can be cross-type learning, i.e., learning thermochemical quantities from phase equilibria (learning phase equilibria from thermochemical quantities is essentially within type I, since the former can be derived once the latter are fully determined, and is thus ignored here). Such learning has the following advantages. First, thermochemical data are often scarce or insufficiently reliable, so training on phase equilibria is the desirable or even the only option.
Second, prediction of thermochemical quantities like Gibbs energies allows calculation not only of phase equilibria but also of useful parameters such as the thermodynamic factor in diffusion. Third, phase equilibria calculated from thermochemical quantities are more physics-based than direct predictions by ML. It is therefore of great practical and scientific interest to develop this new type of ML paradigm for thermodynamics.
In this work, a new framework capable of learning thermodynamic potentials from both thermochemical data and phase equilibrium data is proposed, and it will be shown that thermodynamic modeling and deep learning are seamlessly unified within this framework. This is achieved by introducing differentiable programming (DP) 14, a programming paradigm in which the computation flow can be differentiated throughout via automatic differentiation, thus allowing gradient-based optimization of the parameters in the flow. Owing to its ability to incorporate physics into deep learning, DP has recently found increasing application in different fields, including molecular dynamics 15, tensor networks 16, quantum chemistry 17 and density functional theory (DFT) 18.
a) Electronic mail: pinweng@andrew.cmu.edu
Consider a set of phases {θ_i}. The thermodynamic potential of phase θ_i can be represented as

G^{θ_i} = G^{θ_i}(F(x), C; A^{θ_i}),

i.e., a function of a descriptor F(x) based on the composition x and of external conditions C (e.g., temperature and pressure), with parameters A^{θ_i}. The use of F(x) is also called feature engineering; when F is the identity, i.e., F(x) = x, the raw composition is used directly. G^{θ_i} can take various differentiable forms, such as the conventional one based on polynomials 19,20 or the more ML-oriented one based on deep neural networks 5.
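As a concrete illustration of such a differentiable form (a sketch, not the paper's code), the snippet below implements a binary solution model in JAX with a Redlich-Kister excess term; the function `g_phase` and its temperature-linear interaction parameters are hypothetical stand-ins for G^{θ_i} and A^{θ_i}.

```python
import jax.numpy as jnp
from jax import grad

R = 8.314  # gas constant, J/(mol K)

def g_phase(x, T, params):
    """Molar Gibbs energy of a hypothetical binary solution phase:
    ideal mixing plus a Redlich-Kister excess term, where
    params[k] = (a_k, b_k) defines L_k = a_k + b_k * T."""
    xa, xb = 1.0 - x, x
    g_ideal = R * T * (xa * jnp.log(xa) + xb * jnp.log(xb))
    g_excess = sum((a + b * T) * xa * xb * (xa - xb) ** k
                   for k, (a, b) in enumerate(params))
    return g_ideal + g_excess

# Any derivative is available by autodiff, e.g. w.r.t. composition
dg_dx = grad(g_phase, argnums=0)
```

Because the whole expression is built from JAX primitives, gradients with respect to the parameters (needed for training) follow the same way via `grad` on a different argument.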
Once {G^{θ_i}} of all the relevant phases are known, each thermochemical quantity g_j and phase equilibrium e_j can then be calculated, i.e.,

g_j = g_j[{G^{θ_i}}],  e_j = e_j[{G^{θ_i}}],

which are functionals of {G^{θ_i}}. The loss function can then be computed as

L = Σ_j l(g_j, g_j*) + Σ_k l(e_k, e_k*),

where the asterisk denotes a target value and l is a function measuring the difference between two values, a common choice being the squared error l(g, g*) = (g − g*)². Finally, the parameter set {A^{θ_i}} of the thermodynamic potentials is obtained by minimizing the loss function:

{A^{θ_i}}* = argmin L.

The minimization of L relies on calculating its gradient ∇_{A^{θ_i}} L, which is made possible by DP. The most non-trivial part of the above computation flow to differentiate is the phase equilibrium calculation for e_j, which generally requires minimization of the thermodynamic potential of the whole system, the major subject of research in computational thermodynamics. This minimization can usually be divided into two main steps 21. The first is a global minimization in which the thermodynamic potential surface is sampled and an initial solution is generated. The second is a refining calculation that yields the final solution satisfying all the equilibrium conditions.
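The loss construction above can be sketched as follows; `g_model` and `e_model` are hypothetical placeholders standing in for the actual thermochemical and phase-equilibrium functionals, which in practice involve a full Gibbs energy minimization.

```python
import jax.numpy as jnp
from jax import grad

def g_model(params, x):      # placeholder thermochemical observable
    return params[0] * x + params[1]

def e_model(params, x):      # placeholder phase-equilibrium observable
    return params[0] * x ** 2

def loss(params, data):
    """L = sum_j l(g_j, g_j*) + sum_k l(e_k, e_k*), with l the squared error."""
    total = 0.0
    for x, target in data["thermochem"]:
        total += (g_model(params, x) - target) ** 2
    for x, target in data["equilibria"]:
        total += (e_model(params, x) - target) ** 2
    return total

loss_grad = grad(loss)  # gradient w.r.t. the model parameters via AD
```

The key point is that `grad` differentiates through whatever computation sits inside `g_model` and `e_model`, so the same pattern applies when they contain a full equilibrium solver.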
The scheme of differentiable thermodynamic modeling is shown in fig. 1. The forward mode is essentially similar to a routine thermodynamic calculation. In the backward mode, the computation of gradients propagates from the loss function toward the model parameters through a series of intermediate steps, and all the single steps are finally assembled to obtain ∇_{A^{θ_i}} L based on the elementary rules of differentiation. Such a scheme is very similar to that of deep learning; in fact, differentiable programming can be regarded as an extension of deep learning that uses diverse differentiable components beyond classical neural networks.
The above general framework has been implemented for a simple two-phase binary system, Cu-Rh, for demonstration; its phase diagram features a liquidus and a solidus between the fcc phase and the liquid (liq) phase, as well as an fcc miscibility gap just below the solidus. To calculate phase equilibria at given temperature and pressure, the Gibbs energy is used as the thermodynamic potential. The four-parameter model by Lesnik et al. 22 is taken as the target to learn:

G^θ = x_Cu °G^θ_Cu + x_Rh °G^θ_Rh + RT(x_Cu ln x_Cu + x_Rh ln x_Rh) + x_Cu x_Rh (a^θ + b^θ T),

where θ = fcc, liq, R is the gas constant, and (a^θ, b^θ) are the four interaction parameters. From this set of model parameters, the training data are generated, consisting of phase boundaries at 1000-2200 K. The Gibbs energy minimization is divided into two steps. The first is a global minimization on a grid of compositions based on a convex-hull algorithm, generating an approximate solution to be refined in the second step. The second is an iterative self-consistent procedure in which the Newton-Raphson method solves the phase equilibria under a fixed chemical potential, which is then updated from the newest solved equilibria in each iteration; the iterations stop when convergence is achieved. The loss function for this example system combines, for each of the two target equilibria (the liquid-fcc equilibrium and the fcc miscibility gap), a squared-error term on the phase-boundary compositions with a penalty term of the form α ReLU(min_x DF), where the rectified linear unit is ReLU(z) = max(0, z) and α is a scaling factor that improves convergence in the minimization of the loss function; α = 100 is used in the present case. The driving force DF of a metastable phase is the distance, in terms of Gibbs energy, between the stable tangent plane and a parallel tangent plane to the metastable phase 20. Since the target type of phase equilibrium may not be reproduced when the model parameters deviate strongly from their target values, the penalties are imposed in such scenarios to favor the regions of parameter space where min_x DF = 0, i.e., where the liquid-fcc equilibrium and the fcc miscibility gap exist at some compositions.
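The first (global) step of such a minimization can be illustrated by a minimal convex-hull scan over a composition grid. This is an independent sketch, not the paper's implementation: a pure-Python lower convex hull is applied to a toy double-well Gibbs energy, and any hull edge spanning more than one grid step marks a two-phase (miscibility gap) region.

```python
def lower_hull(points):
    """Lower convex hull of (x, y) points via the monotone-chain algorithm."""
    pts = sorted(points)
    hull = []
    for p in pts:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # pop the middle point if it lies on or above the new chord
            if (x2 - x1) * (p[1] - y1) - (p[0] - x1) * (y2 - y1) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def gap_endpoints(g, n=201):
    """Approximate two-phase regions of a binary Gibbs energy curve g(x)."""
    pts = [(i / (n - 1), g(i / (n - 1))) for i in range(n)]
    hull = lower_hull(pts)
    # hull edges longer than one grid step are common-tangent constructions
    return [(a[0], b[0]) for a, b in zip(hull, hull[1:])
            if b[0] - a[0] > 1.5 / (n - 1)]
```

For a double-well curve such as g(x) = (x − 0.2)²(x − 0.8)², this returns a single gap with endpoints near x = 0.2 and x = 0.8, which would serve as the initial solution refined by the Newton-Raphson step.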
A differentiable program for calculating the above loss function is written using JAX 15, a machine learning library that can automatically differentiate Python and NumPy functions. Notably, JAX can differentiate through control flows such as loops and branches, which are key structures in the Gibbs energy minimization. The gradient of the loss function, ∇_{A^{θ_i}} L, is then obtained by automatic differentiation of the program. Given its gradient, the loss function is minimized using a gradient-based optimization algorithm, the L-BFGS-B method.
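As a toy illustration of differentiating through a loop in JAX (unrelated to the paper's actual program), the Newton iteration below is traced and unrolled by JAX, so the gradient of the whole iterative computation is available end to end:

```python
import jax.numpy as jnp
from jax import grad

def newton_sqrt(a, n_steps=20):
    """sqrt(a) via Newton's iteration; the Python loop is unrolled during
    JAX tracing, so the entire iteration is differentiable."""
    x = jnp.where(a > 1.0, a, 1.0)   # crude initial guess, branch via where
    for _ in range(n_steps):
        x = 0.5 * (x + a / x)
    return x

# d/da sqrt(a) = 1 / (2 sqrt(a)), recovered by differentiating the loop
d_sqrt = grad(newton_sqrt)
```

Data-dependent branches inside a traced function are expressed with constructs like `jnp.where` (or `jax.lax.cond`), which is how branching logic in a minimization routine remains differentiable.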
The training process for the Cu-Rh model system is shown in fig. 2. Despite starting from quite unreasonable initial model parameters, the minimization of the loss function is efficient, with good convergence achieved within a few tens of steps, made possible by the explicitly calculated gradient of the loss function. In this case, the liquid-fcc equilibrium exists throughout training, so the loss function has only two non-zero contributions, from the displacement of the phase boundaries (liquid-fcc and fcc-fcc) and from the absence of the phase separation (fcc miscibility gap) caused by inaccurate model parameters, respectively. The latter contribution vanishes once the fcc miscibility gap emerges, after which training involves only quantitative adjustment of the phase boundaries. As the loss function is minimized, its four-component gradient driving the training approaches the zero vector, and the model parameters converge to their true values. To better understand how the thermodynamic model evolves, the trajectories of the Gibbs energies of the two phases at 1200 K during training are plotted. Consistent with the initial absence of the fcc miscibility gap, the Gibbs energy of the fcc phase is initially a convex function without spinodal decomposition, but is gradually trained to become non-convex, leading to phase separation. The phase diagram of the Cu-Rh system predicted by the trained model is shown in fig. 3, along with the training data. The predicted phase diagram is in excellent agreement with the training data, indicating that the model training is highly successful. This example provides a preliminary successful application of differentiable thermodynamic modeling, which is able to learn thermodynamics from mixed types of data on thermochemistry and phase equilibria in a deep learning style.
Owing to the simplicity of the binary system and the limited amount of training data, a polynomial with raw elemental compositions directly used as inputs is a suitable form to represent the excess Gibbs energy of each phase, which is also the routine practice in conventional thermodynamic modeling. However, such a representation may suffer from the "curse of dimensionality" in high-dimensional chemical spaces. For instance, in the extreme case of a phase containing 100 elements, there are 100 compositional variables, C(100,2) = 4950 binary interactions, C(100,3) = 161,700 ternary interactions and C(100,4) = 3,921,225 quaternary interactions, totaling a daunting number of model parameters when each interaction is represented by a conventional parameterized polynomial. It is therefore desirable to explore alternative representations of the Gibbs energy. Owing to its strong expressivity, the neural network has emerged as a promising tool for this purpose, and there have been quite a few related works in the literature. Using elemental compositions as inputs, a deep neural network trained on DFT data has achieved a mean absolute error of 0.05 eV/atom in predicting formation enthalpies 3. There have also been efforts to introduce physical attributes such as electronegativity and atomic radius into the inputs via feature engineering, to alleviate the difficulty posed by the vast dimensionality of the composition space 23. This is still a field under active research, and further discussion is beyond the scope of the present work. It is nevertheless clear that the present framework of differentiable thermodynamic modeling provides a natural platform for introducing neural networks to learn thermodynamics from diverse types of data.
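A sketch of such a neural-network representation (a hypothetical architecture, not taken from the cited works): a small MLP over composition and temperature, with an x(1 − x) prefactor so the excess Gibbs energy vanishes at the pure elements, and with derivatives available by autodiff just like the polynomial form:

```python
import jax.numpy as jnp
from jax import random, grad

def init_mlp(key, sizes):
    """Random parameters for a small MLP (hypothetical architecture)."""
    keys = random.split(key, len(sizes) - 1)
    return [(random.normal(k, (m, n)) * jnp.sqrt(2.0 / m), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

def mlp_gibbs(params, x, T):
    """Excess Gibbs energy from an MLP over (composition, scaled T).
    The x*(1-x) prefactor enforces zero excess energy for pure elements."""
    h = jnp.array([x, T / 1000.0])
    for W, b in params[:-1]:
        h = jnp.tanh(h @ W + b)
    W, b = params[-1]
    return x * (1.0 - x) * (h @ W + b)[0]

params = init_mlp(random.PRNGKey(0), [2, 16, 16, 1])
dG_dx = grad(mlp_gibbs, argnums=1)  # derivative for chemical potentials
```

Because this network is just another differentiable component, it can be dropped into the framework in place of the polynomial model and trained against phase-equilibrium data by the same gradient-based loop.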
From the perspective of mappings, a set of phase equilibrium data is a sample from the map f : (x, C) → e, where e is the phase equilibrium at composition x and external condition C. To incorporate more physics, the thermodynamic potentials {G^θ} are introduced as intermediate variables, with two maps f_1 : (x, C) → {G^θ} and f_2 : {G^θ} → e, usually called the "thermodynamic model" and the "phase equilibrium calculation", respectively. The map f is just their composition:

f = f_2 ∘ f_1.

Note that f_1 and f_2 are very different in nature. f_1 is usually complicated and sometimes even obscure, packaging the whole physics of each single phase, which is difficult to compute explicitly without capturing all the atomic and electronic details; this is precisely where deep learning can find its largest use. In contrast, f_2 is more straightforward and thus best suited to direct physical computation. Differentiable thermodynamic modeling offers a seamless integration of these two components.
In summary, the present work proposes a deep learning framework for thermodynamic modeling, termed differentiable thermodynamic modeling. It is based on differentiable programming, which allows differentiation throughout the computation flow and therefore gradient-based optimization of parameters. Under this framework, thermodynamics can be learned from different types of data on thermochemistry and phase equilibria in a deep learning style, and thermodynamic modeling and deep learning become de facto unified and indistinguishable. Its preliminary success is demonstrated by training a model for the Cu-Rh system. It is expected that differentiable thermodynamic modeling can facilitate the exploration of the thermodynamics of multi-component systems with complex chemistry, as well as the design, synthesis and optimization of multi-component materials.

RESEARCH DATA
The data that support the findings of this study are available from the corresponding author upon reasonable request. The data and code used in this work will also be made publicly available on GitHub.

ACKNOWLEDGEMENT
The author would like to acknowledge interesting presentations and discussions from the Scientific Machine Learning Webinar Series (https://www.cmu.edu/aced/sciML.html).