Thermodynamically Consistent Machine-Learned Internal State Variable Approach for Data-Driven Modeling of Path-Dependent Materials

Characterization and modeling of path-dependent behaviors of complex materials by phenomenological models remains challenging due to difficulties in formulating mathematical expressions and internal state variables (ISVs) governing path-dependent behaviors. Data-driven machine learning models, such as deep neural networks and recurrent neural networks (RNNs), have become viable alternatives. However, pure black-box data-driven models mapping inputs to outputs without considering the underlying physics suffer from unstable and inaccurate generalization performance. This study proposes a machine-learned physics-informed data-driven constitutive modeling approach for path-dependent materials based on the measurable material states. The proposed data-driven constitutive model is designed with the consideration of universal thermodynamics principles, where the ISVs essential to the material path-dependency are inferred automatically from the hidden state of RNNs. The RNN describing the evolution of the data-driven machine-learned ISVs follows the thermodynamics second law. To enhance the robustness and accuracy of RNN models, stochasticity is introduced to model training. The effects of the number of RNN history steps, the internal state dimension, the model complexity, and the strain increment on model performances have been investigated. The effectiveness of the proposed method is evaluated by modeling soil material behaviors under cyclic shear loading using experimental stress-strain data.


Introduction
Traditional constitutive modeling is based on constitutive or material laws to describe the explicit relationship among the measurable material states, e.g., stresses and strains, and internal state variables (ISVs) based on experimental observations, mechanistic hypothesis, and mathematical simplifications. However, limited data and functional form assumptions inevitably introduce errors to the model parameter calibration and model prediction. Moreover, with the pre-defined functions, constitutive laws often lack generality to capture full aspects of material behaviors [1,2].
Path-dependent constitutive modeling typically applies models with evolving ISVs in addition to the state space of deformation [3,4]. The ISV constitutive modeling framework has been effectively applied to model various nonlinear solid material behaviors, e.g., elasto-plasticity [5,6], visco-plasticity [7], and material damage [8]. However, ISVs are often non-measurable, which makes it challenging to define a complete and appropriate set of ISVs for highly nonlinear and complicated materials, e.g., geomechanical materials. Further, the traditional ISV constitutive modeling approach often results in excessive complexities with high computational cost, which is undesirable in practical applications.
Recurrent neural networks (RNNs) designed for sequence learning have been successfully applied in various domains, such as machine translation and speech recognition, due to their capability of learning history-dependent features that are essential for sequential prediction [48,49]. The RNN and gated variants, e.g., the long short-term memory (LSTM) [50] cells and the gated recurrent units (GRUs) [51,52], have been applied to path-dependent materials modeling [53], including plastic composites [54], visco-elasticity [55], and homogeneous anisotropic hardening [56]. RNN-based constitutive models have also been applied to accelerate multi-scale simulations with path-dependent characteristics [57,58,59,60,61]. Recently, Bonatti and Mohr [62] proposed a self-consistent RNN for path-dependent materials such that the model predictions converge as the loading increment is decreased. However, these RNN-based data-driven constitutive models may not satisfy the underlying thermodynamics principles of path-dependent materials.
In this study, we propose a thermodynamically consistent machine-learned ISV approach for data-driven modeling of path-dependent materials, which relies purely on measurable material states. The first thermodynamics principle is integrated into the model architecture whereas the second thermodynamics principle is enforced by a constraint on the network parameters. In the proposed model, an RNN is trained to infer intrinsic ISVs from its hidden (or memory) state that captures essential history-dependent features of data through a sequential input. The RNN describing the evolution of the data-driven machinelearned ISVs follows the thermodynamics second law. In addition, a DNN is trained simultaneously to predict the material energy potential given strain, ISVs, and temperature (for non-isothermal processes). Further, model robustness and accuracy is enhanced by introducing stochasticity to inputs for model training to account for uncertainties of input conditions in testing.
The remainder of this paper is organized as follows. The background of thermodynamics principles is first introduced in Section 2. In Section 3, DNNs and RNNs are introduced and their applications to path-dependent materials modeling are discussed. Section 4 introduces the proposed thermodynamically consistent machine-learned ISV approach for data-driven modeling of path-dependent materials, where two thermodynamically consistent recurrent neural networks (TCRNNs) are discussed. Finally, in Section 5, the effectiveness and generalization capability of the proposed TCRNN models are examined by modeling an elasto-plastic material and undrained soil under cyclic shear loading. A parametric study is conducted to investigate the effects of the number of RNN steps, the internal state dimension, the model complexity, and the strain increment on the model performance. Concluding remarks and discussions are summarized in Section 6.

Thermodynamics Principles
The balance of energy, i.e., the first thermodynamics principle, can be expressed as [63] ρė = σ :ε − div q + ρh, where the superposed "." denotes the material time derivative; ρ is the material density; e is the specific internal energy andė is its rate; σ is the Cauchy stress tensor; ε is the strain tensor; σ :ε denotes the rate of mechanical work; q is the heat flux; h is the specific rate of heat supply. The local form of the second thermodynamics principle expressed in terms of the Clausius-Duhem inequality reads [63] ρṡ + div where s denotes the specific entropy and T is the positive absolute temperature.
Combining the first and second thermodynamic principles yields the dissipation inequality The left-hand side of Eq. (3) represents the total dissipation rate that can be decomposed into the non-negative mechanical dissipation rate D and the non-negative thermal dissipation rate D th [63,64]: Considering a constant material density and defining the specific internal energy per unit volume as E = ρe and the specific entropy per unit volume as S = ρs, we haveĖ = ρė andṠ = ρṡ. Therefore, Eq. (4a) can be rewritten as Denoting the specific Helmholtz free energy as F = E − T S [63] and taking the time derivative givesḞ =Ė −Ṫ S − TṠ.
Combining Eq. (5) and Eq. (6) giveṡ For strain-rate independent materials, the Helmholtz free energy can be defined as [63] F := F (T, ε, z), where z = (z 1 , ..., z N ) is a collection of N internal state variables (ISVs) introduced to characterize the state of path-dependent materials, which can also be interpreted as history variables [21]. However, ISVs are often non-measurable and the identification of the ISVs is often based on empiricism, which is nontrivial for materials with highly complex and nonlinear path-dependent behaviors. Here, a ML-enhanced data-driven approach is proposed to automatically infer the essential ISVs that follow the thermodynamics principles, which will be discussed in Section 4. Differentiation of Eq. (8) giveṡ Equating Eq. (7) with Eq. (9) gives The arbitrariness ofṪ ,ε, andż leads to the following relations These relations are derived based on the universal thermodynamics principals and considered in the proposed data-driven constitutive models. In Section 4, we will introduce a thermodynamically consistent machine-learned ISV approach for data-driven modeling of path-dependent materials with the consideration of the thermodynamically consistent relations (Eq. (11)).

Deep Neural Networks (DNNs) Constitutive Models
Deep neural networks (DNNs), as the core of the deep learning [65], represent complex models that relate data inputs, x ∈ R din , to data outputs, y ∈ R dout . A typical DNN is composed of an input layer, an output layer, and L hidden layers. Each hidden layer transforms the outputs of the previous layer through an affine mapping followed by a nonlinear activation function a(·), which can be written as: where the superscript (l) denotes the layer the quantities belong to, e.g., x (l) ∈ R n l is the outputs of layer l with n l neurons. W (l) ∈ R n l ×n l−1 and b (l) ∈ R n l are the weight matrix for linear mapping and the bias vector of layer l, respectively, where n 0 = d in is the input dimension. They are trainable parameters to be optimized through training. For a fully-connected layer, the number of trainable parameters is (n l−1 + 1)n l . Commonly used activation functions include the logistic sigmoid, the hyperbolic tangent function, the rectified linear unit (ReLU), and the leaky ReLU [66]. Note that the choice of the activation of the output layer depends on the type of ML tasks. For regression tasks, which is the application of this study, a linear function is used in the output layer where the last hidden layer information is mapped to the output vectorŷ, expressed as:ŷ = W (L+1) x (L) + b (L+1) , whereŷ denotes the DNN approximation of the data output y. Fig. 1(a) shows the computational graph of a feed-forward DNN with three input neurons, two hidden layers, and two output neurons.
For path-dependent materials, the current stress response depends on the past stress-strain history. Fig. 1(b) shows the computational graph of a DNN constitutive model for path-dependent materials that takes one history step of stress-strain states and the current strain increment as input and predicts the Figure 1: Computational graphs of (a) a feed-froward deep neural network (DNN) with three input neurons, two hidden layers, and two output neurons, (b) a DNN constitutive model that takes one history step of stress-strain states and the current strain increment as input and predicts the current stress increment, and (c) a DNN constitutive model that takes one history step of stress-strain states and the current total strain as input and predicts the current total stress. current stress increment. Alternatively, the current strain increment can be replaced with the current total strain as input and the current total stress as the output, as shown in Fig. 1(c). The effects of the input/output representation will be discussed in the next subsection. These DNN constitutive models can be extended to consider pre-defined ISVs as input in addition to the measurable material states. An additional DNN can be used to model the evolution of the ISVs [47]. However, the dependency on pre-defined ISVs limits its applications, especially when only the measurable material states of path-dependent behaviors are available, e.g., the soil example to be demonstrated in Section 5.2.
Note that for DNN constitutive models, the number of history input steps is fixed once the model architecture is determined, which means the number of history input steps used for testing must be the same as that used in training. Furthermore, it is difficult for DNNs with recurrent connections from the output of step n to the input of step n+1 to capture the essential information about the past history since the outputs are explicitly trained only to match the training set targets not being informed of the past history [65]. These issues are addressed by RNNs introduced in the next subsection.

Recurrent Neural Networks (RNNs) Constitutive Models
Recurrent neural networks (RNNs) designed for sequence learning have demonstrated successful applications in various domains, such as machine translation and speech recognition, due to their capability of learning history-dependent features that are essential for sequential prediction [48,49]. Fig. 2(a) illustrates the computational graphs of a folded RNN and an unfolded RNN, where h is a hidden state that captures essential history-dependent features from past information, which makes RNNs particularly suitable for modeling path-dependent material behaviors. Unfolding of the RNN computational graph results in parameter sharing across the network structure, reducing the number of train-able parameters and thus leading to more efficient training. The length of input/output sequences can be arbitrary, which allows generalization to sequence lengths not appeared in the training set. Each step can be viewed as a state. Despite the history sequence length, the trained RNN model always has the same input size, since it is specified in terms of transition from one state to another rather than in terms of a variable-length history of states [65]. The forward propagation of RNN begins with an initial hidden state and the propagation equations at time step (state) n are defined as where a tanh is the hyperbolic tangent function; W xh , W hh , and W hy are trainable weight coefficients for input-to-hidden, hidden-to-hidden, and hidden-tooutput transformations, respectively; b h and b y are trainable bias coefficients. These trainable parameters are shared across all RNN steps. Eqs. (13a) transforms the previous hidden state h n−1 and the current input x n to the current hidden state h n , while (13b) transforms the current hidden state h n to the current outputŷ n . The history information is captured by the hidden state of RNN by repeating the transformation in Eq. (13a) for all RNN steps. The hidden state that carries the essential history-dependent information is passed to the final step and informs the final prediction. Depending on applications, RNNs can have flexible architectures of input and output, such as one-to-one, one-to-many, many-to-one, and many-to-many [67]. For example, the unfolded RNN shown in Fig. 2(a) is a many-to-many type of RNN, which can be applied to, for example, name entity recognition. For path-dependent constitutive modeling, the many-to-one type of RNN is more suitable. Fig. 2(b) illustrates the computational graph of an RNN constitutive model that takes one history step of stress-strain states and the current strain increment as input and predicts the current stress increment, defined as an incremental RNN, whereas Fig. 2(c) show the computational graph of an RNN constitutive model that takes one history step of stress-strain states and the current total strain as input and predicts the current total stress, defined as a total-form RNN. Unlike the standard RNNs that has the same input size for all time steps (states), the history-step input of the RNN constitutive models shown in Fig. 2(b)-(c) contains both strain and stress components whereas the currentstep input contains only strain components. Considering one history step, the forward propagation of a typical total-form RNN is expressed as follows where W hh , W εh , W σh , and W hσ are trainable weight coefficients for hiddento-hidden, strain-to-hidden, stress-to-hidden, and hidden-to-output transformations, respectively, and b h and b y are trainable bias coefficients.
To capture complex history-dependent patterns, deep RNNs are more advantageous [65]. Similar to DNNs, stacking of fully-connected hidden layers can be added to affine input-to-hidden, hidden-to-hidden, and hidden-to-output transformations.
Remark. Our studies show that the total-form RNN is less sensitive to the strain increment than the incremental RNN, as a consequence of interpolation outperforming extrapolation. For instance, considering a training dataset with one stress-strain path and a constant loading (strain) increment, the final-step input (the strain increment) of the incremental RNN is constant, whereas the final-step input (the total strain) of the total-form RNN is not constant and covers the whole range of strain in the stress-strain path. During testing on the same stress-strain path but with a different constant loading increment, larger or smaller than that of the training data, the incremental RNN becomes inaccurate since the final-step input (the strain increment) of the testing data is beyond the range of the training strain increment and the prediction is an extrapolation. In contrast, the total-form RNN remains accurate because the final-step input (the total strain) of the testing data is within the range of training total strain and the prediction is an interpolation. Therefore, the proposed data-driven models in this study are built upon the total form.

Gated Recurrent Units (GRUs)
Standard RNNs suffer from short-term memory due to vanishing and exploding gradient issues that arise from recurrent connections [68,69,65]. More effective RNNs for learning long-term dependencies have been developed, including the long short-term memory (LSTM) [50] cells and gated recurrent units (GRUs) [51,52]. A typical GRU consists of a reset gate r n that removes irrelevant past information, an update gate u n that controls the amount of past information passing to the next step, and a candidate hidden stateh n [52]. Compared to LSTM, the GRU has fewer parameters as it lacks an output gate. Considering one history step, the forward propagation of a typical GRU is expressed as follows where * denotes the Hadamard (element-wise) product; a σ is the sigmoid function; a tanh is the hyperbolic tangent function; W hr , W xr , W hu , W xu , W hh , W xh , and W hy are trainable weight coefficients; b r , b u , bh, b h , and b y are trainable bias coefficients. Eq. (15d) calculates the current hidden state h n by a linear interpolation between the previous hidden state h n−1 and the candidate hidden stateh n , based on the update gate u n . The RNN-based constitutive models proposed in this study are applicable to all types of RNNs for complicated path-dependent material behaviors with long-term dependent features.

Model Training
Since the forward propagation (Eqs. (13)- (15)) is inherently sequential, i.e., each time step can only be computed after the previous one, the computation of the gradient of the loss function with respect to the trainable parameters cannot be parallelized and it needs to follow the reverse unfolded computational graph. The back-propagation through time algorithm is applied to RNNs [65].
During training, the model receives the ground truth stress data of history steps, which is a teacher forcing procedure emerging from the maximum likelihood criterion [65]. However, the disadvantage of teacher forcing training arises when the trained model is applied in an open-loop test mode with the network's previous outputs fed back as input for future predictions. The computational graphs of the test mode are shown in Fig. 3 corresponding to the RNN models (in the train mode) shown in Fig. 2(b)-(c). In this case, the inputs the trained model receives could be quite different from those received during training, forcing the model to perform extrapolative predictions and thus lead to large errors. Furthermore, such prediction errors could occur at the very first prediction, accumulate and propagate quickly, and contaminate the subsequent predictions. To mitigate the issue of error propagation due to the teacher forcing training and enhance model accuracy and robustness, we introduce stochasticity to the training set by adding random perturbations to the ground truth stress data. In this way, the network can also learn the variability of the input conditions resembling those in the test mode.

Thermodynamically Consistent Machine-Learned Internal State
Variable Approach for Path-Dependent Materials

Thermodynamically Consistent Recurrent Neural Networks (TCRNNs)
To ensure thermodynamical consistency in the data-driven path-dependent materials modeling, thermodynamics principals introduced in Section 2 are em- Figure 3: Test mode of (a) a RNN constitutive model that takes one step history of stress-strain states and the current strain increment as input and predicts the current stress increment, and (b) an RNN constitutive model that takes one history step of stress-strain states and the current total strain as input and predicts the current total stress. The stress prediction from previous step is used as a part of the history input for the current-step prediction.
bedded into RNNs to extract the hidden ISVs. The proposed TCRNN consists of an RNN and a DNN. The computational graphs for non-isothermal or isothermal processes are illustrated in Fig. 4. where W hh , W εh , W σh , W T h , and W hz are trainable weight coefficients for hidden-to-hidden, strain-to-hidden, stress-to-hidden, temperature-to-hidden, and hidden-to-ISV transformations, respectively; b h and b z are trainable bias coefficients; a h (·) and a z (·) are activation functions. Note that the trainable parameters are shared across all steps of the RNN, which enhances training efficiency.ẑ n is the machine-learned ISVs from the hidden state h n of the RNN and its rate,ż n , can be computed by automatic differentiation [70], which will be discussed in the next subsection. Eqs. (17a-b) transform the history and current measurable material states to the current hidden state h n that carries the essential past information. For an RNN with more than one history step, Eq. (17a) is repeated for all history steps. Eq. (17c) infers the current ISVẑ n from the current hidden state h n . A linear activation (a z (·)) is used for the transformation from the hidden state to the ISV in Eq. (17c). The selection of the activation a h (·) requires particular attention due to the issue of second-order vanishing gradients [47]. For effective training via back-propagation, the gradient of the output derivative with respect to the trainable parameters requires non-zero second-order gradients of activation functions. As a result, the activation function SiLU(x) = x/(1 + e −x ) is selected for a h (·) due to its smoothness and non-zero second-order gradients, as shown in Fig. 5. Following the extraction of the ISV, a DNN is then used to predict the Helmholtz free energy given the strain, the temperature and the machine-learned where f DN N represents a DNN, as discussed in Section 3.1. The activation in the output layer is linear. The output Helmholtz free energy is then used to compute the stressσ, the dissipation rateD, and the entropyŜ according to Eq. (11), which implicitly enforces the first thermodynamics principle. The second thermodynamics principle, i.e.,D ≥ 0, is enforced in the loss function by constraining the network parameters, which will be discussed in the next subsection. The gradients of the output with respect to the inputs are obtained by automatic differentiation [70]. Since the output derivatives are involved in the loss function, SiLU is used for the activation of the hidden layers to avoid the issue of second-order vanishing gradients as discussed above.

Secondary Outputs
To balance feature contributions to the loss function and accelerate the training process, the training dataset is standardized to have zero mean and unit variance. For instance, a feature x of the dataset is standardized by its mean µ x and standard deviation std Hereafter, a bar symbol is used to denote standardized quantities. From Eq.
The direct calculation ofż n through Eq. (27) can be computationally intractable, especially when the dataset size, the internal state dimension, and the number RNN input steps are large. Alternatively, the rate of the ISVs can be approximated byż n ≈ ∆ẑ n /∆t, where ∆ẑ n =ẑ n −ẑ n−1 is the increment of the ISVs at the current step n. To obtain ∆z n , alternative TCRNNs are proposed, as shown in Fig. 6, where the last two steps of the RNN infer the ISVs,ẑ n−1 andẑ n , respectively. These TCRNN models enhance training efficiency.  Figure 6: Computational graphs of (a) a thermodynamically consistent recurrent neural network (TCRNN) for non-isothermal processes and (b) a TCRNN for isothermal processes. The increment of the machine-extracted internal state, ∆ẑn, is computed and used for calculating the dissipation rate.

Model Training
The loss function is expressed as where β i , i = 1, .., 3, are regularization parameters. || · || L1 denotes the L 1 norm, and β 3 is set to be zero if the data of the entropy is unavailable. The training consists of forward propagation and backward propagation. During the forward propagation, the machine-learned ISVs are implicitly embedded in the calculation of the predicted measurable quantities by following the thermodynamics principles. During the backward propagation, the errors of the measurable quantities are back-propagated to update the model's trainable parameters and refine machine-learned ISVs.
In some cases where the data of the dissipation rate D is unavailable, the non-negativity condition, i.e.,D ≥ 0, can be imposed instead, which is resulted from the thermodynamics second law, Eq. (4a). To this end, the rectified linear unit (ReLU) can be utilized, i.e., ReLU(x) = max(0, x) ≥ 0, which is positive only if x is positive. Hence, ReLU(−D) is positive only ifD is negative, which corresponds to violation of the non-negativity conditionD ≥ 0. Including ReLU(−D) to the loss function penalizes the violation of the non-negativity condition and enforcesD ≥ 0 to be satisfied, which imposes a constraint on the network parameters during training. The loss function becomes Similarly, if the data of the Helmholtz free energy F is unavailable, the nonnegativity condition, i.e.,F ≥ 0, can be imposed by adding ReLU(−F n ) to the loss function, In some cases where prior knowledge of certain ISVs are available, the TCRNN models can be trained in a hybrid mode by leveraging the existing ISVs and simultaneously inferring additional thermodynamically consistent ISVs that are essential to path-dependent behaviors. Considering z p n as the known ISVs andz p n as the corresponding standardized quantity, the loss function becomes where the last term enables the TCRNN model to learn the existing ISVs. For the TCRNN model to infer additional essential ISVs, the prescribed internal state dimension, |ẑ|, should be greater than the dimension of the existing ISVs, |z p |. Note that both existing and machine-learned ISVs are passed to the DNN to predict the Helmholtz free energy (Eq. 22) and the downstream calculations (Eqs. [26][27]. Apart from thermodynamics, the time (or self) consistency condition is critical for convergence of numerical approximation when ∆t → 0 [44].
To achieve the time consistency condition, the training set can be augmented by additional samples constituted by zero strain increment and zero stress increment at different material states (time steps), which enables the machine-learned material model to learn the time consistency condition from data. Alternatively, the self-consistency condition can be integrated into the RNN architecture by definition [62]. The optimal parameters of TCRNN are obtained by minimizing the loss functions (Eqs. (28)- (31)) using the open-source Pytorch library [71]. The Adam optimizer [72] is adopted for back-propagation training and the initial learning rate is set to be 0.001. Since the training dataset is standardized to balance feature contributions to the loss function, the training is less sensitive to the regularization parameters. Hence, the regularization parameters are set to be 1 unless further tuning is required to achieve better training performances.

Modeling Elasto-Plastic Materials
To demonstrate the accuracy, robustness, and generalization performances, the proposed TCRNN is applied to model a material with synthetic data generated by the one-dimensional elasto-plastic material with kinematic hardening. The Helmholtz free energy potential is expressed as where E = 100 GP a is the Young's modulus; H = 100 GP a is the kinematic hardening parameter; ε is the total strain; ε p is the plastic strain, which can be considered as a phenomenological ISV. The yield stress is k = 100 M P a. The stress and the dissipation rate can be obtained by Eq. (11b-c) as follows The dataset is generated by Eqs. (33)- (34), which contains five samples with the same stress-strain path (two loading-unloading cycles), as show in Fig.  7. The only difference in these samples is the strain increment, ranging from 3.75 × 10 −5 to 7.5 × 10 −5 . The data of the sample with a strain increment of 5.0 × 10 −5 is used to train the TCRNN. The remaining samples are in the testing set to evaluate the trained model.
To address the issue of error propagation in the test mode due to the teacher forcing training, as discussed in Section 3.2.2, and enhance model accuracy and robustness, stochasticity is introduced to the stress data so that the model learns the uncertainties of the input conditions resembling those in the test mode. Random perturbations are generated from a normal (Gaussian) distribution with a zero mean and a standard deviation of r × σ max , where σ max is the maximum stress in the data and r is a user-defined parameter to control the level of randomness. Fig. 7 shows the original stress-strain data in a black solid line and the randomly perturbed data in red circles. During supervised training, the noiseless stress is the ground truth and the input stress variable is no longer deterministic but rather stochastic, contributed by the random perturbations.
The TCRNN model based on the time rates of ISVs (ż), as shown in Fig.  4(b), is employed in this example. A GRU is used to infer the ISVẑ and describe its evolution by following the thermodynamics second law. The GRU consists of one hidden layer for all affine transformations in Eq. (15) with the model complexity represented by the dimension of the hidden state, |h|. Since the data of the free energy F and the dissipation rate D can be obtained by Eq. (34) in this example, the loss function in Eq. (28) is employed with β 1 = β 2 = 1 and β 3 = 0. A relative error used to measure the prediction accuracy is defined as follows where Σ andΣ contain the stress data and stress predictions at all time steps of a loading path, respectively. In the following subsections, the effects of the strain increments on model performance are first investigated. Since the machine-inferred ISVs are critical to the accuracy of path-dependent materials modeling, various factors that can affect the quality of the machine-inferred ISVs are investigated, including the number of RNN steps, the internal state dimension (|ẑ|), and the model complexity (|h|). Further, the generalization capabilities of the TCRNN model are examined.

Effects of Strain Increments
We first investigate how the strain increment affects the prediction accuracy of the TCRNN model. The model has a scalar ISV and a hidden state dimension of 30 (|h| = 30). Fig. 8 compares the predictions with data, where the case with a green color line is used for training and those with blue color lines are used for testing. The trained model achieves 1.1% relative error (Eq. (35)) on the stress prediction for the training case and 1.9% mean relative error for the testing cases. The mean relative error is obtained by averaging the relative errors (Eq. (35)) of all cases. The good agreement between predictions and data for all quantities, including the stress, the Helmholtz free energy, and the dissipation rate demonstrates that the model maintains high prediction accuracy and robustness as the strain increment varies. Fig. 9 shows that the machine-learned ISV is monotonically correlated with the phenomenological ISV, which demonstrates the capability of the TCRNN model in extracting mechanistically and thermodynamically consistent ISV essential to dissipative elasto-plastic material behaviors.

Effects of The Number of RNN Steps
In the second example, TCRNN models with a scalar ISV and various RNN steps (including history and current steps) are examined. Note that the model complexity is independent of the number of RNN steps due to parameter sharing across all RNN steps. For a relatively compact model with |h| = 20, Fig.  10(a) shows that the relative errors of training and testing samples decrease significantly when the number of RNN steps reaches 4, a critical number of RNN steps, which is expected since increasing the number of RNN steps enables the model to extract more accurate path-dependent features from longer-term stress-strain history. As the number RNN steps further increases, the relative errors of training and testing samples remain at a plateau, with around 0.4% and 1.6% relative errors, respectively. The plateau indicates that further increasing the number of RNN steps does not improve the quality of the machine-inferred ISVs and thus the model accuracy, which could be potentially limited by |ẑ| or |h|. For a more complex model with |h| = 30, Fig. 10(b) shows a similar convergence behavior but the critical number of RNN steps increases to 5. This shows that the number of RNN steps play an important role in model accuracy and performance. Unnecessarily increasing the model complexity may lead to an increase in the number of RNN steps for achieving the same level of accuracy.

Effects of Internal State Dimension
The internal state dimension (|ẑ|) has a direct impact on the quality of the machine-inferred ISVs and thus the model performance. If |ẑ| is too small, the TCRNN model cannot capture all important thermodynamically consistent path-dependent features even if the number of RNN steps and model complexity are sufficient. In this example, TCRNN models with 5 RNN steps, |h| = 20 and various internal state dimensions are examined. Fig. 11(a) shows that as the dimension of the ISV increases from 1 to 5, the relative errors of training and testing samples remain at a plateau, with around 0.5% and 1.5% relative errors, respectively. It indicates that a scalar ISV is sufficient for effectively capturing the path-dependent material behavior in this case. The convergence of the model performance against the internal state dimension is particularly important. In practice, the internal state dimension of path-dependent materials is often unknown a priori. The convergence property shows that the TCRNN model remains accurate and robust even if an excessive internal state dimension is prescribed. This convergence property also allows one to identify the optimal |ẑ| given measurable material states of path-dependent materials.

Effects of Model Complexity
The model complexity represented by the hidden state dimension is another important factor influencing the quality of the machine-inferred ISVs and model performance since the ISVs are inferred from the hidden states that directly capture the essential stress-strain path-dependent features. If the hidden state dimension is too small, important stress-strain path-dependent features could be lost, leading to inaccurate machine-inferred ISVs and poor model performance. In this example, the effects the TCRNN model complexity (hidden state dimension) are investigated. The TCRNN models examined have a scalar ISV, 5 RNN steps, various hidden state dimensions ranging from 5 to 100. Fig.  11(b) shows that as the hidden state dimension increases, the relative errors of training and testing samples decrease and then reach a plateau, with around 0.5% and 1.5% relative errors, respectively. This shows that a compact network is sufficient to achieve a satisfactory accuracy in this example and increasing the model complexity does not improve the model accuracy.

Model Generalization
In the following examples, three variables are considered to evaluate the generalization performances of the TCRNN model, including the loading strain per cycle, the unloading strain per cycle, and the number of (loading-unloading) cycles. The TCRNN model with 15 RNN steps, a scalar ISV, and |h| = 30 is employed.
In the first test, we consider a two-dimensional parameter space constituted by the loading strain per cycle and the unloading strain per cycle. The dataset contains 16 cases with the same number of loading-unloading cycles but with different loading and unloading strains per cycle. Fig. 12 shows the comparison between the predicted stress and the data, where case 1, 4, 9, and 16, located at the corners in the figure, are used for training with the data marked with the green solid lines, and the remaining cases are used for testing with the data marked with the blue solid lines. From top to bottom, the loading strain per cycle increases from 10 −2 to 1.4 × 10 −2 . From left to right, the unloading strain per cycle increases from 5 × 10 −3 to 5.5 × 10 −3 . The mean relative errors of the training and testing cases are 3.1% and 2.8%, respectively. The good agreement between the data and the predictions demonstrates that the trained TCRNN model can successfully predict the testing cases within the prescribed parameter space.
In the second test, we consider a two-dimensional parameter space constituted by the number of loading-unloading cycles and the loading strain per cycle. The dataset contains 16 cases with the same unloading strain per cycle, 5.5 × 10 −3 . Fig. 13 shows the comparison between the predicted stress and the data, where case 1, 4, 9, and 16, located at the corners in the figure, are used for training with the data marked with the green solid lines, and the remaining cases are used for testing with the data marked with the blue solid lines. From top to bottom, the number of loading-unloading cycles increases from 1 to 4. From left to right, the loading strain per cycle increases from 10 −2 to 1.4×10 −2 . The mean relative errors of the training and testing cases are 2.3% and 3.4%, respectively. The good agreement between the data and the predictions further demonstrates the strong generalization ability of the TCRNN constitutive model.

Modeling Soil under Cyclic Shear Loading
The effectiveness of the proposed TCRNN constitutive model is further evaluated by modeling undrained soil under cyclic shear loading [73,74]. The experimental data is collected from the undrained soil samples under initial triaxial confinement of 40kP a and cyclic shear loading. A cyclic stress ratio (CSR) is defined as the ratio of the maximum shear stress to the initial vertical stress. The experimental data contains the shear strain, the vertical strain, the shear stress, and the vertical stress. Fig. 14 shows the experimental data with a CSR of 0.15, 0.16, and 0.17. The stress-strain relationships are highly nonlinear and path-dependent due to coupling effects of changes in volume, matric suction, degree of saturation, effective stress, shear modulus, etc. [75]. Modeling such path-dependent material behaviors by a phenomenological approach is challenging and complicated, which often relies on certain phenomenological ISVs.  Given only the stress-strain data, data-driven models and phenomenological models that require pre-defined ISVs cannot be applied. In contrast, the proposed TCRNN model can be effectively applied since it only requires measurable material states, and the model is capable of inferring essential ISVs from the measurable material states by following the thermodynamics principles.
The TCRNN based on the increment of ISVs (∆ẑ), as shown in Fig. 6(b), is employed in this example. A GRU is used to infer the ISVẑ in Eq. (21) and describe its evolution by following the thermodynamics second law in Eq.
(2). The GRU consists of one hidden layer for all affine transformations in Eq. (15) with the model complexity represented by the hidden state dimension, |h|. Since the training data contains only stresses and strains, the loss function in Eq. (30) is employed with β 1 = β 2 = 1. The experimental data with a CSR of 0.15 and 0.17 are used for training, while the experimental data with a CSR of 0.16 is used for testing. The effects of the number of RNN steps, the internal state dimension, and the model complexity on the model performance are investigated.
Given TCRNN models with an internal state dimension (|ẑ|) of 2 and a hidden state dimension (|h|) of 30, the number of RNN steps is varied from 5 to 60 and its influences on the model prediction accuracy are shown in Fig. 15(a) As the number of RNN steps increases, the relative errors of training and testing samples decrease and eventually converge to a plateau, with values around 3% and 11%, respectively. The plateau indicates that further increasing the number of RNN steps does not improve the model accuracy.
The internal state dimension (|ẑ|) required to effectively model the pathdependent material behaviors is unknown a priori, which depends on the complexity of the path-dependent behaviors. Here, we investigate the effects of |ẑ| on model prediction accuracy, which is varied from 1 to 10, while the number of RNN steps and |h| are fixed as 40 and 30, respectively. Fig. 15(b) shows that the relative errors of training and testing samples are large when the machineinferred ISV is a scalar, indicating that a scalar ISV is insufficient to capture all essential path-dependent features. As |ẑ| increases, the relative errors of training and testing samples decrease and then reach a plateau, with values around 2.7% and 12%, respectively, which shows that the TCRNN model remains accurate and robust even if an excessive |ẑ| is prescribed. The convergence behavior also allows one to identify the optimal |ẑ|, around 2 in this example, given the measurable material states of path-dependent materials.
Lastly, |ẑ| and the number of RNN steps are fixed as 5 and 40, respectively, while |h| is varied from 5 to 100 to investigate the effects of model complexity (|h|) on model performance. Fig. 15(c) shows that the relative errors of training and testing samples decrease as |h| increases and eventually reach a plateau, with values around 2.8% and 14%, respectively. The plateaus in the convergence curves indicate that further increasing the model complexity does not improve the model accuracy. Fig. 16 compares shear stress experimental data with the predictions of the trained TCRNN model that employs 40 RNN steps, |ẑ| = 2, and |h| = 30. The relative errors of training and testing samples are around 3.2% and 9.4%, respectively. It shows that the TCRNN model is able to learn the path-dependent material behaviors from the measurable material states under given loading conditions and effectively predicts the path-dependent responses under untrained loading conditions, further demonstrating the generalization ability and effectiveness of the TCRNN model in practical applications. Further, the trained TCRNN material model is thermodynamically consistent, which is verified by the non-negative predicted free energy and the predicted dissipation rates that satisfy the thermodynamics second law, as shown in the second and the third rows of Fig. 16, respectively. The histories of the machine-learned ISVs are shown in the last row of Fig. 16, revealing interesting path-dependent patterns similar to the behaviors of the predicted free energy and dissipation rate.

Conclusions
In this study, we introduced a machine-learned internal state variable (ISV) approach for data-driven modeling of path-dependent materials, which is ther-  modynamically consistent and relies purely on the measurable material states. The proposed TCRNN constitutive models consist of two main components: an RNN that infers ISVs (Eq. (21)) and describes their evolution by following the thermodynamics second law (Eq. (2)), and a DNN that predicts the Helmholtz free energy (Eq. (22)) given strain, ISVs, and temperature (for non-isothermal processes). Two TCRNN constitutive models are developed, one based on the time rates of ISVs (ż), as shown in Fig. 4, and the other one based on the increments of ISVs (∆ẑ), as shown in Fig. 6. The latter model shows an enhanced efficiency as it utilizes an approximation ofż for the calculation of dissipation rate and avoids time-consuming differentiation of the RNN outputs with respect to all RNN inputs. Model robustness and accuracy is enhanced by introducing stochasticity to the training data to account for uncertainties of input conditions in the testing.
In the demonstration of modeling elasto-plastic materials, the parametric study shows that the model accuracy converges as the number of RNN steps, the internal state dimension, and the model complexity increase. All these factors play an important role in the model performance. Given path-dependent material behaviors, there exists an optimal internal state dimension to capture the essential path-dependent features by the TCRNN model. It has been shown that the TCRNN model remains accurate and robust even if an excessive internal state dimension is prescribed. The monotonic correlation between the machine-inferred and the phenomenological ISV of the elasto-plastic material demonstrates that the TCRNN constitutive model can infer mechanistically and thermodynamically consistent ISVs. The proposed TCRNN constitutive model is shown to remain robust against various strain increments and have strong generalization capabilities.
The effectiveness of the proposed TCRNN constitutive model is further demonstrated by modeling undrained soil under cyclic shear loading using experimental data, where only measurable material states (stresses and strains) are available. A similar convergence behaviors of the model accuracy are observed from a parametric study of the number of RNN steps, the internal state dimension, and the model complexity. The generalization capability of the TCRNN constitutive model is demonstrated by the effective prediction of the thermodynamically consistent response of undrained soil under the loading conditions different from the ones used in training, which reveals the promising potential of the proposed method to model complex path-dependent materials behaviors in real applications.
The proposed TCRNN constitutive model is general and applicable to model a wide range of path-dependent materials. It is efficient and can be applied to accelerate large-scale multi-scale simulations with complex microstructures and path-dependent material systems. To investigate reliability of model predictions, a future extension would be to integrate uncertainty quantification [76] into the proposed TCRNN model.