Neural integration for constitutive equations using small data

Data-driven models based on deep learning algorithms intend to overcome the limitations of traditional constitutive modelling by directly learning from data. However, the need for extensive data that collate the full state of the material is hindered by traditional experimental observations, which typically provide only small data - sparse and partial material state observations. To address this issue, we develop a novel deep learning algorithm referred to as Neural Integration for Constitutive Equations to discover constitutive models at the material point level from scarce and incomplete observations. It builds upon the solution of the initial value problem describing the time evolution of the material state, unlike the majority of data-driven approaches for constitutive modelling that require large data of increments of state variables. Numerical benchmarks demonstrate that the method can learn accurate, consistent, and robust constitutive models from incomplete, sparse, and noisy data collecting simple conventional experimental protocols.


INTRODUCTION
Increase in computational power and large availability of data has recently brought to a transformative focus in mechanics towards data-driven models.Mainly based on deep learning algorithms, these approaches represent a promising set of tools for addressing open challenges in solid mechanics and material science (see, without being exhaustive, [15,24,56,34,55,12,5,39,25,47]).Within this framework, data-driven approaches have increasingly been explored to overcome the limitations associated with the traditional constitutive modelling paradigm that builds upon the development of material models, which tend to follow iterative calibrations and ongoing adjustments of evolution equations of state variables [4,36].This process usually relies on only a few experimental observations for any new given material.Instead, data-driven approaches aim to learn directly from observations, thus overcoming the necessity for subjective human's trial-and-error calibration (for a review, we refer to, [40,6]).
Further motivated by the need of developing models that respect established physical principles, recent developments have given rise to a wealth of methods we now refer to as physics-informed machine learning (see [20]).This new class of data-driven approaches enables to leverage prior knowledge stemming from physics, such as thermodynamics, by devising appropriate architectures that enforce constraints, resulting in enhanced generalisation and extrapolation capabilities (see e.g.[33]).
Nonetheless, at present, only a few approaches can directly learn material models from the actual set of data experimental observations tend to provide (for instance [45,12,50,54]).The main reason stems from the fact that data-driven methods generally demand big data for learning.Here, the big refers to the set of information that collects extensive observations of the full state of the system (density, stress, temperature, etc.) as well as the numerical details of the loading protocol.These are necessary ingredients for the vast majority of current 'incremental formulations' that tend to require full experimental data about the increments of all the state variables (for more detail, we refer to Section 2).Despite the ongoing development of experimental techniques that aim at better characterising materials (see, for instance, [46]), those techniques can still not thoroughly and fully capture the complete state of materials at any loading scenario, especially complex ones.We are thus confronted with the big problem of small data, where small refers here to a few partial (incomplete) sets of measurements (see also [54]. To overcome the aforementioned difficulties, the conventional strategy (among others, cf.[54]) consists of performing in silico experiments where all necessary information can be generated through computer simulations of idealised materials (see e.g.[26,55,49,32,27]).However, the underlying drawback of this approach dwells from the need of knowing both the physics and the geometries at the fine material scale (microstructure), which is practically impossible for complex, opaque materials and boundaries, especially during dynamic loading.Recently, a new class of approaches arose to address the lack of knowledge of the full state of the materials (referred to as EUCLID, [11,12]).The method resorts to the automatic identification of constitutive models based on iterative solutions of the boundary value problem describing the deformation of the specimen, during an experiment.Whilst this approach enables to discover constitutive equations from direct measurement data, it still requires high-resolution information of internal full-field displacement data, which may not be readily available, especially for opaque materials with complex microstructures, such geo-and bio-materials.
The key point of the current contribution is the development of a novel deep learning method for the discovery of constitutive equations, at the material point level, from small data -that is, from partial and sparse observations of the material state.A core idea is in resorting to established practices in constitutive modelling.Traditionally, the evaluation of a constitutive model against experimental evidence follows the solution of an initial value problem: after specification of the initial values for the state variables, one integrates in time the evolution equations, computes the corresponding stress, and compares with experimental observations, relying solely on available stress-strain data pairs.This work embraces part of this long-standing practice as it proposes to discover constitutive models based on the solution of the initial value problem associated with the evolution equations.Thus, there is no need of computing incremental values or rates of state variables, which are not directly available in experiments, or knowing the full material state at each time (as opposed to incremental formulations).
To achieve this, we leverage the framework provided by neural differentials equations [3].Neural differential equations are a deep learning algorithm that blends artificial neural networks with time integration and enable learning time-continuous dynamics in the form of a system of differential equations solely from discrete-time observations.The framework constitutes the basis of our new approach, to which we refer in the following as the Neural Integration for Constitutive Equations (NICE) method, see Figure 1.Experimental tests (a) gather the stress-strain (σ versus ε) response, as well as initial and final material configurations, represented here by an incomplete set of sampled state variable data (X (0) and X (T ) ).From these measurement data, the system learns the evolution equations (b) through a neural differential operator f θ with parameters θ.The continuous-time and integral form of f θ then computes the state space trajectory X (t) θ for the stress evaluation (d), using a second network parametrised by ω that represents the specific internal energy uω, along with constitutive constraints derived from thermodynamic principles (c).
Neural integration of constitutive equations replaces the classical incremental data-fitting approach with a time integral formulation, relying on the computation of gradients with respect to network parameters using the adjoint sensitivity method [41,3].The learning process thus resorts to the minimisation of the error of predicted trajectories with respect to sparsely sampled measurement data.In addition, our approach leverages previous developments of the Thermodynamics-based Artificial Neural Networks (TANN, see [34,32]), where the material is solely defined by a set of state variables (following the seminal work of [4]).Here we extend the theoretical formalism to account for the dependence of the material behaviour on the density, which allows us to properly model porous materials.The thermodynamics, on one hand, and the integral formulation, on the other hand, make the newly proposed method not only capable to identify consistent material representations, but also learning from very scarce and incomplete data.
The performance and accuracy of the new method are investigated using numerical benchmarks, as these allow a thorough probing of the predictions by direct comparison with a reference constitutive model.With the ultimate goal of applying the proposed approach in real scenarios, we focus our attention on the way the training data sets are generated.To this end, we simulate numerical experiments and collect data by respecting conventional protocols, marked by the application of monotonic loading and subsequent unloading until the specimen reaches an unstressed state, rather than including cyclic or more exotic loading paths, as these are rarely probed in real experiments.
The approach is tested by considering two emblematic benchmarks.The first one involves an elasto-plastic material and represents the proof of principle in learning from partial data -that is, without using as an input elastic strain data.Thanks to the integral formulation and thermodynamics, the neural model identifies appropriate initial guesses for the values of the elastic strain which are then part of the whole learning process.Their evolution is predicted by only fitting the time evolution of the stress.
The second benchmark consists instead of a porous, elasto-plastic material, described by three state variables: density, elastic strain, and solid fraction.While the evolution of density is directly prescribed by the balance of mass, the evolution of the solid fraction is of completely different nature with respect to the elastic strain.This second example showcases the ability of the method to learn from sparse and limited data.In particular, we consider both a dense and a sparse time acquisition of the state space, and assess the predictions at inference for unobserved loading paths and protocols.An overall good agreement is found, even in the case when only two samples of the solid fraction are available: at the beginning and at the end of each loading path.
With the aim of evaluating the influence of corrupted measurement data, we further consider synthetic noise in the values of all quantities of interest (in contrast, for instance, with [34]).The approach demonstrates high level of robustness to noise in the training data.
The paper is structured as follows.Section 2 outlines a few background elements upon which our approach is built.Section 3 presents the methodology of the neural integration for constitutive equations.Here, we focus attention on both the practical implementation and the scalability of the methodology in dealing with scarce, limited data, sampled at irregular times.Section 4 showcases the accuracy, performance, and robustness of the proposed methodology by focusing on two rather simple applications, yet with powerful implications for future extensions to experimental observations.In parallel, we present in Appendix a detailed comparative analysis between the time integral formulation and the conventional incremental (or rate) formulation.Although the latter is rooted in the same thermodynamic framework, it exhibits increased sensitivity to noise and is incapable of discovering constitutive models when faced with incomplete or sparse data sets.Finally, Section 5 discusses future developments that might allow the field to address the inherent difficulties associated with direct learning from real experimental data.
The following notation is used throughout the manuscript: is the identity tensor, tr(•) represents the trace of a second-order tensor, and div v = ∂vi ∂xi , with x i the spatial coordinates, v j a vectorial quantity, and i, j = 1, 2, 3. Einstein's summation is implied for repeated indices.For the ease of reading, we distinguish functions from their values with a superposed caret, whenever a marked difference is important: for instance, u is a function and u its value.

THEORETICAL BACKGROUND
We start by introducing the theoretical background behind the core of the new method: (1) the commonly used structure of data-driven neural networks in the field of constitutive modelling and the corresponding caveats, (2) the mathematical framework that enables overcoming them, and (3) the constitutive restrictions stemming from thermodynamics.

The common structure: incremental formulation
The vast majority of data-driven deep learning approaches for constitutive modelling builds upon an 'incremental formulation' of the evolution equations.Many architectures and structures have been deployed with this purpose, and mainly feed-forward, residual, and recurrent neural networks (e.g.[15,24,56,34,53,49,50,32]).
For compactness of notation, we avoid specifying the state space of the material at this point (further introduced in subsection 2.3) and consider a hypothetical time-evolving system whose state is denoted by y.The aforementioned approaches describe the evolution of the state according to the incremental formulation where y (n) is the value of the state at discrete times {t n } N n=1 ; h θ is a generic neural operator with parameters θ that establishes the evolution of the state; and q represents the set of additional control variables the state depends upon, such as time or the values of the state at preceding times.Depending on the specified control variables, expression (1) can represent different neural network architectures.Note that the extension of (1) to a incremental formulation is also possible relying on finite-difference approximations of the rates of y (see for instance [32]).
Due to the dependence on the time interval discretisation, training the operator h θ requires dense, timeparametrised data that describe the full state.This is especially true when dealing with states that vary rapidly in time, due to the incapacity of modelling the instantaneous rate of change in y.Additional limitations lie on the need of having regularly spaced intervals (see [3]) and on the impossibility of operating with incomplete measurement data.When only partial knowledge of the state y (n−1) is available, it is impossible to infer the next state y (n) .
Reformulating in the context of evolution equations, the incremental form (1) cannot yield accurate representations of evolving state variables in the experimental setting of complex materials, see Figure 1.

Neural differential equations: integral formulation
In contrast with the aforementioned class of incremental formulations, neural differential equations (see [52,16,3], among others) are deep learning algorithms allowing the parametrisation of continuous-time dynamics of the state y(t) formulated as an ordinary differential equation in the form ẏ = dy dt (t) = f θ (y(t), q) , y(0 where q are the control variables and f θ is an evolution equation parametrised by a feed-forward neural network, where θ collects the weights and the bias terms and under the premises that the operator f θ is integrable. From the knowledge of the initial condition y (0) , the state at any time t is obtained by solving the initial value problem (2) with a differential equation solver The parameters θ are determined by means of the gradient descent algorithm [14], using the adjoint sensitivity method [41].This passes through the minimization of the error, at discrete-time points t n with n = 1, . . ., N and t N ≡ T , between the neural network predictions, y θ (t n ), and the set of measurement data, y (n) , using e.g., the mean squared error, The formulation (2-3) can be further extended to model a wide class of differential equations, and not only ordinary differential equations (see e.g.[7,35,22]).
Note that the idea of employing neural differential equations for constitutive modelling has not been properly addressed so far (a few formulations are presented in [19,44,27], however these works do not address the problem of small, incomplete measurement data which is the motivation of the present work).

Thermodynamics and constitutive restrictions
We consider a porous, single component material, undergoing infinitesimal strains and isothermal processes.The state of the material is assumed to be completely defined by a set of p + 2 state variables X = {ρ, ε e , z 1 , . . .z p }, where ρ is the volumetric mass density, ε e is the elastic strain, and z i , with i = 1, . . ., p, are (additional) dissipative state variables describing irreversible phenomena (see [10]), such as damage and dislocation density.For compactness of notation, we denote the set of dissipative state variables as z = {z 1 , . . .z p }.
Alternatively, some often consider the total strain ε, rather than the elastic one.However, here we disregard this second path as a classification based on the total strain renders the state space itself dependent of the (arbitrarily chosen) referential configuration (see [43]).
By virtue of the classification {ρ, ε e , z} and the axiom of equipresence [48], the specific (per unit volume) internal energy is specified as where the superposed caret in u serves to distinguish the internal energy function from its values.Constitutive equations must comply with the energy and entropy balance (i.e.first and second law of thermodynamics).These principles are here formulated in terms of the dissipation inequality (Clausius-Duhem inequality, [4]), namely where d is the specific mechanical dissipation rate, σ is the stress tensor, ε and ε are the infinitesimal strain and strain rate tensors.A more general formulation could follow Landau's hydrodynamic principles, which could allow one to consider other scales of temperature and viscous stress (see [23,18,10]), but this goes beyond the focus of the current paper.
Further assuming the internal energy to be differentiable with respect to the state variables, the time derivative in inequality (6) We further define the plastic strain rate εp , such that εp ≡ ε − εe .
From the mass balance, i.e., ρ + ρ div v = 0, with the spatial velocity given by v = dx dt , we obtain the evolution equation of the volumetric mass density, ρ ρ = 1 : ε, ( where we have adopted the usual sign convention in soil mechanics that consider positive deformation in compression, and used 1 as the identity tensor.Finally, substitution of the time derivative (7) and expressions (8)(9) in the Clausius-Duhem inequality (6) yields Considering the arbitrariness of εe and ż and that the dissipation rate does not depend on εe brings to the following constitutive restrictions where σ e is the elastic stress, conjugate to ε e ; µ is the chemical potential, conjugate to ρ; τ are the thermodynamic forces of the dissipative mechanisms, conjugate to z; and p T is the thermodynamic pressure (see [9]).Constitutive equations ( 11) are accompanied by the evolution equations of the state variables that, in the most general form, read which has to fulfil the dissipation inequality ( 6), while f ρ = ρ(1 : ε) is given by Eq. ( 9).

NEURAL INTEGRATION FOR CONSTITUTIVE EQUATIONS
Building upon the integral formulation provided by neural differential equations (subsection 2.2) and relying on the thermodynamic structure (subsection 2.3), we devise a novel deep learning framework for the discovery of constitutive equations in small data regimes.The appeal of the approach is that we require neither (1) a dense time sampling of the state space X nor (2) a complete measurement of the data.Furthermore, the approach is applicable even if each variable of interest (state X and stress σ) is acquired at irregular sampling times, in contrast with the common incremental formulation (subsection 2.1).
To achieve this goal, we aim at identifying the evolution equations, f (X , ε), and the internal energy function, u (X ), in order to obtain thermodynamic admissible material models from data, cf. Figure 1.In doing so, we replace the unknown functions with two neural operators, namely f θ (X , ε) and u ω (X ), with parameters, respectively, θ and ω.Thus, we rewrite the constitutive equations, according to Eq. ( 11), as where Equation (13a) corresponds to the solution of the initial value problem related to the evolution equations, where X (0) = X (0) is the initial condition.The neural operator f θ is treated as a neural differential equation, using the explicit midpoint method with fixed time step h and global error of order O h 2 .Note that we focus here on a (total) strain control loading protocol, where the strain rates are provided at any given time.The primary goal in this case is to calculate the stress and state variables at each time point.In the evolution equation network, the strain rate thus acts as a control variable q ← ε, cf.Eq. (2).Equations (13b-13c) originate instead from the developments presented in subsection 2.3, where u has been replaced by its neural network approximation u ω .

Initial conditions
To predict the time evolution of the state variables, initial conditions X (0) = {ρ, ε e , z} (0) are required.However, in contrast with the density and the dissipative state variables, the initial conditions for the elastic strain cannot be identified without either (i ) prescribing a priori initial values (e.g.zero) or (ii ) assuming a particular form of the internal energy and determining the material elastic moduli (using pressure and shear wave velocity measurements, for instance).Yet, these two paths carry intrinsic assumptions on the material behaviour and may prevent the identification of (thermodynamically admissible) constitutive equations that best fit the measurement data.
With the aim of developing a scalable approach able to deal with the broadest range of materials and initial conditions, we propose to identify the initial conditions for the elastic strain from the knowledge of the initial values of the remaining state variables {ρ (0) , z (0) } and the stress σ (0) , by solving the following nonlinear equation find ε e(0) such that r (0) where σ θω (0) is the thermodynamic expression of the material stress (13b), evaluated at time t = 0, and σ (0) is the corresponding measured value.For the solution of the nonlinear problem (14), we treat ε e(0) as a learnable hyper-parameter, which, initialised at zero, is iteratively updated using gradient descent by back-propagating the residual r (0) ε e (see subsection 3.2).For the sake of clarity, it is worth noticing that the system of nonlinear equations in problem ( 14) is underdetermined, due to the countless internal energy functions u ω for which the residual r (0) ε e is zero, i.e., the number of parameters ω is much larger than the number of components of the stress tensor.However, the proposed technique allows for the identification of only those values of the elastic strain that are thermodynamically admissible, i.e., in agreement with the Clausius-Duhem inequality, without the need of introducing additional biases.

Learning process
Given the small data regime and in the absence of measurements of the elastic strain, the learning process of the neural integral constitutive equations, presented in Algorithm 1, consists of the minimisation of the following loss function where L σ and L z are the supervised losses related to the predictions of the stress and dissipative state variables, respectively; L d is an unsupervised loss for the fulfilment of the dissipation inequality (13c); r ε e is the residual for the fulfilment of the initial conditions of the elastic strain, given by Eq. ( 14); ∥ • ∥ 2 represents the ℓ 2 norm; and λ is a weighting parameter controlling the importance of the ℓ 2 regularization (weight decay) of the parameters θ and ω.Weight decay is here considered to penalise complexity of the neural networks and reduce eventual overfitting.We write Here, {t n , σ (n) , ε (n) } N n=1 denote the N sampling times of the stress measurements and protocol (control); {t m , z (m) } M m=1 specify the M sampling time of the acquisition of the state variables z; and ⟨ x ⟩ ≡ (|x| + x) /2 are the Macaulay brackets.Note that the quantities involved in Eqs.(15)(16)(17)(18) are all dimensionless and normalised in the range [−1, 1], except for the time scale which is normalised in the range [0, 1].For more detail, we refer to the repository accompanying this manuscript [30].
The distinction between N and M allows one to operate with arbitrary sampling frequencies of the measurement data.Note that M ≪ N in small data regimes, and that t N ≡ t M ≡ T , where T represents the last time point, t ∈ [0, T ].The error related to the dissipation inequality is evaluated at the same sampling times of the stress measurements, t n .
We use early stopping criterion based on the loss (15) associated with the validation data set to stop the training process and avoid overfitting (cf.Section 4).
It is also worth mentioning that, even in the absence of the entire time evolution of the elastic strain, only the initial values of the latter have to be identified.Indeed, the time integral formulation of the evolution equations allows learning the trajectory of ε e by minimisation of the error related to the material stress.This is an important property of the approach proposed as it does not only enables learning from limited (sparse) data but also in conditions where only partial information about the state of the material is available.
Finally, notice that the loss function (15) does not include errors related to the prediction of the values of the specific internal energy and the dissipation rate (in contrast, for instance, with [32]), as the latter are not necessary conditions for the fulfilment of the laws of thermodynamics and the accurate prediction of the material response.
Algorithm 1: Pseudocode of the neural integration for constitutive equations.

NUMERICAL BENCHMARKS
The accuracy and capability of the developed approach are evaluated using synthetic data generated using analytical constitutive models.Although testing with real experimental data is the ultimate goal, using numerically generated data offers the benefit of having exact knowledge of the ground truth, i.e., the constitutive model that underlies the data.
To mimic idealised experiments, we generate numerical data sets for training, validation, and testing using standard protocols in experimental mechanics, which are characterised by monotonous loading of the specimen and unloading to an unstressed configuration.In so doing, we disregard cyclic loading paths and other (more) exotic protocols in the generation of data for training the model, as those are rarely carried out in reality.For each protocol, we sample the stress values at 1/N time points and the dissipative state variables at 1/M time points (13).In addition, we assess the robustness of the approach with respect to errors that may be present in the data used in the training process.Such errors are simulated by generating synthetic noise, sampled from a normal distribution and added to the ground-truth data sets.
We select two benchmarks that highlight the challenges in learning constitutive representations in small data regimes: a one-state variable model (ε e ) and a three-state variables model (ρ, ε e , z).
In all cases, the entire data set is split into training, validation, and test sets, following approximately the following proportions: 60-20-20%.The new algorithm is implemented using the deep learning library PyTorch [38] and the package torchdiffeq [2] for the time integration of the evolution equations and the back-propagation framework to determine gradients.The time step of the explicit midpoint method is selected as h = T 800 .For the sake of conciseness, the comparison of the proposed approach with the common incremental formulation is discussed in Appendix A.

Elasto-plastic media
The first example concerns an incrementally nonlinear elasto-plastic, non-associative, Drucker-Prager material, with constant density (cf.Sect.8 in [8]).The specific internal energy is defined as where ε e v = 1 : ε e is the volumetric invariant of the strain tensor and ε e s = 2 /3 ε e′ : ε e′ is the invariant of the deviatoric strain tensor, where ε e′ = ε e − ε e v 3 1.We select K = 70 GPa, G = 60 GPa, friction coefficient equal to one, seismic parameter equal to one, and dilatancy parameter equal to one half (cf.[8]).
Note that the entire material response is described by a single state variable (ε e ), thus the loss term L z is neglected.Despite the simplicity of such material model, this example serves as a proof of principle of our approach in retrieving evolution equations and material response from partial measurement data, with and without synthetic noise.

Data generation
We generate data by applying single loading-unloading paths, with constant strain rate, and integrating in time the evolution equation for the elastic strain rate (using Eq. 8.15 in [8]) to compute the evolution of stress according to the energy function (19).The material stress is here identified in terms of its volumetric and deviatoric parts: p = 1 : σ 3 and q = 3 /2 σ ′ : σ ′ , with σ ′ = σ − p1.From an initial unstressed configuration, the material is first subjected to isotropic compression to generate states at different confining pressures, p ∈ [200, 2200] kPa.This then represents the initial configuration for subsequent numerical experiments composed of the following loading paths: isotropic compression/extension, undrained compression, and drained triaxial compression/extension (see Figure 2a).Each path is composed of N = 40 measurements of the stress values.
The neural operator f θ is composed of three hidden layers with 36 nodes (Gaussian error linear unit activations, GELU ( ) = 1.7 2 /(1 + e −1.7 )) and one linear output layer, while u ω consists of two hidden layers with 64 units (softplus activations s + ( ) = log(1 + e )) and one linear output layer.The training is performed for the entire batch of training data, using Adam optimiser, with adaptive learning rate, with upper and lower bounds equal to 10 -2 and 10 -4 , respectively.The generated data sets involve drained triaxial ( ṗ = ± q/3), undrained triaxial ( εv = 0), and isotropic compression ( εs = 0), where the narrow line represents critical state.

Noise-free measurement data
At first, we consider data free of any noise.Figure 2b displays the evolution of the training and validation losses during the learning process.Convergence is reached after approximately 20'000 epochs with a mean absolute percentage error in the stress prediction for the test set equal to 0.9%.The model is then deployed to infer the stress-strain behaviour for new, unobserved loading protocols and paths.Figures 3 and 4 compare the predictions (denoted with the label NICE) with the ground-truth results (denoted by ref) for cyclical undrained and drained triaxial loading paths, respectively.Notice that the neural model enables the prediction of consecutive unloading and reloading sequences, despite the learning process concerned only simple monotonous protocols.Despite some minor differences, the approach yields accurate predictions with respect to the reference values, independently of the particular loading protocol.The neural integration for constitutive equations additionally allows modelling unobserved loading pathsthat is, predictions under strain configurations that do not belong to the training and validation sets.This is presented in Figure 5, where the predictions are shown for a drained triaxial path at constant pressure, i.e., ṗ = 0, and prescribed axial strain rate, and unseen cyclical loading protocol.Once more, the accuracy of the predictions is independent of the frequency and number of the unloading-reloading cycles.

Influence of noise in the training and validation data sets
To mimic noise in an hypothetical data acquisition system, the original training and validation data sets are corrupted with synthetic noise according to where x represents the ground-truth values of a generic quantity in terms of the stress and the state variables; x * are the values corrupted with noise; N (0, δ) is a normal distribution with zero mean and standard deviation equal to δ; µ x is the mean value of x; and δ is the noise percentage.Figure 6 displays the influence of the amplitude of the synthetic noise on the data used for training in terms of the stress-strain response for one of the loading-unloading paths.The architectures of the neural network and the training strategy are kept the same as in the previous scenario.Figure 7 displays the evolution of the training loss L σ for the different noise levels.Notice that the final value of the loss depends on the amount of synthetic noise suggesting that the network does not overfit the corrupted data (cf.[34]).
Given the noise, the predictions by the proposed neural framework are tested for unseen loading conditions.Figure 8a presents the mean absolute percentage error of the stress predictions for the un-noisy test set for different levels of synthetic noise (δ). Figure 8b shows the predictions of the network trained on corrupted data sets with those obtained without synthetic noise and the ground-truth solutions, for unobserved loading-unloading cycles.
The newly proposed approach enables high degrees of robustness with respect to noisy data.In particular, the error of the predictions for the test set is always smaller than the mean value of the synthetic noise added to the training data (8% versus 20%).This is thanks, in part to the hardwiring of the existence of the energy potential and entropy balance in the formulation (as demonstrated in [34]) but primarily thanks to the integral formulation.Indeed, the latter is intrinsically robust to high noise amplitudes (cf.gated integration, [51]), especially when compared with the conventional incremental formulation (see Appendix A).

Elasto-plastic porous media
The second example concerns elasto-plastic porous media (see [42], for more).The reference constitutive model is built on three state variables: elastic strain (ε e ), bulk density (ρ), and solid fraction ϕ = ρ/ρ s , where ρ s is the solid density, defined as the ratio of the mass of the solid phase over its volume.The same model can additionally be extended to brittle materials by accounting for the dependence on the granular temperature, although this is neglected in the present work (by selecting c = 0, Eq. 19 in [42]).Following this model, the specific internal energy is defined as where ρ * s is the unstressed solid density, which is a material parameter.Despite the aforementioned internal energy function does not depend the solid fraction, the latter is responsible for density hardening of the bounding surface.The evaluation is based on the following model parameters: bulk modulus K = 10 MPa and shear modulus G = 6 MPa, the slope of the critical state line M = 1.5, the dimensionless parameter that determines the effective isotropic yield pressure β * = 0.1, and the unstressed solid density ρ * s = 600 kg/m 3 .For more, we refer to [42].
In deploying the proposed approach, the internal energy network is considered with the classification provided by Eq. ( 5).Unlike the previous case, here the model specifically includes z ≡ ϕ by virtue of the axiom of equipresence (see [48]).In contrast with the first benchmark, we deploy the complete formulation of the neural integral constitutive equations (presented in Section 3), using the loss function given by Eq. ( 15), and showcase the capabilities of the neural differential approach in presence of scarce and incomplete data.4.2.1.Data generation Data are generated by applying single loading-unloading strain-loading paths, with constant strain rate, as for the previous benchmark.In order to characterise the influence of the solid fraction, four specimens are considered with ϕ init = 0.5, 0.6, 0.7, 0.8 at the initial unstressed configuration.This is done to mimic a hypothetical experimental acquisition where the state variables of only a finite and reduced number of configurations can be actually investigated.From the unstressed configuration, we apply isotropic compression to generate states at different confining pressures (see Figure 9a), p ∈ [1000, 8500] kPa, which are followed by numerical experiments composed of undrained and drained triaxial compression and isotropic extension (see Figure 9b).
Each path is composed of N = 140 measurements of the stress.To probe the ability of our approach in learning from small data, two acquisition strategies are considered for the dissipative state variable values: M = N and M = 2.The former represents a dense time acquisition of the material state, while the latter represents the frequent scenario depicted in Figure 1, where only the initial and final values of the material state are known thanks to measurements.Note that in both scenarios, as in reality, the observations of the elastic strain are unavailable.
The neural operator f θ is composed of three hidden layers with 42 nodes (Gaussian error linear unit activations) and one linear output layer, while u ω consists of two hidden layers with 64 nodes (softplus activations) and a linear output layer.The same training procedure detailed in subsection 4.1 is followed.

Noise-free measurement data
First consider the idealised case of noise-free data sets for the training and validation of the deep learning algorithm.In particular, it is instructive to compare the accuracy and performance of the proposed formulation depending on the sampling frequency of the dissipative state variable, ϕ.Thus, after training both configurations (i.e., M = N and M = 2), the predictions for the unobserved cyclical loading protocols are compared with a set of undrained triaxial compressions (Figure 10).Specifically, Figures 10a, 10b, and 10c depict the constitutive response of specimens prepared from an initial unstressed configuration, with solid fractions of ϕ init = 0.5, and isotropically compressed upon reaching three different initial confining pressures, p (0) = 2500, 5500, 8500 kPa, respectively.The results are shown in terms of the stress ratio, solid fraction, and deviatoric elastic strain.
Regardless of the acquisition frequency, the proposed approach enables accurate constitutive representations of a material with multiple state variables from partial data.The method allows, as for the case of a simple elastoplastic material, the modelling of cycles of unloading and reloading stages even in the absence of such protocols in the training data set.Despite the lack of any measurement of one state variable (ε e ), both configurations accurately predict the evolution of the stresses and state variables, including the elastic strain.Note, however, that the discovered elastic strain coincides with the ground truth up to a constant term (which we removed in the plots in Figure 10).Furthermore, a dense parametrisation of the solid fraction (z) enables very accurate predictions in terms of all quantities of interest.But, the model continues to deliver high accuracies in the Figure 10: Predictions at inference in terms of the stress ratio (q/p, top), solid fraction (ϕ, middle), and deviatoric elastic strain (ε e s , bottom) for an undrained triaxial compression loading-unloading-reloading cycles with unobserved loading protocol for the two configurations of the acquisition of the state variables: dense (M = N ) and sparse (M = 2).Material prepared with initial solid density ϕ init = 0.5 and subjected to isotropic compression up to different initial confining pressures p (0) = 2500, 5500, 8500 kPa, from left to right.
sparse acquisition configuration (M = 2).This demonstrates the capabilities of the neural integral constitutive equations in learning constitutive models from small data regimes.
One minor difference exist between the two cases, M = N and M = 2. Relying on a dense parametrisation of ϕ, the model with M = N accurately identifies the plateau in the evolution of the solid fraction at unloading, while in the case M = 2, for some loading paths, the solid fraction does not necessarily remain constant.The reason lies in the fact that the neural model with M = 2 has not observed such a phenomenon through the given training data, and thus cannot fully grasp the steady evolution at unloading.Next, the focus is directed towards probing the inference capabilities of the sparse neural model (M = 2) under more complex loading scenarios.To this end, Figures 11 and 12 compare the approach with the reference constitutive model for a randomly cycled undrained triaxial and a drained triaxial tests path at constant pressure, respectively.The former consists of a specimen with initial solid fraction ϕ init = 0.5 and tests the predictions at varying of the initial confining pressure, p (0) .The latter tests the influence of the initial solid fraction, for the same value of p (0) .Notice that the model has not observed the drained triaxial test case.
In both scenarios, the network predictions are in good agreement with the reference model.This is particularly true for the evolution of the stress, despite the demanding loading protocols involving many cycles with random unloading and reloading frequencies.Regarding the solid fraction, the network yields overall accurate predictions, despite some minor differences in the evolution during unloading, as mentioned above.
As a final application, consider the case of cyclic isotropic compression-extension loading, from unstressed configurations, with unobserved initial values of the solid fraction, namely ϕ init = 0.3 ÷ 0.8, as shown in Figure 13.Despite some minor discrepancies for very small values of the solid fraction (Figure 13a), the neural model yields good extrapolation capabilities, outside the range of values originally observed during training.
For the sake of completeness, the comparison with the incremental formulation is given in Appendix A.2. There, we demonstrate that the conventional incremental data-fitting approach is incapable of discovering accurate constitutive models when faced with incomplete or sparse data sets.This holds true regardless of the thermodynamic structure hardwired in the neural network architecture, demonstrating the importance of the newly proposed integral formulation.Figure 12: Predictions at inference (M = 2) in terms of the stress ratio (q/p, top) and solid fraction (ϕ, bottom) for an unobserved loading path (random loading protocols of drained cyclical triaxial tests at constant pressure, ṗ = 0 with prescribed axial strain rate).In all the above the initial confining pressure is given by p (0) = 2500 kPa.However, the initial solid fraction of the unstressed configuration varies ϕ init = 0.5, 0.6, 0.7, 0.8, from left to right.

Influence of noise in the training and validation data sets
To evaluate the robustness of the approach, the original training and validation sets are corrupted with synthetic noise (cf.paragraph 4.1.3),while the training procedure is repeated using the neural network architecture discussed in subsection 4.2.Note that noise is considered for all state variables, except the bulk density.
In Figure 14, the mean absolute percentage error of the stress and solid fraction predictions for the noise-free test set are studied for varying degrees of the level δ of the synthetic noise.As before, we consider two acquisition configurations: one with M = N and another with M = 2.For stress predictions, the error is consistently found to always be smaller or equal to the amplitude of the noise used to corrupt the data.This trend holds true regardless of whether the parametrisation of the state space is dense or sparse.Conversely, the predictions for the solid fraction exhibit a significant sensitivity to noise levels in the sparse scenario where M = 2.In fact, the neural model is trained using data from only two observations of solid fractions, ϕ (0) and ϕ (T ) .Thus, the synthetic noise added does not have a distribution with zero mean and its influence is exacerbated.This result suggests that, when dealing with relatively large acquisition errors (δ), the very small data set used in this study may not provide sufficient robustness, especially in terms of predicting the evolution of state variables.Figures 15 and 16 present the predictions for an unobserved cyclic loading path for the dense and sparse acquisition configurations, respectively.In the first case (M = N ), the results demonstrate a high degree of accuracy in the neural model predictions.We observe a slight underestimation in the predicted values of stress and solid fraction only when subjected to high noise levels, when δ = 10%.
In the case of M = 2, the method still manages to provide stress predictions that qualitatively match the ground truth, even when noise is introduced.However, the predictions for the solid fraction, as noted earlier, are inevitably affected by the noise, leading to a less accurate representation of this variable under noisy conditions.

CONCLUSIONS
Despite the recent advancements in data-driven approaches, one previously unresolved issue lies in the fact that these methods requires large amounts of data.Previous neural network methods in the field of constitutive modelling therefore fail to deal with real small data from physical experimental measurements, which can only retrieve limited partial and incomplete observations of the material state.
In this contribution, we developed a novel deep learning approach to address the issue of small data by building upon the solution of the initial value problem describing the time evolution of the material state.Inspired by recent works on neural differential equations [3], the proposed approach blends artificial neural networks with time integration and learn time-continuous evolution equations.The approach also embraces the philosophy provided by the thermodynamics-based neural networks [34] and enables the identification of constitutive models that are always consistent with thermodynamics.
Numerical benchmarks showcased the benefits and accuracy of the proposed methodology.First, it removes the need of "measuring" state variables such as the elastic (or, equivalently, the plastic) strain, which cannot  ): predictions at inference in terms of the stress ratio (q/p, top) and solid fraction (ϕ, bottom) for an undrained triaxial with random loading protocol.The material is prepared with initial solid fraction ϕ init = 0.5 and isotropically compressed up to a confining pressure of p (0) = 5500 kPa.The amplitude of the synthetic noise (δ) added to the values of the stress and the state variables composing the training and validation sets varies from left to right.truly be measured or retrieved from real experiments, at least without taking gross assumptions.Second, the approach has demonstrated ability to learn accurate and robust constitutive models even in presence of generally complex model materials, and given only sparse sampling and corrupted measurement data.
Within this work, the nature of the state variables for a given material was assumed to be known.In more general cases, the same method could be advanced further to identify either: (i ) a predominant set of variables based on an initial, larger set of data, by enforcing parsimony (cf.[12]), or (ii ) surrogate (hidden) state variables from the sparse acquisitions of the microscopic material state in conjunction with dimensionality reduction techniques (cf.[31,49]).
This work advocates the possibility of building data-driven models that require substantially smaller data sets for learning constitutive equations compared to the majority of incremental approaches.However, further exploration of the capabilities and limitations of the proposed framework are the objective of future work.Nonetheless, the same approach can be generalised for other classes of materials beyond those considered here.In particular, it would be worthwhile to dedicate future studies on assessing whether materials with a much larger set of state variables can still be learned using only partial and limited information.This can be interesting especially for those state variables that are difficult to be collected from experiments, for instance, the granular temperature [29,28] and the force-fabric in discrete media [21].Similarly, future studies should be dedicated on testing the capabilities of the formulation in handling complex phenomena such as rate-dependent (e.g.[1,42]) and non-isothermal processes.
The newly developed formulation of neural integration for constitutive equations represents an important step to overcome the issues associated with data scarcity.Applications of the presented methodology in presence of real, experimental observations should be the ultimate objective of future works.There, accounting for additional requirements -stemming from physical (e.g.hydrodynamics, [23,9]) and mathematical [13,17] considerations, including Lebesgue integration to accommodate discontinuities -could enable further reduction of the amount of small data required for the machine learning of even more complex material behaviours.): predictions at inference in terms of the stress ratio (q/p, top) and solid fraction (ϕ, bottom) for an undrained triaxial with random loading protocol.The material is prepared with initial solid fraction ϕ init = 0.5 and isotropically compressed up to a confining pressure of p (0) = 5500 kPa.The amplitude of the synthetic noise (δ) added to the values of the stress and the state variables composing the training and validation sets varies from left to right.
The predictions of the neural incremental formulation are presented in Figures A. 18 and A.19 for an unobserved loading path, and are compared with those provided by the newly proposed approach, as well as with the ground-truth solution.The classical incremental data-fitting formulation clearly fails regardless of the sampling frequency for the solid fraction.The difference with the newly proposed approach only lies in the formulation (incremental versus integral).These results demonstrate that the thermodynamic structure, presented in subsection 2.3 and kept in the neural incremental approach, is not a sufficient to learn from small data.The reason lies in the fact that the neural incremental formulation cannot correctly learn the evolution of both the elastic strain and the solid fraction.Whilst thermodynamics enables the network to distinguish the nature of the two variables, the lack of observations for the elastic strain reveals the fallacy of the traditional incremental formulation.
Given the unrealistic predictions obtained for the incremental formulation, it seems reasonable to infer that in the case of corrupted measurement data, the predictions can only become worst.Conventionally incremental formulation with sparse acquisition of the state variables (M = 2) versus the proposed approach (NICE): predictions in terms of the stress (p/q, top), solid fraction (ϕ, middle), and deviatoric elastic strain (ε e s , bottom) for an undrained triaxial compression loading-unloading-reloading cycles with unobserved loading protocol.Material prepared with initial solid density (a) ϕ init = 0.5 and (b) ϕ init = 0.8, respectively, and isotropically compressed up to different confining pressures, p (0) = 2500, 8500 kPa.

Figure 1 :
Figure 1: Graphical illustration of NICE: Neural Integration for Constitutive Equations.Experimental tests (a) gather the stress-strain (σ versus ε) response, as well as initial and final material configurations, represented here by an incomplete set of sampled state variable data (X (0) and X (T ) ).From these measurement data, the system learns the evolution equations (b) through a neural differential operator f θ with parameters θ.The continuous-time and integral form of f θ then computes the state space trajectory X in e d t r ia x .(a)loading paths and protocols (b) training and validation losses

Figure 2 :
Figure 2: Test cases for training and validation.(a) Loading protocols for the generated data and (b) the evolution of training and validation losses during training.The generated data sets involve drained triaxial ( ṗ = ± q/3), undrained triaxial ( εv = 0), and isotropic compression ( εs = 0), where the narrow line represents critical state.

Figure 3 :
Figure 3: Predictions at inference for an undrained cyclic loading path (unobserved protocol) with one, four, and six unloading-reloading cycles (from left to right).

Figure 4 :Figure 5 :
Figure 4: Predictions at inference for a drained triaxial cyclic loading path (unobserved protocol) with one, three, and nine unloading-reloading cycles (from left to right).refNICE ref NICE ref NICE

Figure 6 :
Figure 6: Effect of the synthetic noise amplitude (δ) in terms of the stress-strain response for one of the loadingunloading paths used for training.

Figure 7 :
Figure 7: Evolution of training (unshaded) and validation (shaded) loss during training, at varying noise amplitudes used to corrupt the training data.

Figure 8 :
Figure 8: Performance and robustness of the approach at varying amplitudes of the synthetic noise (δ) added to the training and validation sets.(a) Mean absolute percentage error of the network predictions with respect to the un-noisy test set, and (b) predictions for an unseen constant pressure cyclical drained triaxial loading path for δ = 1, 5% (top) and δ = 10, 20% (bottom).

Figure 9 :
Figure 9: Loading paths and protocols for training, validation, and test data sets.(a) Isotropic compression loading tests followed by extensional unloading with four initial values of the solid fractions (ϕ init ), and (b) undrained and drained loading-unloading triaxial compression for an initial confining pressure p (0) = 1.0 MPa, starting from four different solid fractions.

Figure 11 :
Figure11: Predictions at inference (M = 2) in terms of the stress ratio (q/p, top) and solid fraction (ϕ, bottom) for an undrained triaxial compression with random cyclical loading protocols.Tests starts with a similar initial solid fraction ϕ init = 0.5 at the unstressed configuration and follows an isotropic compression to reach three different initial confining pressures, p (0) = 2500, 5500, 7000, 8500 kPa, from left to right.ref NICE

Figure 15 :
Figure15: Influence of corrupted training data for a dense sampling of the solid fraction (M = N ): predictions at inference in terms of the stress ratio (q/p, top) and solid fraction (ϕ, bottom) for an undrained triaxial with random loading protocol.The material is prepared with initial solid fraction ϕ init = 0.5 and isotropically compressed up to a confining pressure of p (0) = 5500 kPa.The amplitude of the synthetic noise (δ) added to the values of the stress and the state variables composing the training and validation sets varies from left to right.

Figure 16 :
Figure16: Influence of corrupted training data for a sparse sampling of the solid fraction (M = 2): predictions at inference in terms of the stress ratio (q/p, top) and solid fraction (ϕ, bottom) for an undrained triaxial with random loading protocol.The material is prepared with initial solid fraction ϕ init = 0.5 and isotropically compressed up to a confining pressure of p (0) = 5500 kPa.The amplitude of the synthetic noise (δ) added to the values of the stress and the state variables composing the training and validation sets varies from left to right.

Figure A. 18 :
Figure A.18: Conventionally incremental formulation with dense acquisition of the state variables (M = N ) versus the proposed approach (NICE): predictions in terms of the stress (p/q, top), solid fraction (ϕ, middle), and deviatoric elastic strain (ε e s , bottom) for undrained triaxial compression loading-unloading-reloading cycles with unobserved loading protocol.Material prepared with initial solid density (a) ϕ init = 0.5 and (b) ϕ init = 0.8, respectively, and isotropically compressed up to different confining pressures, p (0) = 2500, 8500 kPa.
Figure A.19:Conventionally incremental formulation with sparse acquisition of the state variables (M = 2) versus the proposed approach (NICE): predictions in terms of the stress (p/q, top), solid fraction (ϕ, middle), and deviatoric elastic strain (ε e s , bottom) for an undrained triaxial compression loading-unloading-reloading cycles with unobserved loading protocol.Material prepared with initial solid density (a) ϕ init = 0.5 and (b) ϕ init = 0.8, respectively, and isotropically compressed up to different confining pressures, p (0) = 2500, 8500 kPa.