Machine learning of evolving physics-based material models for multiscale solid mechanics

In this work we present a hybrid physics-based and data-driven learning approach to construct surrogate models for concurrent multiscale simulations of complex material behavior. We start from robust but inflexible physics-based constitutive models and increase their expressivity by allowing a subset of their material parameters to change in time according to an evolution operator learned from data. This leads to a flexible hybrid model combining a data-driven encoder and a physics-based decoder. Apart from introducing physics-motivated bias to the resulting surrogate, the internal variables of the decoder act as a memory mechanism that allows path dependency to arise naturally. We demonstrate the capabilities of the approach by combining an FNN encoder with several plasticity decoders and training the model to reproduce the macroscopic behavior of fiber-reinforced composites. The hybrid models are able to provide reasonable predictions of unloading/reloading behavior while being trained exclusively on monotonic data. Furthermore, in contrast to traditional surrogates mapping strains to stresses, the specific architecture of the hybrid model allows for lossless dimensionality reduction and straightforward enforcement of frame invariance by using strain invariants as the feature space of the encoder.


Introduction
Recent advances in materials science and manufacturing techniques are paving the way for the design of materials with highly-tailored microstructures, including metamaterials (Kumar et al., 2020;Bessa et al., 2019), novel composite material systems (Gantenbein et al., 2021;Woigk et al., 2022), printed cementitious materials (Xu et al., 2022) and multifunctional living materials (Gantenbein et al., 2023).The common thread in these new developments is a shift from traditional design focused on tailoring structures to material constraints towards tailoring material microstructures to macroscopic constraints.This shift in turn requires the development of highly-detailed models of material behavior across spatial scales and a shift to virtual structural certification, as trial-and-error design becomes infeasible (Telgen et al., 2022;Khajehtourian and Kochmann, 2021;Furtado et al., 2021).
Scale bridging has been traditionally performed through a bottomup approach: physics-based constitutive models at smaller scales are calibrated using experiments and used to perform numerical simulations (using e.g. the Finite Element (FE) method) on representative lower-scale domains from which higher-scale physics-based models can be calibrated (Van der Meer, 2016;Krauklis et al., 2019).However, physics-based constitutive models come with a priori assumptions that often fail to reproduce complex lower-scale behavior ( Van der Meer, 2016).The alternative is to opt for an FE 2 (or Computational Homogenization) approach: lower-scale FE models are embedded at every Gauss point of a higher-scale model and material behavior is directly upscaled with no constitutive assumptions at the higher scale (Feyel, 1999;Kouznetsova et al., 2001;Geers et al., 2010).Yet, the computational cost associated with repeatedly solving a large number of micromodels quickly becomes a bottleneck, in particular for manyquery procedures such as design exploration and optimization that require several higher-scale simulations to be performed.
Since the bottleneck of FE 2 lies in computing lower-scale models, a popular approach to reduce computational effort is to substitute the original FE micromodels with either structure-preserving reduced-order models (Ferreira et al., 2022;Goury et al., 2016;Ryckelynck, 2005;Ghavamian et al., 2017;Rocha et al., 2020b;Daniel et al., 2022;Scanff et al., 2022) or purely data-driven surrogates (Ghaboussi et al., 1991;Lefik et al., 2009;Le et al., 2015;Bessa et al., 2017;Rocha et al., 2020a;Wang et al., 2022) trained offline.More recently, Recurrent Neural Networks (RNN) have become the model of choice especially for strain path-dependent materials, with a large body of literature dedicated to their use and tuning to different applications (Ghavamian and Mozaffar et al., 2019;Gorji et al., 2020;Abueidda et al., 2021;Chen, 2021;Logarzo et al., 2021;Borkowski et al., 2022).RNNs can reproduce complex long-term time dependencies in material behavior by learning latent representations of the material state, making them fast and flexible surrogates.However, these learned representations are not a priori related to actual thermodynamic internal state variables and the model is therefore poorly interpretable (see Koeppe et al., 2021 for an interesting discussion on the subject).Furthermore, training for path dependency requires sampling from a potentially infinite-dimensional space of arbitrarily-long strain paths.This means training RNNs to reproduce complex material behavior often requires an inordinate amount of data (curse of dimensionality) and their purely data-driven nature limits their ability to extrapolate away from paths seen during training.
In order to address these drawbacks, a growing number of recent works are shifting focus to models with a fusion of data-driven and physics-based components.Inspired by physics-informed neural networks (Raissi et al., 2019), the authors in Masi et al. (2021) opt for data-driven models with physics-inspired bias by enforcing thermodynamic principles in a weak sense through an augmented loss function.In a similar vein, the model in Linka et al. (2021) learns hyperelasticity by linking together several carefully crafted neural nets to represent quantities with clear physical meaning, improving the interpretability of the resulting model.In Vlassis and Sun (2021) the authors extend a similar hyperelastic surrogate with a network that learns plastic flow direction and the evolution of a yield surface parametrized by a level set function, resulting in a hyperelastic-plastic model with superior extrapolation capabilities.A common thread in the aforementioned approaches, however, is that their learning architectures are heavily dependent on the type of model being learned (e.g.hyperelasticity, plasticity), making extensions to other models a convoluted task.In contrast, the authors in Liu et al. (2019), Liu (2021) propose a surrogate for heterogeneous micromodels constructed by directly employing unmodified versions of the constitutive models used for the micro constituents and using a customized network architecture to infer a homogenization operator from data that combines their responses.Nevertheless, the method employs a highly-specialized iterative online prediction routine requiring extra implementation effort and with increased computational overhead when compared to that of traditional surrogates mapping strains to stresses.Finally, in Wang et al. (2019), Flaschel et al. (2021Flaschel et al. ( , 2022) ) a dictionary of candidate physics-based models is assumed and the role of machine learning shifts instead to that of performing model selection and/or design of experiments.
In this work we explore an alternative approach for constructing hybrid surrogate models for path-dependent multiscale simulations.We start from the premise that existing physics-based modelse.g. the ones used to describe microscale constituents -are not flexible enough to reproduce macroscale behavior but nonetheless encapsulate crucial physical features such as frame invariance and loading/unloading conditions.It is our aim to avoid learning these features directly from data, as that would require either an excessively large dataset or a highly-specialized learning architecture.We therefore opt for keeping the constitutive model as intact as possible and instead increasing flexibility by allowing some (or all) of its material parameters to evolve in time.The resulting model can be seen in Fig. 1: a data-driven encoder 1 that learns the evolution of a set of material properties is linked to a physics-based material model decoder that maps strains to stresses.In contrast to other strategies in literature, we keep the architecture as general as possible: a general feature extractor parses 1 We adopt the terms encoder and decoder in order to clearly split the model in two distinct parts which we will treat with different learning approaches.As in conventional machine learning models, our encoder-decoder architecture transforms to and from a clearly defined latent space.In the present discussion it does not imply a dimensionality bottleneck.macroscopic strains into input features for the encoder -which can be as simple as the strains themselves or other derived quantities (e.g.strain invariants) -and any type of constitutive model can in principle act as decoder (e.g.hyperelasticity, plasticity, damage).By relegating stress computations to the decoder, we effectively introduce physicsbased bias to the model.2Furthermore, by letting the material model handle the evolution of its own internal variables, the model benefits from a recurrent component with interpretable memory structure that allows path dependency to arise naturally.The strategy we explore here is related to the one we propose in Maia et al. (2023), but in that work we let an encoder learn local strain distributions for several virtual material points with fixed properties.We see the two approaches as being complementary, and therefore with potential for being used in combination to form a flexible range of hybrid surrogates.
The remainder of the work is organized as follows.Section 2 contains a primer on concurrent multiscale (FE 2 ) modeling and discusses the difficulties of training purely data-driven surrogates.In Section 3, we particularize the model of Fig. 1 to the case of a feedforward neural network encoder and discuss aspects related to offline training and online numerical stabilization.In Section 4 we assess the performance of the hybrid model in reproducing the behavior of fiber-reinforced composites using different encoder input features and decoder models.Finally, some concluding remarks and future research directions are discussed in Section 5.

Concurrent multiscale (FE 𝟐 ) modeling
In this section we present a short discussion on FE 2 modeling.The goal is not to be comprehensive -the interested reader is referred to Kouznetsova et al. (2001), Geers et al. (2010) for detailed discussions on the subject -but rather to expose the computational bottleneck associated with the method and pinpoint where surrogate models can be used to alleviate the issue.We then demonstrate how a Recurrent Neural Network (RNN) can be used as surrogate model and showcase some of the difficulties associated with their training and their extrapolation capabilities.

Scale separation and coupling
In FE 2 we assume the problem being solved can be split into a homogeneous macroscopic domain  and a heterogeneous microscopic domain  ≪  where small-scale geometric features are resolved.
Here we opt for a first-order homogenization approach assuming the displacements on both scales can be related by: where microscopic displacements   are split into a linear contribution proportional to the macroscopic strains   and a fluctuation term ũ that accounts for microscopic heterogeneities.
Since   varies throughout the macroscopic domain, a micromodel for  is embedded at each Gauss point in  and a microscopic boundary-value equilibrium problem assuming small displacements and strains is solved: microscopic stress   is related to microscopic strain   with traditional physics-based constitutive models for each phase in the heterogeneous domain.In the general case where the material models feature internal variables , we can write the constitutive update for the microscale domain as: where   are the material parameters of the microscopic constituents, the operators  and  can be split into an arbitrary number of blocks with different models (e.g.elasticity, elastoplasticity, damage) for the different material phases, and   is a concatenation of the internal variables of every microscopic Gauss point and therefore fully describes the path-dependent state of the microscopic problem.
In order to determine the strains   that serve as boundary conditions for the micromodels, a macroscopic small-strain equilibrium problem is solved: but this time no constitutive assumptions are adopted.Macroscale stresses are instead directly homogenized from the microscopic response: which couples the macroscopic strain   with the microscopic solution.Since Eq. ( 1) also couples the solutions in the opposite direction, a bidirectional coupling is formed which requires the two-scale equilibrium problem to be solved iteratively.

Data-driven surrogate modeling
The coupled problem of Section 2.1 is extremely computationally demanding.The lower-scale domain  usually features complicated geometric features and must therefore be modeled with dense FE meshes in order to ensure accuracy.Worse yet, an independent microscopic problem must be solved at every integration point in  for every iteration of every time step of the simulation.This nested nature quickly forms a computational bottleneck.
Since the bulk of the computational effort lies in solving the micromodels, a popular approach to make multiscale analysis viable for practical applications is to substitute the microscopic FE models by data-driven surrogates.The idea is to perform a number of micromodel simulations under representative boundary conditions and use the resulting stress-strain pairs to train a machine learning model to be deployed when performing the actual two-scale simulations of interest.Naturally, the approach tacitly assumes that the number of offline micromodel computations required to train the model is much smaller than the number of times the microscopic behavior will be computed online.In the following, we use a simple example to demonstrate a number of difficulties associated with training such a model to reproduce path-dependent material behavior.

Example: A one-dimensional RNN surrogate
For this short demonstration, we train a Long Short-term Memory (LSTM) network (Hochreiter and Schmidhuber, 1997) to reproduce one-dimensional (single stress/strain component) elastoplasticity.The goal is not to provide a comprehensive investigation on constitutive modeling with RNNs, but rather to clearly identify parallels with our proposed approach and, more importantly, highlight how the two approaches differ from each other.
The architecture of the model is shown in Fig. 2(a) and is implemented in PyTorch (Paszke et al., 2019).In order to minimize the risk of overfitting, a pragmatic model selection procedure is performed by first training the model with several non-monotonic strain paths and gradually increasing cell size until reasonable accuracy is obtained.This leads to a parsimonious model with a single LSTM cell with 5 latent units.
At this point it is interesting to draw a parallel between the network and the micromodel whose behavior is being reproduced: the concatenation of the hidden state  and cell state  of the LSTM cell can be seen as a lower-dimensional surrogate for the set of microscopic internal variables   of Eq. (3).However, in contrast to the variables in , the latent variables  and  have no physical interpretation and evolve purely according to heuristic memory mechanisms that mimic patterns inferred during training.
First, we train the LSTM using only monotonic data.Since only one strain component is being modeled, this initial dataset is composed simply of one strain path in tension and one in compression.The trained model is then used to predict a tension path with one unloading-reloading cycle.Having never seen unloading during training, the network reverses course and unloads on top of its loading path (Fig. 2(b)).This result is hardly surprising, but sheds light on the potentially deceiving nature of the training procedure: even though we are only concerned with a single strain component, predictions actually take place in an augmented space that describes strain paths in time which can be arbitrarily high-dimensional (as paths can be arbitrarily long).
We can further demonstrate this manifestation of the curse of dimensionality with the two additional examples of Fig. 3.In Fig. 3(a)  we train the network with two unloading paths and it fails to predict a third one at an intermediate strain level.Here it can be deceiving to assume the third path can be interpolated from the other two: in the 48-dimensional space of strain paths (we use paths with 48 time steps each) the network is actually operating far away from training data.In Fig. 3(b) the network tries to reproduce a path seen during training but we first let the material rest at zero strain for five time steps before loading starts and for another five time steps at the end of the path.With purely data-driven latent dynamics, the initial rest disturbs the memory structure of the network and causes large deviations for a large portion of the path.For the rest at the end of the path, we see that the surrogate fails to predict the characteristic that the stress does not change upon constant deformation.
Training data-driven models to accurately reproduce path dependency is therefore not straightforward: their latent representations of material state are not interpretable and even phenomena as trivial as resting at zero strain must be learned from data.At the core of successful applications of RNNs to this task are either extensive datasets obtained with carefully crafted sampling strategies (Wu et al., 2020;Logarzo et al., 2021) or highly tailored datasets for specific macroscopic problems (Ghavamian and Simone, 2019).Alternatively, active learning frameworks may be used to skip offline training altogether (Knap et al., 2008;Rocha et al., 2021), but at the cost of producing slower surrogates.

A hybrid surrogate model
In this work we attempt to avoid the curse of dimensionality by relegating to a physics-based material model some of the tasks the RNN of Section 2.3 has to explicitly learn from data.In this section, we further formalize the hybrid approach of Fig. 1 by looking at the roles of each model component and their dependencies in time.We then particularize the model for the case of a feedforward neural network (FNN) encoder and discuss feature selection and numerical stabilization strategies.

Evolving material parameters
Physics-based material models are traditionally formulated with a fixed set of parameters  either directly computed from a specific set of (numerical) experiments or indirectly from stress-strain measurements in a Maximum Likelihood Estimation (MLE) approach. 3Here we start from the premise that letting (part of)  evolve in time increases flexibility and allows the model to capture more complex material behavior.Conversely, keeping the remainder of the model intact improves interpretability and provides physics-based bias to the data-driven model tasked to learn this evolution.
In Fig. 4, the hybrid model of  microscale models of Eq. ( 3), we assume the constitutive behavior at the macroscale is given by a physics-based material model: but now with time-dependent parameters   .Note that the model response at time  depends on the material state at time  − 1 through a set of internal variables   −1 (Fig. 4).This gives the model a recurrent nature not unlike that of the RNN of Fig. 2(a) with its state variables  and .The advantage here is that  has clear physical interpretation (plastic strains, damage variables, etc.) and its evolution is handled by the fixed operator  composed of clearly interpretable algorithmic steps grounded in physics and/or classical material phenomenology (e.g. a return mapping algorithm).
On the encoder side, we let the material properties  evolve according to an evolution operator  whose shape is learned from data: as a function of a set of input features  that are themselves obtained from the macroscopic strains through a feature extractor  : where   could be simply the strains themselves or other quantities derived from it.This further split of the encoder architecture into  and  helps clearly delimitate where the purely data-driven part of the hybrid model begins.More importantly, note that   depends only on the current features   and we therefore assume  is not recurrent (Fig. 4).This choice effectively limits the flexibility of  and makes the hybrid surrogate fully rely on the more robust model   to explain path-dependent phenomena, helping counter the curse of dimensionality associated with sampling strain paths.For instance, it opens up the possibility to train the surrogate exclusively with monotonic data, as we will demonstrate in the examples of Section 4.
In the following sections, we particularize the model for the case of  being a fully-connected neural network and for specific choices of  and .Nevertheless, the general architecture of Figs. 1 and 4 is meant to be as flexible as possible: • The nature and dimensionality of  is not tied to that of   since strains are also given directly to   ; • Other machine learning models for regression can also be used as , and it could in principle be split into different models handling the evolution of different subsets of .Any number of model parameters may also be left out of  and either fixed as constants or optimized to constant values during training; • No assumption is made on the form of   or the nature or dimensionality of   .Instead of a single model, it could also for instance be a mixture of physics-based models combined with analytical homogenization techniques.

Feature extractors
A pragmatic choice for  is to simply assume  is the macroscopic strain vector   itself.It is also a familiar one, as we can then relate the resulting model to conventional surrogates mapping strains to stresses.However, since macroscopic strains are also directly passed on to the decoder, the architecture gives us the freedom to experiment with different input features.Fig. 5 shows the two model architectures we explore in this work.For the two variants in Fig. 5(a) we either use   itself or a set of small-strain invariants of the macroscopic strain tensor of increasing dimensionality: where the variants are given by the well-known expressions: Additionally, since the current study focus on elastoplasticity, it is also interesting to explore input feature spaces including invariants from the deviatoric strain tensor: where: By using features based on invariants, the resulting surrogate can be made to naturally inherit frame invariance as long as the behavior of the physics-based decoder is also frame invariant.This stands in contrast with traditional black-box surrogates mapping strains to stresses.Furthermore, opting for invariant-based features can be seen as a physics-based dimensionality reduction operation that can potentially reduce the amount of data needed to train the hybrid model.
We also investigate the possibility of extracting features from the outputs of a precalibrated material model  subjected to the same strain path seen at the macroscale (Fig. 5(b)).This  is a physics-based material model, similar to , with the difference that its properties are not trained to reproduce training data but are fixed a priori (i.e. it is precalibrated).Note that this specific architecture introduces an additional recurrent component to the model through the set  of internal variables of .From a machine learning perspective, the role of  would be analogous to that of a temporal convolution operator or an RNN cell appended to the encoder.The key difference, however, is that  is fixed a priori and therefore should not require extra sampling effort with respect to the more straightforward extractor in Fig. 5(a).
Naturally, different choices for  yield models with distinct learning capabilities, and we therefore assume  encapsulates relevant information about not only the current values of   but also of its history.In the present scenario where the data is coming from micromodel computations, we opt for the intuitive choice of having  be one of the known constitutive models used to describe the microscopic material phases.We can therefore conceptually see  as an imaginary representative material point at the microscale that is always subjected to the average micromodel strain.We then use either a subset of its internal variables  or a set of invariants  σ of its stress outputs as features.

Neural network encoder
For simplicity, we opt for modeling the evolution of  using classical feedforward neural networks with fully-connected layers.As both architectures in Fig. 5 ultimately compute macroscopic stresses given macroscopic strains, we can use supervised learning to train the model with a straightforward Maximum Likelihood approach.Gathering the complete set of network weights in a vector  and seeing the complete surrogate as a monolithic model that computes an approximation σ for stresses, we adopt the following observation model for the snapshot stresses : where the superscript  is dropped for convenience,  is an identity matrix, and  is an additive Gaussian noise. 4Under the assumption of a squared loss, maximizing the likelihood of a training dataset with  observations amounts to minimizing the loss function (Bishop, 2006): with the variance of the noise that explains data misfit being simply  = ∕2.The resulting loss function is the same one used for conventional data-driven surrogates and is therefore straightforward to implement.Nevertheless, it is worth noting that since we cannot directly observe , computing the gradients of  with respect to  involves backpropagating derivatives through the decoder .Furthermore, since  affects the evolution of the internal variables , backpropagation in time becomes necessary.Starting from Eq. ( 14) and walking back through the graph of Fig. 4, the gradient of the loss at time step  of a given strain path is given by: where the remaining gradient chain ∕ is computed with conventional backpropagation through the network.If  is implemented in a code base that allows for automatic differentiation (e.g. in PyTorch), these time dependencies are naturally taken into account as long as a persistent gradient tape is used within each strain path. 5In this work we instead implement network training directly into an existing FE code, and therefore opt for the pragmatic approach of computing all partial derivatives of quantities derived from  using finite differences.
Finally, in order to enforce upper and lower bounds for  and avoid unphysical parameter values (e.g.negative elasticity moduli), we apply sigmoid activation to the final layer of the network and scale the parameters back from a [0, 1] range using predefined bounds:

Material decoders
As previously mentioned, any constitutive model can in principle be used as .For the present study we focus on reproducing elastoplasticity and therefore narrow our choices down to the following set of potential decoders with increasing levels of complexity.The simplest one is a linear-elastic isotropic material with no internal variables: where index notation is used for convenience.For this model,  comprises only the bulk and shear moduli  and , or equivalently the Young's modulus  the Poisson's ratio .
The second decoder option is a simple plasticity model with  2 (von Mises) flow.The stress update in this case becomes: where strain is additively decomposed into elastic and plastic ( p ) contributions.The yield criterium and plastic flow rule are given by: where  is the deviatoric part of the stresses,  is a plastic multiplier,  y is a yield stress parameter and we write the Frobenius norm as ‖⋅‖ F .In order to keep the model as simple as possible, we assume  y is a material constant and therefore end up with a perfectly-plastic model with associative flow.The internal variables of this model are components of the plastic strain vector  p and the only new material parameter is the yield stress  y .Finally, we also consider the more complex pressure-dependent, non-associative plasticity model proposed by Melro et al. (2013).Stress update is the same as in Eq. ( 18), but yield surface and plastic flow are given by: where   is the Kronecker delta,  t and  c are yield stresses in tension and compression, respectively, and  p is a new parameter controlling plastic contraction and allowing for compressible plastic flow.Hardening can be described by making the yield stresses general functions of  p , but when used as a decoder we assume  t and  c do not depend on  p and instead let the decoder  describe their evolution.The model by Melro et al. (2013) is also the one used to describe the microscopic material phase responsible for the nonlinear behavior observed when homogenizing micromodel response, and can therefore be seen as the natural choice for .Nevertheless, the other two decoders can provide interesting insights on the effect of introducing different levels of bias to the hybrid model.

Online predictions and inherited stability
The architecture of Fig. 1 is developed to be minimally intrusive and allow for existing material models to be used as decoders with minimum effort.We therefore implement the online routine of the model as a wrapper around an existing implementation of .The basic structure of the wrapper can be seen in Algorithm 1.The hybrid nature of the model allows for a robust approach that ensures the numerical stability of the original model  is inherited by the surrogate.This is achieved by only updating  at the end of each time step, after the global implicit Newton-Raphson scheme converges.Material properties are therefore fixed while the global solver is iterating, and that means the tangent stiffness  comes directly from  and inherits its stability features.As an example, the  2 plasticity model of Eq. ( 19) is unconditionally stable as long as its hardening modulus ℎ ≥ 0 for any , which is the case for the perfectly-plastic version we consider here.It then follows that any hybrid surrogate with  2 decoder is also unconditionally stable.Note that this is only possible because strains are directly passed on to the decoder and would therefore not be an option for conventional surrogates (e.g. the RNN of Fig. 3).For those surrogates, the tangent stiffness would come directly from the jacobian of a highly-flexible data-driven model, often at the cost of numerical stability.

Numerical stabilization
Nevertheless, the decoder  may be inherently unstable even with fixed material constants.This is for instance the case for the model by Melro et al. (2013): the non-associative flow rule of Eq. ( 20) can cause the tangent stiffness   to lose positive definiteness under certain strain conditions and for certain combinations of model parameters.To accommodate such a scenario and open up the possibility for online model adaptivity in other contexts, we propose a scheme for updating the encoder  on the fly in order to enforce extra constraints locally.
Back to Algorithm 1, at the beginning of a new time step we keep  fixed to the one obtained with converged strains from the previous step and let the solver make a first strain prediction.After this first iteration, a stability criterion is checked and used to define a new loss function that can be used to update network weights in case instability is detected.Here we employ the determinant of the acoustic tensor : where   is the vector normal to the strain localization direction creating the instability, which we find through an angle sweep procedure as in Van Der Meer and Ke (2022): we sample the space of possible localization angles densely (a one-dimensional search for 2D strain simulations) with 100 equally-spaced angles, compute  and its determinant for each corresponding normal vector   and pick the angle that leads to the lowest value for the determinant.We use det () as a metric of stability and trigger a retraining procedure in case a negative value is detected.We therefore introduce a new loss function: where ⟨⋅⟩ − extracts the negative part of its operand and  0 is a reference value for the acoustic tensor computed at the start of the simulation.We minimize this new loss at every unstable point for a small number of epochs with low learning rate, and to discourage significant drifts from the original model6 we finish the stabilization procedure by updating the network using the original loss of Eq. ( 14) for a single minibatch.Finally,  is updated using the retrained model and is kept fixed for the remaining iterations.7 Note that the local constraint of Eq. ( 22) is therefore only enforced in a soft way and remaining instabilities might still cause the global solver to diverge, in which case we cancel the current increment, go back to the beginning of the time step and allow for the procedure to be triggered again.

Numerical examples
The proposed model was implemented in an in-house Finite Element code developed using the open-source C++ numerical analysis library Jem/Jive (Nguyen-Thanh et al., 2020).In order to allow for seamless online retraining, network training was also implemented within the same code.We start this section by describing the datasets and model selection strategies used to build the surrogates.We then investigate the performance of the approach under several choices of encoders and decoders.Finally, we use the model within an FE 2 simulation and demonstrate the online stabilization procedure of Section 3.5.All simulations are performed on cluster nodes equipped with Xeon E5-2630V4 processors and 128 GB RAM running CentOS 7.

Data sampling and model selection
Models are trained to reproduce the behavior of the fiber-reinforced composite micromodel shown in Fig. 6.Fibers are modeled as linearelastic and the matrix is described by the pressure-dependent nonassociative elastoplastic model by Melro et al. (2013) (Section 3.4).Microscale material properties are adopted from Van der Meer (2016).The microscopic geometry shown in Fig. 6 results from an RVE study performed in Van der Meer (2016) and is therefore considered representative.Following the discussion in Section 3, our aim is to investigate up to which extent it is possible to circumvent the curse of dimensionality associated with path dependency by training surrogates exclusively on monotonic strain paths and having time-dependent behavior arise naturally from a physics-based decoder.We therefore limit ourselves to monotonic paths for training.For consistency, we also employ exclusively monotonic data to perform model selection.
For efficiency, we limit the present investigation to 2D simulations (i.e. three strain components) in the plane perpendicular to the fibers, but nevertheless expect the discussion and conclusions to generalize to 3D simulations as long as appropriate orthotropic decoders are employed.Datasets with 2000 monotonic strain paths are generated under both plane strain and plane stress assumptions.Fig. 7 shows the complete plane strain dataset, with a similar one also being generated for plane stress.Each path is generated with an FE 2 simulation of a single macroscopic element under displacement control along a fixed direction in strain space sampled from a uniform distribution.To circumvent convergence issues, we employ an adaptive time stepping technique that progressively reduces time step size when the simulation does not converge and gradually increases it back for subsequent increments.The simulations are stopped once a strain norm of 10 % is reached.As the adaptive scheme leads to paths with different numbers of time increments, we balance the dataset by ensuring every path is composed of 30 steps with strain norms as equally spaced as possible.
To keep model selection straightforward and avoid the need for cumbersome k-fold cross validation or bootstrapping, we train a preliminary model with enough flexibility and an extensive training dataset and gradually increase the size of the validation set until the validation error converges to a good estimate of the expected prediction error (Hastie et al., 2009).This results in validation sets with 500 paths selected at random from the original datasets, leaving 1500 paths to be used for training.We then perform model selection by gradually increasing the complexity of our FNN encoders until the validation error stabilizes.From experimenting with different architectures, we find that encoders with 5 hidden layers of 50 units each with Scaled Exponential Linear Unit (SELU) (Klambauer et al., 2017) activation provide enough flexibility for all the examples treated here.To ensure enough regularization when computing learning curves with small datasets, we employ Bernoulli dropout layers with a rate of 1 % after every hidden layer.Networks are trained for 20 000 epochs using the Adam optimizer with recommended parameters (Kingma and Ba, 2017) and 32-path minibatches.The model with lowest historical validation error is kept after training, further reducing the risk of overfitting on small datasets.
To assess the capabilities of the trained surrogates, we compute an additional test dataset comprising 50 monotonic, 50 unloadingreloading and 50 slow cycling paths, examples of which are shown in Fig. 8.To keep the comparisons fair, none of these paths are used to perform model selection and are therefore only considered after the surrogates are trained.We will use example curves like those from Fig. 8 for visual inspection of the model performance, but also the complete sets of 50 curves each for more rigorous statistical analysis.A summary of the models considered in the following sections can be seen in Table 1.

Elastic decoder
It is interesting to first consider the simple linear-elastic decoder of Eq. ( 17), as it has no internal variables and therefore leads to a surrogate model comparable in nature to a conventional FNN trained on stress-strain pairs.As we will demonstrate, however, the limited physical bias provided by such simple model already proves advantageous.Here we let both elastic properties be controlled by the learned encoder: where the bounds 10 1 <  < 10 5 and 0 <  < 0.5 are enforced as described in Eq. ( 16).
We first perform a feature selection study and investigate how efficiently the model learns as the size of the dataset is increased.From the original plane strain training dataset of 1500 monotonic strain paths, we draw datasets with sizes ranging between 1 and 150 paths without replacement and use them to train networks with different encoder features.To get a reliable estimate of the expected prediction error, we repeat this process 50 times for each dataset size and encoder type, and for comparison we also do the same for conventional FNNs trained directly on stress targets (keeping the same architecture but going directly from the final hidden layer to stresses).This amounts to a total of 3400 trained networks from which we can compute an estimate of the prediction error by averaging ‖ ‖  − σ‖ ‖ over the 500 paths left for validation.
Fig. 9(a) plots averages of the validation error over the 50 training datasets used for each size.Although the hybrid architecture does not show an advantage over the FNN when the encoder is trained on strain features, there is a clear gain in learning speed when using only the two first strain invariants as features.Apart from accelerating learning and resulting in lossless dimensionality reduction, using invariants also results in a surrogate which is frame invariant under small strains.For comparison, we also train a conventional FNN on the same set of features, but those are unsurprisingly not enough to describe general strain states and much of the material response is interpreted by the FNN as observation noise.We zoom into the first part of the learning curves in Fig. 9 Training the invariant-based hybrid network with the complete dataset of 1500 curves leads to surrogates with validation errors of about 4 MPa, accurately representing the monotonic behavior of the original micromodel.Fig. 10 shows representative predictions of this model for paths from the test set.It is also interesting to plot how the latent space of the hybrid model evolves within a single strain path.This is done for the monotonic path of Fig. 10a in Fig. 11     (solid lines).In order to reproduce exponential hardening followed by perfect plasticity, the value of  decays asymptotically to zero while  stabilizes at a relatively high level.
As can be seen in Fig. 10, this surrogate with no internal variables is not capable of predicting non-monotonic strain paths, and effectively behaves like a hyperelastic material model just as the conventional FNN would.Nevertheless, the flexible and interpretable encoder-decoder architecture of Fig. 1 allows for new creative approaches in feature selection.As a demonstration, we keep the trained network of Fig. 10 intact and only modify its feature extractor to introduce a simple path-dependent mechanism:  which freezes the evolution of  if the path becomes non-monotonic.Note that the network does not need to be retrained and this modification can be employed exclusively when making online predictions, as the new features reduce to the original ones for the monotonic paths used for training (c.f.Fig. 11, dashed line).
We plot two representative non-monotonic paths predicted by the modified model in Fig. 12. From the hyperelastic behavior of Fig. 10, the modified surrogate now behaves as a damage model: the non-linear material behavior is explained by a loss of stiffness which is made persistent by the history-aware feature extractor.This path-dependent behavior can also be observed by looking at the evolution of material properties during the slow cycling path of Fig. 12b, which we plot in Fig. 13.With the original encoder (solid lines), properties are allowed to recover during unloading, while the modified feature extractor (dashed lines) keeps them constant until the path is fully reloaded.Nevertheless, although an improvement to the original model, it is unreasonable to expect the physical bias introduced by a purely elastic model to reliably represent an elastoplastic micromodel.We therefore move to decoders with more relevant physics.

𝐽 2 decoder
In this section we choose as decoder  the elastoplastic model of Eq. ( 19) with  2 plastic flow.Standing on its own, the model is a priori perfectly plastic (constant  y ),8 but here we let its yield stress be controlled by the data-driven encoder: while enforcing 10 1 <  y < 10 3 and keeping the Young's modulus and Poisson's ratio fixed to values obtained from a single linear micromodel simulation.In contrast to the model with elastic decoder of the previous section, we now employ prior knowledge of the micromodel behavior and assume that all non-linearity should be explained by plasticity and do not let the elastic properties be dictated by the encoder.Still, the assumption of isotropic and incompressible plastic flow is a departure from the more complex pressure-dependent and non-associative behavior shown by the micromodel.Here we are therefore concerned with the effect of trading the flexibility of an elastic decoder for significantly more physical bias from a lower-fidelity representation of material behavior.
At this point it is interesting to compare the performance of the hybrid surrogate with predictions coming from the state-of-the-art mesoscale material model for polymer composites proposed by Vogler et al. (2013).It is an orthotropic elastoplastic model with pressuredependent non-associative flow precalibrated with a small number of monotonic uniaxial and biaxial stress-strain curves obtained from simulations on the exact same micromodel of Fig. 6 (see Van der Meer, 2016 for details on the calibration procedure).For this section, we switch to a dataset in plane stress, allowing the  2 model to describe richer nonlinear behavior under strain states.
Fig. 14 shows the evolution of validation set loss when training the  2 -decoded model with 1500 plane stress training The error quickly stabilizes at around 20 MPa, significantly lower than 44 MPa average prediction error obtained with precalibrated mesomodel.The added flexibility with respect to the original perfectlyplastic  2 model can be seen in the test set curves plotted in Fig. 15: the data-driven encoder leads to correct predictions of nonlinear hardening (Fig. 15(a)) and pressure-dependent plastic flow (Fig. 15(b)).The figures highlight the inability of the mesomodel to predict the behavior in certain of the strain space, particularly under compression-dominated scenarios.
The minimum validation error attained by the model is, however, nevertheless significantly higher than the 4 MPa obtained with the elastic decoder of the previous section.This result is not entirely surprising, as the elastic decoder introduces much less bias into the model and therefore allows for a greater degree of flexibility when fitting monotonic data.On the other hand, what cannot be directly gleaned from Fig. 14 is that the  2 decoder benefits from having physics-based memory coming from its internal variables that allows for making predictions of non-monotonic behavior based solely on our assumption that nonlinearities come from plastic strain and therefore without ever having to see it during training.
In Fig. 16 we plot predictions of the  2 surrogate for two different unloading-reloading paths from the test dataset.The model predicts unloading very well without being trained for it.Nevertheless, as Fig. 14 suggests, the model struggles to predict monotonic behavior under a number of different scenarios, from which it follows that any non-monotonic predictions along the same directions will also be inaccurate.Fig. 17 shows three examples of this behavior.
The choice of decoder therefore involves a tradeoff between bias and flexibility that can be deceiving to base solely on validation error computed on monotonic data.Indeed, the decoder used in the next section outperforms  2 -based decoders in most situations, but nevertheless a choice for the simpler decoder might still be justifiede.g. if the unconditional numerical stability of a  2 decoder is desirable.

Non-associative pressure-dependent elastoplastic decoder
As one final exploration on model selection, we use as decoder the same elastoplastic model by Melro et al. used to describe the matrix material at the microscale (Melro et al., 2013).As mentioned in Section 3.4, this model is the natural choice for , as it attempts to explain the observed microscopic non-linear behavior with the same model from which the behavior arises.As before we keep the elastic properties of the model intact and let only the yield stresses and the plastic Poisson's ratio change in time: where 10 1 <  t < 10 4 , 0 <  p < 0.5 and 1 <  c  t < 100.We opt for the ratio  c  t instead of simply  c in order to also enforce  c >  t .We expand upon the feature selection study of Fig. 9 by looking at several feature extractors coming both directly from strains and from the output of a precalibrated Melro model  with the same properties used at the microscale (Fig. 5(b)).As mentioned in Section 3.2, this   precalibrated model has fixed material properties and does not directly perform macroscopic stress predictions but rather only acts as a time convolution operator, converting the complete strain history seen by the surrogate into a single set of input features  for the encoder.By encapsulating the same physics that generates the nonlinear material behavior present in training snapshots, it can therefore be seen as a physics-rich alternative to e.g. a network of 1D convolution layers parsing the complete history of strains from  = 0 until the time step at which predictions are sought.
We also include predictions from the precalibrated mesomodel by Vogler et al. (2013) and selected curves from Fig. 9(a) for comparison purposes.As before, we train 50 networks of each type for each size of dataset ranging from 1 to 150 paths drawn from the original dataset with 1500 paths.Each trained network is then used to compute the validation error over the 500 monotonic validation paths and the 150 test paths (50 extra monotonic paths, 50 paths with unloading-reloading and 50 slow cycle paths).This results in an extensive study comprising 6800 trained networks and over one million test set simulations.
are summarized in Fig. 18, with each point in a curve being the average over 50 networks.Once again using invariants as features proves beneficial, leading to lossless dimensionality reduction and frame invariant surrogates.All tested models perform better than the precalibrated mesomodel, with a gap of more than one order of magnitude for the best performing surrogates.Interestingly, models with Melro-based decoders seem to learn as fast and be as flexible as models with elastic decoders, already for the monotonic curves in the validation dataset.This suggests that the new decoder does not impose extra undesirable bias in learning the specific material behavior treated here other than the assumptions that had already been introduced by elasticity (e.g.symmetries and couplings encoded by the elastic stiffness tensor).Any benefits reaped when extrapolating to non-monotonic paths, as we will see in the following, are therefore obtained at a negligible price in terms of monotonic behavior accuracy.This stands in contrast with the discussion on the  2 decoder of the previous section.
Although Fig. 18 is not enough to discern between several of our encoder choices, it is interesting to take a closer look at the two clearly underperforming options.Fig. 19  → Melro for the same monotonic test path.The model with a single feature struggles to predict the entirety of the path, indicating that further reducing the dimensionality of the feature space is not possible for this dataset.The oscillatory stress predictions make this model unsuitable for online stress evaluation in a multiscale setting.For the model with plastic strain features, the feature extractor shows no plastic strains until high stress levels while in the micromodel plasticity starts much earlier, forcing the surrogate to remain in the elastic regime until a sudden jump brings it back to the expected path.
Moving to unloading-reloading paths, we compare the performance of different feature sets by plotting the average test error over the 50 unloading-reloading paths in Fig. 20(a).Here an interesting observation can be made: even the surrogate → Elastic -which cannot predict unloading at all -attains a lower test error than the precalibrated mesomodel.This apparent contradiction can be explained by plotting in Fig. 20(b) the average error computed only at unloading or reloading time steps: use of an elastic decoder -and therefore of a conventional FNN or an RNN trained with insufficient data -excels at predicting monotonic response but is consistently inaccurate for nonmonotonic paths and shows little improvement when more monotonic paths are added to the training dataset.
In contrast, the best-performing Melro models are consistently more accurate than the precalibrated mesomodel even when trained on very little data.We plot in Fig. 21 selected representative unloading paths from the test dataset for four of the surrogates.Unloading is once again well captured without having been seen during training, and since it emerges from a purely physical mechanism, it is reasonable to expect unloading at different points along the path to yield comparable results (c.f.Fig. 3(a)).Nevertheless, relatively small differences in unloading slope can still lead to large differences in stress at the end of the unloading branches.Furthermore, the model can struggle with tension-compression switches and predict spurious hysteresis loops.
Indeed, we observe a consistent inability by the models to properly predict switches between tension and compression within the same path.This becomes clear when looking at slow cycling test paths composed of several of these switches (Fig. 8(c)).We plot learning curves for the test on slow cycling paths in Fig. 22, for complete paths as well as exclusively for the non-monotonic branches of the paths.In contrast with results up until now, here we see larger differences in performance for different feature sets.As expected, elastic decoders are once again shown to be unsuitable to predict non-monotonic paths, and the difference here is even more pronounced than in for singleunloading paths (c.f.Fig. 20) as most of the path is composed of unloading/reloading branches.The model encoded with stress invariants coming from an elastoplastic feature extractor performs best among the models we test.But crucially, none of the surrogates manages to surpass the precalibrated mesomodel in this case.
As a demonstration, we select a representative path from the test dataset and plot predictions made with four different feature sets in Fig. 23.As expected, larger errors are observed for more pronounced tension-compression switches as models either over-or undershoot the stress levels at compression-tension switch points.Interestingly, most models manage to converge back to the correct stress path after reloading, since hardening behavior is completely dictated by their nonrecurrent data-driven encoders.The exception is the model with stress invariant features ( → Melro) which we plot in more detail in Fig. 24 for each stress component separately, performing significantly better than the rest but showing a number of undesired oscillations in stress response due to the (physically) recurrent nature of its features forcing its neural network encoder to operate in extrapolation.

FE 2 example
We conclude our discussion with an FE 2 demonstration using the proposed hybrid surrogate.We model the tapered macroscopic bar with geometry and boundary conditions shown in Fig. 25.The model is meshed with 1620 linear triangles with a single Gauss point each and is loaded in tension until plastic strain localization takes place.The combination of the tapered geometry with the several circular voids along the model result in a complex range of stress states throughout the model.In contrast to the cases considered so far, this example also covers non-proportional strain paths.To facilitate convergence, the substepping approach proposed in Somer et al. (2009) is employed and an adaptive stepping algorithm is used at the macroscale that automatically reduces time step size and recomputes the current increment if either the micro-or macroscopic Newton-Raphson solver fails to converge.
We use the → Melro model of the previous section as surrogate, trained on the complete set of 1500 monotonic training strain paths.The global load-displacement curve at the right edge of the model is plotted for the full-order FE 2 solution and using the hybrid surrogate in Fig. 26(a).Since we update decoder properties in an explicit fashion (i.e.once per time step, see Algorithm 1), we use a displacement increment  = 3.5 × 10 −3 mm for the approximate model, 10 times smaller than the one used for the full-order model.
As mentioned in Section 3.5, the model by Melro et al. can suffer from numerical stability issues even with fixed material properties, and it is reasonable to expect these issues to become worse when letting properties evolve with time.Indeed, with no additional stabilization the model using the network fails to converge at the point marked in Fig. 26(a).In contrast, the stabilization procedure of Section 3.5 allows for a complete path to be obtained.For this first result, we stabilize the network for 5 epochs with a learning rate of 1 × 10 −5 for the stabilization loss (Eq.( 22)) and 1 × 10 −9 for retraining on a single monotonic training path selected at random.We also consider a model with an unloading/reloading switch after the onset of macroscopic plasticity.Results are shown in Fig. 26(b).The surrogate approximates the full-order behavior fairly accurately9 and several orders of magnitude faster than the full-order model.
It is worthwhile to further comment on the obtained levels of acceleration.On average, the computation of a single monotonic training path takes approximately 250 s, and training a surrogate for 20 000 epochs with the complete 1500-path dataset takes approximately 51 h on a single computing core.This would amount to a total offline computation time of 155 h, higher than the required for the FE 2 simulations of Fig. 26.This would however not be a consistent comparison.The hybrid nature of the proposed approach allows for training accurate surrogates with only a small fraction of this complete dataset.Indeed, Fig. 18 suggests a surrogate trained on 20 paths would offer comparable performance, which would amount to a total offline computation time (dataset generation and training) of approximately 2 h.Furthermore, this offline cost does not scale with the density of the macroscopic mesh being considered or with the number of macroscopic simulations being performede.g. in a parametric study or design optimization procedure.
We now look closer on the performance of the proposed online stabilization approach.We empirically find that retraining the network until every violating material point is fully stabilized is not strictly necessary in order to achieve convergence, and therefore opting for a small number of stabilization epochs proves to be an efficient approach.It is nevertheless interesting to investigate the impact of the number of stabilization epochs and of the subsequent retraining minibatch on the original dataset.We solve the monotonic example of Fig. 26(a) with different numbers of stabilization/retraining epochs ranging from 2 to 100 and compute the validation loss (on the 500-path validation set used for model selection) at the end of every macroscopic time increment in order to keep track of how much the stabilized network deviates from its original pretrained state.
Results are shown together with the corresponding load -displacement curves in Fig. 27.All curves remain stable at first, as stabilization is only triggered when the first unstable points are detected.From that point, models which do not undergo retraining after stabilization lose accuracy at a rate proportional to the number of stabilization epochs.However, this unintuitively does not lead to improved global stability: the loss of accuracy by the surrogate leads to spurious global softening (c.f.Fig. 27(b)) which in turn leads to further need for stabilization.Models stabilized for 50 and 100 epochs continuously fail to converge and we opt for terminating the simulation after 100 canceled time increments.On the other hand, models retrained with as little as a single strain path (out of the original 1500) after each stabilization epoch are able to maintain the original model accuracy while offering enough stability gains to allow the simulation to converge until the final step, with little change in global behavior for different stabilization regimes.More insight can be obtained on the different stabilization strategies by plotting the cumulative execution time of the simulation and the cumulative number of detected unstable strain states with time increments for different numbers of stabilization epochs.Results can be seen in Fig. 28.In general, simulations without retraining tend to run faster and result in improved stability, although any gains are quickly overshadowed by losses in accuracy (c.f.Fig. 27).Stabilizing for more epochs results in a reduction in the total number of unstable points detected, but beyond 5 epochs this does not result in an overall reduction in the computational cost of the simulation given the increased effort spent on individual stabilization operations.
As one final result, we run the monotonic simulation with the hybrid surrogate for different time step sizes.As previously mentioned, the hybrid approach allows for explicit update of  within an implicit simulation by obtaining the tangent stiffness matrix directly from the decoder.This however introduces a time step size dependency whose impact merits investigation.We plot in Fig. 29 predictions with step sizes spanning four orders of magnitude, including the same one used to obtain the full-order response.The combination of the explicit property update with the online stabilization procedure indeed introduces an upper bound for time step size for this specific problem.It stands to reason that the sensitivity to time step size also depends on the choice of decoder and on which material properties are included in .Further investigation into the matter in future works is therefore warranted.

Conclusions
In this paper, we propose a hybrid surrogate modeling architecture   for multiscale modeling of heterogeneous materials.The model is composed of a data-driven encoder for material properties and a physicsbased decoder that computes stresses.In the resulting architecture, the encoder increases the flexibility of existing material models by letting their properties evolve in time, while the decoder provides beneficial bias and interpretability to the model.The model is conceived with flexibility in mind, allowing existing implementations of physics-based material models to be used with no extra modifications.Furthermore, by letting the decoder directly receive strain inputs, the encoder architecture is highly flexible and allows for preservation of frame independence.A semi-explicit online prediction algorithm is also proposed that allows for imposing extra constraints to model behavior in a semi-supervised way.
We demonstrate the architecture by reproducing pressure -dependent elastoplastic behavior coming from homogenized fiberreinforced composite micromodels.The simple model with a linearelastic decoder learned faster than conventional data-driven surrogates, allowed for lossless feature space dimensionality reduction through the use of strain invariants, and was able to approximate path-dependent behavior through a simple history-aware feature extractor.Models with perfectly-plastic  2 decoders were shown to successfully learn nonlinear hardening and pressure dependency and predict unloading-reloading while being trained exclusively on monotonic data, outperforming a state-of-the-art mesomodel for composites in accuracy for arbitrary loading directions.Employing as decoder the same plasticity model used at the microscale led to highly-accurate monotonic response and fairly accurate extrapolation to unloading/reloading behavior.Finally, the model was used to solve a complex FE 2 model and the benefit of the online stabilization procedure was demonstrated.
We find the approach to be a promising new way to build hybrid surrogates which therefore merits further research on a number of fronts.The current architecture is not by construction concerned with enforcing unconditional thermodynamic consistency or other physical constraints of interest.Although we do find empirically that welltrained surrogates with thermodynamically consistent decoders tend to perform well, some constitutive models might not be suitable for having their properties evolve in time.Fortunately, the framework can cope with extra constraints without necessarily giving up on its flexibility, by enforcing them locally through online retraining.Although training exclusively on monotonic paths already allows for path dependency to be fairly well captured, some decoders might perform better in extrapolation if trained with a (small) number of extra non-monotonic and non-proportional strain paths -for instance when encoder and decoder can each explain the same phenomenon on their own (e.g.pressure dependency in the model by Melro et al.).We also foresee combining the present approach with the one in Maia et al. (2023) into a unified family of flexible hybrid surrogates with a range of possible combinations of feature extractors for physics-rich time convolution, fixed-property models with learned strain distributions and evolving material models.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1 .
Fig. 1.The hybrid surrogate combining a data-driven encoder for material parameters and a physics-based material model decoder.

Fig. 2 .
Fig. 2.An LSTM recurrent neural network as surrogate for 1D path-dependent material behavior trained with only monotonic data.
Fig. 1 is unrolled in time for a number of consecutive time steps and represented as a graph showing the dependencies between variables.Filled and hollow nodes represent observed and latent variables, respectively, and are color coded to represent the different model components in Fig. 1.Similar to the

Fig. 4 .
Fig. 4. Graph representation of the hybrid model architecture combining a data-driven encoder and a physics-based decoder.Filled circles represent observable variables and hollow circles represent latent variables.

Fig. 5 .
Fig. 5.The two types of FNN-based model architectures explored in this work, with different feature extraction steps.

Fig. 6 .
Fig. 6.The micromodel used in the examples of this work.
(b), this time also showing single standard deviation uncertainty bands coming from the variance among the 50 training datasets.The hybrid network outperforms conventional FNNs in the low data regime and tends to be less sensitive to changes in dataset starting from about 20 training paths.Nevertheless, the extra flexibility of conventional FNNs allow them to achieve lower validation errors if significantly more training paths are used.
I.B.C.M.Rocha et al.

Fig. 7 .
Fig. 7.The complete plane strain dataset used to train the surrogates, comprising 2000 monotonic strain-stress paths.A similar dataset is generated under plane stress conditions.

Fig. 8 .
Fig. 8. Examples from a test dataset with 50 paths of each type.They are not used to train any of the networks or perform model selection.

Fig.
Fig. Learning curves of models with elastic decoders and conventional FNN models.Mean error over the 500 validation monotonic paths.

Fig. 13 .Fig. 14 .
Fig. 13.Evolution of latent space properties of elastic decoder models for a slow cycling test path.

Fig. 15 .
Fig. 15.Predictions from the network with  2 decoder.Letting the yield stress evolve extends the model to more complex plasticity behavior.

Fig. 16 .
Fig. 16.Network predictions with  2 decoder for unloading paths after being trained exclusively with monotonic paths.

Fig. 17 .
Fig. 17.Examples of strain paths not well predicted by the  2 decoded model.

Fig. 24 .
Fig. 24.Slow cycling response of a Melro-decoded surrogate with stress invariant features, different stress components plotted separately.

Fig. 26 .
Fig. 26.FE 2 example: load-displacement curves with and without online stabilization, compared to the ground-truth solution.

Fig. 27 .
Fig. 27.Performance of the surrogate model for stabilization strategies of varying intensities with and without retraining after stabilization.

Fig. 28 .
Fig. 28.Impact of stabilization regime on execution time and number of unstable points throughout the simulation.

Algorithm 1 :
Material wrapper implementing the online component of the hybrid surrogate.Input: strain    at macroscopic Gauss point Output: stress   and stiffness   at macroscopic Gauss point 2 if global solver has converged : 3 store latest converged strain:   ←   ; 4 commit material history:   ←   ; 5 compute new features:   ←  (   ) ; 6 update model parameters for the upcoming time step:  ←  (   ) ; 7 if first global iteration of time step and Gauss point is unstable : 8 stabilize encoder:  ←  (    ) ; 9 recompute features:   ←  11 return   ,

Table 1
Summary of the hybrid models used in the numerical examples.