An indirect training approach for implicit constitutive modelling using recurrent neural networks and the virtual fields method

Accurate material description is crucial to achieving high-quality results in computational analysis software. Phenomenological constitutive laws generalize the material behavior observed in simple mechanical tests. The resulting empirical expressions contain parameters that need to be calibrated through an inverse optimization process. Advancements in Digital Image Correlation (DIC) techniques have enabled the extraction of non-uniform multi-axial displacement fields, facilitating the development of heterogeneous mechanical specimens and unlocking access to richer material behavior data. Despite these advancements, constitutive models are still limited by their mathematical formulations and biased by simplifying assumptions. Artificial Neural Networks (ANNs), as universal function approximators, could potentially drive a paradigm shift in the field. ANNs do not require explicit pre-formulations, hence avoiding the need to identify unknown parameters. Moreover, ANNs are able to learn patterns implicitly, purely from data, circumventing the biases induced by the simplifying assumptions of first-order principles. However, training ANNs for implicit constitutive modelling is not straightforward, given the impossibility of obtaining direct measurements of stress. This paper explores the integration of Recurrent Neural Networks (RNNs) with the Virtual Fields Method (VFM) for material modelling. This approach uses global force and displacement data to indirectly train the neural network, with equilibrium being evaluated globally through the VFM loss function. The proposed method is (i) computationally efficient, as it does not require Finite Element Analysis (FEA), (ii) compatible with both full-field measurements and numerically generated data, and (iii) able to handle experimental boundary conditions.


Introduction
The widespread adoption of computational analysis software in engineering, in particular Finite Element Analysis (FEA) software, has reinforced the need for precise material data, hence drawing increased attention to material characterization [1,2]. Traditionally, constitutive laws are derived from phenomenological observations and physical knowledge and, therefore, generalize the behavior observed in basic mechanical tests [3,4]. The developed expressions contain empirical parameters that require calibration through an inverse optimization process, aimed at minimizing the discrepancy between experimental tests and numerical simulations [4]. For decades, constitutive parameter identification was limited to one-dimensional homogeneous deformation data, due to the restrictions imposed on specimen shapes and loading settings and the lack of high-resolution measurement equipment. In such conditions, an extensive number of experiments was often required to calibrate more complex models, making the process expensive and time-consuming [5]. In recent years, the introduction of Digital Image Correlation (DIC) techniques has made it possible to extract non-uniform multi-axial displacement fields at a large number of measurement points, unlocking the development of heterogeneous mechanical specimens capable of yielding richer material behavior data (e.g. [6][7][8][9], among others). Additionally, the ability to directly link the full-field data to FEA's discretized fields enabled more sophisticated approaches for the identification of constitutive parameters, such as the Finite Element Model Updating (FEMU) [10] and the Virtual Fields Method (VFM) [11]. Despite the previously mentioned advancements, phenomenological constitutive models are still constrained by their mathematical formulation and are generally biased, due to the simplifying assumptions of first-order principles [12]. Moreover, a given model cannot guarantee good results in settings different from those used to calibrate it [13].
The emergence of Big Data and the exponential increase in computer processing power have led to unprecedented growth in various fields of Artificial Intelligence (AI), notably Machine Learning (ML) [14]. Among a plethora of ML algorithms, Artificial Neural Networks (ANNs) have risen to prominence as universal function approximators [15], offering superior modelling performance across a broad spectrum of applications. ANNs have the potential to lead a paradigm shift in constitutive modelling [16]. A key advantage is that they do not require postulating explicit formulations, eliminating the need to identify unknown constitutive parameters [17]. Furthermore, ANNs can effectively leverage vast amounts of data from full-field measurements, learning patterns implicitly and, hence, mitigating the biases inherent to conventional constitutive models, given sufficient data and model capacity [12]. The applicability of ANNs to constitutive modelling was first explored by Ghaboussi et al. [18], who demonstrated promising results with a Feed-Forward Neural Network (FFNN) trained to capture the cyclic behavior of concrete. This laid the foundations for subsequent developments and the expansion to other materials, with an early application to metal behavior attributed to Furukawa and Yagawa [19], who extended the concept to viscoplasticity using a state-space representation. Notable foundational contributions also include the implementation of an ANN material model in a FEA code [20] and the introduction of an analytical tangent stiffness matrix for ANN-based material models [21]. Advancements in computational resources enabled the exploration of more sophisticated architectures, such as Recurrent Neural Networks (RNNs). RNNs are well suited for implicit constitutive modelling due to built-in memory cells that capture path-dependency effects [22].
Most approaches reported in the literature for ANN-based material modelling fall within the standard supervised learning paradigm. The training process relies on labelled data pairs, with strain tensors typically provided as input features to predict the corresponding stress state. Depending on the problem, internal variables of the traditional constitutive model (e.g. accumulated plastic strain, isotropic or kinematic hardening variables, among others) may additionally be used, either as features or outputs. However, these internal variables exist only numerically, lacking real experimental meaning. Moreover, stress cannot be measured in complex heterogeneous mechanical specimens. Even in simple uni-axial homogeneous tests, stress is obtained from indirect calculations and assumptions, such as theoretical boundary conditions. In a context where real experimental data is used to train the ANN model, this limitation requires the training process to be carried out indirectly. Therefore, the training dataset must be composed of measurable variables only, such as displacements and/or strains and global force. During training, the variable of interest, usually the stress, is not directly taken into account in the computation of the loss function and may be indirectly obtained from intermediate calculations. Within the framework of indirect training, this paper proposes an innovative approach that couples the VFM with a RNN for material modelling. This allows the use of global force and displacement data to indirectly train the neural network, with equilibrium being evaluated globally through the VFM loss function. The main advantages of this approach are: (i) the computational efficiency gained from not using FEA, (ii) the compatibility with both full-field measurements and numerically generated data, and (iii) the independence from the boundary conditions.
The paper is organized as follows. Section 2 provides an introduction to RNNs with gated recurrent units (GRUs) and the corresponding mathematical background. In Section 3, the VFM is introduced together with its theoretical background, and the coupled RNN-VFM indirect training methodology is presented. Section 4 provides details about the numerical implementation, data generation and training procedures. Finally, in Section 5, the indirect training methodology is validated against the standard supervised approach. Results from the RNNs' inference on the validation data and the corresponding FEA implementations as material models are presented and discussed.

Introduction
Modelling path-dependent plasticity using ANNs is a challenging task due to the history-dependency effect [22]. Standard feed-forward architectures are unable to capture the phenomenon, as there is no mechanism to establish a relationship between the inputs at the current time step and the outputs from previous time steps [23,24]. The interplay between past and present states is a suitable use case for RNNs. These networks have cyclic connections that allow them to maintain a memory of previous inputs (see Figure 1b), making them ideal for tasks involving sequential or time-dependent data [25], in which material behavior modelling is included. Within this scope, using strain sequences of length k, up to the current time step t, a RNN can capture temporal dependencies, allowing it to predict the corresponding stress state accurately. These capabilities have resulted in numerous works reported in the field [22][23][24][26][27][28][29].

Mathematical formulation
To take advantage of RNNs for implicit constitutive modelling, a variant known as Gated Recurrent Units (GRUs) is employed in the present work. GRUs are a streamlined version of Long Short-Term Memory (LSTM) units, designed to capture dependencies in sequential data efficiently, while mitigating some of the computational complexity [30]. As observed in Figure 1a, a basic GRU cell comprises the following key components [30,31]:

1. Update gate (z_t) - determines how much is retained from the previous hidden state and how much it should be updated with new information:

z_t = sig(W_iz x_t + b_iz + W_hz h_(t-1) + b_hz)

2. Reset gate (r_t) - specifies which information from the previous hidden state should be forgotten:

r_t = sig(W_ir x_t + b_ir + W_hr h_(t-1) + b_hr)

3. New gate (n_t) - computes a new candidate hidden state based on the input and previous hidden state:

n_t = tanh(W_in x_t + b_in + r_t ⊙ (W_hn h_(t-1) + b_hn))

4. Final state (h_t) - combines the previous hidden state and the new memory content, controlled by the update gate:

h_t = (1 − z_t) ⊙ n_t + z_t ⊙ h_(t-1)

The variables h_t and x_t are, respectively, the hidden state and the input at time t, while h_(t-1) is the hidden state at time t − 1, or the initial hidden state at t = 0. The entities sig and tanh are, respectively, the sigmoid and the hyperbolic tangent activation functions, and the operator ⊙ is the Hadamard product. The weight matrices W and bias vectors b are appended with double indices identifying the mapping between the different components of the GRU cell. A deeper description of these mathematical entities and their corresponding dimensions, in the context of the present work, is provided in Table 1.
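As a minimal numerical sketch of the four gate equations above (with hypothetical small dimensions and randomly initialized weights, purely for illustration), a GRU step can be written directly in NumPy:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_cell(x_t, h_prev, W, b):
    """One GRU step; W and b hold the six weight matrices / bias vectors,
    double-indexed by (input|hidden, gate) as in the text."""
    # Update gate: how much of the previous hidden state is kept
    z = sigmoid(W["iz"] @ x_t + b["iz"] + W["hz"] @ h_prev + b["hz"])
    # Reset gate: which parts of the previous hidden state are forgotten
    r = sigmoid(W["ir"] @ x_t + b["ir"] + W["hr"] @ h_prev + b["hr"])
    # New gate: candidate hidden state (reset gate modulates the history)
    n = np.tanh(W["in"] @ x_t + b["in"] + r * (W["hn"] @ h_prev + b["hn"]))
    # Final state: convex combination controlled by the update gate
    return (1.0 - z) * n + z * h_prev

rng = np.random.default_rng(0)
n_in, n_h = 3, 8                       # e.g. 3 strain components, 8 hidden units
W = {k: 0.1 * rng.normal(size=(n_h, n_in if k[0] == "i" else n_h))
     for k in ("iz", "ir", "in", "hz", "hr", "hn")}
b = {k: np.zeros(n_h) for k in W}
h = np.zeros(n_h)                      # initial hidden state at t = 0
for x in rng.normal(size=(5, n_in)):   # unroll over a short strain sequence
    h = gru_cell(x, h, W, b)
```

Since the new gate is bounded by tanh and the final state is a convex combination, the hidden state components remain in (−1, 1) when initialized at zero.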
The hidden state h is the key element behind the flow of information relating previous time steps and future predictions.In a way analogous to the internal variables of classical constitutive laws, h carries the information about the current plastic state of the material [29].The GRU cells can be combined with layers of different types in order to build specific architectures.
Figure 1: Schematic representation of (a) a standard GRU cell and (b) an unrolled recurrent neural network.

Introduction
The Virtual Fields Method (VFM) is a state-of-the-art technique employed in the inverse identification of phenomenological material models. First introduced by Grédiac [32], the VFM is renowned for its computational efficiency [5], with the main advantage of not requiring the Finite Element Method (FEM) for any forward calculations, thus avoiding the modelling of boundary conditions and the specimen's geometry [33,34]. The method is based on the Principle of Virtual Work (PVW), which states that the internal virtual work, resulting from the internal forces, must be equal to the external virtual work performed by the external forces on the surface S, in the absence of body forces, such that [11,35]:

−∫_V σ : ε* dV + ∫_S T · u* dS = 0, (5)

where σ is the stress tensor, ε* is the virtual strain, u* is the virtual displacement, V is the volume of the solid and T is the vector of loading tractions. The virtual displacements are mathematical test functions, with no real physical meaning and independent of the measured displacements, that satisfy the equilibrium of Equation 5 [35]. The equilibrium holds for any virtual displacement u* as long as it is continuous, piece-wise differentiable over the specimen's body and kinematically admissible, in order to satisfy the displacement boundary conditions. The virtual strains ε* are derived from the virtual displacements u* using the strain-displacement relationship [11]:

ε* = (1/2) (∇u* + (∇u*)ᵀ). (6)

Besides the requirements previously enumerated, there are no specific rules for the choice of virtual fields, leaving the user with an infinity of fields to choose from [34]. Nevertheless, it is convenient to find an expression such that the virtual displacement u* vanishes or is constant along the boundary of prescribed displacements [11,33]. The reason behind this simplification is the fact that the traction distribution is often unknown and only the resultant force F is available from the load cell. By doing so, the set of possible choices for the virtual fields is narrowed down and the computation of the external virtual work W*_ext is simplified, such that [36]:

W*_ext = F · ū*, (7)

where ū* is the constant virtual displacement along the loaded boundary. It is worth noting that the strain measurements from full-field techniques are only available at the surface of the specimen. Thus, an assumption for the through-thickness behavior of the material is necessary [34]. Since thin specimens are often employed in mechanical testing, it is common to reduce the problem to two dimensions by assuming a plane stress state, although different assumptions are possible. In that case, for a thin specimen of thickness h, Equation 5 may be modified into [33]:

−h ∫_S σ : ε* dS + h ∫_∂S T · u* dl = 0, (8)

where S and ∂S are, respectively, the surface and the boundary of the specimen. In full-field measurements the displacements are captured using DIC techniques at a very large number of points or subsets; as such, the integrals in Equation 8 may be approximated as discrete sums for any time stage t. Keeping that in mind, and assuming a suitable set of virtual fields is used, the identification process is carried out by minimizing a loss function, with respect to the constitutive parameters x, defined as follows:

φ(x) = Σ_(i=1)^(n_VFs) Σ_(t=1)^(n_t) [ −h Σ_(j=1)^(n_pts) σ(x, t)_j : ε*_j^(i) S_j + F(t) · ū*^(i) ]², (9)

where S_j is the area of the j-th element and n_VFs, n_t, n_pts are the numbers of virtual fields, time stages and measurement points, respectively. The choice of virtual fields constitutes a key point of the VFM, with a great impact on the final solution. These virtual entities are often user-defined; nonetheless, systematic procedures are available to automatically define them, namely the stiffness-based and the sensitivity-based virtual fields [33]. It is important to note that the User-Defined Virtual Fields (UDVFs) are tied to the user's own expertise and intuition and do not evolve in time.
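The discrete sum underlying the VFM loss can be sketched as follows. This is a minimal illustration for a single virtual field and a single time stage, assuming plane stress and hypothetical arrays for the stress components, virtual strains and element areas:

```python
import numpy as np

def pvw_residual(stress, eps_star, areas, thickness, force, u_star_edge):
    """Gap between internal and external virtual work for one virtual field
    at one time stage (discrete sum over measurement points, plane stress).
    stress, eps_star : (n_pts, 3) arrays of [xx, yy, xy] components
    areas            : (n_pts,) element areas
    force, u_star_edge : resultant load and the (constant) virtual
                         displacement on the loaded boundary."""
    # double contraction of symmetric 2D tensors: shear counted twice
    w = np.array([1.0, 1.0, 2.0])
    w_int = thickness * np.sum(areas * np.sum(w * stress * eps_star, axis=1))
    w_ext = force * u_star_edge
    return w_int - w_ext

rng = np.random.default_rng(1)
n_pts = 100
stress = rng.normal(size=(n_pts, 3))     # hypothetical stress field (MPa)
eps_star = rng.normal(size=(n_pts, 3))   # hypothetical virtual strains
areas = np.full(n_pts, 0.27)             # hypothetical element areas (mm^2)
res = pvw_residual(stress, eps_star, areas, 1.0, 50.0, 1.0)
loss = res ** 2                          # squared, then summed over VFs and stages
```

The full loss accumulates these squared residuals over all virtual fields and time stages, as in the equation above.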

Sensitivity-based virtual fields
The concept behind the sensitivity-based virtual fields is to apply a perturbation to each of the model's parameters in order to obtain a stress sensitivity and its temporal evolution [33]. The stress field is the only quantity that depends directly on the constitutive parameters in the VFM, so the stress sensitivity maps highlight areas that strongly depend on a given parameter [37]. The spatial sensitivity of the stress to each model parameter may be computed as:

δσ^(i)(t) = σ(x, t) − σ(x^(i), t),

with i being the i-th term of the constitutive model parameters x, x^(i) the parameter set with its i-th component perturbed, and t the time stage. The virtual displacements u* are found by solving the following system of equations applied to a virtual mesh [37]:

B u*^(i)(t) = δσ^(i)(t),

where B is the global strain-displacement matrix, used to map the virtual displacements at the nodes to every virtual strain. Considering a virtual mesh of bi-linear quadrilateral elements, B results from the assembly:

B = A_(e=1)^(n_e) B_e,

where n_e is the number of elements in the mesh and B_e is the element strain-displacement matrix, defined through the spatial derivatives of the standard bi-linear nodal shape functions N_e in the coordinate space (x, y), such that [38]:

B_e = [ ∂N_e/∂x 0 ; 0 ∂N_e/∂y ; ∂N_e/∂y ∂N_e/∂x ].

If the displacement is prescribed at the boundaries, the traction is unknown and, as such, the virtual displacements must be set to zero there. On the other hand, if the traction distribution is unknown but the resultant force is known, a constant virtual displacement is set at the boundaries. By applying these constraints, a modified global strain-displacement matrix B̃ is obtained and the virtual displacements u* can finally be computed as follows [33]:

u*^(i)(t) = pinv(B̃) δσ^(i)(t),

with pinv(B̃) being the pseudo-inverse of the modified strain-displacement matrix. The virtual strains ε* are then computed as:

ε*^(i)(t) = B̃ u*^(i)(t).

The identification process is carried out with the virtual fields u*^(i) and ε*^(i) being updated multiple times in order to minimize the loss resulting from enforcing the PVW, within an infinitesimal strain framework, as follows:

φ(x) = Σ_(i=1)^(n_VFs) Σ_(t=1)^(n_t) ( α^(i) [ −h Σ_(j=1)^(n_pts) σ(x, t)_j : ε*_j^(i)(t) S_j + F(t) · ū*^(i)(t) ] )²,

with S_j being the surface area of the j-th measurement point and α^(i) the scaling factor of the i-th virtual field, employed to guarantee similar magnitudes between different virtual fields.
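The pseudo-inverse step can be sketched in a few lines. This is a toy illustration with a hypothetical (randomly generated) modified strain-displacement matrix and a flattened stress-sensitivity vector, standing in for the assembled quantities described above:

```python
import numpy as np

rng = np.random.default_rng(2)
n_dof, n_strain = 8, 12                      # hypothetical virtual-mesh sizes
B_mod = rng.normal(size=(n_strain, n_dof))   # modified strain-displacement matrix
s_i = rng.normal(size=n_strain)              # flattened stress-sensitivity map

# virtual displacements in the least-squares sense: u* = pinv(B~) * delta_sigma
u_star = np.linalg.pinv(B_mod) @ s_i
# the corresponding virtual strains follow from the same matrix
eps_star = B_mod @ u_star
```

Because the pseudo-inverse gives the least-squares solution, the resulting virtual strains are the closest (in the Euclidean sense) realizable match to the stress-sensitivity map under the boundary constraints embedded in B̃.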

Coupled RNN-VFM for indirect training
Within the framework of indirect training, only a handful of works have been reported in the literature. For example, Xu et al. [39] proposed a symmetric positive definite neural network (SPD-NN) to predict the Cholesky factors of the tangent stiffness, with the stress tensor being computed in an incremental form from intermediate steps. Although the method provides numerically stable solutions, it applies exclusively to the non-linear segment of the material response, and plasticity has to be identified through a transition function, relying on an estimate of the yield stress given by the user. Both Huang et al. [40] and Xu et al. [41] proposed a method using an ANN to replace the constitutive model in a FEA solver. The solver imposes the physical constraints during training, enabling the ANN to learn the material behaviour. Similarly, Liu et al. [42] proposed integrating an ANN with FEA to form a coupled mechanical system, with inputs and outputs being determined at each incremental step. These approaches work in a way similar to the FEMU method and, while the physical correctness of the network's predictions is guaranteed, the first drawback is the need to write a FEM code based on an automatic differentiation package (e.g., PyTorch or TensorFlow), making the FEA step time-consuming. A second drawback is the need to run multiple FEA simulations per epoch, rendering the ANN training computationally demanding. In the domain of hyperelasticity, Huang et al. [40] proved that an ANN could be made thermodynamically consistent by predicting the hyperelastic energy instead of the stress tensor, the latter being computed by automatic differentiation. A similar, earlier approach was shown by Man and Furukawa [43], using an energy-based loss function to indirectly train an ANN. Thakolkaran et al. [3] used an ensemble of Input Convex Neural Networks (ICNNs) to successfully learn the constitutive response of a set of well-known hyperelastic constitutive models, resorting only to full-field data. Still in the framework of hyperelasticity, in a recent work by Jeong et al. [44], an ensemble of four independent Physics-Informed Neural Networks (PINNs) was employed to extract the material behavior from full-field data, with physical constraints being encoded within a multi-objective loss. Despite the promising results and generalization capacity shown by the last two approaches, both resort to training several independent neural network models, which adds complexity. In another recent work in the domain of hyperelasticity, Franke et al. [45] employ an advanced energy-momentum scheme for time discretization to improve the performance of a Physics-Augmented Neural Network (PANN) in dynamical simulations, ensuring energy and momentum conservation. The authors report excellent results; however, it is not clear whether their approach is applicable to real experimental data. Finally, notable contributions by Flaschel et al. aimed at the unsupervised discovery of an explicit closed-form expression of the yield surface for a certain category of materials [46], later extended to a general class of materials [47]. These approaches ensure physically consistent solutions; however, the authors still resort to several assumptions about the existence of internal history variables and the explicit formulations of the models.
In the current work, a novel approach is proposed by coupling the VFM with a single GRU-based RNN model (Figure 2). By doing so, one can use force and displacement data to indirectly train the neural network. The strains can be obtained from the spatial gradient of the displacements and fed as inputs to the RNN material model M, which provides the stress components. Then, during training, the equilibrium can be evaluated globally by the application of the PVW. The network's parameters are updated until the gap between the internal and the external parts of the virtual work is minimized. This methodology benefits from all the advantages of the VFM, while being compatible with experimental data coming from full-field measurements, and allows the full elastoplastic material response to be predicted employing only one RNN model. The possibility of integrating the VFM with neural network training had previously been shown by the authors, in a very preliminary form, in a short conference paper [16]. Despite its purely academic nature, that work allowed the identification of many key areas of improvement, effectively laying the foundation for the more solid approach presented hereby.
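One indirect-training step of this coupling can be sketched as follows. This is a hedged, self-contained sketch, not the authors' implementation: the data shapes, a single virtual field, and the model class are all hypothetical, but the flow (strain sequences in, stress out, PVW gap as loss, gradients back into the RNN) matches the description above:

```python
import torch

class RNNMaterial(torch.nn.Module):
    """Hypothetical GRU-based material model: strain sequence -> stress."""
    def __init__(self, hidden=32):
        super().__init__()
        self.gru = torch.nn.GRU(3, hidden, num_layers=2, batch_first=True)
        self.fc = torch.nn.Linear(hidden, 3)

    def forward(self, eps_seq):            # (n_pts, k, 3) strain sequences
        out, _ = self.gru(eps_seq)
        return self.fc(out[:, -1])         # stress at the current stage

model = RNNMaterial()
opt = torch.optim.Adam(model.parameters(), lr=5e-3, weight_decay=1e-3)

# one training step on hypothetical data for a single virtual field
eps_seq = torch.randn(100, 4, 3)           # strain sequences at each point
eps_star = torch.randn(100, 3)             # virtual strains at each point
areas = torch.full((100,), 0.27)           # element areas
force, u_star_edge, thickness = 50.0, 1.0, 1.0

opt.zero_grad()
stress = model(eps_seq)                    # RNN stress predictions
w = torch.tensor([1.0, 1.0, 2.0])          # double-contraction weights
w_int = thickness * torch.sum(areas * torch.sum(w * stress * eps_star, dim=1))
loss = (w_int - force * u_star_edge) ** 2  # squared PVW gap
loss.backward()                            # gradients flow through the RNN
opt.step()
```

Because the loss is built only from the predicted stress, the virtual fields and the measured force, no stress labels are ever required.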

The coupled workflow (Figure 2) comprises three steps: (1) evaluate the RNN material model, (2) compute the virtual fields, and (3) apply the PVW condition.

Numerical DIC database
Synthetic data to train the RNN models was gathered from a set of FEA simulations of a displacement-driven heterogeneous biaxial test, using a cruciform specimen with a thickness h = 1 mm and the dimensions specified in Figure 3a [36]. Due to geometric symmetries, only one fourth of the specimen was modelled. Symmetry boundary conditions were defined at the edges y = 0 and x = 0, with displacements being prescribed at each arm of the specimen, as depicted in Figure 3b. Several virtual experiments were generated, with displacements of different magnitudes (in mm) being prescribed in paired combinations from the set: (u_x, u_y) ∈ {0.5, 1.0, 1.5, 2.0, 2.5, 3.0}. The FEA simulations were carried out with Abaqus Standard software, assuming plane stress conditions and a small strain formulation. Regarding the material behavior, isotropic hardening was considered, with the flow stress evolving according to Swift's law:

σ_y = K (ε_0 + ε̄_p)^n, with ε_0 = (σ_0 / K)^(1/n),

where σ_y is the flow stress, ε̄_p the equivalent plastic strain, σ_0 the initial yield stress, and K and n are material-specific hardening parameters. For the purpose, a soft steel material was considered, characterized by the set of constitutive parameters listed in Table 2.
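Swift's hardening law can be evaluated in a couple of lines. The parameter values below are illustrative placeholders, not the paper's Table 2:

```python
def swift_flow_stress(eps_p, sigma0, K, n):
    """Swift hardening: sigma_y = K * (eps0 + eps_p)**n, with eps0 chosen
    so that sigma_y(0) = sigma0 (i.e. eps0 = (sigma0 / K)**(1 / n))."""
    eps0 = (sigma0 / K) ** (1.0 / n)
    return K * (eps0 + eps_p) ** n

# illustrative soft-steel-like parameters (assumed): sigma0, K in MPa
sigma_y = swift_flow_stress(0.05, 160.0, 565.0, 0.26)
```

At zero equivalent plastic strain the expression reduces to σ_0, and the flow stress grows monotonically with ε̄_p.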
The geometry was discretized using CPS4R elements and the mesh was refined according to a convergence study, from which an average element size of 0.52 mm was chosen, resulting in a total of 2121 elements. Similar refinement levels are cited in the literature for this type of specimen ([36,49], among others). The reduced integration elements were chosen in order to simulate DIC subsets from full-field measurements; thus, all the necessary field output variables were extracted at the centroid of the element. For each time step, the strain and stress tensors were extracted for all the elements and the resultant force applied at each of the specimen's arms was computed from the equilibrium of the internal forces, through the following:

F = (h / l) Σ_(j=1)^(n) σ_j A_j,

where F is the global force along a given arm, σ_j the corresponding normal stress component in the j-th element, l the length of the specimen, n the number of elements, A_j the area of the j-th element and h the thickness. From the generated dataset, a total of 36 virtual experiments was randomly selected and split in half, such that one part was used for training and the remaining part for validation. The training set was split such that 80% of the included virtual experiments were used for training and the remaining 20% for testing. Scaling of the deformation input data was performed by first computing the mean and variance of each component of the strain tensor in the training dataset, and then applying the following transformation (i.e. mean normalization):

ε′ = (ε − μ) ⊘ s,

where ε′ is the scaled strain tensor, μ = [μ_εxx μ_εyy μ_εxy] is the vector of mean values, s = [s_εxx s_εyy s_εxy] is the vector of variances of the strain tensor components, and ⊘ denotes component-wise division. The statistical variables μ and s were saved in order to later apply the same transformation to the test set and during inference on the validation set, prior to organizing the data into strain sequences. For each virtual experiment, the inputs have dimensions [n_pts × n_t × k × 3], where n_pts is the number of measurement points, i.e. the number of elements, n_t is the number of time stages, k the RNN sequence length and the last dimension is the number of strain components.
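The scaling and sequencing steps can be sketched together. This is an illustrative implementation with hypothetical small array sizes; following the text, the strains are divided by the stored variance of each component:

```python
import numpy as np

def scale_and_sequence(strain, mu, s, k=4):
    """Mean-normalize strain histories and unfold them into sliding-window
    sequences of length k, zero-padded before the first time stage.
    strain : (n_pts, n_t, 3) -> returns (n_pts, n_t, k, 3)."""
    scaled = (strain - mu) / s                    # component-wise normalization
    n_pts, n_t, _ = scaled.shape
    # prepend k-1 zero stages so the first sequences are well defined
    padded = np.concatenate([np.zeros((n_pts, k - 1, 3)), scaled], axis=1)
    return np.stack([padded[:, t:t + k] for t in range(n_t)], axis=1)

rng = np.random.default_rng(3)
strain = rng.normal(size=(10, 6, 3))              # hypothetical small dataset
mu = strain.reshape(-1, 3).mean(axis=0)           # statistics from training data
s = strain.reshape(-1, 3).var(axis=0)
seq = scale_and_sequence(strain, mu, s)           # (10, 6, 4, 3)
```

The saved statistics (mu, s) would then be reused unchanged on the test and validation sets.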

ANN architecture
An implicit material model was developed based on a RNN consisting of an array of two GRU layers, followed by a fully-connected hidden layer with 32 neurons. The recurrent architecture processes strain data in the form of time sequences, unfolded using a sliding window of width k = 4 time steps. Each sequence takes into account the strain components of the three previous time stages up to, and including, the current stage t. For each mechanical test, a padding of zeros was prepended to the initial time stage, allowing the strain sequencing to be initiated from that point and avoiding loss of information for the first time steps. A schematic representation of the architecture is depicted in Figure 4.
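A sketch of the described architecture in PyTorch is given below. The GRU hidden size and the activation are assumptions where the text does not fix them; only the two stacked GRU layers, the 32-neuron fully-connected layer and the k = 4 windows are taken from the description:

```python
import torch

class GRUMaterialModel(torch.nn.Module):
    """Two stacked GRU layers followed by a fully-connected hidden layer
    with 32 neurons and a 3-component stress output (sketch)."""
    def __init__(self, n_strain=3, n_hidden=32, n_stress=3):
        super().__init__()
        self.gru = torch.nn.GRU(n_strain, n_hidden, num_layers=2,
                                batch_first=True)
        self.fc = torch.nn.Linear(n_hidden, 32)
        self.out = torch.nn.Linear(32, n_stress)

    def forward(self, eps_seq):                 # (batch, k, 3) sequences
        h_seq, _ = self.gru(eps_seq)
        h = torch.relu(self.fc(h_seq[:, -1]))   # hidden state at current stage
        return self.out(h)                      # stress components [xx, yy, xy]

model = GRUMaterialModel()
sigma = model(torch.randn(66, 4, 3))            # a mini-batch of k=4 sequences
```

Each forward pass maps a batch of strain sequences to the stress state at the final (current) time step of each window.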

Training procedure
Two GRU-based RNN models with identical architectures and hyperparameters were trained in a Python environment using the PyTorch library. A first model (Dir-RNN) was trained employing a standard supervised learning approach, with labelled data. A second model (Ind-RNN) was trained employing the indirect training methodology based on the VFM. A transfer learning approach was initially used to validate the indirect training methodology and check the correct implementation of the VFM workflow. By doing so, the Ind-RNN model is expected to show faster convergence and lower error [50], benefiting from the prior knowledge gained from direct training. More importantly, the loss should not degenerate if the integration of the VFM proves suitable for RNN training. For the Ind-RNN, the set of user-defined virtual fields from Table 3 was selected. For each virtual field, the variables W and H represent the width and the height of the specimen, respectively, while X and Y are the material coordinates in the reference configuration. According to [36], the first two virtual fields allow each component of the external virtual work, along the x- and y-directions, to be isolated, thus enabling the use of the corresponding components of the global force acting on each arm of the specimen. The third virtual field activates all the components of stress when calculating the internal virtual work, while discarding the contribution of the external virtual work. A graphical representation of these virtual fields is available in Figure 5.
Table 3: Set of user-defined virtual fields selected to train the Ind-RNN model.

Training was carried out using the Adam optimizer with a weight decay factor of 10⁻³. A linear warmup scheduler was employed in order to increase the learning rate from an initial value of 10⁻³ up to 5 × 10⁻³ within the first five epochs. From that point onward, a gradual decay was applied according to a cosine annealing scheduler defined as [51,52]:

η_t = η_min + (1/2)(η_max − η_min)(1 + cos(π T_cur / T_max)),

where η_t is the current learning rate, η_min and η_max are the minimum and maximum learning rates, and T_cur and T_max are the current iteration and the maximum number of iterations, respectively. The maximum learning rate η_max was set to 5 × 10⁻³ and the minimum learning rate η_min to 10⁻³. Before each epoch, the order by which the virtual experiments were processed was shuffled. The corresponding input data was also shuffled along the first dimension and fed into the network in the form of small mini-batches of size [66 × n_t × k × 3], due to GPU memory limitations. The training process was set to run for a maximum of 10000 epochs; nonetheless, an early-stopping criterion was defined in order to prevent overfitting, such that training would be interrupted if no further improvement was registered in the test loss function L after 500 epochs. For the Dir-RNN model, the loss was based on the mean squared error between the RNN predicted stress, σ̂ = M(θ, ε), and the target stress σ, defined as:

L(θ) = (1/n) Σ_(i=1)^(n) (σ̂_i − σ_i)²,

where n is the number of training samples. For the Ind-RNN model, the loss function from Equation 9 was slightly adapted to the context of implicit constitutive modelling with RNNs, where the set of trainable parameters θ can be seen as analogous to the classical model parameters x, such that the stress σ(x, t) in Equation 9 is replaced by the RNN prediction M(θ, ε) and the loss is minimized with respect to θ. According to Martins et al. [36], the VFM-based loss should be normalized by the external virtual work. However, virtual field 3 does not contain that contribution; hence, the loss has to be normalized by the maximum value of the internal virtual work.
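The learning-rate schedule described above (linear warmup followed by cosine annealing) can be written as a small function; the warmup length and the learning-rate limits below are the values stated in the text:

```python
import math

def lr_schedule(epoch, warmup=5, eta_min=1e-3, eta_max=5e-3, t_max=10000):
    """Linear warmup from eta_min to eta_max over the first `warmup` epochs,
    then cosine annealing back down towards eta_min."""
    if epoch < warmup:
        # linear ramp: eta_min at epoch 0, eta_max reached at epoch == warmup
        return eta_min + (eta_max - eta_min) * epoch / warmup
    t_cur = epoch - warmup
    return eta_min + 0.5 * (eta_max - eta_min) * (
        1.0 + math.cos(math.pi * t_cur / t_max))
```

In practice the same behavior could be obtained by chaining PyTorch's built-in `LinearLR` and `CosineAnnealingLR` schedulers.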

UMAT implementation
A second step to validate the trained RNN models involved incorporating them as material constitutive models in the FEA program Abaqus Standard, through a custom user material subroutine (UMAT). The language of choice for the UMAT development was C++, due to the existence of an official C++ implementation of PyTorch (LibTorch). The latter offers most of the functionality of its Python counterpart while maintaining a high-level syntax, which made it easier to deal with the ANN-related operations in the C++ environment. The architecture and data flow of the UMAT implementation are described in the flowchart of Figure 6. The PyTorch-based RNN models had to be converted to a serializable format in order to work in a Python-independent environment. All the RNN-related operations were kept separate from the UMAT subroutine (umat.cpp) and implemented in a function deployed as a dynamic-link library (DLL) (annlib.dll). The UMAT works as the interface between Abaqus Standard and the latter, essentially dealing with the input/output of the necessary variables and updating the strain history, which is stored as a state variable to be used in the next time step. The DLL function is invoked in the UMAT code and receives the strain history, the mean and the variance of the training data as input arguments. Within the DLL function, the strain history is converted to tensor format and scaled using the stored statistics. The RNN model is loaded persistently into memory on the code's first execution. The scaled strain history is then fed as input to the RNN in order to predict the stress tensor, which is returned as an output back to the UMAT, along with the stiffness matrix. For the latter, the user can either use the elastic stiffness matrix, provided a sufficiently small time step is used during the simulation, or it can easily be computed from the RNN outputs, via an autodifferentiation method available in LibTorch. In this case, it should be noted that the computed Jacobian may not be a symmetric positive definite matrix, which would most likely cause instabilities and convergence problems. The UMAT implementation for the ANN-based material models was validated with one-element tests, specifically uniaxial traction along the x- and y-directions and a shear test.
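The "serializable format" mentioned above is presumably TorchScript, which is what LibTorch loads via `torch::jit::load`. A minimal Python-side export sketch (with a hypothetical stand-in for the trained model and an illustrative file name) could look like:

```python
import torch

class TinyMaterial(torch.nn.Module):
    """Hypothetical stand-in for the trained GRU-based material model."""
    def __init__(self):
        super().__init__()
        self.gru = torch.nn.GRU(3, 32, num_layers=2, batch_first=True)
        self.fc = torch.nn.Linear(32, 3)

    def forward(self, eps_seq):             # (batch, k, 3) strain sequences
        out, _ = self.gru(eps_seq)
        return self.fc(out[:, -1])          # stress components

model = TinyMaterial().eval()
example = torch.randn(1, 4, 3)              # one k=4 strain sequence
traced = torch.jit.trace(model, example)    # Python-independent TorchScript
traced.save("material_model.pt")            # loadable from C++ with
                                            # torch::jit::load("material_model.pt")
```

The traced module reproduces the original model's outputs and can then be invoked from the C++ DLL without any Python runtime.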

Learning curves
Regarding the Dir-RNN model (Figure 7a), both train and test losses sharply decrease during the first 50 to 100 epochs, followed by a gradual evolution over the remaining epochs of training. The oscillations exhibited by both curves reveal a certain difficulty in achieving convergence. Nonetheless, early-stopping was triggered after 1383 epochs, saving the model state at this point with a train and test loss of 1.3182 MPa² and 2.2190 MPa², respectively. For the Ind-RNN model (Figure 7b), both train and test losses show similar behavior; however, the initial loss is several orders of magnitude lower and training was interrupted much earlier, with the early-stopping criterion being triggered after only 30 epochs. The model state was saved at this point, with a train and test loss of 8.830 × 10⁻¹⁰ J² and 2.248 × 10⁻⁹ J², respectively. Although the loss functions of the two models are measured in different units, the faster training and overall lower loss values of the Ind-RNN were to be expected due to the parameter transfer from the Dir-RNN model. In both cases, the gap between the train and test losses is not significant, indicating that the RNNs did not overfit. Regarding the Ind-RNN model, the loss did not degenerate, confirming the suitability of the integration of the VFM for RNN training.

RNN inference on validation data
To assess the quality of the RNNs' predictions on unseen data, a virtual experiment with prescribed displacements u_x = u_y = 1.5 mm was chosen as an example from the validation dataset. Here, both the global and local stress responses are presented, with the latter being evaluated at the centroid of element no. 10. In both cases, the RNN model predictions are compared against the FEA solution and the corresponding absolute difference is shown.
Regarding the global stress response, Figures 8 and 9 show the contour maps of the stress components for the last time stage (n_t = 124) of the virtual experiment. One can observe that the Dir-RNN model approximates the FEA solution quite well, exhibiting low absolute errors across the specimen, with the highest error magnitudes corresponding to key regions around the fillets near the central hole and in the lower-left corner. The Ind-RNN model shows similar levels of performance; however, its predictions are subject to higher absolute errors that spread to wider regions of the specimen, albeit with maximum magnitudes kept within reasonable levels. This model also has a slight tendency to underestimate the stress in specific regions around the central hole of the specimen, as denoted by the white blobs seen around those areas.
Looking at the results corresponding to the local stress response, Figures 10 and 11 show the stress curves predicted by the Dir-RNN and Ind-RNN models, respectively, for the whole duration of the virtual experiment. The results show that the predicted curves are a near-perfect match of the FEA solution, with both models showing similar levels of performance. This is supported by the corresponding absolute error curves, which register very low mean and maximum values. Nonetheless, the Ind-RNN model shows, in general, slightly higher deviations for all the stress components, which was to be expected for the indirectly trained model. For both RNN models, the error curves appear smooth, with the exception of sharp peaks around the yield transition region (approx. n_t ∈ [20, 30]), which correspond to the highest error magnitudes. These results attest to the capacity of the developed GRU-based RNN models to capture the full elastoplastic response of the material.
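The mean and maximum absolute errors reported for these curves can be computed with a small helper of the following kind; the function and variable names are illustrative, not taken from the paper's code.

```python
def abs_error_stats(sigma_pred, sigma_fea):
    """Mean and maximum absolute error between a predicted stress curve
    and the FEA reference, evaluated pointwise over the time stages."""
    errs = [abs(p, ) if False else abs(p - f) for p, f in zip(sigma_pred, sigma_fea)]
    return sum(errs) / len(errs), max(errs)

# Example: a three-stage curve with errors 0.0, 0.5 and 1.0
mean_err, max_err = abs_error_stats([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
# mean_err == 0.5, max_err == 1.0
```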

UMAT validation
Figures 12 and 13 show the uniaxial tensile and shear strain-stress curves obtained from the one-element tests with the UMAT implementation of the Dir-RNN and Ind-RNN models, respectively. The curves are compared with the FEA solution and with the RNNs' inference predictions, and the corresponding absolute errors along the time stages are also shown. The results show that the Dir-RNN model is capable of predicting the uniaxial behavior very well in both the rolling (RD) and transverse (TD) directions, with overall low absolute errors; however, some numerical instabilities were detected in the test along the x-direction (RD), resulting in slight oscillations towards the final segment of the stress curve. The lowest errors are exhibited for the uniaxial traction along the y-direction (TD), especially for the UMAT version of the Dir-RNN model. Higher deviations were registered for the shear response.
Regarding the Ind-RNN model, it can be seen that its inference predictions provide identical results; however, high discrepancies are observed with its UMAT implementation. Once again, the lowest errors occur along the y-direction (TD), although the model seems unable to predict the initial state correctly. The highest instabilities seem to occur for the test along the x-direction (RD). Although these results are not perfect, it is interesting to see that the RNNs were able to capture the material behavior from the biaxial experiments in such a way that they are still capable of performing relatively well in a completely different setting. The source of the instabilities is essentially numerical in nature, induced by spurious predictions interfering with the equilibrium iterations. Taking this into account, although the feasibility of the proposed indirect training methodology is confirmed, it would certainly benefit from the application of constraints.
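One simple constraint of this kind, stated here as an assumption on our part rather than a method proposed in this work, is to project the RNN-derived tangent onto the symmetric positive-definite cone before handing it back to the equilibrium iterations:

```python
import numpy as np

def nearest_spd(J, eps=1e-8):
    """Project a (possibly unsymmetric, possibly indefinite) tangent matrix J
    onto a symmetric positive-definite approximation: symmetrize, then clip
    negative eigenvalues (illustrative stabilization, not the paper's method)."""
    S = 0.5 * (J + J.T)                  # symmetric part
    w, V = np.linalg.eigh(S)             # eigendecomposition of the symmetric part
    w = np.clip(w, eps, None)            # enforce strictly positive eigenvalues
    return V @ np.diag(w) @ V.T
```

Feeding the solver such a projected tangent trades some accuracy of the Newton direction for robustness, which may mitigate the convergence problems attributed above to spurious predictions.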

Figure 3: Biaxial test virtual experiment setup: (a) general dimensions of the cruciform specimen, (b) set of applied boundary conditions and (c) chosen mesh density.

Figure 4: GRU-based RNN architecture for material behavior modelling. Sequences of length k = 4 of the components of the scaled strain tensor ε′ = [ε′_xx, ε′_yy, ε′_xy] are taken as input features and mapped to the stress tensor σ = [σ_xx, σ_yy, τ_xy], obtained as output. The green rectangular block represents a linear fully-connected (FC) layer.

Figure 7: Learning curves for (a) the Dir-RNN model and (b) the Ind-RNN model based on the VFM.

Figure 8: Contour maps of the three components of stress for stage n_t = 124 of the virtual experiment with boundary conditions u_x = u_y = 1.5 mm. Comparison between the FEA solution and the Dir-RNN model predictions, with the corresponding absolute difference.

Figure 9: Contour maps of the three components of stress for stage n_t = 124 of the virtual experiment with boundary conditions u_x = u_y = 1.5 mm. Comparison between the FEA solution and the Ind-RNN model predictions, with the corresponding absolute difference.

Figure 12: Uniaxial tensile and shear strain-stress curves. Comparison between the FEA solution based on Swift's law (Equation 19) and the UMAT implementation of the Dir-RNN model, with the corresponding absolute error along the time stages.

Figure 13: Uniaxial tensile and shear strain-stress curves. Comparison between the FEA solution based on Swift's law (Equation 19) and the UMAT implementation of the Ind-RNN model, with the corresponding absolute error along the time stages.

Table 1: Overview of the mathematical entities, and corresponding dimensions, for a GRU-based RNN, where n is the batch size, k the sequence length, n_cells the number of GRU cells, s_in the input size, s_out the output size and s_hid the hidden size.

Table 2: Constitutive parameters for the soft steel material.