Micromechanics-based surrogate models for the response of composites: A critical comparison between a classical mesoscale constitutive model, hyper-reduction and neural networks

state-of-the-art


Introduction
Numerical analysis of fiber-reinforced composite materials is, by nature, a multiscale endeavor.Although most of the design effort in composites is concentrated at the structural level (macroscale), most of the material characterization effort is spent at the mesoscale (thin coupon-sized specimens) (Ciutacu et al., 1991;Grammatikos et al., 2016).At the same time, many of the current knowledge gaps in composite behavior stem from physical and chemical processes taking place at the much smaller microscale (individual fibers and surrounding matrix), where performing discerning experiments becomes a complex and delicate task (Qian et al., 2013;Naya et al., 2016).Bridging these scale gaps through high-fidelity numerical analysis (Melro et al., 2013;van der Meer, 2016;Gagani et al., 2017) and increasingly substituting real experiments by virtual testing campaigns (Cox and Yang, 2006) is seen as the way forward in the design of composite structures.
A popular modeling approach consists in using micromechanical models to calibrate mesoscale constitutive models (van der Meer and Sluys, 2009;Vogler et al., 2013).The appeal of this approach lies in allowing the use of realistic constitutive models for each microscopic constituentfibers (Qian et al., 2013;Pimenta et al., 2009), matrix (Krairi and Doghri, 2014;Poulain et al., 2014) and fiber/matrx interface (Alfano and Sacco, 2006;Turon et al., 2006) and using homogenization techniques to derive the mesoscopic behavior from a number of numerical microscopic experiments.However, the ability of mesoscopic models to correctly represent the composite material under general stress states is limited by assumptions made in order to minimize the number of parameters to be calibrated.This can be seen, for instance, in (van der Meer, 2016), where the state-of-the-art mesoscopic plasticity model by Vogler et al. (2013) is put to the test by comparing its predictions with micromechanical results and found to be lacking in its ability to represent the influence of matrix plasticity in the fiber direction on the longitudinal shear behavior of the composite material, a loading scenario commonly encountered in practice.
An alternative to homogenized mesomodels is the concurrent multiscale (FE 2 ) approach (Geers et al., 2010;Miehe et al., 1999;Kouznetsova et al., 2001).FE 2 allows material behavior to be directly derived from embedded microscopic models without introductions any mesoscopic constitutive assumptions.However, even though the method effectively carries microscopic fidelity over to the mesoscale without loss of generality, the computational effort required by having an embedded micromodel at each and every mesoscopic integration point can be extreme (Rocha et al., 2019a).It is therefore interesting to seek alternative strategies that improve computational efficiency without sacrificing the generality of FE 2 .
One such strategy consists in reducing the computational complexity of the microscopic boundary-value problem through Model Order Reduction (MOR) techniques: through a series of analysis snapshots obtained before model deployment (offline training), reduced-order solution manifolds are computed both for displacements (Kerfriden et al., 2011;Chevreuil and Nouy, 2012) and internal forces (Hern� andez et al., 2017;Chaturantabut and Sorensen, 2010;van Tuijl et al., 2017).During the many-query multiscale analysis, projection constraints ensure that only solutions belonging to these reduced manifolds are sought, resulting in dramatic reductions in the number of degrees of freedom and constitutive model computations.The advantage of using such dimensionality reduction techniques is that, although the amount of freedom the micromodel has to represent general stress states is reduced, it is still driven by the original high-fidelity microscopic material models and therefore still obeys basic physical assumptions made at the microscale (e.g.thermodynamic consistency, loading-unloading conditions).Furthermore, recent innovations allow the training process (Goury et al., 2016) and basis construction (Ghavamian et al., 2017) to be optimized, leading to hyper-reduced models with increased accuracy and efficiency.
Alternatively, physics-based constitutive models may be altogether abandoned by employing artificial neural networks as surrogate models (Lefik et al., 2009).This approach is based on the fact that neural networks are universal approximatorsi.e.capable of approximating any continuous function to an arbitrary level of precision provided that enough parametric freedom is given to the model (Cybenko, 1989).A network can be trained with macroscopic stress-strain snapshots from a full-order micromodel and subsequently employed online to give predictions of stress and tangent stiffness.Since the early work of Ghaboussi et al. (1991), a number of efforts have been made to improve predictions by restricting the parameter space by focusing on a fixed macroscopic strain distribution (Ghaboussi et al., 1998), using gated neural layers with memory in order to capture path dependency and unloading (Ghavamian and Simone, 2019), including additional microscopic parameters such as material volume fractions in the network input (Le et al., 2015) and attempting to infuse the network with physics-based constraints (Lu et al., 2018).Nevertheless, the use of artificial neural networks as surrogate constitutive models is still far from widespread, and its applicability to model general stress states of complex micromodels is still an open issue.
In summary, three different alternatives to a fully-resolved micromodel have been discussed: physics-based mesoscale models, hyperreduced micromodels and artificial neural networks.Conceptually, these three approaches can be seen as entities of the same nature: surrogate models that require an offline calibration phase and sacrifice part of the generality and accuracy of a micromodel in favor of computational efficiency.In this work, the three strategies are compared in terms of calibration effort, efficiency and generality of representation.In order to keep the focus on the surrogate modeling techniques, matrix plasticity is the only source of nonlinear microscopic behavior considered in the study.Firstly, the multiscale equilibrium problem to be solved is briefly described.Secondly, each of the three acceleration approaches is presented, starting with a brief description of a state-of-the-art mesoscale plasticity model for composites (Vogler et al., 2013) followed by formulations of the hyper-reduced and neural surrogate models.Finally, the three strategies are put to the test in a number of numerical examples involving both pure stress cases and combined loading conditions.

Multiscale analysis of laminated composites
In order to introduce the context of the present discussion, the fullorder concurrent multiscale equilibrium problem for which surrogate models are sought is presented.Two distinct spatial scales are identified.In the mesoscale, individual composite plies are modeled as homogeneous orthotropic media.Descending to the microscale, a Representative Volume Element (RVE) of the composite microstructure is modeled, consisting of a number of unidirectional fibers and surrounding matrix.
When coupling these two scales, the goal is to exploit the highfidelity information obtained at the microscale to derive the constitutive behavior of a material point at the mesoscale.Before comparing the different approaches to perform this coupling through an offline training/calibration phase, this section outlines how an online scale coupling can be achieved without mesoscopic constitutive assumptions or loss of generality through the FE 2 technique.In the context of the present study, FE 2 is regarded as the reference solution that represents both the upper bound of model fidelity and the lower bound of computational efficiency.Formulating alternative strategies based on surrogate models entails significantly improving efficiency while retaining as much fidelity as possible.

Mesoscopic problem
Let Ω be the continuous and homogeneous mesoscopic domain being modeled and let it be bounded by the surfaces Γ u and Γ f on which Dirichlet and Neumann boundary conditions are applied, respectively (Γ u \ Γ f ¼ ∅).Stress equilibrium and strain-displacement relationships in Ω are given by: where divð ⋅Þ is the divergence operator, rð ⋅Þ is the gradient operator, body forces are neglected and a small strain formulation is adopted.In order to solve for the displacements u Ω , a constitutive relation between stresses and strains must be introduced: where the dependency on the strain history ε Ω h accounts for the possibility of path dependency.For the moment, no assumptions on the behavior of the constitutive operator D are made.In a general sense, D should account for the information on material behavior coming from smaller scales that is lost when assuming that Ω is a continuous and homogeneous medium.
In a FE environment, the domain is discretized by a finite element mesh with N degrees of freedom and the equilibrium problem is solved by minimizing the force residual r Ω 2 R N : where N and B contain the shape functions and their spatial derivatives, respectively, t Γ are the tractions at surface Γ f and the Dirichlet boundary conditions uj are implicitly applied.The formulation is completed with the definition of the tangent stiffness matrix with D Ω being the tangent material stiffness matrix.Although not explicit in the preceding equations, it is important to note that since composite laminates are anisotropic materials, constitutive computations are performed in a local material coordinate system and rotation operators are used to bring σ Ω and D Ω back to global coordinates.

Microscopic problem
Let ω define the microscopic domain of a Representative Volume Element (RVE) of the material where individual fibers and surrounding matrix are modeled.The domain is assumed to be continuous and bounded by the Dirichlet and Neumann surfaces γ u and Maintaining the small strain assumption and neglecting body forces, stress equilibrium and strains are given by: At the microscale, constitutive operators for fibers and matrix are assumed a priori.Fibers are modeled as isotropic and linear-elastic and the matrix is modeled with the plasticity model proposed by Melro et al. (2013).The matrix response starts as linear-elastic and transitions to plasticity with pressure-dependent hardening until the response reaches a perfectly-plastic regime.The model is briefly described in the following, with most formulation details being omitted for compactness.For further details, the interested reader is referred to (Melro et al., 2013;van der Meer, 2016).
The stress-strain relationship in tensor notation is given by: where D e is the fourth-order elastic stiffness matrix and an additive decomposition between elastic and plastic strains (ε p ) is assumed.The onset of plasticity is defined by a pressure-dependent paraboloidal yield surface: with I 1 and J 2 being stress invariants and the yield stresses in compression (σ c ) and tension (σ t ) being functions of the equivalent plastic strain ε eq p in order to allow for the occurrence of hardening: ε eq p ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi where ν p is the plastic Poisson's ratio.The development of plastic strains is dictated by the non-associative flow rule: where Δγ is the plastic multiplier increment computed through a return mapping procedure (van der Meer, 2016) and S is the deviatoric stress tensor.The formulation is completed by the definition of the consistent tangent operator, obtained by differentiating Eq. ( 6) with respect to the strains (van der Meer, 2016).With constitutive models in place, the equilibrium residual r ω to be minimized is computed as:

Scale coupling
The basic idea behind the FE 2 approach consists in defining the mesoscopic constitutive operator D of Eq. ( 2) as the homogenized response of a finite element micromodel embedded at each integration point of the domain Ω (Fig. 1).Assuming the principle of separation of scales holds (ω≪Ω) (Geers et al., 2010), a link between the two scales is enforced by satisfying: where ũ is a fluctuation displacement field subjected to ũγ þ ¼ ũγ À , where γ À and γ þ represent pairs of opposing microdomain boundaries.In practice, enforcing Eq. ( 11) entails converting the macroscopic strain ε Ω into prescribed displacements at the corners of the micromodel, tying nodes at γ À and γ þ through periodic boundary conditions and solving the resultant boundary-value problem (Kouznetsova et al., 2001).
After convergence of the microscopic nonlinear analysis, the Hill-Mandel principle is used to recover the mesoscopic stresses: while the tangent stiffness is obtained through a probing operator P based on the microscopic stiffness matrix K ω according to the procedure in (Nguyen et al., 2012): which completes the formulation.The FE 2 approach effectively defines the operator D through an implicit procedure that involves no mesoscopic constitutive assumptions.However, the associated computational effort can be prohibitive even for simple applications.In the next sections, three alternative strategies for defining D are presented.

Mesoscale constitutive model
The mesoscopic constitutive model proposed by Vogler et al. (2013) and later revisited by Van der Meer (van der Meer, 2016) is briefly presented here as a way of defining the D operator of Eq. (2) through a physics-based model that effectively condenses the microscale material behavior into a small number of mesoscale constitutive parameters calibrated with micromechanical simulations.
A unidirectional composite lamina is modeled as an orthotropic material with pressure-dependent plasticity and assuming an additive decomposition of strains.The stress-strain relationship is similar to the one of Eq. ( 6) but the stiffness tensor D e is now orthotropic.The onset of plasticity is defined by the following yield surface, written in Voigt notation: where A is given by:  and a ¼ ½0α 3 α 3 000� T .The α coefficients are piecewise-linear functions of the equivalent plastic strain ε eq p and pressure-dependency is introduced by allowing for distinct values of α 32 and α 3 to be defined depending on the sign of σ 2 þ σ 3 .
Plastic strain evolution is dictated by the flow rule: where Δγ is the plastic multiplier computed by a return mapping procedure (van der Meer, 2016) and G is given by: with ν p being the plastic Poisson's ratio.Calibration of the mesomodel consists in determining ν p and the α coefficients through a set of micromechanical numerical experiments.
The procedure used here follows the one described in (van der Meer, 2016).From the homogenized stress-strain curves obtained from the micromodels, the components of D e are obtained and with those the equivalent plastic strain histories.With values for σ and ε eq p , the model parameters are computed as: where ts stands for transverse shear, ls for longitudinal shear, ut and uc for uniaxial tension and compression, respectively, and bt and bc for biaxial tension and compression, respectively.With this relatively limited amount of calibration data, the model can be used to predict the behavior under general stress states.

Neural networks
An alternative to a physically-motivated mesoscopic model is the use of a purely data-driven approach, the idea consisting in the introduction of a parametric regression model S used to compute an approximation σ _ of the stresses: where W are model parameters.In contrast to the parameters in Eqs. ( 18)-( 20), parameters in W have no direct physical meaning, being instead calibrated through a fitting procedure based on observations of the actual micromechanical model: where X 2 R 2nε�P is a snapshot matrix with P ε Ω -σ Ω pairs obtained from micromodel executions.Given enough parametric freedom, the surrogate should be able to encapsulate the observed constitutive information (X) and provide accurate stress predictions when presented with previously unseen values of ε Ω .
Here, S is chosen to be the feed-forward artificial neural network shown in Fig. 2, being composed of a number of fully-connected neural layers (dense layers) followed by a dropout layer that regularizes the model.When used to make predictions, strains are fed to the first neural layer (input layer) and values are propagated until the final layer is reached (output layer), at which point the output neurons contain the predicted stress σ _ .In the next sections, each component of the network is briefly described and further details are given on how training is performed.

Dense layer
A dense neural layer i propagates neuron states (a) from the previous layer i À 1 and subsequently applies an activation function ϕ to the resulting values in order to introduce nonlinearity in the network response: where W i 2 R ni�niÀ1 is a weight matrix and b i 2 R ni is a bias term, with n i being the number of neurons of layer i.The activation function ϕ here represents the element-wise application of the sigmoid function: on the neuron values, with the exception of the output layer which is left unactivated (a l ¼ v l ).Different activation functions are used depending on the intended application (Bengio et al., 2013), with the sigmoid function being a popular choice for building regression models.In general, increasing n i leads to a higher representational capability, following from the intuitive fact that the amount of fitting freedom of the model increases with the number of trainable parameters.In practice, however, models that are too large tend to exactly represent training data but fail to generalize to unseen inputs (overfitting) (Bishop, 2006).

Dropout layer
Dropout is an increasingly popular regularization strategy used avoid the phenomenon of overfitting (Srivastava et al., 2014).Here, a dropout layer is positioned immediately before the output layer and stochastically deactivates some of the neurons coming from the previous layer: where � indicates element-wise multiplication, r d 2 ð0; 1� is the probability that a given neuron is set to zero and r 2 f0; 1g n lÀ2 is a boolean vector determined by drawing from a uniform unit distribution and comparing the value to r d .If the drawn value is lower than the dropout rate, the correspondent element of r is set to zero.In order to keep the average of the neuron values unchanged after dropout, neurons that are not deactivated are scaled by 1 À r d .
During training, r is redrawn each time the network is used to make a prediction.This means that, on average, neurons of layer lÀ 2 will have been deactivated at least once.This introduces a regularizing effect because the network cannot rely on the availability of any given neuron in order to make accurate predictions.When using the network model online, the dropout layer is removedwhich is equivalent to setting r d to zeroand all neurons contribute to the response.

Training
The objective of the training process is to minimize a loss function that represents how well predictions match actual model observations: where P is the number of snapshots and the 1=2 factor is added for convenience when computing the gradients of L. In order to keep track of how well the model generalizes to unseen data, it is common to remove part of the snapshots from the training process to act as a validation set and use them to compute a separate error measure to be used as stopping criterion for the optimization.
Based on this objective function, a Stochastic Gradient Descent (SGD) optimization algorithm is used to update the trainable parameters W and b: where L j is the loss term of the j-th sample, o indicates current values, n indicates updated values and B is the size of the sample mini-batch used in the update.The idea behind using a mini-batch instead of updating the parameters using either one sample at a time or all samples at once is that it provides a balance between speed of convergence and gradient variance.In any case, a complete solver iteration (epoch) is only complete after the model has seen every sample in the training seti.e. after approximately P=B mini-batches.Finally, the operator A depends on the choice of solver.Here, the Adam solver proposed by Kingma and Ba Kingma and Ba ( 2014) is adopted.
In order to compute the gradients appearing in Eq. ( 27), a backpropagation procedure is adopted: based on the network state (v, a and r) after computing each1 training sample, the chain rule is used to propagate the derivative of the loss function starting from the output layer and progressively moving back through the network.For this, an auxiliary quantity d i 2 R ni is defined for each layer.At the output layer l, it is simply defined as: Next, the effect of the activation function is taken into account: after which it is possible to compute the gradients of the trainable parameters: Finally, the values of d of the previous layer (the next layer to be backpropagated) can be computed as: and the algorithm moves to Eq. ( 29) for layer i À 1.For the dropout layer, since it does not have any trainable parameters, the effect of the stochastic dropout is simply backpropagated to the previous layer:

Use as constitutive model
To make new stress predictions, the input layer is set to the applied mesoscopic strain, a complete forward pass is performed and the final activated neuron values of the output layer give the predicted stress: For the consistent tangent stiffness, it is necessary to compute the jacobian J of the network: which is obtained with a backward pass through the network (from output to input): where I ϕ' i is a matrix whose diagonal contains the derivatives of the activation function with respect to the neuron values v: Fig. 2. A neural network acting as a surrogate constitutive model.An arbitrary number of dense neural layers is combined with a single dropout layer that regularizes model response.

Hyper-reduced-order modeling
Instead of resorting to surrogate mesoscopic models, FE2 can be made efficient by accelerating the associated microscopic boundaryvalue problems.In this section, two complexity reduction operations are applied to the equilibrium problem of Section 2.2.First, the number of degrees of freedom of the problem is drastically reduced, followed by a hyper-reduction phase on which a reduced global integration scheme for internal forces is defined.The techniques are only described briefly in order to keep the focus on their application to the problem at hand.More details on the underlying formulations can be found in (Rocha et al., 2019b).

Proper Orthogonal Decomposition (POD)
The first strategy consists in projecting the original equilibrium problem of size N onto a reduced solution manifold spanned by a basis matrix Φ 2 R N�n : where φ i are a set of orthonormal basis vectors that represent global displacement modes.By constraining the possible displacement configurations to the ones lying in the latent space defined by Φ, the number of degrees of freedom of the problem is reduced from N to n≪ N. The full-order displacement field is recovered as a linear combination of the latent variables α 2 R n : In order to solve for α, the full-order residual of Eq. ( 10) is constrained to lie on the reduced space through the Galerkin projection Φ T r ω ¼ 0, yielding reduced versions of the internal force vector and stiffness matrix:

Empirical Cubature Method (ECM)
Even though the POD-reduced problem has only a small number of degrees of freedom, solving for α still involves computing stresses at every integration point in order to obtain f ω and K ω for use in Eq. ( 39).However, given the fact that f ω r is of small dimensionality, it is intuitive to surmise that the amount of constitutive information needed to define it is also significantly reduced.
This hypothesis may be posited more formally as follows: From the complete set of M integration points with original integration weights w i , it is possible to define a reduced set of m≪M integration points with modified integration weights ϖ j such that the approximation: leads to a negligible loss of accuracy.This idea is the basis for the Empirical Cubature Method (ECM) proposed by Hern� andez et al. (2017).
The reduced set Z of m integration points is chosen from among the original M points by using a Greedy least-squares procedure that solves: where J and b are given by: where Λ is a basis matrix for the contribution of each integration point to the global reduced force vector f ω r .With β, the modified integration weights of points in Z are computed as ϖ i ¼ ffi ffi ffi ffi ffi w i p β i .For details on the Greedy selection procedure, the reader is referred to (Hern� andez et al., 2017).
During the online FE 2 analysis, the responses of integration points not included in Z are never computed, leading to a full-order internal force vector composed almost solely by zeros.On the other hand, the homogenization procedure of Section 2.3 requires a complete assembly of f ω and K.In order to bypass this issue, a tangent mode contribution matrix H 2 R n�nε is computed for each micromodel such as to satisfy: where α are the latent variable values resulting from solving the equilibrium problem with applied macroscopic strains ε Ω .With this oper- ator, the homogenized stress and stiffness are computed as:

Training
Both reduction stages are constructed with mechanical behavior information that must be computed before model deployment, similar to the calibration procedure of Section 4.3.For POD, the basis matrix Φ is computed from a series of P displacement snapshots X u 2 R N�P decomposed into elastic and inelastic parts: where a snapshot is considered inelastic if at least one integration point in ω has non-zero equivalent plastic strain.Following the elastic/inelastic training strategy presented in (Hern� andez et al., 2017), the basis Φ 2 R N�ðneþniÞ is given by: where each portion of the basis (n e elastic and n i inelastic modes) is obtained through a truncated Singular Value Decomposition (SVD) operation: with the modified snapshot matrices and Y being a basis matrix computed from the SVD of X e .In order to guarantee that every possible stress state in the elastic regime is exactly reproduced by the reduced model, the decomposition that generates U e is truncated at n ε components (n ε ¼ 6 for three-dimensional micromodels).For U i , the basis includes all basis vectors whose associated singular values satisfy the condition: with S 1 i being the first (and highest) singular value and ε sv a truncation tolerance.
For ECM, training consists in running the POD-reduced model for the same original training cases 2 and collecting snapshots of stresses at every integration point.Following again the elastic/inelastic strategy, a basis matrix for stresses Ψ 2 R Mnε�q is computed, with q ¼ n e þ n i in order to keep the truncations consistent with the ones from the first reduction phase.
With Φ, the basis matrix for internal forces used in Eq. ( 42) can be obtained: with each of the q submatrices Λ j 2 R M�n being given by: and the contribution of each integration point being: where Φ i is the submatrix of Φ that contains the degrees of freedom of the finite element that contains point i, B i is the matrix of shape function derivatives evaluated at point i and s j and φ j are respectively the singular value and left-singular vector associated with the j-th mode of Ψ.

Comparing the strategies
The surrogate modeling strategies have been implemented in an inhouse Finite Element code based on the Jem/Jive Cþþ numerical analysis library (Dynaflow, 2019).All models were executed on a single core of a Xeon E5-2630V4 processor on a cluster node with 128 GB RAM running CentOS 7.
The micromodel used as a basis for training the reduced-order models is the one shown in Fig. 1.This is the same RVE adopted by Van der Meer in (van der Meer, 2016) and is assumed to be sufficiently representative of the mechanical response of a mesoscopic material point.Material properties for both the micromodel and the calibrated mesomodel of Section 3 are also adopted from (van der Meer, 2016).In order to guarantee constant stress ratios in biaxial scenarios while avoiding large strain steps during the perfect plasticity regime, a special arc-length constraint a is adopted: with which the load factor λ that scales unit forces applied at the corner nodes of the RVE is controlled so as to guarantee that the unsigned sum of displacements at the same locations is equal to a prescribed value u.
All snapshots used for training come from models loaded monotonically with a constant stress ratio (proportional loading) until the norm of the strain at controlled nodes reaches a value of 0.1.To test the trained surrogates, a homogeneous mesoscopic 1-element model3 with a single integration point and the same dimensions as the original micromodel is used, with the fiber direction (1-axis) aligned with the mesoscopic xaxis.
Neural networks with a single hidden dense layer are considered.Deeper networks with up to 5 hidden layers have also been investigated, but were found to provide lower accuracy than shallow networks with a similar number of parameters.Results from these deeper networks are therefore not included in the discussion.Unless otherwise specified, training sets are formed by randomly drawing 80% of the samples of the original dataset without replacement, with the remaining 20% serving as a validation set.At the beginning of training, network biases are initialized as zero and weights are initialized with draws from an uniform distribution in the interval ½ À1; 1� and scaled with the factor ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi and Bengio, 2010).The dropout rate is fixed at r d ¼ 0:05 for all models.Although this is a much lower rate than the one adopted for instance in (Ghavamian et al., 2017), it is found to provide sufficient regularization for the network and dataset sizes treated in this study.For the SGD solver, the default values recommended in (Kingma and Ba, 2014) are used for all hyperparameters.All models are trained for a total of 200,000 epochs and the final model parameters are the ones associated with the lowest historical validation error.The only hyper-parameter to be studied is therefore the width n 1 of the hidden dense layer.

Pure stress states
First, reduced models are trained to reproduce the material behavior of a single unidirectional composite layer under isolated stress components, i.e. uniaxial cases in the parameter space.Here the training dataset consists of twelve stress-strain curves, two for each of the n ε ¼ 6 mesoscopic strain components (positive and negative directions).From this point on, strain and stress components are expressed in the local mesoscale coordinate systemi.e.fε 11 ; ε 22 ; ε 33 ; γ 12 ; γ 13 ; γ 23 g, where the 1-axis is the fiber direction and the superscript Ω is dropped for compactness.
Hyper-reduced models are trained with different values of the inelastic SVD tolerance ε sv (Eq.( 49)).The resultant model predictions for the transverse stress σ 22 are shown in Fig. 3.For high values of ε sv -i.e. with a small number of inelastic modes -the plasticity response is not correctly captured, with predictions improving as the tolerance is lowered and more modes are added.Note that the snapshot decomposition of Section 5.3 effectively guarantees an exact response during the elastic regime.A similar response is observed for the remaining five strain components.
Using the surrogate models to reproduce stresses at the same strain values used for training, an average error over the complete dataset comparing the training targets σ with the surrogate responses b σ can be defined: with n t i being the number of load steps comprising the stress-strain curve associated with each strain component i. Errors are computed for different values of ε sv , with results being shown in Fig. 4. As with Fig. 3, the error starts at a high value when only elastic modes are used and decreases to values as low as 0.4 MPa for ε sv ¼ 0:01.Fig. 4 also includes the average error of predictions made with the mesoscopic model of Section 3. Since that model explicitly ensures no plasticity occurs in the fiber direction while the actual microscopic response in that direction is slightly nonlinear, the average absolute error over the dataset appears to be high4 even though all the other directions are very well captured.For this reason, Fig. 4 shows two accuracy levels for the mesomodel, with and without including σ 11 .
Since controlling the tolerance only influences the number of modes n indirectly, the error tends to decrease in discrete steps.This can also be observed in Fig. 5, which shows how the number of modes n and integration points m increases as ε sv is reduced.Since the reduction in the number of integration points is made possible by the POD reduction, maintaining a low ECM integration error for higher values of n requires a larger set of cubature points.In any case, the reduction remains relatively efficient even for the lowest ε sv considered here -with compression factors N=n � 1284 and M=m � 65.
The same dataset is used to train neural networks with a number of hidden units n 1 ranging from 10 to 1000.In order to track the training process, the evolution of the average absolute error over the validation set (20% of the complete dataset) is plotted in Fig. 6.The monotonic error decrease observed for all curves suggests that no overfitting to the data is occurring.Increasing the size of the hidden layer improves the obtained predictions but with diminishing returns for n 1 larger than 100.Indeed, doubling the size of the hidden layer from 500 to 1000 leads to a negligible decrease in the error.
The same trend can be observed in Fig. 7, where online predictions are computed from a one-element model loaded in the 2-direction (transverse direction).Although accurate predictions of the perfect plasticity plateau can be obtained by using sufficiently large networks, both the initial stiffness and the response leading up to the plasticity plateau are still slightly inaccurate even for n 1 ¼ 1000.The important observation to be made here is that even though neural networks are regarded as universal function approximators, the regularization brought by the dropout layer has the adverse effect of making an exact fit with the training data very difficult to achieve.
The average absolute error for the complete dataset obtained with networks of different sizes is plotted in Fig. 8.Although showing a similar trend as Fig. 6, two important differences between the errors in these two cases should be noted.Firstly, errors in Fig. 8 take into account the whole dataset, while Fig. 6 only shows errors computed for samples in the validation set.Secondly, while errors in Fig. 6 are computed by feeding the network with the exact strain vectors coming from micromodels, Fig. 8 is obtained by using the trained network online in a oneelement model that includes numerical noise intrinsic to the Newton-Raphson procedure used to solve it.The presence of numerical noise combined with the fact that datadriven models lack any sort of physical constraint to their behavior can lead to substantial error accumulation as the analysis progresses: wrong stress predictions lead to wrong solutions for the displacements which in turn become wrong strains to be fed to the network.After a few time steps, the network will be operating well outside of its training space and making nonsensical predictions.
In order to demonstrate how the inclusion of a dropout layer increases model robustness against noise, two networksone of size    are used to predict the response of a model loaded in transverse tension (2-direction) with and without the inclusion of small perturbations to all three shear components, ε 12 ¼ À ε 13 ¼ ε 23 ¼ 0:01ε 22 .
Results are shown in Fig. 9.While the regularized response remains unchanged after the introduction of noise, the unregularized model branches off into an unphysical softening regime.Note how the unregularized model actually gives better predictions than the regularized one before it starts to lose precision: training a robust and accurate model entails finding a balance between the bias introduced by regularization and the variance introduced by allowing the model to become overly complex (this is also known as the bias-variance tradeoff).
Before moving on to more complex stress states, an interesting conclusion can be drawn by letting the reduced models make predictions on a strain range beyond the one used during training.Fig. 10 shows the straightforward case of tension in the fiber direction (σ 11 ).The training snapshots teach the models how the stress response should behave for strains in the range ½0; 0:1�, but in the range ð0:1; 0:2� the models must rely on their extrapolation capabilities.Owing to its stronger physical foundation, the hyper-reduced model correctly predicts a nearly linear stress response, while the network deviates from linearity after only a few time steps and transitions to an unphysical perfectly-plastic response.For hyper-reduced models, it is enough to stop training after the material response stabilizes.For neural networks the requirement is slightly stronger, as the complete strain range to be encountered online should be seen by the model during training.
Finally, the impact on computational efficiency of increasing the size of the reduced models is investigated.Execution times are related to model size (number of POD modes n or size of the hidden neural layer n 1 ) in Fig. 11, where the smallest model of each type (ε sv ¼ 1:0 or n 1 ¼ 10) is used to normalize the curves.For the neural model, increasing the size of the model 100 times only leads to an execution time approximately twice as long (0.09s), indicating that other operations related to the 1-element FE model (e.g.solving the 24-DoF equilibrium system) are more expensive than the very efficient neural network computations.For the hyper-reduced model, an increase of only 2.5 times on the number of POD modes leads to a 5 times longer computation (20.70s).In any case, both models are still significantly faster than the full-order one (3167s).
For linear materials, a simple linear combination of the pure stress states considered in this section would be enough to describe any combined stress state.Unfortunately, the material behavior being learned here is highly nonlinear and path dependent.In the next sections, the accuracy impact incurred by using pure stress combinations to approximate combined stress scenarios is investigated.Furthermore, the ability of surrogate models to incorporate new information coming from additional micromechanical simulations (retraining) is assessed.

Biaxial transverse tension
For the next set of examples, the trained models of Section 6.1 are used to predict material response under biaxial transverse tension loading (a combination of σ 22 and σ 33 ).A common design practice when dealing with plasticity is to compute a yield stress envelope by plotting the final stress levels for different stress ratios.for all stress components, they are already capable of predicting both the lower (θ ¼ 0 ∘ ) and upper (θ ¼ 90 ∘ ) bounds of the tension-tension envelope of Fig. 12.In order to investigate the accuracy of the models upon extrapolation from the training set, they are used to predict the response for θ ¼ 45 ∘ .The models are also retrained by including extra training cases that gradually approach the center of the envelope from both sides with the limit of the new training sets being represented by the angle θ lim (Fig. 12) -and used to predict θ ¼ 45 ∘ .For these new trainings, ε sv ¼ 0:01 is adopted and the size of the hidden neural layer is fixed at n 1 ¼ 500.Error levels over the training set similar to the ones in Figs. 4 and 8 are obtained for the retrained models.
The obtained responses are very accurate even with no additional retraining (θ lim ¼ 0 ∘ ).This is an interesting feature of the projectionbased reduction: an accurate response at θ ¼ 45 ∘ hinges on correctly accounting for pressure-dependent yielding, which the POD model does in an approximate way by using information obtained from pure compression snapshots.A similar level of accuracy is obtained for σ 22 .
The network model does not perform as well.With no additional retraining, the stress stabilizes at a value approximately 50% lower than the reference one.Adding training cases closer to the one being predicted brings the response closer to the target, but even with training points at θ ¼ 40 ∘ and θ ¼ 50 ∘ the maximum stress is still approximately 10 MPa off.On the other hand, the regularization applied to the network does ensure a stable response with physically-sound shape (linear, plastic hardening and perfect plasticity) even upon significant extrapolation from the training set.
Although the robustness of the network model is an advantageous feature when working with nonlinear solvers at the mesoscale, the model outputs the expected curve shape even when the actual stress values are far from being correct and therefore does not provide any clue that it is operating outside of its training space.Ideally, the analyst should be provided not only with a prediction but also with a measure of how much confidence the model has in giving it.
The next example explores the bootstrap strategy, a popular approach    for estimating uncertainty in neural networks (Khosravi et al., 2011).Instead of relying on the prediction of a single6 network, 50 different networks are trained with all pure stress cases and one extra case with θ ¼ 45 ∘ and used to predict the complete envelope.Each network has different initial weights and different training sets obtained through a bagging process (Breiman, 1996): from the complete bag of 3500 stress-strain pairs, samples are randomly drawn, included in the training set and placed back in the bag until the training set has 3500 pairs.This process leads to sets that see approximately 63.2% of the original sample pool, with some pairs appearing more than once.The samples that remain unseen are used as a validation set.Fig. 15 shows the envelopes predicted by each of the 50 networks as well as the average prediction.Following (van der Meer, 2016), the stresses that define the envelope are computed at a strain level of ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi Close to trained points ð0 ∘ , 45 ∘ and 90 ∘ Þ, predictions from all networks are close to the average one, indicating a high level of confidence in the prediction.Moving away from the trained points, the level of disagreement between networks gradually increases, indicating that predictions in those ranges of θ should be used with care.Naturally, this additional piece of information comes at the cost of computing 50 network responses instead of one, but more efficient techniques such as Bayesian neural networks can also be used to derive network responses with uncertainty intervals (Khosravi et al., 2011).
Plotting the ensemble response together with predictions obtained with the mesomodel of Section 3 in Fig. 16, it can be seen that both give predictions with roughly the same level of accuracy, with errors of up to 10 MPa.The advantage of the network model over the mesomodel lies in the possibility of retraining.Fig. 16 also shows the prediction of a single network trained with all values of θ used to construct the envelope.Even though this network is now trained on two complete datasets (pure stress states and biaxial transverse tension), the size n 1 ¼ 500 of the network is kept unchanged.Nevertheless, the same level of accuracy shown in Fig. 8 is achieved.
Finally, an analogous study is performed with the hyper-reduced model.The response of models trained with pure stress cases plus a single biaxial case (θ ¼ 45 ∘ ) and with all envelope points are shown in Fig. 17.With only a single biaxial training point, the hyper-reduced model already outperforms the mesomodel.Expanding the training set leads to an almost perfect agreement with the full-order model, but a price is paid in terms of efficiency: the model including all stress ratios has a reduced space of size n ¼ 30 and m ¼ 714 cubature points (compare with n ¼ 18 and m ¼ 241 for the model trained with only 0 ∘ , 45 ∘ and 90 ∘ Þ.In practice and depending on the application, it might be more advantageous to accept a relatively small loss of accuracy in order to keep the surrogate model efficient.

Longitudinal shear and transverse tension
The next set of examples considers the combination of longitudinal shear (σ 12 ) and transverse tension (σ 22 or σ 33 ).This is a loading scenario commonly encountered by laminated composites in service.It is therefore an important stress combination to consider when training surrogate models.Here, the relevant stress ratio is θ ¼ arctan � σ12 σtt

�
, where σ tt can be either σ 22 of σ 33 .Changing the direction of this transverse stress leads to different micromodel responses, a distinction that is lost in the  invariant-based mesomodel.First, models are trained with a combination of pure stress states and a number of extra cases defined by the limit stress ratio θ lim 2 ½0 ∘ ; 90 ∘ � (analogous to Fig. 12) and used to predict the response of θ ¼ 45 ∘ .For this first part, σ tt ¼ σ 22 .Fig. 18 shows results for hyper-reduced models.
For this load combination, information gathered from only pure stress cases (θ lim ¼ 0) is not enough to properly reproduce the response at θ ¼ 45 ∘ , with a relative error of 13% for the maximum stress level.Adding extra training cases quickly reduces the error, as expected.Although not shown in Fig. 18, a similar accuracy level is obtained for σ 22 .Interest- ingly, predictions by the network model for this load combination are significantly better than the ones obtained for biaxial transverse tension.With the addition of relatively few extra training cases (from θ lim ¼ 30 ∘ ), the network converges to the micromodel solution, as can be seen in Fig. 19.
For the next test, the network and hyper-reduced model of Figs.18 and 19 trained with θ lim ¼ 40 ∘ and σ tt ¼ σ 22 are used to predict the curve with θ ¼ 45 ∘ but this time with σ tt ¼ σ 33 .The obtained results can be seen in Fig. 20.None of the surrogates is able to correctly predict the shear response when the direction of the transverse stress is shifted.The hyper-reduced model is the one with the lowest error, being able to correctly predict the response up to the perfect plasticity regime and overshooting the maximum stress by about 5%.Interestingly, the mesomodel is the one with the largest discrepancy.Since the model is invariant-based, no distinction is made between σ 22 and σ 33 when combining them with τ 12 , leading to excellent agreement for the σ 22 -τ 12 combination but not for σ 33 -τ 12 .
Fig. 20 illustrates the high level of complexity of the parameter space being treated here and raises the issue of how to best sample this parameter space in order to ensure accuracy under general stress states.For the mesomodel, sampling is a simple task that consists of a small predefined amount of micromechanical experiments (Section 3).But the underlying assumptions that allow for such a simple calibration process lead to highly inaccurate predictions for this specific loading scenario which is still a relatively simple one.The biggest drawback of the mesomodel is that there is no straightforward way to substitute these prior assumptions by posterior knowledge coming from additional micromodel simulations.
For hyper-reduction and neural networks, the problem is the opposite: these models can readily incorporate new epistemic information but must contend with sampling a potentially infinite parameter space.Although the question of sampling is much simplified here by focusing on monotonic loading along a number of load paths defined a priori, it is an open issue that should be addressed in tandem with the development of new surrogate modeling techniques (Goury et al., 2016;Ghavamian and Simone, 2019).
Models trained with pure stress cases plus two combined stress cases -θ ¼ 45 ∘ for σ tt ¼ σ 22 and σ tt ¼ σ 33 -are used to predict the complete stress envelopes for σ 22 -τ 12 and σ 33 -τ 12 .The bootstrap strategy is once again employed in order to obtain the average and variance of a combination of 50 different network models.Results are shown in Fig. 21, with each envelope point corresponding to predictions at a strain level ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ε 2 tt þ γ 2 12 q ¼ 0:04.It is interesting to note that the network ensemble gives more accurate and more predictions for the region of the envelope dominated by shear than for the one dominated by transverse stresses.The average response is compared with the one obtained from a single network trained on the complete dataset as well as with mesomodel predictions in Fig. 22.As in Section 6.2, adding extra training cases improves predictions.Once again the same model size n 1 used for pure stress cases is enough to learn the larger dataset considered here without loss of accuracy.

Axial stress and longitudinal shear
One last stress combination is briefly examined, namely longitudinal shear (τ 12 ) with tension in the fiber direction (σ 11 ).For high σ 11 = τ 12 ratios, the longitudinal shear response is heavily affected by the presence of plastic strains in the fiber direction.Since the mesomodel of Section 3 explicitly eliminates the possibility of plasticity developing under axial loading, its effect on the shear behavior is not captured.Van der Meer (van der Meer, 2016) points to this as being a major weakness of Vogler's mesomodel, so it is interesting to investigate how well the other surrogate strategies can handle this scenario.
The hyper-reduced model trained only on pure stress cases is used to predict shear response for a set of ratios σ 11 =τ 12 2 ½57; 29; 11; 6; 0�.Re- sults are shown in Fig. 24.Without any additional training, the hyperreduced model reproduces the curves for all ratios remarkably well.On the other hand, a network without additional retraining gives poor predictions (Fig. 25).This example illustrates the advantage of reduction methods that, although constrained to a reduced solution manifold, are still driven by the original constitutive laws of the full-order micromodel (see (Liu et al., 2019) for an interesting alternative involving neural networks infused with actual constitutive laws).
The neural network is retrained by including every curve in Fig. 25 in addition to the pure stress curves.The resultant curves are shown in Fig. 26.Although providing better predictions, the retrained network is still not able to accurately capture the response leading up to the perfect plasticity plateau.This is consistent with the observed, for instance, in Fig. 7 and seems to be a side effect introduced when regularizing the network.

FE 2 example
As one final illustrative example, the surrogate models are used to simulate the interlaminar shear test shown in Fig. 27.The model consists of a short beam composed of unidirectional composite layers with fibers aligned in the 0 ∘ direction shown in Fig. 27.Symmetry is exploited by modeling only half of the span of the beam and the problem is simplified by modeling the beam in 2D with a plane strain assumption.The model is discretized with 484 constant-strain triangles each with a single integration point.For models requiring an embedded RVE (full-order FE 2 and hyper-reduced), the same 3D micromodel used for training the surrogate models is adopted and only in-plane stress and stiffness components are upscaled.Due to the short span between supports, strain localizes at mid-thickness (Fig. 27) in a region dominated by longitudinal shear (τ 12 ).
The models of Section 6.1, trained only with pure stress cases, are used as surrogates (ε sv ¼ 0:01, n 1 ¼ 500).The full-order FE 2 problem is also solved as reference.This is a challenging scenario for the surrogates since the model experiences a complex combination of longitudinal shear, fiber stress and transverse tension and compression close to the load and support.Furthermore, the plane strain assumption at the macroscale leads to stress combinations not covered during training under pure stress states.The analysis is executed for 118 time steps, after which global convergence cannot be obtained for the full-order FE 2 model.None of the surrogates show this lack of robustness, but for the sake of comparison with the full model they are also stopped after 118 time steps.
The resultant load-displacement curves are shown in Fig. 28.Despite operating under a complex scenario not covered during training, all surrogates predict the response well.The network model is the one showing the highest discrepancy, with predictions for the load factor approximately 5% lower than the reference ones.This lack of precision during the hardening regime is consistent with previous observations (c.f.Figs.7 and 26).
Execution times and speedups are shown in Table 1.Even with a coarse mesoscopic mesh with only 484 embedded micromodels, the fullorder model takes more than one week to run.Without additional techniques such as parallelization or the construction of surrogates, FE 2 is effectively unsuitable for any practical application.Among the surrogate models, the mesomodel is the most efficient, followed by the neural network and the more expensive hyper-reduced model.
There is, however, no clear-cut recommendation to be made as to which strategy should be chosen.The mesomodel is fast and robust but fails in predicting relevant loading combinations.The neural network is fast, can be retrained to incorporate new information and its efficiency scales well with model size, but it has poor extrapolation capabilities and grapples with the bias-variance tradeoff.The hyper-reduced model retains relevant physical information, extrapolates well to unseen data and readily handles unloading and path-dependency, but is inherently slower than the other options and scales poorly with the size of its latent space.

Conclusions
Three different approaches for constructing surrogate models for multiscale analysis of laminated composites have been compared through an extensive series of numerical tests.Comparisons involved a state-of-the-art orthotropic mesoscale model with pressure-dependent plasticity, feed-forward neural networks with dropout regularization and hyper-reduced models combining the POD and ECM techniques.Even though substantial computational efficiency gains could be obtained with all of the approaches, each comes with a particular set of advantages and drawbacks:    (Figs. 23 and 24).Once formulated, it is not possible to easily include in the model new epistemic information gained from running additional micromechanical models.
� Neural networks are fast, can be trained to reproduce general stress states and can be retrained to incorporate additional data (c.f.Figs. 25 and 26).However, their extrapolation capabilities are limited, which makes using them away from their training sets risky (Fig. 14).Furthermore, unregularized networks can lead to high errors and nonsensical predictions by feeding on their own inaccuracy (Figs. 9 and 10).Finally, conventional feed-forward networks assume a unique relationship between stresses and strains and therefore cannot handle unloading or strain path dependency.
� Hyper-reduction tends to give better predictions with a lower training effort by retaining physical information from the original full-order model.Hyper-reduced models tend to generalize well to unseen data, albeit with varying degrees of success (c.f.Figs. 13, 18 and 24), and can be retrained on new observations (Fig. 23).On the other hand, they are significantly slower than the other surrogates and their efficiency does not scale well as more precision is sought or as more training cases are added (Fig. 11).
Although none of the techniques were found to be optimally efficient and accurate in every situation, they could be employed in combination in order to leverage their strengths and minimize their weaknesses.For instance, for a given mesoscopic structure to be modeled, one could first use the mesomodel to quickly solve the problem, gather a number of representative strain histories from multiple integration points and inject those in a single micromodel in order to generate highly-tailored training data for hyper-reduced models or neural networks.This can be used to efficiently solve the issue of sampling over an extremely large space of possible strain combinations without having to run full-order FE 2 models.Alternatively, an adaptive approach could be used to switch between surrogates: an ensemble of neural networks could be used to compute the response at all points but predictions with low confidence would be substituted by those coming from a hyper-reduced micromodel.In any case, the present in-depth investigation on the advantages and limitations of each technique may serve as a valuable starting point for building smarter multiscale analysis frameworks for laminated composites.

Fig. 1 .
Fig.1.The FE 2 approach: A concurrent link is established between meso and microscales.

Fig. 3 .
Fig. 3. Hyper-reduced model trained with pure stress states.Predictions improve as the truncation tolerance ε sv is reduced.

Fig. 4 .
Fig. 4. Average absolute errors of the hyper-reduced model for the pure stress dataset.

Fig. 6 .
Fig. 6.Evolution of the average validation error during training of networks with different hidden layer widths (n 1 ).

Fig. 7 .
Fig. 7. Predictions of transverse stress made by neural network models with different hidden layer sizes (n 1 ).

Fig. 8 .Fig. 9 .
Fig. 8. Average absolute errors over the entire pure stress dataset for network models with different hidden layer sizes (n 1 ).

Fig. 10 .
Fig. 10.Surrogate models used to predict material behavior outside of the strain range seen during training.The hyper-reduced model predicts the correct response, while the network shows an unphysical perfectly-plastic behavior.

Fig. 11 .
Fig. 11.Increases in execution time when model size (n for hyper-reduced models and n 1 for network models) is increased.

Fig. 12 .
Fig. 12. Illustration of a biaxial yield envelope.The angle θ defines the ratio between the two stress components.When training surrogates, θ lim is used to define the bounds of the training space.

Fig. 13 .
Fig.13.Hyper-reduced model predictions of the biaxial transverse tension response when θ ¼ 45 ∘ .Curves from models trained only on pure stress states (θ lim ¼ 0) as well as models retrained with additional biaxial cases (θ lim > 0) are shown.

Fig. 14 .
Fig. 14.Network model predictions of the biaxial transverse tension response when θ ¼ 45 ∘ .Curves from models trained only on pure stress states (θ lim ¼ 0) as well as models retrained with additional biaxial cases (θ lim > 0) are shown.

Fig. 17 .
Fig. 17.Biaxial yield envelopes obtained with a hyper-reduced model trained on pure stress cases plus θ ¼ 45 ∘ and with another one trained with all values of θ.The mesomodel envelope is shown for comparison.

Fig. 23 .
Fig. 23.Biaxial yield envelopes (σ 22ð33Þ À τ 12 ) obtained with a hyper-reduced model trained on pure stress cases plus θ ¼ 45 ∘ and with one trained with all values of θ.The mesomodel envelope is shown for comparison.

Fig. 24 .
Fig. 24.Hyper-reduced model predictions for the biaxial σ 11 -τ 12 response under various stress ratios.The model trained with only pure stress cases predicts unseen scenarios remarkably well.

Fig. 25 .
Fig. 25.Network model predictions for the biaxial σ 11 -τ 12 response under various stress ratios.The curves are not reproduced well without additional network retraining.

Fig. 26 .
Fig. 26.Biaxial σ 11 -τ 12 predictions for the retrained network model.The predictions improve but are still not as accurate as the ones obtained with the untrained hyper-reduced model.

Fig. 27 .
Fig. 27.Short-beam FE 2 example.Loads and boundary conditions are shown as well as the plastic strain field at the final time step.

Fig. 28 .
Fig. 28.Load-displacement curves for the short beam FE 2 example.Predictions made with all three surrogate modeling strategies are shown.The analysis is stopped before the perfect plasticity plateau due to non-convergence of the fullorder FE 2 model.

Table 1
Execution times and speedups for the FE 2 examples.Full-order values are used as reference.