Deep autoencoders for physics-constrained data-driven nonlinear materials modeling

Physics-constrained data-driven computing is an emerging computational paradigm that allows simulation of complex materials directly based on material database and bypass the classical constitutive model construction. However, it remains difficult to deal with high-dimensional applications and extrapolative generalization. This paper introduces deep learning techniques under the data-driven framework to address these fundamental issues in nonlinear materials modeling. To this end, an autoencoder neural network architecture is introduced to learn the underlying low-dimensional representation (embedding) of the given material database. The offline trained autoencoder and the discovered embedding space are then incorporated in the online data-driven computation such that the search of optimal material state from database can be performed on a low-dimensional space, aiming to enhance the robustness and predictability with projected material data. To ensure numerical stability and representative constitutive manifold, a convexity-preserving interpolation scheme tailored to the proposed autoencoder-based data-driven solver is proposed for constructing the material state. In this study, the applicability of the proposed approach is demonstrated by modeling nonlinear biological tissues. A parametric study on data noise, data size and sparsity, training initialization, and model architectures, is also conducted to examine the robustness and convergence property of the proposed approach.


Introduction
Constitutive modeling is traditionally based on constitutive or material laws to describe the explicit relationship among strain, stress, and state variables based on experimental observations, physical hypothesis, and mathematical simplifications. However, the phenomenological modeling process inevitably introduces errors due to limited data and mathematical assumptions in model parameter calibration. Moreover, constitutive laws rely on pre-defined functions and often lack generality to capture full aspects of material behaviors [1].
With advancements in computing power and proliferation of digital data [2], machine learning (ML) based data-driven approaches, such as artificial neural networks (NNs), have emerged as a promising alternative for constitutive modeling due to their ability in extracting features and complex patterns in data [3]. This type of approaches usually requires parameterization of machine learning models with given constitutive data a priori, and thus, they are classified as model-based approaches. For example, NNs have been applied to modeling a variety of materials, including concrete materials [4], hyper-elastic materials [5], viscoplastic material of steel [6], and homogenized properties of composite structures [7]. Accordingly, the NN-based constitutive models were integrated into finite element codes to predict path-or rate-dependent materials behaviors [8,9,10,11]. Recently, deep neural networks (DNNs) with special mechanistic architectures, such as recurrent neural networks (RNNs) haven been applied to path-dependent materials [12,13,14]. In [14], for example, graph neural networks (GNNs) based representation of topology information of crystal microstructures (e.g., Euler angles) were introduced for learning anisotropic elasto-plastic responses. Besides, the deep material network approach was proposed in [15,16] to simulate multiscale heterogeneous materials, in which the NNs were constructed based on hierarchical mechanistic building blocks derived from analytical two-phase elasticity model and homogenization theory.
Another strategy in data-driven materials modeling is to bypass the constitutive modeling step by formulating an optimization problem to search for the physically admissible state that satisfies equilibrium and compatibility and minimizes the distance to a material dataset [17,18,19]. In this data-driven approach, the search of material data at each integration point from the material dataset is determined via a distance-minimization function and is called the distance-minimizing data-driven (DMDD) computing. This data-driven computing paradigm has been extended to dynamics [20], problems with geometrical nonlinearity [21,1], inelasticity [22], anisotropy [23], material identification and constitutive manifold construction [24,25,26,27]. A variational framework for data-driven computing was proposed in [28,29] to allow versatile in the employment of special approximation functions and numerical methods.
To better handle noise induced by outliers and intrinsic randomness in the experimental datasets, data-driven computing integrated with statistical models or machine learning techniques were developed, including incorporating maximum entropy estimation with a cluster analysis [30], regularization based on data sampling statistics [31], and locally convex reconstruction inspired from manifold learning for nonlinear dimensionality reduction [28]. In the local convexity data-driven (LCDD) computing proposed in [28], the convexity condition is imposed on the reconstructed material graph to avoid convergence issues that usually arise in standard data fitting approaches. Recently, Eggersmann et al. [32] introduced tensor voting, an instance-based machine learning technique, to the entropy-based data-driven scheme [30] to construct locally linear tangent spaces for achieving higher-order convergence in data-driven solvers.
The above mentioned (distance-minimizing) data-driven computing approaches are distinct from the NN-based constitutive modeling approaches. The latter constructs the surrogate models of constitutive laws independent to the solution procedure of boundary value problems, whereas the former is "model free" and it incorporates the material data search into the solution procedure of boundary value problems. Thus, the former approach was also called model-free datadriven computing [30,22]. The model-free data-driven approach circumvents the need of using material tangent during solution iteration processes, which offers another unique feature in computational mechanics. From the perspective of machine learning algorithms, the NN approach is considered as supervised learning, and it requires pre-defined input-output functions, e.g. strain-stress laws. It has been shown that selecting the response function by NNs within the constitutive framework is non-trivial [33,12,14]. On the other hand, the datadriven computing approaches with unsupervised learning such as clustering and manifold learning are capable of discovering the underlying data structure for the constitutive manifold [28,18]. However, this type of data-driven approaches could encounter difficulties in high dimensional applications when material data sampling involves multidimensional and history-dependent state variables. As demonstrated in [17,19,28], higher data dimension results in lower convergence rate with respect to data size, and thus demands effective dimensionality reduction for improved effectiveness of data-driven computing. Further, the model-free data-driven schemes rely on the direct search of nearest neighbors from the material dataset, leading to limited extrapolative generalization when the distribution of data points becomes sparse.
In this work, we aim to develop a novel data-driven computing approach to overcome the curse of dimensionality and the lack of generalization in classical model-free data-driven computing approaches [17,18,28]. To this end, we propose to introduce a novel autoencoders based deep neural network architecture [34,35] under the LCDD framework [28] to achieve two major objectives: dimensionality reduction and generalization of physically meaningful constitutive manifold. It should be emphasized that there have been various nonlinear dimensionality reduction (i.e. manifold learning) techniques developed for complex high-dimensional data [36,37,38,39,40], but these methods remain challenging in out-of-sample extension (OOSE), that is to project new unseen data onto the learned low-dimensional representation space, due to the lack of explicit mapping function. While nonparametric OOSE [40] is an option to construct the mapping, the computational complexity increases substantially with the size of dataset. On the other hand, owing to its deep learning architecture, an autoencoder is capable of naturally defining the mapping functions between high-and low-dimensional representation and capturing highly varying nonlinear manifold with good generalization capability [40,2], as to be discussed in Section 3.
By integrating autoencoders and the discovered low-dimensional embedding into the data-driven solver, the proposed framework is referred to as autoembedding data-driven (AEDD) computing, which can also be considered as a hybrid of the NN-based constitutive modeling and the classical model-free data-driven computing. In this approach, the autoencoders are first trained in an offline stage to extract a representative low-dimensional manifold (embedding) of the given material data. Autoencoders also provide effective noise filtering through the data compression processes. The trained autoencoders are then incorporated in the data-driven solver during the online computation with customized convexity-preserving reconstruction. Hence, all operations related to distance measure, including the search of the closest material points in the dataset are performed in the learned embedding space, circumventing the difficulties resulting from high dimensionality and data noise. To ensure numerical stability and representative constitutive manifold parameterized by the trained autoencoder networks, an efficient convexity-preserving interpolation is proposed to locally approximate the optimal material data to a given physically admissible state. Specifically, in this work, we present two different solvers to perform locally convex reconstruction, and demonstrate the one directly providing interpolation approximation without using decoders outperforms the one that fully uses the encoder-decoder network structure. Furthermore, it is shown that the proposed method is computationally tractable, since the additional autoencoder training is conducted offline and the online data-driven computation mainly involve lower-dimensional variables in the embedding space.
The remainder of this paper is organized as follows. The background of physics-constrained data-driven framework is first introduced in Section 2, including the data-driven equations in variational form and the material local solvers. In Section 3, the basic theory of deep neural networks and autoencoders are presented. Section 4 introduces the convexity-preserving interpolation for data-driven materials modeling with improved performance. Finally, in Section 5, the effectiveness of the proposed AEDD framework are examined, and a parametric study is conducted to investigate the effects of autoencoder architecture, data noise, data size and sparsity, and neural network initialization on AEDD's performance. The proposed method is also applied to biological tissue modeling to demonstrate the enhanced effectiveness and generalization capability of AEDD over the other data-driven schemes. Concluding remarks and discussions are summarized in Section 6.

Formulations of physics-constrained data-driven nonlinear modeling
This section provides the basic equations of the physics-constrained datadriven computational framework for nonlinear solids [21,28,1], followed by a review of two material data-driven local solvers and the associated computational approaches.

Governing equations of nonlinear mechanics
The equations governing the deformation of a solid in a domain Ω X bounded by a Neumann boundary Γ X t and a Dirichlet boundary Γ X u in the undeformed configuration are given as in Ω X , where u is the displacement vector, E is the Green Lagrangian strain tensor, S is the second Piola-Kirchhoff (2nd-PK) stress tensor, and DIV denotes the divergence operator. Without loss of generality, the governing equations (1) are defined in the reference (undeformed) configuration [41], which is denoted by the superscript "X". In Eq. (1), F is the deformation gradient related to u, defined as F(u) = ∂(X + u)/∂X, where X is the material coordinate, and b, N, t, and g are the body force, the surface normal on Γ X t , the traction on Γ X t , and the prescribed displacement on Γ X u , respectively. The first equation in (1) is the equilibrium. The second equation in (1) is the compatibility. The third and forth equations in (1) are the Neumann and Dirichlet boundary conditions, respectively. To obtain the solutions to the boundary value problem in Eq. (1), material laws that describe the relation between stress and strain are required, e.g., The material law is typically constructed with a pre-defined function f based on experimental observation, mechanics principles, and mathematical simplification with model parameters calibrated from limited material data [4,42] or by computational homogenization approaches such as FE 2 [43,44], which inevitably introduce materials modeling empiricism and errors [45]. Moreover, the consistent tangent stiffness associated with the material law is often required in nonlinear computation [41]. For complex material systems, phenomenological material models are difficult to construct. The physics-constrained data-driven computing framework [17,18,28] offers an alternative which directly utilizes material data and bypasses the need of phenomenological model construction.

Data-driven modeling of nonlinear elasticity
In this framework, the material behavior is described by means of strain and stress tensors (Ê,Ŝ) given by the material genome database. A material database E = {(Ê I ,Ŝ I )} M I=1 ∈ E is defined to store the material data, where M is the number of material data points, and the hat symbol " ∧ " is used to denote material data. Here, E denotes the admissible set of material database, which will be further discussed in Section 2.3. To search for the most suitable (closest) strain-stress pairs (Ê * ,Ŝ * ) for a given state (E, S), an energy-like distance function extended from [17] is defined: with whereĈ is a predefined symmetric and positive-definite tensor used to properly regulate the distances between (E, S) and (Ê,Ŝ). Usually, the selection of the coefficient matrixĈ depends on the a priori knowledge of the given dataset. One widely adopted normalization scheme in machine learning is based on the variance of the data, but the selection is not unique. For example, the Mahalanobis distance of data is employed to compute the coefficient matrices [31]. Recently, He et al. [1] proposed to use the ratio of the standard deviations of the associated components of the stress-strain data to construct a diagonal coefficient matrix. Henceforth, the strain-stress pair (Ê,Ŝ) ∈ E extracted from the material database is called the material data (state), whereas (E, S) is called the physical state if it satisfies the physically admissible set C given by the equilibrium and compatibility equations in Eq. (1), denoted as (E, S) ∈ C. As a result, the data-driven modeling problem can be formulated as: This data-driven problem is solved by fixed-point iterations, where the minimization of F with respect to (E, S) and (Ê,Ŝ) are performed iteratively until the intersection of two sets, C and E, is found within a prescribed tolerance. We denote the minimization corresponding to the material data as the local step, i.e. Eq. (3), while the one associated with the physical state as the global step, which will be discussed as follows. Given the optimal material data (Ê * ,Ŝ * ), the global step of the data-driven problem (6) is expressed as the following constrained minimization problem [1]: With the Lagrange multipliers λ and η, Eq. (7) is transformed to the minimization of the following functional: The Euler-Lagrange equation of Eq. (8) indicates that η = −λ on Γ X t and λ = 0 on Γ X u [46]. Consequently, we have By means of integration by parts and the divergence theorem, Eq. (9) is reformulated as The stationary conditions of Eq. (10) read: δu : δS : δλ : As Eq. (11b) provides correction between the physical stress S and the material stress data S * , a collocation approach is considered in Eq. (11b) to yield: which represents the stress solution update. Substituting Eq. (12) into Eqs. (11a) and (11c) yields: The solutions u and λ are solved from Eqs. (13) by means of the Newton-Raphson method [41], and the physical state stress S is subsequently obtained from (12). As such, Eqs. (12)- (13) are the computational procedures to solve Eq. (7). Moreover, in this boundary value problem, Eq. (13), the optimal material data (Ê * ,Ŝ * ) not only provides the underlying material information learned from material database, but also serves as the material data-based connection to relate the strain in compatibility and the stress in equilibrium equations in Eqs. (13a) and (13b), respectively. In this study, the reproducing kernel particle method (RKPM) [47,48] is employed to discretize the unknown fields u and λ in Eqs. (13) due to its capabilities of nodal approximation of state variables and enhanced smoothness that are particularly effective for data-driven computing. The formulation of the reproducing kernel approximation is given in Appendix A. Moreover, for effectiveness in data-driven computing, a stabilized nodal integration scheme (SCNI [49], see Appendix B) is used to integrate the weak formulations in Eq. (13) for reducing the number of integration points where the stress and strain material data need to be searched [28].
Combining the ease of introducing arbitrary order of continuity in the RK approximation as well as the employment of the SCNI scheme to integrate the weak equations in Eq. (13), it allows the material data search and variable evaluation to be performed only at the nodal points, avoiding the necessity of computing field and state variables separately at nodal points and Gauss points, respectively, for enhanced efficiency and accuracy of data-driven computing. In addition, under the SCNI framework, the nodal physical stress S is directly associated with the material stress data without introducing additional interpolation errors, see [28,1] for more details. The Green Lagrangian strain tensor E is computed from the RK approximated displacement u and is evaluated at the nodal points. Note that stress update equation in Eq. (12) is also carried out nodally.
The physical state (E, S) evaluated at the integration points x α are denoted as {(E α , S α )} N α=1 , where N is the number of integration points. Note that due to the employment of nodal integration [49,28], the integration points share the same set of points as the nodal points. For simplicity of notation, we denote z α = (E α , S α ) as the physics state andẑ α = (Ê α ,Ŝ α ) the material data associated with the integration point α in the following discussions.
In data-driven computing, the local step (3) and the global step (7) are solved iteratively to search for the optimal material data from the admissible material set E that is closest to the physical state satisfying the physical constraints given in Eq. (1). The convergence properties of this fixed-point iteration solver have been investigated in [17,19]. It should be noted that the selection of the optimal material data by the material data-driven local solver is crucial to the effectiveness of data-driven computing [30,28], and further discussion will be presented in the following Sections 2.3 and 4.  [28] solvers. The datadriven solution z * is given by the intersection of the admissible set of physical states C and the material admissible set E.

Material data-driven local solver
The effectiveness of data-driven computational paradigm discussed in Section 2.2 relies heavily on the search of optimal material dataẑ * α = (Ê * α ,Ŝ * α ), α = 1, ..., N , from the material dataset E. When adopting the distance-minimizing data-driven (DMDD) approach proposed in [17], the local material solver in Eq.
(3) used to find the optimal material data can be defined as where the distance functions d E and d S are referred to Eqs. (4) and (5), α denotes the indices of integration points, and N is the total number of integration points.
For enhanced data-driven computing, especially with sparse noisy data, an alternative approach called local convexity data-driven (LCDD) computing was proposed in [28] by introducing the underlying structure of material data via manifold learning. In this approach, the local solver is expressed as: where E l α := E l (z α ) denotes a local convex subset formed by k material data points closest to the given physical state z α = (E α , S α ), as illustrated by the polygons in Fig. 1(b). The local minimization problem (15) is solved by means of a non-negative least-square algorithm with penalty relaxation, see details in [28]. This approach introduces a local embedding reconstruction of datsa which is more robust in dealing noisy data and outliers. Fig. 1 shows the comparison of the DMDD and LCDD solvers, where (v) is the iteration index and one iteration consists of solving one global (physical) step, i.e. Eqs. (12)-(13) and one material data-driven local step, e.g. Eq. (14) or (15), as noted in Section 2.2. The local step of the DMDD solver (14) searches for the material data closest to the given physical state directly from the material dataset E, see Fig. 1(a). It has been shown that this heuristic solver suffers from noisy dataset and requires enormous data to guarantee satisfactory accuracy [30,28]. On the other hand, the LCDD solver (15) searches for the optimal material data based on the locally constructed convex space E l α informed by the neighboring data, as shown in Fig. 1(b). The key idea behind the construction of E l α is to provide a smooth, bounded and lower dimensional admissible space for optimal material data search in Eq. (15), and to preserve the convexity of the constructed local material manifold for enhanced robustness and stability in data-driven iterations.
While the material data-driven local solver in Eq. (15) locates the optimal data from the defined feasible set (constructed by a set of local neighboring points), the final solution of the associated data-driven modeling problem Eq. (6) is not guaranteed to be globally optimal. This is consistent to other existing data-driven approaches [17,19,28] where the optimality fundamentally depends on the characteristic of the material dataset. However, as demonstrated in references [17,19,28], if the material dataset is well posed, the proposed datadriven solver can converge optimally as the density of data points increases.
Remark. It should be noticed that in both Eqs. (14) and (15) the nearest points are sought based on the metric functions d E and d S . Thus, it suffers from the notorious "dimensionality curse" when data-driven modeling attempts to scale up to high-dimensional material data. Although the innate manifold learning in LCDD allows noise and dimensionality reduction, the proper definition of the metric functions in high-dimensional phase space remains challenging [2]. Besides, as the nearest neighbors are searched locally from the existing data points of the material dataset, it leads to limited extrapolative generalization to be demonstrated in Section 5.2. Furthermore, the data search and the locally convex reconstruction through a constrained minimization solver at every local step during data-driven computation could result in high computational cost especially for the large and high dimensional material dataset.
To address the issue of the curse of dimensionality, we propose to use autoencoders in the data-driven local solver for deep manifold learning of material data, allowing effective discovering of the underlying representation of stress-strain material data. To the best of the authors' knowledge, this is the first attempt to apply deep manifold learning in physics-constrained data-driven computing. In the following exposition, we demonstrate how autoencoder based deep learning enhances accuracy, robustness, and generalization ability of datadriven computing.

Autoencoders for low-dimensional nonlinear representation of material data
For effective data search in the local step, Eq. (3) or Eqs. (14) and (15), in this section, we first review the basic concepts of deep neural networks and autoencoders that are used for deep manifold learning. We then present the employment of autoencoders to construct low-dimensional nonlinear representation (embedding) of material data. Thereafter, optimum data search on the low-dimensional data manifold using a locally convex projection method is presented in Section 4.

Background: Autoencoders
As the core of the deep learning [2], deep neural networks (DNNs) or often called multilayer perceptrons (MLPs), are used to represent a complex model relating data inputs, x ∈ R din and data outputs y ∈ R dout . A typical DNN is composed of an input layer, an output layer, and L hidden layers. Each hidden layer transforms the outputs of the previous layer through two operators, i.e., an affine mapping followed by a nonlinear activation function σ(·), and outputs the results to the next layer, which can be written as: where x (l) ∈ R n l is the outputs of layer l with n l neurons, and W (l) ∈ R n l ×n l−1 and b (l) ∈ R n l are the weight matrix for linear mapping and the bias vector of layer l, respectively, where n 0 = d in is the input dimension. Some of the commonly used activation functions include logistic sigmoid, rectified linear unit (ReLu), and leaky ReLu. In this study, a hyperbolic tangent function is used as the activation function for hidden layers, σ(·) = tanh(·). Note that the setup of the output layer depends on the type of machine learning tasks, e.g., classification, regression. For regression tasks, which is the application of this study, a linear function is used in the output layer where the last hidden layer information is mapped to the output vectorỹ, expressed as: whereỹ denotes the DNN approximation of the output y. We denote θ as the collection of all trainable weight and bias coefficients, θ = {W (l) , b (l) } L+1 l=1 . Autoencoders [34,35] are an unsupervised learning technique in which special architectures of DNNs are leveraged for dimensionality reduction or representation learning. Specially, an autoencoder aims to optimally copy its input to output with the most representative features by introducing a low-dimensional embedding layer (or called a code). As shown in Fig. 2, an autoencoder consists of two parts, an encoder function h enc (·; θ enc ) : R d → R p and a decoder function h dec (·; θ dec ) : R p → R d , such that the autoencoder is much less than the input dimension d, the encoder h enc is trained to learn the compressed representation of x, denoted as the embedding x ∈ R p , whereas the decoder h dec reconstructs the input data by mapping the embedding representation back to the high-dimensional space.
It is important to note that similar to any other dimensionality reduction techniques [50], the employment of autoencoders is based on the manifold hypothesis, which presumes that the given high-dimensional input data, e.g., the material dataset E, lies on a low-dimensional manifold E that is embedded in a higher-dimensional vector space, as shown by the schematic figures at the bottom of Fig. 2. Figure 2: Schematic of an autoencoder consisting of an encoder and a decoder, where the dimension of the embedding layer is smaller than the input dimension. For a high-dimensional input object, the encoder learns a compressed low-dimensional embedding, on which the decoder optimally reconstructs the input object.

Nonlinear material embedding
In this study, autoencoders are used to discover the intrinsic low-dimensional material embedding of the given material dataset E = {ẑ I } M I=1 , whereẑ I = (Ê I ,Ŝ I ) and M is the number of material data points. Given the autoencoder architecture h(·; θ enc , θ dec ) in Eq. (17), the parameters θ * enc and θ * dec are computed by minimizing the following loss function: where β is a regularization parameter, and || · || F denotes the Frobenius norm.
Here, the loss function consists of the reconstruction error over all training data and a L 2 -norm based weight regularization term used to prevent over-fitting issues [2,51].
The training procedures of autoencoders in terms of the loss function in Eq. (18) are performed offline. Thus, training on a large material dataset does not result in additional overhead on the online data-driven computation. The details of the training algorithms for autoencoders are given in Section 3.3.
Remark. It is well known that for an autoencoder with a single hidden layer and linear activation function, the weights trained by the mean-squared-error cost function learn to span the same principal subspace as principal components analysis (PCA) [52]. Autoencoders based on neural networks with nonlinear transform functions can be thought of as a generalizaiton of PCA, capable of learning nonlinear relationships.
Given the trained autoencoder h(·; θ * enc , θ * dec ), we can define a low-dimensional , ∀z ∈ Z}, in which the material state is described by a lower-dimensional coordinate system z . Here, the prime symbol (·) is used to denote the quantities defined in the embedding space, and Z denotes the high-dimensional phase space where the material statesẑ and the physical states z are defined. For example, the embedding set of the given material data is whereẑ I = h enc (ẑ I ; θ * enc ) forẑ I ∈ E. Considering the data-driven application on learning the underlying structure of material data, autoencoders provide the following advantages: 1) Deep neural network architecture enables autoencoders to capture highly complex nonlinear manifold with exponentially less data points than nonparametric methods based on nearest neighbor graph [53,40,2]. 2) Autoencoders provide explicit mapping functions, i.e. h enc and h dec , between the high-and low-dimensional representation so that the trained encoders allow efficient evaluation of the embedding of new input data. 3) Through information compression by encoders, unwanted information of material data, such as noise and outliers, can be filtered while preserving its essential low-dimensional manifold structure [2].
Compared to data-driven methods based on conventional manifold learning techniques [18,28,32], the explicit nonlinear mapping functions learned by autoencoders are particularly attractive to data-driven computing because not only can they encode the essential global structure of the given material data for enhanced generalization ability, they also greatly reduce online computational cost by using the pretrained autoencoders. Furthermore, as we can see in next section, due to the availability of low-dimensional embedding E , we can introduce a convexity-preserving interpolation scheme to effectively search for the optimal material data associated with the given physical state.

Autoencoder architectures and training algorithms
As the architectures of encoder and decoder in an autoencoder are symmetric, we only use the encoder architecture to denote the autoencoder architecture. For example, the encoder architecture in Fig. 2 where the first and last values denote the numbers of artificial neurons in the input layer and embedding layer, respectively, and the other values denote the neuron numbers of the hidden layers in sequence. As such, the decoder architecture in this case The offline training on the given material datasets is performed by using the open-source Pytorch library [54], and the optimal parameters θ * enc and θ * dec of autoencoders are obtained by minimizing the loss function (Eq. (18)). The regularization parameter β is set as 10 −5 . A hyperbolic tangent function is adopted as the activation function for all layers of autoencoders, except for the embedding layer and the output layer, where a linear function is employed instead. To eliminate the need of manually tuning the learning rate for training, an adaptive gradient algorithm, Adagrad [55], is employed, where the initial learning rate is set to be 0.1 and the number of training epochs is set to be 2000. The training datasets are standardized such that they have zero mean and unit variance to accelerate the training process. It should be noted that the training of autoencoders could get trapped in local minima and this can be overcome by pretraining the network using Restricted Boltzmann Machines or by denoising autoencoders [56,40,57].

Auto-embedding data-driven (AEDD) solver
We now develop the AEDD solver based on autoencoders to search for the optimal material data in the solution process of the local step (e.g., Eqs. (14) or (15)). We begin with introducing a simple interpolation scheme to preserve local convexity in the material data search, which is essential in enhancing the local solver performance, followed by presenting two AEDD solvers with the employment of convexity-preserving reconstruction.

Convexity-preserving interpolation
With deep manifold learning by autoencoders, we are able to extract the underlying low-dimensional global manifold of material datasets, and to enhance the generalization capability of the material local solver. During data-driven computing, material and physical states are projected onto the constructed material embedding space E , and a convexity-preserving local data reconstruction is introduced for enhanced stability and convergence in the local data search on the embedding space.
In this approach, because the material embedding points in low-dimensional space are explicitly given by the offline trained autoencoders, interpolation schemes on the embedding space is straightforward without suffering the highdimensionality issues. A convexity-preserving, partition-of-unity interpolation method is therefore introduced into the material data-driven solver, using Shepard function [58] or inverse distance weighting. Shepard interpolation has been widely used in data fitting and function approximation with positivity constraint [59,60,61].
Here, the Shepard functions are applied to reconstruct the material embedding of a given physical state, z = h enc (z; θ * enc ), by its material embedding neighbors, expressed as where z recon is the reconstruction of z ,ẑ I is the material data embedding in E defined in Eq. (19), N k (z ) is the index set of the k nearest neighbor points of z selected from E , and the shape functions are In Eqs. (20) and (21), φ is a positive kernel function representing the weight on the data set {ẑ I } I∈N k (z ) , and I denotes the interpolation operator that constructs shape functions with respect to z and its neighbors. Note that these functions form a partition of unity, i.e., I∈N k (z ) Ψ I (z ) = 1 for transformation objectivity. Furthermore, they are convexity-preserving when the kernel function φ is a positive function. Here, an inverse distance function is used as the kernel function φ(z −ẑ I ) = 1 ||z −ẑ I || 2 .
It is worth noting that the interpolation functions defined in Eqs. (21) and (22) are equivalent to the RK approximation function in (A.6) with zero-order basis. Fig. 3 demonstrates the locally convex reconstruction by the proposed interpolation in Eq. (20). For example, the given blue asterisk is mapped to the bluesquare point by using the Shepard interpolation. It can be seen that the three given points (inside (red), on-edge (pink), and outside (blue)) are all mapped to locations within the convex hull, showing the desired convexity-preserving capability. The interpolation is simple and efficient as the interpolation functions in Eq. (21) can be constructed easily in a low-dimensional embedding space.

Auto-embedding data-driven (AEDD) solver in data-driven computing
The physics-constrained data-driven computing described in Section 2 is conducted in the high-dimensional phase space Z (or called data space), where the physical state z α ∈ C, the material dataẑ α ∈ E and the material dataset E are defined in Z. We use the subscript "α" to denote the quantities at integration points with the employment of numerical discretization, see Section 2.2. To enhance solution accuracy and generalization capability of data-driven computing, deep manifold learning enabled by autoencoders is introduced into the material data-driven local solver. Recall that autoencoders introduced in Section 3.2 are trained offline and the trained encoder h enc and decoder h dec functions are employed directly in the online data-driven computation. As such, the encoder maps an arbitrary point from the data space to the embedding space, i.e. z α = h enc (z α ), whereas the decoder performs the reverse mapping, i.e.z α = h dec (z α ). With the autoencoders and the proposed convexity-preserving data reconstruction in the embedding space introduced in Section 4.1, we propose the following two AEDD approaches for the material data-driven local solver. The objective is to find the optimal material data for a given physical state z α computed in Section 2.2.

AEDD local solver: Solver I
Let z α be the physical state obtained from the global step with physical constraints, Eq. (7), or the corresponding variational equations, Eqs. (12)- (13). We first introduce a local solver that uses decoders for reverse mapping from the embedding space to the data space, denoted as Solver I. In this approach, the local problem defined in Eq. (3) is reformulated by three steps, as described below: Step 1 : Step 2 : Step 3 : for α = 1, ..., N , whereẑ I ∈ E (see Eq. (19)), and I is the convexity-preserving interpolation operator defined in Eq. (20). The schematic of data-driven computing with Solver I is illustrated in Fig.  4(a), where the integration point index α is dropped for brevity. For example, at the v-th global-local iteration, after the physical state z (v) (the blue-filled triangle) is obtained from the global physical step (Eq. (7)), Step 1 of the local solver (Eq. (23a)) maps the sought physical state from the data space to the embedding space by the encoder, z (v) = h enc (z (v) ), depicted by the whitefilled triangle in Fig. 4(a). In Step 2, k nearest neighbors of z (v) based on Euclidean distance are sought in the embedding space and the optimal material embedding solutionẑ * (v) (the red square) is reconstructed by using the proposed convexity-preserving interpolation (Eqs. (20)- (22)). Lastly, in Step 3, the optimal material embedding stateẑ * (v) is transformed from the embedding space to the data space by the decoder,ẑ * (v) = h dec (ẑ * (v) ) (the red star in Fig.  4(a)). Subsequently, this resultant material data solutionẑ * (v) from the local solver in Eq. (23) is used in the next physical solution update z (v+1) . These processes complete one global-local iteration. The iterations proceed until the distance between the physical and material states is within a tolerance, yielding the data-driven solution denoted by the green star in Fig. 4(a), which ideally is the intersection between the physical manifold and material manifold in the data space.
Here, the nearest neighbors searching and locally convex reconstruction of the optimal material state are processed in the filtered (noiseless) low-dimensional embedding space, resulting in the enhanced robustness against noise and accuracy of the local data-driven solution.

AEDD local solver: Solver II
Although autoencoders aim to transform the input material data to output data with maximally preserving essential features, see Fig. 2, the decoder functions h dec do not exactly reproduce the given material data in the data space due to the information compression and errors inevitably introduced by training processes [2]. During Step 3 of Solver I (Eq. (23c)), the material embedding solutionẑ * α in Eq. (23b) projecting back to the data space by decoders could involve data reconstruction errors. That is, the performance of AEDD Solver I is subject to the quality of the trained decoder functions.
To enhance the robustness and stability of data-driven computing, we propose the second AEDD local solver (Solver II) that circumvents the use of decoders and, instead, uses the interpolation scheme in Eq. (20) to perform locally convex reconstruction directly on material dataset. The procedures of this solver are expressed as Step 1 : Step 2 : for α = 1, ..., N , whereẑ I ∈ E are the material data given in the original data space. The key ingredient of this approach is that the modified locally convex reconstruction in Step 2 involves interpolation functions constructed in embedding space but interpolating material data that are in data space. It can be viewed as a blending interpolation approach compared to that in Solve I. The effectiveness of Solver I and II will be compared and discussed in Section 5.2.1.
In both Solver I and II, the interpolation functions Ψ I (z α ) are evaluated on the embedding space related to the embedded physical state z α and its k nearest neighbors in the material embedding data {ẑ I } I∈N k (z α ) ⊂ E . In Solver II, however, these functions are weighted on the un-projected material data {ẑ I } I∈N k (z α ) ⊂ E corresponding to the k selected neighbors. Because the locally convex reconstruction in Eq. (24b) gives the optimal material data solution immediately in the data space, the decoder is avoided in Solver II. Fig. 4(b) shows a schematic of the proposed data-driven computing based on Solver II. Taking the v-th global-local iteration as an example, the physical state z (v) obtained from the global physical step (Eq. (7)) is mapped to the embedding space z (v) by the encoder. In Step 2 of Solver II, the same k nearest neighbors search is performed on the embedding space E , while their corresponding material data in the original dataset are used in the data reconstruction via Eqs. (20)- (22). As shown in Fig. 4(b), the locally convex reconstruction of the material embedding state z (v) can be directly performed with the material data {ẑ I } I∈N k (z α ) in the data space by using data indices, yielding the optimal material solutionẑ * (v) .
It is worth emphasizing that in comparison with the LCDD approach [28], the key difference in the proposed solver is that the neighbor search and data reconstruction are performed on the embedding space E , a lower-dimensional space constructed by the pre-trained encoder function. Thus, the proposed AEDD with Solver II can be considered as a enhanced generalization of LCDD for high-dimensional material data. We use this approach as the default AEDD method, unless stated otherwise.

Numerical results
In this section, the proposed AEDD approach is first tested on a cantilever beam using synthetic material data generated by constitutive laws. In this example, the effects of several factors on autoencdoers and the resulting AEDD datadriven solutions are investigated, including the size, sparsity, and the noise level of material datasets, neural network initialization during autoencoder training, and autoencoder architectures, aiming to validate the robustness and reliability. In the second subsection, AEDD is applied to modeling biological tissues using experimental data measured from heart valve tissues to demonstrate the enhanced generalization capability.
For simplicity, we consider homogeneous material in the following numerical examples, and thus the same material dataset, e.g., , with M data points, is used for all integration points.

Cantilever beam: Verification of the AEDD method
To verify the proposed AEDD framework (with Solver II in Section 4.2.2 by default), a cantilever beam subjected to a tip shear load is analyzed, as shown in Fig. 5. The Saint Venant-Kirchhoff phenomenological model with Young's modulus E = 4.8 × 10 3 N/mm 2 and Poisson's ratio ν = 0 is used to generate material datasets for training autoencoders. The problem domain is discretized with 41 × 5 randomly distributed nodes. The data-driven analysis is performed with 10 equal loading steps under a plane-strain condition. Following the same setting in [28], the weight matrixĈ used in the distance metric (Eqs. (4)-(5)) and the physical solver (Eq. (13)-(12)) is defined aŝ

Preparation of material datasets
To assess the robustness and convergence property of AEDD against noise presented in the given material datasets, four manufactured noisy material datasets approximating the Saint Venant-Kirchhoff phenomenological model with different data sizes, i.e. M = 10 3 , 20 3 , 30 3 , and 40 3 , are considered. The generation procedure of these noisy datasets is described below. First, an noiseless dataset,Ē = {z I } M I=1 is generated, where each Green-Lagrangian strain component is uniformly distributed within the range [−0.02, 0.02] and the 2nd-PK stress components are obtained by using the elastic tensor in Eq. (25) that relates strain to stress. The example with M = 20 3 is shown in Fig. 6(a), where the strain and stress components are displayed separately for visualization. Following [30,28], Gaussian perturbations scaled by a factor dependent on the size of datasets, 0.4z max / 3 √ M , are added to each component of the noiseless datasetĒ to obtain the associated noisy datasets E = {ẑ I } M I=1 , wherez max is a vector of the maximum values for each component among the noiseless dataset. The noisy dataset corresponding to M = 20 3 is shown in Fig. 6(b). Fig. 7 shows the other three noisy material datasets.

Effects of autoencoder architecture and initialization
In order to assess the effects of initialization during training and network architectures on autoencoders' accuracy and robustness, five random initializations and four architectures of autoencoders are considered. For the given noisy material datasets associated with this plane-strain cantilever beam problem, it is observed that autoencoders with an embedding layer of the dimension p = 1 or p = 2 could not capture a meaningful embedding representation. This is consistent to the observation in [28] where the number of neighbor points to construct the locally convex embedding is suggested to be larger than the number of intrinsic dimensionality, which is 2 of the employed linear elastic database.  Hence, it requires the embedding dimension to be greater than 2. As described in Section 3.3, the encoder architecture is used to represent the autoencoder architecture. Four encoder architectures, 6 − 4 − 3, 6 − 5 − 4, 6 − 5 − 4 − 3, and 6 − 10 − 8 − 5, are considered in the following tests.
The first row in Fig. 8 shows the error curves (mean with standard deviations shaded) of the final training and testing losses against the size of training dataset for different autoencoder architectures. Here, the noisy material datasets of different sizes, M = 10 3 , 20 3 , 30 3 , and 40 3 , that defined in Section 5.1.1 are used for training the autoencoders, where autoencoders are trained with five random initialization for each case. Besides, to fairly compare the testing errors between the autoencoders trained with various sizes of training data, we use the same test dataset consisting of 729 material data points that are generated from the same procedure in Section 5.1.1 but not included in the given material datasets.
As we can see, all the selected autoencoders converge well, yielding smaller training and testing errors as the size of material dataset increases. Moreover, it is observed that the autoencoder with a larger architecture could lead to greater variation due to training randomness, indicated by the standard deviations. This is because the training algorithms used to minimize the loss function in Eq. (18) do not guarantee global minimization, and a larger DNN with more trainable parameters may cause higher randomness. However, the trained results shown here are satisfactory due to the employment of regularization. It also shows that the training and testing losses decrease as the dimension of the embedding layer increases.

Data-driven modeling results
The data-driven solution is compared with the constitutive model-based reference solution using Eq. (25). To better assess the accuracy of AEDD solutions, a normalized root-mean-square deviation (NRMSD) is introduced  The trained autoencoders corresponding to different material datasets and architetures are then applied to data-driven simulations, where the number of nearest neighbors used in locally convex reconstruction of the data-driven solver is set as 6. NRMSD of data-driven solutions with respect to the model-based reference solution is given in the bottom row of Fig. 8. For all architectures, it can be observed that the AEDD solutions (both mean values and variation) improve as the number of training data increases, which suggests a good convergence property. Although using an embedding dimension of 5 (encoder: 6 − 10 − 8 − 5) yields the highest accuracy, the AEDD solutions obtained from using an embedding dimension of 3 and 4 are satisfactory. It also shows that the overall patterns of error convergence in NRMSDs are similar across different encoder architectures using the same size of training dataset, indicating that the AEDD solutions are not sensitive to the width and depth of the encoder architecture as long as autoencoders of a sufficiently large size are used. Considering that using a more complex encoder architecture with a larger embedding dimension would increase computational cost in data-driven computing, an encoder architecture 6 − 4 − 3 is used in the numerical examples. Fig. 9 shows that the normalized tip deflection-loading curve predicted by the proposed AEDD method agrees well with the model-based reference. The noisy data set of size M = 40 3 is used in this case. The results obtained by LCDD are also provided for comparison in Fig. 9(a), where a few loading steps yield divergent data-driven solutions when the noisy material data is employed. On the other hand, the AEDD method stays robust even with noisy data employed. It is also worth noting that when using Solver I (Eq. (23)) in AEDD, we also observe unconverged solutions (which are not reported in the Figures). We attribute this to the information loss caused by the decoder functions. On the other hand, Solver II with the convex interpolation functions defined in the embedding space and the material data points in data space yields stable solutions.
The comparison of AEDD and LCDD with respect to the iteration number and the computational cost are given in Fig. 10. In this case, the architecture of 6 − 4 − 3 is used. While the number of iterations for convergence varies in AEDD due to the non-uniqueness of autoencoder training, it generally requires less data-driven iterations than LCDD to achieve converged solutions, as shown in Fig. 10(a). This is because a more generalized embedding space is used in AEDD for computing the local material solution. We also observe that with less noisy material data, the required iteration number decreases regardless of the increase in data size, an attractive property for data-driven computing.
Moreover, because the data search and the convexity-preserving interpolation in AEDD local solver are performed in the low-dimensional embedding space instead of the high-dimensional data space, the computational cost of AEDD is substantially reduced compared to LCDD, as shown in Fig. 10(b).

Data-driven modeling with sparse noisy datasets
To evaluate the performance of the proposed AEDD approach when datasets are sparse, three noisy material datasets (Table 1) are generated in a similar manner as described in Section 5.1.1 but with fewer data points compared to Fig.  7. First, several loading paths are selected with uniformly distributed Green-Lagrangian strains for each of the loading paths. The corresponding 2nd PK stresses are generated using the elastic tensor given in Eq. (25). Consequently, the sparse noisy material datasets are given in Fig. 11, where Gaussian perturbations scaled by 0.4z max / 3 √ M are added independently pointwise to both the strain and the stress data.  An autoencoder (6-4-3) is trained using the sparse noisy datasets and used in AEDD modeling of the cantilever beam problem. The normalized tip deflectionloading responses predicted by the proposed AEDD method are compared with the constitutive model-based solutions, as shown in Fig. 12. The results demonstrate that the proposed AEDD method remains robust and accurate when dealing with noisy material datasets at different levels of sparsity and that the data-driven prediction accuracy improves as the data density increases.

Biological tissue data-driven modeling
The effectiveness of the proposed AEDD computational framework is examined by using the biological data from biaxial mechanical experiments of a porcine mitral valve posterior leaflet (MVPL) [62]. Fig. 13(a) shows the schematic of a MVPL specimen with a dimension 7.5mm × 7.5mm subjected to prescribed displacements, where the tissue's circumferential and radial directions are denoted as x and y axes, respectively, and the stretch ratios along these two directions are defined as λ Circ and λ Rad .
A total of eleven protocols (Table 2) includes nine biaxial tension protocols with various tension ratios and two pure shear protocols, as illustrated in Fig.  13(b). The normal components of the Green strain and the associated 2nd-PK stress tenors generated from the 11 biaxial mechanical testing are plotted in Fig. 13(d) and Fig. 13(e), respectively. It shows that the measured data points are sparse in the stress-strain phase space. It is noted that in the mechanical testing the direct measurements are the applied membrane tensions, T Rad and T Circ , and the displacements are estimated by digital image correlation techniques. Thus, the measured Green strain and 2nd-PK stress data are based on homogeneous deformation assumption in the test specimen. More details about the tissue strain and stress calculations as well as the experimental setting can be found in [62,1].
Five study cases are considered to evaluate the performance of the proposed AEDD framework, which is compared with that of the LCDD method [28,1]. In these tests (Case 1-5), the experimental data (see Fig. 13(d-e)) associated with the selected biaxial testing protocols, called training protocols, are used for constructing material dataset E, and different data-driven modeling approaches with the constructed material dataset are tested on other protocols (called testing protocols) to assess their performance against the experimental results. The training and testing protocols of the Five study cases are described as below: • Case 1: Training Protocols: 1, 3, 4, 7, and 8; Testing Protocols: 2 and 5, used to investigate AEDD's performance in interpolative and extrapolative predictions.
For the first three cases, the protocols used for training are symmetrically distributed, while the training protocols are asymmetrically distributed for the last case.
For AEDD, autoencoders are first trained offline using the training protocols and then employed in the local step of the data-driven solvers (Section 4.2) of AEDD during the online computation. In the following study, Solver II (Section 4.2.2) is employed and the number of nearest neighbors in locally convex reconstruction of the data-driven solver is set as 6. A diagonal matrix is used as the weight matrixĈ with each diagonal component being the ratio of the standard deviation of the associated component of the stress data to that of the strain data. This is similar to the normalization technique used in deep learning that applies the standard deviation of each input unit to inversely scale the input data [2].
The prediction of data-driven methods on testing protocols that are not included in the training dataset are compared with the corresponding experimental data. The NRMSD (Eq. (26)) normalized with respect to the maximum stress of the experimental data is employed to assess the prediction performance of the methods. In the data-driven modeling, considering the symmetric geometry of the tissue specimen and the symmetric loading conditions, the upper right quarter of the sample is modelled with symmetric boundary conditions, as shown in Fig. 13(c), and the prescribed displacements are applied to the top and the right boundaries.

Case 1
We first examine the data fitting capability whereby the data-driven methods are tested on the training protocols 1, 3, 4, 7 and 8, as shown in Fig. 14(a) and Fig. 14(d). It shows that both AEDD (NRMSD AEDD =0.008) and LCDD (NRMSD LCDD =0.022) provide satisfactory fitting results, but AEDD yields a slightly higher accuracy. Since the strains and stresses of testing protocols 2 and 5 lie inside and outside the domain covered by the data of the training protocols, respectively, as shown in Fig. 13(d-e), the AEDD predictions on the testing protocols 2 and 5 are interpolative and extrapolative predictions, respectively. For the interpolative prediction test on Protocol 2, the results of these two approaches also agree well with the experimental data, as shown in Fig. 14(b) and (e). The NRMSD errors indicate that LCDD achieves a higher accuracy, i.e. NRMSD LCDD = 0.009 < NRMSD AEDD = 0.021. However, its extrapolative prediction on Protocol 5 is worse than that from AEDD (NRMSD LCDD = 0.158 > NRMSD AEDD = 0.059), see Fig. 14(c) and (f). The results demonstrate better extrapolative generalization ability of AEDD. It could be attributed to the underlying low-dimensional global material manifold learned by the autoencoders. Specifically, AEDD performs local neighbor searching and locally convex reconstruction of optimal material state based on geometric distance information in the low-dimensional global embedding space, which contains the underlying manifold structure of the material data and contributes to a higher solution accuracy and better generalization performance. In contrast, LCDD performs local neighbor searching and locally convex reconstruction purely from the existing material data points without any generalization, leading to lower extrapolative generalization ability.
Another proposed AEDD method with Solver I (Section 4.2.1) using the same training protocols as material dataset are also investigated, as shown in Fig. 15. As expected, compared to the results obtained by using Solver II, see Fig. 14(b) and (c), the prediction capability by Solver I decreases on both testing protocols. Especially in the interpolative prediction test Protocol 2, the NRMSD error increases to 0.04 from 0.021. We attribute the larger errors with Solver I to the employment of decoders in constructing the optimal material state. Since we have demonstrated that the AEDD approach with Solver II provides better data-driven prediction results, we only consider this approach in the following study.

Case 2
In this case study, the objective is to verify how the incorporation of material data of different deformation modes affects the interpolative and extrapolative predictability in the proposed data-driven modeling. Two pure shear protocols are introduced in the training material dataset in addition to the biaxial tension protocols used in Case 1. The two pure shear protocols (10 and 11) in the training dataset exhibit different material behaviors from the remaining biaxial tension protocols (1, 3, 4, 7, and 8). The AEDD predictions on the testing protocols 2 and 5 are interpolative and extrapolative predictions, respectively. As can be seen from Fig 16(a) and (d), both LCDD and AEDD maintain good fitting performance for all the biaxial tension and pure shear training protocols. They also perform well for the testing Protocol 2 ( Fig. 16(b) and (e)) with almost the same accuracy in Case 1. This is a desirable property in data-driven methods. AEDD again yields higher accuracy than LCDD in the extrapolative test (Protocol 5), as evidenced by the smaller NRMSD value 0.071 in the AEDD prediction over 0.159 in the LCDD prediction. This further demonstrates the enhanced extrapolative generalization in the proposed autoencoder-based approach. Moreover, compared with Case 1, Fig. 16(c) shows that AEDD with the material data from the pure shear protocols improves the prediction for strain E < 0.35 but results in slightly more discrepancies in the high strain range.

Case 3
The extrapolative prediction performance of AEDD is further explored in this case study. Here, three biaxial tension protocols (Protocols 1, 2, and 6) with similar loading patterns, as illustrated by the experimental data in Fig.  13(d) and (e), and two pure shear protocols (Protocols 10 and 11) are used for the material training dataset. The AEDD and LCDD approaches are tested on two testing protocols (Protocols 3 and 4) subjected to larger loading ratio differences between tissue's circumferential and radial directions. Again, Fig. 17 shows that AEDD outperforms LCDD in both training and testing protocols. In the testing cases (Protocols 3 and 4), while the LCDD results show clear discrepancies from the experimental data, AEDD provides a better accuracy, as evidenced by reducing the NRMSD with more than 50% from the LCDD prediction. This example further verifies better extrapolative generalization capability of AEDD. As can be seen from Fig. 14 and 16, both LCDD and AEDD work well for interpolative testing cases when using training protocols with symmetrical loading conditions. In Case 4, we investigate how a material training dataset from asymmetrically distributed protocols (biaxial tension Protocols 2, 5, 7, and 8) affects the interpolative prediction performance. Although the simulation results on the training protocols from both AEDD and LCDD agree well with experimental data, as shown in Fig. 18(a) and (b), the accuracy of LCDD deteriorates substantially on the testing protocols compared to AEDD, as shown in Fig. 18(g). The results demonstrate that AEDD's performance is more robust when dealing with irregular training datasets, which could be attributed to the underlying material manifold learned by the autoencoders.

Case 5
The results in Cases 1-4 have demonstrated that AEDD yields improved interpolative and extrapolative prediction compared to the LCDD approach by introducing autoencoders in the material data-driven local solver. In this last case, we investigate the performance of the AEDD method on the testing dataset that are fully unrelated to the training dataset. Specifically, autoencoders were trained using the biaxial tension protocols 1-9 for AEDD predictions on the pure shear protocols 10 and 11. As displayed in Fig. 19, AEDD predictions on the pure shear protocols (10 and 11) show some deviations from the experimen- tal data. It is because the training protocols are all biaxial tension protocols that do not contain any information about the material behaviors in the pure shear protocols. These results demonstrate that the predictive capability of the machine learning techniques such as AEDD depends on the richness and quality of the given training data.

Conclusion
In this study, we introduced the deep manifold learning approach via autoencoders to learn the underlying material data structure and incorporated it  Figure 19: Data-driven prediction by AEDD on (a) Protocol 10 and (b) Protocol 11. Protocols 1-9 are used to train the autoencoder into the data-driven solver to enhance solution accuracy, generalization ability, efficiency, and robustness in data-driven computing. The proposed approach is thus named auto-embedding data-driven (AEDD) computing. In this approach, autoencoders are trained in an offline stage and thus consume little computational overhead in solution procedures. The trained autoencoders are then applied in the proposed data-driven solver during online computation. The trained encoders and decoders define the explicit transformation between lowand high-dimensional spaces of material data, enabling efficient embedding extension to new data points. A simple Shepard convex interpolation scheme is employed in the proposed data-driven solver to preserve convexity in the local data reconstruction, enhancing the robustness of the data-driven solver.
A parametric study is conducted in the beam problem to investigate the effects of noise in material datasets, the size and sparsity of datasets, neural networks initialization during training, and autoencoder architectures on the performance of autoencdoers and data-driven solutions. Autoencoders with four different architectures are trained with synthetic noisy material datasets generated from a phenomenological model and different random initialization. The parametric study shows the performance of the offline trained autoencoders improves as the amount of training data increases regardless of the examined autoencoder architectures and neural network initialization. AEDD predictions are accurate and robust when dealing with sparse noisy datasets with the solutions converging to the constitutive model-based reference solutions as the number of material data and data density increase. In addition, with the offline trained autoencoders and efficient Shepard convex reconstruction for online computation, AEDD shows enhanced computational efficiency compared to the LCDD approach [1]. The effectiveness of the proposed framework is further examined by modeling biological tissues using experimental data. The proposed AEDD framework shows a good performance in modeling complex materials. Through five study cases, the proposed approach show stronger generalization capability and robustness than the LCDD approach [1]. This is attributed to the fact that the local neighbor searching and locally convex reconstruction in the proposed data-driven solver is based on geometric distance information in the filtered global embedding space learned by autoencoders, which contains the underlying manifold structure of the material data. The results of the last case also demonstrates the effects of richness and quality of the training data on the predictive capability of the AEDD method.
Although using 6 nearest neighbors in the locally convex reconstruction of the data-driven solver and an empirical weight matrixĈ based on statistical information of data is sufficient for AEDD to produce accurate and robust solutions in the problems of this study, choosing the optimal number of nearest neighbors and the optimal weight matrix requires further investigations. The results of the proposed data-driven approach demonstrate the promising performance by integrating the autoencoder enhanced deep manifold learning into data-driven computing of systems with complex material behaviors.
function with a local support size "a", controlling the smoothness of the RK approximation function, for example, the cubic B-spline kernel function: In Eq. (A.2), b(x) is a parameter vector determined by imposing the n-th order reproducing conditions [47,48], |i + j + k| = 0, 1, ..., n. The RK approximation function is then obtained as,

Appendix B. Nodal integration scheme
The stabilized conforming nodal integration (SCNI) approach is employed for the domain integration of the weak form (Eq. (13)) to achieve computational efficiency and accuracy when using RK shape functions with nodal integration quadrature schemes.
The key idea behind SCNI is to satisfy the linear patch test (thus, ensure the linear consistency) by leveraging a condition, i.e. the divergence constraint on the test function space and numerical integration [49], expressed as: where 'ˆ' over the integral symbol denotes numerical integration. In SCNI, an effective way to achieve Eq. (B.1) is based on nodal integration with gradients smoothed over conforming representative nodal domains, as shown in Fig. B.20, converted to boundary integration using the divergence theorem where V L = Ω L dΩ is the volume of a conforming smoothing domain associated with the node x L , and∇ denotes the smoothed gradient operator. In this method, smoothed gradients are employed for both test and trial functions, as the approximation in Eq. (B.2) enjoys first order completeness and leads to a quadratic rate of convergence for solving linear solid problems by meshfree Galerkin methods. As shown in Fig. B.20, the continuum domain Ω is partitioned into N conforming cells by Voronoi diagram, and both the nodal displacement vectors and the state variables (e.g., stress, strain) are defined at the set of nodes {x L } N L=1 . Therefore, if we consider two-dimensional elasticity problem under the SCNI framework, the smoothed strain-displacement matrixB I (x L ) used in (16) is expressed as:B