Prediction of composite microstructure stress-strain curves using convolutional neural networks

• Used convolutional neural networks to predict the entire stress-strain curve of binary composites with excellent accuracy • Demonstrated the ability to compress stress-strain curves with negligible loss in information • Established novel custom loss function for neural networks predicting compressed stressstrain curves • Formulated interpretable evaluation metric of a machine learning model ’ s ability to predict stress-strain curves


Introduction
Understanding the relationship between structure and property for materials is a seminal problem in material science, with significant applications for designing next-generation materials. A primary motivating example is designing composite microstructures for load-bearing applications, as composites offer advantageously high specific strength and specific toughness. Recent advancements in additive manufacturing have facilitated the fabrication of complex composite structures, and as a result, a variety of complex designs have been fabricated and tested via 3D-printing methods [1][2][3][4][5][6][7][8][9][10]. While more advanced manufacturing techniques are opening up unprecedented opportunities for advanced materials and novel functionalities, identifying microstructures with desirable properties is a difficult optimization problem.
One method of identifying optimal composite designs is by constructing analytical theories. For conventional particulate/fiberreinforced composites, a variety of homogenization theories have been developed to predict the mechanical properties of composites as a function of volume fraction, aspect ratio, and orientation distribution of reinforcements [2,11,12]. Because many natural composites, synthesized via self-assembly processes, have relatively periodic and regular structures, their mechanical properties can be predicted if the load transfer mechanism of a representative unit cell and the role of the self-similar hierarchical structure are understood [13]. However, the applicability of analytical theories is limited in quantitatively predicting composite properties beyond the elastic limit in the presence of defects, because such theories rely on the concept of representative volume element (RVE), a statistical representation of material properties, whereas the strength and failure is determined by the weakest defect in the entire sample domain. Numerical modeling based on finite element methods (FEM) can complement analytical methods for predicting inelastic properties such as strength and toughness modulus (referred to as toughness, hereafter) which can only be obtained from full stress-strain curves.
However, numerical schemes capable of modeling the initiation and propagation of the curvilinear cracks, such as the crack phase field model, are computationally expensive and time-consuming because a very fine mesh is required to accommodate highly concentrated stress field near crack tip and the rapid variation of damage parameter near diffusive crack surface [14][15][16][17][18][19][20][21]. Meanwhile, analytical models require significant human effort and domain expertise and fail to generalize to similar domain problems. In order to identify high-performing composites in the midst of large design spaces within realistic time-frames, we need models that can rapidly describe the mechanical properties of complex systems and be generalized easily to analogous systems. Machine learning offers the benefit of extremely fast inference times and requires only training data to learn relationships between inputs and outputs e.g., composite microstructures and their mechanical properties. Machine learning has already been applied to speed up the optimization of several different physical systems, including graphene kirigami cuts [22], fine-tuning spin qubit parameters [23], and probe microscopy tuning [24]. Such models do not require significant human intervention or knowledge, learn relationships efficiently relative to Fig. 1. Workflow of modeling process. To preprocess stress-strain curves, we convert them to PCA-coordinate representations, fitting a PCA model to the training set and applying the learned model to compress both the training and test set. Our CNN model takes as input the composite design and makes predictions in the PCA-coordinate space, with the loss function weighted by the explained variance ratio of each component. Finally, the CNN makes predictions about composite designs it hasn't seen. These PCA predictions are transformed back into regular stress-strain curves. To analyze our results, we compare the predicted material descriptors compared to the actual ones, derived from the predicted and actual stress-strain curves. the input design space, and can be generalized to different systems [25][26][27][28][29][30][31][32][33][34].
In this paper, we utilize a combination of principal component analysis (PCA) and convolutional neural networks (CNN) to predict the entire stress-strain curve of composite failures beyond the elastic limit. Stress-strain curves are chosen as the model's target because they are difficult to predict given their high dimensionality. In addition, stress-strain curves are used to derive important material descriptors such as modulus, strength, and toughness. In this sense, predicting stress-strain curves is a more general description of composites properties than any combination of scaler material descriptors. A dataset of 100,000 different composite microstructures and their corresponding stress-strain curves are used to train and evaluate model performance. Due to the high dimensionality of the stress-strain dataset, several dimensionality reduction methods are used, including PCA, featuring a blend of domain understanding and traditional machine learning, to simplify the problem without loss of generality for the model.
We will first describe our modeling methodology and the parameters of our finite-element method (FEM) used to generate data. Visualizations of the learned PCA latent space are then presented, along with model performance results.

Modeling stress-strain curves with PCA
We represent all the stress-strain curves as arrays of stress values evaluated at 61 strain points (referred to as stress vector, hereafter), and reduce the dimensionality of the stress vector by employing PCA, a linear method that identifies an orthonormal basis along which to project the data in a lower dimension. PCA is only fit on training data to find the orthonormal basis, which is defined by principal component vectors. Using this basis, both the training data and test data are transformed by projecting the stress vector into the lower dimensional latent space and finding a new set of coordinates to define the stress vectors.
The 100,000 × 61 stress matrix is denoted as the X matrix, where each row represents a stress-strain curve and each column a stress value at a given strain value. Even though stress-strain curves are composed of a stress and a strain vector, only the stress vector is passed to PCA, because the strain vector across all composite blocks had the same values and lengths. Using the strain vector would have added no extra information to the model because it is constant for all inputs. This allows us to significantly reduce the dimensionality of the output, making it easier for the model to make predictions. The stress matrix is first mean-centered by column as shown in Eq. (1), so that each column has a mean of 0. PCA utilizes singular value decomposition (SVD), shown in Eq. (2), which is a real matrix factorization method. The Σ T Σ matrix contains the eigenvalues that determine the variance explained in each principal component while the V T matrix contains the principal components in order of decreasing variance explained. The first 15 principal components were chosen to be kept and the rest discarded. This is done by projecting the data along the first 15 principal components in V T to obtain a 15-dimensional representation of the stress-strain curve. The cumulative explained variance (C.E.V.) is the sum of the first 15 eigenvalues normalized by the sum of all eigenvalues, shown in Eq. (3). By using dimensionality reduction, our CNN model is able to learn more efficiently because it is learning in a lower dimensional space, so that we can obtain higher accuracy with less training data.

Creating a customized loss function for CNN
Each eigenvalue represents the explained variance of the associated principal component basis-vector. In order to communicate to our CNN that some principal components were more important than others, we constructed a customized loss function that weighted errors in each principal component by the associated eigenvalue i.e., principal components that have larger explained variance were weighted more heavily in the loss function that those with less explained variances. In Eq. (4), we show the standard mean squared error loss function compared to the PCA weighted loss function we devised, as shown in Eq. (5).
To give intuition regarding the formulation of this customized loss function, recall that the CNN now outputs a 15-dimensional vector as a prediction. However, each component in this 15-dimensional vector corresponds to a particular principal component coordinate, which has a corresponding eigenvalue denoting its importance. Our customized loss function in Eq. (5) uses the explained variance ratio of each component in the loss function to 'teach' the model that certain components matter more than others, whereas the naïve mean squared error loss function in Eq. (4) treats each component equally. A flow diagram of the data processing, model training, and error analysis is shown in Fig. 1.

CNN implementation and training
A convolutional neural network was trained to predict this lower dimensional representation of the stress vector. The input to the CNN was a binary matrix representing the composite design, with 0's corresponding to soft blocks and 1's corresponding to stiff blocks. PCA was implemented with the open-source Python package scikit-learn [35], using the default hyperparameters. CNN was implemented using Keras [36] with a TensorFlow backend. The batch size for all experiments was set to 16 and the number of epochs to 30; the Adam optimizer was used to update the CNN weights during backpropagation [37].
A train/test split ratio of 95:5 is usedwe justify using a smaller ratio than the standard 80:20 because of a relatively large dataset. With a ratio of 95:5 and a dataset with 100,000 instances, the test set size still has enough data points, roughly several thousands, for its results to generalize. Each column of the target PCA-representation was normalized to have a mean of 0 and a standard deviation of 1 to prevent instable training.

Finite element method data generation
FEM was used to generate training data for the CNN model. Although initially obtained training data is compute-intensive, it takes much less time to train the CNN model and even less time to make highthroughput inferences over thousands of new, randomly generated composites. The crack phase field solver was based on the hybrid formulation for the quasi-static fracture of elastic solids and implemented in the commercial FEM software ABAQUS with a user-element subroutine (UEL).
The crack phase field model is a variational approach where the crack discontinuity is approximated as a continuous scalar field called crack phase field d(x). The two different models for crack discontinuities are shown in Fig. 2 [18][19][20]. The phase parameter has a diffusive nature with a regularization parameter, l c , and the diffusive crack converges to the original sharp crack in the limit as l c → 0. Crack phase field modeling allows one to simulate complex crack evolution such as curvilinear crack path, crack branching or coalescence. Also, unlike XFEM (extended-FEM) which requires predefined crack initiation site or path, entire fracture process involving crack nucleation and propagation can be While there are several methods to formulate the crack phase field model, we employed the hybrid formulation [14] which has been reported to be adequate for modeling the fracture of composite materials [15]. In the hybrid formulation, the key idea is to couple the crack phase field d(x) with the displacement field u(x) via the degradation of strain energy density ψ as shown in Eq. (6), where ϵ, ψ 0 þ , and ψ 0 − refer to strain tensor, tensile part of strain energy density and compressive part of strain energy density obtained without considering d(x), respectively. Eq. (7) shows how we can link the crack phase field d(x) with the stress field σ(x) via the degradation of elastic stiffness, where σ refers to stress tensor accounting for d(x) while σ 0 and C 0 refers to stress tensor and stiffness tensor without considering d(x). k is a very small dummy constant which is inserted to ensure numerical The crack phase field is updated via the critical energy release rate criterion: where irreversibility condition is ensured by history variable H þ ¼ max τ∈½0;t ψ þ 0 ðϵðx; τÞÞ (i.e. crack healing is prohibited). The critical advantage of hybrid formulation over the widely used anisotropic formulation [18] lies in the appropriate description of the stiffness degradation. The stiffness degradation in the anisotropic formulation is given by the following: This stiffness degradation scheme, which exclusively considers the tensile part of the strain energy density, leads to spurious load bearing capacity upon combined tensile and shear loading near the crack tip. As shown in Fig. 3, the anisotropic formulation predicts spurious crack growth and unphysical load-displacement response after complete fracture, while the hybrid formulation offers reasonable fracture pattern and load-displacement curve. Because full stress-strain responses of composites are considered, the hybrid formulation is chosen in the present study.
FEM simulations are performed with 2D plane stress conditions with CPS4 element of a large plate divided into 121 blocks, with 70 stiff and 51 soft blocks randomly assigned to 121 sites in the composite unit cell, resulting in N10 34 distinct configurations. Stress-strain curves of 100,000 composite configurations are obtained until the complete fracture with strain increment of 0.0000227 while the interface between stiff and soft blocks is assumed to be perfectly bonded. Each block was divided into 144 elements (12 by 12) and the regularization parameter, i.e. characteristic diffusive length of the crack, is set to be twice the length of each element. All configurations fail completely at the maximum strain of 0.00136. Hence, each stress-strain curve can be represented as an array of 61 stress values. See Fig. 4 for a description of the FEM model inputs and outputs.

Visualizing PCA
In order to better understand the role PCA plays in effectively capturing the information contained in stress-strain curves, the principal component representation of stress-strain curves is plotted in 3 dimensions. Specifically, we take the first three principal components, which have a cumulative explained variance~85%, and plot stress-strain curves in that basis in Fig. 5a and provide several different angles from which to view the 3D plot. Each point represents a stress-strain curve in the PCA latent space and is colored based on the associated modulus value. Based on Fig. 5a, it seems that the PCA is able to spread out the curves in the latent space based on modulus values, which suggests that this is a useful latent space for CNN to make predictions in.
The cumulative explained variance ratio with respect to the number of dimensions is shown in Fig. 5b. As shown in the figure we are able to capture N99.5% of the variance in the stress-strain curves, a factor of 4 reduction in dimensionality with negligible loss in quality of information given to the model by using only 15 principal components. While it is difficult to visualize stress-strain curves as 61-dimensional vectors, we can intuitively understand why PCA is able to find a linear lower dimensional representation based on our understanding of stress-strain curves. Because these curves have a linear portion i.e. elastic deformation, followed by fracture and crack propagation, they have generally similar.

CNN model design and performance
Our CNN was a fully convolutional neural network i.e. the only dense layer was the output layer. All convolution layers used 16 filters with a stride of 1, with a LeakyReLU activation followed by BatchNormalization. The first 3 Conv blocks did not have 2D MaxPooling, followed by 9 conv blocks which did have a 2D MaxPooling layer, placed after the BatchNormalization layer. A GlobalAveragePooling was used to reduce the dimensionality of the output tensor from the sequential convolution blocks and the final output layer was a Dense layer with 15 nodes, where each node corresponded to a principal component. In total, our model had 26,319 trainable weights.
Our architecture was motivated by the recent development and convergence onto fully-convolutional architectures for traditional computer vision applications, where convolutions are empirically observed to be more efficient and stable for learning as opposed to dense layers. In addition, in our previous work, we had shown that CNN's were a capable architecture for learning to predict mechanical properties of 2D composites [30]. The convolution operation is an intuitively good fit for predicting crack propagation because it is a local operation, allowing it to implicitly featurize and learn the local spatial effects of crack propagation.
After applying PCA transformation to reduce the dimensionality of the target variable, CNN is used to predict the PCA representation of the stress-strain curve of a given binary composite design. After training the CNN on a training set, its ability to generalize to composite designs it has not seen is evaluated by comparing its predictions on an unseen test set. However, a natural question that emerges is how to evaluate a model's performance at predicting stress-strain curves in a real-world engineering context. While simple scaler metrics such as mean squared error (MSE) and mean absolute error (MAE) generalize easily to vector targets, it is not clear how to interpret these aggregate summaries of performance. It is difficult to use such metrics to ask questions such as "Is this model good enough to use in the real world" and "On average, how poorly will a given prediction be incorrect relative to some given specification". Although being able to predict stress-strain curves is an important application of FEM and a highly desirable property for any machine learning model to learn, it does not easily lend itself to interpretation. Specifically, there is no simple quantitative way to define whether two stress-strain curves are "close" or "similar" with realworld units.
Given that stress-strain curves are oftentimes intermediary representations of a composite property that are used to derive more meaningful descriptors such as modulus, strength, and toughness, we decided to evaluate the model in an analogous fashion. The CNN prediction in the PCA latent space representation is transformed back to a stressstrain curve using PCA, and used to derive the predicted modulus, strength, and toughness of the composite. The predicted material descriptors are then compared with the actual material descriptors. In this way, MSE and MAE now have clearly interpretable units and meanings. The average performance of the model with respect to the error between the actual and predicted material descriptor values derived from stress-strain curves are presented in Table 1. The MAE for material descriptors provides an easily interpretable metric of model performance and can easily be used in any design specification to provide confidence estimates of a model prediction. When comparing the mean absolute error (MAE) to the range of values taken on by the distribution of material descriptors, we can see that the MAE is relatively small compared to the range. The MAE compared to the range is b10% for all material descriptors. Relatively tight confidence intervals on the error indicate that this model architecture is stable, the model performance is not heavily dependent on initialization, and that our results are robust to different train-test splits of the data.
Ranking plots are shown in Fig. 5c and demonstrate that the CNN can accurately learn the underlying distribution of values. A sample of predicted versus actual stress-strain curves are provided in Fig. 6. Despite complex non-smooth behaviors exhibited in stress-strain curves, the CNN is able to approximate the non-linear portions of the stress-strain curves well despite only using linear dimensionality reduction, which speaks to the learning ability of CNN to learn complex behaviors just from the microstructure. In addition, our CNN's ability to predict modulus, strength, and toughness values derived from stress-strain curves is significantly improved over our previous work [30], which directly predicted the scaler material properties from the composite design.

Future work
Future work includes combining empirical models with optimization algorithms, such as gradient-based methods, to identify composite designs that yield complementary mechanical properties. The ability of a trained empirical model to make high-throughput predictions over designs it has never seen before allows for large parameter space optimization that would be computationally infeasible for FEM. In addition, we plan to explore different visualizations of empirical models in an effort to "open up the black-box" of such models. Applying machine learning to finite-element methods is a rapidly growing field with the potential to discover novel next-generation materials tailored for a variety of applications. We also note that the proposed method can be readily applied to predict other physical properties represented in a similar vectorized format, such as electron/phonon density of states, and sound/light absorption spectrum.

Conclusion
In conclusion, we applied PCA and CNN to rapidly and accurately predict the stress-strain curves of composites beyond the elastic limit. In doing so, several novel methodological approaches were developed, including using the derived material descriptors from the stress-strain curves as interpretable metrics for model performance and dimensionality reduction techniques to stress-strain curves. This method has the potential to enable composite design with respect to mechanical response beyond the elastic limit, which was previously computationally infeasible, and can generalize easily to related problems outside of microstructural design for enhancing mechanical properties.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.