Augmentation of scarce data -- a new approach for deep-learning modeling of composites

High-fidelity full-field micro-mechanical modeling of non-linear path-dependent materials demands a substantial computational effort. Recent trends in the field incorporate data-driven Artificial Neural Networks (ANNs) as surrogate models. However, ANNs are inherently data-hungry, which is a bottleneck for the development of high-fidelity data-driven models. This study introduces a novel approach for data augmentation, expanding an original dataset without additional computational simulations. A Recurrent Neural Network (RNN) was trained and validated on high-fidelity micro-mechanical simulations of elasto-plastic short fiber reinforced composites. The results showed a considerable improvement in the predictions of networks trained on datasets expanded with the proposed data augmentation approach. The proposed method for augmentation of scarce data may be used not only for other kinds of composites, but also for other materials and at different length scales, hence opening avenues for innovative data-driven models in materials science and computational mechanics.


Introduction
Classical high-fidelity numerical simulations of the elasto-plastic response of Short Fiber Reinforced Composites (SFRCs) require Finite Element (FE)- or Fast Fourier Transform [...] it was independent of the number of strain increments [21]. Moreover, there have been developments in mechanistically informed neural networks to ensure coherence with the laws of physics [18,22]. Liu et al. [22] developed a physics-based neural network model in which the surrogate model is composed of two ANNs: one which calculates the yield and the elastic response, and a second ANN for the non-linear plasticity response. Recent advances in ANN architectures have solved certain challenges; however, RNNs rely on a large training dataset, which is considered a bottleneck for training accurate RNN-based constitutive models [18].
Since a large amount of data remains crucial for future developments in the field, transfer learning approaches have been proposed to address this issue. This approach can be employed to adapt a network trained on one material for use with new materials, using smaller additional datasets [26]. Recently, Ghane et al. [27] used transfer learning to address initialization challenges and sparse data issues when using RNNs for the cyclic behavior of elasto-plastic woven composites. This approach can also be used to develop accurate data-driven models by initially training an ANN on a large amount of low-fidelity data and then applying transfer learning with a small high-fidelity dataset [23,28].
However, this approach requires multiple datasets for initial pre-training and subsequent fine-tuning of the networks. Therefore, it may not be applicable when the sources of data are limited.
In this study, we propose a data augmentation approach to expand a small original high-fidelity dataset without the need for additional simulations. This approach is particularly useful when multiple sources of data are not available and expanding the available data is expensive in terms of time and/or cost. The proposed approach is to consider the input and output of the simulations (and/or experimental results) in multiple configurations, i.e., to rotate the available data from the original coordinate system to multiple coordinate systems. The proposed approach is applied to micro-mechanical FE/FFT simulated data of the non-linear elasto-plastic response of SFRCs. Using the implemented data augmentation approach, we have successfully developed an RNN as a surrogate model for the non-linear elasto-plastic response of SFRCs, using only a limited number of original full-field micro-mechanical simulations. This is a novel approach to drastically increase the size of a dataset, which could be significant for future developments of ANN surrogate models not only for composites but also for other kinds of materials.
The rest of this paper is structured as follows. Section 2 outlines the development of the high-fidelity original data. This includes RVE generation of SFRCs with a variety of fiber orientation distributions, randomly evolving strain paths, and the homogenization of the micro-structure response using FE- or FFT-analysis. Section 3 describes how the original dataset was expanded using the proposed data augmentation approach. Section 4 provides an in-depth explanation of the RNN set-up and its architecture. Section 5 presents the results of training an RNN on various amounts of rotated data and discusses the significance and limitations of the proposed approach. Section 6 closes the paper with the conclusions of this study.

Original Data
This study relies on the original data generated by Cheung and Mirkhalaf [23], in which RVEs of SFRCs with specific material properties of matrix and fibers were generated for a range of different fiber orientations and fiber volume fractions. In addition to the material properties, 6D strain paths were randomly generated to cover the non-linear path-dependent response. The data was generated using Digimat-FE, which solves the boundary value problem for each RVE geometry. The following subsections give a detailed explanation of each step necessary to produce the original data.

Random orientation tensor generation
The orientation of short rigid fibers within SFRCs is influenced by the characteristics of the surrounding viscous fluid during the manufacturing process. This orientation has a significant impact on the mechanical properties of the composite. SFRCs are stronger and stiffer in the direction in which the short fibers are predominantly oriented, and more compliant in the direction of least fiber orientation [29]. Advani and Tucker [29] originally described the probability distribution function of fiber orientation, denoted as ψ. This distribution is related to a set of even-ordered tensors known as orientation tensors. These tensors help quantify the orientation of fibers within a composite material. Fiber orientation is described using two angles, θ and ϕ, as shown in Figure 1. The fiber orientation p is given by

$$\mathbf{p} = \begin{bmatrix} \sin\theta \cos\phi \\ \sin\theta \sin\phi \\ \cos\theta \end{bmatrix}.$$

Figure 1: Orientation of a short fiber (p) in a coordinate system, described by two angles θ and ϕ [29].
The probability (P) of a fiber direction (p) existing between θ 1 and (θ 1 + dθ) and between ϕ 1 and (ϕ 1 + dϕ) is given by

$$P(\theta_1 \le \theta \le \theta_1 + d\theta,\ \phi_1 \le \phi \le \phi_1 + d\phi) = \psi(\theta_1, \phi_1) \sin\theta_1 \, d\theta \, d\phi.$$

The orientation distribution function ψ(p) is periodic, repeating with π:

$$\psi(\theta, \phi) = \psi(\pi - \theta, \phi + \pi), \quad \text{i.e.,} \quad \psi(\mathbf{p}) = \psi(-\mathbf{p}).$$

The integral over the unit sphere is equal to one:

$$\int_{0}^{2\pi} \int_{0}^{\pi} \psi(\theta, \phi) \sin\theta \, d\theta \, d\phi = \oint \psi(\mathbf{p}) \, d\mathbf{p} = 1.$$

The components of the second-order orientation tensor (a) are given by:

$$a_{ij} = \oint p_i \, p_j \, \psi(\mathbf{p}) \, d\mathbf{p}.$$

In this study, we used the method developed by Friemann et al. [19] to generate random reference orientation tensors. For a 3D orientation tensor, eigenvalues (the components of a diagonal orientation tensor) were sampled as a set of three positive numbers summing up to one. The eigenvalues of the randomly generated orientation tensors are shown in Figure 2. Subsequently, the diagonal orientation tensors were rotated to obtain the final orientation tensor, by applying a random rotation tensor generated with Arvo's algorithm [30]. An example of a randomly generated orientation tensor is shown in Figure 3. In the case of planar reference orientation tensors, one diagonal component was set to zero. Two numbers were sampled for the x and y directions, with their sum equal to one. The tensor was then randomly rotated around the z-axis, followed by a 90-degree rotation about the x-, y-, or z-axis. For the uni-directional orientation tensor, the first, second, or third component of the diagonal was randomly selected and set to 1, with all other components being 0.
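The definition of the second-order orientation tensor can be illustrated numerically by replacing the integral over ψ with an average over a finite set of fiber directions. The following is a minimal sketch, not the procedure used to build the dataset: the function names and the isotropic sampling of fibers are assumptions chosen only for illustration.

```python
import numpy as np

def fiber_direction(theta, phi):
    """Unit vector p for polar angle theta and azimuthal angle phi (radians)."""
    return np.stack([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)], axis=-1)

def orientation_tensor(p):
    """Discrete estimate a_ij = mean(p_i * p_j) over fibers; p has shape (n, 3)."""
    return np.einsum("ni,nj->ij", p, p) / p.shape[0]

rng = np.random.default_rng(0)
n_fibers = 10_000

# Assumption: fibers distributed uniformly on the unit sphere (isotropic case).
theta = np.arccos(rng.uniform(-1.0, 1.0, n_fibers))
phi = rng.uniform(0.0, 2.0 * np.pi, n_fibers)
a_iso = orientation_tensor(fiber_direction(theta, phi))

# Uni-directional case: all fibers aligned with the z-axis (theta = 0).
a_ud = orientation_tensor(fiber_direction(np.zeros(5), np.zeros(5)))
```

Because each p is a unit vector, the trace of the resulting tensor equals one, which mirrors the requirement in the text that the sampled eigenvalues sum to one. The isotropic sample approaches a = I/3, while the aligned sample gives a diagonal tensor with a single component equal to 1.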

Random strain path generation
Random strain paths were generated following the procedure developed by Friemann et al. [19], resulting in a random 6D strain walk. Firstly, the number of drift directions (nDrift) was randomly sampled from (1, 2, 5, 10). Then, each component of the 6-dimensional vector representing the drift direction was sampled individually from a normal distribution with a mean of 0 and a standard deviation of 1. Subsequently, the drift direction was normalized to have a magnitude of 1. This normalized drift direction was then repeated a set number of times (determined by the number of time steps, set as 100, divided by nDrift). The previous steps were iterated nDrift times, each with a new drift direction, until 100 time steps were reached. For each time step, a perturbation was introduced by sampling a random noise vector from a normal distribution with a mean of 0 and a standard deviation of 1; these noise vectors were then scaled by a gamma value ranging from 0 to 1. Finally, a cumulative sum of the drift and noise was computed to obtain the path. The path was then scaled so that the maximum strain equaled the specified maximum strain, which ranged from 0.01 to 0.05.
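The steps above can be sketched in a few lines of NumPy. This is an illustrative reading of the procedure, not the original implementation: the function name is hypothetical, and three details that the text leaves open are assumed here, namely that gamma and the maximum strain are drawn uniformly once per path, and that "maximum strain" refers to the largest absolute strain component over the whole path.

```python
import numpy as np

def random_strain_path(rng, n_steps=100, drift_choices=(1, 2, 5, 10)):
    """Return a random 6D strain walk of shape (n_steps, 6)."""
    n_drift = int(rng.choice(drift_choices))
    if n_steps % n_drift != 0:
        raise ValueError("n_steps must be divisible by the number of drift directions")
    steps_per_drift = n_steps // n_drift

    # Piecewise-constant drift: one normalized random direction per segment.
    drift = np.empty((n_steps, 6))
    for k in range(n_drift):
        direction = rng.standard_normal(6)
        direction /= np.linalg.norm(direction)
        drift[k * steps_per_drift:(k + 1) * steps_per_drift] = direction

    # Per-step Gaussian noise scaled by gamma (assumed: one gamma per path).
    gamma = rng.uniform(0.0, 1.0)
    noise = gamma * rng.standard_normal((n_steps, 6))

    # Cumulative sum of the increments gives the strain walk.
    path = np.cumsum(drift + noise, axis=0)

    # Rescale so the largest absolute component equals the sampled maximum strain.
    max_strain = rng.uniform(0.01, 0.05)
    path *= max_strain / np.max(np.abs(path))
    return path

path = random_strain_path(np.random.default_rng(42))
```

With a small gamma the walk is dominated by the drift and is nearly piecewise linear, whereas a gamma close to 1 produces a strongly perturbed path.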

RVE size determination and FE/FFT-simulations
The RVE size was determined by Cheung and Mirkhalaf [23] so as to include sufficient microstructural detail to accurately capture the non-linear elasto-plastic response. The approach follows the criteria proposed by Mirkhalaf et al. [31] for determining RVE size, i.e., the coefficient of variation in deformation behavior should be less than a desired value, and the average responses should fall within a desirable error range. RVE sizes were selected based on a statistical analysis, which balances computational time against data accuracy.
The approach also limits the maximum RVE size for 3D fiber orientation distributions to ensure computational efficiency. Once the RVE size was determined, FE/FFT-analysis was employed using Digimat-FE.
It should be mentioned that a single fiber length was considered for the RVE generations.
Recently, Mentges et al. [32] showed that using a representative fiber length yields accurate predictions (comparable to simulations considering a fiber length distribution).

Specific loading test simulations
In addition to the random 6-dimensional loading data, specific loading tests were also simulated. These were performed to later evaluate the effectiveness of the trained RNN on standard loading conditions. These were cyclic loading tests with the strain components going from 0 to 0.035, then to -0.035, and returning to 0. The loading cases included uniaxial normal stress (σ 11 ), uniaxial shear stress (σ 12 ), biaxial stress in two normal directions (σ 11 + σ 22 ), biaxial stress in normal and shear (σ 11 + σ 23 ), and plane strain (ε 11 + ε 22 ).
Each loading test was applied to five different RVEs with random orientation tensors. Table 1 provides the properties of each RVE.
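The cyclic strain history described above (0 to 0.035, then to -0.035, and back to 0) can be written as a piecewise-linear signal. The sketch below covers only the controlled strain component. The function name, the constant strain rate, and the default of 101 sampling points are assumptions for illustration, since the text does not state how the cycle was discretized.

```python
import numpy as np

def cyclic_strain_component(amplitude=0.035, n_steps=101):
    """Piecewise-linear cycle 0 -> +amplitude -> -amplitude -> 0 at constant rate."""
    # A constant rate means the middle leg (length 2*amplitude) takes half the time.
    knot_times = np.array([0.0, 0.25, 0.75, 1.0])
    knot_strains = np.array([0.0, amplitude, -amplitude, 0.0])
    t = np.linspace(0.0, 1.0, n_steps)
    return np.interp(t, knot_times, knot_strains)

eps_11 = cyclic_strain_component()
```

For a uniaxial stress test, this signal would prescribe ε 11 while the remaining stress components are held at zero by the solver; for plane strain, the same signal would be applied to ε 11 and ε 22.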

Data Augmentation Approach
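Instead of increasing the dataset by conducting more expensive simulations, this study proposes to augment data using the original dataset. This approach rotates the 6D input and output data to multiple configurations using randomized rotation tensors. This is schematically shown in Figure 4. Using this approach, it is possible to investigate whether data augmentation of multi-scale micro-mechanical simulations is a feasible strategy to increase the dataset size and enhance the RNN prediction capabilities while capturing the non-linear elasto-plastic response. Therefore, the training dataset, including the orientation tensor, strain path and stress evolution, was augmented using fast random rotations, by implementing Arvo's algorithm [30]. Each second-order tensor, i.e., the orientation, the strain, and the stress, can be represented by a 3 × 3 matrix:

$$\mathbf{T} = \begin{bmatrix} T_{11} & T_{12} & T_{13} \\ T_{21} & T_{22} & T_{23} \\ T_{31} & T_{32} & T_{33} \end{bmatrix}.$$

In Arvo's algorithm, three random numbers x_1, x_2, and x_3 are drawn uniformly from [0, 1]. The rotation matrix (R), a rotation about the z-axis, is defined as follows:

$$\mathbf{R} = \begin{bmatrix} \cos(2\pi x_1) & \sin(2\pi x_1) & 0 \\ -\sin(2\pi x_1) & \cos(2\pi x_1) & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

A unit vector (v), normal to the plane of reflection, is defined as

$$\mathbf{v} = \begin{bmatrix} \cos(2\pi x_2)\sqrt{x_3} \\ \sin(2\pi x_2)\sqrt{x_3} \\ \sqrt{1 - x_3} \end{bmatrix},$$

and the reflection is given by the negatively scaled Householder matrix (−H):

$$-\mathbf{H} = 2\mathbf{v}\mathbf{v}^{T} - \mathbf{I}.$$

A random rotation matrix (M) is defined by

$$\mathbf{M} = -\mathbf{H}\mathbf{R}.$$

The random rotation matrix (M) was applied to the strain, orientation, and stress tensors:

$$\boldsymbol{\varepsilon}' = \mathbf{M}\boldsymbol{\varepsilon}\mathbf{M}^{T}, \qquad \mathbf{a}' = \mathbf{M}\mathbf{a}\mathbf{M}^{T}, \qquad \boldsymbol{\sigma}' = \mathbf{M}\boldsymbol{\sigma}\mathbf{M}^{T}.$$

Effectively, the coordinate frame of the training data has been randomly rotated in 3D space, as illustrated in Figure 5.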
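The augmentation step can be sketched as follows: draw a random rotation with Arvo's construction, then apply the same rotation to the orientation tensor and to every time step of the strain and stress histories. This is a minimal sketch under stated assumptions rather than the original Matlab implementation. The function names are hypothetical, tensors are stored as full 3 × 3 arrays, and the transformation convention M T Mᵀ is assumed (using Mᵀ T M instead would yield an equally valid random rotation).

```python
import numpy as np

def arvo_rotation(rng):
    """Uniformly distributed random rotation matrix via Arvo's algorithm."""
    x1, x2, x3 = rng.uniform(0.0, 1.0, 3)
    c, s = np.cos(2.0 * np.pi * x1), np.sin(2.0 * np.pi * x1)
    R = np.array([[c, s, 0.0],
                  [-s, c, 0.0],
                  [0.0, 0.0, 1.0]])            # rotation about the z-axis
    v = np.array([np.cos(2.0 * np.pi * x2) * np.sqrt(x3),
                  np.sin(2.0 * np.pi * x2) * np.sqrt(x3),
                  np.sqrt(1.0 - x3)])          # unit vector
    neg_H = 2.0 * np.outer(v, v) - np.eye(3)   # negated Householder reflection
    return neg_H @ R

def rotate_tensor(M, T):
    """Rotate second-order tensor(s) T of shape (..., 3, 3): T' = M T M^T."""
    return M @ T @ M.T

def augment_sample(rng, a, strain, stress, n_rotations):
    """Return the original sample plus n_rotations randomly rotated copies."""
    samples = [(a, strain, stress)]
    for _ in range(n_rotations):
        M = arvo_rotation(rng)  # one rotation shared by input and output tensors
        samples.append((rotate_tensor(M, a),
                        rotate_tensor(M, strain),
                        rotate_tensor(M, stress)))
    return samples
```

The important design point is that a single matrix M is shared by the orientation, strain, and stress tensors of one sample, so each rotated copy describes the same physical response observed from a different coordinate frame. Invariants such as the trace and the von Mises stress are unchanged by the rotation.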

RNN Model Development
An RNN model architecture was chosen for training and testing, to evaluate the data augmentation approach. The initial RNN model implemented in this study was first developed by Friemann et al. [19] and further developed by Cheung and Mirkhalaf [23].

Neural network model architecture
The RNN has 13 inputs, comprising 6 unique orientation tensor components, a sequence of 6 strain tensor components, and a fiber volume fraction. The output of the RNN is a sequence of 6 unique stress tensor components. This ensures that complex 6-dimensional stress-strain evolutions can be generated.
The RNN architecture was composed of three Gated Recurrent Unit (GRU) layers [34], each with 500 hidden states. At each time step, the GRU updates its hidden state, which is passed on to the next time step. In this study, 100 time steps were utilized; however, the RNN architecture is not limited to any number of time steps. Following the GRU layers, there is a dropout layer [35] with a 50% dropout rate. After that, there is the final layer including 6 neurons (for the 6 output stress components). Figure 6 illustrates the RNN architecture.
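To make the data flow through the architecture concrete, the sketch below assembles a 13-feature input sequence and runs an untrained forward pass through three GRU layers and a 6-neuron output layer in plain NumPy. It is not the Matlab model used in the study. The feature ordering, the repetition of the constant orientation components and volume fraction at every time step, the weight initialization, the example input values, and the particular GRU gate formulation are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru(rng, n_in, n_hidden):
    """Random weights for one GRU layer; gates stacked as (reset, update, candidate)."""
    scale = 1.0 / np.sqrt(n_hidden)
    return {"W": rng.uniform(-scale, scale, (3, n_hidden, n_in)),
            "U": rng.uniform(-scale, scale, (3, n_hidden, n_hidden)),
            "b": np.zeros((3, n_hidden))}

def gru_layer(params, x_seq):
    """Run one GRU layer over a sequence of shape (n_steps, n_in)."""
    W, U, b = params["W"], params["U"], params["b"]
    h = np.zeros(U.shape[1])
    outputs = []
    for x in x_seq:
        r = sigmoid(W[0] @ x + U[0] @ h + b[0])             # reset gate
        z = sigmoid(W[1] @ x + U[1] @ h + b[1])             # update gate
        h_cand = np.tanh(W[2] @ x + U[2] @ (r * h) + b[2])  # candidate state
        h = (1.0 - z) * h_cand + z * h                      # state for next step
        outputs.append(h)
    return np.stack(outputs)

def forward(layers, W_out, b_out, x_seq):
    """Three stacked GRU layers followed by a linear output layer."""
    h_seq = x_seq
    for params in layers:
        h_seq = gru_layer(params, h_seq)
    # Dropout is active only during training, so it is omitted in this inference pass.
    return h_seq @ W_out.T + b_out

rng = np.random.default_rng(0)
n_steps, n_hidden = 100, 500

a_components = np.array([1/3, 1/3, 1/3, 0.0, 0.0, 0.0])  # hypothetical orientation tensor
volume_fraction = 0.1                                     # hypothetical value
strain_seq = 0.01 * rng.standard_normal((n_steps, 6))     # placeholder strain path

# Assumed layout: [6 orientation | 6 strain | 1 volume fraction] at every time step.
x_seq = np.hstack([np.tile(a_components, (n_steps, 1)),
                   strain_seq,
                   np.full((n_steps, 1), volume_fraction)])

sizes = [13, n_hidden, n_hidden, n_hidden]
layers = [init_gru(rng, sizes[i], sizes[i + 1]) for i in range(3)]
W_out = rng.uniform(-0.05, 0.05, (6, n_hidden))
b_out = np.zeros(6)

stress_seq = forward(layers, W_out, b_out, x_seq)  # shape (100, 6)
```

Because the recurrence is applied step by step, the same weights can process a sequence of any length, which is why the architecture is not tied to 100 time steps.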

Training of the neural network
The original data was divided into training, validation, and test datasets including 80%, 15% and 5% of the data, respectively. The training dataset was limited due to the computational effort required to produce full-field FE/FFT-simulations. The RNN model was initially trained on the original dataset of 547 data samples, each including 100 time steps.
Each sample was randomly rotated between 1 and 20 times, effectively increasing the dataset size in proportion to the number of rotations added.
To effectively train the neural network, the default Matlab loss function for time-series regression was utilized in this study. This function incorporates the sequence length (S), the number of outputs (R), the target (t), and the network prediction (O):

$$\text{Loss} = \frac{1}{2S} \sum_{i=1}^{S} \sum_{j=1}^{R} \left( t_{ij} - O_{ij} \right)^2.$$

The loss function was minimized using the ADAM optimizer. Default values were used for parameters such as the gradient decay factor (β 1 = 0.9), the squared gradient decay factor (β 2 = 0.999), and the offset (ε = 1e − 8), which lie in a suitable range for neural networks [36]. L2-regularization and gradient clipping were also incorporated to prevent overfitting and exploding gradients, respectively [37,38].
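For readers working outside Matlab, the loss described above can be written directly in NumPy. The sketch assumes the half-mean-squared-error form that Matlab documents for sequence-to-sequence regression, normalized by the sequence length S but not by the number of outputs R. The function name is hypothetical.

```python
import numpy as np

def half_mse_sequence_loss(target, prediction):
    """Half mean-squared error over a sequence.

    target, prediction: arrays of shape (S, R), where S is the sequence length
    and R is the number of outputs. The sum runs over both axes and is divided
    by 2*S (not by R).
    """
    target = np.asarray(target, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    S = target.shape[0]
    return np.sum((target - prediction) ** 2) / (2.0 * S)

# Worked example: S = 4 steps, R = 6 outputs, every entry off by exactly 1.
loss = half_mse_sequence_loss(np.zeros((4, 6)), np.ones((4, 6)))
```

In the worked example the squared errors sum to 24, and dividing by 2S = 8 gives a loss of 3.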
Optimizing hyperparameters is critical to effectively train the neural networks. In particular, this includes the maximum number of epochs, mini-batch size, initial learning rate, learning rate drop period and factor, and gradient threshold. These are optimized based on learning rate decay in relation to the number of iterations. The Bayesian optimization function [39] in Matlab was used to optimize the hyperparameters, allowing for up to 65 trials with the objective of minimizing the validation loss. The final optimized parameters were determined by selecting the iteration with the lowest validation loss from the best trial.
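The roles of the initial learning rate, the drop period, and the drop factor can be illustrated with a piecewise-constant schedule, in which the learning rate is multiplied by the drop factor once every drop period. This is a sketch of how such a schedule is commonly defined, assuming a Matlab-style piecewise schedule measured in epochs. The function name and the numerical values are hypothetical and are not the optimized hyperparameters of this study.

```python
def piecewise_learning_rate(epoch, initial_lr, drop_period, drop_factor):
    """Learning rate at a given epoch (0-based) for a piecewise-constant schedule."""
    n_drops = epoch // drop_period  # number of drops applied so far
    return initial_lr * drop_factor ** n_drops

# Hypothetical values, chosen only to show the shape of the schedule.
schedule = [piecewise_learning_rate(e, initial_lr=1e-3, drop_period=10, drop_factor=0.5)
            for e in range(30)]
```

With these example values the learning rate is 1e-3 for epochs 0 to 9, 5e-4 for epochs 10 to 19, and 2.5e-4 for epochs 20 to 29.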
In this study, the training hyperparameters were initially optimized using the original dataset. They were re-checked as more rotations were added to the dataset, but re-optimization did not significantly improve the performance. Therefore, in the range from zero rotations to 15 rotations, the hyperparameters were kept constant, i.e., identical to those of the original data. Finally, they were additionally optimized for 15 and 20 rotations.

Results & Discussion
The trained RNN was evaluated based on its predictive capabilities in capturing the non-linear elasto-plastic responses of SFRCs, with various amounts of augmented data. Subsequently, the implementation of data augmentation via random rotations was evaluated.
Following this, specific loading cases, such as uniaxial, biaxial stress, and plane strain loadings, were examined to assess the RNN's ability to predict non-random stress-strain paths.

Evaluation metrics
The trained RNN was evaluated based on how accurately it could predict the von Mises stress, as given in Equation (14):

$$\sigma_{vM} = \sqrt{\tfrac{1}{2}\left[ (\sigma_{11} - \sigma_{22})^2 + (\sigma_{22} - \sigma_{33})^2 + (\sigma_{33} - \sigma_{11})^2 + 6\left(\sigma_{12}^2 + \sigma_{23}^2 + \sigma_{31}^2\right) \right]}. \quad (14)$$

From this, the mean relative error (MeRE) and the maximum relative error (MaRE) were calculated. The von Mises stress was chosen as a metric because it allows for an evaluation that incorporates all 6 stress components. Furthermore, the von Mises stress is physically relevant for determining the material's behavior, providing a meaningful metric for evaluating the accuracy of the network predictions.
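The evaluation can be sketched as follows. The von Mises stress follows its standard definition, whereas the exact definitions of MeRE and MaRE are not recoverable from this text. As a labeled assumption, the sketch normalizes the absolute error of the von Mises stress by the maximum reference von Mises stress along the path, then takes the mean and the maximum over time. The function names and the component ordering (σ 11, σ 22, σ 33, σ 12, σ 23, σ 31) are also assumptions.

```python
import numpy as np

def von_mises(stress):
    """Von Mises stress from components ordered (s11, s22, s33, s12, s23, s31).

    stress has shape (..., 6); the ordering is an assumption of this sketch.
    """
    s11, s22, s33, s12, s23, s31 = np.moveaxis(np.asarray(stress, dtype=float), -1, 0)
    return np.sqrt(0.5 * ((s11 - s22) ** 2 + (s22 - s33) ** 2 + (s33 - s11) ** 2
                          + 6.0 * (s12 ** 2 + s23 ** 2 + s31 ** 2)))

def relative_errors(stress_pred, stress_ref):
    """Return (MeRE, MaRE) for one path of shape (n_steps, 6).

    Assumed definition: |vm_pred - vm_ref| normalized by max(vm_ref) over the path.
    """
    vm_pred, vm_ref = von_mises(stress_pred), von_mises(stress_ref)
    rel = np.abs(vm_pred - vm_ref) / np.max(vm_ref)
    return float(rel.mean()), float(rel.max())

# Worked example: uniaxial reference ramp, prediction uniformly 10% too high.
ref = np.zeros((5, 6))
ref[:, 0] = [0.0, 25.0, 50.0, 75.0, 100.0]
mere, mare = relative_errors(1.1 * ref, ref)
```

In the worked example the relative error grows linearly from 0 to 0.1 along the path, giving MeRE = 0.05 and MaRE = 0.1 under the assumed normalization.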

Testing results for randomly evolving strain paths
The RNN model was initially evaluated for its accuracy in predicting the stress evolutions of the test dataset, i.e., the dataset including 5% of the original data. A consistent decrease in both MeRE and MaRE was observed as the number of augmented datasets increased from the original data to 20 times augmented datasets, as illustrated in Figure 7. Data augmentation, achieved by adding 20 randomly rotated stress-strain paths, reduced the MeRE by almost 50%, from 0.0659 to 0.03398. Therefore, the RNN effectively predicted complex anisotropic non-linear elasto-plastic deformations. An example of a predicted random path compared with the high-fidelity simulated data is given in Figure 8. It can be clearly seen that the augmented datasets improve the network predictions.

Specific loading tests results
The trained RNNs were also tested on specific loading cases, particularly cyclic loading cases of uniaxial stress, biaxial stress and plane strain. This type of loading is commonly applied to materials to assess their performance. However, it is fundamentally different from random strain paths, as some stress or strain components are consistently set to zero. The MeRE and MaRE for specific loading tests are shown in Figure 9. Once again, the RNN model exhibited accurate predictive capabilities when using augmented datasets.
The MeRE was drastically decreased by the data augmentation approach, from 0.20272 to 0.0628. Thus, expanding the dataset resulted in a substantial reduction in prediction errors, aligning the predictions more closely with the high-fidelity FE/FFT-simulated data.
Furthermore, the RNNs' stress-strain predictions, together with the original micro-mechanical simulations, for a uniaxial stress cyclic load are given in Figure 10. The network trained on the original dataset poorly matches the shape of the cyclic loading curve. However, as more rotations are added, the predicted output progressively aligns more closely with the simulated data. Similarly, a load cycle is analyzed for a biaxial normal stress loading case (σ 11 and σ 22 ). From the output stress components, the von Mises stress is calculated and given in Figure 11.
The results obtained in this study show the effectiveness of the proposed data augmentation method. It can dramatically reduce the computational/experimental cost required for developing data-driven models. For instance, producing 3D models using experimental data would be challenging since typical testing rigs only allow for specific loading scenarios [18]. However, fatigue or other mechanical behaviors of SFRCs are crucial for accurate modeling, yet are typically unsupported by classical models [4]. Experimental results may be utilized to discover unknown constitutive laws. For example, surrogate models trained on experimental data have demonstrated improved predictions of strain hardening in titanium under uniaxial stress [40]. With test-rig setups, one may produce stress-strain paths in a limited number of directions. Such a limited dataset could then be easily expanded using the proposed data augmentation approach. Training a network on experimental data allows for modeling of unknown laws and facilitates better predictions of the material behavior. This could be crucial for advancing the design of composites and other materials. Thus, this data augmentation approach has valuable potential for advancements in a wide range of industries. It should also be mentioned that, despite the promising results, the method could potentially be improved by choosing the rotation angles in a systematic way. In other words, whether random rotations are optimal for this approach remains an open question and requires further investigation.

Conclusions
The macroscopic behavior of different composite materials, including SFRCs, depends on microstructural parameters. Therefore, establishing a structure-property relationship requires the use of micro-mechanical models. By applying a computational homogenization method to realistic RVEs that mimic the actual material micro-structure, highly accurate predictions are obtained. Nonetheless, difficult RVE generation and computationally expensive simulations remain the main challenges. More recently, data-driven methods using ANNs have been proposed as an alternative to address these issues. Yet, the data-hungry nature of ANNs poses a challenge for generating the data required for training and validating an ANN model, which limits advancements of data-driven models.
In this study, we addressed the challenge of limited high-fidelity data for training ANNs by introducing a novel approach: augmenting the original dataset through rotations. Using the proposed method, the original dataset of limited high-fidelity simulations was expanded to varying extents by using different numbers of random rotations. An RNN was trained and validated using different datasets (the original dataset and different augmented datasets) of path-dependent non-linear elasto-plastic behavior of SFRCs. The results demonstrated that the proposed data augmentation approach significantly mitigated the data requirement for deep-learning-enhanced modeling of SFRCs.
We believe that the proposed data augmentation approach is not exclusive to SFRCs and may be used not only for other composites, but also for other kinds of materials such as polymers, metals, ceramics, etc. Also, the data augmentation method could potentially be applied to lower-scale models such as Molecular Dynamics simulations to develop efficient surrogate models. This can dramatically reduce the time and computational resources required for dataset development, and hence result in accurate and remarkably efficient data-driven models for modeling and designing materials.

Figure 2: Eigenvalues of the randomly generated orientation tensors.

Figure 3: An example of a randomly generated orientation tensor.

Figure 4: An illustration showing the data augmentation approach considering multiple configurations and their corresponding coordinate systems. The RVE contour plot is taken from [33].

Figure 5: A representation of the proposed data augmentation approach using random rotations. The second-order orientation, strain, and stress tensors are rotated from one configuration to another configuration using a rotation tensor relating the corresponding coordinate systems.

Figure 6: The RNN architecture, in which each GRU layer contains 500 hidden states.

Figure 7: Test dataset results for networks trained with different datasets (original and augmented ones). The bars represent the average MeRE and MaRE of the von Mises stress, obtained by comparing the network predictions with the FE/FFT simulations.

Figure 8: Results of one of the test dataset samples for networks trained with the original, 3 augmented, and 20 augmented datasets, showing the von Mises stress calculated from the network predictions compared with the FE/FFT simulations.

Figure 9: Specific loading test results for networks trained with different datasets (original and augmented ones). The bars represent the MeRE and MaRE of the von Mises stress, obtained by comparing the network predictions with the micro-mechanical simulations.

Figure 10: Stress-strain plot illustrating the results of a uniaxial stress (σ 11 ) loading test on sample 3, for networks trained with the original, 3 augmented, and 20 augmented datasets compared with FE/FFT simulations.

Figure 11: Results of a specific loading test on sample 5 under biaxial stress loading (σ 11 + σ 22 ) for networks trained with the original, 3 augmented, and 20 augmented datasets, showing the von Mises stress calculated from the network predictions compared to micro-mechanical simulations.

Table 1: Orientation tensor components and fiber volume fraction for each of the specific test samples.