Investigation of Granite Deformation Process under Axial Load Using LSTM-Based Architectures

Granite is generally composed of quartz, biotite, and feldspar, and contains cracks. The changes in the digital parameters of these compositions reflect the details of the deformation process of the rock. Therefore, estimating these changes helps in understanding the deformation and failure stages of the rock. In the current study, each frame of the video recorded during the axial compression test was divided into four parts (the upper left, upper right, lower left, and lower right parts), and the digital parameters of the various compositions in each part were extracted. Using these parameters as the input dataset, a long short-term memory (LSTM) based neural network was established to explore the changes of the various compositions. After dividing the deformation process into four stages based on the stress-strain curve, and again using the digital parameters of the compositions as the dataset, a second LSTM-based neural network was established to estimate the rock deformation stage. The root mean squared error (RMSE), the goodness of fit (R²), and the average accuracy (ACC) were used to evaluate the efficiencies of these two networks. The influences of several variables (the number of hidden layers, maximum epoch, learning rate, minimum batch size, and train ratio) on the efficiencies of the networks were then explored.
The results show that the hyperparameters have a great influence on the efficiency of the LSTM-based neural network for estimating the digital parameter changes of the compositions; the estimations were relatively good when the number of hidden layers, maximum epoch, learning rate, minimum batch size, and train ratio were 2, 150, 0.005, 10, and 0.8, respectively; the compositions with greater percentages were estimated with greater accuracy; and, when the LSTM-based architecture for estimating deformation stages was used, the ACC decreased in the order biotite, feldspar, crack, and quartz. These results may serve as a reference both for assessing the applicability of the established LSTM-based architectures and for exploring the deformation process of rock materials.


Introduction
Rock is a typical material made up of various compositions. For example, granite is generally composed of quartz, biotite, and feldspar. The changes of these compositions reflect the deformation stages of the rock. Therefore, determining the composition changes and the rock deformation stages is important in investigating rock properties. Conventional techniques focus mostly on the overall deformation of the rock. To our knowledge, few works have explored the detailed movements of the various compositions at different deformation stages. Because each composition can be represented by digital parameters, estimating the instantaneous digital parameters at the corresponding deformation stage is important for understanding the failure process of the rock under external load. In this study, we establish two long short-term memory (LSTM) based neural networks to estimate the composition digital parameters and the rock deformation stages, respectively.
Image processing techniques have been used to compute the instantaneous digital parameters of each composition. To investigate the changes of the various compositions in a rock by means of image processing, Xu et al. [1] obtained the lengths and areas of the compositions in granite under axial load, Rigopoulos et al. [2] examined the details of crack initiation and propagation, Lin et al. [3] measured the related displacements in a rock, Akesson et al. [4] explored the locations of newly generated cracks, and Han et al. [5] tracked the movement of cracks by taking them as tracking targets.
In recent years, many researchers have used LSTM-based neural networks to solve problems in the field of civil engineering. In construction engineering, Zhong et al. [6] combined the conditional random field and bidirectional LSTM to automatically extract the qualitative construction procedural constraints in Chinese regulations after recognizing the named entities; Qu et al. [7] proposed practical quantitative indexes to evaluate concrete dam deformation based on rough set (RS) theory and an LSTM-based network; Zhang et al. [8] established a two-level structure of LSTM-based networks, one to improve signal quality and one to build the crack signal, for detecting the acoustic emission signals of rail cracks; Rashid et al. [9] presented an LSTM-based RNN (recurrent neural network) for construction equipment activity recognition using synthetic time-series training data; Tang et al. [10] proposed a forecast model for rail transit based on an LSTM-based network by combining spatio-temporal parameters as the input; Sagheer et al. [11] proposed a deep long short-term memory (DLSTM) architecture using the production data of two actual oilfields; Khatir et al. [12] assessed the combined use of an artificial neural network with particle swarm optimization for the damage quantification of laminated composite plates. In bridge engineering, Mangalathu et al. [13] utilized the LSTM algorithm to classify building damage based on the textual descriptions recorded after an earthquake; Guo et al. [14] found that the LSTM-based neural network is effective for the deflection estimation of the bridge health state; Tran-Ngoc et al. [15] employed the cuckoo search algorithm to improve the training parameters of the artificial neural network for the numerical models of a steel beam and a large-scale truss bridge; Khatir et al. [16] combined the transmissibility functions with the artificial neural network to conduct damage detection in a girder bridge.
In underground works, Gao et al. [17] used traditional recurrent neural networks (RNN), LSTM-based networks, and gated recurrent unit (GRU) networks to conduct the real-time estimation of operating parameters in tunnel boring machines (TBM); Yang et al. [18] presented coal gangue recognition results by comparing LSTM and other learning algorithms based on the collision vibration signal between the coal gangue and a metal plate; Zhou et al. [19] presented a neural network, containing a wavelet transform noise filter, a convolutional neural network feature extractor, and an LSTM estimator, for determining the attitude and position of the shield machine. In geotechnical engineering, Do et al. [20] combined the LSTM algorithm and a multi-layer neural network to forecast crack propagation in the risk assessment of engineering structures without analysis tools; Nguyen et al. [21] applied the LSTM method and multi-layer neural networks to predict the fracture growth rates of a concrete specimen in permeable porous media; Xu et al. [22] and Xie et al. [23] presented a dynamic model to estimate landslide displacement both by decomposing the displacement into trend and periodic components and by using empirical mode decomposition and an LSTM-based method; Yang et al. [24] utilized time series theory and an LSTM-based neural network to estimate the transient landslide displacement; Geng et al. [25] adopted a dilated causal temporal convolution network (DCTCNN) and a CNN-LSTM hybrid model to forecast seismic events; Zhang et al. [26] proposed two schemes of the LSTM-based network for nonlinear structural seismic responses after clustering the seismic inputs using a dynamic K-means clustering approach.
An LSTM-based architecture can make full use of dynamic digital parameters to explore features automatically at various deformation stages, and it performs well in problems involving long-distance dependencies (especially in time-series data). This architecture can also reduce the noise caused by the surroundings and is well suited to real-time detection. Nevertheless, to our knowledge, few works have used LSTM-based neural networks to estimate the instantaneous composition digital parameters during rock deformation. Temporal parameters in the video are linked to the deformation stages and can be extracted at each instant. These sequential data may be used to develop a data-driven model for investigating the multistage features of rock deformation. The deformation evolution of the rock specimen is a complex nonlinear dynamic process, in which the deformation conditions at one time may affect the stability conditions at the next time. Therefore, the LSTM-based architecture may be used to investigate the granite deformation process under axial load. In this study, each frame of the video recorded during the laboratory compression test was cropped into different parts, and the threshold technique was used to determine the types and locations of the various compositions. The composition digital parameters were then extracted and used as the dataset to establish two LSTM-based neural networks, one for exploring the changes in the composition digital parameters and one for determining the deformation stage of the whole specimen. The influences of various variables on the efficiencies of the established LSTM-based neural networks were also examined.

Processing of Granite Test Video Image under Axial Load
The granite blocks used in this study were collected from the Besishan area, Yumen city, Gansu Province, China. The blocks were sliced into cuboids of 50 mm × 50 mm × 100 mm and polished into specimens with a section flatness smaller than 0.02 mm. A Canon 600D camera was then fixed in front of the specimen at a distance of around 100 cm. The uniaxial compression test was conducted using a servo testing machine with a maximum load of 2.0 MN, and the test process was recorded as a video in MOV format.
In the video, the crack appeared at 256 s. Frames covering an area of 25 mm × 50 mm were extracted from this instant onward at two frames per second, giving a total of 700 frames for the following analysis. Each frame was evenly divided into four parts (the upper left, upper right, lower left, and lower right parts) to explore the changes in the digital parameters at the various parts. Using the conventional identification and point-selection technique, the types of the compositions at the various locations were determined and labelled. Fig. 1 shows an original frame. It should be noted that the composition types and changes on a frame were determined here by color and by conventional visual inspection. The composition changes may also be modeled using phase field models (PFMs), and applying PFMs to characterize these changes would be useful in future work. With phase field models, fracture propagation in poroelastic media has been investigated based on the classical Biot poroelasticity theory (Zhou et al. [27]); crack propagation, branching, and coalescence in rock can be simulated (Zhou et al. [28]); fluid-driven dynamic crack propagation in poroelastic media can be explored (Zhou et al. [29]); and the related computations for quasi-static and dynamic crack propagation have been implemented in general finite element software (Zhou et al. [30]). A modified PFM can also simulate compressive-shear fractures in rock-like materials by constructing a driving force in the evolution equation (Zhou et al. [31]).
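The even four-way division of each frame described above can be illustrated with a minimal Python sketch. It assumes that a frame is available as a list of pixel rows; the function name `split_into_quadrants` and the toy 4 × 4 frame are hypothetical and are not taken from the study.

```python
from typing import Dict, List

Frame = List[List[int]]  # rows of pixel values (hypothetical grayscale frame)


def split_into_quadrants(frame: Frame) -> Dict[str, Frame]:
    """Split a frame evenly into upper-left, upper-right, lower-left, lower-right parts."""
    mid_row = len(frame) // 2
    mid_col = len(frame[0]) // 2
    return {
        "upper_left": [row[:mid_col] for row in frame[:mid_row]],
        "upper_right": [row[mid_col:] for row in frame[:mid_row]],
        "lower_left": [row[:mid_col] for row in frame[mid_row:]],
        "lower_right": [row[mid_col:] for row in frame[mid_row:]],
    }


# Toy 4 x 4 "frame" whose pixel values encode their own position.
toy_frame = [[r * 4 + c for c in range(4)] for r in range(4)]
parts = split_into_quadrants(toy_frame)
print(parts["upper_left"])   # [[0, 1], [4, 5]]
print(parts["lower_right"])  # [[10, 11], [14, 15]]
```

The digital parameters of each composition would then be computed separately on each of the four returned parts.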

Establishment of Feature Dataset for Various Compositions
From the above-mentioned 700 images, the digital parameters (including the long-axis length L and the area A) of the various compositions at the various locations and instants were computed. The changes with time in the length L of the compositions in Fig. 4 show the following:
(a) Cracks (see Fig. 4a): the length on the upper left part first increases markedly to 240 pixels and then remains constant; cracks subsequently appear on the lower parts, and no new crack appears on the upper left part. The changes in the length are very similar to those in the area, implying that most changes in the crack area are attributable to changes in the length.
(b) Biotite (see Fig. 4b): most of the changes appear on the lower left part, and there are small changes on the upper parts; the lengths on the upper right part are clearly greater than those on the upper left part; the lengths on the lower parts increase markedly at 120 and 600 s; the changes in the length are similar to those in the area.
(c) Feldspar (see Fig. 4c): the lengths increase markedly at around 120 s; the lengths at the upper left and lower right parts are stable at about 450 pixels, whereas the lengths at the upper right part decrease slowly; there are many differences between the changes in lengths and in areas, implying that the area changes in the feldspar are related both to the long-axis length and to the short-axis length.
(d) Quartz (see Fig. 4d): the change trends are similar at the upper left and lower right parts, whereas a relatively great difference appears at the lower left and upper right parts; most changes in the areas are attributable to the changes in the lengths.
As a whole, under axial load, the change trends in the long-axis length of crack and biotite are relatively similar to those in the area, and most area changes are attributable to changes in the long-axis length; for feldspar, there is a great difference between the changes in the long-axis length and in the area; for quartz, there is a small difference between the change trends in the long-axis length and in the area at the upper left and lower right parts, whereas the area changes depend both on the long-axis length and on the short-axis length. Fig. 5 shows the changes with time in the area A.
The changes with time in the area A of the compositions in Fig. 5 show the following:
(a) Cracks (see Fig. 5a): none appears at the upper left part, whereas the earliest ones appear at the upper right part. During crack propagation, new cracks are gradually generated at the lower left and lower right parts; the areas increase quickly at around 120 s and then increase slowly until the specimen fails. The crack area reaches its maximum at around 682 s and then remains stable until the specimen fails; most of the cracks appear at the upper left part, with a maximum of around 1600 mm²; the crack areas at the lower left and lower right parts are both around 500 mm², whereas the former changes more quickly.
(b) Biotite (see Fig. 5b): most of the areas at the lower left part fluctuate, which may be induced by more identification errors of small cracks; the areas at the lower left and lower right parts increase markedly at around 120 s and 600 s; the areas fluctuate less at the lower left, upper right, and lower right parts.
(c) Feldspar (see Fig. 5c): the area generally decreases; the greatest one appears at the lower left part, whereas the smallest one appears at the lower right part; those on the upper parts are stable, whereas those on the lower parts decrease at a relatively great speed.
(d) Quartz (see Fig. 5d): the area is stable; the locations in descending order of area are lower right, lower left, and upper right; the areas at the upper right part show a decreasing trend and are smaller than those of feldspar as the cracks increase quickly at 120 s; the areas at the lower parts decrease slowly and then increase.
As a whole, under axial load, most cracks appear on the feldspar parts; the areas of biotite, feldspar, and quartz increase, decrease, and remain stable, respectively, and this sequence agrees with the small-to-great order of hardness of the compositions.
The LSTM-based neural network is a deep learning algorithm based on a modified recurrent neural network (RNN), in which the memory cell can selectively remember long-term or short-term input data and which is therefore well suited to temporal data. The conventional LSTM-based neural network is composed of three layers: the input, hidden, and output layers. The output depends both on the current input and on the hidden state from earlier time steps, and the network therefore has a memory ability. The main difference between LSTM and RNN is that a "processor" is added in LSTM to judge whether the data are useful. The structure of this processor is called the LSTM memory cell. Each cell contains three gates (the input, forget, and output gates). As data enter the LSTM-based neural network, whether they are retained depends on the pre-defined rules; only the data that meet the requirements are kept. Fig. 6 shows the structure of the LSTM memory cell.
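In Fig. 6, the input gate ($i_t$), forget gate ($f_t$), output gate ($o_t$), cell state ($c_t$), and cell output ($h_t$) are computed respectively by (written here in the standard LSTM form)

$$i_t = \sigma(W_{ix} x_t + W_{ih} h_{t-1} + b_i)$$
$$f_t = \sigma(W_{fx} x_t + W_{fh} h_{t-1} + b_f)$$
$$o_t = \sigma(W_{ox} x_t + W_{oh} h_{t-1} + b_o)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{cx} x_t + W_{ch} h_{t-1} + b_c)$$
$$h_t = o_t \odot \tanh(c_t)$$

where $x_t \in \mathbb{R}^k$ represents the input time series; $\sigma$ is the sigmoid function acting on the three gates with an output in [0, 1], representing the degree of passage (0 and 1 mean no pass and full pass, respectively); tanh is the hyperbolic tangent; $W_{ix}$, $W_{ih}$, $W_{fx}$, $W_{fh}$, $W_{ox}$, and $W_{oh}$ are the weight matrices of the corresponding gates, initialized with random values; $b_i$, $b_f$, and $b_o$ are the biases associated with the corresponding weight matrices; $W_{cx}$, $W_{ch}$, and $b_c$ are the corresponding weight matrices and bias of the cell-state update; $\odot$ represents the element-wise multiplication of matrices; $c_t$ is the state of the storage unit at time t; and $h_t$ is the output of the LSTM unit at time t.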
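To make the gate computations concrete, the following is a minimal pure-Python sketch of one time step of an LSTM memory cell in the standard formulation. It is not the authors' implementation. The parameter names `W_cx`, `W_ch`, and `b_c` for the cell-state candidate are assumptions, as the text lists only the gate weights and biases, and the all-zero parameters in the toy check are chosen only so that the result can be verified by hand.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W_x: Matrix, x: Vector, W_h: Matrix, h: Vector, b: Vector) -> Vector:
    """Compute W_x x + W_h h + b one output row at a time."""
    return [
        sum(w * v for w, v in zip(row_x, x))
        + sum(w * v for w, v in zip(row_h, h))
        + bias
        for row_x, row_h, bias in zip(W_x, W_h, b)
    ]


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector,
              p: Dict[str, list]) -> Tuple[Vector, Vector]:
    """Advance one LSTM memory cell by a single time step."""
    i_t = [sigmoid(z) for z in affine(p["W_ix"], x_t, p["W_ih"], h_prev, p["b_i"])]
    f_t = [sigmoid(z) for z in affine(p["W_fx"], x_t, p["W_fh"], h_prev, p["b_f"])]
    o_t = [sigmoid(z) for z in affine(p["W_ox"], x_t, p["W_oh"], h_prev, p["b_o"])]
    # Candidate cell content (W_cx, W_ch, b_c are assumed names, see the lead-in).
    g_t = [math.tanh(z) for z in affine(p["W_cx"], x_t, p["W_ch"], h_prev, p["b_c"])]
    # Forget part of the old state, then add the gated candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Toy check with k = 2 inputs, one hidden unit, and all-zero parameters:
# every gate equals sigmoid(0) = 0.5 and the candidate equals tanh(0) = 0.
zero_params = {name: [[0.0, 0.0]] for name in ("W_ix", "W_fx", "W_ox", "W_cx")}
zero_params.update({name: [[0.0]] for name in ("W_ih", "W_fh", "W_oh", "W_ch")})
zero_params.update({name: [0.0] for name in ("b_i", "b_f", "b_o", "b_c")})
h_t, c_t = lstm_step([1.0, -2.0], [0.0], [1.0], zero_params)
print(c_t)  # [0.5]: half of the previous cell state is kept
```

In the toy check, the forget gate of 0.5 keeps half of the previous cell state and the input gate adds nothing, which shows how the gates control what the cell remembers.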

Design of LSTM-Based Architecture for Various Compositions on Granite
Following the conditions described earlier, the LSTM-based architecture for the various compositions in granite was established (shown in Fig. 7). The architecture comprises the input layer, hidden layer, output layer, and network training.
(a) Input layer. The above-mentioned time series were taken as the original input data, of which the earlier 80% and the later 20% were set as the training and test sets, respectively. The time series are the long-axis length and area of quartz, biotite, feldspar, and crack at each instant, extracted from the above-mentioned 700 images. To suppress anomalies and noise and to accelerate the training of the network, the original data were standardized by

$$x_b = \frac{x - \mu}{\sigma}$$

where $x_b$ is the standardized data; $x$ is the original data; and $\mu$ and $\sigma$ are the mean and standard deviation of the original data $x$, respectively.
(b) Hidden layer. In the established LSTM architecture, the hidden layer is composed of two LSTM layers, in which the hidden elements are used to remember the information at the corresponding time steps. In the current study, the first and second LSTM layers are initially set to 150 and 125 memory elements, respectively. After each layer, a dropout layer was added to control the probability of dropping neural elements and thereby to avoid overfitting of the network.
(c) Output layer. In the established LSTM-based architecture, the output layer is composed of a fully-connected layer and a regression layer. A dropout layer was also added. The estimated time series were obtained from the output layer. The inverse standardization was conducted to compute the loss by comparing the estimated time series with the actual ones. Here, the fully-connected layer multiplies the input by a weight matrix and then adds a bias vector to make the output values more exact, and the regression layer computes the mean absolute error loss at each instant and can be used to display clearly the differences between the estimated and actual values. Therefore, these two layers were combined in the output layer both to reduce the errors of the estimated digital parameters and to display these errors visually in the later sections.
(d) Training parameters in the architecture. The parameters include the choice of optimizer, learning rate, minimum batch size, and maximum epoch. In this study, the Adam algorithm is set as the optimizer; the learning rate, which controls the weight updates, is set within 0.0 to 1.0; the minimum batch size represents the number of samples used in each training iteration and may affect the optimization and the training speed; and an epoch is one training pass in which the whole training set is used.
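The input-layer preparation in item (a) above (chronological 80%/20% split, standardization, and the inverse standardization used before computing the loss) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the toy series is not measured data, and fitting the mean and standard deviation on the training portion only is an assumption, since the text does not state which portion was used.

```python
import math
from typing import List, Sequence, Tuple


def chronological_split(series: Sequence[float],
                        train_ratio: float = 0.8) -> Tuple[List[float], List[float]]:
    """Keep the earlier part for training and the later part for testing."""
    n_train = int(round(len(series) * train_ratio))
    return list(series[:n_train]), list(series[n_train:])


def fit_standardizer(train: Sequence[float]) -> Tuple[float, float]:
    """Return the mean and (population) standard deviation of the training data."""
    mean = sum(train) / len(train)
    variance = sum((v - mean) ** 2 for v in train) / len(train)
    return mean, math.sqrt(variance)


def standardize(values: Sequence[float], mean: float, std: float) -> List[float]:
    """Apply x_b = (x - mean) / std element by element."""
    return [(v - mean) / std for v in values]


def inverse_standardize(values: Sequence[float], mean: float, std: float) -> List[float]:
    """Map standardized values back to the original units."""
    return [v * std + mean for v in values]


# Toy series standing in for one composition's area over time (not measured data).
toy_series = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0, 10.0, 12.0]
train, test = chronological_split(toy_series, train_ratio=0.8)
mean, std = fit_standardizer(train)  # assumption: statistics from training data only
test_std = standardize(test, mean, std)
restored = inverse_standardize(test_std, mean, std)
print(mean, std)  # 5.0 2.0
print(test_std)   # [2.5, 3.5]
```

Splitting by time rather than at random keeps the later frames unseen during training, which matches the forecasting setting described in the text.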

Accuracy Measurement of LSTM-Based Architecture
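The root mean squared error (RMSE) and the goodness of fit (R²) were used as the criteria for measuring the efficiency of the established LSTM-based architecture, and were computed by

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where $y_i$ is the actual value; $\hat{y}_i$ is the estimated value; $\bar{y}$ is the mean of the actual values; and $n$ is the number of time-series data.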
The RMSE represents the error between the estimated and actual values: the smaller the RMSE, the closer the estimated values are to the actual ones. The goodness of fit (R²) is a dimensionless coefficient that normally lies in the interval [0, 1]; the closer R² is to 1, the closer the estimated values are to the actual ones, and vice versa.
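The two criteria can be computed with a few lines of Python. The sketch below uses the usual definitions of RMSE and R² (one minus the ratio of the residual to the total sum of squares); the function names and the toy values are illustrative only.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], estimated: Sequence[float]) -> float:
    """Root mean squared error between actual and estimated values."""
    n = len(actual)
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(actual, estimated)) / n)


def r_squared(actual: Sequence[float], estimated: Sequence[float]) -> float:
    """Goodness of fit: 1 - (residual sum of squares) / (total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, estimated))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


# Toy values (not measured data): only the last estimate is off, by 1.
actual = [1.0, 2.0, 3.0, 4.0]
estimated = [1.0, 2.0, 3.0, 5.0]
print(rmse(actual, estimated))       # 0.5
print(r_squared(actual, estimated))  # approximately 0.8
```

In the toy example the residual sum of squares is 1 and the total sum of squares is 5, so RMSE = sqrt(1/4) = 0.5 and R² = 1 - 1/5 = 0.8.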

Training of LSTM-Based Architecture
In the current study, the initial training parameters are set as follows: the number of hidden layers is 1; the train ratio (the fraction of the dataset used for training) is 0.8; the learning rate is 0.003; the minimum batch size is 10; and the maximum epoch is 100.
Taking the area changes with time of feldspar at the upper right part as the input data (see Fig. 2b), the estimated area (see Fig. 8) and the corresponding locally enlarged figure (see Fig. 9) were obtained. From Fig. 9, it can be seen that the results are not satisfactory; there is a relatively great error of 200 mm².

Exploration of Parameters in LSTM-Based Architecture
To improve the established LSTM-based neural network in estimating the digital parameters of the compositions in granite under axial load, we explored the effects of the above-mentioned training parameters on the estimation accuracies.
(1) Number of hidden layers. Tab. 1 shows the estimation accuracies computed with various numbers of hidden layers. It can be seen that RMSE and R² reach their smallest and greatest values, respectively, if the number of hidden layers is set to 2.
(2) Hyperparameters. We now determine suitable values of the learning rate and minimum batch size. To this end, we first plot the changes of RMSE and R² with the maximum epoch at various learning rates (see Figs. 10a and 10b, respectively).
In Fig. 10, the number of hidden layers is 2; the maximum epoch is 50, 100, 150, 200, 250, 300, and 350, respectively; and the learning rates are set at 0.001, 0.003, 0.005, and 0.010, respectively. It can be seen that R² reaches its maximum (0.9814) at a learning rate of 0.005. At the same learning rate, RMSE also reaches its minimum value (31.4129). Therefore, the best estimation of the digital parameters may be obtained if the learning rate is set at 0.005.
The RMSE and R² computed at various batch sizes are listed in Tab. 2. In Tab. 2, the minimum batch sizes are set at 5, 10, 15, 20, 25, and 30, respectively, whereas the learning rates are set at 0.001, 0.003, 0.005, and 0.010, respectively. Again, R² and RMSE simultaneously reach their maximum (0.9814) and minimum (30.6670), respectively.
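The search over learning rates and minimum batch sizes described above amounts to a grid search that keeps the setting with the smallest RMSE. The sketch below illustrates only that selection logic. The function `toy_evaluate` is a hypothetical stand-in for "train the network with this setting and return the test RMSE"; its values are synthetic, centred for illustration on the setting the text reports as best, and are not results from the study.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

Setting = Tuple[float, int]  # (learning rate, minimum batch size)


def grid_search(learning_rates: Iterable[float],
                batch_sizes: Iterable[int],
                evaluate: Callable[[float, int], float]) -> Tuple[Optional[Setting], float]:
    """Try every combination and keep the one with the smallest RMSE."""
    best_setting: Optional[Setting] = None
    best_rmse = float("inf")
    for lr, bs in product(learning_rates, batch_sizes):
        score = evaluate(lr, bs)
        if score < best_rmse:
            best_setting, best_rmse = (lr, bs), score
    return best_setting, best_rmse


def toy_evaluate(learning_rate: float, batch_size: int) -> float:
    """Synthetic RMSE surface (NOT measured): a bowl with its minimum at (0.005, 10)."""
    return abs(learning_rate - 0.005) * 1000.0 + abs(batch_size - 10)


# Grid values taken from the text; the scores come from the synthetic surface.
best, best_score = grid_search(
    learning_rates=[0.001, 0.003, 0.005, 0.010],
    batch_sizes=[5, 10, 15, 20, 25, 30],
    evaluate=toy_evaluate,
)
print(best)  # (0.005, 10)
```

In practice `evaluate` would train the LSTM-based network for the chosen maximum epoch and compute the RMSE on the test set, so each grid point costs one full training run.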
(3) Train ratio. We now determine a suitable train ratio using the above-mentioned number of hidden layers and hyperparameters. Here, the train ratio is defined as the fraction of the total dataset used as the training set, where the total feldspar area is used as the dataset. The accuracy may be unstable and unavailable if a greater train ratio is used. Nevertheless, the fidelity may be decreased if a smaller train ratio is used. Using the above-mentioned LSTM-based architecture (see Section 3.2), number of hidden layers (see Section 4.1), and hyperparameters (see Section 4.2), we obtained the total feldspar area at different instants for various train ratios (see Fig. 11). In Fig. 11, the train ratios are set at 0.5, 0.6, and 0.8, respectively. It can be seen that the accuracy increases as the train ratio increases. Considering the balance between the input dataset and the time consumed by training, we set the train ratio at 0.8 in the following sections.
(4) Measurements of accuracy. Using the above-mentioned parameters, we can estimate the changes in the digital parameters describing the various compositions at the various locations on the granite specimen under axial load. Owing to limited space, only the results related to the area are presented here.
Tab. 3 lists the estimation accuracies of the various compositions at the various locations obtained with the updated LSTM-based architecture. It can be seen that the mean RMSE and mean R² of the biotite areas are 23.144 and 0.809, respectively, whereas those of the quartz areas are 44.902 and 0.912, respectively. Overall, the errors are relatively small, and the established architecture can reflect the area changes of the various compositions at the various locations.
Figure 11: Total of feldspar area vs. instants at various train ratios using the established LSTM-based architecture
Fig. 12 shows the estimated and actual areas of the various compositions at various instants using the updated LSTM-based architecture. It can be seen that the errors of the estimated areas are about ±100, ±150, ±20, and ±50 mm², respectively, for feldspar, quartz, biotite, and crack. This order coincides with the great-to-small order of the percentages of the compositions in the granite. That is to say, a composition with a greater percentage may be estimated with greater accuracy if the established architecture is used.
Overall, the error is relatively small if the established LSTM-based architecture is used to estimate the digital parameters of the various compositions in granite under axial load.

Stage Division during Granite Deformation
Fig. 13 shows the stress-strain curve of granite under uniaxial compression load. It can be seen that the deformation process of granite may be divided into the following four stages:
(a) Stage I. The duration is 0 to 80 s. No crack appeared on the specimen surface, the stress level was very low, and the rock was in the compaction stage;
(b) Stage II. The duration is 81 to 256 s. There was no crack on the specimen surface, whereas the stress increased rapidly with a linear stress-strain relation, and the rock was in the elastic deformation stage;
(c) Stage III. The duration is 257 to 520 s. In this stage, most cracks appeared on the specimen surface, whereas the stress increased with an almost linear stress-strain relation, and the rock was in the elastic-plastic deformation stage;
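The stage division above can be turned into per-frame labels for training the stage-estimation network. The sketch below maps a time in seconds to a stage name using the boundaries given for Stages I to III. Two points are assumptions: all times after 520 s are assigned to Stage IV, because the bounds of Stage IV are not given in this section, and non-integer times between two listed durations (for example 80.5 s) are assigned to the later stage.

```python
from bisect import bisect_left

# Last second of Stages I, II, and III, as listed in the text.
STAGE_UPPER_BOUNDS_S = [80, 256, 520]
STAGE_NAMES = ["I", "II", "III", "IV"]


def deformation_stage(time_s: float) -> str:
    """Return the deformation stage label for a time measured from the start of the test."""
    if time_s < 0:
        raise ValueError("time must be non-negative")
    # bisect_left counts how many upper bounds lie strictly below time_s.
    # Assumption: everything after 520 s belongs to Stage IV.
    return STAGE_NAMES[bisect_left(STAGE_UPPER_BOUNDS_S, time_s)]


print([deformation_stage(t) for t in (0, 80, 81, 256, 257, 520, 521)])
# ['I', 'I', 'II', 'II', 'III', 'III', 'IV']
```

Each extracted frame could be labelled this way from its time stamp, giving the target classes against which the estimated stages are compared.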

Establishment of LSTM-Based Architecture for Estimation of Granite Deformation Stage
To estimate the granite deformation stage efficiently using the LSTM-based architecture, the above-established architecture for estimating the digital parameter changes of the granite compositions was further modified as follows:
(a) Digital parameters. The digital parameters of an area in the input layer are set as the long-axis length, short-axis length, perimeter, area, oblate degree, circularity, variance index, and rectangularity.
(b) Accuracy measurement. The average accuracy (ACC) is used as the criterion for evaluating the accuracy of the LSTM-based architecture and is computed by

$$\mathrm{ACC} = \frac{N'}{N} \times 100\%$$

where $N$ and $N'$ represent the total and correctly classified numbers in the time sequence data, respectively.
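The accuracy criterion in item (b) is the percentage of time steps whose estimated stage matches the actual one. A minimal sketch follows; the function name and the toy labels are illustrative and are not data from the study.

```python
from typing import Sequence


def average_accuracy(actual_stages: Sequence[str], estimated_stages: Sequence[str]) -> float:
    """ACC in percent: correctly classified time steps divided by all time steps."""
    if len(actual_stages) != len(estimated_stages):
        raise ValueError("both sequences must have the same length")
    n_total = len(actual_stages)
    n_correct = sum(a == e for a, e in zip(actual_stages, estimated_stages))
    return 100.0 * n_correct / n_total


# Toy labels (not measured data): one of five time steps is misclassified.
actual_labels = ["I", "I", "II", "III", "IV"]
estimated_labels = ["I", "II", "II", "III", "IV"]
print(average_accuracy(actual_labels, estimated_labels))  # 80.0
```

Computing the same quantity on the time steps of a single stage gives a per-stage ACC of the kind reported for each composition.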

Accuracy of LSTM-Based Architecture
The established LSTM-based architecture was used to determine the granite deformation stages. The estimation results are shown in Tab. 4 and Fig. 14. It can be seen that, over all of the deformation stages, the ACCs of biotite, quartz, feldspar, and crack are all greater than 70%, in the descending order biotite, feldspar, crack, and quartz; the estimation accuracy is relatively low (just greater than 70%) at stage I and higher (all greater than 80%) at stages II, III, and IV. Although the ACCs are 70% to 80% at stage I and the ACCs of quartz at stages II and III are 80% to 90%, the other ACCs are greater than 90%. This implies that more attention should be paid to the compaction stage and to quartz in further study. Overall, the established LSTM-based architecture effectively estimates the deformation stages from the various compositions in granite under uniaxial load.

Conclusions
In this study, two LSTM-based architectures were established to estimate, respectively, the changes of the composition digital parameters and the deformation stages of granite under axial compression load. The influences of the training parameters on the applicability of the architectures were also explored. The results show that:
(b) Using the modified LSTM-based architectures, the digital parameters are efficiently estimated if the number of hidden layers, maximum epoch, learning rate, minimum batch size, and train ratio are set to 2, 150, 0.005, 10, and 0.8, respectively. The descending order of accuracy is biotite, feldspar, crack, quartz.
(c) In estimating the deformation stage using the established LSTM-based architecture, the root mean squared error at the compaction stage is relatively great, while those at the later deformation stages are relatively small.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.