Prediction and Evaluation of Coal Mine Coal Bump Based on Improved Deep Neural Network

State Key Laboratory of Water Resource Protection and Utilization in Coal Mining, Beijing 100011, China
School of Energy Science and Engineering, Henan Polytechnic University, Jiaozuo 454000, China
Henan Key Laboratory for Green and Efficient Mining & Comprehensive Utilization of Mineral Resources, Jiaozuo 454000, China
Collaborative Innovation Center of Coal Work Safety, Jiaozuo, Henan Province 454000, China


Introduction
Coal bump is a dynamic phenomenon characterized by the sudden, rapid, and violent destruction of coal (rock) around a roadway or mining face due to the instantaneous release of elastic deformation energy [1]. This kind of disaster is a bottleneck problem in underground mining engineering: it directly threatens the safety of personnel and equipment and can seriously delay project progress. Prediction is therefore the core of coal bump prevention and control, and accurate and reliable prediction of high-intensity coal bump disasters is the key to effectively avoiding and controlling them [2]. The prediction of coal bump has become a research hotspot in large-scale underground geotechnical engineering and deep coal resource mining.
Current research on coal bump prediction can be divided into three categories. The first category comprises criteria established on the basis of the coal bump mechanism, such as the Russenes criterion, Barton criterion, Turchaninov criterion, and Hoek criterion [3]. The second category comprises prediction methods based on field measurement, mainly the microgravity method [4], acoustic emission method [5], and microseismic method [6]. The third category comprises prediction methods that consider the influence of multiple factors. These methods treat the problem relatively comprehensively, offer good guidance for engineering practice, and have attracted extensive attention in recent years. They fall into two subcategories. (1) Comprehensive prediction methods based on coal bump index criteria. Tan [7] proposed a method for comprehensively judging the possibility and intensity of coal bump based on fuzzy mathematics theory. Adoko et al. [8] and Wang et al. [9] each established prediction models based on fuzzy mathematics theory, but the determination of index weights in this approach depends on subjective factors. Hu et al. [10] established an improved matter-element extension model for coal bump intensity prediction, which has difficulty predicting mixed and intermediate coal bumps. Chen et al. [11] established an ideal point prediction model with index weights calculated by the combination weighting method, but the ideal point method is only an evaluation and analysis method, and reasonable evaluation factors and ideal points must be determined before it can be used. Hu et al. [12] applied combination weighting to assign index weights and established a coal bump grade prediction model based on the approximate ideal solution ranking method; with this method it is difficult to determine the index weights when multiple factors are involved. Li et al. [13] proposed an improved cloud model that predicts coal bump by fusing the index weights through cloud atomization, but its prediction accuracy is reduced for indexes that do not obey the normal distribution.
(2) Comprehensive prediction methods based on sample data from engineering cases. Representative examples are as follows. Gong et al. [14] established a Bayesian discriminant model for coal bump prediction, whose accuracy is easily affected by the representativeness of the original data and the sample size. Luo and Cao [15] used principal component analysis to calculate the weight matrix and established a weighted distance discrimination model, which is likewise strongly affected by the representativeness and accuracy of the original data. Wu et al. [16] established a least squares support vector machine model for coal bump prediction based on the particle swarm optimization algorithm. The kernel function is the core of the support vector machine, and its selection directly affects the prediction accuracy and calculation time. Pu et al. [17] established a coal bump prediction model based on a decision tree. The decision tree is suitable for high-dimensional data, but it is prone to overfitting.
The above methods and theories have achieved useful prediction results from different angles and have greatly promoted research on this problem. However, because of the complexity of the coal bump mechanism, the diversity of influencing factors, and the shortcomings of the individual methods, the following deficiencies remain in practical engineering application. (1) Most methods are essentially comprehensive evaluations whose core problem is determining the weight of each index. The rationality of the weights is the key to the reliability of the prediction results, yet their determination is inevitably subjective and arbitrary. (2) Coal bump prediction is a complex nonlinear problem. The occurrence of coal bump is the result of the joint action of many factors, some of which are deterministic and quantitative, while others are random, qualitative, and fuzzy. It is difficult to describe them comprehensively and accurately with mathematical or mechanical methods and theories, and such descriptions are strongly affected by human factors and one-sidedness. Therefore, it is still necessary to explore new prediction methods and to study the classification prediction of coal bump intensity. In 1994, Feng [18] first proposed an adaptive pattern recognition method for rock burst prediction using neural network theory, and other scholars subsequently carried out research in this field. Jia et al. [19], Roohollah and Abbas [20], and Wu et al. [21] established coal bump prediction models based on the generalized regression neural network, emotional neural network, and probabilistic neural network, respectively. Coal bump is a special form of mine pressure manifestation, and predicting its development mechanism, time of occurrence, and intensity remains an outstanding problem.
The key to effective prevention and control of coal bump lies in research on its monitoring and early warning. At present, mines mostly use microseismic monitoring for global coverage, while mine earthquakes and impact ground pressure at the mining face are mostly monitored by traditional means such as the drill chip method and stress monitoring. The accuracy of prediction needs to be improved, and the development and arrangement of monitoring systems need further optimization. With the development of big data and artificial intelligence technology, computer models have gradually come into use for predicting the impact hazard of coal bump. However, these predictions are mostly based on geological factors, mining technology factors, and monitoring data. The prediction methods and algorithms are relatively uniform, and the prediction results are fixed, which provides poor guidance for the site. Further research is needed on the comprehensive utilization and integrated analysis of the information obtained by various techniques that reflect different aspects of coal bump. This paper studies a deep neural network model based on the dropout method and an improved Adam algorithm. The model makes full use of the stronger nonlinear learning ability and greater network depth of the deep neural network (DNN) [22]. It avoids the problem of determining index weights and is completely data-driven. Qualitative and quantitative analyses are effectively combined to avoid the influence of human factors, and the model can mine complex and subtle deep relationships in incomplete, imprecise, and noisy limited data sets. Therefore, research on the application of the deep neural network to coal bump prediction is of great significance for expanding the coal bump prediction system and improving prediction ability.

Prediction Sample Database
2.1. Selection of Evaluation Indicators. The occurrence mechanism of coal bump is complex, and there are many influencing factors. The selection of indicators is the key to prediction. With too many indicators, the measured values of some indicators become difficult to obtain and the prediction process becomes more complex. With too few indicators, the prediction is not comprehensive enough, and the results become inconsistent with reality. This paper determines the coal bump prediction and evaluation indexes through the analysis of three coal bump engineering examples: Qianqiu coal mine, Zhaolou coal mine, and Wulong coal mine. The purpose of analyzing these examples is to convert fuzzy and nonquantifiable influencing factors into quantifiable physical and mechanical indexes. At the same time, the selected indexes should be common, easy to measure in practice, and recorded in previous coal bump examples. Judging from the in situ stress level and the location of coal bump in the examples, coal bump usually occurs in rock mass with high stress concentration. Therefore, the maximum tangential stress of the tunnel wall surrounding rock is selected as one of the coal bump prediction indexes. In terms of landform, coal bump usually occurs in mountains or deep underground projects, or in rock mass with high tectonic stress. In terms of structural layout, the more irregular the excavation section, the greater the possibility of coal bump. All of these factors can be reflected by the maximum tangential stress of the surrounding rock. In the examples, the failure mode at the coal bump section is mainly tensile failure accompanied by shear failure, so the tensile strength and shear strength of the rock are candidate indexes. However, a review of the existing literature shows that shear strength is rarely recorded in actual coal bump examples, which makes it difficult to analyze. Therefore, only the tensile strength of the rock is selected as a predictor of coal bump, and the tensile strength is considered to represent both the tensile and shear properties of the rock.
In addition, coal bump mainly occurs in hard coal or rock with a complete structure. The common index for the hardness of coal and rock is the compressive strength, which is measured in almost any coal mass engineering project. Therefore, the compressive strength of coal is also selected as a prediction index of coal bump. The formation of a high-energy reservoir in the surrounding rock must meet two conditions: first, the rock mass can store large elastic strain energy, and second, the internal stress of the rock mass is highly concentrated. The coal bump tendency index reflects the energy storage and release performance of the rock mass. Under the same stress conditions, the greater this index, the better the energy storage and release performance of the rock mass. Therefore, the coal bump tendency index is selected as a coal bump evaluation index.
Through the above analysis, it is considered that the role of many influencing factors in coal bump can be reflected by four physical and mechanical indexes of the surrounding rock: maximum tangential stress, uniaxial compressive strength, uniaxial tensile strength, and elastic energy index. Therefore, these four indexes are selected as the prediction and evaluation indexes of coal bump in this study. Based on the literature [23-26], 305 groups of coal bump engineering case data were collected as the sample data for coal bump prediction. All samples have complete values for the four independent factors.

Improved Deep Neural Network Model
In recent years, deep learning technology has attracted extensive attention. As a deep learning model fitting complex nonlinear relationships, deep neural network has not only made a breakthrough in image classification but also significantly improved the accuracy of speech recognition.

Deep Neural Network Model. The DNN model is derived from the perceptron model (as shown in Figure 1). The perceptron model can only be used for binary classification and is unable to learn more complex nonlinear models. The DNN extends the perceptron model by adding hidden layers, expanding the set of activation functions, and adding neurons in the output layer. The layers of a DNN can be divided into three categories: input layer, hidden layer, and output layer. Its structure is shown in Figure 2. The outstanding feature of the DNN is that it has multiple hidden layers. Each link between network units is a causal chain that can be learned and trained. With the same number of network units, a DNN has far greater expressive ability than a shallow network and a stronger ability to deal with complex problems. Figure 3 shows the trend of the number of coal bump mines in China according to the China Energy Statistics Yearbook 2013. Figure 4 shows the history of the development of machine learning theory based on a comprehensive analysis of previous literature.
Algorithm 1: The improved Adam algorithm.
1. Initialization: initial learning rate $\eta = 0.001$; exponential decay rates of the first-order and second-order moment estimates, $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\beta_1, \beta_2 \in [0, 1)$; small constant for numerical stability, $\delta = 10^{-8}$.
2. Initialize the parameters $\theta$.
3. Initialization: first-order moment vector $m_0 = 0$; second-order moment vector $v_0 = 0$; time step $t_0 = 0$; iterative direction of the improved Adam algorithm $p^{\lambda}_0 = 0$.
4. While the stopping criterion is not reached:
5. Sample a mini-batch of $m$ samples $\{x_1, x_2, \dots, x_m\}$ from the training set.
6. Compute the gradient, where $y_i$ is the target for $x_i$: $g_t \leftarrow \frac{1}{m} \nabla_{\theta_{t-1}} \sum_i L(f(x_i; \theta), y_i)$.
7. $t \leftarrow t + 1$.
8. Update the biased first-order moment estimate: $m_t \leftarrow \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot g_t$.
9. Update the biased second-order moment estimate: $v_t \leftarrow \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot g_t \odot g_t$.
10. Correct the bias of the first moment: $\hat{m}_t \leftarrow m_t / (1 - \beta_1^t)$.
11. Correct the bias of the second moment: $\hat{v}_t \leftarrow v_t / (1 - \beta_2^t)$.
12. Compute the amount of update per iteration of the improved Adam algorithm.

The activation function simulates the threshold activation characteristics of human brain neurons, introduces nonlinear features into the DNN, and realizes the transformation from a simple linear space to a highly nonlinear space. Considering the fast convergence and strong generalization ability of models trained with the ReLU function [27], ReLU is selected as the hidden layer activation function in this paper. Its function form is as follows:

$$f(x) = \max(0, x).$$

The activation function of the output layer is determined according to the problem to be solved. Coal bump prediction is a classification task, for which the softmax function is usually adopted. Its function form is as follows:

$$\mathrm{softmax}(h_k^L) = \frac{\exp(h_k^L)}{\sum_j \exp(h_j^L)},$$

where $h_k^L$ is the output of the $k$th neuron in the output layer and the sum runs over all output neurons. Forward calculation alone cannot learn the best parameters (weights and biases) from the learning samples. Therefore, Rumelhart et al. [28] proposed the backpropagation (BP) algorithm, which propagates the error between the predicted and actual values backwards from the output layer and updates the parameters of each layer in turn. When the BP algorithm is used to optimize parameters for classification tasks, the cross-entropy error is generally selected as the loss function. Its function form is as follows:

$$E = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{T} y_k^i \log \hat{y}_k^i,$$

where $y_k^i$ is the actual value, $\hat{y}_k^i$ is the predicted value, $N$ is the number of learning samples, and $T$ is the number of classes.
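The ReLU, softmax, and cross-entropy functions used in this section can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names, the example logits, and the choice to average the loss over samples are assumptions made for illustration.

```python
import numpy as np

def relu(x):
    # ReLU: passes positive values through, clamps negatives to zero.
    return np.maximum(0.0, x)

def softmax(h):
    # Subtract the row maximum first so that exp() cannot overflow.
    shifted = h - np.max(h, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

def cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true: one-hot labels, y_pred: predicted probabilities.
    # Averaging over samples is an assumption of this sketch.
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=-1))

# Hypothetical output-layer values for two samples and four classes.
logits = np.array([[2.0, 1.0, 0.1, -1.0],
                   [0.0, 0.0, 0.0, 0.0]])
probs = softmax(logits)
labels = np.array([[1, 0, 0, 0],
                   [0, 0, 1, 0]], dtype=float)
loss = cross_entropy(labels, probs)
```

Each row of `probs` sums to one, so the output can be read as class probabilities, and the loss shrinks as the probability assigned to the true class grows.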

Algorithm Improvement of Neural Network Model.
Overfitting refers to the state in which a model fits the training data well but fits data outside the training set poorly. Generally, the reasons for overfitting are as follows: (1) the model has many parameters and strong expressiveness and (2) there is little training data. Since the model here has few parameters, only reason (2) needs to be considered. Given the limited amount of coal bump data, this paper uses the dropout method to regularize the DNN model and prevent overfitting during training. The basic idea of the dropout method is to randomly discard a certain proportion of neurons in the input layer and hidden layers during DNN training. Dropout thereby reduces the extraction of features from irrelevant data.
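The dropout idea can be sketched in a few lines of NumPy. This is an illustrative "inverted dropout" variant, in which the kept activations are rescaled during training so that no change is needed at inference. The drop rate of 0.2 and the function name are hypothetical, since the paper does not state the rate used.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    # At inference time (or with rate 0) the layer is the identity.
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    # Randomly keep each neuron with probability keep_prob.
    mask = rng.random(activations.shape) < keep_prob
    # Rescale so the expected activation is unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
hidden = np.ones((1000, 32))            # hypothetical hidden-layer outputs
dropped = dropout(hidden, rate=0.2, rng=rng)
unchanged = dropout(hidden, rate=0.2, rng=rng, training=False)
```

Because a different random subset of neurons is silenced on every pass, the network cannot rely on any single neuron, which is what gives dropout its regularizing effect.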
The goal of DNN training is to reduce the error until a global optimum or an acceptable suboptimal solution is reached. Common optimization algorithms include stochastic gradient descent (SGD), momentum, AdaGrad, and adaptive moment estimation (Adam). SGD is the simplest and most commonly used optimization algorithm in DNN training. Compared with SGD, the Adam algorithm combines the advantages of the momentum and AdaGrad algorithms, automatically adjusts the learning rate, and searches the parameter space efficiently. It is suitable for coal bump prediction, where the data are noisy. Although the Adam algorithm theoretically solves the problem of adapting the learning rate, Wilson et al. [29] found that Adam can achieve good training performance and yet generalize poorly, with test error approaching one half in some cases. To solve this problem, we integrate the idea of momentum [30] into the Adam algorithm, which makes it more stable. The optimization and update steps of the improved Adam algorithm are shown in Algorithm 1.
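For orientation, a single step of the standard Adam update (moment estimates followed by bias correction) can be sketched as below. This is plain Adam, not the improved variant of this paper, whose final update rule is not reproduced here. The function name `adam_step` and the quadratic test objective are assumptions for illustration.

```python
import numpy as np

def adam_step(theta, grad, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, delta=1e-8):
    # Advance the time step, then update the biased moment estimates.
    t += 1
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Standard Adam parameter update.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + delta)
    return theta, m, v, t

# Hypothetical objective: f(theta) = sum(theta**2), gradient 2 * theta.
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
t = 0
for _ in range(5000):
    grad = 2.0 * theta
    theta, m, v, t = adam_step(theta, grad, m, v, t, lr=0.01)
```

On the very first step the bias-corrected ratio has magnitude close to one, so each parameter moves by roughly the learning rate regardless of the gradient scale.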

Prediction of Coal Bump Based on Improved Neural Network Model. In this paper, the dropout method and the improved Adam algorithm are applied to the coal bump prediction DNN model (the DA-DNN model). The model has three hidden layers, with 32, 64, and 16 neuron nodes, respectively. As shown in Table 1, coal bump intensity is often divided into four levels, namely, no coal bump (level I), slight coal bump (level II), intermediate coal bump (level III), and strong coal bump (level IV). Because the input and output of the DA-DNN model are numerical values, the coal bump levels are coded: the four levels "no coal bump", "slight coal bump", "intermediate coal bump", and "strong coal bump" are represented by the numbers "0", "1", "2", and "3", respectively. The four neurons in the output layer correspond to "0", "1", "2", and "3".

Table 1: Main parameters of the DA-DNN coal bump prediction model.
No. | Parameter | Value
2 | Number of hidden layer neurons | 32, 64, 16
3 | Number of neurons in output layer | 6
4 | Hidden layer activation function | ReLU
5 | Output layer activation function | Softmax
6 | Loss function | Cross-entropy error
7 | Suppress overfitting | Dropout method
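The label coding and layer sizes described above can be sketched as a NumPy forward pass. This is an illustration under stated assumptions, not the authors' Keras implementation: the four inputs correspond to the four evaluation indexes, the output size of four follows the four coded levels in the text, and the weight initialization, random input values, and helper names are hypothetical.

```python
import numpy as np

# Numerical coding of the four coal bump levels, as described in the text.
LEVELS = {"no coal bump": 0, "slight coal bump": 1,
          "intermediate coal bump": 2, "strong coal bump": 3}

def one_hot(codes, num_classes=4):
    # Turn integer level codes into one-hot target vectors.
    return np.eye(num_classes)[np.asarray(codes)]

def init_params(layer_sizes, rng):
    # He-style random weights and zero biases (an assumption of this sketch).
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)),
             np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x, params):
    h = x
    for w, b in params[:-1]:
        h = np.maximum(0.0, h @ w + b)          # ReLU hidden layers
    w, b = params[-1]
    logits = h @ w + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)      # softmax output

rng = np.random.default_rng(42)
params = init_params([4, 32, 64, 16, 4], rng)    # 4 inputs, 3 hidden, 4 outputs
x = rng.random((5, 4))                           # hypothetical scaled samples
probs = forward(x, params)
predicted_levels = probs.argmax(axis=1)          # codes 0..3
targets = one_hot([LEVELS["no coal bump"], LEVELS["strong coal bump"]])
```

The predicted level for each sample is simply the index of the largest softmax probability, which maps back to levels I to IV through the coding above.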
The activation function of the hidden layers is the ReLU function. Because coal bump prediction is a classification task, the output layer activation function is the softmax function, and the loss function is the cross-entropy error. To verify the superiority of the ReLU function, it is compared with three other common activation functions, taking the prediction accuracy on the test set as the verification target. Figure 6 shows that the prediction accuracy with ReLU as the activation function is more than 95%, while that of the others is less than 85%. The ReLU function therefore yields clearly higher prediction accuracy.
The improved Adam algorithm is adopted in this study. To verify its superiority, it is compared with the SGD and Adam algorithms, taking the prediction accuracy on the test set as the verification target. Figure 7 shows that when the number of training epochs is less than 700, the prediction accuracy of the SGD algorithm is less than 62%; only when the number of epochs exceeds 700 can its accuracy reach more than 70%. At 100 epochs, the accuracy of both the Adam and improved Adam algorithms has already reached more than 70%, and at 300 epochs the accuracy of the improved Adam algorithm has reached more than 95%, which is clearly better than that of the Adam algorithm. As can be seen from Figure 8, the training time of the improved Adam algorithm is significantly lower than that of the Adam algorithm, indicating that its loss converges faster. The main parameters of the DA-DNN coal bump prediction model are shown in Table 1. In this paper, the DA-DNN algorithm is programmed in Python, the development environment is Python 3.7, and the implementation is based on the Keras library.

Case Study of Coal Bumping Based on Improved Neural Network Model
A coal mine in Shanxi Province is taken as an example. The mining depth of the mine is 500 m, many impact dynamic phenomena have occurred during roadway excavation, and a rock burst accident has taken place, so the mine is classified as a strong rock burst mine. The prediction results obtained from the above indexes are consistent with the actual situation of the mine. Borehole pressure relief is a method that forms a fracture area in the surrounding rock by constructing large-diameter holes in the coal wall, in order to eliminate stress concentration and reduce impact risk. The crushing zone formed by drilling and pressure relief can attenuate the vibration waves caused by mine earthquakes, rapidly weaken the vibration wave energy transmitted to the roadway, and protect the roadway from vibration damage. The principle of pressure relief by large-diameter drilling is shown in Figure 9. During the mining of the 401101 working face and the excavation of the central main roadway, the pressure is relieved by drilling holes on both sides of the roadway. The drilling direction is inclined along the coal seam, the hole spacing is 0.7 m, the hole depth is 20 m, and the holes are 1.2~1.5 m from the roadway floor. Figure 10 shows that the total daily energy remained in the low energy stage for five consecutive days after the coal bump event on June 16, and a large energy release event occurred again on June 22 but did not cause a coal bump accident. Since then, a comprehensive prevention and control scheme combining composite support technology and large-diameter borehole pressure relief has been implemented in the roadway. For 4 consecutive days after the implementation of the scheme, the energy was lower than $7.5 \times 10^4$ J and the frequency was less than 3 times, indicating that this scheme can effectively reduce the energy stored in the coal and rock mass and its vibration frequency.
It can be seen from the above that after the comprehensive prevention and control scheme is adopted in the roadway excavation process, the occurrence frequency and strength of coal bump are significantly reduced, while there is no impact in the area with low stress level. This indicates that the comprehensive prevention and control scheme plays a good control role and has achieved certain results in the prevention and control of coal bump.
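The check on consecutive low-energy days can be expressed as a short Python sketch. The threshold of 7.5 × 10^4 J comes from the case study above, while the function name and the daily energy values are hypothetical and are not monitoring data from the mine.

```python
def longest_low_energy_run(daily_energy_j, threshold_j=7.5e4):
    # Length of the longest run of consecutive days whose total
    # microseismic energy stays below the threshold.
    longest = current = 0
    for energy in daily_energy_j:
        current = current + 1 if energy < threshold_j else 0
        longest = max(longest, current)
    return longest

# Hypothetical daily total energies in joules (illustrative only).
daily_energy = [9.1e4, 6.0e4, 5.2e4, 7.0e4, 4.8e4, 8.3e4]
run_length = longest_low_energy_run(daily_energy)
```

With these illustrative values, the four days between the two high-energy days form the longest run below the threshold.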

Conclusions
(a) When the number of training epochs is less than 700, the prediction accuracy of the SGD algorithm is less than 60%. At 100 epochs, the accuracy of the Adam and improved Adam algorithms has reached more than 70%, and at 300 epochs, the accuracy of the improved Adam algorithm has reached more than 95%
(b) The comprehensive prevention and control scheme combining composite support technology and large-diameter borehole pressure relief was implemented in the roadway. For 4 consecutive days after the implementation of the scheme, the energy was lower than $7.5 \times 10^4$ J and the frequency was less than 3 times, indicating that this scheme can effectively reduce the energy stored in the coal and rock mass and its vibration frequency
(c) After the comprehensive prevention and control scheme is adopted in the roadway excavation process, the occurrence frequency and strength of coal bump are significantly reduced, while there is no impact in the area with low stress level, indicating that the scheme plays a good control role and has achieved certain results in the prevention and control of coal bump
(d) Coal bump data are growing rapidly and are being produced in large quantities in mining engineering, and traditional data processing methods are gradually becoming unable to cope. There is an urgent need to develop artificial intelligence data processing methods and to use deep learning technology to learn from and mine coal bump data

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.