Real-Time Prediction Model of Coal and Gas Outburst

Coal and gas outburst is one of the main threats to coal mine safety, and accurate prediction is key to avoiding accidents. Existing prediction models assume by default that the data are accurate and complete. In practice, however, missing and abnormal data values occur often and degrade prediction performance. This paper therefore proposes, for the first time, to fill missing data in real time using the correlation coefficient. Abnormal data are identified with the Pauta criterion, and a random forest model performs the prediction. The model achieved a sensitivity of 100%, an accuracy of 97.5%, and a specificity of 84.6%. Experiments show that the model can predict coal and gas outburst in real time in the presence of missing and abnormal data values, and it can serve as a new prediction model for coal and gas outburst.


Introduction
China is one of the countries most seriously affected by coal and gas outburst disasters, which pose a great threat to safe coal mine production [1]. Accidents caused by coal and gas outburst account for 38% of safety accidents in coal mines [2], making outburst the most dangerous and frequent type of coal mine accident. Predicting an outburst before it occurs is therefore of great significance for ensuring safe production and protecting the lives of miners. The common prediction methods for coal and gas outburst are the index prediction method and the mathematical model prediction method [3]. The index prediction method measures the values of various indexes and compares them with standard values to determine whether there is a risk of coal and gas outburst. Common indexes include gas content, gas pressure, coal strength coefficient, cuttings index, and comprehensive index [4][5][6]. These indexes are subject to many influencing factors and large measurement errors, which often lead to low prediction accuracy. Therefore, in recent years, researchers have paid more attention to mathematical model prediction methods. These methods select several characteristics that affect coal and gas outburst and use a mathematical model to predict the outburst risk [7][8][9]. The existing prediction methods, however, rely on ideal data, and they remain far from practical application in three respects. First, the data are assumed to be complete. In practice, data are often lost during transmission and data fusion, so that part or even all of a record is missing. Second, the data are assumed to equal the real values. In practice, limitations of the experimental conditions and process can cause the data to differ from the real values. Third, the prediction model is built on previous data. When new abnormal data arrive, the model lacks the capacity to process them and cannot complete the prediction in real time.

For this reason, this paper proposes a coal and gas outburst prediction model. A data processing method is given according to the characteristics of coal mine data, so that coal and gas outburst can be predicted accurately in practical application. Predicting coal and gas outburst helps protect the safety and property of miners.

Real-Time Prediction Model
There are many factors that affect coal and gas outburst. Commonly used factors include gas content, gas pressure, coal strength coefficient, drilling cuttings index, comprehensive index, initial velocity of gas emission, porosity, and coal seam thickness [10]. In this paper, five index parameters are selected: gas content, gas pressure, coal strength coefficient, initial velocity of gas emission, and porosity [11]. The mathematical prediction model used to predict coal and gas outburst is shown in Figure 1. Three kinds of data collected underground (gas content, gas pressure, and initial velocity of gas emission) are transmitted to the surface through the transmission substation. These three kinds of data, together with the coal strength coefficient and the porosity of the coal body, form the joint dataset. Data processing handles abnormal data and missing data, and the random forest model then completes the prediction of coal and gas outburst. When an outburst is predicted, the manager is informed so that relevant measures can be taken. At the same time, the audible and visual alarm is started to tell the miners to activate the relevant risk avoidance equipment, and the information is transmitted underground through the transmission substation. In parallel, the data are stored and updated, and a new prediction model is trained regularly on the updated dataset.

Data Description.
The data are divided into accident data and safe data. Safe data are easy to obtain and large in volume [12]. Accident data, by contrast, are difficult to obtain, and the records that do exist are often incomplete, so the amount of accident data is small. This paper therefore treats safe data and accident data differently. When a safe record has missing values, the record is deleted directly. When an accident record has missing values, the missing values are filled according to the existing data. Five indexes that are closely related to coal and gas outburst are selected as characteristics: gas pressure, gas content, initial velocity of gas emission, porosity, and coal strength coefficient. Part of the characteristic data is shown in Table 1.
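The asymmetric treatment of safe and accident records can be sketched as follows. This is a minimal illustration in Python (the paper's own implementation uses R); the field names, the `"label"` key, and the use of `None` to mark a missing value are assumptions made for the example, not the paper's data format.

```python
# Hypothetical record layout: five feature fields plus a "label" field.
# None marks a missing value (an assumption for this sketch).
FEATURES = (
    "gas_pressure",
    "gas_content",
    "initial_velocity",
    "porosity",
    "strength_coefficient",
)


def split_by_missing_policy(records):
    """Apply the per-class policy for records with missing values.

    Returns (complete, to_fill):
      complete - records with all five features present
      to_fill  - incomplete accident records, kept so they can be filled later
    Incomplete safe records are dropped, since safe data are plentiful.
    """
    complete, to_fill = [], []
    for rec in records:
        has_missing = any(rec[name] is None for name in FEATURES)
        if not has_missing:
            complete.append(rec)
        elif rec["label"] == "accident":
            to_fill.append(rec)
        # else: incomplete safe record, deleted directly
    return complete, to_fill
```

Keeping the incomplete accident records in a separate list makes it explicit that they still need the filling step before they can be used for training or prediction.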

Data Processing.
Data processing in this paper follows the characteristics of coal mine data. The processing block diagram is shown in Figure 2. First, the dataset is checked for missing data. If data are missing, the corresponding processing method is used to fill them. Second, the dataset is checked for abnormal values. If abnormal values are found, the abnormal value processing method is applied.

Missing Data Processing Method.
There are many methods of data filling, but they operate on the whole existing dataset [13]. As time passes, the amount of data becomes very large, so data filling takes a long time and slows down prediction [7]. Therefore, this paper proposes a numerical data filling method based on the correlation of variables, which can be considered real time. For the first time, the correlation between the existing data variables is taken into account, and the Pearson correlation coefficient is used to fill newly missing data and ensure data integrity. The Pearson correlation coefficient is a numerical measure of the correlation between two variables and can be described as follows:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y},$$

where E(·) is the mathematical expectation, σ_X and σ_Y represent the standard deviations of variables X and Y, and μ_X and μ_Y represent their mean values. In this paper, the correlation matrix of the five characteristics is calculated, and the results are shown in Table 2.
When the value of a certain characteristic is missing, the correlation matrix is consulted to find the characteristic that has the largest correlation coefficient with the missing one. The missing value is then filled using this correlation. The values calculated in this way when each of the five characteristics is missing in turn are listed in Table 3.
Table 3 shows that, when the correlation coefficient matrix method is used to fill the missing data, the similarity between the filled values and the actual values ranges from 83.3% to 98.6%. Overall, the filled data are very close to the actual data, which supports high prediction performance.
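One way to read the correlation-based filling step is sketched below in Python (the paper's implementation uses R). The text does not spell out how the fill value is computed from the correlation coefficient, so this sketch assumes a simple least-squares line fitted on the most strongly correlated characteristic; the function names and the column layout are hypothetical. It also assumes that no column is constant and that the predictor value is present in the record being filled.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_fill_models(columns):
    """Precompute, for each feature, its best predictor and a fitted line.

    columns maps feature name -> list of historical values.
    Returns feature name -> (predictor name, slope, intercept).
    """
    models = {}
    for target, ys in columns.items():
        # The feature with the largest absolute correlation to the target.
        best = max(
            (name for name in columns if name != target),
            key=lambda name: abs(pearson(columns[name], ys)),
        )
        xs = columns[best]
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
            (x - mx) ** 2 for x in xs
        )
        models[target] = (best, slope, my - slope * mx)
    return models


def fill_missing(record, models):
    """Fill None entries of one record using the precomputed models."""
    filled = dict(record)
    for name, value in record.items():
        if value is None:
            predictor, slope, intercept = models[name]
            filled[name] = slope * record[predictor] + intercept
    return filled
```

Because the models are fitted once on historical data, filling a newly arrived record costs only one multiplication and one addition per missing value, regardless of how large the stored dataset has grown. This is one plausible reading of why the method can be considered real time.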

Abnormal Data Value Processing Method.
Because of limits on equipment precision and experimental conditions, noise is inevitably mixed into the data and leads to abnormal data values. In this paper, after the missing values have been processed, abnormal values are found according to the Pauta criterion, which is described as follows:

$$|x_i - \bar{x}| > 3 S_x.$$

When a value satisfies this inequality, $x_i$ is considered abnormal, where $x_i$ is the data point, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the data mean value, and $S_x = \left[\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2\right]^{1/2}$ is the standard deviation of the data. All five characteristics of the dataset are checked for abnormal values. When an abnormal value is detected, the missing value filling method described above is used to correct it. In this paper, 530 pieces of accident data and safe data are selected, and the above methods are used to complete the abnormal data identification. The abnormal identification results are shown in Table 4.
Table 4 shows that the dataset contains some abnormal data. The main reason is that the limited accuracy of the data acquisition equipment and the limitations of the experimental methods cause some data values to deviate from the actual values. These abnormal values must therefore be identified and deleted, and the data filling method described above should then be used to fill the resulting gaps, so as to ensure high prediction performance.
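The Pauta criterion (the three-sigma rule) flags a value when it deviates from the mean by more than three standard deviations. A minimal Python sketch is given below (the paper's implementation uses R). The function name and the parameter `k` are illustrative assumptions, and the standard deviation uses the 1/n form.

```python
import math


def pauta_outliers(values, k=3.0):
    """Return the indices of values with |x_i - mean| > k * std.

    std is the population standard deviation (divides by n).
    k = 3 gives the usual Pauta (three-sigma) criterion.
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [i for i, v in enumerate(values) if abs(v - mean) > k * std]


# Nineteen ordinary readings and one spike: only the spike is flagged.
readings = [10.0] * 19 + [100.0]
print(pauta_outliers(readings))  # [19]
```

With the 1/n standard deviation, a single point can deviate from the mean by at most sqrt(n - 1) standard deviations, so the criterion cannot flag anything when a characteristic has ten or fewer samples. It is best applied to reasonably large batches such as the 530 records used here.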

Realization of Coal and Gas Outburst Prediction.
In this paper, accuracy, sensitivity, and specificity are used as performance evaluation indexes of the model, which are defined as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}, \qquad \text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP},$$

where TP is the number of correctly predicted safe data, FP is the number of safe data wrongly classified as accident data, FN is the number of accident data wrongly classified as safe data, and TN is the number of correctly predicted accident data.
In order to achieve high-performance detection and avoid overfitting, this paper uses the random forest model to predict coal and gas outburst. Random forest (RF) is an efficient ensemble classification method composed of a large number of decision trees. Building on the characteristics of decision trees, many improvements have been made so that the constructed trees are as uncorrelated as possible, which significantly improves the classification performance of the system. In this paper, the grid search method is used to optimize the model parameters: the optimal number of decision trees is 500, and the number of random variables (characteristics considered at each split) is 3. Ten-fold cross-validation is used, and the model is implemented in the R language. The prediction results are shown in Table 5. From Table 5, we calculated that the model achieves a sensitivity of 100%, an accuracy of 97.5%, and a specificity of 84.6%. From the experimental results, we can conclude that the model detects coal and gas outburst with high performance.
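The three evaluation indexes follow directly from the four confusion counts. The Python sketch below uses the standard definitions; the function name is an assumption. The counts in the usage example are hypothetical: they were chosen only because they are consistent with the reported percentages, and they are not taken from Table 5.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity, and specificity from confusion counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts, NOT the paper's Table 5: 80 test samples with 2 errors.
metrics = classification_metrics(tp=67, fp=2, fn=0, tn=11)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
# accuracy: 97.5%
# sensitivity: 100.0%
# specificity: 84.6%
```

A sensitivity of 100% corresponds to FN = 0, while the lower specificity shows that all of the remaining errors fall in the FP count.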

Conclusion
In this paper, a prediction model of coal and gas outburst is realized. The correlation between the five important characteristics affecting coal and gas outburst is analyzed. For the first time, missing data are filled according to the correlation between characteristics, which keeps the time cost very low. Abnormal values are identified and processed so that the analyzed data are closer to the actual data. The processing methods for missing data and abnormal values are shown to be effective. The random forest model is used to predict coal and gas outburst with high performance. The experiment shows that the model can complete real-time prediction of coal and gas outburst with a sensitivity of 100%, an accuracy of 97.5%, and a specificity of 84.6%. It can be used for safe coal mine production in practical application.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.