Application of BigML in the Classification Evaluation of Top Coal Caving



Introduction
According to the World Energy Statistics Review published in 2020, global coal consumption decreased by 0.6% in 2019, and the share of coal in primary energy fell to its lowest level in 16 years, yet it still stood at 27% [1]. Therefore, despite the continuous development of other energy sources, coal remains one of the most critical energy sources [2], especially for a country such as China, which is "rich in coal, poor in oil, and short of gas" and whose economy is developing rapidly. In 2015, China's coal production and consumption had reached 47% and 50.01% of the world totals, respectively [3]. By 2019, when global coal consumption was declining, China's coal consumption still accounted for 51.7% of the world total [1]. This shows that a large amount of coal is needed to support China's rapid economic development. Among China's proven recoverable coal reserves, thick coal seams account for 44% [4], and coal seams thicker than 5 m amount to as much as 10 billion tons [5], so nearly half of the coal consumed comes from the mining of thick coal seams [6]. Because longwall top coal caving mining has the advantages of high output, high efficiency, low energy consumption, low cost, and strong adaptability [7], it is estimated that its use in thick seams can double production efficiency and reduce production cost by 30% to 40% [8].
Therefore, since China introduced it from abroad in the 1980s, longwall top coal caving has, through continuous development and improvement, become the primary technology for mining thick coal seams in China, Australia, Turkey, Russia, and Vietnam [9][10][11][12][13][14][15][16]. Top coal caving (that is, the ease or difficulty with which the top coal caves under the action of in situ stress and gravity [17]) is one of the critical factors determining the success of longwall top coal caving mining, and it is also an essential reference for designing the mining technology and improving the recovery rate of top coal [18]. In addition, if the top coal caving can be understood from the feasibility study through the formal design stage, the relevant technical risks will be significantly reduced [19]. Therefore, the evaluation of top coal caving has always been a hot spot in research on longwall top coal caving mining at home and abroad.
At present, the evaluation of top coal caving at home and abroad is mainly based on observation and empirical methods [7,19], mathematical evaluation models [20], and numerical simulations [21][22][23]. However, these methods have shortcomings. Observation- and experience-based evaluation places strict requirements on the evaluator's level of experience and carries a massive risk of mistakes when that experience is lacking. Establishing mathematical or numerical models to evaluate top coal caving requires a very high level of professional knowledge from the model builder. At the same time, as the scale of numerical simulations grows, the demands on computing hardware grow with it, which makes the technical and time costs of evaluation very high and the evaluation efficiency low. In recent years, with the continuous development of science and technology, artificial intelligence and machine learning algorithms such as ant colony clustering, expert systems, and artificial neural networks have gradually been applied in coal production [24][25][26]. Therefore, experts and scholars have also begun to use artificial intelligence and machine learning algorithms to evaluate top coal caving and have achieved certain results. For example, Mohammadi et al. [27] used fuzzy multicriteria decision-making methods to establish a classification system for evaluating the cavability of the immediate roof of coal seams; Yongkui et al.
[28] combined Bayesian theory and rough set theory to establish a Bayesian classifier model for evaluating and predicting the roof caving properties of coal seams, which classifies accurately; Oraee and Rostami [29] used fuzzy logic to build a fuzzy system for quantitative analysis of roof caving in longwall top coal caving faces and applied it to the longwall top coal caving face of the Tabas·Palward Mine in Palward District, Yazd Province, where the model's predictions were consistent with the on-site measurements; Shi et al. [17] established a top coal caving prediction model based on support vector machines, and the test results showed that the model has a certain feasibility and generalization ability; Yu and Mao [30] used SPSS statistical software to establish a top coal caving prediction model based on an artificial neural network, whose training and test results show good predictive capability. However, most of the above prediction models are built by computer programming, which demands a high level of computer-language proficiency from users, especially when they want to optimize and modify a model to fit their actual situation. This leads to poor model portability and makes popularization and application difficult. Therefore, in order to obtain a model with better portability, this article applies BigML to establish a classification evaluation model of top coal caving. It is hoped that, in this way, the established model can be used to evaluate top coal caving without programming and can even be easily modified and optimized to better fit users' actual application situations.

Introduction to BigML
BigML (https://bigml.com) is a cloud-based machine learning platform dedicated to enabling everyone, whether or not they understand a computer language, to build machine learning prediction models without writing a line of code and to make those models automatic, remotely callable, programmable, and extensible. It can easily handle modeling tasks such as classification, regression, time series forecasting, cluster analysis, anomaly detection, and correlation analysis, and it supports model visualization. Because BigML combines powerful functions and advanced algorithms with ease of use, it offers a one-stop service from data loading, data cleaning, model building, and model evaluation to final model prediction. BigML has therefore been widely used in agriculture [31], medicine [32], finance [33], and other research fields and has helped thousands of analysts, software developers, and scientists worldwide solve machine learning tasks "end to end," seamlessly transforming data into operational models for remote services or embedding them locally into applications for prediction. In addition, BigML has more than 147,000 users worldwide, and more than 600 universities and research institutions cooperate with it. The global user distribution of BigML is shown in Figure 1.

Influencing Factors of Top Coal Caving
Research [9,30] shows that the buried depth (H), thickness of the coal seam (M), thickness of gangue (MG), uniaxial compressive strength of the coal (Rc), fracture development degree (DN, that is, the product of the number of through cracks N1m on a 1 m2 coal surface and the fractal dimension D1 of the number of cracks counted from the coal sample), and filling coefficient of the immediate roof (K, K = hkp/M) are essential geological factors affecting top coal caving [35]. Therefore, this article takes the above factors as the influencing factors for the classification evaluation of top coal caving.
However, engineering practice shows little difference between grades "4" and "5": both are difficult to cave. Therefore, in this article, top coal caving is divided into only four grades: "(1) excellent caving," "(2) good caving," "(3) fair caving," and "(4) poor caving." The specific situation of each grade is shown in Table 1.

Source of Sample Data.
There are a large number of studies on top coal caving in the CNKI database of publicly published papers. Therefore, to meet the needs of model training and testing, this article obtained 68 sets of sample data from the CNKI database. The specific conditions of the sample data are shown in Table 2, and the data distribution is shown in Figure 2.
From the shape and volume of the violins in the data distribution diagram (Figure 2), it can be seen that the "grade" of the obtained sample data is unbalanced. Therefore, to ensure that the established model has good stability and strong prediction ability, the sample imbalance is a problem worth attention, and it is necessary to select model performance evaluation indicators reasonably.

Data Cleaning and Segmentation.
Although all the sample data come from CNKI's database of publicly published papers, outliers are inevitable. The quality of the data sample generally has an essential impact on the establishment of the model and on its predictive ability, so cleaning the data sample and eliminating outliers gives the established model better predictive power. BigML's data anomaly detection function is based on the isolation forest algorithm [36], which has a powerful outlier-detection capability, so this article uses it to detect outliers in the data samples. Before running the detection, BigML is set to find at least three relatively anomalous sets of sample data. After testing, only the sample for the "No. 8-1 coal seam of Baode Mine" is an outlier (in BigML, a score of more than 60% is usually considered an actual anomaly [37]). The BigML anomaly detection result is shown in Figure 3. Therefore, "No. 8-1 coal seam of Baode Mine" is removed from the sample data set to obtain clean sample data. The sample data distribution after cleaning is shown in Figure 4.
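The outlier screening above is performed in BigML's interface, but the underlying idea can be sketched with scikit-learn's IsolationForest as a stand-in. The feature values below are randomly generated stand-ins for the paper's samples (not the actual CNKI data), with one extreme record injected to play the role of the anomalous coal seam.

```python
# Illustrative sketch of isolation-forest outlier screening, in the spirit
# of BigML's anomaly detector; all data here are hypothetical.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Columns: H, M, MG, Rc, DN, K (the six geological factors)
X = rng.normal(loc=[400, 8, 0.3, 15, 2.0, 0.5],
               scale=[80, 2, 0.1, 4, 0.5, 0.1],
               size=(68, 6))
X[10] = [1200, 25, 2.0, 60, 8.0, 2.5]   # inject one clear outlier

forest = IsolationForest(n_estimators=100, random_state=0).fit(X)
scores = -forest.score_samples(X)        # higher score = more anomalous
most_anomalous = int(np.argmax(scores))
print(most_anomalous)                    # index of the flagged sample
```

The flagged record would then be dropped before modeling, mirroring the removal of the Baode Mine sample.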
In order to train the model and test its performance, the data segmentation function of BigML is used to divide the sample data randomly in a 7 : 3 ratio: 70% forms the training set and 30% the test set. The sample data distribution of the training set and test set after segmentation is shown in Figure 5.
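A minimal sketch of this 7 : 3 random split, using scikit-learn's train_test_split on hypothetical stand-in data (67 samples remain after removing the outlier):

```python
# Hypothetical illustration of the 70/30 train-test split described above.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(67, 6))      # 67 cleaned samples, 6 geological factors
y = rng.integers(1, 5, size=67)   # caving grades 1-4 (stand-in labels)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
print(len(X_train), len(X_test))  # 46 training samples, 21 test samples
```

Note that 30% of 67 samples gives the 21-group test set mentioned later in the paper.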

Selection of Model Performance Evaluation Indicators
Top coal caving classification evaluation is a classification and prediction problem. In training and testing a classification prediction model, model performance evaluation indicators play a vital role in obtaining the optimal classifier, so choosing appropriate indicators is essential [38]. Most current performance indicators for classification prediction models are based on a confusion matrix similar to Table 3, and these indicators can be divided into three categories.
Table 1 (excerpt):
2 Good: The top coal can also cave well, and the coal can be discharged after selecting an appropriate caving support, but there are large blocks in the discharged coal, which easily cause a bayonet (jamming) phenomenon, so corresponding measures need to be taken.
3 Fair: The top coal can cave, but not well. The discharged coal is large and often jams, and corresponding measures must be taken to discharge it.
4 Poor: The top coal is very difficult to cave, and more measures are needed to release the coal.

Paired Indicators.
The paired indicators mainly include the evaluation indicators of binary prediction models, such as accuracy and error rate (Acc&Err), precision and recall (P&R), and true positive rate and true negative rate (TPR&TNR), as well as their extensions to multiclassification prediction models, such as macro-accuracy and macro-error rate and macro-precision and macro-recall.
Accuracy and error rate (Acc&Err) calculate the proportions of correctly and incorrectly classified samples among all samples:

Acc = (TP + TN)/N_sample, (1)
Err = (FP + FN)/N_sample, (2)

where TP is the number of samples predicted as true positives, TN the number of true negatives, FP the number of false positives, FN the number of false negatives, and N_sample the total number of samples. The range of both accuracy and error rate is [0, 1]. Generally, the closer the accuracy is to 1, the better the model's performance; conversely, the closer the error rate is to 0, the better.
Precision and recall calculate, respectively, the proportion of correctly predicted positive samples among all samples predicted positive and among all actual positive samples:

P = TP/(TP + FP), (3)
R = TP/(TP + FN). (4)

The range of both precision and recall is [0, 1]. Ideally, the closer both are to 1, the better the model's performance. In practice, however, FP and FN correspond to type I and type II errors, so precision and recall are in tension, and it is generally necessary to find a balance between them.
True positive rate and true negative rate (also called sensitivity and specificity, TPR&TNR) calculate, respectively, the proportion of samples correctly predicted as positive among all positive samples and the proportion correctly predicted as negative among all negative samples:

TPR = TP/(TP + FN), (5)
TNR = TN/(TN + FP). (6)

The range of both is [0, 1]. Ideally, the closer the true positive rate and the true negative rate are to 1, the better the model's performance.
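A short worked example of equations (1)-(6), computed from hypothetical confusion-matrix counts, makes the definitions concrete:

```python
# Hypothetical confusion-matrix counts for a binary classifier.
TP, FP, TN, FN = 40, 10, 35, 15
N = TP + FP + TN + FN

acc = (TP + TN) / N          # accuracy, equation (1)
err = (FP + FN) / N          # error rate, equation (2); acc + err == 1
P = TP / (TP + FP)           # precision, equation (3)
R = TP / (TP + FN)           # recall, equation (4)
TPR = TP / (TP + FN)         # sensitivity, equation (5) (same as recall)
TNR = TN / (TN + FP)         # specificity, equation (6)

print(acc, err, P)           # 0.75 0.25 0.8
```

With these counts the classifier is accurate overall (0.75) but the gap between precision (0.80) and recall (about 0.73) illustrates the tension noted above.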
Macro-accuracy, macro-error rate, macro-precision, and macro-recall (also called averaged accuracy, averaged error rate, averaged precision, and averaged recall) extend the binary indicators to multiclassification problems. They treat each category equally, sum the same indicator over the different categories, and take the average:

macro-Acc = (1/n) Σ_i (TP_i + TN_i)/N_sample, (7)
macro-Err = (1/n) Σ_i (FP_i + FN_i)/N_sample, (8)
macro-P = (1/n) Σ_i TP_i/(TP_i + FP_i), (9)
macro-R = (1/n) Σ_i TP_i/(TP_i + FN_i), (10)

where TP_i, TN_i, FP_i, and FN_i are the numbers of true positives, true negatives, false positives, and false negatives of class i, N_sample is the total number of samples, and n is the number of categories. Their value ranges and significance are the same as for the binary indicators.
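The macro-averaging in equations (9) and (10) can be sketched with scikit-learn's metrics on hypothetical four-grade labels; average="macro" performs exactly the per-class average described above.

```python
# Hypothetical true and predicted caving grades (1-4).
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 2, 2, 2, 3, 3, 4, 4, 4]
y_pred = [1, 2, 2, 2, 3, 3, 3, 4, 4, 1]

# Macro averages: compute the indicator per class, then take the mean.
macro_p = precision_score(y_true, y_pred, average="macro", zero_division=0)
macro_r = recall_score(y_true, y_pred, average="macro", zero_division=0)
print(round(macro_p, 3), round(macro_r, 3))
```

Here each of the four grades contributes equally to the average, regardless of how many samples it has, which is why macro indicators are preferred under class imbalance.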

Comprehensive Indicators.
The comprehensive indicators mainly include the F-measure (also known as F-score or F1) [40], the Matthews correlation coefficient (Phi coefficient) [41], Kendall's tau, and Spearman's rho for binary classification, as well as the macro-F-measure and macro-Matthews correlation coefficient (macro-Phi coefficient), which extend the binary evaluation indicators to multiclassification prediction models.
The F-measure is proposed to resolve the tension between the precision and recall performance measures. It is the balance point between them, that is, the harmonic mean of precision and recall, and thus accounts for both at the same time:

F1 = 2PR/(P + R) = 2TP/(2TP + FP + FN), (11)

where P is precision, R is recall, FP is the number of false positives, FN the number of false negatives, and TP the number of true positives. The value range of the F-measure is [0, 1]; ideally, the closer it is to 1, the better the model's performance, and vice versa.
The Matthews correlation coefficient (Phi coefficient), mainly used for binary classification problems, is a relatively balanced indicator. It considers TP, TN, FP, and FN comprehensively and can be used even when the sample data categories are unbalanced. The value range of the Phi coefficient is [-1, 1]: a value of 1 indicates that the prediction is entirely consistent with the actual result, 0 indicates that the prediction is no better than a random prediction, and -1 indicates that the prediction is utterly inconsistent with the actual result [42]:

Phi = (TP·TN - FP·FN)/sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)), (12)

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives. The macro-F-measure and macro-Matthews correlation coefficient (macro-Phi coefficient) are also called the averaged F-measure and averaged Matthews correlation coefficient, respectively.
They likewise extend the binary indicators to multiclassification problems: each category is treated equally, the same indicator is summed over the different categories, and the average is taken:

macro-F1 = (1/n) Σ_i F1_i, (13)
macro-Phi = (1/n) Σ_i Phi_i, (14)

where F1_i and Phi_i are the F-measure and Phi coefficient of class i and n is the number of categories. Their value ranges and significance are the same as for the binary indicators.
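Equations (11) and (12) can be checked numerically with the same hypothetical confusion-matrix counts used earlier:

```python
# Worked example of F-measure and Matthews correlation coefficient
# from hypothetical binary confusion-matrix counts.
import math

TP, FP, TN, FN = 40, 10, 35, 15

P = TP / (TP + FP)
R = TP / (TP + FN)
F1 = 2 * P * R / (P + R)               # harmonic mean, equation (11)

# Matthews correlation coefficient, equation (12)
mcc = (TP * TN - FP * FN) / math.sqrt(
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))

print(round(F1, 3), round(mcc, 3))
```

Note how the single F1 value summarizes the precision-recall trade-off, while the MCC stays meaningful even when the two classes are unbalanced.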

Visual Indicators.
Visual indicators mainly include the ROC curve [43] and AUC [44], the precision-recall curve (PR curve) [45], the gain curve [46], the K-S curve and K-S statistic [47], and the lift curve [48] and lift value. The ROC curve, or receiver operating characteristic curve, comprehensively reflects TPR and FPR as the decision threshold varies [49]. It is a curve composed of points (FPR, TPR), with FPR on the abscissa and TPR on the ordinate. The ROC curve is mainly used to compare the relative performance of different classification models; however, when the ROC curves of different models intersect, it is not easy to evaluate their relative performance reasonably.
AUC, the area under the ROC curve, is often used in conjunction with the ROC curve. The value range of AUC is [0, 1]. Empirically, when the AUC value is less than 0.5, the model predicts worse than random guessing (although reversing its predictions would do better than random); when the AUC value equals 0.5, the model has no predictive value and is equivalent to random guessing; when the AUC value is more than 0.7, the model's predictive ability can be considered acceptable; and when the AUC value equals 1, the model's predictive ability is perfect, and a perfect prediction is obtained no matter what threshold is set (which rarely occurs in practice). The specific AUC value ranges and their empirical interpretation are shown in Table 4.
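AUC has a convenient probabilistic reading: it is the fraction of (positive, negative) pairs that the model ranks correctly. A sketch with hypothetical labels and scores, using scikit-learn:

```python
# Hypothetical binary labels and predicted scores.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.3, 0.35, 0.8, 0.4, 0.6, 0.7, 0.9]

# AUC = fraction of positive/negative pairs ranked in the right order.
auc = roc_auc_score(y_true, scores)
print(auc)
```

Of the 16 positive-negative pairs here, 13 are ranked correctly, so the AUC is 13/16 = 0.8125, an "acceptable" model by the rule of thumb above.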
The PR curve reflects the relationship between precision and recall. It is also used to evaluate the relative performance of different classification models and can be used together with the AUC value. The PR curve is an essential supplement to the ROC curve: especially with unbalanced sample categories, the PR curve reflects a classifier's quality more effectively than the ROC curve. The gain curve (or cumulative gain curve) describes global accuracy: it represents the relationship between the percentage of positive cases predicted correctly and the effort required to achieve it, measured as the percentage of cases predicted. The Y-axis of the gain curve equals recall (sensitivity, TPR), and the X-axis is the percentage of positive instances:

TPR = TP/(TP + FN), (15)
FPR = FP/(FP + TN), (16)

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives. The K-S curve (Kolmogorov-Smirnov curve), also called the Lorenz curve, describes the quality of a classification model. It plots TPR and FPR as two curves against the threshold on the horizontal axis and thus reflects the difference between TPR and FPR at the same threshold. In general, the farther apart the two lines are, the better the model distinguishes positive from negative samples. The K-S statistic measures the maximum difference between TPR and FPR over all possible thresholds, that is, the upper limit of the model's discrimination between positive and negative samples:

KS = max(TPR - FPR). (17)

The value range of the K-S statistic is [0, 1].
Ideally, the closer the K-S statistic is to 1, the stronger the model's ability to distinguish positive from negative samples. Empirically, when the K-S statistic is less than 0.2, the model cannot distinguish between positive and negative samples, and when it is greater than 0.4, the model can [51]. The specific ranges of the K-S statistic and their empirical interpretation are shown in Table 5, where FPR = FP/(FP + TN) is the negative cumulative response and TPR is the sensitivity.
The lift curve measures how much more accurate the model's predictions at a given threshold are than random predictions made without the model; in short, it shows how much the prediction effect is improved by using the model rather than not using it. The larger the lift, the better the model's prediction effect:

Lift = [TP/(TP + FP)] / [(TP + FN)/(TP + FP + TN + FN)], (18)

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives.
All of the above performance parameters and indicators are provided in BigML (Table 3 gives the confusion matrix of two-classification problems [39]). From the brief introduction above, it is not difficult to see that these indicators evaluate the performance of a classification prediction model from different angles, realizing a multidirectional, multiangle analysis and measurement of model performance [52].
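The K-S statistic and lift can both be computed directly from predicted scores. The sketch below uses hypothetical labels and scores: K-S scans all thresholds for the maximum TPR-FPR gap, and lift compares precision at one threshold with the overall positive rate.

```python
# Hypothetical scores: K-S statistic and lift, per equations (17)-(18).
import numpy as np

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.1, 0.3, 0.35, 0.8, 0.4, 0.6, 0.7, 0.9])

# K-S: maximum gap between TPR and FPR over all thresholds.
ks = 0.0
for t in np.sort(scores):
    tpr = np.mean(scores[y_true == 1] >= t)   # sensitivity at threshold t
    fpr = np.mean(scores[y_true == 0] >= t)   # false positive rate at t
    ks = max(ks, tpr - fpr)

# Lift at threshold 0.5: precision among predicted positives
# divided by the base rate of positives.
pred_pos = scores >= 0.5
lift = np.mean(y_true[pred_pos] == 1) / np.mean(y_true)

print(round(ks, 3), round(lift, 3))
```

For these scores the K-S statistic is 0.75 (a strong separator by the rule of thumb above) and the lift is 1.5, that is, predictions above the threshold are 1.5 times more likely to be positive than a random draw.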
In addition, it can be seen that no single indicator can evaluate the performance of a classification prediction model in an all-around way; rather, multiple indicators must be used collaboratively. As Figure 4 shows, the sample data of top coal caving have a severe category (grade) imbalance, and the classification evaluation of top coal caving is a multiclass prediction problem, so several of the above indicators must be selected together to ensure that the prediction model is robust.
Accuracy and error rate are the indicators researchers most commonly use to evaluate classification prediction models, because they compute the ratios of correctly and incorrectly classified predictions to the total number of predictions and can objectively reflect the global quality of a model. However, accuracy and error rate are poor indicators when the sample categories are unbalanced: prediction then favors the categories that make up the majority of the sample and ignores the minority categories, so the minority categories end up with weak or no classification ability. In addition, because the top coal caving classification evaluation is a multiclassification prediction problem, this article does not take the binary accuracy/error rate or precision and recall as model performance indicators and takes the macro-accuracy/macro-error rate and macro-precision/macro-recall only as auxiliary indicators.
Moreover, the macro-Matthews correlation coefficient, ROC curve, PR curve, K-S curve, gain curve, lift curve, K-S statistic, AUC value, and lift value are taken as the leading indicators to evaluate the model's performance.

Predictive Model Establishment and Its Performance Evaluation
It is often not easy to obtain a robust and stable classification prediction model, and it is rarely achieved in one step; continued exploration and optimization are usually needed. In BigML, the methods for establishing classification prediction models include models (decision trees), ensembles (bagging, random decision forests, and boosted trees), deepnets, logistic regression, and others. However, it is hard to know in advance which method will work best, so it is necessary to use the modeling methods provided by BigML to establish one or more exploratory models and to evaluate and optimize them continuously to obtain a robust and stable classification evaluation model of top coal caving. BigML is a very friendly machine learning platform that can build models (decision trees), ensembles, deepnets, and logistic regression models with one click. It also anticipates that noncomputer professionals may struggle with model parameter adjustment and optimization, so an "automatic optimization" function is provided: the user simply specifies the training samples and the training objective, and BigML automatically tunes the model parameters toward the theoretical optimum. To evaluate established models, BigML provides modules such as single model evaluation, multimodel evaluation, and cross-validation evaluation; the user only needs to specify the model to be evaluated and the test set used to evaluate it, and the performance evaluation is completed easily.

Establishment of the Classification Evaluation Model of Top Coal Caving.
The decision tree is the most commonly used method for establishing classification prediction models, so this article first uses the decision tree (model) in BigML to establish an exploratory classification evaluation model of top coal caving. The "automatic optimization" function of BigML is used to build the model, with the prepared training set as the training samples and the "Grade" field in the samples as the training target. The resulting classification evaluation model of top coal caving based on the training set is shown in Figures 6-9. To evaluate the performance of this model, the "single model evaluation" module of BigML is used with the test set as the test sample; because the test set is small, with only 21 groups, the sampling method is set to sampling with replacement. The evaluation results are shown in Figures 10-13 and Tables 6 and 7.
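For readers who prefer code, the decision-tree step can be sketched with scikit-learn as a stand-in for BigML's one-click model (this is an illustration, not BigML's actual model; the training data below are randomly generated stand-ins for the CNKI samples).

```python
# Illustrative decision-tree classifier on the six geological factors.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X_train = rng.normal(size=(46, 6))     # H, M, MG, Rc, DN, K (stand-ins)
y_train = rng.integers(1, 5, size=46)  # caving grades 1-4 (stand-ins)

# max_depth is a hypothetical choice; BigML tunes such parameters
# automatically via its "automatic optimization" function.
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)
print(tree.predict(X_train[:1]))       # predicted grade for one sample
```

In BigML the same workflow (choose training set, choose "Grade" as the objective, train, predict) is performed entirely through the interface without code.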
According to Tables 6 and 7 and Figures 10-13, the established classification evaluation model of top coal caving is barely acceptable at a probability threshold of 50% (the default, and commonly used, probability threshold of classification prediction models). From the lift curve (Figure 13), the lift value of each grade is greater than 100%; that is, the model predicts the top coal caving of each grade better than a random model. However, from the ROC AUC, PR AUC, and K-S values in the ROC curve (Figure 10), PR curve (Figure 11), and K-S curve (Figure 12), the model has a certain prediction ability for grade 2 and grade 4 but low prediction ability for grade 1 and grade 3: the ROC AUC and PR AUC of grades 2 and 4 are greater than 0.7 with K-S values of 100%, while the ROC AUC and PR AUC of grades 1 and 3 are less than 0.7 with K-S values of about 60%. Optimization, or a better modeling method, can therefore be considered in order to achieve good prediction of the top coal caving of every grade.
The deep network is another method for establishing a classification prediction model, so the deepnet in BigML is tried next to establish a better classification evaluation model of top coal caving. The "automatic optimization" function is again used for modeling, with the automatic method set to "Network search," the training set as the training samples, and the "Grade" field as the training target. After training, the model is tested and evaluated with the test set, again using sampling with replacement. The test results are shown in Figures 14-17 and Tables 8 and 9.
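The deep-network alternative can likewise be sketched with scikit-learn's MLPClassifier as a rough stand-in for BigML's deepnet (BigML's "Network search" explores architectures automatically; the layer sizes below are hypothetical, and the data are random stand-ins).

```python
# Illustrative multilayer perceptron as a stand-in for BigML's deepnet.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
X_train = rng.normal(size=(46, 6))     # six geological factors (stand-ins)
y_train = rng.integers(1, 5, size=46)  # caving grades 1-4 (stand-ins)

net = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000,
                    random_state=0)
net.fit(X_train, y_train)
proba = net.predict_proba(X_train[:1])  # per-grade class probabilities
print(proba.shape)
```

The per-grade probabilities returned by predict_proba are what threshold-based indicators such as the ROC and K-S curves are computed from.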
According to Tables 8 and 9, at a probability threshold of 50%, the global performance of the classification evaluation model of top coal caving established by the deep network is not much better than that established by the decision tree. However, according to Figures 14-17, in the local evaluation parameters, the model established by the deep network is better. From the lift curve (Figure 17), the lift value of each grade is greater than 100%, meaning that the deep network model predicts the top coal caving of each grade better than a random model. From the ROC AUC, PR AUC, and K-S values of each grade in the ROC curve (Figure 14), PR curve (Figure 15), and K-S curve (Figure 16), every value for the deep network model is greater than or equal to the corresponding value for the decision tree model. In addition, these values show that the deep network model has good prediction ability for the top coal caving of every grade: the ROC AUC of each grade is greater than 0.7, the PR AUC is basically greater than or equal to 0.7, and the K-S value is greater than 80%. The above analysis shows that, at the 50% probability threshold, although the deep network model outperforms the decision tree model locally, there is still room for further improvement.

Optimization of Top Coal Caving Classification Evaluation Model
In the modeling process, it is not difficult to find that, in general, if several different models are combined and their prediction results are averaged, better prediction results can be obtained. At the same time, if the combined average model can balance the shortcomings of each single participating model, then the final model is generally robust and stable. Based on this idea, a fusion modeling method has been developed in BigML. The fusion modeling method combines different models and averages their predictions so as to balance the weaknesses of each model and produce better performance. The principle is similar to ensemble modeling, except that the fusion modeling method can combine and average not only single decision trees but also models such as logistic regression and deep networks.
In order to optimize the model and obtain a more robust and stable classification evaluation model of top coal caving, the models established by the decision tree and the deep network are fused. According to the model performance evaluation parameters, although the global performance of the two models is similar, the local performance of the model established by the deep network is better than that of the model established by the decision tree. Therefore, the prediction results of the decision tree model and the deep network model are weighted 1:3 in the fusion. After the models are fused, the fused model's performance is tested and evaluated with the test set sample data, and the test sampling mode is set to sampling with replacement. The test results are shown in Figures 18-21 and Tables 10 and 11.
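The 1:3 weighted fusion can be sketched as a weighted average of the two models' class-probability vectors. This is an illustration of the idea only, not BigML's implementation, and the probability vectors below are hypothetical:

```python
def fuse(prob_tree, prob_deepnet, w_tree=1.0, w_deep=3.0):
    """Weighted average of two models' class-probability vectors,
    using the 1:3 decision tree : deep network weighting."""
    total = w_tree + w_deep
    return [(w_tree * a + w_deep * b) / total
            for a, b in zip(prob_tree, prob_deepnet)]

# Hypothetical probability vectors over grades 1-4 for one sample
tree = [0.10, 0.60, 0.20, 0.10]
deep = [0.05, 0.80, 0.10, 0.05]
fused = fuse(tree, deep)          # still sums to 1.0
grade = 1 + fused.index(max(fused))
print(grade)  # -> 2
```

Because the deep network carries three times the weight, the fused prediction follows the deep network unless the decision tree disagrees strongly, which matches the intent of balancing the weaker model against the stronger one.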
According to Tables 10 and 11 and Figures 18-21, it can be seen that the fused model has good prediction ability for the top coal caving of each grade. The ROC AUC of each grade is greater than 0.9, the PR AUC is basically greater than or equal to 0.9, and the K-S value is 100%. The above analysis shows that, under the probability threshold of 50%, the classification evaluation model of top coal caving established by fusion is robust and accurate and has fully met the prediction needs from both the global and local points of view.

Practical Application of Prediction Model in Engineering
Because the classification evaluation model of top coal caving has been optimized by the fusion method, it is robust and accurate from both the global and local points of view and fully meets the prediction needs. The model is therefore applied to the Gucheng Coal Mine of … (Group) Co., Ltd., located west of Changzhi City, Shanxi Province, China. Its geographical location is shown in Figure 22.

Conclusion
At present, most of the evaluation and prediction models of top coal caving established by experts and scholars are built through computer programming, which makes the models difficult to use or modify for people who do not understand programming languages and hinders their wide application in practice. This article introduces a method to establish an evaluation and prediction model of top coal caving without programming. With this method, users can predict and evaluate top coal caving and modify the model according to their own needs, all without programming.
This method establishes the prediction model of top coal caving using the cloud-based machine learning platform BigML. This paper establishes the prediction model of top coal caving evaluation with BigML and applies it to evaluating the top coal caving of the No. 3 coal seam in Gucheng Coal Mine. The evaluation result is grade 1, which is consistent with the engineering practice. This fully proves that the application of BigML in evaluating top coal caving is successful and feasible and provides a more convenient method for the classification evaluation and prediction of top coal caving. In addition, it offers a programming-free way to realize the classification evaluation of top coal caving and to establish other evaluation and prediction models using machine learning.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.