Predicting Hard Rock Pillar Stability Using GBDT, XGBoost, and LightGBM Algorithms

Abstract: Predicting pillar stability is a vital task in hard rock mines, as pillar instability can cause large-scale collapse hazards. However, it is challenging because pillar stability is affected by many factors. With the accumulation of pillar stability cases, machine learning (ML) has shown great potential for predicting pillar stability. This study aims to predict hard rock pillar stability using the gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM) algorithms. First, 236 cases with five indicators were collected from seven hard rock mines. Afterwards, the hyperparameters of each model were tuned using a five-fold cross-validation (CV) approach. Based on the optimal hyperparameter configuration, prediction models were constructed using the training set (70% of the data). Finally, the test set (30% of the data) was adopted to evaluate the performance of each model. The precision, recall, and F1 indexes were utilized to analyze the prediction results for each level, and the accuracy and their macro average values were used to assess the overall prediction performance. Based on a sensitivity analysis of the indicators, the relative importance of each indicator was obtained. In addition, the safety factor approach and other ML algorithms were adopted as comparisons. The results showed that the GBDT, XGBoost, and LightGBM algorithms achieved a better comprehensive performance, with prediction accuracies of 0.8310, 0.8310, and 0.8169, respectively. The average pillar stress and the ratio of pillar width to pillar height had the most important influences on the prediction results. The proposed methodology can provide a reliable reference for pillar design and stability risk management.


Introduction
Pillars are essential structural units for ensuring mining safety in underground hard rock mines. Their functions are to provide safe access to working areas and to support the weight of the overburden rocks, guaranteeing global stability [1,2]. Instable pillars can cause large-scale catastrophic collapses and significantly increase the safety hazards for workers [3]. If a pillar fails, the adjacent pillars have to bear a larger load. The increased load may exceed the strength of the adjacent pillars and lead to their failure. Like a domino effect, a rapid and large-scale collapse can be induced [4]. In addition, as mining depth increases, the ground stress becomes larger and pillar instability accidents become more frequent [5][6][7]. Accordingly, determining pillar stability is essential for achieving safe and efficient mining.
In general, there are three types of methods to determine pillar stability in hard rock mines. The first one is the safety factor approach. The safety factor indicates the ratio of pillar strength to pillar stress.

From Table 1, it can be seen that each sample contains five indicators and the level of pillar stability. These five indicators have been widely used in a large amount of literature [9,10,24,33]. The advantages are two-fold: (1) they can reflect the necessary conditions of pillar stability, such as pillar size, strength, and load; (2) their values are relatively easy to obtain. The descriptive statistics of each indicator are listed in Table 2. In addition, according to the instability mechanism and failure process of pillars, the level of pillar stability is divided into three types: stable, instable, and failed. Among them, the stable level indicates that the pillar shows no sign of stress-induced fracturing, or only has minor spalling which does not affect pillar stability; the instable level indicates that the pillar is partially failed and has prominent spalling, but still possesses a certain supporting capacity; the failed level indicates that the pillar is crushed and has pronounced openings of joints, which shows a great collapse risk. The distribution of these three pillar stability levels is described in Figure 1.

The box plot of each indicator for the different pillar stability levels is shown in Figure 2. According to Figure 2, some meaningful characteristics can be found. First, all indicators have several outliers. Second, the pillar stability level is negatively correlated with A1 and A3 (the Pearson correlation coefficients are −0.381 and −0.266, respectively), while it is positively correlated with A5 (the Pearson correlation coefficient is 0.433). However, there are no obvious correlations with A2 and A4 (the Pearson correlation coefficients are −0.121 and 0.079, respectively).
Third, the distances between the upper and lower quartiles differ for the same indicator across the levels. Fourth, the ranges of indicator values overlap partially between different levels. Fifth, as the median is not at the center of the box, the distribution of indicator values is asymmetric. All of these phenomena illustrate the complexity of pillar stability prediction.
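As an illustration of the correlation analysis above, a Pearson coefficient can be computed as follows. The data here are synthetic stand-ins, not the 236 real cases, and the indicator and label names are our assumptions:

```python
# Sketch: Pearson correlation between an indicator and the stability level,
# as reported for Figure 2. Synthetic data only; "a5" mimics an indicator
# (e.g., average pillar stress) that tends to rise with the level code.
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical stability levels (0 = stable, 1 = instable, 2 = failed)
rng = np.random.default_rng(0)
level = rng.integers(0, 3, size=50)
a5 = level * 20 + rng.normal(0, 5, size=50)  # stress-like indicator

r = pearson(a5, level)  # positive, mirroring the reported 0.433 for A5
```

A positive `r` here corresponds to the paper's observation that higher average pillar stress accompanies less stable levels.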

GBDT Algorithm
GBDT is an ensemble ML algorithm using multiple DTs as base learners. The decision trees (DTs) are not independent: each newly added DT places greater emphasis on the samples misclassified by the previous DTs [34]. The diagram of the GBDT algorithm is shown in Figure 3. It can be noticed that the residual of the former DTs is taken as the input for the next DT. The added DT is then used to reduce the residual, so that the loss decreases along the negative gradient direction in each iteration. Finally, the prediction result is determined based on the sum of the results from all DTs.

Suppose the training set {(x_i, y_i), i = 1, 2, ..., m} indicates the sample data, where x_i = (x_{1i}, x_{2i}, ..., x_{ri}) represents the indicators and y_i denotes the label. The specific steps of GBDT are as follows [35]:

Step 1: The initial constant model is obtained as

f_0(x) = argmin_γ Σ_{i=1}^{m} L(y_i, γ), (1)

where L(y_i, γ) is the loss function.

Step 2: The residual along the negative gradient direction is written as

r_{ni} = −[∂L(y_i, f(x_i)) / ∂f(x_i)]_{f = f_{n−1}}, (2)

where n indicates the number of the iteration, and n = 1, 2, ..., N.

Step 3: The base model T(x_i; α_n) is obtained by fitting the residuals, and the parameter α_n is calculated based on the least squares method as

α_n = argmin_α Σ_{i=1}^{m} [r_{ni} − T(x_i; α)]². (3)

Step 4: By minimizing the loss function, the current model weight is expressed as

γ_n = argmin_γ Σ_{i=1}^{m} L(y_i, f_{n−1}(x_i) + γ T(x_i; α_n)). (4)

Step 5: The model is updated as

f_n(x) = f_{n−1}(x) + γ_n T(x; α_n). (5)

This loop is performed until the specified number of iterations is reached or the convergence conditions are met.
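The five steps above can be sketched for the squared-error case, where the Step 1 constant is the label mean and the Step 2 residuals are plain differences. This is an illustrative toy, not the paper's implementation; the data, shrinkage rate, and tree depth are our choices:

```python
# Minimal gradient boosting loop with squared-error loss:
# Step 1 -> mean, Step 2 -> residuals (negative gradient),
# Step 3 -> depth-1 tree fit, Steps 4-5 -> shrinkage-weighted update.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_fit(X, y, n_rounds=20, lr=0.1):
    f = np.full(len(y), y.mean())          # Step 1: argmin of squared loss
    trees = []
    for _ in range(n_rounds):
        resid = y - f                      # Step 2: negative gradient
        t = DecisionTreeRegressor(max_depth=1).fit(X, resid)  # Step 3
        f = f + lr * t.predict(X)          # Steps 4-5 (fixed shrinkage)
        trees.append(t)
    return trees, float(y.mean())

def gbdt_predict(trees, base, X, lr=0.1):
    return base + lr * sum(t.predict(X) for t in trees)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
trees, base = gbdt_fit(X, y)
pred = gbdt_predict(trees, base, X)  # training error shrinks each round
```

Using a fixed learning rate in place of the line-search weight of Step 4 is a common simplification; each added tree still moves the model along the negative gradient of the loss.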

XGBoost Algorithm
The XGBoost algorithm was proposed by Chen and Guestrin [31] based on the GBDT structure. It has received widespread attention because of its excellent performance in Kaggle's ML competitions [36]. Different from GBDT, XGBoost introduces a regularization term in the objective function to prevent overfitting. The objective function is defined as

Obj = Σ_{i=1}^{m} L(y_i, ŷ_i) + Σ_{k} R(f_k) + C,

where R(f_k) denotes the regularization term at the k-th iteration, and C is a constant term, which can be selectively omitted. The regularization term R(f_k) is expressed as

R(f_k) = αH + (1/2) η Σ_{j=1}^{H} w_j²,

where α represents the complexity of the leaves, H indicates the number of leaves, η denotes the penalty parameter, and w_j is the output of each leaf node. In particular, the leaves indicate the predicted categories following the classification rules, and a leaf node is a node of the tree that cannot be split. Moreover, as opposed to using only the first-order derivative as in GBDT, a second-order Taylor expansion of the objective function is adopted in XGBoost. Suppose the mean square error is used as the loss function; then the objective function can be derived as

Obj ≈ Σ_{i=1}^{m} [g_i w_{q(x_i)} + (1/2) h_i w²_{q(x_i)}] + R(f_k) + C,

where q(x_i) indicates a function that assigns data points to the corresponding leaves, and g_i and h_i denote the first and second derivatives of the loss function, respectively. The final loss value is calculated as the sum of all loss values. Because samples correspond to leaf nodes in the DT, the ultimate loss value can be determined by summing the loss values over the leaf nodes. Therefore, the objective function can also be expressed as

Obj = Σ_{j=1}^{H} [P_j w_j + (1/2)(Q_j + η) w_j²] + αH + C,

where P_j = Σ_{i∈I_j} g_i, Q_j = Σ_{i∈I_j} h_i, and I_j indicates all samples in leaf node j.
To conclude, the optimization of the objective function is converted into the problem of determining the minimum of a quadratic function. In addition, because of the introduction of the regularization term, XGBoost has a better ability to resist overfitting.
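Minimizing the per-leaf quadratic P_j w_j + (1/2)(Q_j + η) w_j² gives the closed-form leaf output w_j = −P_j / (Q_j + η) and the objective contribution −P_j² / (2(Q_j + η)). A minimal sketch under these definitions, as an illustration rather than the XGBoost internals:

```python
# Per-leaf second-order computation: P = sum of first derivatives (g_i),
# Q = sum of second derivatives (h_i) over the samples in the leaf,
# eta = the penalty parameter from the regularization term.
def leaf_weight(grads, hess, eta):
    P, Q = sum(grads), sum(hess)
    return -P / (Q + eta)          # minimizer of P*w + 0.5*(Q + eta)*w^2

def leaf_objective(grads, hess, eta):
    P, Q = sum(grads), sum(hess)
    return -P * P / (2.0 * (Q + eta))  # objective value at the minimizer
```

For squared-error loss, g_i is the residual and h_i = 1, so with no penalty the leaf output reduces to the mean residual, recovering plain GBDT behavior.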

LightGBM Algorithm
LightGBM is another algorithm designed by Microsoft Research Asia [32] using the GBDT framework. It aims to improve computational efficiency, so that prediction problems with big data can be solved more effectively. In the GBDT algorithm, a presorting approach is used to select and split indicators. Although this method can precisely determine the splitting point, it requires more time and memory. In LightGBM, a histogram-based algorithm and a leaf-wise tree growth strategy with a maximum depth limit are adopted to increase the training speed and reduce memory consumption.
The histogram-based DT algorithm is shown in Figure 4. It can be seen that the successive floating-point eigenvalues are discretized into s small bins. After that, these bins are used to construct a histogram of width s. Once the data is traversed for the first time, the required statistics (the sum of gradients and the number of samples in each bin) are accumulated in the histogram. Based on the discrete values of the histogram, the optimal segmentation point can be found [37]. By using this method, the costs of storage and calculation can be reduced.
The level-wise and leaf-wise growth strategies are shown in Figure 5. According to the level-wise growth strategy, the leaves on the same layer are split simultaneously. This is favorable for optimization with multiple threads and for controlling model complexity. However, leaves on the same layer are treated indiscriminately, whereas they have different information gains. Information gain indicates the expected reduction in entropy caused by splitting the nodes based on attributes, which can be determined by [38]

Gain(B, V) = En(B) − Σ_{v} (|B_v| / |B|) En(B_v),

where En(B) = −Σ_{d=1}^{D} p_d log₂ p_d is the information entropy of the collection B, p_d is the ratio of B pertaining to category d, D is the number of categories, v is a value of attribute V, and B_v is the subset of B for which attribute V has value v.

Actually, many leaves with low information gain are unnecessary to search and split, which adds a great deal of extra memory consumption and makes this method inefficient. In contrast, the leaf-wise growth strategy is more efficient because it only splits the leaf with the largest information gain on the same layer. Furthermore, considering that this strategy may produce trees of high depth, resulting in overfitting, a maximum depth limit is adopted during the growth of the trees [39].
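The one-pass histogram accumulation of Figure 4 can be sketched as follows; the bin count s, feature values, and gradients below are illustrative assumptions, not LightGBM's internals:

```python
# Sketch of histogram construction: continuous feature values are
# discretized into s bins, and per-bin gradient sums and sample counts
# are accumulated in a single pass. Split points are then searched over
# the s bin boundaries instead of all raw values.
import numpy as np

def build_histogram(feature, grads, s=8):
    edges = np.linspace(feature.min(), feature.max(), s + 1)
    # Map each value to a bin index in [0, s-1]
    bins = np.clip(np.digitize(feature, edges[1:-1]), 0, s - 1)
    grad_sum = np.zeros(s)
    count = np.zeros(s, dtype=int)
    for b, g in zip(bins, grads):      # one traversal of the data
        grad_sum[b] += g
        count[b] += 1
    return grad_sum, count

feature = np.array([0.1, 0.2, 0.4, 0.9, 1.5, 2.0])
grads = np.array([1.0, 1.0, -1.0, -1.0, 2.0, 2.0])
gs, cnt = build_histogram(feature, grads, s=4)
```

Because the histogram preserves the total gradient sum and sample count, candidate splits can be scored from the s bins alone, which is the source of the storage and computation savings described above.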

Construction of Prediction Model
The construction process of the prediction model is shown in Figure 6. First, approximately 70% and 30% of the original pillar stability data are selected as the training and test sets, respectively. Second, a five-fold cross-validation (CV) approach is utilized to tune the model hyperparameters. Third, using the optimal hyperparameter configuration, the prediction model is fitted on the training set. Fourth, the test set is adopted to evaluate model performance according to the overall prediction results and the prediction ability for each stability level. Last, by comparing the comprehensive performance of these models, the optimal model is determined. If the prediction performance of this model is acceptable, it can be adopted for deployment. The entire calculation process is performed in Python 3.7 based on the scikit-learn [40], XGBoost [41], and LightGBM [42] libraries, respectively. The hyperparameter optimization process and model evaluation indexes are described as follows.
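The workflow of Figure 6 can be sketched end-to-end with scikit-learn. The synthetic data, the stand-in label rule, and the tiny search space are our assumptions, not the paper's 236-case dataset or tuned grid:

```python
# End-to-end sketch: 70/30 split, five-fold CV hyperparameter search,
# refit on the training set, evaluation on the held-out test set.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))                  # five indicator columns
y = (X[:, 4] + 0.5 * X[:, 2] > 0).astype(int)  # stand-in stability label

# Step 1: 70/30 train/test split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Steps 2-3: five-fold CV randomized search, then refit on the training set
search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    {"n_estimators": [50, 100], "max_depth": [2, 3]},  # illustrative grid
    n_iter=4, cv=5, random_state=0,
)
search.fit(X_tr, y_tr)

# Step 4: evaluate on the held-out test set
acc = accuracy_score(y_te, search.predict(X_te))
```

The same skeleton applies to XGBoost and LightGBM by swapping in their scikit-learn-compatible estimators and their respective hyperparameter grids.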

Hyperparameter Optimization
Most ML algorithms contain hyperparameters that need to be tuned. These hyperparameters should be adjusted based on the dataset rather than specified manually. Generally, hyperparameter search methods include grid search, Bayesian optimization, heuristic search, and randomized search [43]. Because the randomized search method is more efficient for simultaneously tuning multiple hyperparameters, it is used to identify the best set of hyperparameters in this study.


In general, the K-fold CV approach is utilized for hyperparameter configuration [44]. In our work, the five-fold CV method is used, as illustrated in Figure 7. The original training set is randomly split into five equal-size subsamples. Among them, a single subsample is adopted as the validation set, and the other four subsamples are used as the training sub-set. This procedure is repeated five times until each subsample has served as the validation set once. Thereafter, the average accuracy over these five validation sets is used to determine the optimal hyperparameter values.

In this study, some critical hyperparameters of the GBDT, XGBoost, and LightGBM algorithms are tuned, as shown in Table 3. The specific meanings of these hyperparameters are also explained in Table 3. First, the search range of the different hyperparameter values is specified.
In particular, for the different algorithms, the search range of the same hyperparameters is kept consistent. Then, according to the maximum average accuracy, the optimal values for each set of hyperparameters are obtained, as indicated in Table 3.
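The five-fold scoring of Figure 7 can be sketched directly with KFold; the model, data, and label rule below are illustrative assumptions:

```python
# Sketch of five-fold CV scoring: each fold serves once as the validation
# set, and the mean validation accuracy scores one hyperparameter
# configuration.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def cv_score(model, X, y, k=5):
    accs = []
    for tr_idx, va_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        m = model.fit(X[tr_idx], y[tr_idx])          # train on four folds
        accs.append(accuracy_score(y[va_idx], m.predict(X[va_idx])))
    return float(np.mean(accs))                      # average over folds

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = (X[:, 0] > 0).astype(int)                        # stand-in label
score = cv_score(DecisionTreeClassifier(max_depth=2), X, y)
```

In the randomized search, this score is computed for each sampled hyperparameter configuration, and the configuration with the highest average accuracy is retained.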


Model Evaluation Indexes
The accuracy, precision, recall, and F1 measures have been widely used for the performance evaluation of ML algorithms [43]. Among them, accuracy indicates the proportion of samples that are correctly predicted; precision is the ability to accurately predict samples; recall denotes the capability of correctly predicting as many samples as possible among the actual samples; and F1 represents a comprehensive metric that measures both precision and recall. Accordingly, these indexes are adopted to evaluate model performance in this study. Assume the confusion matrix is expressed as

G = [g_ab], a, b = 1, 2, ..., E,

where E is the number of pillar stability levels, g_aa represents the number of samples correctly predicted for level a, and g_ab denotes the number of samples of level a that are classified to level b. Based on the confusion matrix, the precision, recall, and F1 measure for each pillar stability level are respectively determined by

Pr_a = g_aa / Σ_{b=1}^{E} g_ba, Re_a = g_aa / Σ_{b=1}^{E} g_ab, F1_a = 2 Pr_a Re_a / (Pr_a + Re_a).

To further reflect the overall prediction performance, the accuracy and the macro averages of precision, recall, and F1 (expressed as macro − Pr, macro − Re, and macro − F1, respectively) are calculated as

Accuracy = Σ_{a=1}^{E} g_aa / Σ_{a=1}^{E} Σ_{b=1}^{E} g_ab, macro − Pr = (1/E) Σ_{a=1}^{E} Pr_a, macro − Re = (1/E) Σ_{a=1}^{E} Re_a, macro − F1 = (1/E) Σ_{a=1}^{E} F1_a.
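These definitions can be computed numerically from a confusion matrix G; the matrix values below are illustrative, not the paper's Table 4:

```python
# Per-level and macro metrics from a confusion matrix G, where G[a][b]
# counts level-a samples predicted as level b (rows = actual levels,
# columns = predicted levels).
import numpy as np

def level_metrics(G):
    G = np.asarray(G, float)
    precision = np.diag(G) / G.sum(axis=0)  # correct / predicted as level a
    recall = np.diag(G) / G.sum(axis=1)     # correct / actual level a
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = np.trace(G) / G.sum()
    return precision, recall, f1, accuracy

G = [[25, 3, 1],   # stable
     [4, 12, 2],   # instable
     [0, 1, 23]]   # failed
p, r, f1, acc = level_metrics(G)
macro_f1 = float(f1.mean())  # unweighted mean over the three levels
```

The macro averages weight each level equally, which is why they can reveal the weaker performance on a minority level even when overall accuracy looks high.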

Overall Prediction Results
The prediction results of the GBDT, XGBoost, and LightGBM algorithms were obtained on the test set. Subsequently, the confusion matrix of each algorithm was determined, as shown in Table 4. The values on the main diagonal indicate the number of samples correctly predicted. It can be seen that most samples were accurately classified using these three algorithms. Based on Table 4, the accuracy, macro − Pr, macro − Re, and macro − F1 were calculated according to Equations (16)-(19) and are listed in Table 5. According to the results, all three algorithms performed well in predicting the pillar stability of hard rock. Furthermore, the accuracies of GBDT and XGBoost were the highest, at 0.8310, followed by LightGBM with an accuracy of 0.8169. Although GBDT and XGBoost possessed the same accuracy, their prediction performances were different. Comparing their values of macro − Pr, macro − Re, and macro − F1, GBDT performed better than XGBoost. Therefore, based on their overall prediction performances, the rank was GBDT > XGBoost > LightGBM.

Prediction Results of Each Stability Level
To analyze the prediction performance of the algorithms for each stability level, the precision, recall, and F1 indexes were calculated based on Equations (13)-(15), as shown in Figures 8-10, respectively. It can be seen that the prediction performance of each algorithm was not the same across the stability levels. For the stable level, GBDT achieved the highest precision value (0.8929) and the highest F1 value (0.8621), while XGBoost achieved the highest recall value (0.8667). For the instable level, XGBoost possessed the highest precision value (0.7692), while GBDT possessed the highest recall value (0.7500) and the highest F1 value (0.7059). For the failed level, LightGBM obtained the highest precision value (0.9130), while XGBoost obtained the highest recall value (0.9200) and the highest F1 value (0.8846). Moreover, it can be observed that the prediction performance of these algorithms was best for the failed level, followed by the stable level, whereas the prediction performance for the instable level was the worst.



Relative Importance of Indicators
The relative importance of the indicators is a valuable reference for taking reasonable measures to prevent pillar instability. In this study, the importance degree of each indicator was obtained using the GBDT, XGBoost, and LightGBM algorithms, as indicated in Figure 11. Based on the GBDT algorithm, the rank of importance degree was A5 > A3 > A1 > A4 > A2. According to the XGBoost algorithm, the rank of importance degree was A5 > A3 > A1 > A4 > A2. Using the LightGBM algorithm, the rank of importance degree was A3 > A5 > A1 > A2 > A4. Therefore, there were two most important indicators: A5 (average pillar stress) and A3 (ratio of pillar width to pillar height).
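The importance readout behind Figure 11 can be sketched with a fitted ensemble's `feature_importances_` attribute. The synthetic data below stand in for the real cases, so the resulting ranking is illustrative only:

```python
# Sketch: reading relative indicator importances from a fitted tree
# ensemble. The label is deliberately driven by columns 4 and 2 (playing
# the roles of A5 and A3), so those columns dominate the importances.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(150, 5))             # columns stand in for A1-A5
y = (X[:, 4] - X[:, 2] > 0).astype(int)   # label driven by "A5" and "A3"

model = GradientBoostingClassifier(random_state=0).fit(X, y)
importance = model.feature_importances_   # normalized: sums to 1
```

XGBoost and LightGBM expose the same attribute through their scikit-learn wrappers, which is how a common importance ranking can be compared across the three algorithms.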


Discussions
Although pillar stability can be well predicted using the GBDT, XGBoost, and LightGBM algorithms, the prediction performance was not the same for the different stability levels. The reason may be two-fold. One is that the number of samples of the instable level is smaller than that of the other two levels. Another is that the discrimination boundary of the instable level is more uncertain, which influences the quality of the data. As data-driven approaches, the prediction performances of the GBDT, XGBoost, and LightGBM algorithms are greatly affected by the quantity and quality of the supporting data. Therefore, compared with the other two levels, the prediction performance for the instable level was worse.
According to the analysis of indicator importance, A5 and A3 were the most important indicators. The reason may be that pillar stability is greatly affected by the external stress conditions and the inherent dimensional characteristics. According to field experience, the spalling phenomenon of pillars is more common in deep mines because of high stress, and pillars with small A3 values are more vulnerable to damage. Based on the importance degrees of the indicators, measures can be adopted to improve pillar stability in two directions. One is reducing the pillar stress, such as adjusting the excavation sequence and relieving pressure [32]. Another is optimizing the pillar parameters, such as increasing the pillar width and the ratio of pillar width to pillar height.
To further illustrate the effectiveness of GBDT, XGBoost, and LightGBM algorithms, the safety factor approach and other ML algorithms were adopted as comparisons.
First, the safety factor approach was used to determine pillar stability. According to the research of Esterhuizen et al. [33], the pillar strength can be calculated by Equation (20). Based on Equation (20) and the given pillar stress, the safety factor η can be determined as their quotient. From the work of Lunder and Pakalnis [45], when η > 1.4, the pillar is stable; when 1 < η < 1.4, the pillar is instable; and when η < 1, the pillar is failed. According to this discrimination method, the prediction accuracy was 0.6610. From the work of González-Nicieza et al. [8], when η > 1.25, the pillar is stable; when 0.90 < η < 1.25, the pillar is instable; and when η < 0.90, the pillar is failed. Based on this classification method, the prediction accuracy was 0.6398. Therefore, the prediction accuracy of the safety factor approach was lower than that of the proposed methodology. It can be inferred that this empirical method has difficulty obtaining satisfactory prediction results on these pillar samples.
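The Lunder and Pakalnis discrimination rule quoted above can be written as a small function; the handling of the exact boundary values η = 1 and η = 1.4 is our assumption, since the text leaves it open:

```python
# Safety-factor classification per the thresholds quoted in the text:
# eta > 1.4 -> stable, 1 < eta <= 1.4 -> instable, eta <= 1 -> failed.
# Boundary assignment at exactly 1.0 and 1.4 is an assumption.
def classify_by_safety_factor(eta):
    if eta > 1.4:
        return "stable"
    if eta > 1.0:
        return "instable"
    return "failed"
```

Applying such a fixed-threshold rule to every case yields the 0.6610 accuracy reported above, which is what the ML models improve upon.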
On the other hand, some ML algorithms, such as RF, adaptive boosting (AdaBoost), MLR, Gaussian naive Bayes (GNB), Gaussian processes (GP), DT, MLPNN, SVM, and k-nearest neighbor (KNN), were adopted to predict pillar stability. First, some key hyperparameters of these algorithms were tuned, and the optimization results are listed in Table 6. Afterwards, these algorithms with optimal hyperparameters were used to predict pillar stability based on the same dataset. The accuracy of each algorithm is shown in Figure 12. It can be seen that the accuracies of these algorithms were all lower than those of the GBDT, XGBoost, and LightGBM algorithms: the accuracies of most algorithms (except for GP) were lower than 0.8, whereas the accuracies of the GBDT, XGBoost, and LightGBM algorithms were all higher than 0.8. This demonstrates that, compared with the other ML algorithms, the GBDT, XGBoost, and LightGBM algorithms were better suited for predicting pillar stability in hard rock mines.

Although the proposed approach obtains desirable prediction results, some limitations should be addressed in the future.
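The tuning procedure used for every candidate algorithm (grid search with five-fold CV) can be sketched in a library-agnostic way. In the skeleton below, `fit_and_score` is a placeholder callback that would train one of the candidate algorithms on the training folds and return its accuracy on the held-out fold; the whole listing is an illustrative sketch, not the paper's actual code.

```python
import itertools
import random

def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def grid_search_cv(param_grid, n_samples, fit_and_score, k=5):
    """Return the hyperparameter combination with the best mean CV accuracy."""
    folds = kfold_indices(n_samples, k)
    best_params, best_acc = None, -1.0
    keys = sorted(param_grid)
    for values in itertools.product(*(param_grid[key] for key in keys)):
        params = dict(zip(keys, values))
        accs = []
        for i in range(k):
            test_idx = folds[i]
            train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
            accs.append(fit_and_score(params, train_idx, test_idx))
        mean_acc = sum(accs) / k
        if mean_acc > best_acc:
            best_params, best_acc = params, mean_acc
    return best_params, best_acc
```

In practice, `fit_and_score` would wrap a library model (e.g. an XGBoost or LightGBM classifier), and the selected `best_params` would then be used to refit the model on the full training set before evaluation on the test set.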
(1) The dataset is relatively small and unbalanced. The prediction performance of ML algorithms is heavily affected by the size and quality of the dataset. Generally, if the dataset is small, the generalization ability and reliability of the model are influenced. Although the GBDT, XGBoost, and LightGBM algorithms work well with small datasets, their prediction performances could be better on a larger dataset. In addition, the dataset is unbalanced, particularly for samples at the instable level. Compared with the other levels, the prediction performance for the instable level is not as good, which illustrates the adverse effect of imbalanced data on the results. Therefore, it is meaningful to establish a larger and more balanced pillar stability database.
(2) Other indicators may also influence the prediction results. Pillar stability is affected by numerous factors, including inherent characteristics and external environments. Although the five indicators adopted in this study can describe the necessary conditions of pillar stability to some extent, other indicators may also affect pillar stability, such as joints, underground water, and blasting disturbance. Theoretically, joints and underground water can affect the pillar strength, and blasting disturbance can be deemed a dynamic stress on the pillar. Accordingly, it is significant to analyze the influences of these indicators on the prediction results.

Conclusions
To ensure mining safety, pillar stability prediction is a crucial task in underground hard rock mines. This study investigated the performance of the GBDT, XGBoost, and LightGBM algorithms for pillar stability prediction. The prediction models were constructed based on the training set (165 cases) after their hyperparameters were tuned using the five-fold CV method. The test set (71 cases) was adopted to validate the feasibility of the trained models. Overall, the performances of GBDT, XGBoost, and LightGBM were acceptable, and their prediction accuracies were 0.8310, 0.8310, and 0.8169, respectively. By comprehensively analyzing the accuracy and the macro averages of precision, recall, and F1, the rank of overall prediction performance was GBDT > XGBoost > LightGBM. According to the precision, recall, and F1 of each stability level, the prediction performance for the stable and failed levels was better than that for the instable level. Based on the importance scores of the indicators from these three algorithms, the average pillar stress and the ratio of pillar width to pillar height were the most influential indicators on the prediction results. Compared with the safety factor approach and other ML algorithms (RF, AdaBoost, MLR, GNB, GP, DT, MLPNN, SVM, and KNN), the performances of GBDT, XGBoost, and LightGBM were better, which further verified that they are reliable and effective for pillar stability prediction.
In the future, a larger and more balanced pillar stability database can be established to further verify the adequacy of these algorithms for predicting the stable, instable, and failed pillar levels. The influences of other indicators on the prediction results should also be analyzed. The methodology can additionally be applied in other fields, such as the risk prediction of landslides, debris flows, and rockbursts.