Coal and Gas Outbursts Prediction Based on DWT+FICA-LDA Feature Extraction and QPSO-DELM Classifier


 Due to the severity and great harm of coal and gas outburst accidents, outburst prediction is essential. This paper presents a hybrid prediction model combining feature extraction and pattern classification for coal and gas outbursts. First, the discrete wavelet transform (DWT) is used to decompose the original series into sub-series of different frequencies and extract their features, and the optimal feature components are retained. Second, in order to eliminate redundancy among features and the lack of correlation between features and outbursts, fast independent component analysis (FICA), a feature extraction method based on high-order statistics, is applied to separate the independent features and capture the global information. The resulting features are then passed to linear discriminant analysis (LDA), which extracts the local information under the guidance of class labels. Finally, the projected features are fed into a deep extreme learning machine (DELM) classifier whose parameters are optimized by quantum particle swarm optimization (QPSO) for training and classification. Experimental results on a coal and gas outburst data set show that, compared with other current outburst prediction models, the proposed method performs significantly better on indicators such as speed and recognition accuracy.

more effective features for the outburst classification problem. At the same time, because the influencing factors of coal and gas outbursts are random and diverse, and the relationships among the influencing factors, and between the factors and outbursts, are very complex, a single feature extraction method cannot fully mine the internal laws of the influencing factors: the correlation information between them cannot be extracted, and the information components contained in the influencing factors cannot be integrated, so the prediction performance of the classifier remains low. The literature shows that coal and gas outburst sample data contain quite complex information, including global and local structural information as well as Gaussian and non-Gaussian components, and exhibit significant nonlinear and non-Gaussian characteristics. To analyze the feature data better, a mixed feature fusion method is needed to extract features in multiple feature spaces; any single-subspace feature extraction algorithm has its limitations and cannot outperform the other subspaces in every case.
Therefore, this paper focuses on the problems of single feature extraction for coal and gas outbursts and combines local linear and global nonlinear feature extraction methods to make up for the shortcomings of any single method.
Classifier model selection for outburst prediction is another issue addressed in this study. Common classifiers include the BP neural network [17], support vector machine (SVM) [18], random forest (RF) [19], and other intelligent techniques [20][21], all of which play an important role in improving the prediction of coal and gas outbursts. Among them, BP and RF assume that the outburst prediction indexes are independent of each other, and they suffer from disadvantages such as complex calculation, long learning time, and the need for a large sample space. ELM is a single-hidden-layer feedforward neural network in which the weight matrix between the input layer and the hidden layer, and the biases of the hidden-layer nodes, need only one random initialization instead of being updated by the traditional back-propagation algorithm; the only quantity to be calculated is the weight matrix between the hidden layer and the output layer, obtained through the Moore-Penrose generalized inverse. Compared with other models, ELM has some obvious advantages. In terms of classification performance, ELM achieves accuracy similar to or even better than SVM, and it has been applied successfully to small-sample, high-dimensional nonlinear pattern recognition problems. Given an activation function satisfying certain conditions, ELM can approximate any nonlinear objective function with arbitrary precision [21]; this strong nonlinear approximation ability makes it suitable for describing the nonlinear spatio-temporal evolution of coal and gas outbursts. However, because of the random initialization, the parameters of ELM are not optimal, which causes problems of stability and reliability, so parameter optimization is needed. Many studies have proposed optimization methods such as the genetic algorithm (GA), ant colony optimization (ACO), and particle swarm optimization (PSO) [22][23][24][25][26][27].
To some extent, these intelligent optimization algorithms can find parameter values that cannot be obtained by empirical settings. However, they are generally based on the evolutionary-algorithm paradigm, whose disadvantages are that the search easily gets trapped in local minima, the parameters found are not optimal, and the efficiency is low, so the algorithm must iterate continuously.
QPSO is a global search algorithm that is theoretically guaranteed to find the best solution in the search space.
Compared with PSO, the iterative equation of QPSO [28] does not need a particle velocity vector and has fewer parameters to adjust, so it is easier to implement; related experimental results also show that QPSO outperforms the standard PSO algorithm.
These models capture the nonlinear spatio-temporal evolution of coal and gas outbursts from different perspectives and enhance prediction ability. The mechanism of coal and gas outbursts is very complex, with many influencing factors, and the nonlinear relationship between outburst risk and the influencing factors cannot be expressed by an explicit mathematical model, which increases the difficulty of prediction. It is therefore necessary to establish an effective prediction model of coal and gas outbursts. To improve detection accuracy and stability, we propose an optimized classifier model based on DWT feature denoising and FICA-LDA feature extraction. The main contributions of this paper are as follows. The first stage is feature decomposition and reconstruction: DWT is used to smooth the coal and gas outburst features and extract the trend components from the original features, improving the quality of trend-term extraction to a certain extent. The second stage is dual-space feature extraction: FICA extracts high-order information, mapping the high-dimensional features of the input space into a new low-dimensional feature space and separating the independent features; LDA then extracts second-order statistical information, finding through dimensionality reduction the directions with the strongest discriminative ability, so that the original data are reflected in these directions, different classes are distinguished as much as possible, and the optimal discriminative features are obtained. The third stage is the learning algorithm and parameter optimization: the DELM, with its high learning and generalization ability, is used to predict outbursts, and the QPSO algorithm, with its good overall performance, is used to optimize the related parameters of the DELM.
Finally, we conduct a comprehensive evaluation on an actual coal and gas outburst dataset, including a comparison with state-of-the-art feature selection methods for coal and gas outbursts. The remainder of the article is organized as follows. Section 2 presents the materials and methodologies used in the experiments and introduces the proposed approach. Section 3 describes the dataset preparation and the configuration environment of the proposed schemes and analyzes the experimental results. Section 4 highlights the conclusions of this research and future research targets.

Theory and method
Coal and gas outburst data form a complex nonlinear and random feature set. Here we propose a mixed feature extraction and selection model based on DWT+FICA-LDA+QPSO-DELM. First, DWT is used to decompose the different features in the coal and gas data, which are reconstructed by choosing a threshold and threshold function to form a new index feature set. Second, the FICA algorithm extracts the non-Gaussian features, producing a new feature space of independent components, and LDA then extracts the Gaussian features and the local information; the combined FICA-LDA method yields discriminative features. Finally, the obtained features are input to the QPSO-optimized DELM for classification, as shown in Fig. 1.

The wavelet transform [29] is a signal analysis tool for time-frequency analysis with unique advantages in time-frequency localization; it can extract information hidden in the original feature data and analyze local characteristics. The discrete wavelet transform of a feature series x(t) of length T can be written as

W(m, n) = |a|^(-1/2) Σ_{t=0}^{T-1} x(t) φ((t − b)/a),  with a = a₀^m and b = n·b₀·a₀^m,

where φ is the mother wavelet, t indexes the discrete feature data, T is the length of the data, and the integers m and n are the scale and translation factors of the discrete wavelet. The important application in data analysis is decomposition and reconstruction: dⱼ (j = 1, 2, …, n) denotes the detail components and aₙ the approximate component. The approximation series represents the low-frequency components, which contain the trend information of the original signal; the detail series represents the high-frequency components, including the characteristic signals of the influencing factors related to the original signal. In this study a fast discrete wavelet algorithm, the Mallat algorithm [36], is used, which is based on four filters: low- and high-pass decomposition filters and low- and high-pass reconstruction filters. Through the multi-level decomposition process of the Mallat algorithm, the signal is decomposed into an approximation (aₙ) and several details (dₙ).
In each decomposition step, the signal is divided into a high-frequency component and a low-frequency component, and the low-frequency component is divided again into high- and low-frequency parts at the next level. The high-frequency part is called the signal detail and the low-frequency part the approximation of the signal; the final result is the combination of the high-frequency and low-frequency components.
The original signal can be reconstructed by the inverse wavelet transform as the sum of the final approximation and all detail series, x(t) = aₙ(t) + Σ_{j=1}^{n} dⱼ(t). In wavelet decomposition, choosing a proper wavelet function and number of decomposition levels is essential; many wavelet families, such as the Daubechies, Symlet, Gaussian, Mexican-hat, Morlet, and Shannon wavelets, can be used. The Daubechies wavelet function has effective localization ability in the time-frequency domain, so it is used as the wavelet basis in this paper to balance wavelength and smoothness. The number of decomposition levels is set to 3 according to the relevant literature and experience, which separates the various feature components more effectively.
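As a minimal illustration of the decompose-and-reconstruct step, the following sketch performs one level of DWT with the Haar wavelet (chosen over db4 only for brevity; with the PyWavelets library one would instead call something like pywt.wavedec(x, 'db4', level=3)):

```python
# One-level discrete wavelet decomposition and reconstruction using the
# Haar wavelet. cA holds the approximation (low-frequency trend) and cD
# the detail (high-frequency) coefficients.
import math

def haar_dwt(signal):
    """Split an even-length signal into approximation and detail series."""
    s = math.sqrt(2.0)
    cA = [(signal[2 * i] + signal[2 * i + 1]) / s for i in range(len(signal) // 2)]
    cD = [(signal[2 * i] - signal[2 * i + 1]) / s for i in range(len(signal) // 2)]
    return cA, cD

def haar_idwt(cA, cD):
    """Perfectly reconstruct the original signal from (cA, cD)."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(cA, cD):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out

x = [4.0, 2.0, 6.0, 8.0, 3.0, 1.0, 5.0, 7.0]   # toy feature series
cA, cD = haar_dwt(x)
x_rec = haar_idwt(cA, cD)
```

Applying haar_dwt again to cA gives the next decomposition level, which is exactly the multi-level Mallat scheme described above.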

Selection of threshold and threshold function
The selection rule of the threshold and the design of the threshold function are the key factors that affect the denoising effect. Threshold functions fall into two kinds: soft thresholding and hard thresholding. The hard threshold function has a break point: it simply keeps or removes each coefficient. The soft threshold function also compares each coefficient with the threshold, but in addition attenuates the retained coefficients. In this paper the soft threshold function is used: when the absolute value of a wavelet coefficient is greater than or equal to the given threshold, the coefficient is shrunk toward zero by the threshold value; when it is smaller, the coefficient is set to zero. After the threshold function is chosen, the threshold itself must be selected. The quality of the threshold directly determines the performance of the noise reduction model, so selecting and quantifying the threshold is very important for denoising [30]. The main types of threshold include the fixed threshold, the unbiased likelihood estimation threshold, the heuristic threshold, and the minimax threshold; in this paper the fixed threshold is adopted. In summary, the combination of a fixed threshold and soft thresholding is used in the wavelet analysis.
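The two threshold functions and the fixed (universal) threshold described above can be sketched as follows; the universal-threshold formula σ·sqrt(2·ln N) is the standard fixed rule, shown here as an illustration:

```python
import math

def hard_threshold(w, thr):
    """Hard thresholding: keep coefficients with |w| >= thr, zero the rest."""
    return w if abs(w) >= thr else 0.0

def soft_threshold(w, thr):
    """Soft thresholding: shrink retained coefficients toward zero by thr."""
    if abs(w) >= thr:
        return math.copysign(abs(w) - thr, w)
    return 0.0

def fixed_threshold(sigma, n):
    """Fixed (universal) threshold: sigma * sqrt(2 * ln(n)) for n coefficients."""
    return sigma * math.sqrt(2.0 * math.log(n))
```

For example, with thr = 1, a coefficient of 3 is kept as 3 by hard thresholding but shrunk to 2 by soft thresholding, which is why the soft rule yields smoother reconstructions.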

Wavelet denoising algorithm
DWT is used to decompose the original data; it can compare signals in different frequency ranges, divide signals with different characteristics into subspaces of different resolution scales, and then reconstruct the signal from the wavelet coefficients. The reconstructed data not only removes the unstable factors but also maintains good detail. The noise in measured data is mostly a high-frequency signal, so the essence of wavelet denoising is to process the high-frequency part of the signal. The wavelet denoising process consists of wavelet decomposition, threshold and threshold-function processing, and wavelet reconstruction [31]:
- Wavelet decomposition. Select an appropriate wavelet function and number of multi-scale decomposition levels, then apply the wavelet transform to the original signal to obtain the lowest-level approximation coefficients (low frequency) and the discrete detail coefficients (high frequency) of each level.
- Threshold and threshold-function processing. Apply the threshold method to the high-frequency detail coefficients of each level to eliminate the noise.
- Wavelet reconstruction. Reconstruct the signal from the low-frequency wavelet coefficients of the decomposition and the thresholded high-frequency coefficients, obtaining the smoothed signal.

ICA
The ICA [32][33][34] model assumes that the observed mixed signal y = (y₁, y₂, …, yₘ)ᵀ is a linear mixture of n independent unknown source signals s = (s₁, s₂, …, sₙ)ᵀ, so the relationship between the mixed signal y and the source signal s can be expressed in vector form as y = Ws + n, where W is the unknown mixing coefficient matrix and n is the unknown, unpredictable noise, which follows a Gaussian distribution.
ICA recovers the source signals and the mixing matrix from y alone. The main idea is to remove the correlation between the data variables starting from the non-Gaussianity of the original data, so that the components become statistically independent of each other. ICA separates crossed signals well and can mine effective information that is hidden by signal mixing; it performs well in signal separation, redundancy elimination, and noise reduction, realizing the decomposition and fusion of deformation information. In this paper, formula (5) is analyzed with the FastICA algorithm, which is more efficient than traditional ICA.

FICA
Because the FICA [35][36][37] algorithm takes maximum negentropy as its search direction, we first discuss the negentropy criterion. Negentropy is defined as

J(Y) = H(Y_G) − H(Y),

where H(·) is the differential entropy of a random variable and Y_G is a Gaussian random variable with the same variance as Y. According to information theory, among random variables with the same variance, the Gaussian-distributed variable has the maximum differential entropy; the stronger the non-Gaussianity of Y, the smaller its differential entropy and the larger J(Y), so negentropy can be used as a non-Gaussianity measure of random variables. Since calculating the probability density function required by the differential entropy in equation (7) is impractical, the following approximation is adopted:

J(W) ≈ [E{G(WᵀZ)} − E{G(V)}]²,

where G(·) is some non-quadratic function and V is a Gaussian variable with the standard normal distribution. In this way, maximizing J(W) is transformed into maximizing E{G(WᵀZ)}; that is, when the de-mixing vector W is solved, each independent component has the strongest non-Gaussianity and the best separation effect. The iteration formula (9) is

W(k + 1) = E{Z g(W(k)ᵀZ)} − E{g′(W(k)ᵀZ)} W(k),

where g(·) is the derivative of G(·). At the same time, W(k + 1) is normalized by (10): W(k + 1) = W(k + 1)/‖W(k + 1)‖. The specific iterative steps are as follows:
- Step 1: Select an initial value W;
- Step 2: Calculate W(k + 1) according to (9);
- Step 3: Normalize W(k + 1) according to (10);
- Step 4: If the change of W between adjacent iterations is less than a given value, stop; otherwise, return to Step 2.
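The one-unit iteration above can be sketched with NumPy as follows; the two uniform toy sources, the 2×2 mixing matrix, and the log-cosh nonlinearity (g = tanh) are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent non-Gaussian (uniform) sources, linearly mixed.
n = 2000
S = rng.uniform(-1.0, 1.0, size=(2, n))
A = np.array([[1.0, 0.6], [0.4, 1.0]])        # unknown mixing matrix
X = A @ S

# Centre and whiten the observations (a preprocessing step FastICA assumes).
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
Z = E @ np.diag(d ** -0.5) @ E.T @ X

# One-unit FastICA iteration with G(u) = log cosh(u), so g = tanh, g' = 1 - tanh^2.
w = rng.normal(size=2)
w /= np.linalg.norm(w)
for _ in range(100):
    wz = w @ Z
    w_new = (Z * np.tanh(wz)).mean(axis=1) - (1.0 - np.tanh(wz) ** 2).mean() * w
    w_new /= np.linalg.norm(w_new)            # normalization step (10)
    if abs(abs(w_new @ w) - 1.0) < 1e-9:      # converged (up to sign)
        w = w_new
        break
    w = w_new

y = w @ Z                                     # one recovered independent component
corr = max(abs(np.corrcoef(y, S[0])[0, 1]), abs(np.corrcoef(y, S[1])[0, 1]))
```

After convergence the recovered component correlates almost perfectly (up to sign and scale) with one of the original sources.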

LDA [38][39] is a supervised linear dimensionality reduction method based on Fisher's discriminant criterion. The key goal is to find a projection matrix G that maximizes the Fisher criterion after sample projection; the optimal projection directions can be found by eigenvalue decomposition of the scatter matrices. The specific steps are as follows:
Step 1: Let X be the data matrix of k classes [X₁, X₂, …, X_k], where class X_i contains n_i samples. Calculate the mean vector x_i′ of each class and the mean vector x′ of all samples:
x_i′ = (1/n_i) Σ_{x∈X_i} x,  x′ = (1/N) Σ_{i=1}^{k} Σ_{x∈X_i} x.
Step 2: Calculate the between-class scatter matrix S_b and the within-class scatter matrix S_w of the samples:
S_b = Σ_{i=1}^{k} n_i (x_i′ − x′)(x_i′ − x′)ᵀ,  S_w = Σ_{i=1}^{k} Σ_{x∈X_i} (x − x_i′)(x − x_i′)ᵀ.
Step 3: Find the projection matrix G by defining the Fisher criterion function; the maximum of the criterion is obtained by solving the eigenvalue problem
S_w⁻¹ S_b g_i = λ_i g_i,
where g_i is the eigenvector corresponding to the eigenvalue λ_i of the matrix S_w⁻¹ S_b.
Step 4: The Lagrange multiplier method is used to solve for the optimal projection directions of all samples, achieving dimensionality reduction: y = GᵀX (17).
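The steps above can be sketched for the two-class case with NumPy; the two synthetic Gaussian clusters are stand-ins for outburst / non-outburst samples, not data from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two toy classes in 2-D, 100 samples each.
X0 = rng.normal([0.0, 0.0], 0.5, size=(100, 2))
X1 = rng.normal([2.0, 2.0], 0.5, size=(100, 2))

# Step 1: class means and overall mean.
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
m = np.concatenate([X0, X1]).mean(axis=0)

# Step 2: within-class (S_w) and between-class (S_b) scatter matrices.
Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
Sb = 100 * np.outer(m0 - m, m0 - m) + 100 * np.outer(m1 - m, m1 - m)

# Step 3: Fisher direction = leading eigenvector of S_w^{-1} S_b.
vals, vecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
g = vecs[:, np.argmax(vals.real)].real

# Step 4: project onto the discriminant direction (y = g^T x).
y0, y1 = X0 @ g, X1 @ g
```

The 1-D projections y0 and y1 are well separated, which is exactly what the Fisher criterion maximizes.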

FICA-LDA feature extraction algorithm
FICA extracts the non-Gaussian high-order statistical information from the original features, which provides a means for many problems that cannot be solved by second-order statistics. From the non-Gaussian features we can find a representation whose components become statistically independent, or as independent as possible. FICA can effectively extract the non-Gaussian components of process data and reduce the complexity of the detection model, but it is an unsupervised learning method: it cannot exploit the correlation between the class labels and the features, which reduces its accuracy and sensitivity in feature extraction [40]. LDA, in contrast, finds an optimal projection matrix so that, after mapping, each category is easier to recognize in the new feature space and the dimensionality is also reduced. However, LDA depends only on second-order statistics; the resulting variables are uncorrelated and the data are assumed to be Gaussian. For random variables or processes that do not obey a Gaussian distribution, second-order statistics cannot fully represent the statistical characteristics [41][42].
In view of this, we propose a dual-space feature extraction fusion algorithm that combines FICA and LDA. The basic idea is to use FICA, based on high-order statistical information, to extract the useful information of coal and gas outbursts and separate relevant and irrelevant features in the new feature space, and then to apply the supervised LDA, based on second-order statistics. In this way the two feature spaces complement each other: patterns that are hard to recognize in one subspace are easy to recognize in the other. FICA extracts the complete useful information and separates useful features from noise features, while LDA reduces the feature dimension and improves classification accuracy. By combining the advantages of FICA and LDA in the two feature spaces, the hidden information in the coal and gas outburst features is mined to the greatest extent, which helps improve the prediction accuracy of the classifier. Considering the complexity and diversity of coal and gas outbursts and the small size of the available outburst sample data, the implementation steps of the fusion method are as follows:
Algorithm 2 FICA-LDA algorithm
Input: original data set D; feature subsets k1, k2.
Output: feature set s.
1. Normalize the original data to produce the new data set D;
2. Run the FICA algorithm on D to center and whiten the data, separating the independent component signals k1 from the noise components k2;
3. Run the LDA algorithm on k1;
4. Train LDA on the data set: compute the within-class scatter matrix and the between-class scatter matrix;
5. Obtain the optimal projection matrix and eigenvectors, and reduce the dimension of the training and test samples with the mapping matrix, obtaining the low-dimensional feature set s.

ELM
ELM [43][44][45] is a single-hidden-layer feedforward neural network composed of three layers: an input layer, a single hidden layer, and an output layer. With easy parameter selection, fast learning speed, and good generalization performance, ELM is fast, simple, and easy to implement. On small samples, logistic regression and conventional neural networks are clearly affected, while ELM maintains good performance, giving it an obvious advantage when samples are difficult to obtain. Suppose there are N discrete training samples {(x_j, t_j)}. The single-hidden-layer feedforward network can be expressed as

y_j = Σ_{i=1}^{L} β_i G(w_i · x_j + b_i),  j = 1, …, N,

where L is the number of neurons in the hidden layer, G(·) is the activation function, w_i is the weight vector between the input layer and the i-th hidden neuron, b_i is the bias of the i-th hidden neuron, β_i is the weight vector between the i-th hidden neuron and the output layer, and y_j is the actual output value. In matrix form this is Hβ = T, where H is the hidden-layer output matrix with entries H_{ji} = G(w_i · x_j + b_i), β = [β₁, …, β_L]ᵀ is the output weight matrix of the hidden layer, and T = [t₁, …, t_N]ᵀ is the target output matrix. The biggest feature of ELM is that the connection weight matrix between the input layer and the hidden layer, and the biases of the hidden-layer nodes, only need to be initialized randomly once, without updating through the traditional back-propagation algorithm; the only quantity to be calculated is the weight matrix between the hidden layer and the output layer, obtained via the generalized inverse matrix.
The algorithm flow is as follows:
Step 1: Given the training data set, randomly determine the hidden-layer node biases and input weights.
Step 2: Select an activation function and calculate the hidden-layer output matrix H.
Step 3: Compute H⁺, the Moore-Penrose generalized inverse of the hidden-layer output matrix, and use the least-squares solution β = H⁺T to obtain the output weights.
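The three steps can be sketched in a few lines of NumPy; the toy linearly separable problem, the 50 hidden nodes, and the sigmoid activation are illustrative choices, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy two-class problem: points above/below the line x1 + x2 = 1.
X = rng.uniform(0.0, 1.0, size=(200, 2))
T = (X.sum(axis=1) > 1.0).astype(float).reshape(-1, 1)

# Step 1: random input weights and hidden biases (never updated afterwards).
L = 50
W = rng.normal(size=(2, L))
b = rng.normal(size=(1, L))

# Step 2: hidden-layer output matrix with a sigmoid activation.
H = 1.0 / (1.0 + np.exp(-(X @ W + b)))

# Step 3: output weights via the Moore-Penrose pseudoinverse (least squares).
beta = np.linalg.pinv(H) @ T

pred = ((H @ beta) > 0.5).astype(float)
train_acc = float((pred == T).mean())
```

Only Step 3 involves any fitting, which is why ELM training reduces to a single linear solve and is so fast compared with back-propagation.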

DELM
Although ELM has many advantages, it also has some problems. ELM is not suitable for constructing deeper network structures; its shallow structure reduces computation and machine overhead, but the output weights β from the hidden-layer nodes to the output nodes are solved through the generalized inverse of the hidden-layer output, so the robustness of the model is not strong. ELM considers only the empirical risk, not the structural risk, so the model may overfit the data; this problem can be solved with the DELM. DELM [46][47][48][49][50] is a multi-layer neural network whose learning procedure is highly efficient in training time and has good generalization capability. Compared with ELM, DELM adds a regularization term to prevent overfitting: regularized terms are constructed in the linear equations to address insufficient computational stability and overfitting, balancing the structural risk and empirical risk of ELM. DELM obtains the output weights by minimizing a weighted least-squares loss function.
The parameter C is used to balance the empirical risk and the structural risk of the learning machine. The objective function is

min (1/2)‖β‖² + (C/2)‖Hβ − T‖².

This conditional extremum problem is converted into an unconditional one through the Lagrange equation, and the partial derivatives are set to zero. Therefore, for the DELM, the output weights can be expressed as

β = (I/C + HᵀH)⁻¹HᵀT.
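The regularized output-weight solve can be sketched as a single ridge-style linear system; the shapes of H and T and the value C = 100 below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative hidden-layer output H and targets T for one ELM layer.
H = rng.normal(size=(100, 20))
T = H @ rng.normal(size=(20, 1)) + 0.01 * rng.normal(size=(100, 1))

C = 100.0                                  # empirical/structural risk trade-off
I = np.eye(H.shape[1])

# beta = (I/C + H^T H)^{-1} H^T T  — the regularized least-squares solution.
beta = np.linalg.solve(I / C + H.T @ H, H.T @ T)

residual = float(np.linalg.norm(H @ beta - T) / np.linalg.norm(T))
```

As C grows, the solution approaches the plain pseudoinverse fit; a smaller C shrinks β and trades training error for stability, which is the overfitting control the DELM relies on.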

QPSO

PSO
Particle swarm optimization (PSO) [51] is an iterative global optimization algorithm that is easy to implement and has few parameters to adjust. In PSO a group of particles searches the space, and the velocity and position update formulas are

v_ij(t + 1) = w·v_ij(t) + c₁r₁(pbest_ij − x_ij(t)) + c₂r₂(gbest_j − x_ij(t)),
x_ij(t + 1) = x_ij(t) + v_ij(t + 1),

where i = 1, 2, …, m and j = 1, 2, …, D, m is the swarm size, pbest_ij is the j-th dimension of the individual extremum pbest of the i-th particle, gbest_j is the j-th dimension of the global extremum gbest of all particles, w is the inertia weight factor, c₁ and c₂ are the particle acceleration (learning) factors, and r₁ and r₂ are random numbers between 0 and 1. The inertia weight w balances the global and local exploration abilities of the swarm, which is very important: a large inertia weight favors exploration but weakens convergence, while a small inertia weight speeds convergence but can lead to local optima. In the current work we adjust w with a nonlinear decreasing schedule for further performance improvement, for example

w(t) = w_max − (w_max − w_min)(t/t_max)²,

where w_max and w_min are the maximum and minimum values of w, t is the current iteration number, and t_max is the maximum number of iterations.
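One common nonlinear decreasing schedule of this kind can be sketched as follows; the endpoint values 0.9 and 0.4 are typical PSO settings, not values taken from the paper:

```python
def inertia_weight(t, t_max, w_max=0.9, w_min=0.4):
    """Quadratic decrease of the inertia weight from w_max to w_min.

    The quadratic form keeps w large early (global exploration) and lets it
    fall off faster late in the run (local refinement).
    """
    return w_max - (w_max - w_min) * (t / t_max) ** 2

# Weight trajectory over a 100-iteration run.
ws = [inertia_weight(t, 100) for t in range(101)]
```

The schedule starts at w_max, ends at w_min, and decreases monotonically, matching the exploration-then-refinement behavior described above.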

QPSO
The main defect of PSO is that global convergence cannot be guaranteed, and there are many parameters to set, which is not conducive to finding the optimal parameters of the model being optimized; the change of particle position lacks randomness and easily falls into local optima. To solve this problem, Sun et al. proposed the quantum-behaved particle swarm optimization algorithm (QPSO) [52], inspired by trajectory analysis of PSO and by quantum mechanics. The algorithm discards the particle's velocity attribute, so the update of a particle's position is independent of its previous motion.
In this way randomness is added to the particle positions, and the particles move according to the following iterative formulas:

mbest_j(t) = (1/M) Σ_{i=1}^{M} pbest_ij(t),
P_ij(t) = ∅_ij·pbest_ij(t) + (1 − ∅_ij)·gbest_j(t),
X_ij(t + 1) = P_ij(t) ± α·|mbest_j(t) − X_ij(t)|·ln(1/u),

where pbest_ij is the best position found so far by the i-th particle, gbest is the best of all pbests in the swarm, mbest is the average of the historical best positions pbest over the swarm, X_ij is the position of the i-th particle, and M is the size of the swarm. P_ij(t) is the local attractor used to update the i-th particle's position; u and ∅_ij are random values generated by the uniform probability distribution in the range [0, 1]; and α is the contraction-expansion factor, the only parameter controlling particle convergence speed in QPSO, which is very sensitive to the population size and the maximum number of iterations. Here we set the range of α to (0.5, 1) by searching; the combination of particle swarm optimization with equation (21) gives the quantum-behaved particle swarm optimization of Algorithm 1 (QPSO).
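The iterative formulas can be sketched on a toy 2-D sphere objective; the swarm size, iteration count, constant α = 0.75, and the sphere function itself are illustrative choices, not the paper's settings:

```python
import math
import random

random.seed(0)

def sphere(x):
    """Toy objective: minimum 0 at the origin."""
    return sum(v * v for v in x)

M, D, iters, alpha = 20, 2, 200, 0.75
X = [[random.uniform(-10, 10) for _ in range(D)] for _ in range(M)]
pbest = [x[:] for x in X]
gbest = min(pbest, key=sphere)[:]

for _ in range(iters):
    # mbest: mean of all personal best positions.
    mbest = [sum(p[j] for p in pbest) / M for j in range(D)]
    for i in range(M):
        for j in range(D):
            phi, u = random.random(), random.random()
            p = phi * pbest[i][j] + (1 - phi) * gbest[j]          # local attractor
            step = alpha * abs(mbest[j] - X[i][j]) * math.log(1.0 / u)
            X[i][j] = p + step if random.random() < 0.5 else p - step
        if sphere(X[i]) < sphere(pbest[i]):
            pbest[i] = X[i][:]
            if sphere(pbest[i]) < sphere(gbest):
                gbest = pbest[i][:]

best_val = sphere(gbest)
```

Note there is no velocity vector anywhere: each new position is drawn around the attractor P with a spread proportional to the distance from mbest, which is the quantum-behaved update described above.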

QPSO-DELM
Coal and gas outburst data form a small sample set; common logistic regression and neural networks are clearly affected by this, while the DELM maintains good performance, giving it an obvious advantage when samples are difficult to obtain. However, because DELM randomly generates its input weights and hidden-layer thresholds, the generalization ability of the model can be poor and the prediction accuracy unsatisfactory. In this paper QPSO is used to optimize the input weights and hidden-layer thresholds of the DELM, which greatly improves the prediction accuracy and efficiency of the model. The algorithm is as follows:
Output: the regularization parameter C and the optimal values of the input weights and hidden-layer thresholds.
Step 1: Define the related parameters: search space N, iteration number G, population size M, and fitness function.
Step 2: Quantum-encode the concatenation of the input weights and hidden-layer thresholds of the DELM model to be optimized, initialize the particles' pbest and gbest, and initialize the position parameters.
Step 3: For each particle, obtain the mean best position mbest; Nrad is randomly generated in [0, 1] for each qubit: if Nrad ≤ |α|², the qubit is 0, otherwise it is 1.
Step 4: Calculate the fitness of each particle.
Step 5: Update each particle's pbest and the swarm's gbest according to the fitness values.
Step 6: Update the particle positions according to the QPSO iterative formulas.
Step 7: If the termination condition or the maximum number of iterations is met, output the optimal regularization parameter, input weights, and hidden-layer thresholds; otherwise, return to Step 3.

Dataset description and preprocessing
The influencing factors of coal and gas outbursts include geological stress, gas, and the physical properties of the coal seam. The experimental data come from the historical records of the Pingdingshan No. 8 mine. With reference to previous studies [53], the influencing indexes of coal and gas outbursts are: gas pressure (1), initial velocity of gas output (2), initial velocity of gas emission (3), coal seam firmness coefficient (4), structural coal thickness (5), and fault structure complexity (6). These indexes are operable, widely available, and applicable in engineering practice. Outburst prediction is a binary classification problem (outburst or normal), with 1 for the outburst class and 0 for the non-outburst class, as listed in column 7; Table 1 gives part of the coal and gas outburst sample data.

To make the experimental results more objective, ten-fold cross-validation is used to verify the classification effect: the data set is randomly divided into ten parts, each part is taken in turn as the testing set while the other nine serve as the training set, the classification method is trained on the training set and tested on the testing set, and the corresponding ten test results are obtained. Because different units are used during acquisition, the indexes differ greatly in order of magnitude, which affects the convergence speed and accuracy of the algorithm. Before training the model, we therefore normalize the data into the range [0, 1], removing the dimensions so that the indexes are more comparable.
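The normalization and the ten-fold split can be sketched as follows; the gas-pressure values below are hypothetical numbers for illustration, not rows of Table 1:

```python
import random

def min_max_normalize(column):
    """Scale one index column into [0, 1] to remove dimensional effects."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

def ten_fold_indices(n, seed=0):
    """Shuffle sample indices and split them into 10 roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::10] for i in range(10)]

pressures = [0.74, 1.82, 0.45, 2.10, 1.20]      # hypothetical gas-pressure values
norm = min_max_normalize(pressures)
folds = ten_fold_indices(50)
```

Each fold serves once as the testing set while the remaining nine are merged into the training set, and the ten test results are averaged.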

Experimental environment and parameter setting
The validity of the algorithm is further verified by comparative experiments on real datasets. All experiments are completed on a PC with a 2.8 GHz CPU and 4 GB of memory; the operating system is Windows 7, and the algorithms are implemented in the Python 3.7 environment.
True positive TP denotes the number of outburst samples correctly classified as outbursts; true negative TN denotes the number of non-outburst samples correctly classified as non-outbursts; false negative FN denotes the number of outburst samples incorrectly classified as non-outbursts; and false positive FP denotes the number of non-outburst samples incorrectly classified as outbursts.
From these counts, the overall accuracy, precision, sensitivity, and specificity are computed as in the reference. To eliminate random factors, each experiment is repeated ten times with different random seeds, and the average performance of the ten repetitions is taken as the final result. To ensure a fair comparison between methods, RF is used as a filter feature selection classifier to select the best subset. Since the compared methods (SVM, KNN, RF, DT) can be sensitive to the values of their main controlling parameters, grid search is used to tune them. The wavelet basis function is db4 with three decomposition levels; the parameters of DELM and SVM are selected by QPSO and cross-validation.
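The four evaluation metrics follow directly from the confusion-matrix counts defined above. A small sketch, with hypothetical counts chosen for illustration only:

```python
def confusion_metrics(tp, tn, fp, fn):
    """Metrics used in the paper, computed from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # recall / true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return accuracy, precision, sensitivity, specificity

# Hypothetical example: 49 outburst and 51 normal test samples, one FP, one FN.
acc, prec, sens, spec = confusion_metrics(tp=48, tn=50, fp=1, fn=1)
```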

Experimental results and analysis
In this paper, the following experiments verify the comprehensive performance of our method, whose prediction accuracy is higher than that of FICA alone. We also find that, for the different feature extraction methods combined with different classifiers, prediction accuracy is highest when the number of independent components is three (IC3).
The results show that when the number of independent components is too low, the independent information carried in the features cannot be completely separated, and the decomposed components cannot express the sample information well. As the number of independent components rises, the independent information in the detection signal is separated one by one and fully expresses the sample information it represents, so the final discrimination accuracy improves. Once the number of independent components reaches a certain level, the prediction accuracy of the discriminant model stabilizes and does not continue to increase. At the same time, a given property of a sample may be reflected in multiple features; to avoid the complexity caused by feature redundancy and its interference with model stability, features describing the same sample property in different dimensional spaces must be eliminated, compressing the data to a certain extent while mining the discriminative features. DWT+FICA handles this series of problems well: it improves the signal-to-noise ratio, enhances the statistical independence and non-Gaussianity of the features, and separates the components into independent components using non-Gaussianity as the independence measure.
To verify the accuracy and validity of the FICA-LDA feature extraction proposed in this paper, several kinds of feature information (linear, Gaussian, non-Gaussian, nonlinear, and pairwise combinations of these) are extracted to form new feature vectors. SVM and ELM classifiers are used for classification, and the results are compared with the proposed method. Here we mainly compare the effects of several different feature combinations.
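The DWT+FICA stage described above can be sketched in a few lines. This is a simplified illustration on synthetic data, not the authors' exact pipeline: the input matrix, the choice to retain only the level-3 approximation coefficients, and the component count are all assumptions for demonstration.

```python
import numpy as np
import pywt
from sklearn.decomposition import FastICA

# Hypothetical matrix of normalized index series (rows: samples, cols: indexes).
rng = np.random.default_rng(1)
X = rng.random((100, 6))

# 3-level DWT with the db4 wavelet, applied per index column; keep the
# approximation coefficients as the denoised low-frequency components.
coeffs = [pywt.wavedec(X[:, j], "db4", level=3) for j in range(X.shape[1])]
approx = np.column_stack([c[0] for c in coeffs])   # cA3 of each index

# FastICA separates the retained components into statistically independent
# features, using non-Gaussianity (higher-order statistics) as the criterion.
ica = FastICA(n_components=3, random_state=0)      # IC3 performed best here
S = ica.fit_transform(approx)
```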
From Tables 2-3, it can be seen that combining FICA, KPCA, PCA, and LLE with LDA improves the accuracy of the algorithms on both classifiers. Among them, FICA-LDA is the best feature extraction method on ELM, with an accuracy of 0.99, higher than KPCA-LDA, PCA-LDA, and LLE-LDA, and with the lowest execution time. The indexes of SVM are slightly lower than those of ELM. After the original data are decomposed by FICA, the feature points are compressed and the redundant information is removed; the relatively independent information in the original features is mined using the high-order moments of the dataset, so that feature information hidden by the original overlapping features can be effectively separated, which improves the modeling effect. LDA then removes redundancy and yields the optimally discriminative characteristics, so a discriminant model built with LDA on top of FICA has a better discrimination effect and improved classification performance. PCA is a linear feature extraction method, while KPCA and LLE are nonlinear; their effects are close to that of FICA.

Performance comparison of DWT+FICA combining with different feature extraction methods
The influencing factors of coal and gas outbursts have non-Gaussian distributions and strong nonlinearity. Traditional feature extraction methods consider only information maximization and neglect the cluster structure of the sample data, which leads to incomplete feature extraction. The features are a mixture of Gaussian and non-Gaussian signals. FICA-LDA feature extraction is therefore well suited to outburst prediction in this paper: it effectively selects feature information while preserving its integrity, achieving better classification results. From the indicators of the two classifiers, the scores obtained by FICA-LDA are not only higher than those of any single feature method but also the highest among all feature combinations, showing that the FICA-LDA combination extracts more features and improves classification accuracy. From Table 6, for the SVM classifier, the classification accuracy of DWT+FICA-LDA is 0.99, higher than that of DWT+FICA and DWT+LDA. DWT+FICA-LDA with the ELM and DELM classifiers also has the highest accuracy, showing that the DWT+FICA-LDA combination yields good prediction accuracy. FICA relies only on high-order statistics, which effectively extracts the non-Gaussian components and global information of the process data and reduces the complexity of the detection model. But FICA is an unsupervised learning method and cannot extract the correlation between class labels and features, which reduces its accuracy and sensitivity in feature extraction.
LDA finds an optimal projection matrix so that, after mapping, each class in the new feature space is easier to identify and the dimensionality is also reduced. LDA relies only on second-order statistics and captures local information; the principal variables it obtains are uncorrelated and cannot fully represent the statistical characteristics. Combining the FICA and LDA feature extraction methods therefore gives complementary advantages, and the extracted feature information of coal and gas outbursts is complete. From Tables 6-8 we can also see the comparison of the QPSO and PSO algorithms with the SVM, ELM, and DELM classifiers. In terms of accuracy and the other indicators, the QPSO algorithm can reach an accuracy of 1; compared with PSO, accuracy improves and running time decreases, although the accuracy of SVM does not change. This shows that pairing each classification algorithm with an appropriate optimization algorithm is very important for the prediction effect.
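The FICA-LDA combination described above (unsupervised FastICA for independent, non-Gaussian global components, then supervised LDA to project them into a class-discriminative space before classification) can be sketched as a pipeline. The data here are synthetic two-class Gaussians, and the classifier choice is illustrative, not the paper's tuned model.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical two-class sample: 6 indexes, classes offset by a mean shift.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (60, 6)), rng.normal(1.5, 1.0, (60, 6))])
y = np.array([0] * 60 + [1] * 60)

model = make_pipeline(
    FastICA(n_components=3, random_state=0),       # independent components
    LinearDiscriminantAnalysis(n_components=1),    # 2 classes -> 1 discriminant
    SVC(kernel="rbf"),                             # downstream classifier
)
model.fit(X, y)
```

Because LDA implements `transform`, it can sit between the extractor and the classifier, so the class labels guide the final projection exactly as described.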

Comparison with existing methods in the literature
Table 9. Comparison with different algorithms in the existing literature

It can be seen from Table 9 that, compared with the models in the literature, the method proposed in this paper achieves excellent results, with an accuracy of 1. Compared with KPCA+CS-ELM, KPCA+PSO-PNN, and LDA, the proposed method applies DWT for feature denoising and FICA-LDA for feature extraction, then uses the strong nonlinear fitting ability of DELM to achieve better prediction results, with accuracy increased by 0.16, 0.15, and 0.11 respectively. Compared with ISOMAP-WLSVM, PCA+SVM, and FA+RF, the proposed method is more effective, with accuracy increased by 0.17 and 0.15 respectively, while consuming less computation time and a lower feature dimensionality at a high accuracy. Consequently, the proposed method offers a better combination of training time and testing accuracy than the other compared methods. We also compare the methods in terms of computational complexity, measured as running time on the real data; all methods are run with RF as the classifier to select the best subset. As seen from Table 9 and Fig. 5, the running time of our method is 1.40 s, which is higher than the other feature extraction methods but lower than most feature selection methods.
Moreover, it can predict outbursts according to the distribution characteristics of the coal and gas outbursts sample data, with better classification performance and a low running time. The feature selection methods take between 0.95 s and 2.55 s; their high complexity makes them more time-consuming, so the advantage of our method over most feature selection methods is significant. Comprehensive analysis therefore demonstrates that the proposed method is effective in both feature selection and classification performance for outburst prediction.
To sum up, the proposed method has higher prediction accuracy and a competitive average computation cost compared with the other methods for coal and gas outburst prediction. However, the proposed model has some limitations. It has only been validated on one available dataset, from the Pingdingshan No. 8 mine; datasets from different mines should be tested to achieve better generalization performance. The current work solves a two-class classification problem, whereas multi-class outburst classification is highly in demand. Further, our method requires more time to obtain its high accuracy, so an optimization scheme with lower computation time can be investigated in the future. We also compare the accuracy of the proposed method with the accuracies reported in the literature, with the confidence level set to 0.95. Table 10 shows the E-values of the independent-sample binomial test. A very small E-value means the proposed algorithm is better than the other method with high confidence, i.e. there is a significant difference between the two methods. All values are below 0.05, and none exceeds it, which shows that the performance of our method differs significantly from that of the other methods and that it obtains the highest accuracy among the methods in the literature.
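The significance check described above can be sketched with a one-sided binomial test: given the number of test samples the proposed model classifies correctly, test whether that count is consistent with a competitor's accuracy. The counts below are hypothetical, not the paper's data, and the paper's exact E-value computation may differ.

```python
from scipy.stats import binomtest

# Hypothetical: 99 of 100 test samples correct, competitor accuracy 0.85 as null.
n, correct, p0 = 100, 99, 0.85
result = binomtest(correct, n, p0, alternative="greater")
# A p-value below 0.05 indicates the proposed model is significantly better.
```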

Conclusion
Given the redundant and uncorrelated characteristics of coal and gas outburst data, this paper introduces DWT into the feature preprocessing of coal and gas outbursts and reasonably eliminates random fluctuations and noisy feature information, which helps improve the accuracy of outburst prediction.
Due to the complexity of coal and gas outbursts, the components obtained by DWT still contain redundant and uncorrelated information, so DWT alone cannot achieve a good noise-reduction effect; moreover, the outburst data are nonlinear and non-Gaussian. On this basis, this paper analyzes in depth the combination of the FICA and LDA algorithms, which captures the essential information contained in the data as completely as possible and uses non-Gaussianity to separate the components into independent components. In view of the incompleteness of the information extracted by FICA, the features separated by FICA, together with the class labels, are input into the supervised LDA layer, which projects the features to make them class-discriminative, performs feature fusion and dimensionality reduction, extracts the Gaussian feature components, and eliminates redundancy and correlation between features to form new features. In view of the randomness of DELM parameter settings, the QPSO algorithm is applied to optimize the DELM parameters, and a coal and gas outburst prediction model based on QPSO-DELM is proposed, which significantly improves prediction accuracy. Example verification and comparative analysis prove that the model has high identification ability, strong generalization, and high prediction accuracy, and can be applied in the field of coal and gas outburst prediction.
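The QPSO step referenced above can be illustrated with a minimal quantum-behaved PSO minimizing a toy objective. This is a generic textbook-style variant (mean-best attractor with a logarithmic contraction step), not necessarily the authors' exact algorithm; in their setting the objective would be DELM validation error over its hyperparameters rather than the sphere function used here.

```python
import numpy as np

def qpso(f, dim, n=20, iters=100, beta=0.75, bounds=(-5.0, 5.0), seed=0):
    """Minimal quantum-behaved PSO sketch: minimize f over a box."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n, dim))
    pbest = x.copy()
    pval = np.array([f(p) for p in pbest])
    for _ in range(iters):
        gbest = pbest[pval.argmin()]
        mbest = pbest.mean(axis=0)                 # mean of personal bests
        phi = rng.random((n, dim))
        p = phi * pbest + (1 - phi) * gbest        # local attractor per particle
        u = np.maximum(rng.random((n, dim)), 1e-12)
        sign = np.where(rng.random((n, dim)) < 0.5, -1.0, 1.0)
        x = np.clip(p + sign * beta * np.abs(mbest - x) * np.log(1.0 / u), lo, hi)
        fx = np.array([f(xi) for xi in x])
        improved = fx < pval
        pbest[improved], pval[improved] = x[improved], fx[improved]
    return pbest[pval.argmin()], pval.min()

# Toy objective: 2-D sphere function, minimum 0 at the origin.
best_x, best_f = qpso(lambda v: float(np.sum(v ** 2)), dim=2)
```

Unlike standard PSO, QPSO has no velocity term; the contraction factor beta plays the role of the inertia/learning coefficients, leaving fewer parameters to tune.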
In the future, we will verify the proposed method on different coal and gas outburst datasets to improve the generalization ability and adaptability of the model, providing new ideas and effective technical means for coal and gas outburst prediction.

Availability of data and material
The raw/processed data required to reproduce these findings cannot be shared at this time as the data also forms part of an ongoing study. Some or all data, models, or code generated or used during the study are available from the corresponding author by request.

Declaration of Competing Interest
The authors declare no competing financial interest.