Seismic Events Prediction Using Deep Temporal Convolution Networks

Seismic events prediction is a crucial task for preventing coal mine rock burst hazards, and it is attracting increasing research interest from mining experts. Considering the temporal characteristics of monitoring data, seismic events prediction can be abstracted as a time series prediction task. This paper addresses the problem of long-term historical dependence in seismic time series prediction with deep temporal convolution neural networks (CNN). We propose a dilated causal temporal convolution network (DCTCNN) and a CNN long short-term memory hybrid model (CNN-LSTM) to forecast seismic events. In particular, DCTCNN is designed with dilated CNN kernels, a causal strategy, and residual connections; CNN-LSTM is built as a hybrid model that combines the advantages of CNN and LSTM. In this way, both DCTCNN and CNN-LSTM can extract long-term historical features from the monitored seismic data. The proposed models are experimentally tested on two real-life coal mine seismic datasets. Furthermore, they are compared with one traditional time series prediction method, two classic machine learning algorithms, and two standard deep learning networks. Results show that DCTCNN and CNN-LSTM are superior to the other five algorithms and that they successfully complete the seismic prediction task.


Introduction
Underground coal mines differ from general permanent tunnel engineering such as subways: their stopes move constantly as mining proceeds. Affected by mining disturbance, the force equilibrium of the coal is destroyed and the internal stress of coal and rock is redistributed. As a result, rock burst disasters occur frequently. Seismic events prediction can directly reflect the safety conditions of an underground coal mine, and it helps prevent rock burst accidents and hazards effectively. Therefore, seismic forecasting is a crucial guarantee for safe coal mine production. Conventional predictors are mostly based on classic geomechanics and apply unified indexes to evaluate all coal mine roofs [1,2]. However, the mechanism of roof disasters has not been thoroughly studied, and it is difficult to establish geomechanical models that accurately simulate how roof disasters occur.
Several coal mine safety scholars have explored data-driven approaches [3]. Considering the temporal characteristics of coal mine seismic data, they abstracted underground coal mine seismic prediction as a time series regression task. Time series regression, or prediction, is one of the top ten challenges in data mining [4] and plays an extremely important role in many domains. It has attracted increasing research interest from different communities over the past years, and many classic prediction methods have been proposed, such as ARIMA [3]. Traditional time series prediction approaches have sound mathematical and theoretical foundations, but some of them assume that the time series is linear, whereas seismic time series are typically nonlinear. Thus, applying those conventional methods to coal mine seismic prediction is not appropriate.
Recently, artificial intelligence technologies such as machine learning have been successfully applied in computer vision, audio synthesis, and natural language processing [5]. This inspired some mining experts to apply classic machine learning algorithms to mining safety, such as support vector machines (SVM) [6][7][8][9] and random forests (RF) [10,11]. These machine learning algorithms rely on high-quality, hand-crafted statistical or domain feature engineering, which is the foundation of their forecasting quality. However, manually defining or extracting detailed features from the monitoring time series is generally time consuming and laborious. Moreover, hand-crafted domain features require the cooperation of mining experts and may be strongly influenced by subjective factors. In addition, if the original monitoring data are fed directly to these algorithms, the resulting models may output inaccurate predictions that lead to false risk alarms.
In the realm of time series prediction, some efforts have been made in the past few years to address the above issues with deep learning algorithms. The most popular deep sequential models are recurrent neural networks (RNNs). Besides receiving the signals from the previous layer, each layer of an RNN can additionally learn its own historical information. Based on this mechanism, RNN is naturally suitable for predicting time series. However, the application of standard RNN is still limited by some training problems, such as vanishing gradients [12,13]. The long short-term memory network (LSTM) is a modified RNN model designed with a gating mechanism to avoid these problems. With its long-term memory ability, LSTM has been the preferred model for time series prediction in the field of deep learning [14].
Some recent studies have shown that convolutional networks can achieve similar results to LSTM, or even surpass it, on time series prediction tasks [15][16][17], although standard convolution neural networks (CNNs) were designed for image processing. After being modified with 1D convolution kernels, CNN can be used to forecast time series. The modified CNN is called a temporal CNN, and it can automatically learn time-translation-invariant features from time series. To make temporal CNN extract long-term historical features, some scholars proposed a causal strategy and dilated CNN kernels [15,17]. Those improved networks are capable of learning time series data autoregressively and memorizing long-term historical information with larger convolutional receptive fields. Based on the modified temporal CNNs, some experts conducted a series of explorations on audio synthesis [17], financial index prediction [15], power load forecasting [18], mechanical fault diagnosis [19], and urban water level prediction [20].
Despite the success of temporal convolutional networks in these areas and in time series prediction applications, there has not yet been an effort to apply deep temporal convolutional networks to coal mine seismic events prediction. Zhou et al. [21] discussed different evaluation methods for mining rock burst; they found that deep learning techniques outperform shallow machine learning algorithms in almost all categories where data are plentiful. The motivation of this paper is to apply this idea to that application. In particular, we aim to address the long-term historical dependence issue of underground coal mine seismic events prediction with deep temporal convolutional networks. The focus of this paper is to propose deep temporal convolutional models that can be successfully applied to underground coal mine seismic events prediction. Specifically, the contributions and novelty of this paper are as follows:
(1) Abstract underground coal mine seismic events prediction into a time series forecasting task
(2) Propose two generative deep learning models to deal with the long-term memory problem: the dilated causal temporal convolutional network (DCTCNN) and the CNN-LSTM hybrid network
(3) Improve the standard deep learning network to address the long-term historical dependence in time series prediction
(4) Design the modified deep learning networks by expanding the CNN local receptive field and by hybrid modeling
(5) Test the proposed models on two real-life coal mine seismic datasets and compare them with some classic algorithms
Section 1 starts with an introduction. Brief reviews of related works are given in Section 2. Section 3 describes the proposed models. Section 4 presents the experimental setup. Section 5 provides the results and discussion. The conclusion is given in Section 6.

Related Works
Previously, some mining experts studied hazard prediction methods based on time series regression. Wang et al. [3] applied a multiplicative seasonal autoregressive integrated moving average (ARIMA) model to predict coal mine water inflow. They smoothed the underground water inflow data with the standard differencing method. However, their predicted values were linearly related to the input monitoring time series, which does not accord with the real-life situation.
To overcome the inability of the above method to fit nonlinear monitoring time series, many researchers have applied classic machine learning algorithms. They showed that machine learning algorithms are more suitable for underground coal mine risk prediction. Hui et al. [22] proposed a chaotic neural network for predicting underground coal mine rock burst hazards from monitoring seismic time series. Jian et al. [6] applied SVM to predict long-term rock burst, and they combined SVM with a particle swarm optimization algorithm to forecast roof failures in a large-scale underground goaf [9]. Similarly, Juisheng and Thedja [7] used an intelligent firefly algorithm to optimize least squares SVM for coal mine disaster forecasting. Yahui et al. [8] combined a genetic algorithm with SVM to assess coal mine rock burst risk. Besides SVM, Jian et al. [10,11] also used RF to forecast roof bolt support stability and rock burst hazards. These machine learning based algorithms relieve the shortcomings of traditional temporal sequence prediction approaches, but they still suffer from overreliance on hand-crafted features and an inability to process large-scale data.
Owing to its abilities of automatic feature extraction and large-scale data processing, deep learning has become a popular research field. The most widely used deep temporal models are recurrent networks, especially LSTMs. Theoretically, RNN is considered able to memorize infinitely long sequences. However, standard RNN cannot handle the problem of vanishing gradients [12,13]. When gradients become too small at some point, backpropagation through the RNN stalls; training may terminate early, and long-term historical information cannot be memorized. To address this issue, Hochreiter and Schmidhuber [23] designed LSTM with a gating mechanism. Laptev et al. [24] and Lingxue and Nikolay [25] applied LSTM to forecast the waiting time of Uber passengers. Flunkert et al. [26] proposed an autoregressive and probabilistic prediction network (DeepAR) to forecast retail volume and electrical and traffic load. Fernández-Navarro et al. [27] applied an autoregressive strategy to improve the long-term memory ability of RNN.
The recurrent structures and long-term memory abilities of LSTMs are natural advantages for temporal sequence prediction. Many scholars in the deep learning community regard these networks as the preferred methods for time series forecasting [5,14]. However, some studies have shown that CNN-based models can achieve similar or better results [15][16][17]. They modified standard 2D CNNs into 1D temporal CNNs, which can be applied as feature extractors that automatically learn time-translation-invariant features. Haytham et al. [20] used 1D CNN to predict urban water level; Alberto et al. [28] and Yunxuan et al. [29] applied those improved convolutional networks to forecast traffic load, wind speed, and financial time series, respectively.
Although the abovementioned 1D CNNs fit the data structure of time series, they do not take the long-term dependence issue into account. The modified convolutional networks cannot memorize long-term historical information. To deal with this problem, Google's DeepMind team proposed a generative model called WaveNet for audio synthesis [17]. The main contributions of WaveNet are the causal strategy, dilated CNN kernels, and the residual connection mechanism. The causal strategy autoregressively creates a recurrent structure on the convolutional network, dilated kernels allow the CNN to obtain long-term historical memory, and skip connections avoid the omission of useful information. Based on the idea of WaveNet, Anastasia et al. [15] proposed an improved dilated convolutional network for financial time series prediction; Bai et al. [16] designed a generative network with causal and dilated CNN filters for generating music and voice temporal sequences.
Another way of solving the long-term memory issue is to combine temporal CNN with RNN or LSTM and use the advantages of both networks to complete time series prediction tasks. He [18] and Wu and Tan [30] proposed hybrid deep learning models in which CNN and RNN were connected serially for short-term power and traffic load forecasting. Rui et al. [31] designed a convolutional bidirectional LSTM network for motor fault diagnosis. Lai et al. [32] proposed a diffusion convolutional recurrent network for traffic load forecasting. The above works have provided valuable research inspiration for us. A dilated convolutional network (DCTCNN) and a CNN-LSTM hybrid network are proposed in this paper to address the long-term historical dependence issue of underground coal mine roof risk prediction tasks.

Deep Temporal CNN Generative Network
Given a time series $x = \{x(t)\}_{t=0}^{N-1}$ of length $N$ and a prediction model with parameter $\theta$, the model predicts the next time step value $x(t+1)$ based on the previous $t$ steps of the input data.
The likelihood function can be formulated as

$$p(x \mid \theta) = \prod_{t=0}^{N-1} p\big(x(t+1) \mid x(0), \ldots, x(t), \theta\big). \tag{1}$$

To learn the likelihood $p(\cdot)$ of the prediction model, this paper establishes two deep temporal CNN-based generative networks in Sections 3.1 and 3.2. In particular, we apply multivariable time series input, which means that the input is $[x(0), \ldots, x(N-1)]$.

Dilated Causal Temporal CNN.
Time series data often exhibit long-term dependence and correlation. To make the temporal CNN capable of extracting long-term historical features, we use the dilated convolution kernel mechanism, which can be defined as

$$(w_h^l *_d f^{l-1})(i) = \sum_{j} \sum_{m=1}^{M_{l-1}} w_h^l(j, m)\, f^{l-1}(i - d \cdot j,\, m), \tag{2}$$

where $(w_h^l *_d f^{l-1})(i)$ is the $i$th element of the convolutional kernel feature mapping of the $l$th layer, $w$ is the weight parameter, $f^{l-1}$ denotes the activation output of the previous layer, and $h$ and $d$ represent the number of cells and the dilation coefficient, respectively. The index $j$ runs over the time steps covered by the kernel, $m$ is the current channel, and $M_{l-1}$ is the number of channels.
In a dilated kernel, the convolution is applied to every $d$th time step of the input time series. In this manner, the temporal CNN can effectively learn historical information that is far from the current time step. Suppose the dilated CNN layers are indexed by $l \in \{1, \ldots, L\}$. The dilation coefficient is doubled from layer to layer, namely, $d \in \{2^0, 2^1, \ldots, 2^{L-1}\}$. The convolutional filter size is $1 \times k$. The local receptive field $r$ of each neuron is the set of elements in the preceding output that can change its value. For example, the receptive field of the $L$th layer is $2^{L-1} \times k$, which means that it has been enlarged $2^{L-1}$ times compared with the original one. A three-layer dilated CNN is illustrated in Figure 1.
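The dilation schedule and the resulting receptive field can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names are hypothetical, and the receptive field is counted with the standard rule for stacked stride-1 convolutions, where each layer adds $(k-1) \cdot d$ input steps. For $k = 2$ this count equals $2^{L-1} \times k$, the value quoted in the text; for other kernel sizes the two counts differ.

```python
def dilation_schedule(num_layers):
    """Dilation coefficients 2^0, 2^1, ..., 2^(L-1), one per layer."""
    return [2 ** layer for layer in range(num_layers)]


def receptive_field(num_layers, kernel_size):
    """Number of input time steps that can influence one top-layer output.

    Assumes stride-1 convolutions; a layer with dilation d and kernel
    size k extends the field by (k - 1) * d steps.
    """
    field = 1
    for d in dilation_schedule(num_layers):
        field += (kernel_size - 1) * d
    return field


if __name__ == "__main__":
    for num_layers in (1, 2, 3, 4):
        print(num_layers, dilation_schedule(num_layers),
              receptive_field(num_layers, kernel_size=2))
```

With a kernel size of 2, each additional layer doubles the number of past time steps the network can see, while the number of weights grows only linearly with depth.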
Through the dilation coefficient, the convolutional filter size, and the number of layers, we can control the size of the enlarged local receptive field, which helps the network capture more historical information. In addition, we apply a causal strategy to make sure that the local receptive field only contains $x(0), \ldots, x(t)$ when predicting $x(t+1)$. The causal strategy prevents future information from influencing the convolutional layers. This is equivalent to padding the input time series with a zero vector:

$$[0, \ldots, 0, x(0), \ldots, x(N-1)] \in \mathbb{R}^{N+r}. \tag{3}$$

Here $\hat{x}(N)$ denotes the predicted value at time step $N$.
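A causal dilated convolution on a univariate series can be sketched as follows. This is a minimal sketch under stated assumptions: a single input channel, a single filter, and an illustrative function name that does not come from the paper. The left zero-padding plays the role of the zero vector described above, so the output at step $t$ never reads a value later than $x(t)$.

```python
def causal_dilated_conv1d(x, weights, dilation):
    """Causal dilated 1D convolution of a univariate series.

    y[t] = sum_j weights[j] * x[t - dilation * j], with x[s] = 0 for s < 0.
    This equals left-padding x with (k - 1) * dilation zeros.
    """
    k = len(weights)
    pad = (k - 1) * dilation
    padded = [0.0] * pad + list(x)
    out = []
    for t in range(len(x)):
        # Position t of x sits at index t + pad of the padded series.
        acc = 0.0
        for j, w in enumerate(weights):
            acc += w * padded[t + pad - dilation * j]
        out.append(acc)
    return out


if __name__ == "__main__":
    print(causal_dilated_conv1d([1, 2, 3, 4, 5], [1, 1], dilation=2))
```

Changing a later input value leaves all earlier outputs untouched, which is the property the causal strategy is meant to guarantee.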
During the testing phase, each single-step prediction is obtained by feeding the trained network with $[x(t+1-r), \ldots, x(t)]$, where $t + 1 > r$, while multi-step predictions are produced autoregressively. For example, when the model predicts $x(t+2)$, it is fed with $[x(t+2-r), \ldots, x(t), x(t+1)]$. According to the above description of DCTCNN, the conditional expectation of the predicted value at each time step can be formulated as

$$\mathbb{E}\big[x(t+1) \mid x(t+1-r), \ldots, x(t)\big] = e_1(x(t+1-r)) + \cdots + e_r(x(t)), \tag{5}$$

where $e_i$ ($i = 1, \ldots, r$) is the expectation contributed by the $i$th dilated neuron; it is learned by the model.
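The autoregressive multi-step procedure can be sketched as a sliding-window loop. The sketch below is illustrative only: `predict_next` is a hypothetical stand-in for a trained network that maps the last $r$ values to the next one, and all names are assumptions rather than part of the paper.

```python
def forecast_autoregressive(predict_next, history, r, steps):
    """Roll a one-step predictor forward `steps` times.

    predict_next: callable taking a list of the last r values and
                  returning the next value (hypothetical interface).
    history:      observed values; at least r of them are required.
    """
    if len(history) < r:
        raise ValueError("history must contain at least r values")
    window = list(history[-r:])
    predictions = []
    for _ in range(steps):
        next_value = predict_next(window)
        predictions.append(next_value)
        # Slide the window: drop the oldest value, append the new prediction.
        window = window[1:] + [next_value]
    return predictions


if __name__ == "__main__":
    # Toy predictor: linear extrapolation from the last two values.
    extrapolate = lambda w: w[-1] + (w[-1] - w[-2])
    print(forecast_autoregressive(extrapolate, [1, 2, 3], r=2, steps=3))
```

Because each prediction is fed back as input, errors can accumulate over the forecast horizon, which is one reason long receptive fields matter for this task.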
Owing to the sparse connections and the parameter sharing mechanism of CNN, DCTCNN can automatically learn translation-invariant features from the input time series. Furthermore, these mechanisms reduce the number of model parameters.
The objective function is given in equation (6). The weights of DCTCNN are trained to minimize the mean squared error (MSE). To avoid overfitting, L2 regularization is applied. Regularization lets the network balance data fitting against model complexity during the training phase and prevents the weight matrix from growing excessively. Thus, L2 regularization helps avoid overfitting and improves the generalization of DCTCNN.

$$E(w) = \frac{1}{N} \sum_{t=0}^{N-1} \big(\hat{x}(t+1) - x(t+1)\big)^2 + \frac{\lambda}{2} \sum_{l} \sum_{h} \lVert w_h^l \rVert^2, \tag{6}$$

where $\lambda$ is the regularization term, $w_h^l$ denotes the weight parameter, and $\hat{x}(t+1)$ is the predicted value of $x(t+1)$ obtained by using $x(0), \ldots, x(t)$.
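A regularized MSE objective of this kind can be sketched as below. The sketch assumes the common form $\frac{1}{N}\sum(\hat{x} - x)^2 + \frac{\lambda}{2}\sum w^2$; the factor $\frac{1}{2}$ on the penalty and the flat list of weights are assumptions made for illustration, and the function name is hypothetical.

```python
def regularized_mse(targets, predictions, weights, lam):
    """Mean squared error plus an L2 penalty on the weights.

    Assumed form: (1/N) * sum((pred - target)^2) + (lam / 2) * sum(w^2).
    """
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have equal length")
    n = len(targets)
    mse = sum((p - y) ** 2 for p, y in zip(predictions, targets)) / n
    penalty = 0.5 * lam * sum(w * w for w in weights)
    return mse + penalty


if __name__ == "__main__":
    print(regularized_mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], [1.0, 2.0], lam=0.1))
```

A larger `lam` pushes the optimizer toward smaller weights at the cost of a looser fit to the training data.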
From the perspective of Bayesian theory, minimizing the above objective is equivalent to maximizing the posterior probability. Furthermore, it is subject to a Laplace distribution with the center of x(t + 1) and the scale of 0.5.
The purpose of model training is to find weight parameters that minimize the loss function. Deep learning networks are generally optimized with gradient descent methods, and so is DCTCNN. The parameters of DCTCNN are updated as follows:

$$w(\tau + 1) = w(\tau) - \eta \nabla E(w(\tau)), \quad \tau = 0, \ldots, T - 1, \tag{7}$$

where $T$ and $\eta$ represent the number of training iterations and the learning rate, respectively. Each iteration $\tau$ contains the computation of the forward predicted time series $\hat{x}$, its corresponding error $E(w(\tau))$, and the backpropagation gradient $\nabla E(w(\tau))$. The derivative with respect to each relevant weight parameter is calculated by using the chain rule in the backpropagation process (see equation (9)). The number of training iterations is set to ensure network convergence; it is 500 in this paper. DCTCNN is trained with Adam [33]; this optimization method adaptively updates the learning rate. At the final layer, the gradient is computed by the chain rule from $a(\cdot)$ and $f$, the activation function and the output feature mapping of the previous layer, respectively. Using a nonlinear activation function in each layer enables the model to learn nonlinear representations from the input monitoring time series. In this paper, the rectified linear unit (ReLU) in equation (10) is applied, and the output of the $l$th layer is given by equation (11):

$$\mathrm{ReLU}(x) = \max(x, 0), \tag{10}$$

$$f^l = \big[\mathrm{ReLU}(w_1^l *_d f^{l-1} + b), \ldots, \mathrm{ReLU}(w_{M_l}^l *_d f^{l-1} + b)\big], \tag{11}$$

where $b \in \mathbb{R}$ is the bias, $*_d$ denotes the convolution with dilation coefficient $d$, and $f^l \in \mathbb{R}^{1 \times N_l \times M_l}$ represents the convolution output with $w_h^l$, $h = 1, \ldots, M_l$. When the network is too "deep," standard backpropagation becomes unstable and the training error increases.
This phenomenon is called performance degradation. To address this problem, residual and skip connections are added to each dilated module of DCTCNN. Connecting the input and output of each convolutional module forces the network to approximately learn a residual mapping instead of the original output. Then, the residual term is passed as input to the next layer through the residual connection, ensuring that no useful information is lost.
Journal of Electrical and Computer Engineering
Above all, the final prediction is computed through the forward pass of the optimized DCTCNN. Considering the scale sensitivity of MAE, the input monitoring time series needs to be standardized. The proposed dilated CNN module is shown in Figure 2.
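The idea of a residual connection around a dilated causal convolution can be sketched as follows. This is a simplified illustration and not the module of Figure 2: it assumes one channel, one filter, and the ordering convolution, then ReLU, then addition of the input, all of which are assumptions; the function names are hypothetical.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(v, 0.0) for v in values]


def causal_dilated_conv1d(x, weights, dilation):
    """y[t] = sum_j weights[j] * x[t - dilation * j], zero for t - dilation * j < 0."""
    return [
        sum(w * x[t - dilation * j]
            for j, w in enumerate(weights)
            if t - dilation * j >= 0)
        for t in range(len(x))
    ]


def residual_block(x, weights, dilation):
    """Output = input + ReLU(causal dilated convolution of the input)."""
    branch = relu(causal_dilated_conv1d(x, weights, dilation))
    return [xi + bi for xi, bi in zip(x, branch)]


if __name__ == "__main__":
    print(residual_block([1.0, -2.0, 3.0], [1.0, 0.0], dilation=1))
```

If the convolution weights are zero, the block reduces to the identity, so stacking more blocks cannot make the representable function worse than that of a shallower network.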
Based on the module in Figure 2, the proposed DCTCNN is built as shown in Figure 3. In the proposed network, a fully connected dense layer with a linear activation outputs the time series predictions. Dropout with a rate of 50% is added to avoid overfitting; it is active only during the training phase. For the prediction task, the inputs of DCTCNN are the multivariable time series and six hyperparameters. The training and testing phases of DCTCNN are summarized in Algorithm 1.

CNN-LSTM Hybrid Model.
The proposed generative model DCTCNN in Section 3.1 takes advantage of the dilated and causal convolution mechanism to address the long-term historical dependence problem. It is based on the idea of expanding the convolutional kernel receptive field. This section instead combines a one-dimensional CNN with LSTM to solve the long-term memory issue of time series prediction. Specifically, the temporal CNN works as an automatic feature extractor that obtains temporal translation-invariant features from the input time series, and the long-term memory capability of LSTM is used to express the long-term historical dependence of the convolutional output. We name this generative network the CNN-LSTM hybrid model. Suppose the $i$th input time series is

$$x_i = \big[x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(l)}\big], \quad x_i^{(t)} \in \mathbb{R}^d,$$

where $l$ is the length of the sequence, $x_i^{(t)}$ denotes the value at time step $t$, and $d$ represents the input dimension (number of input variables).
The first layer of CNN-LSTM is a temporal convolution layer, whose purpose is to automatically extract short-term temporal features from the input multivariable time series. The convolutional operation can be formulated as

$$c_k^{(j)} = w_k \cdot x_{j:j+m-1} + b,$$

where $c_k$ is the output feature mapping of the $k$th convolutional filter and the corresponding weight and bias parameters are $w_k$ and $b$. These kernels slide from the first time step to the last to complete the convolution over the entire input sequence. The extracted feature output is

$$c_k = \big[c_k^{(1)}, c_k^{(2)}, \ldots, c_k^{(l-m+1)}\big],$$

where $m$ is the convolutional filter size. The first layer contains several kernels, which are applied to the windows $x_{1:m}, x_{2:m+1}, \ldots, x_{l-m+1:l}$. After the convolutional layer, a max pooling layer is used to subsample the extracted feature mapping:

$$\hat{c}_k = \operatorname{maxpool}_s(c_k),$$

where $s$ is the pooling size.
As described above, the convolutional layer and the max pooling layer constitute a temporal CNN module. As shown in Figure 4, the input size is n × l × d, in which n is the number of input samples. The output feature mapping size is n × [(l − m)/(s + 1)] × k, where k is the number of kernels. The input multivariable time series are abstracted and compressed from size l to [(l − m)/(s + 1)]. The temporal CNN module essentially acts as an automatic feature extractor.
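The two operations of the temporal CNN module can be sketched for a univariate series as follows. This is an illustrative sketch under assumptions that are not confirmed by the paper: a stride-1 convolution without padding (computed as cross-correlation, as deep learning frameworks usually do), non-overlapping pooling windows, and hypothetical function names. Under these assumptions a series of length $l$ yields $l - m + 1$ convolution outputs, which may differ from the output-size expression quoted in the text.

```python
def conv1d_valid(x, kernel, bias=0.0):
    """Stride-1 convolution without padding: one output per full window."""
    m = len(kernel)
    return [
        sum(kernel[i] * x[j + i] for i in range(m)) + bias
        for j in range(len(x) - m + 1)
    ]


def max_pool1d(features, size):
    """Non-overlapping max pooling (stride equal to the pooling size)."""
    return [
        max(features[i:i + size])
        for i in range(0, len(features) - size + 1, size)
    ]


if __name__ == "__main__":
    series = [1, 3, 2, 5, 4, 6]
    features = conv1d_valid(series, [1, -1])
    print(features, max_pool1d(features, 2))
```

The convolution extracts local, short-term patterns, and pooling shortens the sequence that is later passed to the LSTM.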
After several of the modules shown in Figure 4, the output feature mapping is sent to the long short-term memory cells. Unfolded along the time axis, LSTM has a chain structure, which was proposed to address the vanishing gradients of standard RNN training. A memory state cell is added to LSTM for storing long-term historical information. In addition, three gates are designed to control the data flow. Specifically, the forget, input, and output gates determine the insignificant information, the useful information, and the fused output information, respectively. Temporarily saved information is fused with the previous memory state; in this way, the long-term historical information and the current memory are combined. Therefore, LSTM can deal with the problems of long-term historical dependence and vanishing gradients. The long short-term memory module is illustrated in Figure 5.
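One time step of an LSTM cell can be sketched with scalars to make the role of the three gates concrete. The sketch uses the standard LSTM equations with a one-dimensional input and hidden state; the parameter layout and all names are assumptions for illustration and do not come from the paper.

```python
import math


def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    params maps "forget", "input", "output" and "candidate" to a
    (w_x, w_h, b) triple; this layout is illustrative only.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))        # share of old memory to keep
    i = sigmoid(pre_activation("input"))         # share of new information to store
    o = sigmoid(pre_activation("output"))        # share of memory to expose
    g = math.tanh(pre_activation("candidate"))   # temporarily saved information
    c = f * c_prev + i * g                       # fuse old memory with the candidate
    h = o * math.tanh(c)                         # output of the cell
    return h, c


if __name__ == "__main__":
    zero = {name: (0.0, 0.0, 0.0)
            for name in ("forget", "input", "output", "candidate")}
    print(lstm_step(1.0, 0.0, 2.0, zero))
```

Because the memory state is updated additively, a forget gate close to 1 lets information pass across many time steps almost unchanged.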
After LSTM, the proposed hybrid network applies a linear regression layer to compute the final prediction:

$$\hat{y}_t = h_t W_r + b_r,$$

where $\hat{y}_t$ is the prediction at the current time step, $h_t \in \mathbb{R}^k$ is the LSTM output at that step, $W_r \in \mathbb{R}^{k \times z}$ and $b_r \in \mathbb{R}^z$ are the weight and bias parameters of the linear regression, and $z$ denotes the output dimension.
In the training phase, MSE is used as the objective function of CNN-LSTM.
As in Section 3.1, CNN-LSTM is trained with Adam, and its structure is shown in Figure 6. The training and testing phases of CNN-LSTM are summarized in Algorithm 2.
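Both networks are trained with Adam, and a single Adam update can be sketched in plain Python. The default hyperparameters below are the values reported in the experimental setup (learning rate 0.001, $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$); the function names and the toy quadratic objective are assumptions made for illustration only.

```python
import math


def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a list of parameters; t is the 1-based step count."""
    new_w, new_m, new_v = [], [], []
    for wi, gi, mi, vi in zip(w, grad, m, v):
        mi = beta1 * mi + (1.0 - beta1) * gi        # first-moment estimate
        vi = beta2 * vi + (1.0 - beta2) * gi * gi   # second-moment estimate
        m_hat = mi / (1.0 - beta1 ** t)             # bias-corrected moments
        v_hat = vi / (1.0 - beta2 ** t)
        new_w.append(wi - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_w, new_m, new_v


def minimize_quadratic(steps=500, lr=0.1):
    """Toy usage: minimize (w - 3)^2 starting from w = 0."""
    w, m, v = [0.0], [0.0], [0.0]
    for t in range(1, steps + 1):
        grad = [2.0 * (w[0] - 3.0)]
        w, m, v = adam_step(w, grad, m, v, t, lr=lr)
    return w[0]


if __name__ == "__main__":
    print(minimize_quadratic())
```

The per-parameter scaling by the second-moment estimate is what makes the effective step size adaptive, so the same base learning rate can be used for weights with very different gradient magnitudes.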

Experimental Setup
The proposed generative temporal convolutional networks were tested on two real-life underground coal mine seismic datasets, which are shown in Table 1. The UCI seismic bumps dataset [34] describes the problem of forecasting high-energy seismic bumps in the Polish coal mine Wesola. It contains 2584 samples and 18 columns. We split this dataset into 60%, 10%, and 30% as training set, validation set, and test set, respectively. The risk monitoring system installed in Wesola gives a seismic hazard alarm if the accumulated seismic energy is higher than $5 \times 10^4$ J. The other seismic dataset is from the AAIA′16 Data Mining Challenge (Predicting Dangerous Seismic Events in Active Coal Mines [35]). The organizer, Knowledge Pit, offered a large-volume dataset for predicting increased coal mine seismic activity that endangers coal workers underground. The AAIA′16 seismic dataset was prepared by the organizer as a training set (133151 samples) and a test set (3860 samples), and we held out 10% of the training set for validation.
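A 60%/10%/30% split of an ordered dataset can be sketched as follows. The text does not state whether the split is chronological or shuffled; the sketch assumes a chronological split, which is a common choice for time series, and the function name is hypothetical.

```python
def chronological_split(samples, train_frac=0.6, val_frac=0.1):
    """Split an ordered sequence into train, validation and test parts.

    The order is preserved (assumption: no shuffling), so the test part
    always lies after the training and validation parts in time.
    """
    n = len(samples)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return samples[:train_end], samples[train_end:val_end], samples[val_end:]


if __name__ == "__main__":
    train, val, test = chronological_split(list(range(2584)))
    print(len(train), len(val), len(test))
```

Keeping the temporal order avoids training on observations that lie after the ones used for testing.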
In the experiments, our goal is to predict the total seismic energies.
Therefore, we assigned column 17 of UCI seismic bumps and column 5 of AAIA′16 as the label columns. Note that all the datasets in Table 1 were well prepared and cleaned of malformed and erroneous values, and they have no missing attributes. Therefore, we test DCTCNN and CNN-LSTM on these real-life datasets to perform coal mine seismic events prediction.
To make the experiment objective and fair, DCTCNN and CNN-LSTM were tested against ARIMA, support vector regression (SVR), random forest regression (RF), temporal CNN, and standard LSTM. We held out 10% of the experimental training sets for validation. For comprehensive evaluation, this work adopted 5-fold cross validation. Root mean squared error (RMSE) and mean absolute error (MAE) were used to test the performance of those methods before inverse standardization. Apart from standardization, the proposed networks did not involve any manual feature engineering. In particular, the predicted results were inversely standardized after the testing phase. The tested networks were trained with Adam, with a learning rate of 0.001, $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$. The batch size and the number of training iterations were fixed at 64 and 500, respectively. All the deep neural networks were implemented in the deep learning framework TensorFlow. Experiments were executed on a PC with an Intel i7-6700K 4.0 GHz processor, 32 GB RAM, and a GTX1080 GPU accelerator.
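The standardization step and the two error metrics can be sketched in plain Python. The sketch assumes z-score standardization with the population standard deviation, which the text does not specify, and the function names are hypothetical.

```python
import math


def standardize(values):
    """Z-score standardization; returns the scaled values, mean and std."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0.0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mean) / std for v in values], mean, std


def inverse_standardize(z_values, mean, std):
    """Map standardized values back to the original scale."""
    return [z * std + mean for z in z_values]


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(
        sum((p - y) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)
    )


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - y) for y, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    z, mean, std = standardize([2, 4, 4, 4, 5, 5, 7, 9])
    print(mean, std, rmse([1, 2, 3], [1, 2, 5]), mae([1, 2, 3], [1, 2, 5]))
```

Errors computed on standardized values are unitless, so they are comparable across the two datasets but are not expressed in joules.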

Results and Discussion
The proposed networks were compared with the abovementioned five methods. Their performance on underground coal mine seismic events prediction is shown in Tables 2 and 3.
Among the seven compared temporal sequence prediction methods, the traditional time series prediction method ARIMA performed worst on both error metrics. For the classic machine learning methods, the RMSE scores of support vector regression and random forest regression were (0.213, 0.698) and (0.108, 0.455), respectively. Their MAE scores were (0.076, 0.586) and (0.048, 0.045). SVR and RF worked better and ran faster than the traditional time series prediction algorithm. However, the conventional machine learning algorithms were still inferior to the deep learning algorithms. The results of LSTM were (0.010, 0.366) RMSE and (0.026, 0.284) MAE; the corresponding scores of the standard temporal convolutional network (TCN) were (0.006, 0.198) and (0.009, 0.159). LSTM ran for 46.3 s on the UCI seismic bumps dataset and 1496.2 s on AAIA′16 seismic for 500 iterations. TCN ran for 26.1 s on the UCI seismic bumps dataset and 1347.7 s on AAIA′16 seismic for 500 iterations. The proposed generative networks performed better than the above algorithms. DCTCNN worked best, with ($6.673 \times 10^{-4}$, $6.080 \times 10^{-3}$) RMSE and ($1.031 \times 10^{-3}$, 0.070) MAE on the two datasets; its running time was 79.2 s and 4729.3 s. The CNN-LSTM hybrid network obtained ($4.950 \times 10^{-3}$, 0.011) RMSE and ($3.667 \times 10^{-3}$, 0.109) MAE; its running time was 60.5 s and 2302.4 s. Although DCTCNN and CNN-LSTM were not the fastest models, their results were better than those of the above algorithms by 2 to 3 orders of magnitude on UCI seismic bumps and by 1 to 2 orders of magnitude on AAIA′16 seismic. This illustrates that they are more appropriate for underground coal mine seismic events prediction tasks.
The performance of the above seven approaches is shown in Figures 7-13.
From the above figures, the energies of many coal mine seismic activities are higher than $5 \times 10^4$ J. These seismic events are extremely dangerous, and they need to be forecast in advance. As shown in Figure 7, the traditional time series prediction method failed completely. ARIMA just fitted the data under its default hypothesis of linear characteristics. However, underground coal mine seismic prediction is a nonlinear task; it is not appropriate to predict rock burst hazards in a linear way. Figures 8 and 9 show that the machine learning-based algorithms performed much better than ARIMA, and they can forecast several mildly dangerous events that are below the alarm threshold ($5 \times 10^4$ J). For the activities beyond the alarm threshold, RF did better than SVR. The support vector machine is a generalized linear model that can be made nonlinear with kernel functions such as the radial basis function (RBF), but it may overfit the training set. Random forest, in contrast, is a tree-based ensemble algorithm that bags several randomized decision trees during the training phase. Thus, RF can generalize better than SVR. This is why RF could predict several hazards in Figure 9. Nonetheless, neither RF nor SVR performed well enough on coal mine seismic prediction.
Unlike classic machine learning algorithms, deep learning networks do not rely on hand-crafted feature engineering; they learn end to end, and high-level abstractions are extracted through their hierarchical structures. As shown in Figures 10 and 11, LSTM and temporal CNN performed much better than RF and SVR. Below the alarm threshold, their predictions cover almost all the mildly dangerous events. Surprisingly, TCNN also forecasted several risk events and worked better than the sequential algorithm LSTM. This observation supports the description in Section 1 and some previous works [15][16][17]. TCNN can automatically extract time-translation-invariant features with one-dimensional CNN kernels. However, it did not catch the risk events whose seismic energy reached the alarm threshold. This can be explained by the lack of long-term historical memory in TCNN, which is the core issue of underground coal mine seismic prediction.
For addressing the problem of long-term historical dependence in risk prediction, this work proposed two generative temporal CNN-based networks: dilated causal temporal CNN (DCTCNN) and CNN-LSTM hybrid model.
Their performance on underground coal mine seismic prediction can be seen in Figures 12 and 13. They successfully complete the prediction task, and all the seismic hazards were accurately forecast in advance. DCTCNN solves the long-term historical memory problem with expanded convolutional receptive fields, which let it obtain more historical information from the input monitoring time series. The CNN-LSTM hybrid network addresses the same issue by combining the advantages of CNN and LSTM. On a small dataset, it is difficult to identify which is the better model. However, DCTCNN beat CNN-LSTM on the large-volume seismic dataset, while CNN-LSTM needed less running time. From this perspective, the power of deep learning on large-volume data is demonstrated, and it can be clearly observed in Figures 7-13.
The training processes of DCTCNN and CNN-LSTM can be seen in Figures 14 and 15. From the model training perspective, DCTCNN trained more stably and converged better than CNN-LSTM. In particular, the mechanism of DCTCNN does not cause an extra increase in the number of parameters. Furthermore, the residual connections and L2 regularization avoid information omission and model overfitting. However, CNN-LSTM has a simpler network structure, and it can be trained about twice as fast as DCTCNN. Furthermore, CNN-LSTM is more appropriate in long sequence situations, whereas DCTCNN gives better predictions.

Conclusions
In this paper, we proposed two generative deep learning networks in order to address the long-term historical memory issue of underground coal mine seismic events prediction. [...] Thus, DCTCNN can forecast underground coal mine seismic events effectively. In future work, seismic events prediction with deep temporal convolutional networks can be extended with further investigations. First, the effect of the monitoring seismic dataset on the proposed networks will be analyzed, and the networks will be validated on larger underground coal mine seismic time series datasets. Second, the spatial information of the coal mine roadway, e.g., distances between seismic monitoring points or other spatial features, will be taken into account. Third, the experimental datasets will be collected from more than one coal mine, and the proposed temporal convolutional networks will be trained in a transfer learning way.

Table 2: Performance of the seven compared methods on the UCI seismic bumps dataset.