An automatic P-wave onset time picking method for mining-induced microseismic data based on long short-term memory deep neural network

Abstract The automatic P-wave onset time (P-onset) picking of microseismic (MS) waveforms generated during rock failure is the basis of and key to locating the source and exploring the failure mechanism of rock failure in underground engineering. However, it is hard to ensure the accuracy in picking P-onset of MS waveforms with a low signal-to-noise ratio (SNR) mixed with noises. Moreover, traditional P-onset picking methods mainly depend on manual intervention, which imposes onerous requirements on the SNR of MS waveforms. Hence, a new P-onset picking method is proposed based on the deep learning model: the principal components of MS waveforms are extracted based on the multi-channel singular spectrum analysis (MSSA) method to reduce the influence of noise; afterwards, a short-time Fourier transform (STFT) is applied to the processed MS waveforms to attain essential parameters of the waveforms; finally, a model for P-onset picking is established based on the long short-term memory (LSTM) network. Subsequently, a comparison is conducted between the proposed method and other methods based on actual field MS data collected from Xiadian Gold Mine and the seismic data from Stanford Earthquake Dataset. The results show that the proposed method can accurately extract data features of MS waveforms and further improve the P-onset picking performance.


Introduction
In recent years, the MS monitoring technology has shown its remarkable advantages in three-dimensional (3-d) real-time monitoring and spatio-temporal early warning of disasters. It has been widely used in monitoring various ground pressure-related disasters (including rockburst, rib spalling, and roof fall of roadways, and caving of the goaf) occurring during the excavation of long-distance, large-scale and deep group geotechnical engineering operations (Dai et al. 2016, Zhao et al. 2018. By identifying and analyzing the MS waveforms, scholars have further investigated a series of engineering problems (e.g. water inrush (Zhao et al. 2020), rockburst (Lu et al. 2015), and gas outburst (Lu et al. 2014)) and achieved favourable research results.
The deformation and failure of rock mass essentially mean the initiation, propagation, and interaction of fractures under engineering-induced disturbance, during which MS waveforms are released and can be synchronously recorded by multiple MS sensors distributed in the vicinity. Performing inversion analysis on the MS waveforms containing rock failure information can rapidly acquire important data such as three factors (i.e. time, location, and magnitude) pertinent to rock failure and source mechanism (Backers et al. 2005). The analysis result of the aforementioned data is mainly dependent on the accuracy of the P-onset (K€ uperkoch et al. 2012). Inaccurate P-onset picking directly influences the location accuracy of MS events. Besides, inaccurate P-onset picking also significantly affects the inversion result of the rock failure mechanism (Linzer et al. 2015, Vavrycuk 2017. The data with a low signal-tonoise ratio (SNR) will lead to inaccurate P-onset, thus further showing unfavourable influences on the moment tensor inversion. Therefore, picking the accurate P-onset is the basis of, and key to, performing seismic source location and later inversion of failure mechanism. The MS waveforms in mines are more likely to be influenced by the industrial noise, the wave propagation, and the distribution of the monitoring sensors; the monitored MS waveforms show complex components and datasets are large; there are specific requirements on the timeliness of processing the MS data and the location accuracy in mine fields. These requirements introduce a difficulty to engineers in processing and analyzing the data. Therefore, it can be seen that the research of the P-onset picking method is a challenging subject, and the solution of the problem has important engineering value.
Conventional P-onset picking methods can work well when used in MS waveform with a high SNR, such as the widely applied STA/LTA method (Allen 1978, Stevenson 1976 and Auto-Regression model and Akaike Information Criterion (AR-AIC) method (Sleeman and van Eck 1999). However, it is challenging to pick the P-onset from MS waveforms with a low SNR. Given this, scholars have proposed various improved methods to increase the picking accuracy of the P-onset based on conventional methods. For example, in improving the STA/LTA method, K€ uperkoch et al. (2012) proposed picking the P-onset with high-order statistics and attained the characteristic function by calculating the skewness and kurtosis within sliding windows. The core of the STA/LTA method is the construction of characteristic functions. Zhu et al. (2017) proposed an improved time-window energy eigenvalue method, introducing an edge detection factor and a constraint factor to help identify the initial onset. Mousavi et al. (2016) proposed another energy-ratio-based method in the wavelet domain. These modified STA/LTA methods based on the improved eigenfunction have advantages for different characteristic waveforms. However, such methods need to set a fixed trigger threshold, which can not meet the actual needs of mine MS monitoring. Additionally, scholars have also investigated P-onset picking under various techniques, including fractal theory (Chi-Dur an et al. 2017), cross-correlation (Akhouayri et al. 2014), SVD-based method (Iqbal et al. 2016), classificationbased machine learning methods (Chen 2018, Chen 2020, and neural networks Beroza 2018, Mousavi et al. 2020).
Due to the excellent feature extraction ability of deep learning technology than traditional methods, it has achieved great success in computer vision, machine translation, signal processing, and other fields (Lv et al. 2019, Singh et al. 2021. Scholars have carried out meaningful research on seismic waveform P-onset picking using the deep learning method in recent years. For example, in the convolutional neural network (CNN) (one of the typical deep learning models) based works, Chen et al. (2019) combined the convolutional neural network (CNN) method with the clustering-based method to obtain an even better P-onset picking result. Ross et al. (2018) transformed the problem of P-onset picking into the problem of image feature extraction and realized the P-onset picking using the excellent feature extraction function of CNN. Zhang et al. (2020) developed a new P-onset picking method by combining CNN with the continuous wavelet transform (CWT) method, which improves the robustness of the arrival picker to noise. Othman et al. (2022) proposed an automated event detection and denoising method using residual deep CNN. A residual deep neural network is customized and employed to detect passive seismic events in this work. It exhibits high accuracy in detecting and denoising events from real passive seismic data sets.
However, the information in the CNN is transferred unidirectionally, and there is no connection between the nodes in each layer, so it is impossible to simulate time-dependent relationships therewith. As a result, the CNN could not extract the time-domain features of the monitoring signal. Then, considering the features of the original time series of MS signals, the P-onset picking method may achieve better results, which needs to be studied. A recurrent neural networks (RNN) model-long short-term memory networks (LSTM) has a cyclic network structure and the ability to memorize historical information. That is to say, the LSTM can extract the time domain characteristics of time-series signals. At present, the LSTM has been widely used for the analysis of time-series signals, such as those required for traffic speed prediction (Ma et al. 2015), power load forecasting (Zheng et al. 2017), and speech recognition (Graves and Jaitly 2014). However, in the mine MS monitoring field, scholars have done little research on the LSTM in MS signal P-wave picking. As a result, they have not effectively combined the advantage of LSTM with the time-series feature of the MS signal. Therefore, this paper uses the advantages of LSTM to study the P-onset picking method.
Hence, a supervised method for jointly picking the P-onset from MS waveforms generated during mining based on MSSA (pre-processing, extracting valid MS waveform components) and LSTM deep neural network (for training an accurate P-onset picking model) is proposed. Finally, the proposed method is verified based on the actual MS waveforms recorded during the rock failure in the test stope of the Xiadian Gold Mine in Shandong Province, China, for automatic, high-accuracy processing of MS data. This paper aims to realize the accurate picking P-onset of MS waveforms, which will lay a foundation for the subsequent high-precision location and inversion analysis of rock failure mechanisms.

MS monitoring data in Xiadian gold mine
The Xiadian Gold Mine, located in Zhaoyuan City, the 'Gold Capital' of China, is one of ten mines producing gold in China. The vein-like and quasi-lamellar main orebody of the mine occurs at the elevation range of À600 to À1470 m, extending gently and wavily along with the strike and dip (left panel, Figure 1). Xiadian Gold Mine, with a current mining depth of 800 m, is typical of deep mining operations in China. The underground mining environment in the mining area is adverse, especially under the long-term effect of high in-situ stress. A large amount of elastic energy is accumulated in surrounding rocks, having the occurrence risk of severe ground pressure behaviour (rockburst). At present, to effectively grasp the stability of surrounding rock during deep mining, a test stope (lower right panel, Figure 1) was set between the 550 and 551 exploration lines at a sub-level of À700 m in the mine. Based on the prevailing conditions on-site, six MS sensors (five single-component geophones and one three-component geophone with bandwidth of 9 to 2000 Hz and sensitivity of 80 V/m/s ± 10%) with 6000 Hz sampling frequency were distributed near the sub-levels of À692 m and À700 m in the test stope, guaranteeing good location accuracy of MS events. Four and two MS sensors were separately distributed at sub-levels of À692 m and À700 m, respectively, as shown in the right panel in Figure 1. MS sensors were densely arranged in the range of 4109813.06 m to 4109931.55 m in the north and 529676.57 m to 529753.62 m in the East. During the mining period of the test stope, 9216 waveforms were collected by the MS monitoring system, and the adequate monitoring time was from November 9, 2015, to March 31, 2016. After eliminating blasting, mechanical vibration, and other interference waveforms, 4465 effective MS waveforms induced by rock failure were selected. Finally, 4465 waveforms are expanded to 10000 by waveform translation to ensure enough training samples.

Public seismic datasets -Stanford earthquake dataset (STEAD)
In addition, to more rigorously clarify the P-onset picking performance of the model proposed in this paper, STEAD (Mousavi et al. 2019) is used as the benchmark dataset. STEAD gives the results of manually picked P-onset, SNR, and other important information of each seismic waveform, which is convenient for the training and testing of the P-onset picking model. It facilitates training, validation, performance comparisons, and the adoption of best practices. Moreover, this data set could have applications beyond seismology. The database is publicly available through https:// github.com/smousavi05/STEAD. The training and testing of the P-onset picking model mainly use 10000 seismic waveforms with manually P-onset picking results in "chunk 2" of STEAD.

Theory
In this section, with the help of the multi-channel singular spectrum analysis (MSSA) method and LSTM deep neural network, a new joint picking method (MSSA-LSTM P-onset picking method) for mining-induced MS waveform is proposed. The selection of the MSSA method is because the method can extract valid MS components in the sequence of waveforms by extracting the generality (representing the attribute of a failure source) of multiple waveforms from the same seismic source. Furthermore, it weakens the effect of noise on the first motion amplitude, thus improving the accuracy of P-onset picking. In addition, based on the phase space reconstruction, the method does not change the phase and frequency of waveforms and thus maintains the properties of the waveforms (Zhigljavsky 2010). The basis of selecting the LSTM model is that LSTM can memorize the historical information and thus show the unique superiority in processing the sequence waveforms. Furthermore, it is feasible to train the corresponding characteristics of P-onset by summarizing the trend in the waveforms before and after P-onset.
3.1. MS waveform pre-processing method using MSSA method 3.1.1. Basic principle of the MSSA The MSSA is extended based on singular spectrum analysis (SSA) proposed by Read (1993), which is used to process multi-dimensional sequence waveforms. As an emerging time-series analysis method in recent years, MSSA allows the data space to be projected into subspaces with different features and further characterizes the properties of the subspaces with singular values. On this basis, the principal components (PCs) of waveforms can be extracted using the principle of reduced rank (Zhigljavsky 2010). The MSSA is more advantageous than traditional spectrum analysis methods as it not only can identify different frequencies of waveforms but can sort the waveforms according to their energy levels.
The same seismic source inducing rock failure comprises four or more MS waveforms (the datum belongs to a multi-dimensional sequence). Moreover, the released MS waveforms during rock failure often show a similar trend over time . Therefore, it is feasible to extract the components representing the characteristics of the seismic source with the MSSA method to weaken the influence of noise. The MSSA for MS waveforms mainly involves three parts: calculating the tracking matrix based on original waveforms, decomposition of singular values of the matrix, and reconstructing PCs of MS waveforms.
The deduction behind the method is described as follows (Huang et al. 2016, Oropeza andSacchi 2011): The MS waveforms for a seismic source consisting of L waveforms, each of which is composed of N sampling points, is displayed as follows: (1) The time series x ðlÞ i in the lth channel is arranged as a track matrix with M rows and N-M þ 1 columns according to the embedding dimension M： . . . ( 2) where, X represents the matrix with L Â M rows and N-M þ 1 columns, whose covariance matrix C X ¼ XX T belongs to an L Â M Toeplitz matrix; where, C ll 0 refers to the covariance matrix for the lag between MS waveforms in the l th and l 0th channels. The optimal values of the elements in the j th row and j 0th column are estimated with Yule-Walker method: The singular value decomposition is conducted on C X to attain the singular value vector R and the corresponding eigenvector matrix P. The singular spectrum r is solved. A large singular value reflects the PCs (generalities) of multiple waveforms from the failure seismic source and a low value represents noise components. The orthogonal projection a i, k of the l th original waveform x ðlÞ i from the seismic source on the eigenvector P k is expressed as follows: where, a i, k refers to the k th PC; P ðlÞ j, k denotes the component of the k th eigenvector P k in the l th channel for the lag time j, which reflects the spatial evolution (with channel l) and also temporal evolution (with time j).
Afterwards, the original multi-dimensional MS waveformsx ðlÞ i, k (that is, the principal reconstructed component (RC)) from the seismic source can be reconstructed based on the k th PC a i, k and the eigenvector P k : Hence, the result of the linear addition of L Â M RCs (x ðlÞ i, k ) is equivalent to the original MS waveform x ðlÞ i from the seismic source. To extract the PCs of the multiple waveforms from the seismic source, it is feasible to select the first K RCs with the most significant contributions. Thus, the approximationx of the original multidimensional MS waveform from the seismic source can be expressed as follows: i, k , :::, The new multi-dimensional MS waveforms constructed through Eq. (7) correspond to the PCs of the original multi-dimensional MS waveforms from the seismic source, which remain valid waveforms for rock failure that can reflect the properties of the seismic source and ignore the interference from noise as far as possible.

Characteristics of singular spectrum of mining-induced MS waveforms
The characteristics of the singular spectrum are analyzed by selecting the typical MS waveforms taken from Xiadian Gold Mine. An MS waveform with the length N of 4000 and the sampling frequency of 6000 Hz stochastically selected from the MS database is used for analysis, with embedding dimension M ¼ 20 determined by the method proposed by Cao (1997). Figure 2 shows the MS waveform and characteristics of the singular values of the MS waveform. Figure 3 lists the distribution of energy contributions of various RCs. Figure 2b shows the first six principal RCs: the amplitude of the RC 1 is high, at about ten times that of RC 2. The waveform amplitudes gradually reduce from RC 2 to RC 6, in which the amplitude of RC 1 is about 8000 times that of RC 6 (RC 3 to RC 6 contain significant noise). Figure 2c displays the changes in the singular values of the waveform. Similarly, the first singular value is the largest, while the ratio of the first to the second singular values under the effect of noise is lower than that in waveforms unaffected by noise. Figure 3 shows that the energy contributions of RC 1 and RC 2 are 89.75% and 10.02%, respectively. According to the principle that the energy of a valid signal component is higher than that of noise in MS waveforms, it can be thought that RC 1 is the valid PC of the waveform, and the signals reconstructed from the rest of the singular values are noise. By comparing RC 1 with the original waveforms (Figure 2d), valid MS waveform components are retained and noise removed from the waveform of RC 1, thus improving the picking accuracy of the P-onset.

LSTM network
3.2.1. The basic structural principle At present, commonly used deep neural networks include feedforward neural network (FNN), CNN and RNN. In the FNN, the information transmission is unidirectional, and nodes in each layer are not connected. As a result, the network's output each time only relies on the current input, so it fails to clarify the simulated temporal relationship. On the other hand, A RNN can be considered as the extension of FNN at the time scale, which considers the sequential relationship of the input information at an adjacent step time and can memorize historical information. The network structure of RNN is displayed in Figure 4. X, S, and O denote vectors, representing the values in the input layer, hidden layer, and output layer, respectively; U denotes the weight matrix from the input layer to the hidden layer while V refers to the weight matrix from the hidden layer to the output layer; W represents the recurrent weight matrix between the hidden layers at the current time step and at the adjacent time step.
The circulation layer is calculated as follows (Graves 2012a): where, O t , S t , and x t refer to the output value of the network, the value of the hidden layer, and the input value of the network, all at the time t, respectively; S tÀ1 represents the value of the hidden layer at t-1; gðÁÞ and f ðÁÞ denote the corresponding activation functions. It can be seen from the above equations that the value of S t is not only related to x t but also to S tÀ1 , that is, the output at each time step contains some information about all input vectors at the current time and that before. It means that the connected structures in an RNN can memorize the input information at the historical time steps in the network and persistently influence the subsequent state of the network to some extent, thus affecting the final output of the network. This is why the RNN is able to memorize historical information. Under the memory function, RNN is suitable for processing temporal or spatial sequence problems. However, the vanishing and exploding gradient problems will occur in the training process with RNN with the growth of the depth of the model owing to many parameters being required to be trained (Bengio et al. 1994, Graves 2012. To solve this problem, Hochreiter and Schmiduber. (Graves 2012) first proposed LSTM method.

Construction of the LSTM model
The LSTM network is composed of several typical LSTM units in series, and the unit structure is shown in Figure 5. The LSTM cell attempts to add or eliminate information for neurons with the introducing the concept of a 'gate' that is, controlling the input and output of information to realize the memory function. The LSTM unit contains three gates (an input gate, output gate, and forget gate), which are used to protect and control the state of the storage cell C.
The forget gate aims to control how much information in the storage cell C tÀ1 at the previous moment is to be kept in the storage cell C t at the current moment. The input gate mainly controls how much information on the input X t at the current moment is to be stored in the storage cell C t : The output gate is mainly used to control how much information in the storage cell C t at a certain moment is passed to the hidden state h t of the cell. In a forget gate, the forgotten part of a state memory cell is commonly determined by input X t , state memory cell C tÀ1 , and intermediate output h tÀ1 : X t in an input gate transformed by sigmoid and tanh functions commonly determines the retaining vector in the state memory cell. The intermediate output h t is jointly determined by updated C t and output O t : The above working principle is exemplified by the following equations: where, f t , i t , g t , O t , C t , and h t separately represent the forget gate, input gate, input node, output gate, and memory cell and state of immediate output; W fx , W fh , W ix , W ih , W gx , W gh , W ox , and W oh indicate the matrix weights of the corresponding gate multiplied by the input X t and intermediate output h tÀ1 , respectively; b f , b i , b g , and b o denote the bias terms of the corresponding gates; stands for that elements in the vector are subjected to bit-wise multiplication; r and / refer to the sigmoid and tanh functions, respectively. Through the operation of the aforementioned gates, it is feasible to record the historical information of the time series and memorize the attributes at the current and previous times. Due to that characteristic, the LSTM can be used to process the Ponset problem. In the learning and training of the LSTM, the error back-propagation method is used to update the weight and bias. A loss function obtained based on the common error evaluation method (that is, minimizing the sum of mean errors) is established as the objective function of the training.
where, e, y t and p t denote the loss function (that is, the square sum of errors), the true output value and the forecast value based on the network, respectively.
3.3. MSSA-LSTM P-onset picking algorithm procedure MS waveforms are time series data; they contain background noise signals in the front, followed by a P-wave, then the superposition of an S-wave, and finally the appearance of a coda wave with the attenuation of the waveform. According to the characteristics of MS waveforms, a segment of MS waveform can be divided into a noise section, a signal section, and a coda wave section ( Figure 6). Here, the coda wave section is defined as a waveform section with an amplitude lower than a certain threshold after the waveforms are attenuated, in which the first motion amplitude is taken as the threshold. As the working principle of the LSTM described above, the corresponding output of the P-onset not only depends on the input at the current time but also on the input at the previous time for a segment of MS waveform sequence. That is, whether a sampling point belongs to the noise section, signal section, or coda wave section is determined by summarizing characteristics of input points at the current time and their difference with those at the previous time. Through constant training, it is possible to determine the signal section from a segment of MS waveforms, and the first sampling point in the signal section corresponds exactly to the P-onset.
To clearly understand the content of the MSSA-LSTM P-onset picking method proposed in this paper, the detailed implementation steps are summarized as follows, the flowchart is shown in Figure 7   At first, MS waveforms are filtered according to a specific frequency range (40 $ 500 Hz determined by spectrum analysis (Zhao et al. 2021)) to eliminate the noise frequencies which cannot exist in the MS frequency range.
Next, waveforms are aligned (the P-wave in all waveforms from the same MS event are aligned). The alignment of signals refers to the method proposed by (Vavrycuk 2017). The waveform with the largest SNR is selected from all received waveforms from each MS event and then subject to cross-correlation with the other waveforms. Under the translation of waveforms, it is feasible to realize the maximum correlation between the other waveforms and this waveform. Therefore, it can more accurately extract the valid components from the aligned waveforms in subsequent analysis based on the MSSA method. Iqbal et al. (2017) and Liu et al. (2017) also give more effective waveform alignment methods.
Afterwards, the waveforms are processed using the manual P-onset picking method, and then the P-onset is marked.

Extracting RC 1 of MS waveforms by the MSSA method
The flowchart is shown in Figure 7b： After selecting an appropriate embedding dimension, a tracking matrix is constructed; the covariance matrix is solved; singular value decomposition is performed, and the energy contribution degree (CD) of RC 1 (the first principal reconstructed component) of each waveform is separately calculated. The threshold e 1 of CD is set (80% in the present study). If the value of CD exceeds 80% (that is, requiring that RC 1 plays a dominant role), the requirement is satisfied. Furthermore, the result of RC 1 of the waveform representing the seismic wave is preserved. If the value of CD is lower than 80%, the requirement is not satisfied. In this case, the waveform of the seismic source is abandoned.
The implementation of steps 1 and 2 is displayed in Figure 8. The eight waveforms in the figure originate from a certain MS event. The result of the aligned waveforms is shown in Figure 8a. The first four PCs (PC 1 to PC 4) after MSSA solving are shown in Figure 8b. The comparison between PC 1 and the original waveform of the seismic source is displayed in Figure 8c. According to Eq. (7), RC 1 of each waveform of the seismic source can be attained according to PC 1.

Building an LSTM network model for P-onset picking
The flowchart is shown in Figure 7c, and the concept map of building P-onset picking LSTM model is shown in Figure 9: RC 1 of all waveforms of the training samples is subjected to the short-time Fourier transform (STFT) method. The noise components of MS waveforms are complex, and thus it is hard to satisfy the requirement if the P-onset picking is performed based on the original waveform. Hence, the MS waveforms processed by the MSSA method are processed through STFT to attain the essential parameters (e.g. time, frequency, and amplitude characteristics of waveforms) directly that can characterize the MS waveforms (Durak and Arikan 2003). For example, Figure 10 shows the STFT result of the waveform from a specific MS event after feature extraction through MSSA. Noise is inhibited after feature extraction through the MSSA. The changes in the amplitude and frequency of the waveforms (lower panel, Figure 10) after being processed through the MSSA are more significant than those before being processed (upper panel, Figure 10) when the P-wave approaches: the amplitude is enhanced, and the frequency range is broadened; the noise section greatly differs from the signal section, and thus it is easier to extract the P-onset.
In the time-frequency matrix obtained through STFT, the corresponding timeamplitude waveforms are extracted as the characteristics according to different frequency ranges (leftmost panel, Figure 10). According to the fast Fourier transform length of 512, each waveform is divided into 34 time-amplitude waveforms with different frequencies to form a characteristic sequence according to the sampling frequency of MS waveforms. This is intended to acquire a more accurate result by using more characteristics.
Design an LSTM network model for picking P-onset: Each sampling point of the MS waveform is given a label, that is, noise, signal, or coda. The labelled time series data are input into the LSTM network model, and the procedure is shown in Figure  9. Through multiple attempts based on the LSTM network model, 34 LSTM cells are finally attained, in which each cell contains 600 neurons. In the LSTM cells, the dynamic information of the character sequence is recurrently transferred, extracted, and stored according to the temporal order. The LSTM cells are connected with the fully connected layer, in which a further 600 neurons are set. The linear combination of the output of all LSTM cells at the last time step is taken as the input of the fully connected layer to combine the dynamic characteristic information of MS waveforms learned from various LSTM cells at different times. Subsequently, the noise section, signal section, and coda section are distinguished in the Softmax regression layer; finally, the position of the initial point of the signal section corresponds exactly to the P-onset.

Testing the P-onset picking accuracy
The flowchart is shown in Figure 7d: The test data are input; P-onset picking is performed with the aid of the LSTM network model trained through the aforementioned steps, and then the result is compared with that obtained through manual P-onset picking.
The threshold e 2 of picking accuracy is set (90% in the study). That is, if the percentage of the error of P-onset picked through the model based on test data and the manual method within 20 sampling points exceeds 90%, the LSTM model satisfies the requirement and is preserved; otherwise, the model is further trained with increasing training samples until it satisfies the requirement.
The 70% data are stochastically taken from the MS waveform library as the training samples, and the other 30% data are used for the validation.

Evaluation indices
To verify the P-onset picking accuracy of the proposed model MSSA-LSTM, three evaluation indices (i.e. mean absolute error (MAE), mean absolute percentage error (MAPE), and root-mean-square error (RMSE) are used to evaluate the P-onset picking performance. The three evaluation indices are separately defined as follows: xðtÞ ÀxðtÞ j j (17) where, xðtÞ refers to the P-onset picked by manual method, and x _ ðtÞ refers to the Ponset picked by an automatic method.

Analysis of P-onset picking results of MS monitoring data
In order to evaluate the P-onset picking performance of the proposed MSSA-LSTM method, the test data of the mining-induced MS waveforms recorded from the Xiadian Gold Mine are used. The P-onset picking is performed on the test samples separately through MSSA-LSTM, LSTM (the waveform is not processed with MSSA), CNN (Ross et al. 2018), front-term average/behind-term average (BTA/FTA), AR-AIC, and manual methods. BTA/FTA method is one of the optimization methods of STA/LTA method. BTA/FTA method is possible to determine the P-onset by searching for the global maximum, and it is unnecessary to set thresholds (FeedMeImATroll 2022). The P-onset picking performance of BTA/FTA method has been confirmed by scholars (Vavrycuk 2017, Vavry cuk et al. 2021. Figure 13 shows the P-onset picking accuracy within the error range of 0 ms À 20 ms picked using MSSA-LSTM, LSTM, CNN, BTA/FTA, and AR-AIC methods. It can be seen from Figure 11 that the accuracy curves of the five P-onset picking methods rise rapidly in the range of 0 ms À 2.5 ms and tend to be stable after 10 ms. In other words, the P-onset picking errors of these five methods are mainly within 10 ms. The P-onset pickling accuracy of MSSA-LSTM, LSTM, CNN, BTA/FTA, and AR-AIC methods within 10 ms is 93.70%, 90.35%, 88.72%, 85.65%, and 84.95%, respectively. The P-onset picking accuracy of the MSSA-LSTM method is higher than that of the LSTM method due to the principal component extraction function of the Figure 11. The P-onset picking accuracy of MS test data using different methods within the error range of 0 ms -20 ms. MSSA method. The P-onset picking accuracy of the LSTM method is close to CNN but better than BTA/FTA and AR-AIC methods. Figure 12 shows the histogram of the P-onset picking error of MS data using MSSA-LSTM, CNN, BTA/FTA, and AR-AIC methods. The gray dotted line in Figure  12 is the boundary between À10 ms and 10 ms. When the P-onset picking error of an MS waveform is within À10 ms to 10 ms, the P-onset picking of the MS waveform can be regarded as adequate. It can be internally found from the figure that the Ponset picking error of MSSA-LSTM, CNN, BTA/FTA, and AR-AIC methods is mainly concentrated in the range of À 10 ms to 10 ms. However, the P-onset picking error of the MSSA-LSTM method is more concentrated within À 10 ms and 10 ms, followed by CNN, BTA/FTA and AIC. In addition, the P-onset picked by BTA/FTA and, AR-AIC methods present a large discrepancy compared with that attained based on the manual P-onset picking method, and multiple significant errors exist ( Figure  12c and d). Figure 13 displays the two typical MS waveforms and their P-onset picking results based on the above methods. It is evident that the MS waveform in Figure 13a exhibits a significant noise section with a small SNR; the BTA/FTA and AR-AIC methods fail to pick a reasonable P-onset, and the picked polarity is also wrong. In this case, the MSSA-LSTM and CNN methods still can pick an accurate P-onset and deliver a more robust anti-noise performance. It is attributed to the excellent feature extraction function of deep neural networks (LSTM and CNN). Figure 13b shows that MSSA-LSTM, CNN, BTA/FTA, and AR-AIC methods all have high P-onset picking accuracy when the SNR of the MS waveform is large; that is, the difference between signal and noise is noticeable. Table 1      In conclusion, compared with the CNN, BTA/FTA, and AR-AIC picking methods, the proposed MSSA-LSTM method further improves the P-onset picking performance of MS waveforms.
The applicability of the MSSA-LSTM needs to be explained: when the MSSA-LSTM method is used in mine MS monitoring, BTA/FTA, AR-AIC or other conventional methods need to be used for pre-triggering to determine which MS signals locate an MS event, and then the MSSA-LSTM model is used to pick the P-onset after pre-triggering further. However, if only the MS signal without positioning information needs picked, SSA could be used instead of MSSA, and the SSA-LSTM method is used for P-onset picking. Hereinafter, the seismic data without positioning information is used to test the SSA-LSTM method.

Analysis of P-onset picking results of seismic data
Similar to the analysis process in Section 5.1, the seismic data in STEAD was used to test the P-onset picking performance of the proposed method. It should be emphasized that STEAD can provide seismic waveform information but does not include seismic event information. Therefore, it is necessary to revise the MSSA-LSTM method. In this section, the MSSA method is replaced by the SSA method. Figure 14 shows the P-onset picking accuracy within the error range of 0 s À 1.0 s picked using SSA-LSTM, CNN, BTA/FTA and AR-AIC methods. It can be seen from Figure 14 that the accuracy curves of the four P-onset picking methods rise rapidly in the range of 0 s À 0.2 s and tend to be stable after 0.5 s. In other words, the P-onset picking errors of these four methods are mainly within 0.5 s. The P-onset picking accuracy of SSA-LSTM, CNN, BTA/FTA, and AR-AIC methods within 0.5 s is 91.25%, 85.64%, 80.64%, and 77.70%, respectively. The P-onset picking accuracy of the SSA-LSTM method is higher than that of CNN, BTA/FTA, and AR-AIC methods. Figure 15 shows the histogram of the P-onset picking error of seismic data using SSA-LSTM, CNN, BTA/FTA and AR-AIC methods. When the P-onset picking error of a seismic waveform is within À0.5 s to 0.5 s, the P-onset picking of the MS waveform can be regarded as adequate. It can be internally found from the figure that the P-onset picking error of the four methods is mainly concentrated in the range of À 0.5 s to 0.5 s. However, the P-onset picking error of the SSA-LSTM method is more concentrated within À 0.5 s and 0.5 s, followed by CNN, BTA/FTA, and AR-AIC. Similar to the results in Section 5.1, the P-onset picked by BTA/FTA and AR-AIC methods present a large discrepancy compared with that picked based on the manual P-onset picking method, and also multiple significant errors exist (Figure 15c and d). Figure 16 displays the scatter distribution of absolute error of P-onset picking results using SSA-LSTM (Figure 16a), CNN (Figure 16b), BTA/FTA (Figure 16c), and AR-AIC (Figure 16d) with SNR of seismic test data. The red curve in the figure is the outer boundary envelope of scatter points, and the blue curve is the boundary envelope of inner aggregation points. Here SNR of each seismic waveform is directly given in the STEAD. There is a visible trend of decreasing pick error of the four methods with SNR. However, the picking error increases rapidly for SNR < 20, reflecting that actual onsets under these conditions are likely beneath the noise level. In addition, it can be found from the picking error distribution that the number of scattered points of the picking error beyond the inner boundary envelope of the SSA-LSTM method is the least, followed by CNN, BTA/FTA, and AR-AIC. Moreover, the value of the picking error beyond the inner boundary envelope of BTA/FTA and AR-AIC methods is significantly greater than that of the SSA-LSTM method. Overall, the above analysis shows that the SSA-LSTM method canmaintain high P-onset picking performance with low SNR (SNR < 30). Table 2 lists the three evaluation indices of the four methods: SSA-LSTM, CNN, BTA/FTA, and AR-AIC. The MAE values of SSA-LSTM are 0.21 s, 0.38 s, and 0.59 s lower than those of CNN, BTA/FTA, and AR-AIC, respectively; the RMSE values of SSA-LSTM are 0.80 s, 0.96 s, and 1.40 s lower than those of CNN, BTA/FTA, and AR-AIC, respectively; the MAPE of SSA-LSTM are 3.89%, 7.20%, and 9.74% lower than those of CNN, BTA/FTA, and AR-AIC, respectively. Overall, these numbers demonstrate the robustness of the method for P-onset picking with high precision across the entire seismic data set.

Conclusion
Given the complex underground environment in deep mines, an automatic P-onset picking method for mining-induced MS data based on MSSA and LSTM depth neural network is proposed for automatic and high-precision processing of MS data, which lays the foundation for subsequent high-precision location and inversion analysis of rock failure mechanism. The method was divided into three main parts: At first, the valid components of MS waveforms are extracted with the aid of the MSSA method to weaken the influence of noise on the waveforms for rock failure; Afterwards, the STFT is applied to the processed MS waveforms to acquire essential parameters (such as time, frequency, and amplitude) of waveforms for rock failure; Finally, a model for P-onset picking is established based on the LSTM deep learning network, in which the MS waveform data processed through MSSA and STFT methods are used as the input while the P-onset picking result is taken as the output of the model.
Based on the actual MS data induced by deep mining of Xiadian Gold Mine and the seismic data in a benchmark dataset STEAD, the proposed MSSA-LSTM P-onset picking method is tested numerically and compared with CNN, BTA/FTA, and AR-AIC methods: In the P-onset picking of the MS data in Xiadian Gold Mine, the pickling accuracy of MSSA-LSTM, LSTM, CNN, BTA/FTA and AR-AIC methods within 10 ms is 93.70%, 90.35%, 88.72%, 85.65%, and 84.95%, respectively. The P-onset picking accuracy of the MSSA-LSTM method is slightly better than CNN but significantly better than BTA/FTA and AR-AIC methods. In the P-onset picking of the seismic data in STEAD, the P-onset picking accuracy of the SSA-LSTM, CNN, BTA/FTA and AR-AIC methods within 0.5 s is 91.25%, 85.64%, 80.64%, and 77.70%, respectively. Thus, the P-onset picking accuracy of SSA-LSTM method is higher than that of CNN, BTA/FTA and AR-AIC methods. By analyzing the P-onset picking results of the seismic data with different SNR, it is concluded that: the SSA-LSTM method can still maintain high P-onset picking performance on waveforms with a low SNR (SNR < 30).
Although the P-onset picking performance of the proposed model has been improved compared with other methods (CNN, BTA/FTA, AR-AIC), there are still some problems to be explained: The structure of the LSTM model is relatively complex, and the training is more time-consuming than conventional machine learning methods (SVM, BP networks, Elman networks, etc.); The network characteristics of LSTM determine that it can not process data in parallel. Furthermore, with the development of deep learning technology, better models (less training time and higher performance) are constantly proposed, making it possible to improve the P-onset picking performance further and deserves further study.

Disclosure statement
No potential conflict of interest was reported by the authors.

Data availability statement
The data that support the finding of this study are available from the corresponding author upon reasonable request.