Comparison of recent S-wave indicating methods

Seismic event consists of surface waves and body waves. Due to the fact that the body waves are faster (P-waves) and more energetic (S-waves) in literature the problem of their analysis is taken more often. The most universal information that is received from the recorded wave is its moment of arrival. When this information is obtained from at least four seismometers in different locations, the epicentre of the particular event can be estimated [1]. Since the recorded body waves may overlap in signal, the problem of wave onset moment is considered more often for faster P-wave than S-wave. This however does not mean that the issue of S-wave arrival time is not taken at all. As the process of manual picking is time-consuming, methods of automatic detection are recommended (these however may be less accurate). In this paper four recently developed methods estimating S-wave arrival are compared: the method operating on empirical mode decomposition and Teager-Kaiser operator [2], the modification of STA/LTA algorithm [3], the method using a nearest neighbour-based approach [4] and the algorithm operating on characteristic of signals’ second moments. The methods will be also compared to wellknown algorithm based on the autoregressive model [5]. The algorithms will be tested in terms of their S-wave arrival identification accuracy on real data originating from International Research Institutions for Seismology (IRIS) database.


Introduction
The indication of seismic event moment of arrival is vital in the processing of seismic signals, e.g. when the problem of event's energy evaluation is considered (which is mostly done by computing the integral of the registered signal from it's moment of arrival till the end). Moreover, the result has to be rescaled in accordance to the distance between the sensor and the event's hypocenter (due to inverse-square law). Naturally when this approach is conducted one has to estimate the moment of wave arrival and the distance (which is commonly solved with the aid of onset moment indicated by at least four different single-axial sensors).
Single excitation indicate various waves which differ in terms of propagation patterns. They can be divided into [6]: body waves, which travel through the Earth interior, and surface waves which propagate along the interface between differing media. As their name suggest the main difference between the body and surface waves is the place of their occurrence. Another difference is their carrier frequency which for body waves is much higher thus their identification on seismometer data is relatively easy.
Thebody waves can be qualified as Primary waves (P-waves, pressure waves) and Secondary waves (S-waves, shear waves). As the P-waves are the fastest one (especially faster than the S-wave), sensors register their moment of arrival as first. When the sensors are relatively close to the event epicentre the delay between event's onset moments are small thus the indication of arrivals other than P-wave is difficult. The indication of waves arrival may be considered as a case of finding a structural break detection [7][8][9][10]. Recently couple new methods to indicate the S-wave arrival were created. However, the results of each are obtained from different set of data. Thus this is quite hard to tell which of them is the most accurate. It was decided to compare these new methods altogether with one well-known method. All of the methods employs the same real data set.
First part of the algorithm, EMD, is useful for processing natural signals as it doesn't need assumption about its linearity and stationarity [11]. The EMD divides the time series into Intrinsic Mode Functions (IMF) which form complete and nearly orthogonal basis for the signal. Each mode function fulfils two constraints: having only one extreme between zero crossing, and the mean value of their upper and lower envelope must equal zero.
TKEO is a nonlinear operator which is used to estimate the energy of signal. For n-th sample of discrete signal x[n] it is defined as [12] Ψ In this method the mean TKEO is used. It is described as: where N is the window length. Kirbas-Peaker algorithm has the following steps: 1. The signal is filtered by 10-point moving average and IRR Butterworh band-pass filter [13].
2. The resulting signal is divided into windows of N samples.
3. The EMD algorithm is applied to each window. The IMF with the most visible S-wave is taken. 4. In each window the mean TKEO is calculated.
5. First window with mean TKEO exceeding predefined threshold th s implies the period with S-wave onset.
6. The sample with maximum TKEO value (in the corresponding window) is chosen as actual S-wave arrival moment.

Rawles-Thurber Algorithm
The method [4] was presented in 2015 and uses a nearest neighbours-based approach of Nikolov [14]. A method does not need to use parameters estimated from data, but uses data itself to build a model. The main issue is to aggregate the set of reference data with P-waves and S-waves picks which is compared to the analysed seismic signals. The method may somehow be considered as a clusterisation problem [15,16]. The Rawles-Thurber algorithm uses 3 traces of a signal. To indicate the S-wave, firstly the P-wave onset is pointed. The method consists of following steps: 1. In the first step, two reference data sets are built: for P-waves from vertical trace and for Swaves from horizontal traces. Each reference set consists of positive R + and negative examples R − , where positive examples are N-length windows surrounding the P-wave/S-wave onsets and negative are windows with length of N containing for example noise or P-wave/S-wave coda.
2. The next move is windowing and data preprocessing by applying the bandpass filter. All of signal's traces are divided into moving-windows of the same length as reference windows. Each window is scaled as follow: the absolute amplitude of window is divided by the mean of its first half.
3. For each window of N-length, the similarity metric d(r, s) to reference examples is computed, where d(r, s) is the sum of squares of the Euclidean distance between the sample of reference signal r i and observed signal s i : 4. Subsequently the score function R(s) is determined for each window as the ratio of the negative to positive distances of the particular trace: As for S-wave identification two horizontal traces are used, to appoint the S-wave score function R S (s), two horizontal scores R H1 (s) and R H2 (s) are multiplied: 5. The centre of the window with the biggest value of score R P (s) for a vertical trace indicates the moment of P-wave onsets, and the biggest R S (s) -the moment of S-wave onset. In addition, the R P (s) and R S (s) scores must exceed the thresholds and the difference between P-wave and S-wave picks has to be located in user-supplied range.

Tasic-Runovc Algorithm
Tasic-Runovc approach [3] is a modification of STA/LTA algorithm and needs three-component seismic station monitors with velocities in east-west (E), north-south (N) and vertical (V) directions. The STA/LTA algorithm is based on the energy analysis by the ratio of the short-term average (STA) energy to long-term average (LTA) energy level from the same seismogram [17]. In T-R method the modification of STA/LTA is applied through the transformation of signal f (x i ). The nominator still gives the information about the sensitivity for local changes in the amplitude as the STA segment, however the denominator informs about the amplitude variation from the sample i to the end of the signal. The algorithm is as following:

Each trace of signal is saved as vectors
The length of the signal is k and an i-th sample is in the range 1 < i p < i s < k, where i p indicates the moment of P-wave and i s -moment of S-wave arrival.
2. To estimate the relative total energy, the vector O consisting of the sum of squares of the components of 3 traces is used: 3. To appoint the S-wave pick, the f (x i ) function is made: where l is the window's length. 4. The arguments of the function above are components of vectors Y E , Y N and O. The next step is to make a characteristic vector C with components: 5. If the sample of the characteristic function exceeds the user-supplied threshold as the first one, sample i s is taken as the moment of S-wave arrival.

Sokolowski-Wylomanska-Zimroz algorithm
The algorithm is expansion of P-wave indicating algorithm [18]. The procedure is based on the characteristic of signals' empirical second moment. It is known that when X t is consisted of i.d.d. random variables with mean µ and variance σ 2 they fulfill [19]: This clearly induces that when the cummulative sum of squared signal n i=1 x 2 i is considered, one has to be aware that if x i is i.i.d. the regression line can be fitted. It is common to assume that the seismic noise fulfils the assumption of being i.d.d. However, when signal's part after the P-or S-wave onset moment is considered the signal is clearly not i.i.d.
It was noticed that the statistics after the P-wave onset moment (and before the S-wave moment) becomes concave. It was also noted that when applying the logarithm to the statistics the resulting function is in general concave from the beginning till the onset moment and from the onset moment till the end (if on whole its course it is concave it means that no impulse was registered).
In the paper it was also observed that the logarithmic statistic when considered for the part of signal which includes the seismic wave is well approximated with exponential function. This led the authors to the following algorithm: 1. Determine the moment of P-wave arrival (p) for input signal s t of length n. Take x t = s t+p−1 for t ∈ {1, 2, ..., n − p + 1}.

Compute
T and e a 2 x+b 2 + c 2 to L 2 T sum their squared errors (SSE). Find i which minimises SSE.
4. S-wave arrival is indicated for i + p.

Takanami-Kitagawa Algorithm
Algorithm of Takanami-Kitagawa [5] was proposed in 1988 and uses the AR model to analyse seismic signal. Despite the fact of seismic signals being non-stationary, they can be divided into two subseries and each of them can be approximated by the AR model. The method to obtain the moment of S-wave arrival is as follows: 1. The signal is divided into two subseries before and after the unknown moment of S-wave arrival: where ε i n is a Gaussian noise with mean zero and variance σ 2 i , a i m is the autoregressive coefficient, M(i) is the order of the i-th model, s 1 is the unknown arrival time and N is the length of series. The [13] is considered as the "noise" part of signal and [14] as the "earthquake" part of signal, where S-wave appears. The length of "noise" window increases together with the s 1 , while the "earthquake" window's length decreases.

2.
To measure the quality of estimated model, the Akaike Information Criterion (AIC) is used: where L max is the maximised value of the likelihood function for the model and M P is the number of independently estimated parameters. For each potential time of S-wave arrival s 1 AIC is calculated for the first part of signal AIC 1 i and second part AIC 2 i .
3. Then the sum of two AIC is taken: 4. The sample i min with minimum value of AIC i is searched.
5. The sample i min + 1 responds the S-wave pick.
The above-mentioned algorithms are tested on two signal sets from International Research Institutions for Seismology (IRIS) database with reference picks. Each of them contains the events from West Coast of South America. The first set consists of 40 signals registered in Brazilian seismic station SAML from different earthquakes in 2003-2017, while the second one contains 100 series from LVC station events registered in 2002-2017. Both data sets are sampled at 20 Hz, the magnitudes of events are above 5.5 and the distances between stations and earthquakes are between 7 • and 20 • .
If the method does not provide otherwise, the input seismic signals are not denoised as such step can make onset moment indication easier [20,21] In K-P, R-T and T-R algorithms, the setting of few parameters is essential and most of them is determined by the minimum mean squared or absolute errors between reference and auto S-wave picks on the mesh with different values of parameters.
In Kirbas-Peaker approach the optimal length of window N, the number of IMF and threshold th s are chosen from the mesh of mean absolute errors between references and auto picks for particular parameters.
The least error is reached for north-south trace and parameters which are showed in table 1. In Rawles-Thurber method the thresholds th P and th S are appointed from the mesh of mean absolute errors between picks and the length of window was taken as in Rawles-Thurber paper [4]. The range of distance between P-wave and S-wave picks is determined on the base of reference picks in data sets. A different reference set from other observed signals (39 for 1st set and 99 for 2nd set) is created for each signal. The parameters used in this method are presented in table 2. Table 2. Parameters used in Rawles-Thurber algorithm 1st data set 2nd data set length of window N (in samples) 67 67 P-wave threshold th P 0.0001 0.03 S-wave threshold th S 0.0001 0.0001 minimum distance between P and S-wave S P min (in samples) 200 900 maximum distance between P and S-wave S P max (in samples) 3100 3100 To chose the best parameters for Tasic-Runovc algorithm the mesh with mean squared errors is applied. Optimal threshold th S and the length of window l are presented in Table 3. Table 3. Parameters used in Tasic In Takanami-Kitagawa algorithm, the AR models are matched to data starting from the sample i P + 200, where i P is the moment of P-wave onset and ending on the end of the signal. In the first step of fitting autoregressive models the "noise" part consists of the samples i n : where i 1 = i P + 200 and the "earthquake" part from the samples i e : where N -the length of signal. The first and second part was moving on the right in the each consecutive step (the last element in "noise" part and the first one in "earthquake" part are changing) and the best models of AR are fitted to each part. Maximum order of AR model to first part of signal is 10 and to second part -15 in both data sets.

Application to the particular signal
In the figures below the application of all five methods to the one signal from the first analysed set of data is shown. The upper panels of all figures present the signal with indicated reference S-wave onset. Figure 1 presents the implementation of Kirbas-Pekaer to the exemplary signal. The lower panel illustrates the mean TKEO value computed for each window of signal. The horizontal solid line represents the S-wave threshold th S and the distinguished box indicates the window exceeding the value of threshold as the first one. The sample with maximum value of mean TKEO in that window responds the S-wave pick. In Figure 3 the application of Tasic-Runovc method can be seen altogether with the statistic which determines the results. It can be seen that with the S-wave onset moment the statistics considerably rises which confirms the algorithm idea.

Algorithms' results
In this section all methods are compared. Their results are summarised by such statistics as number of events whose evaluated S-wave arrival are closer than 10 seconds, mean of absolute differences between the estimation and real moment, its standard deviation, and standard deviation of differences. In Table 4 the results of algorithms tested on the first set (which consists of 40 events) are presented. The bold numbers symbolise the algorithm with best performance in specific statistics. In Table 4 it can be seen that the Takanami-Kitagawa and Tasic-Runovc algorithms has the most number of indications closer than 10 seconds (15 out of 40 which is 37.5%). However other statistics (mean value of errors, and their standard deviations) clearly show the superiority of Sokolowski-Wylomanska-Zimroz method. The Kirbas-Peaker algorithm seems to be the least accurate. In Table 5 the results of methods applied to the second set (100 signals) are presented. This time the results are almost the same as in Table 4. The biggest change is the small number of accurate picks of Tasic-Runovc algorithm (Previously 37.5% of indications and now 1%). Also the results of Sokolowski-Wylomanska-Zimroz algorithm seem to be more accurate than for the previous set.

Conclusions
In this paper the problem of S-wave arrival has been presented. Within this four relatively new algorithms indicating moment of S-wave arrival were compared altogether with one earlier and more commonly used method. The algorithms were tested on real data which was gained from the International Research Institutions for Seismology database.
The results for the first data-set provides that the Takanami-Kitagawa and Tasic-Runovc algorithms have the most number of accurate picks for, however they are not the most accurate algorithms, as the Sokolowski-Wylomanska-Zimroz indicated picks that were on average most accurate with the results having the lowest variance (and in this terms the Takanami-Kitagawa is the second best). The results also show that the Kirbas-Peaker is the least one and the Rawles-Thurber is last but one in terms of results accuracy.
For the second data-set only Takanami-Kitagawa algorithm has the biggest number of accurate picks (Sokolowski-Wylomanska-Zimroz has the second best number and this time Tasic-Runovc method has only one accurate pick). In terms of accuracy once again S-W-Z algorithm is the most