Modernization of adaptive matching pursuit method to analyze geophysical signals of pulse nature

. The paper is devoted to the development and comparison of different numerical methods which increase the adaptive property and improve the accuracy of matching pursuit algorithm in connection to geoacoustic and electromagnetic signals. At each step of adaptive matching pursuit, a function is chosen which has the highest correlation with an initial signal. Then parameters of a chosen function are refined. The refinement is performed by the help of different grid methods and methods based on gradient direction search. The paper considers the peculiarities of application of sparse approximation methods to geophysical signals of pulse nature and compares different variants of modification of adaptive matching pursuit algorithm.


Introduction
At present, the task of complex analysis of different nature signals with the aim to investigate the processes of geophysical field interaction is topical in geophysics.A significant part of signals under analysis is of pulse nature and has nonstationary character that considerably complicates their analysis by well-studied classical methods.Complexity of tasks entails the appearance of a larger number of highly specialized analysis methods adapted for concrete features of signals and extending researcher's tools.
This paper is devoted to a new approach to analyze geoacoustic and electromagnetic emission signals.We suggest applying the instrument of sparse approximation to analyze signal time-frequency inner structure which depends directly on generation process characteristics.Sparse approximation was chosen because the signals under investigation are well representable as a sum of a small number of elementary (single-frequency) pulses.There is a large number of algorithms which allow us to estimate sparse approximation, however, the matching pursuit [1] algorithm showed the best result in pulse signal processing and became the basis of the suggested approach.The matching pursuit algorithm was updated and adapted for the signals under investigation.The main difference of the adapted matching pursuit from the original algorithm is the introduction of the refinement procedure of chosen function parameters.We should note that refinement can be done by different ways.The paper investigates and compares some of them.
Initially, the adaptive matching pursuit was developed for geoacoustic emission pulse signals.The system based on it for signal analysis was implemented at the Laboratory of Acoustic Research IKIR FEB RAS at the stage of post processing of geoacoustic emission disturbance data.Application of sparse approximation together with methods of timefrequency analysis and mathematical statistics worked well (the investigation is described in detail in [2][3][4][5]), thus, we decided to test the suggested approach on electromagnetic emission pulse signals.

Adaptive matching pursuit
Sparse approximation search in a dictionary from N functions refers to NP-stiff problems and as for the moment there is no algorithm capable of solving it during polynomial time.Exact solution algorithm requires complete enumeration of all possible combinations of functions, i.e. has factorial complexity O(N!).
One of the algorithms of approximate solution is the Matching Pursuit (MP).This algorithm refers to greedy algorithm class.Its essence is the determination of a function, having the highest correlation with a signal, at each algorithm iteration.A block-scheme of the algorithm is illustrated in Fig. 1a.Matching pursuit has cubic computational complexity O(N 3 ) in case, when a scalar product matrix is calculated by a definition, and logarithmicsquared one O(N 2 log N), when the calculation is performed by fast Fourier transform.
The main disadvantages of matching pursuit algorithm are, firstly, the necessity of application of dictionaries (function sets into which a signal is decomposed) of large volumes to provide sufficient accuracy of decompositions and, secondly, "rough" sampling in the parameter space (Fig. 2).

Fig. 2.
Examples of "rough" sampling in a dictionary parameter space.Both pulses have the frequencies of about 9 kHz, however, in view of absence of a function with such a frequency in the dictionary, they are decomposed into functions with the frequency of 10 kHz.
Assume that a dictionary is composed of analytically defined functions, each of which depends uniquely on a parameter set p (for example, the basic frequency) g(t, p).Function parameters having the highest scalar product with a signal are defined on each iteration of matching pursuit algorithm.Consequently, for a fixed initial signal, the matching pursuit iteration can conditionally be described as a search problem of function maximum of many variables The main idea of the proposed improvements is to add a refinement procedure to the parameters p of a function having the highest correlation with a signal.The developed algorithm was called Adaptive Matching Pursuit (AMP).The block-scheme of the algorithm is illustrated in Fig. 1b.
The refinement can be carried out by different grid methods and methods based on gradient direction search.Further we consider the following in the paper: 1) gradient descend method; 2) different variants of grid search.

Geophysical signal analysis
As it was shown in the papers [6][7], analysis of geoacoustic emission pulse signals, applying a combined dictionary, compound of shifted and modulated Gaussian and Berlage functions, gives good results.Gaussian functions are universal as they have the least possible square of a frequencytime window.The Gaussian function (Fig. 3a) is described by the following analytical expression: where A is the amplitude which is chosen so that ||g(t)||2 = 1; tend is the atom length; f is the filling harmonics frequency; B(tend) is the B parameter limit value calculated by the formula end 2 end 4 ln 0.05 Δ is the coefficient of B parameter variation relatively the limit value.Berlage functions have a form similar to single geoacoustic pulses.The Berlage function (Fig. 3b) can be described by the formula max () where A is the amplitude which is chosen so that ||g(t)||2 = 1; tend is the atom length; pmax is the maximum position relatively the atom length, pmax ∈ [0.01, 0.4]; f is the frequency from 200 to 20000 Hz; n(pmax) is the parameter n limit value calculated by the formula ; Δ is the coefficient of n parameter variation relatively the limit value.Owing to the oscillatory nature of the signals under investigation, the function F(τ, p) has a big number of local extrema that makes the refinement process more complicated.Parameter τ, responsible for the function g(t, p) shift relatively the signal is assumed to be continuous as long as scalar products are calculated for all possible values of the shift τ for each g(t, p) function.
Based on the projection form, we can make a conclusion that the most important parameter is frequency f which affects the scalar product the most.

Gradient descend method
One of the most popular methods of multidimensional optimization is the gradient descend method.Gradient direction is the direction of the fastest function increase, thus, a maximum is determined by the motion along the gradient vector.As long as both the Gaussian and the Berlage functions are defined analytically, and F(τ, p) is a scalar product, to calculate the gradient, we can obtain explicit formulas where λ is the gradient step size and this step may be constant or variable.Depending on the choice strategy of step λ, we obtain different variations of the method, for example gradient method with step splitting, steepest descend method and so on.Initial approximation p [0] is determined by the matching pursuit method from function initial dictionary.Fig. 5 illustrates the results of refinement of the first function frequency parameter entering into the signal decomposition (Fig. 2b) by gradient descend methods with a fixed step λ.In Fig. 5a the dictionary is composed of Berlage functions with four distinctive frequency values (dashed lined), in Fig. 5b those are with five values.Both in the first and in the second cases, the method converges but slowly over 500 iterations on an average.As long as the function F(τ, p) has a complicated structure, realization of the steepest descend method, within which the maximum possible step is made in gradient direction, requires calculation of lengthy derivatives Thus, to improve the convergence, we decided to apply the gradient descend with step splitting.
We choose a limit initial value λ* and a parameter (0,1)

 
, then at each iteration we assume that λ [i] = λ* and check the condition 6 shows the results of refinement by gradient descend method with step λ splitting Owing to the step splitting procedure we succeeded to improve the method convergence, the plotted trajectories were obtained over 20 iterations.

Grid search method
It is the simplest method for maximum search which consists in sequential calculation of function values for different arguments p and in the search for a maximum value.The arguments can be chosen randomly or with an equal step.
When slightly changed, this method can be applied to optimize F(τ, p).The initial approximation p [0] (i = 0) is obtained by classical matching pursuit method.Then in the vicinity of p [0] we make a new grid, F(τ, p) is calculated in its each node and a new approximation of p [i+1] is chosen.If the current and previous approximations coincide, we split the grid step.
Three variants of the grid search method were tested: each parameter is refined separately; parameters are refined jointly; frequency is refined separately, and form parameters are refined jointly.Fig. 7 shows the block-schemes of different variants of refinement by grid search method.The refinement is performed until the required accuracy is achieved or the defined number of refinement iterations is made.
Fig. 8 shows the graphs of the function F(τ, p) projection for the pulse (Fig. 2b) on the coordinate axes (τ, f), and the results of refinement of the first function frequency by different variations of grid search method.In Fig. 8a the parameters are refined separately, in Fig. 8b they are refined jointly, and in Fig. 8с the frequency is refined separately from parameter form.In each of three cases we obtain acceptable solution.

Electromagnetic emission
The main sources of natural electromagnetic radiation in VLF range are lightning strokes and the processes occurring in the ionosphere and magnetosphere [8].The signals caused by lightning strokes are called atmospherics and just like geoacoustic emission they have pulse nature and similar kilohertz frequency range.Thus, we decided to test the approach on the basis of sparse approximation, which showed good results for geoacoustic signals, on electromagnetic emission (Fig. 9).
The authors suggest to decompose atmospherics through a combined dictionary consisting of shifted and modulated Gaussian and Berlage functions.However, in contrast to geoacoustic signals, we decided to include the functions with the frequencies less than 200 Hz into the dictionary since many signals under investigation have long-lasting lowfrequency component (just like in the signal in Fig. 9).From the graphs illustrated in Fig. 10, we can conclude that for atmospherics the frequency f also affects the dictionary function contribution into the signal under investigation the most.
All the above mentioned ways of realization of the refinement procedure were tested on electromagnetic emission signals.Fig. 11 shows the refinement process of the first function frequency f, which is a part of an atmospheric (Fig. 9), by gradient descend method with fixed step, gradient descend with step splitting and three variants of grid search method.For each numerical scheme, the average operation time and the achieved accuracy were measured.As a main criteria for accuracy, we chose the value calculated as a relation of residual norm to initial signal norm on a percentage basis ERR was calculated for each iteration, then the average value was calculated, thus, error average value was calculated for each numerical scheme for different number of functions entering into a decomposition.

Results
Fig. 13 represents the graphs of ERR value dependence on tested algorithm iteration number for atmospherics and geoacoustic pulses samplings.All the tested schemes allows us to improve the algorithm accuracy compared with the classical matching pursuit, however, the grid methods demonstrate better results in accuracy.It is the most vividly demonstrated on the sampling compound of atmospherics (Fig. 13a).We should note that gradient descend methods require parameter more fine-tuning, a peculiar optimal value of step λ with respect to gradient corresponds to each signal.Grid methods are more universal and simple for application.
Table 1 shows the time of algorithm execution for different number of refinement iterations.The fastest are those constructed on gradient method with step splitting and grid search methods, however, since the grid search method with frequency separate refinement gives better accuracy compared to other algorithms, this variant of adaptive matching pursuit will be used in further analysis of electromagnetic and geoacoustic signals.Fig. 14 shows decomposition of signals illustrated in Fig. 2, adaptive matching pursuit algorithm applying the same dictionary.In the result, decompositions from 4 and 7 functions with the frequency of about 9 kHz were constructed with the error of 5%.The problem of «rough» sampling has been solved.

Fig. 4 .
Fig. 4. Graphs of function F(τ, p) projections on different coordinate axes for Gaussian (a) and Berlage (b) functions.Dots indicate the maxima.

Fig. 5 .
Fig. 5. Parameter f refinement by gradient descend method with a fixed step.Initial dictionary contains 4 frequency values (a), 5 frequency values (b).

Fig. 6 .
Fig. 6.Refinement of f parameter by gradient descend method with step splitting, the initial dictionary contains 4 frequency values (a), 5 frequency values (b).

Fig. 7 .
Fig. 7. Block-scheme of refinement procedure for Berlage function by grid search method: parameters are refined separately (a), parameters are refined jointly (b), frequency is refined separately and form parameters are refined jointly (с).

Fig. 8 .
Fig. 8. Parameter f refinement by grid search method: parameters are refined separately (a), parameters are refined jointly (b), f is refined separately, and the form parameters are refined jointly (с).

Fig. 10
Fig. 10 illustrates two-dimensional function F(τ, p) projections of the atmospheric (Fig. 9) on different coordinate axes: (τ, f), (τ, tend), (τ, Δ) for Gasussian function and (τ, f), (τ, tend), (τ, Δ), (τ, pmax) for Berlage function.From the graphs illustrated in Fig.10, we can conclude that for atmospherics the frequency f also affects the dictionary function contribution into the signal under investigation the most.All the above mentioned ways of realization of the refinement procedure were tested on electromagnetic emission signals.Fig.11shows the refinement process of the first function frequency f, which is a part of an atmospheric (Fig.9), by gradient descend method with fixed step, gradient descend with step splitting and three variants of grid search method.

Fig. 10 .
Fig. 10.Graphs of function F(τ, p) projections on different coordinate axes for Gaussian (a) and Berlage (b) functions.Dots indicate the maxima.

Fig. 11 .
Fig. 11.Frequency f refinement by gradient descend methods with fixed step (a), gradient descend with step splitting (b) and three variants of grid search method: parameters are refined separately (c), parameters are refined jointly (d), f is refined separately, and the form parameters are refined jointly (e).

Fig. 12
Fig.12shows atmospheric decomposition in time and time-frequency domains.This decomposition is built by adaptive matching pursuit algorithm with refinement based on grid search method (frequency parameter is refined separately, and the form parameters are refined jointly).

Fig. 12 .
Fig. 12. Atmospheric decomposition by adaptive matching pursuit.In order to choose the best variant of refinement, we made a calculation experiment.We selected 100 typical geoacoustic and 56 electromagnetic signals.The signals were decomposed into functions of a combined dictionary compound of Gaussian and Berlage functions.A dictionary with the following parameters was formed for geoacoustic pulses: f: 5 values from the range from 2000 to 20000 Hz; tend: 3 values from the range from 40 to 100%; pmax: 3 values from 5 to 30%; Δ: 3 values from 1.1 to 5; and for the atmospherics : f: 5 values from the range from 50 to 20000 Hz; tend: 3 values from the range from 20 to 100%; pmax: 3 values from 5 to 30%; Δ: 3 values from 1.1 to 7.For each numerical scheme, the average operation time and the achieved accuracy were measured.As a main criteria for accuracy, we chose the value calculated as a relation of residual norm to initial signal norm on a percentage basis

Fig. 13 .
Fig. 13.Graphs of ERR value dependence on tested algorithm iteration number for atmospherics (a) and geoacoustic pulses (b).

Fig. 14 .
Fig. 14.Solution of "rough" parameter sampling problem The work was carried out by the means of the Common Use Center "North-Easten Heliogeophysical Center" CKP_558279.The research was supported by RSF, project No.18-11-00087.
Solar-Terrestrial Relations and Physics of Earthquake Precursors

Table 1 .
Time of algorithm execution.
Solar-Terrestrial Relations and Physics of Earthquake Precursors