Improved earthquake aftershocks forecasting model based on long-term memory

A prominent feature of earthquakes is their empirical laws, including memory (clustering) in time and space. Several earthquake forecasting models, such as the epidemic-type aftershock sequence (ETAS) model, were developed based on these empirical laws. Yet, a recent study [1] showed that the ETAS model fails to reproduce the significant long-term memory characteristics found in real earthquake catalogs. Here we modify and generalize the ETAS model to include short- and long-term triggering mechanisms, to account for the short- and long-time memory (exponents) discovered in the data. Our generalized ETAS model accurately reproduces the short- and long-term/distance memory observed in the Italian and Southern Californian earthquake catalogs. The revised ETAS model is also found to improve earthquake forecasting after large shocks.


Introduction
Earthquakes pose a serious threat to human life and property and as such attracted the attention of many scientists. Yet, the understanding and the forecasting of earthquakes are limited. Predicting earthquakes by using a diagnostic precursor via some observable signal has not produced a reliable prediction scheme [2][3][4]. Indeed, the current earthquake predictability is based on the known seismic laws. The distribution of earthquake magnitudes is exponential and follows the Gutenberg-Richter law (N(m) ∝ 10 −bm , where N is the number of earthquakes of magnitude m and b ≈ 1) [5]. The number of earthquakes triggered by a mainshock increases exponentially with its magnitude (Utsu law) [6]. In addition, the rate of triggered events decays as a power law with time (Omori law) [7].
An operational earthquake forecasting scheme has been developed and applied to forecast earthquake sequences based on the empirical laws, including the clustering of earthquakes in space and time [3]; space-time earthquake clustering can be attributed to the triggering of earthquakes [8]. This clustering stimulated the development of a series of earthquake forecasting models based on a branching process, such as the epidemic-type aftershock sequence model (ETAS) [9,10], and the short-term earthquake probability model (STEP) [11]. The ETAS model combines the Gutenberg-Richter, Utsu, and Omori laws into a Hawkes (point) process in a way that every past earthquake (above a certain magnitude) triggers other earthquakes according to the same laws. Previous studies and many retrospective analyses [12,13] have proved that clustering models, such as the ETAS model, provide better forecasts than other models; still, these models cannot explain some central earthquake features [14][15][16][17], and many statistical physics features of seismic activity have not yet been fully understood [18].
Temporal and spatial memory (correlations) exists widely in many natural systems [19][20][21], including in earthquake activity. For example, Livina et al [22] identified the short-term memory of successive inter-event times in real earthquake catalogs using a conditional probability method. They found a strong short-term memory in which a short (long) inter-event time tends to follow a short (long) inter-event time. Other correlation detection methods, such as the detrended fluctuation analysis [23], have also been applied to detect the memory of inter-event times [24]. The empirical short-term memory between successive inter-event times in real catalogs has been found to be reproduced by the ETAS model, only for a narrow range of model parameters [25]. Recently, a new measure has been [1] introduced, called 'lagged' conditional probability, to explore long-term memory, both in successive and non-successive inter-event times and distances [1]. This analysis has resulted in a memory measure versus (time or distance) lag for which a crossover between two distinct behaviors has been found: a slowly decaying power law at short scales (time or distance) and a significantly faster decay (that may be exponential) at long scales [1]. This behavior, discovered in real catalogs, could not be reproduced by the ETAS model. More specifically, the model's analysis resulted in the memory without the crossover that was observed in the real catalogs [1]; the model's memory is weaker (stronger) in short (long) time scales than the real catalogs. The value of the power law exponent depends on the productivity parameter α, which is associated, in the model, with the Utsu law. Earthquakes can trigger more correlated events with a larger α, resulting in enhanced earthquake memory. Therefore, based on the empirical finding [1] of crossover in the memory behavior, here we introduce into the ETAS model two productivity parameters, large and small, α 1 and α 2 , for short and long time scales. We show here that this revised ETAS model reproduces the observed double power law behavior of memory, as well as the crossover observed in the real data. Moreover, we show that the revised model significantly improves the forecasting performance of earthquake events.

Data
The Italian earthquake catalog is complete [26] for events with magnitudes above 3 and includes 8854 events from 1981 to 2017; the earthquake rate is 0.67 events per day. A catalog is 'complete' when all events above the specified magnitude are included in it. The Southern California catalog is complete [27] for events with magnitudes above 3 and includes 13 586 events from 1981 to 2018; the earthquake rate is 0.98 events per day.

The revised ETAS model
In the ETAS model, seismic events are assumed to involve a space-time stochastic point process [9,10]. Each event above a magnitude M 0 is selected independently from the Gutenberg-Richter distribution (where b = 1). The conditional rate, λ, at location (x, y) at time t is given by where H t is the history process prior to t, t i are the times of the past events, and M i are their magnitudes. μ(x, y) = μ 0 u(x, y) is the background intensity at location (x, y), where u is the spatial probability density function (PDF) of background events, which is estimated by the method proposed by Zhuang [28,29]; μ 0 is the background rate of the entire region. We represent the total number of the past events as n − 1. The dependence of the triggering ability on magnitude is given by the Utsu law as where A is the occurrence rate of earthquakes at zero lag. In equation (2), we introduce two productivity parameters, α 1 and α 2 (α 1 α 2 ). When α 1 = α 2 , equation (2) reduces to the original ETAS model. h10 −bM 0 is the crossover number of events with the magnitude threshold M 0 ; h is a parameter that can be estimated from the data. The ith historical event has a larger rate to trigger the nth event (due to the larger α 1 ), when the number of events between the ith and nth events is smaller than h10 −bM 0 . Note that we consider here a regional size (like Italy) model where the spatiotemporal clustering sequences below the crossover are mostly unmixed with other remote sequences. The parameter h may be different when dealing with a much larger area size. The function g (t − t i ) follows the Omori law as where c and p are the Omori law parameters. The spatial clustering of aftershocks is implemented by introducing a spatial kernel function f indicates that the distances between triggering and triggered events depend on the magnitudes of the triggering events. q, D and γ m are the estimated parameters.
We chose the parameters of the original ETAS model (ETAS1) for Italy based on refs [25,30] as follows: 26, and the critical parameter n = Ac it is smaller than 1 (β = ln (10)). The spatial parameters were chosen to be q = 2.0, D = 0.03 (in units of 'degree'), and γ m = 0.48, and were estimated based on the method described in references [28,29]. The parameters of the revised ETAS model (ETAS2) were chosen to be A = 3.35, α 1 = 2.0, α 2 = 1.5, and h = 2 × 10 5 (the crossover h10 −bM 0 = 200 for the magnitude threshold M 0 = 3.0); the other parameters are the same as for the ETAS1 model.

N-test
The N-test [31] compares the total number of observed earthquakes (N obs ), within the spatial size, magnitude, and time interval, to the expected total number of target earthquakes predicted by the forecasting model (N forecast ).

Results
We first define the earthquake inter-event time interval as τ i = t i+1 − t i (in days); this is the time interval between two consecutive earthquake events above a certain magnitude threshold. Similarly, an inter-event distance r i is defined as the distance (in km) between the epicenters (i.e., the projections of hypocenters on the surface) of events i + 1 and i, where both are above a certain magnitude threshold. We calculate the inter-event times and distances with the magnitude threshold M 0 = 3.0(M w ) for Italy's seismic catalog, which is known to be complete for earthquake magnitudes above 3.0 [26] (see materials and methods); this catalog spans 37 years, from 1981 to 2017. We then propose the 'lagged' conditional cumulative distribution function (CDF) method based on the 'lagged' conditional PDF [1] as follows. First, all inter-event times (distances) are sorted into ascending order and then divided into three equal quantiles. The first quantile, Q1, contains the smallest 1/3 inter-event times (distances), and the third quantile, Q3, contains the largest 1/3 inter-event times (distances). The conditional CDF of inter-event times (distances) is defined as C(τ k |τ 0 ) (C(r k |r 0 )), where τ 0 (r 0 ) belongs to Q1 or Q3, and τ k (r k ) is the lagged kth inter-event time that follows τ 0 (r 0 ). This method generalizes previous studies that used lag 1 (k = 1) [22], thus revealing the long-range memory empirical laws of earthquake catalogs. To demonstrate the empirical long-term memory, we show, in figures 1(a) and (b), the lagged conditional CDFs C(τ 50 |τ 0 ) and C(r 50 |r 0 ) using a high lag number k = 50, for the first and third quantiles, Q1 and Q3, for the Italian catalog. It can be seen that the lagged conditional CDFs of Q1 are significantly different from those of Q3 (see figures 1(a) and (b)). This implies the existence of memory (correlations) also for a large number of lags. For the randomly shuffled catalogs that do not contain memory, both lagged conditional CDFs of Q1 and Q3 are identical to the unconditional CDF (indicated by the dashed black curves in figures 1(a) and (b)). Thus, the real catalogs exhibit memory, even for large k lags, as found earlier [1].
We next test the memory in the ETAS model by simulating earthquake records for the region of Italy (34 • N-48 • N, 6 • E-20 • E), based on the thinning method [10,32]. The original and our revised ETAS models are called here ETAS1 and ETAS2, respectively. In ETAS2, we introduce the productivity parameter α 1 for small lag-index k (short time scale), and the smaller productivity parameter α 2 for large lag-index k above a crossover value (long time scale). See the data and methods section for more details regarding the model. We generate 50 realizations of synthetic events greater than or equal to magnitude 3 for the time window of 13 500 days (same as the time length of the real catalog) for the synthetic catalogs of ETAS1 and ETAS2. The earthquake rates are 1.11 ± 0.06 and 0.74 ± 0.09 events per day for ETAS1 and ETAS2, respectively. The higher rate indicates more aftershocks generated for ETAS1 than ETAS2 in the entire period. The rate of the real catalog is 0.67 events per day. Figure S1 (https://stacks.iop.org/NJP/23/042001/ mmedia) shows the epicenter map and longitude-time plot of the catalogs for the time window. Figures 1(c) and (d) show the lagged conditional CDFs C(τ 50 |τ 0 ) and C(r 50 |r 0 ) for the catalogs simulated by ETAS2. Note that there are substantial differences between the CDFs of Q1 and Q3, where these differences are common both for the real (figures 1(a) and (b)) and for the ETAS2 simulated catalogs (figures 1(c) and (d)). In contrast, the CDFs of Q1 and Q3 of the original ETAS1 model almost completely overlap, for both inter-event times and distances (figures 1(e) and (f)), in contrast to the observed CDFs. This demonstrates the existence of stronger memory in ETAS2 and, in contrast, much weak memory in ETAS1. Figure S2 shows that the memory of ETAS1 is even weaker when the rate of aftershocks is similar to that of ETAS2.
To quantify the memory based on the lagged conditional CDF, the maximum gap S (indicated by a double arrow in figure 1) between the lagged conditional CDFs of the first and third quantiles is calculated. It follows that S is theoretically bounded between 0 (complete overlap between the CDFs or PDFs of Q1 and Q3) and 1 (complete separation) where larger S indicates stronger memory. We next calculate the memory measure for the real Italian catalog as a function of the lag index k for inter-event times and distances, respectively (in figure S3). We also consider here the larger complete magnitudes with thresholds M 0 = 3.5, 4.0. We rescale the memory measure S(τ k |τ 0 ) by a factor 10 aM 0 , which represents the dependence of memory on the magnitude threshold [1] (i.e., F(x) = S(x)10 aM 0 ) where the lag-index, k, is rescaled by x = k10 bM 0 , to account for the Gutenberg-Richter law. The curves S(x) of all cases collapse into a single curve F(x) after the rescaling (figure 2). It is seen that the rescaled memory measure F(x) of the Italian catalog decays slowly for small x (lags) and faster for large x (lags), both for inter-event times and inter-event distances, as seen in figures 2(a) and (b). Figures 2(c) and (d) depict the corresponding scaling  Table 1. Estimated parameters (mean ± std), a exponent in the scaling factor 10 aM 0 and power law exponents of the scaling function, γ 1 , γ 2 for the interevent times and distances for the real catalog of Italy, ETAS1 and ETAS2 models. For the models, the error bars show the standard deviations for 50 realizations.

Parameters
Real ETAS1 ETAS2  a) and (b)) and the associated crossover of the scaling curves. Moreover, the crossovers are close to x c ≈ 10 5.0 , for both the real catalog and our developed ETAS2 model. Immediately after large shocks, the crossover point (lag) corresponds to a time of the order of 40 days (see figures S4 and S5). The scaling function behaves as F(x) ∼ x −γ 1 for x < x c and F(x) ∼ x −γ 2 for x x c . These two exponents (γ 1 , γ 2 ) and the parameter a are summarized in table 1, showing that the new ETAS2 model reproduces quite accurately the scaling exponents of the real catalog. In contrast, the original ETAS1 model basically exhibits a power law behavior but with only a single exponent, without a crossover, and thus fails to reproduce the observed memory characteristics of the data (see figures 2(e) and (f)). The flat in large k indicates that the inter-event distances are completely uncorrelated in figure 2(f). We also calculate the memory measure for the real and simulated catalogs of Southern California (see figures S6 and S7 and table S2). The results indicate that the new ETAS2 model reproduces quite well the scaling function of the real catalogs, in contrast to the ETAS1 model. An earthquake catalog could, in principle, be incomplete after a large shock as a result of: (a) coda waves, (b) the inefficiency of the seismic network to detect weak and frequent events, and (c) the overlapping of aftershock seismograms [33][34][35][36]. To test whether the crossover could be due to incompleteness, we have produced incomplete catalogs based on refs [34][35][36]. Figure S8 shows that the synthetic incomplete catalog based on the ETAS1 model cannot reproduce the crossover observed in the real data and in the ETAS2 model. This suggests that our results are not due to the possible incompleteness of earthquake catalogs. Moreover, the crossover is not reproduced as observed in the real data for ETAS1 using extensive set of model's simulations covering a wide range of ETAS1 model parameters in figures S11-13. In all of these the resulting crossover is significantly weaker than the observed one.
Next we test and compare the forecasting performance of the ETAS1 and ETAS2 models. We perform the N-test [31] (see materials and methods), which compares the total number of earthquakes forecasted by the model with the total observed number of earthquakes over the entire region; we apply this test to the L'Aquila (Italy) earthquake (magnitude 6.3) that occurred on April 6, 2009 [37]. Figure 3(a) presents the locations of earthquakes above magnitude threshold 3 that occurred within one month after the L'Aquila mainshock. The cumulative number of earthquakes increased immediately after the L'Aquila mainshock (red circles in figure 3(b)). Figure S9 shows the non-cumulative number of earthquakes as a function of time. Notably, the original ETAS1 model forecast many fewer events compared to the real catalog, while the forecast of the revised ETAS2 is very similar to the real events. Indeed, the ETAS1 model is known to severely underestimate the number of earthquakes immediately after large shocks [37], while overestimating the number of earthquakes at long-term periods. The suggested remedy is to increase the α-value after large shocks [37][38][39][40][41][42]. However, this increase of the α-value is not consistent with the α-value evaluated based on the entire Italian catalog. The reason is that ETAS1 model with large α-value overestimates the cumulative number of earthquakes in the long time period (see figure S10) and furthermore cannot reproduce the observed crossover of memory in the Italian catalog (see figures S11-13).
We also test the quality of the forecast in space (figure 3(c)) and find that our revised ETAS2 model outperforms the original ETAS1, resulting in an increased number of events, which makes its performance closer to real observations. The spatial clustering of aftershocks can cause a large fraction of events within a short distance. Aftershocks generally occur within a 100 km radius from the mainshock's epicenter. We find that 98% of the events in Italy within 30 days after the L'Aquila mainshock were within a radius of R = 100 km ( figure 3(c)). We find that 94% and 86% of the events (from 500 independent realizations) of the ETAS1 and ETAS2 models, respectively, occur within a 100 km radius within one month after the main shock. The fractions in the real data and the ETAS2 are similar to each other.
We also test the forecasts after the six largest shocks (M w 6) that occurred in Italy from 1981-2017.  This is because an even larger earthquake (L6, magnitude 6.6 in comparison to magnitude 6.1 of the L5 event) hit Norcia later, leading to more aftershocks four days after L5. We also show in figures 4(d)-(f) the fraction of earthquakes within a 100 km radius around the mainshock's epicenter and within different numbers of days from the mainshock. The results indicate that the observed fraction of earthquakes (red dots) falls within the narrower error bars of the proposed ETAS2 model, while the observations fall outside the error bars of the ETAS1 model, despite the larger error bars of this model (figures 4(d)-(f)). We thus conclude that the new ETAS2 model's forecasting performance is significantly better than that of the ETAS1 model. Similarly improved forecasting performances of the ETAS2, after large shocks in Southern California, are shown in figure S14.

Conclusions
Here we study the long-term memory by considering the lagged conditional probabilities of interevent times and distances in real and synthetic earthquake catalogs. We suggest that the spatiotemporal clustering of aftershock sequences dominates the memory in inter-event times and distances that are smaller than the crossover. While above the crossover, the clustering and memory behavior substantially decreases. This is incorporated into the ETAS model. Our results suggest that the memory of the ETAS model actually depends on the aftershock productivity parameter α. Following the observed catalogs [1], we revised the ETAS model to include two productivity parameters, α 1 and α 2 , for short and long time scales. We show that the revised ETAS model suggested above not only reproduces the memory characteristics (scaling function) of the real catalog but also exhibits significantly better forecasting skills in comparison to the original ETAS model. According to the Utsu law, the aftershock rate depends on the magnitude of the mainshock. The two α-values of the revised ETAS model indicate that the Utsu relation for long-term triggering is not same as for short-term triggering. For lag index smaller than the crossover lag, a large earthquake tends to trigger many more events. The triggering ability is substantially reduced for lags above the crossover lag. This may imply a possible earthquake generating mechanism that depends on the stress conditions established by historical events [43,44]. Our results imply that the stress conditions depend more on the near recent events (below the crossover number) and less on the far events (above the crossover number). The fundamental reason for the crossover dependence on number of earthquake events could be that this number is proportional to the energy released from faults. Our methods may be also relevant for other fields like in the studies of sleep disorder and epileptic seizures [45,46].