Nowcasting Earthquakes in Southern California with Machine Learning:Bursts, Swarms and Aftershocks May Reveal the Regional Tectonic Stress

,


Introduction
Earthquakes of all magnitudes are known to cluster strongly in space and time (e.g., Scholz, 2019;Reasenberg, 1985). In fact, such burst phenomena are widely observed in many areas of science (Bahar et al., 2015;Mantegna and Stanley, 2004;Paczuski et al., 1996). For purposes of convenience, we introduce here a definition of seismic bursts that encompasses both seismic swarms and aftershock sequences, but that could be applied to other types of clustered events.
The goal of our work is to improve upon the methods of earthquake nowcasting (Rundle et al., 2016(Rundle et al., , 2018(Rundle et al., , 2019a which can be used to define the current state of risk from large earthquakes. These methods have begun to be applied to India (Pasari, 2019), Japan and Greece (personal comm. 2019).
Our definition of a seismic burst is the occurrence of an unusual sequence of earthquakes closely clustered in space and time (e.e., Hill and Prejean, 2007;Peresan and Gentili, 2018;Zaliapin and Ben-Zion, 2016a,b).
We define two general types of bursts, Type I and Type II: • We define a Type I seismic burst as a mainshock-aftershock sequence, in which the initiating event has the largest magnitude in the sequence, and is typically followed by a power-law Omori decay of occurrence of smaller events (Omori, 1900;Scholz, 2019). • A Type II seismic burst is defined as a sequence of similar magnitude events in which the largest magnitude event is not the initiating event, and in which there is not typically a subsequent power-law decay.
The earthquakes defining the bursts are small, usually of magnitudes characterizing the catalog completeness level. For the Southern California region, we consider small earthquakes of magnitudes M ³ 3.3. This magnitude threshold was chosen as a value high enough to ensure completeness of the catalog data used. The catalog containing these events is downloaded from the US Geological Survey earthquake search database (https://earthquake.usgs.gov/earthquakes/search/).
In our analysis below, we consider the Southern California region contained within a 600 km circle surrounding Los Angeles, California. We also consider time series beginning after 1984/1/1, after the data became most reliable in terms of catalog completeness, with accurate locations. The region is arbitrary in terms of method, but requires a complete catalog to be adequately applied and tested.

Method
We begin with our definition of a seismic burst, or cluster of small events. We coarsegrain time in the catalog into units of single days, and consider an elementary burst to be a day on which there are 2 or more small earthquakes of magnitude M ³ 3.3 within the region of interest, which for this study is the 600 km radius Southern California region. Note that, over the last 10 years, the rate of occurrence of these small earthquakes have been about 0.75 such earthquakes per day in that geographic region.
The method we describe proceeds in 5 stages. The first stage consists of an automated definition and classification of seismicity into candidates of seismic bursts. The second stage involves automated rejection of outliers. The third stage selects the members of the ensemble of accepted bursts which will then be displayed as a time series. The fourth stage applies an exponential moving average to the bursts to construct the burst time series. The fifth stage involves optimization of the ensemble of possible bursts with a simple cost function.
In Stage I, classification, the daily seismic catalog is searched to find bursts consisting of connected sequences of days in which 2 or more M ³ 3.3 events occur without any intervening days of fewer events. For each such set of days, we also, as a rule, include the preceding day to allow for any foreshock events.
This stage yields many hundreds of candidate bursts. Note that this process will of necessity yield bursts that include purely random, uncorrelated events. To remove these, we then filter the bursts in the following two ways.
In Stage II of the method, rejection of outliers, we detect and remove small earthquakes that may be random outliers. We begin by computing the spatial centroid, or center-of-mass, of each burst. In this calculation, all events having M ³ 3.3 are treated as a particle or unit of mass, each of equal computational weight.
We now compute the horizontal distance or radius ("Ri") of each small event from the centroid, then the median distance ("MedianR") is calculated from the set {Ri}. A factor FCL is defined such that if any event in the burst satisfies the relation: it is rejected.
This filter is applied to each of the candidate bursts. Using all the accepted small events in the burst, we then compute the burst radius-of-gyration (RG) about the burst centroid. Note that RG is the square root of the mean square radius of the small events in the burst. These filtered bursts now define the ensemble of accepted clusters. Radius of gyration is a parameter used to study fracture mechanics (e.g. Kucherovand Ryvkin, 2014; Sayers and Calvez, 2010).
As a numerical example, we find that many of the compact clusters of most interest have MedianR ~ 1 km. A typical value of FCL is 5-25 (see below), so that small events farther away than 5 km to 25 km from the burst centroid would be rejected.
In Stage III of the method, we filter the collection of bursts according to their mass ratio or density (r), which we define as the ratio of cluster mass µ to radius-of-gyration RG: Again, mass is defined as the number of small events in the cluster or burst.
To implement this filter, we define a filter or threshold value FEN corresponding to a particular value of mass ratio r. Each burst is tested, with the criterion for acceptance being: With this condition, we accept only high-density clusters, which are typically the most compact clusters. As will be seen, clusters that are accepted by this criterion correspond to long wavelength fluctuations in the time series, so condition (3) represents a low-pass filter.
Since the range of values of FEN is rather large (see below), we prefer to work with Log10(r) or Log10(FEN). The more compact bursts are found to have r as large as 10-100 per km, some of these being aftershocks of the major earthquakes that have occurred in the region since 1984.
In Stage IV, we apply an exponential moving average 2 (EMA) to the filtered burst time series data. The choice to be made with this method is the value of N, the number of averaging steps. For our purposes, we chose to adopt a 1-year averaging interval for the temporal resolution, corresponding to an average of N ~23 bursts per year. From this value, we compute the a-parameter in the EMA as in the usual formula for the EMA (https://en.wikipedia.org/wiki/Moving_average): Finally, in Stage V, we optimize the collection or ensemble of the bursts and combine them into a single time series. The result of this stage is an ensemble in which the largest earthquakes of M³7 occur at approximately the same value of RG for each event. We find that an acceptable estimate of the optimal ensemble is equal weighting of the members of the ensemble of values of FCL and FEN, which range from [5,10,...,25], and Log10(FEN) = [-1.0, -0.9, ...,0.2].

Results
Basic Results. Figure 1 shows an example of a moderate sized burst with a mass of 23 small events and a radius of gyration of 2.61 km. It began on August 31, 2005 and ended on September 2, 2005. The density, mass-to-radius ratio, of this burst is thus 8.81 km -1 . Location was at the southern end of the Salton Sea, with a maximum magnitude of M5.1.  . It can be seen that the EMA time series progressively increases prior to the M ³ 7.0 events, such that the value of RG progressively declines, reaching a value less than about RG ~ 2 km, after which the large M ³ 7.0 earthquake occurs. We call this process radial localization, to distinguish it from spatial localization that has been previously proposed has been difficult to observe for earthquakes.
The sudden and discontinuous declines in the time series, or increase in RG, just after the time of the large earthquakes represent the appearance of the large dimension aftershock bursts. These are obviously large, very compact bursts with large masses. Radii of gyration RG are typically the size of the mainshock dimensions, meaning they are much larger than the preseismic, compact small bursts. Thus, on a plot such as Figure 3, these will correspond to a sudden decrease in value of the time series on the inverted scale of the Figure, or a sudden increase in radius of gyration.
The other vertical dotted lines represent earthquakes having magnitudes 6.0 £ M £ 7.0. See Table 1 for a list of these. It is interesting to observe that the M6.5 San Simeon earthquake that occurred on December 22, 2003 had a more visible signature on Figure 3 than the M6.7 Northridge earthquake of January 17, 1994. But neither of these earthquakes had a significant effect on the evolution of the time series.
Note in particular the sudden increase in RG (decrease in time series) during August 26-28, 2012, where no vertical dashed line is shown (vertical arrow). This event corresponds to the Brawley earthquake swarm (https://www.usgs.gov/center-news/earthquakeswarm-brawley-seismic-zone) which included 2 earthquakes having magnitude M³5. The occurrence of this sudden increase in RG may suggest that an event more significant than the M~5 events, such as a larger slow earthquake.
Other less prominent but similar patterns can be seen, as for example in August-September 1995. This event is apparently associated with the Ridgecrest earthquake swarm (https://scedc.caltech.edu/significant/ridgecrest1995.html) that began on August 17, 1995 (vertical arrow). A series of events occurred in that swarm, including a M5.8 earthquake that spawned over 2500 aftershocks in the next weeks.
Sensitivity. We expect that time series such as that in Figure 4 will be affected by, and sensitive to, the choice of values for FCL and FEN. To illustrate this sensitivity, we show 4 such time series in Figure 5 It can be seen from Figure 5 that the time series have varying amounts of power in the frequency bands that define them. For example, Figure 5a and 5c have more high-frequency information than Figures 5b. and 5d. As a result, we adopt a strategy that seeks to bring out the common features of all these time series.
Optimizing for Common Features of Earthquakes. The strategy we employ involves defining a cost function that seeks to optimize the value of radius of gyration RG for the largest earthquakes M³7, just before they occur. The cost function that we use requires that the radius of gyration of these large earthquakes just prior to failure be a relatively uniform value. This would allow a crude forecast of when a large such earthquake might occur in the future.
We construct in Figure 6 an EMA time series by building an ensemble average over many time series similar to that shown in Figure 4. Note that each time series in the ensemble will of necessity be composed of a different number of bursts (blue dots in Figure 4). Also, note that each such time series is tabulated non-uniformly in time.
Constructing such an ensemble average of time series demands that we have an equidistantly tabulated time series for each value of the control variables [FCL, FEN]. From each non-uniform time series, we therefore build an interpolated time series having equidistantly tabulated values for each day since 1/1/1984 to the present. Once we have these interpolated time series in hand, we compute the EMA mean value of RG for each day, together with its standard deviation. To a first approximation, our simple cost function yields an ensemble average consisting of equal weights of the constituent time series. These values are not unique, but in varying these filter parameters, we find no significant differences in the results.
In Figure 6, the solid blue curve represents the time-dependent mean of the RG ensemble. The green band at the top records the time-dependent 1s (1 standard deviation or 68% confidence interval) of the ensemble mean for each time. The similarities between Figures 3-6 are clear.
There is a recharge period where average RG decreases prior to each magnitude M³7, followed by a sudden discharge where RG increases in average due to the large aftershock bursts following the mainshock. Between these large mainshocks, it can be seen that lesser magnitude earthquakes result in lesser but similar effects. We discuss these results in the following section.

Physical Mechanism
An important question would be to understand the physics underlying radial localization. At the present time, we do not fully understand the mechanics of the process, rather it must stand as an unexplained observation at this time.
However, we speculate as to the origins of the process. We do know that earthquake faults are organized in a hierarchy of groups with a fractal distribution of scales (Turcotte, 1997). Small faults are also known to be mechanically stiffer than large faults (e.g., Scholz, 2019), in that it takes a higher applied stress to produce the same slip.
Our speculation is therefore that as the regional tectonic stress increases, smaller faults and groups of faults are activated, leading to the observed radial localization. We emphasize, however, that any such explanation is subject to further investigation, modeling, and substantial revision as new information becomes available.

Discussion and Conclusions
Our purpose in this research has been to connect the time of occurrence of the largest earthquakes in California since 1984 with a readily observable property of small earthquake seismicity in the region so as to improve upon earthquake nowcasting. We have therefore focused on bursts of small earthquake seismicity, which include both aftershocks of large earthquakes, and small earthquake swarms. Improved machine learning methods involving classification of the small earthquake seismicity, filtering and optimization can probably systematically improve upon our results.
We have discussed a process that we call radial localization of small earthquake bursts, which we define to be the gradual decrease of radius of gyration of these bursts prior to large earthquakes. We distinguish this from the idea of spatial localization that has been proposed previously as a precursor to large earthquakes (e.g., Kossobokov and Keilis-Borok, 1990), and which we discuss below.
The elastic rebound theory of earthquakes (Richter, 1958) proposes that tectonic stresses build up, recharge or increase, in a region following a large earthquake until another large earthquake occurs and stress discharges or decreases. At that point the stresses are suddenly reduced, and a new cycle of stress recharge and discharge begins. By presenting our results in the manner shown in Figures 3-6, the similarity with the elastic rebound theory can be seen.
Our present results contribute to the development of seismic nowcasting methods that we have discussed earlier (Rundle et al., 2016;2019a,b). In the previous methods, elastic rebound is introduced as a constraint, by counting small earthquakes since the last large earthquake. In contrast using this method, elastic rebound emerges naturally, in that it follows directly from time-dependent properties of the bursts. Another difference is that the seismic nowcasting method produces a cumulative distribution function, or equivalently a survivor distribution of future large earthquake activity. By contrast, the present method computes an observable property of the region with a clear physical meaning.
Of interest is the observation that the minimum radius of gyration (minimum RG) prior to M³7 mainshocks is typically 1 to 2 Km. Achievement of each of these RG values was followed within 1 to 3 years by an M³7 earthquake, the only exception being the M6.5 December 22, 2003 San Simeon earthquake. However, the time series recovered from that event and soon evolved towards the minimum RG again. It should also be emphasized that no M³7 earthquakes have been observed at RG values greater than the ensemble mean value of 2.5 Km. For that reason, RG £ 2.5 Km can be considered to define a "low risk" threshold for M³7 earthquakes.
The issue of spatial localization, mentioned above, has been a recurring topic in the literature for many years. Field, laboratory, statistical and theoretical investigations have all examined processes associated with the onset of earthquakes and laboratory fracture. A very partial list of previous works includes the following.
• In the area of laboratory experiments, the emphasis has been on recording acoustic emissions in rock fracture experiments (Wawersik and Fairhurst, 1970;Yong and Wang, 1980;Lockner, 1993;Garcimartin et al., 1997;Scholz, 1968;Gao et al., 2019;Jasperson et al., 2019) • Slip localization in laboratory friction experiments (Dieterich, 1986(Dieterich, , 1992Dieterich and Kilgore, 1995). • In the area of field measurements, there have been many searches for foreshocks of large earthquakes (e.g., Richter, 1958;Scholz,2019;Kelleher et al., 1973;Kossobokov and Keilis-Borok, 1990;Skordas et al., 2020;Vallianatos et al., 2012;Wyss et al., 1999; and many others referenced therein), defined as smaller earthquakes that immediately precede mainshocks in the source region. • In the area of statistical properties of small earthquakes, other investigations have focused on understanding the statistics of links associated with clusters of small earthquakes (Zalipin et al., 2008;Zaliapin and Ben-Zion, 2016a,b;2013). • In the area of theoretical investigations, there have been proposals of nonlinear material constitutive equations that demonstrate collapse of strain onto the eventual fracture rupture surface (Rudnicki and Rice, 1975;Shaw et al., 1992) All of these studies have generally found results that have proven elusive when applied to field observations associated with earthquakes. As an example, it has been stated that only about 5% of all major earthquakes demonstrate foreshock activity (https://www.usgs.gov/faqs/what-probability-earthquake-a-foreshock-a-largerearthquake?qt-news_science_products=0#qt-news_science_products). For that reason, spatial localization and pre-mainshock elevated activity has generally not been found to be a property associated with major earthquakes, despite previous expectations.
At the current time, our study has not as yet seen reliable spatial localization of burst activity prior to mainshocks during the short times of days-weeks-months leading up to these major earthquakes. The search for these events is a work in progress at the present time, although there are some suggestions of potential signals. We plan to address this question in future work.
Other studies have shown that large earthquakes tend to occur in relatively small regions where small earthquake activity has been the greatest for a number of years (Rundle et al., 2003;Tiampo et al., 2002a,b,c;Holliday et al., 2006a,b;. This is essentially a consequence of the universal applicability of the Gutenberg-Richter relation (Rundle et al., 2016;. The RELM earthquake forecasting test suggests that this approach may be fruitful (Holliday et al., 2007;Lee et al., 2011) We might therefore suggest a strategy to anticipate major earthquakes that combines methods such as those proposed by (e.g., Rundle et al., 2003) to estimate candidates for spatial locations of future events, combined with the ensemble time series methods discussed here. Or in other words, a time series such as that in Figure 4 might be combined with these likely location methods to determine times and candidate locations most at risk for future major earthquakes.
A question that remains is the applicability of the methods described here to other seismically active regions, which in turn depends on the completeness of the seismic catalog over a wide range of magnitudes. A major advantage of the Southern California region is that the catalog is complete to small magnitudes, a property that is not generally the case elsewhere. While we have not examined this question in detail, we have seen promise in preliminary applications of this method to Japan. Future work will be directed at answering this question.   (Table 1). Green represents bursts occurring between 1992 Landers and 1999 Hector Mine earthquakes, blue color represents bursts between Hector Mine and El Mayor Cucupah earthquakes, red color between El Mayor Cucupah and Ridgecrest earthquakes, and black color following Ridgecrest. Larger yellow stars locate the epicenters of the M³7 earthquakes during the period, smaller yellow stars locate the epicenters of the 7.0>M³6.0 earthquakes.   (Table 1). A value N=23 was used to construct the EMA (see text). Left red arrow indicates the Ridgecrest swarm of 8/17/1995, right arrow indicates the Brawley swarm of 8/26/2012. Note that the vertical scale is inverted, so that the smallest radius bursts are at the top.   (Table 1). A value N=23 was used to construct the EMAs that make up the components of the time series. (see text).