Introduction

Although a considerable amount of work has been performed (Reasenberg and Jones 1989, 1990; Gerstenberger et al. 2005; Jordan et al. 2011; Zechar et al. 2016; Omi et al. 2013; Gulia and Wiemer 2019; Ogata and Omi 2020), short-term earthquake forecasting remains a subject of thorough examination. High aftershock activity closely following the mainshock can greatly increase seismic hazard in affected areas; hence, post-seismic potentiality assessment is urgently needed for emergency managers, seismological agencies and the public (Goltz and Roeloffs 2020; Fan et al. 2021).

Many models have been developed to improve the reliability and efficiency of aftershock forecasting, such as the Reasenberg–Jones (R-J) model (Reasenberg and Jones 1989; Page et al. 2016), short-term earthquake probability (STEP) model (Gerstenberger et al. 2005), and epidemic-type aftershock sequence (ETAS) model (Ogata 1988, 2011; Ogata and Omi 2020). These probabilistic models were developed to generate reliable forecasts for impending large shocks or subsequent aftershocks with magnitudes smaller than that of the first event. In Japan, a time-dependent aftershock forecast system was launched in 2017 to evaluate the occurrence probability of aftershocks with a considerable magnitude within one day, three days and seven days after the mainshock. The first forecast can be issued 3 h after mainshock occurrence because the model parameters are sequence-specific variables and can be calculated by employing the Omori–Utsu (Utsu et al. 1995) and Gutenberg–Richter models (Gutenberg and Richter 1944) based on real-time catalogs. The U.S. Geological Survey (USGS) has been announcing aftershock forecasts online since August 2018. The forecast procedure was initially based on the R-J model and then switched to the ETAS model after 2020 to obtain more effective results. Gulia and Wiemer (2019) proposed a foreshock traffic light classification system (FTLS) to define the level of concern regarding subsequent seismic hazards after a large earthquake based on b-value trend analysis of the sequence. Similar attempts are under development in Italy (Marzocchi et al. 2014), New Zealand (Rhoades et al. 2016), China (Bi and Jiang 2020) and the rest of the world (Schorlemmer 2018).

With most current systems, predicted properties of earthquakes are obtained based on sequence-specific parameters, which are derived from the subsequent aftershocks days to weeks or even months after mainshock occurrence. The more aftershock information obtained, the more stable the model is. However, the overlap of seismic signals at the early stage of an earthquake sequence has greatly affected the completeness of records (Iwata 2008; Lippiello et al 2019; Bi and Jiang 2020), which has prevented efficient estimation of the parameters in forecasting models (Jiang et al. 2013; van der Elst and Page 2017). At this point, we introduce a nonparametric historical analogy method to quickly forecast the subsequent seismic hazard, including the determination of the occurrence possibility of a larger earthquake and estimation of the largest aftershock magnitude around the epicenter within 7 days after the mainshock.

It should be noted that this historical analogy method is not an innovative approach but a strategy that has been implemented in earthquake consultation work in China for decades. However, this strategy does not constitute a definite model that can be employed to forecast impending large shocks. Instead, seismologists are encouraged to evaluate nearby historical seismic activity and make decisions regarding the likelihood of large aftershocks based on personal knowledge and experience. Correspondingly, different conclusions can be obtained owing to subjective bias or varying data completeness levels. Moreover, Chinese seismologists are subject to the statutory responsibility of providing the central government with aftershock forecasts once a major shock occurs. A very short time is available to assemble effective precursors for robust seismic forecasts. In the past, the initial version of the aftershock forecast report was usually issued several hours after the first event, after the post-earthquake potentiality consultation meeting held by the China Earthquake Administration (CEA, http://www.cea.gov.cn/), as described in the literature (China Earthquake Administration 1998; Jiang et al. 2015).

In this paper, we propose an automatic system for near-real-time aftershock forecasting that integrates the historical analogy method, as well as valuable seismic and geological data, expert rules and mapping templates. This system streamlines the complex data analysis and comprehensive assessment process that can last several hours or even days and automatically outputs rapid post-earthquake potentiality reports in standard format only several minutes after earthquake occurrence. Applied to the China Earthquake Network Centre (CENC) since the end of 2018, this system (named the CENC Automatic Aftershock Forecasting System or CAAFS) has quickly responded to hundreds of earthquakes in China, ranging in magnitude from M 2.0 to M 7.4, and has become an indispensable component of post-earthquake consultation work of the Chinese earthquake agency.

In the first section of this paper, the logic behind the system design and the methods used for aftershock forecast and evaluation. In Section “Overview of the system”, an overview of the system framework, data and outputs is provided. An evaluation of the forecast results for hundreds of earthquakes in mainland China over the past three years (2019–2021) is detailed in Section “Evaluation of the forecast results over the past 3 years”. Then, we examine the advantages and limitations of this system in Section “Discussions”. In Section “Conclusions“, the conclusions of this paper are outlined.

Background and methodology

Background

China has a long history of earthquake monitoring, with more than 4,000 years of historical documentation and nearly 100 years of seismic instrument records (Qin 1993). On the basis of abundant observations, Chinese seismologists have extensively summarized the properties and characteristics of earthquake sequences and have obtained various indicators for sequence type discrimination and aftershock magnitude prediction, mainly based on empirical and statistical strategies (Wu 1990; Jiang et al. 2006a; Su and Zhao 2008).

Earthquake sequences can usually be divided into two categories: mainshock-aftershock sequences and swarms (or multiplets referred as Felzer et al. 2004 and Grimm et al. 2022). The difference between these two sequence categories is whether there occurs only one mainshock with a magnitude much larger than that of the other events. In China, however, earthquake sequences are more likely to be classified into three types: the mainshock-aftershock type, the isolated earthquake type and the doublet and swarm type. We refer to these as the MAT, IET and swarm sequences, respectively, in this paper. Since the strongest event always occurs at the beginning of the MAT sequence, the probability of the subsequent seismic hazards could be lower than that of the swarm sequence, in which clusters of earthquakes with comparable magnitudes are observed in spatiotemporal proximity. This comprises the underlying assumption of our system.

Sequence type classification is mainly based on RE, i.e., the ratio of the energy released by the mainshock to the total energy released by the entire sequence. For RE ≥ 99.99%, the sequence is of the IET type. Moreover, RE < 90% indicates an earthquake sequence of the swarm type, while 90% ≤ RE < 99.99% indicates a sequence of the MAT type (Zhou et al. 1980). This classification can be reduced to a more intuitive model via the replacement of RE by the ratio of the largest two events in the sequence (Jiang et al. 2006b).

$$\mathrm{Sequence}\;\mathrm{type}=\left\{\begin{array}{c}\mathrm{MAT},\;0.6\leq\Delta\mathrm M\leq2.4\\\mathrm{Swarm},\;\Delta\mathrm M<0.6\\\mathrm{IET},\;\Delta\mathrm M>2.4\end{array}\right.$$
(1)

where \(\Delta M\) is the magnitude difference between the largest two events. Thus, the above sequence classification strategy is simple and intuitive and exhibits notable practicability for rapid estimation of the post-earthquake potentiality in China (China Earthquake Administration 1998).

To explore the mechanism of earthquake sequences, scientists have conducted in-depth research on the physical environment of seismic sources. There exists general agreement that any aftershocks triggered after the main shock are mainly induced by stress adjustment during fault slip (Freed 2005; Scholz 2019; Bonatis et al. 2022), whereas the migration of swarms can be partly explained by pore fluid diffusion( and other similar phenomena as suggested by Hatch et al. (2020). Statistical results for mainland China further suggest that the characteristics of earthquake sequences exhibit certain regional differences due to the specific tectonic setting (Zhou et al. 2001; Tian et al. 2004; Jiang et al. 2006b; Li et al. 2011; Cheng et al. 2015; Wen et al. 2020). For instance, the Tanlu belt in eastern China experiences more MAT sequences than other types; IET sequences predominantly occur in the middle of Tianshan Mountain in Xinjiang and Qinghai Provinces in northwestern China; and swarm sequences are more common in southwest Yunnan Province, east Qinghai Province and the middle section of the Hebei Plain belt (Wang et al. 1997; Jiang et al. 2006c).

In addition, the mainshock magnitude may affect the resultant sequence characteristics. For example, approximately 53% of historical earthquake sequences in the western part of Yunnan Province belong to the MAT type with magnitude greater than or equal to 5; while the proportion decreases to 33% when considering sequences with M ≥ 6 (Li et al. 2011). According to seismic activity statistics, the historical (before the 1970s) and modern (1970 to the present) earthquake sequences are basically consistent in terms of their proportion and spatial distribution in most parts of mainland China (Zhou et al. 2001; Tian et al. 2004; Li et al. 2011; Cheng et al. 2015; Wen et al. 2020) because the stress field, geologic faults and medium conditions remain stable or slightly change over a long period of time (usually hundreds of years).

Under this mechanism, we can preliminarily determine the sequence type at the early stage based on the prominent type of historical earthquakes with the same intensity around the epicenter area.

Methodology

By means of the historical earthquake analogy method, this system aims to forecast the short- term aftershock activity during the following 7 days immediately after an earthquake. After preliminarily assessing the type, the main tasks of this system are (1) to determine whether a larger earthquake will occur and (2) to generate a forecast of the largest aftershock magnitude. The operation process is described below.

  • Step 1. Identifying earthquake sequences

When a new earthquake \({M}_{new}\) is detected, the system first determines whether it belongs to a known sequence. First, the distance is calculated between the new earthquake \({M}_{new}\) and each earthquake \({M}_{i}\) that occurred over the past \(\Delta T\) days. If the distance exceeds the rupture scale Rc of \(\mathrm{max}\left({M}_{new}, {M}_{i}\right)\), \({M}_{new}\) is regarded as a new sequence, and the method moves to Step 2. Otherwise, the current earthquake \({M}_{new}\) is clustered into the past sequence with \({M}_{i}\), and post-earthquake trend determination is conducted in Step 4. There is another possibility, namely, \({M}_{new}\) is related to more than one previous earthquake. In this case, we choose only the largest earthquake as \({M}_{i}\) and move to Step 4.

The rupture scale Rc can be obtained via the Wells equation (Wells and Coppersmith 1994):

$${M}_{w}=5.08+1.16lg{R}_{c}$$
(2)

This equation was mainly developed to calculate the rupture scale of earthquakes with \({M}_{w}\ge 5\). In regard to the scale of smaller earthquakes, Rc can be expanded to a certain extent considering positioning or statistical errors:

$${R}_{c}={R}_{c}+e$$
(3)

where \(e\) is empirically chosen as 5, 10 or 15 (unit: km), corresponding to earthquakes with \({M}_{w}<3, {3\le M}_{w}<4 \mathrm{or }4\le {M}_{w}<5\), respectively.

In addition, in the CENC earthquake catalog, the magnitude is Ms, which must be converted to Mw in advance,, referring to Giacomo et al. (2015):

$${M}_{w}={e}^{-0.222+0.233{M}_{s}}+2.863$$
(4)

Unless stated otherwise, the magnitude M in the following text refers to the Ms scale.

The length of the sequence time window \(\Delta T\) can be established following the approach of Lolli and Gasperini (2003) considering \(\mathrm{max}({M}_{new},{M}_{i})\ge 4\) and can be empirically defined for \(\mathrm{max}({M}_{new},{M}_{i})<4\):

$$\left\{\begin{array}{c}\Delta T=60+60\left(\max\left({\mathrm M}_{\mathrm{new}},{\mathrm M}_{\mathrm i}\right)-4.0\right),\max({\mathrm M}_{\mathrm{new}},{\mathrm M}_{\mathrm i})\geq4\\\Delta T=30,3.0\leq\max\left({\mathrm M}_{\mathrm{new}},{\mathrm M}_{\mathrm i}\right)\leq3.9,(\mathrm{empirical}\;\mathrm{equation}\\\Delta T=15,2.0\leq\max\left({\mathrm M}_{\mathrm{new}},{\mathrm M}_{\mathrm i}\right)\leq2.9,(\mathrm{empirical}\;\mathrm{equation})\end{array}\right.)$$
(5)
  • Step 2. Preliminary assessment of the sequence type

For independent earthquake \({M}_{new}\), we first retrieve the proportion of historical earthquake sequence types with magnitudes equal to or larger than \({M}_{new}\) from a circular region with radius \(R\) and preliminarily estimate the possible type of the current sequence.

According to China’s post-earthquake consultation regulations, the radius R (unit: km), based on historical earthquake statistics, can be obtained as:

$$\mathrm R=\left\{\begin{array}{cc}200,&M_{new}\geq6\\100,&5\leq M_{new}<6\\50,&4\leq M_{new}<5\\20,&M_{new}<4\end{array}\right.$$
(6)

If no historical cases are retrieved from the circular region, R can be expanded to a higher level.

In addition, the statistical time period varies among different parts of China. According to Xu and Gao (2014), in terms of the completeness of earthquake catalogs for mainland China and surrounding areas, the \(M\ge 5\) earthquake catalog for the Tibetan area has been generally completed since 1950 and that for other regions of mainland China has been completed since 1900. For all \(M<5\) earthquakes, the statistical time begins in 1970, as the digital seismic network has been built thereafter.

  • Step 3. Aftershock magnitude prediction based on expert rules

Expert rules are types of standard condition statements, which can be obtained from expert empirical approaches in earthquake consultation meetings; these expert rules include statistical laws for the seismic activity, such as the Gutenberg–Richter law (the relationship between the magnitude and number of earthquakes) and Bath's law (the average magnitude difference between the mainshock and the largest aftershock). When the initial conditions are input, including the time, location, and magnitude of an earthquake, along with the proportion of historical sequence types, the corresponding rules are triggered. Then, the system generates a forecast of the possible aftershock magnitude based on expert rules (Liu et al. 2019), which can be simplified as follows:

  • If < Condition Variables > < Operator >< Condition Value > < Weight > 

  • Then < Conclusions > 

The input, i.e., < Condition Variables > , includes 5 categories, namely, magnitude M, statistical radius R, historical earthquake sequence ratio P, sample size N, and cluster type T, as determined in Step 4 (T = 1 indicates a new cluster; T = 2 suggests that the earthquake is an aftershock in an existing MAT sequence; T = 3 suggests that the earthquake belongs to an existing swarm sequence). < Condition value > corresponds to the value for each condition, such as R = 200, P = 65%, and N > 10. < Weight > indicates the importance of each variable in the expert rules. For example, when the input event is clustered into an existing sequence (T = 2 or 3), only M and R are considered in the rules, while a value of 0 is assigned to the weight of P.

The < Conclusions > field indicates whether “it is unlikely that an earthquake of magnitude X or greater will occur in the same area within a short time” (which provides an estimate of the upper range of the aftershock magnitude) and whether “aftershocks with magnitude X/X–Y may occur in the same area within a short time” (which provides a prediction of the exact magnitude of the largest aftershock). Subsequent upper range prediction is necessary for each event, but only swarms or large mainshocks of \(M\ge 5\) can be determined via aftershock magnitude prediction.

These conclusions are logical and qualitative but uncertain for government agencies and society. This qualitative nature is a public communication requirement, which not only generates advice for nonprofessional managers and the public regarding the short-term seismic hazards but also prevents unnecessary panic due to being misled by absolute arbitrary announcements. However, from a scientific perspective, the above opinions provide clear meanings: (1) within a short time refers to the next 7 days after the mainshock, mainly referencing the analysis study of Jiang et al. (2007), in which it was determined that the sequence records within 7 days after the mainshock could provide a higher sequence discrimination ability than that provided by the sequence records within 1 day, 3 days or 5 days. (2) The same area denotes the region in which aftershocks could occur, as defined by Eqs. (2) to (4).

  • Step 4. Trend prediction of existing sequences

If the newly occurring earthquake \({M}_{0}\) can be clustered into a previous sequence with \({M}_{i}\) in Step 1, the system then calculates the magnitude difference \(\Delta M (\Delta M= {M}_{i}-{M}_{new})\). If \(\Delta M\le -0.6\), \({M}_{new}\) is identified as the mainshock, and \({M}_{i}\) is considered only a foreshock. Then, the trend forecast process is the same as that in Step 2. If \(\Delta M\ge 0.6\), \({M}_{new}\) is considered an aftershock of \({M}_{i}\), and the trend conclusion of \({M}_{i}\) is maintained. If \(\left|\Delta M\right|<0.6\), \({M}_{new}\) and \({M}_{i}\) comprise a swarm sequence, and subsequent earthquakes of a comparable size could occur.

Criteria

To evaluate the prediction effect of the system, we conducted a validation test involving early sequence type determination and magnitude prediction of the largest aftershock. In the test of the sequence type, we referred to the scheme of Diao et al. (2002) and Tian et al. (2004); notably, the sequence types could be divided into two categories: (1) IET and MAT sequences were combined into the so-called “low hazards” category because no larger aftershocks will occur in the seismic zone. (2) Regarding the “high hazards” type, including doublets and earthquake swarms, stronger earthquakes may occur in the near future, which may cause notable damage.

This classification method is especially suitable for sequence type prediction based on the historical analogy method. Because historical earthquake records are basically qualitative descriptions with inadequate aftershock data, it is difficult to distinguish MAT from IET sequences. In addition, the energy released by the mainshock of the IET and MAT sequence types accounts for a large proportion of the total energy, so it is difficult to distinguish these two sequence types from the perspective of the sequence-released energy.

Therefore, sequence type prediction becomes a binary classification problem. Hence, we adopted the precision (P) and omission rate (O) to evaluate the prediction effect for a single category and employed the accuracy rate (A) to evaluate the overall accuracy:

$$\mathrm{P}= \frac{TP }{TP+FP}$$
(7)
$$\mathrm{O}=\frac{FN}{TP+ FN}$$
(8)
$$\mathrm{A}= \frac{TP+TN}{N}$$
(9)

where TP, FP, TN, and FN are the numbers of samples classified as true positives, false positives, true negatives, and false negatives, respectively, and N denotes the number of all samples.

Aftershock magnitude prediction is a one-dimensional prediction problem with space and time constraints. The results can be divided into two categories: hit and miss. When the magnitude of the subsequent aftershocks remains within ± 0.5 of the predicted magnitude, the prediction is considered a hit. Conversely, it is considered a miss. Therefore, the prediction of strong aftershocks with different magnitudes can also be regarded as a multiclassification problem, and the accuracy rate (A) can be applied to evaluate the classification results.

In addition, we defined the control rate τ and false alarm rate ρ to evaluate the effectiveness of aftershock upper range prediction:

$$\uptau =\left(1- \frac{{FP}_{u}}{N}\right)\times 100\%$$
(10)
$$\rho=\frac1N\sum\nolimits_{i=0}^N\frac{\widehat{M_{iu}}-M_i}{\widehat{M_{iu}}}\times100\%\;(M_{ic}>M_i)$$
(11)

where \({FP}_{u}\) denotes the number of earthquake cases with a magnitude above the upper range magnitude. The closer τ is to 1, the higher the system reliability. \(\widehat{{M}_{iu}}\) denotes the upper range of the magnitude of the subsequent aftershock predicted from earthquake case i, and \({M}_{i}\) denotes the actual magnitude of the largest aftershock. The lower the value of ρ is, the lower the false alarm rate of the system and the more meaningful the prediction.

To evaluate the deviation of the system estimates of the aftershock magnitude from the real value, we used the mean absolute error (MAE), which can be calculated as:

$$MAE=\frac1N\sum\nolimits_{i=0}^N\left|\widehat{M_{ia}}-M_i\right|$$
(12)

where \(\widehat{{M}_{ia}}\) is the exact aftershock magnitude of earthquake case i predicted by the system.

Overview of the system

Data

Aimed at aftershock forecasting based on historical records and the generation of rich background information output for post-seismic consultation, the CAAFS relies on abundant sources of basic data, such as national seismic catalogs (Table 1), national earthquake sequence catalogs and various geographical and geological vector data.

Table 1 Sources for the national seismic catalog

The national seismic catalog is mainly derived from the Chinese strong earthquake catalog (1831 B.C. to 1969 A.D., Ms ≥ 4.75), regional earthquake catalogs of China collected from the provincial earthquake administrations (1970–1994, ML ≥ 2), monthly earthquake reports provided by the CENC (1995–2008, ML ≥ 0), and unified earthquake catalog of the CENC (2008 and later, ML ≥ 0). To compensate for the occurrence of small earthquakes before 1970, we also collected historical catalog data provided in “Compilation of Catalogs of Various Periods of Chinese Earthquakes” (780 B.C. to 1984 A.D., with a minimum magnitude as low as ML -0.9), which was especially useful in supplementing reservoir earthquake catalog data.

Based on the national seismic catalog, we indentified more than 720,000 groups of historical earthquake sequences (shown in Fig. S1) via the linking method raised by Reasenberg (1985) and classified them into three types, i.e., the MAT, IET, and swarm types, according to Eq. (1). Among these sequence types, the sequences before 1900 were classified as MAT sequences because of incomplete records. Moreover, more than 80% of moderately strong earthquakes in mainland China belong to the MAT sequence type.

To verify the accuracy of sequence catalog extraction, we considered 741 earthquake sequences with M ≥ 5 in mainland China since 1970 and compared them individually to the corresponding sequences selected by the system. Although the number of aftershocks extracted by the system was far smaller than that extracted manually (Fig. S2), the sequence type consistency was relatively high, and the total proportions of the three sequence types were basically the same (Fig. S3). Notably, the sequence integrity slightly affected the type discrimination results because the sequence type only depended on the magnitude difference between the mainshock and the largest aftershock. The above comparison demonstrates the scientificity and reliability of the historical sequence data that this system relies on.

In addition, the basic data used by the system include topographic data provided by the Global Land and Sea Topography Database (http://www.bodc.ac.uk), active crustal block boundary data of mainland China retrieved from Zhang et al. (2003), geographic databases such as those of administrative divisions and reservoirs provided by the National Geographic Information Resource Catalog Service System (http://www.webmap.cn), and population data above the county level provided by the Ministry of Civil Affairs of China (http://www.mca.gov.cn/).

Scheme

The CAAFS was deployed on the cloud platform of the CENC in October 2018, and the system could output rapid aftershock forecasts as well as social geographic maps, geological background data, historical seismic activity statistics, and other earthquake consultation information. A flowchart of the system is shown in Fig. 1.

Fig. 1
figure 1

Flow chart of the CAAFS system

The CAAFS monitors the National Earthquake Early Report Information Sharing System (EQIM) at 5-s intervals and obtains seismicity data, including time, longitude, latitude, magnitude, depth and epicenter, immediately after an earthquake. If the magnitude reaches a predetermined threshold (M ≥ 2 in metropolitan regions and M ≥ 3 in other areas of mainland China), the rapid aftershock forecast process is triggered. In addition to forecasting the largest aftershock magnitude over the next 7 days, a series of basic information is automatically calculated and visualized in maps, such as historical seismic activity data and earthquake sequence type statistics with different spatial ranges from the epicenter, as well as the epicenter topography, tectonic background, and affected population. In the process, aftershock forecast reports are finally generated in multiple formats, and they are issued to users through the server client and mobile application.

CAAFS users include the earthquake monitoring and prediction department of the CENC, the institute of earthquake forecasting of CEA, and those of all provincial earthquake administrations across China. Aftershock forecast reports containing forecast opinions produced by the system are provided to the local government within 30 min after manual revision.

Output

As stated above, in addition to forecast opinions on the early sequence types and subsequent strong aftershock magnitudes, as described in Section “Methodology”, the CAAFS automatically calculates auxiliary information related to the earthquake situation, as shown in Fig. 2 and Table 2. The rich information produced by the system can provide a data basis for rapid post-seismic consultation.

Fig. 2
figure 2

Parts of auxiliary maps (in Chinese) related to one event, adopting the Qinghai Maduo M 7.4 earthquake in 2021 as an example, including the locations of the (a) epicenter, (b) active crustal block, (c) maximum principal stress, (d) historical sequence types, (e) active faults and past earthquakes and (f) focal mechanism solutions nearby

Table 2 Product list of the CAAFS

Evaluation of the forecast results over the past 3 years

To verify the efficiency of the CAAFS, we examined the prediction results for 713 independent earthquake cases to which the system responded from 2019 to 2021, as shown in Fig. 3. A total of 713 aftershock forecast reports and all related graphics were output in a relatively short time (the blue dot-dashed line in Fig. 3b), with an average time of 5.85 min (Intel(R) Core(TM) i7-7500U CPU, 16G RAM), although the continuous system enhancements increased the time consumption. The magnitude in the test cases ranged from M 2.0 to M 7.4, but most of the values varied between M 3 and M 5.

Fig. 3
figure 3

(a) Spatial distribution and magnitude binning; (b) magnitude-time plot and product time consumption for the tested earthquakes from 2019 to 2021 in mainland China

Sequence type assessment

To verify the accuracy of the early sequence type assessments, we compared the predicted types in 713 cases with the clustered types computed by the R–J model described in Section “Data”. All the checked data of the 713 cases were manually verified for three months after the main shocks to ensure that the sequence types were correct, considering the statistics on earthquake sequences in mainland China obtained by Jiang et al. (2006a, b, c), which show that 93% of sequences have their largest aftershock within 90 days.

The performance of the system’s early sequence type prediction was evaluated quantitatively according to Eqs. (7)-(9). From the results listed in Table 3, 595 cases out of 713 have the correct prediction of sequence type, with an accuracy rate of 83.5%. Among them, the accuracy rate of the sequences with low hazards (IET and MAT) reaches 84.6%, and the omission rate is only 4%. For swarms detection (high hazards), the accuracy rate is 75.6%, and the omission rate reaches 58.5%.

Table 3 Evaluation parameters for early sequence type prediction

In addition, we determined the identification accuracy for the sequence type in the different magnitude bins. Since there were only 11 sequences with a mainshock of M ≥ 6, we categorized all sequences into four magnitude bins, i.e., M 2- M 2.9, M 3- M 3.9, M 4- M 4.9, and M ≥ 5. The results are shown in Fig. 4, which reveals that the precision bars above the central line in most magnitude bins were above 80%, except for the gray bar in the first magnitude bin, denoting the identification precision for swarm sequences with M 2-M 2.9, at only 53.8%. Underreported cases of the relatively safe sequences were mainly distributed in the M 2-M 2.9 magnitude bin, with an omission rate of 19.4%. In addition, the omission rate ranged from 33.3% to 74.4% for the sequences with high hazards and were distributed across all magnitude bins. The omission rate was much higher than that for the low hazards events. The above results revealed that the system achieved a high recognition rate for MAT and IET sequences but could not effciently detect swarms: it classified more than half of the swarm sequences as MAT or IET sequences, as indicated in Table 3.

Fig. 4
figure 4

Evaluation of early sequence type prediction in the different magnitude bins

Aftershock magnitude prediction

To evaluate the system performance of aftershock prediction, we next conducted a validation test of 713 forecast opinions via a comparison to the actual aftershocks within 7 days after the mainshock. In the test, the system produced aftershock upper range predictions in all 713 cases, among which 430 cases exhibited detectable aftershocks (M ≥ 2 in metropolitan areas and M ≥ 3 in the other regions) 7 days after the first event. The evaluation indicators derived from Eqs. (7) to (12) are summarized in Table 4. It should be noted that if the magnitude of the first event was less than M 4, the system could only forecast the upper range of the magnitude of the aftershock but not the exact magnitude. Therefore, the exact aftershock magnitude predictions covered only 163 cases.

Table 4 Validation test results for the aftershock predictions

Our results indicated that the aftershock control rate \(\tau\) reached 93.1% for the 713 cases, and the predicted upper range was exceeded within 7 days after the earthquake in only 49 cases, most of which were foreshocks of MAT sequences. For example, after the M 5.4 earthquake in Jiashi, Xinjiang Province, which occurred on January 18, 2020, the system predicted it as a MAT sequence and concluded that “there was a slight possibility for an earthquake of M ≥ 5 to occur within a short time”. However, one day later, another M 6.4 earthquake occurred in the original seismic zone, and the previous M 5.4 earthquake was actually a foreshock of the sequence. In contrast to the relatively reliable \(\tau\) value, the false alarm rate ρ for the 430 cases was 25.5%. Notably, the predicted value was one-fourth higher on average than the actual value.

However, considering the aftershock magnitude prediction errors in Fig. 5, 42.3% (69/163) of the aftershock magnitudes in all cases were correctly predicted (the predicted magnitude errors remained within ± 0.5). If the hit error was extended to 1, the accuracy rate increased to 73% (119/163). Moreover, the aftershock magnitudes predicted by the system tended to be higher than the actual magnitudes according to the magnitude difference distribution depicted in Fig. 5.

Fig. 5
figure 5

Predicted magnitude errors (predicted value—real value)

Specifically, we analyzed ρ and MAE in each magnitude bin to explore the efficiency for the different mainshock sizes, as shown in Fig. 6. Obviously, the prediction MAE values for M 2-M 2.9 earthquakes achieved the lowest median value of 0.5, followed by those for M 3-M 3.9 and M ≥ 4 earthquakes, with median values of 0.6 and 0.7, respectively. Analogously, the false alarm rate was better for M 2-M 2.9 earthquakes, with a median value of ρ = 14.3%, followed by M 3-M 3.9, M ≥ 5 and M 4-M 3.9 earthquakes, with ρ = 23.3, 28, and 30%, respectively. In other words, the median values of MAE and ρ did not vary greatly for most of the magnitude bins except for the M 2-M 2.9 bin, where the two indicators were much lower and more concentrated.

Fig. 6
figure 6

Values of ρ and MAE for the predicted aftershocks in each magnitude bin. In each box, the central line is the median, and the box edges indicate the first and third quartiles. The whiskers extend to indicate 1.5 times the interquartile range (IQR), and the outliers are marked with red crosses

Spatial variability in the test results

Due to the uneven seismicity distribution and historical record incompleteness for various crustal blocks of mainland China, the validity of the system for different spatial regions should also be examined. We analyzed the false alarm cases for aftershock upper range and exact magnitude prediction and examined the sequence type and regional tectonic characteristics.

Figure 7 shows that most of the earthquake cases in which the upper range of aftershock prediction was exceeded (Fig. 7a) or in which the exact magnitude could not be forecasted (Fig. 7b) were distributed in western China, including the Tianshan, Qaidam, Bayan Har, Qiangtang and Chuandian blocks, where the seismic monitoring capacity is lower than that in the east.

Fig. 7
figure 7

Spatial distribution and sequence types for the false alarm cases in (a) aftershock upper range prediction and (b) exact magnitude prediction; false alarms are marked with filled circles, while correctly predicted cases are marked with hollow circles

The sequence type significantly affects the prediction results. For instance, the majority of earthquakes that exceed the upper prediction range in the Qinghai–Tibet Plateau are swarms, which usually indicate upcoming seismic hazards in the affected regions. However, in the Chuandian block, the majority of the false alarm cases tend to be of the safe category, with MAT or IET types, and these cases are mostly caused by the misrecognition of foreshocks. In contrast, failures in exact magnitude prediction are likely to occur for relatively safe sequences with larger mainshock sizes relative to those in cases in which the upper range is not exceeded.

In addition, the local seismogenic tectonic background plays an important role in the forecast precision of this system. For example, the Qiangtang block was a highly seismically active area, and the tectonic stress field and seismogenic structure are relatively complex and mixed with Cenozoic and ancient faults (Wang et al. 1997). Therefore, the regularity of historical earthquake cases is not applicable to newly occurring earthquakes. The hit rate is accordingly lower, and more expert knowledge is needed to adjust the expert rules.

Temporal variability in the results

In the validation test of Sections “Sequence type assessment” and “Aftershock magnitude prediction”, the forecast issued by the system was assumed to apply to the 7 days after the first shock. In fact, the conclusion in the final report is vaguely expressed as “a short time”. Each person has a different subjective understanding of “a short time”. Therefore, we observed the real subsequent aftershocks in 713 earthquake cases at 1, 3, 7, 10, 15, 30, 60 and 90 days after the mainshock to verify the system effectiveness over time. Accordingly, we calculated the control rate τ and false alarm rate ρ to evaluate the aftershock upper range prediction following Eqs. (10) and (11), respectively, and the accuracy rate A and MAE for the exact aftershock magnitude forecasts were derived from Eqs. (9) and (12), respectively. It should be noted that the continuation of earthquake sequences varies with the different sizes of the mainshock, as expressed in Eq. (5). Accordingly, we masked any time slices exceeding the empirical sequence duration, as shown in Fig. 8.

Fig. 8
figure 8

Temporal variability analysis of the aftershock forecasts. The white translucent masks in (a) and (b) indicate that the sequences may end at that time and that the indicators are less meaningful

As shown in Fig. 8, the τ value delineated by the blue lines decreased to a certain extent over time, regardless of the mainshock size. In essence, it maintained a high level of 88% or above, indicating a stable precision throughout the whole 90-day period after the mainshock. Correspondingly, the ρ value (green lines) decreased synchronously with τ, especially in the M 2-M 2.9 bin, where ρ was significantly lower than that in the other magnitude bins. This may occur because the largest aftershock was more likely to occur over time, resulting in an increasing magnitude of the mainshock.

In contrast, the prediction accuracy value represented by the orange lines rapidly increased during the first 7 to 10 days after the mainshock. For example, in the M 2-M 2.9 range, the accuracy rate dramatically increased from 33.3% to 76.9%, with an enhancement of more than 50%. The growth in accuracy in the other magnitude bins varied between 30 and 60%. In addition, the MAE value, marked by gray bars in Fig. 8, obviously declined during the first 3 to 7 days, in contrast to that during the other time periods. In other words, the 4 indicators of the system performance were balanced well during the first 7 days relative to a shorter or longer period.

Some results were unexpected, as the trend of the indicators in the masked region in Fig. 8a and 8b is inconsistent with that in the earlier section, which may indicate that the previous sequences had ended and new stress release and adjustment processes occurred in the same region. In addition, in the M 3-M 3.9 range in Fig. 8b, both the ρ and A values on the first day indicated an abnormal tendency, which may occur because many aftershocks could not be detected on the first day, especially for small mainshocks, so these sequence samples were not counted, causing a higher A and lower ρ values.

Based on the above analysis, we could obtain the preliminary conclusion that the CAAFS achieves a satisfactory control rate for the upper range of subsequent earthquakes, and its favorable performance can even be extended to the entire sequence. The prediction accuracy of the largest aftershocks increases over time, but the predictions are generally valid for 7 days after the mainshock. As such, we can describe the system as forecasting the aftershock tendency within 7 days after the earthquake in this paper.

Discussions

Practicality and effectiveness

We introduced a practical system, i.e., the CAAFS, for near-real-time aftershock forecasting that has performed well in China for over 3 years. The system can provide verifiable trustworthy estimates of the seismic sequence type and possible size of the next largest aftershock only a few minutes after the first shock. The forecasting mechanism of the CAAFS can be attributed to the analogy of historical seismic sequences occurring in regional tectonic settings. The expert rules used in this system are obtained from experienced skilled seismologists founded on a sufficient amount of historical seismic monitoring.

This kind of expert experience plays an important role in rapid post-earthquake consultation work in China. For example, Jiang (2010) described in detail how Chinese seismologists forecasted the largest aftershock of the 2008 Wenchuan M 8.0 earthquake on the following basis:

  1. a.

    The subsequent largest aftershock calculated via Bath’s law (Bath 1965) exhibited a magnitude of M 6.8.

  2. b.

    The average magnitude difference between the mainshock and the largest aftershock among M \(\ge 5\) sequences in mainland China is 1.2, which indicates that the subsequent largest aftershock could attain a magnitude of approximately M 6.6.

  3. c.

    The magnitude of the largest aftershocks of M \(\ge 7\) sequences in mainland China is generally less than M 6.8.

  4. d.

    The magnitude differences between thrust fault earthquakes and the corresponding largest aftershocks worldwide are generally greater than 1.

Grounded in statistical experience, seismologists reported to the central government that the magnitude of the largest aftershock could vary between M 6 and M 7. In fact, the largest aftershock of the Wenchuan earthquake sequence was an M 6.4 earthquake, occurring in Qingchuan County on May 25, 2008, which agreed with the forecast.

Inconsistent with other popular aftershock forecasting frameworks, such as ETAS-based models, historical analogy methods are simple, intuitive and easy to establish. The advantages of the above historical earthquake analogy method are as follows:

  • First, a real-time forecast can be issued immediately after earthquake occurrence, thereby relying only on the earthquake location and size and historical catalogs with known errors and characteristics.

  • Second, historical sequence types can be determined only by the difference between the mainshock and the largest aftershock. The earthquake catalog completeness slightly influences the final results.

  • Finally, with the use of area-based analysis instead of fault-specific analysis, the identification of seismogenic faults slightly affects the forecasting process, which is beneficial for the first half-hour aftershock forecasting.

Using such historical analogy method mentioned above, the forecasts automatically generated by the system are comparable to human-drafted forecasts. According to our retrospective review, nearly half of the forecasting reports is sent directly to the central or local government as a first version without manual modification, which can greatly reduce the pressure on seismologists in post-earthquake emergency consultation work.

Limitations

Although the CAAFS performed well in early sequence type classification and aftershock forecasting in mainland China, there remain certain system shortcomings and deficiencies.

First, the system achieves an insufficient accuracy in earthquake swarm detection. According to existing research, the proportion of swarms among all M ≥ 5 earthquakes is approximately 18% in mainland China (Jiang et al. 2006c). Swarm sequences more often occur in southwestern Yunnan Province, eastern Qinghai Province and the southwest Tianshan block in Xinjiang Province, as well as at the intersection of the Shanxi and Yinshan‒Yanshan seismic belts and at the middle of the Hebei Plain belt. When applied in the above regions, users should give more attention to assessments of earthquake swarms.

Second, the foreshock-mainshock-aftershock sequence type is not established in the historical sequence dataset. Approximately 5%-10% of strong earthquakes worldwide produce observable foreshocks (Reasenberg and Jones 1989), and the proportion increases with instrumental coverage improvement, as inferred by Trugman and Ross (2019). Although many researchers have attempted to capture foreshocks to generate short-term or imminent predictions of upcoming large earthquakes (Jones and Molnar 1979; McGuire et al. 2005; Gasperini et al. 2021), the identification of foreshocks remains uncertain due to the lack of understanding of the physical mechanism of foreshock activity. Moreover, the statistical characteristics of foreshocks are not considered in this system, and a small number of large earthquakes may be missed as a consequence.

Finally, the system relies on historical earthquake cases mainly collected from a circular region around the epicenter without considering fault properties. In fact, the earthquake type is also closely related to regional fault characteristics. For example, when an earthquake occurs in a region dominated by dip-slip or thrust faults, the sequence is more likely to be of the MAT type than of the swarm type. Since relevant research only remains at the statistical level, with an inadequate physical mechanism foundation, currently, the system considers only regional analogies to generate more robust forecasts.

In addition to the deficient understanding of the physical mechanism, both of the first two issues are also related to the accuracy of seismic sequence identification because a single parameter for sequence determination (for example, magnitude difference) is readily affected by the sample number, resulting in a low statistical significance. Besides, source parameters such as the stress drop or apparent stress for medium and small earthquakes vary with the magnitude (Allman and Shearer 2009; Zhao et al. 2011), which leads to greater uncertainties. Recently, we examined the probability of employing more sequence factors in machine learning approach to obtain better sequence discrimination results (Jiang and Wang 2023), such as parameters related to the focal mechanism, sequences decay, normalized energy entropy and so on. The application of machine learning methods may be the next step in earthquake forecasting (Beroza et al 2021; Rundle et al 2022). However, focused attention should be given to the applicability of small- or moderate-scale sample data in machine learning algorithms, as suggested by Grinsztajn (2022).

Furthermore, regional variability in sequence types has been widely observed and analyzed globally (Chen et al. 1987; Ben-Zion et al. 1993, 2006; Yamanaka and Kikuchi 2004; Kanamori and Rivera 2004). For example, Bonatis et al. (2022) defined six distinct tectonic regions in Greece and estimated controlling factors of the duration and productivity. Similarly, China comprises five active blocks, 26 seismic zones, and dozens of seismic subregions based on brittle failure theory of seismogenic faults (Zhang et al. 2005; Qin et al. 2016). Whether these tectonic-related regions can be used as statistical units in the application of the historical analogy method to past earthquake cases should be investigated in the future.

Conclusions

Although short-term forecast of aftershocks is still an open scientific goal, we can obtain useful information about earthquake hazard based on existing incomplete information and imperfect forecasting models. The automatic system described in this paper, namely CAAFS, issues near real-time forecasts of the aftershocks only a few minutes after earthquake occurrence based on a rich historical case database and expert rules. Given the magnitude of newly occurred event and the type ratios of past sequences in a circular region around the epicenter, CAAFS first assesses the possible type of the current event and then outputs the largest aftershock forecast report in several minutes, along with a series of auxiliary maps for consultation.

CAAFS was implemented to CENC and served for over 30 provincial earthquake administrations as well as the institute of earthquake forecasting, CEA since the end of 2018. It has generated 731 short-term aftershock forecast reports from 2019 to 2021. The evaluation test shows that the accuracy rate for early sequence type classification of all the 713 cases reaches 83.5%, and over 90% of the aftershocks remain within the predicted upper range, proving that this system is good at assessing future hazards. Moreover, 163 cases were given the explicit magnitude for the largest aftershocks, and the hit rate reached 42.3%, implying a relatively reliable behavior on aftershock forecasting.

The limitations for CAAFS are found in foreshocks and swarms detection, especially in diverse tectonic settings. On one hand, the physics behind these seismicity patterns are still under discussion. On the other hand, the determination of earthquake clustering and quantitative criteria for discriminating different sequence patterns are not clearly defined. Further research might focus on exploring more characteristics of seismic sequences using new technology, such as the machine learning algorithms, and investigating tectonically related mechanisms to help approximate the past sequence behavior into future forecasting.