A test of methods for MC estimation based on earthquake catalogs

This study tested five methods widely used to estimate the complete magnitude (MC) of earthquake catalogs. Using catalogs of observed earthquakes, we test the performance of these five algorithms under several challenging conditions, such as a small number of events and spatial-temporal heterogeneity, to see whether the algorithms are stable and consistent with known data. We find that the maximum curvature method (MAXC) is highly stable, but significantly underestimates MC unless heterogeneity is absent. MC estimated by the b-value stability method (MBS) requires many events to reach a stable result. Results from the goodness-of-fit method (GFT) were unstable when heterogeneity lowered the fitness level. The entire magnitude range method (EMR) is relatively stable in most conditions and can reflect changes in MC when heterogeneity exists, but it fails when the incomplete part of the earthquake catalog is discarded. The median-based analysis of the segment slope method (MBASS) can tolerate small sample sizes, but cannot reflect the degree to which small events are missing in aftershock sequences. In conditions where MC changes rapidly, such as in aftershock sequences, directly observing the time sequence can give a precise estimate of the complete sub-catalog, but only when the number of events available for study is large enough can the MAXC, GFT, and MBS methods give a similarly reliable estimate.


Introduction
Earthquake catalogs are important for seismicity analysis and seismic hazard assessment. In compiling earthquake catalogs, not all events can be detected and thus cataloged, for example when an event is too small, when ambient noise is too large, or when an aftershock waveform is overwhelmed by that of the main shock (Iwata, 2008). The incompleteness of earthquake catalogs can significantly affect any subsequent analysis that relies on catalog data. Thus, assessing the completeness of an earthquake catalog is a necessary step in many studies, such as b-value estimation.
The complete magnitude (MC) of an earthquake catalog is defined as the magnitude of the weakest event that is fully recorded in the catalog. A catalog that includes all events whose magnitude is greater than or equal to MC is called a complete catalog. In empirical studies, complete catalogs have been found to exhibit a log-linear frequency-magnitude distribution (FMD), as depicted by equation (1):

log10 N = a − b m. (1)

This statistical law is called the G-R law (Gutenberg and Richter, 1944); N represents the number of events with magnitude larger than m, a and b are constants, and MC is the smallest magnitude in the complete catalog.
Traditionally, MC is estimated based on this relationship: we estimate MC as the point where the FMD deviates from the G-R law. Such methods are computationally cheap and are still frequently used. Another class of methods, based on waveform information (e.g., Gomberg, 1991), is rarely applied because of the large computation time it requires. Another important parameter of the G-R law is the b-value, which controls the relative numbers of large and small earthquakes. Related studies (e.g., Schorlemmer et al., 2005) have shown that b-values may reflect the stress level of a particular region; we can thus consider the b-value as having real physical meaning. Aki (1965) introduced what has become the standard method for b-value estimation, maximum likelihood estimation, as shown in equation (2):

b = log10(e) / (⟨M⟩ − (MC − Δm/2)). (2)

In equation (2), ⟨M⟩ is the mean of the magnitudes that are not smaller than MC, and Δm is the width of the magnitude bin, commonly chosen as 0.1. In this relationship, MC directly affects the estimation of the b-value, so we can reasonably expect an error in MC to propagate into the b-value estimate; this has been verified in several tests (e.g., Woessner and Wiemer, 2005).
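As a concrete illustration of equation (2), the following minimal Python sketch estimates the b-value from a list of binned magnitudes; the function and variable names are ours, not from the paper.

```python
import numpy as np

def b_value_mle(mags, mc, dm=0.1):
    """Maximum-likelihood b-value of equation (2) (Aki, 1965, with the
    Delta_m / 2 binning correction). mags are binned magnitudes, mc is
    the completeness magnitude, dm is the bin width."""
    m = np.asarray(mags, dtype=float)
    m = m[m >= mc]                        # keep only the complete part
    return np.log10(np.e) / (m.mean() - (mc - dm / 2.0))
```

For a catalog whose magnitudes above MC follow the G-R law with b = 1, the estimator recovers a value close to 1.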
The complete catalog is generally extracted from a raw catalog by discarding its incomplete part. Since the frequency axis of the FMD is log-scaled, a small difference in MC can significantly change N, the number of events. When MC is overestimated, the quantity of available data is considerably decreased; this reduction affects the stability of further calculations, especially if the raw catalog is small. On the other hand, an underestimated MC will bias the results because of the uncollected (incomplete) events. The importance of correctly estimating MC is thus clear on these grounds alone. But since equation (2) and related studies link MC and the b-value, an accurate MC estimate is also crucial to correctly estimating the b-value.
In real applications, considering the spatiotemporal heterogeneity of MC, investigators must consider the pattern of the events' spatiotemporal distribution, so as to determine the extent in space and time over which the catalog is relatively complete (Wiemer and Wyss, 2000). When spatial heterogeneity exists, the data used to calculate MC on each spatial grid node in MC mapping can itself be heterogeneous. A robust technique for mapping MC should give a median estimate if the data on a node are composed of subsets with different MC. On the other hand, because of high spatial resolution or low seismicity, there may be only about 100 events on one node (Mignan et al., 2011), which challenges an algorithm's stability. In conditions where MC changes rapidly, such as in aftershock sequences, MC will be quite large immediately after the main shock, since the waveforms of aftershocks are affected or even overwhelmed by that of the main shock. However, determining the parameters of some seismicity models (such as the Omori-Utsu formula; Utsu, 1957) requires a complete sub-catalog (Zhuang JC et al., 2012). Thus, a standard test of a method's performance under temporal heterogeneity is whether it can sensitively detect the point at which MC decreases to a stable level.
In brief, several alternative methods for calculating MC are often used under special conditions: a small number of events, spatial heterogeneity of MC, and rapid temporal change of MC. We test the performance of five existing methods for catalog-based MC estimation in these situations, and give suggestions for when each might be most appropriately used. Similar work has been done using synthetic catalogs (e.g., Huang YL et al., 2016); our study provides further validation by testing on real catalogs of different characters from different seismic networks.

Data
We used three groups of datasets for testing three aspects of M C estimation performance. A detailed description follows.

Dependence on Sample Size
In this part of the test, we use the same data sets as in the original paper introducing the MBASS method (Amorese, 2007). To test his new method, Amorese tried to reproduce the results of Woessner and Wiemer (2005) (abbreviated as W&W2005 hereafter), and his data sets are essentially the same as in W&W2005. The data sets are: (1) Regional catalog ECOS, a subset of the Earthquake Catalog of Switzerland from the Swiss Seismological Service. The subset we used records earthquakes in the southern canton of Wallis from 1992 to 1998; the magnitude scale is local magnitude.
(2) Regional catalog NCSN, compiled by the Northern California Seismic Network. We use a subset from the San Francisco Bay area during 1998-2002; this catalog gives coda-duration magnitudes.
(3) Volcanic region NIED, maintained by the National Research Institute for Earth Science and Disaster Prevention. The subset covers 1992-2002 for a volcanic region in Kanto province; magnitudes are locally reported.
A brief description of each catalog is summarized in Table 1, and their FMD are plotted in Figure 1. The catalogs are from different tectonic regions and of different magnitude scales, which increases the likelihood that our results will be reliable in a wide range of situations.
Tolerance to Heterogeneity
Here we chose the data set of Schorlemmer and Woessner (2008), a subset from the Southern California Seismic Network (SCSN) with a time span of January 2001 to July 2007. We chose this earthquake catalog because previous studies have mapped its MC distribution (Schorlemmer and Woessner, 2008). Two subregions with significantly different MC values are selected, and their data are mixed to create heterogeneity, as can happen in real MC mapping procedures. We fixed the size of the mixed catalog and changed the mixing ratio to see the effect of heterogeneity on each algorithm. The expectation is that the resulting MC will change linearly with increasing mixing ratio.

Detectability of Short-Term Missing of Small Events
A complete sub-catalog is needed when fitting models of seismicity to an event sequence. For an aftershock sequence, the starting time of the complete catalog needs to be estimated. Zhuang JC et al. (2012) suggest that this starting time can be estimated by directly observing the time sequence, i.e., magnitude plotted against event index. We wish to see whether the MC estimation algorithms can reflect the missing events, and whether their results are comparable to those of this 'visual inspection method'. We select four aftershock sequences. (1) Wenchuan earthquake: the M8.0 earthquake of 12 May 2008 in Sichuan Province, China. The data set is from the appendix of an article on temporal seismicity models on CORSSA (Zhuang JC et al., 2012). There are 198 events in the catalog.
(2) Fukuoka earthquake: 16 August 2005 offshore of the western area of Fukuoka-Ken, Kyushu, Japan. The aftershock data are the test data set in the software SASeis2006 (Ogata, 2006). There are 403 events in the catalog.
(3) Miyagi earthquake: M6.2, 26 July 2003 in northern Miyagi-Ken, northern Japan. The test data set is included in the software SASeis2006 (Ogata, 2006). We extract the subset with magnitude larger than 1.5; there are 1459 events in this catalog.
(4) Landers earthquake: a 1992 M W 7.3 aftershock sequence extracted from the SCSN catalog. We use the subset of events of magnitude larger than 1. The selected subset includes 19707 events.
These aftershock sequences are from different tectonic regions and the data were collected with different detecting abilities (thus characterized by differing numbers of events). This variability in our data sets can give us important knowledge of how well the methods work in different situations.

Algorithms for M C Estimation Based on Earthquake Catalog
The five methods used for estimating MC are: (1) MAXC, the maximum curvature method of Wiemer and Wyss (2000). The magnitude of the bin with the highest event count in the non-cumulative FMD, i.e., the point of maximum curvature at which the FMD begins to deviate from the G-R law and connects to the linear cumulative part, is taken as the estimate of MC.
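The MAXC idea can be sketched in a few lines of Python; this is a minimal illustration, and the ZMAP implementation differs in detail.

```python
import numpy as np

def maxc_mc(mags, dm=0.1):
    """Maximum-curvature estimate of Mc: the center of the magnitude
    bin with the highest count in the non-cumulative FMD."""
    m = np.asarray(mags, dtype=float)
    edges = np.arange(m.min() - dm / 2.0, m.max() + dm, dm)
    counts, _ = np.histogram(m, bins=edges)
    centers = (edges[:-1] + edges[1:]) / 2.0
    return centers[np.argmax(counts)]
```

For a catalog complete above magnitude 2.0 with a depleted incomplete part below, the histogram peak sits at the lowest fully recorded bin.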
(2) GFT, the goodness-of-fit method of Wiemer and Wyss (2000). This method extracts subsets with ascending cut-off magnitude MCO, fits the G-R law to each subset, and measures the deviation between the observed and fitted FMDs, yielding a series of goodness-of-fit values. The goodness of fit R is defined by equation (3):

R = 100 − 100 Σi |Bi − Si| / Σi Bi, (3)

where Bi and Si are the observed and predicted numbers of events in the i-th magnitude bin. With larger MCO, fewer events are missing from the subset, and the corresponding R value tends to grow. The first MCO at which R reaches a confidence level of 95% (or 90%) is taken as MC. Note that if even the 90% goodness-of-fit level cannot be reached, the GFT method fails.
(3) MBS, MC by b-value stability, proposed by Cao AM and Gao SS (2002) and modified by Woessner and Wiemer (2005). This method, too, extracts subsets with ascending cut-off magnitude MCO, and calculates the corresponding b-value for each subset. When small events are not recorded, larger earthquakes constitute a relatively heavier part of the subset for a small MCO, and the corresponding b-value is too low; as MCO ascends, the estimated b-value becomes higher and finally reaches a 'plateau'. The first MCO resulting in a stable b-value is the estimated MC.
As the method developed, the criterion for stability was modified. Cao AM and Gao SS (2002) measured the change of b-value between two successive MCO values, considering the b-value stable if the difference is smaller than 0.03. But Woessner and Wiemer (2005) found that this criterion leads to poor numerical stability of the MBS method. Their criterion instead measures the difference between the b-value for a certain MCO and the average b-value over a window of subsequent cut-offs, defined by equation (4):

b_ave(MCO) = Σ b(m) / (win_M/Δm), summed over m from MCO to MCO + win_M − Δm, (4)

where Δm is the size of the magnitude bin and win_M is the window length used for averaging.
A b-value is considered stable if the difference |b_ave − b| is smaller than the b-value uncertainty of Shi YL and Bolt (1982), given by equation (5):

σ(b) = 2.3 b² sqrt( Σi (Mi − ⟨M⟩)² / (N(N − 1)) ), (5)

where b is the calculated b-value, N is the number of events used to calculate b, and ⟨M⟩ is the average magnitude.
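The MBS procedure, with the Shi-Bolt stability criterion of equation (5), can be sketched as follows; this is a minimal illustration with our own variable names, not the ZMAP code.

```python
import numpy as np

def shi_bolt_sigma(mags, b):
    """b-value uncertainty of Shi and Bolt (1982), equation (5)."""
    m = np.asarray(mags, dtype=float)
    n = len(m)
    if n < 2:                              # guard for tiny subsets
        return np.inf
    return 2.3 * b**2 * np.sqrt(((m - m.mean())**2).sum() / (n * (n - 1)))

def mbs_mc(mags, dm=0.1, win=0.5):
    """Mc by b-value stability: the first cut-off whose b-value differs
    from the window-averaged b-value (equation (4)) by less than the
    Shi-Bolt uncertainty. win and dm are the win_M and Delta_m of the text."""
    m = np.asarray(mags, dtype=float)
    k = int(round(win / dm))               # number of b-values per window
    mcos = np.round(np.arange(m.min(), m.max() - win, dm), 1)
    bs, sig = [], []
    for mco in mcos:
        sub = m[m >= mco - dm / 2.0]
        b = np.log10(np.e) / (sub.mean() - (mco - dm / 2.0))
        bs.append(b)
        sig.append(shi_bolt_sigma(sub, b))
    bs = np.array(bs)
    for i in range(len(mcos) - k):
        if abs(bs[i:i + k].mean() - bs[i]) <= sig[i]:
            return float(mcos[i])
    return None
```

Below the true MC the b-value still rises steeply, so the window average differs from b(MCO) by far more than the Shi-Bolt uncertainty; stability is first reached near the true MC.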
(4) EMR, the entire magnitude range method of Woessner and Wiemer (2005). This method fits the entire non-cumulative FMD, instead of merely the linear complete part. Woessner and Wiemer (2005) modeled the incomplete part with a detection rate given by the normal cumulative distribution function (CDF), and the complete part with the G-R law, as depicted by equation (6):

λ(m) = λGR(m), for m ≥ MC; λ(m) = λGR(m) q(m|μ, σ), for m < MC, (6)

where λGR(m) is the G-R intensity and q(m|μ, σ) is the normal CDF with mean μ and standard deviation σ. They compared the normal CDF with other CDFs, and their tests showed that the normal CDF fits best.
The detailed procedure is: first divide the catalog into complete and incomplete parts at an ascending cut-off magnitude, then fit the two parts separately. For each MCO there are four free parameters: σ and μ of the normal CDF, and the a-value and b-value of the G-R law. The MCO that leads to the best fit is the final MC estimate.
(5) MBASS, the median-based analysis of the segment slope, by Amorese (2007). The MBASS method estimates MC by detecting multiple change points in the slope series of the non-cumulative FMD. The segment slope at magnitude m is defined by equation (7):

s(m) = [log10 N(m + Δm) − log10 N(m)] / Δm, (7)

where N(m) is the number of events in the bin at magnitude m and Δm is the size of the magnitude bin.
After the segment slope at each pair of consecutive magnitudes is calculated, the median is subtracted from the slope series. The Wilcoxon-Mann-Whitney (WMW) test (Mann and Whitney, 1947; Wilcoxon, 1945) is then applied to find a notable and stable change point. These two steps are done iteratively, and the magnitude corresponding to the main change in the median of the slope series is the output MC value. The detailed algorithm and a corresponding program are provided in Amorese (2007).
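A single pass of the MBASS idea can be sketched as below; the full method of Amorese (2007) iterates this step, whereas this illustration locates only the main change point.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def mbass_mc(mags, dm=0.1):
    """One-pass MBASS sketch: build the segment-slope series of the
    non-cumulative FMD, subtract its median, and locate the main change
    point with the Wilcoxon-Mann-Whitney test."""
    m = np.asarray(mags, dtype=float)
    edges = np.arange(m.min() - dm / 2.0, m.max() + dm, dm)
    counts, _ = np.histogram(m, bins=edges)
    centers = (edges[:-1] + edges[1:]) / 2.0
    keep = counts > 0                      # log10 of empty bins undefined
    logn, mc_bins = np.log10(counts[keep]), centers[keep]
    slopes = np.diff(logn) / np.diff(mc_bins)
    slopes = slopes - np.median(slopes)
    best_p, best_tau = 1.0, None
    for tau in range(2, len(slopes) - 2):  # candidate change points
        p = mannwhitneyu(slopes[:tau], slopes[tau:],
                         alternative='two-sided').pvalue
        if p < best_p:
            best_p, best_tau = p, tau
    return float(mc_bins[best_tau])
```

The incomplete part yields near-flat slopes while the complete part decays at roughly −b per magnitude unit, so the most significant rank split falls at the boundary between the two regimes.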
All the programs implementing these five methods can be found in the package ZMAP (Wiemer, 2001). Our results were obtained from this package, which is used also by other scholars, such as Woessner and Wiemer (2005).

The Uncertainty for M C and b-value Estimation
Real catalogs are generated following some underlying probability distribution. We assume that the FMD of a catalog represents its real probability distribution, though some random fluctuations are inevitable. To assess the influence of these fluctuations, i.e., the reliability of the results, we calculate MC and b-values and their uncertainties as follows. We use a Monte Carlo approximation of the bootstrap method (Efron, 1979): we draw, with replacement from the original catalog, bootstrap samples of a given size. The sampling procedure is repeated several hundred times, and we estimate MC for each bootstrap sample. The mean of these MC values is the final MC estimate, and their standard deviation gives its uncertainty. The corresponding b-value is estimated by maximum likelihood via equation (2), with the uncertainty estimated by the criterion of Shi YL and Bolt (1982) (equation (5)).
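The bootstrap procedure described above is generic over the choice of estimator; a minimal Python sketch follows, with names of our own choosing.

```python
import numpy as np

def bootstrap_mc(mags, estimator, n_boot=200, size=None, seed=0):
    """Monte Carlo bootstrap (Efron, 1979): resample the catalog with
    replacement, re-estimate Mc on each bootstrap sample, and report
    the mean and standard deviation. `estimator` can be any Mc method."""
    rng = np.random.default_rng(seed)
    mags = np.asarray(mags, dtype=float)
    size = len(mags) if size is None else size
    ests = [estimator(rng.choice(mags, size=size, replace=True))
            for _ in range(n_boot)]
    return float(np.mean(ests)), float(np.std(ests))
```

Passing a different `size` than the catalog length is exactly the mechanism used later to study sample-size dependence.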

Generation of Synthetic Catalogs
In real catalogs, we do not know the true M C . In order to find the accuracy of each method, we can take advantage of synthetic tests, where we can set the "true" value of M C .
The model introduced by Ogata and Katsura (1993) (abbreviated as OK1993 hereafter) was chosen as the basis for synthetic catalog generation. OK1993 models the detectability of a station by the normal CDF, equation (10). The final observed magnitude-frequency intensity, equation (8), is the product of this detection rate and the real intensity from the G-R law, equation (9):

λobs(m) = λ(m) q(m|μ, σ), (8)
λ(m) = A exp(−βm), with β = b ln10, (9)
q(m|μ, σ) = the normal CDF with mean μ and standard deviation σ. (10)
The difference between the OK1993 model and the EMR method is that no separation into complete and incomplete parts is made, which avoids introducing a discontinuity into the FMD, a feature that would be physically unreasonable. The magnitude of each event is generated stochastically by the rejection method (Zhuang JC and Touati, 2015). The rejection method is a basic technique for simulating a random variable from a known distribution (equation (8) in our case). The brief procedure is to first generate a random number, and then decide whether to accept it according to the known probability density function.
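The rejection-sampling step for the OK1993 density of equation (8) can be sketched as follows; the parameter values here are illustrative only, not the fitted values used in our tests.

```python
import numpy as np
from scipy.stats import norm

def simulate_ok1993(n, beta=np.log(10.0), mu=1.0, sigma=0.2,
                    m_max=6.0, seed=0):
    """Draw n magnitudes from the OK1993 observed intensity by the
    rejection method: propose uniformly on [0, m_max], accept with
    probability proportional to the target density."""
    rng = np.random.default_rng(seed)
    def density(m):            # equation (8): G-R intensity times q(m)
        return np.exp(-beta * m) * norm.cdf(m, mu, sigma)
    grid = np.linspace(0.0, m_max, 601)
    bound = density(grid).max()            # rejection envelope height
    out = []
    while len(out) < n:
        m = rng.uniform(0.0, m_max)
        if rng.uniform(0.0, bound) < density(m):
            out.append(m)
    return np.array(out)
```

The resulting FMD rises through the incomplete part, peaks near μ, and then decays according to the G-R law, with no discontinuity.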
The estimation of the theoretical MC value follows Huang YL et al. (2016). The theoretical MC is the magnitude above which one event in every 500 is expected to be unrecorded. Specifically, it is the magnitude M at which N equals 500 in equation (11):

N = 1 / (1 − q(m)), (11)

where q(m) has the same definition as in equation (10).
A detailed discussion of the choice of this theoretical MC criterion is presented in Huang YL et al. (2016).
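Setting N = 500 in equation (11) and inverting the normal CDF gives the theoretical MC in closed form; a small sketch, assuming q(m) is the normal CDF of equation (10):

```python
from scipy.stats import norm

def theoretical_mc(mu, sigma, n=500):
    """Theoretical Mc criterion described in the text: the magnitude at
    which the detection rate q(m | mu, sigma) reaches 1 - 1/n, so that
    one event in n is expected to go unrecorded above it."""
    return mu + sigma * norm.ppf(1.0 - 1.0 / n)
```

A stricter criterion (larger n) pushes the theoretical MC upward, as expected.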

Dependence on Sample Size
For real catalogs, we used the bootstrap method described above to obtain MC and b-values (with their uncertainties) at different sample sizes, by changing the size of the bootstrap samples to obtain catalogs of different numbers of events that nevertheless come from the same FMD. This approach preserves, and can even magnify, particular patterns of the original FMD in the bootstrap samples, which can cause systematic deviations in the final result. Thus, in tests on synthetic catalogs, we tried two different approaches: one is to generate catalogs of the same sizes as the corresponding real catalogs and then repeat the same procedure as for the real catalogs; the other is to directly generate catalogs of different sizes. We tested the difference between these two approaches.
In generating synthetic catalogs, we adopt the results of Mignan and Woessner (2012), who used the OK1993 model to fit the data sets in W&W2005, and use the fitted parameters they report.

Tolerance to Heterogeneity
The bootstrap method mentioned above is used to randomly select a certain number of events from both subsets. Owing to stochastic fluctuation, the selected events may come mainly from the large- or small-magnitude events of one subset, so the resulting MC will not change monotonically with the mixing ratio. To attenuate this effect, the procedure was repeated five times for each mixing ratio and the resulting MC values were averaged. We do not repeat more than five times because, apart from the unreasonable amount of computation time, the uncertainty would increase.

Detectability of Small Missed Events
We used a sliding-window approach, calculating MC for the events in a moving 'window'. The relation between MC and the starting index is plotted to see whether the MC algorithms can detect the missing small events in aftershock sequences, and to compare their results with those of the 'visual inspection method'.
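The sliding-window scheme itself is simple; a sketch, with the estimator left pluggable so any of the five methods can be used:

```python
import numpy as np

def sliding_mc(mags, estimator, win=200, step=50):
    """Sliding-window Mc series for an aftershock sequence: estimate Mc
    on each window of `win` consecutive events. The window length trades
    temporal resolution against numerical stability."""
    m = np.asarray(mags, dtype=float)
    return [(i, estimator(m[i:i + win]))
            for i in range(0, len(m) - win + 1, step)]
```

When detection recovers after a main shock, the Mc series from the windows should step down from a high early value to a lower stable one.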

Problems Found in Reproducing the Results of W&W2005 and Corresponding Modifications to the Programs
Before conducting our tests, we tried to reproduce the results of W&W2005 to check the programs and data sets we used. We found that the MBS method gave much larger results than W&W2005 for the four catalogs, as shown in Table 2. We then plotted the b-value against MCO for the NCSN catalog. Though a 'plateau' is found when MCO is about 1.4, the b-value uncertainty is too large under Shi YL and Bolt's (1982) criterion, so the b-value is not considered stable (Figure 2a).
In W&W2005, the authors pointed out that the 'window length' strongly influences the result, and suggested setting it to 5. Our tests show that too short a 'window', such as 2 or 3, leads to large uncertainty, i.e., poor computational stability. We also found in the code that the average b-value is estimated from the b-values for magnitudes from a certain MCO forward (equation (4)), which gives too much weight to b-values at larger MCO.
We therefore tried changing the starting point of the averaging window, and found that a −0.1 shift gives better results, following equation (12):

b_ave(MCO) = Σ b(m) / (win_M/Δm), summed over m from MCO − 0.1 to MCO + win_M − Δm − 0.1. (12)

Modified results are shown in Figure 2 and Table 2.

Resampling from Real Catalogs
Applying the bootstrap method described above to the real catalogs, with the size N of the bootstrap samples set to 100, 200, 500, 1000, 5000, 10000, and 100000, the resulting relations of MC and b-value to N are shown in Figure 3. We find, for all four real catalogs, that for small N the MBS method gives a substantially lower MC estimate than the other methods and the b-value is also seriously underestimated; conversely, when N is large, the resulting MC and b-values are much larger than those from the other methods. This shows that the MBS method is sensitive to N. In contrast, results from the MAXC method are quite stable, but are always the lowest of the five methods. The EMR estimate decreases slightly with ascending N and finally approaches that of the MAXC method, which tends to underestimate MC; thus an upward correction of about 0.2 is needed (Huang YL et al., 2016). The GFT and MBASS methods perform stably in this part of the test, but give estimates smaller than those of MBS.

Resampling from Synthetic Catalogs
As a comparison with the last part, synthetic catalogs of the same sizes as the original ones were processed in the same way. The FMDs of the synthetic catalogs and their theoretical MC values are shown in Figure 4, and the results are shown in Figure 5. It is worth mentioning that additional treatment is needed for the 'tail' of the FMD, since large earthquakes happen rarely and tend to deviate from the G-R law. We find the same stability behavior for the MBS method, except that for large N the b-value is not overestimated as substantially as MC. A difference also appears for the GFT method, which overestimates MC for large N, though with no influence on the b-value. Generally, all methods tend to underestimate MC compared with the theoretical MC value estimated by the criterion of Huang YL et al. (2016).

Directly Generating Catalogs of Different Sample Sizes
The results are shown in Figure 6. Note that all five methods are applied to the same catalogs, so the statistical fluctuations in creating the synthetic catalogs are the same for all methods. The general patterns are the same as in the last part of the test, but the MBS method performs more stably than the other methods.

Tolerance to Spatial Heterogeneity
The two subsets we selected from Schorlemmer and Woessner (2008) are shown in Figure 7. Note that the first subset has a flat non-cumulative FMD, which is caused by heterogeneity (Wiemer and Wyss, 2000; Mignan et al., 2011).
The mixing ratio of the two sub-catalogs changes from 0 to 1, and the sizes of the mixed catalogs are set to 500, 1000, 3000, 5000, 7000, 10000, 15000, and 20000. Because of the heterogeneity, the goodness-of-fit criterion for the GFT method is set to 90%; otherwise, the method would be numerically unstable. The results are shown in Figure 8.
More fluctuation exists when N is no more than 1000; below that threshold, the computation is not stable. For large enough samples, the results of the five methods fall into three groups: the MAXC and MBASS methods give the lowest estimates, which rise and converge as the mixing ratio increases; the GFT and EMR methods give intermediate and relatively stable estimates; and the MBS method shows the largest fluctuations, in which even the downward trend is ambiguous.

Aftershock Sequences of the Wenchuan and Fukuoka Earthquakes
These aftershock sequences contain small numbers of events. The results are shown in Figure 9. For the Wenchuan earthquake, since the sample size is small, MBS is unstable and the criterion for the GFT method is set to 90%. Visual inspection places the 'starting time' at index 40, which is too small to allow a long window and would lead to unstable estimation; we therefore choose the window length as a tradeoff between temporal resolution and numerical stability. If the MC algorithms work, the resulting MC value should decrease as time elapses. But as the figure shows, the usable MAXC, GFT, and MBASS methods cannot reflect the missing small events. The same phenomenon appears in the Fukuoka earthquake results.

Aftershock Sequences of the Miyagi and Landers Earthquakes
When a catalog contains a large number of events, a longer window length can be applied. The results are shown in Figure 10. For both of these earthquakes, the MBASS method gives a large MC with strong fluctuations, so for the Landers aftershock sequence we plot results only for the other three methods. In this part of the test, MC estimated by the MAXC, GFT, and MBS methods decreases to a rather stable value at an index close to that estimated by visual inspection. This means that they can detect the missing small events.

Discussion and Conclusions
In M C estimation, all methods except the MBS method can give reasonable estimates when applied to data from small sample catalogs. When N is large, the MBS method will lead to a large M C in tests based on real catalogs, but not when used on synthetic catalogs of different sample sizes. This fact reveals some kind of instability of the MBS method, but also shows that controlling bootstrap size is not a perfect way to study a method's dependence on sample size. Perhaps a systematic error is magnified in the process of resampling, by some 'bad' pattern. In synthetic tests, the GFT method overestimates M C significantly, which may result from the method used in creating synthetic catalogs. However, it is possible that the GFT method does lead to a substantially larger M C value when the sample size is large.
In terms of b-value estimation, though MC can be significantly overestimated in synthetic tests, b-values are not. This may be explained if the b-value as a function of cut-off magnitude for synthetic catalogs has a long 'plateau', so that even a large MC estimate stays within this 'plateau'. These observations reflect the over-idealization of synthetic catalogs: some 'flaw' of real catalogs is not being modeled. Thus, for b-value estimation, we suggest inspecting the plot of b-value against cut-off magnitude to get a visual impression.
Heterogeneity exists in our first selected subset. Results show that when the first subset occupies a small ratio in the mixed catalog, MC estimated by the MAXC method is significantly lower than from the other four methods, revealing the MAXC method's poor tolerance of heterogeneity. Though MC estimated by the MBASS method generally decreases, the fluctuations of its curve show that MBASS cannot reflect changes of MC when heterogeneity exists.
On the other hand, the conclusion of Mignan et al. (2011) is verified, since the separation between the results of MAXC and MBASS decreases as the ratio of the second subset increases. Results for the MBS method show even stronger fluctuation, and the decreasing trend of MC can hardly be seen. This means that MC estimation by the MBS method is untrustworthy for heterogeneous catalogs.
GFT and EMR perform better in this test; the results show a linear change of M C with increasing mixing ratio. This property enables these methods to be applied to studies of M C mapping, as has been done by the EMR method in Schorlemmer and Woessner (2008).
In the tests on aftershock sequences ranging from a few hundred to many thousands of events, the 'visual inspection method' can always be applied effectively. The MAXC, GFT, and MBS methods can detect missed small events only when the sample size is large; the MBASS method does not have this property. Since the EMR method needs the whole range of magnitudes to fit, it is not valid in this condition, where the incomplete part of the FMD is discarded when extracting aftershock sequences.
The properties of the five methods are summarized in Table 3, based on the discussion above.