Spatio-temporal clustering of earthquakes based on distribution of magnitudes

It is expected that the pronounced decrease in b-value of the Gutenberg–Richter law for some region during some time interval can be a promising precursor in forecasting earthquakes with large magnitudes, and thus we address the problem of automatically identifying such spatio-temporal change points as several clusters consisting of earthquakes whose b-values are substantially smaller than the total one. For this purpose, we propose a new method consisting of two phases: tree construction and tree separation. In the former phase, we employ one of two different declustering algorithms called single-link and correlation-metric developed in the field of seismology, while in the later phase, we employ a variant of the change-point detection algorithm, developed in the field of data mining. In the later phase, we also employ one of two different types of objective functions, i.e., the average magnitude which is inversely proportional to the b-value, and the likelihood function based on the Gutenberg–Richter law. Here note that since the magnitudes of most earthquakes are relatively small, we formulate our problem so as to produce one relatively large cluster and the other small clusters having substantially larger average magnitudes or smaller b-values. In addition, in order to characterize some properties of our proposed methods, we present a method of analyzing magnitude correlation over an earthquake network. In our empirical evaluation using earthquake catalog data covering the whole of Japan, we show that our proposed method employing the single-link strategy can produce more desirable results for our purpose in terms of the improvement of weighted sums of variances, average logarithmic likelihoods, visualization results, and magnitude correlation analyses.

and trends and inspiring us to lead to the discovery of new knowledge and insights underlying these interactions.
The clustering of earthquakes is important for many applications in seismology, including seismic activity modeling, and earthquake forecasting. Here, for the sake of explaining our research purpose more specifically, we explain two important notions in seismology, i.e., the Gutenberg-Richter law and its parameter b referred to as the b-value. The Gutenberg-Richter law expresses the empirical relationship between the earthquake magnitude m and the number N(m) of earthquakes with magnitudes equal to or larger than m in any given region and time period as N (m) = 10 a−bm (Gutenberg and Richter 1954). Here a and b are the constant parameters where the value of a denotes the seismicity level in a region and the b-value is the slope parameter of the magnitude frequency curve which commonly close to 1.0 in seismically active regions. This means that when the b-value is smaller, the slope of the magnitude log-frequency curve is gradual, i.e., earthquakes with large magnitudes occur relatively frequently, while when the b-value is larger, the slope is steep, i.e., earthquakes with large magnitudes are relatively quite rare. Especially, it is expected that the pronounced decrease in the b-value of the Gutenberg-Richter law for some region during some time interval can be a promising precursor in forecasting earthquakes with large magnitudes (Nanjo et al. 2012). Thus we address the problem of automatically identifying such spatio-temporal change points as several clusters consisting of earthquakes whose b-values are substantially smaller than the total one.
In our previous work (Yamagishi et al. 2020), for this problem, we have proposed a basic idea of the method which uniquely combines some techniques developed in two different fields, i.e., declustering algorithms (Frohlich and Davis 1990; Davis and Frohlich 1991;Baiesi and Paczuski 2004) in the field of seismology and a change-point detection algorithm (Yamagishi et al. 2014;Yamagishi and Saito 2017) in the field of data mining. In this paper, we further propose to extend this method which enables to employ one of two different types of objective functions for our clustering, i.e., the average magnitude which is inversely proportional to the b-value, and the likelihood function based on the Gutenberg-Richter law. Here note that since the magnitudes of most earthquakes are relatively small, we formulate our problem so as to produce one relatively large cluster and the other small clusters having substantially larger average magnitudes or smaller b-values. Moreover, in order to characterize some properties of our proposed method, we newly present a method of analyzing magnitude correlation over an earthquake network. In our empirical evaluation using an earthquake catalog covering the whole of Japan, it was confirmed that we could generally obtain the clustering results each of which consists of one relatively large cluster and the other small clusters having substantially different average magnitude from the total one. The paper is organized as follows. We describe related work in "Related work" section, give our problem setting and the proposed methods for clustering an earthquake catalog in "Proposed method" section, and present a method of analyzing magnitude correlation to characterize some properties of our proposed methods in "Magnitude correlation analysis" section. We report and discuss experimental results using a real catalog in "Experimental evaluation" section and conclude this paper and address the future work in "Conclusion" section.

Related work
In this paper, we propose a new method by combining declustering algorithms and a change-point detection algorithm developed in the two different fields of seismology and data mining, respectively. Thus, we describe some existing studies relating to these algorithms below.

Declustering algorithms
Seismicity declustering is the process of separating an earthquake catalog into foreshocks, mainshocks, and aftershocks, and several algorithms have been developed from various perspectives (van Stiphout et al. 2012). The window method is known as a simple way of identifying mainshocks and aftershocks. As a beginning of this method, the lengths and durations of windows were proposed by Knopoff and Gardner (1972) and Gardner and Knopoff (1974). After that, the alternative window parameter settings are proposed by Uhrhammer (1986), and comparative experiments were conducted by Molchan and Dmitrieva (1992). Meanwhile, the algorithm of Reasenberg (1985) called the cluster method assumed an interaction zone centered on each earthquake. This method based on the previous work of Savage (1972), and Molchan and Dmitrieva (1992) provide a condensed summary of the original paper of Reasenberg. As an alternative to deterministic declustering methods above, ideas of probabilistic separation appeared in the investigation of Kagan and Jackson (1991). Zhuang et al. (2002Zhuang et al. ( , 2004 and Zhuang (2006) suggested the stochastic declustering method also called stochastic reconstruction to bring such a probabilistic treatment into practice based on the epidemic-type aftershock sequence (ETAS) model (Ogata 1988a(Ogata , 1998b. The generalization of stochastic declustering by Lengliné (2008, 2010) has no specific underlying model and can accept any (additive) seismicity model. In other studies, Frohlich and Davis (1990) and Davis and Frohlich (1991) proposed the single-link cluster analysis based on a spatio-temporal metric between two earthquakes, and Hainzl et al. (2006) proposed the estimating background rate based on inter-event time distribution. Based on the inter-event times, the method by Bottiglieri et al. (2009) uses the coefficient of variation of the times, and Frohlich and Davis (1985) proposed the ration method which also exploits the inter-event times but without examining their distribution. As another cluster analysis with links, Baiesi and Paczuski (2004) proposed a simple spatio-temporal metric called correlation-metric to correlate earthquakes with each other and Zaliapin et al. (2008) further defined the rescaled distance and time.
Among these declustering algorithms, we focused on the single-link cluster analysis proposed by Frohlich and Davis (1990) and Davis and Frohlich (1991) and the correlation-metric proposed by Baiesi and Paczuski (2004). In our proposed method, we employ one of these two algorithms alternatively in the tree construction phase.

Change-point detection algorithms
Our research aim is in some sense the same, in the spirit, as the work by Kleinberg (2002) and Swan and Allan (2000). They noted a huge volume of the time-series data, tried to organize it, and extract structures behind it. This is done in a retrospective framework, i.e., assuming that there is a flood of abundant data already and there is a strong need to understand it. Kleinberg's work is motivated by the fact that the appearance of a topic in a document stream is signaled by a "burst of activity" and identifying its nested structure manifests itself as a summarization of the activities over a period of time, making it possible to analyze the underlying content much easier. Kleinberg's method used a hidden Markov model in which bursts appear naturally as state transitions, and successfully identified the hierarchical structure of e-mail messages. Swan and Allan's work is motivated by the need to organize a huge amount of information in an efficient way. They used a statistical model of feature occurrence over time based on hypothesis testing and successfully generated clusters of named entities and noun phrases that capture the information corresponding to major topics in the corpus and designed a way to nicely display the summary on the screen (Overview Timelines). We also follow the same retrospective approach, i.e., we are not predicting the future, but we are trying to understand the phenomena that happened in the past.
We are interested in detecting spatio-temporal changes in the magnitude of earthquakes. For this purpose, by defining a set of links with some declustering algorithm described earlier, we construct a spatio-temporal network (spanning tree), where the nodes correspond to the observed earthquakes. After that, in order to analyze the burst of activity in an earthquake catalog and attempt to present an overview map, we employ a variant of our proposed change-point detection algorithm (Yamagishi et al. 2014;Yamagishi and Saito 2017).

Proposed method
be a set of observed earthquakes, where x i , t i and m i stand for a location vector, time and magnitude of the observed earthquake i, respectively. Here, we treat every earthquake (event) as a single point in a spatio-temporal space, as done by representative declustering methods in van Stiphout et al. (2012). Here, we assume that these earthquakes are in order from oldest to most recent, i.e., t i < t j if i < j . In this paper, from the observed dataset D , we address the problem of automatically extracting several clusters consisting of spatio-temporally similar earthquakes whose average magnitudes are substantially different from the total one. In what follows, we describe some details of our proposed algorithm consisting of two phases: tree construction and tree separation. Below, for a given number G of clusters, we describe our main algorithm, which produces a clustering result R G .
Phase1. Construct a spanning tree T of the observed earthquakes in D by using the single-link and correlation-metric strategies described below, Phase2. Produce a clustering result R G by separating the spanning tree T based on one of variance and likelihood criteria. Figure 1 shows an illustrative example of a small tree, where we assign the time direction and the spatial distance between two earthquakes to the horizontal and vertical axes, respectively, and depicts each sample earthquake assumed to be in D as a circle so that a larger magnitude is indicated by a larger radius. From these earthquakes, Phase1 constructs a spanning tree T by producing a set of links between two earthquakes, and then Phase2 produces a set of clusters, just like a subtree N surrounded by the red dotted line, by removing a subset of links, just like a link denoted by red dotted one, as shown in Fig. 1. Namely, as shown by N in Fig. 1, our purpose is to extract clusters having substantially larger average magnitudes or smaller b-values. In fact, although our problem setting can be categorized as event clustering according to the classification by Ansari et al. (2020), the purpose of clustering is substantially different from existing representative methods to our best knowledge.

Tree construction strategies
Among several seismicity declustering algorithms, we focus on two studies, i.e., the singlelink cluster analysis proposed by Frohlich and Davis (1990) and Davis and Frohlich (1991), and the correlation-metric proposed by Baiesi and Paczuski (2004). In our experiments described later, it is shown that we obtain quite different extraction results by employing either one of these two strategies.
In the single-link strategy, with respect to two earthquakes i and j, the spatio-temporal metric d i,j is defined as It was found that a spatio-temporal scaling constant C = 1 km/day gives satisfactory results. Then, an earthquake j is regarded as the aftershock (child node) of i SL if the metric d i,j is minimized, i.e., i SL (j) = arg min 1≤i<j d i,j . Then, based on the single-link strategy, we can define a spanning tree, where the nodes correspond to the observed earthquakes, and the links are defined by In the correlation-metric strategy, with respect to two earthquakes i and j such that i < j , the spatio-temporal metric n i,j is defined as (2) Here d f is the fractal dimension set to d f = 1.6 , and b the parameter of the Gutenberg-Richter law set to b = 0.95 . Again, an earthquake j is regarded as the aftershock (child node) of i CM (j) if the metric n i,j is minimized, i.e., i CM (j) = arg min 1≤i<j n i,j . Then, based on the correlation-metric strategy, we can define a spanning tree, where the nodes correspond to the observed earthquakes, and the links are defined by

Tree separation algorithm
Let R ⊂ T be a subset of tree links constructed by either one of the single-link and correlation-metric strategies. Here note that when |R| = G − 1 , by removing all the links in R from T , we can separate the tree into G connected components. Then, the original set of observed earthquakes which correspond to nodes of the tree is also divided into G clusters . . , N } . Now, by denoting the average magnitude of cluster N g as we can derive our first objective function f 1 (R) to be minimized as follows: Namely, we employ the definition of weighted variance in this objective function.
On the other hand, with respect to observed magnitudes for each of the G clusters {N g | 1 ≤ g ≤ G} , by assuming an exponential distribution with a parameter g , we can consider the following logarithmic likelihood function: where m c denotes a cut-off magnitude defined as m c = min 1≤i≤N {m i } − m � /2 , and m stands for the bin-width of observed magnitudes, i.e., m = 0.1 in our catalog. Here we should clarify the relationship between the Gutenberg-Richter law and the above formulation, i.e., the number N(m) of earthquakes with magnitudes equal to or greater than m can be formulated by using the above exponential distribution as follows: where N = N (m c ) , a = log 10 N + m c log 10 e and b = log 10 e . Then, by substituting the maximum likelihood estimator into Eq. (5), we can derive our second objective function f 2 (R) to be minimized as follows: Namely, we employ the definition of average logarithmic likelihood in this objective function. Intuitively, we intend to produce one relatively large cluster and the other small clusters having substantially large average magnitudes from the total one. In fact, since the distribution of magnitudes in a catalog reasonably obeys the Gutenberg-Richter law (exponential distribution), i.e., the magnitudes of most earthquakes are relatively small, it is naturally expected that we can improve the objective function by separating clusters of spatio-temporally similar earthquakes with relatively large magnitudes. Here note that this objective function can be interpreted as a weighted sum of variances.
In order to compute the resultant set of separation links R , we employ a variant of our proposed change-point detection algorithm (Yamagishi et al. 2014;Yamagishi and Saito 2017). Namely, from the observed dataset D , the tree T constructed by either the single-link and correlation-metric strategies, and a given the number of clusters G, our algorithm computes R as follows: Step1. Initialize g ← 1 and R 0 ← ∅.
Step2. Compute e g ← arg min Step3. Set g ← g + 1 and then return to Step2 if g < G − 1 ; otherwise set g ← 1 and and then h ← 0 if e ′ g � = e g ; otherwise set h ← h + 1, Step5. Output R G−1 and then terminate if h = G − 1 ; otherwise set g ← (g mod (G − 1)) + 1 and then return to Step4.
More specifically, after initializing the variables in Step1, we compute the optimal g-th link in e g by fixing the already selected set of (g − 1) links in R g−1 and add it to R g−1 as shown in Step2. We repeat this procedure from g = 1 to G − 1 as shown in Step3. After that, we start with the solution obtained as R G−1 , pick up a link e g from the already selected links, fix the rest R G−1 \ {e g } and compute the better link e ′ g of e g as shown in Step4, where · \ · represents set difference. We repeat this from g = 1 to G − 1 . If no replacement is possible for all g, i.e. e ′ g = e g for all g ∈ {1, . . . , G − 1} , then no better solution is expected and the iteration stops, as shown in Step5. Here, it is not guaranteed that the above algorithm theoretically produces the optimal result, but it is confirmed that the algorithm always computes the optimal or near-optimal solutions in our empirical evaluation (Yamagishi et al. 2014;Yamagishi and Saito 2017).

Magnitude correlation analysis
In order to extract clusters consisting of earthquakes with relatively large magnitudes, by using either one of the single-link and correlation-metric strategies, we want to construct a tree where such earthquakes are connected with each other. To examine the type of properties, we present a method of analyzing magnitude correlation over a network of these observed earthquakes. Here note that we can regard this method as an assortative mixing analysis (Newman 2003) over networks in terms of magnitudes. Let G = (V, E) be a network obtained from the observed earthquake dataset D , where V = {1, . . . , N } , and we assign the direction of link (i, j) ∈ E according to the time direction, i.e., t i < t j . Then, for nodes j ∈ V and i ∈ V , we define the sets of in-neighbor and out-neighbor nodes as IN (j) = {i ∈ V | (i, j) ∈ E} and ON (i) = {j ∈ V | (i, j) ∈ E} , respectively. Let V(m) ∈ V be a set of nodes with magnitude m, and then, for each observed magnitude m, we can compute the following out-magnitude and in-magnitude correlation functions where recall these observed magnitudes are quantized with the bin-width m = 0.1 . Actually, when earthquakes with larger magnitudes have a tendency to be connected with larger ones, we can expect that both C in (m) and C out (m) become increasing functions with respect to m.

Experimental evaluation
By using an earthquake catalog which contains source parameters determined by Japan Meteorological Agency 1 in the whole of Japan, we generated two original datasets. Namely, by setting the minimum magnitude m min = min 1≤i≤N {m i } and the maximum depth as m min = 3.0 and 100 km, respectively, we selected N = 104, 343 earthquakes during the period from Oct. 01, 1997 to Dec. 31, 2016 as dataset A, while by setting m min = 4.0 and 100 km, we selected N = 27, 728 earthquakes during the period from Oct. 01, 1977 to Dec. 31, 2016 as dataset B.

Quantitative evaluation
First, we evaluate the performance of the proposed method employing our different tree construction strategies, i.e., single-link and correlation-metric. Figure 2 shows the experimental results of the datasets A and B, using the two objective functions, weighted variance f 1 (R) and average logarithmic likelihood f 2 (R) , which are depicted in pairs of Fig. 2a, b and c, d, respectively, where the horizontal and vertical axes stand for the number of clusters varied from G = 1 to 8 and the objective function value defined in Eq. (4) or (8). Note that for each of Fig. 2a, b (or Fig. 2c, d) the value at G = 1 is nothing more than the total variance (or likelihood) of each dataset. From these experimental results, we can see that in the case of employing the single-link strategy, the objective function values defined by the weighted sums of variances become much smaller and those defined by the average logarithmic likelihood become much larger, in comparison to those of employing the correlation-metric strategy. This suggests that the proposed method employing the single-link strategy can produce more desirable results for our purpose of producing one relatively large cluster and the other small clusters having substantially larger average magnitudes or smaller b-values.
Next, we evaluate the similarity of the results with different numbers of clusters obtained by the proposed method. In what follows, we only show our experimental results using the dataset A, but reasonably similar results have been obtained for the dataset B. For this purpose, we employ the Adjusted Rand index (Rand 1971) for evaluating two different clustering results denoted by G h | denotes the number of objects in common between N g and N ′ h , and let cg g = H h=1 c g,h and ch h = G g=1 c g,h denote the sums of c g,h . Then, we can compute the original Adjusted Rand Index using the permutation model ARI(G, H) as Figure 3 shows the similarity matrices consisting of the Adjusted Rand Indices by varying the number of clusters from G = 2 to 8, where Fig. 3a, b are those of the proposed method employing the single-link and correlation-metric strategies based on the weighted variance function f 1 (R) , and Fig. 3c, d are those based on the average logarithmic likelihood function f 2 (R) . From these experimental results, we can see that in the (11)

Fig. 2 Results of performance evaluation
case of employing the single-link strategy, there exist three types of similar results, i.e., 2 ≤ G ≤ 4 , 5 ≤ G ≤ 7 and G = 8 as for the weighted variance function, and 2 ≤ G ≤ 5 , 6 ≤ G ≤ 7 and G = 8 as for the average logarithmic likelihood function, but almost a single type of similar results except for G = 2 in the case of employing the correlationmetric strategy, regardless of the types of objective functions. Namely, we can expect to obtain several types of results by varying G in the case of the single-link strategy. On the other hand, we can see that in the case of employing the weighted variance function, the ranges of the Adjusted Rand Indices were much wider than those employing the average logarithmic likelihood function.

Visual evaluation
First, we visually evaluate the obtained results employing the different tree construction strategies by focusing on the dataset A. Here, we characterize each cluster by using the corresponding b-value Recall that m c = m min − m � /2 , where the minimum magnitude m min and the bin-width of observed magnitude m in the dataset A were set to m min = 3.0 and m = 0.1 . Here note that the average magnitude in dataset A is around 3.50, and the corresponding b-value is approximately amounts to 0.79. Figure 4 shows our visualization results based on the weighted variance function f 1 (R) , where the numbers of clusters are 4, 7, and 8 (12) b g = log 10 e µ g − m c .

Fig. 3 Results of similarity evaluation
for the single-link strategy, and 8 for the correlation-metric strategy. Here these results are selected according to the similarity matrices shown in Fig. 3a, b. Also note that by selecting earthquakes whose magnitudes are greater than or equal to 5.0, we plotted each of them as a triangle with a color shown in Fig. 4 according to its cluster's corresponding b-value. Similarly, Fig. 5 shows our visualization results based on the average logarithmic likelihood function f 2 (R) , where the numbers of clusters are 5, 7, and 8 for the single-link strategy, and 8 for the correlation-metric strategy, Here these results are also selected according to the similarity matrices shown in Fig. 3c, d. From these results, as expected, we could generally obtain the clustering results each of which consists of one or two relatively large clusters and the other small clusters having substantially different b-values from the total one. More specifically, as for the comparison between the two different strategies, single-link and correlation-metric, shown in Fig. 4c, d (or Fig. 5c, d), respectively, by employing the former strategy, we could obtain clearly visible clusters having substantially large average magnitudes around the region where the 2011 Tohoku earthquake with m max = max 1≤i≤N {m i } = 9.0 (indicated by red cross in the figures), the largest magnitude in Japan, occurred. As for the comparison between the two different objective functions, weighted variance and average likelihood, shown in Figs. 4c and 5c (or Figs. 4d and 5d), respectively, by employing the average logarithmic likelihood function, the ranges of the b-values were somehow wider than those employing the weighted variance function. On the other hand, as for comparison among the different numbers of clusters in case of employing the single-link  Fig. 5a-c), we could obtain somehow different types of clustering results, which might help to analyze the dataset from multiple viewpoints. In short, in our empirical evaluation, we can confirm that the proposed method employing the single-link strategy can produce more desirable results for our purpose of producing one relatively large cluster and the other small clusters having substantially larger average magnitudes or smaller b-values.
Next, under the setting that the number of clusters is 8 ( G = 8 ), by focusing on the clusters which contain the 2011 Tohoku earthquake with m max = 9.0 , we compare the individual clusters obtained by our methods employing different strategies for tree construction based on different objective functions for tree separation. Figure 6 shows our visualization results, where the results obtained by the single-link and correlation-metric strategies based on the weighted variance function f 1 (R) are shown in Fig. 6a, b, and those based on the average logarithmic likelihood function f 2 (R) are shown in Fig. 6c, d. In our visualization, we refer to the earthquakes that occurred before the 2011 Tohoku earthquake and those after it as foreshocks and aftershocks respectively, and depict foreshocks and aftershocks by blue and orange circles respectively, and the 2011 Tohoku earthquake by a red cross. 2 From these experimental results, we can clearly see that as for the results obtained by the single-link strategy, the clusters contain both foreshocks and aftershocks that occurred within a relatively narrow region, as shown in Fig. 6a, c. Conversely, as for the results obtained by the correlation-metric strategy, the clusters contain only aftershocks that occurred in a quite wide region, as shown in Fig. 6b, d. Moreover, we can see that the pairs of results obtained by the same tree construction strategy are quite similar, regardless of the types of objective functions, f 1 (R) and f 2 (R). Figure 7 shows the estimated b-values over time with respect to the clusters shown in Fig. 6, where the results obtained by the single-link and correlation-metric strategies based on the weighted variance function are also shown in Fig. 7a, b, and those based on the average logarithmic likelihood function are shown in Fig. 7c, d. More specifically, we depict the b-values estimated within each cluster over time by blue lines, which are calculated from moving time windows each containing 100 earthquakes centered on the target earthquake based on Eq. (12), and depict the 90 % confidence intervals (Shi and Bolt 1982) by gray areas. Here we indicated the occurrence time of the 2011 Tohoku earthquake by a vertical red dotted line. From these experimental results, in the case of the single-link strategy, we can observe the pronounced decrease in b-value before the occurrence time of the 2011 Tohoku earthquake as shown in Fig. 6a, c. On the other hand, in case of the correlation-metric strategy, we cannot observe such a decrease in b-value because only an extremely short foreshock period is contained which is compared to the total period of the cluster, as shown in Fig. 6b, d. Again, we can see that the proposed method employing the single-link strategy can produce more desirable results Here recall that our research objective is also illustrated in Fig. 1, and we can confirm that the visualization results by the proposed method employing the single-link strategy are closer to that in Fig. 1 than those by the proposed method employing the correlationmetric strategy. In other words, as shown in the visualization results by the proposed method employing the single-link strategy in Fig. 6, if an earthquake of extremely large magnitude, such as the 2011 Tohoku earthquake, occurred and also many earthquakes of relatively large magnitudes occurred spatio-temporally close, the set of these earthquakes is the typical cluster such as the N in Fig. 1 which we aim to extract.

Magnitude correlation analysis
From our experimental results described above, we can see that the single-link strategy produces more desirable results than those by the correlation-metric strategy, regardless of the types of objective functions for tree separation. In order to clearly reveal this reason, we revisit the original trees constructed by the two strategies, and compare their properties in terms of the magnitude correlation analysis proposed in "Magnitude correlation analysis" section. Namely, we examine the properties that earthquakes with larger magnitudes have a tendency to connect each other.  Fig. 8a, b and c, d are those of employing the single-link and correlation-metric strategies, respectively. From these experimental results, we can observe that in the case of employing the correlation-metric strategy, the in-magnitude correlation values are much larger than the other values. This observation can be naturally explained by the fact that as for each link (i CM (j), j) obtained by the correlation-metric strategy, each earthquake i CM (j) with a quite large magnitude is likely to be selected according to the correlationmetric measure defined in Eq. (2). As another characteristic, we can observe that in the case of employing the single-link strategy, the in-and out-magnitude correlation values slightly increase as m becomes large, i.e., earthquakes with larger magnitudes have a tendency to connect each other. This also indicates that the single-link strategy has a desirable property for our purpose of producing one relatively large cluster and the other small clusters having substantially larger average magnitudes or smaller b-values.

Conclusion
In this paper, for a given dataset of observed earthquakes, we addressed the problem of automatically extracting several clusters consisting of spatio-temporally similar earthquakes whose magnitudes are substantially larger average magnitudes or smaller b-values in comparison to the total ones. Especially, we intended to produce one relatively large cluster and the other small clusters having substantially different average magnitudes from the total one. For this purpose, we proposed a new method consisting of two phases. In the former tree construction phase, we employ one of two different declustering algorithms called single-link and correlation-metric developed in the field of seismology, while in the later tree separation phase, we employ a variant of the change detection algorithm, developed in the field of data mining, based on one of two different types of objective functions, i.e., the average magnitude which is inversely proportional to the b-value, and the likelihood function based on the Gutenberg-Richter law. In our empirical evaluation using earthquake catalog data covering the whole of Japan, it was confirmed that we could generally obtain the clustering results each of which consists of one relatively large cluster and the other small clusters having substantially different average magnitude from the total one. Moreover, we showed that the proposed method employing the single-link strategy can produce more desirable results, in terms of the improvement of weighted sums of variances, average logarithmic likelihoods, visualization results, and magnitude correlation analyses. As a future task, we plan to conduct more experiments to see that our clustering method can provide new findings on the earthquake statistics, the underlying earthquake dynamics, and so on, by producing one relatively large cluster and the other small clusters having substantially different average magnitudes from the total one. Further theoretical studies to find the optimal number of clusters are also future works.