Evaluation of Pre-Earthquake Anomalies of Borehole Strain Network by Using Receiver Operating Characteristic Curve

In order to monitor temporal and spatial crustal activities associated with earthquakes, groundand satellite-based monitoring systems have been installed in China since the 1990s. In recent years, the correlation between monitoring strain anomalies and local major earthquakes has been verified. In this study, we further evaluate the possibility of strain anomalies containing earthquake precursors by using Receiver Operating Characteristic (ROC) prediction. First, strain network anomalies were extracted in the borehole strain data recorded in Western China during 2010–2017. Then, we proposed a new prediction strategy characterized by the number of network anomalies in an anomaly window, Nano, and the length of alarm window, Talm. We assumed that clusters of network anomalies indicate a probability increase of an impending earthquake, and consequently, the alarm window would be the duration during which a possible earthquake would occur. The Area Under the ROC Curve (AUC) between true predicted rate, tpr, and false alarm rate, f pr, is measured to evaluate the efficiency of the prediction strategies. We found that the optimal strategy of short-term forecasts was established by setting the number of anomalies greater than 7 within 14 days and the alarm window at one day. The results further show the prediction strategy performs significantly better when there are frequent enhanced network anomalies prior to the larger earthquakes surrounding the strain network region. The ROC detection indicates that strain data possibly contain the precursory information associated with major earthquakes and highlights the potential for short-term earthquake forecasting.


Introduction
It has been observed that the preparatory process and the occurrence of shallow earthquakes are usually accompanied by crustal deformation. Precise observations of surface displacement, using data from GNSS satellites, strainmeters and seismographs all over the world, allow the investigation of complex tectonic structures [1][2][3] and the prediction of short-and medium-term earthquakes [4,5]. However, earthquakes have complicated changing spatio-temporal distribution characteristics, which makes the evaluation of earthquake-related deformation data highly challenging. In recent years, various studies have attempted to understand the interactions and relationships between strain data and earthquakes [6][7][8][9], including those related to the 2008 Wenchuan earthquake in China [10,11] and the 2011 Tohoku earthquake in Japan [12][13][14]. Deformation data represented by borehole strain measurements are also considered to contribute to the estimation of crustal changes. These measurements also play an important role in analysing geodynamic processes, such as those related to volcanic activity [15], slow earthquakes [16,17], and moreover precursory activity [6,[18][19][20].
Although borehole strain have been considered to record some pre-earthquake anomalies [8,9,21], whether these anomalies contain precursory information and their use in forecasting of major earthquakes were not evaluated. Receiver Operating Characteristic (ROC) analysis has previously been employed for short-term forecasting to explore whether earthquake precursor information can be obtained from geophysical observations of precursor anomalies [22,23]. Prediction strategies are based on an alarm model that detects and classifies the extracted anomalies extracted and targets earthquake events, thereby evaluating whether measurements contain precursory information for large earthquakes. Recently, various geophysical measurements have indicated the potential for earthquake forecasting [24][25][26], through GPS data [27] and electromagnetic measurements [28][29][30][31]. However, current alarm models only focus on a single anomaly and underestimate the clustering of earthquake anomalies [32,33], which may increase the risk of inaccurate results from short-term forecasting.
In this study, we employ complex networks to jointly investigate borehole strain observations from multiple stations, using seven YRY-4 borehole strainmeter measurements in Western China from 2010 to 2017. We then propose a new prediction strategy to test enhanced networks clustering and earthquake occurrences using ROC analysis and assess its significance related to earthquakes. Furthermore, we discuss the efficiency of the strategy under the influence of different factors. Also, by comparing with GNSS geodetic data, we found that the prediction strategy performs more significantly when there are frequent enhanced network anomalies prior to the earthquakes.

Borehole Strain Observations
More than 40 YRY-4 borehole strainmeters have been installed in China with the aim of coordinating with satellite-based measurement systems and monitoring surface deformation associated with earthquakes. Surrounding the north and south sides of active fault zones, such as the Longmenshan fault zone and Haiyuan fault zone, we selected the following monitoring sights: YC, HY, LX, GZ, ZT, YS, and TC, all of which yield fairly good data with long-term continuity. Table 1 lists the detailed information of the seven stations, and Figure 1 shows their locations and the regional tectonic settings. YRY-4 borehole strainmeters sample every minute with a resolution of nearly onebillionth strain changes. The strainmeters can test the quality and reliability of the observations through a self-consistency function, because of their four gauges arranged at 45 • intervals. Consequently, the arrangement produces four components [21]. In this study, the strain data were collected from 1 January 2010 to 10 August 2017, and we applied the areal strain, S a , which is transformed by half of the sum of the four observed components and describes the subsurface strain state [21].

Studied Earthquakes
In this study, we employed the E s parameter, which considers the magnitude of the earthquake and the epicentral distance to identify earthquake events. The E s index is the daily sum of the local earthquake energy, whereas E s is the local energy of an earthquake in the borehole strain network defined by the following equation and the local energy of a single station is shown by the multiplier in the root sign [28,34]: Here, m and r i are the magnitude of the earthquake and the epicentral distance of each site, respectively, and M is the number of sites. We selected the earthquakes in the area 24 • -39 • N, 98 • -106 • E during 1 January 2010 to 10 August 2017. It should be noted that all the events occurred within the study area during the study period are shallow earthquakes (less than 60 km, mostly less than 30 km), and the earthquakes with magnitude greater than 1.0 were taken into account. To avoid repeated detection of the same process, we have declustered the earthquake catalog by the traditional Reasenberg declustering algorithm, obtained from the Zmap toolbox [35,36]. The completeness of the declustering earthquake catalog is around 1.5, and there are 162 earthquakes of Ms ≥ 4.0, 30 earthquakes of Ms ≥ 5.0, 9 earthquakes of Ms ≥ 5.7 and 2 earthquakes of Ms ≥ 7.0. The local seismicity and the daily E s are shown in Figure 2. First, we considered that an earthquake occurred if the E s index on a given day exceeded 10 7 J/km 2 . There are 26 target earthquakes, and their detailed information is presented in Table 2. If 10 7 is converted to magnitude, when the average epicentral distance in the network is about 440 km, generally an earthquake should be greater than Ms5.0. We only listed the type of fault of an earthquake when the seismic moment tensors are available.

Tectonic Settings in Brief
The Longmenshan, the Xianshuihe and the Anninghe fault reflect the structural characteristics of the compression of the Yangtze block by the Qinghai Tibet Plateau, as shown in Figure 1. After a long period of geological development, these three fault zones have become the primary zonal faults of Bayan Har Block, Sichuan Basin and Sichuan-Yunnan Block. The total regional stress in this area comes from the compression of the Qinghai Tibet Plateau in the northwest direction [37]. However, the fault structure in this area is diversified, because of the complex geological environments. The Longmenshan fault, which shows a thrust motion from south to north, can be divided into three main faults: the Maoxian-Wenchuan fault (southern segment), the Beichuan-Yingxiu fault (central segment) and the Anxian-Guanxian fault (northern segment) [38]. These three faults are imbricate and all dip northwestward. At the depth of 20-24 km, they converge and merge into a gently inclined reverse fault, which becomes the dominant structure of the Qinghai Tibet Plateau napping over the Sichuan Basin [39,40]. The southernmost end of Xianshuihe fault is the approximately north-south striking Anninghe fault, also known as the Xianshuihe-Anninghe fault zone; it is a large left-lateral strike-slip shear zone [39,41]. The Haiyuan fault is an important active fault zone in the northeastern border of the Qinghai Tibet Plateau, which separates the Bayan Har Block of Qinghai Tibet Plateau from the relatively stable Ordos plate. Its present movement mode is mainly left-lateral strike-slip with thrust component [42].

Extracting Network Anomalies of Borehole Strain
Complex networks can be used as an abstraction of strain monitoring systems to express concealed interactions or relationships between spatial and temporal characteristics [43][44][45][46][47][48][49]. we defined a borehole strain network by a graph consisting of M nodes and a set of edges between nodes. M nodes are on behalf of M borehole stations, while edges are defined by significant relationship between two nodes [50]. An adjacency matrix is applied to describe the graph; A = a ij , i, j ∈ V. The correlation between the sequences of length n from two nodes, P = P i and Q = Q j , i, j ∈ V, is defined as Here,p andq are the means of P and Q, respectively, and P and Q are the strain components of each observation site [50]. The adjacency matrix A = a ij , i, j ∈ V acquires the components that were assigned as "one" if the absolute value of R PQ was greater than 0.8, and as "zero" if R PQ was less than 0.8, indicating whether nodes i and j were connected or not connected. To monitor the changes in the borehole strain network connection, the network degree is used based on the adjacency matrix A. The degree k i of a node is defined as the total number of edges connected to that node in a borehole strain network: Then, the network degree is calculated as the mean value of the k i , (i = 1, . . . , M) in the network, which is expressed as follows: The greater the network degree, the stronger the connection between networks; otherwise, the nodes are considered to be weakly connected, indicating a disordered network. Figure 3 shows an example of the strain network result of 2017 Ms7.0 Jiuzhaigou earthquake, Sichuan. The green lines represent the connections between two stations. It is a fully connected network of that day. The red star shows the epicenter of the earthquake. The daily network degree together with the seismicity and the daily Es are shown in Figure 2 for the complete period of 2010-2017.

Evaluation of Strain Network Anomalies and Earthquake Occurrence
Due to an increase in the spatiotemporal coherence during earthquake precursory activity [31,50,51], we assumed that the network would be highly connected during such activities. In order to explore the correlations between strain network connection and earthquakes, we proposed a prediction model to test their relationship through a Receiver Operating Characteristic (ROC) curve. The prediction strategy is shown in Figure 4. We analysed how many anomalies of the strain network degree, N ano , are in an anomaly window, T ano , where the first day and last day of an anomaly window are required to be anomalous. An alarm window, T alm , is described as the period in which an impending earthquake would occur. First, we select a threshold to define the anomalies in the strain network degrees. An anomaly is detected when the network degree exceeds the threshold. Next, if the anomalies (N ano ) in the alarm window (T ano ) account for more than half (as shown in Figure 4, the length of the anomaly window is 6 and the number of anomalies exceeds 3), we assume that a target earthquake with E s ≥ 10 7 will occur in the following alarm window, T alm . A target earthquake is labelled as predicted if it falls in the alarm window, or as missed if it falls outside the window. The selected threshold is set to vary from the maximum to the minimum network degree. Consequently, the number of anomalies changes from 0 to the length of the sequence. As a result, tpr and f pr change from 0.0 to 1.0. Thus, the confusion matrix shown in Figure 4b can be used to estimate the true positive rates, tpr, and false positive rates, f pr. We set tpr = TP/(TP + FN) and f pr = FP/(FP + TN) [31,52]. Figure 5 shows an ROC detection result with tpr against f pr. For a given alarm rate, tpr, the probability of n 1 predictions in n target earthquakes is subject to a binomial distribution [28]: As a result, the diagonal line in the ROC diagram indicates a prediction by random guessing [24,53,54], and the confidence interval of the true positive rate at each alarm rate can be computed. Any prediction above the diagonal line shows that the number of predicted earthquakes is better than that of false alarm days. Next, the Area Under the Curve (AUC) is introduced to evaluate the efficiency of the prediction. The expected area of random prediction is 0.5, where an area greater than 0.5 indicates that the strain network anomalies may contain precursory earthquake information. The result in Figure 5 shows that the prediction for N ano = 7 and T alm = 1 days is obviously better than random guess with AUC = 0.81.

Results
To reduce the contingency of ROC detection, we performed statistical studies based on Superposed Epoch Analysis (SEA) of the strain network during 2010-2017 preferentially. SEA is a statistical method to highlight typical features of a special event by reducing random noise by superposition [55][56][57]. First, if there is a network anomaly, we count one for that corresponding day for the whole dataset. Here, we set the threshold of the network degree to 4. Then, for each target earthquake, we extracted a data set of 45 days before and 30 days after its occurrence day. We repeated this procedure for the 26 target earthquakes, and then superposed the counts for all data sets. We performed the SEA results of 1-day count and 5-day counts as shown in Figure 6, respectively. To evaluate statistical significance, we randomly selected 26 days instead of studied earthquakes in the whole dataset, and conducted the same procedure. We repeated such random SEA tests 10,000 times to calculate the random_mean and the random standard deviation (σ) [54,57].
Generally, if the earthquake events and the network anomalies are not correlated, the count distribution probably is random. In other words, if the 1-day or 5-day counts exceed the random_mean+1.5σ level, it is indicative of a statistically significant correlation between the earthquakes and the network anomalies. In Figure 6, the most anomalies occurred on the earthquake day inferring that coseismic anomalies were recorded by each station. Before an earthquake, there are clearly higher probabilities of network anomalies than after. We found, about 20 days before, the results of 1-day counts are mostly significant. The results of 5-day counts are also clearly significant during the periods 1-20 days before the earthquake. We backtracked Figure 5, and found when f pr was surrounding 0.1-0.2, tpr was more significant. The corresponding threshold is 3-5 at that time, which is consistent with that in the above SEA analysis results. These results highly suggested the correlation between the local earthquakes (E s ≥ 10 7 ) and strain network anomalies. Figure 6. The Superposed Epoch Analysis (SEA) results of strain network anomalies. The blue, the black, and the red lines indicate the counts of 1-day count, random_mean, and random_mean+1.5σ, respectively. The grey bars and the pink line demonstrate 5-day counts and corresponding random_mean+1.5σ. The "0 day" indicates the day when E s ≥ 10 7 .
Moreover, we performed a comparative analysis with shuffled target earthquakes [58]. The SEA analysis and the ROC detection results are shown in Figure 7. The counts of network anomalies corresponding to the shuffled target earthquakes are random and not significant. And the ROC result indicates that the network anomalies do not contain precursory information, with AUC being 0.42. Therefore, the above results further suggest a possible relationship between the network anomalies and earthquakes (E s ≥ 10 7 ).
Next, their correlation was further discussed using the ROC prediction. Regarding the prediction model, there are two variables related to the efficiency of detection: the number of network anomalies in an anomaly window, N ano , and the length of the alarm window, T alm . Thus, we first repeated the detections for the earthquakes (E s ≥ 10 7 ) with different variable lengths. N ano and T alm were varied from 1 to 30 days as shown in Figure 8a.  We found that an optimal efficiency of predictability exists for N ano and T alm , as shown in Figure 8a. When there were within 20 anomalies in the anomaly window and the alarm window length was 1-25, the AUC was greater than 0.5, indicating the effectiveness of the prediction strategy. Next, we extracted the row and column of the maximum AUC (0.81), i.e., T alm = 1 when N ano = 7, and a detailed optimal area as shown in Figure 8b. The AUC value exceeded the upper boundary of the 90% confidence interval when N ano was between 1 and 15 and T alm was 1-12. Especially in Figure 5, the prediction of strain network is clearly better than the random prediction when N ano = 7 and T alm = 1 days. These results further suggested the statistical efficiency of short-term forecasts and indicate that the network tends to be highly connected prior to the earthquakes (E s ≥ 10 7 ). Moreover, with regard to the prediction model in Figure 4, three other factors should be discussed: different E s thresholds, different proportions of the anomalies in an anomaly window, and local earthquake region. In the following, we further investigate the impact of these three factors on ROC prediction efficiency. First, target earthquakes with different E s values were selected, consisting of 8, 26, 75, and 184 earthquakes with E s thresholds of 10 8 , 10 7 , 10 6 , and 10 5 , respectively. With fixed values of N ano = 7 and T alm = 1, we performed ROC analysis for these different target earthquakes as shown in Figure 9. The detection efficiency increases when the threshold of E s was 10 7 and 10 8 , with AUC values of 0.81 and 0.76, respectively. However, the ROC curves are not significant when the threshold was 10 5 and 10 6 . This illustrates the higher efficiency of ROC analysis for the earthquakes with larger energy. Second, we tested the influence of different proportions of anomalies in an anomaly window, as shown in Figure 10. This test also shows the ROC curves with fixed values of N ano = 7 and T alm = 1. We found that, when there were more than half (2N) or a third (3N) of network degree anomalies in an anomaly window, the detection efficiency was more significant and stable. When there were few anomalies in an anomaly window, the detection efficiency was low and especially when T ano = 8N ano , the result seems to be accidental and irregular. This result shows that clusters of anomalies indicate an increased probability of an earthquake, which is also consistent with the physical assumption [32]. Third, we selected the local seismicity for other two larger region, that is, 23 • -40 • N, 97 • -107 • E, and 22 • -41 • N, 96 • -108 • E, respectively. The condition of this ROC prediction is that E s ≥ 10 7 , N ano ≥ T ano /2, N ano = 7 and T alm = 1. The results are shown in Figure 11. The AUC values decrease as the detection range increases, which also indicates the network anomalies are related with local earthquakes. Earthquake selection within a larger region may bring irrelevant events, thereby increasing the missing rate. Thus, ROC prediction for earthquakes surrounding the network may be a good choice. In summary, we consider that clusters of highly connected network anomalies may contain precursory information related to local earthquakes (E s ≥ 10 7 ) surrounding the strain network.

Discussion
By using the ROC prediction, we proved that the pre-earthquake strain-network anomalies are significant, which consequently suggests a possible correlation between the borehole strain data recorded in Western China the and local earthquakes (E s ≥ 10 7 ). This research supports the understanding of pre-earthquake processes, and the fault network can be regard as a big seismogenic system in Mainland China [59]. In certain subsystems, multiple 'potential sources' appear because of fault activities and fault blocks, crustal movement and stress accumulation. The space-time evolution of the interaction between 'potential sources' and subsystems is considered as a self-organizing process. Regardless of the expression of the measurements, the dimensionality and disorder will decrease before reaching a new steady state [59]. Consequently, earthquake precursors are the dynamic behavior in the self-organizing evolution of the seismogenic system. With regard to borehole strain data, network anomalies before the target earthquakes probably indicate that the steady state started to weaken due to the crustal movements. Meanwhile, a new steady state is established through the recovery of the network after an earthquake. If we think of this self-organization process as the entropy reduction of a seismic system, then the unknown external energy that flows in may be related to impending earthquakes [50,60,61].
The crustal deformation before earthquakes is an important factor in integrated seismological research. To examine whether borehole strain measurements are useful crustal information before earthquakes, statistical tests were performed for the aligned GNSSazimuths related to earthquakes with magnitudes greater than 5 during 2012-2015 in similar areas [24,62]. The ROC diagram of the optimal prediction strategy shows the prediction is obviously better than the random prediction, indicating the correlation between aligned GNSS-azimuths and earthquakes. Precursor patterns recorded several days before major earthquakes in the seismicity have also been found through the measurement of other remote sensing data, which may be helpful for earthquake prediction [63][64][65]. Sarlis et al. [64] used the natural time analysis to study the electric signal activity and found that changes of the order parameter appeared minima a few days before a mainshock. [64]. The statistical significance of this study has recently been verified through the ROC technique by focusing on the AUC [31] and event coincidence analysis [66].
From the 29 target earthquakes (Table 2), the 2013 Lushan Ms7.0 earthquake, 2014 Ms6.6 Ludian earthquake, and 2017 Ms7.0 Jiuzhaigou earthquake were the most remarkable. During the Lushan earthquake, the deformation signal was extracted by analysing the change of the GNSS geometry net-form, integrating the location, baseline length, and direction information of all GNSS stations in the seismic source area [67]. A significant pre-seismic deformation anomaly with locked status and deceleration in the first shear strain and direction were found in the GNSS time series, implying the occurrence of strong sinistral shear tectonic forces. Before the Ludian earthquake, the temporal evolution of the GNSS-derived orientation exhibited a unique disorder-alignment-disorder sequence that corresponded well with the four stages of an earthquake preparation [68]. This region of stress accumulation was generally consistent with the earthquake preparation zones estimated through numerical models [62]. And before the Jiuzhaigou earthquake, GNSS data were usually used to study the long-term strain accumulation of 3-8 years [2,69]. With regard to short-term anomalies, the seismo-conductivity anomalies could be observed by the geomagnetic data with epicentral distance from 63 to 385 km approximately 17 days before the earthquake, which matched the results of our predictive model. These anomalies are considered to be created by earthquake-related stress that accumulates in the crust [70]. Meanwhile, the rest target earthquakes with Ms ≥ 5 were also occasionally accompanied by multiple types of pre-earthquake anomalies [71,72]. The pre-earthquake anomalies of seismo-crustal deformation are also proved to contain precursory information in New Zealand, California, Japan and Taiwan [27,28,73].
These results suggest that surface deformation provides useful crustal displacement information before relatively large earthquakes. However, it is still far from the practical application for short-term earthquake forecasts. Since mechanisms and preparation phases of earthquake events and the time lags between anomalies and earthquakes generally are different from case to case, we were still unable to deal with accurate prediction [53]. Moreover, actual earthquake processes are more complicated than this prediction model. In future studies, we will improve the model by including additional factors combined with physical backgrounds and utilise more types of data, such as a combination of satellite and ground-based measurements.

Conclusions
Strainmeters provide precise observation of surface deformation that permits to capture strain anomalies related to earthquakes. However, whether these anomalies contain precursory information and their use for forecasting earthquakes were not evaluated before. In this study, we used ROC prediction to study the borehole strain data observed in Western China from 2010-2017. Initially, daily connections between multiple stations of borehole strain network were detected. The correlation between the network anomalies and local earthquakes (E s ≥ 10 7 ) was verified by Superposed Epoch Approach. Then, we proposed a new prediction strategy based on an assumption that clusters of network anomalies indicate an increased probability of an earthquake and discussed how it can be used in short-term forecasting. We further demonstrated the influences of N ano and T alm on short-term earthquake forecasting using AUC values. Furthermore, on comparing with GNSS geodetic data, the ROC prediction results indicate that the detection of borehole strain networks has a higher significance for forecasting. We found the optimal strategy for short-term forecast: that is to evaluate anomalies greater than 7 within 14 days and to set the alarm window at 1 day. After discussing three other influences on the prediction strategy, i.e., different E s thresholds, different proportions of anomalies in an anomaly window, and local earthquake selection, we consider that clusters of enhanced network connections may contain precursory information related to the earthquakes (E s ≥ 10 7 ) surrounding the strain network. The methodology proposed in this study could help evaluate prediction strategies and investigate the possibility of using different measurements for short-term earthquake forecasting.