Next-day largest earthquake magnitude forecasting with the aid of Moon tidal force and sunspot data

Seismicity is a complex phenomenon with a multitude of components involved. In order to perform forecasting, which has yet to be done sufficiently well, it is paramount to be in possession of information of all these components, and use this information effectively in a prediction model. In the literature, the influence of the Sun and the Moon in seismic activity on Earth has been discussed numerous times. In this paper we contribute to such discussion, giving continuity to a previous work. Most importantly, we instrument four earthquake catalogs from different regions, calculating the Moon tidal force at the region and time of each earthquake, which allows us to analyze the relation between the tidal forces and the earthquake magnitudes. At first, we find that the dynamical system governing Moon motion is unidirectionally coupled with seismic activity, indicating that the position of the Moon drives, to some extent, the earthquake generating process. Furthermore, we present an analysis that demonstrates a clear positive correlation between tidal force and earthquake magnitude. Finally, it is shown that the use of Moon tidal force data and sunspot number data can be used to improve next-day maximum magnitude forecasting, with the highest accuracy being achieved when using both kinds of data. We hope that our results encourage researchers to include data from Moon tidal forces and Sun activity in their earthquake forecasting models.


Introduction
A glimpse into the literature shows that researchers have been investigating earthquake forecasting for decades already, but for most practical purposes we are still unable to forecast large events.Despite that, progress has been achieved in small steps in each subfield of earthquake forecasting.In this paper, we focus on next-day forecasting using, as main input data, earthquake catalogs that provide magnitude, location and depth of each earthquake.Such data is readily accessible for numerous regions around the world, so it is important to assess what the best forecasting accuracy is, which can be obtained with just such kind of data.
In the field, one method to contribute to earthquake forecasting is to identify and evaluate precursor phenomena.Many of them have been studied, including foreshocks and smaller fractures that supposedly precede larger events (Niemeijer et al 2010, Rivière et al 2018, Dorostkar and Carmeliet 2019), anomalous variations in Earth's electric and magnetic fields Draganov et al (1991), Fenoglio et al (1995), Fujiwara et al (2004), emissions of chemical gases from underground (Richon et al 2003, Ghosh et al 2009, Tareen et al 2019), and the list goes on.
Besides those, the effects of the Sun and the Moon and, more generally, of tidal forces on earthquakes, have been analyzed quite a few times in the literature (Ide et al 2016, Hao et al 2019).In Saldanha and Hirata (2022), we investigated the effects of the sunspot numbers, and gave evidence that it indeed influences earthquakes on Earth.Here, we focus on the tidal force exerted by the Moon, which is hypothesized to have an influence on earthquakes because it is able to deform not only the ocean surface, but also the continental crust as well (Baker 1984, Volodichev et al 2003).Such movement could be a major player in the rate at which elastic strain energy builds up in faults, and also promote the appearance of cracks in the crust (nucleations) that can evolve into an earthquake.

Datasets analyzed
The earthquake catalogs we used for analysis were obtained from: the GeoNet3 project for New Zealand earthquakes, the Japanese Meteorological Agency4 for earthquakes in Japan, the Department of Geophysics-Geothermics of the University of Athens5 for earthquakes in the Balkan peninsula and surroundings, and the United States Geological Survey6 for worldwide earthquakes.The period covered by the datasets was chosen to be from 1 January 2000 to 31 August 2021.
We have also used the sunspot data provided by the American Association of Variable Star Observers7 .Besides that, the earthquake catalogs were instrumented with calculations of differential pulls exerted by the Moon using the invaluable tools provided by the Astropy library (Astropy Collaboration et al 2022).

Prediction framework
We use here the prediction framework proposed in Saldanha and Hirata (2022), of which we give a brief overview in the following.It consists mainly of calculating edit distances (Victor and Purpura 1997) between earthquake point patterns, and feeding these distances to a radial basis function network (Judd and Mees 1995).
Let P 1 = {t 1 , . . ., t m } and P 2 = {s 1 , . . ., s n } be two point patterns, then we can iteratively transform P 1 into P 2 using predetermined primitive operations.Each of these operations incurs a certain cost, and the edit distance is defined as the lowest possible cost necessary to do such transformation.The primitive operations and their associated costs are as follows: insertion insert point s i into P 1 , paying a cost of 1; deletion remove point t i from P 1 , paying a cost of 1; and shifting replace point t i with s j , paying a cost of λ|s j − t i |.
The hyperparameter λ is meant to normalize the values s j and t i so that the particular time scale chosen (e.g.measured in seconds or in hours) does change the behavior of the algorithm.The above is easily extensible to the case where each point is accompanied by additional information (called marks), such as magnitude, depth etc.Consider the marked point patterns P 1 = {(t 1 , u 1 ), . . ., (t m , u m )} and P 2 = {(s 1 , v 1 ), . . ., (s n , v n )}, with marks u i , v j ∈ R d .Then, the cost of insertion and deletion is maintained at 1, and the cost of shifting (t i , u i ) onto (s j , v j ) becomes (Schoenberg and Tranbarger 2008, Suzuki et al 2010, Hirata and Sukegawa 2019): where u i,k denotes the kth coordinate of vector u i .Again, the hyperparameters λ k are supposed to normalize the values such that the costs are all comparable to each other.
For analyzing the catalogs, we prepared a time window of 7 days and slid it by 1 day along the time axis, collecting a set x i of earthquake events for each window i; we note that the timestamp of an earthquake, after doing this process, is modified to be the time elapsed since the beginning of the time window it belongs to.This process was repeated for each catalog.The distances are fed to a radial basis function network, which finds a linear mapping with the form (Judd and Mees 1995): (1) where w are parameters found by least squares, and the equation is linear on the activation terms ψ (d X (x, c i )), with d X (x, c i ) being the edit distance between the earthquake point pattern of interest x and a few selected instances from the training set c i .In practice, the output ẑ(x, w) is a prediction for either the maximum magnitude or total number of earthquakes above a certain threshold on the day following x.
Figure 1 further illustrates how the earthquake patterns are collected.There are two kinds of sliding time windows: observed time windows have a length of 7 days, whereas forecast time windows are 1 day long, and they are slid throughout the dataset on a 1 day by 1 day basis.We collect all earthquakes within a observed time window, but for the forecasting time window, we only take the largest earthquake magnitude.Listing 1 shows how this is done algorithmically.
Predictions are made by comparing an observed time window from the test set (observed time window B) to the observed time windows from the training set (such as observed time window A), using the edit distance as means of comparison.In the case of figure 1, if the distance between observed time windows A and B was very low, then we could predict the outcome of time window B as being the same outcome of time window A. In practice, we do not have just one observed time window, but rather, we select 100 observed time windows from the past, and take the weighted average of the respective outcomes.The weighing is automatically done by the radial basis function network (Judd and Mees 1995).
To include sunspot and Moon tidal force data into the above framework, we modify equation (1) as follows: ẑ(x, y (1) , y (2) , w) where y (1) is the vector containing the 7 sunspot numbers (one per day) referring to the same week that x refers to, and y (1) c i is the same, but referring to the same week as c i ; the same reasoning applies to y (2) and y (2) c i , but here these vectors refer to the 168-dimensional vector containing the hourly tidal force collected over the 7 day window (the force is calculated on an arbitrarily selected point around the center of the region).These vectors always have the same dimension, so we use the Euclidean distance d E here.Parameters κ 1 and κ 2 are normalizing constants to make the three kinds of distances to have values in the same order of magnitude on average.
In order to validate such prediction framework, we perform tests on toy models, which are presented in B.

Test for coupling between dynamical systems
In order to test the existence of a causality relationship between the Moon and earthquakes on Earth, we employ the methods proposed in (Andrzejak and Kreuz 2011) and (Hirata et al 2016), which work mainly upon the two distance matrices obtained by calculating the distances between embedded states after appropriately embedding the two time series (see appendix A for an overview of Dynamical Systems concepts and terms).In the case that one dynamical system drives the other, if distances for the forced system in the reconstructed space are near, then distances for the driving force are also near, while the opposite is not true; the methods of Andrzejak and Kreuz (2011) and Hirata et al (2016) test whether this relation exists in the two distance matrices provided to it.The idea is based on the embedding theorem for the forced system by Stark (Stark 1999).
Listing 1. Pseudocode illustrating the preparation of the earthquake point patterns that arise by collecting the earthquakes occurring in an observed time window and the respective forecast time window, as illustrated in figure 1.By the end of its execution, the i-th element of 'eqPointPatterns' represents an earthquake point pattern for a time window of 7 days, and the i-th element of 'maxMag' and 'logN' contains the maximum magnitude and the log amount of earthquakes on the day following such 7 day period.
1 let 'catalog' <-the earthquake catalog 2 3 let 'eqPointPatterns' <-empty vector, will hold the earthquake point patterns 4 let 'maxMag' <-empty vector, will hold the maximum magnitudes on each day 5 let 'logN' <-empty vector, will hold the log amount of earthquakes on each day Let us assume that y (2) 2 , . . ., y (2)  n are the sequence of reconstructed states obtained by performing an embedding on the Moon tidal force time series, and that x 1 , x 2 , . . ., x n are the reconstructed states that result from an embedding on the earthquake data.Now let d E (y (2) j ) and d X (x i , x j ) be appropriate metrics defined in these two reconstructed spaces, which in our case are the Euclidean metric as d E , and the edit distance as d X .The method of Andrzejak and Kreuz (2011) operates on the two distance matrices resulting from these two metrics.Take the ith row in the distance matrix for Y (the tidal force system), and find the smallest distance in that row, excluding the diagonal and W elements adjacent to it (to avoid temporal correlation); assume that such element is the jth entry in the row, namely d E (y (2) j ).Now take the element located at position (i, j) in the distance matrix for X (i.e.earthquakes), which is d X (x i , x j ), and record its rank in that row after excluding the same diagonal elements as before.This element could be, for example, the fourth lowest distance in that row.By repeating this process for each row of the distance matrices, we obtain a collection of ranks r 1 , . . ., r n , and if the system Y does not drive X, these ranks are expected to be on average about half the number of non-excluded elements in a row, which for most rows is (n − 2W)/2 (recall that W is a parameter to exclude the entries referring to states that might be close in space because they are close in time, and it is chosen depending on the parameters on the embedding performed to reconstruct the states y (2) i and x j , ∀i, j ∈ {1, . . ., n}).Thus, for each row, call the number of non-excluded elements K i and let G i = (K i + 1)/2 be the expected rank if no coupling exists, then their method calculates the quantity (which can be called the 'A-K coupling index from X to Y'): which is a number belonging to [−1, 1], and the value 0 represents absence of coupling.It is straightforward to derive L(X → Y) using the same logic as above.
Hirata et al (2016) method works as follows.First, we normalize the distances such that the sets {d E (y In case there is no coupling, then there should be no difference in the distribution of points if we take different slices of the region, such as [0, 1] × [0, 0.25] and [0, 1]× [0.25, 0.5] (or by slicing the left-hand interval and keeping the right-hand intact).On the other hand, if there existed a coupling from the Moon to earthquakes on Earth, then the points in [0, 0.25] × [0, 1] would be in general nearer to the x-axis than in the other region slices, reflecting that small distances in the driving system tend to be associated with small distances in the driven system (which is the definition of dynamical coupling).In this context, the method of Hirata et al (2016) performs a statistical test for the presence of such behavior.In contrast to the example given above, however, the method considers that the [0, 1] interval is sliced into a user-defined number of bins.Of course, coupling on the opposite direction can be tested by merely inverting the meaning of values y (2) i for all i, and x j for all j, above.Like with the prediction framework, these coupling tests are also validated using toy models, as also presented in appendix B.

Tidal force calculation
As broadly known, the acceleration of the Earth towards the Moon is given by a , where m M is the mass of the Moon, d M the distance between the Earth and Moon, and G the gravitational constant.Bodies as huge as the Earth will have some portions nearer to the Moon (the 'ear side') and portions farther (the 'far side'), and due to the d 2 M term involved in the gravitational force, these two different portions of the Earth will be pulled towards the Moon with different intensities.Let r E = 6.37 × 10 6 m be the approximate radius of the Earth, then an object with mass 1 located in the near side of the Earth is pulled towards the Moon with acceleration a ′ M = Gm M /(d M − r E ) 2 , and most importantly, if we evaluate the difference a ′ M − a M , we get the relative acceleration of the object on the surface, when observed from the center of the Earth itself.The difference becomes: now since d M ≫ r E , we can eliminate the terms r E /d M , and also because 2r E ≫ r 2 E /d M , the right-hand term can be also removed from the numerator, thus yielding the approximation (Sawicki 1999): commonly called 'differential pull.'An object in the near-side surface of the Earth would move towards the Moon with this acceleration, if the Earth did not exert any gravitational force on that object.One implication of this is that objects on the surface of the Earth are slightly lighter (if measured using a scale) than we would expect from the well known formula m • g.Interestingly, the exact same effect occurs in the far-side of the Earth surface, and with the same magnitude of differential pull.The point here is that the Earth is pulled towards the Moon with more intensity than the object in the far side of the surface, so if the Earth gravity did not exist, the object on the surface would be 'left behind' with a speed equivalent to the differential pull found above.The calculations for this are very similar to the above, but for the difference: The above equations were used to calculate the differential pulls at each region and at each hour.We calculated the position of Earth and Moon using Astropy, which, when making calculations for periods of time in the past, offers values with more than sufficient accuracy for our purposes here.By doing this, we are also implicitly dealing with the variable sidereal period of the Moon, which can vary ∼3.5 h about the average value (estimated using Astropy, also see Wilkinson (2010)).In this sense, our method differs, for example, from what is done in Hao et al (2019), whose conclusions concerning Moon influences are mainly based on approximating its period as being fixed.
We note that we are aware of the existence of models for mathematical models for the calculation of differential pulls and other physical quantities associated to tidal forces (Longman 1959, Agnew 2005).However, since Astropy gives us very accurate positions of the Earth, Sun and Moon when considering time points in the past, we consider that using such data would be better than using mathematical models that cannot capture all the gravitational interactions (including those resulting from other planets in our Solar System) concerning each of these three bodies.Of course, using mathematical models allows performing analyses farther in the future.However, in our prediction model, we only need to know the heavenly body positions up to the present date and time, which would already allow us to make a prediction for next 7 days.If for any reason the actual positions are not available or cannot be obtained algorithmically, we note that Astropy is also very accurate for estimating the trajectory of the Earth, Sun and Moon in the near future (Astropy Collaboration et al 2022).

Results
The influence of the Moon on earthquakes has been analyzed various times in the literature.However, while those studies tend to focus more on finding correlations between those two phenomena, we opt for another approach, one that detects causality instead; in this sense, we note that correlation does not necessarily imply causality (Liang 2014).Having determined the existence of causality, we also investigate whether the strength of tidal forces can be used to improve accuracy of earthquake forecasting.
By using the method of Andrzejak and Kreuz (2011), as explained in section 2.3, we found that the Moon drives earthquakes (represented by Y → X, with Y being Moon effects) to some extent, since L(Y → X) is usually much larger (in absolute value) than L(X → Y); the one exception is the Touhoku case, where a negative value was observed for L(Y → X), but this kind of situation is difficult to interpret, and can probably be attributed to the large changes in earthquake frequency and magnitudes in Touhoku around the year of 2011 (see table 1).Besides that, unidirectional coupling is also identified when we use the method of Hirata et al (2016), since the p-values for the Y → X case are in general a lot smaller than for X → Y (table 1), except for the Japan case, where such relation is inverted, and in fact it indicates that earthquakes drive Sun electromagnetic activity, but surely there must have been stochastic factors that led this case to behave differently from the others.We see that in most cases (but not all), the methods indicate a coupling Y → X, so it can be concluded, from table 1, that there is evidence to believe that such a coupling indeed exists.
Besides dynamical coupling, we also analyzed the direct relation between the differential pull and the magnitude of each earthquake.Here, although the emphasis is in correlation (rather than causation (Liang 2014)), the results below should still serve to strengthen the analysis presented above.Figure 2 shows what happens if we take the average differential pulls of all earthquakes with magnitude greater than or equal to a threshold m, which is represented by the x-axis.The lines stop as soon as the number of earthquakes above that threshold is lower than 25, otherwise we would observe a sudden increase in the instability of the lines as the threshold increases.What we observe here is a clear tendency of the average differential pull to increase as the magnitude threshold increases, which thus indicates a tendency of larger magnitude earthquakes to happen when there is a higher tide.
However, note that, when compared to the maximum differential pull encountered in the instrumented datasets, most cases display only a moderate increase (ranging between 9.3% and 31.1%)relative to the average differential pull in the corresponding dataset.In other words, although the tendency seems to exist, it is very far from meaning that earthquakes tend to occur when the moon is at nadir or zenith, for example; what is most meaningful here is that the Moon tidal force seems to be positively correlated with the earthquake magnitudes.
That said, it might be worth remarking that analyzing correlation directly with the two time series (magnitude and differential pulls) in the dataset is made difficult since the number of earthquakes with lower magnitudes is a lot larger than those with larger magnitudes.We also remark that, in figure 2, the upper bound on the magnitude threshold for the worldwide dataset ended up being lower than for the Japanese one, even though the latter is supposed to be a subset of the former.This is because the datasets come from different sources, and the calculation of the magnitude differs between them, giving some differences in the magnitude assigned to the same earthquake (an average difference of 0.3, but that can reach up to 1.6).Table 2 shows a direct comparison between smallest magnitude earthquakes with higher magnitude ones.Here, we compare the 20% smallest with the earthquakes above each threshold shown in the second row of the table, and then perform a Wilcoxon rank-sum test to compare the median differential pulls of both sets of earthquakes.The alternative hypothesis here is 'the earthquakes with larger magnitudes have larger differential pulls.'In the table, we see the null hypothesis is rejected in more than two thirds of the cases, at a significance level of 5%.Thus, again, indicating that a larger differential pull tends to be associated with earthquakes with larger magnitude.
We then proceeded to try verifying how data from the Sun and the Moon can affect forecasting accuracy.The forecasting algorithm used here is as delineated in section 2.2, and we remark that the choice of the parameters κ 1 and κ 2 was as follows.After performing an embedding in each of the three kinds of data (earthquakes, sunspots and moon differential pulls), we end up with reconstructed states, denoted respectively by the indexed sequences x i , y Y .Let m(D) represent the average of the lower triangle part of matrix D, then κ 1 and κ 2 are chosen so that: where we take half the mean of the distances in the earthquake dynamical system so that it is not overwhelmed by the other two distances in equation ( 2).Results of New Zealand earthquake forecasting using different kinds of features: (i) using just earthquake data (the Baseline method), (ii) using earthquake data and Moon differential pulls, (iii) using earthquake and sunspot data, and (iv) using earthquake data, Moon differential pulls and sunspot data.The figures show the predicted log-number of earthquakes (log (N + 1)) against the real magnitudes so that he correlation can be observed.
The prediction results are shown in figures 3-8.Besides the correlation, we also calculate the odds-ratio (another metric of accuracy) by performing a binary forecasting, where we associate the event 'the maximum magnitude on a day will be larger than a fixed threshold α' to the event 'the predicted log number of earthquakes on that day is larger than a constant β' .This results in a confusion matrix: where A, B, C and D are the number of cases that fulfill the conditions shown in the respective row and column.The odds-ratio is then given by A • D/B • C, providing insights into the odds of correct positive predictions compared to incorrect ones.Higher odds ratios indicate a more favorable balance between true positives and false positives, suggesting better predictive performance.The odds-ratio was collected for each trial, and they are shown in the boxplots in figures 4-8.
In figure 3 we show an example of predictions made for the New Zealand case, where we predict the logarithm of the number of earthquakes (log (N + 1), with N being the number of earthquakes), and correlate it to the actual magnitude of these earthquakes (both these quantities are related by the Gutenberg-Richter's law (Stein andWysession 2003, Saldanha andHirata 2022)); we do not predict the maximum magnitudes directly because they are less regular than the log-number of earthquakes, and this approach demonstrated better results.For earthquakes in Japan (figure 4), an improvement in correlation is observed when using data from the Moon tides, and an even larger increase (5.37% improvement) when both Moon tidal force data and sunspot data are used.However, when using sunspot data only, the improvement is negligible.Here, odds-ratio was not improved by using Moon tidal force data or sunspot data individually, but there is a 3.51% increase when using both sunspot and Moon tidal force data.
Figure 5 shows results for the Touhoku region, which is a subset of the Japan data.In this case, the average correlation obtained is considerably larger than for the Japan nationwide case; in fact, we have observed that analyzing smaller regions tend to lead to the better accuracy, a phenomenon that we will be further analyzing in future work.The accuracy gain brought by including sunspot data and Moon tidal force data is also more noticeable, with up to a 29.62% increase when using both Moon tidal force data and sunspot data; however, similarly to the Japan case, using sunspot data alone leads to a lower improvement (in odds-ratio as well).When including all three kinds of data, we get a 76.63% increase in odds-ratio, which is also remarkable.
For New Zealand (figure 6), we see that in this case sunspot data was more relevant in terms of correlation, with a 8.86% increase.Including Moon tidal force data on top of sunspot data actually led to a smaller improvement in the correlation; the odds-ratio, however, increased 5.72%, which is more than when using sunspot data and Moon tidal force data individually.
Finally, figures 7 and 8 show results for the Balkan region and a subregion located in its west side, where there is a large concentration of earthquakes.Again, by restricting the analysis to a subregion, we ended up obtaining far larger average correlations.In both cases, substantial increases in correlation and odds-ratio are attained when including sunspot data or Moon tidal force data, and the highest values are seen when using all three kinds of data simultaneously.Correlation and odds-ratios for predicting earthquakes in Japan nationwide, using different settings of data sources.In the Baseline model, only earthquake data is used, and in the other settings we use: (i) earthquake and Moon data, (ii) earthquake and sunspot data, and (iii) earthquake, Moon and sunspot data.The thresholds for calculating the odds-ratio were chosen as α = 6 (for the real magnitudes) and β = 3.2 (for the predicted values).The improvements in correlation median (relative to the Baseline) were: 2.90%, 0.82%, 5.37%, respectively.In odds-ratio median, they were: 0%, 0.77%, 3.51%.

Conclusion
In this paper, we have presented evidence that support existing hypotheses that earthquakes are affected by entities external to the Earth, namely, the Sun and the Moon.We had begun to analyze the case of sunspot numbers in previous work (Saldanha and Hirata 2022), but here it has been extended to demonstrate that the conclusions also apply very well to the Balkan peninsula region (and its west subregion), thus strengthening our previous results.
As for the Moon, we found that the coupling detection tools from dynamical systems indicate that the motion of the Moon is unidirectionally coupled with the generating process of earthquakes, meaning that the position of the Moon has an influence on the latter, although the extent of this influence cannot be inferred from these methods alone.Subsequently, we used planetary motion simulation to calculate the Moon differential pull at the region of each earthquake in the moment they occurred, and analyzed the relation of these forces with the magnitude of earthquakes.To the best of our knowledge, although there are related work discussing the correlation between the amount of earthquakes and the Moon tidal forces (Métivier et al 2009) as well as the correlation between the magnitudes of earthquakes and the combined effects of the Sun-Moon tidal forces (Ide et al 2016), the combined 'causal' effects of sunspot numbers and the Moon tidal force on earthquakes on the Earth had not been investigated yet.Our analysis strongly indicates the existence of causal relation between the Moon tidal force and the magnitude of earthquakes, which we interpreted as a slight-to-moderate tendency of larger earthquakes to occur when the tidal force is relatively large at the position they occur.Finally, we showed that next-day magnitude forecasting accuracy can be improved by using Moon tidal force data, and improved even more by using both Moon tide and sunspot information.
We hope that our results encourage the use of Moon tide and sunspot information to improve earthquake forecasts, so future work should endeavor to design forecasting algorithms that can use these heterogeneous kinds of data more effectively.We also highlight that our work is limited in the sense that only next-day maximum magnitudes are forecast, with no regard whatsoever to where the earthquake will occur.A difficulty arises here since the representation of the spatial dimension in terms of latitude and longitude appear to be inadequate for forecasting the position of earthquakes; further exploration of this problem is left as future work.

Figure B1.
Results of applying the prediction framework described in section 2.2 to a toy model concerning a numerically simulated periodically forced Rössler system of differential equations.The local maxima is taken along the x axis of the system, so as to mimic the earthquake behavior.

Figure B2.
Results of applying the prediction framework described in section 2.2 to a toy model concerning a numerically simulated periodically forced Lorenz system of differential equations.The local maxima is taken along the x axis of the system, so as to mimic the earthquake behavior.
Then we calculate the pairwise Euclidean distances among the vectors v 1 , . . ., v n , and also the pairwise Euclidean distances among the vectors u 1 , . . ., u n , which yields two distance matrices that we use in the same as we used for earthquakes.The prediction results are as shown in figure B3, where the inclusion of the complete information about both systems increases the correlation slightly and also reduces the variance of prediction significantly.
For the Lorenz system, once again t F is taken to be 10 000 instead, and the first 50 thousand simulated points are discarded.The rest of the procedure was executed in the exact same manner, and the results can be found in figure B4, where we again observe a higher correlation and lower variance therein.
We have also applied the coupling test of Andrzejak and Kreuz (2011) to all the toy models.Let X represent the forced system and Y the driving force, then for the Rössler model using time windows, we had (L(Y → X), L(X → Y)) = (0.005, −0.001), and using delay coordinates these values were (0.006, −0.002).For the Lorenz model using time windows we obtained (0.010, −0.004), and using delay coordinates it was (0.003, −0.006).
Thus, for all cases, the values of L(Y → X) have the correct sign and are statistically significant at a significance level of 0.1, except for the Lorenz model using delay coordinates.The values of L(X → Y) are all negative, which does not have any particular meaning.Results of applying the prediction framework described in section 2.2 to a toy model concerning a numerically simulated periodically forced Rössler system of differential equations.The local maxima is taken along the x axis of the system, so as to mimic the earthquake behavior.

Figure B4.
Results of applying the prediction framework described in section 2.2 to a toy model concerning a numerically simulated periodically forced Lorenz system of differential equations.The local maxima is taken along the x axis of the system, so as to mimic the earthquake behavior.
Initially, we expected the coupling results to be more prominent, since it is a toy model running in perfect conditions, but we conjecture that since the systems are forced by a simple sine function and at a quite modest coupling strength, the simplicity of such driving force makes it more difficult to detect the coupling.In fact, if we substitute the driving force in the Rössler model by the time series generated by another Rössler model, we observe L(Y → X) = 0.025, which is sensibly higher than the ones observed above.

Figure 1 .
Figure 1.Illustration of the data preparation and organization used in this project.The period considered was from 1st January 2000 up to 31st August 2021.Observed time windows have a length of 7 days, whereas forecast time windows are 1 day long.These windows are slid throughout the dataset on a per 1 day basis.

Figure 2 .
Figure2.Average differential pull of the earthquakes above the threshold given in the x-axis.The lines stop when there are less than 25 earthquakes in the calculation of the average.We observed an increase for every earthquake catalog analyzed.
are separated in training and test sets.For each kind of data, we calculate the pairwise distances within the training set, resulting in three symmetric distance matrices D X , D

Figure 3 .
Figure3.Results of New Zealand earthquake forecasting using different kinds of features: (i) using just earthquake data (the Baseline method), (ii) using earthquake data and Moon differential pulls, (iii) using earthquake and sunspot data, and (iv) using earthquake data, Moon differential pulls and sunspot data.The figures show the predicted log-number of earthquakes (log (N + 1)) against the real magnitudes so that he correlation can be observed.

Figure 4 .
Figure 4. Correlation and odds-ratios for predicting earthquakes in Japan nationwide, using different settings of data sources.In the Baseline model, only earthquake data is used, and in the other settings we use: (i) earthquake and Moon data, (ii) earthquake and sunspot data, and (iii) earthquake, Moon and sunspot data.The thresholds for calculating the odds-ratio were chosen as α = 6 (for the real magnitudes) and β = 3.2 (for the predicted values).The improvements in correlation median (relative to the Baseline) were: 2.90%, 0.82%, 5.37%, respectively.In odds-ratio median, they were: 0%, 0.77%, 3.51%.

Figure B3.
Figure B3.Results of applying the prediction framework described in section 2.2 to a toy model concerning a numerically simulated periodically forced Rössler system of differential equations.The local maxima is taken along the x axis of the system, so as to mimic the earthquake behavior.

Table 1 .
Hirata et al (2016) (2011)upling tests ofAndrzejak and Kreuz (2011)(represented by the A-K coupling indices L(Y → X) and L(X → Y)) andHirata et al (2016)(represented by the p-values) on the datasets considered in this paper.Here Y corresponds to Moon tidal effects, whereas X represents earthquakes.

Table 2 .
Hypothesis tests comparing the differential pulls of the 20% smallest earthquakes against those of the earthquakes whose magnitudes are above the thresholds 5, 5.25, 5.5, 5.75 and 6 (as indicated in the first row of the table).The values in the table display the p-values obtained with a Wilcoxon rank-sum test.Values in boldface represent a rejected null hypothesis at a significance level of 5%.