The epistemic and aleatory uncertainties of the ETAS-type models: an application to the Central Italy seismicity

Stochastic models provide quantitative evaluations about the occurrence of earthquakes. A basic component of this type of models are the uncertainties in defining main features of an intrinsically random process. Even if, at a very basic level, any attempting to distinguish between types of uncertainty is questionable, an usual way to deal with this topic is to separate epistemic uncertainty, due to lack of knowledge, from aleatory variability, due to randomness. In the present study this problem is addressed in the narrow context of short-term modeling of earthquakes and, specifically, of ETAS modeling. By mean of an application of a specific version of the ETAS model to seismicity of Central Italy, recently struck by a sequence with a main event of Mw6.5, the aleatory and epistemic (parametric) uncertainty are separated and quantified. The main result of the paper is that the parametric uncertainty of the ETAS-type model, adopted here, is much lower than the aleatory variability in the process. This result points out two main aspects: an analyst has good chances to set the ETAS-type models, but he may retrospectively describe and forecast the earthquake occurrences with still limited precision and accuracy.


Data and Model
Locations and magnitudes of earthquakes, recorded by the stations of the Italian National Seismic Network (downloadable from cnt.rm.ingv.it), are evaluated in real-time in the surveillance room of the Istituto Nazionale di Geofisica e Vulcanologia (INGV) in Rome and then revised by the analysts of the Italian Seismic Bulletin (Bollettino Sismico Italiano -BSI). By following the BSI strategies of revision, all events with ML ≥ 3.5 are quickly revised, whereas the standard review is done for the smaller events, within an agreed timeframe 5 .
The recent Central-Italy seismicity includes two main sequences: the 2009 L' Aquila sequence, having 5 events above ML5.0 (the largest ML5.9 event occurred on April 06 2009) and the still ongoing 2016-2017 Central Italy sequence, for now having 9 events above ML5.0 and 2 events above ML6.0 (the ML6.0 Amatrice event, at August 24 2016, initiating the sequence, and the Mw6.5 Norcia event, at October 30 2016). At present (2017 August), all events above ML3.5 have been revised, whereas the smaller ones, occurred after August 24 2016, are currently under revision.
The dataset , analyzed here, collects the events occurred from April 16 2005 up to March 28 2017, in the area [12.50-14.00E, 42.00-43.20N], with magnitude above 3.0 and depth between 0 and 40 km (1,599 events; see Fig. 1a). The starting date marks the start-up of a new seismic network, causing a significant improvement of the earthquakes detection 6,7 . The magnitudes distribution is in agreement with a Gutenberg-Richer Law, having a b-value b equal to 1.1 (see Fig. 1b). A time-dependent evaluation of the completeness magnitude (Mc), by mean of the Mc and B-value Stability (MBS) 8,9 and the Goodness of Fit Test (GFT) 10 methods, reveals a sudden increase of Mc, up to ML3.5, soon after the Mw6.5 event (see Fig. 1c). Otherwise,  can be considered complete.
I use the TMS version of the ETAS model () implemented in SEDAv1.0 (Statistical Earthquake Data Analysis), a statistical software freely provided via the Zenodo open access platform 11 (https://zenodo.org/ record/55277). This has the intensity function  where is the observed data (history) up the time t; • T i , M i and (X i , Y i ) are the time, the magnitude and the epicentral coordinates of the i-th event, respectively; • (x, y) belongs to a region ; } is the spatial probability density function (PDF) of the background events; k p c d q { , , , , , , , } are 8 parameters to be estimated, where μ is the Poisson background seismic rate, k is the productivity coefficient, p and c are the parameters of the generalized Omori Law, α is the coefficient of the exponential magnitude productivity law (called Utsu Law), d, q and γ are the parameters of the spatial distribution of triggered events. • r i is the distance (in kms) between the location (x, y) and the epicenter of the i-th event (X i , Y i ); The background spatial PDF u(x, y) is assumed uniform in each of the N c cells C j (of area A j ) of a regular grid (in degrees), recovering the region  of interest, so that where u j is the probability to observe a background event inside the cell C j , per day. I estimate the ETAS model by using the data of  occurred from April 16 2005 to August 24 2016, since these are complete and totally revised by BSI. For the present study, I use a background grid covering the region  = [12.65-13.75E, 42.05-43.15N] and square cells C j with a side of 0.01°. By applying the SEDAv1.0 simulated annealing algorithm (1000 runs), based on the maximum likelihood criterion 11,12 , I found the parameters shown in Fig. 2. These distributions represent the "epistemic" uncertainty, in the context of this ETAS-type model, but they are not independent, since some parameters are highly correlated, causing the multimodality of log-likelihood function 13 . The different runs generate similar spatial background distribution (Fig. 2b), whereas some parameters are much more uncertain. Indeed, the proportion of 95% confidence interval size for each parameter, respect to median value (multiplied by 100 to obtain percentages), goes from 4% to 78% (Fig. 2a). In the following, I fix M max = 7.5, which is a precautionary limit of what expected in Central Italy, by historical and geological data 14,15 . The impact of this choice will be analyzed in the future.
which has a 95% confidence interval size equal to about 35% of its median value (0.57), we can compute the background (BR), triggering (TR) and overall (OR) rates to equilibrium, for each model, by equations whereas BR varies in a range equal to about 15% (95% confidence level; see Figs 2 and 3d) of its median value, the respective percentages for TR and OR are close to 80% and 50% (Fig. 3d).

The "epistemic" and "aleatory" uncertainty
As above said, the intensity function of the ETAS-type models (eq. 1) well represents and separate the "aleatory" and "epistemic" uncertainties. The history  t refers to "aleatory" natural randomness of the process, whereas the parameters , together with the spatial background distribution u(x, y), represent the "epistemic" reducible uncertainties.
To quantitatively measures both "aleatory" and "epistemic" uncertainties, two types of analysis will presented in the following. In the first, the history is kept fixed to the observations  t obs , whereas the uncertainty of param-   Figure 4(a,c,e,g) shows some time windows of 15 days, including all events above ML5.0. Firstly, the model uncertainty is negligible, since the 99% confidence bounds of theoretical rates are very close to their median, for each day D j . Secondly, the model generally well describe the time behavior of seismicity, but it is not able to fit the number of events observed at occurrence days of main events. You have to remember that this analysis does not include the natural randomness of the process, so that the observed number of events N obs j is a percentile of a random distribution, predicted by  i , having as mean Ne th i j , . Anyway, the systematic underestimation of observed number of events, soon after the occurrence of largest events, suggests an intrinsic inefficiency of the model, probably only partially justified by the preliminarity of data.   Specifically, for each model  i , l compute the theoretical daily number of events within 5 km from the center of each cell, in order to minimize the impact of possible locations errors of not revised events. The maps of the median and of the 99% confidence bounds of these rates, for each cell, show that the spatial variability is negligible, including the areas near the locations of main events (Fig. 5d,h). "Aleatory" uncertainty: the natural randomness. In this analysis the model is fixed to  best , having the maximum log-likelihood (see Fig. 2, red lines), and the history is changed by simulations. Specifically, for each day D j , the system consider 10000 different histories , consisting of observations occurred before D j (so that all histories are equal before D j ) and of simulations inside D j . These simulations corresponds to what the model  best forecasts, given the observed history  t obs , before D j . The theoretical daily number of events Na th k j , are computed by integrating the intensity function on the overall region, the entire magnitude range and the interval time D j , for each history t k j , t m x y dt dx dy dm ( , , , / ) , , ,  Figure 6 corresponds to Fig. 4, for this second analysis. The first result is that the "aleatory" uncertainty is much larger than the "epistemic" one. Specifically, the distribution of occurrences is asymmetric, with a strongly positive skewness. This result is due to the not negligible probability to have a medium-large event, causing an increase of seismic rate. Secondly, the observed number of events fully falls within the range of model forecasts, but for days in which the main events occur, since these are not still included in the histories. This means that what actually happened is generally covered by the model best  . The check of the spatial variability of the seismic rates (Fig. 7) show a much larger natural randomness respect to the "epistemic" uncertainty ( Fig. 5), especially around the main event location. This is much more close to a binomial distribution than to a poisson one, since it has a variation much lower than the median and a high skewness.

Discussion and Conclusions
The main aim of this paper is to quantitatively measure uncertainty of an ETAS-type model. The role of uncertainties and the manner to address them appropriately has been discussed by different specialists for a long time (see ref. 17 ). The nature of uncertainties of a model and how one deals with them depends on the context and the application. Taking in mind the questionable meaning of terms "epistemic" and "aleatory" (as discussed in the Introduction section), these measures may be useful to improve the accuracy of a model and to provide an objective judgment of its both retrospective and forecast performances. Specifically, this paper presents results deriving by the application of an ETAS-type model on Central Italy seismicity.
The main result of this paper is that the "aleatory" variability is much larger than the "epistemic" uncertainty, at least in the context of the specific version of the ETAS-type model used here. This result has two important consequences. The first is that the Simulated Annealing algorithm, used to set the model, has a good search ability, which allows to return the optimal solution of the log-likelihood maximization. Different sets of parameters may lead to very close log-likelihood, due to strong correlation among them. This result partially invalidates the physical meaning given to individual parameters, but allows us to quantify their resolution. Whereas the speeds of the temporal and spatial decay for the triggered events (p and q parameters) are well constrained, c parameter is much more uncertain (Fig. 2). The moderate variability of background rate (Fig. 2a) and of background spatial distribution (Fig. 2b) points out a good discrimination between independent and triggered events. This is confirmed by low variability of background equilibrium rate BR and of branching ratio BrR (Fig. 3d). More generally, the parameters variability has a small impact on the main features of the model (Fig. 3).
The analysis of the "epistemic" uncertainty, discussed in the present paper, comprises only the parameters uncertainty of the ETAS-type model proposed here. None consideration is made about different parametrization of the ETAS modeling or about further short-term stochastic models. The parametric uncertainty analysis (Figs 4 and 5) shows a good ability of the model to retrospective forecast the occurrences, considering that the variance of what expected is not taken into account. A quantitative test of retrospective agreement with observation is beyond the scope of this paper and will be made in the future. In the context of present study, the sensitivity analysis on model parameters is not conducted, due to correlations among them 13 . Indeed, the distributions shown in Fig. 2 are not independent and cannot be used individually, for the identification of more critical parameters. The right way forward is to include these correlations by using model configurations  i . The large aleatory uncertainty (Figs 6 and 7), so as the scarce ability to retrospectively fit observations soon after the strongest events (Fig. 4), is associated with the still poor modeling of main short term features of seismicity. The ETAS-type models do not take into account specific characteristics of the ongoing seismicity, as the possible activation of close major faults, during a sequence. The background rate, constant in time and variable in space, is inadequate to represent the complexity of long-term physical processes, leading the starting of significant sequences. For these reasons, the ETAS-type models are able to provide plausible, even if highly uncertain, forecasts of aftershocks, but they do not usefully predict the occurrence of large shocks, both before and during a sequence. All these results show that a great effort needs to be done to improve this type of models.
The present paper does not discuss the impact of data uncertainty and of highly temporary incompleteness, soon after the Mw6.5 event occurrence (October 30 2016). This topic will be treated in future work, when the catalog revision by BSI will be finished. This is of prominent importance for two reasons. Firstly, data uncertainty is a particular type of epistemic uncertainty that might play a role in judging the performance of ETAS-type models 18 . Secondly, the assessment of data uncertainty/incompleteness, and of their impact on the ETAS-type models, can help to provide new, more efficient, real time forecasting procedures, comprised in the Operational Earthquake Forecasting 19 .
The impact of incomplete data set on the ETAS model estimation has been studied by Omi, Ogata and colleagues [20][21][22] , which proposed a Bayesian estimation method to apply on incomplete early aftershock data, soon after the occurrence of larges events. Anyway, in the present study, the temporary incompleteness is a problem of lesser importance, since it is limited to the first day after the main event occurrence (Mw6.5, October 20, 2016; see The shape of aleatory uncertainty (Figs 6 and 7) is more close to a binomial distribution than a poisson one. This result does not confirm that a binomial distribution always well approximates forecasts made by an ETAS-type models. This topic will be investigated in future work. Nevertheless, this result is in agreement with the well-known result of unsuitability of Poisson hypothesis for this type of forecasts 23,24 .
Aleatory uncertainty is naturally associated with the uncertain future of the system (i.e., the occurrence of the events). Anyway, as above said, it is not independent by not-aleatory uncertainties. Reducing aleatory variability might be not inexpensive and causes an increase of epistemic uncertainty. This last includes a "blind" ignorance, more difficult to deal with because of their unknown nature 25,26 . Even if the classification of uncertainty as aleatory or epistemic is ambiguous, studies like this are likely to reveal important information about the reliability of existing models and to provide guidance on how they can be improved.