Natural Hazards and Earth System Sciences Statistical eruption forecast for the Chilean Southern Volcanic Zone : typical probabilities of volcanic eruptions as baseline for possibly enhanced activity following the large 2010 Concepción earthquake

A probabilistic eruption forecast is provided for ten volcanoes of the Chilean Southern Volcanic Zone (SVZ). Since 70% of the Chilean population lives in this area, the estimation of future eruption likelihood is an important part of hazard assessment. After investigating the completeness and stationarity of the historical eruption time series, the exponential, Weibull, and log-logistic distribution functions are fit to the repose time distributions for the individual volcanoes and the models are evaluated. This procedure has been implemented in two different ways to methodologically compare details in the fitting process. With regard to the probability of at least one VEI ≥ 2 eruption in the next decade, Llaima, Villarrica and Nevados de Chill án are most likely to erupt, while Osorno shows the lowest eruption probability among the volcanoes analysed. In addition to giving a compilation of the statistical eruption forecasts along the historically most active volcanoes of the SVZ, this paper aims to give “typical” eruption probabilities, which may in the future permit to distinguish possibly enhanced activity in the aftermath of the large 2010 Concepci ón earthquake.


Introduction
A series of volcanoes frequently erupting in historical times is situated in the Southern Volcanic Zone (SVZ) in Chile, an active volcanic arc produced by subduction of the Nazca Plate beneath the South American Plate.Large parts of the regions adjacent to this segment of the Andes in both Chile and Argentina are intensely used by the population for living, industrial zones, agricultural production, recreation and tourism.At a first glance, some of the volcanoes may be Correspondence to: Y. Dzierma (ydzierma@geophysik.uni-kiel.de)perceived as remote because of limited accessibility, but dispersal of fallout tephra as produced by explosive eruptions easily overcomes the distances the volcanoes have to urban areas, placing them volcanologically in the endangered vicinity.Estimating the likelihood at which eruptions will occur in the near future is therefore an important part of the necessary hazard assessment for these risk-exposed regions, aiming at mitigation of potentially upcoming volcanic crises.
In addition to the highly active volcanoes, this subduction system is known to have experienced several of the largest earthquakes in history, in particular the 1960 Valdivia earthquake (with a magnitude of 9.5) and the 2010 Concepción earthquake (offshore Maule, magnitude 8.8) (Fig. 1), which have both been associated with tsunamis.Elevated volcanic activity in the aftermath of large earthquakes has long been a subject of research and debate, and is still not satisfactorily understood.Already Charles Darwin postulated increased activity levels in the volcanic chain following the 1835 Concepción earthquake (Darwin, 1840).By means much beyond visual observation, it has been suggested in several recent studies that processes such as the transfer of stress and opening of fluid pathways following large earthquakes may play a fundamental role in triggering arc volcanism (e.g.Walter, 2007;Walter and Amelung, 2007;and references therein).Those studies approached the seismic-volcanic interplay by numerical modelling.Advances looking at the temporal relationship have been provided by Watt et al. (2009), demonstrating augmented eruptive activity following the 1960 Valdivia earthquake in Chile and the timescales at which the volcanic response may happen; this is also addressed, by, e.g., Jupp et al. (2004), Lemarchand and Grasso (2007).
In the coming years, the large Concepción earthquake of 27 February 2010 will provide another opportunity to better understand the temporal frame and regional catchment of earthquake-triggered eruptions.To detect elevated volcanic activity caused by the earthquake, it is an essential Published by Copernicus Publications on behalf of the European Geosciences Union.

T re n c h
Fig. 1.SRTM image of the Chilean Southern Volcanic Zone (SVZ) displaying the location of the volcanoes considered in this study (red triangles), other major volcanic centres within the SVZ (yellow triangles), epicentres of the earthquakes (white stars), the plate boundary (dashed blue line), major cities (white circles).Inset shows the location of the SVZ on the South American continent.Image courtesy SRTM Team NASA/JPL/NIMA, modified.
pre-requisite to define a baseline of "normal volcano activity in historical times" of the Chilean SVZ, against which the volcanic activity in the coming months to years can be statistically compared.The purpose of this paper is threefold: Firstly, it intends to give a compilation of statistical eruption probabilities based on eruption time series to be used for hazard estimation and management, as complete as the historical eruption record up to the end of the year 2009 permits.Secondly, it provides the "typical" eruption behaviour in the SVZ in the hope of allowing for discrimination between expected eruptions and those triggered by the 2010 Concepción earthquake.Finally, from the methodological point of view, the application of the same standard method to this set of different volcanoes can serve to illustrate several debatable issues that may arise during the analysis.For this aspect it appears useful to consider a group of volcanoes from a restricted geographical area and hence with a co-genetic large-scale driving force, although individual eruptive centres are affected on smaller scales by processes within the system in terms of magma evolution, ascent, and eruption mechanisms.

Geologic setting
Tectonic plates converging towards South America have constructed a vast and high mountain chain, the Andes, stretching along the entire western shore of the South American continent (Fig. 1).Four active volcanic arcs occur along the Chilean Andean chain: the Northern Volcanic Zone, the Central Volcanic Zone, the Southern Volcanic Zone (SVZ), which result from the subduction of the Nazca Plate beneath the South American Plate; and the Austral Volcanic Zone created by subduction of the Antarctic Plate beneath the South American Plate.Convergence rates are fast, reaching 70-90 mm/yr (Stern, 2004).The most active segment is the SVZ, bound to the North at 33 • S by the Pampean flat slab segment, where volcanism ceases when the subduction plate angle drops below 24 • , and to the South at 46 • S by the Patagonian Volcanic Gap (Stern, 2004).The SVZ contains more than 60 large volcanic centres and numerous small vents, many of which are known to have been active during the Quaternary (e.g.Stern, 2004).In this study, we focus on large volcanoes for which activity has been documented in the historical record.

Data
Chronological volcanic eruption records were obtained from the Smithsonian Global Volcanism Program website (www. volcano.si.edu, Siebert and Simkin, 2002); in-depth questioning of their correctness is beyond the scope of this work.We acknowledge, however, that individual studies might contain additional eruptions, discredit registered eruptions or shift the exact onset dates.
Target volcano selection has then been performed here based on the number of eruptions in the record, and their Volcano Explosivity Index (VEI, explained below).Since a minimum number of data entries is required to qualify a volcano for statistical analysis of the eruption time series, we selected only those volcanoes with at least seven eruptions of VEI ≥ 2 in historical times.The VEI, assigned to individual eruptions in literature, is a means to classify explosive volcanic eruptions mainly by the volume of the tephra ejected, complemented by eruption parameters such as eruption duration, column height, and qualitative descriptive terms (Newhall and Self, 1982;refined by Simkin and Siebert, 1994).In some cases, the assignment of a precise VEI value to a given eruption may be hampered by uncertainties.To which VEIlimit a data set should be confined is still of debate in the community of statistical volcanology.On the one hand, De la Cruz-Reyna (1991) expressed the opinion that statistical evaluation of eruption frequencies would be more reliable for large eruptions (of VEI ≥ 4, possibly ≥3) than for smaller ones.Indeed, since tephra volumes are reconstructed by detailed field mapping from the depositional fan, very small eruptions are fairly prone to being missed or volumetrically underestimated because of erosional processes.This problem applies in particular to eruptions in the earlier record, and ignoring possible incompleteness of the record will produce erroneous results in the statistical procedure.Criteria on how incompleteness of the record can be detected are discussed below.On the other hand, despite the fact that the statistical technique for eruption probability analysis has been applied to larger eruptions (VEI ≥ 3) in most previous studies, there is no direct volcanological reason why the analysis should not include smaller eruptions.Dzierma and Wehrmann (2010) provide reliable statistical assessments for eruption records consisting mainly of eruptions with VEI = 2, which we adopt for this study and exclude only very small eruptions of VEI = 1.Moreover, in the historical record of the selected SVZ volcanoes, most eruptions do not exceed VEI = 2. Since the problem of completeness is likely to be more pronounced for these than for larger eruptions, however, more attention must be paid to testing the condition of completeness.
The two historically most active volcanoes of the SVZ, Villarrica and Llaima, have already been analysed previously (Dzierma and Wehrmann, 2010); the results are included here for completeness.Furthermore, one parameter in the procedure is modified in this approach, as discussed in the Methods section.Small deviations in the results compared to the previous analysis occur because of a slight difference in implementing the procedure.For the completeness of the eruption record, Llaima and Villarrica were shown to have complete records only after 1860 and 1730, respectively (Dzierma and Wehrmann, 2010).In the case of the other volcanoes, the exact time since when the record can be considered complete is not a priori known, therefore, the analysis will be performed for the full historical eruption record.In the course of the analysis the completeness of the records will be assessed.
In consideration of the requirements defined above, twelve of the SVZ volcanoes are subjected to the statistical analysis (Fig. 1, Table 1; for eruption records, see supplementary material).Although expressing dynamic natural systems by means of statistical models is generally associated with some limitations, particularly when eruption records are short, we present these results to provide an estimate inferred from the available data.
To convert the eruption records into the time series the following statistical analyses are based upon, the interval from one eruption to the subsequent eruption ("repose time") is defined by the successive eruption onsets, thus neglecting the duration of an individual eruption.This convention has been initiated by Klein (1982).In derivation of the available data, these repose times are resolved on a yearly scale.

Methods
Since the purpose of this paper is to contribute a probabilistic eruption forecast as a part of a hazard assessment, only standard statistical methods that have evolved in eruption time series analysis will be applied here to the selected SVZ volcanoes.We follow a procedure similar to the one of Dzierma and Wehrmann (2010), which we will briefly outline in this section.The observed eruption repose times are described by means of distribution functions, for which several models exist.The methodology includes the expressions and characteristics of the possible distribution functions that can be fit to the data if certain prerequisites are fulfilled, the assessment of those conditions, the evaluation of how well the distribution functions image the data sets, and how the probabilistic eruption forecast can be extracted from the models.

The Poisson Process and the Exponential distribution
A chronological series of events such as volcanic eruptions can be described by a Poisson process, if the events occur 1. seldomly (the probability that more than one event occurs in a small increment of time is negligible), 2. independently of each other (the probability of an event in any time interval is independent of any occurrences before the start time of the interval) and

stationarily (without time trend).
A Poisson process corresponds to the ideal case, in which the time series data follow the most simple conceivable model.For a mathematical definition and derivation of these properties, see, e.g., Cox and Lewis (1966).
If the data are created by a Poisson process, they can be represented by an exponential distribution function.To check whether this is the case, several tests are usually performed.We start the analysis by plotting the cumulative number of eruptions for each volcano as a function of time.This is a way to get a first visual impression of the eruption time series on whether it is Poissonian or not.In addition, it is possible to visually define the onset of different eruption regimes if they exist, and compare the eruption behaviour to that expected for a Poisson process.If the cumulative number of eruptions is observed to increase approximately linearly with time, this can be interpreted as a positive test for Poissonian behaviour.Alternatively, the Poissonian behaviour could first be confirmed, subsequently permitting to fit the exponential.
Table 1.Results for the correlation and stationarity check for the SVZ volcanoes.The columns show the number of repose times between the eruptions in the record, the correlation coefficient R derived from the scatter plots, and the significance of the correlation measured by the correlation coefficient R.Where no correlation was found at 95% significance, "no correlation" is stated.The last column gives the result of the statistical stationarity check from the moving average plots, stating either that the data passed the stationarity check, or at which data points the acceptable scatter was exceeded.Where a volcano failed in one of the tests, it was examined whether restricting the record to later times would improve the results.Only volcanoes which passed both tests and still contained a sufficient amount of data entries were included in the later analyses.A visual impression can be derived from Fig. 2  This line of reasoning is equally valid and indeed preferred by some authors.
Independence of successive eruptions (property 2 above) implies that the time series is memoriless, and is tested by calculating the correlation coefficient R of successive repose times from serial correlation scatter plots (Cox and Lewis, 1966;Mendoza-Rosas and De la Cruz-Reyna, 2008).The square of the correlation coefficient (the coefficient of variation R 2 ) represents the fraction of the variation explained by the model to the total variation.Stationarity of a data set (condition 3 above) is defined as independence from the time interval considered.This does not mean that future data of a time series will be the same as those of a certain period in the past, but that the statistical parameters of each randomly chosen period are identical.Therefore, different portions of the series display the same statistical properties, smoothed over the time intervals considered, although individual data values may be strikingly different.Non-stationarity may be detected by a moving average test (Klein, 1982;De la Cruz-Reyna, 1996;Mendoza-Rosas and De la Cruz-Reyna, 2008;Dzierma and Wehrmann, 2010), in which the average of, e.g., each five successive repose times is calculated and the results are plotted as a function of time.For volcanic eruptions, possible trends in the repose times can be detected in this way, provided that the eruption record is complete.In the opposite direction, if the moving average test manifests non-stationarity, this can also advise to question the completeness of the record.
For stationary time series, we can define the repose time distribution function F (t) as "the probability that the observed repose time T is smaller than or equal to t", which is applicable to the total eruption record.In the mathematically benevolent case, the repose time distribution is associated with the instantaneous failure probability given by as the corresponding density function.The cumulative distribution function F (t) is non-decreasing in t, and T is the random variable for the repose time between the eruptions.The associated survival function S(t) is then Values range between zero and one, so this function must be scaled by a factor A to give the expected number of repose times greater than time t.
Nat. Hazards Earth Syst.Sci., 10, 2093-2108, 2010 www.nat-hazards-earth-syst-sci.net/10/2093/2010/For the purpose of predicting the probability of future volcanic eruptions, the residual life distribution is defined.This is the survival function as a function of time t after a given waiting time (age) x, given survival up to that time.
The exponential function is characterised by a constant, memoriless statistical hazard rate expressed as implying that the probability that an eruption occurs in the next small increment of time is independent from the amount of time that has already elapsed since the last eruption event.
Therefore, the exponential distribution does not suffer from aging, giving the mathematical characteristic of S x,exp (t) = S exp (t), and the volcanologic interpretation that a single eruption is not strong enough to exploit the driving energy in the magmatic system to a degree that the following eruption would be attenuated or delayed.This last condition can be integrated to give the distribution function for a Poisson process, the exponential distribution

The Weibull distribution
Since many volcanic systems are observed to cycle through periods of increasing volcanic activity or waning/extinguishing phases, it is appropriate to apply a distribution function yielding a systematically increasing or decreasing hazard rate.The Weibull distribution, as a model of simple failure, has been found to represent such volcanic behaviour (Ho, 1991;Bebbington and Lai, 1996a, b;Watt et al., 2007;Dzierma and Wehrmann, 2010).It is expressed by: where d is a power parameter, usually referred to as the "shape parameter".For d > 1, the distribution describes an increasing hazard rate, d = 1 corresponds to the exponential distribution, while d < 1 gives a model of decreasing hazard rate.

The Log-logistic distribution
To accommodate volcanic processes that might counteract each other such that specific processes contribute to increasing the likelihood of an eruption, whereas others decrease it, the log-logistic distribution (Pareto III distribution) has been successfully applied to a number of volcanoes (e.g.Connor et al., 2003Connor et al., , 2006;;Dirksen et al., 2006;Dzierma and Wehrmann, 2010;Watt et al., 2007).The parameters can be phenomenologically involved in the numerical expression using, e.g., a shape parameter p besides the scale parameter x 0 : Log-logistic distributions have proved useful in particular for time series that include very long or short repose intervals (Connor et al., 2003).

Fitting the repose time distributions
The distribution functions can be fit to the eruption time series with the help of a standard software package (e.g., R, Matlab, or Origin).Here, OriginPro 8.0 and Origin 7.0 were used to fit all three distributions, applying the method of least-squares fitting (equivalent to maximum-likelihood if the errors are assumed to be normally distributed).
In the fitting process, it must be decided whether the scale factor A is treated as a free parameter to be estimated by fitting, or as a fixed value representing the total number of observed repose times.This issue has not been addressed in previous studies.Since both alternatives can be justified, we implement both of them to the same datasets for comparison.Thus, for each distribution function, two fits are made: one with the scale factor A fixed to the total number of repose times, and one where A is varied and optimised to achieve the best fit.
The reason behind performing this double fit for each distribution function is the following: It could be argued that the scale factor A is actually irrelevant in the prediction of future eruption probabilities, because it does not appear in the residual life distribution S x .Any parameter that does not appear in the final result might be left out of the analysis from the beginning, given that it is only the functional form (the shape) and not absolute scale of the fit distribution functions that enters the results.
However, if the fits are performed with fixed A, this actually amounts to overweighting the value at T = 0 with respect to all longer repose times, which may influence the shape of the fit curves.A fit with a slightly different scale may in fact approximate the total data set much better, while deviating at the axis intercept.Whether the improvement of the fit by inclusion of the scale A justifies including another free parameter in the fit -possibly overfitting -is evaluated using the AIC c .

Akaike Information Criterion
For the relative comparison of the fitted models, the corrected Akaike Information Criterion (AIC c ) is applied, which is given for least-squares fitting by where n is the number of data points fitted, k the number of free parameters, and y(x i ) the points of the model y(x) used to fit the data points (x i ,y i ),i = 1,...,n (Sigiura, 1978;Burnham and Anderson, 1998).The correction is applied because some of the data sets are small.Since the AIC c penalises both the misfit and a large number of parameters (Akaike, 1973;Bebbington, 2007;Turner et al., 2008), it is an expedient instrument here especially for the evaluation whether the fixed versus fit scale factor A is more adequate.It should be emphasised that the absolute value of the AIC c is meaningless.The model that achieves the best fit to the data will be the one producing the lowest AIC c value, permitting to select the relative "best model" for each volcano analysed.However, the AIC c cannot be used to assess whether the models describe the data sufficiently well that they can be interpreted as representative for the eruption time series in an absolute manner.We will therefore evaluate the fit quality with a standard hypothesis test (goodness-offit-test).

Goodness-of-fit-test
To assess the goodness of the fits, statistical test such as the Kolmogorov-Smirnov-test (KS-test) and the χ 2 -test are available.Since the χ 2 -test can only be applied if the number of data points is sufficiently large, we here apply the KStest.The KS-test investigates the maximum deviation of the model from the data and compares this to a tabulated threshold (Gibbons, 1976).However, the KS-test does not take into account the decreased number of degrees of freedom when parameters are estimated from the data and is therefore overly restrictive if applied to fit values.

Forecasting eruption probabilities in comparison of the methods
The probability that at least one VEI ≥ 2 eruption will occur within a given amount of time t is estimated by the formula (Marshall and Olkin, 2007) where s is the time that elapsed since the last eruption, reaching till the present.The fraction term of the formula is the probability of no eruption occurring within time s + t given our knowledge that none has occurred by time s -this is the residual life distribution defined in Eq. ( 5).Vice versa, this can be applied to estimate the probability that at least one eruption occurs in the given time.
5 Results: analysis of data sets In the following implementation of the statistical procedure, we show only selected examples to demonstrate each step of the analysis.The complete sequence of figures for all volcanoes is provided in the supplementary material.

Cumulative number of eruptions
As first approach to assessing Poissonian behaviour and stationarity of the data set, the cumulative number of eruptions is plotted versus time (Fig. 2).If a piecewise function was to be fit to the time series, such plots would also serve for a visual definition of eruptive regimes.The introduction of eruption regimes can be necessary for non-stationary eruption time series of volcanoes that alternate between states of higher or lower activity (Mendoza-Rosas and De la Cruz-Reyna, 2008, 2009, 2010) or -a similar interpretation with a somewhat different emphasis -which exhibit distinct eruptive episodes (Nathenson, 2007).While methods exists for a mathematically more stringent definition of regimes (e.g., Mulargia et al., 1987;Bebbington, 2007), already the visual choice of regimes as given here can provide a first impression on the characteristic volcano behaviour.Tupungatito, Planchón-Peteroa, Llaima, and Villarrica have an overall relatively linear increase of the cumulative number of eruptions with time, meaning that the rate of eruptivity -the number of eruptions in a given time interval -appears to be approximately constant.This is consistent with stationary behaviour.
For Quizapu-Cerro Azul, Maipo, Antuco, Osorno, and to some degree Copahue, high-activity phases with frequent eruptions in short time intervals of only a few years have alternated with low-activity episodes of infrequent eruptions over time intervals up to decades.In the cases of volcanoes with such irregular activity, the eruption forecast is not obvious.If it was estimated from simple browsing through the eruptive record and straightforward extrapolating to the future, this would yield the assumption that the most recent eruptive regime (dormancy, in some cases) would be solely representative for that volcano.To avoid erroneous predictions because of oversimplification, in particular for those volcanoes the statistical approach will be useful if stationarity is still sufficiently supported.
Nevados de Chillán, Puyehue-Cordón Caulle, and Calbuco constitute an intermediate category between the two patterns: They appear overall close to stationary with only minor variation in the cumulative frequency over time, but show one marked change in the temporal eruption record.In the cases where very variable activity regimes exists, and where the first regime is found to exhibit particularly low activity, this may indicate that the eruption record at these early times may be incomplete.An example is Nevados de Chillán.The early eruptions in 1650 and 1749 are both estimated to be VEI = 3 or larger, which may indicate that only eruptions from this = 0.132, with the intuitive interpretation that at most 13% of the variation observed in the repose times can be explained by the variation in the preceding repose time.
size upwards are documented in the record and smaller eruptions may have been missed.Although the present low activity of the volcano may lend some credibility to the assumption that it may also have displayed low-activity regimes in the past, we here prefer the more cautious restrictive approach and interpret the first eruption regime as indicative of record incompleteness.For the following analyses for Nevados de Chillán, only eruptions from 1860 on are taken into account.
The same argument may be made for the first regime of Quizapu-Cerro Azul and Puyehue-Cordón Caulle, although in their case the earlier eruptions are estimated of the same size (VEI = 2) as later ones, so at least the record does not seem to be systematically biased towards larger eruptions.

Test for independence and stationarity
The results of the tests for independence and stationarity are summarised in Table 1 for all volcanoes; as an example, the graphical representation of the data is shown for the restricted record of Nevados de Chillán (Figs. 3, 4).The validity of these tests for the volcanoes with comprehensive eruption records is evident, while for small datasets these two tests become increasingly uncertain.
For all except two of the volcanoes analysed, the calculated correlation coefficient from the scatter plots does not imply a significant correlation at the 95% level.For Copahue, both a significant correlation (with correlation coefficient R = 0.88) is found, and the first two moving averages are considerably higher than the later ones.This corresponds to the first two very long repose times at the beginning of the historical record and most likely reflects incompleteness of the record.If only eruptions from the 20th and 21st centuries are included, however, the eruption record becomes so short that a statistical analysis would be doubtful.We will therefore not consider Copahue further in this study.
In the case of Planchón-Peteroa, although the test gives an anticorrelation of successive repose times at 95% significance, the cumulative number of eruptions appears to increase linearly with time in a good approximation of an exponential distribution behaviour.Besides, the stationarity test is satisfied.However, two of the first three eruptions on the record are larger (>VEI = 3 in 1660 and VEI = 4 in 1761), whereas all more recent eruptions are VEI = 1-2.Therefore, the record may be biased and we repeat the analysis for eruptions only from 1835 on.However, the data still show a significant correlation.Since the record cannot be shortened further, we also exclude Planchón-Peteroa from the analyses below.
Puyehue-Cordón Caulle also has a very long repose time right at the beginning of the historic record, which is inconsistent with stationarity and had already been indicated in the plot of cumulative number of eruptions versus time.In addition, the record suggests that the first eruptions may be of uncertain VEI.Taken together, these two observations suggest that only eruptions from 1893 on should be considered.In this case, a correlation coefficient of −0.176 is found, consistent with no correlation, and the data can be considered stationary.
The fact that stationarity for Nevados de Chillán is only consistent with the shortened eruption record after 1860 (Figs. 3, 4) confirms our previous choice of restricting the historic record for the analysis to that time interval.
While the independence and stationarity tests are satisfied for Llaima, a slight questionability arises on theoretical grounds, since at least two repose times less than one year are observed in the years 1932 and 1998.These are treated as zero repose time in the analysis, which in principle contradicts the requirement for a Poisson process that two or more events should not occur contemporaneously.
In the case of Antuco and Calbuco, the statistical tests confirm stationarity.However, a future eruption in or after the year 2010 would lead to an inconsistency in the movingaverage test such that the latest data point would fall outside the defined range.In such a hypothetical case, the prerequisites for the analysis would no longer be met.
Taken together, we are left with ten volcanoes for which the eruption history -restricted where necessary -is reasonably consistent with stationary Poissonian behaviour.Among these, some of the volcanoes that were found to show different visually-defined eruption regimes (vic., Quizapu-Cerro Azul, Maipo, Antuco, and Osorno) are not statistically proven to be non-stationary, meaning that the observed variation in eruptive activity ranges within the acceptable limits of statistical fluctuations.Therefore, in the following we will treat these time series as stationary and fit them with the standard models for the distribution functions.The fitting of the whole eruptive record should smooth over shorterterm fluctuations and represent the average volcano eruptive behaviour.However, the visual inspection of the cumulative number of eruptions indicates that some degree of nonstationarity may be present in these data sets.Therefore, the prediction based on the distribution function fits should be taken as an approximation to the expected volcano behaviour, which for these volcanoes may be more uncertain than for the volcanoes having more uniform eruption rate.Although the activity may be well approximated by the distribution function fit, it cannot be taken for granted that we are presently witnessing a regime corresponding to the average eruption rate, but may be inside a low-or high-activity eruption phase.From the opposite perspective, neither is it convincing that the last eruption regime or the longest-duration regime should be more significant.We will therefore use the average eruption behaviour for the calculation of future eruption probabilities, although we are aware that some volcanoes may temporarily be experiencing regimes of different activity levels.Convincingly, for several of these volcanoes, i.e., for Maipo, Quizapu-Cerro Azul, Nevados de Chillán, Antuco, Puyehue-Cordón Caulle, and Osorno, the recent activity appears to have been lower than the historic average.If no systematic shift in the definition of VEI has taken place, consequently, this implies a real change in the eruptive regime -and cannot be explained by documentary artefacts in the form of record incompleteness.

Repose time fits
The repose time distributions with the fit functions are shown for all volcanoes in Fig. 5. Tables with the fit parameters, mean squared deviation divided by degrees of freedom (χ 2 /DoF), AIC c , and KS-value are given in the supplementary material.

Goodness of fits
The goodness of the fits was checked using the KS-test.For the majority of cases, the fit functions pass the hypothesis test at the 5% level.The only exceptions to the general confirmation of the goodness of the fits are Llaima, where the exponential function fails the test for both variable and fixed A, and the Weibull function with variable A, as well as Antuco, where the Weibull function with variable A fails to achieve a good fit to the data.More extremely than failing the KStest, in the case of Maipo the Weibull function fails to give a stable fit to the data at all in case A is varied.
There are several reasons explaining the failure of the KStest in the few cases where they are observed.The poor fits to the Llaima dataset may be surprising at a first glance, since this is the largest and most uniform of the eruption records considered here.But rather than raising doubts about the quality of this eruption record, its comprehensiveness facilitates to reveal the differences between the dataset and the models in better detail.Evidently, even very simple models are adequate for fitting datasets of only seven to ten eruptions, while larger datasets enable the statistical hypothesis tests to assess the appropriateness of different models on a much firmer ground.The exponential model appears to be oversimplified for Llaima.The KS-differences of the Weibull distribution for Llaima most likely originate from the two years that comprise two eruptions, since the largest offset is mathematically produced by values at repose time zero.In this special case, it may be argued that the χ 2 -test gives a more meaningful estimate of the goodness of fit, which is also justified since the dataset is sufficiently large.
The failures of the Weibull distribution at Llaima, Antuco, and Maipo highlight a general problem: While the Weibull function with variable A usually gives rather good fits, sometimes it suffers from instability in the fitting process.This is particularly evident when small datasets are used, as in the case of Antuco and Maipo.The underlying reason is that the Weibull function can have an either increasing or decreasing hazard rate, depending on the choice of the shape parameter larger or smaller than one.For small datasets, the Weibull function will generally have more large repose-time values to adjust to and will therefore have decreasing hazard rates.In this case, the form of the dataset is fit rather well by the Weibull function, but the scale will be much larger than the total number of eruptions, resulting in large deviations for very small repose times (0-1 year) and a failure of the KStest (for further discussion of the fitting of the Weibull func-tion for constant vs. variable A in the case of scarce data, see Wehrmann and Dzierma, 2010).
A secondary problem occurs for the exponential function in case of fixed scale A, which generally struggles to give a good fit to the data.Since only one free parameter can be fit, the data set cannot be described satisfactorily.The possibly occurring large deviations may result in a failure in the quality-of-fit test, even where the eruption behaviour is close to Poissonian.

AIC c for fit vs. fixed scale A
The difference in fit quality is assessed by the AIC c to compare the fits for fixed vs. variable A for the same distribution function, and for each individual volcano.The exponential function is generally markedly improved by the inclusion of A as a free parameter, observed at Maipo, Quizapu-Cerro Azul, Llaima, Villarrica, Puyehue-Cordón Caulle, and Osorno.For some other cases (Tupungatito, Nevados de Chillán, Antuco, and Calbuco), fixing vs. varying A does not produce a significant difference.Regarding the Weibull distribution, for the majority of volcanoes (Tupungatito, Quizapu-Cerro Azul, Llaima, Villarrica, Puyehue-Cordón Caulle, Osorno, Calbuco) there is a slight decrease in fit quality for free A, meaning that the additional free parameter is penalized while not significantly enhancing the agreement with the data.The same applies for the log-logistic distribution for the volcanoes Tupungatito, Quizapu-Cerro Azul, Antuco, Llaima, Puyehue-Cordón Caulle, Osorno, and Calbuco.However, the difference in AIC c for most of these cases is small.
The seeming contradiction for Antuco, which has a significantly lower AIC c when A is varied, has to be evaluated in the context that this fit fails the KS-test, due to the large misfit which occurs at repose time zero.This indicates the aforementioned instability in the fitting process for the Weibull distribution for some data sets when A is variable, also observed for Maipo (see above) and Villarrica (see Dzierma and Wehrmann, 2010).
In a few cases, such as Nevados de Chillán and particularly Villarrica, the fits for the log-logistic distribution are markedly improved by including A as a free parameter.Taken together with the drastic improvement generally achieved for the exponential distribution, it appears that leaving A variable has the potential to greatly improve the fit quality, without jeopardising a marked decrease in quality (in these simple scenarios).This is particularly plausible for the exponential distribution, because if A is held fixed, there is only one parameter available to obtain a fit of the data, which seems to be generally insufficient.Where the other distribution functions are used, two or three free parameters are fit (for A fixed or varied, respectively), which makes it easier to obtain good agreement with the observations.In this case, the improvement in the fit from the inclusion of another free parameter (A) is counterbalanced in the AIC c by Nat.Hazards Earth Syst.Sci., 10, 2093Sci., 10, -2108Sci., 10, , 2010 www.nat-hazards-earth-syst-sci.net/10/2093/2010/ the number of free parameters, giving comparable AIC c values.Most eruption records considered here can obviously be well approximated using two fit parameters.
To be consistent, we believe either A should be fit or held fixed for all distribution functions alike.Therefore, we propose to treat A as a variable in all the fits.In some special cases of scarce datasets where no stable fit can be achieved, or the fit quality is insufficient, it may become necessary to hold A fixed.
6 From the models to the eruption forecast

Interpreting the three distribution functions
The statistically expected probability for at least one VEI ≥ 2 eruption to occur in the future was calculated for all volcanoes, for both the fits with fixed and variable A. While the model with the lowest AIC c value should be expected to give the best approximation to the data and might therefore be chosen for the prediction, we prefer to show the predictions based on all three distribution functions for several reasons: The different distribution functions applied here have been attributed to differing geologic processes, which govern the eruptive behaviour of a volcano.A good fit of the Weibull distribution, a model of simple failure, is understood to best describe systems in which this failure occurs after a certain time as a result of one dominant process.For a volcano this translates into repose times of magma chamber maturation between the eruptions in a continuous or temporally regular fashion, and eruption onset when gas pressure through magma differentiation exceeds a limit of supersaturation.The log-logistic distribution, in contrast, has been associated with competing processes such that certain parameters contribute to an increase in eruption probability, whereas other parameters that simultaneously antagonise these processes may decrease the probability of an eruption (e.g.Connor et al., 2003).For example, crystallisation-induced degassing from a shallow chamber can take place in an open system, e.g. at a lava lake or through large-scale open fumarole paths, and the continuous quiescent pressure release will contribute to relaxation of the system and lower the likelihood of an eruption, although the magma reservoir itself is ongoingly replenished and evolves.
The scope of a statistical analyses such as the ones implemented in this study is limited to numerically propose which type of parameter interaction most likely exerts the dominant control on the eruptive behaviour.Subjecting any of the volcanoes considered here to a detailed volcanologic investigation may correct or even disprove the ranking we suggest based on statistical means.Furthermore, beyond the ranking inferred from the AIC c values, if all fits are confirmed to be consistent with the observations by the KS-test, there is no reason to discard any of them and no process interaction (gradually ongoing, amplifying or competing) can be defi-nitely ruled out on statistical grounds.We therefore offer a forecast obtained from all three distribution functions, representing the range of possible eruption probabilities for each volcano.All calculated eruption probabilities are shown in Fig. 6, the corresponding tables are given in the supplementary material.

Forecasting eruption probabilities -the different models
In most cases, the exponential distribution gives the highest eruption probability, followed by the Weibull distribution and the log-logistic distribution.The Weibull distribution predicts higher probabilities than the exponential one only in the cases where it has an increasing hazard rate, i.e. a shape parameter d > 1 (which applies to Nevados de Chillán, Llaima, Villarrica, and Puyehue-Cordón Caulle (A fixed)).For most volcanoes considered in this study, the shape parameter is rather found to be d < 1, corresponding to a decreasing hazard rate.This may appear counter-intuitive in that it might be thought that a dominant process leading up to an eruption would in any case be more likely to produce an increasing hazard rate distribution, where the probability of an eruption increases with time since the last eruption.However, for most of the analysed volcanoes, we appear to be witnessing a period of activity below the average (Fig. 2), which would be reflected in a decreased shape parameter of the Weibull function.It should be emphasised here that a statistical decreasing hazard rate does not include the option that the volcano's activity may wane towards extinction.This would be in contradiction to the presumed stationarity, which implies that the volcano will eventually reach the 100% probability of erupting again.The log-logistic distribution consistently predicts the lowest eruption probabilities.From the volcanological point of view, this may be expected, since in most volcanoes competing processes will be involved of which some may counteract the build-up of eruptions.In the scenario where one of these processes would temporarily become dominant, the volcano might switch between eruption sequences best described by log-logistic to Weibull behaviour (or vice versa).However, for fitting the data, a decision has to be made in favour of one of the models.In this context we reiterate that the statistical description can only offer an approach to classifying the eruption behaviour, while by no means imposing any desired behaviour on the natural system.
It is difficult to decide which model prediction should be considered the most realistic for any given volcano.For instance, the exponential model for Antuco predicts highest eruption probabilities, reaching near 100% within the next 50 years.Given the last repose time of over 140 years at this volcano, the validity of the exponential model may be questioned here based on its potential oversimplification.It may also be discussed to what extent residual life calculations are a suitable tool for volcanoes that may have entered a period  of longer repose times (this would conflict with the stationarity presumption).On the other hand, the average of the last 260 years of eruption behaviour may be more representative of the current situation.In this case, the predictions may hint at the volcano being overdue to erupt.However, it cannot a-priori be decided whether the present-day behaviour is representative or merely a passing exceptional phase.

Forecasting eruption probabilities -the different volcanoes
As an indication of the dangerousness of all SVZ volcanoes considered here in comparison with each other, the eruption hazard is presented as the probability of at least one VEI ≥ 2 eruption occurring within the next ten years (Fig. 7).In this definition, the volcano with the highest eruption hazard is Llaima, for which all distribution functions predict a near-100% chance of erupting at least once in the next ten years.This comes as no surprise, since indeed Llaima has been erupting at time intervals of mostly less than a year in the recent past.Villarrica and Nevados de Chillán share a relatively high eruption probability ranging between 40 and 100% over the next ten years, which also fits well into the common-sense expectations based on the recent past for Villarrica.For Nevados de Chillán, in contrast, the last VEI = 2 eruption occurred in 1973, and the volcano has been in a comparatively quiet state for about 75 years.Such observation brings up the question of representativeness of the current eruptive regime for the volcano's behaviour on a longer term, and has been a matter of debate in statistical volcanology (e.g.Mulargia et al., 1987;Bebbington, 2001;Mendoza-Rosas andDe la Cruz-Reyna, 2009, 2010).As a general theoretical complication, the possibility that a new regime is just commencing can never be firmly ruled out.However, since this remains hypothetical, probabilistic eruption forecasts are based on either the most recent eruptive regime, considered to be still ongoing, or on a weighted mean smoothing the fluctuations between high and low activity regimes over the entire past eruption record considered (Dzierma and Wehrmann, 2010).The high probability of an upcoming eruption at Nevados de Chillán as shown in Fig. 7 is derived from the complete eruption series after 1860.For the case that the recent past turns out to be more representative and the volcano is indeed in a longer-lasting low-activity regime, the method we applied here overestimates the likelihood of a future eruption.
Puyehue-Cordón Caulle, Tupungatito and Calbuco have medium hazard on the order of 50%, with large variation between the different model predictions.Quizapu-Cerro Azul showed the most extreme alternation between highand low activity episodes.From the entire record, it is assigned an only moderate to low probability around 15-50% of a VEI ≥ 2 eruption.Nevertheless, this volcano is the site within the SVZ where the most voluminous and devastating eruptions occurred in historical times, which should be kept in mind if the volcano is to be categorised in the course of an integrative hazard assessment.For Antuco and Maipo, the probabilities diverge strongly depending on the model applied (as indicated above).From the exponential distribution, a 60% eruption likelihood is estimated, while the log-logistic distributions lead to an eruption prediction of below 10% in the next ten years.
The lowest eruption probabilities are found for Osorno (below 20% over the next ten years).Osorno exemplifies another impressive case of discrepancy between superficial and scientific threat classification.It is categorised as one of "the most active volcanoes" in the Southern Chilean Andes by e.g.local touristic agencies and in the overview description of the Global Volcanism Program.Intensive recreational use such as the touristic development of hiking paths and ski resorts may lead to such human perception and will also increase the number of people exposed to the risk of an eruption.By means of statistical time series analysis, however, Osorno was found to constitute only a very low eruption hazard.Moreover, most eruptions from Osorno in the historical past were of small volume and low explosivity.A similar situation with regard to risk perception occurs for Copahue.
Copahue is understood by the local population as "the most active volcano of Argentina".Here it had to be excluded from the analysis because of too few eruptions in the record.
With regard to evaluating whether future eruptions are related to the strong seismic activity, this study can only provide a basis for comparison against future eruption frequencies.Enhanced volcanic activity in response to the 2010 earthquake could be manifested in more frequent eruptions, by stronger eruptions or by several centres erupting simultaneously.For those volcanoes with very short repose Y. Dzierma and H. Wehrmann: Statistical eruption forecast for Chilean SVZ volcanoes times or a very high eruption probability in the near future, it will be difficult to discern if any future eruption is a direct result of the earthquake, or was due to occur and merely coincides.Nevertheless, as most arc eruptions are considered to be invoked multifactorially, only volcanoes that have built up towards an eruption when the earthquake takes place will be susceptible for earthquake-triggering; and would thus become candidates for producing seismogenic eruptions (Watt et al., 2009;Lemarchand and Grasso, 2007).For the opposite scenario of volcanoes with lower eruption frequency, the analysis of seismically elicited eruption activity may not be viable since the long-term inter-eruption interval is not substantially shorter than the interval between very large earthquakes.To calculate the level of enhanced eruption frequency sufficient to be statistically significant is beyond the scope of this paper and will become feasible if indications for enhanced activity are detected in the near future.

Conclusions
The statistical probability of future eruptions is determined for large volcanic centres in the SVZ, giving highest hazards for Llaima, Villarrica, and Nevados de Chillán.A medium eruption likelihood is calculated for Puyehue-Cordón-Caulle, Tupungatito, Calbuco, Quizapu-Cerro Azul, Antuco, and Maipo, while Osorno displays only low eruption probabilities.Our investigation reveals that there might be a significant difference between the common sense perception of potentially upcoming eruptions and the numerical prospect, here particularly salient for Nevados de Chillán and Osorno.
With the perspective to a comprehensive volcanic hazard assessment, consisting of detailed investigation of the geophysical, geochemical, geodetic, tectonic processes and monitoring, this analysis contributes pivotal information in providing the probability of future eruptions within a given time period.Although this is indubitably essential, we would like to emphasise that these forecasts are only of probabilistic nature; and it needs to be kept in mind that that any volcano yields a potential to reawake with only little precursors or even entirely without warning after long periods of dormancy.Watt et al. (2009) suggested an arc volcanic response to large earthquakes to occur by both dynamic stress due to the passage of seismic waves and permanent static stress changes induced by large-scale deformation.Walter and Amelung (2007) model volumetric deformation to distances of about 2000 km from the epicenter of large (mag >8-9) earthquakes.Seismic eruption triggering may hence affect the entire arc, provided the volcano is in a pre-eruptive state at the time the earthquake happens.Like Watt et al. (2009), we acknowledge, however, that a random coincidence in individual cases remains possible.The coming months to years will show whether any elevation of volcanic activity above the background levels presented here can be registered.Given the intrinsically high eruption probabilities at several of the volcanoes analysed, it remains to be evaluated if an increased activity level can indeed be verified as an earthquake response.If any eruptions are elicited by the earthquake as expected, their geographical and temporal distribution may provide insight into the importance and decay of volumetric deformation and its associated static stress.
Several other volcanoes erupted during historical times in the SVZ.The eruptions of Planchón-Peteroa and Copahue should not be neglected in an encompassing SVZ consideration.San Jose showed several VEI = 2 eruptions since 1822; Decabezado Grande erupted in 1933, probably in response to the large 1932 Quizapu-Cerro Azul eruption in its immediate vicinity.At Lonquimay, three VEI = 3 and one VEI = 2 eruptions are documented since 1850; of which the last one in 1988 lasted for more than a year.A few (partly uncertain) small, but explosive eruptions are reported from Callaqui, Mocho-Choshuenco, and Carrán in the past centuries; while recent activity distressed the regions in the South of Chile around Huequi, Chaitén, Minchinmávida and Hudson.The eruption records for these volcanoes are too scarce for individual statistical analyses.However, with regard to the volcanic dynamics of the entire subduction system on a larger scale, it is likely that they also contributed to stress changes by volumetric evacuation.Including these eruptions in an arc-scale consideration and weighting of all data by eruption magnitude, i.e., volume of ejected material, rather than defining the statistical "event" as a simple eruption occurrence from a chosen indexed explosivity upwards, may help to resolve possible transfer of eruption triggering/prevention forces over time and distance.

Fig. 2 .
Fig. 2. Cumulative number of eruptions (VEI ≥ 2) as a function of time.Blue lines: piecewise-linear approximation, showing individual eruption regimes/episodes.Green lines: approximation for constant eruption rate.These lines are meant exclusively as an aid to guide the eye and not as a basis for the statistical calculations.

Fig. 3 .
Fig.3.Independence check for Nevados de Chillán, restricted eruption record from 1860 onwards.The scatter plot shows each repose time plotted against the preceding repose time.The points are relatively evenly distributed, and no clear correlation is evident.This is in agreement with the correlation coefficient R = 0.364, consistent with uncorrelated data.The square of the correlation coefficient (the coefficient of variation) is R 2 = 0.132, with the intuitive interpretation that at most 13% of the variation observed in the repose times can be explained by the variation in the preceding repose time.

Fig. 4 .
Fig. 4. Stationarity check for Nevados de Chillán (eruption record after 1860).The 5-point moving averages of the repose times are distributed around their mean value (dashed line) and with variation that can be explained by stochastic effects (below the dotted line corresponding to two standard deviations distance from the mean).

Fig. 5 .
Fig. 5. Repose time distribution for the SVZ volcanoes (restricted eruption records where appropriate), with fits for the exponential (green), Weibull (blue) and log-logistic (orange) distribution functions.Fits for fixed scale A are displayed as solid lines, for variable A in dashed lines.

Fig. 6 .
Fig. 6.Statistically predicted probability for at least one eruption of strength VEI = 2 or larger occurring in the given time, for each volcano.Colours and line types are as in Fig. 5.

Fig. 7 .
Fig. 7. Baseline of typical volcanic activity of the historically frequently active volcanoes of the SVZ, expressed as probability (in %) of at least one VEI ≥ 2 eruption in the next ten years.Volcanoes are ordered from North to South.The shaded area is approximated between the volcanoes for better visibility.
and from the plots in the supplementary material.