Introduction

Statistical analysis of extremes in the environmental sciences is important for several reasons, for instance regarding engineering design and regulations, as well as related issues in insurance and finance. As conventional risk measures, usually upper quantiles of quantities of interest are estimated, based on observed data (see e.g. Rychlik and Rydén 2006; Naghettini 2017). These are called return levels and are associated with a certain return period. For instance, consider temperature at a location and a return period of 100 years. The related return level is the temperature exceeded on average once in 100 years. The classical assumptions involve two key conditions: extreme events arising from a stationary distribution; and extreme events being independent. Traditionally, the notions of return periods and return levels have been developed and used extensively for, e.g. hydrology with roots to the influential book by Gumbel (1958); for a recent review, see Naghettini (2017). Applications can also be found in agricultural sciences, e.g. in crop science where variability of precipitation and temperature affects the yield (Barlow et al. 2015).

Of much interest is to introduce non-stationary statistical models, due to changing environmental conditions. Discussion is present not the least in the hydrology research community, cf., e.g. the articles with provocative titles by Milly et al. (2008), Serinaldi and Kilsby (2015), see also Koutsoyiannis and Montanari (2015). Non-stationarity can be introduced by extending the standard limiting statistical distributions. Coles (2001) presented a model with time-dependent parameters of the generalised extreme value (GEV) distribution. More recently, the flexible class of distributions GAMLSS (Rigby and Stasinopoulos 2005) has been considered an option for modelling of meteorological data, with an early application by Villarini et al. (2010). With several options to choose in that framework, model selection is crucial. This has been stressed by Caroni and Panagoulia (2016) and Panagoulia et al. (2014) where the application of various information criteria was investigated in simulation studies.

The use of the GAMLSS framework deserves to be examined closer, as suggested by Debele et al. (2017), where the authors propose to evaluate the reliability of the various models by simulation experiments and discuss the lack of such analysis in many applied papers. Moreover, the difficulties in using GAMLSS are pointed out: distributions of response variables, choices of explanatory variables and link functions. In addition, the numerical estimation algorithms can fail.

This paper is a case study; as part of an intended larger study on trends of climate indicators for locations in Sweden, some results worth reflecting upon were found when fitting various alternatives for extreme minimum temperatures. The particular results from the numerical procedures are presented and discussed and may serve as a caution for the applied researcher when working with the flexible class of distributions and the potential danger of using automatised procedures for the model selection—a human ought to judge all results. The paper is organised as follows. In “Climate indices for extremes” section, a description of the data is given. A review of the statistical modelling and framework is given in “Statistical modelling and methodology” section. The main findings are provided in “Results” section, where results from estimation procedures are presented along with interpretation and discussion, including a simulation study. Finally, in “Summary” section, a summary and discussion are provided.

Climate indices for extremes

Modelling of extreme quantities in the earth sciences, for instance, meteorology, is not straightforward, as pointed out by Stephenson (2008): “There is no universal unique definition of what is an extreme event”. Several indicators have been suggested for monitoring change in climatic extremes (see e.g. Frich et al. 2002; Zhang and Zwiers 2013). Climate indices may relate to temperatures as well as precipitation, and they may be based on absolute thresholds or percentile based. As examples of the former are Daily maximum temperature (TXx) or Maximum one-day precipitation amount (RX1day), of the latter: Warm days (TX90p)—percentage of time when daily maximum temperature > 90th percentile or Very wet days (R95p)—annual total precipitation from days > 95th percentile. Percentiles refer to long-term percentiles from the reference interval 1961–1990 (Zhang and Zwiers 2013).

Data on extreme temperatures

In this paper, we study a time series of the annual minimum of daily minimum temperature (as an indicator, abbreviated TNn). Hence, in a mathematical sense, these are true minima when considering time series of annual observations. The observations are made at the Swedish town Östersund (latitude: 63.1\(^\circ\), longitude: 14.3\(^\circ\), elevation: 376 m). The time series covers the period 1918–2017, with year 1943 missing. Data are available at http://www.ecad.eu, see Klein Tank et al. (2002).

Statistical modelling and methodology

Generalised extreme value distribution (GEV)

From classical extreme value theory, it is well known that a limiting distribution in the case of independent, identically distributed observations is the GEV distribution, see for instance Coles (2001), Gomes and Guillou (2015) or Dey et al. (2016). The GEV distribution has three parameters (location \(\mu\), scale \(\sigma\) and shape \(\xi\)):

$$\begin{aligned} F(y; \mu ,\sigma ,\xi ) = \left\{ \begin{array}{ll} \exp \left\{ -\left[ 1+\xi \frac{y-\mu }{\sigma }\right] ^{-1/\xi }\right\} , &{}\quad \xi \ne 0, \\ \exp \left\{ -\exp \left[ -\frac{y-\mu }{\sigma }\right] \right\} , &{}\quad \xi = 0. \end{array}\right. \end{aligned}$$

where \(\mu \in {\mathbb {R}}\), \(\sigma >0\) and \(\xi \in {\mathbb {R}}\). The shape parameter \(\xi\) affects the support of the distribution: when \(\xi =0\), the GEV distribution is the Gumbel distribution (with support \({\mathbb {R}}\)). When \(\xi >0\), the distribution corresponds to the Fréchet distribution with support \(y\ge \mu + \sigma /\xi\), and when \(\xi <0,\) it corresponds to the reversed Weibull distribution with support \(y\le \mu - \sigma /\xi\).

The shape parameter \(\xi\) (occasionally called extreme value index, EVI) determines the tail of the GEV distribution. In the Fréchet case (\(\xi >0\)), a heavy tail occurs, while for the reversed Weibull (\(\xi <0\)), there is a finite right end point, i.e. an upper bound.

Turning to the situation with time-dependent parameters, Coles (2001) presented a form for a non-stationary GEV with the distribution function:

$$\begin{aligned} F(y;\mu (t),\sigma (t),\xi ) = \exp \left\{ -\left[ 1+ \xi \frac{y-\mu (t)}{\sigma (t)} \right] ^{-1/\xi }\right\}. \end{aligned}$$

Usually, the shape parameter \(\xi\) is not considered time dependent. As pointed out by Debele et al. (2017), it is assumed to be constant in time due to the high uncertainty of its estimation, even in the stationary case.

Estimation is commonly performed by the maximum-likelihood (ML) method. An implementation in R is found in the package extRemes (Gilleland and Katz 2016); for a review of further implementations, see Gilleland et al. (2013). In the framework of time-varying parameters, typically polynomials in time are considered and a log-linear expression for the scale parameter, e.g.

$$\begin{aligned} \mu (t)=\mu _0+\mu _1t, \quad \sigma (t)=\exp (\sigma _0 + \sigma _1 t). \end{aligned}$$

It could be remarked that the ML estimation is somewhat intricate. As noted above, the support of the GEV distribution depends on the parameter values. Smith (1985) showed for instance that when \(\xi \le -\,1\), the ML estimate does not exist. For further discussion, see Dey et al. (2016) and Bücher and Segers (2017).

GAMLSS

A general class of statistical models for a univariate response variable, called the generalised additive model for location, scale and shape (GAMLSS), was presented by Rigby and Stasinopoulos (2005). In generalised linear models, introduced by Nelder and Wedderburn (1972), the mean of the response variable is modelled through a link function, assuming an exponential family distribution for the response. Within the GAMLSS framework, it is possible to model population parameters of the distribution. In addition to the systematic part of the regression models, it is possible to include additive nonparametric functions of explanatory variables as well as random-effects terms.

In the application given in this paper, there are no random effects and only one covariate (time), so the general GAMLSS model is simplified. Assume that we have p parameters, \({\varvec{{\theta }}}=(\theta _1,\ldots ,\theta _p)\) of a population probability-density function \(f(y|{\varvec{{\theta }}})\) and n observations \(y_1,\ldots ,y_n\). Further, let \({\varvec{{\theta }}}_k\) and \({\varvec{{\eta }}}_k\) be vectors of length n. The model can then, for \(k=1,2,\ldots ,p,\) be formulated:

$$\begin{aligned} g_k({\varvec{{\theta }}}_k)= {\varvec{{\eta }}}_k = {\mathbf {X}}_k{\varvec{{\beta }}}_k. \end{aligned}$$

In our application, \({\varvec{{\beta }}}_k\) usually has two components (intercept and slope), and \({\mathbf {X}}_k\) is the related design matrix. Moreover, the first two population parameters \(\theta _1\) and \(\theta _2\) are usually characterised as location and scale parameters.

For practical purposes, an implementation in R is found in the package gamlss (Stasinopoulos and Rigby 2007). Fitting of the model is performed by maximisation of a penalised likelihood function, and two algorithms are provided. Default links are provided for the various distribution families, but it is also possible for the user to employ alternative links.

Model selection

An important part of modern statistical methodology is model selection. Widely used statistical information criteria are tools for this, e.g. the Akaike information criterion (AIC, Akaike 1974) or the Bayesian information criterion (BIC, Schwarz 1978). In general, BIC penalises more complex models more strongly than AIC (Agresti 2015).

An issue in the context of non-stationary models is the ability to detect non-stationarity. In extensive simulation studies, Panagoulia et al. (2014) investigated simulated samples from GEV distributions with different complexities in the location and scale parts. As a conclusion, use of BIC was suggested: non-stationarity was detected and the correct model selected more often compared to AIC (except in very small samples). Hence, in this study, we utilise BIC as the information criterion when comparing models.

Return periods and return levels

A commonly used risk measure, not the least for design criteria in reliability engineering, is return period and the related return level. Interest is then in estimation of the T-year return level, where typically T is a large number, e.g. 100, 1000, 10,000. This notion will be used extensively in the sequel, and hence a short review will now be given, see also Fernandez and Salas (1999) or Rychlik and Rydén (2006).

Actually, for the classical definition of return level, independence and stationarity are key assumptions, and since waiting time for an extreme event is involved, the geometric distribution plays a key role. It can be suggested that a suitable definition of the T-year return level \(y_{T}\) as being the high quantile for which the probability that the annual maximum exceeds this quantile is 1 / T, and hence

$$\begin{aligned} F(y_{T})=1-1/T \end{aligned}$$
(1)

where F is the cumulative distribution function, e.g. for a fitted GEV distribution.

Extensions of the concepts of return period under non-stationary conditions have been discussed and suggestions been proposed (Salas and Obeysekera 2014; Rootzén and Katz 2013). A review is given by Cooley (2013). Under the assumption of stationarity, there is a one-to-one relationship between a return level (the quantile) and a return period (the associated time interval). However, when the assumption of stationarity is removed, different interpretations are possible (Cooley 2013). One is based on the expected waiting time until an exceedance occurs, another is that for a T-year return period, the expected number of events in T is equal to one.

However, in this paper, we use the conventional approach as the main aim is to study the quality of the fitting of various distribution families.

Uncertainties of the estimated return levels are of interest to assess; for the non-stationary case, see Obeysekera and Salas (2014). In the sequel, we report confidence intervals as reported by the package extRemes, computed by the so-called delta method. For details in the context of return levels, see appendix A.2 in Cooley (2013).

Results

Initial estimation of return levels

At present, the R package gamlss.dist contains more than 100 distributions (Stasinopoulos and Rigby 2018). It is a comprehensive task to judge all these alternatives, and possibly many of them would not make a reasonable fit to the extremes under study, in form of minimum temperatures. As our aim is to study the behaviour of extremes over time, the models in this study have one covariate, time.

Among the GAMLSS alternatives, we decided after initial investigations to fit inverse Gaussian, gamma, lognormal and reverse Gumbel. In addition, using routines from the R package extRemes, the conventional Gumbel distribution was fitted as well as GEV. For the latter distribution, the ML estimate of the shape parameter was found as \({\hat{\xi }}=-\,0.22\). Moreover, for the GEV, permutations of time-varying location and scale parameters were considered; in the case with location parameter, also square dependence on time was an option.

For the location of Östersund, all options considered, and the alternatives presented in the following table were considered the top-four best fits with respect to the BIC values (also listed). For the fitting of the GAMLSS options, the default links were chosen. The GEV model had a location parameter \(\mu (t)=\beta _0+\beta _1 t+\beta _2 t^2\) (although the difference in BIC compared to a regular GEV is slight).

Table 1 BIC values for various fitted models

It could be emphasised that the differences in BIC between the alternatives are quite small. Hilbe (2011) gives a guideline for model comparisons with regard to sample size, and in this case, the models are not significantly different.

We now consider application of the fitted distributions GEV, lognormal and gamma to estimation of 100-year return temperatures (in \(^\circ \hbox {C}\)). Starting with the GEV distribution, a point estimate is \({-}\,38.2\). To get an idea of the uncertainty of this estimate, the delta method gives a 95% confidence interval as \([-\,40.7,\, -\,35.9]\).

Turning to the results from GAMLSS, for both considered families the algorithms returned negative scale parameters for the default links chosen (the scale parameter is positive by definition in the model), and moreover the quantiles could not be computed. Refitting was performed, considering other links and numerical algorithms. We here discuss lognormal and gamma in some detail:

  1. Lognormal distribution.

    Refitting with identity link and inverse link, respectively, resulted in positive scale parameters and the estimates of 100-year temperatures \(-\,37.2\) and \(-\,37.3\), respectively. These are reasonable in light of the previously presented GEV estimate.

  2. Gamma distribution.

    This in fact had the best fit (cf. Table 1). However, trying other link functions did result in estimated 100-year temperatures \({-}\,4.5\) and \({-}\,90\), with identity and inverse link, respectively. These values seem out of question.

Computations using the different options for optimisation algorithms provided in the package did not affect the outcomes for the gamma distribution.

Simulation study

We consider the two distributions discussed above and perform a simple simulation study as follows: (i) generate 5000 samples, each of 100 observations, from the distribution with the fitted parameters; (ii) compute for each sample the 0.99 quantile (that is, the 100-year return level). The so obtained samples of return levels are illustrated in histograms in Fig. 1. In the lognormal case (left panel), we notice that the bulk of data is in the interval \(({-}\,40,{-}\,30)\), in agreement with our estimate from data, \({-}\,37\). Turning to the gamma case (right panel), the distribution is quite skewed. The median from the simulation is \({-}\,55\), but recall that with the links described above, we obtained \({-}\,4.5\) and \({-}\,90\) for the Östersund data. Possibly even more extreme (in the sense of unrealistic) results could be obtained from this skewed distribution.

Fig. 1
figure 1

Histogram of 100-year return value from simulations, based on fitted distributions from GAMLSS. Left: lognormal distribution. Right: gamma distribution

Illustration of fitted models

We now turn to the aspects of time-varying coefficients. In Fig. 2, the dots represent original observations. The smooth, solid curve is the result of fitting a loess model (locally weighted scatter plot smoothing, Cleveland and Devlin 1988). The curve indicates a non-monotone behaviour over time. The dashed line shows the 100-year return value based on the lognormal distribution with a time dependence in the location parameter: \(\mu (t)=0.37{-}0.001t\), \(\sigma =0.15\). The location parameter \(\mu (t)\) is slowly decreasing over time, resulting in a slight increase in 100-year return temperature. The dotted line is the result after fitting a GEV model with a time-dependent location parameter (time squared). In summary, this figure illustrates that the lognormal distribution and a GEV with time-varying parameters are reasonable results. Note that the outcomes of the gamma distribution are not displayed.

Fig. 2
figure 2

Solid curve: fit by loess regression. Dashed: 100-year return value based on GAMLSS, lognormal. Dotted: 100-year return value based on GEV, time-dependent location parameter

Summary

Estimation of return levels is crucial in the earth and environmental sciences. In this case study, we examined regression models from the GAMLSS framework and performed model selection with BIC, comparing with conventional extreme value distributions, GEV and Gumbel (possibly time dependent). Exemplifying with data of extreme minimum temperatures, we found that BIC values from a practical point of view are quite similar for the models examined. Though care must be taken of the choice of links, for this data set, with the lognormal distribution, links other than the default in the software worked out reasonably, while for the gamma distribution, the results were not realistic.

In fact, the researchers involved with GAMLSS have issued a warning, and a passage from a user’s manual is worth quoting in this discussion (Stasinopoulos et al. 2008):

Warning: The models described here are very flexible and therefore should be used with care. As simple advice,

  • Start with a simple model and built (sic) it up.

  • Do not attempt to fit overcomplex models that are not supported by your data.

  • Compare results from several models.

Debele et al. (2017) encouraged further explorations of use of the GAMLSS framework for estimation of return levels and state that differences in values of information criteria might be small, but translate into “considerable differences in time-dependent moments and in the magnitude of upper quantiles”. Indeed, this is confirmed by the findings for the data set in this article; cf. the simulation studies of the gamma case, as presented in Fig. 1.

To conclude, model selection and closer examination of the so obtained distributions are highly important with these flexible distribution types. There might be a danger to solely rely on computed values of information criteria; human judgement is necessary. We finish by a colourful remark by N.T. Longford, in the discussion of Rigby and Stasinopoulos (2005):

The new models are top of the range mathematical Ferraris, but the model selection that is used with them is like a sequence of tollbooths at which partially sighted operators inspect driver’s (sic) licences and road worthiness certificates.