Abstract
The extremal index is an important parameter in the characterization of extreme values of a stationary sequence. This paper presents a novel approach to estimation of the extremal index based on truncation of interexceedance times. The truncated estimator based on the maximum likelihood method is derived together with its first-order bias. The estimator is further improved using penultimate approximation to the limiting mixture distribution. In order to assess the performance of the proposed estimator, a simulation study is carried out for various stationary processes satisfying the local dependence condition \(D^{(k)}(u_n)\). An application to daily maximum temperatures at Uccle, Belgium, is also presented.
Data availability
The data analysed in Sect. 5 are freely available at https://climexp.knmi.nl/start.cgi as a part of the European Climate Assessment and Dataset project.
References
Ancona-Navarrete, M.A., Tawn, J.A.: A comparison of methods for estimating the extremal index. Extremes 3(1), 5–38 (2000)
Beirlant, J., Goegebeur, Y., Segers, J., Teugels, J., de Waal, D., Ferro, C.: Statistics of Extremes: Theory and Applications. Wiley (2004)
Cai, J.J.: A nonparametric estimator of the extremal index. Available from https://arxiv.org/abs/1911.06674 (2019) (Accessed 10 Dec 2019)
Casella, G., Berger, R.L.: Statistical Inference. Thomson Learning (2002)
Chernick, M.R., Hsing, T., McCormick, W.P.: Calculating the extremal index for a class of stationary sequences. Adv. Appl. Probab. 23, 835–850 (1991)
Fawcett, L., Walshaw, D.: Estimating return levels from serially dependent extremes. Environmetrics 23(3), 272–283 (2012)
Ferreira, M.: Heuristic tools for the estimation of the extremal index: A comparison of methods. REVSTAT 16(1), 115–136 (2018a)
Ferreira, M.: Analysis of estimation methods for the extremal index. Electron. J. App. Stat. Anal. 11(1), 296–306 (2018b)
Ferreira, H., Ferreira, M.: Estimating the extremal index through local dependence. Annales de l’Institut Henri Poincaré (B) Probability and Statistics 54(2), 587–605 (2018)
Ferro, C.A.T., Segers, J.: Automatic declustering of extreme values via an estimator for the extremal index. Technical report 2002-025. EURANDOM, Eindhoven (2002). Available from www.eurandom.nl (Accessed 10 Mar 2022)
Ferro, C.A.T., Segers, J.: Inference for clusters of extreme values. J. R. Stat. Soc. B Met. 65(2), 545–556 (2003)
Fukutome, S., Liniger, M.A., Süveges, M.: Automatic threshold and run parameter selection: a climatology for extreme hourly precipitation in Switzerland. Theor. Appl. Climatol. 120(3–4), 403–416 (2015)
Fukutome, S., Liniger, M.A., Süveges, M.: Correction to: Automatic threshold and run parameter selection: a climatology for extreme hourly precipitation in Switzerland. Theor. Appl. Climatol. 137(3–4), 3215 (2019)
Gomes, M.I.: On the estimation of parameters of rare events in environmental time series. In: Barnett, V., Turkman, K. (eds.) Statistics for the Environment 2: Water Related Issues, pp. 225–241. Wiley, New York (1993)
Gomes, M.I., Hall, A., Miranda, M.C.: Subsampling techniques and the Jackknife methodology in the estimation of the extremal index. Comput. Stat. Data An. 52(4), 2022–2041 (2008)
Holešovský, J., Fusek, M.: Estimation of the extremal index using censored distributions. Extremes 23(2), 197–213 (2020)
Hsing, T.: Estimating the parameters of rare events. Stoch. Proc. Appl. 37, 117–139 (1991)
Klein Tank, A.M., Wijngaard, J.B., Können, G.P., Böhm, R., Demarée, G., Gocheva, A., Mileta, M., Pashiardis, S., Hejkrlik, L., Kern-Hansen, C., Heino, R.: Daily dataset of 20th-century surface air temperature and precipitation series for the European Climate Assessment. Int. J. Climatol. 22, 1441–1453 (2002)
Leadbetter, M.R., Lindgren, G., Rootzén, H.: Extremes and Related Properties of Random Sequences and Series. Springer (1983)
Northrop, P.J.: An efficient semiparametric maxima estimator of the extremal index. Extremes 18, 585–603 (2015)
Robert, C.Y., Segers, J., Ferro, C.A.: A sliding blocks estimator for the extremal index. Electron. J. Stat. 3, 993–1020 (2009)
Smith, R.L., Weissman, I.: Estimating the extremal index. J. R. Stat. Soc. Ser. B 56, 515–528 (1994)
Süveges, M.: Likelihood estimation of the extremal index. Extremes 10, 41–55 (2007)
Süveges, M., Davison, A.C.: Model misspecification in peaks over threshold analysis. Ann. Appl. Stat. 4(1), 203–221 (2010)
Acknowledgements
The paper was supported by the Technology Agency of the Czech Republic (project No. TL05000072) and by specific research project at Brno University of Technology (project No. FAST-S-22-7867). The authors would like to thank the editor and the reviewers for a number of useful suggestions which helped improve the manuscript.
Ethics declarations
Conflicts of interest
Authors have no conflicts of interest to declare that are relevant to the content of this article.
Appendix
1.1 Derivation of the bias corrections
1.1.1 Bias correction of the estimator \(\widehat{\theta }\) under the limit distribution
Let \(T_1, \dots , T_{N-1}\) be a sequence of interexceedance times obtained from an underlying series \(X_1, \dots , X_n\) for a sufficiently high threshold u. Moreover, let us assume that the underlying series satisfies the \(D^{(D+1)}(u_n)\) condition for a certain non-negative integer D. Consequently, the times \(T_1, \dots , T_{N-1}\) exceeding D can be treated as approximately independent inter-cluster times (Ferro and Segers 2003).
Using the delta method (Casella and Berger 2002), we may establish the asymptotic approximation for the estimator (7), i.e.
where \(U = \sum _{i=1}^{N-1} \varvec{1}_{[T_{i}> D]}\) and \(V = \overline{F}(u) \sum _{i=1}^{N-1} (T_i-D) \varvec{1}_{[T_{i}> D]}\). Let us assume the variable \(\overline{F}(u) T_i\) follows the limiting distribution \(F_{\theta }\) from (2), i.e. \(P(\overline{F}(u) T_i \le x) = 1-\theta \exp (-\theta x)\) for \(x \ge 0\). For the sake of simplicity, we put \(d = \overline{F}(u) D\) and obtain
Notice that for \(x>0\) it holds
Let us take \(P(T_i>D) = \theta \mathrm e^{-\theta d}\) from (14), and suppose \(P\left( \overline{F}(u) (T_i-D)> x \mid T_{i}> D\right) = \mathrm e^{-\theta x}\) from (6). Since \(\overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]}\) is a non-negative random variable, by applying (15) we obtain
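Written out explicitly (a sketch reconstructed from the tail probabilities just stated, factorizing \(P\left( \overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]} > x\right) = P(T_i > D)\, P\left( \overline{F}(u) (T_i-D)> x \mid T_{i}> D\right)\)):

```latex
\mathrm{E}\left[\overline{F}(u)\,(T_i-D)\,\varvec{1}_{[T_{i}>D]}\right]
  = \int_0^\infty P\left(\overline{F}(u)\,(T_i-D)\,\varvec{1}_{[T_{i}>D]} > x\right)\mathrm{d}x
  = \int_0^\infty \theta\,\mathrm{e}^{-\theta d}\,\mathrm{e}^{-\theta x}\,\mathrm{d}x
  = \mathrm{e}^{-\theta d}.
```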
Accounting for the independence of the large interexceedance times, we have
where \({{\,\mathrm{\mathrm {cov}}\,}}(\varvec{1}_{[T_i>D]}, \overline{F}(u) (T_j-D) \varvec{1}_{[T_j > D]}) = 0\) for \(i \ne j\) by the assumption. We put the foregoing into (13) and obtain
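As a numerical companion to this subsection, the truncated estimator \(\widehat{\theta } = U/V\) can be sketched as follows; this is an illustrative implementation under the definitions of U and V above, not the authors' code.

```python
import numpy as np

def truncated_estimator(times, pbar, D):
    """Truncated interexceedance-times estimator theta_hat = U / V.

    times : interexceedance times T_1, ..., T_{N-1}
    pbar  : tail probability F_bar(u) = P(X > u)
    D     : truncation level (non-negative integer); at least one
            time is assumed to exceed D, so that U > 0
    """
    times = np.asarray(times, dtype=float)
    exceed = times > D
    U = exceed.sum()                          # number of times exceeding D
    V = pbar * ((times - D) * exceed).sum()   # rescaled truncated sum
    return U / V
```

Simulating \(\overline{F}(u) T_i\) from the limit mixture \(F_\theta\) (an atom \(1-\theta\) at zero, otherwise exponential with rate \(\theta\)) and applying the function recovers \(\theta\) up to sampling error.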
1.1.2 Bias correction under the penultimate approximation
Let us assume the distribution of each time \(T_i\) is given by \(P(T_i > n) = \theta p^{n \theta }\), \(n=1, 2, \dots\), where \(\theta \in (0,1]\) and \(p \in (0,1)\). In what follows, it will be more convenient to work with the probability mass function \(h(n) = P(T_i> n-1) - P(T_i > n)\) of \(T_i\). This takes the form of
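namely (a direct computation from the assumed tail probabilities, extending the tail formula to \(n=0\) consistently with the point mass \(1-\theta\) at zero of the limiting mixture):

```latex
h(n) = P(T_i > n-1) - P(T_i > n)
     = \theta p^{(n-1)\theta} - \theta p^{n\theta}
     = \theta p^{(n-1)\theta}\left(1 - p^{\theta}\right),
\qquad n = 1, 2, \dots
```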
Let us again assume the times \(T_1, \dots , T_{N-1}\) exceeding D can be treated as approximately independent. Let us denote \(U = \sum _{i=1}^{N-1} \varvec{1}_{[T_i>D]}\) and \(V = (1-p) \sum _{i=1}^{N-1} (T_i-D) \varvec{1}_{[T_i>D]}\). From (13) we may derive the first-order bias of the estimator \(\widehat{\theta } = U/V\) under the penultimate distribution (16). Since
and \({{\,\mathrm{\mathrm {cov}}\,}}\left( \varvec{1}_{[T_i>D]}, T_i \varvec{1}_{[T_i>D]} \right) = {{\,\mathrm{\mathrm {E}}\,}}\left( T_i \varvec{1}_{[T_i>D]} \right) - {{\,\mathrm{\mathrm {E}}\,}}\left( \varvec{1}_{[T_i>D]} \right) {{\,\mathrm{\mathrm {E}}\,}}\left( T_i \varvec{1}_{[T_i>D]} \right) ,\) straightforward computations lead us to
Since \({{\,\mathrm{\mathrm {cov}}\,}}(\varvec{1}_{[T_i>D]}, T_j \varvec{1}_{[T_j>D]}) = {{\,\mathrm{\mathrm {cov}}\,}}(\varvec{1}_{[T_i>D]}, \varvec{1}_{[T_j>D]}) = {{\,\mathrm{\mathrm {cov}}\,}}(T_i \varvec{1}_{[T_i>D]}, T_j \varvec{1}_{[T_j>D]}) = 0\) for \(i \ne j\) by the assumption, we obtain
Since \(\varvec{1}_{[T_i>D]}\) is a Bernoulli random variable equal to 1 with probability \(\theta p^{D \theta }\), we have \({{\,\mathrm{\mathrm {var}}\,}}(\varvec{1}_{[T_i>D]}) = \theta p^{D \theta } (1- \theta p^{D \theta })\). This yields
Hence, if \(T_i\) is described by the penultimate distribution (16), from (13) we obtain
If we replace \(\overline{F}(u)\) by \(1-p\) in (9) and substitute \(\widehat{\theta }\) by (17), then
where the last equality follows from the Taylor expansion as \(p \rightarrow 1\).
1.2 Proof of Theorem 2
The bias-corrected estimator \(\widehat{\theta }^\mathrm{BC}\)
Let us denote \(\overline{F}(u_n)\) by \(p_n\). First, let us deal with the bias-corrected estimator \(\widehat{\theta }^\mathrm{BC}\) that is of the form
where \(N = \sum _{i=1}^n \varvec{1}_{[X_i > u_n]}\). From Lemma B.3 in Ferro and Segers (2002) we already know \(N/(np_n) \xrightarrow {p}1\), i.e. \(N = np_n (1+o_p(1))\). In particular, \(N \xrightarrow {p}\infty\) for \(n \rightarrow \infty\). Since \(p_n = o(1)\), it is obvious that \((N-1)/(N-1+p_n D) \xrightarrow {p}1\), and \(1/(N-1+p_n D) \xrightarrow {p}0\).

The term \(\sum _{i=1}^{N-1} \varvec{1}_{[T_i>D]}\)
Let us define
It should be pointed out that \(Y_n = \sum _{i=1}^{N-1} \varvec{1}_{[T_i>D]} + \varvec{1}_{[j_N \le n-D]}\), where \(j_N\) denotes the time of the last exceedance of \(u_n\) within the series \(X_1, \dots , X_n\). As shown in Ferro and Segers (2002), it holds that \(n-j_N = o_p(n)\). In particular, \(n-j_N \xrightarrow {p}\infty\); therefore \(\varvec{1}_{[j_N \le n-D]} = \varvec{1}_{[n-j_N \ge D]} \xrightarrow {p}1\).
Stationarity and the \(D^{(D+1)}(u_n)\) condition imply that
Let us put \(I_i = \varvec{1}_{[X_i>u_n, M_{i,i+D} \le u_n]}\) for integer \(1 \le i \le n-D\). Notice that the m-dependence implies \({{\,\mathrm{\mathrm {cov}}\,}}(I_i, I_j) = 0\) for all \(j > i+D+m\), and stationarity implies that \({{\,\mathrm{\mathrm {E}}\,}}I_i = {{\,\mathrm{\mathrm {E}}\,}}I_1\). We obtain
where \({{\,\mathrm{\mathrm {E}}\,}}I_1 = P\left( X_1> u_n\right) P\left( M_{1,D+1} \le u_n \mid X_1 > u_n\right) = p_n \theta (1+o(1))\) based on the \(D^{(D+1)}(u_n)\) condition, and
for \(i+D+1 \le j \le i+D+m\). For \(j \le i+D\) the events \([I_i = 1]\) and \([I_j = 1]\) are disjoint; therefore \({{\,\mathrm{\mathrm {E}}\,}}(I_i I_j) = 0\). Based on this, \({{\,\mathrm{\mathrm {var}}\,}}Y_n \le 2 (n-2D-m) m \left[ p_n \theta - p_n^2 \theta ^2 +o(1) \right] \le 2 n m p_n \theta (1+o(1))\).
Putting \({{\,\mathrm{\mathrm {E}}\,}}Y_n\) and \({{\,\mathrm{\mathrm {var}}\,}}Y_n\) together, the expectation \({{\,\mathrm{\mathrm {E}}\,}}[ \left( Y_n \{n p_n \theta (1+o(1))\}^{-1} - 1 \right) ^2 ]\) is bounded by some positive constant times \(2 m (n p_n \theta )^{-1}\). Since \(2 m (n p_n \theta )^{-1} \rightarrow 0\), this gives \(Y_n = n p_n \theta (1+o_p(1))\).
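As a quick numerical illustration (not part of the proof) of \(Y_n \approx n p_n \theta\), consider the 1-dependent moving-maximum process \(X_i = \max (Z_i, Z_{i+1})\) with iid uniform \(Z_i\), for which \(\theta = 1/2\) and taking \(D = 1\) suffices:

```python
import numpy as np

# Illustrative check: for the 1-dependent moving-maximum process
# X_i = max(Z_i, Z_{i+1}) with iid uniform Z_i, the extremal index is
# theta = 1/2, so Y_n should be close to n * p_n * theta.
rng = np.random.default_rng(42)
n, theta, D = 200_000, 0.5, 1
z = rng.random(n + 1)
x = np.maximum(z[:-1], z[1:])        # X_i = max(Z_i, Z_{i+1})
u = 0.995                            # high threshold
p_n = 1 - u**2                       # P(X_i > u) for uniform Z_i

# Y_n: exceedances of u not followed by another exceedance within D steps
Y_n = np.sum((x[:-D] > u) & (x[D:] <= u))
ratio = Y_n / (n * p_n * theta)      # close to 1 for large n, high u
```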
The term \(\sum _{i=1}^{N-1} (T_i-D) \varvec{1}_{[T_i>D]}\)
Let us define
It can be seen that \(Z_n = \sum _{i=1}^{N-1} (T_i-1) \varvec{1}_{[T_i>D]} + (n-j_N) \varvec{1}_{[j_N \le n-D]}\), and
Moreover, we can take advantage of the following expression
where
Therefore \(\sum _{i=1}^{N-1} (T_i-D) \varvec{1}_{[T_i>D]} = V_n + Y_n + o_p(n)\).
To establish consistency of the estimator \(\widehat{\theta }^\mathrm{BC}\), we only need to show \(n^{-1}V_n \xrightarrow {p}1\). Thus, we need to prove (i) \(n^{-1} {{\,\mathrm{\mathrm {E}}\,}}V_n \rightarrow 1\) and (ii) \(n^{-2} {{\,\mathrm{\mathrm {E}}\,}}(V_n^2) \rightarrow 1\).
By stationarity of the process we obtain
We rewrite the sum as an integral and observe
We apply Lemma B.1 and Lemma B.2 from Ferro and Segers (2002) that specify the convergence of the probability of maximum, and by the dominated convergence theorem it follows
This proves point (i) concerning \(V_n\).
Now we focus on (ii) \(n^{-2} {{\,\mathrm{\mathrm {E}}\,}}(V_n^2) \rightarrow 1\). For integers \(1 \le i \le n-D\), \(1 \le j \le n-D-i\), put \(I_{i,j} = \varvec{1}_{[X_i>u_n, M_{i,i+D+j} \le u_n]}\). We may write
where, accounting for m-dependence and stationarity, the expected values are as follows
We can write \({{\,\mathrm{\mathrm {E}}\,}}(V_n^2) = A_n + B_n + C_n\), where
At the same time, \({{\,\mathrm{\mathrm {E}}\,}}I_{1,l} = P\left( X_1>u_n, M_{1,1+D+l} \le u_n\right) \le p_n C \gamma ^{(D+l)/r_n}\) for some \(C>0\) and \(0< \gamma < 1\), which follows from Lemma B.1 (Ferro and Segers 2002). We obtain
Therefore, by the same arguments as in Ferro and Segers (2002), both \(A_n\), \(B_n\) are \(o(n^2)\).
For \(C_n\) let us start with changing the summation order and derive that
where \(a_+\) denotes \(\max \{ a,0 \}\). Let us rewrite the sum as an integral
Let us use Lemma B.1 and Lemma B.2 from Ferro and Segers (2002) again, and by the dominated convergence theorem obtain
The estimator \(\widetilde{\theta }_\mathrm{T}\)
From the previous paragraphs we already know \(\widehat{\theta }^\mathrm{BC} \xrightarrow {p}\theta\). It remains to show that the estimator \(\widetilde{\theta }_\mathrm{T}\) is consistent as well. Let us rewrite the relation (11) to get \(\widetilde{\theta }_\mathrm{T} = g_n\left( \widehat{\theta }^\mathrm{BC} \right)\), where
From the continuous mapping theorem it follows that \(g_n\left( \widehat{\theta }^\mathrm{BC} \right) \xrightarrow {p}g_n(\theta )\). Since \(p_n = o(1)\) and \(N = n p_n (1+o_p(1)) = o_p(n)\), it is clear that \(g_n(\theta ) \xrightarrow {p}\theta\) as \(n \rightarrow \infty\).
\(\square\)
1.3 Extremal index estimators from the simulation study
Consider a sequence of interexceedance times \(T_1, \dots , T_{N-1}\), and let \(T_{(1)} \le \dots \le T_{(N-1)}\) be the corresponding order statistics. In Sect. 4, properties of the truncated estimator \(\widehat{\theta }_\mathrm{T}\) and of the following competing estimators based on interexceedance times are assessed.
Intervals estimator
The intervals estimator \(\widehat{\theta }_{\text {I}}\) of Ferro and Segers (2003) is
where
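For reference, an illustrative Python sketch of the well-known Ferro–Segers closed form (a sketch of the standard formula, not the authors' code):

```python
import numpy as np

def intervals_estimator(times):
    """Intervals estimator of Ferro and Segers (2003).

    times : interexceedance times T_1, ..., T_{N-1} (positive integers)
    """
    t = np.asarray(times, dtype=float)
    m = len(t)
    if t.max() <= 2:
        # all gaps short: moment estimator based on T_i
        theta = 2 * t.sum() ** 2 / (m * (t ** 2).sum())
    else:
        # otherwise: version based on (T_i - 1)
        theta = 2 * ((t - 1).sum()) ** 2 / (m * ((t - 1) * (t - 2)).sum())
    return min(1.0, theta)
```

For independent exceedances (geometric interexceedance times, \(\theta = 1\)) the estimator is close to 1.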
K-gap estimator
The K-gap estimator \(\widehat{\theta }_{\text {SD}}\) of Süveges and Davison (2010) is based on the sample \(S_i^{(K)} = \max \{ T_i-K;0 \}\), \(i=1,\dots , N-1\), for some \(K \ge 0\). Considering the limit distribution from Süveges and Davison (2010) and under some additional assumptions on the local dependence, the corresponding log-likelihood function is of the form
where \(N_C = \sum _{i=1}^{N-1} \varvec{1}_{[T_i > K]}\) is the number of interexceedance times exceeding the value of K. The K-gap estimator is obtained by maximizing the log-likelihood function, i.e.
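Assuming the log-likelihood takes the usual Süveges–Davison form \(\ell (\theta ) = (N-1-N_C)\log (1-\theta ) + 2 N_C \log \theta - \theta \sum _i \overline{F}(u) S_i^{(K)}\), the score equation reduces to a quadratic in \(\theta\) whose smaller root is the maximizer; the sketch below is an illustrative implementation under that assumption.

```python
import numpy as np

def kgap_estimator(times, pbar, K):
    """K-gap estimator: closed-form MLE for the K-gap likelihood.

    times : interexceedance times T_1, ..., T_{N-1}
    pbar  : tail probability F_bar(u)
    K     : run parameter (K >= 0); assumes at least one positive K-gap
    """
    t = np.asarray(times, dtype=float)
    s = pbar * np.maximum(t - K, 0.0)   # normalized K-gaps
    m = len(s)                          # N - 1
    nc = int((t > K).sum())             # N_C: number of positive K-gaps
    S = s.sum()
    # d/dtheta log-lik = 0 reduces to S*th^2 - (S + m + nc)*th + 2*nc = 0;
    # the root lying in (0, 1] is the smaller one
    b = S + m + nc
    theta = (b - np.sqrt(b * b - 8.0 * nc * S)) / (2.0 * S)
    return min(1.0, theta)
```

Simulating normalized K-gaps from the limit mixture (atom \(1-\theta\) at zero, otherwise exponential with rate \(\theta\)) recovers \(\theta\) up to sampling error.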
Censored estimator
The estimator \(\widehat{\theta }_{\text {C}}\) of Holešovský and Fusek (2020) is based on censoring of the interexceedance times. For a given \(D \ge 0\), assume that \(T_{(N-N_D-1)} \le D < T_{(N-N_D)}\) for a positive integer \(N_D\). The times \(T_{(1)}, \dots , T_{(N-N_D-1)}\) are treated as censored, while the times exceeding the value of D are treated as fully observed. The corresponding log-likelihood function is
The censored estimator is obtained by maximizing the log-likelihood function, i.e.
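Since the censored log-likelihood is not reproduced above, the following sketch is a hypothetical reconstruction: censored times are assumed to contribute \(\log F_\theta (d)\) with \(d = \overline{F}(u) D\), and observed times the density term \(2\log \theta - \theta x_i\) of the limit mixture. It may differ in detail from the exact likelihood of Holešovský and Fusek (2020).

```python
import numpy as np

def censored_estimator(times, pbar, D):
    """Illustrative reconstruction of a censored-likelihood estimator.

    Times at most D are censored at d = pbar * D (assumed D >= 1, so
    d > 0); times above D enter via the density of the limit mixture
    F_theta(x) = 1 - theta * exp(-theta * x).
    """
    t = np.asarray(times, dtype=float)
    d = pbar * D
    x = pbar * t[t > D]                  # observed normalized times
    n_cens = len(t) - len(x)             # censored count
    th = np.linspace(1e-3, 1.0, 1000)    # grid search over (0, 1]
    loglik = (n_cens * np.log(1.0 - th * np.exp(-th * d))
              + 2.0 * len(x) * np.log(th) - th * x.sum())
    return th[np.argmax(loglik)]
```

A grid search is used here for transparency; any one-dimensional optimizer over \((0,1]\) would serve equally well.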
Cite this article
Holešovský, J., Fusek, M. Improved interexceedance-times-based estimator of the extremal index using truncated distribution. Extremes 25, 695–720 (2022). https://doi.org/10.1007/s10687-022-00444-8