Improved interexceedance-times-based estimator of the extremal index using truncated distribution

Abstract

The extremal index is an important parameter in the characterization of extreme values of a stationary sequence. This paper presents a novel approach to estimation of the extremal index based on truncation of interexceedance times. The truncated estimator based on the maximum likelihood method is derived together with its first-order bias. The estimator is further improved using penultimate approximation to the limiting mixture distribution. In order to assess the performance of the proposed estimator, a simulation study is carried out for various stationary processes satisfying the local dependence condition \(D^{(k)}(u_n)\). An application to daily maximum temperatures at Uccle, Belgium, is also presented.


Data availability

The data analysed in Sect. 5 are freely available at https://climexp.knmi.nl/start.cgi as part of the European Climate Assessment and Dataset project.

References

  • Ancona-Navarrete, M.A., Tawn, J.A.: A comparison of methods for estimating the extremal index. Extremes 3(1), 5–38 (2000)

  • Beirlant, J., Goegebeur, Y., Segers, J., Teugels, J., de Waal, D., Ferro, C.: Statistics of Extremes: Theory and Applications. Wiley (2004)

  • Cai, J.J.: A nonparametric estimator of the extremal index. Available from https://arxiv.org/abs/1911.06674 (2019) (Accessed 10 Dec 2019)

  • Casella, G., Berger, R.L.: Statistical Inference. Thomson Learning (2002)

  • Chernick, M.R., Hsing, T., McCormick, W.P.: Calculating the extremal index for a class of stationary sequences. Adv. Appl. Probab. 23, 835–850 (1991)

  • Fawcett, L., Walshaw, D.: Estimating return levels from serially dependent extremes. Environmetrics 23(3), 272–283 (2012)

  • Ferreira, M.: Heuristic tools for the estimation of the extremal index: A comparison of methods. REVSTAT 16(1), 115–136 (2018a)

  • Ferreira, M.: Analysis of estimation methods for the extremal index. Electron. J. App. Stat. Anal. 11(1), 296–306 (2018b)

  • Ferreira, H., Ferreira, M.: Estimating the extremal index through local dependence. Annales de l’Institut Henri Poincaré (B) Probabilités et Statistiques 54(2), 587–605 (2018)

  • Ferro, C.A.T., Segers, J.: Automatic declustering of extreme values via an estimator for the extremal index. Technical report 2002-025. EURANDOM, Eindhoven (2002). Available from www.eurandom.nl (Accessed 10 Mar 2022)

  • Ferro, C.A.T., Segers, J.: Inference for clusters of extreme values. J. R. Stat. Soc. B Met. 65(2), 545–556 (2003)

  • Fukutome, S., Liniger, M.A., Süveges, M.: Automatic threshold and run parameter selection: a climatology for extreme hourly precipitation in Switzerland. Theor. Appl. Climatol. 120(3–4), 403–416 (2015)

  • Fukutome, S., Liniger, M.A., Süveges, M.: Correction to: Automatic threshold and run parameter selection: a climatology for extreme hourly precipitation in Switzerland. Theor. Appl. Climatol. 137(3–4), 3215 (2019)

  • Gomes, M.I.: On the estimation of parameters of rare events in environmental time series. In: Barnett, V., Turkman, K. (eds.) Statistics for the Environment 2: Water Related Issues, pp. 225–241. Wiley, New York (1993)

  • Gomes, M.I., Hall, A., Miranda, M.C.: Subsampling techniques and the Jackknife methodology in the estimation of the extremal index. Comput. Stat. Data An. 52(4), 2022–2041 (2008)

  • Holešovský, J., Fusek, M.: Estimation of the extremal index using censored distributions. Extremes 23(2), 197–213 (2020)

  • Hsing, T.: Estimating the parameters of rare events. Stoch. Proc. Appl. 37, 117–139 (1991)

  • Klein Tank, A.M., Wijngaard, J.B., Können, G.P., Böhm, R., Demarée, G., Gocheva, A., Mileta, M., Pashiardis, S., Hejkrlik, L., Kern-Hansen, C., Heino, R.: Daily dataset of 20th-century surface air temperature and precipitation series for the European Climate Assessment. Int. J. Climatol. 22, 1441–1453 (2002)

  • Leadbetter, M.R., Lindgren, G., Rootzén, H.: Extremes and Related Properties of Random Sequences and Series. Springer (1983)

  • Northrop, P.J.: An efficient semiparametric maxima estimator of the extremal index. Extremes 18, 585–603 (2015)

  • Robert, C.Y., Segers, J., Ferro, C.A.: A sliding blocks estimator for the extremal index. Electron. J. Stat. 3, 993–1020 (2009)

  • Smith, R.L., Weissman, I.: Estimating the extremal index. J. R. Stat. Soc. Ser. B 56, 515–528 (1994)

  • Süveges, M.: Likelihood estimation of the extremal index. Extremes 10, 41–55 (2007)

  • Süveges, M., Davison, A.C.: Model misspecification in peaks over threshold analysis. Ann. Appl. Stat. 4(1), 203–221 (2010)


Acknowledgements

The paper was supported by the Technology Agency of the Czech Republic (project No. TL05000072) and by a specific research project at Brno University of Technology (project No. FAST-S-22-7867). The authors would like to thank the editor and the reviewers for a number of useful suggestions which helped improve the manuscript.

Author information

Corresponding author

Correspondence to Jan Holešovský.

Ethics declarations

Conflicts of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Derivation of the bias corrections

1.1.1 Bias correction of the estimator \(\widehat{\theta }\) under the limit distribution

Let \(T_1, \dots , T_{N-1}\) be a sequence of interexceedance times obtained from an underlying series \(X_1, \dots , X_n\) for a sufficiently high threshold u. Moreover, let us assume that the underlying series satisfies the \(D^{(D+1)}(u_n)\) condition for a certain non-negative integer D. On that account, the times among \(T_1, \dots , T_{N-1}\) exceeding D can be treated as approximately independent inter-cluster times (Ferro and Segers 2003).

Using the delta method (Casella and Berger 2002), we may establish the asymptotic approximation for the estimator (7), i.e.

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}\widehat{\theta } \approx \frac{{{\,\mathrm{\mathrm {E}}\,}}U}{{{\,\mathrm{\mathrm {E}}\,}}V} - \frac{{{\,\mathrm{\mathrm {cov}}\,}}(U,V)}{({{\,\mathrm{\mathrm {E}}\,}}V)^2} + \frac{{{\,\mathrm{\mathrm {E}}\,}}U}{({{\,\mathrm{\mathrm {E}}\,}}V)^3} {{\,\mathrm{\mathrm {var}}\,}}V, \end{aligned}$$
(13)

where \(U = \sum _{i=1}^{N-1} \varvec{1}_{[T_{i}> D]}\) and \(V = \overline{F}(u) \sum _{i=1}^{N-1} (T_i-D) \varvec{1}_{[T_{i}> D]}\). Let us assume the variable \(\overline{F}(u) T_i\) follows the limiting distribution \(F_{\theta }\) from (2), i.e. \(P(\overline{F}(u) T_i \le x) = 1-\theta \exp (-\theta x)\) for \(x \ge 0\). For the sake of simplicity, we put \(d = \overline{F}(u) D\) and obtain

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}\left( \varvec{1}_{[T_i>D]} \right) = P(T_i>D) = 1-P\left( \overline{F}(u) T_i \le d\right) = \theta \mathrm e^{-\theta d}. \end{aligned}$$
(14)

Notice that for \(x>0\) it holds

$$\begin{aligned} P\left( \overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]}> x\right) = P\left( \overline{F}(u) (T_i-D)> x \mid T_{i}> D\right) \cdot P(T_i>D). \end{aligned}$$
(15)

Let us take \(P(T_i>D) = \theta \mathrm e^{-\theta d}\) from (14), and suppose \(P\left( \overline{F}(u) (T_i-D)> x \mid T_{i}> D\right) = \mathrm e^{-\theta x}\) from (6). Since \(\overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]}\) is a non-negative random variable, by applying (15) we obtain

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}(\overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]})&= \int _0^{\infty } P\left( \overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]}> x\right) \,\mathrm {d}x \\&= \theta \mathrm e^{-\theta d} \int _0^{\infty } e^{-\theta x} \,\mathrm {d}x = \mathrm e^{-\theta d},\\ {{\,\mathrm{\mathrm {E}}\,}}(\overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]})^2&= 2\int _0^{\infty } x P\left( \overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]} > x\right) \,\mathrm {d}x\\&= 2 \theta \mathrm e^{-\theta d} \int _0^{\infty } x \mathrm e^{-\theta x} \,\mathrm {d}x = \frac{2}{\theta } e^{-\theta d}. \end{aligned}$$
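
These two moments are easy to verify numerically. The following Python sketch (our illustration, not code from the paper) samples \(\overline{F}(u) T_i\) from the limiting mixture \(1-\theta \mathrm e^{-\theta x}\), i.e. an atom of mass \(1-\theta\) at zero and an \(\text {Exp}(\theta )\) component drawn with probability \(\theta\), and compares the empirical moments with \(\mathrm e^{-\theta d}\) and \((2/\theta )\mathrm e^{-\theta d}\); all variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, d, n = 0.5, 0.3, 10**6

# Mixture from (2): an atom of mass 1 - theta at zero,
# and an Exp(theta) variable with probability theta.
x = np.where(rng.random(n) < theta, rng.exponential(1 / theta, n), 0.0)

excess = np.maximum(x - d, 0.0)                            # (x - d) * 1[x > d]
print(excess.mean(), np.exp(-theta * d))                   # ~ e^{-theta d}
print((excess**2).mean(), 2 / theta * np.exp(-theta * d))  # ~ (2/theta) e^{-theta d}
```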

Accounting for the independence of the large interexceedance times, we have

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}U&= (N-1) {{\,\mathrm{\mathrm {E}}\,}}\left( \varvec{1}_{[T_i>D]} \right) = (N-1)\theta \mathrm e^{-\theta d},\\ {{\,\mathrm{\mathrm {E}}\,}}V&= (N-1) {{\,\mathrm{\mathrm {E}}\,}}(\overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]}) = (N-1)\mathrm e^{-\theta d},\\ {{\,\mathrm{\mathrm {var}}\,}}V&= (N-1) \left[ {{\,\mathrm{\mathrm {E}}\,}}(\overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]})^2 - \left( {{\,\mathrm{\mathrm {E}}\,}}(\overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]}) \right) ^2 \right] \\&= (N-1) e^{-\theta d} \left( \frac{2}{\theta } - e^{-\theta d} \right) ,\\ {{\,\mathrm{\mathrm {cov}}\,}}(U,V)&= (N-1) \left[ {{\,\mathrm{\mathrm {E}}\,}}(\overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]}) - {{\,\mathrm{\mathrm {E}}\,}}(\varvec{1}_{[T_{i}> D]}) {{\,\mathrm{\mathrm {E}}\,}}(\overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]}) \right] \\&= (N-1) \mathrm e^{-\theta d}(1-\theta \mathrm e^{-\theta d}), \end{aligned}$$

where \({{\,\mathrm{\mathrm {cov}}\,}}(\varvec{1}_{[T_i>D]}, \overline{F}(u) (T_j-D) \varvec{1}_{[T_j > D]}) = 0\) for \(i \ne j\) by the assumption. We put the foregoing into (13) and obtain

$${{\,\mathrm{\mathrm {E}}\,}}\widehat{\theta } \approx \theta + \frac{1}{N-1}\mathrm e^{\theta d} = \theta + \frac{1}{N-1} + \frac{\theta d}{N-1} + O(\theta ^2).$$
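
This first-order bias can likewise be checked by simulation. The sketch below (again ours) draws \(N-1\) rescaled interexceedance times from the limiting mixture as in the previous snippet and compares the Monte Carlo mean of \(\widehat{\theta } = U/V\) with the approximation above; agreement is only approximate, since (13) is a first-order expansion.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, d, N1, reps = 0.5, 0.3, 50, 100_000    # N1 denotes N - 1

# rows: replications; columns: N - 1 rescaled interexceedance times
x = np.where(rng.random((reps, N1)) < theta,
             rng.exponential(1 / theta, (reps, N1)), 0.0)
U = (x > d).sum(axis=1)                  # number of times exceeding d
V = np.maximum(x - d, 0.0).sum(axis=1)   # truncated sums
est = U / V                              # the estimator in rescaled form

print(est.mean())                        # Monte Carlo estimate of E(theta-hat)
print(theta + np.exp(theta * d) / N1)    # first-order approximation above
```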

1.1.2 Bias correction under the penultimate approximation

Let us assume the distribution of each time \(T_i\) is given by \(P(T_i > n) = \theta p^{n \theta }\), \(n=1, 2, \dots\), where \(\theta \in (0,1]\) and \(p \in (0,1)\). In what follows, it will be more convenient to work with the probability mass function \(h(n) = P(T_i> n-1) - P(T_i > n)\) of \(T_i\). This takes the form

$$\begin{aligned} h(n) = \left\{ \begin{array}{ll} 1-\theta p^{\theta }, & n=1,\\ \theta p^{n \theta } \frac{1-p^{\theta }}{p^{\theta }}, & n=2, 3, \dots . \end{array} \right. \end{aligned}$$
(16)
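
The distribution \(P(T_i > n) = \theta p^{n \theta }\) is straightforward to sample by inverting its survival function. The sketch below (our illustration; the helper `sample_T` is hypothetical) does so and checks \(P(T_i > D) = \theta p^{D \theta }\) together with the first truncated moment derived below.

```python
import numpy as np

def sample_T(theta, p, size, rng):
    """Draw from P(T > n) = theta * p**(n * theta), n = 1, 2, ...,
    via T = min{n >= 1 : theta * p**(n * theta) < U} with U uniform."""
    u = rng.random(size)
    n = np.floor(np.log(u / theta) / (theta * np.log(p))) + 1
    return np.maximum(n, 1).astype(int)

rng = np.random.default_rng(3)
theta, p, D = 0.5, 0.98, 3
T = sample_T(theta, p, 10**6, rng)
pt = p**theta
print((T > D).mean(), theta * p**(D * theta))                  # P(T > D)
print((T * (T > D)).mean(),                                    # E(T 1[T > D])
      theta * p**(D * theta) * (1 + D * (1 - pt)) / (1 - pt))
```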

Let us again assume the times \(T_1, \dots , T_{N-1}\) exceeding D can be treated as approximately independent. Let us denote \(U = \sum _{i=1}^{N-1} \varvec{1}_{[T_i>D]}\) and \(V = (1-p) \sum _{i=1}^{N-1} (T_i-D) \varvec{1}_{[T_i>D]}\). From (13) we may derive the first-order bias of the estimator \(\widehat{\theta } = U/V\) under the penultimate distribution (16). Since

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}\left( \varvec{1}_{[T_i>D]}\right)&= P(T_i>D) = \theta p^{D \theta },\\ {{\,\mathrm{\mathrm {E}}\,}}\left( T_i \varvec{1}_{[T_i>D]}\right)&= \sum _{n=1}^{\infty } n \varvec{1}_{[n>D]} h(n) = \sum _{n=D+1}^{\infty } n \theta p^{n \theta } \frac{1-p^{\theta }}{p^{\theta }} = \frac{\theta p^{D\theta }}{1-p^{\theta }}\left[ 1+D(1-p^{\theta })\right] ,\\ {{\,\mathrm{\mathrm {E}}\,}}\left( T_i^2 \varvec{1}_{[T_i>D]}\right)&= \sum _{n=1}^{\infty } n^2 \varvec{1}_{[n>D]} h(n) = \sum _{n=D+1}^{\infty } n^2 \theta p^{n \theta } \frac{1-p^{\theta }}{p^{\theta }} \\&= \frac{\theta p^{D \theta }}{(1-p^{\theta })^2} \left( p^{\theta } + \left[ 1+D(1-p^{\theta }) \right] ^2 \right) , \end{aligned}$$

and \({{\,\mathrm{\mathrm {cov}}\,}}\left( \varvec{1}_{[T_i>D]}, T_i \varvec{1}_{[T_i>D]} \right) = {{\,\mathrm{\mathrm {E}}\,}}\left( T_i \varvec{1}_{[T_i>D]} \right) - {{\,\mathrm{\mathrm {E}}\,}}\left( \varvec{1}_{[T_i>D]} \right) {{\,\mathrm{\mathrm {E}}\,}}\left( T_i \varvec{1}_{[T_i>D]} \right) ,\) straightforward computations lead us to

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}U&= (N-1){{\,\mathrm{\mathrm {E}}\,}}\left( \varvec{1}_{[T_i>D]} \right) = (N-1)\theta p^{D \theta },\\ {{\,\mathrm{\mathrm {E}}\,}}V&= (1-p)(N-1) \left[ {{\,\mathrm{\mathrm {E}}\,}}\left( T_i \varvec{1}_{[T_i>D]}\right) - D {{\,\mathrm{\mathrm {E}}\,}}\left( \varvec{1}_{[T_i>D]}\right) \right] = (N-1) \theta p^{D \theta } \frac{1-p}{1-p^{\theta }}. \end{aligned}$$

Since \({{\,\mathrm{\mathrm {cov}}\,}}(\varvec{1}_{[T_i>D]}, T_j \varvec{1}_{[T_j>D]}) = {{\,\mathrm{\mathrm {cov}}\,}}(\varvec{1}_{[T_i>D]}, \varvec{1}_{[T_j>D]}) = {{\,\mathrm{\mathrm {cov}}\,}}(T_i \varvec{1}_{[T_i>D]}, T_j \varvec{1}_{[T_j>D]}) = 0\) for \(i \ne j\) by the assumption, we obtain

$$\begin{aligned} {{\,\mathrm{\mathrm {cov}}\,}}(U,V)&= (1-p) {{\,\mathrm{\mathrm {cov}}\,}}\left( \sum _{i=1}^{N-1} \varvec{1}_{[T_i>D]}, \sum _{i=1}^{N-1} (T_i-D)\varvec{1}_{[T_i>D]} \right) \\&= (1-p) \left[ \sum _{i=1}^{N-1} {{\,\mathrm{\mathrm {cov}}\,}}\left( \varvec{1}_{[T_i>D]}, T_i \varvec{1}_{[T_i>D]} \right) - D \sum _{i=1}^{N-1} {{\,\mathrm{\mathrm {var}}\,}}\left( \varvec{1}_{[T_i>D]} \right) \right] ,\\ {{\,\mathrm{\mathrm {var}}\,}}V&= (1-p)^2 (N-1) {{\,\mathrm{\mathrm {var}}\,}}\left[ (T_i-D) \varvec{1}_{[T_i>D]} \right] = (1-p)^2 (N-1) \left[ {{\,\mathrm{\mathrm {var}}\,}}(T_i \varvec{1}_{[T_i>D]}) \right. \\&\left. \quad\;+ D^2 {{\,\mathrm{\mathrm {var}}\,}}(\varvec{1}_{[T_i>D]}) - 2D {{\,\mathrm{\mathrm {cov}}\,}}(T_i \varvec{1}_{[T_i>D]}, \varvec{1}_{[T_i>D]}) \right] . \end{aligned}$$

Since \(\varvec{1}_{[T_i>D]}\) is a Bernoulli random variable equal to 1 with probability \(\theta p^{D \theta }\), we have \({{\,\mathrm{\mathrm {var}}\,}}(\varvec{1}_{[T_i>D]}) = \theta p^{D \theta } (1- \theta p^{D \theta })\). This yields

$$\begin{aligned} {{\,\mathrm{\mathrm {cov}}\,}}(U,V)&= (N-1) \theta p^{D \theta } (1-\theta p^{D \theta }) \frac{1-p}{1-p^{\theta }},\\ {{\,\mathrm{\mathrm {var}}\,}}V&= (N-1)\frac{ p^{D \theta }}{\left( 1-p^{\theta }\right) ^2}\left[ \theta (1+p^{\theta })-\theta ^2p^{D\theta }\right] (1-p)^2. \end{aligned}$$

Hence, if \(T_i\) is described by the penultimate distribution (16), from (13) we obtain

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}\widehat{\theta } \approx \frac{1-p^{\theta }}{1-p} + \frac{1}{N-1}\frac{(1-p^{\theta })p^{\theta }}{(1-p)\theta p^{D \theta }}. \end{aligned}$$
(17)

If we replace \(\overline{F}(u)\) by \(1-p\) in (9) and substitute \(\widehat{\theta }\) by (17), then

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}\widehat{\theta }^\mathrm{BC} \approx \frac{1}{N-1+(1-p)D} \left[ (N-1)\left( \frac{1-p^{\theta }}{1-p} + \frac{1}{N-1}\frac{(1-p^{\theta })p^{\theta }}{(1-p)\theta p^{D \theta }} \right) - 1 \right] \\ = \theta + \frac{1-p}{2 (N-1)} \left[ 1 + \theta (N-4) - \theta ^2 (N-1) \right] + O((1-p)^2), \end{aligned}$$

where the last equality follows from the Taylor expansion as \(p \rightarrow 1\).
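
The Taylor expansion can be verified symbolically, e.g. with SymPy. The check below is ours; arbitrary admissible values are substituted for \(\theta\), N and D to keep the computation exact, and the conclusion does not depend on this choice.

```python
import sympy as sp

q = sp.symbols('q', positive=True)          # q = 1 - p -> 0
theta, N, D = sp.Rational(1, 2), 51, 3      # arbitrary test values (our choice)
p = 1 - q

# E(theta-hat-BC) before expansion, i.e. the first line of the display above
expr = ((N - 1) * ((1 - p**theta) / q
        + (1 - p**theta) * p**theta / ((N - 1) * theta * q * p**(D * theta)))
        - 1) / (N - 1 + q * D)

lhs = sp.series(expr, q, 0, 2).removeO()
rhs = theta + q / (2 * (N - 1)) * (1 + theta * (N - 4) - theta**2 * (N - 1))
print(sp.simplify(lhs - rhs))               # 0, confirming the expansion
```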

1.2 Proof of Theorem 2

The bias-corrected estimator \(\widehat{\theta }^\mathrm{BC}\) 

Let us denote \(\overline{F}(u_n)\) by \(p_n\). First, let us deal with the bias-corrected estimator \(\widehat{\theta }^\mathrm{BC}\) that is of the form

$$\widehat{\theta }^\mathrm{BC} = \frac{N-1}{N-1+p_n D} \cdot \frac{\sum _{i=1}^{N-1} \varvec{1}_{[T_i>D]}}{p_n \sum _{i=1}^{N-1} (T_i-D) \varvec{1}_{[T_i>D]}} - \frac{1}{N-1+p_n D},$$

where \(N = \sum _{i=1}^n \varvec{1}_{[X_i > u_n]}\). From Lemma B.3 in Ferro and Segers (2002) we already know \(N/(np_n) \xrightarrow {p}1\), i.e. \(N = np_n (1+o_p(1))\). In particular, \(N \xrightarrow {p}\infty\) as \(n \rightarrow \infty\). Since \(p_n = o(1)\), it is obvious that \((N-1)/(N-1+p_n D) \xrightarrow {p}1\), and \(1/(N-1+p_n D) \xrightarrow {p}0\).
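
Although not needed for the proof, the estimator above is straightforward to implement. The following is a minimal sketch (ours, not the authors' implementation; the function name and the choice D = 1 are illustrative), applied to a max-autoregressive (ARMAX) process whose extremal index is known to equal \(1-a\).

```python
import numpy as np

def theta_bc(x, u, D):
    """Sketch of the bias-corrected estimator above (names are ours).
    The exceedance probability p_n is estimated by N / n."""
    exc = np.flatnonzero(x > u)          # positions of exceedances of u
    N = exc.size
    T = np.diff(exc)                     # interexceedance times T_1, ..., T_{N-1}
    p = N / x.size                       # empirical estimate of F-bar(u)
    U = np.sum(T > D)
    V = p * np.sum((T - D)[T > D])
    return (N - 1) / (N - 1 + p * D) * U / V - 1 / (N - 1 + p * D)

# Illustration: ARMAX process X_i = max(a X_{i-1}, (1-a) Z_i) with unit-Frechet
# Z_i, whose extremal index equals 1 - a (here 0.5); it satisfies the local
# dependence condition with k = 2, so we take D = 1.
rng = np.random.default_rng(4)
a, n = 0.5, 200_000
z = -1 / np.log(rng.random(n))           # unit-Frechet innovations
x = np.empty(n)
x[0] = z[0]
for i in range(1, n):
    x[i] = max(a * x[i - 1], (1 - a) * z[i])
print(theta_bc(x, u=np.quantile(x, 0.95), D=1))   # should be close to 0.5
```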

The term \(\sum _{i=1}^{N-1} \varvec{1}_{[T_i>D]}\)

Let us define

$$Y_n = \sum _{i=1}^{n-D} \varvec{1}_{[X_i>u_n, M_{i,i+D} \le u_n]}.$$

It should be pointed out that \(Y_n = \sum _{i=1}^{N-1} \varvec{1}_{[T_i>D]} + \varvec{1}_{[j_N \le n-D]}\), where \(j_N\) denotes the largest exceedance time of \(u_n\) (i.e. the last exceedance) within the series \(X_1, \dots , X_n\). As shown in Ferro and Segers (2002), \(n-j_N = o_p(n)\) and, moreover, \(n-j_N \xrightarrow {p}\infty\); therefore \(\varvec{1}_{[j_N \le n-D]} = \varvec{1}_{[n-j_N \ge D]} \xrightarrow {p}1\).

By stationarity and the \(D^{(D+1)}(u_n)\) condition, we obtain

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}Y_n&= (n-D) P\left( X_1>u_n\right) P\left( M_{1,1+D} \le u_n \mid X_1>u_n\right) = (n-D) p_n \theta (1 + o(1)). \end{aligned}$$

Let us put \(I_i = \varvec{1}_{[X_i>u_n, M_{i,i+D} \le u_n]}\) for integer \(1 \le i \le n-D\). Notice that m-dependence implies \({{\,\mathrm{\mathrm {cov}}\,}}(I_i, I_j) = 0\) for all \(j > i+D+m\), and stationarity implies \({{\,\mathrm{\mathrm {E}}\,}}I_i = {{\,\mathrm{\mathrm {E}}\,}}I_1\). We obtain

$$\begin{aligned} {{\,\mathrm{\mathrm {var}}\,}}Y_n&= \sum _{i=1}^{n-D} \sum _{j=1}^{n-D} {{\,\mathrm{\mathrm {cov}}\,}}\left( I_i, I_j \right) \le 2 \sum _{i=1}^{n-D} \sum _{j=i}^{n-D} {{\,\mathrm{\mathrm {cov}}\,}}\left( I_i, I_j \right) = 2 \sum _{i=1}^{n-2D-m} \ \sum _{j=i}^{i+D+m} {{\,\mathrm{\mathrm {cov}}\,}}\left( I_i, I_j \right) \\&= 2 \sum _{i=1}^{n-2D-m} \ \sum _{j=i}^{i+D+m} \left[ {{\,\mathrm{\mathrm {E}}\,}}(I_i I_j) - {{\,\mathrm{\mathrm {E}}\,}}I_1 \cdot {{\,\mathrm{\mathrm {E}}\,}}I_1 \right] \end{aligned}$$

where \({{\,\mathrm{\mathrm {E}}\,}}I_1 = P\left( X_1> u_n\right) P\left( M_{1,D+1} \le u_n \mid X_1 > u_n\right) = p_n \theta (1+o(1))\) based on the \(D^{(D+1)}(u_n)\) condition, and

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}(I_i I_j)&= P\left( X_i> u_n, M_{i,i+D} \le u_n, X_j> u_n, M_{j,j+D} \le u_n\right) \\&= P\left( X_j> u_n, M_{j,j+D} \le u_n \mid X_i> u_n, M_{i,i+D} \le u_n\right) P\left( M_{i,i+D} \le u_n \mid X_i> u_n\right) \\&\quad\times P(X_i > u_n) \le p_n \theta (1 + o(1)), \end{aligned}$$

for \(i+D+1 \le j \le i+D+m\). For \(i < j \le i+D\) the events \([I_i = 1]\) and \([I_j = 1]\) are disjoint; therefore \({{\,\mathrm{\mathrm {E}}\,}}(I_i I_j) = 0\). Based on this, \({{\,\mathrm{\mathrm {var}}\,}}Y_n \le 2 (n-2D-m) m \left[ p_n \theta - p_n^2 \theta ^2 +o(1) \right] \le 2 n m p_n \theta (1+o(1))\).

Putting \({{\,\mathrm{\mathrm {E}}\,}}Y_n\) and \({{\,\mathrm{\mathrm {var}}\,}}Y_n\) together, the expectation \({{\,\mathrm{\mathrm {E}}\,}}[ \left( Y_n \{n p_n \theta (1+o(1))\}^{-1} - 1 \right) ^2 ]\) is bounded by a positive constant times \(2 m (n p_n \theta )^{-1}\). Since \(2 m (n p_n \theta )^{-1} \rightarrow 0\), this gives \(Y_n = n p_n \theta (1+o_p(1))\).

The term \(\sum _{i=1}^{N-1} (T_i-D) \varvec{1}_{[T_i>D]}\) 

Let us define

$$Z_n = \sum _{i=1}^{n-D} \Big [ \varvec{1}_{[X_i>u_n, M_{i,i+D} \le u_n]} \cdot \sum _{j=1}^{n-i} \varvec{1}_{[M_{i,i+j} \le u_n]} \Big ].$$

It can be seen that \(Z_n = \sum _{i=1}^{N-1} (T_i-1) \varvec{1}_{[T_i>D]} + (n-j_N) \varvec{1}_{[j_N \le n-D]}\), and

$$\begin{aligned} \sum _{i=1}^{N-1} (T_i-D) \varvec{1}_{[T_i>D]}&= Z_n - (n-j_N) \varvec{1}_{[j_N \le n-D]} - (D-1) \sum _{i=1}^{N-1} \varvec{1}_{[T_i>D]}\\&= Z_n + o_p(n) - (D-1) (Y_n + O_p(1)). \end{aligned}$$

Moreover, we can take advantage of the following expression

$$Z_n = \sum _{i=1}^{n-D} \Big [ \varvec{1}_{[X_i>u_n, M_{i,i+D} \le u_n]} \Big ( D + \sum _{j=1}^{n-D-i} \varvec{1}_{[M_{i,i+D+j} \le u_n]} \Big ) \Big ] = DY_n + V_n,$$

where

$$V_n = \sum _{i=1}^{n-D} \sum _{j=1}^{n-D-i} \varvec{1}_{[X_i>u_n, M_{i,i+D+j} \le u_n]}.$$

Therefore \(\sum _{i=1}^{N-1} (T_i-D) \varvec{1}_{[T_i>D]} = V_n + Y_n + o_p(n)\).
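
The decompositions above are purely combinatorial, so they can be checked by brute force on a short random series. The following sketch (our illustration; recall \(M_{i,j} = \max \{X_{i+1}, \dots , X_j\}\)) confirms \(Z_n = D Y_n + V_n\).

```python
import numpy as np

rng = np.random.default_rng(5)
n, D, u = 200, 2, 0.8
x = rng.random(n)                        # X_1, ..., X_n (x[k-1] stores X_k)

def M(i, j):
    """M_{i,j} = max(X_{i+1}, ..., X_j) for j > i (1-based indices)."""
    return x[i:j].max()

Y = sum(x[i - 1] > u and M(i, i + D) <= u
        for i in range(1, n - D + 1))
Z = sum((x[i - 1] > u and M(i, i + D) <= u)
        * sum(M(i, i + j) <= u for j in range(1, n - i + 1))
        for i in range(1, n - D + 1))
V = sum(x[i - 1] > u and M(i, i + D + j) <= u
        for i in range(1, n - D + 1) for j in range(1, n - D - i + 1))
print(Z == D * Y + V)                    # True
```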

To establish consistency of the estimator \(\widehat{\theta }^\mathrm{BC}\), we only need to show \(n^{-1}V_n \xrightarrow {p}1\). Thus, we need to prove (i) \(n^{-1} {{\,\mathrm{\mathrm {E}}\,}}V_n \rightarrow 1\) and (ii) \(n^{-2} {{\,\mathrm{\mathrm {E}}\,}}(V_n^2) \rightarrow 1\), which together imply \(n^{-2} {{\,\mathrm{\mathrm {var}}\,}}V_n \rightarrow 0\).

By stationarity of the process we obtain

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}V_n&= \sum _{i=1}^{n-D} \sum _{j=1}^{n-D-i} P\left( X_i>u_n, M_{i,i+D+j} \le u_n\right) \\&= p_n \sum _{j=1}^{n-D} (n-D-j) P\left( M_{1,1+D+j} \le u_n \mid X_1>u_n\right) . \end{aligned}$$

We rewrite the sum as an integral and observe

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}V_n&= p_n \int _{1}^{n-D} (n-D-\lceil s \rceil ) P\left( M_{1,1+D+\lceil s \rceil } \le u_n \mid X_1>u_n\right) \,\mathrm {d}s\\&= p_n r_n \int _{r_n^{-1}}^{(n-D)/r_n} (n-D-\lceil r_n s \rceil ) P\left( M_{1,1+D+\lceil r_n s \rceil } \le u_n \mid X_1>u_n\right) \,\mathrm {d}s. \end{aligned}$$

We apply Lemma B.1 and Lemma B.2 from Ferro and Segers (2002), which specify the convergence of the probability of the maximum, and by the dominated convergence theorem it follows that

$$n^{-1} {{\,\mathrm{\mathrm {E}}\,}}V_n \rightarrow \tau \int _0^{\infty } \theta \mathrm e^{-\theta \tau s} \,\mathrm {d}s = 1.$$

Hereby we have proven the first point concerning \(V_n\).

Now we focus on (ii) \(n^{-2} {{\,\mathrm{\mathrm {E}}\,}}(V_n^2) \rightarrow 1\). For integers \(1 \le i \le n-D\), \(1 \le j \le n-D-i\), put \(I_{i,j} = \varvec{1}_{[X_i>u_n, M_{i,i+D+j} \le u_n]}\). We may write

$${{\,\mathrm{\mathrm {E}}\,}}(V_n^2) = \sum _{i=1}^{n-D} \ \sum _{j=1}^{n-D-i} \ \sum _{k=1}^{n-D} \ \sum _{l=1}^{n-D-k} {{\,\mathrm{\mathrm {E}}\,}}(I_{i,j} I_{k,l}),$$

where, accounting for m-dependence and stationarity, the expected values are as follows

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}I_{i,j}&= {{\,\mathrm{\mathrm {E}}\,}}I_{1,j},&{{\,\mathrm{\mathrm {E}}\,}}(I_{i,j} I_{k,l})&= 0, \text { for } i < k \le i+D+j,\\ {{\,\mathrm{\mathrm {E}}\,}}(I_{i,j} I_{i,l})&= {{\,\mathrm{\mathrm {E}}\,}}I_{1,\max \{j,l\}},&{{\,\mathrm{\mathrm {E}}\,}}(I_{i,j} I_{k,l})&= {{\,\mathrm{\mathrm {E}}\,}}I_{1,j} \cdot {{\,\mathrm{\mathrm {E}}\,}}I_{1,l}, \text { for } k > i+D+j+m. \end{aligned}$$

We can write \({{\,\mathrm{\mathrm {E}}\,}}(V_n^2) = A_n + B_n + C_n\), where

$$\begin{aligned} A_n&= \sum _{i=1}^{n-D} \ \sum _{j=1}^{n-D-i} \ \sum _{l=1}^{n-D-i} {{\,\mathrm{\mathrm {E}}\,}}I_{1,\max \{ j,l \}},\\ B_n&= 2 \sum _{i=1}^{n-D} \ \sum _{j=1}^{n-D-i} \ \sum _{k=i+D+j+1}^{i+D+j+m} \ \sum _{l=1}^{n-D-k} {{\,\mathrm{\mathrm {E}}\,}}I_{1,\max \{ j,l \}},\\ C_n&= 2 \sum _{i=1}^{n-D} \ \sum _{j=1}^{n-D-i} \ \sum _{k=i+D+j+m+1}^{n-D} \ \sum _{l=1}^{n-D-k} {{\,\mathrm{\mathrm {E}}\,}}I_{1,j} \cdot {{\,\mathrm{\mathrm {E}}\,}}I_{1,l}. \end{aligned}$$

At the same time, \({{\,\mathrm{\mathrm {E}}\,}}I_{1,l} = P\left( X_1>u_n, M_{1,1+D+l} \le u_n\right) \le p_n C \gamma ^{(D+l)/r_n}\) for some \(C>0\) and \(0< \gamma < 1\), which follows from Lemma B.1 (Ferro and Segers 2002). We obtain

$$\begin{aligned} A_n&\le 2 (n-D) \sum _{j=1}^{n-D} \sum _{l=j}^{n-D} {{\,\mathrm{\mathrm {E}}\,}}I_{1,l} = 2 (n-D) \sum _{l=1}^{n-D} l {{\,\mathrm{\mathrm {E}}\,}}I_{1,l} \le 2(n-D) \sum _{l=1}^{n-D} l p_n C \gamma ^{\frac{D+l}{r_n}},\\ B_n&\le 4m(n-D) \sum _{j=1}^{n-D} \sum _{l=j}^{n-D} {{\,\mathrm{\mathrm {E}}\,}}I_{1,l} = 4m (n-D) \sum _{l=1}^{n-D} l {{\,\mathrm{\mathrm {E}}\,}}I_{1,l} \le 4m(n-D) \sum _{l=1}^{n-D} l p_n C \gamma ^{\frac{D+l}{r_n}}. \end{aligned}$$

Therefore, by the same arguments as in Ferro and Segers (2002), both \(A_n\) and \(B_n\) are \(o(n^2)\).

For \(C_n\), let us change the order of summation and derive that

$$\begin{aligned} C_n&= 2 \sum _{j=1}^{n-D} \sum _{l=1}^{n-D} {{\,\mathrm{\mathrm {E}}\,}}I_{1,j} {{\,\mathrm{\mathrm {E}}\,}}I_{1,l} \sum _{i=1}^{n-D-j} \sum _{k=i+D+j+m+1}^{n-D-l} 1\\&= \sum _{j=1}^{n-D} \sum _{l=1}^{n-D} (n-2D-l-j-m) (n-2D-l-j-m-1)_+ {{\,\mathrm{\mathrm {E}}\,}}I_{1,j} {{\,\mathrm{\mathrm {E}}\,}}I_{1,l}, \end{aligned}$$

where \(a_+\) denotes \(\max \{ a,0 \}\). Let us rewrite the sum as an integral

$$\begin{aligned} C_n&= \int \limits _{1}^{n-D} \int \limits _{1}^{n-D} (n-2D-\lceil t \rceil -\lceil s \rceil -m) (n-2D-\lceil t \rceil -\lceil s \rceil -m-1)_+ \\&\quad\times p_n^2 P\left( M_{1,1+D+\lceil t \rceil } \le u_n \mid X_1>u_n\right) P\left( M_{1,1+D+\lceil s \rceil } \le u_n \mid X_1>u_n\right) \,\mathrm {d}s \,\mathrm {d}t\\&= p_n^2 r_n^2 \int \limits _{r_n^{-1}}^{\frac{n-D}{r_n}} \int \limits _{r_n^{-1}}^{\frac{n-D}{r_n}} (n-2D-\lceil r_n t \rceil -\lceil r_n s \rceil -m) (n-2D-\lceil r_n t \rceil -\lceil r_n s \rceil -m-1)_+\\&\quad\times P\left( M_{1,1+D+\lceil r_n t \rceil } \le u_n \mid X_1>u_n\right) P\left( M_{1,1+D+\lceil r_n s \rceil } \le u_n \mid X_1>u_n\right) \,\mathrm {d}s \,\mathrm {d}t. \end{aligned}$$

Let us use Lemma B.1 and Lemma B.2 from Ferro and Segers (2002) again, and by the dominated convergence theorem obtain

$$n^{-2} C_n \rightarrow \tau ^2 \int _0^{\infty } \int _0^{\infty } \theta ^2 \mathrm e^{-\theta \tau t} \mathrm e^{-\theta \tau s} \,\mathrm {d}s \,\mathrm {d}t = 1.$$

Since \(A_n\) and \(B_n\) are \(o(n^2)\), this establishes (ii); together with (i) it gives \(n^{-1}V_n \xrightarrow {p}1\), and hence the consistency of \(\widehat{\theta }^\mathrm{BC}\).

The estimator \(\widetilde{\theta }_\mathrm{T}\) 

From the previous paragraphs we already know \(\widehat{\theta }^\mathrm{BC} \xrightarrow {p}\theta\). It remains to show that the estimator \(\widetilde{\theta }_\mathrm{T}\) is consistent as well. Let us rewrite the relation (11) to get \(\widetilde{\theta }_\mathrm{T} = g_n\left( \widehat{\theta }^\mathrm{BC} \right)\), where

$$g_n(t) = -\frac{p_n}{2(N-1)} + t \left( 1 - \frac{p_n(N-4)}{2(N-1)} \right) + \frac{t^2 p_n}{2}.$$

From the continuous mapping theorem it follows that \(g_n\left( \widehat{\theta }^\mathrm{BC} \right) \xrightarrow {p}g_n(\theta )\). Since \(p_n = o(1)\) and \(N = n p_n (1+o_p(1)) \xrightarrow {p}\infty\), it is clear that \(g_n(\theta ) \xrightarrow {p}\theta\) as \(n \rightarrow \infty\).

\(\square\)
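
In code, evaluating the penultimate-corrected estimator \(\widetilde{\theta }_\mathrm{T}\) amounts to a single application of \(g_n\). A minimal sketch (ours; `p` denotes \(p_n\) and `theta_bc` refers to the illustrative implementation given earlier):

```python
def g_n(t, p, N):
    """The mapping g_n from the proof above (argument names are ours)."""
    return (-p / (2 * (N - 1))
            + t * (1 - p * (N - 4) / (2 * (N - 1)))
            + t**2 * p / 2)

# theta_tilde_T = g_n(theta_bc_hat, p, N), with theta_bc_hat from theta_bc above
```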

1.3 Extremal index estimators from the simulation study

Consider a sequence of interexceedance times \(T_1, \dots , T_{N-1}\), and let \(T_{(1)} \le \dots \le T_{(N-1)}\) be the corresponding order statistics. In Sect. 4, properties of the truncated estimator \(\widehat{\theta }_\mathrm{T}\) and of the following competing estimators based on interexceedance times are assessed.

Intervals estimator

The intervals estimator \(\widehat{\theta }_{\text {I}}\) of Ferro and Segers (2003) is

$$\begin{aligned} \widehat{\theta }_{\text {I}} = \left\{ \begin{array}{ll} \min \{ 1,\tilde{\theta } \} & \quad \mathrm{if}~\max \{T_i:1\le i\le N-1\}\le 2,\\ \min \{ 1,\tilde{\theta }^{*} \} & \quad \mathrm{if}~\max \{T_i:1\le i\le N-1\}> 2, \end{array}\right. \end{aligned}$$
(18)

where

$$\begin{aligned} \tilde{\theta } = \frac{2\left( \sum \limits _{i=1}^{N-1}T_i\right) ^2}{(N-1)\sum \limits _{i=1}^{N-1} T_i^2},\qquad \tilde{\theta }^{*} = \frac{2\left[ \sum \limits _{i=1}^{N-1}(T_i-1)\right] ^2}{(N-1)\sum \limits _{i=1}^{N-1} (T_i-1)(T_i-2)}. \end{aligned}$$
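
For reference, a direct transcription of (18) into code (a sketch of ours; the input is the vector of interexceedance times):

```python
import numpy as np

def theta_intervals(T):
    """Intervals estimator (18) of Ferro and Segers (2003)."""
    T = np.asarray(T, dtype=float)       # T_1, ..., T_{N-1}, so T.size = N - 1
    if T.max() <= 2:
        est = 2 * T.sum() ** 2 / (T.size * np.sum(T ** 2))
    else:
        est = 2 * np.sum(T - 1) ** 2 / (T.size * np.sum((T - 1) * (T - 2)))
    return min(1.0, est)
```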

K-gap estimator

The K-gap estimator \(\widehat{\theta }_{\text {SD}}\) of Süveges and Davison (2010) is based on the sample \(S_i^{(K)} = \max \{ T_i-K;0 \}\), \(i=1,\dots , N-1\), for some \(K \ge 0\). Considering the limit distribution from Süveges and Davison (2010) and under some additional assumptions on the local dependence, the corresponding log-likelihood function is of the form

$$\begin{aligned} \ell _{\text {SD}}\left( \theta \right) = (N-1-N_C) \log (1-\theta ) + 2N_C\log \theta - \theta \sum \limits _{i=1}^{N-1} \overline{F}(u) S_i^{(K)}, \end{aligned}$$

where \(N_C = \sum _{i=1}^{N-1} \varvec{1}_{[T_i > K]}\) is the number of interexceedance times exceeding the value of K. The K-gap estimator is obtained by maximizing the log-likelihood function, i.e.

$$\begin{aligned} \widehat{\theta }_{\text {SD}} = \mathop {\mathrm {arg\,max}}\limits _{0 \le \theta \le 1} \ell _{\text {SD}}\left( \theta \right) . \end{aligned}$$
(19)
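
A sketch of (19) by direct numerical maximisation (ours; a closed-form solution of the score equation could be used instead):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def theta_kgap(T, fbar_u, K):
    """K-gap estimator (19) of Suveges and Davison (2010) (sketch)."""
    T = np.asarray(T, dtype=float)
    S = fbar_u * np.maximum(T - K, 0.0)       # F-bar(u) * S_i^{(K)}
    NC = np.sum(T > K)                        # number of times exceeding K
    N1 = T.size                               # N - 1

    def negloglik(theta):
        return -((N1 - NC) * np.log1p(-theta) + 2 * NC * np.log(theta)
                 - theta * S.sum())

    return minimize_scalar(negloglik, bounds=(1e-8, 1 - 1e-8),
                           method='bounded').x
```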

Censored estimator

The estimator \(\widehat{\theta }_{\text {C}}\) of Holešovský and Fusek (2020) is based on censoring of the interexceedance times. For a given \(D \ge 0\), assume that \(T_{(N-N_D-1)} \le D < T_{(N-N_D)}\) for a positive integer \(N_D\). The times \(T_{(1)}, \dots , T_{(N-N_D-1)}\) are considered censored, while the times exceeding D are considered observed. The corresponding log-likelihood function is

$$\begin{aligned} \ell _{\text {C}}\left( \theta \right)&= (N-1-N_D) \log \left( 1-\theta e^{-\theta \overline{F}(u) D}\right) + \log \frac{(N-1)!}{(N-1-N_D)!}\\&\quad+ 2 N_D \log \theta -\theta \sum _{i=N-N_D}^{N-1} \overline{F}(u) T_{(i)}. \end{aligned}$$

The censored estimator is obtained by maximizing the log-likelihood function, i.e.

$$\begin{aligned} \widehat{\theta }_{\text {C}} = \mathop {\mathrm {arg\,max}}\limits _{0 \le \theta \le 1} \ell _{\text {C}}\left( \theta \right) . \end{aligned}$$
(20)
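
Analogously for (20), a sketch of ours; the factorial term in \(\ell _{\text {C}}\) does not depend on \(\theta\) and is therefore omitted from the maximisation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def theta_censored(T, fbar_u, D):
    """Censored estimator (20) of Holesovsky and Fusek (2020) (sketch)."""
    T = np.asarray(T, dtype=float)
    ND = np.sum(T > D)                   # number of observed (non-censored) times
    N1 = T.size                          # N - 1
    s = fbar_u * np.sum(T[T > D])        # sum over the N_D largest times

    def negloglik(theta):
        return -((N1 - ND) * np.log(1 - theta * np.exp(-theta * fbar_u * D))
                 + 2 * ND * np.log(theta) - theta * s)

    return minimize_scalar(negloglik, bounds=(1e-8, 1 - 1e-8),
                           method='bounded').x
```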

Cite this article

Holešovský, J., Fusek, M. Improved interexceedance-times-based estimator of the extremal index using truncated distribution. Extremes 25, 695–720 (2022). https://doi.org/10.1007/s10687-022-00444-8
