Improved interexceedance-times-based estimator of the extremal index using truncated distribution

Holešovský, Jan; Fusek, Michal

doi:10.1007/s10687-022-00444-8

Improved interexceedance-times-based estimator of the extremal index using truncated distribution

Published: 24 June 2022

Volume 25, pages 695–720, (2022)
Cite this article

Extremes Aims and scope Submit manuscript

245 Accesses
4 Citations
Explore all metrics

Abstract

The extremal index is an important parameter in the characterization of extreme values of a stationary sequence. This paper presents a novel approach to estimation of the extremal index based on truncation of interexceedance times. The truncated estimator based on the maximum likelihood method is derived together with its first-order bias. The estimator is further improved using penultimate approximation to the limiting mixture distribution. In order to assess the performance of the proposed estimator, a simulation study is carried out for various stationary processes satisfying the local dependence condition $D^{(k)}(u_n)$. An application to daily maximum temperatures at Uccle, Belgium, is also presented.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Estimation of the extremal index using censored distributions

Article 02 March 2020

On the Asymptotic Locations of the Largest and Smallest Extremes of a Stationary Sequence

Article 27 February 2017

An adaptive truncation method for inference in Bayesian nonparametric models

Article 09 December 2014

Data availability

The data analysed in Sect. 5 are freely available at https://climexp.knmi.nl/start.cgi as a part of the European Climate Assessment and Dataset project.

References

Ancona-Navarrete, M.A., Tawn, J.A.: A comparison of methods for estimating the extremal index. Extremes 3(1), 5–38 (2000)
Article MathSciNet MATH Google Scholar
Beirlant, J., Goegebeur, Y., Segers, J., Teugels, J., de Waal, D., Ferro, C.: Statistics of Extremes: Theory and Applications. Wiley (2004)
Book MATH Google Scholar
Cai, J.J.: A nonparametric estimator of the extremal index. Available from https://arxiv.org/abs/1911.06674 (2019) (Accessed 10 Dec 2019)
Casella, G., Berger, R.L.: Statistical Inference. Thomson Learning (2002)
Chernick, M.R., Hsing, T., McCormick, W.P.: Calculating the extremal index for a class of stationary sequences. Adv. Appl. Probab. 23, 835–850 (1991)
Article MathSciNet MATH Google Scholar
Fawcett, L., Walshaw, D.: Estimating return levels from serially dependent extremes. Environmetrics 23(3), 272–283 (2012)
Article MathSciNet Google Scholar
Ferreira, M.: Heuristic tools for the estimation of the extremal index: A comparison of methods. REVSTAT 16(1), 115–136 (2018a)
MathSciNet MATH Google Scholar
Ferreira, M.: Analysis of estimation methods for the extremal index. Electron. J. App. Stat. Anal. 11(1), 296–306 (2018b)
MathSciNet Google Scholar
Ferreira, H., Ferreira, M.: Estimating the extremal index through local dependence. Annales de l’Institute Henri Poincare (B) Probability and Statistics 54(2), 587–605 (2018)
Ferro, C.A.T., Segers, J.: Automatic declustering of extreme values via an estimator for the extremal index. Technical report 2002-025. EURANDOM, Eindhoven (2002). Available from www.eurandom.nl (Accessed 10 Mar 2022)
Ferro, C.A.T., Segers, J.: Inference for clusters of extreme values. J. R. Stat. Soc. B Met. 65(2), 545–556 (2003)
Article MathSciNet MATH Google Scholar
Fukutome, S., Liniger, M.A., Süveges, M.: Automatic threshold and run parameter selection: a climatology for extreme hourly precipitation in Switzerland. Theor. Appl. Climatol. 120(3–4), 403–416 (2015)
Article Google Scholar
Fukutome, S., Liniger, M.A., Süveges, M.: Correction to: Automatic threshold and run parameter selection: a climatology for extreme hourly precipitation in Switzerland. Theor. Appl. Climatol. 137(3–4), 3215 (2019)
Article Google Scholar
Gomes, M.I.: On the estimation of parameters of rare events in environmental time series. In: Barnett, V., Turkman, K. (eds.) Statistics for the Environment 2: Water Related Issues, pp. 225–241. Wiley, New York (1993)
Google Scholar
Gomes, M.I., Hall, A., Miranda, M.C.: Subsampling techniques and the Jackknife methodology in the estimation of the extremal index. Comput. Stat. Data An. 52(4), 2022–2041 (2008)
Article MathSciNet MATH Google Scholar
Holešovský, J., Fusek, M.: Estimation of the extremal index using censored distributions. Extremes 23(2), 197–213 (2020)
Article MathSciNet MATH Google Scholar
Hsing, T.: Estimating the parameters of rare events. Stoch. Proc. Appl. 37, 117–139 (1991)
Article MathSciNet MATH Google Scholar
Klein Tank, A.M., Wijngaard, J.B., Können, G.P., Böhm, R., Demarée, G., Gocheva, A., Mileta, M., Pashiardis, S., Hejkrlik, L., Kern-Hansen, C., Heino, R.: Daily dataset of 20th-century surface air temperature and precipitation series for the European Climate Assessment. Int. J. Climatol. 22, 1441–1453 (2002)
Leadbetter, M.R., Lindgren, G., Rootzén, H.: Extremes and Related Properties of Random Sequences and Series. Springer (1983)
Book MATH Google Scholar
Northrop, P.J.: An efficient semiparametric maxima estimator of the extremal index. Extremes 18, 585–603 (2015)
Article MathSciNet MATH Google Scholar
Robert, C.Y., Segers, J., Ferro, C.A.: A sliding blocks estimator for the extremal index. Electron. J. Stat. 3, 993–1020 (2009)
Article MathSciNet MATH Google Scholar
Smith, R.L., Weissman, I.: Estimating the extremal index. J. R. Stat. Soc. Ser. B 56, 515–528 (1994)
Süveges, M.: Likelihood estimation of the extremal index. Extremes 10, 41–55 (2007)
Article MathSciNet MATH Google Scholar
Süveges, M., Davison, A.C.: Model misspecification in peaks over threshold analysis. Ann. Appl. Stat. 4(1), 203–221 (2010)
Article MathSciNet MATH Google Scholar

Download references

Acknowledgements

The paper was supported by the Technology Agency of the Czech Republic (project No. TL05000072) and by specific research project at Brno University of Technology (project No. FAST-S-22-7867). The authors would like to thank the editor and the reviewers for a number of useful suggestions which helped improve the manuscript.

Author information

Authors and Affiliations

Brno University of Technology, Faculty of Civil Engineering, Institute of Mathematics and Descriptive Geometry, Žižkova 17, 602 00, Brno, Czech Republic
Jan Holešovský
Department of Mathematics, Brno University of Technology, Faculty of Electrical Engineering and Communication, Technická 2848/8, 61600, Brno, Czech Republic
Michal Fusek

Authors

Jan Holešovský
View author publications
You can also search for this author in PubMed Google Scholar
Michal Fusek
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jan Holešovský.

Ethics declarations

Conflicts of interests

Authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Derivation of the bias corrections

1.1.1 Bias correction of the estimator $\widehat{\theta }$ under the limit distribution

Let $T_1, \dots , T_{N-1}$ be a sequence of interexceedance times obtained from an underlying series $X_1, \dots , X_n$ for a high enough threshold u. Moreover, let us assume that the underlying series satisfies the $D^{(D+1)}(u_n)$ condition for certain non-negative integer D. On that account, times $T_1, \dots , T_{N-1}$ exceeding D can be treated as approximately independent inter-cluster times (Ferro and Segers 2003).

Using the delta method (Casella and Berger 2002), we may establish the asymptotic approximation for the estimator (7), i.e.

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}\widehat{\theta } \approx \frac{{{\,\mathrm{\mathrm {E}}\,}}U}{{{\,\mathrm{\mathrm {E}}\,}}V} - \frac{{{\,\mathrm{\mathrm {cov}}\,}}(U,V)}{({{\,\mathrm{\mathrm {E}}\,}}V)^2} + \frac{{{\,\mathrm{\mathrm {E}}\,}}U}{({{\,\mathrm{\mathrm {E}}\,}}V)^3} {{\,\mathrm{\mathrm {var}}\,}}V, \end{aligned}$$

(13)

where $U = \sum _{i=1}^{N-1} \varvec{1}_{[T_{i}> D]}$ and $V = \overline{F}(u) \sum _{i=1}^{N-1} (T_i-D) \varvec{1}_{[T_{i}> D]}$. Let us assume the variable $\overline{F}(u) T_i$ follows the limiting distribution $F_{\theta }$ from (2), i.e. $P(\overline{F}(u) T_i \le x) = 1-\theta \exp (-\theta x)$ for $x \ge 0$. For the sake of simplicity, we put $d = \overline{F}(u) D$ and obtain

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}\left( \varvec{1}_{[T_i>D]} \right) = P(T_i>D) = 1-P\left( \overline{F}(u) T_i \le d\right) = \theta \mathrm e^{-\theta d}. \end{aligned}$$

(14)

Notice that for $x>0$ it holds

$$\begin{aligned} P\left( \overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]}> x\right) = P\left( \overline{F}(u) (T_i-D)> x \mid T_{i}> D\right) \cdot P(T_i>D). \end{aligned}$$

(15)

Let us take $P(T_i>D) = \theta \mathrm e^{-\theta d}$ from (14), and suppose $P\left( \overline{F}(u) (T_i-D)> x \mid T_{i}> D\right) = \mathrm e^{-\theta x}$ from (6). Since $\overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]}$ is a non-negative random variable, by applying (15) we obtain

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}(\overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]})&= \int _0^{\infty } P\left( \overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]}> x\right) \,\mathrm {d}x \\&= \theta \mathrm e^{-\theta d} \int _0^{\infty } e^{-\theta x} \,\mathrm {d}x = \mathrm e^{-\theta d},\\ {{\,\mathrm{\mathrm {E}}\,}}(\overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]})^2&= 2\int _0^{\infty } x P\left( \overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]} > x\right) \,\mathrm {d}x\\&= 2 \theta \mathrm e^{-\theta d} \int _0^{\infty } x \mathrm e^{-\theta x} \,\mathrm {d}x = \frac{2}{\theta } e^{-\theta d}. \end{aligned}$$

Accounting for the independence of the large interexceedance times, we have

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}U&= (N-1) {{\,\mathrm{\mathrm {E}}\,}}\left( \varvec{1}_{[T_i>D]} \right) = (N-1)\theta \mathrm e^{-\theta d},\\ {{\,\mathrm{\mathrm {E}}\,}}V&= (N-1) {{\,\mathrm{\mathrm {E}}\,}}(\overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]}) = (N-1)\mathrm e^{-\theta d},\\ {{\,\mathrm{\mathrm {var}}\,}}V&= (N-1) \left[ {{\,\mathrm{\mathrm {E}}\,}}(\overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]})^2 - \left( {{\,\mathrm{\mathrm {E}}\,}}(\overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]}) \right) ^2 \right] \\&= (N-1) e^{-\theta d} \left( \frac{2}{\theta } - e^{-\theta d} \right) ,\\ {{\,\mathrm{\mathrm {cov}}\,}}(U,V)&= (N-1) \left[ {{\,\mathrm{\mathrm {E}}\,}}(\overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]}) - {{\,\mathrm{\mathrm {E}}\,}}(\varvec{1}_{[T_{i}> D]}) {{\,\mathrm{\mathrm {E}}\,}}(\overline{F}(u) (T_i-D) \varvec{1}_{[T_{i}> D]}) \right] \\&= (N-1) \mathrm e^{-\theta d}(1-\theta \mathrm e^{-\theta d}), \end{aligned}$$

where ${{\,\mathrm{\mathrm {cov}}\,}}(\varvec{1}_{[T_i>D]}, \overline{F}(u) (T_j-D) \varvec{1}_{[T_j > D]}) = 0$ for $i \ne j$ by the assumption. We put the foregoing into (13) and obtain

$${{\,\mathrm{\mathrm {E}}\,}}\widehat{\theta } \approx \theta + \frac{1}{N-1}\mathrm e^{\theta d} = \theta + \frac{1}{N-1} + \frac{\theta d}{N-1} + O(\theta ^2).$$

1.1.2 Bias correction under the penultimate approximation

Let us assume the distribution of each time $T_i$ is given by $P(T_i > n) = \theta p^{n \theta }$, $n=1, 2, \dots$, where $\theta \in (0,1]$ and $p \in (0,1)$. Further on, it will be more convenient to work with the probability mass function $h(n) = P(T_i> n-1) - P(T_i > n)$ of $T_i$. This takes the form of

$$\begin{aligned} h(n) = \left\{ \begin{array}{ll} 1-\theta p^{\theta }, &{} n=1,\\ \theta p^{n \theta } \frac{1-p^{\theta }}{p^{\theta }} &{}, n=2, 3, \dots . \end{array} \right. \end{aligned}$$

(16)

Let us again assume the times $T_1, \dots , T_{N-1}$ exceeding D can be treated as approximately independent. Let us denote $U = \sum _{i=1}^{N-1} \varvec{1}_{[T_i>D]}$ and $V = (1-p) \sum _{i=1}^{N-1} (T_i-D) \varvec{1}_{[T_i>D]}$. From (13) we may derive the first-order bias of the estimator $\widehat{\theta } = U/V$ under the penultimate distribution (16). Since

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}\left( \varvec{1}_{[T_i>D]}\right)&= P(T_i>D) = \theta p^{D \theta },\\ {{\,\mathrm{\mathrm {E}}\,}}\left( T_i \varvec{1}_{[T_i>D]}\right)&= \sum _{n=1}^{\infty } n \varvec{1}_{[n>D]} h(n) = \sum _{n=D+1}^{\infty } n \theta p^{n \theta } \frac{1-p^{\theta }}{p^{\theta }} = \frac{\theta p^{D\theta }}{1-p^{\theta }}\left[ 1+D(1-p^{\theta })\right] ,\\ {{\,\mathrm{\mathrm {E}}\,}}\left( T_i^2 \varvec{1}_{[T_i>D]}\right)&= \sum _{n=1}^{\infty } n^2 \varvec{1}_{[n>D]} h(n) = \sum _{n=D+1}^{\infty } n^2 \theta p^{n \theta } \frac{1-p^{\theta }}{p^{\theta }} \\&= \frac{\theta p^{D \theta }}{(1-p^{\theta })^2} \left( p^{\theta } + \left[ 1+D(1-p^{\theta }) \right] ^2 \right) ,\\ \end{aligned}$$

and ${{\,\mathrm{\mathrm {cov}}\,}}\left( \varvec{1}_{[T_i>D]}, T_i \varvec{1}_{[T_i>D]} \right) = {{\,\mathrm{\mathrm {E}}\,}}\left( T_i \varvec{1}_{[T_i>D]} \right) - {{\,\mathrm{\mathrm {E}}\,}}\left( \varvec{1}_{[T_i>D]} \right) {{\,\mathrm{\mathrm {E}}\,}}\left( T_i \varvec{1}_{[T_i>D]} \right) ,$ straightforward computations lead us to

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}U&= (N-1){{\,\mathrm{\mathrm {E}}\,}}\left( \varvec{1}_{[T_i>D]} \right) = (N-1)\theta p^{D \theta },\\ {{\,\mathrm{\mathrm {E}}\,}}V&= (1-p)(N-1) \left[ {{\,\mathrm{\mathrm {E}}\,}}\left( T_i \varvec{1}_{[T_i>D]}\right) - D {{\,\mathrm{\mathrm {E}}\,}}\left( \varvec{1}_{[T_i>D]}\right) \right] = (N-1) \theta p^{D \theta } \frac{1-p}{1-p^{\theta }}, \end{aligned}$$

Since ${{\,\mathrm{\mathrm {cov}}\,}}(\varvec{1}_{[T_i>D]}, T_j \varvec{1}_{[T_j>D]}) = {{\,\mathrm{\mathrm {cov}}\,}}(\varvec{1}_{[T_i>D]}, \varvec{1}_{[T_j>D]}) = {{\,\mathrm{\mathrm {cov}}\,}}(T_i \varvec{1}_{[T_i>D]}, T_j \varvec{1}_{[T_j>D]}) = 0$ for $i \ne j$ by the assumption, we obtain

$$\begin{aligned} {{\,\mathrm{\mathrm {cov}}\,}}(U,V)&= (1-p) {{\,\mathrm{\mathrm {cov}}\,}}\left( \sum _{i=1}^{N-1} \varvec{1}_{[T_i>D]}, \sum _{i=1}^{N-1} (T_i-D)\varvec{1}_{[T_i>D]} \right) \\&= (1-p) \left[ \sum _{i=1}^{N-1} {{\,\mathrm{\mathrm {cov}}\,}}\left( \varvec{1}_{[T_i>D]}, T_i \varvec{1}_{[T_i>D]} \right) - D \sum _{i=1}^{N-1} {{\,\mathrm{\mathrm {var}}\,}}\left( \varvec{1}_{[T_i>D]} \right) \right] ,\\ {{\,\mathrm{\mathrm {var}}\,}}V&= (1-p)^2 (N-1) {{\,\mathrm{\mathrm {var}}\,}}\left[ (T_i-D) \varvec{1}_{[T_i>D]} \right] = (1-p)^2 (N-1) \left[ {{\,\mathrm{\mathrm {var}}\,}}(T_i \varvec{1}_{[T_i>D]}) \right. \\&\left. \quad\;+ D^2 {{\,\mathrm{\mathrm {var}}\,}}(\varvec{1}_{[T_i>D]}) - 2D {{\,\mathrm{\mathrm {cov}}\,}}(T_i \varvec{1}_{[T_i>D]}, \varvec{1}_{[T_i>D]}) \right] \end{aligned}$$

Since $\varvec{1}_{[T_i>D]}$ is the Bernoulli random variable equal to 1 with probability $\theta p^{D \theta }$, we have ${{\,\mathrm{\mathrm {var}}\,}}(\varvec{1}_{[T_i>D]}) = \theta p^{D \theta } (1- \theta p^{D \theta })$. It yields

$$\begin{aligned} {{\,\mathrm{\mathrm {cov}}\,}}(U,V)&= (N-1) \theta p^{D \theta } (1-\theta p^{D \theta }) \frac{1-p}{1-p^{\theta }},\\ {{\,\mathrm{\mathrm {var}}\,}}V&= (N-1)\frac{ p^{D \theta }}{\left( 1-p^{\theta }\right) ^2}\left[ \theta (1+p^{\theta })-\theta ^2p^{D\theta }\right] (1-p)^2. \end{aligned}$$

Hence, if $T_i$ is described by the penultimate distribution (16), from (13) we obtain

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}\widehat{\theta } \approx \frac{1-p^{\theta }}{1-p} + \frac{1}{N-1}\frac{(1-p^{\theta })p^{\theta }}{(1-p)\theta p^{D \theta }}. \end{aligned}$$

(17)

If we replace $\overline{F}(u)$ by $1-p$ in (9) and substitute $\widehat{\theta }$ by (17), then

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}\widehat{\theta }^\mathrm{BC} \approx \frac{1}{N-1+(1-p)D} \left[ (N-1)\left( \frac{1-p^{\theta }}{1-p} + \frac{1}{N-1}\frac{(1-p^{\theta })p^{\theta }}{(1-p)\theta p^{D \theta }} \right) - 1 \right] \\ = \theta + \frac{1-p}{2 (N-1)} \left[ 1 + \theta (N-4) - \theta ^2 (N-1) \right] + O((1-p)^2), \end{aligned}$$

where the last equality follows from the Taylor expansion as $p \rightarrow 1$.

1.2 Proof of Theorem 2

The bias-corrected estimator $\widehat{\theta }^\mathrm{BC}$

Let us denote $\overline{F}(u_n)$ by $p_n$. First, let us deal with the bias-corrected estimator $\widehat{\theta }^\mathrm{BC}$ that is of the form

$$\widehat{\theta }^\mathrm{BC} = \frac{N-1}{N-1+p_n D} \cdot \frac{\sum _{i=1}^{N-1} \varvec{1}_{[T_i>D]}}{p_n \sum _{i=1}^{N-1} (T_i-D) \varvec{1}_{[T_i>D]}} - \frac{1}{N-1+p_n D},$$

where $N = \sum _{i=1}^n \varvec{1}_{[X_i > u_n]}$. From Lemma B.3 in Ferro and Segers (2002) we already know $N/(np_n) \xrightarrow {p}1$, i.e. $N = np_n (1+o_p(1))$. In particular, $N \xrightarrow {p}\infty$ for $n \rightarrow \infty$. Since $p_n = o(1)$, it is obvious that $(N-1)/(N-1+p_n D) \xrightarrow {p}1$, and $1/(N-1+p_n D) \xrightarrow {p}0$.

The term $\sum _{i=1}^{N-1} \varvec{1}_{[T_i>D]}$

Let us define

$$Y_n = \sum _{i=1}^{n-D} \varvec{1}_{[X_i>u_n, M_{i,i+D} \le u_n]}.$$

It should be pointed out that $Y_n = \sum _{i=1}^{N-1} \varvec{1}_{[T_i>D]} + \varvec{1}_{[j_N \le n-D]}$, where $j_N$ denotes the largest exceedance time of $u_n$ (i.e. the last exceedance) within the series $X_1, \dots , X_n$. As shown in Ferro and Segers (2002), it holds $n-j_N = o_p(n)$. This particularly means that $n-j_N \xrightarrow {p}\infty$; therefore $\varvec{1}_{[j_N \le n-D]} = \varvec{1}_{[n-j_N \ge D]} \xrightarrow {p}1$.

Based on the stationarity and the $D^{(D+1)}(u_n)$ condition, it is implied that

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}Y_n&= (n-D) P\left( X_1>u_n\right) P\left( M_{1,1+D} \le u_n \mid X_1>u_n\right) = (n-D) p_n \theta (1 + o(1)). \end{aligned}$$

Let us put $I_i = \varvec{1}_{[X_i>u_n, M_{i,i+D} \le u_n]}$ for integer $1 \le i \le n-D$. Notice the m-dependence implies ${{\,\mathrm{\mathrm {cov}}\,}}(I_i, I_j) = 0$ for all $j > i+D+m$, and the stationarity implies that ${{\,\mathrm{\mathrm {E}}\,}}I_i = {{\,\mathrm{\mathrm {E}}\,}}I_1$. We obtain

$$\begin{aligned} {{\,\mathrm{\mathrm {var}}\,}}Y_n&= \sum _{i=1}^{n-D} \sum _{j=1}^{n-D} {{\,\mathrm{\mathrm {cov}}\,}}\left( I_i, I_j \right) \le 2 \sum _{i=1}^{n-D} \sum _{j=i}^{n-D} {{\,\mathrm{\mathrm {cov}}\,}}\left( I_i, I_j \right) = 2 \sum _{i=1}^{n-2D-m} \ \sum _{j=i}^{i+D+m} {{\,\mathrm{\mathrm {cov}}\,}}\left( I_i, I_j \right) \\&= 2 \sum _{i=1}^{n-2D-m} \ \sum _{j=i}^{i+D+m} \left[ {{\,\mathrm{\mathrm {E}}\,}}(I_i I_j) - {{\,\mathrm{\mathrm {E}}\,}}I_1 \cdot {{\,\mathrm{\mathrm {E}}\,}}I_1 \right] \end{aligned}$$

where ${{\,\mathrm{\mathrm {E}}\,}}I_1 = P\left( X_1> u_n\right) P\left( M_{1,D+1} \le u_n \mid X_1 > u_n\right) = p_n \theta (1+o(1))$ based on the $D^{(D+1)}(u_n)$ condition, and

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}(I_i I_j)&= P\left( X_i> u_n, M_{i,i+D} \le u_n, X_j> u_n, M_{j,j+D} \le u_n\right) \\&= P\left( X_j> u_n, M_{j,j+D} \le u_n \mid X_i> u_n, M_{i,i+D} \le u_n\right) P\left( M_{i,i+D} \le u_n \mid X_i> u_n\right) \\&\quad\times P(X_i > u_n) \le p_n \theta (1 + o(1)), \end{aligned}$$

for $i+D+1 \le j \le i+D+m$. For $j \le i+D$ the events $I_i$ and $I_j$ conflict; therefore ${{\,\mathrm{\mathrm {E}}\,}}(I_i I_j) = 0$. Based on this, ${{\,\mathrm{\mathrm {var}}\,}}Y_n \le 2 (n-2D-m) m \left[ p_n \theta - p_n^2 \theta ^2 +o(1) \right] \le 2 n m p_n \theta (1+o(1))$.

Putting ${{\,\mathrm{\mathrm {E}}\,}}Y_n$ and ${{\,\mathrm{\mathrm {var}}\,}}Y_n$ together, the expectation ${{\,\mathrm{\mathrm {E}}\,}}[ \left( Y_n \{n p_n \theta (1+o(1))\}^{-1} - 1 \right) ^2 ]$ is bounded by some positive constant times $2 m (n p_n \theta )^{-1}$. Since $2 m (n p_n \theta )^{-1} \rightarrow 0$, this gives $Y_n = n p_n \theta (1+o_p(1))$.

The term $\sum _{i=1}^{N-1} (T_i-D) \varvec{1}_{[T_i>D]}$

Let us define

$$Z_n = \sum _{i=1}^{n-D} \Big [ \varvec{1}_{[X_i>u_n, M_{i,i+D} \le u_n]} \cdot \sum _{j=1}^{n-i} \varvec{1}_{[M_{i,i+j} \le u_n]} \Big ].$$

It can be seen that $Z_n = \sum _{i=1}^{N-1} (T_i-1) \varvec{1}_{[T_i>D]} + (n-j_N) \varvec{1}_{[j_N \le n-D]}$, and

$$\begin{aligned} \sum _{i=1}^{N-1} (T_i-D) \varvec{1}_{[T_i>D]}&= Z_n - (n-j_N) \varvec{1}_{[j_N \le n-D]} - (D-1) \sum _{i=1}^{N-1} \varvec{1}_{[T_i>D]}\\&= Z_n + o_p(n) - (D-1) (Y_n + O_p(1)). \end{aligned}$$

Moreover, we can take advantage of the following expression

$$Z_n = \sum _{i=1}^{n-D} \Big [ \varvec{1}_{[X_i>u_n, M_{i,i+D} \le u_n]} \Big ( D + \sum _{j=1}^{n-D-i} \varvec{1}_{[M_{i,i+D+j} \le u_n]} \Big ) \Big ] = DY_n + V_n,$$

where

$$V_n = \sum _{i=1}^{n-D} \sum _{j=1}^{n-D-i} \varvec{1}_{[X_i>u_n, M_{i,i+D+j} \le u_n]}.$$

Therefore $\sum _{i=1}^{N-1} (T_i-D) \varvec{1}_{[T_i>D]} = V_n + Y_n + o_p(n)$.

To establish consistency of the estimator $\widehat{\theta }^\mathrm{BC}$, we only need to show $n^{-1}V_n \xrightarrow {p}1$. Thus, we need to prove (i) $n^{-1} {{\,\mathrm{\mathrm {E}}\,}}V_n \rightarrow 1$ and (ii) $n^{-2} {{\,\mathrm{\mathrm {E}}\,}}(V_n^2) \rightarrow 1$.

By stationarity of the process we obtain

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}V_n&= \sum _{i=1}^{n-D} \sum _{j=1}^{n-D-i} P\left( X_i>u_n, M_{i,i+D+j} \le u_n\right) \\&= p_n \sum _{j=1}^{n-D} (n-D-j) P\left( M_{1,1+D+j} \le u_n \mid X_1>u_n\right) . \end{aligned}$$

We rewrite the sum as an integral and observe

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}V_n&= p_n \int _{1}^{n-D} (n-D-\lceil s \rceil ) P\left( M_{1,1+D+\lceil s \rceil } \le u_n \mid X_1>u_n\right) \,\mathrm {d}s\\&= p_n r_n \int _{r_n^{-1}}^{(n-D)/r_n} (n-D-\lceil r_n s \rceil ) P\left( M_{1,1+D+\lceil r_n s \rceil } \le u_n \mid X_1>u_n\right) \,\mathrm {d}s. \end{aligned}$$

We apply Lemma B.1 and Lemma B.2 from Ferro and Segers (2002) that specify the convergence of the probability of maximum, and by the dominated convergence theorem it follows

$$n^{-1} {{\,\mathrm{\mathrm {E}}\,}}V_n \rightarrow \tau \int _0^{\infty } \theta \mathrm e^{-\theta \tau s} \,\mathrm {d}s = 1.$$

Hereby we have proven the first point concerning $V_n$.

Now we focus on (ii) $n^{-2} {{\,\mathrm{\mathrm {E}}\,}}(V_n^2) \rightarrow 1$. For integers $1 \le i \le n-D$, $1 \le j \le n-D-i$, put $I_{i,j} = \varvec{1}_{[X_i>u_n, M_{i,i+D+j} \le u_n]}$. We may write

$${{\,\mathrm{\mathrm {E}}\,}}(V_n^2) = \sum _{i=1}^{n-D} \ \sum _{j=1}^{n-D-i} \ \sum _{k=1}^{n-D} \ \sum _{l=1}^{n-D-k} {{\,\mathrm{\mathrm {E}}\,}}(I_{i,j} I_{k,l}),$$

where, accounting m-dependence and stationarity, the expected values are as follows

$$\begin{aligned} {{\,\mathrm{\mathrm {E}}\,}}I_{i,j}&= {{\,\mathrm{\mathrm {E}}\,}}I_{1,j},&{{\,\mathrm{\mathrm {E}}\,}}(I_{i,j} I_{k,l})&= 0, \text { for } i < k \le i+D+j,\\ {{\,\mathrm{\mathrm {E}}\,}}(I_{i,j} I_{i,l})&= {{\,\mathrm{\mathrm {E}}\,}}I_{1,\max \{j,l\}},&{{\,\mathrm{\mathrm {E}}\,}}(I_{i,j} I_{k,l})&= {{\,\mathrm{\mathrm {E}}\,}}I_{1,j} \cdot {{\,\mathrm{\mathrm {E}}\,}}I_{1,l}, \text { for } k > i+D+j+m. \end{aligned}$$

We can write ${{\,\mathrm{\mathrm {E}}\,}}(V_n^2) = A_n + B_n + C_n$, where

$$\begin{aligned} A_n&= \sum _{i=1}^{n-D} \ \sum _{j=1}^{n-D-i} \ \sum _{l=1}^{n-D-i} {{\,\mathrm{\mathrm {E}}\,}}I_{1,\max \{ j,l \}},\\ B_n&= 2 \sum _{i=1}^{n-D} \ \sum _{j=1}^{n-D-i} \ \sum _{k=i+D+j+1}^{i+D+j+m} \ \sum _{l=1}^{n-D-k} {{\,\mathrm{\mathrm {E}}\,}}I_{1,\max \{ j,l \}}\\ C_n&= 2 \sum _{i=1}^{n-D} \ \sum _{j=1}^{n-D-i} \ \sum _{k=i+D+j+m+1}^{n-D} \ \sum _{l=1}^{n-D-k} {{\,\mathrm{\mathrm {E}}\,}}I_{1,j} \cdot {{\,\mathrm{\mathrm {E}}\,}}I_{1,l}, \end{aligned}$$

At the same time, ${{\,\mathrm{\mathrm {E}}\,}}I_{1,l} = P\left( X_1>u_n, M_{1,1+D+l} \le u_n\right) \le p_n C \gamma ^{(D+l)/r_n}$ for some $C>0$ and $0< \gamma < 1$, which follows from Lemma B.1 (Ferro and Segers 2002). We obtain

$$\begin{aligned} A_n&\le 2 (n-D) \sum _{j=1}^{n-D} \sum _{l=j}^{n-D} {{\,\mathrm{\mathrm {E}}\,}}I_{1,l} = 2 (n-D) \sum _{l=1}^{n-D} l {{\,\mathrm{\mathrm {E}}\,}}I_{1,l} \le 2(n-D) \sum _{l=1}^{n-D} l p_n C \gamma ^{\frac{D+l}{r_n}},\\ B_n&\le 4m(n-D) \sum _{j=1}^{n-D} \sum _{l=j}^{n-D} {{\,\mathrm{\mathrm {E}}\,}}I_{1,l} = 4m (n-D) \sum _{l=1}^{n-D} l {{\,\mathrm{\mathrm {E}}\,}}I_{1,l} \le 4m(n-D) \sum _{l=1}^{n-D} l p_n C \gamma ^{\frac{D+l}{r_n}}. \end{aligned}$$

Therefore, by the same arguments as in Ferro and Segers (2002), both $A_n$, $B_n$ are $o(n^2)$.

For $C_n$ let us start with changing the summation order and derive that

$$\begin{aligned} C_n&= 2 \sum _{j=1}^{n-D} \sum _{l=1}^{n-D} {{\,\mathrm{\mathrm {E}}\,}}I_{1,j} {{\,\mathrm{\mathrm {E}}\,}}I_{1,l} \sum _{i=1}^{n-D-j} \sum _{k=i+D+j+m+1}^{n-D-l} 1\\&= \sum _{j=1}^{n-D} \sum _{l=1}^{n-D} (n-2D-l-j-m) (n-2D-l-j-m-1)_+ {{\,\mathrm{\mathrm {E}}\,}}I_{1,j} {{\,\mathrm{\mathrm {E}}\,}}I_{1,l}, \end{aligned}$$

where $a_+$ denotes $\max \{ a,0 \}$. Let us rewrite the sum as an integral

$$\begin{aligned} C_n&= \int \limits _{1}^{n-D} \int \limits _{1}^{n-D} (n-2D-\lceil t \rceil -\lceil s \rceil -m) (n-2D-\lceil t \rceil -\lceil s \rceil -m-1)_+ \\&\quad\times p_n^2 P\left( M_{1,1+D+\lceil t \rceil } \le u_n \mid X_1>u_n\right) P\left( M_{1,1+D+\lceil s \rceil } \le u_n \mid X_1>u_n\right) \,\mathrm {d}s \,\mathrm {d}t\\&= p_n^2 r_n^2 \int \limits _{r_n^{-1}}^{\frac{n-D}{r_n}} \int \limits _{r_n^{-1}}^{\frac{n-D}{r_n}} (n-2D-\lceil r_n t \rceil -\lceil r_n s \rceil -m) (n-2D-\lceil r_n t \rceil -\lceil r_n s \rceil -m-1)_+\\&\quad\times P\left( M_{1,1+D+\lceil r_n t \rceil } \le u_n \mid X_1>u_n\right) P\left( M_{1,1+D+\lceil r_n s \rceil } \le u_n \mid X_1>u_n\right) \,\mathrm {d}s \,\mathrm {d}t. \end{aligned}$$

Let us use Lemma B.1 and Lemma B.2 from Ferro and Segers (2002) again, and by the dominated convergence theorem obtain

$$n^{-2} C_n \rightarrow \tau ^2 \int _0^{\infty } \int _0^{\infty } \theta ^2 \mathrm e^{-\theta \tau t} \mathrm e^{-\theta \tau s} \,\mathrm {d}s \,\mathrm {d}t = 1.$$

The estimator $\widetilde{\theta }_\mathrm{T}$

From the previous paragraphs we already know $\widehat{\theta }^\mathrm{BC} \xrightarrow {p}\theta$. It remains to show that the estimator $\widetilde{\theta }_\mathrm{T}$ is consistent as well. Let us rewrite the relation (11) to get $\widetilde{\theta }_\mathrm{T} = g_n\left( \widehat{\theta }^\mathrm{BC} \right)$, where

$$g_n(t) = -\frac{p_n}{2(N-1)} + t \left( 1 - \frac{p_n(N-4)}{2(N-1)} \right) + \frac{t^2 p_n}{2}.$$

From the continuous mapping theorem it follows $g_n\left( \widehat{\theta }^\mathrm{BC} \right) \xrightarrow {p}g_n(\theta )$. Since $p_n = o(1)$ and $N = n p_n (1+o_p(1)) = o_p(n)$, it is clear that $g_n(\theta ) \xrightarrow {p}\theta$ for $n \rightarrow \infty$.

$\square$

1.3 Extremal index estimators from the simulation study

Consider a sequence of interexceedance times $T_1, \dots , T_{N-1}$, and let $T_{(1)} \le \dots \le T_{(N-1)}$ be the corresponding order statistics. In Sect. 4, properties of the truncated estimator $\widehat{\theta }_\mathrm{T}$ and the following competitive estimators based on interexceedance times are assessed.

Intervals estimator

The intervals estimator $\widehat{\theta }_{\text {I}}$ of Ferro and Segers (2003) is

$$\begin{aligned} \widehat{\theta }_{\text {I}} = \left\{ \begin{array}{ll} \min \{ 1,\tilde{\theta } \} &{} \quad \mathrm{if}~\max \{T_i:1\le i\le N-1\}\le 2,\\ \min \{ 1,\tilde{\theta }^{*} \} &{} \quad \mathrm{if}~\max \{T_i:1\le i\le N-1\}> 2, \end{array}\right. \end{aligned}$$

(18)

where

$$\begin{aligned} \tilde{\theta } = \frac{2\left( \sum \limits _{i=1}^{N-1}T_i\right) ^2}{(N-1)\sum \limits _{i=1}^{N-1} T_i^2},\qquad \tilde{\theta }^{*} = \frac{2\left[ \sum \limits _{i=1}^{N-1}(T_i-1)\right] ^2}{(N-1)\sum \limits _{i=1}^{N-1} (T_i-1)(T_i-2)}. \end{aligned}$$

K-gap estimator

The K-gap estimator $\widehat{\theta }_{\text {SD}}$ of Süveges and Davison (2010) is based on the sample $S_i^{(K)} = \max \{ T_i-K;0 \}$, $i=1,\dots , N-1$, for some $K \ge 0$. Considering the limit distribution from Süveges and Davison (2010) and under some additional assumptions on the local dependence, the corresponding log-likelihood function is of the form

$$\begin{aligned} \ell _{\text {SD}}\left( \theta \right) = (N-1-N_C) \log (1-\theta ) + 2N_C\log \theta - \theta \sum \limits _{i=1}^{N-1} \overline{F}(u) S_i^{(K)}, \end{aligned}$$

where $N_C = \sum _{i=1}^{N-1} \varvec{1}_{[T_i > K]}$ is the number of interexceedance times exceeding the value of K. The K-gap estimator is obtained by maximizing the log-likelihood function, i.e.

$$\begin{aligned} \widehat{\theta }_{\text {SD}} = {{\,{\text {arg max}}\,}}_{0 \le \theta \le 1} \ell _{\text {SD}}\left( \theta \right) . \end{aligned}$$

(19)

Censored estimator

The estimator $\widehat{\theta }_{\text {C}}$ of Holešovský and Fusek (2020) is based on censoring of the interexceedance times. For a given $D \ge 0$ assume that $T_{(N-N_D-1)} \le D < T_{(N-N_D)}$ for a positive integer $N_D$. The times $T_{(1)}, \dots , T_{(N-N_D-1)}$ are considered censored, while the times exceeding the value of D are considered observed. The corresponding log-likelihood function is

$$\begin{aligned} \ell _{\text {C}}\left( \theta \right)&= (N-1-N_D) \log \left( 1-\theta e^{-\theta \overline{F}(u) D}\right) + \log \frac{(N-1)!}{(N-1-N_D)!}\\&\quad+ 2 N_D \log \theta -\theta \sum _{i=N-N_D}^{N-1} \overline{F}(u) T_{(i)}. \end{aligned}$$

The censored estimator is obtained by maximizing the log-likelihood function, i.e.

$$\begin{aligned} \widehat{\theta }_{\text {C}} = {{\,{\text {arg max}}\,}}_{0 \le \theta \le 1} \ell _{\text {C}}\left( \theta \right) . \end{aligned}$$

(20)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Holešovský, J., Fusek, M. Improved interexceedance-times-based estimator of the extremal index using truncated distribution. Extremes 25, 695–720 (2022). https://doi.org/10.1007/s10687-022-00444-8

Download citation

Received: 08 October 2021
Revised: 08 June 2022
Accepted: 08 June 2022
Published: 24 June 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s10687-022-00444-8

Keywords

AMS 2000 Subject Classifications

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Improved interexceedance-times-based estimator of the extremal index using truncated distribution

Abstract

Access this article

Similar content being viewed by others

Estimation of the extremal index using censored distributions

On the Asymptotic Locations of the Largest and Smallest Extremes of a Stationary Sequence

An adaptive truncation method for inference in Bayesian nonparametric models

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflicts of interests

Additional information

Publisher’s Note

Appendix

1.1 Derivation of the bias corrections

1.1.1 Bias correction of the estimator \(\widehat{\theta }\) under the limit distribution

1.1.2 Bias correction under the penultimate approximation

1.2 Proof of Theorem 2

The bias-corrected estimator \(\widehat{\theta }^\mathrm{BC}\)

The term \(\sum _{i=1}^{N-1} \varvec{1}_{[T_i>D]}\)

The term \(\sum _{i=1}^{N-1} (T_i-D) \varvec{1}_{[T_i>D]}\)

The estimator \(\widetilde{\theta }_\mathrm{T}\)

1.3 Extremal index estimators from the simulation study

Intervals estimator

K-gap estimator

Censored estimator

Rights and permissions

About this article

Cite this article

Keywords

AMS 2000 Subject Classifications

Navigation

Improved interexceedance-times-based estimator of the extremal index using truncated distribution

Abstract

Access this article

Similar content being viewed by others

Estimation of the extremal index using censored distributions

On the Asymptotic Locations of the Largest and Smallest Extremes of a Stationary Sequence

An adaptive truncation method for inference in Bayesian nonparametric models

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflicts of interests

Additional information

Publisher’s Note

Appendix

Appendix

1.1 Derivation of the bias corrections

1.1.1 Bias correction of the estimator \(\widehat{\theta }\) under the limit distribution

1.1.2 Bias correction under the penultimate approximation

1.2 Proof of Theorem 2

The bias-corrected estimator \(\widehat{\theta }^\mathrm{BC}\)

The term \(\sum _{i=1}^{N-1} \varvec{1}_{[T_i>D]}\)

The term \(\sum _{i=1}^{N-1} (T_i-D) \varvec{1}_{[T_i>D]}\)

The estimator \(\widetilde{\theta }_\mathrm{T}\)

1.3 Extremal index estimators from the simulation study

Intervals estimator

K-gap estimator

Censored estimator

Rights and permissions

About this article

Cite this article

Share this article

Keywords

AMS 2000 Subject Classifications

Search

Navigation