1 Introduction

Extreme value analysis of heavy-tailed distributions is important for various applications. In seismology and climatology, for example, statistics of extremes are used to study earthquakes (Beirlant et al. 2018) or heavy precipitation (Carreau et al. 2017). Another important field of research is analysing high financial losses, which becomes particularly interesting if the losses depend on covariates (Chavez-Demoulin et al. 2016; Hambuckers et al. 2018).

To investigate the behaviour of heavy tails, we consider random variables from the max-domain of attraction (DoA) of a Fréchet distribution. Let \(X_{1},\dots ,X_{n}\) be independent and identically distributed (i.i.d.) random variables with a distribution function F, where F is in the DoA of an extreme value distribution (evd) \(G_{\gamma}\) with extreme value index γ > 0. This means that there exist sequences \(a_{n} > 0\) and \(b_{n} \in \mathbb{R}\) such that

$$ \begin{array}{@{}rcl@{}} \lim_{n\rightarrow\infty} F^{n}(a_{n} x+b_{n}) = G_{\gamma}(x) := \exp\left( -x^{-1/\gamma}\right), \quad x>0. \end{array} $$

In this situation the following first-order condition holds,

$$ \begin{array}{@{}rcl@{}} \underset{t\rightarrow\infty}{\lim}\ \frac{1-F(tx)}{1-F(t)}=x^{-1/\gamma},\quad x>0, \end{array} $$
(1)

i.e. the survival function 1 − F is regularly varying with index − 1/γ. Distributions fulfilling this condition are called Pareto-type distributions, because they differ from the Pareto distribution only by a slowly varying function \(\ell_{F}\), i.e. \(1 - F(x) = x^{-1/\gamma}\ell_{F}(x)\).

We can interpret the quotient in Eq. 1 as a conditional probability, from which it follows directly that

$$ \begin{array}{@{}rcl@{}} \frac{X_{1}}{t} \Big| X_{1}>t\ \overset{\mathcal{D}}{\longrightarrow}\ & P, \text{ as } t\rightarrow\infty \text{ and } P\sim \text{Pareto}\left( 1,\frac{1}{\gamma}\right), \\ \log\left( \frac{X_{1}}{t}\right)\Big| X_{1}>t\ \overset{\mathcal{D}}{\longrightarrow}\ & E, \text{ as } t\rightarrow\infty \text{ and } E\sim \text{Exp}\left( \frac{1}{\gamma}\right), \end{array} $$
(2)

where \(\text {Pareto}\left (1,1/\gamma \right )\) denotes the Pareto distribution with scale parameter 1 and shape parameter 1/γ. Thus, for a sufficiently large threshold t the data above this threshold can be modelled by a Pareto or an exponential distribution. In this article we concentrate on the exponential approximation and utilize it for inference on the extreme value index. It is common to consider the threshold \(t = X_{(n-k,n)}\) and choose the sample fraction k instead of t, where \(X_{(1,n)} \leq \dots \leq X_{(n,n)}\) denote the order statistics of a sample of size n. In this case, a natural estimator for γ under the exponential approximation of the log-spacings \(Y_{(i,k)}:=\log (X_{(n-i+1,n)})-\log (X_{(n-k,n)})\) is their mean, the Hill estimator (Hill 1975),

$$ \hat{\gamma}_{k} := \frac{1}{k}\sum\limits_{i=1}^{k} \log\left( \frac{X_{(n-i+1,n)}}{X_{(n-k,n)}} \right) = \frac{1}{k}\sum\limits_{i=1}^{k} Y_{(i,k)}. $$
(3)
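To make the estimator concrete, the following R sketch (our illustration, not code from the paper) computes the Hill path \(\hat{\gamma}_{k}\) for all k = 1,…,n − 1; the Fréchet(2) sampler is only an example.

```r
# Sketch of the Hill estimator in Eq. 3, evaluated for all k = 1, ..., n-1.
hill_all <- function(x) {
  ls <- sort(log(x), decreasing = TRUE)  # descending log order statistics
  k  <- seq_len(length(ls) - 1)
  cumsum(ls)[k] / k - ls[k + 1]          # mean of the log-spacings Y_(i,k)
}

set.seed(1)
x <- (-log(runif(500)))^(-1 / 2)         # Fréchet(2) sample, true gamma = 0.5
plot(hill_all(x), type = "l", xlab = "k", ylab = "Hill estimate")  # Hill plot
```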

The Hill estimator is still among the most popular and well-known estimators for the extreme value index, although its sample path as a function of k can be highly unstable, so that estimation crucially depends on the choice of the sample fraction k. This dependence highlights the difficulties in estimating γ in the peaks-over-threshold approach. Selecting a threshold above which the data can be used for statistical inference in the tail is one of the most fundamental problems in the field of extreme value analysis.

Due to the importance of this task, an appropriate choice of the threshold has been discussed extensively in extreme value research over the last decades and suggested solutions cover a variety of methodologies. We give a short summary of different types of approaches and stress the specific difficulties that arise. We mainly concentrate on the methods considered in the simulation study in Section 4. More comprehensive reviews about threshold selection can be found in Scarrott and MacDonald (2012) and Dey and Yan (2016).

One basic concept in threshold selection is data visualisation, which is discussed in more depth in Kratz and Resnick (1996) and Drees et al. (2000). Popular graphical diagnostics used in this context are the Zipf plot, Hill plot, QQ-plot or the mean-excess plot, to name a few. A major drawback of these methods is their subjectivity due to the necessarily personal interpretation of the plot. Further, it is a burden to choose each threshold manually, especially in high-dimensional settings or when analysing many samples. Easier ways to select the sample fraction are rule-of-thumb approaches such as using the upper 10% of the data (DuMouchel 1983) or \(k=\sqrt {n}\) (Ferreira et al. 2003). Reiss and Thomas (2007) present a procedure that tries to find a region of stability among the estimates of the extreme value index. Their method depends on a tuning parameter, whose choice is further analysed in Neves and Fraga Alves (2004). To our knowledge, no theoretical analysis exists for this approach.

Besides these, and similar, heuristic approaches, there is a class of theoretically motivated procedures that target the optimal sample fraction for specific estimation tasks, such as quantile estimation (Ferreira et al. 2003), estimation of high probabilities (Hall and Weissman 1997) or the Hill estimator, see below. Further, some suggestions compare the empirical distribution to the fitted generalized Pareto distribution (GPD) via goodness-of-fit tests (Northrop and Coleman 2014; Wadsworth 2016; Bader et al. 2018) or by minimizing the distance between them (Pickands 1975; Gonzalo and Olmo 2004; Clauset et al. 2009), where the latter approach is theoretically analysed in Drees et al. (2018). Alternatively, Bayesian procedures can be considered for threshold selection, e.g., the cross-validation approach suggested in Northrop et al. (2017) in the context of ocean storm severity.

Of particular interest to us are methods that aim to estimate the sample fraction \(k_{\text {opt}} = \arg \min \limits _{k>0}\mathbb {E}(\hat {\gamma }_{k}-\gamma )^{2}\) which minimizes the asymptotic mean squared error (AMSE) of the Hill estimator. To construct an estimator for kopt, Drees and Kaufmann (1998) utilize the Lepskii method and an upper bound on the maximum random fluctuation of \(\hat {\gamma }_{k}\) around γ. To apply their approach it is necessary to choose several tuning parameters and to obtain consistent initial estimates for γ and a second-order parameter ρ. In Guillou and Hall (2001), a test statistic is constructed based on an accumulation of log-spacings, which is tested against a critical value that has to be chosen. Danielsson et al. (2001) introduce a double bootstrap approach to estimate the optimal sample fraction. They need to choose the number of bootstrap samples and a parameter \(n_{1}\). For \(n_{1}\), a data-driven, but computationally expensive selection method is provided, where the whole bootstrap procedure is repeated for various possible values of \(n_{1}\). Another estimator for kopt is given in Beirlant et al. (2002), which employs least squares estimates from an exponential regression approach. The method depends on an estimate for ρ and a sample fraction k0. A different approach is taken in Goegebeur et al. (2008), where the properties of kernel test statistics regarding the estimation of bias are used in order to construct an estimator for the AMSE/γ and minimize it with respect to k. Here the choice or estimation of the second-order parameter ρ is needed.

In this paper, we contribute to the problem of threshold selection by introducing two new and easily applicable methods, which do not require the user to manually choose any tuning parameters. The first method, presented in Section 2, is inspired by the idea of testing the exponential approximation. We estimate the integrated square error (ISE) of the exponential density under the assumption that the log-spacings are indeed exponentially distributed. The error functional we obtain, called the inverse Hill statistic (IHS), is very easy to compute and does not depend on any tuning parameters. Since this criterion is highly variable for small k, it can additionally be smoothed to improve the performance. The minimizing oracle sample fraction of IHS is asymptotically smaller than kopt, as it is stricter against deviations from the exponential approximation. However, this estimator performs remarkably well in small samples, as illustrated in our simulation study.

In the second approach, we suggest a smooth estimator for the AMSE of the Hill estimator, called SAMSEE (smooth AMSE estimator). This estimator is based on a preliminary estimate of γ and an estimate of the bias of the Hill estimator. For the preliminary estimate of γ we use the generalized Jackknife approach of Gomes et al. (2000). By minimizing SAMSEE we estimate the optimal sample fraction kopt. For estimation, the choice of a large sample fraction K is necessary, for which we present a data-driven selection procedure in Section 3. SAMSEE utilizes the idea of fixing ρ = − 1, which is justified by good performance in simulations and leads to a simpler and more robust estimator than employing an estimate \(\hat {\rho }\).

After introducing the two novel threshold selection methods in Sections 2 and 3 we compare these methods to various other approaches in a numerical analysis in Section 4. In Section 5 the importance of automated threshold selection procedures is illustrated in an application, where we non-parametrically estimate an extreme value index that varies over time. This example illustrates how the new methods enable the selection of a threshold that depends on covariates.

The proof of Theorem 3, which describes the asymptotic behaviour of the bias estimator, can be found in Appendix A, together with the auxiliary results and additional results from our simulation study.

2 Inverse Hill statistic

In this section, we introduce the first threshold selection procedure by analysing the integrated square error (ISE) between the exponential density hγ and its parametric estimator \(h_{\hat {\gamma }_{k}}\) employing the Hill estimator,

$$ \text{ISE}(k):= \int \left\{h_{\gamma}(x) -h_{\hat{\gamma}_{k}}(x)\right\}^{2}\mathrm{d}x =\frac{1}{2\gamma}-\frac{2}{\gamma +\hat{\gamma}_{k}} + \frac{1}{2\hat{\gamma}_{k}} , $$

where \(h_{\gamma }(x):=\frac {1}{\gamma }e^{-x/\gamma }\) and \(\hat {\gamma }_{k}\) is the Hill estimator. The first term of ISE is constant and thus plays no role for selecting k. The last term of ISE is known, but the second term is not. Therefore, we cannot minimize ISE directly. Instead, we want to estimate and minimize its expectation under the exponential approximation. This is based on the idea of considering the hypothesis H0 that the log-spacings Y(i,k) are indeed exponentially distributed. Under H0 the Hill estimator, as a scaled sum of independent exponential random variables, is gamma distributed, and the mean of ISE (MISE) can be calculated explicitly. We observe that MISE is a decreasing function in k under the exponential approximation,

$$ \text{MISE}(k)-\frac{1}{2\gamma}:= \mathbb{E}_{H_{0}}[\text{ISE}(k)]-\frac{1}{2\gamma} =-\frac{1}{\gamma}C(k)+\frac{k}{2(k-1)\gamma} , $$
(4)

where \(C(k):=2\exp (k) k^{k} {\Gamma }(1-k,k)\) and \({\Gamma }(a,b):={\int \limits }_{b}^{\infty } t^{a-1}e^{-t} \mathrm {d}t\) denotes the upper incomplete gamma function. One can show that C(k) ≈ 1 + 1/(4k), i.e. it converges to 1 very fast, so that we obtain

$$ \lim_{k\rightarrow\infty} \mathbb{E}_{H_{0}}\left[ \frac{2}{\gamma+\hat{\gamma}_{k}}\right] = \frac{1}{\gamma} = \mathbb{E}_{H_{0}}\left[\frac{k-1}{k\hat{\gamma}_{k}}\right] . $$

This provides an estimator for the first term in Eq. 4 under H0. However, owing to its high variability for small k, we instead want to find an estimator of the form \(w/\hat {\gamma }_{k}\) for some w depending on k that minimizes the MSE of that estimator under the exponential approximation. To do so, we approximate its MSE in the following way,

$$ \begin{array}{@{}rcl@{}} \mathbb{E}_{H_{0}}\left[\left( \frac{w}{\hat{\gamma}_{k}}-\frac{2}{\hat{\gamma}_{k}+\gamma}\right)^{2}\right] \approx \frac{w^{2}k^{2}}{\gamma^{2}(k-1)(k-2)} -\frac{2wk}{\gamma^{2}(k-1)} + \frac{1}{\gamma^{2}}. \end{array} $$
(5)

The approximation involves functions similar to C(k), which quickly become constant. The MSE in Eq. 5 is minimized for w = (k − 2)/k. Thus, we suggest the inverse Hill statistic

$$ \begin{array}{@{}rcl@{}} \text{IHS}(k) := \frac{1}{2\hat{\gamma}_{k}} -\frac{k-2}{\hat{\gamma}_{k} k} = \frac{4-k}{2\hat{\gamma}_{k} k} \end{array} $$

to estimate \(\text{MISE}(k) - (2\gamma)^{-1}\), and we select the sample fraction by minimizing IHS, \(\hat {k}_{\text {IHS}} := \arg \min \limits _{1<k<n}\{ \text {IHS}(k)\}\). By minimizing IHS we select a sample fraction where IHS starts increasing and thereby contradicts the behaviour of MISE under H0. This criterion can be compared to hypothesis testing with a high significance level α, which implies seeking high confidence when deciding not to reject H0. Further properties of \(\hat {k}_{\text {IHS}}\) are analysed theoretically in Section 2.1 and for finite samples in a numerical study in Section 4.
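A minimal R sketch of this selection rule (our own illustration; hill_all() refers to the sketch in Section 1):

```r
# Sketch of the IHS criterion and of k_hat_IHS; gamma_hat is the Hill path,
# e.g. hill_all(x) from the sketch in Section 1.
ihs_select <- function(gamma_hat) {
  k     <- seq_along(gamma_hat)
  ihs   <- (4 - k) / (2 * gamma_hat * k)   # IHS(k)
  k_hat <- which.min(ihs[-1]) + 1          # arg min over 1 < k < n
  list(k_hat = k_hat, gamma_hat = gamma_hat[k_hat], ihs = ihs)
}
```

For the negative-bias variant IHS⁻ discussed below, only the sign in the numerator changes, i.e. (4 + k) instead of (4 − k).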

Figure 1 illustrates that IHS exhibits high fluctuations for small k, which might make the automated selection of the minimum highly variable. To control this problematic behaviour we smooth IHS. More specifically, we want to estimate \(\mathbb {E}[\text {IHS}]\) by considering the regression problem

$$ \begin{array}{@{}rcl@{}} \text{IHS}(k) = \mathbb{E}[\text{IHS}(k)] + \sigma\epsilon_{k},\ \ k=1,\dots,n, \end{array} $$

where σ > 0 and \(\mathbb {E}[\epsilon _{k}]=0\). Due to the structure of the Hill estimator, the random variables 𝜖k are highly dependent, which needs to be taken into account in estimation. In our simulations, we apply a Bayesian non-parametric procedure introduced by Serra et al. (2018) to obtain a smooth estimator for the expectation of IHS – denoted as sIHS – exhibiting less variation for small k, as illustrated in Fig. 1, and thereby improving automatic threshold selection. The approach implemented in the eBsc R-package is completely data-driven and non-parametric: both the mean and the covariance structure of the data are estimated non-parametrically, with the smoothing parameters chosen from the data. Alternatively, one can employ any other smoothing technique that accounts for dependence in the errors (Opsomer et al. 2001; Krivobokova and Kauermann 2007; Lee et al. 2010). A very good choice is the well-established function gamm from the mgcv package. However, for this procedure some parameters need to be fixed in advance. We experimented with several settings and found that choosing the number of knots to be at least 40 and specifying the correlation structure as corAR1() delivers results very much comparable to those of eBsc.
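As an illustration of this alternative smoothing step (a sketch under the settings just stated, not the authors' implementation; the basis dimension k = 40 mirrors the knot recommendation above and requires more than 40 values of the criterion):

```r
# Sketch: smooth IHS(k) with mgcv::gamm using an AR(1) error structure.
library(mgcv)                  # gamm(); loads nlme, which provides corAR1()

smooth_ihs <- function(ihs) {
  dat <- data.frame(y = ihs, ks = seq_along(ihs))
  fit <- gamm(y ~ s(ks, k = 40), data = dat, correlation = corAR1())
  as.numeric(predict(fit$gam, newdata = dat))  # sIHS: estimate of E[IHS(k)]
}
```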

Fig. 1 On the left, IHS (dashed) and sIHS (red) are plotted for a Fréchet(2) sample of size n = 500. On the right, the Hill plot for the same sample is shown with the minimizing k of IHS (black) and sIHS (red); the dotted line marks the true value γ = 0.5

We finally want to remark on the relation between IHS and ISE, which is given by

$$ \text{IHS} +\frac{1}{2\gamma}= \text{ISE} +\frac{2}{k\hat{\gamma}_{k}} +\frac{\hat{\gamma}_{k}- \gamma}{\hat{\gamma}_{k}(\hat{\gamma}_{k}+\gamma)}. $$
(6)

This equation points out that minimizing IHS does not minimize ISE, as IHS takes an additional bias term into account, and that the sign of the bias of \(\hat {\gamma }_{k}\) plays a role. If the bias is negative, we suggest using

$$ \begin{array}{@{}rcl@{}} \text{IHS}^{-}(k) := \frac{4+k}{2\hat{\gamma}_{k} k}\quad \text{and} \quad \hat{k}_{\text{IHS}^{-}} := \arg\min_{1<k<n}\ \text{IHS}^{-}(k) \end{array} $$

instead of IHS, where the two cases can easily be distinguished by inspecting the Hill estimator for large k. If the bias of the Hill estimator is positive, which is the standard case, we can see from Eq. 6 that IHS selects smaller k (larger thresholds) than ISE. This is not surprising, because we estimate the expectation of the ISE under the hypothesis that the exponential approximation holds. This is a much more conservative error functional, meaning it is stricter against deviations from the exponential distribution.

In conclusion, with IHS we do not aim to estimate kopt but to find a sample fraction where we can be very certain that the exponential approximation still holds. The performance of sIHS in small samples is illustrated in simulations and an application in Sections 4 and 5.

2.1 Theoretical analysis of IHS

In order to understand the IHS asymptotically, we consider the second-order condition,

$$ \underset{t\rightarrow\infty}{\lim} \frac{\frac{U(tx)}{U(t)}-x^{\gamma}}{A(t)} =x^{\gamma}\frac{x^{\rho}-1}{\rho}, $$
(7)

for x > 0 and with second-order parameter ρ < 0. Here, A(t) denotes a function converging to zero as t tends to infinity, and |A| is regularly varying with index ρ. Further, U is defined by \(U(x):=F^{\leftarrow }\!\left (1-\frac {1}{x}\right )\), where \(F^{\leftarrow }\) denotes the left-continuous inverse of the distribution function F. In this setting the following asymptotic normality statements for the Hill estimator \(\hat {\gamma }_{k}\) hold.

Theorem 1 (Theorem 3.2.5 in de Haan and Ferreira, 2006)

Let \(X_{1},\dots ,X_{n}\) be i.i.d. random variables with distribution function F ∈ DoA(Gγ) for γ > 0. If Eq. 7 holds and k is an intermediate sequence, i.e. \(k\rightarrow \infty \) and \(k/n\rightarrow 0\) as \(n\rightarrow \infty \), then

$$ \begin{array}{@{}rcl@{}} \sqrt{k}(\hat{\gamma}_{k} - \gamma) \overset{\mathcal{D}}{\longrightarrow}\ \mathcal{N}\left( \frac{\lambda}{(1-\rho)}, \gamma^{2}\right), \end{array} $$

as \(n\rightarrow \infty \) and with \(\lambda := \underset {n\rightarrow \infty }{\lim } \sqrt {k}A(n/k)\).

Theorem 2

Under the conditions of Theorem 1, it holds as \(n\rightarrow \infty \) that

$$ \begin{array}{@{}rcl@{}} \sqrt{k} \left( \frac{1}{\hat{\gamma}_{k}}-\frac{1}{\gamma} \right) \overset{\mathcal{D}}{\longrightarrow}\ \mathcal{N}\left( \frac{-\lambda }{(1-\rho) \gamma^{2}}, \frac{1}{\gamma^{2}} \right). \end{array} $$

Proof

Apply the delta method to Theorem 1. □

Following the reasoning in de Haan and Ferreira (2006), page 78, the minimizing point of the AMSE of the Hill estimator can be found explicitly when considering \(A(t) = ct^{\rho}\) with c ≠ 0. In this special case, the minimizing sample fraction can be expressed as

$$ k_{\text{opt}}= \left( \frac{\gamma^{2} (1-\rho)^{2}}{-2\rho c^{2}} \right)^{1/(1-2\rho)} n^{-2\rho/(1-2\rho)}. $$
(8)
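For illustration (our own numerical example; the value c = 1 is an arbitrary assumption), Eq. 8 can be evaluated directly:

```r
# Direct evaluation of Eq. 8 for given gamma, rho and c.
k_opt <- function(n, gamma, rho, c) {
  (gamma^2 * (1 - rho)^2 / (-2 * rho * c^2))^(1 / (1 - 2 * rho)) *
    n^(-2 * rho / (1 - 2 * rho))
}
k_opt(n = 500, gamma = 0.5, rho = -1, c = 1)   # approximately 50
```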

Under the same assumption, we can calculate the minimizing point kIHS of the asymptotic expectations of IHS and \(\text{IHS}^{-}\). Let \(\mathbb {A}\mathbb {E}\) denote the asymptotic expectation referring to the expectation of the limiting distribution in Theorem 2. Then

$$ \begin{array}{@{}rcl@{}} k_{\text{IHS}} &:= \arg\min_{k}\ \mathbb{A}\mathbb{E}[\text{IHS}]=\arg\min_{k}\left\{ \frac{2}{\gamma k} +\frac{A(n/k)}{2\gamma^{2}(1-\rho) }\cdot \frac{k-4}{k} \right\} \\ &\approx \arg\min_{k}\left\{ \frac{2}{\gamma k} +\frac{A(n/k)}{2\gamma^{2}(1-\rho) } \right\} = \left( \frac{4\gamma(1-\rho)}{-\rho c}\right)^{1/(1-\rho)} n^{-\rho/(1-\rho)} . \end{array} $$

It is easy to check that the same formula holds for \(\text{IHS}^{-}\) if c is replaced by its absolute value. Comparing kopt and kIHS for a fixed \(\rho >-\infty \) we obtain that

$$ \begin{array}{@{}rcl@{}} \frac{k_{\text{IHS}}}{k_{\text{opt}}} \approx \left( \frac{-\rho}{32}\cdot k_{\text{IHS}}\right)^{-1/(1-2\rho)}\approx d\cdot n^{\frac{\rho}{(1-2\rho)(1-\rho)}} \longrightarrow 0, \end{array} $$
(9)

as \(n\rightarrow \infty \) and for a constant d depending on ρ, γ and c. This supports what Eq. 6 already suggested: minimizing IHS asymptotically yields a smaller k than kopt. Thus, kIHS is asymptotically suboptimal for the Hill estimator but is still an intermediate sequence and hence leads to consistent estimates. For finite samples the ratio crucially depends on ρ, and kIHS can also be larger than kopt, as illustrated in Fig. 2. It also holds that \(k_{\text {IHS}}/k_{\text {opt}}\rightarrow 1\) as \(\rho \rightarrow -\infty \), since both sample fractions converge to n in this case.
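For the reader's convenience, we spell out the short calculation behind the first approximation in Eq. 9. Since \(k_{\text {IHS}}^{1-\rho } = 4\gamma (1-\rho )n^{-\rho }/(-\rho c)\), comparing with Eq. 8 gives

$$ \begin{array}{@{}rcl@{}} k_{\text{opt}}^{1-2\rho} = \frac{\gamma^{2}(1-\rho)^{2}}{-2\rho c^{2}}\, n^{-2\rho} = \frac{1}{-2\rho}\left( \frac{-\rho}{4}\, k_{\text{IHS}}^{1-\rho}\right)^{2} = \frac{-\rho}{32}\, k_{\text{IHS}}^{2-2\rho}, \end{array} $$

and solving for kopt and dividing yields \(k_{\text {IHS}}/k_{\text {opt}} = ((-\rho /32) k_{\text {IHS}})^{-1/(1-2\rho )}\).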

Fig. 2 The approximation of the quotient kIHS/kopt in Eq. 9 is plotted as a function of the second-order parameter ρ with γ = c = 1 for n = 500 (solid), n = 5000 (dashed) and n = 50000 (dotted)

Although kIHS is of smaller order than kopt asymptotically, the simulation study in Section 4 shows that \(\hat {k}_{\text {IHS}}\) works quite well in small samples and in particular when used for quantile estimation of certain processes. We consider the following quantile estimator for the (1 − p)-quantile,

$$ \hat{q}_{k}(p) = X_{(n-k,n)} \left( \frac{k}{np} \right)^{\hat{\gamma}_{k}}, $$
(10)

as defined in Theorem 4.3.8 in de Haan and Ferreira (2006). This theorem shows that the sample fraction kopt also minimizes the asymptotic relative MSE of \(\hat {q}_{k}(p)\). For finite samples, however, the quantile estimator seems to benefit from kIHS, since in this case IHS often works better than procedures estimating kopt, see Section 4. This can have different reasons, two of which are illustrated by Fig. 3. Figure 3 displays the empirical expectation of IHS and of SAMSEE defined in Section 3, the empirical versions of the MSE of \(\hat {\gamma }_{k}\) and the relative MSE of the quantile estimator,

$$ \text{MSEQ}:= \mathbb{E}\left[ \left( \frac{ \hat{q}_{k}(p)}{q(p)}-1 \right)^{2} \right]/ \log\left( \frac{k}{np}\right), $$
(11)

as used in Theorem 4.3.8 in de Haan and Ferreira (2006). The estimator \(\hat {q}_{k}(p)\) is more sensitive to changes in k than \(\hat {\gamma }_{k}\), and thus the empirical version of MSEQ seems to require many more samples than the MSE of the Hill estimator to reach a stable estimate.
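A minimal R sketch (ours) of the quantile estimator in Eq. 10:

```r
# Weissman-type quantile estimator of Eq. 10 for a sample x, a selected
# sample fraction k and a tail probability p.
quantile_hat <- function(x, k, p) {
  n  <- length(x)
  xs <- sort(x)                                         # X_(1,n) <= ... <= X_(n,n)
  gk <- mean(log(xs[(n - k + 1):n])) - log(xs[n - k])   # Hill estimate (Eq. 3)
  xs[n - k] * (k / (n * p))^gk
}
```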

Fig. 3 Empirical expectations of IHS plus a constant (red), MSE (black), MSEQ (pink) and SAMSEE (blue, see Section 3) are presented. The left plot is based on 10,000 samples of size 500 from a Fréchet(2) distribution. The right plot is based on 500 samples of size 5000 from a Loggamma distribution

On the left of Fig. 3, we observe that kIHS is indeed smaller than kopt for a Fréchet distribution, where ρ = − 1, but so is the minimizer of MSEQ. The graphic on the right highlights the similarities between MSE, MSEQ and IHS for the boundary case ρ = 0 of the Loggamma distribution. The method SAMSEE, which is defined in the next section, overestimates kopt on average in the Loggamma case and its mean lies between kIHS and kopt for the Fréchet distribution.

3 Smooth AMSE estimator

In this section, we illustrate a way to estimate kopt via minimizing a smooth estimator for the AMSE of the Hill estimator, called SAMSEE. From Theorem 1 it is easy to see that the AMSE of the Hill estimator is

$$ \mathbb{A}\mathbb{E}[(\hat{\gamma}_{k}-\gamma)^{2}]= \frac{\gamma^{2}}{k} + \frac{A(n/k)^{2}}{(1-\rho)^{2}}. $$
(12)

To estimate the AMSE as a function of k, we employ two estimators, one for γ and one for the bias term as a combination of ρ and A. First we explain how we estimate γ and then we define the bias estimator. For γ we consider the generalized Jackknife estimator \(\hat {\gamma }_{k}^{\text {GJ}}\) introduced by Gomes et al. (2000) as \(\gamma _{n}^{\mathrm {G}_{1}}\). This estimator is defined by

$$ M_{n,k}:= \frac{1}{k}\sum\limits_{i=1}^{k} Y_{(i,k)}^{2},\quad \hat{\gamma}^{\mathrm{V}}_{k}:=\frac{M_{n,k}}{2\hat{\gamma}_{k}} , \text{ and} \quad \hat{\gamma}_{k}^{\text{GJ}}:= 2\hat{\gamma}^{\mathrm{V}}_{k} - \hat{\gamma}_{k}, $$
(13)

where Y(i,k) denotes the log-spacings as in Eq. 3. Note that \(\hat {\gamma }^{\mathrm {V}}_{k}\) is the de Vries estimator introduced under this name in de Haan and Peng (1998) and \(\hat {\gamma }_{k}\) is the Hill estimator as above. The generalized Jackknife estimator has a reduced bias compared to the Hill estimator and is even asymptotically unbiased if ρ = − 1, see (2.11) in Gomes et al. (2000).
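The following R sketch (ours) evaluates the three estimators in Eq. 13 along the whole path k = 1,…,n − 1, expanding \(M_{n,k}\) so that all k can be computed with cumulative sums:

```r
# De Vries and generalized Jackknife estimators of Eq. 13 for all k.
gj_all <- function(x) {
  ls <- sort(log(x), decreasing = TRUE)        # descending log order statistics
  k  <- seq_len(length(ls) - 1)
  s1 <- cumsum(ls)[k] / k                      # mean of the top-k log values
  s2 <- cumsum(ls^2)[k] / k
  gh <- s1 - ls[k + 1]                         # Hill estimator (Eq. 3)
  M  <- s2 - 2 * ls[k + 1] * s1 + ls[k + 1]^2  # M_{n,k} = mean of Y_(i,k)^2
  gv <- M / (2 * gh)                           # de Vries estimator
  list(hill = gh, devries = gv, gj = 2 * gv - gh)  # generalized Jackknife
}
```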

Danielsson et al. (2001) use \((\hat {\gamma }^{\mathrm {V}}_{k}-\hat {\gamma }_{k})\) to assess the bias of \(\hat {\gamma }_{k}\) and apply a double bootstrap procedure to stabilize this highly varying estimate. We also use the difference of two estimators for γ, but now consider the following averages of Hill estimators,

$$ \bar{\gamma}_{k,K} := \frac{1}{K-k+1}\sum\limits_{i=k}^{K} \hat{\gamma}_{i} \quad \text{and} \quad \bar{\gamma}_{k} := \bar{\gamma}_{1,k}= \frac{1}{k}\sum\limits_{i=1}^{k} \hat{\gamma}_{i}, $$

where k < K is chosen to obtain a stable bias estimate. The idea to average the Hill estimator in order to reduce variation in the sample path is also studied in Resnick and Stǎricǎ (1997). These averages naturally contain a lot of structural information about the underlying asymptotic bias and inspire the definition

$$ \hat{b}_{k,K}:= \bar{\gamma}_{k,K}-\bar{\gamma}_{K}. $$
(14)

Theorem 3

Under the conditions of Theorem 1 and for \(k/K\rightarrow c\) with 0 < c < 1 as \(n\rightarrow \infty \), it holds for \(\hat {b}_{k,K}\) defined in Eq. 14 that

$$ \begin{array}{@{}rcl@{}} \sqrt{k}\cdot\hat{b}_{k,K} \overset{\mathcal{D}}{\longrightarrow} \mathcal{N}\left( \frac{-\rho\lambda}{(1-\rho)^{2}}\delta_{\rho}(c),\ \gamma^{2} \nu(c) \right), \end{array} $$

where \(\delta_{\rho }(c) = (c^{\rho } - 1)/(-\rho (c^{-1} - 1))\) and \(\nu (c)=2c^{2}/(1-c)^{2}\cdot (1-c+c\log (c))\) with 0 ≤ ν(c) ≤ 1.

Proof

The proof is given in Appendix A. □

By Theorem 3 the estimator \(\hat {b}_{k,K}\) has the following property if ρ = − 1, because in this case the function \(\delta_{-1}(c)\) is equal to 1,

$$ \mathbb{A}\mathbb{E}\left[\hat{b}_{k,K}\right]=\frac{-\rho A(n/k)}{(1-\rho)^{2}} = \frac{1}{2} \frac{A(n/k)}{(1-\rho)} =\frac{1}{2}\mathbb{A}\mathbb{E}\left[\hat{\gamma}_{k}-\gamma\right]. $$
(15)

It remains to choose an appropriate K that is large enough to allow for minimization over all relevant k and small enough to be an intermediate sequence itself, see Theorem 3. To find such a K we use the following relation between the estimators that holds if ρ = − 1,

$$ \begin{array}{@{}rcl@{}} \mathbb{A}\mathbb{E}[\hat{\gamma}_{k}] = \mathbb{A}\mathbb{E}[\hat{\gamma}^{\mathrm{V}}_{k}+ \hat{b}_{k,K}]. \end{array} $$
(16)

This provides us with a relatively stable function in k that has the same asymptotic expectation as the highly non-smooth Hill estimator. We want to find an intermediate sequence K for which Eq. 16 holds and thus define

$$ \text{AD}(K):=\frac{1}{K}\sum\limits_{k=1}^{K} \left( \hat{\gamma}^{\mathrm{V}}_{k}+ \hat{b}_{k,K} - \hat{\gamma}_{k} \right)^{2} $$
(17)

to measure the deviation from approximation (16) uniformly over all kK. We want to find the minimum of AD(K), but not the one that occurs due to high variation for small k. We look for a K such that the stabilized numerical approximation of the derivative of AD is closest to zero,

$$ \begin{array}{@{}rcl@{}} K^{*} :=\underset{K}{\arg\min}\bigg\{ \sum\limits_{i=-2, i\neq 0}^{2} \bigg|\frac{\text{AD}(K)- \text{AD}(K+i)}{i} \bigg| \bigg\}. \end{array} $$
(18)

Now we combine the previously described estimators to approach the AMSE in Eq. 12 under the assumption that ρ = − 1. The idea of misspecifying ρ to simplify estimation – by avoiding the additional uncertainty of estimating ρ or selecting an influential tuning parameter – was already used, for example, by Gomes et al. (2000), Drees and Kaufmann (1998) and Goegebeur et al. (2008). It is also motivated by the simulations in Section 3.1.

Considering \(K^{*}\) from Eq. 18 and the property of \(\hat {b}_{k,K}\) in Eq. 15, we define estimators for the AMSE of the Hill estimator and for kopt by

$$ \begin{array}{@{}rcl@{}} \text{SAMSEE}(k) &:=& \frac{(\hat{\gamma}_{K^{*}}^{\text{GJ}})^{2}}{k} + 4\hat{b}_{k,K^{*}}^{2}, \\ \hat{k}_{\text{SAMSEE}}&:=& \underset{1<k< K^{*}}{\text{argmin}}\ \text{SAMSEE}(k). \end{array} $$
(19)
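Putting Eqs. 14 and 17–19 together, a compact R sketch of the procedure could look as follows (our reading of the definitions, reusing gj_all() from the sketch above; it is not the authors' code):

```r
# SAMSEE threshold selection: bias estimate (Eq. 14), AD (Eq. 17),
# K* (Eq. 18) and the minimiser of SAMSEE (Eq. 19).
samsee_select <- function(x) {
  est <- gj_all(x)
  gh  <- est$hill
  gv  <- est$devries
  m   <- length(gh)
  gbar  <- cumsum(gh) / seq_len(m)                 # running mean of the Hill path
  b_hat <- function(k, K) mean(gh[k:K]) - gbar[K]  # Eq. 14
  AD  <- function(K) mean((gv[1:K] + sapply(1:K, b_hat, K = K) - gh[1:K])^2)
  ADv <- sapply(seq_len(m), AD)
  Ks  <- 3:(m - 2)                                 # keep K +/- 2 inside the range
  crit <- sapply(Ks, function(K)
    sum(abs((ADv[K] - ADv[K + c(-2, -1, 1, 2)]) / c(-2, -1, 1, 2))))
  K_star <- Ks[which.min(crit)]                    # Eq. 18
  k   <- 2:(K_star - 1)
  sam <- est$gj[K_star]^2 / k + 4 * sapply(k, b_hat, K = K_star)^2   # Eq. 19
  list(K_star = K_star, k_hat = k[which.min(sam)], samsee = sam)
}
```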

The left panel of Fig. 4 displays SAMSEE for a Fréchet sample with parameters γ = 1/2 and ρ = − 1. The Hill plot of the same sample for all \(k \leq K^{*} = 388\) is shown in the right panel of Fig. 4.

Fig. 4 SAMSEE with \(K^{*} = 388\) on the left, next to the Hill plot for the same Fréchet(2) random sample of size n = 500 for \(k \leq K^{*}\). The red dot indicates the selected sample fraction \(\hat {k}_{\textup {SAMSEE}}\) in the left plot and the adaptive Hill estimate \(\hat {\gamma }_{\hat {k}_{\textup {SAMSEE}}}\) in the right plot

This smooth estimate of the AMSE can be useful beyond the context of threshold selection. SAMSEE evaluates how well each k performs as a threshold in the peaks-over-threshold model, where the lowest values indicate a good fit. Thus, SAMSEE could be used to construct a transition function between bulk and tail distribution in extreme value mixture models or an empirical prior for the threshold in Bayesian threshold selection approaches, see Scarrott and MacDonald (2012) for a review on mixture models.

3.1 SAMSEE if ρ ≠ −1

In the broader context of an unknown second-order parameter ρ, we note that the generalized Jackknife estimator is no longer asymptotically unbiased and that the behaviour of our bias estimator changes. From Theorem 3 it follows that

$$ \mathbb{A}\mathbb{E}[\hat{b}_{k,K}] = \frac{-\rho A(n/k)}{(1-\rho)^{2}} \cdot \delta_{\rho}(k/K). $$

If ρ ≠ −1, we observe that the function δρ(k/K) influences the rate at which the bias estimator increases with k. We can still apply SAMSEE in this situation and select \(K^{*}\) from Eq. 18. However, approximation (16) does not hold anymore, and instead it holds that

$$ \begin{array}{@{}rcl@{}} \mathbb{A}\mathbb{E}[\hat{\gamma}_{k} - \hat{\gamma}^{\mathrm{V}}_{k}]- \mathbb{A}\mathbb{E}[\hat{b}_{k,K}] = \frac{-\rho A(n/k)}{(1-\rho)^{2}}\left( 1-\delta_{\rho}(k/K) \right) . \end{array} $$
(20)

The absolute value of the error described by Eq. 20 is large if δρ strongly differs from 1 and the bias term A is large. If ρ ≠ −1, δρ indeed deviates from 1, and the difference in Eq. 20 is minimized by minimizing A. This is why applying Eq. 18 leads to a small \(K^{*}\) in this case. On the other hand, if δρ = 1, the approximation stays valid for an increasing bias and \(K^{*}\) will typically be larger.

An alternative to fixing ρ = − 1 is to incorporate a consistent estimator \(\hat {\rho }\) of the second-order parameter. Estimators for ρ can be found in Caeiro and Gomes (2014), Gomes et al. (2002), Fraga Alves et al. (2003) or Drees and Kaufmann (1998). Such an estimator can be included into the SAMSEE procedure via

$$ \begin{array}{@{}rcl@{}} &&\qquad K_{\hat{\rho}}^{*} :=\underset{K}{\arg\min}\bigg\{ \sum\limits_{i=-2, i\neq 0}^{2} \bigg|\frac{\text{AD}_{\hat{\rho}}(K)- \text{AD}_{\hat{\rho}}(K+i)}{i} \bigg| \bigg\},\\ &&\text{where }\text{AD}_{\hat{\rho}}(K) :=\frac{1}{K}\sum\limits_{k=1}^{K} \left( \hat{\gamma}^{\mathrm{V}}_{k}+ \hat{b}_{k,K}/\delta_{\hat{\rho}}(k/K) - \hat{\gamma}_{k} \right)^{2}, \end{array} $$

and

$$ \begin{array}{@{}rcl@{}} \text{SAMSEE}_{\hat{\rho}}(k)&:=& \frac{(\hat{\gamma}_{K_{\hat{\rho}}^{*}}^{\text{GJ}})^{2}}{k} + \bigg((1-\hat{\rho}) \frac{ K_{\hat{\rho}}^{*}/k-1}{(k/ K_{\hat{\rho}}^{*})^{\hat{\rho}}-1}\cdot \hat{b}_{k, K_{\hat{\rho}}^{*}}\bigg)^{2}, \end{array} $$
(21)
$$ \begin{array}{@{}rcl@{}} \hat{k}_{\hat{\rho},\text{SAMSEE}}&:=& \underset{1<k< K_{\hat{\rho}}^{*}}{\text{argmin}}\ \text{SAMSEE}_{\hat{\rho}}(k). \end{array} $$
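In code, only the bias term changes relative to the sketch in Section 3 (a hypothetical helper of ours; b stands for \(\hat{b}_{k,K}\) and rho_hat for an externally supplied estimate):

```r
# Squared bias term of Eq. 21; for rho_hat = -1 the correction factor
# (1 - rho)(K/k - 1)/((k/K)^rho - 1) equals 2, recovering 4 * b^2 in Eq. 19.
bias_term_rho <- function(k, K, b, rho_hat) {
  ((1 - rho_hat) * (K / k - 1) / ((k / K)^rho_hat - 1) * b)^2
}
```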

In Table 1, we present the results of a simulation study indicating for which distributions it is beneficial to use \(\hat {\rho }\) instead of ρ = − 1. We estimate ρ using the estimator \(\hat {\rho }^{(1)}\) suggested in Theorem 1 in Drees and Kaufmann (1998). The results indicate that for these processes it is sensible to fix ρ = − 1 in SAMSEE, since only for the Cauchy distribution does using \(\hat {\rho }\) perform slightly better regarding both bias and root mean squared error (RMSE). This confirms the observation already made by others (Gomes et al. 2000; Drees and Kaufmann 1998; Goegebeur et al. 2008) that it is often advisable to fix ρ = − 1 instead of allowing for further variability by including an additional estimator.

Table 1 The averages of adaptive γ estimates and their root mean squared error (in brackets) are presented for thresholds \(\hat {k}\) selected using SAMSEE or \(\textup {SAMSEE}_{\hat {\rho }}\) with the true ρ, ρ = − 1 or \(\hat {\rho }=\hat {\rho }^{(1)}\)

4 Simulation study

In the following, we numerically analyse the performance of ten threshold selection methods on heavy-tailed distributions with very different tail behaviours. The simulation study is based on the following distributions:

  • the Student-t distribution with 6 degrees of freedom, which corresponds to γ = 1/6 and ρ = − 1/3,

  • the Fréchet distribution with parameter α = 2 and distribution function \(F(x)=\exp (-x^{-\alpha })\) for x > 0, which implies γ = 1/2 and ρ = − 1,

  • the standard Cauchy distribution leading to a tail behaviour with γ = 1 and ρ = − 2,

  • the Loggamma distribution with γ = 1 and ρ = 0,

  • the Burr distribution with a parametrisation such that γ = 2, ρ = − 1 and distribution function

    $$ F(x)=1-(1+\sqrt{x})^{-1},\ \text{for } x>0, $$
  • a logarithmically perturbed Pareto distribution, namely the distribution of g(U) with γ = 1 and ρ = − 1, where \(U\sim \text {Unif}(0,1)\) and \(g(x)=x^{-1}/\log (x^{-1})\). This distribution is denoted as negative Bias owing to the negative bias of the Hill estimator when applied to random samples from this distribution (Drees et al. 2000).

On these distributions, we evaluate the methods by their root median square error (RMedSE) when adaptively estimating γ with the Hill estimator relative to the RMedSE obtained using kopt,

$$ \text{EFF}_{\gamma}(\hat{k}):=\sqrt{\frac{\mathbb{M}[(\hat{\gamma}_{\hat{k}}-\gamma)^{2}]}{\mathbb{M}[(\hat{\gamma}_{k_{\text{opt}}}-\gamma)^{2}]}}, $$

where \(\mathbb {M}\) denotes the median. These efficiency quotients, with the mean instead of the median, are also used by, e.g., Guillou and Hall (2001), Gomes et al. (2000) and Drees and Kaufmann (1998). We opted in favour of the median to reduce the influence of a few extreme outliers. The efficiency values based on the mean are additionally provided in Appendix A.

The smaller the quotient, the better the threshold selection procedure performs compared to the optimal sample fraction kopt. Values below 1 indicate that the selection method, which chooses a specific k for each sample, performs better than kopt, which is fixed for all samples of a distribution. Furthermore, we study the efficiency in quantile estimation with the estimator defined in Eq. 10 for p = 0.001,

$$ \text{EFF}_{q}(\hat{k}):=\sqrt{ \frac{\mathbb{M}[ (\hat{q}_{\hat{k}}- q)^{2} ]}{\mathbb{M}[ (\hat{q}_{k_{\text{opt}}}- q)^{2}]}}. $$

Since the true minimizer kopt of the AMSE is not known, we utilize an empirical version suggested by Gomes et al. (2000). Following their approach we approximate kopt by the mean of 20 independent replicates of \(\bar {k}_{\text {opt}}\), which is the minimizer of the empirical MSE based on 1000 samples, i.e. \(\bar {k}_{\text {opt}}= \text {argmin}_{k} \ \mathbb {E}_{n=1000} [ (\hat {\gamma }_{k}-\gamma )^{2} ]. \)
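In R, this empirical approximation can be sketched as follows (our illustration for a Fréchet(2) sample, reusing hill_all() from the sketch in Section 1):

```r
# Empirical approximation of k_opt: minimise the empirical MSE of the Hill
# path over 1000 samples, then average 20 independent replicates.
emp_k_opt <- function(n = 1000, reps = 1000, gamma = 0.5) {
  se <- matrix(0, reps, n - 1)
  for (r in seq_len(reps)) {
    x <- (-log(runif(n)))^(-gamma)     # Fréchet(1/gamma) sample
    se[r, ] <- (hill_all(x) - gamma)^2
  }
  which.min(colMeans(se))              # minimiser of the empirical MSE
}
k_opt_approx <- mean(replicate(20, emp_k_opt()))
```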

We compare these efficiency values for ten different threshold selection methods. If the choice of tuning parameters is required, we follow the recommendations given by the corresponding authors based on their numerical studies. Note that the choice of these parameters is not data-driven and may not be optimal. Most of the considered approaches are constructed for adaptive estimation of γ applying the Hill estimator. This includes one procedure that looks for a stable region among the Hill estimates, while the others aim to estimate kopt. We also evaluate the performance of the IHS approach discussed in Section 2, which aims to minimize the deviation from the exponential approximation and two methods relying on goodness-of-fit statistics. In total, the following methods are considered:

SAM: SAMSEE procedure with ρ = − 1 as defined in Eq. 19 in Section 3,

GH: Method by Guillou and Hall (2001) utilizing \(c_{\text{crit}}\) = 1.25 and p = 1,

DK: Procedure by Drees and Kaufmann (1998) with fixed ρ = − 1,

GO: Approach by Goegebeur et al. (2008) defined in their equation (3.3) with fixed ρ = − 1,

DB: Double bootstrap approach by Danielsson et al. (2001) with \(n_{1}\) minimizing a criterion suggested in their article, where \(n_{1} = n^{1 - 1/b}\) with b ∈{4,7,12,24,38},

B: Method by Beirlant et al. (2002) with ρ = − 1 and taking the median of the estimates for \(k_{0}\in \{3,\dots ,n/2\}\),

RT: Method by Reiss and Thomas (2007) with β = 0 as suggested by Neves and Fraga Alves (2004),

C: Method defined in equation (3.9) of Clauset et al. (2009),

NC: Testing approach by Northrop and Coleman (2014) described in their Section 4.2, using α = 0.2 and 15 threshold suggestions from the 0.5 to the 0.95 empirical quantile,

sIHS: IHS smoothed by using the eBsc package, see Section 2.

Figures 5 and 6 illustrate the efficiency values obtained in our simulation study from random samples of size n = 150, 500 and 5000 drawn from the distributions above. The figures provide a convenient overview of the spread of the efficiency values for each method over the range of considered distributions. All values, including those that are too large for the detailed graphics, can be found in Tables 2 and 3 in Appendix A.

Fig. 5 Efficiency values EFFγ based on 2000 samples for n = 150 and n = 500, and on 500 samples for n = 5000

Fig. 6 Efficiency values EFFq for p = 0.001 based on 2000 samples for n = 150 and n = 500, and on 500 samples for n = 5000

Table 2 Efficiency values EFFγ based on 2000 samples if n = 500 and on 500 samples if n = 5000
Table 3 Efficiency values EFFq for p = 0.001 based on 2000 samples if n = 500 and on 500 samples if n = 5000

Overall, we see a very diverse picture of which methods perform best, depending on the distribution and the sample size. The graphs also show that SAMSEE is nicely concentrated close to one for all considered scenarios and especially for quantile estimation. The good performance of SAMSEE persists for even more extreme quantiles like p = 0.0001, see Table 4. When looking at adaptive estimation of γ in Fig. 5, DK performs comparably well to SAMSEE on small samples, and for larger samples DB and B show efficiency values similarly concentrated as SAMSEE. Considering quantile estimation in Fig. 6, sIHS is an alternative to SAMSEE on small samples, with even lower efficiency values for four of the six distributions. The approach B is almost as closely concentrated around 1 as SAMSEE when estimating quantiles from larger samples. The performance of sIHS depends on the underlying distribution and deteriorates with growing sample size. The estimation of γ using sIHS still improves for larger samples, but it performs worse relative to kopt, which confirms the theoretical findings in Section 2.1. However, sIHS performs impressively well, for example, for the Loggamma distribution, with efficiency values below 1 in every scenario. In our study, sIHS performs well for smaller samples and for quantile estimation. SAMSEE, on the other hand, shows impressively stable performance over the whole range of distributions, sample sizes and tasks. This makes SAMSEE a very good choice for threshold selection if no expert knowledge about the underlying distribution or about parameter tuning is available.

Table 4 Efficiency values EFFq for p = 0.0001 based on 2000 samples and n = 500

5 Application to varying extreme value index

In this section, we use the two introduced estimators for the sample fraction in a financial application on operational losses, where we are particularly interested in the distributional properties of very high losses. The observations of interest are operational losses from the Italian bank UniCredit collected from January 2005 to June 2014. The dataset provides scaled losses above 2000 Euro recorded with the corresponding date. The operational losses are grouped by the type of event that caused the specific loss. We consider the event type CPBP, i.e. losses caused by clients, products and business practices related to derivatives or other financial instruments. This group provides 16,138 observations over the period from 2005 to 2014, which seems sufficient for our local estimation approach. The number of collected losses per quarter is presented on the left of Fig. 7, next to the logarithm of the raw data over time.

Fig. 7 The quarterly number of observed CPBP losses is illustrated on the left; on the right, the logarithm of the losses is plotted over time with a red line marking the third quartile

It has been discussed before that it is reasonable to assume that the distribution of such extreme losses is heavy-tailed (Chavez-Demoulin et al. 2016; Moscadelli 2004) and that it changes with the financial market over time (Hambuckers et al. 2018; Cope et al. 2012). In Fig. 12 in Appendix A, we provide some graphical justification that this assumption holds locally for the dataset of CPBP losses. In Hambuckers et al. (2018) the data is analysed in a regularized generalized Pareto regression approach including several firm-specific, macroeconomic and financial indicators as covariates. This approach describes the dependence of the GPD parameters on various covariates via parametric functions. We consider the same data, but with a different focus. Our aim is to estimate the time-dependent extreme value index γ(t) non-parametrically with a simple ad hoc estimator that extends the estimator from de Haan and Zhou (2020) by utilizing the approaches sIHS and SAMSEE presented in Sections 2 and 3 for the local selection of a threshold. We present the estimator for γ(t) in Section 5.1, and the results of applying it to the dataset of operational losses are shown in Section 5.2.

5.1 Estimating a varying extreme value index

In de Haan and Zhou (2020), the authors discuss testing for a trend in the extreme value index and estimating it non-parametrically. They consider n independent random variables \(X_{i}\sim F_{i/n}\), where \(F_{s} \in \text{DoA}(G_{\gamma (s)})\) for s ∈ [0,1], and they introduce the following estimator for γ(s), which locally applies the Hill estimator and is based on a global sample fraction k,

$$ \hat{\gamma}_{k}(s):= \frac{1}{2kh} \sum\limits_{i\in I_{n}(s)} \left( \log X_{i} - \log X_{(\lfloor 2nh\rfloor-\lfloor 2kh\rfloor, \lfloor 2nh\rfloor )} \right)^{+}, $$
(22)

where \(I_{n}(s)\) is the h-neighbourhood of s, i.e. \(I_{n}(s) := \{i : |i/n - s| \leq h\}\). This estimator depends on the choice of the bandwidth h and the global sample fraction k, which is then rescaled to 2kh for the individual regions \(I_{n}(s)\). A small bandwidth h leads to very high variability in \(\hat {\gamma }_{k}(s)\), while a large value of h smooths out all interesting features. Thus, the choice of h should balance these two effects. We suggest a modification of their estimator, where we locally estimate a threshold \(\hat {k}(s)\), i.e.

$$ \hat{\gamma}_{\hat{k}(s)}(s):= \frac{1}{\hat{k}(s)} \sum\limits_{i\in I_{n}(s)} \left( \log X_{i} - \log X_{(\lfloor 2nh\rfloor-\hat{k}(s), \lfloor 2nh\rfloor )} \right)^{+}. $$
(23)

To compare these two approaches, we repeat the simulation presented in de Haan and Zhou (2020), see their Figure 2(i), on samples of size n = 5000 with \(X_{i}\sim \text {Fr\'{e}chet}(1/\gamma (i/n))\), \(\gamma (s)=1+c\cdot \sin \limits (2\pi s)\) and c = 1/4. For both versions of the estimator the bandwidth is fixed to h = 0.025. Figure 8 illustrates the benefits of locally optimizing the threshold via SAMSEE from Section 3, as it strongly tightens the empirical confidence interval around the average, which is obtained from the 2.5% and 97.5% empirical quantiles among 2000 estimates.
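A sketch of the localised estimator in Eq. 23 (our illustration; select_k stands for any data-driven selector returning a sample fraction, e.g. samsee_select() from the sketch in Section 3 or the sIHS pipeline from Section 2):

```r
# Local Hill estimate at time s with bandwidth h and a locally selected k;
# select_k must return a value smaller than the local sample size.
gamma_local <- function(x, s, h, select_k) {
  n  <- length(x)
  xs <- sort(x[abs(seq_len(n) / n - s) <= h])      # sample in I_n(s), ordered
  m  <- length(xs)
  kk <- select_k(xs)                               # local threshold k_hat(s)
  mean(log(xs[(m - kk + 1):m])) - log(xs[m - kk])  # Hill estimator on I_n(s)
}
```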

Fig. 8 The true extreme value index \(\gamma (s)=1+1/4\cdot \sin \limits (2\pi s)\) (red) next to the averaged estimators over 2000 samples. The estimator from de Haan and Zhou (2020) with fixed k = 200 is in black, and its empirical 95% confidence interval is dashed. The localised estimator employing SAMSEE is blue with dotted empirical confidence bounds

5.2 Functional extreme value index of operational losses

First, we test whether the extreme value index is constant over time. With the test T4 by Einmahl et al. (2016) we can reject the null hypothesis of constant γ for various k in the range of thresholds selected by our procedures. One concern in testing is possible serial dependence in the financial losses, as discussed in Section 4.2 of de Haan and Zhou (2020), where it is suggested to repeat the test on subsets of the data to reduce the serial dependence. The test T4 still shows significant results if only every second or every fifth observation is considered. Thus, we are confident that the extreme value index of the losses is indeed varying over time.

To estimate the time-dependent γ(t) we apply the estimator in Eq. 23. We select \(\hat {k}\) using sIHS from Section 2 and the SAMSEE method from Section 3. It remains to choose the bandwidth h that defines the neighbourhood \(I_{n}(s)\). To our knowledge, no data-driven automated method for choosing an optimal bandwidth in the context of the peaks-over-threshold approach is available, and its investigation lies outside the scope of this article. Instead, we use a bandwidth of 170 days, thereby focusing on the long-term trend of γ, since this bandwidth leads to a less varying extreme value index by smoothing over seasonal variations. Figure 9 shows the estimates of γ(t) that we obtain for the event type CPBP. It is clearly visible that both procedures yield similar estimates for most time points and that the simple ad hoc estimators recover an increase in the severity of high losses during the financial and Euro crisis from 2008 to 2011. A similar overall trend in the extreme value index can also be identified in the estimates of Hambuckers et al. (2018) for CPBP.

Fig. 9 The non-parametric estimate of the extreme value index of the operational losses of type CPBP, using \(\hat {k}\) from SAMSEE (solid) and sIHS (dashed) and bandwidth h = 170 days. The red area indicates the time of the financial and Euro crisis

For a more extensive discussion of the data and results of the more complex model including further covariates we refer to Hambuckers et al. (2018).