A double-indexed functional Hill process and applications

A double-indexed functional Hill process and applications. Let $X_{1,n} \leq \cdots \leq X_{n,n}$ be the order statistics associated with a sample $X_{1}, \ldots, X_{n}$ whose underlying distribution function (\textit{df}) is $F$. We are concerned with the functional asymptotic behaviour of the sequence of stochastic processes \begin{equation} T_{n}(f,s)=\sum_{j=1}^{k}f(j)\left(\log X_{n-j+1,n}-\log X_{n-j,n}\right)^{s}, \label{fme} \end{equation} indexed by classes $\mathcal{F}$ of functions $f:\mathbb{N}^{\ast}\longmapsto \mathbb{R}_{+}$ and by $s \in ]0,+\infty[$, where $k=k(n)$ satisfies \begin{equation*} 1\leq k\leq n,\quad k/n\rightarrow 0 \text{ as } n\rightarrow \infty . \end{equation*} \noindent We show that this is a stochastic process whose margins generate estimators of the extreme value index when $F$ is in the extreme domain of attraction. We focus in this paper on its finite-dimensional asymptotic laws and provide a class of new estimators of the extreme value index whose performance is compared to that of analogous estimators. The results are then particularized for one explicit class $\mathcal{F}$.

1. Introduction

1.1. General introduction. In this paper, we are concerned with the statistical estimation of the univariate extreme value index of a df $F$, when it exists. Rather than doing this with a single statistic, we use a stochastic process whose margins generate estimators of the extreme value index (SPMEEXI). To make this notion precise, let $X_1, X_2, \ldots$ be a sequence of independent copies (s.i.c.) of a real random variable (rv) $X > 1$ with df $F(x) = \mathbb{P}(X \leq x)$. $F$ is said to be in the extreme value domain of attraction of a nondegenerate df $M$ whenever there exist real and nonrandom sequences $(a_n > 0)_{n \geq 1}$ and $(b_n)_{n \geq 1}$ such that, for any continuity point $x$ of $M$,

(1.1) $\lim_{n \to \infty} \mathbb{P}\left( \frac{X_{n,n} - b_n}{a_n} \leq x \right) = \lim_{n \to \infty} F^n(a_n x + b_n) = M(x)$.
It is known that $M$ necessarily belongs to the family of Generalized Extreme Value (GEV) df's: $G_\gamma(x) = \exp(-(1+\gamma x)^{-1/\gamma})$, $1 + \gamma x \geq 0$, parameterized by $\gamma \in \mathbb{R}$. The parameter $\gamma$ is called the extreme value index. There exists a great number of estimators of $\gamma$, going back to the first of them all, Hill's estimator, defined by

$k^{-1} \sum_{j=1}^{k} j \left( \log X_{n-j+1,n} - \log X_{n-j,n} \right)$,

2000 Mathematics Subject Classification. Primary 62G32, 60F05. Secondary 62F12, 62G20.
where for each n, k = k(n) is an integer such that 1 ≤ k ≤ n, k → ∞, k/n → 0 as n → ∞.
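For concreteness, Hill's statistic above may be computed directly from the order statistics. The following is a minimal sketch, not part of the paper (the function name and the use of NumPy are ours); it implements the weighted-spacings form, which classically coincides with the threshold form $k^{-1}\sum_{j=1}^{k} \log X_{n-j+1,n} - \log X_{n-k,n}$.

```python
import numpy as np

def hill_estimator(sample, k):
    """Hill's statistic k^{-1} sum_{j=1}^{k} j*(log X_{n-j+1,n} - log X_{n-j,n}).

    Assumes a positive sample and 1 <= k <= n-1.
    """
    logs = np.log(np.sort(np.asarray(sample, dtype=float)))  # log X_{1,n} <= ... <= log X_{n,n}
    n = logs.size
    j = np.arange(1, k + 1)
    # spacings log X_{n-j+1,n} - log X_{n-j,n} for j = 1, ..., k
    spacings = logs[n - j] - logs[n - j - 1]
    return float(np.sum(j * spacings) / k)
```

By Abel summation, this equals the mean of the top $k$ log-observations minus $\log X_{n-k,n}$, which gives a quick numerical check of the implementation.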
A modern and broad account of univariate extreme value theory can be found in Beirlant, Goegebeur and Teugels [1], Galambos [2], de Haan [3] and [4], Embrechts et al. [5] and Resnick [6]. One may estimate $\gamma$ by a single statistic only; this is widely done in the literature. But one may also use a stochastic process of statistics $\{T_n(f), f \in \mathcal{F}\}$ indexed by $\mathcal{F}$, such that for any fixed $f \in \mathcal{F}$, there exists a sequence of nonrandom and positive real coefficients $(a_n(f))_{n \geq 1}$ such that $T_n^*(f) = T_n(f)/a_n(f)$ is an asymptotic estimator of $\gamma$. We name such families Stochastic Processes with Margins Estimating the EXtreme value Index (SPMEEXI's). To the best of our knowledge, the first was introduced in Lo [7] (see also Lo [8]), as a sum, over $\mathcal{P}(p,h)$, the set of all ordered partitions of $p > 0$ into positive integers, $1 \leq h \leq p$, of terms of the form $(\log X_{n-i+1,n} - \log X_{n-i,n})^{s_i}/s_i!$, for $1 \leq k < n$, $p \geq 1$, $i_0 = k$. Further, Lo et al. [9] and [10] introduced continuous and functional forms described in (1.2) below. Meanwhile, without naming it as such, Segers [11] and others considered the Pickands process $\{P_n(s), k/n \leq s \leq 1\}$. We study here the process

(1.2) $T_n(f,s) = \sum_{j=1}^{k} f(j) \left( \log X_{n-j+1,n} - \log X_{n-j,n} \right)^s$,

indexed by classes $\mathcal{F}$ of functions $f : \mathbb{N}^* = \mathbb{N}\setminus\{0\} \longrightarrow \mathbb{R}_+$, and by $s > 0$. We have two generalizations. First, for $s = 1$, we get $T_n(f,1)/k = k^{-1} \sum_{j=1}^{k} f(j)\left(\log X_{n-j+1,n} - \log X_{n-j,n}\right)$, which is the functional generalization of the Diop and Lo statistics [10] for $f(j) = j^\tau$, $\tau > 0$, and of Deme et al. [14]. Secondly, if $f$ is the identity function and $s = 1$, we see that $T_n(\mathrm{Identity}, 1)/k$ is Hill's statistic.
On the other hand, when using the threshold method, we have, with the same properties of the parameters, the following statistical process:

(1.3) $S_n(f,s) = \sum_{j=1}^{k} f(j) \left( \log X_{n-j+1,n} - \log X_{n-k,n} \right)^s$.
This leads to the couple of statistics (1.2) and (1.3). Our objective is to show that these two stochastic processes are SPMEEXI's. In this paper, we focus on the stochastic process $T_n(f,s)$, which uses sums of independent random variables. $S_n(f,s)$, by contrast, uses sums of dependent random variables; its study will be carried out in forthcoming papers.
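The margin $T_n(f,s)$ in (1.2) is equally direct to evaluate for a given $f$ and $s$. Below is a minimal sketch (the function name is ours), reusing the log-spacings of the top $k$ order statistics:

```python
import numpy as np

def T_n(sample, k, f, s):
    """T_n(f, s) = sum_{j=1}^{k} f(j) * (log X_{n-j+1,n} - log X_{n-j,n})^s.

    f maps positive integers to nonnegative reals; s > 0; 1 <= k <= n-1.
    """
    logs = np.log(np.sort(np.asarray(sample, dtype=float)))
    n = logs.size
    j = np.arange(1, k + 1)
    spacings = logs[n - j] - logs[n - j - 1]  # nonnegative log-spacings
    return float(np.sum(f(j) * spacings ** s))
```

With $f$ the identity and $s = 1$, `T_n(x, k, lambda j: j, 1.0) / k` recovers Hill's statistic, as noted above.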

1.2. Motivations and scope of the paper. As announced, we focus here on the stochastic process (1.2). We have been able to establish its finite-dimensional asymptotic distribution. As already noticed in earlier works of Lo et al. ([10], [14]), the limiting law may be Gaussian or non-Gaussian. In both cases, statistical tests may be implemented. In the case of non-Gaussian asymptotic limits, the limiting distribution is represented through an infinite series of standard exponential random variables. Its law may be approximated through Monte Carlo methods, as shown in Fall et al. [15].
We then prove that it is a SPMEEXI in the sense of convergence in probability. Both for the asymptotic distribution and for the convergence in probability, the conditions used are expressed in terms of an infinite series of standard exponential random variables and of the auxiliary functions $a$ and $p$ appearing in the representations of df's in the extreme domain of attraction, recalled in the next subsection. The conditions are then notably simplified by supposing that the df $F$ is differentiable in a neighborhood of its upper endpoint.
To show how the results work for specific classes of functions $f$, we adapt them to $f_\tau(j) = j^\tau$, $\tau > 0$. It is interesting to see that, although the asymptotic laws exist for any $\tau > 0$ and $s \geq 1$, we do not obtain an estimator of $\gamma$ in the region $\tau < s - 1$ when $s > 1$.
One advantage of using SPMEEXI's is that we may look for the best estimators, in a sense to be made precise, among all margins. We show in Theorem 3 that $T_n(f_\tau, s)$ is asymptotically Gaussian for $\tau \geq s - 1/2$. When we restrict ourselves to that domain, we are able to establish that the minimum asymptotic variance is reached for $\tau = s$. We then construct the best estimator $T_n^{(\tau)} = T_n(f_\tau, \tau)$, that is, the one with $\tau = s$. This is very important since the Hill estimator is $T_n^{(1)}$ itself and, as a consequence, the Hill estimator is an element of a set of best estimators indexed by $\tau$. In fact, it is the best of all, that is, $T_n^{(1)}$ has a smaller asymptotic variance than $T_n^{(\tau)}$, $\tau > 1$.
It will be interesting to find out whether this minimum variance can be improved upon for other functional classes.
Even when we have a minimum asymptotic variance estimator, it is not certain that its performance is better for finite samples. This is why simulation studies may reveal the best trade-off between bias and asymptotic variance. At finite sample size, the performance of an estimator is measured both by the bias and the variance, and we do not know how far the random value of the estimator is from the exact value. We will see in the simulation Section 3 that the boundary case $\tau = s - 1/2$ gives performances similar to those of the optimal case.
Before we present the theoretical results and their consequences, we feel obliged to give a brief reminder of basic univariate extreme value theory and of some related notation on which the statements of the results will rely.
1.3. Basics of Extreme Value Theory. Let us make this reminder by continuing the lines of (1.1) above. If (1.1) holds, it is said that $F$ is attracted to $M$, or $F$ belongs to the domain of attraction of $M$, written $F \in D(M)$. It is well-known that the three possible nondegenerate limits in (1.1), called extreme value df's, are the following: the Gumbel df of parameter $\gamma = 0$,

(1.4) $\Lambda(x) = \exp(-e^{-x})$, $x \in \mathbb{R}$,

the Fréchet df of parameter $\gamma > 0$,

(1.5) $\phi_{1/\gamma}(x) = \exp(-x^{-1/\gamma}) I_{(x > 0)}$,

or the Weibull df of parameter $\gamma < 0$,

(1.6) $\psi_{-1/\gamma}(x) = \exp(-(-x)^{-1/\gamma}) I_{(x < 0)} + I_{(x \geq 0)}$,

where $I_A$ denotes the indicator function of the set $A$. Now put $D(\phi) = \cup_{\gamma > 0} D(\phi_\gamma)$, $D(\psi) = \cup_{\gamma > 0} D(\psi_\gamma)$, and $\Gamma = D(\phi) \cup D(\psi) \cup D(\Lambda)$.
In fact, the limiting distribution function $M$ is defined up to an equivalence class of a binary relation $\mathcal{R}$ on the set $\mathcal{D}$ of df's. These facts allow us to parameterize the class of extremal distribution functions. For this purpose, suppose that (1.1) holds for the three df's given in (1.4), (1.5) and (1.6); we may take sequences $(a_n > 0)_{n \geq 1}$ and $(b_n)_{n \geq 1}$ such that the limits hold as in Theorem 1.

Theorem 1. We have:
(1) Karamata's representation (KARARE).
(a) If $F \in D(\phi_{1/\gamma})$, $\gamma > 0$, then

(1.8) $F^{-1}(1-u) = c\,(1 + b(u))\, u^{-\gamma} \exp\left( \int_u^1 p(t)\, t^{-1}\, dt \right)$, $0 < u < 1$,

where $\sup(|p(u)|, |b(u)|) \to 0$ as $u \to 0$ and $c$ is a positive constant, and where $c$, $p(\cdot)$ and $b(\cdot)$ are as in (1.8).
(b) If $F \in D(\Lambda)$, then $F^{-1}(1-u)$ admits a similar representation, where $d$ is a constant and the auxiliary function $a(\cdot)$ admits the KARARE

$a(u) = c\,(1 + b(u)) \exp\left( \int_u^1 p(t)\, t^{-1}\, dt \right)$,

$c$, $p(\cdot)$ and $b(\cdot)$ being defined as in (1.8). We warn the reader not to confuse this function $a(\cdot)$ with the function $a_n(\cdot,\cdot)$ which will be defined later.
Finally, we shall also use the uniform representation $\{Y_j, j \geq 1\} =_d \{G^{-1}(U_j), j \geq 1\}$, where $U_1, U_2, \ldots$ are independent and uniform random variables on $(0,1)$, $G$ is the df of $Y$, and $=_d$ denotes equality in distribution. In connection with this, we shall use the following Malmquist representation (see [16], p. 336):

$\left\{ j \log\left( U_{j+1,n}/U_{j,n} \right),\; 1 \leq j \leq n \right\} =_d \left\{ E_{1,n}, \ldots, E_{n,n} \right\}$, with $U_{n+1,n} = 1$,

where $E_{1,n}, \ldots, E_{n,n}$ is an array of independent standard exponential random variables. We write $E_j$ instead of $E_{j,n}$ for simplicity's sake. Some conditions will be expressed in terms of these exponential random variables. We are now in a position to state our first results on finite-distribution asymptotic normality.

2. Our results
We need the following conditions. First define, for $n \geq 1$ and $f$ and $s$ fixed, the normalizing sequences $a_n(f,s)$ and $\sigma_n(f,s)$. We will use two main conditions for $f$ and $s$ fixed. Further, any df in $D(G_\gamma)$ is associated with a couple of functions $(p, b)$ as given in the representations (1.8), (1.9) and (1.11). Then define, for $\lambda > 1$,

$b_n(\lambda) = \sup\{ |b(t)|,\; 0 \leq t \leq \lambda k/n \}$ and $p_n(\lambda) = \sup\{ |p(t)|,\; 0 \leq t \leq \lambda k/n \}$.

We will require below that, for some $\lambda > 1$, these quantities vanish. From now on, all limits are meant as $n \to \infty$ unless otherwise specified.
Here are our fundamental results. First, we have marginal estimators of the extreme value index, as expected. The conditions of the results are given in very general forms that allow further, specific hypotheses as particular cases. As well, although we focus here on finite-distribution limits, the conditions are stated in a way that will later permit uniform studies.
If $a_n(f,s) \to \infty$ and, for an arbitrary $\lambda > 1$, the conditions above hold, then $\left( T_n(f,s)/a_n(f,s) \right)^{1/s} \to_P \gamma$, where $\to_P$ stands for convergence in probability.
If $a_n(f,s) \to +\infty$ and, for an arbitrary $\lambda > 1$, the corresponding conditions hold, then the finite-dimensional limit laws below follow, where the $F_j^s$'s are independent and centred random variables with variance one.
Then the conditions (H0a), (H0b), (H2a), (H2b), (H1a) and (H1b) respectively hold under the corresponding assumptions, with $a_n$ bounded, $a_n \to \infty$, $a_n \sim 2 s!\, k^{1/2}$ and $a_n \sim s!\, k^{\tau - s + 1}$ in the respective cases.

Weakening the conditions for $s$ integer. When the distribution function $G$ admits an ultimate derivative at $x_0(G) = \sup\{x, G(x) < 1\}$, and this is the case for the usual df's, one may take $p(u) = 0$, as pointed out in [17]. In that case, the conditions (HE0x), (HE1x) and (HE2x), for $x = a$ or $x = b$, are much simpler. It is interesting to remark that all these simplified conditions automatically hold whenever $b_n(\lambda) \to 0$ and/or (CR1) holds. Indeed, we remark, by the Cauchy-Schwarz inequality, that for $\gamma > 0$ the corresponding conditions always hold since $b_n(\lambda) \to 0$, and that for $\gamma = 0$ the corresponding conditions hold under (CR1). This surely leads to powerful results. It also happens that, for the usual cases, the values are known ([17] or [11]).

2.3. The special case of Diop-Lo. Now it is time to see how the preceding results work for the particular class of functions $f_\tau(j) = j^\tau$, $\tau > 0$. This special study should serve as a model of how to apply the results to other specific classes. Here, we replace $f$ by $\tau$ in all the notation, meaning that $f = f_\tau$. We summarize the conditions holding, depending on $\tau > 0$ and $s \geq 1$, in Table 1. We may see the details as follows. First, the series is finite if and only if $2(s - \tau) > 1$; this gives cases (I) and (II). For case (III) in Table 1, (K2) also holds. The lines above also explain the fourth row of the table, and the third row is immediate. It is worth mentioning that the case $\tau < s - 1$ is not possible for $s = 1$. This unveils a new case compared with the former studies of Deme et al. [14] for $s = 1$.
For testing the hypothesis F ∈ D(G γ ), γ ≥ 0, we derive the following laws by the delta-method under (CR1), especially for γ = 0.
We derive the asymptotic laws of the normalized statistics $\frac{a_n(\tau,s)}{\sigma_n(\tau,s)} \cdot \frac{T_n(\tau,s)}{a_n(\tau,s)}$ for $\gamma > 0$ and $\frac{a_n(\tau,s)}{\sigma_n(\tau,s)} \cdot \frac{a(k/n)^{-s}\, T_n(\tau,s)}{a_n(\tau,s)}$ for $\gamma = 0$, and, for the new case $\tau < s - 1$, a limiting law for $\gamma > 0$. These two limiting laws also allow statistical tests based on Monte Carlo methods as in [14]. The best performance is thus achieved at the minimum value of the asymptotic variance, which is proportional to $V_n(\tau,s)^{-2}$; we then have to find the greatest value of $V_n(\tau,s)$. Maximizing this function in both $s$ and $\tau$ might be tricky. However, for a fixed $s \geq 1$, we may find the maximum value of $V_n(\tau,s)$ for $\tau \in [s - 1/2, +\infty[$, after isolating the boundary point $\tau = s - 1/2$. We prove in Subsection 4.4 below that the maximum value of $V_n(\tau,s)$ is reached when $\tau = s$. Using the formulae in Table 1, we obtain, for the normality zone $\tau \geq s - 1/2$, the best estimator with least asymptotic variance

$T_n^{(\tau)} = \left( \frac{T_n(\tau,\tau)}{a_n(\tau,\tau)} \right)^{1/\tau}$.

Its asymptotic variance (2.3) increases when $\tau$ increases. This means that the Hill estimator is the best in this sense. Now, let us move to the non-Gaussian zone, that is, $0 \leq s - 1 \leq \tau < s - 1/2$, corresponding to column (II) of Table 1. We may easily derive from (2.1) that the asymptotic variance is equivalent to a quantity still dominated by $V_n(\tau,\tau)^{-1}$. To sum up, the Hill estimator has the best asymptotic variance among all margins.
However, for finite samples, we do not know how far the centered and normalized statistic is from the limiting Gaussian variable or from the non-Gaussian limiting random variable.
Here we are obliged to fall back on simulation studies. Let us consider

(2.4) $T_n^{(\tau)} = \left( \frac{T_n(\tau,\tau)}{a_n(\tau,\tau)} \right)^{1/\tau}$ and (2.5) $T_n^{(\tau+1/2)} = \left( \frac{T_n(\tau,\tau+1/2)}{a_n(\tau,\tau+1/2)} \right)^{1/(\tau+1/2)}$.

We find that these two estimators generally behave better than Hill's and Dekkers et al.'s; at the least, they have equivalent performances. Above all, they seem to be more stable, in a sense to be made precise later. This results in smaller biases that compensate for their poorer asymptotic variance. A full report of the simulation studies is given in Section 3.
But we should keep in mind that the results presented here go far beyond the Diop-Lo family, for which Hill's estimator proves to be the least asymptotic variance estimator.
Further research will be conducted on other families of functions, in order to possibly find estimators with asymptotic variances better than $1/V_n(1,1)$.

3. Simulation studies
Nowadays, simulation studies are very sophisticated and may be very difficult to follow. Here, we want to give a serious comparison of our estimators with several analogues while keeping the study reasonably simple. Let us begin by explaining the stakes before proceeding any further. Estimators of the extreme value index generally use a number, say $k$ as in this text, of the greatest observations: $X_{n-j+1,n}$, $1 \leq j \leq k$. For almost all such estimators, we have a small bias and a great variance for large values of $k$, and the contrary happens for small values of $k$. This leads to the search for an optimal value of $k$ keeping both the bias and the variance at a low level. A related method consists in considering a range of values $kv(j) = kmin + j(kmax - kmin)/ksize$, $1 \leq j \leq ksize$, over which the observed values of the statistic are stable and approximate the index well. This second method seems preferable when comparing two estimators with respect to the bias.
So we fix a sample size $n$ and consider the range of values described above, where $kmin$ and $kmax$ are suitably chosen. Thus checking the curves of two statistics over the interval $[kmin, kmax]$ is a good tool for comparing their performances. Next, for each $j$, $1 \leq j \leq ksize$, we compute the mean square error of the estimated values of $\gamma$ for values of $k$ in a neighborhood of $kv(j)$, that is, for $k \in [kv(j) - kstab, kv(j) + kstab]$, where $kstab$ is also suitably fixed. The minimum of these MSEs certainly corresponds to the most stable zone and may be taken as the best estimation when it is low enough.
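The stability procedure just described can be sketched as follows; the variable names mirror those of the text ($kv$, $kstab$, $ksize$), while the helper name and the `estimate` argument (any extreme value index estimator taking a sample and a number $k$ of top observations) are ours:

```python
import numpy as np

def stability_mse(sample, estimate, gamma, kmin, kmax, ksize, kstab):
    """For each grid point kv(j), return the mean square error of the
    estimates of gamma over the window [kv(j) - kstab, kv(j) + kstab]."""
    kv = np.linspace(kmin, kmax, ksize).astype(int)
    mse = np.empty(ksize)
    for i, kj in enumerate(kv):
        window = range(max(2, kj - kstab), kj + kstab + 1)
        errs = [(estimate(sample, k) - gamma) ** 2 for k in window]
        mse[i] = np.mean(errs)
    return kv, mse
```

The most stable zone is then read off as the grid point achieving the minimum of the returned MSEs.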
Here, we compare our class of estimators, represented by the optimal estimator (2.4) and the boundary form (2.5) for $s \in [1,5]$, with the estimators of Hill and Dekkers et al. The estimators (2.4) and (2.5) for $s \in [1,5]$ fall in the asymptotic normality area $\tau \geq s - 1/2$. In a larger study, we will include Pickands' statistic and consider the non-Gaussian asymptotic area.
The study will only cover the heavy tail case, that is, $\gamma > 0$. The case $\gamma = 0$ will be part of a larger simulation paper. We consider a pure Pareto law (I) and two perturbed ones, laws (II) and (III). Theoretically, we then have at hand an infinite class of estimators. We should be able to find values of $s$ and $\tau$ leading to the lowest stable bias, and hope that this bias will be lower than that of the other analogues, or at least of the same order. We know from the results above that the least asymptotic variance is obtained for $s = \tau$. But it is not certain that this corresponds to the best performance for finite samples. Following the remark of Deme et al. [14], who noticed that the boundary case discriminating the Gaussian and the non-Gaussian asymptotic laws behaves well, we also include it here, that is, the case $\tau = s - 1/2$. We fix the following values: $n = 1000$, $kmin = 105$, $kmax = 375$, $ksize = 100$, $kstab = 5$, and we fix the number of replications to $B = 1000$.

Table 3. Performances of the boundary Double Hill statistics for $s = 1, \ldots, 5$ with Model I.
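Under the settings just fixed, one replication loop for the pure Pareto model (I) may be sketched as follows; samples are drawn by inverse transform, $X = U^{-\gamma}$, the helper names are ours, and the perturbed models (II) and (III) are not reproduced here since their exact forms are not restated in this excerpt. A small number of replications is used for illustration.

```python
import numpy as np

def hill(sample, k):
    """Hill's estimator in its threshold form."""
    logs = np.sort(np.log(sample))
    return logs[-k:].mean() - logs[-k - 1]

def replicate(gamma, n=1000, B=100, k=200, seed=0):
    """Average Hill estimate over B replications of the pure Pareto model (I),
    where P(X > x) = x^{-1/gamma} for x >= 1."""
    rng = np.random.default_rng(seed)
    est = [hill(rng.uniform(size=n) ** (-gamma), k) for _ in range(B)]
    return float(np.mean(est))
```

For the pure Pareto model the Hill estimate is unbiased, so the replication average should sit close to the true $\gamma$.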
How to read our results? For each $j \in [1, ksize]$, we compute the mean square error $MSE(j)$ when $k$ spans $[kv(j) - kstab, kv(j) + kstab]$. Next we take the minimum and maximum values of these $MSE(j)$'s, denoted Min and Max. The difference Diff = Max − Min is reported, as well as the middle term Mid = (Min + Max)/2.
We classify the estimators with respect to both the values of Mid and Diff. If the values of Diff are of the same order for two estimators, we prefer the one with the minimum value of Mid: that estimator is more stable and hence better.
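The summary quantities used for this ranking can be computed directly from the $MSE(j)$'s; a small hypothetical helper:

```python
import numpy as np

def summarize(mse):
    """Min, Max, Diff = Max - Min and Mid = (Min + Max)/2 of the MSE(j)'s."""
    lo, hi = float(np.min(mse)), float(np.max(mse))
    return {"Min": lo, "Max": hi, "Diff": hi - lo, "Mid": (lo + hi) / 2}
```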
Since the Hill estimator and the Dekkers et al. estimator do not depend on parameters, we have conducted a series of simulations, and their performances are of the order of the values given in Table 2. Next, we report simulations on the performances of the boundary Double Hill and the optimal Double Hill statistics for $s = 1, \ldots, 5$ in Tables 3 and 4.

Proof. Put $V_n^*(f,s) = \sigma_n(f,s)^{-1} V_n(f,s)$ and suppose that (K1) holds. Then Kolmogorov's Theorem for sums of zero-mean independent rv's applies, since the series converges. Now suppose that (K2) holds. Let us evaluate the moment generating function of the summands, which share a common characteristic function $\phi(u)$. By using a first-order expansion of the logarithm in the neighborhood of unity, and writing $\varepsilon(B_n)$ for a function that may change from one line to another but always tends to zero, we get that $V_n^{**}(f) \to \mathcal{N}(0,1)$ in distribution, and then $V_n^*(f) \to \mathcal{N}(0, \Gamma(2s+1) - \Gamma(s+1)^2)$.

Lemma 2.
If any of (H1a), (H1b), (H2a) and (H2b) holds with an arbitrary $\lambda > 1$, then their analogues, where $b_n(\lambda)$ is replaced with $b_n$ and $p_n(\lambda)$ with $p_n$, also hold.
By letting $\varepsilon \downarrow 0$, one completes the proof, that is, $c_n \to_P 0$.
A zero-point of $\partial V_n(\tau,s)/\partial \tau$ is, obviously, a solution of the corresponding ordinary differential equation.
By taking the particular value $\tau = s$, we find that $C(s) = (1/2) \log k$, and (4.5) becomes the equality form of the Cauchy-Schwarz inequality with respect to the usual scalar product in $\mathbb{R}^k$. Then there exists a constant $\lambda(s)$ such that $j^{\tau - s} = \lambda(s)$ for $1 \leq j \leq k$. The only solution is $\tau = s$. Now, to show that $\tau = s$ is the global maximum point, it suffices to notice that the left-hand member is the opposite of the empirical variance of $\log j$, $1 \leq j \leq k$, hence nonpositive. We conclude that $\tau = s$ is the unique local maximum point, and therefore the global maximum is reached at $\tau = s$.