Estimation of extreme quantiles from heavy-tailed distributions in a location-dispersion regression model

We consider a location-dispersion regression model for heavy-tailed distributions when the multidimensional covariate is deterministic. In a first step, nonparametric estimators of the regression and dispersion functions are introduced. This permits, in a second step, the derivation of an estimator of the conditional extreme-value index computed on the residuals. Finally, a plug-in estimator of extreme conditional quantiles is built from these two preliminary steps. It is shown that the resulting semi-parametric estimator is asymptotically Gaussian and may benefit from the same rate of convergence as in the unconditional situation. Its finite sample properties are illustrated both on simulated and real tsunami data.


Introduction
The modeling of extreme events arises in many fields such as finance, insurance or environmental science. A recurrent statistical problem is then the estimation of extreme quantiles associated with a random variable Y, see the reference books [1,13,24]. In many situations, Y is recorded simultaneously with a multidimensional covariate x ∈ R^d, the goal being to describe how tail characteristics such as extreme quantiles or small exceedance probabilities of the response variable Y may depend on the explanatory variable x. Motivating examples include the study of extreme rainfall as a function of the geographical location [17], the assessment of the optimal cost of the delivery activity in postal services [7], the analysis of longevity [30], the description of the upper tail of claim size distributions [1], the modeling of extremes in environmental time series [37], etc.
Here, we focus on the challenging situation where Y given x is heavy-tailed. Without additional assumptions on the pair (Y, x), the estimation of extreme conditional quantiles is addressed using nonparametric methods, see for instance the recent works [9,19,21].
These methods may however suffer from the curse of dimensionality, which is compounded in distribution tails by the fact that extreme observations are rare by definition. These difficulties can be partially overcome by considering parametric models [11,5]. Semi-parametric methods have also been considered for trend modeling in extreme events [10,27]: a nonparametric regression model of the trend is combined with a parametric model for extreme values.
Our approach belongs to this second line of work. We assume that the response variable and the covariate are linked by a location-dispersion regression model Y = a(x) + b(x)Z, see [39], where Z is a heavy-tailed random variable. This model is flexible since (i) no parametric assumptions are made on a(•), b(•) and Z, and (ii) it allows for heteroscedasticity via the function b(•). Moreover, another feature of this model is that Y inherits its tail behavior from Z, a behavior which thus does not depend on the covariate x. We propose to take advantage of this important property to decouple the estimation of the nonparametric and extreme-value structures.
As a consequence, we shall show that the resulting semi-parametric estimators of extreme conditional quantiles of Y given x are asymptotically Gaussian and may benefit from the same rate of convergence as in the unconditional situation. A similar idea is implemented in [29]: an extreme-value distribution with constant extreme-value index is fitted to standardized rainfall maxima. The theoretical study of heteroscedastic extremes was initiated in [26] and further developed in [12,15] through the introduction of a proportional tails model. The results were applied to trend detection in rainfalls and stock market returns. This paper is organized as follows. The location-dispersion regression model for heavy-tailed distributions is presented in more detail in Section 2. The associated inference methods are described in Section 3: estimation of the regression and dispersion functions, estimation of the conditional tail-index and of extreme conditional quantiles. Asymptotic results are provided in Section 4, while the finite sample behavior of the estimators is illustrated in Section 5 on simulated data and in Section 6 on tsunami data. Proofs are postponed to the Appendix.

Location-dispersion regression model for heavy-tailed distributions
We consider the class of location-dispersion regression models, where the relation between a random response variable Y ∈ R and a deterministic covariate vector x ∈ Π ⊂ R^d, d ≥ 1, is given by

Y = a(x) + b(x)Z. (1)

The real random variable Z is assumed to be heavy-tailed. Denoting by F̄_Z its survival function, one has

F̄_Z(z) = z^{-1/γ} L(z), z > 0. (2)

Here, γ > 0 is called the conditional tail-index and L is a slowly-varying function at infinity, i.e. L(tz)/L(z) → 1 as z → ∞ for all t > 0. F̄_Z is then said to be regularly varying at infinity with index −1/γ. This property is denoted for short by F̄_Z ∈ RV_{−1/γ}, see [3] for a detailed account of regular variation. Model (1) was introduced by [39] in the random design setting. The location function a : Π → R and the dispersion function b : Π → (0, ∞) drive the conditional tail of Y given x through

F̄_Y(y | x) = F̄_Z((y − a(x))/b(x)), (3)

for y ≥ y_0(x) > a(x), where the functions a(•), b(•) and the conditional tail-index γ are unknown. We thus obtain a semi-parametric location-dispersion regression model for the (heavy) tail of Y given x. The main assumption is that the conditional tail-index γ does not depend on the covariate. On the one hand, the proposed semi-parametric heteroscedastic modeling offers more flexibility than purely parametric approaches. On the other hand, the location-dispersion structure may circumvent the curse of dimensionality, and assuming a constant conditional tail-index γ should yield more reliable estimates in small sample contexts than purely nonparametric approaches. Let us also note that, from (2) and (3), the regular variation property yields F̄_Y(y | x)/F̄_Z(y) → b(x)^{1/γ} as y → ∞. The location-dispersion regression model can thus be interpreted as a particular case of the proportional tails model [12] with scedasis function b(•)^{1/γ}. The practical consequences of this point are further discussed in Section 5.
Let us note that the constraint (4) can always be fulfilled, i.e. q_Z(µ_2) = 0 and q_Z(µ_3) − q_Z(µ_1) = 1 with µ_3 = 1/4, µ_2 = 1/2 and µ_1 = 3/4, up to an affine transformation of a(•), b(•) and Z such that (1) still holds: Z is centered by its median and scaled by its interquartile range. From (1), for all α ∈ (0, 1), the conditional quantile of Y given x ∈ Π is

q_Y(α | x) = a(x) + b(x) q_Z(α), (5)

and therefore the regression and dispersion functions are defined in a unique way by

a(x) = q_Y(µ_2 | x) and b(x) = q_Y(µ_3 | x) − q_Y(µ_1 | x), (6)

for all x ∈ Π. This remark is the starting point of the inference procedure described hereafter.
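The identification of a(•) and b(•) through conditional quantiles can be checked numerically. Below is a minimal sketch (not the authors' code): the location and dispersion functions, the design point x0 and the choice of Z are all illustrative assumptions. A standard Cauchy variable has median 0 and interquartile range 2, so dividing it by 2 produces a heavy-tailed Z satisfying the median-zero / unit-interquartile-range convention; the upper-tail quantile q(α) = F^{-1}(1 − α) of the simulated Y then recovers a and b.

```python
import numpy as np

rng = np.random.default_rng(0)

def a(x):  # illustrative location function (assumption, not from the paper)
    return 1.0 + np.sin(2 * np.pi * x)

def b(x):  # illustrative dispersion function (assumption)
    return 0.5 + x

# Heavy-tailed Z with median 0 and interquartile range 1:
# a standard Cauchy has median 0 and IQR 2, hence the division by 2.
x0 = 0.3
z = rng.standard_cauchy(200_000) / 2.0
y = a(x0) + b(x0) * z

# Upper-tail quantile convention: q(alpha) = F^{-1}(1 - alpha).
q = lambda alpha: np.quantile(y, 1.0 - alpha)
a_hat = q(1 / 2)            # location = conditional median
b_hat = q(1 / 4) - q(3 / 4)  # dispersion = conditional interquartile range
```

With 200,000 simulated responses, `a_hat` and `b_hat` match a(0.3) and b(0.3) to within a few hundredths.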

Inference
Let us denote by λ the Lebesgue measure and consider observations (Y_i, x_i), i = 1, ..., n from model (1), where Z_1, ..., Z_n are independent and identically distributed (iid) from the heavy-tailed distribution (2). We assume that the design points x_i, i = 1, ..., n are all distinct from each other and included in Π, a compact subset of R^d whose boundary has zero Lebesgue measure. Let {Π_i, i = 1, ..., n} be a partition of Π such that x_i ∈ Π_i. A three-stage inference procedure is adopted: the regression and dispersion functions are estimated nonparametrically in Paragraph 3.1, the conditional tail-index is then computed from the residuals in Paragraph 3.2 and, finally, the extreme conditional quantiles are derived by combining a plug-in method with Weissman's extrapolation device [40] in Paragraph 3.3.

Estimation of the regression and dispersion functions
The proposed procedure relies on the choice of a smoothing estimator of conditional quantiles. Here, a kernel estimator of F̄_Y(y | x) is considered (see for instance [33,34]): for all (x, y),

F̄_n,Y(y | x) = Σ_{i=1}^n λ(Π_i) K_h(x − x_i) 1{Y_i > y}, (7)

where K_h(•) = K(•/h)/h^d, K is a density kernel on R^d and h = h_n is a nonrandom bandwidth such that h → 0 as n → ∞. The conditional quantile estimator q̂_n,Y(• | x) is then obtained by inversion of (7). Nonparametric regression quantiles obtained by inverting a kernel estimator of the conditional distribution function have been extensively investigated, see, for example [2,35,38], among others. In view of (6), the regression and dispersion functions are estimated by

â_n(x) = q̂_n,Y(µ_2 | x) and b̂_n(x) = q̂_n,Y(µ_3 | x) − q̂_n,Y(µ_1 | x), (9)

for all x ∈ Π.
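The smoothing step can be sketched as follows. This is a simplified illustration, not the authors' implementation: the equispaced one-dimensional design with Riemann weights 1/n, the normalization of the kernel weights (added for numerical stability), and the model a(x) = sin(2πx) with unit dispersion are all assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(1)

def quartic(u):
    # Quartic (biweight) kernel, a density supported on [-1, 1].
    return np.where(np.abs(u) <= 1.0, (15.0 / 16.0) * (1.0 - u**2) ** 2, 0.0)

def cond_survival(y, x, xs, ys, h):
    # Kernel estimate of P(Y > y | x); the weights are normalized to sum
    # to one (a stability choice of this sketch).
    w = quartic((x - xs) / h)
    return np.sum(w * (ys > y)) / np.sum(w)

def cond_quantile(alpha, x, xs, ys, h):
    # Invert the step function: smallest observed value whose estimated
    # survival probability falls at or below alpha.
    for y in np.sort(ys):
        if cond_survival(y, x, xs, ys, h) <= alpha:
            return y
    return np.max(ys)

n = 5000
xs = (np.arange(n) + 0.5) / n                      # equispaced fixed design
ys = np.sin(2 * np.pi * xs) + rng.normal(size=n)   # a(x) = sin(2 pi x), b = 1
q_half = cond_quantile(0.5, 0.25, xs, ys, h=0.1)   # targets a(0.25) = 1.0
```

Inverting at level 1/2 estimates the conditional median, i.e. the location function at the chosen design point.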

Estimation of the conditional tail-index
The non-observed Z_1, ..., Z_n are estimated by the residuals

Ẑ_i = (Y_i − â_n(x_i))/b̂_n(x_i), (10)

for all i = 1, ..., n, where â_n(•) and b̂_n(•) are given in (9). In practice, nonparametric estimators can suffer from boundary effects [6,31] and therefore only design points sufficiently far from the boundary of Π are considered. More specifically, consider Π(n) = {x ∈ R^d such that B(x, h) ⊂ Π}, the erosion of the set Π by the ball B(0, h) centered at 0 and with radius h, see [36], and let I_n = {i ∈ {1, ..., n} such that x_i ∈ Π(n)}. The conditional tail-index γ is then estimated by a Hill-type statistic (11) built on the non-iid pseudo-observations Ẑ_i, i ∈ I_n.
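The residual-based Hill step can be sketched as follows. To isolate this step, the sketch plugs in the true location and dispersion functions instead of their kernel estimates (an assumption of this example), and uses an exact Pareto sample for Z so that the tail-index is known.

```python
import numpy as np

def hill(z, k):
    # Hill estimator of the tail-index from the k largest values:
    # mean of log-excesses over the (k+1)-th largest order statistic.
    z = np.sort(z)
    return np.mean(np.log(z[-k:])) - np.log(z[-k - 1])

rng = np.random.default_rng(2)
n, gamma = 10_000, 0.5
z = rng.uniform(size=n) ** (-gamma)        # exact Pareto: tail-index gamma
x = rng.uniform(size=n)
y = (1 + x) + (0.5 + x) * z                # Y = a(x) + b(x) Z, illustrative a, b
z_hat = (y - (1 + x)) / (0.5 + x)          # residuals with a, b known here
gamma_hat = hill(z_hat, k=200)
```

With the true location and dispersion plugged in, the residuals coincide with the unobserved Z_i and the Hill estimate concentrates around γ = 0.5.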

Estimation of extreme conditional quantiles
Clearly, the purely nonparametric estimator (8) cannot consistently estimate extreme quantiles of arbitrarily small levels α_n. For instance, when nα_n → 0, the extreme quantile is likely to be larger than the maximum observation. In such a case, an extrapolation technique is necessary to estimate the so-called extreme conditional quantile q_Y(α_n | x). To this end, we propose to take advantage of the structure of the location-dispersion regression model (5) to define the plug-in estimator

q̂_n,Y(α_n | x) = â_n(x) + b̂_n(x) q̂_n,Z(α_n), (12)

where â_n(x) and b̂_n(x) are given in (9) and q̂_n,Z(α_n) is the Weissman-type estimator [40] computed on the residuals:

q̂_n,Z(α_n) = Ẑ_{m_n−k_n,m_n} (k_n/(m_n α_n))^{γ̂_n}, (13)

with m_n the number of retained pseudo-observations. Again, it should be noted that q̂_n,Z(α_n) is computed from the non-iid pseudo-observations Ẑ_i, i ∈ I_n. Finally, by construction, the semi-parametric estimator (12) cannot suffer from quantile crossing, a phenomenon which can occur with quantile regression techniques.
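The Weissman extrapolation and the plug-in construction can be sketched as follows. The Pareto sample, the choice k = 200 and the location/dispersion values are assumptions of this illustration; for an exact Pareto with γ = 0.5, the true quantile of level α is α^{−γ}.

```python
import numpy as np

def weissman_quantile(z, k, alpha):
    # Weissman extrapolation beyond the sample: anchor at the (k+1)-th
    # largest order statistic and extrapolate with the Hill estimate.
    z = np.sort(z)
    n = len(z)
    gamma_hat = np.mean(np.log(z[-k:])) - np.log(z[-k - 1])
    return z[-k - 1] * (k / (n * alpha)) ** gamma_hat

rng = np.random.default_rng(3)
n, gamma, alpha = 10_000, 0.5, 1e-4
z = rng.uniform(size=n) ** (-gamma)        # Pareto: true quantile alpha**(-gamma)
q_z = weissman_quantile(z, k=200, alpha=alpha)

# Plug-in extreme conditional quantile, with illustrative a(x0), b(x0):
x0 = 0.3
q_y = (1 + x0) + (0.5 + x0) * q_z
```

Here nα = 1, so the target quantile (true value 100) sits at the edge of the sample range and is reached only through the extrapolation factor (k/(nα))^γ̂.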

Main results
The following general assumptions are required to establish the asymptotic behavior of the estimators. The first one gathers all the conditions defining a location-dispersion regression model for heavy-tailed distributions in a multidimensional fixed design setting.
(A.1) (Y_1, x_1), ..., (Y_n, x_n) are independent observations from the location-dispersion regression model for heavy-tailed distributions defined by (1), (2) and (4), the design points satisfying conditions (14) and (15) below. We refer to [33,34] for this definition of the multidimensional fixed design setting.
The second assumption is a regularity condition. Under (A.1) and (A.2), the quantile function q_Z(•) and the density f_Z(•) = −F̄_Z′(•) exist, and we let H_Z(•) := 1/f_Z(q_Z(•)) denote the quantile density function and U_Z(•) = q_Z(1/•) the tail quantile function of Z. Moreover, the conditional survival function of Y is twice continuously differentiable with respect to its second argument. The next assumption is standard in the nonparametric kernel estimation framework.
(A.3) K is a bounded and even density with symmetric support S ⊂ B(0, 1), the unit ball of R^d, verifying the Lipschitz property: there exists c_K > 0 such that |K(u) − K(v)| ≤ c_K ‖u − v‖ for all (u, v) ∈ S². Finally, the so-called second-order condition is introduced (see for instance [24, eq. (3.2.5)]):

(A.4) For all t > 0, as z → ∞,

(U_Z(tz)/U_Z(z) − t^γ)/A(z) → t^γ (t^ρ − 1)/ρ,

where γ > 0, ρ < 0 and A is a positive or negative function such that A(z) → 0 as z → ∞. From [3, Theorem 1.5.12], property (2) is equivalent to U_Z(tz)/U_Z(z) → t^γ as z → ∞ for all t > 0. The role of the second-order condition (A.4) is to control the rate of this convergence, and thus the bias of tail-index estimators, see [24, Section 3.2]. Since ρ may be difficult to estimate in practice, a misspecified value ρ = −1 is considered in several works dealing with bias reduction of tail-index estimators. Our first result states the joint asymptotic normality of the estimators (9) of the regression and dispersion functions.
where the coefficients of the matrix Σ are given by

Table 1: A list of heavy-tailed distributions satisfying (A.4) with the associated values of γ and ρ. Γ(•) and B(•, •) denote the Gamma and Beta functions respectively.
A uniform consistency result can also be established. As a consequence of Theorem 2, one can prove that the residuals Ẑ_i = (Y_i − â_n(x_i))/b̂_n(x_i), see (10), are close to the unobserved Z_i, i = 1, ..., n.
Corollary 1 quantifies this closeness under the assumptions of Theorem 2, uniformly in i ∈ I_n. Our next main result provides the asymptotic normality of the conditional tail-index estimator (11) and of the Weissman estimator (13) computed on the residuals, (ii) holding for every sequence (α_n) ⊂ (0, 1) such that nα_n/k_n → 0 and log(nα_n/k_n)/√k_n → 0. It appears that, in the location-dispersion regression model, the tail-index can be estimated at the same rate 1/√k_n as in the iid case, see [22] for a review. As expected, this semi-parametric framework is a more favorable situation than the purely nonparametric one for the estimation of the conditional tail-index, where the rate of convergence is deteriorated by the local effective sample size. If ρ ≥ −κ(d)/(2d), the rate of convergence of γ̂_n is thus n^{ρ/(1−2ρ)} up to logarithmic factors, which is the classical rate for estimators of the tail-index, see for instance [25, Remark 3].
For instance, when the dimension of the covariate is d ≤ 2, the n^{ρ/(1−2ρ)} rate is reached as soon as ρ ≥ −1. This corresponds to the challenging situation where a high bias is expected in the estimation, which may occur for most usual distributions, depending on their shape parameters, see Table 1.
Theorem 4 states the asymptotic normality of the estimator (12) of extreme conditional quantiles. As a comparison, the rate of convergence of purely nonparametric methods involves an extra h^{d/2} factor, see for instance [18, Theorem 3] or [8, Theorem 3]. The location-dispersion regression model thus makes it possible to dampen this vexing effect of the dimensionality.
Finally, a uniform consistency result is also available for every sequence (α_n) ⊂ (0, 1) such that nα_n/k_n → 0 and log(nα_n/k_n)/√k_n → 0.

Illustration on simulations
The design points x_i, i = 1, ..., n are chosen on a regular grid of the unit square Π. The kernel function K is the product of two quartic (or biweight) kernels K(u) = (15/16)(1 − u²)² 1{|u| ≤ 1}. The bandwidth is selected following the rule of thumb of [4], in accordance with (16), where σ = 12^{−1/2} is the standard deviation of the coordinates of the design points. This choice is optimal for density estimation in the Gaussian case, but is also known to provide good results in other settings.
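The bivariate kernel used in the simulations can be written down directly; the sketch below only verifies that the product of two quartic kernels is a genuine density on the square [−1, 1]² (the Riemann grid is an assumption of this check, not part of the simulation design).

```python
import numpy as np

def quartic1d(u):
    # Quartic (biweight) kernel on [-1, 1], integrating to one.
    return np.where(np.abs(u) <= 1.0, (15.0 / 16.0) * (1.0 - u**2) ** 2, 0.0)

def K2(u1, u2):
    # Product kernel: a density supported on the square [-1, 1]^2.
    return quartic1d(u1) * quartic1d(u2)

# Riemann-sum check that K2 integrates to one over its support.
g = np.linspace(-1.0, 1.0, 401)
U1, U2 = np.meshgrid(g, g)
integral = np.sum(K2(U1, U2)) * (g[1] - g[0]) ** 2
```

The quartic kernel vanishes with zero derivative at the boundary of its support, so even this crude Riemann sum is accurate to several digits.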

Graphical illustrations
In all the experiments, N = 100 replications of a dataset of size n = 10, 000 are considered.
The estimation results for the regression and dispersion functions are depicted respectively on Figure 1 and Figure 2 in the situation where Z is Student-t ν distributed for ν ∈ {1, 2, 4}.
The results are visually satisfying and seem independent of the number of degrees of freedom. This conclusion was expected: since both estimators of a(•) and b(•) are based on non-extreme quantiles, they are robust with respect to heavy tails.
As already noticed in Section 2, in the context of proportional tails, both random variables Y and Z share the same conditional tail-index γ. This parameter can thus be estimated either by (11) (computed on the residuals Ẑ_i) or by the classical Hill estimator (computed on the response variables Y_i). The associated estimation results are displayed on Figure 3 as functions of the sample fraction k_n. It first appears that working on the residuals provides much better results in terms of bias than working on the initial response variable. Second, the tail-index estimator (11) has a stronger bias for larger values of ν. These empirical results are in line with the properties of the Student distribution: the second-order parameter ρ = −2/ν increases with ν, and the bias of the Hill-type estimator increases accordingly.
In practice, the estimation of the conditional tail-index and of extreme conditional quantiles requires the selection of the sample fraction k_n. This parameter is selected using a mean-squared error criterion. Assuming that A(t) = ct^ρ, the optimal value of k_n is given by the minimizer of the asymptotic mean-squared error, see for instance [14] or [23]. Letting moreover c = √2 and restricting ourselves to integer values, we end up with k*_n = ⌊(γ̃n)^{2/3}⌋, where γ̃ is a prior naive estimate of γ computed with k_n = ⌊n^{1/2}⌋ and ⌊•⌋ denotes the floor function. Such a choice of k*_n fulfils the assumptions of Theorems 3-5 for all three considered Burr distributions and for Student-t_ν distributions with ν ∈ {1, 2}. The constraints are violated in the case of the Student-t_4 distribution, in order to examine the robustness of the method with respect to the choice of the pair (h, k_n), which may be challenging in practice. The estimated conditional quantiles q̂_Y(1/n | •) of extreme level α_n = 1/n are displayed on Figure 4. As expected, the estimated extreme conditional quantiles all share the same shape despite different variation ranges.
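The two-step selection rule above can be sketched as follows; the exact Pareto sample is an illustrative assumption, and the pilot estimate uses k_n = ⌊√n⌋ as described.

```python
import numpy as np

def hill(z, k):
    # Hill estimator from the k largest order statistics.
    z = np.sort(z)
    return np.mean(np.log(z[-k:])) - np.log(z[-k - 1])

def select_k(z):
    # Two-step rule: pilot Hill estimate with k = floor(sqrt(n)),
    # then k* = floor((gamma_tilde * n)^(2/3)).
    n = len(z)
    gamma_pilot = hill(z, int(np.sqrt(n)))
    return int((gamma_pilot * n) ** (2.0 / 3.0))

rng = np.random.default_rng(4)
z = rng.uniform(size=10_000) ** (-0.5)   # Pareto sample with gamma = 0.5
k_star = select_k(z)
```

For n = 10,000 and γ close to 0.5, the rule returns a sample fraction of roughly (5000)^{2/3} ≈ 290 upper order statistics.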

Quantitative assessment
In this section, we propose to assess the performance of the extreme conditional quantile estimator (12) through a comparison with a purely nonparametric one. The nonparametric estimator is based on the moving window approach introduced in [16]: for each design point x, the tail-index is estimated by a (local) Hill-type statistic γ̃_n(x) computed on the observations falling in a window around x, and the extreme conditional quantile q_Y(α_n | x) is estimated by the associated Weissman-type statistic q̃_n,Y(α_n | x).
Another option is to re-estimate γ and q_Y(α_n | x) by taking k⊕_n = ⌊(γ̃_n(x)n)^{2/3}⌋ in the above two estimators. The associated estimator of the extreme quantile is denoted by q̃⊕_n,Y(α_n | x). The comparison between the true and estimated extreme conditional quantiles is based on a relative median-squared error (RMSE) computed over the N = 100 replications and the m_n design points in the square Π(n), i.e. the median of the squared relative errors computed on the rth replication, r = 1, ..., N. Here, both Student-t_ν and Burr distributions are considered with ν ∈ {1, 2, 4}, α ∈ {1, 2, 4}, β = 1, α_n = 1/n and n ∈ {20², 40², 60², 80², 100²}. The RMSEs are reported in Table 2. For all estimators, it appears that the main driver of the relative error is the tail heaviness. Unsurprisingly, the semi-parametric estimator q̂_n,Y provides much better results than the nonparametric ones q̃_n,Y and q̃⊕_n,Y: its RMSE is smaller and converges towards 0 at a faster rate as the sample size n increases.
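The RMSE criterion can be written compactly. This sketch flattens the aggregation over replications and design points into a single array, which is an assumption of the example rather than the authors' exact bookkeeping; the median makes the criterion robust to the occasional wild extreme-quantile estimate.

```python
import numpy as np

def relative_mse(q_hat, q_true):
    # Median of squared relative errors ((q_hat - q_true) / q_true)^2,
    # taken over all replications and design points at once.
    return np.median(((np.asarray(q_hat) - q_true) / q_true) ** 2)

# Tiny illustration: estimates 110, 90 and 100 of a true quantile 100
# give squared relative errors 0.01, 0.01 and 0, hence a median of 0.01.
rm = relative_mse(np.array([110.0, 90.0, 100.0]), 100.0)
```

Using the median rather than the mean prevents a single exploding Weissman extrapolation from dominating the comparison.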

Tsunami data example
The proposed illustration is based on the "Tsunami Causes and Waves" dataset, available at https://www.kaggle.com/noaa/seismic-waves. The data include the maximum wave height recorded at several stations in the world where a tsunami occurred. We focus on the 2011 Tohoku tsunami, in Japan. This earthquake was the cause of the Fukushima Daiichi nuclear disaster: a wave height greater than 15 meters (around 50 feet) flooded the nuclear plant, which was protected by a seawall of only 5.7 meters (19 feet). In this context, the estimation of return levels of wave heights associated with small probabilities is a crucial issue. Figure 5 (top-left panel) displays the maximum wave height recorded at each station. Note that the values of Y range from 0 to 55.88 meters (blue to red points). We propose to estimate an extreme quantile of the wave height at each station, following the methodology introduced in Section 3. The assumption of a constant conditional tail-index can be checked thanks to the test statistic T_{4,n} introduced in [12]: the idea is to compare the Hill estimate γ̂_H computed on the response variables with partial ones γ̂_{p,i} computed on non-overlapping blocks indexed by i = 1, ..., m. Under the hypothesis that the conditional tail-index is constant (and additional technical assumptions), the asymptotic distribution of the statistic is known, see [12] for details. Following the ideas of Paragraph 5.3, we set k_n = k⊕_n = 72 and we choose m = 4 blocks as in [12], leading to T_{4,n} ≈ 2.14 and a p-value around 0.54. The hypothesis of a constant conditional tail-index cannot be rejected, and our semi-parametric approach can thus be applied to these data.
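The block-comparison idea behind this constancy check can be sketched as follows. Only the qualitative idea is reproduced: the actual statistic of [12] involves a specific normalization and limiting distribution not shown here, and the sample size, block count and sample fraction below are illustrative assumptions.

```python
import numpy as np

def hill(z, k):
    # Hill estimator from the k largest order statistics.
    z = np.sort(z)
    return np.mean(np.log(z[-k:])) - np.log(z[-k - 1])

def block_hill(y, m, k):
    # Hill estimates on m non-overlapping blocks of the sample: under a
    # constant tail-index, all blocks should give comparable values.
    blocks = np.array_split(np.asarray(y), m)
    return [hill(b, k) for b in blocks]

rng = np.random.default_rng(5)
y = rng.uniform(size=8_000) ** (-0.25)   # constant tail-index 0.25
gammas = block_hill(y, m=4, k=100)
```

When the tail-index really is constant, the four block estimates all fall close to 0.25; large discrepancies between blocks would instead point towards a covariate-dependent tail.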
The regression and dispersion functions are then estimated via (9) and depicted on the bidimensional map (Figure 5, top-right and bottom-left panels) and along the one-dimensional first principal axis (Figure 6, top panels). Note that the principal axis has been obtained by computing the eigenvector associated with the largest eigenvalue of the covariance matrix of the coordinates (x_i), i = 1, ..., n. It appears that â_n(•) and b̂_n(•) have a similar shape with a peak in the neighbourhood of the epicenter, indicating a strong heteroscedasticity of the observed phenomenon.
The residuals Ẑ_1, ..., Ẑ_n are then computed from (10). The common practice is to use a graphical diagnosis to check whether these residuals have a heavy-tailed behavior. Here, a quantile-quantile plot is adopted, see the bottom-right panel of Figure 6. The log-excesses log(Ẑ_{n−i+1,n}/Ẑ_{n−k*_n+1,n}) are plotted versus the quantiles log(k*_n/i) of the standard exponential distribution, i = 1, ..., k*_n. Note that the number of upper order statistics k*_n = 82 is chosen following the approach described in Paragraph 5.2. It appears that the resulting set of points is close to the line of slope γ̂_n (computed with k*_n = 82), which confirms that the heavy-tailed assumption is reasonable in this case. The proposed estimator (11) computed on the residuals as well as the Hill estimator computed on the output variables are both depicted as functions of k_n on the bottom-left panel of Figure 6. The first one features a nice stable behavior, confirming the heavy-tail assumption and pointing towards a tail-index close to 0.25. As a comparison, the Hill estimator computed on the original output variables is less stable and yields smaller values, in accordance with the negative bias observed on simulated data (Section 5). Finally, the extreme conditional quantile estimator (12) is evaluated at each station with the level α_n = 10/n. The results are reported in the bottom-right panel of Figure 5.
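The coordinates of this Pareto quantile-quantile plot can be sketched as follows. The exact Pareto sample with γ = 0.25 is an illustrative assumption, and the least-squares slope through the origin is just one simple way of reading off the tail-index from the plot, not necessarily the authors' choice.

```python
import numpy as np

def pareto_qq_points(z, k):
    # Points of the Pareto QQ-plot: log-excesses of the k largest values
    # against standard-exponential quantiles log(k/i). Heavy tails show
    # up as points close to a straight line of slope gamma.
    z = np.sort(z)
    i = np.arange(1, k + 1)
    xq = np.log(k / i)
    yq = np.log(z[-i] / z[-k])   # z[-i] is the i-th largest value
    return xq, yq

rng = np.random.default_rng(6)
z = rng.uniform(size=10_000) ** (-0.25)   # Pareto with gamma = 0.25
xq, yq = pareto_qq_points(z, k=82)
slope = np.sum(xq * yq) / np.sum(xq * xq)  # least-squares slope through origin
```

For a genuinely heavy-tailed sample, the fitted slope approximates the tail-index, mirroring the line of slope γ̂_n drawn on the bottom-right panel of Figure 6.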

Auxiliary lemmas
The first result is an adaptation of Bochner's lemma (for twice differentiable functions) to the multidimensional fixed design setting. Assume that x_i ∈ Π_i is such that (14) and (15) hold, and that Q is an even measurable positive function with symmetric support S ⊂ B(0, 1). Then the stated expansion holds as n → ∞.

Proof. Consider the expansion and let us first focus on T_{n,1}. The change of variable u = (t_n − s)/h yields the corresponding integral form. Let us remark that x ∈ B(0, 1) implies the required inclusion by definition of the erosion. As a consequence, S ⊂ B(0, 1) ⊂ (t_n − Π)/h. Let ⟨•, •⟩ be the usual dot product on R^d. A second-order Taylor expansion with respect to the second argument yields, for all y_n ∈ C, the stated bound, since the second derivative is bounded on compact sets. Remarking that ∫_S u Q(u) du = 0 shows that the first-order term vanishes. Let us now turn to the second term. Since ψ(• | •) is continuously differentiable with respect to its second argument, a first-order expansion applies. Moreover, under assumption (15), the remainder is negligible. Finally, collecting (18) and (19), the conclusion follows.
As a consequence of Lemma 1, the asymptotic bias and variance of the estimator (7) of the conditional survival function can be derived. (i) Then the stated expansion holds, where F_Y is the conditional cumulative distribution function associated with F̄_Y. Proof.
and the conclusion follows from Lemma 1 applied with p = 1.
(ii) As a consequence of the independence assumption, the variance of F̄_n,Y(y | x) can be expanded as a sum over the design points with, under (A.3) and (15), a control holding uniformly on (s_1, s_2) ∈ Π_i² and i = 1, ..., n. It thus follows from (14) that the announced expression holds. Applying Lemma 1 with p = 1 twice and recalling that nh^d → ∞ as n → ∞ entails the conclusion. Finally, Lemma 3 is an adaptation of [20, Lemma 3]. It makes it possible to derive the error made on the estimation of the order statistics Z_{m_n−i,m_n}, i = 0, ..., m_n − 1, from the error made on the unsorted Z_i, i ∈ I_n.
Lemma 3. Recall that I_n = {i ∈ {1, ..., n} such that x_i ∈ Π(n)}. For all i ∈ J_n, x_i ∈ C_n, and nh^d → ∞ together with (15) entail that Π_i ⊂ C_n for n large enough. Therefore, as the sets Π_i are disjoint, the stated bound follows, in view of the absolute continuity of the erosion with respect to the Lebesgue measure, see [32].

Preliminary results
Let ∨ (resp. ∧) denote the maximum (resp. the minimum). The next proposition provides a joint asymptotic normality result for the estimator (7) of the conditional survival function evaluated at points depending on n.
Proof. Let us first remark that, for all j ∈ {1, ..., J}, in view of (5), the sequence y_{j,n} = a(t_n) + b(t_n)(q_Z(α_j) + ε_{j,n}) is bounded, since a(•) and b(•) are continuous functions defined on compact sets and because ε_{j,n} → 0 as n → ∞. Besides, from (3), F̄_Y(y_{j,n} | t_n) = F̄_Z(q_Z(α_j) + ε_{j,n}) → 1 − α_j > 0 as n → ∞, and thus the assumptions of Lemma 2(i,ii) are satisfied. Now, let β ≠ 0 in R^J, J ≥ 1, and consider the corresponding random variable. The random term can be expanded, with E(Γ_{n,1}) = 0 by definition and, by independence of Y_1, ..., Y_n, a variance driven by the matrix C^(n) whose coefficients are defined for all (k, ℓ) ∈ {1, ..., J}², with S_{n,i} being defined in (20) and expanded as in (21), where ϕ is the associated function. Replacing in (22) yields the announced expansion, from Lemma 1 applied twice with p = 2 and recalling that nh^d → ∞. Besides, let us remark that, in view of (5), assuming for instance k < ℓ implies α_k > α_ℓ and thus q_Z(α_k) < q_Z(α_ℓ), leading to y_{k,n} < y_{ℓ,n} for n large enough. More generally, y_{k,n} ∨ y_{ℓ,n} = y_{k∨ℓ,n} and y_{k,n} ∧ y_{ℓ,n} = y_{k∧ℓ,n} for n large enough, and thus ϕ(y_{k,n}, y_{ℓ,n}) simplifies accordingly. From (5), and in view of the continuity of F̄_Z, the stated limit follows. As a result, collecting (24) and (25), one obtains the convergence of C^(n), where B is the matrix defined by the B_{k,ℓ} coefficients. The proof of the asymptotic normality of Γ_{n,1} is based on the Lyapunov criterion for triangular arrays of independent random variables as n → ∞. Let us highlight that the random variables T_{i,n}, i = 1, ..., n, are bounded in view of (A.3) and (14). As a consequence, one obtains the required bound from (26) and (28). It is thus clear that (27) holds under the assumption nh^d → ∞. Let us now turn to the nonrandom term. Lemma 2(i) together with the assumptions nh^d → ∞ and nh^{d+κ(d)} → 0 as n → ∞ entail the stated convergence. Finally, collecting (29) and (30), √(nh^d) Γ_n converges to a centered Gaussian random variable with variance λ(Π)‖K‖²₂ β^t B β, and the result follows.
The following proposition provides the joint asymptotic normality of the estimator (8) of conditional quantiles.It can be read as an adaptation of classical results [2,35,38] to the location-dispersion regression model in the multivariate fixed design setting.
The following proposition provides a uniform consistency result for the estimator (8) of conditional quantiles of Y given a sequence of multidimensional design points in Π(n) , i.e.
not too close to the boundary of Π. Then, for all α ∈ (0, 1), the stated uniform convergence holds. Proof. Let v_n = (nh^d / log n)^{1/2} and, for all (ε, α) ∈ (0, 1)², consider the associated quantities. Let us also introduce, for all i ∈ I_n, the quantities q^+_{i,n}, so that the following expansion holds. Let us focus on the first term. The assumption nh^d/log n → ∞ entails that v_n → ∞ as n → ∞ and thus q^+_{i,n} is bounded. Therefore Lemma 2(i) shows that, for some θ ∈ (0, 1), the bias term is controlled, and the continuity of f_Z(•) then yields the announced negligibility, in view of the assumption nh^{d+κ(d)}/log n → 0 as n → ∞. As a consequence, the conclusion follows.

(A.2) The functions a(•) and b(•) are twice continuously differentiable on Π, b(•) is lower bounded on Π: b(t) ≥ b_m > 0 for all t ∈ Π, and the survival function F̄_Z(•) is twice continuously differentiable on R.

Figure 3: Simulation results obtained on a Student-t_ν distribution for ν = 1 (left), ν = 2 (middle) and ν = 4 (right). Mean estimate of the conditional tail-index (11) (continuous black line), associated 95% empirical confidence intervals (dotted lines) and mean Hill estimate computed on the response variable (continuous blue line), as functions of the sample fraction k_n. The true value γ = 1/ν is depicted by a red horizontal line.

The estimated quantiles of the maximum wave height range from 0 to 60.53 meters, with the largest values close to the epicenter. Note that such a quantile level means that the observed values Y_1, ..., Y_n should exceed the return levels q̂_n,Y(α_n | x_1), ..., q̂_n,Y(α_n | x_n) approximately 10 times in the sample. In this particular example, 15 waves exceed the return levels; this empirical result does not deviate too much from the expected number of exceedances.

Appendix: Proofs

Technical lemmas are collected in Paragraph 7.1, while preliminary results of general interest are provided in Paragraph 7.2. Finally, the proofs of the main results are given in Paragraph 7.3.

Figure 5: Results on tsunami data. Top-left: maximum wave height recorded at each station. Top-right: regression function estimate â_n(•) at each station. Bottom-left: dispersion function estimate b̂_n(•) at each station. Bottom-right: quantile estimate q̂_n,Y(10/n | •) at each station. On all the maps, smallest and largest values are respectively depicted in blue and red. The straight line is the principal axis x^(2) = 1.64x^(1) + 80.35 computed on the coordinates of the stations, and * represents the epicenter of the earthquake.

Figure 6: Results on tsunami data. Top: regression (left) and dispersion (right) function estimates â_n(•) and b̂_n(•) along the principal axis x^(2) = 1.64x^(1) + 80.35. The estimates at each station (black +) are smoothed (red dashed line) for visualization's sake. The vertical black line displays the projection of the epicenter on the principal axis. Bottom-left: Hill estimator (11) computed on the residuals (black line) and on the original output variables (blue line) as a function of k_n. Bottom-right: log-excesses log(Ẑ_{n−i+1,n}/Ẑ_{n−k*_n+1,n}) of the residuals versus log(k*_n/i), 1 ≤ i ≤ k*_n = 82. The straight line has slope γ̂_n ≈ 0.25.

Table 2: Relative median-squared errors associated with the estimation of the extreme conditional quantile q_Y(1/n | •). Results obtained with the semi-parametric estimator q̂_n,Y and comparison with the purely nonparametric ones (q̃_n,Y, q̃⊕_n,Y).