The diameter of a random elliptical cloud

We study the asymptotic behavior of the diameter or maximum interpoint distance of a cloud of i.i.d. $d$-dimensional random vectors when the number of points in the cloud tends to infinity. This is a non standard extreme value problem since the diameter is a max-$U$-statistic, hence a maximum of dependent random variables. Therefore, the limiting distributions may not be extreme value distributions. We obtain exhaustive results for the Euclidean diameter of a cloud of elliptical vectors whose Euclidean norm is in the domain of attraction for the maximum of the Gumbel distribution. We also obtain results in other norms for spherical vectors and we give several bi-dimensional generalizations. The main idea behind our results and their proofs is a specific property of random vectors whose norm is in the domain of attraction of the Gumbel distribution: the localization into subspaces of low dimension of vectors with a large norm.


Introduction
Let $\{X, X_i, i \ge 1\}$ be i.i.d. random vectors in $\mathbb{R}^d$, for a fixed $d \ge 1$. The quantities of interest in this paper are the maximum Euclidean norm $M_n(X)$ and the Euclidean diameter $M_n^{(2)}(X)$ of the sample, that is
$$M_n(X) = \max_{1 \le i \le n} \|X_i\|, \qquad M_n^{(2)}(X) = \max_{1 \le i < j \le n} \|X_i - X_j\|,$$
where $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^d$. The behavior of $M_n(X)$ as $n$ tends to infinity is a classical univariate extreme value problem. Its solution is well known. If the distribution of $\|X\|$ is in the domain of attraction of some extreme value distribution, then $M_n(X)$, suitably renormalized, converges weakly to this distribution. We are interested in this paper only in the case where the limiting distribution is the Gumbel law. More precisely, the working assumption of this paper will be that there exist two sequences $\{a_n\}$ and $\{b_n\}$ such that $\lim_{n\to\infty} a_n = \infty$, $\lim_{n\to\infty} b_n/a_n = 0$ and
$$\lim_{n\to\infty} n\,P(\|X\| > a_n + b_n z) = e^{-z} \tag{3}$$
for all $z \in \mathbb{R}$, or equivalently,
$$\lim_{n\to\infty} P\left(\frac{M_n(X) - a_n}{b_n} \le z\right) = e^{-e^{-z}}.$$
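To make the two statistics concrete, here is a minimal brute-force sketch in Python. The function names `max_norm` and `diameter` and the four-point cloud are illustrative choices, not taken from the paper; the quadratic pair loop is only meant for small samples.

```python
import math
from itertools import combinations

def max_norm(points):
    """Largest Euclidean norm in the cloud, i.e. M_n(X)."""
    return max(math.hypot(*p) for p in points)

def diameter(points):
    """Largest interpoint Euclidean distance, i.e. M_n^(2)(X), over all pairs."""
    return max(math.dist(p, q) for p, q in combinations(points, 2))

# A hypothetical cloud of four points in the plane.
cloud = [(3.0, 4.0), (-3.0, -4.0), (1.0, 0.0), (0.0, -2.0)]
m1 = max_norm(cloud)   # attained at (3, 4) and (-3, -4)
m2 = diameter(cloud)   # attained by the pair (3, 4), (-3, -4)
```

By the triangle inequality the diameter never exceeds twice the maximum norm, and in this toy cloud the bound is attained because the two largest points are diametrically opposed.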
The asymptotic behavior of the diameter of the sample cloud is also an extreme value problem since $M_n^{(2)}(X)$ is a maximum, but it is a nonstandard one, because of the dependency between the pairs $(X_i, X_j)$. This problem has been recently investigated by [JJ12] for spherically distributed vectors, that is, vectors having the representation $X = TW$ where $W$ is uniformly distributed on the Euclidean unit sphere $S^{d-1}$ of $\mathbb{R}^d$ and $T$ is a positive random variable in the domain of attraction of the Gumbel distribution, independent of $W$. This reference also contains a review of the literature concerning other domains of attraction.
If $d = 1$, a spherical random variable is simply a symmetric random variable, that is a positive random variable multiplied by an independent random sign. The diameter of a real valued sample is simply its maximum minus its minimum, and by independence and symmetry, it is straightforward to check that $(M_n^{(2)}(X) - 2a_n)/b_n$ converges weakly to the sum of two independent Gumbel random variables with location parameter $\log 2$, i.e. distributed as $\Gamma - \log 2$, where $\Gamma$ is a standard Gumbel random variable. Note that the tail of such a sum is heavier than the tail of one Gumbel random variable.
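The one-dimensional reduction can be checked numerically. The sketch below assumes, purely for illustration, that the positive factor is a standard exponential variable, which is in the Gumbel domain of attraction with $a_n = \log n$ and $b_n = 1$. It compares the brute-force diameter with the maximum minus the minimum.

```python
import math
import random
from itertools import combinations

def diameter_1d(xs):
    """Diameter of a real valued sample: maximum minus minimum."""
    return max(xs) - min(xs)

rng = random.Random(0)
n = 200
# Symmetric variable: positive part times an independent random sign.
# The exponential law for the positive part is an illustrative assumption.
sample = [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in range(n)]

brute = max(abs(x - y) for x, y in combinations(sample, 2))
fast = diameter_1d(sample)

# Normalized diameter (M_n^(2) - 2 a_n) / b_n with a_n = log n, b_n = 1.
normalized = fast - 2.0 * math.log(n)
```

Repeating the last line over many independent samples would give an empirical picture of the limiting law described above.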
If d ≥ 2, [JJ12] have shown that in spite of the dependency, the limiting distribution is the Gumbel law, but a correction is needed. Precisely, they proved that if (3) holds, with an additional mild uniformity condition, there exists a sequence {d n } such that d n → ∞, d n = O(log(a n /b n )) and The exact expression of the sequence {d n } will be given in the comments after Theorem 3.2. This implies that M (2) n (X)/(2M n (X)) converges in probability to 1, but the behaviors of M n (X) and M (2) n (X) are subtly different. Specifically, a n is typically a power of log n, so log(a n /b n ) is of order log log n.
It is possible to give some rationale for the presence of the diverging correcting factor d n in (4). In dimension one, two vectors with a large norm may be either on the same side of the origin or on opposite sides. In the latter case their distance is automatically large, typically twice as large as the norm of each one. In higher dimensions, two spherical vectors with a large norm can be close to each other and their distance will be typically much smaller than twice the norm of the largest one. Therefore we expect the probability that the diameter is large to be smaller in the latter case.
This suggests that the asymptotic behavior of the diameter is related to the localization of vectors with large norm. The behavior will differ if large values are to be found in some specific regions of the space or can be found anywhere.
There are many possible directions to extend the results of [JJ12]. One very simple case not covered by these results is the multivariate Gaussian distribution with correlated components. The Gaussian distribution is a particular case of elliptical distributions. The main purpose of this paper is to investigate the behavior of the diameter of a sample cloud of elliptical vectors.
Elliptical vectors are widely used in extreme value theory since they are in the domain of attraction of multivariate extreme value distributions. These distributions and their generalizations have been recently considered in the apparently unrelated problem of obtaining limiting conditional distributions given one component is extreme, see [FS10] and the references therein.
In that paper, the tail behavior of a product $TU$, where $T$ is in the domain of attraction of the Gumbel distribution and $U$ is a bounded positive random variable independent of $T$, was obtained as a by-product of the main result. Under some regularity assumption on the density of $U$ at its maximum, the tail of $TU$ is slightly lighter than the tail of $T$. The main reason is that if a random variable $T$ is in the domain of attraction of the Gumbel distribution, then for any $\alpha > 1$,
$$\lim_{x\to\infty} \frac{P(T > \alpha x)}{P(T > x)} = 0.$$
This implies that for $TU$ to be large, $U$ must be very close to its maximum. The full strength of this remark was recently exploited in [BS13], who obtained the rate of convergence of $U$ towards its maximum when the product $TU$ is large and the conditional limiting distribution of the difference between $U$ and its maximum, suitably renormalized. This property gives a deeper explanation of the conditional limits obtained in [FS10]. Having in mind the earlier remarks on the link between the localization of the vectors with large norm and the asymptotic distribution of the diameter, it is clear that this localization property will be helpful to study the problem at hand in this paper.
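The property behind this remark is usually stated as rapid variation of the tail: for $\alpha > 1$ the ratio $P(T > \alpha x)/P(T > x)$ tends to zero. The following sketch checks this numerically for a standard Gaussian tail, chosen here only as a convenient example of a Gumbel-type tail; the helper names and the grid of values are illustrative.

```python
import math

def gaussian_tail(x):
    """P(N > x) for a standard Gaussian N, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def tail_ratio(tail, alpha, x):
    """Ratio P(T > alpha * x) / P(T > x) for a given survival function."""
    return tail(alpha * x) / tail(x)

alpha = 1.1
xs = (2.0, 4.0, 8.0, 16.0)
ratios = [tail_ratio(gaussian_tail, alpha, x) for x in xs]
```

Even with $\alpha$ as close to one as 1.1, the ratio collapses quickly as $x$ grows, which is why a bounded factor $U$ has to sit very near its maximum for the product $TU$ to be large.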
The rest of the paper is organized as follows. In Section 2, we will define elliptical vectors and state our main results. In Section 2.1, extending the results of [BS13], we will show that the realizations of a $d$-dimensional random elliptical vector with large norm are localized on a subspace of $\mathbb{R}^d$ whose dimension is the multiplicity of the largest eigenvalue of the covariance matrix. This result will be crucial to prove our main results which are stated in Section 3. As partially conjectured by [JJ12, Section 5.4], if the largest eigenvalue of the covariance matrix is simple, then the limiting distribution of the diameter is similar but not equal to the one which arises when $d = 1$: correcting terms appear that are due to the fluctuations around the direction of the largest eigenvalue. If the largest eigenvalue is not simple, say its multiplicity is $k$, then the diameter behaves as in the spherical case in dimension $k$, up to constants.
In Section 4, we will answer another question of [JJ12], namely we will investigate the $\ell^q$ diameter of a cloud of spherical vectors, for $1 \le q \le \infty$. This problem is actually simpler than the corresponding one in Euclidean ($\ell^2$) norm, since the vectors with large norm are always localized close to a finite number of directions. Therefore, the "localization principle" applies and we obtain the same type of limiting distribution as in the case of an elliptical distribution with simple largest eigenvalue. For $q = 1$ and $q > 2$ the problem simplifies even further since the corrective terms vanish and the limiting distribution of the one-dimensional case is obtained.
In Section 5, we discuss further possible generalizations and give several bidimensional examples.
We think that beyond answering certain questions on the diameter of a random cloud, the main purpose of this paper is to emphasize the use of the localization principle of vectors with large norm in the domain of attraction of the Gumbel distribution. This principle should be useful in other problems.

The Euclidean norm of an elliptical vector
A random vector $X$ in $\mathbb{R}^d$ has an elliptical distribution if it can be expressed as where We will see that this number $k$ plays a crucial role for the tail of the norm and for the asymptotic distribution of the diameter.
Let $W_i = (W_{i,1}, \dots, W_{i,d})$, $i = 1, 2$, be independent random vectors uniformly distributed on $S^{d-1}$, and define $X_i = T_i A W_i$, which are i.i.d. with the same distribution as $X$. Since for any orthogonal matrix $P$ (i.e. $P^{\top} = P^{-1}$), $PW$ is also uniformly distributed on $S^{d-1}$, it holds that and let $\{Y_i, i \ge 1\}$ be a sequence of i.i.d. vectors with the same distribution as $Y$. Then Therefore, we will prove our results using the vectors $\{Y_i, i \ge 1\}$.
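As an illustration of the representation $X_i = T_i A W_i$, the sketch below draws one elliptical vector. It relies on the standard fact that a normalized standard Gaussian vector is uniform on the sphere. The matrix `A`, the helper names and the choice of a chi-distributed radial part (which makes the vector Gaussian) are assumptions made for the example only.

```python
import math
import random

def uniform_on_sphere(d, rng):
    """Uniform point on S^(d-1), obtained by normalizing a standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]

def elliptical_sample(A, radial, rng):
    """One draw of X = T * A * W for a square matrix A given as a list of rows."""
    d = len(A)
    w = uniform_on_sphere(d, rng)
    t = radial(rng)
    return [t * sum(A[i][j] * w[j] for j in range(d)) for i in range(d)]

def chi_radial(d):
    """Radial part T with T^2 chi-square(d); with this choice X = T A W is Gaussian."""
    return lambda rng: math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d)))

rng = random.Random(1)
A = [[2.0, 0.0], [0.0, 1.0]]   # hypothetical matrix with a simple largest eigenvalue
w = uniform_on_sphere(3, rng)
x = elliptical_sample(A, chi_radial(2), rng)
```

Any other positive radial distribution in the Gumbel domain of attraction can be plugged in through the `radial` argument.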
In all the sequel, we will assume that T is in the max domain of attraction of the Gumbel law, i.e. the limit (3) holds, or equivalently, there exists a function ψ T , called an auxiliary function for T , defined on (0, ∞) such that locally uniformly with respect to z ∈ R. Moreover, the survival function of T can be expressed as where lim x→∞ ϑ(x) ∈ (0, ∞). See e.g. [Res87, Chapter 0].
Define the functions $\psi_A$ and $\phi_A$ on $(0, \infty)$ by In the sequel, the notation $\sim$ means that the ratio of the two terms around $\sim$ tends to one when their parameter ($x$ or $n$) tends to infinity, and $\longrightarrow$ denotes weak convergence of probability distributions.
Theorem 2.1. Let $X$ be as in (5) with $T$ satisfying (8), $W$ uniformly distributed on $S^{d-1}$, and assume that the eigenvalues $\lambda_1, \dots, \lambda_d$ of the correlation matrix satisfy (6). Then, Let $Y$ be as in (7). Then, as $x \to \infty$, conditionally on $\|Y\| > x$, where $E$ is an exponential random variable with mean 1, $W^{(k)}$ is uniformly distributed on $S^{k-1}$, $G_{k+1}, \dots, G_d$ are independent standard Gaussian random variables, and all components are independent.

Comments
• This result implies that X is in the domain of attraction of the Gumbel distribution and that an auxiliary function for X is ψ A .
• The first statement can be obtained as a consequence of [FS10, Proposition 3.2.1]. In dimension 2, the second result is a consequence of [BS13, Theorem 2.1]. That reference considers a real valued random variable $X$ which can be expressed as $X = T u(S)$, with $T$ satisfying (9), $S$ taking values in $[0, 1]$ and the bounded function $u$ having some regularity properties around its maximum, and obtains the asymptotic behavior of $S$ conditionally on the product $T u(S)$ being large.
We now consider the polar representation of the vector $Y$, that is we define $\Theta = Y/\|Y\|$, and for $q = k + 1, \dots, d$, we also define $\tau_q^2 = \lambda_q/(\lambda_1 - \lambda_q)$.
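A small numerical sketch of these two definitions may help. The function names are illustrative, and the eigenvalue list is assumed to be sorted in decreasing order with the largest one repeated $k$ times. The example uses the bivariate eigenvalues $1 + \rho$ and $1 - \rho$ that appear later in the two-dimensional case, for which $\tau_2^2 = (1 - \rho)/(2\rho)$.

```python
import math

def polar(y):
    """Return the norm of y and the direction y / ||y||."""
    r = math.sqrt(sum(v * v for v in y))
    return r, [v / r for v in y]

def tau_squared(eigenvalues, k):
    """tau_q^2 = lambda_q / (lambda_1 - lambda_q) for q = k+1, ..., d.

    Assumes eigenvalues are sorted in decreasing order and the first k are equal.
    """
    lam1 = eigenvalues[0]
    return [lam / (lam1 - lam) for lam in eigenvalues[k:]]

r, theta = polar([3.0, 4.0])

rho = 0.2
tau2 = tau_squared([1.0 + rho, 1.0 - rho], k=1)
```

As the smaller eigenvalues approach the largest one, $\tau_q^2$ blows up, which reflects the weaker localization around the leading eigenspace.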
Corollary 2.2. Under the conditions of Theorem 2.1, as $x \to \infty$, conditionally on $\|Y\| > x$, where $E$ is an exponential random variable with mean 1, $W^{(k)}$ is uniformly distributed on $S^{k-1}$, $G_{k+1}, \dots, G_d$ are independent standard Gaussian random variables, and all components are independent.
This result can be rephrased in terms of weak convergence of point processes (see e.g. [Res87, Proposition 3.21]). Let $a_n$ be the $1 - 1/n$ quantile of the distribution of $\|X\|$ or $\|Y\|$, i.e. $P(\|X\| > a_n) = P(\|Y\| > a_n) \sim 1/n$, and set $b_n = \psi_A(a_n)$ and $c_n = \phi_A(a_n)$. Define the points
Corollary 2.3. Under the conditions of Theorem 2.1, the point processes $\sum_{i=1}^n \delta_{P_{n,i}}$ converge weakly to a Poisson point process
Comments
Since the measure $e^{-x}\,dx$ is finite on any interval $[a, \infty]$, $a \in \mathbb{R}$, the point process $N$ has a finite number of points on any set $[a, \infty] \times S^{d-1} \times \mathbb{R}^{d-k}$. Therefore, the points can and will be numbered in such a way that $\Gamma_1 > \Gamma_2 > \dots$. Moreover, if the points $P_{n,i}$ are also numbered in decreasing order of their first component, then for each fixed integer $m$, $(P_{n,1}, \dots, P_{n,m})$ converges weakly to $(P_1, \dots, P_m)$.
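The ordered first components $\Gamma_1 > \Gamma_2 > \dots$ of a Poisson point process with mean measure $e^{-x}\,dx$ can be simulated with a standard construction: if $S_1 < S_2 < \dots$ are the arrival times of a unit-rate Poisson process, then $-\log S_i$ are the points of such a process, listed in decreasing order. The sketch below covers only this first coordinate; the function name is illustrative and the remaining coordinates of the limiting points are not modeled.

```python
import math
import random

def top_points(m, rng):
    """First m points, in decreasing order, of a Poisson process with mean measure exp(-x) dx."""
    points, s = [], 0.0
    for _ in range(m):
        s += rng.expovariate(1.0)    # next arrival time of a unit-rate Poisson process
        points.append(-math.log(s))  # mapped point; the first one is standard Gumbel
    return points

rng = random.Random(0)
gammas = top_points(5, rng)
```

Since only finitely many points exceed any fixed level, truncating at the first few points loses nothing when studying maxima.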
We illustrate Theorem 2.1 for three dimensional Gaussian vectors whose maximum eigenvalue $\lambda_1$ of the correlation matrix is simple (Figure 1a) or double (Figure 1b). The rate of convergence to zero of the coordinates corresponding to the smallest eigenvalues is $O(\log n)$.
Proof of Theorem 2.1
We will need the following Lemma.
Then $W^{(k)}$ is uniformly distributed on $S^{k-1}$ and independent of $(W_{k+1}, \dots, W_d)$. If $f$ is continuous and compactly supported on $\mathbb{R}^d$, then The convergence (12) can be extended to sequences of continuous functions $f_x$ with compact support which depend on $x$ provided they converge locally uniformly to a continuous function with compact support. By bounded convergence, it can also be extended to sequences of bounded continuous functions $f_x$ if there exists a function $f^*$ (not depending on $x$) such that $|f_x| \le f^*$ for all $x$ and The proof of the Lemma consists merely in a change of variable and is postponed to Section 6.
Since we have defined φ A such that xφ 2 locally uniformly with respect to z, w k+1 , . . . , w d . Let f be a continuous function with compact support in R d . Applying Lemma 2.4, we obtain where G k+1 , . . . , G d are i.i.d. standard Gaussian random variables.
The last step is to extend the convergence (13) to all bounded continuous functions f . By the comments after Lemma 2.4, it suffices to prove that the function k x can be bounded by a function k * independent of x and integrable with respect to Lebesgue's measure on R d−k . For any u ≥ 0 and p > 0, there exists a constant C such that, for large enough x, (see e.g. [FS10, Lemma 5.1]). For z ≥ 0, this trivially yields For a fixed z < 0, we write .
The first ratio in the right hand side is convergent hence bounded and since $z$ is fixed, we can apply the bound (14) to the second ratio, upon noting that $\lim_{x\to\infty} \psi_T(x + \psi_T(x)z)/\psi_T(x) = 1$ for all $z \in \mathbb{R}$. Thus (15) also holds with a constant $C$ uniform with respect to $z$ in compact sets of $(-\infty, 0]$. Since we obtain, applying (15) with $u = \frac{1}{2}\sum_{q=k+1}^{d} \gamma_q^{-2} w_q^2$ and a fixed $z \in \mathbb{R}$, For $p$ large enough, the function $k^*(w_1, \dots, w_d) = \bigl(1 + \frac{1}{2}\sum_{q=k+1}^{d} \gamma_q^{-2} w_q^2\bigr)^{-p}$ is integrable with respect to Lebesgue's measure on $\mathbb{R}^{d-k}$. This concludes the proof.

Asymptotic behavior of the Euclidean diameter
We now study the behavior of the diameter of the elliptical cloud $\{X_i, 1 \le i \le n\}$. Precisely, we investigate the asymptotic behavior of $(M_n^{(2)}(X) - 2a_n)/b_n$ in the cases $k = 1$ and $k > 1$. As previously, we will prove our results with the vectors $Y_i$, $i \ge 1$.

Case k = 1: single maximum eigenvalue
In this case, the points P n,i defined in (10) become By Corollary 2.3, N n = n i=1 δ P n,i converges weakly to a Poisson point process By the independent increment property of the Poisson point process, the point process N can be split into two independent Poisson point processes N + and N − on R d whose points are the points of N with second component equal to +1 or −1 respectively. The mean measure of both processes is 1 2 e −x dx Φ τ 2 (dt 2 ) · · · Φ τ d (dt d ). Then the point processes N + n and N − n defined by converge weakly to the independent point processes N + and N − on R d which can be expressed as where {Γ ± i , i ≥ 1} are the points of a Poisson point process with mean measure 1 2 e −x dx on R, and {G ± i,q , i ≥ 1, q = 2, . . . , d} are i.i.d. standard Gaussian variables, independent of the points {Γ ± i , i ≥ 1}. Since the mean measure is finite on the half planes (x, ∞] × [−∞, ∞], there is almost surely a finite number of points of N ± in any of these half planes. Thus, the points of N ± can and will be numbered in decreasing order of their first component.
We can now state the main result of this section.
Theorem 3.1. Let {X i , i ≥ 1} be a sequence of i.i.d. random vectors with the same distribution as X and let the assumptions of Theorem 2.1 hold with k = 1, i.e. λ 1 > λ 2 , then are the points of two independent point processes with mean measure 1 2 e −x dxΦ(dt 2 ) . . . Φ(dt d ).
Comments The random variable defined in (16) is almost surely finite, since it is upper trivially holds. These two bounds imply that the limiting distribution is tail equivalent to the sum of two independent Gumbel random variables which is heavier tailed than a Gumbel distribution. However, it is not the sum of two independent Gumbel random variables. Therefore this result is different from the result in the spherical case in any dimension.

Case of the dimension 2
In dimension 2, a bivariate elliptical random vector $X$ with correlation $\rho \in (0, 1)$ can be defined by where $U$ is uniformly distributed on $[0, 2\pi]$, $\cos U_0 = \rho$ and $\sin U_0 = \sqrt{1 - \rho^2}$. The vector $X$ admits the polar representation $X = R(\cos\Theta, \sin\Theta)$ with The correlation matrix of $X$ is then
$$\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}.$$
Its eigenvalues are 1 + ρ and 1 − ρ. By Theorem 2.1, we know that X is in the domain of attraction of the Gumbel law and more precisely, as x → ∞, Note that (1, 1) is always an eigenvector associated with the eigenvalue 1 + ρ. This means that the vectors in the cloud with large norm are localized close to the diagonal, whatever the value of ρ ∈ (0, 1). More precisely, let Θ n be the angle of the point X n of the cloud )/c n converges weakly to a Gaussian variable with mean zero and variance (1 − ρ)/2ρ. By Theorem 3.1, the limiting distribution of the diameter can be expressed as where are the points of two independent point processes with mean measure 1 2 e −x dxΦ(dt). If ρ = 1, the one dimensional case is recovered, but there is a discontinuity with the spherical case ρ = 0 where the limiting distribution is Gumbel and the normalization is different. Moreover, ifX n andX n are the points such that X n −X n = M (2) n (X), ifΘ n andΘ n are their respective angle such that cosΘ n > 0, cosΘ n < 0, then ((Θ n −π/4)/c n , (Θ n −5π/4)/c n ) converges weakly to a pair of i.i.d. Gaussian random variables with mean zero and variance (1 − ρ)/2ρ.
In Figure 2 we show two sample clouds of size 1000 of bivariate Gaussian variables with correlation $\rho = 0.2$ and $\rho = 0.8$. The rate of convergence to the diagonal is $O(\log n)$. In Figure 3, we show the empirical cumulative distribution function (cdf) of the limiting distribution based on 500 replications of the diameter of a Gaussian cloud (with correlation $\rho = 0.2$) of size 100 000. In simulations, the indices realizing the maximum in (17) are often $i = 1$ and $j = 1$. This implies that the limiting distribution of the diameter should be close to the distribution of the sum of two independent Gumbel random variables minus the square of a Gaussian random variable. We show this distribution together with the empirical and theoretical cdf of the diameter in Figure 3. The thick black line is the (simulated) theoretical cdf; the thick gray line is the empirical cdf based on 500 clouds of 100 000 points. The thin gray line is the cdf of the sum of two independent Gumbel random variables with location parameter $\log 2$.
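A reduced version of this kind of experiment can be sketched as follows. This is not the code behind the figures: the sample size, the seed and the helper names are illustrative, and the diameter is computed by brute force, which is only practical for moderate cloud sizes. The sketch draws one bivariate Gaussian cloud with correlation $\rho = 0.8$ and measures how far the point with largest norm deviates from the diagonal.

```python
import math
import random

def gaussian_cloud(n, rho, rng):
    """n i.i.d. centered bivariate Gaussian points with unit variances and correlation rho."""
    s = math.sqrt(1.0 - rho * rho)
    cloud = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        cloud.append((z1, rho * z1 + s * z2))
    return cloud

def diameter(points):
    """Brute-force maximum interpoint distance (quadratic in the number of points)."""
    best = 0.0
    for i, p in enumerate(points):
        for q in points[i + 1:]:
            best = max(best, math.dist(p, q))
    return best

def angle_to_diagonal(p):
    """Angle in [0, pi/2] between p and the line spanned by (1, 1)."""
    c = abs(p[0] + p[1]) / (math.sqrt(2.0) * math.hypot(*p))
    return math.acos(min(1.0, c))

rng = random.Random(0)
cloud = gaussian_cloud(1000, 0.8, rng)
largest = max(cloud, key=lambda p: math.hypot(*p))
max_norm = math.hypot(*largest)
deviation = angle_to_diagonal(largest)
diam = diameter(cloud)
```

Replicating the last block over many seeds and normalizing the diameters would give an empirical cdf comparable in spirit to the one described above.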

Proof of Theorem 3.1
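Define the set $O_n = \{w \in \mathbb{R}^{d-1} \mid c_n^2 \|w\|^2 \le 1\}$ and the function $f_n : \mathbb{R} \times \{-1, +1\} \times O_n \to \mathbb{R}^d$ by $f_n(r, \epsilon, w) = (a_n + b_n r)\bigl(\epsilon\sqrt{1 - c_n^2 \|w\|^2},\, c_n w\bigr)$.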
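The map $f_n$ can be sketched in code under one assumption about the garbled display: that it reads $f_n(r, \epsilon, w) = (a_n + b_n r)(\epsilon\sqrt{1 - c_n^2\|w\|^2}, c_n w)$, with $\epsilon \in \{-1, +1\}$ the sign of the first coordinate. Under this reading, the image always has Euclidean norm $a_n + b_n r$, which the example checks. The numerical values of `a_n`, `b_n` and `c_n` are arbitrary.

```python
import math

def f_n(r, eps, w, a_n, b_n, c_n):
    """Map (r, eps, w) to a point of R^d with norm a_n + b_n * r (assumed form of f_n)."""
    s = sum(v * v for v in w)
    if c_n * c_n * s > 1.0:
        raise ValueError("w is outside the set O_n")
    radius = a_n + b_n * r
    first = eps * math.sqrt(1.0 - c_n * c_n * s)   # signed coordinate along the main axis
    return [radius * first] + [radius * c_n * v for v in w]

# Arbitrary illustrative values for the normalizing constants.
y = f_n(0.5, -1, [0.3, -1.2], a_n=5.0, b_n=0.2, c_n=0.1)
norm_y = math.sqrt(sum(v * v for v in y))
```

The sign argument selects the half space, which matches the later split of the points into two groups according to the sign of their first coordinate.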
Define next the function g n on Since c n → 0, any w ∈ R d−1 is in O n for all large enough n. Then, for any r 1 , r 2 > 0, for The convergence is locally uniform. Moreover We want to conclude that the limiting distribution of (M (11)) by a continuous mapping argument, but some care is needed.
Then, by definition of the diameter, we have This yields the following lower and upper bounds for the diameter: As a corollary of the point process convergence, we obtain that The bounds (18) imply that the diameter is achieved by a pair of points (Ŷ n ,Y n ) such that Indeed otherwise, which is a contradiction. This implies that where E n is the set of points of N n whose first component is at least equal to (M + n ∧ M − n − a n )/b n − A n /b n , i.e.
Since by definition Y + n and Y − n belong to E n , it obviously holds that M where E + n and E − n are the points of E n whose second component is positive or negative, respectively, and P ± n is the point of E ± n with the largest first component, i.e. Y ± n . The convergence of the points of N n suitably numbered to those of N imply that the sets E + n and E − n converge to the sets E + and E − of points of N + and N − defined by The sets E + and E − are almost surely finite since the points Γ ± i are only finitely many in any interval (x, ∞). This implies that the cardinals of the sets E ± n are constant for large enough n. By Skorohod's representation theorem [Kal02,Theorem 3.30], we may moreover assume that the points of E ± n converge almost surely to those of E ± . Since g n converges uniformly to g on compact sets of R × {1} × R d−1 × R × {−1} × R d−1 and since P ± n converge to P ± 1 , g n (P + n , P − n ) converges to g(P + 1 , P − 1 ) which is finite. On the other hand, the points of (E are all included in a fixed compact set and thus This implies that for n large enough, max g n (x, y) ≤ g n (P + n , P − n ) .
We conclude that We can now apply a continuous mapping argument, since g n converges uniformly to g on compact sets of To see that this is identical to (16), note that if This proves that the maximum of g over all pairs of points of N + and N − is actually obtained over the pairs of E + × E − .

Case k > 1: multiple maximum eigenvalue
If $k > 1$, as in [JJ12], a strengthening of the domain of attraction condition is needed to prove the result. Since an auxiliary function $\psi$ can be chosen differentiable and such that $\lim_{x\to\infty} \psi'(x) = 0$, it always holds that $\lim_{x\to\infty} \psi(x + \psi(x)t)/\psi(x) = 1$ locally uniformly with respect to $t \in \mathbb{R}$. We must strengthen this uniformity as follows.
Assumption 3.1. For any positive function $\ell$ such that $\ell(x) \to \infty$ and $\ell(x) = O(\log(x/\psi(x)))$ as $x \to \infty$,
This assumption is satisfied by all usual distributions, such as the Weibull, Gaussian, exponential or log-normal distributions. An important consequence is that the quantile of order $1 - 1/n$ of $\|X\|$ and $T$ can be related. Recall from Theorem 2.1 that, as $x \to \infty$, with Let $a_n^T$ be such that $P(T > a_n^T) \sim 1/n$ and set $b_n^T = \psi_T(a_n^T)$. Define the sequence $\{a_n\}$ by Then $P(\|X\| > a_n) \sim 1/n$. This is a consequence of the equivalence (20) and Lemma 6.2. Let thus $a_n$ be defined as in (21) and define $b_n = \psi_A(a_n)$ and

Comments
• In the spherical case k = d, we recover [JJ12, Theorem 1.1] and the constant c d therein is equal to the constant C k in (22) (taking the product over an empty set of indices to be equal to 1).
• We actually prove slightly more than the convergence (23). The proof can be used to

Proof of Theorem 3.2
The proof is nearly the same as the proof of [JJ12, Theorem 1.1]. We prove the convergence of a U -statistic of indicators to a Poisson random variable. The difference lies in added technicalities due to the coordinates of the vector corresponding to the smaller eigenvalues which have to be integrated out. In more precise terms, as in the proof of Theorem 3.1, we work with vague convergence of measures rather than weak convergence.
Define $s_n = \frac{1}{2}\log d_n$ and Since $P(M_n^{(2)}(Y) > 2a_n - b_n d_n + b_n z) = P(S_n(z) \ne 0)$, it suffices to prove that for all $z \in \mathbb{R}$, $S_n(z)$ converges weakly to a Poisson random variable with mean $e^{-z}$. For technical reasons, as in [JJ12], we must truncate the sum defining $S_n(z)$. Define In words, we restrict the sum to the indices of vectors whose norm is not too large, hence not too small either, since their distance must be large. Note that $S_n(z) \ne \tilde S_n(z)$ implies that there is at least one index $i$ such that $T_i > a_n^T + b_n^T s_n$. Since $s_n \to \infty$, this implies that for any $A > 0$, $\limsup_{n\to\infty} P(S_n(z) \ne \tilde S_n(z)) \le \limsup$ Since $A$ is arbitrary, this proves that for all $z \in \mathbb{R}$, $\lim_{n\to\infty} P(S_n(z) \ne \tilde S_n(z)) = 0$.
This in turn implies that we only need to prove that $\tilde S_n(z)$ converges weakly to a Poisson random variable with mean $e^{-z}$. This convergence is obtained by applying the criterion of [JJ86, Theorem 3.1 and Remark 3.4].
Lemma 3.3. Under the assumptions of Theorem 3.2, The convergences (24) and (25) imply that $\tilde S_n(z)$ converges weakly to a Poisson distribution with mean $e^{-z}$ and this concludes the proof of Theorem 3.2.
The proof of Lemma 3.3 consists mainly in checking the vague convergence of certain measures and then strengthening this convergence to weak convergence by bounded convergence arguments. The requested bounds are obtained by means of Assumption 3.1 which is slightly stronger than the assumption of uniformity used in [JJ12, Theorem 1.1], but is satisfied for all usual distributions. Apart from these arguments, the proof follows the same lines as the proof of [JJ12, Theorem 1.1]. In view of their tedious technical nature, this proof is postponed to Section 6.
Let us note that as a by-product of the proof, we obtain in Lemma 6.5 the convergence of the cosine of the angle between two vectors Y 1 and Y 2 and of the components corresponding to the smaller eigenvalues, given that their distance is large and their norm is large, but not too large (this is quantified in the definition of S n (z)). This parallels the convergence proved in Theorem 2.1, but we do not explicitly use it in the proof of Theorem 3.2. It may eventually prove to be of interest for some other problem.

The $\ell^q$ norm of a random spherical vector
In this section, the localization principle will be used to answer another question raised in [JJ12], namely, the asymptotic behavior of the $\ell^q$ diameter of a cloud of spherical random vectors in dimension $d \ge 2$. Define the $\ell^q$ norm of a vector $x \in \mathbb{R}^d$ by
$$\|x\|_q = \Bigl(\sum_{i=1}^{d} |x_i|^q\Bigr)^{1/q} \quad (1 \le q < \infty), \qquad \|x\|_\infty = \max_{1 \le i \le d} |x_i|.$$
For $d \ge 2$ and $q \ge 1$, $q \ne 2$, the maximum of the $\ell^q$ norm is achieved on the $\ell^2$ sphere $S^{d-1}$ at isolated points. Specifically,
• if $q \in [1, 2)$, then $\max_{w \in S^{d-1}} \|w\|_q = d^{1/q - 1/2}$; it is achieved at the $2^d$ "diagonal" points $(\pm d^{-1/2}, \dots, \pm d^{-1/2})$.
• if $q \in (2, \infty)$, then $\max_{w \in S^{d-1}} \|w\|_q = 1$; the maximum is achieved at the $2d$ intersections of the axes with $S^{d-1}$.
Therefore, the localization phenomenon will occur. A spherical vector whose norm is large must be close to the direction of one of these maxima, and the diameter will be achieved by points which are nearly diametrically opposed along one of these directions.
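The location of these maxima is easy to check numerically. The sketch below, with illustrative helper names and an arbitrary dimension $d = 3$, evaluates the $\ell^q$ norm at a diagonal point and at an axis point of the Euclidean unit sphere, and compares with the largest value found over a random sample of points of the sphere.

```python
import math
import random

def lq_norm(x, q):
    """l^q norm of x; q = math.inf gives the sup norm."""
    if math.isinf(q):
        return max(abs(v) for v in x)
    return sum(abs(v) ** q for v in x) ** (1.0 / q)

def uniform_on_sphere(d, rng):
    """Uniform point on the Euclidean unit sphere of R^d."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]

d = 3
diagonal = [d ** -0.5] * d          # one of the 2^d diagonal points
axis = [1.0] + [0.0] * (d - 1)      # one of the 2d axis points
rng = random.Random(0)
sphere = [uniform_on_sphere(d, rng) for _ in range(2000)]

def sup_on_sample(q):
    return max(lq_norm(w, q) for w in sphere)

# For q < 2: (value at the diagonal point, claimed maximum, largest sampled value).
small_q = {q: (lq_norm(diagonal, q), d ** (1.0 / q - 0.5), sup_on_sample(q)) for q in (1.0, 1.5)}
# For q > 2: (value at the axis point, largest sampled value).
large_q = {q: (lq_norm(axis, q), sup_on_sample(q)) for q in (3.0, math.inf)}
```

No sampled point exceeds the claimed maximum, and the sampled values only approach it near the diagonal or axis directions, which is the geometric source of the localization.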
We consider a spherically distributed random vector, i.e. $X = TW$ where $T$ and $W$ are independent and $W$ is uniform on $S^{d-1}$. Let $\{X_i, i \ge 1\}$ be a sequence of i.i.d. vectors with the same distribution as $X$. Define The behavior of $\|X\|_q$ differs only by constants for $q \in [1, 2)$ and for $q > 2$, whereas the diameter has two very different behaviors for $q \in [1, 2)$ and for $q > 2$. Therefore, we study these two cases separately. We start with the case $q > 2$ which is somewhat easier.

Case q > 2
For $q > 2$, the maximum of the $\ell^q$ norm on the $\ell^2$ sphere is 1 and is achieved at the $2d$ intersections of the sphere with the axes. We will see that the localization of the vectors with large norms occurs at a very fast rate, and therefore the diameter behaves asymptotically as in the one-dimensional case.
Theorem 4.1. Let $X = TW$ where $T$ and $W$ are independent, $W$ is uniform on $S^{d-1}$ and $T$ satisfies (3). For $q \in (2, \infty]$, Moreover, conditionally on $\|X\|_q > x$ and $X \in \Delta_1$, as $x \to \infty$, where $E$ is an exponential random variable with mean 1 and $G_2, \dots, G_d$ are i.i.d. standard Gaussian random variables, independent of $E$.
Comments
If $T^2$ has a $\chi^2$ distribution with $d$ degrees of freedom, then $X$ is a standard $d$-dimensional Gaussian vector and Theorem 4.1 is a particular case of [HKP13, Theorem 1 and Example 1]. In that case, $P(T > x) \sim 2^{1-d/2}\,\Gamma(d/2)^{-1} x^{d-2} e^{-x^2/2}$, $\phi(x) = 1/x$ and the equivalence (26) yields The tail depends on $d$ only in the constant but not in the exponent. This is expected since $\|X\|_q^q$ is the sum of $d$ independent random variables with subexponential tails. Hence, by definition of subexponentiality, the tail of this sum is equivalent to $d$ times the tail of one variable. This is specific to the Gaussian case, since otherwise the components of $X$ are not independent.
Proof of Theorem 4.1. If $W$ is uniformly distributed on $S^{d-1}$, then the distribution of $W_1$ has the density $\beta_d (1 - s^2)^{(d-3)/2}$ on $[-1, 1]$ with $\beta_d = \frac{\Gamma(d/2)}{\Gamma((d-1)/2)\,\Gamma(1/2)}$. Define $\widetilde W = (1 - W_1^2)^{-1/2}(W_2, \dots, W_d)$. By Lemma 2.4, $\widetilde W$ is uniformly distributed on $S^{d-2}$ and independent of $W_1$. Let $f$ be continuous with compact support in $\mathbb{R}^d$ and define the function $k$ for $q = \infty$. Then the following convergence holds, locally uniformly on $[0, \infty) \times \mathbb{R}$, This yields, for $f$ continuous and compactly supported on $\mathbb{R}^d$, where $R^2$ has a $\chi^2$ distribution with $d - 1$ degrees of freedom and is independent of $\widetilde W$. This implies that $R\widetilde W$ is a $(d - 1)$-dimensional standard Gaussian vector. Equivalently, $(R^2/2, R\widetilde W)$ can be expressed as $(\frac{1}{2}(G_2^2 + \cdots + G_d^2), G_2, \dots, G_d)$, where $G_2, \dots, G_d$ are i.i.d. standard Gaussian random variables. This yields, for $f$ continuous and compactly supported on The last step is to extend the convergence to bounded continuous functions. This is done as in the proof of Theorem 2.1, using the bound (15). Summing these equivalences over the $2d$ regions $\Delta_i$ yields (26).
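As a sanity check on the density of $W_1$ used at the start of this proof, the following sketch verifies numerically that $\beta_d (1 - s^2)^{(d-3)/2}$ integrates to one on $[-1, 1]$ for a few dimensions. The midpoint rule and the function names are illustrative choices; dimension $d = 2$ is left out because the density is then unbounded at the endpoints and this simple rule converges slowly.

```python
import math

def beta_d(d):
    """Normalizing constant Gamma(d/2) / (Gamma((d-1)/2) * Gamma(1/2))."""
    return math.gamma(d / 2.0) / (math.gamma((d - 1) / 2.0) * math.gamma(0.5))

def w1_density(s, d):
    """Density of the first coordinate of a uniform point on S^(d-1)."""
    return beta_d(d) * (1.0 - s * s) ** ((d - 3) / 2.0)

def integrate(f, a, b, m=20000):
    """Midpoint rule with m subintervals."""
    h = (b - a) / m
    return h * sum(f(a + (i + 0.5) * h) for i in range(m))

totals = {d: integrate(lambda s, d=d: w1_density(s, d), -1.0, 1.0) for d in (3, 4, 5)}
```

For $d = 3$ the density is constant, which recovers the classical fact that a coordinate of a uniform point on the two-dimensional sphere is uniform on $[-1, 1]$.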
Corollary 4.2. Under the assumptions of Theorem 4.1, as x → ∞, conditionally on X q > x and X ∈ ∆ 1 , where E is an exponential random variable with mean 1 and G 2 , . . . , G d are i.i.d. standard Gaussian random variables, independent of E.
The degeneracy with respect to the second variable in the convergence (28) is the key to the behavior of the diameter in this case. Let a n be the 1 − 1/n quantile of the distribution of X q and b n = ψ T (a n ).
where $\Gamma_i^+$ and $\Gamma_i^-$, $1 \le i \le d$, are independent Gumbel random variables with location parameter $\log 2d$.
Proof. With probability tending to one, the diameter will be achieved by a pair of points in two symmetric regions ∆ i and ∆ −i .

Case 1 ≤ q < 2
Let R d be split into 2 d isometric regions Q j , ±j = 1, . . . , 2 d−1 around each "diagonal" line x 1 = ±x 2 = · · · = ±x d , numbered in such a way that Q j = −Q −j and that Q 1 is the region which contains the point 1 = (1, . . . , 1). For q ∈ [1, 2), a spherical vector with a large l q norm must be close to one of the diagonals.
Theorem 4.4. Let X be as in Theorem 4.1. If 1 ≤ q < 2, then Moreover, conditionally on X q > x and X ∈ Q 1 , as x → ∞, where E is an exponential random variable with mean 1 and G is a Gaussian vector independent of E with covariance matrix

Comments
• The form of the covariance matrix implies that the components of the vector $G$ sum up to zero. This is natural since $G$ must be in the space tangent to the sphere at the point $d^{-1/2}\mathbf{1}$.
In view of this, a second order Taylor expansion yields ) . This yields, for f continuous and compactly supported, where R 2 has a χ 2 distribution with d − 1 degrees of freedom and is independent ofW. Thus RW is a (d − 1) dimensional standard Gaussian vector. This implies that (2 − q) −1/2 RŨ is a d dimensional Gaussian vector with covariance matrix This also implies that the components of RŨ sum up to zero. Summarizing, we have proved that, for f continuous and compactly supported where G is a Gaussian vector with mean zero and covariance matrix Σ. Again, the extension of the convergence to bounded continuous functions is done as in the proof of Theorem 2.1, using the bound (15). This proves (31). Summing this equivalence over the 2 d regions Q j yields (30).
Let U be as in (27). Theorem 4.4 yields that conditionally on X q > x and U ∈ Q 1 , , Theorem 4.4 and the convergence (32) can be adapted to each region Q j . For j = 1, . . . , 2 d , let ε j be the point of {−1, 1} d \ {1} which is in Q j . Then, conditionally on X q > x and U ∈ Q j , where G j = (ε 1 G 1 , . . . , ε d G d ) and (G 1 , . . . , G d ) is a Gaussian vector with zero mean and covariance matrix Σ.
The previous results can be translated into point process convergence. Let $a_n$ be the $1 - 1/n$ quantile of the distribution of $\|X\|_q$. Define $b_n = \psi_q(a_n)$ and $c_n = b_n/a_n$. For $j = 1, \dots, 2^d$ and $i = 1, \dots, n$, define
Corollary 4.5. Let $\{X_i, i \ge 1\}$ be a sequence of i.i.d. random vectors with the same distribution as $X$, where $X$ satisfies the assumptions of Theorem 4.1. Then,
Theorem 4.6. Let $\{X_i, i \ge 1\}$ be a sequence of i.i.d. random vectors with the same distribution as $X$, where $X$ satisfies the assumptions of Theorem 4.1. If $1 \le q < 2$, then, where $\Gamma^{\pm}_{i,j}$, $i \ge 1$, $j = 1, \dots, 2^{d-1}$, are the points of independent Poisson point processes on $(-\infty, \infty]$ with mean measure $2^{-d} e^{-x}\,dx$ and $G^{\pm}_{i,j} = (G^{\pm}_{i,j,1}, \dots, G^{\pm}_{i,j,d})$, $i \ge 1$, $j = 1, \dots, 2^{d-1}$, are i.i.d. Gaussian vectors with covariance matrix $\Sigma$.
Comments For $q = 1$, the corrective terms in (33) vanish and so the limiting distribution of the diameter is $\max_{j=1,\dots,2^{d-1}} \{\Gamma^{+}_{1,j} + \Gamma^{-}_{1,j}\}$. If $d > 2$, it differs from the case $q > 2$ since the space is split into more regions (there are $2^{d-1}$ diagonals and $d$ axes).
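For $q = 1$ the limiting variable described in the comment above is easy to simulate. The sketch below is an illustration under stated assumptions, not a statement of the theorem: it uses the fact that the largest point of a Poisson process with mean measure $c\,e^{-x}\,dx$ has distribution function $\exp(-c\,e^{-x})$, so that it can be written as $\log c - \log E$ with $E$ standard exponential. The function names `top_point` and `sample_limit_q1` are hypothetical.

```python
import numpy as np

def top_point(c, size, rng):
    """Largest point of a Poisson process with mean measure c * exp(-x) dx.

    P(top <= x) = exp(-c * exp(-x)), hence top = log(c) - log(E), E ~ Exp(1).
    """
    return np.log(c) - np.log(rng.exponential(size=size))

def sample_limit_q1(d, n_samples, rng):
    """Draws of max over j of (Gamma^+_{1,j} + Gamma^-_{1,j}), j = 1, ..., 2^(d-1)."""
    n_regions = 2 ** (d - 1)
    c = 2.0 ** (-d)  # mass of each of the 2^d independent processes
    plus = top_point(c, (n_samples, n_regions), rng)
    minus = top_point(c, (n_samples, n_regions), rng)
    return (plus + minus).max(axis=1)

rng = np.random.default_rng(2)
draws = sample_limit_q1(d=3, n_samples=100_000, rng=rng)
print(draws.mean(), np.quantile(draws, [0.05, 0.5, 0.95]))
```

Only the top point of each process is needed here because, for $q = 1$, the Gaussian corrective terms vanish and the maximum of the sums over pairs of points within a pair of opposite regions is attained by the two largest points.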
Proof of Theorem 4.6. The diameter will be achieved by points nearly diametrically opposed and close to one of the diagonals. More precisely, In order to obtain the convergence of each sub-maximum, we proceed as in the proof of Theorem 3.1. The main step is the following. Define r n,i = a n + b n r i , i = 1, 2 , w n, where u and v are such that u q = v q = 1. This implies that This yields the expansion r n,1 w n,1 − r n,2 w n,2 q = 2a n This implies the convergence lim n→∞ r n,1 w n,1 − r n,2 w n,2 q − 2a The rest of the proof is exactly along the lines of the proof of Theorem 3.1.

Further generalizations
There are many ways to generalize the results of the previous sections, and because of the very local nature of the behavior of random vectors in the domain of attraction of the Gumbel distribution, it is possible to build all kinds of ad hoc examples to illustrate nearly any type of behavior. In this section we will only briefly describe several reasonable generalizations of elliptical distributions.
One possibility is to consider a random vector $X$ that has the representation $X = TW$, where $W$ is a random vector on the sphere $S^{d-1}$, no longer assumed to be uniformly distributed, and $T$ is a positive random variable, independent of $W$. A second possibility is to assume that the vector $X$ can be expressed as $X = T g(W)$, where $W$ is uniformly distributed on $S^{d-1}$ and $g$ is a bounded continuous function. This model includes the previous one if the function $g$ takes values in the unit sphere. These models were used by [FS10] and [BS13] in the investigation of conditional limit laws of a bivariate vector given that one component is extreme. In such a model, the behavior of the vector given that its norm is large and the behavior of the diameter will be determined by the maxima of the function $\|g\|$. If they are isolated points, the localization phenomenon will arise and results such as Theorems 2.2 and 3.1 may be obtained. Otherwise, if $\|g\|$ is constant on nonempty open subsets of the sphere, we rather expect to obtain results similar to Theorem 3.2.
Another way to generalize the elliptical distributions is to consider vectors whose distribution has a density on $\mathbb{R}^d$ of the form $f(x) = e^{-U(x)}$, where $U$ is a continuous function on $\mathbb{R}^d$, the level sets of $U$ are closed and convex, and $U$ satisfies some type of multivariate regular variation or asymptotic homogeneity. This type of assumption has been used in [BE07] to obtain conditional limit laws of a vector given that one component is extreme and by [HR05] in the study of the longest edge of the minimum spanning tree of a random sample.
We leave this last direction as the subject of future research. In the following subsections, we give without proof several bidimensional examples. We only consider the Euclidean norm.

Generalized spherical distributions
Assume that $X = T(\cos\Theta, \sin\Theta)$ where $T$ and $\Theta$ are independent and the support of the distribution of $\Theta$ is $[0, \theta_0]$, $\theta_0 \in (0, 2\pi]$. In this case, it holds that $\|X\| = T$ and, as previously, we denote the quantile of order $1 - 1/n$ of $\|X\|$ by $a_n$ and define $b_n = \psi_T(a_n)$.
The main question in this case is the existence of nearly diametrically opposed vectors in the sample cloud. If θ 0 < π, then there will be none, and therefore the diameter cannot behave like twice the norm.
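The role of the angular support can be illustrated by a small simulation. The sketch below rests on assumptions that are not in the text: $T$ is taken standard exponential (a distribution in the Gumbel domain of attraction), $\Theta$ is uniform on $[0, \theta_0]$, and the helper names are hypothetical. The deterministic bound $M \max(1, 2\sin(\theta_0/2))$ on the diameter of a cloud with maximum norm $M$ follows from the law of cosines when $\theta_0 < \pi$; it is an elementary remark added here, not a result of the paper.

```python
import numpy as np

def sample_cloud(n, theta0, rng):
    """n points T (cos Theta, sin Theta) with Theta uniform on [0, theta0].

    T is standard exponential, an illustrative assumption.
    """
    t = rng.exponential(size=n)
    theta = rng.uniform(0.0, theta0, size=n)
    return np.column_stack((t * np.cos(theta), t * np.sin(theta)))

def diameter(x):
    """Maximum interpoint Euclidean distance (brute force, O(n^2) memory)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).max()

def diameter_bound(max_norm, theta0):
    """Law-of-cosines bound on the diameter, valid for theta0 < pi."""
    return max_norm * max(1.0, 2.0 * np.sin(theta0 / 2.0))

rng = np.random.default_rng(3)
results = []
for theta0 in (np.pi / 4, np.pi / 2, 0.9 * np.pi):
    x = sample_cloud(500, theta0, rng)
    m = np.linalg.norm(x, axis=1).max()
    results.append((theta0, diameter(x), m))
    # Ratio of the diameter to twice the maximum norm, and its upper bound.
    print(theta0, diameter(x) / (2 * m), diameter_bound(m, theta0) / (2 * m))
```

In each run the ratio of the diameter to twice the maximum norm stays strictly below one, and for $\theta_0 \le \pi/3$ the diameter does not exceed the maximum norm itself.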
The case $0 < \theta_0 \le \pi/3$ is trivial since $\|X_1 - X_2\| \le \|X_1\| \vee \|X_2\|$ if the angle between $X_1$ and $X_2$ is less than $\pi/3$. In concrete terms, the distance between two points whose angle is less than $\pi/3$ never exceeds the larger of their two norms. This implies that $M_n^{(2)}(X) \le M_n(X)$. Define $m_n(X) = \min_{1 \le i \le n} \|X_i\|$ and consider two points of the sample whose norms are $M_n(X)$ and $m_n(X)$, respectively. Then, by the triangle inequality, $M_n^{(2)}(X) \ge M_n(X) - m_n(X)$. Therefore we conclude that $(M_n(X) - M_n^{(2)}(X))/m_n(X) \to_P 1$ and
If $\theta_0 \in (\pi/3, \pi)$, then there will be no vectors nearly diametrically opposed, but this case will differ from the case $\theta_0 \in [\pi, 2\pi]$ only by constants. As can be seen from the proof of Theorem 3.2 and [JJ12, Theorem 1.1], if $\theta_0 \ge \pi$, the asymptotic distribution of the diameter is determined by the behavior of $\cos(\Theta_1 - \Theta_2)$ at $-1$. If $\theta_0 \in (\pi/3, \pi)$, then it is determined by the behavior of $\cos(\Theta_1 - \Theta_2)$ when the angle between $\Theta_1$ and $\Theta_2$ is the largest, here $\theta_0$. Apart from this difference, the proof of [JJ12, Theorem 1.1] can be copied line by line to obtain the following result.

Generalized elliptical distributions
Let $u$, $v$ be two continuous functions defined on $[0, 1]$ such that $u(0) = u(1)$ and $v(0) = v(1)$ and such that the curve $\gamma(s) = (u(s), v(s))$ is simple. Define a bivariate random vector $X$ by $X = T\gamma(S) = T(u(S), v(S))$, where $T$ and $S$ are independent and $S$ is uniformly distributed on $[0, 1]$. We call such a vector a generalized elliptical vector since elliptical vectors are obtained by choosing $u(s) = \cos(2\pi s)$ and $v(s) = \cos(2\pi s - U_0)$.
Define φ(x) = ψ T (x)/x and for i = 1, . . . q, m i = (s i ) and τ 2 i = −m i / (s i ). Adapting the proof of Theorem 2.1, we obtain This implies that an auxiliary function for X is mψ T (x/m). This idea has been exhaustively investigated in higher dimension under the assumption that T has a χ 2 distribution by [HKP13, Theorem 1 and 2].
We expect the diameter of the cloud to be achieved by pairs of points with large norms and which are nearly in the directions of the points γ(s i ) and γ(s j ) with maximum distance. We have obtained the limiting distribution of the diameter only when the two points with maximum distance are diametrically opposed.
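For the elliptical curve $u(s) = \cos(2\pi s)$, $v(s) = \cos(2\pi s - U_0)$, the directions of maximum norm and the farthest pair of points can be located numerically. The sketch below is only an illustration on a grid; the value $U_0 = \pi/3$ and the grid size are arbitrary assumptions. It relies on the identity $\|\gamma(s)\|^2 = 1 + \cos(U_0)\cos(4\pi s - U_0)$, which gives the maximum norm $\sqrt{1 + |\cos U_0|}$, and on the symmetry $\gamma(s + 1/2) = -\gamma(s)$.

```python
import numpy as np

def gamma(s, u0):
    """Elliptical curve gamma(s) = (cos(2 pi s), cos(2 pi s - u0))."""
    return np.stack((np.cos(2 * np.pi * s), np.cos(2 * np.pi * s - u0)), axis=-1)

u0 = np.pi / 3  # illustrative value of U_0
s = np.linspace(0.0, 1.0, 1000, endpoint=False)
pts = gamma(s, u0)
norms = np.linalg.norm(pts, axis=1)

# Maximum norm on the grid, to be compared with sqrt(1 + |cos(u0)|).
m = norms.max()

# Brute-force search of the farthest pair of grid points.
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
i, j = np.unravel_index(np.argmax(dist), dist.shape)

print(m, np.sqrt(1 + abs(np.cos(u0))))
print(dist[i, j], 2 * m, np.allclose(pts[i], -pts[j]))
```

For this curve the farthest pair consists of the two points of maximum norm, which are diametrically opposed, so the elliptical case falls within the situation for which the limiting distribution has been obtained.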
Assume that $\gamma(s_1)$ and $\gamma(s_2)$ are diametrically opposed and that Assume for simplicity that this maximum is achieved only once. Let $a_n$ be the $1 - 1/n$ quantile of $\|X\|/m$ and $b_n = \psi_T(a_n)$. Adapting the proof of Theorem 3.1, we obtain are the points of two independent Poisson point processes with mean measure $q^{-1} e^{-x}\,dx$, independent of the i.i.d. standard Gaussian random variables $G_i^{+}$, $G_j^{-}$, $i, j \ge 1$.
The difficulty when the points $\gamma(s_i)$, $\gamma(s_j)$ which achieve the maximum distance are not diametrically opposed is that the rate at which vectors with a large norm concentrate around the directions of $\gamma(s_i)$ and $\gamma(s_j)$ is not fast enough to apply the arguments of the proof of Theorem 3.1. We leave this problem and higher dimensional extensions to future research.

Different rates of localization
The rate of localization of the vectors around the direction where the norm can be large is a n /b n in the previous examples. This is due to the regularity of the curve γ. Different rates may be obtained if the norm is not twice differentiable at its maxima but has some regular variation property. Consequently, different limiting distributions are also obtained. We give one example.
Define $\psi_a(x) = a\psi_T(x/a)$ and $\phi_{a,q}(x) = \{\psi_a(x)/x\}^{1/(2q)}$. Let $Z_q$ be a random variable whose distribution admits the density $q\,2^{-1/(2q)}\,\Gamma^{-1}(1/(2q))\,e^{-\frac12 |x|^{2q}}$ with respect to Lebesgue's measure on $\mathbb{R}$ and let $E$ be an exponential random variable with mean 1. Then, conditionally on $\|X\| > x$ and $\cos U > 0$, where the distribution of $Z_q$ plays the role of the standard Gaussian distribution. Let $a_n$ be the $1 - 1/n$ quantile of $\|X\|$ and let $b_n = \psi_a(a_n)$. Then, where $\{\Gamma_i^{\pm}, i \ge 1\}$ are the points of two independent Poisson point processes with mean measure $\frac12 e^{-x}\,dx$ and $Z_i^{\pm}$ are i.i.d. random variables with the same distribution as $Z_q$, independent of the point processes.
where R 2 has a χ 2 distribution with d degrees of freedom and is independent of W. Let R be such a random variable and define X = RW. The coordinates X 1 , . . . , X d of X are i.i.d. standard Gaussian random variables. It is then easily seen that hence W (k) is uniformly distributed on S k . Moreover, R k = (X 1 , . . . , X k ) is independent of W k . Noting that (W k+1 , . . . , W d ) = (X k+1 , . . . , X d ) and that W (k) is independent of X k+1 , . . . , X d , we obtain the independence of W (k) and (W k+1 , . . . , W d ).
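The polar representation used in this argument can be sketched numerically. The code below is an illustration under assumptions: it builds $X = RW$ from i.i.d. standard Gaussian coordinates, and it takes the normalized vector of the first $k$ coordinates as a stand-in for $W^{(k)}$, whose exact definition is not recoverable from the text above. The correlation printed at the end is only a weak empirical check, not a proof of independence.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, k = 50_000, 5, 2  # illustrative sizes

x = rng.standard_normal((n, d))         # X with i.i.d. N(0, 1) coordinates
r = np.linalg.norm(x, axis=1)           # R = ||X||, so that R^2 ~ chi^2(d)
w = x / r[:, None]                      # W = X / R, uniform on the unit sphere

r_k = np.linalg.norm(x[:, :k], axis=1)  # norm of the first k coordinates
w_k = x[:, :k] / r_k[:, None]           # assumed stand-in for W^(k)

# Normalizing the first k coordinates of W gives the same vector as w_k.
w_head = w[:, :k] / np.linalg.norm(w[:, :k], axis=1, keepdims=True)

print(np.mean(r ** 2))                    # close to d
print(np.corrcoef(r_k, w_k[:, 0])[0, 1])  # close to 0
```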
Let f be compactly supported on R d and g be the density of (W k+1 , . . . , W d ). Since W (k) is independent of (W k+1 , . . . , W d ), it holds that Let us now compute g(0). Using the representation (36), we have, for any bounded measurable It is readily checked that J(r, 0, . . . , 0) = r (d−k)/2 , hence This yields the constant in (12).

Proof of Lemma 3.3
We need several preliminary results.
is uniformly distributed on S k−1 and where (u, v) = (u k+1 , . . . , u d , v k+1 , . . . , v d ) ∈ R 2(d−k) and c T n = b T n /a T n . The following Lemma gives the limit of the suitably rescaled functions g n and h n . The proof is elementary and is omitted.
Lemma 6.1. Let $\{\omega_n\}$ be a sequence of positive numbers such that $\omega_n = O(\log(a_n^T/b_n^T))$ and set $s_n = \log \omega_n$. Define the event $\mathcal{T}_n = \{a_n^T - b_n^T(\omega_n + s_n) \le T_1, T_2 \le a_n^T + b_n^T s_n\}$. Then, almost surely, locally uniformly, with Moreover, there exists a constant $c > 0$ such that
Lemma 6.2. Under Assumption 3.1, for any sequence $\{\omega_n\}$ such that $\omega_n = O(\log(a_n^T/b_n^T))$, for all $z \in \mathbb{R}$,
Proof. Denote $\tilde a_n = a_n^T - b_n^T \omega_n$ and $\tilde b_n = \psi_T(\tilde a_n)$. For any sequence $\{r_n\}$ that tends to infinity, the convergence of $P(T > r_n + \psi(r_n) z)/P(T > r_n)$ to $e^{-z}$ is locally uniform with respect to $z \in \mathbb{R}$. Under Assumption 3.1, it holds that $\tilde b_n/b_n^T \to 1$. Thus,
$$\frac{P(T > \tilde a_n + b_n^T z)}{P(T > \tilde a_n)} = \frac{P\bigl(T > \tilde a_n + \tilde b_n \tfrac{b_n^T}{\tilde b_n} z\bigr)}{P(T > \tilde a_n)} \to e^{-z} .$$
Since the function ϑ has a positive finite limit at infinity, this yields (40).
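The locally uniform convergence used in the proof of Lemma 6.2 can be illustrated on an explicit example. The sketch below assumes, purely for illustration, the Rayleigh tail $P(T > x) = e^{-x^2/2}$, for which one may take $\psi(x) = 1/x$ and for which the ratio equals $\exp(-z - z^2/(2r^2))$ exactly. The computation is done on the log scale to avoid underflow for large $r$; the function names are hypothetical.

```python
import numpy as np

def log_survival(x):
    """log P(T > x) = -x^2 / 2 for x >= 0 (Rayleigh tail, illustrative assumption)."""
    return -0.5 * x ** 2

def psi(x):
    """Auxiliary function psi(x) = 1 / x for this tail."""
    return 1.0 / x

def tail_ratio(r, z):
    """P(T > r + psi(r) z) / P(T > r), computed on the log scale."""
    return np.exp(log_survival(r + psi(r) * z) - log_survival(r))

z = np.linspace(-2.0, 2.0, 9)
errors = []
for r in (5.0, 20.0, 80.0):
    # Largest deviation from exp(-z) over the compact set [-2, 2].
    errors.append(np.max(np.abs(tail_ratio(r, z) - np.exp(-z))))
    print(r, errors[-1])
```

The maximal deviation over the compact set decreases as $r$ grows, which is the behavior that local uniformity describes.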
For any sequence $\{\omega_n\}$, define $s_n = \frac12 \log \omega_n$, the event $\mathcal{T}_n = \{a_n^T - b_n^T(\omega_n + s_n) \le T_1, T_2 \le a_n^T + b_n^T s_n\}$ and, for $z \in \mathbb{R}$,
$$K_n(z) = \frac{n^2 e^{-\omega_n}}{\omega_n}\, P\bigl(T_1 + T_2 > 2a_n^T - b_n^T \omega_n + b_n^T z \,;\, \mathcal{T}_n\bigr) .$$
Lemma 6.3. If Assumption 3.1 holds, then, for any sequence {ω n } such that ω n → ∞ and ω n = O(log(a n /b n )), and for all z ∈ R, Proof. The proof of the convergence (42) is a consequence of Lemmas 3.5 to 3.9 in [JJ12] under (39) as an assumption.
Lemma 6.4. If Assumption 3.1 holds, then for each $p > 0$ and each sequence $\{\omega_n\}$ such that $\omega_n = O(\log(a_n^T/b_n^T))$, there exists a constant $C$ such that, for large enough $n$ and all $y \ge 0$,
$$\sup_{u \in (-\omega_n, \omega_n)} \frac{P(T > a_n^T + b_n^T(u + y))}{P(T > a_n^T + b_n^T u)} \le C(1 + y)^{-p} .$$
For all $p > 0$ and $z \in \mathbb{R}$, there exists a constant $C$ such that, for large enough $n$ and all $y \ge 0$,
$$\sup_{u \in (-\omega_n, \omega_n)} K_n(u + y + z) \le C(1 + y)^{-p} .$$
Proof. Recall the representation (9). The function ϑ is upper and lower bounded, so P(T > a T n + b T n (u + y)) P(T > a T n + b T n u) = ϑ(a T n + b T n (u + y)) ϑ(a T n + b T n u) exp − y 0 ψ(a T n + b T n u) ψ(a T n + b T n u + b T n s) where ζ n ∈ (a T n + b T n u, a T n + b T n u + b T n s). Since lim s→∞ ψ (s) = 0, and by Assumption 3.1 ψ(a T n + b T n u)/b T n converges uniformly to 1 with respect to u ∈ (−ω n , ω n ), so, for > 0, and large enough n, it holds that exp   − This proves (43). To prove (44), define H n (u) = ne −ωn P(T ≤ a T n − b T n ω n + b T n u). Then, for any fixed z ∈ R and y ≥ 0, K n (u + y + z) = n ω n ωn+sn −sn P(T > a T n + b T n (u + y + z))H n (du) ≤ n ω n ωn+sn −sn P(T > a T n + b T n (u + y + z)) P(T > a T n + b T n (u + z)) P(T > a T n + b T n (u + z))H n (du) ≤ sup |u|≤ωn P(T > a T n + b T n (u + y + z)) P(T > a T n + b T n (u + z)) K n (z) .
Since K n (z) is a convergent sequence for each z ∈ R, it is bounded with respect to n. This yields (44).
Define $c_n^T = b_n^T/a_n^T$ and $d_n^T = \frac12 (2d - k - 1) \log(a_n^T/b_n^T) - \log\log(a_n^T/b_n^T)$.
We are now in a position to prove Lemma 3.3.
Proof of (24). Let C k be as in (22). Then log C k = log C k − 2 log D k − log 2. Plug these values and the expression of a n in terms of a T n and b T n obtained in (21) into (47) and note that log(a n /b n ) = log(a T n /b T n ) + o(1).