Nonparametric confidence intervals for conditional quantiles with large-dimensional covariates

Abstract: The first part of the paper is dedicated to the construction of a γ-nonparametric confidence interval for a conditional quantile with a level depending on the sample size. When this level tends to 0 or 1 as the sample size increases, the conditional quantile is said to be extreme and is located in the tail of the conditional distribution. The proposed confidence interval is constructed by approximating the distribution of the order statistics selected with a nearest neighbor approach by a Beta distribution. We show that its coverage probability converges to the preselected probability γ and its accuracy is illustrated on a simulation study. When the dimension of the covariate increases, the coverage probability of the confidence interval can be very different from γ. This is a well known consequence of the data sparsity, especially in the tail of the distribution. In a second part, a dimension reduction procedure is proposed in order to select more appropriate nearest neighbors in the right tail of the distribution and, in turn, to obtain a better coverage probability for extreme conditional quantiles. This procedure is based on the Tail Conditional Independence assumption introduced in Gardes (Extremes, 18(3), pp. 57–95, 2018).


Introduction
In a large range of applications, it is necessary to examine the effects of an observable R p -valued random covariate X on the distribution of a dependent R-valued variable Y . For instance, Y can model the level of ozone in the air and X the vector gathering the concentrations of other pollutants and weather conditions (e.g., Han et al. [17]). The relation between X and Y is commonly studied through the conditional expectation E(Y | X). An alternative is to analyze conditional quantiles of Y given X. Recall that for all x ∈ X ⊂ R p , X being the support of X, and for α ∈ (0, 1), the (1 − α)-conditional quantile of Y given that X = x is Q(α | x) = inf{y; S(y | x) ≤ α}, where S(· | x) := P(Y > · | X = x) is the conditional survival function of Y given X = x. Starting from n independent copies (X 1 , Y 1 ), . . . , (X n , Y n ) of the random vector (X, Y ), conditional quantile estimation has been investigated by several authors; see, for example, Stute [27] and Yang [30] for the case of a fixed level α, and Gardes [9] and Gardes et al. [14] for an extreme level α = α n → 0. Instead of a point estimation of Q(α | x 0 ), where x 0 is a given point in X , we are interested here in the construction of a confidence interval for conditional quantiles. More specifically, our goal is to find a random interval [A n,γ (x 0 ), B n,γ (x 0 )] such that P[A n,γ (x 0 ) ≤ Q(α | x 0 ) ≤ B n,γ (x 0 )] converges to γ as n → ∞, where γ ∈ (0, 1) is a preselected probability. Usually, we take γ = 0.9 or 0.95. To allow inference on the right and left tails of the conditional distribution, we also consider the case where α = α n depends on the sample size n and tends to 0 or 1 as the sample size increases. In the application to ozone concentration, this is of high interest since large ozone levels in the air may cause serious effects on public health and on the environment.
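As a concrete illustration of the definitions above, Q(α | x) can be approximated empirically from the responses of the nearest neighbors of x. The sketch below is ours (the function name, toy sample model and Euclidean-distance choice are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def knn_conditional_quantile(X, Y, x0, alpha, k):
    """Empirical version of Q(alpha | x0) = inf{y; S(y | x0) <= alpha}:
    the (1 - alpha)-quantile of the responses of the k nearest neighbors."""
    dist = np.linalg.norm(X - x0, axis=1)   # Euclidean distances to x0
    idx = np.argsort(dist)[:k]              # indices of the k nearest neighbors
    return np.quantile(Y[idx], 1.0 - alpha)

# Toy sample: Y is exponential with a scale depending on the covariate.
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(1000, 2))
Y = rng.exponential(scale=1.0 + np.abs(X[:, 0]))
q_med = knn_conditional_quantile(X, Y, np.array([1.0, 1.0]), alpha=0.5, k=100)
```

For α = 1/2 this estimates the conditional median; letting α depend on n moves the target into the tail of the conditional distribution.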
The literature on the construction of confidence intervals for conditional quantiles is, to the best of our knowledge, only dedicated to the case where α is a fixed value in (0, 1). Several approaches have been considered.
The first one, called the direct approach, is discussed for instance in Fan and Liu [7]. Let q(· | x 0 ) be the first derivative of Q(· | x 0 ) and let 0 < α 1 < α 2 < 1. The construction of the confidence interval is based on the existence of a random process Q n (· | x 0 ) indexed by α ∈ [α 1 , α 2 ] for which c n ( Q n (· | x 0 ) − Q(· | x 0 )) converges to a centered Gaussian process with variance q 2 (· | x 0 )σ 2 (· | x 0 ) for some positive sequence (c n ). It can be shown that the coverage probability of the interval [ Q n (α | x 0 ) − u (1+γ)/2 q(α | x 0 )σ(α | x 0 )/c n , Q n (α | x 0 ) + u (1+γ)/2 q(α | x 0 )σ(α | x 0 )/c n ] converges to γ for any α ∈ [α 1 , α 2 ], where u β is the β-quantile of a standard normal distribution. The main drawback of the direct approach is that in most cases the sequence c n depends on unknown quantities, such as the probability density function of X, that have to be estimated.
To avoid the estimation of c n , resampling methods have been considered by Parzen et al. [23] and Kocherginsky et al. [20]. Unfortunately, these methods are often time consuming.
A third approach relies on order statistics: in the unconditional case, a confidence interval for a quantile is given by two order statistics of the sample, whose ranks are chosen so that the distribution-free coverage probability is close to γ. This method of construction has been recently adapted by Goldman and Kaplan [16] to the conditional case, but always for a fixed quantile level α. The first contribution of this paper is to adapt the order statistics method to the conditional case by using a nearest neighbors approach. Instead of using the whole sample as in the unconditional case, only the k n closest observations to x 0 are used in the order statistics method. The proposed confidence interval can be used for extreme conditional quantiles, i.e., when the quantile level depends on n and tends to 0 or 1 as the sample size increases. The construction of confidence intervals for extreme conditional quantiles is more challenging because fewer observations are available in the tail.
The nearest neighbors method strongly depends on the (pseudo-)distance in R p used to select the observations around the point of interest x 0 . The Euclidean distance is of course the natural choice, but when p becomes large, some nearest neighbors can be located far away from the point of interest, leading to confidence intervals with bad coverage probabilities. This is the well known curse of dimensionality phenomenon. To overcome this problem, one way is to assume the existence of a function g 0 : R p → R such that the conditional distribution of Y given X is equal to the conditional distribution of Y given g 0 (X). In other words, it is assumed that X and Y are independent conditionally on g 0 (X), in symbols X ⊥⊥ Y | g 0 (X), see Basu and Pereira [1]. The dimension of the covariate is thus reduced since X can be replaced by g 0 (X). In this case, it seems preferable to use the pseudo-distance d 0 defined for all (x, y) ∈ R p × R p by d 0 (x, y) = |g 0 (x) − g 0 (y)| instead of the Euclidean distance in R p . A natural question is then how to find the true function g 0 and therefore the most suitable distance d 0 . One common approach is to assume that g 0 is linear, i.e., that g 0 (x) = b 0 x for all x ∈ R p , where b 0 is a given vector in R p . This corresponds to the single-index model introduced in a regression context, for instance, by Li [21]. This single-index structure has been considered by Zhu et al. [31] for the estimation of conditional quantiles when the level α is fixed. Finding the distance reduces to finding the direction b 0 . Its estimation has received much attention in the literature; see Li [21] for the classical Sliced Inverse Regression (SIR) method, Cook and Weisberg [3], Samarov [25] and Cook and Li [2].
Our second contribution is the proposition of a new data driven procedure to find an appropriate distance to use in the nearest neighbors selection process.
This distance is then used in the nearest neighbors order statistics approach for the construction of confidence intervals for conditional quantiles with extreme levels α = α n → 0. To reach this goal, we start with the dimension reduction assumption introduced in Gardes [10]. Roughly speaking, for some function g 0 : R p → R, we suppose that S(y | x 0 ) is equivalent, as y goes to infinity, to a function depending on x 0 only through g 0 (x 0 ). Hence, inference on extreme conditional quantiles of Y given X can be achieved only by using the information brought by the reduced covariate g 0 (X), and a good way to measure the closeness of the data to x 0 is to use the pseudo-distance defined for (x, y) ∈ R 2p by d 0 (x, y) = |g 0 (x) − g 0 (y)|. This distance is estimated by assuming that g 0 belongs to a parametric family. Note that an estimator of g 0 has already been proposed by Gardes [10] in the particular case of a linear function. Unfortunately, that estimation procedure is computationally expensive.
The paper is organized as follows. The definition of the confidence interval for conditional extreme quantiles is given in Section 2. In particular, we show that the coverage probability of the proposed confidence interval converges to the nominal one. This section corresponds to our first contribution. The second contribution is handled in Section 3 where an estimator of the appropriate distance d 0 is proposed and used for the construction of a confidence interval for extreme conditional quantiles. In each section, the methods are illustrated with simulated data. An application to Chicago air pollution data set is proposed in Section 4. Section 5 concludes. All the proofs are postponed to Section 6.

Definition and main result
Let (X 1 , Y 1 ), . . . , (X n , Y n ) be n independent copies of a random vector (X, Y ). It is assumed throughout the paper that the distribution of (X, Y ) is absolutely continuous with respect to the Lebesgue measure. As mentioned in the introduction, for a given x 0 ∈ X , where X is the support of X, our first contribution is to propose a confidence interval for the conditional quantile Q(α | x 0 ). In this paper, we assume that the quantile level α = α n depends on the sample size n; more specifically, α n is assumed to converge to some constant c ∈ [0, 1], see condition (2). Condition (2) with c ∈ (0, 1) corresponds to a classical conditional quantile level. This is the situation most frequently encountered in the literature. For instance, if α n = 1/2, the value Q(α n | x 0 ) is the conditional median of Y given X = x 0 . When c ∈ {0, 1} in (2), the level is said to be extreme. If c = 0 (resp. c = 1), the conditional quantile is located in the right tail (resp. left tail) of the conditional distribution of Y given X = x 0 .
The basic idea to construct a random interval [A n,γ (x 0 ), B n,γ (x 0 )] satisfying (1) is to apply the order statistics method to observations close enough to x 0 . In the unconditional case, the order statistics method to construct a confidence interval is described in the introduction. To select the observations, a nearest neighbors approach is considered. More specifically, for some (pseudo-)metric d on R p , the observations are reordered by increasing distance d(X i , x 0 ) to the point of interest. For a preselected probability γ ∈ (0, 1), we propose as a confidence interval for Q(α n | x 0 ) the random interval CI γ,αn (k n , d, x 0 ) given in (3), whose endpoints are the order statistics of ranks L γ (k n , α n ) and R γ (k n , α n ) among the responses associated with the k n nearest neighbors of x 0 . The confidence interval CI γ,αn (k n , d, x 0 ) is thus defined as in the unconditional case, except that only the k n nearest neighbors responses are used instead of the whole sample. It remains to prove that the coverage probability of this interval tends to γ as the sample size increases. The accuracy of the confidence interval CI γ,αn (k n , d, x 0 ) depends on the smoothness of the function x → S[Q(α | x 0 ) | x]. For α ∈ (0, 1) and ζ > 0, we introduce the quantity ω(α, ζ) measuring how much the ratio S[Q(α | x 0 ) | x]/α can deviate from 1 when x belongs to the ball of center x 0 and radius ζ. Note that this quantity is classically considered when dealing with conditional distributions, see for instance Daouia et al. [4]. In the following result, the conditions required for the convergence of the coverage probability of (3) to γ are established.
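In the unconditional setting invoked here, the order-statistics construction can be sketched as follows: the probability that the j-th order statistic lies below the target quantile is a Beta distribution function, and two ranks are chosen so that the coverage is close to γ. The rank-selection rule below is a plausible convention of ours and need not coincide with the paper's exact definitions of L γ (k n , α n ) and R γ (k n , α n ):

```python
import numpy as np
from scipy.stats import beta

def order_stat_ci(sample, alpha, gamma):
    """Distribution-free CI for the (1 - alpha)-quantile bounded by two
    order statistics.  Uses P(Y_(j) <= q_{1-alpha}) = I_{1-alpha}(j, k-j+1),
    the CDF of a Beta(j, k-j+1) distribution evaluated at 1 - alpha."""
    k = len(sample)
    y = np.sort(sample)
    j = np.arange(1, k + 1)
    cov = beta.cdf(1.0 - alpha, j, k - j + 1)      # decreasing in j
    lower = j[cov >= (1.0 + gamma) / 2.0]          # candidate lower ranks
    upper = j[cov <= (1.0 - gamma) / 2.0]          # candidate upper ranks
    L = lower.max() if lower.size else 1
    R = upper.min() if upper.size else k
    # P(Y_(L) <= q <= Y_(R)) ~ (1+gamma)/2 - (1-gamma)/2 = gamma
    return y[L - 1], y[R - 1]
```

In the conditional version, `sample` would be replaced by the k n nearest-neighbor responses selected around x 0.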

Theorem 1.
Let γ ∈ (0, 1) and x 0 ∈ X . Assume that k n → ∞ and let h n be such that P(d(X, x 0 ) ≤ h n ) is of order k n /n. For a sequence of levels α n ∈ (0, 1) satisfying (2), if S(· | x 0 ) is continuous and strictly decreasing, and if conditions (4) and (5) hold, then the coverage probability of CI γ,αn (k n , d, x 0 ) converges to γ. Note that, under the conditions of Theorem 1, other confidence intervals with asymptotic coverage probability γ can be proposed, as for instance the one-sided confidence interval (−∞, Y (d,x0) R2γ−1(kn,αn),kn ].
The proof of Theorem 1 is based on a decomposition of P[Y (d,x0) Lγ (kn,αn),kn ≤ Q(α n | x 0 )] and on a similar one for P[Y (d,x0) Rγ (kn,αn),kn ≤ Q(α n | x 0 )]. This decomposition highlights two error terms, B 1,n and B 2,n . Condition (5) ensures that B 1,n (L γ (k n , α n )) converges to 0. Note that this condition entails that k n should not be chosen too large. In the unconditional case, i.e., if X and Y are independent, then η n = B 1,n (j) = 0 for all j and one can take k n = n. Remark also that in the unconditional case, the accuracy of the confidence interval does not depend on the underlying distribution.
The second error term is related to the behavior of the distribution function of a Beta distribution. In Lemma 2, it is established that B 2,n = O(δ n ) and thus B 2,n → 0 under condition (4). If c = 0, the rate of convergence of α n to 0 is limited by (4): roughly, α n must remain of larger order than ln 2 (k n )/k n . Similarly, when c = 1, one can construct an asymptotic confidence interval only if 1 − α n is of larger order than ln 2 (k n )/k n . Note that condition (5) is more restrictive when α n → 1 than when α n → c ∈ [0, 1). As shown in the simulation studies, the construction of confidence intervals in the left tail can thus be more difficult than in the right tail. It also appears that, as expected, the rate of convergence of the coverage probability can be very slow for extreme conditional quantiles.
In the next result, a sequence h n such that P(d(X, x 0 ) ≤ h n ) has the order required in Theorem 1 is exhibited.

Proposition 1.
Assume that the distribution of X admits f X as a probability density function. If k n /(ln ln n) → ∞ and n/k n → ∞ then, for n large enough, one can take h n of order (k n /n) 1/p , up to a constant depending on f X (x 0 ). It thus appears that for a given value of k n , the radius h n increases with the dimension p. As a consequence, when p becomes large, some of the k n nearest neighbors can be located far away from the point of interest and the confidence interval can perform very badly. This phenomenon is well known as the "curse of dimensionality". In Section 3, a procedure to overcome this difficulty is proposed.
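The growth of the neighborhood radius with p can be checked by simulation. A toy Monte-Carlo sketch, under our own convention of a uniform design on the unit cube (not one of the paper's models):

```python
import numpy as np

def median_knn_radius(n, k, p, n_rep=30, seed=0):
    """Median distance from the centre of [0,1]^p to its k-th nearest
    neighbor among n uniform points, over n_rep Monte-Carlo replications."""
    rng = np.random.default_rng(seed)
    x0 = np.full(p, 0.5)
    radii = []
    for _ in range(n_rep):
        X = rng.uniform(0.0, 1.0, size=(n, p))
        dist = np.linalg.norm(X - x0, axis=1)
        radii.append(np.sort(dist)[k - 1])   # distance to the k-th neighbor
    return float(np.median(radii))

# For fixed k and n, the radius needed to capture k neighbors grows with p.
r_low = median_knn_radius(n=1000, k=50, p=1)
r_high = median_knn_radius(n=1000, k=50, p=8)
```

Even for moderate p, the k-th neighbor of x 0 is no longer "local", which is exactly the difficulty addressed in Section 3.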

Illustration on simulated data
Let us take a look at the finite sample performance of the confidence interval introduced in the previous section. Using the observations of a sample {(X 1 , Y 1 ), . . . , (X n , Y n )} drawn from a random pair (X, Y ), our objective is to construct a γ-confidence interval for the conditional quantile Q(α n | x 0 ). In the estimation procedure, the nearest neighbors are selected with the classical Euclidean distance d e . Two models for the distribution of (X, Y ) are considered. In Model 1, the conditional survival function of Y given X = x involves two positive functions c and τ defined for all x ∈ R p by c(x) = x 1 and τ (x) = c(x)ξ(g 0 (x)), where the function ξ : R → (0, ∞) is defined by ξ(z) := 5z 2 /36 + 1/4. Note that Model 1 is not well defined when X = 0 since in this case S(y | 0) = 1 for all y.
In this model, the conditional distribution of Y given that X = x is a Burr distribution. Such a distribution is said to be heavy-tailed since, for all t > 0 and x ∈ X , S(ty | x)/S(y | x) → t −1/ξ(g0(x)) as y → ∞. The function ξ ◦ g 0 is referred to as the conditional extreme value index. It controls the right tail heaviness of the conditional distribution. This model is investigated with different values for the dimension p of the covariate and different functions g 0 : R p → R. More specifically, 4 settings, denoted (S 1 ) to (S 4 ), are considered.
− Model 2: The p components of the random vector X are independent and uniformly distributed on [−5, 5]. The conditional survival function of Y given X = x is of Weibull type. The conditional distribution of Y given that X = x in Model 2 is thus a conditional Weibull type distribution, see for instance Gardes and Girard [12] or Gardes et al. [13], and ξ(g 0 (x)) is referred to as the conditional Weibull-tail index. As the conditional extreme value index does, ξ(g 0 (x)) controls the tail heaviness of the conditional distribution. For p and g 0 , we consider the 4 settings (S 1 ) to (S 4 ).
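The heavy-tail characterization above can be made concrete for an exact Pareto-type survival function S(u) = u^(-1/ξ): the ratio S(ty)/S(y) equals t^(-1/ξ) for every threshold y. A minimal check (the function below is illustrative, not one of the paper's models):

```python
def tail_ratio(xi, y, t):
    """Ratio S(t*y)/S(y) for the Pareto survival function S(u) = u**(-1/xi);
    it equals t**(-1/xi) regardless of y: the heavy-tail property."""
    S = lambda u: u ** (-1.0 / xi)
    return S(t * y) / S(y)
```

With extreme value index ξ = 1/2 the ratio at t = 2 is 2^(-2) = 0.25, whatever the threshold y.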
To evaluate the performance of the confidence interval, we compute its coverage probability P[Q(α n | x 0 ) ∈ CI γ,αn (k n , d e , x 0 )]. This probability is approximated numerically by a Monte-Carlo procedure. More specifically, N = 2 000 independent samples of size n were generated. For given values of k n ∈ {1, . . . , n} and γ ∈ (0, 1), the confidence interval obtained with the r-th replication is denoted CI (r) γ,αn (k n , d e , x 0 ). The coverage probability is then approximated by the proportion of replications r ∈ {1, . . . , N } for which Q(α n | x 0 ) belongs to CI (r) γ,αn (k n , d e , x 0 ). This value is expected to be close to the preselected probability γ.
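The Monte-Carlo approximation of the coverage probability can be sketched generically; the toy interval below (two order statistics around the median of a uniform sample) has a known coverage of about 0.96 and merely illustrates the procedure, it is not the paper's confidence interval:

```python
import numpy as np

def coverage_probability(make_ci, true_q, n_rep=2000, seed=0):
    """Proportion of replications whose interval [a, b] contains true_q."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_rep):
        a, b = make_ci(rng)
        hits += (a <= true_q <= b)
    return hits / n_rep

def toy_ci(rng):
    # Interval [Y_(40), Y_(61)] for the median of 100 uniforms;
    # exact coverage P(40 <= Bin(100, 1/2) <= 60) ~ 0.965.
    y = np.sort(rng.uniform(0.0, 1.0, 100))
    return y[39], y[60]

p_hat = coverage_probability(toy_ci, true_q=0.5)
```

The Monte-Carlo error of such an estimate is of order sqrt(γ(1 − γ)/N), i.e., below 0.01 for N = 2 000.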
Selection of the number of nearest neighbors − We first take a look at the influence of the number of nearest neighbors k n . In Figure 1, for a sample of size n = 1000, the values of the coverage probabilities are represented as a function of k n ∈ {10, . . . , 200} for Model 1 with the settings (S 1 ), (S 2 ) and (S 3 ) for g 0 and p. Three different values for the conditional quantile level are considered: α 1,n = 1 − 8 ln(n)/n ≈ 0.9447, α = 1/2 and α = 1 − α 1,n ≈ 0.0553. The point of interest x 0 is the vector with all its components equal to 1. It appears that when the quantile level is close to 0 or 1, only few values of k n provide a reasonable coverage probability. It is thus relevant to propose a data driven procedure to select the value of k n . The selected number of nearest neighbors depends on the conditional quantile level α n , the point of interest x 0 ∈ R p , the nominal coverage probability γ and the distance d used to collect the nearest neighbors. First, let C(k n ) be the random variable corresponding to the center of the confidence interval CI γ,αn (k n , d, x 0 ). The basic idea is to select a number of nearest neighbors lying in a stability region of the finite sequence {C(n 0 ), . . . , C(n 1 )}, where 1 ≤ n 0 < n 1 ≤ n. More precisely, we search for the value k̃ (sel) n minimizing the variance Var(C(i)) over i ∈ {n 0 , . . . , n 1 }.
Of course, the variance of C(i), and consequently the number k̃ (sel) n , is unknown in practice. We propose the following method to obtain an estimator of k̃ (sel) n . Let a ∈ (0, 1) and denote by ⌊·⌋ the floor function. For i ∈ {n 0 , . . . , n 1 }, the variance of C(i) is estimated by the local estimator Var n (i), computed over V(i) ⊂ {n 0 , . . . , n 1 }, the set of the ⌊na⌋ nearest neighbors of i. Finally, for a given η ≥ 0, we propose to take the number of nearest neighbors k (sel) n defined in (6), with the convention min{∅} = n 0 . Note that when η = 0, k (sel) n is the argument of the minimum of the sequence { Var n (n 0 ), . . . , Var n (n 1 )}. The role of η is to obtain a value of k (sel) n less sensitive to the fluctuations of the sequence { Var n (n 0 ), . . . , Var n (n 1 )}. To sum up, the setting parameters required to compute (6) are the integers n 0 and n 1 delimiting the possible values for k n , the value of a used to compute the local variance estimator and the value of η. Throughout the simulation study, these parameters are fixed to n 0 = ⌊0.05n/p⌋, n 1 = 200, a = 0.006 and η to the first quartile of the sequence { Var n (n 0 ), . . . , Var n (n 1 )}.
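The stability-region selection of k n described above can be sketched as follows. The windowing convention (a symmetric window over the candidate grid) and the use of a fixed η are simplifying assumptions of ours:

```python
import numpy as np

def select_kn(centers, n0, a=0.05, eta=0.0):
    """Given centers[i] = C(n0 + i), the centre of the interval built with
    n0 + i neighbors, return the first candidate whose local variance is
    within eta of the minimum (a stability-region heuristic)."""
    m = len(centers)
    w = max(1, int(m * a))                     # half-width of the local window
    local_var = np.array([np.var(centers[max(0, i - w): i + w + 1])
                          for i in range(m)])
    # First index whose local variance is within eta of the minimum.
    ok = np.where(local_var <= local_var.min() + eta)[0]
    return n0 + int(ok[0])
```

With η = 0 this reduces to the argmin of the local variances, as in the text; a positive η makes the choice less sensitive to fluctuations.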
In Figure 1, one can check that for the conditional median (α n = 1/2), the coverage probability obtained with the selected value of k n is close to the best attainable coverage probability. The choice of k n is much more difficult for extreme conditional quantiles with level close to 0 or 1. Note that for the settings (S 1 ) and (S 2 ), the coverage probabilities are clearly better in the right tail than in the left one. This fact can be partially explained by condition (5) in Theorem 1, since the factor k n α n /(1 − α n ) approaches 0 faster when α n goes to 0 than when α n goes to 1. However, for p and g 0 as in (S 3 ), the coverage probabilities are better in the left tail. Indeed, in this situation, the quantity ω(α n , h n ) is very close to 0 when α n is close to 1, counteracting the bad effect of the factor k n α n /(1 − α n ).
Influence of the sample size − To illustrate the influence of the sample size on the confidence intervals, we generate samples from Model 1 with different sample sizes n ∈ {100, 200, . . . , 2000}. The number of nearest neighbors is given by (6). In Figure 2, the values of the coverage probabilities are represented as a function of n. Three conditional quantile levels are considered: α n = 1 − [n −3/10 ln(n)] 3 /14 (left tail), α n = 1/2 (conditional median) and α n = [n −3/10 ln(n)] 3 /14 (right tail). The point of interest is the vector of ones. Concerning the choice of p and g 0 in Model 1, the settings (S 1 ) and (S 3 ) are investigated.
When p = 1, the coverage probability for the conditional median (α n = 1/2) is correct for any value of the sample size between 100 and 2000. For a conditional quantile in the right tail, i.e., for a level close to 0, the coverage probability converges to the preselected value γ. This is no longer the case when the level is close to 1. This phenomenon can be explained by the difficulty to choose a correct number k n of nearest neighbors, see Figure 1 and the corresponding discussion.
When p = 4 and α n = [n −3/10 ln(n)] 3 /14 → 0, the coverage probability does not converge to γ. This is not surprising in view of the data sparsity in the right tail of the distribution. For the conditional median, the coverage probability seems to get worse as n increases. This can perhaps be explained by a bad choice of k n , see Figure 1. Finally, a better behavior is observed for a conditional quantile level close to 1 and a large value of n.
Influence of the point of interest x 0 − We generate a sample of size n = 1000 from Model 1 and we construct confidence intervals for the conditional quantile Q(α | x 0 ) at the points x 0 = t u p for t ∈ [−5, 5] and for α ∈ {1 − β; 1/2; β}, with β = [n −3/10 ln(n)] 3 /14 ≈ 0.047. For p and g 0 , we consider the two settings (S 1 ) and (S 3 ). In Figure 3, the values of the coverage probabilities are represented as a function of x 0 .
It appears that the coverage probability deteriorates when x 0 gets closer to the boundary of the support, i.e., when t gets closer to −5 or 5. This boundary effect is a classical source of bias for local estimators such as the kernel density estimator. The coverage probability is also poor when t is close to 0. Indeed, in this case, x 0 is close to 0 and, as mentioned before, Model 1 is not defined when the covariate X is equal to 0.
Influence of the covariate dimension − Our goal here is to assess the finite sample performance of the confidence interval for different values of the covariate dimension. The point of interest x 0 is the vector with all its components equal to 1 and the sample size is n = 1000. Three different levels for the conditional quantile Q(α n | x 0 ) are considered: α 1 = 1 − 8 ln(n)/n ≈ 0.945 (left tail), α 2 = 1/2 (conditional median) and α 3 = 8 ln(n)/n ≈ 0.055 (right tail). The values of the coverage probabilities are gathered in Table 1 for Model 1 and Table 2 for Model 2.
For the conditional median, the coverage probability is quite close to γ and the accuracy of the confidence interval is not affected by the dimension p of the covariate. For a right tail extreme conditional quantile, the coverage probability is close to the nominal one when p = 1, but the precision of the confidence interval deteriorates strongly when p increases. As discussed before, this is an expected consequence of the data sparsity around x 0 when p increases. Finally, for a left tail extreme conditional quantile, the accuracy mostly depends on the function g 0 . As mentioned before, a bad performance in the left tail can be explained by the factor k n α n /(1 − α n ) in condition (5). However, for some functions g 0 , the quantity ω(α n , h n ) is very close to 0, leading to good coverage probabilities.

Selection of the nearest neighbors for large-dimensional covariates
Without any further assumption, the classical Euclidean distance is the natural distance to use in order to select the nearest neighbors. Unfortunately, due to the data sparsity when p is large, this distance selects observations that can be located far away from the point of interest x 0 . The obtained confidence intervals can then perform very badly, in particular for conditional quantiles in the tail of the distribution. This phenomenon has been illustrated in the previous section. We propose below a data driven procedure to choose a more convenient distance for the selection of the nearest neighbors located in the right tail of the distribution. Our data driven procedure is based on a tail dimension reduction model presented in the next section. The method described below is devoted to the right tail, but it can be easily adapted to the left tail.

Dimension reduction model
In the literature dedicated to dimension reduction, it is commonly assumed that there exists a function g 0 : R p → R such that X ⊥⊥ Y | g 0 (X) or, equivalently, such that the conditional distribution of Y given X is equal to the conditional distribution of Y given g 0 (X). The dimension of the covariate is thus reduced since X can be replaced by g 0 (X) without loss of information. In this case, to select the nearest neighbors, it seems preferable to use the pseudo-distance d 0 defined for all (x, y) ∈ R p × R p by d 0 (x, y) := |g 0 (x) − g 0 (y)| instead of the Euclidean distance in R p . Recall that our goal is to select nearest neighbors located in the right tail of the conditional distribution. The classical condition X ⊥⊥ Y | g 0 (X) is thus relaxed by assuming that Y is tail conditionally independent of X given g 0 (X), see Gardes [10]. More specifically, we assume that (TCI) the right endpoint of the conditional distribution of Y given X = x is infinite for all x ∈ X and that there exists a function ϕ y : R → R such that, as y → ∞, P[Y > y | X]/ϕ y (g 0 (X)) −→ 1,
where −→ stands for almost sure uniform convergence, see for instance Lukács [22] or Rambaud [24, Proposition 1]. Roughly speaking, under (TCI), inference on extreme conditional quantiles of Y given X can be achieved only by using the information brought by the reduced covariate g 0 (X). The appropriate distance to select the nearest neighbors is thus the distance d 0 .
Note that if there exist φ : R → R, φ ≠ Id, and g̃ 0 : R p → R such that g 0 = φ ◦ g̃ 0 , then if g 0 satisfies (TCI) the same holds for the function g̃ 0 . To ensure that g 0 is the only function satisfying (TCI), we must assume that g 0 ∈ G, where G is a set of functions satisfying the identifiability property (P) below. Let u p = (1, . . . , 1) ∈ R p . A classical set satisfying (P) is the set G L of linear functions x → b x given in (7), with Θ p := {b ∈ R p with b b = 1 and b u p > 0}. Note that this set is the one considered in Gardes [10]. One can also consider sets of non-linear functions (see Section 3.3 for an example). The function g 0 satisfying (TCI) is unknown and has to be estimated. This is done in the next section. Note that in Gardes [10], an estimator has been proposed, but the procedure is computationally expensive and can be used only for a linear function g 0 ∈ G L .

Estimation of g 0
To explain our estimation procedure, let us first assume that the function ϕ y involved in (TCI) is such that, for all y ∈ R, arg max z ϕ y (z) = z * , where z * does not depend on y; this is condition (8). Since, under (TCI), P[Y > y | X] ≈ ϕ y (g 0 (X)) for y large enough, condition (8) entails that the largest observations of Y are more likely to be observed when g 0 (X) is close to z * . In other words, given that Y is large, the dispersion of g 0 (X) around z * must be small. One way to quantify such a dispersion is to consider a Gini-type dispersion measure, given for a large threshold y by the expected absolute spread of g 0 (X) among the observations exceeding y, where (X , Y ) is an independent copy of (X, Y ). This measure is estimated by replacing the expectation by its empirical counterpart and by taking for the threshold y the order statistic Y n−⌊nβn⌋,n , where (β n ) is a sequence tending to 0 as the sample size increases. An estimator of g 0 is then obtained by minimizing the resulting empirical dispersion over g ∈ G, where X (i) denotes the concomitant of the order statistic Y n−i,n . This estimation procedure is only reliable if (8) holds. This quite restrictive condition can be weakened if, for each g ∈ G, we assume the existence of H ∈ N \ {0} non-overlapping intervals S 1,g , . . . , S H,g covering the support of g(X) and such that, for all h ∈ {1, . . . , H}, the maximizer of ϕ y over S h,g does not depend on y; this is condition (10). Hence, as explained above, for each h ∈ {1, . . . , H}, the Gini-type dispersion measure restricted to S h,g is expected to be small for a large threshold y. Let n h,g be the number of exceedances for which g(X) falls in S h,g and, if n h,g ≠ 0, let Ê h,g be the estimator of the Gini-type measure E h,g obtained by replacing the expectation by its empirical counterpart and y by the order statistic Y n−⌊nβn⌋,n . An estimator ĝ n,0 of g 0 can then be defined as the solution of the penalized minimization problem (12), for some penalty coefficient λ > 0. In practice, ĝ n,0 is computed by taking the non-overlapping intervals S h,g := [ξ̂ h−1,g , ξ̂ h,g ], where ξ̂ h,g is the (h/H)-sample quantile of {g(X i ), i = 1, . . . , n}. When H is large enough, one can reasonably assume that (10) is satisfied.
The setting parameters of our estimation procedure are the sequence β n , the number H of intervals and the penalty coefficient λ. In the simulation study below, these parameters are set to β n = 5/(3 √ n), λ = 1 and H = 20. A more theoretical justification of (12) is provided in Appendix A.
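The Gini-type criterion can be sketched numerically. The slice construction below follows the text (quantile slices, empirical threshold Y n−⌊nβn⌋,n ); the equal weighting of slices and the skipping of slices with fewer than two exceedances are conventions of ours:

```python
import numpy as np

def gini_mean_difference(z):
    """Empirical Gini mean difference: average of |z_i - z_j| over all pairs."""
    z = np.asarray(z, dtype=float)
    return float(np.abs(z[:, None] - z[None, :]).mean())

def tail_dispersion(g_vals, Y, beta_n, H=20):
    """Average dispersion of g(X) within H quantile slices, restricted to
    the exceedances above the order statistic Y_{n - floor(n*beta_n), n}."""
    n = len(Y)
    y_thr = np.sort(Y)[n - int(n * beta_n) - 1]            # tail threshold
    edges = np.quantile(g_vals, np.linspace(0.0, 1.0, H + 1))
    total, used = 0.0, 0
    for h in range(H):
        in_slice = ((g_vals >= edges[h]) & (g_vals <= edges[h + 1])
                    & (Y > y_thr))
        if in_slice.sum() > 1:                  # need at least a pair
            total += gini_mean_difference(g_vals[in_slice])
            used += 1
    return total / max(used, 1)
```

A candidate g with small tail dispersion concentrates the largest responses on narrow regions of g(X), which is what the minimization (12) rewards.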

Illustration on simulated data
Let (X 1 , Y 1 ), . . . , (X n , Y n ) be n independent copies of a random vector (X, Y ) where X is an R p -valued random variable with p > 1 and Y is an R-valued random variable. The goal of this section is to assess the finite sample performance of the confidence interval for the conditional quantiles of Y given X = x 0 when the dimension p is large. Assuming that condition (TCI) holds for some function g 0 , we propose the following two-step procedure for the construction of the confidence interval. i) Estimate the function g 0 by the estimator ĝ n,0 given in (12); ii) Select the nearest neighbors with the estimated distance defined for (x, y) ∈ R 2p by d̂ 0 (x, y) = |ĝ n,0 (x) − ĝ n,0 (y)| and construct the confidence interval (3).
We recall that the estimator of g 0 is only adapted to the right tail of the conditional distribution. We thus focus on extreme conditional quantiles with α close to 0. Throughout this simulation study, the point of interest x 0 is the vector of ones and the following three models are considered for the distribution of the random vector (X, Y ). − Model 1: as defined in Section 2.2, with the same choices for p and for the function g 0 as before. It can be shown that Model 1 satisfies condition (TCI) with ϕ y (z) = y −1/ξ(z) .
− Model 2: as defined in Section 2.2, with the same choices for p and for the function g 0 as before. Of course, condition (TCI) also holds for Model 2, with ϕ y (z) = exp(−y 1/ξ(z) ).

− Model 3:
The p components of the random vector X are independent and distributed as a normal random variable with mean 1/2 and standard deviation 1/3. The conditional quantile of Y given X = x is given in closed form for α ∈ (0, 1). For all x ∈ X and t > 0, S(ty | x)/S(y | x) → t −1/ξ1(g0(x)) as y → ∞, and thus the conditional distribution in Model 3 is heavy tailed with conditional extreme value index ξ 1 (g 0 (x)). Again, one can show that for this model, condition (TCI) is satisfied with ϕ y (z) = y −1/ξ1(z) .
i) Estimation of g 0 − Let us first take a look at the finite sample performance of the estimator of the function g 0 defined in (12). The optimization problem is solved by using a coordinate search method, see Hooke and Jeeves [18] and Appendix B for more details. For the settings (S 1 ) to (S 4 ), the function g 0 is linear, i.e., g 0 (x) = b 0 x for x ∈ R p . In this case, the minimization (12) is achieved over the set G = G L given in (7). For the setting (S 5 ), g 0 (x) = |b 0 x 2 | 1/2 and the minimization is achieved over the corresponding set of non-linear functions. In all cases, the function g 0 only depends on a vector b 0 ∈ R p and the minimization (12) is ultimately achieved over the set Θ p . We denote by b̂ 0 the obtained estimator of b 0 . The distance d 0 is then estimated by d̂ 0 (x, y) = |ĝ 0 (x) − ĝ 0 (y)| where, for the settings (S 1 ) to (S 4 ), ĝ 0 (x) = b̂ 0 x and, for the setting (S 5 ), ĝ 0 (x) = |b̂ 0 x 2 | 1/2 . Our estimator of b 0 is compared to the one obtained with another dimension reduction approach: the Sliced Inverse Regression (SIR) method, see Li [21]. The assumption behind SIR is the existence of a direction b SIR ∈ R p such that the projection b SIR X captures all the information on Y . In other words, the conditional distribution of Y given X is supposed to be the same as the conditional distribution of Y given b SIR X. The estimator b̂ SIR of b SIR is the eigenvector associated to the largest eigenvalue of the matrix Σ̂ −1 Γ̂, where Σ̂ is the sample covariance matrix of X and Γ̂ is the sample version of Cov(E(X | Y )). The SIR method is implemented in R, see https://cran.r-project.org/package=dr. Roughly speaking, b̂ SIR X is the linear combination providing the best available information on Y . A natural idea is then to select the nearest neighbors with the data driven pseudo-distance d̂ SIR (x, y) = |b̂ SIR x − b̂ SIR y|. To measure the performance of b̂ 0 and b̂ SIR as estimators of b 0 , we use the proximity criterion δ(b̂, b 0 ), where b̂ is either b̂ 0 or b̂ SIR .
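For comparison, the SIR direction described above can be sketched directly: slice the data by the response, form the slice means of the centered covariate, and take the leading eigenvector of Σ̂^{-1}Γ̂. The function name and the equal-count slicing are our conventions; the dr package in R implements refinements of this:

```python
import numpy as np

def sir_direction(X, Y, n_slices=10):
    """First SIR direction: leading eigenvector of Sigma^{-1} Gamma, where
    Gamma is the weighted covariance of the slice means of X (Li [21])."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / n                        # sample covariance of X
    slices = np.array_split(np.argsort(Y), n_slices)
    means = np.array([Xc[s].mean(axis=0) for s in slices])
    weights = np.array([len(s) / n for s in slices])
    Gamma = (means.T * weights) @ means          # estimates Cov(E(X | Y))
    vals, vecs = np.linalg.eig(np.linalg.solve(Sigma, Gamma))
    b = np.real(vecs[:, np.argmax(np.real(vals))])
    return b / np.linalg.norm(b)
```

For a linear model with a monotone link, this recovers the true direction up to sign; as noted below, it fails when E(Y | X) is a symmetric function of b 0 X.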
We replicate the original sample of size n = 1000 N = 2 000 times in order to compute the empirical mean and standard deviation of this criterion. The results are gathered in Tables 3 to 5. For the 3 models, the estimator b 0 obtained by our approach is more accurate than the SIR estimator b SIR . For Model 1 and Model 3, the linear combination b 0 X captures the information on the tail distribution of Y but not on the whole distribution. This can explain the difficulty for SIR to estimate the vector b 0 . For Model 2, the conditional distribution of Y given X is the same as the one of Y given b 0 X. Despite this, the performance of SIR is very poor. This is due to the fact that, in this case, E(Y | X) is a symmetric function of b 0 X and it is well known, see e.g., Cook and Weisberg [3], that SIR fails to recover the true direction b 0 in this situation. Finally, SIR is clearly not adapted to setting (S 5 ) since g 0 is a nonlinear function in this case.
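The exact form of the criterion δ(b, b 0 ) is elided above. A standard choice in the dimension-reduction literature, used here purely as a hypothetical stand-in, is the sine of the angle between the two directions, which equals the operator-norm distance between the associated one-dimensional orthogonal projectors:

```python
import math

def delta(b, b0):
    """Hypothetical proximity criterion between two directions:
    the sine of the angle between span(b) and span(b0).

    Equals 0 when the directions coincide (up to sign and scale) and
    1 when they are orthogonal. The paper's exact criterion is elided
    in the text above; this is one common choice, not necessarily it.
    """
    dot = sum(u * v for u, v in zip(b, b0))
    nb = math.sqrt(sum(u * u for u in b))
    nb0 = math.sqrt(sum(v * v for v in b0))
    cos2 = (dot / (nb * nb0)) ** 2
    return math.sqrt(max(0.0, 1.0 - cos2))
```

Being invariant to sign and scale, such a criterion compares only the estimated direction, which is all that matters for the pseudo-distance used to select neighbors.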
ii) Behavior of the confidence interval − Our goal is to assess the finite sample performance of the confidence interval for Q(α n | x 0 ) defined in (3) when the estimated distance d 0 is used to select the nearest neighbors. First, we are interested in the influence of the number k n of nearest neighbors on the coverage probability. For a sample of size n = 1000 generated from Model 1 with settings (S 2 ) to (S 5 ), the coverage probabilities are represented as a function of k n ∈ {10, . . . , 200} in Figure 4. The k n nearest neighbors are selected with 3 different distances: our data-driven distance d 0 , the ideal but unknown distance d 0 and the Euclidean distance. The conditional quantile level is fixed to α = 8 ln(n)/n ≈ 0.055. It appears that the choice of k n is much less crucial when one uses the distances d 0 or d 0 . We can also check again that the selection of k n by the procedure described in Section 2.2 provides confidence intervals with a coverage probability close to γ. Let us now look at the influence of the sample size n. We generate samples from Model 1 with n ∈ {100, . . . , 2000} and under setting (S 3 ). The coverage probabilities for the conditional quantile Q(α n | x 0 ) with α n = [n −3/10 ln(n)] 3 /14 are represented on Figure 5 as a function of n. The estimated distance d 0 and the Euclidean distance are considered for the selection of the nearest neighbors. As expected, when the distance d 0 is used, the coverage probability converges to γ as the sample size increases. This is not the case for the Euclidean distance. Finally, in Tables 6 to 8, we compare the coverage probabilities obtained under the 3 models and the 4 different settings for p and g 0 . The value of the sample size is fixed to n = 1000 and the conditional quantile level to α = 8 ln(n)/n ≈ 0.055. The nearest neighbors are selected with 4 distances: d 0 , d SIR , d 0 and the Euclidean distance d e .
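The coverage probabilities reported above are estimated by a straightforward Monte Carlo loop: repeatedly simulate a sample, build the interval, and count how often the true quantile is covered. The generic harness below assumes user-supplied `simulate` and `build_interval` callables (hypothetical names); it is a sketch of the experimental design, not the paper's actual code.

```python
import random

def coverage_probability(simulate, build_interval, true_quantile,
                         n_rep=500, seed=0):
    """Monte Carlo estimate of P(A_n <= Q(alpha | x0) <= B_n).

    simulate(rng)     -> one simulated sample (any object)
    build_interval(s) -> (lower, upper) confidence bounds from sample s
    true_quantile     -> the target value Q(alpha | x0)
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        lo, hi = build_interval(simulate(rng))
        if lo <= true_quantile <= hi:
            hits += 1
    return hits / n_rep
```

The standard deviation of such an estimate is at most 1/(2√n_rep), so N = 2 000 replications locate a coverage probability to within about ±0.011.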
For settings (S 2 ) to (S 4 ), replacing the Euclidean distance by the estimated distance d 0 leads to a significant improvement in the coverage probability. Note that for setting (S 5 ), the estimation of the nonlinear function g 0 is more challenging, especially in Model 1, but the obtained coverage probability remains better than the one obtained with the Euclidean distance. Of course, the best results are obtained for the unknown distance d 0 , but they are generally close to the ones obtained with the estimated distance d 0 . Finally, the coverage probabilities obtained by using the distance d SIR are far from the preselected probability γ, except for Model 3, setting (S 2 ). This was expected in view of the results presented in paragraph i).

Chicago air pollution data set
The Chicago air pollution data set, available in the R package NMMAPS Data Lite, gathers the daily concentrations of different pollutants (ozone (O 3 ), particulate matter with diameter smaller than 10 microns or 2.5 microns (PM 10 or PM 2.5 ), sulphur dioxide (SO 2 ), nitrogen dioxide (NO 2 ), carbon monoxide (CO), etc.) and some meteorology and mortality variables. The data were collected in Chicago from 1987 to 2000 during n = 4841 days. This data set has been studied by several authors in a dimension reduction context (e.g., Scrucca [26] and Xia [29]) and, in an extreme value context, by Gardes [10]. We are interested in the conditional distribution of Y given X = x 0 where Y corresponds to the centered and normalized concentration of O 3 (in parts per billion) and X is the covariate vector of dimension p = 4 corresponding to the centered and normalized daily maximum concentrations of PM 10 , SO 2 , NO 2 and CO. As in Gardes [10], we assume that condition (TCI) holds with g 0 (x) = b 0 x for x ∈ R 4 . These two vectors are quite different, but both of them show that the covariate NO 2 brings the most important information on large values of the ozone concentration. This point has also been noted by Scrucca [26] and Gardes [10]. We construct the confidence interval for Q(α | x 0 ) given in (3). For the selection of the nearest neighbors, two distances are considered. The quantile level α is taken between 8 ln(n)/n ≈ 0.014 and 64 ln(n)/n ≈ 0.112. The number of nearest neighbors is chosen by the data-driven procedure (6). For instance, for α = 8 ln(n)/n and under situation 2, the number of nearest neighbors is 242. The value 0.014 for the quantile level is thus close to 0 in the sense that 242 × 0.014 ≈ 3.39. Keep in mind that condition (4) in Theorem 1 entails that k n α n → ∞. The confidence intervals with preselected probability γ = 0.9 are represented on Figure 6 as a function of α for the 2 values of x 0 and by using the two distances d 0 and d SIR .
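Selecting the nearest neighbors with the estimated pseudo-distance can be sketched as follows. The form d(x, x 0 ) = |b 0 (x − x 0 )| written here matches the linear reduction g 0 (x) = b 0 x assumed above, but since the displayed definition is elided in the text, it should be read as a plausible stand-in rather than the paper's exact formula.

```python
def knn_pseudo_distance(X, x0, b0_hat, k):
    """Indices of the k nearest neighbors of x0 among the rows of X,
    measured with the hypothetical pseudo-distance
    d(x, x0) = |b0_hat . (x - x0)| on the reduced covariate.

    X      : list of p-dimensional points (lists/tuples of floats)
    x0     : the reference point
    b0_hat : estimated direction defining the reduction
    """
    def d(x):
        return abs(sum(b * (u - v) for b, u, v in zip(b0_hat, x, x0)))
    idx = sorted(range(len(X)), key=lambda i: d(X[i]))
    return idx[:k]
```

Because the pseudo-distance only compares projections on b 0 , neighbors can be far from x 0 in the Euclidean sense while carrying the relevant tail information, which is exactly what mitigates the sparsity problem in large dimension.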
It appears that for α close to 0, the confidence intervals obtained with d 0 and d SIR are different. This difference is more pronounced for situation 2, corresponding to large values of NO 2 and CO. When the distance d 0 is used, the length of the confidence interval increases as the quantile level α decreases. This is an expected behavior of confidence intervals for extreme conditional quantiles. This is no longer the case when the distance d SIR is considered. Since our method is dedicated to the right tail of the distribution and in view of the results presented in the simulation study, the use of d 0 is preferable. Note also that, as pointed out in Gardes [10] and Han et al. [17], very high ozone concentrations are more likely to be observed for large concentrations of NO 2 and CO. Finally, the confidence intervals obtained with the distance d 0 are represented on Figure 7 as a function of b 0 X along with the concentrations Y of ozone. Two quantile levels are considered: α n = 0.02 and α n = 0.05. Be aware that what is represented in Figure 7 are the pointwise confidence intervals and not the confidence bands for the function x → Q(α n | x); see the discussion in the next section.

Concluding remarks
As illustrated in the simulation study presented in Section 2.2, the construction of confidence intervals for extreme conditional quantiles with large-dimensional covariates is a difficult task. The main contribution of this paper is to propose a method to construct confidence intervals in such situations. First, based on the condition (TCI) introduced in Gardes [10], we reduce the dimension of the covariate. This dimension reduction method is dedicated to the right tail of the conditional distribution. Second, a nearest neighbors version of the order statistics approach is used to obtain the confidence intervals. The nearest neighbors are selected with a distance based on the reduced covariate rather than the classical Euclidean distance. The results obtained on simulated data show that the dimension reduction step substantially improves the performance of the confidence intervals when the quantile level is close to 0. This work can be continued in at least two directions.
1) Condition (4) in Theorem 1 entails that k n α n (1 − α n ) → ∞. As a consequence, α n cannot tend to 0 or 1 too fast and in particular, the conditional quantile must be located inside the range of the k n nearest neighbors. In this situation, the endpoints of the confidence interval are order statistics that can be seen as nonparametric estimators of the conditional quantiles Q(α n,L | x 0 ) and Q(α n,R | x 0 ) with α n,L = 1−L γ (k n , α n )/k n and α n,R = 1−R γ (k n , α n )/k n . For an extreme conditional quantile Q(α n | x 0 ) located outside the data range, i.e., when k n α n (1 − α n ) → c ∈ [0, ∞), the endpoints of the confidence interval can no longer be order statistics. In such a case, a possible solution to construct confidence intervals is to assume that the conditional distribution of Y given X = x 0 belongs to a given maximum domain of attraction. The endpoints of the confidence intervals can then be obtained by extrapolating the conditional quantiles Q(α n,L | x 0 ) and Q(α n,R | x 0 ) outside the data range. Extrapolated estimators can be found for instance in Daouia et al. [4]. The main difficulty is to establish the convergence of the coverage probability to γ; this is a work in progress.
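For heavy-tailed conditional distributions, the extrapolation mentioned above is typically of Weissman type: an intermediate quantile estimate is pushed beyond the data range using an estimate of the extreme value index. The sketch below implements the classical Weissman device with the Hill estimator; it is a generic illustration under a heavy-tail assumption, not the specific estimators of Daouia et al. [4].

```python
import math

def weissman_quantile(sorted_desc, k, alpha):
    """Weissman extrapolated (1 - alpha)-quantile from the k largest
    observations sorted_desc[0] >= sorted_desc[1] >= ... (all > 0).

    Uses the Hill estimator of the extreme value index xi and the
    extrapolation Q(alpha) ~= Y_(k+1) * (k / (n * alpha))^xi, valid
    when the underlying distribution is heavy tailed.
    """
    n = len(sorted_desc)
    # Hill estimator: mean log-excess over the threshold Y_(k+1)
    hill = sum(math.log(sorted_desc[i] / sorted_desc[k])
               for i in range(k)) / k
    return sorted_desc[k] * (k / (n * alpha)) ** hill
```

In the conditional setting, the same device would be applied to the k n nearest-neighbor responses; establishing the coverage of the resulting intervals is precisely the open problem described above.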
2) In this paper, we focus on pointwise confidence intervals since x 0 is fixed. It would also be interesting to obtain confidence bands for extreme conditional quantiles. Here the problem is to find a collection {(A n,γ (x), B n,γ (x)), x ∈ X } of random variables satisfying a uniform coverage property. Proving such a convergence result is a difficult mathematical problem. As a departure point, one can try to adapt some elements of the proof of Theorem 1 in Gardes and Stupfler [15] where a uniform consistency result is proven in an extreme value framework.

Acknowledgements
I would like to thank the two anonymous referees and the Associate Editor for their valuable suggestions that led to an improved version of this paper. This research was supported by the French National Research Agency under the grant ANR-19-CE40-0013-01/ExtremReg project.

Preliminary results
In this section, we give two useful results on the Beta distribution. The probability density function of a Beta distribution with parameters a > 0 and b > 0 is given by f a,b (t) = [Γ(a + b)/(Γ(a)Γ(b))] t^{a−1} (1 − t)^{b−1} for t ∈ (0, 1), where Γ is the gamma function.
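The Beta distribution enters through the classical fact that, for k iid continuous observations, the j-th ascending order statistic of the underlying uniforms follows a Beta(j, k + 1 − j) distribution, whose cdf equals a binomial tail probability. The sketch below uses this identity to pick order-statistic ranks whose interval covers a quantile with probability at least γ; it illustrates the general mechanism only, and does not reproduce the paper's specific ranks L γ (k n , α n ) and R γ (k n , α n ).

```python
import math

def order_stat_cdf(j, k, p):
    """P(U_(j) <= p) for the j-th ascending order statistic of k iid
    Uniform(0,1) variables: the Beta(j, k+1-j) cdf at p, computed
    exactly as the binomial tail P(Bin(k, p) >= j)."""
    return sum(math.comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range(j, k + 1))

def quantile_ci_ranks(k, alpha, gamma):
    """Ranks (l, u), 1-indexed, such that the interval [Y_(l), Y_(u)]
    of ascending order statistics covers the (1-alpha)-quantile with
    probability at least gamma (when such ranks exist).

    Coverage = P(U_(l) <= 1-alpha) - P(U_(u) <= 1-alpha)."""
    p = 1.0 - alpha
    target = round(k * p)                 # central rank near k(1-alpha)
    for r in range(1, k):                 # widen symmetrically
        l, u = max(1, target - r), min(k, target + r)
        cov = order_stat_cdf(l, k, p) - order_stat_cdf(u, k, p)
        if cov >= gamma:
            return l, u
    return 1, k
```

The paper's Beta approximation replaces these exact binomial tails by a continuous Beta distribution, which is what makes levels α n depending on n tractable.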
First, taking r = m n leads to √(2π) m n^{m n +1/2} e^{−m n } ≤ m n ! ≤ e × m n^{m n +1/2} e^{−m n }. Next, using Stirling's bounds with r = m n α n,τ yields √(2π) s n ≤ (m n α n,τ )! ≤ e × s n with s n := (m n α n,τ )^{m n α n,τ +1/2} e^{−m n α n,τ }. It is easy to check that for all τ ∈ [−1, 1], m n α n (1 − ε n ) ≤ m n α n,τ ≤ m n α n (1 + ε n ). As a consequence, one has for all τ ∈ [−1, 1] and for n large enough that, Note that the first inequality holds since, by assumption, m n α n → ∞ and ε n → 0. We finally get that for n large enough and all τ ∈ [−1, 1], Finally, Stirling's bounds applied to r = m n − m n α n,τ − 1 lead to √(2π) t n ≤ (m n − m n α n,τ − 1)! ≤ e × t n with t n := (m n − m n α n,τ − 1)^{m n − m n α n,τ − 1/2} e^{−(m n − m n α n,τ − 1)}. Remark that for all τ ∈ [−1, 1], Furthermore, since by assumption one has for n large enough that As a consequence, we get Gathering (13), (14) and (15) yields and the proof is complete by letting c 1 := π/2e −1 and c 2 := 4e.
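For reference, the Stirling bounds invoked in the computation above (applied at r = m n , r = m n α n,τ and r = m n − m n α n,τ − 1) are the classical two-sided inequalities, valid for every integer r ≥ 1:

```latex
\sqrt{2\pi}\, r^{\,r+1/2} e^{-r} \;\le\; r! \;\le\; e \, r^{\,r+1/2} e^{-r},
\qquad r \ge 1,
```

the upper bound holding with equality at r = 1 and the ratio r!/(r^{r+1/2} e^{−r}) decreasing to √(2π) as r → ∞, which is why both constants √(2π) and e appear in the bounds for s n and t n .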
Hence, under (H1) and (H2) and for a sufficiently large number of intervals S h,g 0 , the J local maximum points of ϕ y belong to the interior of an interval S h,g 0 . A supplementary condition on the distribution of (X, Y ) is also required. Let C ⊂ X be such that P(X ∈ C) > 0. For all y and t ∈ (0, 1), let p y (· | C) be the survival function of S(y | X) given X ∈ C, i.e., p y (t | C) = P(S(y | X) > t | X ∈ C). The associated quantile function is denoted by q y (· | C) := inf{t; p y (t | C) ≤ ·}.
Condition (H3) entails that the observations of g 0 (X) given that Y > y and g 0 (X) ∈ C are concentrated, for y large enough, on a set of small probability. More specifically, we have the following result.
Condition (H3) is satisfied for instance by conditional heavy-tailed distributions defined for all x ∈ X by S(y | x) := y −1/γ(x) L(y | x), where γ is a positive function and for all x ∈ X , L(· | x) is a slowly varying function. This is the object of the following result.

Lemma 5.
Let us consider the random vector (X, Y ) such that for y > 0 and x ∈ X ⊂ R p , S(y | x) = y −1/γ(x) L(y | x), where γ is a positive function defined on X and, for all x ∈ X , L(· | x) is a slowly varying function. Let C ⊂ X with P(X ∈ C) > 0. If the cumulative distribution function of γ(X) given X ∈ C is continuous and if lim y→∞ sup x∈C ln L(y | x)/ln y = 0 (24) then condition (H3) holds.
The result given by (25) provides a clear theoretical justification of our estimation procedure.
The proof is then complete by using condition (H2).
This inequality holds for all δ > 0. Since G is continuous, one can choose δ appropriately to conclude the proof.
The first step of the proof consists in showing that lim y→∞ E j (y) = 0.
Since on I j , ϕ y admits a unique maximum point z * j in the interior of I j , B y,ε is an interval included in I j and containing z * j . Since f 0 (x) ≥ m for all x ∈ X 0 , condition (29) leads to m × l(B y,ε ) ≤ ∫ B y,ε f 0 (x)dx ≤ εP(B j ).
As a consequence, we get the desired bound. We are now in a position to prove (28). For y ∈ R,