Estimating the Conditional Tail Expectation in the Case of Heavy-Tailed Losses

The conditional tail expectation (CTE) is an important actuarial risk measure and a useful tool in financial risk assessment. Under the classical assumption that the second moment of the loss variable is finite, the asymptotic normality of the nonparametric CTE estimator has already been established in the literature. This result, however, is not applicable when the loss variable follows a distribution with infinite second moment, which is a frequent situation in practice. With the help of extreme-value methodology, in this paper we offer a solution to the problem by suggesting a new CTE estimator, which is applicable when losses have finite means but infinite variances.


Introduction
One of the most important actuarial risk measures is the conditional tail expectation (CTE) (see, e.g., [1]), which is the average amount of loss given that the loss exceeds a specified quantile. Hence, the CTE provides a measure of the capital needed due to the exposure to the loss, and thus serves as a risk measure. Not surprisingly, therefore, the CTE continues to receive increased attention in the actuarial and financial literature, where we also find its numerous extensions and generalizations (see, e.g., [2-8], and references therein). We next present basic notation and definitions.
Let $X$ be a loss random variable with cumulative distribution function (cdf) $F$. Usually, the cdf $F$ is assumed to be continuous and defined on the entire real line, with negative loss interpreted as gain. We also assume the continuity of $F$ throughout the present paper. The CTE of the risk or loss $X$ is then defined, for every $t \in (0, 1)$, by

$$\mathrm{CTE}_F(t) = \mathbf{E}\bigl[X \mid X > Q(t)\bigr],$$

where $Q(t) = \inf\{x : F(x) \ge t\}$ is the quantile function corresponding to the cdf $F$. Since the cdf $F$ is continuous, we easily check that

$$\mathrm{CTE}_F(t) = \frac{1}{1-t}\int_t^1 Q(s)\,ds.$$

Naturally, the CTE is unknown since the cdf $F$ is unknown. Hence, it is desirable to establish statistical inferential results such as confidence intervals for $\mathrm{CTE}_F(t)$ with specified confidence levels and margins of error. We shall next show how to accomplish this task, initially assuming the classical moment assumption $\mathbf{E}[X^2] < \infty$. Namely, suppose that we have independent random variables $X_1, X_2, \ldots$, each with the cdf $F$, and let $X_{1:n} < \cdots < X_{n:n}$ denote the order statistics of $X_1, \ldots, X_n$. It is natural to define an empirical estimator of $\mathrm{CTE}_F(t)$ by the formula

$$\widehat{\mathrm{CTE}}_n(t) = \frac{1}{1-t}\int_t^1 Q_n(s)\,ds,$$

where $Q_n(s)$ is the empirical quantile function, which is equal to the $i$th order statistic $X_{i:n}$ for all $s \in ((i-1)/n,\, i/n]$ and for all $i = 1, \ldots, n$. The asymptotic behavior of the estimator $\widehat{\mathrm{CTE}}_n(t)$ has been studied by Brazauskas et al. [9], and we next formulate their most relevant result for our paper as a theorem.
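As a minimal sketch of the empirical estimator just described, the following Python function integrates the step function $Q_n$ over $(t, 1)$ and divides by $1 - t$. The function name `empirical_cte` is a hypothetical choice made for this illustration and does not come from the paper.

```python
def empirical_cte(sample, t):
    """Empirical CTE: (1/(1-t)) times the integral of the empirical quantile function over (t, 1)."""
    if not 0.0 <= t < 1.0:
        raise ValueError("t must lie in [0, 1)")
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        # Q_n(s) equals the i-th order statistic on ((i-1)/n, i/n];
        # only the part of that interval lying above t contributes.
        overlap = i / n - max(t, (i - 1) / n)
        if overlap > 0.0:
            total += x * overlap
    return total / (1.0 - t)


# With t = 0 the estimator reduces to the sample mean.
print(empirical_cte([4, 1, 3, 2], 0.0))
print(empirical_cte([4, 1, 3, 2], 0.5))
```

When $t$ is a multiple of $1/n$, the estimator is simply the average of the largest $n(1-t)$ observations; otherwise the order statistic straddling $t$ enters with a fractional weight.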
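Theorem 1.1. Assume that $\mathbf{E}[X^2] < \infty$. Then for every $t \in (0, 1)$, we have the asymptotic normality statement

$$\sqrt{n}\,\bigl(\widehat{\mathrm{CTE}}_n(t) - \mathrm{CTE}_F(t)\bigr)(1-t) \to_d \mathcal{N}\bigl(0, \sigma^2(t)\bigr)$$

when $n \to \infty$, where the asymptotic variance $\sigma^2(t)$ is given by the formula

$$\sigma^2(t) = \int_t^1\!\!\int_t^1 \bigl(\min(x, y) - xy\bigr)\,dQ(x)\,dQ(y).$$

The assumption $\mathbf{E}[X^2] < \infty$ is, however, quite restrictive as the following example shows. Suppose that $F$ is the Pareto cdf with index $\gamma > 0$, that is, $1 - F(x) = x^{-1/\gamma}$ for all $x \ge 1$. Let us focus on the case $\gamma < 1$, because when $\gamma \ge 1$, then $\mathrm{CTE}_F(t) = \infty$ for every $t \in (0, 1)$. Theorem 1.1 covers only the values $\gamma \in (0, 1/2)$ in view of the assumption $\mathbf{E}[X^2] < \infty$. When $\gamma \in (1/2, 1)$, we have $\mathbf{E}[X^2] = \infty$ but, nevertheless, $\mathrm{CTE}_F(t)$ is well defined and finite since $\mathbf{E}[X] < \infty$. Analogous remarks hold for other distributions with Pareto-like tails, and we shall indeed work with such general distributions in this paper.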
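The Pareto example can be made concrete with a short sketch. For $1 - F(x) = x^{-1/\gamma}$, $x \ge 1$, the quantile function is $Q(s) = (1-s)^{-\gamma}$, so integrating $Q$ over $(t, 1)$ gives $\mathrm{CTE}_F(t) = (1-t)^{-\gamma}/(1-\gamma)$ for $\gamma < 1$, while $\mathbf{E}[X^2] = 1/(1-2\gamma)$ is finite only for $\gamma < 1/2$. The function names below are hypothetical and chosen only for this illustration.

```python
def pareto_quantile(s, gamma):
    """Q(s) = (1 - s)**(-gamma) for the Pareto cdf 1 - F(x) = x**(-1/gamma), x >= 1."""
    return (1.0 - s) ** (-gamma)


def pareto_cte(t, gamma):
    """CTE_F(t) = (1 - t)**(-gamma) / (1 - gamma); infinite when gamma >= 1."""
    if gamma >= 1.0:
        return float("inf")
    return (1.0 - t) ** (-gamma) / (1.0 - gamma)


def pareto_second_moment(gamma):
    """E[X^2] = 1 / (1 - 2*gamma) when gamma < 1/2; infinite otherwise."""
    if gamma >= 0.5:
        return float("inf")
    return 1.0 / (1.0 - 2.0 * gamma)


# For gamma in (1/2, 1) the CTE is finite although the second moment is not.
for g in (0.25, 2.0 / 3.0, 0.75):
    print(g, pareto_cte(0.75, g), pareto_second_moment(g))
```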
Namely, recall that the cdf $F$ is regularly varying at infinity with index $-1/\gamma < 0$ if

$$\lim_{t \to \infty} \frac{1 - F(tx)}{1 - F(t)} = x^{-1/\gamma}$$

for every $x > 0$. This class includes a number of popular distributions such as Pareto, generalized Pareto, Burr, Fréchet, Student, and so forth, which are known to be appropriate models for fitting large insurance claims, fluctuations of prices, log-returns, and so forth (see, e.g., [10]). In the remainder of this paper, therefore, we restrict ourselves to this class of distributions. For more information on the topic and, generally, on extreme value models and their manifold applications, we refer to the monographs by Beirlant et al. [11], Castillo et al. [12], de Haan and Ferreira [13], and Resnick [14].

The rest of the paper is organized as follows. In Section 2 we construct an alternative, called "new", CTE estimator by utilizing an extreme value approach. In Section 3 we establish the asymptotic normality of the new CTE estimator and illustrate its performance with a small simulation study. The main result, which is Theorem 3.1 stated in Section 3, is proved in Section 4.
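The defining limit of regular variation can be checked numerically. The sketch below uses a Burr (type XII) survival function $(1 + x^{c})^{-\kappa}$, for which the tail index is $\gamma = 1/(c\kappa)$; the parameter values and function names are assumptions made only for this illustration.

```python
def burr_survival(x, c=2.0, kappa=0.75):
    """Survival function 1 - F(x) = (1 + x**c)**(-kappa) of a Burr (type XII) distribution."""
    return (1.0 + x ** c) ** (-kappa)


def tail_ratio(survival, t, x):
    """The ratio (1 - F(t*x)) / (1 - F(t)) appearing in the regular variation condition."""
    return survival(t * x) / survival(t)


gamma = 1.0 / (2.0 * 0.75)  # tail index implied by c = 2 and kappa = 0.75
x = 2.0
for t in (1.0, 10.0, 100.0, 1e4):
    # The ratio approaches x**(-1/gamma) as t grows.
    print(t, tail_ratio(burr_survival, t, x), x ** (-1.0 / gamma))
```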

Construction of a New CTE Estimator
We have already noted that the "old" estimator $\widehat{\mathrm{CTE}}_n(t)$ does not yield the asymptotic normality (in the classical sense) beyond the condition $\mathbf{E}[X^2] < \infty$. Indeed, this follows by setting $t = 0$, in which case $\widehat{\mathrm{CTE}}_n(t)$ becomes the sample mean of $X_1, \ldots, X_n$, and thus the asymptotic normality of $\widehat{\mathrm{CTE}}_n(0)$ is equivalent to the classical Central Limit Theorem (CLT). Similar arguments show that the finite second moment is necessary for having the asymptotic normality (in the classical sense) of $\widehat{\mathrm{CTE}}_n(t)$ at any fixed "level" $t \in (0, 1)$. Indeed, note that the asymptotic variance $\sigma^2(t)$ in Theorem 1.1 is finite only if $\mathbf{E}[X^2] < \infty$.
For this reason, we next construct an alternative CTE estimator, which takes into account different asymptotic properties of moderate and high quantiles in the case of heavy-tailed distributions. Hence, from now on we assume that $\gamma \in (1/2, 1)$. Before going into the construction details, we first formulate the new CTE estimator:

$$\widetilde{\mathrm{CTE}}_n(t) = \frac{1}{1-t}\left(\int_t^{1-k/n} Q_n(s)\,ds + \frac{k\,X_{n-k:n}}{n\,(1 - \widehat{\gamma}_n)}\right),$$

where we use the simplest yet useful and powerful Hill's [15] estimator

$$\widehat{\gamma}_n = \frac{1}{k}\sum_{i=1}^{k} \log X_{n-i+1:n} - \log X_{n-k:n}$$

of the tail index $\gamma \in (1/2, 1)$. Integers $k = k_n \in \{1, \ldots, n\}$ are such that $k \to \infty$ and $k/n \to 0$ when $n \to \infty$, and we note at the outset that their choices present a challenging task. In Figures 1 and 2, we illustrate the performance of the new estimator $\widetilde{\mathrm{CTE}}_n(t)$ with respect to the sample size $n \ge 1$, with the integers $k = k_n$ chosen according to the method proposed by Cheng and Peng [16]. Note that when $t$ increases through the values 0.25, 0.50, 0.75, and 0.90 (panels (a)-(d), resp.), the ranges of the vertical axes of the panels also increase, which reflects the fact that the larger the $t$ gets, the more erratic the "new" and "old" estimators become. Note also that the empirical (i.e., "old") estimator underestimates the theoretical $\mathrm{CTE}_F(t)$, which is a well-known phenomenon (see [17]).
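The two ingredients of the new estimator can be sketched in a few lines of Python. The code assumes positive observations (so that logarithms exist), takes the number $k$ of upper order statistics as a user-supplied input rather than selecting it by the Cheng and Peng method, and requires $k < n$ so that $X_{n-k:n}$ is defined. The names `hill_estimator` and `new_cte` are hypothetical.

```python
import math


def hill_estimator(sample, k):
    """Hill's estimator: mean of log X_{n-i+1:n}, i = 1..k, minus log X_{n-k:n}."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = xs[n - k - 1]  # X_{n-k:n} in 1-based order-statistic notation
    return sum(math.log(x) for x in xs[n - k:]) / k - math.log(threshold)


def new_cte(sample, t, k):
    """Empirical integral of Q_n over (t, 1 - k/n) plus the extreme-value tail term."""
    xs = sorted(sample)
    n = len(xs)
    gamma_hat = hill_estimator(xs, k)
    if gamma_hat >= 1.0:
        raise ValueError("estimated tail index >= 1: the tail term is not finite")
    if not 0.0 <= t < 1.0 - k / n:
        raise ValueError("t must satisfy 0 <= t < 1 - k/n")
    body = 0.0
    for i in range(1, n - k + 1):  # order statistics X_{1:n}, ..., X_{n-k:n}
        overlap = i / n - max(t, (i - 1) / n)
        if overlap > 0.0:
            body += xs[i - 1] * overlap
    tail = (k / n) * xs[n - k - 1] / (1.0 - gamma_hat)
    return (body + tail) / (1.0 - t)


# Deterministic stand-in for a Pareto sample with gamma = 2/3: quantiles on a midpoint grid.
gamma, n, k = 2.0 / 3.0, 2000, 100
grid = [(1.0 - (i - 0.5) / n) ** (-gamma) for i in range(1, n + 1)]
print(hill_estimator(grid, k), new_cte(grid, 0.75, k))
```

On this idealized grid both outputs should be close to their targets, $\gamma = 2/3$ and $3 \cdot 0.25^{-2/3}$; with genuinely random samples the results depend strongly on the choice of $k$.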
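We have based the construction of $\widetilde{\mathrm{CTE}}_n(t)$ on the recognition that one should estimate moderate and high quantiles differently when the underlying distribution is heavy-tailed. For this, we first recall that the high quantile $q_s$ is, by definition, equal to $Q(1 - s)$ for sufficiently small $s$. For an estimation theory of high quantiles in the case of heavy-tailed distributions we refer to, for example, Weissman [18], Dekkers and de Haan [19], Matthys and Beirlant [20], Gomes et al. [21], and references therein. We shall use the Weissman estimator

$$\widehat{q}_s = \left(\frac{k}{n}\right)^{\widehat{\gamma}_n} X_{n-k:n}\, s^{-\widehat{\gamma}_n}, \qquad 0 < s < \frac{k}{n},$$

of the high quantile $q_s$. Then we write $\mathrm{CTE}_F(t)$ as the sum $\mathrm{CTE}_{1,n}(t) + \mathrm{CTE}_{2,n}(t)$ with the two summands defined together with their respective empirical estimators $\widetilde{\mathrm{CTE}}_{1,n}(t)$ and $\widetilde{\mathrm{CTE}}_{2,n}(t)$ as follows:

$$\mathrm{CTE}_{1,n}(t) = \frac{1}{1-t}\int_t^{1-k/n} Q(s)\,ds \approx \frac{1}{1-t}\int_t^{1-k/n} Q_n(s)\,ds = \widetilde{\mathrm{CTE}}_{1,n}(t),$$

$$\mathrm{CTE}_{2,n}(t) = \frac{1}{1-t}\int_{1-k/n}^{1} Q(s)\,ds \approx \frac{1}{1-t}\int_0^{k/n} \widehat{q}_s\,ds = \widetilde{\mathrm{CTE}}_{2,n}(t).$$

Simple integration gives the formula

$$\widetilde{\mathrm{CTE}}_{2,n}(t) = \frac{k\,X_{n-k:n}}{n\,(1 - \widehat{\gamma}_n)(1 - t)}.$$

Consequently, the sum $\widetilde{\mathrm{CTE}}_{1,n}(t) + \widetilde{\mathrm{CTE}}_{2,n}(t)$ is an estimator of $\mathrm{CTE}_F(t)$, and this is exactly the estimator $\widetilde{\mathrm{CTE}}_n(t)$ introduced above. We shall investigate asymptotic normality of the new estimator in the next section, accompanied by an illustrative simulation study.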
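The "simple integration" step can be verified numerically. The sketch below assumes the Weissman-type form $\widehat{q}_s = (k/n)^{\widehat{\gamma}} X_{n-k:n}\, s^{-\widehat{\gamma}}$ for $0 < s < k/n$ and compares a midpoint-rule integral over $(0, k/n)$ with the closed form $k X_{n-k:n} / (n(1 - \widehat{\gamma}))$. The input values and function names are hypothetical.

```python
import math


def weissman_quantile(s, k, n, x_threshold, gamma_hat):
    """Weissman-type estimate of the high quantile q_s = Q(1 - s) for 0 < s <= k/n."""
    return (k / n) ** gamma_hat * x_threshold * s ** (-gamma_hat)


def tail_integral_closed_form(k, n, x_threshold, gamma_hat):
    """Closed form of the integral of the Weissman estimate over (0, k/n), for gamma_hat < 1."""
    return (k / n) * x_threshold / (1.0 - gamma_hat)


def tail_integral_numeric(k, n, x_threshold, gamma_hat, steps=20_000):
    """Midpoint rule after the substitution s = (k/n) * exp(-u), which removes the singularity at 0."""
    u_max = 40.0 / (1.0 - gamma_hat)  # the integrand has decayed to about exp(-40) at u_max
    h = u_max / steps
    total = 0.0
    for j in range(steps):
        u = (j + 0.5) * h
        s = (k / n) * math.exp(-u)
        total += weissman_quantile(s, k, n, x_threshold, gamma_hat) * s * h  # |ds| = s du
    return total


k, n, x_threshold, gamma_hat = 100, 2000, 7.3, 2.0 / 3.0  # illustrative values only
print(tail_integral_closed_form(k, n, x_threshold, gamma_hat))
print(tail_integral_numeric(k, n, x_threshold, gamma_hat))
```

The closed form is finite only when $\widehat{\gamma} < 1$, which mirrors the requirement that the loss has a finite mean.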

Main Theorem and Its Practical Implementation
We start this section by noting that Hill's estimator $\widehat{\gamma}_n$ has been thoroughly studied, improved, and generalized in the literature. For example, weak consistency of $\widehat{\gamma}_n$ has been established by Mason [22] assuming only that the underlying distribution is regularly varying at infinity. Asymptotic normality of $\widehat{\gamma}_n$ has been investigated under various conditions by a number of researchers, including Csörgő and Mason [23], Beirlant and Teugels [24], and Dekkers et al. [25]; see also references therein.
The main theoretical result of this paper, which is Theorem 3.1 below, establishes asymptotic normality of the new CTE estimator $\widetilde{\mathrm{CTE}}_n(t)$. To formulate the theorem, we need to introduce an assumption that ensures the asymptotic normality of Hill's estimator $\widehat{\gamma}_n$. Namely, the cdf $F$ satisfies the generalized second-order regular variation condition with second-order parameter $\rho \le 0$ (see [26, 27]) if there exists a function $a(t)$ which does not change its sign in a neighbourhood of infinity and is such that, for every $x > 0$,

$$\lim_{t \to \infty} \frac{1}{a(t)}\left(\frac{1 - F(tx)}{1 - F(t)} - x^{-1/\gamma}\right) = x^{-1/\gamma}\,\frac{x^{\rho/\gamma} - 1}{\rho/\gamma}. \tag{3.1}$$

When $\rho = 0$, the ratio on the right-hand side of (3.1) is interpreted as $\log x$. For statistical inference concerning the second-order parameter $\rho$, we refer, for example, to Peng and Qi [28], Gomes et al. [21], and Gomes and Pestana [29]. Furthermore, in the formulation of Theorem 3.1, we shall also use the function $A(z) = \gamma^2 a(U(z))$, where $U(z) = Q(1 - 1/z)$.

Theorem 3.1. Assume that the cdf $F$ satisfies condition (3.1) with $\gamma \in (1/2, 1)$. Then for any sequence of integers $k = k_n$ such that $k \to \infty$, $k/n \to 0$, and $\sqrt{k}\,A(n/k) \to 0$ when $n \to \infty$, we have that, for any fixed $t \in (0, 1)$,

$$\frac{\sqrt{n}\,\bigl(\widetilde{\mathrm{CTE}}_n(t) - \mathrm{CTE}_F(t)\bigr)(1 - t)}{(k/n)^{1/2}\,X_{n-k:n}} \to_d \mathcal{N}\bigl(0, \sigma_\gamma^2\bigr), \tag{3.2}$$

where the asymptotic variance $\sigma_\gamma^2$ is given by the formula

$$\sigma_\gamma^2 = \frac{\gamma^4}{(1 - \gamma)^4 (2\gamma - 1)}.$$

The asymptotic variance $\sigma_\gamma^2$ does not depend on $t$, unlike the variance in Theorem 1.1. This is not surprising because the heaviness of the right-most tail of $F$ makes the asymptotic behaviour of $\int_t^1 (Q_n(s) - Q(s))\,ds$ "heavier" than the classical CLT-type behaviour of $\int_0^t (Q_n(s) - Q(s))\,ds$, for any fixed $t$. This in turn implies that under the conditions of Theorem 3.1, statement (3.2) is equivalent to the same statement in the case $t = 0$. The latter statement concerns estimating the mean $\mathbf{E}[X]$ of a heavy-tailed distribution. Therefore, we can view Theorem 3.1 as a consequence of Peng [30], and at the same time we can view results of Peng [30] as a consequence of Theorem 3.1 by setting $t = 0$ in it. Despite this equivalence, in Section 4 we give a proof of Theorem 3.1 for the sake of completeness. Our proof, however, is crucially based on a powerful technique called the Vervaat process (see [31-33] for details and references).
To discuss practical implementation of Theorem 3.1, we first fix a significance level $\varsigma \in (0, 1)$ and use the classical notation $z_{\varsigma/2}$ for the $(1 - \varsigma/2)$-quantile of the standard normal distribution $\mathcal{N}(0, 1)$. Given a realization of the random variables $X_1, \ldots, X_n$ (e.g., claim amounts), which follow a cdf $F$ satisfying the conditions of Theorem 3.1, we construct a level $1 - \varsigma$ confidence interval for $\mathrm{CTE}_F(t)$ as follows. First, we choose an appropriate number $k$ of extreme values. Since Hill's estimator has in general a substantial variance for small $k$ and a considerable bias for large $k$, we search for a $k$ that balances between the two shortcomings, which is indeed a well-known hurdle when estimating the tail index. To resolve this issue, several procedures have been suggested in the literature, and we refer to, for example, Dekkers. In our current study, we employ the method of Cheng and Peng [16] to choose an appropriate value $k^*$ of the "parameter" $k$. Having computed Hill's estimator and consequently determined $X_{n-k^*:n}$, we then compute the corresponding values of $\widetilde{\mathrm{CTE}}_n(t)$ and $\sigma^2_{\widehat{\gamma}_n}$, and denote them by $\widetilde{\mathrm{CTE}}^*_n(t)$ and $\sigma^{2*}_{\widehat{\gamma}_n}$, respectively. Finally, using Theorem 3.1 we arrive at the following $(1 - \varsigma)$-confidence interval for $\mathrm{CTE}_F(t)$:

$$\widetilde{\mathrm{CTE}}^*_n(t) \pm z_{\varsigma/2}\,\frac{(k^*/n)^{1/2}\,X_{n-k^*:n}\,\sigma^{*}_{\widehat{\gamma}_n}}{(1 - t)\sqrt{n}}. \tag{3.4}$$

To illustrate the performance of this confidence interval, we have carried out a small-scale simulation study based on the Pareto cdf $F(x) = 1 - x^{-1/\gamma}$, $x \ge 1$, with the tail index $\gamma$ set to 2/3 and 3/4, and the level $t$ set to 0.75 and 0.90. We have generated 200 independent replicates of three samples of sizes $n = 1000$, 2000, and 5000. For every simulated sample, we have obtained estimates $\widetilde{\mathrm{CTE}}_n(t)$. Then we have calculated the arithmetic averages over the values from the 200 repetitions, with the absolute error (error) and root mean squared error (rmse) of the new estimator $\widetilde{\mathrm{CTE}}_n(t)$ reported in Table 1 ($\gamma = 2/3$) and Table 2 ($\gamma = 3/4$). In the tables, we have also reported 95%-confidence intervals (3.4) with their lower and upper bounds, coverage probabilities, and lengths.
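The construction of the confidence interval can be sketched as follows. The code makes several assumptions that should be checked against the paper: the asymptotic variance is taken to be $\gamma^4 / ((1-\gamma)^4 (2\gamma - 1))$ evaluated at the Hill estimate, the half-width is $z_{\varsigma/2}\,\sigma\,(k/n)^{1/2} X_{n-k:n} / ((1-t)\sqrt{n})$, and $k$ is supplied by the user instead of being selected by the Cheng and Peng method. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def hill_estimator(xs_sorted, k):
    """Hill's estimator computed from an ascending list of positive observations."""
    n = len(xs_sorted)
    return sum(math.log(x) for x in xs_sorted[n - k:]) / k - math.log(xs_sorted[n - k - 1])


def new_cte(xs_sorted, t, k, gamma_hat):
    """Empirical integral of Q_n over (t, 1 - k/n) plus the tail term k X_{n-k:n} / (n (1 - gamma_hat))."""
    n = len(xs_sorted)
    body = 0.0
    for i in range(1, n - k + 1):
        overlap = i / n - max(t, (i - 1) / n)
        if overlap > 0.0:
            body += xs_sorted[i - 1] * overlap
    tail = (k / n) * xs_sorted[n - k - 1] / (1.0 - gamma_hat)
    return (body + tail) / (1.0 - t)


def cte_confidence_interval(sample, t, k, level=0.95):
    """Return (lower, estimate, upper) for CTE_F(t) under the assumptions stated in the lead-in."""
    xs = sorted(sample)
    n = len(xs)
    if not (1 <= k < n and 0.0 <= t < 1.0 - k / n):
        raise ValueError("need 1 <= k < n and 0 <= t < 1 - k/n")
    gamma_hat = hill_estimator(xs, k)
    if not 0.5 < gamma_hat < 1.0:
        raise ValueError("Hill estimate outside (1/2, 1): the normal approximation is not justified")
    estimate = new_cte(xs, t, k, gamma_hat)
    # Assumed asymptotic standard deviation: gamma^2 / ((1 - gamma)^2 * sqrt(2*gamma - 1)).
    sigma = gamma_hat ** 2 / ((1.0 - gamma_hat) ** 2 * math.sqrt(2.0 * gamma_hat - 1.0))
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    half_width = z * sigma * math.sqrt(k / n) * xs[n - k - 1] / ((1.0 - t) * math.sqrt(n))
    return estimate - half_width, estimate, estimate + half_width


# Deterministic stand-in for a Pareto sample with gamma = 2/3 (quantiles on a midpoint grid).
gamma, n = 2.0 / 3.0, 2000
grid = [(1.0 - (i - 0.5) / n) ** (-gamma) for i in range(1, n + 1)]
print(cte_confidence_interval(grid, t=0.75, k=100))
```

To mimic the simulation study, the grid can be replaced by random draws such as `(1 - random.random()) ** (-gamma)`; in that case the guard on the Hill estimate may occasionally reject a sample, and coverage will depend on how $k$ is chosen.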
We note emphatically that the above coverage probabilities and lengths of confidence intervals can be improved by employing more precise but, naturally, considerably more complex estimators of the tail index. Such estimators are described in the monographs by Beirlant et al. [11], Castillo et al. [12], de Haan and Ferreira [13], and Resnick [14]. Since the publication of these monographs, numerous journal articles have appeared on the topic. Our aim in this paper, however, is to present a simple yet useful result that highlights how much Actuarial Science and developments in Mathematical Statistics, Probability, and Stochastic Processes are interrelated, and thus benefit from each other.

Proof of Theorem 3.1
We start the proof of Theorem 3.1 with the decomposition where

4.2
We shall show below that there are Brownian bridges B n such that

4.4
Assuming for the time being that statements (4.3) and (4.4) hold, we next complete the proof of Theorem 3.1. To simplify the presentation, we use the following notation:

4.5
Journal of Probability and Statistics

Hence, we have the asymptotic representation

The sum $W_{1,n} + W_{2,n} + W_{3,n}$ is a centered Gaussian random variable. To calculate its asymptotic variance, we establish the following limits:

4.7
Summing up the right-hand sides of the above six limits, we obtain $\sigma_\gamma^2$, whose expression in terms of the parameter $\gamma$ is given in Theorem 3.1. Finally, since $X_{n-k:n}/Q(1 - k/n)$ converges in probability to 1 (see, e.g., the proof of Corollary in [39]), the classical Slutsky's lemma completes the proof of Theorem 3.1. Of course, we are still left to verify statements (4.3) and (4.4), which make up the contents of the following two subsections.

Proof of Statement 4.3
If $Q$ were continuously differentiable, then statement (4.3) would follow easily from the proof of Theorem 2 in [39]. We do not assume differentiability of $Q$ and thus a new proof is required, which is crucially based on the Vervaat process (see [31-33], and references therein)

Hence, for every $t$ such that $0 < t < 1 - k/n$, which is satisfied for all sufficiently large $n$ since $t$ is fixed, we have that

4.9
It is well known (see [31-33]) that $V_n(t)$ is nonnegative and does not exceed

Since the cdf $F$ is continuous by assumption, we therefore have that

where $e_n(t)$ is the uniform empirical process $\sqrt{n}\,\bigl(F_n(Q(t)) - F(Q(t))\bigr)$, which for large $n$ looks like the Brownian bridge $B_n(t)$. Note also that with the just introduced notation $e_n$, the integral on the right-hand side of (4.9) is equal to

4.11
We shall next replace the empirical process $e_n$ by an appropriate Brownian bridge $B_n$ in the first integral on the right-hand side of (4.11) with an error term of magnitude $o_P(1)$, and we shall also show that the second and third summands on the right-hand side of (4.11) are of the order $o_P(1)$. The replacement of $e_n$ by $B_n$ can be accomplished using, for example, Corollary 2.1 on page 48 of Csörgő et al. [40], which states that on an appropriately constructed probability space and for any $0 \le \nu < 1/4$, we have that

This result is applicable in the current situation since we can always place our original problem into the required probability space, because our main results are "in probability". Furthermore, since

4.13
Changing the variables of integration and using the property

4.14
The main term on the right-hand side of (4.14) is $W_{1,n}$. We shall next show that the right-most summand of (4.13) converges to 0 when $n \to \infty$.
Changing the variable of integration and then integrating by parts, we obtain the bound

4.15
We want to show that the right-hand side of bound (4.15) converges to 0 when $n \to \infty$. For this, we first note that

Next, with the notation $\varphi(u) = Q(1 - u)/u^{1/2 + \nu}$, we have that

when $n \to \infty$, where the convergence to 0 follows from Result 1 in the Appendix of Necir and Meraghni [39]. Taking statements (4.15)-(4.17) together, we have that the right-most summand of (4.13) converges to 0 when $n \to \infty$.

Consequently, in order to complete the proof of statement (4.3), we are left to show that the second and third summands on the right-hand side of (4.11) are of the order $o_P(1)$. The third summand is of the order $o_P(1)$ because $|e_n(t)\,(Q_n(t) - Q(t))| = O_P(1)$ and $(k/n)^{1/2} Q(1 - k/n) \to \infty$. Hence, we are only left to show that the second summand on the right-hand side of equation (4.11) is of the order $o_P(1)$, for which we shall show that

To prove statement (4.18), we first note that

4.19
The first summand on the right-hand side of bound (4.19) is of the order $O_P(1)$ due to statement (4.12) with $\nu = 0$. The second summand on the right-hand side of bound (4.19) is of the order $O_P(1)$ due to a statement on page 49 of Csörgő et al. [40] (see the displayed bound just below statement (2.39) therein). Hence, to complete the proof of statement (4.18), we need to check that

Observe that, for each $n$, the distribution of $Q_n(1 - k/n)$ is the same as that of $Q(E_n^{-1}(1 - k/n))$, where $E_n^{-1}$ is the uniform empirical quantile function. Furthermore, the processes $\{1 - E_n^{-1}(1 - s),\ 0 \le s \le 1\}$ and $\{E_n^{-1}(s),\ 0 \le s \le 1\}$ are equal in distribution. Hence, statement (4.20) is equivalent to

From the Glivenko-Cantelli theorem we have that

Since the function $s \mapsto Q(1 - s)$ is regularly varying at zero, using Potter's inequality (see the 5th assertion of Proposition B.1.9 on page 367 of de Haan and Ferreira [13]), we obtain that

for any $\theta \in (0, \gamma)$. In view of (4.23), the right-hand side of (4.24) is equal to $1 + o_P(1)$, which implies statement (4.21) and thus finishes the proof of statement (4.3).

Proof of Statement 4.4
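The proof of statement (4.4) is similar to that of Theorem 2 in Necir et al. [42], though some adjustments are needed since we are now concerned with the CTE risk measure. We therefore present main blocks of the proof together with pinpointed references to Necir et al. [42] for specific technical details. We start the proof with the function $U(z) = Q(1 - 1/z)$ that was already used in the formulation of Theorem 3.1. Hence, if $Y$ is a random variable with the distribution function $G(z) = 1 - 1/z$, $z \ge 1$, then $U(Y)$ is equal in distribution to $X$,

and so we have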

4.26
We next show that the right-most term in (4.26) converges to 0 when $n \to \infty$. For this reason, we first rewrite the term as follows:

The right-hand side of (4.27) converges to 0 (see notes on page 149 of Necir et al. [42]) due to the second-order condition (3.1), which can equivalently be rewritten as

$$\lim_{z \to \infty} \frac{1}{A(z)}\left(\frac{U(zs)}{U(z)} - s^{\gamma}\right) = s^{\gamma}\,\frac{s^{\rho} - 1}{\rho}$$

for every $s > 0$, where $A(z) = \gamma^2 a(U(z))$. Note that $\sqrt{k}\,A(n/k) \to 0$ when $n \to \infty$. Hence, in order to complete the proof of statement (4.4), we need to check that

4.29
With Hill's estimator written in the form we proceed with the proof of statement 4.29 as follows:

4.31
Furthermore, we have that

4.33
We now need to connect the right-hand side of (4.33) with Brownian bridges $B_n$. To this end, we first convert the $Y$-based order statistics into $U$-based (i.e., uniform on $[0, 1]$) order statistics. For this we recall that the cdf of $Y$ is $G$, and thus $Y$ is equal in distribution to $G^{-1}(U)$, which is $1/(1 - U)$. Consequently,

4.34
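The change of variables used here can be illustrated in the Pareto case. The sketch assumes $G(y) = 1 - 1/y$ for $y \ge 1$ (so that $G^{-1}(u) = 1/(1-u)$) and uses the Pareto form $U(z) = Q(1 - 1/z) = z^{\gamma}$; it checks pointwise that applying $U$ to $G^{-1}(u)$ reproduces the loss quantile $Q(u)$, which is why the $Y$-based and uniform-based representations describe the same losses. The function names are hypothetical.

```python
def standard_pareto_quantile(u):
    """G^{-1}(u) = 1 / (1 - u), the quantile function of G(y) = 1 - 1/y, y >= 1."""
    return 1.0 / (1.0 - u)


def standard_pareto_cdf(y):
    """G(y) = 1 - 1/y for y >= 1."""
    return 1.0 - 1.0 / y


def pareto_quantile(s, gamma):
    """Q(s) = (1 - s)**(-gamma) for the Pareto cdf 1 - F(x) = x**(-1/gamma)."""
    return (1.0 - s) ** (-gamma)


def pareto_tail_quantile(z, gamma):
    """U(z) = Q(1 - 1/z), which equals z**gamma in the Pareto case."""
    return z ** gamma


gamma = 0.75
for u in (0.1, 0.5, 0.9, 0.999):
    y = standard_pareto_quantile(u)
    # U(G^{-1}(u)) and Q(u) agree up to rounding.
    print(u, pareto_tail_quantile(y, gamma), pareto_quantile(u, gamma))
```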
Next we choose a sequence of Brownian bridges $B_n$ (see pages 158-159 in [42] and references therein) such that the following two asymptotic representations hold:

Figure 1: Values of the CTE estimator $\widetilde{\mathrm{CTE}}_n(t)$ (vertical axis) versus sample sizes $n$ (horizontal axis) evaluated at the levels $t = 0.25$, $t = 0.50$, $t = 0.75$, and $t = 0.90$ (panels (a)-(d), resp.) in the Pareto case with the tail index $\gamma = 2/3$.

Figure 2: Values of the CTE estimator $\widetilde{\mathrm{CTE}}_n(t)$ (vertical axis) versus sample sizes $n$ (horizontal axis) evaluated at the levels $t = 0.25$, $t = 0.50$, $t = 0.75$, and $t = 0.90$ (panels (a)-(d), resp.) in the Pareto case with the tail index $\gamma = 3/4$.

4.32

Arguments on page 156 of Necir et al. [42] imply that the first term on the right-hand side of (4.32) is of the order $O_P\bigl(\sqrt{k}\,A(Y_{n-k:n})\bigr)$, and a note on page 157 of Necir et al. [42] says that $\sqrt{k}\,A(Y_{n-k:n}) = o_P(1)$. Hence, the first term on the right-hand side of (4.32) is of the order $o_P(1)$. Analogous considerations using bound (2.5) instead of (2.4) on page 156 of Necir et al. [42] imply that the first term on the right-hand side of (4.31) is of the order $o_P(1)$. Hence, in summary, we have that

$\int_{1-k/n}^{1} \frac{B_n(s)}{1 - s}\,ds + o_P(1).$
statements on the right-hand side of (4.34) and also keeping in mind that $\widehat{\gamma}_n$ is a consistent estimator of $\gamma$ (see [22]), we have that

of equation (4.36) by $1 - \gamma$, we arrive at (4.29). This completes the proof of statement (4.4) and of Theorem 3.1 as well.

Table 1: Point estimates $\widetilde{\mathrm{CTE}}_n(t)$ and 95% confidence intervals for $\mathrm{CTE}_F(t)$ when $\gamma = 2/3$.

Table 2: Point estimates $\widetilde{\mathrm{CTE}}_n(t)$ and 95% confidence intervals for $\mathrm{CTE}_F(t)$ when $\gamma = 3/4$.