Fitting tails affected by truncation

In several applications, ultimately at the largest data, truncation effects can be observed when analysing tail characteristics of statistical distributions. In some cases truncation effects are forecasted through physical models such as the Gutenberg-Richter relation in geophysics, while at other instances the nature of the measurement process itself may cause under recovery of large values, for instance due to flooding in river discharge readings. Recently Beirlant et al. (2016) discussed tail fitting for truncated Pareto-type distributions. Using examples from earthquake analysis, hydrology and diamond valuation we demonstrate the need for a unified treatment of extreme value analysis for truncated heavy and light tails. We generalise the classical Peaks over Threshold approach for the different max-domains of attraction with shape parameter $\xi>-1/2$ to allow for truncation effects. We use a pseudo-maximum likelihood approach to estimate the model parameters and consider extreme quantile estimation and reconstruction of quantile levels before truncation whenever appropriate. We report on some simulation experiments and provide some basic asymptotic results.


Introduction
Modelling extreme events has recently received a lot of interest. Assessing the risk of rare events through estimation of extreme quantiles or corresponding return periods has been developed extensively and was applied to a wide variety of fields such as meteorology, finance, insurance and geology, among others. The methodology on modelling the univariate upper tail of the distribution of such quantities Y relies on the fact that the maximum of independent measurements Y i , i = 1, . . . , n, can be approximated by the generalised extreme value distribution: as n → ∞ P max i=1,...,n Y i − b n an ≤ y → G ξ (y) = exp −(1 + ξy) −1/ξ , 1 + ξy > 0, (1) where b n ∈ R, a n > 0 and ξ ∈ R are the location, scale and shape parameters, respectively. For ξ = 0, G 0 (y) has to be read as exp{− exp(−y)}. In fact, (1) represents the only possible non-degenerate limits for maxima of independent and identically distributed sequences Y i . Condition (1) is equivalent to the convergence of the distribution of excesses (or peaks) over high thresholds t to the generalised Pareto distribution (GPD): as t tends to the endpoint of the distribution of Y , then, withF the right tail function (RTF) of a given distribution, where σ Y (t) > 0. Below we set σ Y (t) = σ t . Setting t at the (k + 1)th largest observation y n−k,n for some k ∈ {1 . . . , n − 1} so that k data points are larger than the threshold t, (2) leads to the estimator of the tail probability P(Y > c) for c > 0 large, where (ξ,σ) denote estimators for (ξ, σ t ). The modelling of extreme values and the estimation of tail parameters through the peaks over threshold (POT) methodology has been discussed for instance in Coles (2001), Embrechts et al. (1997), Beirlant et al. (2004), and de Haan & Ferreira (2006).
Recently, Aban et al. (2006), Chakrabarty & Samorodnitsky (2012) and Beirlant et al. (2016) have addressed the problem of using unbounded probability mass leading to levels that are unreasonably large or physically impossible. All of these papers consider cases with shape parameter ξ > 0. In Beirlant et al. (2016) it was observed that the above mentioned extreme value methods, even when using a negative extreme value index, are not able to capture truncation at high levels. However, in several other fields, such as hydrology and earthquake magnitude modelling, the underlying distributions appear to be lighter tailed than Pareto. In this paper we will propose an adaptation of the classical approach to truncated tails over the whole range of max-convergence (1) with ξ > −0.5 as in the original POT approach.
First, we consider recent magnitude data (expressed on the Richter scale) of the 200 largest earthquakes in the Groningen area (the Netherlands), in the period 2003-2015, which are caused by gas extraction. In Figure 1, we present the time plot and the exponential QQ-plot (x n−j+1,n , log(j/n)) (j = 1, 2, . . . , n) where x 1,n ≤ . . . ≤ x n−j+1,n ≤ . . . ≤ x n,n denote the ordered data. Along the Gutenberg-Richter (1956) law the magnitudes of indepen- In Figure 1, a linear pattern is visible for a large section of the magnitudes data, while some curvature appears at the largest values. The data set was tested for serial correlation and no significance could be detected.
Secondly, we revisit the diamond size data considered in Verster et al. (2012). The nature of metallurgical recovery processes in diamond mining may cause under recovery of large diamonds between 30 and 60 cts per stone. If stones are not recovered during this process they are discarded onto tailing dumps from which they can be recovered during future re-mining programs.
Because even a small number of large diamonds can have a large value, the question arises whether re-mining a mine dump can be made profitable by recovering these large diamonds. Therefore, the expected number of large diamonds above certain carat values c is of interest and the original nontruncated values are to be reconstructed from the data, which exhibit truncation. In Figure 2, the Pareto QQ-plot or log-log plot (log x n−j+1,n , log(j/n)) (j = 1, 2, . . . , n) of the available carat data is presented. Again, a curvature near the top data is visible. Thirdly, we study the river flows of the Molenbeek river at Erpe-Mere in Belgium (in m 3 /s, n = 426) obtained between 1986 and 1996. The data are peaks over threshold values taken from a complete series of hourly flow measurements which was filtered in order to satisfy hydrological independence as discussed in Willems (2009). This river is prone to flooding at high flow levels and hence the measurements can be truncated. In Figure 3 the exponential QQ-plot is given, which exhibits a linear (i.e. exponential) pattern with a downward curvature near the largest floods. In this paper, we aim to provide a statistical model being able to approximate tail characteristics of distributions truncated at high levels. Moreover, the statistical estimation methods should also include the case of no-truncation in order for these methods to be useful and competitive both in cases with and without truncation. In the case of Pareto-type tails with ξ > 0 the proposed methods should also be compared with the methods which have been developed specifically for that sub-case.
To this purpose we extend the classical POT technique with maximum likelihood estimation of the GPD parameters ξ and σ. Of course estimators for tail probabilities and extreme quantiles of a truncated distribution are to be discussed. Estimation of the endpoint T of a truncated distribution is of particular importance as discussed above in earthquake applications. Motivated by the river flow and diamond valuation examples, we finally consider the problem of reconstructing quantiles of the underlying unobserved variable Y before truncation.

Model
Let Y denote a parent random variable with distribution function . We consider the right truncated distribution from which independent and identically distributed data X 1 , X 2 , . . . , X n are observed with, for some The corresponding RTF is denoted withF T (x) = P(X > x) and the tail quantile function is given by where D T =F Y (T )/F Y (T ) equals the odds of the truncated probability mass under the untruncated distribution Y .
The goal of this paper is to provide a test for truncation and to estimate • the model parameters ξ and σ = σ t , • the odds D T , • quantiles Q T (1 − p) (p small) of the truncated distribution and the truncation point T = Q T (1), • tail probabilities P(X > c) (c large) of the truncated distribution, • and reconstruct quantile levels Q Y (1−p) of the parent variable Y before truncation, all on the basis of a pure random sample from X (possibly) truncated at some large T .
We assume that the distribution of Y satisfies (1) or, equivalently, (2). Condition 2 is also known to be equivalent to the following condition relating extreme quantile levels at 1 − 1 vy and 1 − 1 y close to the endpoint of the distribution: there exists a positive measurable function a such that with a(1/F Y (t k,n )) = σ t where t = t k,n = U T (n/k). The right hand side of (8) is to be read as log v for ξ = 0.
The specific case ξ > 0 of Pareto-type distributions satisfies Also when ξ > 0, σ t ∼ ξt as t → ∞. Furthermore, it is known that σ t /t → 0 when ξ ≤ 0. Note that for a given T fixed, the tail of a truncated model X defined through (4) has an extreme value index ξ X = −1, see for instance Figure 2.8 in Beirlant et al. (2004).
Truncation of a distribution Y satisfying (2) at a value T necessarily requires t < T → ∞. The threshold t is mostly taken at the theoretical quantile Q T (1 − k n ) = U T (n/k), which in practice is estimated by the empirical quantile X n−k,n . Given the fact that our model is only defined choosing t = t n , T = T n → ∞ as the sample size n → ∞, the underlying model depends on n and a triangular array formulation X n1 , . . . , X nn of the observations should be used in order to emphasise the nature of the model. However, in statistical procedures as presented here, when a single sample is given, the notation X 1 , . . . , X n is more natural and will be used throughout. The model considered in this paper is then given by (M) For a sequence T n → ∞, {X n1 , . . . , X nn } = {X 1 , . . . , X n } are independent copies of a random variable X = X Tn where X = X Tn is distributed as Y |Y < T n , with Y satisfying (2) or equivalently (8).
Now we consider the distribution of the POT values for the data of the truncated distribution under (M): One can now consider two cases as t, T → ∞: • (T t ) Rough truncation with the threshold t = t n : and hence from (2) and with local uniform convergence in (2) This entails that for x ∈ (0, κ) (13) This corresponds to situations where the deviation from the Pareto behaviour due to truncation at a high value T will be visible in the data from t on, and the approximation of the POT distribution using the limit distribution in (13) appears more appropriate than with a simple GPD.
Light truncation is introduced for mathematical completeness. But (T t ) means that the truncation is not really visible in the data above t, and the classical extreme value modelling without truncation is appropriate. Hence, it will be practically impossible to discriminate light truncation from no truncation (i.e. T = ∞).
Under (T t ) with t = t k,n = U T (n/k) we find from applying F Y to both sides of (6) with u = n/k that while, using (2) and Now in order to be able to construct extreme quantile estimators under (T t ), remark that from (8) with vy = 1/p, y = 1/F Y (t) and k ξ (u) = (u ξ − 1)/ξ, we have as t → ∞ andF Y (t)/p → C for some constant C > 0 that Hence, with (7) and p =F Y (T )(1 + 1 uD T ) we obtain Using (15) and (2) with y = κ we obtain under (T t ) that Hence, we conclude that under (T t ) for 1/(uD T ) → 0 These derivations will motivate the proposed estimators of D T and extreme quantiles Q T (1 − p).

Estimators and goodness-of-fit
Estimation of the parameters (ξ, σ) in the classical POT without truncation is well-developed (Coles 2001, Beirlant et al. 2004. Fitting the scaled GPD with RTF (1 + ξ σ x) −1/ξ to the excesses X − t given X > t (based on (13)) using maximum likelihood is by far the most popular method in this respect. Here we rely on the generalisation (13) under (T t ), with t replaced by a random threshold X n−k,n and using the exceedances E j,k = X n−j+1,n −X n−k,n (j = 1, 2, . . . , k) for some k ≥ 2. Substituting E 1,k /σ for κ following (11), the log-likelihood is given by The partial derivatives are given by from which the likelihood equations defining the pseudo maximum likelihood estimators (ξ k ,τ k ) are obtained: When computing (ξ k ,τ k ), one has to impose the model restrictions. In order to meet the restrictions σ = ξ/τ > 0 and 1 + τ E j,k > 0 for j = 1, . . . , k, in our implementation we require the estimates of these quantities to be larger than the numerical tolerance value 10 −10 .
An estimator of D T now follows from taking u = n in (16): Similarly taking u = 1/p in (16) with np/k → 0, we obtain estimators for Setting p = 0 in (20) one obtains an estimator for the truncation point T : Based on (3) and (5) an estimator for tail probabilities P(X > c) can be derived:p Note that all proposed estimators from (17) from which the following estimator reconstructing Q Y (1 − p) of the parent distribution Y follows: In the specific case ξ > 0 the estimators developed above can be compared with those developed in Beirlant et al. (2016) for this special Pareto-type case: with H k,n = 1 k k j=1 log X n−j+1,n − log X n−k,n the Hill (1975) statistic, and R k,n = X n−k,n /X n,n .
Of course, in practice there is a clear need for detecting rough truncation. Let (T k ) and (T k ) denote light and rough truncation with the thresholds X n−k,n .
can be constructed generalising the goodness-of-fit test which was proposed by Aban et al. (2006) within a Pareto context, rejecting H 0,k at asymptotic level q ∈ (0, 1) when while the P-value is given by e −T k,n , as under H 0,k , T k,n approximately follows a standard exponential distribution as will be shown in Theorem 3 below.

Simulation study
The authors have performed an extensive simulation study concerning all the proposed estimators for different distributions of Y . We compare the results with the results from a Pareto analysisξ + k andQ + T,k (1 − p) (Aban et al. 2006, Beirlant et al. 2016, with the classical POT maximum likelihood results denoted byξ ∞ k ,Q ∞ k (1 − p), and with the classical moment estimators (Dekkers et al. 1989) In the Appendix we give a selection from these simulation results for Y following the standard Pareto distribution, the standard lognormal distribution, the standard exponential distribution, and the GPD with RTF H −0.2 . For each setting, 1000 samples for X of size 500 were generated where we consider different levels of truncation: T = Q Y (0.975), T = Q Y (0.99) and T = Q Y (1). Note that the last case corresponds to no truncation, or X = d Y . The samples were generated using inverse transform sampling with the quantile function Q T (p) = Q Y (pF Y (T )) (which can easily be deduced from (5)).
To show the performance of the test for truncation, we plot the average Pvalues over the 1000 simulations as a function of k in the first columns of Figures 7-10 (full line). Additionally, the median (dashed line), first quartile (dotted line) and third quartile (dotted line) of the P-values over the 1000 simulations are also plotted as a function of k. This corresponds to the box of the boxplot of P-values as a function of k. Finally, we add blue horizontal lines (dash-dotted line) indicating the standard significance levels of 1% and 5%. When truncation is present (T = Q Y (0.975) or T = Q Y (0.99)), the average P-values show that the test rejects the null hypothesis of no truncation when k is large enough. For the standard exponential, standard lognormal and GPD(-0.2,1) truncated at T = Q Y (0.99), the average P-value is higher than, or just below, the 5% significance level, even for high values of k. However, when looking at the median values and the third quartile, we see that the majority, and sometimes more than 75%, of the P-values are below the 5% significance level. When the data are not truncated, i.e. X = d Y , the P-values are on average always well above the considered significance levels, hence correctly not rejecting the null hypothesis. The first quartile of the P-values is also above the 5% significance level, except for smaller values of k. Note that when we look at Y ∼ GPD(−0.2, 1), Y itself is upper truncated at −σ/ξ = 5, but still X = d Y when we set T = Q Y (1). The simulation results show that the test performs as expected: rejecting the null hypothesis when T = Q Y (0.975) or T = Q Y (0.99), and not rejecting the null hypothesis when T = Q Y (1).
Concerning the estimation of ξ, see the second and third columns in Figures 7-10, the behaviour ofξ k in the standard Pareto case exhibits a slightly smaller bias but quite a larger variance compared toξ + T,k from Aban et al. (2006), Beirlant et al. (2016) which was constructed exclusively for the case ξ > 0. The classical POT and moment estimators exhibit large bias under truncation, as they tend to -1 when the threshold tends to x n,n . The mean squared error ofξ k is comparable to the mean squared error (MSE) of these estimators for k ≥ 200. In case of no truncation the bias ofξ k is the smallest for k ≥ 100 while the mean squared error is the worst of the four estimators. When ξ ≤ 0, the estimatorξ + T,k from the Pareto analysis is breaking down as can be expected whereas the difference between the classical estimators and the newly proposed POT estimator is small for k ≥ 200 in case ξ = 0 and k ≥ 300 in the case ξ < 0. In all cases presentedξ k compares well for k sufficiently large with the classical estimators when there is no truncation. Note that all estimators have a large bias for the (truncated) log-normal distribution. As can clearly be seen, the bias of all estimators decreases as truncation becomes lighter, or when there is no truncation, as expected. Moreover, the stable area of theξ k estimates starts for smaller values of k when the truncation point gets larger.

Asymptotic results
Here we present the asymptotic normality of (ξ,τ ) andQ T,k (1 − p) under rough truncation, and the asymptotic null distribution of the goodness-of-fit test statistic T k,n . The proofs are provided in the Appendix. We assume a second-order remainder relation in (8) (2006) where with ρ ≤ 0. Furthermore, we introduce the notations b T,k,n := k+1 (n+1)D T , a T,k,n := a Y 1/(F Y (T )(1 + b T,k,n )) , and we denote the limit of k/(nD T ) under rough truncation as derived in (15) by β := (1 + ξκ) 1/ξ − 1.
for a sequence of Brownian motions {W n (s); s ≥ 0}.
Under (T t ) the asymptotic result for (ξ k ,τ k ) can be checked to be identical to that of the classical MLE estimators under no truncation as given in Theorem 3.4.2 in de Haan & Ferreira (2006). Note that the information matrix I β equals 0 when κ = 0, or equivalently β = 0, so that the asymptotic variances are unbounded in such case. In practice this induces large variances for smaller values of k. This also appears in Figures 7-10. Fortunately, the bias stays reasonably small for larger values of k, as can be deduced for instance in case of the lognormal distribution.
In order to state the asymptotic result for the quantile estimatorQ T,k (1 − p) with p = p n → 0, we use the notation d n = k/(np n ). Furthermore, we will use the result that when U Y satisfies (25), we have that (2006)).
where E is a standard exponential random variable and {W n (s); s ≥ 0} a sequence of Brownian motions.
This result should be compared with Theorem 4.3.1 in de Haan & Ferreira (2006) stating the basic asymptotic result for the quantile estimator based on the classical ML estimators under no truncation. Note that under (T t ) the rate of the stochastic part in the asymptotic representation where E is a standard exponential random variable.

Case studies
Analysing the magnitude data from the Groningen area, it appears that the given 200 top data confirm the Gutenberg-Richter law withξ k clearly indicating that Y = M belongs to the Gumbel ξ = 0 domain. The goodnessof-fit test rejects light truncation for k ≥ 40 and the proposed truncation model fits well to the top 50 data as indicated on the exponential QQ-plot in Figure 4. Furthermore,D T,k indicates a truncation volume D T between 0.01 and 0.02. Finally, the endpoint can be estimated in two ways: directly on the magnitude data usingT M,k =Q T,k (1), or using a Pareto analysisT + E,k = Q + T,k (1) based on the energy data and transforming back to the magnitude scale with a logarithmic transformation. Both approaches lead to a value around 3.75.
Concerning the diamond data introduced in Figure 2,ξ k andξ + k , respectivelŷ D T,k andD + T,k , correspond well for k ≥ 250 and lead to a Pareto fit with extreme value index around 0.5 and a truncation odds D T around 0.02. The

Discussion
We proposed a general tail estimation approach for cases where truncation affects the ultimate right tail of the distribution. Using applications from geophysics, hydrology and geology we motivated the importance of this problem. The proposed estimators of the extreme value index, and quantiles of the truncated and underlying non-truncated distribution, in most cases compare well with the best performing alternatives, even in case there is no truncation. The proposed estimator of extreme quantiles of a truncated distribution is performing uniformly best. While the alternative procedures sometimes break down in at least one situation, our proposals remain always useful for large enough k. Hence, in addition to the existing methods, this method can be an interesting extra tool when analysing tails.
Proof. In order to derive (a), note that for j = 1, . . . , k, where we used (7), and where Y 1,n ≤ Y 2,n ≤ . . . ≤ Y n,n denote the order statistics of an i.i.d. sample from a standard Pareto distribution with distribution function 1 − 1/x for x ≥ 1. Hence, using (25) with t = 1 F Y (T )(1 + b T,k,n ) and x = 1 + b T,k,n 1 + n+1 jY n−j+1,n j k+1 b T,k,n , we obtain Using the mean value theorem we now obtain Hence, combining this with (27) and the result from Lemma 2.4.10 in de Haan & Ferreira (2006), we arrive at (a). Combining (a) with the analogous result for j = k + 1, one arrives at (b). To this end note that Ψ ξ,ρ (1) = 0.
Proof of Theorem 1. This proof follows the approach of the proof of Theorem 3.4.2 in de Haan & Ferreira (2006). Letτ k a T,k,n =τ s k , and Then, uniformly in j ∈ {1, . . . , k}, Hence, the first term on the left hand side of (17) is given by Moreover, using Proposition 1(b) with j = 1, we obtain where we used the series expansions (1)). Hence, the second term on the left hand side of (17) equals Combining (17), (28) and (29) gives This equation can be written as The left hand side of (18) yields, using similar asymptotic methods as above, The right hand side of (18) is asymptotically equivalent to (where we used again Proposition 1(b) with j = 1) Combining (18), (31) and (32) leads to (after some lengthy calculations) Proof of Theorem 2. Hence, First, using again the notation Y 1,n ≤ Y 2,n ≤ . . . ≤ Y n,n for the order statistics of an i.i.d. sample of size n from a standard Pareto distribution, we obtain using (26) Here, we used that n Yn,n = d E + O p 1 n and that Ψ ξ,ρ 1 + D k = O 1 k 2 for any constant D. Furthermore, Finally, using k/(nD T ) = (1 +τ k E 1,k ) 1 ξ k − 1 (1 + O p (1/k)) and derivations as in the proof of Theorem 1, the third term in the right hand side of (34) equals The result follows from joining (34), (35), (36) and (37) and retaining terms of order O 1 k , O Proof of Theorem 3. Note that using (6) andF Y (T ) = D T F Y (T ), we obtain  (1)) and x = Yn,n Y n−k,n 1+Y n−k,n D T 1+Yn,nD T = U −1 1,k (1 + o p (1)) since kY n−k,n n = 1 + O p (1/ √ k), Y n,n /n = 1 + o p (1), nD T → 0 and Y n−k,n /Y n,n = d U 1,k , the minimum of an i.i.d. sample of size k from the uniform (0,1) distribution. This, withτ s k /ξ = 1 + o p (1), yields The result now follows from kU 1,k = d E(1 + o p (1)).

Appendix: simulation results
and corresponding MSE with p = 0.01 for the standard lognormal distribution truncated at Q Y (0.975) (top), Q Y (0.99) (middle) and non truncated (bottom).
and corresponding MSE with p = 0.005 for the standard lognormal distribution truncated at Q Y (0.975) (top), Q Y (0.99) (middle) and non truncated (bottom).