Modelling extreme claims via composite models and threshold selection methods

The existence of large and extreme claims in a non-life insurance portfolio influences the ability of (re)insurers to estimate the reserve. The excess over-threshold method provides a way to capture and model the typical behaviour of such insurance claim data. This paper discusses several composite models with commonly used bulk distributions, combined with a 2-parameter Pareto distribution above the threshold. We explore how several threshold selection methods perform when estimating the reserve, as well as the effect of the choice of bulk distribution, with varying sample size and tail properties. To investigate this, a simulation study has been performed. Our study shows that when data are sufficient, the square root rule has the overall best performance in terms of the quality of the reserve estimate. The second best is the exponentiality test, especially when the right tail of the data is extreme. As the sample size becomes small, the simultaneous estimation has the best performance. Further, the influence of the choice of bulk distribution seems to be rather large, especially when the distribution is heavy-tailed. Moreover, the study shows that the empirical estimate of $p_{\leq b}$, the probability that a claim is below the threshold, is more robust than the theoretical one.


Introduction
In non-life insurance, a few large losses from a portfolio of policies may represent a major proportion of the total cost. Such extreme events are of great interest for actuaries, as they play an important role in decision making for insurers, for instance when calculating premia, measuring tail risk and finding optimal reinsurance schemes. The classic stochastic model for the total amount the insurer pays due to claims over a defined period of time is the collective risk model, where the claim number N and the individual losses Z_i are modelled separately by the claim frequency and the claim severity distributions, respectively. The total claim loss X is then given by

$X = \sum_{i=1}^{N} Z_i.$

It is common to assume that the claim severities are identically distributed and independent of each other and of the claim number (Kaas et al., 2001; Klugman et al., 2012). In particular, the assumption that claim severities are identically distributed is unrealistic. However, the heterogeneity between the claim severities for different policies will be averaged out when the portfolio is considered as a whole. In order to estimate the total loss well, it is essential to have a good model for both moderate and large claims. Extreme value theory (EVT) and the properties of the Generalised Pareto distribution (GPD) allow us to construct a composite model which consists of two parts: a sub-threshold, or bulk, distribution and an over-threshold distribution, modelling claims below and above the threshold b, respectively (Bølviken, 2014). This method for constructing composite models is known as the excess over threshold (EOT) or peaks over threshold (POT) approach (Smith, 1989) and is commonly used within non-life insurance.
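The collective risk model above is straightforward to simulate. The following Python sketch draws the claim number from a Poisson distribution and sums i.i.d. severities; the log-normal severity parameters are hypothetical choices for illustration, not the ones used in the paper.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def simulate_total_loss(lam, severity_sampler, n_sims=10_000):
    """Simulate the collective risk model X = Z_1 + ... + Z_N,
    with N ~ Poisson(lam) and i.i.d. severities Z_i."""
    totals = np.empty(n_sims)
    for i in range(n_sims):
        n_claims = rng.poisson(lam)
        totals[i] = severity_sampler(n_claims).sum() if n_claims > 0 else 0.0
    return totals

# Illustrative severity: log-normal with mean roughly 10 (hypothetical parameters).
severity = lambda n: rng.lognormal(mean=2.0, sigma=0.8, size=n)
x = simulate_total_loss(50, severity, n_sims=2000)
```

The empirical quantiles of `x` then approximate the reserve at the corresponding solvency level.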
One of the main concerns of EOT is the selection of an appropriate threshold for a given data set. Many approaches for threshold estimation in extreme value applications have been developed. The traditional fixed threshold estimation method uses graphical diagnostics to make a choice of threshold visually. The mean residual life (or mean excess) plot (Davison and Smith, 1990), which is based on the mean excess function, is a quite common tool in the field of insurance. Cebrián et al. (2003) use a sequential Mann-Kendall test, known as the Gertensgarbe plot, proposed by Gerstengarbe and Werner (1989). Other commonly used graphical diagnostics can be found in Coles (2001), including threshold stability, quantile and return level plots.
A number of heuristic methods for threshold selection have also been proposed, including the upper 10% rule of DuMouchel (1983), the square root rule (Ferreira et al., 2003) and the empirical rule (Loretan and Phillips, 1994), all outlined in Scarrott and Macdonald (2012). These approaches have no theoretical justification, but are easy to implement and frequently used in practice.
Another alternative is automatic threshold selection with the aim of balancing bias and variance. The most common methods of this type are based on minimisation of some kind of mean squared error (MSE) estimate, and are discussed in Caeiro and Gomes (2015). Hall (1990) and Gomes and Oliveira (2001) use the bootstrap for threshold selection, whereas Guillou and Hall (2001) and Drees and Kaufmann (1998) propose a bias reduction procedure for choosing the optimal threshold.
In the above-mentioned methods, the threshold is selected in a first step. Then the parameters of the distributions above and below the threshold are estimated keeping the threshold fixed. Cooray and Ananda (2005) propose a composite model with a corresponding estimation method that allows for a simultaneous estimation of the threshold and the other model parameters. Their model is generalised by Scollnik (2007). Yet another approach is to use a threshold mixing model, in order to take the threshold uncertainty into account, as in Tancredi et al. (2006). Further, Pigeon and Denuit (2011) develop the composite model of Cooray and Ananda (2005) to allow for heterogeneity of the threshold and let it vary among the data. Alternatively, Wong and Li (2010) propose a composite model in which the threshold is determined within the model itself, and the parameters are estimated by the method of maximum product of spacings (MPS).
The question is what the optimal threshold selection method for non-life insurance claim data is. In particular, since limited data are common within certain business lines and for young companies, the increasing parameter uncertainty with a decreasing amount of available data should be considered. The crucial point here is not the error in the estimated parameters themselves, but rather how different threshold selection methods affect the measure of primary interest, namely the claim reserve, and how they perform as the sample size decreases. Most of the literature on threshold selection focuses on the fit of the distribution above the threshold. As the reserve also depends on the distribution below the threshold, the so-called bulk distribution, we also want to investigate how much the estimation of the reserve is affected by the choice of bulk distribution. In order to investigate this, we will perform a large simulation study.
The rest of the paper is organised as follows. Section 2 gives an overview of the considered composite models. Then, the threshold selection methods compared in the simulation study are presented in Section 3. Further, the simulation study and its results are given in Section 4. In Section 5, we apply the methods from Section 3 to a real data set. Finally, Section 6 provides a discussion and some concluding remarks.

Composite models
The usual way of modelling claim severity data that contain extreme values is to use a composite distribution, where claim sizes below a certain threshold b are assumed to be ''normal'', and hence follow one of the common claim size distributions, right truncated at b, while claim sizes above the threshold are assumed to be extreme. Let f_1 and f_2 be two probability density functions (pdfs) concentrated on [0, ∞), and let b > 0. Moreover, let F_1 and F_2 denote the corresponding cumulative distribution functions (cdfs). We then define $c_1 = F_1(b)$ and $c_2 = F_2(b)$. Then let r ∈ [0, 1], and define a new pdf f as follows,

$f(z) = r \, \frac{f_1(z)}{c_1} \, I(z \le b) + (1-r) \, \frac{f_2(z)}{1-c_2} \, I(z > b).$   (2.1)

It is easy to verify that f is a legitimate pdf by checking that the integral of f is 1.
Assume in particular that f_2(z) = 0 for z < b, so that c_2 = F_2(b) = 0. In this case (2.1) can be written as

$f(z) = r \, \frac{f_1(z)}{F_1(b)} \, I(z \le b) + (1-r) \, f_2(z) \, I(z > b).$   (2.2)

Moreover, we assume that r = c_1 = F_1(b). Then (2.2) can be written as

$f(z) = f_1(z) \, I(z \le b) + (1 - F_1(b)) \, f_2(z) \, I(z > b).$   (2.3)

Let f_1 = f_{≤b} and f_2 = f_{>b} be the pdfs of the distributions below and above the threshold, and let f_{≤b|z≤b} and f_{>b|z>b} be the corresponding conditional pdfs of the distributions below and above the threshold, conditioning on Z ≤ b and Z > b, respectively.
Then the pdf of the composite model according to (2.3) is given by

$f(z) = f_{\le b}(z) \, I(z \le b) + (1 - F_{\le b}(b)) \, f_{>b|z>b}(z) \, I(z > b).$   (2.4)

Correspondingly, the cdf of the composite model can be written as

$F(z) = F_{\le b}(z) \, I(z \le b) + \left[ F_{\le b}(b) + (1 - F_{\le b}(b)) \, F_{>b|z>b}(z) \right] I(z > b),$   (2.5)

where F_{≤b} and F_{>b|z>b} are the cdfs corresponding to f_{≤b} and f_{>b|z>b}, respectively. Pickands (1975) and Balkema and de Haan (1974) have shown that f_{>b|z>b}, for the shifted variable [Z − b | Z > b], may be approximated by a GPD for sufficiently large b. We restrict our attention to the Pareto type II distribution, which is a special case of the GPD, and in practice is very commonly used for extreme insurance claims. This means that

$f_{>b|z>b}(z) = \frac{\alpha}{\beta} \left( 1 + \frac{z-b}{\beta} \right)^{-(\alpha+1)}, \quad z > b,$   (2.6)

where α is a shape parameter and β is a scale parameter.
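As an illustration, the spliced density (2.4) with a log-normal bulk and a Pareto type II (Lomax) tail can be written down directly. The parameter values below are arbitrary choices for illustration, not the ones used in the paper.

```python
import numpy as np
from scipy import stats

def composite_pdf(z, b, bulk, alpha, beta):
    """Density (2.4): bulk density f_<=b below the threshold b, and the
    Pareto type II density of the excess z - b above it, weighted by the
    tail mass 1 - F_<=b(b). `bulk` is a frozen scipy.stats distribution."""
    z = np.asarray(z, dtype=float)
    below = bulk.pdf(z) * (z <= b)
    tail_weight = 1.0 - bulk.cdf(b)
    excess = np.maximum(z - b, 0.0)  # avoid negative bases for z <= b
    above = tail_weight * (alpha / beta) * (1.0 + excess / beta) ** (-(alpha + 1.0)) * (z > b)
    return below + above

# Arbitrary illustrative bulk: log-normal, with the threshold at its 95% quantile.
bulk = stats.lognorm(s=0.8, scale=np.exp(2.0))
b = bulk.ppf(0.95)
```

By construction the two pieces integrate to $F_{\le b}(b)$ and $1 - F_{\le b}(b)$ respectively, so the total mass is 1.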
The bulk distribution is given by

$f_{\le b | z \le b}(z) = \frac{f_{\le b}(z)}{F_{\le b}(b)}, \quad 0 \le z \le b,$   (2.7)

where f_{≤b} may be any suitable distribution for small and moderate claim sizes, for instance the Gamma, Weibull, Log-normal or Log-gamma, which are the ones we have used in the simulation study.
As mentioned earlier, this model is usually fitted by first selecting the threshold b with some method. Then the data are divided according to b, fitting the bulk distribution (2.7) to the data below b and the Pareto distribution (2.6) to the data above b. We have chosen to do this with maximum likelihood estimation.
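A minimal sketch of this two-step fit, assuming a log-normal bulk (one of the four candidates) and using scipy's Lomax parametrisation of the Pareto type II for the excesses; the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats

def fit_composite(z, b):
    """Two-step fit given a threshold b: log-normal bulk fitted by MLE to the
    claims below b, Pareto type II (Lomax) fitted by MLE to the excesses above b."""
    z = np.asarray(z, dtype=float)
    s, _, scale = stats.lognorm.fit(z[z <= b], floc=0)        # bulk parameters
    alpha, _, beta = stats.lomax.fit(z[z > b] - b, floc=0)    # tail shape and scale
    return (s, scale), (alpha, beta)

# Illustrative data: log-normal body plus a Lomax tail above b = 30 (hypothetical).
rng = np.random.default_rng(0)
z = np.concatenate([
    rng.lognormal(2.0, 0.8, size=4750),
    30.0 + stats.lomax.rvs(c=3.0, scale=50.0, size=250, random_state=0),
])
(bulk_s, bulk_scale), (alpha_hat, beta_hat) = fit_composite(z, b=30.0)
```

In the simulation study this fit would be repeated for each candidate bulk family, with the threshold supplied by one of the selection methods of Section 3.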
As an alternative, we consider the threshold mixing model of Scollnik (2007). This composite model consists of a truncated log-normal bulk distribution mixed with a Pareto type II distribution. The mixing is done in such a way that the resulting model (2.1) is continuous at the threshold b, which is generally not the case for the model (2.4). More specifically, the mixing weight r is determined by the continuity condition at b and depends on µ and σ, the parameters of the log-normal distribution. The resulting model has a shape similar to the log-normal distribution, but with a thicker tail, which in general gives a better fit for larger losses in general insurance. The authors have proposed a method for estimating the parameters b and α with maximum likelihood, which we use in the simulation study. This model seems more advantageous than the model (2.4), since all the parameters are estimated simultaneously given the data.
There are other threshold mixing models that might have been considered, among others the ones of Tancredi et al. (2006) or Pigeon and Denuit (2011). These are more flexible than (2.4), but also more cumbersome to fit.

Threshold selection methods
Choosing a plausible threshold for real-life data is a classic tradeoff between bias and variance. Too large a threshold yields few exceedances and consequently a larger uncertainty in the parameters of the GPD. On the other hand, if the threshold is too small, the GPD is no longer adequate for all the exceedances, which results in a larger bias in quantile estimates. As mentioned earlier, a large number of threshold selection approaches have been proposed. As a large simulation study requires the threshold selection to be automatic, methods that involve visual inspection, such as the mean excess plot, are not included. Further, we have chosen to focus on methods that are easy to use, computationally not too demanding, and representative of different types of approaches.

Heuristic methods
The heuristic methods are essentially rules of thumb. They have the advantage of being very easy to implement, which is probably why they are so frequently used in practice by insurance companies. Let z_(1) ≤ · · · ≤ z_(n) be the data sorted in ascending order and let k be some real number in the interval [0, n]. The methods included in the study all estimate the threshold b by

$\hat b = z_{([n-k])},$

where [n − k] denotes the integer among 1, . . . , n that is closest to n − k. The fixed quantile rule estimates the threshold by the (1 − ϵ) · 100% empirical quantile of the data for some fixed level ϵ, which means that k = ϵn. DuMouchel (1983) proposed to use ϵ = 0.1, but ϵ = 0.05 is more appropriate in our setting and is therefore used in the study. The square root rule, suggested by Ferreira et al. (2003), is to let k = √n. Finally, the empirical rule of Loretan and Phillips (1994) consists in letting k = n^{2/3} / log(log(n)).
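The three rules above can be sketched in a few lines; this is a direct reading of the definitions, with the index clamped to stay within 1, . . . , n.

```python
import numpy as np

def heuristic_threshold(z, rule="sqrt", eps=0.05):
    """Heuristic threshold estimates b_hat = z_([n-k]) for the three rules
    described above (fixed quantile, square root, empirical)."""
    z = np.sort(np.asarray(z, dtype=float))
    n = len(z)
    if rule == "fixed":        # fixed quantile rule: k = eps * n
        k = eps * n
    elif rule == "sqrt":       # square root rule: k = sqrt(n)
        k = np.sqrt(n)
    elif rule == "empirical":  # empirical rule: k = n^(2/3) / log(log(n))
        k = n ** (2 / 3) / np.log(np.log(n))
    else:
        raise ValueError(rule)
    idx = int(round(n - k))
    idx = min(max(idx, 1), n)  # keep the index within 1..n
    return z[idx - 1]          # z_(idx) in 1-based order statistics
```

Note that k, and hence the number of exceedances, depends only on n, which is the main weakness of these rules discussed in Section 4.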

Minimum AMSE of the Hill estimator
This method is based on the Hill estimator of the tail index ξ = α^{−1}, where α is the shape parameter of the Pareto distribution (see Section 2). Using the fact that if Z is Pareto distributed when Z > b, then ln(Z) − ln(b) follows the exponential distribution with mean ξ, Hill (1975) proposed the estimator

$\hat\xi_{k,n} = \frac{1}{k} \sum_{i=1}^{k} \left[ \ln z_{(n-i+1)} - \ln z_{(n-k)} \right],$

where z_{(n−k)} is an estimate of the threshold b and k is now an integer. The idea of Caeiro and Gomes (2015) is to let $\hat b = z_{(n-\hat k_0)}$, where $\hat k_0$ is an estimate of the value k_0 of k that minimises the asymptotic mean squared error (AMSE) of $\hat\xi_{k,n}$, i.e. the sum of the squared bias and the variance of the asymptotic distribution of $\hat\xi_{k,n}$. The purpose is precisely to find a threshold that is a tradeoff between bias and variance. It may be shown that under certain regularity conditions (see Caeiro and Gomes, 2015 for details) the AMSE is minimised for

$k_0 = \left\lfloor \left( \frac{(1-\rho)^2 \, n^{-2\rho}}{-2\rho \, \lambda^2} \right)^{1/(1-2\rho)} \right\rfloor + 1,$

⌊x⌋ denoting the integer part of x. Here, ρ and λ are second-order parameters that may be estimated as described in Caeiro and Gomes (2015), resulting in the estimate $\hat k_0$. This method is also available in the R package tea.
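The Hill estimator itself is simple to implement; the sketch below checks it on an exact Pareto sample (the AMSE-minimising choice of k, which requires the second-order parameter estimates, is left to the tea package).

```python
import numpy as np

def hill_estimator(z, k):
    """Hill estimator of the tail index xi = 1/alpha: the average of
    ln z_(n-i+1) - ln z_(n-k) over the k largest order statistics."""
    z = np.sort(np.asarray(z, dtype=float))
    n = len(z)
    log_b = np.log(z[n - k - 1])       # z_(n-k) plays the role of the threshold
    return np.mean(np.log(z[n - k:]) - log_b)

# Sanity check on an exact Pareto sample with alpha = 2, i.e. xi = 0.5:
rng = np.random.default_rng(4)
z = (1.0 - rng.uniform(size=100_000)) ** (-1.0 / 2.0)
xi_hat = hill_estimator(z, k=1000)
```

For exact Pareto data the estimator is consistent; the bias discussed in the text only appears when the Pareto model holds approximately above some unknown threshold.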

Exponentiality test
As mentioned above, the log-differences ln(z_(n−i+1)) − ln(z_(n−k)), i = 1, . . . , k, come (approximately) from an exponential distribution with mean ξ if k is large enough. Based on this, Hill (1975) proposed to choose k as the minimum value for which the hypothesis of exponentiality is not rejected. Since that tends to result in a too large k, Guillou and Hall (2001) have suggested a modification that reduces this bias; more specifically, their procedure is based on a bias-corrected test statistic for exponentiality of the log-differences, and selects the smallest k for which the test does not reject. This method is also available in the tea package.

Gertensgarbe plot
The Gertensgarbe plot (Gerstengarbe and Werner, 1989) is inspired by the non-parametric Mann-Kendall test for monotonic trends, applied to the series of differences ∆_i = z_(i+1) − z_(i) of the sorted data. The basic idea is that it is reasonable to expect the behaviour of the ∆_i's of a given dataset to be different between the extreme and the normal observations. Hence, there should be a change point in the series of ∆_i's, and this change point is considered as the starting point of the extreme region, and is therefore the estimate of the threshold b. To identify the change point, the test statistic of the sequential Mann-Kendall test is computed both for the ∆_i's from i = 1 to i = n − 1, giving a normalised progressive series U_i, and for the differences in the reverse order, giving a retrograde series Ũ_i. Then, the intersection point between the U_i's and the Ũ_i's is the estimate b̂. This method is for instance available in the R package tea.
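A sketch of this procedure, using the standard normalisation of the sequential Mann-Kendall statistic (the exact normalisation of Gerstengarbe and Werner (1989) may differ slightly; this is an illustrative assumption).

```python
import numpy as np

def _progressive_u(d):
    """Normalised progressive series of the sequential Mann-Kendall test,
    built from the rank counts of each difference among its predecessors."""
    m = len(d)
    u = np.zeros(m)
    s = 0.0
    for i in range(1, m):
        s += np.sum(d[:i] < d[i])                   # rank count of d_i
        t = i + 1                                   # length of the prefix
        mean = t * (t - 1) / 4.0
        var = t * (t - 1) * (2 * t + 5) / 72.0
        u[i] = (s - mean) / np.sqrt(var)
    return u

def gertensgarbe_threshold(z):
    """Estimate the threshold as the crossing of the progressive and
    retrograde Mann-Kendall series of the sorted-sample differences."""
    z = np.sort(np.asarray(z, dtype=float))
    d = np.diff(z)
    u = _progressive_u(d)
    u_r = _progressive_u(d[::-1])[::-1]
    cross = np.argmin(np.abs(u - u_r))              # index closest to the intersection
    return z[cross + 1]

# Illustrative use on hypothetical log-normal data:
rng = np.random.default_rng(5)
z_sample = rng.lognormal(2.0, 0.8, size=500)
b_hat = gertensgarbe_threshold(z_sample)
```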

Simultaneous estimation
The next method we consider in the study is the simultaneous estimation of the threshold and the other model parameters, assuming the composite model of Scollnik (2007) (see Section 2). The main differences between this approach and the others are that all the model parameters are estimated in one step instead of several, and that the bulk distribution is fixed as the log-normal.

Maximum product of spacings

Wong and Li (2010) proposed to use the method of maximum product of spacings (MPS) (Cheng and Amin, 1983) to estimate the threshold and the parameters in the EOT model. As an alternative to maximum likelihood estimation (MLE), the MPS estimators of the parameters ζ in the EOT model are found by maximising

$M(\zeta) = \frac{1}{n+1} \sum_{i=1}^{n+1} \ln \left[ F(z_{(i)}) - F(z_{(i-1)}) \right],$   (3.1)

where F(z) is the cdf of the EOT model, given in (2.5), and F(z_(0)) = 0 and F(z_(n+1)) = 1. The idea of Wong and Li (2010) is to maximise (3.1) for each value of b = z_(n−k). The estimate b̂ and the estimates of the remaining model parameters are the ones that maximise (3.1).
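The objective (3.1) is easy to evaluate for any candidate model cdf; a minimal sketch (the clipping guards against zero spacings caused by ties, a practical assumption on our part).

```python
import numpy as np

def mps_objective(sorted_z, cdf):
    """Mean log-spacing M(zeta) of (3.1) for a candidate model cdf:
    average of ln[F(z_(i)) - F(z_(i-1))] with F(z_(0)) = 0, F(z_(n+1)) = 1."""
    f = np.concatenate(([0.0], cdf(sorted_z), [1.0]))
    spacings = np.clip(np.diff(f), 1e-300, None)   # guard against ties
    return np.mean(np.log(spacings))
```

In the method of Wong and Li (2010), this objective would then be maximised over the model parameters for each candidate threshold b = z_(n−k), keeping the overall maximiser. The maximum of (3.1) is attained when the cdf values are equally spaced, which is what the test below checks.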

Simulation study
To investigate what the optimal threshold selection methods for non-life insurance data are and what the impact of the bulk distribution is, we have performed a simulation study where the sample size and tail properties of the bulk distribution are varied. Section 4.1 gives the parameter settings for this study and the results are presented in Section 4.2.

Table 1: Parameter values for the composite claim severity distributions used in the simulation study.

Parameter settings
Tables 1 and 2 present the choice of parameter values for each of the four bulk distributions, with corresponding standard deviations and 95%, 99% and 99.5% reserves (ϵ = 0.05, 0.01 and 0.005, respectively). The corresponding non-truncated distributions have a mean of (approximately) 10 and tails varying from moderate to quite heavy. Similarly, the parameters of the Pareto distribution are set so that the mean claim size above the threshold is 50 + b and the standard deviation is 118.3, which distinguishes it from the body of the observations.
As the claim sizes are assumed to be i.i.d., the number N_{≤b} of claims below the threshold, given that the total number of claims is N = n_total, follows a binomial distribution with parameters (n_total, p_{≤b}), where $p_{\le b} = P(Z \le b) = F_{\le b}(b; \theta)$ is the probability that a claim is below the threshold, θ being the parameter vector of the bulk distribution. The probability p_{≤b} can be estimated either empirically or parametrically. The empirical estimator is simply the observed proportion of claims below the estimated threshold b̂, i.e. $\hat p_{emp} = n_{\le b}/n_{total}$, with $n_{\le b} = \sum_{i=1}^{n_{total}} I(z_i \le \hat b)$, and the parametric one is $\hat p_{the} = F_{\le b}(\hat b; \hat\theta)$, θ̂ being the estimate of θ.
In this simulation study, both estimators are applied. Further, claim sizes below the threshold are simulated from the truncated bulk distribution f_{≤b|z≤b} with the inversion method, i.e. by first sampling a uniform number U* and then transforming it with the inverse cdf of the truncated distribution, $F_{\le b}^{-1}(U^* F_{\le b}(b))$, which is much more efficient than for instance rejection sampling.
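The inversion step can be sketched as follows; the log-normal bulk and its parameters are illustrative choices, not the study's settings.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

def sample_truncated_bulk(n, bulk, b):
    """Inversion sampling from the bulk distribution right-truncated at b:
    draw U* ~ Uniform(0,1) and return F^{-1}(U* F(b)), so every draw is <= b."""
    u = rng.uniform(size=n)
    return bulk.ppf(u * bulk.cdf(b))

# Illustrative bulk (hypothetical parameters): log-normal, truncated at its 92% quantile.
bulk = stats.lognorm(s=0.8, scale=np.exp(2.0))
b = bulk.ppf(0.92)
x = sample_truncated_bulk(5000, bulk, b)
```

Every sample is accepted, whereas rejection sampling from the untruncated bulk would discard a fraction $1 - F_{\le b}(b)$ of the draws.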
The number of simulations in each experiment is N = 1000, and each experiment is performed as follows. One of the 4 distributions is chosen to be the true bulk distribution, and the true threshold b is set as the (1 − γ) · 100% quantile of this distribution, where γ is chosen as one of the values 0.08, 0.06, 0.04, 0.02. A sample z_1, . . . , z_n of size n is drawn from the composite distribution, where n is varied between n = 5000, 500, 50, representing a large, medium and small sample size, respectively. Subsequently, the threshold and the remaining model parameters are estimated using each of the eight methods described in Section 3, assuming each of the 4 bulk distributions (except for the simultaneous estimation). This results in the estimates b̂_1, . . . , b̂_8 and θ̂_1, . . . , θ̂_29, where for simplicity, M_1 to M_8 stand for the fixed quantile rule, the square root rule and the empirical rule (the three heuristic methods), the minimum AMSE of the Hill estimator, the exponentiality test, the Gertensgarbe plot, the simultaneous estimation and the MPS method from Section 3, respectively.

Table 3: Bias in the reserve estimates when ϵ = 0.01, n = 5000, λ = 50, the true threshold is chosen as the 92% quantile and p_{≤b} is estimated empirically.
As mentioned earlier, we only consider the classic Poisson distribution for the claim frequency of the portfolio, and the expected number of occurrences λ is set to 50 or 500. Finally, we estimate the reserve q_ϵ, for ϵ = 0.01 and also 0.005 and 0.05, with p_{≤b} estimated both empirically and parametrically. This is done with Monte Carlo, as described in Algorithm 1, using m = 1 000 000 simulations. This results in the estimates q̂_{ϵ,ij}, i = 1, . . . , N, j = 1, . . . , 29. The quality of these estimates is then evaluated by the finite sample bias

$b_j = \frac{1}{N} \sum_{i=1}^{N} \left( \hat q_{\epsilon,ij} - q_\epsilon \right),$

and by the corresponding root mean squared error (RMSE).
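A sketch in the spirit of Algorithm 1 (the exact algorithm is given in the paper); the bulk and tail samplers below are hypothetical stand-ins, and m is reduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

def mc_reserve(eps, lam, p_below, bulk_sampler, tail_sampler, m=10_000):
    """Monte Carlo estimate of the reserve q_eps: simulate m total losses,
    drawing N ~ Poisson(lam) and splitting the claims below/above the
    threshold with probability p_below, then return the empirical
    (1 - eps)-quantile of the simulated totals."""
    totals = np.empty(m)
    for i in range(m):
        n = rng.poisson(lam)
        n_below = rng.binomial(n, p_below)
        totals[i] = bulk_sampler(n_below).sum() + tail_sampler(n - n_below).sum()
    return np.quantile(totals, 1.0 - eps)

# Hypothetical samplers, for illustration only:
bulk = lambda k: rng.gamma(2.0, 5.0, size=k)            # moderate claims
tail = lambda k: 30.0 + rng.pareto(3.0, size=k) * 50.0  # claims above b = 30
q99 = mc_reserve(0.01, lam=50, p_below=0.95, bulk_sampler=bulk, tail_sampler=tail)
```

In the study, the samplers would be the fitted truncated bulk distribution and the fitted Pareto tail, and p_below would be one of the two estimates of p_{≤b}.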

Results
Tables 3-9 show the bias of the reserve estimates when ϵ = 0.01, λ = 50 and the probability p_{≤b} is estimated empirically, with varying sample size and threshold values. As the sample size changes, the performance of the different threshold selection methods varies a lot. Methods M_1 to M_7 are first examined with n = 5000. It can be seen that the empirical rule (M_3) has the overall best performance, except when the true threshold is the 98% quantile. The second best methods are either the exponentiality test (M_5) or the square root rule (M_2), where M_5 works slightly better when the right tail of the data becomes more extreme. However, the performance of the fixed quantile rule (M_1) is not as good as that of the other two heuristic methods, especially when the true threshold value is large. The problem with the heuristic methods, as explained in Section 3, is that the choice of k only depends on the sample size. Therefore, with different threshold values, one would expect the performance of the heuristic methods to vary a lot. However, that does not seem to be the case for the square root rule. Moreover, it is somewhat surprising that the minimum AMSE of the Hill estimator (M_4), which is constructed based on the asymptotic properties of the Hill estimator, has a poor performance when the sample size is large. This counterintuitive result may be due to the fact that M_4 underestimates the threshold value in most cases. When the threshold is underestimated, the assumption of a Pareto distribution above the threshold might be violated, which increases the model error. Further, we see that the choice of the bulk distribution makes a difference for the estimation of the reserves.

Table 5: Bias in the reserve estimates when ϵ = 0.01, n = 5000, λ = 50, the true threshold is chosen as the 96% quantile and p_{≤b} is estimated empirically.
Generally, when the data come from a heavy-tailed distribution, for instance the Log-normal or the Log-gamma, the choice of the bulk distribution given the estimated threshold matters a lot, in the sense that the differences in the bias estimates among bulk distributions vary greatly. The effect is further amplified when the threshold is estimated with M_4, M_5 or the Gertensgarbe plot (M_6). This result may be explained by the fact that there is a large difference between the distributions below and above the threshold in the study, which enlarges the impact of the bulk distribution. In addition, the influence also depends on the threshold selection method. For instance, there are no great differences among the bulk distributions when the heuristic methods (M_1, M_2 and M_3) are used.
As the sample size decreases, the performance of each method becomes comparatively worse due to fewer exceedances above the estimated threshold, which leads to large estimation errors and consequently unrealistic reserve estimates, regardless of the type of data and the tail properties. Instead, with limited data, one could resort to the simultaneous estimation (M_7), since it performs best among all the methods, though its performance is unsatisfactory when the data are sufficient. There are two likely causes for the relatively poor performance of M_7 when the sample size is large. One is that M_7 assumes a continuous distribution at the threshold, which is certainly not the case in the study. The other explanation might be the assumption of a fixed family of bulk distributions, which makes M_7 less flexible than the other methods. The MPS method (M_8) does not perform well in estimating the reserves when the sample size is moderate or small, though the corresponding threshold estimates seem to be better than those of the other methods (see the supplementary material). The reason for the extremely large bias is that the estimated shape parameter of the Pareto distribution is close to zero when the data are limited, which leads to a very heavy tail. This may be due to the fact that M_8 is quite sensitive to the initial values of the parameters in the optimisation. It may therefore be a good idea to try different start values. Moreover, due to the increase in estimation uncertainty, the influence of the bulk distribution seems to be smaller when the sample sizes are medium and small. The patterns in the RMSE estimates are similar to those for the bias estimates, and can be found in the supplementary material.
Corresponding results for ϵ = 0.005 are shown in Tables 10-13. Since the pattern is similar to that for ϵ = 0.01, the simulations for n = 500, 50 are not shown (see the supplementary material). It now appears that M_2 performs best among all methods regardless of the true threshold level. Method M_4 has a poor performance, as for ϵ = 0.01, but it performs better.

Bias in the reserve estimates when ϵ = 0.01, n = 5000, λ = 50, the true threshold is chosen as the 98% quantile and p_{≤b} is estimated empirically.

Bias in the reserve estimates when ϵ = 0.01, n = 500, λ = 50, the true threshold is chosen as the 92% quantile and p_{≤b} is estimated empirically.
Due to the larger estimation error with smaller ϵ, the bias of the reserve estimates becomes relatively larger than for ϵ = 0.01. The influence of the bulk distribution, however, follows a similar pattern as for ϵ = 0.01. Further, it seems that the variation between the estimates from different bulk distributions tends to be larger for data from a moderate-tailed distribution. The results for ϵ = 0.05 are not shown here, because they are similar to the ones for ϵ = 0.01.

As the expected claim number λ increases, the total loss becomes a sum of a larger number of independent, identically distributed random variables. Its distribution should therefore become more and more similar to the normal distribution. In light of that, one would expect the location of the threshold, and in particular the choice of bulk distribution, to matter less than for smaller λ. However, the results are similar to the ones for λ = 50, and the influence of the bulk distribution does not seem smaller. The results are therefore not displayed here. A possible explanation might be that the total loss distribution is not yet close to normal because of the heavy tail of the claim size distribution, which makes the convergence slow. This finding also indicates that the claim size distribution has more influence on the reserve estimation than the claim frequency distribution.

When the probability p_{≤b} is estimated theoretically (p̂_the), the corresponding results for ϵ = 0.01, λ = 50 and n = 5000 are shown in Tables 14-15. Since the results are similar when the data become more extreme, only the threshold values equal to the 92% and the 98% quantiles are shown here. The larger bias estimates indicate that there is a big difference between p̂_emp and p̂_the, and that p̂_emp gives a better performance than p̂_the when data are sufficient. As n becomes smaller, p̂_emp tends to overestimate the exceedances above the threshold, which results in larger reserves.
One would expect the performance with p̂_the to be better, but that does not seem to be the case; the performance is worse than with p̂_emp. This may be explained by the larger estimation error with a limited sample: p̂_the is then close to zero and the corresponding reserve becomes much smaller than the true one. Further, the choice of bulk distribution influences the bias estimates more than with p̂_emp, even when data are sufficient. This makes sense, because there are larger variations between the p̂_the's given different bulk distributions.

Bias in the reserve estimates when ϵ = 0.01, n = 500, λ = 50, the true threshold is chosen as the 98% quantile and p_{≤b} is estimated empirically.

Table 9: Bias (×10^4) in the reserve estimates when ϵ = 0.01, n = 50, λ = 50, the true threshold is chosen as the 92% quantile and p_{≤b} is estimated empirically.

Real data example
In this section, we apply the composite models and the threshold selection methods considered in the simulation study to the Danish fire claim data. The data have been widely discussed in the actuarial literature, by Scollnik (2007) and Cooray and Ananda (2005), among others. The data set consists of 2492 fire insurance losses in millions of Danish Kroner (DKK) from the years 1980 to 1990 inclusive. The data have a minimum value of 0.31 and a maximum value of 263.30. The mean and standard deviation are 3.06 and 7.98, respectively. The claim frequency distribution is set as a Poisson distribution with λ = 227, which is the average number of claims per year in the historical data. Table 16 gives the parameter estimates for 4 composite models based on different threshold estimates. Note that for M_8, as the optimal threshold value is nearly the same for different bulk distributions, the results are presented only for the Lognormal-Pareto distribution as an illustration. Moreover, the estimated reserves based on the parameter estimates are shown in Table 17.
In general, the differences in estimated thresholds from the eight methods are quite large. We see that the minimum AMSE of the Hill estimator (M_4), the Gertensgarbe plot (M_6), the simultaneous estimation (M_7) and the MPS method (M_8) select a lower threshold than the others. In particular, M_8 gives the smallest threshold value, which leaves only 26 observations below the threshold. This result does not make sense and might be due to the increasing value of M(ζ) in (3.1) for each b = z_(n−k), where k = 1, . . . , 2491. In this case, the maximal M(ζ) is reached for a very large k, and a correspondingly small b̂ is selected. However, Wong and Li (2010) suggest selecting k from 1 to [n/4], such that the GPD assumption is preserved for the empirical upper quantiles (Pickands, 1975). If the optimisation is restricted to this interval, the estimate of b is exactly z_([n−n/4]) = 2.64, which is a more reasonable estimate.
Compared to the simulation results, one interesting finding is that the impact of the bulk distribution is minor for each threshold selection method at the different solvency levels. As mentioned in Section 4, the large difference between the distributions below and above the threshold might be the reason why the bulk distribution influences the reserve estimates a lot in the simulations. The results here are therefore likely to be related to a smaller difference between the bulk distribution and the Pareto, though the ground truth is unknown. Further, the results are similar to the ones from the simulation study, and might indicate that the reserves from the exponentiality test (M_5) and the square root rule (M_2) are the most trustworthy.

Bias in the reserve estimates when ϵ = 0.005, n = 5000, λ = 50, the true threshold is chosen as the 92% quantile and p_{≤b} is estimated empirically.

Bias in the reserve estimates when ϵ = 0.005, n = 5000, λ = 50, the true threshold is chosen as the 94% quantile and p_{≤b} is estimated empirically.

Concluding remarks
The existence of large and extreme claims of a non-life insurance portfolio influences the ability of (re)insurers to estimate the reserve. The excess over-threshold method provides a way to capture and model the typical behaviour of insurance claim data. This paper discusses several composite models with the commonly used Gamma, Log-normal, Weibull and Log-gamma as bulk distributions, combined with a 2-parameter Pareto distribution above the threshold. We were interested in how the candidate threshold selection methods perform when estimating the reserve and the influence of the bulk distribution, with varying sample size and tail properties. To investigate this, we performed a simulation study.
Our study shows that when data are sufficient, the empirical rule (M_3) has the overall best performance in reserve estimation among all approaches. The second best are either the square root rule (M_2) or the exponentiality test (M_5); the latter works better when the right tail of the data is extreme. The remaining heuristic method, the fixed quantile rule (M_1), though lacking theoretical background, is comparatively better than the remaining approaches. Moreover, the influence of the choice of bulk distribution on the reserve is large when the distribution is heavy-tailed. The effect is further amplified when the threshold is estimated with the minimum AMSE of the Hill estimator (M_4), the exponentiality test (M_5) or the Gertensgarbe plot (M_6). Thus, when these methods are used, the choice of bulk distribution should be carefully considered.

Table 12
Bias in the reserve estimates when ϵ = 0.005, n = 5000, λ = 50, the true threshold is chosen as the 96% quantile and p is estimated empirically.

As the sample size decreases, the simultaneous estimation (M_7) works best, while its performance is unsatisfactory when the data are sufficient. One possible explanation for its poor performance with a large sample is that M_7 assumes a continuous distribution at the threshold, which is certainly not the case in the study. The other might be the assumption of a fixed family of bulk distributions, which makes M_7 less flexible than the other methods. The MPS method (M_8) estimates the threshold better than the other methods; however, the reserve estimates are unsatisfactory with medium and small sample sizes. The small value of the estimated shape parameter of the Pareto distribution leads to an extremely heavy tail and a correspondingly overestimated reserve. Furthermore, M_8 is sensitive to the initial values of the parameters in the optimisation, so one should be careful and try different initial values when using the MPS method.
Due to the increase in estimation uncertainty, the influence of the bulk distribution seems to be smaller when the sample sizes are medium and small. Furthermore, the effect of the bulk distribution seems larger when the probability p_{≤b} is estimated theoretically (p̂_the) rather than empirically (p̂_emp). This makes sense, because there are larger variations between the p̂_the's given different bulk distributions. Despite the fact that p̂_emp tends to overestimate the exceedances above the threshold when the sample size is small, the overall performance with p̂_emp is better than that with p̂_the. This finding may indicate that the empirical estimate of p_{≤b} is more robust than the theoretical one.
In this study, we have restricted our attention to the choice of claim severity distributions, fixing the claim frequency distribution at the Poisson distribution. The reserve estimates can however easily be modified to account for other claim frequency distributions, such as the negative binomial. Further, it would be interesting to exploit more flexible simultaneous estimation by varying the bulk distribution and see whether the results improve, especially when the sample size is limited.

Table 14
Bias in the reserve estimates when ϵ = 0.01, n = 5000, λ = 50, the true threshold is chosen as the 92% quantile and p_{≤b} is estimated theoretically.

Table 15

Bias in the reserve estimates when ϵ = 0.01, n = 5000, λ = 50, the true threshold is chosen as the 98% quantile and p_{≤b} is estimated theoretically.