Modeling Data with Extreme Values Using Three-Spliced Distributions

When data exhibit a high frequency of small to medium values and a low frequency of large values, fitting a classical distribution may fail. This is why spliced models, defined from different distributions on distinct intervals, have been proposed in the literature. In contrast to the intensive study of two-spliced distributions, the case with more than two components is scarcely approached. In this paper, we focus on three-spliced distributions and on their ability to improve the modeling of extreme data. For this purpose, we consider a popular insurance data set related to Danish fire losses, to which we fit several three-spliced distributions; moreover, the results are compared with the best-fitted two-spliced distributions from previous studies.


Introduction
Data coming from fields like insurance or finance often exhibit a particular behavior: there are many small to medium values but also a few extreme values. Such behavior prevents the fitting of a classical distribution to the entire data set, so spliced models built from different distributions on subdivided intervals have been proposed to overcome this issue (see the work of Klugman et al. [1] for details). In this sense, Cooray and Ananda [2] called the two-component spliced model composite and introduced a composite LogNormal-Pareto model, aiming to better capture the tails of loss distributions. This composite distribution was extended by Scollnik [3], whose model underlies many subsequent studies: see [4][5][6][7][8][9][10][11][12], to mention just a few. Also, among recent papers dealing with composite models, we mention [13][14][15][16][17][18]; see also the references therein. On the other hand, spliced distributions are also used in other fields dealing with extreme value theory; see the review [19].
Concerning spliced distributions with more than two components, their study is scarce, perhaps because the estimation of the thresholds where they change shape is a difficult problem. In particular, three-spliced regression models were considered in [20,21]. In [22], the three-spliced Exponential-LogNormal-Pareto distribution was studied, ref. [23] approached the three-spliced Gamma-LogNormal-Pareto distribution, ref. [24] discussed three three-spliced models (having Weibull as the first component, LogNormal as the second, and Pareto/GPD/Burr as the third) in connection with two insurance data sets, while [25] modeled household income data using an Inverse Pareto-Gaussian mixture-Pareto distribution.
In this paper, we first recall the general three-spliced model of [22]; then, we approach some particular cases with the purpose of modeling a real insurance data set: the Danish fire losses data. This data set has been intensively studied in the literature in connection with composite distributions, starting with [2] and culminating with the comprehensive analysis [7], which compared 256 composite models derived from 16 parametric distributions commonly used in actuarial science. Motivated by this extensive study, we propose several three-spliced distributions for modeling the Danish data, built on the basis of the composite distributions that provided the best fits in [7].
Therefore, we define these three-spliced distributions with the purpose of better modeling small, medium, and large claims (or data) by separately capturing the behavior of each group, which is consistent with lighter-, medium-, and heavy-tailed distributions, respectively. Also, the choice of the combinations of distributions was made bearing in mind the mentioned behavior of actuarial data.
The rest of the paper is structured as follows: in Section 2, we first recall the three-component spliced model, discuss parameter estimation, and list the distributions that will be used to define the three-spliced distributions for modeling our data; in connection with these distributions, we briefly present some differentiability conditions, because we consider each three-spliced distribution both with and without differentiability conditions at the thresholds. In Section 3, we describe the Danish data set, fit the distributions, and compare the results with those in [7,24]; we also evaluate and compare some risk measures used in actuarial calculations. We end with a conclusions section, followed by two appendices containing some tables and R code.

General Form
A general three-component spliced probability density function (pdf) is defined in [22] by

f(x) = r_1 f_1^*(x), if 0 < x ≤ θ_1;
     = r_2 f_2^*(x), if θ_1 < x ≤ θ_2;   (1)
     = r_3 f_3^*(x), if x > θ_2,

where θ_1 < θ_2 are the thresholds at which the pdf changes shape. Here, f_i is a pdf with F_i its corresponding cumulative distribution function (cdf), i = 1, 2, 3, while f_1^* is the right truncation of f_1 at θ_1, i.e., f_1^*(x) = f_1(x)/F_1(θ_1); f_2^* is the left-right truncation of f_2 at the two thresholds, i.e., f_2^*(x) = f_2(x)/[F_2(θ_2) − F_2(θ_1)]; and f_3^* is the left truncation of f_3 at θ_2, i.e., f_3^*(x) = f_3(x)/[1 − F_3(θ_2)]; therefore, θ_1 and θ_2 are also the points where f_1, f_2, f_3 are truncated. The normalizing constants r_i ∈ [0, 1], i = 1, 2, 3, are such that r_1 + r_2 + r_3 = 1 and represent the probability of each component; e.g., when modeling small, medium, and large claims, r_i represents the proportion of the corresponding type of claim.
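To make the definition concrete, here is a minimal sketch of evaluating such a three-spliced pdf. The paper's own code is in R; this is a Python illustration, and the particular Weibull-LogNormal-Pareto components and the weights below are hypothetical (continuity at the thresholds is not imposed here):

```python
import numpy as np
from scipy import stats

def spliced_pdf(x, dists, thetas, r):
    """Evaluate the three-spliced pdf (1): on each interval, the weight r_i
    times the corresponding truncated component density f_i^*."""
    f1, f2, f3 = dists
    th1, th2 = thetas
    r1, r2, r3 = r
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.zeros_like(x)
    m1, m3 = x <= th1, x > th2
    m2 = ~m1 & ~m3
    out[m1] = r1 * f1.pdf(x[m1]) / f1.cdf(th1)                  # right truncation at th1
    out[m2] = r2 * f2.pdf(x[m2]) / (f2.cdf(th2) - f2.cdf(th1))  # double truncation
    out[m3] = r3 * f3.pdf(x[m3]) / (1.0 - f3.cdf(th2))          # left truncation at th2
    return out

# hypothetical components: Weibull head, LogNormal middle, Pareto tail
dists = (stats.weibull_min(1.5), stats.lognorm(0.8), stats.pareto(1.4))
vals = spliced_pdf([0.5, 2.0, 10.0], dists, thetas=(1.0, 5.0), r=(0.2, 0.6, 0.2))
```

By construction, each truncated piece integrates to 1 over its interval, so the spliced density integrates to r_1 + r_2 + r_3 = 1 over the whole support.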
A natural requirement for a spliced pdf is continuity at the thresholds, often followed by a differentiability condition that ensures a smooth pdf. In the next proposition, we present the constraints resulting from imposing such conditions. These constraints are used to reduce the number of estimated parameters, and hence the computational burden of the estimation procedure; e.g., from the first statement of the next proposition, we can see how the normalizing constants can be obtained from the continuity condition at the two thresholds.

Proposition 1. Consider the three-spliced pdf (1).
(a) The pdf (1) is continuous at θ_1 and θ_2 if and only if
r_2 = [1 + f_2^*(θ_1)/f_1^*(θ_1) + f_2^*(θ_2)/f_3^*(θ_2)]^{-1}, r_1 = r_2 f_2^*(θ_1)/f_1^*(θ_1), r_3 = r_2 f_2^*(θ_2)/f_3^*(θ_2).
(b) Under continuity, the pdf (1) is differentiable at θ_1 and θ_2 if and only if, in addition, f_1'(θ_1)/f_1(θ_1) = f_2'(θ_1)/f_2(θ_1) and f_2'(θ_2)/f_2(θ_2) = f_3'(θ_2)/f_3(θ_2).

Proof. (a)
The continuity conditions at the thresholds yield (see also [22])
r_1 f_1^*(θ_1) = r_2 f_2^*(θ_1) and r_2 f_2^*(θ_2) = r_3 f_3^*(θ_2).
From the first equation, we obtain the second stated formula (of r_1). From the second equation follows the third stated formula (of r_3), and inserting these formulas into the condition r_1 + r_2 + r_3 = 1 gives the first stated formula of r_2.
(b) The proof can be found in [22].
Let M_f(t) = E(e^{tX}) denote the moment-generating function (mgf) of the random variable (rv) X with pdf f, and let E_n(f) = E(X^n) denote its initial moment of order n; formulas for the mgf and the moments of the three-spliced distribution in terms of those of the truncated components are proved in [22]. The cdf of (1) is given by

F(x) = r_1 F_1(x)/F_1(θ_1), if 0 < x ≤ θ_1;
     = r_1 + r_2 [F_2(x) − F_2(θ_1)]/[F_2(θ_2) − F_2(θ_1)], if θ_1 < x ≤ θ_2;   (2)
     = r_1 + r_2 + r_3 [F_3(x) − F_3(θ_2)]/[1 − F_3(θ_2)], if x > θ_2.

Remark 1. Random values from the three-component spliced distribution can be generated by applying the inversion method to the cdf (2), assuming that F_1, F_2, and F_3 admit inverse functions.
Moreover, from the cdf (2), we obtain the quantiles of a three-spliced distribution as follows:

Proposition 3. Assume that F_1, F_2, and F_3 admit inverse functions and let q_p denote the quantile of order p, p ∈ (0, 1), i.e., q_p = inf{x : F(x) ≥ p}. Then

q_p = F_1^{-1}(p F_1(θ_1)/r_1), if p ≤ r_1;
    = F_2^{-1}(F_2(θ_1) + (p − r_1)[F_2(θ_2) − F_2(θ_1)]/r_2), if r_1 < p ≤ r_1 + r_2;   (3)
    = F_3^{-1}(F_3(θ_2) + (p − r_1 − r_2)[1 − F_3(θ_2)]/r_3), if p > r_1 + r_2.

Remark 2. For large values of p, the quantile is used under the name Value-at-Risk (VaR) as a risk measure in risk analysis. Based on VaR and developed in connection with heavy-tailed distributions, the Tail-Value-at-Risk (TVaR) is another frequently used risk measure, defined for a rv X as TVaR_p(X) = E(X | X > q_p) = (1/(1 − p)) ∫_{q_p}^∞ x f(x) dx. In our case, it can be calculated from the corresponding formula (4), hence requiring numerical integration. Alternatively, Monte Carlo simulation can be used.
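The piecewise inversion of the spliced cdf, and the Monte Carlo alternative for TVaR, can be sketched as follows (Python rather than the paper's R; the components and weights below are hypothetical and continuity is not enforced):

```python
import numpy as np
from scipy import stats

def spliced_ppf(p, dists, thetas, r):
    """Piecewise inversion of the spliced cdf (2), in the spirit of Proposition 3."""
    F1, F2, F3 = dists
    th1, th2 = thetas
    r1, r2, r3 = r
    p = np.atleast_1d(np.asarray(p, dtype=float))
    q = np.empty_like(p)
    m1 = p <= r1
    m2 = (p > r1) & (p <= r1 + r2)
    m3 = p > r1 + r2
    q[m1] = F1.ppf(p[m1] * F1.cdf(th1) / r1)
    q[m2] = F2.ppf(F2.cdf(th1) + (p[m2] - r1) * (F2.cdf(th2) - F2.cdf(th1)) / r2)
    q[m3] = F3.ppf(F3.cdf(th2) + (p[m3] - r1 - r2) * (1.0 - F3.cdf(th2)) / r3)
    return q

# hypothetical components and weights
dists = (stats.weibull_min(1.5), stats.lognorm(0.8), stats.pareto(1.4))
thetas, r = (1.0, 5.0), (0.2, 0.6, 0.2)

# VaR is the quantile itself; TVaR estimated by Monte Carlo via inversion sampling
rng = np.random.default_rng(1)
sample = spliced_ppf(rng.uniform(size=100_000), dists, thetas, r)
var99 = spliced_ppf(0.99, dists, thetas, r)[0]
tvar99 = sample[sample > var99].mean()
```

Note that at p = r_1 the quantile equals θ_1 and at p = r_1 + r_2 it equals θ_2, which is a quick sanity check on an implementation.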

Parameter Estimation
Let x = (x_1, ..., x_n) be a random data sample from a three-spliced pdf and Λ the set of parameters to be estimated. The corresponding likelihood function is

L(Λ; x) = ∏_{i=1}^{n} f(x_i; Λ).   (5)

Note that to perform maximum likelihood estimation (MLE), in addition to the parameters of f_1^*, f_2^*, f_3^*, the two thresholds must be estimated. In [22], a complete estimation algorithm is proposed by letting the two threshold estimates sweep the entire data domain, which unfortunately proves to be very time consuming (more precisely, after sorting the data set, MLE solutions are searched for with the thresholds restricted between every two consecutive data points such that θ_1 < θ_2; finally, the best solution is kept). For this reason, some threshold selection methods are reviewed in [23]. Based on these considerations, in the numerical section of this work, we use the following algorithm:

Step 1. Select initial values for the thresholds (using, for example, graphical diagnostics) and, for each threshold, set up a search grid around its initial value.
Step 2. For each θ_1 in its grid and each θ_2 in its grid, evaluate the remaining parameters by MLE of the likelihood (5) under the constraints of continuity at the thresholds, and of differentiability if considered.
Step 3. Among the solutions obtained at Step 2, choose the one that maximizes the log-likelihood function.
Step 4. The algorithm can be reiterated by refining the grids around the last threshold values.
Being based on partial MLE, the algorithm does not necessarily perform full MLE, so variants for improving the estimation should be studied in the future.
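Steps 1-3 of the algorithm can be sketched as follows. This is a Python illustration with scipy's Nelder-Mead standing in for R's optim; the Weibull-LogNormal-Pareto combination and the unit scale parameters are assumptions made only to keep the sketch short, and the continuity constraints fix the weights as in Proposition 1(a):

```python
import numpy as np
from scipy import stats, optimize

def spliced_nll(params, x, th1, th2):
    """Negative log-likelihood of a hypothetical Weibull-LogNormal-Pareto splice;
    continuity at both thresholds determines the weights (r1, r2, r3)."""
    k, s, alpha = params
    if min(k, s, alpha) <= 0:
        return np.inf
    f1, f2, f3 = stats.weibull_min(k), stats.lognorm(s), stats.pareto(alpha)
    mid = f2.cdf(th2) - f2.cdf(th1)
    if mid <= 0:
        return np.inf
    # truncated component densities evaluated at the thresholds
    a, b = f1.pdf(th1) / f1.cdf(th1), f2.pdf(th1) / mid
    c, d = f2.pdf(th2) / mid, f3.pdf(th2) / (1.0 - f3.cdf(th2))
    r2 = 1.0 / (1.0 + b / a + c / d)        # continuity constraints solved for the weights
    r1, r3 = r2 * b / a, r2 * c / d
    pdf = np.where(x <= th1, r1 * f1.pdf(x) / f1.cdf(th1),
          np.where(x <= th2, r2 * f2.pdf(x) / mid,
                             r3 * f3.pdf(x) / (1.0 - f3.cdf(th2))))
    return np.inf if np.any(pdf <= 0) else -np.sum(np.log(pdf))

def grid_search_fit(x, grid1, grid2, start):
    """Steps 1-3: sweep both threshold grids, run Nelder-Mead for each
    admissible pair (th1 < th2), and keep the best log-likelihood."""
    best = (np.inf, None, None, None)
    for th1 in grid1:
        for th2 in grid2:
            if th2 <= th1:
                continue
            res = optimize.minimize(spliced_nll, start, args=(x, th1, th2),
                                    method="Nelder-Mead")
            if res.fun < best[0]:
                best = (res.fun, th1, th2, res.x)
    return best
```

Step 4 of the algorithm then amounts to calling grid_search_fit again with finer grids centered on the returned thresholds.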
Regarding the threshold selection in Step 1, the histogram can give an idea of where the shape changes. Further, graphical diagnostics like the mean excess function or Hill plots (implemented in the evmix package of R) can be used to detect transitions between different parts of the distribution; we recall that a Pareto tail can be detected if there exists a point beyond which the mean excess plot is linearly increasing. Also, prior information, such as the estimated thresholds of some two-spliced distributions, can be of great help (we used such information in the numerical study).
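As an illustration of the mean excess diagnostic (available in R through the evmix package; below is a minimal Python version): for an exponential sample the empirical mean excess is roughly constant by memorylessness, while for a Pareto-type tail it increases roughly linearly in the threshold, which is the pattern one looks for in the plot.

```python
import numpy as np

def mean_excess(x, thresholds):
    """Empirical mean excess function e(u) = mean(x - u | x > u); an
    approximately linear, increasing plot beyond some u suggests a
    Pareto-type tail starting around that point."""
    x = np.asarray(x, dtype=float)
    return np.array([(x[x > u] - u).mean() for u in thresholds])
```

In practice, one evaluates e(u) over a grid of thresholds and plots the result against u, looking for the point beyond which the curve becomes linearly increasing.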
As concerns the grids, they can be chosen equispaced around the thresholds, strongly depending on the data: it is recommended to leave a certain amount of data for each of the three components in order to justify the fit of a three-spliced distribution (as pointed out by a referee, estimating the parameters within each segment requires a substantial amount of data, which can be challenging to obtain in certain circumstances). Also, the number of values in the grids should be chosen so that the computing time is not too long (depending on the computer's characteristics). Moreover, we preferred to start with grids having larger spans because, once a solution was found, we reiterated the algorithm with denser grids around the thresholds of the last solution found.
In Step 2, we used the function optim from R with its default optimization method of Nelder and Mead [26], which works reasonably well on non-differentiable functions, as was the case for half of the fitted distributions.
Remark 3. Parameter estimation may present difficulties due to the complexity of the model, potentially leading to unreliable estimates (e.g., potential interaction among the different distribution segments may introduce estimation errors). What we noticed when working on the numerical part is that, for a few three-spliced distributions, the algorithm provided some best solutions (sometimes with extreme parameter values) that were quite different from the other solutions obtained for the thresholds in the respective grids; this happened especially for the distributions without differentiability at the thresholds. Other problems were reaching the iteration limit (maxit had to be increased) and degeneracy of the Nelder-Mead simplex (see details in the numerical part).

Remark 4. Other methods suggested in the literature to reduce the grid search are based on empirical quantiles and on the method of moments (however, the resulting system of equations might not have a closed-form solution). On the other hand, to improve the estimation, the MLE procedure was complemented in [24] with a Bayesian analysis via Markov chain Monte Carlo (MCMC) simulation.
Note that if the order of the head, middle, and tail distributions is changed, different estimated parameters are to be expected.

Particular Involved Distributions
In the numerical illustration section, several three-spliced distributions will be fitted to a real insurance data set: the Danish fire losses. Therefore, the choice of the component distributions is related to this data set, and the pdfs of the distributions considered for the head, middle, and/or tail of the three-spliced distribution are listed in Table 1. All the distributions are given in terms of shape and scale parameters. The choice of these eight distributions was based on the detailed discussions in the existing literature and on the best fits from [7].
Table 1. Eight positive parametric distributions considered as the head, middle, and/or tail distributions in the three-spliced model (abbreviations in brackets).

In total, we fitted 44 three-spliced distributions: 22 with and 22 without differentiability conditions at the two thresholds. We would have liked to try more such distributions, but the computing time is still very long; moreover, the selection criteria AIC and BIC already provide a good indication.
Two examples. We illustrate the three-spliced distribution with two particular distributions that we fitted to the Danish data set in Section 3: the Weibull-Paralogistic-InverseWeibull and the Paralogistic-Weibull-InverseWeibull. For consistency with the density notation in Table 1, their parameters are denoted as follows: Weibull(α, σ), Paralogistic(ᾱ, σ̄), InverseWeibull(α̃, σ̃), with α, σ, ᾱ, σ̄, α̃, σ̃ > 0. For both distributions, we present the pdfs and the restrictions resulting from the differentiability conditions (as discussed above, the continuity conditions yield direct formulas for the normalizing constants, so we do not insist on these).
From Table 1 and (1), the pdf of the Weibull-Paralogistic-InverseWeibull distribution and that of the Paralogistic-Weibull-InverseWeibull follow by inserting the corresponding truncated densities. In order to impose the differentiability conditions, the following lemma concerning the derivatives of the pdfs from Table 1 involved in Proposition 1(b) is helpful (the proof is simple, so we skip it).
Lemma 1. Denote the pdf of each distribution by f indexed with the abbreviation of its name. Then, the corresponding derivative relations hold for x > 0. As an illustration, we present the differentiability conditions for the two particular three-spliced distributions considered above as examples. The following proposition results easily from the above lemma.

Numerical Illustration
To illustrate the applicability of three-spliced distributions, we consider the Danish fire losses data set, consisting of n = 2492 claims, available in the SMPracticals package of the R 4.4.0 software. This choice is motivated by the fact that this data set has been intensively studied in connection with two-spliced distributions, thus enabling comparisons between the fits of various distributions. From the descriptive statistics displayed in Table 2, it can be seen that the data exhibit high right skewness and a very fat tail, hence motivating the fit of spliced distributions to capture the extreme values. Moreover, the histogram in Figure 1 supports the fat-tail remark and also the spliced assumption; see the pronounced jump on the left side at values smaller than 1. To this data set, we fitted several three-spliced distributions, which, as mentioned in the introduction, are built on the basis of the composite distributions that provided the best fits in [7], also summarized in [8]. Therefore, we combined the 8 distributions presented in Table 1 and obtained the 22 three-spliced distributions from Table 3, each one considered with no differentiability condition and with both differentiability conditions at the thresholds (in total, 44 fitted distributions). For all 22 distributions, we imposed continuity conditions at both thresholds. In a first step, we chose not to impose differentiability conditions in order to obtain greater flexibility of the pdf, which is necessary to capture, for example, the big jump at the very left side of the histogram. However, in this case, we encountered some problems like degeneracy of the Nelder-Mead simplex and unavailable standard errors. Also, for a few three-spliced distributions, we obtained substantially smaller BIC values but very bad Kolmogorov-Smirnov (KS) results (see Table 4), so we reiterated the algorithm, avoiding these extreme situations (for this purpose, we initially kept all the solutions from Step 2 of the algorithm in a list
together with the KS values, and based on the log-likelihood and KS, we decided which one is reliable). In a second step, we considered the same 22 distributions but with differentiability conditions at both thresholds. The order in which the spliced distributions are displayed in the tables is based on the BIC of the spliced distributions with no differentiability; see Table 3. The positions of the distributions as the head, middle, or tail were selected following the discussions in [7,24]. More precisely, ref. [24] noticed that the Weibull distribution, having a more flexible shape on the left side, can better capture the smaller claims; therefore, in more than half of our models, Weibull is the head distribution. Alternatively, for the head of several models we considered the Paralogistic, as discussed in [7], and the LogNormal. Also, based on the comments of [7], we alternated the Weibull, Paralogistic, LogLogistic, and InverseBurr as the middle distributions. A more heavy-tailed distribution should be used for the tail, and, based on their findings, ref. [7] recommended the following ones: InverseWeibull, InverseParalogistic, LogLogistic, Burr, and Paralogistic. Among these, we used the first three as the tail and added the Pareto, as in [24], which proved to be a good choice.
Also, even though the 20 best-fitting two-spliced distributions in [7] performed better than the models using the LogNormal distribution in the head, we considered the LogNormal as the head distribution in four of our three-spliced models, obtaining good results for the models without differentiability. However, we must mention that we also tried several three-spliced distributions with the LogNormal as the middle distribution and without differentiability, which we do not display here because all of them yielded extreme values of the estimated parameter µ that would necessitate higher precision in the calculations. Nevertheless, we note that [24] obtained good results with the LogNormal as the middle distribution (see Table 6), but including differentiability conditions and using a more complex estimation algorithm that involves MCMC simulation.
In Tables A1 and A2 from Appendix A, we display the parameters of the distributions estimated by the method described in Section 2.2, with standard errors under each parameter (in Table A2, the two parameters resulting from the differentiability conditions can be identified by the lack of standard errors beneath them). Concerning the algorithm, based on the previously estimated thresholds (for two-spliced composite distributions) and on the data histogram (see the big jump at the very left side of the histogram), for θ_1, we chose a first search grid in the interval 0.8-3 with step 0.1, comprising the threshold θ_1 = 0.947 estimated in [24] and the thresholds from Table A2 in [7], all over 0.92. For θ_2, a broader grid in the interval 1-20 was considered, avoiding overlapping; this interval was chosen to be wider since there is not much information available on a second threshold. Once we found a first solution for θ_1, θ_2, we refined the corresponding grids, centered on these solutions, of length 0.5 and with step 0.001, hence obtaining the better solutions displayed in Tables A1 and A2.
In Appendix B, we added two examples of R code for estimating the parameters with and without differentiability.
Note that the estimated threshold θ_1 varies between 0.891 and 1.030, mostly around 0.92-0.95, which is consistent with the thresholds reported in Table A2 from [7] and in [24]. The range of the estimated second threshold θ_2 is broader, from 0.991 to 9.506.
As for the normalizing constants, we note that the three-spliced distributions without differentiability on positions 9 and 15 have small values of r_3 (under 0.1), consistent with higher values of θ_2 (over 9). Also, almost all distributions present values of r_1 smaller than 0.1, correlated with the small values of θ_1 (except for the distributions with the LogNormal as the head and with differentiability conditions).
Regarding the standard errors displayed in Table A1 in parentheses under each estimated parameter of the distributions without differentiability, we note several high values: for the first parameter of the LogNormal used as the head distribution, and for the second parameter of some Weibull distributions used as the head. Since this situation does not happen with the standard errors in Table A2, i.e., for the distributions with differentiability, it seems to be due to the lack of differentiability at the threshold θ_1.
In Table 3, we present the negative log-likelihood (NLL), number of estimated parameters k, AIC, and BIC values for the 22 distributions with and without differentiability (diff.) conditions; since the distributions are ordered according to the BIC of the distributions without differentiability, in the last column, we display the global BIC order for all 44 distributions. For comparison, in Table 5, we also show the same results for the nine best-fitting two-spliced distributions from [7], and in Table 6, the results for the three three-spliced distributions considered in [24]. We note that 30 of our three-spliced distributions provide a better fit, the first 3 with significant improvement. Also, in general, each of our spliced distributions with differentiability has a higher AIC but a smaller BIC (due to a smaller number of estimated parameters) than the same distribution without differentiability. Interestingly, 5 of our top 6 (BIC) distributions contain the Pareto as the tail distribution, while none of the 20 best-fitting two-spliced distributions in [7] contain the Pareto.
In Table 7, we also present the Kolmogorov-Smirnov statistics and p-values obtained with the R function ks.test for all 22 distributions; we recall that this test is based on D_n = max_{1≤i≤n} |F(x_i) − F_n(x_i)|, the maximum distance between the empirical distribution function F_n of the sample and the cdf F of the assumed distribution. The smallest such value is provided by the Weibull-InverseBurr-LogLogistic distribution without differentiability (position 36 on the BIC scale, unfortunately having a large standard error for the second parameter of the Weibull head distribution). We note that a few spliced distributions have small p-values, especially the ones with the LogNormal as the head distribution and with differentiability (these are also the last ones on the BIC scale); hence, in this case, the LogNormal seems acceptable as the head distribution only without differentiability conditions.
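For reference, the statistic that ks.test reports can be computed directly from its definition; below is a minimal Python sketch that evaluates the supremum at the order statistics, where the empirical cdf jumps (the two-sided version checks the distance on both sides of each jump):

```python
import numpy as np

def ks_statistic(x, cdf):
    """Two-sided KS distance D_n = sup_x |F_n(x) - F(x)|, evaluated at the
    order statistics: at x_(i), F_n jumps from (i-1)/n to i/n."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    F = cdf(x)
    i = np.arange(1, n + 1)
    return max(np.max(i / n - F), np.max(F - (i - 1) / n))
```

For a fitted three-spliced model, one would pass the spliced cdf (2) with the estimated parameters as the cdf argument.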
As mentioned above, in Table 4, we display the results of the initial estimation of four three-spliced distributions without differentiability, characterized by substantially smaller BIC values but also by very small KS p-values. In Figure 2, we plot the pdfs of three of these distributions overlapping the data histogram: on the left side, the pdfs corresponding to the distributions in Table 4, and on the right side, the pdfs for the same distributions re-estimated as in Table A1. We note that the pdfs in the right plot provide a better fit, having a smaller mode. However, note how the Weibull-Paralogistic-LogLogistic (which provides the very best BIC fit) captures the histogram hump between 1.5 and 2. Since in actuarial modeling an important purpose of distribution fitting is to evaluate risk, in Table 8, we also present the two common risk measures VaR and TVaR evaluated at the 0.95 and 0.99 security levels. For comparison, in Table 9, we display the same risk measures obtained in [7] for the nine best-fitting two-spliced distributions given in Table 5.
To calculate the VaR values, we used the quantile Formula (3), while for the evaluation of TVaR, we used the function integrate from R according to Formula (4).
Note that even if most of our three-spliced distributions slightly underestimate the empirical VaR_0.95 risk measure, almost all of them slightly overestimate the empirical VaR_0.99 and overestimate TVaR_0.95. All our three-spliced distributions overestimate the empirical TVaR_0.99, some strongly, probably due to an overly heavy tail (see the LogNormal-Weibull-Pareto without differentiability).
To conclude, the best overall fitting might not provide the best fit of the tail, which is of great importance in risk modeling, where a moderate overestimation of, for example, actuarial quantities is preferable to underestimation.

Conclusions and Future Work
Motivated by a real insurance data set that includes small to extreme values, in this paper, we considered three-spliced distributions with the aim of better modeling small, medium, and large data. After briefly recalling some properties, we discussed an estimation procedure that, to reduce the computing time, is based on a grid search for the two unknown thresholds. The procedure was applied to the insurance data set by fitting 22 three-spliced distributions, considered with and without differentiability at the two thresholds. By not imposing the differentiability conditions, we hoped to obtain a better fit; however, we noticed that the lack of differentiability can create estimation problems and, for this data set, did not necessarily produce better fits in terms of BIC.
By comparing the results with the best two-spliced models previously fitted to the same data set, we concluded that three-spliced distributions can indeed improve the fit. Still, we note that there are many more ways to combine distributions into a three-spliced model than into a two-spliced model, while the computing time needed to fit a three-spliced distribution is much longer than for a two-spliced one, especially in the case without differentiability at the thresholds (see Appendix B).
Therefore, when deciding to use three-spliced models, it is better to have a good knowledge of their components, which could be achieved by first fitting some two-spliced distributions.Related to this aspect, as future work, we plan to improve the estimation procedure in order to reduce the computing time.We mention that we recently became aware of the work [24], which made use of MCMC simulation to complement the MLE procedure, so we will consider this as a future work direction.
Also, as future work, it would be interesting to study three-spliced models having negative support.

Appendix B. R Code
In Figure A1, we present an example of R code for estimating the six parameters of a three-spliced distribution without differentiability, while Figure A2 shows the results, obtained in about 30 min.
Note that this code is the same for any distribution with six parameters to be estimated, the only difference being the function names f1, f2, f3. Also, this code can be easily adapted to other three-spliced distributions with a different number of parameters to be estimated (in our case, we also have distributions with five and seven such parameters). In Figure A3, we present an example of R code for estimating the four parameters of a three-spliced distribution with differentiability, while the results, obtained in about 34 s, are shown in Figure A4 (this run is faster because there are only four parameters to estimate). On the other hand, it is more complicated to adapt this code to other three-spliced distributions because the parameters calculated from the differentiability conditions change with the distribution.
Note that the computing times above are for grids of dimension 21 × 190. However, we also used larger grids, for which the computing time increased, especially when we additionally stored the results in a list containing the KS values, as explained before.
We mention that we used a laptop with an Intel(R) Core(TM) i7-1255U processor at 1.76 GHz.

Proposition 4.
The differentiability conditions from Proposition 1 imposed on the Weibull-Paralogistic-InverseWeibull distribution yield the following two restrictions on the parameter α.

Figure 1. Histogram of the Danish fire data.

Figure 2. Histogram of the Danish fire data with the fitted pdfs of 3 three-spliced distributions, each estimated twice.

Figure A1. R code for estimation without differentiability.

Figure A2. Result of the R code for estimation without differentiability.

Figure A3. R code for estimation with differentiability.

Figure A4. Result of the R code for estimation with differentiability.

Table 2. Descriptive statistics of the Danish fire data.

Table 3. Results of fitting 22 three-spliced models with and without differentiability (ordered according to the BIC of the models without differentiability).

Table 4. Results of fitting the 4 three-spliced models with the best BIC values but worst KS results (estimated parameters under each distribution).

Table 5. Results of the 9 best-fitting two-spliced models (according to the BIC) in [7].

Table 8. VaR and TVaR at the 0.95 and 0.99 security levels for the 22 three-spliced models.

Table 9. VaR and TVaR at the 0.95 and 0.99 security levels for the 9 best-fitting composite models in [7].

Table A2. Estimated parameters for the 22 three-spliced models with differentiability (standard error under each estimated parameter value).