DEFICIENCIES IN THE THEORY OF FREE-KNOT AND VARIABLE-KNOT SPLINE GRADUATION METHODS WITH SPECIFIC REFERENCE TO THE ELT 14 MALES GRADUATION

This paper revisits the theory and practical implementation of graduation of mortality rates using spline functions, and in particular, variable-knot cubic spline graduation. The paper contrasts the actuarial literature on free-knot splines with the mathematical literature. It finds that the practical difficulties of implementing free-knot spline graduation are not recognised in the actuarial literature reviewed. The paper also revisits the results of the graduation of the English Life Tables no. 14 (ELT 14) experience for male lives using a ‘multistart’ optimisation approach for the free-knot graduation. Application of this technique results in the finding that the chi-squared values reported in the ELT 14 graduation for male lives for 10, 11 and 12 knots were not optimal values. The multistart optimisation results appear to show that McCutcheon’s t-statistic, which is used in variable-knot spline graduation to select the optimal number of knots, may not in fact result in an optimal choice. Free-knot spline graduation should be used with caution and variable-knot spline graduation, in the form that employs McCutcheon’s t-statistic, should not be used.


INTRODUCTION
This paper examines deficiencies in the theory of variable-knot spline graduation, which was the method used for the graduation of the 14th English Life Tables ('ELT 14'). The paper begins by explaining what a spline function is, and then describes graduation using fixed-knot splines. The question of how to select the knot values in such a graduation leads to a discussion of free-knot splines, which are spline functions whose knot values are also optimised with respect to fit. The paper proceeds to discuss the method of variable-knot splines, as described by McCutcheon (1984). In this paper, the term 'variable-knot splines' is intended to convey the choice of a best-fitting free-knot spline with an optimal number of knots. The results of attempting to duplicate the ELT 14 male mortality graduation are examined. The paper closes by drawing conclusions regarding variable-knot spline graduation.

2.1
Spline graduation involves fitting a spline function, usually a cubic spline, to crude mortality data, and taking the fitted spline as the graduated mortality curve.

2.2
A spline $s(\cdot)$ of degree $k$ (where $k$ is a positive integer) with knots $x_0 < x_1 < \dots < x_n < x_{n+1}$ is a function defined on the closed interval $[a,b]$ (where $x_0 = a$ and $x_{n+1} = b$) for which the following conditions hold: (1)
- $s(\cdot)$ is a polynomial of degree not greater than $k$ on each open interval $(x_{i-1}, x_i)$ for $i = 1, 2, \dots, n+1$; and
- $s(\cdot)$ has $k-1$ continuous derivatives in the open interval $(a,b)$.

2.3
In spline graduation, knots are allowed to coincide, but the number of equal knots at a point (the 'multiplicity' of that knot) may not be greater than k+1.

2.4
Various criteria can be used to fit spline functions. McCutcheon (1981) minimises a chi-squared criterion. London (1985) and Jupp (1978) describe an unweighted least-squares criterion, and De Boor (1978) describes this and other criteria. Forfar, McCutcheon & Wilke (1988) describe the minimum-chi-squared and maximum-likelihood criteria, and note that similar graduations are produced by these two methods. Benjamin & Pollard (1985) have shown that, for mortality experiences involving large exposures, the maximum-likelihood, minimum-weighted-least-squares and minimum-chi-squared methods will give approximately the same results. All the results in this paper employ the chi-squared criterion to fit splines, since it is clear that this criterion is as capable as the others of producing satisfactory results, and since it was the criterion which McCutcheon used in his development of the variable-knot spline technique of graduation.

2.5
Variable-knot splines are discussed, albeit not necessarily under that name, in the statistical literature. Eubank (1988) gives a summary of various methods of selecting the number of parameters of a fitted spline. The methods discussed include a 'backfitting' algorithm, a 'projection pursuit' algorithm, and an 'alternating conditional expectations' algorithm. Breiman (1993) describes the theory and application to data of the 'delete knot/cross-validation' algorithm for fitting splines. Pittman (2000) discusses the use of genetic algorithms and model selection techniques to fit variable-knot splines. However, this paper considers variable-knot splines only as described in the actuarial literature.

3.1
We concentrate in this paper on results involving graduation of central mortality rates $m_x$, although one could fit minimum-chi-squared spline functions to mortality probabilities $q_x$ (McCutcheon, 1981).

3.2
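The minimum-chi-squared criterion (McCutcheon, 1984) consists of finding the spline $s(\cdot)$ of degree $k$ on $[a,b]$ with specified internal knots $x_1, \dots, x_n$ that minimises

$$\chi^2 = \sum_{x=a}^{b} \frac{\left(\theta_x - E_x^c\, s(x)\right)^2}{E_x^c\, s(x)} \qquad (2)$$

where:
- $\theta_x$ is the number of deaths in the period of investigation classified age $x$ at death; and
- $E_x^c$ is the corresponding central exposure to risk for lives classified age $x$.

3.3
Formula (2) may also be expressed as

$$\chi^2 = \sum_{x=a}^{b} w_x \left(\frac{\theta_x}{E_x^c} - s(x)\right)^2 \qquad (3)$$

with $w_x = E_x^c / s(x)$.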

3.4
In formula (3), the weights $w_x$ depend on the unknown spline function $s(\cdot)$. This is a key disadvantage of the chi-squared criterion that would not arise if another criterion, such as maximum likelihood, were used to fit the splines. However, we do not pursue this idea further in this paper.

3.5
In minimising (
4. Fit the weighted-least-squares spline $s(\cdot)$ on $[a,b]$ with internal knots $x_1, \dots, x_n$ using the weights in the previous step.
5. Repeat steps 3 and 4 until the chi-squared value converges.

3.6
Hence the minimum chi-squared criterion is equivalent to an iteratively reweighted least-squares procedure. McCutcheon develops the formulae that may be used to obtain the weighted-least-squares splines. The following is a brief outline of how to calculate the parameters of a weighted-least-squares spline with fixed internal knots.

3.7
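Let

$$(x-c)_+ = \begin{cases} 0 & \text{if } x < c \\ x-c & \text{if } x \ge c. \end{cases}$$

It is easy to show that the class of splines of degree $k$ on $[a,b]$ with internal knots $x_1, \dots, x_n$ forms a vector space of dimension $n+k+1$. A basis for this class is then

$$\left\{1,\ x,\ \dots,\ x^k,\ (x-x_1)_+^k,\ \dots,\ (x-x_n)_+^k\right\}. \qquad (4)$$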

3.9
Solving $\partial W / \partial \lambda_j = 0$ for $j = 1, 2, \dots, n+k+1$ yields:

3.10
If $BWB'$ is not singular, then a unique solution exists. McCutcheon (1981) and Booysens (1992) implicitly assume that a unique solution always exists, and although it is true that in general there is a unique solution, Jupp (1978) has shown that for the unweighted-least-squares case there may be no unique solution when, loosely speaking, there are regions containing too many knots compared to the number of data points. In this case, $BWB'$ will be singular.
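The iteratively reweighted procedure outlined in ¶¶3.5 to 3.10 can be sketched as follows. This is a minimal illustration under stated assumptions, not McCutcheon's implementation: the function names are hypothetical, the data are simulated rather than taken from ELT 14, the crude rates are assumed as the source of the initial weights, ages are rescaled to [0, 1], and the weighted fit is obtained with `numpy.linalg.lstsq` on the square-root-weighted design matrix rather than by inverting $BWB'$ explicitly.

```python
import numpy as np

def truncated_power_basis(u, knots, degree=3):
    """Columns 1, u, ..., u^k, (u - x_1)_+^k, ..., (u - x_n)_+^k."""
    u = np.asarray(u, dtype=float)
    powers = [u ** p for p in range(degree + 1)]
    truncated = [np.clip(u - c, 0.0, None) ** degree for c in knots]
    return np.column_stack(powers + truncated)

def irls_min_chi2(ages, deaths, exposure, knots, degree=3, tol=1e-9, max_iter=100):
    """Fixed-knot spline fitted by repeated weighted least squares."""
    lo, hi = float(ages.min()), float(ages.max())
    u = (ages - lo) / (hi - lo)                      # rescaled ages (assumption)
    scaled_knots = (np.asarray(knots, dtype=float) - lo) / (hi - lo)
    B = truncated_power_basis(u, scaled_knots, degree)
    crude = deaths / exposure
    fitted = crude.copy()                            # initial weights from crude rates (assumption)
    chi2_old = np.inf
    for _ in range(max_iter):
        if np.any(fitted <= 0):
            raise ValueError("fitted rates must stay positive for the weights to exist")
        w = exposure / fitted                        # w_x = E_x / s(x)
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(B * sw[:, None], crude * sw, rcond=None)
        fitted = B @ coef
        chi2 = float(np.sum((deaths - exposure * fitted) ** 2 / (exposure * fitted)))
        if abs(chi2_old - chi2) < tol:               # stop when chi-squared settles
            break
        chi2_old = chi2
    return fitted, chi2

# Simulated experience (hypothetical, not ELT 14 data).
rng = np.random.default_rng(0)
ages = np.arange(20, 80)
true_rate = 0.001 + 0.02 * ((ages - 20) / 59) ** 3
exposure = np.full(ages.shape, 1e5)
deaths = rng.poisson(exposure * true_rate).astype(float)

fitted, chi2 = irls_min_chi2(ages, deaths, exposure, knots=[40, 60])
print(f"chi-squared after convergence: {chi2:.2f}")
```

Working with the weighted design matrix avoids forming and inverting $BWB'$ directly, which is one simple way of reducing exposure to the singular and ill-conditioned cases discussed in the text.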

3.11
The vector-space basis in (4) may result in ill-conditioned matrices $BWB'$ that are numerically unstable to invert. In other words, the algorithms used to invert $BWB'$ may be sensitive to small changes in its entries, and may return matrices that are not in fact inverses of $BWB'$.
McCutcheon shows how an alternative basis to (4) consisting of 'B-Splines' that are superior in this respect may be used in determining solutions. These results are not important to the subject of this paper and the reader is referred to McCutcheon (1981) for details on B-Splines.
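The scale of the conditioning problem can be illustrated numerically. The sketch below is an assumption-laden example rather than a reproduction of any calculation in the literature cited: it takes $W$ to be the identity matrix, uses hypothetical ages and knots, and compares the condition number of $BB'$ for the truncated-power basis on raw ages with the same quantity after rescaling ages to [0, 1]. Rescaling is used only as a simple contrast; it is not the B-spline remedy that McCutcheon describes.

```python
import numpy as np

def gram_matrix(x, knots, degree=3):
    """B B' for the truncated-power basis, with W taken as the identity (assumption)."""
    rows = [x ** p for p in range(degree + 1)]
    rows += [np.clip(x - c, 0.0, None) ** degree for c in knots]
    B = np.vstack(rows)            # one row per basis function, one column per age
    return B @ B.T

ages = np.arange(20.0, 91.0)       # hypothetical age range
knots = [40.0, 55.0, 70.0]         # hypothetical internal knots

cond_raw = np.linalg.cond(gram_matrix(ages, knots))

span = ages.max() - ages.min()
u = (ages - ages.min()) / span
scaled_knots = [(c - ages.min()) / span for c in knots]
cond_scaled = np.linalg.cond(gram_matrix(u, scaled_knots))

print(f"condition number, raw ages:      {cond_raw:.3e}")
print(f"condition number, rescaled ages: {cond_scaled:.3e}")
```

A very large condition number means that rounding errors in the matrix entries can be greatly amplified in the computed inverse, which is the instability described above.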

4.1
In 1981, McCutcheon noted that knot placement could influence the fit dramatically, and suggested that fit be optimised with respect to knot position. Jupp (1978) and De Boor (1978) refer to this as the free-knot spline problem.

4.2
Formula (2) may be expanded as follows:

McCutcheon's proposal was to minimise formula (7) with respect to the knot vector $x = (x_1, x_2, \dots, x_n)$ after first having minimised it with respect to the parameter vector $\lambda$. This involves finding values for a total of $2n+k+1$ parameters, comprising $n+k+1$ parameters in fitting the minimum-chi-squared spline and a further $n$ parameters in finding the best knots.

5.1
In the following discussion we refer to $z_k(x_1, x_2, \dots, x_n)[a,b]$ as the chi-squared value obtained from fitting to the data a minimum-chi-squared spline of degree $k$ on $[a,b]$ with specified internal knots $x_1, x_2, \dots, x_n$. We refer to $zuw_k(x_1, x_2, \dots, x_n)[a,b]$ as the least-squares error term from fitting to the data the minimum-unweighted-least-squares spline of degree $k$ on $[a,b]$ with specified internal knots $x_1, x_2, \dots, x_n$. We also define $Z_k(n) = \min\{z_k(x_1, x_2, \dots, x_n)[a,b] : a \le x_1 \le \dots \le x_n \le b\}$.

5.2
Referring to the problem of optimising fit with respect to knot position, McCutcheon states that 'theoretically it is not obvious that the criterion does in fact lead to a unique spline, but in practice this seems always to be the case' (1981: 436). However, De Boor (1978) points out that, for splines fitted using the unweighted-least-squares criterion (not criterion (2)), it is actually impossible to show that a given value of $zuw_k(x_1, x_2, \dots, x_n)[a,b]$ is a minimum for all $a \le x_1 \le \dots \le x_n \le b$. De Boor (1996) also points out that this is likely to be the case for splines fitted using the chi-squared criterion as well.

5.3
Since it is possible to characterise local minima of $z_k(x_1, x_2, \dots, x_n)[a,b]$, a practical method of attempting to find a global minimum is to find all local minima and choose the smallest of these. McCutcheon (1993) and others (for example, Jupp, 1978) have used a multistart approach to do this. The reason for the use of a multistart technique is that a minimisation algorithm requires as input an initial vector of knots, and will usually converge to the local minimum nearest to that starting point. Choosing a wide range of initial starting points increases the probability of finding all the local minima, but there is no guarantee of having obtained a global minimum.
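The multistart idea can be sketched in a few lines. The example below is illustrative only and rests on several assumptions: it uses the unweighted-least-squares error (the $zuw_k$ of ¶5.1) rather than the chi-squared criterion, a simple compass search as a stand-in for a library minimisation routine, artificial data on [0, 1], and hypothetical function names.

```python
import numpy as np

def rss_for_knots(knots, x, y, degree=3):
    """Unweighted least-squares error of the best spline with the given knots."""
    cols = [x ** p for p in range(degree + 1)]
    cols += [np.clip(x - c, 0.0, None) ** degree for c in knots]
    B = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)   # copes with rank-deficient B
    resid = y - B @ coef
    return float(resid @ resid)

def local_search(start, x, y, step=0.1, min_step=1e-3):
    """Compass search: move one knot at a time, halve the step when stuck."""
    knots = np.sort(np.asarray(start, dtype=float))
    best = rss_for_knots(knots, x, y)
    while step > min_step:
        improved = False
        for i in range(len(knots)):
            for delta in (-step, step):
                trial = knots.copy()
                trial[i] = np.clip(trial[i] + delta, x.min(), x.max())
                trial.sort()
                value = rss_for_knots(trial, x, y)
                if value < best:                    # accept strict improvements only
                    knots, best, improved = trial, value, True
        if not improved:
            step /= 2.0
    return knots, best

def multistart(x, y, n_knots, n_starts, seed=0):
    """Run the local search from random knot vectors and keep the smallest result."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(n_starts):
        start = rng.uniform(x.min(), x.max(), size=n_knots)
        results.append(local_search(start, x, y))
    return min(results, key=lambda r: r[1]), [r[1] for r in results]

# Artificial data with two kinks (hypothetical, for illustration only).
x = np.linspace(0.0, 1.0, 101)
y = np.abs(x - 0.3) + 0.5 * np.abs(x - 0.7) + 0.05 * np.sin(12.0 * x)

(best_knots, best_value), values = multistart(x, y, n_knots=2, n_starts=8)
print("local minima found:", sorted(round(v, 6) for v in values))
print("best knots:", best_knots, "best value:", best_value)
```

If the printed local minima differ between starting points, that spread is exactly the behaviour that makes it impossible to be sure that the smallest value found is the global minimum.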

5.4
Jupp (1978) shows that for the unweighted-least-squares case, this process is complicated not only by the existence of numerous stationary points (these may be local minima, saddle points or local maxima) of $zuw_k(x_1, x_2, \dots, x_n)[a,b]$, but also by the slow convergence to a solution of algorithms used to find local minima. The numerical results in Table 1 (¶7.5) suggest that this conclusion holds also for splines fitted using the chi-squared criterion.

5.5
It is clear from the above discussion that, while McCutcheon's suggestion regarding optimisation of fit with respect to knot position appears to be easy to implement, it is not easy to guarantee an optimal fit.

5.6
Splines allow the graduator to approximate any continuous function to any level of precision by increasing the number of knots. This characteristic is desirable for the graduation of a large mortality experience (such as ELT 14) where variances around underlying mortality rates are small and hence relatively small adjustments need to be made to the crude rates. But for the graduation of a mortality experience with relatively small exposure, where the graduator's a-priori notion of the shape of the mortality curve plays an important role in determining the graduated rates, use of free-knot splines will not necessarily give the graduator the required shape of curve. (Although the graduator could adapt the knot positions under the free-knot graduation to obtain the required curve, this would not only defeat the object of the optimisation procedure, but it would require a time-consuming trial-and-error process, and render useless chi-squared tests of the adherence of the curve to the crude rates.)

MCCUTCHEON'S VARIABLE-KNOT SPLINE GRADUATION METHOD
6.1 One of the complications with graduation is that the process often involves many attempts to fit curves and requires the graduator to exercise judgement in deciding which curve is best. McCutcheon's method of variable-knot splines appears almost to remove the need for judgement, saving time, and reducing uncertainty regarding the number of degrees of freedom to use in chi-squared tests of goodness of fit of the graduation (Benjamin & Pollard, 1985). (This objective may be less desirable than it appears. The chi-squared test may not be the most important test of goodness of fit in a graduation, since it does not make full use of the information contained in the deviations between expected and actual deaths. In testing the adherence of graduated rates to a mortality experience, the application of the chi-squared test would normally be supplemented by other tests such as the individual standardised deviations test, the cumulative deviation test, the sign test, Stevens' test, and the binomial test (Benjamin & Pollard, 1985).)

6.2
McCutcheon (1984) proposed the following procedure to select the optimal number $n$ of knots to use in a free-knot cubic spline graduation:
1. Determine $Z_3(n)$ for various values of $n$.
2. Choose the free-knot spline with $n_0$ knots, where $n_0$ is the lowest value of $n$ at which $t\{Z_3(n)\}$ attains a minimum.

McCutcheon's rationale for step 2 is that $t(x) = \sqrt{2x} - \sqrt{2k-1}$ is a test statistic for a $\chi^2$ variable $x$ with $k$ degrees of freedom. (It is true that for large $k$, if $x$ has a chi-squared distribution, $t$ has approximately a normal (0,1) distribution.) The degrees of freedom corresponding to $Z_3(n)$ equal the number of data points (which is $b-a+1$) less the number of parameters fitted (which is $2n+4$). His argument is that $t\{Z_3(n)\}$ has a minimum with respect to $n$, leading to the criterion set out in step 2. He states that 'this procedure for determining the number of knots to be used leads to an acceptable graduation method' (1984: 49).
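The arithmetic of the t-statistic can be sketched as below. The function names are hypothetical, the chi-squared values are invented for illustration and are not ELT 14 results, and the selection rule coded here (take the lowest number of knots after which the statistic first increases) is one reading of step 2, assumed for the purpose of the example.

```python
import math

def t_statistic(chi2, n_ages, n_knots):
    """sqrt(2 * chi2) - sqrt(2 * dof - 1), with dof = n_ages - (2 * n_knots + 4)."""
    dof = n_ages - (2 * n_knots + 4)      # cubic free-knot spline: 2n + 4 parameters
    if dof < 1:
        raise ValueError("no degrees of freedom left")
    return math.sqrt(2.0 * chi2) - math.sqrt(2.0 * dof - 1.0)

def first_minimum(chi2_by_knots, n_ages):
    """Lowest n whose t-statistic is smaller than that of the next n (assumed rule)."""
    ns = sorted(chi2_by_knots)
    ts = {n: t_statistic(chi2_by_knots[n], n_ages, n) for n in ns}
    for n, nxt in zip(ns, ns[1:]):
        if ts[nxt] > ts[n]:
            return n, ts
    return ns[-1], ts                      # no increase seen in the range examined

# Invented chi-squared values for 8 to 11 knots over 100 ages.
example = {8: 180.0, 9: 150.0, 10: 140.0, 11: 139.0}
n0, ts = first_minimum(example, n_ages=100)
for n in sorted(ts):
    print(n, round(ts[n], 3))
print("selected number of knots:", n0)
```

Because each chi-squared value fed into this rule is itself the output of a minimisation that may have stopped at a local minimum, an apparent increase in the statistic need not reflect a genuine minimum.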

COMPARISON WITH ELT 14 RESULTS
7.1 ELT 14 is based on the mortality experience of the entire population of England and Wales for the period from 1980 to 1982. Central mortality rates $m_x$ for each gender and age were derived from the experience and graduated using variable-knot splines. Mortality probabilities $q_x$ were then calculated from the graduated rates.

7.2
This section of the paper compares the results obtained applying a multistart algorithm to free-knot spline graduation of ELT 14 male data with those reported by the Office of Population Censuses and Surveys (1987).

7.3
Since the t-statistic used in variable-knot spline graduation tests how significantly the graduated rates deviate from the crude rates, a crude justification for the method is that the graduation with the smallest t-statistic is the best from this point of view. However, the method appears to be flawed in two important respects.

7.4
Firstly, because there is no guarantee that the minimisation algorithm has produced the minimum value $Z_3(n)$, it is impossible to tell whether the reason for an increase in the calculated value of $t\{Z_3(n)\}$ is that the minimum value of $t$ has been attained or that the minimisation algorithm has not achieved the minimum value $Z_3(n)$.

7.5
Secondly, as Table 1 shows, it appears that the minimum in $t\{Z_3(n)\}$ may be attained for a much larger value of $n$ than McCutcheon envisaged. If the t-statistic attains a minimum value for some number of knots over 21, it is clearly not of much use, since even the 21-knot graduation, which involves fitting 46 parameters, is not a parsimonious model. (Indeed, it could be argued that a free-knot cubic-spline graduation employing more than 11 knots is not a parsimonious model.)

7.6
The results in Table 1 were obtained using FORTRAN programs calling the 'black box' numerical optimisation and spline-fitting routines of the Numerical Algorithms Group.

7.7
For an 8-knot spline, the minimum chi-squared values resulting from running the minimisation routine 20 times on groups of 50 random starting knots were collected. A similar procedure was followed for 9- to 15-knot splines. The histograms of the resulting distribution of values for each group within each number of knots are shown in the Appendix.

7.8
For 8, 9, and 10 knots, there is a bunching of chi-squared results for each group around the smallest chi-squared values in the range. Although this does not prove that the values being obtained are close to an absolute minimum, it is consistent with the behaviour of a multistart minimisation algorithm near an absolute minimum. This may be taken as evidence that the lowest chi-squared values within each group for a given number of knots may be close to the absolute minimum.

7.9
For 11 knots onwards the peak in the distribution is never near the lowest chi-squared values in the range. Hence for splines with 11 or more knots, the chi-squared values at the bottom of the range are less likely to be close to the absolute minimum. Larger groups of random starting points are apparently required to locate a likely range for the absolute minimum as the number of knots increases.

7.10
It appears that the reason for the use of a ten-knot spline in the published graduation of ELT 14 was that, as shown in Table 1, the lowest t-statistic of 6,35 is attained for ten knots. This appears not to be optimal. Nevertheless, the graduation of ELT 14 appears to have been of a high quality.
7.11 A question arises as to the number of random starting points required to be able to state, for a given number of knots, that there is, say, a 95% probability that the result given by the multistart algorithm is in fact the absolute minimum. This might be addressed by the following Monte-Carlo procedure for each number k of knots:
1. Choose a block size b (say 100) for the number of random starting points.
2. Optimise for each block, recording the least chi-squared value for that block.
3. Repeat step 2 a large number of times (say 1000 times) to obtain the distribution of the least chi-squared values.
4. Determine the proportion p of blocks for which the chi-squared value equals the least achieved for the entire set of 1000b simulations.
5. Repeat steps 2 to 4 for larger or smaller b until p equals 95%.
6. Repeat steps 1 to 5 for other values of k as required.
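Steps 1 to 4 of this procedure can be sketched as follows. The sketch does not fit any splines: the function `stand_in_local_minimum` is a hypothetical replacement for one run of the optimiser from a random starting point, returning one of three invented local-minimum values with invented probabilities, so that only the block logic is shown.

```python
import random

def stand_in_local_minimum(rng):
    """Hypothetical stand-in for one optimiser run from a random start."""
    values = [10.0, 12.0, 15.0]            # invented local minima
    probabilities = [0.05, 0.45, 0.50]     # invented chances of reaching each
    return rng.choices(values, weights=probabilities, k=1)[0]

def proportion_reaching_best(block_size, n_blocks, seed=0):
    """Share of blocks whose least value equals the least value over all blocks."""
    rng = random.Random(seed)
    block_minima = [
        min(stand_in_local_minimum(rng) for _ in range(block_size))
        for _ in range(n_blocks)
    ]
    overall_least = min(block_minima)      # least value over the whole simulation
    hits = sum(1 for value in block_minima if value == overall_least)
    return hits / n_blocks

p_small = proportion_reaching_best(block_size=5, n_blocks=400)
p_large = proportion_reaching_best(block_size=100, n_blocks=400)
print("p for blocks of 5 starts:  ", p_small)
print("p for blocks of 100 starts:", p_large)
```

Step 5 would repeat the calculation over a range of block sizes until p is close to 95%. With a real optimiser each call to the stand-in function becomes a full minimisation, which is the source of the computational cost noted in the text.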
7.12 This procedure requires a very large number of simulations. It is also difficult to automate fully. It is therefore computationally expensive and time-consuming to implement. It is not definitive, since the minimum attained in step 4 is not guaranteed to be the absolute minimum. Finally, the results are likely to depend on the nature of the numerical optimisation procedure used. Because of these difficulties, this question has not been considered as part of this study.
7.13 The practical consequences of the choice of a sub-optimal number of knots are another area that this paper leaves to further research.

8.1
The method of variable-knot splines is difficult to implement in practice. As a result, some of the decisions in the ELT 14 graduation were based on an insufficient number of random starting points. Free-knot splines may still be used for graduation, but the graduator can never know how close the fit is to the absolute-minimum-chi-squared fit for that number of knots. The larger the number of random starting points used in conjunction with the minimisation algorithm, the more likely it will be that the graduation will be close to the minimum. Further study is required to determine how the number of starting points must vary with the number of knots to achieve a particular level of confidence (for example 95%) that the absolute minimum value has been obtained.

8.2
The selection of too many knots will result in graduated rates that are not sufficiently smooth, while the selection of too few knots will result in graduated rates that do not adhere adequately to the crude rates. Because McCutcheon's criterion for selecting the number of knots appears not to identify an optimum, an alternative criterion for choosing a number of knots needs to be formulated. Further work is also required to determine how well an alternative criterion, such as the Akaike information criterion or the likelihood ratio, performs for deciding on a number of knots.

8.3
In addition, free-knot splines do not allow adequately for incorporation of the judgement of the graduator of mortality experiences with small exposures. Even for large experiences, the graduator may be required to make adjustments to the rates at the oldest ages because of the difficulty of incorporating a-priori information regarding the shape of the curve. This is unsatisfactory because it reduces the reliability of statistical tests of adherence to data, and requires time-consuming trial-and-error fitting of a range of alternative curves.

8.4
It is suggested that, for mortality graduation, the method of free-knot splines be used with caution, and that the method of variable-knot splines as described in McCutcheon's paper be avoided.