Forecasting mortality of non-extinct cohorts with the penalized composite link model

Mortality forecasting has crucial implications for insurance and pension policies. A large amount of literature has proposed models to forecast mortality using cross-sectional (period) data instead of longitudinal (cohort) data. As a consequence, decisions are generally based on period life tables and summary measures such as period life expectancy, which reflect hypothetical mortality rather than the mortality actually experienced by a cohort. This study introduces a novel method to forecast cohort mortality and the cohort life expectancy of non-extinct cohorts. The intent is to complete the mortality profile of cohorts born up to 1960. The proposed method is based on the penalized composite link model for ungrouping data. The performance of the method is investigated using cohort mortality data retrieved from the Human Mortality Database for England & Wales, Sweden, and Switzerland for male and female populations. © 2020 The Author(s). Published by Elsevier B.V. on behalf of International Institute of Forecasters. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
Mortality forecasts are, in almost all cases, based on period mortality information. These forecasts are calculated using period life tables, constructed under the following assumption: what if a hypothetical cohort were subject to the mortality conditions of that specific time period, normally one year (Preston, Heuveline, & Guillot, 2001). This shows the probability of death at a given point in time for people of different ages. A clear limitation of this approach is that people are not living in one time period but instead are members of a birth cohort that is aging one year every year. The period perspective does not reflect any real population and potentially ignores important connections within a birth cohort. For example, it has been argued that conditions in a person's childhood matter for the mortality experienced at more advanced ages (Barker, 2004; Elo & Preston, 1992), and it has been clearly demonstrated that smoking behaviors have an impact on death rates later in life (Doll & Hill, 1950; Doyle, Dawber, Kannel, Heslin, & Kahn, 1962; Janssen & Kunst, 2005; Preston & Wang, 2006). Long-term effects are potentially shared across a specific birth cohort; therefore, cohort life tables are often considered to be more informative than period life tables.
Cohort forecasting has been less common due to the heavy data demand it requires. To have mortality data on one complete cohort, one must wait for over a century, and the "youngest" cohort with completed mortality might provide outdated information. Because cohort life tables represent the real mortality schedule for a group of individuals, they provide relevant information for life tables users, including life insurance and pension companies. A pension company is better off knowing the actual future life expectancy of its customers than the period life expectancies. Likewise, public institutions that plan health care, public pensions, and so forth, are essentially interested in the real future life time of people living in a society. In order for a cohort life table to be useful to these life table users, forecasting methods that can complete recent cohorts that are not yet extinct are required.
Mortality forecasts are typically obtained by extrapolating period death rates, which are used to calculate life expectancy forecasts. The most widely used forecasting method in countries with reliable vital statistics data is the model by Lee and Carter (1992). It summarizes death rates on a logarithmic scale using principal component techniques and linearly extrapolates the associated time index. The Lee-Carter model has been the inspiration for various modifications and extensions, including Booth, Maindonald, and Smith (2002), Hyndman, Booth, and Yasmeen (2013), Lee and Miller (2001) and Renshaw and Haberman (2006). The model is used by statistical offices to produce official mortality forecasts (Stoeldraijer, van Duin, van Wissen, & Janssen, 2013). Instead of using period death rates, Cairns, Blake, and Dowd (2006) suggested using another period life table measure, the probability of death, which can be linearly extrapolated after a logit transformation. More recently, the life table death distribution has been suggested as another useful measure when forecasting period mortality (Bergeron-Boucher, Canudas-Romo, Oeppen, & Vaupel, 2017; Bergeron-Boucher, Simonacci, Oeppen, & Gallo, 2018; Kjaergaard, Ergemen, Kallestrup-Lamb, Oeppen, & Lindahl-Jacobsen, 2019; Pascariu, Lenart, & Canudas-Romo, 2019). Common to all these mortality models is that period life table information is used.
Forecasts accounting for a cohort effect have been estimated based on age-period-cohort (APC) models. For example, Renshaw and Haberman (2006) provided an extension of the Lee-Carter model and Cairns et al. (2009) an extension of the Cairns et al. (2006) model to produce APC forecasts of mortality. These models, which have been shown to improve the fit to mortality data, provide period forecasts. They are also subject to limitations such as identifiability problems, which are addressed with parameter constraints, and independence is assumed between the period and the cohort effects even though they can be correlated (Currie, 2012). Cohort (age-cohort) forecasting has rarely been implemented. Such forecasts can be performed by applying parametrization functions (e.g., Heligman-Pollard or Siler models) to incomplete cohort data to estimate mortality after the age of truncation (Booth & Tickle, 2008). However, applying these models to incomplete data can lead to estimation problems. In an extensive literature review of forecasting models, Booth and Tickle (2008) found that only the Continuous Mortality Investigation Bureau (2006) achieved a true cohort forecast, using the P-spline regression method from Currie, Durban, and Eilers (2004) to smooth age-cohort mortality. Later, Chiou and Müller (2009) explored the use of functional data analysis to model cohort life tables and derive mortality forecasts, and Basellini, Kjaergaard, and Camarda (2019) used an age-at-death distribution model which segments the death distribution. To the knowledge of the authors, these latter three are the only applications of cohort data used for forecasting.
In this article, we suggest a novel method to forecast mortality using cohort information from a cohort life table. We aim to complete the mortality trajectories for non-extinct cohorts with a penalized composite link model (PCLM) (Eilers, 2007). Rizzi, Gampe, and Eilers (2015) previously used the model to estimate detailed age-at-death distributions from coarsely grouped death counts, and the model is further extended in the analysis here. The PCLM has proven useful for ungrouping data aggregated in an open-age interval (e.g., 85+). The number of survivors from a given birth cohort at the age of truncation can be considered as a coarsely grouped death count not yet observed. The remaining deaths can be ungrouped (forecast) by age using the PCLM, and the cohort mortality can be completed. The forecast is based on the death distribution of a cohort life table. This makes it possible to exploit several observable features: the death distribution sums to the radix of the life table, a well-defined local maximum of the distribution can be found at high ages, and the number of deaths after the age of 120 is very close to zero or equal to zero. We exploit all three features in our model to allocate remaining deaths for cohorts that are still alive.

Cohort death distributions
The cohort death distribution is the basis of the proposed method for forecasting cohort mortality. Standard life table techniques (Preston et al., 2001) are used to calculate death distributions for all complete and incomplete cohorts, assuming an initial population size, conventionally set at $N = 100{,}000$ and known as the radix of the life table. The death distribution is denoted by the matrix $d_{x,c}$, where $x$ stands for ages $x \in (0, 1, \ldots, 110)$ and $c$ for the specific birth cohort.
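As a minimal sketch of the life table step described above, the following Python snippet converts single-age death rates into a death distribution that sums to the radix. The function name `death_distribution`, the example rates, and the assumption that deaths occur on average in the middle of each age interval are illustrative choices, not details taken from the study.

```python
def death_distribution(rates, radix=100_000.0):
    """Convert single-age death rates m_x into life table deaths d_x.

    Assumption (not from the study): deaths occur mid-interval on average,
    so q_x = m_x / (1 + 0.5 * m_x). The last age group is treated as open
    and closed with q = 1, so that all survivors eventually die.
    """
    survivors = radix
    deaths = []
    for index, rate in enumerate(rates):
        is_last = index == len(rates) - 1
        prob_death = 1.0 if is_last else rate / (1.0 + 0.5 * rate)
        deaths_at_age = survivors * prob_death
        deaths.append(deaths_at_age)
        survivors -= deaths_at_age
    return deaths


# Hypothetical death rates for five age groups (illustration only).
example_rates = [0.05, 0.01, 0.02, 0.10, 0.50]
example_deaths = death_distribution(example_rates)
```

By construction, the returned deaths add up to the radix, which is the property the forecasting method relies on.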
We divide $d_{x,c}$ into three time windows that are treated differently. First, a time window denoted by T1 consists of all complete cohort death distributions. For observed data, we define complete cohorts by a maximum life length of 110 years, following standard conventions similar to, for example, Wilmoth et al. (2017). A small fraction of the population lives past the age of 110, but closing the life table at 110 years is of minor importance for the studied populations and does not affect the results. The T1 time frame ends with the cohort born in 1905. The second time frame is denoted by T2 and consists of incomplete cohorts until the birth cohort 1935. Here, the complete death distribution of the cohorts not yet extinct is estimated with the PCLM for ungrouping (Rizzi et al., 2015). The birth cohort 1935 is the final cohort where the PCLM can determine a correct local maximum of the death distribution at adult ages for the studied populations. The local maximum at adult ages is referred to as the number of deaths at the mode, and the corresponding age as the modal age at death (Canudas-Romo, 2010). Any cohort in T2 consists of a part, $d_{0:A,c}$, with observed deaths from age zero to the last observed age ($A$) and a part, $d_{A+,c}$, which is unobserved and grouped for all ages above $A$. Because the number of deaths over all ages, by construction, sums to the radix of the life table, it is possible to determine the unobserved part using the equation
\[ d_{A+,c} = N - \sum_{x=0}^{A} d_{x,c}. \]
The unobserved deaths remain in the sense that the people are still alive but will eventually die. Hence, a forecast of the mortality for each birth cohort can be found by allocating the remaining deaths in a way similar to an ungrouping problem for right-censored data, which can be performed accurately for high ages (Rizzi et al., 2015). The third time window, T3, consists of all cohorts born after 1935.
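The bookkeeping behind the unobserved part can be sketched in a few lines of Python. The function name `remaining_deaths` and the observed counts are hypothetical; the only property used is the one stated above, namely that all deaths of a cohort sum to the radix.

```python
def remaining_deaths(observed_deaths, radix=100_000.0):
    """Deaths not yet observed for a cohort: radix minus observed deaths."""
    total_observed = sum(observed_deaths)
    if total_observed > radix:
        raise ValueError("observed deaths exceed the radix of the life table")
    return radix - total_observed


# Hypothetical observed life table deaths at ages 0..3 (illustration only).
observed_example = [4000.0, 500.0, 300.0, 1200.0]
open_interval_deaths = remaining_deaths(observed_example)
```

The resulting count plays the role of the coarsely grouped open interval that the PCLM subsequently redistributes over single ages.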
In this setting, the complete death distribution of the incomplete young cohorts is estimated with an extended version of the PCLM for ungrouping by augmenting the input data: estimates of the local maximum of the death distribution, the corresponding modal age at death, and the proportion of deaths after the mode are all incorporated into the PCLM. Note that the separation of T2 and T3 is based on the selected populations and might differ for other populations than those analyzed in this study. Fig. 1 illustrates the three time windows and the corresponding analysis strategy.

PCLM used on T2 cohorts
The aim of the PCLM is to estimate a complete cohort death distribution from an incomplete life table; i.e., from a cohort that is not yet extinct. Rizzi et al. (2015) show that a PCLM can be formulated so that it relates age-specific death counts from a latent but expected sequence of death counts, $\gamma = (\gamma_j)$, with $j = 0, \ldots, J + 1$, to an observed sequence of death counts $d = (d_i)$ with $i = 0, \ldots, I + 1$, where $I < J$ gives the number of age intervals; i.e., the length of all the observed single ages plus an open interval, which includes the remaining deaths of a cohort not yet extinct. The sum of the latent sequence $\gamma$ equals the sum of the observed death counts $d$. The observed death counts for each cohort are assumed to be realizations of a random variable $X_i$ that follows a Poisson distribution with the expected value $\mu_i = E(X_i)$, so that $\mu_i$ measures the expected number of counts from a realization of $X_i$. Hence, the age-specific death count is observed with the probability
\[ P(X_i = d_i) = \frac{\mu_i^{d_i} e^{-\mu_i}}{d_i!}. \]
Latent death counts in a population of size $N$ equal the age-specific probability of death $p_j$ times the population size, $\gamma_j = N p_j$. In other words, the vector $\gamma$ represents the complete cohort death distribution of expected means that we aim to estimate from the observed incomplete cohort death distribution $d$. We assume that $\gamma$ and $\mu$ are linearly related by a composition matrix $C$ with dimension $(I + 1) \times (J + 1)$ such that $\mu = C\gamma$. The composition matrix $C$ is a 0/1 matrix that describes how the latent sequence $\gamma$ was summed before generating the data.
To estimate the complete death distribution of the cohorts not yet extinct in T2, the PCLM for ungrouping redistributes the sum of the remaining deaths on a fine grid of single ages until age 130, under the assumption that $\gamma$ is smooth. An interval of zero death counts at the ages 120-130 is added to provide sufficient flexibility at high ages. Extending with a zero death count interval ensures that the estimated deaths after age 120 in the death distribution forecasts are very close to zero. Results from the PCLM are aggregated for ages 110 to 130. Estimates up to age 110 with the residual category 110+ are used coherently with the input data. Hence, the vector of observed deaths will be equal to
\[ d = (d_0, d_1, \ldots, d_A, d_{A+}, 0)^{\top}. \]
The corresponding composition matrix $C$ is equal to the identity matrix for the first $A$ ages in which we observe deaths, since $\gamma_j$ and $\mu_i$ are related one to one. For the unobserved and zero parts, $C$ equals sequences of ones, as $d_{A+}$ should be distributed to all ages from $A + 1$ to 130, as follows (rows are labeled by the corresponding element of $d$):
\[
C = \begin{array}{c|ccccccc}
d_0 & 1 & 0 & 0 & \cdots & 0 & \cdots & 0 \\
d_1 & 0 & 1 & 0 & \cdots & 0 & \cdots & 0 \\
d_2 & 0 & 0 & 1 & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & & & \ddots & & & \vdots
\end{array}
\]
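The structure of the composition matrix for a T2 cohort can be illustrated with a short NumPy sketch. The function name `composition_matrix_t2` is hypothetical, and the exact split between the remaining-deaths row and the zero-count row (age 121 here) is an assumption made for illustration; the text only states that a zero interval is appended at the highest ages up to 130.

```python
import numpy as np


def composition_matrix_t2(last_age, zero_from=121, max_age=130):
    """Build a 0/1 composition matrix for a cohort observed up to last_age.

    Rows: one per observed single age 0..last_age, one for the grouped
    remaining deaths, and one for the appended zero-count interval.
    Assumption: the zero interval starts at zero_from (illustrative choice).
    """
    n_latent = max_age + 1            # latent single ages 0..max_age
    n_observed = last_age + 1         # observed single ages 0..last_age
    C = np.zeros((n_observed + 2, n_latent))
    C[:n_observed, :n_observed] = np.eye(n_observed)  # one-to-one part
    C[n_observed, n_observed:zero_from] = 1.0         # remaining deaths
    C[n_observed + 1, zero_from:] = 1.0               # zero-count interval
    return C


C_t2 = composition_matrix_t2(last_age=80)
```

Each latent age is assigned to exactly one row, so every column of the matrix sums to one and the latent deaths add up to the observed total.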
For further details on the PCLM for ungrouping, see Rizzi et al. (2015).

PCLM used on T3 cohorts
For cohorts younger than the birth cohort of 1935, the PCLM is not able to determine a reliable forecast of the death distribution without further assumptions: in particular, the procedure fails to estimate the modal age at death. The PCLM relies on the modest assumptions that death counts are Poisson distributed and that the resulting complete death distribution is smooth, which makes it less suitable for redistributing deaths over a high number of ages without further restrictions. Hence, we augment the matrix of observed deaths $d_{x,c}$ with forecasts of the deaths at the mode $d_{M,c}$, of the modal age at death $M$, and of the number of deaths after the mode $d_{M+,c}$. Information on the structure of the death distribution is thereby given to the PCLM, making it possible to efficiently allocate deaths over a long age range. Augmenting the data by $d_{M,c}$ and $M$ leaves two intervals of deaths to be distributed: one with deaths before $M$ ($d_{M-,c}$) and one after ($d_{M+,c}$). Estimates of these are found by forecasting the proportion of deaths after the mode and, from that, by calculating $d_{M+,c}$ and $d_{M-,c}$, again using the property that all deaths sum to the radix. The corresponding composition matrix is given in Box I.

Box I. Composition matrix $C$ for T3 cohorts (rows are labeled by the corresponding observed death count where the label is recoverable):
\[
C = \begin{array}{c|cccccccc}
d_0 & 1 & 0 & 0 & \cdots & 0 & \cdots & \cdots & 0 \\
d_1 & 0 & 1 & 0 & \cdots & 0 & \cdots & \cdots & 0 \\
d_2 & 0 & 0 & 1 & \cdots & 0 & \cdots & \cdots & 0 \\
\vdots & \vdots & & & \ddots & & & & \vdots \\
d_A & 0 & 0 & 0 & \cdots & 1 & 0 & \cdots & 0 \\
\vdots & \cdots & 0 & 1 & \cdots & 1 & 0 & \cdots & 0 \\
 & 0 & \cdots & \cdots & \cdots & \cdots & 0 & 1 \cdots & 1
\end{array}
\]

For the populations considered in this study, $d_{M,c}$ and $M$ have shown a highly linearly increasing pattern over many years; see Fig. 2, panels a) and b).
The proportion of deaths after the mode also shows an increasing pattern, but it is less pronounced than in the two other time series. The three time series are forecast using a state space model; more precisely, a structural time series model known as the local linear trend (LLT) model (Durbin & Koopman, 2001). The LLT model allows an explicit estimation of both trend and local components for long- and short-term fluctuations, and it is in general more flexible than the ARIMA models that are traditionally used (Durbin & Koopman, 2001). The model allows for a stochastic trend that can capture an increasing and trending pattern. A detailed description of the LLT model is provided in the Supplementary Material. Residuals of the models are shown in the Supplementary Material and indicate that the models provide a reasonably good fit with stationary residuals.
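To make the forecasting step more concrete, the following NumPy sketch runs a Kalman filter for a local linear trend model and extrapolates the filtered level and slope. It is an illustration under stated assumptions rather than the authors' implementation: the function name `llt_forecast`, the fixed variance values, and the vague initial state are all hypothetical, whereas in practice the variances would be estimated (for example by maximum likelihood).

```python
import numpy as np


def llt_forecast(y, horizon, var_obs=1.0, var_level=0.1, var_slope=0.01):
    """Forecast a series with a local linear trend model.

    State: (level, slope). Observation: y_t = level_t + noise.
    Assumption: variances are fixed by hand here instead of being estimated.
    """
    T = np.array([[1.0, 1.0], [0.0, 1.0]])   # level += slope; slope persists
    Z = np.array([1.0, 0.0])                 # only the level is observed
    Q = np.diag([var_level, var_slope])      # state noise covariance
    state = np.array([y[0], 0.0])            # predicted state for first obs
    P = np.eye(2) * 1e6                      # vague initial uncertainty
    for value in y:
        innovation = value - Z @ state
        F = Z @ P @ Z + var_obs              # innovation variance
        gain = P @ Z / F
        state = state + gain * innovation    # filtered state
        P = P - np.outer(gain, Z @ P)        # filtered covariance
        state = T @ state                    # predict next state
        P = T @ P @ T.T + Q
    forecasts = []
    for _ in range(horizon):
        forecasts.append(float(Z @ state))   # forecast of the next level
        state = T @ state
    return forecasts


# Hypothetical modal ages rising by 0.2 years per cohort (illustration only).
modal_age_series = [80.0 + 0.2 * t for t in range(30)]
modal_age_forecast = llt_forecast(modal_age_series, horizon=3)
```

For a series that is exactly linear, the filtered slope converges to the true slope and the forecasts continue the line, which matches the behavior described for strongly trending series.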
An increasing pattern is most pronounced for the modal age and the number of deaths at the modal age, but an increasing pattern can also be found for the proportion of deaths after the mode for some of the considered populations. For the populations with a less increasing pattern, the slope component (modeling the trending behavior) will be close to zero and not affect the forecast. The proportion of deaths after the mode cannot trend upwards in a long horizon because this will collapse the distribution. Thus, we recommend that the model is reestimated when new data are available so that new data patterns can be modeled. Furthermore, very long-term forecasts should be interpreted with caution. Fig. 2 shows both the observed time series, fitted (smoothed with the Kalman smoother), and forecast values using the LLT model for Swedish females. The number of deaths at the mode shows little high-frequency variation compared to the long-term trending behaviour, and thus the fitted values are close to the observed values. The LLT model fits the time series well without any notable signs of autocorrelation in the residuals. See the Supplementary Material for further details and plots of the residuals.
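Once forecasts of the modal age, the deaths at the mode, and the proportion of deaths after the mode are available, the augmented input for a T3 cohort can be assembled. The sketch below is one possible reading of that step: the function name `augment_t3` is hypothetical, the proportion after the mode is assumed to refer to the share of all deaths (the radix), and the zero interval is assumed to start at age 121.

```python
import numpy as np


def augment_t3(observed, mode_age, deaths_at_mode, prop_after_mode,
               radix=100_000.0, zero_from=121, max_age=130):
    """Build the augmented death vector and composition matrix for T3.

    Assumptions (illustrative): prop_after_mode is a share of the radix,
    and the appended zero-count interval covers ages zero_from..max_age.
    """
    n_observed = len(observed)                 # ages 0..A
    if mode_age <= n_observed:
        raise ValueError("mode must lie above the first unobserved age")
    deaths_after = prop_after_mode * radix
    # Deaths between the last observed age and the mode follow from the
    # property that all deaths sum to the radix.
    deaths_before = radix - sum(observed) - deaths_at_mode - deaths_after
    if deaths_before < 0:
        raise ValueError("forecast inputs exceed the radix")
    d = np.array(list(observed) +
                 [deaths_before, deaths_at_mode, deaths_after, 0.0])
    C = np.zeros((n_observed + 4, max_age + 1))
    C[:n_observed, :n_observed] = np.eye(n_observed)   # observed ages
    C[n_observed, n_observed:mode_age] = 1.0           # ages A+1..M-1
    C[n_observed + 1, mode_age] = 1.0                  # age M
    C[n_observed + 2, mode_age + 1:zero_from] = 1.0    # ages above M
    C[n_observed + 3, zero_from:] = 1.0                # zero-count interval
    return d, C


# Hypothetical cohort observed at ages 0..56 (illustration only).
observed_t3 = [100.0] * 57
d_t3, C_t3 = augment_t3(observed_t3, mode_age=92,
                        deaths_at_mode=4500.0, prop_after_mode=0.35)
```

The augmented vector again sums to the radix, and each latent age belongs to exactly one row of the composition matrix.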

Estimation of the PCLM model
For both time windows T2 and T3, the PCLM can be estimated by maximizing a penalized log-likelihood function. To avoid negative death counts, $\gamma$ is assumed to follow the functional form $\gamma = e^{X\beta}$, and thus the maximization process determines the $\beta$ values (Rizzi et al., 2015). A roughness penalty is imposed to ensure smooth death distribution forecasts. We follow the method proposed by Eilers (2007) and use a roughness penalty with an order of difference of two or three. The second-order difference can be written as
\[ \Delta^2 \beta = D_2 \beta, \]
where $D_2$ is a second-order difference matrix. A third-order difference penalty, $D_3$, can be implemented by taking the third-order differences of $\beta$ instead of the second.
We select the order of differences with the out-of-sample procedure described in Section 3.1.1, choosing the order that provides the lowest RMSE.
The penalty is multiplied by the weight, $\lambda$, which determines the impact of the penalty and thereby how smooth the resulting death distribution should be. $\lambda$ is selected by a grid search over different values; the optimal $\lambda$ is the one that returns the lowest AIC value. An additional weight is imposed for T3 to ensure that the estimated death distribution is close to the forecasts for $d_{M-}$, $d_{M+}$, and $d_M$, and close to the zero death count at age 121+. $V = \mathrm{diag}(1, 10^2, 10^8, 10^3, 10^3)$ is the diagonal matrix with weights, where the first $A + 1$ elements, equal to 1, refer to the observed deaths up to age $A$. In summary, the penalized log-likelihood function for a given cohort $c$, Eq. (1), combines the weighted Poisson log-likelihood of the observed and augmented death counts with the roughness penalty. Eilers (2007) showed that the maximization can be solved by an iteratively reweighted least squares (IRWLS) algorithm. Maximizing Eq. (1) leads to a system of equations in which the matrix $\tilde{X}$ has elements $\tilde{x}_{ik} = \sum_j c_{ij} x_{jk} \gamma_j / \mu_i$ and can be considered a 'working $X$' in the IRWLS algorithm, and $\tilde{W} = \mathrm{diag}(\tilde{\mu})$. The tilde indicates the current values in the algorithm. See Eilers (2007) for further details on the IRWLS. All computations have been performed with the software R (R Core Team, 2018). Demo R code can be found in the Supplementary Material.
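The estimation loop can be sketched in Python with NumPy. This is a minimal illustration of a penalized composite link fit by IRWLS in the spirit of Eilers (2007), not the authors' R code: the function name `pclm`, the identity basis for $X$, the way the optional weights enter the system, the flat starting values, and the toy data are all assumptions.

```python
import numpy as np


def pclm(d, C, lam=10.0, order=2, weights=None, max_iter=200, tol=1e-8):
    """Penalized composite link model fitted by IRWLS (illustrative sketch).

    Assumptions: X is the identity matrix, so gamma = exp(beta), and the
    optional weights multiply each observation's likelihood contribution.
    """
    d = np.asarray(d, dtype=float)
    n_obs, n_latent = C.shape
    X = np.eye(n_latent)
    D = np.diff(np.eye(n_latent), n=order, axis=0)   # difference matrix
    penalty = lam * D.T @ D
    v = np.ones(n_obs) if weights is None else np.asarray(weights, float)
    beta = np.full(n_latent, np.log(d.sum() / n_latent))  # flat start
    for _ in range(max_iter):
        gamma = np.exp(X @ beta)                     # latent deaths
        mu = C @ gamma                               # expected observed deaths
        # Working X with elements sum_j c_ij x_jk gamma_j / mu_i.
        X_work = (C * gamma) @ X / mu[:, None]
        w = v * mu
        lhs = X_work.T @ (w[:, None] * X_work) + penalty
        rhs = X_work.T @ (v * (d - mu) + w * (X_work @ beta))
        beta_new = np.linalg.solve(lhs, rhs)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return np.exp(X @ beta)


# Toy example: 30 latent ages, the first 20 observed, the last 10 grouped.
toy_ages = np.arange(30)
toy_gamma = 1000.0 * np.exp(-0.5 * ((toy_ages - 18) / 4.0) ** 2) + 5.0
toy_C = np.zeros((21, 30))
toy_C[:20, :20] = np.eye(20)
toy_C[20, 20:] = 1.0
toy_d = toy_C @ toy_gamma
toy_fit = pclm(toy_d, toy_C, lam=10.0)
```

In this unweighted setting, the fitted latent deaths add up to the observed total at convergence, so the grouped count is redistributed rather than changed. Selecting `lam` by a grid search over an information criterion, as described above, would wrap this function in an outer loop.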

Comparison with the two-dimensional P-spline method
P-spline methods are well established for smoothing mortality rates (Camarda, 2019; Currie et al., 2004). In the two-dimensional setting used for forecasting cohort mortality by the Continuous Mortality Investigation Bureau (2006), death and exposure counts are modeled assuming that the deaths follow a Poisson process. B-splines are used as a regression basis in a penalized IRWLS algorithm. Death counts and exposures are considered as the response and the offset, respectively. An additional penalty enforces smoothness over both age and time. The forecast is treated as a missing value problem that assumes no information for future years (Camarda, 2012; Currie et al., 2004). The main problem with the two-dimensional spline method is its lack of robustness when used for forecasting. Although the method can be very useful and produces accurate forecasts over a short time horizon, it often fails to produce reliable and robust long-term forecasts (Camarda, 2019). We find that the two-dimensional spline method (without further restrictions) is less useful for long-term forecasting because of these robustness problems.
The PCLM presented in this article is one-dimensional (in age) in the sense that each cohort is estimated independently of the others. However, the time dimension is modeled via the augmented data, in which information on the modal age at death is forecast over time and then imposed on each cohort. The PCLM therefore differs from the two-dimensional spline method by using the death distribution as an input variable instead of death rates, and by incorporating the time dimension by augmenting the data instead of specifying age and time parameters.

Data
Data from three populations, England & Wales, Sweden, and Switzerland, are used to illustrate the fit and forecast of the model. These populations have been chosen because they have high-quality data and sufficiently long data series of deaths and exposures. Actual cohort death rates and exposures have been downloaded from the Human Mortality Database (HMD) (Human Mortality Database, 2019). For females and males in Sweden and England & Wales, we start with the birth cohort 1860. We begin in 1860 because the data quality for Sweden is considerably lower before 1860 (Glei & Lundström, 2019) and we aim to have equally long time series. Data for Switzerland are only available from the 1876 birth cohort and hence this is the first cohort used for both Swiss males and females. For all countries, data are collected for years up to and including 2016.

Application and results
For each country analyzed, cohort death rates are calculated by dividing cohort death counts by exposures from the HMD. From the cohort death rates, cohort life tables are derived. Cohort life tables are complete up to the birth cohort of 1905 and are abridged for all more recent birth cohorts. From the estimated complete cohort death distribution, cohort life expectancy is calculated to provide information on the average period that a specific birth cohort may be expected to live. For each country, and separately for each incomplete cohort death distribution, the PCLM is estimated. For cohorts in T2, the simple PCLM for ungrouping is applied, while in T3 the extended version of the PCLM with augmented data is used. A second-order difference penalty provided the lowest forecast error for all of the considered populations and thus we use this throughout. Forecast errors for the third-order difference penalty are shown in the Supplementary Material.
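The step from a complete death distribution to life expectancy can be sketched as follows. The function name `life_expectancy` and the assumption that people dying at age x live half a year in that age interval are illustrative choices, not details reported in the study.

```python
def life_expectancy(deaths, from_age=0, mean_fraction=0.5):
    """Remaining life expectancy at from_age from a death distribution.

    Assumption (illustrative): those who die at age x have lived, on
    average, mean_fraction of a year within that age.
    """
    later_deaths = deaths[from_age:]
    survivors = sum(later_deaths)          # people alive at from_age
    if survivors <= 0:
        raise ValueError("no survivors at the requested age")
    years_lived = sum(count * (offset + mean_fraction)
                      for offset, count in enumerate(later_deaths))
    return years_lived / survivors


# Hypothetical three-age death distribution (illustration only).
toy_deaths = [50.0, 0.0, 50.0]
e0_toy = life_expectancy(toy_deaths)
e2_toy = life_expectancy(toy_deaths, from_age=2)
```

Because only deaths at ages at or above `from_age` enter the calculation, life expectancy at older ages depends more heavily on the forecast part of the distribution.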
In this section, we report results for Swedish women only as an illustration. Extensive results by country and gender are presented in the Supplementary Material. Fig. 3 shows the age-at-death distribution of Swedish females by cohort: the blue part of the distributions represents the actual raw data (smoothed with the PCLM), whereas the red part of the distributions shows the estimate of the PCLM that allocates the sum of the remaining cohort deaths in single ages up to age 110. Up to the cohort of 1935, the PCLM successfully redistributes the death counts in the death distribution. For the younger cohorts, the PCLM is fed information on the forecast modal age at death and the number of deaths at the mode, as well as before and after the mode. With such augmented data, the PCLM is proven to be able to estimate cohort death distributions up to the 1960 cohort.
The life expectancy at birth for Swedish females, computed with the estimates of the complete death distributions, is illustrated in Fig. 4. The life expectancy is smoothed on a small scale such that the overall patterns are more visible, and a local polynomial regression fitting smoothing procedure is applied. For both time windows T2 and T3, life expectancy at birth increases over cohorts.

Validation tests leaving out data for T2
The proposed modeling of cohort mortality is validated by an out-of-sample comparison of observed and forecast life expectancies. Because actual life expectancy can only be calculated for complete cohorts, only data until the 1905 birth cohort can be used for validation. Death distribution forecasts are constructed by fitting the PCLM to a restricted part of the data and leaving out 10, 15, 20, 25, and 30 years to be forecast. From the perspective of the cohort, when 10 years are omitted, this means that observations for the 10 oldest ages are left out for validation of the 1905 cohort, the 9 oldest ages for the 1904 cohort, and so on until the 1896 cohort. Life expectancies are then calculated for the cohorts for which data are left out or until no earlier cohort is available. For example, when 10 years are left out from the analysis, life expectancies for cohorts from 1896 to 1905 are computed. A similar procedure is performed, leaving 15, 20, 25, and 30 years out. Observed and forecast life expectancies are compared by calculating the root mean squared error (RMSE):
\[ \mathrm{RMSE} = \sqrt{\frac{1}{H} \sum_{i=1}^{H} \left(\hat{e}_{x,i} - e_{x,i}\right)^2}, \]
where $\hat{e}_{x,i}$ is the life expectancy forecast, $e_{x,i}$ is the one observed, and $H$ is the number of years left out. Results are reported in Table 1 for life expectancies at age 0, 50, and 65 years.
When calculating life expectancy, a combination of in-sample fitted death distributions and out-of-sample death distribution forecasts is used. Because life expectancy at age x only depends on ages higher than or equal to x, the RMSE at age 65 will depend more on out-of-sample forecast values than the RMSE at ages of 0 and 50 years. A similar argument holds for the RMSE at age 50 compared to age 0.
We also measure the forecast bias of the model by calculating the mean forecast error (ME) using a scheme similar to that for the RMSE; that is:
\[ \mathrm{ME} = \frac{1}{H} \sum_{i=1}^{H} \left(\hat{e}_{x,i} - e_{x,i}\right). \tag{4} \]
Table 1 shows the accuracy of the results for all analyzed populations; i.e., England & Wales, Sweden, and Switzerland, for females and males. The PCLM in general forecasts life expectancy accurately. Even when 30 years are left out, the RMSE remains below 1 year. The RMSE for life expectancy increases when the forecast horizon widens, which is to be expected because longer and less certain forecasts are being calculated. The PCLM fits all the analyzed countries equally well, with no systematically lower accuracy found for one particular country or sex. The only difference occurs with the 30-year analysis, for which larger errors are found for the female population, but we have no reason to expect this to be a general tendency. Table 2 shows the ME for all the analyzed populations. The ME shows a similar result to the RMSE in terms of accuracy, and displays no systematic bias in the model, as both negative and positive values are found.
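Both accuracy measures are simple to compute once forecast and observed life expectancies are paired by cohort. The sketch below uses hypothetical function names and values; the sign convention for the mean error (forecast minus observed) is an assumption, so a positive value would indicate over-prediction under this convention.

```python
import math


def rmse(forecast, observed):
    """Root mean squared error between forecast and observed values."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared) / len(squared))


def mean_error(forecast, observed):
    """Mean forecast error; assumed sign convention: forecast - observed."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(f - o for f, o in zip(forecast, observed)) / len(forecast)


# Hypothetical life expectancies for three cohorts (illustration only).
forecast_e0 = [80.0, 81.0, 82.5]
observed_e0 = [80.5, 81.0, 82.0]
toy_rmse = rmse(forecast_e0, observed_e0)
toy_me = mean_error(forecast_e0, observed_e0)
```

In this toy case the errors cancel in the mean error but not in the RMSE, which illustrates why the two measures are reported side by side.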

Sensitivity analysis for T2
In addition to the out-of-sample test, we validate the model by comparing actual and fitted death distributions. For such a comparison, again, only complete cohorts can be analyzed. Therefore, data from the first cohort available (see Section 2.6 for country- and gender-specific starting cohorts) up to the 1905 cohort are used. For each cohort analyzed, we assume that the last observed age is 65, 70, 75, and 80. We apply the PCLM to the artificially truncated cohorts and compare the fit with the actual death distributions. See Fig. 5 for an example of results; i.e., Swedish females born in 1901 with an assumed last age observed of 75.
To further assess the quality of the results, we use the Kullback-Leibler (KL) divergence for each cohort:
\[ \mathrm{KL} = \sum_{x} f(x) \ln \frac{f(x)}{\hat{f}(x)}, \]
where $f(x)$ stands for the density of the truly observed age-at-death distribution, and $\hat{f}(x)$ is the density of the forecast age-at-death distribution. The KL divergence measures the information lost when the fitted density is used to approximate the true one. The minimum of the divergence equals 0. For each country and gender, an average of the KL for each cohort analyzed is reported in Table 3: as a general pattern, the KL divergence decreases as the last age observed of a cohort increases. Additionally, we checked how accurately the PCLM estimates the mode when the last age assumed to be observed for a cohort is 75. We found that the model correctly sets the modal age at death within ±4 years deaths (out of 100,000). We found a slight tendency of the PCLM to overestimate the modal age at death and underestimate the number of deaths at the mode when the last age assumed to be observed for a cohort is 75, as reported in Fig. 5.
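A discrete version of the divergence can be computed directly from two death distributions. The function name `kl_divergence` and the example values are hypothetical; both inputs are normalized to sum to one before the comparison, which is an assumption about how the densities are formed.

```python
import math


def kl_divergence(true_deaths, forecast_deaths):
    """Discrete KL divergence of the forecast from the true distribution.

    Both inputs are normalized to densities first (illustrative choice).
    Returns infinity if the forecast is zero where the truth is positive.
    """
    if len(true_deaths) != len(forecast_deaths):
        raise ValueError("distributions must have equal length")
    total_true = sum(true_deaths)
    total_forecast = sum(forecast_deaths)
    divergence = 0.0
    for t, f in zip(true_deaths, forecast_deaths):
        p = t / total_true
        q = f / total_forecast
        if p == 0.0:
            continue                     # zero-probability ages add nothing
        if q == 0.0:
            return math.inf
        divergence += p * math.log(p / q)
    return divergence


# Hypothetical two-age distributions (illustration only).
toy_kl = kl_divergence([50.0, 50.0], [25.0, 75.0])
toy_kl_same = kl_divergence([10.0, 30.0, 60.0], [1.0, 3.0, 6.0])
```

A value of zero is obtained only when the two normalized distributions coincide, in line with the minimum stated above.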

Discussion and conclusion
We have presented a novel method to forecast age-at-death distributions of cohorts not yet extinct based on the penalized composite link model. The basis for the forecast is the death counts of a cohort life table. For the studied populations, we constructed cohort life tables using cohort mortality data from the HMD. The grounding assumptions of the proposed method are (i) the Poisson distributed death counts $d_i$ of a cohort life table, (ii) the smoothness of the forecast age-at-death distribution $\gamma$, and (iii) no deaths occurring after age 120.
The PCLM applied to T2 cohorts (cohorts where the last observed age at death is not far from the modal age at death) only makes use of these three modest assumptions. No target parametric model of the forecast age-at-death distribution is given. Hence, the PCLM smoothly redistributes the remaining deaths after the last observed age at death. In the presented application, the simple PCLM is able to forecast age-at-death distributions of cohorts born up to 1935; i.e., cohorts where the last observed ages at death correspond to around 30 years before the cohorts' natural extinction. For younger cohorts (T3 cohorts), where the modal age at death is far from being observed, additional assumptions are required. We set up the necessary constraints by augmenting the input data of the PCLM with prior information on (iv) the position of the modal age at death of the distribution, and (v) the proportion of deaths before and after the mode. In the presented study, the modal age at death and the corresponding number of deaths are estimated with an LLT model. This works accurately for the populations we studied when both the modal age at death and the number of deaths at the mode show a linearly increasing pattern over calendar years. The PCLM with augmented data succeeds in forecasting age-at-death distributions up to the birth cohort of 1960; i.e., cohorts where the last observed age at death corresponds to around 50 years before the cohorts' natural extinction and which have therefore not yet reached retirement age. The proposed model can be applied to forecast mortality both for country populations and for sub-populations; for instance, groups of insured individuals.
The suggested forecasting method for T3 could show limitations when the mortality pattern over time is not relatively stable. For example, if an age-at-death distribution for a particular cohort deviates from the pattern of the neighboring cohorts, the predicted deaths and the modal age at death of that particular cohort might become inaccurate. Weights in the PCLM fitting methodology allow for some flexibility but cannot handle large deviations between neighboring cohorts, and odd fitted death distributions can occur in such situations.
Most mortality forecasts are based on period mortality data, despite the demand from public and private institutions to obtain forecasts for actual cohorts rather than synthetic populations. True cohort forecasts are rare in the literature and this article is one of the few that successfully forecasts cohort mortality. Here, we have introduced a novel method to complete the mortality trajectories of non-extinct cohorts. The completion of cohort mortality is not only useful for forecasting but also in comparisons of cohorts over time.