Dynamic hazards modelling for predictive longevity risk assessment

Predictive risk assessment and risk stratiﬁcation models based on geodemographic postcode-based consumer classiﬁcation are widely used in the pension and life insurance industry. However, these are static socio-economic models not directly related to health information. Health information is increasingly used for annuity underwriting in the UK, using health status when the annuity is purchased. In real life, people develop new health conditions and lifestyle habits and can start and stop a certain treatment regime at any time. This requires the ability to dynamically classify clients into time-varying risk proﬁles based on the presence of evolving health-related conditions, treatments and outcomes. We incorporate landmark analysis of electronic health records (EHR), in combination with the baseline hazards described by Gompertz survival distributions, for dynamic prediction of survival probabilities and life expectancy. We discuss a case-study based on landmark analysis of the survival experience of a cohort of 110,243 healthy participants who reached age 60 between 1990-2000. analysis, in combination with the baseline hazards described by the Gompertz survival distributions, to translate time-varying medical and lifestyle hazards into dynamic predictions of survival and life expectancy. We discuss a case study based on


Introduction
Life expectancy (LE) and longevity projections are of the greatest importance both to the pension and insurance industry and to their clients. Longevity-trend projections are used for managing longevity risk in pricing and reserving for insurance and annuity products as well as for costing of public and private pensions. Changes in mortality projections directly affect annuities costs, especially in the decreasing interest rates environment. LE is also a paramount consideration to individuals planning their financial goals and retirement strategies.
Predictive risk assessment and risk stratification models based on geodemographic postcode-based consumer classification are widely used for life insurance/annuities underwriting (Richards 2008, Villegas & Haberman 2014. These methods are based on the anticipated differences in longevity among groups of people, such as by sex, age, and deprivation. However, these socio-economic models are not directly related to health information.
Health information is increasingly used for annuity underwriting in the UK, using health status when the annuity is purchased.
In medicine, availability of electronic health records (EHR) and other big health data and the accompanying rapid development of analytical means to interpret such data paves the way to the emergence of precision medicine (Hulsen et al. 2019). For an individual approaching retirement, using their pension savings pot to purchase an annuity is no longer the default option. Pension freedoms mean that, guided by financial advice, each individual must make decisions about their own investment and longevity risk. These decisions are made not just once at retirement but dynamically thereafter, allowing for emerging investment performance, changing lifestyle, changing attitudes to risk, and developing health conditions. For example, an individual might choose not to purchase an annuity when they retire, using an income drawdown product and retaining their investment and longevity risk. Over the next years, their changing health status will mean that their life expectancy will change.
Based on this, they might decide to use some of their pensions savings to buy an annuity, or they might feel able to increase the level of income that they take from their savings, or they might feel that they have to decrease the income that they take.
When pricing and reserving for their longevity risks, it is becoming increasingly important that insurance and pensions providers allow for heterogeneity of health conditions and lifestyle as well as socioeconomic status, how these change over time and the impact of changes to treatment regimes. The premium calculation would account for possible future health trajectories of the individual, appropriately weighted according to the results of a dynamic model. In simple terms, this will involve pricing to allow for an individual's medical conditions and lifestyle as with underwritten annuities. Pricing for 'healthy' individuals with no serious medical conditions will be 'select' and will have longer life expectancy than an aggregate level. This 'select' effect will also have to allow for the fact that the individual buying an annuity has made an active decision to do so, as explained above. Life expectancies used in pricing annuities will need to allow for possible future health trajectories. Premiums based on historic mortality experience will reflect historic trajectories and this will need adjusting where expected future trajectories are different (e.g. due to changes in treatment regimes, promotion of smoking cessation, obesity prevalence, etc.).
There are implications for both pricing and reserving assumptions used by annuity providers. As an example, if a higher annuity rate is offered to a smoker at age 60, Unless the annuity provider requests updates from clients regarding their health and lifestyle, the provider would not know the longevity trajectory of the client. The individual might have given up smoking or developed serious health problems. This longevity risk element will be different from the longevity risk of a 70-year-old smoker who is applying for a new annuity because, for the latter, changes in health, etc. over the previous 10-year period will be known to the annuity provider. The individual did not give up smoking between the ages of 60 and 70, and health status is known, for example. The landmark analysis presented in this paper provides a way of allowing for this dynamically.
In medicine, the Cox proportional hazards model is the most popular method of time-to-event analysis or survival analysis. The vast majority of clinical trials and observational studies that analyse survival outcomes use this model. In our previous work (Kulinskaya et al. 2020) we developed a method to incorporate proportional hazards modelling of EHRs into actuarial modelling of hypothetical changes in population or group life expectancy due to medical advances and health interventions.
To do this successfully, some parametric assumptions about the shapes of survival distributions are necessary. We demonstrated that both Gompertz and Weibull survival distributions in combination with the Cox model can be successfully used for actuarial calculations. We illustrated our methodology on the important example of the survival benefits of statins.
Statin therapy for primary prevention of cardiovascular disease (CVD) has been reported to improve life expectancy (Mihaylova et al. 2012) and is widely available in the UK (NICE 2016). In Kulinskaya et al. (2020) we estimated the effect of statin prescription on longevity at ages 70 and 75. We also calculated the hypothetical changes in national life expectancy if all eligible people were to be prescribed a statin.
However, these calculations were based on the health and lifestyle characteristics and statin prescription or lack thereof only at the baseline Gitsels et al. (2016).
Additionally, an assumption of proportional hazards means that a treatment such as statins or a risk factor such as smoking has the same effects at any age, but this does not appear to be a realistic assumption.
In this paper, we incorporate landmark analysis, a dynamic method of survival analysis, in combination with the baseline hazards described by the Gompertz survival distributions, to translate time-varying medical and lifestyle hazards into dynamic predictions of survival and life expectancy. We discuss a case study based on 4 J o u r n a l P r e -p r o o f our landmark analysis on the use of statins, Gitsels et al. (2020). This case study considers the EHRs of a cohort of 110,243 participants who reached age 60 between 1990-2000 with no previous history of cardiovascular disease or statin prescription at baseline. Participants' medical history was updated at 'landmark' time points occurring every six months. Individual life expectancy depends on the individual time-varying health trajectory. Changes in overall and group life expectancy depend on the composition of the population and these attributes. As an application of our methodology, we developed a life expectancy calculator that is available on https://mylongevity.org.

Basics of Landmark Analysis
This section briefly reviews the methodology of landmark analysis. Cox proportional hazards model is introduced in Section 2.1, and the main concepts of landmark analysis are outlined in Section 2.2.

Cox Proportional Hazards Model
Cox proportional hazards model is a semi-parametric method of survival analysis which is widely used in medical applications. In Cox regression, given a vector of covariates Y , the hazard function or force of mortality at time t is factorised as µ(t|Y ) = µ 0 (t) exp(Y T β) for a vector of parameters β. The baseline hazard µ 0 (t) is not specified, and interest is centred on the hazard ratios µ(t|y 1 )/µ(t|y 2 ) = exp((y 1 − y 2 ) T β), which do not depend on the baseline hazard and are constant over time.
This is termed the proportional hazards assumption. However, some parametric assumptions about the shape of the baseline hazard are necessary to estimate a survival function or a life expectancy. The three common parametric distributions easily combined with the proportional hazards assumption are exponential, Weibull and Gompertz distributions. Denote the baseline log-hazard by λ 0 (t) = log µ 0 (t).
Then λ 0 (t) = a corresponds to the exponential, λ 0 (t) = a + bt to Gompertz G(a, b), and λ 0 (t) = a + b log t to Weibull baseline hazards. The respective proportional hazards models are called Cox-exponential, Cox-Gompertz and Cox-Weibull survival models (Bender et al. 2005).
We shall concentrate on Cox-Gompertz model, as it is well accepted that the Gompertz distribution provides a good description of human mortality between ages 5 J o u r n a l P r e -p r o o f the increase in the annual hazard of mortality associated with ageing one year is approximately constant. For England and Wales in 2010, this increase was 1.103 for men and 1.111 for women.
We are aware of an ongoing debate on validity of the Gompertz law at high ages.
Recent publications arguing for extension of the Gompertz law to 106 years and beyond include Newman (2018) and Gavrilov & Gavrilova (2019a,b). A competing view is that of mortality deceleration -distributions of this nature can result from the heterogeneity of Gompertz or Makeham distributions across sub-populations, modelled by a gamma frailty, and can be represented by logistic models, see High Age Mortality Working Party (2015) and Feehan (2018) for comprehensive discussion.
The impact of a Gompertz mortality shape in comparison with S2PML tables which assume mortality deceleration at high ages is -2.0% to -0.4% of annuity value at age 90 male, and -0.2% to -0.0% at age 65 (High Age Mortality Working Party 2015).

Landmark analysis
In the landmarking approach, dynamic predictions for the conditional survival after t = t LM is based on current information for all patients still alive just prior to t LM (van Houwelingen & Putter 2011). The sliding landmark model for a prediction window w, at each landmark point t LM , is the simple Cox model for the data set obtained by truncation at s = t LM and administrative censoring at t LM + w. Here µ 0 (t|t LM , w) is the baseline hazard or force of mortality within the window w. This is a convenient way to obtain a dynamic prediction without fitting a complicated model with time-varying effects.
Such a prediction data set, called "strata" is created for each of a set of prediction time points {s 1 , · · · , s L }, 20 ≤ L ≤ 100, keeping the window width w fixed. We address the choice of window width in Section 4. All the strata are stacked into the so-called super-prediction data set comprising the full data for landmark analysis.
Passing from stratum s to s + 1 corresponds to sliding the window over the time range. A set of Cox models fitted at each prediction data set is the so-called "crude model" of landmark analysis.
More sophisticated modeling is required to analyse the full super-prediction data 6 J o u r n a l P r e -p r o o f set. In the pseudo-partial log-likelihood landmark model ipl * , the regression coefficients β LM (s) are assumed to be the polynomials of time s, and the baseline hazard is modelled as µ 0 (t|t LM , w) = µ 0 (t) exp(Θ(s)), resulting in a smooth time-varying where β LM (s) and Θ(s) are the kth degree polynomials in s.
An individual at risk at t i ≥ s has n is = #{s ≤ t i } copies in the risk set R(t i ). For an individual with an event at t i in the original data set, there are n is tied events in the stacked data. The parameters of the ipl * model are estimated by maximizing the integrated (over s) pseudo-partial log-likelihood introduced by Van Houwelingen (2007).
It is worth noting, that in the ipl * model only the baseline hazard µ 0 (t) changes within a window, but the intercept values Θ(s) and the covariate effects β LM (s) are fixed at their starting values at x LM = s. Therefore, predictions for all s ∈ {s 1 , · · · , s L } are obtained from estimated cumulative hazards where M 0 (s) = an age x 0 consists of multiple risk groups j = 1, · · · , J, each corresponding to a Then the value of the overall population survival function S(x) = P (T ≥ x) for the random lifetime T , at age x ≥ x 0 is the weighted mean of the survival functions ) in the individual risk groups: The sum of weights in the above equation is 1, but we kept the denominator as, in real data, the estimated prevalencesf j are subject to rounding and perhaps other errors.
We use methodology developed in Kulinskaya et al. (2020) to find survival functions S j (x) for each risk group j at age x.

Survival functions under Gompertz-Landmark model
Given a window [s, s + w], the hazards in the ipl * model of landmark analysis (1) are very similar to those in the Cox model. Landmark analysis also allows estimating the cumulative baseline hazards M 0 (x). The goodness-of-fit of a particular parametric survival distribution to the cumulative hazards estimated by the ipl * model needs to be evaluated. In this Section we assume that these hazards are well described by a Gompertz distribution G(a, b). Substituting the Gompertz baseline hazard where a(s) is the baseline value which may depend on the landmark s = x LM , and the effects β LM are centred at the baseline, so that β LM = 0 provides the baseline risk. The log-hazards in various risk subgroups Y = Y j differ by intercept but have the same slope b.
The cumulative hazards, and the survival functions are obtained by integrating the baseline hazards within the window. The survival function can be written as 8 J o u r n a l P r e -p r o o f Journal Pre-proof

Calibration of a predictive survival model
The hazard ratio term's contribution to the survival model needs to be calibrated to provide correct population hazard or survival function at a specific age. One possible approach is to centre the hazard (1) to ensure that the baseline hazards function represents, in some sense, an average risk. Royston (2012) Following Kulinskaya et al. (2020), for a set of the risk groups j = 1, · · · , J, substitute the survival functions (4) at age x ≥ x LM into Equation (2) to obtain the overall survival function as At x = x LM , this is a non-linear equation with one unknown, a(x LM ). The lefthand side is given by the period life-table and the slope b should be determined for a particular population of interest. As S(x) is a decreasing function of a, equation (5) has a unique solution.
Substituting a set of estimated prevalences {f j (x LM )}, estimated landmark pa-rametersΘ andβ LM , and the estimated Gompertz slopeb, and solving equation (5) for a 0 (x LM ), the component estimated survival functionsŜ j (x|x LM = s) = S(x|Y j , x LM = s) are found from (4).

Life expectancy at a landmark age
For a particular population, the parameters a = a(x LM ) and b of the underlying Gompertz distribution G(a, b) can be estimated from a period life dt denotes the exponential integral. However, this expression should be divided by the survival S(z), to provide a proper remaining LE at z. Thus, the remaining life expectancy at age z for a Gompertz distribution is obtained as Similar to Kulinskaya et al. (2020), the component remaining life expectancies e j (z) for each risk group j at age z ≥ x LM are obtained from (4) substituting a To calculate the population remaining life expectancy, consider the survival function of the overall population at age x ≥ x LM , which is a finite mixture of subpopula- Then the remaining life expectancy Using (7), we can estimate a hypothetical impact of changing prevalences of an intervention or lifestyle at a landmark age. Consider one covariate ("intervention"), denoted by y 1 , and coded as 0 or 1. By specifying f j (x LM ) = 0 for all risk groups with y 1 (x LM ) = 1, we obtain a hypothetical remaining life expectancy e 0 (z), z ≥ x LM if there was no intervention of interest, and, by specifying f j (x LM ) = 0 for all risk groups with y 1 = 0, a hypothetical remaining life expectancy e 1 (z), z ≥ x LM with full uptake of the intervention.

Dynamic estimation of survival and life expectancy
The estimated survival functionŜ j (x) =Ŝ(x|Y j , x LM = x) and the estimated remaining life expectancyê j (x) for a particular risk profile Y j (x) given by equations (4) and (6) (x)), we simply use the trapeziodal rule from age z to age x T , and continue with the Gompertz residual LE at age x T . Assume, for simplicity, that the landmark points are ∆ apart. Then

Landmark analysis of the survival benefits of statins
Our retrospective cohort study Gitsels et al. (2020) Gitsels et al. (2020). The models were assessed on the proportional hazards assumption and discrimination (Antolini et al. 2005).
The adjusted hazards of all-cause mortality associated with current statin prescription, smoking and type II diabetes at each age are presented in Figure 1 and (for key ages) in Table 1  No other interactions with current statin prescription were found. Furthermore, among patients with a history of statin prescription, the survival prospects at any landmark age did not differ by how long ago the first prescription was.

Estimating survival and life expectancy
In our previous article (Kulinskaya et al. 2020)   In this case study, we established that the baseline hazards from the ipl * landmark model were also well described by the Gompertz distribution. Figure 2 Due to the study recruitment period (1990)(1991)(1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999)(2000), the 1936-40 birth cohort participants were 76 to 80 years old by the end of the study in January 2017. Therefore, we used the observed prevalences in this cohort for ages 60-75, and the prevalences in the two combined birth cohorts for ages 76 to 85, where the number of participants considerably reduced with age due to death or attrition; see Table 2 for the total numbers for this selection.
We considered, for each sex and TS quintile, all combinations of statin use (2 levels), smoking (3), hypertension (3), diabetes (2), hypercholesterolaemia (2), BMI category (3) and cardiac risk (3), 648 combinations in total, where Table 3 provides the numbers of diabetics, smokers and obese patients at key ages for TS quintiles 1 and 5, by cardiac risk. A large number of combinations were absent or very scarce in the data, for example, cardiac risk increases with age and the presence of morbidities and is strongly associated with lifestyle factors. Even though there were approximately 2000 male patients with QRISK2< 20% at age 70 in TS quintile 1 (Table 2), they had no diabetes and did not smoke. There were also no diabetics or smokers among men at medium cardiac risk (QRISK2 20%-40%) from age 80 onward ( Table 3). Females are somewhat better represented in these categories at 70, but not at older ages.
Typically, at all ages, a set of 15-30 risk profiles with more than 50 participants includes from 50% (at age 80) to 89% (at age 60) of all people. And there are only 5 to 15 risk profiles with more than 100 participants, which include from 25% ( and their confidence intervals from the best-fitting Gompertz (green), Weibull (blue), lognormal (aqua) and log-logistic (red) distributions.

17
J o u r n a l P r e -p r o o f 80) to 82.5% (at age 60) of all people. As an example, for males in TS quintile 1, these typical profiles include only non-and ex-smokers from age 75, and people without untreated hypertension from age 70, whereas the QRISK2 increases with age. For the numerous low-count risk profiles, the estimated prevalencesf j , and therefore the respective survival functions and LEs may not be robust.
We estimated the component survival functions (for all window widths) and the life expectancies (based on the 30-year window) for non-zero count risk profiles at each landmark age, using equations (5) and (6). Obtained LEs at age 65 by sex, QRISK2 category, TS quintile and statin use for healthy people, diabetics and smokers without further morbidities are given in Table 4. As an application of our methodology, we developed a life expectancy calculator that is available on https://mylongevity.org.
This LE calculator uses LEs based on the 30-year window.
As an example, for a healthy female aged 65-years old and resident in the least deprived postcode, We also considered in more detail three hypothetical risk trajectories over the life-course: the "healthy" people with a healthy weight, no morbidities, and nonsmokers; and the groups of "healthy diabetics" and "healthy smokers" who differ only by a current diagnosis of diabetes type II and keep this diagnosis, or are current smokers and do not quit. We assume no further morbidities over the life-course.
We also assume the best possible cardiac health for all participants, i.e. the lowest realistically QRISK2 category for this group at age 65, and a switch to a higher cardiac risk category only when the majority in this cohort moved to the higher QRISK2 score. This happens at different ages for different risk cohorts. See Appendix for more details.
Probabilities of death within 10 years for these three trajectories, by statin prescription, sex and TS quintiles 1 and 5 are illustrated in Figure 3.   in diabetics and smokers than in healthy people.
We also calculated hypothetical LEs (8) for our three risk trajectories at the three window widths, which are given in Table 5. Comparing these LEs to the LEs from In this study, we extended this methodology to incorporate the results of the landmark analysis, a statistical method which allows dynamic modelling of the force of mortality. Combining age-dependent health and lifestyle hazard ratios obtained from landmark analysis, with the baseline Gomperz hazards, we obtain dynamic predictions of the survival and life expectancy for particular risk trajectories. We illustrated our methodology on a case-study based on the landmark analysis of the use of statins, Gitsels et al. (2020). In this analysis, we differentiated between 648 possible risk groups within each deprivation quintile and sex.

31
J o u r n a l P r e -p r o o f