Distributed under Creative Commons Cc-by 4.0 Modeling Absolute Differences in Life Expectancy with a Censored Skew-normal Regression Approach

Parameter estimates from commonly used multivariable parametric survival regression models do not directly quantify differences in years of life expectancy. Gaussian linear regression models give results in terms of absolute mean differences, but are not appropriate in modeling life expectancy, because in many situations time to death has a negative skewed distribution. A regression approach using a skew-normal distribution would be an alternative to parametric survival models in the modeling of life expectancy, because parameter estimates can be interpreted in terms of survival time differences while allowing for skewness of the distribution. In this paper we show how to use the skew-normal regression so that censored and left-truncated observations are accounted for. With this we model differences in life expectancy using data from the Swiss National Cohort Study and from official life expectancy estimates and compare the results with those derived from commonly used survival regression models. We conclude that a censored skew-normal survival regression approach for left-truncated observations can be used to model differences in life expectancy across covariates of interest.


INTRODUCTION
Absolute differences in life expectancy are of importance in many scientific fields, i.e., biology, demography and epidemiology (Aldenhoven, 1986;Liu et al., 2012;Olshansky et al., 2012;Sarkar et al., 2010;Spoerri et al., 2006).Often, differences in life expectancy are calculated by traditional life table methods (Chiang, 1984) using sex and age specific mortality rates.Additional covariate information (e.g., education or marital status) and corresponding death rates are usually not available, but absolute differences across levels of such covariates are of interest.Nationwide census based cohort studies (Bopp et al., 2009;Frisch & Simonsen, 2013;Spoerri et al., 2010;Ueda et al., 2013) allow the investigation of mortality trends and the calculation of life expectancy conditional on covariates on individual, household and area level.In such large cohort studies, demographers and To our knowledge, no implementation is currently available to use a skew-normal modeling approach in survival analysis situations in which a fraction of the observations have censored survival times and in which delayed entry is present.Delayed entry occurs e.g., in situations in which subjects are observed from a study entry date until end of the study, and not over the total risk time from a certain given age.In this article we describe how we implemented censored skew-normal regression models for survival data, give results when analyzing life expectancy for the Swiss population using data from the Swiss National Cohort Study and official estimates of life expectancy in Switzerland.

METHODS Data
The Swiss National Cohort (SNC) is a longitudinal study of the entire resident population of Switzerland, based on the 1990 and 2000 national censuses (Bopp et al., 2009).Deterministic and probabilistic record linkage (Fellegi & Sunter, 1969) were performed using the Generalized Record Linkage System (Fair, 2004) to link census records to a death record or an emigration record, based on a set of key variables that are available in both data sets (sex, date of birth, place of residence, marital status, religion, nationality, profession, date of birth of partner and date of birth of children).Mortality patterns and life expectancy patterns are of major interest in the SNC (Moser et al., 2014;Spoerri et al., 2014;Spoerri et al., 2006).Initial SNC mortality linkage was successful for 94% of the deaths (Bopp et al., 2009).The 6% not linked deaths bias the calculation of absolute age-specific mortality rates which in turn would bias estimates of life expectancy (Schmidlin et al., 2013).This bias was removed when including the 6% deaths using pragmatic linkages matching for age and sex only.For analyses presented here, we used the SNC data from the 2000 census onwards with mortality follow-up up to end of 2008 including the pragmatically linked deaths.We investigated 4,098,675 individuals aged ≥35 years or older from the 2000 census.Of those, 481,157 individuals died between 5th December 2000 (date of census) and 31th December 2008.We investigated associations of gender, civil status and education on individual's life span, using parametric survival regression and censored skew-normal and censored Gaussian regression approaches.
For a second analysis and the simulation study we used data from the Human Mortality Database (HMD) which contains death rates and life tables from various populations including Switzerland (Human Mortality Database, 2014).Data are provided from national statistical offices or other sources.We used death rates for one year age intervals from age 35 to 105.For a hypothetical population of N = 100,000 men and women we estimated the number of deaths for each one year age interval from age 35 to age 105, I [x,x+1] := I x , x ∈ 35,...,105, as follows.The death rate for an age interval I x , x ∈ 35,...,105, is denoted as m x .The number of persons alive at the start of the age interval I x is denoted as l x , x ∈ 36,...,105.l x is equal to 100,000 minus all the deaths in all age intervals before I x .The number of deaths n x for each one year age interval I x was then calculated as n x = m x • l x , x ∈ 35,...,105.Data were downloaded and analyzed using the R-package demography (Hyndman, 2012).

Parametric survival models
Parametric survival regression assumes that, for each individual, the time T elapsed from a starting time point (e.g., at age 35) to an event (e.g., death) has a cumulative distribution function (cdf) F(t) and a probability density function (pdf) f (t) from certain classes of distribution functions.Often one assumes a distribution function from an extreme value type, e.g., Weibull, or from a lifetime type, e.g., Gompertz.The survival function is defined as One basic concept of survival regression is the hazard function, defined as The survival function can then be expressed in terms of the hazard function as If one assumes that the observed survival time of each individual is a realization from the same distribution of T (without covariates representing differences between the groups of individuals) and T comes from a Weibull distribution function, then where α > 0 is a scale parameter and γ ≥ 0 a shape parameter of the Weibull distribution, or when T is Gompertz distributed with scale parameter α > 0 and shape parameter γ > 0, then For the survival regression problem one often assumes that for a set of covariates X = (X 1 ,...,X k ) ⊤ the following equation holds where β = (β 1 ,...,β k ) ⊤ is a vector of regression coefficients.This model specification is known as proportional hazard (PH) model and λ(t) is then often called an underlying baseline hazard function.In case that T is assumed to be Weibull distributed one gets or with the assumption of a Gompertz distribution Note that in such a model representation the relationship between the covariates and the hazard function are linear on the log hazard scale.Thus, the effect of increasing a continuous covariate, say X 1 , by d, holding all other variables constant, is to increase the hazard of the event by a factor of exp(β 1 d) at all time point in time, assuming X 1 is linearly related to logλ(t) (Harrell, 2001).Often the above described effect is reported in terms of hazard ratios, i.e., one compares the ratio of hazard rates of an individual with predictor value d compared to one with a value of 0. The hazard ratio is then Unlike the interpretation of a multivariable linear Gaussian regression model, where the regression coefficients β are reflecting an increment in the expected value of a response variable by a one unit change in the predictor variable, the interpretation of regression coefficients from survival regression models is not easily translated to differences in mean survival time.
Another often used type of survival model is the accelerated failure time (AFT) model (see e.g., Harrell (2001)), where one assumes that log with σ , a scale parameter, and ϵ is assumed to come from a survival distribution function ϑ.
Common choices of ϑ(u) are the logistic distribution ϑ Both distributions fail to fit lifetime data adequately, because of their positive skewed distribution.To address a negative skewed distribution, an underlying Weibull distribution is possible.Note from (2) that in the AFT model specification the interpretation of the estimated parameters are in terms of T = exp(X ⊤ β + σ ϵ).The effect of increasing a continuous covariate, say X 1 , by d, holding all other variables constant, is to increase the survival time T by a factor of exp(β 1 d), assuming Eq. (2).Similarly to reporting hazard ratios in PH models, one reports time ratios in AFT models, i.e., the ratio of survival times of an individual with predictor value d compared to one with a value of 0. The time ratio is then exp(β 1 d + σ ϵ)/exp(σ ϵ) = exp(β 1 d).Also in this type of survival modeling, interpretation of regression coefficients is not straightforward in terms of mean survival time.Note that the Weibull PH model and the Weibull AFT are equivalent (Harrell, 2001).

Life expectancy from survival models
It is well known that life expectancy (or expected survival time) conditional on covariates where S(t) is the underlying survival distribution, see e.g., Harrell (2001).Hence, the expected survival time for a reference individual is For Weibull, Gompertz and log-normal distribution functions closed-form expressions exist, i.e., for the Weibull distribution for the Gompertz distribution and for the log-normal distribution where µ, α, γ are location, scale and shape parameters from the corresponding distribution functions, see e.g., Johnson, Kotz & Balakrishnan (1994) and Missov & Lenart (2011).95% confidence intervals (CI) were approximated by sampling procedures from multivariate Gaussian vectors using the covariance matrices of the parameter estimates.

Censored skew-normal regression
The skew-normal distribution function generalizes the Gaussian distribution function allowing for skewness in its shape.We start with a recapitulation of the definition of a Gaussian distributed random variable.A random variable X is Gaussian (normal) distributed with location parameter µ ∈ R and scale parameter σ > 0 if it has the pdf One writes X ∼ N(µ,σ 2 ) if X is Gaussian distributed with mean µ and variance σ 2 .The pdf of standard Gaussian distributed random variable Z ∼ N(0,1) is in the following written as ϕ(•).
The definition of a skew-normal distributed random variable is as follows, see e.g., Genton (2004). where The expectation and variance of a skew-normal distributed random variable Note that if X ∼ N(0,1) and Y ∼ SN(0,1,0), then X and Y are equally distributed.
Since the parameters (ξ,σ,ψ) are directly involved in the pdf representation (3), one speaks of a direct parametrization (DP).Another representation is the so-called centered parametrization (CP) (Azzalini, 1985), where one uses a reparametrization of (3).In brief, one uses centered parameters (µ,α,γ ) in the parametrization of the problem, where γ is a measure of skewness defined as and ψ is the same as in (3) (Genton, 2004).One has the following relation such that the location estimate from a CP corresponds to the expectation of skew-normal distributed random variable in a DP.In Supplemental Information 1 we explain the principles and the conversion of CP to DP, and vice versa.The skew-normal regression problem is similar to Gaussian linear regression.One assumes for a set of covariates X = (X 1 ,...,X k ) ⊤ and regression coefficients β where ϵ ∼ SN(0,σ 2 ,ψ) and is assumed to be independent across individuals.The effect of increasing a continuous covariate, say X 1 , by d, holding all other variables constant, is to increase the survival time T by a shift of β 1 d.For the conversion of DP to CP (or vice versa) in the regression problem only the distributional parameters need to be transformed accordingly (Azzalini & Capitanio, 1999).

Parameter estimation
Parameters for a censored skew-normal regression can be estimated by maximum likelihood estimation.Let the underlying measurement scale be individual's age at end of study Y i or censoring C i , i.e., min{Y i ,C i }.Censoring is assumed to be non-informative (i.e., censoring is independent of the underlying time scale), or from type I censoring (i.e., follow-up time ends at predetermined time).We write δ i = I(Y i ≤ C i ) for the indicator variable for an observed death, where I is the indicator function.Under censoring the likelihood is where g(•;ξ,σ 2 ,ψ) is the pdf given in (3) and S(•;ξ,σ 2 ,ψ) is the survival function assuming and for a censored individual i ≤ n : To obtain the likelihood of all individuals one replaces in ( 6) g(y i ;ξ,σ 2 ,ψ) by g * (y i ;ξ,σ,ψ|Y i ≥ D i ), and S(c i ;ξ,σ,ψ) by S * (c i ;ξ,σ,ψ|Y i ≥ D i ), respectively.It has been mentioned in e.g., (Azzalini, 1985) that maximizing the negative log likelihood of ( 6) has singularity problems if ψ = 0, and yield convergence problems in the MLE.To overcome this problem it has been suggested to use the CP approach, which removes the singularity at ψ = 0, and has further advantages in faster convergence and improved likelihood shape over the DP (see e.g., Azzalini (1985) and Azzalini & Capitanio (1999)).We used the CP for the numerical derivation of the estimates by MLE using R Version 3.1.1and Stata Version 13.1.Program code and used functions are provided as Supplemental Information 1.

Model assessment
For all types of survival models, the model should adequately fit the data in order to obtain correctly interpretable estimates and correct coverage of confidence intervals.For PH models the relationship between the covariates and the log hazard should be linear.Further, the covariates affect the underlying distribution of the time variable by adding X ⊤ β to the log hazard function.The effect of the covariates is assumed to be the same at all time points.For AFT models each covariate affects log(T) linearly.Further, the underlying variance σ in Eq. ( 2) is a constant, independent of the covariates (Harrell, 2001).Assessing the model fit of parametric survival models is often restricted to e.g., graphical assessment by stratified predictor levels or stratified Kaplan-Meier estimates of the distribution of residuals.For the skew-normal regression problem Y = X ⊤ β + ϵ, where ϵ ∼ SN(0,σ 2 ,ψ), certain assumptions must be validated (Harrell, 2001): residuals should have no systematic trend in central tendency against any predictor, they should have the same dispersion and they should have a skew-normal distribution in the predictor-space.These assumptions can be checked by median and lower and upper quartiles of the residuals, stratified by intervals of the predicted outcome.Distributional model assumptions of a skew-normal regression can be visually checked by a comparison of quantiles of estimated residuals and quantiles from a theoretical skew-normal distribution in a quantile-quantile plot.
We graphically compared the model fit from a Gompertz model with a skew-normal model by a log-hazard plot.In general from Eq. ( 1) one obtains that logλ(t) = logf (t) − logS(t).Note that for a Gompertz distribution the log-hazard is linear in the time scale t, i.e., logλ(t) = logα + γ t, t ≥ 0. Thus, any non-linearity in the log-hazard from a skew-normal model would indicate a deviation from the Gompertz model fit.
Goodness-of-fit using official life expectancy estimates was assessed by Pearson's chi-squared test statistic (Agresti, 2013), where O i denotes the observed number of deaths from offical estimates in one year age intervals.We calculated expected number of deaths E i for one year age intervals from the underlying regression models.A larger value of X 2 indicates a greater difference of O i − E i (Agresti, 2013).

Model setting
SNC data were analyzed by survival regression models with remaining age at 35 years as the underlying time scale and delayed entry date as the 5th of December 2000.Assumed underlying survival distribution function was either Weibull or Gompertz for a PH model, or Weibull and log-normal for an AFT model.Individuals were censored if they were alive after 31th December 2008.For a direct comparison using the censored skew-normal regression approach we investigated age at 31th December 2008 (censoring information) or age at death as the dependent variable with a delayed entry date of 5th December 2000.To compare with a Gaussian linear regression approach we used an author programmed MLE function for censored Gaussian regression with left-truncated observations, not further described here.HMD data were analyzed using the same regression models but without delayed entry or censoring.

Simulation study for model distribution misspecification
Parametric modeling assumes that a given underlying distribution function is the true distribution of the outcome variable.We performed a simulation study to assess whether life expectancy estimates of a skew-normal regression and a Gompertz survival regression are biased in case of model distribution misspecification.For that purpose we first fitted Gompertz and skew-normal models to Swiss data of the Human Mortality Database to get location, scale and shape parameters for each distribution, as close to real data as possible.Second, using these parameter estimates, we built random samples with different samples sizes (i.e., 100, 1,000, and 10,000) from Gompertz and skew-normal distribution functions.Third, for each sample we fitted a Gompertz or a skew-normal model and reported estimated life expectancy μ and corresponding confidence intervals.As a third distribution function we combined the Gompertz and skew-normal distribution functions to get a mixture distribution functions with mixing proportions δ = {0.1,0.25,0.5,0.75,0.9},where the mixing pdf is defined as m(x) = δf G (x) + (1 − δ)g(x) with f G , the Gompertz pdf, and g(x) the skew-normal pdf, defined as in Eq. (3).Thus, with the mixture distribution we mimic a situation where the study sample consists of samples from different underlying distribution functions, with different mixing proportions.For each sample we reported coverage of the true mean life expectancy µ 0 and the bias μ − µ 0 .We did 1,000 simulation repetitions.a larger sample size of 10,000 the skew-normal model showed worse coverage proportions compared to the Gompertz model (i.e., 0.72 if the true underlying distribution function was Gompertz).In case of a mixture distribution we found that the skew-normal model overestimates the true mean life expectancy with increasing mixture weights for Gompertz (δ = 0.1: −0.0166 ± 0.1080, δ = 0.5: −0.0433 ± 0.1114, δ = 0.9: −0.0747 ± 0.1189), leading to decreasing coverage proportions of (δ = 0.1: 0.946, δ = 0.5: 0.926, δ = 0.9: 0.868).
Figure 1 compares the log-hazards from a Gompertz model (dark-blue line), a skew-normal model (red line) and from SNC data (light-blue line), by gender.Both a Gompertz and a skew-normal model underestimates the log-hazard at ages from 35 to 55.Nevertheless, the Gompertz model shows a slightly better model fit at these ages, compared to a skew-normal model.From age 50 onwards, a Gompertz model and a skew-normal model show almost identical model fits.Figure 2 shows a histogram of number of deaths per one year age intervals from the hypothetical population per gender.Density lines for each model were overlaid.Goodness-of-fit measured by X 2 was lowest for a Gompertz survival model and the skew-normal regression model.Thus, both showed best model fit.Highest X 2 were obtained for the AFT log-normal model and the Gaussian regression model, indicating worst model fit among all investigated models.

DISCUSSION
For modeling absolute differences in life expectancy, we compared results from commonly used parametric survival regression models to those obtained from a skew-normal  regression model.We implemented a censored skew-normal regression approach which allowed to account for left-truncated observations having censored survival times.Our findings suggest that a censored skew-normal regression model is an adequate approach in the analysis of absolute differences in life expectancy.Surprisingly, statistical software like Stata (Stata Corporation, College Station, TX, USA) or R (R Project, University of Vienna, Austria) do not support skew-normal regression approaches in their core.A Stata suite for skew-normal and skew-t models has been introduced (Marchenko & Genton, 2010), and an R package for fitting univariate and multivariate skew-normal and skew-t models is available in the package sn (Azzalini, 2011).However, both available additions do not allow for regressions with left-truncated and censored observations.Parametric survival models are often used to investigate the association of covariates on survival time.Effect measures are reported either as hazard ratios or time ratios, which yield no direct interpretation in terms of differences in survival time or life expectancy.Differences in life expectancy and corresponding confidence intervals from parametric survival models have to be calculated from estimated distributional and further model parameters from the underlying survival distribution function, using non-trivial transformations.Our presented skew-normal regression approach has the advantage that parameter estimates are directly interpretable in terms of absolute differences in life expectancy, similar to results from a linear Gaussian regression.In contrast to a Gaussian regression model, the negative skewness of lifetime data seemed reasonably captured when assuming a skew-normal distribution.Analyzing mortality data from the Swiss National Cohort or hypothetical data derived from official death rates in Switzerland showed that results, in terms of differences in life expectancy and goodness-of-fit, were comparable to those of commonly used parametric survival regression models, especially the Gompertz model commonly used for the analysis of life expectancy (Robertson, De Los Campos & Allison, 2013).Our results confirm the results from Robertson and colleagues (2012) that differences in life expectancy are similar across Gaussian-type regression models and parametric survival models, which can be explained by the central limit theorem.Our simulation study showed that the Gompertz model had better true mean life expectancy coverage in case of model misspecification compared to the skew-normal model.Thus, parameter estimates and standard errors from the skew-normal model could be more biased than those from a Gompertz model.Our presented approach has limitations.First, the skew-normal distribution has a support on the real line R, such that life expectancy could be estimated within an implausible range.Such situations could occur in a population with a high proportion of dead individuals compared to surviving individuals, and the individuals died very young.Then, most of the survival information lies in the upper end of the left-tail, and estimates from a skew-normal distribution could be implausible.Second, it is well-known that the distribution of time from birth to death has a bimodal shape, i.e., peaks occur at early infancy and older ages (Robertson & Allison, 2012).Our current approach does not include the modeling of bimodal distributed data, i.e., through mixtures of skew-normal distribution functions (Lin, Lee & Yen, 2007) or semiparametric approaches (Ma & Hart, 2007).Third, possible bias in the estimation of life expectancy is introduced by censoring.From the 4.1 mio investigated individuals in the SNC roughly 480,000 persons died over eight years of follow-up.Thus, 88.3% of the study population is censored and only 11.7% have exact time to death information.By study design this is a type-I censoring situation, and most of the likelihood information is driven by a pre-determined time point c i in the likelihood function defined in the survival function S(c i ;ξ,σ 2 ,ψ) in Eq. ( 6).
Besides the mentioned analysis methods described so far, other approaches of analysing mean survival time have been proposed.For example, Andersen and colleagues (2004) used so called pseudo-observations for the estimation of (restricted) mean survival time.Pseudo-observations are defined as "leave-one-out" estimators, i.e., parameters are estimated on subsamples where one observations is omitted, and is thus related to jackknife procedures.The advantage of this approach is that for the calculation of restricted mean survival time nonparametric estimators (i.e., Kaplan-Meier estimator), but also parametric survival models (i.e., using standard survival distribution functions) in the regression setting, can be used to calculate the pseudo-observations.Another approach is the use of flexible survival regression techniques (Royston & Lambert, 2011).Flexible survival regression models model the baseline cumulative hazard function using restricted cubic splines, and allow the calculations of restricted mean survival time (Royston & Parmar, 2013).
We conclude that a censored skew-normal regression approach is a possible alternative to parametric survival models for modeling differences in life expectancy.The advantage of this approach over parametric survival regression techniques is that parameter estimates are directly in terms of mean life expectancy.Other underlying Gaussian-type distribution functions (i.e., the skew-t distribution or the compressed Gaussian distribution) have been investigated (Clark et al., 2013;Robertson & Allison, 2012), with promising results in terms of model fit.In our analysis and simulation study the skew-normal distribution did not outperform the Gompertz distribution function.However, we found the skew-normal distribution a good compromise compared to other distribution functions in terms of model fit and modeling complexity.For example, fitting a skew-t distribution to larger data sets is computationally more intense due to the additional distributional complexity.Obviously a weighing of the gain in the use of more complex distribution function is needed, especially when differences in mean life expectancy are of main interest.A censored skew-normal regression approach is an alternative to existing Gaussian-type regression approaches in the modeling of life expectancy with the advantage of parameter estimates directly expressed in differences of life expectancy.

Figure 1
Figure 1 Log-hazard plots of SNC death rates, Gompertz proportional hazard model, and skewnormal model, by gender.

Figure 2
Figure 2 Histogram of estimated number of deaths per one-year age intervals.Probability density functions from estimated parameters from proportional hazard (PH) Weibull and Gompertz models, accelerated failure time (AFT) log-normal model, and skew-normal and Gaussian regression models.
). Choosing the relevant time scale is a crucial decision in modeling survival times.Often not the total risk time starting from time origin (e.g., date of birth) is observed, but only the risk time from study entry (the date at which a person entered the study and came under observation) until the end of the study.This concept is called delayed entry or left-truncation.In this case one has to consider the conditional distribution of Y i given that Y i ≥ D i , where D i is a given time point or individual's age at study entry.Note that the likelihood of an uncensored individual i ≤ n :

Table 3 Remaining life expectancy at age 35 years estimated from official death rates 2008 (simulated 100,000 individuals), by gender.
Notes.CI, Confidence interval; DF, Degrees of freedom; PH, Proportional hazards model; AFT, Accelerated failure time model; HMD, Human Mortality Database; RLE, Remaining life expectancy.