Educational Differences in Mortality and Hospitalisation for Cardiovascular Diseases for Males

IZA DP No. 14507 JUNE 2021 Educational Differences in Mortality and Hospitalisation for Cardiovascular Diseases for Males* High educated individuals are less frequently admitted to hospital for cardiovascular diseases (CVD) and live longer than the lower educated. We address whether the educational gradient in the mortality rate can be explained by the educational difference in the timing of CVD hospitalisation. We account for possible selective hospitalisation by using a correlated multistate hazard model (a 'Timing-of-events' model), and for selection into education by using inverse propensity weighting based on the probability to attain higher education. We use Swedish Military Conscription Data (1951-1960), for males only, linked to administrative Swedish registers. Our empirical results indicate a clear educational gradient in mortality and in the impact of CVD hospitalisation on mortality. The implied educational gain in the number of months lost is, however, mainly due to factors other than CVD hospitalisation. Extending the analysis to cause-specific mortality reveals that the largest educational differences exist in death due to external causes. JEL Classification: C41, I14, I24


Introduction
The strong association between education and health is one of the most widely studied in economics. Observational evidence suggests that high educated people live longer (Mazumder, 2008, Clark and Royer, 2013, Fletcher, 2015, McCartney et al., 2013). The recent focus in the education-health literature is on determining to what extent education causes later mortality (Galama et al., 2018, Xue et al., 2021). However, the underlying mechanisms that produce these causal relationships have mostly been ignored. One path to higher mortality for the low educated is through more (and longer) hospitalisations, in particular hospitalisations for cardiovascular diseases (CVD). CVD hospitalisation is clearly associated with life-style behaviour, as smoking, limited exercise and obesity are all well-known causes of cardiovascular health problems.
Higher education may have countervailing effects on hospitalisation if it reduces the probability of negative health conditions that might lead to hospitalisation yet increases the probability of hospitalisation (e.g., through greater income, more knowledge, or better connections) for given health conditions. Likewise, higher education may have countervailing effects on mortality if it increases income and access to health-system care but also increases higher-risk behaviours and selection into more stressful occupations.
Only a few studies have attempted to identify the causal effect of education on hospitalisation (Arendt, 2008, Tansel and Keskin, 2017, Meghir et al., 2018). Arendt (2008) found for Denmark, using a bivariate probit model, a clear educational gradient in the probability of ever being hospitalised, but no significant educational impact on the number of days in hospital. Meghir et al. (2018), using linear difference-in-differences regressions and a regression discontinuity approach, both based on a reform in compulsory schooling in Sweden, found no impact of education on the total number of days in hospital, nor on the probability of ever being hospitalised for (amongst other diseases) circulatory diseases. Tansel and Keskin (2017) found for Turkey, using a Tobit and a Double Hurdle model, that an increase in years of education reduces the number of days hospitalised. However, all these papers ignore the role that hospitalisation experience plays in explaining mortality.
The contribution of this paper is threefold. First, it addresses the educational gradient in the mortality rate and the impact of the timing of entry into and discharge from the life-style related CVD hospitalisation on this gradient. Second, it derives the implied educational gain of improving education by one level, both directly and through CVD hospitalisation, in survival and in months lost. Third, it is the first paper that investigates the hospitalisation and mortality processes jointly, through a correlated multistate hazard model.
Most studies investigating the impact of education on mortality have used a linear or a probit model to estimate the educational gradient, although the age at death (and the timing of the CVD hospitalisation process) is clearly a duration outcome. In duration analysis one usually models the hazard rate: the instantaneous rate at which an individual enters a certain state (death, hospital entry or discharge) at a certain age, conditional on not having entered that state up to that age. Hazard models easily handle right-censoring, where the individual is only known to have survived up to the end of the observation window, and time-varying variables, e.g. the age at which somebody enters hospital; see, among others, Lancaster (1990) and Van den Berg (2001). A common way to accommodate observed characteristics is to specify a proportional hazard (PH) model, in which the hazard is the product of the baseline hazard (the age dependence) and a log-linear function of covariates.
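The role of right-censoring can be made concrete with a minimal Python sketch of one individual's log-likelihood contribution under a PH model. It assumes, purely for illustration, a Weibull baseline hazard $\lambda_0(t) = \alpha t^{\alpha - 1}$; the function names `weibull_baseline` and `ph_loglik` are hypothetical and not part of the paper's estimation code.

```python
import math

def weibull_baseline(t: float, alpha: float) -> tuple[float, float]:
    """Return (hazard, cumulative hazard) of the illustrative baseline alpha * t**(alpha - 1)."""
    return alpha * t ** (alpha - 1), t ** alpha

def ph_loglik(t: float, died: bool, x: list[float], beta: list[float], alpha: float) -> float:
    """Log-likelihood contribution under theta(t | x) = lambda0(t) * exp(x'beta).

    An observed death contributes log theta(t) - Lambda(t); a right-censored
    observation (alive at the end of the window) contributes only -Lambda(t),
    the log of the probability of surviving up to t.
    """
    lin = sum(xi * bi for xi, bi in zip(x, beta))
    h0, cum_h0 = weibull_baseline(t, alpha)
    log_surv = -cum_h0 * math.exp(lin)
    return (math.log(h0) + lin if died else 0.0) + log_surv
```

A censored man thus contributes only his log survival probability, which is why incomplete lifetimes pose no difficulty in hazard models.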
Neglecting confounding in inherently non-linear models, such as proportional hazard models, leads to biased inference. To accommodate this (see e.g. Van den Berg (2001) for a discussion of its importance), the mixed proportional hazard (MPH) model extends the PH model by multiplying the hazard by a time-invariant, person-specific random term. This has been the main model for the analysis of duration data in economics.
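Why ignoring unobserved heterogeneity biases inference in a hazard model can be illustrated with a small Python sketch. It assumes, hypothetically, that every individual has a constant hazard $\lambda v$ and that $v$ takes two values with equal probability; the numbers are illustrative only.

```python
import math

def population_hazard(t: float, lam: float, support: list[float], probs: list[float]) -> float:
    """Hazard among survivors at t when each person has a constant hazard lam * v.

    Survivors at t are reweighted by their survival probability exp(-lam * v * t),
    so frail types (large v) are depleted first and the aggregate hazard falls,
    even though no individual hazard changes over time.
    """
    surv = [p * math.exp(-lam * v * t) for v, p in zip(support, probs)]
    return sum(s * lam * v for s, v in zip(surv, support)) / sum(surv)

h_start = population_hazard(0.0, 1.0, [0.5, 2.0], [0.5, 0.5])
h_later = population_hazard(5.0, 1.0, [0.5, 2.0], [0.5, 0.5])
```

The population hazard falls from 1.25 towards 0.5 as high-$v$ individuals die first; a PH model without the random term would misread this dynamic selection as negative duration dependence.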
Only a few studies on the educational gradient in mortality used a hazard model. Meghir et al. (2018) used a regression discontinuity-type approach based on a reform in compulsory schooling, Bijwaard et al. (2017) and Bijwaard and Jones (2019) used an inverse propensity weighting method, and Bijwaard et al. (2015a,b, 2019) used a structural modelling approach. Yet none of these studies considered the relation between hospitalisation and mortality or accounted for selective hospitalisation (or the timing of hospitalisation) in the mortality process. Bijwaard and van Kippersluis (2016) have shown that education influences both entry into and discharge from hospital, and that higher educated individuals are less likely to die after a hospitalisation. When they account for the role of intelligence using a structural equation multistate model, this association disappears. In their model the interdependence of the hospitalisation and mortality processes is solely driven by intelligence, and they ignore the endogeneity of the timing of hospitalisation.
The richness of our data enables us to go beyond standard modelling of life-cycle durations and to tackle the complex task of examining the mortality and hospitalisation processes jointly. Correlation between these two processes might stem from correlated unobserved heterogeneity, and our model accommodates this. Applying the "timing-of-events" (ToE) method (Abbring and van den Berg, 2003) allows us to give the estimated effects of the hospitalisation dynamics on mortality a causal interpretation. Controlling for correlated unobserved heterogeneity in the hospitalisation and mortality processes is crucial, since otherwise the resulting endogeneity would confound the causal impact. In particular, the (first-time) hospital entry, hospital discharge, hospital re-entry and mortality durations are modelled as mixed proportional hazards that incorporate correlated unobservables.
However, the timing-of-events method does not correct for possible endogeneity of educational attainment. The propensity score method (Hirano et al., 2003, Caliendo and Kopeinig, 2008) we employ accounts for such confounding. It is based on the unconfoundedness assumption, which assumes that all variables that affect both educational attainment and the mortality and hospitalisation processes are observed. This is a stringent assumption, but our data contain important factors such as detailed family socioeconomic background (including paternal and maternal socioeconomic status at birth and education level), cognitive skills (IQ test) and non-cognitive skills (psychological test). However, confounding between the education choice and the hospitalisation-mortality process may still exist. To assess the sensitivity of our results to violations of unconfoundedness, we use an extension of the sensitivity approach of Bijwaard and Jones (2019).
We base our (generalised) propensity scores on a multistage sequential model of educational attainment developed by Cameron and Heckman (2001) with five education levels.
Based on the estimated propensity scores we estimate a weighted timing-of-events model (IPW-ToE), with weights based on the inverse of the generalised propensity score (inverse propensity weighting, IPW; Hirano et al., 2003); see Bijwaard and Jones (2019) for an application of a single-valued IPW method to a mixed proportional hazard mortality model.
Based on the estimated model we calculate the educational gain of improving education, both in the survival probability up to age 63 (the highest age observed) and in the number of months lost up to age 63. We also decompose these educational gains into an indirect effect, running through changes in the hospitalisation process, and a direct effect due to other factors. The Swedish Military Conscription Data (1951-1960), linked to administrative Swedish registers, offer the opportunity to investigate the impact of hospitalisation and education on (cause-specific) mortality. We have information on about half a million men who are followed from the date of conscription until the end of 2012, or until death. For those men who die we observe the cause of death. From the Swedish National Hospital Discharge Register we observe CVD hospital care from 1964 until the end of 2012. These data include recordings of demographic and socioeconomic characteristics such as education, parental socioeconomic status and parental education, along with anthropometric measures, an intelligence test and a psychological assessment. Educational level was classified into five categories: primary education; some secondary education (2 years); full secondary education (3 years); post-secondary education; and higher education.
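The survival probability up to age 63 and the number of months lost can be computed from any fitted survival curve. A minimal Python sketch, assuming survival is measured conditional on being alive at 18 (conscription) and using a Gompertz hazard with made-up parameters; the helpers `months_lost` and `gompertz_survival` are hypothetical.

```python
import math
from typing import Callable

def months_lost(surv: Callable[[float], float], start: float = 18.0,
                end: float = 63.0, steps_per_year: int = 12) -> float:
    """Expected months lost between `start` and `end`: 12 * integral of (1 - S(t)) dt,
    approximated with the trapezoidal rule on a monthly grid."""
    n = int(round((end - start) * steps_per_year))
    h = (end - start) / n
    ys = [1.0 - surv(start + i * h) for i in range(n + 1)]
    integral_years = h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))
    return 12.0 * integral_years

def gompertz_survival(a: float, b: float, start: float = 18.0) -> Callable[[float], float]:
    """S(t | alive at `start`) for the illustrative Gompertz hazard a * exp(b * t)."""
    return lambda t: math.exp(-a / b * (math.exp(b * t) - math.exp(b * start)))

s_high_risk = gompertz_survival(1e-4, 0.08)
s_low_risk = gompertz_survival(5e-5, 0.08)
gain = months_lost(s_high_risk) - months_lost(s_low_risk)  # months saved by the lower hazard
```

The survival probability to 63 is simply `S(63)`. An educational gain would be, for instance, the difference in months lost between survival curves predicted for adjacent education levels; the paper's split into direct and indirect effects requires the full fitted model and is not reproduced here.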

Data from the Swedish Military Conscription
A couple of other studies have also used data from Sweden to investigate the education-mortality gradient. Spasojević (2010), Lager and Torssander (2012) and Meghir et al. (2018) all used the compulsory schooling reform (1949-1962) in Sweden as an instrument for an increase in education; Spasojević (2010) used survey data, while both Lager and Torssander (2012) and Meghir et al. (2018) used register data on children born 1940-1955. Lundborg et al. (2016) used a (linear) twin fixed effects design, for twins born 1886-1958, to address the endogeneity of the education choice. Bijwaard et al. (2017) and  both focus on the educational gradient in cause-specific mortality. Bijwaard et al. (2017) used a cause-specific months-lost model with IPW and family fixed effects, and  used a structural cause-specific hazard model to account for educational endogeneity. The data used in these papers are close to ours: the Swedish Military Conscription data linked to the death register and census data. Bijwaard et al. (2017) used the birth cohorts 1951-1983, but restricted the analyses to men with at least one brother, while  used, just as we do, only the birth cohorts 1951-1960, but discarded the men without an IQ measurement. However, neither of these papers used information on CVD hospitalisation.
The empirical analyses show that both education and CVD hospitalisation are important factors influencing mortality. We find a clear educational gradient in mortality, even after accounting for the endogeneity of education through inverse propensity weighting. We also find that mortality is much higher for those in hospital for CVD (39-68 times higher) and for those who have been in hospital (2.1 to 4.7 times higher). These hospitalisation effects also exhibit an educational gradient, with a decreasing impact of hospitalisation the higher the education level.
Based on these parameter estimates we calculate the educational gains and find that men with only primary education would gain the most (about 9 months from age 18 to 63) if they had a higher education level. Only a small part of this educational gain runs through a change in CVD hospitalisation. Although men with post-secondary education would gain only 2.3 months if they had higher education, most of this educational gain (1.6 months) is attributable to changes in CVD hospitalisation for the higher educated.
Evidence suggests a differential impact of education on various diseases, resulting in different educational gradients in cause-specific mortality (Galobardes et al., 2004, Bijwaard et al., 2017). As the socioeconomic association seems largest for cardiovascular diseases, most studies have focussed on socioeconomic differences in mortality from these diseases. Some have indeed found that the incidence of cardiovascular disease is higher for individuals with low socioeconomic status (Mackenbach et al., 2008, Kulhánová et al., 2014). However, Bijwaard et al. (2017) found that most of the educational gains in mortality up to age 63 (the same maximum age we use) are attributable to the reduction in mortality due to external causes, and that the reduction in death due to CVD with improving education is rather small.
To investigate whether education and hospitalisation for CVD affect different causes of death differently, we also estimate a model with cause-specific mortality rates, distinguishing five causes of death: (1) Ischemic Heart Disease (IHD); (2) Stroke; (3) other cardiovascular causes; (4) External causes; and (5) Other (natural) causes of death. The model is an extension of the timing-of-events model with IPW. We find that all causes of death show a clear educational gradient (except between post-secondary and university/PhD), with the largest educational differences for external causes of death. We also find that death due to other CVD is affected the most by hospitalisation. However, hospitalisation for CVD also affects mortality due to other natural causes and due to external causes.

2 Data
The data come from several Swedish population-wide registers, which are linked using a unique individual identifier. The Swedish Military Conscription Data include demographic information on the conscripts and information obtained at the military examination, including a battery of intelligence tests and a psychological assessment. These data are linked to the National Population and Housing Censuses, containing information on the socioeconomic status and educational levels of the parents. Information on each conscript's own education was obtained from the Longitudinal Integration Database for Health Insurance and Labour Market Studies (LISA) for the 1990-2012 period, and information on cause-specific mortality (the underlying cause of death) was obtained from the Cause of Death Register for the period up to 2012. Information on hospitalisation for Cardiovascular Diseases (CVD; ICD-8 and ICD-9 codes 390-459, ICD-10 codes starting with I), namely the timing of admittance and discharge, is derived from the Inpatient Register. Coverage runs from 1964 to December 31st, 2012. The study population consists of men born between 1950 and 1960, who were identified in the Multi-Generation Register, and who were conscripted for military examination between 1969 and 2001, usually when they were aged 18-20. At that time military service in Sweden was mandatory for men only, so we only observe males. We selected only those men (517,843) for whom at least one parent is known and for whom we observe the conscription date.
These data include recordings of demographic and socioeconomic characteristics such as education, parental socioeconomic status and parental education, along with anthropometric measures, an intelligence test, a psychological assessment and health measures (height, weight, blood pressure and muscular strength). The intelligence measurement is based on a battery of IQ tests, consisting of four subtests that measured logical, spatial, verbal and technical abilities. Each subtest was first evaluated on a normalized nine-point (stanine) scale. The subtest scores were summed to obtain an overall score, which was transformed onto a stanine scale with a mean of five and a standard deviation of two. We only use this final global IQ measurement.
The psychological assessment is also based on a normalized nine-point (stanine) scale. For both the intelligence and the psychological assessment we also define a missing indicator, for when the measurement is not observed. Parental socio-economic status has seven categories: Non-manual workers at higher level, Non-manual workers at intermediate level, Non-manual workers at lower level, Farmers, Skilled workers, Unskilled workers, and Others (not classified). We also define an unknown-SES indicator. Parental education has six categories: primary (< 9 years), primary (9-10 years), Secondary education (2 years), Full secondary education (3 years), Post-secondary (University < 3 years) and Higher (University ≥ 3 years or PhD).
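How a summed score is mapped onto the stanine scale is not spelled out here. A common convention assigns stanines 1 to 9 to percentile bands of 4, 7, 12, 17, 20, 17, 12, 7 and 4 per cent, which yields a mean of about five and a standard deviation of about two. The Python sketch below assumes that convention and mid-rank percentiles; it is an illustration, not the procedure used by the Swedish conscription authorities.

```python
from bisect import bisect_left, bisect_right

# Assumed upper cumulative percentages of stanines 1-8 (4-7-12-17-20-17-12-7-4 split).
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]

def to_stanine(scores: list[float]) -> list[int]:
    """Map raw scores to stanines 1-9 using mid-rank percentiles within the sample."""
    n = len(scores)
    ordered = sorted(scores)
    out = []
    for s in scores:
        below = bisect_left(ordered, s)           # scores strictly lower than s
        ties = bisect_right(ordered, s) - below   # scores equal to s
        pct = 100.0 * (below + 0.5 * ties) / n
        out.append(1 + bisect_right(STANINE_CUTS, pct))
    return out
```

Applied to 100 distinct scores, the sketch reproduces the 4-7-12-17-20-17-12-7-4 band sizes exactly.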
We aggregated the observed education of the conscripts into five classes: (i) Less than 10 years of education (only primary schooling); (ii) Some secondary education (2 years); (iii) Full secondary education (3 years); (iv) Post-secondary education (less than 3 years); and (v) Higher education (University and PhD). More detailed information on the data can be found in Bijwaard et al. (2017).
About 80 thousand (of the 518 thousand) men have experienced CVD hospitalisation. Table 1 presents the distribution of this hospitalisation by educational attainment. CVD hospitalisation experience is less common among the high educated, and so is the average number of days spent in hospital for CVD. Among men without hospitalisation experience, low educated men died four times more often than high educated men, whereas among men with hospitalisation experience the educational gradient is less steep. Finally, the table shows that hospitalisation experience is clearly associated with higher mortality. Next, we calculate the Kaplan-Meier survival curves. These non-parametric survival curves for the five education categories are shown in Figure 1 and reflect the mortality differences by education and by hospitalisation experience. Survival increases with the education level, and the differences between the education levels increase with age. Comparing the survival curves without hospitalisation (left panel) and with hospitalisation (right panel) shows the large impact of hospitalisation on mortality. The Kaplan-Meier survival curves for first admittance and re-admittance to hospital by education level, given in Figure 2, also show a clear educational gradient.
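The Kaplan-Meier curves in Figures 1 and 2 follow the product-limit formula. A minimal Python sketch for one group (for example one education level), where `times` are ages at death or censoring; the function name `kaplan_meier` is hypothetical.

```python
from collections import Counter

def kaplan_meier(times: list[float], died: list[bool]) -> list[tuple[float, float]]:
    """Return (t, S(t)) at each distinct death time.

    S(t) = product over death times t_j <= t of (1 - d_j / n_j), with d_j deaths
    at t_j and n_j men still at risk (neither dead nor censored) just before t_j.
    Men censored at t_j are counted as at risk at t_j, the usual convention.
    """
    deaths = Counter(t for t, d in zip(times, died) if d)
    exits = Counter(times)  # deaths and censorings both leave the risk set
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        if deaths[t]:
            surv *= 1.0 - deaths[t] / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]
    return curve

example = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
```

Calling the function separately for each education level (and for men with and without hospital experience) gives stratified curves of the kind shown in the figures.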
However, it takes time to experience hospitalisation, and this time depends on educational attainment; these mortality differences therefore do not necessarily reflect the influence of education and/or hospitalisation on mortality per se. The observed differences in survival between low and high educated men could also be induced by a higher IQ or a higher socio-economic background of the high educated men. For example, understanding a doctor's advice and adhering to complex treatments after hospitalisation may be driven by intelligence rather than education. In the next section we explain how we account for this.
A major methodological concern with the empirical analysis of the impact of hospitalisation on mortality is that the admittance and discharge processes depend on individual characteristics, both observed and unobserved, that also influence mortality. This implies that any observed relationship between admittance to (or discharge from) hospital and mortality may be caused by unobserved factors that influence both hospitalisation and mortality. For example, a finding that men with high educated fathers live longer does not necessarily imply that a low socioeconomic background (a low educated father) causes higher mortality. Rather, it may be induced by the higher hospitalisation of men from a low socio-economic background. To account for this interdependence, we model the first admittance, discharge and re-admittance hazards of the hospitalisation process simultaneously with the mortality hazard. This is a multistate model with correlated hazards, also called a 'timing-of-events' (ToE) model (Abbring and van den Berg, 2003), which explicitly controls for the correlation between the hospitalisation process and mortality.

Timing-of-events method
Let $T_m$ denote the time (age) until death (mortality), $T_h$ the age at the start of the first hospital spell, $T_d$ the age at hospital discharge and $T_r$ the age at hospital re-admittance (after discharge). The duration of the hospital stay and the time from discharge until re-admittance are denoted by $d_h = T_d - T_h$ and $d_r = T_r - T_d$. To keep track of hospitalisation events, we also define two time-varying indicators: $I_h(t)$ takes the value one if the individual is in hospital at age $t$, and $I_o(t)$ indicates that the individual has experienced a period of hospitalisation before age $t$.
In Figure 3(a) we depict the hospitalisation dynamics and mortality for an arbitrary military recruit. This recruit is not in hospital (healthy) at the military examination, as holds for all observed men. At some age t h the recruit is admitted to hospital for a CVD.
This implies that the time until first hospitalisation is $T_h = t_h$. The recruit is discharged from hospital at age $t_d$, so the time he stayed in hospital is $d_h = t_d - t_h$. However, at age $t_r$ he is admitted to hospital again for a CVD, which implies that the time until re-hospitalisation was $d_r = t_r - t_d$. At age $t_m$ this recruit dies, after spending $t_m - t_r$ in hospital. During hospitalisation the indicator $I_h$ is one, and after the first hospitalisation spell the indicator $I_o$ is one. Thus, both $I_h$ and $I_o$ are one when the recruit is in hospital for the second time.
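The bookkeeping for the example recruit in Figure 3(a) can be sketched in Python. The class `History` and its fields are hypothetical; as a simplification of our own, the re-admission spell is treated as lasting until death or censoring, as it does for the example recruit.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class History:
    """Event ages for one recruit; None means the event is not observed."""
    t_h: Optional[float] = None  # first CVD admission
    t_d: Optional[float] = None  # discharge from the first spell
    t_r: Optional[float] = None  # re-admission
    t_m: Optional[float] = None  # death

    def in_hospital(self, t: float) -> int:
        """I_h(t): 1 if in hospital at age t (first spell, or after re-admission)."""
        first = self.t_h is not None and self.t_h <= t and (self.t_d is None or t < self.t_d)
        second = self.t_r is not None and self.t_r <= t
        return int(first or second)

    def has_been(self, t: float) -> int:
        """I_o(t): 1 once the first hospital spell has ended."""
        return int(self.t_d is not None and self.t_d <= t)

    def durations(self) -> dict[str, float]:
        """d_h = t_d - t_h and d_r = t_r - t_d, where observed."""
        out = {}
        if self.t_h is not None and self.t_d is not None:
            out["d_h"] = self.t_d - self.t_h
        if self.t_d is not None and self.t_r is not None:
            out["d_r"] = self.t_r - self.t_d
        return out

recruit = History(t_h=50.0, t_d=50.5, t_r=55.0, t_m=56.0)
```

At age 55.5 both `in_hospital` and `has_been` return 1 for this recruit, matching the description of the second hospital spell.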
The most commonly applied multistate model in biostatistics is the illness-death model (Hougaard, 2000, Putter et al., 2007). In this type of model individuals start out healthy. From healthy they may become ill (enter hospital) or die. Ill individuals may die or recover (leave hospital) and become healthy again. The timing-of-events model is a special case of such a model, in which mortality depends not only on the current illness status but also on illness experience. This model is depicted in Figure 3(b). Another major difference with standard illness-death models is that in the timing-of-events model the transition rates between the different states are independent conditional on unobserved random heterogeneity.
We model the first admittance to hospital using a Mixed Proportional Hazard (MPH) model,
$$\theta_h(t \mid x, e, v_h) = \lambda_{h0}(t)\, \exp\left(x'\beta_h + \gamma_{he}\right) v_h, \qquad (1)$$
with a baseline hazard (age dependence) $\lambda_{h0}(t)$, unobserved time-invariant characteristics $v_h$, observed time-invariant characteristics $x$ and education level $e$. We assume a Gompertz baseline hazard, $\lambda_{h0}(t) = \exp(\alpha_h t)$, which implies that the hazard increases exponentially with age. $\beta_h$ captures the impact of exogenous individual characteristics $x$ on the hospitalisation hazard, and $\gamma_{he}$ captures the impact of (possibly endogenous) individual education $e$ on the hospitalisation hazard.
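A minimal Python sketch of the first-admission MPH hazard and its cumulative hazard from conscription onward. It assumes the Gompertz form $\lambda_{h0}(t) = \exp(\alpha_h t)$ with the intercept absorbed into $x'\beta_h$, and measures exposure from age 18; the function names and the parameter values in the check are hypothetical.

```python
import math

def gompertz_mph_hazard(t: float, alpha: float, lin: float, v: float) -> float:
    """theta_h(t) = exp(alpha * t) * exp(lin) * v, with lin = x'beta_h + gamma_he."""
    return math.exp(alpha * t + lin) * v

def gompertz_mph_cumhaz(t: float, alpha: float, lin: float, v: float, t0: float = 18.0) -> float:
    """Closed-form integral of theta_h from age t0 to age t."""
    return math.exp(lin) * v * (math.exp(alpha * t) - math.exp(alpha * t0)) / alpha

def admission_survival(t: float, alpha: float, lin: float, v: float, t0: float = 18.0) -> float:
    """Probability of no CVD admission between t0 and t (ignoring competing death)."""
    return math.exp(-gompertz_mph_cumhaz(t, alpha, lin, v, t0))
```

Because education enters only through `lin`, the hazard ratio between two education levels is $\exp(\gamma_{he})$ at every age, which is the proportionality the MPH structure imposes.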
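As the individual is either in hospital or not, the hospitalisation process is alternating and has three possible transitions: admittance, discharge and (the absorbing state) death. The conditional hazards for the discharge and re-admittance spells also follow MPH models:
$$\theta_d(d_h \mid x, e, v_d) = \lambda_{d0}(d_h)\, \exp\left(x'\beta_d + \gamma_{de}\right) v_d, \qquad (2)$$
$$\theta_r(d_r \mid x, e, v_r) = \lambda_{r0}(d_r)\, \exp\left(x'\beta_r + \gamma_{re}\right) v_r, \qquad (3)$$
with transition-specific Weibull baseline hazards $\lambda_{k0}(d) = \alpha_k d^{\alpha_k - 1}$, unobserved time-invariant characteristics $v_k$, observed individual characteristics $x$ and education level $e$, where $k \in \{d, r\}$ denotes the hospitalisation state.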
The mortality hazard, our main outcome, is also of the MPH form. We allow the mortality hazard to depend on the timing of the hospitalisation process directly, through $I_h(t)$ and $I_o(t)$, or indirectly, through correlated unobserved heterogeneity terms:
$$\theta_m(t \mid x, e, I_h(t), I_o(t), v_m) = \lambda_{m0}(t)\, \exp\left(x'\beta_m + \gamma_{me} + \delta_{he} I_h(t) + \delta_{oe} I_o(t)\right) v_m. \qquad (4)$$
The age dependence of the mortality hazard is assumed to be Gompertz, $\lambda_{m0}(t) = \exp(\alpha_m t)$; the Gompertz hazard is known to describe mortality accurately (Gavrilov and Gavrilova, 1991). Our parameters of interest are $\gamma_{me}$, the effect of the individual education level, $\delta_{he}$, the effect of staying in hospital, and $\delta_{oe}$, the effect of hospital experience (both also depending on the individual's education level), on the mortality hazard. For identification we set $\gamma_{m1} = 0$, i.e. primary education is the reference category.
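The likelihood needs the cumulative mortality hazard along each man's observed path, and the time-varying indicators make it piecewise. A Python sketch, assuming a Gompertz baseline $\exp(\alpha_m t)$, exposure from age 18 and the event order $t_h < t_d < t_r$; the function `mortality_cumhaz` and all parameter values are hypothetical.

```python
import math
from typing import Optional

def mortality_cumhaz(t: float, alpha: float, lin: float, v: float,
                     delta_h: float, delta_o: float,
                     t_h: Optional[float] = None, t_d: Optional[float] = None,
                     t_r: Optional[float] = None, t0: float = 18.0) -> float:
    """Integral from t0 to t of exp(alpha*s + lin + delta_h*I_h(s) + delta_o*I_o(s)) * v ds.

    lin = x'beta_m + gamma_me; delta_h and delta_o are the (education-specific)
    effects of being in hospital and of having been in hospital.
    """
    # (start age, I_h, I_o) for each segment on which the indicators are constant
    segments = [(t0, 0, 0)]
    if t_h is not None:
        segments.append((t_h, 1, 0))  # first hospital spell
    if t_d is not None:
        segments.append((t_d, 0, 1))  # discharged: hospital experience only
    if t_r is not None:
        segments.append((t_r, 1, 1))  # re-admitted: both indicators on
    total = 0.0
    for i, (start, ih, io) in enumerate(segments):
        end = min(segments[i + 1][0], t) if i + 1 < len(segments) else t
        if end <= start:
            break  # remaining segments start after age t
        scale = math.exp(lin + delta_h * ih + delta_o * io) * v
        total += scale * (math.exp(alpha * end) - math.exp(alpha * start)) / alpha
    return total
```

Within each segment the Gompertz integral has a closed form, so only the change points of $I_h$ and $I_o$ need to be tracked.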
For the sake of parsimony, we assume that each of the unobserved heterogeneity terms is time-invariant, remains the same for recurrent durations of the same type and is independent of the observed characteristics $x$. We adopt a discrete distribution (which is standard in the "Timing-of-events" literature), i.e. $v$ has discrete support $(V_1, \ldots, V_W)$, with $V_w = (v_{h,w}, v_{d,w}, v_{r,w}, v_{m,w})$ and $p_w = \Pr(V = V_w)$. The likelihood function is given in Appendix A.
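With a discrete heterogeneity distribution, each man's likelihood is a probability-weighted sum over support points. A minimal Python sketch, assuming the per-support-point log-likelihoods (summed over all of his spells) are already computed, and parameterising the $p_w$ by a multinomial logit, a common choice that we assume here rather than take from the paper; the function names are hypothetical.

```python
import math

def mixture_probs(q: list[float]) -> list[float]:
    """Support-point probabilities p_w from unrestricted parameters q (first logit fixed at 0)."""
    z = [0.0] + list(q)
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

def mixture_loglik(loglik_given_v: list[float], probs: list[float]) -> float:
    """log sum_w p_w * exp(loglik_given_v[w]), computed stably via log-sum-exp."""
    terms = [math.log(p) + l for p, l in zip(probs, loglik_given_v)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))
```

The log-sum-exp step matters in practice: a man with many spells can have log-likelihoods far below $-700$ at every support point, where a direct `exp` would underflow to zero.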
The identification of the effect of hospitalisation on mortality in the "timing-of-events" model hinges on two key assumptions. The first is the mixed proportional structure of the hazard rates, as reflected in the hazard structure defined in (1) to (4). This assumption is necessary to distinguish between true duration dependence, modelled via the $\lambda_{k0}$ functions, and dynamic selection, which arises because the group of men with early entry into hospital has a different composition from the group with late entry. Similarly, the group of men with long hospital stays differs from the group with much shorter stays. Dynamic selection is taken into account via observed and unobserved heterogeneity. It is well known that in the absence of observed heterogeneity, true duration dependence cannot be distinguished from unobserved heterogeneity (Elbers and Ridder, 1982, Heckman and Singer, 1984).
The second assumption is the no-anticipation assumption, which is defined in terms of the potential mortality hazard $\theta_m(t \mid s, x, v)$, i.e. the mortality rate that would be observed if the individual entered hospital at age $s$ (including those who never enter, $s = \infty$):
$$\theta_m(t \mid s, x, v) = \theta_m(t \mid \infty, x, v) \quad \text{for all } t \le s.$$
In other words, we assume that individuals do not anticipate entering hospital for CVD by dying before the anticipated event would occur. This assumption would be violated if individuals died earlier because they know they will enter hospital for CVD. Although we think this is rather unlikely (but untestable with our data), we are cautious in giving the obtained CVD hospitalisation effects a causal interpretation. Still, even if the no-anticipation assumption does not hold, the timing-of-events method corrects for possible endogeneity of the hospitalisation processes.
The identification of the model framework is proven and discussed at length in Abbring and van den Berg (2003). To provide some intuition, first note that the data can be broken into three parts: (i) a competing risk part for the duration until either a recruit is admitted to hospital for CVD or dies, whichever comes first, (ii) a competing risk part for the duration until either discharge from hospital or death, and (iii) the residual duration from the moment of hospital experience until death. From Heckman and Honoré (1989), it follows that under general conditions the whole model except for $\delta_{he}$ and $\delta_{oe}$ is identified from the data corresponding to the competing risk parts. Subsequently, $\delta_{he}$ and $\delta_{oe}$ are identified from the data corresponding to part (iii) of the model.
To clarify further what drives the identification of $\delta_{he}$ and $\delta_{oe}$, consider individuals who enter hospital for CVD at age $t$. The natural control group consists of individuals who have not been in hospital at $t$. A necessary condition for a meaningful comparison of these groups is that there is some randomisation in hospital admittance at $t$. The duration model framework allows for such randomisation because it specifies assignment by way of the rate of entering hospital. In addition, we have to deal with the selection issue that the unobserved heterogeneity distribution differs between the hospitalised and non-hospitalised groups at $t$. This is handled by exploiting the information in the data on what happened to individuals who entered hospital or died before age $t$.
Another way to look at this is to note that the timing of the consecutive events of admittance to and discharge from hospital is informative about the presence of a causal effect of hospitalisation. If CVD hospitalisations are often followed very quickly by death, then this

Accounting for selective educational attainment
However, mortality and the hospitalisation process may be influenced by factors that also determine the education choice. This may render education a selective choice and make it endogenous to mortality and hospitalisation. The timing-of-events method does not correct for this possible endogeneity of educational attainment.
We follow a propensity score method to account for selection on observed characteristics and estimate the effect of education on the hospitalisation rates and on the mortality rate. Figure 4 provides a graphical illustration of the relationship between observed characteristics, education and the hazards $\theta$ from the timing-of-events model, using a directed acyclic graph in which each arrow represents a causal path (Pearl, 2000, 2012). It states that observed early childhood characteristics $X$, such as parental background and intelligence, influence the education choice $E$ and the hazards of the hospitalisation and mortality processes. Possible unmeasured childhood (pre-age 18) factors, $U_0$, may also influence both the education choice and the hazards.
We base our (generalised) propensity scores on a multistage sequential model of educational choice developed by Cameron and Heckman (2001) with five education levels $e = 1, \ldots, 5$ (see also Heckman et al., 2018a,b). People start their educational career in primary school ($e = 1$) and choose whether they wish to finish some secondary school ($e = 2$). Importantly, the set of future choices available to them depends on their earlier educational choices. If people choose to finish some secondary school, they have the choice to graduate from secondary school ($e = 3$); if they graduate, they have the choice to enroll in post-secondary schooling ($e = 4$); if they enroll, they have the choice to graduate from post-secondary schooling; if they graduate from post-secondary schooling, they have the choice to enroll in higher education ($e = 5$); and if they enroll, they have the choice to graduate from higher education.
We assume a probit model for the probability of attaining a higher education level conditional on having obtained the previous, lower level, based on the sequential model for educational attainment. The propensity score of choosing education level $e$ under the sequential probit assumption is
$$p(e \mid X) = \Pr(E = e \mid X) = \Big[\prod_{j=1}^{e-1} \Phi(X'\pi_j)\Big]\big(1 - \Phi(X'\pi_e)\big), \quad e = 1, \ldots, 4, \qquad p(5 \mid X) = \prod_{j=1}^{4} \Phi(X'\pi_j),$$
where $\Phi(\cdot)$ is the standard Normal cumulative distribution function and $\pi_j$ are the coefficients of the probit for continuing beyond level $j$. In the propensity scores we control for maternal socio-economic status, paternal education, maternal and paternal age at birth, birth order, IQ and psychological assessment (the latter two obtained at the military examination, when the individuals are 18 years of age); see Table C.6 in Appendix C. We check whether the propensity score balances the distribution of all included variables across all education groups by calculating the standardized bias, or normalised difference in means; see Table B.3 in Appendix B. The overlap, or common support, assumption requires that the propensity score is bounded away from zero and one; see Figure D.1-D.1 in Appendix D.
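A minimal Python sketch of sequential probit propensity scores, under the simplifying assumption (ours) of one continuation decision per level transition, so that four probits generate the five scores. The coefficient vectors are hypothetical, and their estimation (one probit per transition, fitted on those at risk of making it) is not shown.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard Normal CDF

def sequential_probit_pscores(x: list[float], pis: list[list[float]]) -> list[float]:
    """Pr(E = e | x) for e = 1..len(pis)+1 under a sequential probit.

    pis[j] holds the coefficients of the probit for continuing beyond level j+1;
    Pr(continue | reached level j+1) = Phi(x'pi_j).
    """
    scores = []
    reach = 1.0  # probability of reaching the current level
    for pi in pis:
        cont = PHI(sum(xi * b for xi, b in zip(x, pi)))
        scores.append(reach * (1.0 - cont))  # stop at this level
        reach *= cont
    scores.append(reach)  # reached the highest level
    return scores

flat = sequential_probit_pscores([1.0, 0.3], [[0.0, 0.0]] * 4)
mixed = sequential_probit_pscores([1.0, -0.7], [[0.4, 1.2], [0.1, -0.5], [-0.3, 0.8], [0.2, 0.2]])
```

With all indices equal to zero each continuation probability is one half, giving scores of 0.5, 0.25, 0.125, 0.0625 and 0.0625. Each man's score for his observed level is what the inverse propensity weights use.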
First, we discuss the assumptions needed to identify the impact of education on the mortality outcomes; these assumptions are common in the potential-outcomes literature that uses propensity score methods.
We use potential hazard rates θ_k(t|e), k ∈ {h, r, d, m}: the hazard rate that would be observed if the individual had obtained education level e = 1, . . . , 5. We observe pre-treatment (i.e. pre-education) covariates X that influence the education choice.
Unconfoundedness Assumption: θ_k(t|e) ⊥ E | X for e = 1, . . . , 5 and k ∈ {h, r, d, m}, where ⊥ denotes independence. The unconfoundedness assumption (Rubin, 1974, Rosenbaum and Rubin, 1983) asserts that, conditional on the covariates X, the education level is independent of the potential outcomes. This requires that all variables that affect both the hazards and the education choice are observed; thus, there is no U_0 in Figure 4. Note that this does not imply that all relevant covariates are observed: an unobserved factor may influence either the hazards or the education choice, but not both. In Section 4.3 we assess to what extent our estimates are robust to violations of this rather strong assumption. Rosenbaum and Rubin (1983) show that if the potential outcomes are independent of treatment (in our case education) conditional on the covariates X, they are also independent of treatment conditional on the propensity score. Hence, if unconfoundedness holds, all biases due to observable covariates can be removed by conditioning on the propensity score (Imbens, 2004), and the average effects can be estimated by matching or weighting on the propensity score. We follow an inverse propensity weighting method to account for this endogeneity (Hirano et al., 2003). Inverse probability weighting based on the propensity score, or inverse propensity weighting (IPW), creates a synthetic sample in which educational attainment is independent of the included covariates, by assigning each individual a weight proportional to the inverse of the propensity score of the education level he actually attained. Note that the only link between the assumptions for the propensity score and those for the timing-of-events model is that the covariates used to calculate the propensity score are independent of the unobserved heterogeneity of the timing-of-events model.
This implies that we assume that the unobserved heterogeneity factors in the hazards do not influence the education choice.
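A minimal sketch of the inverse generalised propensity weights described above: each man is weighted by the inverse of the score of the level he actually attained. Normalising the weights to average one within each education level is a common stabilising choice that is assumed here, not taken from the paper, and `ipw_weights` is a hypothetical name.

```python
def ipw_weights(levels, scores, normalise=True):
    """Inverse generalised propensity score weights.

    levels : attained education level E_i (1..5) for each individual
    scores : per individual, the five scores p_1(X_i), ..., p_5(X_i)
    Each individual gets weight 1 / p_{E_i}(X_i); with normalise=True the
    weights are rescaled to average one within each education level.
    """
    w = [1.0 / s[e - 1] for e, s in zip(levels, scores)]
    if normalise:
        for lev in set(levels):
            idx = [i for i, e in enumerate(levels) if e == lev]
            mean_w = sum(w[i] for i in idx) / len(idx)
            for i in idx:
                w[i] /= mean_w
    return w


levels = [1, 2, 1]
scores = [[0.5, 0.2, 0.1, 0.1, 0.1],
          [0.25, 0.75, 0.0, 0.0, 0.0],
          [0.25, 0.25, 0.2, 0.2, 0.1]]
print(ipw_weights(levels, scores, normalise=False))  # raw weights 2.0, 1.33..., 4.0
```

Men whose attained level was unlikely given X receive large weights, which is why the common-support condition (scores bounded away from zero) matters in practice.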
Based on the generalised propensity scores, we re-estimate the timing-of-events models on a re-weighted pseudo-population using inverse generalised propensity score weighting (IPW-ToE) (Frölich, 2004, Feng et al., 2012); see Bijwaard and Jones (2019) for an application of a (single-valued) IPW method to a Mixed Proportional Hazard mortality model.
Misspecification of the propensity score will generally produce bias. Rotnitzky and Robins (1995) point out that if either the regression adjustment or the propensity score is correctly specified, the resulting estimator will be consistent. To account for this, we also use a doubly robust estimator, which includes the covariates both in the propensity score and in a regression adjustment.
Although the unconfoundedness assumption is not directly testable and clearly strong, it may be a reasonable approximation. Bijwaard and Jones (2019) have shown that intelligence, as measured by an IQ test, is a principal source of educational selection, and that including this information in the propensity score makes the estimates robust to possible violations of unconfoundedness. The literature has developed several ways to address violations of the unconfoundedness assumption, e.g. Imbens (2003), Nannicini (2007), Ichino et al. (2008), Bijwaard and Jones (2019). In Section 4.3 we discuss some sensitivity analyses in detail.

Results
We estimate both a timing-of-events model that does not account for the endogeneity of education (ToE model) and a timing-of-events model that accounts for this endogeneity through inverse propensity weighting (IPW-ToE model), as described in the previous section. Table 2 presents, for both models, the estimated effects of education and of CVD hospitalisation by education on the mortality hazard. We observe a clear educational gradient. The mortality rate for men with some secondary education is 44% (= 1 − e^{−0.585}) lower than the mortality rate for men with only primary education. For men with full secondary education the mortality rate is 60% lower, for men with post-secondary education 70% lower and for men with higher education 73% lower.
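The percentages above follow from the proportional hazard form: a coefficient b multiplies the hazard by e^b, so the percentage change is 100(e^b − 1). A small sketch, using only the coefficient quoted in the text (the function name is illustrative):

```python
import math


def pct_change_in_hazard(coef: float) -> float:
    """Percentage change in a proportional hazard implied by coefficient coef."""
    return 100.0 * (math.exp(coef) - 1.0)


# Some secondary vs primary education (coefficient -0.585, as quoted above)
print(round(pct_change_in_hazard(-0.585), 1))  # -44.3, i.e. about 44% lower
```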
For all education levels we find that the mortality rate is higher while in hospital and also for those who have experienced CVD hospitalisation. Both hospitalisation effects show a clear educational gradient. Consider, for example, the estimated effect of being in hospital for CVD on the mortality rate for men with primary education versus men with higher education. For men with primary education the mortality rate in hospital is 68 (= e^{4.221}) times higher, whereas for men with higher education it is 39 times higher compared to a man with higher education who is not in hospital and has no hospital experience, or 11 (= e^{3.66} × e^{−1.303}) times higher compared to a man with primary education who is not in hospital and has no hospital experience. Similarly, the mortality rate for men with higher education is 'only' 2 times higher once they have experienced CVD hospitalisation (or even 44% lower than the mortality rate of men with primary education without hospitalisation experience).
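Switching the reference group amounts to adding coefficients on the log-hazard scale before exponentiating. The sketch below reproduces the multipliers quoted above from the coefficients given in the text; the higher-education main effect is taken to be −1.303, which is the sign that reproduces the stated factor of 11. Variable names are illustrative.

```python
import math

# Coefficients quoted in the text for the mortality hazard
in_hosp_primary = 4.221  # in hospital for CVD, primary education
in_hosp_higher = 3.66    # in hospital for CVD, higher education
higher_educ = -1.303     # higher vs primary education (sign inferred from the factor 11)

rel_primary = math.exp(in_hosp_primary)        # vs primary, not in hospital
rel_higher_own = math.exp(in_hosp_higher)      # vs higher, not in hospital
# Change of reference group: add the education main effect on the log scale
rel_higher_vs_primary = math.exp(in_hosp_higher + higher_educ)

print(round(rel_primary), round(rel_higher_own), round(rel_higher_vs_primary))  # 68 39 11
```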
Accounting for education endogeneity through IPW, reported in the second panel of the table, slightly affects the estimated coefficients of education on the mortality rate. The estimated educational impact on mortality for the two highest education groups is smaller with the IPW correction, which reduces the educational gradient in mortality. Accounting for educational endogeneity also changes the estimated impact of hospitalisation.
On the one hand, for the higher education groups the IPW correction (slightly) increases the estimated impact of being in hospital on the mortality rate and therefore decreases the educational gradient in the hospitalisation effect. On the other hand, the IPW correction reduces, for all education groups, the impact of hospital experience on the mortality rate.

Implied Educational gain
In a single-risk proportional hazard model (e.g. mortality only), the educational coefficients have a clear interpretation: they give the proportionality factor of the hazard for a given education level (conditional on the unobserved heterogeneity in a mixed proportional hazard model). This does not hold in our timing-of-events model, in which education also enters the hospitalisation process: the hazard of first hospitalisation (1), the discharge hazard (2) and the re-admission hazard (3). The reason is that (total) survival depends not only on the mortality hazard but also on the hospitalisation hazards. Hence, the impact of a specific education level on survival depends not only on the parameter of this education level in the mortality rate, but also on its effect on all hospitalisation hazards. Admission to and discharge from hospital also show a clear educational gradient, see Table C.1 to Table C.3 in Appendix C.
Including interactions between the education level and the status of the hospitalisation process in the hazards further complicates interpretation. As a result, the coefficients of education and hospitalisation reported in Table 2 are rather difficult to interpret. The measures we derive in this section have a clear interpretation: they give the impact of improving education (by one level) on survival and the length of life, while accounting for the age dependence of mortality, the impact of education on mortality and the educational differences in the hospitalisation process.
We use counterfactual simulations to assess the educational gain for two measures: (1) the educational gain in the survival probability up to age 63 and (2) the educational gain in the number of months lost due to early mortality before that age. At the start of the simulation, the men can only enter hospital (for CVD) or die, and we use the (first) hospitalisation and mortality hazards to simulate these events. When a simulated man enters hospital, we continue with the discharge hazard (and adjust the mortality hazard); similarly, for a man with simulated hospital experience we simulate possible re-admission (and adjust the mortality hazard). In this fashion we obtain, for each simulation round, the simulated hospitalisation and mortality process for each of the 10,000 men in the synthetic cohort. Of course, this also includes (many) men who are simulated not to experience any hospitalisation and/or to live beyond age 63.
The simulated survival outcome is the percentage of simulated men who survive to age 63, averaged over the rounds and the synthetic cohort. The simulated months-lost outcome is the average number of months lost (months before age 63) per simulated man. Note that when a (simulated) man survives beyond age 63, his survival is censored at age 63 and no months are lost.
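The simulation scheme can be sketched as a small continuous-time Monte Carlo. This is an illustration under strong simplifying assumptions: the monthly hazards in `HAZARDS` are made-up constants rather than the paper's estimates, and the age dependence, education-specific parameters and correlated unobserved heterogeneity of the fitted model are omitted. With constant hazards, drawing an exponential waiting time from the total exit rate and then choosing the event in proportion to its hazard simulates the competing transitions exactly.

```python
import random

# Hypothetical constant monthly hazards (NOT estimates from the paper).
# State 0: never hospitalised, 1: in hospital, 2: discharged (hospital experience)
HAZARDS = {
    0: {"admit": 0.0005, "die": 0.0002},
    1: {"discharge": 0.5, "die": 0.01},
    2: {"readmit": 0.003, "die": 0.0006},
}
NEXT_STATE = {"admit": 1, "discharge": 2, "readmit": 1}
HORIZON = (63 - 18) * 12  # months from age 18 (conscription) to age 63


def simulate_man(rng):
    """Return the months lost before age 63 for one simulated man (0 if he survives)."""
    state, month = 0, 0.0
    while True:
        rates = HAZARDS[state]
        month += rng.expovariate(sum(rates.values()))  # waiting time to next event
        if month >= HORIZON:
            return 0.0  # censored at age 63: no months lost
        # Which event occurred: probability proportional to its hazard
        event = rng.choices(list(rates), weights=list(rates.values()))[0]
        if event == "die":
            return HORIZON - month
        state = NEXT_STATE[event]  # the mortality hazard switches with the state


def simulate_cohort(n=10_000, rounds=5, seed=1):
    """Average survival probability to 63 and average months lost over all rounds."""
    rng = random.Random(seed)
    surv, lost = 0.0, 0.0
    for _ in range(rounds):
        outcomes = [simulate_man(rng) for _ in range(n)]
        surv += sum(o == 0.0 for o in outcomes) / n
        lost += sum(outcomes) / n
    return surv / rounds, lost / rounds


print(simulate_cohort(n=2000, rounds=2))
```

Counterfactual versions are obtained by swapping in the hazard parameters of a different education level, separately for the mortality and the hospitalisation transitions.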
For the counterfactual simulations we impose a given education level on the hospitalisation hazards and the mortality hazards. Let Y(e_1, e_2) be the simulated average outcome (survival probability or months lost) for men with the mortality rate of education level e_1 and the hospitalisation hazards of education level e_2. The educational gain from improving education from e to e + 1 is then

$$G(e) = Y(e+1, e+1) - Y(e, e) = \bigl[Y(e+1, e) - Y(e, e)\bigr] + \bigl[Y(e+1, e+1) - Y(e+1, e)\bigr], \qquad (6)$$

where the first term in brackets is the direct effect, the educational gain not running through CVD hospitalisation, and the second term in brackets is the indirect effect, the educational gain running through changes in CVD hospitalisation. Table 3 reports the estimated educational gains by education level. Improving education by one level would lead to a 2 to 5.5 percentage-point increase in the survival probability from age 18 to age 63 and to 2 to 9 months longer expected life (up to age 63). The educational gains (for both measures) are clearly largest for men with the lowest education level, primary education. Except for an educational improvement for men with post-secondary education, the direct effect of educational improvement, i.e. from factors other than CVD hospitalisation that are affected by education, is statistically significant and the main source of the educational gain. Only for men who would improve their education from some to full secondary education or from post-secondary to university is the indirect effect of CVD hospitalisation on the educational gain in the survival probability statistically significant. In fact, for the latter men the educational gain in the survival probability is mostly due to differences in the hospitalisation process between the two education levels.
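The decomposition in (6) can be written as a short function once the simulated outcomes Y(e_1, e_2) are available. This sketch assumes the direct effect is evaluated with the hospitalisation hazards held at the lower level e, which is one of the two possible orderings of a two-step decomposition; `Y` can be any callable returning a simulated average outcome, and the toy values below are made up.

```python
def educational_gain(Y, e):
    """Decompose the gain from raising education from e to e + 1.

    Y(e1, e2): simulated average outcome with the mortality hazard of level e1
    and the hospitalisation hazards of level e2.
    Returns (total, direct, indirect).
    """
    direct = Y(e + 1, e) - Y(e, e)            # mortality changes, hospitalisation fixed at e
    indirect = Y(e + 1, e + 1) - Y(e + 1, e)  # then the hospitalisation hazards change
    return direct + indirect, direct, indirect


# Toy survival probabilities (hypothetical, for illustration only)
toy = {(1, 1): 0.80, (2, 1): 0.84, (2, 2): 0.85}
total, direct, indirect = educational_gain(lambda e1, e2: toy[(e1, e2)], 1)
print(round(total, 3), round(direct, 3), round(indirect, 3))
```

Because the two bracketed terms telescope, the direct and indirect effects always add up to the total gain Y(e + 1, e + 1) − Y(e, e).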
For the educational gain in months lost, the differences in the educational parameters of the hospitalisation hazards do not contribute significantly to the decrease in months lost when a man would have had a higher education level. The lack of a significant educational gain in months alive through the CVD hospitalisation process is probably due to the early censoring age of 63. We expect that we would have found significant educational gains had we been able to follow the men to a higher age.

Robustness
In this section, we present a couple of robustness checks. First, we investigate whether removing the possibility of reverse causation changes the estimated impact of education (and hospitalisation) on mortality. Reverse causation might occur because education influences both the psychological fitness and the intelligence measured at the military examination. Several studies have shown that additional education improves intelligence (Falch and Sandgren Massih, 2011, Banks and Mazzonna, 2012, Schneeweis et al., 2014, Carlsson et al., 2015, Dahmann, 2017). In that case, intelligence is a mediator in the causal path from education to health (Bijwaard and Jones, 2019). Ideally, we would have multiple measurements of (the development of) intelligence over the life cycle, to account for both the selection and the mediation role of intelligence in the causal path from education to mortality. In our data, however, we observe intelligence only in late adolescence (at the military examination), when measured IQ can be either the result of attained education or a proxy for early childhood intelligence that influences the education choice. Similar reasoning holds for psychological fitness.
To account for this possible 'reverse causation', we also estimate models without the psychological assessment or IQ measurement in the propensity score. The results in the first panel of Table 4 reveal that when these measurements are left out of the propensity score, the estimated coefficients of education and of CVD hospitalisation in the mortality hazard change significantly. Both the direct educational gradient (except for some secondary education) and the "hospital experience" effect and its implied educational gradient become (significantly) larger. However, leaving out these measurements would violate the unconfoundedness assumption, as they influence both the education choice and the hospitalisation and mortality processes. The estimated coefficients of IQ and psychological assessment, given in Table C.5 and Table C.6 in Appendix C, make clear that these measurements are very important for both educational attainment and the hospitalisation and mortality hazards. We therefore prefer the model with the IQ and psychological assessment measurements included in the propensity score.
Second, throughout we have assumed that the propensity scores are estimated consistently.
Misspecification of the propensity score will generally produce biases. Rotnitzky and Robins (1995) point out that if either the regression adjustment or the propensity score is correctly specified the resulting estimator will be consistent. Thus, to improve the robustness of the proposed methodology we estimate a doubly robust estimator of the model, which also includes a regression adjustment (using the same control variables as included in the propensity score).
The results in the second panel of Table 4 indicate that these doubly robust estimates are very similar to the original estimates, reported in the second panel of Table 2.
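The doubly robust idea can be illustrated with the textbook augmented IPW (AIPW) estimator of a mean difference for a binary treatment. This is a swapped-in, simpler setting: the paper's doubly robust estimator instead adds the covariates as regressors in the weighted hazard models, so the sketch only shows why combining weights with a regression adjustment protects against misspecifying one of the two. The function name and inputs are hypothetical.

```python
def aipw_mean_difference(y, d, p, m1, m0):
    """Augmented IPW (doubly robust) estimate of E[Y(1)] - E[Y(0)].

    y      : observed outcomes
    d      : treatment indicators (1 = higher of two education levels)
    p      : estimated propensity scores P(D = 1 | X)
    m1, m0 : outcome-regression predictions E[Y | D = 1, X] and E[Y | D = 0, X]
    Consistent if either the propensity model or the outcome regression is correct.
    """
    n = len(y)
    # Regression prediction plus an inverse-probability-weighted residual correction
    mu1 = sum(a + di * (yi - a) / pi for yi, di, pi, a in zip(y, d, p, m1)) / n
    mu0 = sum(b + (1 - di) * (yi - b) / (1 - pi) for yi, di, pi, b in zip(y, d, p, m0)) / n
    return mu1 - mu0


y, d = [3.0, 1.0, 3.0, 1.0], [1, 0, 1, 0]
# Correct outcome regression, arbitrary propensity scores:
print(aipw_mean_difference(y, d, [0.3, 0.6, 0.8, 0.2], [3.0] * 4, [1.0] * 4))
# Wrong (zero) outcome regression, correct propensity scores of 0.5:
print(aipw_mean_difference(y, d, [0.5] * 4, [0.0] * 4, [0.0] * 4))
```

Both calls recover a difference of 2, once through the regression and once through the weights, which is the "either one correct" property referred to above.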
Third, the association between education and hospitalisation and mortality could also stem from another type of 'reverse causality', in which childhood ill-health constrains educational attainment (Rosenzweig, 2004, Case et al., 2005). To avoid such reverse causation, we left out all health measurements from the propensity score. To test whether the estimated effects depend on this choice, we estimate a doubly robust IPW-ToE model that includes the health measurements at age 18 available in the data. These additional health measurements are height (and height squared), BMI (= weight in kg/(height in m)²) and BMI squared, systolic and diastolic blood pressure (both also squared) and grip strength. The results in the third panel of Table 4 indicate that, when these health measurements are included both in the propensity score and as control variables in the hazards, the estimated education and hospitalisation effects on the mortality rate are very similar to the original estimates.
Notes to Table 4: … (4) Post-secondary education; (5) University or PhD. b: Without IQ or psychological assessment as control variables in the propensity scores. c: Including all the covariates both in the propensity scores and in all the hazards. d: Doubly robust with additional health measurements included: height, height squared, BMI, BMI squared, systolic and diastolic blood pressure (and squared) and grip strength. Significance of the difference compared to the basic results in the second panel of …
In Bijwaard et al. (2017), who also use data on Swedish recruits (more cohorts, but no hospitalisation data), it is argued that the men in the lowest and highest educational groups differ too much in their observed background characteristics, which causes severe overlap problems in the propensity score. To address this overlap problem, they estimate separate propensity scores of attaining a higher educational level through pairwise comparisons of adjacent educational levels; they did not consider the sequential probit model we assume for educational attainment. As a fourth robustness check, we investigate whether using only adjacent education levels leads to different results. To this end we estimate an IPW-ToE model for each pair of adjacent education levels separately: primary vs some secondary education; some secondary vs full secondary education; full secondary vs post-secondary education; and post-secondary vs university education. The estimated educational effects from these models (Table B.4 in Appendix B) are not directly comparable to the original results, because the reference group changes for each pair of education levels, and they are therefore deferred to the appendix. We focus instead on the implied educational gains and compare them with the original educational gains reported in Table 3. These educational gains, reported in Table 5, do not differ significantly from the original ones.
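For the pairwise robustness check, each adjacent comparison reduces to a binary treatment with its own propensity score. A minimal sketch of the corresponding weights, assuming the usual binary IPW form (1/p for the higher level, 1/(1 − p) for the lower level); the function and variable names are hypothetical.

```python
ADJACENT_PAIRS = [(1, 2), (2, 3), (3, 4), (4, 5)]


def pairwise_ipw(levels, p_higher, pair):
    """Binary IPW weights for one adjacent-level comparison.

    levels   : education level per individual (1..5)
    p_higher : estimated P(higher level of the pair | X, in the pair), per individual
    pair     : (lower, higher) adjacent levels
    Returns (indices, weights) for the individuals in this pair only.
    """
    _, higher = pair
    idx = [i for i, e in enumerate(levels) if e in pair]  # restrict to the two levels
    w = [1.0 / p_higher[i] if levels[i] == higher else 1.0 / (1.0 - p_higher[i])
         for i in idx]
    return idx, w


levels = [1, 2, 3, 2]
p_higher = [0.2, 0.5, 0.9, 0.25]  # hypothetical scores from a pair-specific probit
for pair in ADJACENT_PAIRS[:2]:
    print(pair, pairwise_ipw(levels, p_higher, pair))
```

Since each pair uses its own subsample and reference group, the resulting coefficients are compared through the implied educational gains rather than directly.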

Sensitivity
The critical assumption in propensity score weighting is that of no selection on unobservables. To test the sensitivity of the estimates to this unconfoundedness assumption we build on the sensitivity analyses of Nannicini (2007), Ichino et al. (2008) and, in particular, Bijwaard and Jones (2019) and extend these analyses to the Timing-of-events model with IPW.
The two main assumptions for these sensitivity analyses are that the possible unobserved confounding factors can be summarised in a binary variable, U, and that the unconfoundedness assumption holds conditional on X and the additional variable U, i.e. θ_k(t|e) ⊥ E | X, U (for e = 1, . . . , 5 and k ∈ {h, r, d, m}). […] and ω_R(e), respectively, all obtained from estimating an MPH model on the relevant duration (time until first hospitalisation, time to discharge or time to re-admittance after discharge).

We call these coefficients the 'first hospitalisation effect', the 'discharge effect' and the 're-admittance effect'. A measure of the effect of U on the relative probability of having chosen education level e is ξ(e), the coefficient of U in a sequential probit model of education choice (E = e) with U and X as covariates. In line with Bijwaard and Jones (2019) we call this coefficient the 'selection effect'.
The probability values of the distribution of U are chosen to mimic the distribution of each included binary variable. Consider, for example, the probability that an individual with full secondary education, e = 3, has the highest IQ score, 9. Then p_{300} is this probability for those individuals who died and experienced CVD hospitalisation before the end of the observation period, p_{310} for those who survived to the end with hospital experience, p_{301} for those who died before the end without hospital experience, and p_{311} for those who survived to the end without hospital experience. For each probability configuration p_{emh} of U, we repeat the simulation of U and the estimation of the mortality, hospitalisation and selection effects M = 100 times and average over these 100 simulations. The total variance of these averages can be estimated from (see Ichino et al., 2008)

$$\widehat{\mathrm{Var}}(\bar f) = \frac{1}{M}\sum_{m=1}^{M} s_m^2 + \frac{M+1}{M(M-1)}\sum_{m=1}^{M}\bigl(\hat f_m - \bar f\bigr)^2, \qquad (9)$$

where f ∈ {ω, ξ} for each pairwise education comparison, f̂_m is the estimated f in simulation sample m, f̄ is the average over the M simulations and s_m² is the estimated variance of f̂_m. For each probability configuration p_{emh} and each simulated U, we also re-estimate the IPW-ToE model with U included in the propensity score to obtain the education and hospitalisation impacts on the mortality hazard. Again, for each probability configuration the average impact and its variance are calculated using (9).
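A minimal sketch of two mechanical steps in this sensitivity analysis: drawing the binary confounder U cell by cell from the probabilities p_emh, and combining the M simulated estimates into a mean and total variance as in (9), i.e. the average within-simulation variance plus (1 + 1/M) times the between-simulation variance. The re-estimation of the hazards and probits for each draw is omitted, and the function names and the cell coding (e, m, h) are illustrative, following the text's convention (m = 0 died, 1 survived; h = 0 with, 1 without hospital experience).

```python
import random
import statistics


def draw_u(cells, p_emh, rng):
    """Draw a binary confounder U for each individual.

    cells : (e, m, h) cell of each individual
    p_emh : dict mapping each (e, m, h) cell to P(U = 1 | cell)
    """
    return [1 if rng.random() < p_emh[c] else 0 for c in cells]


def combine_simulations(estimates, variances):
    """Combine M simulated estimates f_m with variances s_m^2 as in (9).

    Returns (mean, total variance) = average within-simulation variance
    plus (1 + 1/M) times the between-simulation variance.
    """
    M = len(estimates)
    f_bar = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)  # sample variance, M - 1 denominator
    return f_bar, within + (1.0 + 1.0 / M) * between


rng = random.Random(42)
u = draw_u([(3, 0, 0), (3, 1, 1)], {(3, 0, 0): 1.0, (3, 1, 1): 0.0}, rng)
print(u, combine_simulations([1.0, 3.0], [0.5, 0.5]))  # [1, 0] (2.0, 3.5)
```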
From the estimated parameters of the doubly robust IPW-ToE model (Table C.5 and Table C.6 in Appendix C), we see that IQ is the most important control variable, influencing the education choice, the hospitalisation process and mortality. We therefore focus on the results of the sensitivity analysis in which U mimics the observed distribution of the IQ measurements, i.e. the education choice and censoring probabilities of U are set equal to the observed education choice and censoring prevalence among individuals with a given IQ level.
We indeed find that the simulated U's that mimic the distribution of the IQ values lead to large and significant mortality, first hospitalisation and selection effects, see Table B.5 in Appendix B.⁵ Table 6 reports the simulated impact of education and hospitalisation on the mortality hazard when these simulated U's are included in the propensity score. We find the largest changes in our IPW-ToE estimates when U mimics the education-hospitalisation-mortality distribution of those with the lowest IQ, global IQ = 1. These differences are, however, not statistically significant.⁶ This seems to indicate that the applied propensity score adequately accounts for the endogeneity of education.
5 The estimated mortality, hospitalisation and selection effects for the simulated U's that mimic the distribution of the other control variables are given in Table B.6 and Table B.7 in Appendix B.
6 The estimated education and hospitalisation impacts in the IPW-ToE model using simulated U's that mimic the education-hospitalisation-mortality distribution of the other control variables are given in Table B.8 in Appendix B. Again, for these U's we do not find any statistically significant difference from the original estimated education and hospitalisation impacts in Table 2.

Cause of death
In the previous section we have shown that hospitalisation increases and education decreases mortality. Bijwaard et al. (2017) have shown that, even after accounting for selective education choice, education is negatively associated with most major causes of death. However, the hospitalisation process was ignored there. Here, we investigate the impact of education and hospitalisation on cause-specific mortality, and how the impact of hospitalisation differs by education, by extending the timing-of-events model with competing causes of death. For those who died, the data also contain the cause of death. We aggregated the causes of death into five categories, the first three of which reflect death due to CVD: (1) ischemic heart disease; (2) stroke; (3) other cardiovascular causes; (4) external causes (suicide, traffic accidents and homicide); and (5) other causes of death. Table 7 reports the percentage of individuals who died from a particular cause before the end of the observation window. To take the timing of the deaths into account, we also calculated the cumulative incidence functions, i.e. the probability of dying from a specific cause before a given age, with or without hospitalisation. The (non-parametric) Aalen-Johansen cumulative incidence functions (Aalen and Johansen, 1978) again show a clear educational gradient in the probability of dying from each of the five causes of death. Comparing the cumulative incidence curves with and without hospitalisation, we notice two things. First, the shape of the cumulative incidence curves for external causes (including suicide, traffic accidents and homicide) without hospitalisation differs substantially from that of the curves for other causes of death without hospitalisation. Second, only the probability of dying from cardiovascular diseases increases after hospitalisation for CVD.
Some caution is needed in interpreting these figures, because the probability that somebody has been in hospital for CVD also increases with age and depends on observed and unobserved individual factors, which the cumulative incidence functions do not account for. We account for such dynamic selection in our timing-of-events model.
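The Aalen-Johansen cumulative incidence function for cause k accumulates, at each event time, the all-cause Kaplan-Meier survival just before that time multiplied by the fraction of those at risk who die from cause k. A minimal pure-Python sketch without covariates or weights (the paper's curves are additionally split by hospitalisation status and education, which this sketch does not do):

```python
def aalen_johansen_cif(times, causes, n_causes):
    """Nonparametric cumulative incidence functions under competing risks.

    times  : event or censoring time (e.g. age) per individual
    causes : 0 if censored, k in 1..n_causes for death from cause k
    Returns a list of (t, [CIF_1(t), ..., CIF_K(t)]) at each distinct death time.
    """
    data = sorted(zip(times, causes))
    at_risk = len(data)
    surv = 1.0  # all-cause Kaplan-Meier survival just before t
    cif = [0.0] * n_causes
    out, i = [], 0
    while i < len(data):
        t = data[i][0]
        deaths, n_t = [0] * n_causes, 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            if data[i][1] > 0:
                deaths[data[i][1] - 1] += 1
            n_t += 1
            i += 1
        d_total = sum(deaths)
        if d_total > 0:
            for k in range(n_causes):
                cif[k] += surv * deaths[k] / at_risk
            surv *= 1.0 - d_total / at_risk
            out.append((t, list(cif)))
        at_risk -= n_t  # deaths and censorings at t leave the risk set
    return out


# Four men: cause 1 at age 50, cause 2 at 55, cause 1 at 60, censored at 63
print(aalen_johansen_cif([50, 55, 60, 63], [1, 2, 1, 0], n_causes=2))
```

At every time point the cumulative incidences plus the all-cause survival sum to one, which is a useful check on an implementation.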
We use an extension of the timing-of-events model of Section 3 to cause-specific mortality.
Instead of one mortality hazard we have five mortality hazards, one for each cause of death.
For each of these hazards we assume an MPH form as in (4). To account for possible endogeneity of the hospitalisation process, the unobserved heterogeneity of each cause-specific hazard may be correlated with that of the hospitalisation hazards in (1) to (3) and with that of the other cause-specific hazards. As in the analysis of total mortality, we account for possible endogeneity of education by inverse propensity weighting (in fact the weights are identical, as they are based on the same sequential probit estimation of improving education, see Table C.6 in Appendix C).
The results in Table 8 indicate that all causes of death show a clear educational gradient (except for stroke and other natural causes of death between post-secondary and university/PhD), with the largest educational differences for external causes of death. Not surprisingly, the results also indicate that death due to all CVD causes is elevated by CVD hospitalisation; death due to stroke and other CVDs is affected most by hospitalisation for CVD. It seems odd that mortality from external causes (up to 12 times higher) and other natural causes (up to 61 times higher) is also higher while an individual is in hospital for CVD. One reason might be that a man discharged in the morning is still recorded in the data as being in hospital on that day, yet may die later that day from another cause (including external causes, such as a traffic accident or a fatal fall). Unfortunately, we cannot account for this, as we do not observe the time of discharge on the day of discharge. Another reason for this anomaly might be that we observe only the main cause of death; co-morbidity could lead to death from other natural causes.
The educational gradient in the hospitalisation effect (a decreasing impact of hospitalisation with increasing education) is not clearly present for all causes of death, especially when comparing the two highest education levels. Also surprising is that, for men with the highest education level, CVD hospitalisation experience decreases mortality due to external causes.
This suggests that these men have fewer traffic accidents and/or suicides after CVD hospitalisation.

Implied Educational gain
The coefficients of a competing-risks timing-of-events model are even more difficult to interpret than those of the "standard" timing-of-events model, because the impact of a specific education level on death from a specific cause depends not only on the parameter of this education level in the cause-specific mortality rate, but also on its effects on all the other cause-of-death hazards (and still also on all hospitalisation hazards).
We therefore derive the implied educational gains, similar to the simulation analysis in Section 4.1, based on the estimated timing-of-events model for cause-specific mortality with IPW. In a competing-risks setting for cause-specific mortality we can derive the implied educational gain in months lost for each specific cause of death. If a simulated man dies from a specific cause, say ischemic heart disease, the simulated months lost due to this cause for this man is the number of months between his death and age 63; he then has no months lost due to the other causes of death. Table 9 reports the estimated months lost for each cause of death, decomposed into a direct effect, the educational gain not running through CVD hospitalisation, and an indirect effect, the educational gain running through CVD hospitalisation. We do not find any statistically significant educational gradient in the CVD causes of death. Bijwaard et al. (2017) found small, but statistically significant, educational gains for death due to cardiovascular diseases.
Thus, including the CVD hospitalisation process reduces these gains even further. However, as also noted by Bijwaard et al. (2017), these men are still rather young (at most 63) to be strongly affected by CVD.
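The attribution rule for cause-specific months lost can be sketched in a few lines. The sketch assumes, as is standard for competing risks with cause-specific hazards, that given a simulated death at some age the cause is drawn with probability proportional to the cause-specific hazards at that age; the function names and inputs are hypothetical, and the rest of the simulation (hospitalisation states, education levels) is omitted.

```python
import random


def draw_cause(cause_hazards, rng):
    """Given a death, draw cause k with probability lambda_k / sum of all lambdas."""
    return rng.choices(range(1, len(cause_hazards) + 1), weights=cause_hazards)[0]


def months_lost_by_cause(deaths, n_causes, horizon_age=63.0):
    """Average months lost before horizon_age, attributed to the cause of death.

    deaths : list of (age_at_death or None, cause) per simulated man;
             survivors have age None. A man dying from cause k at age a adds
             12 * (horizon_age - a) months to cause k only.
    """
    totals = [0.0] * n_causes
    for age, cause in deaths:
        if age is not None and age < horizon_age:
            totals[cause - 1] += 12.0 * (horizon_age - age)
    return [t / len(deaths) for t in totals]


sims = [(60.0, 1), (None, 0), (53.0, 4), (62.5, 1)]
print(months_lost_by_cause(sims, n_causes=5))  # [10.5, 0.0, 0.0, 30.0, 0.0]
```

Because each man's months lost go to a single cause, the cause-specific averages add up to the overall average months lost.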
The largest educational gains (also in line with Bijwaard et al. (2017)) are found for external causes of death. Men with primary education would gain 5.7 months alive through lower mortality from external causes if their education were improved. Men with some or full secondary education would gain about one month alive through lower mortality from external causes if their education were improved. Men with primary education are the only ones who would gain, about 2.5 months, from educational improvement through reduced mortality from other natural causes. For all these significant educational gains we find only a direct effect of education, i.e. from factors other than CVD hospitalisation that are affected by education.
The educational gains in terms of the cumulative incidence functions, i.e. the probability of dying from a particular cause as it evolves over age, are depicted in Appendix D (Figure …).

Conclusion and discussion
Higher educated individuals are less frequently admitted to hospital for cardiovascular diseases (CVD) and live longer than their lower educated peers. A relevant question is, therefore, whether the educational gradient in mortality can be explained by the educational difference in CVD hospitalisation. A common approach to obtain the impact of both education and CVD hospitalisation on mortality is to estimate a (mixed) proportional hazard model for the mortality hazard. However, treating the educational level and hospitalisation as ordinary (exogenous) variables may lead to biased inference on their effects on the mortality hazard. Any observed relationship between admission to (or discharge from) hospital and mortality may be caused by unobserved factors that influence both hospitalisation and mortality. Educational attainment is also very likely to depend on the same observed factors. Such confounding renders education and hospitalisation endogenous in the mortality analysis. We obtain the impact of education, and of hospitalisation by education, on mortality by accounting for both the selection into the hospitalisation process (both admission and discharge) and the selection into education.
In particular, we estimate the effects of the hospitalisation process on the mortality rate using the "timing-of-events" method (Abbring and van den Berg, 2003). We control for correlated effects that arise from correlation between unobservables in the hospitalisation and mortality processes. To account for the endogeneity of educational attainment, we apply inverse probability weighting (IPW) using the propensity score, based on an estimated sequential probit model for educational attainment. Based on the estimated model we calculate the educational gain from improving education, both in the survival probability up to age 63 and in the number of months lost before age 63. We also decompose these educational gains into an indirect effect, running through changes in the hospitalisation process, and a direct effect due to other factors.
We base our results on Swedish Military Conscription Data (1951-1960), linked to administrative Swedish registers including information on CVD hospitalisation and death. We have information on about half a million men who are followed from the date of conscription until the end of 2012, or until death. For those men who die we observe the cause of death. Educational level is classified into five categories: primary education; some secondary education (2 years); full secondary education (3 years); post-secondary education; and higher education.
The empirical analyses show a clear educational gradient in mortality, even after accounting for the endogeneity of education through inverse propensity weighting. We also find that mortality is much higher for men who are in hospital for CVD and for those who have previously been in hospital.
These hospitalisation effects also exhibit an educational gradient: the impact of hospitalisation decreases as the education level increases. From the implied educational gains we conclude that men with only primary education would gain the most if they had a higher education level. However, only a small part of this educational gain runs through a change in CVD hospitalisation. The only exception is men with post-secondary education, for whom the expected gain in months lost from attaining higher education is mostly attributable to changes in CVD hospitalisation.
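The months-lost measure can be made concrete with a small calculation. Assuming months lost between conscription (age 18) and age 63 are defined as $12\int_{18}^{63}\{1 - S(t)\}\,dt$, where $S(t)$ is the probability of surviving from 18 to age $t$, the sketch below evaluates them for a Gompertz mortality hazard. The hazard form and parameter values are hypothetical, and the sketch ignores the hospitalisation process, so it shows only a total educational gain, not the split into direct and indirect effects.

```python
import math


def gompertz_survival(age, a, b, t0=18.0):
    """S(age | alive at t0) for a Gompertz hazard a * exp(b * (age - t0))."""
    s = age - t0
    cum_hazard = a * s if b == 0.0 else (a / b) * (math.exp(b * s) - 1.0)
    return math.exp(-cum_hazard)


def months_lost(a, b, t0=18.0, t1=63.0, steps_per_year=120):
    """12 * integral from t0 to t1 of (1 - S(t)) dt, by the trapezoidal rule."""
    n = int(round((t1 - t0) * steps_per_year))
    h = (t1 - t0) / n
    total = 0.0
    for i in range(n + 1):
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * (1.0 - gompertz_survival(t0 + i * h, a, b, t0))
    return 12.0 * h * total


# Hypothetical hazard parameters (a, b) for a lower and a higher education level.
low, high = (0.0006, 0.08), (0.0004, 0.08)
gain = months_lost(*low) - months_lost(*high)
print(f"P(survive to 63): low {gompertz_survival(63.0, *low):.3f}, "
      f"high {gompertz_survival(63.0, *high):.3f}")
print(f"months lost before 63: low {months_lost(*low):.1f}, "
      f"high {months_lost(*high):.1f}, gain {gain:.1f}")
```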
We present several robustness checks. First, we investigate whether excluding the psychological assessment and IQ measurement from the propensity score, which removes the possibility of reverse causation between these measurements and education, changes the estimated impact of education (and hospitalisation) on mortality. Second, the associations of education with hospitalisation and mortality could also stem from childhood ill-health, which constrains educational attainment. To test whether the estimated effects depend on our choice to exclude the health measurements from the propensity score (to avoid such reverse causation), we also estimate a model that includes the health measurements at age 18 available in the data.
We further address the robustness of our results by comparing them with the results from a doubly robust model, in which the observed characteristics are included both in the propensity score and as control variables in the hazards. For comparison with Bijwaard et al. (2017), we also investigate whether using only adjacent education levels leads to significantly different educational gains. Only the model that excludes IQ and psychological assessments from the propensity score yields (slightly) different estimates. However, leaving these variables out of the propensity score would violate the unconfoundedness assumption, as these measurements influence both the education choice and the hospitalisation and mortality processes. We therefore prefer the model with the IQ and psychological assessment measurements included in the propensity score.
The empirical results for the cause-specific mortality analysis reveal a clear educational gradient for all five causes of death, with the largest educational differences for external causes. They also reveal that, not surprisingly, death due to CVD (IHD, stroke and other CVD) is affected the most by hospitalisation for CVD. The educational gradient in the hospitalisation effect (a decreasing impact of hospitalisation with increasing education) does not hold for all causes of death, especially when comparing the two highest education levels.
The implied educational gains in months lost by cause of death are only significant for external causes and other natural causes. They do not indicate that hospitalisation (the indirect effect) plays an important role in explaining the educational gain. We do not find any statistically significant educational gradient in the months lost due to the CVD causes of death.
During the observation period, Sweden had an advanced public health care system providing services independently of individual income. Educational gains that arise through better access to health services via higher income therefore seem to be less important in the context of the Swedish health care system, with its broad coverage and access, than in a health care system such as that of the United States, in which many individuals are not covered by health insurance. However, education may still matter through a better understanding of health information and through changes in health behaviour towards a healthier life-style. This study provides better insight into the role of CVD hospitalisation, which is strongly related to life-style, in explaining the educational difference in mortality. Of course, this is only one of many possible channels that explain this difference.
A limitation of our data, which are based on the military entrance examination, is that we only observe men; no information on women is available. Another limitation is that, although military conscription was mandatory in Sweden, men with severe mental disabilities or severe chronic diseases were exempted from the military examination. Thus, our results only apply to men who had no severe chronic diseases at age 18, and they likely underestimate the impact of CVD hospitalisation on mortality. Finally, we only observe mortality before the age of 63. In the future, when these men have been followed for a longer period, the educational differences in mortality and the relevance of CVD hospitalisation in explaining them may change, as CVD mortality becomes more important at older ages.
The main issue is whether we can give our results a causal interpretation. In the literature, three different approaches have been employed to examine the causal effects of education on mortality. The first approach exploits changes in compulsory schooling policies as instrumental variables for educational attainment to control for endogeneity. However, a major limitation of using changes in compulsory schooling to detect educational effects on mortality is that often only a relatively small part of the population is affected by the laws (Mazumder, 2008, Fletcher, 2015). Another issue with the instrumental variable methods applied in these studies is that they implicitly assume that the compulsory schooling reforms affect long-term health only through their effect on education, ignoring any other contemporaneous policy changes that may accompany these reforms. A second identification strategy is to use variation in education among siblings, often identical (monozygotic) twins, to difference out the unobserved factors shared by these siblings. These studies (Behrman et al., 2011, Lundborg, 2013, Naess et al., 2012, Amin et al., 2015) estimate the impact of within-pair differences in schooling between identical twins on their differences in health at various schooling levels. Although twin studies can control for both shared environmental and shared genetic factors, a major shortcoming is that twins are usually not representative of the whole population. Using twins also substantially reduces statistical power, because only twin pairs with different education levels contribute to the estimates. Moreover, twins rarely have exactly the same cognitive ability, and they experience many nonshared events throughout life that may be unobserved and influence both education and mortality (e.g. accidents).
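The within-twin logic can be sketched with a first-difference estimator: differencing outcomes within a pair removes any family effect shared by both twins, and pairs with equal schooling drop out of the estimate, which is the loss of statistical power noted above. The data are simulated and hypothetical, and the estimator is a textbook illustration rather than the specification of any of the cited studies.

```python
def within_pair_slope(pairs):
    """First-difference (twin fixed-effects) slope through the origin.

    pairs: list of ((s1, y1), (s2, y2)) with schooling s and outcome y.
    Returns (beta, number of pairs that carry identifying variation).
    """
    num = den = 0.0
    informative = 0
    for (s1, y1), (s2, y2) in pairs:
        ds, dy = s1 - s2, y1 - y2  # the shared family effect cancels here
        num += ds * dy
        den += ds * ds
        if ds != 0:                # pairs with equal schooling add nothing
            informative += 1
    return num / den, informative


def pooled_slope(pairs):
    """Ordinary least-squares slope that ignores the pair structure."""
    obs = [twin for pair in pairs for twin in pair]
    ms = sum(s for s, _ in obs) / len(obs)
    my = sum(y for _, y in obs) / len(obs)
    cov = sum((s - ms) * (y - my) for s, y in obs)
    var = sum((s - ms) ** 2 for s, _ in obs)
    return cov / var


# Hypothetical data: outcome = 2 * schooling + family effect, where the family
# effect is positively correlated with the family's average schooling.
families = [(10, 12, 5), (14, 14, 8), (12, 11, 3), (16, 13, 9)]
pairs = [((s1, 2 * s1 + a), (s2, 2 * s2 + a)) for s1, s2, a in families]
print(within_pair_slope(pairs))  # (2.0, 3)
print(pooled_slope(pairs))       # biased upwards by the family effect
```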
A third approach to account for confounding factors is to include them directly in the model (Bijwaard et al., 2015a,b, 2019). A disadvantage of these models is that they impose a rather stringent structure on the relation between education, mortality and the influence of confounding factors. Another limitation is that estimating these structural models can be very computationally intensive for large data sets. The IPW method that we (and Bijwaard et al., 2017) employ also accounts for possible confounding factors, but without making any structural assumptions about the relation between the confounding factors and hospitalisation or mortality. However, the unconfoundedness assumption, that no unmeasured confounders influence both the education choice and the hospitalisation-mortality process, is strong and nonrefutable.

Appendix A Likelihood function
We have data on male recruits $i = 1, \dots, n$ in our observation window. Let $K_{id}$ and $K_{ir}$ denote the number of discharges from, and re-admittances to, hospital for CVD of individual $i$.
Note that for some individuals $K_{id} = 0$ and $K_{ir} = 0$, i.e. individuals who either never entered hospital or who died in hospital. An important feature of duration data is that for some individuals we only know that they survived up to a certain time (often the end of the observation window). Such an individual is (right-)censored, and his likelihood contribution includes only the survival function, not the hazard. The indicators $d_{ik}$, $r_{ik}$ and $m_i$ equal one if, respectively, the $k$th CVD hospital discharge spell, the $k$th re-admittance spell, or the mortality spell is uncensored.
The likelihood contribution of individual $i$ is (suppressing dependence on observed characteristics $x$ and education $e$), in light of the preceding discussion:
This likelihood naturally separates the admittance, discharge, re-admittance and mortality spells, and allows for censoring in each spell. $I_h(t_{ik})$ indicates that the individual is in hospital just before $t_{ik}$, and similarly for $I_o(t_{il})$. When $K_{id} = 0$ or $K_{ir} = 0$, the relevant term equals 1. Note that the last, and only the last, hospitalisation spell is censored, either because the individual is still alive at the end of the observation period or because he has died.
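To make the role of the censoring indicators concrete, the sketch below evaluates one individual's likelihood contribution under simplifying assumptions that are ours, not the paper's: every spell has a constant hazard, the unobserved heterogeneity takes a finite number of values (mass points) that multiply the hazards, and the mortality process is split into segments by hospital status, with only the final segment able to end in death. An uncensored spell contributes $h(t)S(t)$, a censored spell only $S(t)$, and the product over spells is averaged over the mass points. Spell labels such as `"admit"` and all numbers are hypothetical.

```python
import math


def spell_loglik(rate, duration, uncensored):
    """log[h^d * S(t)] for a constant-hazard spell: d*log(rate) - rate*t."""
    return (math.log(rate) if uncensored else 0.0) - rate * duration


def individual_loglik(spells, base_rates, mass_points, probs):
    """log sum_j p_j prod_spells h^d S: the contribution integrated over a
    discrete distribution of unobserved heterogeneity (hypothetical setup).

    spells:      list of (kind, duration, uncensored)
    base_rates:  kind -> hazard from the observed part of the model
    mass_points: list of dicts kind -> multiplicative unobserved term
    probs:       probability of each mass point
    """
    terms = []
    for v, p in zip(mass_points, probs):
        ll = sum(spell_loglik(base_rates[k] * v[k], t, d) for k, t, d in spells)
        terms.append(math.log(p) + ll)
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(x - top) for x in terms))


# Hypothetical yearly hazards and one man's history (durations in years):
# admitted after 20 years, discharged after 0.02 years, not re-admitted within
# the remaining 15 years, alive at the end (all mortality segments censored).
base_rates = {"admit": 0.01, "discharge": 40.0, "readmit": 0.1,
              "death_out": 0.002, "death_in": 0.05, "death_after": 0.01}
spells = [("admit", 20.0, True), ("discharge", 0.02, True),
          ("readmit", 15.0, False), ("death_out", 20.0, False),
          ("death_in", 0.02, False), ("death_after", 15.0, False)]
mass_points = [{k: 1.0 for k in base_rates},
               {"admit": 2.5, "discharge": 0.7, "readmit": 2.5,
                "death_out": 2.0, "death_in": 2.0, "death_after": 2.0}]
probs = [0.7, 0.3]
print(individual_loglik(spells, base_rates, mass_points, probs))
```

Letting the same mass point shift the hospitalisation and mortality hazards together is what captures correlated unobservables; with a single mass point the contribution reduces to a sum of independent spell terms.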
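Another feature of duration data is that individuals are often only observed if they have survived up to a certain age. In our case, mortality follow-up is only available from the conscription date, around age 18, onwards. The individuals are therefore left-truncated, and we need to condition on survival up to the age of first observation, $t_0 = 18$. With left-truncated data, the distribution of unobserved heterogeneity among the survivors (up to the left-truncation time) differs from that in the full population. When only individuals who have survived until age $t_0$ are observed, the likelihood contribution is

$$\mathcal{L}_i = \int \mathcal{L}_i(v)\, dG(v \mid T_i > t_0),$$

with the distribution of the unobserved heterogeneity conditional on survival up to $t_0$

$$dG(v \mid T_i > t_0) = \frac{S_m(t_0 \mid v)\, dG(v)}{\int S_m(t_0 \mid u)\, dG(u)},$$

where $\mathcal{L}_i(v)$ is the likelihood contribution above for given unobserved heterogeneity $v$, $S_m(t_0 \mid v)$ is the probability of surviving up to $t_0$ given $v$, and $G(v)$ is the joint distribution of the unobserved heterogeneity terms implied by the discussion of $v_k$.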
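Left truncation can be illustrated with a two-point heterogeneity distribution and a constant mortality hazard from birth, both hypothetical assumptions. Conditioning on survival to $t_0 = 18$ reweights each mass point by its probability of surviving to $t_0$, so frail types are under-represented among the conscripts; dividing the mixture likelihood by the mixture survival at $t_0$ is equivalent to using these reweighted probabilities.

```python
import math


def survivor_mass_probs(probs, surv_t0):
    """P(v_j | T > t0) = p_j S(t0|v_j) / sum_k p_k S(t0|v_k)."""
    joint = [p * s for p, s in zip(probs, surv_t0)]
    total = sum(joint)
    return [j / total for j in joint]


def truncated_lik(t, died, rate, v, probs, t0=18.0):
    """Mixture likelihood of death (or censoring) at age t given survival to
    t0, for a hypothetical constant mortality hazard rate * v_j from birth."""
    num = sum(p * (rate * vj if died else 1.0) * math.exp(-rate * vj * t)
              for p, vj in zip(probs, v))
    den = sum(p * math.exp(-rate * vj * t0) for p, vj in zip(probs, v))
    return num / den


rate, v, probs = 0.002, [0.5, 3.0], [0.5, 0.5]  # hypothetical values
surv_18 = [math.exp(-rate * vj * 18.0) for vj in v]
print(survivor_mass_probs(probs, surv_18))  # frail type is under-represented
print(truncated_lik(50.0, True, rate, v, probs))
```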

Appendix B Additional tables
For the IPW method to be valid, the propensity score must balance the distribution of all included variables across the education groups. A suitable way to check for remaining differences is to calculate the standardised bias, or normalised difference in means:

$$100 \cdot \frac{\bar{x}_e - \bar{x}_p}{\sqrt{\operatorname{Var}_p(x)}} \tag{B.1}$$

where $e = 1, \dots, 5$ denotes the education group and $p$ the whole sample population. Table B.3 shows this percentage bias measure before and after adjusting the data in our sample. Before accounting for selective education choice, it reveals substantial imbalances between those who attained adjacent education levels. The biases in the columns labelled 'after' show that these imbalances disappear when we use the inverse propensity weights.

Note: comparisons between (1) and (2); (2) and (3); (3) and (4); (4) and (5). + p < 0.05, ** p < 0.01.

Note: based on adding a discrete confounder U to the propensity score, with the probabilities of U taken from the observed probabilities for each covariate. No effect would give $\omega_R(e) = 0$ and $\xi(e) = 0$. + p < 0.05 and ** p < 0.01. (2) Some secondary education; (3) full secondary education (3 years); (4) post-secondary education; (5) higher education. b Running from low to high.
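The balance check can be sketched as follows. The function computes the standardised bias in (B.1) for one covariate and one education group, optionally applying inverse propensity weights to the group mean. We assume that the whole-sample mean and variance are unweighted; the toy data and propensity scores are hypothetical and constructed so that exact weights remove the imbalance completely.

```python
import math


def standardised_bias(x, groups, e, weights=None):
    """100 * (mean of x in group e - whole-sample mean) / whole-sample sd.

    Only the group-e mean is reweighted when `weights` (e.g. inverse
    propensity weights) are supplied; the whole-sample moments are unweighted.
    """
    if weights is None:
        weights = [1.0] * len(x)
    n = len(x)
    mean_p = sum(x) / n
    var_p = sum((xi - mean_p) ** 2 for xi in x) / (n - 1)
    num = sum(w * xi for xi, g, w in zip(x, groups, weights) if g == e)
    den = sum(w for g, w in zip(groups, weights) if g == e)
    return 100.0 * (num / den - mean_p) / math.sqrt(var_p)


# Toy data: binary covariate; P(group 1 | x=1) = 0.8, P(group 1 | x=0) = 0.2.
x = [1] * 10 + [0] * 10
groups = [1] * 8 + [2] * 2 + [1] * 2 + [2] * 8
pscore = {(1, 1): 0.8, (1, 2): 0.2, (0, 1): 0.2, (0, 2): 0.8}  # P(g | x)
ipw = [1.0 / pscore[(xi, g)] for xi, g in zip(x, groups)]
print(standardised_bias(x, groups, 1))       # before weighting
print(standardised_bias(x, groups, 1, ipw))  # after weighting
```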
Note: based on adding a discrete confounder U to the propensity score, with the probabilities of U taken from the observed probabilities for each covariate. Significance of the difference with the original estimates in Table 2: + p < 0.05 and ** p < 0.01. (4) Post-secondary education; (5) higher education. b Running from low to high.
Appendix C Full tables with parameter estimates
Note: (1) some secondary education over primary education; (2) full secondary education over some secondary; (3) post-secondary over full secondary education; (4) higher over post-secondary education. + p < 0.05, ** p < 0.01.