Educational attainment and cardiovascular disease in the United States: A quasi-experimental instrumental variables analysis

Background There is ongoing debate about whether education or socioeconomic status (SES) should be inputs into cardiovascular disease (CVD) prediction algorithms and clinical risk adjustment models. It is also unclear whether intervening on education will affect CVD, in part because there is controversy regarding whether education is a determinant of CVD or merely correlated due to confounding or reverse causation. We took advantage of a natural experiment to estimate the population-level effects of educational attainment on CVD and related risk factors.
Methods and findings We took advantage of variation in United States state-level compulsory schooling laws (CSLs), a natural experiment that was associated with geographic and temporal differences in the minimum number of years that children were required to attend school. We linked census data on educational attainment (N = approximately 5.4 million) during childhood with outcomes in adulthood, using cohort data from the 1992–2012 waves of the Health and Retirement Study (HRS; N = 30,853) and serial cross-sectional data from 1971–2012 waves of the National Health and Nutrition Examination Survey (NHANES; N = 44,732). We examined self-reported CVD outcomes and related risk factors, as well as relevant serum biomarkers. Using instrumental variables (IV) analysis, we found that increased educational attainment was associated with reduced smoking (HRS β −0.036, 95%CI: −0.06, −0.02, p < 0.01; NHANES β −0.032, 95%CI: −0.05, −0.02, p < 0.01), depression (HRS β −0.049, 95%CI: −0.07, −0.03, p < 0.01), triglycerides (NHANES β −0.039, 95%CI: −0.06, −0.01, p < 0.01), and heart disease (HRS β −0.025, 95%CI: −0.04, −0.002, p = 0.01), and improvements in high-density lipoprotein (HDL) cholesterol (HRS β 1.50, 95%CI: 0.34, 2.49, p < 0.01; NHANES β 0.86, 95%CI: 0.32, 1.48, p < 0.01), but increased BMI (HRS β 0.20, 95%CI: 0.002, 0.40, p = 0.05; NHANES β 0.13, 95%CI: 0.01, 0.32, p = 0.05) and total cholesterol (HRS β 2.73, 95%CI: 0.09, 4.97, p = 0.03). While most findings were cross-validated across both data sets, they were not robust to the inclusion of state fixed effects. Limitations included residual confounding, use of self-reported outcomes for some analyses, and possibly limited generalizability to more recent cohorts.
Conclusions This study provides rigorous population-level estimates of the association of educational attainment with CVD. These findings may guide future implementation of interventions to address the social determinants of CVD and strengthen the argument for including educational attainment in prediction algorithms and primary prevention guidelines for CVD.


OLS Analysis
OLS models are represented by the following equation:
$$h_{ist} = \beta_0 + \beta_1 E_{ist} + \beta_2 X_{ist} + \beta_3 Z_{st} + \gamma_t + \varepsilon_{ist}$$
Here, $h_{ist}$ is a given health outcome of interest for individual $i$ born in state $s$ in year $t$, $E_{ist}$ is the individual's self-reported educational attainment, $X_{ist}$ is a vector of individual-level covariates, $Z_{st}$ is a vector of state-level time-varying covariates, $\gamma_t$ represents year-of-birth fixed effects, and $\varepsilon_{ist}$ is the error term, with robust standard errors clustered by state of birth. We present the results of linear models for both continuous and binary outcomes to allow for comparability in the reporting of effect estimates as beta coefficients, although logistic models for binary outcomes were similar in magnitude and statistical significance (data available upon request).
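As an illustration of the estimator described above, the following is a minimal sketch of OLS with year-of-birth dummies and standard errors clustered by state of birth. It is not the authors' code: the data are synthetic, the variable names (`educ`, `female`, `state`, `birth_year`) and the helper `ols_cluster` are hypothetical, and the sandwich estimator omits the small-sample corrections that statistical packages typically apply.

```python
import numpy as np

def ols_cluster(y, X, groups):
    """OLS coefficients with cluster-robust (sandwich) standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: sum over clusters of (X_g' u_g)(X_g' u_g)'
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        score = X[groups == g].T @ resid[groups == g]
        meat += np.outer(score, score)
    cov = XtX_inv @ meat @ XtX_inv  # no small-sample correction in this sketch
    return beta, np.sqrt(np.diag(cov))

# Synthetic data; every value below is hypothetical.
rng = np.random.default_rng(0)
n = 2000
state = rng.integers(0, 40, n)        # state of birth (cluster id)
birth_year = rng.integers(0, 5, n)    # five birth cohorts
educ = 12 + rng.normal(0, 2, n)       # years of education
female = rng.integers(0, 2, n).astype(float)
health = 1.0 - 0.05 * educ + 0.1 * female + 0.02 * birth_year + rng.normal(0, 1, n)

# Design matrix: intercept, education, one individual covariate,
# and year-of-birth dummies (first cohort is the reference category).
year_dummies = (birth_year[:, None] == np.arange(1, 5)).astype(float)
X = np.column_stack([np.ones(n), educ, female, year_dummies])

beta, se = ols_cluster(health, X, state)
print(f"education coefficient: {beta[1]:.3f} (cluster-robust SE {se[1]:.3f})")
```

Clustering by state of birth allows the errors of individuals born in the same state to be correlated, which is the relevant concern when the exposure of interest varies at the state level.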

IV Analysis
IV analysis rests on the assumption that no pathway links the instrument to the outcome except through the predictor (the exclusion restriction). In this case, there are unlikely to be plausible pathways linking CSLs to health other than through duration of educational attainment. For example, one prior study suggests that school quality is unlikely to be affected by changes in CSLs [2].
Another assumption is that there are no confounders of the relationship between the instrument and the outcome; for example, other state-level factors may confound the relationship between CSLs and health [3]. We address this by including additional variables representing state-level characteristics, a technique used in prior work [4][5][6][7], and by fitting secondary models that include state fixed effects. For years without data, the most recently reported value of the state CSL variable was carried forward. We assume that individuals remained in their state of birth until age 18; prior studies have shown that cross-state migration was low during this period and uncorrelated with the implementation of CSLs, so any resulting measurement error would bias our results toward the null [8,9].
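The IV models employed in this study can be represented by the following two equations:
$$E_{ist} = \alpha_0 + \alpha_1 CSL^{(1)}_{st} + \alpha_2 CSL^{(2)}_{st} + \alpha_3 X_{ist} + \alpha_4 Z_{st} + \delta_t + \upsilon_{ist} \quad (1)$$
$$h_{ist} = \beta_0 + \beta_1 \hat{E}_{ist} + \beta_2 X_{ist} + \beta_3 Z_{st} + \gamma_t + \varepsilon_{ist} \quad (2)$$
Equation (1) represents the first stage of the IV analysis, in which education ($E_{ist}$), the endogenous variable, is regressed on the two instruments representing years of compulsory schooling ($CSL^{(1)}_{st}$ and $CSL^{(2)}_{st}$), while adjusting for individual-level ($X_{ist}$) and state-level ($Z_{st}$) characteristics as well as fixed effects for year of birth $t$.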
Using the coefficients from the first stage, a predicted level of education is produced for every individual in the sample. In this study, the first stage was conducted in the Census sample, and predicted education was linked to individuals in HRS and NHANES by gender, race, year of birth, and state of birth. This predicted education was then used in Equation (2) with health as the dependent variable, which represents the second stage of the IV analysis. The coefficient of interest is $\beta_1$, which represents the causal effect of an additional year of educational attainment on the health outcome of interest. Robust standard errors were clustered by state of birth. This type of analysis is known as two-sample IV (TSIV), an extension of the more commonly used two-stage least squares (2SLS) IV analysis.
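The two-sample procedure can be sketched as follows. This is a simplified illustration on synthetic data, not the study's code: it uses a single hypothetical instrument, no covariates or fixed effects, and a generic `cell` identifier standing in for the gender, race, year-of-birth, and state-of-birth linkage. The confounder `ability` and all numeric values are assumptions chosen only to show why the naive regression is biased while the two-sample estimate is not.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells = 200
# Hypothetical instrument: required years of schooling (6 to 11) in each linkage cell.
z_cell = rng.integers(6, 12, n_cells).astype(float)

def draw_sample(n, true_effect=-0.05):
    """Simulate individuals whose education and health share an unobserved confounder."""
    cell = rng.integers(0, n_cells, n)
    ability = rng.normal(0, 1, n)  # unobserved confounder
    educ = 9 + 0.3 * z_cell[cell] + ability + rng.normal(0, 1, n)
    health = 2 + true_effect * educ + 0.5 * ability + rng.normal(0, 1, n)
    return cell, educ, health

def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Sample 1 (large, plays the role of the Census): estimate the first stage.
cell1, educ1, _ = draw_sample(200_000)
pi = ols(educ1, np.column_stack([np.ones(len(cell1)), z_cell[cell1]]))
educ_hat_cell = pi[0] + pi[1] * z_cell  # predicted education for each cell

# Sample 2 (smaller, plays the role of HRS or NHANES): link predictions by cell
# and regress health on predicted education.
cell2, educ2, health2 = draw_sample(50_000)
X2 = np.column_stack([np.ones(len(cell2)), educ_hat_cell[cell2]])
beta_tsiv = ols(health2, X2)

# Naive OLS on observed education in sample 2, for comparison.
beta_naive = ols(health2, np.column_stack([np.ones(len(cell2)), educ2]))

print(f"true effect: -0.050, TSIV: {beta_tsiv[1]:.3f}, naive OLS: {beta_naive[1]:.3f}")
```

Because predicted education varies only through the instrument, the second-stage slope is not contaminated by the confounder, whereas the naive slope is pulled away from the true effect.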
Using a two-sample approach allowed for more precise estimation of the first stage because the Census sample is much larger, thereby alleviating concerns of weak instrument bias resulting from instruments that explain only a small fraction of the variation in the endogenous variable. In this case, for example, the F-statistic for the first stage using HRS data was 11.2, [...] estimates. The estimates reported here represent the mean of these 10,000 estimates of $\beta_1$, and 95% confidence intervals are the estimates at the 2.5th and 97.5th percentiles. HRS and NHANES survey weights were not employed because we used bootstrapping to calculate standard errors, and because the utility of weighting is diminished when the goal of inference is determining causal effects rather than population estimates [10].
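The percentile approach to summarizing bootstrap estimates can be sketched as below. The resampling scheme here (drawing individuals with replacement from the outcome sample) is an assumption, since the text does not preserve the details of the procedure; the function names and data are hypothetical, and the sketch uses 2,000 replicates rather than 10,000 to keep it fast.

```python
import numpy as np

def slope(y, x):
    """OLS slope of y on x in a model with an intercept."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def bootstrap_percentile(y, x, n_boot=2000, seed=0):
    """Mean of bootstrap slopes and the 2.5th/97.5th percentile interval."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)  # resample individuals with replacement
        draws[b] = slope(y[idx], x[idx])
    lo, hi = np.percentile(draws, [2.5, 97.5])
    return draws.mean(), lo, hi

# Synthetic second-stage data; all values are hypothetical.
rng = np.random.default_rng(2)
educ_hat = rng.normal(12, 1, 3000)  # predicted education
health = 1.0 - 0.05 * educ_hat + rng.normal(0, 1, 3000)

est, lo, hi = bootstrap_percentile(health, educ_hat)
print(f"estimate {est:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

With clustered data, resampling whole clusters (for example, states of birth) rather than individuals would be the more conservative choice; the sketch uses individual resampling only for brevity.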
In Census data, an additional year of compulsory schooling using the first instrument (the difference between enrollment and dropout ages) led to an increase in educational attainment of 0.27 years (95% CI: 0.26, 0.28). For the second instrument (the difference between enrollment and minimum work ages), an additional year of schooling led to an increase in educational attainment of 0.21 years (95% CI: 0.20, 0.22). This satisfied the IV assumption that the two instruments were each associated with the endogenous variable of interest. It also supports the use of the Census sample for the first stage, given its larger size and consequently greater precision in estimating the first stage.
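Instrument relevance is commonly summarized by the first-stage F-statistic for the excluded instruments, with values above roughly 10 often taken as a rule of thumb for adequate strength. The sketch below computes the classical (homoskedastic) joint F-statistic on synthetic data to show how it grows with sample size. The function `first_stage_f`, the simulated coefficients, and the sample sizes are hypothetical, and a cluster-robust version would differ from this simple form.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from the least-squares fit of y on X."""
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return float(resid @ resid)

def first_stage_f(educ, instruments, controls):
    """Classical F-statistic for the joint significance of the instruments."""
    n, q = instruments.shape
    X_u = np.column_stack([controls, instruments])  # unrestricted model
    rss_r, rss_u = rss(educ, controls), rss(educ, X_u)
    return ((rss_r - rss_u) / q) / (rss_u / (n - X_u.shape[1]))

rng = np.random.default_rng(3)

def simulate(n):
    """Synthetic first-stage data with two hypothetical CSL instruments."""
    z = rng.integers(6, 12, size=(n, 2)).astype(float)
    educ = 8 + 0.3 * z[:, 0] + 0.2 * z[:, 1] + rng.normal(0, 2, n)
    controls = np.ones((n, 1))  # intercept only in this sketch
    return educ, z, controls

f_small = first_stage_f(*simulate(500))
f_large = first_stage_f(*simulate(50_000))
print(f"F with n=500: {f_small:.1f}; F with n=50,000: {f_large:.1f}")
```

Holding the strength of the instrument-education relationship fixed, the F-statistic scales roughly with sample size, which is the motivation for estimating the first stage in the largest available sample.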