Biased Exposure–Health Effect Estimates from Selection in Cohort Studies: Are Environmental Studies at Particular Risk?

Background: The process of creating a cohort or cohort substudy may induce misleading exposure–health effect associations through collider stratification bias (i.e., selection bias) or bias due to conditioning on an intermediate. Studies of environmental risk factors may be at particular risk.

Objectives: We aimed to demonstrate how such biases of the exposure–health effect association arise and how one may mitigate them.

Methods: We used directed acyclic graphs and the example of bone lead and mortality (all-cause, cardiovascular, and ischemic heart disease) among 835 white men in the Normative Aging Study (NAS) to illustrate potential bias related to recruitment into the NAS and the bone lead substudy. We then applied methods (adjustment, restriction, and inverse probability of attrition weighting) to mitigate these biases in analyses using Cox proportional hazards models to estimate adjusted hazard ratios (HRs) and 95% confidence intervals (CIs).

Results: Analyses adjusted for age at bone lead measurement, smoking, and education among all men found HRs (95% CIs) for the highest versus lowest tertile of patella lead of 1.34 (0.90, 2.00), 1.46 (0.86, 2.48), and 2.01 (0.86, 4.68) for all-cause, cardiovascular, and ischemic heart disease mortality, respectively. After applying methods to mitigate the biases, the HRs (95% CIs) among the 637 men analyzed were 1.86 (1.12, 3.09), 2.47 (1.23, 4.96), and 5.20 (1.61, 16.8), respectively.

Conclusions: Careful attention to the underlying structure of the observed data is critical to identifying potential biases and methods to mitigate them. Understanding factors that influence initial study participation and study loss to follow-up is critical. Recruitment of population-based samples and enrolling participants at a younger age, before the potential onset of exposure-related health effects, can help reduce these potential pitfalls.

Citation: Weisskopf MG, Sparrow D, Hu H, Power MC. 2015. Biased exposure–health effect estimates from selection in cohort studies: are environmental studies at particular risk? Environ Health Perspect 123:1113–1122; http://dx.doi.org/10.1289/ehp.1408888


Marc G. Weisskopf, David Sparrow, Howard Hu, and Melinda C. Power

Table S1. Variables used in forward selection logistic regression model to calculate inverse probability of attrition weights (IPW).

Table S2. Variables included in the final inverse probability of attrition weighting model.

Table S3. Characteristics at the time of bone lead measurement among all those with bone lead measurement (N=835).

Table S4. Fully adjusted hazard ratios (95% confidence intervals) for all-cause, cardiovascular disease, and ischemic heart disease mortality, by tertile of patella lead at baseline among white men in the Normative Aging Study 45 years old or younger at NAS study entry (N=637), applying inverse probability of attrition weights (IPW) truncated at the 1st and 99th percentiles of the distribution of the weights.

Table S5. Adjusted hazard ratios (HR; 95% CI) for all-cause, cardiovascular disease, and ischemic heart disease mortality, by tertile of blood lead at baseline among either all white men in the Normative Aging Study (N=1,206), or those 45 years old or younger at NAS study entry (N=909).

Table S6. Adjusted hazard ratios (HR; 95% CI) for all-cause, cardiovascular disease, and ischemic heart disease mortality, by tertile of tibia lead at baseline among either all white men in the Normative Aging Study (N=834), or those 45 years old or younger at NAS study entry (N=636).

Figure S1. Nonlinear association between patella bone lead concentration and the log of HR (logHR) for all-cause, cardiovascular, and ischemic heart disease mortality, adjusted for age at KXRF, age at KXRF squared, smoking (never/former/current and pack-years), and education among all white men (n=835) (Model 1: Base Model). The reference logHR=0 is at the mean of patella lead concentration. The estimates are indicated by the solid line and the 95% CIs by the dashed lines. The P values for significance of the nonlinear component for all-cause, cardiovascular, and ischemic heart disease mortality were 0.39, 0.54, and 0.64, respectively. Patella lead concentrations of all individual participants are indicated by short vertical lines on the x-axis.

Figure S2. Nonlinear association between patella bone lead concentration and the log of HR (logHR) for all-cause, cardiovascular, and ischemic heart disease mortality, adjusted for age at KXRF, age at KXRF squared, smoking (never/former/current and pack-years), and education among white men 45 years old or younger at NAS entry (n=637), with inverse probability weighting to weight the analyses to reflect the full group still alive at the time of KXRF (Model 4). The reference logHR=0 is at the mean of patella lead concentration. The estimates are indicated by the solid line and the 95% CIs by the dashed lines. The P values for significance of the nonlinear component for all-cause, cardiovascular, and ischemic heart disease mortality were 0.48, 0.91, and 0.28, respectively. Patella lead concentrations of all individual participants are indicated by short vertical lines on the x-axis.

Details of inverse probability weighting
We used a forward selection process to inform our final inverse probability of attrition weights (IPW) for the bone lead analyses. The variables considered in this process are shown in Table S1. Our final model necessarily considered a subset of these, which are shown in Table S2. The c-statistic from this model was 0.81. The unstabilized weights did not have very extreme values (see footnote, Table S2), and results for the fully adjusted models restricted to those 45 or younger at NAS entry using weights truncated at the 1st and 99th percentiles (i.e., assigning the 1st or 99th percentile weight to anyone with a more extreme weight) were similar, though slightly weaker, as expected (Table S4). IPW for the blood lead analyses was done similarly to that for bone lead. The c-statistic for the blood lead weighting models was 0.66.
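As a rough illustration of the weighting steps described in this section, the Python sketch below computes unstabilized inverse probability weights, truncates them at the 1st and 99th percentiles, and computes a c-statistic. It is not the code used for the analyses. It assumes the predicted probabilities of remaining under observation have already been obtained from a fitted logistic model, and all function names and input values are hypothetical.

```python
import math


def attrition_weights(p_observed):
    """Unstabilized weights: 1 / predicted probability of remaining observed."""
    if any(not 0.0 < p <= 1.0 for p in p_observed):
        raise ValueError("probabilities must lie in (0, 1]")
    return [1.0 / p for p in p_observed]


def percentile(values, q):
    """Percentile (q in [0, 100]) using linear interpolation between ranks."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def truncate_weights(weights, lower_q=1.0, upper_q=99.0):
    """Assign the percentile cutoff to any weight more extreme than it."""
    lo = percentile(weights, lower_q)
    hi = percentile(weights, upper_q)
    return [min(max(w, lo), hi) for w in weights]


def c_statistic(p_observed, observed):
    """Share of (observed, not observed) pairs ranked correctly; ties count 1/2."""
    pos = [p for p, y in zip(p_observed, observed) if y == 1]
    neg = [p for p, y in zip(p_observed, observed) if y == 0]
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0
                for a in pos for b in neg)
    return score / (len(pos) * len(neg))


# Hypothetical predicted probabilities and observation indicators.
probs = [0.9, 0.8, 0.5, 0.25, 0.6, 0.4]
seen = [1, 1, 1, 1, 0, 0]

# Weights are applied only to participants who remained under observation.
weights = attrition_weights([p for p, y in zip(probs, seen) if y == 1])
truncated = truncate_weights(weights)
print(weights, truncated, c_statistic(probs, seen))
```

In a weighted outcome model, each participant who remained under observation would contribute with their (optionally truncated) weight. Truncation trades a small amount of bias for reduced variance when a few weights are large.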
