Associations Between Selected Xenobiotics and Antinuclear Antibodies in the National Health and Nutrition Examination Survey, 1999–2004

Background: Potential associations between background environmental chemical exposures and autoimmunity are understudied. Objectives: Our exploratory study investigated exposure to individual environmental chemicals and selected mixtures in relation to the presence of antinuclear antibodies (ANA), a widely used biomarker of autoimmunity, in a representative sample of the U.S. population. Methods: This cross-sectional analysis used data on 4,340 participants from the National Health and Nutrition Examination Survey (1999–2004), of whom 14% were ANA positive, to explore associations between ANA and concentrations of dioxins, dibenzofurans, polychlorinated biphenyls, organochlorines, organophosphates, phenols, metals, and other environmental exposures and metabolites measured in participants’ serum, whole blood, or urine. For dioxin-like compounds with toxic equivalency factors, we developed and applied a new statistical approach to study selected mixtures. Lognormal models and censored-data methods produced estimates of chemical associations with ANA in males, nulliparous females, and parous females; these estimates were adjusted for confounders and accommodated concentrations below detectable levels. Results: Several associations between chemical concentration and ANA positivity were observed, but only the association in males exposed to triclosan remained statistically significant after correcting for multiple comparisons (mean concentration ratio = 2.8; 95% CI: 1.8, 4.5; p < 0.00001). Conclusions: These data suggest that background levels of most xenobiotic exposures typical in the U.S. population are not strongly associated with ANA. Future studies should ideally reduce exposure misclassification by including prospective measurement of the chemicals of concern and should track changes in ANA and other autoantibodies over time. Citation: Dinse GE, Jusko TA, Whitt IZ, Co CA, Parks CG, Satoh M, Chan EKL, Rose KM, Walker NJ, Birnbaum LS, Zeldin DC, Weinberg CR, Miller FW. 2016. Associations between selected xenobiotics and antinuclear antibodies in the National Health and Nutrition Examination Survey, 1999–2004. Environ Health Perspect 124:426–436; http://dx.doi.org/10.1289/ehp.1409345


Statistical Model
Let C denote concentration. We assume C is lognormally distributed, which is equivalent to the natural logarithm of C being normally distributed. Assigning parameters, if ln(C) is normally distributed with mean µ and standard deviation σ, then C is lognormally distributed with scale and shape parameters θ = exp(µ) and σ, respectively. The median of C is θ, its mean is θ × exp(σ 2 /2), its variance is θ 2 × exp(σ 2 ) × [exp(σ 2 ) -1], and its probability density function is: [1] We use regression modeling to adjust for confounders. Let Z be a vector of covariates, such as ANA status and the independent variables listed in Table S1. We make the conventional assumption that σ is constant and µ is a linear function of the covariates, µ = βz, in which case the median of C is θ(z) = exp(βz), where β is a vector of regression coefficients and z is a vector of observed covariate values. As the lognormal distribution belongs to the family of accelerated failure time models, each covariate has a multiplicative effect on concentration. That is, if C 0 is the baseline concentration (at Z = 0) and C is some general concentration (at Z = z), then C has the same distribution as θ(z) × C 0 .
As an illustration, suppose Z is a single binary (0,1) covariate. If β = ln(2), for example, then θ(0) = exp(β × 0) = 1 and θ(1) = exp(β × 1) = 2. Thus, under the assumed lognormal distribution, the median (and mean) concentration for those with Z = 1 is twice as large as for those with Z = 0. Similarly, a 1-unit increase in a quantitative covariate with β = ln(2) also corresponds to a doubling in median (and mean) concentration. The same interpretation holds when Z is a vector of covariates if we focus on the effect of a single component of Z for fixed values of the other covariates.
We use the LIFEREG procedure in SAS (version 9.3, SAS Institute) to obtain maximum likelihood estimates (MLEs) of σ and β for each chemical.

Accounting for censoring
For a given chemical, a detectable concentration produces an uncensored observation, The full likelihood is proportional to a product of terms of the form f(c) and F(LOD) over all persons, and the MLEs of σ and β are the values that maximize the full likelihood.
Now suppose we want to analyze a mixture of dioxin-like chemicals, each of which has a TEF that relates its potency to that of the reference chemical (2,3,7,8-TCDD). We consider the dioxin-like chemicals listed in Table 1, with TEFs that decrease from 1.0 (most potent) to 0.00003 (least potent). The TEF is used as an adjustment factor to transform the concentration of a dioxin-like chemical to the same potency scale as the reference chemical. For example, PCB126 is considered one-tenth as potent as 2,3,7,8-TCDD, and their TEFs are 0.1 and 1.0, respectively; thus, we treat concentration C of PCB126 as toxicologically equivalent to concentration 0.1 × C of 2,3,7,8-TCDD, by multiplying its measured concentration by 0.1 prior to combining with other chemicals in the mixture.
Once the concentrations of dioxin-like chemicals have been expressed in equal potency units, they can be summed to create a TEQ concentration for the mixture. Consider a person exposed to a mixture of M chemicals. For the i-th chemical (i=1,…,M), let C i denote its concentration, LOD i its limit of detection, and TEF i its toxic equivalency factor. If each concentration can be measured, that person's TEQ for the mixture is the sum of M products of the form TEF i × C i . However, if at least one concentration is below its LOD, we must account for censoring. Define an interval [L i , R i ] that contains the i-th chemical's concentration, where L i is the left endpoint and R i is the right endpoint. If C i is uncensored and equals c i , then L i = R i = c i , whereas if C i is only known to be below LOD i , then L i = 0 and R i = LOD i (i=1,…,M). Each person's TEQ is viewed as interval censored, where the left and right endpoints of the censoring interval are L and R, which equal the sums of TEF i × L i and TEF i × R i , respectively, over the M chemicals. Intuitively, the smallest the sum could be is L and the largest the sum could be is R.
For example, consider a binary mixture (M = 2). If the first chemical is detectable with measured concentration c 1 (uncensored) and the second chemical is undetectable with limit of detection LOD 2 (left censored), then the TEQ for the binary mixture is known to be between a lower bound TEF 1 *c 1 + TEF 2 *0 and an upper bound TEF 1 *c 1 + TEF 2 *LOD 2 (interval censored).
As with the individual chemicals, we use the LIFEREG procedure in SAS to fit a lognormal model to interval-censored TEQ data for the mixture being investigated. Note that because the sum of lognormal variables is not itself lognormally distributed, the concentrations of individual chemicals and their mixtures cannot both be truly lognormally distributed. In practice, however, statistical models are simply approximations to reality. Hence, while the lognormal assumption (or virtually any parametric assumption) cannot be strictly true for both a TEQ concentration and its component concentrations, we apply the same analysis in both cases to be consistent and as a reasonable method of adjusting for confounders and heavy censoring when assessing the association between a mixture concentration and ANA positivity.
For a given mixture, rather than a concentration being left censored by the LOD, suppose a person had no information for a component chemical. In this case, the TEQ censoring interval for the mixture cannot be calculated and one might elect to simply exclude that person from the analysis. To accommodate missing data more efficiently, we instead treat measurement C i as censored in the interval [0, MAX i ] for anyone with no information on the i-th chemical, where MAX i is the largest observed concentration for the i-th chemical (i=1,…,M) across all persons.

Accounting for sampling
In order to account for the complex NHANES sampling design, we use jackknife methods to obtain appropriate variance estimates for the MLEs (SAS 2011), whether the outcome is an individual chemical concentration or a mixture TEQ concentration. The jackknife approach creates K replicate data sets, where K equals the number of primary sampling units (PSUs). Each replicate data set excludes a different PSU; an MLE is calculated from the remaining data; and the variance of the full-data MLE is estimated by a weighted sum of squared deviations between the replicate-specific MLEs and the full-data MLE.
The 1999-2004 NHANES data involves K = 87 clusters (PSUs) and 43 strata. One stratum contains three clusters and each of the other 42 strata contains two clusters. Without loss of generality, suppose the first stratum is the one with three clusters and let the j-th replicate data set be the one that excludes the j-th cluster (j=1,…,K). Define a set of jackknife coefficients {h j }, where h 1 = h 2 = h 3 = 2/3 and h j = ½ for j=4,…,87. For the j-th replicate data set, all observations receive a replicate weight of 1.0, except for those in the stratum with the excluded PSU, where observations in the j-th cluster receive a replicate weight of 1/h j (i.e., either 3/2 or 2). Let γ denote the parameter of interest, which in our lognormal analysis could be either σ or an element of the vector β. The jackknife estimate of variance for , the full-data MLE of γ, is: where ! is the MLE of γ from the j-th replicate data set based on using the replicate weights. We calculate the left and right endpoints of a 95% confidence interval (CI) for γ using the formula: where 1.96 is the 97.5-th percentile of the standard normal distribution. The endpoints of a CI for a component of β, say β 1 , can be exponentiated to produce a CI for exp(β 1 ), which is the mean concentration ratio (MCR) when β 1 is the regression coefficient for ANA. c BMI was classified respectively as underweight, normal, overweight, or obese using standard cut points of <18.5, 18.5 to <25, 25 to <30, or ≥30 in adults (age 20+) and using 2000 CDC growth chart percentiles of <5, 5 to <85, 85 to <95, or ≥95 (adjusted for sex, age, weight, height, and head circumference) in adolescents (age <20) (Kuczmarski et al. 2002).