Using three statistical methods to analyze the association between exposure to 9 compounds and obesity in children and adolescents: National Health and Nutrition Examination Survey 2005-2010

Abstract


Introduction
The continuous increase in obesity has become an important worldwide health problem in the past 30 years [1].In 2016, about 18% of children and adolescents aged 519 were overweight or obese [2].Obesity in children increases the risk of health conditions, such as coronary heart disease, diabetes mellitus, hypertension, and heart failure, and those obese children or adolescents can become obese adults [3][4][5].
Therefore, it is vital to identify potential risk factors contributing to obesity to reduce the prevalence and mortality rates in obesity-related diseases.Although genetic predisposition, physical activity, and diet play an essential role in the occurrence of obesity, there is still a need for further explanation.More evidence indicates that environmental endocrine-disrupting chemicals might increase the occurrence of obesity [6][7][8][9].Twum et al. demonstrated an underlying relation between exposure to 2,5-dichlorophenol (2,5-DCP) and obesity in children [4].A significant association was found between bisphenol A (BPA) and general and abdominal obesity [10].
Deierlein showed that phthalates-specifically low-molecular weight phthalates (monoethyl phthalate [MEP], a metabolite of diethyl phthalate (DEP); mono-n-butyl phthalate [MBP], a metabolite of di-n-butyl phthalate (DBP), and mono-isobutyl phthalate [MiBP], a metabolite of di-isobutyl phthalate (DiBP))-had slight associations with girls' anthropometric outcomes [11].These substances are readily present in our daily lives, since consumer products usually use parabens as preservatives, building and food packaging materials use phthalates as plasticizers, and the production of pharmaceutical and agricultural products uses 2,5-DCP as a chemical intermediate [12][13][14].We can easily contact these environmental endocrine-disrupting chemicals via gastrointestinal intake, dermal contact, and applying products that contain these chemicals [15,16].However, most of the previous research studied only a unitary exposure or a set of similar exposures [17][18][19].
We are exposed to all kinds of chemical exposures simultaneously, which can result in complicated interactions.Therefore, it is necessary to use a suitable statistical model for risk assessment of exposure and obesity [20][21][22].
We collected data on urinary chemicals or metabolites that had been reported to have an effect on obesity in the National Health and Nutrition Examination Survey (NHANES) from 2005 to 2010.We studied 9 chemical exposures including phenols (BPA, benzophenone-3 (BP-3)), parabens (methylparaben (MeP), propyl paraben (PrP)), pesticides (2,5-DCP, 2,4-DCP) and phthalate metabolites (Mono-benzyl phthalate (MBzP), MEP, MiBP).We selected three statistical methods, including generalized linear regression, weighted quantile sum (WQS) regression, and Bayesian kernel machine regression (BKMR) models, to better analyze multi-exposures' co-function on adolescent obesity.Among them, BKMR model can resolve the non-linear and complicated interactions between chemical exposures and get more accurate results comparing with the generalized linear regression [23].All of these three methods have their own advantages and disadvantages, and we expected that this comprehensive analysis would yield insightful and fruitful conclusions.

Study sample
The NHANES is a cross-sectional nationally representative program, aiming to collect information on adults' and children's health and nutritional condition in the United States, which is reviewed and approved by the National Center for Health Statistics, as one of the departments of Centers for Disease Control and Prevention (CDC).The NHANES program was conducted in the early 1960s and released the data in biennial datasets.In order to get the unbiased national health information on the non-institutionalized population of the United States, the NHANES used a considerate, multi-stage stratification probability sampling design [24].We collected publicly accessible data from 2005 and 2010.We selected participants between 6 and 19 years old, with attainable measurements of urinary phenols, parabens, pesticides, and phthalate metabolites Body mass index (BMI) and waistcircumference simultaneously (n=2629) and excluded the participants whose data on covariates, including age, gender, race, education level, family income-to-poverty ratio, caloric intake, serum cotinine, and urinary creatinine, were missing (n=257).Finally, 2372 participants were included in our study.

Measurement of chemical exposures
Urinary samples were collected and stored at -20°C.They were sent to the National Center for Environmental Health, the Organic Analytical Toxicology Branch, for analysis.BPA, BP-3, MeP, PrP, 2,4-DCP, and 2,5-DCP,were extracted by on-line solid-phase extraction (SPE).They were measured by high-performance liquid chromatography as well as tandem mass spectrometry (MS/MS).MBzP, MEP, and MiBP were measured by high-performance liquid chromatography-electrospray ionization-tandem mass spectrometry (HPLC-ESI-MS/MS).The limit of detection (LOD) for the compounds to be analyzed, including BPA, BP-3, MeP, PrP, 2,4-DCP, and 2,5-DCP, were 0.4 ng/mL, 0.4 ng/mL, 1.0 ng/mL, 0.2 ng/mL, 0.2 ng/mL, 0.2 ng/mL in the data from 2005 to 2010, respectively.And the LOD for MBzP, MEP, and MiBP were 0.3 ng/mL, 0.8 ng/mL, and 0.3 ng/mL in the data from 2005 to 2008 and 0.2 ng/mL, 0.4 ng/mL, and 0.2 ng/mL in the data from 2009 to 2010.These values below the limit of detection were divided by the square root of 2 to replace the original values.As one study recommended [25], we treated urinary creatinine as a covariate to explain the urinary dilution.Urinary creatinine was measured by a Beckman Synchron CX3 Clinical Analyzer.The NHANES provides detailed information on the measurement method in the section on laboratory methods on its website [26,27].

Anthropometric variables
Trained health technicians measured the body weight and height according to the standardized protocol.The BMI was calculated using each person's weight in kilograms to divide the square of their height in meters.However, because the standard BMI shows differences for the different ages and gender among children, measuring BMI percentiles and the BMI z-score was more appropriate.The BMI z-score was calculated in regards to the children's age, gender, and BMI.An appropriate standard was used, which reflected the number of SDs differing from the mean of the BMI with reference to the same age and gender.The methodology to calculate the BMI z-score specifically for different ages and gender was provided by the CDC [28].We defined a child to be obese when their BMI was above or equal to the 95th percentile for their age and gender in accordance with the CDC recommendations [29].

Covariates
Covariates, including age, gender, race, education level, family income-to-poverty ratio, caloric intake, serum cotinine, and urinary creatinine, were collected by interview or laboratory detection by NHANES.Race was grouped into Mexican American, Other Hispanic, Non-Hispanic White, Non-Hispanic Black, and Other Race.Education level was categorically grouped into ≤5 grade, 6−8 grade, 9−12 grade, or High School Graduate with No Diploma, High School Graduate and GED or Equivalent, or More than high school.The family income-to-poverty ratio was divided into three groups: ≤1.30, 1.31−3.50,and >3.50.The caloric intake was dichotomously divided into normal intake and excessive intake, according to the Dietary Guidelines for Americans 2010 [30].Serum cotinine indirectly reflected the exposure to environmental tobacco.Serum cotinine, age, and urinary creatinine were considered to be continuous variables.

Statistical analysis
We used the χ 2 test and the t-test to analyze categorical variables and continuous variables, respectively.And for the serum cotinine and urinary creatinine, we used wilcoxon rank-sum test.We calculated the descriptive statistics on BPA, BP-3, MeP, PrP, 2,4-DCP, 2,5-DCP, MBzP, MEP, and MiBP.Because the distributions of the chemical exposures were skewed, we log-transformed the concentrations of all chemical exposures.We used the Pearson correlation to calculate the correlation coefficients among all chemical exposures.p < 0.05 was considered to be statistically significant.

Generalized linear regression
We conducted multivariable logistic regression to analyze each chemical exposure and the odds ratios (ORs) of obesity in different quartiles.We also fitted a multivariable linear regression model to assess the association between each chemical exposure and the continuous variable of the BMI z-score in different quantiles.In addition, we fitted the models, adjusting for all the chemical exposures.All the regression models were adjusted by age, gender, race, education level, family income-to-poverty ratio, caloric intake, serum cotinine, and urinary creatinine.We used log-transformed urinary creatinine as an independent covariate instead of the creatinine-adjusted concentration [25].

Weighted quantile sum (WQS) regression
The WQS model scored all the chemical exposures into quantiles and estimated the weight index: where ℊ() represents any monotonic link function, μ is the predictable variable,  is the weight of the ith components to be estimated, qi refers to different quantiles, and (∑    =1   ) represents the weight quantile sum of the set of c components of interest.Furthermore,  1 denotes the regression coefficient for the weight quantile sum,  0 is the intercept,  ′ refers to the covariates, including risk factors and confounders, and  is the coefficients for the covariates.The weights were estimated between 0 and 1, and added up to 1.In this study, we divided the data into the training set (40%) and the validation set (60%), we also set  1 to be positive and the seed was set to be 2019.Besides, we also constrained  1 to be negative to find if there was a significant relationship in this way.We bootstrapped the training set 10000 times and got the estimated weights, which maximized the likelihood of the non-linear model.A significant level (p < 0.05) was set to test the significance of the weights in each bootstrap.We calculated the  ̅  to estimate the weight quantile sum: where   represents the number of bootstraps in which  1 was significant.The estimated WQS was then determined using the validation set.All the chemical exposures were included in the model, and a specific weight was calculated for each component, representing their contribution to the WQS index.The chemical exposures included were constrained to have the same effect with the outcome (all positive or all negative).[31]

Bayesian kernel machine regression (BKMR)
The BKMR model utilizes a non-parametric approach to flexibly model the association between chemical exposures and healthy outcomes, including the nonlinear and/or interactions in the exposure-outcome association.A high-dimension exposure-response relationship induced by multiple variables incorporated in the model would make it difficult to ascertain the basis function.Thus, we used a kernel machine regression: where   is the health outcome, i refers to the individual (i = 1, 2, 3…n),   is the chemical exposures,   is the potential confounders, and  represents the effect of the covariates.  is the residual that obeys the normal distribution N (0,  2 ).ℎ() is the function that fits the exposure and the outcome considering nonlinear and interactions between the exposures.We grouped the chemical exposures into three groups (group1: BPA, BP-3, MeP, and PrP; group2: 2,5-DCP and 2,4-DCP; group3: MBzP, MEP, and MiBP), according to their source and correlation (chemical exposures with high correlation were grouped) with each other.A hierarchical variable selection approach was used to estimate the posterior inclusion probability of highly correlated variables, which was based on our prior knowledge.The model was fit with 10000 iterations using a Markov chain Monte Carlo (MCMC) method.The parameter r.jump2 was separately set to 0.2 (in the BMI z-score model) and 0.001 (in the obesity model) to get suitable acceptance rates.
We also analyzed the association between the quantiles of the chemical exposures and binary healthy outcome (obesity and non-obesity) using a probit BKMR model: where Φ −1 is the link function and   is the probability of the binary outcome.[22,23] Trace plots of parameter in both BMI z-score and obesity model were visualized to investigate the convergence.
All of the statistical analysis were conducted using R software (version 3.6.0).

Results
There were 2372 children and adolescents included in our study.The general characteristics of the participants are presented in We found significant correlations (P < 0.05) among 9 chemicals (Fig. 1), except for the correlation between BP-3 and 2,4-DCP (P = 0.69).There was a positive correlation between other compounds, except for a nearly no correlation of BP-3 with 2,5-DCP (r = -0.06).2,5-DCP was found to have a strongly correlation with 2,4-DCP (r = 0.87).Additionally, a high correlation between MeP and PrP (r = 0.81) was found.
In the multivariable logistic and linear regression models, including all the chemical exposures, adjusting for the confounding effects of other chemicals,  S1 and S2).We calculated the variance inflation factors (VIFs) (see Additional File 1, Tables S3), and none of them was higher than 10. fully adjusted obesity model was 2,5-DCP (weighted 0.41), followed by BPA and MEP (weighted 0.17 and 0.16, respectively).We also treated the BMI z-score as a continuous variable and fitted the BMI z-score model (Table 5).However, we did not find any significant association between the exposures and the BMI z-score in all three models.The estimated chemical weights of BMI z-score are presented in Fig. 2b.
The highest weighted chemical in the BMI z-score model was 2,5-DCP (weighted 0.30).Next to this were BP-3 and MEP, weighted 0.28 and 0.18, respectively.In addition, we also fitted WQS model including all covariates with  1 constrained to be negative.However, no statistical difference was found in this way.(see Additional File 1, Tables S4) The three groups in BKMR model were Phenols and paraben (group1), pesticides (group2), and phthalate metabolites (group3).Models were adjusted for age, gender, race, educational levels, family income-to-poverty ratio, caloric intake, serum cotinine, and log-transformed creatinine.
We grouped 9 chemical exposures into three groups, according to their source and correlation with each other, and fitted the BKMR model to analyze the simultaneous exposure with obesity and BMI z-score.In the obesity model, the group posterior inclusion probabilities (PIP) of the pesticides group was 0.966, while the group PIP of phenol and phthalates metabolites was higher than 0.5 (Table 6).In the pesticides group, 2,5-DCP seemed to drive the effect of the whole group (CondPIP = 0.978; Table 6).In the phthalate metabolites group, MEP drove the main effect of the whole group (CondPIP: 0.656), while MeP drove the main effect in the phenols group (CondPIP = 0.903) (Table 6).The overall association between the chemical mixtures and the binomial outcome is shown in Fig. 3a.We found a positive tendency between

Discussion
Due to the interactions between chemicals, it would be inaccurate to fit only the generalized linear regression model.Therefore, we further used the WQS and BKMR models, which can deal with the interaction between chemicals.
The generalized linear regression showed a positive association between 2,5-DCP, MEP, and MiBP and obesity; however, MeP was negative with the outcome.The WQS mode can include mixed chemicals exposures, with possible high correlations and interactions that are common in real life.In our analysis, 2,5-DCP and MEP were weighted highly in the WQS model.Among these, it is worth noting that BPA and BP-3 were found to weigh highly in the WQS model, yet was found to have a negligible relationship with obesity in the other two models, which may be due to the limitation of the WQS model.The WQS model may lose the full exposure information of the chemical exposures using the quantiles to score the exposures.
MeP weighed slightly in the WQS model, which differed from the results in the the other two models.This may result from its negative correlation with the outcome.
Since one limitation of WQS is that all chemical exposures included in the model must have the same effective trend with the outcome, otherwise they will be distributed to a negligible weight in the WQS model [34].In addition, the WQS model may result in a slight weight if a large number of exposures were included, or if there were complex interactions within mixed exposures.Two likely important exposures would have smaller weights if one of them was highly correlated with another one that was assigned a slight weight [31].However, as for the interactions between chemical exposures, the WQS model still has a high specificity and sensitivity when dealing with mixed predictors, considering the correlated high-dimensional mixtures [31].
The BKMR model is a new approach to deal with the complexity of mixed exposures, which can analyze not only the exposure-response function of the overall risk of mixed chemical exposures but also the interaction between two chemical exposures.In our study, 2,5-DCP and MEP have a positive association with the continuous variable BMI z-score, which was consistent with the results of our findings in the other two models.However, with the non-linear exposure-response function, other exposures were slightly or negatively associated with the outcomes, which showed consistency with its slight weight in the WQS model.Among the three groups, the MeP was found to have an inverse association with obesity, which is consistent with a previous study [12].Previous studies could not reach consensus concerning phthalate and BPA, [35][36][37], and further studies are needed.It is worth noting that MiBP had a positive relationship with the dichotomous variable of obesity but had no relationship with the continuous variable.This may be due to the misleading information when we artificially classified the continuous variable into a dichotomous variable.Besides, we also found potential interactions between 2,5-DCP and MeP as well as MEP and MeP in obesity model, while in the BMI z-score model there was no oblivious interactions.And further investigation is needed on these interactions.The BKMR model also has some limitations.An inconspicuous overall risk association may be observed when exposures which were positive with the outcome or were negative with the outcome both exist [22].
There were several limitations to our study.First, because of the design of the cross-sectional survey project, which collected all of the data at a single time point, there was a limit to the inference of the causation between the chemical exposures and obesity.Second, we used the education level of the individuals themselves instead of their parents' education level, which can be a factor, since parental education can change their intention to alter the obesity risk factor [38].Third, chemical concentrations below the limit of detection were simply replaced by the value of the limit of detection divided by the square root of 2, which may cause inaccurate results.
Thus, we selected chemical exposures with a high detection frequency.Fourth, obesity is the result of a combination of the long-term effects of various factors.We determined that the concentration of various exposures in urine does not justify a full inference about the mixed chemical exposures on individuals.Further prospective studies are required to investigate the long-term exposure.

Conclusion
Our study uses three statistical models to analyze the mixed chemical exposures with obesity.2,5-DCP and MEP were found to have a significant association with the outcome in all models, these results may lead to a false conclusion if only one model is considered.Since all of the models have their own advantages and disadvantages, our study confirms the necessity of combining different statistical models when dealing with the effects of mixed exposures on obesity.

Additional Files
Additional File1: Table S1.Association between chemical exposures and obesity with all the chemicals included in NHANES 20052010 (N = 2529).Table S2.
Association between chemical exposures and BMI z-score with all of the chemicals included in NHANES 20052010 (N = 2529).Table S3.Variance inflation factors (VIFs) in the multivariable logistic and linear regression models, including all the chemical exposures, adjusting for the confounding effects of other chemicals in NHANES 20052010 (N = 2529).Table S4.Association between the WQS index and obesity in negative direction.
Overall risk (95% CI) of chemical exposures on obesity (A) and BMI z-score (B) when comparing all the chemicals at different percentiles with their median level.Models were adjusted for age, gender, race, educational levels, family income-to poverty ratio, caloric intake, serum cotinine, and log-transformed creatinine.Association between exposure 1 with obesity (A) and BMI z-score (B), while xing exposure 2 at the 10th, 50th, and 90th quantiles (and holding the remnant predictors to their median level).The models were adjusted for age, gender, race, educational levels, family income-to-poverty ratio, caloric intake, serum cotinine, and log-transformed creatinine.

Supplementary Files
This is a list of supplementary les associated with this preprint.Click to download.

Fig. 4 a
Fig. 4 a illustrates the positive associations of 2,5-DCP, MEP, and MiBP with obesity 2,5-DCP and MEP were significantly associated with the BMI z-score.In the WQS model, 2,5-DCP, BPA, and MEP were found to have relatively high weights in the obesity model, while 2,5-DCP and MEP were found to weight relatively high in the BMI z-score model.In the BKMR model, although no significant association was found between the overall risk of the mixed chemicals and obesity (either obesity or the BMI z-score), there was an upward trend.2,5-DCP, MEP, and MiBP were found to have a positive association in the obesity model, when fixing others at their median concentration, while in the BMI z-score model, 2,5-DCP, and MEP were positively correlated with the BMI z-score.These results point out the necessity for combining three different models, considering their various advantages and disadvantages.The generalized linear model, which is used frequently to deal with the exposure-response model, has a fast modeling speed and allowed us to obtain an understandable interpretation of the coefficients.Usually, in the analysis to evaluate the association between exposures and outcome, a unitary exposure or a set of similar exposures is included[12,32,33].Our study included 9 chemical exposures of different sorts.It should be noted that the generalized linear model could not analyze the interactions between exposures.The results may be confusing due to the co-linear or interactions between the exposures.

Figure legends Figure 1 .
Figure legends

Figure 2 .
Figure 2. WQS model regression index weights for the obesity (A) and BMI z-score (B).Models were adjusted for age, gender, race, education levels, family income-to-poverty ratio, caloric intake, serum cotinine, and log-transformed creatinine.

Figure 3 .
Figure 3. Overall risk (95% CI) of chemical exposures on obesity (A) and BMI z-score (B) when comparing all the chemicals at different percentiles with their median level.Models were adjusted for age, gender, race, educational levels, family income-to poverty ratio, caloric intake, serum cotinine, and log-transformed creatinine.

Figure 4 .
Figure 4. Association and 95% credible intervals for each chemical exposure with obesity (A) and BMI z-score (B) while fixing other chemical exposures at their median level.The model was adjusted for age, gender, race, educational levels, family income-to-poverty ratio, caloric intake, serum cotinine, and log-transformed creatinine.

Figure 5 .
Figure 5. Association between exposure 1 with obesity (A) and BMI z-score (B), while fixing exposure 2 at the 10th, 50th, and 90th quantiles (and holding the remnant Llewellyn A, Owen CG, Woolacott N: Predicting adult obesity from childhood obesity: a systematic review and meta-analysis.Obesity reviews : an official journal of the International Association for the Study of Obesity 2016, 17(2):95-107.4. Twum C, Wei Y: The association between urinary concentrations of dichlorophenol pesticides and obesity in children.Rev Environ Health 2011, 26(3):215-219.

Figures Figure 1
Figures

Figure 5
Figure 5 National Health and Nutrition Examination Survey; BMI: body mass index.Data are presented as mean ± SD or Geometric mean ± SD or n (%).The t-test and  2 test were between the general obesity and no obesity groups.
Table 1.The prevalence of obesity was 20.53%.It showed that the mean age of obesity and non-obesity is close: approximately 12-and-a-half years old.About half (44.98%) of the participants were ≤5 grade, and 53.03% had a normal caloric intake.The mean (SD) BMI and waist Table 1 Demographic characteristics of the NHANES 2005-2010 participants (N = 2372), aged 619 years(SD) BMI z-scores were 2.12 (0.32) in the obesity group and 0.18 (0.94) in the non-obesity group.There were significant differences between the obesity and non-obesity participants in terms of race, family income, caloric intake, urinary creatinine, BMI, BMI z-score, and waist circumference.The LOD and the detection frequency of the chemicals above the LOD are shown in Table2.The detection frequency of MEP (99.9%) had the highest detection frequency of chemical exposures and the detection frequency of all chemical exposures was above 90%.Table 2 also shows the geometric mean, the mean, and the distribution of the chemical exposures.The highest and the lowest geometric means of the chemical exposures were related to the MEP (87.12) ng/mL and 2,4,5-TCP (0.09) μg ∕ L.

Table 5
Association between the WQS index and obesity in NHANES 2005-2010 National Health and Nutrition Examination Survey; CI: confidence interval.The weighted quantile sum (WQS) regression was fitted for the obesity and BMI z-score, which scored all the chemical exposures into quantiles and estimated the weight index.OR estimates represent the odds ratios of obesity as 1 quartile increased in the WQS index. estimates represent the mean differences in the BMI z-score as 1 quartile increased in the WQS index.Model 1: Adjusted for age, gender, ethnicity, and log-transformed creatinine.Model 2: Adjusted for age, gender, ethnicity, caloric intake, serum cotinine, and log-transformed creatinine.Model 3: Adjusted for age, gender, ethnicity, educational levels, family income-to-poverty ratio, caloric intake, serum cotinine, and log-transformed creatinine.We fitted the WQS regression model to the data to evaluate the relationship between the chemical exposures and the outcome in three models, adjusting for different covariates respectively (Table5).The WQS index had a significant also calculated the estimated chemical weights of the dichotomous variable obesity in obesity model, which are presented in Fig.2a.The highest weighted chemical in the

Table 2
Distribution of the chemical exposures in NHANES 20052010 (N =2372)

Table 3
Association between single exposure and obesity in the NHANES 2005-2010 (N = 2372) National Health and Nutrition Examination Survey; OR: odds ratio; CI: confidence interval.Total means continuous chemical variable.Multivariable logistic regression was conducted, and odds ratios (ORs) were calculated while comparing the second, third, and fourth quartiles of each chemical with reference to the first exposure quartile (N = 2372).Models were adjusted for age, gender, race, educational levels, family income-to-poverty ratio, caloric intake, serum cotinine and log-transformed creatinine.

Table 4
Association between single exposure and BMI z-score in NHANES 2005-2010 (N = 2372)