Plasma metabolites associated with exposure to perfluoroalkyl substances and risk of type 2 diabetes – A nested case-control study

Perfluoroalkyl substances (PFAS) are widespread persistent environmental pollutants. There is evidence that PFAS induce metabolic perturbations in humans, but underlying mechanisms are still unknown. In this exploratory study, we investigated PFAS-related plasma metabolites for their associations with type 2 diabetes (T2D) to gain potential mechanistic insight in these perturbations. We used untargeted LC-MS metabolomics to find metabolites related to PFAS exposures in a case-control study on T2D (n = 187 matched pairs) nested within the V ¨ asterbotten Intervention Programme cohort. Following principal component analysis (PCA), six PFAS measured in plasma appeared in two groups: 1) perfluorononanoic acid, perfluorodecanoic acid and perfluoroundecanoic acid and 2) perfluorohexane sulfonic acid, per-fluorooctane sulfonic acid and perfluorooctanoic acid. Using a random forest algorithm, we discovered metabolite features associated with individual PFAS and PFAS exposure groups which were subsequently investigated for associations with risk of T2D. PFAS levels correlated with 171 metabolite features (0.16 ≤ |r| ≤ 0.37, false discovery rate (FDR) adjusted p < 0.05). Out of these, 35 associated with T2D (p < 0.05), with 7 remaining after multiple testing adjustment (FDR < 0.05). PCA of the 35 PFAS-and T2D-related metabolite features revealed two patterns, dominated by glycerophospholipids and diacylglycerols, with opposite T2D associations. The glycerophospholipids correlated positively with PFAS and associated inversely with risk for T2D (Odds Ratio (OR) per 1 standard deviation (1-SD) increase in metabolite PCA pattern score = 0.2; 95% Confidence Interval (CI) = 0.1 – 0.4). The diacylglycerols also correlated positively with PFAS, but they associated with increased risk for T2D (OR per 1-SD = 1.9; 95% CI = 1.3 – 2.7). These results suggest that PFAS associate with two groups of lipid species with opposite relations to T2D risk.


Introduction
Perfluoroalkyl substances (PFAS) comprise a large group of chemicals used in a variety of technological applications like textile or leather treatment, surfactants and food packaging to make them e.g.non-stick and stain repellent.The half-life of PFAS in humans is 3-5 years (Calafat et al., 2007;Olsen et al., 2007) and bioaccumulative properties of PFAS increase by chain length and sulfonation with the strongest bio-accumulative properties for long-chain and sulfonated PFAS (Conder et al., 2008).Exposure occurs mainly via consumption of fish and contaminated drinking water (Haug et al., 2010;Stubleski et al., 2017), although house dust and contact with products containing PFAS could contribute (Haug et al., 2011;Vestergren and Cousins, 2009;Strynar and Lindstrom, 2008).PFAS have been hypothesized to influence metabolic control mechanisms involved in the development of type 2 diabetes (T2D) (Sunderland et al., 2019), a major global health problem, characterised by insulin resistance of target tissues and reduced insulin secretion by the pancreatic β-cells (Olokoba et al., 2012).
However, the relevant metabolic and molecular pathways involved in T2D development that may be affected by PFAS have not been fully elucidated.Thus, to gain more insight in the PFAS-T2D relation, metabolomics can be a useful tool (Wild, 2005;Rattray et al., 2018) as the presence and concentration of metabolites directly reflects the underlying biochemical activity (Kim et al., 2016).So far, few studies have investigated metabolites associated with PFAS (Salihovic et al., 2019), of which two were in children (Alderete et al., 2019;Kingsley et al., 2019) and one related to pregnancy (Hu et al., 2019).These studies indicated mainly perturbations in lipid and fatty acid metabolism and also found distinctive metabolic profiles for different PFAS (Salihovic et al., 2019), nevertheless, none investigated associations with T2D risk.
Thus, we analysed untargeted metabolomics data from a T2D casecontrol study nested in a population-based prospective cohort.Our objectives were to discover metabolites related to PFAS exposures, explore potential differences in metabolic profiles of six chemically different PFAS, and to investigate whether PFAS-related metabolite patterns were associated with risk of developing T2D.

Study population
This nested case-control study from the Västerbotten Intervention Programme cohort included participants that had donated overnight fasting blood samples to the biobank on two occasions on average 10 (±4.4) years apart (of which at least the first was prior to the case diagnosis) (Donat-Vargas et al., 2019a).The Västerbotten Intervention Programme, initiated 1985, invited all residents in Västerbotten County in Northern Sweden for health screening and blood sampling when they turned 40, 50 and 60 years.This study includes baseline data 1990-2003 and follow-up data 2000-2013.Participation rate exceeded 56%, being often around 70%, and 90.5% also donated blood samples (Norberg et al., 2010).Incident diabetes cases were identified in the DiabNorth register (last update in 2008), which had a participation rate of 70% of VIP participants diagnosed with diabetes, (Rolandsson et al., 2012) and were diagnosed according to WHO recommendations and analysis of autoantibodies (Rolandsson et al., 2012).The average time from baseline blood sampling to diabetes diagnosis was 8.2 years.Diabetes cases were individually matched with controls based on age, sex and date of blood draw.PFAS and metabolites were measured in 374 participants at baseline and follow-up (187 diabetes case-control pairs).All 374 participants were included in an initial exploratory phase where PFASrelated metabolite features were discovered, as indicated in Fig. 1.For the association analyses between PFAS-related metabolite features and T2D, analysis was restricted to 124 T2D case-control pairs (excluded matched pairs if cases had an unclear diabetes diagnosis (n = 34 pairs) or had diabetes at baseline (n = 21 pairs) and if controls had possible diabetes (n = 8 pairs)) (Donat-Vargas et al., 2019a).
Participation information, collected via questionnaires, included marital status, attained education, smoking status, body mass index (BMI) and physical activity (as determined by Cambridge index for physical activity).Based on a validated food frequency questionnaire, we retrieved information on red/processed meat, fish (g/day per 1000 kcal) and alcohol consumption (g/day) (Johansson et al., 2001(Johansson et al., , 2002;;Nettleton et al., 2013).Informed consent was obtained from all participants.The study protocol was approved by the regional ethics review boards in Umeå and Uppsala, Sweden (Dnr 2013(Dnr /414-31, 2014(Dnr /147-32 M and 2014/011)/011).

Untargeted LC-MS metabolite profiling
Untargeted LC-quadrupole time of flight (qTOF)-MS metabolic profiling was performed on baseline and follow-up plasma samples (Shi et al., 2018b).Data acquisition, corrections and normalizations were performed previously and have been described elsewhere (Brunius et al., 2016;Shi et al., 2017).In brief, de-proteinised fasting heparin plasma samples were analysed by LC-qTOF-MS (Agilent Technologies, United States) on reverse phase and hydrophilic interaction liquid chromatography columns in both positive and negative ionisation modes.The MassHunter Acquisition B.04.00 software (Agilent Technologies) was used for data acquisition.Raw data was converted to mzXML format and deconvolution was performed ('XCMS' R package) (Shi et al., 2017).The stability of the instrumental analysis was monitored by quality control samples (batch-specific pooled aliquots of the study samples and a longterm reference sample were injected at the beginning, end and regularly throughout the injection sequence).Measurement drift was calculated per batch by a cluster-based approach using batch-specific quality control samples with signal correction performed per cluster if resulting in improved measurement homogeneity among long-term reference samples ('batchCorr' R package) (Brunius et al., 2016).Features were over-all retained if they passed QC test (CV < 0.3) in at least five out of the eight analytical batches (Supplementary Material of L. Shi et al. 2018(Shi et al., 2018a)).Missing values were replaced with values selected at random from a normal distribution between 0 and the lowest measured peak intensity for each metabolite feature under the assumption of missing not at random due to low concentration.Poorly retained lipids in hydrophilic interaction chromatography (retention time < 70 s) were removed, as these were better represented in reverse phase.We use the term metabolite feature to refer to a molecular entity with a unique mass to charge ratio and retention time as measured by LC-MS.We putatively annotated six metabolite features as possible PFAS (Supplementary Table 1), which were excluded from the metabolomics dataset in order not to bias data models by being included both as independent and dependent variables.

Statistical analysis
A flow-chart of the analytical procedure is presented in Fig. 1.To discover metabolite features related not only to individual PFAS exposures, but also PFAS exposure groups, we performed an orthogonally (varimax) rotated principal component analysis (PCA) on individual square root-transformed PFAS scaled to unit variance.We obtained two PFAS groups through manual inspection of the first two principal component (PC) loadings (eigenvalue > 1).The two PFAS groups and the individual PFAS were later tested for their correlations with PFASrelated metabolite features, to investigate whether the PFAS groups associated with similar metabolic alterations as the individual PFAS.We filtered out the majority of metabolite features likely not associated with PFAS by sparse partial least squares regression using four components and the metabolomics data as the predictor variables and each of  (bottom).PFAS-related features were discovered using the random forest algorithm in step 1 and validated in step 2 for the reduced sample population at baseline after inclusion of confounders.Subsequently, the validated features were investigated for their prospective associations with T2D.Abbreviations: Principal Component Analysis (PCA) and Partial Least Squares (PLS).
individual square root-transformed PFAS concentrations and two PFAS groups as response variables.The arbitrary choice of four components allowed for a compromise between modelling complexity and parsimony and reduced the number of features from 24,758 to approximately 4000 and was performed primarily to improve the computational performance in the subsequent predictive modelling: The pre-filtered metabolomics data were entered as predictor data and processed using a random forest model within a repeated double-cross validation framework incorporated with unbiased variable selection (R package MUVR (Shi et al., 2019a), tutorial available on https://gitlab.com/CarlBrunius/MUVR), using each of the individual square roottransformed PFAS concentrations and PFAS groups (n = 2) as response variables.In order to assess modelling performance, permutation analysis was performed (n = 50, p < 0.001) (Lindgren et al., 1996), parameter settings for the random forest models and permutation analyses are described in Supplementary Table 2.Moreover, to exclude metabolite features that likely originated from sex or baseline versus follow-up measurement differences, we tested the robustness of the discovered metabolite features associated with PFAS by a series of similar analyses, stratified by sex and measurement occasion.Those metabolite features that were present both in the overall model and in at least one of the sex and measurement stratified models, were further validated using partial Spearman correlation with PFAS levels at the baseline measurement while adjusting for the following potential confounding factors: sex, age, sample year, marital status, education, smoking status, physical activity and case-control status.PFAS were square root-transformed and metabolite features were square roottransformed as well as standardized prior to the correlation analysis and p-values were False Discovery Rate adjusted for multiple testing (FDR < 0.05).
In the second step, PFAS-related metabolite features were tested for their individual prospective associations with T2D risk using conditional logistic regressions, with standardized square root-transformed metabolite features as the independent variables.In model 1, the analyses were adjusted for age, sex and sample year, marital status, education, smoking status and physical activity, and in model 2 additionally for fish-, red/processed meat-and alcohol consumption and BMI (Donat-Vargas et al., 2019a).Resulting p-values from all models were adjusted for multiple testing (FDR < 0.05).Odds ratio (OR) is presented per 1standard deviation (SD) increase.The likelihood for discovering T2Drelated metabolite features amongst the PFAS-related metabolite features was assessed using the hypergeometric test (p < 0.05).
In order to investigate intercorrelations between PFAS-and T2Drelated metabolite features, a PCA was constructed based on the 35 selected metabolite features associated with T2D (p < 0.05).Correlations between PCs (n = 3, eigenvalue > 2) and individual PFAS, PFAS groups, food intakes (i.e.fish, red/processed meat), alcohol consumption, BMI and HOMA-IR were analysed using partial Spearman correlation analysis adjusted for sex, age, sample year, marital status, education, smoking status, physical activity and case-control status.Associations between PC scores and T2D risk were analysed using conditional logistic regression adjusted for covariates according to model 1 and model 2 (as above).In a sensitivity analysis, we also investigated associations between PC scores and T2D risk while stratifying for sex.OR is presented per 1-SD increase in the pattern score.The associations between T2D-and PFAS-related metabolite feature patterns, the PFAS levels and diet with T2D risk were displayed in a triplot (Schillemans et al., 2019) (R package Triplot, tutorial available on https://gitlab.com/CarlBrunius/triplot).Briefly, the triplot is a visualization tool that facilitates interpretation of exposures, the metabolome and disease risk data.It presents loadings of metabolite features in 2 PCs (in black), whilst superimposing the correlations between the PCs and the PFAS levels or diet components (in blue) as well as the associations between the PCs and T2D risk (in red).Thus, metabolites, PFAS levels/diet components and T2D risk pulling in the same axis in the triplot figure are positively correlated/directly associated, whereas opposite axis are negatively correlated/inversely associated.
These findings were validated in a second dataset containing participants from the same case-control study with similar background characteristics (Supplementary Table 3), but with only one measurement (baseline) and without PFAS measurements (N = 598) (Shi et al., 2018a).PC scores for the PFAS and T2D-related metabolite feature patterns were predicted in the validation dataset based on the PCA of the original dataset (described above) and then used in a conditional logistic regression with T2D risk.
To avoid confusion, the term "group" refers to exposure groups obtained from the PCA on PFAS, whereas the term "pattern" refers to metabolite feature patterns obtained from the PCA on the PFAS-and T2D-related metabolite features.All statistical analyses were performed using R software version 3.4.3(R Core Team, Vienna, Austria) (R Core Team, 2017).

Metabolite identification
The metabolite annotation is reported in Supplementary Table 4, following the Metabolomics Standards Initiative (MSI) reporting criteria for the confidence level.Auto-MS/MS data was only available for a few of the selected metabolite feature peaks, some of these could be matched to literature based on accurate mass and product ion spectrum (level 2) (Shi et al., 2018b(Shi et al., , 2019b;;Hanhineva et al., 2014).Other metabolite features associated with PFAS and T2D were putatively annotated for compound class based on m/z (mass tolerance < 20 ppm) and retention time using matching against online databases (level 3).Unknown compounds were presented as "analytical mode _ m/z @ retention time" (level 4).

Population and PFAS
Total study population characteristics at baseline are described in Table 1.
Differences in characteristics and PFAS concentrations between cases and controls at baseline and follow-up as well as the PFAS intercorrelations are described in our previous papers (Donat-Vargas et al., 2019a, 2019b).In brief, T2D cases had somewhat higher BMI and physical activity, but lower PFAS concentrations (except for PFNA) compared to the controls (Donat-Vargas et al., 2019a).High correlations were found between PFOS and PFOA as well as between PFNA, PFDA Note: Continuous variables are given as mean ± SD and categorical variables are given as number (percentage).a Matching factors for case-control pairs.b Cambridge index for physical activity.
and PFUnDA (Donat-Vargas et al., 2019a).In addition to the previously observed results where PFOS and PFOA had higher concentrations at baseline, whereas PFNA and PFDA seemed higher at follow-up (Donat-Vargas et al., 2019b), a tendency for higher PFAS concentrations was observed in males compared to females, with exception of PFUnDA (Fig. 2).The PCA of the six PFAS showed that two PCs accounted for 73% of the variance.High loadings were observed for the long-chain PFDA, PFNA and PFUnDA in PC1 (henceforth referred to as LC_PFAS) and for the shorter-chain PFHxS, PFOS and PFOA in PC2 (SC_PFAS) (Supplementary Table 5).

PFAS-related metabolite features
The number of metabolite features observed per individual PFAS and PFAS group after each analysis step is presented in Table 2.We found 290 unique metabolite features related to one or more PFAS in the multivariate random forest models (permutation analysis p < 0.001), after filtering out those features likely to be selected because of sex or time of measurement differences alone (Supplementary Table 6).Out of these 290 metabolite features, 171 correlated with individual PFAS at baseline in the selected population (n = 124) after adjustment for confounders (0.16 ≤| r |≤ 0.37, FDR < 0.05).Among them, a majority of metabolite features were associated with PFNA, PFDA, PFUnDA or their affiliated LC_PFAS group (Table 2).

PFAS and T2D-related metabolite features
Among the 171 PFAS-associated metabolite features, 35 associated with T2D adjusted for matching factors, marital status, education, smoking status and physical activity in model 1, (Supplementary Fig. 1), but only 13 remained after full multivariable-adjustment in model 2 (model 1 additionally adjusted for fish-, red/processed meat-and alcohol consumption and BMI) (Fig. 3).After adjustment for multiple testing, 7 metabolite features remained in model 1 (hypergeometric p < 0.0025; Supplementary Table 7-8), but only 1 in model 2 (hypergeometric p = 0.45; Supplementary Table 7-8).One metabolite annotated as gamma-butyrobetaine (GBB) correlated negatively with PFAS but associated inversely with T2D risk.Other features with strong associations to both PFAS and T2D risk were annotated to compound classes of Fig. 2. PFAS concentrations (ng/mL) in the entire study population (n = 374) at baseline and follow-up for males and females.All PFAS, except PFUnDA, had higher concentrations in males compared to females.glycerophospholipids and diacylglycerols.
In the PCA on the 35 metabolite features selected for their associations with PFAS and diabetes (Supplementary Fig. 2, Supplementary Table 5), PC1 and PC2 accounted for the largest proportion of the total variance (23% and 12%, respectively) and showed the strongest association with T2D risk, thus these were further investigated.We found that metabolite features putatively annotated as glycerophospholipids had high loadings in PC2, whereas those annotated as diacylglycerols had high loadings in PC1 (Supplementary Table 5, Fig. 4).PFDA, PFUnDA and LC_PFAS correlated positively with both PC1 and PC2, whereas PFNA correlated only with PC1.PFHxS, PFOS, PFOA and SC_PFAS did not correlate with any of the PCs.PC1 furthermore correlated positively with alcohol and fish intake as well as with BMI and HOMA-IR and was associated with increased risk of T2D (model 1: OR = 1.9; 95% Confidence Interval (CI) = 1.3-2.7).PC2 correlated negatively with BMI and HOMA-IR, but not with any of the dietary intakes and associated with lower risk of T2D (model 1: OR = 0.2; 95% CI = 0.1-0.4) (Fig. 4).
Mutual adjustment for PC1 and PC2 did not affect results qualitatively (data not shown).Similar associations with T2D were seen in the validation cohort (Supplementary Fig. 3).Results did not differ greatly between males and females, although males had a slightly higher OR of T2D for PC1, whereas females had a higher OR for PC2 (Supplementary Fig. 3).

Discussion
This population-based prospective nested case-control study is the first to use metabolomics data to examine the relationship between PFAS and risk of T2D, providing valuable information about potential underlying mechanisms.Modelling of PFAS exposures by the metabolome in the random forest algorithm resulted in strikingly strong models and 171 metabolite features associated with PFAS, from which 35 also associated with T2D risk.This is more than would be expected by chance Fig. 3. Multivariable-adjusted univariate associations between metabolite features and (a) T2D risk (odds ratios per 1-standard deviation increase ± confidence interval (CI)) and (b) their correlations with PFAS and food components (missing n = 2).(a) Model 1 was adjusted for matching factors (sex, age and sample year), marital status, education, smoking status and physical activity.Model 2 was additionally adjusted for fish, meat and alcohol intake and BMI (missing n = 2).(b) Partial Spearman correlations were adjusted for sex, age, sample year, marital status, education, smoking status, physical activity and case-control status and significance is indicated by * FDR < 0.05, ** FDR < 0.01, *** FDR < 0.001.Red color represents positive correlations, blue color represents negative correlations.Unknown metabolite features are presented as: Mode_m/z@RT.Abbreviations: docosahexaenoic acid (DHA), long-chain PFAS: PFDA, PFNA and PFUnDA (LC_PFAS) and short-chain PFAS: PFHxS, PFOS and PFOA (SC_PFAS).(For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)(n = 2), indicating that most of the 35 features could be involved in pathways related to both PFAS and T2D.Such common pathways are of interest for the understanding of biological mechanisms behind any relationship between PFAS exposures and T2D risk.
PFAS were, via PCA, assembled into two groups based on their measured concentrations in blood, with shorter-chain PFHxS, PFOS and PFOA in one group (i.e.SC_PFAS) and long-chain PFNA, PFDA and PFUnDA in the other (i.e.LC_PFAS).However, the underlying rationale of the observed grouping is uncertain.PFAS could potentially group together either due to similarities in exposure sources or to their biological accumulation and potency (Wolf et al., 2008a).Several lipidrelated metabolites were associated with individual PFAS congeners or PFAS groups.These findings are supported by previous untargeted metabolomics studies, which also indicate associations between PFAS (i.e. PFHxS, PFOS, PFOA, PFNA, PFDA, PFUnDA) and fatty acids, lipids and, in particular, glycerophospholipids (Salihovic et al., 2019;Alderete et al., 2019;Kingsley et al., 2019;Hu et al., 2019).The current results are further strengthened by the fact that similar PFAS associated metabolites were discovered using different data analytical strategies, i.e. univariate analysis (Salihovic et al., 2019;Alderete et al., 2019;Kingsley et al., 2019;Hu et al., 2019) versus the multivariate modelling introduced in this work.Although causality cannot be inferred in the present study design, a PFAS-related effect on lipids is biologically plausible since PFAS are thought to disrupt fat metabolism through several mechanisms, e.g.activation of PPARs (Fievet et al., 2006;Berger et al., 2005), interference with endocrine pathways (Zhao et al., 2010) or cholesterol transport (Boujrad et al., 2000;Shi et al., 2010Shi et al., , 2007)).
Unsupervised analysis of the PFAS-associated metabolite features showed that one pattern was dominated by annotated glycerophospholipids and the other by diacylglycerols.Both PFAS-related metabolite feature patterns correlated with the long-chain PFNA, PFDA and PFUnDA, whereas they did not correlate equally strong with the shorter PFHxS, PFOS and PFOA.PFDA and PFUnDA were mainly associated with the glycerophospholipid metabolites, whereas PFNA associated with diacylglycerol metabolites.The stronger associations for long-chain PFAS may be explained by a higher potency and biological accumulation (Wolf et al., 2008a), supported by a previous finding where the strongest associations were also found between long-chain PFAS (i.e.PFNA and PFUnDA) and glycerophospholipids as well as other fatty acids (e.g.docosapentaenoic acid (DPA) and docosahexaenoic acid (DHA)) (Salihovic et al., 2019).
Furthermore, we discovered that the two metabolite feature patterns related to long-chain PFAS had opposite associations with T2D risk.The glycerophospholipid pattern associated with a decreased risk of T2D, whereas the diacylglycerol pattern associated with an increased risk.This is in line with a previous plasma lipidomic profiling study that Fig. 4. Multivariate associations of PFAS-and T2D-related metabolite features with PFAS, fish and red/processed meat intakes, alcohol consumption, BMI, HOMA-IR and risk of T2D.The triplot represents the correlations between metabolite features (features with absolute loading > 0.4 are visualized by black arrows with label, whereas with absolute loading < 0.4 are represented by light grey arrows), fish, red/processed meat intakes, alcohol consumption, BMI and HOMA-IR as well as risk of T2D (Schillemans et al., 2019).Correlations were calculated using partial Spearman correlation adjusted for sex, age, sample year, case/control status, marital status, education, smoking status, physical activity and FDR multiple testing (missing n = 2).Odds ratios per 1 standard deviation increase in pattern score ±95% confidence intervals were obtained from two conditional logistic regression models: Model 1 was adjusted for matching factors (sex, age and sample year), marital status, education, smoking status and physical activity.Model 2 was additionally adjusted for fish, meat and alcohol intake and BMI (missing n = 2).Unknown metabolite features are presented as: Mode_m/z@RT.Abbreviations: docosahexaenoic acid (DHA), long-chain PFAS: PFDA, PFNA and PFUnDA (LC_PFAS) and shortchain PFAS: PFHxS, PFOS and PFOA (SC_PFAS).(For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) showed inverse associations between certain phospholipids, especially lyso-phosphatidylcholines, and T2D as well as direct associations between diacylglycerols and T2D (Razquin et al., 2018).Our results suggest that PFAS are associated with two groups of lipid species with opposite relations with T2D development, which could contribute to the interpretation of inconsistent results from epidemiological studies investigating PFAS exposures and subsequent T2D risk.A prospective and a cross-sectional study indicate associations of PFOS and PFOA with increased T2D risk (He et al., 2018;Sun et al., 2018), which would be supported by the results of the diacylglycerol dominated metabolite feature pattern.On the other hand, a few cross-sectional studies show lower T2D prevalence with higher serum PFAS (i.e.PFHxS, PFOS, PFOA and PFNA) concentrations (MacNeil et al., 2009;Conway et al., 2016).Likewise, in this same study population, we previously showed inverse prospective associations between PFAS (i.e.PFHxS, PFOS, PFOA, PFNA, PFDA and PFUnDA) levels and T2D risk as well as inverse associations with insulin resistance and triglycerides in those who were nondiabetics, although these were mainly non-significant (Donat-Vargas et al., 2019a, 2019b).These findings get support from the results of the glycerophospholipid dominated metabolite feature pattern, which also shows an inverse correlation with HOMA-IR.Thus, PFAS-related glycerophospholipids may be related to insulin sensitivity and T2D risk.Interestingly, glycerophospholipids and diacylglycerols share reactions in lipid synthesis pathways (Saltiel and Kahn, 2001).Additionally, increased circulating free fatty acids and accumulation of triglycerides and fatty acid-derived metabolites (e.g.diacylglycerols) in muscle and liver is thought to contribute to insulin resistance (Saltiel and Kahn, 2001).
Another potentially interesting PFAS-and T2D-related metabolite was identified as GBB, an intermediate metabolite in the carnitine biosynthesis and in the gut microbiota-dependent formation of trimethylamine N-oxide (Koeth et al., 2019).GBB correlated negatively with PFAS and associated inversely with T2D.One possible pathway linking PFAS to GBB relates to PPARα activation, as animal studies indicate that carnitine metabolism may be regulated by PPARα (Ringseis et al., 2012), another speculation is that PFAS affects the gut microbiota.Carnitine plays a key role in lipid metabolism and studies indicate both beneficial (Strand et al., 2018) and detrimental (Koeth et al., 2014) effects of carnitine on T2D due to fatty acid transport and β-oxidation as well as trimethylamine N-oxide associated pro-diabetic effects (e.g.inflammation, insulin resistance and hyperlipidemia) (Li et al., 2015;Dambrova and Liepinsh, 2015).Red meat is the main source for carnitine, but there was no correlation between GBB and meat or fish, making it unlikely that the PFAS-GBB link is confounded by diet.
Nevertheless, since different dietary habits and different dietary sources of PFAS have been suggested to confound the PFAS-T2D association and contribute to the inconsistencies in the epidemiological findings (Donat-Vargas et al., 2019a), we further explored correlations between other PFAS-related metabolite features and different dietary components.In this study, fish intake correlated the strongest with the diacylglycerol dominated metabolite feature pattern associated with increased T2D risk.For fish intake, literature indicates inverse associations with T2D (Jannasch et al., 2017), although there are some inconsistencies (Namazi et al., 2019).Since fish is known to be the main dietary source of PFAS as well as of other persistent organic pollutants (POPs) and the latter are associated with increased T2D risk (Tornevi et al., 2019), we speculate that the positive association between PFAS and T2D may be confounded by other POPs.This is supported by our previous finding that some of the PFAS-related metabolites including diacylglycerol (RP_684.55@805.11)and a known biomarker of fish intake DHA (RP_329.25@632.04)were also related to fish intake and POPs (Shi et al., 2019b).Although BMI correlated with the PFAS-related metabolite feature pattern, adjustment for BMI in the logistic regression did not affect the qualitative assessment of risk.This finding suggests that the associations between the PFAS-related metabolite features and T2D risk were independent of BMI.It is uncertain whether we should consider BMI as a confounder or a mediator, as PFAS exposures may affect weight and weight regain (Liu et al., 2018;Cardenas et al., 2018).
Our study fits in a broader perspective of exposome-based studies that are using metabolomics in population-based research as a tool to investigate the influence of environmental exposures on disease risks (Walker et al., 2019).We employed a meet-in-the-middle approach, where the exposure is related to intermediate biomarkers (metabolomics), which are then connected to disease risk (Vineis and Perera et al., 2007;Chadeau-Hyam et al., 2011;Gängler et al., 2019).This approach has advantages for causality assessment in epidemiology and may be integrated with the toxicological concept of adverse outcome pathways.Our study has several strengths.First, the prospective design, measurements of PFAS in blood and stringent T2D diagnosis reduced the likelihood of reverse causation bias and chances of exposure and outcome misclassification, respectively.Second, the collected data allowed adjustments for several potential confounders as well as to explore the impact of diet on the associations.Finally, the random forest modelling was designed to minimize overfitting and false positives while selecting all-relevant metabolite features, allowing us to select metabolic features using a multivariate approach.Nevertheless, there are also several limitations that need to be acknowledged.First, the explorative nature of our study hinders inference regarding possible biomarkers or dose-response relations.Our study may also be subject to false-negative results, i.e. we may have missed associations between PFAS and metabolite features due to stringent inclusion criteria and stratification analyses, as well as false-positive results due to chance findings or overfitting.However, the hypergeometric test indicated a significant enrichment in our results in model 1, but not in model 2, which may be due to over-adjustment in model 2. The validation in the second independent dataset indicates that overfitting is limited, although the possibility for residual confounding remains.Second, we focused on 2 PCs of 35 PFAS-related metabolite features that explained approximately a third of the variance within the data.It is possible that remaining variability might be attributed to either residual variability or potentially important but unknown systematic variability.However, the primary aim of this study was to identify plasma metabolites associated with exposure to both PFAS and T2D risk, which were reflected by the first 2 PCs and were thus further investigated.There was no opportunity to replicate these results in a separate cohort, but similarities in results between different metabolomics studies investigating PFAS do strengthen the observed findings.Furthermore, not all the discovered PFAS-related metabolite features had MS-MS data available and were thus only putatively annotated for class, i.e. glycerophospholipids and diacylglycerols.There was no possibility to do targeted MS-MS yet, but detailed IDs will be included upon targeted examinations.
In conclusion, we annotated metabolite classes that were associated both with baseline PFAS levels and prospective T2D risk in this exploratory nested case-control study.These metabolite features formed two patterns mainly represented by annotated glycerophospholipid and diacylglycerol classes.Although both PFAS-related metabolite feature patterns correlated positively with PFAS, they had opposite directions on T2D risk, with glycerophospholipids showing inverse associations and diacylglycerols showing direct associations.Furthermore, GBB correlated negatively with PFAS and associated inversely with T2D.Our findings highlight the possibility that PFAS are associated with two groups of lipid species with diverging relations with T2D development, which could explain the conflicting associations found in literature between PFAS and T2D risk (Sun et al., 2018;Donat-Vargas et al., 2019a).

Fig. 1 .
Fig. 1.Flow chart of the analytical study design divided in selection of PFAS-related features (top) and associations between PFAS-related features and risk of T2D(bottom).PFAS-related features were discovered using the random forest algorithm in step 1 and validated in step 2 for the reduced sample population at baseline after inclusion of confounders.Subsequently, the validated features were investigated for their prospective associations with T2D.Abbreviations: Principal Component Analysis (PCA) and Partial Least Squares (PLS).

Table 1
Baseline main characteristics of the total study population (N = 374).

Table 2
Number of metabolite features (maximum model) related per individual PFAS or PFAS pattern exposures and model performance by prediction fitness and permutation analysis (N = 374).
c I.e.metabolite features selected from the full sample model (males, females and both measurements) that also occur in at least one of the sex and measurement time stratified models (i.e.males and females baseline only; males and females follow-up only; males both measurements; males baseline only; males follow-up only; females both measurements; females baseline only; females follow-up only).dI.e.FDR < 0.05 after partial correlation analysis adjusted for sex, age, sample year, marital status, education, smoking status, physical activity and casecontrol status (n = 248).e Refers to long-chain PFAS: PFDA, PFNA and PFUnDA (LC_PFAS) and shortchain PFAS: PFHxS, PFOS and PFOA (SC_PFAS).