Metabolomic biomarkers of the Mediterranean diet in pregnant individuals: A prospective study

SUMMARY Background and aims: Metabolomic profiling is a systematic approach to identifying biomarkers of dietary patterns. Yet, metabolomic markers of dietary patterns in pregnant individuals have not been investigated. The aim of this study was to identify plasma metabolomic markers and metabolite panels associated with the Mediterranean diet in pregnant individuals. Methods: This is a prospective study of 186 pregnant individuals from the Fetal Growth Studies-Singletons cohort who had both dietary intake and metabolomic profiles measured. Dietary intakes during the peri-conception/1st trimester and the second trimester were assessed at 8–13 and 16–22 weeks of gestation, respectively. Adherence to the Mediterranean diet was measured by the alternate Mediterranean Diet (aMED) score. Fasting plasma samples were collected at 16–22 weeks, and untargeted metabolomic profiling was performed using mass spectrometry-based platforms. Metabolites individually and jointly associated with aMED scores were identified using linear regression and least absolute shrinkage and selection operator (LASSO) regression models, respectively, with adjustment for potential confounders. Results: Among 459 annotated metabolites, 64 and 41 were individually associated with the aMED scores of the diet during the peri-conception/1st trimester and during the second trimester, respectively. Fourteen metabolites were associated with the Mediterranean diet in both time windows. Most Mediterranean diet-related metabolites were lipids (e.g., acylcarnitines, cholesteryl esters (CEs), linoleic acid, long-chain triglycerides (TGs), and phosphatidylcholines (PCs)), amino acids, and sugar alcohols. LASSO regressions also identified a 10-metabolite panel that was jointly associated with the aMED score of the diet during the peri-conception/1st trimester (AUC: 0.74; 95% CI: 0.57, 0.91) and a 3-metabolite panel in the 2nd trimester (AUC: 0.68; 95% CI: 0.50, 0.86).
Conclusion: We identified plasma metabolomic markers of the Mediterranean diet among pregnant individuals. Some have also been reported in previous studies among non-pregnant populations, whereas others are novel. Our findings warrant replication in future studies of pregnant individuals. Clinical trial registration number: This study was registered at ClinicalTrials.gov.


Introduction
In the past two decades, growing evidence has consistently indicated the importance of dietary patterns as a measure of overall dietary quality, rather than individual nutrients or foods, in promoting health and reducing disease risk [1][2][3]. The Mediterranean diet, a traditional dietary pattern among people living in the Mediterranean Basin, features higher intakes of vegetables, fruits, nuts, legumes, fish, cereals, and olive oil, and lower intakes of red and processed meats and sweets [4,5]. The favorable cardiometabolic and neurological effects of the Mediterranean diet have been demonstrated in high-quality randomized controlled trials (RCTs) [6] and in systematic reviews and meta-analyses across different populations [7][8][9], including pregnant individuals [10,11]. Yet, molecular markers of the Mediterranean diet have not been elucidated in pregnant individuals.
Biomarkers of dietary intake can serve as objective measures of dietary patterns and help elucidate the biological pathways linking diet and health outcomes [12]. The traditional approach examines biomarkers for a single nutrient or food separately, which likely overlooks interactions among nutrients and food groups; in contrast, recent advances in high-throughput untargeted metabolomic profiling permit a more comprehensive and systematic approach to identifying biomarkers of dietary patterns [13]. By measuring downstream small molecules or metabolic products (<1.5 kDa; metabolomic markers), the metabolomic approach may provide more information on the interactions between nutrients/foods and genes in individuals and could identify novel biomarkers or biological pathways. Such an approach is well suited to capturing the complex, net impact of the numerous nutrients in a given dietary pattern and their metabolism in the human body.
Recent studies have investigated metabolomic markers of the Mediterranean diet in non-pregnant populations, each identifying several blood metabolomic markers of the Mediterranean diet [14][15][16][17][18][19][20][21][22][23]. Yet, no studies have investigated metabolomic markers of this healthy dietary pattern in pregnant individuals. Pregnancy involves a series of dynamic physiological changes, including alterations in the maternal hormonal profile, basal metabolic rate, and energy storage and partitioning [24]. Thus, metabolic responses to diet in pregnant individuals may differ from those in non-pregnant populations. Furthermore, no previous studies have used supervised methods to identify panels of metabolites related to dietary patterns. As dietary patterns are combinations of foods and nutrients, a panel of metabolites could better capture the multidimensionality and interrelations of the nutrients and foods present in a dietary pattern. Therefore, the primary aim of this study was to identify blood metabolomic markers and metabolite panels associated with the Mediterranean diet in pregnant individuals.

Study population and design
This was a prospective study among racially diverse pregnant individuals enrolled in the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) Fetal Growth Studies-Singletons cohort (FGS). Details of the cohort have been described previously [25]. Briefly, a total of 2802 individuals aged 18-40 years with singleton pregnancies were recruited between 8 and 13 weeks of gestation from 12 clinical sites in the United States between July 2009 and January 2013. Institutional Review Board approval was obtained for all participating clinical sites, the data coordinating center, and NICHD. The current analysis included only the 186 individuals from a nested gestational diabetes mellitus (GDM) case-control study within the FGS who had both plasma metabolomic profiling data and dietary intake data [26]. The participant flow chart is presented in Supplementary Fig. 1.
Dietary intakes during the peri-conception/first trimester were assessed at 8-13 weeks of gestation using a Food Frequency Questionnaire (FFQ), and during the second trimester (16-22 weeks) using the automated self-administered 24-h dietary recall (ASA24®; version Beta, 2011). Both dietary assessment tools were developed and validated by the National Cancer Institute, National Institutes of Health (NIH) [27][28][29]. We measured adherence to the Mediterranean diet by calculating the alternate Mediterranean diet (aMED) score from the food and nutrient data of the FFQ and ASA24. The aMED score was calculated from 8 food and nutrient components (i.e., fruits, vegetables, whole grains, nuts, legumes, fish, red and processed meats, and the monounsaturated-to-saturated fat ratio): healthy components were scored 1 for intake above the median and 0 for intake below it, and unhealthy components were scored in reverse [30]. Thus, scores ranged from 0 to 8, and a higher score indicated better adherence to the Mediterranean diet. This method has been applied to calculate the aMED score in the same cohort in previous publications [10,11].
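The median-split scoring described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code (the analyses were run in SAS or R): the component keys are hypothetical names, medians are taken within the analytic sample, and an intake exactly at the median is assumed to earn 0 points, a tie rule the text does not specify.

```python
from statistics import median

# Hypothetical component keys; higher intake is "healthy" (1 point if above the median).
HEALTHY = ["fruits", "vegetables", "whole_grains", "nuts",
           "legumes", "fish", "mufa_sfa_ratio"]
# Scored in reverse: 1 point if intake is below the median.
UNHEALTHY = ["red_processed_meat"]


def amed_scores(intakes):
    """Return an 8-component aMED score (range 0-8) for each participant.

    `intakes` is a list of dicts mapping component name -> intake.
    Assumption: an intake exactly equal to the median scores 0.
    """
    components = HEALTHY + UNHEALTHY
    # Sample medians, one per component.
    medians = {c: median(p[c] for p in intakes) for c in components}
    scores = []
    for p in intakes:
        score = sum(1 for c in HEALTHY if p[c] > medians[c])
        score += sum(1 for c in UNHEALTHY if p[c] < medians[c])
        scores.append(score)
    return scores
```

Because each of the 8 components contributes 0 or 1 point, the score is bounded between 0 and 8; the analyses then compare participants across tertiles of this score.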

Biospecimen collection, metabolomic profiling, and data pretreatment
We collected fasting blood samples in the 2nd trimester (91.0% of blood draws were within 16-22 weeks). Plasma samples were immediately processed and stored at −80.0 °C until analysis. We performed untargeted metabolomic profiling at the NIH West Coast Metabolomics Center at the University of California Davis using two platforms: high-throughput liquid chromatography-quadrupole time-of-flight tandem mass spectrometry (LC-QTOF-MS/MS) and gas chromatography-time-of-flight mass spectrometry (GC-TOF-MS) [31]. All samples were analyzed on both platforms. Internal standards were used to calibrate retention times. A total of 751 features were detected, comprising 459 annotated metabolites and 292 unknown features; only the 459 annotated metabolites were included in the current study. Each of the 459 metabolites had <20% missing values, and missing values were imputed with half of the minimum value within each batch, as recommended for metabolomic data [32]. To correct for day-to-day technical variation, each metabolite was divided by its median value (i.e., rescaled to a median of 1) and log-transformed within each batch [33].
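A minimal Python sketch of the pre-treatment steps described above, applied to one metabolite: half-minimum imputation within each batch, then division by the batch median and a log transform. Assumptions not stated in the text: missing values are coded as `None`, the median is taken after imputation, and the natural logarithm is used; the function names are hypothetical.

```python
import math
from statistics import median


def preprocess_batch(values):
    """Pre-treat one metabolite's raw intensities within a single batch.

    `values` holds raw intensities, with None marking a missing value.
    1. Impute missing values with half of the batch minimum.
    2. Divide by the batch median (the batch median becomes 1).
    3. Natural-log transform (the batch median becomes 0).
    """
    observed = [v for v in values if v is not None]
    half_min = min(observed) / 2
    imputed = [half_min if v is None else v for v in values]
    med = median(imputed)
    return [math.log(v / med) for v in imputed]


def preprocess(values, batches):
    """Apply preprocess_batch separately within each batch label."""
    out = [0.0] * len(values)
    for b in set(batches):
        idx = [i for i, lab in enumerate(batches) if lab == b]
        for i, x in zip(idx, preprocess_batch([values[i] for i in idx])):
            out[i] = x
    return out
```

Because each batch is rescaled to its own median, a value of 0 on the log scale corresponds to that batch's median, which removes batch-level shifts in overall intensity.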

Covariates
We collected individuals' sociodemographic characteristics, lifestyle factors, and reproductive and medical history using a detailed questionnaire at study enrollment. Pre-pregnancy body mass index (BMI) was calculated from height measured at baseline and self-reported pre-pregnancy weight. We categorized individuals as normal weight (19.0-24.9 kg/m²), overweight (25.0-29.9 kg/m²), or obese (≥30.0 kg/m²) by pre-pregnancy BMI. We assessed physical activity using the validated Pregnancy Physical Activity Questionnaire (PPAQ) [34] in the first trimester, covering habitual physical activity in the past 12 months, and in the second trimester, covering physical activity since the baseline visit.
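The BMI calculation and the categories above can be expressed as a short Python sketch. Rounding BMI to one decimal before applying the cutoffs is an assumption made so that boundary values match the one-decimal ranges in the text; the label for values below 19.0 kg/m² is hypothetical, since the text defines no category for them.

```python
def prepregnancy_bmi(height_m, weight_kg):
    """BMI (kg/m^2) from measured height (m) and self-reported pre-pregnancy weight (kg)."""
    return weight_kg / height_m ** 2


def bmi_category(bmi):
    """Map BMI to the categories used in the text (assumes one-decimal rounding)."""
    bmi = round(bmi, 1)
    if bmi >= 30.0:
        return "obese"
    if bmi >= 25.0:
        return "overweight"
    if bmi >= 19.0:
        return "normal weight"
    return "below 19.0 (not categorized)"  # hypothetical label
```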

Statistical methods
We applied sampling weights in the statistical analyses to account for the oversampling of individuals with GDM in the nested case-control sample, so that the results represent the full FGS sample. We described and compared individuals' characteristics at study enrollment across aMED score tertiles, presenting weighted percentages (%) and actual frequencies (N) for categorical variables and weighted means (standard errors, SE) for continuous variables. P-values comparing individuals across aMED score tertiles were calculated with one-way analysis of variance (ANOVA) for continuous variables and χ² tests for categorical variables.
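To illustrate how sampling weights change the descriptive summaries, the Python sketch below computes a weighted mean and a weighted percentage. It assumes each participant's weight is the inverse of her sampling probability (so oversampled GDM cases are down-weighted); the actual FGS weights are not given here, and design-based standard errors are not shown.

```python
def weighted_mean(values, weights):
    """Sampling-weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def weighted_percent(flags, weights):
    """Weighted percentage of participants for whom `flags` is True."""
    return 100 * weighted_mean([1.0 if f else 0.0 for f in flags], weights)
```

For example, with one GDM case (weight 1) and one non-GDM participant (weight 3), the weighted percentage with GDM is 25% rather than the unweighted 50%, reflecting the lower prevalence of GDM in the full cohort.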
Several steps were performed to identify plasma metabolomic markers of the aMED score. We used aMED scores derived from the FFQ (assessed at the baseline visit, 8-13 weeks) in the prospective analyses and aMED scores derived from the ASA24® (assessed at 16-22 weeks, during the 2nd trimester) in the cross-sectional analyses. We first identified plasma metabolites individually associated with the aMED score using linear regression with adjustment for potential confounders, including maternal age (years), race/ethnicity (non-Hispanic White, non-Hispanic Black, Hispanic, Asian/Pacific Islander), education (high-school degree or less, associate degree or more), pre-pregnancy BMI (kg/m²), and total physical activity (minutes/week). Metabolites were selected if they differed significantly between the highest and the lowest aMED tertiles. We controlled for multiple comparisons using the Benjamini-Hochberg method, with an overall false discovery rate (FDR) < 0.05 considered statistically significant [35]. In sensitivity analyses, we examined the associations of individual metabolites with the aMED score stratified by GDM status (i.e., separately among individuals who did and did not develop GDM in the late 2nd trimester). We then performed least absolute shrinkage and selection operator (LASSO) regression to select the panel of plasma metabolites jointly associated with the aMED score [36]. Metabolites with non-zero coefficients at lambda.1se (the largest penalty whose cross-validated error is within one standard error of the minimum) were retained in the panel. To avoid over-fitting, we performed 10-fold cross-validation in the LASSO regression, with participants randomly divided into training and validation sets at a ratio of 2:1, and calculated the area under the curve (AUC) of each panel. All data analyses were conducted using SAS software (version 9.4; SAS Institute, Cary, NC, US) or R (version 4.0.2; RStudio: Integrated Development for R. RStudio, Inc., Boston, MA, US).
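Two selection rules named above, the Benjamini-Hochberg adjustment and the `lambda.1se` criterion, are sketched below in Python as an illustration, not as the authors' SAS/R code; the function names are hypothetical. `lambda_1se` takes a precomputed cross-validation curve (mean error and its standard error at each penalty value) and returns the largest penalty whose mean error is within one standard error of the minimum, which is how R's glmnet defines `lambda.1se`. Fitting the LASSO path itself is not shown.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    A metabolite passes FDR < 0.05 if its adjusted value is < 0.05.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def lambda_1se(lambdas, cv_mean, cv_se):
    """Largest penalty whose mean CV error is within one SE of the minimum."""
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[best] + cv_se[best]
    return max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)
```

For example, p-values of 0.01, 0.04, 0.03, and 0.20 give adjusted values of 0.04, 0.053, 0.053, and 0.20, so only the first metabolite passes FDR < 0.05. Choosing `lambda.1se` rather than the error-minimizing penalty favors a sparser panel, which suits the goal of measuring fewer metabolites.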

Participants' baseline characteristics
Among all individuals, 27.2% were non-Hispanic White, 25.7% non-Hispanic Black, 24.4% Hispanic, and 22.7% Asian/Pacific Islander. The mean (SE) age at enrollment was 28.0 (0.4) years, and the mean (SE) BMI was 25.5 (0.4) kg/m², with 50.4% having a pre-pregnancy BMI in the normal range. The baseline characteristics of all individuals and by aMED score tertile are presented in Table 1. Compared with individuals in the lowest aMED score tertile (T1), those in the highest tertile (T3) were more likely to be non-Hispanic White, highly educated, and nulliparous, and had lower pre-pregnancy BMI and a healthier dietary intake profile.

Prospective associations of metabolites with aMED score reported in the first trimester
Among 459 annotated metabolites, 64 were individually associated with the aMED score reported in the first trimester (FDR < 0.05) after adjusting for age, race/ethnicity, education, pre-pregnancy BMI, and physical activity (Table 2). Of these 64 metabolites, 51 (79.7%) had positive associations and 13 (20.3%) had negative associations (Fig. 1A and Supplementary Fig. 3A). In the sensitivity analysis, all 64 metabolites remained significantly associated with the aMED score in individuals without GDM (Supplementary Table 1). A panel of 10 metabolites was jointly associated with the aMED score in the LASSO regression (AUC: 0.74; 95% CI: 0.57, 0.91), including inverse associations of glutamic acid, aspartic acid, 3-hydroxybutyric acid, and epsilon-caprolactam (Fig. 2). Except for epsilon-caprolactam, all other 9 metabolites were also individually associated with the aMED score.

Cross-sectional associations of metabolites with aMED score reported in the second trimester
After adjusting for the abovementioned covariates, 41 metabolites were significantly associated (FDR < 0.05) with aMED score reported in the second trimester, with 25 (61.0%) having positive associations and 16 (39.0%) having negative associations (Table 2). In the sensitivity analysis, all 41 metabolites remained significantly associated with aMED score in individuals without GDM (Supplementary Table 2).
A panel of 3 metabolites was jointly associated with aMED score with an AUC of 0.68 (95% CI: 0.50, 0.86) in the LASSO regression, including a positive association of CE (20:5) B and inverse associations of glycolic acid and acylcarnitine (C16:0). All 3 were also individually associated with the aMED score in linear regressions.
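The AUC values reported for the panels summarize discrimination. The Python sketch below shows one standard way to compute an AUC: the probability that a randomly chosen participant in the positive group has a higher panel score than one in the negative group, with ties counting one half. Since the text does not say how the outcome was defined, it assumes the aMED score was dichotomized (for example, highest vs. lowest tertile) and that the panel score is a linear combination of the selected metabolites; the coefficients below are hypothetical, not estimates from this study.

```python
def panel_score(metabolites, coefs, intercept=0.0):
    """Linear panel score; `coefs` maps metabolite name -> (hypothetical) coefficient."""
    return intercept + sum(coefs[name] * metabolites[name] for name in coefs)


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC for binary labels coded 1/0."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count one half
    return wins / (len(pos) * len(neg))


# Hypothetical coefficients whose signs mirror the 2nd-trimester panel:
# CE (20:5) B positive; glycolic acid and acylcarnitine (C16:0) inverse.
COEFS = {"CE (20:5) B": 0.4, "glycolic acid": -0.3, "acylcarnitine (C16:0)": -0.2}
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.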

Discussion
In this longitudinal study among pregnant individuals, we identified several plasma metabolomic markers of the Mediterranean diet, both for the peri-conception/1st-trimester diet and for the recent diet in the 2nd trimester. Most Mediterranean diet-related metabolites were lipid species (e.g., acylcarnitines, linoleic acid, long-chain TGs, PCs, and CEs); others were amino acids and derivatives, and sugar alcohols. Of note, 14 metabolites were significantly related to the Mediterranean diet in both time windows. In addition to individual metabolites, we identified multi-metabolite panels for the Mediterranean diet in pregnant individuals in both time windows. With fair-to-good predictability, such panels are promising candidate biomarkers of the Mediterranean diet because they substantially reduce the number of metabolites (compared with all individually associated metabolites) that would need to be measured in future applications.
Consistent with previous studies in non-pregnant populations [14][15][16][17][18][19][20][21][22][23], we found that most metabolites associated with the Mediterranean diet among pregnant individuals were lipid species (e.g., CE (20:5) A and B; the long-chain acylcarnitines (C18:0) and (C18:2); the long-chain TGs TG (56:1) A and B, TG (58:4), and TG (60:2) A; PC (36:5) B; and linoleic acid). As the esterified form of cholesterol with a single fatty acid, CE is the major form of cholesterol in human plasma [37]. Plasma levels of CEs have been linked with dietary intake of monounsaturated fatty acids (MUFA) [38], olive oil, and seafood [21,39]. The health effects of individual CEs are unclear, but available evidence suggests that CEs with longer, more unsaturated acyl chains may be favorable with respect to cardiovascular disease (CVD) [16] and diabetes [40]. Acylcarnitines have been identified as markers of meat intake (positive associations) [1,41,42] and coffee intake (inverse associations) [43]. Long-chain acylcarnitines were also positively associated with Western dietary patterns in a Canadian study [44]. Because acylcarnitines are intermediates of fatty acid oxidation, disturbances in plasma acylcarnitines have been linked with the development of diabetes [45,46] and CVD [47,48] in non-pregnant populations, as well as with GDM and fetal development in pregnant individuals [49,50]. Indeed, findings from the PREDIMED trial suggest that the Mediterranean diet may mitigate the adverse associations of acylcarnitines with CVD [16,51]. Plasma long-chain TGs and PCs mainly come from the consumption of fish, nuts, and vegetable oils (e.g., olive oil) [52]. Long-chain PCs have also been identified as markers of coffee consumption [53]. Plasma profiles of TGs and PCs have been suggested as important signatures of insulin sensitivity and CVD risk [40,54].
TGs and PCs with different acyl chains and double bonds may play different roles, but the exact contribution of each or the combination of TGs and PCs is still poorly understood.
Linoleic acid was inversely associated with the Mediterranean diet in both time windows in our study, as in a previous study of male Finnish smokers from the Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study cohort [23]. As an essential fatty acid, linoleic acid can be obtained only from the diet (primarily from vegetable oils, nuts, and seeds) [55]. In the human body, linoleic acid is the parent n-6 polyunsaturated fatty acid (PUFA) and can be converted to arachidonic acid, which is subsequently metabolized to proinflammatory lipid mediators (eicosanoids). In contrast, α-linolenic acid, an n-3 PUFA, can be converted to eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA), which give rise to anti-inflammatory lipid mediators (resolvins and protectins) [56]. The Mediterranean diet has been linked with a well-balanced plasma linoleic/α-linolenic acid ratio, which may lower CVD risk through effects on inflammatory responses [57,58].
Several amino acids and derivatives, such as glutamic acid (a major excitatory neurotransmitter and a by-product of branched-chain amino acid catabolism), aspartic acid (a metabolite involved in the urea cycle), and 3-hydroxybutyric acid (also known as β-hydroxybutyric acid; a by-product of short-chain fatty acid and branched-chain amino acid metabolism), were identified as metabolomic markers of the Mediterranean diet in our study and in two observational studies in non-pregnant populations [20,21]. These amino acid derivatives are mainly involved in the urea cycle during amino acid catabolism [59]. Previous studies have shown associations of glutamic and aspartic acids with central obesity [60][61][62], diabetic retinal disease [61,63], and risk of CVD [64]. Notably, glutamic acid (glutamate) has been identified as the single metabolite most strongly correlated with visceral adipose tissue in several populations [62,65,66]. Taken together, gluconeogenesis from amino acids may play an important role in the metabolic pathways that explain some health benefits of the Mediterranean diet.
We identified several novel carbohydrate metabolites associated with the Mediterranean diet in pregnant individuals, including two sugar alcohols (i.e., lyxitol and xylitol), glycolic acid, and two organic acids (i.e., citric acid and isocitric acid). Xylitol and lyxitol are sugar alcohols that occur naturally in many fruits and vegetables or are produced artificially [67]. Because they are poorly digested, they taste sweet but provide far fewer calories than sugar. Thus, they (mainly xylitol) are widely used as sugar substitutes in "sugar-free" products such as chewing gum, yogurt, cookies, and candies [67]. Plasma sugar alcohols derive mainly from dietary sources, absorbed through passive diffusion in the small intestine [68]. Therefore, the positive associations of xylitol and lyxitol with the Mediterranean diet may reflect high consumption of fruits and vegetables, of commercially labeled "sugar-free" products, or both. Glycolic acid was inversely associated with the Mediterranean diet in both the first and second trimesters in our study of pregnant individuals, but not in previous studies of non-pregnant populations. Foods containing oxalic acid (e.g., spinach, rhubarb, and almond milk) or glyoxal (e.g., bread, cookies, yogurt, sardine oil, coffee, tea, beer, and wine) are likely the main exogenous sources of glycolic acid. However, most glycolic acid is produced endogenously from glycolaldehyde (a product of fructose and xylitol metabolism) during the synthesis of oxalate [69]. Clinically, glycolic acid is widely used in skin-care products [70]. Urinary glycolic acid has also been inversely related to obesity and visceral adipose tissue in healthy adults [71]. Citric acid and isocitric acid are tricarboxylic acid (TCA) cycle intermediates. The TCA cycle is the final common oxidative pathway for carbohydrates, fats, and amino acids and a central pathway for energy production in humans [72].
Citric acid is found in citrus fruits (e.g., oranges, lemons, and limes), and isocitric acid is abundant in most berries (e.g., blackberries) and in vegetables (e.g., carrots). Both are also commonly used as flavorings (to add a sour taste) and preservatives in foods and beverages, especially soft drinks and candies. Urinary citric acid has been identified as a marker of wine and grape juice intake and of a lactovegetarian diet [73]. Serum isocitric acid has been linked to dementia in a small case-control study [74] and to mild cognitive impairment in a large prospective study in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) [75]. In pregnant individuals, first-trimester plasma citric acid has been identified as one of the top metabolites for predicting preeclampsia [76].
In the current study among pregnant individuals, we replicated some Mediterranean diet-related metabolites that had been identified in previous studies of non-pregnant populations. Among them, linoleic acid, aspartic acid, 3-hydroxybutyric acid, and CE (20:5) were associated with the Mediterranean diet in both time windows during pregnancy, supporting the robustness of these findings. The replicated findings are promising and support the concept that metabolomics could be a new approach for identifying biomarkers of dietary patterns across populations. The novel metabolites may reflect differences in physiological conditions between study populations (e.g., pregnant vs. non-pregnant), but could also be due to differences in study design, population, metabolomic profiling, and statistical approach.
Our study has several strengths. As the first study to investigate metabolomic markers of the Mediterranean diet in pregnant individuals, we recruited pregnant individuals from multiple racial and ethnic groups at 12 clinical sites in the US to enhance the generalizability of the results to pregnant individuals in the US. We used longitudinal data, with peri-conception/first-trimester dietary intake assessed before plasma specimen collection, to establish the temporality of the association, whereas most previous studies reported only cross-sectional correlations [18][19][20][21][22][23][39]. Furthermore, we calculated the aMED score using a predefined method that has been applied to both pregnant and non-pregnant populations, making our results easy to compare with and replicate in other studies. We also collected fasting blood samples, which are less subject to measurement variability and less influenced by other factors. Beyond individual metabolites, we identified multi-metabolite panels; as dietary patterns are combinations of foods and nutrients, a panel of metabolites could better capture the interrelations of the nutrients and foods in a dietary pattern with a much smaller number of metabolites. In addition, with dietary data available for both the peri-conception/first-trimester period and the second trimester, we were able to identify both long-term and short-term metabolite markers of the Mediterranean diet. The multi-metabolite panels had fair-to-good predictability of adherence to the Mediterranean diet in both the first and second trimesters among pregnant individuals, suggesting their potential application as biomarkers of dietary intake.
Several limitations of this study should be considered when interpreting the results. First, this is an observational study. Although we rigorously examined and adjusted for confounders, residual confounding cannot be ruled out. Second, we had only a single 24-h recall in the 2nd trimester, which may not capture foods that participants usually consume. Nevertheless, a single 24-h recall has been used to derive dietary patterns, including the Mediterranean diet, in both non-pregnant adults [77] and pregnant individuals [10] in large epidemiological studies. The 14 metabolites shared between the aMED scores derived from the FFQ and from the single 24-h recall also suggest the usefulness of both dietary assessment methods. Future studies may need to quantify the degree of misclassification. Lastly, we included only annotated (known) metabolites and may therefore have missed some important metabolic features. However, these known metabolites are more reliable and allow comparison with results from other studies.
In conclusion, we identified a set of metabolites associated with the Mediterranean diet in pregnant individuals, some of which were significantly related to the Mediterranean diet in both the peri-conception and early-to-mid pregnancy time windows. These metabolites warrant further investigation using targeted approaches, both to improve the accuracy of quantification and to examine their relationships with pregnancy and fetal outcomes.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.

Data sharing
Data described in the manuscript, code book, and analytic code will be available upon request pending application and approval of a data-sharing agreement.

Fig. 1. Volcano plots showing individual metabolites associated with the peri-conception and first trimester alternate Mediterranean Diet (aMED) score reported at 8-13 weeks (A) and the second trimester aMED score reported at 16-22 weeks (B). β coefficients were adjusted for age, race, education, pre-pregnancy BMI, and physical activity in multivariable linear regression analyses. Multiple comparisons were corrected using the Benjamini-Hochberg method, with a false discovery rate (FDR) < 0.05 considered statistically significant.

Table 1. Characteristics of pregnant individuals by tertiles of the peri-conception and first trimester alternate Mediterranean Diet (aMED) score reported at 8-13 weeks, the NICHD Fetal Growth Studies-Singleton Cohort.

Table 2. Metabolites associated with the peri-conception and first trimester alternate Mediterranean Diet (aMED) score reported at 8-13 weeks and the second trimester aMED score reported at 16-22 weeks, the NICHD Fetal Growth Studies-Singleton Cohort.
a Metabolites that were individually and significantly associated with the alternate Mediterranean Diet (aMED) scores in multivariable linear regression analyses after adjusting for age, race, education, pre-pregnancy BMI, and physical activity.
b Classification of chemical compound classes was performed using ClassyFire.
c Multiple comparisons were corrected using the Benjamini-Hochberg method, with a false discovery rate (FDR) < 0.05 considered statistically significant.