Plasma Metabolomic Profiling in 1391 Subjects with Overweight and Obesity from the SPHERE Study

Overweight and obesity have high prevalence worldwide and assessing the metabolomic profile is a useful approach to study their related metabolic processes. In this study, we assessed the metabolomic profile of 1391 subjects affected by overweight and obesity, enrolled in the frame of the SPHERE study, using a validated LC–MS/MS targeted metabolomic approach determining a total of 188 endogenous metabolites. Multivariable censored linear regression Tobit models, correcting for age, sex, and smoking habits, showed that 83 metabolites were significantly influenced by body mass index (BMI). Among compounds with the highest association, aromatic and branched chain amino acids (in particular tyrosine, valine, isoleucine, and phenylalanine) increased with the increment of BMI, while some glycerophospholipids decreased, in particular some lysophosphatidylcholines (as lysoPC a C18:2) and several acylalkylphosphatidylcholines (as PC ae C36:2, PC ae C34:3, PC ae C34:2, and PC ae C40:6). The results of this investigation show that several endogenous metabolites are influenced by BMI, confirming the evidence with the strength of a large number of subjects, highlighting differences among subjects with different classes of obesity and showing unreported associations between BMI and different phosphatidylcholines.


Introduction
Overweight and obesity are defined as an excessive fat accumulation able to impair human health. These problematic conditions have been almost tripled in the last decades and they are currently one of the most relevant global public heath burdens and major risk factors for noncommunicable disease such as type 2 diabetes, cardiovascular diseases, musculoskeletal disorders, and also some forms of cancer [1]. Body mass index (BMI) is calculated by dividing an individual's weight (kg) by the square of their height (meters). Although it is not a perfect measure of a person's fat accumulation, since it does not take into account the difference between body fat and lean body mass, it is easy to obtain and it is widely applied in both clinical evaluations and epidemiological studies to categorize an individual in one of the following groups: underweight (BMI below 18.5 kg/m 2 ), normal weight (18.5-24.9), overweight (25.0-29.9), class I obesity (30.0-34.9), class II obesity (35.0-39.9), and class III obesity (above 40) [2].
To investigate the metabolic processes related to obesity, an approach could be the study of the metabolome, which is the ensemble of all the small molecules present in a biological fluid generated from the multitude of metabolic reactions of an organism [3]. Metabolomics is the comprehensive study of metabolites and is regarded as a promising approach to acquire a functional knowledge of the system's biochemistry [4]. Recent advancements in analytical instrumentation allow to conduct the simultaneous determination of hundreds of metabolites in a short amount of time and requiring a small quantity of biological sample.
Previous studies, summarized by Rangel-Huerta et al. [5], have tried to outline the metabolomic signature of human obesity. In individuals with obesity elevated levels of branched-chain and aromatic amino acids and decreased levels of glutamine and glycine have been reported; also glucose is usually increased; while, considering lipids, alterations of glycerophospholipids, among which lysophosphatidylcholines lysoPC 18:0, 18:1, 18:2, and various phosphatidylcholines and sphingomyelins have been described, although with differences across studies [5]. Most of the reported studies investigated only a limited number of subjects [5]. Overall, further studies with elevated numbers of subjects are required to confirm if the metabolites reported to be altered in individuals with higher BMI represent a specific metabolic signature of this problematic condition.
The aim of this work was to characterize the plasma metabolome in subjects affected by overweight and obesity. A large number of individuals, including subjects with overweight and all different classes of obesity, was investigated. A validated targeted metabolomic assay, measuring 188 metabolites belonging to amino acids, biogenic amines, sum of hexoses, acylcarnitines, glycerophospholipids, and sphingolipids, was used.

Study Population
The main demographic and clinical characteristics of the 1391 examined subjects are reported in Table 1. Age ranged from 18 to 86 years, with most of the participants aged from 30 to 69 years (84.5%). The population was predominantly composed of females (82%). The mean (±standard deviation) of BMI was 33.3 ± 5.5 kg/m 2 . In total, 397 (28.5%) subjects had a BMI less than 30 kg/m 2 , while 994 (71.5%) greater; among the latter, 530 (38.1%) had values of BMI ranging from 30.0 to 34.9 kg/m 2 (class 1 obesity), 304 (21.9%) from 35.0 to 39.9 kg/m 2 (class 2 obesity), and 160 (11.5%) greater than 40.0 kg/m 2 (class 3 obesity). About half of subjects was never smokers (50.9%) and the majority of them was employed (60.4%). Further details about the population can be found in the paper describing the rationale and study protocol of the SPHERE study [6].
The network analyses are reported in Figures 3 and 4. In Figure 3 only those metabolites that correlated one to each other with a Pearson's r > 0.4 are reported. In general, we note that metabolites belonging to the same class are grouped together. Most amino acids were intercorrelated; in particular phenylalanine, alanine, tyrosine, methionine, and tryptophan showed a good correlation (Pearson's r > 0.6), while a higher correlation (Pearson's r > 0.8) was found among leucine, isoleucine and valine. All lipid classes were grouped together, with the exception of acylcarnitines which were closer to amino acids, in particular C3 with valine (r = 0.43) and isoleucine (r = 0.47). The hexose also had a good correlation with isoleucine (r = 0.41) and leucine (r = 0.41). Among biogenic amines, alpha-amino adipic acid correlated with isoleucine (r = 0.42), kynurenine was related to tyrosine (r = 0.41), while serotonin was correlated with taurine (r = 0.57), which was correlated with spermidine (r = 0.46) and with glutamic acid (r = 0.47), in turn related to aspartic acid (r = 0.43). Dimethylarginine (total DMA) showed a good correlation with SM C26:0 (r = 0.44). In Figure 4 metabolites that are correlated one to each other with a Pearson's r > 0.7 are reported. Among amino acids, only leucine, isoleucine and valine are included. Several lysoPCs were correlated each other as well as several SMs. Among PCs, those with similar number of carbons were more closely related. Similar results can be observed in the Supplementary Materials with the cluster analysis (dendrogram) ( Figure S6).   A heat map considering only the metabolites most significantly associated with BMI in the Tobit models is reported in Figure 5. The dendrogram reported is obtained with a cluster analysis and shows how these metabolites are related to each other. Metabolites grouped on the right part of the picture (including the hexose, kynurenine, C0, C3, and some amino acids) showed higher levels with the increase of BMI of subjects, vice versa those grouped on the left part, as lyso PCs, glycine and most PC, showed lower level with the increase of BMI. Only the most significant metabolites in the Tobit models for BMI (FDR p-value < 0.0001) were considered and they were grouped with a cluster analysis. Dendrograms built with Euclidean distances related to the cluster analysis are reported above.  (Table S13). Some metabolic pathways involved in the metabolism of amino acids might be altered among individuals with different BMI, includ-ing valine, leucine, and isoleucine degradation; phenylalanine and tyrosine metabolism; aspartate metabolism. Further pathways altered might be oxidation of branched and very long chain fatty acids, ammonia recycling, carnitine synthesis, catecholamine biosynthesis, bile acid biosynthesis, and porphyrin metabolism. Figure 6. Plot relative to the pathway analysis displaying each altered pathway as a dot, ordered for pathway impact (x-axis and size) and negative logarithm (base 10) of the p-value (y-axis and color). The pathway analysis was performed with regressions between metabolites and the BMI of subjects, a GlobalTest was selected as pathway enrichment analysis, an out-degree centrality was chosen as pathway topology analysis, and the SMPDB Homo sapiens library was chosen as pathway library.

Discussion
In this work we described the application of a targeted metabolomic approach to a large cohort of subjects affected by overweight and obesity with the aim of describing their metabolomic profile. The main result of this study is the identification of several differences in terms of metabolite concentrations among subjects with different BMI.
Among metabolites significantly associated with BMI, the branched chain amino acids (BCAAs) valine, isoleucine, and leucine were positively associated. BCAAs have been reported to be associated with BMI in several studies [7][8][9][10][11][12][13][14][15][16]. The results of our study further confirm that an increased BMI is associated with increased levels of these compounds also among subjects with overweight and obesity. BCAAs has been suggested to be strongly connected with well-known consequences of obesity: insulin resistance and diabetes [17,18]. Some reasons for the increase of circulating BCAA have been postulated: one is the suppression of the enzymatic catabolism of BCAAs in the adipose tissue and liver of individuals with obesity [18,19] as, in particular, the insulin-induced impairment of branched-chain α-keto acid dehydrogenase (BCKD) [20]. Others have suggested that high circulating levels of BCAAs might be one of the causes of insulin resistance through the activation of the mammalian target of rapamycin (mTOR) signaling [17,21,22].
The aromatic amino acids tyrosine and phenylalanine were also strongly related with BMI; the first, in particular, was the most significant metabolite in our analysis being associated with BMI (FDR p-value < 0.0001, +29.8% of variation). Higher levels of aromatic amino acids in individual with higher BMI have been found in other studies [5,7,[11][12][13]16]. Tyrosine can originate from phenylalanine [23]. Similarly to what has been proposed for branched chain amino acids, also the mechanism proposed for the increased level of tyrosine is the BCKD inhibition [20].
We did not find higher concentrations of tryptophan in subjects with increased BMI, as opposed by some other studies [8,16,24], instead we observed an association between BMI and kynurenine, which is a metabolite of tryptophan [5], as also observed by other studies [13,25,26]. Indeed, alteration in kynurenine pathway has been related with BMI, insulin resistance, and with the low-grade systemic inflammation characteristic of obesity [5,27].
In addition, the hydrophobic amino acids alanine and proline were positively associated with BMI. Similar observations were reported in previous studies (for alanine [7,10,11,13]; for proline [7,10]). Other amino acids increased with BMI were glutamic acid (in agreement with other studies [9,10,16,25,26]) and lysine (as observed in previous reports [7,11]). Maltais-Payette and coworkers, in particular, found that glutamate is strongly associated with visceral adipose tissue [28]. Furthermore, the nonproteinogenic amino acid ornithine was higher and this association was observed also by Dunn et al. [7].
Amino acids negatively associated with BMI were the polar amino acids asparagine, glycine, histidine, serine, and citrulline. Some other studies also reported some of these negative associations [7,10], while these results were in disagreement with others: Oberbach et al. found a significant positive association with glycine [29].
Higher levels of the sum of hexose were found in individuals with higher BMI; this is not surprising since the main hexose in human blood is glucose, and glucose impairment related to metabolic syndrome and type 2 diabetes is one of the main consequences of obesity [30].
Considering biogenic amines, in addition to kynurenine, already discussed above, according with the Tobit regression models, aminoadipic acid and 4-hydroxyproline were positively associated while serotonin and creatinine were negatively associated to BMI. However levels of creatinine are not in agreement with Dunn et al., where serum creatinine levels were positively associated with BMI [7]. Some acylcarnitines were significantly higher in individuals with higher BMI: C0, C3, C2, and C5, some of which in agreement with other studies [11,12,14]. An incomplete betaoxidation of fatty acids may be the cause of the increased levels of acylcarnitines [5,18,31,32]. Furthermore, C3 and C5 are by-products and intermediates of the BCAAs isoleucine and leucine catabolism [5,33]. Indeed, C3 was strongly correlated with leucine and isoleucine.
Only a few PC aa increased with BMI (PC aa C38:3, PC aa C40:4, PC aa C32:1, PC aa C38:4, and PC aa C40:5) while many decreased. In agreement with our results, Ho et al. showed positive association with PC aa C38:3 and negative association with PC aa 38:6 [13]. Our results agreed with Bagheri and coworkers in finding higher levels of PC aa C32:1 and PC aa C38:3 in individuals with obesity [10]. Carayol et al., assessing the metabolomic profile in two cohorts, were in agreement with our results in finding a positive association with PC aa C32:1, PC aa C38:3, PC aa C38:4 and negative association with PC aa C42:0, PC aa C42:1, and PC aa C42:2 [25]. Conversely, Oberach et al. found higher level of PC aa 42:0 in subjects with obesity, while we found a significantly inverse association of this compound with BMI, furthermore they found PC aa 32:1 and PC aa 40:5 decreased in subjects with obesity while we found a positive association of these metabolites with BMI [29].
A possible reason for the decreased levels of lysoPCs and PCs in individuals with higher BMI might be related to an increase of their degradation. Indeed, Müller and coworkers reported that glycosylphosphatidylinositol-anchored proteins (GPI-AP) are released into circulation in rats and humans suffering from obesity and diabetes. As a protective mechanism to balance the deleterious effect of circulating GPI-AP, a serum glycosylphosphatidylinositol-specific phospholipase D is upregulated to degrade those proteins, thus reducing their circulating levels [35,36]. Since lysoPCs and PCs are associated with GPI-AP in micelle-like complexes, a similar increased lipolytic degradation could be postulated as an explanation of their reduced concentrations in individuals with higher BMI.
Among sphingomyelins, SM C18:1 and SM C16:1 were significantly associated with BMI. Ho et al. found positive association between BMI and these two compounds, finding also an association with SM 18:0, which was not significant in our population [13]; also Carayol et al. found a positive association with SM C18:0 [25]. Sphingomyelins significantly lower were SM C24:1, SM C16:0, SM C26:1, SM (OH) C22:2, SM (OH) C16:1, and SM (OH) C14:1. However, even though the evidence from this study is noteworthy considering the number of subjects considered, there is not a clear agreement across studies for most of the considered lipids, so most of them cannot be considered critical biomarkers of obesity, unless further validation.
Trabado and coworkers performed the same targeted metabolomic analysis in 800 healthy subjects in order to obtain reference values for the considered metabolites [37]. Since our population had higher BMI levels (interquartile rage of our population was 29.5-36.3 kg/m 2 , while in the healthy population was 21.7-25.4 for males and 20.4-24.0 kg/m 2 for females, respectively), we compared our results with the proposed reference values to observe differences in the metabolome between populations with such a different distribution of BMI ( Figures S7-S25). Some amino acids showed higher levels in our population than the proposed reference values: glutamic acid, ornithine, phenylalanine, tyrosine, leucine, lysine, valine, and alanine. These amino acids, indeed, were significantly higher in individuals with higher BMI in our Tobit models. Asparagine and arginine were considerably lower in our population than reference values; only asparagine was also negatively associated with BMI in our Tobit model, while the results for arginine were not significant. LysoPC a C26:0 and lysoPC a C28:0 were considerably higher in reference values than our population, while we found no significant differences. Interestingly, though, Carayol et al. did find a positive association between BMI and lysoPC a C28:0 [25]. It is noteworthy that most phosphatidylcholines, especially PC ae, were considerably lower than reference values. SM C18:0 and SM C18:1 were higher in our population than the reference values; among those SM C18:1 was actually significantly associated with BMI in the Tobit models while SM C18:0 was not, while Ho et al. and Carayol et al. did find a positive association for this metabolite [13,25]. SM C26:0 and SM C26:1 were considerably lower in this population than reference values, but only the latter was also significantly correlated with BMI in the Tobit model. However, a limitation should be considered when comparing our results with this reference values since sex was another variable consistently different among the two populations (82.1% of females in our population vs. 47.9% of females in the study of healthy volunteers). Finally, an overview of results and a comparison with the literature is reported in Supplementary Materials (Table S14).
Results of the cluster analysis performed on metabolites significantly associated with BMI, reported in the heat map, showed that a cluster composed by branched chain and aromatic amino acids, kynurenine, sum of hexose (H1), carnitine (C0), and propionylcarnitine (C3), PC aa 40:4, 38:3, 32:1, and 42:5 have higher metabolite concentrations in individuals with higher BMI; while the other cluster of metabolites (containing glycine, many PC ae, and lysoPC,) tends to have lower metabolite levels with subjects with higher BMI, consistently with the other statistical analyses performed. Interestingly, histidine and asparagine, while being negatively associated with BMI in the linear regressions, were part of the main cluster of positively associated metabolites. However, the heat map does not show a clear separation of colors and this indicates that the complexity of the metabolome, which is determined by a multitude of factors, cannot be related exclusively to the BMI. This can also be observed with the result obtained with the PCA visualization.
The pathway analysis reported suggests which metabolic pathways might be altered in individuals with different BMI. The main altered pathways are related to amino acid metabolic pathways, in particular to aromatic and branched chain amino acids; others are related to oxidation of fatty acids (due to the alteration of acylcarnitines) and, indeed, a defective fatty acid oxidation in subjects with obesity has been reported [38]. Other altered pathways evidenced are ammonia recycling (due to alterations of glycine, aspartic acid, asparagine, serine, and histidine), catecholamine biosynthesis (due to alterations of tyrosine), carnitine synthesis (due to alterations in carnitine, glycine, and lysine), porphyrin metabolism (due to alterations of glycine and alanine), bile acid biosynthesis (due to alterations of glycine, alanine, and taurine); nevertheless, the metabolites responsible for the supposed pathway alterations are only a few compared with the total number of compounds in each pathway and the impact is low (0.21 or less) [39]. It is worth mentioning that this tool has some limitations, as it did not identify any altered pathways concerning the several altered glycerophospholipids, maybe because this information was not present in the pathway library used. Moreover, it was adopted only to have an insight of the possible altered metabolic pathways, which require adequate confirmation with dedicated studies.
One of the main strengths of this study is the high number of subjects included (more than a thousand), which, not only have allowed to find several significant associations with BMI, but could have also helped to reduce the entities of underlined bias affecting the metabolome. Moreover, the implemented metabolomic analyses allowed a robust absolute quantification and a good interlaboratory reproducibility [40], therefore the results are suitable for comparisons among studies, as we did with the reference values proposed by Trabado and coworkers [37].
This study has some limitations: the metabolome is highly influenced by several factors and we took in consideration only a few: indeed, some of the considered metabolites might be influenced by diet. This bias was only partially mitigated with the fasting collection of blood. Moreover, most of the subjects were female (82%) and important metabolomic differences have been reported among subjects with different sex: higher levels of SMs and PCs in females and higher levels of creatinine and branched-chain amino acids in males [37,41], also confirmed by our results ( Figure S2); however, the Tobit linear regression models were corrected also for sex. Another limitation is the use of a targeted approach: even though a great number of metabolites was considered (more than 180), they do not cover the entire metabolome and other approaches, like untargeted metabolomics, could have evidenced alteration in other metabolites. Finally, the study aimed only to assess an association of the considered metabolites with the BMI of subjects, but other variables should be considered (as associations with metabolic complications of obesity). Furthermore, longitudinal analyses should be performed in order to find metabolites able to predict the onset of the negative consequences related to obesity.
In conclusion, the present study assessed the metabolomic profile using a validated targeted metabolomic approach on the SPHERE cohort, composed by subjects affected by overweight and obesity. The results obtained evidenced several biomarkers related to obesity, most of which are a further confirmation of the body of evidence already present (as for BCAAs, tyrosine, lyso PC a 18:1 and 18:2), while several others have been firstly evidenced, in particular metabolites belonging to the classes of phosphatidylcholines and sphingomyelins.

Study Subjects
The subjects were enrolled in the frame of the SPHERE ("Susceptibility to Particle Health Effects, miRNAs and Exosomes") project. Among the subjects of this cohort, 1391 had suitable data and plasma samples collected and were therefore included in this work. The study design and enrolment criteria were described previously [6]. Briefly, the study participants were recruited at the Center for Obesity and Work, Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, Milan, Italy from 2010 to 2015. The eligibility criteria were (1) age: 18 years or older; (2) BMI: equal or greater than 25 kg/m 2 ; (3) resident in Lombardy at the enrolment; (4) agreement to sign an informed consent and donate blood and urine samples. Exclusion criteria included previous diagnosis of cancer, heart disease, stroke, or other chronic diseases in the last year (such as multiple sclerosis, Alzheimer's disease, Parkinson's disease, depression, bipolar disorder, schizophrenia, and epilepsy). Weight and height were measured by a nurse following standardized procedures as part of the routine protocol. Biochemical parameters were also collected such as glycated haemoglobin. A questionnaire collected sociodemographic and lifestyle information, such as smoking status and medications use. The study was approved by the local Institutional Review Board (Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico review board). To reduce bias associated with obesity, we implemented peoplefirst language according to the standard recommendation of European Association for the Study of Obesity (EASO), The Obesity Society (TOS) and Obesity Canada (OC) [42][43][44].

Plasma Sample Collection
Blood was collected into EDTA tubes on the morning of recruitment (between 8 and 10 a.m.) and transported to the laboratory within 2 h of phlebotomy; about 7.5 mL of blood was centrifugated at 1300× g for 15 min at room temperature to obtain the plasma. The supernatant was stored in aliquots at −80 • C until use.

Metabolomic Analyses
The analysis of the subjects' metabolomic profile was conducted using an LC-MS/MS targeted metabolomic method using the AbsoluteIDQ ® p180 kit (Biocrates Life Sciences AG, Innsbruck, Austria) [45]. This tool allows a standardized assay for the determination of a total of 188 metabolites in plasma: 21 amino acids, 21 biogenic amines, the sum of hexose (H1), 40 acylcarnitines, 15 sphingolipids, and 90 glycerophospholipids. Among the latter, 14 are lysophosphatidylcholines (LysoPC), 38 are diacylphosphatidylcholine (PC aa), and 38 are acylalkylphosphatidylcholine (PC ae). The complete list of analyzed metabolites and the related abbreviations used in this article is reported in Table S1. The good interlaboratory reproducibility of this assay for the measurements of these metabolites in human plasma has been reported [40].
The instrumentation consisted of a high-pressure liquid chromatograph Agilent 1260 (Agilent Technologies, Cernusco Sul Naviglio, Italy) coupled with a hybrid triple quadrupole/linear ion trap mass spectrometer (QTRAP 5500; AB Sciex, Monza, Italy) with an electrospray ionization source. The assay was conducted following the indications contained in the AbsoluteIDQ ® p180 kit manual and the instructions of the Biocrates application support specialists. Briefly, the samples were loaded on 96-well plates already containing the isotopic labeled internal standards of lipids, along with the blank samples (phosphate buffer solution), a 7-points calibration curve, and three levels of plasma quality control samples (the medium level was repeated at least three times in each plate). From 78 to 80 plasma samples were loaded in each plate. Before adding the samples, each well of the plate was added with the solution containing the isotopically labeled internal standards of amino acids and biogenic amines, dried, added with 50 µL of a phenyl isothiocyanate solution (solubilized in water, ethanol, and pyridine) for derivatization of amino acids and biogenic amines, incubated at room temperature for 20 min, dried, added with 300 µL of 5 mL ammonium acetate in methanol as extraction solvent. The plate was then agitated and centrifuged at 500× g for 2 min. From each well, an aliquot was taken, transferred, and diluted in the analysis plate used for the LC-MS/MS method, while another aliquot was transferred and diluted in the plate for FIA-MS/MS analysis. The LC-MS/MS analysis was used for the quantitation of amino acids and biogenic amines, it consisted of a linear gradient of water (A phase) and acetonitrile (B phase) containing formic acid, using an Agilent Zorbax Eclipse XDB C18 (3.0 × 100 mm, 3.5 µm) (Agilent Technologies, Santa Clara, CA, USA) equipped with a guard-column SecurityGuard, C18 (4 × 3 mm) (Phenomenex, CA, USA), while the mass spectrometry operated in scheduled multiple reaction monitoring (s-MRM), in positive polarity. The Analyst ® software (version 1.6.3; Ab Sciex S.r.l, Milano, Italy) was used to prepare the sequence of analyses, the MultiQuant TM software (version 3.0.8664.0; Ab Sciex S.r.l, Milano, Italy) was used for the integration of chromatographic peaks; results were then elaborated with interpolation with the 7-points calibration curve and imported into the MetIDQ software (version 7.13.11-DB109-Nitrogen-2850; Biocrates Life Sciences AG, Innsbruck, Austria). All the lipid classes and the sum of hexose were analyzed within the FIA-MS/MS method. All the samples were directly injected in the mass spectrometer, which operated in multiple reaction monitoring, positive polarity, and quantified with a one-point calibration. The Analyst software was used for the preparation of analyses sequence while results were directly imported and elaborated with the MetIDQ software. For both methods (LC-MS/MS and FIA-MS/MS), the injection order was randomized. Precision and accuracy of the method were checked for each batch of analysis using the MetIDQ software. Limit of detections (LOD) were automatically calculated by MetIDQ for each plate through comparison with the blank sample. Even metabolite levels greater than the LOD but lower than the LLOQ (lower limit of quantification) or greater than the ULOQ (upper limit of quantitation) were calculated using the calibration curve in order to reduce the entity of missing data.

Statistical Analyses
Metabolomic data were batch-normalized using the MetIDQ software, with median values used for calculation of normalization factors. For each metabolite, data <LOD were replaced with the minimum LOD values among all plates; then, descriptive statistics were performed including mean ± standard deviation (SD), median, interquartile range, and extreme values. Only metabolites with at least 20% of observations greater than the LOD were considered for the following statistical analyses. The few missing values in metabolite concentrations were imputed using the k-nearest neighbors algorithm (k-NN) [46] with a value of k equal to 37 (about the square root of the total number of observations). Metabolite concentrations were log-transformed (base e) to ensure normal distribution and then standardized performing an auto-scale (each value was subtracted by the mean and divided by the standard deviation) to make the effects of BMI on the different metabolites comparable.
We used multivariable censored linear regression models (Tobit) to test the relationship between metabolite concentrations and subject characteristics. The Tobit model, a censored regression model, is designed to estimate linear relationships between variables when there is left-or right-censoring in the dependent variable [47,48]. In our case, metabolite concentrations lower than the limit of detections (LOD) were considered as the left-censored values. A Tobit linear regression model was estimated for each metabolite: the standardized natural logarithm of the metabolite concentrations was the dependent variable, while independent variables were age, sex (male = reference, or female), body mass index (BMI) (kg/m 2 ), and smoking status (non-smokers = reference, former smokers, or current smokers). We calculated standardized beta coefficients to estimate and compare the individual and independent effects of all predictors. The percent of variation (∆%) was also calculated through the formula: (exp(β) − 1) × 100, where β was the regression coeffi-cient representing the increase of the metabolite in relation to the one unit increase of the independent variable. The p-values were then adjusted for multiple testing by controlling the false discovery rate (FDR) according to the method of Benjamini and Hochberg [49]. A two-sided p-value of 0.05 was considered as statistically significant. These statistical analyses were performed with SAS software (version 9.4; SAS Institute Inc., Cary, NC, USA).
Data visualization was performed using R (R version 4.0.2, R Foundation, Vienna, Austria) [50] with the Rstudio interface (Version 1.3.959, RStudio Inc., Boston, MA, USA) and the package "tidyverse" [51]. Volcano plots of ∆% vs. −log 10 (FDR p-values) were used to display results of the Tobit analyses. Boxplots were built to visualize how the distribution of metabolite concentrations differs among different BMI classes, considering the 16 analytes with the lowest FDR p-values for BMI in the Tobit models. Network analyses were performed to visualize how the metabolites correlate each other: the packages "tidygraph" and "ggraph" [52,53] were implemented; metabolites were considered as nodes and correlation coefficients (r) obtained from each pair of metabolites as edges; the Fruchterman-Reingold force-directed layout algorithm was used and edge weights were set on the value of r; only statistically significant correlations with r > 0.4 and 0.7 were considered and metabolites with no connection were removed: these cut-offs were chosen to have an optimal overview of most of the considered metabolites (r > 0.4) and to have a better separation and visualization of glycerophospholipids and sphingolipids (r > 0.7). A principal component analysis (PCA) was performed with "ggfortify" package [54] to visualize if overall metabolites variation was related to the different classes of BMI. Implementing the "ape" package [55], cluster analysis was used for building dendrograms with Euclidean distances among metabolites. A heat map was built considering only the most significant metabolites (FDR < 0.0001) in the Tobit regressions, which were grouped by a cluster analysis, and the subjects, sorted by BMI.
Finally, a pathway analysis was performed in order to have an insight of the altered metabolic pathways among subjects using the web-based tool MetaboAnalyst [39]. An HMDB code [56] was assigned to each metabolite as long as it was available, a regression was performed to compare data with BMI of subjects, a GlobalTest was selected as pathway enrichment analysis, an out-degree centrality was chosen as pathway topology analysis, and the small molecule pathway database (SMPDB) Homo Sapiens library [57] was chosen as pathway library.
Supplementary Materials: The following are available online at https://www.mdpi.com/2218-1 989/11/4/194/s1, Figure S1: Volcano plot representing the results of the Tobit linear regression models considering the metabolites (dependent variables) in relation to age (independent variable), adjusted for BMI, sex, and smoking habit. Each dot represents a metabolite and are displayed based on the % (∆% = (exp(β) − 1) × 100) (x-axis) and the negative logarithm (base 10) of the FDR p-value (y-axis). The upper dashed line represents an FDR p-value equal to 0.0001, while the lower dashed line represents an FDR p-value equal to 0.05., Figure S2: Volcano plot representing the results of the Tobit linear regression models considering the metabolites (dependent variables) in relation to sex (independent variable), adjusted for BMI, age, and smoking habit. Each dot represents a metabolite and are displayed based on the % variation in female vs. male (∆% = (exp(β) − 1) × 100) (x-axis) and the negative logarithm (base 10) of the FDR p-value (y-axis). The upper dashed line represents an FDR p-value equal to 0.0001, while the lower dashed line represents an FDR p-value equal to 0.05, Figure S3: Volcano plot representing the results of the Tobit linear regression models considering the metabolites (dependent variables) in relation to the smoking habit (independent variable), adjusted for BMI, age, and sex. Each dot represents a metabolite and are displayed based on the % variation in smokers vs. non-smokers (∆% = (exp(β) − 1) × 100) (x-axis) and the negative logarithm (base 10) of the FDR p-value (y-axis). The dashed line represents an FDR p-value equal to 0.05, Figure S4: Volcano plot representing the results of the Tobit linear regression models considering the metabolites (dependent variables) in relation to the smoking habit (independent variable), adjusted for BMI, age, and sex. Each dot represents a metabolite and are displayed based on the % variation in former smokers vs. nonsmokers (∆% = (exp(β) − 1) × 100) (x-axis) and the negative logarithm (base 10) of the FDR p-value (y-axis). The dashed line represents an FDR p-value equal to 0.05, Figure S5: Results of the principal component analysis, showing the relationship between principal component 1 (x-axis) and principal component 2 (y-axis). Each dot represents a subject and the color represents the BMI group that subject belonged, Figure Table S1: Complete information about all the 188 considered metabolites, including the abbreviations used, the HMDB codes, and the CAS registry numbers, when available, Tables S2-S7: Results obtained from the analyses of the targeted metabolites. Results are reported as mean, standard deviation, 25th and 75th percentile (interquartile range), minimum and maximum (extreme values). For each compound, the limit of detection is reported, along with the number of subjects with quantifiable concentrations, Tables S8-S12: Complete results of the Tobit linear regression models. The dependent variables were the log transformed and standardized concentrations of the metabolites, while independent variables were BMI, age, sex, and smoking habit. The results for each independent variable are reported in each sheet, including the slope (beta), the standard error (SE), the p-value (Pvalue), the false discovery rate p-value (FDR_Pvalue), the negative logarithm (base 10) of the FDR p-value (negative_log10fdr), the variation percentage calculated with the formula (exp(β) − 1) × 100), (variation_perc), and the number of censored values (<LOD) (Nlowerbound), Table S13: Complete results of the pathway analysis, performed with regressions between metabolites and the BMI of subjects, GlobalTest as pathway enrichment analysis, out-degree centrality as pathway topology analysis, and SMPDB Homo sapiens library as pathway library. Results include total number of compounds in the pathway (Total Cmpd), number of our considered metabolites for that pathway (Hits), p-value (Raw p), the negative logarithm (base 10) of the p-value (-LOG10(@p)), Holm-Bonferroni adjusted p-value (Holm adjust), the false discovery rate p-value (FDR), and the impact (impact), Table S14: Summary information about findings of our study, including the comparison with reference values reported by Trabado et al., the percentage of variation identified in the Tobit models, and a comparison with other studies. Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.