MetSCORE: a molecular metric to evaluate the risk of metabolic syndrome based on serum NMR metabolomics

Background: Metabolic syndrome (MetS) is a cluster of medical conditions and risk factors correlating with insulin resistance that increase the risk of developing cardiometabolic health problems. The specific criteria for diagnosing MetS vary among medical organizations but are typically based on the evaluation of abdominal obesity, high blood pressure, hyperglycemia, and dyslipidemia. A unique and independent estimation of the risk of MetS based only on quantitative biomarkers is highly desirable for comparing patients and for studying the individual progression of the disease in a quantitative manner.
Methods: We used NMR-based metabolomics on a large cohort of donors (n = 21,323; 37.5% female) to investigate the diagnostic value of serum alone, or serum combined with urine, for estimating MetS risk. Specifically, we determined 41 circulating metabolites and 112 lipoprotein classes and subclasses in serum samples, and integrated this information with metabolic profiles extracted from urine samples.
Results: We have developed MetSCORE, a metabolic model of MetS that combines serum lipoprotein and metabolite information. MetSCORE discriminates patients with MetS (independently identified using the WHO criterion) from the general population, with an AUROC of 0.94 (95% CI 0.920–0.952, p < 0.001). MetSCORE is also able to discriminate the intermediate phenotypes, identifying the early risk of MetS in a quantitative way and ranking individuals according to their risk of developing MetS (for the general population) or according to the severity of the syndrome (for MetS patients).
Conclusions: We believe that MetSCORE may be an insightful tool for early intervention and lifestyle modifications, potentially preventing the aggravation of metabolic syndrome.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12933-024-02363-3.


Supplement S1. Sub-cohorts description.
AKRIBEA sub-cohort 1 consists of urine and serum samples from individuals from the working general population of the Basque Country. They were collected between 2019 and 2023 under overnight fasting conditions during the annual medical test. Participants were males and females between 18 and 67 years old. The only exclusion criterion was having suffered a serious illness, such as cancer or stroke, in the 3 months preceding sample collection. Biochemical data, lifestyle habits, and medication were also collected as general characteristics from all donors. Blood serum samples were collected, processed and frozen within a maximum of 2 hours, and urine samples within 4 hours. All samples were provided by the Basque Biobank for Research (BIOEF). Ethics approval number: CEIC-E 19-13.
OSARTEN sub-cohort 2 consists of urine and serum samples from individuals from the working general population of the Basque Country. They were collected between 2017 and 2018 under overnight fasting conditions during the annual medical test. Participants were males and females between 19 and 66 years old. The only exclusion criterion was having suffered a serious illness, such as cancer or stroke, in the 3 months preceding sample collection. Biochemical data, lifestyle habits, and medication were also collected as general characteristics from all donors. Blood serum samples were collected, processed and frozen within a maximum of 2 hours, and urine samples within 4 hours. All samples were provided by the Basque Biobank for Research (BIOEF). Ethics approval number: CEIC-E 16-114.
LIVER-BIBLE sub-cohort 3 consists of serum samples from individuals from Milan with metabolic dysfunction, who were consecutively enrolled from 2019 to 2022. These were apparently healthy blood donors, aged 40-65 years, who were selected for a comprehensive liver disease, metabolic and cardiovascular screening owing to the presence of at least three metabolic risk abnormalities. Individuals with chronic degenerative diseases (such as advanced kidney disease, cirrhosis, or active cancer) were excluded from the cohort, except for those with well-controlled arterial hypertension, treated hypothyroidism or well-compensated type 2 diabetes (T2D) not requiring pharmacotherapy (except for metformin). Blood serum samples were collected, processed and frozen within a maximum of 2 hours, and urine samples within 4 hours. Ethics approval number: CE 125_2018bis.
MET+ sub-cohort consists of serum samples from individuals from Navarra and Madrid who have diabetes and one or more of the other risk factors. Samples were collected before 2022 from donors aged 31-90 years. These samples were selected and obtained for research purposes from both the Madrid and Navarra biobanks to increase the number of serum samples potentially indicative of metabolic syndrome. Blood serum samples were collected, processed and frozen within a maximum of 2 hours, and urine samples within 4 hours. Ethics approval number: CEIC-E 21-199.
AGEPORTUGAL sub-cohort 4 consists of serum and urine samples from individuals belonging to a study on aging in the Portuguese population, conducted in geriatric centres. Samples were collected between 2017 and 2019 under overnight fasting conditions at the nursing homes. Individuals included in this cohort were aged between 65 and 94 years. Sociodemographic data, cognitive examination, comorbidities, medication, and cholesterol, glucose and triglycerides data were collected for all individuals. Blood serum samples were collected, processed and frozen within a maximum of 2 hours, and urine samples within 4 hours. Individuals who had suffered from cancer or stroke prior to sample collection were excluded. Ethics approval number: CE-UBI-Pj-2017-012.
BPH validation sub-cohort 5 consists of serum samples from individuals with benign prostatic hyperplasia recruited at Basurto University Hospital; the samples were accrued in the urology service before 2021 and managed by the Basque Biobank for Research (BIOEF). Blood serum samples were collected, processed and frozen within a maximum of 2 hours, and urine samples within 4 hours. Ethics approval numbers: CEIC-E 11-12, 14-14 and 19-20.
BIOPERSMED validation sub-cohort 6 consists of serum samples from individuals participating in an observational study (Biomarkers for Personalized Medicine) in Graz, Austria, to evaluate novel biomarkers in cardiovascular and metabolic diseases. Data on sociodemographic characteristics, biochemistry and comorbidities (i.e. hypertension, diabetes, and dyslipidaemia) were obtained for all individuals. Blood serum samples were immediately aliquoted and stored at the biobank of the Medical University of Graz (Austria) at -80°C. Ethics approval number: 24-224 ex 11/12.
TÜBINGEN validation sub-cohort 7 consists of serum samples from individuals participating in a study of antihypertensive medications; patient recruitment took place in Tübingen, Germany. Since the data were kept in Würzburg, Germany, there is a second ethics vote with the number 52/18. Ethics approval numbers: 141/2018BO2 and 52/18.
The process of collecting urine and serum samples was the same for all sub-cohorts, following the same Standard Operating Procedures (SOPs). Following the principles of the Declaration of Helsinki, all individuals provided informed consent for clinical research, with the consequent evaluation and approval of the corresponding ethics committees. To protect patient confidentiality, all data have been double coded.

Supplement S2. Brief Insight: Kohonen Self-Organizing Maps.
Self-Organizing Maps (SOM), also known as Kohonen maps, are a type of artificial neural network originally created by Teuvo Kohonen in the 1980s [8,9]. They are used for unsupervised learning and serve as a technique for dimensional reduction, transforming high-dimensional data into a lower-dimensional representation while preserving the topological properties of the input space. Picture a network of neurons, each representing a specific region within the input space. Throughout training, SOM adjusts these neurons based on input similarity, allowing similar data points to be mapped closer together on the grid. This process yields a simplified representation of complex data, unveiling underlying patterns and relationships.
A significant advantage of Kohonen maps in clustering is their ability to organize data based on similarity without the need for labelled information. They efficiently group similar data points, aiding in data exploration and classification. Moreover, SOM offers advantages in information representation by creating a visual map that reflects the input data's structure. This visualization allows for intuitive interpretation and understanding of complex datasets, enabling researchers and analysts to identify trends, outliers, and relationships that might not be evident in the original high-dimensional data.
There are several examples of the use of SOM in the literature, both in metabolomics and in other scientific disciplines [10-15]. In summary, SOM serves as a powerful tool for data clustering by organizing data into similarity-based groups and providing a detailed representation of information, thus facilitating data exploration and comprehension.
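The training procedure described above can be illustrated with a minimal NumPy sketch. This is not the implementation used in the study: the function names `train_som` and `best_matching_unit`, the grid size, and the linear decay schedules for the learning rate and neighbourhood radius are all assumptions chosen for brevity.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Return the (row, col) grid index of the neuron closest to sample x."""
    distances = np.linalg.norm(weights - x, axis=-1)
    flat_index = np.argmin(distances)
    return tuple(int(i) for i in np.unravel_index(flat_index, distances.shape))

def train_som(data, grid_shape=(5, 5), n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM; all hyperparameters are illustrative."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    data = np.asarray(data, dtype=float)
    # Initialise each neuron with a randomly chosen training sample.
    idx = rng.integers(0, len(data), size=rows * cols)
    weights = data[idx].reshape(rows, cols, data.shape[1]).copy()
    # Grid coordinates of every neuron, used for the neighbourhood function.
    coords = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for step in range(n_iter):
        frac = step / n_iter
        lr = lr0 * (1.0 - frac)                   # learning rate decays to zero
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighbourhood radius shrinks
        x = data[rng.integers(0, len(data))]
        bmu = np.array(best_matching_unit(weights, x))
        grid_dist2 = np.sum((coords - bmu) ** 2, axis=-1)
        influence = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        # Pull the winning neuron and its grid neighbours towards the sample.
        weights += lr * influence[..., None] * (x - weights)
    return weights
```

After training, calling `best_matching_unit` on each sample assigns it to a grid cell, so samples with similar profiles end up in the same or neighbouring cells without any labels being used.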

Supplement S3. Rationale for employing O-PLS-DA.
Orthogonal Partial Least Squares Discriminant Analysis (O-PLS-DA) [16,17] is a pivotal technique in multivariate modelling, specifically addressing the interpretability and predictive complexities within datasets, particularly in fields like metabolomics. One of its key strengths lies in striking a balance between interpretability and predictive performance, surpassing conventional methods like logistic regression in certain aspects.
When crafting a predictive model, interpretability is a key requirement. While logistic regression might seem to offer a straightforward interpretative approach, O-PLS-DA not only matches its interpretative strength but also surpasses it by modelling the intricate relationship between the predictors, notably metabolic variables, and the outcome, such as the presence or absence of metabolic syndrome. This is crucial given the multicollinearity inherent in metabolomic data, a challenge effectively addressed by O-PLS-DA. By disentangling the predictive variation from the uncorrelated systematic variation, O-PLS-DA isolates the essential metabolic components contributing to group distinctions, thereby significantly enhancing interpretability, especially in the context of interconnected metabolic data.
Moreover, what distinguishes O-PLS-DA from its non-orthogonal counterpart, PLS-DA, is its ability to segregate information. This segregation is highly beneficial for discriminating between groups, allowing a sharper focus on the metabolic components driving these distinctions. Furthermore, O-PLS-DA extends beyond mere classification: its predictive component not only discerns between classes but also provides a continuous measure, signifying progression across these classes. This feature is particularly valuable, offering insights into the changes and developments within and between groups, a facet not readily achievable through traditional classification methods.
Several examples of O-PLS-DA usage can be found in the literature, particularly within the field of metabolomics [18-22]. In essence, O-PLS-DA is a comprehensive and powerful tool in multivariate analysis, offering a nuanced understanding of complex relationships within datasets, especially in metabolomics, by merging interpretability with predictive capability and effectively handling multicollinearity to distil essential metabolic insights for classification and progression evaluation.
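The separation of predictive from uncorrelated (orthogonal) variation can be sketched in a few lines of NumPy. The block below is a simplified illustration of a single orthogonal component for a single response vector, following the general O-PLS scheme; it is not the software used in the study, and the function name `opls_single_component` is hypothetical. For a discriminant analysis, `y` would hold the class labels coded as 0/1.

```python
import numpy as np

def opls_single_component(X, y):
    """Extract one predictive and one orthogonal O-PLS component (sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # centre predictors
    yc = y - y.mean()                # centre response
    # Predictive weight vector: direction in X most covarying with y.
    w = Xc.T @ yc / (yc @ yc)
    w /= np.linalg.norm(w)
    t = Xc @ w                       # provisional predictive scores
    p = Xc.T @ t / (t @ t)           # loadings of that component
    # Orthogonal weights: the part of the loadings not explained by w.
    w_orth = p - (w @ p) * w
    w_orth /= np.linalg.norm(w_orth)
    t_orth = Xc @ w_orth             # orthogonal scores (uncorrelated with y)
    p_orth = Xc.T @ t_orth / (t_orth @ t_orth)
    # Remove the orthogonal variation, then recompute the predictive scores.
    X_filtered = Xc - np.outer(t_orth, p_orth)
    t_pred = X_filtered @ w
    return t_pred, t_orth, X_filtered
```

Plotting `t_pred` against `t_orth` gives the kind of scores plot referred to in the figure captions, with the predictive component on the X axis; `t_pred` also serves as the continuous measure of progression between classes mentioned above.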

Supplement S4. Evaluation of the potential impact of medication on MetSCORE.
To assess the potential effect of medication on MetSCORE, we initially selected all sub-cohorts with medication information records. Two new sub-cohorts were then formed: one comprising individuals taking some form of medication and another consisting of individuals who did not report taking any medication. The positive effect on MetSCORE, while statistically significant (P-value < 7.03e-11), is very small, merely 0.019 units.

Figure S1. Comparison of raw quantifications (MetS_WHO vs. other). Boxplot representation of the distribution of quantification values of the variables selected for the metabo/lipo-serum predictive model, segregated according to whether they belong to individuals classified with metabolic syndrome according to WHO criteria or to the remaining individuals. Quantification units are mmol/L for metabolites and mg/dL for lipoproteins. Outliers have been omitted from the boxplots to enhance visualization.

Figure S2. Clustering of serum lipoproteins. Panel a shows a dendrogram generated from the hierarchical clustering analysis performed on serum lipoprotein values. The red rectangles represent the groups formed when the dendrogram tree is cut at a height of h=0.5. Those clusters are represented in panel b, which is a scores plot from a PCA of variables.
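Cutting a dendrogram at a fixed height can be illustrated with SciPy as follows. The distance metric and linkage method are not stated in the caption, so the correlation-based distance (1 - |r|) and average linkage used here are assumptions, as is the function name `cluster_variables`; only the cut height h=0.5 comes from the caption.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_variables(X, height=0.5):
    """Group the columns (variables) of X by cutting a dendrogram at `height`."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    dist = 1.0 - np.abs(corr)        # assumed distance: 1 - |Pearson r|
    # Condensed upper-triangle form expected by scipy's linkage().
    upper = np.triu_indices(dist.shape[0], k=1)
    condensed = np.clip(dist[upper], 0.0, None)
    tree = linkage(condensed, method="average")   # assumed linkage method
    # Variables joined below the cut height share a cluster label.
    return fcluster(tree, t=height, criterion="distance")
```

The returned array holds one integer label per variable; variables with the same label correspond to one red rectangle in a dendrogram cut at that height.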

Figure S4. Clustering of serum metabolites. Panel a shows a dendrogram generated from the hierarchical clustering analysis performed on serum metabolite values. The red rectangles represent the groups formed when the dendrogram tree is cut at a height of h=0.5. Those clusters are represented in panel b, which is a scores plot from a PCA of variables. For quick reference, clusters with more than one component receive these names: Ala-Tyr (Alanine+Tyrosine); Iso-Leu-Val (Isoleucine+Leucine+Valine); Lact-Pyru.acids (Lactic acid+Pyruvic acid); and Aceto-3Hydroxybut (Acetone+3-Hydroxybutyric acid).

Figure S5. Clustering of urine bins. Panel a shows a dendrogram generated from the hierarchical clustering analysis performed on urine bin values. The red rectangles represent the groups formed when the dendrogram tree is cut at a height of h=0.85. Those clusters are represented in panel b, which is a scores plot from a PCA of variables.

Figure S6. Heatmap representing the univariate analysis conducted for each metabolic syndrome profile compared to the asymptomatic profile for the metabo_serum dataset. The colors indicate the direction of change observed in the profile relative to the asymptomatic profile: red for positive and blue for negative. The intensity of the color refers to the amount of change in standard deviation units. If the change is statistically significant, it is indicated by asterisk symbols (*: adjusted p-value < 0.05; **: p-value < 0.01; ***: p-value < 0.001; ****: p-value < 0.0001). Both the profiles and variables are clustered in a dendrogram obtained through hierarchical clustering.

Figure S7. Heatmap representing the univariate analysis conducted for each metabolic syndrome profile compared to the asymptomatic profile for the lipo_serum dataset. The colors indicate the direction of change observed in the profile relative to the asymptomatic profile: red for positive and blue for negative. The intensity of the color refers to the amount of change in standard deviation units. If the change is statistically significant, it is indicated by asterisk symbols (*: adjusted p-value < 0.05; **: p-value < 0.01; ***: p-value < 0.001; ****: p-value < 0.0001). Both the profiles and variables are clustered in a dendrogram obtained through hierarchical clustering.

Figure S8. Metabolic syndrome model for the combined_serum/urine dataset based on O-PLS-DA combining urine and serum metabolomic data. a) Scores plot with the predictive component on the X axis and the orthogonal component on the Y axis. The green dots represent individuals who do not have metabolic syndrome according to their metadata and WHO criteria, while the red triangles represent those who are classified as having metabolic syndrome. b) ROC curve showing the area under the curve for the final model along with its 95% confidence interval. It also indicates the sensitivity and specificity for the selected cutoff based on the Youden index. The dashed horizontal line shows the threshold selected using the Youden index from the ROC curve. c) Cartoon showing the most influential variables in the model; the size of the bar indicates their relative influence, while the color indicates whether they are elevated or not in metabolic syndrome.
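For readers unfamiliar with the Youden index, the cutoff selection mentioned in panel b can be sketched as follows: the index J = sensitivity + specificity - 1 is evaluated at every candidate threshold and the threshold maximising J is kept. The function name `youden_cutoff` and the toy data are illustrative assumptions, not the study's code or results.

```python
import numpy as np

def youden_cutoff(y_true, scores):
    """Return (J, threshold, sensitivity, specificity) maximising Youden's J."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    best = None
    for threshold in np.unique(scores):
        predicted = scores >= threshold            # classify as positive
        sensitivity = np.mean(predicted[y_true])   # true positive rate
        specificity = np.mean(~predicted[~y_true]) # true negative rate
        j = sensitivity + specificity - 1.0
        if best is None or j > best[0]:
            best = (float(j), float(threshold), float(sensitivity), float(specificity))
    return best

# Toy example with perfectly separated classes.
labels = [0, 0, 0, 1, 1, 1]
model_scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
print(youden_cutoff(labels, model_scores))  # (1.0, 0.7, 1.0, 1.0)
```

When several thresholds tie on J, this sketch keeps the lowest one; other tie-breaking rules are equally defensible.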

Figure S9. Projection of individuals from each metabolic syndrome profile onto the scores plot of the final O-PLS-DA metabo/lipo_serum model. The ellipse indicates the region where 95% of individuals are located. The profiles are ordered according to their average predictive component. Profiles with more than 50% of their individuals to the right of the vertical threshold are colored orange.

Figure S10. Metabolic syndrome model using the limited serum dataset: excluding Glucose, HDLs and LDLs. a) Scores plot with the predictive component on the X axis and the orthogonal component on the Y axis. The green dots represent individuals who do not have metabolic syndrome according to their metadata and WHO criteria, while the red triangles represent those who are classified as having metabolic syndrome. b) ROC curve showing the area under the curve for the final model along with its 95% confidence interval. It also indicates the sensitivity and specificity for the selected cutoff based on the Youden index. The dashed horizontal line shows the threshold selected using the Youden index from the ROC curve. c) Cartoon showing the most influential variables in the model; the size of the bar indicates their relative influence, while the color indicates whether they are elevated or not in metabolic syndrome.

Figure S11. Evolution of the mean quantification values of the significant parameters of MetSCORE. The evolution runs from a low MetSCORE level (0-0.2) to a high level (0.8-1). Each point represents the mean value, while standard errors are included as vertical lines. Quantification units are mmol/L for metabolites and mg/dL for lipoproteins.

Figure S12. Performance of MetSCORE on the validation cohort. The ROC curve in panel a shows the area under the curve for the validation cohort along with its 95% confidence interval. It also indicates the sensitivity and specificity for the midpoint of the score. Panel b shows the distribution of those individuals with MetSCORE above 0.5; despite the reduced specificity (the cost of increasing sensitivity to nearly 100%), a significant difference is maintained between the values obtained by individuals classified as having metabolic syndrome by the WHO criteria and those who are not.
The following table summarizes the number of individuals in each case and sub-cohort:
Using linear regression analysis, with MetSCORE as the dependent variable and the condition of taking medication or not as the independent variable, controlling for gender, age, and metabolic syndrome profile, we determined the average effect that taking medication has on MetSCORE. The following is a summary of the linear regression analysis, with the coefficient associated with taking medication highlighted in bold:
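As an illustration of this kind of adjusted analysis, the sketch below estimates the coefficient of a binary medication indicator by ordinary least squares while controlling for other covariates. It is a minimal example under stated assumptions, not the study's analysis: the function name `medication_effect` is hypothetical, and categorical covariates such as the metabolic syndrome profile are assumed to be already encoded as numeric (for example dummy) columns.

```python
import numpy as np

def medication_effect(score, medicated, covariates):
    """OLS coefficient of the medication indicator, adjusted for covariates."""
    score = np.asarray(score, dtype=float)
    design = np.column_stack([
        np.ones(len(score)),                  # intercept
        np.asarray(medicated, dtype=float),   # 1 = takes medication, 0 = does not
        np.asarray(covariates, dtype=float),  # e.g. age, gender, profile dummies
    ])
    coefficients, *_ = np.linalg.lstsq(design, score, rcond=None)
    return float(coefficients[1])             # effect of medication on the score
```

The returned value is the average difference in the score between medicated and non-medicated individuals with the same covariate values, which is the quantity reported above as the effect of medication on MetSCORE.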

Table S8. List of metabolites and lipoproteins that are quantified in serum samples with Bruker IVDR software.
Sometimes lipoproteins are referred to by their short code, which is also indicated.