Introduction

Metabolic syndrome (MetS) is mainly characterized by a clustering of metabolic dysfunctions, including impaired glucose tolerance, dyslipidemia, hypertension, and central obesity1. These metabolic dysfunctions, also referred to as MetS components, are considered the interrelated risk factors of cardiovascular diseases (CVDs)2,3, which have been a primary cause of death and disability worldwide. Therefore, MetS has been used as a biomarker that can predict the future risk for CVDs; for example, individuals with MetS may have a 2-fold risk of developing CVDs over the next 10 years than those without MetS4, and 4.1 times higher risk of progressing to Type 2 diabetes3.

According to the diagnosis criteria of Adult Treatment Panel III (ATP-III)2, MetS is usually defined as the presence of three or more of the five common metabolic dysfunctions, i.e., elevated waist circumference (WC), elevated triglycerides (TG), reduced high-density lipoprotein cholesterol (HDL-C), elevated blood pressure (BP), and elevated fasting blood glucose (FBG). However, such a dichotomous definition for MetS, although helpful to identify certain populations at disproportionate future risk for CVDs5,6, cannot reflect the severity of MetS, which has challenged its applications in tracking the progression of CVD risk and hence preventing CVDs. Previous studies have developed a continuous MetS score in western countries to determine the MetS severity, which was considered a compelling means of motivating patients that may feel empowered by a decrease in score7,8. Nevertheless, to the knowledge of the authors, there have not been such efforts in eastern countries where the components and progression of MetS may differ due to different genetic, demographic, socioeconomic, and lifestyle characteristics. This is particularly important in large eastern countries, such as China, which has a large population and has been facing a growing burden of CVDs9. It is therefore urged to develop a continuous MetS score on the basis of the Chinese population, to strengthen CVD prevention among about one-fifth of the world population.

In addition, as our understanding of MetS advances, some recent epidemiological studies have demonstrated different associations between MetS components and CVD risks across demographics, such as sex, age, and ethnicity10,11,12. Such variations have been completely hidden in the one-for-all MetS definition, which may in particular affect the accurate assessment of MetS severity in China, in which about one-tenth of the population are minorities. Therefore, the consideration of demographic variations in the MetS definition would be crucial for personalizing CVD prevention strategies.

To fill the two aforementioned gaps, we developed a continuous, age-sex-ethnicity-specific MetS score to assess the severity of MetS in China on the basis of the China Multi-Ethnic Cohort (CMEC), which is the largest multi-ethnic prospective cohort study in China, with about 44% of the participants being minorities13. To demonstrate the clinical usefulness of the MetS score for indicating the future risk for CVDs, we validated it in a prospective cohort study on the basis of the CMEC follow-up dataset. Also, to ensure the robustness of the MetS score for broader clinical use in all demographic subpopulations, especially in minorities, we further validated it in the three independently established cohorts of 134,403 residents in Yunnan, Hubei, and Fujian Province of China (Fig. 1). The findings of this study would provide a better understanding of the MetS severity and CVD risk in the Chinese. Also, an online MetS score calculator was developed and made available to the public, which can facilitate the use of the MetS score and this MetS severity assessment tool in clinical practice. The new knowledge and assessment tools resulting from this study are expected to improve individual-level CVD risk prediction and population-level CVD prevention and management in China and, more broadly, in other eastern countries.

Fig. 1: An overview of the study design.
figure 1

AUC area under the receiver operating characteristic (ROC) curve, BP blood pressure, CMEC China Multi-Ethnic Cohort, CVD cardiovascular disease, FBDS Fujian Behavior and Disease Surveillance cohort, FBG fasting blood glucose, HDL-C high-density lipoprotein cholesterol, HBDS Hubei Behavior and Disease Surveillance cohort, MAP mean arterial pressure, MetS metabolic syndrome, TG triglycerides, WC waist circumference, YBDS Yunnan Behavior and Disease Surveillance cohort. Reprinted from the National Geographic Information Resources Directory Service System https://www.webmap.cn/commres.do?method=result100W.

Results

Development of the MetS score

Of the 77,639 participants at CMEC baseline, the mean age was 50.2 ± 11.1 years, with 39.0% (30,295) being male and 42.3% (32,857) being minorities (Table S1). The prevalence of MetS was 19.4% and varied across demographic subgroups, e.g., higher in those aged ≥60 than aged <60 (24.2% vs. 18.1%), in males than females (31.0% vs. 17.0%), and in minorities than Hans (19.7% vs. 19.1%) (Table 1).

Table 1 The prevalence of metabolic syndrome (MetS) and levels of the MetS components among the China Multi-Ethnic Cohort (CMEC) participants, by sex, age, and ethnicity

Overall, TG had the highest factor loadings (0.75) among the five MetS components, followed by HDL-C (0.59), WC (0.47), MAP (0.30), and FBG (0.25) (Table 2). There were significant differences in the factor loadings of the MetS components across the eight subgroups, except for FBG (P = 0.115). TG had higher factor loadings in the Hans (ranging from 0.74–0.84) than minorities (0.58–0.78), especially with sex and age being equal. Similarly, HDL-C had higher factor loadings in the Hans (0.60–0.74) than minorities (0.41–0.65), also in the old minorities (0.62–0.65). WC had generally higher factor loadings in males (0.46–0.56) than females (0.34–0.40), especially in male minorities (0.55–0.56). Contrary to TG and HDL-C, MAP had higher factor loadings in the minorities (0.20–0.31) than Hans (0.11–0.27), and such disparities were more apparent in old people (0.27 vs 0.15 in males, and 0.20 vs. 0.11 in females). FBG had higher factor loadings in the young (0.20–0.27) than old (0.13–0.22), especially in the minorities (0.20 vs. 0.13 in males, and 0.27 vs. 0.13 in females). Model 2 with varying factor loadings of the MetS components across the demographic subgroups outperformed Model 1, resulting in a smaller overall AIC and some age-sex-ethnicity-specific submodels with a better fit to the data, such as for Hans (especially old Hans) and old minorities.

Table 2 Factor loadings of the metabolic syndrome (MetS) components and model fit assessments

The equations fitted for the overall population and age-sex-ethnicity-specific subpopulations were presented (Table 3), for calculating a given individual’s MetS score. An online tool was developed and made available to the public, allowing the easy clinical use by entering actual values of the MetS components to the equations (https://gisersqdai.shinyapps.io/mets_severiity_calculator/).

Table 3 Equations of the age-sex-ethnicity-specific metabolic syndrome (MetS) scores

Performance assessment of the MetS score

In the CMEC baseline, each unit increase in the MetS score was associated with an increased risk for all CVD-related risk factors, including hyperlipidemia (OR = 1.30 [1.29, 1.32]), diabetes (OR = 1.05 [1.05, 1.05]) and hypertension (OR = 1.09 [1.07, 1.11]), and also an elevated level of all CVD-related risk markers, including HbA1c (β = 0.17 [0.10, 0.17]), CHOL (β = 0.24 [0.23, 0.24]), BMI (β = 1.71 [1.69, 1.73]), and SUA (β = 26.21 [25.70, 26.72]) (Table S2). There were heterogeneities in these associations among the age-sex-ethnicity-specific subgroups. For example, the increase of the risk for hyperlipidemia per unit increase in the MetS score was smallest among old female minorities (OR = 1.20 [1.12, 1.29]) and largest among young male Hans (OR = 1.34 [1.31, 1.38]); similarly, the increase of the CHOL level per unit increase in the MetS score was smallest among old female minorities (β = 0.05 [0.01, 0.08]) and largest among old male minorities (β = 0.30 [0.29, 0.31]). Some other patterns included that the increase of the BMI level per unit increase in the MetS score was larger among the older in the Hans, but among the younger in the minorities.

In the CMEC follow-up (Table S3), each unit increase in the MetS score increased the rate of experiencing hyperlipidemia by 72% (HR = 1.72 [1.69, 1.75]), diabetes by 61% (HR = 1.61 [1.45, 1.78]), and hypertension by 42% (HR = 1.42), and also elevated the levels of HbA1c (β = 0.16 [0.14, 0.18]), CHOL (β = 0.18 [0.17, 0.20]), BMI (β = 1.62 [1.56, 1.68]), and SUA (β = 24.94 [23.73, 26.16]) (Table 4). Heterogeneities in these associations were observed among the age-sex-ethnicity-specific subgroups. For example, among all age-sex-ethnicity-specific subgroups, the increases of the risk for hyperlipidemia (HR = 2.13 [1.80, 2.53]) and diabetes (HR = 2.04 [1.40, 2.96]) per unit increase in the MetS score were largest in old male minorities; the increases of the risk for diabetes (HR = 1.41 [1.19, 1.65]) and hypertension (HR = 1.12 [1.03, 1.22]) per unit increase in the MetS score were smallest in old female Hans. Some other patterns included that the increases of the risks for hyperlipidemia and diabetes and of the SUA level per unit increase in the MetS score were generally larger among the younger than their older counterparts. The rates of having the CVD-related risk factors and risk markers increased as the quartiles of both the overall and age-sex-ethnicity-specific MetS scores, with the highest HRs observed in the 4th quartile of the MetS score for all groups (Table 5). The performance of the overall MetS score had moderate accuracy in predicting the CVD-related risk factors (C-index ranging from 0.71-0.80) and risk markers (C-index ranging from 0.75-0.76) except in predicting CHOL (C-index=0.65) (Table S4). The predictive performance of the MetS score in most age-sex-ethnicity-specific subgroups also had moderate accuracy, with the highest accuracy observed when predicting the elevated BMI among old male minorities (C-index=0.90).

Table 4 Associations between the metabolic syndrome (MetS) score and the cardiovascular disease (CVD)-related risk factors and risk markers in the China Multi-Ethnic Cohort (CMEC) follow-up survey
Table 5 Associations between the metabolic syndrome (MetS) score in quartiles and the cardiovascular disease (CVD)-related risk factors and risk markers in the China Multi-Ethnic Cohort (CMEC) follow-up survey

Comparison with the traditionally defined MetS

The overall and age-sex-ethnicity-specific MetS scores, after being converted into the dichotomous MetS variables, indicated the traditionally defined MetS with high accuracy except in the old Hans (both male and female) with highly acceptable accuracy (Fig. 2). The AUC value was 0.917 for the overall group, and ranged from 0.900 to 0.932 for the age-sex-ethnicity-specific subgroups other than the old male Hans (0.882) and old female Hans (0.869).

Fig. 2: Receiver operating characteristic (ROC) curves for the overall and age-sex-ethnicity-specific metabolic syndrome (MetS) scores and the traditionally defined MetS in the China Multi-Ethnic Cohort (CMEC) dataset.
figure 2

AUC area under the ROC curve.

In comparison with the traditionally defined MetS, the dichotomous overall and age-sex-ethnicity-specific MetS scores showed higher sensitivity and AUC in predicting the presence of ≥1 CVD-related risk factor and ≥1 abnormal risk marker in the CMEC baseline (all p values < 0.05), particularly in young male Hans (sensitivity = 0.61, AUC = 0.77), but showed the lower specificity (all p values < 0.05) (Fig. 3, Table S5). The only exceptions were old female (both Hans and minorities) showing a comparable AUC between the traditionally defined MetS and the dichotomous MetS score.

Fig. 3: Comparison of the traditionally defined metabolic syndrome (MetS) and the dichotomous MetS score in detecting the cardiovascular disease (CVD)-related risk factors and risk markers in the CMEC dataset.
figure 3

The CVD-related risk factors included hypertension, diabetes, and dyslipidaemia; the CVD-related risk markers included elevated glycated hemoglobin (HbA1c), cholesterol, body mass index, and serum uric acid. *Significance (p < 0.05) for the differences in the capacity of detecting the CVD-related risk factors and risk markers by the traditionally defined MetS and the dichotomous MetS score, with the paired Chi-square test used for sensitivity and specificity and the Delong’s test used for area under the curve (AUC). All tests are two-sided. The error bars indicate 95% confidence intervals of values of sensitivity, specificity, and AUC (see Table S5 for all specific values).

Performance assessment of the MetS score in three external dataset

The YBDS participants included about 69% aged <60, 45% being male, and 42% being minorities (Table S6). Each unit increase in the MetS score was associated with an increased risk for all CVD-related risk factors, including hyperlipidemia (OR = 1.31 [1.31, 1.32]), diabetes (OR = 1.08 [1.07, 1.08]), and hypertension (OR = 1.14 [1.13, 1.14]), and also an elevated level of all CVD-related risk markers, including HbA1c (β = 0.22 [0.21, 0.23]), CHOL (β = 0.19 [0.18, 0.20]), BMI (β = 2.11 [2.07, 2.15]), and SUA (β = 24.57 [23.65, 25.48]) (Table S7). Heterogeneities in these associations also existed among the age-sex-ethnicity-specific subgroups. For example, the increase of the risk for hyperlipidemia per unit increase in the MetS score was smallest among old female minorities (OR = 1.21 [1.19, 1.24]) and largest among young male Hans (OR = 1.38 [1.36, 1.40]), which were consistent with the patterns observed in the CMEC dataset. The risks of having the CVD-related risk factors and risk markers increased as the quartiles of both the overall and age-sex-ethnicity-specific MetS scores, with the highest ORs observed in the 4th quartile of the MetS score for all groups (Table S8).

The overall and age-sex-ethnicity-specific dichotomous MetS scores indicated the traditionally defined MetS with generally acceptable accuracy in the YBDS dataset. Some subgroups had highly acceptable accuracy, such as in young male Hans (0.865) and young female Hans (0.883) (Fig. 4). Compared to the traditionally defined MetS, the dichotomous overall and age-sex-ethnicity-specific MetS scores also showed the higher sensitivity and AUC in predicting the presence of ≥1 CVD-related risk factor and ≥1 abnormal risk marker (all p values < 0.05) but showed the lower specificity (all p values < 0.05) (Table S9). The only exceptions were old female Hans and old female minorities showing a comparable AUC between the traditionally defined MetS and the dichotomous MetS score.

Fig. 4: Receiver operating characteristic (ROC) curves for the dichotomous overall and age-sex-ethnicity-specific MetS scores and the traditionally defined MetS in the Yunnan Behavior and Disease Surveillance (YBDS) survey.
figure 4

AUC area under the ROC curve.

The HBDS participants were all Hans, with about 66% aged <60 and 39% being male (Table S10). Each unit increase in the MetS score was associated with an increased risk for all CVD-related risk factors, including hyperlipidemia (OR = 1.35 [1.34, 1.35]), diabetes (OR = 1.13 [1.12, 1.14]), and hypertension (OR = 1.20 [1.19, 1.21]), and also an elevated level of all CVD-related risk markers, including CHOL (β = 0.19 [0.17, 0.20]), BMI (β = 2.37 [2.32, 2.41]), and SUA (β = 29.34 [27.85, 30.83]) (Table S11). The risks of having the CVD-related risk factors and risk markers increased as the quartiles of both the overall and age-sex-specific MetS scores, with the highest ORs observed in the 4th quartile of the MetS score for nearly all groups (Table S12). The traditionally defined MetS had generally acceptable accuracy in the HBDS dataset (Fig. 5), compared to which the dichotomous overall and age-sex-ethnicity-specific MetS scores showed the higher sensitivity and AUC in predicting the presence of ≥1 CVD-related risk factor and ≥1 abnormal risk marker, but showed the lower specificity (Fig. 5, Table S13).

Fig. 5: Receiver operating characteristic (ROC) curves for the dichotomous overall and age-sex-ethnicity-specific MetS scores and the traditionally defined MetS in the (a) Hubei Behavior and Disease Surveillance (HBDS) survey and (b) Fujian Behavior and Disease Surveillance (FBDS) survey.
figure 5

AUC area under the ROC curve.

The FBDS participants were all Hans as well, with about 66% aged <60 and 44% being male (Table S14). The similar performance was observed among the FBDS participants (Tables S15-S17, Fig. 5).

Discussion

This study, for the first time, developed a continuous, age-sex-ethnicity-specific MetS score in the Chinese population based on the CMEC baseline survey. The MetS score had good performance in predicting the changes in CVD-related risk markers in the CMEC follow-up survey and the occurrence of CVD-related risk factors in both the CMEC follow-up and the three external surveys. This new MetS score offers improvements on the traditionally defined MetS to differentiate MetS severity, and had good performance in predicting the CVD risk among age-sex-ethnicity-specific subgroups in the Chinese population. Rather than replacing individual MetS components for decision-making in the choice of intervention and treatment strategies among those with abnormal MetS components, the MetS score would be mainly used for early identification of CVD risks among the general or healthy population. Particularly, if having the levels of all MetS components close to the corresponding cut-off values (on the normal side) but demonstrating no relevant symptoms, one would usually not get diagnosed and treated clinically but be at a greater risk for CVD. In such cases, those individuals could be identified due to their high MetS scores, and get further clinical examinations or lifestyle interventions to decrease the MetS score and thus CVD risks. Also, the MetS score may help determine the urgency of public health interventions in given populations, especially in those with higher risk and to be prioritized for timely clinical interventions. According to previous studies7, 8, people could be motivated by the MetS score and may feel empowered by a decrease in score.

Previous studies have developed MetS scores for the populations in developed, mostly western, countries. For example, adolescents’ and adults’ sex- and race/ethnicity-specific MetS scores were developed using the 1999–2010 data from 4413 adolescents and 6,870 adults in the National Health and Nutrition Examination Survey in the United States, where a significant correlation between the scores and biomarkers of future diseases was observed7,14. Another study in Singapore with 4419 participants developed an MetS score to assess the performance of predicting diabetes in the follow-up15. However, both scores were derived based on a single, small dataset without performance assessment in externally independent datasets, and both were in developed countries. The prevalence and components of MetS in the Chinese population are quite different from those of western people, mainly due to the differences in hereditary and lifestyle factors10,16, which may result in the variation in the composition of MetS components and their weights in the models. The current age-sex-ethnicity specific MetS score was developed from a recently established large-sample, multi-ethnic cohort, and validated in the three newly established, independent large-sample datasets. Besides, the MetS score showed good predictive performance in detecting CVD-related risk factors and risk markers in the CMEC follow-up survey, as well as in the external datasets, which has demonstrated the generalizability of the MetS score.

The MetS score was developed mainly based on the traditional MetS components, resulting in the higher weights of TG, HDL-C, and WC. TG and HDL-C have been considered surrogate insulin resistance in the diagnosis of MetS17,18, and their relationships with insulin resistance differ by ethnicity;18 WC may to a large extent indicate the deposition of excess fat in internal organs, which could increase MetS and CVD risks by disrupting hormone secretion, lowing the level of insulin, and elevate the level of blood fat19. One prospective study in China has also showed that HDL-C, TG, and WC were better predictors of MetS than BP and FBG20, which supported the patterns of weights of MetS components found in this study. MAP and FBG exhibited the lower weights than other components, which was not surprising given the relatively high rates of essential hypertension and diabetes independent of MetS21. Previous studies have also revealed the low loading factors (<0.3) of SBP and FBG7,14,15. However, they remained in the equations because they can provide an additive effect on the capacity of the MetS score for assessing CVD risks.

Notably, we observed that the factor loadings of the MetS components varied greatly by age, sex, and ethnicity, suggesting that the MetS scores for different subgroups should be considered, such as the high weights of TG and HDL-C in the Hans, relatively higher weight of WC in minorities and males, and higher weight of TG in young adults. The variations in the age-sex-ethnicity-specific MetS score may also be explained by central obesity and lipid measurement across different populations. The high level of blood lipid markers may be more closely associated with MetS in Hans than minorities10, which may account for the greater weights of TG and HDL in the Hans. The weight of TG in young people may partly be explained by the aging effect on the complex pathophysiological mechanisms of metabolic diseases, and blood lipids are considered precursors to insulin resistance and closer to MetS among young adults22,23,24. Moreover, the higher weight of WC in males suggested that central obesity may be a more sensitive marker of MetS in males. Unlike in other countries, central obesity in China has been increasingly prevalent in males in recent years. The higher WC in minorities may be related to their lifestyles and ambient natural environment, compared to the Hans25. For example, central obesity in minorities may be attributed to low temperature, high altitude, high-fat food, and exercise-discouraging environments26,27,28,29.

Compared to the traditionally defined MetS, the MetS score, both in the continuous form or in quartiles, had the higher sensitivity and AUC in diagnosing CVD-related risk factors and risk markers. This has strengthened its use in the management of CVD-related risk factors associated with the population-level and individual-level characteristics. Specifically, at the individual level, with the MetS score calculator as a freely available online tool, healthcare providers can estimate an individual’s chance of developing CVD-related risk factors in the future. Changes in the MetS score over time would suggest an individual’s worsening risk, which may trigger the motivation of intervention or therapies at early stages. At the population level, the MetS score can identify the future CVD risks in a given population, identify the high-risk groups, and determine the urgency and priority of public health interventions. Previous studies have assessed the changes in the American sex-ethnicity-specific MetS score over time, and reported that an increased score may draw attention to the altered disease risk and help tracking earlier response to treatment closely8,30. As the burden of CVD in China increases and spreads, the web-based MetS score calculator could assist medical workers in applying prevention practices among those with a high MetS score.

Several limitations in this study should be noted. First, some age-sex-ethnicity-specific subgroups have a relatively small sample size, such as old male minorities, resulting in some less-than-optimal performance. However, to the knowledge of the authors, the sample size of the most subgroups based upon which the MetS score has been developed and validated was quite large, particularly much larger than that of the existing study populations elsewhere. Also, Hans in this study, accounting for 57.7% of the study population, were disproportional to the Hans in the study area. Nonetheless, the MetS score of the Hans has been validated externally (all Hans in the HBDS and FBDS), which implies that the large sample size may cancel out potential under-representation. Future efforts to further validate the MetS score in larger, specific populations are warranted. Second, some risk markers were not included for performance assessment due to the lack of measurement, such as high-sensitivity C-reactive protein (hs-CRP) and homeostasis model assessment of insulin resistance (HOMA-IR), so this score should not be used without caution. Third, although lifestyles and dietary patterns were adjusted in the models, the study participants were limited to western China, where lifestyles and dietary patterns may be different from other places in China. Such differences may still slightly affect our results. Also, the participants without information on MetS were excluded without being able to conduct imputation. However, the MetS score was developed on the basis of a large sample and validated externally, it is reasonable to assume that missing data did not affect our results to a large extent. Fourth, although outside the scope of this study, heterogeneities across all 55 minority ethnic groups in China were not examined due to insufficient sample size in most groups, which is expected to be studied in future efforts. Such inter-minority differences should not be ignored whenever possible, and considering these differences would further improve the quality of the MetS score.

In conclusion, this study provides a continuous, robust, age-sex-ethnicity-specific MetS score to reflect the severity of MetS and predict CVD risks in different demographic subpopulations. An online MetS score calculator was developed and made available, which holds potential to help primary medical workers quantify CVD risk quickly and conveniently, and inform choice of medication and non-pharmacological interventions for chronic disease management. This continuous MetS score would also be beneficial to monitor the development of MetS and to improve overall population health in China and in other eastern countries. More broadly, this study would advance preventive medicine and sustainable development goals (SDGs).

Methods

The CMEC was approved by the medical ethics committee of Sichuan University (K2016038). The three provincial surveys received ethical approval from the ethical review board of the Yunnan Center for Disease Prevention and Control (202017), the Fujian Center for Disease Prevention and Control (2018001), and the National Center for Chronic and Non-communicable Disease Control and Prevention. All participants provided written informed consent before the survey.

Study population and design

The CMEC study is an ongoing community-based prospective cohort study, which aims to examine ethnic variation in the profiles of non-communicable diseases and related risk factors in China13. The CMEC used a multi-stage stratified cluster sampling method to recruit 99,556 participants aged 30-79 years from the five provinces of Southwest China (Sichuan, Chongqing, Yunnan, Guizhou, and Xizang) between May 2018 and September 2019. Among them, about 10% of the subjects in each participating district/county, adding up to 11,527 participants, were selected between August 2020 and July 2021 by a purposive sampling method and followed up in the same way as the baseline survey.

Three provincially representative epidemiological surveys established in Yunnan Province (the one with the largest number of minorities in China), Hubei Province, and Fujian Province, were used to validate the MetS score. Among them, the Yunnan Behavior and Disease Surveillance cohort (YBDS) recruited 51,480 adults aged >18 years from 35 districts/counties of Yunnan Province from January to August 2021, by a multi-stage stratified cluster random sampling method; similarly, during 2018-2020, the Hubei Behavior and Disease Surveillance cohort (HBDS) recruited 27,964 adults aged >18 years from 10 districts/counties of Hubei Province, and the Fujian Behavior and Disease Surveillance cohort (FBDS) recruited 54,961 adults aged >18 years from 29 districts/counties of Fujian Province.

An overview of the study design is shown in Fig. 1. The baseline data of the eligible CMEC participants were used to develop the MetS score. All with complete information on the MetS components and sociodemographic characteristics (e.g., sex, age, and ethnicity) were included. Those with self-reported hypertension, diabetes, or hyperlipidaemia and taking relevant medication at the time of the survey were excluded, as medication taken by them to control blood pressure, blood glucose, or lipids may make their measurements not capable of reflecting the natural levels. A total of 77,639 participants were included, with 21,917 (22.0%) excluded who had missed any MetS component and were taking antihyperlipidemic, anti-diabetic, or antihypertensive medications (Table S1). The CMEC follow-up data were used for internal validation of the MetS score, where the 9,249 participants were included after excluding those with incomplete information on the MetS components and diagnosed as CVD-related risk factors at baseline (Table S3). The YBDS, HBDS, and FBDS data were used for external validation, where a total of 99,563 participants were included after excluding those with incomplete information on the MetS components, aged <30 years, or taking antihyperlipidemic, anti-diabetic, or antihypertensive medications (Tables S6, S10, S14). A flowchart of participant inclusion and exclusion can be found in Figure S1.

Data collection

The surveys of the CMEC, YBDS, HBDS, and FBDS consisted of an electronic questionnaire with face-to-face interviews, medical physical examinations, and clinical laboratory tests. The information on sociodemographics and lifestyles was collected by the electronic questionnaire, including sex, age, ethnicity, marital status, educational level, annual family income, residential location, drinking habit, smoking habit, dietary intake of cereal, fruits, red meat, vegetables, soybean products and aquatic products, and physical activity. The physical examination included height, weight, WC, systolic BP (SBP), and diastolic BP (DBP). The WC was measured 1.0 cm above the navel and to the nearest 0.1 cm with light clothing. The SBP and DBP measurements were performed in a seated, upright position, and were instructed not to smoke, drink alcohol, coffee, or tea, or exercise for at least 30 minutes before the measurement. Both WC and BP were measured three times with a unified instrument, and then the mean values were taken. A series of biochemical tests were performed to obtain biological markers, and included cholesterol (CHOL), TG, HDL-C, total cholesterol (TC), FBG, serum uric acid (SUA, only in the CMEC, YBDS and HBDS), and glycated hemoglobin (HbA1c, only in the CMEC and YBDS). The venous blood samples were used to measure biochemical blood indexes at least 8 hours after fasting.

Development of the MetS score

Score development and assessment

The confirmatory factor analysis (CFA), previously used to develop an MetS score in western countries on the basis of the continuous measures of the five MetS components (WC, TG, HDL-C, SBP/DBP, and FBG)7,14, was used in this study to develop the MetS score in a similar way. Some of the MetS components to be used were improved: SBP and DBP, with a high correlation which can result in a zero-matrix determinant and an ill-conditioned matrix that could not be inverted31, were replaced by a mean arterial pressure (MAP) that has a better capability of predicting future MetS32; TG was log-transformed to reduce the skewness of distribution; HDL-C was inverted to make its higher value similar in interpretation to the other measures in the model. All variables of MetS components with different units were standardized for better integration. The CFA can estimate the factor loading (i.e., weight) of each MetS component, which indicates the degree of contribution of each component to the MetS score33. Two multi-group one-factor CFA were fit: Model 1 constrained the factor loadings to be equal across the eight combinations of age (young [<60 years] and old [≥60 years]), sex (male and female), and ethnicity (Han and minority); Model 2 allowed the factor loadings to vary across the eight subgroups, for developing age-sex-ethnicity-specific MetS scores. The differences in factor loadings across the subgroups were tested by the Chi-square test.

To evaluate the model’s goodness of fit, several indexes of model fit were used, including Chi-square (χ2), Akaike’s Information Criteria (AIC), Root Mean Square Error of Approximation (RMSEA), Standardized Root Mean Square Residual (SRMR), Comparative Fit Index (CFI), Goodness of Fit Index (GFI), McDonald’s Fit Index (MFI), and Bentler-Bonett Normed Fit Index (NFI)34. A well-fitting model is expected to have as small χ2 and AIC as possible, with other indexes of fit meeting the recommended criteria: RMSEA < 0.08 (the smaller, the better); SRMR < 0.08 (the smaller, the better); CFI, GFI, MFI, and NFI ≥ 0.90 (the larger, the better)35.

The MetS score of each individual was calculated based on the factor loading matrix and the specific variance matrix of the five MetS components, and then normalized (mean=0, standard deviation=1) for convenient comparison, with a higher MetS score representing more severe MetS. A linear model of the MetS score on the basis of original values of the MetS components was also fitted for each age-sex-ethnicity-specific group, allowing these equations to be applied to any individuals conveniently in a clinical context.

Performance assessment of the MetS score

The performance of the MetS score in predicting the incidence of the CVD-related risk factors and risk markers, selected according to the existing studies, was assessed in the CMEC baseline and follow-up datasets. The CVD-related risk factors used in this study included hypertension, diabetes, and dyslipidaemia36, which were newly diagnosed in the medical examinations or blood tests at the CMEC baseline survey. Hypertension was defined as SBP ≥ 140 mmHg or DBP ≥ 90 mmHg, according to the International Society of Hypertension criteria (2020)37. Diabetes was defined as FBG ≥ 7.0 mmol/L or HbA1c ≥ 6.5%, according to the American Diabetes Association criteria (2019)38. Dyslipidaemia was defined as having one or more of the four conditions: TC ≥ 6.2 mmol/L (hypercholesterolemia), TG ≥ 2.3 mmol/L (hypertriglyceridaemia), HDL-C < 1.0 mmol/L (hypolipidaemia), and low-density lipoprotein cholesterol (LDL-C) ≥ 4.1 mmol/L (hyperlipidaemia), according to the Chinese Guidelines for Dyslipidaemia Management in Adults (2016)39. The CVD-related risk markers used in this study included HbA1c40, CHOL41, body mass index (BMI)42, and SUA43, with their abnormal status referred to as the elevated risk markers: (1) elevated HbA1c was defined as HbA1c > 6.5%;44 (2) elevated CHOL was defined as CHOL > 6.2 mmo/L;45 (3) elevated BMI was defined as BMI ≥28 kg/m2;46 and (4) elevated SUA was defined as >417 μmol/L for men and >357 μmol/L for women47.

Multiple logistic regression and linear regression were used to estimate the associations of the MetS score with the CVD-related risk factors and risk markers (i.e., the changes in the risk for these outcomes for each increase of 1 point in score) in the CMEC baseline, respectively, after adjusting for a series of covariates (Table S1). The estimates were presented as odds ratios (ORs) and coefficients (β) with 95% confidence intervals (CIs) for multiple logistic and linear regression, respectively. Cox regression and generalized estimating equations (GEE) were used to estimate the associations of the MetS score with the CVD-related risk factors and risk markers (i.e., the changes in the rate of experiencing these outcomes for each increase of 1 point in score) in the CMEC follow-up, respectively, after adjusting for a series of covariates at baseline. In addition, the abnormal risk markers in the CMEC follow-up were also used as the outcomes to be associated with the MetS score by Cox regression. The estimates were presented as hazard ratios (HRs) and coefficients (β) with 95% CIs for Cox regression and GEE, respectively. The prediction performance of the MetS score in Cox regression was evaluated by the concordance index (C-index), which estimates the probability that the predicted outcome is consistent with the observed outcome. The C-index ranges from 0.5 (completely random) to 1 (completely consistent), with 0.5–0.7, ≥0.7–0.9, and ≥0.9 considered low, moderate, and high accuracy, respectively. To further assess the robustness of those associations in the study populations, all individuals were categorized into quartiles by their MetS scores, and the associations of the MetS score in quartiles with the CVD-related risk factors and the abnormal risk markers were estimated by Cox regression.

Comparison with the traditionally defined MetS

One was considered experiencing the traditionally defined MetS (a dichotomous variable) if presenting three or more of the five common MetS components, with each defined by a commonly used cut-off value: elevated WC was defined as ≥90 cm for men and ≥85 cm for women;48 elevated TG were defined as a fasting TG level ≥1.7 mmol/L (150 mg/dL);49 reduced HDL-C was defined as <1.03 mmol/L ( < 40 mg/dL) in men or 1.29 mmol/L ( < 50 mg/dL) in women;49 elevated BP was defined as SBP ≥ 130 and/or DBP ≥ 85 mmHg; and elevated FBG was defined as ≥5.6 mmol/L50. The capacity of the MetS score in indicating traditionally defined MetS was evaluated by the Area Under the receiver operating characteristic (ROC) curve (AUC), the values of which range from 0.5 (no capacity) and 1.0 (complete capacity), with >0.7 and >0.9 considered acceptable and accurate, respectively. Also, the MetS score was converted into a dichotomous MetS variable to compare and evaluate the degree of consistency with the traditionally defined MetS. The maximum value of the Youden index, calculated as sensitivity + specificity − 1, estimated from the ROC curve was identified for the overall group and each age-sex-ethnicity-specific subgroup separately, to convert the MetS score into the dichotomous MetS for comparison with the traditionally defined MetS.

To demonstrate the clinical usefulness, the dichotomous MetS score, and the traditionally defined MetS were both used to predict the presence of ≥1 aforementioned risk factor (i.e., hyperlipidemia, diabetes, and hypertension), and the presence of ≥1 aforementioned risk marker (i.e., HbA1c, CHOL, BMI, and SUA). The same sociodemographic and lifestyle factors, as covariates, were adjusted for in the overall and age-sex-ethnicity-specific models. In the overall and each age-sex-ethnicity-specific subgroup, sensitivity and specificity were calculated and compared by the paired Chi-square test, and AUC was calculated and compared by Delong’s test51.

Performance assessment of the MetS score in three external datasets

The MetS score was calculated for each YBDS, HBDS, and FBDS participant by the equations established based on the CMEC baseline, and several analyses were conducted on the basis of the three datasets. First, the associations of the MetS score with the CVD-related risk factors and risk markers were estimated. Second, the participants in the three datasets were categorized into quartiles by the MetS score, and the associations of the MetS score in quartiles with the CVD-related risk factors and risk markers were also estimated. Third, the maximum values of the Youden index estimated from the ROC curve, identified based on the CMEC baseline for the overall group and each age-sex-ethnicity-specific subgroup separately, were used to convert the MetS score into the dichotomous MetS variable, which was compared with the traditionally defined MetS by the ROC curve, assessed by AUC. Fourth, the dichotomous MetS score and the traditionally defined MetS were both used to predict the presence of ≥1 aforementioned risk factor (i.e., hyperlipidemia, diabetes, and hypertension), and the presence of ≥1 aforementioned risk marker (i.e., HbA1c, CHOL, BMI, and SUA), assessed by sensitivity, specificity, and AUC. A series of covariates were adjusted for (Tables S6, S10, S14), with age, sex, and ethnicity excluded from the covariates in the age-sex-ethnicity-specific models.

All statistical analyses were completed using R software version 4.0.2 (R Foundation for Statistical Computing), and statistical significance was declared if two-sided P < 0.05.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.