Introduction

The prevalence of metabolic syndrome and its associated risk factors, such as hypertension, dyslipidemia, insulin resistance, and central obesity, has increased over the past few decades1,2. The clinical importance of metabolic syndrome has long been acknowledged owing to the increased risk of type 2 diabetes and cardiovascular disease (CVD) it confers3. Considerable research has therefore sought to identify factors that reduce the risk of metabolic syndrome. Because events following pregnancy, such as delivery and breastfeeding, are known to have long-term effects on women’s health, multiple studies have evaluated the association between pregnancy-related factors and metabolic syndrome4,5,6. In particular, breastfeeding has received attention for its potential protective role in resetting the metabolic changes caused by pregnancy, including insulin resistance and lipid accumulation6. Several studies reported that breastfeeding was associated with a reduced risk of metabolic syndrome7,8, whereas others found no association8,9,10. In addition, various mediating factors should be considered when evaluating the association between breastfeeding and metabolic syndrome.

CVD and metabolic syndrome are closely related owing to shared predisposing risk factors11,12. The proportion of pregnant women with CVD has increased over recent decades13,14,15. Additionally, the number of pregnant women with pregestational comorbidities, such as diabetes and obesity, is also rising13,15,16,17. These changes are presumably associated with maternal metabolic syndrome, but validated data are limited18,19,20.

Understanding the association between breastfeeding and metabolic syndrome is important because it may suggest an additional means of preventing metabolic syndrome. Therefore, we aimed to investigate the association between obstetric characteristics, such as breastfeeding, and metabolic syndrome, and whether this association differs by the presence of CVD, in a large-scale Asian population-based cross-sectional study of women, using artificial intelligence. We developed prediction models for metabolic syndrome using artificial intelligence, assessing 86 variables, including general obstetric characteristics (e.g., parity, gravidity), medical information, demographics, dietary preferences, lifestyles, and socioeconomic factors.

Results

General obstetric characteristics and metabolic syndrome

Among the 80,861 participants in the KNHANES 2010–2019, only women aged 20 years or older were included (n = 35,434). Participants with missing CVD or metabolic syndrome data were excluded (n = 5229). After excluding one outlier (n = 1), the data of 30,204 participants were analyzed (Fig. 1). The mean age of the participants was 50.93 years, and the prevalence of metabolic syndrome was 28.38% (8571/30,204) (Table 1). Among the study population, 21,865 (72.85%) had a history of breastfeeding. The prevalence of CVD was 23.50% (7097/30,204).

Figure 1

A flow chart summarizing the experimental approach of the study. KNHANES, Korean National Health and Nutrition Examination Survey; HDL, high-density lipoprotein.

Table 1 The baseline characteristics evaluated for the prediction of metabolic syndrome.

Prediction model for metabolic syndrome

The performance measures of the six prediction models for metabolic syndrome are summarized in Table 2. Among the six models, the random forest performed best in terms of the area under the receiver operating characteristic curve (AUC): 90.7% (all participants), 87.7% (participants diagnosed with CVD), and 82.6% (participants without a CVD diagnosis). The values and ranks of the random forest variable importance are summarized in Table 3. A predictor ranked 26th or higher can be considered a major predictor in this study, because it falls within the top 30% of the 86 predictors (0.3 × 86 ≈ 26). According to the random forest variable importance in Table 3, the major predictors of metabolic syndrome included body mass index (BMI) (0.1032), use of antihypertensive drugs (0.0552), hypertension (0.0499), CVD (0.0453), age at enrollment (0.0437), white blood cell count (0.0297), low-density lipoprotein (LDL) cholesterol levels (0.0263), menstrual status (0.0247), use of lipid-lowering agents (0.0237), red blood cell count (0.0231), total cholesterol levels (0.0229), subjective body image (0.0221), education level (0.0214), daily fat intake (0.0198), hematocrit levels (0.0197), and breastfeeding duration (0.0191). Breastfeeding duration was thus a major predictor of metabolic syndrome. To interpret these values, consider the variable importance of BMI, CVD, and breastfeeding duration (0.1032, 0.0453, and 0.0191, respectively): randomly permuting (shuffling) the values of BMI, CVD, or breastfeeding duration would decrease the model accuracy by 10.32, 4.53, or 1.91 percentage points, respectively. The importance rankings of some major predictors changed markedly in the subgroup analysis, i.e., between participants with and without CVD. For example, use of antihypertensive drugs and a diagnosis of hypertension ranked second and third, respectively, among all participants, but both fell out of the top 30 in each subgroup (Table 3). Likewise, menstrual status and education level ranked eighth and 13th, respectively, among all participants, but dropped to 23rd or lower in both subgroups. Breastfeeding duration ranked 16th among all participants; it ranked slightly higher (14th) among those without CVD and much lower (26th) among those with CVD.
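The permutation interpretation of variable importance used above can be illustrated with a minimal Python sketch. It is not the authors' code: the source does not state which implementation produced Table 3, and scikit-learn's `RandomForestClassifier.feature_importances_`, for example, is impurity-based rather than permutation-based (scikit-learn offers `sklearn.inspection.permutation_importance` for the latter). Here `toy_model`, the feature names `bmi` and `noise`, and the synthetic data are hypothetical stand-ins for a fitted random forest and the KNHANES predictors.

```python
import random


def accuracy(model, X, y):
    """Proportion of rows for which the model's predicted class equals the label."""
    correct = sum(1 for row, label in zip(X, y) if model(row) == label)
    return correct / len(y)


def permutation_importance_sketch(model, X, y, feature, n_repeats=10, seed=0):
    """Mean drop in accuracy after randomly shuffling one feature column.

    The drop is an absolute difference in accuracy, so an importance of
    0.0453 corresponds to 4.53 percentage points.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)
        # Copy each row, replacing only the permuted feature.
        X_perm = [{**row, feature: value} for row, value in zip(X, column)]
        drops.append(baseline - accuracy(model, X_perm, y))
    return sum(drops) / n_repeats


# Hypothetical synthetic data: the label depends on "bmi" only.
data_rng = random.Random(42)
X = [{"bmi": data_rng.uniform(17, 35), "noise": data_rng.random()} for _ in range(200)]
y = [int(row["bmi"] >= 25) for row in X]


def toy_model(row):
    """Hypothetical stand-in for a fitted random forest."""
    return int(row["bmi"] >= 25)


imp_bmi = permutation_importance_sketch(toy_model, X, y, "bmi")
imp_noise = permutation_importance_sketch(toy_model, X, y, "noise")
print(f"bmi: {imp_bmi:.4f}, noise: {imp_noise:.4f}")
```

Shuffling `noise` leaves accuracy unchanged (importance 0), whereas shuffling `bmi` breaks its link with the label and lowers accuracy; this is the sense in which a larger value marks a more important predictor.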

Table 2 Model performance, averaged over 50 runs.
Table 3 The variable importance from the Random Forest in predicting metabolic syndrome.

The logistic regression results for each important variable, including obstetric characteristics, are presented in Supplementary Material 2. Breastfeeding duration was associated with lower odds of metabolic syndrome (adjusted odds ratio [aOR] 0.998; confidence interval [CI] 0.996–1.000). Each additional month of breastfeeding was associated with approximately 0.2% lower odds of metabolic syndrome. Because odds ratios compound multiplicatively, this corresponds to approximately 2.4% lower odds for an additional year (12 months; 0.998^12 ≈ 0.976) and approximately 4.7% lower odds for an additional 2 years (24 months; 0.998^24 ≈ 0.953). Thus, although the association appears small per month, it becomes more substantial over 1 or 2 years of breastfeeding. The odds ratio was not statistically significant at the 5% level (the CI reaches 1.000), but it still provides useful information in this machine learning framework, in which variable importance is the primary measure and statistical significance is supplementary. Moreover, interpreting a logistic regression coefficient requires the often unrealistic ceteris paribus assumption, i.e., that all other variables remain constant. In this context, the logistic regression results serve as supplementary information to the random forest variable importance.
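The conversion from the per-month aOR to longer durations can be checked with a short Python sketch. It assumes, as a logistic model with breastfeeding duration entered as a continuous term implies, that the log-odds change linearly with months, so the odds ratio for k additional months is 0.998^k; the function name is illustrative only.

```python
AOR_PER_MONTH = 0.998  # adjusted odds ratio per additional month of breastfeeding


def percent_change_in_odds(months, aor_per_month=AOR_PER_MONTH):
    """Percent change in the odds of metabolic syndrome for `months` extra months.

    Assumes a constant per-month odds ratio (log-linear effect), so ratios
    compound multiplicatively rather than adding up.
    """
    return (aor_per_month ** months - 1.0) * 100.0


for months in (1, 12, 24):
    print(f"{months:>2} months: odds change {percent_change_in_odds(months):+.1f}%")
```

The simple linear approximation (24 × 0.2% = 4.8%) slightly overstates the 2-year change; compounding gives about 4.7%. Passing the upper confidence limit (1.000) as `aor_per_month` gives no change at any duration, consistent with the non-significant result.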

Discussion

In summary, among the obstetric characteristics, the duration of breastfeeding was one of the most important predictors of metabolic syndrome. Among the six prediction models for metabolic syndrome, the random forest had the best performance in terms of the AUC (90.7% for all participants). In the subgroup analysis of women without CVD, breastfeeding duration ranked 14th in importance as a predictor of metabolic syndrome (0.0235), comparable to the daily intake of sodium (12th, 0.0239).

To our knowledge, this study presents the most comprehensive analysis of the determinants of metabolic syndrome in women, using a large-scale Asian population-based cross-sectional study of 30,204 participants. Although one previous study addressed the association between breastfeeding and metabolic syndrome in postmenopausal women using KNHANES data9, our study differs in that it targeted all adult women, included more recent data (2010–2019), and constructed a predictive model for metabolic syndrome using machine learning. This study also investigated whether metabolic syndrome-related factors differed between women with and without CVD. In a recent meta-analysis, the authors hypothesized that breastfeeding may have a preventive effect on metabolic syndrome and that this effect was related to breastfeeding duration8. However, the pooled effect of breastfeeding on metabolic syndrome was not conclusive because of heterogeneity in study populations, criteria for breastfeeding, and confounding factors for metabolic syndrome8. In this large-scale population-based study, we evaluated the association between breastfeeding and metabolic syndrome and compared its importance with that of other risk factors known to predispose women to metabolic syndrome.

During pregnancy, the mother undergoes metabolic changes that increase insulin resistance and serum lipid levels (particularly triglycerides [TG])21,22. Breastfeeding reportedly accelerates the return of maternal postpartum metabolism to prepregnancy baselines23. It also has long-term beneficial effects on maternal glucose levels, lipid metabolism, and adiposity23,24,25. The relationship among gravidity, parity, and metabolic syndrome is still debated, necessitating further research.

In this study, we investigated the importance of specific variables for metabolic syndrome in women with and without CVD. Differences in the relative importance of variables between participants with and without CVD may have important clinical implications. First, in women without CVD, age (second vs. tenth), breastfeeding duration (14th vs. 26th), and gravidity (26th vs. 31st) ranked higher than in women with CVD. These variables appeared to be more strongly associated with metabolic syndrome in women without CVD and less important in women with CVD. Second, in women with CVD, the importance of lipid-lowering agents or diabetes drugs was relatively higher. A previous meta-analysis reported that, among the five components of metabolic syndrome, the prognosis of CVD was especially poor in patients with dyslipidemia or impaired glucose tolerance26. Based on our findings, we hypothesize that dyslipidemia or impaired glucose tolerance has a stronger mediating effect on metabolic syndrome in women with CVD. Third, in all three models of this study (Table 3), nutrient intake (especially fat intake) was an important predictor of metabolic syndrome, and its importance was higher in women with CVD than in women without CVD. Previous studies have reported the significance of healthy diets for metabolic syndrome27, and our findings further emphasize this point, particularly for women with CVD. Additionally, white blood cell count ranked sixth or higher as a predictor of metabolic syndrome in women. Plasma C-reactive protein levels and low-grade inflammation have been reported to be positively associated with metabolic syndrome28,29. It is therefore reasonable to speculate that white blood cell count is also positively associated with metabolic syndrome.

This study has limitations. First, a cross-sectional design was used, which precludes causal inference; studies with a longitudinal design would improve the validity of these findings. Second, breastfeeding duration was self-reported, often several years after breastfeeding took place, which may limit the accuracy of the data. Furthermore, although medical history was intended to reflect a physician's diagnosis, it relied on participants' self-reports and may therefore be inaccurate. Similarly, although dietary intake was assessed by nutritionists through direct interviews during household visits, respondents' answers may not be fully objective. Third, expanding this study to other diseases and predictors, such as health care utilization, might contribute substantially to this line of research. Fourth, we excluded the diagnostic criteria for metabolic syndrome from the independent variables. However, to examine the influence of CVD and cardiovascular medications on metabolic syndrome, we included physician-diagnosed hypertension and the use of cardiovascular medications as independent variables, even though they partly overlap with the blood pressure criterion of metabolic syndrome. Fifth, this study used random forest variable importance as the primary result and logistic regression odds ratios as supplementary findings; that is, the former was taken to indicate the strength of the association between metabolic syndrome and each major predictor, and the latter the direction of the association. Other methods for examining the direction of the association could contribute substantially to research in this area. Finally, this study did not consider possible mediating effects among the variables.

In the random forest prediction model (AUC, 90.7%), the top predictors of metabolic syndrome included body mass index (0.1032), medication for hypertension (0.0552), hypertension (0.0499), cardiovascular disease (0.0453), age (0.0437), and breastfeeding duration (0.0191). Breastfeeding duration was one of the most important predictors of metabolic syndrome among the various obstetric characteristics.

Methods

Study population

This study was based on the fifth (2010–2012), sixth (2013–2015), seventh (2016–2018), and eighth (2019) cycles of the Korean National Health and Nutrition Examination Survey (KNHANES). The KNHANES is a nationally representative survey that samples the population annually using a stratified multistage cluster sampling design. The KNHANES is conducted by a dedicated research team that visits four regions each week (192 regions annually). The survey is conducted over 3 days in each region, with mobile examination vehicles visiting the area to perform health screenings, health surveys, and nutritional assessments. Health surveys and medical examinations are conducted in the mobile examination vehicles, whereas nutritional assessments are performed by a specialized team of nutritionists who visit households directly. These data are used to assess the health status, prevalence of chronic diseases, and nutritional intake of the South Korean population. From the KNHANES 2010–2019, men and participants under 20 years of age were excluded from the current analyses. Cases with missing data on the chronic occurrence or diagnosis of hypertension, myocardial infarction, or angina, or on any of the factors used to diagnose metabolic syndrome, were excluded, as was one outlier (a woman whose reported age at menarche was over 80 years).

The data were publicly available and de-identified. The requirement for ethical approval was waived by the institutional review board of Korea University Anam Hospital. All methods were conducted in accordance with relevant institutional/ethical committee guidelines and regulations. The requirement for informed consent was waived because all participant information was de-identified and encrypted to protect privacy.

Variables

The variables included in this study are summarized in Supplementary Material 1. Sociodemographic characteristics, including age at enrollment, sex, body mass index (BMI), household income (in quartiles), marital status, education level (elementary school or below, middle school, high school, and college or above), area of residence, economic activity, and occupation, were assessed using questionnaires.

Information on general obstetric characteristics, including gravidity, parity, breastfeeding (history of breastfeeding, number of children breastfed, and lifetime total breastfeeding duration), history of abortion, age at menarche, and menstrual status (menstruation, pregnancy, breastfeeding, menopause, and others), was also obtained from the questionnaires. The presence of the following diseases was defined based on an interview: (1) hypertension, (2) myocardial infarction, (3) angina, (4) stroke, (5) osteoarthritis, (6) rheumatoid arthritis, (7) pulmonary tuberculosis, (8) asthma, (9) thyroid-related disease, (10) major depressive disorder, (11) kidney failure, (12) hepatitis B, (13) hepatitis C, (14) liver cirrhosis, (15) cancers (gastric, hepatic, colorectal, breast, cervical, and lung cancer), and (16) atopic dermatitis. Data on family histories of hypertension, hyperlipidemia, ischemic heart disease, stroke, and diabetes mellitus were also obtained from the questionnaires, as were data on the use of (1) antihypertensive drugs, (2) lipid-lowering agents, (3) oral hypoglycemic agents, and (4) insulin.

The blood pressure, waist circumference, and body mass index (BMI) of the participants were measured. Levels of total cholesterol, TG, LDL, high-density lipoprotein (HDL), hemoglobin, hematocrit, blood urea nitrogen, and blood creatinine, as well as white and red blood cell counts, were also measured at the time of the survey.

The participants answered questions about their health-related perceptions and habits. They were asked about their subjective body image, weight-control goals, history of medical checkups in the past 2 years, smoking history, frequency of alcohol consumption (per year), and weekly weight training routines. Data on mental health, including stress awareness and feelings of depression in the past year, were also collected. Health-related quality of life was assessed using the European Quality of Life-5 Dimensions (EQ-5D) scale30. The daily intake of energy (kcal), carbohydrates (g), protein (g), fat (g), sodium (mg), water (g), calcium (mg), phosphorus (mg), iron (mg), potassium (mg), and vitamin C (mg) was ascertained from the nutrition survey.

A diagnosis of CVD required the presence of at least one of the following: (1) hypertension, (2) myocardial infarction, or (3) angina. Based on the modified National Cholesterol Education Program Adult Treatment Panel III (NCEP ATP III) criteria and the appropriate cutoff for central obesity in Korean adult women suggested by the Korean Endocrine Society, metabolic syndrome was defined as having three or more of the following1,31: (1) central obesity (waist circumference ≥ 85 cm); (2) elevated TG (serum TG concentration ≥ 150 mg/dL); (3) low HDL cholesterol (serum HDL cholesterol concentration < 50 mg/dL); (4) elevated blood pressure (systolic blood pressure ≥ 130 mmHg or diastolic blood pressure ≥ 85 mmHg) or the prescription of antihypertensive drugs; and (5) elevated fasting glucose (fasting serum glucose ≥ 100 mg/dL) or the prescription of diabetes drugs. We excluded the variables corresponding to the diagnostic criteria of metabolic syndrome from the independent variables, namely waist circumference, TG, HDL cholesterol, blood pressure measurements, and fasting glucose.
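The CVD and metabolic syndrome definitions above reduce to simple counting rules, sketched below in Python. The `Participant` fields and the two example participants are hypothetical, chosen for readability rather than matching KNHANES variable codes; the thresholds are those listed in the text for Korean adult women.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical field names; not KNHANES variable codes.
    waist_cm: float
    tg_mg_dl: float
    hdl_mg_dl: float
    sbp_mmhg: float
    dbp_mmhg: float
    fasting_glucose_mg_dl: float
    on_antihypertensive_drugs: bool
    on_diabetes_drugs: bool
    hypertension: bool
    myocardial_infarction: bool
    angina: bool


def metabolic_syndrome_components(p):
    """The five criteria, each evaluated to True/False."""
    return [
        p.waist_cm >= 85,                                                    # central obesity
        p.tg_mg_dl >= 150,                                                   # elevated TG
        p.hdl_mg_dl < 50,                                                    # low HDL cholesterol
        p.sbp_mmhg >= 130 or p.dbp_mmhg >= 85 or p.on_antihypertensive_drugs,  # blood pressure
        p.fasting_glucose_mg_dl >= 100 or p.on_diabetes_drugs,               # fasting glucose
    ]


def has_metabolic_syndrome(p):
    """Three or more of the five criteria."""
    return sum(metabolic_syndrome_components(p)) >= 3


def has_cvd(p):
    """At least one of hypertension, myocardial infarction, or angina."""
    return any([p.hypertension, p.myocardial_infarction, p.angina])


# Meets the waist, TG, and HDL criteria only: metabolic syndrome, no CVD.
example_a = Participant(90, 180, 45, 120, 80, 95, False, False, False, False, False)
# Treated hypertension and high glucose only: CVD, but 2 criteria (no metabolic syndrome).
example_b = Participant(80, 120, 55, 125, 80, 105, True, False, True, False, False)

for name, p in (("A", example_a), ("B", example_b)):
    print(name, has_metabolic_syndrome(p), has_cvd(p))
```

Example B shows why the antihypertensive-drug variable overlaps with the outcome definition: the prescription alone satisfies the blood pressure criterion.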

Statistical analysis

An artificial neural network, decision tree, logistic regression, naïve Bayes, random forest, and support vector machine were used to predict metabolic syndrome. Data on 30,204 observations with complete information were divided into training and validation sets in a 70:30 ratio (21,143:9061). The AUC and accuracy (the proportion of correct predictions among the 9061 observations in the validation set) were used as the criteria for model validation. The random forest variable importance, i.e., the contribution of a given variable to the random forest performance (accuracy), was used to identify the major predictors of metabolic syndrome. For example, the random forest variable importance of CVD was 0.0453; that is, model accuracy drops by 4.53 percentage points if the values of CVD are randomly permuted (shuffled). The random split and analysis were repeated 50 times, and the results were averaged across runs for validation32,33,34. RStudio 1.3.959 (RStudio Inc., Boston, United States) and Python 3.5.2 (CreateSpace, Scotts Valley, United States) were used for the analysis between February 1 and March 31, 2022.
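The validation scheme (a random 70:30 split repeated 50 times, with performance averaged) can be sketched in standard-library Python. The AUC is computed through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. The single-feature `fit_direction` scorer and the synthetic data are hypothetical placeholders for the six models and the KNHANES data, and the source does not specify details such as stratified splitting.

```python
import random
from statistics import mean


def auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def split_70_30(n, rng):
    """Randomly assign row indices to training (70%) and validation (30%) sets."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = round(0.7 * n)
    return idx[:cut], idx[cut:]


def fit_direction(xs, ys):
    """Hypothetical one-feature model: learn whether higher x means higher risk."""
    mean_pos = mean(x for x, t in zip(xs, ys) if t == 1)
    mean_neg = mean(x for x, t in zip(xs, ys) if t == 0)
    return 1.0 if mean_pos >= mean_neg else -1.0


def repeated_split_auc(xs, ys, n_runs=50, seed=0):
    """Average validation AUC over n_runs independent random 70:30 splits."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_runs):
        train, valid = split_70_30(len(xs), rng)
        direction = fit_direction([xs[i] for i in train], [ys[i] for i in train])
        scores = [direction * xs[i] for i in valid]
        aucs.append(auc([ys[i] for i in valid], scores))
    return mean(aucs)


# Hypothetical synthetic data: a noisy risk factor and a binary outcome.
data_rng = random.Random(1)
xs = [data_rng.gauss(0, 1) for _ in range(300)]
ys = [int(x + data_rng.gauss(0, 1) > 0.5) for x in xs]

train_idx, valid_idx = split_70_30(30204, random.Random(0))
print(len(train_idx), len(valid_idx))  # 21143 9061
print(f"mean validation AUC over 50 runs: {repeated_split_auc(xs, ys):.3f}")
```

Averaging over many random splits reduces the dependence of the reported AUC on any single partition of the data; all splits are drawn from the same dataset.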