Abstract
We aimed to investigate sex-specific associations between cardiovascular risk factors and atherosclerotic cardiovascular disease (ASCVD) risk using machine learning. We studied 258,279 individuals (132,505 [51.3%] men and 125,774 [48.7%] women) without documented ASCVD who underwent national health screening. A random forest model was developed using 16 variables to predict 10-year ASCVD events in each sex. The association between cardiovascular risk factors and 10-year ASCVD probabilities was examined using partial dependency plots. During the 10-year follow-up, 12,319 (4.8%) individuals developed ASCVD, with a higher incidence in men than in women (5.3% vs. 4.2%, P < 0.001). The performance of the random forest model was similar to that of the pooled cohort equations (area under the receiver operating characteristic curve, men: 0.733 vs. 0.727; women: 0.769 vs. 0.762). Age and body mass index were the two most important predictors in the random forest model for both sexes. In partial dependency plots, advanced age and increased waist circumference were more strongly associated with higher probabilities of ASCVD in women. In contrast, ASCVD probabilities increased more steeply with higher total cholesterol and low-density lipoprotein (LDL) cholesterol levels in men. These sex-specific associations were verified in conventional Cox analyses. In conclusion, there were significant sex differences in the association between cardiovascular risk factors and ASCVD events. While higher total cholesterol or LDL cholesterol levels were more strongly associated with the risk of ASCVD in men, older age and increased waist circumference were more strongly associated with the risk of ASCVD in women.
Introduction
The global burden of atherosclerotic cardiovascular disease (ASCVD) is increasing1. Primary prevention, which includes controlling cardiovascular risk factors through lifestyle modification or pharmacotherapy to prevent a first ASCVD event, is essential to minimize cardiovascular mortality and morbidity. Importantly, the selection of individuals for these interventions is based on individualized probabilities of ASCVD events2,3. Therefore, accurate risk prediction is important for identifying high-risk individuals and maximizing the benefits of primary prevention.
A body of evidence has shown significant sex differences in the prevalence of cardiovascular risk factors and the incidence of ASCVD4,5,6,7,8,9,10,11. In the general population, men tend to have a higher prevalence of obesity, smoking, high blood pressure (BP), diabetes mellitus, and dyslipidemia than women4,5. Regarding cardiovascular outcomes, men are at higher risk of ischemic heart disease and cardiovascular mortality than women6,7,8,9,10,11. The associations between cardiovascular risk factors and outcomes also differ by sex6,7,8,9. These findings emphasize the importance of sex-specific cardiovascular risk assessment and targeted primary prevention strategies.
Several studies have shown that machine learning models perform similarly to or better than established risk scoring systems, such as the pooled cohort equations (PCE) or the Framingham Risk Score, in predicting ASCVD probabilities12,13,14,15,16. However, few studies have developed sex-specific machine learning models. In addition, previous studies have rarely reported how each variable is associated with the outcome in these models, information that could substantially improve model interpretability. Thus, constructing separate machine learning models by sex and delineating the impact of cardiovascular risk factors on outcomes in these models may provide deeper insight into the importance of sex-specific cardiovascular risk assessment.
We aimed to investigate sex differences in cardiovascular risk factors and their association with outcomes by applying a machine learning approach to nationwide health examination data. The specific aims of this study were: (1) to develop sex-specific machine learning models for predicting ASCVD probabilities; (2) to rank the important predictors of ASCVD in each sex in these models; and (3) to investigate the sex-specific associations of these risk factors with ASCVD.
Methods
Cohort characteristics
The National Health Insurance Service (NHIS) in Korea covers the entire Korean population, and the NHIS database incorporates detailed information on individuals' sociodemographics, medical check-up results (including laboratory tests and health behaviors), healthcare utilization (including diagnoses and treatments), and dates and causes of death17. A representative sample of this database has been made available to researchers, and its validity as a reliable data source has been established18.
Specifically, this study used the "medical check-up sample cohort" of the NHIS database, which includes approximately 510,000 randomly sampled individuals (10%) aged 40 years or older from the general Korean population who underwent the standardized national medical check-up program in 2002 or 2003. These individuals were recommended to undergo repeated biennial medical check-ups up to 2013. Of these, we selected individuals who underwent a medical check-up in 2009 or 2010, because in 2009 not only total cholesterol but also its individual components, including low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, and triglycerides, were measured. The date of the NHIS medical check-up in 2009/2010 was used as the index date for each individual. We excluded individuals with a history of cardiovascular disease at the index date, including ischemic heart disease (International Classification of Diseases, Tenth Revision [ICD-10] codes I20–I25), heart failure (ICD-10 codes I50 and I420), stroke (ICD-10 codes I60–I69), and atrial fibrillation (ICD-10 code I48). Other exclusion criteria were chronic obstructive pulmonary disease, liver cirrhosis, end-stage renal disease, and cancer.
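The cardiovascular part of the exclusion step can be sketched in Python as a prefix match on diagnosis codes recorded before the index date. This is a minimal illustration, not the authors' code: it assumes codes are compared without the dot (so "I420" denotes I42.0, as written in the Methods), and it covers only the cardiovascular criteria because the ICD-10 codes for the other exclusions (chronic obstructive pulmonary disease, liver cirrhosis, end-stage renal disease, cancer) are not listed. The function and constant names are hypothetical.

```python
# Hypothetical sketch: ICD-10 prefixes for prior cardiovascular disease,
# taken from the Methods; codes are compared with the dot removed.
CVD_EXCLUSION_PREFIXES = {
    "ischemic heart disease": tuple(f"I2{d}" for d in range(6)),  # I20-I25
    "heart failure": ("I50", "I420"),                              # I50, I42.0
    "stroke": tuple(f"I6{d}" for d in range(10)),                  # I60-I69
    "atrial fibrillation": ("I48",),
}


def prior_cvd_conditions(codes):
    """Return the exclusion conditions matched by any pre-index diagnosis code."""
    found = set()
    for code in codes:
        normalized = code.replace(".", "").strip().upper()
        for condition, prefixes in CVD_EXCLUSION_PREFIXES.items():
            # str.startswith accepts a tuple of candidate prefixes.
            if normalized.startswith(prefixes):
                found.add(condition)
    return found


def eligible_for_cvd_criteria(codes_before_index):
    """True if no cardiovascular exclusion code was recorded before the index date."""
    return not prior_cvd_conditions(codes_before_index)
```

Note that only I42.0 is excluded here, exactly as listed; other I42 subcodes pass this filter.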
This study conformed to the Declaration of Helsinki and the Institutional Review Board approved the study protocol (Seoul National University Hospital, approval number: E-2104-087-1211). The need for informed consent was waived by the same ethics committee (Institutional Review Board of Seoul National University Hospital) as anonymized data were used.
Variable definitions
All clinical information was collected from the medical check-up conducted on the index date. Systolic and diastolic BP were measured after at least 5 min of rest. Data on smoking status, alcohol consumption, physical activity, and income level were collected using structured self-administered questionnaires. Low income was defined as an income within the lowest 30% of all Korean residents. Lifetime smoking was calculated in pack-years, and mean daily alcohol consumption (g/day) was recorded. The weekly workload of physical activity was calculated in metabolic equivalent of task (MET) minutes per week19, and physical activity intensity was categorized as low, moderate, or high according to the International Physical Activity Questionnaire scoring protocol.
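The derived lifestyle variables can be sketched as follows. This assumes the IPAQ short-form scoring convention (MET weights of 3.3 for walking, 4.0 for moderate and 8.0 for vigorous activity) and 20 cigarettes per pack for pack-years. The NHIS questionnaire items and any truncation rules the authors applied are not described, so the input fields and function names are hypothetical, and the category rules are a simplified reading of the IPAQ protocol.

```python
# IPAQ short-form MET weights (assumed scoring convention).
WALK_MET = 3.3
MODERATE_MET = 4.0
VIGOROUS_MET = 8.0


def met_minutes_per_week(vig_days, vig_min, mod_days, mod_min, walk_days, walk_min):
    """Weekly MET-minutes from days per week and minutes per day of each intensity."""
    return (VIGOROUS_MET * vig_days * vig_min
            + MODERATE_MET * mod_days * mod_min
            + WALK_MET * walk_days * walk_min)


def ipaq_category(vig_days, vig_min, mod_days, mod_min, walk_days, walk_min):
    """Simplified IPAQ short-form categories: 'high', 'moderate' or 'low'."""
    total = met_minutes_per_week(vig_days, vig_min, mod_days, mod_min,
                                 walk_days, walk_min)
    # Days are summed across intensities ("any combination" in IPAQ terms).
    combined_days = vig_days + mod_days + walk_days
    if (vig_days >= 3 and total >= 1500) or (combined_days >= 7 and total >= 3000):
        return "high"
    # Days of moderate activity or walking lasting at least 30 min per day.
    moderate_or_walk_days = ((mod_days if mod_min >= 30 else 0)
                             + (walk_days if walk_min >= 30 else 0))
    if ((vig_days >= 3 and vig_min >= 20)
            or moderate_or_walk_days >= 5
            or (combined_days >= 5 and total >= 600)):
        return "moderate"
    return "low"


def pack_years(cigarettes_per_day, years_smoked):
    """Lifetime smoking exposure, assuming 20 cigarettes per pack."""
    return cigarettes_per_day / 20 * years_smoked
```

For example, 3 days of 30-minute vigorous activity gives 720 MET-minutes per week and falls in the moderate category.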
A past medical history of hypertension was defined as either (1) previous diagnostic codes for hypertension (ICD-10 codes I10–I13, I15) with prescription records of antihypertensive medications, including angiotensin-converting enzyme inhibitors, angiotensin II receptor blockers, calcium channel blockers, thiazides, and beta-blockers, or (2) systolic/diastolic BP ≥ 140/90 mmHg measured at the medical check-up. A history of diabetes mellitus was defined by either of the following: (1) previous diagnostic codes for diabetes mellitus (ICD-10 codes E11–E14) accompanied by prescription records of glucose-lowering medications, or (2) fasting glucose level > 126 mg/dL at the medical check-up. Dyslipidemia was defined as either (1) diagnostic codes for dyslipidemia (ICD-10 code E78) with prescription records of lipid-lowering medications or (2) total cholesterol ≥ 240 mg/dL at the medical check-up.
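Each of these definitions is a disjunction of a claims-based criterion and a measured threshold. A sketch, assuming the claims-based part (diagnostic code plus matching prescription) has already been reduced to one boolean per condition. The `CheckUp` fields are hypothetical, the BP criterion is read as systolic ≥ 140 or diastolic ≥ 90 mmHg, and the glucose threshold is kept strictly greater than 126 mg/dL, as written above.

```python
from dataclasses import dataclass


@dataclass
class CheckUp:
    """Hypothetical record for one participant at the index date."""
    systolic_bp: float        # mmHg
    diastolic_bp: float       # mmHg
    fasting_glucose: float    # mg/dL
    total_cholesterol: float  # mg/dL
    # Claims-based criteria: diagnostic code AND a matching prescription record.
    htn_code_and_rx: bool = False  # I10-I13 or I15 + antihypertensive drug
    dm_code_and_rx: bool = False   # E11-E14 + glucose-lowering drug
    dl_code_and_rx: bool = False   # E78 + lipid-lowering drug


def has_hypertension(c: CheckUp) -> bool:
    # Assumes "systolic/diastolic BP >= 140/90" means either threshold is met.
    return c.htn_code_and_rx or c.systolic_bp >= 140 or c.diastolic_bp >= 90


def has_diabetes(c: CheckUp) -> bool:
    # Strict inequality, as written in the Methods.
    return c.dm_code_and_rx or c.fasting_glucose > 126


def has_dyslipidemia(c: CheckUp) -> bool:
    return c.dl_code_and_rx or c.total_cholesterol >= 240
```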
Laboratory results, including fasting glucose, cholesterol, and liver enzyme levels (aspartate transaminase [AST] and alanine aminotransferase [ALT]), were obtained from tests conducted on the index date. The estimated glomerular filtration rate was calculated using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation. Proteinuria was assessed with a dipstick test and categorized as negative, trace, or positive. PCE-predicted 10-year ASCVD probabilities were calculated using the original beta coefficients provided in the guideline20.
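For reference, the 2009 CKD-EPI creatinine equation can be written as a short function. The text does not state which CKD-EPI version was used, so the 2009 equation is an assumption here; the race coefficient is left at its non-Black value, and serum creatinine is in mg/dL.

```python
def ckd_epi_2009(creatinine_mg_dl: float, age_years: float, female: bool) -> float:
    """eGFR (mL/min/1.73 m^2) from the 2009 CKD-EPI creatinine equation,
    non-Black form (race coefficient not applied)."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = creatinine_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha      # applies below the sex-specific knot
            * max(ratio, 1.0) ** -1.209     # applies above the knot
            * 0.993 ** age_years)           # age decline
    return egfr * 1.018 if female else egfr
```

At creatinine equal to the sex-specific knot (0.9 mg/dL in men, 0.7 mg/dL in women), both ratio terms equal 1 and the estimate reduces to 141 × 0.993^age, multiplied by 1.018 in women.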
Outcome assessment
Individuals were followed up from the index date until December 31, 2019, or death, whichever came first. The median follow-up duration was 10.1 years (interquartile range, 9.6–10.5 years).
The primary endpoint was newly developed 10-year ASCVD events, defined as a composite of myocardial infarction, stroke, heart failure, and cardiovascular death. Myocardial infarction was defined as hospital admission with a diagnosis of non-ST-elevation or ST-elevation myocardial infarction (ICD-10 codes I21 and I22). Stroke was defined as hospital admission with a diagnosis of ischemic or hemorrhagic stroke (ICD-10 codes I60–I64), together with brain computed tomography or magnetic resonance imaging during hospitalization. Heart failure was defined as hospitalization for heart failure (ICD-10 codes I50 and I42). Cardiovascular death was defined as death attributed to cardiovascular causes (ICD-10 codes I00–I99).
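The outcome definition amounts to scanning each participant's post-index records for the first qualifying event within the 10-year window, with non-cardiovascular death ending follow-up. A sketch, assuming a hypothetical record layout `(event_date, kind, icd10_code, had_imaging)` and treating records dated on the index date as pre-existing; it is not the authors' implementation. Because ICD-10 chapter IX spans I00–I99, any cause-of-death code beginning with "I" is counted as cardiovascular.

```python
from datetime import date

STROKE_PREFIXES = tuple(f"I6{d}" for d in range(5))  # I60-I64


def classify_record(kind, icd10_code, had_imaging=False):
    """Map one admission or death record to an ASCVD component, or None."""
    code = icd10_code.replace(".", "").strip().upper()
    if kind == "admission":
        if code.startswith(("I21", "I22")):
            return "myocardial infarction"
        if code.startswith(STROKE_PREFIXES) and had_imaging:  # CT or MRI required
            return "stroke"
        if code.startswith(("I50", "I42")):
            return "heart failure"
    elif kind == "death" and code.startswith("I"):  # chapter IX, I00-I99
        return "cardiovascular death"
    return None


def add_years(d, years):
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February mapped to a non-leap year
        return d.replace(year=d.year + years, day=28)


def ten_year_ascvd(index_date, records, study_end=date(2019, 12, 31)):
    """records: iterable of (event_date, kind, icd10_code, had_imaging).

    Returns (event, follow_up_days, component).
    """
    horizon = min(study_end, add_years(index_date, 10))
    for event_date, kind, code, imaging in sorted(records):
        if not index_date < event_date <= horizon:
            continue
        component = classify_record(kind, code, imaging)
        if component is not None:
            return True, (event_date - index_date).days, component
        if kind == "death":  # non-cardiovascular death ends follow-up
            return False, (event_date - index_date).days, None
    return False, (horizon - index_date).days, None
```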
Random forest model
In this study, random forest models were developed to predict 10-year ASCVD probabilities. We chose the random forest model over other machine learning models because it can handle high-dimensional, non-linear data and is less prone to overfitting, and therefore generally yields high prediction accuracy in large-scale clinical datasets21,22. Moreover, the random forest model provides estimates of variable importance, which can guide variable selection.
We included 16 variables in the random forest models: age (years), body mass index (BMI) (kg/m2), waist circumference (cm), systolic BP (mmHg), diastolic BP (mmHg), smoking (pack-years), alcohol consumption (g/day), physical activity (MET minutes per week), fasting glucose (mg/dL), total cholesterol (mg/dL), triglyceride (mg/dL), LDL cholesterol (mg/dL), HDL cholesterol (mg/dL), estimated glomerular filtration rate (mL/min/1.73 m2), AST (IU/L), and ALT (IU/L). These variables were selected because they are established risk factors for adverse cardiovascular events and are routinely assessed for cardiovascular risk prediction2,3. We included only continuous variables because the assessment of relative variable importance in random forest models may be biased by variable type23. For risk factors usually expressed as categories, we used the corresponding continuous measurements (e.g., fasting glucose level instead of diabetes mellitus). In addition, adding categorical variables to the model (e.g., proteinuria) did not significantly improve risk prediction.
The outcome for the random forest model was the 10-year ASCVD event. We constructed a separate random forest model for each sex; men and women were each randomly divided into training (70%) and test (30%) sets, a division ratio commonly used in machine learning studies. Each decision tree was grown by repeatedly splitting the samples into two branches; at each split, a random subset of variables was considered, and the split that maximized the decrease in node impurity was chosen. The predicted probability was a numeric value ranging from 0 to 1. Model performance was tested with different numbers of decision trees (ntree), minimum terminal node sizes (nodesize), and numbers of variables randomly sampled as candidates at each split (mtry) (Supplemental Table S1)24. The model with the highest area under the receiver operating characteristic curve (AUC) was selected as the final model. The performance of the random forest models appeared robust to changes in these parameters: the mean AUC was 0.721 (standard deviation 0.009) in men and 0.757 (standard deviation 0.013) in women. The calibration of the random forest models was assessed by comparing predicted and observed probabilities of 10-year ASCVD.
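The tuning procedure can be sketched with scikit-learn rather than the R randomForest package the authors used. `n_estimators` and `max_features` correspond to ntree and mtry; `min_samples_leaf` is only roughly analogous to nodesize. The text does not say on which data candidate settings were scored, so this sketch assumes a validation split held out from the 70% training set, with the selected model evaluated once on the 30% test set. The grid values, function name and synthetic data are hypothetical.

```python
import itertools

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def tune_random_forest(X, y, grid, seed=0):
    """Grid-search a random forest by AUC; return best settings and test AUC."""
    # 70/30 training/test split, stratified on the outcome.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    # Assumption: hold out part of the training set to score candidate settings.
    X_fit, X_val, y_fit, y_val = train_test_split(
        X_train, y_train, test_size=0.25, stratify=y_train, random_state=seed)

    scored = []
    for ntree, nodesize, mtry in itertools.product(
            grid["ntree"], grid["nodesize"], grid["mtry"]):
        model = RandomForestClassifier(
            n_estimators=ntree, min_samples_leaf=nodesize,
            max_features=mtry, random_state=seed)
        model.fit(X_fit, y_fit)
        # Predicted probability of the event class, between 0 and 1.
        val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
        scored.append((val_auc, (ntree, nodesize, mtry), model))

    val_aucs = np.array([s[0] for s in scored])
    _, best_params, best_model = max(scored, key=lambda s: s[0])
    test_auc = roc_auc_score(y_test, best_model.predict_proba(X_test)[:, 1])
    return {
        "best_params": best_params,
        "test_auc": test_auc,
        "mean_val_auc": float(val_aucs.mean()),
        "sd_val_auc": float(val_aucs.std(ddof=1)) if len(val_aucs) > 1 else 0.0,
    }


# Example on synthetic data (hypothetical; not the NHIS cohort).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(600, 4))
risk = 1 / (1 + np.exp(-(1.5 * X_demo[:, 0] - X_demo[:, 1])))
y_demo = (rng.random(600) < risk).astype(int)
result = tune_random_forest(
    X_demo, y_demo, {"ntree": [50, 100], "nodesize": [5, 20], "mtry": [1, 2]})
```

Scoring candidates on data that the final test AUC never sees keeps that reported AUC from being optimistically biased by the tuning step.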
Relative variable importance
We used permutation variable importance to rank the predictors in the random forest model25,26. For each tree, the difference in prediction error before and after randomly permuting a variable is calculated; these differences are averaged over all trees and normalized by their standard deviation. The resulting measure is reported as the mean decrease in accuracy, and a greater mean decrease in accuracy indicates greater importance of the variable in the respective random forest model.
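The mean decrease in accuracy described above can be sketched without any library. Each "tree" is represented by a prediction function plus the samples it is evaluated on (in the R randomForest package, these are the tree's out-of-bag samples), and rows are tuples of predictor values. The toy stand-ins for trees are hypothetical and only illustrate the permute, compare, average and normalize steps.

```python
import random
import statistics


def error_rate(predict, rows, labels):
    """Share of rows whose predicted class differs from the observed label."""
    return sum(predict(r) != y for r, y in zip(rows, labels)) / len(labels)


def mean_decrease_accuracy(trees, feature, seed=0):
    """Permutation importance of predictor index `feature`.

    trees: list of (predict, eval_rows, eval_labels), one entry per tree,
    where eval_rows are that tree's evaluation (out-of-bag) samples.
    """
    rng = random.Random(seed)
    increases = []
    for predict, rows, labels in trees:
        baseline = error_rate(predict, rows, labels)
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)  # break the link between this predictor and the outcome
        permuted = [r[:feature] + (v,) + r[feature + 1:]
                    for r, v in zip(rows, shuffled)]
        increases.append(error_rate(predict, permuted, labels) - baseline)
    average = statistics.mean(increases)
    spread = statistics.stdev(increases) if len(increases) > 1 else 0.0
    # Normalize by the standard deviation of the per-tree increases.
    return average / spread if spread > 0 else average
```

A predictor that no tree uses leaves every prediction unchanged, so its importance is exactly zero.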
Partial dependency plot
For the top ten important variables, the relationship between each variable and 10-year ASCVD probabilities in the random forest model was visualized using partial dependency plots, which improve model interpretability27. A partial dependency plot is generated by calculating the marginal effect of a variable of interest on the outcome while integrating out the effects of all other variables28,29. The average probabilities of 10-year ASCVD were calculated at different values of the variable and traced using locally estimated scatterplot smoothing curves. Partial dependency plots were compared between men and women to examine whether these associations differed by sex. To verify the associations observed in the partial dependency plots, we further performed conventional Cox analyses to confirm these associations and assess sex differences.
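The averaging step behind a partial dependency plot can be sketched for any fitted model that returns a probability. This is a generic illustration (the authors used the R pdp package), the additive toy model in the test is hypothetical, and the LOESS smoothing of the resulting curve is not shown. Computing the curve from the men's and women's models over the same grid gives the sex comparison.

```python
from statistics import mean


def partial_dependence(predict_proba, rows, feature, grid):
    """Partial dependence of a fitted model on one predictor.

    For each grid value, the predictor at index `feature` is set to that value
    in every row, the other predictors keep their observed values, and the
    predicted probabilities are averaged across rows.
    """
    curve = []
    for value in grid:
        modified = [r[:feature] + (value,) + r[feature + 1:] for r in rows]
        curve.append((value, mean(predict_proba(r) for r in modified)))
    return curve
```

For an additive model, the curve's slope recovers that variable's coefficient; for a random forest, it traces whatever non-linear shape the forest has learned.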
Statistical analysis
Continuous variables were presented as medians with interquartile ranges, and categorical variables were presented as frequencies with percentages. Differences between groups were compared using the Kruskal–Wallis test for continuous variables and the chi-square test for categorical variables. The cumulative incidence of 10-year ASCVD and of each outcome component was calculated using Kaplan–Meier estimates and compared between men and women using the log-rank test. The performance of the PCE-predicted ASCVD probabilities and the random forest models was evaluated using the AUC and compared using DeLong's method.
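The cumulative incidence from Kaplan–Meier estimates is one minus the product-limit survival curve. A dependency-free sketch, assuming follow-up times in a common unit and an event indicator of 1 for ASCVD and 0 for censoring; as in a standard Kaplan–Meier analysis, deaths from other causes are treated as censored observations.

```python
def km_cumulative_incidence(times, events):
    """Return [(t, 1 - S(t))] at each distinct event time.

    times: follow-up time per individual; events: 1 = ASCVD, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        # Group ties: everyone with follow-up time t is at risk at t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk  # product-limit step
            curve.append((t, 1 - survival))
        at_risk -= n_leaving
    return curve
```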
The associations between the top ten important variables in the random forest model and the risk of 10-year ASCVD were examined using Cox proportional hazards analysis and reported as hazard ratios (HRs) with 95% confidence intervals (CIs). In the Cox analysis, BMI was categorized as < 18.5, ≥ 18.5 to < 25, ≥ 25 to < 30, and ≥ 30 kg/m2, based on the U-shaped relationship between BMI and 10-year ASCVD probabilities observed in the partial dependency plot. Multivariable Cox models were adjusted for the variables included in the PCE while avoiding multicollinearity. The proportional hazards assumption was evaluated using scaled Schoenfeld residual plots. Differences in risk between men and women were tested by including a sex interaction term in the Cox models.
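For clarity, the interaction test can be written as a Cox model with a product term. The 0/1 coding of sex (1 for women), the symbols, and the vector z standing for the PCE adjustment variables are illustrative assumptions; k is the increment at which an HR is reported (for example, 50 mg/dL for total cholesterol).

```latex
\begin{aligned}
h(t \mid x, s, \mathbf{z}) &= h_0(t)\exp\left(\beta_1 x + \beta_2 s + \beta_3\, x s + \boldsymbol{\gamma}^{\top}\mathbf{z}\right),
\qquad s = \begin{cases} 0, & \text{men} \\ 1, & \text{women} \end{cases} \\
\mathrm{HR}_{\text{men}}(k) &= e^{k\beta_1}, \qquad
\mathrm{HR}_{\text{women}}(k) = e^{k(\beta_1 + \beta_3)}, \qquad
H_0\colon \beta_3 = 0
\end{aligned}
```

Under this coding, the P-for-interaction by sex corresponds to a test of whether the product-term coefficient is zero, that is, whether the per-k hazard ratio is the same in both sexes.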
A two-tailed P < 0.05 was considered statistically significant. All analyses were performed using R version 3.3.0 (R Foundation for Statistical Computing, Vienna, Austria). The R package randomForest was used for model development, and the partial dependency plots were generated using the pdp package29.
Results
Baseline characteristics according to sex
Of the 258,279 individuals, 132,505 (51.3%) were men and 125,774 (48.7%) were women. Women were slightly older than men (56 vs. 55 years, P < 0.001) and had a lower BMI (Table 1). Men had higher systolic and diastolic BP (both P < 0.001) and a higher prevalence of hypertension (35.5% vs. 33.6%, P < 0.001), but antihypertensive medications were used less frequently by men than by women (25.8% vs. 29.8%, P < 0.001). The proportion of current smokers and the amount of alcohol consumption were markedly higher in men than in women (current smokers: 31.2% vs. 1.5%, P < 0.001; alcohol consumption: 5.7 vs. 0.0 g per day, P < 0.001). Diabetes mellitus was also more prevalent in men than in women (12.6% vs. 8.6%, P < 0.001), with higher fasting glucose levels. On the other hand, dyslipidemia was more frequent in women than in men (26.6% vs. 17.9%, P < 0.001), and both total cholesterol and LDL cholesterol levels were significantly higher in women. The PCE-predicted 10-year ASCVD probabilities were 7.6% in men and 2.6% in women (P < 0.001).
Cardiovascular outcomes according to sex
During the 10-year follow-up period, 12,319 individuals (4.8%) developed ASCVD, and the annualized rate of ASCVD in the entire cohort was 4.96 cases per 1000 person-years. The 10-year ASCVD events included 3413 myocardial infarctions (1.3%), 6951 heart failure events (2.7%), 1776 stroke events (0.7%), and 2115 cardiovascular deaths (0.8%) (Table 2). The incidence rate of ASCVD was significantly higher in men than in women (5.50 vs. 4.40 cases per 1000 person-years, P < 0.001). Men had a significantly higher incidence of myocardial infarction, heart failure, stroke, and cardiovascular death than women (all P < 0.050).
Performance of random forest model according to sex
Figure 1 shows the performance of the random forest model and the PCE-predicted ASCVD probabilities for predicting 10-year ASCVD. The PCE showed fair discrimination for 10-year ASCVD, with a higher AUC in women (men: AUC 0.727, 95% CI 0.715–0.738; women: AUC 0.762, 95% CI 0.750–0.774). Similarly, the random forest model achieved an AUC of 0.733 (95% CI 0.721–0.744) in men and a higher AUC of 0.769 (95% CI 0.757–0.782) in women. The performance of the PCE-predicted ASCVD probabilities and the random forest model was similar in both sexes (P-for-difference = 0.184 in men and 0.087 in women). Calibration plots of the random forest models are presented in Supplemental Fig. S1. The ASCVD probabilities predicted by the random forest models were generally similar to the observed probabilities, although the models tended to underestimate risk at high ASCVD probabilities in both men and women.
Relative variable importance in random forest model according to sex
In both sexes, age and BMI were the two most important predictors in the random forest models, with mean decreases in accuracy of 95.4 and 72.2, respectively, in men and 86.5 and 74.3, respectively, in women (Fig. 2). Waist circumference, systolic BP, diastolic BP, total cholesterol, triglyceride, LDL cholesterol, AST, and ALT ranked third to tenth in importance in both sexes (Fig. 2). Smoking, drinking, physical activity, fasting glucose, HDL cholesterol, and estimated glomerular filtration rate were of lower importance in both sexes. These ten variables consistently remained in the top ten when different hyperparameters were used in the random forest model (Supplemental Fig. S2).
Partial dependency plots of top ten important variables according to sex
Partial dependency plots, which show the adjusted relationship of each variable with the outcome, were generated from the random forest models for men and women (Fig. 3). The probabilities of ASCVD increased with age, and this trend was more prominent in women than in men (Fig. 3a). There was a U-shaped relationship between BMI and ASCVD probabilities in both sexes (Fig. 3b). ASCVD probabilities increased with higher waist circumference, more strongly in women (Fig. 3c). For systolic BP, ASCVD probabilities gradually increased beyond 140 mmHg in men, whereas in women they increased more steeply once systolic BP exceeded approximately 170 mmHg (Fig. 3d). The probabilities of ASCVD gradually increased with higher diastolic BP in both sexes (Fig. 3e). Higher total cholesterol and LDL cholesterol levels were more strongly associated with increased probabilities of ASCVD in men, whereas the association between triglyceride and ASCVD appeared stronger in women (Fig. 3f–h). Higher AST was more strongly associated with ASCVD probabilities in men (Fig. 3i). ASCVD probabilities increased with higher ALT in both sexes (Fig. 3j).
Associations of cardiovascular risk factors with 10-year ASCVD risk according to sex
In the univariable Cox analysis, older age; BMI < 18.5 kg/m2 or ≥ 30 kg/m2 (compared with BMI ≥ 18.5 to < 25 kg/m2); and higher waist circumference, systolic/diastolic BP, total cholesterol, triglyceride, and AST were associated with higher ASCVD risk in both sexes (all P < 0.050) (Supplemental Table S2). However, higher LDL cholesterol or ALT levels were associated with higher ASCVD risk only in men.
In the multivariable Cox analysis, older age was more strongly associated with a higher risk of 10-year ASCVD in women than in men (per 1-year increase, men: adjusted HR 1.09, 95% CI 1.09–1.10, P < 0.001; women: adjusted HR 1.11, 95% CI 1.10–1.11, P < 0.001; P-for-interaction by sex < 0.001) (Table 3). Higher waist circumference was more strongly associated with higher ASCVD risk in women (per 10-cm increase, men: adjusted HR 1.04, 95% CI 1.01–1.08, P = 0.015; women: adjusted HR 1.13, 95% CI 1.09–1.17, P < 0.001; P-for-interaction by sex < 0.001). In contrast, increases in total cholesterol and LDL cholesterol levels were significantly associated with higher ASCVD risk only in men (total cholesterol, per 50-mg/dL increase, men: adjusted HR 1.17, 95% CI 1.13–1.21, P < 0.001; women: adjusted HR 1.03, 95% CI 0.99–1.07, P = 0.164; P-for-interaction by sex < 0.001) (LDL cholesterol, per 20-mg/dL increase, men: adjusted HR 1.06, 95% CI 1.04–1.07, P < 0.001; women: adjusted HR 1.00, 95% CI 0.99–1.02, P = 0.885; P-for-interaction by sex < 0.001).
Discussion
This study demonstrated significant sex differences in cardiovascular risk factors and their associations with ASCVD probabilities in the general population by applying machine learning to large-scale nationwide health examination data. Our random forest models had fair performance in predicting 10-year ASCVD, with higher performance in women than in men. Importantly, the partial dependency plots demonstrated distinct sex-specific associations between these risk factors and 10-year ASCVD probabilities, which were also verified using Cox analysis. While the risk of ASCVD increased more strongly with higher total cholesterol and LDL cholesterol levels in men, older age and greater waist circumference were associated with higher ASCVD risk, especially in women (Fig. 4).
There are significant differences in the prevalence of cardiovascular risk factors according to sex. Men generally have more risk factors than women, including a higher prevalence of hypertension, smoking, and diabetes4,5, and this was also observed in our cohort. We also observed that the cumulative incidence of ASCVD was significantly higher in men than in women, with more frequent myocardial infarction events in men. Regarding pharmacologic treatment for ASCVD prevention, studies have shown that women may be less likely to be treated for dyslipidemia, whereas men may receive less antihypertensive treatment4,5,30,31,32. Patient perception of health status, patient-provider communication, and ASCVD-related quality of life may also differ substantially by sex32. Given these differences, sex-specific risk assessments and targeted prevention strategies are important to improve ASCVD outcomes.
While several previous studies have constructed machine learning models for predicting ASCVD, they rarely considered sex-specific models or evaluated sex-related differences using these models12,13,14,15,16. Interestingly, the performance of our random forest model was higher in women than in men (men: AUC 0.733 vs. women: AUC 0.769), and this was also observed for the PCE-predicted ASCVD probabilities (men: AUC 0.727 vs. women: AUC 0.762). This finding suggests a need to improve risk prediction specifically for each sex. It may also imply more complex patterns of, and interactions between, cardiovascular risk factors in men. Deep phenotyping, or clustering individuals into similar but mutually exclusive subgroups, may enable more accurate prediction of ASCVD risk in men. The results also suggest that incorporating variables beyond traditional cardiovascular risk factors may help improve risk prediction for each sex. Genetic information, serum biomarkers, and socioeconomic factors, including income, education level, and relationship status, contribute to the development of ASCVD and have the potential to improve predictive ability33,34. In women, menopausal status and gestational diabetes are independent predictors of ASCVD35. Future studies are required to test these possibilities.
To enhance interpretability, we further investigated how each variable is associated with ASCVD probabilities using partial dependency plots. The partial dependency plots for age showed that the adverse effect of aging on the risk of ASCVD was stronger in women than in men. The loss of the cardioprotective effect of estrogen, especially after menopause, may be one reason for the heightened risk associated with aging in women. Estrogen plays an important role in maintaining cardiac structure and function by reducing oxidative stress, preserving endothelial function, and preventing the accumulation of myocardial fibrosis36. Studies have shown that cardiovascular mortality rates increase more steeply with age in women than in men, especially after 45–64 years of age37. Importantly, earlier onset of natural menopause is associated with a higher risk of ASCVD, further supporting the concept of female vulnerability related to estrogen withdrawal35,38. However, our data did not contain information on menopausal status or estrogen levels, and future studies are warranted to clarify the role of estrogen and menopause in the sex-specific association between age and risk of ASCVD.
We observed a U-shaped association between BMI and ASCVD probability in both sexes. Recent studies have demonstrated that underweight is a robust risk factor for adverse cardiovascular events, including heart failure, cardiovascular mortality, and all-cause mortality39,40. The exact mechanism underlying the association between underweight and cardiovascular events is not yet fully understood. However, it is plausible to speculate that underweight may be indicative of malnutrition status or sarcopenia, both of which have significant implications for cardiovascular health. Our findings indicate that underweight is a significant risk factor in both men and women, highlighting the importance of optimizing body weight for the prevention of ASCVD.
Our study provides comprehensive information on sex differences in cardiovascular risk factors and their association with the risk of ASCVD, which supports targeted primary prevention based on individualized, sex-specific risk assessment. Our findings imply that more active lipid-lowering therapy may benefit men, whereas control of abdominal obesity may be more crucial in women. However, more conclusive data demonstrating the benefits of such targeted interventions are ultimately required, and future studies are warranted to test this hypothesis and improve cardiovascular outcomes in clinical practice.
Limitations
Our study has several limitations. First, although our analysis showed associations between cardiovascular risk factors and outcomes, it does not demonstrate cause-and-effect relationships. Potential biological and sociodemographic confounders (e.g., income status) may have influenced the study results, a limitation inherent to all observational cohort studies. Second, some risk factors, such as BP, may change during follow-up, and the change or variability of risk factors was not considered in our analysis. Third, the random forest model is not designed to account for time-to-event survival data. Although the random survival forest algorithm can provide risk prediction from this type of data, we were unable to implement it because of its lengthy computation time with large sample sizes. Lastly, our study included only individuals of Korean ethnicity, which may limit the generalizability of our results to other populations. Notably, smoking and alcohol consumption were markedly lower in women than in men, consistent with previous research in Korea41,42. Therefore, further research is needed to determine whether our findings apply to other ethnic groups.
Conclusion
In conclusion, we developed sex-specific machine learning models for predicting ASCVD events and investigated the associations between cardiovascular risk factors and ASCVD events in a large general-population cohort using routine health examination data. While higher total cholesterol or LDL cholesterol levels were more strongly associated with the risk of ASCVD in men than in women, older age and greater waist circumference were more strongly associated with the risk of ASCVD in women.
Data availability
The data created and/or used during this study are not publicly available under NHIS policy. Researchers can submit an application form through the NHIS website (https://nhiss.nhis.or.kr) to access and analyze the database.
Abbreviations
- ASCVD: Atherosclerotic cardiovascular disease
- BMI: Body mass index
- BP: Blood pressure
- CI: Confidence interval
- HDL: High-density lipoprotein
- HR: Hazard ratio
- ICD-10: International Classification of Diseases, Tenth Revision
- LDL: Low-density lipoprotein
- NHIS: National Health Insurance Service
- PCE: Pooled cohort equations
References
Roth, G. A. et al.Ā Global burden of cardiovascular diseases and risk factors, 1990ā2019: Update from the GBD 2019 study. J. Am. Coll. Cardiol. 76, 2982ā3021 (2020).
Arnett, D. K. et al. 2019 ACC/AHA guideline on the primary prevention of cardiovascular disease: Executive summary: A report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines. J. Am. Coll. Cardiol. 74, 1376ā1414 (2019).
Lloyd-Jones, D. M. et al. Use of risk assessment tools to guide decision-making in the primary prevention of atherosclerotic cardiovascular disease: A special report from the American Heart Association and American College of Cardiology. J. Am. Coll. Cardiol. 73, 3153ā3167 (2019).
Peters, S. A. E., Muntner, P. & Woodward, M. Sex differences in the prevalence of, and trends in, cardiovascular risk factors, treatment, and control in the United States, 2001 to 2016. Circulation 139, 1025ā1035 (2019).
Pinho-Gomes, A. C., Peters, S. A. E., Thomson, B. & Woodward, M. Sex differences in prevalence, treatment and control of cardiovascular risk factors in England. Heart 107, 462ā467 (2021).
Millett, E. R. C., Peters, S. A. E. & Woodward, M. Sex differences in risk factors for myocardial infarction: Cohort study of UK Biobank participants. BMJ 363, k4247 (2018).
Albrektsen, G. et al. Lifelong gender gap in risk of incident myocardial infarction: The TromsĆø study. JAMA Intern. Med. 176, 1673ā1679 (2016).
Bots, S. H., Peters, S. A. E. & Woodward, M. Sex differences in coronary heart disease and stroke mortality: A global assessment of the effect of ageing between 1980 and 2010. BMJ Glob. Health 2, e000298 (2017).
Jousilahti, P., Vartiainen, E., Tuomilehto, J. & Puska, P. Sex, age, cardiovascular risk factors, and coronary heart disease: A prospective follow-up study of 14 786 middle-aged men and women in Finland. Circulation 99, 1165ā1172 (1999).
Mikkola, T. S., Gissler, M., Merikukka, M., Tuomikoski, P. & Ylikorkala, O. Sex differences in age-related cardiovascular mortality. PLoS ONE 8, e63347 (2013).
Leening, M. J. et al. Sex differences in lifetime risk and first manifestation of cardiovascular disease: Prospective population based cohort study. BMJ 349, g5992 (2014).
Cho, S. Y. et al. Pre-existing and machine learning-based models for cardiovascular risk prediction. Sci. Rep. 11, 8886 (2021).
Weng, S. F., Reps, J., Kai, J., Garibaldi, J. M. & Qureshi, N. Can machine-learning improve cardiovascular risk prediction using routine clinical data?. PLoS ONE 12, e0174944 (2017).
Rousset, A. et al. Can machine learning bring cardiovascular risk assessment to the next level? A methodological study using FOURIER trial data. Eur. Heart J. Digit. Health 3, 38–48 (2021).
Steinfeldt, J. et al. Neural network-based integration of polygenic and clinical information: Development and validation of a prediction model for 10-year risk of major adverse cardiac events in the UK Biobank cohort. Lancet Digit. Health 4, e84–e94 (2022).
Nakanishi, R. et al. Machine learning adds to clinical and CAC assessments in predicting 10-year CHD and CVD deaths. JACC Cardiovasc. Imaging 14, 615–625 (2021).
Choi, E.-K. Cardiovascular research using the Korean national health information database. Korean Circ. J. 50, 754–772 (2020).
Lee, J., Lee, J. S., Park, S. H., Shin, S. A. & Kim, K. Cohort profile: The national health insurance service-national sample cohort (NHIS-NSC), South Korea. Int. J. Epidemiol. 46, e15 (2017).
Craig, C. L. et al. International physical activity questionnaire: 12-Country reliability and validity. Med. Sci. Sports Exerc. 35, 1381–1395 (2003).
Goff, D. C. Jr. et al. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: A report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Circulation 129, S49-73 (2014).
Touw, W. G. et al. Data mining in the life sciences with random forest: A walk in the park or lost in the jungle?. Brief Bioinform. 14, 315–326 (2013).
Li, J. et al. A multicenter random forest model for effective prognosis prediction in collaborative clinical research network. Artif. Intell. Med. 103, 101814 (2020).
Strobl, C., Boulesteix, A. L., Zeileis, A. & Hothorn, T. Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinform. 8, 25 (2007).
Couronné, R., Probst, P. & Boulesteix, A. L. Random forest versus logistic regression: a large-scale benchmark experiment. BMC Bioinform. 19, 270 (2018).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Austin, A. M. et al. Using a cohort study of diabetes and peripheral artery disease to compare logistic regression and machine learning via random forest modeling. BMC Med. Res. Methodol. 22, 300 (2022).
Kwak, S. et al. Markers of myocardial damage predict mortality in patients with aortic stenosis. J. Am. Coll. Cardiol. 78, 545–558 (2021).
Friedman, J. H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 29, 1189–1232 (2001).
Greenwell, B. M. pdp: An R package for constructing partial dependence plots. R J. 9, 421ā436 (2017).
O'Meara, J. G. et al. Ethnic and sex differences in the prevalence, treatment, and control of dyslipidemia among hypertensive adults in the GENOA study. Arch. Intern. Med. 164, 1313–1318 (2004).
Choi, H. M., Kim, H. C. & Kang, D. R. Sex differences in hypertension prevalence and control: Analysis of the 2010–2014 Korea National Health and Nutrition Examination Survey. PLoS ONE 12, e0178334 (2017).
Okunrintemi, V. et al. Gender differences in patient-reported outcomes among adults with atherosclerotic cardiovascular disease. J. Am. Heart Assoc. 7, e010498 (2018).
Colantonio, L. D. et al. Performance of the atherosclerotic cardiovascular disease pooled cohort risk equations by social deprivation status. J. Am. Heart Assoc. 6, e005676 (2017).
Yin, X. et al. Protein biomarkers of new-onset cardiovascular disease: Prospective study from the systems approach to biomarker research in cardiovascular disease initiative. Arterioscler. Thromb. Vasc. Biol. 34, 939–945 (2014).
El Khoudary, S. R. et al. Menopause transition and cardiovascular disease risk: Implications for timing of early prevention: A scientific statement from the American Heart Association. Circulation 142, e506–e532 (2020).
Iorga, A. et al. The protective role of estrogen and estrogen receptors in cardiovascular disease and the controversial use of estrogen therapy. Biol. Sex Differ. 8, 33 (2017).
Merz, A. A. & Cheng, S. Sex differences in cardiovascular ageing. Heart 102, 825 (2016).
Muka, T. et al. Association of age at onset of menopause and time since onset of menopause with cardiovascular outcomes, intermediate vascular traits, and all-cause mortality: A systematic review and meta-analysis. JAMA Cardiol. 1, 767–776 (2016).
Held, C. et al. Body mass index and association with cardiovascular outcomes in patients with stable coronary heart disease – A STABILITY substudy. J. Am. Heart Assoc. 11, e023667 (2022).
Lee, H. J. et al. Age-dependent associations of body mass index with myocardial infarction, heart failure, and mortality in over 9 million Koreans. Eur. J. Prev. Cardiol. 29, 1479–1488 (2022).
Kim, I. et al. Comparison of district-level smoking prevalence and their income gaps from two national databases: The national health screening database and the community health survey in Korea, 2009–2014. J. Korean Med. Sci. 33, e44 (2018).
Kim, S. Y. & Kim, H. J. Trends in alcohol consumption for Korean adults from 1998 to 2018: Korea national health and nutritional examination survey. Nutrients 13, 609 (2021).
Funding
Supported by Grant No. 0420213040 from the SNUH Research Fund.
Author information
Contributions
S.Kwak, H.J.L. and S.Kim planned and designed the current study, performed the statistical analysis, and drafted the manuscript. J.B.P., S.P.L., and H.K.K. reviewed/edited the manuscript and contributed to the discussion. Y.J.K. supervised all aspects of the study and revised the manuscript critically. H.J.L. and S.Kim had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kwak, S., Lee, HJ., Kim, S. et al. Machine learning reveals sex-specific associations between cardiovascular risk factors and incident atherosclerotic cardiovascular disease. Sci Rep 13, 9364 (2023). https://doi.org/10.1038/s41598-023-36450-4
DOI: https://doi.org/10.1038/s41598-023-36450-4