Establishment of precise prevention strategies for the occurrence and progression of coronary atherosclerotic heart disease using machine learning

Background Coronary atherosclerotic heart disease (CHD) is highly prevalent in Northwest China; however, effective preventive measures are limited. This study aimed to develop metabolic risk models tailored for the primary and secondary prevention of CHD in Northwest China. Methods This hospital-based cross-sectional study included 744 patients who underwent coronary angiography. Data on demographic characteristics, comorbidities, and serum biochemical indices of the participants were collected. Three machine learning algorithms—recursive feature elimination, random forest, and least absolute shrinkage and selection operator—were employed to construct risk models. Model validation was performed using receiver operating characteristic and calibration curves, and the optimal cutoff values for significant risk factors were determined. Results The predictive model for CHD onset included sex, overweight/obesity, and hemoglobin A1c (HbA1c) levels. For CHD progression to multiple coronary artery disease, the model included age, total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), and HbA1c levels. The model predicting an increased coronary Gensini score included sex, overweight/obesity, TC, LDL-C, high-density lipoprotein cholesterol, lipoprotein(a), and HbA1c levels. Notably, the optimal cutoff values for HbA1c and lipoprotein(a) for determining CHD progression were 6 % and 298 mg/L, respectively. Conclusions Robust metabolic risk models were established, offering significant value for both the primary and secondary prevention of CHD in Northwest China. Weight loss, strict hyperglycemic control, and improvement in dyslipidemia may help prevent or delay the occurrence and progression of CHD in this region.


Introduction
Cardiometabolic diseases (CMDs), including type 2 diabetes (T2D) and cardiovascular disease (CVD), are the primary contributors to global mortality and morbidity [1]. Coronary atherosclerotic heart disease (CHD) is a major component of the global CVD burden [2] and serves as a crucial clinical phenotype of CMDs. In the United States, an estimated 20.1 million individuals have CHD, while 11.1 million Americans have chronic stable angina pectoris [3]. In China, the overall prevalence of CVD significantly increased by 14.7 % from 1990 to 2016, and CVD remains the leading cause of death [4].
The spatial patterns of the mortality and prevalence of CVD and its main subcategories, such as ischemic heart disease (IHD), vary significantly across China [4]. The gap in the relative burden of CVD between provinces widened from 1990 to 2016, with a more rapid decline observed in economically developed provinces [4]. From 1990 to 2015, 22 of 33 provinces experienced an increase in age-standardized mortality from IHD, with eight provinces experiencing an increase of over 30 % [5]. In particular, Qinghai Province in Northwest China saw a 54 % increase in IHD mortality and a 279 % increase in IHD deaths. However, 11 provinces showed a decreasing trend in IHD mortality, predominantly in economically developed regions [5]. The variations in CHD incidence and mortality rates, as well as the diverse patterns of change over short periods, suggest the potential for effective prevention strategies. Exposure to cardiovascular risk factors, particularly metabolic disorders, has increased in China. Notably, the prevalence of hypertension increased from 7.7 % in 1980 to 27.5 % in 2018, whereas that of diabetes increased from 0.67 % in 1980 to 11.2 % in 2017 [6]. The 2015 China Adult Chronic Disease and Nutrition Surveillance Project indicated elevated cholesterol and triglyceride levels in Chinese adults compared with those in 2002 [7]. We hypothesized that the high incidence of CHD in Northwest China is strongly linked to specific metabolic risk factors. This study aimed to develop accurate metabolic risk models for CHD occurrence and progression in Northwest China and to determine the optimal cutoff values for key metabolic risk factors. Our findings will contribute to the development of effective CHD prevention strategies to reduce the burden of CMDs in this region.

Study design and participants
This cross-sectional study was conducted at a single center. We randomly selected 1000 individuals who presented with precordial discomfort and underwent coronary angiography at the Department of Cardiovascular Medicine of the First Affiliated Hospital of Xi'an Jiaotong University between January 2022 and December 2022. This hospital is a major tertiary institution located in Northwest China and is renowned for its clinical practice, teaching, and scientific research activities.
The study included adult participants with complete clinical information who underwent coronary angiography. The exclusion criteria were as follows: (1) patients with a history of coronary artery bypass grafting or percutaneous coronary intervention; (2) patients with malignant tumors; and (3) patients who had previously been treated for thyroid disease with oral methimazole, propylthiouracil, amiodarone, thyroxine tablets, thyroid radioiodine-131 therapy, or thyroidectomy. Ultimately, 744 participants aged 23-90 years were included. Of these participants, 507 (68 %) were men, and 570 (77 %) were diagnosed with CHD. Among patients with CHD, 418 (73 %) had multiple coronary artery disease (CAD). Detailed information on the clinical and demographic characteristics of the study participants is presented in Tables 1 and 2.

Observed variables
A total of 28 variables were included as candidate risk factors for CHD onset and progression based on previous literature and expert opinions. The variables were classified into three categories: demographic features, clinical variables, and blood biochemical (metabolic) indicators. The demographic features of the study participants included sex, age, body mass index (BMI), overweight or obese status, smoking status, alcohol consumption status, and family history of CHD. The clinical variables included a history of hypertension, history of diabetes, systolic blood pressure (SBP), diastolic blood pressure, aspirin use, statin use, and presence of chronic kidney disease (estimated glomerular filtration rate [eGFR] <60 mL/min/1.73 m²). Additionally, we included blood biochemical indicators that may contribute to the risk assessment of CHD onset and progression. These biochemical indicators included hemoglobin A1c (HbA1c), eGFR, serum levels of total cholesterol (TC), triglycerides, low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), lipoprotein(a) (Lp(a)), and plasma fibrinogen levels. Indicators of thyroid function, including thyroxine, triiodothyronine, free thyroxine, free triiodothyronine (FT3), and hypersensitive thyroid-stimulating hormone levels, were also included in this study. HbA1c levels were measured using high-performance liquid chromatography, and plasma fibrinogen levels were assessed using the Clauss coagulation method. Serum triglyceride and TC levels were determined enzymatically using a colorimetric method, while HDL-C and LDL-C levels were measured directly. Serum Lp(a) levels were determined using an immunoturbidimetric method. The five thyroid function indices were measured using a chemiluminescence assay. eGFR was calculated using the Chronic Kidney Disease Epidemiology Collaboration equation [8].

Outcome variables
All patients underwent coronary angiography via the right radial artery. CHD was diagnosed when stenosis was ≥50 % in any of the main branches of the coronary artery, including the left anterior descending, left circumflex, and right coronary arteries. Stenosis of less than 50 % was considered non-CHD. Patients with CHD were further categorized into single- and multi-vessel CAD groups. Single-vessel CAD was defined as ≥50 % stenosis in one major coronary artery, while multi-vessel CAD was defined as ≥50 % stenosis in at least two of the left anterior descending, left circumflex, and right coronary arteries. Lesions with ≥50 % stenosis in the left main coronary artery were also defined as multi-vessel CAD [9]. In addition, patients with CHD were divided into two groups based on the median value of their coronary Gensini score: those with a low Gensini score (≤53) and those with a high Gensini score (>53). The Gensini score was calculated by summing the basic scores for each blood vessel multiplied by the corresponding scoring coefficient [10].
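The outcome definitions above can be expressed as a small labelling routine. The following is a minimal sketch rather than the procedure actually used in the study: the vessel keys (`LM`, `LAD`, `LCX`, `RCA`) and function names are hypothetical, and a stenosis of exactly 50 % is assumed to count as significant.

```python
from typing import Dict

STENOSIS_THRESHOLD = 50.0  # percent stenosis; assumed inclusive (>= 50 %)
GENSINI_MEDIAN = 53.0      # cohort median Gensini score reported in the text

# Hypothetical keys for the three major coronary arteries
MAJOR_VESSELS = ("LAD", "LCX", "RCA")


def classify_chd(stenosis: Dict[str, float]) -> str:
    """Label a patient from per-vessel percent stenosis (missing vessel = 0 %)."""
    # Left main stenosis >= 50 % is treated as multi-vessel CAD
    if stenosis.get("LM", 0.0) >= STENOSIS_THRESHOLD:
        return "multi-vessel CAD"
    # Count major vessels with significant stenosis
    n_diseased = sum(
        1 for vessel in MAJOR_VESSELS
        if stenosis.get(vessel, 0.0) >= STENOSIS_THRESHOLD
    )
    if n_diseased == 0:
        return "non-CHD"
    return "single-vessel CAD" if n_diseased == 1 else "multi-vessel CAD"


def gensini_group(score: float) -> str:
    """Split patients with CHD at the median Gensini score (<= 53 is low)."""
    return "low" if score <= GENSINI_MEDIAN else "high"


print(classify_chd({"LAD": 70.0, "LCX": 30.0}))  # single-vessel CAD
print(gensini_group(60.0))                       # high
```

The Gensini score itself is taken as an input here, because the basic scores and scoring coefficients are defined in reference [10] and are not reproduced in this paper.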

Statistical analysis
Machine learning models using random forest (RF), least absolute shrinkage and selection operator (LASSO), and recursive feature elimination (RFE) algorithms were fitted to evaluate the performance of multiple clinical and demographic variables as potential risk indicators of CHD occurrence and progression. A flowchart of the analysis is shown in Fig. 1.
Data from patients with CHD were randomly split into training and validation sets in a ratio of 7:3. Univariate logistic regression was used to screen risk factors in the training set, which were then optimized using RF, LASSO, and RFE. The support vector machine (SVM) method was adopted [11] for the training set to construct a diagnostic classifier for determining the occurrence and progression of CHD. Nomograms were constructed to predict the probability of CHD development. The model performance was evaluated using receiver operating characteristic (ROC) curves, calibration curves, area under the curve (AUC), C-index, and Youden's index for optimal cutoff values.
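The 7:3 random split can be illustrated as follows. This is a sketch under stated assumptions: the analysis was performed in R, the seed and function name below are hypothetical, and the paper does not state whether the split was stratified by outcome (the sketch is not).

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_validation(
    records: Sequence[T],
    train_fraction: float = 0.7,  # 7:3 ratio from the text
    seed: int = 2022,             # hypothetical seed, for reproducibility only
) -> Tuple[List[T], List[T]]:
    """Shuffle the records once and cut them into training and validation sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 570 patients with CHD, represented here by placeholder identifiers
patient_ids = list(range(570))
train, valid = split_train_validation(patient_ids)
print(len(train), len(valid))  # 399 171
```

Fixing the seed makes the partition repeatable, so that feature screening and validation always see the same patients.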
The Kolmogorov-Smirnov test was used to assess normality. Normally distributed continuous variables are presented as mean ± standard deviation (SD) and compared using t-tests. Non-normally distributed variables are presented as the median and interquartile range and compared using the Mann-Whitney U test. Categorical variables are expressed as percentages and compared using the chi-square test. Statistical significance was set at P < 0.05. Analyses were conducted using the R software (v3.6.1).

Participant characteristics
In total, 744 participants were recruited, including 570 with CHD and 174 without CHD. Table 1 summarizes the patients' clinical and demographic characteristics. Among the 28 features studied, 12 differed significantly between the two groups defined by CHD status. Among patients with CHD, 10 features differed significantly between those with and without multiple CADs. Similarly, 10 features differed significantly among patients with CHD grouped by Gensini score (Table 2).

Screening for potential risk factors
In the training set, univariate logistic regression was used to screen for risk factors significantly associated with each outcome. We screened five risk factors significantly associated with CHD onset: sex, overweight/obesity, tobacco use, HDL-C, and HbA1c; five risk factors significantly associated with multiple CADs: age, SBP, TC, LDL-C, and HbA1c levels; and eight risk factors significantly associated with increased Gensini scores: sex, overweight/obesity, TC, HDL-C, LDL-C, Lp(a), HbA1c, and FT3 levels. The odds ratios of the selected features for the three outcomes are presented in Supplemental Table S1.
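For a single binary predictor such as sex or overweight/obesity, the odds ratio from univariate logistic regression equals the cross-product ratio of the 2×2 table, and its Wald confidence interval equals the interval based on the standard error of the log odds ratio. The sketch below illustrates this special case only. The counts are hypothetical and are not taken from Supplemental Table S1; continuous predictors such as HbA1c require an iteratively fitted model.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and 95 % CI from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only
or_value, ci_low, ci_high = odds_ratio_ci(a=40, b=10, c=20, d=30)
print(f"OR = {or_value:.2f} (95 % CI {ci_low:.2f}-{ci_high:.2f})")
```

A predictor would pass this screening step when its confidence interval excludes 1, which corresponds to P < 0.05 for the Wald test.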

Identification of optimal risk factors
Based on the univariate logistic regression analysis, we utilized the RF, LASSO, and RFE algorithms to refine risk factor screening for the three outcomes: CHD, multiple CADs, and Gensini score. For CHD, these algorithms identified overlapping clinical factors such as sex, overweight/obesity, and HbA1c levels. Similarly, for multiple CADs, the optimized factors included age, TC, LDL-C, and HbA1c levels. Seven clinical factors were identified for the Gensini score: sex, overweight/obesity, TC, HDL-C, LDL-C, Lp(a), and HbA1c levels. The algorithm parameters are presented in Fig. 2.
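The overlap step, shown as Venn diagrams in Fig. 2, amounts to intersecting the feature sets retained by the three algorithms. The sketch below assumes hypothetical per-algorithm selections for CHD onset, since the paper reports only the five univariate candidates and the final overlapping set.

```python
from typing import Dict, Set


def consensus_features(selected: Dict[str, Set[str]]) -> Set[str]:
    """Return the features retained by every algorithm."""
    feature_sets = list(selected.values())
    if not feature_sets:
        return set()
    return set.intersection(*feature_sets)


# Hypothetical selections; only their intersection matches the reported result
chd_onset = {
    "RF": {"sex", "overweight/obesity", "HbA1c", "HDL-C"},
    "LASSO": {"sex", "overweight/obesity", "HbA1c", "tobacco use"},
    "RFE": {"sex", "overweight/obesity", "HbA1c"},
}
print(sorted(consensus_features(chd_onset)))
```

Requiring agreement among all three algorithms is a conservative choice: a feature favored by only one method is dropped from the final model.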

Development and validation of SVM diagnostic classification models
Furthermore, we established three SVM diagnostic classification models for predicting CHD risk, multiple CADs, and increased CAD Gensini scores and plotted their respective nomograms in the training set (Fig. 3A, D, G). Calibration plots were drawn, and the C-index was calculated to evaluate the predictive ability of the three models. For the CHD occurrence model, the C-index was 0.793 in the training set (Fig. 3B) and 0.754 in the validation set (Fig. 3C). For the multiple CAD model, the C-index was 0.763 in the training set (Fig. 3E) and 0.709 in the validation set (Fig. 3F). For the model with an increased Gensini score, the C-index was 0.772 in the training set (Fig. 3H) and 0.704 in the validation set (Fig. 3I).
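A calibration plot compares predicted probabilities with observed event rates. One common way to obtain the plotted points, sketched below, is to sort patients by predicted probability, divide them into equal-sized groups, and compute the mean prediction and the observed proportion in each group. The grouping scheme, function name, and data are assumptions; the paper does not describe how its calibration curves were computed.

```python
from typing import List, Sequence, Tuple


def calibration_bins(
    probs: Sequence[float],
    outcomes: Sequence[int],
    n_bins: int = 5,
) -> List[Tuple[float, float, int]]:
    """Return (mean predicted probability, observed event rate, group size) per bin."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    bins = []
    for k in range(n_bins):
        # Equal-sized groups of patients ordered by predicted risk
        chunk = pairs[k * n // n_bins:(k + 1) * n // n_bins]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        bins.append((mean_pred, observed, len(chunk)))
    return bins


# Hypothetical predictions and outcomes (1 = event, 0 = no event)
example = calibration_bins([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1], n_bins=2)
for mean_pred, observed, size in example:
    print(f"predicted {mean_pred:.2f}, observed {observed:.2f}, n = {size}")
```

In a well-calibrated model, the points lie close to the diagonal where the predicted probability equals the observed rate.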

Model performance
The ROC curves were constructed to evaluate the effectiveness of each model on the training and validation sets. For the outcome classified according to CHD onset, the ability of the model established by combining sex, overweight/obesity, and HbA1c levels to identify CHD was significantly greater than that of any single factor. The AUC values for this model were 0.731 and 0.716 for the training (Fig. 4A) and validation sets (Fig. 4B), respectively. For the outcome classified by multiple CADs, the ability of the model established using age, TC, LDL-C, and HbA1c levels to identify CHD progression was significantly greater than that of any single factor, with AUC values of 0.804 in the training set (Fig. 4C) and 0.756 in the validation set (Fig. 4D). For the outcome classified by the Gensini score, the ability of the model established using sex, overweight/obesity, TC, HDL-C, LDL-C, Lp(a), and HbA1c levels to identify CHD progression was significantly greater than that of any single factor, with AUC values of 0.841 in the training set (Fig. 4E) and 0.809 in the validation set (Fig. 4F). The AUC values of each ROC curve and the corresponding 95 % confidence intervals are shown in Supplemental Table S2.
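The AUC can be read as the probability that a randomly chosen patient with the outcome receives a higher model score than a randomly chosen patient without it, with ties counted as one half. The sketch below computes the AUC directly from this definition. The scores and labels are hypothetical, and the pairwise loop is intended for illustration rather than for large datasets.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores: three of the four pairs are ranked correctly
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

An AUC of 0.5 corresponds to ranking no better than chance, and 1.0 corresponds to perfect separation of the two groups.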

Optimal cutoff values for the single-indicator model with HbA1c or Lp(a) levels
The performance of the single-indicator models, including HbA1c and Lp(a) levels, was examined to predict the occurrence and progression of CHD. The optimal cutoff values for HbA1c were 5.95 %, 6.05 %, and 6.15 % for CHD risk, multiple CADs, and increased CAD Gensini scores, respectively (Fig. 5A, C, E), with AUC values of 0.585, 0.603, and 0.596, respectively (Fig. 5B, D, F). For Lp(a), the optimal cutoff value was 298 mg/L (Fig. 5G), with an AUC of 0.548 (Fig. 5H). The sensitivity and specificity of these models are summarized in Supplemental Table S3.
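Youden's index is defined as J = sensitivity + specificity - 1, and the optimal cutoff is the threshold that maximizes J. The sketch below scans each observed value as a candidate threshold and classifies a patient as positive when the marker is at or above it. The HbA1c values and outcomes are hypothetical, and the choice of observed values as candidate thresholds is an assumption, not necessarily the convention used to obtain the cutoffs in Fig. 5.

```python
from typing import Sequence, Tuple


def youden_cutoff(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float, float, float]:
    """Return (cutoff, J, sensitivity, specificity) for the cutoff maximizing J."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")
    best = None
    for cutoff in sorted(set(values)):
        # Positive test: marker value at or above the candidate cutoff
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= cutoff)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < cutoff)
        sensitivity = tp / n_pos
        specificity = tn / n_neg
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sensitivity, specificity)
    return best


# Hypothetical HbA1c values (%) and outcomes (1 = event)
hba1c = [5.0, 5.4, 5.7, 6.0, 6.1, 6.3, 6.6, 7.1]
event = [0, 0, 0, 0, 1, 0, 1, 1]
cutoff, j, sens, spec = youden_cutoff(hba1c, event)
print(f"cutoff = {cutoff}, J = {j:.2f}, sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

When several thresholds share the same J, this sketch keeps the lowest one; other tie-breaking rules are equally defensible.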

Discussion
To the best of our knowledge, our study is the first to use three machine learning algorithms to investigate the metabolic risk factors influencing the initiation and advancement of CHD in Northwest China. This study presents four major findings. First, we developed a metabolic risk model for CHD onset, with predictors including sex, overweight/obesity, and HbA1c levels. Second, we established a metabolic risk model to evaluate CHD progression to multiple CADs, with predictors including age, TC, LDL-C, and HbA1c levels. Third, we developed a metabolic risk model to assess CHD progression to an increased CAD Gensini score, with predictors including sex, overweight/obesity, TC, LDL-C, HDL-C, Lp(a), and HbA1c levels. Fourth, the optimal cutoff values of HbA1c and Lp(a) levels for determining CHD progression were approximately 6.0 % and 298 mg/L, respectively.
Hyperglycemia is a significant metabolic risk factor for atherosclerotic cardiovascular disease (ASCVD) [12]. The growing prevalence of diabetes has led to an increase in CMDs and related deaths in China, particularly in the Northwest region [6]. This study found that higher HbA1c levels increased the risk of CHD occurrence and progression, and the HbA1c level was the most significant factor in the assessment model for CHD, emphasizing the importance of managing hyperglycemia in developing prevention strategies for CHD in this region. Furthermore, this study found that HbA1c levels of >6 % increased the risk of CHD, highlighting the need for routine HbA1c screening, early detection of prediabetes/diabetes, and maintenance of HbA1c levels of <6 % to reduce the risk of CHD. Multiple clinical trials have shown that lifestyle interventions coupled with reduced-calorie meal plans effectively prevent or delay T2D [13][14][15] and improve cardiometabolic markers [16]. The 2024 American Diabetes Association guideline recommends lifestyle changes, including a healthy diet and ≥150 min/week of moderate-intensity physical activity, to reduce weight by at least 7 % for adults with overweight/obesity who are at a high risk of T2D [12]. Notably, achieving lower HbA1c levels safely without hypoglycemia is also beneficial [17]. However, cardiovascular event rates remain high in patients with T2D despite good glycemic control [18]; thus, glucose-lowering medications with proven cardiovascular benefits are recommended for patients with CHD and T2D [19,20].
This study underscores the importance of addressing overweight and obesity in Northwest China to reduce the risk of CHD development and progression. Compared to normal-weight individuals, patients with obesity experience chronic coronary disease (CCD) events at an earlier age, live with CCD for a greater proportion of their lifetime, and have a shorter average life span [21]. Excess adiposity accelerates atherosclerosis and promotes adverse changes in cardiac function by exerting deleterious effects on the myocardium, vasculature, and obesity-related comorbidities, such as hypertension, dyslipidemia, and T2D [22,23]. Compared with those in the eastern coastal areas, people in Northwest China consume more carbohydrates and fewer vegetables and fruits and exercise less outdoors owing to colder winter weather. These habits may lead to a high incidence of overweight or obesity in this region, further increasing the risk of CHD. The latest guidelines recommend assessing BMI, with or without waist circumference, during routine clinical follow-up in patients with CCD [20]. For patients who require pharmacological therapy for further weight reduction, drug therapies can be effective alongside counseling regarding diet and physical activity [24]. In patients with CCD and severe obesity who have not met weight loss goals with lifestyle and pharmacological intervention and have acceptable surgical risk, referral for a bariatric procedure is reasonable for weight loss and cardiovascular risk reduction [25,26]. Therefore, altering unhealthy dietary and behavioral patterns in Northwest China and adopting beneficial lifestyles and dietary patterns recommended by the guidelines can minimize the risk of overweight or obesity, thus reducing the burden of CMDs.
This study highlights the significant role of elevated LDL-C, TC, and Lp(a) levels in promoting CAD progression in Northwest China, suggesting that inadequate lipid management during the secondary prevention of CHD could be a primary factor contributing to the high burden of CHD in this region. Importantly, elevated LDL-C levels are major contributors to ASCVD [27]. The cornerstone of managing serum cholesterol levels is promoting a healthy lifestyle throughout one's lifetime [28]. Even individuals with a genetic predisposition to CHD can reduce their risk by up to 50 % through lifestyle modifications [29]. Strategies such as maintaining normal weight and blood sugar levels, reducing the intake of simple sugars and refined carbohydrates, and increasing physical activity can improve lipid profiles and provide additional health benefits [30]. Currently, the medications available for LDL-C reduction include statins, ezetimibe, proprotein convertase subtilisin/kexin type 9 inhibitors, and inclisiran [31].
Furthermore, this study is the first to demonstrate that serum Lp(a) levels >298 mg/L contribute to increased CAD Gensini scores in patients with CHD. High Lp(a) levels are known to cause ASCVD and cardiovascular and all-cause mortality in both men and women and in ethnically diverse populations [32]. The mechanisms by which Lp(a) increases the risk of ASCVD are diverse. First, Lp(a) particles contain apolipoprotein B, similar to other apolipoprotein B-containing particles such as LDL, conferring atherogenic properties. Second, Lp(a) serves as a significant carrier of oxidized phospholipids, which are linked to damage and capable of triggering pro-inflammatory responses. Finally, it is hypothesized that apolipoprotein(a) partially and selectively binds to endothelial extracellular matrix proteins, leading to its retention within the arterial wall [33]. As Lp(a) levels are genetically determined, lifestyle interventions do not affect Lp(a)-mediated ASCVD risk, and current lipid-lowering therapies have a limited clinical impact on Lp(a) levels. However, there are multiple Lp(a)-directed therapies in clinical development that target LPA mRNA, such as pelacarsen [34], olpasiran [35], and SLN360 [36], which have demonstrated the ability to reduce plasma Lp(a) levels by up to 90 %. Although the exact reduction required to achieve clinically meaningful benefits remains uncertain, our findings indicate that maintaining serum Lp(a) levels below 298 mg/L may potentially delay the progression of CHD.
Our study has several strengths. First, our model, derived from various observational variables of patients in our region, exhibited distinct racial, regional, and temporal characteristics, enhancing its applicability to the local population compared to that of previous methods. Second, metabolic factors have been recognized as contributors to the development and progression of ASCVD [37][38][39]. Hence, our model incorporates traditional risk factors along with novel variables such as thyroid function, HbA1c, and Lp(a). Third, we utilized advanced machine learning algorithms to construct a multifactorial integrated prediction model for CHD, leveraging various information sources to enhance prediction accuracy. Finally, our research provides optimal cutoff values for HbA1c for predicting the onset and progression of CHD, thereby supporting glycemic control goals for the prevention and management of CMDs.
This study has some limitations. First, cross-sectional data were utilized, preventing the establishment of a temporal relationship between the risk factors and CHD onset and progression. Further prospective studies are required to validate this association. Second, the lack of data on newly identified variables, including predisposition genes, may have hindered our ability to identify additional risk factors.

Conclusions
In summary, we developed effective metabolic risk models to evaluate the occurrence and progression of CHD in Northwest China. These models provide a crucial foundation for enhancing preventive strategies against CHD by identifying and managing the combined metabolic risk factors. Prioritizing strict interventions for hyperglycemia is expected to mitigate the risk of CHD occurrence and progression in this region.
Q. Wu et al.

Fig. 2. Optimal screening of risk factors for CHD onset and progression. A Parameter diagram of risk factors for CHD onset screened using the random forest algorithm. B Parameter graphs of risk factors for CHD onset screened using the least absolute shrinkage and selection operator algorithm. C Parameter diagram of risk factors for CHD onset screened using the recursive feature elimination algorithm. D Comparative Venn diagram of risk factors for CHD onset screened using the three algorithms. E Parameter diagram of risk factors for multiple CADs screened using the random forest algorithm. F Parameter graphs of risk factors for multiple CADs screened using the least absolute shrinkage and selection operator algorithm. G Parameter diagram of risk factors for multiple CADs screened using the recursive feature elimination algorithm. H Comparative Venn diagram of risk factors for multiple CADs screened using the three algorithms. I Parameter diagram of risk factors for a high Gensini score screened using the random forest algorithm. J Parameter graphs of risk factors for a high Gensini score screened using the least absolute shrinkage and selection operator algorithm. K Parameter diagram of risk factors for a high Gensini score screened using the recursive feature elimination algorithm. L Comparative Venn diagram of risk factors for high Gensini scores screened using the three algorithms. CAD, coronary artery disease; CHD, coronary atherosclerotic heart disease; FT3, free triiodothyronine; HbA1c, hemoglobin A1c; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; Lp(a), lipoprotein(a); SBP, systolic blood pressure; TC, total cholesterol.

Fig. 3. Nomograms and calibration plots of the predictive models. A Nomogram built with the training set for CHD onset. B Calibration curve of the training set for CHD onset. C Calibration curve of the validation set for CHD onset. D Nomogram built with the training set for multiple CADs. E Calibration curve of the training set for multiple CADs. F Calibration curve of the validation set for multiple CADs. G Nomogram built with the training set for a high Gensini score. H Calibration curve of the training set for a high Gensini score. I Calibration curve of the validation set for a high Gensini score. CAD, coronary artery disease; CHD, coronary atherosclerotic heart disease; HbA1c, hemoglobin A1c; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; Lp(a), lipoprotein(a); TC, total cholesterol.

Fig. 4. ROC curves for evaluating CHD occurrence and progression using the SVM model. A ROC curve of multiple clinical features predicting CHD risk in the training set. B ROC curve of multiple clinical features predicting CHD risk in the validation set. C ROC curve of multiple clinical features predicting multiple CADs in the training set. D ROC curve of multiple clinical features predicting multiple CADs in the validation set. E ROC curve of multiple clinical features predicting a high Gensini score in the training set. F ROC curve of multiple clinical features predicting a high Gensini score in the validation set. AUC, area under the curve; CAD, coronary artery disease; CHD, coronary atherosclerotic heart disease; HbA1c, hemoglobin A1c; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; Lp(a), lipoprotein(a); ROC, receiver operating characteristic; SVM, support vector machine; TC, total cholesterol.

Fig. 5. Cutoff values and model performance of the single important feature. A Youden's index and the cutoff value for CHD onset using the HbA1c level as a single feature. B ROC curve and AUC value for CHD onset using the HbA1c level as a single feature. C Youden's index and the cutoff value for multiple CADs using the HbA1c level as a single feature. D ROC curve and AUC value for multiple CADs using the HbA1c level as a single feature. E Youden's index and the cutoff value for a high Gensini score in CHD using the HbA1c level as a single feature. F ROC curve and AUC value for a high Gensini score in CHD using the HbA1c level as a single feature. G Youden's index and the cutoff value for a high Gensini score in CHD using the Lp(a) level as a single feature. H ROC curve and AUC value for a high Gensini score in CHD using the Lp(a) level as a single feature. AUC, area under the curve; CAD, coronary artery disease; CHD, coronary atherosclerotic heart disease; HbA1c, hemoglobin A1c; Lp(a), lipoprotein(a); ROC, receiver operating characteristic.

Table 1
Clinical and demographic characteristics of the study participants.

Table 2
Clinical and demographic characteristics of patients with CHD.