Explainable coronary artery disease prediction model based on AutoGluon from AutoML framework

Objective This study focuses on the innovative application of Automated Machine Learning (AutoML) technology in cardiovascular medicine to construct an explainable Coronary Artery Disease (CAD) prediction model to support the clinical diagnosis of CAD. Methods This study utilizes a combined data set of five public data sets related to CAD. An ensemble model is constructed using the AutoML open-source framework AutoGluon to evaluate the feasibility of AutoML in constructing a disease prediction model in cardiovascular medicine. The performance of the ensemble model is compared against individual baseline models. Finally, the disease prediction ensemble model is explained using SHapley Additive exPlanations (SHAP). Results The experimental results show that the AutoGluon-based ensemble model performs better than the individual baseline models in predicting CAD. It achieved an accuracy of 0.9167 and an AUC of 0.9562 in 4-fold cross-bagging. SHAP measures the importance of each feature to the prediction of the model and explains the prediction results of the model. Conclusion This study demonstrates the feasibility and efficacy of AutoML technology in cardiovascular medicine and highlights its potential in disease prediction. AutoML reduces the barriers to model building and significantly improves prediction accuracy. Additionally, the integration of SHAP enhances model transparency and explainability, which is critical to ensuring model credibility and widespread adoption in cardiovascular medicine.


Introduction
In recent years, the incidence of and mortality from cardiovascular disease (CVD) have been increasing, making it one of the leading causes of mortality and morbidity worldwide. The American Heart Association (AHA), in its 2023 CVD statistics report, highlighted that in 2020, the first year of the COVID-19 pandemic, the U.S. witnessed a dramatic increase in CVD-related deaths (1). Fatalities surged from 876,613 in 2019 to 928,741 in 2020, surpassing the 910,000 recorded in 2003. The "Report on Cardiovascular Health and Diseases in China 2021: An Updated Summary" indicates that approximately 330 million people in China are affected by CVD. In 2019, CVD fatalities accounted for 46.74% and 44.26% of total deaths in rural and urban areas, respectively (2). Roughly two in every five deaths were attributed to CVD. Coronary Artery Disease (CAD) is a cardiovascular disease caused by the hardening and narrowing of the coronary arteries, leading to insufficient blood supply to the heart. It is one of the most common and fatal diseases globally. Therefore, accurate screening of potential patients holds paramount theoretical and practical significance.
Machine learning is a technique of artificial intelligence characterized by algorithms that enable machines to learn and improve autonomously. Combining machine learning and medical data can yield unexpected results, assisting physicians in diagnostic decision-making (3,4). Multiple studies have explored and implemented machine learning methods for predicting the risk of CAD. Within these studies, researchers have assessed different machine learning algorithms using specific data sets and identified models with optimal performance on test sets. K-nearest neighbor, random forest, logistic regression, and neural networks have obtained the highest classification performance in various studies (5–8). Several studies have concentrated on enhancing machine learning approaches by utilizing optimization strategies to improve prediction accuracy on test sets. For instance, machine learning algorithms that leverage particle swarm and ant colony optimization have been employed to predict and classify heart disease (9). A new approach that combines multi-objective particle swarm optimization with random forest has been proposed to predict heart disease (10). An advanced particle swarm optimization merged with the lion algorithm has been used to improve heart disease prediction, with performance better than that of other traditional models (11). Bayesian optimization approaches have been implemented to fine-tune the hyperparameter settings of XGBoost in producing heart disease prediction models (12). Furthermore, other analyses have examined multi-model ensemble techniques (13–16). Findings suggest that model integration can significantly enhance predictive performance, particularly in accurately identifying CAD.
In the context of disease prediction, machine learning algorithms can analyze vast amounts of patient data, uncovering and extracting vital features related to disease, thus predicting the likelihood of a patient developing disease. However, employing machine learning methods necessitates manual model selection and hyperparameter tuning, requiring a profound understanding of the algorithms. Automated Machine Learning (AutoML) is a technique that automates machine learning model development using machine learning algorithms (17). It seeks to reduce the need for human intervention in the model development process, thereby accelerating model development and deployment (18). AutoML has extensive applications in diagnostics and forecasting in medicine. A model developed using AutoML to predict the survival rate of COVID-19 patients has demonstrated that AutoML is an efficient method for generating clinical decision support tools based on machine learning (19). An AutoML model based on the XGBoost algorithm, designed to predict the 30-day mortality rate of non-cholestatic cirrhosis patients, outperforms existing scoring systems (20). Furthermore, an AutoML model based on the GBM algorithm for the early identification of critically ill patients hospitalized due to acute pancreatitis has shown significant clinical utility (21). Additionally, AutoML-based models have exhibited excellent performance in predicting the recurrence of common bile duct stones after endoscopic retrograde cholangiopancreatography treatment (22). AutoML has also shown remarkable performance in predicting, on a large sample, the 90-day mortality rate of gastric cancer patients who have undergone gastrectomy (23).
Artificial Intelligence has tremendous potential in the medical field, but its lack of transparency has limited its adoption in clinical practice. Explainable AI holds the potential to overcome this issue (24). In medical diagnostic research, the explainability of the model is crucial. Providing a clear, transparent, and logically coherent rationale is essential when a model generates predictions or recommendations. This ensures physicians can trust the model's outputs and make informed clinical decisions (25).
Machine learning offers multidimensional possibilities in medical data analysis and mining. With the continuous updating and advancement of machine learning technology, the requirement for practitioners' professional knowledge is also increasing. AutoML significantly lowers the entry threshold and workload for modelers through automatic modeling and parameter optimization. To investigate the disease prediction effect of AutoML technology in cardiovascular medicine, our study utilized publicly available clinical data on heart disease and used AutoGluon, an AutoML framework based on multi-model fusion technology, to construct prediction models. In addition, the explainability of the model is crucial for clinical prediction. Most of the CAD prediction models built in previous studies were not analyzed for explainability, making it difficult to explain the decision-making process of the models. To improve the explainability of our model, this study utilizes SHapley Additive exPlanations (SHAP) for analysis, which provides an in-depth understanding of the contribution of each input feature to the model. Our study aims to explore the potential of utilizing AutoML techniques to construct explainable CAD prediction models.

Data preparation
This study utilizes a combined data set on heart disease (26); its variables are described in Table 1. In the data set, a CAD status value of 0 signifies non-CAD patients (arterial diameter narrowing less than 50%), while 1 indicates CAD patients (arterial diameter narrowing greater than 50%). This diagnostic outcome is derived from invasive angiographic examinations of coronary arteries. The original data set contains missing values for cholesterol. We address these missing values by using mean imputation. The baseline comparison table between patients with CAD and patients without CAD is shown in Table 2. It is worth noting that inherent limitations may exist due to using a data set sourced from an open-access website and its limited scope. These limitations include significant disparities and a lack of propensity matching, as well as the absence of BMI, diabetes, and other variables. Additionally, there is no differentiation between prevalent and incident cases, and appropriate data regarding potential confounding factors are lacking.
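The mean-imputation step can be illustrated with a minimal Python sketch. It assumes that missing entries are encoded as None; the function name and the sample cholesterol values are hypothetical and are not taken from the study's code or data.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed (non-missing) entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# Hypothetical cholesterol column with two missing entries.
cholesterol = [200, None, 240, None, 220]
print(impute_mean(cholesterol))
```

Mean imputation keeps the column mean unchanged but shrinks its variance, which is one reason the imputed feature should be interpreted with care.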

AutoGluon: AutoML framework
AutoGluon is an AutoML framework developed by Amazon Web Services (AWS) (27). Its core idea is to simplify the selection, training, and deployment of machine learning models so that a wide range of developers can easily apply powerful machine learning techniques without needing to understand low-level details. First, AutoGluon offers automated feature generation, selection, and transformation, which aids in enhancing model performance while reducing the complexity of data preparation. Second, AutoGluon integrates and stacks multiple models automatically, enhancing predictive performance. In terms of efficiency, the algorithms in AutoGluon are optimized for limited computational resources, enabling it to produce the best models within a constrained time and computational budget. One of the critical reasons for the high performance of AutoGluon is its model integration and stacking capabilities. Integration and stacking techniques combine predictions from multiple models to improve overall predictive performance.

Bagging in AutoGluon
Bagging is a classic ensemble technique designed to reduce the variance of models. Bagging in AutoGluon is a fundamental ensemble method that is particularly significant for small data sets, where it helps to limit model overfitting. The idea of bagging is to train independent weak learners and combine the results of each base learner to obtain one strong learner. As shown in Figure 1A, bagging generates multiple data subsets by repeatedly sampling the original data at random with replacement. Bootstrapping refers to randomly sampling with replacement from the n data points, n times in total. After n samplings, a new data set containing n samples is obtained. This data set is called a bootstrap sample. An independent model is trained on each bootstrap sample. Finally, an appropriate combination strategy is chosen to ensemble the weak learners' predictions.
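The bootstrap-and-combine procedure described above can be sketched in a few lines of Python. This is a toy illustration, not AutoGluon's implementation: the one-feature threshold learner, the majority-vote combination strategy, the number of learners, and the sample data are all assumptions made for the example.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows uniformly at random, with replacement."""
    return rng.choices(rows, k=len(rows))


def fit_threshold_learner(sample):
    """Weak learner on (value, label) pairs: split at the midpoint of the class means."""
    zeros = [x for x, y in sample if y == 0]
    ones = [x for x, y in sample if y == 1]
    if not zeros or not ones:
        # A bootstrap sample can miss a class entirely; fall back to a constant vote.
        constant = 1 if ones else 0
        return lambda x: constant
    mean0 = sum(zeros) / len(zeros)
    mean1 = sum(ones) / len(ones)
    threshold = (mean0 + mean1) / 2
    high_label = 1 if mean1 > mean0 else 0
    return lambda x: high_label if x > threshold else 1 - high_label


def fit_bagging(rows, n_learners=25, seed=0):
    """Train one weak learner per bootstrap sample and combine them by majority vote."""
    rng = random.Random(seed)
    learners = [fit_threshold_learner(bootstrap_sample(rows, rng))
                for _ in range(n_learners)]

    def predict(x):
        votes = Counter(learner(x) for learner in learners)
        return votes.most_common(1)[0][0]

    return predict


# Hypothetical one-feature data set: (feature value, label).
rows = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
predict = fit_bagging(rows)
print(predict(0), predict(10))
```

An odd number of learners is used so that a two-class majority vote cannot tie.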

Stacking in AutoGluon
Stacking is an advanced ensemble technique wherein the output of one model is used as input features for another model. AutoGluon employs multi-level stacking, allowing each level's models to leverage the predictions of preceding models. This approach improves the model's generalization ability. The meta-learner integrates the predictions of the different base learners, each of which views the data from a different perspective, and can thereby achieve better prediction performance than a single model. As shown in Figure 1B, stacking divides the original training data into a training set and a hold-out set. Multiple base learners are trained on the training set. The base learners then predict the hold-out set, and these predictions are concatenated into new features. The base learner predictions may optionally be weighted. Finally, a meta-learner is trained using the predicted features and the real labels of the hold-out set as new samples.
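The hold-out procedure just described can be sketched as follows. This is a simplified, single-level illustration under stated assumptions: the base learners are one-feature decision stumps, the meta-learner is a plain lookup table over base-prediction patterns (a stand-in for whatever model a real framework would use), and the data are hypothetical. It does not reproduce AutoGluon's internal stacking.

```python
from collections import Counter, defaultdict


def fit_stump(values, labels):
    """Pick the threshold and direction with the fewest training errors on one feature."""
    best = None
    for t in sorted(set(values)):
        for sign in (1, -1):
            preds = [1 if sign * (v - t) >= 0 else 0 for v in values]
            errors = sum(p != y for p, y in zip(preds, labels))
            if best is None or errors < best[0]:
                best = (errors, t, sign)
    _, t, sign = best
    return lambda v: 1 if sign * (v - t) >= 0 else 0


def fit_stacking(train_X, train_y, holdout_X, holdout_y):
    # Level 0: train one base learner per feature on the training split.
    n_features = len(train_X[0])
    base = [fit_stump([row[j] for row in train_X], train_y)
            for j in range(n_features)]

    def meta_features(row):
        # Base-learner predictions become the new feature vector.
        return tuple(stump(row[j]) for j, stump in enumerate(base))

    # Level 1: train the meta-learner on hold-out predictions and true labels.
    table = defaultdict(Counter)
    for row, y in zip(holdout_X, holdout_y):
        table[meta_features(row)][y] += 1

    def predict(row):
        key = meta_features(row)
        if key in table:
            return table[key].most_common(1)[0][0]
        # Unseen pattern: fall back to a majority vote of the base learners.
        return 1 if 2 * sum(key) >= len(key) else 0

    return predict


# Hypothetical data: the label depends on the first feature only.
train_X = [(1, 9), (2, 1), (8, 8), (9, 2)]
train_y = [0, 0, 1, 1]
holdout_X = [(3, 7), (4, 6), (10, 5), (0, 0), (9, 1)]
holdout_y = [0, 0, 1, 0, 1]
predict = fit_stacking(train_X, train_y, holdout_X, holdout_y)
print(predict((8.5, 3)), predict((2, 9)))
```

In this toy example the meta-learner learns from the hold-out set that the second base learner is uninformative, which is the kind of correction stacking is meant to provide.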

Baseline models
The baseline models used in this study are decision tree, LightGBM, CatBoost, XGBoost, random forest, KNN, neural networks, and FastAI.
Decision tree is a classification algorithm based on a tree-like structure, which classifies data by partitioning the feature space into independent regions. It is easy to understand and explain and is insensitive to outliers, but it is prone to overfitting. LightGBM is a gradient boosting framework, which is famous for its efficient

SHAP: model explanation
SHAP is a method for explaining model predictions based on the Shapley value concept in game theory (32). The Shapley values are calculated using the formula provided in Supplementary Material Section 1. SHAP assesses various combinations of feature values to quantify the influence of each feature on the model's predictions, providing a numerical value that represents the degree of contribution of each feature to the predicted value of a specific sample. A positive number indicates a positive impact on the result, while a negative number indicates a negative impact. SHAP can explain linear models, tree models, neural networks, and so on. It provides intuitive graphical displays that help users better understand the prediction process of the model and reveal the degree of influence of each feature on the results.
SHAP offers a systematic framework for estimating feature importance and ensures the consistency and fairness of the results. This study utilizes SHAP to explain the CAD prediction model.
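To make the Shapley value concept concrete, the sketch below computes exact Shapley values for a toy model by enumerating all feature subsets, using the standard game-theoretic definition. It is an illustration under simplifying assumptions, not the SHAP library's algorithm: absent features are replaced by a single reference vector (SHAP averages over background data), and the linear model, its weights, and the inputs are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a reference input."""
    n = len(x)

    def value(subset):
        # Features in the subset take their real value; the rest take the reference.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical linear model with three features.
def toy_model(z):
    return 0.5 + 2.0 * z[0] - 1.0 * z[1] + 4.0 * z[2]


x = [1.0, 3.0, 6.0]
baseline = [0.0, 1.0, 5.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

For a linear model each value reduces to the weight times the feature's deviation from the reference, and the values sum to the difference between the prediction at x and the prediction at the reference. The enumeration grows exponentially with the number of features, which is why practical tools rely on approximations or model-specific shortcuts.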

Experimental process
This study proposes an AutoGluon-based model for predicting CAD and analyzes its explainability using SHAP. Experiments are conducted to determine the optimal AutoGluon model by tuning two key parameters, stack-level and bag-fold. Stack-level determines the depth of the layers for stacking learning, while bag-fold specifies the number of folds used in bagging. The experimental flow proposed in this study is shown in Figure 2.
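A parameter search of this kind could be organized as in the following sketch. It is not the study's code, which is given in its Supplementary Material. It assumes that stack-level and bag-fold correspond to the `num_stack_levels` and `num_bag_folds` arguments of AutoGluon's `TabularPredictor.fit`, and the candidate values (a 3 × 3 grid), the helper names, and the accuracy metric are hypothetical choices for illustration.

```python
from itertools import product


def candidate_settings(stack_levels=(1, 2, 3), bag_folds=(3, 4, 5)):
    """Enumerate every (stack-level, bag-fold) combination; the values are assumed."""
    return [{"num_stack_levels": s, "num_bag_folds": b}
            for s, b in product(stack_levels, bag_folds)]


def fit_autogluon(train_df, label, setting):
    """Fit one AutoGluon ensemble for a given setting (requires autogluon.tabular)."""
    # Imported lazily so the grid helper above works without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(label=label, eval_metric="accuracy")
    return predictor.fit(train_df, **setting)


settings = candidate_settings()
print(len(settings), settings[0])
```

Each fitted predictor would then be scored on the held-out test set, and the setting with the best metrics retained.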
The data set is divided into a 7-3 ratio, with 70% as the training set and 30% as the test set. First, we build an ensemble model of CAD prediction based on AutoGluon. AutoGluon uses bagging and stacking to integrate baseline models. Then, we test the training results of each ensemble model to determine the best parameters. The key code snippets are provided in Supplementary Material Section 2. Outside of AutoGluon, we train the eight individual machine learning models. The performances of these models are
FIGURE 2 The experimental process.
Figure 3 shows that, among all categorical data, the variables exhibiting pronounced distributional differences between CAD and non-CAD patients are ST slope, chest pain type, and exercise angina. In contrast, resting ECG displays a similar distributional pattern between CAD and non-CAD patients. In non-CAD patients, there is a higher proportion of males than females. This gender disparity further increases in CAD patients. Of individuals diagnosed with CAD, 75.0% exhibit a "flat" ST slope, 77.2% have a chest pain type of "Asymptomatic," and 62.2% experience "yes" for exercise angina. In contrast, the predominant ST slope among those without CAD is "up-sloping," accounting for 77.3%. Additionally, 86.6% of non-CAD patients show "no" for exercise angina.

Quantitative data analysis
The Pearson correlation coefficient is employed to elucidate the interrelationships between variables, and the correlation coefficient heat map among the variables is presented in Figure 4. Maximum heart rate achieved (max HR) and ST depression induced by exercise relative to rest (old peak) exhibit the strongest correlations with CAD status among all quantitative variables. CAD status demonstrates an inverse correlation with max HR, yielding a correlation coefficient of −0.4, while exhibiting a positive correlation with old peak, with a correlation coefficient of 0.4. The correlation coefficient between CAD status and age is 0.28, while the correlation coefficient between CAD status and cholesterol is 0.11. Moreover, the correlation coefficient between resting blood pressure (resting BP) and CAD status is also 0.11.
Max HR values are higher overall among non-CAD patients than among those with CAD. Max HR values for non-CAD patients are primarily distributed between 140 and 160, whereas CAD patients exhibit max HR distributions predominantly between 110 and 140 (Figure 5). Additionally, distributions of old peak values are inspected. Old peak values for non-CAD patients display a concentrated range, primarily distributed between 0 and 1, while CAD patients exhibit a wider old peak range, chiefly distributed from 0 to 2 (Figure 6). The distributions of max HR and old peak exhibit pronounced differences between patients with and without CAD, implying that these features may be of substantial importance for developing a predictive model for CAD.
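For reference, the Pearson correlation coefficient used above is the covariance of two variables divided by the product of their standard deviations. A minimal Python sketch follows; the function name and the sample values are illustrative and are not drawn from the study's data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical values: a binary status against a quantity that falls as status rises.
status = [0, 0, 0, 1, 1, 1]
max_hr = [160, 150, 155, 120, 130, 125]
print(round(pearson(status, max_hr), 3))
```

With a binary variable such as CAD status, this quantity is the point-biserial correlation, so its sign indicates which group has the higher mean.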

AutoGluon model
After detailed model training and comparison, the experiment analyzes the performance of the AutoGluon model to evaluate its effectiveness in predicting CAD.
As shown in Table 3, the experiment examines nine different parameter combinations of the AutoGluon model. Notably, when the bag-fold is set to 4 among these combinations, implying the utilization of 4-fold cross-bagging, the model exhibits the highest accuracy, reaching 0.9167. More specifically, when the parameters are set to stack-level of 1 and bag-fold of 4, the model outperforms other parameter combinations regarding key performance indicators, such as accuracy, recall, F1-score, and AUC. Its precision is only slightly lower than that of the setting with stack-level of 3 and bag-fold of 5, indicating that the AutoGluon model under this parameter combination demonstrates outstanding performance in CAD prediction.
Overall, all nine different parameter combinations of the AutoGluon model achieve relatively high scores across the five evaluation metrics. This attests to the robust performance of the AutoGluon model and confirms its high reliability in automating the assessment of CAD prediction.
Figure 7 lists the prediction accuracy of each base model for CAD prediction. Based on the data set used in this study, the prediction accuracy of these eight baseline models exceeds 0.85. Among these models, LightGBM and CatBoost stand out with exemplary performance, achieving a prediction accuracy of up to 0.8768. However, despite the relatively high prediction accuracy shown by these baseline models, their accuracy rates are lower than that achieved using the multi-model ensemble method in AutoGluon, indicating that AutoGluon's multi-model ensemble approach outperforms the individual baseline models in CAD prediction.
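The threshold-based evaluation metrics reported here (accuracy, precision, recall, and F1-score) can all be derived from the confusion-matrix counts. The following minimal sketch shows the arithmetic; the function name and the labels are hypothetical, F1 is taken as the harmonic mean of precision and recall, and AUC is omitted because it requires predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = CAD, 0 = non-CAD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```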

Feature analysis
In complex predictive models, a deep investigation into feature importance is pivotal to understanding the behavior of the model. In this study, SHAP is utilized to explain the ensemble model built by AutoGluon with parameters bag-fold of 4 and stack-level of 1.
Feature importance provides an effective method to understand and interpret the decisions made by a model. By quantifying the contribution of each feature to the model's prediction outcome, we can gain insights into the relative importance of the different inputs and their roles in the decision-making process. In qualitative data, ST slope, chest pain type, and exercise angina have a significant impact on the model's predictions (Figure 8A). For the quantitative data, old peak has the most significant influence on model prediction. This is consistent with the correlations with CAD status shown in the heat map.
While a straightforward ranking of feature importance offers insights into model decision-making, it does not fully elucidate the model's intricate decision patterns. SHAP visualization provides an additional layer of explainability. Features are ordered by importance and arranged around a centerline, and each sample is plotted at the position given by its SHAP value for that feature. Points to the left of the centerline have a negative SHAP value, indicating a pull towards a negative prediction. In contrast, those on the right side have a positive value, leading the model's prediction towards a positive outcome. Moreover, each sample point is color-coded: red indicates higher feature values, while blue indicates lower values. This visualization lets us perceive how varying feature values influence the model's prediction direction.
Figure 8B reveals the following patterns. First, when the ST slope is "flat," the model tends to predict the sample as a CAD patient. According to the distribution in Figure 3, 75.0% of CAD patients have an ST slope value of 1 (flat), while 77.3% of non-CAD patients have an ST slope value of 0 (up-sloping). The algorithm's predictions may therefore be related to the differences in the distribution of ST slope values in the population. Similarly, different values of chest pain type and exercise angina also affect the model's predictions. Finally, as shown in Figure 8B, the model tends to predict male samples as CAD patients more than female samples.
The SHAP values showcase the contribution of each feature towards the final prediction, facilitating a lucid understanding and explanation of an individual patient's model prediction. In the individualized explanations of Figures 8C,D, the arrow's color determines whether the factor decreases (blue) or increases (red) the predicted CAD risk. The cumulative influence of all factors provides the final SHAP value, correlating with the prediction score. For the model constructed in this study, the base value is 0.5803. For the first sample, the model output value is relatively high, at 9.06, indicating that the model considers the possibility of CAD in this sample relatively high. The model tends to diagnose this sample as CAD based on the values of old peak, ST slope, exercise angina, max HR, resting BP and so on. In contrast, for the second sample, the model output value is lower, at −4.27, suggesting that based on the values of ST slope, old peak, exercise angina, cholesterol, resting BP and so on, the model believes the probability of CAD in this sample is small.
Accuracy: proportion of correctly classified samples to the total samples; Precision: of the predicted positives, the proportion that is actually positive; Recall: of the actual positives, the proportion that is predicted positive; F1-score: a score considering both precision and recall; AUC: area under the curve.
FIGURE 7 Accuracy of AutoGluon and baseline models.

Discussion
This study validates the effectiveness of AutoGluon in coronary artery disease prediction on the public data set, especially its integration of multiple models, which surpasses single baseline models. There are inherent limitations associated with the data set used in our study, including significant differences and a lack of propensity matching, which could potentially hinder the effectiveness of our models. Moreover, there is a lack of distinction between prevalent and incident cases, as well as a deficiency in appropriate data on potential confounders, further contributing to the limitations of this data set.
In existing research, single machine learning methods are commonly utilized or further optimized, such as KNN, random forest, or Bayesian-optimized XGBoost (5,6,12). Ayatollahi et al. compared the predictive effects of artificial neural networks and SVM on coronary artery disease (33). The results indicated that the SVM exhibited higher accuracy and superior performance compared to the artificial neural networks model. Abdar (41). Different machine learning models may exhibit varying performance on specific data sets. Building machine learning models requires researchers to manually select algorithms, tune hyperparameters, perform feature selection, and preprocess data, among other steps. This study leverages AutoML technology to automate the construction of coronary artery disease prediction models. Results demonstrate that, on the data set used here, the AutoGluon-based models outperform those built using individual machine learning methods on the CAD prediction task.
In the medical field, the application of AutoML is attracting increasing attention and showing tremendous potential in various aspects. The main advantage of AutoML lies in its ability to simplify the construction and optimization process of machine learning models, thereby accelerating medical data analysis and model development, reducing technical barriers, and improving model performance. AutoML can be applied to disease diagnosis and personalized prediction. For example, it can be used to enhance the detection of sinus diseases and predict acute kidney injury in acute pancreatitis (42,43). The application of AutoML in the medical field holds vast prospects and is expected to have a profound impact on medical research and clinical practice.
While AutoML demonstrates excellent performance in cardiovascular disease prediction, model explainability is equally essential in the medical domain. Doctors, when making decisions, not only rely on the predictive outcomes of the model but also need to understand why the model predicts as it does to integrate it with other clinical information. The role of SHAP is to explain the model's prediction results. It achieves this by providing a contribution value for each feature to explain the prediction of each sample, thereby enhancing the explainability of the model. In this study, SHAP was used to identify which features played an important role in the model's predictions, aiding in understanding the model's decision-making process. According to the feature importance ranking plot, the features used by our model to predict coronary artery disease are ranked as follows: ST slope, chest pain type, old peak, exercise angina, cholesterol, max HR, sex, resting BP, age, fasting blood sugar, and resting ECG. According to the SHAP values, individuals with a flat or down-sloping ST slope have a higher risk of CAD, while those with an up-sloping ST slope have a lower risk. The SHAP value of fasting blood sugar indicates that individuals with high blood sugar levels are more likely to have CAD. This information can offer doctors additional clues in clinical practice, aiding in more precise diagnosis and treatment.
Regarding the limitations of this study, it is important to note that the data set utilized originates from open-source websites and is limited in scope. While AutoGluon demonstrates superior performance compared to the baseline models, this does not necessarily mean it is the optimal choice for all medical tasks. Furthermore, although SHAP offers a method for assessing feature importance, further research is needed to explore its generalizability across different data sets and models.
In light of the limitations mentioned above, future research should consider the use of larger and more diverse data sets or explore the application of techniques such as few-shot learning, transfer learning, and active learning to address the challenge posed by the small size of the data set, thereby enhancing the model's generalization capability (44–46). Additionally, it is crucial to explore and compare other AutoML frameworks and algorithms, such as H2O AutoML and TPOT, and conduct in-depth comparisons with AutoGluon to determine optimal strategies for specific applications (47,48). The explainability of machine learning models remains a pivotal issue, necessitating more in-depth validation across different medical domains to ensure its consistency with clinical practice (49–68).
The application of AutoML in cardiovascular medicine not only streamlines the model construction process but also paves the way for using artificial intelligence technologies in areas such as coronary artery hemodynamics and heart treatment. This ongoing exploration and application of AutoML in these domains promotes innovation, contributing to the refinement and personalization of heart treatment strategies. This enables the implementation of more effective and targeted intervention measures, promoting the ongoing advancement of precision medicine in the field.

Conclusion
This study proposes an explainable coronary artery disease prediction model based on the AutoML framework AutoGluon, applied in cardiovascular medicine. The model achieves optimal performance through comparative validation when adopting 4-fold cross-bagging, with an accuracy of 0.9167. Furthermore, single baseline models, such as LightGBM and CatBoost, do not surpass the predictive accuracy of the multi-model ensemble approach realized through AutoGluon. Lastly, to further comprehend the model's decision-making process, we interpret the ensemble model constructed by AutoGluon using SHAP. This study confirms the effectiveness of AutoML in disease prediction in cardiovascular medicine.


FIGURE 3 Distribution of qualitative data. Resting ECG: Resting electrocardiogram results. Exercise angina: Exercise-induced angina. ST slope: The slope of the peak exercise ST segment. ATA: Atypical angina. NAP: Non-anginal pain. ASY: Asymptomatic. TA: Typical angina. ST: Having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of >0.05 mV). LVH: Showing probable or definite left ventricular hypertrophy by Estes' criteria.

Figure 7 lists the prediction accuracy of each base model for CAD prediction. Based on the data set used in this study, the

FIGURE 4 Heat map of the correlations of quantitative data. Resting BP: Resting blood pressure. Max HR: Maximum heart rate achieved.

FIGURE 5 Box plot of max HR distribution. Max HR: Maximum heart rate achieved.

FIGURE 6 Box plot of old peak distribution. Old peak: ST depression induced by exercise relative to rest.
Figures 8C,D illustrate two typical examples elucidating the model's individualized predictions: one sample of a patient diagnosed with CAD and another of a patient without CAD. The arrows indicate the influence of each factor on the prediction.

FIGURE 8 Feature importance analysis based on SHAP. (A) Ranking of feature importance: Ranked by the average SHAP values of 11 features. (B) Positive and negative impacts of features on model predictions: Features to the left of the central line negatively influence the predictions, while those to the right positively influence the predictions. (C) Individualized prediction interpretations for samples with CAD: The base value represents the baseline prediction made without any feature contributions. Arrows depict the impact of each feature on the prediction, with blue arrows indicating a decrease in risk and red arrows indicating an increase in risk. (D) Individualized prediction interpretations for samples without CAD.

TABLE 1
Description of variables in the data set.

TABLE 2
Baseline comparison table between patients with CAD and non-patients.
training speed and accuracy, but it may overfit small-scale data sets and requires more parameter tuning (28). CatBoost is specially designed for processing categorical features and can encode them automatically, but its training speed is slow and its requirements for hyperparameter tuning are high (29). XGBoost uses pre-sorting technology and regularization to improve the performance and stability of the model, but it requires more parameter tuning for large-scale data sets (30). Random forest is an ensemble learning algorithm that performs classification or regression by aggregating the votes of multiple decision trees. K-nearest neighbors (KNN) is an instance-based learning algorithm that performs classification or regression by finding the nearest K neighbors in the training data. KNN does not need a training process, but its computational cost on large-scale data sets is high. Neural networks are models composed of multiple layers of neurons optimized using the backpropagation algorithm. FastAI is a deep learning library based on PyTorch, which provides an easy-to-use API and is suitable for quickly building and training deep learning models. However, it may not support some advanced functions and needs additional customization and adjustment (31).

TABLE 3
Performance of AutoGluon models with nine parameter combinations.
et al. introduced a novel optimization technique, N2Genetic optimizer, for enhancing SVM. Research findings demonstrated that the proposed method for optimizing machine learning-based approaches could be successfully applied to raw data, leading to the development of predictive models for clinical and research purposes (34). Agrawal et al. developed a framework utilizing elastic net regularized Cox regression to select 51 coronary artery disease risk prediction factor subsets from 13,782 features. The model demonstrated better predictive performance in the test cohort compared to algorithms used clinically (35). Wang et al. constructed a predictive model using random forest and evaluated the model using ROC curves. This model integrated 15