Coronary Artery Disease

CAD is the leading cause of mortality and morbidity in the USA and Europe and among the most prevalent and severe manifestations of cardiovascular disease [1, 2]. CAD is characterized by atherosclerotic lesions, whereby plaques consisting of fatty deposits, inflammatory white blood cells, and smooth muscle cells accumulate and obstruct the coronary arteries [3]. Early obstructive CAD due to vascular stenosis can result in angina, arrhythmias, and transient ischemic attacks [4]. Progressive intracoronary thrombosis with plaque rupture can eventually develop into a complete blockage, leading to myocardial infarction, heart failure, and sudden cardiac death [5]. Other vascular diseases are also common in patients with CAD, such as carotid atherosclerosis, peripheral arterial disease, and stroke [4]. CAD is a complex multifactorial disease with nearly 300 risk factors statistically associated with its development [6,7,8]. CAD also shows significant heterogeneity across geographic regions, which makes generalized early diagnosis difficult to achieve. Despite the WHO Member States' global action plan for the prevention and control of CAD, the prevalence of CAD and CAD-related healthcare costs have continued to increase [9,10,11]. Thus, there continues to be a pressing need to build an early prevention ecosystem to reduce the global public health burden of CAD.

CAD Risk Factors

CAD risk factors can be divided into (1) modifiable risk factors (including hypertension, hyperlipidemia, hyperuricemia, diabetes mellitus, obesity, smoking, psychosocial stress, diet, sedentary lifestyle, and socioeconomic status), (2) non-modifiable risk factors (including age, gender, and genetic factors), and (3) risk-enhancing comorbid disease factors (including non-alcoholic fatty liver disease (NAFLD), chronic kidney disease (CKD), systemic lupus erythematosus (SLE), rheumatoid arthritis (RA), inflammatory bowel disease (IBD), hypercoagulability, human immunodeficiency virus (HIV) infection, and thyroid disease). Some of these risk factors can be captured by laboratory measurements, indices, or biomarkers in daily clinical practice and targeted for treatment — including vital signs (body temperature, pulse rate, respiration rate, heart rate, blood pressure), body mass index (BMI), ankle-brachial index (ABI), coagulation indices, coronary artery calcium score, lipids/lipoproteins (LDL cholesterol, HDL cholesterol, total cholesterol, triglycerides, lipoprotein(a)), homocysteine, high-sensitivity C-reactive protein (hsCRP), and inflammatory cytokines.

Primary Prevention of CAD

Current primary prevention guidelines in the USA and UK involve the use of additive risk assessment tools (including the Reynolds risk score, Framingham risk score, pooled cohort equations (PCE), and QRISK), each including some of the risk factors above and generally assigning individuals to low-, intermediate-, and high-risk populations. General recommendations for primary prevention follow "Life's Essential 8": get active, get adequate sleep, eat better, manage weight, quit tobacco, control cholesterol, manage blood pressure, and reduce blood sugar. For at-risk populations identified by guideline algorithms, the first-line intervention remains lifestyle modification, including smoking cessation, a Mediterranean diet, intentional weight loss, and increased physical activity. Guidelines typically recommend medications for individuals considered to be at higher risk or presenting abnormal levels of specific biomarkers. The mainstay therapy in primary prevention is lipid lowering with statins, ezetimibe, and PCSK9 inhibitors. Other major biomarkers targeted for primary prevention of CAD are controlled via hypertension management (blood pressure-lowering agents) and diabetes management (SGLT2 inhibitors, GLP-1 receptor agonists). Numerous other contributory conditions and corresponding primary prevention approaches include lipidome remodeling (n-3 fatty acids), gut microbiome remodeling (small molecules, prebiotics, probiotics, or cyclic peptides [12]), antiplatelet therapy (aspirin), anticoagulation (low-dose rivaroxaban), anti-inflammatory therapy (colchicine), and vaccination (influenza and COVID-19) or PrEP (HIV). Multiple biomarkers may also be targeted with a single therapeutic agent in the form of polypills.
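
To make the additive structure of these tools concrete, the sketch below implements the general Cox-model-derived form shared by PCE-style risk equations, risk = 1 - S0^exp(sum of beta*x - population mean). The coefficients, baseline survival, and mean log-hazard shown are illustrative placeholders, not the published PCE values.

```python
import math

def ten_year_risk(log_hazard_terms, baseline_survival, mean_log_hazard):
    """General Cox-derived form shared by PCE-style risk tools:
    risk = 1 - S0 ** exp(sum_i(beta_i * x_i) - population_mean)."""
    return 1.0 - baseline_survival ** math.exp(sum(log_hazard_terms) - mean_log_hazard)

# Placeholder coefficients and covariate transforms (NOT the published PCE values).
terms = [
    17.11 * math.log(55),     # beta_age   * ln(age)
    0.94 * math.log(213),     # beta_tc    * ln(total cholesterol)
    -18.92 * math.log(50),    # beta_hdl   * ln(HDL cholesterol)
    1.99 * math.log(120),     # beta_sbp   * ln(untreated systolic BP)
    0.69 * 0,                 # beta_smoke * current smoker (0/1)
    0.87 * 0,                 # beta_dm    * diabetes (0/1)
]
# Placeholder baseline survival and population mean log-hazard.
print(f"10-year risk ~ {ten_year_risk(terms, baseline_survival=0.9665, mean_log_hazard=8.50):.1%}")
```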

Emerging Opportunities for AI in Early CAD Risk Assessment

AI models provide an opportunity to combine CAD risk factors into more complex risk assessment models, empowering physicians to make clinical decisions by harnessing the wealth of available health information for each individual [13]. There is little dispute about the potential value of predictive models in cardiology, especially in early CVD detection [13]. Numerous studies have indicated that the performance of machine learning (ML)-based risk assessment models may exceed that of traditional risk assessments, even when simply using well-established cardiovascular disease (CVD) risk factors [14,15,16]. Moreover, data of different modalities, e.g., ECGs, chest X-rays, laboratory values, and polygenic risk scores (PRS), can be harnessed in these models to drive multi-modal precision CAD prevention [17]. When combined with genetic data, risk assessments can be made earlier, potentially leading to improved primary prevention [18]. Through genetic insights, we can measure the impact of familial ties on CVD and detect those predisposed to risk well before the first indicators of atherogenesis [19].

Most learning applications are achieved through supervised learning approaches, requiring labeled ground truth training data [20]. However, many of the available supervised learning algorithms (including generalized linear models (GLM), support vector machines (SVM), and decision trees) contend with the bias-variance tradeoff [21]. This tradeoff represents a situation where a model may overly adapt to its training data, known as overfitting, or conversely, may be too generalized, thereby missing intricate data patterns and resulting in underfitting [22]. Ensemble prediction models, by their very nature, attempt to navigate this tradeoff by amalgamating various algorithms or utilizing multiple iterations of a single algorithm, as in random forest and boosting methods (AdaBoost, LightGBM, CatBoost, and XGBoost), aiming to strike an optimal balance between bias and variance [23]. Further, modern computing hardware has revitalized a subfield of AI inspired by biological neuron connectivity — deep learning (DL) — which now includes novel neural network architectures successful in different domains, namely, convolutional neural networks (CNNs) for image recognition, recurrent neural networks (RNNs) for time series forecasting, and attention-based models for natural language processing (NLP), including large language models (LLMs) [24, 25]. Finally, unsupervised AI methods, including some DL methods, can also enable improved clinical diagnosis of CAD by learning representative patterns free from human hypotheses in order to capture cryptic early symptoms from high-dimensional data [26, 27].
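
As a minimal illustration of how ensembling navigates the bias-variance tradeoff, the sketch below compares a single linear model against a gradient-boosted tree ensemble on synthetic tabular "risk factor" data; the dataset and hyperparameters are arbitrary choices for demonstration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic tabular data with nonlinear structure, standing in for risk factors.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           random_state=0)

for name, model in [
    ("logistic regression (higher bias)", LogisticRegression(max_iter=1000)),
    ("gradient boosting (variance controlled by ensembling)",
     GradientBoostingClassifier(n_estimators=200, max_depth=3, random_state=0)),
]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: CV AUROC = {auc:.3f}")
```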

At a high level, these major predictive modeling approaches can be applied to combine CAD risk factors into AI models in the following ways: (1) combining traditional biomarkers in AI models, (2) integrating additional genetic and other omic risk factors into more comprehensive risk assessment models, (3) including sensor-based feeds for real-time risk detection, (4) integrating various imaging modalities for active disease detection, and (5) capturing other data from EHRs using AI. In the remainder of this review, we will discuss the recent specific applications of AI for CAD primary prevention, considering each class of CAD risk factor, and provide our view on the necessary future iterations of these approaches in order to produce actionable insights linking causal mechanisms to preventive interventions (Fig. 1).

Fig. 1 Opportunities for AI-driven CAD prevention and management

Laboratory Biomarker Risk Assessment with AI

Simply combining traditional risk factors or contemporary risk tools into more complex predictive frameworks has provided evidence of low-hanging fruit for AI models in CAD risk assessment [16, 28,29,30,31,32,33]. For example, Petrazzini et al. built an EHR score from tabular clinical features with an ML framework (a stacked model of random forest, gradient-boosted trees, and SVM) and improved CAD prediction over the ASCVD score by 12% in the BioMe Biobank and 9% in the UK Biobank [34]. Further gains in CAD risk assessment accuracy have also been described with ensemble prediction models [35]. ML approaches have also facilitated novel biomarker identification and prioritization, including purine-related metabolites [36], apolipoprotein B [37], glutathione peroxidase-3 [38], epicardial adipose tissue [39], sleep heart rate variability [40], plasma lipids (184 lipids in the lipidome) [41], and serum sphingolipids [42]. AI models can also be used to impute biomarker levels when direct determination is not available [43,44,45,46]. The success of these stacked and/or ensemble prediction models suggests multiple health trajectories leading to CAD, with actionable information potentially captured more accurately by these complex models than by simpler, traditional, linear risk scoring approaches [47].
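
A minimal sketch of such a stacked architecture (random forest, gradient-boosted tree, and SVM base learners combined by a logistic meta-learner) is shown below on synthetic data; it mirrors the general design described by Petrazzini et al. but is not their implementation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for tabular EHR features and a binary CAD label.
X, y = make_classification(n_samples=1000, n_features=30, n_informative=10,
                           random_state=1)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=300, random_state=1)),
        ("gbt", GradientBoostingClassifier(random_state=1)),
        ("svm", SVC(probability=True, random_state=1)),
    ],
    final_estimator=LogisticRegression(),  # meta-learner over base predictions
    cv=5,
)
print("Stacked CV AUROC:",
      cross_val_score(stack, X, y, cv=5, scoring="roc_auc").mean())
```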

Genetically Informed Risk Assessment Models

CAD risk has a strong genetic component driven by the interplay of environmental factors with genetic susceptibility factors ranging from monogenic (Mendelian) to highly polygenic risk [19]. Twin studies suggest a heritability of 50–60% for fatal CAD [48, 49]. Genetic risk assessment based on germline DNA provides a robust orthogonal predictor to laboratory biomarker-based risk factors and allows for early risk screening before other clinical measurements become informative [50,51,52,53,54,55,56]. CAD polygenic risk scores ranging from dozens to thousands of common variants can convey risk explaining between ~ 10% [57, 58] and ~ 40% [7, 59, 60] of disease heritability. A high CAD PRS is associated with increased benefit from lipid-lowering interventions, including both statins [61,62,63] and PCSK9 inhibitors [64, 65]. These observations provide an opportunity to prioritize lipid-lowering interventions for individuals predicted to receive improved benefit in the context of a standard guideline-based risk assessment framework [60, 66,67,68,69]. Linear and non-linear combinations of multiple PRSs have also been demonstrated to improve polygenic prediction [70,71,72]. However, it is currently unclear how best to combine genetic information with clinical risk factors to demonstrate significant clinical benefit in large cohorts [55, 73,74,75,76].
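
At its core, a PRS is a weighted sum of risk allele dosages. The sketch below computes and standardizes such an additive score with placeholder effect sizes; real applications would use GWAS summary statistics with LD-aware weighting (e.g., clumping/thresholding or LDpred-style shrinkage).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 5000                        # illustrative cohort and variant counts
# dosages: expected risk-allele counts in [0, 2] per individual per variant.
dosages = rng.binomial(2, 0.3, size=(n, m)).astype(float)
betas = rng.normal(0, 0.01, size=m)      # placeholder per-variant effect sizes

prs = dosages @ betas                    # additive score: sum_i beta_i * dosage_i
prs_z = (prs - prs.mean()) / prs.std()   # standardize within the cohort

# e.g., flag the top 5% of the distribution as "high polygenic risk"
high_risk = prs_z > np.quantile(prs_z, 0.95)
print(f"{high_risk.sum()} of {n} individuals fall in the top 5% of the PRS")
```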

AI approaches have been applied to GWAS data for a number of purposes, including the identification of novel prognostic/causal markers and druggable targets [77,78,79,80]. ML/DL models for systematic post-GWAS analysis include those for functional annotation, functional fine-mapping, or functional scoring (pathogenicity or cell-specific importance scoring) to infer the underlying regulatory mechanisms of non-coding CAD risk loci [81,82,83,84,85,86,87], and those for improving the accuracy and cross-ancestry transferability of CAD PRSs [88, 89]. Other examples incorporate biological networks to deepen insights, like GCN-GENE, a DL model that leveraged the propagation of GWAS signals through biological networks to identify additional CAD-related genes [90], or GenNet [91], another approach leveraging biological networks to perform genotype-to-phenotype mapping and improve risk assessment.

These risk factors, separated into biological pathways, interact with the environment in differing ways and may be further amenable to tree-based learning methods for the construction of genetic risk models theoretically able to capture differing gene-by-environment interactions across diverse populations [79, 92,93,94]. Despite evidence of complex interactions between genetics and environment in mediating CAD risk [95, 96], existing linear risk prediction models that include genetic risk often do not capture gene–gene or gene–environment interactions in risk assessment [97,98,99,100]. AI models provide an opportunity to capture these complex relationships. Forrest et al. implemented a random forest-based ML system to this end for the cross-sectional detection of CAD and achieved an AUROC of 0.89 [101]. Nam et al. used a semi-supervised multi-layered network to devise a network-based MI risk score including interactions between PheWAS and PRS features, achieving an AUC improvement of 28.29% compared to a PRS-alone model [102]. Steinfeldt et al. presented NeuralCVD, a deep survival model that outperformed linear algorithms (SCORE, ASCVD, QRISK3, and a linear Cox model) and identified interactions between clinical risk factors and PRSs [103]. However, few studies have succeeded in applying these concepts to large-scale, longitudinal, incident CAD risk prediction [104]. To optimize the efficacy of these models in primary prevention, rigorous prospective studies with guideline recommendations are vital, as they hold the potential to both elucidate incident risk predictions and guide evidence-based preventive interventions [105,106,107].
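
As a baseline for contrast with these nonlinear models, the hedged sketch below fits a linear Cox model with an explicit PRS-by-smoking interaction term on simulated survival data (using the lifelines library); deep survival models like NeuralCVD aim to learn such interactions without hand-crafting them.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "prs": rng.normal(size=n),                 # standardized polygenic score
    "smoking": rng.binomial(1, 0.3, size=n),   # binary environmental exposure
})
df["prs_x_smoking"] = df["prs"] * df["smoking"]  # explicit interaction term

# Simulated follow-up in which the PRS effect is amplified in smokers.
hazard = np.exp(0.4 * df["prs"] + 0.5 * df["smoking"] + 0.3 * df["prs_x_smoking"])
df["time"] = rng.exponential(10 / hazard)
df["event"] = (df["time"] < 8).astype(int)       # administrative censoring at 8 years
df.loc[df["event"] == 0, "time"] = 8.0

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()  # interaction coefficient captures the gene-by-environment effect
```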

Multi-omics Data for Precision Risk Assessment

High-throughput transcriptomic, epigenetic, proteomic, and genomic technologies have enabled other comprehensive biomarker surveys in CAD [108,109,110]. These analytes provide signatures of the current physiological state, which may then be used to diagnose CAD or provide predictions of risk for future events. For example, transcriptomic profiles can be used to detect regulatory signatures that capture the current biological or pathological state of a tissue. Few studies have progressed beyond cross-sectional detection of potential biomarkers to the prospective prediction of CAD events and myonecrosis. One exception includes transcriptomic predictors of impending acute myocardial infarction (AMI) events from whole blood-derived circulating endothelial cells (CECs) [111,112,113,114]. The CEC transcriptomic signature potentially represents the biological state at the site of plaque rupture, serving as a biomarker of its present state, and is potentially predictive of risk earlier than conventional biomarkers such as troponin and CK-MB. In another example, a recent study constructed a single-nucleus atlas of chromatin accessibility in human coronary arteries and identified cell-type-specific regulatory mechanisms. By employing a combined statistical genetics and ML strategy, the study prioritized candidate regulatory variants and mechanisms for CAD loci and revealed detailed mechanisms connecting cell types, causal genes, and CAD risk variants in diverse populations [87].

Additionally, since CAD risk varies in relation to diverse exogenous and endogenous factors such as environment, diet, and lifestyle, DNA methylation signatures associated with CAD may serve as a proxy for exogenous exposures over time. Thus far, DNA methylation, integrated with genetic and clinical features, has been used with some success to predict the risk of early heart failure, large-artery atherosclerosis stroke, and CAD via supervised learning. However, few of these studies have demonstrated generalizability in diverse and independent datasets. One study developed ensemble predictors of incident CAD risk in different cohorts and combined them using a cross-study learning approach. This study demonstrated the feasibility of a genome-wide epigenomic risk score for the prediction of future CAD events, possibly in individuals who would not be identified by other conventional risk metrics, but the true clinical utility of epigenomic risk scores remains to be demonstrated [100, 115, 116].
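
A common building block of such epigenomic risk scores is a sparse penalized model over CpG methylation beta values. The sketch below trains an elastic-net logistic regression on synthetic methylation data; the dimensions, simulated signal, and hyperparameters are illustrative assumptions, not any published study's pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_cpgs = 600, 5000                  # methylation arrays are wide: p >> n
X = rng.beta(2, 5, size=(n_samples, n_cpgs))   # CpG beta values lie in [0, 1]
w = np.zeros(n_cpgs)
w[:50] = rng.normal(0, 1.5, 50)                # 50 truly informative CpGs
logits = (X - X.mean(axis=0)) @ w
y = (rng.random(n_samples) < 1 / (1 + np.exp(-logits))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=0.1, max_iter=5000)
enet.fit(X_tr, y_tr)
print("held-out AUROC:", roc_auc_score(y_te, enet.predict_proba(X_te)[:, 1]))
print("CpGs retained by the sparse model:", int((enet.coef_ != 0).sum()))
```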

Proteomic studies appear to have similar value in situations where traditional risk assessment methods have limited predictive power for certain high-risk individuals. These scenarios include patients at high risk of recurrent atherosclerotic events who require more intensive therapeutic interventions, individuals with known CAD risk but well-controlled LDL cholesterol and blood pressure, and those with multiple chronic conditions. In one example, researchers developed prognostic risk models based on plasma proteomics coupled with AI that can better predict cardiovascular outcomes within a relatively short period of time [117]. It was also suggested that the plasma proteome predictor could act as a universal surrogate endpoint for CAD, providing an avenue to improve patient outcomes through selective drug allocation and better monitoring in phase 2 clinical trials. In addition, plasma proteomics has been used to develop models for survival prediction after AMI [118, 119], prediction of recurrent events [120, 121], and improved assessment of risk for primary events [122].

Finally, dysregulation of the gut microbiome has been shown to be associated with many chronic inflammatory diseases and is connected to the emergence and progression of several CAD-related risk factors [123]. However, it remains a major challenge to disentangle a gut microbiota rendered dysfunctional by metabolic disruptions from an imbalanced gut microbiota that more causally drives pathogenesis, potentially exacerbating ischemic heart disease processes at later stages [124,125,126]. AI strategies such as shallow learning algorithms (random forests, support vector machines, neural networks, etc.) are generally only useful for classification and for identification of taxonomic differences between healthy and diseased individuals [123, 127]. They are not yet able to disentangle the interplay of reactive vs potentially causal changes observed in an abnormal microbiome.

Real-Time Sensor-Based Risk Monitoring for Early Detection and Prevention of CAD Using AI

While biomarkers provide intermittent snapshots of health, smart medical devices and biosensors have the potential to revolutionize CAD risk monitoring by allowing real-time, longitudinal collection of risk factor information and trajectory estimation [128, 129]. Wireless networks, remote data centers, and edge computing enable wearables to monitor risk factors in real time [130, 131]. This eventual internet of things (IoT) also includes AI-assisted wearables that promise to provide accurate point-of-care diagnosis [132, 133] and will eventually cross over into omics and laboratory-based biomarkers through the use of biochemical sensors. Building upon baseline applications of portable sensors for the automatic monitoring of cardiac rhythm disturbances, future iterations of cardiac biosensors could detect acute cardiovascular events and longitudinal factors to build up a personal risk baseline [128, 134,135,136,137,138]. In this section, we review recent applications and provide future perspectives on the use of various digital health and biosensor devices for CAD prevention with state-of-the-art AI approaches [139].

Heart Rhythm (Cardiac Signal) Monitoring

Central to CAD is the heart and the changing mechanical, colorimetric, and electrical signals it produces with each heartbeat. These physical signals can be measured by electrodes, optical sensors, or motion sensors and interpreted into various biosignals, including the electrocardiogram (ECG), photoplethysmogram (PPG), seismocardiogram (SCG), phonocardiogram (PCG), ballistocardiogram (BCG), gyrocardiogram (GCG), or impedance cardiogram (ICG), for cardiac monitoring [140]. These signals can in turn be used to detect cardiac risks like atrial fibrillation and/or other heart rhythm disturbances indicative of future disease risk or pathology [141]. For example, He et al. extracted 30-dimensional features from PPG signals to assess hemorrhagic risk in patients with CAD using an XGBoost regression model and achieved an AUC of 0.76 with tenfold cross-validation [142]. Neural network-based models promise to improve sensor-based prediction through their ability to denoise, annotate, and perform feature extraction on time traces [143,144,145,146,147]. ECG, PPG, and SCG can be used for remote heart condition monitoring when deployed in wearable devices, especially during exercise [148]. And with AI-assisted early abnormal signal detection, cardiac signals can be further used for CAD diagnosis and prediction [149,150,151,152]. While many of these algorithms operate on 12-lead ECG data, some researchers have demonstrated comparable performance with single-lead ECGs [152,153,154]. Despite advances in detecting specific cardiovascular abnormalities, the full utility of sensor-based signals for long-term risk prediction remains unproven. Challenges such as confounding variables, temporal ambiguity, distinguishing genuine signals from noise, and population variability remain to be overcome.
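
As a small illustration of the feature-extraction step underlying models like the PPG-based XGBoost example above, the sketch below derives basic heart-rate-variability features from a synthetic pulse waveform; the toy signal, peak-detection thresholds, and feature set are assumptions for demonstration.

```python
import numpy as np
from scipy.signal import find_peaks

def hrv_features(trace, fs):
    """Extract simple heart-rate-variability features from a PPG/ECG-like trace."""
    peaks, _ = find_peaks(trace, height=0.5, distance=int(0.4 * fs))  # ~0.4 s refractory
    rr = np.diff(peaks) / fs * 1000.0                                 # RR intervals in ms
    return {
        "mean_hr_bpm": 60000.0 / rr.mean(),
        "sdnn_ms": rr.std(ddof=1),                       # overall variability
        "rmssd_ms": np.sqrt(np.mean(np.diff(rr) ** 2)),  # beat-to-beat variability
    }

# Synthetic 60-s pulse train at ~70 bpm with mild beat-to-beat variability.
fs = 250
t = np.arange(0, 60, 1 / fs)
beat_times = np.cumsum(np.random.default_rng(0).normal(0.857, 0.03, 80))
trace = sum(np.exp(-((t - bt) ** 2) / 0.001) for bt in beat_times if bt < 60)
print(hrv_features(trace, fs))  # features usable as inputs to a downstream classifier
```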

Physical Activity Monitoring

Although lifestyle can be difficult to ascertain in a single clinic visit, it has been identified as mediating several risk factors for CAD. Lifestyle risk factors can be measured via accelerometer-based step counting, energy expenditure [155], sleep composites [156, 157], and other factors [120, 139]. These activity metrics can then be included in ML models for the prediction of cardiac outcomes, as sketched below. For instance, Nguyen et al. found an association between accelerometer-measured daily total movement and the risk of incident CVD in women and young adults [158, 159]. Triaxial accelerometer-based physical activity data have been used to demonstrate CAD risk factor reduction by changing chronoactivity [160, 161]. And Huang et al. designed an ensemble ML algorithm for the prediction of coronary artery calcium from predictors including several lifestyle and physical activity features [162]. Despite breakthroughs in AI and wearable technology [163, 164], current studies in the field are limited to demonstrating the benefit of cardiac telerehabilitation in patients with CAD [165, 166]. Genuinely real-time, clinically useful inclusion of physical activity measures in the prediction of incident CAD risk has yet to be demonstrated.
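
The sketch below illustrates how minute-level accelerometer counts might be aggregated into daily activity features (total volume, MVPA minutes, sedentary time) for downstream risk models; the data are synthetic, and the 1952 counts/min MVPA cutpoint is one commonly cited ActiGraph threshold used here purely for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic minute-level accelerometer counts over one week.
rng = np.random.default_rng(0)
idx = pd.date_range("2024-01-01", periods=7 * 24 * 60, freq="min")
counts = pd.Series(rng.gamma(2.0, 400, size=len(idx)), index=idx)

daily = pd.DataFrame({
    "total_counts": counts.resample("D").sum(),
    # minutes/day above an illustrative moderate-to-vigorous (MVPA) threshold
    "mvpa_minutes": (counts > 1952).resample("D").sum(),
    "sedentary_minutes": (counts < 100).resample("D").sum(),
})
print(daily)  # per-day features ready to join with outcomes in a risk model
```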

Biochemical Sensors

Biochemical sensors transform a biochemical analyte into an electronic signal, often using an integrated optical, acoustic, magnetic, or electrochemical sensing array operating in biofluids like blood, sweat, saliva, or urine. This technology can be used to conduct non-invasive, cost-effective, multi-analyte scans of human metabolites that respond quickly to lifestyle influences [167]. Such scans have been used in CAD prediction models integrating multi-modal signals of blood pressure, temperature, ECG, glucose, hemoglobin, and oxygen levels, achieving 97% accuracy with minimum-redundancy-maximum-relevance (mRMR) feature selection [168]. Multiple vital signs, including electrodermal activity, can also be combined in ML models to aid in the detection of sleep stages, which can then be used to improve CAD risk prediction [169, 170]. More unusual examples include the use of chemical gas sensors for the detection of CAD risk via an electronic nose [171], or novel quantum sensing approaches to detect cardiac amyloidosis [172]. Biochemical sensing is an up-and-coming application area where clinical validation is still pending [173]. Economical deployment of these sensors, together with a system of interpretation and alerts, will be the major challenge to overcome for these information-rich technologies.
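
mRMR selection, as used in the multi-modal study above, greedily picks features that share high mutual information with the outcome but low mutual information with already-selected features. Below is a simple greedy mRMR sketch using the mutual-information difference criterion; it is illustrative, not the cited study's pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_select(X, y, k):
    """Greedy minimum-redundancy-maximum-relevance feature selection."""
    relevance = mutual_info_classif(X, y, random_state=0)  # MI with the label
    selected = [int(np.argmax(relevance))]
    remaining = set(range(X.shape[1])) - set(selected)
    while len(selected) < k:
        scores = {}
        for j in remaining:
            # mean MI between candidate j and the already-selected features
            redundancy = np.mean([
                mutual_info_regression(X[:, [s]], X[:, j], random_state=0)[0]
                for s in selected
            ])
            scores[j] = relevance[j] - redundancy  # relevance minus redundancy
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.discard(best)
    return selected

# Synthetic stand-in for multi-modal sensor channels and a binary outcome.
X, y = make_classification(n_samples=500, n_features=12, n_informative=4,
                           random_state=0)
print("selected sensor channels:", mrmr_select(X, y, k=5))
```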

Environmental Sensing

Sensing can extend beyond the body, ranging from local neighborhood environmental monitoring [174] to larger structural elements of society [175]. Several cohort studies have demonstrated the value of wearable sensors in capturing complex gene-environment interactions [176] as well as the impact of longitudinal actionable changes [177]. One promising area of research involves the use of GPS technology and other environmental sensors to collect real-time data on an individual's exposure to air pollution, water quality, and other environmental factors that may increase their risk of CAD [178]. This can be valuable in developing personalized prevention strategies that consider an individual's unique travel patterns and contextual background [179, 180]. In addition to environmental factors, social factors such as social interaction and community engagement play an important role in the development of CAD [181]. Integrating these data potentially enables AI-powered wearables to recognize regions with heightened environmental risk factors and provide tailored lifestyle coaching to reduce the incidence of CAD through sustainable behavioral changes [182, 183]. These applications likely lie further in the future, as a deeper understanding of the interplay between endogenous and environmental risk factors is needed before these data streams extend from population-level to individual-level utility.

Advanced Applications of AI in Noninvasive Imaging for CAD Risk Evaluation

AI has been increasingly applied to cardiovascular imaging for risk stratification of CAD, by virtue of its ability to accurately quantify prognostic biomarkers from image data as well as to reduce cost and improve image acquisition and interpretation. This section summarizes recent promising applications of AI across various noninvasive imaging modalities, including coronary artery calcium imaging, coronary computed tomography (CT) angiography, peri-coronary/epicardial adipose tissue imaging, nuclear imaging, and retinal imaging, for improved risk assessment of CAD to better guide decision-making in the primary prevention of CAD.

Coronary Artery Calcium Scoring

As coronary artery calcium (CAC) is a highly specific feature of atherosclerosis, CAC scoring (CACS) has emerged as a powerful and widely available means of predicting risk for atherosclerotic cardiovascular diseases, particularly useful for guiding primary prevention therapy decisions [184,185,186]. AI approaches have gained great attention due to their promising automation capabilities for annotation of calcified lesions. Sandstedt et al. evaluated the diagnostic efficacy of an AI-driven, automated CACS software package that integrates patient-specific heart-centric coordinates, local voxel images, and a coronary territory map against semi-automated software, using the same ECG-gated CT images [187]. In this study, the AI-based method was less time-demanding, with excellent correlation and agreement with the semi-automated one. The limited accessibility of ECG-gated CT represents a critical issue restricting its routine use. Millions of people undergo routine chest CT scans that demonstrate CAC, yet quantitation from these non-gated scans has not been feasible. In this regard, recent studies have focused on the application of convolutional neural networks (CNNs) to a wide range of CT examination types, including low-dose chest CT and radiation therapy planning CT, and suggest that AI-based CACS quantification is robust across different CT protocols [188, 189]. A future application of AI-based CAC assessment is microcalcification quantification, since current methods are limited to advanced calcification. Considering that microcalcifications can induce vascular stiffening and plaque rupture [190], AI-based microcalcification quantification could allow further risk stratification in primary prevention for individuals with normal results on conventional imaging.
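
For orientation, the sketch below implements the standard per-slice Agatston weighting (lesions of at least 1 mm² with attenuation of 130 HU or more, density-weighted 1-4); it omits the gating, calibration, and 3D lesion-tracking details of clinical and AI-based CACS software, and the toy slice is fabricated for illustration.

```python
import numpy as np
from scipy import ndimage

def agatston_slice_score(hu_slice, pixel_area_mm2):
    """Agatston score for one CT slice: lesions are connected regions with
    HU >= 130 and area >= 1 mm^2; each is weighted 1-4 by its peak density."""
    mask = hu_slice >= 130
    labels, n_lesions = ndimage.label(mask)
    score = 0.0
    for lesion in range(1, n_lesions + 1):
        region = labels == lesion
        area = region.sum() * pixel_area_mm2
        if area < 1.0:                          # ignore sub-millimeter specks
            continue
        peak = hu_slice[region].max()
        weight = min(int(peak // 100), 4)       # 130-199 -> 1 ... >= 400 -> 4
        score += area * weight
    return score

# Toy two-lesion slice (values in Hounsfield units), 0.25 mm^2 pixels.
slice_hu = np.zeros((64, 64))
slice_hu[10:14, 10:14] = 250
slice_hu[40:43, 40:43] = 450
print("Agatston contribution:", agatston_slice_score(slice_hu, pixel_area_mm2=0.25))
```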

Coronary CT Angiography

Coronary CT angiography (CCTA) is another important modality that can provide information on the risk of subsequent acute coronary syndrome (ACS). Although CACS reflects overall coronary atherosclerosis burden and is useful for predicting the risk of CAD, lesion-specific coronary plaque burden and high-risk plaque features, major determinants of ACS risk, can only be assessed by CCTA [191, 192]. However, the analysis of coronary plaque volume and features requires a high level of human expertise and time-consuming protocols, even compared to that needed for CACS measurement. Recent advances in AI have enabled more rapid and accurate assessment of plaque volume and characteristics. Lin et al. used a DL model with a hierarchical convolutional long short-term memory network to segment the coronary arteries and showed that this DL-based plaque volume quantification was comparable to that measured by intravascular ultrasound (IVUS), a well-established reference standard, with a shorter analysis time (5.65 s versus 25.66 min for experts) [193]. Araki et al. utilized IVUS in a framework combining SVM and PCA to achieve an AUC of 0.98 in risk assessment for CAD [194]. Al'Aref et al. applied XGBoost, trained and tuned with 10-fold stratified cross-validation on CCTA images, and found that this technique could be useful for identifying high-risk plaque features [195]. Han et al. combined clinical characteristics, biomarkers, and CCTA-derived variables to better identify rapid coronary plaque progression in high-risk CAD patients [196]. Li et al. implemented a combined reinforcement multitask progressive time-series network model using patients' basic information, family history, blood biochemical indicators, echocardiography reports, and coronary angiography data at different time points to predict the degree of occlusion of eight coronary arteries [197]. Combined with traditional risk factors, biomarkers, or plaque features, CCTA metrics could aid risk stratification for CAD risk prediction [198, 199].

Detection of vascular inflammation and novel therapies targeting inflammation have become promising fields of CAD research [200,201,202,203]. Peri-coronary and epicardial adipose tissue have attracted growing interest because these imaging markers can reflect vascular inflammation. However, their measurement is not considered suitable for clinical practice due to the need for a tedious manual process. A recent study demonstrated that a DL model allows fully automated quantification of epicardial adipose tissue with comparable accuracy and a shorter analysis time (1.57 s versus 15 min for experts) [204]. Furthermore, peri-coronary adipose tissue CT attenuation is the current state-of-the-art method to assess coronary-specific inflammation. However, this technique does not account for the complex spatial relationships among voxels. Recent studies suggest that CT-based radiomics coupled with AI improve the discrimination of MI and the prediction of cardiac risk beyond CT attenuation-based models [205, 206].

Nuclear Imaging

Current cardiac nuclear imaging is dominated by myocardial perfusion and viability assessment using the flagship techniques of single photon emission computed tomography (SPECT) and positron emission tomography (PET) [207], and its role in the primary prevention of CAD has been limited so far [208, 209]. However, at least two scenarios provide perspectives on cardiac nuclear imaging, expanding its use in primary prevention with the assistance of AI. First, considering that the use of biomarkers for risk stratification in primary prevention varies according to their predictive value [210], which can be improved by the implementation of AI, myocardial perfusion imaging (MPI) may yet find use in primary prevention at the subclinical stage of CAD. Two recent studies highlight this possibility [211]. The first study, comparing quantitative versus visual MPI assessment of subtle perfusion defects, showed that automatically quantified total perfusion deficit allowed more precise risk stratification [212]. The second described a DL model that significantly surpassed the diagnostic accuracy of standard quantitative analysis and visual reading for MPI [213]. A second possible future use of cardiac nuclear imaging for primary prevention of CAD with the assistance of AI relates to the prediction of high-risk atherosclerotic plaques associated with near-term atherothrombotic events such as myocardial infarction. Clinical studies suggest that inflammatory activity in atherosclerotic plaques measured by 18F-fluorodeoxyglucose (FDG) PET, or microcalcification tracked by 18F-sodium fluoride (NaF) PET combined with CT, is related to cardiovascular events [214,215,216]. Though there is limited evidence that this can be applied to the primary prevention of CAD, one recent study developed an ML model incorporating quantitative measures of 18F-NaF PET that successfully predicted future risk of myocardial infarction in patients with stable CAD [217, 218].

Retinal Imaging

Though there have been several reports on its correlation with CAD, retinal imaging has not been a conventional risk stratification method for primary prevention of CAD [219, 220]. However, recent studies using DL algorithms shed light on retinal imaging as a potential tool to predict and stratify CAD risk. Using data from 284,335 patients, the Google AI team trained deep learning algorithms to accurately predict cardiovascular risk factors such as age, gender, smoking status, and systolic blood pressure, as well as major cardiac events such as heart attacks, from retinal images [221]. In a study using DL algorithms trained on 216,152 retinal images, researchers investigated the algorithm's ability to predict CAC scores and stratify cardiovascular disease risk; the retinal-image-based DL estimate of CAC anticipated cardiovascular events as effectively as CAC determined by CT scans [222]. Most recently, after training DL models on paired retinal and cardiovascular magnetic resonance (CMR) images, researchers showed that their algorithm could predict not only cardiac mass and volume but also future myocardial infarction using only retinal images and demographic data [223]. Since retinal scans are comparatively cheap and routinely performed in many optician practices, with more validation studies, AI-based retinal imaging could emerge as a new risk stratification tool for primary prevention.

Enhancing CAD Prediction Through AI-Enabled Integration of Personal Health Data and Large Language Models

Increased accessibility of personal EHRs and other digital health data sources provides a rich substrate from which to generate ML-based risk assessment models [224, 225]. Emerging nationwide biobanks have expedited the implementation of these models in care delivery [225, 226]. AI is well suited to parse the sparse yet high-dimensional data in EHRs [227]. Recent efforts have begun to demonstrate how genetic risk can be systematically integrated more directly with a wider spectrum of relevant risk factors for risk assessment [228, 229]. ML and NLP approaches have been applied to CVD prediction through parsing structured or unstructured medical big data [230,231,232]. More specifically, language models — essentially, pre-trained models that can be fine-tuned for various natural language tasks, each of which previously required an individual network model — have revolutionized natural language processing (NLP) in recent years. They have become pervasive in NLP, largely due to the success of the transformer architecture [233] and its high compatibility with massively parallel computing hardware. It is now widely recognized that scaling up language models — in terms of training data and model parameters — can enhance both performance and sample efficiency across a variety of downstream NLP tasks [234].

To date, one of the largest language models trained with unstructured EHR data — ClinicalBERT [235] — has been applied to characterize reasons for statin nonuse in a multiethnic, real-world ASCVD cohort. ClinicalBERT includes 110 million parameters and was trained on 0.5 billion words from the publicly available MIMIC-III dataset. The study revealed that around 40% of ASCVD patients lacked formal statin prescriptions. ClinicalBERT effectively detected statin nonuse and the primary reasons for this nonuse from unstructured clinical notes — most prevalently, patient-level reasons (such as side effects and personal preferences) and clinician-level reasons (i.e., practices that deviate from established guidelines). By guiding targeted interventions to address statin nonuse, clinical LLMs like ClinicalBERT potentially provide a pathway to address important treatment gaps in cardiovascular medicine.
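
A hedged sketch of the general recipe (fine-tuning a BERT-style clinical language model to classify notes) is shown below using the Hugging Face transformers API; the checkpoint, labels, and toy notes are assumptions for illustration, not the study's actual pipeline or data.

```python
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# A public clinical BERT checkpoint, used here as an illustrative starting point.
model_name = "emilyalsentzer/Bio_ClinicalBERT"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy notes and labels (1 = statin nonuse documented); real training would use
# a large, annotated clinical corpus under appropriate data governance.
notes = ["Patient declined statin due to prior myalgias.",
         "Continue atorvastatin 40 mg nightly."]
labels = [1, 0]
enc = tok(notes, truncation=True, padding=True, return_tensors="pt")

class NoteDataset(torch.utils.data.Dataset):
    def __len__(self):
        return len(labels)
    def __getitem__(self, i):
        return {k: v[i] for k, v in enc.items()} | {"labels": torch.tensor(labels[i])}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=NoteDataset(),
)
trainer.train()  # fine-tunes the pre-trained LM on the note-classification task
```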

Even larger clinical LLMs have been developed — for example, GatorTron [236] and Med-PaLM 2 [237]. GatorTron scaled up to 8.9 billion parameters using a corpus of 90 billion words from clinical notes, scientific literature, and general English text. It achieved SOTA performance on five clinical NLP tasks at various linguistic levels (clinical named entity recognition (CNER), medical relation extraction (MRE), semantic textual similarity (STS), natural language inference (NLI), and medical question answering (MQA)) when compared with three existing clinical/biomedical LMs. Remarkably, GatorTron performed considerably better on the most complex NLP tasks (NLI and MQA) compared with existing smaller clinical LMs (BioBERT and ClinicalBERT). Google's Med-PaLM 2 scored up to 86.5% on the USMLE-style MedQA benchmark — comparable to an expert doctor — setting a new state of the art and demonstrating the potential of clinical LLMs for advanced applications such as medical question answering.

A fascinating property of LLMs is emergence, which results from scale [234]. For instance, GPT-3 [238], which boasts 175 billion parameters compared to GPT-2's 1.5 billion, enables in-context learning, in which the LLM can adapt to a specific downstream task simply by receiving a prompt (a natural language description of the task). Intriguingly, this emergent property was neither explicitly trained for nor initially expected to arise [239]. An important consequence has been a sociological shift within the NLP community toward general-purpose models: scaling enables a few-shot, prompt-based general-purpose model to outperform the previous SOTA held by fine-tuned, task-specific models.

Liévin et al. recently explored the capacity of general-purpose LLMs to reason through complex medical questions [239]. Utilizing a human-aligned version of GPT-3 (InstructGPT [240]), they addressed multiple-choice questions from medical exams (USMLE and MedQA) as well as medical research queries (PubMedQA). Their investigation employed various techniques: chain-of-thought (CoT) prompts for step-by-step reasoning, grounding by augmenting the prompt with search results, and few-shot learning by prefacing the question with example question-answer pairs. A medical domain expert reviewed and annotated the model's reasoning for a subset of the USMLE questions. Remarkably, even with the most basic prompting schemes, zero-shot GPT-3 outperformed domain-specific BERT baselines. CoT prompting emerged as a particularly effective strategy. By combining multiple CoTs, they discovered that GPT-3 could achieve unprecedented performance on medical questions. Moreover, CoT prompting rendered the zero-shot GPT-3 predictions interpretable, revealing good comprehension of the context, correct recall of domain-specific knowledge, and non-trivial reasoning patterns. They also noted that the incorporation of few-shot prompt-based learning further improved performance. More recently, Nori et al. ran similar tests on GPT-4 [241], the state-of-the-art LLM at the time of this writing [242]. Without any specialized prompt engineering, GPT-4 exceeded the passing score on the USMLE exam by more than 20 points and outshone earlier general-purpose models (GPT-3.5) as well as models specifically fine-tuned on medical knowledge.
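
The sketch below illustrates how such a few-shot chain-of-thought prompt can be assembled; the exemplar, the question, and the query_llm call are hypothetical stand-ins, not the prompts or API used by Liévin et al.

```python
# Illustrative construction of a few-shot chain-of-thought (CoT) prompt.
# `query_llm` is a hypothetical stand-in for a call to an LLM API.
EXEMPLAR = """Q: A 54-year-old smoker has exertional chest pain relieved by rest.
Which initial test is most appropriate?
A: Let's think step by step. Exertional pain relieved by rest suggests stable
angina; with an interpretable ECG and the ability to exercise, an exercise
stress test is the standard initial evaluation. The answer is exercise stress
testing."""

def build_prompt(question: str) -> str:
    return (f"{EXEMPLAR}\n\n"                   # few-shot exemplar(s)
            f"Q: {question}\n"
            f"A: Let's think step by step.")    # chain-of-thought trigger

prompt = build_prompt("Which lipid-lowering drug class is first-line for primary "
                      "prevention in a patient with elevated 10-year ASCVD risk?")
# response = query_llm(prompt)  # hypothetical call to a general-purpose LLM
print(prompt)
```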

In applications where safety is paramount, such as healthcare, the efficacy of LLMs hinges on their ability to produce outputs that are both factually accurate and comprehensive. The increased conversational abilities of LLMs like GPT-4 enable new paradigms such as multi-agent LLMs. For example, dialog-enabled resolving agents (DERA) provide a simple, interpretable forum for models to communicate feedback and iteratively improve output [243]. Dialog is structured as a discussion between two types of agents: a researcher, who processes information and identifies key problem components, and a decider, who has the authority to synthesize the researcher's information and make final determinations on the output. DERA was evaluated on three tasks with a clinical focus. In the areas of medical conversation summarization and care plan generation, it demonstrated significant improvement over baseline GPT-4 performance, as evidenced by both human expert preference assessments and quantitative metrics.

Although impressive, these results are not yet on par with human performance. For example, while chain-of-thought prompting approaches suggest the emergence of reasoning patterns that align reasonably well with human approaches to medical problem-solving, they still expose significant gaps in knowledge and reasoning. Interestingly, only the largest GPT models were capable of answering medical questions in a zero-shot setting. This leads to the speculation that smaller models cannot hold the intricate factual knowledge needed to address specialized medical queries, and that the ability to reason about medical questions only emerges in the largest models. LLMs are expensive to train and require the development of safeguards before being deployed into real-world systems. Notoriously, LLMs have a propensity to magnify the societal biases inherent in their training data, can fabricate information based on the data encoded in their parameters, and can leak training data, with larger models being more likely to memorize it [244, 245]. Therefore, deploying LLMs into sensitive sectors like healthcare must be undertaken with great caution [246, 247]. Nonetheless, LLMs are powerful instruments and hold the potential to transform the field of machine intelligence applied to healthcare and primary prevention (CAD and beyond).

Current Limitations and Future Considerations

The journey to integrate AI into healthcare demands both foundational investment and a cultural shift [248]. This section examines challenges encompassing data availability, data security and privacy, interpretability, and the pivotal role of adequate representation and reduced bias when deploying AI for CAD prevention [249].

Access to diverse and comprehensive datasets is the bedrock of AI’s efficacy in healthcare transformation. However, regulatory constraints and fragmented data storage can impede the development and evaluation of AI models. Even though digital health platforms provide vast datasets ideal for AI, the lack of unified data processing and sharing frameworks necessitates considerable curation efforts. Metadata tagging protocols should be standardized to enhance reliability, comparability, and scalability. Accurately harmonizing data from varied platforms and technologies is formidable but indispensable for creating effective AI models. Initiatives like interdisciplinary consortiums for AI training, technology interfaces for model validation, and open-source sharing of datasets and computational methods are potential solutions [250, 251]. A synchronized strategy for data recording and storage, compatible with diverse devices, is essential [252].

Protecting data throughout AI model lifecycles is paramount. These models, trained on vast and sometimes sensitive datasets, warrant meticulous care to safeguard patient confidentiality. While strategies like data masking and pseudonymization bolster data privacy during AI development, residual risks of data exposure persist. Leveraging AI within decentralized data architectures that emphasize privacy has been proposed [253,254,255], though its true merit in alleviating privacy concerns is yet to be validated. Unified efforts from researchers, institutions, and regulatory authorities are vital to foster inclusivity during data collection, ensuring AI models that are both potent and secure, thereby benefiting healthcare management and patient outcomes [256, 257].

The intricacies of AI algorithms, often labeled as “black boxes” due to their opaque decision-making processes, present hurdles in building trust among healthcare providers and patients [258]. Successful AI adoption in healthcare necessitates its alignment with clinical practices and guidelines, promoting an interoperable and sustainable care delivery system [259]. Incorporating AI into clinical workflows demands a holistic approach, involving AI-human collaboration among healthcare professionals, data experts, and specialists [260]. This integration is not just about the technology but also about reshaping the decision-making processes in healthcare. Algorithmic solutions targeting lifestyle modifications and emphasizing transparent, actionable predictive pathways are also emerging to address this quandary [261]. The seamless fusion of AI into existing clinical workflows, ensuring interpretability and adherence to guidelines, is of utmost importance [262].

A commitment to inclusivity in healthcare AI is essential, ensuring data covers a wide spectrum of populations. Bias mitigation is vital to prevent AI from inadvertently intensifying health inequities. Yet, datasets harnessed for AI often lack balanced inclusion of diverse ethnic and cultural communities, undermining a model's broad relevance [263]. To counter this, efforts are underway to diversify the ethnic and ancestral makeup of participant pools [264]. Among them, digital health innovations present opportunities to include often-overlooked groups in medical research, enhancing the accuracy of AI predictions in disease prevention [226, 265]. Tailoring models regionally and adopting transfer learning methods can help bridge performance gaps across demographics [266]. With its flexibility, AI could further facilitate precise bias identification and correction, proving superior to conventional risk assessment methods [267]. Such proactive measures are vital for AI's effective integration into healthcare, promoting health equity [268].

In summary, while there are many challenges to implementing AI in healthcare, there are also promising solutions and opportunities to improve patient outcomes. A coordinated effort is needed to address these challenges and to ensure that AI is used ethically and responsibly in healthcare management [269, 270].