Study design
This multicenter cross-sectional study was conducted by the Department of Cardiology, Beijing Anzhen Hospital, Capital Medical University (Beijing, China). Data were obtained at four sites in China. The study was conducted according to the tenets of the Declaration of Helsinki and received approval from the Institutional Review Boards of the four centers involved in the trial.
Participants
Eligible participants were aged ≥ 18 years, had clinically suspected CAD, and were scheduled for coronary angiography. The exclusion criteria were as follows: (i) prior percutaneous coronary intervention (PCI); (ii) prior coronary artery bypass grafting (CABG); (iii) other heart disease (e.g., congenital heart disease, valvular heart disease, or macrovascular disease); (iv) inability to have photographs taken; and (v) a diagnosis of ST-segment elevation myocardial infarction (STEMI). Prior to the coronary angiography procedure, all eligible patients provided informed consent to participate in the study and to have their photographs used for research purposes.
Study setting
The study was conducted in two phases. In phase one, eligible patients from two sites were enrolled and split randomly into mutually exclusive sets for training and internal validation of the model at a ratio of 8:2. In phase two, eligible patients from the four sites were enrolled in a test group. Among the four sites involved in phase two, two also participated in phase one.
Dataset collection
Trained research nurses interviewed and photographed the patients before the procedures. The baseline interviews collected data on clinical presentation, family history, and medications. Two camera models were used to obtain fundus photographs: a Canon CR-2AF and a Topcon NW400. Binocular fundus photographs of each patient were captured using a 45° field of view. The quality of the fundus photographs was assessed by two investigators who were blinded to the study design, according to the protocol outlined in Supplementary Methods 3. Following image capture, a physician reviewed the ocular images, and those with improper acquisition (eyelid occlusion or overexposure covering more than a tenth of an image) were excluded. Information on demographic characteristics, medical history, risk factors, and laboratory tests was extracted from the patients' medical records after the procedure.
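The exclusion rule above (eyelid occlusion or overexposure covering more than a tenth of an image) can be sketched as a simple pixel-level check. The brightness threshold of 240 and the grayscale conversion below are illustrative assumptions, not the study's actual protocol (which is given in Supplementary Methods 3):

```python
import numpy as np

def overexposed_fraction(image, threshold=240):
    """Fraction of pixels whose brightness exceeds `threshold` (0-255 scale)."""
    gray = image.mean(axis=-1) if image.ndim == 3 else image
    return float((gray > threshold).mean())

def passes_quality_check(image, max_artifact_fraction=0.10):
    """Reject images where overexposure covers more than a tenth of the frame."""
    return overexposed_fraction(image) <= max_artifact_fraction

# Example: a synthetic fundus-sized image with a 5% overexposed patch passes.
img = np.full((100, 100, 3), 120, dtype=np.uint8)
img[:10, :50] = 255                # 500 of 10,000 pixels overexposed (5%)
print(passes_quality_check(img))   # True
```

In practice the occlusion check would need a dedicated detector rather than a brightness threshold; this sketch only captures the one-tenth area criterion.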
Labeling for the artificial intelligence model
All enrolled patients were dichotomized according to the presence of CAD, defined as at least one coronary lesion with stenosis ≥ 50% on coronary angiography27,28. Two interventional cardiologists who were blinded to the study design independently reviewed each patient's angiogram to assess the degree of coronary artery stenosis. In cases of disagreement, a third cardiologist reviewed the angiogram to reach a consensus.
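The dichotomization and two-reader adjudication described above can be sketched as follows. The function names are hypothetical, and the tiebreak is simplified to taking the third reader's call directly, whereas the study describes a consensus review:

```python
def cad_label(stenosis_percentages):
    """Dichotomize: CAD-positive (1) if any lesion stenosis >= 50%."""
    return int(any(s >= 50 for s in stenosis_percentages))

def adjudicated_label(reader_a, reader_b, reader_c=None):
    """Two blinded readers; a third adjudicates disagreements."""
    if reader_a == reader_b:
        return reader_a
    if reader_c is None:
        raise ValueError("Disagreement requires third-reader adjudication")
    return reader_c

print(cad_label([30, 55, 10]))              # 1 (one lesion >= 50%)
print(adjudicated_label(1, 0, reader_c=1))  # 1
```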
Model development
All fundus photographs were pre-processed using a quality control tool to ensure that unqualified photographs were excluded from algorithm development (Supplementary Methods 3). For algorithm development, a deep learning neural network was used to learn CAD features from the fundus photographs. To obtain a model suitable for real clinical use, the development process comprised two stages: model training and model validation. In the training stage, the model extracted useful features from the fundus photographs and performed the CAD classification decision. A loss function was used to calculate the error between the model output and the ground truth, and the model parameters were adjusted to decrease this error. In the validation stage, labels were used to measure the performance of the model, but not for prediction (Supplementary Fig. 5).
Notably, to ensure that the model could comprehensively learn basic information related to CAD, we divided the quality-controlled fundus photographs into two parts: one was used to pre-train the model to strengthen its attention to CAD-related areas in the original photographs, and the other was split into the training and test datasets.
The structure of our model is shown in Fig. 4. All fundus photographs were resized to 300 px × 300 px and their black edges were removed before they were input into the feature extractor: an Inception-ResNet-V2 network29 consisting of several convolution layers and different pooling layers. The convolution and pooling layers together form multiple inception and reduction modules that extract CAD features layer by layer. Subsequently, a fully connected layer of 128 units was used, connected to a dense unit that outputs a CAD probability prediction. To improve model performance, eight dense units were used as auxiliary branches to output predictions of age, sex, body mass index (BMI), smoking, drinking, hypertension, diabetes, and hypercholesterolemia, given that these clinical variables are explicitly related to CAD. Among the eight auxiliary outputs, age and BMI are continuous variables, while the others are binary.
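A minimal sketch of this multi-head output structure, assuming a shared fully connected feature layer of 128 units feeding one main CAD head and eight auxiliary heads. A random projection over a small stand-in image replaces the Inception-ResNet-V2 backbone (the real input is 300 × 300), and all weights here are random placeholders, not trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stand-in for the Inception-ResNet-V2 backbone: a random projection from a
# flattened 64x64x3 image (small for brevity) to the 128-unit shared layer.
W_SHARED = rng.normal(0, 0.01, (64 * 64 * 3, 128))

def shared_features(batch):
    return np.maximum(batch.reshape(len(batch), -1) @ W_SHARED, 0.0)  # ReLU

def heads(feat):
    """Main CAD head plus eight auxiliary heads on the shared 128-d feature."""
    out = {"cad": sigmoid(feat @ rng.normal(0, 0.01, (128, 1)))}
    for name in ("age", "bmi"):                        # continuous targets
        out[name] = feat @ rng.normal(0, 0.01, (128, 1))
    for name in ("sex", "smoking", "drinking", "hypertension",
                 "diabetes", "hypercholesterolemia"):  # binary targets
        out[name] = sigmoid(feat @ rng.normal(0, 0.01, (128, 1)))
    return out

batch = rng.random((2, 64, 64, 3))
preds = heads(shared_features(batch))
print(len(preds), preds["cad"].shape)  # 9 heads, each of shape (2, 1)
```

In the actual TensorFlow implementation, each head would be a dense layer sharing the backbone's 128-unit output, trained jointly.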
Regarding implementation details, we used cross-entropy as the loss function (Supplementary Methods 1 and 2), stochastic gradient descent as the optimizer, and softmax as the final activation function. We trained the model for 100 epochs with a batch size of 24, and the model parameters were saved at the point of minimum training loss. The algorithm was trained using the TensorFlow library on an NVIDIA GeForce RTX 3090 GPU.
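The softmax activation and cross-entropy loss named above can be written out as follows; this is a generic numpy sketch of the standard definitions, not the study's exact implementation:

```python
import numpy as np

def softmax(logits):
    """Convert logits to class probabilities, shifted for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean negative log-likelihood of the true class."""
    p = softmax(logits)
    return float(-np.log(p[np.arange(len(labels)), labels]).mean())

# A confident correct prediction yields a low loss.
logits = np.array([[4.0, 0.0], [0.0, 4.0]])
print(cross_entropy(logits, np.array([0, 1])))  # ~0.018
```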
Interpretability experiments
To verify the interpretability of our model, we conducted a series of experiments to explore how it works. The design and results of each experiment are described in detail below.
Ablation experiment
We conducted a series of ablation experiments to probe the contribution of each CAD risk factor to the overall model. To this end, we removed individual branch tasks (CAD risk factors) and used the remaining branch and main tasks to train and test the model. We then evaluated the performance of the eight resulting algorithms in the test group to infer the possible mechanisms by which the algorithm identifies CAD.
Model visualization
To understand how the model makes decisions and to guide subsequent improvements, we visualized the model using class activation mapping (CAM). CAM is a technique used in computer vision to visualize and understand the regions of an image that are most important for predicting a certain object class. CAM works by utilizing a CNN trained on a specific task, such as image classification: the output of the CNN's final convolutional layer is weighted and combined to produce a heatmap that indicates the importance of different regions of the image. The CAM visualizations revealed that the arteries and veins in the fundus photographs contain rich information related to CAD. Therefore, we designed an occlusion experiment to probe this correlation further. Specifically, we occluded the arteries and veins in the original fundus photographs, used the occluded images to train and test the model, and then analyzed its performance.
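A minimal sketch of the CAM computation described above, assuming a trained network's final convolutional feature maps and the class-specific weights of its output layer are available; the toy inputs here are random placeholders:

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """CAM: weight each final-conv channel by its output-layer weight for the
    target class, sum over channels, and rescale to [0, 1] as a heatmap."""
    cam = np.tensordot(feature_maps, class_weights, axes=([-1], [0]))
    cam = np.maximum(cam, 0.0)          # keep only positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()           # normalize for display
    return cam

# Toy example: an 8x8 spatial grid with 16 channels from the last conv layer.
rng = np.random.default_rng(0)
fmaps = rng.random((8, 8, 16))
weights = rng.random(16)
heatmap = class_activation_map(fmaps, weights)
print(heatmap.shape)  # (8, 8)
```

In practice the heatmap is upsampled to the input resolution and overlaid on the fundus photograph, which is how vessel regions were identified as informative.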
Statistical analysis
Based on the results of the internal validation group, the expected sensitivity for diagnosing CAD in the external validation group was 80%, with a 5% tolerance, and the expected specificity was 70%, with a 10% tolerance; a confidence level of 1 − α = 0.95 was selected. Equal sample sizes were used for the patient and non-patient groups, as calculated using PASS 15 software, which required the inclusion of at least 245 patients and 245 non-patients, a total of at least 490 subjects.
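As a cross-check, the standard normal-approximation formula for estimating a proportion within ± d, n = z²p(1 − p)/d², reproduces the reported group size from the stated targets (the sensitivity requirement dominates, giving ≈245.9 per group, consistent with the reported 245). PASS 15 may use an exact method, so this is only an approximation:

```python
def n_for_proportion(p, tolerance, z=1.96):
    """Sample size to estimate a proportion p within +/- tolerance at 95% CI:
    n = z^2 * p * (1 - p) / d^2 (normal approximation)."""
    return z * z * p * (1.0 - p) / (tolerance ** 2)

n_sens = n_for_proportion(0.80, 0.05)  # patient group, driven by sensitivity
n_spec = n_for_proportion(0.70, 0.10)  # non-patient group, driven by specificity
print(round(n_sens, 1), round(n_spec, 1))  # 245.9 80.7
```

Because equal group sizes were used, the larger of the two requirements sets both groups.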
The normality of the quantitative data was assessed using the Kolmogorov–Smirnov test. Continuous variables are expressed as the mean ± standard deviation (SD), skewed data are expressed as the median (interquartile range, IQR), and categorical variables are reported as percentages.
To evaluate algorithm performance, the accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) were calculated on both the validation and external testing datasets. Exact 95% confidence intervals (CIs) were calculated for all measures of diagnostic performance, and DeLong tests were used to compare the AUCs of different models. The incremental prognostic value of the AI model over the updated Diamond-Forrester method (UDFM) and Duke clinical score (DCS) in the detection of CAD was assessed using the net reclassification index (NRI). We examined calibration and calibration slopes over a wide range of scales, along with calibration plots, to assess the consistency between observations and predictions30. The multivariable logistic regression results of the clinical model to predict CAD are shown in Supplementary Table 4. Pre-specified subgroup analyses were conducted according to age, sex, smoking, diabetes, symptoms, and lesion severity, and Python was used for the analysis of UDFM and DCS. A two-tailed P-value < 0.05 was considered significant. Statistical analyses were performed using R version 4.0.2.
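The categorical NRI used above to quantify the AI model's incremental value over UDFM and DCS can be sketched as follows; the example data are synthetic, and the risk categories are assumed to be already assigned:

```python
def net_reclassification_index(old_pred, new_pred, outcome):
    """Categorical NRI: net fraction of events reclassified upward plus
    net fraction of non-events reclassified downward."""
    up_e = down_e = up_ne = down_ne = n_e = n_ne = 0
    for old, new, y in zip(old_pred, new_pred, outcome):
        if y == 1:                      # event (CAD present)
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:                           # non-event
            n_ne += 1
            up_ne += new > old
            down_ne += new < old
    return (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne

# Toy example: the new model upgrades one event and downgrades one non-event.
old = [0, 0, 1, 1]
new = [1, 0, 1, 0]
y   = [1, 0, 1, 0]
print(net_reclassification_index(old, new, y))  # 1.0
```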