Study design
This multicenter cross-sectional study was conducted by the Department of Cardiology, Beijing Anzhen Hospital, Capital Medical University (Beijing, China). Data were obtained at four sites in China. The study was conducted according to the tenets of the Declaration of Helsinki and received approval from the Institutional Review Boards of the four centers involved in the trial.
Participants
Eligible participants were aged ≥ 18 years, had clinically suspected CAD, and were scheduled for coronary angiography. The exclusion criteria were as follows: (i) prior percutaneous coronary intervention (PCI); (ii) prior coronary artery bypass grafting (CABG); (iii) other heart disease (e.g., congenital heart disease, valvular heart disease, or macrovascular disease); (iv) inability to have photographs taken; and (v) a diagnosis of ST-segment elevation myocardial infarction (STEMI). Prior to the coronary angiography procedure, all eligible patients provided informed consent to participate in the study and to have their photographs used for research purposes.
Study setting
The study was conducted in two phases. In phase one, eligible patients from two sites were enrolled and split randomly into mutually exclusive sets for training and internal validation of the model at a ratio of 8:2. In phase two, eligible patients from the four sites were enrolled in a test group. Among the four sites involved in phase two, two also participated in phase one.
Dataset collection
Trained research nurses interviewed and photographed the patients before the procedures. The baseline interviews collected data on clinical presentation, family history, and medications. Two camera models were used to obtain fundus photographs: a Canon CR-2AF and a Topcon NW400. Binocular fundus photographs of each patient were captured using a 45° field of view. The quality of the fundus photographs was assessed by two investigators who were blinded to the study design, according to the protocol outlined in Supplementary Methods 3. Following image capture, a physician reviewed the ocular images, and those with improper acquisition (eyelid occlusion or overexposure covering more than a tenth of an image) were excluded. Information on demographic characteristics, medical history, risk factors, and laboratory tests was extracted from the patients' medical records after the procedure.
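The exclusion rule above (eyelid occlusion or overexposure covering more than a tenth of an image) can be sketched as a simple pixel-level check. The brightness threshold of 240 and the grayscale conversion below are illustrative assumptions, not the study's actual protocol (which is given in Supplementary Methods 3):

```python
import numpy as np

def overexposed_fraction(image, threshold=240):
    """Fraction of pixels whose brightness exceeds `threshold` (0-255 scale)."""
    gray = image.mean(axis=-1) if image.ndim == 3 else image
    return float((gray > threshold).mean())

def passes_quality_check(image, max_artifact_fraction=0.10):
    """Reject images where overexposure covers more than a tenth of the frame."""
    return overexposed_fraction(image) <= max_artifact_fraction

# Example: a synthetic fundus-sized image with a 5% overexposed patch passes.
img = np.full((100, 100, 3), 120, dtype=np.uint8)
img[:10, :50] = 255                # 500 of 10,000 pixels overexposed (5%)
print(passes_quality_check(img))   # True
```

In practice the occlusion check would need a dedicated detector rather than a brightness threshold; this sketch only captures the one-tenth area criterion.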
Labeling for the artificial intelligence model
All enrolled patients were dichotomized according to the presence of CAD, defined as at least one coronary lesion with stenosis ≥ 50% on coronary angiography27,28. Two interventional cardiologists who were blinded to the study design independently reviewed each patient's angiogram to assess the degree of coronary artery stenosis. In cases of disagreement, a third cardiologist reviewed the angiogram to reach a consensus.
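The dichotomization and two-reader adjudication described above can be sketched as follows. The function names are hypothetical, and the tiebreak is simplified to taking the third reader's call directly, whereas the study describes a consensus review:

```python
def cad_label(stenosis_percentages):
    """Dichotomize: CAD-positive (1) if any lesion stenosis >= 50%."""
    return int(any(s >= 50 for s in stenosis_percentages))

def adjudicated_label(reader_a, reader_b, reader_c=None):
    """Two blinded readers; a third adjudicates disagreements."""
    if reader_a == reader_b:
        return reader_a
    if reader_c is None:
        raise ValueError("Disagreement requires third-reader adjudication")
    return reader_c

print(cad_label([30, 55, 10]))              # 1 (one lesion >= 50%)
print(adjudicated_label(1, 0, reader_c=1))  # 1
```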
Model development
All fundus photographs were pre-processed using a quality control tool to ensure that unqualified photographs were excluded from algorithm development (Supplementary Methods 3). For algorithm development, a deep learning neural network was used to learn CAD features from the fundus photographs. To obtain a model suitable for real clinical use, the development process comprised two stages: model training and model validation. In the training stage, the model extracted useful features from the fundus photographs and performed the CAD classification decision. A loss function was used to calculate the error between the model output and the ground truth, and the model parameters were adjusted to decrease this error. In the validation stage, labels were used to measure the performance of the model, but not for prediction (Supplementary Fig. 5).
Notably, to ensure that the model could comprehensively learn basic information related to CAD, we divided the quality-controlled fundus photographs into two parts: one was used to pre-train the model to strengthen its attention to CAD-related areas in the original photographs, and the other was split into the training and test datasets.
The structure of our model is shown in Fig. 4. All fundus photographs were resized to 300 px × 300 px and their black edges were removed before they were input into the feature extractor: an Inception-ResNet-V2 network29 consisting of several convolution layers and different pooling layers. The convolution and pooling layers together form multiple inception and reduction modules that extract CAD features layer by layer. Subsequently, a fully connected layer of 128 units was used, connected to a dense unit that outputs a CAD probability prediction. To improve model performance, eight dense units were used as auxiliary branches to output predictions of age, sex, body mass index (BMI), smoking, drinking, hypertension, diabetes, and hypercholesterolemia, given that these clinical variables are explicitly related to CAD. Among the eight auxiliary outputs, age and BMI are continuous variables, while the others are binary.
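A minimal sketch of this multi-head output structure, assuming a shared fully connected feature layer of 128 units feeding one main CAD head and eight auxiliary heads. A random projection over a small stand-in image replaces the Inception-ResNet-V2 backbone (the real input is 300 × 300), and all weights here are random placeholders, not trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stand-in for the Inception-ResNet-V2 backbone: a random projection from a
# flattened 64x64x3 image (small for brevity) to the 128-unit shared layer.
W_SHARED = rng.normal(0, 0.01, (64 * 64 * 3, 128))

def shared_features(batch):
    return np.maximum(batch.reshape(len(batch), -1) @ W_SHARED, 0.0)  # ReLU

def heads(feat):
    """Main CAD head plus eight auxiliary heads on the shared 128-d feature."""
    out = {"cad": sigmoid(feat @ rng.normal(0, 0.01, (128, 1)))}
    for name in ("age", "bmi"):                        # continuous targets
        out[name] = feat @ rng.normal(0, 0.01, (128, 1))
    for name in ("sex", "smoking", "drinking", "hypertension",
                 "diabetes", "hypercholesterolemia"):  # binary targets
        out[name] = sigmoid(feat @ rng.normal(0, 0.01, (128, 1)))
    return out

batch = rng.random((2, 64, 64, 3))
preds = heads(shared_features(batch))
print(len(preds), preds["cad"].shape)  # 9 heads, each of shape (2, 1)
```

In the actual TensorFlow implementation, each head would be a dense layer sharing the backbone's 128-unit output, trained jointly.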
Regarding implementation details, we used cross-entropy as the loss function (Supplementary Methods 1 and 2), stochastic gradient descent as the optimizer, and softmax as the final activation function. We trained the model for 100 epochs with a batch size of 24, and the model parameters were saved at the point of minimum training loss. The algorithm was trained using the TensorFlow library on an NVIDIA GeForce RTX 3090 GPU.
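The softmax activation and cross-entropy loss named above can be written out as follows; this is a generic numpy sketch of the standard definitions, not the study's exact implementation:

```python
import numpy as np

def softmax(logits):
    """Convert logits to class probabilities, shifted for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean negative log-likelihood of the true class."""
    p = softmax(logits)
    return float(-np.log(p[np.arange(len(labels)), labels]).mean())

# A confident correct prediction yields a low loss.
logits = np.array([[4.0, 0.0], [0.0, 4.0]])
print(cross_entropy(logits, np.array([0, 1])))  # ~0.018
```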
Interpretability experiments
To verify the interpretability of our model, we conducted a series of experiments to explore how it works. The design and results of each experiment are described in detail below.
Ablation experiment
We conducted a series of ablation experiments to probe the contribution of each CAD risk factor to the overall model. To this end, we removed individual branch tasks (CAD risk factors) and used the remaining branch and main tasks to train and test the model. We then evaluated the performance of the eight resulting algorithms in the test group to infer the possible mechanisms by which the algorithm identifies CAD.
Model visualization
To understand how the model makes decisions and to guide subsequent improvements, we visualized the model using class activation mapping (CAM). CAM is a technique used in computer vision to visualize and understand the regions of an image that are most important for predicting a certain object class. CAM works by utilizing a CNN trained on a specific task, such as image classification: the output of the CNN's final convolutional layer is weighted and combined to produce a heatmap that indicates the importance of different regions of the image. The CAM visualizations revealed that the arteries and veins in the fundus photographs contain rich information related to CAD. Therefore, we designed an occlusion experiment to probe this correlation further. Specifically, we occluded the arteries and veins in the original fundus photographs, used the occluded images to train and test the model, and then analyzed its performance.
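A minimal sketch of the CAM computation described above, assuming a trained network's final convolutional feature maps and the class-specific weights of its output layer are available; the toy inputs here are random placeholders:

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """CAM: weight each final-conv channel by its output-layer weight for the
    target class, sum over channels, and rescale to [0, 1] as a heatmap."""
    cam = np.tensordot(feature_maps, class_weights, axes=([-1], [0]))
    cam = np.maximum(cam, 0.0)          # keep only positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()           # normalize for display
    return cam

# Toy example: an 8x8 spatial grid with 16 channels from the last conv layer.
rng = np.random.default_rng(0)
fmaps = rng.random((8, 8, 16))
weights = rng.random(16)
heatmap = class_activation_map(fmaps, weights)
print(heatmap.shape)  # (8, 8)
```

In practice the heatmap is upsampled to the input resolution and overlaid on the fundus photograph, which is how vessel regions were identified as informative.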
Statistical analysis
Based on the results of the internal validation group, the expected sensitivity for diagnosing CAD in the external validation group was 80%, with a 5% tolerance, and the expected specificity was 70%, with a 10% tolerance; a confidence level of 1 − α = 0.95 was selected. Equal sample sizes were used for the patient and non-patient groups, as calculated using PASS 15 software, which required the inclusion of at least 245 patients and 245 non-patients, a total of at least 490 subjects.
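As a cross-check, the standard normal-approximation formula for estimating a proportion within ± d, n = z²p(1 − p)/d², reproduces the reported group size from the stated targets (the sensitivity requirement dominates, giving ≈245.9 per group, consistent with the reported 245). PASS 15 may use an exact method, so this is only an approximation:

```python
def n_for_proportion(p, tolerance, z=1.96):
    """Sample size to estimate a proportion p within +/- tolerance at 95% CI:
    n = z^2 * p * (1 - p) / d^2 (normal approximation)."""
    return z * z * p * (1.0 - p) / (tolerance ** 2)

n_sens = n_for_proportion(0.80, 0.05)  # patient group, driven by sensitivity
n_spec = n_for_proportion(0.70, 0.10)  # non-patient group, driven by specificity
print(round(n_sens, 1), round(n_spec, 1))  # 245.9 80.7
```

Because equal group sizes were used, the larger of the two requirements sets both groups.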
The normality of the quantitative data was assessed using the Kolmogorov–Smirnov test. Continuous variables are expressed as the mean ± standard deviation (SD), skewed data are expressed as the median (interquartile range, IQR), and categorical variables are reported as percentages.
To evaluate algorithm performance, the accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) were calculated on both the validation and external testing datasets. Exact 95% confidence intervals (CIs) were calculated for all measures of diagnostic performance, and DeLong tests were used to compare the AUCs of different models. The incremental prognostic value of the AI model over the updated Diamond-Forrester method (UDFM) and Duke clinical score (DCS) in the detection of CAD was assessed using the net reclassification index (NRI). We examined calibration and calibration slopes over a wide range of scales, along with calibration plots, to assess the consistency between observations and predictions30. The multivariable logistic regression results of the clinical model to predict CAD are shown in Supplementary Table 4. Pre-specified subgroup analyses were conducted according to age, sex, smoking, diabetes, symptoms, and lesion severity, and Python was used for the analysis of UDFM and DCS. A two-tailed P-value < 0.05 was considered significant. Statistical analyses were performed using R version 4.0.2.
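The categorical NRI used above to quantify the AI model's incremental value over UDFM and DCS can be sketched as follows; the example data are synthetic, and the risk categories are assumed to be already assigned:

```python
def net_reclassification_index(old_pred, new_pred, outcome):
    """Categorical NRI: net fraction of events reclassified upward plus
    net fraction of non-events reclassified downward."""
    up_e = down_e = up_ne = down_ne = n_e = n_ne = 0
    for old, new, y in zip(old_pred, new_pred, outcome):
        if y == 1:                      # event (CAD present)
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:                           # non-event
            n_ne += 1
            up_ne += new > old
            down_ne += new < old
    return (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne

# Toy example: the new model upgrades one event and downgrades one non-event.
old = [0, 0, 1, 1]
new = [1, 0, 1, 0]
y   = [1, 0, 1, 0]
print(net_reclassification_index(old, new, y))  # 1.0
```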