Feasibility of tongue image detection for coronary artery disease: based on deep learning

Aim: To clarify the potential diagnostic value of tongue images for coronary artery disease (CAD), develop a CAD diagnostic model whose performance is enhanced by incorporating tongue image inputs, and provide more reliable evidence and a new biological characterization for the clinical diagnosis of CAD.

Methods: We recruited 684 patients from four hospitals in China for a cross-sectional study, collecting their baseline information and standardized tongue images to train and validate our CAD diagnostic algorithm. We used DeepLabV3+ to segment the tongue body and ResNet-18, pretrained on ImageNet, to extract features from the tongue images. We developed CAD diagnostic models with Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM), and XGBoost classifiers, first with risk factors alone as inputs and then with the additional inclusion of tongue image features. We compared the diagnostic performance of the algorithms using accuracy, precision, recall, F1-score, AUPR, and AUC.

Results: Classifying patients with CAD using tongue images alone proved effective (ACC = 0.670, AUC = 0.690, recall = 0.666). After comparing DT, RF, LR, SVM, and XGBoost, we ultimately chose XGBoost to develop the CAD diagnosis algorithm. The algorithm developed solely from risk factors achieved ACC = 0.730, precision = 0.811, AUC = 0.763. When tongue features were integrated, performance improved to ACC = 0.760, precision = 0.773, AUC = 0.786, recall = 0.850.

Conclusion: The use of tongue images in the diagnosis of CAD is feasible, and the inclusion of these features can enhance the performance of existing CAD diagnosis algorithms. The resulting CAD diagnosis algorithm is noninvasive, simple, and cost-effective, making it suitable for large-scale screening of CAD among hypertensive populations. Tongue image features may emerge as potential biomarkers and new risk indicators for CAD.


Introduction
The World Health Organization (WHO) has declared that cardiovascular disease (CVD), particularly coronary artery disease (CAD), is the leading cause of death from illness globally, accounting for 17.9 million fatalities per year, or 32% of all disease-related deaths (1). This significant public health challenge places a heavy financial burden on national health budgets (2). Hypertension is an important independent risk factor for the development of CAD (3). Studies have shown that patients with hypertension have a higher risk of developing CAD, and when the two diseases coexist, the risk of cardiovascular death rises significantly (4,5). Early diagnosis and prompt treatment of CAD have been shown to significantly improve outcomes and reduce treatment costs (3). However, the gold standard for diagnosing CAD is invasive coronary angiography, which is expensive and can cause complications (6); it is not suitable for early diagnosis and disease risk assessment. Finding non-invasive, cost-effective, and efficient methods for early CAD diagnosis is therefore crucial for global public health.
The rapid development of artificial intelligence in recent years has provided new avenues for exploring non-invasive diagnostic methods for CAD. Clinical data are complex and multidimensional, and machine learning (ML) demonstrates advantages over traditional statistical methods in handling them (7). ML involves the selection and integration of multiple models. Traditional statistical methods often struggle to model the complex nonlinear relationships in clinical data, whereas ML algorithms can automatically learn to handle nonlinear relationships and select useful predictive features. These algorithms can more effectively reveal hidden relationships within data and have been increasingly utilized for the diagnosis and risk prediction of clinical diseases. Some scholars have already used ML to develop diagnostic models for CAD, with clinical risk factors as the primary predictive variables (8)(9)(10). These studies demonstrate the promising application prospects of ML in clinical diagnostic tasks. Recent research has found that, in addition to risk factors, other biological information, such as facial images and pulse waves, may also hold significant value for the diagnosis of CAD (11,12). In clinical diagnosis, the primary focus is on the patient's symptoms and signs. Traditional Chinese Medicine (TCM) employs unique and effective diagnostic strategies, particularly in observing the external condition of patients. TCM theory posits that "internal diseases manifest externally," allowing practitioners to gauge the severity of illness through observation. Tongue diagnosis is a critical component of the TCM observation process. The appearance of the tongue, including its color, shape, and coating, has long been used in TCM to diagnose various health conditions. From a biomedical perspective, the tongue is a highly vascular organ closely related to the cardiovascular system. Changes in blood
circulation and overall systemic health often manifest as observable alterations in the tongue's appearance. Many studies have demonstrated the effectiveness of diagnosing diseases through tongue observation (14-27); we have compiled these studies in Table 1 with commentary on each. From a biomedical standpoint, the tongue contains rich physiological and pathological information. It is an important terminal organ with an abundant blood supply, closely linked to the cardiovascular system, and its appearance often changes when blood circulation is impaired (27,28). Biomedical research suggests that hypoxemia can alter tongue color and is associated with various cardiovascular diseases (29). In CAD, the narrowing of the coronary arteries restricts blood flow to the heart, potentially causing systemic changes in circulation and oxygenation that manifest on the tongue. Despite this, the tongue has not been effectively utilized in the actual diagnosis of CAD. Recent advances in artificial intelligence and deep learning have made it possible to extract and analyze subtle features from medical images, including tongue images, that were previously difficult to quantify. By leveraging these technologies, it may be possible to detect patterns and features in tongue images associated with CAD, providing a non-invasive, cost-effective, and accessible diagnostic tool. What, then, is the diagnostic value of the tongue for CAD? Can tongue images become a crucial basis for optimizing its non-invasive diagnosis? These are precisely the questions this study aims to explore.
To this end, we conducted a multi-center cross-sectional clinical study, using deep learning methods to explore the potential connection between tongue image features and CAD. We also investigated the feasibility of optimizing CAD diagnostic models by incorporating tongue image features alongside risk factors. In doing so, we hope to present a new and effective biomarker for the clinical diagnosis of CAD.

Study population and ethical statement
From March 2019 to November 2022, hypertensive patients aged 18-85 were recruited from the cardiology departments of Dongzhimen Hospital, Dongfang Hospital, the Third Affiliated Hospital of Beijing University of Chinese Medicine, and the First Affiliated Hospital of Hunan University of Chinese Medicine. All participants signed an informed consent form, and the study was conducted in accordance with the Declaration of Helsinki. The ethical review of this study was carried out and approved by the Institutional Review Board (IRB) of Shuguang Hospital affiliated with Shanghai University of Traditional Chinese Medicine (IRB number: 2018-626-55-01), with the clinical trial registration number ChiCTR1900026008. All source codes and data analyzed in this study can be obtained from the corresponding author upon reasonable request.
The diagnostic criteria for hypertension follow the "Chinese Guidelines for the Prevention and Treatment of Hypertension," which define hypertension as a systolic blood pressure ≥140 mmHg or diastolic blood pressure ≥90 mmHg, or current treatment with antihypertensive medication (30). The diagnosis of CAD is based on the patient's coronary angiography (CAG) results, that is, a narrowing of the inner diameter of at least one of the coronary arteries (left anterior descending, left circumflex, right coronary artery, or left main) by ≥50%. Initially, 684 patients were recruited, with the following exclusion criteria: (1) patients whose tongue appearance was altered by medications or food, (2)

Data collection
Trained research physicians conducted interviews and took tongue photographs of participants following a standardized collection process. The interviews gathered baseline data on general conditions, socioeconomic status, lifestyle (alcohol consumption, smoking, insomnia), and clinical manifestations. Tongue photographs were collected using a TFDA-1 tongue diagnosis instrument (Figure 1), two hours after breakfast or lunch. The specific steps for image collection are as follows: (1) Power on the instrument after inspection and adjust the camera parameters. (2) Disinfect the areas of the instrument that may come into direct contact with the participant using 75% alcohol. (3) Instruct the patient to place their face on the chin rest, relax, and stick out their tongue flatly. (4) Turn on the built-in ring light source and capture the image.
(5) Check the photo; if it is satisfactory, the collection is complete; if not, retake it until the image quality meets the standard. Qualification criteria for photo quality: no occlusion, blurring, fogging, overexposure, or underexposure; the tongue relaxed and flattened, with no twisting or tension; and no foreign objects, staining, or other conditions affecting the appearance of the tongue surface.

Data preprocessing
Patients were divided into two groups based on whether they were diagnosed with CAD: the hypertension group and the hypertension combined with CAD group, with the labels recorded as 0 and 1, respectively. Considering that tongue images also contain other facial information, which is superfluous for this study, we built a deep learning model for semantic segmentation of the tongue body using the DeepLabV3+ framework (Figure 2A). We used 500 images from a national key research and development program tongue image database for model training, implementing a phased training strategy. In the first 50 epochs (Figure 2B), the backbone of the model was frozen to focus on fine-tuning the tail end of the network, with a batch size of 8. Subsequently, in the unfreezing phase, all network layers were involved in training, with the batch size adjusted to 4 and the learning rate set to 0.01. After segmentation of the tongue body, the images were uniformly cropped and resized to 256 × 256 pixels (Figure 2C). Additionally, because of our small overall sample size, data augmentation was performed on the images through rotation, flipping, and translation.

[Figure 1: The tongue diagnosis instrument and collection process. 1: lens hood, 2: LED light source, 3: high-definition camera, 4: chin support plate. Note: fixed standard camera parameters were used when shooting: color temperature 5,000 K, color rendering index 97, shutter speed 1/125 s, aperture F/6.3, exposure indicator scale at 0 or ±1.]
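As a minimal illustration of the augmentation operations above (rotation, flipping, translation), the sketch below applies them to a tiny grid of pixel values. A real pipeline would use an image library such as Pillow or torchvision; the function names here are illustrative only.

```python
# Minimal sketch of the augmentation operations described above, applied
# to a tiny 2-D "image" represented as a list of rows of pixel values.

def rotate90(img):
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def hflip(img):
    """Flip the grid horizontally (mirror left-right)."""
    return [row[::-1] for row in img]

def translate(img, dx, fill=0):
    """Shift each row right by dx pixels, padding with `fill`."""
    return [[fill] * dx + row[:len(row) - dx] for row in img]

img = [[1, 2],
       [3, 4]]
print(rotate90(img))      # [[3, 1], [4, 2]]
print(hflip(img))         # [[2, 1], [4, 3]]
print(translate(img, 1))  # [[0, 1], [0, 3]]
```

Each transform yields a new training sample from the same labeled image, which is the point of augmentation on a small dataset.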
For the baseline data obtained through interviews, there was only a minimal amount of missing data (less than 5%). Various imputation methods were used to fill in the missing values. For discrete variables in the baseline data, such as gender and ethnicity, one-hot encoding was employed for preprocessing.
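The one-hot encoding step can be sketched as follows. In practice a library routine such as sklearn's `OneHotEncoder` or `pandas.get_dummies` would be used; this hand-rolled version is only a minimal illustration.

```python
# Hand-rolled one-hot encoding of a discrete baseline variable: each value
# becomes a binary indicator vector over the sorted set of categories.

def one_hot(values):
    """Return (categories, encoded rows) for a list of discrete values."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1
        encoded.append(vec)
    return categories, encoded

cats, enc = one_hot(["male", "female", "male"])
print(cats)  # ['female', 'male']
print(enc)   # [[0, 1], [1, 0], [0, 1]]
```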

Development of a CAD diagnostic algorithm
The customization of a CAD diagnostic algorithm primarily encompasses two core steps: the classification of images using a deep learning framework, and the construction of diagnostic models using common ML techniques. In this study, we utilized the ResNet-18 network (Figure 3A), pretrained on ImageNet, whose penultimate layer outputs a 512-dimensional deep feature vector that is linked to a binary classification output layer to produce probabilities. After extracting the deep feature vectors, we constructed a CAD diagnostic model that integrates tongue image feature vectors with risk factors. To optimize the model's performance, we explored a variety of common ML algorithms, including Random Forest (RF), Support Vector Machine (SVM), Decision Tree (DT), Logistic Regression (LR), and XGBoost, all of which are widely used in disease classification and risk prediction. By comparing the performance of these algorithms, we selected the best-performing one as the core algorithm for our final CAD diagnostic model.
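A sketch of the fusion step, under the assumption, consistent with the description above, that the 512-dimensional tongue feature vector is simply concatenated with the encoded risk factors before being passed to the downstream classifier. The numeric values are synthetic placeholders, not real model outputs.

```python
# Fuse deep image features with tabular risk-factor features into one
# input row for the downstream classifier (XGBoost in this study).

def fuse_features(tongue_vec, risk_factors):
    """Concatenate a deep feature vector with risk-factor features."""
    return list(tongue_vec) + list(risk_factors)

tongue_vec = [0.1] * 512          # stand-in for the ResNet-18 embedding
risk_factors = [63, 1, 25.4, 1]   # e.g. age, male, BMI, hyperlipidemia
row = fuse_features(tongue_vec, risk_factors)
print(len(row))  # 516
```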

Statistical analysis
Using SPSS 27 and Python 3.8 as statistical tools, we processed and analyzed the data. For data following a normal distribution, we used descriptive statistics (mean ± SD) along with one-way ANOVA to explore differences between groups. Before conducting the one-way ANOVA, we performed a homogeneity of variance test (Levene's test) to ensure that the variances across groups were roughly equal. For data not meeting the normality criteria, we used quartile descriptions and the Kruskal-Wallis H test to compare differences between groups. Additionally, we conducted an in-depth analysis of potential risk factors for CAD using binary logistic regression, presenting the results as adjusted odds ratios (adjusted OR) with their 95% confidence intervals (CI). Throughout the analysis, differences were considered statistically significant when the p-value was < 0.05.
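An adjusted OR and its 95% CI are recovered from a fitted logistic coefficient and its standard error by exponentiation. The coefficient and SE in this sketch are made-up illustrative numbers, not values from the study.

```python
import math

# Adjusted odds ratio and 95% CI from a logistic regression coefficient
# (beta) and its standard error (se): OR = exp(beta),
# CI = exp(beta ± 1.96 * se).

def odds_ratio_ci(beta, se, z=1.96):
    """Return (OR, lower bound, upper bound) for a logistic coefficient."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

# Hypothetical coefficient, e.g. per additional year of age.
or_, lo, hi = odds_ratio_ci(0.05, 0.01)
print(round(or_, 3), round(lo, 3), round(hi, 3))  # 1.051 1.031 1.072
```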
For binary classification problems, examples can be divided into four categories based on the combination of their true labels and the classifier's predictions: True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN). In addition, we performed 5-fold cross-validation on the training set to evaluate the model's effectiveness.
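From these four counts, the evaluation metrics used in this study follow directly. The counts below are synthetic, chosen only so that the resulting precision and recall roughly echo the magnitudes reported later.

```python
# Accuracy, precision, recall, and F1-score computed from the four
# counts (TP, FP, TN, FN) defined above. AUC and AUPR additionally
# require the classifier's scores, not just the counts.

def metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = metrics(tp=85, fp=25, tn=60, fn=15)
print(round(acc, 3), round(prec, 3), round(rec, 2))  # 0.784 0.773 0.85
```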
We evaluated the segmentation results of the DeepLabV3+ model using Pixel Accuracy (PA) and Mean Intersection over Union (MIoU) (Equations 5, 6), where k represents the number of categories excluding the background, and p_ij denotes the number of pixels of class i predicted to be class j (32):

PA = \frac{\sum_{i=0}^{k} p_{ii}}{\sum_{i=0}^{k}\sum_{j=0}^{k} p_{ij}}  (5)

MIoU = \frac{1}{k+1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k} p_{ij} + \sum_{j=0}^{k} p_{ji} - p_{ii}}  (6)

Results

Patient recruitment and data analysis
As shown in Figure 4, we recruited a total of 511 hypertensive patients from the cardiology departments of Dongzhimen Hospital, Dongfang Hospital, and the Third Affiliated Hospital of Beijing University of Chinese Medicine, and an additional 173 patients from the First Affiliated Hospital of Hunan University of Chinese Medicine. These patients underwent interviews and had standard tongue photographs taken. After screening, we excluded 60 patients (8.77%) with disqualifying tongue images, 164 patients (23.98%) with more than 5% missing data, and 88 patients (12.87%) without a CAG record in whom CAD could not be definitively ruled out. Ultimately, we included 244 patients with hypertension alone and 166 patients with hypertension combined with CAD.
Table 2 provides detailed baseline information for both patient groups. Upon analysis, we found significant between-group differences in age, BMI, and duration of hypertension across the training, validation, and test sets. To further explore potential risk factors, we analyzed the data from the training and validation sets using logistic regression (Table 3). The results indicated that older age, male gender, and hyperlipidemia were risk factors for CAD, while higher education, insomnia, and use of antihypertensive medication appeared to be protective factors.

Performance of tongue feature extraction algorithm
In the task of semantic segmentation of the tongue body, our customized algorithm based on the DeepLabV3+ framework demonstrated outstanding performance. The overall accuracy exceeded 99%, the Mean Intersection over Union (mIoU) for the tongue segmentation task reached 98.77%, and the Mean Pixel Accuracy (mPA) was as high as 99.45%. These results attest to the effectiveness and accuracy of our algorithm for tongue body semantic segmentation.
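The PA and MIoU metrics (Equations 5, 6) can be computed from a pixel confusion matrix. The sketch below uses synthetic pixel counts for a two-class (background/tongue) task, not the study's actual data.

```python
# Pixel Accuracy and Mean IoU from a pixel confusion matrix p, where
# p[i][j] counts pixels of true class i predicted as class j
# (here class 0 = background, class 1 = tongue).

def pixel_accuracy(p):
    correct = sum(p[i][i] for i in range(len(p)))
    total = sum(sum(row) for row in p)
    return correct / total

def mean_iou(p):
    n = len(p)
    ious = []
    for i in range(n):
        inter = p[i][i]
        union = sum(p[i]) + sum(p[j][i] for j in range(n)) - inter
        ious.append(inter / union)
    return sum(ious) / n

p = [[980, 20],   # background: 980 correct, 20 mislabelled as tongue
     [10, 990]]   # tongue: 10 mislabelled as background, 990 correct
print(round(pixel_accuracy(p), 3))  # 0.985
print(round(mean_iou(p), 3))        # 0.97
```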
We developed a CAD diagnostic algorithm based on the ResNet-18 framework, which uses tongue images as input. On the training set, the algorithm showed a mean accuracy of 0.690, a mean AUC of 0.749, and a mean recall of 0.842 (Table 4); on the test set, it achieved an accuracy of 0.670, an AUC of 0.690, and a recall of 0.666 (Table 5). As shown in Figures 3A,C, the algorithm demonstrates classification capability on both the training and test sets, indicating that tongue images indeed possess classification value for diagnosing CAD in patients with hypertension.

Performance of CAD diagnostic algorithm
We developed two types of CAD diagnostic models: one based solely on CAD risk factors and the other combining risk factors with deep features of tongue images. To determine the optimal ML approach, we compared the performance of various algorithms, all using risk factors and deep tongue image features as inputs. Results on the training set showed that the different ML algorithms could effectively complete the classification task, with XGBoost exhibiting the best performance (Table 6). We therefore selected XGBoost for algorithm customization. Moreover, when we compared algorithms customized only with risk factors to those also incorporating tongue image features, we found that the inclusion of tongue image features significantly enhanced performance, indicating that adding tongue features as input variables positively contributes to algorithm optimization (Figure 3B; Table 4).
We also evaluated the performance of the different ML algorithms on the test set (Table 7), and the results were broadly consistent with those on the validation set. Although there was a slight decrease in performance, the algorithms still demonstrated good classification capability, with XGBoost continuing to perform best. Additionally, we compared algorithms developed solely from risk factors with those integrating both risk factors and tongue image features, using the test set for evaluation (Figure 3C; Table 5). This finding confirms the practical diagnostic value of tongue images for CAD and indicates their potential to enhance the efficacy of current diagnostic models for the condition.
To more comprehensively evaluate the algorithm's applicability and performance across different populations, we subdivided the test set by age, gender, and number of risk factors, and examined the algorithm's performance in each subgroup. As shown in Figure 5, in terms of age, the algorithm demonstrated superior diagnostic ability in the elderly population aged 65 and above. Regarding gender, performance was relatively stable between men and women, with no significant differences. In terms of risk factors, the algorithm's discriminative ability improved markedly when the number of risk factors reached or exceeded three; with fewer risk factors, performance was comparatively weaker.

Discussion

Hypertensive patients constitute a large population, and hypertension is an important risk factor for CAD. When the two conditions occur together, they result in a higher disease burden and risk of adverse events (4). Therefore, this study selected hypertensive patients as the target population. Tongue diagnosis is an important diagnostic method in TCM, closely related to cardiovascular blood flow status, yet it has been largely overlooked in the actual diagnosis of CAD. In recent years, many studies have constructed artificial intelligence diagnostic models for CAD, but none have attempted to utilize the tongue as a biological marker. We conducted a multi-center cross-sectional study, customizing a CAD diagnostic algorithm that integrates risk factors and tongue features based on clinically accessible data, achieving moderate performance. We introduced a method that uses deep learning to optimize CAD diagnosis through tongue images. Compared with coronary angiography, this method is non-invasive, low-cost, and easy to operate, and it exhibits better diagnostic performance than traditional CAD diagnostic algorithms that use risk factors alone. Our
results affirm the practical diagnostic value of the tongue for CAD and demonstrate the feasibility of enhancing CAD diagnostic algorithm performance with tongue images.
For the development of a CAD diagnostic model, the rational use of clinical risk factors is of great importance. Various CAD diagnostic and prediction models, such as the Framingham (33) and the Systematic Coronary Risk Evaluation (SCORE) (34), rely heavily on clinical risk factors. Inspired by this approach, our study first analyzed potential clinical risk factors for CAD. In our dataset, not all known CAD risk factors played a risk role; insomnia, previously considered a risk factor (35), acted as a protective factor. We developed a CAD diagnostic algorithm that uses clinical risk factors as inputs, which exhibited moderate diagnostic performance (ACC = 0.770). Diagnostic models developed in other studies based on clinical risk factors showed similar performance (36)(37)(38). Although the performance of our non-invasive diagnostic model still needs improvement, which may relate to the grouping method, sample size, and model construction approach, our results demonstrate the potential of using tongue images for non-invasive diagnosis of CAD. On this basis, we attempted to incorporate tongue image features as inputs to develop a new CAD diagnostic algorithm. To accurately and objectively explore the value of tongue images for CAD diagnosis, we used the TFDA-1 tongue diagnosis instrument (39) for image capture and established a standardized data collection process, eliminating potential interference from other facial information through tongue body semantic segmentation and image cropping. In customizing the algorithm for diagnosing CAD with tongue images, we employed ResNet-18 as the deep learning framework; ResNet-18 is a deep residual network (39) that has consistently shown good performance in past studies on tongue images (40)(41)(42). In the analysis of tongue image features using traditional feature engineering, medical prior knowledge is often borrowed to define features based on the color, shape, and texture of the tongue, ensuring that the features possess good interpretability and medical significance. Our research nevertheless opted for deep learning instead of traditional feature engineering. Despite the complexity and limited explainability of its decision-making process, deep learning simplifies the feature extraction process compared to traditional tongue image feature extraction engineering
(43). It eliminates the need for extensive manual labeling of image features and enables fast, efficient automatic learning of complex feature representations, uncovering hidden information. The results indicate that, although there was a slight decline in performance on the test set compared to the validation set, the algorithm retains certain classification capability overall. This suggests that tongue images have definite diagnostic value for CAD, making the tongue an effective biological marker for CAD diagnosis.
Ultimately, we incorporated both risk factors and tongue image features as inputs and developed a new CAD diagnostic algorithm using XGBoost. This algorithm demonstrates superior performance compared to those using single-type features as inputs. This part of the work validates the practical effectiveness of tongue images in enhancing the performance of CAD diagnostic algorithms, proving the feasibility of supplementing clinical diagnosis with TCM diagnostic theories and offering a new perspective on integrating traditional medical knowledge with modern technology. Additionally, we examined the algorithm's performance across demographic subgroups. The results indicate better diagnostic capability in elderly populations aged 65 years and above, which may be related to the higher prevalence of CAD in older individuals. The algorithm performs similarly in men and women, indicating commendable generalization across genders. The model exhibits higher accuracy when the number of risk factors is three or more, highlighting the importance of considering multiple risk factors in the diagnosis of CAD. Evaluating the model in different subgroups enhances our understanding of its generalizability, reveals its applicability to patient groups with varying demographic and clinical profiles, offers a reference for tailoring diagnostic methods to demographic characteristics, and helps identify potential biases that might affect accuracy, thereby rendering the model more reliable in clinical settings. Although this study identified the potential value of the tongue in diagnosing CAD, it also has some limitations. First, despite being a multi-center study, only four hospitals from two regions were involved, lacking subjects from different ethnicities and countries. Second, this study focused solely on hypertensive populations, and the overall sample size is relatively small, which may limit the applicability of the CAD diagnostic algorithm to a wider population. Third, while this study employed standardized tongue image collection equipment to minimize interference during image capture, this also restricts the application of the model in different scenarios and with different collection devices. Although this study explores the potential value of tongue diagnosis for CAD, future research needs to further validate
and optimize our diagnostic model in wider and larger populations through prospective studies. Future work could also experiment with different types of cameras, various light sources, and even mobile portable devices for image collection, to expand the model's applicability and enhance its generalization capability. More optimized and interpretable deep learning models may capture finer changes in tongue images more accurately, further strengthening the findings of this study. Additionally, tongue diagnosis is only one component of TCM diagnosis; future work could focus on the integration of multimodal data, fusing other biomarkers with tongue images to build a more comprehensive and integrated CAD diagnostic model.

Conclusion
Exploring an inexpensive, non-invasive diagnostic tool that can be used for early-stage, large-scale screening of CAD is essential. In this study, we analyzed potential risk factors for CAD, extracted potential diagnostic features from tongue images, and developed a new, well-performing CAD diagnostic algorithm based on these findings. Our work introduces a novel perspective, suggesting that tongue images have applicable diagnostic value for CAD. Tongue image features could become new risk indicators for CAD, demonstrating the feasibility of integrating TCM theories with modern technology.

FIGURE 1
FIGURE 1 (A) Front view of the device; (B) Side view of the device; (C) Rear view of the device; (D) Schematic of the image acquisition process; (E) Perspective of the operator.
We used ResNet-18, pretrained on the ImageNet dataset, as the foundation for our deep learning architecture. As a deep residual network, ResNet-18 effectively mitigates the vanishing gradient problem encountered in training deep networks through residual learning, making it widely applicable in image recognition, especially medical image processing, and it has demonstrated excellent performance in various tongue image processing tasks. During training, we froze the first and second layers (layer1 and layer2) of the model, training only the parameters of the third and fourth layers (layer3 and layer4) along with the fully connected layer. This strategy helps prevent overfitting, which might arise from the small dataset, while still enhancing the model's ability to learn representations. Stochastic Gradient Descent (SGD) was employed as the optimizer, with cross-entropy as the loss function. Upon convergence, the output of the penultimate layer of the model forms a 512-dimensional deep feature vector.

FIGURE 2
FIGURE 2 Data preprocessing for tongue images. (A) DeepLabV3+ framework diagram, (B) model training loss function graph, (C) preprocessing effect on a tongue image.

FIGURE 3
FIGURE 3 Tongue image-based CAD diagnostic algorithm. (A) ResNet-18 framework used in this study, (B) 5-fold cross-validation mean ROC on training set, (C) 5-fold cross-validation mean ROC comparison on training set, (D) performance comparison of different feature inputs on the validation set.

FIGURE 4
FIGURE 4 Study flowchart. (A) Workflow for algorithm development; (B) Workflow for algorithm validation.

FIGURE 5
FIGURE 5 Algorithm performance in subgroups of the test set. (A) AUC for the model in individuals under 65 years old; (B) AUC for the model in individuals 65 years and older; (C) AUC for the model in males; (D) AUC for the model in females; (E) AUC for the model in individuals with fewer than 3 risk factors; (F) AUC for the model in individuals with 3 or more risk factors.

TABLE 3
Binary logistic regression results of the main coronary artery disease risk factors.

TABLE 2
Basic information of three groups.

TABLE 4
Performance of 5-fold cross-validation with different feature inputs.

TABLE 5
Performance on the test set with different feature inputs.

TABLE 6
Comparison of 5-fold cross-validation results across different algorithms.

TABLE 7
Performance comparison of different algorithms on the test set.
Duan et al., 10.3389/fcvm.2024.1384977, Frontiers in Cardiovascular Medicine, frontiersin.org.