Decision tree-based diagnosis of coronary artery disease: CART model

https://doi.org/10.1016/j.cmpb.2020.105400

Highlights

  • A systematic study is conducted on coronary artery disease (CAD) classification based on Z-Alizadeh Sani dataset.

  • This study offers an effective visualization tool for CAD classification.

  • CART methodology is implemented for CAD classification, considering vital input features.

  • CART offers the highest diagnostic performance, compared to previous models.

  • Atypical and current smoker features have the maximum and minimum importance in CAD diagnosis.

Abstract

Background and Objective

As the most common cardiovascular defect, coronary artery disease (CAD), also called ischemic heart disease, is one of the substantial causes of death globally. Several diagnosis approaches such as baseline electrocardiography, echocardiography, magnetic resonance imaging, and coronary angiography are suggested for screening the suspected patients that may suffer from CAD. However, applying such methods may have health side effects and/or expensive costs.

Methods

As an alternative to the available diagnosis tools/methods, this research involves a decision tree learning algorithm called classification and regression tree (CART) for a simple and reliable diagnosis of CAD. Several CART models are developed based on a CAD dataset recently published in the literature.

Results

Utilizing all the features of the dataset (55 independent parameters), it was found that only 40 independent parameters influence the CAD diagnosis and, consequently, the development of the predictive model. Based on the feature importance obtained from the first CART model, three new CART models are then developed using 18, 10, and 5 selected features. Except for the five-feature CART model, the developed CART models achieve the maximum attainable accuracy, sensitivity, and specificity for CAD diagnosis (100%) when the predictions are compared with the reported targets. The error analysis reveals that the literature models, including sequential minimal optimization (SMO), bagging SMO, Naïve Bayes (NB), artificial neural network (ANN), C4.5, J48, Bagging, and ANN in conjunction with the genetic algorithm (GA), do not outperform the CART methodology in classifying patients as normal or CAD.

Conclusions

Hence, the robustness of the tree-based algorithm in accurate and fast predictions is confirmed, implying the proposed classification technique can be successfully utilized to develop a coherent decision-making system for the CAD diagnosis.

Introduction

Coronary artery disease (CAD) appears to be the most common cardiovascular defect; heart disease is a leading cause of global deaths. A study showed that CAD is responsible for the death of one-third of women, regardless of the ethnicity or race [1]. According to the World Health Organization (WHO) [2], CAD is the world's biggest killer amongst the top ten death causes including CAD, stroke, chronic obstructive pulmonary disease, lower respiratory infections, Alzheimer disease and other dementias, diabetes mellitus, road injury, diarrhoeal diseases, tuberculosis and trachea, bronchus, and lung cancers. Although heart disease management has drastically changed in recent decades [3,4], individuals with stable CAD are still prone to a significant adverse cardiovascular incident [5], [6], [7]. According to a study [8], more than 6% of the adult population in the United States are suffering from CAD. Furthermore, it has been estimated that the clinical CAD will be the concern of approximately one-third of middle-aged women and half of the middle-aged men across the United States [8].

Atherosclerosis, a condition in which plaque builds up inside the arteries that supply oxygen-rich blood to the heart, characterizes CAD. The plaque formed over the years narrows the coronary artery lumen and, consequently, limits the blood flow through the artery. Chest pain in the form of a pressure sensation or squeezing can be a symptom of CAD. However, several patients with CAD show no symptoms of the disease [9]. To screen patients, guideline recommendations are employed; these recommendations are currently documented by the American Association of Clinical Endocrinologists (AACE), American College of Cardiology/American Heart Association (ACC/AHA), and US Preventive Services Task Force (USPSTF) [10], [11], [12].

The 2004 INTERHEART study [13] defined nine modifiable risk factors that are correlated with CAD. These factors include smoking, hypertension, abdominal obesity, diabetes, stress and depression, regular alcohol consumption, daily consumption of vegetables and fruits, dyslipidemia, and regular physical activity. The majority of the risk factors are similar in women and men. However, compared to men, women are found to have a stronger risk factor profile at younger ages. Men, on the other hand, tend to have better health conditions at older ages [14]. Indeed, at first CAD manifestation, women are approximately ten years older than men [15]. However, smoking, diabetes and/or premature menopause throw this advantage away [14]. Race is known to be another risk factor for CAD. For example, some studies revealed that the CAD rate among Asian Indians is higher than that of other ethnic groups [14,16]. Family history is also associated with the risk of CAD. Based on a research investigation [17], a family history of CAD in a sister is associated with a 12-fold higher risk, versus 3-fold for a parent and 6-fold for a brother.

For patients with known or suspected CAD, conventional invasive coronary angiography is the gold standard for diagnosis [18]. However, this approach is time-consuming, invasive, and expensive. Its invasive nature may cause a degree of discomfort for some patients, since this method usually requires a short hospital stay [19]. Moreover, this modality has a small but considerable complication rate [20]. Electron-beam computed tomography (EBCT) has paved the way for morphological evaluation of cardiac structures, owing to its high temporal resolution and the use of prospective electrocardiographic triggering. However, due to the inferior spatial resolution of the EBCT approach, it was not considered a proper strategy for identifying the presence of coronary stenosis [19]. The introduction of computed tomography (CT) angiography led to substantial improvements in the detection of CAD as well as the assessment of the heart function in different conditions [21], [22], [23].

Several prevention strategies exist for CAD. These approaches can be divided into two categories, namely primary prevention and secondary prevention. Primary prevention can be defined as the treatment or modification of risk factors that is proven to avert the first or initial coronary event [24]. A common example of this prevention category is using lipid-lowering agents to avoid the occurrence of the first myocardial infarction [25]. On the other hand, the treatment modalities that are initiated after the first event for the prevention of subsequent outcomes are known as secondary prevention strategies. For example, the use of beta-blockers after a myocardial infarction to reduce new events belongs to this category [26].

Studies showed that physical activity reduces the CAD risk. The role of physical activity in both primary and secondary prevention is studied in the literature. For example, in the case of primary prevention, the protective function of working out was reported in the United States for a large cohort of longshoremen [24]. For secondary prevention, the Clinical Practice Guidelines for Cardiac Rehabilitation [27] indicated that the extent of improvement depends on different factors, including the duration, intensity, and frequency of activity, and the training time interval. The concerns regarding regular aerobic exercise are addressed in some research investigations. It was revealed that regular moderate-intensity exercise decreases the cardiac mortality risk, and even with vigorous exercise, the sudden death incidence is low [28,29]. However, the risk of myocardial infarction increases with high-intensity, vigorous exercise [24].

In addition to the traditional and routine screening approaches to detect CAD, some predictive models have been developed based on different machine learning (ML) and data mining methodologies. This study is intended to employ the decision tree learning algorithm, particularly the classification and regression tree (CART), for the diagnosis of CAD. To the best of our knowledge, this is the first work on the application of CART to study the Z-Alizadeh Sani CAD dataset for diagnosis/classification purposes. Furthermore, we compare the outcomes of our new models with the results of the previously used models, developed based on various methodologies such as ANN and support vector machine (SVM). In this work, we develop several CART models based on different inputs; the inputs are not selected randomly. Indeed, we introduce a new CART model by employing all the inputs existing in the Z-Alizadeh Sani CAD dataset. Based on the Gini index, we then define the relative importance of each independent parameter for the development of the CART model. Finally, we obtain two more CART-based models using the independent parameters identified as important. Further highlighting the novelty of this research, the contributions of the work are as follows: a) Utilization of the CART strategy for CAD diagnosis using the Z-Alizadeh Sani CAD dataset for the first time, b) Selection of vital input parameters and determination of the relative importance of inputs, c) Design of the optimal CART model in terms of structure/topology and parameter values, d) Proper data mining for training and testing phases, e) Systematic statistical analysis for performance evaluation of various classification tools, and f) Higher accuracy and simpler (and more understandable) outcomes of the CART approach, compared to the previous techniques applied to the dataset.

In fact, the CART results would help physicians and health scientists to better understand the relationships between different parameters and CAD.

According to the literature, several connectionist and predictive/deterministic methods are currently being used in various science, health, and engineering disciplines for different purposes, particularly when a large amount of data is available. In all cases, the selection and application of proper techniques, data mining/management, choosing vital input data, tuning the parameters of deterministic or classification models based on the selected methods, finding the relative importance of input parameters, results and statistical analysis, and making proper decisions are among the novelties of research works.

After the introduction section, a review of the published works in the literature on the classification of CAD using various ML and data mining approaches is provided. Section 3 presents an overview of the CART methodology for the classification task. Modeling procedure for the application of interest is addressed in detail in Section 4. Section 5 includes the findings as well as a discussion about the modeling results. Finally, the main conclusions are highlighted

Section snippets

Overview of previous relevant studies

In 2017, Xu et al. [30] employed the multivariate logistic regression to detect the correlation between CAD and defined risk factors such as smoking status, angina, age, sex, hypertension, diabetes, serum creatinine, and dyslipidemia. The developed model was then used to differentiate non-CAD from CAD in the test sample. The data required for modeling purpose were gathered by studying 8297 patients, ranging between 19 and 90 years old, in the north and south of China between 2008 and 2014.

CART methodology

Among the available learning algorithms for decision trees such as Iterative Dichotomiser 3 (ID3) [62], C4.5 [63], successor of the ID3 learning algorithm, fuzzy ID3 [64] and CART [65], the CART strategy is known to be one of the most successful techniques [65,66] that can be utilized for both classification tasks and regression analysis [67,68]. CART is a nonparametric ML method. This feature enables the CART method to freely learn any form of the mapping function from the employed training
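As a minimal sketch of the split criterion that CART commonly relies on for classification, the Python snippet below computes the Gini impurity of a node and searches one numeric feature for the threshold that minimizes the weighted impurity of the two child nodes. The function names and the toy ages/classes are hypothetical illustrations, not values from the Z-Alizadeh Sani dataset, and the snippet is not the authors' implementation.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best binary split x <= t.

    Candidate thresholds are midpoints between consecutive distinct values.
    If no split improves on the parent impurity, threshold is None.
    """
    pairs = list(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        t = (low + high) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data (assumption for illustration only).
ages = [40, 45, 50, 60, 65, 70]
classes = ["normal", "normal", "normal", "CAD", "CAD", "CAD"]
threshold, impurity = best_split(ages, classes)
print(threshold, impurity)
```

A full tree repeats this search over every feature at every node and recurses on the two resulting subsets until a stopping rule is met.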

CAD dataset

To develop a tree-based classifier for CAD diagnosis, the dataset reported by Alizadeh et al. [51], known as the Z-Alizadeh Sani dataset, is employed. The collected databank comprises the information of 303 patients. This databank has 55 independent parameters and classifies a person into a normal or CAD class. The criterion for classifying a person as a patient who has CAD is her/his diameter narrowing status. If the diameter narrowing is lower than 50%, the patient is classified as normal,

Results and discussions

This section includes the main findings/results of this study and corresponding discussions on the relative performance of the input parameters and model performance (compared to the previous approaches). Table 2 shows the contingency table, also known as the confusion matrix, for the Z-Alizadeh Sani databank. According to Table 2, the accuracy considers the same costs for misclassified samples. Hence, in addition to the accuracy, TNV and TPR can be employed to perform a comprehensive
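For readers who want the assessment criteria spelled out, the sketch below derives accuracy (ACC), sensitivity (TPR), specificity (TNR), and positive predictive value (PPV) from the four cells of a binary confusion matrix. The counts are hypothetical and are not taken from Table 2; the function name is an assumption made for this illustration.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute common diagnostic metrics from confusion-matrix counts.

    tp / fn: CAD patients predicted as CAD / as normal.
    tn / fp: normal subjects predicted as normal / as CAD.
    Assumes each denominator is non-zero.
    """
    total = tp + fp + tn + fn
    return {
        "ACC": (tp + tn) / total,   # overall accuracy
        "TPR": tp / (tp + fn),      # sensitivity (true positive rate)
        "TNR": tn / (tn + fp),      # specificity (true negative rate)
        "PPV": tp / (tp + fp),      # positive predictive value
    }


# Hypothetical counts (assumption for illustration only).
metrics = diagnostic_metrics(tp=90, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because accuracy weights both kinds of misclassification equally, reporting the class-specific rates alongside it gives a fuller picture when the two classes are imbalanced.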

Conclusions

Several tree-based classifiers are developed on the basis of the CART algorithm, where a recently collected clinical dataset, namely the Z-Alizadeh Sani dataset, is utilized for CAD diagnosis. To ensure that the developed models based on the CART algorithm are reliable, both testing strategies, 10-fold cross validation and a separate test sub-dataset, are included in the validation process.
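The 10-fold cross validation mentioned above can be sketched as follows: the sample indices are shuffled once, dealt into ten folds, and each fold serves as the test set exactly once while the remaining nine form the training set. This is a plain (non-stratified) variant with an arbitrary seed, chosen here as an assumption; the snippet does not state which variant the authors used. Only the sample count of 303 comes from the dataset description.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 303 is the number of patients reported for the dataset.
splits = list(k_fold_indices(303, k=10))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(fold_number, len(train), len(test))
```

A model would be fitted on each training split and scored on the matching test split, with the per-fold scores averaged to estimate generalization performance.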

Employing ACC%, TPR%, and PPV% as the model assessment criteria, the classification performance of the

Declaration of Competing Interest

The authors declare no conflict of interest.

Acknowledgements

We greatly appreciate the funding from the Natural Sciences and Engineering Research Council (NSERC) of Canada and Memorial University to support this research work.

References (82)

  • J.H. Tan et al.

    Application of stacked convolutional and long short-term memory network for accurate identification of CAD ECG signals

    Comput. Biol. Med.

    (2018)
  • U.R. Acharya et al.

    Automated detection of coronary artery disease using different durations of ECG segments with convolutional neural network

    Knowl. Based Syst.

    (2017)
  • D. Giri et al.

    Automated diagnosis of coronary artery disease affected patients using LDA, PCA, ICA and discrete wavelet transform

    Knowl. Based Syst.

    (2013)
  • U.R. Acharya et al.

    Automated characterization of coronary artery disease, myocardial infarction, and congestive heart failure using contourlet and shearlet transforms of electrocardiogram signal

    Knowl. Based Syst.

    (2017)
  • M. Kumar et al.

    Characterization of coronary artery disease using flexible analytic wavelet transform applied on ECG signals

    Biomed. Signal Process Contr.

    (2017)
  • U.R. Acharya et al.

    Application of higher-order spectra for the characterization of coronary artery disease using electrocardiogram signals

    Biomed. Signal Process Contr.

    (2017)
  • U.R. Acharya et al.

    Linear and nonlinear analysis of normal and CAD-affected heart rate signals

    Comput. Methods Programs Biomed.

    (2014)
  • T. Nguyen et al.

    Medical data classification using interval type-2 fuzzy logic system and wavelets

    Appl. Soft. Comput.

    (2015)
  • K. Uyar et al.

    Diagnosis of heart disease using genetic algorithm based trained recurrent fuzzy neural networks

    Procedia. Comput. Sci.

    (2017)
  • O.W. Samuel et al.

    An integrated decision support system based on ANN and Fuzzy_AHP for heart failure risk prediction

    Expert Syst. Appl.

    (2017)
  • R. Alizadehsani et al.

    A data mining approach for diagnosis of coronary artery disease

    Comput. Methods Programs Biomed.

    (2013)
  • R. Alizadehsani et al.

    Coronary artery disease detection using computational intelligence methods

    Knowl. Based Syst.

    (2016)
  • Z. Arabasadi et al.

    Computer aided decision making for heart disease detection using hybrid neural network-Genetic algorithm

    Comput. Methods Programs Biomed.

    (2017)
  • R. Alizadehsani et al.

    Non-invasive detection of coronary artery disease in high-risk patients based on the stenosis prediction of separate coronary arteries

    Comput. Methods Programs Biomed.

    (2018)
  • U.R. Acharya et al.

    Automated characterization and classification of coronary artery disease and myocardial infarction by decomposition of ECG signals: a comparative study

    Inf. Sci. (Ny)

    (2017)
  • R. Alizadehsani et al.

    Machine learning-based coronary artery disease diagnosis: a comprehensive review

    Comput. Biol. Med.

    (2019)
  • M.M. Ghiasi et al.

    Application of decision tree learning in modelling CO2 equilibrium absorption in ionic liquids

    J Mol Liq

    (2017)
  • M.M. Ghiasi et al.

    Decision tree-based methodology to select a proper approach for wart treatment

    Comput. Biol. Med.

    (2019)
  • M. Abdar et al.

    A new machine learning technique for an accurate diagnosis of coronary artery disease

    Comput. Methods Programs Biomed.

    (2019)
  • T. Thom et al.

    Heart disease and stroke statistics––2006 update

    Circulation

    (2006)
  • Global Health Estimates 2016: Deaths by Cause, Age, Sex, By Country and By Region, 2000-2016

    (2018)
  • S. Sidney et al.

    Recent trends in cardiovascular mortality in the United States and public health goals

    JAMA Cardiol.

    (2016)
  • G.A. Roth et al.

    Global and regional patterns in cardiovascular mortality from 1990 to 2013

    Circulation

    (2015)
  • Task Force Members et al.

    ESC guidelines on the management of stable coronary artery disease: The Task Force on the management of stable coronary artery disease of the European Society of Cardiology

    Eur. Heart J.

    (2013)
  • T.M. Maddox et al.

    Nonobstructive coronary artery disease and risk of myocardial infarction

    JAMA

    (2014)
  • D. Mozaffarian et al.

    Heart disease and stroke statistics—2016 update

    Circulation

    (2016)
  • R.H. Eckel et al.

    AHA/ACC guideline on lifestyle management to reduce cardiovascular risk

    Circulation

    (2013)
  • US Preventive Services Task Force

    Statin use for the primary prevention of cardiovascular disease in adults: US Preventive Services Task Force recommendation statement

    JAMA

    (2016)
  • N. Forouhi et al.

    How far can risk factors account for excess coronary mortality in South Asians

    Can. J. Cardiol.

    (1997)
  • R. Beverly et al.

    Family history and risk of myocardial infarction in young women

    Circulation

    (1996)
  • Z. Sun

    Coronary computed tomography angiography in coronary artery disease: a systematic review of image quality, diagnostic accuracy and radiation dose

    (2013)