Decision trees and multi-level ensemble classifiers for neurological diagnostics

Cardiac autonomic neuropathy (CAN) is a well-known complication of diabetes that leads to impaired regulation of blood pressure and heart rate and increases the risk of cardiac-associated mortality in diabetes patients. The neurological diagnostics of CAN progression is an important problem that is being actively investigated. This paper uses data collected as part of a large and unique Diabetes Screening Complications Research Initiative (DiScRi) in Australia, with data from numerous tests related to diabetes, to classify CAN progression. The paper presents recent experimental investigations of the effectiveness of decision trees, ensemble classifiers and multi-level ensemble classifiers for the neurological diagnostics of CAN. We present the results of experiments comparing the effectiveness of the ADTree, J48, NBTree, RandomTree, REPTree and SimpleCart decision tree classifiers. Our results show that SimpleCart was the most effective for the DiScRi data set in classifying CAN. We also investigated and compared the effectiveness of AdaBoost, Bagging, MultiBoost, Stacking, Decorate, Dagging and Grading, based on Ripple Down Rules, as examples of ensemble classifiers. Further, we investigated the effectiveness of these ensemble methods as a function of the base classifiers, and determined that Random Forest performed best as a base classifier, while AdaBoost, Bagging and Decorate achieved the best outcomes as meta-classifiers in this setting. Finally, we investigated the ability of the best-performing meta-classifiers to enhance performance further within the framework of a multi-level classification paradigm. Experimental results show that the multi-level paradigm performed best when Bagging and Decorate were combined in the construction of a multi-level ensemble classifier.


Introduction
Neurological disorders often span multiple chronic disease entities, such as diabetes, kidney disease and cardiovascular disease, and present an area of medical practice where data mining can assist clinical decision making. Decision making and diagnosis in medical practice are most often based on incomplete data, due to unavailability of diagnostic laboratory services, technical issues, lack of patient cooperation, or counter-indications for undertaking certain diagnostic tests. Utilizing data mining methods, powerful decision rules can be determined which enhance diagnostic accuracy when an incomplete patient profile is available or multiclass presentations are possible. In order to reduce the cost of performing the medical tests required to collect the attributes, yet maintain diagnostic accuracy, it is essential to optimize the features used for classification and to keep the number of features as small as possible.

Diabetes Mellitus Type II and Cardiac Autonomic Neuropathy
Diabetes mellitus is a major world-wide health issue. Cardiovascular complications associated with diabetes account for 65% of all diabetic deaths. The large impact of cardiovascular disease on people with diabetes has led the National Diabetes Strategy and the ACCORD Study Group to recommend that people with diabetes be regularly screened for the presence of comorbidities, including autonomic nervous system dysfunction, with the aim of decreasing the incidence of cardiovascular-related mortality [1][2][3]. The increased risk of cardiac mortality due to arrhythmias makes screening of people with diabetes for autonomic neuropathy vital, so that early detection, intervention and monitoring can occur [4].
People with diabetes and autonomic neuropathy have increased mortality rates (29%) compared to people with diabetes without autonomic neuropathy (6%) [5,6]. As many as 22% of people with type 2 diabetes suffer from cardiac autonomic neuropathy (CAN), which leads to impaired regulation of blood pressure, heart rate and heart rate variability (HRV). Silent ischemia is significantly more frequent in patients with CAN than in those without CAN [7]. Significantly more people with diabetes die from cardiovascular disease such as heart attack and stroke, which can be attributed to CAN [8]. Early subclinical detection of CAN and intervention are of prime importance for risk stratification in preventing the potentially serious consequences of CAN.

The Ewing Battery
Autonomic neuropathy in diabetics is traditionally detected by performing the Ewing battery of tests, which was recommended by the American Diabetes Association and the American Academy of Neurology and evaluates heart rate and blood pressure changes evoked by stimulation of cardiovascular reflexes [9][10][11]. The five tests/categories in the Ewing battery are shown in Table 1 below, following [10]. Several studies have shown that abnormalities in reflex testing give a good assessment of advanced diabetic autonomic neuropathy and aid in its objective diagnosis, rather than relying on clinical signs such as gustatory sweating, reflux and incontinence as self-reported by individuals. The response of a subject to each of the Ewing tests is graded as normal, borderline or abnormal. From this grading, CAN risk is assessed as either normal (no CAN evident) or as one of four CAN categories: early, definite, severe and atypical (Table 2).

Electrocardiogram Characteristics and CAN
An electrocardiogram (ECG) is a recording of the electrical activity of the heart using surface electrodes [12,13]. The most commonly used configuration is the 12-lead ECG, which consists of a number of specific characteristics defined as waves or intervals. These include the P, QRS, T and U waves and the QT, QTd and PQ intervals. The QRS complex represents the depolarization of the ventricles of the heart. The duration of the QRS complex is also often used in diagnostics. The time from the beginning of the P wave until the start of the next QRS complex is called the PQ interval and represents electrical activity in the atria of the heart. The interval from the Q wave to the end of the T wave is the QT interval, depicting the re-polarization of the ventricles, which, when corrected for heart rate, becomes the QTc. The difference between the maximum QT interval and the minimum QT interval over all 12 leads is known as the QT dispersion (QTd). The electrical axis of the heart is determined from the QRS wave and can indicate cardiac myopathy. ECG features have also been shown to indicate CAN [14,15]. Sympathetic nervous system activity has been shown to be associated with changes in QT interval length and is a predictor of ventricular arrhythmia [14,16]. Our own work has also identified ECG components associated with CAN [17].
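The heart-rate correction and dispersion quantities mentioned above are simple to compute; a common choice for the correction is Bazett's formula, QTc = QT / sqrt(RR). The sketch below illustrates both (the function names are ours, for illustration only; the paper does not state which correction formula was used on the DiScRi data).

```python
import math

def qtc_bazett(qt_ms, rr_s):
    """Heart-rate-corrected QT interval via Bazett's formula.

    qt_ms: measured QT interval in milliseconds.
    rr_s:  preceding RR interval in seconds (1.0 s at 60 bpm).
    """
    return qt_ms / math.sqrt(rr_s)

def qt_dispersion(qt_by_lead_ms):
    """QTd: maximum minus minimum QT interval over all 12 leads."""
    return max(qt_by_lead_ms) - min(qt_by_lead_ms)

# At 60 bpm (RR = 1 s) the corrected QT equals the measured QT.
print(qtc_bazett(400.0, 1.0))                # 400.0
print(qt_dispersion([380, 400, 410, 395]))   # 30
```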

Methods and Methodology
This section contains brief background material and describes the methodology used in this work.

Diabetes Screening Complications Research Initiative
The Diabetes Screening Complications Research Initiative (DiScRi) is a research initiative in Australia that has made it possible to collect a large dataset consisting of over 2500 entries and several hundred features. Therefore, a priority of any machine learning classification is to reduce the data to a manageable set. A hybrid of the Maximum Relevance (MR) filter and the Artificial Neural Net Input Gain Measurement Approximation (ANNIGMA) wrapper approach was used to reduce the number of features necessary for optimal classification. The combined heuristic, MR-ANNIGMA, exploits the complementary advantages of both the filter and wrapper heuristics to find significant features [17].
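To give a flavour of the filter stage of such a hybrid, the sketch below ranks features by the absolute correlation of each feature column with the class label and keeps the top k. This is a generic relevance filter, not the actual MR-ANNIGMA heuristic of [17], and all names and data are illustrative.

```python
def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)

def rank_features_by_relevance(columns, labels, top_k):
    """Filter stage only: keep the top_k features whose values are most
    correlated (in absolute value) with the class label."""
    ranked = sorted(columns,
                    key=lambda name: -abs(pearson_r(columns[name], labels)))
    return ranked[:top_k]

features = {
    "heart_rate": [60, 70, 80, 90],   # tracks the label closely
    "age":        [50, 50, 51, 50],   # nearly constant, weakly related
}
label = [0, 0, 1, 1]
print(rank_features_by_relevance(features, label, top_k=1))  # ['heart_rate']
```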

Decision Trees
We investigated six efficient decision tree classifiers, namely ADTree, J48, NBTree, RandomTree, REPTree and SimpleCart [18]. Of these, ADTree generates smaller rules compared to the other decision trees, and its results are therefore easier to interpret [20]. J48 is based on the C4.5 algorithm and uses information entropy to build decision trees from the set of training data. Each node of the tree represents the most effective split of the samples, determined by the highest normalized information gain [21]. NBTree is a Naïve-Bayes/decision tree hybrid that places Naïve-Bayes classifiers at the leaves of the decision tree [22]. RandomTree employs a simple pre-pruning step that stops at a fixed depth, with randomly chosen attributes considered at each node [18]. REPTree considers all attributes to build a decision tree based on information gain [18]. Finally, SimpleCart (based on CART, classification and regression trees) creates binary splits and applies minimal cost-complexity pruning. This procedure is continued on each subgroup until some minimum subgroup size is reached [23].
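The entropy/information-gain criterion used by the C4.5-style splits described above can be sketched in a few lines (a minimal illustration of the criterion, not WEKA's implementation; the toy data are ours):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting a numeric attribute at `threshold`."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder

def best_split(values, labels):
    """Choose the candidate threshold with the highest information gain."""
    return max(set(values), key=lambda t: information_gain(values, labels, t))

heart_rates = [55, 60, 85, 90]
classes = ["no CAN", "no CAN", "definite", "definite"]
print(best_split(heart_rates, classes))  # 60 separates the two classes perfectly
```

A full tree builder simply applies `best_split` recursively to each resulting subgroup, which is exactly the procedure described for SimpleCart above.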

Ensemble Classifiers
Ensemble methods have been extensively used in data mining and artificial intelligence [24][25][26].
Here we describe the following methods implemented in WEKA: AdaBoost, Bagging, Dagging, Decorate, Grading, MultiBoost and Stacking.
AdaBoost (adaptive boosting) trains several classifiers in succession. Each classifier is trained on the instances that have turned out more difficult for the preceding classifier. To this end, all instances are assigned weights, and if an instance turns out difficult to classify, its weight increases [28]. Bagging (bootstrap aggregating) generates a collection of new training sets by resampling the given training set at random with replacement. New classifiers are then trained, one for each of these new training sets, and are amalgamated via a majority vote [27]. MultiBoosting extends the approach of AdaBoost with the wagging technique, a variant of bagging in which the training weights generated during boosting are utilized in the selection of the bootstrap samples [29]. Stacking is a generalization of voting, where a meta-learner aggregates the outputs of several base classifiers [30]. Decorate builds diverse ensembles of classifiers by constructing special artificial training examples [31]. Dagging is useful in situations where the base classifiers are slow. It divides the training set into a collection of disjoint (and therefore smaller) stratified samples, trains copies of the same base classifier on them, and averages their outputs using a vote [32]. Grading is a meta-classifier which grades the outputs of the base classifiers as correct or wrong labels; these graded outcomes are then combined [33].
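To make the AdaBoost re-weighting mechanism concrete, here is a minimal from-scratch sketch on a single numeric feature with threshold "stumps" as weak learners (plain Python for illustration, not WEKA's implementation; the toy data and names are ours):

```python
import math

def stump(threshold, sign):
    """Weak learner: predict `sign` above the threshold and -sign below it."""
    return lambda x: sign if x > threshold else -sign

def weighted_error(h, w, xs, ys):
    """Sum of the weights of the instances that h misclassifies."""
    return sum(wi for wi, x, y in zip(w, xs, ys) if h(x) != y)

def fit_adaboost(xs, ys, rounds=5):
    """AdaBoost with +1/-1 labels: after each round, the weights of the
    misclassified ('difficult') instances are increased so the next weak
    learner concentrates on them."""
    n = len(xs)
    w = [1.0 / n] * n                                  # uniform initial weights
    candidates = [stump(t, s) for t in xs for s in (1, -1)]
    ensemble = []                                      # (alpha, stump) pairs
    for _ in range(rounds):
        h = min(candidates, key=lambda c: weighted_error(c, w, xs, ys))
        err = weighted_error(h, w, xs, ys)
        if err == 0:                                   # perfect weak learner
            ensemble.append((1.0, h))
            break
        if err >= 0.5:                                 # no better than chance
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # Re-weight: misclassified instances (y*h(x) = -1) gain weight.
        w = [wi * math.exp(-alpha * y * h(x)) for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) > 0 else -1

heart_rates = [1.0, 1.5, 3.5, 4.0]     # toy 1-D feature
labels = [-1, -1, 1, 1]                # -1 = "no CAN", +1 = "definite CAN"
model = fit_adaboost(heart_rates, labels)
print([model(x) for x in heart_rates])  # [-1, -1, 1, 1]
```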

Simulation Details
All experiments presented in this paper used the WEKA software environment described in the monograph [18] and article [19]. WEKA includes the ADTree, J48, NBTree, RandomTree, REPTree, SimpleCart, Decision Table, FURIA, Random Forest and SMO classifiers, as well as the ensemble meta-classifiers AdaBoost, Bagging, Dagging, Decorate, Grading, MultiBoost and Stacking. We used WEKA Explorer to run each of these classifiers and meta-classifiers. The monograph [18] provides excellent explanations of how to run each of them in WEKA Explorer, and PDF files of the WEKA manual and tutorial are available with every installation.
To prevent overfitting, we used 10-fold cross validation in assessing the performance of the classification schemes for all our experiments. This standard method is available in WEKA Explorer and calculates outcomes for any number of classes, among other performance metrics. The standard output produced by WEKA Explorer contains the ROC area (as well as several other measures) obtained using 10-fold cross validation; refer to [18] for a detailed explanation of 10-fold cross validation and of how to use these classifiers in WEKA Explorer.
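The 10-fold protocol itself is easy to state in code; the generic stdlib sketch below (not WEKA's implementation; the toy majority-class model is ours) splits the indices into 10 folds, holds out each fold once, and averages the per-fold score.

```python
from collections import Counter

def k_fold_indices(n, k=10):
    """Partition indices 0..n-1 into k near-equal held-out folds."""
    folds = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(i)
    return folds

def cross_validate(data, fit, score, k=10):
    """Average held-out score over k train/test splits of `data`."""
    scores = []
    for fold in k_fold_indices(len(data), k):
        held_out = set(fold)
        test = [data[i] for i in fold]
        train = [data[i] for i in range(len(data)) if i not in held_out]
        model = fit(train)                 # train on the other k-1 folds
        scores.append(score(model, test))  # evaluate on the held-out fold
    return sum(scores) / k

# Toy model: always predict the training majority class.
def fit_majority(train):
    return Counter(label for _, label in train).most_common(1)[0][0]

def accuracy(majority_label, test):
    return sum(1 for _, label in test if label == majority_label) / len(test)

data = [(i, "no CAN") for i in range(70)] + [(i, "definite") for i in range(30)]
print(cross_validate(data, fit_majority, accuracy))  # 0.7 on every fold
```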
To prepare data for WEKA Explorer, all instances of data were collected in one CSV file and pre-processed. Pre-processing included the reduction of the number of missing values. To this end, more than 50 expert editing rules were collected and applied. Most of these rules rely on the fact that several medical parameters usually change only gradually with time, so that their values behave as a monotonic mathematical function. Therefore, for the purposes of data mining, it is safe to assume that a missing value of an attribute is approximately equal to the average of the preceding and following values of the same attribute. For other features, it is known that some clinical values indicating pathology very seldom improve. For example, if a person has been diagnosed with diabetes, then this diagnosis can be recorded in all subsequent instances of data for the same patient. Finally, some of the expert editing rules checked data for consistency and deduced missing values of certain attributes from other closely related features. For example, the "Diagnostic DM (years)" feature in DiScRi refers to the number of years since the patient was diagnosed with diabetes. If this number is greater than zero in an instance, then the value of the related "Diabetic Status" feature must be set to "yes". These editing rules were collected in consultation with the experts managing the database, and a Python script was written by the third author to automate their application. Pre-processing of the data utilizing the expert editing rules reduced the data to 1299 rows with complete values and 200 features in the CSV file.
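The three kinds of editing rules described above can be sketched as follows. The field names follow the text; the helper functions are our own minimal illustrations, not the actual DiScRi script.

```python
def interpolate_gradual(values):
    """Rule type 1: a missing value of a gradually changing attribute is
    replaced by the average of the preceding and following values."""
    out = list(values)
    for i, v in enumerate(out):
        if v is None and 0 < i < len(out) - 1 \
                and out[i - 1] is not None and out[i + 1] is not None:
            out[i] = (out[i - 1] + out[i + 1]) / 2
    return out

def carry_forward_diagnosis(statuses):
    """Rule type 2: a diagnosis such as diabetes is never revoked, so it is
    carried into all subsequent records of the same patient."""
    out, diagnosed = [], False
    for s in statuses:
        diagnosed = diagnosed or s == "yes"
        out.append("yes" if diagnosed else s)
    return out

def enforce_consistency(record):
    """Rule type 3: if "Diagnostic DM (years)" > 0 then "Diabetic Status"
    must be "yes"."""
    if record.get("Diagnostic DM (years)", 0) > 0:
        record["Diabetic Status"] = "yes"
    return record

print(interpolate_gradual([5.0, None, 6.0]))         # [5.0, 5.5, 6.0]
print(carry_forward_diagnosis(["no", "yes", None]))  # ['no', 'yes', 'yes']
print(enforce_consistency({"Diagnostic DM (years)": 3}))
```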
We created three copies of this file to address the progression of cardiac autonomic neuropathy (CAN), indicated in the DiScRi database as "no CAN", "early CAN" and "definite CAN". In the first copy, a two-class paradigm was investigated, with the last column containing one of the two class values "definite CAN" and "no CAN". In the second copy, the last column contained one of the three classes "no CAN", "early CAN" or "definite CAN". In the third copy, the last column contained one of the four classes "no CAN", "early CAN", "definite CAN" or "severe CAN". To enable all classifiers available in WEKA Explorer to process these three files, the files were reformatted into ARFF format, the standard format used by all classifiers in WEKA. These three files were used in all experiments presented in the paper.

Decision Trees for Cardiac Autonomic Neuropathy
In medical applications it is important to consider classifiers that produce models which can be expressed in a clear form and so facilitate application in clinical practice. Various versions of decision trees therefore deserve special attention, since they satisfy this requirement.
Figure 1 presents the results of our experiments comparing the performance of the decision tree classifiers available in WEKA for the neurological diagnosis of CAN progression. We refer to Section 3.4 for complete simulation details and WEKA Explorer results.

Figure 1. ROC of decision trees for the neurological diagnostics of CAN progression.
The best result was obtained using SimpleCart, with an area under the ROC curve (AUC) of 0.947 for the classification of two CAN classes. SimpleCart also achieved the best result for the four-class paradigm (normal, early, definite and severe), with an AUC of 0.936.

Other Base Classifiers for the Cardiac Autonomic Neuropathy
Further, we investigated several other base classifiers: Decision Table, FURIA, J48, NBTree, Random Forest and SMO. Random Forest constructs a multitude of decision trees during training and outputs the mode of the classes output by the individual trees [27]. Recall that FURIA is a fuzzy unordered rule induction algorithm [35]. Note that Random Forest is hard-wired to RandomTree and cannot accept another base classifier as an input parameter, unlike the ensemble methods considered above, which are all meta-classifiers accepting any base classifier as a parameter. This is why it is appropriate to regard Random Forest as a base classifier in our experiments.

Ensemble Classifiers for the Cardiac Autonomic Neuropathy
Since Random Forest performed best, we investigated further ways of enhancing its performance using meta-classifiers. The ensemble methods listed in Section 3 were used with Random Forest as their base classifier, and the resulting combined ensembles were created in WEKA Explorer (Figure 3). The outcomes show that the overall best performance was obtained when combining Random Forest with Decorate, for the 2-, 3- and 4-class problems.

Multi-level Ensemble Classifiers for Cardiac Autonomic Neuropathy
A different method of enhancing base classifiers is to include them in a multi-level scheme generated by several ensemble classifiers combined on two levels. In this scheme, a second ensemble meta-classifier is used as the base classifier for the first meta-classifier in WEKA Explorer, and a base classifier is then connected to the second ensemble meta-classifier. This creates an ensemble with three levels. We investigated all possible pairs of different meta-classifiers among the best ensemble methods, producing three-level ensemble classifiers based on Random Forest. In this way we applied the best meta-classifiers within the multi-level classification paradigm. The best outcome was obtained by two options combining Bagging and Decorate into one multi-level ensemble classifier. The first option used Bagging in the second level after applying Decorate to Random Forest in the first level. The other optimal result used Decorate in the second level to combine the results of Bagging applied to Random Forest as the base classifier (Figure 4).
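Structurally, the multi-level scheme is just composition of ensemble wrappers. The stdlib sketch below illustrates the idea with a generic bagging-style wrapper at both levels (Decorate's artificial-example construction is not reproduced, and all names and data are ours): because the wrapper takes a fit() function and returns a fit() function, wrappers nest the same way a WEKA meta-classifier accepts another meta-classifier as its base.

```python
import random
from collections import Counter

def fit_1nn(sample):
    """Base classifier: 1-nearest-neighbour on a single numeric feature."""
    def predict(x):
        return min(sample, key=lambda p: abs(p[0] - x))[1]
    return predict

def bagging(fit, n=7, seed=0):
    """Ensemble wrapper: takes any fit() function and returns a new fit()
    function with the same signature, so wrappers compose into multi-level
    ensembles."""
    def fit_ensemble(train):
        rng = random.Random(seed)
        # One bootstrap resample (with replacement) per member classifier.
        models = [fit([rng.choice(train) for _ in train]) for _ in range(n)]
        # Aggregate member predictions by majority vote.
        return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]
    return fit_ensemble

train = [(1.0, "no CAN"), (1.3, "no CAN"), (3.9, "definite"), (4.2, "definite")]
two_level_fit = bagging(bagging(fit_1nn))   # level 2 wraps level 1
model = two_level_fit(train)                # an ensemble of ensembles
print(model(1.1))
```

Swapping either `bagging` call for a different wrapper with the same signature gives the other orderings investigated above.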

Discussion
Among the standard decision tree base classifiers considered in this paper, the best result was obtained using SimpleCart, with an area under the ROC curve (AUC) of 0.947 for the classification of two CAN classes and an AUC of 0.936 for the four-class classification. The best base classifier was Random Forest, regardless of the number of CAN classes.
Looking at ensemble classifiers based on decision trees, the best performance was obtained by combining Random Forest with Decorate as the ensemble method, with a ROC area of 0.984.
Comparing ensemble classifiers based on RDR, we see that bagging and boosting outperformed the other ensemble methods. Dagging produced worse results because it usually benefits base classifiers of high computational complexity, and in the current case RDR is fast enough. Stacking and Grading use a meta-classifier to combine the outcomes of several base classifiers, and since in the current experiments we considered only Ripple Down Rules as base classifiers, Stacking performed worse compared to bagging and boosting. The good performance of AdaBoost and Bagging indicates that the diversity of the ensemble classifiers used in the two levels is crucial for the success of the combined multi-level ensemble classifier.
Further experiments have shown that Random Forest also performed best when combined with AdaBoost, Bagging, Decorate and MultiBoost for 2, 3, and 4 classes of the neurological diagnostics of CAN progression, and within the framework of the multi-level paradigm.The experiments show that the multi-level scheme performed best when Bagging and Decorate were combined in the construction of a multi-level ensemble classifier based on Random Forest.

Conclusion
The results of experiments investigating the application of data mining methods to a large diabetes screening dataset, with emphasis on the classification of cardiac autonomic neuropathy, indicate that Random Forest is the best classifier to apply, either on its own or in combination with ensemble classifiers and multi-level applications.
The experimental results presented in Figures 1 through 4 determine the best options that may be recommended for the neurological diagnostics of CAN progression with regard to using decision tree classifiers, meta-classifiers based on RDRs, and multi-level ensemble meta-classifiers based on Random Forest.
The multi-level paradigm achieved the best outcomes when Bagging and Decorate were combined in the construction of a multi-level ensemble classifier. The first option was to use Bagging in the second level after applying Decorate to Random Forest in the first level. The other optimal result was achieved by using Decorate in the second level to combine the results of Bagging applied to Random Forest acting as the base classifier.

Figure 2 presents the results of experiments comparing the outcomes of these base classifiers for 2, 3 and 4 classes of the neurological classification of CAN progression based on the DiScRi dataset. The results show that Random Forest outperformed all other base classifiers.

Figure 2. Base classifiers for 2, 3 and 4 classes of the neurological diagnostics of CAN progression.

Figure 3 displays the results obtained using WEKA Explorer for AdaBoost, Bagging, Dagging, Decorate, Grading, MultiBoost and Stacking, set up with Random Forest as the base classifier.

Figure 3. Meta-classifiers for 2, 3 and 4 classes of the neurological diagnostics of CAN progression.

Figure 4. Meta-classifiers with two levels for 2, 3 and 4 classes of the neurological diagnostics of CAN progression.