Comparison of the Data Classification Approaches to Diagnose Spinal Cord Injury

In our previous study, we have demonstrated that analyzing the skin impedances measured along the key points of the dermatomes might be a useful supplementary technique to enhance the diagnosis of spinal cord injury (SCI), especially for unconscious and noncooperative patients. Initially, in order to distinguish between the skin impedances of control group and patients, artificial neural networks (ANNs) were used as the main data classification approach. However, in the present study, we have proposed two more data classification approaches, that is, support vector machine (SVM) and hierarchical cluster tree analysis (HCTA), which improved the classification rate and also the overall performance. A comparison of the performance of these three methods in classifying traumatic SCI patients and controls was presented. The classification results indicated that dendrogram analysis based on HCTA algorithm and SVM achieved higher recognition accuracies compared to ANN. HCTA and SVM algorithms improved the classification rate and also the overall performance of SCI diagnosis.


Introduction
The diagnosis of spinal cord injury (SCI) by neurological examination technique depends mainly on the experience of medical doctor and hence may lead to nonobjective ways of assessment [1]. In this technique, doctors assess the patient's symptoms, which may comprise of loss of motor or sensory function. McDonald and Sadowsky [2] reported that assessment should include mental status, cranial nerves, motor, sensory and autonomic systems, coordination, and gait. Patient symptoms may include extreme pain or pressure in the neck, head, or back; loss of sensation in the hand, fingers, feet, or toes; partial or complete loss of control over any part of the body and much more. Since this traditional technique requires continuous feedback, it has some limitations especially for noncooperative and unconscious patients. For clinical applications, such as monitoring the treatment and rehabilitation processes following a surgery, a more quantitative and objective way for the diagnosis of spinal cord injury is required. In recent years, some encouraging investigations have been carried out for this purpose by assessing the thermal [3] and electrical perception threshold [4,5]. However, these techniques require patient's feedback, hence cannot be effectively used for noncooperative and unconscious patients. Roehl et al. [6] examined the temperature difference on the skin surface by evaluating the thermographic imaging, and they concluded that thermography could be prospectively used as a supplement to existing diagnostic measures for SCI.
Recently we have suggested a new method, which can eliminate the patient-feedback dependency for the diagnosis of SCI in a quantitative manner [7]. This technique, which was based on the artificial neural networks (ANN) [8][9][10], could distinguish between skin impedances of control subjects and patients satisfactorily. In the present study, by using alternative algorithms, it was aimed to improve the diagnosing performance of the proposed method. To achieve this goal, in addition to ANN, support vector machine (SVM) and dendrogram-based hierarchical cluster tree analysis (HCTA) approaches were also used as alternative methods for the classification of the impedance values.
SVM is a supervised machine-learning algorithm based on a statistical learning theory approach for solving data classification and pattern recognition problems [11]. Although the fundamental concept of SVM was established in the late seventies [12], this method began to be widely used in the mid of nineties (for review, see [13]). In biomedical applications, this technique is frequently employed [14][15][16].
Dendrogram-based cluster analysis was first appeared in the study of Sneath and Sokal [17]. In this analysis method, data (objects) are divided into groups (clusters) that share common characteristics [18]. HCTA is one of the types of the cluster analysis, which is basically based on calculating the distances between data and finally grouping them into a hierarchical cluster tree (dendrogram) according to these distances. In biology, clustering analysis has been especially used to find groups of genes that have similar functions [18].

Experimental Procedure.
The impedance data analyzed in this study were previously collected for another reported study, hence a detailed explanation of the experimental protocol can be found in [7]. However, for the sake of completeness, a condensed form of the experimental procedure was given below.

Subjects.
Patients with traumatic SCI and control subjects aged between 18 and 55 years were included in the study. Duration of injury of the patients varied from three to twenty years. Initially, they were all evaluated by history and physical examination according to The International Standards for Neurological Classification of SCI, American Spinal Injury Association (ASIA), and International Spinal Cord Society (ISCoS) [1]. All procedures were approved by the Ethical Committee of Cerrahpasa Medical Faculty, Istanbul University.
Skin impedances of the key points between C3 and S1 were measured in 15 control subjects and 15 patients with SCI (13 paraplegics and 2 tetraplegics) bilaterally ( Figure 1). The impedances were measured in all dermatomes except C2 (due to hair), L1-3, and S2-5 (because of the refusal of the control subjects). According to the aforementioned booklet of ASIA and ISCoS, 10 pairs of key muscles and 28 pairs of key points were evaluated and finally the neurological level, completeness, and classification of SCI were determined. For the patients, inclusion criteria were determined as traumatic SCI and both gender; however, the exclusion criteria were determined for patients with any other neurological disorder than SCI and also nontraumatic SCI.  C2  C3  C4  C5  C6  C7  C8  T1  T2  T3  T4  T5  T6  T7  T8  T9  T10  T11  T12  L1  L2  L3  L4  L5  S1  S2  S3  S4   0   50   100   150   200   250   300 Sensory key points Control Paraplegic Tetraplegic Impedance (kΩ) Figure 2: Impedance data obtained from representative control, paraplegic, and tetraplegic subjects.

Skin Impedance Measurement.
In order to simulate the worst case condition, skin was not prepared artificially by abrasion or cleaning with alcohol before the measurements. Two self-adhesive electrodes were placed on the skin for each key point, and an AC signal (2 V, 200 Hz) was applied by means of a signal generator. The electrodes were placed on either side of the sagittal plane of the body. A portable multimeter was situated between one of the electrodes and signal generator, and the current level was recorded. The other output of the signal generator was connected to the electrode, which was not fixed to the multimeter. All experiments were performed by using electrocardiography (ECG) type electrodes (Unomedical, Unilect). The distance between the centers of the electrodes was 3 cm. In order to prevent the deterioration of adhesiveness of electrodes, which can eventually affect the skin-electrode impedance, each electrode was used once. In Figure 2, representative data obtained from control, paraplegic and tetraplegic subjects can be seen.

Artificial Neural Networks.
Neural Networks are mathematical models inspired by the human brain. These models consist of processing layers, where each node in a given input, hidden or output layers represent a neuron of that layer. They possess ability to approximate any arbitrary input-output mapping function, by learning like backpropagation and adapting parameters to training data and ability to generalize new testing data even from a lack of statistical knowledge about the input data [9]. Learning process of the ANN occurs at the synaptic junctions between the neurons of the input layer and the neurons of the output layer [8].
Dimension of the structure of an ANN considerably affects its classification performance. It is well accepted that networks with large dimensions (large number of hidden layers and neurons) do not always improve the accuracy of the classification process [19]. Moreover, neural networks with large dimensions do not converge easily and may be very time-consuming during the training process. However, small networks may fall into a local error minimum and subsequently learning from training data may not be optimal. In this study, since a three layer network (two hidden layers) can approximate any nonlinear function [20], we used two hidden layers in the network model. We determined the numbers of neurons per layer by grid search. In the network structure, one input layer, two hidden layers, and one output layer have 27, 16, 6, and 1 neurons, respectively. The input array was constituted from the mean values of skin impedances of the left-and right-side key points according to sagittal plane and target array was constituted from array of ones (denotes patients) and zeros (denotes subjects).
Transfer function is used mainly for the calculation of weight factor between neurons during the training process. In our case, we used log-sigmoid transfer function, since its output range (0 to 1) is ideal to output Boolean values. Backpropagation feed-forward algorithm was chosen for the training process, because it has been proven to be a robust algorithm for difficult connectionist learning problems [21]. Backpropagation algorithm is an extension of the least mean square learning algorithm and is widely used in adaptive signal processing. The weights are adjusted at each step to reduce the gradient of the cost function [8,9]. Number of the epoch was limited to 500 for the learning stage of the network.
We built a matrix of training including skin impedance values measured from control and patient subjects. Each row corresponds to a measured skin impedance values, and each column corresponds to a subject. Once we trained the ANN, we classified test subjects including both patient and control subjects disjointed from training set. We then measured performance of ANN on the test subjects. During the training and testing phase, we implemented 10-fold crossvalidation technique which is based on shuffling sample vectors among training and testing space randomly [22]. This method aims to maximize the amount of data that can be used for training to ensure a model that will generalize well to unseen data. In this technique, the impedance data set (30 subjects) was divided into 10 subsets; each subset consisted of three subjects. Training of the ANN was repeated 10 times. Each time, a single subset was retained as the validation impedance data for testing, and the remaining nine subsets (27 subjects) were used as training data. After cross-validation was completed for all of the subjects, means of the 10 classification results were computed.

Support Vector
Machines. This method is a kernelbased classification technique that is based on the marginmaximization principle that minimizes an upper bound on the expected loss (risk) using observed data [23,24]. In this method, the goal is to estimate the influence of an input measurement X ∈ { x 1 , x 2 , . . . , x n } variable on an output classification variable Y ∈ {y 1 , y 2 , . . . , y n } to find an optimal predictor f : X → Y , that is, a kernel function. In our case, n was defined as 30, since we have 30 subjects.
The SVM algorithm finds the decision boundary function as a linear combination of high-dimensional support vectors, which are acquired from training pair of examples from a sample space S n = ( x 1 , y 1 ), . . . , ( x n , y n ) ∈ X × Y , (independent and identically distributed) values from an unknown probability distribution, where n = 27 which corresponds to training sample size. This value denotes size of subset of all cases satisfying (n < n = 30) for training set. Each vector comprised of 21 dimensional skin impedance values corresponding to a patient or healthy subject.
The hyperplane can classify two classes in SVM machines when we set kernel function and regularizing parameter C appropriately. If dist + and dist − are becoming the shortest distances to this separating hyperplane bordering two classes, then the margin of the separating hyperplane becomes |dist + − dist − |. The shortest distance and the normal direction (orthogonal) to the hyperplane are related to each other. w is the 21-dimensional weight vector which is a function of the distance, dist + = dist − = 1/ w . Maximizing the margin means minimizing the term w /2, which shows the best classification success between patients and healthy subjects.
Every training example z = ( x i , y i ) consisted of a vector including the impedance values of key points x i ∈ 21 , and a discrete classification label value (binary classification), which corresponded to two groups (patients (−1) and controls (1)). Our goal was to predict the label value y i ∈ {−1, 1} using other test set which included a mixture of vectors with two states. In the training stage, we had previously done searching for an optimal hyperplane, which maximizes margin and minimizes errors with known corresponding labels y ∈ {−1, 1} included in the training set. In testing phase, kernel function led to predict the patients and control subjects.
The effectiveness of SVM depends mainly on the selection of the kernel, the kernel parameters, and regularization parameter C. In our case, we selected the radial basis function kernel (Gaussian kernel), because it is very flexible and can adapt in complexity to fit the training data. The Gaussian kernel parameter, σ, determines the area of influence of the support vector over the data space. Regularization parameter, C, controls the tradeoff between margin maximization and error minimization. In order to find the optimal values for the kernel parameter and regularization parameter, we used cross-validation and grid search. After grid search, we obtained σ as 0.1 and C as 10.
To be able to compare the performances of ANN and SVM in equivalent conditions, 10-fold cross-validation technique used for ANN was also employed for SVM.

Dendrogram Analysis Based on Hierarchical Cluster
Tree Analysis. In our specific case, hierarchical cluster tree analysis is a way to create groups of subjects in such a way that the skin impedance values of subjects in the same cluster are very close in magnitudes, and the skin impedance values of subjects in different clusters are quite different. Hierarchical clustering analysis divides whole tree into lower branches (leaves) as necessary.
In our study, we established a hierarchical evaluation structure [25] in a tree T, which can be considered as a clustering process for grouping different objects together. The root of the dendrogram denotes the entire data set including control and patient groups. This hierarchical tree consists of many U-shaped lines connecting patients and control subjects. The height of each U represents the distance between the two objects being connected. This method builds up a hierarchical classification in a bottom-up way from leaves up to the roots of the tree ordering with a distance matrix D. The distance matrix contains dissimilarity values among pair of individuals (patients and control subjects) In the first step, since we initially did not know which individual belongs to either patient or control subject class, we initialized all 15 patients and 15 control subjects with singleton clusters (sets with exactly one element) which means that we had a total of 30 clusters with each cluster containing just single patient or control subject as for all x ∈ Ω, { x} ∈ T at the very beginning. In other words, we . . , 30 and i / = j) between those singleton clusters. Once the proximity between subjects has been computed, we linked pairs of subjects that are close together into clusters made up of two subjects (binary clusters). We then linked these newly formed subjects to each other and to other subjects to create bigger clusters until all the subjects in the original data set are linked together in a hierarchical tree. Since it was aimed to observe the natural divisions that exist among links between subjects, we did not apply a clustering threshold.

Results
Since the ANN and SVM methods require cross-validation analysis, statistical significance analysis was only performed between the classification results of ANN and SVM (in our case, "classification result" refers to as percent of cases in which the different computational algorithms correctly predict whether or not an individual has a SCI). Oneway ANOVA was used to analyze the statistically significant difference between means of the classification results of ANN and SVM. However, HCTA does not require training and  cross-validation processes, that is, data set is evaluated as a whole rather than divided into subsets for training and testing phases. Therefore, there is no need to calculate an average of the validation result. For this reason, it cannot be performed a statistical significance analysis for HCTA. The level of significance was preset for all statistics at P = 0.05. Mean and standard deviation (SD) of the magnitudes of the skin impedances of all subjects (controls and patients) are denoted in Figure 3. No statistically significant difference between mean impedance values of controls and patients was found.
Since the number of the tetraplegics (only two patients; due to the inconvenient and difficult situations in measuring the skin impedances of tetraplegics) was much smaller than that of the paraplegics in the patient group, two different data sets were utilized during the data classification process. In doing so, it was intended to observe the effect of insufficient number of tetraplegics on the classifying results. The classification process, in which all the subjects were included, is referred to as Phase I (control + paraplegics + tetraplegics) and the other process, in which only control and paraplegic groups were included, is referred to as Phase II (control + paraplegics).
The average success rate of the classification results of the ANN and SVM was obtained as 73.3% and 78.5% for Phase I, respectively (Table 1)  as a rate of 76.6% and 100%, respectively. In addition, classification results of HCTA were 83.3% and 85.7% for Phase I and Phase II, respectively.
A comparison of the classification performances of ANN and SVM in diagnosing SCI is presented in Figure 4. In Phase I (Figure 4(a)) and Phase II (Figure 4(b)), means of the validation results obtained by SVM are higher than those obtained by ANN. A statistically significant difference between validation results of ANN and SVM was found for Phase II, but not for Phase I.
The dendrogram can be described as a graphical representation of the results of hierarchical cluster analysis. Dendrograms of the HCTA are given for Phase I and Phase II in Figure 5. In this figure, numbers along the horizontal axis and along the vertical axis represent the indices of the subjects (patients and controls) in the original data set and Euclidean distance between the skin impedances of the connected subjects, respectively. In Figure 5(a), patients with SCI (paraplegics + tetraplegics) and control subjects are denoted by the numbers 1 to 15 and 16 to 30, respectively. In Figure 5(b), patients with SCI (only paraplegics) are denoted by the numbers 1-13, and control subjects are denoted by the numbers 14 to 28.
In order to allow visualization of the classification performances of the three algorithms, confusion matrices were given in Tables 2(a)-2(f). Each column of the matrices represents the instances in the predicted class, while each row represents the instances in the actual class. All correct classifications are located in the diagonal of the tables.

Discussion
In our case, hierarchical clustering analysis utilized all available patient and control data in two steps. Initially, it calculated the distance between pair of subjects, which were selected and grouped together according to their impedance values. Later on, similar groups were selected and joined together, which led new and bigger groups. This process continued until all subjects were selected and attached to part of the tree. In our case, the tree had two major branches, which were formed by patients and healthy subjects. This new approach extracted patients and healthy subjects satisfactorily from an unlabeled set without invoking patient feedback. Dendrogram analysis does not require crossvalidation; hence, it is computationally efficient. ANN and SVM require some design parameters which are actually not known a priori. In case of SVM, if the regularization parameter is not selected properly, it might cause overfitting or underfitting. In case of ANN, dimension of network structure, initial weights, number of iterations, transfer function, and learning rate affect the accuracy of the classification considerably [19]. There have been numerous studies on the determination of the optimum neural network structure; however, a consensus on a certain approach to determine the best structure has not been reached [26]. These parameters could be estimated from cross-validation technique; however, it needs extra time and risk. In contrast to ANN, SVM algorithm automatically selects its model size [27]. Moreover, SVM training always finds a global minimum, whilst ANN optimization is often susceptible to local minima [13]. In addition to these advantages of SVM over neural networks, Shawe-Taylor and Cristianini [28] indicated more key features of SVM, such as the use of kernels, the sparseness of the solution, and the capacity control obtained by optimizing the margin.
Validation results obtained by using SVM were in agreement with those obtained by ANN. In both cases, rates of diagnosis of SCI with success were higher for Phase II than those for Phase I. The reason for this difference in the validation rates of Phase I and Phase II stems from the lack of the sufficient number of tetraplegic subjects. The performance of the algorithms used in classifying the patients with SCI and controls depends considerably on the number of subjects used as input in the training stage. ANN showed a modest increase in percentage accuracy from Phase I to Phase II. On the other hand, SVM showed a much larger increase in accuracy when the tetraplegic patients are absent because a multilayer neural network classifier suffers from the existence of multiple local minima solutions, whereas SVM is formulated as a quadratic programming problem and hence SVM training always finds a global minimum [13].
The results showed that HCTA and SVM algorithms improved the classification rate and also the overall performance. For Phase I, hierarchical clustering analysis achieved higher recognition accuracy compared to ANN and SVM systems; however, for Phase II, SVM showed the best classification performance.
Since the neurological examination technique used in the diagnosis of the SCI is mostly subjective, an objective and accurate technique would be a very important improvement for clinical applications. The suggested quantitative method in which the skin impedances were classified using the hierarchical clustering or SVM is a quite simple, noninvasive, and nonexpensive method. A multimeter, a frequency generator, ECG electrodes and a computer are sufficient to perform this technique. Moreover, measurement and analysis of the impedance do not require patient feedback, which ensures this technique to be applicable as a more objective method, especially for unconscious and noncooperative SCI patients.
It is concluded that the proposed skin impedance test based on SVM or HCTA can be used as a supplement to neurological and radiological examinations to enhance the diagnosis of SCI. For future studies, measurements of skin impedance of acute patients are planned. Also, other distinctive parameters, such as skin temperature [6], for diagnosing patients with SCI injury among healthy people, can be taken into account together with skin impedance. Such a combination of these distinctive parameters might improve the accuracy of the diagnosis of SCI.