Classification of Lumbar Spine Disc Herniation using Machine Learning Methods

In the medical field, computer-aided diagnosis (CAD) systems are an active area of research, as they help medical professionals to simplify the diagnosis of a patient's condition. In this paper we propose a machine learning based method for classifying lumbar disc herniation. Automating the diagnosis of herniated discs reduces the heavy workload of radiologists, who must manually analyse several cases every day. Automation will also help to decrease inter- and intra-rater variability. Hence this work focuses on the classification of lumbar disc herniation based on sagittal view magnetic resonance images (MRIs). The dataset used in this work comprises 32 images from 32 patients, of whom 10 are healthy while 22 have herniated discs. The data is processed through various image processing techniques to obtain three sets of features: the binary image; the shape, height and width measurements of the discs; and full attribute images. The proposed approach consists of four stages: region extraction, image segmentation, feature extraction and classification. Classification is performed using support vector machine (SVM) and K-nearest neighbour (KNN) classifiers, of which the KNN with k=5 produced the best results with 78.6% accuracy, an F1 score of 66.7%, and precision and recall rates of 60% and 75% respectively.


Introduction
Lower back pain (LBP) may result from a herniated intervertebral disc caused by lifting objects that are too heavy, or from a fracture of the vertebrae due to bruising or osteoporosis [1]. The vertebral column, also known as the spinal column, spine, or backbone, is a bony structure that runs through the middle of the body [2]. The human spinal column is made up of 33 bones: 7 cervical vertebrae, 12 thoracic vertebrae, 5 lumbar vertebrae, and 5 sacral and 4 coccygeal bone segments, as shown in figure 1 [3]. The spinal cord itself, as well as the surrounding tissues and bones, can be damaged by a spinal cord injury. This may result in herniated discs, which can develop anywhere along the spine, although the lower back is the most common location. The affected disc may be described as a bulging, protruding, or ruptured disc [4]. Therefore, the main focus of this research is on the lumbar discs (L1-S1).

Fig. 1: The human spinal column [3]
Lumbar disc herniation can be classified into different stages. Healthy discs may start to drain, shrink and lose their elasticity, resulting in a reduction in the total joint height. The four stages of herniation are disc protrusion, disc prolapse, disc extrusion and sequestered disc; these conditions are illustrated in figure 2 [5]. The first two stages, in which the disc starts to protrude out of its original form, are considered incomplete herniations, whereas the third and fourth stages, in which the disc extrudes further, are considered complete herniations [5].

Literature Review
Many researchers have worked towards the automatic detection of lumbar disc herniation, especially over the past decade. Some of these studies are discussed in this section. Alawneh et al. [6] proposed to perform ROI enhancement with the aid of noise removal tools and the contrast-limited adaptive histogram equalization (CLAHE) algorithm. This was followed by skeletonization to extract important features of the MRI images. Beulah et al. [7] proposed to use the histogram of oriented gradients (HOG) to extract features from intervertebral disc (IVD) images before training a support vector machine (SVM) on these features to obtain the classified output. Alomari et al. [8] proposed to perform disc localization and then thresholding to obtain the initial boundary of the disc. This was followed by a gradient vector flow active contour model (GVF-snake) to perform disc segmentation and, finally, classification using a Gibbs-based classifier.
Ebrahimzadeh et al. [9] proposed to extract the spinal cord using Otsu's thresholding and then to align it with a third-order polynomial. This was followed by feature extraction and feature selection, where the features of interest were the disc intensities and shape features. Finally, three classifiers, namely the multilayer perceptron (MLP), KNN and SVM, were used to classify the images. Ghosh et al. [10] proposed a robust and fully automated lumbar herniation diagnosis system using heterogeneous classifiers. Intensity and texture features were extracted from the ROI of each disc. Five classifiers were constructed using heterogeneous learning algorithms (SVM, PCA+LDA, PCA+NAÏVE, PCA+QDA and PCA+SVM) to detect whether a disc is herniated. A combined majority voting scheme was then adopted, which results in a robust diagnostic system. Five cross-validation experiments were performed, achieving an accuracy of 94.85%, a specificity of 95.9% and a sensitivity of 92.45% for 35 clinical cases, that is, a total of 175 lumbar intervertebral discs. Salehi et al. [11] presented an automatic diagnosis of disc herniation in two-dimensional sagittal MR images. For this purpose, 50 clinical MRI images containing 250 lumbar discs were used. K-fold cross validation with K=10 was applied, and average accuracies of 97.91% and 97.08% were achieved using the KNN and linear SVM classifiers.
Rehman et al. [12] proposed a robust framework for vertebra segmentation. This approach is capable of efficiently handling the complex shape variations of the vertebrae. The U-Net deep convolutional network architecture is used to extract the shape of the bones and discs accurately. The method was implemented on two different datasets: in the first, 20 publicly available 3D spine MRI images were used for disc segmentation, and in the second, 173 CT scans were used for thoracolumbar vertebra segmentation. The accuracy for disc segmentation is 90.37%. Mbarki et al. [13] proposed a method for lumbar spine disc classification based on deep convolutional neural networks using axial view MRI. The trained model achieved an accuracy of 94%. Shinde et al. [14]
Based on the literature review, it is observed that different methods, from image processing to machine and deep learning, have been used for the segmentation and classification of lumbar disc herniation. However, all these methods are applied to different datasets, so it is difficult to evaluate the true significance of the work done. Hence in this preliminary work we use image processing and machine learning methods to classify lumbar disc herniation, and in future we will extend our analysis to several datasets to evaluate the effectiveness of the methods developed. The main objectives of this work are therefore to extract the region of interest (ROI), i.e. the individual discs, from the MRI images, and to determine the features of herniated and healthy discs for feature extraction. This will lead to the development of an automated system for the classification of healthy and herniated discs.

Material and Methods
In this paper, an automated system has been developed to classify healthy and herniated discs from magnetic resonance imaging (MRI) scans. The MRI scans used here are of the sagittal (side) view. A set of algorithms is used to enhance the images for accurate extraction of the regions of interest (ROIs). The precision of these ROIs is vital for feature extraction and for classification of the discs using machine learning classifiers. The method can be divided into four main phases: (1) region extraction, (2) image segmentation, (3) feature extraction and (4) classification. The workflow is illustrated in figure 3.

Fig. 3: Workflow of the proposed lumbar disc herniation classification method
The dataset contains MRI scans of 32 patients, of whom 10 have only healthy lumbar discs while 22 have both herniated and healthy lumbar discs. Hence, out of 160 (32x5) lumbar discs, there are 112 healthy lumbar discs and 48 herniated lumbar discs. It should also be noted that the herniated lumbar discs may be in any of the stages of lumbar disc herniation.
The CAD system is developed using the Python programming language, as its wide range of libraries makes the results easy to replicate. The Python libraries used are OpenCV for the image processing and scikit-learn for the classification. The designed system can be divided into four general steps: ROI extraction, ROI enhancement, feature extraction and classification.
A 5-fold cross validation scheme has been implemented, and the data has been divided into training, validation and testing sets. It should be noted that the test sets are kept separate from the training and validation sets. The whole project was carried out on an Acer Nitro 5 equipped with an NVIDIA GeForce GTX GPU and 16GB of memory. The image processing stages were implemented in MATLAB, while the training was performed in the Spyder IDE using the scikit-learn Python library.
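The split described above can be sketched as follows. This is a minimal illustration using NumPy only, not the code used in this work; the 20% test fraction, the random seed and the helper name `split_indices` are assumptions, since the exact partition of the 160 discs is not stated.

```python
import numpy as np

def split_indices(n_samples, test_fraction=0.2, n_folds=5, seed=0):
    """Hold out a test set, then build k-fold train/validation splits.

    test_fraction and seed are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    test_idx = order[:n_test]   # never used for training or validation
    dev_idx = order[n_test:]    # shared between the k folds
    folds = np.array_split(dev_idx, n_folds)
    splits = []
    for k in range(n_folds):
        val_idx = folds[k]
        train_idx = np.concatenate(
            [folds[j] for j in range(n_folds) if j != k])
        splits.append((train_idx, val_idx))
    return test_idx, splits

# 160 discs = 32 patients x 5 lumbar discs
test_idx, splits = split_indices(160)
```

Note that this sketch splits at the level of individual discs; grouping all discs of one patient into the same set would be a stricter alternative.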
In the region extraction phase, the main region of the image is extracted from the MRI images.
As the original MRI scans include neighbouring parts of the human body which are not required, only the lumbar discs located in the middle of the image are extracted manually, as shown in figure 4, while the other regions are cropped off and discarded. This step aids the extraction process by narrowing down the scope of the image that is to be processed.

Fig. 4: Original MRI image (left), extracted region (right)
Subsequently, the cropped RGB images are converted to grayscale images. Adaptive thresholding is then applied, with a specified threshold value, to convert the grayscale images to binary images. The threshold value is not fixed in advance but is determined by trial and error; in this case the optimum threshold value was found to be 0.9. The output of this stage is shown in figure 5.
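As a rough illustration of this step, the sketch below converts an RGB array to grayscale and applies a single threshold of 0.9 on a 0 to 1 intensity scale. The luminance weights, the 0 to 1 scaling, the choice of bright pixels as foreground and the function name `to_binary` are all assumptions made for the example; they are not taken from the implementation used in this work.

```python
import numpy as np

def to_binary(rgb, threshold=0.9):
    """Convert an RGB image (H x W x 3, uint8) to a boolean mask."""
    rgb = np.asarray(rgb, dtype=np.float64) / 255.0
    # Assumed grayscale conversion: ITU-R BT.601 luminance weights
    gray = rgb @ np.array([0.299, 0.587, 0.114])
    # Pixels brighter than the threshold become foreground (assumption)
    return gray > threshold

# Tiny synthetic image: one white pixel and one mid-gray pixel
demo = np.array([[[255, 255, 255], [128, 128, 128]]], dtype=np.uint8)
mask = to_binary(demo)
```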

Fig. 5: Grayscale to binary image conversion using image thresholding
In the image segmentation phase, several image processing techniques have been implemented to enhance the images so that important features can be extracted. As can be observed from figure 5, there are still many irrelevant details in the image which should be removed before segmenting the individual discs, and there are holes in several discs which need to be filled. The series of image processing techniques used in the post processing is summarised in figure 6. Figure 6(a) depicts the filled segmented binary image. This is followed by opening to remove objects smaller than 50 pixels, as shown in figure 6(b). Subsequently, a built-in MATLAB function was applied to remove objects which touch the image border; the outcome is shown in figure 6(c).
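The three clean-up steps above (hole filling, removal of objects smaller than 50 pixels, and removal of objects touching the border) can be approximated in Python as sketched below. This is an assumed equivalent built on `scipy.ndimage`, not the MATLAB code used in this work; the function name `clean_mask` is hypothetical, and the default 4-connectivity of `ndimage.label` may differ from the connectivity used by the MATLAB functions.

```python
import numpy as np
from scipy import ndimage

def clean_mask(mask, min_size=50):
    """Fill holes, drop small objects, drop objects touching the border."""
    filled = ndimage.binary_fill_holes(mask)
    labels, n = ndimage.label(filled)  # 4-connected components by default
    if n == 0:
        return filled
    sizes = np.bincount(labels.ravel(), minlength=n + 1)
    keep = sizes >= min_size           # objects with fewer pixels are removed
    keep[0] = False                    # label 0 is the background
    # Labels that appear on any of the four image edges
    border = np.unique(np.concatenate(
        [labels[0, :], labels[-1, :], labels[:, 0], labels[:, -1]]))
    keep[border] = False
    return keep[labels]
```

The function returns a boolean mask of the same shape as the input, in which only sufficiently large interior objects remain.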
The image is then cropped to narrow down the focal area, as depicted in figure 6(d), before it is eroded with a disc-shaped structuring element of radius 1, as shown in figure 6(e). The image is then dilated with the same disc-shaped structuring element. Next, opening is performed using a disc-shaped structuring element of radius 2, as shown in figure 6(g). The objects touching the border are then removed before the image is dilated using the same structuring element as in figure 6(g).
After post processing, the image is ready for feature extraction. Three features are extracted from the binary images obtained: the binary disc shape, as shown in figure 7(a); the height and width features, as shown in figure 7(b); and the full attribute images, as shown in figure 7(c). The binary disc shape images were obtained from the image segmentation phase and manually cropped into their individual discs. The height and width of each individual binary disc in the segmented images were determined using the built-in MATLAB function 'regionprops', whereas the full attribute images were the original MRI images manually cropped into their individual discs without prior pre-processing.
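The height and width measurement can be illustrated with a short NumPy sketch. It assumes that the height and width are those of the axis-aligned bounding box of each disc mask, which is one of the quantities 'regionprops' can report; the text does not state which property was actually used, and the function name `height_width` is hypothetical.

```python
import numpy as np

def height_width(mask):
    """Return (height, width) in pixels of a binary mask's bounding box."""
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing foreground
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing foreground
    if rows.size == 0:
        return 0, 0                          # empty mask
    return int(rows[-1] - rows[0] + 1), int(cols[-1] - cols[0] + 1)

# Synthetic disc: 4 pixels tall and 20 pixels wide
disc = np.zeros((20, 30), dtype=bool)
disc[8:12, 5:25] = True
h, w = height_width(disc)
```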
In the classification phase, the diagnostic process is performed on the selected features, namely the segmented binary disc shape, the height and width features, and the full attribute images. Several classifiers are applied to identify the best relation between the features and to produce the best possible accuracy. The classifiers used are the support vector machine (SVM) and K-nearest neighbour (KNN) [15][16].
SVM is considered a linear classifier and is used for classification and regression. The SVM algorithm finds the decision boundary, or hyperplane, that separates the data into different classes. SVM is less prone to over-fitting than other algorithms and works well with high-dimensional data. Its shortcomings are that a large amount of time is needed to train the model if the dataset is big, and that it is difficult to choose a proper kernel function.
The K-nearest neighbour classifier is the most fundamental type of classifier. It classifies a new object according to the majority class of its k nearest neighbours; by choosing a suitable value of k, the optimal classification result can be achieved. It is a non-parametric technique, in that no assumption is made about the data distribution. In addition, the training time is short, as KNN simply stores all the training data. An odd value of k is preferable, to prevent a situation in which the numbers of neighbours from both classes are the same. One of the benefits of the KNN algorithm is that it is relatively simple and easy to understand and interpret.
The SVM and KNN classifiers have shown promising results in many biomedical applications [17][18][19][20][21]. The performance of the models implemented in this paper has been evaluated based on accuracy, F1 score, precision and recall.
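To make the majority-vote rule concrete, the following from-scratch sketch classifies one sample with KNN using Euclidean distance. It is written for illustration only and is not the scikit-learn classifier used in this work; the function name `knn_predict` and the (height, width) values are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=5):
    """Classify x by majority vote among its k nearest training samples."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    # Euclidean distance from x to every training sample
    dists = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Hypothetical (height, width) features in pixels; 0 = healthy, 1 = herniated
X = [[12, 30], [11, 31], [13, 29], [8, 38], [7, 40], [8, 39]]
y = [0, 0, 0, 1, 1, 1]
pred = knn_predict(X, y, [8, 37], k=3)
```

With an odd k and two classes, the vote cannot end in a tie, which is the reason for preferring odd values.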
Two types of classifiers were implemented in this paper to study the classification performance: the support vector machine (SVM) and K-nearest neighbour (KNN) with K=3, 5 and 9. The performance of these models was evaluated based on accuracy, F1 score, precision and recall rate, as shown in tables 1 to 6. These measures were computed from the numbers of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). The classifiers were applied to all three sets of data: segmented binary images, height and width features, and full attribute images. The segmented binary images were selected as one of the features because the segmentation serves to enhance the shape of the degenerated discs as well as to capture the annular fibrosis leak from the discs. The height and width were selected as features because herniated discs tend to be wider and shorter due to the leaking. The full attribute images were also selected as one of the features for comparison. Performance is evaluated for both balanced and unbalanced data.
Based on the results of all three sets of data, it can be said that the classifiers perform best when implemented on the segmented binary images in table 1, of which the KNN9 classifier produces the best results. As mentioned earlier, an increase in the value of K tends to increase the performance scores; however, this reduces the distinction between classes, as the samples are obtained from neighbours located further away. It should also be noted that K is typically chosen as an odd number to avoid a tie between classes [22].
The unbalanced data was then used for comparison with the balanced data, as it is closer to real-world situations; the results are given in tables 4 to 6. It can be observed that after using all 138 images without balancing the data, the results across all three data types dropped compared to those of the balanced data.
The best overall performance was obtained from the height and width features using the unbalanced dataset, with KNN9 as the best performing model at 78.6% accuracy, 66.7% F1 score, a precision rate of 75.0% and a recall rate of 60.0%. The best performing models for the binary image set and the full attribute image set are the SVM and KNN5 respectively.
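For reference, the four measures follow from the confusion-matrix counts as sketched below. The counts in the example are hypothetical: they are not taken from this work's confusion matrices and were chosen only because they reproduce the figures quoted above (78.6% accuracy, 75.0% precision, 60.0% recall, 66.7% F1 score).

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1

# Hypothetical counts (not from the paper), 14 test discs in total
acc, prec, rec, f1 = classification_metrics(tp=3, tn=8, fp=1, fn=2)
```

Because the F1 score is the harmonic mean of precision and recall, it is the same (66.7%) whichever of the two is 75% and whichever is 60%.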

Conclusion
In this paper a method has been proposed for the classification of lumbar disc herniation. The analysis of the results shows that better results are obtained when the dataset is balanced. However, a balanced dataset is unlikely in real-world situations, and the unbalanced dataset therefore gives a more realistic picture of the methods applied in this paper. Using the unbalanced dataset results in a drop in performance across all three types of features: binary images, height and width features, and full attribute images. The best performing data is that of the extracted height and width features, although the performances of the three types of data differ less from each other than they do for the balanced dataset. Hence, it may be suggested that the unbalanced dataset could potentially produce better results for all three attributes in general. For classification, KNN performs better than SVM. Future improvements may take the form of automatically segmenting the images, implementing deep learning methods to obtain better classification rates, and subsequently attempting to classify the discs according to their level of injury.