Genetic Algorithm Optimized Back Propagation Neural Network for Knee Osteoarthritis Classification

Osteoarthritis (OA) is the most common form of arthritis. It is caused by degeneration of the articular cartilage, which functions as a shock-absorbing cushion in the joints. The joints most commonly affected by osteoarthritis are the hand, hip, spine and knee; this study focuses on knee osteoarthritis. Magnetic Resonance Imaging (MRI) is now widely used to assess the progression of osteoarthritis because it displays the contrast between bone and cartilage. Traditionally, MR images are interpreted manually by physicians, a process that is inconsistent and time consuming. Hence, an automated classifier is needed to minimize the classification time. In this study, a genetic algorithm optimized neural network is used for knee osteoarthritis classification. The classifier consists of four stages: feature extraction by Discrete Wavelet Transform (DWT), neural network training, neural network testing and optimization by Genetic Algorithm (GA). The technique achieved a classification accuracy of 98.5% in the training stage and 94.67% in the testing stage. In addition, the classification time was reduced by 17.24% after optimization of the neural network.


INTRODUCTION
Osteoarthritis (OA), also known as degenerative joint disease, is the most prevalent of the chronic rheumatic diseases (Axelsson and Björhall, 2003). Its key feature is the degeneration of the joint cartilage that functions as a cushion in the joint. According to research in the United Kingdom by Arthritis Research UK (2013), the knee is the site most commonly affected by osteoarthritis in people aged 45 and above. The number of patients with this chronic rheumatic disease is estimated to increase from 4.7 million in 2010 to 6.4 million by 2035.
Several methods can be used to diagnose osteoarthritis, but the most common in recent years is Magnetic Resonance Imaging (MRI). According to Gold et al. (2009), MRI has gained popularity because it produces high quality images of the anatomical structures of the knee by manipulating the contrast between different tissue types. Conventionally, MR images are interpreted manually by physicians, which is subjective, time consuming and inconsistent. The situation may worsen when the number of patients exceeds a certain limit. Thus, there is a demand for an automated knee osteoarthritis classifier that reduces interpretation time and avoids inconsistent interpretation.
In digital image analysis, feature selection/extraction is used to extract or retain the most salient characteristics so that the image can be properly analyzed and classified (Bandyopadhyay and Pal, 2007). Feature extraction also reduces the dimensionality of the measurement space and thus minimizes image processing time. Techniques recently used to extract features from an image include Principal Component Analysis (PCA) and the wavelet transform. According to En and Swee (2013), the wavelet transform is an effective tool for pattern recognition because it provides time-frequency analysis and decomposition of the image.
Classification approaches fall into two categories: supervised and unsupervised classification (Zhang et al., 2010). According to Kotsiantis et al. (2007) and Sapkal et al. (2007), supervised classification classifies a set of images using pre-given images, references and templates, while unsupervised classification classifies images based on [...]
[...] string of binary numbers. The fitness of each chromosome, which represents its degree of goodness with respect to the solution of the problem, is calculated (Bandyopadhyay and Pal, 2007). Then, several biologically inspired operators such as mutation, crossover and selection are applied to the chromosomes to obtain a better solution, because chromosomes with higher fitness tend to produce potentially better solutions. The crossover and mutation operations are shown in Fig. 2.
[...] classes. Then, the model testing stage is carried out to test the accuracy and reliability of the model. Lastly, optimization using Genetic Algorithm (GA) is applied to reduce the processing time of the classifier. Figure 3 shows the system architecture of this project.

Input datasets:
The input datasets are provided by a large multi-center study, the Osteoarthritis Initiative (OAI). The images are 384×384 pixels and were acquired with the sagittal 3D double echo steady state (SAG 3D DESS) sequence. Because SAG 3D DESS water excitation images provide clear cartilage delineation, they are suitable for morphological measurements such as cartilage thickness and volume (Peterfy et al., 2008). Figure 4 shows images of a normal knee (Fig. 4a) and an osteoarthritic knee (Fig. 4b). As shown in the figure, the cartilage of the normal knee is thicker than that of the osteoarthritic knee, whose cartilage has degenerated.

Discrete Wavelet Transform (DWT)-feature extraction:
The feature extraction method used in this project is the Discrete Wavelet Transform (DWT), which decomposes discrete time signals into several sub-bands at different scales using high pass and low pass filters. When an image is fed into the system, it is scanned horizontally and decomposed using the low pass and high pass filters. The filtered data are then scanned vertically, generating four sub-bands: LL, LH, HL and HH. Among these, the LL sub-band proceeds to the second level of decomposition because it contains most of the signal's energy. Figure 5 shows an example of a DWT image. As shown in the figure, the LL sub-band is the clearest of the four sub-bands.
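The row-then-column filtering described above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes the Haar wavelet (the text does not name the wavelet family used) and a sub-band naming convention in which the first letter refers to the horizontal filter. The helper names `haar_pass`, `haar_dwt2` and `decompose` are hypothetical, and the input is a flat synthetic image rather than OAI data.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_pass(rows):
    """Filter every row with the Haar low pass and high pass filters.

    Returns (low, high); each has half as many columns as `rows`.
    """
    low = [[(r[i] + r[i + 1]) / SQRT2 for i in range(0, len(r), 2)] for r in rows]
    high = [[(r[i] - r[i + 1]) / SQRT2 for i in range(0, len(r), 2)] for r in rows]
    return low, high

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def haar_dwt2(image):
    """One decomposition level: horizontal pass, then vertical pass."""
    low, high = haar_pass(image)            # scan horizontally
    ll, lh = haar_pass(transpose(low))      # scan the low band vertically
    hl, hh = haar_pass(transpose(high))     # scan the high band vertically
    return tuple(transpose(band) for band in (ll, lh, hl, hh))

def decompose(image, levels):
    """Repeatedly decompose the LL sub-band; also return the pixel count per level."""
    band, sizes = image, []
    for _ in range(levels):
        band = haar_dwt2(band)[0]           # keep only LL for the next level
        sizes.append(len(band) * len(band[0]))
    return band, sizes

# Flat synthetic 384x384 image (assumption: stands in for one MR slice).
image = [[1.0] * 384 for _ in range(384)]
ll3, sizes = decompose(image, 3)
feature_vector = [v for row in ll3 for v in row]   # flattened level-3 LL band
```

Each level halves both image dimensions, so three levels turn a 384×384 image into a 48×48 LL band, which flattens to the 2304-element input vector used by the classifier.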
Back propagation neural network: Back propagation is the most commonly used learning algorithm for neural networks. During the training stage, the features extracted by DWT are used as inputs for neural network development. The input propagates through the network, from the input layer to the hidden layer, until it reaches the output layer. The difference between the actual output and the target output is treated as the error, which is propagated back to the earlier layers to update the weights (Jiang et al., 2010). In this project, the neural network is constructed with a single hidden layer with a sigmoid activation function and an output layer for classification. According to the DWT results, third level decomposition images give the most suitable input vector size (2304 pixels) for the classifier. Thus, the neural network is designed with 2304 input nodes, 40 hidden nodes (chosen by trial and error) and 2 output nodes representing the two classes, normal knee and osteoarthritic knee. The training process is summarized as follows (En and Swee, 2013): [...]; apply a sample (an input pattern X k with target output T i ); propagate the signal through the network and compute the actual output O i ; [...] where [...] is the learning rate of the network; repeat steps 2-6 until the difference is sufficiently small (Fig. 6).
Genetic algorithm: As mentioned, the neural network is constructed with 2304 input nodes, 40 hidden nodes, 2 output nodes and an associated set of weights. However, not all of the weights obtained are important for image classification, and the unimportant weights may increase the classification processing time. Thus, optimization using Genetic Algorithm (GA) is suggested in order to reduce the processing time of the classifier. In genetic algorithm theory, the nodes of the hidden layer can be represented by a chromosome (a string of binary values). The population pool of the genetic algorithm is therefore formed from the possible selections of nodes. The fitness of every chromosome is calculated and two chromosomes are selected as parents, which then undergo genetic operations such as crossover and mutation. The genetic operations stop when the stopping criterion is met, namely a classification accuracy of more than 94%. Once the appropriate hidden nodes are selected, the unimportant nodes are eliminated to optimize the classification processing time.
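The back propagation training loop described in this section (forward pass, error at the output, weight updates propagated back to the hidden layer) can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the squared-error delta rule, the weight initialization range, the toy two-feature data set and all function names are assumptions made for illustration, and the bias weight plays the role of the node threshold.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_network(n_in, n_hidden, n_out, rng):
    """Small random weights; the last entry of each row is the bias weight."""
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    return w_h, w_o

def forward(net, x):
    """Propagate input list x through the hidden layer to the output layer."""
    w_h, w_o = net
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_h]
    output = [sigmoid(sum(w * v for w, v in zip(row, hidden + [1.0]))) for row in w_o]
    return hidden, output

def train_step(net, x, target, rate):
    """One back propagation update for one sample; returns its squared error."""
    w_h, w_o = net
    hidden, output = forward(net, x)
    # Output deltas: error times the sigmoid derivative o * (1 - o).
    d_out = [(t - o) * o * (1 - o) for t, o in zip(target, output)]
    # Hidden deltas: output deltas sent back through the (not yet updated) weights.
    d_hid = [h * (1 - h) * sum(d * w_o[k][j] for k, d in enumerate(d_out))
             for j, h in enumerate(hidden)]
    for k, d in enumerate(d_out):
        for j, v in enumerate(hidden + [1.0]):
            w_o[k][j] += rate * d * v
    for j, d in enumerate(d_hid):
        for i, v in enumerate(x + [1.0]):
            w_h[j][i] += rate * d * v
    return sum((t - o) ** 2 for t, o in zip(target, output))

def train(net, samples, rate, epochs):
    """Repeat the update over all samples; returns the total error per epoch."""
    return [sum(train_step(net, x, t, rate) for x, t in samples)
            for _ in range(epochs)]

# Toy data (assumption): two features, two classes encoded on two output nodes.
samples = [([0.0, 0.1], [1.0, 0.0]), ([0.1, 0.0], [1.0, 0.0]),
           ([1.0, 0.9], [0.0, 1.0]), ([0.9, 1.0], [0.0, 1.0])]
net = init_network(2, 3, 2, random.Random(0))
history = train(net, samples, rate=0.5, epochs=2000)
```

With the layer sizes given in the text, the same sketch would be created as `init_network(2304, 40, 2, rng)` and trained with a learning rate of 0.01; the toy sizes and the larger rate are used here only to keep the example fast.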

RESULTS AND DISCUSSION
Level of wavelet decomposition: As mentioned, the knee MR images provided by the Osteoarthritis Initiative (OAI) are 384×384 pixels (147456 pixels). The Discrete Wavelet Transform (DWT) is therefore used to reduce the image size in order to minimize computational time. The relationship between decomposition level and classification accuracy is shown in Table 1.
The classification accuracy was tested with a neural network with 40 hidden nodes, a learning rate of 0.01 and a threshold value of 0.9. The neural network inputs at decomposition levels 1 to 4 are 36864, 9216, 2304 and 576 pixels, respectively. In the experiments, level 1 and level 2 decomposition achieved the highest accuracy (100%), followed by level 3 (98.5%) and then level 4 (61.5%). This indicates that level 4 decomposition loses most of the information, which subsequently degrades classification accuracy. Among the first three levels, level 3 is selected as the input vector because of its high accuracy and small image size, which reduces computational time.
Based on the results in Table 2, classification accuracy does not increase steadily as the number of hidden nodes increases. However, a smaller network has the limitation of being easily trapped in a local minimum. The learning rate of the network therefore plays an important role in generalization accuracy (Kavzoglu, 1999). As shown in Table 2, classification accuracy increased as the learning rate decreased. Although a smaller learning rate prolongs the classification process, this is worthwhile to obtain higher generalization accuracy. Thus, the classifier with 40 hidden nodes and a learning rate of 0.01 is selected.

Optimization by Genetic Algorithm (GA):
As mentioned above, the classifier with 40 hidden nodes and a learning rate of 0.01 is selected for classification. However, not all of the nodes are important for classification. Therefore, optimization by Genetic Algorithm (GA) is conducted to reduce the computational time of classification. Table 3 shows the relationship between classification accuracy, hidden nodes and computational time after optimization. The percentage reduction in computational time is calculated as (time before optimization - time after optimization) / time before optimization × 100% = 17.24%. Initially, there are 40 hidden nodes in the constructed network. However, only 29 important hidden nodes are selected and used to construct the new network; the other, unimportant nodes are removed, which yields the 17.24% reduction in computational time. Although the difference in computational time is not significant for 150 MR images, the improvement will be significant for a large number of knee MR images.
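The hidden-node selection can be sketched as a small genetic algorithm over 40-bit chromosomes, where bit j set to 1 means hidden node j is kept. The sketch below is an illustration under stated assumptions, not the authors' procedure: truncation selection, elitism, single-point crossover, the population size and the mutation rate are all choices made here, and `toy_fitness` is a hypothetical stand-in for the accuracy of the network with only the selected nodes active.

```python
import random

def crossover(a, b, rng):
    """Single-point crossover of two equal-length bit strings."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chrom, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chrom]

def evolve(fitness, n_bits, rng, target, pop_size=20,
           mutation_rate=0.02, max_generations=2000):
    """Evolve bit strings until the best fitness exceeds `target`."""
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(max_generations):
        if fitness(best) > target:            # stopping criterion
            break
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]      # truncation selection (assumption)
        children = [best]                     # elitism: never lose the best so far
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            for child in crossover(p1, p2, rng):
                children.append(mutate(child, mutation_rate, rng))
        population = children[:pop_size]
        best = max(population, key=fitness)
    return best

# Hypothetical ground truth used only by the toy fitness: 29 useful nodes, 11 not.
IMPORTANT = [1] * 29 + [0] * 11

def toy_fitness(chrom):
    """Fraction of bits that agree with IMPORTANT (stand-in for accuracy)."""
    return sum(c == k for c, k in zip(chrom, IMPORTANT)) / len(IMPORTANT)

best = evolve(toy_fitness, 40, random.Random(0), target=0.94)
kept_nodes = [j for j, bit in enumerate(best) if bit == 1]
```

In the setting of this study, the fitness function would instead evaluate the classification accuracy of the trained network with the hidden nodes masked by the chromosome, and the nodes whose bits are 0 in the final chromosome would be removed from the network.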

CONCLUSION
Based on this study, the third level of decomposition by Discrete Wavelet Transform (DWT) extracts and reduces the features of the MR images efficiently, without significant loss of information. The network constructed with 29 hidden nodes, a learning rate of 0.01, a threshold of 0.9 and input images at the third level of DWT decomposition yields a classification accuracy of 94.67%. This GA optimized ANN-based classifier can be used as a computer-aided tool to assist physicians in knee osteoarthritis diagnosis.

Fig. 5: Example of DWT image

Fig. 6: Back propagation neural network structure

Table 1: Classification accuracy of the classifier at each DWT decomposition level

Table 3: Relationship between classification accuracy, hidden nodes and computational time