EFFECTIVE ANALYSIS OF BRAIN TUMOR USING HYBRID DATA MINING TECHNIQUES

Abstract: In recent years there has been rapid growth in the availability of medical databases and medical imagery, and the uncertainty involved in effectively predicting diseases from these databases has led the research community to take up the challenges in this domain. The human central nervous system is composed mainly of the brain, the spinal cord and neurons, and the largest part of the brain is the cerebrum. The human brain poses many challenges to the research community. A brain tumour, or intracranial neoplasm, is formed by the irregular growth of cells in the brain. Such irregular growth can damage the frontal, temporal and parietal lobes, resulting in abnormal behaviour. Machine learning algorithms give computers the ability to learn without being explicitly programmed. A clinical brain data set was analysed using machine learning algorithms, and conclusions were drawn from the results. In this paper we apply hybrid data mining methods, consisting of clustering, classification and association techniques, and further analyse the results using statistical techniques.

There has been rapid growth in the availability of medical databases and medical imagery in the past few years. The uncertainty involved in effectively predicting diseases from these databases has led the research community to take up the challenges in this domain. A brain tumor [1] is caused by the uncontrolled multiplication of cells in the brain. There are two types of brain tumor: primary and secondary. Primary tumors originate in the brain itself, whereas secondary tumors spread to the brain from some other part of the body by metastasis and are malignant. Primary brain tumors typically do not spread to the rest of the body; they may be benign or malignant. Both types pose a severe threat to life. Because space inside the skull is limited, a growing tumor raises the intracranial pressure, which leads to reduced blood flow, edema and displacement of brain tissue. Eventually, the healthy tissues that control important functions of the body degenerate. Brain tumors are among the leading sources of cancer and contribute to the rising death rate in children and young adults. The location of a tumor in the brain determines which symptoms it produces, and each symptom can be related to the individual's functioning. A brain tumor affects not only the central nervous system (CNS) and the spinal cord but many other vital parts of the body.
There are two possible ways of predicting or analyzing brain tumors: applying image processing techniques together with classification to MRI images, or applying data mining techniques to attribute-based data sets. Here, data mining techniques were applied for effective analysis and to arrive at the desired results. Popular data mining techniques include ARM (Association Rule Mining), k-means clustering and decision trees. Learning methods fall into three categories: supervised, unsupervised and reinforcement learning. Supervised learning can be seen as learning from a trainer that provides feedback on a new task; this feedback takes the form of a labeled training set. Predictive modeling analyzes an existing database to determine the essential characteristics of the data. Developing a model usually involves two phases, training and testing. These phases are part of classification and make it possible to derive useful predictions.
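The training and testing phases can be illustrated with a minimal holdout sketch. It assumes a data set of (features, label) pairs and uses a deliberately trivial majority-class model as a stand-in for a real classifier; the 66% training fraction, the seed, the toy data and the function names are illustrative choices, not taken from this study.

```python
import random
from collections import Counter


def holdout_split(rows, train_fraction=0.66, seed=1):
    """Shuffle a copy of the rows and split it into a training set and a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def train_majority(train):
    """Training phase: 'learn' the most frequent class label (a trivial baseline model)."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def accuracy(predicted_label, test):
    """Testing phase: fraction of held-out rows whose true label matches the prediction."""
    if not test:
        return 0.0
    return sum(label == predicted_label for _, label in test) / len(test)


if __name__ == "__main__":
    # Hypothetical toy data: (feature vector, class label) pairs.
    data = [((i,), "benign" if i % 3 else "malignant") for i in range(30)]
    train, test = holdout_split(data)
    model = train_majority(train)
    print(len(train), len(test), model, accuracy(model, test))
```

Keeping the test rows out of training is what makes the measured accuracy an estimate of performance on unseen data.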

Algorithm C4.5 (J48):
Quinlan proposed this classification algorithm in 1993. It generates a decision tree by recursively partitioning the given data set, building the tree internally in depth-first order. At each node, the algorithm splits the data set on the test that maximizes the information gain (C4.5 normalizes this as the gain ratio). For each discrete attribute, one test is considered, with as many outcomes as the attribute has distinct values. For each continuous attribute, binary tests of the form A ≤ t are considered, so that every distinct value of the attribute is covered as a candidate threshold. The entropy gain of all these binary tests is then computed efficiently, and this procedure is repeated for every continuous attribute.
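The split criterion can be sketched in Python as follows. This is a minimal illustration of information gain for a discrete (multi-way) test and of the best binary threshold on a continuous attribute, assuming class labels are given as a list. It is not the full C4.5/J48 algorithm: C4.5 ranks tests by the gain ratio and also handles missing values and pruning, none of which is shown here, and the function names are hypothetical. This sketch places the threshold at the lower of two adjacent distinct values; real implementations may choose it differently.

```python
import math
from collections import Counter


def entropy_from_counts(counts):
    """Shannon entropy in bits of a class distribution given as {label: count}."""
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c > 0)


def entropy(labels):
    return entropy_from_counts(Counter(labels))


def gain_discrete(values, labels):
    """Information gain of a multi-way test with one outcome per distinct value."""
    branches = {}
    for v, y in zip(values, labels):
        branches.setdefault(v, []).append(y)
    n = len(labels)
    remainder = sum(len(b) / n * entropy(b) for b in branches.values())
    return entropy(labels) - remainder


def best_threshold(values, labels):
    """Best binary test 'A <= t' for a continuous attribute A.

    The values are sorted once and class counts are moved from the right
    partition to the left one step at a time, so every candidate threshold
    is scored in a single pass over the sorted data.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    left, right = Counter(), Counter(y for _, y in pairs)
    base = entropy(labels)
    best_t, best_gain = None, 0.0
    for i in range(n - 1):
        v, y = pairs[i]
        left[y] += 1
        right[y] -= 1
        if v == pairs[i + 1][0]:
            continue  # a threshold must fall between two distinct values
        n_left = i + 1
        gain = (base
                - n_left / n * entropy_from_counts(left)
                - (n - n_left) / n * entropy_from_counts(right))
        if gain > best_gain:
            best_t, best_gain = v, gain
    return best_t, best_gain


print(best_threshold([1, 2, 3, 4], ["a", "a", "b", "b"]))  # → (2, 1.0)
```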

Clustering:
This method supports decision making by grouping similar instances, which helps in understanding the structure of the data. The main difficulty is determining the number of clusters, which is not known in advance. Clustering is one of the best-known examples of unsupervised learning. Its major goal is to group similar data points using a distance metric, the most commonly used being Euclidean distance. One of the most popular clustering techniques is k-means. It is popular because it avoids some drawbacks of other clustering techniques: its results are easy to interpret and it is computationally efficient. The drawbacks of k-means are the random choice of centroid locations at the beginning of the algorithm, the treatment of all variables as numeric, and the unknown number of clusters K. The first drawback can be mitigated by running the algorithm several times or by using a good initialization method. The matching dissimilarity measure handles categorical data, eliminating the second drawback. The third drawback can be addressed with a cluster validity index, which helps decide how many clusters to use.
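A minimal Python sketch of k-means with Euclidean distance follows, together with two of the remedies mentioned above that fit in a few lines: several random restarts, keeping the run with the lowest within-cluster sum of squared errors (SSE), and a matching dissimilarity for categorical records. Choosing K with a cluster validity index is not shown. The function names, the number of restarts and the convergence rule are assumptions made for illustration.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """One run of Lloyd's k-means with k randomly chosen points as initial centroids.

    Returns (centroids, labels, sse), where sse is the within-cluster sum of
    squared Euclidean distances.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its member points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, sse


def kmeans_restarts(points, k, restarts=10):
    """Reduce sensitivity to the random initial centroids: keep the lowest-SSE run."""
    return min((kmeans(points, k, seed=s) for s in range(restarts)),
               key=lambda result: result[2])


def matching_dissimilarity(a, b):
    """Dissimilarity for categorical records: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, labels, sse = kmeans_restarts(pts, 2)
    print(labels, round(sse, 3))
```

Replacing the Euclidean distance with the matching dissimilarity (and means with per-attribute modes) is what lets the same scheme work on purely categorical data.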

ARM (Association Rule Mining):
Association rule mining discovers items that are strongly related across transactions. It finds relationships between item sets based on their co-occurrence in the transactions; in particular, it discovers frequent patterns among the available item sets. For example, it can reveal which items are most frequently purchased together in a shopping center. Assume that there are m items in the database, forming the set of items I1 = {i1, i2, ..., im}. A transaction T1 is a set of items such that T1 ⊆ I1, and the set of all transactions is D1. A transaction T1 contains X1 if X1 ⊆ T1 (with X1 ⊆ I1). An association rule is an implication of the form X1 → Y1, where X1 ⊂ I1, Y1 ⊂ I1 and X1 ∩ Y1 = Ø. Two parameters are calculated for any such rule: confidence and support. A rule R1 = X1 → Y1 holds with confidence c if c% of the transactions in D1 that contain X1 also contain Y1 (i.e., c = P(Y1 | X1)). The rule R1 holds with support s if s% of the transactions in D1 contain both X1 and Y1 (i.e., s = P(X1 ∪ Y1)).
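Both measures can be computed directly from their definitions. The sketch below assumes transactions are Python sets of items, mirrors the names X1, Y1 and D1 used above, and uses a hypothetical shopping-basket example; it illustrates the formulas and is not code from this study.

```python
def support(itemset, transactions):
    """s: fraction of transactions in D1 that contain every item of the itemset."""
    items = set(itemset)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """c = P(Y1 | X1) = support(X1 ∪ Y1) / support(X1) for the rule X1 -> Y1."""
    x, y = set(x), set(y)
    if x & y:
        raise ValueError("antecedent X1 and consequent Y1 must be disjoint")
    sx = support(x, transactions)
    return support(x | y, transactions) / sx if sx else 0.0


# Hypothetical shopping-basket transactions (the set D1).
D1 = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(support({"bread", "milk"}, D1))       # → 0.5
print(confidence({"bread"}, {"milk"}, D1))  # 0.5 / 0.75, about 0.667
```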

II. RELATED WORK
V.P. Gladis Pushpa Rathi et al. [1] stated that "feature extraction is a technique of capturing the visual content of an image in raw form and storing it in reduced form to facilitate decision making; this can be treated as pattern classification. This technique combines features such as shape, texture and intensity and classifies the tumor as gray matter, white matter, CSF, normal and abnormal area. The SVM classification technique serves as a measure for comparing linear versus non-linear techniques. LDA and PCA are used to reduce the number of features selected". Roopali R. Laddha et al. [2] stated that "exact detection of the location and size of a brain tumor plays a vital role in its diagnosis. An effective algorithm was proposed for brain tumor detection based on morphological and segmentation operators. The quality of the image is first enhanced and then morphological operators are applied to detect the tumor. Redundant and complementary information from MRI and CT images is captured using a wavelet-based technique". Varsha Kshirsagar et al. [3] stated that "simple algorithms are used to detect the shape and extent of the tumor in brain images, and computer-aided methods such as segmentation can be used for detection. Segmentation techniques improve the accuracy of tumor detection and reduce the time needed to analyze the images. With an MRI image as input, the outcome of the process is the detection of the brain tumor along with its size and position. The total area of the tumor is derived from a specified cluster stage, and the intensity of the tumor is defined". Punithavathy Mohan et al. [4] stated that "this work deals with image enhancement, segmentation, segment extraction and classification of brain MRI. The proposed algorithm combines CLAHE, k-means and ACO (Ant Colony Optimization). ACO is used to segment the image and k-means is used to classify tissues in the brain MRI as normal or abnormal, with very low time complexity and a good level of accuracy". Jay Patel et al. [5] stated that "various clustering techniques such as Fuzzy C-means and k-means clustering, used with segmentation methods such as thresholding, region growing and mean shift in MRI, are reviewed". Alan Jose et al. [6] stated that "segmentation carried out using fuzzy c-means and k-means clustering results in detection of the brain tumor and its position. Fuzzy c-means performs much better than the other segmentation techniques. This process also helps in understanding the tumor stage, its severity, and whether it is curable or not". Shubhangi S. Veer et al. [7] stated that "global thresholding and watershed segmentation are useful techniques for segmenting MRI or CT scan images. The global threshold technique separates the diseased region using a single threshold value, deriving a binary image from the gray-scale image. The watershed segmentation technique also successfully separates the tumor region from the brain MRI. From these segmented images it is possible to obtain detailed information about the tumor location and shape, and the area of the extracted tumor region is measured as the number of white pixels in the segmented image". Roshan G. Selkar et al. [8] stated that "brain tumor identification helps in discovering the exact shape, size, location and boundary of the tumor. The process has three stages to identify and then segment the brain tumor. First, the quality of the scanned image is improved; then morphological operators are applied to identify the tumor in the scanned image. After that, an edge detection operator is applied for boundary extraction and to find the size of the tumor". Gauri P. Anandgaonkar et al. [9] stated that "different segmentation methods can be used to locate the tumor in MR images. A method has been proposed that does so using the Fuzzy C-Means algorithm and then calculates the area of the tumor, which helps in deciding on the type of brain tumor". Rahil Malhotra et al. [10] stated that "the brain tumor area in MR images was segmented and then subjected to image quantization. This helps in applying the clustering process to understand the various areas affected by the brain tumor, and finally the ROI method isolates the segregated part of the brain tumor".
A.K. Mohantly et al. [11][14] applied various techniques for identifying cancerous tissues. Image mining is one technique through which hidden information can be mined even when the image is not clear; when image mining is applied in the testing phase, space partitioning is used to categorize the image. S.W. Purnami et al. [12] applied the data mining technique "Multiple Knot Spline Smooth Support Vector Machine" to medical diagnosis problems, mainly diabetes and heart disease. Shweta Kharya [13] discussed different data mining techniques used for breast cancer classification. N.H. Rajini et al. [15][18] proposed an approach with two main phases, feature extraction and classification. In the first phase, features are extracted from magnetic resonance images using the discrete wavelet transform; in the second phase, a feed-forward back-propagation artificial neural network and a k-nearest neighbor classifier are used for automated diagnosis. B.G. Prasad et al. [16] described various data mining classifiers for medical image classification, applying the J48 decision tree and random forest to classify CT scan brain images; the proposed system is based mainly on the texture information of the images. V.S. Tseng et al. [17] proposed image classification methods that apply association rules to image objects. The method has two phases: the first constructs an object hierarchy, and the second applies multi-level mining algorithms to identify image classification rules. A.H. Gondal et al. [19] evaluated various techniques, highlighting new perspectives on brain tumor detection from magnetic resonance images.

III. PROPOSED SYSTEM
In the proposed system, the raw data sets are first refined through various pre-processing methods, which also address noise and missing values. Various classification techniques were then applied and tested, and their results were summarized and analyzed for decision making. It was found that the results improved when the k-means clustering method was applied first and the resulting clustered data set was used as input to the J48 decision tree classifier. The data set derived from this step was then subjected to the Apriori association algorithm, and the results were analyzed further for decision making. This hybrid data mining technique produced good results.
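One possible reading of the first stages of this pipeline is sketched below. The paper does not state how noise and missing values were handled or how the cluster assignments were passed to J48, so this sketch assumes mean imputation for missing numeric values and appends each row's k-means cluster id as a class attribute. J48 and Apriori would then be run on the resulting data set and are not reimplemented here. All function names and the toy records are hypothetical.

```python
import math


def impute_means(rows):
    """Pre-processing: replace missing values (None) with the mean of their column.

    A column with no observed values is filled with 0.0 (an arbitrary choice).
    """
    means = []
    for col in zip(*rows):
        present = [v for v in col if v is not None]
        means.append(sum(present) / len(present) if present else 0.0)
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]


def assign_clusters(rows, k, iters=50):
    """Compact k-means using the first k rows as initial centroids; returns a cluster id per row."""
    centroids = [list(r) for r in rows[:k]]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(r, centroids[j])) for r in rows]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def hybrid_prepare(rows, k):
    """Clean the data, cluster it, and append each row's cluster id as a class attribute."""
    clean = impute_means(rows)
    labels = assign_clusters(clean, k)
    return [row + [f"cluster{lab}"] for row, lab in zip(clean, labels)]


# Hypothetical numeric records with one missing value.
records = [[0.0, 0.0], [10.0, 10.0], [0.5, None], [10.0, 10.5]]
for row in hybrid_prepare(records, k=2):
    print(row)
```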

Fig. 2.
The cluster-based classifier data set was further subjected to the Apriori association algorithm, and the 20 best association rules were generated, as shown in Fig.3. The association rules show that the mortality rate of persons depends on the mortality rates of males and females. Fig.4 shows a 3-D visualization of the association rules, displaying their antecedents and consequents.
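The rule-generation step can be sketched with a compact Apriori implementation. It assumes each record has already been discretized into items such as "male=high" (Apriori works on nominal items, so numeric mortality rates would need binning first) and ranks rules by confidence, with support as a tie-breaker. The paper does not report its support or confidence thresholds, so the values below are arbitrary; n_best=20 mirrors the 20 rules reported above. The item names and thresholds are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(transactions)
    tsets = [frozenset(t) for t in transactions]

    def sup(candidate):
        return sum(candidate <= t for t in tsets) / n

    level = {frozenset([i]) for t in tsets for i in t}
    frequent = {}
    k = 1
    while level:
        current = {c: s for c in level if (s := sup(c)) >= min_support}
        frequent.update(current)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        level = {c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent


def top_rules(frequent, min_confidence, n_best=20):
    """Rules X -> Y from frequent itemsets as (X, Y, support, confidence), best first."""
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = s / frequent[lhs]  # every subset of a frequent itemset is frequent
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, s, conf))
    rules.sort(key=lambda rule: (rule[3], rule[2]), reverse=True)
    return rules[:n_best]


# Hypothetical discretized records, one item per "attribute=level".
T = [{"male=high", "female=high", "persons=high"},
     {"male=high", "persons=high"},
     {"female=high", "persons=high"},
     {"male=high", "female=high", "persons=high"},
     {"male=low", "female=low", "persons=low"}]
for lhs, rhs, s, c in top_rules(apriori(T, min_support=0.4), min_confidence=0.7):
    print(sorted(lhs), "->", sorted(rhs), f"support={s:.2f} confidence={c:.2f}")
```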

Table-1 depicts the results of 7 classifiers, namely Nearest Neighbour with Generalization (NNge), Best-First Decision Tree (BF Tree), CDT (a decision tree learner based on imprecise probabilities and uncertainty measures), Hoeffding Tree (VFDT), J48 (a class for generating a pruned or unpruned C4.5 decision tree), LAD Tree and Random Forest. The results are given in Table-1 and Table-2. Table-1 reports the correctly classified instances, MAE (Mean Absolute Error), Kappa statistic, RMSE (Root Mean Squared Error), RRSE (Root Relative Squared Error), RAE (Relative Absolute Error), the time taken to build the model and the time taken to test the model. It is observed from Table-1 that two classifiers, Random Forest and NNge, performed best, followed by LAD Tree. Table-2 shows the performance of the 7 classifiers in terms of FP Rate, TP Rate, Recall, Precision, ROC Area, F-Measure and accuracy. Table-2 likewise shows that Random Forest and NNge performed best, followed by LAD Tree. To improve the results of classifiers such as J48, the k-means clustering method was applied first, and the resulting clustered data set was then given to the J48 classifier. The clustering results are shown in Table-3. The resulting clusters are Meninges and other CNS (Cluster 0), Brain (Cluster 1) and Brain and the Nervous System (Cluster 2).
Cluster 0: The age-group centroid lies in the 0-4 range. Most of the instances were assigned to this cluster. Table-3 reveals that the male mortality rate is higher than the female mortality rate.
Cluster 1: The age-group centroid lies in the 40-44 range. Only a small share of the instances (16%) were assigned to this cluster. Table-3 reveals that the male mortality rate is higher than the female mortality rate.
Cluster 2: The age-group centroid lies in the 40-44 range. Only a small share of the instances (16%) were assigned to this cluster. Table-3 reveals that the male mortality rate is higher than the female mortality rate.
The tumor types Brain and Brain and Nervous System behave similarly, without much deviation. The clustered data sets are shown in Fig.1. The accuracy of the J48 classifier improved from 67% before clustering to 100% after clustering. This illustrates the advantage of hybrid data mining, whose performance is shown in Table-1 and Table-2. The J48 decision tree obtained on the clustered data set is shown in Fig.2. The classification of cancer type depends mainly on the Mortality of Persons Number attribute, as is observed in