NOVEL FEATURE SET FOR AUTOMATIC ASSESSMENT AND CLASSIFICATION OF BREAST TUMOR THROUGH BACK PROPAGATION ARTIFICIAL NEURAL NETWORK

Breast cancer is a deadly disease that has had a high mortality rate for many years. According to the WHO, it is the second leading disease in the world and the fourth in India. The conventional techniques for classifying breast cancer are unsupervised, error-prone, and laborious, and they demand the presence of a clinician. They have also been evaluated only on small datasets, and the accuracy of previous classifiers was unsatisfactory. To overcome these problems, we experimented on a large dataset and extracted several features, such as area, convex area, bounding box, eccentricity, orientation, solidity, perimeter, and contour-based fractal dimension (CBFD). This feature set describes the size and geometric shape of the tumor. Enlarging the feature set increases the classification accuracy. The automatic classification is based on a multilayer back propagation artificial neural network (ANN). The boundary of a breast tumor carries an important clue, so its analysis plays a vital role in identifying the disease. The dataset of around 1700 samples is split into training and testing data using the 80-20 rule and evaluated with different neural network architectures. A classification accuracy of 98.11% was achieved. Successful classification depends on the quality of the enhanced mammograms, the localization of the tumor, and accurate segmentation. Image samples from MIAS, DDSM, and local hospitals were used in the experiment.

S.V. SHWETHA, L. DHARMANNA, BASAVARAJ S. ANAMI


INTRODUCTION
According to WHO statistics, breast cancer accounts for 38% of overall cancer mortality. Because diagnosis depends on oncologists, who are few in number, the mortality rate is high.
The conventional system could be upgraded with an automatic method of classification using neural networks. Artificial neural network learning provides an approach for learning real-valued functions over discrete and continuous (analog) attributes in a way that is robust to noise in the training samples. Back propagation is the most widely used network learning technique and has been successfully applied to classifying breast cancer medical images and to several other learning tasks, such as automatic vehicle driving, handwriting recognition, and robot control.
As illustrated in Figure 1, the hypothesis space considered by the back propagation technique is the space of all functions that can be represented by assigning weights to the given fixed network of interconnected units.
The general architecture of an ANN has three layers (input, hidden, and output) and two bias units, one for the hidden layer and one for the output layer, as shown in Figure 1. A network of sufficient size can represent a rich space of nonlinear functions, which makes a feed-forward network a good choice for learning continuous and discrete functions whose general form is not known in advance.
The back propagation method searches the space of possible hypotheses using gradient descent to iteratively reduce the error of the network's fit to the training samples. Gradient descent converges to a set of weights that is a local minimum of the training error with respect to the network weights.
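The training procedure described above can be illustrated with a minimal NumPy sketch of a three-layer network (input, hidden, output, with one bias vector for the hidden layer and one for the output layer) trained by batch gradient descent on the squared error. The function name `train_backprop`, the sigmoid activation, the layer sizes, the learning rate, and the toy OR data are illustrative assumptions and not the configuration used in this work.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def train_backprop(X, y, n_hidden=4, lr=0.5, epochs=2000, seed=0):
    """Train a three-layer sigmoid network with batch gradient descent.

    X has shape (n_samples, n_features); y has shape (n_samples, 1).
    Returns the learned parameters and the mean squared error per epoch.
    All hyperparameters here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, size=(X.shape[1], n_hidden))  # input -> hidden
    b1 = np.zeros(n_hidden)                                 # hidden-layer bias
    W2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))           # hidden -> output
    b2 = np.zeros(1)                                        # output-layer bias
    history = []
    for _ in range(epochs):
        # Forward pass through the fixed network.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        err = out - y
        history.append(float(np.mean(err ** 2)))
        # Backward pass: error terms for 0.5 * squared error with sigmoid units.
        delta_out = err * out * (1.0 - out)
        delta_h = (delta_out @ W2.T) * h * (1.0 - h)
        # Gradient descent step on every weight and bias.
        W2 -= lr * (h.T @ delta_out) / len(X)
        b2 -= lr * delta_out.mean(axis=0)
        W1 -= lr * (X.T @ delta_h) / len(X)
        b1 -= lr * delta_h.mean(axis=0)
    return (W1, b1, W2, b2), history

# Toy usage on the logical OR function (not tumor data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [1.0]])
params, history = train_backprop(X, y)
```

The `history` list makes it possible to check that the training error decreases over the epochs. Because gradient descent only finds a local minimum, a different seed or learning rate may end at a different set of weights.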

LITERATURE REVIEW
Denise Gulati presented a fuzzy region-growing segmentation [15] in which the growth starts from a seed pixel. The method uses the membership function of fuzzy C-means as a statistical measure of the region being grown. Information about the nature of the tumor, benign or malignant, lies in the contour of the obtained region; hence the study of the contour plays an important role in the diagnosis of breast cancer.
However, the author did not explore automatic classification through any classifier, so the work is not sufficient for a radiologist who needs automatic assessment during screening.
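The idea of growing a region from a seed pixel, as reviewed above, can be sketched as follows. This is a plain (crisp) region-growing example that accepts a neighbouring pixel when its intensity is close to the running mean of the region; it is not the fuzzy membership criterion of [15], and it is not necessarily the region growing and merging algorithm used in this work. The function name `region_grow`, the tolerance `tol`, and the 4-connectivity are assumptions made for illustration.

```python
from collections import deque

import numpy as np

def region_grow(image, seed, tol=10.0):
    """Grow a 4-connected region from `seed` (row, col).

    A neighbour joins the region when its intensity is within `tol`
    of the current mean intensity of the region (illustrative rule).
    Returns a boolean mask of the grown region.
    """
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    mask = np.zeros((rows, cols), dtype=bool)
    mask[seed] = True
    total, count = img[seed], 1          # running sum and size of the region
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and not mask[nr, nc]:
                if abs(img[nr, nc] - total / count) <= tol:
                    mask[nr, nc] = True
                    total += img[nr, nc]
                    count += 1
                    queue.append((nr, nc))
    return mask

# Toy usage: a bright 3x3 square on a dark background.
image = np.zeros((8, 8))
image[2:5, 3:6] = 200.0
mask = region_grow(image, seed=(3, 4))
print(int(mask.sum()))  # 9 pixels, the bright square
```

The contour of the tumor can then be taken as the boundary of the returned mask.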
Chaitanya Varma [16] demonstrated an alternative approach to detecting breast cancer using digital image processing techniques. The author proposed texture-based segmentation to detect early-phase tumors, which does not involve human error. However, the author did not attempt texture-based segmentation on various types of images and did not attempt several region-based features for the diagnosis of the disease. Moreover, the author did not involve an automatic classifier.
Aqhsa Q [2] presented work on the detection of tumors in MRI images using artificial neural networks. In this work the author reported high accuracy, lower delay, and automatic detection of brain tumors through an artificial neural network. The author also presented the diagnosis of brain tumors using statistical features such as mean, median, variance, and correlation. The main objective of that work is to classify brain tumor cells as either benign or malignant. The author did not work on breast tumor images.
Amara Nedra et al. [5] focused on the detection and classification of breast abnormalities in mammograms via a linear support vector machine (SVM). In this technique the work is organized into three stages: 1) optimal k value selection for k-means segmentation of the breast mass; 2) robust features based on SURF interest points of the estimated region of interest; 3) classification using a linear support vector machine into the two classes, benign and malignant.

METHODOLOGY
This section describes the workflow in seven phases, including input, pre-processing, RoI extraction, feature determination, classification, and diagnosis. In the first phase, an image database was developed using state-of-the-art techniques, drawing on various databases such as DDSM, MIAS, and Kaggle, together with images collected from local hospitals. In the pre-processing phase, noise was removed using the Modified Gabor algorithm [12]. In phase 2, the mammogram was enhanced with a combination of ADT and a Gabor filter.
The mathematical model and algorithm were presented in [12]. In phase 3, the region of interest is extracted using region growing and merging and the other ROI segmentation technique [13].

The average major axis length is 2.29 cm for benign tumors and 2.6 cm for malignant tumors. The minor axis length is the pixel distance between the minor-axis endpoints and is given by the relation [equation not recovered].

The performance of the various architectures

Figure 6 plots accuracy against the different network architectures. From the graph it can be inferred that accuracy improves as the number of hidden neurons increases and that performance saturates at 45 hidden neurons and above.
The dataset of X-ray breast tumor mammograms used in this research was downloaded from the DDSM online database and supplemented with mammograms from local hospitals. All of these were considered for the experimental work. The database contains 1700 images from several patients, of which 1200 are malignant and 500 are benign tumors. The diagnosis for every image in the dataset was confirmed through a biopsy test.
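For the class counts given above, an 80-20 split can be sketched as follows. The sketch stratifies by class so that both subsets keep the 1200:500 malignant-to-benign ratio; whether the split in this work was stratified is not stated, so that choice, the function name `stratified_split`, the label coding (1 = malignant, 0 = benign), and the random seed are all assumptions.

```python
import numpy as np

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with `test_fraction` of each class held out."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then cut off the test share.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = int(round(test_fraction * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

# Hypothetical label vector with the class counts from the text.
labels = np.array([1] * 1200 + [0] * 500)
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 1360 340
```

With these counts the test set holds 240 malignant and 100 benign samples, and the remaining 1360 samples are used for training.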
The experiments were carried out in the MATLAB software package [2013R], with classification performed by a back propagation neural network. It can be inferred that CBFD is the best feature in terms of test performance and offers an easy way to classify tumors in X-ray and ultrasound images.
Cancerous tumors are those that have proven dangerous to the patient and hence require immediate attention. To evaluate the performance of the experiment, the sensitivity and specificity of the diagnosis were considered. These two statistical metrics indicate how well the features identify the presence and the absence of a cancerous tumor, respectively.
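Sensitivity is the fraction of malignant cases that are correctly identified, TP / (TP + FN), and specificity is the fraction of benign cases that are correctly identified, TN / (TN + FP). A minimal sketch of both computations is given below; the function name `sensitivity_specificity`, the label coding (1 = malignant, 0 = benign), and the toy labels are assumptions for illustration and are not results from this work.

```python
import numpy as np

def sensitivity_specificity(y_true, y_pred, positive=1):
    """Compute (sensitivity, specificity) from true and predicted labels.

    Assumes both classes occur in `y_true`; otherwise a denominator is zero.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))  # malignant found
    fn = np.sum((y_true == positive) & (y_pred != positive))  # malignant missed
    tn = np.sum((y_true != positive) & (y_pred != positive))  # benign confirmed
    fp = np.sum((y_true != positive) & (y_pred == positive))  # benign flagged
    return float(tp / (tp + fn)), float(tn / (tn + fp))

# Toy labels only: 4 malignant and 2 benign cases.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
print(sensitivity_specificity(y_true, y_pred))  # (0.75, 0.5)
```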

CONCLUSION
In this work we have addressed the problem of automatic classification of malignant and benign tumors using a back propagation artificial neural network. Beforehand, the input image is enhanced using a modified Gabor filter [12], which improves the visibility of the tumor in ultrasound images and mammograms. The region growing and merging algorithm is then applied to extract the tumor region in the mammogram. The region growing and merging technique is the most accurate method for the segmentation of the contour of the tumor. From the tumor, novel features such as CBFD, tumor area, and eccentricity are extracted. In existing work, the fractal dimension (FD) of the tumor region was analyzed using the box counting technique, which is less accurate than the contour-based fractal dimension because the latter analyzes purely the boundary of the tumor at various scale sizes and produces excellent results [14]. With this feature set, automatic classification was performed using a three-layer back propagation ANN on 1700 samples, achieving 98.11% accuracy with a minimal error rate, as shown in Table 3 in the results and classification section. It can be inferred that CBFD is the best of the features for separating malignant tumors from benign ones. Radiologists can consider it one of the important features for diagnosing breast cancer. In future, algorithms such as decision trees, random forests, and deep learning approaches can be applied for higher accuracy and faster classification and detection of tumors.