A Novel Method using Convolutional Neural Network for Segmenting Brain Tumor in MRI Images

ISSN: 2321-2381 © 2017 | Published by The Standard International Journals (The SIJ)

Abstract—Among brain tumors, gliomas are the most common and aggressive, leading to a very short life expectancy in their highest grade. Thus, treatment planning is a key stage to improve the quality of life of oncological patients. Magnetic Resonance Imaging (MRI) is a widely used imaging technique to assess these tumors, but the large amount of data produced by MRI prevents manual segmentation in a reasonable time, limiting the use of precise quantitative measurements in clinical practice. In this paper, we propose an automatic segmentation method based on Convolutional Neural Networks (CNN), exploring small 3x3 kernels. The use of small kernels allows designing a deeper architecture, besides having a positive effect against overfitting, given the smaller number of weights in the network. We also investigated the use of intensity normalization as a pre-processing step which, though not common in CNN-based segmentation methods, proved in conjunction with data augmentation to be very effective for tumor segmentation in MRI images.


INTRODUCTION
GLIOMAS are the brain tumors with the highest mortality rate and prevalence. These neoplasms can be graded into Low Grade Gliomas (LGG) and High Grade Gliomas (HGG), with the former being less aggressive and infiltrative than the latter. Even under treatment, patients do not survive on average more than 14 months after diagnosis. Current treatments include surgery, chemotherapy, radiotherapy, or a combination of them. MRI is especially useful to assess gliomas in clinical practice, since it is possible to acquire MRI sequences providing complementary information. The tumor mass effect changes the arrangement of the surrounding normal tissues. Also, MRI images may present some problems, such as intensity inhomogeneity, or different intensity ranges among the same sequences and acquisition scanners [Bauer et al.,1; Menze et al.,2].
In this paper we report the setup and results of the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS), organized in conjunction with the MICCAI 2012 and 2013 conferences. Twenty state-of-the-art tumor segmentation algorithms were applied to a set of 65 multi-contrast MR scans of low- and high-grade glioma patients, manually annotated by up to four raters, and to 65 comparable scans generated using tumor image simulation software. We found that different algorithms worked best for different sub-regions (reaching performance comparable to human inter-rater variability), but that no single algorithm ranked in the top for all sub-regions simultaneously [Zikic et al.,3; Rao et al.,4]. The major drawbacks of the existing systems are low accuracy and low efficiency; the existing methods are neither fast nor adaptive. Automatic and reliable segmentation methods are therefore required; however, the large spatial and structural variability among brain tumors makes automatic segmentation a challenging problem. In our proposed system, we develop an automatic segmentation method based on Convolutional Neural Networks (CNN), exploring small 3x3 kernels. The use of small kernels allows designing a deeper architecture, besides having a positive effect against overfitting, given the smaller number of weights in the network. We also investigated the use of intensity normalization as a pre-processing step, which, though not common in CNN-based segmentation methods, proved to be very effective [Dvorak & Menze,5].

RGB Color Image
The RGB color model is an additive color model in which red, green, and blue light are added together in various ways to reproduce a broad array of colors. The name of the model comes from the initials of the three additive primary colors: red, green, and blue. The main purpose of the RGB color model is the sensing, representation, and display of images in electronic systems, such as televisions and computers, though it has also been used in conventional photography. Before the electronic age, the RGB color model already had a solid theory behind it, based on human perception of colors. RGB is a device-dependent color model: different devices detect or reproduce a given RGB value differently, since the color elements (such as phosphors or dyes) and their response to the individual R, G, and B levels vary from manufacturer to manufacturer, or even in the same device over time. Thus an RGB value does not define the same color across devices without some kind of color management [Meier et al.,6].

Grayscale
Grayscale is an image in which the value of each pixel is a single sample, that is, it carries only intensity information. Images of this sort, also known as black-and-white, are composed exclusively of shades of gray, varying from black at the weakest intensity to white at the strongest [Tustison et al.,7].
Grayscale images are distinct from one-bit bi-tonal black-and-white images, which in the context of computer imaging are images with only two colors, black and white (also called bilevel or binary images). Grayscale images have many shades of gray in between. Grayscale images are also called monochromatic, denoting the presence of only one (mono) color (chrome).
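To connect the two color representations above, the conversion from an RGB pixel to a grayscale intensity can be sketched with the common ITU-R BT.601 luma weights; this particular weighting is a standard illustrative choice, not one prescribed by the paper:

```python
def rgb_to_gray(r, g, b):
    """Convert one RGB pixel to a grayscale intensity using ITU-R BT.601 weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b

# Pure white collapses to full intensity, pure black to zero.
white = rgb_to_gray(255, 255, 255)
black = rgb_to_gray(0, 0, 0)
```

Because the three weights sum to 1.0, the grayscale range matches the input range.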

Module 2: Convolutional Neural Network
CNN were used to achieve breakthrough results and win well-known contests. The application of convolutional layers consists in convolving a signal or an image with kernels to obtain feature maps. So, a unit in a feature map is connected to the previous layer through the weights of the kernels. The weights of the kernels are adapted during the training phase by back propagation, in order to enhance certain characteristics of the input. Since the kernels are shared among all units of the same feature map, convolutional layers have fewer weights to train than dense fully connected (FC) layers, making CNN easier to train and less prone to overfitting. Moreover, since the same kernel is convolved over all the image, the same feature is detected independently of the location (translation invariance). By using kernels, information of the neighborhood is taken into account, which is a useful source of context information. Usually, a nonlinear activation function is applied on the output of each neural unit. If we stack several convolutional layers, the extracted features become more abstract with increasing depth. The first layers enhance features such as edges, which are aggregated in the following layers as motifs, parts, or objects [Krizhevsky et al.,9]. The following concepts are important in the context of CNN:
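The convolution operation described above, where a small kernel slides over the image and produces one feature-map value per position, can be sketched as follows (a minimal "valid" convolution in pure Python, for illustration only):

```python
def conv2d(image, kernel):
    """Valid 2-D convolution of an image (list of lists) with a square kernel.

    Each output unit is a weighted sum of the input neighborhood covered by
    the kernel, with the same kernel weights shared across all positions.
    """
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            s = 0.0
            for i in range(kh):
                for j in range(kw):
                    s += image[y + i][x + j] * kernel[i][j]
            out[y][x] = s
    return out

# A 3x3 kernel with a single 1 at its center copies the interior pixel.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
```

Stacking several such layers (with nonlinearities in between) is what yields the increasingly abstract features the text describes.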

A. Initialization
A proper initialization is important to achieve convergence. We use the Xavier initialization. With this, the activations and the gradients are maintained at controlled levels; otherwise back-propagated gradients could vanish or explode.
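The uniform variant of Xavier (Glorot) initialization draws each weight from [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)); a minimal sketch, assuming the uniform variant rather than the Gaussian one:

```python
import math
import random

def xavier_limit(fan_in, fan_out):
    """Bound of the uniform Xavier/Glorot distribution for one weight matrix."""
    return math.sqrt(6.0 / (fan_in + fan_out))

def xavier_init(fan_in, fan_out, rng=random):
    """Initialize a fan_in x fan_out weight matrix with uniform Xavier values."""
    limit = xavier_limit(fan_in, fan_out)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]
```

Keeping the weight magnitude tied to the layer's fan-in and fan-out is what keeps activation and gradient variance roughly constant across layers.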

B. Activation Function
It is responsible for non-linearly transforming the data. Rectifier linear units (ReLU), defined as

f(x) = max(0, x) (1)

were found to achieve better results than the more classical sigmoid or hyperbolic tangent functions, and to speed up training. However, imposing a constant 0 can impair the gradient flow and the consequent adjustment of the weights. We cope with these limitations using a variant called leaky rectifier linear unit (LReLU), which introduces a small slope on the negative part of the function:

f(x) = max(0, x) + α min(0, x) (2)

where α is the leakiness parameter. In the last FC layer, we use softmax.
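Equations (1) and (2) translate directly into code; a small sketch contrasting the two activations (the default leakiness value is illustrative, not the paper's tuned setting):

```python
def relu(x):
    """Rectifier linear unit: zero for negative inputs, identity otherwise."""
    return max(0.0, x)

def lrelu(x, alpha=0.01):
    """Leaky ReLU: identity for x > 0, small slope alpha on the negative part.

    Equivalent to max(0, x) + alpha * min(0, x).
    """
    return x if x > 0 else alpha * x
```

Unlike ReLU, LReLU keeps a nonzero gradient (alpha) for negative inputs, which is exactly the limitation the text describes.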

C. Pooling
It combines spatially nearby features in the feature maps. This combination of possibly redundant features makes the representation more compact and invariant to small image changes, such as insignificant details; it also decreases the computational load of the next stages. To join features, it is most common to use max-pooling or average-pooling.
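Max-pooling, the more common of the two options, can be sketched as a non-overlapping window that keeps only the strongest response in each neighborhood (a minimal illustration, assuming a stride equal to the window size):

```python
def max_pool(fmap, size=2):
    """Non-overlapping max-pooling of a feature map with a size x size window.

    Halves each spatial dimension for size=2, keeping the maximum per window.
    """
    return [[max(fmap[y + i][x + j] for i in range(size) for j in range(size))
             for x in range(0, len(fmap[0]), size)]
            for y in range(0, len(fmap), size)]
```

Because only the maximum survives, small shifts of a feature inside the window do not change the pooled output, which is the invariance the text refers to.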

D. Regularization
It is used to reduce overfitting. We use Dropout in the FC layers. In each training step, it removes nodes from the network with probability p. In this way, it forces all nodes of the FC layers to learn better representations of the data, preventing nodes from co-adapting to each other. At test time, all nodes are used. Dropout can be seen as an ensemble of different networks and a form of bagging, since each network is trained with a portion of the training data [Havaei et al.,10].
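A minimal sketch of the training/test asymmetry described above, using the "inverted" dropout formulation (survivors are scaled by 1/(1-p) during training so no rescaling is needed at test time; the paper does not specify which formulation it uses, so this is an assumption):

```python
import random

def dropout(activations, p, training=True, rng=random):
    """Inverted dropout on a list of activations.

    Training: each unit is zeroed with probability p, survivors scaled by 1/(1-p).
    Test time: activations pass through unchanged (all nodes are used).
    """
    if not training:
        return list(activations)
    return [0.0 if rng.random() < p else a / (1.0 - p) for a in activations]
```

Each training step thus samples a different thinned network, which is why dropout can be viewed as training an ensemble.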

E. Data Augmentation
It can be used to increase the size of training sets and reduce overfitting. Since the class of the patch is obtained from the central voxel, we restricted the data augmentation to rotation operations. Some authors also consider image translations, but for segmentation this could result in attributing a wrong class to the patch. So, we increased our data set during training by generating new patches through the rotation of the original patch. In our proposal, we used angles that are multiples of 90°.
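Rotations by multiples of 90° are lossless on a square patch grid and leave the central voxel (and hence the patch label) in place, so each patch yields four training samples; a minimal sketch:

```python
def rot90_cw(patch):
    """Rotate a square patch (list of lists) 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]

def augment(patch):
    """Return the four rotations (0, 90, 180, 270 degrees) of a patch."""
    rotations = [patch]
    for _ in range(3):
        rotations.append(rot90_cw(rotations[-1]))
    return rotations
```

Note that, unlike translations, none of these rotations moves the central voxel, so the patch class is preserved.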

F. Loss Function
It is the function to be minimized during training. We used the categorical cross-entropy,

H = − Σ_{j∈voxels} Σ_{k∈classes} c_{j,k} log(ĉ_{j,k})

where ĉ represents the probabilistic predictions (after the softmax) and c is the target. In the next subsections, we discuss the architecture and training of our CNN.

Figure 6: Segmentation

Module 3: GLCM Feature Extraction
The co-occurrence matrix and texture features were initially used for the automated classification of rocks. The fourteen Haralick measures were used to extract useful texture information from the co-occurrence matrix. Since then, the GLCM has been one of the most commonly used tools for texture analysis, because it can estimate image properties related to second-order statistics. For an image with a given number of gray levels, the entry P(i, j) of the co-occurrence matrix records how frequently a pixel with gray level i occurs at a distance d, in a given direction θ, from a pixel with gray level j. Haralick features describe the correlation in intensity of pixels that are next to each other in space. Haralick proposed fourteen measures of textural features, derived from the co-occurrence matrix, a well-known statistical technique for texture feature extraction. The matrix contains information about how image intensities of pixels at a certain position relative to each other occur together. Texture is one of the most important defining characteristics of an image. The gray level co-occurrence matrix is thus the two-dimensional matrix of joint probabilities between pairs of pixels separated by a distance d in a given direction θ. This second-order image histogram, referred to as the Gray Level Co-occurrence Matrix (GLCM), offers rich information about the inter-pixel relationship, periodicity and spatial gray level dependencies, and is the source of fourteen texture descriptors [Srivastava et al.,11].
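The definition of P(i, j) above can be sketched directly: for each pixel, look at its neighbor at the chosen offset and count the resulting gray-level pair (a minimal, single-offset illustration; production GLCM code would also normalize and accumulate several distances and directions):

```python
def glcm(image, levels, dx=1, dy=0):
    """Gray-level co-occurrence matrix for one offset (dx, dy).

    P[i][j] counts the pixel pairs in which a pixel with gray level i has a
    neighbor with gray level j at the given offset.
    """
    h, w = len(image), len(image[0])
    p = [[0] * levels for _ in range(levels)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                p[image[y][x]][image[ny][nx]] += 1
    return p
```

Haralick descriptors such as contrast or homogeneity are then weighted sums over the entries of this matrix.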

Module 4: Post-Processing
Some small clusters may be erroneously classified as tumor. To deal with that, we impose volumetric constraints by removing from the segmentation the clusters whose volume falls below a threshold.
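This volumetric constraint amounts to connected-component filtering; a minimal 2-D sketch using 4-connectivity (the paper works on 3-D volumes and does not state its connectivity, so both choices here are assumptions):

```python
from collections import deque

def remove_small_clusters(mask, min_size):
    """Zero out 4-connected foreground clusters smaller than min_size pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [row[:] for row in mask]
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                # Breadth-first search to collect one connected cluster.
                queue, cluster = deque([(y, x)]), []
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    cluster.append((cy, cx))
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx),
                                   (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(cluster) < min_size:
                    for cy, cx in cluster:
                        out[cy][cx] = 0
    return out
```

Clusters at or above the threshold are left untouched, so only spurious small detections are suppressed.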

III. CONCLUSION
We propose a novel CNN-based method for the segmentation of brain tumors in MRI images. We start with a pre-processing stage consisting of bias field correction, and intensity and patch normalization. After that, during training, the number of training patches is artificially augmented by rotating the training patches, and samples of HGG are used to augment the number of rare LGG classes. The CNN is built over convolutional layers with small 3x3 kernels to allow deeper architectures.
In designing our method, we address the heterogeneity caused by multi-site, multi-scanner acquisitions of MRI images using intensity normalization as proposed by Nyúl et al. We show that this is important in achieving a good segmentation. Brain tumors are highly variable in their spatial localization and structural composition, so we have investigated the use of data augmentation to cope with such variability. We studied augmenting our training data set by rotating the patches as well as by sampling from classes of HGG that were underrepresented in LGG. We found that data augmentation was also quite effective, although not thoroughly explored in Deep Learning methods for brain tumor segmentation. Also, we investigated the potential of deep architectures with small kernels by comparing our deep CNN with shallow architectures with larger filters. We found that shallow architectures presented a lower performance, even when using a larger number of feature maps. Finally, we verified that the activation function LReLU was more effective than ReLU in training our CNN.

IV. FUTURE ENHANCEMENT
In future research, the results could be further improved by incorporating more recent noise-removal algorithms and by refining the method for higher accuracy and efficiency.