Classification and Segmentation of Breast Tumor Using Mask R-CNN on Mammograms

Purpose Breast cancer causes more deaths among women than any other cancer. This research proposes a method that can detect, classify, and segment different types of breast tumors, and it also discusses the methods by which breast cancer has been classified and segmented in the past. Method Breast cancer can be detected in its early stages by MRI and/or mammography of the breast tissue. This research proposes a novel approach for breast cancer detection, classification, and segmentation. The proposed framework uses breast mammograms from the CBIS-DDSM (Curated Breast Imaging Subset of DDSM) DICOM images. Mammograms are X-ray images of the breast. The DICOM data was first converted to a more conventional image format, then patches were extracted from the mammogram images and fed into the Mask R-CNN neural network. Results The proposed framework is able to localize a cancerous tumor even when it has developed in multiple regions. The framework also classifies whether the tumor is benign or malignant, making it a multi-class classifier, and segments the cancerous tumor region with a pixel-wise annotation. The average accuracy observed is about 85% on test cases, with a precision of 0.75, a recall of 0.8, and an F1 score of 0.825. Conclusion The proposed framework is cost efficient and can serve as a supporting tool for radiologists in breast cancer detection. In the future, the proposed approach can also be applied to the classification and segmentation of other cancerous tumors.


Introduction
Over the years, breast cancer has become one of the leading causes of cancer death among women worldwide. Although it also affects men, its victims are mainly women. According to [2], there were 18.1 million new cancer cases and 9.6 million cancer deaths in 2018 alone; these numbers cover both sexes. In both sexes combined, lung cancer accounted for almost 11.6% of all cases reported in 2018 and almost 18.4% of all cancer deaths. Female breast cancer followed closely, at about 11.6% of all reported cases, and it is the leading cause of cancer death among women. An older study [5] shows that nearly 1.7 million breast cancer cases were reported in 2012, of which almost 521,900 resulted in death. The same study [5] also states that breast cancer causes about 15% of all cancer-related deaths among women. Breast cancer is not limited to women: men are also affected, though the cases and deaths reported every year are nowhere near those among women. Men account for 0.8%-1% of breast cancer cases [6,7], and the scope of this research is limited to breast cancer among women. Recent data on breast cancer among women in the US [3] show that deaths have plateaued and in some cases declined, but the number of reported cases is still rising and has not yet been controlled or lowered. This research aims to support the early screening and diagnosis of breast cancer through mammograms.
Early detection and classification is currently the main way in which artificial intelligence, and more specifically deep learning, can help suppress breast cancer. Medicines may yet be developed that become a breakthrough in cancer treatment, but for now the early screening, classification, and segmentation of breast cancer can contribute greatly to the bigger picture of CAD (computer-aided diagnosis). Breast cancer can be screened and/or diagnosed in a number of ways, among which two stand out: MRI and mammography. After a physical exam, doctors usually advise the patient to get an MRI or mammogram of the breast for a clearer picture of what is going on inside the tissue [27]. As mentioned in [13], breast cancer can be diagnosed early by MRI and mammography. Besides MRI, mammography is the fastest medical way to diagnose breast cancer: it is a 20-minute procedure, and it is one of the safest methods of diagnosis [28]. According to past studies [29][30][31][32], mammography has been shown to reduce the chances of death due to breast cancer, since it plays a key role in early detection. Some studies, such as [33], argue that histopathological imaging provides more accurate results and should be the doctor's approach to cancer diagnosis. In our defense, histopathology takes far more time and money than mammography, and mammography is a non-invasive procedure. With the advancements in artificial intelligence, especially in deep learning, the number of studies based on medical image analysis has increased.
Machine learning algorithms have opened doors that researchers did not know existed. As a result, multiple past studies have incorporated machine learning techniques to detect, classify, and segment breast cancer tumors in mammogram images. The authors of [17] found that since 2010, and even before, people from different regions of the world have been working to automate the screening and diagnosis of breast cancer tumors. They [17] mention numerous studies that applied different machine learning algorithms to the common task at hand, the computer-aided diagnosis of breast cancer. Classical machine learning algorithms, however, are not designed to process image data, and deep learning algorithms generally outperform them in image-based modeling. Accordingly, the machine learning algorithms mentioned in [17] mostly achieved accuracies that a radiologist could easily outperform, whereas many deep learning studies report otherwise. In [15] the researcher obtained the data of about 800 breast cancer patients who had undergone a biopsy; with this data, using a confusion matrix and ROC analysis, he obtained 90.5% accuracy in classifying the given data as benign, malignant, or normal. Another study [18] trained a Probabilistic Neural Network (PNN) on mammogram images that classified breast tumors with over 80% accuracy; the authors also mention other methods to segment and draw a bounding box using Fuzzy c-Means (FCM) and Discrete Wavelet Transform (DWT) [18].
But simple artificial neural networks are not designed for image-based data. The convolutional neural network (CNN) [34] generally outperforms other types of algorithm in image analysis, classification, segmentation, and much more. CNNs have been used in numerous studies of breast cancer classification and segmentation. Owing to recent advancements in deep learning, specifically in convolutional neural networks, a growing number of studies revolve around medical image analysis [19][20][21][22][23][24][25]; most of these concern the classification of some disease using convolutional neural networks and other deep learning algorithms. We now turn to the more specific use of convolutional neural networks in the detection and classification of breast cancer using mammograms and MRI images. In [13] the researchers took a small dataset from [35], the mini-MIAS (Mammographic Image Analysis Society Digital Mammogram Database) database, from which they obtained mammogram cases of patients. Because breast mammogram images are quite large, and larger images mean more time to train a model, they extracted random patches from the mammogram images and fed them to a customized CNN architecture. According to their results, they achieved an accuracy of over 85%, meaning that the network classifies a given breast mammogram as normal, benign, or malignant correctly in over 85% of cases.
Another study [10] used MRI breast images, which were preprocessed by extracting smaller patches. The patches were not random regions of the MRI images but the regions containing the tumor. Each patch was then resized to 50x50, and its pixel values were normalized to the range 0 to 1. According to [10], their proposed CNN architecture outperformed almost all previous CNN-based models that used either the DDSM [36] (Digital Database for Screening Mammography) dataset or the MIAS dataset, reaching an accuracy of almost 98%. Another group of researchers designed a new convolutional neural network architecture specifically for medical image analysis and named it U-Net after its U-shaped structure [37]. Many medical image analysis studies have used the U-Net architecture. In [9], for instance, the researchers took a modified form of U-Net, which they call Dense-U-Net, and used it to classify and segment breast cancer on mammogram images taken from the DDSM; they achieved an AUC score of 83.36%. Another U-Net study is [14], which implemented a plain U-Net to classify and segment breast cancer in breast MRI images and achieved an accuracy of 97.44%. In our approach to the task at hand, we implemented Mask R-CNN [26] to detect, classify, and segment breast tumors in mammogram images.
Our research was directly inspired by [8], which also detected, classified, and segmented breast tumors, but on MRI images and with preprocessing and training techniques quite different from ours. In this research we implemented Mask R-CNN, but rather than feeding it full images in the traditional way, we extracted patches from the given mammogram data. This made training faster than Mask R-CNN training usually is. Moreover, by extracting patches from the mammograms we were able to gather a dataset of almost 1.5 million images, which to our knowledge is far larger than the datasets used in the studies cited above; Mask R-CNN is discussed further in the Proposed Methodology section. Figure 1 shows the overall steps of this research. Breast cancer detection, classification, and segmentation have been done before on mammograms, MRI scans, and other screening methods, as mentioned above. Since this was not a new task, we had to introduce a new method that achieves similar or better results while being more efficient than the ones implemented before us [9,10,12,14]. To make our approach stand out from earlier methods, we selected the Mask R-CNN [26] neural network, which is designed for instance segmentation along with localization and classification. Mask R-CNN has been used for medical image analysis in the past: in [8] breast cancer was segmented using Mask R-CNN on MRI images, and in another study Mask R-CNN was employed at the microscopic level to segment the nuclei of different cells [38].
Mask R-CNN has also been used to classify and segment different types of cancer, among which [39] stands out, since it classifies and segments prostate cancer using MRI images. For our research we took the mammogram images from CBIS-DDSM [40]. The dataset consists of about 3100 breast mammogram images, and each mammogram comes with an ROI (region of interest) image marking where the cancer resides. The dataset is thus annotated in such a way that a simple image processing technique, such as the bitwise-or operation, can apply the ROI onto a breast mammogram to show where the cancer is. The dataset is further divided into two types of abnormality, calcification and mass. Since the scope of our research does not include classifying whether the detected cancer is a calcification or a mass, we put all the mammograms together while still keeping them in three classes: benign, malignant, and normal. The dataset is discussed further in subsection 2.1, Dataset. Figure 2 shows the pre-processing done on the mammogram dataset.
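The bitwise-or overlay mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact preprocessing code used in this research: the images are assumed to be 8-bit grayscale, the ROI mask is assumed to hold 0 for background and 255 for the lesion, and plain Python lists stand in for the arrays that a library such as OpenCV would provide. The function name `overlay_roi` is hypothetical.

```python
def overlay_roi(mammogram, roi_mask):
    """Combine an 8-bit grayscale image with a binary ROI mask via bitwise OR.

    Both inputs are lists of rows of ints in 0..255; mask pixels are 0 or 255
    (an assumption made for this sketch).
    """
    if len(mammogram) != len(roi_mask) or any(
        len(a) != len(b) for a, b in zip(mammogram, roi_mask)
    ):
        raise ValueError("mammogram and ROI mask must have the same shape")
    # OR with 255 saturates a pixel to white; OR with 0 leaves it unchanged,
    # so the lesion region shows up as a white area on the mammogram.
    return [
        [pixel | mask for pixel, mask in zip(img_row, mask_row)]
        for img_row, mask_row in zip(mammogram, roi_mask)
    ]


# Toy 2 x 3 example with made-up pixel values.
mammogram = [[10, 20, 30],
             [40, 50, 60]]
roi_mask = [[0, 255, 0],
            [0, 255, 255]]
highlighted = overlay_roi(mammogram, roi_mask)
```

Pixels under the mask become white while all other pixels keep their original intensity, which makes the annotated region easy to inspect visually.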

Dataset CBIS-DDSM
For our research we did not want merely to classify a cancerous tumor, which has been done quite accurately for many years. What we wanted to achieve was to detect the tumor in its early stages. For breast tumor detection and segmentation, the imaging method reported to catch the development of a cancerous tumor in its nascent stages is the mammogram [4,28]; moreover, early detection of breast tumors through mammograms has been shown to reduce fatalities among women [29][30][31][32]. For this we selected CBIS-DDSM (Curated Breast Imaging Subset of DDSM) [40]. The dataset has a total of 3100 mammograms, each with its own corresponding ROI (region of interest). An ROI here refers to the part of the mammogram where the cancerous tumor is concentrated, as can be seen in Figure 3.1 and 1.2, which show an ROI alongside its breast mammogram. The CBIS-DDSM images are in DICOM format, the standard format for medical images, so we had to convert them into PNG, which is easier to work with programmatically. After the conversion, we saw that the mammogram and ROI images were not uniformly sized. For the sake of normalization and uniformity we resized all images to approximately 2750x1100. The dataset was organized in two parts: cases with calcification in the breast tissue and cases with a mass developing in the breast tissue. Within each part there are benign and malignant tumors.
But since our research was not going to distinguish between calcification and mass formation in the breast tissue, we discarded this feature completely and combined the cases into two categories, benign and malignant. This dataset was further divided into training and test sets, so in the end we had four folders: benign training cases, benign test cases, malignant training cases, and malignant test cases.
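One step of the DICOM-to-PNG conversion described in this section is mapping the raw pixel intensities to the 8-bit range of a grayscale PNG. The sketch below shows one common way to do this, min-max rescaling. It is an assumption for illustration only: the text does not state which rescaling was used, the 16-bit input range in the toy data is assumed, and the helper name `to_uint8` is hypothetical. Reading the DICOM file and writing the PNG are left out.

```python
def to_uint8(pixels):
    """Min-max rescale a 2D list of raw intensities to integers in 0..255."""
    flat = [value for row in pixels for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image carries no contrast; map it to all zeros.
        return [[0 for _ in row] for row in pixels]
    scale = 255.0 / (hi - lo)
    return [[int(round((value - lo) * scale)) for value in row] for row in pixels]


# Toy 2 x 2 image with made-up 16-bit intensities.
raw = [[0, 16384],
       [32768, 65535]]
png_ready = to_uint8(raw)
```

The darkest input pixel maps to 0 and the brightest to 255, so the full dynamic range of each mammogram is preserved as far as 8 bits allow.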

Pre-Processing Patches from Mammogram
Past breast cancer studies, particularly those based on mammogram images, have used as many as 3000 to 4000 mammogram images to build a model [9,13,19]. Convolutional neural networks, like neural networks in general, usually require large datasets to generalize and to achieve high accuracy. With that in mind, we looked for ways to enlarge our dataset. Image augmentation was the only option, since little open-source breast mammogram data is available. We soon realized, however, that even with augmentation the dimensions of the mammograms are so large that training would take considerable GPU time; Mask R-CNN is itself a very large network that takes a long time to train [26]. It was not viable to feed such huge images to the network and keep it running for weeks. Another option was to resize the images to smaller dimensions, but we noticed that the ROI (region of interest) was already very small, no more than 200x200 pixels, and if we downscaled the image any further the ROI's features would disappear completely. In the past, however, researchers have extracted patches from full images and then performed classification or segmentation on them. The idea came to us from [11], whose authors segmented retinal blood vessels. They used classical transfer learning, but the retinal blood vessel data was fed into the neural network in the form of patches, and with this method they segmented the retinal blood vessels with more than 90% AUPRC (Area Under the Precision/Recall Curve).
In [10,13] the researchers built breast cancer classification models using convolutional neural networks, one taking mammograms as input and the other breast MRI images; both fed the data to the network as patches. We followed the same approach as [10,11,13]. First we took the full mammograms and their corresponding ROI images. Then we extracted patches from each mammogram along with its ROI, and while doing so we augmented the patches by randomly flipping, laterally inverting, and rotating them by up to 30°. Preprocessing the data and extracting patches is described in detail in [41], which explains how finding a tumor lesion in a mammogram is like finding a "needle in a haystack" and how patches are an effective approach to segmentation and classification tasks. Once we had processed the images and obtained the patches from the mammograms and their respective ROIs, it was time to train the network.
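The patch extraction with augmentation described above can be sketched as follows. This is a simplified illustration, not the pipeline used in this research: it crops the same window from the mammogram and its ROI mask and applies identical random flips to both so that the annotation stays aligned. The rotation by up to 30° is omitted for brevity, the window position is drawn uniformly at random (an assumption), and all function names are hypothetical. Plain lists are used in place of image arrays.

```python
import random


def crop(image, top, left, size):
    """Return the size x size window whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def hflip(image):
    """Mirror the image left to right (lateral inversion)."""
    return [row[::-1] for row in image]


def vflip(image):
    """Mirror the image top to bottom."""
    return image[::-1]


def sample_patch(mammogram, roi_mask, size, rng):
    """Crop one random window from image and mask, then flip both alike."""
    height, width = len(mammogram), len(mammogram[0])
    top = rng.randint(0, height - size)    # randint is inclusive on both ends
    left = rng.randint(0, width - size)
    patch = crop(mammogram, top, left, size)
    patch_mask = crop(roi_mask, top, left, size)
    if rng.random() < 0.5:
        patch, patch_mask = hflip(patch), hflip(patch_mask)
    if rng.random() < 0.5:
        patch, patch_mask = vflip(patch), vflip(patch_mask)
    return patch, patch_mask


# Toy 8 x 8 image with unique pixel values and a 3 x 3 lesion mask.
rng = random.Random(0)
mammogram = [[r * 8 + c for c in range(8)] for r in range(8)]
roi_mask = [[255 if 2 <= r < 5 and 3 <= c < 6 else 0 for c in range(8)]
            for r in range(8)]
patch, patch_mask = sample_patch(mammogram, roi_mask, 4, rng)
```

Applying every geometric transform to the image and its mask together is the key design point: if only the image were flipped, the pixel-wise annotation would no longer match the lesion.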

Training
We trained this neural network on an Nvidia 3080 Ti GPU using TensorFlow version 1.14.0. For image preprocessing we used the OpenCV library. Since our dataset was almost 1.5 million images and we were sending 500 images per epoch stochastically, it took us

Mask R-CNN
Mask R-CNN is a state-of-the-art neural network used mainly for instance segmentation. Because it is an extension of and improvement over Faster R-CNN [43], it is also used for localization and classification. Mask R-CNN is mostly seen in general object detection, but its ability to generalize over versatile datasets, as shown in the original paper [26], makes it a good fit for many medical image analysis studies as well. Faster R-CNN was used to draw bounding boxes and to classify the objects within them. Mask R-CNN takes the entire Faster R-CNN architecture and adds another branch, in parallel with the existing branch, that predicts an object mask. Mask R-CNN is comparatively easy to train and generalizes to many datasets. Its operation is quite similar to that of Faster R-CNN: both networks employ a region proposal network (RPN) to propose candidate object regions, but they differ in how features are extracted for each region. Faster R-CNN uses the RoIPool method, while Mask R-CNN replaces it with RoIAlign and then uses the mask branch to mark the object area on the RoIAlign output. Once the dataset was preprocessed, that is, the full mammogram images and their respective ROIs were cut into patches and those patches were augmented, we started training the breast cancer model. No part of the neural architecture was changed, but the hyperparameters were tweaked; all the hyperparameters are listed in Table 1. The Mask R-CNN loss function, $L = L_{cls} + L_{box} + L_{mask}$, was minimized so that the model could converge and generalize well.
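The difference between RoIPool and RoIAlign can be illustrated with a small sketch. RoIPool snaps fractional region coordinates to the integer feature grid, while RoIAlign keeps the fractional location and reads the feature map by bilinear interpolation. The code below is only a toy illustration of that single sampling step on a 2 x 2 feature map with made-up values; it is not the Mask R-CNN implementation, and the function names are hypothetical.

```python
import math


def bilinear_sample(feature_map, y, x):
    """Bilinearly interpolate a 2D feature map at a fractional (y, x) location."""
    height, width = len(feature_map), len(feature_map[0])
    y0 = min(max(int(math.floor(y)), 0), height - 1)
    x0 = min(max(int(math.floor(x)), 0), width - 1)
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    dy, dx = y - y0, x - x0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = feature_map[y0][x0] * (1 - dx) + feature_map[y0][x1] * dx
    bottom = feature_map[y1][x0] * (1 - dx) + feature_map[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def quantized_sample(feature_map, y, x):
    """RoIPool-style lookup: snap the coordinate to the integer grid first."""
    return feature_map[int(math.floor(y))][int(math.floor(x))]


fmap = [[0.0, 10.0],
        [20.0, 30.0]]
aligned = bilinear_sample(fmap, 0.5, 0.5)   # uses all four neighbours
snapped = quantized_sample(fmap, 0.5, 0.5)  # uses only the top-left cell
```

For a small lesion that covers only a few feature-map cells, the misalignment introduced by snapping can shift the predicted mask noticeably, which is why the interpolated lookup matters for pixel-wise segmentation.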
There were many experiments in which we tweaked the hyperparameters and then retrained; only after many educated guesses and trial and error did we find the best hyperparameters for our breast cancer model.

Results
Lastly, for prediction with the trained Mask R-CNN, we extracted further patches from the original images. We did this because, when predicting on the original mammogram image, the ROI where the cancer lies looks quite small at 4000×5000 pixels. It is more efficient and easier to split the given mammogram image into 4 patches and then make predictions on all 4 patches, where the cancer would lie in the first 2 patches or maybe even just 1.
As Figure 1.5 shows, the image is split into 4 patches, and predictions are made on these patches. The results below show the ground truth side by side with the predictions of our neural network. The results in Figures 1.6-1.9 show an average accuracy of about 85%.
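The splitting of a mammogram into 4 patches at prediction time can be sketched as below. The text does not specify the layout of the 4 patches, so a 2 x 2 grid of quadrants is assumed here; the function name is hypothetical, and a plain list of rows stands in for the image array.

```python
def split_into_quadrants(image):
    """Split a 2D image (list of rows) into four tiles: TL, TR, BL, BR."""
    height, width = len(image), len(image[0])
    mid_y, mid_x = height // 2, width // 2
    row_ranges = [(0, mid_y), (mid_y, height)]
    col_ranges = [(0, mid_x), (mid_x, width)]
    return [
        [row[x0:x1] for row in image[y0:y1]]
        for (y0, y1) in row_ranges
        for (x0, x1) in col_ranges
    ]


# Toy 4 x 4 image whose pixel values encode their position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = split_into_quadrants(image)
```

Each tile can then be passed to the trained network separately, and any predicted mask can be mapped back to the full image by adding the tile's row and column offset.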
Furthermore, the loss values provide an easy way to monitor the convergence of Mask R-CNN, as previously discussed in sub-section 2.3.1; the loss value graphs for our proposed methodology follow. Lastly, for further results, the trained model was given 120 samples covering all the cases, divided equally: 40 benign cases, 40 malignant cases, and 40 normal cases, where normal means that the breast tissue contained no cancerous lesion. With these cases, a confusion matrix was plotted to determine the Recall and Precision of the model. Figure 10 shows the confusion matrix.
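To calculate Precision, the true positives (TP) and false positives (FP) were taken from the confusion matrix:

$$\text{Precision} = \frac{TP}{TP + FP}$$

In the same manner, Recall can be calculated from the true positives and false negatives (FN):

$$\text{Recall} = \frac{TP}{TP + FN}$$

The F1 score is the harmonic mean of Precision and Recall:

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$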
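As an illustration of how Precision, Recall, and the F1 score follow from a three-class confusion matrix, consider the sketch below. The counts are hypothetical values chosen only so that each true class has 40 samples, matching the evaluation setup described above; they are not the results of this research. The standard definitions are assumed: for a class, TP is the diagonal entry, FP is the rest of its column, and FN is the rest of its row.

```python
def per_class_metrics(confusion, labels):
    """Compute (precision, recall, F1) per class.

    confusion[i][j] counts samples whose true class is i and predicted class is j.
    """
    results = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(len(labels))) - tp
        fn = sum(confusion[k]) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results[label] = (precision, recall, f1)
    return results


labels = ["benign", "malignant", "normal"]
# Hypothetical counts, 40 samples per true class; NOT the paper's results.
confusion = [
    [32, 6, 2],
    [8, 30, 2],
    [0, 4, 36],
]
metrics = per_class_metrics(confusion, labels)
accuracy = sum(confusion[k][k] for k in range(len(labels))) / 120
```

Reporting the metrics per class, rather than a single pooled value, makes it visible when one class (for example malignant) is recognized less reliably than the others.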

Literature Review
The diagnosis, detection, classification, and segmentation of breast tumors using mammograms is not a new concept; it has been implemented through CAD (computer-aided diagnosis) for many years. In [4] the researchers review the use of computers and many non-machine-learning techniques to detect and classify cancerous tumors inside the breast. These were not as effective as machine learning and deep learning techniques. Given the results that deep learning has produced, people now place a degree of trust in these computer science practices being incorporated into professional medical use. Image-based AI (artificial intelligence) has flourished since the introduction of deep learning techniques such as the convolutional neural network [34], and medical image analysis has grown rapidly thanks to advancements in convolutional neural networks. Out of natural curiosity, people started applying CNNs in almost every field, whether simple object detection, autonomous cars, or medical image analysis, and CNNs showed promise in each of these fields, which is why our research exists. Before long, researchers were solving problems such as pancreas classification [44], prostate cancer classification [39], retinal blood vessel segmentation [11], and cell nucleus classification using Mask R-CNN [38]. Numerous other studies have used some deep learning technique or convolutional neural network algorithm to detect, classify, and/or segment images [19,20,21,22,23,24,25].
When we first started our research, all we knew was that a convolutional neural network could be used for tasks such as classification. After seeing its applications, we learned that it could be applied to medical imaging problems. After some research we found interesting studies such as [10,11,12,13,44], which use convolutional neural networks to classify and segment cancerous tumors, though each differs considerably from the others. For instance, [13] uses a CNN to detect and classify breast cancer tumors in mammogram images; for preprocessing, the authors extract patches from the mammograms and give those patches to the CNN. Other studies that use deep learning for breast cancer detection and/or classification, such as [9,15,18], use full mammogram images along with some other preprocessing technique. While searching for the CNN that best fit our needs, we came across the U-Net [37] architecture, which is among the most widely used for medical image analysis. We saw some interesting papers such as [9,14]. In [9] the researchers classified and segmented cancerous tumors on mammogram images; they also customized and proposed a neural architecture, an extended version of U-Net that they call Attention U-Net. In [14] the researchers used breast MRI images containing cancerous lesions and localized and segmented the regions of the MRI images where the cancer was developing.
We also went through studies on breast cancer segmentation and/or classification that used different encoders and decoders in the CNN architecture [16] on MRI-based breast images. We also made sure to look at ordinary artificial neural networks applied to breast cancer classification and/or segmentation, and found two such studies [15,18]: [15] is an older study that uses mammogram images and an ordinary ANN (artificial neural network) to predict the type of breast cancer, while [18] uses a traditional ANN to detect and classify breast cancer in mammogram images.
In the case of [9], the research is quite mature, and the authors have pretty much solved the breast cancer detection, classification, and segmentation task using what they call Attention U-Net. At the same time, their method is quite hard to implement: one would have to code each neural network by hand along with its other components. The one advantage is that they use full mammogram images, so the data does not need much preprocessing. For future improvements, or if a curious researcher wanted to apply the same neural architecture to the classification and segmentation of a different dataset, this would be a tedious task, and on top of that one would need a powerful GPU; an E2 instance would also cost quite a few dollars to keep the network running for training. Hence we then came by [8] [42] has done it. When we started training this neural network, we asked ourselves why, rather than giving the network full mammogram images and slowing training down due to their enormous dimensions, we should not feed it patches of those mammograms and the corresponding ROIs (regions of interest), and so we did. We extracted the patches as we had seen before: in [11] the authors extracted patches in a very similar fashion because they also wanted training to be easy and fast. By doing so we obtained patches of 256 × 256, which made training our network significantly easier. Moreover, we have presented a novel approach to classify, localize, and segment breast cancer, and the approach carries over to almost any cancer: extract patches from the images and train the network using TensorFlow or PyTorch.

Future Improvements
There's certainly much room for improvement in this

Conclusion
In conclusion, Mask R-CNN proved to be an efficient method for detecting, classifying, and segmenting breast cancer tumors using only mammogram images. With an overall loss of 1.238, we can say that the Mask R-CNN converged well on the given dataset, with an overall accuracy of 85%, making it a useful methodology for breast cancer segmentation. We hope that future research will apply the same technique and neural network to other case studies and problems, especially in the growing field of biomedicine.
