Breast Cancer Multi-classification from Histopathological Images with Structured Deep Learning Model

Automated breast cancer multi-classification from histopathological images plays a key role in computer-aided breast cancer diagnosis and prognosis. Breast cancer multi-classification identifies the subordinate classes of breast cancer (ductal carcinoma, fibroadenoma, lobular carcinoma, etc.). However, multi-classification from histopathological images faces two main challenges: (1) multi-classification is considerably harder than binary classification (benign versus malignant), and (2) the differences between classes are subtle because of the broad variability of high-resolution image appearances, the high coherency of cancerous cells, and the extensive inhomogeneity of color distribution. Automated breast cancer multi-classification from histopathological images is therefore of great clinical significance, yet it has never been explored. Existing works in the literature focus only on binary classification and do not support further quantitative assessment of breast cancer. In this study, we propose a breast cancer multi-classification method using a newly proposed deep learning model. The structured deep learning model achieves remarkable performance (average 93.2% accuracy) on a large-scale dataset, which demonstrates the strength of our method as an efficient tool for breast cancer multi-classification in clinical settings.

However, automated breast cancer multi-classification still faces serious obstacles. The first obstacle is that supervised feature engineering is inefficient and laborious and carries a great computational burden. Its initialization and processing steps are also tedious and time-consuming. Meaningful and representative features lie at the heart of successful breast cancer multi-classification. Nevertheless, feature engineering is an independent domain, and task-related features are mostly designed by medical specialists who apply their knowledge to histopathological image processing 7. For example, Zhang et al. 8 applied a one-class kernel principal component analysis (PCA) method based on hand-crafted features to classify breast cancer histopathological images as benign or malignant, reaching an accuracy of 92%. In recent years, general feature descriptors for feature extraction have been introduced, e.g., scale-invariant feature transform (SIFT) 9, gray-level co-occurrence matrix (GLCM) 10, and histogram of oriented gradients (HOG) 11. However, these descriptors extract only low-level and unrepresentative surface features, which are insufficient for describing histopathological images and are not suitable for classifiers with discriminant analysis ability. Several studies have applied general feature descriptors to the binary classification of breast cancer histopathological images. Spanhol et al. 12 introduced a breast cancer histopathological image dataset (BreaKHis) and provided baseline binary classification recognition rates using different feature descriptors and different traditional machine learning classifiers, with accuracies ranging from 80% to 85%. Based on four shape and 138 textural feature descriptors, Wang et al. 13 achieved accurate binary classification using a support vector machine (SVM) 14 classifier. The second obstacle is that breast cancer histopathological images have huge limitations.
Eight classes of breast cancer histopathological images are presented in Fig. 1. These are fine-grained, high-resolution images from breast tissue biopsy slides stained with hematoxylin and eosin (H&E). Noticeably, different classes have subtle differences, and cancerous cells have high coherency 15,16. The differences in resolution, contrast, and appearance among images of the same class are often greater than those between different classes. In addition, fine-grained histopathological images have large variations, which makes it difficult to distinguish breast cancers. Finally, despite the effective performance of deep learning in medical imaging analysis 7, existing related methods have studied only the binary classification of breast cancer 8,12,13,17,18; however, multi-classification has more clinical value.
To provide an accurate and reliable solution for breast cancer multi-classification, we propose a comprehensive recognition method based on a newly proposed class structure-based deep convolutional neural network (CSDCNN). The CSDCNN overcomes the above-mentioned barriers by leveraging hierarchical feature representation, which plays a key role in accurate breast cancer multi-classification. The CSDCNN is a non-linear representation learning model that replaces feature extraction steps with feature learning and thereby bypasses hand-designed feature engineering. It is trained end to end and automatically learns semantic and discriminative hierarchical features from low level to high level. The CSDCNN is carefully designed to take full account of the intra-class and inter-class relations in the feature space, in order to overcome the obstacles posed by varied histopathological images. In particular, distance in the feature space is a standard measure of image similarity; however, the feature space distance between samples from the same class may be larger than that between samples from different classes. Therefore, we formulate feature space distance constraints and integrate them into the CSDCNN to control the feature similarities of different classes of histopathological images.
The major contributions of this work can be summarized as follows:
• An end-to-end recognition method based on a novel CSDCNN model, as shown in Fig. 2, is proposed for multi-class breast cancer classification. The model has high accuracy and can reduce the heavy workloads of pathologists and assist in the development of optimal therapeutic schedules. Automated multi-class breast cancer classification has more clinical value than binary classification and would play a key role in breast cancer diagnosis or prognosis; however, it has never been explored in the literature.
• An efficient distance constraint on the feature space is proposed to formulate the feature space similarities of histopathological images by leveraging the intra-class and inter-class labels of breast cancer as prior knowledge. Therefore, the CSDCNN has excellent feature learning capabilities and can acquire more descriptive features from histopathological images.

Figure 1.
Eight classes of breast cancer histopathological images from the BreaKHis 12 dataset. These histopathological images are highly challenging due to the broad variability of high-resolution image appearances, the high coherency of cancerous cells, and the extensive inhomogeneity of color distribution. All images were acquired at a magnification factor of 400.

Materials.
To evaluate the performance of our method, two datasets of breast cancer histopathological images with ground truth are used: BreaKHis 12 and BreaKHis with augmentation. First, our method is evaluated by extensive experiments on a challenging large-scale dataset, BreaKHis. Second, in order to evaluate the multi-classification performance more qualitatively, we use an augmentation method to oversample the imbalanced classes. The augmentation is applied to the training set only; the validation and testing phases use the real-world data, split patient-wise. The details of the two datasets are as follows: BreaKHis. BreaKHis  BreaKHis with augmentation. In this study, BreaKHis is augmented by a data augmentation method to boost the multi-classification performance and resolve the imbalanced class problem. Following standard practice in the machine learning domain 19, the augmentation is used only for training, while the validation and testing phases use the real-world data, split patient-wise. In detail, we first split the whole dataset patient-wise into training, validation, and testing sets, and then augmented the training examples according to the ratios of the imbalanced classes.
Evaluation. Reliability and generalization. First, to make the results more reliable, we split the datasets patient-wise into three groups: a training set, a validation set, and a testing set. This results in 61 train/validation subjects and 21 test subjects. The training set accounts for 50% of the two datasets and is used for training the CSDCNN model and optimizing the connection parameters of different neurons. The validation set is used for model selection, while the testing set is used for testing multi-classification accuracy and model reliability.
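A patient-wise split of the kind described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `patient_wise_split`, the per-patient fractions (0.5, 0.25, 0.25), and the random seed are assumptions, since the text only states that the training set accounts for 50% and that patients do not overlap between sets.

```python
import random
from collections import defaultdict


def patient_wise_split(image_patient_ids, fractions=(0.5, 0.25, 0.25), seed=0):
    """Split image indices into train/val/test so no patient spans two sets.

    image_patient_ids: list where entry i is the patient ID of image i.
    fractions: share of patients per split (hypothetical values).
    """
    # Group image indices by patient so a patient is assigned as a whole.
    by_patient = defaultdict(list)
    for idx, pid in enumerate(image_patient_ids):
        by_patient[pid].append(idx)

    # Shuffle patients (not images) with a fixed seed for reproducibility.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    n_train = round(fractions[0] * len(patients))
    n_val = round(fractions[1] * len(patients))
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }
    # Map each group of patients back to its image indices.
    return {name: sorted(i for p in pids for i in by_patient[p])
            for name, pids in groups.items()}
```

Under this sketch, augmentation would be applied afterwards to the indices in the `"train"` group only, so that validation and testing use unmodified images.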
The patients of the three groups are non-overlapping, and all experimental results are the average accuracy from five cross-validations.
Recognition rates. To assess the multi-classification performance of machine learning algorithms on a medical image dataset, there are two ways of computing the results 17. First, the decision is made at the patient level. Let $N_p$ be the total number of patients and $N_{np}$ the number of cancer images of patient $P$. If $N_{rp}$ images of patient $P$ are correctly classified, the patient score is defined as
$$\text{Patient score} = \frac{N_{rp}}{N_{np}} \tag{1}$$
Then the global patient recognition rate is
$$\text{Patient recognition rate} = \frac{\sum \text{Patient score}}{N_p} \tag{2}$$
Second, we evaluate the recognition rate at the image level, without considering the patient level. Let $N_{all}$ be the number of cancer images in the validation or testing set. If $N_r$ histopathological images are correctly classified, then the recognition rate at the image level is
$$\text{Image recognition rate} = \frac{N_r}{N_{all}} \tag{3}$$
Performance. The overall multi-classification accuracy of our method is very high, with reliable performance, as shown in Fig. 3. The average accuracy at the patient level is 93.2%, while that at the image level is 93.8% for all magnification factors. The validation set and the testing set have almost the same accuracy, which indicates that the CSDCNN model generalizes well and avoids overfitting. The performance of the two training strategies, CSDCNN from scratch and CSDCNN with transfer learning, is shown in Fig. 4, which demonstrates that transfer learning is more accurate than training from scratch. The CSDCNN based on the data augmentation method achieves enhanced and remarkable performance across different comparison experiments, as shown in Table 2. In comparison with several popular CNNs, the CSDCNN achieves the best results. AlexNet 20, proposed by Alex Krizhevsky, won first place in classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2012 (ILSVRC12) and achieved about 83% accuracy in the binary classification of breast cancer histopathological images 17.
LeNet 21 is a traditional CNN proposed by Yann LeCun and is used for handwritten character recognition with high accuracy. Comparing the two datasets, our augmentation method improves accuracy by about 3-6% across the different magnification factors, which demonstrates that the raw available histopathological images cannot meet the requirements of CNNs. Besides, the former layers learn only low-level features that include simple and obvious information, such as colors, textures, and edges. As the model goes deeper, our CSDCNN can learn high-level features that are rich in discriminative information, as shown in the feature learning process of the testing block in Fig. 2.
Even in the binary classification, the CSDCNN outperforms the state-of-the-art results of existing works, as shown in Table 3. The accuracy of our method is about 10% and 7% higher than the best results of the prior methods in patient level and image level, respectively. In particular, the average recognition rates for patient level are enhanced to 97%. Meanwhile, the experimental results also show that the ability of feature learning for our model is better than traditional feature descriptors, such as parameter-free threshold adjacency statistics (PFTAS) 22 , and gray-level co-occurrence matrix (GLCM) 10 .
Experimental tools and time consumption. The CNN models are trained on Lenovo ThinkStation, Intel i7 CPU, NVIDIA Quadro K2200 GPU, and the Caffe 23 framework. The training phase took about one hour and thirteen minutes, and ten hours and ten thirteen minutes under the BreaKHis and BreaKHis with augmentation datasets, respectively. The test phase with a single mini-batch took about 0.044 s; The training of binary classification took about 50 minutes and 10 hours 16 minutes under the binary dataset, and the testing of a single mini-batch took about 0.053 s. Data augmentation algorithms were executed on Matlab 2016a.
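The patient-level and image-level recognition rates used in the evaluation can be sketched as below. The sketch assumes that the patient score is the fraction of a patient's images that are classified correctly and that the patient recognition rate is the mean of the patient scores; the function name and argument layout are illustrative only.

```python
from collections import defaultdict


def recognition_rates(patient_ids, y_true, y_pred):
    """Return (patient-level rate, image-level rate) for one evaluation set."""
    correct = defaultdict(int)  # N_rp: correctly classified images per patient
    total = defaultdict(int)    # N_np: images per patient
    for pid, truth, pred in zip(patient_ids, y_true, y_pred):
        total[pid] += 1
        correct[pid] += int(truth == pred)

    # Patient score = N_rp / N_np, averaged over all N_p patients.
    scores = [correct[pid] / total[pid] for pid in total]
    patient_rate = sum(scores) / len(scores)

    # Image-level rate = N_r / N_all, ignoring patient boundaries.
    image_rate = sum(correct.values()) / sum(total.values())
    return patient_rate, image_rate
```

The two rates differ whenever patients contribute different numbers of images: a patient with few images weighs as much as a patient with many at the patient level, but not at the image level.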

Discussion
Table 2. Multi-classification results of comparison experiments based on the raw dataset (Raw) and augmented dataset (Aug).
It is the first time that automated multi-class classification of breast cancer has been investigated in histopathological images, and the first time that the CSDCNN model has been proposed; the model achieved reliable and accurate recognition rates. By validation on the challenging dataset, the performance reported in the above section confirms that our method is capable of learning higher-level discriminative features and has the best accuracy in multi-class breast cancer classification. Although high-resolution breast cancer histopathological images have fine-grained appearances that bring great difficulties to the multi-classification task, the discriminative power of the CSDCNN is better than that of traditional models. Furthermore, the performance of the CSDCNN is very stable across multi-magnification image groups. The model has great applicable value in the clinical diagnosis and prognosis of breast cancer. Since primary-level hospitals or clinics face a desperate shortage of professional pathologists, our work could be extended to an automated breast cancer multi-classification system that provides scientific, objective, and concrete indexes. It is a great advantage that the CSDCNN classifies whole slide images (WSI). The CSDCNN fully preserves the global information of breast cancer histopathological images and avoids the limitations of patch extraction methods. Although patch-based methods are common 17,24,25, they have an obvious disadvantage: pathologists have to mark the cancerous regions, because the cancerous region is only a fraction of a breast cancer histopathological image. For example, Fig. 5 shows high-resolution breast cancer histopathological images in which the areas enclosed by the yellow boxes are the regions of interest (RoI), which contain only the cancerous region. Because the patches are smaller than the WSI, non-cancerous patches lead to deviations in parameter learning; that is, deep models treat non-cancerous regions as cancerous regions during training. Hence, only the areas enclosed by the yellow boxes meet the needs of deep learning models.
Under the large-scale medical image dataset, pathologists will waste much time and effort, and the labeling errors will increase the noise of the training sets. Therefore, we carefully use WSI as the model input, which will reduce the workload of pathologists and improve the efficiency of clinical diagnosis.
Multi-classification has more clinical value than binary classification because it provides more details about patients' health conditions, which relieves the workloads of pathologists and also helps doctors make better therapeutic schedules. Furthermore, although CNNs, inspired by Kunihiko Fukushima 26,27, have been used for medical image analysis, e.g., image segmentation 28,29 and image fusion and registration 30-32, there is still a lot of room for improvement on medical data in comparison with the computer vision domain 7,33-36. Therefore, in this study, an optimal training strategy based on transfer learning from natural images is used to fine-tune the multi-classification model, which is a common approach for deep learning models used in medical imaging analysis.
Table 3. Our model achieves state-of-the-art accuracy (%) in the binary classification task. Comparison with the mean recognition rates of classifiers trained with different descriptors: parameter-free threshold adjacency statistics (PFTAS) 22 and gray-level co-occurrence matrix (GLCM) 10 are traditional feature descriptors. Quadratic discriminant analysis (QDA) 38, support vector machine (SVM) 14, 1-nearest neighbor (1-NN) 39, and random forests (RF) 40 are traditional classifiers.

Methods
The overall approach of our method is designed to be learning-based and data-driven. The CSDCNN is learning-based through its structured formulation and prior knowledge of class structure, which allow it to automatically learn hierarchical feature representations. It is data-driven through the augmentation method, which reinforces the multi-classification method to obtain more reliable and efficient performance. Together, these form an end-to-end recognition framework.
The CSDCNN architecture. The CSDCNN is carefully designed as a deep model with multiple hidden layers that learn the inherent rules and features of multi-class breast cancer. The CSDCNN is designed layer by layer as follows:
• Input layer: this layer loads whole breast cancer histopathological images and produces outputs that feed the first convolutional layer. The input layer resizes the histopathological images to 256 × 256 and applies mean subtraction. The input images are composed of three 2D arrays, one per red-green-blue channel, with 8-bit depth.
• Convolutional layer: this layer extracts features by computing the output of neurons that connect to local regions of the input layer or the previous layer. The set of weights that is convolved with the input is called a filter or kernel. The size of every filter is 3 × 3, 5 × 5, or 7 × 7. Each neuron is sparsely connected to an area in the previous layer. The distance between the applications of filters is called the stride. The stride is set to 2, which is smaller than the filter size, so the convolution kernel is applied in overlapping windows. The kernels are initialized from a Gaussian distribution with a standard deviation of 0.01. The last convolutional layer is composed of 64 filters that are initialized from Gaussian distributions with a standard deviation of 0.0001. The values of all local weights are passed through ReLU (rectified linear activation).
Specifically, in comparison with various off-the-shelf networks, GoogLeNet 35 is chosen as our base network. GoogLeNet won first place in multi-classification and detection in ILSVRC14. GoogLeNet significantly improved classification performance with a 22-layer-deep network and novel inception modules.
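The relation between filter size, stride, and overlapping windows described above can be checked with a small calculation. The sketch below uses the usual floor convention for the spatial size of a convolution output; applying each filter directly to the 256 × 256 input with zero padding is an assumption made purely for illustration, and the helper names are hypothetical.

```python
def conv_output_size(input_size, kernel_size, stride, padding=0):
    """Spatial size of a convolution output (floor convention)."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


def windows_overlap(kernel_size, stride):
    """Adjacent filter applications overlap when stride < kernel size."""
    return stride < kernel_size


INPUT_SIZE = 256  # input images are resized to 256 x 256
STRIDE = 2        # stride stated in the text

for kernel in (3, 5, 7):
    out = conv_output_size(INPUT_SIZE, kernel, STRIDE)
    overlap = windows_overlap(kernel, STRIDE)
    print(f"{kernel}x{kernel} filter, stride {STRIDE}: "
          f"{out}x{out} output, overlapping windows: {overlap}")
```

With a stride of 2, all three filter sizes produce overlapping windows, and each roughly halves the spatial resolution.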

Constraint formulation.
High precision multi-classifier with loss is the last and crucial step in this study.
Softmax with loss is used as a multi-class classifier that is extended from the logistic regression algorithm in the task of binary classification to multi-classification.
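Mathematically, the training set includes $N$ histopathological images: $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is the $i$-th image and $y_i \in \{1, 2, \ldots, k\}$ is its label. In this study, the number of breast cancer classes $k$ is eight. For a given $x_i$, we use the hypothesis function to estimate the probability that $x_i$ belongs to class $j$, $p(y_i = j \mid x_i)$. The hypothesis function $h_\theta(x_i)$ is
$$h_\theta(x_i) = \begin{bmatrix} p(y_i = 1 \mid x_i; \theta) \\ \vdots \\ p(y_i = k \mid x_i; \theta) \end{bmatrix} = \frac{1}{\sum_{j=1}^{k} e^{\theta_j^{T} x_i}} \begin{bmatrix} e^{\theta_1^{T} x_i} \\ \vdots \\ e^{\theta_k^{T} x_i} \end{bmatrix} \tag{4}$$
The loss function is
$$J(x, y, \theta) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{k} 1\{y_i = j\} \log \frac{e^{\theta_j^{T} x_i}}{\sum_{l=1}^{k} e^{\theta_l^{T} x_i}} \tag{5}$$
where $1\{y_i = j\}$ is an indicator function defined as
$$1\{y_i = j\} = \begin{cases} 1, & y_i = j \\ 0, & y_i \neq j \end{cases} \tag{6}$$
The loss function in equation (5) measures the degree of classification error. During training, the model keeps adjusting the network parameters in order to drive the error toward zero. However, in fine-grained multi-classification, equation (5) tends to squeeze the images from the same class into a corner of the feature space, so the intra-class variance is not preserved 15. To address this limitation, we improve the loss function of the softmax classifier by formulating a novel distance constraint for the feature space 15.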
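As a concrete illustration of the softmax classifier with loss described above, the following sketch computes class probabilities from per-class scores and the mean negative log-probability of the true class. It assumes the scores (the values $\theta_j^{T} x_i$) are already available as plain lists and uses zero-based class labels; the function names are illustrative.

```python
import math


def softmax(scores):
    """Convert per-class scores into probabilities that sum to one."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def softmax_loss(batch_scores, labels):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for scores, label in zip(batch_scores, labels):
        total -= math.log(softmax(scores)[label])
    return total / len(labels)
```

For example, with eight classes and identical scores for every class, each class receives probability 1/8 and the loss equals log 8, which is the value expected from an uninformative classifier.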
Scientific Reports | 7: 4172 | DOI:10.1038/s41598-017-04075-z
Theoretically, four breast cancer histopathological images are given as input: $x_i$, $p_i^{+}$, $p_i^{-}$, and $n_i$, where $x_i$ is an image of a specific class, $p_i^{+}$ belongs to the same sub-class as $x_i$, $p_i^{-}$ belongs to the same intra-class as $x_i$, and $n_i$ represents the inter-class. Ideally, the hierarchical relation among the four images can be described as follows:
$$D(x_i, p_i^{+}) + m_1 < D(x_i, p_i^{-}) + m_2 < D(x_i, n_i) \tag{7}$$
where $D$ is the Euclidean distance between two samples in the feature space, and $m_1$ and $m_2$ are hyperparameters that control the margins of the feature space. Then the loss function is composed with the hinge loss function:
$$E_t = \frac{1}{2N} \sum_{i=1}^{N} \max\{0,\; D(x_i, p_i^{+}) - D(x_i, p_i^{-}) + m_1 - m_2\} + \frac{1}{2N} \sum_{i=1}^{N} \max\{0,\; D(x_i, p_i^{-}) - D(x_i, n_i) + m_2\} \tag{8}$$
where $m_1 > m_2 > 0$, so that both margins in equation (8) are positive. Meanwhile, the output of the CSDCNN is fed into the softmax loss layer to compute the classification error $J(x, y, \theta)$. Finally, we can write the novel loss function by combining equation (5) and equation (8) as follows:
$$E = \lambda J(x, y, \theta) + (1 - \lambda) E_t \tag{9}$$
where $\lambda$ is the weight factor controlling the trade-off between the two types of losses. We restrict $0 < \lambda < 1$, and $\lambda$ is finally set to 0.5, which achieved optimal performance by cross validation. We optimize equation (9) by standard stochastic gradient descent with momentum.
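The distance constraint and its combination with the softmax loss can be sketched for a single group of four feature vectors as follows. This is a sketch under stated assumptions rather than the authors' implementation: the exact form of the two hinge terms, the factor 0.5, the choice of which loss is weighted by λ, and the requirement that the margins satisfy m1 > m2 > 0 (so that both hinge margins are positive) are all assumptions, as are the function names.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit Euclidean length."""
    norm = math.sqrt(sum(a * a for a in vec)) or 1.0  # guard zero vectors
    return [a / norm for a in vec]


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def structure_loss(x, p_pos, p_neg, n, m1, m2):
    """Hinge penalty for one (x, p+, p-, n) group of feature vectors.

    Assumed ordering: p+ (same sub-class) closest to x, then p-
    (same coarse class), then n (other coarse class), with m1 > m2 > 0.
    """
    d_pos = euclidean(x, p_pos)
    d_neg = euclidean(x, p_neg)
    d_n = euclidean(x, n)
    return 0.5 * (max(0.0, d_pos - d_neg + m1 - m2)
                  + max(0.0, d_neg - d_n + m2))


def combined_loss(softmax_term, structure_term, lam=0.5):
    """Trade off the classification loss and the distance constraint."""
    return lam * softmax_term + (1.0 - lam) * structure_term
```

When the four feature vectors already respect the assumed ordering with sufficient margins, `structure_loss` returns zero and only the softmax term contributes to the combined loss.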
Workflow overview. Our overall workflow can be understood as three top-down multi-classification stages, as shown in Fig. 2. We describe the stages as follows:
• Training stage: the goal of the training stage is to learn a sufficient feature representation and to optimize the distances between the feature spaces of different classes. After importing four breast cancer histopathological images ($x_i$, $p_i^{+}$, $p_i^{-}$, $n_i$) at the same time, the CSDCNN first learns the hierarchical feature representation, with the four branches sharing the same weights and biases. The high-level feature maps then undergo $\ell_2$ normalization. The outputs of the four branches are used to maximize the inter-class Euclidean distance and minimize the intra-class distance. Finally, the two types of losses are optimized jointly by a stochastic gradient descent method.
• Validation stage: the validation stage aims to fine-tune hyperparameters, avoid overfitting, and select the best model among the epochs for testing. The validation process yields the optimal multi-classification model for the breast cancer histopathological images, as illustrated in the validation block of Fig. 2.
• Testing stage: the testing stage aims to evaluate the performance of the CSDCNN. The feature learning process of the CSDCNN is shown in the testing block of Fig. 2. After the input layer, low-level features that include colors, textures, and shapes are learned by the former layers. Through the repeated high-level layers, discriminative semantic features are extracted and fed into a trainable classifier.
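Feeding four images at a time during the training stage requires selecting them according to the class structure. The sketch below shows one way this could be done for the eight BreaKHis classes, which are grouped into benign and malignant tumors. The sampling rule is an interpretation of the text: the second image shares the sub-class of the first, the third shares only the coarse class (benign or malignant), and the fourth comes from the other coarse class. The helper name and the data layout (a dictionary from sub-class name to a list of image identifiers, containing all eight sub-classes with at least two images each) are hypothetical.

```python
import random

# Two-level label structure of BreaKHis: coarse class -> sub-classes.
CLASS_STRUCTURE = {
    "benign": ["adenosis", "fibroadenoma",
               "phyllodes_tumor", "tubular_adenoma"],
    "malignant": ["ductal_carcinoma", "lobular_carcinoma",
                  "mucinous_carcinoma", "papillary_carcinoma"],
}
COARSE_OF = {sub: coarse
             for coarse, subs in CLASS_STRUCTURE.items() for sub in subs}


def sample_quadruplet(images_by_subclass, rng):
    """Return (x, p_pos, p_neg, n) image identifiers for one training step."""
    sub = rng.choice(sorted(images_by_subclass))
    coarse = COARSE_OF[sub]
    siblings = [s for s in CLASS_STRUCTURE[coarse] if s != sub]
    others = [s for s in COARSE_OF if COARSE_OF[s] != coarse]

    # x and p_pos: two distinct images of the same sub-class.
    x, p_pos = rng.sample(images_by_subclass[sub], 2)
    # p_neg: same coarse class, different sub-class.
    p_neg = rng.choice(images_by_subclass[rng.choice(siblings)])
    # n: image from the other coarse class.
    n = rng.choice(images_by_subclass[rng.choice(others)])
    return x, p_pos, p_neg, n
```

Passing a seeded `random.Random` instance as `rng` keeps the sampled quadruplets reproducible across runs.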
Finally, we tried two training strategies. The first is training the "CSDCNN from scratch", that is, directly training the CSDCNN on the BreaKHis dataset. The other is based on transfer learning: the CSDCNN is first pre-trained on ImageNet 37 and then fine-tuned on BreaKHis. The "CSDCNN from scratch" achieved lower recognition rates, so we chose transfer learning as the final strategy. In addition, the base learning rate of the CSDCNN was set to 0.01 and the number of training iterations was 5K, which gave the best accuracy on the validation and test sets.

Data augmentation.
We use multi-scale data augmentation and over-sampling to avoid overfitting and the imbalanced class problem. The training set is augmented by (1) intensity variation between −0.1 and 0.1, (2) rotation between −90° and 90°, (3) horizontal and vertical flips, and (4) translation by ±20 pixels. We also adopt random combinations of intensity variation, rotation, flip, and translation. Since the classes of breast cancer are imbalanced due to the large number of ductal carcinoma images, which is consistent with the Gaussian distribution and clinical regularity, we over-sample using the above augmentation methods to control the number of breast cancer histopathological images in each class.
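The augmentation ranges and the over-sampling step can be sketched as follows. The sketch only draws random augmentation parameters within the stated ranges and computes how many augmented copies each class would need; it does not transform images. Drawing every parameter independently and balancing each class up to the size of the largest class are assumptions, as are the function names.

```python
import random


def sample_augmentation(rng):
    """Draw one random combination of the four augmentation types."""
    return {
        "intensity_shift": rng.uniform(-0.1, 0.1),    # intensity variation
        "rotation_deg": rng.uniform(-90.0, 90.0),     # rotation angle
        "flip_horizontal": rng.random() < 0.5,        # horizontal flip
        "flip_vertical": rng.random() < 0.5,          # vertical flip
        "translation_px": (rng.randint(-20, 20),      # shift along x
                           rng.randint(-20, 20)),     # shift along y
    }


def oversampling_plan(class_counts):
    """Augmented copies to add per class so each matches the largest class."""
    target = max(class_counts.values())
    return {name: target - count for name, count in class_counts.items()}
```

In such a scheme, the plan would be computed on the training set only, and each additional copy would be generated from a randomly chosen training image of that class using parameters from `sample_augmentation`.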