A Review of Computer-Aided Expert Systems for Breast Cancer Diagnosis

Simple Summary: Breast cancer is one of the most commonly diagnosed diseases in females around the world. The most threatening situation is when the cancer spreads uncontrollably to other parts of the body, which can cause death. Early detection of breast cancer lowers the risk of death among patients and enables appropriate treatments to control the progression of the cancer. To diagnose breast cancer, highly detailed images of the breast tissue can be collected through histopathology, providing informative details that validate the stage of the cancer. The aim of this study is to investigate the techniques applied to histopathology images in diagnosing breast cancer.

Abstract: A computer-aided diagnosis (CAD) expert system is a powerful tool to efficiently assist a pathologist in achieving an early diagnosis of breast cancer. This process identifies the presence of cancer in breast tissue samples and the distinct cancer stages. In a standard CAD system, the main process involves image pre-processing, segmentation, feature extraction, feature selection, classification, and performance evaluation. In this review paper, we survey the existing state-of-the-art machine learning approaches applied at each stage, covering both conventional and deep learning methods, compare the methods, and provide technical details with their advantages and disadvantages. The aims are to investigate the impact of CAD systems using histopathology images, to identify deep learning methods that outperform conventional methods, and to provide a summary for future researchers to analyse and improve the existing techniques. Lastly, we discuss the research gaps of existing machine learning approaches and propose future direction guidelines for upcoming researchers.


Introduction
The human body is formed of trillions of cells. 'Cancer' is the term used when cells divide abnormally or uncontrollably, which can happen in various parts of the body. The disease type is categorised based on the part of the body in which the cancer occurs. This situation, if left unchecked, leads to death. Amongst the distinct types of cancer, the most common type occurring in females is breast cancer. According to the World Health Organisation (WHO), breast cancer is the most frequent cancer among women, affecting 2.1 million women each year. About 627,000 women died from breast cancer in 2018, accounting for around 15% of all cancer deaths among women [1]. In the United Kingdom, there were around 55,200 newly diagnosed breast cancer cases every year from 2015 to 2017, which is about 150 every day [2]. From Figure 1, we can observe that breast cancer has the highest diagnosis incidence among all the common cancers [1]. However, it has been proven that early detection of breast cancer can significantly increase the chances of a successful treatment plan and ensure the long-term survival of patients [3]. Statistically, if the disease is detected and diagnosed at an early stage, nearly all (98%) patients will survive for five years or more, compared to around 1 in 4 (26%) when the disease is diagnosed at a later stage [2]. The most common referral route for diagnosing breast cancer is the 'two-week wait' pathway [2]. The standard diagnostic procedure performed by pathologists usually requires extensive microscopic assessment. Therefore, an automated solution such as a computer-aided diagnosis (CAD) system not only contributes to an easier diagnostic process but also reduces subjectivity in diagnosis.
With the advanced development of artificial intelligence, many machine learning techniques have been applied to CAD systems. These techniques can potentially outperform humans and learn more efficiently over time; integrating machine learning into diagnosis can therefore supply useful knowledge to assist pathologists in evaluating and analysing enormous amounts of medical data [4]. It can also speed up the process, given the capability to process large amounts of data much faster than manual diagnosis by a pathologist [4]. Breast cancer diagnosis can be considered a classification problem in machine learning, in which the result indicates which class of cancer a sample belongs to. Fundamentally, the main steps involved in developing the core of a computer-aided diagnosis (CAD) system for breast cancer are presented in Figure 2.
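As a toy illustration of this classification framing, the sketch below classifies hypothetical two-dimensional feature vectors (both the features and the data are invented purely for illustration) with a nearest-centroid rule, one of the simplest possible classifiers:

```python
import numpy as np

# Hypothetical training data: each row is a feature vector summarising one
# tissue image (the two features are invented purely for illustration).
X_train = np.array([
    [0.2, 0.1],   # benign samples cluster at low feature values
    [0.3, 0.2],
    [0.8, 0.9],   # malignant samples cluster at high feature values
    [0.7, 0.8],
])
y_train = np.array([0, 0, 1, 1])  # 0 = benign, 1 = malignant

def nearest_centroid_predict(X_train, y_train, x):
    """Assign x to the class whose mean feature vector lies closest."""
    labels = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in labels])
    return labels[np.argmin(np.linalg.norm(centroids - x, axis=1))]

print(nearest_centroid_predict(X_train, y_train, np.array([0.75, 0.85])))  # 1 (malignant)
```

Real CAD systems replace these toy features with texture, morphology, or learned descriptors, and the centroid rule with SVMs, naive Bayes, or deep networks, but the input/output contract is the same: feature vector in, class label out.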
Conventionally, several popular machine learning algorithms applied to classification problems include naïve Bayes [5], artificial neural networks [6], support vector machines (SVM) [7], and many more. However, these algorithms might not have the ability to account for issues such as imbalanced costs of misclassification between classes, leading to undesired consequences. Recently, deep learning methods were introduced to improve on conventional machine learning methods by extracting information automatically as part of the learning process, leading to better solutions [8]. Deep learning has been shown to outperform state-of-the-art methods in many medical imaging analysis tasks. Therefore, in this paper we will discuss and compare both approaches applied to develop a CAD system for breast cancer.
Breast cancer varies based on which part of the breast tissue becomes cancerous. Commonly, breast cancer starts in the cells that line the ducts of the breast; however, it may also grow in different areas of the breast such as the lobules, milk ducts or sometimes in between tissues, as illustrated in Figure 3 [9]. The term 'breast cancer' refers to a malignant tumour that has developed from cells in the breast that are considered cancerous and endanger health. The stage of this cancer is usually expressed as a number on a scale of 0 through IV, with stage 0 describing non-invasive cancers that are still within their original location and stage IV describing invasive cancers that have spread outside the breast [10]. In cases where cancer is detected but no cancer cells are visible in the lymph glands, the breast cancer is of a lower risk. When spreading occurs, it carries a substantial risk of death, meaning that the cancer cells have broken away from the breast tissue and can be carried to nearby lymph nodes by the lymph fluid (fluid that gathers waste products and drains into veins to be removed) [10]. Figure 4 demonstrates the lymph nodes around the breast [9]. Breast cancer can be distinguished as benign (non-cancerous) and malignant (cancerous/metastatic) tumours. Benign tissue refers to changes in normal tissue of breast parenchyma, which does not relate to the development of malignancy [11].
Contrarily, malignant tissue can be categorised into two types: in-situ carcinoma and invasive carcinoma. Additionally, in some cases benign breast tumours can be further divided into four subclass types: adenosis, fibroadenoma, phyllodes tumour, and tubular adenoma, whereas malignant breast tumours can be further divided into ductal carcinoma, lobular carcinoma, medullary carcinoma, mucinous carcinoma, tubular carcinoma, and papillary carcinoma [12].
Histopathology (histology) image samples of breast lesions are obtained through either needles or surgical operation, and are then processed and placed on a glass slide to undergo a staining process. Haematoxylin and eosin (H&E) and immunohistochemistry (IHC) are the most used histopathology staining protocols [4]. The development of slide scanners has digitised histopathological tissue sections and turned digital pathology into a routine practice [13]. Currently, histopathological images play a vital role in cancer diagnosis because of the large amount of information they provide for medical image analysis [14]. Whole-slide images (WSI) can contain multiple regions of breast lesion tissue, whereas microscopy images are patches derived from WSI, each representing one type of breast lesion only. In this paper we have chosen to study histopathology images of breast cancer in developing a machine learning based CAD system. Figure 5 demonstrates eight classes of breast cancer from the BreaKHis dataset [15].
Figure 5. Sample of distinct types of breast cancer histopathology images from the BreaKHis dataset [15].
The main contribution of this paper is to discuss the process, methods, comparisons, and remarks on developing a CAD expert system for breast cancer. The rest of the paper is organised as follows: Section 2 explains the publicly available datasets for breast cancer histopathology images. The process of using a computer-aided expert system on histopathology images is presented in Section 3, which includes techniques employed in (1) image pre-processing; (2a) conventional CAD methods that employ segmentation, feature extraction, feature selection (dimension reduction) and classification; (2b) deep-learning-based CAD; and (3) performance evaluation. Finally, Section 4 presents the conclusion, and future directions for researchers are given in Section 5.

Datasets for Breast Cancer Classification
In the field of medical image analysis, machine learning methodologies applied to histopathological images are developing rapidly. However, there is still a demand for an automatic system that achieves efficient and highly accurate results [14]. Obtaining a large and representative annotated dataset to develop a machine learning method for a CAD system is a challenging task [16]. Recently, there has been a rise in public challenges for breast cancer diagnosis, which has attracted many researchers to this area of study. This section describes various publicly accessible datasets to assist future research and development. Among them, the BreaKHis dataset [15] is the most used dataset by researchers for CAD of breast cancer in histopathology images [11,17-30].


Computer-Aided Diagnosis Expert Systems
CAD systems not only produce faster diagnosis results but have also emerged as an additional opinion that assists pathologists in avoiding overlooking abnormal features. This automated solution can be explained in two sub-categories:

1.
Computer-aided detection (CADe) systems, which locate suspicious abnormal regions in an image.
2.
Computer-aided diagnosis (CADx) systems, which determine the distinct types of breast cancer.
There are two approaches to developing a CAD system: the conventional method and the deep learning method. The main difference between them is that conventional CAD methods take the traditional approach of extracting features from an image based on human-defined descriptors to perform classification, whereas deep learning CAD methods learn automatically, discovering representations of the data by transforming the input information into multiple layers of abstraction [8]. Figure 6 illustrates these two methods for CAD systems.
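The data flow of the conventional pipeline can be made explicit with a skeleton in which every stage function is a trivial, hypothetical stand-in (none of these implementations is real; only the sequence of stages reflects the workflow described above):

```python
# Hypothetical stand-ins for each stage; only the sequencing is meaningful.
def preprocess(image):         return image                       # e.g. colour normalisation
def segment(image):            return [image]                     # locate regions of interest
def extract_features(regions): return [sum(r) for r in regions]   # hand-crafted descriptors
def select_features(feats):    return feats[:1]                   # dimension reduction
def classify(feats):           return "malignant" if feats[0] > 1 else "benign"

def conventional_cad(image):
    """Conventional CAD: explicit segmentation and human-defined features."""
    regions = segment(preprocess(image))
    return classify(select_features(extract_features(regions)))

# A deep learning CAD system would collapse the middle stages into a single
# learned model mapping the pre-processed image directly to a class.
print(conventional_cad([0.9, 0.8]))  # "malignant"
```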

Image Pre-Processing
Image pre-processing is an effective route to apply as data preparation at the first step to make raw data more suitable for further analysis. In the case of histopathology images, the most used pre-processing technique is colour normalisation because of the colour variation obtained in these types of images and the powerful impact on the machine learning model. Data augmentation is another commonly used technique for a small dataset. In this section, the techniques of (1) colour normalisation and (2) data augmentation are presented.

(1) Colour normalisation: The inconsistent appearance of stained sections is amongst the foremost challenges in analysing histopathological images [40]. This is because samples are collected under varying conditions of tissue slicing, preparation, image acquisition, noise, lighting, and staining protocols while capturing the digital image [40]. These variations can produce samples with different colour intensities [41]. Research studies [18,42] have shown the significant effect of stain normalisation in enhancing the performance of breast cancer classification. Here, a few colour normalisation techniques are investigated, categorised into three types of method: global colour normalisation, the supervised method for stain separation, and the unsupervised method for stain separation.
• Global colour normalisation: This method is suitable for histology images due to comprehensible values of the autocorrelation coefficient or the spatial dependency of pixel intensity. It separates colour and intensity information using principal component analysis (PCA) [43]. Reinhard et al.'s method was one of the first techniques; it uses simple statistical analysis to achieve colour correction by choosing an appropriate source image as a benchmark and applying its colour characteristics to all the other images [43]. It heuristically estimates, without supervision, the absorbance coefficients of the stains for every image and the staining concentrations for every pixel to recompose the images [43].
• Supervised method for stain separation: In this method, images are converted to optical density (OD) space according to Beer's law [44], which suggests that colour stains act linearly in OD space, given in Equation (1).
V = log₁₀(I₀/I) (1)

where V represents the intensity in OD space, I represents the intensity in RGB space, and I₀ represents the illuminating intensity incident on the sample [45]. Khan et al. proposed a method that uses stain colour descriptors to compute image-specific stain matrices for stain normalisation [46]. Stain separation is then applied to obtain the stain concentration values from the image and provide a nonlinear (spline-based) mapping function, and all images are replaced using the normalised stain channels [46].
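Equation (1) can be applied pixel-wise; below is a minimal sketch, where the +1 offset (an implementation convention, not part of the equation) avoids taking the logarithm of zero for fully dark pixels:

```python
import numpy as np

def rgb_to_od(I, I0=255.0):
    """Convert RGB intensities to optical density: V = log10(I0 / I).
    Stains mix approximately linearly in this OD space (Beer's law)."""
    I = np.asarray(I, dtype=float)
    return np.log10(I0 / (I + 1.0))  # +1 avoids log(0) on black pixels

v = rgb_to_od([254, 127, 0])
print(v)  # bright pixel -> OD near 0; dark (heavily stained) pixel -> high OD
```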
• Unsupervised method for stain separation: Training is not required because the method is expected to learn by itself [47]. Macenko et al. first proposed using the singular value decomposition (SVD) of the optical density of images to perform quantitative-analysis-based colour normalisation [48]. Kothari et al. then proposed a method based on histogram specification, using quantile normalisation on the distinct colour channels obtained from images to match each image to the target image's histogram colour channels [49]. Bejnordi et al. later proposed an improved version: whereas earlier algorithms rely solely on colour features, theirs makes use of spatial information to achieve robustness against severe colour and intensity variations [50]. The comparison of colour normalisation methods is provided in Table 1. Recently, due to the significant performance and stability of these methodologies, many researchers have adopted the popular methods above as part of their colour normalisation processes. Table 2 demonstrates the methodology used by several recent research studies for breast cancer CAD systems.
(2) Data augmentation: A data-space solution to the problem of limited data that enhances the size of training datasets to generate a better learning model [56]. Tellez et al. showed that to obtain a particularly reliable performance of a CAD system on histopathology images, colour normalisation should be used along with data augmentation [57]. This procedure applies data warping and oversampling to increase the sample size of the training dataset, since limited datasets and overfitting are common challenges [56]. These processes include various image transformations that modify the image morphology [57,58].
If we look at an image from a single perspective and make a determination, we are more likely to err than if we examine it from several perspectives before making the final determination. Applied to breast cancer analysis, checking an image from several more perspectives provides a more confident and accurate answer as to which class it belongs to. Thus, this procedure provides a broader interpretation of the original image. The comparison of data augmentation techniques applied by several research studies is provided in Table 3.
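As a minimal sketch of such transformations, the snippet below generates the eight rotation/flip variants of a patch; because histology patches have no canonical orientation, all variants remain valid training samples:

```python
import numpy as np

def augment(patch):
    """Return the eight rotation/flip variants of a 2-D image patch."""
    variants = [np.rot90(patch, k) for k in range(4)]  # 0/90/180/270 degrees
    variants += [np.fliplr(v) for v in variants]       # mirrored copies
    return variants

patch = np.arange(16).reshape(4, 4)  # stand-in for a grey-level patch
augmented = augment(patch)
print(len(augmented))  # 8 views of the same labelled sample
```

Shearing, scaling, brightness, and contrast changes (as compared in Table 3) extend the same idea but require interpolation and care not to distort diagnostically relevant morphology.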

Table 3. Comparison of data augmentation techniques: cropping/shearing, rotation, translation, shifting, scaling, zooming, contrast, fill mode, and brightness.

Segmentation
The segmentation process locates the edges and boundaries of regions in a histopathology image in order to extract the cells within it. It is crucial for identifying the region of interest (ROI) and highlighting these significant regions in the images. This procedure involves partitioning the image into non-overlapping regions [61,62], as seen in Equation (2).

R₁ ∪ R₂ ∪ … ∪ Rₙ = R, Rᵢ ∩ Rⱼ = ∅, where i ≠ j (2)

Table 4 provides a summary of each commonly used segmentation technique along with its definition, advantages, and limitations. Figure 7 illustrates a general overview of the approached techniques with some examples.

Segmentation
The segmentation process takes part to locate the edges and boundaries of regions in a histopathology image to extract the cells in the images. It can be crucial to identify the region of interest (ROI) and highlight these significant regions in the images. This procedure involves partitioning the image into non-overlapping regions [61,62], as seen in Equation (2). ∪ = ∩ = ∅, ℎ ≠ (2)

Segmentation
The segmentation process takes part to locate the edges and boundaries of regions i a histopathology image to extract the cells in the images. It can be crucial to identify th region of interest (ROI) and highlight these significant regions in the images. This proc dure involves partitioning the image into non-overlapping regions [61,62], as seen i Equation (2). ∪ = ∩ = ∅, ℎ ≠ (2 Table 4 provides a summary for each commonly used segmentation technique alon with definition, advantages, and limitations. Figure 7 illustrates a general overview of th approached techniques with some examples.

Segmentation
The segmentation process takes part to locate the edges and boundaries a histopathology image to extract the cells in the images. It can be crucial to region of interest (ROI) and highlight these significant regions in the images dure involves partitioning the image into non-overlapping regions [61,6 Equation (2). ∪ = ∩ = ∅, ℎ ≠

Segmentation
Table 4. Commonly used segmentation techniques with their definitions, advantages, and limitations.

• Threshold-based: Transforms every pixel based on a threshold value to separate significant regions from the background. Limitations: dependent on the selection of an effective/correct threshold value; not suitable for histology images because of the high complexity and various intensity distributions in images.

• Cluster-based (machine learning): Objects in the image are categorised into specific regions (groups) based on their similarity in pixels. Advantages: efficient; easy to implement; able to preserve the information; suitable for microscopic biopsy images. Limitations: solution dependent on the initialization; need to specify the number of clusters; sensitive to outliers.

• Energy-based optimization: Contours the object of interest by minimizing/maximizing a predefined cost function. Limitation: requires defining an effective cost function.

• Feature-based: Uses a model to train and learn the features to determine which pixels are the ROI.
• Region-based segmentation: There are two main techniques, which are (1) region growing and (2) region splitting and merging. Rouhi et al. proposed the application of automated region growing for segmentation on breast tumour histology images by using an artificial neural network (ANN) to obtain a threshold [62]. Rundo et al. used split and merging algorithms based on the seed selection by an adaptive region growing procedure [63]. Lu et al. applied a multi-scale Laplacian of Gaussian (LoG) [64] to detect the seed points and feed the filtered image to a mean-shift algorithm for segmentation, followed by some morphological operations [65].
• Edge-based segmentation: To obtain critical properties, this structural technique can be implemented in several methods for recognising the edges, such as Sobel [66], Watershed [67], Prewitt [68], Laplace [69], Canny [70], and LoG [64]. This process is illustrated in Figure 8.

• Threshold-based segmentation: To produce a less complex image, the main concept is to transform every pixel based on a threshold value; any pixel with an intensity less than a threshold value/limit T (a constant) will be replaced with a black pixel (0), and otherwise replaced with a white pixel (1). The input image g(x, y) transformed to a threshold image f(x, y) can be represented mathematically as shown in Equation (3).

f(x, y) = 1 if g(x, y) ≥ T, and f(x, y) = 0 if g(x, y) < T (3)
A few popular techniques applied in this approach are Otsu thresholding [74], grey-level thresholding [72], and Gaussian matrix thresholding [72]. Zarella et al. proposed a scheme to segment breast nuclei from other parts of the cell using Otsu thresholding [75]. Saha et al. proposed an automatic nucleus segmentation on the image using histogram-based thresholding, with a result of 97% accuracy in nucleus detection [76]. Moncayo et al. used Maximally Stable Extremal Regions (MSER) to perform segmentation of nuclei regions on the image's haematoxylin contribution map, in which several thresholds are applied to the image and areas that change minimally are identified as MSER, followed by some further morphological operations [77]. A novel approach was proposed by Khairuzzaman and Chaudhury to apply multilevel thresholding based on the Grey Wolf Optimizer (GWO) using Kapur's entropy and Otsu's between-class variance functions [78]. Sirinukunwattana et al. proposed a thresholding method to group intensity features represented by a sparse coding to create a dictionary [79].
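As an illustration of the threshold transform in Equation (3) and Otsu's criterion discussed above, the following NumPy sketch binarises an image and searches for the threshold that maximises the between-class variance. The function names are my own, not taken from the cited works:

```python
import numpy as np

def binarize(g, T):
    """Eq. (3): pixels with intensity below T become 0 (black), others 1 (white)."""
    return (g >= T).astype(np.uint8)

def otsu_threshold(g):
    """Otsu's method: pick the T that maximises the between-class variance."""
    hist = np.bincount(g.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()   # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0        # class means
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2  # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```

In practice, library routines (e.g., in image-processing toolkits) implement the same search more efficiently over the cumulative histogram.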

• Cluster-based segmentation: This can be described in terms of two clustering methods, hierarchical and partitioning [80]. Hierarchical clustering performs recursively to explore nested clusters in agglomerative (bottom-up) or divisive (top-down) ways [80], whereas partitioning clustering iteratively divides the data and can be categorised into hard clustering and fuzzy clustering [81]. Kowal et al. applied a cluster approach algorithm for nuclei segmentation from biopsy microscopic images, and achieved a high classification accuracy [82]. Kumar et al. used a k-means clustering based segmentation algorithm and mentioned that this method performs better in comparison to other commonly used segmentation methods [83]. A two-step k-means was applied for segmentation by Shi et al. to consider the local correlation of pixels; they first generate a poorly segmented cytoplasm, then in a second step the segmentation does not take into account the nuclei identified during the first clustering; finally, a watershed transform was applied to complete the segmentation [84]. Maqlin et al. suggested a segmentation method based on the k-means clustering algorithm to recover the missing edge boundaries based on a convex grouping algorithm, which was suitable for the open vesicular and patchy types of nuclei that are commonly obtained in high-risk breast cancers [85].
• Energy-based optimization: This technique defines a cost function, and the process minimizes/maximizes the function based on the object of interest (ROI) in the images. A study by Belsare et al. used a spatio-colour-texture graph cut segmentation algorithm to segment the epithelial lining surrounding the lumen [86]. Wan et al. used a combination of boundary and region information in a hybrid active contour method to achieve an automated segmentation of the nuclear region [87], where the energy function was defined as Equation (4).
where Z is the image to be segmented, H(φ) denotes the Heaviside function, ω represents the image domain, G = G(|∇ Z|) is the gradient of the image, and α and β are pre-defined weights for the balancing of the two terms. Zhang et al. proposed a three-phase level set method to set contour segments into groups and achieved high accuracy [88]. Jia et al. used a rough segmentation method to combine watershed and improved Gradient Vector Flow (GVF) Snake model to separate nuclei/cells in an image from the background to enhance the segmentation accuracy [89].
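The cluster-based approach described above can be sketched as a plain k-means over pixel intensities. This is a minimal illustration only (a realistic pipeline would cluster colour or texture features and post-process the label map); it is not any cited author's implementation:

```python
import numpy as np

def kmeans_segment(image, k=2, iters=20):
    """Cluster pixel intensities with k-means (hard clustering) and return a label map."""
    pixels = image.reshape(-1, 1).astype(float)
    # deterministic initialisation: spread centroids over the intensity range
    centroids = np.quantile(pixels, np.linspace(0, 1, k)).reshape(k, 1)
    for _ in range(iters):
        dist = np.abs(pixels - centroids.T)   # (n_pixels, k) distances to centroids
        labels = dist.argmin(axis=1)          # assign each pixel to its nearest centroid
        for j in range(k):
            if np.any(labels == j):           # move centroid to the mean of its cluster
                centroids[j] = pixels[labels == j].mean()
    return labels.reshape(image.shape)
```

The sensitivity to initialisation and the need to choose k noted in Table 4 are visible here: the quantile initialisation is one common way to make the result deterministic.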
• Feature-based segmentation: Automatic segmentation based on feature learning has been commonly used for analysing medical images [90]. Song et al. used a multi-scale convolutional network to accurately segment cervical cytoplasm and nuclei [91]. Xu and Huang applied a distributed deep neural network architecture to detect cells [92]. Rouhi et al. also proposed a cellular neural network (CNN) to perform segmentation by using a genetic algorithm (GA) to determine the parameters [62]. Graham et al. proposed a deep learning method called HoVer-Net, a network that targets simultaneous segmentation and classification of nuclei based on horizontal and vertical distance maps to separate clustered nuclei [93]. Zarella et al. trained an SVM model to learn the features to distinguish between stained pixels and unstained pixels using the HSV colour space to identify regions of interest [94]. A summary of different segmentation approaches by several researchers is provided in Table 5.

Table 5. Summary of different segmentation approaches.
[78] Multilevel thresholding (GWO) with Kapur's entropy and Otsu's between-class variance functions: more stable and yields solutions; performs faster than BFO, however slower than the PSO-based method.
[76] 2015, histogram-based thresholding: hard to determine a suitable threshold.
[79] 2015, dictionary and thresholding: a mitotic cell detection system using a dictionary of cells.
[75] 2017, Otsu thresholding: does not require definition of many parameters.

Feature Extraction
Feature extraction is one of the essential steps to pick out a set of features that contain the most effective, relevant, and discriminating information and characteristics of the ROI/entire image to be employed for classification. Overall, we can divide the image feature descriptors into three dimensions (shape, pattern and spectra, and density). From Figure 9, we can observe a feature taxonomy based on feature descriptor dimensions from the 3D axis [96].

Figure 9. Taxonomy for feature descriptor dimensions created by Krig, S. [96].
In this section, feature extraction methods observed in the existing literature will be presented. Then, we will discuss the types of features extracted from images.

• Morphological Features: Describe the details of the image regarding geometric information such as the size (radii, perimeter, and area) and shape (smoothness, compactness, symmetry, roundness, and concavity) of a cell [97].
• Textural Features: Collect information on the varying intensity of every pixel value from histology images by applying several methods to obtain a number of properties such as smoothness, coarseness, and regularity [97].
• Graph-Based Topological Features (architectural features): Describe the structure and spatial arrangement of nuclei in a tumour tissue [97]. When dealing with histopathological images, the arrangement and shape of nuclei are connected to the cancer development; therefore, this architecture may be calculated using graph-based techniques [98,99]. There are many different topology-based features, including the count of the number of nodes, edges, edge length, and roundness factor to detect the tissues [100,101]. There are three types of common graph features: Voronoi diagram, Delaunay triangulation, and minimum spanning tree, as shown in Figure 10.
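A minimal sketch of how a few morphological descriptors (area, perimeter, compactness) might be computed from a binary nucleus mask. The pixel-counted perimeter is a crude approximation, and the function is illustrative only, not taken from the cited works:

```python
import numpy as np

def morphological_features(mask):
    """Simple geometric descriptors of a binary nucleus mask (True = nucleus)."""
    area = mask.sum()
    # crude perimeter: foreground pixels with at least one background 4-neighbour
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = (mask & ~interior).sum()
    # isoperimetric compactness P^2 / (4*pi*A); crude with a pixel-counted perimeter
    compactness = perimeter ** 2 / (4 * np.pi * area)
    return {"area": int(area), "perimeter": int(perimeter),
            "compactness": float(compactness)}
```

Textural descriptors (e.g., grey-level co-occurrence statistics) would instead be computed on the raw intensities inside the mask rather than on the mask itself.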
The authors of [65] extracted a combination of around 142 morphological and textural features, which included the size, mean, stain standard deviation, sum, entropy, and mean of the gradient magnitude image; 3 Tamura texture features; 44 grey-level run-length matrix-based textural features; and 88 co-occurrence matrix-based Haralick texture features on the MITOS-ATYPIA-14 dataset. Khan et al. proposed extracting textural features of geodesic means of region covariance (RC) descriptors by calculating RC descriptors for different segmented regions, whereas a single descriptor for the whole image is derived by the geodesic geometric mean of these calculated RCs on the MITOS-12 dataset [110]. Maroof et al. proposed a method using a hybrid feature space to combine colour features with morphological and texture features, and then changed the colour channel to calculate normalised and cumulative histograms in the wavelet domain on the MITOS-ATYPIA-14 dataset [111]. On the same dataset, Wan et al. applied a dual-tree complex wavelet transform (DT-CWT) to describe the images in the context of mitosis detection in breast cancer, and the generalized Gaussian distribution and symmetric alpha-stable distribution parameters were used as features [108]. Tashk et al. combined LBP, morphometric, and statistical features extracted from mitotic candidates on the MITOS-12 dataset [112].
Recently, Mahmood et al. proposed a new methodology applying post-processing techniques using feature extraction of HOG, LBP, statistical, and colour features to refine the detected mitosis cells as accepted or rejected through a threshold value based on the extracted features on the MITOS-ATYPIA datasets [59]. Bardou et al. developed two approaches, where the first extracts local descriptors of dense scale-invariant feature transform (DSIFT) features and speeded-up robust features (SURF) encoded by two coding models (bag of words and locality-constrained linear coding) on the BreaKHis dataset [26].

Feature Selection (Dimension Reduction)
Feature selection is the selection of a subset of the relevant features used in model construction [113]. In machine learning, we want to avoid feature redundancy and the 'curse of dimensionality' problem. The 'curse of dimensionality' means that the training data have an exceptionally low density, which prevents an accurate estimation and defeats the purpose of training a high-accuracy classification model. This phenomenon eventually impacts the generalization performance in a negative way, for example, through unstable estimation, overfitting issues, and local convergence; the large estimation error can easily compromise the prediction advantage provided by the greater representation power [114].
Therefore, this process is crucial because popular classification methods such as artificial neural network (ANN) and support vector machine (SVM), which are highly efficient for classification problems, tend to be sensitive to the dimensionality of data [115]. Additionally, data that consist of complex features reflect on quality-related issues such as the presence of noise, outliers, missing or duplicate data, and data that are biased or unrepresentative [116]. This process to reduce dimensionality can significantly eliminate irrelevant features, while the reduction in noise in machine learning contexts can produce a more robust learning model due to the association of fewer features [116].
One of the most common traditional approaches for this process is constructing new dimensions by mapping the original feature space into a new feature space with reduced dimensions. Common techniques used in this process are principal component analysis (PCA) and using a Pearson correlation matrix to construct a hierarchical representation of the data [115]. Other techniques include linear discriminant analysis (LDA), independent component analysis, and manifold learning. These techniques take the auto-covariance to solve the problem by transforming the high dimensional correlated feature set to a reduced feature set with lower dimensions.
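The PCA mapping described above can be sketched with a singular value decomposition: centre the feature matrix, find the principal directions, and project onto the top few. This is a generic illustration, not a specific cited pipeline:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project a (samples x features) matrix onto its top principal components."""
    Xc = X - X.mean(axis=0)                  # centre each feature
    # rows of Vt are principal directions, ordered by explained variance
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T          # reduced representation
```

For correlated features, most of the variance collapses onto the first few components, which is exactly the dimensionality reduction the classifiers downstream benefit from.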
However, these traditional approaches focus on choosing the most relevant features but disregard the fundamental interdependent structure of the features [117]. Recently, popular approaches have used heuristic search methods to select essential features from the original feature space by applying methods such as genetic algorithms, simulated annealing, boosting, and particle swarm optimization. In [117] and [118], the authors proposed applying a particle swarm optimizer (PSO) as the feature selection method to reduce the high dimensionality. In [119], the authors applied a genetic algorithm (GA) to select the best features and optimal parameter values of the machine learning classifiers. These recently proposed methods address the fundamental interdependent structure of the features that traditional feature selection methods disregard. Tambasco Bruno et al. reduced their feature space by using an analysis of variance (ANOVA) [107].

Classification
The final set of features will then be fed as input to a classifier to estimate the breast cancer classes. The following are a few commonly applied classification methods.

• Nearest Neighbour: A non-parametric approach which falls under supervised learning, widely used for both pattern recognition and classification applications [120]. The algorithm assigns each new input point to the class of its closest points in the data; the calculation of distance varies, but Euclidean distance is a common approach [121]. Let p and q be two datapoints of n dimensions; then, the distance between p and q can be expressed by the Euclidean distance shown in Equation (5).

d(p, q) = √((p1 − q1)² + (p2 − q2)² + ... + (pn − qn)²) (5)
Then, the algorithm compares the distance between points and classifies it into different categories [121]. Kumar et al. applied a k-nearest neighbour classifier to classify cancer and non-cancerous biopsy breast images, and have suggested that this classifier performs the best among their studies [83]. Murtaza et al. experimented with six different machine learning classifiers and showed that the KNN algorithm performs the best [20].
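A minimal sketch of the k-nearest-neighbour rule using the Euclidean distance of Equation (5); the function name and majority-vote tie-breaking are my own illustrative choices:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points (Eq. (5))."""
    d = np.sqrt(((X_train - x) ** 2).sum(axis=1))   # Euclidean distances to all points
    nearest = np.argsort(d)[:k]                     # indices of the k closest points
    votes = y_train[nearest]
    labels, counts = np.unique(votes, return_counts=True)
    return labels[counts.argmax()]                  # majority label among neighbours
```

The choice of k and of the distance metric are the main hyperparameters; odd k avoids ties in binary problems.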
• Support Vector Machine (SVM): Vapnik et al. proposed this method, which works by mapping input information (feature vectors) to a higher dimensional space to obtain a hyperplane that can separate the labels/classes [122]. An optimal hyperplane can be obtained by maximizing the distances between the support vectors (the data points closest to the boundary of the class) of two classes [123][124][125]. Recently, several research studies on breast cancer using histopathology images were performed by applying SVM classifiers [15,30,31,54]. Korkmaz and Poyraz proposed a classification framework focusing on minimum redundancy, maximum relevance feature selection, and least square SVM (LSSVM); their results claimed to be 100% accurate with only four false negatives for benign tumours in a three-class problem; however, no further evaluation was performed [126]. Chan and Tuszynski applied an SVM classifier on their fractal features; their results achieved a 97.9% F-score for magnification level 40× on the BreaKHis dataset [23]. Bardou et al. have also applied an SVM approach to classify the images using handcrafted features [26].
• Artificial Neural Network (ANN): ANN is inspired by human perception and can model complex nonlinear functions. The basic architecture of ANN starts by receiving input data xi, calculating each piece of input information by multiplying it with its corresponding weight wij, and obtaining a weighted output f(xj) with the support of a defined activation function, until reaching the output layer. Figure 11 below demonstrates the basic structure of a single neuron in a feed-forward ANN [127]. Kassani et al. applied a multi-layer perceptron classifier on four different benchmark datasets and achieved the highest accuracy of 98.13% [19].
Figure 11. Basic structure of feed-forward ANN [127].
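The weighted-sum-plus-activation computation of a single neuron described above can be sketched in one line; the function name and choice of tanh as the activation are illustrative assumptions:

```python
import numpy as np

def neuron(x, w, b, activation=np.tanh):
    """Single feed-forward neuron: weighted sum of inputs passed through an activation."""
    return activation(np.dot(w, x) + b)
```

A full network stacks many such units into layers and learns w and b by backpropagation.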
• Decision Tree: A decision tree algorithm is a supervised learning method for classification derived from the 'divide and conquer' methodology. A complete decision tree is built based on the feature space and labels; every new prediction will traverse from the root to a leaf node to produce an output. Asri et al. applied classification by using the C4.5 algorithm, an approach with a total of 11 features, and obtained 91.13% accuracy [128]. Extreme gradient boosting (XGBoost) is a new tree-based algorithm that has been increasing in popularity for data classification recently, and has proved to be a highly effective method [129]. Vo et al. have also applied gradient boosted trees as their breast cancer detection classifier [18].
• Bayesian Network: A Bayesian network (BN) calculates probabilistic statistics to form a representation of the relationships among a set of features using an acyclic graph, as shown in Figure 12, along with the value of the conditional probabilities for each feature [130]. This type of classifier is commonly used for calculating probability estimations rather than predictions [116].
• Ensemble Classifier: This approach combines a few classifier methods instead of using a single classifier to produce a more accurate result. Commonly used methods to build an ensemble classifier are bagging, boosting, and the random subspace method [131]. T.K. Ho proposed a random subspace classifier, in which a random feature subset is picked from the original dataset for training each classifier; a voting scheme is then applied to produce a unique output from all the outputs of the combined classifiers [132]. Alkassar et al. applied an ensemble classifier that chooses the maximum prediction score from a combination of decision tree, linear and quadratic discriminant, logistic regression, naive Bayes, SVM, and KNN [22].
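As a minimal illustration of the ensemble idea, a majority-vote combiner over the outputs of several classifiers might look like the sketch below. This is illustrative only (Alkassar et al., for instance, used a maximum-score rule rather than plain voting):

```python
import numpy as np

def majority_vote(predictions):
    """Combine per-classifier predictions (one row per classifier) into one
    label per sample (one column per sample) by majority vote."""
    predictions = np.asarray(predictions)
    combined = []
    for sample_preds in predictions.T:               # iterate over samples
        labels, counts = np.unique(sample_preds, return_counts=True)
        combined.append(labels[counts.argmax()])     # most frequent label wins
    return np.array(combined)
```

Weighted voting or score averaging are common refinements when the base classifiers output probabilities rather than hard labels.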

Deep Learning CAD Methods
Following the recent advancements of deep learning (DL), which has shown broad potential with state-of-the-art performance, many researchers have approached the process of feature extraction and selection using this automated technique. This improved approach combines learning and decision making by applying unsupervised learning upon different deep neural network architecture designs. It combines learning the features in histopathology images and classifying the images in one highly complex architecture model. This process is often referred to as a black box, and it can be complex to understand how deep learning works, i.e., how the model came to a decision and what was involved in the learning process.
The deep learning approach is based on convolutional neural networks (CNNs) to enable a deeper level of exploration and broaden the capability of a model to perform classification on breast cancer histology images. CNNs are able to build a complex level of non-linear mapping between input and output by utilising cascaded convolutional layers. They are considered a unique type of neural network where, instead of having weights for each input, the weights are shared and convolved across the input as a moving window [133]. They are computational models composed of multiple processing layers that retrieve features from raw data with multilevel representations and hierarchical abstraction [8]. A typical CNN consists of a convolutional layer, an activation function, a pooling layer, and an output layer. An example of a standard CNN model architecture with two feature stages is shown below in Figure 13 [134].

Figure 13. Basic structure of a CNN architecture [134].
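The basic CNN building blocks just listed (convolution, activation, pooling) can be sketched in NumPy. Real frameworks add channels, batching, and learned filters, so this is a conceptual illustration only:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most CNN libraries):
    slide the kernel over the image and take a weighted sum at each position."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (image[i:i+kh, j:j+kw] * kernel).sum()
    return out

def relu(x):
    """Elementwise rectified linear activation."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size tile."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))
```

Stacking conv2d -> relu -> max_pool twice gives the "two feature stages" of Figure 13; the shared kernel weights are what make the layer translation-aware and parameter-efficient.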



• Training from scratch: This method requires a large amount of input histopathology images of breast cancer to train the CNN model. It requires more effort and skill to achieve a reliably performing CNN model when it comes to selecting hyperparameters such as the learning rate, number of layers, convolutional filters, and more, which can be a challenging task. This implementation also requires high GPU processing power, as CNN training can be time consuming because of the complex architecture [142].
• Transfer learning: Most publicly available datasets of breast histology images are considered small for training a deep learning model, which can be highly prone to overfitting due to inferior generalizability. The transfer learning method provides a solution by transferring knowledge to the model from a source domain that provides a large amount of sample data to the target domain. Pre-trained models can sufficiently prepare a small-scale histology dataset for a deep learning model. They can be used to: (1) perform as a baseline model, which uses the architecture of the pre-trained network and builds the model from scratch by random initialization of weights [143]; (2) perform as a feature extractor, which extracts key features, and the outputs of the convolutional base are fed directly to the classifier without modifying any weights or convolutional parameters [143]; and (3) perform fine tuning, where weights are passed into the designed network from the pre-trained network by fine tuning layers or performing partial training of the network [143]. Figure 14 illustrates the transfer learning approach. Bayramoglu et al. proposed two different CNN architectures: a single-task CNN used to predict malignancy and a multi-task CNN used to predict both malignancy and the image magnification level simultaneously [17]. Gandomkar et al.
proposed a two-step classification in which they first used a deep residual network (ResNet) with 152 layers trained to classify patches from the images as benign or malignant for each magnification factor [27]. Then, they used the same pre-trained model to further classify the breast cancer sub-classes. Han et al. proposed a class structure-based deep convolutional neural network (CSDCNN) based on GoogLeNet for eight-class classification of breast histopathological slides and showed that accuracy was higher for fine-tuning than for training from scratch [25]. Spanhol et al. also adopted AlexNet and achieved a better result than a machine learning model trained with hand-crafted textural descriptors [24]. Alom et al. proposed binary and multi-class breast cancer classification methods using the Inception Recurrent Residual Convolutional Neural Network (IRRCNN) model and achieved 99.05% (binary) and 98.59% (multi-class) classification accuracy [11].
Toğaçar et al. proposed a novel method called BreastNet using a CNN model architecture that adopted a multi-layer perceptron (MLP) as classifier [21]. Mahmood et al. performed a score-level fusion of ResNet-50 and DenseNet-201 for classification [59]. Bardou et al. experimented with a second approach to apply a CNN model, and their results showed that deep learning approaches outperformed handcrafted features [26]. Sudharshan et al. also achieved the highest patient recognition rate (Prr) using a multiple-instance learning-based convolutional neural network (CNN) [30]. Rakhlin et al. applied ResNet-50, InceptionV3, and VGG-16 models for feature extraction and a gradient boosting tree as classifier in their proposed methodology [53]. Shallu and Mehra applied transfer learning and demonstrated that pre-trained CNNs are good substitutes for CNNs trained from scratch for the diagnosis of breast cancer using histopathology [60]. This is because training a CNN from scratch requires considerably more time, complexity, and effort to fine-tune the model, especially when limited samples are available for training, whereas a pre-trained model does not suffer from this limitation. A CNN pre-trained on the ImageNet [135] database therefore draws on a far larger sample, allowing features to be extracted more accurately and efficiently from histopathological images [144].
Cai et al. adopted a modified Faster R-CNN (region-based convolutional neural network) for detecting mitotic cells, using the ResNet-101 network pre-trained on the ImageNet database to extract features for classification [145]. Mahmood et al. also adopted the region-based CNN technique named Faster R-CNN [146] to detect mitotic cells in breast cancer histology images [59]. The general architecture of Faster R-CNN consists of a feature map extracted from the input image, followed by a region proposal network (RPN) [147] and a deep CNN classification network that detects the final mitotic cells as output. Vo et al. proposed a model called Inception-ResNet-v2 that combines the Inception and ResNet CNNs to train on and extract visual features from multi-scale images, capturing both global and local features of breast tumours, which are fed into a gradient boosting classifier [18]. George et al. proposed an approach for breast cancer diagnosis that extracts features from nuclei using pre-trained CNNs, namely AlexNet, ResNet-18, and ResNet-50, on random patches obtained from histology images, and finally classifies them with an SVM classifier [29]. Another study by Spanhol et al. proposed a method that combines a modified AlexNet with DeCAF [148] (or deep) feature extraction, which reuses a previously trained CNN only to produce feature vectors that are then used as input for a classifier trained only on the new classification task [149]. A method named Biopsy Microscopic Image Cancer Network (BMIC_Net) by Murtaza et al. applied a pre-trained AlexNet as feature extractor [20].
Budak et al. proposed a novel method that uses a fully convolutional network (FCN) transformed from AlexNet as an encoder for high-level feature extraction; the output of the FCN is then transformed into a one-dimensional sequence for classification using Bi-LSTM [28]. Long Short-Term Memory (LSTM) [150], a model based on recurrent neural networks (RNNs), has grown in popularity due to its powerful sequence-modelling ability; the authors adopted an architecture that combines a bidirectional RNN (Bi-RNN), which handles two sources of information, with LSTM for classifying breast cancer [28]. Alkassar et al. used Xception and DenseNet to extract shallow and deep features from breast histology images [22]. Araujo et al. combined a CNN model to extract features with an SVM classifier to perform breast cancer classification [31]. One of the most promising deep learning models developed was the Lymph Node Assistant (LYNA) algorithm, based on Inception-v3, by researchers from the Naval Medical Center San Diego and Google AI [55]. They adopted the Inception-v3 network because it has been shown to achieve greater than 78.1% accuracy on the ImageNet dataset. Their results achieved a receiver operating characteristic area under the curve (AUC) of 99% and a tumour-level sensitivity of 91% at 1 false positive per patient [55].

Performance Evaluation
To demonstrate the effectiveness of a CAD system for breast cancer diagnosis, it is important that we evaluate our approaches to quantify the performance of the system as well as to inspect the underlying problems to be improved. In medical imaging diagnostic tests, the sensitivity and specificity parameters are widely used to evaluate performance [40]. Other commonly used metrics for diagnosis evaluation are F1-measure [19,23,59,84,148,151], precision [19,59,60,148], accuracy [11,18–22,24–29,31,53,60,83,128,130,131], and receiver operating characteristics (ROC). These parameters are mostly calculated from the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). To explain these variables: true positive (TP) is the number of patients predicted to be suffering from cancer who are in fact suffering from it; true negative (TN) is the number of patients predicted not to be suffering from cancer who are in fact not suffering from it; false positive (FP) is the number of patients predicted to be cancer patients who are in fact not suffering from cancer; and false negative (FN) is the number of patients predicted not to be cancer patients who are in fact suffering from cancer [40]. The following shows each parameter used for evaluating the performance of a classification model and its calculation formula [106].

•	Sensitivity represents the percentage of positive samples classified correctly. The formula to calculate this is shown in Equation (6).
Sensitivity (%) = TP / (TP + FN) × 100 (6)
•	Specificity represents the percentage of negative samples classified correctly. The formula to calculate this is shown in Equation (7).
Specificity (%) = TN / (TN + FP) × 100 (7)
•	Accuracy represents the percentage of correctly classified samples overall. The formula to calculate this is shown in Equation (8).
Accuracy (%) = (TP + TN) / Number of Samples × 100 (8)
•	Precision, also known as PPV (positive predictive value), represents the proportion of positive predictions that are truly positive. The formula to calculate this is shown in Equation (9).
Precision (%) = TP / (TP + FP) × 100 (9)
•	Recall represents the proportion of positive samples classified correctly and is equivalent to sensitivity. The formula to calculate this is shown in Equation (10).
Recall (%) = TP / (TP + FN) × 100 (10)
•	F1-measure represents the harmonic mean of precision and recall. The formula to calculate this is shown in Equation (11).
F1-measure = 2 × (Precision × Recall) / (Precision + Recall) (11)
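Taken together, these metrics can be computed directly from the four counts; the following plain-Python sketch (an illustrative helper with hypothetical function and variable names) implements them:

```python
def evaluate(tp, tn, fp, fn):
    """Compute standard diagnostic metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) * 100                  # Eq. (6); equals recall, Eq. (10)
    specificity = tn / (tn + fp) * 100                  # Eq. (7)
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100    # Eq. (8)
    precision = tp / (tp + fp) * 100                    # Eq. (9)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # Eq. (11)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "precision": precision,
            "recall": sensitivity, "f1": f1}

# Example counts: 90 TP, 85 TN, 15 FP, 10 FN (invented for illustration).
m = evaluate(tp=90, tn=85, fp=15, fn=10)
print(round(m["sensitivity"], 1))  # 90.0
print(round(m["accuracy"], 1))     # 87.5
```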
Besides these evaluation metrics, another useful technique for visualizing the performance of a classifier, especially in multi-class classification, is the receiver operating characteristic (ROC) curve. This visual is a two-dimensional graph showing the trade-off between the true positive rate (sensitivity) and the false positive rate (1 − specificity) across different thresholds. When evaluating a multi-class problem, each class generates a different ROC graph for comparison. To interpret the ROC curve, we examine the area under the curve (AUC) to determine the capability of the features extracted for training a classifier. A larger AUC indicates more reliable performance of the model.
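The AUC also has a rank-statistic interpretation: it equals the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative one. The sketch below (an illustrative implementation with invented scores, not taken from any cited work) computes AUC directly from that definition:

```python
def auc_from_scores(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos) * len(neg))

# Perfectly separated scores give AUC = 1.0; overlapping scores lower it.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
print(auc_from_scores(labels, scores))  # 1.0
```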
Recognition rate is also calculated to represent the multi-class performance of machine learning algorithms by measuring patient-wise diagnosis [24]. The parameters are calculated as Equations (12) and (13), where N_rec is the number of images of a patient classified correctly and N_P is the total number of images of that patient.
Patient Score = N_rec / N_P (12)
Recognition Rate = Σ Patient Scores / Total Number of Patients (13)
Table 6 demonstrates the comparison of different methods, datasets used, and evaluation results by different researchers for breast cancer diagnosis systems.
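Following the patient-wise definitions in [24], a patient score is the fraction of that patient's images classified correctly, and the global recognition rate averages these scores over all patients. A plain-Python sketch (illustrative names and invented data):

```python
def recognition_rate(per_patient_results):
    """per_patient_results: {patient_id: (n_correct, n_images)}.
    Patient score = N_rec / N_P; the recognition rate averages the
    patient scores, expressed here as a percentage."""
    scores = [n_rec / n_p for n_rec, n_p in per_patient_results.values()]
    return sum(scores) / len(scores) * 100

# Hypothetical per-patient classification results.
results = {"patient_A": (8, 10), "patient_B": (5, 5), "patient_C": (6, 10)}
print(round(recognition_rate(results), 2))  # 80.0
```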

Discussion and Conclusions
In this review paper, various techniques and approaches applied in every process of a CAD system have been discussed. In this section, we discuss the importance of each process in developing a CAD system, the impact of the CAD system, and the applicability of the system in the real world. To build a reliable CAD system, the histopathology images fed into a machine learning model perform better when they are refined and normalised. However, this process should not destroy any key features or biological tissue structures contained in the histopathology images. Diverse quality and intensities in images directly affect how a machine learning model learns. Therefore, the consistency and accuracy of a machine learning CAD system depend highly on image quality.
When it comes to segmentation, one critical consideration is the problem of overlapping and obstructed boundaries in a histopathology image. A good segmentation technique resolves this issue without destroying any geometrical features. Based on our review, we conclude that machine learning approaches perform better in tackling these segmentation issues, for example using clustering, energy optimization, and feature-based techniques. As discussed earlier, the most common descriptive features extracted from histopathology images are morphological, textural, and graph topological features. Feature selection is then applied to reduce the dimensionality of the extracted feature set so that only features with a high impact on the classification task are retained. Alternatively, a CNN topology can automatically learn the features from a histopathology image, which is considered a deep learning approach to feature extraction and selection. This feature learning approach has been observed to perform better than manual descriptive features.
Based on this review study, the classifiers with the highest accuracy are SVM, ANN (CNN), and ensemble learning, with accuracies ranging from 97.13% to 99%. Of course, the preceding steps and design methodology of each approach vary, but these classifiers are the ones that contributed most significantly to the final performance. SVM models are highly efficient for non-linear decision boundaries, with various kernels to select from. They are also highly robust against overfitting and cope well with high-dimensional spaces. CNNs, derived from the ANN category, are autonomous solutions that learn and gather information and knowledge from the images to make a decision. They are a powerful tool for analysing and processing data with a grid-like topology [8], which includes images at the pixel level. Ensemble learning is a robust approach as it uses multiple classifiers instead of one. It addresses the issues of bias and variance in the classification task, providing a more reliable output by considering multiple factors.
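As a minimal illustration of the ensemble idea above, the sketch below fuses the outputs of three hypothetical classifiers by majority vote, the simplest combination scheme (the predictions shown are invented for illustration):

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label predictions by majority vote.
    predictions_per_model: list of equal-length label lists, one per model."""
    n_samples = len(predictions_per_model[0])
    fused = []
    for i in range(n_samples):
        votes = Counter(model[i] for model in predictions_per_model)
        fused.append(votes.most_common(1)[0][0])
    return fused

# Outputs of three hypothetical classifiers on five samples
# (0 = benign, 1 = malignant):
svm_pred = [1, 0, 1, 1, 0]
cnn_pred = [1, 0, 0, 1, 0]
knn_pred = [0, 0, 1, 1, 1]
print(majority_vote([svm_pred, cnn_pred, knn_pred]))  # [1, 0, 1, 1, 0]
```

With an odd number of binary classifiers, a single model's error is outvoted whenever the other two agree, which is the variance-reduction effect the text describes.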
The achievement and contribution of a CAD system impacts pathologists examining breast cancer in several ways. Examining a histology image under the microscope requires time and effort, and it can be challenging because this type of visual contains patterns that are highly complex for the human eye. Each process of a CAD system not only assists visualisation for pathologists, but also provides verification of each decision made. The impacts are discussed as follows:
•	Image enhancement: Original histology images may contain noise, colour variation, intensity variation, low pixelation, or other artefacts resulting from the staining process during image acquisition. This makes it challenging to focus on the target area; therefore, image processing plays a role in standardising and improving the quality of histology images.
•	Detecting the cells or nuclei: The segmentation procedure assists in locating and identifying every cell in the image. This plays a role in obtaining the accurate region of interest to further measure the existence of cancer in the cell.
•	Learning the features: Feature extraction provides the geometrical information of the detected cell, which is later used as knowledge to determine the possibility of cancer. The CNN approach provides a robust solution with automated feature learning.
•	Justification of diagnosis results: There always exist situations where pathologists might reach an incorrect result due to factors such as lack of experience, heavy workload, human error, or miscalculation. Thus, a CAD system can provide a second perspective on, or verification of, the diagnosis results obtained by pathologists under microscopic assessment.
•	Fast diagnosis results: As discussed in this paper, one benefit of a CAD system is to help breast cancer patients through early diagnosis, so treatment can begin before the disease progresses to more advanced stages. Pathologists often face challenges while diagnosing breast cancer because microscopic examination of histology images requires an extensive amount of time, effort, and processing; a CAD system can efficiently provide a faster solution.
•	Improve productivity: The advancement of machine learning techniques yields higher productivity in a pathologist's microscopic examination and, for deep learning techniques, a possible reduction in the number of false negatives associated with morphologic detection of tumour cells [55].
Although CAD systems offer optimistic prospects for assisting medical image analysis with better performance, several research gaps remain that make it challenging to adopt these tools in a real-world clinical environment.

•
Data limitations: Working with complex and large amounts of medical data can be challenging, as they require high processing power and large memory storage. Machine learning, especially deep learning, requires a large amount of data to train a model that produces reliable and correct results. Some research papers acquired small datasets from private institutions, which are likely to perform differently when used in a real-world hospital environment. Most publicly available datasets are also considered small and are likewise unlikely to transfer to real-world performance. Even the largest public dataset, BreaKHis, does not contain enough patient samples. Therefore, existing CAD systems have not learned sufficient knowledge to be applicable in the real-world environment.

•
Bias and class imbalance: Imbalanced classes in a dataset can lead to undesired classification in the diagnosis result. When a CAD system is built upon a dataset with imbalanced classes, the results are more likely to be biased towards the majority class and therefore produce wrong diagnoses. A trained model biased to a specific class destroys the reliability of a CAD system because it increases the rate of misclassification. Solutions exist for such problems, including oversampling, undersampling, and algorithm-level methods [152]. However, the investigations performed so far on solutions that show significant improvement for imbalanced data remain insufficient for practical use in hospitals.
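Of the remedies listed, random oversampling is the simplest: minority-class samples are duplicated (sampling with replacement) until every class matches the majority count. The sketch below is a plain-Python illustration with invented data, not any cited author's pipeline:

```python
import random

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until every class reaches the majority count."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(v) for v in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Keep originals, then resample with replacement to reach the target.
        picked = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(picked)
        out_y.extend([y] * target)
    return out_x, out_y

# Imbalanced toy set: 4 benign (0) vs. 2 malignant (1) image identifiers.
X = ["img1", "img2", "img3", "img4", "img5", "img6"]
y = [0, 0, 0, 0, 1, 1]
Xb, yb = random_oversample(X, y)
print(yb.count(0), yb.count(1))  # 4 4
```

In practice, oversampling must be applied only to the training split (never the test split), or the evaluation will be optimistically biased.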
This paper has presented the detailed process of designing a machine learning computer-aided diagnosis expert system for breast cancer on histopathology images using both conventional and deep learning approaches. Publicly available histopathology images have also captivated the interest of many developers and researchers in exploring the possibilities in the datasets. Machine-learning-based CAD systems have demonstrated promising performance compared to a diagnosis performed by a pathologist using a microscope. The advancement of deep learning has also remarkably outperformed the conventional approach in feature learning and in the capability of a CAD system. Analysing medical datasets of breast histopathological images is a challenging task due to differences and artifacts introduced during image acquisition and the complexity of the images. Therefore, techniques developed for analysing breast histology images require robustness to overcome all underlying variations. This review paper has explored the most recent developments in breast cancer diagnosis systems and provides a comparative overview of the accuracy, benefits, disadvantages, and techniques employed by different researchers. A general review has been presented on techniques applied for classifying breast cancer, existing challenges, and the future direction of computer-aided diagnosis (CAD) systems for breast cancer.

Future Directions
Although many research studies presented in this paper have shown particularly reliable performance, there are still further possibilities to explore to improve breast cancer diagnosis systems.

•
Recently, proposed CNN models have been increasing in number to provide efficient solutions to task-specific problems. There remains room to discover a new, more powerful CNN model that combines the strengths of existing CNNs; for breast cancer classification, specifically a model that segments cancerous and non-cancerous regions.

•
Most research studies focus on accuracy and performance metrics while developing a diagnosis system. However, their performance in real-world hospital environments remains undefined. Problems like class imbalance and large-scale diagnosis require extensive investigation in unpredictable real-world environments to obtain reliable CAD systems. Therefore, further investigation is needed, and many years of clinical practice with a CAD system in the real-world environment will be required to constantly adapt and improve it so that it can gain credibility for clinical adoption in the future.
•	Currently, no pre-trained CNNs exist for histopathology breast cancer image datasets. Most current research studies apply feature extraction using CNNs pre-trained on the general ImageNet data. Therefore, future researchers can explore building a large-scale pre-trained CNN focused on breast cancer histopathology images that is task specific to assist breast cancer diagnosis.

•
In recent research studies, the authors in [26] applied dense scale-invariant feature transform (DSIFT) features and speeded-up robust features (SURF). However, oriented FAST and rotated BRIEF (ORB) features have outperformed both SIFT and SURF [151]. In the future, further analysis can investigate ORB features on a breast cancer classification task.
•	It will be important to investigate a reliably performing CAD system over a longer period with various settings to understand its strengths and weaknesses, ensuring the confidence and reliability required for the system to be integrated into practical healthcare in the future of medical diagnosis.

•
Recently, new algorithms like eXtreme Gradient Boosting (XGBoost) [153] have grown in popularity because of their reliable performance, and they can be experimented with and integrated into CAD systems.
•	Developing a mobile-compatible expert system for breast cancer diagnosis would provide further convenience for more users, especially those with limited access to computer-based systems.

Informed Consent Statement: Not applicable.

Conflicts of Interest:
The authors declare no conflict of interest.