Classifier ensemble generation and selection with multiple feature representations for classification applications in computer-aided detection and diagnosis on mammography
Introduction
Breast cancer is the most common form of cancer among women and is the second leading cause of death (Kopans, 2007). To reduce the workload of radiologists and to improve the specificity and sensitivity of breast cancer detection, two types of automated screening systems are being developed (Suri & Rangayyan, 2006): (1) Computer-aided Detection (CADe) and (2) Computer-aided Diagnosis (CADx). Table 1 provides a brief review of CADe and CADx systems. Current CADe and CADx systems have been shown to be highly sensitive in detecting cancer, but one of their main drawbacks is the high number of false positives (FPs, defined in Table 1) (Suri and Rangayyan, 2006, Sampat, 2005). Hence, the high FP rate in mass detection and diagnosis remains one of the major unresolved problems in CADe/CADx research (Suri and Rangayyan, 2006, Sampat, 2005, Tang et al., 2009).
In typical CADe (or CADx) systems, classifier design is one of the key steps that determine FP rates (Suri and Rangayyan, 2006, Sampat, 2005). Thus far, research efforts have mostly focused on the design of a single classifier in both CADe and CADx systems (Suri and Rangayyan, 2006, Sampat, 2005, Tang et al., 2009, Chan et al., 1999). Classifier design for mammogram images faces two critical limitations. First, the large variability in the appearance of mass patterns (Cheng et al., 2005, Velikova et al., 2013), caused by their irregular size, obscured borders, and complex mixtures of margin types, makes the classification task difficult. Second, research in mammography is characterized by restricted training data, owing to the cost, time, and limited availability of patient medical information and mammography images (Suri and Rangayyan, 2006, Bilska-Wolak and Floyd, 2004). At the same time, the number of available features (arising from the integration of multiple heterogeneous feature types) is large (Cheng et al., 2005, Jesneck et al., 2006, Wei et al., 1997), typically in the thousands, relative to the number of training samples. This is the so-called curse of dimensionality (Kuncheva, 2004). For these reasons, a single classifier may struggle to achieve a level of FP reduction that meets the requirements of clinical applications.
In this paper, to overcome the aforementioned limitations, we propose a novel ensemble classifier framework for classification applications (explained in Table 1) in mammographic CADe and CADx. This paper improves and extends the preliminary work presented in Choi, Kim, Plataniotis, and Ro (2012). In particular, it presents a new ensemble selection approach for selecting an optimal subset of base classifiers, aiming to further improve generalization (testing) performance. It also outlines an improved ensemble generation technique, which introduces a mechanism that allows the use of the strong classifiers extensively used in mammographic computer-aided detection and diagnosis systems. In addition, we provide a more insightful discussion of our ensemble generation from the viewpoint of the local learner hypothesis. Moreover, we report integrated experimental results that are more extensive and rigorous in the following respects: (1) an additional assessment of the proposed ensemble classification in a computer-aided diagnosis application; (2) a comparison with other state-of-the-art ensemble classification techniques; (3) a comprehensive analysis using more classifier models.
The paper is organized as follows. Section 2 reviews previous work on the classification of breast masses on mammograms in CADe and CADx systems. Section 3 briefly describes the region-of-interest (ROI) segmentation and feature extraction methods used in our study. Section 4 explains the proposed ensemble classification framework in detail. Section 5 describes the image databases and the experimental setup and conditions. Section 6 presents a series of experimental results that demonstrate the effectiveness of the proposed method. Finally, concluding remarks are provided in Section 7.
Section snippets
Related work
In past years, considerable research effort has been directed at classifier design for classification applications in mammography. Wei et al. (1997) used global and local texture features extracted from manually selected ROIs of digitized mammograms, together with linear discriminant analysis (LDA), to distinguish masses from normal glandular tissue and thereby reduce FP detections. Sahiner et al. (1996) proposed a convolution neural network (NN) for the task of discriminating between masses and normal
ROI segmentation and feature extraction
In typical CADe/CADx systems, segmentation of ROIs and feature extraction from the generated ROIs are prerequisite steps for classifying the ROIs (Suri & Rangayyan, 2006). Hence, in this section, we briefly describe the segmentation algorithm and the types of mammographic mass features used in our study before explaining our ensemble classifier. As recommended in Wei et al. (1997), Sahiner et al. (1996), and Mudigonda et al. (2001), to perform a more realistic assessment of a
Proposed ensemble classifier system
Fig. 2 provides an overview of the proposed ensemble classifier framework. As shown in Fig. 2, this framework largely consists of three parts: (1) ensemble generation, (2) ensemble selection, and (3) ensemble fusion (or combination). Each of the three steps will be described in more detail in the following sections.
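The three stages can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the nearest-centroid base learner, the greedy forward selection on validation accuracy, the majority-vote fusion, and the toy data are all assumptions chosen to keep the example short and self-contained.

```python
from statistics import mean

def train_centroid(X, y):
    """Stand-in base classifier (nearest class centroid), not the paper's SVM/NN."""
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [mean(col) for col in zip(*rows)]
    def predict(x):
        return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))
    return predict

def generate(views_train, y_train):
    """Ensemble generation: one base classifier per feature representation."""
    return [train_centroid(X, y_train) for X in views_train]

def fuse(members, views, n_samples):
    """Ensemble fusion: majority vote over (view index, classifier) pairs; ties go to 1."""
    fused = []
    for i in range(n_samples):
        votes = [clf(views[k][i]) for k, clf in members]
        fused.append(1 if 2 * sum(votes) >= len(votes) else 0)
    return fused

def accuracy(pred, y):
    return sum(p == t for p, t in zip(pred, y)) / len(y)

def select(classifiers, views_val, y_val):
    """Ensemble selection: greedily add the member that most improves validation accuracy."""
    chosen, best_acc = [], -1.0
    remaining = list(enumerate(classifiers))
    while remaining:
        scores = [accuracy(fuse(chosen + [m], views_val, len(y_val)), y_val)
                  for m in remaining]
        top = max(range(len(remaining)), key=scores.__getitem__)
        if scores[top] <= best_acc:
            break  # no remaining member improves the fused accuracy
        best_acc = scores[top]
        chosen.append(remaining.pop(top))
    return chosen

# Hypothetical toy data: view 0 separates the classes, view 1 does not.
y_train = [0, 0, 1, 1]
views_train = [[[0.0], [0.2], [1.0], [1.2]],
               [[0.5], [0.4], [0.5], [0.6]]]
y_val = [0, 1, 0, 1]
views_val = [[[0.1], [0.9], [0.3], [1.1]],
             [[0.6], [0.4], [0.52], [0.45]]]

selected = select(generate(views_train, y_train), views_val, y_val)
selected_views = [k for k, _ in selected]
fused_val = fuse(selected, views_val, len(y_val))
```

On this toy data the selection step keeps only the classifier trained on the informative representation, which shows why pruning weak members before fusion can help.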
Data set and performance evaluation
The public Digital Database for Screening Mammography (DDSM) database (DB) (Heath, Bowyer, Kopans, Moore, & Kegelmeyer, 2000) was used in our evaluation study. For data consistency, all images were collected from the same type of scanner at the same resolution. We chose the Howtek 960 scanner because a large number of cases were digitized with it (Heath et al., 2000). All images collected from the DDSM were subsampled to 200 μm and quantized to 8 bits per pixel for computational
Evaluating classification of mass and normal tissues in CADe
The proposed ensemble classifier framework was tested on Dataset 1 described in Section 5. The nine feature types marked either E or E/X in the “Usage” column of Table 2 were used as different feature representations in this assessment [i.e., K (defined in Fig. 4) was set to 9]. As base classifiers, an SVM with a Radial Basis Function kernel (Chang & Lin, 2011) and an NN trained with the back-propagation algorithm (Setiono, 2001) were used.
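For reference, the Radial Basis Function kernel in its standard form, K(x, z) = exp(-gamma * ||x - z||^2), can be sketched in a few lines of Python. The gamma value below is a hypothetical placeholder; this excerpt does not state the kernel parameters used in the study.

```python
import math

def rbf_kernel(x, z, gamma):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

gamma = 0.5  # hypothetical value, for illustration only
same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma)  # identical vectors give 1.0
far = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma)   # squared distance is 2, so exp(-1)
```

The kernel value decays from 1 toward 0 as two feature vectors move apart, and gamma controls how quickly that decay happens.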
We compared the
Discussion and conclusion
Several classification methods have been developed as expert and intelligent systems in mammography (Diaz-Huerta et al., 2014, Junior et al., 2013, Nanni et al., 2012, Krishnan et al., 2010, Verma et al., 2010). However, most of these methods have focused on single-classifier solutions. It has been widely accepted in Suri and Rangayyan (2006), Nishikawa (2007) and Tang et al. (2009) that mammographic mass
Acknowledgements
This work was partially supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2015R1A2A2A01005724).
References (69)
- et al. Learning from unbalanced data: a cascade-based approach for detecting clustered microcalcifications. Medical Image Analysis (2014).
- et al. A new multi-expert decision combination algorithm and its application to the detection of circumscribed masses in digital mammograms. Pattern Recognition (2001).
- et al. Quantitative analysis of morphological techniques for automatic classification of micro-calcifications in digitized mammograms. Expert Systems with Applications (2014).
- et al. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences (1997).
- et al. Multi-scaled morphological features for the characterization of mammographic masses using statistical classification schemes. Artificial Intelligence in Medicine (2007).
- et al. Score normalization in multimodal biometric systems. Pattern Recognition (2005).
- et al. AdaBoost with SVM-based component classifiers. Engineering Applications of Artificial Intelligence (2008).
- et al. Background intensity independent texture features for assessing breast cancer risk in screening mammograms. Pattern Recognition Letters (2013).
- et al. Mammographic masses characterization based on localized texture and dataset fractal analysis using linear, neural and support vector machine classifiers. Artificial Intelligence in Medicine (2006).
- et al. A very high performance system to discriminate tissues in mammograms as benign and malignant. Expert Systems with Applications (2012).
- Current status and future directions of computer-aided diagnosis in mammography. Computerized Medical Imaging and Graphics.
- Automatic microcalcification and cluster detection for digital and digitised mammograms. Knowledge-Based Systems.
- Classifier selection for majority voting. Information Fusion.
- Computer-aided detection and diagnosis in mammography.
- On the interplay of machine learning and background knowledge in image interpretation by Bayesian networks. Artificial Intelligence in Medicine.
- Novel network architecture and learning algorithm for the classification of mass abnormalities in digitized mammograms. Artificial Intelligence in Medicine.
- Combining multiple representations and classifiers for pen-based handwritten digit recognition. Turkish Journal of Electrical Engineering and Computer Sciences.
- Gene expression profile classification: a review. Current Bioinformatics.
- Tolerance to missing data using a likelihood ratio based classifier for computer-aided classification of breast cancer. Physics in Medicine & Biology.
- Random forests. Machine Learning.
- Incorporation of an iterative, linear segmentation routine into a mammographic mass CAD system. Medical Physics.
- Classifier design for computer-aided diagnosis: Effects of finite sample size on the mean performance of classical and neural network classifiers. Medical Physics.
- LIBSVM: a library for support vector machines. Transactions on Intelligent Systems and Technology.
- Approaches for automated detection and classification of masses in mammograms. Pattern Recognition.
- Multiresolution local binary pattern texture analysis combined with variable selection for application to false positive reduction in computer-aided detection of breast masses on mammograms. Physics in Medicine & Biology.
- Combining multiple feature representations and adaboost ensemble learning for reducing false-positive detection in computer-aided detection of masses on mammograms. IEEE Engineering in Medicine and Biology Conference (EMBC).
- An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Machine Learning.
- Detection of masses in mammograms via statistically based enhancement, multilevel-thresholding segmentation, and region selection. Computerized Medical Imaging and Graphics.
- Toward breast cancer diagnosis based on automated segmentation of masses in mammograms. Pattern Recognition.
- A concentric morphology model for the detection of masses in mammography. IEEE Transactions on Medical Imaging.
- Additive logistic regression: a statistical view of boosting. Annals of Statistics.
- AdaBoost parallelization on PC clusters with virtual shared memory for fast feature selection.
- The digital database for screening mammography.
- Segmentation of regions of interest in mammograms in a topographic approach. IEEE Transactions on Information Technology in Biomedicine.
1 Present address: Department of Biomedical Engineering, Jungwon University, Chungcheongbuk-do, Republic of Korea.