Global optimization of case-based reasoning for breast cytology diagnosis
Introduction
Case-based reasoning (CBR) is a problem-solving technique that resembles the decision-making process people use in many real-world applications. It shows significant promise for improving the effectiveness of complex and unstructured decision making. In theory, CBR cannot overfit, because it uses specific knowledge of previously experienced problems rather than generalized patterns derived from them. A CBR system also stays up to date because its case-base is updated in real time, which is an important feature for real-world applications. In addition, it can explain why it proposes a solution by presenting similar past cases. Consequently, CBR has been applied to various problem-solving areas, including engineering, finance, marketing, and medical diagnosis. CBR is particularly appropriate for medical applications because its characteristics fit medical domains well. Medical knowledge is usually incomplete, so medical applications rely more heavily on real cases than applications in other domains do. The explanation capability of CBR can also matter more in medical domains, because it serves as a helpful information source for decision makers (i.e., medical doctors).
Despite its various advantages, CBR has been criticized because its prediction accuracy is usually much lower than that of other AI techniques, especially artificial neural networks (ANNs). Many studies have therefore tried to enhance the performance of CBR. The most frequently studied mechanisms are those that improve the case retrieval process: the selection of appropriate feature subsets (Cardie, 1993, Domingos, 1997, Siedlecki and Sklansky, 1989, Skalak, 1994), the selection of instance subsets (Chiu, 2002, Kelly and Davis, 1991, Liao et al., 2000, Shin and Han, 1999, Wettschereck et al., 1997), the determination of feature weights (Babu and Murty, 2001, Huang et al., 2002, Lipowezky, 1998, Sanchez et al., 1997, Skalak, 1993, Yan, 1993), and the choice of the number of neighbors to combine (Ahn et al., 2003, Lee and Park, 1999).
One state-of-the-art technique for CBR is the simultaneous optimization of these parameters. Most prior research optimized them independently. However, because the parameters jointly determine which cases are retrieved, a global optimization model that considers them simultaneously may improve the prediction results.
This study proposes a novel hybrid approach that uses genetic algorithms (GAs) to optimize three parameters of CBR simultaneously: (1) the weights of the features, (2) the training instances, and (3) the number of neighbor cases to combine. To validate the usefulness of our model, we apply it to a real-world case of breast cytology diagnosis via digital image analysis and review the results it produces.
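The idea of searching all three parameters at once can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the gene layout (one real gene per feature weight, one per reference case thresholded at 0.5, and one for k), the names `decode`, `predict`, `fitness`, and `evolve`, the squared weighted Euclidean distance, and the GA operators (truncation selection, one-point crossover, single-gene mutation) are all hypothetical choices. Fitness is taken to be accuracy on a held-out test case-base.

```python
import random
from collections import Counter

def decode(chromosome, n_features, n_cases, k_max):
    """Split a flat gene list into (feature weights, instance mask, k)."""
    weights = chromosome[:n_features]                     # real genes in [0, 1]
    mask = [g >= 0.5 for g in chromosome[n_features:n_features + n_cases]]
    k = 1 + min(int(chromosome[-1] * k_max), k_max - 1)   # integer in 1..k_max
    return weights, mask, k

def predict(query, cases, labels, weights, mask, k):
    """Majority vote among the k nearest selected cases (weighted distance)."""
    scored = []
    for x, y, keep in zip(cases, labels, mask):
        if keep:  # only cases chosen by the instance mask take part
            d = sum(w * (a - b) ** 2 for w, a, b in zip(weights, query, x))
            scored.append((d, y))
    if not scored:
        return None  # every case was deselected
    scored.sort(key=lambda t: t[0])
    votes = [y for _, y in scored[:k]]
    return Counter(votes).most_common(1)[0][0]

def fitness(chromosome, ref_X, ref_y, test_X, test_y, k_max):
    """Hit ratio on the test case-base for one candidate parameter set."""
    weights, mask, k = decode(chromosome, len(ref_X[0]), len(ref_X), k_max)
    hits = sum(predict(q, ref_X, ref_y, weights, mask, k) == y
               for q, y in zip(test_X, test_y))
    return hits / len(test_X)

def evolve(ref_X, ref_y, test_X, test_y, k_max=5,
           pop_size=20, generations=30, seed=0):
    """Toy GA: keep the better half, refill with crossover plus mutation."""
    rng = random.Random(seed)
    n_genes = len(ref_X[0]) + len(ref_X) + 1

    def score(c):
        return fitness(c, ref_X, ref_y, test_X, test_y, k_max)

    pop = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)        # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n_genes)] = rng.random()  # mutate one gene
            children.append(child)
        pop = parents + children
    best = max(pop, key=score)
    return best, score(best)
```

Because all three parameter groups live in one chromosome, a single fitness evaluation reflects their interaction, which is what distinguishes this setup from tuning each parameter in a separate pass.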
The rest of the paper is organized as follows. Section 2 briefly reviews prior studies, and Section 3 proposes our research model: the simultaneous optimization of feature weights, relevant instances, and the number of neighbors to combine, using the GA approach. Section 4 explains the research design and experiments, and Section 5 describes the empirical results and their implications. The final section presents the conclusions of the study.
Prior research
In this section, we first review the general concept of CBR. We then examine previous research on optimizing it, followed by recent studies on the simultaneous optimization of several parameters of CBR systems. Finally, we examine in detail the GA approach, the key method for simultaneous optimization.
Global optimization of feature weights, instance selection, and the number of neighbors that combine using genetic algorithms
This study proposes a novel CBR model whose feature weighting, instance selection, and k parameter of k-NN are optimized globally in order to improve the prediction accuracy of typical CBR systems. Our model employs a GA to select a relevant instance subset and, simultaneously, to optimize the weight of each feature and the number of neighbors to combine, using the reference and the test case-base. We call it GOCBR (Global Optimization of feature weights, instance selection, and the number of
Application data
In general, there are three available methods for diagnosing breast cancer: mammography, fine needle aspirate (FNA) with visual interpretation, and surgical biopsy. Among them, surgical biopsy is known to be the most accurate; however, it is invasive, time-consuming, and costly. Thus, diagnosis systems based on digital image analysis, which allow an accurate diagnosis without the need for a surgical biopsy, are considered a realistic alternative. FNA involves using a small gauge needle to
The results of GA-optimized CBRs
Table 2 shows the parameters finally selected for each model. GOCBR yields 25 optimal feature weights and 176 optimal training instances that maximize prediction accuracy on the test set. Because there are 343 training samples in total, GOCBR selects about 51.31% of the case-base as its optimal instance subset. As Table 2 shows, GOCBR selects fewer instances than FISCBR (51.60%) and FWISCBR (79.02%), but more than ISCBR (47.52%).
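The selection ratio quoted for GOCBR follows directly from the two counts in the text. A small check, where the helper name `selection_ratio` is only illustrative:

```python
def selection_ratio(n_selected, n_total):
    """Percentage of the case-base kept as the instance subset."""
    return 100.0 * n_selected / n_total

# 176 selected instances out of 343 training samples (figures from the text)
print(f"{selection_ratio(176, 343):.2f}%")  # prints 51.31%
```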
Comparison of the prediction performances
Conclusions
We have proposed a new hybrid CBR model using GA, called GOCBR. The proposed model simultaneously optimizes feature weighting, instance selection, and the number of neighbors to combine. By selecting optimal instances, it may reduce noisy or distorted cases that lead to erroneous predictions. Our model may also find appropriate nearest neighbors for CBR by applying optimal feature weights to the similarity calculation, which may enhance prediction accuracy. In addition, it generates prediction
References (33)
- et al. Comparison of genetic algorithm based prototype selection schemes. Pattern Recognition (2001)
- Using decision trees to improve case-based learning. In Proceedings of the 10th International Conference on Machine Learning (1993)
- A case-based customer classification approach for direct marketing. Expert Systems with Applications (2002)
- et al. GA based CBR approach in Q&A system. Expert Systems with Applications (2004)
- et al. Prototype optimization for nearest-neighbor classification. Pattern Recognition (2002)
- et al. Nearest neighbor classifier: Simultaneous editing and feature selection. Pattern Recognition Letters (1999)
- et al. A case-based reasoning system for identifying failure mechanisms. Engineering Applications of Artificial Intelligence (2000)
- Selection of the optimal prototype subset for 1-NN classification. Pattern Recognition Letters (1998)
- et al. Prototype selection for the nearest neighbour rule through proximity graphs. Pattern Recognition Letters (1997)
- et al. Case-based reasoning supported by genetic algorithms for corporate bond rating. Expert Systems with Applications (1999)