Information Fusion

Volume 4, Issue 1, March 2003, Pages 3-10

Automatic model selection in cost-sensitive boosting

https://doi.org/10.1016/S1566-2535(02)00100-8

Abstract

This paper introduces SSTBoost, a predictive classification methodology designed to target the accuracy of a modified boosting algorithm towards required sensitivity and specificity constraints. The SSTBoost method is demonstrated in practice for the automated medical diagnosis of cancer on a set of skin lesions (42 melanomas and 110 naevi) described by geometric and colorimetric features. A cost-sensitive variant of the AdaBoost algorithm is combined with a procedure for the automatic selection of optimal cost parameters. Within each boosting step, different weights are considered for errors on false negatives and false positives, and the weights are updated differently for negatives and positives. Given only a target region in the ROC space, the method also completely automates the selection of the cost-parameter ratio, which is typically of uncertain definition. On the cancer diagnosis problem, SSTBoost outperformed in accuracy and stability both a battery of specialized automatic systems based on different types of multiple classifier combinations and a panel of expert dermatologists. The method can therefore be applied to the early diagnosis of melanoma cancer or to other problems in which an automated cost-sensitive classification is required.

Introduction

Most of the predictive classification tools that we may expect to apply to future real-world applications, e.g. automated diagnosis systems operating on biomedical data in a distributed setting, will have to incorporate appropriate cost parameters into the learning process in order to drive the system towards optimal performance in terms of sensitivity and specificity. Control of both these error measures is critical [1], [24].

However, the usefulness of simply embedding a cost-sensitive mechanism within a good predictive data mining tool, in our case the AdaBoost algorithm [16], is limited. Given a data set, a specific choice of the cost parameters determines a model with a specific pair of sensitivity and specificity values, i.e. a point in the ROC space. The costs of a false negative or of a false positive in a binary medical classification problem are often estimated only approximately: since they may significantly influence the classifier accuracy, one is often left wondering whether, in order to reach the minimum specified performance, altering the costs is more effective than refining the model. At the same time, the learning procedure also depends on the prior probabilities of the classes, so adding training material at fixed costs may produce a model with different sensitivity and specificity.

A further complication arises whenever the classification process is split into several phases. For example, in a screening phase a high-sensitivity test is required in order to recognize the highest possible number of positive cases, while later a more specific test (e.g. a visit by a more experienced practitioner) may be administered to these positives. It is unclear whether the cost parameters should be defined for the whole process or separately for the two tests.

What remains effective in any case is the definition of a pair of sensitivity and specificity constraints as the target of the classification process.
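For concreteness, the two measures and the target test can be written out directly. The following Python sketch is our own illustration, with hypothetical function names; the default thresholds mirror the application constraints of Section 4 (sensitivity greater than 0.95, specificity greater than 0.50).

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) from labels in {-1, +1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    sensitivity = tp / (tp + fn)   # fraction of positives recognized
    specificity = tn / (tn + fp)   # fraction of negatives rejected
    return sensitivity, specificity

def in_target_region(se, sp, se_min=0.95, sp_min=0.50):
    """Membership test for a rectangular target region A in the
    sensitivity-specificity space (thresholds as in Section 4)."""
    return se > se_min and sp > sp_min
```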

In this paper we discuss how to develop a good cost-sensitive classification algorithm that is as independent as possible of a precise definition of the cost parameters and of class imbalance. We complete our methodology with a practical search procedure for getting into, or as close as possible to, a target region in the sensitivity-specificity space. The aim is to wrap the whole cost-sensitive boosting learning cycle within a model selection procedure. As a cost-sensitive algorithm, we present in this paper a variant of the AdaBoost algorithm [16]. The basic AdaBoost algorithm allows systems of high accuracy to be developed, but misclassification analysis on the different output classes was not originally included within the training mechanism. It is nevertheless possible to build a good cost-sensitive variant of AdaBoost that optimizes the model differently for the two classes. In our variant, cost-sensitive boosting is achieved by (A) weighting the model error function with separate costs for false negative and false positive errors, and (B) updating the weights differently for negatives and positives at each boosting step.
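To make ingredients (A) and (B) concrete, the sketch below shows one plausible form of a single cost-sensitive boosting step. We assume, consistently with the range of ω used in Box 3, that errors on positives carry cost ω and errors on negatives carry cost 2 − ω, so that ω = 1 recovers standard AdaBoost; this is our reading, not a verbatim reproduction of the expressions in the paper's Box 2.

```python
import numpy as np

def cost_sensitive_step(y, pred, w, omega):
    """One boosting step with per-class costs. y, pred in {-1, +1};
    w: current sample weights; omega in [0, 2]. Assumed form of the
    update, not the paper's Box 2 verbatim."""
    c = np.where(y == 1, omega, 2.0 - omega)   # FN cost vs FP cost
    miss = (pred != y)
    # (A) model error weighted by separate false-negative / false-positive costs
    err = np.sum(w * c * miss) / np.sum(w * c)
    alpha = 0.5 * np.log((1.0 - err) / err)
    # (B) weights updated differently for positives and negatives:
    # the class-dependent cost c scales the usual exponential update
    w_new = w * np.exp(-alpha * y * pred * c)
    return w_new / np.sum(w_new), alpha
```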

Similar approaches have been described elsewhere. In particular, a cost-sensitive variant of AdaBoost was adopted for AdaCost [14]: based on the assumption that a misclassification cost factor has to be assigned to each training example, the weights are increased in case of misclassification or decreased otherwise, according to a non-negative function of the costs. A model error function different from (A) is considered there, whereas we focus on explicit weighting in terms of sensitivity and specificity. Karakoulas and Shawe-Taylor [23] have also introduced a similar approach, based on misclassification costs that are constant for all the samples in a class. Their procedure increases the weights of false negatives more than those of false positives and, differently from our approach, decreases the weights of true positives more than those of true negatives.

We applied the procedure to a medical diagnosis task: a classification model for assisting the screening of skin lesions was developed on real data, with required sensitivity greater than 0.95 and specificity greater than 0.50. Our AdaBoost variant (sensitivity-specificity tuning boosting, SSTBoost) yielded a remarkable improvement over previous results obtained on the same data set with a combination of classifiers specifically designed for the task [5]. An improvement was also found in the control of variability (standard deviation of the error estimates). The combined strategy of SSTBoost proved more effective than applying an external cost criterion to AdaBoost, as documented in [31].

The paper is organized as follows. Section 2 briefly introduces the melanoma classification problem which inspired our approach. The SSTBoost method is described in Section 3. The approach is evaluated on the melanoma data in Section 4. Section 5 concludes the paper.


Automatic melanoma classification

Melanoma is one of the most dangerous skin cancers: about 91% of skin cancer deaths are due to this tumor, and its incidence is constantly increasing worldwide. Early diagnosis is the key factor for its prognosis, but early melanoma lesions can have a benign appearance. Digital epiluminescence microscopy (D-ELM) is a non-invasive clinical technique that allows the visualization of several colorimetric and morphological characteristics of the skin lesions, providing additional diagnostic information …

AdaBoost and SSTBoost

In this section we first describe the basic AdaBoost learning procedure [9], [15], [16], [17], [18], [26]. Given a training data set L={(xi,yi)}, with i=1,…,N, where the xi are input vectors (numerical, categorical or mixed) and the yi are class labels taking values −1 or 1, the discrete AdaBoost classification model outputs the sign of an incremental linear combination of different realizations of a base classifier. Each realization is trained on a weighted version of L, and it is obtained …
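As a reference point for the SSTBoost modifications, a compact sketch of the discrete AdaBoost loop described above is given below, using decision stumps as the base classifier; the choice of base learner and the helper names are ours, not prescribed by the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, T=100):
    """Discrete AdaBoost. X: (N, d) array; y: labels in {-1, +1}."""
    N = len(y)
    w = np.full(N, 1.0 / N)                    # uniform weights on L
    learners, alphas = [], []
    for _ in range(T):
        h = DecisionTreeClassifier(max_depth=1)
        h.fit(X, y, sample_weight=w)           # train on weighted version of L
        pred = h.predict(X)
        err = np.sum(w[pred != y])             # weighted error (w sums to 1)
        if err <= 0.0 or err >= 0.5:           # stop if perfect or too weak
            break
        alpha = 0.5 * np.log((1.0 - err) / err)
        w *= np.exp(-alpha * y * pred)         # raise weights on mistakes
        w /= np.sum(w)
        learners.append(h)
        alphas.append(alpha)

    def H(X_new):
        """Sign of the incremental linear combination of realizations."""
        agg = sum(a * h.predict(X_new) for a, h in zip(alphas, learners))
        return np.sign(agg)

    return H
```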

Application to the melanoma data

We applied the procedure described in Boxes 2 and 3 to develop a model for effective early melanoma diagnosis. The goal was to develop a tool supporting the discrimination between malignant and benign lesions, in accordance with the application-specific constraints, based on the MEDS data set described in Section 2.

Box 3: The SSTBoost tuning procedure

  • Given a target region A.
  • Set ω1=1, ωmin=0, ωmax=2.
  • Train Hω1(x) as in Box 2 and use cross-validation to estimate φ(Hω1).
  • For i=1,…,M:
    1. If φ(Hωi) ∈ A …
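Read as pseudocode, Box 3 amounts to a bisection search on ω over [0, 2], stopping as soon as the cross-validated (sensitivity, specificity) point φ(Hω) falls inside A. The sketch below is our reading of the truncated box: train_sstboost ("as in Box 2"), cv_estimate (the cross-validated φ) and the bisection rule keyed on the target sensitivity are assumed interfaces, not a transcription of the paper.

```python
def tune_omega(train_sstboost, cv_estimate, in_target, M=10, se_min=0.95):
    """Bisection search for the cost parameter omega in [0, 2].
    train_sstboost(omega) -> model H; cv_estimate(H) -> (se, sp);
    in_target(se, sp) -> bool. All three are assumed interfaces."""
    omega_min, omega_max = 0.0, 2.0
    omega = 1.0                                  # ω1 = 1, as in Box 3
    last = None
    for _ in range(M):
        H = train_sstboost(omega)                # "as in Box 2"
        se, sp = cv_estimate(H)                  # estimate of φ(Hωi)
        if in_target(se, sp):
            return H, omega                      # φ(Hωi) ∈ A: stop
        last = (H, omega)
        # Assumed bisection rule: sensitivity too low -> move towards
        # larger ω (heavier false-negative cost); otherwise smaller ω.
        if se < se_min:
            omega_min = omega
        else:
            omega_max = omega
        omega = 0.5 * (omega_min + omega_max)
    return last                                  # closest attempt after M steps
```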

Conclusions

We have developed a methodology for cost-sensitive classification which extends boosting into a classification tool for automated diagnosis with self-tuning properties. Given a required minimal performance, in terms of a target region for sensitivity and specificity, we have described and tested a procedure for selecting the optimal ω, i.e. the value for which the corresponding model reaches, or comes as close as possible to, the accuracy goals.

The introduction of a cost parameter ω both within the …

Acknowledgements

The authors wish to thank B. Caprile (ITC-irst) for illuminating discussions on boosting magic, and S. Forti and C. Eccher (ITC-irst), M. Cristofolini and P. Bauer (Department of Dermatology, S. Chiara Hospital, Trento) for their significant collaboration on the melanoma diagnosis application. CF thanks M. Bauer and A. Bergamo for correcting minor errors in a previous version.

References (32)

  • L. Breiman, Bagging predictors, Machine Learn. (1996)
  • L. Breiman, Combining predictors
  • T. Dietterich, An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization, Machine Learn. (2000)
  • B. Efron, R. Tibshirani, An introduction to the bootstrap, vol. 57 of Monographs on Statistics and Applied Probability....
  • B. Efron, R. Tibshirani, Cross-validation and the bootstrap: estimating the error rate of a prediction rule, Tech....
  • F. Ercal et al., Neural network diagnosis of malignant melanoma from color images, IEEE Trans. Biomed. Engng. (1994)