Direct estimation of class membership probabilities for multiclass classification using multiple scores

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Accurate estimates of class membership probabilities are needed in many data-mining and decision-making applications to which multiclass classification is applied. Existing estimation methods, however, are designed for binary classification and use only the single score output by a classifier, so applying them to the multiclass case requires both decomposing the multiclass classifier into binary classifiers and combining the estimates obtained from each binary classifier into the target estimate. We propose a simple and general method that directly estimates the class membership probability of any class in multiclass classification, without decomposition or combination, by using multiple scores: not only the score of the predicted class but also the scores of other appropriate classes. To make multiple scores usable, we modify or extend representative existing methods. As a non-parametric method, building on the binning idea of Zadrozny et al., we construct an “accuracy table” in a different way and smooth the accuracies in the table, for example with a moving average, to obtain reliable probability (accuracy) estimates. As a parametric method, we extend Platt’s method by applying multiple logistic regression. On two different datasets (open-ended responses from Japanese social surveys and the 20 Newsgroups corpus), with both support vector machine and naive Bayes classifiers, we show empirically that using multiple scores is effective for estimating class membership probabilities in multiclass classification, as measured by cross entropy, reliability diagrams, ROC curves, and AUC (area under the ROC curve), and that the proposed smoothing of the accuracy table works well. Finally, we show empirically that, in terms of MSE (mean squared error), our best method outperforms a multiclass expansion of the PAV method of Zadrozny et al. on both the 20 Newsgroups and Pendigits datasets, but is slightly worse on Pendigits than the state-of-the-art method, a multiclass expansion of a combination of boosting and the PAV method.
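The two proposed estimators can be made concrete with a short sketch. The Python code below is illustrative only, not the authors' exact procedure: `fit_multiscore_calibrator` extends Platt-style calibration from a sigmoid over a single score to a multinomial logistic regression over the whole score vector, and `accuracy_table` builds a simplified one-dimensional accuracy table binned on the top score and smoothed with a moving average (the paper's table may involve several scores; all function names, bin counts, and the window size here are assumptions made for illustration).

```python
# Minimal sketch of the two calibration ideas in the abstract (illustrative
# only; not the authors' exact algorithms). Assumes a held-out calibration
# set with one raw classifier score per class for each example.
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_multiscore_calibrator(scores, labels):
    """Parametric route: Platt-style calibration extended to multiple scores.

    scores: array of shape (n_samples, n_classes), raw classifier scores.
    labels: array of shape (n_samples,), true class indices.
    Returns a fitted multinomial logistic regression; its predict_proba()
    maps a score vector directly to class membership probabilities.
    """
    calibrator = LogisticRegression(max_iter=1000)  # softmax over classes
    calibrator.fit(scores, labels)
    return calibrator


def accuracy_table(top_scores, correct, n_bins=20, window=3):
    """Non-parametric route: a one-dimensional "accuracy table".

    Bin held-out examples by their top score, record the empirical accuracy
    (fraction of correct predictions) per bin, then smooth the accuracies
    with a moving average so that sparse bins yield more stable estimates.
    """
    edges = np.linspace(top_scores.min(), top_scores.max(), n_bins + 1)
    bin_idx = np.clip(np.digitize(top_scores, edges) - 1, 0, n_bins - 1)
    acc = np.array([correct[bin_idx == b].mean() if np.any(bin_idx == b) else 0.0
                    for b in range(n_bins)])
    kernel = np.ones(window) / window          # simple moving-average filter
    smoothed = np.convolve(acc, kernel, mode="same")
    return edges, smoothed
```

At prediction time, the non-parametric route would locate a new example's top score in `edges` and read off the smoothed accuracy as its probability estimate; the parametric route simply calls `calibrator.predict_proba` on the full score vector.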

References

  1. Agui T, Nakajima M (1991) Graphical information processing. Morikita Press, Tokyo

  2. Asuncion A, Newman DJ (2007) UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences

  3. Bennett PN (2000) Assessing the calibration of naive Bayes’s posterior estimates. In: Technical Report CMU-CS-00-155, School of Computer Science, Carnegie Mellon University, pp 1–8

  4. Caruana R, Niculescu-Mizil A (2005) Predicting good probabilities with supervised learning. In: Proceedings of the American Meteorological Society conference (AMS2005), San Diego

  5. Chan YS, Ng HT (2006) Estimating class priors in domain adaptation for word sense disambiguation. In: Proceedings of the 21st international conference on computational linguistics and the 44th annual meeting of the ACL (ICCL’06 and ACL’06), pp 89–96

  6. Cheeseman P, Stutz J (1995) Bayesian classification (AutoClass): theory and results. In: Fayyad UM (eds) Advances in knowledge discovery and data mining. AAAI Press, Menlo Park, pp 61–83

  7. Devarakota PR, Mirbach B, Ottersten B (2007) Confidence estimation in classification decision: a method for detecting unseen patterns. In: Proceedings of the sixth international conference on advances in pattern recognition (ICAPR), Kolkata, India

  8. Fragoudis D, Meretakis D, Likothanassis S (2007) Best terms: an efficient feature-selection algorithm for text categorization. Knowl Inf Syst (KAIS) 8(1): 16–33

  9. Groves RM, Fowler FJ Jr, Couper MP, Lepkowski JM, Singer E, Tourangeau R (2004) Survey methodology. Wiley, Hoboken

  10. Iwai N, Sato H (eds) (2002) Japanese values and behavioral patterns in JGSS. Yuhikaku Publishing, Tokyo

  11. Joachims T (1998) Text categorization with support vector machines: learning with many relevant features. In: Proceedings of the tenth European conference on machine learning (ECML’98), pp 137–142

  12. Jones R, Rey B, Madani O, Greiner W (2006) Generating query substitutions. In: Proceedings of the 15th international world wide web conference (WWW’06), pp 387–396

  13. Kita K (1999) Language and computing: volume 4 probabilistic language model. University of Tokyo Press, Tokyo

  14. Kogure A, Sagae M (2005) Estimating probability density from percentiles. Stat Math 53(2): 375–389

  15. Kressel U (1999) Pairwise classification and support vector machines. In: Schölkopf B (eds) Advances in kernel methods: support vector learning. MIT Press, Cambridge, pp 255–268

  16. Langford J, Zadrozny B (2005) Estimating class membership probabilities using classifier learners. In: Cowell RG, Ghahramani Z (eds) Proceedings of AISTATS05, Society for Artificial Intelligence and Statistics, pp 198–205

  17. Margineantu DD (2002) Class probability estimation and cost-sensitive classification decisions. In: Proceedings of the 13th European conference on machine learning (ECML’02), pp 270–281

  18. Miwa S, Kobayashi D (eds) (2008) 2005SSM survey series: no. 1 basic analysis of 2005 SSM survey in Japan, 2005SSM Survey Research Group

  19. Niculescu-Mizil A, Caruana R (2005a) Predicting good probabilities with supervised learning. In: Proceedings of the 22nd international conference on machine learning (ICML’05), pp 625–632

  20. Niculescu-Mizil A, Caruana R (2005b) Obtaining calibrated probabilities from boosting. In: Proceedings of the 21st international conference on uncertainty in artificial intelligence (UAI’05), pp 413–420

  21. Nigam K, McCallum AK, Thrun S, Mitchell T (2000) Text classification from labeled and unlabeled documents using EM. Mach Learn 39(2/3): 103–134

  22. Ohkura T, Kiyota K, Nakagawa H (2006) Browsing system for weblog articles based on automated folksonomy. In: Proceedings of the third annual workshop on the weblogging ecosystem (WWE2006), Edinburgh

  23. Perlich C, Provost FJ, Simonoff JS (2003) Tree induction vs. logistic regression: a learning-curve analysis. J Mach Learn Res 4: 211–255

  24. Platt JC (1999) Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Smola AJ (eds) Advances in large margin classifiers. MIT Press, Cambridge, pp 1–11

  25. Provost FJ, Domingos P (2000) Well-trained PETs: improving probability estimation trees. CeDER Working Paper, #IS-00-04, Stern School of Business, New York University

  26. Provost FJ, Domingos P (2003) Tree induction for probability-based ranking. Mach Learn 52(3): 199–215

  27. Rennie J, Rifkin R (2001) Improving multiclass text classification with the support vector machine. Technical Report AIM-2001-026, MIT Artificial Intelligence Laboratory, Cambridge

  28. Saar-Tsechansky M, Provost FJ (2004) Active sampling for class probability estimation and ranking. Mach Learn 54: 153–178

  29. Sakamoto Y, Ishiguro M, Kitagawa G (1983) Akaike information criterion statistics. Kyoritsu Press, Tokyo

  30. Schohn G, Cohn D (2000) Less is more: active learning with support vector machines. In: Proceedings of the 17th international conference on machine learning (ICML’00), pp 839–846

  31. Sebastiani F (2002) Machine learning in automated text categorization. ACM Comput Surv 34(1): 1–47

  32. Takahashi K, Takamura H, Okumura M (2005a) Automatic occupation coding with combination of machine learning and hand-crafted rules. In: Proceedings of the ninth Pacific-Asia conference on knowledge discovery and data mining (PAKDD’05), pp 269–279

  33. Takahashi K, Suyama A, Murayama N, Takamura H, Okumura M (2005b) Applying occupation coding supporting system for coders (NANACO) in JGSS-2003. In: Japanese value and behavioral pattern seen in JGSS in 2003, the IRS at Osaka University of Commerce, pp 225–242

  34. Takahashi K (2008) Automated coding in social surveys. In: Tanioka I, Nitta M, Iwai N (eds) Values and behavioral patterns in Japan. University of Tokyo Press, Tokyo, pp 459–471

  35. Tanioka I, Nitta M, Iwai N (eds) (2008) Values and behavioral patterns in Japan. University of Tokyo Press, Tokyo

  36. Tsuruoka Y, Tsujii J (2003) Training a naive Bayes classifier via EM algorithm with a class distribution constraint. In: Proceedings of the seventh conference on natural language learning (CoNLL), pp 127–134

  37. Vapnik V (1998) Statistical learning theory. Wiley, New York

  38. Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Yu PS, Zhou Z-H, Steinbach M, Hand DJ, Steinberg D (2008) Top 10 algorithms in data mining. Knowl Inf Syst (KAIS) 14(1): 1–37

  39. Zadrozny B, Elkan C (2001a) Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In: Proceedings of the 18th international conference on machine learning (ICML’01), pp 609–616

  40. Zadrozny B, Elkan C (2001b) Learning and making decisions when costs and probabilities are both unknown. In: Proceedings of the seventh international conference on knowledge discovery and data mining (KDD’01), pp 204–213

  41. Zadrozny B (2002) Reducing multiclass to binary by coupling probability estimates. In: Advances in neural information processing systems (NIPS’01)

  42. Zadrozny B, Elkan C (2002) Transforming classifier scores into accurate multiclass probability estimates. In: Proceedings of the 8th international conference on knowledge discovery and data mining (KDD’02), pp 694–699

Author information

Corresponding author

Correspondence to Kazuko Takahashi.

About this article

Cite this article

Takahashi, K., Takamura, H. & Okumura, M. Direct estimation of class membership probabilities for multiclass classification using multiple scores. Knowl Inf Syst 19, 185–210 (2009). https://doi.org/10.1007/s10115-008-0165-z
