Abstract
Accurate estimation of class membership probabilities is needed for many applications in data mining and decision-making, to which multiclass classification is often applied. Existing estimation methods are designed for binary classification, where only the single score output by a classifier can be used; applying them to multiclass classification therefore requires both decomposing the multiclass classifier into binary classifiers and combining the estimates obtained from each binary classifier into the target estimate. We propose a simple and general method for directly estimating the class membership probability of any class in multiclass classification, without decomposition or combination, using multiple scores: not only the score of the predicted class but also those of other suitable classes. To make it possible to use multiple scores, we modify or extend representative existing methods. As a non-parametric method, drawing on the idea of the binning method proposed by Zadrozny et al., we construct an “accuracy table” in a different way, and we smooth the accuracies in the table, for example with a moving average, to yield reliable probabilities (accuracies). As a parametric method, we extend Platt’s method to multiple (i.e., multinomial) logistic regression. On two different datasets (open-ended data from Japanese social surveys and the 20 Newsgroups), with both support vector machine and naive Bayes classifiers, we show empirically that using multiple scores improves the estimation of class membership probabilities in multiclass classification in terms of cross entropy, the reliability diagram, the ROC curve, and AUC (area under the ROC curve), and that the proposed smoothing method for the accuracy table works quite well. Finally, we show empirically that, in terms of MSE (mean squared error), our best proposed method outperforms a multiclass extension of the PAV method proposed by Zadrozny et al. on both the 20 Newsgroups and Pendigits datasets, but is slightly worse on Pendigits than the state-of-the-art method, a multiclass extension of a combination of boosting and the PAV method.
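To fix ideas, the following is a minimal sketch of the two calibration strategies the abstract outlines. It is not the authors' implementation: it assumes the "multiple scores" used for binning are the top two classifier scores (the paper's actual choice may differ), it uses equal-frequency bins, and all function names are illustrative.

```python
# Hypothetical sketch of the two calibration strategies in the abstract --
# NOT the authors' code. Assumptions: the "multiple scores" are the top two
# classifier scores per example; bins are equal-frequency; names are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

def accuracy_table(scores, y, n_bins=10):
    """Non-parametric method: bin held-out examples by their highest and
    second-highest scores, and record the empirical accuracy per 2-D bin."""
    top = scores.max(axis=1)
    second = np.sort(scores, axis=1)[:, -2]
    correct = scores.argmax(axis=1) == y
    # equal-frequency bin edges along each score axis
    edges_t = np.quantile(top, np.linspace(0, 1, n_bins + 1))
    edges_s = np.quantile(second, np.linspace(0, 1, n_bins + 1))
    i = np.clip(np.searchsorted(edges_t, top) - 1, 0, n_bins - 1)
    j = np.clip(np.searchsorted(edges_s, second) - 1, 0, n_bins - 1)
    hits = np.zeros((n_bins, n_bins))
    counts = np.zeros((n_bins, n_bins))
    np.add.at(hits, (i, j), correct)
    np.add.at(counts, (i, j), 1)
    # empirical accuracy per bin; empty bins stay NaN
    table = np.divide(hits, counts, out=np.full((n_bins, n_bins), np.nan),
                      where=counts > 0)
    return table, (edges_t, edges_s)

def smooth_table(table, k=1):
    """Smooth the accuracy table with a (2k+1)x(2k+1) moving average so that
    sparsely populated bins borrow strength from their neighbours."""
    n_r, n_c = table.shape
    out = np.full_like(table, np.nan)
    for a in range(n_r):
        for b in range(n_c):
            window = table[max(a - k, 0):a + k + 1, max(b - k, 0):b + k + 1]
            if not np.all(np.isnan(window)):
                out[a, b] = np.nanmean(window)
    return out

def fit_multiclass_platt(scores, y):
    """Parametric method: fit a multiple (multinomial) logistic regression
    from the full score vectors to the true labels on held-out data. With
    scikit-learn's default lbfgs solver, a multinomial model is fitted for
    multiclass targets; predict_proba then gives membership probabilities."""
    return LogisticRegression(max_iter=1000).fit(scores, y)
```

At test time, under these assumptions, the probability that an example's predicted class is correct would be read from the smoothed table cell its two scores fall into (non-parametric), or taken from `predict_proba` of the fitted regression (parametric).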
References
Agui T, Nakajima M (1991) Graphical information processing. Morikita Press, Tokyo
Asuncion A, Newman DJ (2007) UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences
Bennett PN (2000) Assessing the calibration of naive Bayes’s posterior estimates. In: Technical Report CMU-CS-00-155, School of Computer Science, Carnegie Mellon University, pp 1–8
Caruana R, Niculescu-Mizil A (2005) Predicting good probabilities with supervised learning. In: Proceedings of the American Meteorological Society conference (AMS2005), San Diego
Chan YS, Ng HT (2006) Estimating class priors in domain adaptation for word sense disambiguation. In: Proceedings of the 21st international conference on computational linguistics and the 44th annual meeting of the ACL (ICCL’06 and ACL’06), pp 89–96
Cheeseman P, Stutz J (1995) Bayesian classification (AutoClass): theory and results. In: Fayyad UM et al (eds) Advances in knowledge discovery and data mining. AAAI Press, Menlo Park, pp 61–83
Devarakota PR, Mirbach B, Ottersten B (2007) Confidence estimation in classification decision: a method for detecting unseen patterns. In: Proceedings of the sixth international conference on advances in pattern recognition (ICAPR), Kolkata, India
Fragoudis D, Meretakis D, Likothanassis S (2007) Best terms: an efficient feature-selection algorithm for text categorization. Knowl Inf Syst (KAIS) 8(1): 16–33
Groves RM, Fowler FJ Jr, Couper MP, Lepkowski JM, Singer E, Tourangeau R (2004) Survey methodology. Wiley, Hoboken
Iwai N, Sato H (eds) (2002) Japanese values and behavioral patterns in JGSS. Yuhikaku Publishing, Tokyo
Joachims T (1998) Text categorization with support vector machines: learning with many relevant features. In: Proceedings of the tenth European conference on machine learning (ECML’98), pp 137–142
Jones R, Rey B, Madani O, Griner W (2006) Generating query substitutions. In: Proceedings of the 15th international world wide web conference (WWW’06), pp 387–396
Kita K (1999) Language and computing, volume 4: probabilistic language model. University of Tokyo Press, Tokyo
Kogure A, Sagae M (2005) Estimating probability density from percentiles. Stat Math 53(2): 375–389
Kressel U (1999) Pairwise classification and support vector machines. In: Schölkopf B et al (eds) Advances in kernel methods: support vector learning. MIT Press, Cambridge, pp 255–268
Langford J, Zadrozny B (2005) Estimating class membership probabilities using classifier learners. In: Cowell RG, Ghahramani Z (eds) Proceedings of AISTATS05, Society for Artificial Intelligence and Statistics, pp 198–205
Margineantu DD (2002) Class probability estimation and cost-sensitive classification decisions. In: Proceedings of the 13th European conference on machine learning (ECML’02), pp 270–281
Miwa S, Kobayashi D (eds) (2008) 2005 SSM survey series, no. 1: basic analysis of the 2005 SSM survey in Japan. 2005 SSM Survey Research Group
Niculescu-Mizil A, Caruana R (2005a) Predicting good probabilities with supervised learning. In: Proceedings of the 22nd international conference on machine learning (ICML’05), pp 625–632
Niculescu-Mizil A, Caruana R (2005b) Obtaining calibrated probabilities from boosting. In: Proceedings of the 21st international conference on uncertainty in artificial intelligence (UAI’05), pp 413–420
Nigam K, McCallum AK, Thrun S, Mitchell T (2000) Text classification from labeled and unlabeled documents using EM. Mach Learn 39(2/3): 103–134
Ohkura T, Kiyota K, Nakagawa H (2006) Browsing system for weblog articles based on automated folksonomy. In: Proceedings of the third annual workshop on the weblogging ecosystem (WWE2006), Edinburgh
Perlich C, Provost FJ, Simonoff JS (2003) Tree induction vs. logistic regression: a learning-curve analysis. J Mach Learn Res 4: 211–255
Platt JC (1999) Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Smola AJ et al (eds) Advances in large margin classifiers. MIT Press, Cambridge, pp 1–11
Provost FJ, Domingos P (2000) Well-trained PETs: improving probability estimation trees. CeDER Working Paper, #IS-00-04, Stern School of Business, New York University
Provost FJ, Domingos P (2003) Tree induction for probability-based ranking. Mach Learn 52(3): 199–215
Rennie J, Rifkin R (2001) Improving multiclass text classification with the support vector machine. Technical Report AIM-2001-026, MIT Artificial Intelligence Laboratory, Cambridge
Saar-Tsechansky M, Provost FJ (2004) Active sampling for class probability estimation and ranking. Mach Learn 54: 153–178
Sakamoto Y, Ishiguro M, Kitagawa G (1983) Akaike information criterion statistics. Kyoritsu Press, Tokyo
Schohn G, Cohn D (2000) Less is more: active learning with support vector machines. In: Proceedings of the 17th international conference on machine learning (ICML’00), pp 839–846
Sebastiani F (2002) Machine learning in automated text categorization. ACM Comput Surv 34(1): 1–47
Takahashi K, Takamura H, Okumura M (2005a) Automatic occupation coding with combination of machine learning and hand-crafted rules. In: Proceedings of the ninth Pacific-Asia conference on knowledge discovery and data mining (PAKDD’05), pp 269–279
Takahashi K, Suyama A, Murayama N, Takamura H, Okumura M (2005b) Applying occupation coding supporting system for coders (NANACO) in JGSS-2003. In: Japanese values and behavioral patterns seen in JGSS in 2003, the IRS at Osaka University of Commerce, pp 225–242
Takahashi K (2008) Automated coding in social surveys. In: Tanioka I, Nitta M, Iwai N (eds) Values and behavioral patterns in Japan. University of Tokyo Press, Tokyo, pp 459–471
Tanioka I, Nitta M, Iwai N (eds) (2008) Values and behavioral patterns in Japan. University of Tokyo Press, Tokyo
Tsuruoka Y, Tsujii J (2003) Training a naive Bayes classifier via EM algorithm with a class distribution constraint. In: Proceedings of the seventh conference on natural language learning (CoNLL), pp 127–134
Vapnik V (1998) Statistical learning theory. Wiley, New York
Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Yu PS, Zhou Z-H, Steinbach M, Hand DJ, Steinberg D (2008) Top 10 algorithms in data mining. Knowl Inf Syst (KAIS) 14(1): 1–37
Zadrozny B, Elkan C (2001a) Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In: Proceedings of the 18th international conference on machine learning (ICML’01), pp 609–616
Zadrozny B, Elkan C (2001b) Learning and making decisions when costs and probabilities are both unknown. In: Proceedings of the seventh international conference on knowledge discovery and data mining (KDD’01), pp 204–213
Zadrozny B (2002) Reducing multiclass to binary by coupling probability estimates. In: Advances in neural information processing systems (NIPS’01)
Zadrozny B, Elkan C (2002) Transforming classifier scores into accurate multiclass probability estimates. In: Proceedings of the 8th international conference on knowledge discovery and data mining (KDD’02), pp 694–699
Cite this article
Takahashi, K., Takamura, H. & Okumura, M. Direct estimation of class membership probabilities for multiclass classification using multiple scores. Knowl Inf Syst 19, 185–210 (2009). https://doi.org/10.1007/s10115-008-0165-z