Random Multiclass Classification: Generalizing Random Forests to Random MNL and Random NB

Prinzie, Anita; Van den Poel, Dirk

doi:10.1007/978-3-540-74469-6_35


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4653)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Random Forests (RF) is a successful classifier whose performance is comparable to AdaBoost but which is more robust. Exploiting two sources of randomness, random inputs (bagging) and random features, makes RF an accurate classifier in several domains. We hypothesize that methods other than classification or regression trees could also benefit from injecting randomness. This paper generalizes the RF framework to other multiclass classification algorithms, such as the well-established MultiNomial Logit (MNL) and Naive Bayes (NB). We propose Random MNL (RMNL) as a new bagged classifier combining a forest of MNLs estimated with randomly selected features. Analogously, we introduce Random Naive Bayes (RNB). We benchmark the predictive performance of RF, RMNL and RNB against state-of-the-art SVM classifiers. RF, RMNL and RNB outperform SVM. Moreover, generalizing RF seems promising, as reflected by the improved predictive performance of RMNL.
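
The idea of pairing bagging with random feature selection around a non-tree base learner can be illustrated with a minimal sketch. The code below is not the authors' implementation. It assumes categorical features, a Laplace-smoothed Naive Bayes base learner, and aggregation by averaging member posteriors; the abstract does not specify the aggregation rule. The names `CategoricalNB`, `RandomNB`, `n_estimators` and `n_features`, and the toy data, are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


class CategoricalNB:
    """Naive Bayes for categorical features with Laplace smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.n = len(y)
        n_cols = len(X[0])
        # counts[j][c][v]: rows of class c whose feature j equals v
        self.counts = [defaultdict(Counter) for _ in range(n_cols)]
        self.values = [set() for _ in range(n_cols)]
        for row, label in zip(X, y):
            for j, v in enumerate(row):
                self.counts[j][label][v] += 1
                self.values[j].add(v)
        return self

    def predict_proba(self, row):
        log_post = {}
        for c in self.classes:
            lp = math.log(self.class_counts[c] / self.n)  # log prior
            for j, v in enumerate(row):
                num = self.counts[j][c][v] + 1
                den = self.class_counts[c] + len(self.values[j])
                lp += math.log(num / den)  # smoothed log likelihood
            log_post[c] = lp
        # Normalize in a numerically stable way
        m = max(log_post.values())
        z = sum(math.exp(lp - m) for lp in log_post.values())
        return {c: math.exp(lp - m) / z for c, lp in log_post.items()}


class RandomNB:
    """Bagged Naive Bayes members, each fit on a random feature subset."""

    def __init__(self, n_estimators=25, n_features=2, seed=0):
        self.n_estimators = n_estimators
        self.n_features = n_features
        self.seed = seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n, p = len(X), len(X[0])
        self.classes = sorted(set(y))
        self.members = []
        for _ in range(self.n_estimators):
            rows = [rng.randrange(n) for _ in range(n)]    # bootstrap sample
            feats = rng.sample(range(p), self.n_features)  # random features
            Xb = [[X[i][j] for j in feats] for i in rows]
            yb = [y[i] for i in rows]
            self.members.append((feats, CategoricalNB().fit(Xb, yb)))
        return self

    def predict_proba(self, row):
        # Assumption: average member posteriors (not confirmed by the source)
        total = {c: 0.0 for c in self.classes}
        for feats, model in self.members:
            proba = model.predict_proba([row[j] for j in feats])
            for c, pr in proba.items():
                total[c] += pr
        k = len(self.members)
        return {c: s / k for c, s in total.items()}

    def predict(self, row):
        proba = self.predict_proba(row)
        return max(proba, key=proba.get)


# Toy data (made up): two informative features plus one noise feature
X, y = [], []
for label, value in [("a", "x"), ("b", "y"), ("c", "z")]:
    for i in range(10):
        X.append([value, value, str(i % 2)])
        y.append(label)

model = RandomNB(n_estimators=15, n_features=2, seed=1).fit(X, y)
print(model.predict(["x", "x", "0"]))
print(model.predict_proba(["z", "z", "1"]))
```

Swapping `CategoricalNB` for a multinomial logit estimator with the same `fit` and `predict_proba` interface would give the analogous RMNL sketch, under the same assumptions.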



Author information

Authors and Affiliations

Department of Marketing, Ghent University, Tweekerkenstraat 2, 9000 Ghent, Belgium
Anita Prinzie & Dirk Van den Poel


Editor information

Roland Wagner, Norman Revell, Günther Pernul


About this paper

Cite this paper

Prinzie, A., Van den Poel, D. (2007). Random Multiclass Classification: Generalizing Random Forests to Random MNL and Random NB. In: Wagner, R., Revell, N., Pernul, G. (eds) Database and Expert Systems Applications. DEXA 2007. Lecture Notes in Computer Science, vol 4653. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74469-6_35


DOI: https://doi.org/10.1007/978-3-540-74469-6_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74467-2
Online ISBN: 978-3-540-74469-6
eBook Packages: Computer Science, Computer Science (R0)
