A review on the combination of binary classifiers in multiclass problems

Artificial Intelligence Review

Abstract

Many real-world problems involve classifying data into categories or classes. Given a data set of examples whose classes are known, Machine Learning algorithms can induce a classifier able to predict the class of new data from the same domain. Some learning techniques were originally conceived for problems with only two classes, known as binary classification problems. However, many problems require discriminating examples into more than two classes. This paper surveys the main strategies for generalizing binary classifiers to problems with more than two classes, known as multiclass classification problems. The focus is on strategies that decompose the original multiclass problem into multiple binary subtasks, whose outputs are combined to obtain the final prediction.
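
As an illustration of the decomposition idea, the sketch below shows two widely used schemes, one-vs-all and one-vs-one, in plain Python. It is a minimal sketch under stated assumptions rather than the method of any particular paper: the nearest-centroid binary learner, the toy data, and all function names are hypothetical choices made only for this example.

```python
from itertools import combinations

def centroid(points):
    # Mean of a list of equal-length feature vectors.
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_binary(pos, neg):
    # Toy binary learner (an assumption for this sketch): the returned
    # scoring function is positive when x is closer to the positive centroid.
    c_pos, c_neg = centroid(pos), centroid(neg)
    return lambda x: sq_dist(x, c_neg) - sq_dist(x, c_pos)

def train_one_vs_all(X, y):
    # One binary subtask per class: that class against all the others.
    models = {}
    for c in sorted(set(y)):
        pos = [x for x, label in zip(X, y) if label == c]
        neg = [x for x, label in zip(X, y) if label != c]
        models[c] = train_binary(pos, neg)
    return models

def predict_one_vs_all(models, x):
    # Combine outputs by picking the class whose classifier scores highest.
    return max(models, key=lambda c: models[c](x))

def train_one_vs_one(X, y):
    # One binary subtask per pair of classes, trained on those two only.
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        pos = [x for x, label in zip(X, y) if label == a]
        neg = [x for x, label in zip(X, y) if label == b]
        models[(a, b)] = train_binary(pos, neg)
    return models

def predict_one_vs_one(models, x):
    # Combine outputs by majority vote; ties go to the first class in
    # sorted order (an arbitrary choice for this sketch).
    votes = {}
    for (a, b), score in models.items():
        winner = a if score(x) > 0 else b
        votes[winner] = votes.get(winner, 0) + 1
    return max(sorted(votes), key=votes.get)

# Toy three-class data set (hypothetical).
X = [[0, 0], [0, 1], [1, 0], [5, 0], [5, 1], [6, 0], [0, 5], [1, 5], [0, 6]]
y = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

ova = train_one_vs_all(X, y)
ovo = train_one_vs_one(X, y)
print(predict_one_vs_all(ova, [5.5, 0.5]), predict_one_vs_one(ovo, [0.5, 5.5]))
```

With k classes, one-vs-all trains k binary classifiers on the full data set, while one-vs-one trains k(k-1)/2 classifiers, each on the examples of two classes only. The combination step (highest score versus voting) is where the two schemes differ most.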

Author information

Corresponding author

Correspondence to Ana Carolina Lorena.

About this article

Cite this article

Lorena, A.C., de Carvalho, A.C.P.L.F. & Gama, J.M.P. A review on the combination of binary classifiers in multiclass problems. Artif Intell Rev 30, 19 (2008). https://doi.org/10.1007/s10462-009-9114-9

  • DOI: https://doi.org/10.1007/s10462-009-9114-9
