
Quality Assessment and Evaluation Criteria in Supervised Learning

Chapter in Machine Learning for Data Science Handbook (Springer, 2023)

Abstract

Evaluating the performance of a learning algorithm is one of the basic tasks in machine learning and data science. In this chapter, we review commonly used performance measures and discuss their properties. We show that different measures focus on different aspects of the algorithm; a learning algorithm is therefore typically evaluated with respect to several criteria. We introduce conceptual tools and provide practical guidelines for the quality assessment of fully trained algorithms. We focus our attention on classification problems, drawing connections to basic concepts in statistics, engineering, and other disciplines. In addition, we discuss regression problems and study popular residual-based measures. Finally, we argue that evaluation criteria should also be considered during the design of the algorithm: the desired criteria determine the objective function before the algorithm is trained. We discuss these design considerations and introduce several approaches to the problem.
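
For concreteness, the following minimal sketch (not part of the chapter itself) shows how several of the measure families discussed above can be computed in practice with scikit-learn. The synthetic datasets, the model choices, and all variable names are illustrative assumptions made for this example.

    # A minimal, illustrative sketch: computing several evaluation criteria
    # with scikit-learn. Synthetic data and models are stand-ins for a real task.
    from sklearn.datasets import make_classification, make_regression
    from sklearn.linear_model import LinearRegression, LogisticRegression
    from sklearn.metrics import (accuracy_score, f1_score, log_loss,
                                 mean_absolute_error, mean_squared_error,
                                 precision_score, r2_score, recall_score,
                                 roc_auc_score)
    from sklearn.model_selection import train_test_split

    # Classification: threshold-based, ranking, and probabilistic measures.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    y_pred = clf.predict(X_te)               # hard labels (fixed threshold)
    y_score = clf.predict_proba(X_te)[:, 1]  # scores for the positive class

    print("accuracy :", accuracy_score(y_te, y_pred))
    print("precision:", precision_score(y_te, y_pred))
    print("recall   :", recall_score(y_te, y_pred))
    print("F1       :", f1_score(y_te, y_pred))
    print("AUC      :", roc_auc_score(y_te, y_score))  # threshold-free ranking quality
    print("log-loss :", log_loss(y_te, y_score))       # proper scoring rule

    # Regression: popular residual-based measures.
    Xr, yr = make_regression(n_samples=500, n_features=10, noise=10.0,
                             random_state=0)
    Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(Xr, yr, random_state=0)
    yr_hat = LinearRegression().fit(Xr_tr, yr_tr).predict(Xr_te)

    print("MSE:", mean_squared_error(yr_te, yr_hat))
    print("MAE:", mean_absolute_error(yr_te, yr_hat))
    print("R^2:", r2_score(yr_te, yr_hat))

Note that accuracy, precision, recall, and F1 all depend on a single decision threshold, while AUC and log-loss evaluate the underlying scores. Running the sketch makes the chapter's point concrete: different measures can rank the same trained model differently.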

Author information

Correspondence to Amichai Painsky.

Copyright information

© 2023 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Painsky, A. (2023). Quality Assessment and Evaluation Criteria in Supervised Learning. In: Rokach, L., Maimon, O., Shmueli, E. (eds) Machine Learning for Data Science Handbook. Springer, Cham. https://doi.org/10.1007/978-3-031-24628-9_9
