Abstract
Evaluating the performance of a learning algorithm is one of the basic tasks in machine learning and data science. In this chapter, we review commonly used performance measures and discuss their properties. We show that different measures focus on different aspects of the algorithm; therefore, a learning algorithm is typically evaluated with respect to several criteria. We introduce conceptual tools and provide practical guidelines for the quality assessment of fully trained algorithms. We focus our attention on classification problems, drawing connections to basic concepts in statistics, engineering, and other disciplines. We also discuss regression problems, studying popular residual-based measures. Finally, we suggest that evaluation criteria should also be considered during the design of the algorithm: in this view, the desired criteria determine the objective function before the algorithm is trained. These design considerations are discussed, and several approaches to the problem are introduced.
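As a brief, hedged illustration of the kinds of measures the chapter surveys, the following minimal Python sketch (an illustration only, not the chapter's own code) computes classification accuracy and two popular residual-based regression measures on toy data:

    # Minimal sketch (illustrative only): a few of the performance
    # measures discussed in the chapter, computed with plain Python.

    def accuracy(y_true, y_pred):
        # Fraction of correctly classified examples.
        return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

    def mse(y_true, y_pred):
        # Mean squared error: a residual-based regression measure.
        return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

    def mae(y_true, y_pred):
        # Mean absolute error: penalizes large residuals less than MSE.
        return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

    # Toy labels and predictions for a binary classifier and a regressor.
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))   # 0.75
    print(mse([2.0, 0.5, 1.0], [1.5, 0.0, 2.0]))  # 0.5
    print(mae([2.0, 0.5, 1.0], [1.5, 0.0, 2.0]))  # 0.666...

Note that accuracy and the residual-based measures summarize different aspects of performance, which is precisely why the chapter advocates evaluating an algorithm against several criteria.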
Copyright information
© 2023 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Painsky, A. (2023). Quality Assessment and Evaluation Criteria in Supervised Learning. In: Rokach, L., Maimon, O., Shmueli, E. (eds) Machine Learning for Data Science Handbook. Springer, Cham. https://doi.org/10.1007/978-3-031-24628-9_9
DOI: https://doi.org/10.1007/978-3-031-24628-9_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24627-2
Online ISBN: 978-3-031-24628-9
eBook Packages: Mathematics and Statistics (R0)