Abstract
With the growth of interest in data mining, there has been increasing interest in applying machine learning algorithms to real-world problems. This raises the question of how to evaluate the performance of machine learning algorithms. The standard procedure performs random sampling of predictive accuracy until a statistically significant difference arises between competing algorithms. That procedure fails to take into account the calibration of predictions. An alternative procedure uses an information reward measure (from I.J. Good) which is sensitive both to domain knowledge (predictive accuracy) and calibration. We analyze this measure, relating it to Kullback-Leibler distance. We also apply it to five well-known machine learning algorithms across a variety of problems, demonstrating some variations in their assessments using accuracy vs. information reward. We also look experimentally at information reward as a function of calibration and accuracy.
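The information reward at the centre of the comparison is Good's logarithmic score. As an illustration of why such a measure is sensitive to calibration as well as to predictive accuracy, the following Python sketch assumes the common binary form of the reward, 1 + log2 of the probability assigned to the class that actually occurred (1 for a fully confident correct prediction, 0 for an uninformative 0.5 prediction, negative for a confident error); the function names and the toy data are purely illustrative and are not the paper's experimental setup.

import math

def information_reward(p_true_class):
    # Binary information reward in the spirit of Good (1952):
    # 1 + log2(p), where p is the probability assigned to the class
    # that actually occurred.
    return 1.0 + math.log2(p_true_class)

def evaluate(probs, labels):
    # Compare predictive accuracy with mean information reward on a
    # binary task. probs are predicted probabilities of class 1,
    # labels are the 0/1 outcomes.
    correct, reward = 0, 0.0
    for p, y in zip(probs, labels):
        p_true = p if y == 1 else 1.0 - p
        correct += 1 if p_true > 0.5 else 0
        reward += information_reward(p_true)
    n = len(labels)
    return correct / n, reward / n

# Toy example: two classifiers with identical accuracy (3/4) but
# different calibration. The overconfident one is punished heavily
# for its single confident error, while accuracy cannot tell them apart.
labels        = [1, 1, 1, 0]
calibrated    = [0.8, 0.7, 0.6, 0.6]      # accuracy 0.75, mean reward ~ 0.28
overconfident = [0.99, 0.99, 0.99, 0.99]  # accuracy 0.75, mean reward ~ -0.67
print(evaluate(calibrated, labels))
print(evaluate(overconfident, labels))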
References
Brier, G.W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78, 1–3.
Cover, T.M. and Thomas, J.A. (1991). Elements of information theory. New York: Wiley.
Domingos, P. and Pazzani, M. (1996). Beyond independence: Conditions for the optimality of the simple Bayesian classifier. In Proceedings of the Thirteenth International Conference on Machine Learning (pp. 105–112), Bari, Italy. Morgan Kaufmann.
Dowe, D.L., Farr, G.E., Hurst, A.J. and Lentin, K.L. (1996). Information-theoretic football tipping. Technical Report 96/297, Dept. of Computer Science, Monash University.
Dowe, D.L. (2000). Learning and prediction notes. School of Computer Science and Software Engineering, Monash University.
Good, I.J. (1952). Rational decisions. Journal of the Royal Statistical Society, Series B, 14, 107–114. Reprinted in Good thinking: The foundations of probability and its applications, Minnesota, 1983.
Griffin, D., and Tversky, A. (1992). The weighing of evidence and the determinants of confidence. Cognitive Psychology, 24, 411–435.
Holte, R.C. (1993). Very simple classification rules perform well on most commonly used datasets. Machine Learning, 11, 63–91.
Jeffrey, R. (1983). The logic of decision, 2nd ed. New York: McGraw-Hill.
Kononenko, I., and Bratko, I. (1991). Information-based evaluation criterion for classifier’s performance. Machine Learning, 6, 67–80.
Leslie, C. (1998). Lack of confidence. MA Thesis, Department of History and Philosophy of Science, University of Melbourne.
Lewis, D. (1980). A subjectivist’s guide to objective chance. In R. Jeffrey (Ed.), Studies in inductive logic and probability, vol. II (pp. 263–293). University of California Press.
Lichtenstein, S., Fischhoff, B., and Phillips, L.D. (1977). Calibration of probabilities: The state of the art. In H. Jungermann and G. de Zeeuw (Eds.), Decision making and change in human affairs (pp. 275–324). Dordrecht: Reidel.
Matheson, J.E., and Winkler, R. L. (1976). Scoring rules for continuous probability distributions. Management Science, 22.
McClelland, A. G. R., and Bolger, F. (1994). The calibration of subjective probabilities: Theories and models, 1980–1994. In G. Wright and P. Ayton (Eds.) Subjective probability, Wiley.
Mitchell, T. (1997). Machine learning. McGraw-Hill.
Morgan, M.G., and Henrion, M. (1990). Uncertainty: A guide to dealing with uncertainty in quantitative risk and policy analysis. Cambridge University Press.
Murphy, A. H., and Winkler, R. L. (1984). The probability of precipitation forecasts. Journal of the American Statistical Association, 79, 391–400.
Oliver, J. (1993). Decision graphs: An extension of decision trees. In Proceedings of the Fourth International Conference on Artificial Intelligence and Statistics (pp. 343–350).
Pearl, J. (1978). An economic basis for certain methods of evaluating probabilistic forecasts. International Journal of Man-Machine Studies, 10, 175–183.
Provost, F., Fawcett, T., and Kohavi, R. (1998). The case against accuracy estimation for comparing induction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning. Morgan Kaufmann.
Quinlan, J. R. (1993). C4.5: programs for machine learning. Morgan Kaufmann.
Ramsey, F.P. (1931). The foundations of mathematics and other logical essays, edited by R.B. Braithwaite. New York: Humanities Press.
Savage, L.J. (1971). Elicitation of personal probabilities and expectations. Journal of the American Statistical Association, 66.
Turney, P. (1995). Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm. Journal of Artificial Intelligence Research, 2, 369–409.
Wallace, C.S., and Boulton, D.M. (1968). An information measure for classification. The Computer Journal, 11, 185–194.
Wallace, C.S., and Patrick, J. D. (1993). Coding decision trees. Machine Learning, 11, 7–22.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Korb, K.B., Hope, L.R., Hughes, M.J. (2001). The Evaluation of Predictive Learners: Some Theoretical and Empirical Results. In: De Raedt, L., Flach, P. (eds) Machine Learning: ECML 2001. Lecture Notes in Computer Science, vol 2167. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44795-4_24
DOI: https://doi.org/10.1007/3-540-44795-4_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42536-6
Online ISBN: 978-3-540-44795-5