Abstract
In many cost-sensitive environments, class probability estimates are used by decision makers to evaluate the expected utility of a set of alternatives. Supervised learning can be used to build class probability estimates; however, it is often very costly to obtain training data with class labels. Active learning acquires data incrementally, identifying at each phase especially useful additional data for labeling, and can be used to economize on the examples needed for learning. We outline the critical features of an active learner and present a sampling-based active learning method for estimating class probabilities and class-based rankings. BOOTSTRAP-LV identifies particularly informative new data for learning based on the variance in probability estimates, and it uses weighted sampling to account for a potential example's informative value for the rest of the input space. We show empirically, across a wide variety of domains, that the method reduces the number of data items that must be obtained and labeled. We investigate the contribution of the components of the algorithm and show that each provides valuable information for identifying informative examples. We also compare BOOTSTRAP-LV with UNCERTAINTY SAMPLING, an existing active learning method designed to maximize classification accuracy. The results show that BOOTSTRAP-LV requires fewer examples to achieve a given level of estimation accuracy, and they provide insight into the behavior of the two algorithms. Finally, we experiment with another new active sampling algorithm that draws from both UNCERTAINTY SAMPLING and BOOTSTRAP-LV, and we show that it is significantly more competitive with BOOTSTRAP-LV than is UNCERTAINTY SAMPLING. The analysis suggests more general implications for improving existing active sampling algorithms for classification.
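For concreteness, the following is a minimal sketch of the variance-weighted sampling idea described in the abstract, written in Python with scikit-learn probability estimation trees. The function name bootstrap_lv_sample and all parameter choices are illustrative assumptions, not the authors' implementation; the published algorithm includes additional weighting details that this sketch omits.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bootstrap_lv_sample(X_labeled, y_labeled, X_pool,
                        batch_size=10, n_bootstrap=10, seed=None):
    """Variance-weighted sampling from an unlabeled pool (binary classes).

    Trains `n_bootstrap` probability estimators on bootstrap replicates
    of the labeled set, scores each pool example by the variance of its
    estimated positive-class probability across the ensemble, and draws
    a batch with probability proportional to that score.
    """
    rng = np.random.default_rng(seed)
    n = len(X_labeled)
    probs = np.empty((n_bootstrap, len(X_pool)))
    for j in range(n_bootstrap):
        idx = rng.integers(0, n, size=n)              # bootstrap replicate
        tree = DecisionTreeClassifier(random_state=j)
        tree.fit(X_labeled[idx], y_labeled[idx])      # assumes both classes appear
        probs[j] = tree.predict_proba(X_pool)[:, 1]   # P(class = 1) estimates
    variance = probs.var(axis=0)                      # local variance per example
    if variance.sum() == 0.0:                         # degenerate ensemble: uniform
        weights = np.full(len(X_pool), 1.0 / len(X_pool))
    else:
        weights = variance / variance.sum()           # sampling distribution
    return rng.choice(len(X_pool), size=batch_size, replace=False, p=weights)
```

In an active learning loop, the returned indices would be labeled, moved from the pool into the training set, and the procedure repeated at each phase.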
Cite this article
Saar-Tsechansky, M., Provost, F. Active Sampling for Class Probability Estimation and Ranking. Machine Learning 54, 153–178 (2004). https://doi.org/10.1023/B:MACH.0000011806.12374.c3