Oracle Coached Decision Trees and Lists

Johansson, Ulf; Sönströd, Cecilia; Löfström, Tuve

doi:10.1007/978-3-642-13062-5_8


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6065)

Included in the following conference series:

International Symposium on Intelligent Data Analysis


Abstract

This paper introduces a novel method for obtaining increased predictive performance from transparent models in situations where production input vectors are available when building the model. First, labeled training data is used to build a powerful opaque model, called an oracle. Second, the oracle is applied to production instances, generating predicted target values, which are used as labels. Finally, these newly labeled instances are utilized, in different combinations with normal training data, when inducing a transparent model. Experimental results, on 26 UCI data sets, show that the use of oracle coaches significantly improves predictive performance, compared to standard model induction. Most importantly, both accuracy and AUC results are robust over all combinations of opaque and transparent models evaluated. This study thus implies that the straightforward procedure of using a coaching oracle, which can be used with arbitrary classifiers, yields significantly better predictive performance at a low computational cost.
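The three steps in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the k-nearest-neighbour oracle, the one-split decision stump, the toy data, and every function name below are hypothetical stand-ins chosen for brevity, not the models or data sets evaluated in the paper.

```python
from collections import Counter

def majority(labels):
    """Most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def make_knn_oracle(X, y, k=3):
    """Opaque stand-in (assumption): a k-nearest-neighbour classifier."""
    def predict(x):
        by_distance = sorted(
            range(len(X)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)),
        )
        return majority([y[i] for i in by_distance[:k]])
    return predict

def fit_stump(X, y):
    """Transparent stand-in (assumption): the single split with fewest training errors.

    Assumes at least one feature takes two distinct values.
    """
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[feature] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[feature] > threshold]
            if not right:
                continue  # the largest value does not split the data
            left_label, right_label = majority(left), majority(right)
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    _, feature, threshold, left_label, right_label = best
    return {"feature": feature, "threshold": threshold,
            "left": left_label, "right": right_label}

def predict_stump(stump, x):
    """Apply the stump: go left if the feature value is <= the threshold."""
    side = "left" if x[stump["feature"]] <= stump["threshold"] else "right"
    return stump[side]

# Toy data (hypothetical): labeled training set and unlabeled production inputs.
train_X = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0), (6.0, 2.0), (7.0, 5.0), (8.0, 1.0)]
train_y = [0, 0, 0, 1, 1, 1]
production_X = [(2.5, 3.0), (4.0, 2.0), (5.0, 4.0), (7.5, 3.0)]

# Step 1: build the oracle from labeled training data.
oracle = make_knn_oracle(train_X, train_y)
# Step 2: label the production instances with the oracle's predictions.
oracle_labels = [oracle(x) for x in production_X]
# Step 3: induce the transparent model on training data plus oracle-labeled
# production data (one of the combinations the abstract mentions).
coached = fit_stump(train_X + production_X, train_y + oracle_labels)
# Baseline for comparison: standard induction on the training data alone.
baseline = fit_stump(train_X, train_y)

coached_preds = [predict_stump(coached, x) for x in production_X]
baseline_preds = [predict_stump(baseline, x) for x in production_X]
```

On this toy data the coached stump places its split at 4.0 on the first feature and reproduces the oracle's labels on all four production inputs, whereas the baseline stump splits at 3.0 and disagrees with the oracle on one of them. The example shows the mechanics only; it says nothing about accuracy against true labels, which is what the paper's experiments measure.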



Author information

Authors and Affiliations

CSL@BS Research Group, School of Business and Informatics, University of Borås, Sweden
Ulf Johansson, Cecilia Sönströd & Tuve Löfström

Editor information

Editors and Affiliations

Department of Computer Science, University of Arizona, 1040 East 4th Street, 85721, Tucson, AZ, USA
Paul R. Cohen
Department of Mathematics, Imperial College London, South Kensington Campus, SW7 2PG, London, UK
Niall M. Adams
Department of Computer and Information Science, University of Konstanz, Box 712, 78457, Konstanz, Germany
Michael R. Berthold

Cite this paper

Johansson, U., Sönströd, C., Löfström, T. (2010). Oracle Coached Decision Trees and Lists. In: Cohen, P.R., Adams, N.M., Berthold, M.R. (eds) Advances in Intelligent Data Analysis IX. IDA 2010. Lecture Notes in Computer Science, vol 6065. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13062-5_8

DOI: https://doi.org/10.1007/978-3-642-13062-5_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13061-8
Online ISBN: 978-3-642-13062-5
eBook Packages: Computer Science, Computer Science (R0)
