Abstract
Many effective and efficient learning algorithms assume independence of attributes. They often perform well even in domains where this assumption does not hold. However, they may fail badly when the degree of attribute dependence becomes critical. In this paper, we examine methods for detecting deviations from independence. Such dependencies give rise to “interactions” between attributes that affect the performance of learning algorithms. We first formally define the degree of interaction between attributes in terms of the deviation of the best possible “voting” classifier from the true relation between the class and the attributes in a domain. We then propose a practical heuristic for detecting attribute interactions, called interaction gain. We experimentally investigate the suitability of interaction gain for handling attribute interactions in machine learning. Finally, we propose visualization methods for the graphical exploration of interactions in a domain.
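The abstract does not give a formula for interaction gain. What follows is a minimal sketch that assumes the information-theoretic form IG(A, B; C) = I(A, B; C) - I(A; C) - I(B; C), with probabilities estimated from raw frequencies and no smoothing. The names `entropy`, `mutual_information`, and `interaction_gain` are hypothetical illustrations. They are not taken from the paper or from any library.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Empirical Shannon entropy (in bits) of a sequence of hashable values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def interaction_gain(a, b, c):
    """Assumed form: IG(A,B;C) = I(A,B;C) - I(A;C) - I(B;C).

    Positive: A and B together tell more about C than the sum of what
    each tells alone (synergy). Negative: their information about C
    overlaps (redundancy). Zero: no interaction in this sense.
    """
    ab = list(zip(a, b))  # treat the pair (A, B) as one joint attribute
    return (mutual_information(ab, c)
            - mutual_information(a, c)
            - mutual_information(b, c))


if __name__ == "__main__":
    a = [0, 0, 1, 1]
    b = [0, 1, 0, 1]
    xor_class = [x ^ y for x, y in zip(a, b)]
    print(interaction_gain(a, b, xor_class))  # → 1.0 (synergy)
    print(interaction_gain(a, a, a))          # → -1.0 (redundancy)
```

In the XOR case, neither attribute alone carries information about the class, yet the two together determine it, so the gain is positive. A duplicated attribute yields a negative gain, which signals redundancy. Sign conventions for related quantities such as interaction information differ between authors, so check which convention a given tool uses.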
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jakulin, A., Bratko, I. (2003). Analyzing Attribute Dependencies. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds) Knowledge Discovery in Databases: PKDD 2003. PKDD 2003. Lecture Notes in Computer Science, vol 2838. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39804-2_22
DOI: https://doi.org/10.1007/978-3-540-39804-2_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20085-7
Online ISBN: 978-3-540-39804-2
eBook Packages: Springer Book Archive