ABSTRACT
Standard generalized additive models (GAMs) usually model the dependent variable as a sum of univariate models. Although previous studies have shown that standard GAMs can be interpreted by users, their accuracy is significantly lower than that of more complex models that permit interactions.
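To make the "sum of univariate models" structure concrete, here is a minimal sketch of a standard GAM fit by backfitting, where each shape function is a binned-mean lookup. All names and the binned representation are illustrative assumptions, not the paper's implementation:

```python
# Hedged sketch: a standard GAM fits y ~ f1(x1) + ... + fd(xd), one univariate
# shape function per feature. Each f_j here is a binned-mean lookup table,
# fit by backfitting (cycling over features on partial residuals).
import numpy as np

def fit_gam(X, y, n_bins=16, n_iters=10):
    n, d = X.shape
    bins = np.empty((n, d), dtype=int)
    for j in range(d):
        edges = np.linspace(X[:, j].min(), X[:, j].max(), n_bins + 1)
        bins[:, j] = np.clip(
            np.searchsorted(edges, X[:, j], side="right") - 1, 0, n_bins - 1)
    intercept = y.mean()
    shapes = np.zeros((d, n_bins))            # one shape function per feature
    for _ in range(n_iters):                  # backfitting sweeps
        for j in range(d):
            partial = y - intercept - sum(
                shapes[k][bins[:, k]] for k in range(d) if k != j)
            sums = np.bincount(bins[:, j], weights=partial, minlength=n_bins)
            counts = np.bincount(bins[:, j], minlength=n_bins)
            shapes[j] = np.divide(sums, counts,
                                  out=np.zeros(n_bins), where=counts > 0)
            shapes[j] -= shapes[j].mean()     # center for identifiability
    return intercept, shapes, bins

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(3000, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.05, 3000)
intercept, shapes, bins = fit_gam(X, y)
pred = intercept + sum(shapes[j][bins[:, j]] for j in range(2))
print(round(float(np.mean((y - pred) ** 2)), 3))  # small residual MSE
```

Because each learned shape function is one-dimensional, it can be plotted directly against its feature, which is the source of the interpretability the abstract refers to.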
In this paper, we suggest adding selected terms of pairwise feature interactions to standard GAMs. The resulting models, which we call GA2M-models (Generalized Additive Models plus Interactions), consist of univariate terms and a small number of pairwise interaction terms. Since these models include only one- and two-dimensional components, the components of GA2M-models can be visualized and interpreted by users. To explore the huge (quadratic) number of feature pairs, we develop a novel, computationally efficient method called FAST for ranking all possible pairs of features as candidates for inclusion in the model.
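The idea behind a FAST-style ranking can be sketched as follows: score each feature pair by how much a cheap binned two-dimensional fit reduces the residual sum of squares left by the additive part of the model. This is a hedged illustration, not the paper's algorithm; in particular, the residual here is approximated by mean-centering the target, and the function and parameter names are invented for the example:

```python
# Hedged sketch of ranking candidate feature pairs by interaction strength.
# Assumption: a pair's score is the RSS reduction achieved by a binned 2-D
# lookup (per-cell residual means) over predicting the residual as zero.
import itertools
import numpy as np

def rank_pairs(X, residual, n_bins=8):
    """Rank feature pairs by RSS reduction of a binned 2-D fit on residuals."""
    n, d = X.shape
    bins = np.empty((n, d), dtype=int)
    for j in range(d):
        edges = np.linspace(X[:, j].min(), X[:, j].max(), n_bins + 1)
        bins[:, j] = np.clip(
            np.searchsorted(edges, X[:, j], side="right") - 1, 0, n_bins - 1)
    base_rss = float(np.sum(residual ** 2))
    scores = []
    for i, j in itertools.combinations(range(d), 2):
        cell = bins[:, i] * n_bins + bins[:, j]      # flatten 2-D grid cells
        sums = np.bincount(cell, weights=residual, minlength=n_bins ** 2)
        counts = np.bincount(cell, minlength=n_bins ** 2)
        means = np.divide(sums, counts,
                          out=np.zeros_like(sums), where=counts > 0)
        rss = float(np.sum((residual - means[cell]) ** 2))
        scores.append(((i, j), base_rss - rss))
    # Larger RSS reduction = stronger candidate interaction.
    return sorted(scores, key=lambda t: -t[1])

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 4))
y = X[:, 0] + X[:, 1] + 3.0 * X[:, 2] * X[:, 3] + rng.normal(0, 0.1, 2000)
residual = y - y.mean()   # stand-in for residuals of a fitted univariate GAM
ranking = rank_pairs(X, residual)
print(ranking[0][0])      # the multiplicative pair (2, 3) should rank first
```

Scoring one pair costs only a pass over the binned data, which is what makes exhaustive ranking of all O(d²) pairs tractable in spirit, even though the paper's actual FAST method differs in its details.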
In a large-scale empirical study, we show the effectiveness of FAST in ranking candidate pairs of features. In addition, we show the surprising result that GA2M-models have almost the same performance as the best full-complexity models on a number of real datasets. Thus this paper postulates that for many problems, GA2M-models can yield models that are both intelligible and accurate.
Index Terms: Accurate intelligible models with pairwise interactions