ABSTRACT
Standard generalized additive models (GAMs) usually model the dependent variable as a sum of univariate models. Although previous studies have shown that standard GAMs can be interpreted by users, their accuracy is significantly lower than that of more complex models that permit interactions.
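To make the "sum of univariate models" structure concrete, here is a minimal sketch of a standard GAM fit by backfitting, where each shape function is a binned-mean lookup. All names and the binned representation are illustrative assumptions, not the paper's implementation:

```python
# Hedged sketch: a standard GAM fits y ~ f1(x1) + ... + fd(xd), one univariate
# shape function per feature. Each f_j here is a binned-mean lookup table,
# fit by backfitting (cycling over features on partial residuals).
import numpy as np

def fit_gam(X, y, n_bins=16, n_iters=10):
    n, d = X.shape
    bins = np.empty((n, d), dtype=int)
    for j in range(d):
        edges = np.linspace(X[:, j].min(), X[:, j].max(), n_bins + 1)
        bins[:, j] = np.clip(
            np.searchsorted(edges, X[:, j], side="right") - 1, 0, n_bins - 1)
    intercept = y.mean()
    shapes = np.zeros((d, n_bins))            # one shape function per feature
    for _ in range(n_iters):                  # backfitting sweeps
        for j in range(d):
            partial = y - intercept - sum(
                shapes[k][bins[:, k]] for k in range(d) if k != j)
            sums = np.bincount(bins[:, j], weights=partial, minlength=n_bins)
            counts = np.bincount(bins[:, j], minlength=n_bins)
            shapes[j] = np.divide(sums, counts,
                                  out=np.zeros(n_bins), where=counts > 0)
            shapes[j] -= shapes[j].mean()     # center for identifiability
    return intercept, shapes, bins

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(3000, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.05, 3000)
intercept, shapes, bins = fit_gam(X, y)
pred = intercept + sum(shapes[j][bins[:, j]] for j in range(2))
print(round(float(np.mean((y - pred) ** 2)), 3))  # small residual MSE
```

Because each learned shape function is one-dimensional, it can be plotted directly against its feature, which is the source of the interpretability the abstract refers to.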
In this paper, we suggest adding selected terms of pairwise feature interactions to standard GAMs. The resulting models, which we call GA2M-models (Generalized Additive Models plus Interactions), consist of univariate terms and a small number of pairwise interaction terms. Since these models include only one- and two-dimensional components, the components of GA2M-models can be visualized and interpreted by users. To explore the huge (quadratic) number of feature pairs, we develop a novel, computationally efficient method called FAST for ranking all possible pairs of features as candidates for inclusion in the model.
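The idea behind a FAST-style ranking can be sketched as follows: score each feature pair by how much a cheap binned two-dimensional fit reduces the residual sum of squares left by the additive part of the model. This is a hedged illustration, not the paper's algorithm; in particular, the residual here is approximated by mean-centering the target, and the function and parameter names are invented for the example:

```python
# Hedged sketch of ranking candidate feature pairs by interaction strength.
# Assumption: a pair's score is the RSS reduction achieved by a binned 2-D
# lookup (per-cell residual means) over predicting the residual as zero.
import itertools
import numpy as np

def rank_pairs(X, residual, n_bins=8):
    """Rank feature pairs by RSS reduction of a binned 2-D fit on residuals."""
    n, d = X.shape
    bins = np.empty((n, d), dtype=int)
    for j in range(d):
        edges = np.linspace(X[:, j].min(), X[:, j].max(), n_bins + 1)
        bins[:, j] = np.clip(
            np.searchsorted(edges, X[:, j], side="right") - 1, 0, n_bins - 1)
    base_rss = float(np.sum(residual ** 2))
    scores = []
    for i, j in itertools.combinations(range(d), 2):
        cell = bins[:, i] * n_bins + bins[:, j]      # flatten 2-D grid cells
        sums = np.bincount(cell, weights=residual, minlength=n_bins ** 2)
        counts = np.bincount(cell, minlength=n_bins ** 2)
        means = np.divide(sums, counts,
                          out=np.zeros_like(sums), where=counts > 0)
        rss = float(np.sum((residual - means[cell]) ** 2))
        scores.append(((i, j), base_rss - rss))
    # Larger RSS reduction = stronger candidate interaction.
    return sorted(scores, key=lambda t: -t[1])

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 4))
y = X[:, 0] + X[:, 1] + 3.0 * X[:, 2] * X[:, 3] + rng.normal(0, 0.1, 2000)
residual = y - y.mean()   # stand-in for residuals of a fitted univariate GAM
ranking = rank_pairs(X, residual)
print(ranking[0][0])      # the multiplicative pair (2, 3) should rank first
```

Scoring one pair costs only a pass over the binned data, which is what makes exhaustive ranking of all O(d²) pairs tractable in spirit, even though the paper's actual FAST method differs in its details.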
In a large-scale empirical study, we show the effectiveness of FAST in ranking candidate pairs of features. In addition, we show the surprising result that GA2M-models have almost the same performance as the best full-complexity models on a number of real datasets. Thus this paper postulates that for many problems, GA2M-models can yield models that are both intelligible and accurate.
Index Terms: Accurate intelligible models with pairwise interactions