DOI: 10.1145/2939672.2939778

"Why Should I Trust You?": Explaining the Predictions of Any Classifier

Published: 13 August 2016

ABSTRACT

Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one.

In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.
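To make the local-surrogate idea in the abstract concrete, below is a minimal sketch of a LIME-style explanation of a single prediction: perturb the instance, query the black-box classifier on the perturbations, weight each perturbed sample by its proximity to the original instance, and fit a weighted linear model whose coefficients act as the explanation. This is an illustrative reconstruction under simplifying assumptions, not the authors' released implementation: the names explain_instance, predict_proba, kernel_width, and num_features are made up here, and the published method additionally enforces sparsity (e.g., via K-LASSO) and operates on an interpretable representation such as super-pixels or bag-of-words indicators.

```python
import numpy as np
from sklearn.linear_model import Ridge


def explain_instance(predict_proba, x, num_samples=5000,
                     kernel_width=0.75, num_features=5, rng=None):
    """Fit a locally weighted linear surrogate around the instance x.

    predict_proba : callable mapping an (n, d) array to class-1 probabilities
    x             : 1-D array of d interpretable features
    Returns the num_features (feature_index, weight) pairs with largest magnitude.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    d = x.shape[0]

    # 1. Perturb: randomly switch off subsets of the instance's features.
    mask = rng.integers(0, 2, size=(num_samples, d))
    mask[0, :] = 1                       # keep the unperturbed instance itself
    perturbed = mask * x                 # copies of x with some features zeroed out

    # 2. Query the black-box classifier on the perturbed samples.
    labels = predict_proba(perturbed)

    # 3. Weight each sample by its proximity to x, using an exponential kernel
    #    on the fraction of features that were removed.
    distance = 1.0 - mask.mean(axis=1)
    weights = np.exp(-(distance ** 2) / kernel_width ** 2)

    # 4. Fit a weighted linear model on the binary "interpretable" representation.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(mask, labels, sample_weight=weights)

    # 5. The largest-magnitude coefficients are the explanation.
    top = np.argsort(-np.abs(surrogate.coef_))[:num_features]
    return [(int(i), float(surrogate.coef_[i])) for i in top]
```

Fitting on the binary presence/absence mask rather than on the raw feature values is what makes the resulting weights readable as "including this component pushed the prediction up or down."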

Supplemental Material

kdd2016_ribeiro_any_classifier_01-acm.mp4 (MP4, 392.2 MB)


Reviews

Mario A. Aoun

When Bohr introduced his theory of quantum jumps as a model of the inside of an atom, he said that quantum jumps exist but that no one can visualize them. At the time, the scientific community was outraged, because science is all about explaining and visualizing physical phenomena; indeed, "not being able to visualize things seemed against the whole purpose of science" [1]. This paper deals with a phenomenon very similar to Bohr's story, except that instead of quantum jumps inside an atom, it concerns interpretable machine learning (IML): what is happening inside the machine when it learns facts and makes decisions (that is, predictions). The topic of IML is very hot right now [2]. The authors present local interpretable model-agnostic explanations (LIME), an IML method. "Model-agnostic" means that the technique can explain the behavior of the machine without referring to (that is, accessing) its internal parameters. "Local interpretable" means that the explanation model acts on a neighborhood of the input values. As a result, LIME can be considered a "white box" that locally approximates the behavior of the machine in a neighborhood of the input; it works by computing a linear summation of the input feature values scaled by weight factors. I enjoyed this paper: it is very well written and covers a significant fundamental building block of IML. I recommend it to any researcher interested in theorizing the basic aspects of IML.
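To connect the reviewer's description of a weighted linear summation to something runnable, here is a hypothetical usage of the explain_instance sketch shown after the abstract: a random forest (a black box from LIME's perspective) is trained on synthetic binary features, and the surrogate's largest coefficients indicate which features drove one of its predictions. The dataset, model choice, and expected behavior are illustrative assumptions, not results from the paper.

```python
# Assumes explain_instance from the sketch above is already defined.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 10)).astype(float)
# Class 1 essentially requires both of the first two features to be present.
y = ((X[:, 0] * X[:, 1] + 0.1 * rng.random(1000)) > 0.5).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)


def black_box(data):
    # Class-1 probability from the forest; LIME only needs this query access.
    return forest.predict_proba(data)[:, 1]


instance = X[0]
print("black-box prediction:", black_box(instance[None, :])[0])
for feature, weight in explain_instance(black_box, instance):
    print(f"feature {feature}: weight {weight:+.3f}")
```

With this construction, features 0 and 1 should tend to receive the largest positive weights for instances where both are present, since the label depends only on their conjunction.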

        • Published in

          KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
          August 2016
          2176 pages
          ISBN: 9781450342322
          DOI: 10.1145/2939672

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 13 August 2016


          Qualifiers

          • research-article

          Acceptance Rates

          KDD '16 paper acceptance rate: 66 of 1,115 submissions, 6%
          Overall acceptance rate: 1,133 of 8,635 submissions, 13%
