ModelTracker: Redesigning Performance Analysis Tools for Machine Learning

Authors:
Saleema Amershi, Microsoft Research, Redmond, WA, USA
Max Chickering, Microsoft, Redmond, WA, USA
Steven M. Drucker, Microsoft Research, Redmond, WA, USA
Bongshin Lee, Microsoft Research, Redmond, WA, USA
Patrice Simard, Microsoft, Redmond, WA, USA
Jina Suh, Microsoft, Redmond, WA, USA

CHI '15: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, April 2015, Pages 337–346, https://doi.org/10.1145/2702123.2702509

Published: 18 April 2015

ABSTRACT

Model building in machine learning is an iterative process. The performance analysis and debugging step typically involves a disruptive cognitive switch from model building to error analysis, discouraging an informed approach to model building. We present ModelTracker, an interactive visualization that subsumes information contained in numerous traditional summary statistics and graphs while displaying example-level performance and enabling direct error examination and debugging. Usage analysis from machine learning practitioners building real models with ModelTracker over six months shows ModelTracker is used often and throughout model building. A controlled experiment focusing on ModelTracker's debugging capabilities shows participants prefer ModelTracker over traditional tools without a loss in model performance.
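
One way to read the claim that an example-level view can subsume summary statistics: given a score and a label for each example, the usual aggregate numbers can be recomputed on demand, and the misclassified examples remain available for direct inspection. The following is a minimal illustrative sketch under that reading, not ModelTracker's implementation; the names `summarize` and `errors`, the 0.5 threshold, and the toy data in the usage note are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    accuracy: float
    precision: float
    recall: float


def summarize(scores, labels, threshold=0.5):
    """Derive summary statistics from per-example scores and 0/1 labels.

    Hypothetical helper: the name and the default threshold are assumptions.
    """
    # Predict positive when the score reaches the threshold.
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))
    total = tp + fp + fn + tn
    return Summary(
        accuracy=(tp + tn) / total if total else 0.0,
        precision=tp / (tp + fp) if (tp + fp) else 0.0,
        recall=tp / (tp + fn) if (tp + fn) else 0.0,
    )


def errors(scores, labels, threshold=0.5):
    """Return the indices of misclassified examples for direct inspection."""
    return [
        i
        for i, (s, y) in enumerate(zip(scores, labels))
        if (s >= threshold) != (y == 1)
    ]
```

For example, with the toy inputs `scores = [0.9, 0.8, 0.4, 0.3, 0.6]` and `labels = [1, 0, 1, 0, 1]`, `summarize` reports an accuracy of 0.6 and `errors` returns the two misclassified indices, so the aggregate and the individual errors come from the same example-level data.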

Index Terms

1. Human-centered computing
  1. Human computer interaction (HCI)

Published in
CHI '15: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems
April 2015
4290 pages
ISBN: 9781450331456
DOI: 10.1145/2702123
General Chairs: Bo Begole (Huawei, USA), Jinwoo Kim (Yonsei University, Korea)
Program Chairs: Kori Inkpen (Microsoft Research, USA), Woontack Woo (KAIST, Korea)
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
debugging
interactive visualization
machine learning
performance analysis
Qualifiers
- research-article
Acceptance Rates
CHI '15 Paper Acceptance Rate: 486 of 2,120 submissions, 23%
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%
Article Metrics
- Total Citations: 175
- Total Downloads: 1,831
- Downloads (Last 12 months): 201
- Downloads (Last 6 weeks): 22