ABSTRACT
Model building in machine learning is an iterative process. The performance analysis and debugging step typically involves a disruptive cognitive switch from model building to error analysis, discouraging an informed approach to model building. We present ModelTracker, an interactive visualization that subsumes information contained in numerous traditional summary statistics and graphs while displaying example-level performance and enabling direct error examination and debugging. Usage analysis from machine learning practitioners building real models with ModelTracker over six months shows ModelTracker is used often and throughout model building. A controlled experiment focusing on ModelTracker's debugging capabilities shows participants prefer ModelTracker over traditional tools without a loss in model performance.
Index Terms
- ModelTracker: Redesigning Performance Analysis Tools for Machine Learning