Local Decision Pitfalls in Interactive Machine Learning: An Investigation into Feature Selection in Sentiment Analysis

Authors:
Tongshuang Wu, University of Washington, Seattle, WA (ORCID: 0000-0003-1630-0588)
Daniel S. Weld, University of Washington, Seattle, WA
Jeffrey Heer, University of Washington, Seattle, WA

ACM Transactions on Computer-Human Interaction, Volume 26, Issue 4, Article No. 24, pp. 1–27. https://doi.org/10.1145/3319616

Published: 17 June 2019

Abstract

Tools for Interactive Machine Learning (IML) enable end users to update models in a “rapid, focused, and incremental” (yet local) manner. In this work, we study the question of local decision making in an IML context around feature selection for a sentiment classification task. Specifically, we characterize the utility of interactive feature selection through a combination of human-subjects experiments and computational simulations. We find that, in expectation, interactive modification fails to improve model performance and may hamper generalization due to overfitting. We examine how these trends are affected by the dataset, the learning algorithm, and the training set size. Across these factors we observe consistent generalization issues. Our results suggest that rapid iterations with IML systems can be dangerous if they encourage local actions divorced from global context, degrading overall model performance. We conclude by discussing the implications of our feature selection results for the broader area of IML systems and research.
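The overfitting risk described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' experimental setup: the synthetic data, the simple vote-style classifier, and every name in it (`make_data`, `fit_weights`, `greedy_local_selection`, `run_simulation`) are assumptions made purely for illustration. It greedily removes features whenever doing so raises accuracy on a small development set, then reports accuracy on a larger held-out set that the selection loop never saw.

```python
import random


def make_data(n_docs, n_features, n_informative, rng):
    """Synthetic binary bag-of-words; only the first n_informative features carry signal."""
    docs = []
    for _ in range(n_docs):
        label = rng.choice([-1, 1])
        row = []
        for j in range(n_features):
            if j < n_informative:
                p = 0.6 if label == 1 else 0.4  # weakly predictive word
            else:
                p = 0.5  # pure noise word
            row.append(1 if rng.random() < p else 0)
        docs.append((row, label))
    return docs


def fit_weights(train, n_features):
    """One weight per feature: average agreement between feature presence and label."""
    w = [0.0] * n_features
    for row, label in train:
        for j, x in enumerate(row):
            w[j] += label * (2 * x - 1)
    return [v / len(train) for v in w]


def accuracy(data, w, active):
    """Accuracy of a sign-of-weighted-vote classifier restricted to the active features."""
    correct = 0
    for row, label in data:
        score = sum(w[j] * (2 * row[j] - 1) for j in active)
        pred = 1 if score >= 0 else -1
        correct += int(pred == label)
    return correct / len(data)


def greedy_local_selection(w, dev, n_features):
    """Drop one feature at a time whenever that strictly improves dev accuracy."""
    active = set(range(n_features))
    best = accuracy(dev, w, active)
    improved = True
    while improved:
        improved = False
        for j in sorted(active):
            trial = active - {j}
            if not trial:
                continue
            acc = accuracy(dev, w, trial)
            if acc > best:
                active, best, improved = trial, acc, True
    return active, best


def run_simulation(seed=0, n_features=40, n_informative=5):
    """Hypothetical sizes: 200 training, 30 dev, and 2000 held-out documents."""
    rng = random.Random(seed)
    train = make_data(200, n_features, n_informative, rng)
    dev = make_data(30, n_features, n_informative, rng)
    test = make_data(2000, n_features, n_informative, rng)
    w = fit_weights(train, n_features)
    all_features = set(range(n_features))
    active, dev_after = greedy_local_selection(w, dev, n_features)
    return {
        "dev_before": accuracy(dev, w, all_features),
        "dev_after": dev_after,
        "test_before": accuracy(test, w, all_features),
        "test_after": accuracy(test, w, active),
        "n_active": len(active),
    }


if __name__ == "__main__":
    for name, value in run_simulation(seed=0).items():
        print(name, value)
```

By construction, development-set accuracy can only stay the same or rise during the greedy loop, while held-out accuracy carries no such guarantee. Comparing the two gains across several seeds gives a rough feel for how locally justified edits can diverge from global performance.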

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Machine learning algorithms
      1. Feature selection
2. Human-centered computing
  1. Human computer interaction (HCI)
    1. Empirical studies in HCI
  2. Visualization
    1. Visualization design and evaluation methods

Published in
ACM Transactions on Computer-Human Interaction Volume 26, Issue 4
August 2019
251 pages
ISSN: 1073-0516
EISSN: 1557-7325
DOI: 10.1145/3341168
Editor:
Kristina Höök
KTH Royal Institute of Technology, Sweden
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 17 June 2019
- Accepted: 1 March 2019
- Revised: 1 October 2018
- Received: 1 March 2018
Author Tags
Machine learning
performance analysis
text classification
Qualifiers
- research-article
- Research
- Refereed