Local Decision Pitfalls in Interactive Machine Learning: An Investigation into Feature Selection in Sentiment Analysis

Published: 17 June 2019

Abstract

Tools for Interactive Machine Learning (IML) enable end users to update models in a “rapid, focused, and incremental” yet local manner. In this work, we study local decision making in an IML context, focusing on feature selection for a sentiment classification task. Specifically, we characterize the utility of interactive feature selection through a combination of human-subjects experiments and computational simulations. We find that, in expectation, interactive modification fails to improve model performance and may hamper generalization due to overfitting. We examine how these trends are affected by the dataset, the learning algorithm, and the training set size. Across these factors, we observe consistent generalization issues. Our results suggest that rapid iterations with IML systems can be dangerous if they encourage local actions divorced from global context, degrading overall model performance. We conclude by discussing the implications of our feature selection results for the broader area of IML systems and research.
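The overfitting risk described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' experimental setup: the synthetic data, the log-ratio word weights, the greedy removal rule, and every name in it (`make_data`, `train_weights`, `greedy_local_removal`, `run`) are assumptions chosen for illustration. It trains a simple word-vote classifier, then removes features one at a time whenever doing so raises accuracy on a small set of "seen" examples, and reports accuracy on both the seen set and a larger held-out set.

```python
import math
import random


def make_data(n_docs, n_feats, n_informative, rng):
    """Synthetic 'documents': binary word-presence vectors with a +1/-1 label."""
    docs = []
    for _ in range(n_docs):
        label = rng.choice([-1, 1])
        feats = []
        for j in range(n_feats):
            if j < n_informative:
                # Even-indexed informative words lean positive, odd-indexed lean negative.
                p = 0.4 if (label == 1) == (j % 2 == 0) else 0.2
            else:
                p = 0.3  # noise word: equally likely in both classes
            feats.append(1 if rng.random() < p else 0)
        docs.append((feats, label))
    return docs


def train_weights(docs, n_feats):
    """Smoothed log-ratio of per-class word counts (an assumed, simple scoring rule)."""
    pos = [1.0] * n_feats
    neg = [1.0] * n_feats
    for feats, label in docs:
        for j, present in enumerate(feats):
            if present:
                if label == 1:
                    pos[j] += 1
                else:
                    neg[j] += 1
    return [math.log(pos[j] / neg[j]) for j in range(n_feats)]


def accuracy(docs, weights, active):
    """Fraction of docs whose label matches the sign of the summed active weights."""
    correct = 0
    for feats, label in docs:
        score = sum(weights[j] for j in active if feats[j])
        pred = 1 if score >= 0 else -1
        correct += pred == label
    return correct / len(docs)


def greedy_local_removal(seen, weights, n_feats):
    """Drop a feature whenever that strictly raises accuracy on the seen examples."""
    active = set(range(n_feats))
    best = accuracy(seen, weights, active)
    for j in range(n_feats):
        trial = active - {j}
        acc = accuracy(seen, weights, trial)
        if acc > best:
            active, best = trial, acc
    return active


def run(seed=0):
    rng = random.Random(seed)
    n_feats, n_informative = 60, 10
    train = make_data(200, n_feats, n_informative, rng)
    seen = make_data(40, n_feats, n_informative, rng)       # small set guiding edits
    heldout = make_data(2000, n_feats, n_informative, rng)  # never used for edits
    weights = train_weights(train, n_feats)
    full = set(range(n_feats))
    edited = greedy_local_removal(seen, weights, n_feats)
    return {
        "seen_before": accuracy(seen, weights, full),
        "seen_after": accuracy(seen, weights, edited),
        "heldout_before": accuracy(heldout, weights, full),
        "heldout_after": accuracy(heldout, weights, edited),
    }


if __name__ == "__main__":
    for name, value in run().items():
        print(f"{name}: {value:.3f}")
```

By construction, accuracy on the seen examples can only stay the same or rise after editing. Nothing in the procedure guarantees the same for held-out accuracy, which may stay flat or fall depending on the seed and settings; that gap is the kind of local-versus-global mismatch the abstract describes.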

Published in

ACM Transactions on Computer-Human Interaction, Volume 26, Issue 4
August 2019
251 pages
ISSN: 1073-0516
EISSN: 1557-7325
DOI: 10.1145/3341168

Copyright © 2019 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 17 June 2019
• Accepted: 1 March 2019
• Revised: 1 October 2018
• Received: 1 March 2018

Qualifiers

• research-article
• Research
• Refereed
