DOI: 10.1145/3038462.3038469

Active Learning with Visualization for Text Data

Published: 13 March 2017

ABSTRACT

Labeled datasets are always limited, and the quantity of labeled data is often a bottleneck for data analytics. This especially affects supervised machine learning methods, which require labeled data to train models. Active learning algorithms have been proposed to build good analytic models with limited labeling effort by determining which additional instance labels would be most beneficial for a given model to learn from. Active learning fits naturally with interactive analytics, as it proceeds in a cycle in which the unlabeled data is explored automatically. However, in active learning users have no control over which instances are labeled, and for text data the annotation interface usually shows only the document itself. Both of these constraints appear to limit the performance of an active learning model. We hypothesize that visualization techniques, particularly interactive ones, can help address these constraints. In this paper, we present a pilot study of visualization in active learning for text classification, with an interactive labeling interface. We compare the results of three experiments. Early results indicate that visualization helps build high-performance machine learning models with an active learning algorithm.
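The query step the abstract describes — picking the unlabeled instances whose labels would most benefit the model — is commonly realized as pool-based uncertainty sampling. The following is a minimal sketch of that idea; the tiny Naive Bayes classifier and toy documents are illustrative assumptions, not the model or data used in the paper:

```python
# Sketch of pool-based active learning with least-confident uncertainty
# sampling. A minimal multinomial Naive Bayes text classifier stands in
# for whatever model the active learner is training.
import math
from collections import Counter

def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model with add-one smoothing."""
    classes = sorted(set(labels))
    vocab = {w for d in docs for w in d.split()}
    priors, word_counts, totals = {}, {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        priors[c] = len(class_docs) / len(docs)
        counts = Counter(w for d in class_docs for w in d.split())
        word_counts[c] = counts
        totals[c] = sum(counts.values())
    return classes, priors, word_counts, totals, vocab

def posterior(model, doc):
    """Return P(class | doc) for each class, normalized over classes."""
    classes, priors, word_counts, totals, vocab = model
    logp = {}
    for c in classes:
        lp = math.log(priors[c])
        for w in doc.split():
            # Add-one smoothing handles words unseen in this class.
            lp += math.log((word_counts[c][w] + 1) / (totals[c] + len(vocab)))
        logp[c] = lp
    m = max(logp.values())                       # stabilize the exponentials
    exp = {c: math.exp(lp - m) for c, lp in logp.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

def most_uncertain(model, pool):
    """Query strategy: the unlabeled doc with the lowest top posterior."""
    return min(pool, key=lambda d: max(posterior(model, d).values()))
```

Each round, the document with the least confident prediction is sent to the annotator, labeled, and added to the training set before retraining. The constraint this paper probes is that in this plain loop the user never sees the pool or chooses the query; an interactive visualization of the unlabeled data could restore that control.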


Published in

ESIDA '17: Proceedings of the 2017 ACM Workshop on Exploratory Search and Interactive Data Analytics, March 2017, 82 pages. ISBN 9781450349031. DOI: 10.1145/3038462.

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Qualifiers

        • short-paper
