ABSTRACT
Labeled datasets are always limited, and the quantity of labeled data is often a bottleneck for data analytics. This especially affects supervised machine learning, which requires labeled instances for models to learn from. Active learning algorithms have been proposed to build good analytic models with limited labeling effort by determining which additional instance labels will be most beneficial to a given model. Active learning fits naturally with interactive analytics, since it proceeds in a cycle in which the unlabeled data is explored automatically. However, in standard active learning users have no control over which instances are labeled, and for text data the annotation interface typically shows only the document itself. Both constraints appear to limit the performance of an active learning model. We hypothesize that visualization techniques, particularly interactive ones, can help address these constraints. In this paper, we present a pilot study of visualization in active learning for text classification with an interactive labeling interface. We compare the results of three experiments. Early results indicate that visualization improves the building of high-performance machine learning models with an active learning algorithm.
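The active learning cycle described above (train on the labeled set, query the most informative unlabeled instance, label it, retrain) can be sketched as pool-based uncertainty sampling. This is a minimal illustrative sketch using a scikit-learn text pipeline with a toy corpus; the corpus, the logistic-regression classifier, and the uncertainty criterion are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny toy corpus: the first four documents are "labeled", the rest form the pool.
docs = [
    "great movie, loved it", "terrible plot, boring",
    "fantastic acting and story", "awful film, waste of time",
    "enjoyable and fun", "dull and disappointing",
    "brilliant direction", "worst movie ever",
]
labels = np.array([1, 0, 1, 0])          # sentiment labels for the first four docs
labeled_idx = list(range(4))             # indices of labeled documents
pool_idx = list(range(4, len(docs)))     # indices of the unlabeled pool

vec = TfidfVectorizer()
X = vec.fit_transform(docs)

def query_most_uncertain(model, X, pool_idx):
    """Return the pool index whose predicted positive-class probability
    is closest to 0.5, i.e. the instance the model is least sure about."""
    probs = model.predict_proba(X[pool_idx])[:, 1]
    return pool_idx[int(np.argmin(np.abs(probs - 0.5)))]

# One step of the cycle: fit on the labeled set, then query the pool.
clf = LogisticRegression().fit(X[labeled_idx], labels)
chosen = query_most_uncertain(clf, X, pool_idx)
print("query document:", docs[chosen])
```

In a full loop the queried document would be labeled by the user, moved from `pool_idx` to `labeled_idx`, and the classifier retrained; the paper's contribution is giving the user a visual, interactive view of this pool rather than querying blindly.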