Uncovering the Latent Structures of Crowd Labeling

Tian, Tian; Zhu, Jun

doi:10.1007/978-3-319-18038-0_31

Tian Tian¹⁰ &
Jun Zhu¹⁰

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9077))

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

3543 Accesses
5 Citations

Abstract

Crowdsourcing provides a new way to distribute enormous tasks to a crowd of annotators. The divergent knowledge background and personal preferences of crowd annotators lead to noisy (or even inconsistent) answers to a same question. However, diverse labels provide us information about the underlying structures of tasks and annotators. This paper proposes latent-class assumptions for learning-from-crowds models, that is, items can be separated into several latent classes and workers’ annotating behaviors may differ among different classes. We propose a nonparametric model to uncover the latent classes, and also extend the state-of-the-art minimax entropy estimator to learn latent structures. Experimental results on both synthetic data and real data collected from Amazon Mechanical Turk demonstrate our methods can disclose interesting and meaningful latent structures, and incorporating latent class structures can also bring significant improvements on ground truth label recovery for difficult tasks.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Snow, R., O’Connor, B., Jurafsky, D., Ng, A.Y.: Cheap and fast-but is it good?: evaluating non-expert annotations for natural language tasks. In: EMNLP (2008)
Google Scholar
Zhu, J., Chen, N., Xing, E.P.: Bayesian inference with posterior regularization and applications to infinite latent svms. JMLR 15, 1799–1847 (2014)
MATH MathSciNet Google Scholar
Dawid, A.P., Skene, A.M.: Maximum likelihood estimation of observer error-rates using the em algorithm. Applied Statistics, 20–28 (1979)
Google Scholar
Raykar, V.C., Yu, S., Zhao, L.H., Valadez, G.H., Florin, C., Bogoni, L., Moy, L.: Learning from crowds. JMLR 11, 1297–1322 (2010)
MathSciNet Google Scholar
Zhou, D., Platt, J.C., Basu, S., Mao, Y.: Learning from the wisdom of crowds by minimax entropy. In: NIPS (2012)
Google Scholar
Zhou, D., Liu, Q., Platt, J.C., Meek, C.: Aggregating ordinal labels from crowds by minimax conditional entropy. In: ICML (2014)
Google Scholar
Welinder, P., Branson, S., Belongie, S., Perona, P.: The multidimensional wisdom of crowds. In: NIPS (2010)
Google Scholar
Sheshadri, A., Lease, M.: Square: a benchmark for research on computing crowd consensus. In: First AAAI Conference on Human Computation and Crowdsourcing (2013)
Google Scholar
Tian, Y., Zhu, J.: Learning from crowds in the presence of schools of thought. In: ICDM (2012)
Google Scholar
Li, H., Yu, B., Zhou, D.: Error rate analysis of labeling by crowdsourcing. In: ICML Workshop: Machine Learning Meets Crowdsourcing, Atalanta, Georgia, USA (2013)
Google Scholar
Gao, C., Zhou, D.: Minimax optimal convergence rates for estimating ground truth from crowdsourced labels. arXiv preprint arXiv:1310.5764 (2013)
Neal, R.M.: Markov chain sampling methods for Dirichlet process mixture models. Journal of computational and graphical statistics 9(2), 249–265 (2000)
MathSciNet Google Scholar

Download references

Author information

Authors and Affiliations

State Key Lab of Intelligent Technology and Systems, Tsinghua National Lab for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Tian Tian & Jun Zhu

Authors

Tian Tian
View author publications
You can also search for this author in PubMed Google Scholar
Jun Zhu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jun Zhu .

Editor information

Editors and Affiliations

Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
Tru Cao
Singapore Management University, Singapore, Singapore
Ee-Peng Lim
Nanjing University, Nanjing, China
Zhi-Hua Zhou
Japan Advanced Institute of Science and Technology, Nomi City, Japan
Tu-Bao Ho
University of Hong Kong, Hong Kong, Hong Kong SAR
David Cheung
Osaka University, Osaka, Japan
Hiroshi Motoda

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Tian, T., Zhu, J. (2015). Uncovering the Latent Structures of Crowd Labeling. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science(), vol 9077. Springer, Cham. https://doi.org/10.1007/978-3-319-18038-0_31

Download citation

DOI: https://doi.org/10.1007/978-3-319-18038-0_31
Published: 17 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18037-3
Online ISBN: 978-3-319-18038-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics