DOI: 10.1145/2213836.2213878

CrowdScreen: algorithms for filtering data with humans

Published: 20 May 2012

ABSTRACT

Given a large set of data items, we consider the problem of filtering them based on a set of properties that can be verified by humans. This problem is commonplace in crowdsourcing applications, and yet, to our knowledge, no one has considered the formal optimization of this problem. (Typical solutions use heuristics to solve the problem.) We formally state a few different variants of this problem. We develop deterministic and probabilistic algorithms to optimize the expected cost (i.e., number of questions) and expected error. We experimentally show that our algorithms provide definite gains with respect to other strategies. Our algorithms can be applied in a variety of crowdsourcing scenarios and can form an integral part of any query processor that uses human computation.
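To make the cost/error trade-off concrete, below is a minimal sketch (not the paper's optimization algorithm) of one fixed filtering strategy: keep asking humans about an item until one answer leads by a margin or a per-item question budget is exhausted, then estimate the expected cost and error by simulation. The selectivity, error rate, budget, and threshold values are illustrative assumptions, not figures from the paper.

```python
# Illustrative sketch only: evaluate a fixed margin-based filtering strategy
# by simulation. Assumptions (not from the abstract): each item truly passes
# the filter with probability SELECTIVITY, and each human answer is wrong
# independently with probability ERROR_RATE.

import random

SELECTIVITY = 0.3    # assumed prior that an item satisfies the filter
ERROR_RATE = 0.2     # assumed per-answer human error probability
MAX_QUESTIONS = 7    # question budget per item
THRESHOLD = 3        # decide once one side leads by this margin

def ask_human(truth: bool) -> bool:
    """Simulate one noisy human answer about whether the item passes."""
    return truth if random.random() > ERROR_RATE else not truth

def filter_item(truth: bool):
    """Ask humans until the vote margin is reached or the budget runs out."""
    yes = no = 0
    while yes + no < MAX_QUESTIONS:
        if ask_human(truth):
            yes += 1
        else:
            no += 1
        if yes - no >= THRESHOLD:
            return True, yes + no     # decide Pass
        if no - yes >= THRESHOLD:
            return False, yes + no    # decide Fail
    return yes > no, yes + no         # budget exhausted: majority vote

def evaluate(trials: int = 100_000):
    """Estimate expected cost (questions per item) and expected error."""
    cost = errors = 0
    for _ in range(trials):
        truth = random.random() < SELECTIVITY
        decision, asked = filter_item(truth)
        cost += asked
        errors += (decision != truth)
    return cost / trials, errors / trials

if __name__ == "__main__":
    expected_cost, error = evaluate()
    print(f"expected cost ~ {expected_cost:.2f} questions, error ~ {error:.3f}")
```

A strategy like this trades extra questions for lower error by hand-tuning the margin and budget; the paper's contribution, as described in the abstract, is to optimize the stopping decisions directly so that expected cost and expected error are minimized rather than fixed heuristically.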

Published in

SIGMOD '12: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
May 2012, 886 pages
ISBN: 9781450312479
DOI: 10.1145/2213836
Copyright © 2012 ACM


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 20 May 2012


        Qualifiers

        • research-article

        Acceptance Rates

SIGMOD '12 paper acceptance rate: 48 of 289 submissions, 17%. Overall acceptance rate: 785 of 4,003 submissions, 20%.
