ABSTRACT
Given a large set of data items, we consider the problem of filtering them based on a set of properties that can be verified by humans. This problem is commonplace in crowdsourcing applications, and yet, to our knowledge, its formal optimization has not been studied; typical solutions rely on heuristics. We formally state several variants of the problem and develop deterministic and probabilistic algorithms that optimize the expected cost (i.e., the number of questions asked) and the expected error. We show experimentally that our algorithms provide clear gains over alternative strategies. They can be applied in a variety of crowdsourcing scenarios and can form an integral part of any query processor that uses human computation.
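To make the cost/error trade-off concrete, here is a minimal sketch (not the paper's algorithm) that evaluates one simple filtering strategy for a single item: keep asking human workers until the gap between "yes" and "no" answers reaches a threshold, or a question budget is exhausted, then take the majority answer. The per-answer error rate ERR, the prior PRIOR that the item satisfies the property, the gap threshold DELTA, the budget M, and the gap rule itself are all illustrative assumptions, not values or strategies from the paper.

```python
# Illustrative sketch: expected cost and error of one simple strategy.
# Assumption: each human answer is independently wrong with probability ERR.
from functools import lru_cache

ERR, PRIOR, DELTA, M = 0.2, 0.5, 2, 5  # hypothetical parameters

def evaluate(truth):
    """Return (expected #questions, error probability) given the item's truth."""
    p_yes = 1 - ERR if truth else ERR  # chance a worker answers "yes"

    @lru_cache(maxsize=None)
    def walk(yes, no):
        if abs(yes - no) >= DELTA or yes + no >= M:   # stop: decide by majority
            wrong = (yes > no) != truth if yes != no else 0.5  # tie -> coin flip
            return yes + no, float(wrong)
        cy, ey = walk(yes + 1, no)                    # next answer is "yes"
        cn, en = walk(yes, no + 1)                    # next answer is "no"
        return p_yes * cy + (1 - p_yes) * cn, p_yes * ey + (1 - p_yes) * en

    return walk(0, 0)

cost1, err1 = evaluate(True)
cost0, err0 = evaluate(False)
print("expected cost :", PRIOR * cost1 + (1 - PRIOR) * cost0)
print("expected error:", PRIOR * err1 + (1 - PRIOR) * err0)
```

Any such strategy trades expected cost against expected error; the paper's contribution is to optimize this trade-off directly rather than fixing thresholds like DELTA and M by hand.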