New trends and ideas
A survey of the use of crowdsourcing in software engineering

https://doi.org/10.1016/j.jss.2016.09.015

Highlights

  • A comprehensive survey of literature on crowdsourcing for software engineering.

  • An analysis and identification of key milestones in the development of this area.

  • Identification of open challenges and future work for this research agenda.

Abstract

The term ‘crowdsourcing’ was initially introduced in 2006 to describe an emerging distributed problem-solving model in which work is carried out by online workers. Since then it has been widely studied and practised as a means of supporting software engineering. In this paper, we provide a comprehensive survey of the use of crowdsourcing in software engineering, seeking to cover all literature on this topic. We first review the definitions of crowdsourcing and derive our definition of Crowdsourced Software Engineering together with its taxonomy. We then summarise industrial crowdsourcing practice in software engineering and the corresponding case studies. We further analyse the software engineering domains, tasks and applications for crowdsourcing, and the platforms and stakeholders involved in realising Crowdsourced Software Engineering solutions. We conclude by exposing trends, open issues and opportunities for future research on Crowdsourced Software Engineering.

Introduction

Crowdsourcing is an emerging distributed problem-solving model based on the combination of human and machine computation. The term ‘crowdsourcing’ was jointly coined by Howe and Robinson in 2006 (Howe, 2006b). According to the widely accepted definition presented in that article, crowdsourcing is the act of an organisation outsourcing its work to an undefined, networked labour force through an open call for participation.

Crowdsourced Software Engineering (CSE) derives from crowdsourcing. Using an open call, it recruits global online labour to work on various types of software engineering tasks, such as requirements extraction, design, coding and testing. This emerging model has been claimed to reduce time-to-market by increasing parallelism (Lakhani et al., 2010; LaToza et al., 2013; Stol and Fitzgerald, 2014), and to lower costs and defect rates through flexible development capability (Lakhani et al., 2010). Crowdsourced Software Engineering is supported by many successful crowdsourcing platforms, such as TopCoder, AppStori, uTest, Mob4Hire and TestFlight.

The crowdsourcing model has been applied to a wide range of creative and design-based activities (Cooper et al., 2010; Norman et al., 2011; Brabham et al., 2009; Chatfield and Brajawidagda, 2014; Alonso et al., 2008). Crowdsourced Software Engineering has also rapidly gained interest in both the industrial and academic communities. The pilot study for this survey reveals a dramatic rise in recent work on the use of crowdsourcing in software engineering, yet many authors claim that there is ‘little work’ on crowdsourcing for/in software engineering (Schiller and Ernst, 2012; Schiller, 2014; Zogaj et al., 2014). These authors can easily be forgiven for this misconception, since the field is growing quickly and touches many disparate aspects of software engineering, forming a literature spread over many different application areas. Although previous work demonstrates that crowdsourcing is a promising approach, it usually targets a specific activity or domain in software engineering. Little is yet known about the overall picture: which software engineering tasks crowdsourcing has been applied to, which are most suitable for crowdsourcing, and what the limitations of, and open issues for, Crowdsourced Software Engineering are. This motivates the need for the comprehensive survey that we present here.

The purpose of our survey is two-fold: first, to provide a comprehensive account of current research progress on using crowdsourcing to support software engineering activities; second, to summarise the challenges for Crowdsourced Software Engineering and to reveal the extent to which these challenges have been addressed by existing work. Because the field is an emerging, fast-expanding area of software engineering that has yet to reach full maturity, we strive for breadth in this survey. The included literature may directly crowdsource software engineering tasks to the general public, indirectly reuse existing crowdsourced knowledge, or propose frameworks that enable the realisation or improvement of Crowdsourced Software Engineering.

The remainder of this paper is organised as follows. Section 2 introduces the literature search and selection methodology, with detailed counts for each step. Section 3 presents background information on Crowdsourced Software Engineering. Section 4 describes practical platforms for Crowdsourced Software Engineering, together with their typical processes and relevant case studies. Section 5 provides a finer-grained view of Crowdsourced Software Engineering, organised by application domain within the software development life-cycle. Sections 6 and 7 describe current issues, open problems and opportunities. Section 8 discusses the limitations of this survey. Section 9 concludes.

Section snippets

Literature search and selection

Our aim of comprehensively surveying all publications related to Crowdsourced Software Engineering necessitates a careful and thorough paper selection process. The process comprises several steps, described as follows:

To start with, we defined the inclusion criteria for the surveyed publications: the main criterion for including a paper in our survey is that the paper should describe research on crowdsourcing
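As a purely illustrative sketch (not taken from the survey), the keyword-driven part of this inclusion check might look like the following; the term lists and the matches_inclusion_criteria helper are hypothetical, and the survey's actual selection process involved several further steps.

```python
# Hypothetical sketch of a keyword-driven inclusion filter for the paper
# selection process; the term lists below are illustrative only, not the
# survey's actual search strings.
CROWD_TERMS = ("crowdsourcing", "crowdsourced", "crowd sourcing")
SE_TERMS = ("software engineering", "requirements", "software design",
            "coding", "software testing", "software maintenance")


def matches_inclusion_criteria(title: str, abstract: str) -> bool:
    """Retain a paper only if its title or abstract mentions both a
    crowdsourcing term and a software engineering term."""
    text = f"{title} {abstract}".lower()
    has_crowd = any(term in text for term in CROWD_TERMS)
    has_se = any(term in text for term in SE_TERMS)
    return has_crowd and has_se


if __name__ == "__main__":
    # Example: a paper on crowdsourced testing passes the keyword filter.
    print(matches_inclusion_criteria(
        "Crowdsourced testing of mobile applications",
        "We recruit an online crowd to carry out software testing tasks."))
```

In practice, such a keyword filter would only be a first pass; the remaining selection steps would likely still require manual inspection of each candidate paper.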

Definitions, trends and landscape

We first review definitions of crowdsourcing before turning to our focus: Crowdsourced Software Engineering.

Crowdsourcing practice in software engineering

In this section, we describe the most prevalent crowdsourcing platforms, together with typical crowdsourced development processes for software engineering. Since most of the case studies we collected are based on one (or several) of these commercial platforms, the second part of this section presents relevant case studies on the practice of Crowdsourced Software Engineering.

Crowdsourcing applications to software engineering

Crowdsourcing applications to software engineering are presented in multiple subsections, organised by the software development life-cycle activities to which they pertain. The following major stages are addressed: software requirements, software design, software coding, software testing and verification, and software evolution and maintenance. An overview of the research on Crowdsourced Software Engineering is shown in Table 6. The references that map to each of the software engineering tasks are

Issues and open problems

Despite the extensive application of crowdsourcing in software engineering, the emerging model itself faces a series of issues that raise open problems for future work. Although these issues and open problems have been identified by previous studies, few have focused on solutions to address them.

According to an in-depth industrial case study on TopCoder (Stol and Fitzgerald, 2014c), key concerns including task decomposition, planning and scheduling, coordination and

Opportunities

This section outlines five ways in which the authors believe Crowdsourced Software Engineering may develop as it matures and as it widens and deepens its penetration into software engineering methods, concepts and practices.

Threats to validity of this survey

The most relevant threats to the validity of this survey are potential bias in the literature selection and potential misclassification.

Literature search and selection. Our online library search was driven by keywords related to crowdsourcing and software engineering. It is possible that our search missed studies that implicitly use crowdsourcing without mentioning the term ‘crowdsourcing’, or studies that explicitly use crowdsourcing in software engineering activities which

Conclusions

In this survey, we have analysed existing literature on the use of crowdsourcing in software engineering activities and research into these activities. The study has revealed a steadily increasing rate of publication and has presented a snapshot of the research progress of this area from the perspectives of theories, practices and applications. Specifically, theories on crowdsourced software development models, major commercial platforms for software engineering and corresponding case studies,

Acknowledgments

The authors would like to thank the many authors who contributed their valuable feedback in the ‘pseudo-crowdsourced’ checking process of this survey, and the anonymous referees for their comments.

Ke Mao is funded by the UCL Graduate Research Scholarship (GRS) and the UCL Overseas Research Scholarship (ORS). This work is also supported by the Dynamic Adaptive Automated Software Engineering (DAASE) programme grant (EP/J017515), which fully supports Yue Jia and partly supports Mark Harman.


References (259)

  • S. Beecham et al.

    Motivation in software engineering: a systematic literature review

    Inf. Software Technol.

    (2008)
  • B. Bergvall-Kåreborn et al.

    The apple business model: crowdsourcing mobile applications

    Accounting Forum

    (2013)
  • M. Harman et al.

    Babel pidgin: SBSE can grow and graft entirely new functionality into a real world system

    Proc. 6th Symposium on Search Based Software Engineering

    (2014)
  • A. Abran et al.

    Guide to the Software Engineering Body of Knowledge (SWEBOK®)

    (2004)
  • K. Adamopoulos et al.

    Mutation testing using genetic algorithms: A co-evolution approach

    Proc. 6th Annual Genetic and Evolutionary Computation Conference

    (2004)
  • A. Adepetu et al.

    CrowdREquire: A Requirements Engineering Crowdsourcing Platform

    Technical Report

    (2012)
  • S. Afshan et al.

    Evolving readable string test inputs using a natural language model to reduce human oracle cost

    Proc. 6th IEEE International Conference on Software Testing, Verification and Validation

    (2013)
  • Y. Agarwal et al.

    ProtectMyPrivacy: Detecting and mitigating privacy leaks on iOS devices using crowdsourcing

    Proc. 11th Annual International Conference on Mobile Systems, Applications, and Services

    (2013)
  • D. Akhawe et al.

    Alice in warningland: A large-scale field study of browser security warning effectiveness

    Proc. 22nd USENIX Conference on Security

    (2013)
  • P. Akiki et al.

    Crowdsourcing user interface adaptations for minimizing the bloat in enterprise applications

    Proc. 5th ACM SIGCHI Symposium on Engineering Interactive Computing Systems

    (2013)
  • R. Ali et al.

    Social adaptation: when software gives users a voice

    Proc. 7th International Conference on Evaluation of Novel Approaches to Software Engineering

    (2012)
  • R. Ali et al.

    Social sensing: When users become monitors

    Proc. 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering

    (2011)
  • M. Allahbakhsh et al.

    Quality control in crowdsourcing systems: Issues and directions

    IEEE Internet Comput.

    (2013)
  • M. Almaliki et al.

    The design of adaptive acquisition of users feedback: An empirical study

    Proc. 9th International Conference on Research Challenges in Information Science

    (2014)
  • O. Alonso et al.

    Crowdsourcing for relevance evaluation

    ACM SIGIR Forum

    (2008)
  • S. Amann et al.

    Method-call recommendations from implicit developer feedback

    Proc. 1st International Workshop on CrowdSourcing in Software Engineering

    (2014)
  • M. Aparicio et al.

    Proposing a system to support crowdsourcing

    Proc. 2012 Workshop on Open Source and Design of Communication

    (2012)
  • N. Archak

    Money, glory and cheap talk: analyzing strategic behavior of contestants in simultaneous crowdsourcing contests on TopCoder.com

    Proc. 19th International Conference on World Wide Web

    (2010)
  • A. Arcuri et al.

    Multi-objective improvement of software using co-evolution and smart seeding

    Proc. 7th International Conference on Simulated Evolution and Learning

    (2008)
  • C. Arellano et al.

    Crowdsourced web augmentation: a security model

    Proc. 11th International Conference on Web Information Systems Engineering

    (2010)
  • R. Auler et al.

    Addressing JavaScript JIT engines performance quirks: a crowdsourced adaptive compiler

    Proc. 23rd International Conference on Compiler Construction

    (2014)
  • A. Bacchelli et al.

    Harnessing Stack Overflow for the IDE

    Proc. 3rd International Workshop on Recommendation Systems for Software Engineering

    (2012)
  • D.F. Bacon et al.

    A market-based approach to software evolution

    Proc. 24th ACM SIGPLAN Conference Companion on Object-Oriented Programming, Systems, Languages, and Applications

    (2009)
  • T. Ball et al.

    Beyond Open Source: The TouchDevelop Cloud-based Integrated Development and Runtime Environment

    Technical Report

    (2014)
  • E.T. Barr et al.

    The oracle problem in software testing: a survey

    IEEE Trans. Software Eng.

    (2015)
  • O. Barzilay et al.

    Facilitating crowd sourced software engineering via stack overflow

    Finding Source Code on the Web for Remix and Reuse

    (2013)
  • A. Begel et al.

    Social networking meets software development: Perspectives from GitHub, MSDN, Stack Exchange, and TopCoder

    IEEE Software

    (2013)
  • A. Begel et al.

    Social media for software engineering

    Proc. FSE/SDP Workshop on Future of Software Engineering Research

    (2010)
  • M.S. Bernstein

    Crowd-powered interfaces

    Proc. 23rd Annual ACM Symposium on User Interface Software and Technology

    (2010)
  • J. Bishop et al.

    Code hunt: experience with coding contests at scale

    Proc. 37th International Conference on Software Engineering - JSEET

    (2015)
  • R. Blanco et al.

    Repeatable and reliable search system evaluation using crowdsourcing

    Proc. 34th International ACM SIGIR Conference on Research and Development in Information Retrieval

    (2011)
  • B.W. Boehm

    Software engineering economics

    (1981)
  • M. Bozkurt et al.

    Automatically generating realistic test input from web services

    Proc. 6th IEEE...

    (2011)
  • D.C. Brabham

    Crowdsourcing as a model for problem solving: an introduction and cases

    Convergence

    (2008)
  • D.C. Brabham et al.

    Crowdsourcing public participation in transit planning: preliminary results from the next stop design case

    Transportation Research Board

    (2009)
  • T.D. Breaux et al.

    Scaling requirements extraction to the crowd: Experiments with privacy policies

    Proc. 22nd IEEE International Requirements Engineering Conference

    (2014)
  • B. Bruce et al.

    Reducing energy consumption using genetic improvement

    Proc. 17th Annual Genetic and Evolutionary Computation Conference

    (2015)
  • M. Bruch

    IDE 2.0: Leveraging the Wisdom of the Software Engineering Crowds

    (2012)
  • M. Bruch et al.

    IDE 2.0: Collective intelligence in software development

    Proc. FSE/SDP Workshop on Future of Software Engineering Research

    (2010)
  • I. Burguera et al.

    Crowdroid: Behavior-based malware detection system for Android

    Proc. 1st ACM Workshop on Security and Privacy in Smartphones and Mobile Devices

    (2011)
  • C. Challiol et al.

    Crowdsourcing mobile web applications

    Proc. ICWE 2013 Workshops

    (2013)
  • A.T. Chatfield et al.

    Crowdsourcing hazardous weather reports from citizens via twittersphere under the short warning lead times of EF5 intensity tornado conditions

    Proc. 47th Hawaii International Conference on System Sciences

    (2014)
  • C. Chen et al.

    Who asked what: Integrating crowdsourced FAQs into API documentation

    Proc. 36th International Conference on Software Engineering (ICSE Companion)

    (2014)
  • F. Chen et al.

    Crowd debugging

    Proc. 10th Joint Meeting on Foundations of Software Engineering

    (2015)
  • K.-t. Chen et al.

    Quadrant of Euphoria: a crowdsourcing platform for QoE assessment

    IEEE Netw.

    (2010)
  • N. Chen et al.

    Puzzle-based automatic testing: bringing humans into the loop by solving puzzles

    Proc. 27th IEEE/ACM International Conference on Automated Software Engineering

    (2012)
  • N. Chen et al.

    AR-Miner: mining informative reviews for developers from mobile app marketplace

    Proc. 36th International Conference on Software Engineering

    (2014)
  • Software engineering for self-adaptive systems (Dagstuhl Seminar)

    Dagstuhl Seminar Proceedings

    (2008)
  • P.K. Chilana

    Supporting users after software deployment through selection-based crowdsourced contextual help

    (2013)
  • P.K. Chilana et al.

    LemonAid: selection-based crowdsourced contextual help for web applications

    Proc. SIGCHI Conference on Human Factors in Computing Systems

    (2012)

    Ke Mao is pursuing a PhD degree in computer science at University College London, under the supervision of Prof. Mark Harman and Dr. Licia Capra. He received the MSc degree in computer science from the Institute of Software, Chinese Academy of Sciences, China. He worked as a research intern and a software engineer intern at Microsoft and Baidu respectively. He has served as a publicity chair or a PC member for several international workshops on software crowdsourcing. He is currently investigating the application of crowdsourcing in software engineering, with a focus on crowdsourced software testing.

    Licia Capra is Professor of pervasive computing in the Department of Computer Science at University College London. Licia conducts research in the area of computer-supported cooperative work. She has tackled specific topics within this broad research field, including crowdsourcing, coordination, context-awareness, trust management, and personalisation. She has published more than 70 papers on these topics, in top venues including SIGSOFT FSE, IEEE TSE, ACM CSCW, SIGIR, SIGKDD, and RecSys.

    Mark Harman is Professor of software engineering at University College London, where he is the head of software systems engineering and director of the CREST centre. He is widely known for work on source code analysis and testing, and he was instrumental in founding the field of search-based software engineering, a sub-field of software engineering that has now attracted over 1,600 authors, spread over more than 40 countries.

    Yue Jia is a lecturer in the Department of Computer Science at University College London. His research interests cover mutation testing, app store analysis and search-based software engineering. He has published more than 25 papers, including one that received the best paper award at SCAM’08. He has co-authored several invited keynote papers at leading international conferences (ICST 2015, SPLC 2014, SEAMS 2014 and ASE 2012) and published a comprehensive survey on mutation testing in TSE. He has served on many programme committees, as programme chair for the Mutation workshop, and as guest editor for the STVR special issue on Mutation.
