Identifying relevant studies in software engineering
Introduction
Systematic reviews (also referred to as systematic literature reviews, SLRs) aim to identify, assess and combine the evidence from primary research studies using an explicit and rigorous method. This method has been widely adopted in disciplines such as medicine and sociology. Since the publication of the seminal paper on Evidence-Based Software Engineering (EBSE) [25] at ICSE 2004, the systematic review has become an important methodology of EBSE, and many SLRs have been conducted and reported.
EBSE involves five distinct steps [16]. The second step, ‘search the literature for the best available evidence to answer the question’, forms the basis for evidence aggregation, appraisal and further integration with decision-making practice. Kitchenham and Charters [24] also state that the aim of an SLR is to find as many primary studies relevant to the research questions as possible using an unbiased search strategy. The rigor of the search process is one critical factor that distinguishes systematic reviews from traditional (ad-hoc) literature reviews.
As in other disciplines, many researchers conducting SLRs rely on searches of digital libraries to identify relevant studies in software engineering (SE). However, these database searches have typically been designed using methods that lack scientific rigor, often relying solely on the investigators’ past experience and knowledge of the subject matter [8]. In practice, identifying primary studies can be difficult for several reasons, including an inadequate search strategy, heterogeneity of the language describing the subject matter, and a limited range of indexing terms describing study methodology [11]. Though Biolchini et al. suggest evaluating search engines during the planning phase to verify that they are capable of executing the search strings [7], no concrete instruction has been provided for evaluating a search strategy.
Neither the above EBSE methodology papers nor the SLR guidelines include practical instructions on how to improve and evaluate the rigor and performance of a search strategy. Nevertheless, several issues related to literature search have emerged and been reflected in SLRs on different topics in SE, such as:
- How to design a rigorous search strategy that maximizes the collection of relevant studies?
- What are the criteria of an affordable and reliable strategy to effectively balance the search sensitivity (recall) and precision (effort)?
- Is it possible to evaluate a predefined search strategy and the corresponding search strings?
Moreover, the most recent version of the guidelines [24] also encourages software engineering researchers to develop and publish such strategies, including the identification of relevant digital libraries. Hence, there is an apparent need to validate search strategies for SLRs that optimize the retrieval of relevant papers from digital libraries and electronic databases for researchers and practitioners. This paper aims to contribute to the efforts to address the above-mentioned needs. We have devised a systematic and practical approach to search strategy development in order to improve the rigor of search processes in SLRs. This approach also strives to balance the retrieval of a validated set of relevant papers in SE against the effort consumed in this phase.
This paper is structured as follows. Section 2 introduces concepts related to search strategies for SLRs and summarizes the state of the practice of literature search in SLRs in SE. In Section 3, we describe the proposed systematic and practical approach for implementing a relatively rigorous literature search. This search approach is then demonstrated and evaluated through two ‘replicated’ literature searches (participant-observer case studies), which are compared to their original SLRs in Sections 4 and 5, respectively. We discuss the findings from the case studies designed to assess the proposed systematic search approach in Section 6 and the threats to validity in Section 7. They are followed by an overview of the related research in Section 8. Finally, Section 9 draws the conclusions of this paper.
Section snippets
Defining search strategy
A necessary and crucial step of SLR is the identification of as much relevant literature to research questions as possible. Search strategy, which defines the methods to retrieve the relevant literature, has been developed in many ways, but the typical approach can be for information professionals (in subject matter) to use their combined knowledge of databases (digital libraries), search techniques, thesauri and the field of interest, to explore, often iteratively, combinations of terms which
QGS-based systematic search approach
Based on the concept of quasi-gold standard (QGS), this section constructs a systematic, repeatable, and practical literature search approach for SE, which provides a mechanism for search strategy development and evaluation.
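As an illustration of how a search strategy might be evaluated against a quasi-gold standard, the following is a minimal sketch rather than the approach defined in this paper. It assumes the QGS is a set of known relevant studies, and it uses the standard information-retrieval definitions: sensitivity is the share of QGS studies that the search retrieves, and precision is the share of retrieved studies that are in the QGS. The function name `evaluate_search` and all study identifiers are hypothetical.

```python
def evaluate_search(retrieved, qgs):
    """Return (sensitivity, precision) of a search result measured against a QGS.

    Assumption: both arguments are collections of study identifiers, and a
    retrieved study counts as relevant only if it appears in the QGS.
    """
    retrieved = set(retrieved)
    qgs = set(qgs)
    hits = retrieved & qgs  # QGS studies that the search actually found

    # Guard against empty inputs to avoid division by zero.
    sensitivity = len(hits) / len(qgs) if qgs else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return sensitivity, precision


# Hypothetical data: five known relevant studies, eight search results.
qgs = {"P01", "P02", "P03", "P04", "P05"}
retrieved = {"P01", "P02", "P03", "P04", "X10", "X11", "X12", "X13"}

sensitivity, precision = evaluate_search(retrieved, qgs)
print(f"sensitivity={sensitivity:.2f}, precision={precision:.2f}")
```

Because a QGS covers only part of the relevant literature, a precision figure computed this way is likely to understate the true precision: retrieved studies outside the QGS may still be relevant.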
Case study 1
The first case study was performed by the first two authors in order to formally trial the systematic search process. A participant–observer case study allowed us to access the case information without any barrier [30]. The original search of a tertiary study in software engineering was replicated in this case study, and the subjective search string definition method was applied.
Case study 2
The second case study aimed to independently follow and evaluate the proposed systematic search process by replicating the literature search of a recently published domain-specific SLR [28]. The case study was carried out mainly by a PhD student whose research topic is related to global software engineering. The first two authors developed the case study protocol and acted as supervisor and checker in this case. Unlike the first case study described in Section 4, the objective method to
Discussion
The research reported in this paper was motivated by an important need to improve the search process used in conducting SLRs. As many other practitioners of EBSE have found [23], the case studies reported in this paper also illustrate the limitations of applying automated or manual search methods alone. A manual search can consume a huge amount of effort when scanning a large number of literature venues. On the other hand, the performance of automated search is highly dependent upon the quality of
Threats to validity
To demonstrate and evaluate the different ways of developing search strings within the systematic search approach, we conducted two case studies on different topics instead of a single case study. However, the cases were not randomly selected, owing to the considerations (criteria) discussed in Section 4.1 as well as the resources available to us when planning them.
Due to the focus of this paper, only the study search and selection steps of the original SLRs were replicated in the
Related work
Systematic literature reviewers in software engineering are aware of the importance of literature search, as well as the challenges involved in searching for relevant studies when applying SLR methodology in different sub-disciplines of software engineering and computer science. Many SE researchers have reported various kinds of difficulties during the “searching relevant studies” step of SLR methodology [4]. Several experienced systematic literature reviewers have also discussed the issues related
Conclusions
Systematic literature reviews have become an important empirical research methodology in software engineering. An increasing number of SLRs are being conducted and reported. In SLR, an effective and rigorous literature search plays a critical role in evidence aggregation. In order to enhance the rigor and comprehension of the methodology, with reference to the experience of SLRs in other disciplines (e.g., medicine and sociology), this paper proposes a systematic literature search approach
Acknowledgments
NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Centre of Excellence program.
References (34)
- et al. Lessons from applying the systematic literature review process within the software engineering domain. Journal of Systems and Software (2007)
- et al. A systematic review of statistical power in software engineering experiments. Information and Software Technology (2006)
- et al. Software effort estimation terminology: the tower of babel. Information and Software Technology (2006)
- et al. A systematic review of effect size in software engineering experiments. Information and Software Technology (2007)
- et al. A systematic review of quasi-experiments in software engineering. Information and Software Technology (2009)
- et al. Systematic literature reviews in software engineering: a systematic literature review. Information and Software Technology (2009)
- The excellence in research for Australia ranked conference list. Australian Research Council, 2007 & 2010....
- The excellence in research for Australia ranked journal list. Australian Research Council, 2007 & 2010....
- Simstat v.2.5 and wordstat v.6.1. Provalis Research, May 2010....
- M. Ali Babar, H. Zhang, Systematic literature reviews in software engineering: preliminary results from interviews with...
- Identifying systematic reviews in medline: developing an objective approach to search strategy design. Journal of Information Science
- Presenting software engineering results using structured abstracts: a randomised experiment. Empirical Software Engineering
- Systematic reviews: identifying relevant studies for systematic reviews. British Medical Journal
- Developing search strategies for detecting relevant experiments. Empirical Software Engineering