Identifying relevant studies in software engineering

https://doi.org/10.1016/j.infsof.2010.12.010

Abstract

Context

Systematic literature review (SLR) has become an important research methodology in software engineering since the introduction of evidence-based software engineering (EBSE) in 2004. One critical step in applying this methodology is to design and execute an appropriate and effective search strategy. This is a time-consuming and error-prone step, which needs to be carefully planned and implemented. There is an apparent need for a systematic approach to designing, executing, and evaluating a suitable search strategy for optimally retrieving the target literature from digital libraries.

Objective

The main objective of the research reported in this paper is to improve the search step of undertaking SLRs in software engineering (SE) by devising and evaluating systematic and practical approaches to identifying relevant studies in SE.

Method

We have systematically selected and analytically studied a large number of papers (SLRs) to understand the state-of-the-practice of search strategies in EBSE. Having identified the limitations of the current ad-hoc nature of search strategies used by SE researchers for SLRs, we have devised a systematic and evidence-based approach to developing and executing optimal search strategies in SLRs. The proposed approach incorporates the concept of ‘quasi-gold standard’ (QGS), which consists of a collection of known studies, and the corresponding ‘quasi-sensitivity’ into the search process for evaluating search performance.

Results

We conducted two participant–observer case studies to demonstrate and evaluate the adoption of the proposed QGS-based systematic search approach in support of SLRs in SE research.

Conclusion

Based on the findings of the case studies, we report that the approach can improve the rigor of the search process in an SLR and can serve as a supplement to the guidelines for SLRs in EBSE. We plan to further evaluate the proposed approach using a series of case studies on varying research topics in SE.

Introduction

Systematic reviews (also referred to as systematic literature reviews, SLRs) aim to identify, assess and combine the evidence from primary research studies using an explicit and rigorous method. This method has been widely implemented in some disciplines, such as medicine and sociology. Since the publication of the seminal paper on Evidence-Based Software Engineering (EBSE) [25] at ICSE 2004, systematic review has become an important methodology of EBSE, and many SLRs have been conducted and reported.

EBSE involves five distinct steps [16]. The second step, ‘search the literature for the best available evidence to answer the question’, builds the basis for evidence aggregation, appraisal and further integration with decision making practice. Kitchenham and Charters [24] also state that the aim of an SLR is to find as many primary studies relevant to the research questions as possible using an unbiased search strategy. The rigor of the search process is one critical factor that distinguishes systematic reviews from traditional (ad-hoc) literature reviews.

As in other disciplines, many researchers doing SLRs rely on searches of digital libraries to identify relevant studies in software engineering (SE). However, these database searches have typically been designed using methods lacking in scientific rigor, often relying solely on the investigators’ past experience and knowledge of the subject matter [8]. In practice, identifying primary studies can be difficult for several reasons, including an inadequate search strategy, heterogeneity of the language describing the subject matter, and a limited range of indexing terms describing study methodology [11]. Though Biolchini et al. suggest evaluating search engines during the planning phase to verify whether they are capable of executing the search strings [7], no concrete instruction has been provided for search strategy evaluation.

Although neither the above EBSE methodology papers nor the SLR guidelines include practical instructions on how to improve and evaluate the rigor and performance of a search strategy, several issues related to literature search have emerged and been reflected in SLRs on different topics in SE, such as:

  • How to design a rigorous search strategy that maximizes the collection of relevant studies?

  • What are the criteria of an affordable and reliable strategy to effectively balance the search sensitivity (recall) and precision (effort)?

  • Is it possible to evaluate a predefined search strategy and the corresponding search strings?
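The trade-off between sensitivity (recall) and precision raised in the questions above can be made concrete with a small calculation. The following is a minimal sketch, assuming the conventional information-retrieval definitions: sensitivity is the share of relevant studies that a search retrieves, and precision is the share of retrieved studies that are relevant. The function name and the study identifiers are hypothetical and chosen only for illustration.

```python
def sensitivity_and_precision(retrieved, relevant):
    """Return (sensitivity, precision) for one search.

    retrieved: identifiers of the studies returned by the search
    relevant:  identifiers of the studies known to be relevant
    Both arguments are hypothetical inputs used for illustration.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant studies that were retrieved
    sensitivity = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return sensitivity, precision


# Illustrative data only: five retrieved studies, four relevant ones.
retrieved = {"s1", "s2", "s3", "s4", "s5"}
relevant = {"s1", "s2", "s6", "s7"}
sens, prec = sensitivity_and_precision(retrieved, relevant)
print(f"sensitivity={sens:.2f}, precision={prec:.2f}")
# prints "sensitivity=0.50, precision=0.40"
```

A broader search string tends to raise sensitivity while lowering precision, which in turn increases the screening effort; this is the balance that the second question refers to.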

Moreover, the most recent version of the guidelines [24] also encourages software engineering researchers to develop and publish such strategies, including the identification of relevant digital libraries. Hence, there is an apparent need for validated search strategies for SLRs that optimize the retrieval of relevant papers from digital libraries and electronic databases for researchers and practitioners. This paper aims to contribute to the efforts directed at addressing the above-mentioned needs. We have devised a systematic and practical approach for search strategy development in order to improve the rigor of search processes in SLRs. This approach also strives to balance the retrieval of a validated set of relevant papers in SE against the effort consumed in this phase.

This paper is structured as follows. Section 2 introduces concepts related to search strategies for SLRs and briefly reviews the state-of-the-practice of literature search in SLRs in SE. In Section 3, we describe the proposed systematic and practical approach for implementing a relatively rigorous literature search. This search approach is then demonstrated and evaluated by two ‘replicated’ literature searches (participant–observer case studies) and compared to their original SLRs in Sections 4 and 5, respectively. We discuss the findings from the case studies designed to assess the proposed systematic search approach and the threats to validity in Sections 6 and 7, respectively. They are followed by an overview of the related research in Section 8. Finally, Section 9 draws the conclusions of this paper.

Section snippets

Defining search strategy

A necessary and crucial step of an SLR is the identification of as much literature relevant to the research questions as possible. Search strategy, which defines the methods to retrieve the relevant literature, has been developed in many ways, but the typical approach can be for information professionals (in subject matter) to use their combined knowledge of databases (digital libraries), search techniques, thesauri and the field of interest, to explore, often iteratively, combinations of terms which

QGS-based systematic search approach

Based on the concept of quasi-gold standard (QGS), this section constructs a systematic, repeatable, and practical literature search approach for SE, which provides a mechanism for search strategy development and evaluation.
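To illustrate how a QGS can serve as an evaluation mechanism, the sketch below computes a quasi-sensitivity for several candidate search strings. It assumes that quasi-sensitivity is the fraction of the QGS (the collection of known studies) that a search retrieves; the acceptance threshold of 0.8, the function names, and the sample data are assumptions made for this example and are not taken from the text above.

```python
def quasi_sensitivity(retrieved, qgs):
    """Fraction of the quasi-gold standard found by a search (assumed definition)."""
    qgs = set(qgs)
    if not qgs:
        raise ValueError("the quasi-gold standard must not be empty")
    return len(set(retrieved) & qgs) / len(qgs)


def evaluate_search_strings(results_by_string, qgs, threshold=0.8):
    """Score each candidate search string against the QGS.

    results_by_string: maps a search-string label to the study IDs it retrieved
    threshold: hypothetical acceptance level, chosen here for illustration
    Returns a dict: label -> (quasi-sensitivity, meets threshold?).
    """
    report = {}
    for label, retrieved in results_by_string.items():
        score = quasi_sensitivity(retrieved, qgs)
        report[label] = (score, score >= threshold)
    return report


# Illustrative data only: a QGS of five known studies and two candidate strings.
qgs = {"p1", "p2", "p3", "p4", "p5"}
results = {
    "narrow string": ["p1", "p2", "x9"],
    "revised string": ["p1", "p2", "p3", "p4", "x9", "x10"],
}
for label, (score, accepted) in evaluate_search_strings(results, qgs).items():
    print(f"{label}: quasi-sensitivity={score:.2f}, accepted={accepted}")
```

In this sketch, a search string that falls below the threshold would be revised and re-run, which mirrors the iterative development and evaluation of search strategies that the approach is meant to support.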

Case study 1

The first case study was performed by the first two authors in order to formally trial the systematic search process. A participant–observer case study allowed us to access the case information without any barrier [30]. The original search of a tertiary study in software engineering was replicated in this case study, and the subjective search string definition method was applied.

Case study 2

The second case study aimed to independently follow and evaluate the proposed systematic search process by replicating the literature search of a recently published domain-specific SLR [28]. The case study was mainly carried out by a PhD student whose research topic is related to global software engineering. The first two authors developed the case study protocol and acted as supervisor and checker in this case. Unlike the first case study described in Section 4, the objective method to

Discussion

The research reported in this paper was motivated by an important need for improving the search process of conducting SLRs. As many other practitioners of EBSE have reported [23], the case studies in this paper also illustrate the limitations of applying automated or manual search methods alone. A manual search can consume a huge amount of effort when scanning a large number of literature venues. On the other hand, the performance of automated search is highly dependent upon the quality of

Threats to validity

Aiming to demonstrate and evaluate the different implementations of search string development in the systematic search approach, we conducted two case studies on different topics instead of a single case study. However, the cases were not randomly selected, owing to the considerations (criteria) discussed in Section 4.1 as well as the resources available to us when planning them.

Due to the focus of this paper, only the study search and selection steps of the original SLRs were replicated in the

Related work

Systematic literature reviewers in software engineering are aware of the importance of literature search, as well as the challenges involved in searching for relevant studies when applying SLR methodology in different sub-disciplines of software engineering and computer science. Many SE researchers have reported various kinds of difficulties during the “searching relevant studies” step of SLR methodology [4]. Several experienced systematic literature reviewers have also discussed the issues related

Conclusions

Systematic literature reviews have become an important empirical research methodology in software engineering. An increasing number of SLRs are being conducted and reported. In SLR, an effective and rigorous literature search plays a critical role in evidence aggregation. In order to enhance the rigor and comprehension of the methodology, with reference to the experience of SLRs in other disciplines (e.g., medicine and sociology), this paper proposes a systematic literature search approach

Acknowledgments

NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Centre of Excellence program.

References (34)

  • X. Bai, L. Huang, H. Zhang. On scoping stakeholders and artifacts in software process, in: Proceedings of International...
  • J. Bailey, C. Zhang, D. Budgen, M. Turner, S. Charters, Search engine overlaps: Do they agree or disagree? in:...
  • J. Biolchini, P.G. Mian, A.C.C. Natali, G.H. Travassos. Systematic Review in Software Engineering, Technical Report,...
  • J. Boynton et al. Identifying systematic reviews in medline: developing an objective approach to search strategy design, Journal of Information Science (1998)
  • D. Budgen et al. Presenting software engineering results using structured abstracts: a randomised experiment, Empirical Software Engineering (2008)
  • K. Dickersin et al. Systematic reviews: identifying relevant studies for systematic reviews, British Medical Journal (1994)
  • O. Dieste et al. Developing search strategies for detecting relevant experiments, Empirical Software Engineering (2009)