Potential and limitations of the ISBSG dataset in enhancing software engineering research: A mapping review

https://doi.org/10.1016/j.infsof.2014.01.003

Abstract

Context

The International Software Benchmarking Standards Group (ISBSG) maintains a software development repository with over 6000 software projects. This dataset makes it possible to estimate a project’s size, effort, duration, and cost.

Objective

The aim of this study was to determine how, and to what extent, ISBSG has been used by researchers from 2000, when the first papers were published, until June 2012.

Method

A systematic mapping review was used as the research method and was applied to the 129 papers obtained after the filtering process.

Results

The papers were published in 19 journals and 40 conferences. Thirty-five percent of the papers published between 2000 and 2011 have received at least one citation in journals, and only five papers have received six or more citations. The effort variable is the focus of 70.5% of the papers, 22.5% center their research on a variable other than effort, and 7% do not consider any target variable. Additionally, effort estimation is the research topic in as many as 70.5% of the papers, followed by dataset properties (36.4%). The most frequent methods are Regression (61.2%), Machine Learning (35.7%), and Estimation by Analogy (22.5%). ISBSG is the only dataset used in 55% of the papers, while the remaining papers use complementary datasets. ISBSG release 10 is the most frequently used, with 32 references. Finally, some benefits and drawbacks of using ISBSG are highlighted.

Conclusion

This work presents a snapshot of the existing usage of ISBSG in software development research. ISBSG offers a wealth of information regarding practices from a wide range of organizations, applications, and development types, which constitutes its main potential. However, a data preparation process is required before any analysis. Lastly, the potential of ISBSG to develop new research is also outlined.

Introduction

The International Software Benchmarking Standards Group [1] designed and currently maintains two international public repositories: Software Development & Enhancement with over 6000 software projects and Maintenance & Support with over 470 software applications. The repository contains more than 150 data fields collected from a wide range of countries, organizations, application types, and development types.

The goal of ISBSG is to help any type of organization (business, public, or non-profit) improve their IT resource management through the use of these datasets by performing their own analyses, estimations, comparisons, or benchmarking.

The ISBSG dataset for software development is organized by releases. The current version (since 2013) is release 12, which includes over 6000 projects spanning 1989 to 2013. Such a dataset makes it possible to estimate a project’s size, effort, duration, and cost. Furthermore, ISBSG enables users to check project specification completeness, reduce project risk, control software development, plan infrastructure development, and benchmark performance. However, ISBSG suffers from two major problems: the heterogeneity of the data, i.e., the combining of data from heterogeneous sources [2], and the large number of missing values in most of the variables.

ISBSG implements two fields in its datasets, which can be used to filter out low quality cases from the analysis and help handle data validation and rating issues [S105]. Each project submitted to the ISBSG repository is validated against specific quality criteria and rated in four categories. As pointed out in Liebchen and Shepperd [3], the classification is primarily guided by the completeness of the software projects, which means that low quality data are interpreted as possessing high levels of missing values.
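As a rough illustration of the kind of filtering described above, the following minimal sketch keeps only projects with an acceptable quality rating and then measures how many values are missing in a given field. It does not reproduce the ISBSG schema: the field names (`quality_rating`, `effort_hours`, `team_size`), the rating labels "A" to "D", the choice to retain ratings "A" and "B", and the sample records are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical records: field names and values are illustrative only,
# not actual ISBSG fields or data.
Project = Dict[str, Optional[object]]

projects: List[Project] = [
    {"id": 1, "quality_rating": "A", "effort_hours": 1200, "team_size": 5},
    {"id": 2, "quality_rating": "B", "effort_hours": 450, "team_size": None},
    {"id": 3, "quality_rating": "D", "effort_hours": None, "team_size": None},
    {"id": 4, "quality_rating": "C", "effort_hours": 800, "team_size": 3},
]


def filter_by_rating(rows: List[Project],
                     accepted: Sequence[str] = ("A", "B")) -> List[Project]:
    """Keep only projects whose quality rating is in the accepted set."""
    return [row for row in rows if row.get("quality_rating") in accepted]


def missing_fraction(rows: List[Project], field: str) -> float:
    """Return the fraction of rows in which the field is absent or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(field) is None)
    return missing / len(rows)


kept = filter_by_rating(projects)
print([p["id"] for p in kept])                    # ids of the retained projects
print(missing_fraction(projects, "effort_hours")) # missing share before filtering
print(missing_fraction(kept, "effort_hours"))     # missing share after filtering
```

Because the rating is driven mainly by completeness, filtering on it tends to reduce the share of missing values, but it does not guarantee a complete record: in this sample a retained project still lacks `team_size`.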

A mapping study can be considered as a secondary study that reviews articles related to a specific research topic. Such a study provides an overview of a research area to assess the existing evidence [4] and can identify gaps in the set of primary studies, where new or better primary studies are required. Mapping studies also pinpoint specific realms of knowledge where there may be an opportunity for more complete systematic literature reviews to be undertaken [5], [6].

Ultimately, a mapping study aims at providing a classification, conducting a thematic analysis, or presenting publication channels [7]. Petticrew and Roberts [5] also suggest that this type of study “involves a search of the literature to determine what sorts of studies addressing the systematic review question have been carried out, where they are published, in what databases they have been indexed, what sorts of outcomes they have assessed, and in which populations.” These studies require a rigorous searching process as well as detailed inclusion and exclusion criteria that are clearly defined in the research protocol and presented in the results report [8]. The main difference between a mapping study and a systematic literature review is the formulation of the research questions and the analysis of the available information [6].

The purpose of this study was to determine how, and to what extent, ISBSG has supported researchers in software engineering. To that end, a systematic mapping was performed to “map out” the papers that have used this dataset and to identify the topics, estimation methods, complementary datasets, and other issues addressed by the research questions. The result is a picture of the potential and limitations of ISBSG as a research facility. Additionally, the set of papers related to ISBSG and their classification are valuable results in themselves [9].

The rest of this paper is organized as follows: Section 2 describes the mapping process, Section 3 reports the mapping results, Section 4 discusses the limitations of the study, and Section 5 outlines the main conclusions and lines of future work.

Research methodology

Systematic mapping studies are a type of systematic literature review that aims to collect and classify research papers related to a specific topic [5], [6], [7], [10], [11].

This section gives an overview of the steps in the mapping review process, following Petersen et al. [7]: the formulation of the research questions, the search strategy for primary studies, the inclusion and exclusion criteria, and the data collection process.

Results

The results of the systematic mapping study are presented following each of the research questions.

Discussion

This section summarizes the principal findings of this systematic mapping. It also includes the limitations of the study and discusses the implications for researchers and practitioners.

Conclusion and future work

This work presents the results of a systematic review of the usage of ISBSG until June 2012. After the searching and filtering process, 129 papers from 19 journals and 40 conferences were analyzed. The most relevant journals in terms of the number of published papers related to the ISBSG dataset are Journal of Systems and Software, Information and Software Technology, and Empirical Software Engineering. The most relevant conferences include PROMISE and METRICS, followed by

References (37)

  • K. Petersen, R. Feldt, S. Mujtaba, M. Mattsson, Systematic mapping studies in software engineering, in: 12th Int. Conf....
  • D. Budgen, M. Turner, P. Brereton, B. Kitchenham, Using mapping studies in software engineering, in: Proc. PPIG, 2008,...
  • B.A. Kitchenham, D. Budgen, P. Brereton, The value of mapping studies – a participant–observer case study, in: Proc....
  • S.T. Acuna, J.W. Castro, O. Dieste, N. Juristo, A systematic mapping study on the open source software development...
  • N. Asoudeh, Y. Labiche, Requirement-based software testing with the UML: a systematic mapping study, in: ICSEA 2012...
  • The Excellence in Research for Australia (ERA) Initiative,...
  • M. Jorgensen et al., A systematic review of software development cost estimation studies, IEEE Trans. Softw. Eng. (2007)
  • Y. Yang, M. He, M. Li, Q. Wang, B. Boehm, Phase distribution of software development effort, in: Proc. Second ACM-IEEE...