Potential and limitations of the ISBSG dataset in enhancing software engineering research: A mapping review
Introduction
The International Software Benchmarking Standards Group (ISBSG) [1] designed and currently maintains two international public repositories: Software Development & Enhancement, with over 6000 software projects, and Maintenance & Support, with over 470 software applications. The repository contains more than 150 data fields collected from a wide range of countries, organizations, application types, and development types.
The goal of ISBSG is to help any type of organization (business, public, or non-profit) improve their IT resource management through the use of these datasets by performing their own analyses, estimations, comparisons, or benchmarking.
The ISBSG dataset for software development is organized by releases. The current version (since 2013) is release 12, which includes over 6000 projects dated between 1989 and 2013. Such a dataset makes it possible to estimate a project’s size, effort, duration, and cost. Furthermore, ISBSG enables users to check project specification completeness, reduce project risk, control software development, plan infrastructure development, and benchmark performance. However, ISBSG suffers from two major problems: the heterogeneity of its data, i.e., the combination of data from heterogeneous sources [2], and the large number of missing values in most of its variables.
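The missing-value problem can be quantified before any analysis. The following is a minimal sketch, assuming project records held as plain dictionaries; the field names (`size_fp`, `effort_hours`, `team_size`) and the sample values are hypothetical and do not reproduce actual ISBSG fields or data.

```python
from typing import Any, Dict, List, Optional

# Hypothetical project records; the field names are illustrative assumptions,
# not the actual ISBSG field names.
projects: List[Dict[str, Optional[Any]]] = [
    {"size_fp": 250, "effort_hours": 1800, "team_size": None},
    {"size_fp": 120, "effort_hours": None, "team_size": None},
    {"size_fp": 430, "effort_hours": 3900, "team_size": 7},
    {"size_fp": None, "effort_hours": 950, "team_size": None},
]


def missing_fraction(records, fields):
    """Return, for each field, the share of records where it is absent or None."""
    if not records:
        return {field: 0.0 for field in fields}
    total = len(records)
    return {
        field: sum(1 for r in records if r.get(field) is None) / total
        for field in fields
    }


report = missing_fraction(projects, ["size_fp", "effort_hours", "team_size"])
for field, share in report.items():
    print(f"{field}: {share:.0%} missing")
```

A report of this kind may help decide which variables are complete enough to include in an estimation model and which would require imputation or exclusion.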
ISBSG implements two fields in its datasets, which can be used to filter out low quality cases from the analysis and help handle data validation and rating issues [S105]. Each project submitted to the ISBSG repository is validated against specific quality criteria and rated in four categories. As pointed out in Liebchen and Shepperd [3], the classification is primarily guided by the completeness of the software projects, which means that low quality data are interpreted as possessing high levels of missing values.
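Filtering on a quality rating of this kind could look like the sketch below. It assumes a single rating field named `quality_rating` with four letter categories from "A" (best) to "D" (worst); the field name, the letter scale, the accepted set, and the sample records are assumptions made for illustration rather than the dataset's documented schema.

```python
# Hypothetical records; the field name "quality_rating" and the A-D scale
# are assumptions for illustration only.
projects = [
    {"id": 1, "quality_rating": "A", "effort_hours": 1200.0},
    {"id": 2, "quality_rating": "D", "effort_hours": None},
    {"id": 3, "quality_rating": "B", "effort_hours": 830.0},
    {"id": 4, "quality_rating": "C", "effort_hours": 410.0},
]


def filter_by_quality(records, accepted=("A", "B")):
    """Keep only the records whose rating belongs to the accepted categories."""
    return [r for r in records if r.get("quality_rating") in accepted]


high_quality = filter_by_quality(projects)
print([r["id"] for r in high_quality])
```

Because the rating is driven mainly by completeness, as noted above, such a filter tends to remove the records with the most missing values; which categories to accept remains a choice for each study.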
A mapping study can be considered as a secondary study that reviews articles related to a specific research topic. Such a study provides an overview of a research area to assess the existing evidence [4] and can identify gaps in the set of primary studies, where new or better primary studies are required. Mapping studies also pinpoint specific realms of knowledge where there may be an opportunity for more complete systematic literature reviews to be undertaken [5], [6].
Ultimately, a mapping study aims at providing a classification, conducting a thematic analysis, or presenting publication channels [7]. Petticrew and Roberts [5] also suggest that this type of study “involves a search of the literature to determine what sorts of studies addressing the systematic review question have been carried out, where they are published, in what databases they have been indexed, what sorts of outcomes they have assessed, and in which populations.” These studies require a rigorous searching process as well as detailed inclusion and exclusion criteria that are clearly defined in the research protocol and presented in the results report [8]. The main difference between a mapping study and a systematic literature review is the formulation of the research questions and the analysis of the available information [6].
The purpose of this study was to determine to what extent, and how, ISBSG has supported researchers in software engineering. To that end, a systematic mapping was performed to “map out” papers that have used this dataset, in an attempt to identify the topics, estimation methods, complementary datasets, and other issues addressed in the research questions. In this way, a picture of the potential and limitations of ISBSG as a research resource was obtained. Additionally, the set of papers related to ISBSG and their classification are valuable results in themselves [9].
The rest of this paper is organized as follows: Section 2 describes the mapping process, Section 3 reports the mapping results, Section 4 discusses the limitations of the study, and Section 5 outlines the main conclusions and lines of future work.
Research methodology
Systematic mapping studies are a type of systematic literature review that aims to collect and classify research papers related to a specific topic [5], [6], [7], [10], [11].
This section provides an overview of the steps involved in the mapping review process, following Petersen et al. [7], including the formulation of the research questions, the search strategy for primary studies, the inclusion and exclusion criteria, and the data collection process.
Results
The results of the systematic mapping study are presented following each of the research questions.
Discussion
This section summarizes the principal findings of this systematic mapping. It also includes the limitations of the study and discusses the implications for researchers and practitioners.
Conclusion and future work
This work presents the results of a systematic review of the usage of ISBSG up to June 2012. After the search and filtering process, 129 papers from 19 journals and 40 conferences were analyzed. The most relevant journals, in terms of the number of published papers related to the ISBSG dataset, are Journal of Systems and Software, Information and Software Technology, and Empirical Software Engineering. The most relevant conferences include PROMISE and METRICS, followed by
References (37)
- et al., Identifying relevant studies in software engineering, Inf. Softw. Technol. (2011)
- et al., A systematic mapping study on the combination of static and dynamic quality assurance techniques, Inf. Softw. Technol. (2012)
- et al., Handling imprecision and uncertainty in software development effort prediction: a type-2 fuzzy logic based framework, Inf. Softw. Technol. (2009)
- et al., Software development effort prediction: a study on the factors impacting the accuracy of fuzzy logic systems, Inf. Softw. Technol. (2010)
- ISBSG, ISBSG Dataset Release 12, Int. Softw. Benchmarking Stand. Group.,...
- E. Stensrud, T. Foss, B. Kitchenham, I. Myrtveit, An empirical validation of the relationship between the magnitude of...
- G.A. Liebchen, M. Shepperd, Data sets and data quality in software engineering, in: Proc. 4th Int. Workshop Predict....
- W. Afzal, R. Torkar, R. Feldt, A systematic mapping study on non-functional search-based software testing, in: Proc....
- et al., Systematic Reviews in the Social Sciences: A Practical Guide (2006)
- B. Kitchenham, S. Charters, Guidelines for Performing Systematic Literature Reviews in Software Engineering, Software...
- A systematic review of software development cost estimation studies, IEEE Trans. Softw. Eng.