Improved prediction of software defects using ensemble machine learning techniques

Mehta, Sweta; Patnaik, K. Sridhar

doi:10.1007/s00521-021-05811-3

Original Article
Published: 02 March 2021

Volume 33, pages 10551–10562, (2021)
Neural Computing and Applications

Abstract

Software testing is a crucial part of software development. Errors made by developers are generally fixed at a later stage of the development process, which increases the impact of the defect. To prevent this, defects need to be predicted early in development, which in turn helps in using testing resources efficiently. Defect prediction involves classifying software modules as defect prone or non-defect prone. This paper aims to reduce the impact of two major issues faced during defect prediction: data imbalance and the high dimensionality of defect datasets. In this work, various software metrics are evaluated using feature selection techniques such as Recursive Feature Elimination (RFE), correlation-based feature selection, Lasso, Ridge, ElasticNet and Boruta. Logistic Regression, Decision Trees, K-nearest neighbor, Support Vector Machines and Ensemble Learning are among the machine learning algorithms used in combination with the feature extraction and feature selection techniques to classify software modules as defect prone or non-defect prone. The proposed model combines Partial Least Squares (PLS) Regression and RFE for dimension reduction, and further combines them with the Synthetic Minority Oversampling Technique (SMOTE) because the datasets used are imbalanced. XGBoost and the Stacking Ensemble technique gave the best results on all the datasets, with defect prediction accuracy above 0.9, compared with the other algorithms used in this work.
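
To illustrate the oversampling step named in the abstract, the following is a minimal, simplified sketch of the idea behind the Synthetic Minority Oversampling Technique: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. It is not the authors' implementation. The function name `smote`, its parameters, and the toy metric values are assumptions made for illustration only, and the sketch uses only the Python standard library.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    minority: list of numeric feature vectors (here, the defect-prone class).
    k: number of nearest minority neighbours considered for each base sample.
    This is a simplified variant: base samples are drawn at random.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: squared_distance(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two software metrics per module (values are made up).
defective = [[10.0, 3.0], [12.0, 4.0], [11.0, 5.0]]
clean = [[1.0, 1.0], [2.0, 1.0], [1.5, 0.5], [2.5, 1.5], [3.0, 1.0], [0.5, 0.5]]

# Oversample the minority class until both classes have the same size.
new_points = smote(defective, n_new=len(clean) - len(defective), k=2)
balanced_defective = defective + new_points
```

In practice, oversampling of this kind is normally applied to the training split only, so that synthetic samples do not leak into the data used to measure accuracy.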

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi, 835315, India
Sweta Mehta & K. Sridhar Patnaik

Corresponding author

Correspondence to Sweta Mehta.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Mehta, S., Patnaik, K.S. Improved prediction of software defects using ensemble machine learning techniques. Neural Comput & Applic 33, 10551–10562 (2021). https://doi.org/10.1007/s00521-021-05811-3

Received: 24 September 2020
Accepted: 05 February 2021
Published: 02 March 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s00521-021-05811-3
