Improved prediction of software defects using ensemble machine learning techniques

  • Original Article
  • Neural Computing and Applications

Abstract

Software testing is a crucial part of software development. Errors made by developers are generally fixed at a later stage of the development process, which increases the impact of each defect. To prevent this, defects need to be predicted early in development, which in turn helps use testing resources efficiently. Defect prediction involves classifying software modules as defect-prone or non-defect-prone. This paper aims to reduce the impact of two major issues in defect prediction: data imbalance and the high dimensionality of defect datasets. Various software metrics are evaluated using feature selection techniques such as Recursive Feature Elimination (RFE), correlation-based feature selection, Lasso, Ridge, ElasticNet and Boruta. Logistic Regression, Decision Trees, K-nearest neighbor, Support Vector Machines and Ensemble Learning are among the machine learning algorithms used in combination with the feature extraction and feature selection techniques to classify software modules as defect-prone or non-defect-prone. The proposed model uses a combination of Partial Least Squares (PLS) Regression and RFE for dimension reduction, which is further combined with the Synthetic Minority Oversampling Technique (SMOTE) because the datasets used are imbalanced. XGBoost and the Stacking Ensemble technique gave the best results on all the datasets, with defect prediction accuracy above 0.9, compared with the other algorithms used in this work.
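
To illustrate the oversampling step named in the abstract, the following is a minimal SMOTE-style sketch in plain Python. It is not the authors' implementation: the function name `smote`, its parameters (`n_new`, `k`, `seed`) and the toy data are assumptions made for illustration only. The idea it shows is the core of the technique: each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its k nearest minority-class neighbours. In practice a maintained library implementation (for example the `SMOTE` class in imbalanced-learn) would normally be preferred.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples built by SMOTE-style interpolation.

    minority: list of equal-length numeric feature vectors (hypothetical input format).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        # Pick a base sample at random from the minority class.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours.
        neighbour = minority[rng.choice(others[:k])]
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up data: 3 "defect-prone" modules versus 8 "non-defect-prone" ones.
defective = [[10.0, 4.0], [12.0, 5.0], [11.0, 6.0]]
clean = [[float(x), 1.0] for x in range(8)]

# Generate enough synthetic minority samples to balance the two classes.
new_points = smote(defective, n_new=len(clean) - len(defective), k=2)
balanced_minority = defective + new_points
print(len(balanced_minority), len(clean))
```

Oversampling of this kind is usually applied to the training split only, so that synthetic points do not leak into the data used to report accuracy.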

Author information

Corresponding author

Correspondence to Sweta Mehta.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mehta, S., Patnaik, K.S. Improved prediction of software defects using ensemble machine learning techniques. Neural Comput & Applic 33, 10551–10562 (2021). https://doi.org/10.1007/s00521-021-05811-3

  • DOI: https://doi.org/10.1007/s00521-021-05811-3
