Effect of Data Sampling on Cone Shaped Embedded Normalization in Just in Time Software Defect Prediction

  • Original Research
SN Computer Science

Abstract

Just-in-Time (JIT) defect prediction is a software engineering approach that seeks to detect potential defects in software code at the earliest stages of the development process. This proactive method allows developers to tackle issues before they escalate, thereby enhancing software security and reliability. However, researchers working on such models often encounter class imbalance, which adversely affects model performance. To mitigate this problem, the study employs data sampling techniques. It evaluates the proposed cone-shaped embedded normalization (CSEN) model against baseline models in two scenarios: first without sampling, and then after data sampling. In state-of-the-art predictions of buggy changes, the F1 score typically ranges from 0.3 to 0.53; the proposed model improves this score to 0.72. The highest accuracy achieved by the proposed CSEN model is 74.42%.
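As a rough illustration of the two ideas the abstract relies on, the sketch below balances a binary data set and then scores predictions with F1. The abstract does not name the sampling techniques used in the study, so random oversampling is an assumption chosen for simplicity, and the function names `random_oversample` and `f1_score` are hypothetical helpers rather than the authors' code. The F1 score is the harmonic mean of precision and recall on the buggy class.

```python
import random


def random_oversample(features, labels, seed=0):
    """Balance a binary data set by duplicating randomly chosen minority rows.

    Assumption: random oversampling stands in for the (unnamed) sampling
    techniques of the study. Labels are 0 (clean change) or 1 (buggy change).
    """
    rng = random.Random(seed)
    # Group row indices by class label.
    by_class = {0: [], 1: []}
    for index, label in enumerate(labels):
        by_class[label].append(index)
    # The smaller class is the minority; ties leave the data unchanged.
    minority, majority = sorted(by_class, key=lambda c: len(by_class[c]))
    if not by_class[minority]:
        raise ValueError("cannot oversample a class with no examples")
    # Draw minority rows with replacement until both classes have equal size.
    deficit = len(by_class[majority]) - len(by_class[minority])
    extra = [rng.choice(by_class[minority]) for _ in range(deficit)]
    chosen = list(range(len(labels))) + extra
    return [features[i] for i in chosen], [labels[i] for i in chosen]


def f1_score(y_true, y_pred, positive=1):
    """Return F1 = 2 * precision * recall / (precision + recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: 8 clean changes and 2 buggy changes (illustrative values only).
features = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
balanced_x, balanced_y = random_oversample(features, labels)
print(balanced_y.count(0), balanced_y.count(1))

# Toy predictions, unrelated to the results reported in the paper.
print(f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```

In practice, sampling of this kind is applied to the training split only, so that the test data keeps its original class distribution.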

Data Availability

The experiments use the public data sets shared by Kamei et al. [4], who published the download address in their paper: https://research.cs.queensu.ca/~kamei/jittse/jit.zip

References

  1. Mockus A, Roy CK, Gourley A. The Bay Area open-source systems project. IEEE Softw. 2010;27(4):20–3.

  2. Fukushima T, Kamei Y, McIntosh S, Yamashita K, Ubayashi N. Studying just-in-time defect prediction using cross-project models. Empir Softw Eng. 2014. https://doi.org/10.1145/2597073.2597075.

  3. Zimmermann T, Premraj R, Zeller A. Predicting defects for Eclipse. In: Proceedings of the 2012 ACM-IEEE international symposium on empirical software engineering and measurement (ESEM '12). 2012.

  4. Kamei Y, Shihab E, Adams B, Hassan AE. A large-scale empirical study of just-in-time quality assurance. IEEE Trans Softw Eng. 2013;39(6):757–73.

  5. Yang Y, Wang Q, Leung H. Just-in-time quality assurance: towards practicable and sustainable quality assurance. IEEE Trans Softw Eng. 2015;41(2):111–27.

  6. Herzig K, Just S, Zeller A. It's not a bug, it's a feature: how misclassification impacts bug prediction. In: Proceedings of the 39th international conference on software engineering. 2017.

  7. Menzies T, Krishna R, Fu W. Local versus global models for effort-aware just-in-time defect prediction: A replication study. Empir Softw Eng. 2018;23(3):1712–35.

  8. Huang Q, Xia X, Lo D. Revisiting supervised and unsupervised models for effort-aware just-in-time defect prediction. Empirical software engineering. Research collection school of information systems. 2018. p. 1–40.

  9. Shin Y, Kim D, Zimmermann T. Just-in-time defect prediction leveraging social interactions. Empir Softw Eng. 2019;24(2):639–75.

  10. Hoang T, Dam HK, Kamei Y, Lo D, Ubayashi N. DeepJIT: an end-to-end deep learning framework for just-in-time defect prediction. In: Proceedings of the international conference on mining software repositories (MSR). 2019. p. 34–45.

  11. Qiao L, Wang Y. Effort-aware and just-in-time defect prediction with neural network. PLoS ONE. 2019;14(2):e0211359. https://doi.org/10.1371/journal.pone.0211359.

  12. Yang S, Wang Q, Leung H. A multi-objective optimization approach for just-in-time defect prediction. Inf Sci. 2020;507:1264–83.

  13. Yuli T, Ning L, Jeff T, Wei Z. How well just-in-time defect prediction techniques enhance software reliability? In: Proceedings of the IEEE international conference on software quality, reliability and security (QRS). 2020.

  14. Gao R, Xie Q, Leung H. Ensemble-based just-in-time defect prediction. IEEE Trans Softw Eng. 2021.

  15. Yanli S, Jingru Z, Xingqi W, Weiwei W, Jinglong F. Research on cross-company defect prediction method to improve software security. Security and communication networks. 2021. 19 pages. https://doi.org/10.1155/2021/5558561.

  16. MSR 2014: Proceedings of the 11th working conference on mining software repositories. May 2014. p. 172–181. https://doi.org/10.1145/2597073.2597075.

  17. Xingguang Y, Huiqun Y, Guisheng F, Kai S, Liqiong C. Local versus global models for just-in-time software defect prediction. 2019.

  18. Albahli S. A deep ensemble learning method for effort-aware just-in-time defect prediction. 2019.

  19. Xu Z, Liu J, Luo X, Yang Z, Zhang Y, Yuan P, Tang Y, Zhang T. Software defect prediction based on kernel PCA and weighted extreme learning machine. Inf Softw Technol. 2019;106:182–200.

  20. Schmidt-Hieber J. Nonparametric regression using deep neural networks with ReLU activation function. arXiv:1708.06633. 2017. https://arxiv.org/abs/1708.06633. Accessed 20 Nov 2019.

Funding

Not Applicable.

Author information

Contributions

Lipika Goel: Conceptualization, Methodology, Software, Supervision. Sonam Gupta: Visualization, Supervision, Investigation. Dharmendra Kumar: Data Curation, Writing – original draft. Vinay Pathak: Software Validation, Editing.

Corresponding author

Correspondence to Sonam Gupta.

Ethics declarations

Conflict of interest

Not Applicable.

Research Involving Human and Animals

Not Applicable.

Informed Consent

Informed consent was obtained from all individual participants included in the study. The participant has consented to the submission of the research article to the journal.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Security for Communication and Computing Application” guest edited by Karan Singh, Ali Ahmadian, Ahmed Mohamed Aziz Ismail, R S Yadav, Md. Akbar Hossain, D. K. Lobiyal, Mohamed Abdel-Basset, Soheil Salahshour, Anura P. Jayasumana, Satya P. Singh, Walid Osamy, Mehdi Salimi and Norazak Senu.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Goel, L., Gupta, S., Kumar, D. et al. Effect of Data Sampling on Cone Shaped Embedded Normalization in Just in Time Software Defect Prediction. SN COMPUT. SCI. 5, 345 (2024). https://doi.org/10.1007/s42979-024-02703-w

  • DOI: https://doi.org/10.1007/s42979-024-02703-w
