NRPredictor: an ensemble learning and feature selection based approach for predicting the non-reproducible bugs

  • Original Article
International Journal of System Assurance Engineering and Management

Abstract

Software maintenance is an essential and significant phase of the software development life cycle. In software projects, issue tracking systems are used to collect, categorise, and track filed issues. Some bug reports cannot be reproduced by software developers and are therefore marked as non-reproducible. Non-reproducible bugs are a major burden on bug repositories because they consume a great deal of developer time and effort. Owing to the unpredictable nature of bug fixing, bug management is frequently a painful undertaking for software engineers, and non-reproducible bugs add to the difficulty of this vexing task. This paper develops an early prediction model for identifying non-reproducible bugs. A novel framework named NRPredictor is proposed, which uses three ensemble learning algorithms and one feature selection algorithm for non-reproducible bug prediction. The prediction performance of the proposed framework is evaluated on three open-source projects that use the Bugzilla bug tracking system: Mozilla Firefox, Eclipse, and NetBeans. The experimental findings show that, when forecasting the fixability of bug reports, NRPredictor surpasses traditional machine learning techniques. For the Mozilla Firefox, Eclipse, and NetBeans projects, NRPredictor achieves F1-scores of up to 88.3%, 87.8%, and 87.4%, respectively. Compared with the best-performing standalone machine learning classifier, this is an improvement of up to 6.1%, 5%, and 2.7% for the NetBeans, Eclipse, and Mozilla Firefox projects, respectively.
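The abstract does not name the three ensemble learners or the feature selection algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It pairs a simple filter-style feature ranking with bagging over one-feature decision stumps and scores the result with the F1 measure. The feature names, the toy data, and all function names are hypothetical.

```python
import random
from collections import Counter

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_features(X, y, k):
    """Keep the k binary features whose frequency differs most between classes."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    def gap(j):
        return abs(sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg))
    ranked = sorted(range(len(X[0])), key=gap, reverse=True)
    return sorted(ranked[:k])

def train_stump(X, y, features):
    """Choose the (feature, polarity) pair with the fewest training errors."""
    best = None
    for j in features:
        for polarity in (0, 1):
            errors = sum(1 for row, label in zip(X, y) if (row[j] ^ polarity) != label)
            if best is None or errors < best[0]:
                best = (errors, j, polarity)
    return best[1], best[2]

def train_bagging(X, y, features, n_estimators=15, seed=0):
    """Train one stump per bootstrap sample (sampling rows with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(train_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return stumps

def predict_bagging(stumps, row):
    """Majority vote over the stumps; an odd ensemble size avoids ties."""
    votes = Counter(row[j] ^ polarity for j, polarity in stumps)
    return votes.most_common(1)[0][0]

# Hypothetical binary features extracted from bug reports (made-up toy data).
FEATURES = ["has_steps_to_reproduce", "has_stack_trace",
            "mentions_intermittent", "has_attachment"]
X = [[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 0, 0], [1, 1, 1, 1],
     [0, 0, 1, 0], [0, 1, 1, 0], [0, 0, 1, 1], [1, 0, 1, 0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = non-reproducible (assumed labelling)

selected = select_features(X, y, k=2)
stumps = train_bagging(X, y, selected)
preds = [predict_bagging(stumps, row) for row in X]
print("selected:", [FEATURES[j] for j in selected])
print("training F1:", round(f1_score(y, preds), 3))
```

The score printed here is measured on the same eight toy rows used for training, so it says nothing about the F1 values reported above; a real evaluation would use held-out bug reports from each project.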

Notes

  1. https://netbeans.org/bugzilla/.

  2. https://bugs.eclipse.org/bugs/.

  3. https://bugzilla.mozilla.org/.

  4. https://bugs.eclipse.org/bugs/show_bug.cgi?id=13747.

  5. https://www.bugzilla.org/docs/2.18/html/lifecycle.html.

  6. https://en.wikipedia.org/wiki/NetBeans.

  7. https://en.wikipedia.org/wiki/Eclipse_(software).

  8. https://en.wikipedia.org/wiki/Firefox.

Funding

Neither this study nor any of its authors received funding, including after the completion of the study.

Author information

Corresponding author

Correspondence to Kulbhushan Bansal.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article. The authors did not receive support from any organization for the submitted work.

Human or animal rights

This study did not involve any human participants or animals.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bansal, K., Singh, G., Sunesh Malik et al. NRPredictor: an ensemble learning and feature selection based approach for predicting the non-reproducible bugs. Int J Syst Assur Eng Manag 14, 989–1009 (2023). https://doi.org/10.1007/s13198-023-01902-7

  • DOI: https://doi.org/10.1007/s13198-023-01902-7
