
The application of ROC analysis in threshold identification, data imbalance and metrics selection for software fault prediction

  • Original Paper
  • Published in: Innovations in Systems and Software Engineering

Abstract

Software engineers have limited resources and need metrics-analysis tools to investigate software quality, such as the fault-proneness of modules. A large number of software metrics are available for this purpose, but not all of them are strongly correlated with faults. In addition, software fault data are imbalanced, which affects quality-assessment tools such as fault prediction models and the threshold values used to flag risky modules. This work investigates software quality for three purposes. First, receiver operating characteristic (ROC) analysis is used to identify threshold values that flag risky modules. Second, ROC analysis is evaluated on imbalanced data. Third, ROC analysis is considered for feature selection. The study validates the use of ROC analysis to identify thresholds for four metrics (WMC, CBO, RFC and LCOM). The ROC results after sampling the data are not significantly different from those before sampling. ROC analysis selects the same metrics (WMC, CBO and RFC) in most datasets, whereas other techniques vary widely in the metrics they select.
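To make the three uses of ROC analysis concrete, the sketch below shows how a threshold for a single object-oriented metric can be chosen from an ROC curve (maximising Youden's J = TPR - FPR, one common criterion rather than necessarily the paper's exact procedure), how the area under the curve (AUC) can rank metrics for selection, and how the AUC can be recomputed after SMOTE oversampling to probe the effect of class imbalance. The column names (wmc, cbo, rfc, lcom, bug) and the file name ant-1.7.csv are assumptions following the PROMISE/Jureczko dataset convention; scikit-learn and imbalanced-learn are assumed to be installed.

```python
# Minimal sketch (not the paper's code): ROC-based threshold identification,
# AUC-based metric ranking, and an imbalance check via SMOTE oversampling.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_curve, roc_auc_score
from imblearn.over_sampling import SMOTE   # assumed available

METRICS = ["wmc", "cbo", "rfc", "lcom"]    # CK metrics studied in the paper

def roc_threshold(values, faulty):
    """Return the cut-off maximising Youden's J = TPR - FPR, and the AUC."""
    fpr, tpr, thresholds = roc_curve(faulty, values)
    best = np.argmax(tpr - fpr)
    return thresholds[best], roc_auc_score(faulty, values)

# Hypothetical fault dataset: one row per class, fault counts in a "bug" column.
df = pd.read_csv("ant-1.7.csv")            # assumed file name
faulty = (df["bug"] > 0).astype(int)       # binarise fault counts

# 1) Threshold identification and 2) AUC-based ranking on the original data.
for m in METRICS:
    t, auc = roc_threshold(df[m].to_numpy(), faulty)
    print(f"{m}: threshold={t:.1f}, AUC={auc:.3f}")

# 3) Recompute the AUC after SMOTE oversampling to see whether sampling
#    changes the ROC results (the paper reports no significant difference).
X_res, y_res = SMOTE(random_state=0).fit_resample(df[METRICS], faulty)
X_res = np.asarray(X_res)
for i, m in enumerate(METRICS):
    _, auc_res = roc_threshold(X_res[:, i], y_res)
    print(f"{m}: AUC after SMOTE={auc_res:.3f}")
```

Under this sketch, metrics whose AUC stays high before and after resampling are the ones ROC analysis would retain, mirroring the paper's observation that WMC, CBO and RFC are consistently selected.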


Notes

  1. www.jarchitect.com.

  2. https://kenai.com/nonav/projects/buginfo.

  3. http://www.spinellis.gr/sw/ckjm/.


Author information


Corresponding author

Correspondence to Raed Shatnawi.


About this article


Cite this article

Shatnawi, R. The application of ROC analysis in threshold identification, data imbalance and metrics selection for software fault prediction. Innovations Syst Softw Eng 13, 201–217 (2017). https://doi.org/10.1007/s11334-017-0295-0

