Reflections on the NASA MDP data sets
- Author(s): D. Gray ; D. Bowes ; N. Davey ; Y. Sun ; B. Christianson
- DOI: 10.1049/iet-sen.2011.0132
- Affiliation (all authors): Computer Science Department, University of Hertfordshire, UK
- Source: Volume 6, Issue 6, December 2012, p. 549–558
- Print ISSN 1751-8806, Online ISSN 1751-8814
Background: The NASA metrics data program (MDP) data sets have been heavily used in software defect prediction research. Aim: To highlight the data quality issues present in these data sets, and the problems that can arise when they are used in a binary classification context. Method: A thorough exploration of all 13 original NASA data sets, followed by various experiments demonstrating the potential impact of duplicate data points when data mining. Conclusions: Firstly, researchers need to analyse the data that forms the basis of their findings in the context of how it will be used. Secondly, the bulk of defect prediction experiments based on the NASA MDP data sets may have led to erroneous findings, mainly because repeated/duplicate data points can cause substantial amounts of training and testing data to be identical.
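The duplicate-data concern in the abstract can be illustrated with a short sketch. It is not the authors' procedure; it only assumes that each module is a row of metric values followed by a class label, and the function names (`duplicate_fraction`, `overlap_fraction`, `holdout_split`, `deduplicate`) are hypothetical helpers introduced here. It measures how many rows repeat an earlier row, and how many test rows also appear verbatim in the training set after a random split.

```python
import random


def duplicate_fraction(rows):
    """Fraction of rows that exactly repeat an earlier row."""
    seen = set()
    repeats = 0
    for row in rows:
        key = tuple(row)
        if key in seen:
            repeats += 1
        else:
            seen.add(key)
    return repeats / len(rows) if rows else 0.0


def overlap_fraction(train, test):
    """Fraction of test rows that have an identical row in the training set."""
    train_keys = {tuple(r) for r in train}
    hits = sum(1 for r in test if tuple(r) in train_keys)
    return hits / len(test) if test else 0.0


def holdout_split(rows, test_ratio=0.3, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


def deduplicate(rows):
    """Keep the first occurrence of each distinct row, preserving order."""
    return [list(k) for k in dict.fromkeys(tuple(r) for r in rows)]


if __name__ == "__main__":
    # Toy data (invented for illustration): two metrics plus a defect label.
    data = [[1, 2, 0], [1, 2, 0], [3, 4, 1], [5, 6, 0], [3, 4, 1]]
    print("duplicate fraction:", duplicate_fraction(data))
    train, test = holdout_split(data)
    print("test rows also in train:", overlap_fraction(train, test))
    clean_train, clean_test = holdout_split(deduplicate(data))
    print("after de-duplication:", overlap_fraction(clean_train, clean_test))
```

A non-zero overlap means a classifier is partly evaluated on rows it has already seen, which can inflate its apparent performance. Whether identical rows should be removed, and whether the class label belongs in the comparison, depends on how the data will be used.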
Inspec keywords: pattern classification; software reliability; software metrics; data mining
Subjects: Data handling techniques; Software metrics; Knowledge engineering techniques; Software engineering techniques