ABSTRACT
Background: Bug datasets have been created and used by many researchers to build bug prediction models.
Aims: In this work we collected existing public bug datasets and unified their contents.
Method: We considered five public datasets that adhered to all of our criteria. We also downloaded the corresponding source code for each system in the datasets and performed source code analysis on it to obtain a common set of source code metrics. In this way we produced a unified bug dataset at class and file level that is suitable for further research (e.g., for building new bug prediction models). Furthermore, we compared the metric definitions and values across the different bug datasets.
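The unification step described above can be sketched as follows. The dataset names, metric aliases, and records below are hypothetical illustrations, not the actual schema or data of the unified dataset:

```python
# Minimal sketch of unifying per-class bug data from two hypothetical
# datasets whose tools report the same metrics under different names.
# The alias table and records are illustrative assumptions.

# Map each dataset's metric names onto one common schema.
ALIASES = {
    "datasetA": {"wmc": "WMC", "loc": "LLOC", "bug_cnt": "bugs"},
    "datasetB": {"WeightedMethods": "WMC", "LinesOfCode": "LLOC", "defects": "bugs"},
}

def unify(dataset_name, record):
    """Rename one record's metric columns to the common schema."""
    mapping = ALIASES[dataset_name]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

rows = [
    unify("datasetA", {"wmc": 12, "loc": 340, "bug_cnt": 2}),
    unify("datasetB", {"WeightedMethods": 7, "LinesOfCode": 150, "defects": 0}),
]
print(rows[0])  # {'WMC': 12, 'LLOC': 340, 'bugs': 2}
```

Once every record is expressed in the common schema, rows from different datasets can be concatenated into one class-level (or file-level) table.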
Results: We found that (i) the same metric abbreviation can have different definitions, or metrics calculated in the same way can have different names; (ii) in some cases different tools give different values even when the metric definitions coincide, because (iii) one tool works on source code while the other calculates metrics on bytecode; and (iv) in several cases the downloaded source code contained additional files, which influenced the metric values significantly.
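Finding (i), that one metric abbreviation can hide different definitions, can be made concrete with LOC: counting all physical lines versus only logical (non-blank, non-comment) lines yields different values for the same code. The counting rules below are common conventions assumed for illustration, not the exact definitions used by any particular tool:

```python
# Two plausible definitions of "LOC" applied to the same Java snippet:
# physical lines vs logical lines (non-blank, non-comment). Tools that
# both report "LOC" may disagree if they implement different rules.
snippet = """\
// A sample class
public class Foo {

    int x; // a field
}
"""

lines = snippet.splitlines()
physical_loc = len(lines)
logical_loc = sum(
    1 for line in lines
    if line.strip() and not line.strip().startswith("//")
)
print(physical_loc, logical_loc)  # prints: 5 3
```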
Conclusions: Despite these imprecisions, we think that having a common metric set can help in building better bug prediction models and in drawing more general conclusions. We made the unified dataset publicly available for everyone. By using a public dataset as input for different bug prediction related investigations, researchers can make their studies reproducible, and thus able to be validated and verified.
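As a hypothetical illustration of how a unified, common metric set enables simple bug prediction experiments, the sketch below fits a one-feature threshold classifier in pure Python; the training data and the threshold rule are invented for demonstration and are not the paper's models:

```python
# Toy bug predictor over a unified metrics table: pick the WMC threshold
# that best separates buggy from clean classes on invented training data.
train = [
    {"WMC": 3, "bugs": 0}, {"WMC": 5, "bugs": 0},
    {"WMC": 14, "bugs": 1}, {"WMC": 20, "bugs": 1},
]

def best_threshold(rows, feature="WMC"):
    """Return the threshold maximizing training accuracy for 'value > t => buggy'."""
    candidates = sorted(r[feature] for r in rows)
    def accuracy(t):
        return sum((r[feature] > t) == bool(r["bugs"]) for r in rows) / len(rows)
    return max(candidates, key=accuracy)

t = best_threshold(train)
print(t, [r["WMC"] > t for r in train])  # prints: 5 [False, False, True, True]
```

In practice one would train richer models (e.g., with a machine learning toolkit) on the full unified table, but the point stands: a shared metric schema makes such experiments directly comparable across datasets.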
Index Terms
- A Public Unified Bug Dataset for Java