Abstract
Context
Even when a long-running project has gone through many releases, removing defects from the product remains a challenge. Cross-version defect prediction (CVDP) treats project data from prior releases as a source for predicting fault-prone modules with defect prediction techniques. Recent studies have explored cross-project defect prediction (CPDP), which uses data from outside a project for defect prediction. While CPDP techniques and CPDP data can be applied to CVDP, their effectiveness in that setting has not been investigated.
Objective
To investigate whether CPDP approaches and CPDP data are useful for CVDP. The investigation also compared different ways of using prior release data.
Method
We conducted the investigation as a replication of a previous comparative study on CPDP approaches.
Results
Some CPDP approaches could improve the performance of CVDP. Using the latest prior release was the best choice. When no CVDP data was available, using CPDP data for CVDP was found to be effective.
Conclusions
1) Some CPDP approaches could improve CVDP; 2) if one can access project data from the latest release, project data from older releases would not bring a clear benefit; and 3) even if one has no CVDP data, appropriate CPDP approaches would be able to deliver quality prediction with CPDP data.
Acknowledgments
This work was partially supported by JSPS KAKENHI under Grant No. 18K11246.
Additional information
Communicated by: Shane McIntosh, Leandro L. Minku, Ayşe Tosun, and Burak Turhan
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article belongs to the Topical Collection: Predictive Models and Data Analytics in Software Engineering (PROMISE)
Cite this article
Amasaki, S. Cross-version defect prediction: use historical data, cross-project data, or both? Empir Software Eng 25, 1573–1595 (2020). https://doi.org/10.1007/s10664-019-09777-8
DOI: https://doi.org/10.1007/s10664-019-09777-8