
Cross-version defect prediction: use historical data, cross-project data, or both?

Published in: Empirical Software Engineering

Abstract

Context

Even after a long-running project has gone through many releases, removing defects from the product remains a challenge. Cross-version defect prediction (CVDP) treats project data from prior releases as a useful source for predicting fault-prone modules with defect prediction techniques. Recent studies have explored cross-project defect prediction (CPDP), which uses project data from outside a project for defect prediction. While CPDP techniques and CPDP data can be repurposed for CVDP, their effectiveness in that setting has not been investigated.

Objective

To investigate whether CPDP approaches and CPDP data are useful for CVDP. The investigation also compared different usages of prior-release data.

Method

We replicated the design of a previous comparative study on CPDP approaches, applying it to the CVDP setting.

Results

Some CPDP approaches improved the performance of CVDP. Using data from the latest prior release was the best choice. When no prior-release data were available, using CPDP data for CVDP proved effective.

Conclusions

1) Some CPDP approaches could improve CVDP; 2) if one can access project data from the latest release, project data from older releases brings no clear additional benefit; and 3) even if one has no CVDP data, appropriate CPDP approaches can deliver quality predictions with CPDP data.
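The CVDP setup the abstract describes can be illustrated with a minimal sketch: train a classifier on module metrics labeled with defect data from the latest prior release, then predict fault-proneness for the modules of the current release. The metric values, module data, and the 1-nearest-neighbor classifier below are hypothetical illustrations, not the study's actual approaches or datasets.

```python
def predict_defect(train, test_features):
    """Label each current-release module with the defect label of its
    nearest prior-release module (1-nearest-neighbor on metric vectors)."""
    def dist(a, b):
        # Squared Euclidean distance between two metric vectors
        return sum((x - y) ** 2 for x, y in zip(a, b))
    preds = []
    for feats in test_features:
        nearest = min(train, key=lambda t: dist(t[0], feats))
        preds.append(nearest[1])
    return preds

# (LOC, cyclomatic complexity) per module, with defect labels (1 = defective),
# taken from the latest prior release N-1 (hypothetical numbers)
prior_release = [
    ((120, 4), 0),
    ((950, 31), 1),
    ((300, 9), 0),
    ((870, 27), 1),
]

# Modules of the current release N, whose labels are unknown
current_release = [(110, 5), (900, 30)]

print(predict_defect(prior_release, current_release))  # -> [0, 1]
```

In a CPDP variant of the same sketch, `prior_release` would instead contain labeled modules from other projects; the study compares how such substitutions affect prediction quality.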





Acknowledgments

This work was partially supported by JSPS KAKENHI under Grant No. 18K11246.

Author information


Correspondence to Sousuke Amasaki.

Additional information

Communicated by: Shane McIntosh, Leandro L. Minku, Ayşe Tosun, and Burak Turhan


This article belongs to the Topical Collection: Predictive Models and Data Analytics in Software Engineering (PROMISE)


About this article


Cite this article

Amasaki, S. Cross-version defect prediction: use historical data, cross-project data, or both? Empir Software Eng 25, 1573–1595 (2020). https://doi.org/10.1007/s10664-019-09777-8
