Abstract
We propose a practical defect prediction approach for companies that do not track defect-related data. Specifically, we investigate the applicability of cross-company (CC) data for building localized defect predictors using static code features. First, we analyze the conditions under which CC data can be used as is. These conditions turn out to be quite rare. We then apply principles of analogy-based learning (i.e. nearest neighbor (NN) filtering) to CC data in order to fine-tune these models for localization. We compare the performance of these models with that of defect predictors learned from within-company (WC) data. As expected, we observe that defect predictors learned from WC data outperform those learned from CC data. However, our analyses also yield defect predictors learned from NN-filtered CC data whose performance is close to, but still not better than, that of WC predictors. Therefore, we perform a final analysis to determine the minimum number of local defect reports needed to learn WC defect predictors. We demonstrate in this paper that the minimum number of data samples required to build effective defect predictors can be quite small and can be collected within a few months. Hence, for companies with no local defect data, we recommend a two-phase approach that allows them to employ the defect prediction process immediately. In phase one, companies should use NN-filtered CC data to initiate the defect prediction process and simultaneously start collecting WC (local) data. Once enough WC data is collected (i.e. after a few months), organizations should switch to phase two and use predictors learned from WC data.
Notes
Throughout the paper, the following notation is used: a defect predictor (or simply predictor) is a binary classification method that categorizes software modules as either defective or defect-free; data refers to M×N matrices of raw measurements of N metrics from M software modules; these N metrics are referred to as features.
Therefore, throughout the paper, the term “data” refers to static code features.
We should carefully note that we do not make use of any conceptual similarities, since our analysis is based on static code features. As to the issue of conceptual connections within the code, we refer the reader to the concept location and cohesion work of Marcus et al. (2008).
E.g. given an attribute’s minimum and maximum values, replace a particular value n with (n − min)/((max − min)/10). For more on discretization, see Dougherty et al. (1995).
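As a minimal sketch of this equal-width, 10-bin discretization (the function name and the clamping of the maximum value into the last bin are our choices, not the paper's):

```python
def discretize(n, min_val, max_val, bins=10):
    """Equal-width discretization: replace a continuous value n with its
    bin index, computed as (n - min) / ((max - min) / bins)."""
    if max_val == min_val:  # degenerate attribute: everything in one bin
        return 0
    index = int((n - min_val) / ((max_val - min_val) / bins))
    return min(index, bins - 1)  # clamp n == max into the last bin
```

For example, with min = 0 and max = 10, the value 5 falls into bin 5, while the value 10 is clamped into bin 9.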
In other languages, modules may be called “function” or “method”.
SOFTLAB data were not available at that time.
Details of this issue are beyond the scope of this paper. For more, please see Table 1 in (Domingos and Pazzani 1997).
Caveat: we did not optimize the value of k for each project; we simply used a constant k = 10. Dynamically setting the value of k for a given project is left as future work (Baker 2007).
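A sketch of such a nearest-neighbor filter, under the assumption of plain Euclidean distance over the feature vectors (function and variable names are ours):

```python
import math

def nn_filter(cc_rows, wc_rows, k=10):
    """For each local (WC) instance, select its k nearest cross-company (CC)
    instances by Euclidean distance over the static code features; the union
    of all selections forms the filtered CC training set."""
    selected = set()
    for wc in wc_rows:
        nearest = sorted(range(len(cc_rows)),
                         key=lambda i: math.dist(wc, cc_rows[i]))[:k]
        selected.update(nearest)
    return [cc_rows[i] for i in sorted(selected)]
```

Because the selections are unioned, the filtered training set is at most k times the number of local instances, and typically smaller when neighborhoods overlap.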
To reflect use in practice, we do not use the remaining 90% of the same project for training; rather, we use a random 90% of the data from other projects. Please note that all WC analyses in this paper are within-company, not within-project, simulations. Since the SOFTLAB data are collected from a single company, learning a predictor on some projects and testing it on a different one does not violate the within-company simulation.
TR(a,b,c) is a triangular distribution with min/mode/max of a,b,c.
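For illustration (this is not the paper's simulation code), such a distribution can be sampled with Python's standard library; note that `random.triangular` takes its arguments in (low, high, mode) order:

```python
import random

random.seed(42)
# Sample TR(2, 4, 8): min = 2, mode = 4, max = 8.
samples = [random.triangular(2, 8, 4) for _ in range(10_000)]
# Every draw lies in [min, max]; the sample mean approaches
# (min + mode + max) / 3 = 14/3 ≈ 4.67.
```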
Please note that we can only compare the defect detection properties of automatic vs. manual methods. Unlike automatic defect prediction via data mining, the above manual inspection methods do not just report “true/false” for a module; rather, they also provide specific debugging information. Hence, a complete comparison of automatic vs. manual defect prediction would have to include both an analysis of the time to detect potential defects and the time required to fix them. Manual methods might score higher than automatic methods since they can offer more clues to the developer about what is wrong with a module. However, such an analysis is beyond the scope of this paper. Here, we focus only on the relative merits of different methods for predicting error-prone modules.
References
Arisholm E, Briand L (2006a) Predicting fault-prone components in a java legacy system. In: ISESE ’06: Proceedings of the 2006 ACM/IEEE international symposium on international symposium on empirical software engineering, September 2006. http://portal.acm.org/citation.cfm?id=1159733.1159738
Arisholm E, Briand L (2006b) Predicting fault-prone components in a java legacy system. In: 5th ACM-IEEE international symposium on empirical software engineering (ISESE), Rio de Janeiro, Brazil, September 21–22. http://simula.no/research/engineering/publications/Arisholm.2006.4
Baker D (2007) A hybrid approach to expert and model-based effort estimation. Master’s thesis, Lane Department of Computer Science and Electrical Engineering, West Virginia University. https://eidr.wvu.edu/etd/documentdata.eTD?documentid=5443
Basili V, McGarry F, Pajerski R, Zelkowitz M (2002) Lessons learned from 25 years of process improvement: the rise and fall of the NASA software engineering laboratory. In: Proceedings of the 24th international conference on software engineering (ICSE) 2002, Orlando, Florida. http://www.cs.umd.edu/projects/SoftEng/ESEG/papers/83.88.pdf
Bell R, Ostrand T, Weyuker E (2006) Looking for bugs in all the right places. In: ISSTA ’06: Proceedings of the 2006 international symposium on software testing and analysis, July 2006. http://portal.acm.org/citation.cfm?id=1146238.1146246
Blake C, Merz C (1998) UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html
Boehm B, Papaccio P (1988) Understanding and controlling software costs. IEEE Trans Softw Eng 14(10):1462–1477, October 1988
Boehm B (2000) Safe and simple software cost analysis. IEEE Software, pp 14–17, September/October 2000. http://www.computer.org/certification/beta/Boehm_Safe.pdf
Boetticher G, Menzies T, Ostrand T (2007) The PROMISE repository of empirical software engineering data. http://promisedata.org/repository
Brooks FP (1995) The mythical man-month, Anniversary edn. Addison-Wesley, Reading
Chapman M, Solomon D (2002) The relationship of cyclomatic complexity, essential complexity and error rates. In: Proceedings of the NASA software assurance symposium, Coolfont Resort and Conference Center in Berkley Springs, West Virginia. http://www.ivv.nasa.gov/business/research/osmasas/conclusion2002/Mike_Chapman_The_Relationship_of_Cyclomatic_Complexity_Essential_Complexity_and_Error_Rates.ppt
Chen Z, Menzies T, Port D (2005) Feature subset selection can improve software cost estimation. In: PROMISE’05. http://menzies.us/pdf/05/fsscocomo.pdf
Dekhtyar A, Hayes JH, Menzies T (2004) Text is software too. In: International workshop on mining software repositories. http://menzies.us/pdf/04msrtext.pdf
Demsar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30. http://jmlr.csail.mit.edu/papers/v7/demsar06a.html
Domingos P, Pazzani MJ (1997) On the optimality of the simple bayesian classifier under zero-one loss. Mach Learn 29(2–3):103–130. http://citeseer.ist.psu.edu/domingos97optimality.html
Dougherty J, Kohavi R, Sahami M (1995) Supervised and unsupervised discretization of continuous features. In: International conference on machine learning, pp 194–202. http://www.cs.pdx.edu/~timm/dm/dougherty95supervised.pdf
Duda R, Hart P, Nilsson N (1976) Subjective bayesian methods for rule-based inference systems. In: Technical Report 124, Artificial Intelligence Center, SRI International
Fagan M (1976) Design and code inspections to reduce errors in program development. IBM Syst J 15(3):182–211
Fagan M (1986) Advances in software inspections. IEEE Trans Softw Eng 12:744–751, July 1986
Fenton NE, Pfleeger S (1995) Software metrics: a rigorous & practical approach, 2nd edn. International Thompson, London
Goseva K, Hamill M (2007) Architecture-based software reliability: why only a few parameters matter? In: 31st annual IEEE international computer software and applications conference (COMPSAC 2007), Beijing, July 2007
Graves TL, Karr AF, Marron JS, Siy HP (2000) Predicting fault incidence using software change history. IEEE Trans Softw Eng 26(7):653–661. www.niss.org/technicalreports/tr80.pdf
Hall G, Munson J (2000) Software evolution: code delta and code churn. J Syst Softw 54(2):111–118
Halstead M (1977) Elements of software science. Elsevier, Amsterdam
Hayes JH, Dekhtyar A, Sundaram SK (2006) Advancing candidate link generation for requirements tracing: the study of methods. IEEE Trans Softw Eng 32(1):4–19. http://doi.ieeecomputersociety.org/10.1109/TSE.2006.3
Jiang Y, Cukic B, Menzies T (2007) Fault prediction using early lifecycle data. In: ISSRE’07. http://menzies.us/pdf/07issre.pdf
John G, Langley P (1995) Estimating continuous distributions in bayesian classifiers. In: Proceedings of the eleventh conference on uncertainty in artificial intelligence. Morgan Kaufmann, Montreal, pp 338–345. http://citeseer.ist.psu.edu/john95estimating.html
Khoshgoftaar TM, Seliya N (2003) Analogy-based practical classification rules for software quality estimation. Empirical Softw Eng 8(4):325–350
Khoshgoftaar T, Seliya N (2004) Comparative assessment of software quality classification techniques: an empirical case study. Empirical Softw Eng 9(3):229–257
Kitchenham BA, Mendes E, Travassos GH (2007) Cross- vs. within-company cost estimation studies: a systematic review. IEEE Trans Softw Eng 33:316–329, May 2007
Koru AG, Emam KE, Zhang D, Liu H, Mathew D (2008) Theory of relative defect proneness. Empirical Softw Eng 13(5):473–498
Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: a proposed framework and novel findings. IEEE Trans Softw Eng 34(4):485–496
Mann HB, Whitney DR (1947) On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat 18(1):50–60. http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aoms/1177730491
Marcus A, Poshyvanyk D, Ferenc R (2008) Using the conceptual cohesion of classes for fault prediction in object-oriented systems. IEEE Trans Softw Eng 34(2):287–300, March–April 2008
McCabe T (1976) A complexity measure. IEEE Trans Softw Eng 2(4):308–320, December 1976
Menzies T, Raffo D, Setamanit S, Hu Y, Tootoonian S (2002) Model-based tests of truisms. In: Proceedings of IEEE ASE 2002. http://menzies.us/pdf/02truisms.pdf
Menzies T, DiStefano J, Orrego A, Chapman R (2004) Assessing predictors of software defects. In: Proceedings, workshop on predictive software models, Chicago. http://menzies.us/pdf/04psm.pdf
Menzies T, Dekhtyar A, Distefano J, Greenwald J (2007a) Problems with precision. IEEE Trans Softw Eng 33:637–640. http://menzies.us/pdf/07precision.pdf
Menzies T, Greenwald J, Frank A (2007b) Data mining static code attributes to learn defect predictors. IEEE Trans Softw Eng 33:2–13 http://menzies.us/pdf/06learnPredict.pdf
Menzies T, Turhan B, Bener A, Gay G, Cukic B, Jiang Y (2008) Implications of ceiling effects in defect predictors. In: Proceedings of PROMISE 2008 workshop (ICSE). http://menzies.us/pdf/08ceiling.pdf
Moser R, Pedrycz W, Succi G (2008) A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. In: ICSE ’08: Proceedings of the 30th international conference on software engineering, May 2008. http://portal.acm.org/citation.cfm?id=1368088.1368114
Musa J, Iannino A, Okumoto K (1987) Software reliability: measurement, prediction, application. McGraw Hill, New York
Nagappan N, Ball T (2005a) Static analysis tools as early indicators of pre-release defect density. In: ICSE 2005, St. Louis
Nagappan N, Ball T (2005b) Static analysis tools as early indicators of pre-release defect density. In: ICSE, pp 580–586. http://doi.acm.org/10.1145/1062558
Nikora A, Munson J (2003) Developing fault predictors for evolving software systems. In: Ninth international software metrics symposium (METRICS’03)
Orrego A (2004) Sawtooth: learning from huge amounts of data. Master’s thesis, Computer Science, West Virginia University
Ostrand T, Weyuker E, Bell R (2007) Automating algorithms for the identification of fault-prone files. In: ISSTA ’07: Proceedings of the 2007 international symposium on software testing and analysis, July 2007. http://portal.acm.org/citation.cfm?id=1273463.1273493
Polyspace (2008) Polyspace verifier®. http://www.di.ens.fr/~cousot/projects/DAEDALUS/synthetic_summary/POLYSPACE/polyspace-daedalus.htm
Porter A, Selby R (1990) Empirically guided software development using metric-based classification trees. IEEE Softw 7:46–54, March
Quinlan R (1992a) C4.5: programs for machine learning. Morgan Kaufmann, San Francisco. ISBN 1558602380
Quinlan JR (1992b) Learning with continuous classes. In: 5th Australian joint conference on artificial intelligence, pp 343–348. http://citeseer.nj.nec.com/quinlan92learning.html
Rakitin S (2001) Software verification and validation for practitioners and managers, 2nd edn. Artech House, Cormano
Shepperd M, Ince D (1994) A critique of three metrics. J Syst Softw 26(3):197–210, September 1994
Shepperd M, Schofield C (1997) Estimating software project effort using analogies. IEEE Trans Softw Eng 23(12), November 1997. http://www.utdallas.edu/~rbanker/SE_XII.pdf
Shull F, Basili V, Boehm B, Brown A, Costa P, Lindvall M, Port D, Rus I, Tesoriero R, Zelkowitz M (2002) What we have learned about fighting defects. In: Proceedings of 8th international software metrics symposium, Ottawa, Canada, pp 249–258. http://fc-md.umd.edu/fcmd/Papers/shull_defects.ps
Shull F, Rus I, Basili V (2000) How perspective-based reading can improve requirements inspections. IEEE Comput 33(7):73–79. http://www.cs.umd.edu/projects/SoftEng/ESEG/papers/82.77.pdf
Srinivasan K, Fisher D (1995) Machine learning approaches to estimating software development effort. IEEE Trans. Softw Eng 21(2):126–137, February 1995
Tang W, Khoshgoftaar TM (2004) Noise identification with the k-means algorithm. In: ICTAI 2004, pp 373–378. http://doi.ieeecomputersociety.org/10.1109/ICTAI.2004.93
Tian J, Zelkowitz M (1995) Complexity measure evaluation and selection. IEEE Trans Softw Eng 21(8):641–649, August 1995
Witten IH, Frank E (2005) Data mining, 2nd edn. Morgan Kaufmann, Los Altos
Yang Y, Webb G (2003) Weighted proportional k-interval discretization for naive-bayes classifiers. In: Proceedings of the 7th Pacific-Asia conference on knowledge discovery and data mining (PAKDD 2003). http://www.csse.monash.edu/~webb/Files/YangWebb03.pdf
Editor: James Miller
Turhan, B., Menzies, T., Bener, A.B. et al. On the relative value of cross-company and within-company data for defect prediction. Empir Software Eng 14, 540–578 (2009). https://doi.org/10.1007/s10664-008-9103-7