1600 faults in 100 projects: automatically finding faults while achieving high coverage with EvoSuite

Empirical Software Engineering 20, 611–639 (2015)

Abstract

Automated unit test generation techniques traditionally follow one of two goals: either they try to find violations of automated oracles (e.g., assertions, contracts, undeclared exceptions), or they aim to produce representative test suites (e.g., satisfying branch coverage) such that a developer can manually add test oracles. Search-based testing (SBST) has delivered promising results when it comes to achieving coverage, yet its use in conjunction with automated oracles has hardly been explored and is generally hampered because SBST does not scale well when there are many testing targets. In this paper we present a search-based approach, implemented in the EvoSuite tool, that handles both objectives at the same time. An empirical study applying EvoSuite to 100 randomly selected open source software projects (the SF100 corpus) reveals that SBST has the unique advantage of being well suited to pursue both traditional goals at once: efficiently triggering faults while producing representative test sets for any chosen coverage criterion. In our study, EvoSuite detected twice as many failures in terms of undeclared exceptions as a traditional random testing approach, witnessing thousands of real faults in the 100 open source projects. Two out of every five classes with undeclared exceptions have actual faults, but these are buried among many failures caused by implicit preconditions. This “noise” can be interpreted either as a call for further research on improving automated oracles, or as a reason to make tools like EvoSuite an integral part of software development in order to enforce clean program interfaces.
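To make the oracle concrete, the following is a minimal sketch (not taken from the paper; the class DateParser and its parseYear method are hypothetical) of the kind of JUnit test a search-based tool such as EvoSuite can generate when undeclared exceptions serve as the automated oracle: the generated input violates an implicit precondition, and the resulting runtime exception is either a real fault or noise.

    import static org.junit.Assert.fail;

    import org.junit.Test;

    // Hypothetical class under test: parseYear() implicitly assumes an input
    // with at least four characters, but this precondition is not declared.
    class DateParser {
        int parseYear(String s) {
            return Integer.parseInt(s.substring(0, 4));
        }
    }

    // Sketch of a generated test that uses the undeclared exception as oracle.
    public class DateParser_GeneratedTest {

        @Test
        public void testParseYearWithEmptyString() {
            DateParser parser = new DateParser();
            try {
                parser.parseYear("");
                fail("Expecting exception: StringIndexOutOfBoundsException");
            } catch (StringIndexOutOfBoundsException e) {
                // The undeclared exception points either to a genuine fault in
                // DateParser or to noise from the violated implicit precondition.
            }
        }
    }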

Notes

  1. The other participating tools were t2 and DSC, as well as Randoop as a baseline. Tools were evaluated based on achieved code coverage, mutation score and execution time (linearly combined in a single score).

  2. An alternative would be to resort to data structures that can cope with larger number ranges (e.g., BigDecimal in Java), but this would lead to a significant performance drop (see the sketch after these notes).

  3. Note that we used the 1.01 version of SF100. The original version in (Fraser and Arcuri 2012b) had 8,784 classes, but more classes became available once we fixed some classpath issues (e.g., missing jars) in some of the projects.

  4. http://findbugs.sourceforge.net, accessed July 2013.
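
As a brief illustration of the trade-off mentioned in note 2 (a sketch under assumed names, not code from EvoSuite): branch distances such as |a - b| for an equality branch overflow or lose precision when computed with primitive doubles, whereas BigDecimal copes with arbitrary magnitudes exactly, at the cost of object allocation and much slower arithmetic.

    import java.math.BigDecimal;

    public class BranchDistanceSketch {

        // Distance for the branch "a == b" using primitive doubles: fast, but the
        // subtraction can overflow to infinity or lose precision for large values.
        static double distanceDouble(double a, double b) {
            return Math.abs(a - b);
        }

        // The same distance with BigDecimal: exact for arbitrarily large values,
        // but every operation allocates objects and runs considerably slower.
        static BigDecimal distanceBig(BigDecimal a, BigDecimal b) {
            return a.subtract(b).abs();
        }

        public static void main(String[] args) {
            System.out.println(distanceDouble(Double.MAX_VALUE, -Double.MAX_VALUE)); // Infinity
            System.out.println(distanceBig(
                    BigDecimal.valueOf(Double.MAX_VALUE),
                    BigDecimal.valueOf(-Double.MAX_VALUE)));                         // exact value
        }
    }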

References

  • Arcuri A (2013) It really does matter how you normalize the branch distance in search-based software testing. Softw Test Verif Rel (STVR) 23(2):119–147

  • Arcuri A, Briand L (2012) A hitchhiker’s guide to statistical tests for assessing randomized algorithms in software engineering. Softw Test Verif Rel (STVR). doi:10.1002/stvr.1486

  • Arcuri A, Fraser G (2013) Parameter tuning or default values? An empirical investigation in search-based software engineering. Empir Software Eng (EMSE) 18(3):594–623. doi:10.1007/s10664-013-9249-9

  • Arcuri A, Iqbal MZ, Briand L (2012) Random testing: theoretical results and practical implications. IEEE Trans Softw Eng (TSE) 38(2):258–277

  • Baresi L, Young M (2001) Test oracles. Technical Report CIS-TR-01-02, University of Oregon, Dept. of Computer and Information Science, Eugene, Oregon, USA. http://www.cs.uoregon.edu/~michal/pubs/oracles.html

  • Barr E, Vo T, Le V, Su Z (2013) Automatic detection of floating-point exceptions. In: Proceedings of the international conference on principles of programming languages (POPL’13). ACM

  • Bauersfeld S, Vos T, Lakhotia K, Poulding S, Condori N (2013) Unit testing tool competition. In: International workshop on search-based software testing (SBST)

  • Bhattacharya N, Sakti A, Antoniol G, Guéhéneuc YG, Pesant G (2011) Divide-by-zero exception raising via branch coverage. In: Proceedings of the third international conference on search based software engineering, SSBSE’11. Springer, Berlin, Heidelberg, pp 204–218

  • Clarke LA (1976) A system to generate test data and symbolically execute programs. IEEE Trans Softw Eng (TSE) 2(3):215–222

  • Cowles M, Davis C (1982) On the origins of the .05 level of statistical significance. Am Psychol 37(5):553–558

  • Csallner C, Smaragdakis Y (2004) JCrasher: an automatic robustness tester for Java. Softw Pract Exper 34:1025–1050. doi:10.1002/spe.602

  • Del Grosso C, Antoniol G, Merlo E, Galinier P (2008) Detecting buffer overflow via automatic test input data generation. Comput Oper Res 35(10):3125–3143

  • Duran JW, Ntafos SC (1984) An evaluation of random testing. IEEE Trans Softw Eng (TSE) 10(4):438–444

  • Feller W (1968) An introduction to probability theory and its applications, vol 1, 3 edn. Wiley

  • Fraser G, Arcuri A (2011a) EvoSuite: automatic test suite generation for object-oriented software. In: ACM symposium on the foundations of software engineering (FSE), pp 416–419

  • Fraser G, Arcuri A (2011b) It is not the length that matters, it is how you control it. In: IEEE International conference on software testing, verification and validation (ICST), pp 150–159

  • Fraser G, Arcuri A (2012a) The seed is strong: seeding strategies in search-based software testing. In: IEEE International conference on software testing, verification and validation (ICST), pp 121–130

  • Fraser G, Arcuri A (2012b) Sound empirical evidence in software testing. In: ACM/IEEE International conference on software engineering (ICSE), pp 178–188

  • Fraser G, Arcuri A (2013a) EvoSuite: on the challenges of test case generation in the real world (tool paper). In: IEEE International conference on software testing, verification and validation (ICST)

  • Fraser G, Arcuri A (2013b) Whole test suite generation. IEEE Trans Softw Eng 39(2):276–291

  • Fraser G, Arcuri A, McMinn P (2013) Test suite generation with memetic algorithms. In: Genetic and evolutionary computation conference (GECCO)

  • Godefroid P, Klarlund N, Sen K (2005) Dart: directed automated random testing. In: ACM conference on programming language design and implementation (PLDI), pp 213–223

  • Godefroid P, Levin MY, Molnar DA (2008) Active property checking. In: Proceedings of the 8th ACM international conference on Embedded software, EMSOFT ’08. ACM, New York, pp 207–216

  • Gross F, Fraser G, Zeller A (2012) Search-based system testing: high coverage, no false alarms. In: ACM Int. symposium on software testing and analysis (ISSTA)

  • Korel B, Al-Yami AM (1996) Assertion-oriented automated test data generation. In: Proceedings of the 18th international conference on software engineering, ICSE ’96. IEEE Computer Society, Washington, pp 71–80

  • Lakhotia K, Harman M, Gross H (2010a) AUSTIN: a tool for search based software testing for the C language and its evaluation on deployed automotive systems. In: International symposium on search based software engineering (SSBSE), pp 101–110

  • Lakhotia K, McMinn P, Harman M (2010b) An empirical investigation into branch coverage for C programs using CUTE and AUSTIN. J Syst Softw 83(12):2379–2391

  • Malburg J, Fraser G (2011) Combining search-based and constraint-based testing. In: IEEE/ACM int. conference on automated software engineering (ASE)

  • McMinn P (2004) Search-based software test data generation: a survey. Softw Test Verif Rel (STVR) 14(2):105–156

  • McMinn P (2007) IGUANA: input generation using automated novel algorithms. A plug and play research tool. Tech. rep., The University of Sheffield

  • McMinn P (2009) Search-based failure discovery using testability transformations to generate pseudo-oracles. In: Proceedings of the 11th annual conference on genetic and evolutionary computation, genetic and evolutionary computation conference (GECCO). ACM, New York, pp 1689–1696

  • Meyer B, Ciupa I, Leitner A, Liu LL (2007) Automatic testing of object-oriented software. In: Proceedings of the 33rd conference on current trends in theory and practice of computer science, SOFSEM ’07. Springer, Berlin, Heidelberg, pp 114–129

  • Orso A, Xie T (2008) Bert: behavioral regression testing. In: Proceedings of the 2008 international workshop on dynamic analysis: held in conjunction with the ACM SIGSOFT International symposium on software testing and analysis (ISSTA 2008), WODA ’08. ACM, New York, pp 36–42. doi:10.1145/1401827.1401835

  • Pacheco C, Ernst MD (2005) Eclat: automatic generation and classification of test inputs. In: ECOOP 2005—object-oriented programming, 19th European conference, pp 504–527

  • Pacheco C, Lahiri SK, Ernst MD, Ball T (2007) Feedback-directed random test generation. In: ACM/IEEE International conference on software engineering (ICSE), pp 75–84

  • Pandita R, Xie T, Tillmann N, de Halleux J (2010) Guided test generation for coverage criteria. In: IEEE International conference on software maintenance (ICSM), pp 1–10

  • R Development Core Team (2008) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. http://www.R-project.org. ISBN 3-900051-07-0

  • Romano D, Di Penta M, Antoniol G (2011) An approach for search based testing of null pointer exceptions. In: Proceedings of the 2011 fourth IEEE international conference on software testing, verification and validation, ICST ’11. IEEE Computer Society, Washington, pp 160–169

  • Sen K, Marinov D, Agha G (2005) CUTE: a concolic unit testing engine for C. In: ESEC/FSE-13: proc. of the 10th European software engineering conf. held jointly with 13th ACM SIGSOFT int. symposium on foundations of software engineering. ACM, pp 263–272

  • Tillmann N, de Halleux J (2008) Pex–white box test generation for .NET. In: International conference on Tests and Proofs (TAP), pp 134–153

  • Tracey N, Clark J, Mander K, McDermid J (2000) Automated test-data generation for exception conditions. Softw Pract Exper 30(1):61–79

  • Visser W, Pasareanu CS, Khurshid S (2004) Test input generation with Java PathFinder. ACM SIGSOFT Softw Eng Notes 29(4):97–107

  • Williams N, Marre B, Mouy P, Roger M (2005) PathCrawler: automatic generation of path tests by combining static and dynamic analysis. In: EDCC’05: proceedings of the 5th European dependable computing conference. LNCS, vol 3463. Springer, pp 281–292

  • Xiao X, Xie T, Tillmann N, de Halleux J (2011) Precise identification of problems for structural test generation. In: Proceedings of the 33rd international conference on software engineering, ICSE ’11. ACM, New York, pp 611–620

Acknowledgements

This project has been funded by a Google Focused Research Award on “Test Amplification” and the Norwegian Research Council.

Author information

Corresponding author

Correspondence to Gordon Fraser.

Additional information

Communicated by: Gregg Rothermel

About this article

Cite this article

Fraser, G., Arcuri, A. 1600 faults in 100 projects: automatically finding faults while achieving high coverage with EvoSuite. Empir Software Eng 20, 611–639 (2015). https://doi.org/10.1007/s10664-013-9288-2
