ABSTRACT
Property-based testing (PBT) is a testing methodology where users write executable formal specifications of software components and an automated harness checks these specifications against many automatically generated inputs. From its roots in the QuickCheck library in Haskell, PBT has made significant inroads in mainstream languages and industrial practice at companies such as Amazon, Volvo, and Stripe. As PBT extends its reach, it is important to understand how developers are using it in practice, where they see its strengths and weaknesses, and what innovations are needed to make it more effective.
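The description above names three ingredients: an executable specification (the property), a random input generator, and a harness that runs the property on many generated inputs. The following is a minimal, hand-rolled Python sketch of how these fit together. It is only an illustration, not the API of QuickCheck or of any library discussed in the paper. The names `gen_int_list`, `check`, `prop_reverse_involutive`, and `prop_sort_is_identity` are hypothetical.

```python
import random
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def gen_int_list(rng: random.Random, max_len: int = 20) -> List[int]:
    """Hypothetical random generator: a list of small integers of random length."""
    length = rng.randint(0, max_len)
    return [rng.randint(-100, 100) for _ in range(length)]


def check(prop: Callable[[T], bool],
          gen: Callable[[random.Random], T],
          trials: int = 1000,
          seed: int = 0) -> Optional[T]:
    """Hypothetical harness: run `prop` on `trials` generated inputs and
    return the first counterexample found, or None if every trial passed."""
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    for _ in range(trials):
        x = gen(rng)
        if not prop(x):
            return x
    return None


# Executable specification: reversing a list twice yields the original list.
def prop_reverse_involutive(xs: List[int]) -> bool:
    return list(reversed(list(reversed(xs)))) == xs


# A deliberately false specification: "sorting leaves every list unchanged".
def prop_sort_is_identity(xs: List[int]) -> bool:
    return sorted(xs) == xs
```

Calling `check(prop_reverse_involutive, gen_int_list)` should return `None`, while `check(prop_sort_is_identity, gen_int_list)` should quickly return some unsorted list as a counterexample. Production tools add much more on top of this loop, such as size control, richer generator combinators, and shrinking of counterexamples.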
We address these questions using data from 30 in-depth interviews with experienced users of PBT at Jane Street, a financial technology company making heavy and sophisticated use of PBT. These interviews provide empirical evidence that PBT's main strengths lie in testing complex code and in increasing confidence beyond what is available through conventional testing methodologies, and, moreover, that most uses fall into a relatively small number of high-leverage idioms. Its main weaknesses, on the other hand, lie in the relative complexity of writing properties and random data generators and in the difficulty of evaluating their effectiveness. From these observations, we identify a number of potentially high-impact areas for future exploration, including performance improvements, differential testing, additional high-leverage testing scenarios, better techniques for generating random input data, test-case reduction, and methods for evaluating the effectiveness of tests.
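Two of the directions listed above, differential testing and test-case reduction, can be illustrated together in a short sketch. Assume a hypothetical "optimized" function `fast_dedup` that should behave like a simple reference model `reference_dedup`. The differential property states that the two agree on every input. When random search finds a disagreement, a greedy loop deletes elements one at a time as long as the input still fails. All function names and the planted bug are invented for illustration. Real PBT libraries use considerably more elaborate shrinking strategies, and this is not a description of the tooling used at Jane Street.

```python
import random
from typing import Callable, List, Optional

IntList = List[int]


def reference_dedup(xs: IntList) -> IntList:
    """Simple, obviously correct model: keep the first occurrence of each value."""
    seen = set()
    out = []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


def fast_dedup(xs: IntList) -> IntList:
    """Hypothetical 'optimized' implementation with a planted bug:
    it only collapses adjacent duplicates."""
    out: IntList = []
    for x in xs:
        if not out or out[-1] != x:
            out.append(x)
    return out


def disagrees(xs: IntList) -> bool:
    """Differential property, negated: True when the implementations differ."""
    return fast_dedup(xs) != reference_dedup(xs)


def find_counterexample(fails: Callable[[IntList], bool],
                        trials: int = 500, seed: int = 1) -> Optional[IntList]:
    rng = random.Random(seed)
    for _ in range(trials):
        # A small value range makes repeated elements (and thus the bug) likely.
        xs = [rng.randint(0, 3) for _ in range(rng.randint(0, 10))]
        if fails(xs):
            return xs
    return None


def shrink(xs: IntList, fails: Callable[[IntList], bool]) -> IntList:
    """Greedy test-case reduction: repeatedly delete a single element,
    keeping the deletion whenever the smaller input still fails."""
    progress = True
    while progress:
        progress = False
        for i in range(len(xs)):
            candidate = xs[:i] + xs[i + 1:]
            if fails(candidate):
                xs = candidate
                progress = True
                break
    return xs
```

With this bug, any failing list contains some value `a`, then a different value `b`, then `a` again. Deletion therefore always stops at a three-element input of the form `[a, b, a]`. That input is far easier to debug than the original random list returned by `find_counterexample(disagrees)`.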