Property-Based Testing in Practice

Authors:
Harrison Goldstein, University of Pennsylvania, Philadelphia, Pennsylvania, USA, https://orcid.org/0000-0001-9631-1169
Joseph W. Cutler, University of Pennsylvania, Philadelphia, Pennsylvania, USA, https://orcid.org/0000-0001-9399-9308
Daniel Dickstein, Jane Street, New York, New York, USA, https://orcid.org/0009-0001-5706-6815
Benjamin C. Pierce, University of Pennsylvania, Philadelphia, Pennsylvania, USA, https://orcid.org/0000-0001-7839-1636
Andrew Head, University of Pennsylvania, Philadelphia, Pennsylvania, USA, https://orcid.org/0000-0002-1523-3347

ICSE '24: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering
May 2024, Article No.: 187, Pages 1–13
https://doi.org/10.1145/3597503.3639581

Published: 12 April 2024

ABSTRACT

Property-based testing (PBT) is a testing methodology where users write executable formal specifications of software components and an automated harness checks these specifications against many automatically generated inputs. From its roots in the QuickCheck library in Haskell, PBT has made significant inroads in mainstream languages and industrial practice at companies such as Amazon, Volvo, and Stripe. As PBT extends its reach, it is important to understand how developers are using it in practice, where they see its strengths and weaknesses, and what innovations are needed to make it more effective.
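To make the definition concrete, the following is a minimal sketch of the three ingredients named above: a property (the executable specification), a random data generator, and a harness that checks the property against many generated inputs. It is hand-rolled in plain Python for illustration and is not taken from the paper or from any PBT library; every name in it (`run_length_encode`, `gen_string`, `prop_round_trip`, `check`) is hypothetical.

```python
import random


def run_length_encode(s):
    """Component under test: encode a string as a list of (char, count) pairs."""
    out = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1] = (ch, out[-1][1] + 1)
        else:
            out.append((ch, 1))
    return out


def run_length_decode(pairs):
    """Inverse of run_length_encode."""
    return "".join(ch * n for ch, n in pairs)


def gen_string(rng, max_len=20):
    """Random data generator: short strings over a two-letter alphabet,
    so that repeated runs (the interesting case) are likely."""
    return "".join(rng.choice("ab") for _ in range(rng.randint(0, max_len)))


def prop_round_trip(s):
    """Property: decoding an encoding gives back the original input."""
    return run_length_decode(run_length_encode(s)) == s


def check(prop, gen, trials=200, seed=0):
    """Harness: run the property on many generated inputs.
    Returns the first counterexample found, or None if every trial passed."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = gen(rng)
        if not prop(x):
            return x
    return None


counterexample = check(prop_round_trip, gen_string)
```

Here `counterexample` is `None` when all trials pass. Real PBT libraries layer richer generators, failure reporting, and test-case reduction on top of this basic loop.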

We address these questions using data from 30 in-depth interviews with experienced users of PBT at Jane Street, a financial technology company making heavy and sophisticated use of PBT. These interviews provide empirical evidence that PBT's main strengths lie in testing complex code and in increasing confidence beyond what is available through conventional testing methodologies, and, moreover, that most uses fall into a relatively small number of high-leverage idioms. Its main weaknesses, on the other hand, lie in the relative complexity of writing properties and random data generators and in the difficulty of evaluating their effectiveness. From these observations, we identify a number of potentially high-impact areas for future exploration, including performance improvements, differential testing, additional high-leverage testing scenarios, better techniques for generating random input data, test-case reduction, and methods for evaluating the effectiveness of tests.
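Two of the areas listed above, differential testing and test-case reduction, can be sketched together. The example below is an illustration under stated assumptions, not the authors' tooling: `reference_unique` is a hypothetical simple reference, `candidate_unique` is a hypothetical "optimized" version with a deliberately seeded bug, and `shrink` is one simple greedy reduction strategy among many.

```python
import random


def reference_unique(xs):
    """Simple, obviously correct reference: sorted distinct values."""
    return sorted(set(xs))


def candidate_unique(xs):
    """Hypothetical candidate with a seeded bug: it drops adjacent
    duplicates BEFORE sorting, so non-adjacent duplicates survive."""
    out = []
    for x in xs:
        if not out or out[-1] != x:
            out.append(x)
    return sorted(out)


def fails(xs):
    """Differential property: the two implementations disagree on xs."""
    return candidate_unique(xs) != reference_unique(xs)


def gen_list(rng, max_len=8):
    """Random data generator: short lists over a small range,
    so that duplicate values are likely."""
    return [rng.randint(0, 3) for _ in range(rng.randint(0, max_len))]


def shrink(xs, still_fails):
    """Test-case reduction: greedily delete single elements for as long
    as the smaller input still triggers the failure."""
    current = list(xs)
    progress = True
    while progress:
        progress = False
        for i in range(len(current)):
            smaller = current[:i] + current[i + 1:]
            if still_fails(smaller):
                current = smaller
                progress = True
                break
    return current


def find_minimal_counterexample(trials=200, seed=0):
    """Search for a disagreement, then reduce it before reporting."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = gen_list(rng)
        if fails(xs):
            return shrink(xs, fails)
    return None


minimal = find_minimal_counterexample()
```

For this particular seeded bug, any reduced counterexample has the shape `[a, b, a]` with `a != b`, which points at the cause far more directly than the original random list would. The reference implementation acts as the specification, so no separate property needs to be written by hand.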


Published in
ICSE '24: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering
May 2024
2942 pages
ISBN: 9798400702174
DOI: 10.1145/3597503
Co-chairs: Ana Paiva, Rui Abreu
Program Co-chairs: Abhik Roychoudhury, Margaret Storey
This work is licensed under a Creative Commons Attribution-NonCommercial International 4.0 License.
Publisher: Association for Computing Machinery, New York, NY, United States
Qualifiers: research-article
Overall Acceptance Rate: 276 of 1,856 submissions, 15%
