DOI: 10.1145/3597503.3639581
research-article
Open Access

Property-Based Testing in Practice

Published: 12 April 2024

ABSTRACT

Property-based testing (PBT) is a testing methodology where users write executable formal specifications of software components and an automated harness checks these specifications against many automatically generated inputs. From its roots in the QuickCheck library in Haskell, PBT has made significant inroads in mainstream languages and industrial practice at companies such as Amazon, Volvo, and Stripe. As PBT extends its reach, it is important to understand how developers are using it in practice, where they see its strengths and weaknesses, and what innovations are needed to make it more effective.
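The definition above can be made concrete with a minimal sketch of the core PBT loop, written from scratch in Python with only the standard library. It is not the API of QuickCheck, Hypothesis, or Jane Street's Quickcheck. The names `gen_int_list`, `check_property`, and the two example properties are hypothetical. The property plays the role of the executable specification, the generator produces random inputs, and the harness checks the property against many of them. It uses a fixed seed so that any failure can be replayed.

```python
import random
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")

def gen_int_list(rng: random.Random, max_len: int = 20) -> List[int]:
    """Hypothetical generator: a random-length list of small integers."""
    length = rng.randint(0, max_len)
    return [rng.randint(-100, 100) for _ in range(length)]

def check_property(
    prop: Callable[[T], bool],
    gen: Callable[[random.Random], T],
    trials: int = 1000,
    seed: int = 0,
) -> Optional[T]:
    """Run `prop` on `trials` generated inputs (hypothetical harness).

    Returns the first input that falsifies the property, or None if every
    trial passed. Passing is evidence of correctness, not a proof.
    """
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for _ in range(trials):
        x = gen(rng)
        if not prop(x):
            return x
    return None

# An executable specification: reversing a list twice gives it back unchanged.
def prop_reverse_involutive(xs: List[int]) -> bool:
    return list(reversed(list(reversed(xs)))) == xs

# A deliberately false "specification", to show a counterexample being found.
def prop_reverse_is_identity(xs: List[int]) -> bool:
    return list(reversed(xs)) == xs
```

With this sketch, `check_property(prop_reverse_involutive, gen_int_list)` returns `None`. The deliberately wrong `prop_reverse_is_identity` yields a non-palindromic list as its counterexample. Real PBT libraries layer shrinking, size control, and reusable generator combinators on top of this basic loop.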

We address these questions using data from 30 in-depth interviews with experienced users of PBT at Jane Street, a financial technology company making heavy and sophisticated use of PBT. These interviews provide empirical evidence that PBT's main strengths lie in testing complex code and in increasing confidence beyond what is available through conventional testing methodologies, and, moreover, that most uses fall into a relatively small number of high-leverage idioms. Its main weaknesses, on the other hand, lie in the relative complexity of writing properties and random data generators and in the difficulty of evaluating their effectiveness. From these observations, we identify a number of potentially high-impact areas for future exploration, including performance improvements, differential testing, additional high-leverage testing scenarios, better techniques for generating random input data, test-case reduction, and methods for evaluating the effectiveness of tests.
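Two of the directions named above, differential testing and test-case reduction, can be illustrated together. The sketch below is again plain Python with hypothetical names (`buggy_max`, `prop_agrees`, `shrink`, `find_and_shrink`). It checks an implementation against a trusted reference, here the built-in `max`, on random inputs. It then greedily reduces any failing input, first by deleting elements and then by moving individual integers toward zero. The bug is planted purely for illustration: `buggy_max` starts its accumulator at 0.

```python
import random
from typing import Callable, Iterator, List, Optional

def buggy_max(xs: List[int]) -> int:
    """Hypothetical implementation under test.

    Planted bug: the accumulator starts at 0, so a list whose elements are
    all negative returns 0 instead of its true maximum.
    """
    best = 0
    for x in xs:
        if x > best:
            best = x
    return best

def prop_agrees(xs: List[int]) -> bool:
    """Differential property: the implementation must match the oracle."""
    if not xs:  # max() is undefined on empty lists; treat as vacuously true
        return True
    return buggy_max(xs) == max(xs)  # built-in max is the trusted reference

def gen_int_list(rng: random.Random, max_len: int = 20) -> List[int]:
    length = rng.randint(0, max_len)
    return [rng.randint(-100, 100) for _ in range(length)]

def shrink_candidates(xs: List[int]) -> Iterator[List[int]]:
    """Yield strictly smaller variants: drop one element, or move one toward 0."""
    for i in range(len(xs)):
        yield xs[:i] + xs[i + 1:]
    for i, x in enumerate(xs):
        if x == 0:
            continue
        step = 1 if x > 0 else -1
        for smaller in (0, int(x / 2), x - step):  # all closer to 0 than x
            yield xs[:i] + [smaller] + xs[i + 1:]

def shrink(xs: List[int], prop: Callable[[List[int]], bool]) -> List[int]:
    """Greedy reduction: keep taking the first candidate that still fails."""
    current = xs
    improved = True
    while improved:
        improved = False
        for cand in shrink_candidates(current):
            if not prop(cand):
                current = cand
                improved = True
                break
    return current

def find_and_shrink(
    prop: Callable[[List[int]], bool], trials: int = 1000, seed: int = 0
) -> Optional[List[int]]:
    """Search for a failing input, then reduce it; None if nothing failed."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = gen_int_list(rng)
        if not prop(xs):
            return shrink(xs, prop)
    return None
```

Every non-empty, all-negative list fails, and every accepted reduction makes the input strictly smaller. As a result, once a failure is found, the greedy loop ends at `[-1]`, which points directly at the bad initial value. Production reducers, such as the one in Hypothesis, are considerably more sophisticated. This greedy version only shows the idea.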


Published in

    ICSE '24: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering
    May 2024
    2942 pages
    ISBN: 9798400702174
    DOI: 10.1145/3597503

    This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

    Publisher: Association for Computing Machinery, New York, NY, United States


    Acceptance Rates

    Overall Acceptance Rate: 276 of 1,856 submissions, 15%
