Abstract
Random data generators can be thought of as parsers of streams of randomness. This perspective on generators for random data structures is established folklore in the programming languages community, but it has never been formalized, nor have its consequences been deeply explored.
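The folklore view can be made concrete with a minimal Python sketch. It is an illustration, not code from the paper, and the names `gen_list`, `from_bits`, and `from_rng` are hypothetical. The same function acts as a parser when it is fed a fixed bit sequence and as a random generator when it is fed bits from a pseudo-random source.

```python
import random
from typing import Callable, List

# A source of randomness is modelled as a function returning the next bit.
NextBit = Callable[[], int]

def gen_list(next_bit: NextBit) -> List[int]:
    """Build a list of bits: a 1 means 'add an element', a 0 means 'stop'."""
    out: List[int] = []
    while next_bit() == 1:
        out.append(next_bit())  # the element itself is one more bit
    return out

def from_bits(bits: List[int]) -> NextBit:
    """Deterministic source: replay a fixed sequence (exhausted input reads as 0)."""
    it = iter(bits)
    return lambda: next(it, 0)

def from_rng(rng: random.Random) -> NextBit:
    """Random source: draw fresh bits from a pseudo-random number generator."""
    return lambda: rng.getrandbits(1)

# Used as a parser: the bit string fully determines the result.
parsed = gen_list(from_bits([1, 1, 1, 0, 0]))

# Used as a generator: the same code, driven by random bits.
generated = gen_list(from_rng(random.Random(0)))
```

Here `parsed` is `[1, 0]`: the bits are read as "continue, element 1, continue, element 0, stop". The random run differs only in where the bits come from.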
We build on the idea of freer monads to develop free generators, which unify parsing and generation using a common structure that makes the relationship between the two concepts precise. Free generators lead naturally to a proof that a monadic generator can be factored into a parser plus a distribution over choice sequences. Free generators also support a notion of derivative, analogous to the familiar Brzozowski derivatives of formal languages, allowing analysis tools to "preview" the effect of a particular generator choice. This gives rise to a novel algorithm for generating data structures satisfying user-specified preconditions.
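The factoring and the derivative can be sketched in Python under simplifying assumptions. This is not the paper's definition, which builds on freer monads in a typed functional setting. The classes `Pure` and `Pick`, the example generator `nat`, and the single-character choice labels are all hypothetical. The sketch avoids a monadic bind by threading an accumulator through the example.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple, Union

@dataclass
class Pure:
    """A generator that makes no further choices and yields `value`."""
    value: Any

@dataclass
class Pick:
    """A weighted choice: label -> (weight, thunk producing the next generator)."""
    choices: Dict[str, Tuple[int, Callable[[], "FreeGen"]]]

FreeGen = Union[Pure, Pick]

def nat(acc: int = 0) -> FreeGen:
    """Hypothetical example: unary naturals; 's' is successor, 'z' stops."""
    return Pick({"z": (1, lambda: Pure(acc)), "s": (2, lambda: nat(acc + 1))})

def derivative(g: FreeGen, c: str) -> Optional[FreeGen]:
    """The generator left after committing to choice `c`; None if `c` is invalid."""
    if isinstance(g, Pick) and c in g.choices:
        return g.choices[c][1]()
    return None

def parse(g: FreeGen, s: str) -> Optional[Any]:
    """Parser reading: follow the choices in `s`; succeed only if `s` is used up exactly."""
    for c in s:
        g = derivative(g, c)
        if g is None:
            return None
    return g.value if isinstance(g, Pure) else None

def sample_choices(g: FreeGen, rng: random.Random) -> str:
    """Distribution reading: draw a random choice sequence according to the weights."""
    out = []
    while isinstance(g, Pick):
        labels = list(g.choices)
        weights = [g.choices[label][0] for label in labels]
        c = rng.choices(labels, weights=weights)[0]
        out.append(c)
        g = g.choices[c][1]()
    return "".join(out)

def generate(g: FreeGen, rng: random.Random) -> Any:
    """Generation factored as: sample a choice sequence, then parse it."""
    return parse(g, sample_choices(g, rng))
```

In this sketch `parse(nat(), "ssz")` yields `2`, and `derivative(nat(), "s")` is the generator that remains once the first `s` has been chosen, which lets a tool inspect the consequences of a choice before committing to it.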
Supplemental Material
Appendix, containing proofs and diagrams.