Parsing randomness

Authors:
Harrison Goldstein, University of Pennsylvania, USA (ORCID: 0000-0001-9631-1169)
Benjamin C. Pierce, University of Pennsylvania, USA (ORCID: 0000-0001-7839-1636)

Proceedings of the ACM on Programming Languages, Volume 6, Issue OOPSLA2, Article No. 128, pp. 89–113. https://doi.org/10.1145/3563291

Published: 31 October 2022

Related Artifact: Parsing Randomness: Free Generators Development. October 2022. Software. https://doi.org/10.5281/zenodo.7086231

Abstract

Random data generators can be thought of as parsers of streams of randomness. This perspective on generators for random data structures is established folklore in the programming languages community, but it has never been formalized, nor have its consequences been deeply explored.
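
As a minimal illustration of this perspective (a sketch, not code from the paper), the Python function below reads a binary tree from a stream of choices. Feeding it a fixed stream makes it behave like a parser; feeding it random choices makes it behave like a generator. The names `parse_tree`, `random_choices`, and `gen_tree`, the choice alphabet `'l'`/`'n'`, and the depth bound are all assumptions made for this example.

```python
import random
from typing import Iterator, Optional, Tuple

# Illustrative encoding (an assumption): a tree is None (a leaf)
# or a (left, right) pair (a node).
Tree = Optional[Tuple["Tree", "Tree"]]


def parse_tree(choices: Iterator[str], depth: int = 3) -> Tree:
    """Parse a tree from a choice stream: 'l' selects a leaf, 'n' a node."""
    if depth == 0:
        return None  # depth budget exhausted: return a leaf, consume nothing
    choice = next(choices)
    if choice == "l":
        return None
    if choice == "n":
        left = parse_tree(choices, depth - 1)
        right = parse_tree(choices, depth - 1)
        return (left, right)
    raise ValueError(f"unexpected choice: {choice!r}")


def random_choices(rng: random.Random) -> Iterator[str]:
    """An endless stream of uniformly random choices."""
    while True:
        yield rng.choice("ln")


def gen_tree(rng: random.Random, depth: int = 3) -> Tree:
    """A random generator is the parser applied to random input."""
    return parse_tree(random_choices(rng), depth)
```

For instance, `parse_tree(iter("nll"))` returns the single node `(None, None)`, while `gen_tree(random.Random(0))` runs the same function on random input.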

We build on the idea of freer monads to develop free generators, which unify parsing and generation using a common structure that makes the relationship between the two concepts precise. Free generators lead naturally to a proof that a monadic generator can be factored into a parser plus a distribution over choice sequences. Free generators also support a notion of derivative, analogous to the familiar Brzozowski derivatives of formal languages, allowing analysis tools to "preview" the effect of a particular generator choice. This gives rise to a novel algorithm for generating data structures satisfying user-specified preconditions.
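
The ideas in this paragraph can be sketched in a few lines of Python. This is a deliberately simplified encoding and not the paper's definition: it represents a generator as a tree of labeled choice points, ignores probabilities by choosing uniformly, and pushes sequencing into the continuations instead of using freer monads. Every name here (`Void`, `Pure`, `Pick`, `bind`, `derivative`, `parse`, `generate`, `tree`) is an assumption made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple, Union


@dataclass
class Void:
    """A generator that can produce nothing."""


@dataclass
class Pure:
    """A generator that makes no further choices and returns `value`."""
    value: Any


@dataclass
class Pick:
    """A choice point: each label maps to a lazily built continuation."""
    options: Dict[str, Callable[[], "FGen"]]


FGen = Union[Void, Pure, Pick]


def bind(g: FGen, f: Callable[[Any], FGen]) -> FGen:
    """Sequencing: run g, then continue with f applied to its result."""
    if isinstance(g, Pure):
        return f(g.value)
    if isinstance(g, Void):
        return g
    return Pick({c: (lambda k=k: bind(k(), f)) for c, k in g.options.items()})


def derivative(g: FGen, c: str) -> FGen:
    """The generator that remains after committing to choice c."""
    if isinstance(g, Pick) and c in g.options:
        return g.options[c]()
    return Void()


def parse(g: FGen, choices: str) -> Optional[Any]:
    """Interpret g as a parser by taking one derivative per choice."""
    for c in choices:
        g = derivative(g, c)
    return g.value if isinstance(g, Pure) else None


def generate(g: FGen, rng: random.Random) -> Tuple[str, Any]:
    """Interpret g as a generator; also return the choices it made."""
    made: List[str] = []
    while isinstance(g, Pick) and g.options:
        c = rng.choice(sorted(g.options))  # uniform choice (a simplification)
        made.append(c)
        g = derivative(g, c)
    if not isinstance(g, Pure):
        raise ValueError("generator has no values")
    return "".join(made), g.value


def tree(depth: int) -> FGen:
    """Hypothetical example: trees of bounded depth; 'l' = leaf, 'n' = node."""
    if depth == 0:
        return Pure("L")
    return Pick({
        "l": lambda: Pure("L"),
        "n": lambda: bind(tree(depth - 1), lambda a:
                          bind(tree(depth - 1), lambda b: Pure(("N", a, b)))),
    })
```

In this sketch, `generate` returns both a choice sequence and a value, and `parse` applied to that sequence recovers the same value, which mirrors the factoring of a generator into a parser plus a source of choice sequences. Calling `derivative(tree(2), "l")` previews what committing to the choice `'l'` would leave: here, a `Pure` generator that is already finished.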

Supplemental Material

Available for download: oopslab22main-p138-p-archive.zip (1.4 MB). Appendix, containing proofs and diagrams.

References

Janusz A Brzozowski. 1964. Derivatives of regular expressions. Journal of the ACM (JACM), 11, 4 (1964), 481–494.
Koen Claessen, Jonas Duregård, and Michal H. Palka. 2015. Generating constrained random data with uniform distribution. J. Funct. Program., 25 (2015). https://doi.org/10.1017/S0956796815000143
Koen Claessen and John Hughes. 2000. QuickCheck: a lightweight tool for random testing of Haskell programs. In Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming (ICFP ’00), Montreal, Canada, September 18-21, 2000, Martin Odersky and Philip Wadler (Eds.). ACM, Montreal, Canada. 268–279. https://doi.org/10.1145/351240.351266
Kyle Thomas Dewey. 2017. Automated Black Box Generation of Structured Inputs for Use in Software Testing. University of California, Santa Barbara.
Stephen Dolan and Mindy Preston. 2017. Testing with crowbar. In OCaml Workshop.
Tony Garnock-Jones, Mahdi Eslamimehr, and Alessandro Warth. 2018. Recognising and generating terms using derivatives of parsing expression grammars. arXiv preprint arXiv:1801.10490.
Michele Giry. 1982. A categorical approach to probability theory. In Categorical aspects of topology and analysis. Springer, 68–85.
Patrice Godefroid, Hila Peleg, and Rishabh Singh. 2017. Learn&fuzz: Machine learning for input fuzzing. In 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE). 50–59. https://dl.acm.org/doi/10.5555/3155562.3155573
Harrison Goldstein. 2021. Ungenerators. In ICFP Student Research Competition. https://harrisongoldste.in/papers/icfpsrc21.pdf
Harrison Goldstein. 2022. Parsing Randomness: Free Generators Development. Oct. https://doi.org/10.5281/zenodo.7086231
John Hughes. 2007. QuickCheck testing for fun and profit. In International Symposium on Practical Aspects of Declarative Languages. 1–32. https://dl.acm.org/doi/10.1007/978-3-540-69611-7_1
Oleg Kiselyov and Hiromi Ishii. 2015. Freer monads, more extensible effects. ACM SIGPLAN Notices, 50, 12 (2015), 94–105. https://dl.acm.org/doi/10.1145/2804302.2804319
Leonidas Lampropoulos, Diane Gallois-Wong, Catalin Hritcu, John Hughes, Benjamin C. Pierce, and Li-yao Xia. 2017. Beginner’s Luck: a language for property-based generators. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, Paris, France, January 18-20, 2017. 114–129. http://dl.acm.org/citation.cfm?id=3009868
Leonidas Lampropoulos, Zoe Paraskevopoulou, and Benjamin C Pierce. 2017. Generating good generators for inductive relations. Proceedings of the ACM on Programming Languages, 2, POPL (2017), 1–30. https://dl.acm.org/doi/10.1145/3158133
Daan Leijen and Erik Meijer. 2001. Parsec: Direct style monadic parser combinators for the real world.
Vladimir I Levenshtein. 1966. Binary codes capable of correcting deletions, insertions, and reversals. In Soviet physics doklady. 10, 707–710.
Andreas Löscher and Konstantinos Sagonas. 2017. Targeted Property-Based Testing. In Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2017). Association for Computing Machinery, New York, NY, USA. 46–56. ISBN: 9781450350761. https://doi.org/10.1145/3092703.3092711
David R MacIver and Zac Hatfield-Dodds. 2019. Hypothesis: A new approach to property-based testing. Journal of Open Source Software, 4, 43 (2019), 1891.
Eugenio Moggi. 1991. Notions of computation and monads. Information and computation, 93, 1 (1991), 55–92.
Tomáš Petříček. 2009. Encoding monadic computations in C# using iterators. Proceedings of ITAT.
Sameer Reddy, Caroline Lemieux, Rohan Padhye, and Koushik Sen. 2020. Quickly generating diverse valid test inputs with reinforcement learning. In ICSE ’20: 42nd International Conference on Software Engineering, Seoul, South Korea, 27 June - 19 July, 2020, Gregg Rothermel and Doo-Hwan Bae (Eds.). ACM, 1410–1421. https://doi.org/10.1145/3377811.3380399
Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. 2011. Finding and understanding bugs in C compilers. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, San Jose, CA, USA, June 4-8, 2011. 283–294. https://doi.org/10.1145/1993498.1993532

Index Terms

Software and its engineering > Software notations and tools > General programming languages

Published in
Proceedings of the ACM on Programming Languages, Volume 6, Issue OOPSLA2
October 2022, 1932 pages
EISSN: 2475-1421
DOI: 10.1145/3554307
Editor: Philip Wadler, University of Edinburgh, UK
Copyright © 2022 Owner/Author
This work is licensed under a Creative Commons Attribution 4.0 International License.
Publisher: Association for Computing Machinery, New York, NY, United States

Badges
- Artifacts Available / v1.1
Author Tags
Formal languages
Parsing
Property-based testing
Random generation