A Benchmark Production Tool for Regular Expressions

Borsotti, Angelo; Breveglieri, Luca; Crespi Reghizzi, Stefano; Morzenti, Angelo

doi:10.1007/978-3-030-23679-3_8

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11601)

Included in the following conference series:

International Conference on Implementation and Application of Automata

Abstract

We describe a new tool, named REgen, that generates regular expressions (REs) to be used as test cases, and that also generates synthetic benchmarks for exercising and measuring the performance of RE-based software libraries and applications. Each group of REs is randomly generated and satisfies a user-specified set of constraints, such as length, nesting depth, operator arity, repetition depth, and syntax tree balancing. In addition to these parameters, other features are chosen by the tool. An RE group may include REs that are ambiguous, or that define the same regular language but differ in their syntactic structure. A benchmark is a collection of RE groups that have a user-specified numerosity and distribution, together with a representative sample of texts for each RE in the collection. We present two generation algorithms, one for RE groups and one for benchmarks. Experimental results are reported for a large benchmark that we used to compare the performance of different RE parsing algorithms. The tool REgen and the RE benchmark are publicly available and fill a gap in supporting tools for the development and evaluation of RE applications.
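
The abstract does not spell out the generation algorithms, so the following Python sketch is only a rough illustration of the general idea of constraint-driven random RE generation, not REgen's actual method. It builds a group of random REs over a small alphabet while bounding nesting depth and operator arity. All names (`random_re`, `nesting_depth`, `random_group`), the leaf probability of 0.3, and the choice of operators are assumptions made for illustration; generation of sample texts is not covered.

```python
import random
import re


def random_re(rng, alphabet="ab", max_depth=3, max_arity=3):
    """Return a random RE string with at most max_depth nested operators.

    Hypothetical sketch: every operator node is wrapped in one pair of
    parentheses, so parenthesis depth equals operator nesting depth.
    """
    # Stop at a terminal when the depth budget is spent, or at random.
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(alphabet)
    op = rng.choice(["concat", "union", "star"])
    if op == "star":
        body = random_re(rng, alphabet, max_depth - 1, max_arity)
        return "(" + body + ")*"
    # Concatenation and union take between 2 and max_arity operands.
    arity = rng.randint(2, max_arity)
    parts = [random_re(rng, alphabet, max_depth - 1, max_arity)
             for _ in range(arity)]
    separator = "" if op == "concat" else "|"
    return "(" + separator.join(parts) + ")"


def nesting_depth(regex):
    """Measure the maximum parenthesis nesting depth of an RE string."""
    depth = deepest = 0
    for ch in regex:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
    return deepest


def random_group(size, seed=0, **constraints):
    """Generate a reproducible group of REs sharing the same constraints."""
    rng = random.Random(seed)
    return [random_re(rng, **constraints) for _ in range(size)]


if __name__ == "__main__":
    for regex in random_group(5, seed=1, max_depth=3, max_arity=3):
        re.compile(regex)  # each generated RE is valid Python regex syntax
        print(nesting_depth(regex), regex)
```

Seeding the generator makes a group reproducible, which matters when the same group has to be fed to several libraries for comparison. A fuller generator would also need to control the other constraints the abstract lists, such as length, repetition depth, and syntax tree balancing.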

Notes

1. On an AMD Athlon dual-core processor with 2.00 GB RAM and 2.20 GHz clock.

Author information

Authors and Affiliations

Politecnico di Milano, 20133, Milan, Italy
Angelo Borsotti, Luca Breveglieri, Stefano Crespi Reghizzi & Angelo Morzenti
CNR-IEIIT, 20133, Milan, Italy
Stefano Crespi Reghizzi

Corresponding author

Correspondence to Luca Breveglieri.

Editor information

Editors and Affiliations

Slovak Academy of Sciences, Košice, Slovakia
Michal Hospodár
Slovak Academy of Sciences, Košice, Slovakia
Galina Jirásková

About this paper

Cite this paper

Borsotti, A., Breveglieri, L., Crespi Reghizzi, S., Morzenti, A. (2019). A Benchmark Production Tool for Regular Expressions. In: Hospodár, M., Jirásková, G. (eds) Implementation and Application of Automata. CIAA 2019. Lecture Notes in Computer Science, vol 11601. Springer, Cham. https://doi.org/10.1007/978-3-030-23679-3_8

DOI: https://doi.org/10.1007/978-3-030-23679-3_8
Published: 26 June 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-23678-6
Online ISBN: 978-3-030-23679-3
eBook Packages: Computer Science, Computer Science (R0)