Skip to main content

A Benchmark Production Tool for Regular Expressions

  • Conference paper
  • First Online:
Implementation and Application of Automata (CIAA 2019)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11601))

Included in the following conference series:

Abstract

We describe a new tool, named REgen, that generates regular expressions (RE) to be used as test cases, and that generates also synthetic benchmarks for exercising and measuring the performance of RE-based software libraries and applications. Each group of REs is randomly generated and satisfies a user-specified set of constraints, such as length, nesting depth, operator arity, repetition depth, and syntax tree balancing. In addition to such parameters, other features are chosen by the tool. An RE group may include REs that are ambiguous, or that define the same regular language but differ with respect to their syntactic structure. A benchmark is a collection of RE groups that have a user-specified numerosity and distribution, together with a representative sample of texts for each RE in the collection. We present two generation algorithms for RE groups and for benchmarks. Experimental results are reported for a large benchmark we used to compare the performance of different RE parsing algorithms. The tool REgen and the RE benchmark are publicly available and fill a gap in supporting tools for the development and evaluation of RE applications.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    On an AMD Athlon dual-core processor with 2.00 GB RAM and 2.20 GHz clock.

References

  1. Borsotti, A., Breveglieri, L., Crespi Reghizzi, S., Morzenti, A.: From ambiguous regular expressions to deterministic parsing automata. In: Drewes, F. (ed.) CIAA 2015. LNCS, vol. 9223, pp. 35–48. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-22360-5_4

    Chapter  MATH  Google Scholar 

  2. Câmpeanu, C., Salomaa, K., Yu, S.: Regex and extended regex. In: Champarnaud, J.-M., Maurel, D. (eds.) CIAA 2002. LNCS, vol. 2608, pp. 77–84. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-44977-9_7

    Chapter  MATH  Google Scholar 

  3. Celentano, A., Crespi Reghizzi, S., Della Vigna, P., Ghezzi, C., Granata, G., Savoretti, F.: Compiler testing using a sentence generator. Softw. Pract. Exp. 10, 897–918 (1980). https://doi.org/10.1002/spe.4380101104

    Article  Google Scholar 

  4. Héam, P.-C., Joly, J.-L.: On the uniform random generation of non deterministic automata up to isomorphism. In: Drewes, F. (ed.) CIAA 2015. LNCS, vol. 9223, pp. 140–152. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-22360-5_12

    Chapter  MATH  Google Scholar 

  5. Lee, J., Shallit, J.: Enumerating regular expressions and their languages. In: Domaratzki, M., Okhotin, A., Salomaa, K., Yu, S. (eds.) CIAA 2004. LNCS, vol. 3317, pp. 2–22. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-30500-2_2

    Chapter  MATH  Google Scholar 

  6. Sulzmann, M., Lu, K.Z.M.: Derivative-based diagnosis of regular expression ambiguity. Int. J. Found. Comput. Sci. 28(5), 543–562 (2017)

    Article  MathSciNet  Google Scholar 

  7. Sutton, M., Greene, A., Amini, P.: Fuzzing: Brute Force Vulnerability Discovery. Addison-Wesley, Boston (2007)

    Google Scholar 

  8. Szilard, A., Yu, S., Zhang, K., Shallit, J.: Characterizing regular languages with polynomial densities. In: Havel, I.M., Koubek, V. (eds.) MFCS 1992. LNCS, vol. 629, pp. 494–503. Springer, Heidelberg (1992). https://doi.org/10.1007/3-540-55808-X_48

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Luca Breveglieri .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Borsotti, A., Breveglieri, L., Crespi Reghizzi, S., Morzenti, A. (2019). A Benchmark Production Tool for Regular Expressions. In: Hospodár, M., Jirásková, G. (eds) Implementation and Application of Automata. CIAA 2019. Lecture Notes in Computer Science(), vol 11601. Springer, Cham. https://doi.org/10.1007/978-3-030-23679-3_8

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-23679-3_8

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-23678-6

  • Online ISBN: 978-3-030-23679-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics