ABSTRACT
The Generalised-LL (GLL) context-free parsing algorithm was introduced at the 2009 LDTA workshop, and since then a series of variant algorithms and implementations have been described. There is a wide variety of optimisations that may be applied to GLL, some of which were already present in the originally published form.
This paper presents a reference GLL implementation shorn of all optimisations as a common baseline for the real-world comparison of performance across GLL variants. This baseline version has particular value for non-specialists, since its simple form may be straightforwardly encoded in the implementer's preferred programming language.
We also describe our approach to low-level memory management of GLL internal data structures. Our evaluation on large inputs shows a speedup of a factor of 3–4 over a naïve implementation that uses the standard Java APIs, and a reduction in heap requirements of a factor of 4–5. We conclude with notes on some algorithm-level optimisations that may be applied independently of the internal data representation.