
Speeding up Natural Language Parsing by Reusing Partial Results

  • Conference paper
  • In: Computational Linguistics and Intelligent Text Processing (CICLing 2019)

Abstract

This paper proposes a novel technique that applies case-based reasoning to generate templates for reusable parse tree fragments, based on the PoS tags of bigrams and trigrams whose syntactic analyses show low variability in prior data. The aim is to speed up dependency parsers by avoiding redundant computation: instead of parsing a similar text fragment again, a predefined template capturing the result of a previous syntactic analysis is applied, and the stored structure is assigned directly to any new n-gram that matches it. The study shows that a heuristic for selecting and reusing these partial results increases parsing speed by reducing the length of the input the parser must process, at some cost in accuracy. Experiments on English show promising results: the input size can be reduced by more than 20% at the cost of less than 3 points of Unlabeled Attachment Score.
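As a rough illustration of this idea, the sketch below (in Python, with hypothetical data structures, names and thresholds rather than the authors' actual implementation) collects PoS n-gram templates whose internal dependency analysis is nearly always the same in training data, and then applies them to pre-attach dependents and shorten the input handed to the full parser.

```python
from collections import Counter, defaultdict

def collect_templates(sentences, n=2, threshold=0.95, min_count=50):
    """sentences: list of lists of (pos_tag, head_index) pairs, where
    head_index is the 0-based sentence position of the token's head
    (-1 for the root). All names and thresholds are illustrative."""
    stats = defaultdict(Counter)
    for sent in sentences:
        tags = [tag for tag, _ in sent]
        for i in range(len(sent) - n + 1):
            key = tuple(tags[i:i + n])
            # Encode the internal structure: for each token in the window,
            # store its head's offset inside the window, or None if the
            # head lies outside the window.
            struct = tuple(
                (head - i) if i <= head < i + n else None
                for _, head in sent[i:i + n]
            )
            stats[key][struct] += 1

    templates = {}
    for key, counter in stats.items():
        total = sum(counter.values())
        struct, freq = counter.most_common(1)[0]
        # Keep only n-grams whose dominant analysis is well attested and
        # frequent enough, i.e. the n-gram shows low variability.
        if total >= min_count and freq / total >= threshold:
            templates[key] = struct
    return templates

def apply_templates(pos_tags, templates, n=2):
    """Assign stored arcs to matching n-grams and drop the covered
    dependents, so the downstream parser sees a shorter input."""
    fixed_arcs = {}   # dependent index -> head index (original positions)
    removed = set()
    i = 0
    while i <= len(pos_tags) - n:
        struct = templates.get(tuple(pos_tags[i:i + n]))
        if struct is not None:
            for k, rel_head in enumerate(struct):
                if rel_head is not None:   # head lies inside the window
                    fixed_arcs[i + k] = i + rel_head
                    removed.add(i + k)
            i += n                         # skip past the matched n-gram
        else:
            i += 1
    reduced = [t for j, t in enumerate(pos_tags) if j not in removed]
    return reduced, fixed_arcs
```

This simplified version ignores chained removals (a dependent whose head is itself removed) and projectivity checks; the actual method restricts trigrams to projective trees (see note 1 below).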

This work has received funding from the European Research Council (ERC), under the European Union’s Horizon 2020 research and innovation programme (FASTPARSE, grant agreement No 714150), from the TELEPARES-UDC project (FFI2014-51978-C2-2-R) and the ANSWER-ASAP project (TIN2017-85160-C2-1-R) from MINECO, and from Xunta de Galicia (ED431B 2017/01). We gratefully acknowledge NVIDIA Corporation for the donation of a GTX Titan X GPU.


Notes

  1. In this implementation we only consider projective trees for trigrams.

  2. Intel Core i7-7700 CPU, 4.2 GHz.

  3. If a bigram and a trigram overlap, the n-gram with the higher head confidence is chosen and its dependents are removed (see the sketch after these notes).
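The overlap heuristic in note 3 can be pictured as follows. This is a minimal sketch that assumes each candidate match carries a confidence score (for example, the relative frequency of its template's dominant analysis); the candidate fields are hypothetical, not the authors' exact scoring.

```python
def resolve_overlaps(matches):
    """matches: list of (start, end, confidence, arcs) candidates, where
    arcs maps dependent index -> head index. Higher-confidence candidates
    win; any candidate overlapping an already kept one is discarded, so
    its dependents are not pre-attached."""
    kept_arcs, covered = {}, set()
    for start, end, conf, arcs in sorted(matches, key=lambda m: -m[2]):
        span = set(range(start, end))
        if span & covered:
            continue            # overlaps a better match: drop it
        kept_arcs.update(arcs)
        covered |= span
    return kept_arcs
```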


Author information

Correspondence to Carlos Gómez-Rodríguez.


Copyright information

© 2023 Springer Nature Switzerland AG

About this paper


Cite this paper

Strzyz, M., Gómez-Rodríguez, C. (2023). Speeding up Natural Language Parsing by Reusing Partial Results. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13451. Springer, Cham. https://doi.org/10.1007/978-3-031-24337-0_46


  • DOI: https://doi.org/10.1007/978-3-031-24337-0_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-24336-3

  • Online ISBN: 978-3-031-24337-0

  • eBook Packages: Computer Science (R0)
