DOI: 10.1145/3477495.3531872

Summarizing Legal Regulatory Documents using Transformers

Published: 07 July 2022

ABSTRACT

Companies invest substantial time and resources in ensuring compliance with existing regulations, or pay fines when compliance cannot be proven in auditing procedures. The topic is not only relevant but also highly complex, given the frequency of changes and amendments, the intricacy of the cases, and the difficulty of juristic language. This paper applies advanced extractive summarization to democratize the understanding of regulations, so that non-jurists can decide which regulations deserve further follow-up. To achieve this, we first create a corpus named EUR-LexSum containing 4595 curated European regulatory documents and their corresponding summaries. We then fine-tune transformer-based models that, applied to this corpus, yield superior performance (in terms of ROUGE metrics) compared to a traditional extractive summarization baseline. Our experiments reveal that even with limited amounts of data, such transformer-based models are effective for legal document summarization.
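To make the evaluation setup concrete, below is a minimal sketch of the kind of comparison the abstract describes: a traditional extractive baseline in the TextRank style (sentences ranked by PageRank over a TF-IDF cosine-similarity graph) scored against a reference summary with ROUGE. This is not the authors' code; the example document sentences, the reference summary, and the textrank_summary helper are illustrative assumptions, not material from the paper or the EUR-LexSum corpus.

```python
# Hypothetical sketch of an extractive baseline plus ROUGE scoring.
# Requires: pip install networkx scikit-learn rouge-score
import networkx as nx
from rouge_score import rouge_scorer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def textrank_summary(sentences, num_sentences=3):
    """Rank sentences by PageRank over a TF-IDF cosine-similarity graph
    and return the top ones in their original document order."""
    tfidf = TfidfVectorizer().fit_transform(sentences)
    sim = cosine_similarity(tfidf)          # (n, n) sentence-similarity matrix
    graph = nx.from_numpy_array(sim)        # weighted sentence graph
    scores = nx.pagerank(graph)             # centrality score per sentence
    top = sorted(scores, key=scores.get, reverse=True)[:num_sentences]
    return " ".join(sentences[i] for i in sorted(top))


# Illustrative stand-ins for one (document, reference summary) pair.
document_sentences = [
    "This Regulation lays down harmonised rules for the labelling of products.",
    "Member States shall designate a competent authority within twelve months.",
    "The authority shall verify conformity before products are placed on the market.",
    "Penalties for infringements shall be effective, proportionate and dissuasive.",
]
reference_summary = (
    "The Regulation harmonises product labelling and requires Member States "
    "to designate authorities that verify conformity and impose penalties."
)

candidate = textrank_summary(document_sentences, num_sentences=2)
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
for name, score in scorer.score(reference_summary, candidate).items():
    print(f"{name}: F1 = {score.fmeasure:.3f}")
```

In the paper's setting, the summary produced by a fine-tuned transformer-based model would be scored the same way, so baseline and model can be compared on identical ROUGE metrics.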


Supplemental Material

SIGIR22-sp1890.mp4 (mp4, 11.6 MB)


      • Published in

        SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2022
3569 pages
ISBN: 9781450387323
DOI: 10.1145/3477495

        Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Qualifiers

        • short-paper

        Acceptance Rates

Overall Acceptance Rate: 792 of 3,983 submissions, 20%
