short-paper

Summarizing Legal Regulatory Documents using Transformers

Authors:
Svea Klaus, University of Granada & E.ON Digital Technology GmbH, Granada, Spain
Ria Van Hecke, E.ON Digital Technology GmbH, Hannover, Germany
Kaweh Djafari Naini, E.ON Digital Technology GmbH, Hannover, Germany
Ismail Sengor Altingovde, Middle East Technical University, Ankara, Turkey
Juan Bernabé-Moreno, University of Granada & E.ON Digital Technology GmbH, Granada, Spain
Enrique Herrera-Viedma, University of Granada, Granada, Spain

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2022, pages 2426–2430. https://doi.org/10.1145/3477495.3531872

Published: 7 July 2022

ABSTRACT

Companies invest substantial time and resources in ensuring compliance with existing regulations, or pay for it in the form of fines when compliance cannot be proven in auditing procedures. The topic is not only relevant but also highly complex, given the frequency of changes and amendments, the complexity of the cases, and the difficulty of legal language. This paper applies advanced extractive summarization to democratize the understanding of regulations, so that non-jurists can decide which regulations deserve further follow-up. To this end, we first create a corpus named EUR-LexSum containing 4595 curated European regulatory documents and their corresponding summaries. We then fine-tune transformer-based models which, applied to this corpus, outperform a traditional extractive summarization baseline in terms of ROUGE metrics. Our experiments reveal that even with limited amounts of data, such transformer-based models are effective for legal document summarization.
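The comparison above is reported in ROUGE against an extractive baseline. As a minimal sketch of what such an evaluation involves, the Python below computes ROUGE-N precision, recall and F1 from clipped n-gram overlap, and pairs it with a toy extractive summarizer that ranks sentences by average word frequency. The tokenizer, the sentence splitter and the `frequency_extract` scorer are hypothetical illustrations, not the paper's preprocessing or its actual baseline; reported ROUGE scores are normally produced with a reference implementation (with options such as stemming), so this simplified version will not reproduce them.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: lowercase and keep alphanumeric runs only (no stemming).
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: list[str], n: int) -> Counter:
    # Multiset of contiguous n-grams; empty when the text has fewer than n tokens.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict[str, float]:
    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    # Counter intersection keeps the minimum count per n-gram ("clipped" overlap).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def frequency_extract(document: str, k: int = 2) -> str:
    # Hypothetical baseline: naive split after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    freq = Counter(tokenize(document))

    def score(sentence: str) -> float:
        # Mean document-level frequency of the sentence's tokens.
        tokens = tokenize(sentence)
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Keep the k best-scoring sentences, but emit them in original document order.
    return " ".join(sentences[i] for i in sorted(ranked[:k]))
```

For example, `rouge_n(frequency_extract(doc, k=2), reference_summary, n=2)["f1"]` scores a two-sentence extract of `doc` against a human-written summary. In the fine-tuned transformer setting the abstract describes, a learned sentence scorer would take the place of `frequency_extract`, while the ROUGE comparison stays the same.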

Supplemental Material

SIGIR22-sp1890.mp4 (MP4 video, 11.6 MB)

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Information extraction
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks

Published in
SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2022, 3569 pages
ISBN: 9781450387323
DOI: 10.1145/3477495
General Chairs: Enrique Amigo (UNED), Pablo Castells (UAM and Amazon), Julio Gonzalo (UNED)
Program Chairs: Ben Carterette (Spotify), J. Shane Culpepper (RMIT University), Gabriella Kazai (Waseda University)
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
eur-lex
extractive text summarization
legal ir
transformer
Qualifiers
- short-paper
Acceptance Rates
Overall acceptance rate: 792 of 3,983 submissions, 20%

Article Metrics
- Total citations: 4
- Total downloads: 437
- Downloads (last 12 months): 148
- Downloads (last 6 weeks): 8
