ABSTRACT
Companies invest a substantial amount of time and resources in ensuring compliance with existing regulations, or incur costs in the form of fines when compliance cannot be proven in auditing procedures. The topic is not only relevant but also highly complex, given the frequency of changes and amendments, the complexity of the cases, and the difficulty of legal language. This paper applies advanced extractive summarization to democratize the understanding of regulations, so that non-jurists can decide which regulations deserve further follow-up. To achieve this, we first create a corpus named EUR-LexSum containing 4,595 curated European regulatory documents and their corresponding summaries. We then fine-tune transformer-based models that, applied to this corpus, yield superior performance (in terms of ROUGE metrics) compared to a traditional extractive summarization baseline. Our experiments reveal that such transformer-based models are effective for legal document summarization even with limited amounts of data.
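The comparison above is reported in ROUGE metrics, which score a system summary by its overlap with a reference summary. The sketch below computes ROUGE-N (clipped n-gram overlap) and ROUGE-L (longest common subsequence), each as an F1 score. It is an illustrative simplification, not the authors' evaluation code. It assumes lowercase whitespace tokenization with no stemming or punctuation handling, so its scores will differ from the official ROUGE toolkit or the `rouge-score` package. The function names are hypothetical.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenization (simplified assumption: no stemming)."""
    return text.lower().split()


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap: int, cand_total: int, ref_total: int) -> float:
    """Harmonic mean of precision and recall computed from raw counts."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N F1: n-gram overlap, clipped to the count in each summary."""
    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    overlap = sum((cand & ref).values())  # Counter '&' keeps the minimum counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1 based on the longest common token subsequence."""
    cand, ref = tokenize(candidate), tokenize(reference)
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    # Toy sentences for illustration only; not drawn from EUR-LexSum.
    reference = "the regulation sets limits on vehicle emissions"
    candidate = "the regulation limits vehicle emissions"
    print(round(rouge_n(candidate, reference, 1), 3))  # 0.833
    print(round(rouge_n(candidate, reference, 2), 3))  # 0.4
    print(round(rouge_l(candidate, reference), 3))     # 0.833
```

In the toy example, every candidate word appears in the reference in the same order. As a result, ROUGE-1 and ROUGE-L coincide, while ROUGE-2 is lower because dropping "sets" and "on" breaks several reference bigrams.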
Index Terms
- Summarizing Legal Regulatory Documents using Transformers