Topic Modelling of Legal Texts Using Bidirectional Encoder Representations from Sentence Transformers

  • Conference paper
Advances in Information Systems, Artificial Intelligence and Knowledge Management (ICIKS 2023)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 486)

Abstract

Topic modeling of legal texts is a challenging task because of their complicated language structures and technical features. Recently, the number of legislative documents has grown sharply, which makes it very difficult for legal experts to keep up with legislation, such as implementing acts, and with the analysis of cases. In some contexts, the importance of topics is affected by how legal texts are processed and presented. The aim of this work is to analyze the legal opinions from cases heard by the Supreme Court of the United States and the legal judgments from cases heard by the Supreme Court of India. In this study, we used different language models to create sentence embeddings from these legal text datasets. This paper employs the BERTopic technique and a baseline approach to discover significant topics in the legal opinions and legal judgments.
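The abstract does not spell out how topic words are obtained once documents have been embedded and grouped. As a rough illustration only, the sketch below assumes that embedding and clustering have already produced one cluster label per document, and shows a class-based TF-IDF weighting of the kind BERTopic-style pipelines use to rank terms per cluster. The function name `class_tfidf`, the toy documents, and the labels are hypothetical and are not taken from the paper; raw term counts are used for simplicity, and library implementations may normalize differently.

```python
import math
from collections import Counter


def class_tfidf(docs, labels, top_n=3):
    """Rank terms per cluster with a class-based TF-IDF weighting (sketch)."""
    # Merge all documents of a cluster into one bag of words.
    per_class = {}
    for doc, label in zip(docs, labels):
        per_class.setdefault(label, Counter()).update(doc.lower().split())

    # Term frequency over all clusters and average word count per cluster.
    total = Counter()
    for counts in per_class.values():
        total.update(counts)
    avg_words = sum(total.values()) / len(per_class)

    topics = {}
    for label, counts in per_class.items():
        # Weight: in-cluster frequency times log(1 + avg_words / global frequency).
        scores = {
            term: tf * math.log(1 + avg_words / total[term])
            for term, tf in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        topics[label] = ranked[:top_n]
    return topics


# Hypothetical toy corpus; labels stand in for the output of a clustering step.
docs = [
    "court patent infringement patent claim",
    "patent claim invalid court",
    "court tax assessment income tax",
    "income tax appeal court",
]
labels = [0, 0, 1, 1]
topics = class_tfidf(docs, labels, top_n=2)
print(topics)
```

Words shared by every cluster, such as "court" in this toy corpus, receive a lower weight than words concentrated in one cluster, which is why they fall out of the top-ranked terms.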

Notes

  1. https://www.courtlistener.com/.

  2. http://www.westlawindia.com.

  3. https://github.com/Law-AI/semantic-segmentation.

  4. https://www.sbert.net/.

  5. https://radimrehurek.com/gensim/index.html.

Acknowledgments

This work was supported by the Google PhD Fellowships program and by Google Cloud Platform (GCP).

Author information

Corresponding author

Correspondence to Eya Hammami.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hammami, E., Faiz, R. (2024). Topic Modelling of Legal Texts Using Bidirectional Encoder Representations from Sentence Transformers. In: Saad, I., Rosenthal-Sabroux, C., Gargouri, F., Chakhar, S., Williams, N., Haig, E. (eds) Advances in Information Systems, Artificial Intelligence and Knowledge Management. ICIKS 2023. Lecture Notes in Business Information Processing, vol 486. Springer, Cham. https://doi.org/10.1007/978-3-031-51664-1_24

  • DOI: https://doi.org/10.1007/978-3-031-51664-1_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-51663-4

  • Online ISBN: 978-3-031-51664-1

  • eBook Packages: Computer Science, Computer Science (R0)
