RELD: A Knowledge Graph of Relation Extraction Datasets

Ali, Manzoor; Saleem, Muhammad; Moussallem, Diego; Sherif, Mohamed Ahmed; Ngonga Ngomo, Axel-Cyrille

doi:10.1007/978-3-031-33455-9_20

Conference paper
First Online: 22 May 2023

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13870)

Abstract

Relation extraction plays an important role in natural language processing, and a wide range of datasets is available for benchmarking relation extraction approaches. However, most of these datasets come in different formats with dataset-specific annotation rules, which makes it difficult to run experiments across different types of relation extraction approaches. We present RELD, an RDF knowledge graph of eight open-licensed and publicly available relation extraction datasets. We modeled the benchmarking datasets into a single ontology that provides a unified format for data access, along with the annotations required for training different types of relation extraction systems. Moreover, RELD abides by the Linked Data principles. To the best of our knowledge, RELD is the largest RDF knowledge graph of entities and relations from text, containing approximately 1230 million triples describing 1034 relations, 2 million sentences, 3 million abstracts, and 4013 documents. RELD supports a variety of uses in the natural language processing community and, in particular, provides unified and easy modeling of data for benchmarking relation extraction and named entity recognition models.
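The idea of mapping differently formatted datasets into one unified RDF representation can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `http://example.org/reld-sketch/` namespace, the property names (`text`, `subject`, `relation`, `object`), and the two simplified input formats (inline entity tags and token spans) are hypothetical and are not the RELD ontology or the exact layout of any source dataset.

```python
import re
from typing import Iterator, List, NamedTuple, Tuple

# Hypothetical namespace for this sketch only; not the RELD vocabulary.
EX = "http://example.org/reld-sketch/"


class Triple(NamedTuple):
    s: str
    p: str
    o: str


def iri(local: str) -> str:
    """Wrap a local name in the sketch namespace (local must be IRI-safe)."""
    return f"<{EX}{local}>"


def literal(text: str) -> str:
    """Return an N-Triples string literal with basic escaping."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def sentence_triples(sent_id: str, text: str, subj: str,
                     relation: str, obj: str) -> Iterator[Triple]:
    """Emit the same four triples regardless of the source format."""
    s = iri(f"sentence/{sent_id}")
    yield Triple(s, iri("text"), literal(text))
    yield Triple(s, iri("subject"), literal(subj))
    yield Triple(s, iri("relation"), iri(f"relation/{relation}"))
    yield Triple(s, iri("object"), literal(obj))


def from_inline_tags(sent_id: str, tagged: str, relation: str) -> List[Triple]:
    """Simplified inline-markup input: entities wrapped in <e1> and <e2> tags.

    Assumes both tags are present in the input string.
    """
    e1 = re.search(r"<e1>(.*?)</e1>", tagged).group(1)
    e2 = re.search(r"<e2>(.*?)</e2>", tagged).group(1)
    text = re.sub(r"</?e[12]>", "", tagged)
    return list(sentence_triples(sent_id, text, e1, relation, e2))


def from_token_spans(sent_id: str, tokens: List[str],
                     head: Tuple[int, int], tail: Tuple[int, int],
                     relation: str) -> List[Triple]:
    """Simplified span-based input: a token list plus [start, end) spans."""
    text = " ".join(tokens)
    subj = " ".join(tokens[head[0]:head[1]])
    obj = " ".join(tokens[tail[0]:tail[1]])
    return list(sentence_triples(sent_id, text, subj, relation, obj))


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples as N-Triples, one statement per line."""
    return "\n".join(f"{t.s} {t.p} {t.o} ." for t in triples)


a = from_inline_tags("s1", "The <e1>author</e1> wrote the <e2>book</e2> .", "creator")
b = from_token_spans("s1", ["The", "author", "wrote", "the", "book", "."],
                     (1, 2), (4, 5), "creator")
print(to_ntriples(a))
```

Both converters produce the same triples for the same sentence, which is the point of a unified format: downstream code reads one representation instead of one parser per dataset.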

Notes

1. The current version of RELD excludes datasets that are not freely available (e.g., TACRED). However, they can easily be included in the future.
2. Detailed information about the schema, i.e., object properties, data properties, and classes, is available on the RELD homepage.
3. Due to page limits, some details and extra instances are truncated from Listing 1.2.
4. We use the VoID vocabulary to describe the metadata of the dataset.
5. We use NLTK [9] for tokenization, part-of-speech tagging, and punctuation.
6. The complete details of the mapping process and the tools used are available in the tutorial: https://reld-tutorial.readthedocs.io/en/latest/tutorial.html.
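To make the VoID note concrete, here is a minimal sketch of what a VoID description of a dataset could look like when serialized as N-Triples. The dataset IRI is hypothetical, the choice of properties (`dcterms:title`, `void:triples`) is an assumption about what such metadata might include, and the triple count is only the approximate figure quoted in the abstract. The actual RELD metadata may be structured differently.

```python
# Standard vocabulary namespaces (VoID, Dublin Core Terms, RDF, XSD).
VOID = "http://rdfs.org/ns/void#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def void_description(dataset_iri: str, title: str, triples: int) -> str:
    """Build a small VoID description as N-Triples.

    Assumes `title` contains no characters that need escaping.
    """
    lines = [
        f"<{dataset_iri}> <{RDF_TYPE}> <{VOID}Dataset> .",
        f'<{dataset_iri}> <{DCTERMS}title> "{title}" .',
        f'<{dataset_iri}> <{VOID}triples> "{triples}"^^<{XSD_INTEGER}> .',
    ]
    return "\n".join(lines)


# Hypothetical IRI; the count is the approximate value from the abstract.
description = void_description(
    "http://example.org/reld-sketch/dataset", "RELD", 1_230_000_000
)
print(description)
```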

References

Agichtein, E., Gravano, L.: Snowball: extracting relations from large plain-text collections. In: Proceedings of the Fifth ACM Conference on Digital Libraries, pp. 85–94 (2000)
Ali, M., Saleem, M., Ngomo, A.C.N.: Rebench: microbenchmarking framework for relation extraction systems. In: Sattler, U., et al. (eds.) ISWC 2022. LNCS, vol. 13489, pp. 643–659. Springer, Heidelberg (2022). https://doi.org/10.1007/978-3-031-19433-7_37
Batista, D.S., Martins, B., Silva, M.J.: Semi-supervised bootstrapping of relationship extractors with distributional semantics. In: Empirical Methods in Natural Language Processing. ACL (2015)
Elsahar, H., et al.: T-REx: a large scale alignment of natural language with knowledge base triples. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) (2018)
Gardent, C., Shimorina, A., Narayan, S., Perez-Beltrachini, L.: Creating training corpora for NLG micro-planners. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada, pp. 179–188. Association for Computational Linguistics (2017). https://doi.org/10.18653/v1/P17-1017. https://aclanthology.org/P17-1017
Han, X., et al.: FewRel: a large-scale supervised few-shot relation classification dataset with state-of-the-art evaluation. In: EMNLP (2018)
Hendrickx, I., et al.: SemEval-2010 task 8: multi-way classification of semantic relations between pairs of nominals. In: Proceedings of the 5th International Workshop on Semantic Evaluation, Uppsala, Sweden, pp. 33–38. Association for Computational Linguistics (2010). https://aclanthology.org/S10-1006
Honnibal, M., Montani, I.: spaCy 2: natural language understanding with bloom embeddings, convolutional neural networks and incremental parsing. 7(1), 411–420 (2017, to appear)
Loper, E., Bird, S.: NLTK: the natural language toolkit. arXiv preprint cs/0205028 (2002)
Martinez-Rodriguez, J.L., Hogan, A., Lopez-Arevalo, I.: Information extraction meets the semantic web: a survey. Semant. Web 11(2), 255–335 (2020)
Moreira, J., Oliveira, C., Macêdo, D., Zanchettin, C., Barbosa, L.: Distantly-supervised neural relation extraction with side information using BERT. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–7 (2020). https://doi.org/10.1109/IJCNN48605.2020.9206648
Moussallem, D., Usbeck, R., Röder, M., Ngomo, A.C.N.: MAG: a multilingual, knowledge-base agnostic and deterministic entity linking approach. In: Proceedings of the Knowledge Capture Conference, pp. 1–8 (2017)
Nadgeri, A., et al.: KGPool: dynamic knowledge graph context selection for relation extraction. In: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 535–548. Association for Computational Linguistics (2021). https://doi.org/10.18653/v1/2021.findings-acl.48. https://aclanthology.org/2021.findings-acl.48
Ngonga Ngomo, A.C., et al.: LIMES - a framework for link discovery on the semantic web. KI-Künstliche Intelligenz, German Journal of Artificial Intelligence - Organ des Fachbereichs "Künstliche Intelligenz" der Gesellschaft für Informatik e.V. (2021). https://papers.dice-research.org/2021/KI_LIMES/public.pdf
Ning, Q., Feng, Z., Roth, D.: A structured learning approach to temporal relation extraction. arXiv preprint arXiv:1906.04943 (2019)
Orr, D.: 50,000 lessons on how to read: a relation extraction corpus. Online: Google Research Blog, vol. 11 (2013)
Pawar, S., Palshikar, G.K., Bhattacharyya, P.: Relation extraction: a survey. arXiv preprint arXiv:1712.05191 (2017)
Peng, Y., Yan, S., Lu, Z.: Transfer learning in biomedical natural language processing: an evaluation of BERT and ELMo on ten benchmarking datasets. arXiv preprint arXiv:1906.05474 (2019)
Qu, M., Gao, T., Xhonneux, L.P., Tang, J.: Few-shot relation extraction via Bayesian meta-learning on relation graphs. In: Daume III, H., Singh, A. (eds.) Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 119, pp. 7867–7876. PMLR (2020). https://proceedings.mlr.press/v119/qu20a.html
Ravi, S., Larochelle, H.: Optimization as a model for few-shot learning (2016)
Riedel, S., Yao, L., McCallum, A., Marlin, B.M.: Relation extraction with matrix factorization and universal schemas. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 74–84 (2013)
Sorokin, D., Gurevych, I.: Context-aware representations for knowledge base relation extraction. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, pp. 1784–1789. Association for Computational Linguistics (2017). https://doi.org/10.18653/v1/D17-1188. https://aclanthology.org/D17-1188
Sui, D., Chen, Y., Liu, K., Zhao, J., Zeng, X., Liu, S.: Joint entity and relation extraction with set prediction networks. arXiv preprint arXiv:2011.01675 (2020)
Surdeanu, M., Tibshirani, J., Nallapati, R., Manning, C.D.: Multi-instance multi-label learning for relation extraction. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 455–465 (2012)
Tran, T.T., Le, P., Ananiadou, S.: Revisiting unsupervised relation extraction. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7498–7505. Association for Computational Linguistics (2020). https://www.aclweb.org/anthology/2020.acl-main.669
Tran, T.T., Le, P., Ananiadou, S.: Revisiting unsupervised relation extraction (2020). https://doi.org/10.48550/ARXIV.2005.00087. https://arxiv.org/abs/2005.00087
Walker, C., Strassel, S., Medero, J., Maeda, K.: ACE 2005 multilingual training corpus. Linguistic Data Consortium, Philadelphia, vol. 57, p. 45 (2006)
Wang, Y., Yu, B., Zhang, Y., Liu, T., Zhu, H., Sun, L.: TPLinker: single-stage joint extraction of entities and relations through token pair linking. In: Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, pp. 1572–1582. International Committee on Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.coling-main.138. https://aclanthology.org/2020.coling-main.138
Yao, Y., et al.: DocRED: a large-scale document-level relation extraction dataset. arXiv preprint arXiv:1906.06127 (2019)
Yu, M., Yin, W., Hasan, K.S., Santos, C.d., Xiang, B., Zhou, B.: Improved neural relation detection for knowledge base question answering. arXiv preprint arXiv:1704.06194 (2017)
Zhang, Y., Zhong, V., Chen, D., Angeli, G., Manning, C.D.: Position-aware attention and supervised data improve slot filling. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 35–45 (2017)

Acknowledgment

This work has been supported by the BMBF-funded EuroStars project PORQUE (01QE2056C), the European Union’s Horizon Europe research and innovation programme ENEXA (101070305), the Ministry of Culture and Science of North Rhine-Westphalia (MKW NRW) within the project SAIL (NW21-059D), and the University of Malakand, Pakistan.

Author information

Authors and Affiliations

DICE Group, Department of Computer Science, Paderborn University, Paderborn, Germany
Manzoor Ali, Muhammad Saleem, Diego Moussallem, Mohamed Ahmed Sherif & Axel-Cyrille Ngonga Ngomo

Corresponding author

Correspondence to Manzoor Ali.

Editor information

Editors and Affiliations

Universidade de Lisboa, Lisbon, Portugal
Catia Pesquita
University of London, London, UK
Ernesto Jimenez-Ruiz
Rensselaer Polytechnic Institute, Troy, NY, USA
Jamie McCusker
Universidade de Lisboa, Lisbon, Portugal
Daniel Faria
Fondazione Bruno Kessler, Povo, Trento, Italy
Mauro Dragoni
KU Leuven, Sint-Katelijne-Waver, Belgium
Anastasia Dimou
EURECOM, Biot, France
Raphael Troncy
University of Mannheim, Mannheim, Germany
Sven Hertling

About this paper

Cite this paper

Ali, M., Saleem, M., Moussallem, D., Sherif, M.A., Ngonga Ngomo, AC. (2023). RELD: A Knowledge Graph of Relation Extraction Datasets. In: Pesquita, C., et al. The Semantic Web. ESWC 2023. Lecture Notes in Computer Science, vol 13870. Springer, Cham. https://doi.org/10.1007/978-3-031-33455-9_20

DOI: https://doi.org/10.1007/978-3-031-33455-9_20
Published: 22 May 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33454-2
Online ISBN: 978-3-031-33455-9
eBook Packages: Computer Science, Computer Science (R0)
