RELD: A Knowledge Graph of Relation Extraction Datasets

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13870)

Abstract

Relation extraction plays an important role in natural language processing. There is a wide range of available datasets that benchmark existing relation extraction approaches. However, most benchmarking datasets are provided in different formats containing specific annotation rules, thus making it difficult to conduct experiments on different types of relation extraction approaches. We present RELD, an RDF knowledge graph of eight open-licensed and publicly available relation extraction datasets. We modeled the benchmarking datasets into a single ontology that provides a unified format for data access, along with annotations required for training different types of relation extraction systems. Moreover, RELD abides by the Linked Data principles. To the best of our knowledge, RELD is the largest RDF knowledge graph of entities and relations from text, containing \(\sim \)1230 million triples describing 1034 relations, 2 million sentences, 3 million abstracts and 4013 documents. RELD contributes to a variety of uses in the natural language processing community, and distinctly provides unified and easy modeling of data for benchmarking relation extraction and named entity recognition models.

Notes

  1. We excluded datasets that are not freely available (e.g., TACRED) from the current version of RELD. However, they can easily be included in the future.

  2. Detailed information about the schema (object properties, data properties, and classes) is available on the RELD homepage.

  3. Due to page limitations, some details and extra instances are omitted from Listing 1.2.

  4. We use the VoID vocabulary to describe the metadata of the dataset.

  5. We use NLTK [9] for tokenization, part-of-speech tagging, and punctuation.

  6. The complete details of the mapping process and the tools used are available in the tutorial: https://reld-tutorial.readthedocs.io/en/latest/tutorial.html.

References

  1. Agichtein, E., Gravano, L.: Snowball: extracting relations from large plain-text collections. In: Proceedings of the Fifth ACM Conference on Digital Libraries, pp. 85–94 (2000)

  2. Ali, M., Saleem, M., Ngomo, A.C.N.: REBench: microbenchmarking framework for relation extraction systems. In: Sattler, U., et al. (eds.) ISWC 2022. LNCS, vol. 13489, pp. 643–659. Springer, Heidelberg (2022). https://doi.org/10.1007/978-3-031-19433-7_37

  3. Batista, D.S., Martins, B., Silva, M.J.: Semi-supervised bootstrapping of relationship extractors with distributional semantics. In: Empirical Methods in Natural Language Processing. ACL (2015)

  4. Elsahar, H., et al.: T-REx: a large scale alignment of natural language with knowledge base triples. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) (2018)

  5. Gardent, C., Shimorina, A., Narayan, S., Perez-Beltrachini, L.: Creating training corpora for NLG micro-planners. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada, pp. 179–188. Association for Computational Linguistics (2017). https://doi.org/10.18653/v1/P17-1017. https://aclanthology.org/P17-1017

  6. Han, X., et al.: FewRel: a large-scale supervised few-shot relation classification dataset with state-of-the-art evaluation. In: EMNLP (2018)

  7. Hendrickx, I., et al.: SemEval-2010 task 8: multi-way classification of semantic relations between pairs of nominals. In: Proceedings of the 5th International Workshop on Semantic Evaluation, Uppsala, Sweden, pp. 33–38. Association for Computational Linguistics (2010). https://aclanthology.org/S10-1006

  8. Honnibal, M., Montani, I.: spaCy 2: natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. 7(1), 411–420 (2017, to appear)

  9. Loper, E., Bird, S.: NLTK: the natural language toolkit. arXiv preprint cs/0205028 (2002)

  10. Martinez-Rodriguez, J.L., Hogan, A., Lopez-Arevalo, I.: Information extraction meets the semantic web: a survey. Semant. Web 11(2), 255–335 (2020)

  11. Moreira, J., Oliveira, C., Macêdo, D., Zanchettin, C., Barbosa, L.: Distantly-supervised neural relation extraction with side information using BERT. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–7 (2020). https://doi.org/10.1109/IJCNN48605.2020.9206648

  12. Moussallem, D., Usbeck, R., Röder, M., Ngomo, A.C.N.: MAG: a multilingual, knowledge-base agnostic and deterministic entity linking approach. In: Proceedings of the Knowledge Capture Conference, pp. 1–8 (2017)

  13. Nadgeri, A., et al.: KGPool: dynamic knowledge graph context selection for relation extraction. In: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 535–548. Association for Computational Linguistics (2021). https://doi.org/10.18653/v1/2021.findings-acl.48. https://aclanthology.org/2021.findings-acl.48

  14. Ngonga Ngomo, A.C., et al.: LIMES - a framework for link discovery on the semantic web. KI-Künstliche Intelligenz, German Journal of Artificial Intelligence - Organ des Fachbereichs "Künstliche Intelligenz" der Gesellschaft für Informatik e.V. (2021). https://papers.dice-research.org/2021/KI_LIMES/public.pdf

  15. Ning, Q., Feng, Z., Roth, D.: A structured learning approach to temporal relation extraction. arXiv preprint arXiv:1906.04943 (2019)

  16. Orr, D.: 50,000 lessons on how to read: a relation extraction corpus. Online: Google Research Blog, vol. 11 (2013)

  17. Pawar, S., Palshikar, G.K., Bhattacharyya, P.: Relation extraction: a survey. arXiv preprint arXiv:1712.05191 (2017)

  18. Peng, Y., Yan, S., Lu, Z.: Transfer learning in biomedical natural language processing: an evaluation of BERT and ELMo on ten benchmarking datasets. arXiv preprint arXiv:1906.05474 (2019)

  19. Qu, M., Gao, T., Xhonneux, L.P., Tang, J.: Few-shot relation extraction via Bayesian meta-learning on relation graphs. In: Daume III, H., Singh, A. (eds.) Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 119, pp. 7867–7876. PMLR (2020). https://proceedings.mlr.press/v119/qu20a.html

  20. Ravi, S., Larochelle, H.: Optimization as a model for few-shot learning (2016)

  21. Riedel, S., Yao, L., McCallum, A., Marlin, B.M.: Relation extraction with matrix factorization and universal schemas. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 74–84 (2013)

  22. Sorokin, D., Gurevych, I.: Context-aware representations for knowledge base relation extraction. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, pp. 1784–1789. Association for Computational Linguistics (2017). https://doi.org/10.18653/v1/D17-1188. https://aclanthology.org/D17-1188

  23. Sui, D., Chen, Y., Liu, K., Zhao, J., Zeng, X., Liu, S.: Joint entity and relation extraction with set prediction networks. arXiv preprint arXiv:2011.01675 (2020)

  24. Surdeanu, M., Tibshirani, J., Nallapati, R., Manning, C.D.: Multi-instance multi-label learning for relation extraction. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 455–465 (2012)

  25. Tran, T.T., Le, P., Ananiadou, S.: Revisiting unsupervised relation extraction. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7498–7505. Association for Computational Linguistics (2020). https://www.aclweb.org/anthology/2020.acl-main.669

  26. Tran, T.T., Le, P., Ananiadou, S.: Revisiting unsupervised relation extraction (2020). https://doi.org/10.48550/ARXIV.2005.00087. https://arxiv.org/abs/2005.00087

  27. Walker, C., Strassel, S., Medero, J., Maeda, K.: ACE 2005 multilingual training corpus. Linguistic Data Consortium, Philadelphia, vol. 57, p. 45 (2006)

  28. Wang, Y., Yu, B., Zhang, Y., Liu, T., Zhu, H., Sun, L.: TPLinker: single-stage joint extraction of entities and relations through token pair linking. In: Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, pp. 1572–1582. International Committee on Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.coling-main.138. https://aclanthology.org/2020.coling-main.138

  29. Yao, Y., et al.: DocRED: a large-scale document-level relation extraction dataset. arXiv preprint arXiv:1906.06127 (2019)

  30. Yu, M., Yin, W., Hasan, K.S., Santos, C.d., Xiang, B., Zhou, B.: Improved neural relation detection for knowledge base question answering. arXiv preprint arXiv:1704.06194 (2017)

  31. Zhang, Y., Zhong, V., Chen, D., Angeli, G., Manning, C.D.: Position-aware attention and supervised data improve slot filling. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 35–45 (2017)

Acknowledgment

This work has been supported by the BMBF-funded EuroStars project PORQUE (01QE2056C), the European Union’s Horizon Europe research and innovation programme ENEXA (101070305), the Ministry of Culture and Science of North Rhine-Westphalia (MKW NRW) within the project SAIL (NW21-059D), and the University of Malakand, Pakistan.

Author information

Corresponding author

Correspondence to Manzoor Ali.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ali, M., Saleem, M., Moussallem, D., Sherif, M.A., Ngonga Ngomo, AC. (2023). RELD: A Knowledge Graph of Relation Extraction Datasets. In: Pesquita, C., et al. The Semantic Web. ESWC 2023. Lecture Notes in Computer Science, vol 13870. Springer, Cham. https://doi.org/10.1007/978-3-031-33455-9_20

  • DOI: https://doi.org/10.1007/978-3-031-33455-9_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-33454-2

  • Online ISBN: 978-3-031-33455-9

  • eBook Packages: Computer Science, Computer Science (R0)
