Published June 5, 2023 | Version v1
Dataset Open

BioKG with attributes and decoupled benchmarks

  • 1. Vrije Universiteit Amsterdam

Description

BioKG is a biomedical knowledge graph containing relationships between proteins, molecules, diseases, and other entity types. It was originally proposed by Walsh et al. (2020) in "BioKG: A Knowledge Graph for Relational Learning On Biological Data".

We enrich this dataset with multimodal data associated with its biomedical entities:

  • Proteins: Protein embeddings computed with ProtTrans from amino acid sequences
  • Molecules: Molecule embeddings computed with MolTrans from SMILES representations
  • Diseases: Textual descriptions retrieved from MeSH

Furthermore, we decouple the benchmarks provided by Walsh et al. from the edges in the knowledge graph, which ensures that there is no direct data leakage between the benchmarks and the triples used to train link prediction models.
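The idea behind decoupling can be illustrated with a minimal sketch. The exact procedure used to build this dataset is not described here, so the following rests on an assumption: a triple is dropped from the graph whenever its head and tail entities form a pair that appears in a benchmark, in either direction. The function name `decouple` and the toy identifiers are hypothetical and do not come from the dataset.

```python
def decouple(triples, benchmark_pairs):
    """Return the triples whose (head, tail) pair is not in any benchmark.

    Assumption: pairs are matched regardless of direction, so (a, b) in a
    benchmark also blocks a triple going from b to a.
    """
    blocked = {frozenset(pair) for pair in benchmark_pairs}
    return [(h, r, t) for h, r, t in triples if frozenset((h, t)) not in blocked]


# Toy identifiers for illustration only; they are not taken from BioKG.
triples = [
    ("drug:A", "targets", "protein:X"),
    ("protein:X", "interacts_with", "protein:Y"),
    ("drug:B", "targets", "protein:Y"),
]
benchmark_pairs = [("drug:A", "protein:X")]

train_triples = decouple(triples, benchmark_pairs)
```

In this toy case, the triple linking `drug:A` and `protein:X` is removed, so a link prediction model trained on `train_triples` never sees the pair it is later evaluated on.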

Files

Files (2.4 GB)

Size: 2.4 GB
Checksum: md5:0d545cf722fa16bed4ed459fdeb489cb
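
After downloading, the archive can be checked against the md5 checksum listed above. The sketch below uses only Python's standard `hashlib` module and reads the file in chunks so that the 2.4 GB archive does not have to fit in memory. The helper names are hypothetical, and the file path is left to the caller because the file name is not given on this page.

```python
import hashlib

# Checksum as published on this record.
EXPECTED_MD5 = "0d545cf722fa16bed4ed459fdeb489cb"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path):
    """Return True if the file at `path` has the published md5 checksum."""
    return md5_of_file(path) == EXPECTED_MD5
```

Calling `matches_published_checksum` with the path of the downloaded archive returns `False` if the download is incomplete or corrupted.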