Context-Compatible Information Fusion for Scientific Knowledge Graphs

  • Conference paper
Digital Libraries for Open Knowledge (TPDL 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12246)

Abstract

Augmenting document collections with entity-centric knowledge from knowledge graphs is a clearly visible trend, especially in scientific digital libraries. Entity facts are either manually curated or, for higher scalability, automatically harvested from large volumes of text documents. The often-claimed benefit is that collection-wide fact extraction combines information from huge numbers of documents into a single database. However, even if the extraction process were 100% correct, pervasive information fusion within retrieval tasks poses serious threats to the validity of the results. This is because important contextual information provided by each document is often lost in the process and cannot be readily restored at retrieval time. In this paper, we quantify the consequences of uncontrolled knowledge graph evolution in real-world scientific libraries using NLM’s PubMed corpus vs. the SemMedDB knowledge base. Moreover, we operationalise the notion of implicit context as a viable solution to gain a sense of context compatibility for all extracted facts, based on the pairwise coherence of all documents used for extraction: our derived measures for context compatibility determine which facts are relatively safe to combine, and they allow precision to be balanced against recall. Our practical experiments extensively evaluate context compatibility based on implicit contexts for typical digital library tasks. The results show that our implicit notion of context compatibility is superior to existing methods in terms of both simplicity and retrieval quality.
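
To make the idea of context-compatible fusion concrete, the following is a minimal sketch and not the measure defined in the paper. It assumes that every extracted fact keeps the identifier of its source document, and it approximates pairwise document coherence by the Jaccard overlap of hypothetical per-document descriptor sets. The names `Fact`, `compatible`, `fuse_paths` and all toy data are illustrative assumptions.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    doc_id: str  # provenance: the document the fact was extracted from


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compatible(f1: Fact, f2: Fact, doc_terms: dict, threshold: float) -> bool:
    """Assumed compatibility test: same source document, or source
    documents whose descriptor sets overlap at least `threshold`."""
    if f1.doc_id == f2.doc_id:
        return True
    return jaccard(doc_terms[f1.doc_id], doc_terms[f2.doc_id]) >= threshold


def fuse_paths(facts: list, doc_terms: dict, threshold: float) -> set:
    """Join facts a->b and b->c into chains (a, b, c), keeping only
    pairs of facts whose contexts are judged compatible."""
    chains = set()
    for f1, f2 in product(facts, repeat=2):
        if f1.obj == f2.subject and compatible(f1, f2, doc_terms, threshold):
            chains.add((f1.subject, f1.obj, f2.obj))
    return chains


# Invented toy data, purely for illustration.
doc_terms = {
    "d1": {"humans", "adult", "liver"},
    "d2": {"humans", "adult", "kidney"},
    "d3": {"mice", "in vitro"},
}
facts = [
    Fact("drugA", "inhibits", "geneX", "d1"),
    Fact("geneX", "causes", "diseaseY", "d2"),
    Fact("geneX", "causes", "diseaseZ", "d3"),
]

unrestricted = fuse_paths(facts, doc_terms, threshold=0.0)  # fuse everything
moderate = fuse_paths(facts, doc_terms, threshold=0.5)      # similar contexts only
strict = fuse_paths(facts, doc_terms, threshold=1.0)        # identical contexts only
```

In this sketch the threshold plays the role of the precision/recall dial mentioned above: a threshold of 0.0 reproduces unrestricted fusion, while higher values discard chains whose supporting documents share little context.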


Notes

  1. https://developers.google.com/knowledge-graph/

  2. https://www.drugbank.ca

  3. https://www.uniprot.org

  4. https://github.com/HermannKroll/ContextInformationFusion

  5. https://skr3.nlm.nih.gov/SemMedDB/

  6. https://www.nlm.nih.gov/databases/download/pubmed_medline.html

Author information

Corresponding author

Correspondence to Hermann Kroll.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Kroll, H., Kalo, JC., Nagel, D., Mennicke, S., Balke, WT. (2020). Context-Compatible Information Fusion for Scientific Knowledge Graphs. In: Hall, M., Merčun, T., Risse, T., Duchateau, F. (eds) Digital Libraries for Open Knowledge. TPDL 2020. Lecture Notes in Computer Science, vol 12246. Springer, Cham. https://doi.org/10.1007/978-3-030-54956-5_3

  • DOI: https://doi.org/10.1007/978-3-030-54956-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-54955-8

  • Online ISBN: 978-3-030-54956-5

  • eBook Packages: Computer Science, Computer Science (R0)
