Abstract
Knowledge graphs have been widely adopted in recent years, both general-purpose ones such as Wikidata and domain-specific ones such as UniProt. Their increasing size poses new challenges to their practical usage. As an example, Wikidata has grown steadily since its inception, which makes its data difficult to download and process. The structure of Wikidata items is flexible, and it tends to be heterogeneous: the shape of an entity representing a human is distinct from that of an entity representing a mountain. Recently, Wikidata adopted Entity Schemas to facilitate the definition of different schemas using Shape Expressions, a language that can be used to describe and validate RDF data. In this paper, we present an approach to obtain subsets of knowledge graphs based on Shape Expressions that uses an implementation of the Pregel algorithm written in Rust. We have applied our approach to obtain subsets of Wikidata and UniProt, and we present some results of these experiments.
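To make the idea concrete, the following is a minimal single-machine sketch of shape-based subsetting in the vertex-centric style that Pregel popularised. It is not the implementation described in the paper: the names `Triple`, `Constraint`, `send_messages`, `subset` and `example` are hypothetical, the shape is a flat list of triple constraints (a simplification of a non-recursive Shape Expression), and the two supersteps are emulated with ordinary loops rather than a distributed runtime. The sample triples use real Wikidata identifiers with simplified literal values.

```rust
use std::collections::{BTreeSet, HashMap};

/// A triple of the input graph: (subject, predicate, object).
type Triple = (&'static str, &'static str, &'static str);

/// One triple constraint of a flat (non-recursive) shape: the predicate must
/// be present and, if `value` is set, the object must be equal to it.
struct Constraint {
    predicate: &'static str,
    value: Option<&'static str>,
}

impl Constraint {
    /// True when an edge with this predicate and object satisfies the constraint.
    fn matches(&self, p: &str, o: &str) -> bool {
        self.predicate == p && self.value.map_or(true, |v| v == o)
    }
}

/// Superstep 1 (emulated): every edge checks itself against the shape and
/// sends the index of each constraint it satisfies to its subject vertex.
fn send_messages(
    triples: &[Triple],
    shape: &[Constraint],
) -> HashMap<&'static str, BTreeSet<usize>> {
    let mut inbox: HashMap<&'static str, BTreeSet<usize>> = HashMap::new();
    for &(s, p, o) in triples {
        for (i, c) in shape.iter().enumerate() {
            if c.matches(p, o) {
                inbox.entry(s).or_default().insert(i);
            }
        }
    }
    inbox
}

/// Superstep 2 (emulated): a vertex conforms when its inbox covers every
/// constraint. The subset keeps the shape-relevant triples of such vertices.
fn subset(triples: &[Triple], shape: &[Constraint]) -> Vec<Triple> {
    let inbox = send_messages(triples, shape);
    triples
        .iter()
        .copied()
        .filter(|&(s, p, o)| {
            let conforms = inbox.get(s).map_or(false, |seen| seen.len() == shape.len());
            conforms && shape.iter().any(|c| c.matches(p, o))
        })
        .collect()
}

/// Illustrative data: a tiny graph and a hypothetical "human" shape.
fn example() -> (Vec<Triple>, Vec<Constraint>) {
    let triples = vec![
        ("Q42", "P31", "Q5"),          // Douglas Adams, instance of, human
        ("Q42", "P569", "1952-03-11"), // date of birth
        ("Q42", "P106", "Q36180"),     // occupation (not part of the shape)
        ("Q513", "P31", "Q8502"),      // Mount Everest, instance of, mountain
        ("Q513", "P2044", "8848"),     // elevation (simplified literal)
    ];
    let human = vec![
        Constraint { predicate: "P31", value: Some("Q5") },
        Constraint { predicate: "P569", value: None },
    ];
    (triples, human)
}
```

Calling `subset(&triples, &human)` on the data returned by `example()` keeps only the two triples of `Q42` that the shape mentions. The mountain entity is dropped because none of its edges satisfies the `P31 = Q5` constraint, which mirrors the observation above that entities of different kinds have different shapes.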
Acknowledgements
This project has received funding from NumFOCUS, a non-profit organization promoting open-source scientific projects, and has been supported by the ANGLIRU project, funded by the Spanish Agency for Research. The opinions and arguments employed herein do not reflect the official views of these organizations.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Préstamo, Á.I., Gayo, J.E.L. (2023). Using Pregel to Create Knowledge Graphs Subsets Described by Non-recursive Shape Expressions. In: Ortiz-Rodriguez, F., Villazón-Terrazas, B., Tiwari, S., Bobed, C. (eds) Knowledge Graphs and Semantic Web. KGSWC 2023. Lecture Notes in Computer Science, vol 14382. Springer, Cham. https://doi.org/10.1007/978-3-031-47745-4_10
DOI: https://doi.org/10.1007/978-3-031-47745-4_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47744-7
Online ISBN: 978-3-031-47745-4
eBook Packages: Computer Science, Computer Science (R0)