Abstract
We address the problem of scalable distributed reasoning, proposing a technique for materialising the closure of an RDF graph based on MapReduce. We have implemented our approach on top of Hadoop and deployed it on a compute cluster of up to 64 commodity machines. We show that a naive implementation on top of MapReduce is straightforward but performs badly, and we present several non-trivial optimisations. Our algorithm is scalable and allows us to compute the RDFS closure of 865M triples from the Web (producing 30B triples) in less than two hours, faster than any other published approach.
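To make the idea of "materialising the closure based on MapReduce" concrete, the following is a minimal, single-process sketch of how one RDFS rule could be phrased as a map/shuffle/reduce join and repeated until a fixpoint is reached. It is an illustration under stated assumptions, not the authors' implementation: it covers only rule rdfs9 (`s rdf:type C`, `C rdfs:subClassOf D` ⇒ `s rdf:type D`), simulates the Hadoop shuffle with an in-memory dictionary, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `closure_rdfs9`) and prefixed IRIs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def map_phase(triples: Iterable[Triple]) -> List[Tuple[str, Tuple[str, str]]]:
    """Emit (join_key, tagged_value) pairs for rule rdfs9.

    rdfs9: (s rdf:type C) and (C rdfs:subClassOf D) => (s rdf:type D).
    Both antecedent patterns are keyed on the shared class C.
    """
    pairs = []
    for s, p, o in triples:
        if p == RDF_TYPE:
            pairs.append((o, ("instance", s)))  # key on the class C
        elif p == SUBCLASS:
            pairs.append((s, ("superclass", o)))  # key on the subclass C
    return pairs


def shuffle(pairs: List[Tuple[str, Tuple[str, str]]]) -> Dict[str, List[Tuple[str, str]]]:
    """Group values by key, standing in for the framework's shuffle/sort step."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[Tuple[str, str]]]) -> Set[Triple]:
    """Join instances and superclasses that share the same class key."""
    derived: Set[Triple] = set()
    for values in groups.values():
        instances = [v for tag, v in values if tag == "instance"]
        supers = [v for tag, v in values if tag == "superclass"]
        for s in instances:
            for d in supers:
                derived.add((s, RDF_TYPE, d))
    return derived


def closure_rdfs9(triples: Iterable[Triple]) -> Set[Triple]:
    """Repeat the map/shuffle/reduce job until no new triples are derived."""
    closure = set(triples)
    while True:
        new = reduce_phase(shuffle(map_phase(closure))) - closure
        if not new:
            return closure
        closure |= new
```

For a chain such as `ex:alice rdf:type ex:Student`, `ex:Student rdfs:subClassOf ex:Person`, `ex:Person rdfs:subClassOf ex:Agent`, the first round derives `ex:alice rdf:type ex:Person` and the second derives `ex:alice rdf:type ex:Agent`. Note that every round re-emits and re-joins the whole closure and re-derives triples that are already known; this sketch makes no attempt to avoid that redundant work.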
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Urbani, J., Kotoulas, S., Oren, E., van Harmelen, F. (2009). Scalable Distributed Reasoning Using MapReduce. In: Bernstein, A., et al. The Semantic Web - ISWC 2009. ISWC 2009. Lecture Notes in Computer Science, vol 5823. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04930-9_40
DOI: https://doi.org/10.1007/978-3-642-04930-9_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04929-3
Online ISBN: 978-3-642-04930-9
eBook Packages: Computer Science, Computer Science (R0)