DOI: 10.1145/3366030.3366054
research-article

Uniform Access to Multiform Data Lakes using Semantic Technologies

Published: 22 February 2020

ABSTRACT

Increasing data volumes have greatly expanded application possibilities. However, accessing this data in an ad hoc manner remains an unsolved problem due to the diversity of data management approaches, formats and storage frameworks, which creates a need to effectively access and process distributed heterogeneous data at scale. For years, Semantic Web techniques have addressed data integration challenges with practical knowledge representation models and ontology-based mappings. Leveraging these techniques, we provide a solution enabling uniform access to large, heterogeneous data sources, without enforcing centralization, thus realizing the vision of a Semantic Data Lake. In this paper, we define the core concepts underlying this vision and the architectural requirements that systems implementing it need to fulfill. Squerall, an example of such a system, is an extensible framework built on top of state-of-the-art Big Data technologies. We focus on Squerall's distributed query execution techniques and strategies, empirically evaluating its performance throughout its various sub-phases.


            Published in

              iiWAS2019: Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services
              December 2019
              709 pages

              Copyright © 2019 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States



              Qualifiers

              • research-article
              • Research
              • Refereed limited
