DOI: 10.1145/3216122.3216130
research-article

Modeling Data Lake Metadata with a Data Vault

Published: 18 June 2018

ABSTRACT

With the rise of big data, business intelligence had to find solutions for managing even greater data volumes and variety than data warehouses, which proved ill-adapted, could handle. Data lakes answer these needs from a storage point of view, but require managing adequate metadata to guarantee efficient access to data. Starting from a multidimensional metadata model designed for an industrial heritage data lake, a model whose schema is hard to evolve, we propose in this paper to use ensemble modeling, and more precisely a data vault, to address this issue. To illustrate the feasibility of this approach, we instantiate our conceptual metadata model into both relational and document-oriented logical and physical models. We also compare the physical models in terms of metadata storage and query response time.
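To make the idea of a data vault for metadata more concrete, here is a minimal sketch of a relational instantiation. It is not the paper's actual model: the table names (`hub_document`, `hub_keyword`, `link_document_keyword`, `sat_document_description`, `sat_document_location`), their columns, the MD5-based `hash_key` helper, and the sample values are all assumptions made up for illustration, and SQLite is used only because it ships with Python. The sketch shows the three data vault building blocks (hubs for business keys, links for relationships, satellites for descriptive attributes) and how a new kind of metadata can be added as a new satellite without altering existing tables.

```python
import hashlib
import sqlite3


def hash_key(*business_keys: str) -> str:
    """Derive a deterministic surrogate key from one or more business keys."""
    return hashlib.md5("|".join(business_keys).encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hubs: one row per business key, nothing descriptive.
CREATE TABLE hub_document (
    doc_hk     TEXT PRIMARY KEY,
    doc_bk     TEXT NOT NULL UNIQUE,
    load_date  TEXT NOT NULL,
    record_src TEXT NOT NULL
);
CREATE TABLE hub_keyword (
    kw_hk      TEXT PRIMARY KEY,
    kw_bk      TEXT NOT NULL UNIQUE,
    load_date  TEXT NOT NULL,
    record_src TEXT NOT NULL
);
-- Link: a many-to-many relationship between two hubs.
CREATE TABLE link_document_keyword (
    link_hk    TEXT PRIMARY KEY,
    doc_hk     TEXT NOT NULL REFERENCES hub_document(doc_hk),
    kw_hk      TEXT NOT NULL REFERENCES hub_keyword(kw_hk),
    load_date  TEXT NOT NULL,
    record_src TEXT NOT NULL
);
-- Satellite: descriptive attributes of a hub, historized by load date.
CREATE TABLE sat_document_description (
    doc_hk     TEXT NOT NULL REFERENCES hub_document(doc_hk),
    load_date  TEXT NOT NULL,
    title      TEXT,
    language   TEXT,
    PRIMARY KEY (doc_hk, load_date)
);
""")

# Made-up sample metadata for one document and one keyword.
doc_hk = hash_key("doc-001")
kw_hk = hash_key("textile")
conn.execute("INSERT INTO hub_document VALUES (?, ?, ?, ?)",
             (doc_hk, "doc-001", "2018-06-18", "demo"))
conn.execute("INSERT INTO hub_keyword VALUES (?, ?, ?, ?)",
             (kw_hk, "textile", "2018-06-18", "demo"))
conn.execute("INSERT INTO link_document_keyword VALUES (?, ?, ?, ?, ?)",
             (hash_key("doc-001", "textile"), doc_hk, kw_hk, "2018-06-18", "demo"))
conn.execute("INSERT INTO sat_document_description VALUES (?, ?, ?, ?)",
             (doc_hk, "2018-06-18", "Mill archive", "fr"))

# Schema evolution: a new kind of metadata becomes a new satellite.
# No existing table is altered and no existing row is rewritten.
conn.execute("""
CREATE TABLE sat_document_location (
    doc_hk    TEXT NOT NULL REFERENCES hub_document(doc_hk),
    load_date TEXT NOT NULL,
    place     TEXT,
    PRIMARY KEY (doc_hk, load_date)
)
""")
conn.execute("INSERT INTO sat_document_location VALUES (?, ?, ?)",
             (doc_hk, "2018-06-19", "Lille"))

# Reassemble the metadata of each document by joining through the hub.
rows = conn.execute("""
SELECT h.doc_bk, d.title, k.kw_bk, l.place
FROM hub_document h
JOIN sat_document_description d ON d.doc_hk = h.doc_hk
JOIN link_document_keyword lk   ON lk.doc_hk = h.doc_hk
JOIN hub_keyword k              ON k.kw_hk = lk.kw_hk
LEFT JOIN sat_document_location l ON l.doc_hk = h.doc_hk
""").fetchall()
print(rows)
```

The trade-off this sketch hints at is the one the abstract evaluates: the vault absorbs schema changes cheaply, but reading metadata back requires several joins, which is why storage size and query response time are worth measuring on each physical model.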


Published in

IDEAS '18: Proceedings of the 22nd International Database Engineering & Applications Symposium
June 2018
328 pages
ISBN: 9781450365277
DOI: 10.1145/3216122

Copyright © 2018 ACM

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

Overall acceptance rate: 74 of 210 submissions, 35%
