
DLToDW: Transferring Relational and NoSQL Databases from a Data Lake

  • Original Research
  • SN Computer Science

Abstract

Over the past decade, digital transformation has driven the evolution of databases towards Big Data, and with it the need to collect and analyze data from many different sources. Traditional decision support systems cannot meet the growing needs of modern businesses to integrate and analyze such a wide variety of data. As a result, most organizations need to convert the data stored in their relational systems to NoSQL ("Not only SQL") systems, which rely on flexible models and schemas. Our work is part of a medical application that must allow health professionals to analyze complex data for decision making. We propose mechanisms to extract data from a Data Lake and store them in a NoSQL Data Warehouse. This will subsequently allow decision analyses that benefit from the features offered by NoSQL systems (rich data structures, query language, access performance). In this article, we present a process for ingesting data from a Data Lake into a Data Warehouse. The ingestion consists of three steps: first, transferring the relational and NoSQL databases extracted from the Data Lake into a single NoSQL database (the Data Warehouse); second, merging so-called "similar" classes; and third, converting the links into references between objects. To automate this process, we used Model Driven Architecture (MDA), which provides a schema transformation environment. Starting from the physical schemas that describe a Data Lake, we propose transformation rules that create a Data Warehouse stored in a document-oriented NoSQL system. An experiment was conducted on a medical application.
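The three ingestion steps named in the abstract can be illustrated with a small Python sketch that works on schemas only (no data). It is not the authors' MDA/QVT implementation: the classes `SourceClass` and `Collection`, the functions `similar` and `ingest`, and the similarity criterion (identical names ignoring case plus an attribute-overlap threshold) are hypothetical assumptions chosen for illustration, standing in for whatever matching criterion the article actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class SourceClass:
    """Hypothetical: a table (relational DB) or collection (NoSQL DB) described in the Data Lake."""
    name: str
    source_db: str
    attributes: set[str]
    # Link attribute -> name of the referenced class in the same source DB
    links: dict[str, str] = field(default_factory=dict)


@dataclass
class Collection:
    """Hypothetical: a class of the target document-oriented Data Warehouse."""
    name: str
    attributes: set[str] = field(default_factory=set)
    # Reference attribute -> name of the referenced target collection
    references: dict[str, str] = field(default_factory=dict)


def similar(a: SourceClass, b: SourceClass, threshold: float = 0.5) -> bool:
    """Assumed criterion: same name (ignoring case) and a Jaccard overlap
    of attribute names of at least `threshold`."""
    if a.name.lower() != b.name.lower():
        return False
    union = a.attributes | b.attributes
    if not union:
        return True
    return len(a.attributes & b.attributes) / len(union) >= threshold


def ingest(sources: list[SourceClass]) -> dict[str, Collection]:
    # Steps 1 and 2: gather every source class into one target schema,
    # grouping similar classes so that each group becomes one collection.
    groups: list[list[SourceClass]] = []
    for cls in sources:
        for group in groups:
            if similar(group[0], cls):
                group.append(cls)
                break
        else:
            groups.append([cls])

    warehouse: dict[str, Collection] = {}
    target_of: dict[tuple[str, str], str] = {}  # (source_db, class) -> collection
    for group in groups:
        base = name = group[0].name
        n = 1
        while name in warehouse:  # same name but not similar: disambiguate
            n += 1
            name = f"{base}_{n}"
        merged = Collection(name=name)
        for cls in group:
            merged.attributes |= cls.attributes  # union of the merged attributes
            target_of[(cls.source_db, cls.name)] = name
        warehouse[name] = merged

    # Step 3: turn each link into a reference to the (possibly merged) target collection.
    for cls in sources:
        target = warehouse[target_of[(cls.source_db, cls.name)]]
        for attr, ref in cls.links.items():
            target.attributes.discard(attr)
            target.references[attr] = target_of.get((cls.source_db, ref), ref)
    return warehouse


# Toy medical example (invented data): two "Patient" classes from different sources.
sources = [
    SourceClass("Patient", "MySQL", {"id", "name", "birth_date"}),
    SourceClass("Consultation", "MySQL", {"id", "date", "patient_id"},
                links={"patient_id": "Patient"}),
    SourceClass("patient", "MongoDB", {"id", "name", "address"}),
    SourceClass("Doctor", "PostgreSQL", {"id", "name", "specialty"}),
]
dw = ingest(sources)
for coll in dw.values():
    print(coll.name, sorted(coll.attributes), coll.references)
```

In this toy run, the MySQL `Patient` table and the MongoDB `patient` collection share half of their attributes, so they are merged into one `Patient` collection, and the foreign key `Consultation.patient_id` becomes a reference to it. A real MDA chain would express the same steps as model-to-model transformation rules rather than as imperative code.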


Figs. 1–9 (images not reproduced here).


Notes

  1. OrientDB https://orientdb.org/.

  2. Object Data Management Group http://www.odbms.org/odmg-standard/.

  3. The Object Management Group (OMG) https://www.omg.org.

  4. Query/View/Transform (QVT) https://www.omg.org/spec/QVT/1.2/PDF.

  5. Unified Modeling Language (UML) https://www.omg.org/spec/UML/2.5.1/About-UML/.

  6. The principles of exploitation of the ontology are not detailed in this article.

  7. ODL and OQL, J. Ullman, CS145 course notes (Fall 2004) http://www.odbms.org/wp-content/uploads/2013/11/001.04-Ullman-CS145-ODL-OQL-Fall-2004.pdf.

  8. MySQL https://www.mysql.com.

  9. PostgreSQL https://www.postgresql.org.

  10. MongoDB https://www.mongodb.com/.

  11. The Object Management Group (OMG) https://www.omg.org.

  12. Eclipse Modeling Framework (EMF) https://www.eclipse.org/modeling/emf.

  13. XML Metadata Interchange (XMI) https://www.omg.org/spec/XMI/2.5.1/About-XMI/.


Author information

Corresponding author

Correspondence to Rym Jemmali.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jemmali, R., Abdelhedi, F. & Zurfluh, G. DLToDW: Transferring Relational and NoSQL Databases from a Data Lake. SN COMPUT. SCI. 3, 381 (2022). https://doi.org/10.1007/s42979-022-01287-7

  • DOI: https://doi.org/10.1007/s42979-022-01287-7
