Integrating NoSQL, Relational Database, and the Hadoop Ecosystem in an Interdisciplinary Project involving Big Data and Credit Card Transactions

  • Conference paper
Information Technology - New Generations

Abstract

The project Big Data, Internet of Things, and Mobile Devices (in Portuguese, Banco de Dados, Internet das Coisas e Dispositivos Moveis, BDIC-DM) was carried out at the Brazilian Aeronautics Institute of Technology (ITA) in the first semester of 2015. It involved 60 graduate students over just 17 academic weeks. To provide some features of a real-time Online Transactional Processing (OLTP) system, the MySQL Relational Database Management System (RDBMS) was used alongside the NoSQL database Cassandra to store transaction data generated by a web portal and mobile applications. For batch data analysis, the Apache Hadoop Ecosystem was used for Online Analytical Processing (OLAP). Apache Sqoop exported data from the MySQL relational database to the Hadoop Distributed File System (HDFS), while Python scripts exported transaction data from the NoSQL database to HDFS. The main objective of the BDIC-DM project was to implement an e-commerce prototype system that manages credit card transactions involving large volumes of data, using different technologies. The tools covered the generation, storage, and consumption of Big Data. This paper describes the process of integrating NoSQL and relational databases with a Hadoop cluster during an academic project that used the Scrum agile method. In the end, processing time decreased significantly with appropriate tools and the available data. As future work, the investigation of other tools and datasets is suggested.
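
The abstract mentions Python scripts that moved transaction data from the NoSQL store to HDFS but gives no details. As a rough illustration only, the serialization step of such a script might look like the sketch below. The column names in `FIELDS`, the function `rows_to_csv`, and the sample row are all hypothetical and are not taken from the paper; in a real pipeline the rows would come from a Cassandra query and the resulting file would be copied into HDFS, for example with `hdfs dfs -put`.

```python
import csv
import io
from typing import Iterable, Mapping, TextIO

# Hypothetical column layout: the abstract does not describe the real schema.
FIELDS = ["transaction_id", "card_id", "amount", "timestamp"]


def rows_to_csv(rows: Iterable[Mapping[str, object]], out: TextIO) -> int:
    """Write each row as one header-less CSV line and return the row count.

    Header-less, delimited text is a common input layout for Hadoop tools
    that read files from HDFS, which is why no header line is written here.
    """
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for row in rows:
        # Emit the columns in a fixed order so every line has the same layout.
        writer.writerow([row[name] for name in FIELDS])
        count += 1
    return count


if __name__ == "__main__":
    # Stand-in for rows fetched from the NoSQL database (made-up values).
    sample = [
        {
            "transaction_id": 1,
            "card_id": "c-1",
            "amount": 10.5,
            "timestamp": "2015-03-01T12:00:00",
        },
    ]
    buffer = io.StringIO()
    written = rows_to_csv(sample, buffer)
    print(f"{written} row(s) serialized")
    print(buffer.getvalue(), end="")
```

Writing to any text stream keeps the sketch independent of where the file ends up: the same function can target a local file that is later uploaded to HDFS, or an in-memory buffer for testing.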

Acknowledgment

The authors would like to thank the Brazilian Aeronautics Institute of Technology (ITA) for its support and contribution during this project, and the 2RP Net enterprise and the SPOT Project for providing part of the hardware and software infrastructure used to develop this academic proof of concept.

Author information

Corresponding author

Correspondence to Romulo Alceu Rodrigues.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Rodrigues, R.A., Filho, L.A.L., Gonçalves, G.S., Mialaret, L.F.S., da Cunha, A.M., Dias, L.A.V. (2018). Integrating NoSQL, Relational Database, and the Hadoop Ecosystem in an Interdisciplinary Project involving Big Data and Credit Card Transactions. In: Latifi, S. (eds) Information Technology - New Generations. Advances in Intelligent Systems and Computing, vol 558. Springer, Cham. https://doi.org/10.1007/978-3-319-54978-1_57

  • DOI: https://doi.org/10.1007/978-3-319-54978-1_57

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54977-4

  • Online ISBN: 978-3-319-54978-1

  • eBook Packages: Engineering, Engineering (R0)
