Abstract
The project entitled Big Data, Internet of Things, and Mobile Devices (in Portuguese, Banco de Dados, Internet das Coisas e Dispositivos Móveis, BDIC-DM) was carried out at the Brazilian Aeronautics Institute of Technology (ITA) in the first semester of 2015. It involved 60 graduate students over just 17 academic weeks. As a starting point for some features of a real-time Online Transaction Processing (OLTP) system, the relational database management system (RDBMS) MySQL was used together with the NoSQL database Cassandra to store transaction data generated by a web portal and mobile applications. For batch data analysis, the Apache Hadoop ecosystem was used for Online Analytical Processing (OLAP). An infrastructure based on the Apache Sqoop tool transferred data from the MySQL relational database to the Hadoop Distributed File System (HDFS), while Python scripts exported transaction data from the NoSQL database to HDFS. The main objective of the BDIC-DM project was to implement, with different technologies, a prototype e-commerce system for managing credit card transactions involving large volumes of data. The tools used covered the generation, storage, and consumption of Big Data. This paper describes the process of integrating NoSQL and relational databases with a Hadoop cluster during an academic project that followed the Scrum agile method. By the end of the project, processing time had decreased significantly through the use of appropriate tools and the available data. As future work, the investigation of other tools and datasets is suggested.
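The abstract does not give the Python export scripts, the Cassandra schema, or the HDFS layout. With those details unknown, the following is only a minimal sketch of one plausible formatting step. It turns transaction rows into tab-separated text lines that a Hive table over HDFS could read. The rows are assumed to be plain dicts, as the DataStax Python driver returns when `session.row_factory = dict_factory` is set. The column names in `COLUMNS` and the helper names `to_field`, `rows_to_lines`, and `write_export` are hypothetical.

```python
from datetime import datetime
from typing import Iterable, Iterator, Mapping, Sequence

# Hypothetical column layout for a credit card transaction table; the
# project's actual Cassandra schema is not described in the abstract.
COLUMNS = ("transaction_id", "card_number", "merchant", "amount", "created_at")


def to_field(value: object, sep: str = "\t") -> str:
    """Render one value as a text field for a delimited HDFS file."""
    if value is None:
        return r"\N"  # default NULL marker of Hive's delimited text format
    if isinstance(value, datetime):
        return value.strftime("%Y-%m-%d %H:%M:%S")  # Hive TIMESTAMP text form
    text = str(value)
    # Replace characters that would break the field/line structure.
    for ch in (sep, "\n", "\r"):
        text = text.replace(ch, " ")
    return text


def rows_to_lines(
    rows: Iterable[Mapping[str, object]],
    columns: Sequence[str] = COLUMNS,
    sep: str = "\t",
) -> Iterator[str]:
    """Yield one delimited line per row; missing columns become NULL."""
    for row in rows:
        yield sep.join(to_field(row.get(col), sep) for col in columns) + "\n"


def write_export(
    rows: Iterable[Mapping[str, object]],
    path: str,
    columns: Sequence[str] = COLUMNS,
    sep: str = "\t",
) -> int:
    """Write rows to a local text file and return the number of lines written."""
    count = 0
    with open(path, "w", encoding="utf-8", newline="") as out:
        for line in rows_to_lines(rows, columns, sep):
            out.write(line)
            count += 1
    return count
```

The resulting local file could then be copied into HDFS with `hdfs dfs -put transactions.tsv <hdfs-dir>`, where the file name and target directory are placeholders. Streaming rows through a generator keeps memory use flat for large exports. The MySQL side, by contrast, was handled by Sqoop and would need no custom script of this kind.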
Acknowledgment
The authors would like to thank the Brazilian Aeronautics Institute of Technology (ITA) for its support and contribution during this project, as well as the 2RP Net enterprise and the SPOT Project for providing part of the hardware and software infrastructure used to develop this academic proof of concept.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Rodrigues, R.A., Filho, L.A.L., Gonçalves, G.S., Mialaret, L.F.S., da Cunha, A.M., Dias, L.A.V. (2018). Integrating NoSQL, Relational Database, and the Hadoop Ecosystem in an Interdisciplinary Project involving Big Data and Credit Card Transactions. In: Latifi, S. (ed.) Information Technology - New Generations. Advances in Intelligent Systems and Computing, vol 558. Springer, Cham. https://doi.org/10.1007/978-3-319-54978-1_57
DOI: https://doi.org/10.1007/978-3-319-54978-1_57
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54977-4
Online ISBN: 978-3-319-54978-1
eBook Packages: Engineering, Engineering (R0)