Abstract
The development of big data applications is closely linked to the availability of scalable and cost-effective computing capacities for storing and processing data in a distributed and parallel fashion, respectively. Cloud providers already offer a portfolio of cloud services for supporting big data applications. Large companies like Netflix and Spotify use those cloud services to operate their big data applications. In this chapter, we propose a generic reference architecture for implementing big data applications based on state-of-the-art cloud services. The applicability and implementation of our reference architecture are demonstrated for three leading cloud providers. Given these implementations, we analyze the main pricing schemes and cost factors to compare the respective cloud services based on a big data streaming use case. The derived findings are essential for cloud-based big data management from a cost perspective.
Notes
1. A shared-nothing architecture denotes a distributed computing architecture consisting of nodes that only possess and utilize their own computing resources, including memory and disk storage. This facilitates, inter alia, large-scale horizontal scaling using commodity machines based on a distributed file system.
2. In a shared storage environment, a central file storage system is shared among the nodes.
3. Abbr. for Structured Query Language.
4. The technical documentation and pricing details can be found on the websites of the respective cloud providers: AWS (https://aws.amazon.com), Google Cloud (https://cloud.google.com), and Microsoft Azure (https://azure.microsoft.com/).
5. The Google Compute Engine Unit (GCEU) is used as a measure to calculate the total capacity of a virtual central processing unit (vCPU). Google’s Compute Engine defines the GCEU for each VM instance type depending on the number of vCPUs.
6. A streaming unit is a measure for expressing the computing capacity in terms of CPU and memory with a maximum throughput of 1 MB/s.
7. Abbr. for Solid-State Drive.
8. Allows a direct transfer of data from Pub/Sub to BigQuery.
9. Abbr. for Application Programming Interface.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Heilig, L., Voß, S. (2017). Managing Cloud-Based Big Data Platforms: A Reference Architecture and Cost Perspective. In: García Márquez, F., Lev, B. (eds) Big Data Management. Springer, Cham. https://doi.org/10.1007/978-3-319-45498-6_2
DOI: https://doi.org/10.1007/978-3-319-45498-6_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45497-9
Online ISBN: 978-3-319-45498-6
eBook Packages: Business and Management (R0)