Managing Cloud-Based Big Data Platforms: A Reference Architecture and Cost Perspective

Heilig, Leonard; Voß, Stefan

doi:10.1007/978-3-319-45498-6_2

Abstract

The development of big data applications is closely linked to the availability of scalable and cost-effective computing capacities for storing and processing data in a distributed and parallel fashion, respectively. Cloud providers already offer a portfolio of cloud services that support big data applications, and large companies such as Netflix and Spotify use these services to operate their big data applications. In this chapter, we propose a generic reference architecture for implementing big data applications based on state-of-the-art cloud services. The applicability and implementation of our reference architecture are demonstrated for three leading cloud providers. Given these implementations, we analyze the main pricing schemes and cost factors to compare the respective cloud services based on a big data streaming use case. The derived findings are essential for cloud-based big data management from a cost perspective.
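At its simplest, a pay-per-use cost comparison of the kind described in the abstract sums quantity times unit price over the billable components of each provider's implementation and picks the lowest total. The Python sketch below illustrates only that arithmetic under simplifying assumptions: `PricedResource`, `monthly_cost`, and `cheapest` are hypothetical names, every component is assumed to be billed linearly per unit and billing period, and the providers and prices in the example are invented placeholders, not actual prices of AWS, Google Cloud, or Microsoft Azure.

```python
from dataclasses import dataclass


@dataclass
class PricedResource:
    """One billable component of a streaming pipeline (hypothetical model)."""
    name: str
    quantity: float    # e.g. VM instances, streaming units, or GB stored
    unit_price: float  # assumed linear price per unit and billing period


def monthly_cost(resources: list[PricedResource]) -> float:
    """Total cost of one provider's setup: sum of quantity * unit_price."""
    return sum(r.quantity * r.unit_price for r in resources)


def cheapest(offers: dict[str, list[PricedResource]]) -> tuple[str, float]:
    """Return the provider with the lowest total cost and that total."""
    totals = {provider: monthly_cost(res) for provider, res in offers.items()}
    best = min(totals, key=totals.get)
    return best, totals[best]


# Placeholder providers and prices, invented for illustration only.
offers = {
    "provider_a": [PricedResource("ingestion", 4, 10.0),
                   PricedResource("storage_gb", 100, 0.25)],
    "provider_b": [PricedResource("ingestion", 2, 25.0),
                   PricedResource("storage_gb", 100, 0.5)],
}
print(cheapest(offers))  # ('provider_a', 65.0)
```

Real pricing schemes may add tiered rates, minimum charges, or free quotas, which would replace the linear `quantity * unit_price` term in `monthly_cost`.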

Notes

1. A shared-nothing architecture denotes a distributed computing architecture consisting of nodes that only possess and use their own computing resources, including memory and disk storage. This facilitates, inter alia, large-scale horizontal scaling using commodity machines based on a distributed file system.
2. In a shared storage environment, a central file storage system is shared among the nodes.
3. Abbr. for Structured Query Language.
4. The technical documentation and pricing details can be found on the websites of the respective cloud providers: AWS (https://aws.amazon.com), Google Cloud (https://cloud.google.com), and Microsoft Azure (https://azure.microsoft.com/).
5. The Google Compute Engine Unit (GCEU) is used as a measure of the computing capacity of a virtual central processing unit (vCPU). Google's Compute Engine specifies the GCEU rating of each VM instance type depending on its number of vCPUs.
6. A streaming unit is a measure expressing computing capacity in terms of CPU and memory, with a maximum throughput of 1 MB/s.
7. Abbr. for Solid-State Drive.
8. Allows a direct transfer of data from Pub/Sub to BigQuery.
9. Abbr. for Application Programming Interface.
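Note 1 can be made concrete with a small in-process simulation: each node keeps its own records and aggregates only those, and the partial results are merged afterwards, so adding nodes spreads the data without any shared storage. Routing records by a hash of the key (`crc32` modulo the node count) is an assumption chosen for illustration, one common partitioning scheme rather than the method of any particular platform; `Node`, `partition`, and `distributed_sum` are hypothetical names.

```python
from collections import defaultdict
from collections.abc import Iterable
from zlib import crc32


class Node:
    """A simulated node that owns its data; nothing is shared with other nodes."""

    def __init__(self) -> None:
        self.records: list[tuple[str, int]] = []

    def local_sum(self) -> dict[str, int]:
        # Aggregate only the records stored on this node.
        totals: dict[str, int] = defaultdict(int)
        for key, value in self.records:
            totals[key] += value
        return dict(totals)


def partition(key: str, num_nodes: int) -> int:
    # crc32 is deterministic across runs, unlike Python's salted hash() for str.
    return crc32(key.encode("utf-8")) % num_nodes


def distributed_sum(records: Iterable[tuple[str, int]], num_nodes: int) -> dict[str, int]:
    nodes = [Node() for _ in range(num_nodes)]
    for key, value in records:
        nodes[partition(key, num_nodes)].records.append((key, value))
    # Every key lives on exactly one node, so the partial results are disjoint
    # and can be merged with a plain dict update.
    result: dict[str, int] = {}
    for node in nodes:
        result.update(node.local_sum())
    return result


print(distributed_sum([("a", 1), ("b", 2), ("a", 3)], num_nodes=3))
```

By contrast, in the shared storage setting of note 2, the nodes would read from one central store instead of holding their own `records`.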
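Note 6 implies a simple throughput-based lower bound on the number of streaming units: divide the expected input rate by the 1 MB/s maximum per unit and round up. The sketch below assumes throughput is the only constraint and that at least one unit is always provisioned; `min_streaming_units` is a hypothetical helper, and CPU- or memory-intensive queries may require more units than this bound suggests.

```python
import math

# Maximum throughput of one streaming unit, taken from note 6.
MAX_MB_PER_SECOND_PER_UNIT = 1.0


def min_streaming_units(throughput_mb_s: float) -> int:
    """Throughput-only lower bound on streaming units (assumes at least one unit)."""
    if throughput_mb_s < 0:
        raise ValueError("throughput must be non-negative")
    return max(1, math.ceil(throughput_mb_s / MAX_MB_PER_SECOND_PER_UNIT))


print(min_streaming_units(3.2))  # 4
```

For example, an input rate of 3.2 MB/s yields a lower bound of 4 streaming units.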

References

Assunção MD, Calheiros RN, Bianchi S, Netto MA, Buyya R (2015) Big data computing and clouds: trends and future directions. J Parallel Distrib Comput 79:3–15
AWS (2016) Big data analytics options on AWS. https://d0.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf
Chen M, Mao S, Liu Y (2014) Big data: a survey. Mob Netw Appl 19(2):171–209
Chen Y, Alspaugh S, Katz R (2012) Interactive analytical processing in big data systems: a cross-industry study of MapReduce workloads. Proc VLDB Endowment 5(12):1802–1813
Creeger M (2009) Cloud computing: an overview. ACM Queue 7(5):2
Gartner (2015) Magic quadrant for public cloud storage services, worldwide. http://www.gartner.com/technology/reprints.do?id=1-2IH2LGI&ct=150626&st=sb
Ghazal A, Rabl T, Hu M, Raab F, Poess M, Crolotte A, Jacobsen HA (2013) BigBench: towards an industry standard benchmark for big data analytics. In: Proceedings of the ACM SIGMOD international conference on management of data. ACM, New York, NY, USA, pp 1197–1208
Heilig L, Lalla-Ruiz E, Voß S (2016) A cloud brokerage approach for solving the resource management problem in multi-cloud environments. Comput Ind Eng 95:16–26
Heilig L, Voß S (2014) Decision analytics for cloud computing: a classification and literature review. In: Newman A, Leung J (eds) Tutorials in operations research—bridging data and decisions. INFORMS, San Francisco, pp 1–26
Heilig L, Voß S (2014) A scientometric analysis of cloud computing literature. IEEE Trans Cloud Comput 2(3):266–278
Jensen M, Schwenk J, Gruschka N, Iacono LL (2009) On technical security issues in cloud computing. In: Proceedings of the IEEE international conference on cloud computing (CLOUD). IEEE, Bangalore, India, pp 109–116
Krishnan S, Tse E (2013) Hadoop platform as a service in the cloud. Technical report, Netflix. http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html
LaValle S, Lesser E, Shockley R, Hopkins MS, Kruschwitz N (2011) Big data, analytics and the path from insights to value. MIT Sloan Manage Rev 52(2):21
Li M, Tan J, Wang Y, Zhang L, Salapura V (2015) SparkBench: a comprehensive benchmarking suite for in memory data analytic platform Spark. In: Proceedings of the 12th ACM international conference on computing frontiers (CF). ACM, Ischia, Italy, pp 53:1–53:8
Maravić I (2016) Spotify’s event delivery—the road to the cloud (part III). https://labs.spotify.com/2016/03/10/spotifys-event-delivery-the-road-to-the-cloud-part-iii/
Shvachko K, Kuang H, Radia S, Chansler R (2010) The Hadoop distributed file system. In: Proceedings of the 26th IEEE symposium on mass storage systems and technologies (MSST). Incline Village, NV, USA, pp 1–10
Talia D (2013) Clouds for scalable big data analytics. IEEE Comput 46(5):98–101
Tallon PP (2013) Corporate governance of big data: perspectives on value, risk, and cost. Computer 46(6):32–38

Author information

Authors and Affiliations

Institute of Information Systems (IWI), University of Hamburg, Hamburg, Germany
Leonard Heilig & Stefan Voß

Corresponding author

Correspondence to Leonard Heilig.

Editor information

Editors and Affiliations

ETSI Industriales de Ciudad Real, University of Castilla-La Mancha, Ciudad Real, Spain
Fausto Pedro García Márquez
Drexel University, Philadelphia, Pennsylvania, USA
Benjamin Lev

About this chapter

Cite this chapter

Heilig, L., Voß, S. (2017). Managing Cloud-Based Big Data Platforms: A Reference Architecture and Cost Perspective. In: García Márquez, F., Lev, B. (eds) Big Data Management. Springer, Cham. https://doi.org/10.1007/978-3-319-45498-6_2

DOI: https://doi.org/10.1007/978-3-319-45498-6_2
Published: 17 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45497-9
Online ISBN: 978-3-319-45498-6
eBook Packages: Business and Management, Business and Management (R0)