Managing Cloud-Based Big Data Platforms: A Reference Architecture and Cost Perspective

Big Data Management

Abstract

The development of big data applications is closely linked to the availability of scalable and cost-effective computing capacities for storing and processing data in a distributed and parallel fashion, respectively. Cloud providers already offer a portfolio of cloud services for supporting big data applications. Large companies like Netflix and Spotify use those cloud services to operate their big data applications. In this chapter, we propose a generic reference architecture for implementing big data applications based on state-of-the-art cloud services. The applicability and implementation of our reference architecture are demonstrated for three leading cloud providers. Given these implementations, we analyze the main pricing schemes and cost factors to compare the respective cloud services based on a big data streaming use case. The derived findings are essential for cloud-based big data management from a cost perspective.

Notes

  1. A shared nothing architecture denotes a distributed computing architecture consisting of nodes that only possess and utilize their own computing resources, including memory and disk storage. This facilitates, inter alia, large-scale horizontal scaling using commodity machines based on a distributed file system.

  2. In a shared storage environment, a central file storage system is shared among the nodes.

  3. Abbr. for Structured Query Language.

  4. The technical documentation and pricing details can be found on the websites of the respective cloud providers: AWS (https://aws.amazon.com), Google Cloud (https://cloud.google.com), and Microsoft Azure (https://azure.microsoft.com/).

  5. The Google Compute Engine Unit (GCEU) is used as a measure to calculate the total capacity of a virtual central processing unit (vCPU). Google’s Compute Engine defines the GCEU for each VM instance type depending on the number of vCPUs.

  6. A streaming unit is a measure for expressing the computing capacity in terms of CPU and memory with a maximum throughput of 1 MB/s.

  7. Abbr. for Solid-State Drive.

  8. Allows a direct transfer of data from Pub/Sub to BigQuery.

  9. Abbr. for Application Programming Interface.
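
As an illustration of note 6, the following minimal sketch estimates how many streaming units a streaming workload would need if each unit sustains at most 1 MB/s, as the note states. The function name, the workload figures, and the conversion 1 MB = 1000 KB are assumptions made for this example only; they are not taken from the chapter or from any provider's documentation.

```python
import math

# From note 6: one streaming unit has a maximum throughput of 1 MB/s.
MAX_THROUGHPUT_MB_PER_S_PER_UNIT = 1.0


def required_streaming_units(events_per_second: float, avg_event_size_kb: float) -> int:
    """Estimate the smallest number of streaming units that covers a workload.

    Hypothetical helper: both parameters are illustrative workload figures.
    """
    if events_per_second < 0 or avg_event_size_kb < 0:
        raise ValueError("workload figures must be non-negative")
    # Assumption: 1 MB = 1000 KB.
    throughput_mb_per_s = events_per_second * avg_event_size_kb / 1000.0
    units = math.ceil(throughput_mb_per_s / MAX_THROUGHPUT_MB_PER_S_PER_UNIT)
    # At least one unit is assumed to be provisioned even for an idle stream.
    return max(1, units)
```

For example, a hypothetical stream of 5000 events per second at 0.5 KB per event amounts to 2.5 MB/s, so this estimate rounds up to three streaming units. Actual sizing also depends on query complexity and memory use, which the sketch ignores.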

Author information

Corresponding author

Correspondence to Leonard Heilig.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Heilig, L., Voß, S. (2017). Managing Cloud-Based Big Data Platforms: A Reference Architecture and Cost Perspective. In: García Márquez, F., Lev, B. (eds) Big Data Management. Springer, Cham. https://doi.org/10.1007/978-3-319-45498-6_2
