Abstract
Scalable web systems are closely tied to the distributed storage systems used to process large amounts of data (big data). One example is Hadoop, together with its many extensions that support data storage, such as SQL-on-Hadoop systems and the “Parquet” file format. Another class of systems for storing and processing big data is NoSQL databases such as HBase, which are used in applications that require fast random access. The Kudu system was created to combine the advantages of Hadoop and HBase, enabling both efficient analysis of data sets and fast random access. This research analyzes the performance of these systems. The experiment was conducted in the Amazon Web Services public cloud, on a cluster of nine virtual machines. The test data was a fragment of the public “Wikipedia Page Traffic Statistics” dataset containing about a billion rows. The measurement results confirm that Kudu is a promising alternative to the commonly used technologies.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Oleś, D., Nowak, Z. (2019). The Performance Analysis of Distributed Storage Systems Used in Scalable Web Systems. In: Borzemski, L., Świątek, J., Wilimowska, Z. (eds) Information Systems Architecture and Technology: Proceedings of 39th International Conference on Information Systems Architecture and Technology – ISAT 2018. ISAT 2018. Advances in Intelligent Systems and Computing, vol 852. Springer, Cham. https://doi.org/10.1007/978-3-319-99981-4_27
DOI: https://doi.org/10.1007/978-3-319-99981-4_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99980-7
Online ISBN: 978-3-319-99981-4
eBook Packages: Intelligent Technologies and Robotics (R0)