ABSTRACT
Hadoop has become the most popular and most widely used platform for distributed data processing. It is open-source software that implements the MapReduce model for processing big data, and it has played a large part in scientific research in the field of big data. Much of that research has addressed allocation and scheduling in Hadoop systems. In this paper, we present the main research on improving the performance of the MapReduce model on the Hadoop platform. Most previous surveys focused only on Hadoop MapReduce scheduling and how to improve it; this paper instead gives an overview of the important work that aims to improve the performance of Hadoop MapReduce from several angles (energy, budget, scheduling, makespan, …).