DOI: 10.1145/3454127.3456617
research-article

Survey on improving the performance of MapReduce in Hadoop

Published: 26 November 2021

ABSTRACT

Hadoop has become the most popular and most widely used platform for distributed data processing. It is open-source software that implements the MapReduce model for processing big data, and it has played a large part in scientific research in the field of big data. Much of that research has addressed allocation and scheduling in Hadoop systems. In this paper we present the main research on improving the performance of the MapReduce model on the Hadoop platform. Most previous surveys focused only on Hadoop MapReduce scheduling and how to improve it, whereas this paper gives an overview of the important work that aims to improve the performance of Hadoop MapReduce from different angles (energy, budget, scheduling, makespan, …).

  • Published in

    NISS '21: Proceedings of the 4th International Conference on Networking, Information Systems & Security
    April 2021
    410 pages
    ISBN:9781450388719
    DOI:10.1145/3454127

    Copyright © 2021 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited
