Survey on improving the performance of MapReduce in Hadoop

Authors:
Nour-Eddine Bakni (LIMSAD Faculty of Sciences, Morocco)
Ismail Assayad (LIMSAD Faculty of Sciences and ENSEM, Hassan II University of Casablanca, Morocco)

NISS '21: Proceedings of the 4th International Conference on Networking, Information Systems & Security, April 2021, Article No.: 36, Pages 1–5, https://doi.org/10.1145/3454127.3456617

Published: 26 November 2021

ABSTRACT

Hadoop has become the most popular and most widely used platform for distributed data processing. It is open-source software that implements the MapReduce model for processing big data, and it occupies a large place in scientific research on big data. Much of that research has addressed allocation and scheduling in Hadoop systems. In this paper, we present the main research on improving the performance of the MapReduce model on the Hadoop platform. Most previous surveys focused only on Hadoop MapReduce scheduling and how to improve it; this paper instead gives an overview of the important work that aims to improve the performance of Hadoop MapReduce from several perspectives (energy, budget, scheduling, makespan, …).


Published in
NISS '21: Proceedings of the 4th International Conference on Networking, Information Systems & Security
April 2021
410 pages
ISBN: 9781450388719
DOI: 10.1145/3454127
Editors:
Pr. Ben Ahmed Mohamed (FSTT/UAE, Tangier, Morocco)
Pr. Boudhir Anouar Abdelhakim (FSTT/UAE, Tangier, Morocco)
Pr. Tomader Mazri (ENSAK ITU, Kenitra, Morocco)
Copyright © 2021 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Big data
Hadoop
Job Scheduling
MapReduce
Qualifiers
- research-article
- Research
- Refereed limited