Asynchronous Non-Blocking Algorithm to Handle Straggler Reduce Tasks in Hadoop System

Arwan A. Khoiruddin (1), Nordin Zakaria (2), Hitham Seddig Alhussian (3)
(1) High Performance Cloud Computing Center, Universiti Teknologi PETRONAS, Seri Iskandar, Perak, 32610, Malaysia
(2) High Performance Cloud Computing Center, Universiti Teknologi PETRONAS, Seri Iskandar, Perak, 32610, Malaysia
(3) High Performance Cloud Computing Center, Universiti Teknologi PETRONAS, Seri Iskandar, Perak, 32610, Malaysia
How to cite (IJASEIT):
Khoiruddin, Arwan A., et al. “Asynchronous Non-Blocking Algorithm to Handle Straggler Reduce Tasks in Hadoop System”. International Journal on Advanced Science, Engineering and Information Technology, vol. 10, no. 5, Oct. 2020, pp. 1913-9, doi:10.18517/ijaseit.10.5.9073.
Hadoop is widely adopted for big data processing because it can run on commodity hardware and complete jobs in a reasonable time. Hadoop implements asynchronous blocking concurrency using the Thread and Future classes. As a result, in cases such as a network link or hardware failure, a running task may block other tasks from running; such a task becomes a straggler. Hadoop releases include algorithms to handle the straggler task problem. However, these algorithms manage Map and Reduce tasks in the same way, although the root cause of a straggler may differ between the two. In this paper, the Asynchronous Non-Blocking (ANB) method is proposed to improve performance and avoid blocking of Reduce tasks in Hadoop. Instead of a single queue, our approach uses two queues: a task queue and a callback queue. When a task is not ready or is detected as a straggler, it is removed from the main task queue and temporarily placed in the callback queue. When the task is ready to run, it is sent back to the main task queue. The performance of the algorithm is compared with rTuner, the most recent work we found on handling stragglers among Reduce tasks. The comparison shows that ANB consistently completes faster because any unready task is moved directly to the callback queue without blocking other tasks. Furthermore, the overhead of rTuner is high because it must check the straggler status of a task and determine why the task became a straggler.
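
The two-queue idea described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation and not Hadoop code: `ReduceTask`, `ready_at`, `run_blocking`, and `run_two_queue` are hypothetical names, time is a simple tick counter, every task takes one tick to run, parking a task is assumed to cost nothing, and straggler detection is reduced to a readiness check.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ReduceTask:
    """Hypothetical stand-in for a Reduce task (not a Hadoop class)."""
    name: str
    ready_at: int  # simulated tick at which the task can start running

    def is_ready(self, now: int) -> bool:
        return now >= self.ready_at


def run_blocking(tasks):
    """Single FIFO queue: an unready task at the head blocks the rest."""
    now, completed = 0, []
    for task in tasks:
        now = max(now, task.ready_at)  # wait until the head task is ready
        completed.append(task.name)
        now += 1  # assumption: each task runs for one tick
    return completed, now


def run_two_queue(tasks):
    """Task queue plus callback queue: unready tasks are parked, not awaited."""
    task_queue = deque(tasks)
    callback_queue = deque()
    now, completed = 0, []
    while task_queue or callback_queue:
        # Send parked tasks that have become ready back to the task queue.
        for _ in range(len(callback_queue)):
            parked = callback_queue.popleft()
            if parked.is_ready(now):
                task_queue.append(parked)
            else:
                callback_queue.append(parked)
        if task_queue:
            task = task_queue.popleft()
            if not task.is_ready(now):
                # Park the unready task and try the next one in the same tick.
                callback_queue.append(task)
                continue
            completed.append(task.name)
        now += 1  # one tick passes, whether a task ran or the worker idled
    return completed, now
```

With three tasks where the first becomes ready only at tick 3, `[ReduceTask("A", 3), ReduceTask("B", 0), ReduceTask("C", 0)]`, `run_blocking` returns `(["A", "B", "C"], 6)` while `run_two_queue` returns `(["B", "C", "A"], 4)`: the ready tasks run while the unready one waits in the callback queue.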

D. Reinsel, J. Gantz, and J. Rydning, “The digitisation of the world: from edge to core,” IDC White Paper, 2018.

Y. Sun, Y. Shi, and Z. Zhang, “Finance Big Data: Management, Analysis, and Applications,” Int. J. Electron. Commer., vol. 23, pp. 9-11, 2019.

M. Nakagami, J. A. B. Fortes, and S. Yamaguchi, “Job-Aware Optimization of File Placement in Hadoop,” 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 2, pp. 664-669, 2019.

X. Luo and X. Fu, “Configuration optimisation method of Hadoop system performance based on genetic simulated annealing algorithm,” Cluster Computing, pp. 1-9, 2018.

P. Garraghan, X. Ouyang, R. Yang, D. McKee, and J. Xu, “Straggler Root-Cause and Impact Analysis for Massive-scale Virtualized Cloud Datacenters,” IEEE Transactions on Services Computing, vol. 12, pp. 91-104, 2019.

H.-G. Kim, “Effects of Design Factors of HDFS on I/O Performance,” J. Comput. Sci., vol. 14, pp. 304-309, 2018.

D. Choi, M. Jeon, N. Kim, and B.-D. Lee, “An Enhanced Data-Locality-Aware Task Scheduling Algorithm for Hadoop Applications,” IEEE Systems Journal, vol. 12, pp. 3346-3357, 2018.

X. Du, Y. Liu, and C. Zhao, “A Hadoop Yarn Scheduling Based on Node Computing Capability and Data Locality in Heterogeneous Environments,” 2018.

K. Midoun, W.-K. Hidouci, M. Loudini, and D. Belayadi, “RTSBL: Reduce Task Scheduling Based on the Load Balancing and the Data Locality in Hadoop,” 2018.

P. Zhang, C. Li, and Y. Zhao, “An Improved Task Scheduling Algorithm Based on Cache Locality and Data Locality in Hadoop,” 2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), pp. 244-249, 2016.

K. Kalia and N. Gupta, “A Review on Job Scheduling for Hadoop Mapreduce,” 2017 International Conference on Next Generation Computing and Information Systems (ICNGCIS), pp. 75-79, 2017.

A. Sharma and G. Singh, “A Review of Scheduling Algorithms in Hadoop,” 2020.

A. M. S. Lakshmi, N. S. Chandra, and M. BalRaju, “Optimised Capacity Scheduler for MapReduce Applications in Cloud Environments,” 2019.

H. Chen and D. Cui, “SLA-based Hadoop Capacity Scheduler Algorithm,” 2015.

J. A. Murali and T. Brindha, “Analysis of Scheduling Algorithms in Hadoop,” 2018.

J. V. Gautam, H. B. Prajapati, V. K. Dabhi, and S. Chaudhary, “Empirical Study of Job Scheduling Algorithms in Hadoop MapReduce,” Cybernetics and Information Technologies, vol. 17, pp. 146-163, 2017.

Y. Xu et al., “RAPID: Avoiding TCP Incast Throughput Collapse in Public Clouds With Intelligent Packet Discarding,” IEEE Journal on Selected Areas in Communications, vol. 37, pp. 1911-1923, 2019.

P. Pandey, S. Singh, and S. Singh, “Cloud computing,” in ICWET, 2010.

B. T. Rao, N. V. Sridevi, V. K. Reddy, and L. S. S. Reddy, “Performance Issues of Heterogeneous Hadoop Clusters in Cloud Computing,” ArXiv, vol. abs/1207.0894, 2012.

S. Shankland, “Google spotlights data center inner workings,” CNET. https://www.cnet.com/news/google-spotlights-data-center-inner-workings/ (accessed Jun. 07, 2020).

M. Liroz-Gistau, R. Akbarinia, D. Agrawal, and P. Valduriez, “FP-Hadoop: Efficient processing of skewed MapReduce jobs,” Information Systems, vol. 60, pp. 69-84, 2016.

R. Patgiri and R. Das, “rTuner: A Performance Enhancement of MapReduce Job,” in ICCMS 2018, 2018.

S. Ghemawat et al., “Performance Tuning and Scheduling of Large Data Set Analysis in Map Reduce Paradigm by Optimal Configuration using Hadoop,” 2019.

X. Hua, M. C. Huang, and P. Liu, “Hadoop Configuration Tuning with Ensemble Modeling and Metaheuristic Optimization,” IEEE Access, vol. 6, pp. 44161-44174, 2018.

M. A. Rahman, A. Hossen, J. Hossen, C. Venkataseshaiah, T. Bhuvaneswari, and A. Sultana, “Towards machine learning-based self-tuning of Hadoop-Spark system,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 15, p. 1076, 2019.

W. Wang, Y. Shi, X. Liu, Y. Feng, and N. Tao, “Hadoop Performance Tuning based on Parameter Optimization,” 2018.

Y. Guo, J. Rao, C. Jiang, and X. Zhou, “Moving Hadoop into the Cloud with Flexible Slot Management and Speculative Execution,” IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 3, pp. 798-812, 2017.

X. Huang, L. Zhang, R. Li, L. Wan, and K. Li, “Novel heuristic speculative execution strategies in heterogeneous distributed environments,” Computers & Electrical Engineering, vol. 50, pp. 166-179, 2016.

D. C. Vinutha and G. T. Raju, “Evolutionary Approach based Scheduler for Speculative Task Execution,” 2019 1st International Conference on Advances in Information Technology (ICAIT), pp. 485-490, 2019.

Q. Liu, W. Cai, J. Shen, Z. Fu, X. Liu, and N. Linge, “A speculative execution strategy based on node classification and hierarchy index mechanism for heterogeneous Hadoop systems,” 2017 19th International Conference on Advanced Communication Technology (ICACT), pp. 889-894, 2017.

M. Zaharia, A. Konwinski, A. D. Joseph, R. H. Katz, and I. Stoica, “Improving MapReduce Performance in Heterogeneous Environments,” in OSDI, 2008.

S. R. Pakize, “A Comprehensive View of Hadoop MapReduce Scheduling Algorithms,” 2014.

M. Beckert and R. Ernst, “Response time analysis for sporadic server-based budget scheduling in real time virtualisation environments,” ACM Transactions on Embedded Computing Systems (TECS), vol. 16, no. 5s, pp. 1-19, 2017.

F. Kaltenberger, C. Roux, M. Buczkowski, and M. Wewior, “The OpenAirInterface application programming interface for schedulers using Carrier Aggregation,” in 2016 International Symposium on Wireless Communication Systems (ISWCS), 2016, pp. 497-500.

M. Peuster, J. Kampmeyer, and H. Karl, “Containernet 2.0: A Rapid Prototyping Platform for Hybrid Service Function Chains,” 2018 4th IEEE Conference on Network Softwarization and Workshops (NetSoft), pp. 335-337, 2018.
