Study of an Improved Hadoop Speculative Execution Algorithm

Abstract:

Differences in node capability and unevenly distributed network bandwidth are widespread in heterogeneous cloud environments. Combined with the randomness of users' job submissions, these issues lead to server synchronization problems. On the Hadoop platform and under the conditions described above, we propose a method based on Hadoop's native speculative execution algorithm to address them. By monitoring load balance in real time, dynamically assessing node performance, and scheduling speculative tasks on a high-performance node that is also the node nearest to the input split, the algorithm effectively reduces network occupation and accelerates execution. Experimental results show that, with a high ratio of speculative tasks during execution, the method significantly improves the efficiency and throughput of the cluster.
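The abstract names the ingredients of the method but not its formulas. As a rough illustration only, the Python sketch below combines (a) the straggler test commonly described for Hadoop 0.20's native speculative scheduler (a task qualifies when its progress score falls more than 0.2 below the average of its category and it has run for at least one minute) with (b) a placement step in the spirit of the abstract: keep only free nodes whose performance score is at least the median, then pick the one nearest the task's input split. The performance score, the `load` and `avg_task_time_s` fields, the median cutoff, and all class and function names are hypothetical assumptions, not the paper's published algorithm; the 0/2/4 distance values follow Hadoop's node/rack/off-rack topology convention.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Optional

# Straggler thresholds as commonly described for Hadoop 0.20's native scheduler.
PROGRESS_GAP = 0.2     # progress must be this far below the category average
MIN_RUNTIME_S = 60.0   # task must have run at least one minute


@dataclass
class Task:
    task_id: str
    progress: float      # progress score in [0, 1]
    runtime_s: float     # seconds since the task started
    split_host: str      # host storing the task's input split
    split_rack: str      # rack of that host
    running_host: str    # host currently running the original attempt


@dataclass
class Node:
    host: str
    rack: str
    free_slots: int
    avg_task_time_s: float  # hypothetical: recent mean task duration on this node
    load: float             # hypothetical: current utilisation in [0, 1]


def speculation_candidates(tasks: list[Task]) -> list[Task]:
    """Native-style heuristic: progress well below average and running long enough."""
    if not tasks:
        return []
    avg = mean(t.progress for t in tasks)
    return [t for t in tasks
            if t.progress < avg - PROGRESS_GAP and t.runtime_s >= MIN_RUNTIME_S]


def performance(node: Node) -> float:
    """Hypothetical score: faster and less loaded nodes score higher."""
    return (1.0 / node.avg_task_time_s) * (1.0 - node.load)


def distance(node: Node, task: Task) -> int:
    """Hadoop-style topology distance: 0 data-local, 2 same rack, 4 off-rack."""
    if node.host == task.split_host:
        return 0
    if node.rack == task.split_rack:
        return 2
    return 4


def pick_backup_node(task: Task, nodes: list[Node]) -> Optional[Node]:
    """Among free, high-performance nodes, choose the one nearest the input split."""
    # Never place the backup attempt on the node already running the straggler.
    free = [n for n in nodes if n.free_slots > 0 and n.host != task.running_host]
    if not free:
        return None
    cutoff = median(performance(n) for n in free)
    fast = [n for n in free if performance(n) >= cutoff]
    # Nearest first; break ties in favour of the higher performance score.
    return min(fast, key=lambda n: (distance(n, task), -performance(n)))
```

For example, with three map tasks at progress 0.9, 0.85 and 0.3, only the 0.3 task falls below the threshold (an average of about 0.68 minus 0.2). If the data-local node for its split is heavily loaded and scores below the median, `pick_backup_node` passes over it in favour of a faster node on the same rack, which illustrates the trade-off between node performance and data locality that the abstract describes.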

Info:

Pages: 2281-2284

Online since: February 2014