A Study of Shared-Scan and Locality-Aware Scheduling in the Hadoop Architecture

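In the Hadoop distributed computing architecture, the scheduling policy in use directly affects overall system performance. The default scheduling policy in Hadoop is first-in-first-out (FIFO), but FIFO does not consider that different tasks may require the same file, nor the performance degradation caused by transferring large files over the network. This study proposes the FSSL scheduling policy, which builds on FIFO, additionally considers shared-scan (demand sharing) and locality awareness, and introduces the required tuning parameters into the algorithm. The resulting policy is used to schedule tasks so as to reduce network load. Experimental results show that, compared with FIFO, the proposed FSSL scheduling policy further improves system performance in environments where most tasks require the same files or where the required files are large, with an average performance improvement of about 65%.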

Keywords

MapReduce; Shared-scan; Location-aware; Scheduling; Hadoop

Parallel Abstract

The choice of scheduling policy directly affects system performance in the Hadoop architecture. Hadoop's default scheduling policy is First-In-First-Out (FIFO). However, the FIFO scheduler simply schedules jobs according to their arrival time and does not consider other factors that may have a great impact on system performance. As a result, FIFO alone cannot achieve sufficiently good performance in Hadoop. In this paper, we propose a novel scheduling algorithm called FSSL (FIFO with Shared-Scan and Locality-aware). FSSL is a scheduling policy based on FIFO that takes into account the locality of required data and the probability of data sharing between jobs. Jobs that need the same data can thus be gathered and batch processed, which reduces the overhead of transferring data between data nodes and computation nodes. The results show that the FSSL scheduling policy improves system performance by about 65% compared with the FIFO scheduling policy.
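The idea of a FIFO baseline adjusted by data sharing and locality can be illustrated with a minimal sketch. This is not the FSSL algorithm itself, whose scoring rule and parameters are not given here. The `Job` type, the scoring formula, the weights `alpha` and `beta`, and the single-node, one-file-per-job model are all assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    arrival: int      # smaller means the job arrived earlier
    input_file: str   # the single file this job scans (simplifying assumption)


def pick_next(queue, local_files, alpha=1.0, beta=1.0):
    """Return the queued job with the lowest score for one free node.

    Hypothetical score (not taken from the thesis):
      arrival - alpha * (other queued jobs needing the same file)
              - beta  * (1 if the file is already on the node, else 0)
    With alpha = beta = 0 this reduces to plain FIFO.
    """
    def score(job):
        sharers = sum(1 for other in queue
                      if other is not job and other.input_file == job.input_file)
        is_local = 1 if job.input_file in local_files else 0
        return job.arrival - alpha * sharers - beta * is_local

    # Ties fall back to arrival order so the FIFO baseline is preserved.
    return min(queue, key=lambda job: (score(job), job.arrival))


def schedule(jobs, local_files, alpha=1.0, beta=1.0):
    """Drain the queue for a single node and return job names in run order."""
    queue = list(jobs)
    cached = set(local_files)
    order = []
    while queue:
        job = pick_next(queue, cached, alpha, beta)
        queue.remove(job)
        cached.add(job.input_file)  # assume the file stays on the node after use
        order.append(job.name)
    return order
```

For four jobs arriving in the order a, b, c, d, where a and c read one file and b and d read another, setting both weights to zero yields the FIFO order a, b, c, d, while `alpha=1, beta=3` runs c directly after a so that the file fetched for a is reused before it is displaced.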

Parallel Keywords

MapReduce; Shared-scan; Location-aware; Scheduling; Hadoop
