Skip to main content
Log in

A novel real-time scheduling algorithm and performance analysis of a MapReduce-based cloud

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

MapReduce, a popular programming model for processing data-intensive tasks, has achieved great success in a wide range of applications such as search indexing, social network mining, collaborative recommendation, and spam detection. However, the ability of MapReduce is limited in two respects by its default schedulers. First, it does not support concurrent services sharing a cloud datacenter and second, it fails to guarantee response time for deadline-constrained services. This paper proposes the Paused Rate Monotonic (PRM) algorithm for scheduling hard real-time tasks on a MapReduce-based cloud. The scheduling performance is analyzed theoretically. We prove a bound on cluster utilization, which can be used as a sufficient condition to test whether a given task set can be scheduled. Both the theoretical analysis and experimental evaluation show that the PRM algorithm outperforms traditional real-time ones by improving the probability that a real-time task set can be scheduled on a MapReduce-based cloud.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13

Similar content being viewed by others

References

  1. Berliska J, Drozdowski M (2011) Scheduling divisible MapReduce computations. J Paral Distr Comput 71(3):450–459

    Article  Google Scholar 

  2. Bini E, Buttazzo GC (2005) Measuring the performance of schedulability tests. Real Time Syst 30(1–2):129–154

    Article  MATH  Google Scholar 

  3. Chang H, Kodialam M, Kompella R, Lakshman TV, Lee M, Mukherjee S (2011) Scheduling in mapreduce-like systems for fast completion time. In: Proceedings of the IEEE International Conference on Computer Communications, INFOCOM’11, pp 3074–3082

  4. Chen F, Kodialam M, Lakshman T (2012) Joint scheduling of processing and shuffle phases in mapreduce systems. In: Proceedings of the IEEE International Conference on Computer Communications, INFOCOM’12, pp 1143–1151

  5. Chen Q, Guo M, Deng Q, Zheng L, Guo S, Shen Y (2011) Hat: history-based auto-tuning mapreduce in heterogeneous environments. J Supercomput 64(3): 1038–1054

    Google Scholar 

  6. Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113

    Article  Google Scholar 

  7. Dong X, Wang Y, Liao H (2011) Scheduling mixed real-time and non-real-time applications in mapreduce environment. In: Proceedings of the IEEE 17th International Conference on Parallel and Distributed Systems, ICPADS’11, pp 9–16

  8. Ferguson A, Bodik P, Kandula S, Boutin E, Fonseca R (2012) Jockey: guaranteed job latency in data parallel clusters. In: Proceedings of the 7th ACM European Conference on Computer Systems, Eurosys’12, pp 99–112

  9. He C, Lu Y, Swanson D (2013) Real-time scheduling in mapreduce clusters. In: Proceedings of IEEE the 15th International Conference on High Performance Computing and Communications, HPCC’13, pp 1536–1544

  10. Herodotou H, Babu S (2013) A what-if engine for cost-based MapReduce optimization. IEEE Data Eng Bull 36(1):5–14

    Google Scholar 

  11. Kc K and Anyanwu K (2010). Scheduling hadoop jobs to meet deadlines. In: Proceeding of IEEE Second International Conference on Cloud Computing Technology and Science, CloudCom’10, pp 388–392

  12. Lee K-H, Lee Y-J, Choi H, Chung YD, Moon B (2012) Parallel data processing with MapReduce: a survey. SIGMOD Record 40(4):11–20

    Article  Google Scholar 

  13. Liu CL, Layland JW (1973) Scheduling algorithms for multiprogramming in a hard real-time environment. J Assoc Comput Mach 20(1):46–61

    Article  MATH  MathSciNet  Google Scholar 

  14. Morton K, Balazinska M, Grossman D (2010) Paratimer: a progress indicator for mapreduce dags. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD ’10, pp 507–518

  15. Moseley B, Dasgupta A, Kumar R, Sarlós T (2011) On scheduling in map-reduce and flow-shops. In: Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’11, pp 289–298

  16. Myung J, Lee SG (2013) Exploiting inter-operation parallelism for matrix chain multiplication using MapReduce. J Supercomput 66(1):594–609

    Google Scholar 

  17. Phan LT, Zhang Z, Loo BT, Lee I (2010) Real time MapReduce scheduling. Computer and Information Science, University of Pennsylvania, Pennsylvania

    Google Scholar 

  18. Polo J, Carrera D, Becerra Y, Torres J, Ayguade E, Steinder M, Whalley I (2010). Performance-driven task co-scheduling for mapreduce environments. In: Proceeding of 11th IEEE Network Operations and Management Symposium, NOMS’11, pp 373–380

  19. Sandholm T and Lai K (2010) Dynamic proportional share scheduling in hadoop. In: Proceedings of the 15th International Conference on Job scheduling Strategies for Parallel Processing, JSSPP’10, pp 110–131

  20. Serrano D, Bouchenak S, Kouki Y, Ledoux T, Lejeune J, Sopena J, Arantes L, Sens P et al. (2013) Towards qos-oriented sla guarantees for online cloud services. In: Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid, Computing, CCGRID2013, pp 10–18

  21. Teng F, Yang H, Li T, Yang Y, Li Z (2013) Scheduling real-time workflow on mapreduce-based cloud. In: Proceedings of the Third International Conference on Innovative Computing Technology, INTECH’13, pp 117–122

  22. Teng F, Yu L, Magoulès F (2011) Simmapreduce: a simulator for modeling mapreduce framework. In: Proceeding of International Conference on Multimedia and Ubiquitous, Engineering, MUE’11, pp 277–282

  23. Verma A, Cherkasova L, Campbell RH (2011) Aria: automatic resource inference and allocation for mapreduce environments. In: Proceedings of the 8th ACM International Conference on Autonomic Computing, ICAC ’11, pp 235–244

  24. Wang W-J, Chang Y-S, Lo W-T, Lee Y-K (2013) Adaptive scheduling for parallel tasks with qos satisfaction for hybrid cloud environments. J Supercomput 66(2):783–811

    Google Scholar 

  25. Yu L, Magoulès F (2009) Service scheduling and rescheduling in an applications integration framework. Adv Eng Softw 40(9):941–946

    Article  MATH  Google Scholar 

  26. Zaharia M, Borthakur D, Sarma JS, Elmeleegy K, Shenker S, Stoica I (2009) Job scheduling for multi-user MapReduce clusters. Department of Electrical Engineering and Computer Sciences, University of California, Berkeley

    Google Scholar 

  27. Zaharia M, Borthakur D, Sen Sarma J, Elmeleegy K, Shenker S, Stoica I (2010) Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling. In: Proceedings of the 5th European Conference on Computer systems, EuroSys ’10, pp 265–278

  28. Zaharia M, Konwinski A, Joseph AD, Katz R, Stoica I (2008) Improving mapreduce performance in heterogeneous environments. In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI’08, pp 29–42

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Fei Teng.

Additional information

This work was partially supported by the National Natural Science Foundation of China (No. 61202043), the Fundamental Research Funds for the Central Universities (No. SWJTU12CX098), and the IRT SystemX (Pôle de Compétitivité Systematic).

Rights and permissions

Reprints and permissions

About this article

Cite this article

Teng, F., Magoulès, F., Yu, L. et al. A novel real-time scheduling algorithm and performance analysis of a MapReduce-based cloud. J Supercomput 69, 739–765 (2014). https://doi.org/10.1007/s11227-014-1115-z

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11227-014-1115-z

Keywords

Navigation