TLA: Temporal look-ahead processor allocation method for heterogeneous multi-cluster systems

https://doi.org/10.1016/j.jpdc.2013.07.018

Highlights

  • The allocation simulation process used in TLA is novel and effective.

  • TLA directly utilizes the performance metric to make allocation decisions.

  • Extensive simulation has been carried out to evaluate the performance of TLA.

  • With precise runtime, TLA has up to an 87% performance improvement.

Abstract

In a heterogeneous multi-cluster (HMC) system, processor allocation is responsible for choosing available processors among clusters for job execution. Traditionally, processor allocation in HMC considers only resource fragmentation or processor heterogeneity, which leads to heuristics such as Best-Fit (BF) and Fastest-First (FF). However, each of those heuristics favors only certain types of workloads and cannot adapt to others. In this paper, a temporal look-ahead (TLA) method is proposed, which uses an allocation simulation process to guide the decision of processor allocation. Thus, the allocation decision is made dynamically according to the current workload and system configuration. We evaluate the performance of TLA by simulations, with different workloads and system configurations, in terms of average turnaround time. Simulation results indicate that, with precise runtime information, TLA outperforms traditional processor allocation methods and achieves up to an 87% performance improvement.

Introduction

This paper focuses on the issue of processor allocation for parallel jobs in heterogeneous multi-cluster (HMC) systems. An HMC system consists of multiple clusters, whose computational power, memory size, and communication capability can be varied for different clusters. Such systems are becoming more and more popular in grid computing and cloud computing  [20], [24], [23].

In an HMC system, a central job scheduler (also known as a meta-scheduler) is commonly used to dispatch all submitted jobs  [20], [23]. A central queue, called the waiting queue, is used to accommodate those submitted jobs. The central job scheduler usually involves two operations: job scheduling and processor allocation. Job scheduling decides the execution order of the jobs, while processor allocation decides which cluster(s) to allocate the job to  [20], [23].

The issues of processor allocation in an HMC system are more complicated than those in other parallel systems. In a homogeneous cluster system or a supercomputer, all processors have equal computation capability; therefore, it makes no significant difference which processors a job is allocated to. For a homogeneous multi-cluster system, processor allocation methods can be divided into two categories, single-site allocation and multi-site co-allocation, depending on whether the system supports executing a job across different clusters  [12]. Single-site allocation algorithms must handle the resource fragmentation problem, in which the entire system has a sufficient number of available processors for a job but no single cluster alone has enough free processors to accommodate it. For an HMC system, the heterogeneity of resources adds another dimension of complexity to the allocation problem  [23].

In this paper, we focus on the case of single-site allocation on HMC, since multi-site co-allocation is rarely seen in production systems  [21]. In addition, we assume the heterogeneity concerns only computing speed (called speed heterogeneity hereafter in this paper). Because of speed heterogeneity, allocating a job onto different clusters can lead to different job execution times. Moreover, because of resource fragmentation, different allocation decisions for a job may also affect the waiting time of the jobs behind it in the waiting queue. Therefore, processor allocation becomes a crucial issue in an HMC system because it affects both the waiting time and the execution time of a job.

Traditional processor allocation heuristics in an HMC system aim to resolve the resource fragmentation problem or to leverage the speed heterogeneity property to improve system performance, which leads to heuristics such as Best-Fit (BF)  [10] and Fastest-First (FF)  [12]. As reported in  [23], their performance largely depends on the input workload and system configuration, since both methods consider only a single performance factor.

This paper proposes a novel single-site allocation method called temporal look-ahead (TLA), which uses an allocation simulation process to guide the decision of processor allocation. Instead of targeting resource fragmentation or speed heterogeneity directly, TLA tries to optimize the performance of the entire system based on a specified performance metric. The idea behind TLA is that by considering the performance metric directly, it naturally takes into account all performance factors relevant to that metric. This design gives TLA the potential to optimize other performance metrics, which may involve completely different performance factors.

TLA works as follows. For each job j to be allocated, TLA evaluates all possible allocations and picks the one that could result in the best system performance. Each candidate allocation, say to cluster c, is evaluated through a simulation, which simulates the job scheduling, allocation, and execution of all subsequent jobs in the waiting queue, under the assumption that job j is allocated to cluster c, and evaluates the consequent system performance of such an allocation.
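As a concrete illustration, the selection loop can be sketched as follows. This is a minimal Python sketch, not the authors' implementation; `Cluster`, `Job`, and the `score` callback are hypothetical stand-ins, with `score` playing the role of TLA's allocation simulation and returning the predicted performance metric (lower is better):

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    free_processors: int

@dataclass
class Job:
    size: int  # number of processors requested

def tla_allocate(job, clusters, waiting_queue, score):
    """Evaluate every feasible allocation of `job` and pick the one
    whose simulated system performance (returned by `score`) is best.

    `score(job, cluster, waiting_queue)` stands in for TLA's
    allocation simulation: it assumes `job` runs on `cluster`,
    simulates scheduling and execution of the waiting jobs, and
    returns the resulting metric (lower is better).
    """
    feasible = [c for c in clusters if c.free_processors >= job.size]
    if not feasible:
        return None  # no cluster can host the job right now
    return min(feasible, key=lambda c: score(job, c, waiting_queue))
```

With a toy `score` that merely counts leftover processors this loop degenerates to Best-Fit; plugging in a full simulation of the waiting queue is what distinguishes TLA.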

The realization of TLA needs to know which job scheduling algorithm is used, since that algorithm decides the execution order of jobs in the waiting queue. TLA is a general processor allocation algorithm that can work with different job scheduling algorithms. To simplify the presentation, this paper only demonstrates the capability of TLA with the well-known First-Come-First-Served (FCFS) job scheduling algorithm  [20], [23], [15].
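For reference, FCFS starts waiting jobs strictly in arrival order and blocks at the first job that cannot be allocated. A minimal sketch under that definition (function and parameter names are hypothetical, not from the paper):

```python
from collections import deque

def fcfs_schedule(waiting_queue, can_allocate):
    """First-Come-First-Served: try to start jobs strictly in arrival
    order; stop at the first job that cannot be allocated (no skipping,
    i.e. no backfilling)."""
    started, q = [], deque(waiting_queue)
    while q and can_allocate(q[0]):
        started.append(q.popleft())
    return started, list(q)  # (jobs started now, jobs still waiting)
```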

To show the effectiveness of TLA, we compared TLA with existing processor allocation heuristics using the metric of average job turnaround time. Simulation experiments were conducted for various input workloads and system configurations. Simulation results show that the peak performance improvement made by TLA can be up to 87% when the runtime estimation is accurate.

The rest of this paper is organized as follows. Section  2 presents the system model and reviews the related work. Section  3 illustrates the idea of TLA. Section  4 presents the results of experiments which are based on precise job runtime estimation. Section  5 presents and discusses the performance issue of TLA. Conclusions and future work are given in Section  6.

Section snippets

System model

The system in discussion is a heterogeneous multi-cluster (HMC) architecture, which consists of a collection of interconnected clusters. Each cluster is a computer system with homogeneous processors, while the number and the speed of processors can vary across clusters. Here we assume the speed difference among the clusters is perfectly reflected in job runtime. For example, if the processors in cluster A are twice as fast as those in cluster B, running a job on cluster A takes half the time of running it on cluster B.
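Under this assumption, a job's runtime scales inversely with cluster speed. A one-line sketch (the function name is hypothetical):

```python
def runtime_on_cluster(base_runtime: float, speed: float) -> float:
    """Job runtime under the speed-heterogeneity assumption:
    runtime is inversely proportional to processor speed."""
    return base_runtime / speed

# A job taking 100 s at speed 1.0 takes 50 s on a cluster twice as fast.
```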

TLA method

For the job in the waiting queue to be allocated, say job j, TLA asks the following question:

If jobs in the current waiting queue are what the system will have ultimately, which allocation of job j will achieve the best overall system performance?

To answer that, TLA evaluates all possible allocations for job j, in terms of a desired performance metric, such as ATT introduced in Section  2.4. For a possible allocation, say cluster c, TLA utilizes a simulation procedure for all the jobs in the waiting queue.

Experiments

In this section we present the simulation results based on precise job runtime estimation. The experimental settings, including the input workloads used and the methods to model different system configurations, are presented in Section  4.1. Section  4.2 presents the simulation results as well as the discussion.

Discussion on the performance issue of TLA

In this section we further discuss some performance issues of TLA. The first thing to be verified is how TLA performs with inaccurate runtime estimation, since in the real world it is hard to estimate the runtime of each parallel job precisely. This issue is discussed in Section  5.1. By default, TLA simulates the allocation of all waiting jobs in the score calculation. In fact, TLA can use an arbitrary number of jobs for the allocation simulation. This number is called the simulation depth.
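To make the simulation-depth idea concrete, the sketch below scores an allocation by the average turnaround time (ATT) of an FCFS run over only the first `depth` waiting jobs. It is a deliberately simplified single-processor stand-in for the paper's full HMC simulation, with all names hypothetical:

```python
def simulated_att(runtimes, depth=None):
    """Average turnaround time of running jobs back-to-back in FCFS
    order on one processor, simulating only the first `depth` jobs
    (depth=None simulates the whole waiting queue, TLA's default)."""
    sample = runtimes if depth is None else runtimes[:depth]
    clock, total = 0.0, 0.0
    for r in sample:
        clock += r      # this job finishes at `clock`
        total += clock  # turnaround = finish time (all jobs arrive at 0)
    return total / len(sample) if sample else 0.0
```

A smaller depth trades simulation fidelity for a cheaper allocation decision, which is the trade-off this section examines.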

Conclusion and future work

This paper investigates the issues of single-site processor allocation in heterogeneous multi-cluster (HMC) systems. Traditional processor allocation policies, such as Best-Fit (BF) and Fastest-First (FF), consider only speed heterogeneity or resource fragmentation, and their performance is not consistent across different input workloads and system configurations. In this paper, we propose the temporal look-ahead (TLA) processor allocation method, which tries to find an allocation that can benefit the overall system performance.

Po-Chi Shih was born on October 9, 1980 in Taipei, Taiwan, R.O.C. He received the B.S. and M.S. degrees in Computer Science and Information Engineering from Tunghai University in 2003 and 2005, and the Ph.D. degree in Computer Science from National Tsing Hua University in 2012. He is now a postdoctoral researcher at NTHU. His research interests include parallel processing, cloud computing, and software-defined networking.

References (29)

  • K. Li

    Job scheduling and processor allocation for grid computing on metacomputers

    Journal of Parallel and Distributed Computing

    (2005)
  • U. Lublin et al.

    The workload on parallel supercomputers: modeling the characteristics of rigid jobs

    Journal of Parallel and Distributed Computing

    (2003)
  • X. Tang

    List scheduling with duplication for heterogeneous computing systems

    Journal of Parallel and Distributed Computing

    (2010)
  • M. Armbrust

    Above the clouds: a Berkeley view of cloud computing, Tech. Rep. UCB/EECS-2009-28, EECS Department

    (2009)
  • A.I.D. Bucur, D.H.J. Epema, An evaluation of processor co-allocation for different system configurations and job...
  • E. Carsten, et al., On advantages of grid computing for parallel job scheduling, in: Proceedings of the 2nd IEEE/ACM...
  • G.F. Dror, Packing schemes for gang scheduling, in: Proceedings of the Workshop on Job Scheduling Strategies for...
  • C. Ernemann, et al., On effects of machine configurations on parallel job scheduling in computational grids, in:...
  • C. Ernemann, et al., Benefits of global grid computing for job scheduling, in: Proceedings of the IEEE/ACM...
  • D.G. Feitelson, et al., Theory and practice in parallel job scheduling, in: Proceedings of the International Conference...
  • D.G. Feitelson

    Experimental analysis of the root causes of performance evaluation results: a backfilling case study

    IEEE Transactions on Parallel and Distributed Systems

    (2005)
  • D.G. Feitelson, et al., Parallel job scheduling—a status report, in: Proceedings of the International Workshop on Job...
  • V. Hamscher, et al., Evaluation of job-scheduling strategies for grid computing, in: Proceedings of the First IEEE/ACM...
  • V. Hamscher, et al., Evaluation of job-scheduling strategies for grid computing, in: Proceedings of the 7th...

    Kuo-Chan Huang received his B.S. and Ph.D. degrees in Computer Science and Information Engineering from National Chiao-Tung University, Taiwan, in 1993 and 1998, respectively. He is currently an Associate Professor in the Computer and Information Science Department at the National Taichung University, Taiwan. He is a member of ACM and the IEEE Computer Society. His research areas include parallel processing, cluster, grid and cloud computing and workflow computing.

    Che-Rung Lee received the B.S. and M.S. degrees in Computer Science from National Tsing Hua University, Taiwan, in 1996 and 2000, respectively, and the Ph.D. degree in Computer Science from the University of Maryland, College Park in 2007. He joined the Department of Computer Science at National Tsing Hua University as an Assistant Professor in 2008. His research interests include numerical algorithms, scientific computing, high-performance computation, and cloud computing. He is a member of IEEE and SIAM.

    I-Hsin Chung is currently a Research Staff Member at the Thomas J. Watson Research Center, Yorktown Heights, NY, USA.

    Yeh-Ching Chung received a B.S. degree in Information Engineering from Chung Yuan Christian University in 1983, and the M.S. and Ph.D. degrees in Computer and Information Science from Syracuse University in 1988 and 1992, respectively. He joined the Department of Information Engineering at Feng Chia University as an Associate Professor in 1992 and became a Full Professor in 1999. From 1998 to 2001, he was the chairman of the department. In 2002, he joined the Department of Computer Science at National Tsing Hua University as a full professor. His research interests include parallel and distributed processing, cluster systems, grid computing, multi-core tool chain design, and multi-core embedded systems. He is a member of the IEEE Computer Society and ACM.
