Environment-conscious scheduling of HPC applications on distributed Cloud-oriented data centers

https://doi.org/10.1016/j.jpdc.2010.04.004

Abstract

The use of High Performance Computing (HPC) in commercial and consumer IT applications is becoming popular. HPC users need rapid and scalable access to high-end computing capabilities. Cloud computing promises to deliver such a computing infrastructure using data centers, so that HPC users can access applications and data from a Cloud anywhere in the world on demand and pay only for what they use. However, the growing demand drastically increases the energy consumption of data centers, which has become a critical issue. High energy consumption translates not only into high energy cost, which reduces the profit margin of Cloud providers, but also into high carbon emissions, which are not environmentally sustainable. Hence, there is an urgent need for energy-efficient solutions that address the sharp increase in energy consumption from the perspective of both the Cloud provider and the environment. To address this issue, we propose near-optimal scheduling policies that exploit heterogeneity across a Cloud provider's multiple data centers. We consider a number of energy efficiency factors (such as energy cost, carbon emission rate, workload, and CPU power efficiency) which vary across data centers depending on their location, architectural design, and management system. Our carbon/energy based scheduling policies achieve, on average, up to 25% energy savings compared to profit based scheduling policies, leading to higher profit and lower carbon emissions.

Introduction

During the last few years, the use of High Performance Computing (HPC) infrastructure to run business and consumer IT applications has increased rapidly. This is evident from the recent Top500 list, where many supercomputers are now used for industrial HPC applications, with 9.2% used for finance and 6.2% for logistics services [58]. Thus, it is desirable for IT industries to have access to a flexible HPC infrastructure which is available on demand with minimum investment. Cloud computing [10] promises to deliver such reliable services through next-generation data centers built on virtualized compute and storage technologies. Users are able to access applications and data from a “Cloud” anywhere in the world on demand and pay only for what they use. Hence, Cloud computing is a highly scalable and cost-effective infrastructure for running HPC applications, which require ever-increasing computational resources.

However, Clouds are essentially data centers that require high energy usage to maintain operation [5]. Today, a typical data center with 1000 racks needs 10 MW of power to operate [50]. High energy usage is undesirable since it results in high energy cost. For a data center, the energy cost is a significant component of its operating and up-front costs [50]. Therefore, Cloud providers want to increase their profit, or Return on Investment (ROI), by reducing their energy cost. Many Cloud providers are thus building data centers in many geographical locations, not only to make their Cloud services available to business and consumer applications, e.g. Amazon [1], but also to reduce energy cost, e.g. Google [40].

In April 2007, Gartner estimated that the Information and Communication Technologies (ICT) industry generates about 2% of total global CO2 emissions, which is comparable to the aviation industry [30]. As governments impose carbon emission limits on the ICT industry, as they have done for the automobile industry [18], [21], Cloud providers must reduce energy usage to stay within the permissible limits [15]. Thus, Cloud providers must ensure that data centers are utilized in a carbon-efficient manner to meet scaling demand. Building more data centers without any consideration of carbon emissions is not viable, since it is not environmentally sustainable and will ultimately violate the imposed carbon emission limits. This would in turn affect the future widespread adoption of Cloud computing, especially for the HPC community, which demands a scalable infrastructure to be delivered by Cloud providers. Companies like Alpiron [2] already offer software for cost-efficient server management and promise to reduce energy cost by analyzing, via advanced algorithms, which servers to shut down or turn on at runtime.

Motivated by this practice, this paper enhances the idea of cost-effective management by taking both economic (profit) and environmental (carbon emission) sustainability into account. In particular, we examine how a Cloud provider can achieve optimal energy sustainability when running HPC workloads across its entire Cloud infrastructure by harnessing the heterogeneity of multiple data centers geographically distributed worldwide.

The analysis of previous work shows that little investigation has considered both economic and environmental sustainability to achieve energy efficiency on a global scale, as in Cloud computing. First, previous work has generally studied how to reduce energy usage from the perspective of reducing cost, but not how to improve profit while also reducing the carbon emissions that significantly impact Cloud providers [25]. Second, most previous work has focused on achieving energy efficiency at a single data center location, but not across multiple data center locations. However, Cloud providers such as Amazon EC2 [1] typically have multiple data centers distributed worldwide. As shown in Fig. 1, the energy efficiency of an individual data center changes dynamically over time depending on a number of factors such as energy cost, carbon emission rate, workload, CPU power efficiency, cooling system, and environmental temperature. Thus, these contributing factors can be exploited across multiple heterogeneous data centers to improve the overall energy efficiency of the Cloud provider. Third, previous work has mainly proposed energy saving policies that are application-specific [26], [28], processor-specific [52], [17], and/or server-specific [60], [38]. These policies are only applicable or most effective for the specific models they are designed for. Hence, we propose simple, yet effective, generic energy-efficient scheduling policies that can be extended to any application, processor, and server model, so that they can be readily deployed in existing data centers with minimal changes. Our generic scheduling policies can also easily complement any application-specific, processor-specific, and/or server-specific energy saving policies that are already in place within existing data centers or servers.
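To make these contributing factors concrete, the following is a minimal sketch of how a meta-scheduler could estimate the carbon and cost footprint of an HPC job at each candidate data center. The field names, the linear per-core power proxy, and all numeric values are illustrative assumptions, not the model used in this paper.

```python
from dataclasses import dataclass

@dataclass
class DataCenter:
    """Per-site factors assumed to drive energy efficiency (illustrative values only)."""
    name: str
    carbon_rate: float     # kg CO2 emitted per kWh at this location
    energy_cost: float     # electricity price in $ per kWh
    watts_per_core: float  # average power draw per busy core, a proxy for CPU power efficiency

def job_footprint(dc: DataCenter, core_hours: float) -> tuple[float, float]:
    """Return (kg CO2, energy cost in $) for running `core_hours` of work at `dc`."""
    kwh = dc.watts_per_core * core_hours / 1000.0  # convert W·h to kWh
    return kwh * dc.carbon_rate, kwh * dc.energy_cost

# Rank candidate sites for a 5000 core-hour HPC job by estimated carbon emission.
sites = [
    DataCenter("dc-eu", carbon_rate=0.35, energy_cost=0.12, watts_per_core=18.0),
    DataCenter("dc-us", carbon_rate=0.55, energy_cost=0.07, watts_per_core=22.0),
]
ranked = sorted(sites, key=lambda dc: job_footprint(dc, 5000.0)[0])
print([dc.name for dc in ranked])  # most carbon-efficient site first
```

The same ranking could equally be driven by energy cost (the second element of the returned pair) or a weighted combination of both, which is the kind of trade-off the policies in this paper explore.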

Hence, the key contributions of this paper are:

  • (1) A novel mathematical model for energy efficiency based on various contributing factors such as energy cost, carbon emission rate, HPC workload, and CPU power efficiency;

  • (2) Near-optimal energy-efficient scheduling policies which not only minimize carbon emissions and maximize the profit of the Cloud provider, but can also be readily implemented without major infrastructure changes such as the relocation of existing data centers;

  • (3) Energy efficiency analysis of our proposed policies (in terms of carbon emissions and profit) through extensive simulations using real HPC workload traces, and data center carbon emission rates and energy costs, to demonstrate the importance of considering the various contributing factors;

  • (4) Analysis of lower/upper bounds of the optimization problem; and

  • (5) Exploiting local minima in Dynamic Voltage Scaling (DVS) to further reduce the energy consumption of HPC applications within a data center (illustrated by the sketch following this list).
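Regarding contribution (5), a CPU with non-negligible static power does not minimize energy at its lowest DVS frequency: running slower lengthens execution time, so static energy grows even as dynamic power falls. The sketch below illustrates this local minimum under an assumed cubic power model; the coefficients and frequency levels are illustrative, not values from the paper.

```python
# Assumed CPU power model: P(f) = BETA + ALPHA * f**3 (static + frequency-dependent power).
# Energy for a fixed amount of work at frequency f is E(f) = P(f) * cycles / f, so the
# static term grows as the job runs longer at low frequency.
ALPHA = 5.0    # illustrative dynamic-power coefficient, W per GHz^3
BETA = 60.0    # illustrative static power, W
LEVELS = [1.0, 1.4, 1.8, 2.2, 2.6]  # assumed discrete DVS frequency levels in GHz

def energy_joules(freq_ghz: float, gigacycles: float = 1000.0) -> float:
    """Energy to execute `gigacycles` of work at `freq_ghz` under the assumed model."""
    power_watts = BETA + ALPHA * freq_ghz ** 3
    seconds = gigacycles / freq_ghz
    return power_watts * seconds

print({f: round(energy_joules(f)) for f in LEVELS})
print("most energy-efficient level:", min(LEVELS, key=energy_joules))  # 1.8 GHz here, not 1.0
```

With these assumed coefficients the minimum falls at an interior frequency level rather than the lowest one, which is the kind of local minimum a DVS-aware scheduler can exploit.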

This paper is organized as follows. Section 2 discusses related work. Section 3 defines the Cloud computing scenario and the problem description. In Section 4, different policies for allocating applications to data centers efficiently are described. Section 5 explains the evaluation methodology and simulation setup, followed by the analysis of the performance results in Section 6. Section 7 presents the conclusion and future work.


Related work

Table 1 gives an overview of previous work that addresses any of the five aspects considered by this paper. To the best of our knowledge, no previous work other than ours collectively addresses all five aspects.

Most previous work addresses energy-efficient computing for servers [5], but most of it focuses on reducing energy consumption in data centers for web workloads [60], [12]. Thus, it assumes that energy is an increasing function of CPU frequency since web workloads

System model

Our system model is based on the Cloud computing environment, whereby Cloud users are able to tap the computational power offered by Cloud providers to execute their HPC applications. The Cloud meta-scheduler acts as an interface to the Cloud infrastructure and schedules applications on behalf of users, as shown in Fig. 2. It interprets and analyzes the service requirements of a submitted application and decides whether to accept or reject the application based on the availability of

Meta-scheduling policies

The meta-scheduler periodically assigns applications to data centers at a fixed time interval called the scheduling cycle. This allows the meta-scheduler to make a potentially better selection when mapping applications to data centers, since it chooses from a larger pool of applications than it would if it scheduled at each individual submission. In each scheduling cycle, the meta-scheduler collects information from both data centers and users.
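As a rough illustration of this cycle-based operation, the skeleton below accumulates submitted applications into a queue and maps the whole batch to data centers once per cycle. The cycle length, the data center scores, and the greedy first-fit mapping are illustrative assumptions rather than the specific policies evaluated later in the paper.

```python
from collections import deque

SCHEDULING_CYCLE_SECONDS = 300  # assumed cycle length; not a value prescribed by the paper

pending_jobs = deque()  # applications submitted since the last scheduling cycle

def collect_data_center_info():
    """Placeholder for querying each data center's current energy cost, carbon
    emission rate, workload, and free capacity at the start of a cycle."""
    return [
        {"name": "dc-eu", "free_cores": 512, "score": 0.35},
        {"name": "dc-us", "free_cores": 1024, "score": 0.55},
    ]

def run_scheduling_cycle():
    """Map the whole batch of pending jobs to data centers, best-scored site first."""
    sites = sorted(collect_data_center_info(), key=lambda s: s["score"])
    deferred = []
    while pending_jobs:
        job = pending_jobs.popleft()
        target = next((s for s in sites if s["free_cores"] >= job["cores"]), None)
        if target is None:
            deferred.append(job)  # no capacity anywhere this cycle; retry next cycle
            continue
        target["free_cores"] -= job["cores"]
        print(f"job {job['id']} -> {target['name']}")
    pending_jobs.extend(deferred)

if __name__ == "__main__":
    # Jobs that arrived during one cycle; in a long-running meta-scheduler this
    # would repeat every SCHEDULING_CYCLE_SECONDS.
    pending_jobs.extend({"id": i, "cores": 256 * (i + 1)} for i in range(3))
    run_scheduling_cycle()
```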

In general, a meta-scheduling policy consists of two

Performance evaluation

Configuration of applications: We use workload traces from Feitelson’s Parallel Workload Archive (PWA) [23] to model the HPC workload. Since this paper focuses on studying the requirements of Cloud users with HPC applications, the PWA meets our objective by providing workload traces that reflect the characteristics of real parallel applications. Our experiments utilize the first week of the LLNL Thunder trace (January 2007 to June 2007). The LLNL Thunder trace from the Lawrence Livermore

Analysis of results

This section presents the evaluation of our proposed scheduling policies based on various metrics such as carbon emission and profit. During experimentation, the performance of the MP–MCE policy in terms of carbon emission and profit was observed to be very similar to that of GMP, with no additional benefits. Hence, to save space, the results for MP–MCE are not presented in the paper.

Concluding remarks and future directions

The usage of energy has become a major concern since the price of electricity has increased dramatically. In particular, Cloud providers need a large amount of electricity to run and maintain their computational resources in order to provide the best service level for their customers. Although this importance has been emphasized in much of the research literature, the combined approach of analyzing profit and energy sustainability in the resource allocation process has not been taken into

Acknowledgments

We would like to thank Marcos Dias de Assuncao for his constructive comments on this paper. This work is partially supported by research grants from the Australian Research Council (ARC) and Australian Department of Innovation, Industry, Science and Research (DIISR).


References (60)

  • J. Burge, P. Ranganathan, J.L. Wiener, Cost-aware scheduling for heterogeneous enterprise machines (CASH'EM), Technical...
  • J.S. Chase et al., Managing energy and server resources in hosting centers, SIGOPS Operating Systems Review, 2001.
  • Y. Chen et al., Managing server energy and operational costs in hosting centers, ACM SIGMETRICS Performance Evaluation Review, 2005.
  • M. Chin, Desktop CPU power survey, Silentpcreview.com,...
  • US Department of Energy, Voluntary reporting of greenhouse gases: Appendix F. Electricity emission factors, 2007....
  • K. Corrigan, A. Shah, C. Patel, Estimating environmental costs, in: Proceedings of the 1st USENIX Workshop on...
  • US Department of Energy, US Energy Information Administration (EIA) report, 2007....
  • A. Elyada et al., Low-complexity policies for energy-performance tradeoff in chip-multi-processors, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2008.
  • United States Environmental Protection Agency, Letter to enterprise server manufacturer or other interested...
  • United States Environmental Protection Agency, Report to Congress on server and data center energy efficiency, Public...
  • M. Etinski, J. Corbalan, J. Labarta, M. Valero, A. Veidenbaum, Power-aware load balancing of large scale MPI...
  • EUbusiness, Proposed EU regulation to reduce CO2 emissions from cars, Dec. 2007....
  • X. Fan et al., Power provisioning for a warehouse-sized computer.
  • D. Feitelson, Parallel workloads archive, Aug. 2009....
  • D.G. Feitelson, L. Rudolph, U. Schwiegelshohn, K.C. Sevcik, P. Wong, Theory and practice in parallel job scheduling,...
  • W. Feng et al., The green500 list: encouraging sustainable supercomputing, Computer, 2007.
  • X. Feng, R. Ge, K.W. Cameron, Power and energy profiling of scientific applications on distributed systems, in:...
  • W. Feng, T. Scogland, The green500 list: year one, in: Proceedings of the 2009 IEEE International Symposium on Parallel...
  • V. Freeh et al., Analyzing the energy-time trade-off in high-performance computing applications, IEEE Transactions on Parallel and Distributed Systems, 2007.
  • A. Gandhi, M. Harchol-Balter, R. Das, C. Lefurgy, Optimal power allocation in server farms, in: Proceedings of the 11th...

Saurabh Kumar Garg is a Ph.D. student at the Cloud Computing and Distributed Systems (CLOUDS) Laboratory, University of Melbourne, Australia. At the University of Melbourne, he has been awarded various special scholarships for his Ph.D. candidature. He completed his 5-year Integrated Master of Technology in Mathematics and Computing at the Indian Institute of Technology (IIT) Delhi, India, in 2006. His research interests include Resource Management, Scheduling, Utility and Grid Computing, Cloud Computing, Green Computing, Wireless Networks, and Ad hoc Networks.

Chee Shin Yeo is a research engineer at the Institute of High Performance Computing (IHPC), Singapore. He completed his Ph.D. at the University of Melbourne, Australia. His research interests include parallel and distributed computing, services and utility computing, energy-efficient computing, and market based resource allocation.

Arun Anandasivam is a research assistant and Ph.D. student at the Institute of Information Systems and Management at Universität Karlsruhe. His research work comprises pricing policies and decision frameworks for grid and Cloud computing providers.

Rajkumar Buyya is a Professor of Computer Science and Software Engineering and Director of the Cloud Computing and Distributed Systems (CLOUDS) Laboratory at the University of Melbourne, Australia. He is also serving as the founding CEO of Manjrasoft Pty Ltd., a spin-off company of the University, commercialising innovations originating from the CLOUDS Lab. He has pioneered the Economic Paradigm for Service-Oriented Grid computing and demonstrated its utility through his contributions to the conceptualisation, design and development of Cloud and Grid technologies such as Aneka, Alchemi, Nimrod-G and Gridbus that power emerging eScience and eBusiness applications.
