Toward ML-centric cloud platforms

Authors:
Ricardo Bianchini (Microsoft Research, Redmond, WA)
Marcus Fontoura (Microsoft Research, Redmond, WA)
Eli Cortez (Microsoft Research, Redmond, WA)
Anand Bonde (Microsoft Research, Redmond, WA)
Alexandre Muzio (Microsoft Azure, Redmond, WA)
Ana-Maria Constantin (Microsoft Azure, Redmond, WA)
Thomas Moscibroda (Microsoft Azure, Redmond, WA)
Gabriel Magalhaes (University of Washington)
Girish Bablani (Microsoft Azure, Redmond, WA)
Mark Russinovich (Microsoft Azure, Redmond, WA)

Communications of the ACM, Volume 63, Issue 2, February 2020, pp. 50–59. https://doi.org/10.1145/3364684

Published: 22 January 2020

Abstract

Exploring the opportunities to use ML, the possible designs, and our experience with Microsoft Azure.

References

Abadi, M. et al. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symp. Operating Systems Design and Implementation (2016).
Agarwal, A. et al. Making contextual decisions with low technical debt. arXiv preprint arXiv:1606.03966 (2016).
Amazon Web Services. Amazon CloudWatch; https://aws.amazon.com/cloudwatch/.
Bodik, P., Griffith, R., Sutton, C., Fox, A., Jordan, M., and Patterson, D. Statistical machine learning makes automatic control practical for Internet datacenters. In Proceedings of HotCloud (2009).
Calheiros, R. N., Masoumi, E., Ranjan, R., and Buyya, R. Workload prediction using ARIMA model and its impact on cloud applications' QoS. IEEE Trans. Cloud Computing 3, 4 (2015).
Cao, R., Yu, Z., Marbach, T., Li, J., Wang, G., and Liu, X. Load prediction for data centers based on database service. In Proceedings of the 42nd Annual Computer Software and Applications Conf. (2018).
Chaiken, R., Jenkins, B., Larson, P., Ramsey, B., Shakib, D., Weaver, S., and Zhou, J. SCOPE: Easy and efficient parallel processing of massive data sets. In Proceedings of the 34th Intern. Conf. Very Large Data Bases (2008).
Chen, S., Shen, Y., and Zhu, Y. Modeling conceptual characteristics of virtual machines for CPU utilization prediction. In Proceedings of the Intern. Conf. Conceptual Modeling (2018).
Cortez, E., Bonde, A., Muzio, A., Russinovich, M., Fontoura, M., and Bianchini, R. Resource central: Understanding and predicting workloads for improved resource management in large cloud platforms. In Proceedings of the Intern. Symp. Operating Systems Principles (2017).
Crankshaw, D. et al. The missing piece in complex analytics: Low latency, scalable model management and serving with Velox. In Proceedings of the 7th Biennial Conf. Innovative Data Systems Research (2015).
Crankshaw, D., Wang, X., Zhou, G., Franklin, M. J., Gonzalez, J. E., and Stoica, I. Clipper: A low-latency online prediction serving system. In Proceedings of the 14th Symp. Networked Systems Design and Implementation (2017).
Delimitrou, C. and Kozyrakis, C. Paragon: QoS-aware scheduling for heterogeneous datacenters. In Proceedings of the 18th Intern. Conf. Architectural Support for Programming Languages and Operating Systems (2013).
Fox, A., Kiciman, E., and Patterson, D. Combining statistical monitoring and predictable recovery for self-management. In Proceedings of the 1st Workshop on Self-Managed Systems (2004).
Gao, J. Machine Learning Applications For Datacenter Optimization, 2014.
Gong, Z., Gu, X., and Wilkes, J. Press: Predictive elastic resource scaling for cloud systems. In Proceedings of the Intern. Conf. Network and Service Management (2010).
Google. TensorFlow serving; http://tensorflow.github.io/serving/.
Islam, S., Keung, J., Lee, K., and Liu, A. Empirical prediction models for adaptive resource provisioning in the cloud. Future Generation Computer Systems 28, 1 (2012).
Ke, G. et al. LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems 30 (2017).
Khan, A., Yan, X., Tao, S., and Anerousis, N. Workload characterization and prediction in the cloud: A multiple time series approach. In Proceedings of the Intern. Conf. Network and Service Management (2012).
Mao, H., Alizadeh, M., Menache, I., and Kandula, S. Resource management with deep reinforcement learning. In Proceedings of the 15th ACM Workshop on Hot Topics in Networks (2016).
Microsoft Azure. Azure Monitor; https://azure.microsoft.com/en-us/services/monitor/.
Moritz, P. et al. Ray: A distributed framework for emerging AI applications. In Proceedings of the 13th USENIX Symp. Operating Systems Design and Implementation (2018).
Novakovic, D., Vasic, N., Novakovic, S., Kostic, D., and Bianchini, R. DeepDive: Transparently identifying and managing performance interference in virtualized environments. In Proceedings of the USENIX Annual Technical Conf. (2013).
Rao, J., Bu, X., Xu, C.-Z., Wang, L., and Yin, G. VCONF: A reinforcement learning approach to virtual machine auto-configuration. In Proceedings of the 6th Intern. Conf. Autonomic Computing (2009).
Roy, N., Dubey, A., and Gokhale, A. Efficient autoscaling in the cloud using predictive models for workload forecasting. In Proceedings of the Intern. Conf. on Cloud Computing (2011).
Yadwadkar, N. J. Machine learning for automatic resource management in the datacenter and the cloud. Ph.D. thesis, UC Berkeley, 2018.
Zhang, Y., Prekas, G., Fumarola, G. M., Fontoura, M., Goiri, I., and Bianchini, R. History-based harvesting of spare cycles and storage in large-scale datacenters. In Proceedings of the Intern. Symp. Operating Systems Design and Implementation (2016).
Zheng, W., Nguyen, T. D., and Bianchini, R. Automatic configuration of Internet services. In Proceedings of the 2nd European Conf. Computer Systems (2007).

Published in
Communications of the ACM, Volume 63, Issue 2, February 2020, 80 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3380852
Editor: Andrew A. Chien
Association for Computing Machinery, New York, NY
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]