Abstract
Exploring the opportunities to use ML, the possible designs, and our experience with Microsoft Azure.
Index Terms
- Toward ML-centric cloud platforms