research-article

Rhythm: component-distinguishable workload deployment in datacenters

Authors:
Laiping Zhao, Tianjin University
Yanan Yang, Tianjin University
Kaixuan Zhang, Tianjin University
Xiaobo Zhou, Tianjin University
Tie Qiu, Tianjin University
Keqiu Li, Tianjin University
Yungang Bao, CAS

EuroSys '20: Proceedings of the Fifteenth European Conference on Computer Systems, April 2020, Article No. 19, pages 1–17
https://doi.org/10.1145/3342195.3387534

Published: 17 April 2020

Editorial Notes

A corrigendum was issued for this article on July 15, 2020. You can download the corrigendum from the supplemental material section of this citation page.

ABSTRACT

Cloud service providers improve resource utilization by co-locating latency-critical (LC) workloads with best-effort batch (BE) jobs in datacenters. However, they usually treat an LC workload as a whole when allocating resources to BE jobs, neglecting the different characteristics of the components that make up an LC workload. This coarse-grained co-location approach leaves significant room for improvement in resource utilization.

Based on the observation that different LC components have inconsistent interference tolerance abilities, we propose a new abstraction called Servpod, a collection of LC parts that are deployed together on the same physical machine, and show its merits for building a fine-grained co-location framework. The key idea is to differentiate the BE throughput launched alongside each LC Servpod: a Servpod with a higher interference tolerance can be co-located with more BE jobs. Based on Servpods, we present Rhythm, a co-location controller that maximizes resource utilization while guaranteeing the LC service's tail-latency requirement. Rhythm quantifies the interference tolerance ability of each Servpod by analyzing its contribution to tail latency. We evaluate Rhythm using LC services in the form of containerized processes and microservices, and find that it improves system throughput by 31.7%, CPU utilization by 26.2%, and memory bandwidth utilization by 34% while guaranteeing the SLA (service level agreement).
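The idea of weighting BE load by each Servpod's share of tail latency can be illustrated with a toy model. The sketch below is a hypothetical illustration, not Rhythm's actual algorithm: it assumes that a request's end-to-end latency is the sum of per-Servpod latencies, defines a Servpod's "tail-latency contribution" as its mean latency share among requests at or above the p99 latency, treats tolerance as one minus that contribution, and splits a BE throughput budget in proportion to tolerance. The function names (`tail_contribution`, `split_be_budget`) and the Servpod names are invented for illustration.

```python
import math
from typing import Dict, List

# HYPOTHETICAL sketch: the contribution metric, the tolerance mapping and every
# name below are illustrative assumptions, not Rhythm's published model.


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]


def tail_contribution(requests: List[Dict[str, float]], p: float = 99.0) -> Dict[str, float]:
    """Mean share of end-to-end latency attributed to each Servpod, taken only
    over tail requests (end-to-end latency >= p-th percentile).
    Assumes end-to-end latency is the sum of positive per-Servpod latencies."""
    totals = [sum(r.values()) for r in requests]
    cutoff = percentile(totals, p)
    tail = [(r, t) for r, t in zip(requests, totals) if t >= cutoff]
    shares = {name: 0.0 for name in requests[0]}
    for r, total in tail:
        for name, latency in r.items():
            shares[name] += latency / total
    return {name: s / len(tail) for name, s in shares.items()}


def split_be_budget(contrib: Dict[str, float], budget: float) -> Dict[str, float]:
    """Split a BE throughput budget so that Servpods contributing less to tail
    latency (assumed here to tolerate more interference) receive more BE load."""
    tolerance = {name: 1.0 - c for name, c in contrib.items()}
    norm = sum(tolerance.values())
    if norm == 0.0:  # degenerate case, e.g. a single Servpod: split evenly
        return {name: budget / len(contrib) for name in contrib}
    return {name: budget * t / norm for name, t in tolerance.items()}


if __name__ == "__main__":
    # Per-request latency (ms) of three hypothetical Servpods.
    requests = [
        {"frontend": 1.0, "cache": 1.0, "db": 2.0},
        {"frontend": 1.0, "cache": 1.0, "db": 8.0},  # tail request, dominated by db
        {"frontend": 1.0, "cache": 2.0, "db": 1.0},
    ]
    contrib = tail_contribution(requests)
    print(contrib)
    print(split_be_budget(contrib, budget=100.0))
```

With these toy numbers only the second request falls in the p99 tail, so `db` is charged 80% of the tail latency and receives about 10 of the 100 BE units, while `frontend` and `cache` receive about 45 each. A real controller would also have to re-measure latency as BE load changes; the full paper describes how Rhythm itself derives per-Servpod BE limits.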

Supplemental Material

a19-zhao-corrigendum.pdf (PDF, 134.2 KB)

Corrigendum to "Rhythm: component-distinguishable workload deployment in datacenters" by Zhao et al., Proceedings of the Fifteenth European Conference on Computer Systems (EuroSys '20).

Published in
EuroSys '20: Proceedings of the Fifteenth European Conference on Computer Systems, April 2020, 49 pages
ISBN: 9781450368827
DOI: 10.1145/3342195
General Chairs: Angelos Bilas (University of Crete and FORTH-ICS), Kostas Magoutis (University of Crete and FORTH-ICS), Evangelos Markatos (University of Crete and FORTH-ICS)
Program Chairs: Dejan Kostic (KTH Royal Institute of Technology, Sweden), Margo Seltzer (The University of British Columbia, Canada)
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates
EuroSys '20 paper acceptance rate: 43 of 234 submissions, 18%. Overall acceptance rate: 241 of 1,308 submissions, 18%.

Article Metrics
- 33 total citations
- 1,191 total downloads
- Downloads (last 12 months): 184
- Downloads (last 6 weeks): 18