ABSTRACT
The public cloud is moving to a Platform-as-a-Service model in which the cloud operator provides services such as data management, machine learning, or image classification, while applications are written in high-level languages and build on these services.
Managed languages such as Java, Python or Scala are widely used in this setting. However, while these languages can increase productivity, they are often associated with problems such as unpredictable garbage collection pauses or warm-up overheads.
We argue that the reason for these problems is that current language runtime systems were not initially designed for the cloud setting. To address this, we propose seven tenets for designing future language runtime systems for cloud data centers. We then outline the design of a general substrate for building such runtime systems, based on these seven tenets.
Index Terms
- Return of the Runtimes: Rethinking the Language Runtime System for the Cloud 3.0 Era