
DVM: towards a datacenter-scale virtual machine


Abstract

As cloud-based computation becomes increasingly important, providing a general computational interface to support datacenter-scale programming has become an imperative research agenda. Many cloud systems use existing virtual machine monitor (VMM) technologies, such as Xen, VMware, and Windows Hypervisor, to multiplex a physical host into multiple virtual hosts and isolate computation on the shared cluster platform. However, traditional multiplexing VMMs do not scale beyond a single physical host, and they alone cannot provide the programming interface and cluster-wide computation that a datacenter system requires. We design a new instruction set architecture, DISA, to unify myriads of compute nodes into a big virtual machine called DVM, and to present programmers with the view of a single computer where thousands of tasks run concurrently in a large, unified, and snapshotted memory space. The DVM provides a simple yet scalable programming model and mitigates the scalability bottleneck of traditional distributed shared memory systems. Along with an efficient execution engine, the capacity of a DVM can scale up to support large clusters. We have implemented and tested DVM on three platforms, and our evaluation shows that DVM has excellent performance in terms of execution time and speedup. On one physical host, the system overhead of DVM is comparable to that of traditional VMMs. On 16 physical hosts, the DVM runs 10 times faster than MapReduce/Hadoop and X10. On 256 EC2 instances, DVM shows linear speedup on a parallelizable workload.
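To make the programming model sketched in the abstract more concrete, the short Python illustration below shows what a "single computer" view with many concurrent tasks operating on one shared, snapshot-capable memory space looks like to a programmer, in contrast to explicit message passing. This is a hypothetical, single-host stand-in using Python threads; it does not use DISA or any actual DVM interface, and all names in it (shared, task, snapshot) are illustrative only.

    # Hypothetical single-host illustration of the shared, snapshotted memory
    # view described in the abstract. NOT the DVM/DISA interface; it merely
    # mimics the idea with Python threads on one machine.
    import threading

    shared = [0] * 1024          # stand-in for the large, unified memory space
    lock = threading.Lock()      # stand-in for whatever consistency DVM enforces

    def task(i):
        # Each task writes directly into the shared space -- no explicit
        # send/receive messages as in MPI- or MapReduce-style systems.
        with lock:
            shared[i % len(shared)] += i

    def snapshot():
        # A consistent copy of the memory space, loosely analogous to the
        # "snapshotted memory space" the abstract mentions.
        with lock:
            return list(shared)

    threads = [threading.Thread(target=task, args=(i,)) for i in range(100)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(sum(snapshot()))       # aggregate result read from a snapshot (4950)

The point of the contrast is that tasks read and write one address space directly, rather than partitioning data and exchanging messages; the abstract's claim is that DVM makes such a view scale across a cluster where traditional distributed shared memory systems hit scalability bottlenecks.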

References

  1. Amazon Elastic Compute Cloud -- EC2. http://aws.amazon.com/ec2/. [last access: 11/2, 2011].
  2. Windows Azure. http://www.microsoft.com/windowsazure/. [last access: 11/2, 2011].
  3. Rackspace. http://www.rackspace.com/. [last access: 11/2, 2011].
  4. E. Allen, D. Chase, J. Hallett, V. Luchangco, J. Maessen, S. Ryu, G. Steele Jr, S. Tobin-Hochstadt, J. Dias, C. Eastlund, et al. The Fortress language specification. https://labs.oracle.com/projects/plrg/fortress.pdf, 2008. [last access: 11/2, 2011].
  5. Apache Hadoop. Hadoop Users List. http://wiki.apache.org/hadoop/PoweredBy. [last access: 11/2, 2011].
  6. Apache Mahout. Mahout machine learning libraries. http://mahout.apache.org/. [last access: 11/2, 2011].
  7. M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia. Above the clouds: A Berkeley view of cloud computing. UC Berkeley Technical Report UCB/EECS-2009-28, February 2009.
  8. P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield. Xen and the art of virtualization. In Proceedings of the 19th ACM Symposium on Operating Systems Principles, pages 164--177, 2003.
  9. L. Barroso and U. Hölzle. The datacenter as a computer: An introduction to the design of warehouse-scale machines. Synthesis Lectures on Computer Architecture, 4(1):1--108, 2009.
  10. L. Barroso, J. Dean, and U. Hölzle. Web search for a planet: The Google cluster architecture. IEEE Micro, 23(2):22--28, 2003.
  11. K. Birman, G. Chockler, and R. van Renesse. Toward a cloud computing research agenda. SIGACT News, 40(2):68--80, 2009.
  12. R. Buyya, T. Cortes, and H. Jin. Single system image. International Journal of High Performance Computing Applications, 15(2):124, 2001.
  13. B. Chamberlain, D. Callahan, and H. Zima. Parallel programmability and the Chapel language. International Journal of High Performance Computing Applications, 21(3):291, 2007.
  14. M. Chapman and G. Heiser. vNUMA: A virtual shared-memory multiprocessor. In Proceedings of the 2009 USENIX Annual Technical Conference, 2009.
  15. P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra, K. Ebcioglu, C. Von Praun, and V. Sarkar. X10: an object-oriented approach to non-uniform cluster computing. In ACM SIGPLAN Notices, volume 40, pages 519--538, 2005.
  16. D.-K. Chen, H.-M. Su, and P.-C. Yew. The impact of synchronization and granularity on parallel systems. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 239--248, 1990.
  17. Y. Chen, D. Pavlov, and J. F. Canny. Large-scale behavioral targeting. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 209--218, 2009.
  18. C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. Bradski, A. Y. Ng, and K. Olukotun. Map-Reduce for machine learning on multicore. In Proceedings of NIPS'07, pages 281--288, 2007.
  19. T. Condie, N. Conway, P. Alvaro, J. Hellerstein, K. Elmeleegy, and R. Sears. MapReduce online. In Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation, pages 21--21, 2010.
  20. J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation, volume 6, pages 137--150, 2004.
  21. J. Ekanayake, S. Pallickara, and G. Fox. MapReduce for data intensive scientific analysis. In Fourth IEEE International Conference on eScience, pages 277--284, 2008.
  22. MPI Forum. MPI: A message-passing interface standard. http://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf, 2009. [last access: 11/2, 2011].
  23. S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google file system. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP'03), pages 29--43, 2003.
  24. B. Hayes. Cloud computing. Communications of the ACM, 51(7):9--11, 2008.
  25. B. He, W. Fang, Q. Luo, N. Govindaraju, and T. Wang. Mars: a MapReduce framework on graphics processors. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, pages 260--269, 2008.
  26. B. Hedlund. Inverse virtualization for internet scale applications. http://bradhedlund.com/2011/03/16/inverse-virtualization-for-internet-scale-applications/. [last access: 11/2, 2011].
  27. M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In EuroSys '07: Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems, pages 59--72, 2007.
  28. H. Jégou, M. Douze, and C. Schmid. Improving bag-of-features for large scale image search. International Journal of Computer Vision, 87(3):316--336, 2010.
  29. P. Keleher, A. Cox, S. Dwarkadas, and W. Zwaenepoel. TreadMarks: Distributed shared memory on standard workstations and operating systems. In Proceedings of the 1994 Winter USENIX Conference, pages 115--131, 1994.
  30. A. Kivity, Y. Kamay, D. Laor, U. Lublin, and A. Liguori. KVM: the Linux virtual machine monitor. In Proceedings of the Linux Symposium, volume 1, pages 225--230, 2007.
  31. D. Lee, S. Baek, and K. Sung. Modified k-means algorithm for vector quantizer design. IEEE Signal Processing Letters, 4(1):2--4, 1997.
  32. K. Li and P. Hudak. Memory coherence in shared virtual memory systems. ACM Transactions on Computer Systems, 7(4):321--359, 1989.
  33. H. Lu, S. Dwarkadas, A. Cox, and W. Zwaenepoel. Message passing versus distributed shared memory on networks of workstations. In Proceedings of the IEEE/ACM Supercomputing '95 Conference, page 37, 1995.
  34. Z. Ma and L. Gu. The limitation of MapReduce: A probing case and a lightweight solution. In Proceedings of the 1st International Conference on Cloud Computing, GRIDs, and Virtualization, pages 68--73, 2010.
  35. J. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, page 14, 1967.
  36. MathWorks, Inc. Matlab. http://www.mathworks.com/products/matlab/. [last access: 11/2, 2011].
  37. B. Nitzberg and V. Lo. Distributed shared memory: A survey of issues and algorithms. Computer, 24(8):52--60, 1991.
  38. D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L. Youseff, and D. Zagorodnov. The Eucalyptus open-source cloud-computing system. In Proceedings of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, pages 124--131, 2009.
  39. P. J. Nurnberg, U. K. Wiil, and D. L. Hicks. A grand unified theory for structural computing. Metainformatics, 3002:1--16, 2004.
  40. R. Pike, S. Dorward, R. Griesemer, and S. Quinlan. Interpreting the data: Parallel analysis with Sawzall. Scientific Programming, 13(4):277--298, 2005.
  41. C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis. Evaluating MapReduce for multi-core and multiprocessor systems. In Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture, pages 13--24, 2007.
  42. Salesforce.com. http://www.salesforce.com. [last access: 11/2, 2011].
  43. M. C. Schatz. CloudBurst: highly sensitive read mapping with MapReduce. Bioinformatics, 25:1363--1369, 2009.
  44. The R Project. The R Language. http://www.r-project.org/. [last access: 11/2, 2011].
  45. C. Tseng. Compiler optimizations for eliminating barrier synchronization. In ACM SIGPLAN Notices, volume 30, pages 144--155, 1995.
  46. C. A. Waldspurger. Memory resource management in VMware ESX server. SIGOPS Operating Systems Review, 36(SI):181--194, 2002.
  47. H.-C. Yang, A. Dasdan, R.-L. Hsiao, and D. S. Parker. Map-Reduce-Merge: simplified relational data processing on large clusters. In SIGMOD '07: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pages 1029--1040, 2007.
  48. Y. Yu, M. Isard, D. Fetterly, M. Budiu, Ú. Erlingsson, P. K. Gunda, and J. Currey. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In Proceedings of the 8th Conference on Symposium on Operating Systems Design & Implementation, pages 1--14, 2008.
  49. M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling. In EuroSys '10: Proceedings of the 5th European Conference on Computer Systems, pages 265--278, 2010.
  50. R. Zhang and A. Rudnicky. A large scale clustering scheme for kernel k-means. Pattern Recognition, 4:40289, 2002.
  51. W. Zhao, H. Ma, and Q. He. Parallel k-means clustering based on MapReduce. In Proceedings of the First International Conference on Cloud Computing (CloudCom), pages 674--679, 2009.


          • Published in

            ACM SIGPLAN Notices, Volume 47, Issue 7 (VEE '12)
            July 2012
            229 pages
            ISSN: 0362-1340
            EISSN: 1558-1160
            DOI: 10.1145/2365864

            • VEE '12: Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments
              March 2012
              248 pages
              ISBN: 9781450311762
              DOI: 10.1145/2151024

            Copyright © 2012 ACM


            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 3 March 2012

            Qualifiers

            • research-article
