research-article

Distributed application configuration, management, and visualization with Plush

Published: 12 December 2011

Abstract

Support for distributed application management in large-scale networked environments remains in its early stages. Although a number of solutions exist for subtasks of application deployment, monitoring, and maintenance in distributed environments, few tools provide a unified framework for application management. Many of the existing tools address the management needs of a single type of application or service that runs in a specific environment, and these tools are not adaptable enough to be used for other applications or platforms. To this end, we present the design and implementation of Plush, a fully configurable application management infrastructure designed to meet the general requirements of several different classes of distributed applications. Plush allows developers to specifically define the flow of control needed by their computations using application building blocks. Through an extensible resource management interface, Plush supports execution in a variety of environments, including both live deployment platforms and emulated clusters. Plush also uses relaxed synchronization primitives for improving fault tolerance and liveness in failure-prone environments. To gain an understanding of how Plush manages different classes of distributed applications, we take a closer look at specific applications and evaluate how Plush provides support for each.

      • Published in

        ACM Transactions on Internet Technology  Volume 11, Issue 2
        December 2011
        130 pages
        ISSN:1533-5399
        EISSN:1557-6051
        DOI:10.1145/2049656

        Copyright © 2011 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 12 December 2011
        • Accepted: 1 May 2011
        • Revised: 1 July 2010
        • Received: 1 January 2009

        Qualifiers

        • research-article
        • Research
        • Refereed
