Distributed application configuration, management, and visualization with Plush

Authors:
Jeannie Albrecht (Williams College, Williamstown, MA)
Christopher Tuttle (Google)
Ryan Braud (Thousand Eyes)
Darren Dao (eHarmony)
Nikolay Topilski (Akamai Technologies)
Alex C. Snoeren (University of California, San Diego)
Amin Vahdat (University of California, San Diego)

ACM Transactions on Internet Technology, Volume 11, Issue 2, Article No. 6, pp. 1–41. https://doi.org/10.1145/2049656.2049658

Published: 12 December 2011

Abstract

Support for distributed application management in large-scale networked environments remains in its early stages. Although a number of solutions exist for subtasks of application deployment, monitoring, and maintenance in distributed environments, few tools provide a unified framework for application management. Many of the existing tools address the management needs of a single type of application or service that runs in a specific environment, and these tools are not adaptable enough to be used for other applications or platforms. To this end, we present the design and implementation of Plush, a fully configurable application management infrastructure designed to meet the general requirements of several different classes of distributed applications. Plush allows developers to specifically define the flow of control needed by their computations using application building blocks. Through an extensible resource management interface, Plush supports execution in a variety of environments, including both live deployment platforms and emulated clusters. Plush also uses relaxed synchronization primitives for improving fault tolerance and liveness in failure-prone environments. To gain an understanding of how Plush manages different classes of distributed applications, we take a closer look at specific applications and evaluate how Plush provides support for each.

Index Terms

1. Computer systems organization
  1. Architectures
    1. Distributed architectures
2. Software and its engineering
  1. Software organization and properties
    1. Software system structures
      1. Distributed systems organizing principles

Published in

ACM Transactions on Internet Technology Volume 11, Issue 2
December 2011
130 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/2049656

Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 12 December 2011
- Accepted: 1 May 2011
- Revised: 1 July 2010
- Received: 1 January 2009
Author Tags
Application management
PlanetLab
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total Citations: 8
- Total Downloads: 516
- Downloads (Last 12 months): 6
- Downloads (Last 6 weeks): 2