research-article

Sinfonia: A new paradigm for building scalable distributed systems

Published: 27 November 2009
Abstract

We propose a new paradigm for building scalable distributed systems. Our approach does not require dealing with message-passing protocols, a major complication in existing distributed systems. Instead, developers just design and manipulate data structures within our service called Sinfonia. Sinfonia keeps data for applications on a set of memory nodes, each exporting a linear address space. At the core of Sinfonia is a new minitransaction primitive that enables efficient and consistent access to data, while hiding the complexities that arise from concurrency and failures. Using Sinfonia, we implemented two very different and complex applications in a few months: a cluster file system and a group communication service. Our implementations perform well and scale to hundreds of machines.
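The abstract does not spell out the minitransaction interface, so the following is only a minimal sketch under stated assumptions. It assumes the form described in the full paper, where a minitransaction carries compare, read, and write items and takes effect only if every compare item matches. The names `MemoryNode`, `Minitransaction`, and `execute` are hypothetical, and the model runs in a single process with no networking, replication, failure handling, or bounds checking.

```python
from dataclasses import dataclass, field


class MemoryNode:
    """One memory node: a flat byte array addressed from 0 (a linear address space)."""

    def __init__(self, size: int) -> None:
        self.mem = bytearray(size)


@dataclass
class Minitransaction:
    # Item layouts are illustrative assumptions, not Sinfonia's actual API.
    compares: list = field(default_factory=list)  # (node_id, addr, expected_bytes)
    reads: list = field(default_factory=list)     # (node_id, addr, length)
    writes: list = field(default_factory=list)    # (node_id, addr, new_bytes)


def execute(nodes, tx):
    """Run tx against the in-process nodes.

    Returns (committed, read_results). If any compare item mismatches,
    nothing is written and the result is (False, []).
    """
    # Phase 1: check every compare item before touching any data.
    for node_id, addr, expected in tx.compares:
        actual = bytes(nodes[node_id].mem[addr:addr + len(expected)])
        if actual != expected:
            return False, []
    # Phase 2: collect reads, then apply writes.
    results = [bytes(nodes[n].mem[a:a + length]) for n, a, length in tx.reads]
    for node_id, addr, data in tx.writes:
        nodes[node_id].mem[addr:addr + len(data)] = data
    return True, results


# Usage: a conditional update that spans two memory nodes.
nodes = {0: MemoryNode(16), 1: MemoryNode(16)}
tx = Minitransaction(
    compares=[(0, 0, b"\x00")],                  # proceed only if node 0, byte 0 is still 0
    reads=[(1, 0, 4)],                           # return the old contents of node 1
    writes=[(0, 0, b"\x01"), (1, 0, b"data")],   # update both nodes together
)
first = execute(nodes, tx)
second = execute(nodes, tx)  # same request again: the compare item now fails
```

In this model the second call changes nothing, because the first call already overwrote the byte that the compare item checks. The application only describes which addresses to compare, read, and write, and never exchanges messages with the nodes directly, which is the style of programming the abstract describes.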

      • Published in

        ACM Transactions on Computer Systems, Volume 27, Issue 3
        November 2009
        80 pages
        ISSN:0734-2071
        EISSN:1557-7333
        DOI:10.1145/1629087
        Copyright © 2009 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 27 November 2009
        • Revised: 1 April 2009
        • Accepted: 1 April 2009
        • Received: 1 December 2008

        Qualifiers

        • research-article
        • Research
        • Refereed
