DOI: 10.1145/2670518.2673884
tutorial

A Highly Available Software Defined Fabric

Published: 27 October 2014

ABSTRACT

Existing SDNs rely on a collection of intricate, mutually-dependent mechanisms to implement a logically centralized control plane. These cyclical dependencies and lack of clean separation of concerns can impact the availability of SDNs, such that a handful of link failures could render entire portions of an SDN non-functional. This paper shows why and when this could happen, and makes the case for taking a fresh look at architecting SDNs for robustness to faults from the ground up. Our approach carefully synthesizes various key distributed systems ideas -- in particular, reliable flooding, global snapshots, and replicated controllers. We argue informally that it can offer high availability in the face of a variety of network failures, but much work needs to be done to make our approach scalable and general. Thus, our paper represents a starting point for a broader discussion on approaches for building highly available SDNs.

    • Published in

      HotNets-XIII: Proceedings of the 13th ACM Workshop on Hot Topics in Networks
      October 2014
      189 pages
      ISBN: 9781450332569
      DOI: 10.1145/2670518

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • tutorial
      • Research
      • Refereed limited

      Acceptance Rates

      HotNets-XIII Paper Acceptance Rate: 26 of 118 submissions, 22%
      Overall Acceptance Rate: 110 of 460 submissions, 24%