skip to main content
10.1145/1095810.1095817acmconferencesArticle/Chapter ViewAbstractPublication PagessospConference Proceedingsconference-collections
Article

Fault-scalable Byzantine fault-tolerant services

Published:20 October 2005Publication History

ABSTRACT

A fault-scalable service can be configured to tolerate increasing numbers of faults without significant decreases in performance. The Query/Update (Q/U) protocol is a new tool that enables construction of fault-scalable Byzantine fault-tolerant services. The optimistic quorum-based nature of the Q/U protocol allows it to provide better throughput and fault-scalability than replicated state machines using agreement-based protocols. A prototype service built using the Q/U protocol outperforms the same service built using a popular replicated state machine implementation at all system sizes in experiments that permit an optimistic execution. Moreover, the performance of the Q/U protocol decreases by only 36% as the number of Byzantine faults tolerated increases from one to five, whereas the performance of the replicated state machine decreases by 83%.

References

  1. M. Abd-El-Malek, G. R. Ganger, G. R. Goodson, M. K. Reiter, and J. J. Wylie. Lazy verification in fault-tolerant distributed storage systems. Symposium on Reliable Distributed Systems, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. M. Abd-El-Malek, G. R. Ganger, G. R. Goodson, M. K. Reiter, and J. J. Wylie. The Read/Conditional-Write and Query/Update protocols. Technical report CMU--PDL--05--107. Parallel Data Laboratory, Carnegie Mellon University, Pittsburgh, PA, August 2005.]]Google ScholarGoogle Scholar
  3. A. Adya, W. J. Bolosky, M. Castro, G. Cermak, R. Chaiken, J. R. Douceur, J. Howell, J. R. Lorch, M. Theimer, and R. P. Wattenhofer. FARSITE: federated, available, and reliable storage for an incompletely trusted environment. Symposium on Operating Systems Design and Implementation, pages 1--15. USENIX Association, 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. M. K. Aguilera, W. Chen, and S. Toueg. Failure detection and consensus in the crash-recovery model. Distributed Computing, 13(2):99--125. Springer-Verlag, 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. M. Bellare, R. Canetti, and H. Krawczyk. Keying hash functions for message authentication. Advances in Cryptology - CRYPTO, pages 1--15. Springer-Verlag, 1996.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. P. A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency control and recovery in database systems. Addison-Wesley, Reading, Massachusetts, 1987.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. G. Bracha and S. Toueg. Asynchronous consensus and broadcast protocols. Journal of the ACM, 32(4):824--840. ACM, October 1985.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. C. Cachin and J. A. Poritz. Secure intrusion-tolerant replication on the Internet. International Conference on Dependable Systems and Networks, pages 167--176. IEEE, 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. Castro and B. Liskov. Practical Byzantine fault tolerance and proactive recovery. ACM Transactions on Computer Systems, 20(4):398--461, November 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. G. Chockler, D. Malkhi, and M. Reiter. Backoff protocols for distributed mutual exclusion and ordering. International Conference on Distributed Computing Systems, pages 11--20. IEEE, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. C. P. Fry and M. K. Reiter. Nested objects in a Byzantine quorum-replicated system. Symposium on Reliable Distributed Systems. IEEE, 2004.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. J. Gray, P. Helland, P. O'Neil, and D. Shasha. The dangers of replication and a solution. ACM SIGMOD International Conference on Management of Data. Published as SIGMOD Record, 25(2):173--182. ACM, June 1996.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. S. D. Gribble, E. A. Brewer, J. M. Hellerstein, and D. Culler. Scalable, distributed data structures for internet service construction. Symposium on Operating Systems Design and Implementation, 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. Herlihy, V. Luchangco, and M. Moir. Obstruction-free synchronization: double-ended queues as an example. International Conference on Distributed Computing Systems, pages 522--529. IEEE, 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. P. Herlihy and J. M. Wing. Linearizability: a correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems, 12(3):463--492. ACM, July 1990.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. R. Jiménez-Peris, M. Patiño-Martínez, G. Alonso, and B. Kemme. Are quorums an alternative for data replication? ACM Transactions on Database Systems (TODS), 28(3):257--294. ACM, September 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. Katcher. PostMark: a new file system benchmark. Technical report TR3022. Network Appliance, October 1997.]]Google ScholarGoogle Scholar
  18. K. P. Kihlstrom, L. E. Moser, and P. M. Melliar-Smith. The SecureRing group communication system. ACM Transactions on Information and Systems Security, 1(4):371--406. IEEE, November 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. H. T. Kung and J. T. Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems, 6(2):213--226, June 1981.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. K. Kursawe. Optimistic Byzantine agreement. Symposium on Reliable Distributed Systems, pages 262--267. IEEE, 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. L. Lamport. The part-time parliament. ACM Transactions on Computer Systems, 16(2):133--169. ACM Press, May 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. L. Lamport, R. Shostak, and M. Pease. The Byzantine generals problem. ACM Transactions on Programming Languages and Systems, 4(3):382--401. ACM, July 1982.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. L. L. Lamport. The implementation of reliable distributed multiprocess systems. Computer Networks, 2:95--114, 1978.]]Google ScholarGoogle Scholar
  24. W. Litwin and T. Schwarz. LH*RS: a high-availability scalable distributed data structure using Reed Solomon Codes. ACM SIGMOD International Conference on Management of Data, pages 237--248. ACM, 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. J. MacCormick, N. Murphy, M. Najork, C. A. Thekkath, and L. Zhou. Boxwood: abstractions as the foundation for storage infrastructure. Symposium on Operating Systems Design and Implementation, pages 105--120. USENIX Association, 2004.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. D. Malkhi and M. Reiter. Byzantine quorum systems. Distributed Computing, 11(4):203--213. Springer-Verlag, 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. D. Malkhi, M. Reiter, and A. Wool. The load and availability of Byzantine quorum systems. SIAM Journal of Computing, 29(6):1889--1906. Society for Industrial and Applied Mathematics, April 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. D. Malkhi and M. K. Reiter. An architecture for survivable coordination in large distributed systems. IEEE Transactions on Knowledge and Data Engineering, 12(2). IEEE, April 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. J.-P. Martin and L. Alvisi. Fast Byzantine consensus. International Conference on Dependable Systems and Networks. IEEE, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. J.-P. Martin, L. Alvisi, and M. Dahlin. Minimal Byzantine storage. International Symposium on Distributed Computing, 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. R. Morris. Storage: from atoms to people. Keynote address at Conference on File and Storage Technologies, January 2002.]]Google ScholarGoogle Scholar
  32. M. Naor and A. Wool. The load, capacity, and availability of quorum systems. SIAM Journal on Computing, 27(2):423--447. SIAM, April 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. M. K. Reiter. The Rampart toolkit for building high-integrity services. Theory and Practice in Distributed Systems(Lecture Notes in Computer Science 938), pages 99--110, 1995.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. S. Rhea, P. Eaton, D. Geels, H. Weatherspoon, B. Zhao, and J. Kubiatowicz. Pond: the OceanStore prototype. Conference on File and Storage Technologies. USENIX Association, 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. R. L. Rivest. The MD5 message-digest algorithm, RFC--1321. Network Working Group, IETF, April 1992.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. F. B. Schneider. Implementing fault-tolerant services using the state machine approach: a tutorial. ACM Computing Surveys, 22(4):299--319, December 1990.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. P. Thambidurai and Y.-K. Park. Interactive consistency with multiple failure modes. Symposium on Reliable Distributed Systems, pages 93--100. IEEE, 1988.]]Google ScholarGoogle ScholarCross RefCross Ref
  38. R. van Renesse and F. B. Schneider. Chain replication for supporting high throughput and availability. Symposium on Operating Systems Design and Implementation, pages 91--104. USENIX Association, 2004.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. X. Wang, D. Feng, X. Lai, and H. Yu. Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD. Report 2004/199. Cryptology ePrint Archive, August 2004. http://eprint.iacr.org/.]]Google ScholarGoogle Scholar
  40. A. Wool. Quorum systems in replicated databases: science or fiction. Bull. IEEE Technical Committee on Data Engineering, 21(4):3--11. IEEE, December 1998.]]Google ScholarGoogle Scholar
  41. L. Zhou, F. B. Schneider, and R. V. Renesse. COCA: A secure distributed online certification authority. ACM Transactions on Computer Systems, 20(4):329--368. ACM, November 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Fault-scalable Byzantine fault-tolerant services

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          SOSP '05: Proceedings of the twentieth ACM symposium on Operating systems principles
          October 2005
          259 pages
          ISBN:1595930795
          DOI:10.1145/1095810
          • cover image ACM SIGOPS Operating Systems Review
            ACM SIGOPS Operating Systems Review  Volume 39, Issue 5
            SOSP '05
            December 2005
            290 pages
            ISSN:0163-5980
            DOI:10.1145/1095809
            Issue’s Table of Contents

          Copyright © 2005 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 20 October 2005

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • Article

          Acceptance Rates

          Overall Acceptance Rate131of716submissions,18%

          Upcoming Conference

          SOSP '24

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader