
Tolerating permanent and transient value faults

Distributed Computing

Abstract

Transmission faults allow us to reason about permanent and transient value faults in a uniform way. However, all existing solutions to consensus in this model either assume a synchronous system or require strong conditions for termination that exclude the case where all messages of a process can be corrupted. In this paper we introduce eventual consistency to overcome this limitation. Eventual consistency denotes the existence of rounds in which processes receive the same set of messages. We show how eventually consistent rounds can be simulated from eventually synchronous rounds, and how eventually consistent rounds can be used to solve consensus. Depending on the nature and number of permanent and transient transmission faults, we obtain different conditions on \(n\), the number of processes, under which consensus can be solved in our weak model.
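As an illustration only, the informal description of a consistent round given in the abstract can be sketched in Python. The data layout (`reception[p][q]` as the message that process `p` received from process `q`) and the function names are assumptions made for this sketch; the paper's formal communication predicate may be stronger than the literal reading used here.

```python
from typing import Dict, Hashable, List, Optional

# Hypothetical representation of one round: reception[p][q] is the message
# that process p received from process q (missing key = nothing received).
Reception = Dict[str, Dict[str, Hashable]]


def is_consistent_round(reception: Reception) -> bool:
    """True if every process received the same messages from the same senders."""
    vectors = list(reception.values())
    return all(v == vectors[0] for v in vectors[1:])


def first_consistent_round(rounds: List[Reception]) -> Optional[int]:
    """Index of the first consistent round, or None if no round is consistent."""
    for r, reception in enumerate(rounds):
        if is_consistent_round(reception):
            return r
    return None


rounds = [
    # Round 0: the message from s is corrupted on the link to q.
    {"p": {"p": 1, "q": 0, "s": 1}, "q": {"p": 1, "q": 0, "s": 7}},
    # Round 1: both processes receive identical vectors.
    {"p": {"p": 1, "q": 0, "s": 1}, "q": {"p": 1, "q": 0, "s": 1}},
]
print(first_consistent_round(rounds))  # prints 1
```

In this reading, a transmission fault shows up as two processes holding different entries for the same sender, and "eventual" means only that some such round exists in the run.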


Notes

  1. This assumption potentially allows corrupted messages on all links in a run; therefore it models dynamic faults.

  2. This assumption makes sense in the context of transient faults.

  3. We give three algorithms in order to show the generality of our approach.

  4. W.l.o.g., the same message is sent to all processes. Because of transmission faults, this does not prevent two processes \(p\) and \(q\) from receiving different messages from some process \(s\).

  5. The notion of a simulation differs from the notion of a translation of the HO model for benign faults. A translation establishes a relation based purely on connectivity, whereas with value faults some computation is also involved. We therefore use the term simulation instead.

  6. The sending function in a simulation algorithm is thus a function that maps \(states_p\) and the input from \(\mathcal {M}\) to a unique message from \(\mathcal {M}\); while the state-transition function \(T_{p}^r\) is a function that maps \(states_p\), the input from \(\mathcal {M}\), and a partial vector (indexed by \(\varPi \)) of elements of \(\mathcal {M}\) to \(states_p\).

  7. At line 16, the reception vector \(\varvec{\mu }_p^r\) is a vector of vectors: \(\varvec{\mu }_p^r[q']\) is the vector \(p\) has received from \(q'\), and \(\varvec{\mu }_p^r[q'][q]\) is element \(q\) of this vector.

  8. As in the description of Algorithm 1, for messages that contain a vector of messages we focus only on the vector elements related to the message sent by process \(p_2\).

    Fig. 4: Algorithm 2 from the point of view of \(v_2\) sent by \(p_2\); \(p_1\) is the coordinator, \(n=4\), \(f=1\)

  9. The two rounds of the \({ BOTR}\) algorithm can be merged into a single round in which the code of both state-transition functions is executed at once. We have split them into two rounds to emphasize the different communication predicates required.

  10. Consider two phases \(\phi _0\) and \(\phi _0+1\), such that a process has decided \(\bar{v}\) in phase \(\phi _0\). We consider the more general case in the presence of dynamic faults, and we assume that \(n = 5\), \(f=\alpha =1\) and \(T=4\). This means that at least \(T-\alpha =3\) processes have \(ts=\phi _0\) and \(vote=\bar{v}\). Suppose that, in phase \(\phi _0+1\), \((v,ts) \in possibleV_p\) at \(p\) with \(v\ne \bar{v}\). This means that \(p\), in round \(3(\phi _0\!+\!1)-2\), has received \(T=4\) messages with either \((v,ts,-)\), or \((-,ts',-)\) and \(ts'<ts\). Since \(n=5\) and \(T=4\), at least one of these messages is from a process \(c\) such that \(vote_c=\bar{v}\) and \(ts_c=\phi _0\). Since \(v\ne \bar{v}\), we must have \(\phi _0 < ts\). However, in phase \(\phi _0+1\), no process \(p\) can have \((v,ts)\) with \(ts> \phi _0\) in \(history_p\). Therefore, by line 31, we will not have \(v \in confirmedV\).

  11. Process \(p_1\) decided \(v_1\) by correctly receiving the messages from processes \(p_1\), \(p_2\) and \(p_3\) and the corrupted message \(\langle v_1,\phi _1,-\rangle \) from \(p_4\).

  12. This observation was already made in [19] and [4], but without algorithms supporting it.
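The counting step in note 10 can be checked by brute force. The sketch below is not taken from the paper: the helper `min_overlap` is a hypothetical name, and the final subtraction assumes that at most \(\alpha\) of the \(T\) received messages are corrupted, which is one reading of the note rather than a statement from the source.

```python
from itertools import combinations


def min_overlap(n: int, quorum: int, holders: int) -> int:
    """Smallest intersection between any `quorum` senders and any `holders`
    processes, both chosen among n processes (exhaustive enumeration)."""
    procs = range(n)
    return min(
        len(set(q) & set(h))
        for q in combinations(procs, quorum)
        for h in combinations(procs, holders)
    )


n, alpha, T = 5, 1, 4        # values used in note 10
holders = T - alpha          # processes with vote = v-bar and ts = phi_0
overlap = min_overlap(n, T, holders)

# The enumeration agrees with the closed form T + (T - alpha) - n.
assert overlap == T + holders - n

# Assumption of this sketch: at most alpha of the received messages are
# corrupted, so at least overlap - alpha arrive intact from a holder.
intact = overlap - alpha
print(overlap, intact)  # prints 2 1
```

Under that assumption, the remaining intact message is the one from process \(c\) that the note relies on.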

References

  1. Abraham, I., Chockler, G., Keidar, I., Malkhi, D.: Byzantine disk paxos: optimal resilience with byzantine shared memory. Distrib. Comput. 18(5), 387–408 (2006)


  2. Aguilera, M.K., Delporte-Gallet, C., Fauconnier, H., Toueg, S.: Consensus with byzantine failures and little system synchrony. In: Dependable Systems and Networks (DSN 2006), pp. 147–155 (2006)

  3. Anagnostou, E., Hadzilacos, V.: Tolerating transient and permanent failures (extended abstract). In: Proceedings of the 7th International Workshop on Distributed Algorithms, WDAG ’93, pp. 174–188. Springer, Berlin (1993)

  4. Biely, M., Charron-Bost, B., Gaillard, A., Hutle, M., Schiper, A., Widder, J.: Tolerating corrupted communication. In: Proceedings of the 26th Annual ACM Symposium on Principles of Distributed Computing (PODC’07). ACM Press (2007)

  5. Biely, M., Schmid, U., Weiss, B.: Synchronous consensus under hybrid process and link failures. Theor. Comput. Sci. 412(40), 5602–5630 (2011)


  6. Borran, F., Hutle, M., Schiper, A.: Timing analysis of leader-based and decentralized byzantine consensus algorithms. In: LADC, pp. 166–175 (2011)

  7. Borran, F., Schiper, A.: A leader-free byzantine consensus algorithm. In: ICDCN, pp. 67–78 (2010)

  8. Borran, F., Hutle, M., Santos, N., Schiper, A.: Quantitative analysis of consensus algorithms. IEEE Trans. Dependable Secur. Comput. 9(2), 236–249 (2012)


  9. Brasileiro, F.V., Greve, F., Mostéfaoui, A., Raynal, M.: Consensus in one communication step. In: Proceedings of the 6th International Conference on Parallel Computing Technologies, PaCT ’01, pp. 42–50. Springer, London (2001)

  10. Castro, M., Liskov, B.: Practical byzantine fault tolerance and proactive recovery. ACM Trans. Comput. Syst. 20(4), 398–461 (2002)


  11. Charron-Bost, B., Schiper, A.: The heard-of model: computing in distributed systems with benign faults. Distrib. Comput. 22(1), 49–71 (2009)

  12. Dwork, C., Lynch, N., Stockmeyer, L.: Consensus in the presence of partial synchrony. J. ACM 35(2), 288–323 (1988)


  13. Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. J. ACM 32(2), 374–382 (1985)


  14. Gafni, E.: Round-by-round fault detectors (extended abstract): unifying synchrony and asynchrony. In: Proceedings of the 16th Annual ACM Symposium on Principles of Distributed Computing (PODC’98), pp. 143–152. ACM Press, Puerto Vallarta (1998)

  15. Lamport, L.: Byzantizing paxos by refinement. In: Proceedings of the 25th International Conference on Distributed Computing, DISC’11, pp. 211–224. Springer, Berlin (2011)

  16. Lamport, L.: Fast paxos. Tech. Rep. MSR-TR-2005-12, Microsoft Research (2005)

  17. Lamport, L., Shostak, R., Pease, M.: The byzantine generals problem. ACM Trans. Program. Lang. Syst. 4(3), 382–401 (1982)


  18. Lamport, L.: The part-time parliament. ACM Trans. Comput. Syst. 16, 133–169 (1998)


  19. Lampson, B.: The abcd’s of paxos. In: Proceedings of the 19th Annual ACM Symposium on Principles of Distributed Computing (PODC’01), p. 13. ACM Press (2001)

  20. Martin, J.P., Alvisi, L.: Fast byzantine consensus. IEEE Trans. Dependable Secur. Comput. 3(3), 202–214 (2006)

  21. Milosevic, Z., Hutle, M., Schiper, A.: Unifying Byzantine consensus algorithms with weak interactive consistency. In: 12th International Conference on Principles of Distributed Systems (OPODIS 2009) (2009)

  22. Pease, M., Shostak, R., Lamport, L.: Reaching agreement in the presence of faults. J. ACM 27(2), 228–234 (1980)


  23. Pedone, F., Schiper, A., Urbán, P., Cavin, D.: Solving agreement problems with weak ordering oracles. In: Proceedings of the 4th European Dependable Computing Conference on Dependable Computing, EDCC-4, pp. 44–61. Springer, London (2002)

  24. Pinter, S.S., Shinahr, I.: Distributed agreement in the presence of communication and process failures. In: Proceedings of the 14th IEEE Convention of Electrical & Electronics Engineers in Israel. IEEE (1985)

  25. Rutti, O., Milosevic, Z., Schiper, A.: Generic construction of consensus algorithms for benign and byzantine faults. In: 2010 IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp. 343–352. IEEE Computer Society, Los Alamitos (2010)

  26. Santoro, N., Widmayer, P.: Time is not a healer. In: Proceedings of the 6th Annual Symposium on Theor. Aspects of Computer Science (STACS’89), LNCS, vol. 349, pp. 304–313. Springer, Paderborn (1989)

  27. Santoro, N., Widmayer, P.: Agreement in synchronous networks with ubiquitous faults. Theor. Comput. Sci. 384(2–3), 232–249 (2007)


  28. Sayeed, H.M., Abu-Amara, M., Abu-Amara, H.: Optimal asynchronous agreement and leader election algorithm for complete networks with byzantine faulty links. Distrib. Comput. 9, 147–156 (1995)


  29. Schmid, U., Weiss, B., Rushby, J.: Formally verified byzantine agreement in presence of link faults. In: 22nd International Conference on Distributed Computing Systems (ICDCS’02), pp. 608–616. Vienna, Austria (2002)

  30. Siu, H.S., Chin, Y.H., Yang, W.P.: Byzantine agreement in the presence of mixed faults on processors and links. IEEE Trans. Parallel Distrib. Syst. 9, 335–345 (1998)


  31. Yan, K.Q., Chin, Y.H., Wang, S.C.: Optimal agreement protocol in malicious faulty processors and faulty links. IEEE Trans. Knowl. Data Eng. 4(3), 266–280 (1992)



Author information

Correspondence to Zarko Milosevic.


About this article

Cite this article

Milosevic, Z., Hutle, M. & Schiper, A. Tolerating permanent and transient value faults. Distrib. Comput. 27, 55–77 (2014). https://doi.org/10.1007/s00446-013-0199-7


  • DOI: https://doi.org/10.1007/s00446-013-0199-7
