Tolerating permanent and transient value faults

Milosevic, Zarko; Hutle, Martin; Schiper, André

doi:10.1007/s00446-013-0199-7

Published: 01 November 2013

Volume 27, pages 55–77 (2014)

Distributed Computing

Abstract

Transmission faults allow us to reason about permanent and transient value faults in a uniform way. However, all existing solutions to consensus in this model either assume a synchronous system or require strong conditions for termination that exclude the case where all messages of a process can be corrupted. In this paper we introduce eventual consistency in order to overcome this limitation. Eventual consistency denotes the existence of rounds in which processes receive the same set of messages. We show how eventually consistent rounds can be simulated from eventually synchronous rounds, and how eventually consistent rounds can be used to solve consensus. Depending on the nature and number of permanent and transient transmission faults, we obtain different conditions on $n$, the number of processes, in order to solve consensus in our weak model.
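The informal notion of a consistent round stated in the abstract can be illustrated with a minimal Python sketch. The data layout (`reception[p][q]` is what process `p` received from `q`) and the function names are assumptions made for illustration only; the formal communication predicates in the article may impose further conditions that are not modelled here.

```python
from typing import Dict, Hashable, List

# Hypothetical representation: reception[p][q] is the message that process p
# received from process q in one round (q is absent if nothing was received).
Reception = Dict[str, Dict[str, Hashable]]


def is_consistent_round(reception: Reception) -> bool:
    """True if every process received exactly the same set of messages."""
    vectors = list(reception.values())
    return all(v == vectors[0] for v in vectors[1:])


def has_consistent_round(trace: List[Reception]) -> bool:
    """True if at least one round of the (finite) trace is consistent."""
    return any(is_consistent_round(r) for r in trace)


# Round 1: a transmission fault corrupts the message from s only on its way to q,
# so p and q end up with different reception vectors.
round1 = {
    "p": {"p": "a", "q": "b", "s": "c"},
    "q": {"p": "a", "q": "b", "s": "X"},
    "s": {"p": "a", "q": "b", "s": "c"},
}
# Round 2: the message from s is corrupted, but identically for every receiver.
round2 = {
    "p": {"p": "a", "q": "b", "s": "X"},
    "q": {"p": "a", "q": "b", "s": "X"},
    "s": {"p": "a", "q": "b", "s": "X"},
}
trace = [round1, round2]
```

In this sketch `round2` counts as consistent even though a message was corrupted, because all processes see the same vector. Whether the article's predicate also bounds the number of corrupted messages in such a round is not something the abstract states.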

Notes

This assumption potentially allows corrupted messages on all links in a run; therefore it models dynamic faults.
This assumption makes sense in the context of transient faults.
We give three algorithms in order to show the generality of our approach.
W.l.o.g., the same message is sent to all. Because of transmission faults, this does not prevent two processes $p$ and $q$ from receiving different messages from some process $s$.
The notion of a simulation differs from the notion of a translation in the HO model for benign faults. A translation establishes a relation based purely on connectivity, whereas with value faults some computation is also involved. For this reason, we use the term simulation instead.
The sending function in a simulation algorithm is thus a function that maps $states_p$ and the input from $\mathcal {M}$ to a unique message from $\mathcal {M}$; while the state-transition function $T_{p}^r$ is a function that maps $states_p$, the input from $\mathcal {M}$, and a partial vector (indexed by $\varPi $) of elements of $\mathcal {M}$ to $states_p$.
At line 16, the reception vector $\varvec{\mu }_p^r$ is a vector of vectors: $\varvec{\mu }_p^r[q']$ is the vector $p$ has received from $q'$, and $\varvec{\mu }_p^r[q'][q]$ is element $q$ of this vector.
As in the description of Algorithm 1, for messages that contain a vector of messages, we focus only on those elements of the vectors that are related to the message sent by process $p_2$.
Fig. 4: Algorithm 2 from the point of view of $v_2$ sent by $p_2$; $p_1$ is the coordinator, $n=4$, $f=1$.
The two rounds of the ${ BOTR}$ algorithm can be merged into a single round in which the code of both state-transition functions is executed at once. We have split them into two rounds to emphasize the different communication predicates required.
Consider two phases $\phi _0$ and $\phi _0+1$, such that a process has decided $\bar{v}$ in phase $\phi _0$. We consider the more general case in the presence of dynamic faults, and we assume that $n = 5$, $f=\alpha =1$ and $T=4$. This means that at least $T-\alpha =3$ processes have $ts=\phi _0$ and $vote=\bar{v}$. Consider in phase $\phi _0+1$ that $(v,ts) \in possibleV_p$ at $p$ with $v\ne \bar{v}$. This means that $p$, in round $3(\phi _0\!+\!1)-2$, has received $T=4$ messages with either $(v,ts,-)$, or $(-,ts',-)$ and $ts'<ts$. Since $n=5$ and $T=4$, at least one of these messages is from a process $c$ such that $vote_c=\bar{v}$ and $ts_c=\phi _0$. Since $v\ne \bar{v}$, we must have $\phi _0 < ts$. However, in phase $\phi _0+1$, no process $p$ can have $(v,ts)$ with $ts> \phi _0$ in $history_p$. Therefore, by line 31, we will not have $v \in confirmedV$.
Process $p_1$ decided $v_1$ by correctly receiving messages from processes $p_1$, $p_2$ and $p_3$, and the corrupted message $\langle v_1,\phi _1,-\rangle $ from $p_4$.
This observation was made already in [19] and [4], but without giving algorithms supporting the observation.
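The counting step in the note above that considers phases $\phi _0$ and $\phi _0+1$ (with $n=5$, $\alpha =1$, $T=4$) can be checked by brute force. The sketch below is only an illustration of that pigeonhole argument, not part of the article's algorithms; the helper name `min_overlap` is hypothetical. It confirms that any $T$ senders must include at least one of the $T-\alpha $ processes holding $vote=\bar{v}$ and $ts=\phi _0$.

```python
from itertools import combinations


def min_overlap(n: int, a: int, b: int) -> int:
    """Smallest possible intersection of an a-subset and a b-subset of n processes."""
    procs = range(n)
    # By symmetry it is enough to fix one b-subset and vary the a-subset.
    fixed = set(list(procs)[:b])
    return min(len(fixed & set(s)) for s in combinations(procs, a))


# Values taken from the note: n = 5, alpha = 1, T = 4.
n, T, alpha = 5, 4, 1
holders = T - alpha                    # processes with ts = phi_0 and vote = v-bar
overlap = min_overlap(n, T, holders)   # worst case over all sets of T senders
```

With these values the worst-case overlap agrees with the closed form $T + (T-\alpha ) - n = 2$, which is at least 1, as the note requires.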

References

Abraham, I., Chockler, G., Keidar, I., Malkhi, D.: Byzantine disk paxos: optimal resilience with byzantine shared memory. Distrib. Comput. 18(5), 387–408 (2006)
Aguilera, M.K., Delporte-Gallet, C., Fauconnier, H., Toueg, S.: Consensus with byzantine failures and little system synchrony. In: Dependable Systems and Networks (DSN 2006), pp. 147–155 (2006)
Anagnostou, E., Hadzilacos, V.: Tolerating transient and permanent failures (extended abstract). In: Proceedings of the 7th International Workshop on Distributed Algorithms, WDAG ’93, pp. 174–188. Springer, Berlin (1993)
Biely, M., Charron-Bost, B., Gaillard, A., Hutle, M., Schiper, A., Widder, J.: Tolerating corrupted communication. In: Proceedings of the 26th Annual ACM Symposium on Principles of Distributed Computing (PODC’07). ACM Press (2007)
Biely, M., Schmid, U., Weiss, B.: Synchronous consensus under hybrid process and link failures. Theor. Comput. Sci. 412(40), 5602–5630 (2011)
Borran, F., Hutle, M., Schiper, A.: Timing analysis of leader-based and decentralized byzantine consensus algorithms. In: LADC, pp. 166–175 (2011)
Borran, F., Schiper, A.: A leader-free byzantine consensus algorithm. In: ICDCN, pp. 67–78 (2010)
Borran, F., Hutle, M., Santos, N., Schiper, A.: Quantitative analysis of consensus algorithms. IEEE Trans. Dependable Secur. Comput. 9(2), 236–249 (2012)
Brasileiro, F.V., Greve, F., Mostéfaoui, A., Raynal, M.: Consensus in one communication step. In: Proceedings of the 6th International Conference on Parallel Computing Technologies, PaCT ’01, pp. 42–50. Springer, London (2001)
Castro, M., Liskov, B.: Practical byzantine fault tolerance and proactive recovery. ACM Trans. Comput. Syst. 20(4), 398–461 (2002)
Charron-Bost, B., Schiper, A.: The heard-of model: computing in distributed systems with benign faults. Distrib. Comput. 22(1), 49–71 (2009)
Dwork, C., Lynch, N., Stockmeyer, L.: Consensus in the presence of partial synchrony. J. ACM 35(2), 288–323 (1988)
Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. J. ACM 32(2), 374–382 (1985)
Gafni, E.: Round-by-round fault detectors (extended abstract): unifying synchrony and asynchrony. In: Proceedings of the 16th Annual ACM Symposium on Principles of Distributed Computing (PODC’98), pp. 143–152. ACM Press, Puerto Vallarta (1998)
Lamport, L.: Byzantizing paxos by refinement. In: Proceedings of the 25th International Conference on Distributed Computing, DISC’11, pp. 211–224. Springer, Berlin (2011)
Lamport, L.: Fast paxos. Tech. Rep. MSR-TR-2005-12, Microsoft Research (2005)
Lamport, L., Shostak, R., Pease, M.: The byzantine generals problem. ACM Trans. Program. Lang. Syst. 4(3), 382–401 (1982)
Lamport, L.: The part-time parliament. ACM Trans. Comput. Syst. 16, 133–169 (1998)
Lampson, B.: The abcd’s of paxos. In: Proceedings of the 19th Annual ACM Symposium on Principles of Distributed Computing (PODC’01), p. 13. ACM Press (2001)
Martin, J.P., Alvisi, L.: Fast byzantine consensus. IEEE Trans. Dependable Secur. Comput. 3(3), 202–214 (2006)
Milosevic, Z., Hutle, M., Schiper, A.: Unifying Byzantine consensus algorithms with weak interactive consistency. In: 12th International Conference on Principles of Distributed Systems (OPODIS 2009) (2009)
Pease, M., Shostak, R., Lamport, L.: Reaching agreement in the presence of faults. J. ACM 27(2), 228–234 (1980)
Pedone, F., Schiper, A., Urbán, P., Cavin, D.: Solving agreement problems with weak ordering oracles. In: Proceedings of the 4th European Dependable Computing Conference on Dependable Computing, EDCC-4, pp. 44–61. Springer, London (2002)
Pinter, S.S., Shinahr, I.: Distributed agreement in the presence of communication and process failures. In: Proceedings of the 14th IEEE Convention of Electrical & Electronics Engineers in Israel. IEEE (1985)
Rutti, O., Milosevic, Z., Schiper, A.: Generic construction of consensus algorithms for benign and byzantine faults. In: 2010 IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp. 343–352. IEEE Computer Society, Los Alamitos (2010)
Santoro, N., Widmayer, P.: Time is not a healer. In: Proceedings of the 6th Annual Symposium on Theor. Aspects of Computer Science (STACS’89), LNCS, vol. 349, pp. 304–313. Springer, Paderborn (1989)
Santoro, N., Widmayer, P.: Agreement in synchronous networks with ubiquitous faults. Theor. Comput. Sci. 384(2–3), 232–249 (2007)
Sayeed, H.M., Abu-Amara, M., Abu-Amara, H.: Optimal asynchronous agreement and leader election algorithm for complete networks with byzantine faulty links. Distrib. Comput. 9, 147–156 (1995)
Schmid, U., Weiss, B., Rushby, J.: Formally verified byzantine agreement in presence of link faults. In: 22nd International Conference on Distributed Computing Systems (ICDCS’02), pp. 608–616. Austria, Vienna (2002)
Siu, H.S., Chin, Y.H., Yang, W.P.: Byzantine agreement in the presence of mixed faults on processors and links. IEEE Trans. Parallel Distrib. Syst. 9, 335–345 (1998)
Yan, K.Q., Chin, Y.H., Wang, S.C.: Optimal agreement protocol in malicious faulty processors and faulty links. IEEE Trans. Knowl. Data Eng. 4(3), 266–280 (1992)

Author information

Authors and Affiliations

EPFL, 1015 Lausanne, Switzerland
Zarko Milosevic & André Schiper
Fraunhofer AISEC, Garching, Germany
Martin Hutle

Corresponding author

Correspondence to Zarko Milosevic.

About this article

Cite this article

Milosevic, Z., Hutle, M. & Schiper, A. Tolerating permanent and transient value faults. Distrib. Comput. 27, 55–77 (2014). https://doi.org/10.1007/s00446-013-0199-7

Received: 23 May 2012
Accepted: 15 October 2013
Published: 01 November 2013
Issue Date: February 2014
DOI: https://doi.org/10.1007/s00446-013-0199-7
