Abstract
Transmission faults allow us to reason about permanent and transient value faults in a uniform way. However, all existing solutions to consensus in this model either assume a synchronous system or require strong conditions for termination, which exclude the case where all messages of a process can be corrupted. In this paper we introduce eventual consistency in order to overcome this limitation. Eventual consistency denotes the existence of rounds in which processes receive the same set of messages. We show how eventually consistent rounds can be simulated from eventually synchronous rounds, and how eventually consistent rounds can be used to solve consensus. Depending on the nature and number of permanent and transient transmission faults, we obtain different conditions on \(n\), the number of processes, in order to solve consensus in our weak model.
Notes
This assumption potentially allows corrupted messages on all links in a run; therefore it models dynamic faults.
This assumption makes sense in the context of transient faults.
We give three algorithms in order to show the generality of our approach.
W.l.o.g., the same message is sent to all. Because of transmission faults, this does not prevent two processes \(p\) and \(q\) from receiving different messages from some process \(s\).
The notion of a simulation differs from the notion of a translation in the HO model for benign faults. A translation establishes a relation purely based on connectivity, whereas with value faults some computation is also involved. We therefore use the term simulation instead.
The sending function in a simulation algorithm is thus a function that maps \(states_p\) and the input from \(\mathcal {M}\) to a unique message from \(\mathcal {M}\); while the state-transition function \(T_{p}^r\) is a function that maps \(states_p\), the input from \(\mathcal {M}\), and a partial vector (indexed by \(\varPi \)) of elements of \(\mathcal {M}\) to \(states_p\).
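The two signatures described in this note can be written compactly as follows. This is only a notational sketch: the name \(S_p^r\) for the sending function and the arrow \(\rightharpoonup\) for a partial vector are assumptions introduced here for illustration.

\[
S_p^r : states_p \times \mathcal{M} \rightarrow \mathcal{M}, \qquad
T_p^r : states_p \times \mathcal{M} \times (\varPi \rightharpoonup \mathcal{M}) \rightarrow states_p
\]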
At line 16, the reception vector \(\varvec{\mu }_p^r\) is a vector of vectors: \(\varvec{\mu }_p^r[q']\) is the vector \(p\) has received from \(q'\), and \(\varvec{\mu }_p^r[q'][q]\) is element \(q\) of this vector.
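The nested indexing in this note can be pictured with a small Python sketch. The process names, the message values and the helper `element` are hypothetical and serve only to show how \(\varvec{\mu }_p^r[q'][q]\) is read; they are not taken from the algorithm itself.

```python
from typing import Dict, Optional

Message = str
# A partial vector indexed by process id; None marks a missing element.
Vector = Dict[str, Optional[Message]]

# mu_p[q_prime] is the vector that p received from q_prime in this round.
# Here p received vectors from p1 and p2, and nothing from p3 (illustrative data).
mu_p: Dict[str, Vector] = {
    "p1": {"p1": "a", "p2": "b", "p3": None},
    "p2": {"p1": "a", "p2": "b", "p3": "c"},
}


def element(mu: Dict[str, Vector], q_prime: str, q: str) -> Optional[Message]:
    """Return mu[q_prime][q], or None if no vector was received from q_prime."""
    vector = mu.get(q_prime)
    return None if vector is None else vector.get(q)
```

For example, `element(mu_p, "p2", "p3")` reads element \(p_3\) of the vector received from \(p_2\).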
As in the description of Algorithm 1, for messages that contain a vector of messages, we focus only on those elements of the vectors that are related to the message sent by process \(p_2\).
The two rounds of the \({ BOTR}\) algorithm can be merged into a single round in which the code of both state-transition functions is executed at once. We have split them into two rounds to emphasize the different communication predicates required.
Consider two phases \(\phi _0\) and \(\phi _0+1\), such that a process has decided \(\bar{v}\) in phase \(\phi _0\). We consider the more general case in the presence of dynamic faults, and we assume that \(n = 5\), \(f=\alpha =1\) and \(T=4\). This means that at least \(T-\alpha =3\) processes have \(ts=\phi _0\) and \(vote=\bar{v}\). Consider in phase \(\phi _0+1\) that \((v,ts) \in possibleV_p\) at \(p\) with \(v\ne \bar{v}\). This means that \(p\), in round \(3(\phi _0\!+\!1)-2\), has received \(T=4\) messages with either \((v,ts,-)\), or \((-,ts',-)\) and \(ts'<ts\). Since \(n=5\) and \(T=4\), at least one of these messages is from a process \(c\) such that \(vote_c=\bar{v}\) and \(ts_c=\phi _0\). Since \(v\ne \bar{v}\), we must have \(\phi _0 < ts\). However, in phase \(\phi _0+1\), no process \(p\) can have \((v,ts)\) with \(ts> \phi _0\) in \(history_p\). Therefore, by line 31, we will not have \(v \in confirmedV\).
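The counting step in this note (at least one of the \(T\) received messages comes uncorrupted from a process holding \((\bar{v},\phi _0)\)) can be checked by brute force for the stated values \(n=5\), \(\alpha =1\), \(T=4\). The sketch below assumes that \(\alpha \) bounds the number of corrupted messages a process receives in one round and that, in the worst case, every corruption hits a message from a holder. The function name and the numbering of processes are hypothetical.

```python
from itertools import combinations


def min_intact_holder_messages(n: int, alpha: int, T: int) -> int:
    """Worst-case number of uncorrupted messages from holders among T received.

    Holders are the T - alpha processes with ts = phi_0 and vote = v_bar
    (numbered 0 .. T - alpha - 1 here, purely for illustration).
    """
    holders = set(range(T - alpha))
    worst = None
    # Enumerate every possible set of T senders out of n processes.
    for senders in combinations(range(n), T):
        from_holders = len(holders.intersection(senders))
        # Assumption: up to alpha received messages may be corrupted,
        # and the worst case corrupts only messages sent by holders.
        intact = max(from_holders - alpha, 0)
        worst = intact if worst is None else min(worst, intact)
    return worst


result = min_intact_holder_messages(n=5, alpha=1, T=4)
```

Under these assumptions the result agrees with the closed form \(T + (T-\alpha ) - n - \alpha = 4 + 3 - 5 - 1 = 1\), which is the single message from process \(c\) used in the argument.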
Process \(p_1\) decided \(v_1\) by correctly receiving messages from processes \(p_1\), \(p_2\) and \(p_3\) and the corrupted message \(\langle v_1,\phi _1,-\rangle \) from \(p_4\).
Cite this article
Milosevic, Z., Hutle, M. & Schiper, A. Tolerating permanent and transient value faults. Distrib. Comput. 27, 55–77 (2014). https://doi.org/10.1007/s00446-013-0199-7
DOI: https://doi.org/10.1007/s00446-013-0199-7