Title: Fault-Tolerant Computing with Unreliable Channels (Extended Version)

Abstract: We study implementations of basic fault-tolerant primitives, such as consensus and registers, in message-passing systems subject to process crashes and a broad range of communication failures. Our results characterize the necessary and sufficient conditions for implementing these primitives as a function of the connectivity constraints and synchrony assumptions. Our main contribution is a new algorithm for partially synchronous consensus that is resilient to process crashes and channel failures and is optimal in its connectivity requirements. In contrast to prior work, our algorithm assumes the most general model of message loss where faulty channels are flaky, i.e., can lose messages without any guarantee of fairness. This failure model is particularly challenging for consensus algorithms, as it rules out standard solutions based on leader oracles and failure detectors. To circumvent this limitation, we construct our solution using a new variant of the recently proposed view synchronizer abstraction, which we adapt to the crash-prone setting with flaky channels.
Comments: Extended version of a paper in the 27th International Conference on Principles of Distributed Systems (OPODIS 2023)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as: arXiv:2305.15150 [cs.DC]
  (or arXiv:2305.15150v2 [cs.DC] for this version)

Submission history

From: Alejandro Naser-Pastoriza
[v1] Wed, 24 May 2023 13:49:05 GMT (372kb,D)
[v2] Tue, 28 Nov 2023 16:29:33 GMT (95kb,D)
