eCommons

 

Approximating Perfect Failure Detectors in Asynchronous Distributed Systems

Other Titles

Abstract

Systems that provide group membership services (e.g., Isis) can be used to solve many canonical problems in distributed systems, such as Consensus and Leader Election. However, solving these problems has been shown to require {\em failure detectors} of various strengths, all of which are impossible to implement in an asynchronous system. It has been shown that it is possible to solve Consensus in an asynchronous distributed system using a very weak failure detector. However, existing group membership systems do not appear to implement this type of failure detector. Furthermore, there are other problems that can be solved with group membership services, such as Leader Election, that are harder than Consensus and cannot be solved without a perfect failure detector. This leads to an apparent contradiction: how can provably impossible problems be solved in existing group membership systems? In this dissertation, we investigate {\em approximations} to perfect failure detectors in order to obtain acceptable and practical solutions to otherwise-impossible problems such as Election. We argue that approximating a perfect failure detector yields approximate solutions to these problems, and define a notion of approximation that yields solutions that are useful in practice. We give a formal specification of an approximately-perfect failure detector, give two protocols that implement our approximation, and derive upper bounds on the number of tolerable failures for any such protocol. The approximately-perfect failure detector presented in this dissertation is similar to that used in Isis; this implies that the failure detector implemented in group membership services such as Isis are actually approximately-perfect failure detectors, and that the resulting solutions to Consensus, Election, and other problems are approximate solutions. Furthermore, the protocols that implement our specification are very simple and can be easily implemented. Such an implementation could be used in existing or future group membership services, or by other systems programmers for whom group membership services are not necessary but for whom failure detection would be useful.

Journal / Series

Volume & Issue

Description

Sponsorship

Date Issued

1995-10

Publisher

Cornell University

Keywords

computer science; technical report

Location

Effective Date

Expiration Date

Sector

Employer

Union

Union Local

NAICS

Number of Workers

Committee Chair

Committee Co-Chair

Committee Member

Degree Discipline

Degree Name

Degree Level

Related Version

Related DOI

Related To

Related Part

Based on Related Item

Has Other Format(s)

Part of Related Item

Related To

Related Publication(s)

Link(s) to Related Publication(s)

References

Link(s) to Reference(s)

Previously Published As

http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cs/TR95-1550

Government Document

ISBN

ISMN

ISSN

Other Identifiers

Rights

Rights URI

Types

technical report

Accessibility Feature

Accessibility Hazard

Accessibility Summary

Link(s) to Catalog Record