eCommons

 

Reducing Costs Of Byzantine Fault Tolerant Distributed Applications

Other Titles

Author(s)

Abstract

Byzantine fault tolerance (BFT) is a powerful technique for building software that tolerates arbitrary failures. The technique has been developed since the 70s and has a rich research literature. Yet no production system has adopted the technique, despite an increasing need from today's large and complex systems. An important reason is that a BFT system incurs significantly higher costs than a crash-tolerant counterpart. The costs include developmental costs, as a BFT system is harder to design and implement correctly, and operational costs, as a BFT system requires to run on more machines, employs more expensive cryptographic operations, and sends more and larger messages. This dissertation attempts to reduce both developmental and operational costs of BFT distributed applications. In the first part, we propose an approach to translate existing crash-tolerant systems to Byzantine-fault-tolerant. Our approach makes fewer assumptions about the source crash-tolerant distributed applications, and thus it is applicable to a larger set of applications than existing approaches. We propose and prove correct our basic approach. Then we extend the basic approach to large scale distributed applications and propose additional mechanisms to deal with practical problems in such settings-namely, replication and host churns. We evaluate our scalable translation approach by a simulation and case studies. The second part of the dissertation presents a novel Byzantine replication approach that focuses on reducing resource costs (but can also be used as a translation). By leveraging an external reconfiguration service, our replication approach requires only t + 1 replicas and t witnesses (lighter hosts) to tolerate t Byzantine faults. Our approach also uses inexpensive HMAC signatures and imposes a chain communication pattern between the replicas to reduce CPU and network consumptions. We also propose a variation of the approach, in which Byzantine hosts do not forge CRC checksums, to further reduce the resource costs. In particular, it uses t + 1 replicas, CRC checksums, and t fewer rounds of communication. We evaluate the performance of a prototype of each variation in the common case, when it runs without failures.

Journal / Series

Volume & Issue

Description

Sponsorship

Date Issued

2011-08-31

Publisher

Keywords

Byzantine fault tolerance; asynchronous distributed applications; developmental costs; operational costs

Location

Effective Date

Expiration Date

Sector

Employer

Union

Union Local

NAICS

Number of Workers

Committee Chair

Van Renesse, Robbert

Committee Co-Chair

Committee Member

Easley, David Alan
Halpern, Joseph Yehuda

Degree Discipline

Computer Science

Degree Name

Ph. D., Computer Science

Degree Level

Doctor of Philosophy

Related Version

Related DOI

Related To

Related Part

Based on Related Item

Has Other Format(s)

Part of Related Item

Related To

Related Publication(s)

Link(s) to Related Publication(s)

References

Link(s) to Reference(s)

Previously Published As

Government Document

ISBN

ISMN

ISSN

Other Identifiers

Rights

Rights URI

Types

dissertation or thesis

Accessibility Feature

Accessibility Hazard

Accessibility Summary

Link(s) to Catalog Record