eCommons

 

Analysis And Design Of Advanced Caching Solutions For The Modern Web

Other Titles

Author(s)

Abstract

The rise of modern Web applications has seen a surge in the quantity of digital content - photos, videos, and user interactions - stored, accessed, and transmitted by Internet services. To better handle such content, popular Web services, such as Facebook, have deployed large cache tiers within their serving stacks to lessen the load on backend systems and to decrease the data-request latency for users. Designing such cache infrastructures exposes challenges across three dimensions: (1) Modern Web workloads differ from earlier traditional workloads, due to the large amount of user-created content, as well as the frequency of updates due to user interactions; therefore, more advanced cache-replacement policies are required to provide sustained high hit-ratios, the key metric for caching performance. (2) Flash devices are extensively used due to their cost advantage over DRAM and their significantly higher I/O performance than magnetic disks; however, flash often yields low performance and high wearing costs with the small random writes that advanced caching algorithms tend to generate. (3) Load balance is critical for the scalability of an entire cache-server tier; however, different content within the modern Web may have disparate popularities, and the dependent data-partition mechanism is often used to co-locate relative data in favor of advanced application queries. Both scenarios exacerbate the imbalance. In this dissertation, we use two existing cache systems within the Facebook infrastructure - the photo-cache as a representative of static-content cache, and the Tao cache as an example of in-memory cache solutions - to address the three challenges men- tioned above. First, we examine the workload caused by accessing photos on Facebook and the effectiveness of the many layers of photo caches that have been deployed. By analyzing an event stream covering almost 80 million requests for more than 1 million unique photos, we are able to study cache-access patterns, evaluate current cache efficacy, and explore the potential performance benefits of certain advanced eviction algorithms at multiple cache layers. Second, when building advanced cache on flash devices, in order to address the performance degradation caused by small random writes, we propose the novel RIPQ (Restricted Insertion Priority Queue) framework. RIPQ allows for advanced caching algorithms with large cache sizes, high throughput, and long device lifetime. RIPQ maintains an approximate priority queue efficiently on flash by aggregating small random writes into large writes to a restricted set of insertion points, lazily moving items, and co-locating items with similar priorities. We show that two families of advanced caching algorithms, Segmented-LRU and GDSF (Greedy-Dual-Size-Frequency), can be easily implemented with RIPQ. Our evaluation builds on traces from Facebook's photoserving stack shows that GDSF algorithms with RIPQ can improve cache hit ratios by 20% over the current production system, while incurring low overhead and achieving high throughput. Third, we investigate the principal causes of load imbalance - including data colocation, non-ideal hashing scenario, and hot-spot temporal effects. We analyze Facebook's runtime traffic against the partitioned cache tier in front of Tao, a subsystem that stores objects associated with Facebook's social graph. As part of this investigation, we also employ trace-drive analytics to study the benefits and limitations of current loadbalancing methods - including consistent hashing and hot-partition replication (with front-end caching as a special case) - and we suggest areas for future research.

Journal / Series

Volume & Issue

Description

Sponsorship

Date Issued

2014-08-18

Publisher

Keywords

Location

Effective Date

Expiration Date

Sector

Employer

Union

Union Local

NAICS

Number of Workers

Committee Chair

Birman, Kenneth Paul

Committee Co-Chair

Van Renesse, Robbert

Committee Member

Orman, Levent V.
Snavely, Keith Noah

Degree Discipline

Computer Science

Degree Name

Ph. D., Computer Science

Degree Level

Doctor of Philosophy

Related Version

Related DOI

Related To

Related Part

Based on Related Item

Has Other Format(s)

Part of Related Item

Related To

Related Publication(s)

Link(s) to Related Publication(s)

References

Link(s) to Reference(s)

Previously Published As

Government Document

ISBN

ISMN

ISSN

Other Identifiers

Rights

Rights URI

Types

dissertation or thesis

Accessibility Feature

Accessibility Hazard

Accessibility Summary

Link(s) to Catalog Record