Dependable Decentralized Storage Management for Cloud Computing

Citation:
Paulo J. 2015. Dependable Decentralized Storage Management for Cloud Computing.

Thesis Type:

PhD Thesis

Abstract:

The volume of worldwide digital information is growing and will continue to grow at an impressive rate. Storage deduplication is accepted as a valuable technique for handling this data explosion. By eliminating unnecessary duplicate content from storage systems, it reduces both hardware and storage management costs. Nowadays, this technique is applied to distinct storage types, and it is increasingly desired in cloud computing infrastructures, where a significant portion of worldwide data is stored. However, designing a deduplication system for cloud infrastructures is a complex task, as duplicates must be found and eliminated across a distributed cluster that supports virtual machines and applications with strict storage performance requirements. The core of this dissertation addresses precisely the challenges of deduplication in cloud infrastructures. We start by surveying and comparing the existing deduplication systems and the distinct storage environments they target. This discussion is missing from the literature and is important for understanding the novel issues that cloud deduplication systems must address. Then, as our main contribution, we introduce our own deduplication system, which eliminates duplicates across virtual machine volumes in a distributed cloud infrastructure. Redundant content is found and removed in a cluster-wide fashion while having a negligible impact on the performance of applications using the deduplicated volumes.
Our prototype is evaluated in a real distributed setting with a benchmark suited for deduplication systems, which is also a contribution of this dissertation.

Citation Key:

pp15

Attachment:

pp15.pdf (4.93 MB)