Clouder: A Flexible Large Scale Decentralized Object Store

Citation:
Vilaça R. 2012. Clouder: A Flexible Large Scale Decentralized Object Store.

Thesis Type:

PhD Thesis

Abstract:

Large-scale data stores were initially introduced to support a few concrete extreme-scale applications, such as social networks. Their scalability and availability requirements often justify sacrificing richer data and processing models, and even elementary data consistency. In strong contrast with traditional relational database management systems (RDBMS), large-scale data stores offer very simple data models and APIs that lack most of the established relational data management operations, and they relax consistency guarantees, providing eventual consistency. With a number of alternatives now available and mature, there is an increasing willingness to use them in a wider and more diverse spectrum of applications, by skewing the current trade-off towards the needs of common business users and easing migration from current RDBMS. This is particularly so when they are used in the context of a Cloud solution, such as a Platform as a Service (PaaS). This thesis aims at reducing the gap between traditional RDBMS and large-scale data stores by seeking mechanisms that provide additional consistency guarantees and higher-level data processing primitives in large-scale data stores. The devised mechanisms should not hinder the scalability and dependability of large-scale data stores. Regarding higher-level data processing primitives, this thesis explores two complementary approaches: extending data stores with additional operations, such as general multi-item operations; and coupling data stores with other existing processing facilities without hindering scalability. We address these challenges with a new architecture for large-scale data stores, efficient multi-item access for large-scale data stores, and SQL processing atop large-scale data stores. The novel architecture makes it possible to find the right trade-offs among flexible usage, efficiency, and fault tolerance.
To efficiently support multi-item access, we extend the data models of first-generation large-scale data stores with tags and a multi-tuple data placement strategy, which allow large sets of related data to be stored and retrieved efficiently at once. For efficient SQL support atop scalable data stores, we devise design modifications to existing relational SQL query engines that allow them to be distributed. We demonstrate our approaches with running prototypes and an extensive experimental evaluation using appropriate workloads.

Citation Key:

phd
Attachment: ricardo_manuel_pereira_vilaca.pdf (5.91 MB)