
HReplica: A Dynamic Data Replication Engine with Adaptive Compression for Multi-Tiered Storage

Authors: H. Devarajan, A. Kougkas, X.-H. Sun

Date: December, 2020

Venue: The 2020 IEEE International Conference on Big Data (Big Data'20), December 10-13, 2020

Type: Conference

Abstract

As the diversity of big data applications increases, their requirements diverge and often conflict with one another. Managing this diversity in any supercomputer or data center is a major challenge for system designers. Data replication is a popular approach to meeting several of these requirements, such as low latency, read availability, and durability. This approach can be enhanced with modern heterogeneous hardware and with software techniques such as data compression. However, these two enhancements work in isolation, to the detriment of both. In this work, we present HReplica: a dynamic data replication engine that harmoniously leverages data compression and hierarchical storage to increase the effectiveness of data replication. We have developed a novel dynamic selection algorithm that facilitates the optimal matching of replication schemes, compression libraries, and storage tiers. Our evaluation shows that HReplica can improve scientific and cloud application performance by 5.2x compared to other state-of-the-art replication schemes.

Index Terms: data replication, dynamic, selection algorithm, multi-tiered, data compression, intelligent selection, dynamic programming, cloud application, scientific application, big data.
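The abstract names a dynamic selection algorithm (indexed under dynamic programming) but does not describe it. As a rough illustration of the kind of matching problem involved, the sketch below picks a storage tier and a compression codec for each replica of one object so that the total estimated time is minimized. Everything in it is a hypothetical assumption, not HReplica's actual algorithm or API: the names `Tier`, `Codec`, and `plan_replicas`, the rule that every replica lands on a distinct tier, and the cost model (compression time plus write time of the compressed bytes). The dynamic program memoizes over the number of replicas still to place and the set of tiers already used.

```python
import math
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Tier:
    name: str
    capacity_mb: int        # hypothetical free space on the tier, in MB
    bandwidth_mbps: float   # hypothetical write bandwidth, in MB/s


@dataclass(frozen=True)
class Codec:
    name: str
    ratio: float            # original size / compressed size
    speed_mbps: float       # compression throughput, MB/s (inf = no compression)


def plan_replicas(size_mb, n_replicas, tiers, codecs):
    """Hypothetical planner: place n_replicas copies of a size_mb object,
    each on a distinct tier, minimizing the summed estimated time.
    Returns (total_seconds, ((tier_name, codec_name), ...)) or None."""
    # Precompute every feasible (tier, codec) pairing and its assumed cost.
    options = []
    for ti, tier in enumerate(tiers):
        for codec in codecs:
            stored_mb = math.ceil(size_mb / codec.ratio)
            if stored_mb > tier.capacity_mb:
                continue  # the compressed replica does not fit on this tier
            cost = size_mb / codec.speed_mbps + stored_mb / tier.bandwidth_mbps
            options.append((ti, codec.name, cost))

    @lru_cache(maxsize=None)
    def best(remaining, used_mask):
        # DP state: replicas left to place, bitmask of tiers already holding one.
        if remaining == 0:
            return 0.0, ()
        result = None
        for ti, codec_name, cost in options:
            if used_mask & (1 << ti):
                continue  # assumed rule: at most one replica per tier
            sub = best(remaining - 1, used_mask | (1 << ti))
            if sub is None:
                continue
            total = cost + sub[0]
            if result is None or total < result[0]:
                result = (total, ((tiers[ti].name, codec_name),) + sub[1])
        return result

    return best(n_replicas, 0)


if __name__ == "__main__":
    tiers = [Tier("nvme", 100, 2000.0), Tier("ssd", 500, 500.0), Tier("hdd", 10000, 100.0)]
    codecs = [Codec("none", 1.0, float("inf")), Codec("fast", 2.0, 800.0),
              Codec("strong", 4.0, 100.0)]
    # Two replicas of a 400 MB object: cost ≈ 3.3 s, ssd/none plus hdd/fast.
    print(plan_replicas(400, 2, tiers, codecs))
```

In this toy setup the small NVMe tier only accepts the strongly compressed replica, which shows how compression can unlock a faster tier at the price of extra CPU time. A real engine would also weigh read latency, durability targets, and live tier load, none of which are modeled here.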

Tags

Data Replication, Dynamic, Selection Algorithm, Multi-Tiered, Data Compression, Intelligent Selection, Dynamic Programming, Cloud Application, Scientific Applications, Big Data, Hermes