SCALER: Scalable Parallel File Write in HDFS

Authors: X. Yang, Y. Yin, H. Jin, X.-H. Sun

Date: September 2014

Venue: International Conference on Cluster Computing 2014 (Cluster'14), Madrid, Spain

Type: Conference

Abstract

Two camps of file systems exist: parallel file systems designed for conventional high performance computing (HPC) and distributed file systems designed for newly emerged data-intensive applications. Addressing the big data challenge requires an approach that utilizes both high performance computing and data-intensive computing power. Thus, HPC applications may need to interact with distributed file systems, such as HDFS. The N-1 (N-to-1) parallel file write, in which N processes write concurrently to a single shared file, is a critical technical challenge: it is very common in HPC applications, but HDFS does not allow it. This study introduces a system solution, named SCALER, which allows MPI-based applications to directly access HDFS without extra data movement. SCALER supports N-1 file write at both the inter-block level and the intra-block level. Experimental results confirm that SCALER achieves the design goal efficiently.
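
The N-1 pattern can be illustrated with a minimal sketch. It is not SCALER's implementation, and it makes several assumptions that the abstract does not state: each of the N processes writes one contiguous, equally sized region of the shared file; the file is split into fixed-size blocks; and "intra-block" is read as several writers sharing one block, while "inter-block" is read as writers targeting different blocks. The function names `block_writers` and `shared_blocks` are hypothetical.

```python
from collections import defaultdict


def block_writers(num_procs, bytes_per_proc, block_size):
    """Map each block index to the ranks whose write range overlaps it.

    Assumes rank r writes the byte range
    [r * bytes_per_proc, (r + 1) * bytes_per_proc) of one shared file.
    """
    if num_procs <= 0 or bytes_per_proc <= 0 or block_size <= 0:
        raise ValueError("all arguments must be positive")
    writers = defaultdict(list)
    for rank in range(num_procs):
        start = rank * bytes_per_proc
        end = start + bytes_per_proc  # exclusive upper bound
        first_block = start // block_size
        last_block = (end - 1) // block_size
        for block in range(first_block, last_block + 1):
            writers[block].append(rank)
    return dict(writers)


def shared_blocks(writers):
    """Return the blocks touched by more than one rank (assumed intra-block case)."""
    return sorted(block for block, ranks in writers.items() if len(ranks) > 1)


if __name__ == "__main__":
    # Four ranks, 64 bytes each, 128-byte blocks: two ranks share every block.
    layout = block_writers(num_procs=4, bytes_per_proc=64, block_size=128)
    print(layout)
    print(shared_blocks(layout))
```

Under these assumptions, a write size equal to the block size gives every rank its own block, whereas a smaller write size forces several ranks into the same block, which is the case a single-writer file system cannot serve without additional coordination.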