An Evaluation of DAOS for Simulation and Deep Learning HPC Workloads
Authors: L. Logan, J. Lofstead, A. Kougkas, X.-H. Sun
Date: May, 2023
Venue: The 3rd Workshop on Challenges and Opportunities of Efficient and Performant Storage Systems (CHEOPS'23)
Type: Workshop
Abstract
Traditionally, distributed storage systems have relied upon the interfaces provided by OS kernels to interact with storage hardware. However, much research has shown that OSes impose serious overheads on every I/O operation, especially on high-performance storage and networking hardware (e.g., PMEM and 200GbE). Thus, distributed storage stacks are being redesigned to take advantage of this modern hardware by utilizing new hardware interfaces which bypass the kernel entirely. However, the impact of these optimizations has not been well studied for real HPC workloads on real hardware. In this work, we provide a comprehensive evaluation of DAOS: a state-of-the-art distributed storage system which re-architects the storage stack from scratch for modern hardware. We compare DAOS against traditional storage stacks and demonstrate that by utilizing optimal interfaces to hardware, performance improvements of up to 6x can be observed in real scientific applications.