Hades: A Context-Aware Active Storage Framework for Accelerating Large-Scale Data Analysis

Authors: J. Cernuda, L. Logan, A. Gainaru, J. Lofstead, A. Kougkas, X.-H. Sun

Date: May, 2024

Venue: The 24th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID 2024)

Type: Conference

Abstract

Modern simulation workflows generate and analyze massive amounts of data using I/O libraries like Adios2 and NetCDF. Although extensive work has optimized the I/O processes during the simulation phase, executing analytical queries, which often require iterative traversals of large files for insights, is cumbersome and usually constrained by low I/O performance. Instead of waiting for the analysis phase to process queries, quantities can be derived asynchronously during data production and cached, speeding up future queries. In this work, we introduce a context-aware I/O layer named 'Hades.' It is designed to efficiently derive insights from selected quantities without compromising overall workflow performance. Hades actively and asynchronously computes and stores these quantities while the data is in transit. Hades leverages a hierarchical buffering system with data access-aware prefetching to ensure quick and timely access to relevant data. It offers a flexible query interface that empowers users to easily define derived quantities and provides control over data placement decisions. Hades is implemented using an Adios2 plugin engine and the Hermes buffering platform, enabling transparent use by any Adios-powered application or workflow. Experimental results demonstrate performance improvements of up to 3-4x for tested real-world scientific producer-consumer workflows.

Index Terms: Active Storage, Hierarchical Storage, Context Awareness, Metadata Management, Data Operator, In-transit Computing

Tags

Active Storage, Hierarchical Storage, Context Awareness, Metadata Management, Data Operator, In-Transit Computing, Coeus