
Jarvis: Towards a Shared, User-Friendly, and Reproducible I/O Infrastructure

Authors: J. Cernuda, L. Logan, N. Lewis, S. Byna, X.-H. Sun, A. Kougkas

Date: November 2024

Venue: The International Parallel Data Systems Workshop (PDSW'24)

Type: Workshop

Abstract

Hardware in modern high-performance computing (HPC) clusters is becoming increasingly heterogeneous. However, researchers rarely have easy access to computing environments in which to develop tools that harness these technologies. This work demonstrates the need for a new, fast-paced, heterogeneous I/O research cluster and presents Jarvis, a novel software deployment framework for managing the cluster's hardware diversity. Jarvis is an extensible Python framework that lets users create Packages that deploy, manage, and monitor software, including complex applications (e.g., scientific simulations), support tools (e.g., Darshan, GDB), and storage systems (e.g., Lustre, DAOS). Packages can be combined into complex deployment Pipelines. To keep pipelines portable across hardware, Jarvis defines a novel Resource Graph schema file, which is a snapshot of a cluster's machine-specific information. Jarvis packages can query this schema to deploy software across diverse hardware compositions with minimal user effort.

Index Terms: deployment, HPC, hardware abstraction, resource management, I/O, Python
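The division of labor between Packages, Pipelines, and the Resource Graph can be illustrated with a minimal Python sketch. Every class and method below (`Device`, `ResourceGraph`, `Package`, `BurstBuffer`, `App`, `Pipeline`, `configure`, `start`, `stop`) is hypothetical and is not Jarvis's actual API. The sketch only shows the idea: a package queries a per-cluster hardware snapshot instead of hard-coding machine-specific paths, and packages compose into an ordered pipeline.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical storage device entry in the resource graph."""
    mount: str
    dev_type: str  # e.g. "nvme", "ssd", "hdd"


@dataclass
class ResourceGraph:
    """Hypothetical snapshot of machine-specific info: host -> local devices."""
    nodes: dict[str, list[Device]] = field(default_factory=dict)


class Package(ABC):
    """Hypothetical package interface: deploy, manage, and monitor one piece of software."""

    @abstractmethod
    def configure(self, rg: ResourceGraph) -> None: ...

    @abstractmethod
    def start(self) -> str: ...

    @abstractmethod
    def stop(self) -> str: ...


class BurstBuffer(Package):
    """Hypothetical storage package that uses the fastest device tier on each node."""

    def __init__(self, preferred: tuple[str, ...] = ("nvme", "ssd", "hdd")):
        self.preferred = preferred
        self.placement: dict[str, list[str]] = {}

    def configure(self, rg: ResourceGraph) -> None:
        # Query the resource graph rather than hard-coding per-cluster paths.
        self.placement = {}
        for host, devs in rg.nodes.items():
            for dev_type in self.preferred:
                mounts = [d.mount for d in devs if d.dev_type == dev_type]
                if mounts:
                    self.placement[host] = mounts
                    break

    def start(self) -> str:
        return f"start burst buffer on {sorted(self.placement)}"

    def stop(self) -> str:
        return "stop burst buffer"


class App(Package):
    """Hypothetical application package that needs no hardware information."""

    def __init__(self, name: str):
        self.name = name

    def configure(self, rg: ResourceGraph) -> None:
        pass

    def start(self) -> str:
        return f"start {self.name}"

    def stop(self) -> str:
        return f"stop {self.name}"


class Pipeline:
    """Hypothetical pipeline: an ordered list of packages sharing one resource graph."""

    def __init__(self, rg: ResourceGraph):
        self.rg = rg
        self.packages: list[Package] = []

    def append(self, pkg: Package) -> Pipeline:
        self.packages.append(pkg)
        return self

    def run(self) -> list[str]:
        log = []
        for pkg in self.packages:
            pkg.configure(self.rg)
            log.append(pkg.start())
        # Tear down in reverse order so dependents stop before their services.
        for pkg in reversed(self.packages):
            log.append(pkg.stop())
        return log


rg = ResourceGraph({
    "node1": [Device("/mnt/nvme0", "nvme"), Device("/mnt/hdd0", "hdd")],
    "node2": [Device("/mnt/ssd0", "ssd")],
})
bb = BurstBuffer()
log = Pipeline(rg).append(bb).append(App("simulation")).run()
print(bb.placement)  # {'node1': ['/mnt/nvme0'], 'node2': ['/mnt/ssd0']}
print(log)
```

Because `BurstBuffer` reads device types from the resource graph, the same pipeline runs unchanged on a cluster where `node2` has an NVMe drive; only the snapshot differs. The reverse-order teardown is an assumption made for this sketch, not a documented Jarvis behavior.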

Tags

Deployment, HPC, Hardware Abstraction, I/O, Python, Resource Management
