Characterization & Implications of Dataflow in HPC Workflows

Authors: M. Tang, L. Guo, A. Kougkas, X.-H. Sun, N. R. Tallent

Date: May 2026

Venue: 40th IEEE International Parallel & Distributed Processing Symposium

Type: Conference

Abstract

Scientific discoveries are increasingly facilitated by data-driven workflows executed in HPC environments, where complex task-to-data exchange patterns cause bottlenecks in shared storage. Existing I/O-aware scheduling techniques use application-oriented metrics that primarily focus on individual task characteristics, leaving workflow-level dataflow behaviors—such as file reuse and producer–consumer I/O patterns—underexplored despite their importance for effective orchestration. This paper presents a workflow-centric methodology to characterize and analyze I/O using workflow-specific patterns and metrics. By correlating workflow structures (e.g., task composition, execution order, and producer–consumer relationships) with I/O metrics (e.g., access type, operation count, and data size), we gain deeper insight into workflow I/O dynamics. We evaluate our approach on six diverse HPC workflows, covering 3,500 task instances and 1,387 unique files, with a total I/O volume of 1.5 TB. This analysis uncovers a spectrum of behaviors related to file reuse, data exchange, and task-level I/O heterogeneity. These insights reveal new opportunities and research directions for I/O-aware scheduling, supporting smarter task placement, data movement, and storage selection in HPC.
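The methodology the abstract describes, correlating workflow structure (producer–consumer relationships) with per-file I/O metrics (access type, data size), can be sketched in a few lines. The record schema, task names, and files below are purely illustrative assumptions, not the paper's actual trace format:

```python
from collections import defaultdict

# Hypothetical per-task I/O trace records: (task, file, access, bytes).
# Schema and names are illustrative only, not the paper's instrumentation.
records = [
    ("stage_in", "input.h5",  "read",  4096),
    ("simulate", "input.h5",  "read",  4096),
    ("simulate", "frame.dat", "write", 8192),
    ("analyze",  "frame.dat", "read",  8192),
    ("analyze",  "stats.csv", "write", 1024),
]

def characterize(recs):
    """Derive workflow-centric dataflow metrics from raw I/O records:
    file reuse counts, producer->consumer edges, and per-file I/O volume."""
    readers, writers = defaultdict(set), defaultdict(set)
    volume = defaultdict(int)
    for task, fname, access, nbytes in recs:
        (readers if access == "read" else writers)[fname].add(task)
        volume[fname] += nbytes
    # File reuse: how many distinct tasks read each file.
    reuse = {f: len(tasks) for f, tasks in readers.items()}
    # Producer-consumer edges: a task wrote a file another task later read.
    edges = {(p, c, f) for f in writers for p in writers[f]
             for c in readers.get(f, ()) if p != c}
    return reuse, edges, dict(volume)

reuse, edges, volume = characterize(records)
```

In this toy trace, `input.h5` is reused by two tasks, and `frame.dat` induces a `simulate → analyze` producer–consumer edge; exactly the kind of workflow-level signal an I/O-aware scheduler could exploit for task placement or storage-tier selection.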

Tags

HPC Analysis, I/O, Workflow Optimization, Data Analytics, Performance Measurement