Tools
This page provides an overview of the supplementary tools distributed with WisIO.
wisio-recorder2parquet
The wisio-recorder2parquet tool is a command-line utility that converts I/O trace files generated by the Recorder tracing tool into the Apache Parquet format. Parquet is a columnar storage format optimized for analytical workloads, so the converted traces are compact to store and efficient to analyze.
Functionality
- Input: Takes raw trace files generated by the Recorder tool. These files typically contain detailed records of I/O operations performed by an application.
- Processing:
  - Parses individual trace records, extracting information such as function calls (e.g., open, read, write, and other POSIX I/O and MPI I/O calls), timestamps, file identifiers, process/rank information, and data transfer sizes.
  - Categorizes I/O operations (e.g., read, write, metadata).
  - Extracts metadata from the input trace file paths, such as hostname, application name, and process ID.
- Output: Generates Parquet files containing the structured I/O trace data. The schema of the Parquet files includes the following fields:
Field Name | Data Type | Description |
---|---|---|
index | Int64 | Record index |
level | Int32 | Call stack level (if available) |
tstart | Float32 | Start timestamp |
tmid | Int64 | Timestamp midpoint |
tend | Float32 | End timestamp |
duration | Float32 | Duration of the operation |
hostname | UTF8 String | Hostname where the operation occurred |
app | UTF8 String | Application name |
rank | Int32 | MPI rank |
proc_name | UTF8 String | Process name |
proc_id | Int64 | Unique process identifier |
thread_id | Int32 | Thread identifier |
cat | Int32 | Operation category |
io_cat | Int32 | I/O category (Read, Write, Metadata) |
func_id | UTF8 String | Function name/identifier |
acc_pat | Int32 | Access pattern (e.g., sequential, random) |
file_id | Int64 | Unique file identifier |
file_name | UTF8 String | Name of the file involved in the operation |
size | Int64 | Size of the I/O operation (bytes) |
bandwidth | Float32 | Calculated bandwidth for the operation |
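The categorization step and the derived fields in the schema can be illustrated with a small Python sketch. This is not the tool's actual implementation: the function-name sets, the string category labels (the schema stores io_cat as an Int32), and the definitions of duration as tend - tstart and bandwidth as size / duration are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical name sets for illustration; the real tool's mapping may differ.
READ_FUNCS = {"read", "pread", "fread", "MPI_File_read"}
WRITE_FUNCS = {"write", "pwrite", "fwrite", "MPI_File_write"}
METADATA_FUNCS = {"open", "close", "stat", "lseek", "MPI_File_open"}


def categorize(func_name: str) -> str:
    """Return 'read', 'write', 'metadata', or 'other' for a function name."""
    if func_name in READ_FUNCS:
        return "read"
    if func_name in WRITE_FUNCS:
        return "write"
    if func_name in METADATA_FUNCS:
        return "metadata"
    return "other"


@dataclass
class Record:
    """A reduced trace record carrying a few of the schema's fields."""
    func_id: str
    tstart: float
    tend: float
    size: int

    @property
    def duration(self) -> float:
        # Assumed definition: end timestamp minus start timestamp.
        return self.tend - self.tstart

    @property
    def bandwidth(self) -> float:
        # Assumed definition: bytes per unit of time; 0.0 if duration is not positive.
        return self.size / self.duration if self.duration > 0 else 0.0
```

For example, `Record("write", 1.0, 1.5, 1024)` has a duration of 0.5 and, under the assumed definition, a bandwidth of 2048.0 bytes per time unit.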
Usage
The wisio-recorder2parquet tool is typically built as part of the WisIO project, specifically within the recorder subproject. To use it directly, invoke the compiled executable with the path to the directory containing the Recorder trace files.
mpirun -n 8 wisio-recorder2parquet <input_recorder_trace_directory>
The tool processes the traces in the specified <input_recorder_trace_directory>. It writes one or more .parquet files to a subdirectory named _parquet, which is created automatically inside <input_recorder_trace_directory>. The resulting Parquet files can then be used as input for the WisIO recorder analyzer.
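When scripting the conversion, it can help to build the command line and locate the output files programmatically. The following is a minimal Python sketch under the assumptions stated on this page (the mpirun invocation shown above and the _parquet output subdirectory). The helper names build_command and find_parquet_outputs are hypothetical and not part of WisIO.

```python
from pathlib import Path
from typing import List


def build_command(trace_dir: str, nprocs: int = 8) -> List[str]:
    """Build the argument list for the invocation shown on this page."""
    return ["mpirun", "-n", str(nprocs), "wisio-recorder2parquet", trace_dir]


def find_parquet_outputs(trace_dir: str) -> List[Path]:
    """List the .parquet files in the _parquet subdirectory, sorted by path."""
    out_dir = Path(trace_dir) / "_parquet"
    if not out_dir.is_dir():
        # The tool has not been run yet, or it wrote its output elsewhere.
        return []
    return sorted(out_dir.glob("*.parquet"))
```

The list returned by `build_command` can be passed to `subprocess.run`, and the paths from `find_parquet_outputs` can be loaded with any Parquet reader, for example `pandas.read_parquet`.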