Tools

This page provides an overview of the supplementary tools distributed with WisIO.

wisio-recorder2parquet

The wisio-recorder2parquet tool is a command-line utility that converts I/O trace files generated by the Recorder tracing tool into the Apache Parquet format. Parquet is a columnar storage format optimized for analytical workloads, so the converted traces are efficient both to store and to analyze.

Functionality

  • Input: Takes raw trace files generated by the Recorder tool. These files typically contain detailed records of I/O operations performed by an application.
  • Processing:
    • Parses individual trace records, extracting information such as the function called (e.g., POSIX I/O and MPI I/O calls such as open, read, and write), timestamps, file identifiers, process/rank information, and data transfer sizes.
    • Categorizes I/O operations (e.g., read, write, metadata).
    • Extracts metadata from the input trace file paths, such as hostname, application name, and process ID.
  • Output: Generates Parquet files containing the structured I/O trace data. The schema of the Parquet files includes the following fields:
Field Name | Data Type   | Description
-----------|-------------|--------------------------------------------
index      | Int64       | Record index
level      | Int32       | Call stack level (if available)
tstart     | Float32     | Start timestamp
tmid       | Int64       | Timestamp midpoint
tend       | Float32     | End timestamp
duration   | Float32     | Duration of the operation
hostname   | UTF8 String | Hostname where the operation occurred
app        | UTF8 String | Application name
rank       | Int32       | MPI rank
proc_name  | UTF8 String | Process name
proc_id    | Int64       | Unique process identifier
thread_id  | Int32       | Thread identifier
cat        | Int32       | Operation category
io_cat     | Int32       | I/O category (Read, Write, Metadata)
func_id    | UTF8 String | Function name/identifier
acc_pat    | Int32       | Access pattern (e.g., sequential, random)
file_id    | Int64       | Unique file identifier
file_name  | UTF8 String | Name of the file involved in the operation
size       | Int64       | Size of the I/O operation (bytes)
bandwidth  | Float32     | Calculated bandwidth for the operation
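
When the output is consumed from a script, it can help to have the field list written down as data so that a loaded table can be checked against it. The following is a minimal sketch and not part of WisIO: EXPECTED_SCHEMA and check_columns are hypothetical names, and the type strings simply mirror the schema listing (a given Parquet reader may report those types under different names).

```python
# Field names and data types copied from the schema listing above.
# EXPECTED_SCHEMA and check_columns are illustrative names, not WisIO APIs.
EXPECTED_SCHEMA = {
    "index": "Int64",
    "level": "Int32",
    "tstart": "Float32",
    "tmid": "Int64",
    "tend": "Float32",
    "duration": "Float32",
    "hostname": "UTF8 String",
    "app": "UTF8 String",
    "rank": "Int32",
    "proc_name": "UTF8 String",
    "proc_id": "Int64",
    "thread_id": "Int32",
    "cat": "Int32",
    "io_cat": "Int32",
    "func_id": "UTF8 String",
    "acc_pat": "Int32",
    "file_id": "Int64",
    "file_name": "UTF8 String",
    "size": "Int64",
    "bandwidth": "Float32",
}


def check_columns(columns):
    """Return (missing, unexpected) column names relative to EXPECTED_SCHEMA."""
    present = set(columns)
    # Keep schema order for missing names; sort the extras for stable output.
    missing = [name for name in EXPECTED_SCHEMA if name not in present]
    unexpected = sorted(present - set(EXPECTED_SCHEMA))
    return missing, unexpected


if __name__ == "__main__":
    # Example: a column list that lacks "bandwidth" and carries an extra column.
    cols = [name for name in EXPECTED_SCHEMA if name != "bandwidth"] + ["extra"]
    print(check_columns(cols))
```

The check compares column names only. Verifying data types as well would require mapping these type strings onto whatever names the chosen Parquet reader uses.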

Usage

The wisio-recorder2parquet tool is typically built as part of the WisIO project, within the recorder subproject. To use it directly, invoke the compiled executable with the path to the directory containing the Recorder trace files. The following example launches it with mpirun on 8 processes:

mpirun -n 8 wisio-recorder2parquet <input_recorder_trace_directory>

The tool processes the traces from the specified <input_recorder_trace_directory>. It outputs one or more .parquet files into a subdirectory named _parquet, which is automatically created within the <input_recorder_trace_directory>. These resulting Parquet files can then be used as input for the WisIO recorder analyzer.
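
To pick up the results from a script, the _parquet subdirectory can be located with the Python standard library. This is a minimal sketch under stated assumptions: find_parquet_files and load_traces are hypothetical helper names, the search is recursive in case the output files are nested, and load_traces additionally assumes that pandas and a Parquet engine such as pyarrow are installed.

```python
from pathlib import Path


def find_parquet_files(trace_dir):
    """List the .parquet files under <trace_dir>/_parquet, sorted by path."""
    out_dir = Path(trace_dir) / "_parquet"
    if not out_dir.is_dir():
        # The tool has not been run on this directory (or produced no output).
        return []
    return sorted(out_dir.rglob("*.parquet"))


def load_traces(trace_dir):
    """Concatenate all output files into one DataFrame (requires pandas)."""
    import pandas as pd  # imported lazily so the finder works without pandas

    files = find_parquet_files(trace_dir)
    if not files:
        raise FileNotFoundError(f"no .parquet files under {trace_dir}/_parquet")
    return pd.concat((pd.read_parquet(f) for f in files), ignore_index=True)
```

Calling load_traces on the same directory that was passed to wisio-recorder2parquet would then return a single table for ad hoc inspection.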