Building a Package
This guide documents how to extend the set of applications that Jarvis is able to deploy. We refer to these as packages (pkgs for short).
AI Bindings
Our LLM info file is located here. It gives the AI the context it needs to understand how to build a Jarvis package.
At a high level, you can use a prompt in the following style to create a Jarvis package. This example builds a package for top.
The jarvis repo is located at `builtin`. Create a new package in this repo named `top`. Make it so I can deploy top on a series of nodes with parallel SSH.
The documentation for jarvis is attached as context.
Bootstrap a Pkg
You can bootstrap a pkg to the primary repo as follows:
jarvis repo create [name] [pkg_class]
pkg_class can be one of:
- service
- app
- interceptor
For example:
jarvis repo create hermes service
jarvis repo create hermes_mpiio interceptor
jarvis repo create gray_scott app
We can then create an example pipeline as follows:
jarvis ppl create test
jarvis ppl append hermes
jarvis ppl append hermes_mpiio
jarvis ppl append gray_scott
This pipeline deploys Hermes, the Hermes MPI-IO interceptor, and Gray Scott, an application that performs I/O using MPI.
The Pkg
Base Class
This section will go over the variables and methods common across all Pkg types. These variables will be initialized automatically.
class Pkg:
    def __init__(self):
        self.pkg_dir = '...'
        self.shared_dir = '...'
        self.private_dir = '...'
        self.env = {}
        self.mod_env = {}
        self.config = {}
        self.global_id = '...'
        self.pkg_id = '...'
pkg_id and global_id
The Global ID (global_id) is the globally unique ID of a package across all of Jarvis. It is a dot-separated string. Typically, the format is:
{pipeline_id}.{pkg_id}
The Package ID (pkg_id) is the unique ID of the package relative to a pipeline. This is a simple string (no dots).
For example, in the pipeline created above, we appended 3 packages: hermes, hermes_mpiio, and gray_scott. These names are also the pkg_ids. The global_ids would be:
test.hermes
test.hermes_mpiio
test.gray_scott
Usage:
self.global_id
self.pkg_id
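The relationship between the two IDs can be sketched in plain Python. The helper names below (make_global_id, split_global_id) are hypothetical and are not part of Jarvis; the sketch only assumes the typical {pipeline_id}.{pkg_id} format described above.

```python
def make_global_id(pipeline_id: str, pkg_id: str) -> str:
    """Join a pipeline ID and a package ID into a dot-separated global ID."""
    if '.' in pkg_id:
        # The pkg_id is described as a simple string with no dots.
        raise ValueError('pkg_id must not contain dots')
    return f'{pipeline_id}.{pkg_id}'


def split_global_id(global_id: str) -> tuple:
    """Split a global ID into (pipeline_id, pkg_id) at the last dot."""
    pipeline_id, _, pkg_id = global_id.rpartition('.')
    return pipeline_id, pkg_id


# The three packages appended to the pipeline named "test".
pkg_ids = ['hermes', 'hermes_mpiio', 'gray_scott']
global_ids = [make_global_id('test', pkg_id) for pkg_id in pkg_ids]
print(global_ids)
```

Splitting at the last dot is a design choice for this sketch: since the pkg_id never contains a dot, everything before the last dot can be treated as the pipeline part.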
pkg_dir
The package directory is the directory on the filesystem that contains the package's Python class file.
For example, when calling jarvis repo create hermes service, the directory created by this command will be the pkg_dir.
One use case for the pkg_dir is storing template configuration files. For example, OrangeFS has a complex XML configuration which would be a pain to reproduce in Python. One could include an OrangeFS XML config in the package directory and commit it as part of the Jarvis repo.
Usage:
self.pkg_dir
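The template use case can be sketched as follows. This is a minimal illustration, not Jarvis code: render_template, the file name server.xml.in, and the XML content are all hypothetical, and a temporary directory stands in for self.pkg_dir so that the example runs on its own.

```python
import os
import tempfile
from string import Template


def render_template(pkg_dir: str, template_name: str, values: dict) -> str:
    """Read a template stored in pkg_dir and substitute its $placeholders."""
    template_path = os.path.join(pkg_dir, template_name)
    with open(template_path, 'r', encoding='utf-8') as fp:
        template = Template(fp.read())
    return template.substitute(values)


# A temporary directory stands in for self.pkg_dir.
with tempfile.TemporaryDirectory() as pkg_dir:
    # Hypothetical template that would normally be committed with the package.
    with open(os.path.join(pkg_dir, 'server.xml.in'), 'w', encoding='utf-8') as fp:
        fp.write('<server port="$port" protocol="$protocol"/>')
    rendered = render_template(pkg_dir, 'server.xml.in',
                               {'port': 3334, 'protocol': 'tcp'})
    print(rendered)
```

Inside a real package, the same idea would read the template relative to self.pkg_dir and write the rendered result wherever the deployed program expects it.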
shared_dir
The shared_dir is a directory stored on a filesystem common across all nodes in the hostfile. Each node has the same view of data in the shared_dir. The shared_dir contains data for the specific pkg to avoid conflicts in a pipeline with multiple pkgs.
For example, when deploying Hermes, we assume that each node has the Hermes configuration file. Each node is expected to have the same configuration file. We store the Hermes config in the shared_dir.
Usage:
self.shared_dir
private_dir
The private_dir is a directory that is present on all nodes, but the nodes do not share the same view of its data. Each node sees only its own contents.
For example, when deploying OrangeFS, it is required that each node has a file called pvfs2tab. It essentially stores the protocol + address that OrangeFS uses for networking. However, the content of this file is different for each node. Storing it in the shared_dir would be incorrect. This is why we need the private_dir.
Usage:
self.private_dir
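The difference between the two directories can be sketched with a small simulation. Everything here is an assumption made for illustration: write_node_files, the file names, the port, and the address format are hypothetical, and per-node subdirectories of a temporary directory mimic node-local storage.

```python
import os
import tempfile


def write_node_files(shared_dir, private_dir, hostname, shared_text):
    """Write one file that all nodes share and one file specific to this node."""
    shared_path = os.path.join(shared_dir, 'service.conf')
    private_path = os.path.join(private_dir, 'node_addr.txt')
    with open(shared_path, 'w', encoding='utf-8') as fp:
        fp.write(shared_text)  # identical content for every node
    with open(private_path, 'w', encoding='utf-8') as fp:
        fp.write(f'tcp://{hostname}:3334\n')  # differs per node
    return shared_path, private_path


with tempfile.TemporaryDirectory() as root:
    shared_dir = os.path.join(root, 'shared')
    os.makedirs(shared_dir)
    private_contents = {}
    for hostname in ['node1', 'node2']:
        # Each simulated node gets its own private directory.
        private_dir = os.path.join(root, hostname, 'private')
        os.makedirs(private_dir)
        _, private_path = write_node_files(
            shared_dir, private_dir, hostname, 'port: 3334\n')
        with open(private_path, encoding='utf-8') as fp:
            private_contents[hostname] = fp.read().strip()
    shared_files = os.listdir(shared_dir)

print(shared_files)
print(private_contents)
```

Both simulated nodes end up with a single shared file, while each private directory holds content that only makes sense for its own node.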
env
Jarvis pipelines store the current environment in a YAML file, which represents a Python dictionary. The key is the environment variable name (a string) and the value is the value that variable should hold. There is a single environment used for the entire pipeline. Each pipeline stores its own environment to avoid conflicts.
Usage:
self.env['VAR_NAME']
Environments can be modified using various helper functions:
self.track_env(env_track_dict)
self.prepend_env(env_name, val)
self.setenv(env_name, val)
To view the env YAML file for the current pipeline from the CLI:
cat `jarvis path`/env.yaml
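Because the environment is an ordinary dictionary, the kind of manipulation the helpers perform can be illustrated with stand-in functions. The functions below (set_var, prepend_var) are hypothetical and do not reproduce the Jarvis helpers; in particular, treating "prepend" as placing a value in front of a path-like variable is an assumption, as are the variable names and paths.

```python
import os


def set_var(env: dict, name: str, val: str) -> None:
    """Overwrite a variable in the environment dictionary."""
    env[name] = val


def prepend_var(env: dict, name: str, val: str) -> None:
    """Put val in front of a path-like variable, keeping any existing value."""
    current = env.get(name)
    env[name] = val if not current else f'{val}{os.pathsep}{current}'


# Stand-in for self.env.
env = {'PATH': '/usr/bin'}
prepend_var(env, 'PATH', '/opt/example/bin')
prepend_var(env, 'LD_LIBRARY_PATH', '/opt/example/lib')
set_var(env, 'EXAMPLE_CONF', '/tmp/example.yaml')
print(env)
```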
mod_env
mod_env is a Python dictionary that is essentially a copy of env. However, mod_env also stores the LD_PRELOAD environment variable used for interception. LD_PRELOAD can cause conflicts if used carelessly, since not every program should be intercepted.
For example, we use this for Hermes to intercept POSIX I/O. However, POSIX is widely used for I/O, so we are very specific about when interception is applied.
mod_env can be modified using the same functions as env:
self.track_env(env_track_dict)
self.prepend_env(env_name, val)
self.setenv(env_name, val)
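The reason for keeping two dictionaries can be sketched as follows. This is a simplified illustration, not the Jarvis implementation: env_for and the library path are hypothetical, and the sketch assumes only that mod_env is a copy of env with LD_PRELOAD added.

```python
# Stand-in for the pipeline environment.
env = {'PATH': '/usr/bin', 'LD_LIBRARY_PATH': '/opt/lib'}

# mod_env starts as a copy, so changes to it do not leak into env.
mod_env = dict(env)
mod_env['LD_PRELOAD'] = '/path/to/libinterceptor.so'  # hypothetical library path


def env_for(intercepted: bool) -> dict:
    """Pick the environment for a program depending on whether it is intercepted."""
    return mod_env if intercepted else env


print(env_for(True).get('LD_PRELOAD'))
print(env_for(False).get('LD_PRELOAD'))
```

Keeping LD_PRELOAD out of env means that programs launched with the plain environment are never intercepted by accident.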
config
The Jarvis configuration is stored in {pkg_dir}/{pkg_id}.yaml.
Unlike the environment dict, this stores variables that are specific to the package. They are not global to the pipeline.
For example, OrangeFS and Hermes need to know the desired port number and RPC protocol. This information is specific to the program, not the entire pipeline.
Usage:
self.config['VAR_NAME']
jarvis
The Jarvis CD configuration manager stores various properties global to all of Jarvis. The most important information is the hostfile and resource_graph, discussed in the next sections.
Usage:
self.jarvis
hostfile
The hostfile contains the set of all hosts that Jarvis has access to. The hostfile format is documented here.
Usage:
self.jarvis.hostfile
resource_graph
The resource graph can be queried to get storage and networking information for storing large volumes of data.
Usage:
self.jarvis.resource_graph
Building a Service or Application
Services and Applications implement the same interface, but are logically slightly different. A service is long-running and typically requires the user to stop it manually. An application stops automatically once it finishes its work. Jarvis can deploy services alongside applications to avoid the manual stop when benchmarking.
_init
The Jarvis constructor (_init) is used to initialize global variables.
Do not assume that self.config is initialized when _init runs.
The purpose of _init is to provide an overview of the parameters of this class.
Default values should almost always be None.
def _init(self):
    self.gray_scott_path = None
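The pattern of declaring attributes as None and filling them in later can be sketched with a stand-in base class. StubPkg and resolve_paths are hypothetical and exist only for this illustration; how and when Jarvis itself calls _init is an assumption here.

```python
class StubPkg:
    """Hypothetical stand-in for the Jarvis base class, for illustration only."""

    def __init__(self):
        self.config = None  # not populated yet when _init runs
        self._init()

    def _init(self):
        pass


class GrayScott(StubPkg):
    def _init(self):
        # Declare every attribute up front with a None default.
        # self.config must not be read here.
        self.gray_scott_path = None

    def resolve_paths(self, config: dict) -> None:
        """Hypothetical later step that fills in values once config exists."""
        self.config = config
        self.gray_scott_path = config.get('path')


pkg = GrayScott()
before = pkg.gray_scott_path
pkg.resolve_paths({'path': '/opt/gray_scott/bin'})
print(before, pkg.gray_scott_path)
```

Listing every attribute in _init, even as None, gives readers one place to see which parameters the class uses.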