What Is a Run

This page is a deep dive into the core unit of work on Spell: the run.

Anatomy of the run command

The spell run command is used to create runs, and it is likely the command you'll use most on Spell. One of the simplest commands you can run is spell run "echo hello world", which prints hello world to the screen. Here only spell run belongs to Spell; echo hello world is your regular command:

$ spell run "echo hello world"

A more realistic example, taken from the style transfer demo:

$ spell run --mount runs/1/data:datasets \
            --machine-type K80 \
            --apt ffmpeg \
            --pip moviepy \
  "python style.py \
            --checkpoint-dir ckpt \
            --style images/style/my-style-image.jpg \
            --style-weight 1.5e2 \
            --train-path datasets/train2014 \
            --vgg-path datasets/imagenet-vgg-verydeep-19.mat"

In this example, we've mounted data from a previous run using the --mount flag, specified that we want a K80 GPU with the --machine-type flag, and added both an apt and a pip dependency. These flags add the corresponding files and dependencies to your Spell environment. Everything in the quotation marks is the regular command from the fast style transfer repository.

As this command demonstrates, spell run has a large number of options, most of them for environment configuration. The corresponding sections of the user guide cover them in detail; we briefly mention the most important ones here:

  • --mount allows you to mount resources from SpellFS (or potentially other object storage systems) into your run. This is how we recommend loading your datasets into a run.
  • --machine-type allows you to specify your instance type. For more details, see the page on Instance types.
  • --framework allows you to specify the base machine learning framework image to use. Our default framework contains all of the common machine learning libraries (TensorFlow, PyTorch, etcetera), but may be more out of date than a framework-specific image.
  • --apt, --pip, and --conda allow you to customize the packages installed in the environment.
  • --github-url lets you initialize a run with files from a particular GitHub repository. This is handy if you don't already have the code locally.
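
When runs are launched from a script rather than typed by hand, the flags above can be assembled programmatically. The following is a minimal Python sketch, not part of the Spell CLI: build_run_command is a hypothetical helper that only builds the argument list from flags shown on this page and does not contact Spell.

```python
from typing import List, Optional, Sequence


def build_run_command(
    command: str,
    mounts: Sequence[str] = (),
    machine_type: Optional[str] = None,
    apt: Sequence[str] = (),
    pip: Sequence[str] = (),
) -> List[str]:
    """Assemble the argument list for a `spell run` call (hypothetical helper)."""
    args = ["spell", "run"]
    for mount in mounts:
        args += ["--mount", mount]
    if machine_type is not None:
        args += ["--machine-type", machine_type]
    for package in apt:
        args += ["--apt", package]
    for package in pip:
        args += ["--pip", package]
    # The user command is passed as one argument, like the quoted string in a shell.
    args.append(command)
    return args


style_transfer = build_run_command(
    "python style.py --checkpoint-dir ckpt",
    mounts=["runs/1/data:datasets"],
    machine_type="K80",
    apt=["ffmpeg"],
    pip=["moviepy"],
)
print(style_transfer)
```

Passing the list to subprocess.run(style_transfer, check=True) would launch the run without going through a shell, assuming the spell CLI is installed and you are logged in.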

Run IDs

All runs get a unique run ID. The run ID can be found in the 💫 Casting spell #<ID>… message that appears after a run is created. Run IDs are assigned in ascending sequence. All runs are given an ID, including workspace runs and TensorBoard runs, although these two types of runs do not show up as entries on your run history page.

Run IDs are useful if you want to:

  • Refer back to all the information from your run. Logs, metrics, and outputs from a run are available on the web console.
  • Mount the outputs from your run into another run. This pattern of using the output of one run as the input to another run is so common that we've built an entire feature, workflows, based on this idea.
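
To chain runs from a script, the run ID has to be captured from the CLI output first. The sketch below is illustrative only: extract_run_id and run_output_mount are hypothetical helpers, the regular expression relies on the Casting spell #<ID> message described above, and the runs/<ID>/... mount form is inferred from the runs/1/data:datasets example earlier on this page.

```python
import re
from typing import Optional

# Matches the "Casting spell #<ID>" line printed when a run is created.
_CASTING = re.compile(r"Casting spell #(\d+)")


def extract_run_id(cli_output: str) -> Optional[int]:
    """Return the run ID found in the CLI output, or None if there is none."""
    match = _CASTING.search(cli_output)
    return int(match.group(1)) if match else None


def run_output_mount(run_id: int, subpath: str = "", target: str = "") -> str:
    """Build a --mount value of the form runs/<ID>[/subpath][:target]."""
    source = "runs/{}".format(run_id)
    if subpath:
        source += "/" + subpath.strip("/")
    return source + (":" + target if target else "")


print(extract_run_id("Casting spell #9"))
print(run_output_mount(1, "data", "datasets"))
```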

How runs interact with git

A run is typically initiated from a Git repository on the machine where you are using Spell, but it does not have to be. When initiated from a Git repository, Spell uses Git to safely sync the code in your current directory to the remote machine. Turning any directory into a Git repository is very easy - just run git init.

If the run is initiated from a Git repository, the run will automatically sync code from the repository and use it within the run. If the current repo has uncommitted changes, these changes will also be synced and used within the run, and can be downloaded as a separate patch via the web console. If there are any untracked files, the run will provide a warning. This is to avoid any ambiguity about the state of your code being transferred to Spell.

Note

  • Spell uses SSH to sync your code. If you require special SSH configuration (e.g. proxy commands for network firewalls) update the ssh_config file in your Spell config directory. This will likely be in ~/.spell on macOS or Linux and your user's AppData/Roaming/spell directory on Windows.
  • Spell does not support git-lfs. If you have large files inside your repository, we recommend using the spell upload command. Git submodules are not yet supported either; as a workaround, you can commit the submodules directly into the parent repo.
  • It is possible to configure Spell to error if uncommitted changes are detected. This can help enforce tracking code changes and improve reproducibility. Configure this in the config file in your Spell config directory.

Everything in the commit that is currently checked out (unless the --commit-ref option is provided to specify a different commit) will be available in the run.

For example, if you go through the steps below, you will see "running on Spell!" printed out during your run, showing that the file in your current Git commit was transferred to Spell for the run:

$ mkdir project
$ cd project
$ git init
$ echo 'running on Spell!' > file
$ git add file && git commit -m 'first commit'
$ spell run cat file

Note that the run will execute in whatever the local current working directory is. For example, if you are working in a nested directory within your Git repository, you can run commands there without worrying about the path relative to the Git repository root.
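
Because uncommitted changes are synced and untracked files only trigger a warning, it can be useful to inspect the working tree before launching a run. The following Python sketch is one way to do that, assuming git is on your PATH; classify_changes and working_tree_status are hypothetical helpers built on the standard git status --porcelain output, in which untracked paths are prefixed with ??.

```python
import subprocess
from typing import Dict, List


def classify_changes(porcelain: str) -> Dict[str, List[str]]:
    """Split `git status --porcelain` output into untracked and uncommitted paths."""
    result = {"untracked": [], "uncommitted": []}  # type: Dict[str, List[str]]
    for line in porcelain.splitlines():
        if not line.strip():
            continue
        # Each line is a two-character status code, a space, then the path.
        status, path = line[:2], line[3:]
        key = "untracked" if status == "??" else "uncommitted"
        result[key].append(path)
    return result


def working_tree_status(repo_dir: str = ".") -> Dict[str, List[str]]:
    """Run git in repo_dir and classify the reported changes."""
    output = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    ).stdout
    return classify_changes(output)
```

Paths listed under untracked are the ones the run would warn about, so you may want to add and commit them before calling spell run.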

Dependency management and customization

Customizing frameworks

There are several ways to modify the code packages available in a run. The highest level of customization is your choice of framework, specified using the --framework command line option. E.g.:

$ spell run --framework fastai "python train.py"

This provides the following options:

Name            CLI arguments             Notes
Default         (none)                    tensorflow==1.14.0, torch==1.4.0, torchvision==0.5.0, Pillow==6.1.0, Keras==2.2.4
TensorFlow 2.0  --framework tensorflow2   tensorflow==2.0.0
MXNet           --framework mxnet         mxnet==1.5.0
fast.ai         --framework fastai        torch==1.2.0
Caffe           --framework caffe         caffe==1.0.0
DyNet           --framework dynet         dyNET==2.1
Torch           --framework torch         Torch7

Spell provides a number of framework environments for you to run your code in. All framework environments have the Ubuntu 18.04 operating system and Python versions 2.7 and 3.6. If a machine type is specified that has a GPU, then CUDA and cuDNN are also included. Additionally, the following Python packages always come pre-installed:

six==1.12.0
scipy==1.2.1
requests==2.21.0
protobuf==3.6.1
pickleshare==0.7.5
numpy==1.16.2
h5py==2.9.0

Installing additional pip packages

You can add pip packages using the --pip option followed by the name of the package.

$ spell run --pip scikit-learn "python train.py"

To add multiple packages, you can repeat the --pip option:

$ spell run --pip scikit-learn --pip imageio "python train.py"

If you are installing a lot of packages at once, it is probably more convenient to use --pip-req, passing in a path to a requirements.txt file:

$ spell run --pip-req requirements.txt "python train.py"

The requirements file should be a valid pip requirements file. For example:

# requirements.txt
scikit-learn
imageio

Installing additional conda packages

You can add conda packages using the --conda option followed by the name of the package.

$ spell run --conda scikit-learn "python train.py"

Much like with pip, you can add multiple packages by repeating the --conda option, or supply a conda environment file via the --conda-file option.

$ spell run --conda-file ./environment.yml "python train.py"

The conda environment name in the run on Spell will be spell.

Installing additional apt packages

To modify the system packages included in the environment, use the --apt option.

apt packages are added in much the same way as pip packages, via a command line option: --apt <package>. Spell-maintained frameworks are built on the Ubuntu 18.04 Linux distribution, so any package reachable from the default set of package indexes shipped with Ubuntu 18.04 can be specified.

$ spell run --apt libprotobuf-dev --apt protobuf-compiler "python train.py"

Combining package managers

You can even mix and match package manager commands in a single command:

$ spell run --framework fastai --apt ffmpeg --pip cudatoolkit\>=9.0 "echo 'hello world'"

Notice that the options allow you to pin package versions using the >, >=, ==, <, and <= operators. The less-than and greater-than characters are redirection operators in most shells, so you will need to escape them with a backslash, as above, or quote the whole specifier.
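
To see why the escaping matters, the short Python sketch below uses the standard shlex module, which follows POSIX shell word-splitting rules, to show that a backslash-escaped specifier and a quoted one reach the program as the same argument. It only illustrates shell quoting and makes no claim about how Spell parses the value.

```python
import shlex

spec = "cudatoolkit>=9.0"

# A backslash-escaped and a single-quoted specifier parse to the same argument.
escaped = shlex.split(r"--pip cudatoolkit\>=9.0")
quoted = shlex.split("--pip 'cudatoolkit>=9.0'")
print(escaped == quoted == ["--pip", spec])

# shlex.quote produces a shell-safe spelling when a command line is built in code.
print(shlex.quote(spec))
```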

Note

We do not currently support mixing conda and pip install instructions. If you want to simultaneously use both, use --conda-file.

Mounting resources

To add resources (such as datasets) to a run, mount them using the --mount option. To use the mount option, specify the resource path to your dataset and the path to mount the dataset to on Spell. For example, the following code snippet mounts an audio dataset to the /mnt/audio-data path and prints out its contents from inside of the run.

$ spell run --mount public/audio/css10:/mnt/audio-data "ls /mnt/audio-data"
💫 Casting spell #9…
✨ Stop viewing logs with ^C
✨ Machine_Requested… done
✨ Building… done
✨ Mounting… done
✨ Run is running
chinese-single-speaker-speech-dataset
dutch-single-speaker-speech-dataset
finnish-single-speaker-speech-dataset
french-single-speaker-speech-dataset
german-single-speaker-speech-dataset
greek-single-speaker-speech-dataset
hungarian-single-speaker-speech-dataset
japanese-single-speaker-speech-dataset
korean-single-speaker-speech-dataset
russian-single-speaker-speech-dataset
spanish-single-speaker-speech-dataset

We strongly recommend using absolute mount paths for legibility. As a best practice, we recommend using the /mnt directory as the root directory for your data.

Alternatively, you can choose to omit all or part of the mount path, in which case the default path and/or default name will be used:

$ spell run --mount public/audio/css10:audio-data "python main.py"
$ spell run --mount public/audio/css10 "python main.py"

In this case the files will land in the current working directory inside of the run (typically /spell/$REPO_NAME, assuming you started the run from the root of a Git repository). To learn more about resources see the page "What is a resource?".
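
The three forms of the mount value can be summarized with a small Python sketch. It is a model of the behavior described above, not Spell's implementation: split_mount and resolve_mount are hypothetical names, and the rule that the default name is the last component of the resource path is an assumption.

```python
import posixpath
from typing import Optional, Tuple


def split_mount(spec: str) -> Tuple[str, Optional[str]]:
    """Split 'resource[:target]' into its two parts."""
    head, sep, tail = spec.rpartition(":")
    # A colon followed by "//" belongs to a URL scheme such as s3://, not a target.
    if not sep or tail.startswith("//"):
        return spec, None
    return head, tail


def resolve_mount(spec: str, workdir: str) -> Tuple[str, str]:
    """Return (resource, absolute mount path) for a run whose cwd is workdir."""
    resource, target = split_mount(spec)
    if target is None:
        # Assumption: the default name is the last component of the resource path.
        target = posixpath.basename(resource.rstrip("/"))
    # join keeps an absolute target as-is and anchors a relative one in workdir.
    return resource, posixpath.join(workdir, target)


for spec in (
    "public/audio/css10:/mnt/audio-data",
    "public/audio/css10:audio-data",
    "public/audio/css10",
):
    print(resolve_mount(spec, "/spell/repo"))
```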

Setting environment variables

Environment variables can be passed to a run using the --env flag.

$ spell run --env CUDA_VISIBLE_DEVICES=1 "python example.py"
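
Inside the run, the variable is read like any other environment variable. As a minimal sketch of what a script such as example.py might do (the visible_devices helper is hypothetical and not part of the original example):

```python
import os
from typing import List, Mapping


def visible_devices(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the device indices listed in CUDA_VISIBLE_DEVICES, or [] if unset."""
    raw = environ.get("CUDA_VISIBLE_DEVICES", "")
    return [item.strip() for item in raw.split(",") if item.strip()]


if __name__ == "__main__":
    print("Visible CUDA devices:", visible_devices())
```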

Using code from a GitHub repo

As explained in the section "How runs interact with git" above, the default workflow for using a run involves initializing it from a git repository. That git repository is loaded onto the machine, and its files are usable by the run.

You can specify an alternative GitHub code repository using the --github-url flag. This will use the master branch by default. You can specify an alternative branch, commit hash, or git ref using the --github-ref flag. For example:

$ spell run \
    --github-url 'https://github.com/spellrun/spell-examples.git' \
    --github-ref '622d64' \
    'git log -n 1 | cat'

Note that on the community edition of Spell the code repository has to be public. Spell for Teams allows you to pull code from private repositories using our GitHub integration. For more info on this see the page "Integrating GitHub".

(Advanced) Using custom public Docker images

If you have a use case that requires deeper environment configuration than the apt, pip, and conda flags can provide, you can choose to give Spell a custom Docker image instead using the --docker-image flag. This flag takes a container image URL as input, e.g. a URL of the form <domain>/<repository>/<image_name>:<tag>. The domain and repository parts of the path are both optional; if you omit them, public DockerHub (https://hub.docker.com/_/) will be used. For example:

$ spell run -f --docker-image 'python:latest' \
    'python -c "print(\"Hello World\")"'

Note that the --docker-image flag is exclusive: it cannot be combined with any of the other environment configuration flags, e.g. --pip, --conda, etcetera.
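
The <domain>/<repository>/<image_name>:<tag> form and its defaults can be illustrated with a short Python sketch. It approximates the usual Docker naming convention (docker.io as the default domain, library as the default repository there, latest as the default tag) and is not Spell's own parser; parse_image and ImageRef are hypothetical names, the gcr.io/my-project/trainer image is made up, and digests (@sha256:...) are not handled.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    domain: str
    repository: str
    name: str
    tag: str


def parse_image(url: str) -> ImageRef:
    """Split an image URL into domain, repository, image name, and tag."""
    first, slash, rest = url.partition("/")
    # The first component is a registry domain only if it looks like a host name.
    if slash and ("." in first or ":" in first or first == "localhost"):
        domain, path = first, rest
    else:
        domain, path = "docker.io", url
    repository, _, last = path.rpartition("/")
    name, colon, tag = last.partition(":")
    if domain == "docker.io" and not repository:
        repository = "library"  # home of official images on DockerHub
    return ImageRef(domain, repository, name, tag if colon else "latest")


print(parse_image("python:latest"))
print(parse_image("gcr.io/my-project/trainer:v2"))
```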

(Advanced) Using custom private Docker images

Note

This feature is only available on Spell for Teams.

Users on the community edition must pull from a public registry. Users on Spell for Teams may pull from a private AWS ECR or GCP Container Registry instead.

The --docker-image flag on the spell run command can be used to initialize a run environment using an image pulled from a Docker registry. Teams users that have configured their own cluster can grant Spell access to the private registry of the cluster's cloud provider (ECR for AWS, GCR for GCP) using the spell cluster add-docker-registry command.

If your cluster is on AWS, for example, you can use the command to add one of the ECR repos currently available in that AWS account. This will update the IAM permissions that Spell uses to control your cluster, allowing it access to the repo.

$ spell cluster add-docker-registry
This command will
    - Allow Spell to get authorization tokens to access your docker registry
    - If no repository is specified, list your repositories in the registry
    - Add read permissions for that repository to the IAM role associated with the cluster
All of this will be done with your AWS profile 'default' which has Access Key ID 'ABCDEFGHIJKL' and region 'us-west-2' - continue? [y/N]: y
Spell does not yet have access to the following repos found in your AWS account:
- image-processing
- text-generation
- image-gan
Please choose a repository: text-generation
...
Successfully added read permission to text-generation

Once set up, you can use the --docker-image argument to spell run to specify a docker image pushed to the private ECR repository.

The ECR or GCR permissions can be removed at any time with spell cluster delete-docker-registry.

(Advanced) Mounting public buckets

You can mount data from public AWS S3 or GCP GS buckets into a run. For example, the following command mounts monthly rainfall figures from the public SILO climate dataset and gives them the alias data.

$ spell run \
    -m 's3://silo-open-data/annual/monthly_rain:data' \
    'python main.py'

Users on Spell for Teams can mount data from private S3 and/or GS buckets using our AWS and GCP integrations. See Cluster Bucket Management for more details.

(Advanced) Specifying an early stopping condition

Early stopping is the technique of ending a training run early, before the model training process has reached its maximum number of epochs, based on lack of improvement in the model accuracy.

We support a form of early stopping directly in the CLI using the --stop-condition flag. For example, --stop-condition "keras/val_acc < 0.5 : 10" will stop model training if the run doesn't reach a validation accuracy of 50 percent within 10 epochs, or if the validation accuracy metric dips below 50 percent in subsequent epochs. Early stopping conditions can be defined using either automatic or custom user metrics.

We support <, >, <=, and >= operators. The second parameter is optional; if left unspecified, the early stopping check will be applied to every epoch of training.
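
The semantics of a stop condition can be modeled in a few lines of Python. This is a sketch under stated assumptions rather than Spell's implementation: StopCondition and parse_stop_condition are hypothetical names, and reading the second parameter as the first epoch at which the check applies (with 0 when it is omitted) is an interpretation of the description above.

```python
import operator
import re
from typing import Callable, NamedTuple

_OPERATORS = {"<": operator.lt, ">": operator.gt, "<=": operator.le, ">=": operator.ge}
# metric, operator, threshold, and an optional ": <epoch>" suffix
_PATTERN = re.compile(
    r"^\s*([^\s<>=]+)\s*(<=|>=|<|>)\s*([-+.\deE]+)\s*(?::\s*(\d+)\s*)?$"
)


class StopCondition(NamedTuple):
    metric: str
    compare: Callable[[float, float], bool]
    threshold: float
    min_epoch: int

    def should_stop(self, value: float, epoch: int) -> bool:
        # Assumption: the check is skipped until min_epoch, then applied every epoch.
        return epoch >= self.min_epoch and self.compare(value, self.threshold)


def parse_stop_condition(text: str) -> StopCondition:
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError("unrecognized stop condition: {!r}".format(text))
    metric, op, threshold, min_epoch = match.groups()
    return StopCondition(metric, _OPERATORS[op], float(threshold), int(min_epoch or 0))


cond = parse_stop_condition("keras/val_acc < 0.5 : 10")
print(cond.should_stop(0.3, epoch=5))   # too early for the check to apply
print(cond.should_stop(0.3, epoch=10))  # below the threshold once the check applies
```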

(Advanced) Run states

Runs transition through the following states:

  • Requested: the run has been created, and is queued for execution. Typically, a run will transition through this state very quickly. However, a run could stay Requested for an extended period of time for two reasons:
    1. You already have the maximum number of concurrent runs executing that is allowed per your plan. If this is the case, your run will transition to Building when the number of concurrent runs you have falls below the limit.
    2. Spell received a number of runs at the same time and is starting up more machines to execute your run. If this is the case, your run will transition to Building as soon as a machine is ready.
  • Building: the environment for your run is created. This includes installing any dependencies that are specified (e.g. --pip or --apt parameters provided on the command line to create the run) and copying the code from your Git repository into the run. First we check whether the run uses an environment from a previous run; if so, we use the cached environment instead of building it again, to minimize build time.
  • Mounting: the resources (if any) that were specified are mounted into the run. (See the Resources section for more information).
  • Running: your command is executing!
  • Saving: any new or modified files from your command are saved as a resource into runs/
  • Pushing: your environment is pushed to our cache of environments to expedite a future run with the same environment. Usually, your run will transition through this step very quickly since this push happens asynchronously in the background throughout every other state. The push is usually completed by the time this state is reached unless your run is very quick.

All runs eventually transition to one of the following final states:

  • Complete: this is the normal state for a run to transition to upon completion.
  • Killed: the run was killed using the spell kill command. When a run is killed, it does not transition through any subsequent steps and is immediately terminated.
  • Stopped: the run was stopped using the spell stop command. A stopped run still transitions through the Saving and Pushing states; however, its Running state is exited as soon as the spell stop command is issued.
  • Failed: the run did not complete due to an error in some other state.
  • Interrupted: the run did not complete because it was executing on a spot instance that got reclaimed (this state is only possible when using spot instances on Spell for Teams).
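
The state descriptions above can be condensed into a small Python model. It is a simplified sketch for reasoning about run lifecycles, not something Spell exposes: remaining_states and the event names (finish, stop, kill, fail, interrupt) are hypothetical, and the handling of a stop issued outside the Running state is an assumption.

```python
from typing import List

PIPELINE = ["Requested", "Building", "Mounting", "Running", "Saving", "Pushing"]
FINAL_STATES = {"Complete", "Killed", "Stopped", "Failed", "Interrupted"}


def remaining_states(current: str, event: str = "finish") -> List[str]:
    """List the states a run still passes through, ending in its final state."""
    if current not in PIPELINE:
        raise ValueError("unknown or final state: {}".format(current))
    position = PIPELINE.index(current)
    if event == "kill":
        return ["Killed"]  # terminated immediately, no further steps
    if event == "fail":
        return ["Failed"]
    if event == "interrupt":
        return ["Interrupted"]
    if event == "stop":
        # Saving and Pushing still happen; assumes the stop arrives by Running.
        tail = [s for s in ("Saving", "Pushing") if PIPELINE.index(s) > position]
        return tail + ["Stopped"]
    if event == "finish":
        return PIPELINE[position + 1:] + ["Complete"]
    raise ValueError("unknown event: {}".format(event))


print(remaining_states("Requested"))
print(remaining_states("Running", "stop"))
```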