What Is a Workflow?

The Basics

Workflows are runs that kick off multi-stage machine learning pipelines, and are typically built using Spell's Python API.

The Longer Version

Complex machine learning applications often require multi-stage pipelines (e.g., data loading, transforming, training, testing, iterating), and each stage may be best suited to its own Spell run. Workflows are designed to help you automate this process.

A workflow is a special type of run that can create other runs. All runs created within a workflow are linked to that workflow. Thus, a workflow represents a logical grouping of runs, typically from a single script or application. This allows you to keep track of which runs were created from a specific application pipeline.
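The grouping described above can be pictured with a small data model. This is only an illustrative sketch: the `Workflow` and `Run` classes, their fields, and the `create_run` method are hypothetical names invented here and are not part of Spell's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Run:
    """Hypothetical stand-in for a single run; not a Spell class."""
    run_id: int
    command: str
    workflow_id: Optional[int] = None  # set when the run is created inside a workflow


@dataclass
class Workflow:
    """Hypothetical stand-in for a workflow: a run that can create other runs."""
    workflow_id: int
    runs: List[Run] = field(default_factory=list)

    def create_run(self, run_id: int, command: str) -> Run:
        # Every run created through the workflow is linked back to it.
        run = Run(run_id=run_id, command=command, workflow_id=self.workflow_id)
        self.runs.append(run)
        return run


pipeline = Workflow(workflow_id=1)
pipeline.create_run(10, "python load.py")
pipeline.create_run(11, "python train.py")

# The workflow now acts as the logical grouping for both runs.
print([run.run_id for run in pipeline.runs])
```

Because each run records the workflow that created it, the runs belonging to one application pipeline can be found by looking up the workflow.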

Workflows are particularly convenient for long-running pipelines or scripts: when you execute the workflow on Spell's infrastructure, you don't have to keep your own computer up and running for the duration of the script.

Anatomy of a Workflow Command

The spell workflow command is used to create a workflow. It is very similar to the spell run command with a few notable differences:

  • A new workflow is automatically created before the command passed to spell workflow is executed.
  • Temporary authentication credentials for your user are passed into the run to enable additional requests to the Spell server (e.g., to create additional runs).
  • The --repo option adds a Git repository at a specific commit to the workflow. As with a normal run, the code in every repository specified with --repo is synced to the remote machine before the workflow executes. Each --repo specification includes a label, which subsequent run requests made during the workflow can use to select that Git commit for the run.
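To make the shape of the command concrete, here is a minimal Python sketch that assembles a spell workflow invocation from labeled repositories. The `label=path` form of the --repo value is taken from the example command later on this page; the helper name `build_workflow_command` is hypothetical, and the sketch only builds and prints the command rather than executing it.

```python
import shlex
from typing import Dict, List


def build_workflow_command(repos: Dict[str, str], command: List[str]) -> List[str]:
    """Assemble argv for `spell workflow`, with one --repo option per labeled repo.

    Hypothetical helper for illustration; it is not part of the Spell CLI or API.
    """
    argv = ["spell", "workflow"]
    for label, path in repos.items():
        # Each repository is passed as label=path, as in the example on this page.
        argv += ["--repo", f"{label}={path}"]
    # The command to execute inside the workflow comes last.
    return argv + command


argv = build_workflow_command(
    {"char-rnn": "../char-rnn-tensorflow/"},
    ["python", "workflows/char-rnn-workflow/workflow.py"],
)
print(shlex.join(argv))
```

The printed line can be pasted into a shell, or the `argv` list can be handed to `subprocess.run` if you prefer to launch the workflow from a script.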

Workflows are typically used in conjunction with the Spell Python API, since it provides an easy way to programmatically interact with Spell. See Spell Python API for more information.

Workflow Example

An example workflow can be found in the spell-examples repo on GitHub. This workflow is a Python script that uses the Spell Python API to create three runs:

  • one to download a large text file
  • one to mount this large text file and train a character-level recurrent neural network language model to predict the next characters in a sequence
  • one to use the trained model to generate new artificial text, similar to the training file
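The three-stage structure can be sketched as follows. This is not the workflow.py from the spell-examples repo, and it deliberately avoids reproducing Spell Python API calls: `submit` is a hypothetical callable standing in for "create a run, wait for it to finish, and return its id", and the commands and data URL are placeholders chosen for illustration.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: submit(command, input_from) -> id of the finished run.
# `input_from` names the earlier run whose output should be mounted, if any.
Submit = Callable[[str, Optional[str]], str]


def run_pipeline(submit: Submit, data_url: str) -> List[str]:
    """Submit the download, train, and generate stages in order."""
    # Stage 1: download the large text file.
    download_id = submit(f"wget -O input.txt {data_url}", None)
    # Stage 2: train on the file produced by stage 1 (placeholder command).
    train_id = submit("python train.py --data_dir=/data", download_id)
    # Stage 3: generate text from the model produced by stage 2 (placeholder command).
    sample_id = submit("python sample.py", train_id)
    return [download_id, train_id, sample_id]


def make_recording_submit() -> Tuple[Submit, list]:
    """Build a fake `submit` that records requests instead of contacting a server."""
    log: list = []

    def submit(command: str, input_from: Optional[str]) -> str:
        run_id = f"run-{len(log) + 1}"
        log.append((run_id, command, input_from))
        return run_id

    return submit, log


submit, log = make_recording_submit()
run_ids = run_pipeline(submit, "https://example.com/input.txt")  # placeholder URL
for entry in log:
    print(entry)
```

Passing `submit` in as a parameter keeps the pipeline logic separate from the service that executes it, so the ordering and the hand-off between stages can be checked locally before any real runs are created.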

To run the workflow:

  1. Clone the TensorFlow character-level language RNN model repo:

    $ git clone https://github.com/sherjilozair/char-rnn-tensorflow.git

  2. Clone the spell-examples repo:

    $ git clone https://github.com/spellrun/spell-examples.git

  3. Run the workflow:

    $ cd spell-examples

    $ spell workflow --repo char-rnn=../char-rnn-tensorflow/ python workflows/char-rnn-workflow/workflow.py

The workflow will print out the generated text as the last line of the logs.