Private Machines

For users on the Spell for Teams plan, Spell supports adding private machines that you own, whether at your office, datacenter, or home, as workers within your Spell cluster.

Once you have set up an AWS or GCP cluster, you can create distinct private machine types and add your own private machines to that cluster.

Machine requirements

  • Ubuntu 18.04 operating system or newer.
  • Docker installed. On Ubuntu this is most easily done with sudo apt install docker.io.
  • NVIDIA GPU with drivers installed. On Ubuntu this can be as easy as sudo apt install nvidia-driver-430. Don't forget to reboot the machine after installing new drivers.
  • At least 50 GB of free disk space, although Spell recommends significantly more for any complex machine learning workload.
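
The requirements above can be checked before installing anything. The following is a minimal sketch, not an official Spell tool: the function names are hypothetical, and it assumes that free space on the root filesystem is what matters, which may not hold if Docker stores its data on another volume.

```shell
#!/usr/bin/env bash
# Preflight sketch for the machine requirements listed above.
# Thresholds mirror the text; checking "/" for free space is an assumption.

MIN_UBUNTU="18.04"
MIN_FREE_GB=50

# Succeeds when version $1 is greater than or equal to version $2 (GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeeds when $1 (free space in whole GB) meets the minimum.
enough_space() {
  [ "$1" -ge "$MIN_FREE_GB" ]
}

# Print one line per requirement; return non-zero if any check fails.
check_requirements() {
  local failed=0 distro release free_gb

  distro="$(lsb_release -is 2>/dev/null || true)"
  release="$(lsb_release -rs 2>/dev/null || true)"
  if [ "$distro" = "Ubuntu" ] && version_ge "${release:-0}" "$MIN_UBUNTU"; then
    echo "OK      Ubuntu $release"
  else
    echo "MISSING Ubuntu $MIN_UBUNTU or newer (found: ${distro:-unknown} ${release:-})"
    failed=1
  fi

  if command -v docker >/dev/null 2>&1; then
    echo "OK      Docker found"
  else
    echo "MISSING Docker (sudo apt install docker.io)"
    failed=1
  fi

  # nvidia-smi ships with the driver and fails when no driver or GPU is usable.
  if nvidia-smi >/dev/null 2>&1; then
    echo "OK      NVIDIA driver responds"
  else
    echo "MISSING working NVIDIA driver (nvidia-smi did not succeed)"
    failed=1
  fi

  # Available space on "/" in whole gigabytes (GNU df).
  free_gb="$(df --output=avail -BG / 2>/dev/null | tail -n 1 | tr -dc '0-9')"
  if enough_space "${free_gb:-0}"; then
    echo "OK      ${free_gb} GB free on /"
  else
    echo "MISSING at least ${MIN_FREE_GB} GB free on / (found: ${free_gb:-0} GB)"
    failed=1
  fi

  return "$failed"
}

check_requirements || echo "One or more requirements are not met."
```

Each check prints a single OK or MISSING line, so the output is easy to scan on a machine you are preparing for the first time.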

Note

This is currently a BETA feature and is being rolled out gradually to our Teams customers. Please contact Spell support at support@spell.run if you would like to use it.

Creating a new private machine type

Machine Type Creation Dialog

Under Clusters click on your cluster, then click on "Add New Machine Type". At the top of the modal that pops up, there will be an option to select either "Cloud Instance" or "Private Machine". Selecting "Private Machine" will update the options to only include the subset of options relevant to private machines:

  • Name. This name will be referenced by the --machine_type parameter when you create runs.
  • Additional Images. All private machine workers will have the default framework image (TensorFlow 1, PyTorch, and Conda) installed when they connect to Spell. Use these checkboxes to attach additional framework images. Consult the section "Available frameworks" for details.

After clicking "Create" you will be shown an API key created for this new machine type. Keep this API key safe and don't share it with anyone. Copy it down for the next step: you will use it to register your private machines with Spell. If you lose it, you can always return to the cluster page and retrieve it again.

Install the Spell worker service

Log in to your machine. In the terminal, download the Debian package for the Spell worker service:

$ wget https://apt.spell.run/spell-worker-service.deb

Install the Spell worker service by running:

$ sudo apt install ./spell-worker-service.deb

Enter the API key at the installation wizard prompt. After a successful installation, your machine should be visible on the Clusters page. It may stay in the "Starting" state for a while as the machine connects to Spell and Spell downloads the framework images to the machine's Docker installation.

Debugging the Spell worker service

The service should start automatically when the package is installed. If for any reason it's not working as expected, you can use the following commands to help debug.

To show some status information and log snippets:

$ systemctl status spell-worker.service

To show all Spell worker service logs:

$ journalctl -u spell-worker.service

To control Spell worker service's execution:

$ sudo systemctl [start | stop | restart] spell-worker.service

It shouldn't be necessary to run these commands, but if something is not behaving as expected the above commands can be helpful in identifying the issue.
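
The commands above can be combined into a small diagnostic helper. This is a sketch under stated assumptions rather than part of the Spell package: the function names, the suggested next steps, and the log file path are all hypothetical. Only the unit name spell-worker.service comes from the commands above.

```shell
#!/usr/bin/env bash
# Sketch: summarize the Spell worker service state and save recent logs.
# The log path and the advice strings are illustrative assumptions.

UNIT="spell-worker.service"
LOG_FILE="${LOG_FILE:-/tmp/spell-worker-debug.log}"

# Map a `systemctl is-active` state string to a suggested next step.
suggest_next_step() {
  case "$1" in
    active)     echo "Service is running; check the Clusters page for the machine." ;;
    activating) echo "Service is still starting; wait and check again." ;;
    failed)     echo "Service failed; inspect the saved logs, then try a restart." ;;
    inactive)   echo "Service is stopped; start it with: sudo systemctl start $UNIT" ;;
    *)          echo "Unrecognized state '$1'; run: systemctl status $UNIT" ;;
  esac
}

collect_diagnostics() {
  local state
  # is-active prints the state and exits non-zero unless the unit is active.
  state="$(systemctl is-active "$UNIT" 2>/dev/null || true)"
  state="${state:-unknown}"
  echo "State: $state"
  suggest_next_step "$state"

  # Save the last hour of logs without a pager so the file can be reviewed later.
  if journalctl -u "$UNIT" --since "1 hour ago" --no-pager > "$LOG_FILE" 2>&1; then
    echo "Recent logs written to $LOG_FILE"
  else
    echo "Could not read the journal; try running this script with sudo."
  fi
}

# Only run on hosts that actually use systemd.
if command -v systemctl >/dev/null 2>&1; then
  collect_diagnostics
fi
```

Set LOG_FILE in the environment to write the logs somewhere other than /tmp.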

Moving and deleting machines

In order to move a private machine from one machine type to another, you need to first delete the machine from its current machine type. This is done by clicking the blue "x" in the rightmost column of the machine details table. Once removed, you can register the machine with a new machine type by getting the new machine type's API key and running the following command on the machine itself:

$ sudo dpkg-reconfigure spell-worker-service

Or you can remove the machine from Spell altogether (don't worry, you can always add it back later) by running the following command:

$ sudo apt purge spell-worker-service

Deleting a private machine type has the same effect as deleting a cloud machine type, except the machines are removed and not terminated. Read more about deleting machine types here.