Setting up Spell in Your GCP account

For users on the Teams plan, we deploy Spell in your cloud and provide the same cluster management tools backing our own internal infrastructure. This means you can keep your data in your own GS (Google Cloud Storage) buckets, perform runs on your own machines, and deploy models within your own cloud infrastructure.

This guide gets you started using Spell in your GCP (Google Cloud Platform) account.

Setting up GCP

  1. Make sure you have a GCP account. If you don’t, you can create one here.
  2. Make sure you have gcloud command line tools installed. If not, follow the instructions from Google’s help docs here.
  3. Make sure you have set up Application Default Credentials for external libraries to use. You should run gcloud auth application-default login to set these credentials (further information on what this does is available from Google Cloud here).
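To confirm that the login step actually left credentials behind, you can look for the ADC file. The sketch below is a stdlib-only illustration, not part of Spell. It assumes gcloud's default locations (~/.config/gcloud/application_default_credentials.json on Linux and macOS, %APPDATA%\gcloud\application_default_credentials.json on Windows, or the directory named by CLOUDSDK_CONFIG), and it also honors GOOGLE_APPLICATION_CREDENTIALS if that variable points at a key file.

```python
import os
import platform
from pathlib import Path
from typing import Mapping, Optional


def find_adc_file(env: Optional[Mapping[str, str]] = None,
                  system: Optional[str] = None) -> Optional[Path]:
    """Return the path of the Application Default Credentials file, or None.

    Checks, in order:
      1. GOOGLE_APPLICATION_CREDENTIALS, if it points at an existing file.
      2. application_default_credentials.json in the gcloud config directory
         (CLOUDSDK_CONFIG if set, otherwise the platform default).
    """
    env = os.environ if env is None else env
    system = platform.system() if system is None else system

    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit and Path(explicit).is_file():
        return Path(explicit)

    if env.get("CLOUDSDK_CONFIG"):
        config_dir = Path(env["CLOUDSDK_CONFIG"])
    elif system == "Windows":
        config_dir = Path(env.get("APPDATA", "")) / "gcloud"
    else:
        config_dir = Path(env.get("HOME", str(Path.home()))) / ".config" / "gcloud"

    candidate = config_dir / "application_default_credentials.json"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    path = find_adc_file()
    if path:
        print(f"Found Application Default Credentials at {path}")
    else:
        print("No ADC file found; run: gcloud auth application-default login")
```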

Setting up Spell

  1. Install Spell using pip install spell (for more, read our Quickstart guide).
  2. Log in to the CLI by running spell login and entering your username or email and password when prompted.

Note

Setting up Spell in your GCP account is only available to Teams plan subscribers. If you're interested in upgrading to a Teams plan, contact us at support@spell.run.

Setting up Your GCP Resources

Next, we’ll need to set up your GCP resources. We’ve made this easy with the spell cluster init gcp command.

Run spell cluster init gcp and follow the prompts:

This command will help you
    - Set up an Google Storage bucket to store your run outputs in
    - Setup a VPC network which Spell will spin up workers in to run your jobs
    - Create a subnet in the VPC
    - Setup a Service Account allowing Spell to spin up and down machines and access the GS bucket

Enter y to confirm you'd like to continue setup.

Enter a display name for this cluster within Spell:

Enter a name for your cluster.

---------------------------------------------
All of this will be done within your project '<project_name>' - continue? [Y/n]

If you wish to isolate Spell resources in a separate project, set up that project before running this command. Otherwise, confirm that this is the GCP project you want to use.

The script will now create a service account that Spell will use to interact with resources in your project. First it creates a role with the necessary permissions:

---------------------------------------------
Creating role SpellAccess_1456882 with the following permissions:
    compute.disks.create
    compute.disks.list
    compute.disks.resize
    compute.globalOperations.get
    compute.instances.create
    compute.instances.delete
    compute.instances.get
    compute.instances.list
    compute.instances.setLabels
    compute.instances.setMetadata
    compute.instances.setServiceAccount
    compute.subnetworks.use
    compute.subnetworks.useExternalIp
    compute.zones.list
    compute.regions.get
...

Then it creates a service account and assigns the role to it:

Assigning role SpellAccess_1456882 to service account spell-access-...@....iam.gserviceaccount.com...
Successfully set up service account spell-access-....@....iam.gserviceaccount.com
We recommend using an empty GS Bucket for Spell outputs. Would you like to make a new bucket or use an existing (new, existing): new
Please enter a name for the GS Bucket Spell will create for run outputs [spell-my-cluster]:

Give a name to the GS bucket where Spell will store your run outputs and Jupyter workspaces. Your name can consist of lowercase letters, numbers, periods, and dashes. Read more about bucket naming rules on Google Cloud here.
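If you want to check a name before typing it at the prompt, a small validator helps. This sketch encodes the core rules Google publishes for bucket names (3 to 63 characters, or up to 222 with dots as long as each dot-separated part is at most 63; lowercase letters, digits, dashes, underscores, and dots only; must start and end with a letter or digit; must not look like an IP address; must not begin with "goog"). Google's rules also permit underscores; whether Spell's prompt accepts them is not stated here. It does not check the ban on names containing "google" or look-alike spellings.

```python
import ipaddress
import re
from typing import List

# Characters Google Cloud Storage permits in bucket names.
_ALLOWED = re.compile(r"[a-z0-9._-]+")


def bucket_name_problems(name: str) -> List[str]:
    """Return the reasons `name` is not a valid GCS bucket name (empty list if valid)."""
    problems: List[str] = []
    # Names containing dots may be longer, but each component is still capped.
    max_len = 222 if "." in name else 63
    if not 3 <= len(name) <= max_len:
        problems.append(f"length must be 3-{max_len} characters")
    if not _ALLOWED.fullmatch(name):
        problems.append("only lowercase letters, digits, '-', '_' and '.' are allowed")
    if not (name[:1].isalnum() and name[-1:].isalnum()):
        problems.append("must start and end with a letter or digit")
    if "." in name and any(len(part) > 63 for part in name.split(".")):
        problems.append("each dot-separated component must be at most 63 characters")
    try:
        ipaddress.IPv4Address(name)  # raises ValueError unless name is dotted-decimal
        problems.append("must not look like an IP address")
    except ValueError:
        pass
    if name.startswith("goog"):
        problems.append("must not begin with the 'goog' prefix")
    return problems


if __name__ == "__main__":
    for candidate in ["spell-my-cluster", "Spell_Outputs", "192.168.5.4"]:
        print(candidate, "->", bucket_name_problems(candidate) or "OK")
```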

Created your new bucket spell-my-cluster!

Next, the script will create a new Virtual Private Cloud (VPC) network where worker machines will run your training jobs.

First, select a region for your VPC:

All of this will be done within this project's region 'us-west2' - continue? [Y/n]:
Creating network...
  [####################################]  100%
Created a new VPC/network with name gcp-cluster!
Creating firewall rule to allow ingress on ports [22, 2376, 9999]...
  [####################################]  100%
Creating firewall rule to allow communication between instances within VPC...
  [####################################]  100%
Firewall rules ready!
Creating subnetwork...
  [####################################]  100%
Created a new subnet gcp7 within network gcp7 in region us-west2!
---------------------------------------------
Your cluster my-cluster is initialized! Head over to the web console to create machine types to execute your runs on -  https://spell.run/my-org/clusters/17

And you're done!

GCP Limits

In order to create machines in this VPC, you'll need to make sure your project's GCP quotas allow that machine type. If you've just set up your GCP account, some of your quotas may be set to 0, so you'll first need to request an increase before you can create machine types in Spell. GCP usually approves these requests quickly.
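To see where you stand before creating machine types, you can compare your planned usage against a region's quotas. The sketch below assumes the JSON printed by gcloud compute regions describe REGION --format=json, whose quotas entries carry metric, limit, and usage fields. The metric names and numbers in the example are illustrative, not taken from a real project. Project-wide quotas are reported separately (by gcloud compute project-info describe) and are not checked here.

```python
import json
from typing import Dict, List


def quota_shortfalls(region_json: str, needed: Dict[str, float]) -> List[str]:
    """Return one message per quota metric that lacks enough headroom.

    region_json: output of `gcloud compute regions describe REGION --format=json`.
    needed: quota metric name (e.g. "CPUS") -> amount you plan to use.
    """
    quotas = {q["metric"]: q for q in json.loads(region_json).get("quotas", [])}
    shortfalls = []
    for metric, amount in needed.items():
        quota = quotas.get(metric)
        if quota is None:
            shortfalls.append(f"{metric}: no quota listed for this region")
            continue
        headroom = quota["limit"] - quota.get("usage", 0.0)
        if headroom < amount:
            shortfalls.append(
                f"{metric}: need {amount:g}, only {headroom:g} available "
                f"(limit {quota['limit']:g}); request an increase"
            )
    return shortfalls


if __name__ == "__main__":
    # Illustrative data shaped like the gcloud output; not real quota values.
    sample = json.dumps({"quotas": [
        {"metric": "CPUS", "limit": 24.0, "usage": 4.0},
        {"metric": "NVIDIA_T4_GPUS", "limit": 0.0, "usage": 0.0},
    ]})
    for line in quota_shortfalls(sample, {"CPUS": 8, "NVIDIA_T4_GPUS": 1}):
        print(line)
    # NVIDIA_T4_GPUS: need 1, only 0 available (limit 0); request an increase
```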

Model Serving with GKE

Once you have successfully completed spell cluster init gcp, you can create a GKE cluster for model serving with spell cluster init-model-server.

Requirements

You must have the Python libraries boto3 and kubernetes installed (we recommend installing them with pip). You will also need kubectl, the command-line tool for communicating with a Kubernetes cluster (get it here), installed on the machine running the command.
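A quick pre-flight check for these requirements needs only the standard library: importlib.util.find_spec reports whether a module is importable, and shutil.which reports whether a command is on PATH. This is an illustrative sketch, not part of the Spell CLI.

```python
import importlib.util
import shutil
from typing import Iterable, List


def missing_requirements(modules: Iterable[str] = ("boto3", "kubernetes"),
                         commands: Iterable[str] = ("kubectl",)) -> List[str]:
    """List the Python modules and command-line tools that are not available."""
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    missing += [c for c in commands if shutil.which(c) is None]
    return missing


if __name__ == "__main__":
    missing = missing_requirements()
    if missing:
        print("Install before running spell cluster init-model-server:", ", ".join(missing))
    else:
        print("All model-serving prerequisites found.")
```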

Networking and Scaling

By default, the command creates a GKE cluster in the same VPC and subnets configured for your Spell workers, but you can optionally provide the --create-new-vpc flag to create a new VPC and subnets for the GKE cluster. You can also control the cluster autoscaling parameters with --nodes-min and --nodes-max, as well as the disk size of each node with --node-volume-size.
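If you script cluster setup, it can help to assemble the command from these options in one place. The helper below is hypothetical and uses only the flags named above; it assumes each value flag takes its value as the next argument and that --create-new-vpc is a switch with no value. Check spell cluster init-model-server --help for the authoritative syntax, including the unit for --node-volume-size (the example assumes GB).

```python
import shlex
from typing import List, Optional


def init_model_server_args(nodes_min: Optional[int] = None,
                           nodes_max: Optional[int] = None,
                           node_volume_size: Optional[int] = None,
                           create_new_vpc: bool = False) -> List[str]:
    """Build an argv list for `spell cluster init-model-server` (hypothetical helper).

    Options left as None are omitted so the CLI applies its own defaults.
    """
    if nodes_min is not None and nodes_max is not None and nodes_min > nodes_max:
        raise ValueError("nodes_min must not exceed nodes_max")
    args = ["spell", "cluster", "init-model-server"]
    if create_new_vpc:
        args.append("--create-new-vpc")
    if nodes_min is not None:
        args += ["--nodes-min", str(nodes_min)]
    if nodes_max is not None:
        args += ["--nodes-max", str(nodes_max)]
    if node_volume_size is not None:
        args += ["--node-volume-size", str(node_volume_size)]
    return args


if __name__ == "__main__":
    # Example values only; the volume size unit is assumed to be GB.
    print(shlex.join(init_model_server_args(nodes_min=1, nodes_max=3, node_volume_size=50)))
    # spell cluster init-model-server --nodes-min 1 --nodes-max 3 --node-volume-size 50
```

You can pass the resulting list to subprocess.run(args, check=True) to execute it.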

Configuring GKE

Once the GKE cluster is created (you can skip this step if your Kubernetes cluster already exists), the spell cluster init-model-server command configures the Kubernetes cluster for model serving. This involves the following steps:

  1. Sets up Cluster Autoscaling
  2. Sets up metrics-server (necessary for Horizontal Pod Autoscaling)
  3. Creates the serving namespace
  4. Grants permissions to the Spell API to manage the Kubernetes cluster
  5. Sets up Ambassador (necessary for model-server routing and load balancing)
  6. Sets up StatsD (used for real-time metrics on the model servers)
  7. Uploads the necessary information to the Spell API so it can use this newly configured GKE cluster for model serving
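After the command finishes, you may want to spot-check parts of the setup from your own machine. The sketch below shells out to kubectl to confirm that a namespace exists and that the metrics API (which metrics-server provides) answers kubectl top nodes. The namespace name serving is an assumption based on step 3; the exact name Spell uses is not stated in this guide. The command runner is injectable so the checks can be exercised without a cluster.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Run a command with its output suppressed and return its exit code."""
    return subprocess.run(list(cmd), stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode


def namespace_exists(name: str, run: Runner = _run) -> bool:
    """True if `kubectl get namespace NAME` succeeds in the current kubectl context."""
    return run(["kubectl", "get", "namespace", name]) == 0


def metrics_api_available(run: Runner = _run) -> bool:
    """True if `kubectl top nodes` succeeds, which requires a working metrics-server."""
    return run(["kubectl", "top", "nodes"]) == 0
```

With kubectl pointed at the new GKE cluster, namespace_exists("serving") and metrics_api_available() should both return True if those steps succeeded.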

Permissions

Below is a list of permissions Spell will need from your GCP project.

GCP Permissions

    compute.disks.create
    compute.disks.list
    compute.disks.resize
    compute.globalOperations.get
    compute.instances.create
    compute.instances.delete
    compute.instances.get
    compute.instances.list
    compute.instances.setLabels
    compute.instances.setMetadata
    compute.instances.setServiceAccount
    compute.subnetworks.use
    compute.subnetworks.useExternalIp
    compute.zones.list
    compute.regions.get
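If you manage the role yourself or want to audit it later, you can compare what it grants against the list above. A minimal sketch, assuming you feed it the JSON from gcloud iam roles describe ROLE_ID --project PROJECT_ID --format=json, where ROLE_ID is the custom role created during setup (SpellAccess_1456882 in the example output); gcloud reports a role's permissions in its includedPermissions field.

```python
import json
from typing import List

# Permissions listed in this guide as required by Spell.
REQUIRED_PERMISSIONS = [
    "compute.disks.create",
    "compute.disks.list",
    "compute.disks.resize",
    "compute.globalOperations.get",
    "compute.instances.create",
    "compute.instances.delete",
    "compute.instances.get",
    "compute.instances.list",
    "compute.instances.setLabels",
    "compute.instances.setMetadata",
    "compute.instances.setServiceAccount",
    "compute.subnetworks.use",
    "compute.subnetworks.useExternalIp",
    "compute.zones.list",
    "compute.regions.get",
]


def missing_permissions(role_json: str) -> List[str]:
    """Return the required permissions absent from a role, sorted alphabetically."""
    granted = set(json.loads(role_json).get("includedPermissions", []))
    return sorted(set(REQUIRED_PERMISSIONS) - granted)
```

For example, a script can call missing_permissions(sys.stdin.read()) and receive the gcloud output through a pipe.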

Bucket Permissions

During spell cluster init gcp, we add the roles/storage.admin role to the newly created service account. Any bucket you add with the add-bucket command grants this service account the roles/storage.objectViewer role.
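Bucket access in GCP is expressed as an IAM policy: a list of bindings, each pairing a role with its members. The sketch below shows what granting roles/storage.objectViewer to a service account amounts to at the policy level, assuming a policy shaped like the JSON from gsutil iam get gs://BUCKET_NAME. It illustrates the data structure only; it is not how Spell's add-bucket is implemented, and the service account email is a made-up placeholder. Conditional bindings are left untouched.

```python
import copy
import json
from typing import Any, Dict


def with_binding(policy: Dict[str, Any], role: str, member: str) -> Dict[str, Any]:
    """Return a copy of an IAM policy in which `member` holds `role` (idempotent)."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Reuse an existing unconditional binding for the same role.
        if binding.get("role") == role and "condition" not in binding:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


if __name__ == "__main__":
    # Hypothetical service account; real ones look like spell-access-...@....iam.gserviceaccount.com
    sa = "serviceAccount:spell-access-example@my-project.iam.gserviceaccount.com"
    print(json.dumps(with_binding({}, "roles/storage.objectViewer", sa), indent=2))
```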

Service Account Access

Spell will gain access to the newly created service account through the following roles:

    roles/iam.serviceAccountTokenCreator
    roles/iam.serviceAccountUser
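To verify these grants, you can read the service account's IAM policy with gcloud iam service-accounts get-iam-policy SA_EMAIL --format=json and check which of the two roles a given principal holds. In the sketch below, the member string identifying Spell is a hypothetical placeholder; substitute the actual principal.

```python
import json
from typing import Iterable, List

# Roles this guide says Spell uses to access the service account.
SPELL_SA_ROLES = (
    "roles/iam.serviceAccountTokenCreator",
    "roles/iam.serviceAccountUser",
)


def roles_missing_for(policy_json: str, member: str,
                      roles: Iterable[str] = SPELL_SA_ROLES) -> List[str]:
    """Return the roles in `roles` that the policy does not grant to `member`."""
    granted = {
        binding["role"]
        for binding in json.loads(policy_json).get("bindings", [])
        if member in binding.get("members", [])
    }
    return [role for role in roles if role not in granted]
```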