Setting up Spell in Your AWS Account

For users on the Teams plan, we deploy Spell in your cloud and provide the same cluster management tools backing our own internal infrastructure. This means you can keep your data in your own S3 buckets, perform runs on your own machines, and deploy models within your own cloud infrastructure.

This guide gets you started using Spell in your AWS account.

Setting up AWS

  1. Make sure you have an AWS account. If you don’t, you can create one here.
  2. Make sure you have AWS command line tools installed. If not, follow the instructions from Amazon’s help docs here.
  3. Make sure ~/.aws/credentials contains the access key of a user who has enough AWS permissions to set up the cluster. You can run aws configure to set these credentials (further information on what this does is available from Amazon here).
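
The checks above can be sketched as a small preflight script. This is a minimal illustration using only the Python standard library, not part of Spell or the AWS CLI; the function name check_aws_prerequisites is hypothetical, and it only confirms that the aws executable and the credentials file exist, not that the access key has sufficient permissions.

```python
import shutil
from pathlib import Path


def check_aws_prerequisites(credentials_path=None):
    """Return a list of problems found; an empty list means the basics look fine.

    Hypothetical helper for illustration only.
    """
    problems = []

    # The AWS command line tools install an executable named "aws".
    if shutil.which("aws") is None:
        problems.append("AWS CLI ('aws') not found on PATH")

    # Default location written by `aws configure`.
    if credentials_path is None:
        path = Path.home() / ".aws" / "credentials"
    else:
        path = Path(credentials_path)
    if not path.is_file():
        problems.append(f"credentials file not found: {path}")

    return problems


if __name__ == "__main__":
    for problem in check_aws_prerequisites():
        print("WARNING:", problem)
```

An empty result only means the files are in place; whether the user behind the access key may create VPCs, buckets, and IAM roles still has to be confirmed in AWS.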

Setting up Spell

  1. Install Spell using pip install spell (for more, read our Quickstart guide).
  2. Log in to the CLI by running spell login, then enter your username or email and your password when prompted.

Note

Setting up Spell in your AWS account is only available to Teams plan subscribers. If you're interested in upgrading to a Teams plan, contact us at support@spell.run.

Setting up Your Amazon Resources

Next, we’ll need to set up your Amazon resources. We’ve made this easy with the spell cluster init aws command.

Run spell cluster init aws and follow the prompts:

Enter a display name for this cluster within Spell:

Enter a name for your cluster.

Enter the name of the AWS profile you would like to use [default]:

Enter the name of your AWS profile. Most likely you will use default (just press return). If you have multiple AWS users, you can find the name of the appropriate profile in ~/.aws/credentials.
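
If you are unsure which profiles exist, the credentials file is an INI-style file with one section per profile. The sketch below lists profile names with a shortened access key ID so you can match them against the confirmation prompt. It is an illustration only: the function name list_aws_profiles and the sample keys are made up, and it parses text you pass in rather than reading your real file.

```python
import configparser


def list_aws_profiles(credentials_text):
    """Map each profile name to a shortened access key ID.

    Hypothetical helper; pass in the contents of ~/.aws/credentials.
    """
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)

    profiles = {}
    for name in parser.sections():
        key_id = parser.get(name, "aws_access_key_id", fallback="")
        # Show only a prefix so the full key never ends up in logs.
        profiles[name] = (key_id[:4] + "...") if key_id else "(no key)"
    return profiles


# Made-up sample contents, for illustration only.
SAMPLE = """
[default]
aws_access_key_id = AKIAEXAMPLEDEFAULT
aws_secret_access_key = example-secret

[research]
aws_access_key_id = AKIAEXAMPLERESEARCH
aws_secret_access_key = example-secret
"""

if __name__ == "__main__":
    for profile, key_prefix in list_aws_profiles(SAMPLE).items():
        print(profile, key_prefix)
```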

This command will help you
    - Setup an S3 bucket to store your run outputs in
    - Setup a VPC which Spell will spin up workers in to run your jobs
    - Ensure subnets in the VPC in multiple availability zones
    - Setup a Security Group providing Spell SSH and Docker access to workers
    - Setup an IAM role allowing Spell to spin up and down machines and access
the S3 bucket
All of this will be done with your AWS profile 'default' which has Access Key
ID 'ABCDEFG' and region 'us-west-2' - continue? [y/N]:

Enter y to confirm that you'd like to continue with setup.

Please enter a name for the S3 Bucket Spell will create for run outputs
[spell-my-cluster]:

Give a name to the S3 bucket where Spell will store your run outputs. The name must be 3 to 63 characters long and can consist of lowercase letters, numbers, periods, and hyphens. Read more about bucket naming rules on Amazon here.
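
To check a candidate name before you type it into the prompt, a rough validator might look like the following. It covers only a subset of Amazon's bucket naming rules (length, allowed characters, first and last character, no consecutive periods, not shaped like an IP address), and the function name is hypothetical; S3 itself remains the final authority, including on whether the name is already taken.

```python
import re

# 3-63 characters; lowercase letters, digits, periods, hyphens;
# must start and end with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IP_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name):
    """Check a bucket name against a subset of the S3 naming rules."""
    if not _BUCKET_RE.fullmatch(name):
        return False
    if ".." in name:
        return False  # consecutive periods are not allowed
    if _IP_RE.fullmatch(name):
        return False  # names formatted like IP addresses are rejected
    return True


if __name__ == "__main__":
    for candidate in ("spell-my-cluster", "Spell-My-Cluster", "my..bucket"):
        print(candidate, is_valid_bucket_name(candidate))
```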

Created your new bucket spell-my-cluster!

Next, the script will create a new Virtual Private Cloud (VPC) where worker machines will run your training jobs. It will also create a subnet in each availability zone. AWS sometimes runs out of capacity in a single zone, so spreading subnets across zones lets Spell maximize availability.
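
To picture the one-subnet-per-zone layout, the sketch below splits a VPC address range into equal blocks and assigns one to each availability zone. The CIDR range, prefix length, and function name are assumptions chosen for illustration; the address ranges that spell cluster init aws actually uses are not documented here.

```python
import ipaddress


def plan_subnets(vpc_cidr, zones, new_prefix=20):
    """Assign one equally sized subnet block to each availability zone.

    Hypothetical planning helper; it does not call AWS.
    """
    network = ipaddress.ip_network(vpc_cidr)
    blocks = network.subnets(new_prefix=new_prefix)
    # zip stops once every zone has a block.
    return {zone: str(block) for zone, block in zip(zones, blocks)}


if __name__ == "__main__":
    zones = ["us-west-2a", "us-west-2b", "us-west-2c", "us-west-2d"]
    for zone, cidr in plan_subnets("10.0.0.0/16", zones).items():
        print(zone, cidr)
```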

---------------------------------------------
Created a new VPC with ID vpc-123456789abcde!
Created a new subnet subnet-1234567xyz in your new VPC in availability-zone us-west-2a
Created a new subnet subnet-1234567123 in your new VPC in availability-zone us-west-2b
Created a new subnet subnet-1234567abc in your new VPC in availability-zone us-west-2c
Created a new subnet subnet-1234567def in your new VPC in availability-zone us-west-2d
Created internet gateway igw-1234567 for new VPC
Successfully created security group sg-0123456789abc

Finally, the script will create an Identity and Access Management (IAM) role that allows Spell to spin machines up and down and to access S3 objects.

---------------------------------------------
Creating new IAM role
Successfully created IAM role SpellAccess-1234567
---------------------------------------------
Your cluster my-cluster is initialized! Head over to the web console to create machine types to execute your runs on - https://spell.run/my-org/clusters/17

And you're done!

AWS Limits

To create machines in this VPC, your AWS service limits must allow the instance type you want to use. If you've just set up your AWS account, some of your limits may be set to 0, in which case you'll need to request a limit increase from AWS before you can create that machine type in Spell. AWS usually approves these requests quickly.

Model Serving with EKS

Once you have successfully completed spell cluster init aws, you can create an EKS cluster for model serving with spell cluster init-model-server.

Requirements

You must have the Python libraries boto3 and kubernetes installed (we recommend installing them with pip). You will also need the following tools installed on the machine running the command: aws-iam-authenticator (required for authenticating to EKS clusters, get it here), eksctl (the tool we use to create the EKS cluster, get it here), and kubectl (the tool used to communicate with a Kubernetes cluster, get it here).
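
A quick way to confirm these requirements before running the command is sketched below. It only checks that the three executables are on your PATH and that the two Python libraries can be imported; it does not check versions. The function name missing_requirements is hypothetical.

```python
import importlib.util
import shutil

REQUIRED_TOOLS = ("aws-iam-authenticator", "eksctl", "kubectl")
REQUIRED_MODULES = ("boto3", "kubernetes")


def missing_requirements(tools=REQUIRED_TOOLS, modules=REQUIRED_MODULES):
    """Return the names of required tools and Python modules that were not found."""
    missing = [tool for tool in tools if shutil.which(tool) is None]
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing


if __name__ == "__main__":
    not_found = missing_requirements()
    if not_found:
        print("Missing:", ", ".join(not_found))
    else:
        print("All requirements found.")
```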

Networking and Scaling

By default, the command creates the EKS cluster in the same VPC and subnets configured for your Spell workers. You can instead pass the --create-new-vpc flag to create a new VPC and subnets for the EKS cluster. You can also control the cluster autoscaling bounds with --nodes-min and --nodes-max, and the disk size of each node with --node-volume-size.
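
The options above can be assembled into a command line as in the sketch below. The flag names come from this guide, but the helper function, the example values, and the assumption that each numeric flag takes its value as a separate argument are illustrative; check spell cluster init-model-server --help for the exact syntax and units.

```python
import shlex


def build_init_model_server_command(create_new_vpc=False, nodes_min=None,
                                    nodes_max=None, node_volume_size=None):
    """Build the argument list for `spell cluster init-model-server`.

    Hypothetical helper; it only constructs the command and does not run it.
    """
    if nodes_min is not None and nodes_max is not None and nodes_min > nodes_max:
        raise ValueError("nodes_min must not exceed nodes_max")

    argv = ["spell", "cluster", "init-model-server"]
    if create_new_vpc:
        argv.append("--create-new-vpc")

    # Assumption: each of these flags takes its value as the next argument.
    options = (
        ("--nodes-min", nodes_min),
        ("--nodes-max", nodes_max),
        ("--node-volume-size", node_volume_size),
    )
    for flag, value in options:
        if value is not None:
            argv += [flag, str(value)]
    return argv


if __name__ == "__main__":
    command = build_init_model_server_command(nodes_min=1, nodes_max=4)
    print(shlex.join(command))
```

Building the command as a list keeps each argument separate, so it can be passed to subprocess.run without shell quoting issues.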

Configuring EKS

Once the EKS cluster is created (you can skip cluster creation if a Kubernetes cluster already exists), the spell cluster init-model-server command configures the Kubernetes cluster for model serving. It performs the following steps:

  1. Sets up Cluster Autoscaling
  2. Sets up metrics-server (necessary for Horizontal Pod Autoscaling)
  3. Creates the serving namespace
  4. Grants permissions to the Spell API to manage the Kubernetes cluster
  5. Sets up Ambassador (necessary for model-server routing and load balancing)
  6. Creates the SpellReadS3User to give the cluster permissions to download models from your cluster's S3 buckets
  7. Creates Kubernetes secrets with the SpellReadS3User access tokens
  8. Sets up StatsD (used for real-time metrics on the model servers)
  9. Uploads the necessary information to the Spell API to configure your cluster for model serving on this newly configured EKS cluster

Permissions

Below are the IAM policies describing the permissions Spell needs in your AWS account.

EC2 Permissions

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "EC2",
            "Effect": "Allow",
            "Action": [
                "s3:GetAccountPublicAccessBlock",
                "ec2:*",
                "s3:HeadBucket"
            ],
            "Resource": "*"
        },
        {
            "Sid": "DenyTerminate",
            "Effect": "Deny",
            "Action": [
                "ec2:TerminateInstances",
                "ec2:StopInstances"
            ],
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {
                    "ec2:ResourceTag/spell-machine": "true"
                }
            }
        }
    ]
}
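
The effect of the DenyTerminate statement can be sketched in a few lines: the policy allows all EC2 actions, but an explicit Deny overrides the Allow for stopping or terminating any instance that is not tagged spell-machine=true. The function below is a simplified model of that logic for illustration, not AWS's policy evaluation engine, and its name is hypothetical.

```python
GUARDED_ACTIONS = {"ec2:TerminateInstances", "ec2:StopInstances"}
ALLOWED_S3_ACTIONS = {"s3:GetAccountPublicAccessBlock", "s3:HeadBucket"}


def is_allowed(action, instance_tags):
    """Simplified model of the EC2 policy above.

    instance_tags is a dict of the target instance's tags.
    """
    # StringNotEquals also matches when the tag is missing, and the
    # comparison is case-sensitive, so only the exact value "true" escapes
    # the Deny.
    if action in GUARDED_ACTIONS and instance_tags.get("spell-machine") != "true":
        return False  # an explicit Deny always wins over an Allow

    return action.startswith("ec2:") or action in ALLOWED_S3_ACTIONS


if __name__ == "__main__":
    print(is_allowed("ec2:TerminateInstances", {"spell-machine": "true"}))
    print(is_allowed("ec2:TerminateInstances", {"Name": "my-database"}))
```

In practice this means Spell can start instances freely but can only stop or terminate machines that carry the spell-machine tag.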

S3 Read Permissions

{
    "Version": "2012-10-17",
    "Statement": {
        "Sid": "ReadS3",
        "Effect": "Allow",
        "Action": [
            "s3:ListBucketByTags",
            "s3:GetLifecycleConfiguration",
            "s3:GetBucketTagging",
            "s3:GetInventoryConfiguration",
            "s3:GetObjectVersionTagging",
            "s3:ListBucketVersions",
            "s3:GetBucketLogging",
            "s3:ListBucket",
            "s3:GetAccelerateConfiguration",
            "s3:GetBucketPolicy",
            "s3:GetObjectVersionTorrent",
            "s3:GetObjectAcl",
            "s3:GetEncryptionConfiguration",
            "s3:GetBucketRequestPayment",
            "s3:GetObjectVersionAcl",
            "s3:GetObjectTagging",
            "s3:GetMetricsConfiguration",
            "s3:GetBucketPublicAccessBlock",
            "s3:GetBucketPolicyStatus",
            "s3:ListBucketMultipartUploads",
            "s3:GetBucketWebsite",
            "s3:GetBucketVersioning",
            "s3:GetBucketAcl",
            "s3:GetBucketNotification",
            "s3:GetReplicationConfiguration",
            "s3:ListMultipartUploadParts",
            "s3:GetObject",
            "s3:GetObjectTorrent",
            "s3:GetBucketCORS",
            "s3:GetAnalyticsConfiguration",
            "s3:GetObjectVersionForReplication",
            "s3:GetBucketLocation",
            "s3:GetObjectVersion"
        ],
        "Resource": "*"
    }
}

S3 Write Permissions

{
    "Version": "2012-10-17",
    "Statement": {
        "Sid": "WriteS3",
        "Effect": "Allow",
        "Action": [
            "s3:PutAnalyticsConfiguration",
            "s3:PutAccelerateConfiguration",
            "s3:DeleteObjectVersion",
            "s3:ReplicateTags",
            "s3:RestoreObject",
            "s3:ReplicateObject",
            "s3:PutEncryptionConfiguration",
            "s3:DeleteBucketWebsite",
            "s3:AbortMultipartUpload",
            "s3:PutBucketTagging",
            "s3:PutLifecycleConfiguration",
            "s3:PutObjectTagging",
            "s3:DeleteObject",
            "s3:PutBucketVersioning",
            "s3:DeleteObjectTagging",
            "s3:PutMetricsConfiguration",
            "s3:PutReplicationConfiguration",
            "s3:PutObjectVersionTagging",
            "s3:DeleteObjectVersionTagging",
            "s3:PutBucketCORS",
            "s3:PutInventoryConfiguration",
            "s3:PutObject",
            "s3:PutBucketNotification",
            "s3:PutBucketWebsite",
            "s3:PutBucketRequestPayment",
            "s3:PutBucketLogging",
            "s3:ReplicateDelete"
        ],
        "Resource": [
            "arn:aws:s3:::mybucket",
            "arn:aws:s3:::mybucket/*"
        ]
    }
}
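
In the write policy, mybucket is a placeholder for your own bucket name. Each bucket needs two ARNs: one for the bucket itself and one for the objects inside it. The sketch below shows one way to fill those in for a given bucket; the helper names and the shortened policy in the example are illustrative and not part of the Spell CLI.

```python
import copy
import json


def bucket_resources(bucket_name):
    """Return the bucket ARN and the ARN pattern matching its objects."""
    return [
        f"arn:aws:s3:::{bucket_name}",
        f"arn:aws:s3:::{bucket_name}/*",
    ]


def scope_policy_to_bucket(policy, bucket_name):
    """Return a copy of the policy with every statement limited to one bucket."""
    scoped = copy.deepcopy(policy)
    statements = scoped["Statement"]
    # IAM accepts either a single statement object or a list of them.
    if isinstance(statements, dict):
        statements = [statements]
    for statement in statements:
        statement["Resource"] = bucket_resources(bucket_name)
    return scoped


if __name__ == "__main__":
    # Shortened example policy, for illustration only.
    example = {
        "Version": "2012-10-17",
        "Statement": {
            "Sid": "WriteS3",
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:DeleteObject"],
            "Resource": "*",
        },
    }
    print(json.dumps(scope_policy_to_bucket(example, "spell-my-cluster"), indent=4))
```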