Model Servers


Once your model is trained and ready for production, you may want to serve it as a web API. The model-servers command covers the inference side of machine learning: it takes models after training, manages their lifetimes, and provides clients with versioned access to an endpoint that returns the result of inference for input specified in an HTTP request.

Supported Technologies

Currently we support serving TensorFlow and PyTorch models.

For TensorFlow we use TensorFlow Serving. The TensorFlow model must be saved with SavedModelBuilder (details available here) and be available in SpellFS either as a run output or as an upload.

For PyTorch we use a custom web server that consumes a saved torch.jit.ScriptModule (details available here), translates the JSON in the POST data into Tensors, and passes them as kwargs into the forward method of the ScriptModule. For example, if your forward method has the signature:

def forward(self, named_arg_1, named_arg_2):

then the input JSON must have a kwargs key whose value must be an object with keys corresponding to the named arguments of the forward method:

    {
        "kwargs": {
            "named_arg_1": [1.0, 2.0, 3.0],
            "named_arg_2": [[0.9, 0.8], [321.0, 123.4]]
        }
    }
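As a sketch, the request body above can be built and serialized with the standard library (the argument names match the hypothetical forward method shown earlier):

```python
import json

# Build the POST body for a ScriptModule whose forward method is
# def forward(self, named_arg_1, named_arg_2)
payload = {
    "kwargs": {
        "named_arg_1": [1.0, 2.0, 3.0],
        "named_arg_2": [[0.9, 0.8], [321.0, 123.4]],
    }
}
body = json.dumps(payload)
print(body)
```

The serialized string is what goes in the POST data sent to the model server.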

TensorFlow ImageNet Demo

This demo will guide you through deploying a model server with Spell and using it for live inference. The model we'll be using is a version of ResNet-50 trained on ImageNet and can be used to infer one of 1000 categories for a given picture.

1) Download and extract the pre-trained model (NHWC, JPG) from the official TensorFlow repository.

$ curl | tar -xz

2) Upload the model onto Spell. We'll be uploading the model to a folder named resnet:

$ spell upload resnet_v2_fp32_savedmodel_NHWC_jpg --name resnet

3) Create a model server based on the uploaded model by using the model-servers create command:

$ spell model-servers create resnet:v1 uploads/resnet

4) When the model has reached the Running status, it is ready for inference. You will need the URL and the Authentication Token in order to use the model for predictions. To retrieve this information, use the model-servers info command:

$ spell model-servers info resnet:v1
Model Name    resnet
Model Tag     v1
Resource      uploads/resnet
Type          tensorflow
Date Created  Mar 14 14:36:32
Status        running
Time Running  30 days
Access Token  tzdknT0PTcDYL0PNjsU0uZtdS1edPCSt2skDnFDgx-rNugYfuzHXG8q-_evNf9dmzEXj3Kerx4FLJCUWsqvlfdQ

Store the server URL and auth token in two local bash variables, server_url and auth_token.
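For example (the values below are placeholders; substitute the real URL and token reported by spell model-servers info):

```shell
# Placeholder values only -- replace with the URL and access token
# shown for your own model server
server_url="https://example.spell.services/resnet"
auth_token="example-token"
```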

    # Check the stored variables
    echo $server_url $auth_token

5) The model can be accessed via a RESTful interface. The body of the HTTP request must follow a TensorFlow-specific format. The server responds with an ID representing the inferred ImageNet class. You can use the utility script to prepare the HTTP request, receive the response, and map the ImageNet class ID to the English word(s) that describe it.

$ curl >
$ curl > imagenet1000_clsid_to_human.json
$ python
Loading JPEG image:
Predicting image class..
Classify result: zebra
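As an illustration of the request preparation the utility script performs, TensorFlow Serving's REST predict API accepts raw bytes (such as a JPEG) base64-encoded under a "b64" key. Whether the deployed ResNet endpoint takes exactly this shape is an assumption here; the utility script handles the real request format.

```python
import base64
import json

# Stand-in bytes for a real JPEG file; in practice you would read the
# image from disk or download it from a URL
jpeg_bytes = b"\xff\xd8\xff\xe0 fake image data"

# Base64-encode the bytes under a "b64" key, as TensorFlow Serving's
# REST predict API expects for binary inputs (assumed format)
payload = {"instances": [{"b64": base64.b64encode(jpeg_bytes).decode("ascii")}]}
print(json.dumps(payload))
```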

6) By default, the Python script uses an already deployed ResNet model server. To use the model you just created, use the variables saved in step 4 and call the Python script with the extra parameters --server and --auth. To use your own image, pass a JPEG image URL with the --img parameter:

$ python --server $server_url --auth $auth_token --img
Loading JPEG image:
Predicting image class..
Classify result: zebra

PyTorch Demo

This demo will guide you through creating a PyTorch module that can be saved as a torch.jit.ScriptModule and deployed as a model server with Spell.

There are currently two ways to create and save a torch.jit.ScriptModule, detailed in the official documentation here: tracing with torch.jit.trace(), or annotating methods with @torch.jit.script_method. In the following example, we will use the tracing method.

1) Create a PyTorch Module with a forward method that will perform the desired operation (inference, classification, etc.) when deployed. This will likely involve your trained PyTorch model, but for this example we will build a Module that doubles, triples, and squares the respective input Tensors. We then create an instance of the Module, create a sample input, pass both to torch.jit.trace, and save the result. NOTE: Your forward method can only take Tensors as input (albeit any number of them and of any dimension) and can only return a Tuple of Tensors as output.

import torch
import json

class TestModule(torch.nn.Module):
    def __init__(self):
        super(TestModule, self).__init__()

    def forward(self, double, triple, squared):
        return (double*2, triple*3, squared**2)

# Trace and save
module_instance = TestModule()
test_input = (torch.tensor([1,2]), torch.tensor([3,4]), torch.tensor([5,6]))
traced_module = torch.jit.trace(module_instance, test_input)
traced_module.save("model.pt")  # save path shown here as an example

2) Save this file and commit it to a Git repo. Then run it on Spell using spell run python, and save the Run ID for this run using export module_run_id=<run_id>.

3) Using the Spell CLI, create a model server like so:

$ spell model-servers create -t pytorch pytorch:v1 runs/$module_run_id/

4) You can monitor the model server's status with spell model-servers. Once the model server is up and ready, construct a curl request to test the endpoint. For our example:

$ curl -d '{"kwargs": {"double":[4.0, 5.0], "triple":[6.0, 7.0], "squared":[3.0, 25.0]}}' -X POST

This should return the doubled, tripled, and squared values.
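The expected numbers are easy to check by hand, since TestModule only doubles, triples, and squares its inputs:

```python
# Reproduce the arithmetic TestModule performs on the example request
double = [x * 2 for x in [4.0, 5.0]]
triple = [x * 3 for x in [6.0, 7.0]]
squared = [x ** 2 for x in [3.0, 25.0]]
print(double, triple, squared)  # [8.0, 10.0] [18.0, 21.0] [9.0, 625.0]
```

The exact JSON shape of the server's response is not reproduced here, but the returned tensors should contain these values.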