Hyperparameter Searches

The Basics

Hyperparameters control different aspects of the learning process of your model. Optimizing hyperparameters is one way to improve the accuracy of your model. Spell makes it easy to automate hyperparameter searches with the spell hyper command.

The Long Version

A hyperparameter is a high-level property of a machine learning model that typically governs the training process itself (e.g., learning rate, number of hidden layers in a neural network). Thus, hyperparameters generally cannot be optimized within a single training run. Rather, the same model must be trained many times with different hyperparameter values to find the best ones. Spell implements a number of features to help you automate this process.

Anatomy of a Hyperparameter Command

The spell hyper command kicks off your hyperparameter search. You can choose between a grid search, random search, or Bayesian search.

The spell hyper command is very similar to the spell run command and takes all of the same command line options, with the addition of hyperparameter specifications. For more info on the spell run command, see What Is a Run.

Let's take a look at the example command below.

$ spell hyper grid -t K80 \
    --param rate=0.001,0.01,0.1,1 \
    --param layers=2,3,5,10 -- \
    "python train.py --learning_rate :rate: --num_layers :layers:"

The first part should be familiar. We request a grid search running on K80 machines.

Next are two --param options, which list the values that we want our hyperparameter search to test for each specified parameter. Here we specify two parameters, rate and layers, and the values we want for each. The way you specify these values differs between search types. For more details, skip down to grid search, random search, or Bayesian search.

Finally, we have our Python command: python train.py --learning_rate :rate: --num_layers :layers:

The parameters wrapped in colons, :rate: and :layers:, are replaced in each individual run with specific values of the respective parameter.
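As a concrete illustration, here is a minimal sketch of what the train.py referenced above might look like, assuming it reads its hyperparameters with argparse (the script body is hypothetical; only the flag names come from the example command). Spell substitutes a concrete value for each colon-wrapped parameter before the command runs, so the script just sees ordinary flags like --learning_rate 0.01 --num_layers 3:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--learning_rate", type=float, required=True)
parser.add_argument("--num_layers", type=int, required=True)
args = parser.parse_args()

# By the time this script runs, :rate: and :layers: have already been
# replaced with concrete values by Spell.
print(f"training with rate={args.learning_rate}, layers={args.num_layers}")
# ... build and train the model here ...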

Grid Search

In grid search, a set of discrete values is provided for each hyperparameter and a run is created for every possible combination of hyperparameters (i.e., if there are n hyperparameters, a run is created for each n-tuple of the Cartesian product of the n hyperparameter value sets). For example:

$ spell hyper grid \
    --param rate=0.001,0.01,0.1,1 \
    --param layers=2,3,5,10 -- \
    python train.py --learning_rate :rate: --num_layers :layers:
Everything up-to-date
💫 Casting hyperparameter search #59…
rate    layers    Run ID
0.001   2         362
0.001   3         363
0.001   5         364
0.001   10        365
0.01    2         366
0.01    3         367
0.01    5         368
0.01    10        369
0.1     2         370
0.1     3         371
0.1     5         372
0.1     10        373
1       2         374
1       3         375
1       5         376
1       10        377
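To make the Cartesian product concrete, here is a short Python sketch (an illustration, not Spell's implementation) showing where the 16 runs above come from:

import itertools

rates = [0.001, 0.01, 0.1, 1]
layers = [2, 3, 5, 10]

# One command per element of the Cartesian product: 4 x 4 = 16 runs,
# matching the table above.
for rate, num_layers in itertools.product(rates, layers):
    print(f"python train.py --learning_rate {rate} --num_layers {num_layers}")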

Hyperparameters are specified with the --param NAME=VALUE[,VALUE,VALUE...] option flag. NAME corresponds to the name of the hyperparameter. One or more comma-separated VALUEs can be provided after the =, corresponding to the values for the hyperparameter. Values can be strings, integers, or floating-point numbers.

Note

The hyperparameter NAME provided must exist in the run command surrounded by colons. This tells Spell where to substitute specific values for the hyperparameter in the run command when making the individual runs for the hyperparameter search.

Random Search

In random search, each hyperparameter is randomly sampled to determine its specific value for each run. Additionally, the --num-runs option must be specified to indicate the total number of runs to create. Hyperparameters are specified with the --param option flag, and the specification can consist of either:

  1. A set of discrete values, specified with --param NAME=VALUE[,VALUE,VALUE...], similar to grid search. In this case one of the discrete values is randomly selected for the hyperparameter value when constituting a run.
  2. A range specification, specified with --param NAME=MIN:MAX[:SCALING[:TYPE]]. In this case the hyperparameter value is randomly selected from the specified range when constituting a run. MIN and MAX are required and correspond to the minimum and maximum value of the range of this hyperparameter.

    SCALING is optional and can consist of 3 different values (linear is the default if not specified; a sketch of all three scalings follows this list):

    • linear: the hyperparameter range (i.e., MIN to MAX) is sampled uniformly at random to determine a hyperparameter value.
    • log: the hyperparameter range is scaled logarithmically during the sampling (i.e., the range log(MIN) to log(MAX) is sampled uniformly at random and then exponentiated to yield the hyperparameter value). This results in a higher probability density for the sampling towards the lower end of the range.
    • reverse_log: the opposite of the log scaling described above, resulting in a higher probability density for the sampling at the higher end of the range.

    TYPE is optional and can consist of 2 different values (float is the default if not specified):

    • float: the resultant hyperparameter value is a floating point number.
    • int: the resultant hyperparameter value is an integer. If this option is specified, the randomly sampled value is rounded to the nearest integer to yield the final hyperparameter value.
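Here is a hedged sketch of how a range specification might be sampled under the three scalings and two types described above. This illustrates the behavior; it is not Spell's actual implementation:

import math
import random

def sample(min_val, max_val, scaling="linear", type_="float"):
    if scaling == "linear":
        # Uniform over [min_val, max_val].
        value = random.uniform(min_val, max_val)
    elif scaling == "log":
        # Uniform in log space, then exponentiated: density is higher
        # toward the low end of the range.
        value = math.exp(random.uniform(math.log(min_val), math.log(max_val)))
    elif scaling == "reverse_log":
        # Mirror of log scaling: density is higher toward the high end.
        value = min_val + max_val - math.exp(
            random.uniform(math.log(min_val), math.log(max_val)))
    else:
        raise ValueError(f"unknown scaling: {scaling}")
    # TYPE int rounds the sampled value to the nearest integer.
    return round(value) if type_ == "int" else value

# The two range specs from the example below:
print(sample(0.001, 1.0, scaling="log"))              # rate=.001:1.0:log
print(sample(2, 100, scaling="linear", type_="int"))  # layers=2:100:linear:int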

An example random hyperparameter search is as follows:

$ spell hyper random \
    --num-runs 10 \
    --param rate=.001:1.0:log \
    --param layers=2:100:linear:int \
    --param cell=gru,lstm,rnn -- \
    python train.py --learning_rate :rate: --num_layers :layers: --cell_type :cell:
Everything up-to-date
💫 Casting hyperparameter search #60…
rate        layers    cell    Run ID
0.535637    68        lstm    378
0.192321    21        gru     379
0.501205    34        lstm    380
0.00103308  40        gru     381
0.0976437   49        gru     382
0.0131644   36        rnn     383
0.00139867  27        lstm    384
0.0274699   3         lstm    385
0.350886    9         rnn     386
0.23146     66        lstm    387

Bayesian Search

Bayesian search uses the results of prior runs to intelligently pick new parameters to test. It will often either note that a large part of the parameter space is unexplored and pick something in that region, or observe a prior success and pick something near it. This can help you save on the total number of runs needed to find good parameters.

We treat the objective function (e.g., the accuracy of your model) as a random function. Using the previously tested parameter samples and the accuracy your model achieved after training with each, we construct a posterior distribution over that objective function. From the posterior we derive an acquisition function, which is our best guess of the potential a specific sample has, and we then choose the sample that maximizes the acquisition function. There are a number of popular types of acquisition functions; our tool utilizes an upper confidence bound.
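The following Python sketch illustrates the upper-confidence-bound idea using a generic Gaussian process. All of the numbers are hypothetical and this is not Spell's implementation; it only shows how a posterior mean and standard deviation combine into an acquisition function:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Hyperparameter samples already tried (here: [rate, layers]) and the
# metric value each run achieved (hypothetical numbers).
X_observed = np.array([[0.34, 23], [0.29, 72], [0.59, 64]])
y_observed = np.array([0.81, 0.74, 0.88])

# Posterior over the objective function, conditioned on past runs.
gp = GaussianProcessRegressor().fit(X_observed, y_observed)

# Candidate samples drawn from the search space.
rng = np.random.default_rng(0)
candidates = np.column_stack([
    rng.uniform(0.001, 1.0, size=1000),  # rate
    rng.integers(2, 101, size=1000),     # layers (2..100)
])

# UCB acquisition: posterior mean plus kappa standard deviations. A high
# mean exploits known-good regions; a high standard deviation explores
# uncertain ones.
mean, std = gp.predict(candidates, return_std=True)
kappa = 2.0
print(candidates[np.argmax(mean + kappa * std)])  # next sample to try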

If that's a lot to take in, no need to worry: you only need to add a couple of things to a random search and we will do the rest for you.

Similar to random search, you must specify one or more parameters via the --param flag. These take the form --param NAME=MIN:MAX[:TYPE], where MIN is the lowest value the parameter is allowed to take, MAX is the highest, and TYPE is either int or float.

You must also inform the search of the name of the metric you would like to optimize via --metric. To learn more about using metrics with Spell, you can check out the docs on Metrics. In addition, you need to specify how Spell should aggregate the observed values of this metric, via the --metric-agg option, which can be min, max, last, or avg. For example, if you select the Keras metric keras/val_acc and aggregation type last, Spell will use the last validation accuracy recorded in a given run and treat it as the success of your model for those parameters. The search will attempt to maximize this value, so make sure to select your metric and aggregation type accordingly.
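As a quick illustration, here is what each aggregation would compute for a run that logged a series of metric values over time (the numbers are hypothetical):

values = [0.62, 0.71, 0.69, 0.74]  # e.g. keras/val_acc after each epoch

aggregations = {
    "min": min(values),
    "max": max(values),
    "last": values[-1],               # the final recorded value
    "avg": sum(values) / len(values),
}
print(aggregations)  # {'min': 0.62, 'max': 0.74, 'last': 0.74, 'avg': 0.69}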

Lastly, in addition to --num-runs, which you would also specify for a random search, Bayesian search requires you to set --parallel-runs as well: the maximum number of runs that Spell will execute in parallel. This reflects a tradeoff: if you choose a lower number, the search proceeds incrementally and takes longer to complete; if you choose a higher number, many runs will still be in progress when a new run is launched, and the new run's parameters will be selected without the benefit of knowing how well those in-progress runs do.

An example Bayesian hyperparameter search is as follows:

$ spell hyper bayesian \
    --num-runs 12 \
    --parallel-runs 3 \
    --metric keras/val_acc \
    --metric-agg avg \
    --param rate=.001:1.0 \
    --param layers=2:100:int -- \
    python train.py --learning_rate :rate: --num_layers :layers:
Everything up-to-date
💫 Casting hyperparameter search #61…
rate        layers    Run ID
0.343882    23        388
0.294112    72        389
0.587557    64        390

You can also check out our blog post to see an example of Bayesian search in action.

Viewing Your Hypersearch on the Web

You can view the results of your hyperparameter run on the web.

The web visualization updates in real time, so you can see how each run is performing as they launch.

[Screenshot: the Spell hyperparameter search page on the web]

Python API

The Spell Python API also supports creating hyperparameter searches. See Spell Python API for general information on the API, and Hyperparameter Searches for the search-specific functionality.
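As a rough sketch, launching a grid search from Python might look like the following. The method and argument names below are assumptions, not confirmed API; consult the Python API docs referenced above for the exact entry points:

import spell.client

# Assumes you are running in an environment with Spell credentials.
client = spell.client.from_environment()

# Hypothetical call: check the Hyperparameter Searches section of the
# Python API docs for the real method name and parameter format.
search = client.hyper.new_grid_search(
    {"rate": ["0.001", "0.01", "0.1", "1"], "layers": ["2", "3", "5", "10"]},
    command="python train.py --learning_rate :rate: --num_layers :layers:",
)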