Hyperparameters control different aspects of the learning process of your model. Optimizing hyperparameters is one way to improve the accuracy of your model. Spell makes it easy to automate hyperparameter searches with the spell hyper command.
The Long Version
A hyperparameter is a high-level property of a machine learning model that typically governs the training process itself (e.g., learning rate, number of hidden layers in a neural network). Thus, a hyperparameter cannot generally be optimized with a single training of the model. Rather, the same model must be trained numerous times while varying the hyperparameter values to determine optimal values. Spell implements a number of features to help you automate this process.
Anatomy of a Hyperparameter Command
The spell hyper command is very similar to the spell run command and takes all of the same command line options, with the addition of hyperparameter specifications. For more info on the spell run command, see What Is a Run.
Let's take a look at the example command below.
$ spell hyper grid -t K80 \
    --param rate=0.001,0.01,0.1,1 \
    --param layers=2,3,5,10 -- \
    "python train.py --learning_rate :rate: --num_layers :layers:"
The first part should be familiar. We request a grid search running on a K80 machine type (-t K80).
Next are two --param options, which list the values that we want our hyperparameter search to test for each specified parameter. Here we specify two parameters, rate and layers, and the values we want for each. The way to specify the values to search differs between the types of hyperparameter search. For more details, skip down to grid search, random search, or bayesian search.
Finally, we have our python command:
python train.py --learning_rate :rate: --num_layers :layers:
The parameters in colon bracket form, :rate: and :layers:, are replaced in individual runs with specific values of the respective parameter.
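To make the substitution concrete, here is a minimal Python sketch of how a :name: placeholder could be filled in for a single run. The function name fill_placeholders and the regular expression are assumptions made for illustration; this is not Spell's actual implementation.

```python
import re


def fill_placeholders(command, values):
    """Replace each :name: placeholder in `command` with its value.

    Hypothetical helper for illustration only. Placeholders whose name
    is not in `values` are left untouched.
    """
    def replace(match):
        name = match.group(1)
        if name not in values:
            return match.group(0)
        return str(values[name])

    # A placeholder is an identifier wrapped in colons, e.g. :rate:
    return re.sub(r":([A-Za-z_][A-Za-z0-9_]*):", replace, command)


template = "python train.py --learning_rate :rate: --num_layers :layers:"
print(fill_placeholders(template, {"rate": 0.01, "layers": 3}))
# prints: python train.py --learning_rate 0.01 --num_layers 3
```

Each run in a search would get its own set of values, so the same template yields a different concrete command per run.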
Grid Search
In grid search, a set of discrete values is provided for each hyperparameter and a run is created for every possible combination of hyperparameter values (i.e., if there are n hyperparameters, a run is created for each n-tuple in the Cartesian product of the n hyperparameter value sets). For example:
$ spell hyper grid \
    --param rate=0.001,0.01,0.1,1 \
    --param layers=2,3,5,10 -- \
    python train.py --learning_rate :rate: --num_layers :layers:
Everything up-to-date
💫 Casting hyperparameter search #59…
rate     layers    Run ID
0.001    2         362
0.001    3         363
0.001    5         364
0.001    10        365
0.01     2         366
0.01     3         367
0.01     5         368
0.01     10        369
0.1      2         370
0.1      3         371
0.1      5         372
0.1      10        373
1        2         374
1        3         375
1        5         376
1        10        377
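The Cartesian product behind a grid search can be sketched in a few lines of Python. The helper name grid_runs is hypothetical; the sketch only illustrates how two sets of four values produce the 16 runs listed above.

```python
from itertools import product


def grid_runs(param_values):
    """Return one dict of hyperparameter values per grid-search run.

    `param_values` maps each hyperparameter name to its list of values.
    Hypothetical helper for illustration only.
    """
    names = list(param_values)
    value_sets = (param_values[name] for name in names)
    return [dict(zip(names, combo)) for combo in product(*value_sets)]


grid = grid_runs({"rate": [0.001, 0.01, 0.1, 1], "layers": [2, 3, 5, 10]})
print(len(grid))   # prints: 16
print(grid[0])     # prints: {'rate': 0.001, 'layers': 2}
print(grid[-1])    # prints: {'rate': 1, 'layers': 10}
```

Because the number of runs is the product of the sizes of all value sets, adding a third parameter with four values would raise the total from 16 to 64.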
Hyperparameters are specified with the --param NAME=VALUE[,VALUE,VALUE...] option flag. NAME corresponds to the name of the hyperparameter. One or more comma-separated VALUEs can be provided after the =, corresponding to the values for the hyperparameter. The values can be strings, integers, or floating point numbers.
The NAME provided must exist in the run command surrounded by colons. This tells Spell where to substitute specific values for the hyperparameter in the run command when making the individual runs for the hyperparameter search.
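As a rough illustration of these rules, the following Python sketch parses a NAME=VALUE[,VALUE...] specification and checks that :NAME: appears in the run command. The function names and the order in which values are coerced (integer, then float, then string) are assumptions, not a description of how Spell parses its options.

```python
def parse_value(text):
    """Coerce text to int, then float, falling back to the raw string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def parse_discrete_param(spec, command):
    """Parse 'NAME=VALUE[,VALUE...]' and verify :NAME: is in `command`.

    Hypothetical helper for illustration only.
    """
    name, sep, raw_values = spec.partition("=")
    if not sep or not name or not raw_values:
        raise ValueError(f"expected NAME=VALUE[,VALUE...], got {spec!r}")
    if f":{name}:" not in command:
        raise ValueError(f"placeholder :{name}: not found in run command")
    return name, [parse_value(value) for value in raw_values.split(",")]


command = "python train.py --learning_rate :rate: --num_layers :layers:"
print(parse_discrete_param("rate=0.001,0.01,0.1,1", command))
# prints: ('rate', [0.001, 0.01, 0.1, 1])
```

Passing a specification such as cell=gru,lstm with the command above raises a ValueError in this sketch, because the command contains no :cell: placeholder.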
Random Search
In random search, each hyperparameter is randomly sampled to determine specific values for each run. Additionally, the --num-runs option must be specified to indicate the total number of runs to create. Hyperparameters are specified with the --param option flag and the specification can consist of either:
- A set of discrete values, specified with --param NAME=VALUE[,VALUE,VALUE...], similar to grid search. In this case one of the discrete values is randomly selected for the hyperparameter value when constituting a run.
- A range specification, specified with --param NAME=MIN:MAX[:SCALING[:TYPE]]. In this case the hyperparameter value is randomly selected from the specified range when constituting a run.
  - MIN and MAX are required and correspond to the minimum and maximum value of the range of this hyperparameter.
  - SCALING is optional and can take one of 3 values (linear is the default if not specified):
    - linear: the hyperparameter range (i.e., MIN to MAX) is sampled uniformly at random to determine a hyperparameter value.
    - log: the hyperparameter range is scaled logarithmically during the sampling (i.e., the range log(MIN) to log(MAX) is sampled uniformly at random and then exponentiated to yield the hyperparameter value). This results in a higher probability density for the sampling towards the lower end of the range.
    - reverse_log: this is the opposite of the scaling described in log, resulting in a higher probability density for the sampling at the higher end of the range.
  - TYPE is optional and can take one of 2 values (float is the default if not specified):
    - float: the resultant hyperparameter value is a floating point number.
    - int: the resultant hyperparameter value is an integer. If this option is specified, the randomly sampled value is rounded to the nearest integer to yield the final hyperparameter value.
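The scaling and type options can be sketched in Python as follows. The function sample_range is hypothetical, and the reverse_log branch is an assumption: it mirrors a log-scaled sample around the midpoint of the range, which is one way to obtain a higher density at the upper end. Spell's actual formula may differ. The log and reverse_log branches require MIN to be greater than zero.

```python
import math
import random


def sample_range(low, high, scaling="linear", kind="float", rng=random):
    """Sample one value from [low, high] (illustrative sketch only)."""
    if scaling == "linear":
        value = rng.uniform(low, high)
    elif scaling in ("log", "reverse_log"):
        # Sample uniformly between log(low) and log(high), then exponentiate.
        value = math.exp(rng.uniform(math.log(low), math.log(high)))
        if scaling == "reverse_log":
            # Assumption: mirror the log-scaled sample within the range.
            value = low + high - value
    else:
        raise ValueError(f"unknown scaling: {scaling!r}")

    if kind == "int":
        return int(round(value))
    if kind == "float":
        return value
    raise ValueError(f"unknown type: {kind!r}")


rng = random.Random(42)  # seeded only to make the demo repeatable
print(sample_range(0.001, 1.0, "log", rng=rng))
print(sample_range(2, 100, "linear", "int", rng=rng))
```

With rate=.001:1.0:log, roughly 90% of the samples in this sketch fall below 0.5, which is why log scaling suits parameters such as learning rates that span several orders of magnitude.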
An example random hyperparameter search is as follows:
$ spell hyper random \
    --num-runs 10 \
    --param rate=.001:1.0:log \
    --param layers=2:100:linear:int \
    --param cell=gru,lstm,rnn -- \
    python train.py --learning_rate :rate: --num_layers :layers: --cell_type :cell:
Everything up-to-date
💫 Casting hyperparameter search #60…
rate          layers    cell    Run ID
0.535637      68        lstm    378
0.192321      21        gru     379
0.501205      34        lstm    380
0.00103308    40        gru     381
0.0976437     49        gru     382
0.0131644     36        rnn     383
0.00139867    27        lstm    384
0.0274699     3         lstm    385
0.350886      9         rnn     386
0.23146       66        lstm    387
Bayesian Search
Bayesian search uses the results of prior runs to intelligently pick new parameters to test. It will often either note that a large part of the parameter space is unexplored and pick something in that region, or observe a prior success and pick something near it. This can reduce the total number of iterations needed to find good parameters.
We treat the objective function (e.g., the accuracy of your model) as a random function. Using the previously tested parameter samples and the resulting accuracy of your model after training, we create a posterior distribution over that objective function. From that posterior we create an acquisition function, which is our best guess of the potential of a specific sample. We then choose the sample that maximizes the acquisition function. There are a number of popular types of acquisition functions; our tool uses an upper confidence bound.
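To illustrate the last step, here is a minimal Python sketch of choosing a sample with an upper confidence bound. It assumes the posterior mean and standard deviation for each candidate have already been computed. The candidate values, the kappa weight, and the function names are all hypothetical, and the sketch does not show how Spell builds its posterior.

```python
def upper_confidence_bound(mean, std, kappa=2.0):
    """Score a candidate: predicted value plus a bonus for uncertainty."""
    return mean + kappa * std


def pick_next(candidates, kappa=2.0):
    """Return the params of the candidate with the highest UCB score.

    `candidates` is a list of (params, posterior_mean, posterior_std).
    """
    best = max(candidates,
               key=lambda c: upper_confidence_bound(c[1], c[2], kappa))
    return best[0]


# Made-up posterior estimates, for illustration only.
candidates = [
    ({"rate": 0.01}, 0.90, 0.01),  # near a prior success: high mean, low std
    ({"rate": 0.5}, 0.60, 0.20),   # unexplored region: lower mean, high std
    ({"rate": 0.1}, 0.70, 0.05),
]

print(pick_next(candidates, kappa=2.0))  # prints: {'rate': 0.5}
print(pick_next(candidates, kappa=0.5))  # prints: {'rate': 0.01}
```

A larger kappa favors exploring uncertain regions, while a smaller one favors samples close to known good results. These are the two behaviors described above.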
If that's a lot to take in, no need to worry: you only need to add a couple things to a random search and we will do the rest for you.
Similar to random search, you must specify one or more parameters via the --param flag. These take the form --param NAME=MIN:MAX[:TYPE], where MIN is the lowest value that parameter is allowed to take, MAX is the highest, and TYPE is either int or float.
You must also inform the search of the name of the metric you would like to optimize via --metric. To learn more about using metrics with Spell, you can check out the docs on Metrics. In addition, you need to specify how Spell should aggregate the observed values of this metric via --metric-agg, whose accepted values include last and avg. For example, if you select the Keras metric keras/val_acc and aggregation type last, Spell will use the last validation accuracy recorded in a given run and treat that as the success of your model for those parameters. The search will attempt to maximize this value, so make sure to select a metric and aggregation type appropriately.
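The effect of the aggregation type can be sketched in Python. The sketch covers only the last and avg types mentioned above; the function name and the sample metric values are hypothetical.

```python
def aggregate_metric(values, agg):
    """Reduce the metric values recorded during one run to a single score."""
    if not values:
        raise ValueError("no metric values recorded")
    if agg == "last":
        return values[-1]
    if agg == "avg":
        return sum(values) / len(values)
    raise ValueError(f"unsupported aggregation in this sketch: {agg!r}")


# Made-up validation accuracies logged once per epoch.
val_acc = [0.5, 0.625, 0.75, 0.6875]

print(aggregate_metric(val_acc, "last"))  # prints: 0.6875
print(aggregate_metric(val_acc, "avg"))
```

In this example the two aggregation types give different scores for the same run, so the choice can change which parameters the search considers most successful.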
Lastly, in addition to --num-runs, which you also specify for a random search, bayesian search requires you to select the number of --parallel-runs. This is the maximum number of trials that Spell will run in parallel. It reflects a tradeoff: if you choose a lower number, the search will proceed incrementally and will take longer to complete. If you choose a higher number, many runs will be in progress when a new run is launched, and the new run's parameters will be selected without the benefit of knowing how well the in-progress trials do.
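A back-of-the-envelope Python sketch can make this tradeoff concrete. It assumes that a new run is launched as soon as a parallel slot is free, and it counts the minimum number of finished runs whose results are available when each run is launched. The function name is hypothetical and this is a simplified model, not Spell's scheduler.

```python
def results_known_at_launch(num_runs, parallel_runs):
    """Minimum number of completed runs when each run is launched.

    When run i launches, i runs were launched before it and at most
    parallel_runs - 1 of them can still be in progress.
    """
    if num_runs < 0 or parallel_runs < 1:
        raise ValueError("num_runs must be >= 0 and parallel_runs >= 1")
    return [max(0, i - (parallel_runs - 1)) for i in range(num_runs)]


print(results_known_at_launch(12, 3))
# prints: [0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(results_known_at_launch(12, 1))
# prints: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

With --parallel-runs 1 every run benefits from all earlier results but the runs execute one at a time. With --parallel-runs equal to --num-runs, no run is guaranteed any earlier results in this model.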
An example bayesian hyperparameter search is as follows:
$ spell hyper bayesian \
    --num-runs 12 \
    --parallel-runs 3 \
    --metric keras/val_acc \
    --metric-agg avg \
    --param rate=.001:1.0 \
    --param layers=2:100:int -- \
    python train.py --learning_rate :rate: --num_layers :layers:
Everything up-to-date
💫 Casting hyperparameter search #61…
rate        layers    Run ID
0.343882    23        388
0.294112    72        389
0.587557    64        390
You can also check out our blog post to see an example of bayesian search in action.
Viewing Your Hypersearch on the Web
You can view the results of your hyperparameter run on the web.
The web visualization updates in real time, so you can see how each run is performing as it launches.
The Spell Python API also supports creating hyperparameter searches. See Spell Python API for more information on the Spell Python API in general, and Hyperparameter Searches for the hyperparameter search functionality.