gluon.Trainer¶
class mxnet.gluon.Trainer(params, optimizer, optimizer_params=None, kvstore='device', compression_params=None, update_on_kvstore=None)[source]¶
Bases: object
Applies an Optimizer on a set of Parameters. Trainer should be used together with autograd.
Note
For the following cases, updates will always happen on kvstore, i.e., you cannot set update_on_kvstore=False.
dist kvstore with sparse weights or sparse gradients
dist async kvstore
optimizer.lr_scheduler is not None
Methods
allreduce_grads() – For each parameter, reduce the gradients from different contexts.
load_states(fname) – Loads trainer states (e.g. optimizer, momentum) from a file.
save_states(fname) – Saves trainer states (e.g. optimizer, momentum) to a file.
set_learning_rate(lr) – Sets a new learning rate of the optimizer.
step(batch_size[, ignore_stale_grad]) – Makes one step of parameter update.
update(batch_size[, ignore_stale_grad]) – Makes one step of parameter update.
- Parameters
params (ParameterDict) – The set of parameters to optimize.
optimizer (str or Optimizer) – The optimizer to use. See help on Optimizer for a list of available optimizers.
optimizer_params (dict) – Keyword arguments to be passed to the optimizer constructor. For example, {'learning_rate': 0.1}. All optimizers accept learning_rate, wd (weight decay), clip_gradient, and lr_scheduler. See each optimizer's constructor for a list of additional supported arguments.
kvstore (str or KVStore) – kvstore type for multi-GPU and distributed training. See help on mxnet.kvstore.create for more information.
compression_params (dict) – Specifies the type of gradient compression and additional arguments depending on the type of compression being used. For example, 2bit compression requires a threshold. Arguments would then be {'type': '2bit', 'threshold': 0.5}. See the mxnet.KVStore.set_gradient_compression method for more details on gradient compression.
update_on_kvstore (bool, default None) – Whether to perform parameter updates on kvstore. If None, then trainer will choose the more suitable option depending on the type of kvstore. If the update_on_kvstore argument is provided, environment variable MXNET_UPDATE_ON_KVSTORE will be ignored.
Properties
learning_rate (float) – The current learning rate of the optimizer. Given an Optimizer object optimizer, its learning rate can be accessed as optimizer.learning_rate.
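Example (illustrative, not part of the formal signature): constructing a Trainer for a toy network; the optimizer choice and hyperparameter values below are arbitrary.

    from mxnet import gluon

    net = gluon.nn.Dense(1)          # any Block's parameters can be optimized
    net.initialize()
    trainer = gluon.Trainer(net.collect_params(), 'sgd',
                            optimizer_params={'learning_rate': 0.1, 'wd': 1e-4})
    print(trainer.learning_rate)     # -> 0.1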
allreduce_grads()[source]¶
For each parameter, reduce the gradients from different contexts.
Should be called after autograd.backward(), outside of record() scope, and before trainer.update().
For normal parameter updates, step() should be used, which internally calls allreduce_grads() and then update(). However, if you need to get the reduced gradients to perform certain transformation, such as in gradient clipping, then you may want to manually call allreduce_grads() and update() separately.
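Example (illustrative): the manual allreduce_grads()/update() path with global-norm gradient clipping. It assumes a single context, a trainer created with update_on_kvstore=False, and placeholder names net, loss_fn, data, label, and batch_size.

    from mxnet import autograd, gluon

    with autograd.record():
        loss = loss_fn(net(data), label)
    loss.backward()

    trainer.allreduce_grads()        # reduce gradients across contexts
    grads = [p.grad() for p in net.collect_params().values()
             if p.grad_req != 'null']
    gluon.utils.clip_global_norm(grads, max_norm=1.0)   # transform the reduced gradients
    trainer.update(batch_size)       # then apply the update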
load_states(fname)[source]¶
Loads trainer states (e.g. optimizer, momentum) from a file.
- Parameters
fname (str) – Path to input states file.
Note
optimizer.param_dict, which contains Parameter information (such as lr_mult and wd_mult), will not be loaded from the file, but rather set based on the current Trainer's parameters.
save_states(fname)[source]¶
Saves trainer states (e.g. optimizer, momentum) to a file.
- Parameters
fname (str) – Path to output states file.
Note
optimizer.param_dict, which contains Parameter information (such as lr_mult and wd_mult), will not be saved.
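Example (illustrative): checkpointing optimizer state alongside the model's parameters; the file names and the net and trainer objects are placeholders.

    net.save_parameters('net.params')        # model weights
    trainer.save_states('trainer.states')    # optimizer state (momentum, etc.)

    # ... later, to resume training:
    net.load_parameters('net.params')
    trainer.load_states('trainer.states')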
set_learning_rate(lr)[source]¶
Sets a new learning rate of the optimizer.
- Parameters
lr (float) – The new learning rate of the optimizer.
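Example (illustrative): a hand-rolled step decay driven from the training loop; the epoch variable and decay factor are placeholders (a LRScheduler passed through optimizer_params is the alternative).

    if epoch > 0 and epoch % 10 == 0:
        trainer.set_learning_rate(trainer.learning_rate * 0.5)   # halve every 10 epochs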
step(batch_size, ignore_stale_grad=False)[source]¶
Makes one step of parameter update. Should be called after autograd.backward() and outside of record() scope.
For normal parameter updates, step() should be used, which internally calls allreduce_grads() and then update(). However, if you need to get the reduced gradients to perform certain transformation, such as in gradient clipping, then you may want to manually call allreduce_grads() and update() separately.
- Parameters
batch_size (int) – Batch size of data processed. Gradient will be normalized by 1/batch_size. Set this to 1 if you normalized loss manually with loss = mean(loss).
ignore_stale_grad (bool, optional, default=False) – If true, ignores Parameters with stale gradient (gradient that has not been updated by backward after the last step) and skips their update.
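Example (illustrative): a typical training loop using step(); train_data, net, and loss_fn are placeholder names.

    from mxnet import autograd

    for data, label in train_data:
        with autograd.record():
            loss = loss_fn(net(data), label)
        loss.backward()
        trainer.step(data.shape[0])          # gradient scaled by 1/batch_size

    # If the loss is averaged manually, pass batch_size=1 instead:
    #     loss = loss_fn(net(data), label).mean()
    #     trainer.step(1)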
update(batch_size, ignore_stale_grad=False)[source]¶
Makes one step of parameter update.
Should be called after autograd.backward(), outside of record() scope, and after trainer.allreduce_grads().
For normal parameter updates, step() should be used, which internally calls allreduce_grads() and then update(). However, if you need to get the reduced gradients to perform certain transformation, such as in gradient clipping, then you may want to manually call allreduce_grads() and update() separately.
- Parameters
batch_size (int) – Batch size of data processed. Gradient will be normalized by 1/batch_size. Set this to 1 if you normalized loss manually with loss = mean(loss).
ignore_stale_grad (bool, optional, default=False) – If true, ignores Parameters with stale gradient (gradient that has not been updated by backward after the last step) and skips their update.
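Example (illustrative): using ignore_stale_grad when only part of the network runs in a given iteration, so some parameters receive no fresh gradient. branch_a, branch_b, use_a, and the other names are placeholders, and the trainer is assumed to have update_on_kvstore=False.

    from mxnet import autograd

    with autograd.record():
        out = branch_a(data) if use_a else branch_b(data)
        loss = loss_fn(out, label)
    loss.backward()
    trainer.allreduce_grads()
    trainer.update(data.shape[0], ignore_stale_grad=True)   # skip parameters with stale gradients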