Float, Step size.
Float, Exponential decay rate for the first moment estimates.
Float, Exponential decay rate for the second moment estimates.
Float
Float
Float, L2 regularization coefficient added to all the weights.
Float, clips the gradient to the range [-clip_gradient, clip_gradient].
The learning rate scheduler
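As a sketch of the clip_gradient behavior described above (assuming element-wise clipping, which is the usual interpretation; the function name here is illustrative, not the library's actual code):

```python
import numpy as np

# Illustrative element-wise gradient clipping into [-c, c].
# A sketch of the documented behavior, not the library's implementation.
def clip_gradient(grad, c):
    return np.clip(grad, -c, c)

clipped = clip_gradient(np.array([-5.0, 0.5, 3.0]), 1.0)
# Components outside [-1, 1] are saturated; 0.5 passes through unchanged.
```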
Float, Step size.
Sets an individual learning rate multiplier for each parameter.
If you specify a learning rate multiplier for a parameter, the learning
rate for that parameter is set to the product of the global learning
rate and its multiplier.
note:: The default learning rate multiplier of a Variable
can be set with the lr_mult argument in the constructor.
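The product described above can be sketched as follows (a minimal illustration; the names and the default multiplier of 1.0 are assumptions, not the library's internals):

```python
# Hypothetical sketch of how per-parameter learning rate multipliers
# combine with the global learning rate.
global_lr = 0.001
lr_mult = {"fc1_weight": 1.0, "fc1_bias": 2.0}  # per-parameter multipliers

def effective_lr(param_name):
    # Parameters without an explicit multiplier are assumed to default to 1.0.
    return global_lr * lr_mult.get(param_name, 1.0)

print(effective_lr("fc1_bias"))    # 0.002
print(effective_lr("fc2_weight"))  # 0.001
```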
Sets an individual weight decay multiplier for each parameter.
By default, the weight decay multiplier is set as 0 for all
parameters whose names don't end with _weight or _gamma, if
you call the setIdx2Name method to set idx2name.
note:: The default weight decay multiplier for a Variable
can be set with its wd_mult argument in the constructor.
Update the parameters.
Updates num_update.
(Since version 0.10.0) Use setLrMult instead.
Adam optimizer as described in [King2014].
[King2014] Diederik P. Kingma and Jimmy Ba, "Adam: A Method for Stochastic Optimization", http://arxiv.org/abs/1412.6980
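The update rule from [King2014] can be sketched in NumPy as follows. This is a minimal illustration of the algorithm in the paper, with the L2 term (wd) folded into the gradient; it is not this library's implementation, and the function name and defaults here are assumptions.

```python
import numpy as np

# Sketch of one Adam step (Kingma & Ba, 2014): exponentially decayed
# first/second moment estimates with bias correction.
def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999,
              eps=1e-8, wd=0.0):
    grad = grad + wd * w                   # L2 regularization (weight decay)
    m = beta1 * m + (1 - beta1) * grad     # first moment estimate
    v = beta2 * v + (1 - beta2) * grad**2  # second moment estimate
    m_hat = m / (1 - beta1**t)             # bias-corrected first moment
    v_hat = v / (1 - beta2**t)             # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# On the first step with a constant gradient, the bias-corrected update
# is approximately -lr * sign(grad).
w, m, v = adam_step(np.array([0.0]), np.array([1.0]),
                    np.array([0.0]), np.array([0.0]), t=1)
```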