Decay rate for both squared gradients and delta x.
Rescaling factor applied to the gradient.
The constant epsilon as described in the paper.
L2 regularization coefficient added to all the weights.
Clips the gradient to the range [-clip_gradient, clip_gradient].
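
A minimal sketch of how these parameters might enter a single AdaDelta step, following the update rule in Zeiler's paper. The function name `adadelta_step`, the scalar state, the illustrative default values, and the way `wd` is folded into the step are assumptions made for this example, not the optimizer's actual implementation.

```python
import math


def adadelta_step(weight, grad, acc_g, acc_delta, rho=0.9, epsilon=1e-6,
                  rescale_gradient=1.0, wd=0.0, clip_gradient=None):
    """One AdaDelta step for a single scalar parameter (hypothetical helper).

    acc_g     -- running average of squared gradients
    acc_delta -- running average of squared updates ("delta x")
    Returns (new_weight, new_acc_g, new_acc_delta).
    """
    # Rescale the raw gradient, then optionally clip it to
    # [-clip_gradient, clip_gradient].
    g = grad * rescale_gradient
    if clip_gradient is not None:
        g = max(-clip_gradient, min(clip_gradient, g))

    # rho is the decay rate shared by both running averages.
    acc_g = rho * acc_g + (1.0 - rho) * g * g

    # The step size is the ratio RMS[delta x] / RMS[g]; epsilon keeps both
    # square roots away from zero.
    delta = math.sqrt(acc_delta + epsilon) / math.sqrt(acc_g + epsilon) * g
    acc_delta = rho * acc_delta + (1.0 - rho) * delta * delta

    # Assumption: L2 regularization is applied as an extra wd * weight term.
    new_weight = weight - delta - wd * weight
    return new_weight, acc_g, acc_delta


w, acc_g, acc_delta = adadelta_step(1.0, 2.0, acc_g=0.0, acc_delta=0.0)
```

No global learning rate appears in the step itself: the ratio of the two running averages sets the step size.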
Sets an individual learning rate multiplier for each parameter.
If you specify a learning rate multiplier for a parameter, then
the learning rate for the parameter will be set as the product of
the global learning rate and its multiplier.
Note: The default learning rate multiplier of a Variable
can be set with the lr_mult
argument in the constructor.
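
As a rough illustration of the product rule described above, the sketch below computes the per-parameter learning rate from a global rate and a multiplier table. The helper `effective_lr`, the parameter names, and the fallback multiplier of 1.0 for parameters without an entry are assumptions for this example, not part of the documented API.

```python
def effective_lr(global_lr, lr_mult, name):
    """Per-parameter learning rate = global rate * multiplier (hypothetical helper)."""
    # Assumption: a parameter with no entry in the table uses a multiplier of 1.0.
    return global_lr * lr_mult.get(name, 1.0)


lr_mult = {"fc1_bias": 2.0}  # hypothetical parameter name
bias_lr = effective_lr(0.1, lr_mult, "fc1_bias")      # uses the multiplier 2.0
weight_lr = effective_lr(0.1, lr_mult, "fc1_weight")  # falls back to 1.0
```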
Sets an individual weight decay multiplier for each parameter.
By default, the weight decay multiplier is set to 0 for all
parameters whose names don't end with _weight or _gamma, if
you call the setIdx2Name method to set idx2name.
Note: The default weight decay multiplier for a Variable
can be set with its wd_mult
argument in the constructor.
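
The default rule above can be sketched as follows: given an index-to-name mapping, every parameter whose name does not end with `_weight` or `_gamma` receives a weight decay multiplier of 0. The helpers `default_wd_mult` and `effective_wd`, the sample names, and the assumption that the effective weight decay is the product of the global coefficient and the multiplier (mirroring the learning rate rule) are illustrative only.

```python
def default_wd_mult(idx2name):
    """Build the default multiplier table from an index -> name mapping."""
    # Names ending with _weight or _gamma keep regular weight decay;
    # every other parameter (for example a bias) gets a multiplier of 0.
    return {
        name: 0.0
        for name in idx2name.values()
        if not name.endswith(("_weight", "_gamma"))
    }


def effective_wd(global_wd, wd_mult, name):
    """Assumed rule: effective weight decay = global wd * multiplier (1.0 if unset)."""
    return global_wd * wd_mult.get(name, 1.0)


# Hypothetical parameter names.
idx2name = {0: "fc1_weight", 1: "fc1_bias", 2: "bn1_gamma", 3: "bn1_beta"}
wd_mult = default_wd_mult(idx2name)
```

With this table, `fc1_bias` and `bn1_beta` are excluded from weight decay, while `fc1_weight` and `bn1_gamma` use the global coefficient unchanged.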
Update the parameters.
Updates num_update.
Use setLrMult instead.
AdaDelta optimizer as described in Matthew D. Zeiler, 2012. http://arxiv.org/abs/1212.5701