mx.opt.adam¶
Description¶
Create an Adam optimizer with the given hyperparameters. Adam is described in [King2014].
[King2014] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. http://arxiv.org/abs/1412.6980
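For orientation, the sketch below restates the update rule from [King2014] in plain R for a single weight. It is an approximation of what the optimizer computes, not the library's exact internals; in particular, where `wd` and gradient clipping enter the update may differ in mx.opt.adam.

# Sketch of one Adam step for a single weight (per [King2014]).
# `grad` is the raw gradient, `t` the step count, `m` and `v` the
# running moment estimates (initialized to 0).
g <- rescale.grad * grad + wd * weight        # rescaled gradient plus L2 term
if (clip_gradient >= 0)
  g <- pmax(pmin(g, clip_gradient), -clip_gradient)
m <- beta1 * m + (1 - beta1) * g              # first-moment (mean) estimate
v <- beta2 * v + (1 - beta2) * g^2            # second-moment (variance) estimate
m.hat <- m / (1 - beta1^t)                    # bias correction
v.hat <- v / (1 - beta2^t)
weight <- weight - learning.rate * m.hat / (sqrt(v.hat) + epsilon)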
Usage¶
mx.opt.adam(
  learning.rate = 0.001,
  beta1 = 0.9,
  beta2 = 0.999,
  epsilon = 1e-08,
  wd = 0,
  rescale.grad = 1,
  clip_gradient = -1,
  lr_scheduler = NULL
)
Arguments¶
| Argument | Description |
|---|---|
| `learning.rate` | float, default=0.001. The initial learning rate. |
| `beta1` | float, default=0.9. Exponential decay rate for the first moment estimates. |
| `beta2` | float, default=0.999. Exponential decay rate for the second moment estimates. |
| `epsilon` | float, default=1e-08. Small constant added to the denominator to avoid division by zero. |
| `wd` | float, default=0. L2 regularization coefficient added to all the weights. |
| `rescale.grad` | float, default=1. Rescaling factor applied to the gradient. |
| `clip_gradient` | float, optional, default=-1 (no clipping if < 0). Clip the gradient to the range [-clip_gradient, clip_gradient]. |
| `lr_scheduler` | function, optional. The learning rate scheduler. |
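Example¶
A minimal sketch of using this optimizer for training. The network `net` and the data `train.x`, `train.y` are illustrative placeholders, and the sketch assumes mx.model.FeedForward.create forwards extra arguments to the named optimizer.

library(mxnet)

# Create the optimizer directly ...
optimizer <- mx.opt.adam(learning.rate = 0.001, beta1 = 0.9, beta2 = 0.999)

# ... or by name through mx.opt.create, which dispatches to mx.opt.adam.
optimizer <- mx.opt.create("adam", learning.rate = 0.001)

# Pass the optimizer name (with its hyperparameters) when training a model;
# `net`, `train.x`, and `train.y` are placeholders.
model <- mx.model.FeedForward.create(
  net, X = train.x, y = train.y,
  ctx = mx.cpu(), num.round = 10,
  optimizer = "adam", learning.rate = 0.001, beta1 = 0.9, beta2 = 0.999
)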
