org.apache.clojure-mxnet.optimizer

ada-delta

(ada-delta {:keys [rho rescale-gradient epsilon wd clip-gradient], :as opts, :or {rho 0.05, rescale-gradient 1.0, epsilon 1.0E-8, wd 0.0, clip-gradient 0}})
(ada-delta)
AdaDelta optimizer as described in Matthew D. Zeiler, 2012.
http://arxiv.org/abs/1212.5701
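A minimal construction sketch (the hyperparameter values below are illustrative, not recommendations):

(require '[org.apache.clojure-mxnet.optimizer :as optimizer])

;; AdaDelta with the defaults listed above
(optimizer/ada-delta)
;; AdaDelta with an explicit decay rate and epsilon
(optimizer/ada-delta {:rho 0.05 :epsilon 1e-8})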

ada-grad

(ada-grad {:keys [learning-rate rescale-gradient epsilon wd], :or {learning-rate 0.05, rescale-gradient 1.0, epsilon 1.0E-7, wd 0.0}})
(ada-grad)
AdaGrad optimizer as described in Duchi, Hazan and Singer, 2011.
http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf

- learning-rate Step size.
- epsilon A small number to make the update numerically stable.
             Default value is 1e-7.
- rescale-gradient Rescaling factor of the gradient.
- wd L2 regularization coefficient added to all the weights.
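For example, a sketch of constructing AdaGrad with a non-default step size (values illustrative only):

(require '[org.apache.clojure-mxnet.optimizer :as optimizer])

;; AdaGrad with a larger step size and the documented epsilon default
(optimizer/ada-grad {:learning-rate 0.1 :epsilon 1e-7})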

adam

(adam {:keys [learning-rate beta1 beta2 epsilon decay-factor wd clip-gradient lr-scheduler], :or {learning-rate 0.002, beta1 0.9, beta2 0.999, epsilon 1.0E-8, decay-factor (- 1 1.0E-8), wd 0, clip-gradient 0}})
(adam)
Adam optimizer as described in [King2014]

[King2014] Diederik Kingma, Jimmy Ba,
Adam: A Method for Stochastic Optimization,
http://arxiv.org/abs/1412.6980

 - learning-rate Step size.
 - beta1 Exponential decay rate for the first moment estimates.
 - beta2 Exponential decay rate for the second moment estimates.
 - epsilon A small number to make the update numerically stable.
 - decay-factor
 - wd L2 regularization coefficient added to all the weights.
 - clip-gradient Clip the gradient to the range [-clip-gradient, clip-gradient].
 - lr-scheduler The learning rate scheduler.
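A construction sketch using the keys listed above; the values are illustrative only:

(require '[org.apache.clojure-mxnet.optimizer :as optimizer])

;; Adam with explicit moment-decay rates and gradient clipping
(optimizer/adam {:learning-rate 0.002
                 :beta1 0.9
                 :beta2 0.999
                 :clip-gradient 5.0})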

create-state

(create-state optimizer index weight)
Create additional optimizer state such as momentum.
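A sketch of creating state for a single parameter, assuming org.apache.clojure-mxnet.ndarray provides ones for building a test NDArray:

(require '[org.apache.clojure-mxnet.optimizer :as optimizer]
         '[org.apache.clojure-mxnet.ndarray :as ndarray])

(let [opt    (optimizer/sgd {:momentum 0.9})
      weight (ndarray/ones [2 2])]
  ;; momentum state for the parameter registered under index 0
  (optimizer/create-state opt 0 weight))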

dcasgd

(dcasgd {:keys [learning-rate momentum lambda wd clip-gradient lr-scheduler], :as opts, :or {learning-rate 0.01, momentum 0.0, lambda 0.04, wd 0.0, clip-gradient 0}})
(dcasgd)
DCASGD optimizer with momentum and weight regularization.
An implementation of the paper 'Asynchronous Stochastic Gradient Descent with
Delay Compensation for Distributed Deep Learning'.
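A construction sketch; lambda here is the delay-compensation coefficient from the paper, and the values are illustrative only:

(require '[org.apache.clojure-mxnet.optimizer :as optimizer])

;; DCASGD with momentum and the delay-compensation coefficient lambda
(optimizer/dcasgd {:learning-rate 0.01 :momentum 0.9 :lambda 0.04})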

nag

(nag {:keys [learning-rate momentum wd clip-gradient lr-scheduler], :as opts, :or {learning-rate 0.01, momentum 0.0, wd 1.0E-4, clip-gradient 0}})
(nag)
SGD with Nesterov momentum.
It is implemented according to
https://github.com/torch/optim/blob/master/sgd.lua
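A construction sketch with a Nesterov momentum coefficient (values illustrative only):

(require '[org.apache.clojure-mxnet.optimizer :as optimizer])

;; Nesterov accelerated SGD with momentum and weight decay
(optimizer/nag {:learning-rate 0.01 :momentum 0.9 :wd 1e-4})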

rms-prop

(rms-prop {:keys [learning-rate rescale-gradient gamma1 gamma2 wd lr-scheduler clip-gradient], :or {learning-rate 0.002, rescale-gradient 1.0, gamma1 0.95, gamma2 0.9, wd 0.0, clip-gradient 0}})
(rms-prop)
RMSProp optimizer as described in Tieleman & Hinton, 2012, following
Eq. (38) - Eq. (45) of Alex Graves, 2013 (http://arxiv.org/pdf/1308.0850v5.pdf).
- learning-rate Step size.
- gamma1 Decay factor of the moving averages of the gradient and the squared gradient.
- gamma2 Momentum factor of the moving average of the gradient.
- rescale-gradient Rescaling factor of the gradient.
- wd L2 regularization coefficient added to all the weights.
- clip-gradient Clip the gradient to the range [-clip-gradient, clip-gradient].
- lr-scheduler The learning rate scheduler.
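A construction sketch using the gamma factors described above (values illustrative only):

(require '[org.apache.clojure-mxnet.optimizer :as optimizer])

;; RMSProp with the Graves-style decay and momentum factors
(optimizer/rms-prop {:learning-rate 0.002 :gamma1 0.95 :gamma2 0.9})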

sgd

(sgd {:keys [learning-rate momentum wd clip-gradient lr-scheduler], :as opts, :or {learning-rate 0.01, momentum 0.0, wd 1.0E-4, clip-gradient 0}})
(sgd)
A very simple SGD optimizer with momentum and weight regularization.
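A construction sketch; momentum defaults to 0.0, which gives plain SGD (values illustrative only):

(require '[org.apache.clojure-mxnet.optimizer :as optimizer])

;; SGD with momentum and weight decay
(optimizer/sgd {:learning-rate 0.01 :momentum 0.9 :wd 1e-4})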

sgld

(sgld {:keys [learning-rate rescale-gradient wd clip-gradient lr-scheduler], :or {learning-rate 0.01, rescale-gradient 1, wd 1.0E-4, clip-gradient 0}})
(sgld)
Stochastic Langevin Dynamics updater, used to sample from a distribution.

- learning-rate Step size.
- rescale-gradient Rescaling factor of the gradient.
- wd L2 regularization coefficient added to all the weights.
- clip-gradient Float. Clip the gradient to the range [-clip-gradient, clip-gradient].
- lr-scheduler The learning rate scheduler.
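A construction sketch; SGLD takes only the options listed above (values illustrative only):

(require '[org.apache.clojure-mxnet.optimizer :as optimizer])

;; SGLD with a small step size and weight decay
(optimizer/sgld {:learning-rate 0.01 :wd 1e-4})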

update

(update optimizer index weight grad state)
Update the parameters.
- optimizer The optimizer.
- index A unique integer key used to index the parameters.
- weight The weight NDArray.
- grad The gradient NDArray.
- state NDArray or other object returned by create-state;
        the auxiliary state used in optimization.
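A minimal sketch of one manual update step, assuming org.apache.clojure-mxnet.ndarray provides ones for building test arrays; in a normal training loop the higher-level APIs drive create-state and update for you:

(require '[org.apache.clojure-mxnet.optimizer :as optimizer]
         '[org.apache.clojure-mxnet.ndarray :as ndarray])

(let [opt    (optimizer/sgd {:learning-rate 0.1 :momentum 0.9})
      weight (ndarray/ones [2 2])   ;; parameter, updated in place
      grad   (ndarray/ones [2 2])   ;; gradient for this parameter
      state  (optimizer/create-state opt 0 weight)]
  ;; one optimization step for the parameter registered under index 0
  (optimizer/update opt 0 weight grad state)
  weight)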