org.apache.clojure-mxnet.optimizer
ada-delta
(ada-delta {:keys [rho rescale-gradient epsilon wd clip-gradient], :as opts, :or {rho 0.05, rescale-gradient 1.0, epsilon 1.0E-8, wd 0.0, clip-gradient 0}})
(ada-delta)
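A minimal construction sketch (options and defaults follow the signature above; the require alias and var name are just for illustration):

(require '[org.apache.clojure-mxnet.optimizer :as optimizer])

;; AdaDelta (Zeiler, 2012) with the default rho and epsilon, adding light weight decay
(def ada-delta-opt (optimizer/ada-delta {:wd 1e-4}))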
ada-grad
(ada-grad {:keys [learning-rate rescale-gradient epsilon wd], :or {learning-rate 0.05, rescale-gradient 1.0, epsilon 1.0E-7, wd 0.0}})
(ada-grad)
AdaGrad optimizer as described in Duchi, Hazan and Singer, 2011.
http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
- learning-rate Step size.
- epsilon A small number that keeps the update numerically stable. Defaults to 1e-7.
- rescale-gradient Rescaling factor applied to the gradient.
- wd L2 regularization coefficient added to all the weights.
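For example, a construction sketch that overrides the default step size (option keys follow the signature above; names are illustrative):

(require '[org.apache.clojure-mxnet.optimizer :as optimizer])

;; AdaGrad with a smaller step size and light L2 regularization
(def ada-grad-opt (optimizer/ada-grad {:learning-rate 0.01 :wd 1e-5}))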
adam
(adam {:keys [learning-rate beta1 beta2 epsilon decay-factor wd clip-gradient lr-scheduler], :or {learning-rate 0.002, beta1 0.9, beta2 0.999, epsilon 1.0E-8, decay-factor (- 1 1.0E-8), wd 0, clip-gradient 0}})
(adam)
Adam optimizer as described in [Kingma & Ba, 2014].
[Kingma & Ba, 2014] Diederik Kingma, Jimmy Ba,
Adam: A Method for Stochastic Optimization,
http://arxiv.org/abs/1412.6980
- learning-rate Step size.
- beta1 Exponential decay rate for the first moment estimates.
- beta2 Exponential decay rate for the second moment estimates.
- epsilon A small number that keeps the update numerically stable.
- decay-factor
- wd L2 regularization coefficient added to all the weights.
- clip-gradient Clip the gradient to the range [-clip-gradient, clip-gradient].
- lr-scheduler The learning rate scheduler
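A sketch of constructing Adam and handing it to module training. The option map follows the signature above; the module require, the fit-params call, and the my-module/train-iter/eval-iter names are assumptions for illustration and may differ from your setup.

(require '[org.apache.clojure-mxnet.optimizer :as optimizer]
         '[org.apache.clojure-mxnet.module :as m])

;; Adam with the usual beta defaults and a common step size
(def adam-opt (optimizer/adam {:learning-rate 0.001 :wd 1e-4}))

;; Assumed usage: hand the optimizer to a module via fit-params
;; (form is reader-discarded; my-module, train-iter and eval-iter are placeholders).
#_(m/fit my-module {:train-data train-iter
                    :eval-data eval-iter
                    :num-epoch 10
                    :fit-params (m/fit-params {:optimizer adam-opt})})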
create-state
(create-state optimizer index weight)
Create additional optimizer state such as momentum.
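For instance, the auxiliary state for one parameter of an SGD optimizer with momentum can be created like this (a sketch; ndarray/zeros building an NDArray from a shape vector is an assumption about the ndarray namespace):

(require '[org.apache.clojure-mxnet.optimizer :as optimizer]
         '[org.apache.clojure-mxnet.ndarray :as ndarray])

(let [opt    (optimizer/sgd {:learning-rate 0.01 :momentum 0.9})
      weight (ndarray/zeros [10 10])]
  ;; returns the auxiliary state (here, a momentum buffer) for parameter index 0
  (optimizer/create-state opt 0 weight))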
dcasgd
(dcasgd {:keys [learning-rate momentum lambda wd clip-gradient lr-scheduler], :as opts, :or {learning-rate 0.01, momentum 0.0, lambda 0.04, wd 0.0, clip-gradient 0}})
(dcasgd)
DCASGD optimizer with momentum and weight regularization.
Implementation of the paper 'Asynchronous Stochastic Gradient Descent with
Delay Compensation for Distributed Deep Learning'.
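A construction sketch (option keys follow the signature above; lambda controls the strength of the delay compensation):

(require '[org.apache.clojure-mxnet.optimizer :as optimizer])

;; DCASGD with momentum and an explicit delay-compensation coefficient
(def dcasgd-opt (optimizer/dcasgd {:learning-rate 0.01 :momentum 0.9 :lambda 0.04}))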
nag
(nag {:keys [learning-rate momentum wd clip-gradient lr-scheduler], :as opts, :or {learning-rate 0.01, momentum 0.0, wd 1.0E-4, clip-gradient 0}})
(nag)
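nag is SGD with Nesterov accelerated gradient momentum; its options mirror sgd. A construction sketch (names are illustrative):

(require '[org.apache.clojure-mxnet.optimizer :as optimizer])

;; Nesterov-accelerated SGD
(def nag-opt (optimizer/nag {:learning-rate 0.01 :momentum 0.9 :wd 1e-4}))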
rms-prop
(rms-prop {:keys [learning-rate rescale-gradient gamma1 gamma2 wd lr-scheduler clip-gradient], :or {learning-rate 0.002, rescale-gradient 1.0, gamma1 0.95, gamma2 0.9, wd 0.0, clip-gradient 0}})
(rms-prop)
RMSProp optimizer as described in Tieleman & Hinton, 2012,
and in Eq. (38)-(45) of Alex Graves, 2013 (http://arxiv.org/pdf/1308.0850v5.pdf).
- learning-rate Step size.
- gamma1 Decay factor of the moving averages of the gradient and the squared gradient (gradient^2).
- gamma2 Momentum factor of the moving average.
- rescale-gradient Rescaling factor applied to the gradient.
- wd L2 regularization coefficient added to all the weights.
- clip-gradient Clip the gradient to the range [-clip-gradient, clip-gradient].
- lr-scheduler The learning rate scheduler
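A construction sketch that overrides both decay factors (option keys follow the signature above; names are illustrative):

(require '[org.apache.clojure-mxnet.optimizer :as optimizer])

;; RMSProp with a shorter moving-average memory and less momentum than the defaults
(def rms-prop-opt (optimizer/rms-prop {:learning-rate 0.001 :gamma1 0.9 :gamma2 0.8}))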
sgd
(sgd {:keys [learning-rate momentum wd clip-gradient lr-scheduler], :as opts, :or {learning-rate 0.01, momentum 0.0, wd 1.0E-4, clip-gradient 0}})
(sgd)
A very simple SGD optimizer with momentum and weight regularization.
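A construction sketch with the common momentum setting (leaving clip-gradient at its default of 0 applies no clipping):

(require '[org.apache.clojure-mxnet.optimizer :as optimizer])

;; Plain SGD with momentum and the default weight decay
(def sgd-opt (optimizer/sgd {:learning-rate 0.01 :momentum 0.9}))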
sgld
(sgld {:keys [learning-rate rescale-gradient wd clip-gradient lr-scheduler], :or {learning-rate 0.01, rescale-gradient 1, wd 1.0E-4, clip-gradient 0}})
(sgld)
Stochastic Gradient Langevin Dynamics (SGLD) updater, used to sample from a distribution.
- learning-rate Step size.
- rescale-gradient Rescaling factor applied to the gradient.
- wd L2 regularization coefficient added to all the weights.
- clip-gradient Float. Clip the gradient to the range [-clip-gradient, clip-gradient].
- lr-scheduler The learning rate scheduler
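A construction sketch; since SGLD injects noise in order to sample rather than to converge to a point, a small step size is typical:

(require '[org.apache.clojure-mxnet.optimizer :as optimizer])

;; SGLD for posterior sampling
(def sgld-opt (optimizer/sgld {:learning-rate 0.001 :wd 1e-4}))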
update
(update optimizer index weight grad state)
Update the parameters.
- optimizer The optimizer.
- index A unique integer key used to index the parameters.
- weight The weight NDArray.
- grad The gradient NDArray.
- state The auxiliary state used in optimization; an NDArray or other object returned by create-state.
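Putting create-state and update together, one hand-rolled optimization step might look like the sketch below. ndarray/ones and ndarray/->vec are assumptions about the ndarray namespace, and, as in the underlying Scala optimizers, the weight NDArray is assumed to be updated in place.

(require '[org.apache.clojure-mxnet.optimizer :as optimizer]
         '[org.apache.clojure-mxnet.ndarray :as ndarray])

(let [opt    (optimizer/sgd {:learning-rate 0.1 :momentum 0.9})
      weight (ndarray/ones [2 2])                     ;; parameter being optimized
      grad   (ndarray/ones [2 2])                     ;; gradient w.r.t. the parameter
      state  (optimizer/create-state opt 0 weight)]   ;; momentum buffer for index 0
  (optimizer/update opt 0 weight grad state)          ;; apply one SGD-with-momentum step
  (ndarray/->vec weight))                             ;; inspect the updated values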