AdaGrad optimizer as described in Duchi, Hazan & Singer, 2011, "Adaptive Subgradient Methods for Online Learning and Stochastic Optimization". http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
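A minimal NumPy sketch of the AdaGrad update rule; the function and parameter names (lr, eps, wd) are illustrative assumptions, not a specific library's API:

    import numpy as np

    def adagrad_update(weight, grad, history, lr=0.01, eps=1e-7, wd=0.0):
        # Per-coordinate step sizes shrink as squared gradients accumulate.
        grad = grad + wd * weight                      # optional L2 weight regularization
        history = history + grad * grad                # accumulated squared gradients
        weight = weight - lr * grad / (np.sqrt(history) + eps)
        return weight, history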
Adam optimizer as described in [King2014]
[King2014] Diederik Kingma and Jimmy Ba, 2014. Adam: A Method for Stochastic Optimization. http://arxiv.org/abs/1412.6980
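The update in [King2014] keeps bias-corrected running estimates of the first and second gradient moments. A small NumPy sketch; names and defaults are illustrative, not a specific library's API:

    import numpy as np

    def adam_update(weight, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        m = beta1 * m + (1 - beta1) * grad             # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * grad * grad      # second-moment estimate
        m_hat = m / (1 - beta1 ** t)                   # bias correction; t starts at 1
        v_hat = v / (1 - beta2 ** t)
        weight = weight - lr * m_hat / (np.sqrt(v_hat) + eps)
        return weight, m, v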
DCASGD optimizer with momentum and weight regularization, implementing the paper "Asynchronous Stochastic Gradient Descent with Delay Compensation for Distributed Deep Learning". See the sketch below.
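A rough sketch of the delay-compensation idea from that paper, assuming the stale gradient is corrected with a first-order term before the momentum step; the names (lamda, weight_previous) are illustrative assumptions:

    def dcasgd_update(weight, grad, weight_previous, mom,
                      lr=0.1, momentum=0.9, wd=0.0, lamda=0.04):
        # Compensate the (possibly stale) gradient for the drift between the
        # weights it was computed on (weight_previous) and the current weights.
        grad = grad + wd * weight
        grad_dc = grad + lamda * grad * grad * (weight - weight_previous)
        mom = momentum * mom - lr * grad_dc
        weight_previous = weight.copy()                # snapshot for the next compensation
        weight = weight + mom
        return weight, weight_previous, mom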
SGD with Nesterov momentum, implemented following https://github.com/torch/optim/blob/master/sgd.lua
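In the torch/optim formulation, the velocity buffer is updated first and the applied gradient adds a momentum-scaled lookahead. A short sketch with illustrative names:

    def sgd_nesterov_update(weight, grad, buf, lr=0.01, momentum=0.9, wd=0.0):
        grad = grad + wd * weight                      # weight decay folded into the gradient
        buf = momentum * buf + grad                    # velocity update
        step = grad + momentum * buf                   # Nesterov lookahead
        weight = weight - lr * step
        return weight, buf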
RMSProp optimizer as described in Tieleman & Hinton, 2012, and in Eq. (38)-(45) of Alex Graves, 2013. http://arxiv.org/pdf/1308.0850v5.pdf
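A sketch of both forms, assuming NumPy arrays for the state: the first follows the plain Tieleman & Hinton rule, the second follows the spirit of Graves' Eq. (38)-(45); names and defaults are illustrative:

    import numpy as np

    def rmsprop_update(weight, grad, sq_avg, lr=0.001, rho=0.9, eps=1e-8):
        # Scale the gradient by a running RMS of recent gradients.
        sq_avg = rho * sq_avg + (1 - rho) * grad * grad
        weight = weight - lr * grad / (np.sqrt(sq_avg) + eps)
        return weight, sq_avg

    def rmsprop_centered_update(weight, grad, sq_avg, g_avg, delta,
                                lr=0.0001, rho=0.95, momentum=0.9, eps=1e-4):
        # Centered variant: normalize by an estimate of the gradient variance
        # and keep a momentum term on the applied update.
        sq_avg = rho * sq_avg + (1 - rho) * grad * grad
        g_avg = rho * g_avg + (1 - rho) * grad
        delta = momentum * delta - lr * grad / np.sqrt(sq_avg - g_avg * g_avg + eps)
        weight = weight + delta
        return weight, sq_avg, g_avg, delta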
A very simple SGD optimizer with momentum and weight regularization.
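For reference, one momentum SGD step with L2 weight regularization folded into the gradient (a sketch; names are illustrative):

    def sgd_momentum_update(weight, grad, mom, lr=0.01, momentum=0.9, wd=0.0):
        grad = grad + wd * weight
        mom = momentum * mom - lr * grad
        weight = weight + mom
        return weight, mom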
Stochastic Gradient Langevin Dynamics (SGLD) updater for sampling from a distribution.
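The SGLD step is a half-step of SGD plus Gaussian noise whose variance equals the learning rate, so the iterates wander around the target distribution instead of converging to a point. A minimal NumPy sketch (names are illustrative):

    import numpy as np

    def sgld_update(weight, grad, lr=0.01, wd=0.0, rng=np.random):
        grad = grad + wd * weight
        noise = rng.normal(0.0, np.sqrt(lr), size=weight.shape)
        return weight - (lr / 2.0) * grad + noise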
AdaDelta optimizer as described in Matthew D. Zeiler, 2012. http://arxiv.org/abs/1212.5701
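AdaDelta needs no global learning rate: the step is the ratio of the RMS of past updates to the RMS of past gradients. A minimal NumPy sketch of Zeiler's rule (names are illustrative):

    import numpy as np

    def adadelta_update(weight, grad, acc_grad, acc_delta, rho=0.9, eps=1e-5):
        acc_grad = rho * acc_grad + (1 - rho) * grad * grad          # E[g^2]
        delta = -np.sqrt(acc_delta + eps) / np.sqrt(acc_grad + eps) * grad
        acc_delta = rho * acc_delta + (1 - rho) * delta * delta      # E[delta^2]
        return weight + delta, acc_grad, acc_delta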