Contrib Optimization API¶
Overview¶
This document summarizes the contrib APIs used to initialize and update the model weights during training.
mxnet.optimizer.contrib | Contrib optimizers.
The Contrib Optimization API, defined in the optimizer.contrib package, provides many useful experimental APIs for new features. This is a place for the community to try out the new features, so that feature contributors can receive feedback.
Warning
This package contains experimental APIs and may change in the near future.
In the rest of this document, we list routines provided by the optimizer.contrib package.
Contrib¶
GroupAdaGrad | Adagrad optimizer with row-wise learning rates.
API Reference¶
Contrib optimizers.
class mxnet.optimizer.contrib.GroupAdaGrad(eps=1e-05, **kwargs)[source]¶
Adagrad optimizer with row-wise learning rates.
This class implements the AdaGrad optimizer described in Adaptive Subgradient Methods for Online Learning and Stochastic Optimization (http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf), but uses only a single learning rate for every row of the parameter array.
This optimizer updates each weight by:
grad = clip(grad * rescale_grad, clip_gradient)
history += mean(square(grad), axis=1, keepdims=True)
div = grad / sqrt(history + float_stable_eps)
weight -= div * lr
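The NumPy sketch below mirrors this update rule for a dense gradient; group_adagrad_step is a hypothetical helper written only for illustration and is not part of the package.

import numpy as np

def group_adagrad_step(weight, grad, history, lr, eps=1e-5,
                       rescale_grad=1.0, clip_gradient=None):
    # Rescale and optionally clip the incoming gradient.
    grad = grad * rescale_grad
    if clip_gradient is not None:
        grad = np.clip(grad, -clip_gradient, clip_gradient)
    # Accumulate one scalar per row: the mean of the squared gradient entries.
    history += np.mean(np.square(grad), axis=1, keepdims=True)
    # Each row of the weight is scaled by its own adaptive learning rate.
    weight -= lr * grad / np.sqrt(history + eps)
    return weight, history

Note that history holds one value per row (shape (num_rows, 1)), which is what distinguishes this update from standard AdaGrad's per-element accumulator.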
Weights are updated lazily if the gradient is sparse. For details of the update algorithm see group_adagrad_update.
This optimizer accepts the following parameters in addition to those accepted by Optimizer. Weight decay is not supported.
Parameters:
eps (float, optional) – Initial value of the history accumulator. Avoids division by 0.
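A minimal usage sketch, assuming MXNet 1.x with Gluon: an embedding layer is a typical target for this optimizer because its parameter is a 2-D array whose rows are updated from (possibly sparse) gradients. The layer sizes, learning rate, and toy loss below are illustrative only.

import mxnet as mx
from mxnet import autograd, gluon, nd
from mxnet.optimizer.contrib import GroupAdaGrad

# Toy embedding table: 100 rows, 16 columns.
embedding = gluon.nn.Embedding(input_dim=100, output_dim=16)
embedding.initialize()

# Pass a GroupAdaGrad instance directly to the Gluon Trainer.
trainer = gluon.Trainer(embedding.collect_params(),
                        GroupAdaGrad(learning_rate=0.1, eps=1e-5))

tokens = nd.array([1, 5, 42])
with autograd.record():
    loss = embedding(tokens).sum()
loss.backward()
trainer.step(batch_size=tokens.shape[0])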