Contrib Optimization API¶
This document summarizes the contrib APIs used to initialize and update the model weights during training.
The Contrib Optimization API, defined in the
optimizer.contrib package, provides
experimental APIs for new features.
This is a place for the community to try out the new features,
so that feature contributors can receive feedback.
This package contains experimental APIs and may change in the near future.
In the rest of this document, we list routines provided by the
Adagrad optimizer with row-wise learning rates.
This class implements the AdaGrad optimizer described in Adaptive Subgradient Methods for Online Learning and Stochastic Optimization (available at http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf), but uses a single learning rate for every row of the parameter array.
This optimizer updates each weight by:
grad = clip(grad * rescale_grad, clip_gradient)
history += mean(square(grad), axis=1, keepdims=True)
div = grad / sqrt(history + float_stable_eps)
weight -= div * lr
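The update rule above can be sketched in NumPy. This is a minimal illustration that mirrors the pseudocode, not the library's implementation; the function name and the simplified gradient-clipping handling are assumptions for the sake of the example.

```python
import numpy as np

def group_adagrad_update(weight, grad, history, lr,
                         rescale_grad=1.0, clip_gradient=None,
                         eps=1e-5):
    """One row-wise AdaGrad step, following the pseudocode above.

    `history` holds one accumulator per row of `weight`
    (shape (num_rows, 1)) and is updated in place, as is `weight`.
    """
    grad = grad * rescale_grad
    if clip_gradient is not None:
        grad = np.clip(grad, -clip_gradient, clip_gradient)
    # Row-wise accumulator: mean of squared gradients over each row,
    # so every element in a row shares one learning-rate scale.
    history += np.mean(np.square(grad), axis=1, keepdims=True)
    div = grad / np.sqrt(history + eps)
    weight -= lr * div
    return weight

# Example: a 2x3 parameter array with one zero-gradient row.
w = np.ones((2, 3))
g = np.array([[0.1, 0.2, 0.3],
              [0.0, 0.0, 0.0]])
h = np.zeros((2, 1))
group_adagrad_update(w, g, h, lr=0.1)
```

Note that the row with a zero gradient is left untouched, which is the behavior the lazy sparse update exploits: rows whose gradients are absent from a sparse gradient need no work at all.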
Weights are updated lazily if the gradient is sparse.
For details of the update algorithm, see the update rule above.
This optimizer accepts the following parameters in addition to those accepted by
Optimizer. Weight decay is not supported.
Parameters: eps (float, optional) – Initial value of the history accumulator. Avoids division by 0.