gluon.Parameter

class mxnet.gluon.Parameter(name, grad_req='write', shape=None, dtype=<class 'numpy.float32'>, lr_mult=1.0, wd_mult=1.0, init=None, allow_deferred_init=False, differentiable=True, stype='default', grad_stype='default')

Bases: object
A container holding parameters (weights) of Blocks.

Parameter holds a copy of the parameter on each Context after it is initialized with Parameter.initialize(...). If grad_req is not 'null', it will also hold a gradient array on each Context:

    import mxnet as mx

    ctx = mx.gpu(0)
    x = mx.nd.zeros((16, 100), ctx=ctx)
    w = mx.gluon.Parameter('fc_weight', shape=(64, 100), init=mx.init.Xavier())
    b = mx.gluon.Parameter('fc_bias', shape=(64,), init=mx.init.Zero())
    w.initialize(ctx=ctx)
    b.initialize(ctx=ctx)
    out = mx.nd.FullyConnected(x, w.data(ctx), b.data(ctx), num_hidden=64)
Methods

cast(dtype) – Cast data and gradient of this Parameter to a new data type.
data([ctx]) – Returns a copy of this parameter on one context.
grad([ctx]) – Returns a gradient buffer for this parameter on one context.
initialize([init, ctx, default_init, …]) – Initializes parameter and gradient arrays.
list_ctx() – Returns a list of contexts this parameter is initialized on.
list_data() – Returns copies of this parameter on all contexts, in the same order as creation.
list_grad() – Returns gradient buffers on all contexts, in the same order as values().
list_row_sparse_data(row_id) – Returns copies of the 'row_sparse' parameter on all contexts, in the same order as creation.
reset_ctx(ctx) – Re-assign Parameter to other contexts.
row_sparse_data(row_id) – Returns a copy of the 'row_sparse' parameter on the same context as row_id's.
set_data(data) – Sets this parameter's value on all contexts.
var() – Returns a symbol representing this parameter.
zero_grad() – Sets gradient buffer on all contexts to 0.

Attributes

dtype – The type of the parameter.
shape – The shape of the parameter.
Parameters

name (str) – Name of this parameter.
grad_req ({'write', 'add', 'null'}, default 'write') – Specifies how to update gradient to grad arrays.
    'write' means the gradient is written to the grad NDArray every time.
    'add' means the gradient is added to the grad NDArray every time. You need to manually call zero_grad() to clear the gradient buffer before each iteration when using this option.
    'null' means gradient is not requested for this parameter. Gradient arrays will not be allocated.
shape (int or tuple of int, default None) – Shape of this parameter. By default shape is not specified. A Parameter with unknown shape can be used with the Symbol API, but init will throw an error when using the NDArray API.
dtype (numpy.dtype or str, default 'float32') – Data type of this parameter. For example, numpy.float32 or 'float32'.
lr_mult (float, default 1.0) – Learning rate multiplier. The learning rate is multiplied by lr_mult when the optimizer updates this parameter.
wd_mult (float, default 1.0) – Weight decay multiplier (L2 regularizer coefficient). Works similarly to lr_mult.
init (Initializer, default None) – Initializer of this parameter. Will use the global initializer by default.
stype ({'default', 'row_sparse', 'csr'}, default 'default') – The storage type of the parameter.
grad_stype ({'default', 'row_sparse', 'csr'}, default 'default') – The storage type of the parameter's gradient.
grad_req

This can be set before or after initialization. Setting grad_req to 'null' with x.grad_req = 'null' saves memory and computation when you don't need the gradient w.r.t. x.

Type: {'write', 'add', 'null'}
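The three grad_req modes can be pictured with a small pure-Python model. This is only a sketch of the semantics described for 'write', 'add' and 'null', not MXNet code: the class GradBuffer and its methods are hypothetical names chosen for illustration, and plain lists stand in for NDArrays.

```python
class GradBuffer:
    """Hypothetical stand-in for a parameter's gradient array (not an MXNet class)."""

    def __init__(self, size, grad_req="write"):
        if grad_req not in ("write", "add", "null"):
            raise ValueError("grad_req must be 'write', 'add' or 'null'")
        self.grad_req = grad_req
        # With 'null', no gradient storage is allocated at all.
        self.grad = None if grad_req == "null" else [0.0] * size

    def backward(self, new_grad):
        """Store a freshly computed gradient according to grad_req."""
        if self.grad_req == "write":
            # Overwrite whatever was in the buffer.
            self.grad = list(new_grad)
        elif self.grad_req == "add":
            # Accumulate on top of the existing contents.
            self.grad = [g + n for g, n in zip(self.grad, new_grad)]
        # 'null': gradient is not requested, so nothing is stored.

    def zero_grad(self):
        """Reset the buffer to zeros; needed between iterations with 'add'."""
        if self.grad is not None:
            self.grad = [0.0] * len(self.grad)


write_buf = GradBuffer(2, "write")
add_buf = GradBuffer(2, "add")
null_buf = GradBuffer(2, "null")

# Two backward passes that produce the same gradient each time.
for buf in (write_buf, add_buf, null_buf):
    buf.backward([1.0, 2.0])
    buf.backward([1.0, 2.0])

print(write_buf.grad)  # [1.0, 2.0]
print(add_buf.grad)    # [2.0, 4.0]
print(null_buf.grad)   # None
```

With 'add', the second pass doubles the stored values, which is why the buffer has to be cleared with zero_grad() before each iteration.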
lr_mult

Local learning rate multiplier for this Parameter. The actual learning rate is calculated with learning_rate * lr_mult. You can set it with param.lr_mult = 2.0.

Type: float
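As a rough illustration of the learning_rate * lr_mult formula, the sketch below applies one plain SGD step to a list of weights. It is not how MXNet optimizers are implemented: sgd_step is a hypothetical helper, and the update rule (weight minus effective learning rate times gradient) is assumed for simplicity.

```python
def sgd_step(weight, grad, learning_rate, lr_mult=1.0):
    """One plain SGD update using a per-parameter learning rate multiplier."""
    effective_lr = learning_rate * lr_mult
    return [w - effective_lr * g for w, g in zip(weight, grad)]


weight = [1.0, 1.0]
grad = [0.5, -0.25]

# Same global learning rate, different per-parameter multipliers.
default_step = sgd_step(weight, grad, learning_rate=0.5)
doubled_step = sgd_step(weight, grad, learning_rate=0.5, lr_mult=2.0)

print(default_step)  # [0.75, 1.125]
print(doubled_step)  # [0.5, 1.25]
```

Setting lr_mult to 2.0 doubles the size of the step for that parameter only, while parameters left at the default of 1.0 keep the global learning rate.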
wd_mult

Local weight decay multiplier for this Parameter.

Type: float
cast(dtype)

Cast data and gradient of this Parameter to a new data type.

Parameters
dtype (str or numpy.dtype) – The new data type.
data(ctx=None)

Returns a copy of this parameter on one context. Must have been initialized on this context before. For sparse parameters, use Parameter.row_sparse_data() instead.

Parameters
ctx (Context) – Desired context.

Return type: NDArray on ctx
property dtype

The type of the parameter.

Setting the dtype value is equivalent to casting the value of the parameter.
grad(ctx=None)

Returns a gradient buffer for this parameter on one context.

Parameters
ctx (Context) – Desired context.
initialize(init=None, ctx=None, default_init=<mxnet.initializer.Uniform object>, force_reinit=False)

Initializes parameter and gradient arrays. Only used for the NDArray API.

Parameters
init (Initializer) – The initializer to use. Overrides Parameter.init() and default_init.
ctx (Context or list of Context, default context.current_context()) – Initialize Parameter on given context. If ctx is a list of Context, a copy will be made for each context.
    Note: Copies are independent arrays. The user is responsible for keeping their values consistent when updating. Normally gluon.Trainer does this for you.
default_init (Initializer) – Default initializer, used when both init() and Parameter.init() are None.
force_reinit (bool, default False) – Whether to force re-initialization if the parameter is already initialized.
Examples

>>> weight = mx.gluon.Parameter('weight', shape=(2, 2))
>>> weight.initialize(ctx=mx.cpu(0))
>>> weight.data()
[[-0.01068833  0.01729892]
 [ 0.02042518 -0.01618656]]
<NDArray 2x2 @cpu(0)>
>>> weight.grad()
[[ 0.  0.]
 [ 0.  0.]]
<NDArray 2x2 @cpu(0)>
>>> weight.initialize(ctx=[mx.gpu(0), mx.gpu(1)], force_reinit=True)
>>> weight.data(mx.gpu(0))
[[-0.00873779 -0.02834515]
 [ 0.05484822 -0.06206018]]
<NDArray 2x2 @gpu(0)>
>>> weight.data(mx.gpu(1))
[[-0.00873779 -0.02834515]
 [ 0.05484822 -0.06206018]]
<NDArray 2x2 @gpu(1)>
list_data()

Returns copies of this parameter on all contexts, in the same order as creation. For sparse parameters, use Parameter.list_row_sparse_data() instead.

Return type: list of NDArrays
list_row_sparse_data(row_id)

Returns copies of the 'row_sparse' parameter on all contexts, in the same order as creation. Each copy retains only the rows whose ids occur in the provided row ids. The parameter must have been initialized before.

Parameters
row_id (NDArray) – Row ids to retain for the 'row_sparse' parameter.

Return type: list of NDArrays
reset_ctx(ctx)

Re-assign Parameter to other contexts.

Parameters
ctx (Context or list of Context, default context.current_context()) – Assign Parameter to given context. If ctx is a list of Context, a copy will be made for each context.
row_sparse_data(row_id)

Returns a copy of the 'row_sparse' parameter on the same context as row_id's. The copy retains only the rows whose ids occur in the provided row ids. The parameter must have been initialized on this context before.

Parameters
row_id (NDArray) – Row ids to retain for the 'row_sparse' parameter.

Return type: NDArray on row_id's context
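The row-retention behavior can be sketched in plain Python. This is a conceptual model only: retain_rows is a hypothetical helper, nested lists stand in for the parameter, and the dictionary result is an assumed stand-in for a row-sparse array, not what MXNet returns.

```python
def retain_rows(dense_rows, row_ids):
    """Keep only the rows whose index appears in row_ids.

    Returns a {row_id: row} mapping as a simple stand-in for a
    row-sparse array; duplicate ids are collapsed.
    """
    wanted = sorted(set(row_ids))
    return {i: list(dense_rows[i]) for i in wanted}


# A 4 x 2 "embedding" table; only rows 1 and 3 are requested.
embedding = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
subset = retain_rows(embedding, [3, 1, 3])

print(subset)  # {1: [0.3, 0.4], 3: [0.7, 0.8]}
```

Rows 0 and 2 are absent from the result, which is the point of requesting only the row ids needed for the current batch.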
property shape

The shape of the parameter.

By default, an unknown dimension size is 0. However, when the NumPy semantic is turned on, an unknown dimension size is -1.
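The convention for unknown dimensions can be expressed as a small check. The helper below is hypothetical (shape_is_known is not an MXNet function) and only encodes the rule stated above: 0 marks an unknown dimension by default, and -1 marks it when NumPy semantics are on.

```python
def shape_is_known(shape, np_semantics=False):
    """Return True if every dimension of `shape` is fully specified."""
    if shape is None:
        return False
    # The marker for "unknown" depends on which semantics are active.
    unknown = -1 if np_semantics else 0
    return all(dim != unknown for dim in shape)


print(shape_is_known((64, 100)))                    # True
print(shape_is_known((64, 0)))                      # False
print(shape_is_known((64, 0), np_semantics=True))   # True
print(shape_is_known((64, -1), np_semantics=True))  # False
```

The third call shows why the marker changes: in this model, a dimension of size 0 is treated as a legitimate size under NumPy semantics, so a different value has to signal "unknown".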