Gluon Package
Overview
The Gluon package is a high-level interface for MXNet, designed to be easy to use while keeping most of the flexibility of the low-level API. Gluon supports both imperative and symbolic programming, making it easy to train complex models imperatively in Python and then deploy them as a symbolic graph in C++ and Scala.
Parameter
Parameter | A container holding parameters (weights) of Blocks.
ParameterDict | A dictionary managing a set of parameters.
Containers
Block | Base class for all neural network layers and models.
HybridBlock | HybridBlock supports forwarding with both Symbol and NDArray.
SymbolBlock | Construct block from symbol.
nn.Sequential | Stacks Blocks sequentially.
nn.HybridSequential | Stacks HybridBlocks sequentially.
Utilities
split_data | Splits an NDArray into num_slice slices along batch_axis.
split_and_load | Splits an NDArray into len(ctx_list) slices along batch_axis and loads each slice to one context in ctx_list.
clip_global_norm | Rescales NDArrays so that the sum of their 2-norm is smaller than max_norm.
API Reference
Neural network module.
class mxnet.gluon.Block(prefix=None, params=None)
Base class for all neural network layers and models. Your models should subclass this class.
Block can be nested recursively in a tree structure. You can create and assign child Blocks as regular attributes:

import mxnet as mx
from mxnet.gluon import Block, nn
from mxnet import ndarray as F

class Model(Block):
    def __init__(self, **kwargs):
        super(Model, self).__init__(**kwargs)
        # use name_scope to give child Blocks appropriate names.
        # It also allows sharing Parameters between Blocks recursively.
        with self.name_scope():
            self.dense0 = nn.Dense(20)
            self.dense1 = nn.Dense(20)

    def forward(self, x):
        x = F.relu(self.dense0(x))
        return F.relu(self.dense1(x))

model = Model()
model.initialize(ctx=mx.cpu(0))
model(F.zeros((10, 10), ctx=mx.cpu(0)))

Child Blocks assigned this way will be registered, and collect_params() will collect their Parameters recursively.
Parameters:
- prefix (str) – Prefix acts like a name space. It will be prepended to the names of all Parameters and child Blocks in this Block's name_scope(). Prefix should be unique within one model to prevent name collisions.
- params (ParameterDict or None) – ParameterDict for sharing weights with the new Block. For example, if you want dense1 to share dense0's weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20, params=dense0.collect_params())

__weakref__
list of weak references to the object (if defined)

cast(dtype)
Cast this Block to use another data type.
Parameters: dtype (str or numpy.dtype) – The new data type.

collect_params()
Returns a ParameterDict containing this Block's and all of its children's Parameters.

forward(*args)
Override to implement forward computation using NDArray. Only accepts positional arguments.
Parameters: *args (list of NDArray) – Input tensors.

hybridize(active=True)
Activates or deactivates HybridBlocks recursively. Has no effect on non-hybrid children.
Parameters: active (bool, default True) – Whether to turn hybrid on or off.

initialize(init=, ctx=None, verbose=False)
Initializes Parameters of this Block and its children. Equivalent to block.collect_params().initialize(...)

load_params(filename, ctx, allow_missing=False, ignore_extra=False)
Load parameters from file.
Parameters:
- filename (str) – Path to parameter file.
- ctx (Context or list of Context) – Context(s) to initialize loaded parameters on.
- allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
- ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

name_scope()
Returns a name space object managing a child Block and parameter names. Should be used within a with statement:

with self.name_scope():
    self.dense = nn.Dense(20)

exception mxnet.gluon.DeferredInitializationError
Error for unfinished deferred initialization.

class mxnet.gluon.HybridBlock(prefix=None, params=None)
HybridBlock supports forwarding with both Symbol and NDArray.
Forward computation in HybridBlock must be static to work with Symbols, i.e. you cannot call NDArray.asnumpy(), NDArray.shape, NDArray.dtype, etc. on tensors. Also, you cannot use branching or loop logic that is based on non-constant expressions like random numbers or intermediate results, since they change the graph structure for each iteration.
Before activating with hybridize(), HybridBlock works just like a normal Block. After activation, HybridBlock will create a symbolic graph representing the forward computation and cache it. On subsequent forwards, the cached graph will be used instead of hybrid_forward().
Refer to the Hybrid tutorial to see the end-to-end usage.

export(path)
Export HybridBlock to JSON format that can be loaded by mxnet.mod.Module or the C++ interface.
Note: When there is only one input, it will be named data. When there is more than one input, the inputs will be named data0, data1, etc.
Parameters: path (str) – Path to save model. Two files, path-symbol.json and path-0000.params, will be created.

forward(x, *args)
Defines the forward computation. Arguments can be either NDArray or Symbol.

class mxnet.gluon.Parameter(name, grad_req='write', shape=None, dtype=, lr_mult=1.0, wd_mult=1.0, init=None, allow_deferred_init=False, differentiable=True)
A container holding parameters (weights) of Blocks.
Parameter holds a copy of the parameter on each Context after it is initialized with Parameter.initialize(...). If grad_req is not 'null', it will also hold a gradient array on each Context:

import mxnet as mx

ctx = mx.gpu(0)
x = mx.nd.zeros((16, 100), ctx=ctx)
w = mx.gluon.Parameter('fc_weight', shape=(64, 100), init=mx.init.Xavier())
b = mx.gluon.Parameter('fc_bias', shape=(64,), init=mx.init.Zero())
w.initialize(ctx=ctx)
b.initialize(ctx=ctx)
out = mx.nd.FullyConnected(x, w.data(ctx), b.data(ctx), num_hidden=64)

Parameters:
- name (str) – Name of this parameter.
- grad_req ({'write', 'add', 'null'}, default 'write') – Specifies how to update gradient to grad arrays.
  - 'write' means the gradient is written to the grad NDArray every time.
  - 'add' means the gradient is added to the grad NDArray every time. You need to manually call zero_grad() to clear the gradient buffer before each iteration when using this option.
  - 'null' means gradient is not requested for this parameter. Gradient arrays will not be allocated.
- shape (tuple of int, default None) – Shape of this parameter. By default shape is not specified. A Parameter with unknown shape can be used for the Symbol API, but init will throw an error when using the NDArray API.
- dtype (numpy.dtype or str, default 'float32') – Data type of this parameter. For example, numpy.float32 or 'float32'.
- lr_mult (float, default 1.0) – Learning rate multiplier. Learning rate will be multiplied by lr_mult when updating this parameter with optimizer.
- wd_mult (float, default 1.0) – Weight decay multiplier (L2 regularizer coefficient). Works similarly to lr_mult.
- init (Initializer, default None) – Initializer of this parameter. Will use the global initializer by default.

grad_req
{'write', 'add', 'null'} – This can be set before or after initialization. Setting grad_req to 'null' with x.grad_req = 'null' saves memory and computation when you don't need the gradient w.r.t. x.
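
The difference between the 'write', 'add', and 'null' settings can be illustrated with a small pure-Python sketch. It does not use MXNet; `apply_gradient` and `zero_grad_sketch` are hypothetical helper names, and plain lists stand in for gradient NDArrays.

```python
def apply_gradient(grad_buffer, new_grad, grad_req):
    """Update grad_buffer in place the way each grad_req mode is described."""
    if grad_req == 'null':
        # No gradient is requested, so there is nothing to store.
        return grad_buffer
    if grad_req not in ('write', 'add'):
        raise ValueError("grad_req must be 'write', 'add' or 'null'")
    for i, g in enumerate(new_grad):
        if grad_req == 'write':
            grad_buffer[i] = g       # overwrite the previous gradient
        else:
            grad_buffer[i] += g      # accumulate across backward passes
    return grad_buffer


def zero_grad_sketch(grad_buffer):
    """Clear an accumulated gradient buffer (hypothetical helper)."""
    for i in range(len(grad_buffer)):
        grad_buffer[i] = 0.0


written = [0.0, 0.0]
accumulated = [0.0, 0.0]
for _ in range(2):                   # two backward passes with the same gradient
    apply_gradient(written, [1.0, 2.0], 'write')
    apply_gradient(accumulated, [1.0, 2.0], 'add')
```

With 'write' the buffer holds only the latest gradient, while with 'add' it holds the running sum, which is why the buffer has to be cleared manually before each iteration in that mode.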

lr_mult
float – Local learning rate multiplier for this Parameter. The actual learning rate is calculated as learning_rate * lr_mult. You can set it with param.lr_mult = 2.0.

wd_mult
float – Local weight decay multiplier for this Parameter.

__weakref__
list of weak references to the object (if defined)

cast(dtype)
Cast data and gradient of this Parameter to a new data type.
Parameters: dtype (str or numpy.dtype) – The new data type.

data(ctx=None)
Returns a copy of this parameter on one context. Must have been initialized on this context before.
Parameters: ctx (Context) – Desired context.
Returns: NDArray on ctx

grad(ctx=None)
Returns a gradient buffer for this parameter on one context.
Parameters: ctx (Context) – Desired context.

initialize(init=None, ctx=None, default_init=, force_reinit=False)
Initializes parameter and gradient arrays. Only used for the NDArray API.
Parameters:
- init (Initializer) – The initializer to use. Overrides Parameter.init() and default_init.
- ctx (Context or list of Context, defaults to context.current_context()) – Initialize Parameter on given context. If ctx is a list of Context, a copy will be made for each context.
  Note: Copies are independent arrays. The user is responsible for keeping their values consistent when updating. Normally gluon.Trainer does this for you.
- default_init (Initializer) – Default initializer, used when both init and Parameter.init() are None.
- force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
Examples
>>> weight = mx.gluon.Parameter('weight', shape=(2, 2))
>>> weight.initialize(ctx=mx.cpu(0))
>>> weight.data()
[[-0.01068833  0.01729892]
 [ 0.02042518 -0.01618656]]
>>> weight.grad()
[[ 0.  0.]
 [ 0.  0.]]
>>> weight.initialize(ctx=[mx.gpu(0), mx.gpu(1)])
>>> weight.data(mx.gpu(0))
[[-0.00873779 -0.02834515]
 [ 0.05484822 -0.06206018]]
>>> weight.data(mx.gpu(1))
[[-0.00873779 -0.02834515]
 [ 0.05484822 -0.06206018]]

list_data()
Returns copies of this parameter on all contexts, in the same order as creation.

class mxnet.gluon.ParameterDict(prefix='', shared=None)
A dictionary managing a set of parameters.
Parameters:
- prefix (str, default '') – The prefix to be prepended to all Parameters' names created by this dict.
- shared (ParameterDict or None) – If not None, when this dict's get() method creates a new parameter, it will first try to retrieve it from the "shared" dict. Usually used for sharing parameters with another Block.

__weakref__
list of weak references to the object (if defined)

get(name, **kwargs)
Retrieves a Parameter with name self.prefix+name. If not found, get() will first try to retrieve it from the "shared" dict. If still not found, get() will create a new Parameter with the keyword arguments and insert it into self.
Parameters:
- name (str) – Name of the desired Parameter. It will be prepended with this dictionary's prefix.
- **kwargs (dict) – The rest of the keyword arguments for the created Parameter.
Returns: The created or retrieved Parameter.
Return type:
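
The lookup order described for get() can be sketched without MXNet. `SketchParameterDict` below is a hypothetical stand-in that models only the three-step lookup (own entries, then the shared dict, then creation); it uses plain dicts in place of Parameter objects and is not the library's implementation.

```python
class SketchParameterDict:
    """Hypothetical stand-in for ParameterDict; models only get()'s lookup order."""

    def __init__(self, prefix='', shared=None):
        self.prefix = prefix
        self._shared = shared
        self._params = {}

    def get(self, name, **kwargs):
        full_name = self.prefix + name
        # 1. Already managed by this dict.
        if full_name in self._params:
            return self._params[full_name]
        # 2. Available in the shared dict: reuse the same object.
        if self._shared is not None and full_name in self._shared._params:
            param = self._shared._params[full_name]
        else:
            # 3. Not found anywhere: create it from the keyword arguments.
            param = dict(name=full_name, **kwargs)
        self._params[full_name] = param
        return param


shared = SketchParameterDict(prefix='net_')
weight = shared.get('weight', shape=(2, 2))

other = SketchParameterDict(prefix='net_', shared=shared)
reused = other.get('weight')            # found in the shared dict
bias = other.get('bias', shape=(2,))    # created, since it exists nowhere yet
```

Because the shared entry is returned as the same object rather than a copy, two Blocks that obtain it this way end up updating the same weights.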

initialize(init=, ctx=None, verbose=False, force_reinit=False)
Initializes all Parameters managed by this dictionary to be used for the NDArray API. It has no effect when using the Symbol API.
Parameters:
- init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.
- ctx (Context or list of Context) – Keeps a copy of Parameters on one or many context(s).
- force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(filename, ctx, allow_missing=False, ignore_extra=False, restore_prefix='')
Load parameters from file.
Parameters:
- filename (str) – Path to parameter file.
- ctx (Context or list of Context) – Context(s) to initialize loaded parameters on.
- allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
- ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this ParameterDict.
- restore_prefix (str, default '') – Prefix to prepend to the names of stored parameters before loading.

prefix
Prefix of this dict. It will be prepended to the names of Parameters created with get().

reset_ctx(ctx)
Re-assign all Parameters to other contexts.
Parameters: ctx (Context or list of Context, default context.current_context()) – Assign Parameter to given context. If ctx is a list of Context, a copy will be made for each context.

save(filename, strip_prefix='')
Save parameters to file.
Parameters:
- filename (str) – Path to parameter file.
- strip_prefix (str, default '') – Strip prefix from parameter names before saving.
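
How strip_prefix (on save) and restore_prefix (on load) affect parameter names can be sketched on a plain dictionary. The helpers `strip_names` and `restore_names` are hypothetical and operate only on names; raising ValueError for a name that lacks the prefix is an assumption of this sketch, not documented behavior.

```python
def strip_names(params, strip_prefix=''):
    """Return a copy of params with strip_prefix removed from every name."""
    stripped = {}
    for name, value in params.items():
        if not name.startswith(strip_prefix):
            # Assumption: a name without the prefix is treated as an error.
            raise ValueError(
                "Parameter %r does not start with prefix %r" % (name, strip_prefix))
        stripped[name[len(strip_prefix):]] = value
    return stripped


def restore_names(stored, restore_prefix=''):
    """Return a copy of stored with restore_prefix prepended to every name."""
    return {restore_prefix + name: value for name, value in stored.items()}


params = {'model_dense0_weight': 1, 'model_dense0_bias': 2}
stored = strip_names(params, strip_prefix='model_')         # names as written to file
reloaded = restore_names(stored, restore_prefix='model_')   # names after loading
```

Stripping the prefix on save makes the stored names independent of the prefix the model happened to have, so the file can be loaded into a model created under a different prefix by passing that prefix as restore_prefix.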

setattr(name, value)
Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don't need gradient w.r.t. a model's Parameters:

model.collect_params().setattr('grad_req', 'null')

or change the learning rate multiplier:

model.collect_params().setattr('lr_mult', 0.5)

Parameters:
- name (str) – Name of the attribute.
- value (valid type for attribute name) – The new value for the attribute.

class mxnet.gluon.SymbolBlock(outputs, inputs, params=None)
Construct block from symbol. This is useful for using pre-trained models as feature extractors. For example, you may want to extract the output from the fc2 layer in AlexNet.
Parameters:
- outputs (Symbol or list of Symbol) – The desired output for SymbolBlock.
- inputs (Symbol or list of Symbol) – The Variables in output's argument that should be used as inputs.
- params (ParameterDict) – Parameter dictionary for arguments and auxiliary states of outputs that are not inputs.
Examples
>>> # To extract the feature from fc1 and fc2 layers of AlexNet:
>>> alexnet = gluon.model_zoo.vision.alexnet(pretrained=True, ctx=mx.cpu(),
...                                          prefix='model_')
>>> inputs = mx.sym.var('data')
>>> out = alexnet(inputs)
>>> internals = out.get_internals()
>>> print(internals.list_outputs())
['data', ..., 'model_dense0_relu_fwd_output', ..., 'model_dense1_relu_fwd_output', ...]
>>> outputs = [internals['model_dense0_relu_fwd_output'],
...            internals['model_dense1_relu_fwd_output']]
>>> # Create SymbolBlock that shares parameters with alexnet
>>> feat_model = gluon.SymbolBlock(outputs, inputs, params=alexnet.collect_params())
>>> x = mx.nd.random.normal(shape=(16, 3, 224, 224))
>>> print(feat_model(x))

class mxnet.gluon.Trainer(params, optimizer, optimizer_params=None, kvstore='device', compression_params=None)
Applies an Optimizer on a set of Parameters. Trainer should be used together with autograd.
Parameters:
- params (ParameterDict) – The set of parameters to optimize.
- optimizer (str or Optimizer) – The optimizer to use. See help on Optimizer for a list of available optimizers.
- optimizer_params (dict) – Keyword arguments to be passed to the optimizer constructor. For example, {'learning_rate': 0.1}. All optimizers accept learning_rate, wd (weight decay), clip_gradient, and lr_scheduler. See each optimizer's constructor for a list of additional supported arguments.
- kvstore (str or KVStore) – kvstore type for multi-GPU and distributed training. See help on mxnet.kvstore.create for more information.
- compression_params (dict) – Specifies the type of gradient compression and additional arguments depending on the type of compression being used. For example, 2bit compression requires a threshold. Arguments would then be {'type': '2bit', 'threshold': 0.5}. See the mxnet.KVStore.set_gradient_compression method for more details on gradient compression.
Properties:
- learning_rate (float) – The current learning rate of the optimizer. Given an Optimizer object optimizer, its learning rate can be accessed as optimizer.learning_rate.

__weakref__
list of weak references to the object (if defined)

load_states(fname)
Loads trainer states (e.g. optimizer, momentum) from a file.
Parameters: fname (str) – Path to input states file.

save_states(fname)
Saves trainer states (e.g. optimizer, momentum) to a file.
Parameters: fname (str) – Path to output states file.

set_learning_rate(lr)
Sets a new learning rate of the optimizer.
Parameters: lr (float) – The new learning rate of the optimizer.

step(batch_size, ignore_stale_grad=False)
Makes one step of parameter update. Should be called after autograd.compute_gradient and outside of record() scope.
Parameters:
- batch_size (int) – Batch size of data processed. Gradient will be normalized by 1/batch_size. Set this to 1 if you normalized loss manually with loss = mean(loss).
- ignore_stale_grad (bool, optional, default=False) – If true, ignores Parameters with stale gradient (gradient that has not been updated by backward after last step) and skips their update.
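
The effect of batch_size and of a per-parameter lr_mult on one update can be shown with plain arithmetic. The sketch below assumes vanilla SGD with no momentum, weight decay, or gradient clipping; `sgd_step_sketch` is a hypothetical helper on Python lists, not the Trainer's actual code path.

```python
def sgd_step_sketch(weight, grad, learning_rate, batch_size, lr_mult=1.0):
    """One plain SGD update with the gradient normalized by 1/batch_size."""
    rescale = 1.0 / batch_size                 # normalization applied by step()
    effective_lr = learning_rate * lr_mult     # actual learning rate for this parameter
    return [w - effective_lr * rescale * g for w, g in zip(weight, grad)]


# The gradient was summed over a batch of 4 examples.
updated = sgd_step_sketch([1.0], [4.0], learning_rate=0.5, batch_size=4)

# Same update with a doubled learning rate for this parameter only.
updated_fast = sgd_step_sketch([1.0], [4.0], learning_rate=0.5, batch_size=4,
                               lr_mult=2.0)

# If the loss was already averaged, the gradient is 4x smaller and batch_size is 1.
updated_mean = sgd_step_sketch([1.0], [1.0], learning_rate=0.5, batch_size=1)
```

The first and third calls produce the same weights, which is the reason batch_size should be set to 1 when the loss has already been averaged over the batch.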

class mxnet.gluon.nn.Sequential(prefix=None, params=None)
Stacks Blocks sequentially.
Example:

net = nn.Sequential()
# use net's name_scope to give child Blocks appropriate names.
with net.name_scope():
    net.add(nn.Dense(10, activation='relu'))
    net.add(nn.Dense(20))

class mxnet.gluon.nn.HybridSequential(prefix=None, params=None)
Stacks HybridBlocks sequentially.
Example:

net = nn.HybridSequential()
# use net's name_scope to give child Blocks appropriate names.
with net.name_scope():
    net.add(nn.Dense(10, activation='relu'))
    net.add(nn.Dense(20))
net.hybridize()

Parallelization utility optimizer.

mxnet.gluon.utils.split_data(data, num_slice, batch_axis=0, even_split=True)
Splits an NDArray into num_slice slices along batch_axis. Usually used for data parallelism, where each slice is sent to one device (e.g. a GPU).
Parameters:
- data (NDArray) – A batch of data.
- num_slice (int) – Number of desired slices.
- batch_axis (int, default 0) – The axis along which to slice.
- even_split (bool, default True) – Whether to force all slices to have the same number of elements. If True, an error will be raised when num_slice does not evenly divide data.shape[batch_axis].
Returns: Return value is a list even if num_slice is 1.
Return type: list of NDArray
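
The slicing behavior described above can be sketched on a plain Python list standing in for the batch axis. `split_batch_sketch` is a hypothetical helper; giving the remainder to the last slice when even_split is False is an assumption of this sketch rather than something stated in the description.

```python
def split_batch_sketch(data, num_slice, even_split=True):
    """Split a list into num_slice consecutive slices along its only axis."""
    size = len(data)
    if even_split and size % num_slice != 0:
        raise ValueError(
            "data of size %d cannot be evenly split into %d slices"
            % (size, num_slice))
    step = size // num_slice
    # All slices but the last have exactly `step` elements.
    slices = [data[i * step:(i + 1) * step] for i in range(num_slice - 1)]
    # Assumption: the last slice takes whatever remains.
    slices.append(data[(num_slice - 1) * step:])
    return slices


even = split_batch_sketch(list(range(6)), 3)
single = split_batch_sketch([1, 2, 3], 1)          # still a list of slices
uneven = split_batch_sketch(list(range(7)), 3, even_split=False)
```

Returning a list even for a single slice lets the caller iterate over the result in the same way regardless of how many devices are in use.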

mxnet.gluon.utils.split_and_load(data, ctx_list, batch_axis=0, even_split=True)
Splits an NDArray into len(ctx_list) slices along batch_axis and loads each slice to one context in ctx_list.
Parameters:
- data (NDArray) – A batch of data.
- ctx_list (list of Context) – A list of Contexts.
- batch_axis (int, default 0) – The axis along which to slice.
- even_split (bool, default True) – Whether to force all slices to have the same number of elements.
Returns: A list of slices, each corresponding to a context in ctx_list.
Return type: list of NDArray

mxnet.gluon.utils.clip_global_norm(arrays, max_norm)
Rescales NDArrays so that the sum of their 2-norm is smaller than max_norm.
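
A global-norm rescaling of this kind can be sketched in pure Python. The sketch assumes the global norm is the 2-norm of all elements of all arrays taken together (the square root of the summed squared per-array norms), which is an assumption about the intended computation and not confirmed by the description above; `clip_global_norm_sketch` is a hypothetical name and lists of floats stand in for NDArrays.

```python
import math


def clip_global_norm_sketch(arrays, max_norm):
    """Rescale all arrays in place so their joint 2-norm is at most max_norm.

    Returns the joint norm measured before rescaling.
    """
    total_norm = math.sqrt(sum(x * x for arr in arrays for x in arr))
    # The small constant avoids division by zero for all-zero arrays.
    scale = max_norm / (total_norm + 1e-8)
    if scale < 1.0:
        for arr in arrays:
            for i in range(len(arr)):
                arr[i] *= scale
    return total_norm


grads = [[3.0], [4.0]]                   # joint norm is 5.0
norm_before = clip_global_norm_sketch(grads, max_norm=1.0)

small = [[0.3], [0.4]]                   # joint norm is 0.5, below the limit
clip_global_norm_sketch(small, max_norm=1.0)
```

All arrays are multiplied by the same factor, so the direction of the combined gradient is preserved and only its length changes.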

mxnet.gluon.utils.check_sha1(filename, sha1_hash)
Check whether the sha1 hash of the file content matches the expected hash.
Parameters:
- filename (str) – Path to the file.
- sha1_hash (str) – Expected sha1 hash in hexadecimal digits.
Returns: Whether the file content matches the expected hash.
Return type: bool
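
A check of this kind can be written with the standard library alone. The following is a minimal sketch using hashlib; `check_sha1_sketch` is a hypothetical name, and reading the file in 1 MiB chunks is a choice made here, not a documented detail.

```python
import hashlib


def check_sha1_sketch(filename, sha1_hash):
    """Return True if the SHA-1 of the file's content equals sha1_hash."""
    sha1 = hashlib.sha1()
    with open(filename, 'rb') as f:
        while True:
            chunk = f.read(1048576)   # read in chunks to bound memory use
            if not chunk:
                break
            sha1.update(chunk)
    return sha1.hexdigest() == sha1_hash
```

Hashing in chunks keeps memory use constant for large files such as pre-trained parameter archives.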

mxnet.gluon.utils.download(url, path=None, overwrite=False, sha1_hash=None)
Download a given URL.
Parameters:
- url (str) – URL to download.
- path (str, optional) – Destination path to store the downloaded file. By default, stores to the current directory with the same name as in url.
- overwrite (bool, optional) – Whether to overwrite the destination file if it already exists.
- sha1_hash (str, optional) – Expected sha1 hash in hexadecimal digits. Will ignore an existing file when the hash is specified but doesn't match.
Returns: The file path of the downloaded file.
Return type: str