Gluon Package
Overview
The Gluon package is a high-level interface for MXNet designed to be easy to use, while keeping most of the flexibility of a low-level API. Gluon supports both imperative and symbolic programming, making it easy to train complex models imperatively in Python and then deploy with a symbolic graph in C++ and Scala.
Based on the Gluon API specification, the Gluon API in Apache MXNet provides a clear, concise, and simple API for deep learning. It makes it easy to prototype, build, and train deep learning models without sacrificing training speed.
Advantages
- Simple, Easy-to-Understand Code: Gluon offers a full set of plug-and-play neural network building blocks, including predefined layers, optimizers, and initializers.
- Flexible, Imperative Structure: Gluon does not require the neural network model to be rigidly defined, but rather brings the training algorithm and model closer together to provide flexibility in the development process.
- Dynamic Graphs: Gluon enables developers to define neural network models that are dynamic, meaning they can be built on the fly, with any structure, and using any of Python’s native control flow.
- High Performance: Gluon provides all of the above benefits without impacting the training speed that the underlying engine provides.
Examples
Simple, Easy-to-Understand Code
Use plug-and-play neural network building blocks, including predefined layers, optimizers, and initializers:
    net = gluon.nn.Sequential()
    # When instantiated, Sequential stores a chain of neural network layers.
    # Once presented with data, Sequential executes each layer in turn, using
    # the output of one layer as the input for the next
    with net.name_scope():
        net.add(gluon.nn.Dense(256, activation="relu"))  # 1st layer (256 nodes)
        net.add(gluon.nn.Dense(256, activation="relu"))  # 2nd hidden layer
        net.add(gluon.nn.Dense(num_outputs))
Flexible, Imperative Structure
Prototype, build, and train neural networks in a fully imperative manner using the MXNet autograd package and the Gluon Trainer:
    epochs = 10
    for e in range(epochs):
        for i, (data, label) in enumerate(train_data):
            with autograd.record():
                output = net(data)  # the forward pass
                loss = softmax_cross_entropy(output, label)
            loss.backward()
            trainer.step(data.shape[0])
Dynamic Graphs
Build neural networks on the fly for use cases where neural networks must change in size and shape during model training:
    def forward(self, F, inputs, tree):
        children_outputs = [self.forward(F, inputs, child)
                            for child in tree.children]
        # Recursively builds the neural network based on each input sentence's
        # syntactic structure during the model definition and training process
        ...
High Performance
Easily cache the neural network to achieve high performance by defining your neural network with HybridSequential and calling the hybridize method:
    net = nn.HybridSequential()
    with net.name_scope():
        net.add(nn.Dense(256, activation="relu"))
        net.add(nn.Dense(128, activation="relu"))
        net.add(nn.Dense(2))
    net.hybridize()
Contents
Parameter
- Parameter – A Container holding parameters (weights) of Blocks.
- Constant – A constant parameter for holding immutable tensors.
- ParameterDict – A dictionary managing a set of parameters.
Containers
- Block – Base class for all neural network layers and models.
- HybridBlock – HybridBlock supports forwarding with both Symbol and NDArray.
- SymbolBlock – Construct block from symbol.
- nn.Sequential – Stacks Blocks sequentially.
- nn.HybridSequential – Stacks HybridBlocks sequentially.
Utilities
- split_data – Splits an NDArray into num_slice slices along batch_axis.
- split_and_load – Splits an NDArray into len(ctx_list) slices along batch_axis and loads each slice to one context in ctx_list.
- clip_global_norm – Rescales NDArrays so that the sum of their 2-norm is smaller than max_norm.
API Reference
Neural network module.

class mxnet.gluon.Block(prefix=None, params=None)
Base class for all neural network layers and models. Your models should subclass this class.
Block can be nested recursively in a tree structure. You can create and assign child Blocks as regular attributes:
    import mxnet as mx
    from mxnet.gluon import Block, nn
    from mxnet import ndarray as F

    class Model(Block):
        def __init__(self, **kwargs):
            super(Model, self).__init__(**kwargs)
            # use name_scope to give child Blocks appropriate names.
            with self.name_scope():
                self.dense0 = nn.Dense(20)
                self.dense1 = nn.Dense(20)

        def forward(self, x):
            x = F.relu(self.dense0(x))
            return F.relu(self.dense1(x))

    model = Model()
    model.initialize(ctx=mx.cpu(0))
    model(F.zeros((10, 10), ctx=mx.cpu(0)))
Child Blocks assigned this way will be registered, and collect_params() will collect their Parameters recursively. You can also manually register child blocks with register_child().
Parameters:
- prefix (str) – Prefix acts like a name space. All children blocks created in parent block’s name_scope() will have parent block’s prefix in their name. Please refer to naming tutorial for more info on prefix and naming.
- params (ParameterDict or None) – ParameterDict for sharing weights with the new Block. For example, if you want dense1 to share dense0’s weights, you can do:
    dense0 = nn.Dense(20)
    dense1 = nn.Dense(20, params=dense0.collect_params())

__weakref__
List of weak references to the object (if defined).

apply(fn)
Applies fn recursively to every child block as well as self.
Parameters:
- fn (callable) – Function to be applied to each submodule, of form fn(block).
Returns: this block

cast(dtype)
Cast this Block to use another data type.
Parameters:
- dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)
Returns a ParameterDict containing the Parameters of this Block and all of its children (the default). It can also return only the Parameters whose names match a given regular expression.
For example, collect the specified parameters in [‘conv1_weight’, ‘conv1_bias’, ‘fc_weight’, ‘fc_bias’]:
    model.collect_params('conv1_weight|conv1_bias|fc_weight|fc_bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, using a regular expression:
    model.collect_params('.*weight|.*bias')
Parameters:
- select (str) – Regular expression used to select Parameters by name.
Returns: The selected ParameterDict.
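The name selection described for collect_params can be pictured with plain Python regular expressions. The sketch below is not Gluon code: the `select_names` helper and the parameter names are hypothetical, and it assumes each name is tested with `re.match` (anchored at the start of the name), which is an assumption about how the pattern is applied.

```python
import re


def select_names(names, select=None):
    """Return the names matched by `select`, or every name if it is None.

    Hypothetical helper: assumes the pattern is anchored at the start of each
    name (re.match). This is an assumption, not confirmed Gluon behaviour.
    """
    if select is None:
        return list(names)
    pattern = re.compile(select)
    return [name for name in names if pattern.match(name)]


# Hypothetical parameter names, for illustration only.
names = ['conv1_weight', 'conv1_bias', 'fc_weight', 'fc_bias', 'bn_gamma']

explicit = select_names(names, 'conv1_weight|conv1_bias|fc_weight|fc_bias')
by_suffix = select_names(names, '.*weight|.*bias')
print(explicit)
print(by_suffix)
```

Both patterns pick the four weight and bias names and leave `bn_gamma` out; passing no pattern returns every name.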

forward(*args)
Override to implement forward computation using NDArray. Only accepts positional arguments.
Parameters:
- *args (list of NDArray) – Input tensors.

hybridize(active=True, **kwargs)
Activates or deactivates HybridBlocks recursively. Has no effect on non-hybrid children.
Parameters:
- active (bool, default True) – Whether to turn hybrid on or off.
- static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
- static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

initialize(init=, ctx=None, verbose=False, force_reinit=False)
Initializes Parameters of this Block and its children. Equivalent to block.collect_params().initialize(...).
Parameters:
- init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.
- ctx (Context or list of Context) – Keeps a copy of Parameters on one or many context(s).
- verbose (bool, default False) – Whether to verbosely print out details on initialization.
- force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load_parameters(filename, ctx=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')
Load parameters from a file previously saved by save_parameters.
Parameters:
- filename (str) – Path to parameter file.
- ctx (Context or list of Context, default cpu()) – Context(s) to initialize loaded parameters on.
- allow_missing (bool, default False) – Whether to silently skip loading parameters not present in the file.
- ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
- cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter, if any.
- dtype_source (str, default 'current') – Must be one of {‘current’, ‘saved’}. Only valid if cast_dtype=True. Specifies the source of the dtype for casting the parameters.

load_params(filename, ctx=None, allow_missing=False, ignore_extra=False)
[Deprecated] Please use load_parameters.
Load parameters from file.
Parameters:
- filename (str) – Path to parameter file.
- ctx (Context or list of Context, default cpu()) – Context(s) to initialize loaded parameters on.
- allow_missing (bool, default False) – Whether to silently skip loading parameters not present in the file.
- ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

name_scope()
Returns a name space object managing a child Block and parameter names. Should be used within a with statement:
    with self.name_scope():
        self.dense = nn.Dense(20)
Please refer to naming tutorial for more info on prefix and naming.

register_child(block, name=None)
Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

register_forward_hook(hook)
Registers a forward hook on the block.
The hook function is called immediately after forward(). It should not modify the input or output.
Parameters:
- hook (callable) – The forward hook function of form hook(block, input, output) -> None.
Return type: mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)
Registers a forward pre-hook on the block.
The hook function is called immediately before forward(). It should not modify the input or output.
Parameters:
- hook (callable) – The forward hook function of form hook(block, input) -> None.
Return type: mxnet.gluon.utils.HookHandle

save_parameters(filename)
Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().
Parameters:
- filename (str) – Path to file.

save_params(filename)
[Deprecated] Please use save_parameters. Note that if you want to load from SymbolBlock later, please use export instead.
Save parameters to file.
Parameters:
- filename (str) – Path to file.

summary(*inputs)
Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
Parameters:
- inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

class mxnet.gluon.Constant(name, value)
A constant parameter for holding immutable tensors. Constants are ignored by autograd and Trainer, thus their values will not change during training. But you can still update their values manually with the set_data method.
Constants can be created with either:
    const = mx.gluon.Constant('const', [[1,2],[3,4]])
or:
    class Block(gluon.Block):
        def __init__(self, **kwargs):
            super(Block, self).__init__(**kwargs)
            self.const = self.params.get_constant('const', [[1,2],[3,4]])
Parameters:
- name (str) – Name of the parameter.
- value (array-like) – Initial value for the constant.

exception mxnet.gluon.DeferredInitializationError
Error for unfinished deferred initialization.

class mxnet.gluon.HybridBlock(prefix=None, params=None)
HybridBlock supports forwarding with both Symbol and NDArray.
HybridBlock is similar to Block, with a few differences:
    import mxnet as mx
    from mxnet.gluon import HybridBlock, nn

    class Model(HybridBlock):
        def __init__(self, **kwargs):
            super(Model, self).__init__(**kwargs)
            # use name_scope to give child Blocks appropriate names.
            with self.name_scope():
                self.dense0 = nn.Dense(20)
                self.dense1 = nn.Dense(20)

        def hybrid_forward(self, F, x):
            x = F.relu(self.dense0(x))
            return F.relu(self.dense1(x))

    model = Model()
    model.initialize(ctx=mx.cpu(0))
    model.hybridize()
    model(mx.nd.zeros((10, 10), ctx=mx.cpu(0)))
Forward computation in HybridBlock must be static to work with Symbols, i.e. you cannot call NDArray.asnumpy(), NDArray.shape, NDArray.dtype, NDArray indexing (x[i]) etc. on tensors. Also, you cannot use branching or loop logic that is based on non-constant expressions like random numbers or intermediate results, since they change the graph structure for each iteration.
Before activating with hybridize(), HybridBlock works just like a normal Block. After activation, HybridBlock will create a symbolic graph representing the forward computation and cache it. On subsequent forwards, the cached graph will be used instead of hybrid_forward().
Please see references for detailed tutorial.
References
Hybrid - Faster training and easy deployment

export(path, epoch=0, remove_amp_cast=True)
Export HybridBlock to json format that can be loaded by SymbolBlock.imports, mxnet.mod.Module or the C++ interface.
Note: When there is only one input, it will be named data. When there is more than one input, they will be named data0, data1, etc.
Parameters:
- path (str) – Path to save model. Two files, path-symbol.json and path-xxxx.params, will be created, where xxxx is the 4-digit epoch number.
- epoch (int) – Epoch number of saved model.
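The file naming rule for export can be made concrete with a small sketch. The `export_file_names` helper below is hypothetical and only reproduces the naming pattern stated above (path-symbol.json and path-xxxx.params with a zero-padded 4-digit epoch); it does not export anything.

```python
def export_file_names(path, epoch=0):
    """Build the two file names described for export (hypothetical helper)."""
    symbol_file = '%s-symbol.json' % path
    # The epoch number is zero-padded to four digits.
    params_file = '%s-%04d.params' % (path, epoch)
    return symbol_file, params_file


symbol_file, params_file = export_file_names('net1', epoch=1)
print(symbol_file, params_file)
```

These are the same two file names that the SymbolBlock.imports example in this reference loads after calling net1.export('net1', epoch=1).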

forward(x, *args)
Defines the forward computation. Arguments can be either NDArray or Symbol.

class mxnet.gluon.Parameter(name, grad_req='write', shape=None, dtype=, lr_mult=1.0, wd_mult=1.0, init=None, allow_deferred_init=False, differentiable=True, stype='default', grad_stype='default')
A Container holding parameters (weights) of Blocks.
Parameter holds a copy of the parameter on each Context after it is initialized with Parameter.initialize(...). If grad_req is not 'null', it will also hold a gradient array on each Context:
    ctx = mx.gpu(0)
    x = mx.nd.zeros((16, 100), ctx=ctx)
    w = mx.gluon.Parameter('fc_weight', shape=(64, 100), init=mx.init.Xavier())
    b = mx.gluon.Parameter('fc_bias', shape=(64,), init=mx.init.Zero())
    w.initialize(ctx=ctx)
    b.initialize(ctx=ctx)
    out = mx.nd.FullyConnected(x, w.data(ctx), b.data(ctx), num_hidden=64)
Parameters:
- name (str) – Name of this parameter.
- grad_req ({'write', 'add', 'null'}, default 'write') – Specifies how to update gradient to grad arrays.
  - 'write' means the gradient is written to the grad NDArray every time.
  - 'add' means the gradient is added to the grad NDArray every time. You need to manually call zero_grad() to clear the gradient buffer before each iteration when using this option.
  - 'null' means gradient is not requested for this parameter. Gradient arrays will not be allocated.
- shape (int or tuple of int, default None) – Shape of this parameter. By default shape is not specified. A Parameter with unknown shape can be used for the Symbol API, but init will throw an error when using the NDArray API.
- dtype (numpy.dtype or str, default 'float32') – Data type of this parameter. For example, numpy.float32 or 'float32'.
- lr_mult (float, default 1.0) – Learning rate multiplier. Learning rate will be multiplied by lr_mult when updating this parameter with optimizer.
- wd_mult (float, default 1.0) – Weight decay multiplier (L2 regularizer coefficient). Works similarly to lr_mult.
- init (Initializer, default None) – Initializer of this parameter. Will use the global initializer by default.
- stype ({'default', 'row_sparse', 'csr'}, default 'default') – The storage type of the parameter.
- grad_stype ({'default', 'row_sparse', 'csr'}, default 'default') – The storage type of the parameter’s gradient.
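The three grad_req modes can be illustrated without MXNet. The following is a toy model in plain Python, not the Gluon implementation: `ToyGradBuffer` and its methods are hypothetical names, and a list of floats stands in for the gradient NDArray.

```python
class ToyGradBuffer:
    """Toy stand-in for one gradient array (hypothetical, not a Gluon class)."""

    def __init__(self, size, grad_req='write'):
        if grad_req not in ('write', 'add', 'null'):
            raise ValueError('unknown grad_req: %r' % grad_req)
        self.grad_req = grad_req
        # 'null' allocates no gradient storage at all.
        self.grad = None if grad_req == 'null' else [0.0] * size

    def receive(self, new_grad):
        """Store a freshly computed gradient according to grad_req."""
        if self.grad_req == 'write':
            self.grad = list(new_grad)  # overwrite the previous gradient
        elif self.grad_req == 'add':
            # accumulate on top of whatever is already in the buffer
            self.grad = [g + n for g, n in zip(self.grad, new_grad)]

    def zero_grad(self):
        """Clear the buffer; needed between iterations when using 'add'."""
        if self.grad is not None:
            self.grad = [0.0] * len(self.grad)


write_buf = ToyGradBuffer(2, 'write')
add_buf = ToyGradBuffer(2, 'add')
null_buf = ToyGradBuffer(2, 'null')
for buf in (write_buf, add_buf, null_buf):
    buf.receive([1.0, 2.0])  # first backward pass
    buf.receive([1.0, 2.0])  # second backward pass
```

After two identical backward passes the 'write' buffer holds the last gradient, the 'add' buffer holds the sum until zero_grad() is called, and the 'null' buffer was never allocated.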

grad_req
{‘write’, ‘add’, ‘null’} – This can be set before or after initialization. Setting grad_req to 'null' with x.grad_req = 'null' saves memory and computation when you don’t need gradient w.r.t. x.

lr_mult
float – Local learning rate multiplier for this Parameter. The actual learning rate is calculated with learning_rate * lr_mult. You can set it with param.lr_mult = 2.0.

wd_mult
float – Local weight decay multiplier for this Parameter.

__weakref__
List of weak references to the object (if defined).

cast(dtype)
Cast data and gradient of this Parameter to a new data type.
Parameters:
- dtype (str or numpy.dtype) – The new data type.

data(ctx=None)
Returns a copy of this parameter on one context. Must have been initialized on this context before. For sparse parameters, use Parameter.row_sparse_data() instead.
Parameters:
- ctx (Context) – Desired context.
Return type: NDArray on ctx

dtype
The type of the parameter.
Setting the dtype value is equivalent to casting the value of the parameter.

grad(ctx=None)
Returns a gradient buffer for this parameter on one context.
Parameters:
- ctx (Context) – Desired context.

initialize(init=None, ctx=None, default_init=, force_reinit=False)
Initializes parameter and gradient arrays. Only used for NDArray API.
Parameters:
- init (Initializer) – The initializer to use. Overrides Parameter.init() and default_init.
- ctx (Context or list of Context, defaults to context.current_context()) – Initialize Parameter on given context. If ctx is a list of Context, a copy will be made for each context.
  Note: Copies are independent arrays. User is responsible for keeping their values consistent when updating. Normally gluon.Trainer does this for you.
- default_init (Initializer) – Default initializer is used when both init() and Parameter.init() are None.
- force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
Examples
    >>> weight = mx.gluon.Parameter('weight', shape=(2, 2))
    >>> weight.initialize(ctx=mx.cpu(0))
    >>> weight.data()
    [[-0.01068833  0.01729892]
     [ 0.02042518 -0.01618656]]
    >>> weight.grad()
    [[ 0.  0.]
     [ 0.  0.]]
    >>> weight.initialize(ctx=[mx.gpu(0), mx.gpu(1)])
    >>> weight.data(mx.gpu(0))
    [[-0.00873779 -0.02834515]
     [ 0.05484822 -0.06206018]]
    >>> weight.data(mx.gpu(1))
    [[-0.00873779 -0.02834515]
     [ 0.05484822 -0.06206018]]

list_data()
Returns copies of this parameter on all contexts, in the same order as creation. For sparse parameters, use Parameter.list_row_sparse_data() instead.
Return type: list of NDArrays

list_row_sparse_data(row_id)
Returns copies of the ‘row_sparse’ parameter on all contexts, in the same order as creation. The copy only retains rows whose ids occur in provided row ids. The parameter must have been initialized before.
Parameters:
- row_id (NDArray) – Row ids to retain for the ‘row_sparse’ parameter.
Return type: list of NDArrays

reset_ctx(ctx)
Re-assign Parameter to other contexts.
Parameters:
- ctx (Context or list of Context, default context.current_context()) – Assign Parameter to given context. If ctx is a list of Context, a copy will be made for each context.

row_sparse_data(row_id)
Returns a copy of the ‘row_sparse’ parameter on the same context as row_id’s. The copy only retains rows whose ids occur in provided row ids. The parameter must have been initialized on this context before.
Parameters:
- row_id (NDArray) – Row ids to retain for the ‘row_sparse’ parameter.
Return type: NDArray on row_id’s context

shape
The shape of the parameter.
By default, an unknown dimension size is 0. However, when the NumPy semantic is turned on, unknown dimension size is -1.

class mxnet.gluon.ParameterDict(prefix='', shared=None)
A dictionary managing a set of parameters.
Parameters:
- prefix (str, default '') – The prefix to be prepended to all Parameters’ names created by this dict.
- shared (ParameterDict or None) – If not None, when this dict’s get() method creates a new parameter, it will first try to retrieve it from the “shared” dict. Usually used for sharing parameters with another Block.

__weakref__
List of weak references to the object (if defined).

get(name, **kwargs)
Retrieves a Parameter with name self.prefix+name. If not found, get() will first try to retrieve it from the “shared” dict. If still not found, get() will create a new Parameter with the key-word arguments and insert it into self.
Parameters:
- name (str) – Name of the desired Parameter. It will be prepended with this dictionary’s prefix.
- **kwargs (dict) – The rest of key-word arguments for the created Parameter.
Returns: The created or retrieved Parameter.
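The lookup order described for get() (own entries first, then the shared dict, then creation) can be sketched with a toy class. `ToyParameterDict` is a hypothetical stand-in that stores plain dicts instead of Parameter objects; it only models the order of the three steps, not the real class.

```python
class ToyParameterDict:
    """Toy model of the lookup order only (hypothetical, not the Gluon class)."""

    def __init__(self, prefix='', shared=None):
        self.prefix = prefix
        self.shared = shared
        self._params = {}

    def get(self, name, **kwargs):
        full_name = self.prefix + name
        # 1. Already present in this dictionary?
        if full_name in self._params:
            return self._params[full_name]
        # 2. Otherwise, try the shared dictionary.
        if self.shared is not None and full_name in self.shared._params:
            param = self.shared._params[full_name]
        else:
            # 3. Otherwise, create a new entry from the keyword arguments.
            param = {'name': full_name, **kwargs}
        self._params[full_name] = param
        return param


shared = ToyParameterDict(prefix='net_')
w = shared.get('weight', shape=(2, 2))         # created in `shared`
other = ToyParameterDict(prefix='net_', shared=shared)
same_w = other.get('weight')                   # retrieved from `shared`
new_b = other.get('bias', shape=(2,))          # created in `other`
```

Because `other` finds 'net_weight' in the shared dict, both dictionaries end up referring to the same object, which is the mechanism behind sharing parameters with another Block.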

get_constant(name, value=None)
Retrieves a Constant with name self.prefix+name. If not found, get() will first try to retrieve it from the “shared” dict. If still not found, get() will create a new Constant with the key-word arguments and insert it into self.
Parameters:
- name (str) – Name of the desired Constant. It will be prepended with this dictionary’s prefix.
- value (array-like) – Initial value of constant.
Returns: The created or retrieved Constant.

initialize(init=, ctx=None, verbose=False, force_reinit=False)
Initializes all Parameters managed by this dictionary to be used for NDArray API. It has no effect when using Symbol API.
Parameters:
- init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.
- ctx (Context or list of Context) – Keeps a copy of Parameters on one or many context(s).
- verbose (bool, default False) – Whether to verbosely print out details on initialization.
- force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(filename, ctx=None, allow_missing=False, ignore_extra=False, restore_prefix='', cast_dtype=False, dtype_source='current')
Load parameters from file.
Parameters:
- filename (str) – Path to parameter file.
- ctx (Context or list of Context) – Context(s) to initialize loaded parameters on.
- allow_missing (bool, default False) – Whether to silently skip loading parameters not present in the file.
- ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this ParameterDict.
- restore_prefix (str, default '') – Prefix to prepend to the names of stored parameters before loading.
- cast_dtype (bool, default False) – Cast the data type of the parameter.
- dtype_source (str, default 'current') – Must be one of {‘current’, ‘saved’}. Only valid if cast_dtype=True. Specifies the source of the dtype for casting the parameters.

prefix
Prefix of this dict. It will be prepended to the names of Parameters created with get().

reset_ctx(ctx)
Re-assign all Parameters to other contexts.
Parameters:
- ctx (Context or list of Context, default context.current_context()) – Assign Parameter to given context. If ctx is a list of Context, a copy will be made for each context.

save(filename, strip_prefix='')
Save parameters to file.
Parameters:
- filename (str) – Path to parameter file.
- strip_prefix (str, default '') – Strip prefix from parameter names before saving.

setattr(name, value)
Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t. a model’s Parameters:
    model.collect_params().setattr('grad_req', 'null')
or change the learning rate multiplier:
    model.collect_params().setattr('lr_mult', 0.5)
Parameters:
- name (str) – Name of the attribute.
- value (valid type for attribute name) – The new value for the attribute.

class mxnet.gluon.SymbolBlock(outputs, inputs, params=None)
Construct block from symbol. This is useful for using pre-trained models as feature extractors. For example, you may want to extract the output from the fc2 layer in AlexNet.
Parameters:
- outputs (Symbol or list of Symbol) – The desired output for SymbolBlock.
- inputs (Symbol or list of Symbol) – The Variables in output’s argument that should be used as inputs.
- params (ParameterDict) – Parameter dictionary for arguments and auxiliary states of outputs that are not inputs.
Examples
    >>> # To extract the feature from fc1 and fc2 layers of AlexNet:
    >>> alexnet = gluon.model_zoo.vision.alexnet(pretrained=True, ctx=mx.cpu(),
    ...                                          prefix='model_')
    >>> inputs = mx.sym.var('data')
    >>> out = alexnet(inputs)
    >>> internals = out.get_internals()
    >>> print(internals.list_outputs())
    ['data', ..., 'model_dense0_relu_fwd_output', ..., 'model_dense1_relu_fwd_output', ...]
    >>> outputs = [internals['model_dense0_relu_fwd_output'],
    ...            internals['model_dense1_relu_fwd_output']]
    >>> # Create SymbolBlock that shares parameters with alexnet
    >>> feat_model = gluon.SymbolBlock(outputs, inputs, params=alexnet.collect_params())
    >>> x = mx.nd.random.normal(shape=(16, 3, 224, 224))
    >>> print(feat_model(x))

static imports(symbol_file, input_names, param_file=None, ctx=None)
Import model previously saved by HybridBlock.export or Module.save_checkpoint as a SymbolBlock for use in Gluon.
Parameters:
- symbol_file (str) – Path to symbol file.
- input_names (list of str) – List of input variable names.
- param_file (str, optional) – Path to parameter file.
- ctx (Context, default None) – The context to initialize SymbolBlock on.
Returns: SymbolBlock loaded from symbol and parameter files.
Examples
    >>> net1 = gluon.model_zoo.vision.resnet18_v1(
    ...     prefix='resnet', pretrained=True)
    >>> net1.hybridize()
    >>> x = mx.nd.random.normal(shape=(1, 3, 32, 32))
    >>> out1 = net1(x)
    >>> net1.export('net1', epoch=1)
    >>>
    >>> net2 = gluon.SymbolBlock.imports(
    ...     'net1-symbol.json', ['data'], 'net1-0001.params')
    >>> out2 = net2(x)

class mxnet.gluon.Trainer(params, optimizer, optimizer_params=None, kvstore='device', compression_params=None, update_on_kvstore=None)
Applies an Optimizer on a set of Parameters. Trainer should be used together with autograd.
Note: For the following cases, updates will always happen on kvstore, i.e., you cannot set update_on_kvstore=False.
- dist kvstore with sparse weights or sparse gradients
- dist async kvstore
- optimizer.lr_scheduler is not None
Parameters:
- params (ParameterDict) – The set of parameters to optimize.
- optimizer (str or Optimizer) – The optimizer to use. See help on Optimizer for a list of available optimizers.
- optimizer_params (dict) – Key-word arguments to be passed to optimizer constructor. For example, {‘learning_rate’: 0.1}. All optimizers accept learning_rate, wd (weight decay), clip_gradient, and lr_scheduler. See each optimizer’s constructor for a list of additional supported arguments.
- kvstore (str or KVStore) – kvstore type for multi-gpu and distributed training. See help on mxnet.kvstore.create for more information.
- compression_params (dict) – Specifies type of gradient compression and additional arguments depending on the type of compression being used. For example, 2bit compression requires a threshold. Arguments would then be {‘type’:‘2bit’, ‘threshold’:0.5}. See the mxnet.KVStore.set_gradient_compression method for more details on gradient compression.
- update_on_kvstore (bool, default None) – Whether to perform parameter updates on kvstore. If None, then trainer will choose the more suitable option depending on the type of kvstore. If the update_on_kvstore argument is provided, environment variable MXNET_UPDATE_ON_KVSTORE will be ignored.
Properties:
- learning_rate (float) – The current learning rate of the optimizer. Given an Optimizer object optimizer, its learning rate can be accessed as optimizer.learning_rate.

__weakref__
List of weak references to the object (if defined).

allreduce_grads()
For each parameter, reduce the gradients from different contexts.
Should be called after autograd.backward(), outside of record() scope, and before trainer.update().
For normal parameter updates, step() should be used, which internally calls allreduce_grads() and then update(). However, if you need to get the reduced gradients to perform certain transformation, such as in gradient clipping, then you may want to manually call allreduce_grads() and update() separately.

load_states(fname)
Loads trainer states (e.g. optimizer, momentum) from a file.
Parameters:
- fname (str) – Path to input states file.
Note: optimizer.param_dict, which contains Parameter information (such as lr_mult and wd_mult), will not be loaded from the file, but rather set based on the current Trainer’s parameters.

save_states(fname)
Saves trainer states (e.g. optimizer, momentum) to a file.
Parameters:
- fname (str) – Path to output states file.
Note: optimizer.param_dict, which contains Parameter information (such as lr_mult and wd_mult), will not be saved.

set_learning_rate(lr)
Sets a new learning rate of the optimizer.
Parameters:
- lr (float) – The new learning rate of the optimizer.

step(batch_size, ignore_stale_grad=False)
Makes one step of parameter update. Should be called after autograd.backward() and outside of record() scope.
For normal parameter updates, step() should be used, which internally calls allreduce_grads() and then update(). However, if you need to get the reduced gradients to perform certain transformation, such as in gradient clipping, then you may want to manually call allreduce_grads() and update() separately.
Parameters:
- batch_size (int) – Batch size of data processed. Gradient will be normalized by 1/batch_size. Set this to 1 if you normalized loss manually with loss = mean(loss).
- ignore_stale_grad (bool, optional, default=False) – If true, ignores Parameters with stale gradient (gradient that has not been updated by backward after last step) and skips the update.
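The batch_size argument of step() only rescales the gradient by 1/batch_size, which is why it should be 1 when the loss has already been averaged. The sketch below illustrates the arithmetic with a plain SGD update on a single scalar weight; the `sgd_step` helper, the numbers, and the choice of plain SGD are assumptions for illustration, since a Trainer applies whichever optimizer it was given.

```python
def sgd_step(weight, grad, learning_rate, batch_size):
    """One plain SGD update with the gradient rescaled by 1/batch_size."""
    return weight - learning_rate * grad / batch_size


per_sample_grads = [0.5, 1.5, 1.0, 3.0]  # hypothetical per-sample gradients
batch_size = len(per_sample_grads)

# Loss summed over the batch: pass the real batch size to the step.
w_sum = sgd_step(1.0, sum(per_sample_grads), 0.1, batch_size)

# Loss already averaged (loss = mean(loss)): pass batch_size=1 instead.
w_mean = sgd_step(1.0, sum(per_sample_grads) / batch_size, 0.1, 1)

print(w_sum, w_mean)
```

Both routes arrive at the same updated weight, up to floating-point rounding. Passing the real batch size together with an averaged loss would divide by the batch size twice.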

update(batch_size, ignore_stale_grad=False)
Makes one step of parameter update.
Should be called after autograd.backward() and outside of record() scope, and after trainer.allreduce_grads().
For normal parameter updates, step() should be used, which internally calls allreduce_grads() and then update(). However, if you need to get the reduced gradients to perform certain transformation, such as in gradient clipping, then you may want to manually call allreduce_grads() and update() separately.
Parameters:
- batch_size (int) – Batch size of data processed. Gradient will be normalized by 1/batch_size. Set this to 1 if you normalized loss manually with loss = mean(loss).
- ignore_stale_grad (bool, optional, default=False) – If true, ignores Parameters with stale gradient (gradient that has not been updated by backward after last step) and skips the update.

class mxnet.gluon.nn.Sequential(prefix=None, params=None)
Stacks Blocks sequentially.
Example:
    net = nn.Sequential()
    # use net's name_scope to give child Blocks appropriate names.
    with net.name_scope():
        net.add(nn.Dense(10, activation='relu'))
        net.add(nn.Dense(20))

class mxnet.gluon.nn.HybridSequential(prefix=None, params=None)
Stacks HybridBlocks sequentially.
Example:
    net = nn.HybridSequential()
    # use net's name_scope to give child Blocks appropriate names.
    with net.name_scope():
        net.add(nn.Dense(10, activation='relu'))
        net.add(nn.Dense(20))
    net.hybridize()
Parallelization utility optimizer.

mxnet.gluon.utils.split_data(data, num_slice, batch_axis=0, even_split=True)
Splits an NDArray into num_slice slices along batch_axis. Usually used for data parallelism where each slice is sent to one device (e.g. a GPU).
Parameters:
- data (NDArray) – A batch of data.
- num_slice (int) – Number of desired slices.
- batch_axis (int, default 0) – The axis along which to slice.
- even_split (bool, default True) – Whether to force all slices to have the same number of elements. If True, an error will be raised when num_slice does not evenly divide data.shape[batch_axis].
Returns: Return value is a list even if num_slice is 1.
Return type: list of NDArray
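The slicing rule for split_data can be sketched on a plain Python list standing in for the batch axis. `split_batch` is a hypothetical helper, not the Gluon function. The even_split=True behaviour follows the description above; letting the last slice absorb the remainder when even_split=False is an assumption made for illustration.

```python
def split_batch(batch, num_slice, even_split=True):
    """Split a list along its first axis into num_slice slices.

    Hypothetical pure-Python helper. With even_split=False the last slice
    absorbs the remainder, which is an assumption made for illustration.
    """
    size = len(batch)
    if even_split and size % num_slice != 0:
        raise ValueError('%d samples cannot be split evenly into %d slices'
                         % (size, num_slice))
    step = size // num_slice
    if step == 0:
        # Fewer samples than slices: fall back to one sample per slice.
        step, num_slice = 1, size
    slices = [batch[i * step:(i + 1) * step] for i in range(num_slice - 1)]
    slices.append(batch[(num_slice - 1) * step:])
    return slices


even = split_batch(list(range(8)), 4)
uneven = split_batch(list(range(7)), 3, even_split=False)
single = split_batch([1, 2, 3], 1)
print(even, uneven, single)
```

The result is a list even when only one slice is requested, matching the Returns note above.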

mxnet.gluon.utils.split_and_load(data, ctx_list, batch_axis=0, even_split=True)
Splits an NDArray into len(ctx_list) slices along batch_axis and loads each slice to one context in ctx_list.
Parameters:
- data (NDArray) – A batch of data.
- ctx_list (list of Context) – A list of Contexts.
- batch_axis (int, default 0) – The axis along which to slice.
- even_split (bool, default True) – Whether to force all slices to have the same number of elements.
Returns: Each corresponds to a context in ctx_list.
Return type: list of NDArray

mxnet.gluon.utils.clip_global_norm(arrays, max_norm, check_isfinite=True)
Rescales NDArrays so that the sum of their 2-norm is smaller than max_norm.
Parameters:
- arrays (list of NDArray) –
- max_norm (float) –
- check_isfinite (bool, default True) – If True, check that the total_norm is finite (not nan or inf). This requires a blocking .asscalar() call.
Returns: Total norm. Return type is NDArray of shape (1,) if check_isfinite is False. Otherwise a float is returned.
Return type: NDArray or float
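Global-norm clipping can be sketched in plain Python on lists of floats. This is not the Gluon implementation: `clip_global_norm_sketch` is a hypothetical helper, and it assumes that the norm in question is the usual global 2-norm, i.e. the square root of the sum of the squared 2-norms of all arrays. The small 1e-8 term that guards against division by zero is likewise an assumption.

```python
import math


def clip_global_norm_sketch(arrays, max_norm):
    """Rescale lists of floats in place so their global 2-norm is at most max_norm.

    Assumption: the global norm is sqrt(sum of squared 2-norms of all arrays).
    Returns the norm measured before rescaling.
    """
    total_norm = math.sqrt(sum(x * x for arr in arrays for x in arr))
    # Never scale up: arrays already within the limit are left unchanged.
    scale = min(1.0, max_norm / (total_norm + 1e-8))
    for arr in arrays:
        arr[:] = [x * scale for x in arr]
    return total_norm


grads = [[3.0, 0.0], [0.0, 4.0]]  # hypothetical gradients, global norm 5
total = clip_global_norm_sketch(grads, max_norm=1.0)
clipped_norm = math.sqrt(sum(x * x for arr in grads for x in arr))
print(total, clipped_norm)
```

All arrays are multiplied by the same factor, so the direction of the combined gradient is preserved and only its length is reduced.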

mxnet.gluon.utils.check_sha1(filename, sha1_hash)
Check whether the sha1 hash of the file content matches the expected hash.
Parameters:
- filename (str) – Path to the file.
- sha1_hash (str) – Expected sha1 hash in hexadecimal digits.
Returns: Whether the file content matches the expected hash.
Return type: bool
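A check of this kind can be written with the standard library alone. The sketch below uses hashlib and reads the file in chunks; `sha1_matches`, its chunk_size parameter, and the lower-casing of the expected hash are choices made for this illustration, not a description of how check_sha1 is implemented.

```python
import hashlib
import os
import tempfile


def sha1_matches(filename, sha1_hash, chunk_size=1048576):
    """Return True if the SHA-1 of the file content equals sha1_hash."""
    sha1 = hashlib.sha1()
    with open(filename, 'rb') as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            sha1.update(chunk)
    return sha1.hexdigest() == sha1_hash.lower()


# Usage on a temporary file with known content.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'hello gluon')
    tmp_path = tmp.name
expected = hashlib.sha1(b'hello gluon').hexdigest()
ok = sha1_matches(tmp_path, expected)
bad = sha1_matches(tmp_path, '0' * 40)
os.remove(tmp_path)
print(ok, bad)
```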

mxnet.gluon.utils.download(url, path=None, overwrite=False, sha1_hash=None, retries=5, verify_ssl=True)
Download a given URL.
Parameters:
- url (str) – URL to download.
- path (str, optional) – Destination path to store downloaded file. By default stores to the current directory with the same name as in url.
- overwrite (bool, optional) – Whether to overwrite the destination file if it already exists.
- sha1_hash (str, optional) – Expected sha1 hash in hexadecimal digits. Will ignore existing file when hash is specified but doesn’t match.
- retries (integer, default 5) – The number of times to attempt the download in case of failure or non-200 return codes.
- verify_ssl (bool, default True) – Verify SSL certificates.
Returns: The file path of the downloaded file.
Return type: str