Module API

Overview

The module API, defined in the module (or simply mod) package, provides intermediate-level and high-level interfaces for performing computation with a Symbol. Roughly speaking, one can think of a module as a machine that executes a program defined by a Symbol.

The module.Module class accepts a Symbol as input.

>>> import mxnet as mx
>>> data = mx.sym.Variable('data')
>>> fc1  = mx.sym.FullyConnected(data, name='fc1', num_hidden=128)
>>> act1 = mx.sym.Activation(fc1, name='relu1', act_type="relu")
>>> fc2  = mx.sym.FullyConnected(act1, name='fc2', num_hidden=10)
>>> out  = mx.sym.SoftmaxOutput(fc2, name='softmax')
>>> mod = mx.mod.Module(out)  # create a module from the given Symbol

Assume there is a valid MXNet data iterator nd_iter. We can initialize the module:

>>> mod.bind(data_shapes=nd_iter.provide_data,
...          label_shapes=nd_iter.provide_label)  # allocate memory for the given input shapes
>>> mod.init_params()  # initialize parameters with the default random initializer

Now the module is able to compute. We can call the high-level API to train and predict:

>>> mod.fit(nd_iter, num_epoch=10, ...)  # train
>>> mod.predict(new_nd_iter)  # predict on new data

or use the intermediate-level API to perform step-by-step computations:

>>> mod.init_optimizer()  # install the default optimizer (SGD); update() requires it
>>> mod.forward(data_batch)  # forward on the provided data batch
>>> mod.backward()  # backward to calculate the gradients
>>> mod.update()  # update parameters using the installed optimizer

A detailed tutorial is available at Module - Neural network training and inference.

The module package provides several modules:

BaseModule The base class of a module.
Module Module is a basic module that wraps a Symbol.
SequentialModule A SequentialModule is a container module that can chain multiple modules together.
BucketingModule This module helps to deal efficiently with varying-length inputs.
PythonModule A convenient module class that implements many of the module APIs as empty functions.
PythonLossModule A convenient module class that implements many of the module APIs as empty functions.

We summarize the interface for each class in the following sections.

The BaseModule class

The BaseModule is the base class for all other module classes. It defines the interface each module class should provide.

Initialize memory

BaseModule.bind Binds the symbols to construct executors.

Get and set parameters

BaseModule.init_params Initializes the parameters and auxiliary states.
BaseModule.set_params Assigns parameter and aux state values.
BaseModule.get_params Gets parameters, which are potentially copies of the actual parameters used to do computation on the device.
BaseModule.save_params Saves model parameters to file.
BaseModule.load_params Loads model parameters from file.

Train and predict

BaseModule.fit Trains the module parameters.
BaseModule.score Runs prediction on eval_data and evaluates the performance according to the given eval_metric.
BaseModule.iter_predict Iterates over predictions.
BaseModule.predict Runs prediction and collects the outputs.

Forward and backward

BaseModule.forward Forward computation.
BaseModule.backward Backward computation.
BaseModule.forward_backward A convenient function that calls both forward and backward.

Update parameters

BaseModule.init_optimizer Installs and initializes optimizers, and initializes a KVStore for distributed training.
BaseModule.update Updates parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch.
BaseModule.update_metric Evaluates and accumulates evaluation metric on outputs of the last forward computation.

Input and output

BaseModule.data_names A list of names for data required by this module.
BaseModule.output_names A list of names for the outputs of this module.
BaseModule.data_shapes A list of (name, shape) pairs specifying the data inputs to this module.
BaseModule.label_shapes A list of (name, shape) pairs specifying the label inputs to this module.
BaseModule.output_shapes A list of (name, shape) pairs specifying the outputs of this module.
BaseModule.get_outputs Gets outputs of the previous forward computation.
BaseModule.get_input_grads Gets the gradients to the inputs, computed in the previous backward computation.

Others

BaseModule.get_states Gets states from all devices.
BaseModule.set_states Sets value for states.
BaseModule.install_monitor Installs monitor on all executors.
BaseModule.symbol Gets the symbol associated with this module.

Other built-in modules

Besides the basic interface defined in BaseModule, each module class supports additional functionality. We summarize them in this section.

Class Module

Module.load Creates a model from previously saved checkpoint.
Module.save_checkpoint Saves current progress to checkpoint.
Module.reshape Reshapes the module for new input shapes.
Module.borrow_optimizer Borrows optimizer from a shared module.
Module.save_optimizer_states Saves optimizer (updater) state to a file.
Module.load_optimizer_states Loads optimizer (updater) state from a file.

Class BucketingModule

BucketingModule.switch_bucket Switches to a different bucket.

Class SequentialModule

SequentialModule.add Adds a module to the chain.

API Reference

class mxnet.module.BaseModule(logger=)

The base class of a module.

A module represents a computation component. One can think of a module as a computation machine. A module can execute forward and backward passes and update parameters in a model. We aim to make the APIs easy to use, especially in the case when we need to use the imperative API to work with multiple modules (e.g. a stochastic depth network).

A module has several states:

  • Initial state: Memory is not allocated yet, so the module is not ready for computation yet.
  • Bound: Shapes for inputs, outputs, and parameters are all known, memory has been allocated, and the module is ready for computation.
  • Parameters are initialized: For modules with parameters, doing computation before initializing the parameters might result in undefined outputs.
  • Optimizer is installed: An optimizer can be installed to a module. After this, the parameters of the module can be updated according to the optimizer after gradients are computed (forward-backward).
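The ordering of these states can be illustrated with a small pure-Python sketch. ToyModule below is hypothetical and not part of MXNet; it only tracks flags named after the binded, params_initialized, and optimizer_initialized properties described below, and it refuses to run a step before the states that step depends on have been reached.

```python
class ToyModule:
    """Hypothetical stand-in that tracks module states; not an MXNet class."""

    def __init__(self):
        # Initial state: nothing has been allocated yet.
        self.binded = False
        self.params_initialized = False
        self.optimizer_initialized = False

    def bind(self):
        # Bound: shapes are known and memory is (notionally) allocated.
        self.binded = True

    def init_params(self):
        if not self.binded:
            raise RuntimeError("call bind() before init_params()")
        self.params_initialized = True

    def init_optimizer(self):
        if not (self.binded and self.params_initialized):
            raise RuntimeError("bind and initialize parameters first")
        self.optimizer_initialized = True

    def update(self):
        # Updating parameters requires every earlier state to be reached.
        if not (self.binded and self.params_initialized
                and self.optimizer_initialized):
            raise RuntimeError("module is not ready for update()")
        return "updated"


mod = ToyModule()
try:
    mod.update()  # fails: still in the initial state
except RuntimeError as err:
    early_error = str(err)

mod.bind()
mod.init_params()
mod.init_optimizer()
result = mod.update()  # succeeds once all states are reached
```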

In order for a module to interact with others, it must be able to report the following information in its initial state (before binding):

  • data_names: list of strings indicating the names of the required input data.
  • output_names: list of strings indicating the names of the required outputs.

After binding, a module should be able to report the following richer information:

  • state information
    • binded: bool, indicates whether the memory buffers needed for computation have been allocated.
    • for_training: whether the module is bound for training.
    • params_initialized: bool, indicates whether the parameters of this module have been initialized.
    • optimizer_initialized: bool, indicates whether an optimizer is defined and initialized.
    • inputs_need_grad: bool, indicates whether gradients with respect to the input data are needed. Might be useful when implementing composition of modules.
  • input/output information
    • data_shapes: a list of (name, shape). In theory, since the memory is allocated, we could directly provide the data arrays. But in the case of data parallelism, the data arrays might not be of the same shape as viewed from the external world.
    • label_shapes: a list of (name, shape). This might be [] if the module does not need labels (e.g. it does not contain a loss function at the top), or if the module is not bound for training.
    • output_shapes: a list of (name, shape) for outputs of the module.
  • parameters (for modules with parameters)
    • get_params(): returns a tuple (arg_params, aux_params). Each of those is a dictionary mapping names to NDArrays. These NDArrays always live on the CPU. The actual parameters used for computing might live on other devices (GPUs); this function retrieves (a copy of) the latest parameters.
    • set_params(arg_params, aux_params): assign parameters to the devices doing the computation.
    • init_params(...): a more flexible interface to assign or initialize the parameters.
  • setup
    • bind(): prepare environment for computation.
    • init_optimizer(): install optimizer for parameter updating.
    • prepare(): prepare the module based on the current data batch.
  • computation
    • forward(data_batch): forward operation.
    • backward(out_grads=None): backward operation.
    • update(): update parameters according to installed optimizer.
    • get_outputs(): get outputs of the previous forward operation.
    • get_input_grads(): get the gradients with respect to the inputs computed in the previous backward operation.
    • update_metric(metric, labels, pre_sliced=False): update performance metric for the previous forward computed results.
  • other properties (mostly for backward compatibility)
    • symbol: the underlying symbolic graph for this module (if any). This property is not necessarily constant. For example, for BucketingModule, this property is simply the current symbol being used. For other modules, this value might not be well defined.
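To see why, under data parallelism, the per-device arrays need not match the data_shapes reported to the outside world, here is a pure-Python sketch of splitting one batch across devices in proportion to a workload list (compare the work_load_list parameter of Module). split_batch is a hypothetical helper that only illustrates the idea; it is not MXNet's internal slicing code.

```python
def split_batch(batch_size, work_load_list):
    """Return one (start, stop) slice of the batch per device.

    Hypothetical helper: each device receives a share proportional to its
    workload (rounded down), and the last device absorbs the remainder so
    the slices always cover the whole batch.
    """
    total = float(sum(work_load_list))
    sizes = [int(batch_size * w / total) for w in work_load_list]
    sizes[-1] = batch_size - sum(sizes[:-1])
    slices, start = [], 0
    for size in sizes:
        slices.append((start, start + size))
        start += size
    return slices


# Two devices with equal workload: each sees 5 samples, not 10.
print(split_batch(10, [1, 1]))  # [(0, 5), (5, 10)]
# The second device does twice the work, so it gets the larger share.
print(split_batch(10, [1, 2]))  # [(0, 3), (3, 10)]
```

Externally the module still reports a batch of 10 in data_shapes, even though no single device holds an array of that size.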

When these intermediate-level APIs are implemented properly, the following high-level APIs are automatically available for a module:

  • fit: train the module parameters on a data set.
  • predict: run prediction on a data set and collect outputs.
  • score: run prediction on a data set and evaluate performance.
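The sketch below shows, under simplifying assumptions, how a fit-style loop can be written purely in terms of the intermediate-level calls (forward_backward, update, update_metric), in the same order that BaseModule.fit appears to use them. LinearToyModule, MeanAbsError, and toy_fit are hypothetical pure-Python stand-ins; the real fit also handles binding, initialization, callbacks, and validation.

```python
class MeanAbsError:
    """Hypothetical accumulating metric, loosely modeled on an EvalMetric."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.sum_metric = 0.0
        self.num_inst = 0

    def update(self, labels, preds):
        for label, pred in zip(labels, preds):
            self.sum_metric += abs(label - pred)
            self.num_inst += 1

    def get(self):
        return self.sum_metric / self.num_inst


class LinearToyModule:
    """Hypothetical module computing y = w * x, trained with squared loss."""

    def __init__(self, learning_rate=0.05):
        self.w = 0.0
        self.grad = 0.0
        self.lr = learning_rate
        self._batch = None
        self._outputs = []

    def forward(self, data_batch):
        xs, _ = data_batch
        self._batch = data_batch
        self._outputs = [self.w * x for x in xs]

    def backward(self):
        xs, ys = self._batch
        # d/dw of mean((w*x - y)^2) is mean(2 * (w*x - y) * x)
        self.grad = sum(2 * (out - y) * x
                        for out, x, y in zip(self._outputs, xs, ys)) / len(xs)

    def forward_backward(self, data_batch):
        self.forward(data_batch)
        self.backward()

    def update(self):
        self.w -= self.lr * self.grad  # plain SGD step

    def update_metric(self, eval_metric, labels):
        # Uses the outputs of the last forward pass (computed before update).
        eval_metric.update(labels, self._outputs)


def toy_fit(module, batches, eval_metric, num_epoch):
    """Hypothetical fit loop built only from intermediate-level calls."""
    for _ in range(num_epoch):
        eval_metric.reset()
        for data_batch in batches:
            module.forward_backward(data_batch)
            module.update()
            module.update_metric(eval_metric, data_batch[1])
    return eval_metric.get()


# Each batch is (inputs, labels) drawn from y = 3 * x.
batches = [([1.0, 2.0], [3.0, 6.0]), ([3.0, 4.0], [9.0, 12.0])]
mod = LinearToyModule()
final_mae = toy_fit(mod, batches, MeanAbsError(), num_epoch=50)
```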

Examples

>>> # An example of creating an MXNet module.
>>> import mxnet as mx
>>> data = mx.symbol.Variable('data')
>>> fc1  = mx.symbol.FullyConnected(data, name='fc1', num_hidden=128)
>>> act1 = mx.symbol.Activation(fc1, name='relu1', act_type="relu")
>>> fc2  = mx.symbol.FullyConnected(act1, name='fc2', num_hidden=64)
>>> act2 = mx.symbol.Activation(fc2, name='relu2', act_type="relu")
>>> fc3  = mx.symbol.FullyConnected(act2, name='fc3', num_hidden=10)
>>> out  = mx.symbol.SoftmaxOutput(fc3, name='softmax')
>>> mod = mx.mod.Module(out)
forward_backward(data_batch)

A convenient function that calls both forward and backward.

score(eval_data, eval_metric, num_batch=None, batch_end_callback=None, score_end_callback=None, reset=True, epoch=0, sparse_row_id_fn=None)

Runs prediction on eval_data and evaluates the performance according to the given eval_metric.

Check out the Module tutorial to see an end-to-end use case.

Parameters:
  • eval_data (DataIter) – Evaluation data to run prediction on.
  • eval_metric (EvalMetric or list of EvalMetrics) – Evaluation metric to use.
  • num_batch (int) – Number of batches to run. Defaults to None, indicating that evaluation runs until the DataIter is exhausted.
  • batch_end_callback (function) – Could also be a list of functions.
  • reset (bool) – Defaults to True. Indicates whether we should reset eval_data before starting evaluation.
  • epoch (int) – Defaults to 0. For compatibility, this will be passed to callbacks (if any). During training, this will correspond to the training epoch number.
  • sparse_row_id_fn (A callback function) – The function takes data_batch as an input and returns a dict of str -> NDArray. The resulting dict is used for pulling row_sparse parameters from the kvstore, where the str key is the name of the param, and the value is an NDArray of the row ids of the param to pull.
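As an illustration of the callback contract described for sparse_row_id_fn, the sketch below maps a hypothetical row_sparse parameter name, embed_weight, to the distinct row ids that a batch of token ids touches. The Batch namedtuple mirrors the one used in the forward example; real code would return NDArrays as values, whereas sorted plain-Python lists are used here so the sketch runs without MXNet.

```python
from collections import namedtuple

# Hypothetical batch type; a real DataBatch holds NDArrays in its data field.
Batch = namedtuple('Batch', ['data'])


def sparse_row_id_fn(data_batch):
    """Map a row_sparse parameter name to the row ids this batch needs.

    'embed_weight' is a hypothetical parameter name. MXNet expects NDArray
    values; sorted Python lists stand in for them here.
    """
    token_ids = data_batch.data[0]
    row_ids = sorted({tok for sample in token_ids for tok in sample})
    return {'embed_weight': row_ids}


batch = Batch(data=[[[4, 1, 4], [7, 1, 0]]])
print(sparse_row_id_fn(batch))  # {'embed_weight': [0, 1, 4, 7]}
```

Pulling only these rows, rather than the whole embedding matrix, is what makes row_sparse parameters cheap to synchronize.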

Examples

>>> # An example of using score for prediction.
>>> # Evaluate accuracy on val_dataiter
>>> metric = mx.metric.Accuracy()
>>> mod.score(val_dataiter, metric)
>>> mod.score(val_dataiter, ['mse', 'acc'])
iter_predict(eval_data, num_batch=None, reset=True, sparse_row_id_fn=None)

Iterates over predictions.

Examples

>>> for pred, i_batch, batch in mod.iter_predict(eval_data):
...     # pred is a list of outputs from the module
...     # i_batch is an integer
...     # batch is the data batch from the data iterator
...     print(i_batch, [out.shape for out in pred])
Parameters:
  • eval_data (DataIter) – Evaluation data to run prediction on.
  • num_batch (int) – Default is None, indicating running all the batches in the data iterator.
  • reset (bool) – Default is True, indicating whether we should reset the data iter before starting prediction.
  • sparse_row_id_fn (A callback function) – The function takes data_batch as an input and returns a dict of str -> NDArray. The resulting dict is used for pulling row_sparse parameters from the kvstore, where the str key is the name of the param, and the value is an NDArray of the row ids of the param to pull.
predict(eval_data, num_batch=None, merge_batches=True, reset=True, always_output_list=False, sparse_row_id_fn=None)

Runs prediction and collects the outputs.

When merge_batches is True (the default), the return value will be a list [out1, out2, out3], where each element is formed by concatenating the outputs for all the mini-batches. When always_output_list is False (the default) and the module has a single output, out1 is returned instead of [out1].

When merge_batches is False, the return value will be a nested list like [[out1_batch1, out2_batch1], [out1_batch2], ...]. This mode is useful because in some cases (e.g. bucketing), the module does not necessarily produce the same number of outputs.

The objects in the results have type NDArray. If you need to work with a numpy array, just call .asnumpy() on each NDArray.
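The return-value rules for merge_batches and always_output_list can be sketched in pure Python. merge_predictions is hypothetical and is not MXNet's implementation; plain lists of per-sample rows stand in for NDArrays, and concatenation along the first (batch) axis becomes list extension.

```python
def merge_predictions(batch_outputs, merge_batches=True, always_output_list=False):
    """Sketch of how predict() combines per-batch outputs (not MXNet's code).

    batch_outputs holds one entry per mini-batch; each entry is a list of
    outputs, and each output is a plain list of rows standing in for an
    NDArray whose first axis is the batch axis.
    """
    if not merge_batches:
        # Nested form: [[out1_batch1, out2_batch1], [out1_batch2], ...]
        return batch_outputs
    num_outputs = len(batch_outputs[0])
    if any(len(outputs) != num_outputs for outputs in batch_outputs):
        # Merging only makes sense when every batch has the same outputs.
        raise ValueError("cannot merge batches with different numbers of outputs")
    merged = []
    for i in range(num_outputs):
        rows = []
        for outputs in batch_outputs:
            rows.extend(outputs[i])  # concatenate along the batch axis
        merged.append(rows)
    if num_outputs == 1 and not always_output_list:
        return merged[0]  # single output: out1 instead of [out1]
    return merged


# Two mini-batches, each with a single output holding per-sample rows.
per_batch = [[[10, 11]], [[12]]]
print(merge_predictions(per_batch))                           # [10, 11, 12]
print(merge_predictions(per_batch, always_output_list=True))  # [[10, 11, 12]]
print(merge_predictions(per_batch, merge_batches=False))      # [[[10, 11]], [[12]]]
```

The mismatch check reflects the note above: with bucketing, batches may yield different numbers of outputs, which is when merge_batches=False is useful.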

Parameters:
  • eval_data (DataIter or NDArray or numpy array) – Evaluation data to run prediction on.
  • num_batch (int) – Defaults to None, indicates running all the batches in the data iterator.
  • merge_batches (bool) – Defaults to True, see above for return values.
  • reset (bool) – Defaults to True, indicates whether we should reset the data iter before doing prediction.
  • always_output_list (bool) – Defaults to False, see above for return values.
  • sparse_row_id_fn (A callback function) – The function takes data_batch as an input and returns a dict of str -> NDArray. The resulting dict is used for pulling row_sparse parameters from the kvstore, where the str key is the name of the param, and the value is an NDArray of the row ids of the param to pull.
Returns:

Prediction results.

Return type:

list of NDArray or list of list of NDArray

Examples

>>> # An example of using `predict` for prediction.
>>> # Predict on the first 10 batches of val_dataiter
>>> mod.predict(eval_data=val_dataiter, num_batch=10)
fit(train_data, eval_data=None, eval_metric='acc', epoch_end_callback=None, batch_end_callback=None, kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), eval_end_callback=None, eval_batch_end_callback=None, initializer=, arg_params=None, aux_params=None, allow_missing=False, force_rebind=False, force_init=False, begin_epoch=0, num_epoch=None, validation_metric=None, monitor=None, sparse_row_id_fn=None)

Trains the module parameters.

Check out the Module tutorial to see an end-to-end use case.

Parameters:
  • train_data (DataIter) – Train DataIter.
  • eval_data (DataIter) – If not None, will be used as validation set and the performance after each epoch will be evaluated.
  • eval_metric (str or EvalMetric) – Defaults to ‘acc’ (accuracy). The performance measure displayed during training. Other possible predefined metrics are: ‘ce’ (CrossEntropy), ‘f1’, ‘mae’, ‘mse’, ‘rmse’, ‘top_k_accuracy’.
  • epoch_end_callback (function or list of functions) – Each callback will be called with the current epoch, symbol, arg_params and aux_params.
  • batch_end_callback (function or list of functions) – Each callback will be called with a BatchEndParam.
  • kvstore (str or KVStore) – Defaults to ‘local’.
  • optimizer (str or Optimizer) – Defaults to ‘sgd’.
  • optimizer_params (dict) – Defaults to (('learning_rate', 0.01),). The parameters for the optimizer constructor. The default value is a tuple of pairs rather than a dict only to avoid a pylint warning about dangerous (mutable) default values.
  • eval_end_callback (function or list of functions) – These will be called at the end of each full evaluation, with the metrics over the entire evaluation set.
  • eval_batch_end_callback (function or list of functions) – These will be called at the end of each mini-batch during evaluation.
  • initializer (Initializer) – The initializer is called to initialize the module parameters when they are not already initialized.
  • arg_params (dict) – Defaults to None. If not None, this should contain existing parameters from a trained model or loaded from a checkpoint (previously saved model). In this case, the value here will be used to initialize the module parameters, unless they are already initialized by the user via a call to init_params or fit. arg_params has a higher priority than initializer.
  • aux_params (dict) – Defaults to None. Similar to arg_params, except for auxiliary states.
  • allow_missing (bool) – Defaults to False. Indicates whether to allow missing parameters when arg_params and aux_params are not None. If this is True, then the missing parameters will be initialized via the initializer.
  • force_rebind (bool) – Defaults to False. Whether to force rebinding the executors if already bound.
  • force_init (bool) – Defaults to False. Indicates whether to force initialization even if the parameters are already initialized.
  • begin_epoch (int) – Defaults to 0. Indicates the starting epoch. Usually, if resumed from a checkpoint saved at a previous training phase at epoch N, then this value should be N+1.
  • num_epoch (int) – Number of epochs for training.
  • sparse_row_id_fn (A callback function) – The function takes data_batch as an input and returns a dict of str -> NDArray. The resulting dict is used for pulling row_sparse parameters from the kvstore, where the str key is the name of the param, and the value is an NDArray of the row ids of the param to pull.
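The reason optimizer_params defaults to a tuple of pairs can be shown in plain Python: a tuple is immutable, so no call can modify the shared default and leak changes into later calls, and dict() turns it into keyword arguments on demand. make_optimizer_kwargs is a hypothetical helper that assumes the pairs are converted with dict() before reaching the optimizer constructor.

```python
def make_optimizer_kwargs(optimizer_params=(('learning_rate', 0.01),)):
    """Hypothetical helper: normalize optimizer_params to a fresh dict.

    The tuple-of-pairs default is immutable, unlike a default such as
    {'learning_rate': 0.01}, which would be one dict shared by every call.
    Both a tuple of pairs and a dict are accepted.
    """
    return dict(optimizer_params)


defaults = make_optimizer_kwargs()
defaults['momentum'] = 0.9        # mutating the returned dict...
fresh = make_optimizer_kwargs()   # ...does not affect later calls
custom = make_optimizer_kwargs({'learning_rate': 0.1, 'momentum': 0.9})
print(fresh)  # {'learning_rate': 0.01}
```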

Examples

>>> # An example of using fit for training.
>>> # Assume training dataIter and validation dataIter are ready
>>> # Assume loading a previously checkpointed model
>>> sym, arg_params, aux_params = mx.model.load_checkpoint(model_prefix, 3)
>>> mod.fit(train_data=train_dataiter, eval_data=val_dataiter, optimizer='sgd',
...     optimizer_params={'learning_rate':0.01, 'momentum': 0.9},
...     arg_params=arg_params, aux_params=aux_params,
...     eval_metric='acc', num_epoch=10, begin_epoch=3)
data_names

A list of names for data required by this module.

output_names

A list of names for the outputs of this module.

data_shapes

A list of (name, shape) pairs specifying the data inputs to this module.

label_shapes

A list of (name, shape) pairs specifying the label inputs to this module. If this module does not accept labels, either because it has no loss function or because it is not bound for training, this should return an empty list [].

output_shapes

A list of (name, shape) pairs specifying the outputs of this module.

get_params()

Gets parameters, which are potentially copies of the actual parameters used to do computation on the device.

Returns: A pair of dictionaries each mapping parameter names to NDArray values.
Return type: (arg_params, aux_params)

Examples

>>> # An example of getting module parameters.
>>> print(mod.get_params())
({'fc2_weight': , 'fc1_weight': ,
'fc3_bias': , 'fc3_weight': ,
'fc2_bias': , 'fc1_bias': }, {})
init_params(initializer=, arg_params=None, aux_params=None, allow_missing=False, force_init=False, allow_extra=False)

Initializes the parameters and auxiliary states.

Parameters:
  • initializer (Initializer) – Called to initialize parameters if needed.
  • arg_params (dict) – If not None, should be a dictionary of existing arg_params. Initialization will be copied from that.
  • aux_params (dict) – If not None, should be a dictionary of existing aux_params. Initialization will be copied from that.
  • allow_missing (bool) – If True, params could contain missing values, and the initializer will be called to fill those missing params.
  • force_init (bool) – If True, forces re-initialization even if the parameters are already initialized.
  • allow_extra (boolean, optional) – Whether to allow extra parameters that are not needed by the symbol. If this is True, no error will be thrown when arg_params or aux_params contain extra parameters that are not needed by the executor.

Examples

>>> # An example of initializing module parameters.
>>> mod.init_params()
set_params(arg_params, aux_params, allow_missing=False, force_init=True, allow_extra=False)

Assigns parameter and aux state values.

Parameters:
  • arg_params (dict) – Dictionary of name to value (NDArray) mapping.
  • aux_params (dict) – Dictionary of name to value (NDArray) mapping.
  • allow_missing (bool) – If True, params could contain missing values, and the initializer will be called to fill those missing params.
  • force_init (bool) – If True, forces re-initialization even if the parameters are already initialized.
  • allow_extra (boolean, optional) – Whether to allow extra parameters that are not needed by the symbol. If this is True, no error will be thrown when arg_params or aux_params contain extra parameters that are not needed by the executor.

Examples

>>> # An example of setting module parameters.
>>> sym, arg_params, aux_params = mx.model.load_checkpoint(model_prefix, n_epoch_load)
>>> mod.set_params(arg_params=arg_params, aux_params=aux_params)
save_params(fname)

Saves model parameters to file.

Parameters: fname (str) – Path to output param file.

Examples

>>> # An example of saving module parameters.
>>> mod.save_params('myfile')
load_params(fname)

Loads model parameters from file.

Parameters: fname (str) – Path to input param file.

Examples

>>> # An example of loading module parameters.
>>> mod.load_params('myfile')
get_states(merge_multi_context=True)

Gets states from all devices.

If merge_multi_context is True, returns output of form [out1, out2]. Otherwise, it returns output of the form [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All output elements are NDArray.

Parameters: merge_multi_context (bool) – Defaults to True. In the case when data-parallelism is used, the states will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look like from a single executor.
Return type: A list of NDArray or a list of list of NDArray.
set_states(states=None, value=None)

Sets value for states. Only one of states and value can be specified.

Parameters:
  • states (list of list of NDArray) – Source states arrays formatted like [[state1_dev1, state1_dev2], [state2_dev1, state2_dev2]].
  • value (number) – A single scalar value for all state arrays.
install_monitor(mon)

Installs monitor on all executors.

prepare(data_batch, sparse_row_id_fn=None)

Prepares the module for processing a data batch.

This usually involves switching buckets and reshaping. For modules that contain row_sparse parameters in KVStore, it prepares the row_sparse parameters based on the sparse_row_id_fn.

When KVStore is used to update parameters for multi-device or multi-machine training, a copy of the parameters is stored in KVStore. Note that for row_sparse parameters, update() updates the copy of parameters in KVStore, but doesn’t broadcast the updated parameters to all devices / machines. The prepare function is used to broadcast row_sparse parameters with the next batch of data.

Parameters:
  • data_batch (DataBatch) – The current batch of data for forward computation.
  • sparse_row_id_fn (A callback function) – The function takes data_batch as an input and returns a dict of str -> NDArray. The resulting dict is used for pulling row_sparse parameters from the kvstore, where the str key is the name of the param, and the value is an NDArray of the row ids of the param to pull.
forward(data_batch, is_train=None)

Forward computation. It supports data batches with different shapes, such as different batch sizes or different image sizes. If the reshape requires modifying the symbol or the module, for example changing the image layout ordering or switching from training to prediction, the module must be rebound.

Parameters:
  • data_batch (DataBatch) – Can be any object that implements a similar API.
  • is_train (bool) – Default is None, which means is_train takes the value of self.for_training.

Examples

>>> import mxnet as mx
>>> from collections import namedtuple
>>> Batch = namedtuple('Batch', ['data'])
>>> data = mx.sym.Variable('data')
>>> out = data * 2
>>> mod = mx.mod.Module(symbol=out, label_names=None)
>>> mod.bind(data_shapes=[('data', (1, 10))])
>>> mod.init_params()
>>> data1 = [mx.nd.ones((1, 10))]
>>> mod.forward(Batch(data1))
>>> print(mod.get_outputs()[0].asnumpy())
[[ 2.  2.  2.  2.  2.  2.  2.  2.  2.  2.]]
>>> # Forward with data batch of different shape
>>> data2 = [mx.nd.ones((3, 5))]
>>> mod.forward(Batch(data2))
>>> print(mod.get_outputs()[0].asnumpy())
[[ 2.  2.  2.  2.  2.]
 [ 2.  2.  2.  2.  2.]
 [ 2.  2.  2.  2.  2.]]
backward(out_grads=None)

Backward computation.

Parameters:out_grads (NDArray or list of NDArray, optional) – Gradient on the outputs to be propagated back. This parameter is only needed when bind is called on outputs that are not a loss function.

Examples

>>> # An example of backward computation.
>>> mod.backward()
>>> print(mod.get_input_grads()[0].asnumpy())
[[[  1.10182791e-05   5.12257748e-06   4.01927764e-06   8.32566820e-06
    -1.59775993e-06   7.24269375e-06   7.28067835e-06  -1.65902311e-05
     5.46342608e-06   8.44196393e-07]
     ...]]
get_outputs(merge_multi_context=True)

Gets outputs of the previous forward computation.

If merge_multi_context is True, it returns output of the form [out1, out2]. Otherwise, it returns output of the form [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements have type NDArray. When merge_multi_context is False, those NDArray instances might live on different devices.

Parameters: merge_multi_context (bool) – Defaults to True. In the case when data-parallelism is used, the outputs will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look like from a single executor.
Returns: Output
Return type: list of NDArray or list of list of NDArray.
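What merging across devices means can be sketched in pure Python, assuming (as the "look like from a single executor" wording suggests) that each output's per-device slices are concatenated along the batch axis. merge_multi_context here is a hypothetical function, not the MXNet helper; lists of rows stand in for NDArrays.

```python
def merge_multi_context(per_output_per_device):
    """Concatenate each output's per-device slices along the batch axis.

    Input:  [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]
    Output: [out1, out2]
    Each slice is a plain list of rows standing in for an NDArray.
    """
    merged = []
    for device_slices in per_output_per_device:
        rows = []
        for piece in device_slices:
            rows.extend(piece)  # append this device's rows in order
        merged.append(rows)
    return merged


# out1: dev1 holds 1 sample, dev2 holds 2; out2 is split the same way.
unmerged = [[[[0.1, 0.9]], [[0.8, 0.2], [0.5, 0.5]]],
            [[1], [0, 1]]]
print(merge_multi_context(unmerged))
# [[[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]], [1, 0, 1]]
```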

Examples

>>> # An example of getting forward output.
>>> print(mod.get_outputs()[0].asnumpy())
[[ 0.09999977  0.10000153  0.10000716  0.10000195  0.09999853  0.09999743
   0.10000272  0.10000113  0.09999088  0.09999888]]
get_input_grads(merge_multi_context=True)

Gets the gradients to the inputs, computed in the previous backward computation.

If merge_multi_context is True, it is like [grad1, grad2]. Otherwise, it is like [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]. All the output elements have type NDArray. When merge_multi_context is False, those NDArray instances might live on different devices.

Parameters: merge_multi_context (bool) – Defaults to True. In the case when data-parallelism is used, the gradients will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look like from a single executor.
Returns: Input gradients.
Return type: list of NDArray or list of list of NDArray

Examples

>>> # An example of getting input gradients.
>>> print(mod.get_input_grads()[0].asnumpy())
[[[  1.10182791e-05   5.12257748e-06   4.01927764e-06   8.32566820e-06
    -1.59775993e-06   7.24269375e-06   7.28067835e-06  -1.65902311e-05
    5.46342608e-06   8.44196393e-07]
    ...]]
update()

Updates parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch.

When KVStore is used to update parameters for multi-device or multi-machine training, a copy of the parameters is stored in KVStore. Note that for row_sparse parameters, this function does update the copy of parameters in KVStore, but doesn’t broadcast the updated parameters to all devices / machines. Please call prepare to broadcast row_sparse parameters with the next batch of data.

Examples

>>> # An example of updating module parameters.
>>> mod.init_optimizer(kvstore='local', optimizer='sgd',
...     optimizer_params=(('learning_rate', 0.01), ))
>>> mod.backward()
>>> mod.update()
>>> print(mod.get_params()[0]['fc3_weight'].asnumpy())
[[  5.86930104e-03   5.28078526e-03  -8.88729654e-03  -1.08308345e-03
    6.13054074e-03   4.27560415e-03   1.53817423e-03   4.62131854e-03
    4.69872449e-03  -2.42400169e-03   9.94111411e-04   1.12386420e-03
    ...]]
update_metric(eval_metric, labels, pre_sliced=False)

Evaluates and accumulates evaluation metric on outputs of the last forward computation.

Parameters:
  • eval_metric (EvalMetric) – Evaluation metric to use.
  • labels (list of NDArray if pre_sliced is False, list of lists of NDArray otherwise) – Typically data_batch.label.
  • pre_sliced (bool) – Whether the labels are already sliced per device (default: False).
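The difference between flat and pre-sliced labels, and the fact that the metric accumulates across calls, can be sketched in pure Python. ToyAccuracy and toy_update_metric are hypothetical; the accumulator fields are only named after the sum_metric/num_inst convention of an EvalMetric, and lists of class ids stand in for NDArrays.

```python
class ToyAccuracy:
    """Hypothetical accumulating accuracy metric (sum_metric / num_inst)."""

    def __init__(self):
        self.sum_metric = 0
        self.num_inst = 0

    def update(self, labels, preds):
        for label, pred in zip(labels, preds):
            self.sum_metric += int(label == pred)
            self.num_inst += 1

    def get(self):
        return self.sum_metric / self.num_inst


def toy_update_metric(metric, labels, outputs_per_device, pre_sliced=False):
    """Hypothetical analogue of update_metric over per-device outputs.

    outputs_per_device: predicted classes, one list per device.
    labels: one flat list (pre_sliced=False) or one list per device.
    """
    if not pre_sliced:
        # Slice the flat labels to match each device's share of the batch.
        sliced, start = [], 0
        for outs in outputs_per_device:
            sliced.append(labels[start:start + len(outs)])
            start += len(outs)
        labels = sliced
    for dev_labels, dev_outs in zip(labels, outputs_per_device):
        metric.update(dev_labels, dev_outs)


metric = ToyAccuracy()
toy_update_metric(metric, [1, 1, 2], [[1, 0], [2]])                    # flat labels
toy_update_metric(metric, [[0], [0, 2]], [[0], [1, 2]], pre_sliced=True)
print(metric.sum_metric, metric.num_inst)  # 4 6
```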

Examples

>>> # An example of updating evaluation metric.
>>> mod.forward(data_batch)
>>> mod.update_metric(metric, data_batch.label)
bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write')

Binds the symbols to construct executors. This is necessary before one can perform computation with the module.

Parameters:
  • data_shapes (list of (str, tuple) or DataDesc objects) – Typically data_iter.provide_data. Can also be a list of (data name, data shape).
  • label_shapes (list of (str, tuple) or DataDesc objects) – Typically data_iter.provide_label. Can also be a list of (label name, label shape).
  • for_training (bool) – Default is True. Whether the executors should be bound for training.
  • inputs_need_grad (bool) – Default is False. Whether the gradients to the input data need to be computed. Typically this is not needed. But this might be needed when implementing composition of modules.
  • force_rebind (bool) – Default is False. This function does nothing if the executors are already bound. But with this True, the executors will be forced to rebind.
  • shared_module (Module) – Default is None. This is used in bucketing. When not None, the shared module essentially corresponds to a different bucket – a module with different symbol but with the same sets of parameters (e.g. unrolled RNNs with different lengths).
  • grad_req (str, list of str, dict of str to str) – Requirement for gradient accumulation. Can be ‘write’, ‘add’, or ‘null’ (defaults to ‘write’). Can be specified globally (str) or for each argument (list, dict).
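The three grad_req modes can be pictured as three ways of writing a freshly computed gradient into a gradient buffer. apply_grad below is a hypothetical pure-Python sketch of those semantics, not MXNet code: 'write' overwrites, 'add' accumulates (so the buffer keeps growing until it is cleared), and 'null' leaves the buffer alone because no gradient is computed.

```python
def apply_grad(grad_buffer, new_grad, grad_req='write'):
    """Sketch of the three grad_req modes on a list-valued gradient buffer."""
    if grad_req == 'write':
        return list(new_grad)                                  # overwrite
    if grad_req == 'add':
        return [g + n for g, n in zip(grad_buffer, new_grad)]  # accumulate
    if grad_req == 'null':
        return grad_buffer                                     # untouched
    raise ValueError("grad_req must be 'write', 'add', or 'null'")


buf = [0.0, 0.0]
for step_grad in ([1.0, 2.0], [0.5, 0.5]):
    buf = apply_grad(buf, step_grad, grad_req='add')
print(buf)  # [1.5, 2.5]: gradients from both backward passes are summed
```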

Examples

>>> # An example of binding symbols.
>>> mod.bind(data_shapes=[('data', (1, 10, 10))])
>>> # Assume train_iter is already created.
>>> mod.bind(data_shapes=train_iter.provide_data, label_shapes=train_iter.provide_label)
init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)

Installs and initializes optimizers, and initializes a KVStore for distributed training.
Parameters:
  • kvstore (str or KVStore) – Defaults to ‘local’.
  • optimizer (str or Optimizer) – Defaults to ‘sgd’.
  • optimizer_params (dict) – Defaults to (('learning_rate', 0.01),). The default value is a tuple of pairs rather than a dictionary only to avoid a pylint warning about dangerous default values.
  • force_init (bool) – Defaults to False, indicates whether to force re-initializing an optimizer if it is already installed.

Examples

>>> # An example of initializing optimizer.
>>> mod.init_optimizer(optimizer='sgd', optimizer_params=(('learning_rate', 0.005),))
symbol

Gets the symbol associated with this module.

Except for Module, for other types of modules (e.g. BucketingModule), this property might not be constant throughout its lifetime. Some modules might not even be associated with any symbols.

class mxnet.module.Module(symbol, data_names=('data', ), label_names=('softmax_label', ), logger=, context=cpu(0), work_load_list=None, fixed_param_names=None, state_names=None, group2ctxs=None, compression_params=None)

Module is a basic module that wraps a Symbol. It is functionally the same as the FeedForward model, except that it uses the module API.

Parameters:
  • symbol (Symbol) – The symbol to wrap.
  • data_names (list of str) – Defaults to ('data',) for a typical model used in image classification.
  • label_names (list of str) – Defaults to ('softmax_label',) for a typical model used in image classification.
  • logger (Logger) – Defaults to logging.
  • context (Context or list of Context) – Defaults to mx.cpu().
  • work_load_list (list of number) – Default None, indicating uniform workload.
  • fixed_param_names (list of str) – Default None, indicating no network parameters are fixed.
  • state_names (list of str) – States are similar to data and label, but are not provided by the data iterator. Instead, they are initialized to 0 and can be set by set_states().
  • group2ctxs (dict of str to context or list of context, or list of dict of str to context) – Default is None. Maps the ctx_group attribute to the context assignment.
  • compression_params (dict) – Specifies the type of gradient compression and additional arguments depending on the type of compression being used. For example, 2bit compression requires a threshold, so the arguments would be {‘type’:‘2bit’, ‘threshold’:0.5}. See the mxnet.KVStore.set_gradient_compression method for more details on gradient compression.
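work_load_list controls how each batch is divided among the devices in context. The pure-Python sketch below turns relative workloads into contiguous batch slices; it is a simplified illustration of the idea, and the name split_batch is hypothetical rather than an MXNet function.

```python
def split_batch(batch_size, work_load_list):
    """Split range(batch_size) into one contiguous slice per device.

    Each device receives a share proportional to its workload; any
    rounding difference is absorbed by the last device.
    """
    total = sum(work_load_list)
    counts = [int(round(w * batch_size / total)) for w in work_load_list]
    counts[-1] += batch_size - sum(counts)  # absorb rounding error
    slices, begin = [], 0
    for n in counts:
        if n <= 0:
            raise ValueError('too many devices for this batch size: a slice is empty')
        slices.append(slice(begin, begin + n))
        begin += n
    return slices


# Uniform workload (what work_load_list=None means) over 2 devices.
print(split_batch(10, [1, 1]))
# A device with three times the workload receives three times the samples.
print(split_batch(8, [1, 3]))
```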
static load(prefix, epoch, load_optimizer_states=False, **kwargs)[source]

Creates a model from a previously saved checkpoint.

Parameters:
  • prefix (str) – Path prefix of the saved model files. You should have “prefix-symbol.json”, “prefix-xxxx.params”, and optionally “prefix-xxxx.states”, where xxxx is the epoch number.
  • epoch (int) – Epoch to load.
  • load_optimizer_states (bool) – Whether to load optimizer states. The checkpoint must have been saved with save_optimizer_states=True.
  • data_names (list of str) – Default is ('data',) for a typical model used in image classification.
  • label_names (list of str) – Default is ('softmax_label',) for a typical model used in image classification.
  • logger (Logger) – Default is logging.
  • context (Context or list of Context) – Default is cpu().
  • work_load_list (list of number) – Default None, indicating uniform workload.
  • fixed_param_names (list of str) – Default None, indicating no network parameters are fixed.
save_checkpoint(prefix, epoch, save_optimizer_states=False)[source]

Saves the current progress to a checkpoint. To save during training, use mx.callback.module_checkpoint as the epoch_end_callback.

Parameters:
  • prefix (str) – The file prefix to checkpoint to.
  • epoch (int) – The current epoch number.
  • save_optimizer_states (bool) – Whether to save optimizer states to continue training.
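load and save_checkpoint share the file-naming convention described above: prefix-symbol.json for the symbol, plus prefix-xxxx.params and prefix-xxxx.states for a given epoch. The hypothetical helpers below are a pure-Python sketch for checking that a checkpoint exists before calling Module.load; the four-digit zero padding is an assumption based on the xxxx placeholder.

```python
import os


def checkpoint_files(prefix, epoch):
    """Return the file names a checkpoint at `epoch` is expected to use.

    Assumes the convention prefix-symbol.json, prefix-<epoch:04d>.params
    and prefix-<epoch:04d>.states.
    """
    return {
        'symbol': '%s-symbol.json' % prefix,
        'params': '%s-%04d.params' % (prefix, epoch),
        'states': '%s-%04d.states' % (prefix, epoch),  # only with save_optimizer_states=True
    }


def missing_checkpoint_files(prefix, epoch, need_states=False):
    """List the expected checkpoint files that are not present on disk."""
    files = checkpoint_files(prefix, epoch)
    required = ['symbol', 'params'] + (['states'] if need_states else [])
    return [files[k] for k in required if not os.path.exists(files[k])]


print(checkpoint_files('mymodel', 3))
```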
data_names

A list of names for data required by this module.

label_names

A list of names for labels required by this module.

output_names

A list of names for the outputs of this module.

data_shapes

Gets data shapes.

Returns:A list of (name, shape) pairs.
Return type:list
label_shapes

Gets label shapes.

Returns:The return value could be None if the module does not need labels, or if the module is not bound for training (in this case, label information is not available).
Return type:A list of (name, shape) pairs.
output_shapes

Gets output shapes.

Returns:A list of (name, shape) pairs.
Return type:list
get_params()[source]

Gets current parameters.

Returns:A pair of dictionaries each mapping parameter names to NDArray values.
Return type:(arg_params, aux_params)
init_params(initializer=, arg_params=None, aux_params=None, allow_missing=False, force_init=False, allow_extra=False)[source]

Initializes the parameters and auxiliary states.

Parameters:
  • initializer (Initializer) – Called to initialize parameters if needed.
  • arg_params (dict) – If not None, should be a dictionary of existing arg_params. Initialization will be copied from that.
  • aux_params (dict) – If not None, should be a dictionary of existing aux_params. Initialization will be copied from that.
  • allow_missing (bool) – If True, params could contain missing values, and the initializer will be called to fill those missing params.
  • force_init (bool) – If True, forces re-initialization even if already initialized.
  • allow_extra (boolean, optional) – Whether to allow extra parameters that are not needed by the symbol. If True, no error is raised when arg_params or aux_params contain extra parameters that are not needed by the executor.
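The interplay of arg_params, allow_missing, and allow_extra can be summarized with a small model. The function below is a hypothetical pure-Python sketch over plain dicts, not MXNet's implementation; it assumes that provided values are copied first, missing names are either filled by the initializer or rejected, and unknown names are rejected unless allow_extra is True.

```python
def init_params_sketch(param_names, initializer, arg_params=None,
                       allow_missing=False, allow_extra=False):
    """Return {name: value} for every name in param_names.

    initializer is any callable mapping a parameter name to a value.
    """
    provided = arg_params is not None
    arg_params = arg_params if provided else {}
    extra = set(arg_params) - set(param_names)
    if extra and not allow_extra:
        raise ValueError('extra parameters not needed by the symbol: %s' % sorted(extra))
    result = {}
    for name in param_names:
        if name in arg_params:
            result[name] = arg_params[name]   # copied from existing params
        elif provided and not allow_missing:
            raise RuntimeError('%s is missing from arg_params' % name)
        else:
            result[name] = initializer(name)  # filled in by the initializer
    return result


names = ['fc1_weight', 'fc1_bias']
print(init_params_sketch(names, lambda name: 0.0, {'fc1_weight': 0.5}, allow_missing=True))
```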
set_params(arg_params, aux_params, allow_missing=False, force_init=True, allow_extra=False)[source]

Assigns parameter and aux state values.

Parameters:
  • arg_params (dict) – Dictionary of name to NDArray.
  • aux_params (dict) – Dictionary of name to NDArray.
  • allow_missing (bool) – If True, params could contain missing values, and the initializer will be called to fill those missing params.
  • force_init (bool) – If True, forces re-initialization even if already initialized.
  • allow_extra (boolean, optional) – Whether to allow extra parameters that are not needed by the symbol. If True, no error is raised when arg_params or aux_params contain extra parameters that are not needed by the executor.

Examples

>>> # An example of setting module parameters.
>>> sym, arg_params, aux_params = mx.model.load_checkpoint(model_prefix, n_epoch_load)
>>> mod.set_params(arg_params=arg_params, aux_params=aux_params)
bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write')[source]

Binds the symbols to construct executors. This is necessary before one can perform computation with the module.

Parameters:
  • data_shapes (list of (str, tuple)) – Typically data_iter.provide_data.
  • label_shapes (list of (str, tuple)) – Typically data_iter.provide_label.
  • for_training (bool) – Default is True. Whether the executors should be bound for training.
  • inputs_need_grad (bool) – Default is False. Whether the gradients with respect to the input data need to be computed. This is typically not needed, but it might be when composing modules.
  • force_rebind (bool) – Default is False. This function does nothing if the executors are already bound; set this to True to force the executors to rebind.
  • shared_module (Module) – Default is None. This is used in bucketing. When not None, the shared module essentially corresponds to a different bucket: a module with a different symbol but the same set of parameters (e.g. unrolled RNNs with different lengths).
  • grad_req (str, list of str, dict of str to str) – Requirement for gradient accumulation. Can be ‘write’, ‘add’, or ‘null’ (defaults to ‘write’). Can be specified globally (str) or per argument (list, dict).
reshape(data_shapes, label_shapes=None)[source]

Reshapes the module for new input shapes.

Parameters:
  • data_shapes (list of (str, tuple)) – Typically data_iter.provide_data.
  • label_shapes (list of (str, tuple)) – Typically data_iter.provide_label.
init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)[source]

Installs and initializes optimizers.

Parameters:
  • kvstore (str or KVStore) – Default ‘local’.
  • optimizer (str or Optimizer) – Default ‘sgd’.
  • optimizer_params (dict) – Default ((‘learning_rate’, 0.01),). The default value is a tuple of pairs rather than a dictionary to avoid pylint's warning about dangerous (mutable) default values.
  • force_init (bool) – Default False. Whether to force re-initialization of the optimizer if one is already installed.
borrow_optimizer(shared_module)[source]

Borrows the optimizer from a shared module. Used in bucketing, where exactly the same optimizer (especially the kvstore) is shared across buckets.

Parameters:shared_module (Module) – The module to borrow the optimizer from.
forward(data_batch, is_train=None)[source]

Forward computation. It supports data batches with different shapes, such as different batch sizes or different image sizes. If a change in the data batch also requires modifying the symbol or the module, such as changing the image layout ordering or switching from training to prediction, the module must be rebound.

Parameters:
  • data_batch (DataBatch) – Could be anything with similar API implemented.
  • is_train (bool) – Default is None, which means is_train takes the value of self.for_training.
backward(out_grads=None)[source]

Backward computation.

Parameters:out_grads (NDArray or list of NDArray, optional) – Gradient on the outputs to be propagated back. This parameter is only needed when bind is called on outputs that are not a loss function.
update()[source]

Updates parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch.

When KVStore is used to update parameters for multi-device or multi-machine training, a copy of the parameters is stored in KVStore. Note that for row_sparse parameters, this function updates the copy of the parameters in KVStore but does not broadcast the updated parameters to all devices or machines. Call prepare to broadcast row_sparse parameters with the next batch of data.
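The intermediate API is normally driven by a loop of forward, backward, and update calls. Running a real Module needs MXNet and a data iterator, so the sketch below uses a hypothetical pure-Python ToyModule (one weight, squared-error loss, plain SGD, and a dict in place of a DataBatch) that only mirrors the call order and the is_train default; it is not MXNet code.

```python
class ToyModule:
    """Hypothetical stand-in for a bound Module: output = w * x, squared-error loss."""

    def __init__(self, learning_rate=0.1, for_training=True):
        self.w = 0.0
        self.lr = learning_rate
        self.for_training = for_training
        self._is_train = None
        self._batch = None
        self._outputs = []
        self._grad = 0.0

    def forward(self, data_batch, is_train=None):
        if is_train is None:                  # same default rule as Module.forward
            is_train = self.for_training
        self._is_train = is_train
        self._batch = data_batch
        self._outputs = [self.w * x for x in data_batch['data']]

    def backward(self):
        # Gradient of the mean squared error over the batch with respect to w.
        xs, ys = self._batch['data'], self._batch['label']
        pairs = zip(self._outputs, xs, ys)
        self._grad = sum(2 * (out - y) * x for out, x, y in pairs) / len(xs)

    def update(self):
        self.w -= self.lr * self._grad        # one plain SGD step

    def get_outputs(self):
        return list(self._outputs)


batch = {'data': [1.0, 2.0, 3.0], 'label': [2.0, 4.0, 6.0]}   # targets follow y = 2x
mod = ToyModule(learning_rate=0.05)
for _ in range(100):
    mod.forward(batch)    # forward on the provided data batch
    mod.backward()        # compute the gradient
    mod.update()          # apply the optimizer step
print(round(mod.w, 3))    # close to 2.0
```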

get_outputs(merge_multi_context=True)[source]

Gets outputs of the previous forward computation.

If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray. When merge_multi_context is False, those NDArray might live on different devices.

Parameters:merge_multi_context (bool) – Default is True. When data parallelism is used, the outputs are collected from multiple devices. A True value indicates that the collected results should be merged so that they look like they came from a single executor.
Returns:Output.
Return type:list of NDArray or list of list of NDArray
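The two output layouts differ only in whether the per-device pieces are concatenated. The pure-Python sketch below uses nested lists in place of NDArrays and concatenates each output's pieces along the batch dimension in device order; merge_outputs is a hypothetical helper, not MXNet's implementation.

```python
def merge_outputs(per_device_outputs):
    """Turn [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]] into [out1, out2].

    Each piece is a list of per-sample rows; merging concatenates the
    pieces of each output along the batch dimension, in device order.
    """
    merged = []
    for pieces in per_device_outputs:
        rows = []
        for piece in pieces:
            rows.extend(piece)
        merged.append(rows)
    return merged


# One output for a batch of 4, split across 2 devices (2 samples each).
unmerged = [[[[0.1, 0.9], [0.8, 0.2]],    # out1 on dev1
             [[0.3, 0.7], [0.6, 0.4]]]]   # out1 on dev2
print(merge_outputs(unmerged))
```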
get_input_grads(merge_multi_context=True)[source]

Gets the gradients with respect to the inputs of the module.

If merge_multi_context is True, it is like [grad1, grad2]. Otherwise, it is like [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]. All the output elements are NDArray.

Parameters:merge_multi_context (bool) – Default is True. When data parallelism is used, the gradients are collected from multiple devices. A True value indicates that the collected results should be merged so that they look like they came from a single executor.
Returns:Input gradients
Return type:list of NDArray or list of list of NDArray
get_states(merge_multi_context=True)[source]

Gets states from all devices.

If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray.

Parameters:merge_multi_context (bool) – Default is True. When data parallelism is used, the states are collected from multiple devices. A True value indicates that the collected results should be merged so that they look like they came from a single executor.
Returns:States
Return type:list of NDArray or list of list of NDArray
set_states(states=None, value=None)[source]

Sets values for states. Exactly one of states and value must be specified.

Parameters:
  • states (list of list of NDArrays) – Source state arrays formatted like [[state1_dev1, state1_dev2], [state2_dev1, state2_dev2]].
  • value (number) – a single scalar value for all state arrays.
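The two ways of setting states can be modeled with nested lists. The hypothetical set_states_sketch below is a pure-Python illustration, not MXNet code: it copies per-device source arrays when states is given, broadcasts one scalar when value is given, and treats passing neither or both as an error.

```python
def set_states_sketch(state_arrays, states=None, value=None):
    """Set state arrays laid out like [[state1_dev1, state1_dev2], ...].

    Each array is a list of numbers standing in for an NDArray and is
    modified in place.
    """
    if (states is None) == (value is None):
        raise ValueError('exactly one of states and value must be specified')
    for i, per_device in enumerate(state_arrays):
        for j, arr in enumerate(per_device):
            if states is not None:
                arr[:] = states[i][j]          # copy the given source array
            else:
                arr[:] = [value] * len(arr)    # broadcast one scalar everywhere
    return state_arrays


# One state (e.g. an RNN hidden state) on two devices, three elements each.
arrays = [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
set_states_sketch(arrays, value=1.0)
print(arrays)
```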
update_metric(eval_metric, labels, pre_sliced=False)[source]

Evaluates and accumulates evaluation metric on outputs of the last forward computation.

Parameters:
  • eval_metric (EvalMetric) – Evaluation metric to use.
  • labels (list of NDArray if pre_sliced is False, list of lists of NDArray otherwise) – Typically data_batch.label.
  • pre_sliced (bool) – Whether the labels are already sliced per device (default: False).
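update_metric accumulates rather than resets: each call folds one batch into running totals, and the metric value is computed over all batches seen so far. The hypothetical RunningAccuracy class below is a pure-Python sketch of that accumulate-then-read pattern; it is not mxnet.metric.Accuracy.

```python
class RunningAccuracy:
    """Accumulates accuracy across batches until reset() is called."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.sum_metric = 0   # number of correct predictions so far
        self.num_inst = 0     # number of samples seen so far

    def update(self, labels, preds):
        # labels: list of int class ids; preds: list of per-class score lists
        for label, scores in zip(labels, preds):
            predicted = max(range(len(scores)), key=scores.__getitem__)
            self.sum_metric += int(predicted == label)
            self.num_inst += 1

    def get(self):
        value = self.sum_metric / self.num_inst if self.num_inst else float('nan')
        return 'accuracy', value


metric = RunningAccuracy()
metric.update([1, 0], [[0.2, 0.8], [0.3, 0.7]])   # batch 1: 1 of 2 correct
metric.update([0, 0], [[0.9, 0.1], [0.6, 0.4]])   # batch 2: 2 of 2 correct
print(metric.get())                               # ('accuracy', 0.75)
```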
save_optimizer_states(fname)[source]

Saves optimizer (updater) state to a file.

Parameters:fname (str) – Path to output states file.
load_optimizer_states(fname)[source]

Loads optimizer (updater) state from a file.

Parameters:fname (str) – Path to input states file.
install_monitor(mon)[source]

Installs monitor on all executors.

prepare(data_batch, sparse_row_id_fn=None)[source]

Prepares the module for processing a data batch.

Usually involves switching bucket and reshaping. For modules that contain row_sparse parameters in KVStore, it prepares the row_sparse parameters based on the sparse_row_id_fn.

When KVStore is used to update parameters for multi-device or multi-machine training, a copy of the parameters is stored in KVStore. Note that for row_sparse parameters, update() updates the copy of the parameters in KVStore but does not broadcast the updated parameters to all devices or machines. The prepare function is used to broadcast row_sparse parameters with the next batch of data.

Parameters:
  • data_batch (DataBatch) – The current batch of data for forward computation.
  • sparse_row_id_fn (A callback function) – The function takes data_batch as an input and returns a dict of str -> NDArray. The resulting dict is used for pulling row_sparse parameters from the kvstore, where the str key is the name of the param, and the value is the row id of the param to pull.
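A sparse_row_id_fn only has to map each row_sparse parameter name to the rows the upcoming batch will touch. The sketch below is pure Python: it returns plain lists where MXNet expects NDArrays, and both the parameter name 'embed_weight' and the dict-based batch layout are hypothetical.

```python
def sparse_row_id_fn(data_batch):
    """Return {param_name: row_ids} for the rows this batch will read.

    For an embedding lookup, the rows needed are exactly the distinct
    token ids that appear in the batch.
    """
    token_ids = [tok for sample in data_batch['data'] for tok in sample]
    return {'embed_weight': sorted(set(token_ids))}


batch = {'data': [[4, 7, 7], [1, 4, 9]]}
print(sparse_row_id_fn(batch))   # {'embed_weight': [1, 4, 7, 9]}
```

With MXNet, a function like this would be passed as mod.prepare(data_batch, sparse_row_id_fn=sparse_row_id_fn) and would return NDArrays of row ids instead of lists.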
class mxnet.module.BucketingModule(sym_gen, default_bucket_key=None, logger=logging, context=cpu(0), work_load_list=None, fixed_param_names=None, state_names=None, group2ctxs=None, compression_params=None)[source]

This module helps to deal efficiently with varying-length inputs.

Parameters:
  • sym_gen (function) – A function that, when called with a bucket key, returns a triple (symbol, data_names, label_names).
  • default_bucket_key (str (or any python object)) – The key for the default bucket.
  • logger (Logger) – Defaults to logging.
  • context (Context or list of Context) – Defaults to mx.cpu().
  • work_load_list (list of number) – Defaults to None, indicating uniform workload.
  • fixed_param_names (list of str) – Defaults to None, indicating no network parameters are fixed.
  • state_names (list of str) – States are similar to data and label, but are not provided by the data iterator. Instead, they are initialized to 0 and can be set by set_states().
  • group2ctxs (dict of str to context or list of context, or list of dict of str to context) – Default is None. Maps the ctx_group attribute to the context assignment.
  • compression_params (dict) – Specifies the type of gradient compression and additional arguments depending on the type of compression being used. For example, 2bit compression requires a threshold, so the arguments would be {‘type’:‘2bit’, ‘threshold’:0.5}. See the mxnet.KVStore.set_gradient_compression method for more details on gradient compression.
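sym_gen is called with a bucket key and must return (symbol, data_names, label_names). In the pure-Python sketch below, the symbol is replaced by a descriptive string so the example runs without MXNet. pick_bucket is a hypothetical helper that chooses the smallest bucket that fits a sequence; in practice the data iterator usually assigns bucket keys.

```python
import bisect

BUCKETS = (10, 20, 30)   # hypothetical sequence-length buckets, sorted


def sym_gen(bucket_key):
    """Return (symbol, data_names, label_names) for one bucket.

    A real sym_gen would build an mx.sym graph unrolled to bucket_key
    steps; a string stands in for that Symbol here.
    """
    symbol = 'rnn unrolled for %d steps' % bucket_key
    return symbol, ('data',), ('softmax_label',)


def pick_bucket(seq_len, buckets=BUCKETS):
    """Smallest bucket key that can hold seq_len, or None if none fits."""
    i = bisect.bisect_left(buckets, seq_len)
    return buckets[i] if i < len(buckets) else None


for n in (7, 10, 25, 31):
    key = pick_bucket(n)
    print(n, key, sym_gen(key)[0] if key is not None else 'too long')
```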
data_names

A list of names for data required by this module.

output_names

A list of names for the outputs of this module.

data_shapes

Gets data shapes.

Returns:A list of (name, shape) pairs.
Return type:list
label_shapes

Gets label shapes.

Returns:The return value could be None if the module does not need labels, or if the module is not bound for training (in this case, label information is not available).
Return type:A list of (name, shape) pairs.
output_shapes

Gets output shapes.

Returns:A list of (name, shape) pairs.
Return type:list
get_params()[source]

Gets current parameters.

Returns:A pair of dictionaries each mapping parameter names to NDArray values.
Return type:(arg_params, aux_params)
set_params(arg_params, aux_params, allow_missing=False, force_init=True, allow_extra=False)[source]

Assigns parameters and aux state values.

Parameters:
  • arg_params (dict) – Dictionary of name to value (NDArray) mapping.
  • aux_params (dict) – Dictionary of name to value (NDArray) mapping.
  • allow_missing (bool) – If true, params could contain missing values, and the initializer will be called to fill those missing params.
  • force_init (bool) – If True, forces re-initialization even if already initialized.
  • allow_extra (boolean, optional) – Whether to allow extra parameters that are not needed by the symbol. If True, no error is raised when arg_params or aux_params contain extra parameters that are not needed by the executor.

Examples

>>> # An example of setting module parameters.
>>> sym, arg_params, aux_params = mx.model.load_checkpoint(model_prefix, n_epoch_load)
>>> mod.set_params(arg_params=arg_params, aux_params=aux_params)
init_params(initializer=, arg_params=None, aux_params=None, allow_missing=False, force_init=False, allow_extra=False)[source]

Initializes parameters.

Parameters:
  • initializer (Initializer) – Called to initialize parameters if needed.
  • arg_params (dict) – Defaults to None. Existing parameters. This has higher priority than initializer.
  • aux_params (dict) – Defaults to None. Existing auxiliary states. This has higher priority than initializer.
  • allow_missing (bool) – Allow missing values in arg_params and aux_params (if not None). In this case, missing values will be filled with initializer.
  • force_init (bool) – Defaults to False. If True, forces re-initialization even if already initialized.
  • allow_extra (boolean, optional) – Whether to allow extra parameters that are not needed by the symbol. If True, no error is raised when arg_params or aux_params contain extra parameters that are not needed by the executor.
get_states(merge_multi_context=True)[source]

Gets states from all devices.

Parameters:merge_multi_context (bool) – Default is True. When data parallelism is used, the states are collected from multiple devices. A True value indicates that the collected results should be merged so that they look like they came from a single executor.
Returns:If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray.
Return type:list of NDArrays or list of list of NDArrays
set_states(states=None, value=None)[source]

Sets values for states. Exactly one of states and value must be specified.

Parameters:
  • states (list of list of NDArrays) – Source states arrays formatted like [[state1_dev1, state1_dev2], [state2_dev1, state2_dev2]].
  • value (number) – A single scalar value for all state arrays.
bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write')[source]

Binding for a BucketingModule means setting up the buckets and binding the executor for the default bucket key. Executors corresponding to other keys are bound afterwards with switch_bucket.

Parameters:
  • data_shapes (list of (str, tuple)) – This should correspond to the symbol for the default bucket.
  • label_shapes (list of (str, tuple)) – This should correspond to the symbol for the default bucket.
  • for_training (bool) – Default is True.
  • inputs_need_grad (bool) – Default is False.
  • force_rebind (bool) – Default is False.
  • shared_module (BucketingModule) – Default is None. This value is currently not used.
  • grad_req (str, list of str, dict of str to str) – Requirement for gradient accumulation. Can be ‘write’, ‘add’, or ‘null’ (defaults to ‘write’). Can be specified globally (str) or per argument (list, dict).
  • bucket_key (str (or any python object)) – Bucket key for binding. Defaults to default_bucket_key.
switch_bucket(bucket_key, data_shapes, label_shapes=None)[source]

Switches to a different bucket. This will change self.curr_module.

Parameters:
  • bucket_key (str (or any python object)) – The key of the target bucket.
  • data_shapes (list of (str, tuple)) – Typically data_batch.provide_data.
  • label_shapes (list of (str, tuple)) – Typically data_batch.provide_label.
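switch_bucket binds a module for a new key the first time that key is seen and reuses it afterwards, with every bucket sharing one set of parameters. The hypothetical BucketCache class below is a pure-Python sketch of that bookkeeping (a dict keyed by bucket, a shared params dict, and a curr_module pointer); it does not reproduce MXNet's executor binding or memory sharing.

```python
class BucketCache:
    """Lazily creates one 'module' per bucket key, all sharing one params dict."""

    def __init__(self, sym_gen, default_bucket_key):
        self.sym_gen = sym_gen
        self.params = {}          # shared by every bucket's module
        self.buckets = {}
        self.curr_module = None
        self.switch_bucket(default_bucket_key)

    def switch_bucket(self, bucket_key):
        if bucket_key not in self.buckets:
            symbol, data_names, label_names = self.sym_gen(bucket_key)
            # A real BucketingModule would bind a Module here, passing the
            # default bucket's module as shared_module.
            self.buckets[bucket_key] = {'symbol': symbol, 'params': self.params}
        self.curr_module = self.buckets[bucket_key]
        return self.curr_module   # returned for convenience in this sketch


cache = BucketCache(lambda k: ('unrolled-%d' % k, ('data',), ('softmax_label',)), 30)
cache.params['w'] = 1.0
small = cache.switch_bucket(10)
print(small['symbol'], small['params'] is cache.buckets[30]['params'])
```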
init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)[source]

Installs and initializes optimizers.

Parameters:
  • kvstore (str or KVStore) – Defaults to ‘local’.
  • optimizer (str or Optimizer) – Defaults to ‘sgd’.
  • optimizer_params (dict) – Defaults to ((‘learning_rate’, 0.01),). The default value is a tuple of pairs rather than a dictionary to avoid pylint's warning about dangerous (mutable) default values.
  • force_init (bool) – Defaults to False. Whether to force re-initialization of the optimizer if one is already installed.
prepare(data_batch, sparse_row_id_fn=None)[source]

Prepares the module for processing a data batch.

Usually involves switching bucket and reshaping. For modules that contain row_sparse parameters in KVStore, it prepares the row_sparse parameters based on the sparse_row_id_fn.

Parameters:
  • data_batch (DataBatch) – The current batch of data for forward computation.
  • sparse_row_id_fn (A callback function) – The function takes data_batch as an input and returns a dict of str -> NDArray. The resulting dict is used for pulling row_sparse parameters from the kvstore, where the str key is the name of the param, and the value is the row id of the param to pull.
forward(data_batch, is_train=None)[source]

Forward computation.

Parameters:
  • data_batch (DataBatch) –
  • is_train (bool) – Defaults to None, in which case is_train is taken as self.for_training.
backward(out_grads=None)[source]

Backward computation.

update()[source]

Updates parameters according to installed optimizer and the gradient computed in the previous forward-backward cycle.

When KVStore is used to update parameters for multi-device or multi-machine training, a copy of the parameters is stored in KVStore. Note that for row_sparse parameters, this function updates the copy of the parameters in KVStore but does not broadcast the updated parameters to all devices or machines. Call prepare to broadcast row_sparse parameters with the next batch of data.

get_outputs(merge_multi_context=True)[source]

Gets outputs from a previous forward computation.

Parameters:merge_multi_context (bool) – Defaults to True. When data parallelism is used, the outputs are collected from multiple devices. A True value indicates that the collected results should be merged so that they look like they came from a single executor.
Returns:If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray.
Return type:list of NDArray or list of list of NDArray
get_input_grads(merge_multi_context=True)[source]

Gets the gradients with respect to the inputs of the module.

Parameters:merge_multi_context (bool) – Defaults to True. When data parallelism is used, the gradients are collected from multiple devices. A True value indicates that the collected results should be merged so that they look like they came from a single executor.
Returns:If merge_multi_context is True, it is like [grad1, grad2]. Otherwise, it is like [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]. All the output elements are NDArray.
Return type:list of NDArrays or list of list of NDArrays
update_metric(eval_metric, labels, pre_sliced=False)[source]

Evaluates and accumulates evaluation metric on outputs of the last forward computation.

Parameters:
  • eval_metric (EvalMetric) – Evaluation metric to use.
  • labels (list of NDArray) – Typically data_batch.label.
symbol

The symbol of the current bucket being used.

install_monitor(mon)[source]

Installs monitor on all executors.

class mxnet.module.SequentialModule(logger=logging)[source]

A SequentialModule is a container module that can chain multiple modules together.

Note

Building a computation graph with this kind of imperative container is less flexible and less efficient than building a symbolic graph, so it should only be used as a convenience utility.

add(module, **kwargs)[source]

Adds a module to the chain.

Parameters:
  • module (BaseModule) – The new module to add.
  • kwargs (**keywords) –

    All the keyword arguments are saved as meta information for the added module. The currently known meta includes:

    • take_labels: indicates whether the module expects to take labels when doing computation. Note that any module in the chain can take labels (not necessarily only the topmost one), and they all take the same labels passed from the original data batch for the SequentialModule.
Returns:

This function returns self to allow us to easily chain a series of add calls.

Return type:

self

Examples

>>> # An example of adding two modules to a chain.
>>> seq_mod = mx.mod.SequentialModule()
>>> seq_mod.add(mod1)
>>> seq_mod.add(mod2)
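Because add returns self, calls can be chained, and the output of each module becomes the input of the next; take_labels marks the modules (typically the one computing a loss) that also receive the batch labels. The hypothetical ToySequential below sketches that data flow in pure Python, with plain functions standing in for modules; it is not MXNet's implementation.

```python
class ToySequential:
    """Chains callables; one added with take_labels=True also receives the labels."""

    def __init__(self):
        self._modules = []
        self._metas = []

    def add(self, module, **kwargs):
        self._modules.append(module)
        self._metas.append(kwargs)       # meta information, e.g. take_labels
        return self                      # enables seq.add(a).add(b)

    def forward(self, data, labels=None):
        out = data
        for module, meta in zip(self._modules, self._metas):
            if meta.get('take_labels', False):
                out = module(out, labels)   # every such module gets the same labels
            else:
                out = module(out)
        return out


double = lambda xs: [2 * x for x in xs]
squared_error = lambda xs, ys: sum((x - y) ** 2 for x, y in zip(xs, ys))
seq = ToySequential().add(double).add(squared_error, take_labels=True)
print(seq.forward([1, 2, 3], labels=[2, 4, 5]))   # 1
```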
data_names

A list of names for data required by this module.

output_names

A list of names for the outputs of this module.

data_shapes

Gets data shapes.

Returns:A list of (name, shape) pairs. The data shapes of the first module are the data shapes of the SequentialModule.
Return type:list
label_shapes

Gets label shapes.

Returns:A list of (name, shape) pairs. The return value could be None if the module does not need labels, or if the module is not bound for training (in this case, label information is not available).
Return type:list
output_shapes

Gets output shapes.

Returns:A list of (name, shape) pairs. The output shapes of the last module are the output shapes of the SequentialModule.
Return type:list
get_params()[source]

Gets current parameters.

Returns:A pair of dictionaries each mapping parameter names to NDArray values. This is a merged dictionary of all the parameters in the modules.
Return type:(arg_params, aux_params)
init_params(initializer=, arg_params=None, aux_params=None, allow_missing=False, force_init=False, allow_extra=False)[source]

Initializes parameters.

Parameters:
  • initializer (Initializer) – Called to initialize parameters if needed.
  • arg_params (dict) – Default None. Existing parameters. This has higher priority than initializer.
  • aux_params (dict) – Default None. Existing auxiliary states. This has higher priority than initializer.
  • allow_missing (bool) – Allow missing values in arg_params and aux_params (if not None). In this case, missing values will be filled with initializer.
  • force_init (bool) – Default False. If True, forces re-initialization even if already initialized.
  • allow_extra (boolean, optional) – Whether to allow extra parameters that are not needed by the symbol. If True, no error is raised when arg_params or aux_params contain extra parameters that are not needed by the executor.
bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write')[source]

Binds the symbols to construct executors. This is necessary before one can perform computation with the module.

Parameters:
  • data_shapes (list of (str, tuple)) – Typically data_iter.provide_data.
  • label_shapes (list of (str, tuple)) – Typically data_iter.provide_label.
  • for_training (bool) – Default is True. Whether the executors should be bound for training.
  • inputs_need_grad (bool) – Default is False. Whether the gradients with respect to the input data need to be computed. This is typically not needed, but it might be when composing modules.
  • force_rebind (bool) – Default is False. This function does nothing if the executors are already bound; set this to True to force the executors to rebind.
  • shared_module (Module) – Default is None. Shared modules are currently not supported for SequentialModule.
  • grad_req (str, list of str, dict of str to str) – Requirement for gradient accumulation. Can be ‘write’, ‘add’, or ‘null’ (defaults to ‘write’). Can be specified globally (str) or per argument (list, dict).
init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)[source]

Installs and initializes optimizers.

Parameters:
  • kvstore (str or KVStore) – Default ‘local’.
  • optimizer (str or Optimizer) – Default ‘sgd’.
  • optimizer_params (dict) – Default (('learning_rate', 0.01),). The default value is a tuple of pairs rather than a dictionary to avoid pylint's warning about dangerous (mutable) default values.
  • force_init (bool) – Default False. Whether to force re-initialization of the optimizer if one is already installed.
forward(data_batch, is_train=None)[source]

Forward computation.

Parameters:
  • data_batch (DataBatch) –
  • is_train (bool) – Default is None, in which case is_train is taken as self.for_training.
backward(out_grads=None)[source]

Backward computation.

update()[source]

Updates parameters according to installed optimizer and the gradient computed in the previous forward-backward cycle.

get_outputs(merge_multi_context=True)[source]

Gets outputs from a previous forward computation.

Parameters:merge_multi_context (bool) – Default is True. When data parallelism is used, the outputs are collected from multiple devices. A True value indicates that the collected results should be merged so that they look like they came from a single executor.
Returns:If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray.
Return type:list of NDArray or list of list of NDArray
get_input_grads(merge_multi_context=True)[source]

Gets the gradients with respect to the inputs of the module.

Parameters:merge_multi_context (bool) – Default is True. When data parallelism is used, the gradients are collected from multiple devices. A True value indicates that the collected results should be merged so that they look like they came from a single executor.
Returns:If merge_multi_context is True, it is like [grad1, grad2]. Otherwise, it is like [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]. All the output elements are NDArray.
Return type:list of NDArrays or list of list of NDArrays
update_metric(eval_metric, labels, pre_sliced=False)[source]

Evaluates and accumulates evaluation metric on outputs of the last forward computation.

Parameters:
  • eval_metric (EvalMetric) – Evaluation metric to use.
  • labels (list of NDArray) – Typically data_batch.label.
install_monitor(mon)[source]

Installs monitor on all executors.

class mxnet.module.PythonModule(data_names, label_names, output_names, logger=logging)[source]

A convenient module class that implements many of the module APIs as empty functions.

Parameters:
  • data_names (list of str) – Names of the data expected by the module.
  • label_names (list of str) – Names of the labels expected by the module. Could be None if the module does not need labels.
  • output_names (list of str) – Names of the outputs.
data_names

A list of names for data required by this module.

output_names

A list of names for the outputs of this module.

data_shapes

A list of (name, shape) pairs specifying the data inputs to this module.

label_shapes

A list of (name, shape) pairs specifying the label inputs to this module. If this module does not accept labels (either because it has no loss function or because it is not bound for training), this should return an empty list [].

output_shapes

A list of (name, shape) pairs specifying the outputs of this module.

get_params()[source]

Gets parameters, which are potentially copies of the actual parameters used for computation on the device. Subclasses should override this method if they contain parameters.

Returns:({}, {}), a pair of empty dicts.
Return type:tuple of dict
init_params(initializer=, arg_params=None, aux_params=None, allow_missing=False, force_init=False, allow_extra=False)[source]

Initializes the parameters and auxiliary states. By default, this function does nothing. Subclasses should override this method if they contain parameters.

Parameters:
  • initializer (Initializer) – Called to initialize parameters if needed.
  • arg_params (dict) – If not None, should be a dictionary of existing arg_params. Initialization will be copied from that.
  • aux_params (dict) – If not None, should be a dictionary of existing aux_params. Initialization will be copied from that.
  • allow_missing (bool) – If True, params could contain missing values, and the initializer will be called to fill those missing params.
  • force_init (bool) – If True, forces re-initialization even if already initialized.
  • allow_extra (boolean, optional) – Whether to allow extra parameters that are not needed by the symbol. If True, no error is raised when arg_params or aux_params contain extra parameters that are not needed by the executor.
update()[source]

Updates parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch. Currently this does nothing. Subclasses should override this method if they contain parameters.

update_metric(eval_metric, labels, pre_sliced=False)

Evaluates and accumulates the evaluation metric on the outputs of the last forward computation. Subclasses should override this method if needed.

Parameters:
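  • eval_metric (EvalMetric) – The metric to update.
  • labels (list of NDArray) – Typically data_batch.label.
  • pre_sliced (bool) – Default is False. Whether the labels are already sliced per device.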
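
"Accumulates" means the metric keeps running totals across batches until it is reset. The class below is a hypothetical stand-in for an accuracy metric, not mx.metric.Accuracy; it assumes update_metric forwards the labels and the module's outputs to eval_metric.update(labels, preds), one array per output.

```python
import numpy as np


class AccuracyAccumulator:
    """Hypothetical stand-in for an accuracy EvalMetric (not MXNet's class)."""

    def __init__(self):
        self.sum_metric = 0.0  # running count of correct predictions
        self.num_inst = 0      # running count of evaluated instances

    def update(self, labels, preds):
        # labels and preds are lists holding one array per output.
        for label, pred in zip(labels, preds):
            predicted = np.asarray(pred).argmax(axis=1)
            truth = np.asarray(label).astype(int)
            self.sum_metric += float((predicted == truth).sum())
            self.num_inst += len(predicted)

    def get(self):
        return "accuracy", self.sum_metric / self.num_inst


metric = AccuracyAccumulator()
metric.update([np.array([0, 1])], [np.array([[0.9, 0.1], [0.8, 0.2]])])  # 1 of 2
metric.update([np.array([1])], [np.array([[0.3, 0.7]])])                 # 1 of 1
print(metric.get())  # accuracy over both batches: 2 of 3
```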
bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write')

Binds the symbols to construct executors. This is necessary before one can perform computation with the module.

Parameters:
  • data_shapes (list of (str, tuple)) – Typically data_iter.provide_data.
  • label_shapes (list of (str, tuple)) – Typically data_iter.provide_label.
  • for_training (bool) – Default is True. Whether the executors should be bound for training.
  • inputs_need_grad (bool) – Default is False. Whether the gradients with respect to the input data need to be computed. This is typically not needed, but may be needed when composing modules.
  • force_rebind (bool) – Default is False. By default, this function does nothing if the executors are already bound; if True, the executors are forced to rebind.
  • shared_module (Module) – Default is None. Used in bucketing. When not None, the shared module essentially corresponds to a different bucket: a module with a different symbol but the same set of parameters (e.g. unrolled RNNs of different lengths).
  • grad_req (str, list of str, dict of str to str) – Requirement for gradient accumulation. Can be ‘write’, ‘add’, or ‘null’ (defaults to ‘write’). Can be specified globally (str) or per argument (list, dict).
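
The three forms of grad_req can be normalized to one requirement per argument. expand_grad_req is a hypothetical helper, not part of MXNet; in this sketch, arguments missing from a dict fall back to 'write'. The example sets the data input to 'null', since input gradients are typically not needed unless inputs_need_grad is True.

```python
VALID_REQS = {"write", "add", "null"}


def expand_grad_req(grad_req, arg_names, default="write"):
    """Hypothetical helper: expand a global, list, or dict grad_req into one
    requirement per argument name (not MXNet's internal code)."""
    if isinstance(grad_req, str):
        reqs = {name: grad_req for name in arg_names}
    elif isinstance(grad_req, (list, tuple)):
        if len(grad_req) != len(arg_names):
            raise ValueError("need one grad_req per argument")
        reqs = dict(zip(arg_names, grad_req))
    elif isinstance(grad_req, dict):
        # Arguments not mentioned in the dict fall back to `default` here.
        reqs = {name: default for name in arg_names}
        reqs.update(grad_req)
    else:
        raise TypeError("grad_req must be a str, list, or dict")
    bad = sorted(r for r in set(reqs.values()) if r not in VALID_REQS)
    if bad:
        raise ValueError("invalid grad_req values: %s" % bad)
    return reqs


names = ["data", "fc1_weight", "fc1_bias"]
print(expand_grad_req({"data": "null"}, names))
# {'data': 'null', 'fc1_weight': 'write', 'fc1_bias': 'write'}
```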
init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)

Installs and initializes optimizers. By default this does nothing. Subclasses should override this method if needed.

Parameters:
  • kvstore (str or KVStore) – Default is ‘local’.
  • optimizer (str or Optimizer) – Default is ‘sgd’.
  • optimizer_params (dict) – Default is ((‘learning_rate’, 0.01),). The default is a tuple of pairs rather than a dictionary only to avoid pylint's dangerous-default-value warning.
  • force_init (bool) – Default is False. Whether to force re-initializing the optimizer if one is already installed.
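
Because optimizer_params defaults to a tuple of (key, value) pairs, code that consumes it can normalize either form with dict(). The sketch below is hypothetical: normalize_optimizer_params is not an MXNet function, and bad_defaults shows the shared mutable default that the tuple avoids.

```python
def normalize_optimizer_params(optimizer_params=(("learning_rate", 0.01),)):
    """Hypothetical helper: accept a dict or a tuple of (key, value) pairs."""
    # dict() builds a fresh dict, so callers can edit it without touching
    # the immutable tuple default.
    return dict(optimizer_params)


def bad_defaults(opts={"learning_rate": 0.01}):
    # Pitfall: this default dict is created once and shared by every call.
    opts["calls"] = opts.get("calls", 0) + 1
    return opts


print(normalize_optimizer_params())                        # {'learning_rate': 0.01}
print(normalize_optimizer_params({"learning_rate": 0.1}))  # {'learning_rate': 0.1}
bad_defaults()
print(bad_defaults()["calls"])                             # 2: state leaked between calls
```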
class mxnet.module.PythonLossModule(name='pyloss', data_names=('data', ), label_names=('softmax_label', ), logger=, grad_func=None)

A convenient module class that implements many of the module APIs as empty functions.

Parameters:
  • name (str) – Name of the module. The outputs will be named [name + ‘_output’].
  • data_names (list of str) – Defaults to ['data']. Names of the data expected by this module. Should contain exactly one name.
  • label_names (list of str) – Defaults to ['softmax_label']. Names of the labels expected by the module. Should contain exactly one name.
  • grad_func (function) – Optional. If not None, should be a function that takes scores and labels, both of type NDArray, and returns the gradient with respect to the scores according to this loss function. The return value can be a numpy array or an NDArray.
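
As an illustration of grad_func, the sketch below computes a subgradient of the multiclass (Crammer-Singer) hinge loss, max over j != y of max(0, 1 + s_j - s_y), with NumPy. The helper _to_numpy is an assumption that lets the function accept NDArrays (through asnumpy()) as well as plain arrays; returning a NumPy array is allowed by the parameter description above.

```python
import numpy as np


def _to_numpy(x):
    # NDArrays provide asnumpy(); anything else goes through np.asarray.
    return x.asnumpy() if hasattr(x, "asnumpy") else np.asarray(x)


def mc_hinge_grad(scores, labels):
    """Subgradient of max_{j != y} max(0, 1 + s_j - s_y) w.r.t. the scores."""
    scores = _to_numpy(scores).astype(np.float64)
    labels = _to_numpy(labels).astype(np.int64).ravel()
    rows = np.arange(scores.shape[0])
    # Margin of every class against the true class; the true class is zeroed.
    margins = 1.0 + scores - scores[rows, labels][:, None]
    margins[rows, labels] = 0.0
    # Most violating class; it is the true class when all other margins are negative.
    worst = margins.argmax(axis=1)
    grad = np.zeros_like(scores)
    grad[rows, labels] -= 1.0
    grad[rows, worst] += 1.0  # cancels the line above when worst == label
    return grad


# Row 0 satisfies every margin (zero gradient); row 1 is violated by class 2.
print(mc_hinge_grad([[2.0, 0.5, 0.1], [0.2, 0.9, 0.8]], [0, 1]).tolist())
# [[0.0, 0.0, 0.0], [0.0, -1.0, 1.0]]
```

The function can then be passed to the module:

>>> loss = mx.mod.PythonLossModule(grad_func=mc_hinge_grad)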
forward(data_batch, is_train=None)

Forward computation. This does nothing except keep references to the scores and the labels so that backward computation can use them.

Parameters:
  • data_batch (DataBatch) – Can be any object that implements a similar API.
  • is_train (bool) – Default is None, which means is_train takes the value of self.for_training.
get_outputs(merge_multi_context=True)

Gets outputs of the previous forward computation. As an output loss module, it treats its inputs as scores and simply returns them.

Parameters: merge_multi_context (bool) – Should always be True, because this module does not use multiple contexts for computation.
backward(out_grads=None)

Backward computation.

Parameters: out_grads (NDArray or list of NDArray, optional) – Gradient on the outputs to be propagated back. This parameter is only needed when bind is called on outputs that are not a loss function.
get_input_grads(merge_multi_context=True)

Gets the gradients with respect to the inputs, computed in the previous backward computation.

Parameters: merge_multi_context (bool) – Should always be True, because this module does not use multiple contexts for computation.
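
The forward, get_outputs, backward, and get_input_grads calls fit together as follows. ToyLossModule is a hypothetical plain-Python mock that mirrors the behavior documented above; it is not MXNet's class, Batch is a stand-in for DataBatch with only data and label fields, and the squared-error grad_func is just an example.

```python
from collections import namedtuple

import numpy as np

# Stand-in for a DataBatch: only the data and label lists are used here.
Batch = namedtuple("Batch", ["data", "label"])


class ToyLossModule:
    """Hypothetical mock of the PythonLossModule call sequence (not MXNet code)."""

    def __init__(self, grad_func, for_training=True):
        self.grad_func = grad_func
        self.for_training = for_training
        self._is_train = False
        self._scores = self._labels = self._scores_grad = None

    def forward(self, data_batch, is_train=None):
        # None means "use the mode the module was bound with".
        self._is_train = self.for_training if is_train is None else is_train
        # No computation: just keep references for backward().
        self._scores = data_batch.data[0]
        self._labels = data_batch.label[0]

    def get_outputs(self, merge_multi_context=True):
        assert merge_multi_context, "only a single context is used"
        return [self._scores]  # the scores are returned unchanged

    def backward(self, out_grads=None):
        # In this sketch a loss module ends the graph, so out_grads stays None.
        assert out_grads is None
        assert self._is_train, "backward requires a training-mode forward"
        self._scores_grad = np.asarray(self.grad_func(self._scores, self._labels))

    def get_input_grads(self, merge_multi_context=True):
        assert merge_multi_context, "only a single context is used"
        return [self._scores_grad]


# Example: squared-error loss 0.5 * ||s - y||^2, whose gradient is s - y.
mod = ToyLossModule(lambda s, y: np.asarray(s) - np.asarray(y))
batch = Batch(data=[np.array([[1.0, 2.0]])], label=[np.array([[0.5, 2.5]])])
mod.forward(batch)
mod.backward()
print(mod.get_input_grads()[0].tolist())  # [[0.5, -0.5]]
```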
install_monitor(mon)

Installs monitor on all executors.