Module API
Overview
The module API, defined in the module (or simply mod) package, provides an intermediate- and high-level interface for performing computation with a Symbol. One can roughly think of a module as a machine that executes a program defined by a Symbol.
The module.Module class accepts a Symbol as its input.
>>> import mxnet as mx
>>> data = mx.sym.Variable('data')
>>> fc1 = mx.sym.FullyConnected(data, name='fc1', num_hidden=128)
>>> act1 = mx.sym.Activation(fc1, name='relu1', act_type="relu")
>>> fc2 = mx.sym.FullyConnected(act1, name='fc2', num_hidden=10)
>>> out = mx.sym.SoftmaxOutput(fc2, name='softmax')
>>> mod = mx.mod.Module(out)  # create a module from the given Symbol
Assume there is a valid MXNet data iterator nd_iter. We can initialize the module:
>>> mod.bind(data_shapes=nd_iter.provide_data,
...          label_shapes=nd_iter.provide_label)  # allocate memory for the given input shapes
>>> mod.init_params()  # initialize parameters with the default random initializer
Now the module is able to compute. We can call the high-level API to train and predict:
>>> mod.fit(nd_iter, num_epoch=10, ...)  # train
>>> mod.predict(new_nd_iter)  # predict on new data
or use the intermediate-level API to perform step-by-step computations:
>>> mod.forward(data_batch) # forward on the provided data batch
>>> mod.backward() # backward to calculate the gradients
>>> mod.update() # update parameters using the default optimizer
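The forward, backward, update cycle above can be pictured with a toy example. The following is a minimal pure-Python sketch, not MXNet code: ToyModule is a hypothetical class whose methods only mimic the roles of the Module methods with the same names. It fits a single weight w in y = w * x with a mean-squared-error loss and plain SGD.

```python
class ToyModule:
    """Hypothetical stand-in for a module: one weight w, prediction w * x, MSE loss."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0               # the single "parameter", initialized to zero
        self.lr = learning_rate    # plain SGD step size
        self._batch = None         # (inputs, labels) seen by the last forward
        self._outputs = []         # predictions from the last forward
        self._grad = 0.0           # gradient from the last backward

    def forward(self, data_batch):
        # data_batch is an (xs, ys) pair of equal-length lists
        xs, ys = data_batch
        self._batch = (xs, ys)
        self._outputs = [self.w * x for x in xs]

    def get_outputs(self):
        return self._outputs

    def backward(self):
        # d/dw of mean((w*x - y)^2) is mean(2 * (w*x - y) * x)
        xs, ys = self._batch
        self._grad = sum(2 * (p - y) * x
                         for p, x, y in zip(self._outputs, xs, ys)) / len(xs)

    def update(self):
        # Move w against the gradient computed by the previous backward
        self.w -= self.lr * self._grad


mod = ToyModule(learning_rate=0.1)
batch = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # samples of y = 2x
for _ in range(50):
    mod.forward(batch)
    mod.backward()
    mod.update()
print(round(mod.w, 4))  # 2.0
```

Here the gradient is derived by hand for a fixed batch; a real Module gets it from the bound symbol and iterates over a data iterator.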
A detailed tutorial is available at Module - Neural network training and inference.
The module package provides several modules:
BaseModule | The base class of a module.
Module | Module is a basic module that wraps a Symbol.
SequentialModule | A SequentialModule is a container module that can chain multiple modules together.
BucketingModule | This module helps to deal efficiently with varying-length inputs.
PythonModule | A convenient module class that implements many of the module APIs as empty functions.
PythonLossModule | A convenient module class that implements many of the module APIs as empty functions.
We summarize the interface for each class in the following sections.
The BaseModule class
The BaseModule is the base class for all other module classes. It defines the interface each module class should provide.
Initialize memory
BaseModule.bind | Binds the symbols to construct executors.
Get and set parameters
BaseModule.init_params | Initializes the parameters and auxiliary states.
BaseModule.set_params | Assigns parameter and aux state values.
BaseModule.get_params | Gets parameters, which are potentially copies of the actual parameters used to do computation on the device.
BaseModule.save_params | Saves model parameters to file.
BaseModule.load_params | Loads model parameters from file.
Train and predict
BaseModule.fit | Trains the module parameters.
BaseModule.score | Runs prediction on eval_data and evaluates the performance according to the given eval_metric.
BaseModule.iter_predict | Iterates over predictions.
BaseModule.predict | Runs prediction and collects the outputs.
Forward and backward
BaseModule.forward | Forward computation.
BaseModule.backward | Backward computation.
BaseModule.forward_backward | A convenient function that calls both forward and backward.
Update parameters
BaseModule.init_optimizer | Installs and initializes optimizers, as well as initializes the kvstore for distributed training.
BaseModule.update | Updates parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch.
BaseModule.update_metric | Evaluates and accumulates evaluation metric on outputs of the last forward computation.
Input and output
BaseModule.data_names | A list of names for data required by this module.
BaseModule.output_names | A list of names for the outputs of this module.
BaseModule.data_shapes | A list of (name, shape) pairs specifying the data inputs to this module.
BaseModule.label_shapes | A list of (name, shape) pairs specifying the label inputs to this module.
BaseModule.output_shapes | A list of (name, shape) pairs specifying the outputs of this module.
BaseModule.get_outputs | Gets outputs of the previous forward computation.
BaseModule.get_input_grads | Gets the gradients to the inputs, computed in the previous backward computation.
Others
BaseModule.get_states | Gets states from all devices.
BaseModule.set_states | Sets value for states.
BaseModule.install_monitor | Installs monitor on all executors.
BaseModule.symbol | Gets the symbol associated with this module.
Other built-in modules
Besides the basic interface defined in BaseModule, each module class supports additional functionality. We summarize them in this section.
Class Module
Module.load | Creates a model from a previously saved checkpoint.
Module.save_checkpoint | Saves current progress to checkpoint.
Module.reshape | Reshapes the module for new input shapes.
Module.borrow_optimizer | Borrows optimizer from a shared module.
Module.save_optimizer_states | Saves optimizer (updater) state to a file.
Module.load_optimizer_states | Loads optimizer (updater) state from a file.
Class BucketingModule
BucketingModule.switch_bucket | Switches to a different bucket.
Class SequentialModule
SequentialModule.add | Adds a module to the chain.
API Reference
class mxnet.module.BaseModule(logger=logging)
The base class of a module.
A module represents a computation component. One can think of a module as a computation machine. A module can execute forward and backward passes and update parameters in a model. We aim to make the APIs easy to use, especially when we need to use the imperative API to work with multiple modules (e.g. a stochastic depth network).
A module has several states:
- Initial state: Memory is not allocated yet, so the module is not ready for computation.
- Bound: Shapes for inputs, outputs, and parameters are all known, memory has been allocated, and the module is ready for computation.
- Parameters are initialized: For modules with parameters, doing computation before initializing the parameters might result in undefined outputs.
- Optimizer is installed: An optimizer can be installed to a module. After this, the parameters of the module can be updated according to the optimizer after gradients are computed (forward-backward).
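The ordering these states imply can be expressed as a small state machine. A minimal pure-Python sketch, assuming the order bind, then init_params, then init_optimizer, then update; ModuleStateSketch is hypothetical, and although its flags borrow the names binded, params_initialized, and optimizer_initialized listed below, the exception type and messages are assumptions of the sketch.

```python
class ModuleStateSketch:
    """Hypothetical illustration of the module life cycle (not MXNet code)."""

    def __init__(self):
        # Flags named after the BaseModule state attributes
        self.binded = False
        self.params_initialized = False
        self.optimizer_initialized = False

    def bind(self):
        # Shapes are known and memory is allocated
        self.binded = True

    def init_params(self):
        if not self.binded:
            raise RuntimeError("call bind() before init_params()")
        self.params_initialized = True

    def init_optimizer(self):
        if not (self.binded and self.params_initialized):
            raise RuntimeError("call bind() and init_params() before init_optimizer()")
        self.optimizer_initialized = True

    def update(self):
        if not self.optimizer_initialized:
            raise RuntimeError("call init_optimizer() before update()")
        return "parameters updated"


m = ModuleStateSketch()
try:
    m.init_params()  # out of order: the module is still in its initial state
except RuntimeError as err:
    error_message = str(err)
m.bind()
m.init_params()
m.init_optimizer()
result = m.update()
```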
In order for a module to interact with others, it must be able to report the following information in its initial state (before binding):
- data_names: list of type string indicating the names of the required input data.
- output_names: list of type string indicating the names of the required outputs.
After binding, a module should be able to report the following richer information:
- state information
- binded: bool, indicates whether the memory buffers needed for computation have been allocated.
- for_training: whether the module is bound for training.
- params_initialized: bool, indicates whether the parameters of this module have been initialized.
- optimizer_initialized: bool, indicates whether an optimizer is defined and initialized.
- inputs_need_grad: bool, indicates whether gradients with respect to the input data are needed. Might be useful when implementing composition of modules.
- input/output information
- data_shapes: a list of (name, shape). In theory, since the memory is allocated, we could directly provide the data arrays. But in the case of data parallelism, the data arrays might not be of the same shape as viewed from the external world.
- label_shapes: a list of (name, shape). This might be [] if the module does not need labels (e.g. it does not contain a loss function at the top), or if the module is not bound for training.
- output_shapes: a list of (name, shape) for outputs of the module.
- parameters (for modules with parameters)
- get_params(): returns a tuple (arg_params, aux_params). Each of those is a dictionary mapping names to NDArray values. Those NDArrays always live on CPU. The actual parameters used for computing might live on other devices (GPUs); this function retrieves (a copy of) the latest parameters.
- set_params(arg_params, aux_params): assigns parameters to the devices doing the computation.
- init_params(...): a more flexible interface to assign or initialize the parameters.
- setup
- bind(): prepare environment for computation.
- init_optimizer(): install optimizer for parameter updating.
- prepare(): prepare the module based on the current data batch.
- computation
- forward(data_batch): forward operation.
- backward(out_grads=None): backward operation.
- update(): update parameters according to installed optimizer.
- get_outputs(): get outputs of the previous forward operation.
- get_input_grads(): get the gradients with respect to the inputs computed in the previous backward operation.
- update_metric(metric, labels, pre_sliced=False): update performance metric for the previous forward computed results.
- other properties (mostly for backward compatibility)
- symbol: the underlying symbolic graph for this module (if any). This property is not necessarily constant. For example, for BucketingModule, this property is simply the current symbol being used. For other modules, this value might not be well defined.
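As the data_shapes entry notes, under data parallelism each device receives only a slice of the batch, so per-device array shapes differ from the shapes reported to the outside. A minimal sketch, assuming contiguous slices sized in proportion to a per-device workload such as the work_load_list parameter of Module; split_batch is a hypothetical helper, and its remainder handling is an illustrative choice rather than MXNet's exact rule.

```python
def split_batch(batch_size, work_load_list):
    """Hypothetical helper: split a batch into contiguous (start, stop) slices,
    one per device, roughly proportional to each device's workload."""
    total = float(sum(work_load_list))
    # Proportional share per device, rounded down
    counts = [int(batch_size * w / total) for w in work_load_list]
    # Give any leftover samples to the first devices, one each
    for i in range(batch_size - sum(counts)):
        counts[i % len(counts)] += 1
    slices, start = [], 0
    for count in counts:
        slices.append((start, start + count))
        start += count
    return slices


# A (10, 3) batch split over three devices with a uniform workload
batch_shape = (10, 3)
slices = split_batch(batch_shape[0], [1, 1, 1])
per_device_shapes = [(stop - start,) + batch_shape[1:] for start, stop in slices]
print(slices)             # [(0, 4), (4, 7), (7, 10)]
print(per_device_shapes)  # [(4, 3), (3, 3), (3, 3)]
```

data_shapes would still describe the full (10, 3) batch even though no single device holds an array of that shape.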
When these intermediate-level APIs are implemented properly, the following high-level APIs are automatically available for a module:
- fit: train the module parameters on a data set.
- predict: run prediction on a data set and collect outputs.
- score: run prediction on a data set and evaluate performance.
Examples
>>> # An example of creating an MXNet module.
>>> import mxnet as mx
>>> data = mx.symbol.Variable('data')
>>> fc1 = mx.symbol.FullyConnected(data, name='fc1', num_hidden=128)
>>> act1 = mx.symbol.Activation(fc1, name='relu1', act_type="relu")
>>> fc2 = mx.symbol.FullyConnected(act1, name='fc2', num_hidden=64)
>>> act2 = mx.symbol.Activation(fc2, name='relu2', act_type="relu")
>>> fc3 = mx.symbol.FullyConnected(act2, name='fc3', num_hidden=10)
>>> out = mx.symbol.SoftmaxOutput(fc3, name='softmax')
>>> mod = mx.mod.Module(out)
score(eval_data, eval_metric, num_batch=None, batch_end_callback=None, score_end_callback=None, reset=True, epoch=0, sparse_row_id_fn=None)
Runs prediction on eval_data and evaluates the performance according to the given eval_metric.
Check out the Module Tutorial to see an end-to-end use case.
Parameters:
- eval_data (DataIter) – Evaluation data to run prediction on.
- eval_metric (EvalMetric or list of EvalMetrics) – Evaluation metric to use.
- num_batch (int) – Number of batches to run. Defaults to None, meaning run until the DataIter is exhausted.
- batch_end_callback (function) – Could also be a list of functions.
- reset (bool) – Defaults to True. Indicates whether we should reset eval_data before starting evaluation.
- epoch (int) – Defaults to 0. For compatibility, this will be passed to callbacks (if any). During training, this will correspond to the training epoch number.
- sparse_row_id_fn (A callback function) – The function takes data_batch as an input and returns a dict of str -> NDArray. The resulting dict is used for pulling row_sparse parameters from the kvstore, where the str key is the name of the param, and the value is the row id of the param to pull.
Examples
>>> # An example of using score for prediction.
>>> # Evaluate accuracy on val_dataiter
>>> metric = mx.metric.Accuracy()
>>> mod.score(val_dataiter, metric)
>>> mod.score(val_dataiter, ['mse', 'acc'])
iter_predict(eval_data, num_batch=None, reset=True, sparse_row_id_fn=None)
Iterates over predictions.
Examples
>>> for pred, i_batch, batch in mod.iter_predict(eval_data):
...     # pred is a list of outputs from the module
...     # i_batch is an integer
...     # batch is the data batch from the data iterator
...     pass
Parameters:
- eval_data (DataIter) – Evaluation data to run prediction on.
- num_batch (int) – Default is None, indicating running all the batches in the data iterator.
- reset (bool) – Default is True, indicating whether we should reset the data iter before starting prediction.
- sparse_row_id_fn (A callback function) – The function takes data_batch as an input and returns a dict of str -> NDArray. The resulting dict is used for pulling row_sparse parameters from the kvstore, where the str key is the name of the param, and the value is the row id of the param to pull.
predict(eval_data, num_batch=None, merge_batches=True, reset=True, always_output_list=False, sparse_row_id_fn=None)
Runs prediction and collects the outputs.
When merge_batches is True (by default), the return value will be a list [out1, out2, out3], where each element is formed by concatenating the outputs for all the mini-batches. When always_output_list is False (as by default), then in the case of a single output, out1 is returned instead of [out1].
When merge_batches is False, the return value will be a nested list like [[out1_batch1, out2_batch1], [out1_batch2], ...]. This mode is useful because in some cases (e.g. bucketing), the module does not necessarily produce the same number of outputs for every mini-batch.
The objects in the results have type NDArray. If you need to work with a numpy array, just call .asnumpy() on each NDArray.
Parameters:
- eval_data (DataIter or NDArray or numpy array) – Evaluation data to run prediction on.
- num_batch (int) – Defaults to None, indicating running all the batches in the data iterator.
- merge_batches (bool) – Defaults to True; see above for return values.
- reset (bool) – Defaults to True, indicating whether we should reset the data iter before doing prediction.
- always_output_list (bool) – Defaults to False; see above for return values.
- sparse_row_id_fn (A callback function) – The function takes data_batch as an input and returns a dict of str -> NDArray. The resulting dict is used for pulling row_sparse parameters from the kvstore, where the str key is the name of the param, and the value is the row id of the param to pull.
Returns: Prediction results.
Return type: list of NDArray or list of list of NDArray
Examples
>>> # An example of using `predict` for prediction.
>>> # Predict on the first 10 batches of val_dataiter
>>> mod.predict(eval_data=val_dataiter, num_batch=10)
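The merge_batches and always_output_list rules described for predict can be modeled with plain lists. A minimal pure-Python sketch, assuming each output is a flat list of per-sample values standing in for an NDArray; collect_predictions is hypothetical and ignores details of the real method such as batch padding.

```python
def collect_predictions(batch_outputs, merge_batches=True, always_output_list=False):
    """Hypothetical sketch of how per-batch outputs could be combined.

    batch_outputs holds one entry per mini-batch; each entry is a list of that
    batch's outputs, and each output is a flat list of per-sample values
    standing in for an NDArray.
    """
    if not merge_batches:
        # Nested form: [[out1_batch1, out2_batch1], [out1_batch2], ...]
        return batch_outputs
    num_outputs = len(batch_outputs[0])
    # Concatenation only makes sense if every batch has the same number of outputs
    if any(len(outs) != num_outputs for outs in batch_outputs):
        raise ValueError("cannot merge: mini-batches produced different numbers of outputs")
    merged = []
    for i in range(num_outputs):
        values = []
        for outs in batch_outputs:
            values.extend(outs[i])  # concatenate along the batch axis
        merged.append(values)
    if num_outputs == 1 and not always_output_list:
        return merged[0]  # single output: out1 instead of [out1]
    return merged


batch_outputs = [[[0.1, 0.9]], [[0.7]]]  # two mini-batches, one output each
merged = collect_predictions(batch_outputs)
as_list = collect_predictions(batch_outputs, always_output_list=True)
nested = collect_predictions(batch_outputs, merge_batches=False)
print(merged)   # [0.1, 0.9, 0.7]
print(as_list)  # [[0.1, 0.9, 0.7]]
```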
fit(train_data, eval_data=None, eval_metric='acc', epoch_end_callback=None, batch_end_callback=None, kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), eval_end_callback=None, eval_batch_end_callback=None, initializer=, arg_params=None, aux_params=None, allow_missing=False, force_rebind=False, force_init=False, begin_epoch=0, num_epoch=None, validation_metric=None, monitor=None, sparse_row_id_fn=None)
Trains the module parameters.
Check out the Module Tutorial to see an end-to-end use case.
Parameters:
- train_data (DataIter) – Training DataIter.
- eval_data (DataIter) – If not None, will be used as the validation set, and the performance will be evaluated after each epoch.
- eval_metric (str or EvalMetric) – Defaults to ‘acc’ (accuracy). The performance measure displayed during training. Other possible predefined metrics are: ‘ce’ (CrossEntropy), ‘f1’, ‘mae’, ‘mse’, ‘rmse’, ‘top_k_accuracy’.
- epoch_end_callback (function or list of functions) – Each callback will be called with the current epoch, symbol, arg_params and aux_params.
- batch_end_callback (function or list of functions) – Each callback will be called with a BatchEndParam.
- kvstore (str or KVStore) – Defaults to ‘local’.
- optimizer (str or Optimizer) – Defaults to ‘sgd’.
- optimizer_params (dict) – Defaults to (('learning_rate', 0.01),). The parameters for the optimizer constructor. The default value is not a dict, just to avoid the pylint warning on dangerous default values.
- eval_end_callback (function or list of functions) – These will be called at the end of each full evaluation, with the metrics over the entire evaluation set.
- eval_batch_end_callback (function or list of functions) – These will be called at the end of each mini-batch during evaluation.
- initializer (Initializer) – The initializer is called to initialize the module parameters when they are not already initialized.
- arg_params (dict) – Defaults to None. If not None, should be existing parameters from a trained model or loaded from a checkpoint (previously saved model). In this case, the value here will be used to initialize the module parameters, unless they are already initialized by the user via a call to init_params or fit. arg_params has a higher priority than initializer.
- aux_params (dict) – Defaults to None. Similar to arg_params, except for auxiliary states.
- allow_missing (bool) – Defaults to False. Indicates whether to allow missing parameters when arg_params and aux_params are not None. If this is True, then the missing parameters will be initialized via the initializer.
- force_rebind (bool) – Defaults to False. Whether to force rebinding the executors if already bound.
- force_init (bool) – Defaults to False. Indicates whether to force initialization even if the parameters are already initialized.
- begin_epoch (int) – Defaults to 0. Indicates the starting epoch. Usually, if resumed from a checkpoint saved at a previous training phase at epoch N, then this value should be N+1.
- num_epoch (int) – Number of epochs for training.
- sparse_row_id_fn (A callback function) – The function takes data_batch as an input and returns a dict of str -> NDArray. The resulting dict is used for pulling row_sparse parameters from the kvstore, where the str key is the name of the param, and the value is the row id of the param to pull.
Examples
>>> # An example of using fit for training.
>>> # Assume training dataIter and validation dataIter are ready
>>> # Assume loading a previously checkpointed model
>>> sym, arg_params, aux_params = mx.model.load_checkpoint(model_prefix, 3)
>>> mod.fit(train_data=train_dataiter, eval_data=val_dataiter, optimizer='sgd',
...         optimizer_params={'learning_rate': 0.01, 'momentum': 0.9},
...         arg_params=arg_params, aux_params=aux_params,
...         eval_metric='acc', num_epoch=10, begin_epoch=3)
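To show how begin_epoch, num_epoch, and the tuple-valued optimizer_params default interact, here is a minimal pure-Python sketch of an epoch loop. It assumes the loop runs over range(begin_epoch, num_epoch), so num_epoch acts as the stopping index rather than a count of extra epochs. fit_skeleton and fake_step are hypothetical, and unlike the real epoch_end_callback (which receives epoch, symbol, arg_params and aux_params) the callback here receives only the epoch.

```python
def fit_skeleton(train_step, train_batches, num_epoch, begin_epoch=0,
                 optimizer_params=(('learning_rate', 0.01),),
                 epoch_end_callback=None):
    """Hypothetical outline of an epoch loop (not MXNet's implementation)."""
    # A tuple of (key, value) pairs and a dict both convert to the same dict
    opt_params = dict(optimizer_params)
    epochs_run = []
    for epoch in range(begin_epoch, num_epoch):
        for batch in train_batches:
            # Stands in for forward_backward(), update(), and update_metric()
            train_step(batch, opt_params)
        epochs_run.append(epoch)
        if epoch_end_callback is not None:
            epoch_end_callback(epoch)
    return epochs_run


seen_lrs = []


def fake_step(batch, opt_params):
    seen_lrs.append(opt_params['learning_rate'])


ended = []
# Resuming as in the example above: epochs 3 through 9 run, seven in total
epochs = fit_skeleton(fake_step, train_batches=[0, 1], num_epoch=10,
                      begin_epoch=3, epoch_end_callback=ended.append)
print(epochs)  # [3, 4, 5, 6, 7, 8, 9]

# A plain dict works as optimizer_params too
fit_skeleton(fake_step, train_batches=[0], num_epoch=1,
             optimizer_params={'learning_rate': 0.1})
```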
data_names
A list of names for data required by this module.
output_names
A list of names for the outputs of this module.
data_shapes
A list of (name, shape) pairs specifying the data inputs to this module.
label_shapes
A list of (name, shape) pairs specifying the label inputs to this module. If this module does not accept labels (either because it is a module without a loss function or because it is not bound for training), this should return an empty list [].
output_shapes
A list of (name, shape) pairs specifying the outputs of this module.
get_params()
Gets parameters, which are potentially copies of the actual parameters used to do computation on the device.
Returns: A pair of dictionaries, each mapping parameter names to NDArray values.
Return type: (arg_params, aux_params)
Examples
>>> # An example of getting module parameters.
>>> arg_params, aux_params = mod.get_params()
>>> sorted(arg_params.keys())
['fc1_bias', 'fc1_weight', 'fc2_bias', 'fc2_weight', 'fc3_bias', 'fc3_weight']
>>> aux_params
{}
init_params(initializer=, arg_params=None, aux_params=None, allow_missing=False, force_init=False, allow_extra=False)
Initializes the parameters and auxiliary states.
Parameters:
- initializer (Initializer) – Called to initialize parameters if needed.
- arg_params (dict) – If not None, should be a dictionary of existing arg_params. Initialization will be copied from that.
- aux_params (dict) – If not None, should be a dictionary of existing aux_params. Initialization will be copied from that.
- allow_missing (bool) – If True, params could contain missing values, and the initializer will be called to fill those missing params.
- force_init (bool) – If True, forces re-initialization even if the parameters are already initialized.
- allow_extra (boolean, optional) – Whether to allow extra parameters that are not needed by the symbol. If this is True, no error will be thrown when arg_params or aux_params contain extra parameters that are not needed by the executor.
Examples
>>> # An example of initializing module parameters.
>>> mod.init_params()
set_params(arg_params, aux_params, allow_missing=False, force_init=True, allow_extra=False)
Assigns parameter and aux state values.
Parameters:
- arg_params (dict) – Dictionary of name to value (NDArray) mapping.
- aux_params (dict) – Dictionary of name to value (NDArray) mapping.
- allow_missing (bool) – If True, params could contain missing values, and the initializer will be called to fill those missing params.
- force_init (bool) – If True, forces re-initialization even if the parameters are already initialized.
- allow_extra (boolean, optional) – Whether to allow extra parameters that are not needed by the symbol. If this is True, no error will be thrown when arg_params or aux_params contain extra parameters that are not needed by the executor.
Examples
>>> # An example of setting module parameters.
>>> sym, arg_params, aux_params = mx.model.load_checkpoint(model_prefix, n_epoch_load)
>>> mod.set_params(arg_params=arg_params, aux_params=aux_params)
save_params(fname)
Saves model parameters to file.
Parameters: fname (str) – Path to output param file.
Examples
>>> # An example of saving module parameters.
>>> mod.save_params('myfile')
load_params(fname)
Loads model parameters from file.
Parameters: fname (str) – Path to input param file.
Examples
>>> # An example of loading module parameters.
>>> mod.load_params('myfile')
get_states(merge_multi_context=True)
Gets states from all devices.
If merge_multi_context is True, returns output of the form [out1, out2]. Otherwise, it returns output of the form [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All output elements are NDArray.
Parameters: merge_multi_context (bool) – Defaults to True. In the case when data parallelism is used, the states will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look like they came from a single executor.
Return type: A list of NDArray or a list of list of NDArray.
set_states(states=None, value=None)
Sets value for states. Only one of states and value can be specified.
Parameters:
- states (list of list of NDArray) – Source states arrays formatted like [[state1_dev1, state1_dev2], [state2_dev1, state2_dev2]].
- value (number) – A single scalar value for all state arrays.
prepare(data_batch, sparse_row_id_fn=None)
Prepares the module for processing a data batch.
Usually involves switching bucket and reshaping. For modules that contain row_sparse parameters in KVStore, it prepares the row_sparse parameters based on the sparse_row_id_fn.
When KVStore is used to update parameters for multi-device or multi-machine training, a copy of the parameters is stored in KVStore. Note that for row_sparse parameters, update() updates the copy of the parameters in KVStore but doesn’t broadcast the updated parameters to all devices / machines. The prepare function is used to broadcast row_sparse parameters with the next batch of data.
Parameters:
- data_batch (DataBatch) – The current batch of data for forward computation.
- sparse_row_id_fn (A callback function) – The function takes data_batch as an input and returns a dict of str -> NDArray. The resulting dict is used for pulling row_sparse parameters from the kvstore, where the str key is the name of the param, and the value is the row id of the param to pull.
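A sparse_row_id_fn only has to report, per row_sparse parameter, which rows the upcoming batch touches. A minimal pure-Python sketch of such a callback, assuming a batch of token ids feeding an embedding table; the dict-shaped batch and the parameter name embed_weight are hypothetical, and where MXNet expects NDArray row ids this sketch returns plain lists.

```python
def sparse_row_id_fn(data_batch):
    """Hypothetical callback: for each row_sparse parameter, list the rows a batch needs.

    data_batch is modeled as a dict whose 'data' entry holds lists of token ids,
    and 'embed_weight' is an illustrative parameter name. MXNet expects NDArray
    row ids as the dict values; plain lists keep this sketch dependency-free.
    """
    token_ids = set()
    for sample in data_batch['data']:
        token_ids.update(sample)
    # Only these rows of the embedding table need to be pulled from the kvstore
    return {'embed_weight': sorted(token_ids)}


batch = {'data': [[4, 1, 4], [7, 1]]}
row_ids = sparse_row_id_fn(batch)
print(row_ids)  # {'embed_weight': [1, 4, 7]}
```

Pulling only the rows a batch needs is what keeps row_sparse parameters cheap to synchronize when the full table is large.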
forward(data_batch, is_train=None)
Forward computation. It supports data batches with different shapes, such as different batch sizes or different image sizes. If reshaping the data batch requires modifying the symbol or module, such as changing the image layout ordering or switching from training to predicting, module rebinding is required.
Parameters:
- data_batch (DataBatch) – Could be anything with a similar API implemented.
- is_train (bool) – Default is None, which means is_train takes the value of self.for_training.
Examples
>>> import mxnet as mx
>>> from collections import namedtuple
>>> Batch = namedtuple('Batch', ['data'])
>>> data = mx.sym.Variable('data')
>>> out = data * 2
>>> mod = mx.mod.Module(symbol=out, label_names=None)
>>> mod.bind(data_shapes=[('data', (1, 10))])
>>> mod.init_params()
>>> data1 = [mx.nd.ones((1, 10))]
>>> mod.forward(Batch(data1))
>>> print(mod.get_outputs()[0].asnumpy())
[[ 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]]
>>> # Forward with data batch of different shape
>>> data2 = [mx.nd.ones((3, 5))]
>>> mod.forward(Batch(data2))
>>> print(mod.get_outputs()[0].asnumpy())
[[ 2. 2. 2. 2. 2.]
 [ 2. 2. 2. 2. 2.]
 [ 2. 2. 2. 2. 2.]]
backward(out_grads=None)
Backward computation.
Parameters: out_grads (NDArray or list of NDArray, optional) – Gradient on the outputs to be propagated back. This parameter is only needed when bind is called on outputs that are not a loss function.
Examples
>>> # An example of backward computation.
>>> mod.backward()
>>> print(mod.get_input_grads()[0].asnumpy())
[[[ 1.10182791e-05 5.12257748e-06 4.01927764e-06 8.32566820e-06 -1.59775993e-06 7.24269375e-06 7.28067835e-06 -1.65902311e-05 5.46342608e-06 8.44196393e-07] ...]]
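When the bound output is not a loss, as with out = data * 2 in the forward example, the gradient flowing into the output has to be supplied through out_grads. The following pure-Python sketch shows the chain-rule step backward would apply for that particular symbol; the helper name is hypothetical and nested lists stand in for NDArrays.

```python
def backward_times_two(out_grads):
    """Hypothetical backward pass for out = data * 2.

    By the chain rule, dL/d(data) = dL/d(out) * d(out)/d(data) = 2 * dL/d(out).
    out_grads is a nested list standing in for the NDArray passed to backward().
    """
    return [[2.0 * g for g in row] for row in out_grads]


# out = data * 2 is not a loss, so the caller supplies the output gradient,
# here as if the downstream loss were the plain sum of the outputs (dL/d(out) = 1)
out_grads = [[1.0, 1.0, 1.0]]
input_grads = backward_times_two(out_grads)
print(input_grads)  # [[2.0, 2.0, 2.0]]
```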
get_outputs(merge_multi_context=True)
Gets outputs of the previous forward computation.
If merge_multi_context is True, it is like [out1, out2]. Otherwise, it returns output of the form [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements have type NDArray. When merge_multi_context is False, those NDArray instances might live on different devices.
Parameters: merge_multi_context (bool) – Defaults to True. In the case when data parallelism is used, the outputs will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look like they came from a single executor.
Returns: Output.
Return type: list of NDArray or list of list of NDArray.
Examples
>>> # An example of getting forward output.
>>> print(mod.get_outputs()[0].asnumpy())
[[ 0.09999977 0.10000153 0.10000716 0.10000195 0.09999853 0.09999743 0.10000272 0.10000113 0.09999088 0.09999888]]
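Merging with merge_multi_context=True amounts to stitching each output's per-device pieces back together in batch order; the same flag appears on get_states and get_input_grads. A minimal pure-Python sketch, assuming per-device slices are concatenated along the batch (first) axis in device order; the function and sample values are illustrative only, with lists of rows standing in for NDArrays.

```python
def merge_multi_context(per_output):
    """Hypothetical sketch of merging per-device results.

    per_output has the form [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2], ...],
    where each entry is a list of rows standing in for one device's NDArray slice.
    Returns [out1, out2, ...] with the device slices concatenated along the batch axis.
    """
    merged = []
    for device_slices in per_output:
        rows = []
        for part in device_slices:
            rows.extend(part)
        merged.append(rows)
    return merged


out1_dev1 = [[0.1, 0.9]]              # device 1 processed 1 sample
out1_dev2 = [[0.8, 0.2], [0.5, 0.5]]  # device 2 processed 2 samples
merged = merge_multi_context([[out1_dev1, out1_dev2]])
print(merged)  # [[[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]]
```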
get_input_grads(merge_multi_context=True)
Gets the gradients to the inputs, computed in the previous backward computation.
If merge_multi_context is True, it is like [grad1, grad2]. Otherwise, it is like [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]. All the output elements have type NDArray. When merge_multi_context is False, those NDArray instances might live on different devices.
Parameters: merge_multi_context (bool) – Defaults to True. In the case when data parallelism is used, the gradients will be collected from multiple devices. A True value indicates that we should merge the collected results so that they look like they came from a single executor.
Returns: Input gradients.
Return type: list of NDArray or list of list of NDArray
Examples
>>> # An example of getting input gradients.
>>> print(mod.get_input_grads()[0].asnumpy())
[[[ 1.10182791e-05 5.12257748e-06 4.01927764e-06 8.32566820e-06 -1.59775993e-06 7.24269375e-06 7.28067835e-06 -1.65902311e-05 5.46342608e-06 8.44196393e-07] ...]]
update()
Updates parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch.
When KVStore is used to update parameters for multi-device or multi-machine training, a copy of the parameters is stored in KVStore. Note that for row_sparse parameters, this function does update the copy of the parameters in KVStore, but doesn’t broadcast the updated parameters to all devices / machines. Please call prepare to broadcast row_sparse parameters with the next batch of data.
Examples
>>> # An example of updating module parameters.
>>> mod.init_optimizer(kvstore='local', optimizer='sgd',
...                    optimizer_params=(('learning_rate', 0.01), ))
>>> mod.backward()
>>> mod.update()
>>> print(mod.get_params()[0]['fc3_weight'].asnumpy())
[[ 5.86930104e-03 5.28078526e-03 -8.88729654e-03 -1.08308345e-03 6.13054074e-03 4.27560415e-03 1.53817423e-03 4.62131854e-03 4.69872449e-03 -2.42400169e-03 9.94111411e-04 1.12386420e-03 ...]]
update_metric(eval_metric, labels, pre_sliced=False)
Evaluates and accumulates evaluation metric on outputs of the last forward computation.
Parameters:
- eval_metric (EvalMetric) – Evaluation metric to use.
- labels (list of NDArray if pre_sliced is False, list of lists of NDArray otherwise) – Typically data_batch.label.
- pre_sliced (bool) – Whether the labels are already sliced per device (default: False).
Examples
>>> # An example of updating evaluation metric.
>>> mod.forward(data_batch)
>>> mod.update_metric(metric, data_batch.label)
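Because update_metric accumulates, the metric's value reflects every batch passed to it so far, not just the latest one. A minimal pure-Python sketch of that accumulation, assuming an argmax-based accuracy; RunningAccuracy and its attribute names are hypothetical and are not MXNet's EvalMetric.

```python
class RunningAccuracy:
    """Hypothetical accumulator in the spirit of an evaluation metric."""

    def __init__(self):
        self.num_correct = 0  # running count of correct predictions
        self.num_inst = 0     # running count of evaluated samples

    def update(self, labels, preds):
        # labels: true class ids; preds: per-sample class scores
        for label, scores in zip(labels, preds):
            predicted = max(range(len(scores)), key=scores.__getitem__)  # argmax
            self.num_correct += int(predicted == label)
            self.num_inst += 1

    def get(self):
        value = self.num_correct / self.num_inst if self.num_inst else float('nan')
        return ('accuracy', value)


metric = RunningAccuracy()
# Two forward passes; each update adds to the running totals
metric.update(labels=[0, 1], preds=[[0.9, 0.1], [0.3, 0.7]])  # 2 of 2 correct
metric.update(labels=[1], preds=[[0.6, 0.4]])                 # 0 of 1 correct
name, value = metric.get()
print(name, round(value, 4))  # accuracy 0.6667
```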
bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write')
Binds the symbols to construct executors. This is necessary before one can perform computation with the module.
Parameters:
- data_shapes (list of (str, tuple) or DataDesc objects) – Typically data_iter.provide_data. Can also be a list of (data name, data shape).
- label_shapes (list of (str, tuple) or DataDesc objects) – Typically data_iter.provide_label. Can also be a list of (label name, label shape).
- for_training (bool) – Default is True. Whether the executors should be bound for training.
- inputs_need_grad (bool) – Default is False. Whether the gradients to the input data need to be computed. Typically this is not needed, but it might be needed when implementing composition of modules.
- force_rebind (bool) – Default is False. This function does nothing if the executors are already bound, but if this is True, the executors will be forced to rebind.
- shared_module (Module) – Default is None. This is used in bucketing. When not None, the shared module essentially corresponds to a different bucket: a module with a different symbol but with the same sets of parameters (e.g. unrolled RNNs with different lengths).
- grad_req (str, list of str, dict of str to str) – Requirement for gradient accumulation. Can be ‘write’, ‘add’, or ‘null’ (defaults to ‘write’). Can be specified globally (str) or for each argument (list, dict).
Examples
>>> # An example of binding symbols.
>>> mod.bind(data_shapes=[('data', (1, 10, 10))])
>>> # Assume train_iter is already created.
>>> mod.bind(data_shapes=train_iter.provide_data, label_shapes=train_iter.provide_label)
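The three grad_req values differ only in what happens to a gradient buffer when a new gradient arrives. A minimal pure-Python sketch of those semantics, with lists standing in for gradient arrays; apply_grad_req is hypothetical and models the buffer, not MXNet's executor.

```python
def apply_grad_req(stored_grad, new_grad, grad_req='write'):
    """Hypothetical illustration of how a gradient buffer changes under each grad_req."""
    if grad_req == 'write':
        return list(new_grad)                                   # overwrite old values
    if grad_req == 'add':
        return [s + n for s, n in zip(stored_grad, new_grad)]  # accumulate
    if grad_req == 'null':
        return list(stored_grad)                                # no gradient is written
    raise ValueError("grad_req must be 'write', 'add', or 'null'")


stored = [1.0, 1.0]
new = [0.5, -0.5]
print(apply_grad_req(stored, new, 'write'))  # [0.5, -0.5]
print(apply_grad_req(stored, new, 'add'))    # [1.5, 0.5]
print(apply_grad_req(stored, new, 'null'))   # [1.0, 1.0]
```

In this model, 'add' keeps growing the buffer across backward calls, so it has to be cleared explicitly whenever accumulation should restart.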
init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)
Installs and initializes optimizers, as well as initializes the kvstore for distributed training.
Parameters:
- kvstore (str or KVStore) – Defaults to ‘local’.
- optimizer (str or Optimizer) – Defaults to ‘sgd’.
- optimizer_params (dict) – Defaults to (('learning_rate', 0.01),). The default value is not a dictionary, just to avoid the pylint warning of dangerous default values.
- force_init (bool) – Defaults to False. Indicates whether to force re-initializing an optimizer if it is already installed.
Examples
>>> # An example of initializing optimizer.
>>> mod.init_optimizer(optimizer='sgd', optimizer_params=(('learning_rate', 0.005),))
symbol
Gets the symbol associated with this module.
Except for Module, for other types of modules (e.g. BucketingModule), this property might not be constant throughout its lifetime. Some modules might not even be associated with any symbols.
-
class mxnet.module.Module(symbol, data_names=('data', ), label_names=('softmax_label', ), logger=logging, context=cpu(0), work_load_list=None, fixed_param_names=None, state_names=None, group2ctxs=None, compression_params=None)
Module is a basic module that wraps a Symbol. It is functionally the same as the FeedForward model, except that it uses the module API.
Parameters:
- symbol (Symbol) – The symbol that defines the computation.
- data_names (list of str) – Defaults to ('data',) for a typical model used in image classification.
- label_names (list of str) – Defaults to ('softmax_label',) for a typical model used in image classification.
- logger (Logger) – Defaults to logging.
- context (Context or list of Context) – Defaults to mx.cpu().
- work_load_list (list of number) – Defaults to None, indicating uniform workload.
- fixed_param_names (list of str) – Defaults to None, indicating that no network parameters are fixed.
- state_names (list of str) – States are similar to data and labels, but are not provided by the data iterator. Instead, they are initialized to 0 and can be set with set_states().
- group2ctxs (dict of str to context or list of context, or list of dict of str to context) – Defaults to None. Maps the ctx_group attribute to a context assignment.
- compression_params (dict) – Specifies the type of gradient compression and any additional arguments that type requires. For example, 2bit compression requires a threshold, so the arguments would be {'type': '2bit', 'threshold': 0.5}. See the mxnet.KVStore.set_gradient_compression method for more details on gradient compression.
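To make the 2bit example concrete, here is a pure-Python sketch of threshold quantization with error feedback, modelled on the commonly described 2bit scheme. Each value is sent as +threshold, -threshold, or 0, and the untransmitted remainder is carried into the next step. `quantize_2bit` is hypothetical. Details such as the `>=` comparison at the boundary are assumptions, not a copy of MXNet's kernel.

```python
def quantize_2bit(grad, residual, threshold=0.5):
    """Sketch of 2-bit gradient quantization with a carried-over residual.

    Each value plus its residual is mapped to +threshold, -threshold or 0.
    Whatever was not transmitted is kept in the new residual.
    """
    quantized, new_residual = [], []
    for g, r in zip(grad, residual):
        v = g + r
        if v >= threshold:
            q = threshold
        elif v <= -threshold:
            q = -threshold
        else:
            q = 0.0
        quantized.append(q)
        new_residual.append(v - q)
    return quantized, new_residual


res = [0.0] * 4
q, res = quantize_2bit([0.75, -0.25, -1.0, 0.25], res)
print(q, res)  # [0.5, 0.0, -0.5, 0.0] [0.25, -0.25, -0.5, 0.25]
q, res = quantize_2bit([0.25, 0.0, 0.0, 0.25], res)
print(q, res)  # [0.5, 0.0, -0.5, 0.5] [0.0, -0.25, 0.0, 0.0]
```

Small values are not lost. They accumulate in the residual until they cross the threshold, as the first and last elements do in the second step.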
-
static load(prefix, epoch, load_optimizer_states=False, **kwargs)
Creates a model from a previously saved checkpoint.
Parameters:
- prefix (str) – Path prefix of the saved model files. You should have “prefix-symbol.json”, “prefix-xxxx.params”, and optionally “prefix-xxxx.states”, where xxxx is the epoch number.
- epoch (int) – Epoch to load.
- load_optimizer_states (bool) – Whether to load optimizer states. The checkpoint must have been made with save_optimizer_states=True.
- data_names (list of str) – Defaults to ('data',) for a typical model used in image classification.
- label_names (list of str) – Defaults to ('softmax_label',) for a typical model used in image classification.
- logger (Logger) – Defaults to logging.
- context (Context or list of Context) – Defaults to cpu().
- work_load_list (list of number) – Defaults to None, indicating uniform workload.
- fixed_param_names (list of str) – Defaults to None, indicating that no network parameters are fixed.
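When work_load_list is not uniform, each device receives a share of the batch proportional to its entry. For example, a device twice as fast might be given weight 2. The sketch below shows one plausible way to turn the list into per-device batch slices, with rounding leftovers going to the last device. `split_batch` is a hypothetical helper, loosely modelled on how MXNet splits batches internally; it is not a public API.

```python
def split_batch(batch_size, work_load_list):
    """Hypothetical sketch: split one batch across devices in proportion to work_load_list."""
    total = sum(work_load_list)
    counts = [round(w * batch_size / total) for w in work_load_list]
    # Rounding can leave a few samples unassigned; give them to the last device.
    if sum(counts) < batch_size:
        counts[-1] += batch_size - sum(counts)
    slices, end = [], 0
    for n in counts:
        begin = min(end, batch_size)
        end = min(begin + n, batch_size)
        if begin >= end:
            raise ValueError('Too many devices for this batch: some slices would be empty.')
        slices.append(slice(begin, end))
    return slices


print(split_batch(10, [1, 1]))     # [slice(0, 5, None), slice(5, 10, None)]
print(split_batch(10, [1, 2, 2]))  # [slice(0, 2, None), slice(2, 6, None), slice(6, 10, None)]
```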
-
save_checkpoint(prefix, epoch, save_optimizer_states=False, remove_amp_cast=True)
Saves the current progress to a checkpoint. Use mx.callback.module_checkpoint as the epoch_end_callback to save during training.
Parameters:
- prefix (str) – The file prefix for the checkpoint.
- epoch (int) – The current epoch number.
- save_optimizer_states (bool) – Whether to save optimizer states so that training can be resumed.
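The checkpoint file names follow the pattern described for load. In the sketch below, the epoch is zero-padded to four digits to match the “xxxx” placeholder. `checkpoint_files` is a hypothetical helper. It can be used to check that save_checkpoint and load will agree on a prefix before a long training run starts.

```python
def checkpoint_files(prefix, epoch, with_states=False):
    """Hypothetical helper: the file names a checkpoint for (prefix, epoch) consists of."""
    files = ['%s-symbol.json' % prefix,            # network definition
             '%s-%04d.params' % (prefix, epoch)]   # parameters for this epoch
    if with_states:
        # Only present when save_optimizer_states=True was used.
        files.append('%s-%04d.states' % (prefix, epoch))
    return files


print(checkpoint_files('mymodel', 3, with_states=True))
# ['mymodel-symbol.json', 'mymodel-0003.params', 'mymodel-0003.states']
```

Under this layout, files written by `mod.save_checkpoint('mymodel', 3, save_optimizer_states=True)` would be read back with `mx.mod.Module.load('mymodel', 3, load_optimizer_states=True)`.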
-
data_names
A list of names for data required by this module.
-
label_names
A list of names for labels required by this module.
-
output_names
A list of names for the outputs of this module.
-
data_shapes
Gets data shapes.
Returns: A list of (name, shape) pairs.
-
label_shapes
Gets label shapes.
Returns: A list of (name, shape) pairs. The return value could be None if the module does not need labels, or if the module is not bound for training (in which case label information is not available).
-
output_shapes
Gets output shapes.
Returns: A list of (name, shape) pairs.
-
get_params()
Gets current parameters.
Returns: (arg_params, aux_params), a pair of dictionaries, each mapping parameter names to NDArray values.
-
init_params(initializer=Uniform(0.01), arg_params=None, aux_params=None, allow_missing=False, force_init=False, allow_extra=False)
Initializes the parameters and auxiliary states.
Parameters:
- initializer (Initializer) – Called to initialize parameters if needed.
- arg_params (dict) – If not None, a dictionary of existing arg_params from which initialization will be copied.
- aux_params (dict) – If not None, a dictionary of existing aux_params from which initialization will be copied.
- allow_missing (bool) – If True, params may contain missing values, and the initializer will be called to fill in those missing params.
- force_init (bool) – If True, forces re-initialization even if already initialized.
- allow_extra (boolean, optional) – Whether to allow extra parameters that are not needed by the symbol. If True, no error is thrown when arg_params or aux_params contain extra parameters that are not needed by the executor.
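The interaction between arg_params/aux_params, allow_missing, allow_extra, and the initializer amounts to a lookup with fallbacks. The sketch below is a hypothetical pure-Python model. `resolve_params` is not an MXNet function, and its exception types are its own. Supplied values take precedence; missing names fall back to the initializer only when allow_missing is True; and unexpected names are rejected unless allow_extra is True.

```python
def resolve_params(needed, provided, initializer, allow_missing=False, allow_extra=False):
    """Hypothetical model of how existing params and the initializer combine."""
    if provided is None:
        # Nothing supplied: every parameter comes from the initializer.
        return {name: initializer(name) for name in needed}
    extra = sorted(set(provided) - set(needed))
    if extra and not allow_extra:
        raise ValueError('unexpected parameters: %s' % extra)
    resolved = {}
    for name in needed:
        if name in provided:
            resolved[name] = provided[name]      # existing values take precedence
        elif allow_missing:
            resolved[name] = initializer(name)   # fill the gap
        else:
            raise KeyError('%s is missing and allow_missing is False' % name)
    return resolved


def zeros(name):
    """Toy initializer: every parameter starts at 0.0."""
    return 0.0


print(resolve_params(['fc1_weight', 'fc1_bias'], {'fc1_weight': 0.3}, zeros, allow_missing=True))
# {'fc1_weight': 0.3, 'fc1_bias': 0.0}
```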
-
set_params(arg_params, aux_params, allow_missing=False, force_init=True, allow_extra=False)
Assigns parameter and aux state values.
Parameters:
- arg_params (dict) – Dictionary of name to NDArray.
- aux_params (dict) – Dictionary of name to NDArray.
- allow_missing (bool) – If True, params may contain missing values, and the initializer will be called to fill in those missing params.
- force_init (bool) – If True, forces re-initialization even if already initialized.
- allow_extra (boolean, optional) – Whether to allow extra parameters that are not needed by the symbol. If True, no error is thrown when arg_params or aux_params contain extra parameters that are not needed by the executor.
Examples
>>> # An example of setting module parameters.
>>> sym, arg_params, aux_params = mx.model.load_checkpoint(model_prefix, n_epoch_load)
>>> mod.set_params(arg_params=arg_params, aux_params=aux_params)
-
bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write')
Binds the symbols to construct executors. This is necessary before one can perform computation with the module.
Parameters:
- data_shapes (list of (str, tuple)) – Typically data_iter.provide_data.
- label_shapes (list of (str, tuple)) – Typically data_iter.provide_label.
- for_training (bool) – Default is True. Whether the executors should be bound for training.
- inputs_need_grad (bool) – Default is False. Whether gradients with respect to the input data need to be computed. Typically this is not needed, but it might be when implementing a composition of modules.
- force_rebind (bool) – Default is False. This function does nothing if the executors are already bound, unless this is True, in which case the executors are forced to rebind.
- shared_module (Module) – Default is None. This is used in bucketing. When not None, the shared module essentially corresponds to a different bucket: a module with a different symbol but the same set of parameters (e.g. unrolled RNNs with different lengths).
- grad_req (str, list of str, dict of str to str) – Requirement for gradient accumulation. Can be ‘write’, ‘add’, or ‘null’ (defaults to ‘write’). Can be specified globally (str) or per argument (list, dict).
-
reshape(data_shapes, label_shapes=None)
Reshapes the module for new input shapes.
Parameters:
- data_shapes (list of (str, tuple)) – Typically data_iter.provide_data.
- label_shapes (list of (str, tuple)) – Typically data_iter.provide_label.
-
init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)
Installs and initializes optimizers.
Parameters:
- kvstore (str or KVStore) – Defaults to ‘local’.
- optimizer (str or Optimizer) – Defaults to ‘sgd’.
- optimizer_params (dict) – Defaults to (('learning_rate', 0.01),). The default is a tuple of pairs rather than a dictionary only to avoid pylint's warning about dangerous default values.
- force_init (bool) – Defaults to False. Whether to force re-initializing the optimizer if one is already installed.
-
borrow_optimizer(shared_module)
Borrows the optimizer from a shared module. Used in bucketing, where exactly the same optimizer (especially the kvstore) is used.
Parameters: shared_module (Module) – The module whose optimizer is borrowed.
-
forward(data_batch, is_train=None)
Forward computation. It supports data batches with different shapes, such as different batch sizes or different image sizes. If reshaping the data batch requires modifying the symbol or the module, such as changing the image layout ordering or switching from training to prediction, the module must be rebound.
See also: BaseModule.forward().
Parameters:
- data_batch (DataBatch) – Could be anything that implements a similar API.
- is_train (bool) – Default is None, which means is_train takes the value of self.for_training.
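The is_train default and the shape handling described above can be modelled as simple bookkeeping. `ForwardShapeModel` is hypothetical and performs no computation. It assumes the module compares the incoming batch shapes with the bound ones and reshapes only when they differ, for example when the final batch of an epoch is smaller.

```python
class ForwardShapeModel:
    """Hypothetical model of Module.forward's bookkeeping (no real computation)."""

    def __init__(self, bound_shapes, for_training=True):
        self.bound_shapes = tuple(bound_shapes)  # shapes the executors were bound with
        self.for_training = for_training
        self.reshape_count = 0

    def forward(self, batch_shapes, is_train=None):
        if is_train is None:
            is_train = self.for_training         # documented default
        batch_shapes = tuple(batch_shapes)
        if batch_shapes != self.bound_shapes:
            # A different batch size: reshape rather than rebind.
            self.bound_shapes = batch_shapes
            self.reshape_count += 1
        return is_train


m = ForwardShapeModel([(32, 3, 224, 224)])
print(m.forward([(32, 3, 224, 224)]))                                 # True
print(m.forward([(8, 3, 224, 224)], is_train=False), m.reshape_count)  # False 1
```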
-
backward(out_grads=None)
Backward computation.
See also: BaseModule.backward().
Parameters: out_grads (NDArray or list of NDArray, optional) – Gradient on the outputs to be propagated back. This parameter is only needed when bind is called on outputs that are not a loss function.
-
update()
Updates parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch.
When KVStore is used to update parameters for multi-device or multi-machine training, a copy of the parameters is stored in KVStore. Note that for row_sparse parameters, this function does update the copy of parameters in KVStore, but doesn't broadcast the updated parameters to all devices / machines. Please call prepare to broadcast row_sparse parameters with the next batch of data.
See also: BaseModule.update().
-
get_outputs(merge_multi_context=True)
Gets outputs of the previous forward computation.
If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray. When merge_multi_context is False, those NDArray might live on different devices.
Parameters: merge_multi_context (bool) – Default is True. When data parallelism is used, the outputs are collected from multiple devices. A True value indicates that the collected results should be merged so that they look like they came from a single executor.
Returns: Output.
Return type: list of NDArray or list of list of NDArray
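The difference between the merged and unmerged layouts is easiest to see with small data. In this sketch, plain nested lists stand in for NDArrays, and each device holds a slice of the batch. Merging concatenates each output's slices in device order; the sketch assumes this happens along the batch axis. `merge_multi_context` is a hypothetical helper written for this illustration.

```python
def merge_multi_context(outputs_per_device):
    """Sketch: concatenate each output's per-device pieces along the batch axis.

    outputs_per_device uses the unmerged layout
    [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]], where every piece is a
    list of rows (that device's slice of the batch).
    """
    return [[row for piece in pieces for row in piece] for pieces in outputs_per_device]


unmerged = [
    [[[0.1, 0.9]], [[0.8, 0.2], [0.3, 0.7]]],  # out1: 1 row on dev1, 2 rows on dev2
    [[[1.0]], [[2.0], [3.0]]],                  # out2
]
print(merge_multi_context(unmerged))
# [[[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]], [[1.0], [2.0], [3.0]]]
```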
-
get_input_grads(merge_multi_context=True)
Gets the gradients with respect to the inputs of the module.
If merge_multi_context is True, it is like [grad1, grad2]. Otherwise, it is like [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]. All the output elements are NDArray.
Parameters: merge_multi_context (bool) – Default is True. When data parallelism is used, the gradients are collected from multiple devices. A True value indicates that the collected results should be merged so that they look like they came from a single executor.
Returns: Input gradients.
Return type: list of NDArray or list of list of NDArray
-
get_states(merge_multi_context=True)
Gets states from all devices.
If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray.
Parameters: merge_multi_context (bool) – Default is True. When data parallelism is used, the states are collected from multiple devices. A True value indicates that the collected results should be merged so that they look like they came from a single executor.
Returns: States.
Return type: list of NDArray or list of list of NDArray
-
set_states(states=None, value=None)
Sets values for states. Only one of states and value can be specified.
Parameters:
- states (list of list of NDArrays) – Source state arrays formatted like [[state1_dev1, state1_dev2], [state2_dev1, state2_dev2]].
- value (number) – A single scalar value for all state arrays.
-
update_metric(eval_metric, labels, pre_sliced=False)
Evaluates and accumulates the evaluation metric on outputs of the last forward computation.
See also: BaseModule.update_metric().
Parameters:
- eval_metric (EvalMetric) – Evaluation metric to use.
- labels (list of NDArray if pre_sliced is False, list of lists of NDArray otherwise) – Typically data_batch.label.
- pre_sliced (bool) – Whether the labels are already sliced per device (default: False).
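The pre_sliced flag determines whether the module or the caller splits the labels across devices. The hypothetical sketch below shows both layouts with plain lists. With pre_sliced=False, each full-batch label list is cut using the per-device batch slices. With pre_sliced=True, the caller has already supplied one list per device, and it is used as given.

```python
def labels_per_device(labels, slices, pre_sliced=False):
    """Hypothetical sketch of the two label layouts update_metric accepts.

    labels: if pre_sliced is False, a list of full-batch label lists (one per
            label name); if pre_sliced is True, already one such list per device.
    slices: the batch slice assigned to each device.
    """
    if pre_sliced:
        return labels
    return [[label[s] for label in labels] for s in slices]


full = [[0, 1, 1, 0, 2, 2]]            # one label input, batch of 6
slices = [slice(0, 3), slice(3, 6)]    # two devices
print(labels_per_device(full, slices))  # [[[0, 1, 1]], [[0, 2, 2]]]
```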
-
save_optimizer_states(fname)
Saves the optimizer (updater) state to a file.
Parameters: fname (str) – Path to the output states file.
-
load_optimizer_states(fname)
Loads the optimizer (updater) state from a file.
Parameters: fname (str) – Path to the input states file.
-
prepare(data_batch, sparse_row_id_fn=None)
Prepares the module for processing a data batch.
Usually this involves switching buckets and reshaping. For modules that contain row_sparse parameters in KVStore, it prepares the row_sparse parameters based on sparse_row_id_fn.
When KVStore is used to update parameters for multi-device or multi-machine training, a copy of the parameters is stored in KVStore. Note that for row_sparse parameters, update() updates the copy of parameters in KVStore but doesn't broadcast the updated parameters to all devices / machines. The prepare function is used to broadcast row_sparse parameters with the next batch of data.
Parameters:
- data_batch (DataBatch) – The current batch of data for forward computation.
- sparse_row_id_fn (A callback function) – The function takes data_batch as input and returns a dict of str -> NDArray. The resulting dict is used for pulling row_sparse parameters from the kvstore, where the str key is the name of the param, and the value is the row id of the param to pull.
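A sparse_row_id_fn usually inspects the batch to find which rows of a row_sparse weight it will use, such as the token ids fed to an embedding. The callback below is hypothetical. It assumes that the first data input holds token ids and that the parameter is named 'embed_weight'. It returns plain lists instead of NDArrays so that it runs without MXNet.

```python
from collections import namedtuple

# Stand-in for a DataBatch with only the field this sketch reads.
FakeBatch = namedtuple('FakeBatch', ['data'])


def sparse_row_id_fn(data_batch):
    """Hypothetical callback: request only the embedding rows this batch uses."""
    token_ids = data_batch.data[0]
    rows = sorted({tok for sequence in token_ids for tok in sequence})
    return {'embed_weight': rows}


batch = FakeBatch(data=[[[5, 3, 5], [7, 1, 3]]])
print(sparse_row_id_fn(batch))  # {'embed_weight': [1, 3, 5, 7]}
```

In a training loop, a real version of this callback would be passed as `mod.prepare(next_batch, sparse_row_id_fn=sparse_row_id_fn)` before the next forward pass, so that only the listed rows are pulled from the kvstore.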
-
class mxnet.module.BucketingModule(sym_gen, default_bucket_key=None, logger=logging, context=cpu(0), work_load_list=None, fixed_param_names=None, state_names=None, group2ctxs=None, compression_params=None)
This module helps to deal efficiently with varying-length inputs.
Parameters:
- sym_gen (function) – A function that, when called with a bucket key, returns a triple (symbol, data_names, label_names).
- default_bucket_key (str (or any python object)) – The key for the default bucket.
- logger (Logger) – Defaults to logging.
- context (Context or list of Context) – Defaults to mx.cpu().
- work_load_list (list of number) – Defaults to None, indicating uniform workload.
- fixed_param_names (list of str) – Defaults to None, indicating that no network parameters are fixed.
- state_names (list of str) – States are similar to data and labels, but are not provided by the data iterator. Instead, they are initialized to 0 and can be set with set_states().
- group2ctxs (dict of str to context or list of context, or list of dict of str to context) – Defaults to None. Maps the ctx_group attribute to a context assignment.
- compression_params (dict) – Specifies the type of gradient compression and any additional arguments that type requires. For example, 2bit compression requires a threshold, so the arguments would be {'type': '2bit', 'threshold': 0.5}. See the mxnet.KVStore.set_gradient_compression method for more details on gradient compression.
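Calling sym_gen with a bucket key must return a (symbol, data_names, label_names) triple, and the default bucket is built first. The pure-Python model below uses a hypothetical `BucketCacheModel` class, with a string standing in for the symbol. It shows the resulting caching behaviour: each key is generated once and reused whenever switch_bucket returns to it.

```python
class BucketCacheModel:
    """Hypothetical model of how a bucketing container caches one entry per key."""

    def __init__(self, sym_gen, default_bucket_key):
        self.sym_gen = sym_gen
        self.default_bucket_key = default_bucket_key
        self.buckets = {}
        self.switch_bucket(default_bucket_key)  # the default bucket is built first

    def switch_bucket(self, bucket_key):
        if bucket_key not in self.buckets:
            # Built once per key. In MXNet the new module would share
            # parameters with the default bucket (see shared_module in bind).
            symbol, data_names, label_names = self.sym_gen(bucket_key)
            self.buckets[bucket_key] = (symbol, tuple(data_names), tuple(label_names))
        self.curr_key = bucket_key
        return self.buckets[bucket_key]


def sym_gen(seq_len):
    # Hypothetical: stands in for an RNN unrolled for seq_len steps.
    return 'rnn_unrolled_%d' % seq_len, ['data'], ['softmax_label']


cache = BucketCacheModel(sym_gen, default_bucket_key=40)
cache.switch_bucket(10)
cache.switch_bucket(40)
print(sorted(cache.buckets), cache.curr_key)  # [10, 40] 40
```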
-
data_names
A list of names for data required by this module.
-
output_names
A list of names for the outputs of this module.
-
data_shapes
Gets data shapes.
Returns: A list of (name, shape) pairs.
-
label_shapes
Gets label shapes.
Returns: A list of (name, shape) pairs. The return value could be None if the module does not need labels, or if the module is not bound for training (in which case label information is not available).
-
output_shapes
Gets output shapes.
Returns: A list of (name, shape) pairs.
-
get_params()
Gets current parameters.
Returns: (arg_params, aux_params), a pair of dictionaries, each mapping parameter names to NDArray values.
-
set_params(arg_params, aux_params, allow_missing=False, force_init=True, allow_extra=False)
Assigns parameters and aux state values.
Parameters:
- arg_params (dict) – Dictionary of name to value (NDArray) mapping.
- aux_params (dict) – Dictionary of name to value (NDArray) mapping.
- allow_missing (bool) – If True, params may contain missing values, and the initializer will be called to fill in those missing params.
- force_init (bool) – If True, forces re-initialization even if already initialized.
- allow_extra (boolean, optional) – Whether to allow extra parameters that are not needed by the symbol. If True, no error is thrown when arg_params or aux_params contain extra parameters that are not needed by the executor.
Examples
>>> # An example of setting module parameters.
>>> sym, arg_params, aux_params = mx.model.load_checkpoint(model_prefix, n_epoch_load)
>>> mod.set_params(arg_params=arg_params, aux_params=aux_params)
-
init_params(initializer=Uniform(0.01), arg_params=None, aux_params=None, allow_missing=False, force_init=False, allow_extra=False)
Initializes parameters.
Parameters:
- initializer (Initializer) – Called to initialize parameters if needed.
- arg_params (dict) – Defaults to None. Existing parameters. These take priority over the initializer.
- aux_params (dict) – Defaults to None. Existing auxiliary states. These take priority over the initializer.
- allow_missing (bool) – Allow missing values in arg_params and aux_params (if not None). In this case, missing values are filled in by the initializer.
- force_init (bool) – Defaults to False. If True, forces re-initialization even if already initialized.
- allow_extra (boolean, optional) – Whether to allow extra parameters that are not needed by the symbol. If True, no error is thrown when arg_params or aux_params contain extra parameters that are not needed by the executor.
-
get_states(merge_multi_context=True)
Gets states from all devices.
Parameters: merge_multi_context (bool) – Default is True. When data parallelism is used, the states are collected from multiple devices. A True value indicates that the collected results should be merged so that they look like they came from a single executor.
Returns: If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray.
Return type: list of NDArrays or list of list of NDArrays
-
set_states(states=None, value=None)
Sets values for states. Only one of states and value can be specified.
Parameters:
- states (list of list of NDArrays) – Source state arrays formatted like [[state1_dev1, state1_dev2], [state2_dev1, state2_dev2]].
- value (number) – A single scalar value for all state arrays.
-
bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write')
Binding for a BucketingModule means setting up the buckets and binding the executor for the default bucket key. Executors corresponding to other keys are bound afterwards with switch_bucket.
Parameters:
- data_shapes (list of (str, tuple)) – This should correspond to the symbol for the default bucket.
- label_shapes (list of (str, tuple)) – This should correspond to the symbol for the default bucket.
- for_training (bool) – Default is True.
- inputs_need_grad (bool) – Default is False.
- force_rebind (bool) – Default is False.
- shared_module (BucketingModule) – Default is None. This value is currently not used.
- grad_req (str, list of str, dict of str to str) – Requirement for gradient accumulation. Can be ‘write’, ‘add’, or ‘null’ (defaults to ‘write’). Can be specified globally (str) or per argument (list, dict).
- bucket_key (str (or any python object)) – Bucket key for binding. By default, default_bucket_key is used.
-
switch_bucket(bucket_key, data_shapes, label_shapes=None)
Switches to a different bucket. This changes self.curr_module.
Parameters:
- bucket_key (str (or any python object)) – The key of the target bucket.
- data_shapes (list of (str, tuple)) – Typically data_batch.provide_data.
- label_shapes (list of (str, tuple)) – Typically data_batch.provide_label.
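Choosing which bucket_key to switch to is usually the data iterator's job. The sketch below assumes a common convention, which BucketingModule itself does not require: use the smallest bucket that fits the sequence, and pad the sequence up to that length. `pick_bucket` and `pad_to_bucket` are hypothetical helpers.

```python
import bisect


def pick_bucket(bucket_keys, seq_len):
    """Hypothetical: choose the smallest bucket that can hold seq_len steps."""
    keys = sorted(bucket_keys)
    i = bisect.bisect_left(keys, seq_len)
    if i == len(keys):
        raise ValueError('sequence of length %d exceeds the largest bucket %d' % (seq_len, keys[-1]))
    return keys[i]


def pad_to_bucket(sequence, bucket_key, pad_id=0):
    """Right-pad a sequence of token ids to the bucket length."""
    return list(sequence) + [pad_id] * (bucket_key - len(sequence))


buckets = [10, 20, 40]
seq = [4, 8, 15, 16, 23, 42, 7, 9, 1, 2, 3, 5]  # 12 tokens
key = pick_bucket(buckets, len(seq))
print(key, len(pad_to_bucket(seq, key)))        # 20 20
```

Larger buckets waste more computation on padding, while more buckets mean more cached executors. This is why the bucket list is usually kept short.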
-
init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)
Installs and initializes optimizers.
Parameters:
- kvstore (str or KVStore) – Defaults to ‘local’.
- optimizer (str or Optimizer) – Defaults to ‘sgd’.
- optimizer_params (dict) – Defaults to (('learning_rate', 0.01),). The default is a tuple of pairs rather than a dictionary only to avoid pylint's warning about dangerous default values.
- force_init (bool) – Defaults to False. Whether to force re-initializing the optimizer if one is already installed.
-
prepare(data_batch, sparse_row_id_fn=None)
Prepares the module for processing a data batch.
Usually this involves switching buckets and reshaping. For modules that contain row_sparse parameters in KVStore, it prepares the row_sparse parameters based on sparse_row_id_fn.
Parameters:
- data_batch (DataBatch) – The current batch of data for forward computation.
- sparse_row_id_fn (A callback function) – The function takes data_batch as input and returns a dict of str -> NDArray. The resulting dict is used for pulling row_sparse parameters from the kvstore, where the str key is the name of the param, and the value is the row id of the param to pull.
-
forward(data_batch, is_train=None)
Forward computation.
Parameters:
- data_batch (DataBatch) – The current batch of data for forward computation.
- is_train (bool) – Defaults to None, in which case is_train is taken as self.for_training.
-
update()
Updates parameters according to the installed optimizer and the gradients computed in the previous forward-backward cycle.
When KVStore is used to update parameters for multi-device or multi-machine training, a copy of the parameters is stored in KVStore. Note that for row_sparse parameters, this function does update the copy of parameters in KVStore, but doesn't broadcast the updated parameters to all devices / machines. Please call prepare to broadcast row_sparse parameters with the next batch of data.
-
get_outputs(merge_multi_context=True)
Gets outputs from a previous forward computation.
Parameters: merge_multi_context (bool) – Defaults to True. When data parallelism is used, the outputs are collected from multiple devices. A True value indicates that the collected results should be merged so that they look like they came from a single executor.
Returns: If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray.
Return type: list of NDArrays or list of list of NDArrays
-
get_input_grads(merge_multi_context=True)
Gets the gradients with respect to the inputs of the module.
Parameters: merge_multi_context (bool) – Defaults to True. When data parallelism is used, the gradients are collected from multiple devices. A True value indicates that the collected results should be merged so that they look like they came from a single executor.
Returns: If merge_multi_context is True, it is like [grad1, grad2]. Otherwise, it is like [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]. All the output elements are NDArray.
Return type: list of NDArrays or list of list of NDArrays
-
update_metric(eval_metric, labels, pre_sliced=False)
Evaluates and accumulates the evaluation metric on outputs of the last forward computation.
Parameters:
- eval_metric (EvalMetric) – Evaluation metric to use.
- labels (list of NDArray) – Typically data_batch.label.
-
symbol
The symbol of the current bucket being used.
-
class mxnet.module.SequentialModule(logger=logging)
A SequentialModule is a container module that can chain multiple modules together.
Note
Building a computation graph with this kind of imperative container is less flexible and less efficient than using the symbolic graph, so it should only be used as a convenience utility.
-
add(module, **kwargs)
Adds a module to the chain.
Parameters:
- module (BaseModule) – The new module to add.
- kwargs (**keywords) – All the keyword arguments are saved as meta information for the added module. The currently known meta includes:
- take_labels: indicates whether the module expects to take labels when doing computation. Note that any module in the chain can take labels (not necessarily only the topmost one), and they all take the same labels passed from the original data batch for the SequentialModule.
Returns: This function returns self so that a series of add calls can easily be chained.
Return type: self
Examples
>>> # An example of adding two modules to a chain.
>>> seq_mod = mx.mod.SequentialModule()
>>> seq_mod.add(mod1)
>>> seq_mod.add(mod2)
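The add/take_labels contract can be modelled with plain callables. `ChainModel` and the two stage functions are hypothetical. A real SequentialModule chains BaseModule instances and feeds each module's outputs to the next one as data. The model shows the two documented behaviours: add returns self so that calls can be chained, and every stage registered with take_labels=True sees the original batch labels.

```python
class ChainModel:
    """Hypothetical model of a SequentialModule-style chain using plain callables."""

    def __init__(self):
        self._modules = []
        self._metas = []

    def add(self, module, **kwargs):
        # kwargs are stored as meta information, e.g. take_labels=True.
        self._modules.append(module)
        self._metas.append(kwargs)
        return self  # returning self allows seq.add(a).add(b)

    def forward(self, data, labels=None):
        for module, meta in zip(self._modules, self._metas):
            if meta.get('take_labels', False):
                # Every label-taking stage sees the original batch labels.
                data = module(data, labels)
            else:
                data = module(data)
        return data


def double(xs):
    return [2 * x for x in xs]


def error_vs_labels(xs, labels):
    return [x - y for x, y in zip(xs, labels)]


seq = ChainModel().add(double).add(error_vs_labels, take_labels=True)
print(seq.forward([1, 2, 3], labels=[1, 1, 1]))  # [1, 3, 5]
```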
-
data_names
A list of names for data required by this module.
-
output_names
A list of names for the outputs of this module.
-
data_shapes
Gets data shapes.
Returns: A list of (name, shape) pairs. The data shapes of the first module are the data shapes of the SequentialModule.
Return type: list
-
label_shapes
Gets label shapes.
Returns: A list of (name, shape) pairs. The return value could be None if the module does not need labels, or if the module is not bound for training (in which case label information is not available).
Return type: list
-
output_shapes
Gets output shapes.
Returns: A list of (name, shape) pairs. The output shapes of the last module are the output shapes of the SequentialModule.
Return type: list
-
get_params()
Gets current parameters.
Returns: (arg_params, aux_params), a pair of dictionaries, each mapping parameter names to NDArray values. This is a merged dictionary of all the parameters in the modules.
-
init_params(initializer=Uniform(0.01), arg_params=None, aux_params=None, allow_missing=False, force_init=False, allow_extra=False)
Initializes parameters.
Parameters:
- initializer (Initializer) – Called to initialize parameters if needed.
- arg_params (dict) – Default None. Existing parameters. These take priority over the initializer.
- aux_params (dict) – Default None. Existing auxiliary states. These take priority over the initializer.
- allow_missing (bool) – Allow missing values in arg_params and aux_params (if not None). In this case, missing values are filled in by the initializer.
- force_init (bool) – Default False. If True, forces re-initialization even if already initialized.
- allow_extra (boolean, optional) – Whether to allow extra parameters that are not needed by the symbol. If True, no error is thrown when arg_params or aux_params contain extra parameters that are not needed by the executor.
-
bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write')
Binds the symbols to construct executors. This is necessary before one can perform computation with the module.
Parameters:
- data_shapes (list of (str, tuple)) – Typically data_iter.provide_data.
- label_shapes (list of (str, tuple)) – Typically data_iter.provide_label.
- for_training (bool) – Default is True. Whether the executors should be bound for training.
- inputs_need_grad (bool) – Default is False. Whether gradients with respect to the input data need to be computed. Typically this is not needed, but it might be when implementing a composition of modules.
- force_rebind (bool) – Default is False. This function does nothing if the executors are already bound, unless this is True, in which case the executors are forced to rebind.
- shared_module (Module) – Default is None. Shared modules are currently not supported for SequentialModule.
- grad_req (str, list of str, dict of str to str) – Requirement for gradient accumulation. Can be ‘write’, ‘add’, or ‘null’ (defaults to ‘write’). Can be specified globally (str) or per argument (list, dict).
-
init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01), ), force_init=False)
Installs and initializes optimizers.
Parameters:
- kvstore (str or KVStore) – Default ‘local’.
- optimizer (str or Optimizer) – Default ‘sgd’.
- optimizer_params (dict) – Default (('learning_rate', 0.01),). The default is a tuple of pairs rather than a dictionary only to avoid pylint's warning about dangerous default values.
- force_init (bool) – Default False. Whether to force re-initializing the optimizer if one is already installed.
-
forward(data_batch, is_train=None)
Forward computation.
Parameters:
- data_batch (DataBatch) – The current batch of data for forward computation.
- is_train (bool) – Default is None, in which case is_train is taken as self.for_training.
-
update()
Updates parameters according to the installed optimizer and the gradients computed in the previous forward-backward cycle.
-
get_outputs(merge_multi_context=True)
Gets outputs from a previous forward computation.
Parameters: merge_multi_context (bool) – Default is True. When data parallelism is used, the outputs are collected from multiple devices. A True value indicates that the collected results should be merged so that they look like they came from a single executor.
Returns: If merge_multi_context is True, it is like [out1, out2]. Otherwise, it is like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]. All the output elements are NDArray.
Return type: list of NDArray or list of list of NDArray
-
get_input_grads(merge_multi_context=True)
Gets the gradients with respect to the inputs of the module.
Parameters: merge_multi_context (bool) – Default is True. When data parallelism is used, the gradients are collected from multiple devices. A True value indicates that the collected results should be merged so that they look like they came from a single executor.
Returns: If merge_multi_context is True, it is like [grad1, grad2]. Otherwise, it is like [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]. All the output elements are NDArray.
Return type: list of NDArrays or list of list of NDArrays
-
update_metric(eval_metric, labels, pre_sliced=False)
Evaluates and accumulates the evaluation metric on outputs of the last forward computation.
Parameters:
- eval_metric (EvalMetric) – Evaluation metric to use.
- labels (list of NDArray) – Typically data_batch.label.
-
class mxnet.module.PythonModule(data_names, label_names, output_names, logger=logging)
A convenient module class that implements many of the module APIs as empty functions.
Parameters:
- data_names (list of str) – Names of the data expected by the module.
- label_names (list of str) – Names of the labels expected by the module. Could be None if the module does not need labels.
- output_names (list of str) – Names of the outputs.
-
data_names
A list of names for data required by this module.
-
output_names
A list of names for the outputs of this module.
-
data_shapes
A list of (name, shape) pairs specifying the data inputs to this module.
-
label_shapes
A list of (name, shape) pairs specifying the label inputs to this module. If this module does not accept labels (either because it has no loss function or because it is not bound for training), this should return an empty list [].
-
output_shapes
A list of (name, shape) pairs specifying the outputs of this module.
-
get_params()
Gets parameters, which are potentially copies of the actual parameters used for computation on the device. Subclasses that contain parameters should override this method.
Returns: ({}, {}), a pair of empty dicts.
-
init_params
(initializer=, arg_params=None, aux_params=None, allow_missing=False, force_init=False, allow_extra=False)[source]¶ Initializes the parameters and auxiliary states. By default this function does nothing. Subclass should override this method if contains parameters.
Parameters: - initializer (Initializer) – Called to initialize parameters if needed.
- arg_params (dict) – If not
None
, should be a dictionary of existing arg_params. Initialization will be copied from that. - aux_params (dict) – If not
None
, should be a dictionary of existing aux_params. Initialization will be copied from that. - allow_missing (bool) – If
True
, params could contain missing values, and the initializer will be called to fill those missing params. - force_init (bool) – If
True
, will force re-initialize even if already initialized. - allow_extra (boolean, optional) – Whether allow extra parameters that are not needed by symbol. If this is True, no error will be thrown when arg_params or aux_params contain extra parameters that is not needed by the executor.
-
update
()[source]¶ Updates parameters according to the installed optimizer and the gradients computed in the previous forward-backward batch. Currently we do nothing here. Subclass should override this method if contains parameters.
-
update_metric(eval_metric, labels, pre_sliced=False)
Evaluates and accumulates the evaluation metric on the outputs of the last forward computation. Subclasses should override this method if needed.
Parameters:
- eval_metric (EvalMetric) – The evaluation metric to update.
- labels (list of NDArray) – Typically data_batch.label.
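update_metric accumulates across batches rather than reporting only the last one. The class below is a hypothetical, framework-free stand-in that shows this accumulation for accuracy; MXNet's EvalMetric classes use a similar update(labels, preds) / get() pair, but RunningAccuracy itself is not part of MXNet.

```python
# Hypothetical stand-in for an accuracy metric; not an MXNet class.


class RunningAccuracy:
    """Accumulate accuracy over several batches."""

    def __init__(self):
        self.num_correct = 0
        self.num_inst = 0

    def update(self, labels, preds):
        # labels: one list of class ids per batch;
        # preds: one list of per-sample score rows per batch.
        for label_batch, pred_batch in zip(labels, preds):
            for label, scores in zip(label_batch, pred_batch):
                predicted = max(range(len(scores)), key=scores.__getitem__)
                self.num_correct += int(predicted == label)
                self.num_inst += 1

    def get(self):
        # Value over everything seen so far, not just the last batch.
        return "accuracy", self.num_correct / max(self.num_inst, 1)


metric = RunningAccuracy()
metric.update([[0, 1]], [[[0.9, 0.1], [0.8, 0.2]]])  # batch 1: 1 of 2 correct
metric.update([[1]], [[[0.3, 0.7]]])                 # batch 2: 1 of 1 correct
print(metric.get())  # ('accuracy', 0.6666666666666666)
```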
-
bind(data_shapes, label_shapes=None, for_training=True, inputs_need_grad=False, force_rebind=False, shared_module=None, grad_req='write')
Binds the symbols to construct executors. This is necessary before one can perform computation with the module.
Parameters:
- data_shapes (list of (str, tuple)) – Typically data_iter.provide_data.
- label_shapes (list of (str, tuple)) – Typically data_iter.provide_label.
- for_training (bool) – Default is True. Whether the executors should be bound for training.
- inputs_need_grad (bool) – Default is False. Whether the gradients with respect to the input data need to be computed. Typically this is not needed, but it might be needed when implementing a composition of modules.
- force_rebind (bool) – Default is False. This function does nothing if the executors are already bound; with this set to True, the executors are forced to rebind.
- shared_module (Module) – Default is None. This is used in bucketing. When not None, the shared module essentially corresponds to a different bucket: a module with a different symbol but the same set of parameters (e.g. unrolled RNNs with different lengths).
- grad_req (str, list of str, dict of str to str) – Requirement for gradient accumulation. Can be 'write', 'add', or 'null' (default 'write'). Can be specified globally (str) or per argument (list, dict).
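The three accepted forms of grad_req, and the difference between 'write' and 'add', can be illustrated without MXNet. expand_grad_req and apply_grad are hypothetical helpers; in particular, treating argument names that are absent from a dict as 'null' is an assumption of this sketch, not documented behaviour.

```python
# Hypothetical helpers; not MXNet code.
VALID_REQS = {"write", "add", "null"}


def expand_grad_req(grad_req, arg_names):
    """Expand a global (str) or per-argument (list/dict) grad_req."""
    if isinstance(grad_req, str):
        reqs = {name: grad_req for name in arg_names}
    elif isinstance(grad_req, (list, tuple)):
        if len(grad_req) != len(arg_names):
            raise ValueError("need one grad_req per argument")
        reqs = dict(zip(arg_names, grad_req))
    elif isinstance(grad_req, dict):
        # Assumption of this sketch: unnamed arguments default to 'null'.
        reqs = {name: grad_req.get(name, "null") for name in arg_names}
    else:
        raise TypeError("grad_req must be a str, list or dict")
    invalid = set(reqs.values()) - VALID_REQS
    if invalid:
        raise ValueError("invalid grad_req values: %s" % sorted(invalid))
    return reqs


def apply_grad(buffer, new_grad, req):
    """Return a gradient buffer's value after one backward pass."""
    if req == "write":
        return new_grad           # overwrite whatever was stored before
    if req == "add":
        return buffer + new_grad  # accumulate across backward passes
    return buffer                 # 'null': no gradient is stored


args = ["data", "fc1_weight", "fc1_bias"]
print(expand_grad_req({"fc1_weight": "add"}, args))
# {'data': 'null', 'fc1_weight': 'add', 'fc1_bias': 'null'}

buf = 0.0
for g in (1.0, 2.0):  # two backward passes without zeroing the buffer
    buf = apply_grad(buf, g, "add")
print(buf)  # 3.0
```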
-
init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.01),), force_init=False)
Installs and initializes optimizers. By default this method does nothing; subclasses should override it if needed.
Parameters:
- kvstore (str or KVStore) – Default 'local'.
- optimizer (str or Optimizer) – Default 'sgd'.
- optimizer_params (dict) – Default (('learning_rate', 0.01),). The default value is a tuple of pairs rather than a dictionary only to avoid the pylint warning about dangerous default values.
- force_init (bool) – Default False. Whether to force re-initializing the optimizer if an optimizer is already installed.
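The tuple-of-pairs default for optimizer_params sidesteps Python's shared mutable default pitfall, which is what that pylint warning targets. The two hypothetical functions below show the difference; only their default value mirrors init_optimizer. Since dict() accepts either a tuple of pairs or a dict, callers can still pass a dictionary.

```python
# Illustration of the pitfall the tuple-of-pairs default avoids.
# Both functions are hypothetical.


def configure_bad(optimizer_params={"learning_rate": 0.01}):  # pylint: disable=dangerous-default-value
    # The default dict is created once and shared by every call,
    # so this mutation leaks into later calls.
    optimizer_params["calls"] = optimizer_params.get("calls", 0) + 1
    return optimizer_params


def configure_good(optimizer_params=(("learning_rate", 0.01),)):
    # dict() builds a fresh dictionary on every call, and the tuple
    # default itself is immutable, so nothing leaks between calls.
    params = dict(optimizer_params)
    params["calls"] = params.get("calls", 0) + 1
    return params


configure_bad()
print(configure_bad()["calls"])   # 2: state leaked from the first call
configure_good()
print(configure_good()["calls"])  # 1
```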
-
class mxnet.module.PythonLossModule(name='pyloss', data_names=('data', ), label_names=('softmax_label', ), logger=, grad_func=None)
A convenient module class that implements many of the module APIs as empty functions. It acts as a loss module: its single input is treated as scores and returned unchanged as its output, and the gradient with respect to those scores is computed by grad_func.
Parameters:
- name (str) – Name of the module. The output will be named name + '_output'.
- data_names (list of str) – Defaults to ['data']. Names of the data expected by this module. Should be a list of only one name.
- label_names (list of str) – Defaults to ['softmax_label']. Names of the labels expected by the module. Should be a list of only one name.
- grad_func (function) – Optional. If not None, should be a function that takes scores and labels, both of type NDArray, and returns the gradients with respect to the scores according to this loss function. The return value can be a numpy array or an NDArray.
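As an example of a grad_func, the function below returns the gradient of softmax cross-entropy, softmax(scores) minus the one-hot labels. The choice of loss, the name softmax_ce_grad, and the decision not to average over the batch are assumptions of this sketch. It needs NumPy and returns a NumPy array, which the description above allows; it could then be passed as grad_func=softmax_ce_grad when constructing PythonLossModule.

```python
import numpy as np


def softmax_ce_grad(scores, labels):
    """Gradient of softmax cross-entropy with respect to the raw scores.

    scores: array of shape (batch, num_classes); labels: integer class ids
    of shape (batch,). Returns softmax(scores) - one_hot(labels).
    """
    # NDArray inputs expose .asnumpy(); plain array-likes go through np.asarray.
    scores = scores.asnumpy() if hasattr(scores, "asnumpy") else np.asarray(scores)
    labels = labels.asnumpy() if hasattr(labels, "asnumpy") else np.asarray(labels)

    # Numerically stable softmax over the class axis.
    shifted = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)

    # Subtract 1 at each sample's true class: softmax - one_hot.
    grad = probs.copy()
    grad[np.arange(len(labels)), labels.astype(int)] -= 1.0
    return grad


scores = np.array([[2.0, 0.0], [0.0, 0.0]])
labels = np.array([0, 1])
grad = softmax_ce_grad(scores, labels)
print(grad[1].tolist())  # [0.5, -0.5]
```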
-
forward(data_batch, is_train=None)
Forward computation. Here we do nothing but keep a reference to the scores and the labels so that the backward computation can use them.
Parameters:
- data_batch (DataBatch) – Could be anything that implements a similar API.
- is_train (bool) – Default is None, which means is_train takes the value of self.for_training.
-
get_outputs(merge_multi_context=True)
Gets outputs of the previous forward computation. As an output loss module, we treat the inputs to this module as scores and simply return them.
Parameters: merge_multi_context (bool) – Should always be True, because we do not use multiple contexts for computing.
-
backward(out_grads=None)
Backward computation.
Parameters: out_grads (NDArray or list of NDArray, optional) – Gradient on the outputs to be propagated back. This parameter is only needed when bind is called on outputs that are not a loss function; since this module is itself a loss, leave it as None.
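The call sequence documented above (forward stores references, get_outputs returns the stored scores, backward calls grad_func) can be sketched without MXNet. TinyLossModule, Batch and l2_grad are hypothetical stand-ins, not MXNet classes, and keeping labels only when is_train is true is an assumption of this sketch.

```python
from collections import namedtuple

# Hypothetical stand-ins, not MXNet classes: Batch mimics the data/label
# fields of a DataBatch; TinyLossModule mimics PythonLossModule's contract.
Batch = namedtuple("Batch", ["data", "label"])


class TinyLossModule:
    """Framework-free sketch of the forward/get_outputs/backward contract."""

    def __init__(self, grad_func, for_training=True):
        self.grad_func = grad_func
        self.for_training = for_training
        self.scores = None
        self.labels = None
        self.scores_grad = None

    def forward(self, data_batch, is_train=None):
        if is_train is None:
            is_train = self.for_training   # documented default for is_train
        self.scores = data_batch.data[0]   # keep a reference; no computation
        if is_train:                       # sketch: labels only needed to train
            self.labels = data_batch.label[0]

    def get_outputs(self, merge_multi_context=True):
        assert merge_multi_context, "only a single context is used"
        return [self.scores]               # inputs are returned as the outputs

    def backward(self, out_grads=None):
        assert out_grads is None, "a loss module takes no output gradients"
        self.scores_grad = self.grad_func(self.scores, self.labels)


def l2_grad(scores, labels):
    """Gradient of 0.5 * (s - y)**2 with respect to s, element-wise."""
    return [s - y for s, y in zip(scores, labels)]


mod = TinyLossModule(grad_func=l2_grad)
mod.forward(Batch(data=[[0.5, 2.0]], label=[[1.0, 1.0]]))
print(mod.get_outputs())  # [[0.5, 2.0]]
mod.backward()
print(mod.scores_grad)    # [-0.5, 1.0]
```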