Backward computation.
Gradient on the outputs to be propagated back. This parameter is only needed when bind is called on outputs that are not a loss function.
Bind the symbols to construct executors. This is necessary before one can perform computation with the module.
Typically is DataIter.provideData.
Typically is DataIter.provideLabel.
Default is True. Whether the executors should be bound for training.
Default is False. Whether the gradients to the input data need to be computed. Typically this is not needed, but it might be needed when implementing composition of modules.
Default is False. This function does nothing if the executors are already bound; but if this is True, the executors will be forced to rebind.
Default is None. This is used in bucketing. When not None, the shared module essentially corresponds to a different bucket -- a module with a different symbol but the same set of parameters (e.g. unrolled RNNs with different lengths).
Requirement for gradient accumulation (globally). Can be 'write', 'add', or 'null' (defaults to 'write').
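The three gradient-accumulation modes can be illustrated with a toy sketch. This is plain Python, not the MXNet implementation; the helper name apply_grad_req is hypothetical, and plain lists stand in for NDArrays:

```python
def apply_grad_req(grad_buf, new_grad, grad_req):
    # Toy model of the three gradient-accumulation modes.
    if grad_req == "write":      # overwrite the buffer on each backward pass
        grad_buf[:] = new_grad
    elif grad_req == "add":      # accumulate gradients across backward passes
        grad_buf[:] = [g + n for g, n in zip(grad_buf, new_grad)]
    elif grad_req == "null":     # do not compute/store a gradient at all
        pass
    else:
        raise ValueError("gradReq must be 'write', 'add', or 'null'")
    return grad_buf

buf = [0.0, 0.0, 0.0]
apply_grad_req(buf, [1.0, 1.0, 1.0], "add")
apply_grad_req(buf, [1.0, 1.0, 1.0], "add")   # buf == [2.0, 2.0, 2.0]
```

The 'add' mode is what makes gradient accumulation across several backward passes possible; with 'write' each pass discards the previous gradients, and 'null' skips gradient storage entirely.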
Forward computation.
Could be anything with a similar API implemented.
Default is None, which means isTrain takes the value of this.forTraining.
Get the gradients to the inputs, computed in the previous backward computation.
In the case when data-parallelism is used, the gradients will be collected from multiple devices. The results will look like [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]; those NDArrays might live on different devices.
Get the gradients to the inputs, computed in the previous backward computation.
In the case when data-parallelism is used, the gradients will be merged from multiple devices, so they look like they came from a single executor. The results will look like [grad1, grad2].
Get outputs of the previous forward computation.
In the case when data-parallelism is used, the outputs will be collected from multiple devices. The results will look like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]; those NDArrays might live on different devices.
Get outputs of the previous forward computation.
In the case when data-parallelism is used, the outputs will be merged from multiple devices, so they look like they came from a single executor. The results will look like [out1, out2].
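The difference between the per-device ("collected") layout and the merged layout amounts to concatenating each output's per-device parts along the batch axis. A minimal sketch in plain Python (lists stand in for NDArrays; merge_multi_device is a hypothetical helper, not the MXNet API):

```python
def merge_multi_device(collected):
    # collected[i][d] holds output i as computed on device d, e.g.
    # [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]].
    # Merging concatenates each output's per-device parts along the
    # batch axis, yielding the single-executor view [out1, out2].
    return [[x for part in per_device for x in part]
            for per_device in collected]

collected = [[[1, 2], [3, 4]],      # out1, split across two devices
             [[5, 6], [7, 8]]]      # out2, split across two devices
merged = merge_multi_device(collected)   # [[1, 2, 3, 4], [5, 6, 7, 8]]
```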
Get parameters; these are potentially copies of the actual parameters used to do computation on the device.
(argParams, auxParams), a pair of dictionaries of name-to-value mappings.
Initialize the parameters and auxiliary states.
: Initializer
Called to initialize parameters if needed.
argParams : dict
If not None, should be a dictionary of existing argParams. Initialization will be copied from that.
auxParams : dict
If not None, should be a dictionary of existing auxParams. Initialization will be copied from that.
allowMissing : bool
If true, params could contain missing values, and the initializer will be called to fill those missing params.
forceInit : bool
If true, will force re-initialization even if already initialized.
allowExtra : bool
Whether to allow extra parameters that are not needed by the symbol. If this is True, no error will be thrown when argParams or auxParams contain extra parameters that are not needed by the executor.
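The interaction of the existing-parameter dictionary with allowMissing, forceInit, and allowExtra can be sketched as a small pure-Python routine. This is an illustrative reading of the documented semantics, not the MXNet implementation; all names here are hypothetical:

```python
def init_params(needed, initializer, arg_params=None,
                allow_missing=False, force_init=False,
                allow_extra=False, already=None):
    # needed: parameter names the symbol requires.
    # already: previously initialized values, if any.
    if already and not force_init:
        return dict(already)          # nothing to do unless forced
    if arg_params is None:
        # No existing values supplied: initialize everything from scratch.
        return {name: initializer(name) for name in needed}
    extra = set(arg_params) - set(needed)
    if extra and not allow_extra:
        raise ValueError("extra parameters not needed by the symbol: %s"
                         % sorted(extra))
    params = {}
    for name in needed:
        if name in arg_params:        # copy from the provided dictionary
            params[name] = arg_params[name]
        elif allow_missing:           # fill the gap with the initializer
            params[name] = initializer(name)
        else:
            raise KeyError("missing parameter: " + name)
    return params
```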
A list of (name, shape) pairs specifying the label inputs to this module.
If this module does not accept labels -- either it is a module without a loss function, or it is not bound for training -- then this should return an empty list [].
Evaluate and accumulate evaluation metric on outputs of the last forward computation.
Typically DataBatch.label.
Bind the symbols to construct executors. This is necessary before one can perform computation with the module.
Default is True. Whether the executors should be bound for training.
Default is False. Whether the gradients to the input data need to be computed. Typically this is not needed, but it might be needed when implementing composition of modules.
Default is False. This function does nothing if the executors are already bound; but if this is True, the executors will be forced to rebind.
Typically is DataIter.provideData.
Train the module parameters.
If not None, will be used as a validation set on which to evaluate the performance after each epoch.
Number of epochs to run training.
Extra parameters for training.
Forward computation.
A batch of data.
Whether it is for training or not.
Load model parameters from file.
Path to input param file.
Throws IOException if the param file is invalid.
Run prediction and collect the outputs.
DataIter to do the inference on.
Default is -1, indicating running all the batches in the data iterator.
Default is True, indicating whether we should reset the data iter before starting prediction.
The return value will be a list [out1, out2, out3], where each element is the concatenation of the outputs for all the mini-batches. The concatenation process will look like:
outputBatches = [
  [a1, a2, a3],  // batch a
  [b1, b2, b3]   // batch b
]
result = [
  NDArray,  // [a1, b1]
  NDArray,  // [a2, b2]
  NDArray   // [a3, b3]
]
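The concatenation step above amounts to transposing the batch-major nested list into output-major lists and concatenating each one. A minimal sketch in plain Python (lists stand in for NDArrays; the helper name is hypothetical):

```python
def concat_predict_outputs(output_batches):
    # output_batches[b][i] is output i of batch b, e.g.
    # [[a1, a2, a3],   # batch a
    #  [b1, b2, b3]]   # batch b
    # The result has one entry per output, each the concatenation of
    # that output across all batches: [a1+b1, a2+b2, a3+b3]
    # (here '+' is list concatenation, standing in for NDArray concat).
    return [[x for batch in per_output for x in batch]
            for per_output in zip(*output_batches)]

batches = [[[1], [2], [3]],     # batch a: out1, out2, out3
           [[10], [20], [30]]]  # batch b: out1, out2, out3
result = concat_predict_outputs(batches)   # [[1, 10], [2, 20], [3, 30]]
```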
Run prediction and collect the outputs.
Default is -1, indicating running all the batches in the data iterator.
Default is True, indicating whether we should reset the data iter before starting prediction.
The return value will be a nested list like
[[out1_batch1, out2_batch1, ...], [out1_batch2, out2_batch2, ...]]
This mode is useful because in some cases (e.g. bucketing) the module does not necessarily produce the same number of outputs for every batch.
Save model parameters to file.
Path to output param file.
Run prediction on eval_data and evaluate the performance according to eval_metric.
: DataIter
: EvalMetric
Number of batches to run. Default is Integer.MAX_VALUE, indicating to run until the DataIter finishes.
Could also be a list of functions.
Default True, indicating whether we should reset eval_data before starting evaluation.
Default 0. For compatibility, this will be passed to callbacks (if any). During training, this will correspond to the training epoch number.
Assign parameter and aux state values.
argParams : dict
Dictionary of name to value (NDArray) mapping.
auxParams : dict
Dictionary of name to value (NDArray) mapping.
allowMissing : bool
If true, params could contain missing values, and the initializer will be called to fill those missing params.
forceInit : bool
If true, will force re-initialization even if already initialized.
allowExtra : bool
Whether to allow extra parameters that are not needed by the symbol. If this is True, no error will be thrown when argParams or auxParams contain extra parameters that are not needed by the executor.
The base class of a module. A module represents a computation component. The design purpose of a module is to abstract a computation "machine" that one can run forward, backward, update parameters on, and so on. We aim to make the APIs easy to use, especially in the case when we need to use the imperative API to work with multiple modules (e.g. stochastic depth networks).
A module has several states:
- Initial state. Memory is not allocated yet; the module is not ready for computation.
- Binded. Shapes for inputs, outputs, and parameters are all known, memory is allocated, and the module is ready for computation.
- Parameters initialized. For modules with parameters, doing computation before initializing the parameters might result in undefined outputs.
- Optimizer installed. An optimizer can be installed to a module. After this, the parameters of the module can be updated according to the optimizer after gradients are computed (forward-backward).
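The states above form a simple lifecycle that implementations must enforce: bind before initializing parameters, initialize parameters before installing an optimizer, and install an optimizer before updating. A toy Python state machine (purely illustrative; the names are hypothetical, not the MXNet API) makes the ordering constraints explicit:

```python
class ToyModule:
    # Illustrates the legal ordering of module lifecycle transitions:
    # bind -> init_params -> init_optimizer -> update.
    def __init__(self):
        self.binded = False
        self.params_initialized = False
        self.optimizer_initialized = False

    def bind(self):
        # Shapes become known and memory is "allocated" here.
        self.binded = True

    def init_params(self):
        assert self.binded, "must bind before initializing parameters"
        self.params_initialized = True

    def init_optimizer(self):
        assert self.params_initialized, "must init params before the optimizer"
        self.optimizer_initialized = True

    def update(self):
        assert self.optimizer_initialized, "must install an optimizer first"
```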
In order for a module to interact with others, it should be able to report the following information in its raw stage (before being bound):
- data_names: list of strings indicating the names of required data.
- output_names: list of strings indicating the names of required outputs.
And also the following richer information after being bound:
- binded: bool, indicating whether the memory buffers needed for computation have been allocated.
- forTraining: whether the module is bound for training (if bound).
- paramsInitialized: bool, indicating whether the parameters of this module have been initialized.
- optimizerInitialized: bool, indicating whether an optimizer is defined and initialized.
- inputsNeedGrad: bool, indicating whether gradients with respect to the input data are needed. Might be useful when implementing composition of modules.
- dataShapes: a list of (name, shape). In theory, since the memory is allocated, we could directly provide the data arrays. But in the case of data parallelization, the data arrays might not be of the same shape as viewed from the external world.
- labelShapes: a list of (name, shape). This might be [] if the module does not need labels (e.g. it does not contain a loss function at the top), or if the module is not bound for training.
- outputShapes: a list of (name, shape) for the outputs of the module.
- getParams(): return a tuple (argParams, auxParams). Each of those is a dictionary of name-to-NDArray mappings. Those NDArrays always live on the CPU. The actual parameters used for computing might live on other devices (GPUs); this function will retrieve (a copy of) the latest parameters. Therefore, modifying the returned dictionaries does not change the parameters on the devices; use setParams for that.
- setParams(argParams, auxParams): assign parameters to the devices doing the computation.
- initParams(...): a more flexible interface to assign or initialize the parameters.
- bind(): prepare environment for computation.
- initOptimizer(): install an optimizer for parameter updating.
- forward(dataBatch): forward operation.
- backward(outGrads=None): backward operation.
- update(): update parameters according to the installed optimizer.
- getOutputs(): get outputs of the previous forward operation.
- getInputGrads(): get the gradients with respect to the inputs computed in the previous backward operation.
- updateMetric(metric, labels): update performance metric for the results of the previous forward computation.
- symbol: the underlying symbolic graph for this module (if any). This property is not necessarily constant. For example, for BucketingModule, this property is simply the *current* symbol being used. For other modules, this value might not be well defined.
When those intermediate-level APIs are implemented properly, the following high-level APIs will be automatically available for a module:
- fit: train the module parameters on a data set.
- predict: run prediction on a data set and collect outputs.
- score: run prediction on a data set and evaluate performance.
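The high-level fit is essentially a loop over the intermediate-level calls. A schematic in Python against a hypothetical duck-typed module (the snake_case method names mirror the intermediate API listed above; this is a sketch of the control flow, not the MXNet implementation):

```python
def fit(module, train_iter, metric, num_epoch):
    # Schematic of the high-level 'fit' in terms of the intermediate API:
    # each epoch runs forward/backward/update per batch, then reports the
    # accumulated metric. 'module' is any object exposing the methods below.
    for epoch in range(num_epoch):
        metric.reset()
        for batch in train_iter:
            module.forward(batch, is_train=True)       # compute outputs
            module.backward()                          # compute gradients
            module.update()                            # optimizer step
            module.update_metric(metric, batch.label)  # accumulate metric
        print("epoch %d: %s" % (epoch, metric.get()))
        train_iter.reset()   # rewind the data iterator for the next epoch
```

predict and score follow the same pattern with the backward/update steps dropped, which is why implementing the intermediate API is enough to get all three for free.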