A function that, when called with a bucket key, returns a triple (symbol, dataNames, labelNames).
The key for the default bucket.
Default is cpu().
Default None, indicating uniform workload.
Default None, indicating no network parameters are fixed.
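The generator contract described above can be sketched in plain Python. This is only an illustration, not the library's code: the function name `sym_gen`, the use of a sequence length as the bucket key, the string standing in for a real symbol, and the names "data" and "softmax_label" are all assumptions made for the example.

```python
from typing import List, Tuple


def sym_gen(bucket_key: int) -> Tuple[str, List[str], List[str]]:
    """Hypothetical generator: the bucket key is taken to be a sequence length."""
    # A real generator would build a network symbol for this bucket;
    # a descriptive string stands in for it here.
    symbol = f"rnn_unrolled_{bucket_key}_steps"
    data_names = ["data"]            # assumed input name
    label_names = ["softmax_label"]  # assumed label name
    return symbol, data_names, label_names


symbol, data_names, label_names = sym_gen(10)
```

Each distinct bucket key yields its own symbol, while the data and label names stay the same across buckets in this sketch.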
Backward computation.
Gradient on the outputs to be propagated back. This parameter is only needed when bind is called on outputs that are not a loss function.
Bind the symbols to construct executors. This is necessary before one can perform computation with the module.
Typically dataIter.provideData.
Typically dataIter.provideLabel.
Default is true. Whether the executors should be bound for training.
Default is false. Whether the gradients to the input data need to be computed. Typically this is not needed, but it may be needed when implementing composition of modules.
Default is false. This function does nothing if the executors are already bound, but when this is true the executors will be forced to rebind.
Default is None. This is used in bucketing. When not None, the shared module essentially corresponds to a different bucket: a module with a different symbol but the same set of parameters (e.g. unrolled RNNs with different lengths).
Requirement for gradient accumulation (globally). Can be 'write', 'add', or 'null' (defaults to 'write').
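The three gradient requirement values can be illustrated with a small plain-Python sketch. It assumes the usual meaning of these modes ('write' overwrites the gradient buffer, 'add' accumulates into it, 'null' leaves it untouched); the helper `apply_grad_req` is hypothetical and is not part of the module's API.

```python
def apply_grad_req(grad_buffer, new_grad, grad_req="write"):
    """Return the gradient buffer after one backward pass (illustrative only)."""
    if grad_req == "write":
        # Overwrite: the buffer holds only the most recent gradient.
        return list(new_grad)
    if grad_req == "add":
        # Accumulate: the new gradient is added to what is already stored.
        return [old + new for old, new in zip(grad_buffer, new_grad)]
    if grad_req == "null":
        # No gradient is requested, so the buffer is left as it was.
        return list(grad_buffer)
    raise ValueError(f"unknown grad_req: {grad_req!r}")


written = apply_grad_req([1.0, 2.0], [0.5, 0.5], "write")
added = apply_grad_req([1.0, 2.0], [0.5, 0.5], "add")
```

Under this reading, 'add' is the mode to choose when gradients from several backward passes should be summed before an update.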
Train the module parameters.
If not None, it will be used as the validation set, and performance will be evaluated on it after each epoch.
Number of epochs to run training.
Extra parameters for training.
Forward computation.
Input data.
Default is None, which means is_train takes the value of for_training.
Get the gradients to the inputs, computed in the previous backward computation.
When data parallelism is used, the gradients are collected from multiple devices.
The result looks like [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]]; those NDArrays might live on different devices.
Get the gradients to the inputs, computed in the previous backward computation.
When data parallelism is used, the gradients are merged from multiple devices so that they look as if they came from a single executor.
The result looks like [grad1, grad2].
Get outputs of the previous forward computation.
When data parallelism is used, the outputs are collected from multiple devices.
The result looks like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]; those NDArrays might live on different devices.
Get outputs of the previous forward computation.
When data parallelism is used, the outputs are merged from multiple devices so that they look as if they came from a single executor.
The result looks like [out1, out2].
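The difference between the per-device layout and the merged layout can be sketched as follows. This is a plain-Python illustration that assumes merging means concatenating each output's device slices along the batch (first) axis; nested lists stand in for NDArrays and the helper `merge_across_devices` is hypothetical.

```python
def merge_across_devices(per_output):
    """[[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]] -> [out1, out2]."""
    merged = []
    for device_slices in per_output:
        combined = []
        for rows in device_slices:
            # Each device holds a slice of the batch; append its rows in order.
            combined.extend(rows)
        merged.append(combined)
    return merged


# Two outputs, each split across two devices with one row per device.
collected = [
    [[[0.1, 0.9]], [[0.8, 0.2]]],  # out1_dev1, out1_dev2
    [[[1.0]], [[2.0]]],            # out2_dev1, out2_dev2
]
merged = merge_across_devices(collected)
```

The same reshaping applies to the input gradients, which are described with the same two layouts.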
Get current parameters.
(arg_params, aux_params), each a dictionary mapping names to parameters (NDArray).
(argParams, auxParams), a pair of dictionaries mapping names to values.
Install and initialize optimizers.
Default True. Whether to set rescaleGrad and idx2name for the optimizer according to executorGroup.
Default False. Whether to force re-initialization of the optimizer when one is already installed.
Initialize the parameters and auxiliary states.
Called to initialize parameters if needed.
If not None, should be a dictionary of existing arg_params. Initialization will be copied from that.
If not None, should be a dictionary of existing aux_params. Initialization will be copied from that.
If true, params could contain missing values, and the initializer will be called to fill those missing params.
If true, will force re-initialize even if already initialized.
Whether to allow extra parameters that are not needed by the symbol. If this is true, no error will be thrown when argParams or auxParams contain extra parameters that are not needed by the executor.
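How the missing-value and extra-parameter flags interact can be sketched in plain Python. This is one reading of the descriptions above, not the library's implementation: the function `init_params`, its argument names, and the choice of exceptions are assumptions, and the force re-initialization flag is left out for brevity.

```python
def init_params(names, initializer, arg_params=None,
                allow_missing=False, allow_extra=False):
    """Build a name -> value dict for the parameters listed in `names`."""
    if arg_params is not None and not allow_extra:
        extra = sorted(set(arg_params) - set(names))
        if extra:
            # Parameters the symbol does not need are rejected unless allowed.
            raise ValueError(f"unexpected parameters: {extra}")
    result = {}
    for name in names:
        if arg_params is None:
            # Nothing to copy from: every parameter comes from the initializer.
            result[name] = initializer(name)
        elif name in arg_params:
            # Copy the existing value.
            result[name] = arg_params[name]
        elif allow_missing:
            # Fill the gap with the initializer.
            result[name] = initializer(name)
        else:
            raise KeyError(f"{name} is missing from the provided parameters")
    return result


params = init_params(["w", "b"], lambda name: 0.0, {"w": 1.0}, allow_missing=True)
```

In this sketch `w` is copied from the supplied dictionary and `b` is filled in by the initializer.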
A list of (name, shape) pairs specifying the label inputs to this module.
If this module does not accept labels (either it is a module without a loss function, or it is not bound for training), this should return an empty list [].
Load model parameters from file.
Path to input param file.
IOException if the param file is invalid.
Run prediction and collect the outputs.
Default is -1, indicating running all the batches in the data iterator.
Default is True. Whether to reset the data iterator before starting prediction.
The return value is a list [out1, out2, out3], where each element is the concatenation of that output over all the mini-batches.
Run prediction and collect the outputs.
Default is -1, indicating running all the batches in the data iterator.
Default is True. Whether to reset the data iterator before starting prediction.
The return value is a nested list like [[out1_batch1, out2_batch1, ...], [out1_batch2, out2_batch2, ...]].
This mode is useful because in some cases (e.g. bucketing) the module does not necessarily produce the same number of outputs.
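The two return layouts for prediction can be contrasted with a short plain-Python sketch. It assumes a hypothetical `forward` callable that maps one batch to a list of outputs, with lists of rows standing in for NDArrays; neither function is the library's implementation.

```python
def predict_every_batch(batches, forward):
    """Nested layout: [[out1_batch1, out2_batch1], [out1_batch2, out2_batch2]]."""
    return [forward(batch) for batch in batches]


def predict(batches, forward):
    """Merged layout: [out1, out2], each concatenated over all batches."""
    per_batch = predict_every_batch(batches, forward)
    if not per_batch:
        return []
    num_outputs = len(per_batch[0])
    if any(len(outs) != num_outputs for outs in per_batch):
        # Concatenation only makes sense when every batch yields the same outputs.
        raise ValueError("batches produced different numbers of outputs")
    merged = []
    for i in range(num_outputs):
        rows = []
        for outs in per_batch:
            rows.extend(outs[i])
        merged.append(rows)
    return merged


def toy_forward(batch):
    # Hypothetical model with two outputs per batch.
    return [[x * 2 for x in batch], [x + 1 for x in batch]]


nested = predict_every_batch([[1, 2], [3]], toy_forward)
merged = predict([[1, 2], [3]], toy_forward)
```

The merged form is convenient when every batch has the same outputs; the nested form keeps working when the number of outputs differs between batches.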
Prepares a data batch for forward.
input data
Save model parameters to file.
Run prediction on eval_data and evaluate the performance according to eval_metric.
eval_data: DataIter
eval_metric: EvalMetric
Number of batches to run. Default is Integer.MAX_VALUE, meaning run until the DataIter finishes.
Could also be a list of functions.
Default True. Whether to reset eval_data before starting evaluation.
Default 0. For compatibility, this will be passed to callbacks (if any). During training, this will correspond to the training epoch number.
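The evaluation loop with a batch limit can be sketched in plain Python. This is an illustration under stated assumptions: batches are (data, labels) pairs, the metric is simple accuracy, and `score`, `predict_fn`, and `num_batch` are names chosen for the example rather than the library's signature.

```python
def score(batches, predict_fn, num_batch=None):
    """Accumulate accuracy over at most `num_batch` batches (None means all)."""
    correct = 0
    total = 0
    for i, (data, labels) in enumerate(batches):
        if num_batch is not None and i >= num_batch:
            break  # stop once the requested number of batches has been scored
        preds = predict_fn(data)
        correct += sum(1 for p, l in zip(preds, labels) if p == l)
        total += len(labels)
    return correct / total if total else float("nan")


eval_batches = [([1, 2], [1, 0]), ([3], [0])]
parity = lambda data: [x % 2 for x in data]  # hypothetical classifier
accuracy_all = score(eval_batches, parity)
accuracy_first = score(eval_batches, parity, num_batch=1)
```

Limiting the number of batches gives a quick estimate, at the cost of scoring only part of the data.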
Assign parameter and aux state values.
Dictionary mapping names to values (NDArray).
Dictionary mapping names to values (NDArray).
If true, params could contain missing values, and the initializer will be called to fill those missing params.
If true, will force re-initialize even if already initialized.
Whether to allow extra parameters that are not needed by the symbol. If this is true, no error will be thrown when argParams or auxParams contain extra parameters that are not needed by the executor.
Switches to a different bucket. This will change this._currModule.
The key of the target bucket.
Typically dataIter.provideData.
Typically dataIter.provideLabel.
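Bucket switching can be pictured as a cache of per-bucket modules that share one parameter set. The plain-Python sketch below only illustrates that idea: the class `BucketCache`, the dictionary standing in for a bound module, and the toy generator are all hypothetical and do not reflect the library's internals.

```python
def toy_sym_gen(bucket_key):
    # Hypothetical generator returning (symbol, data names, label names).
    return f"net_len_{bucket_key}", ["data"], ["softmax_label"]


class BucketCache:
    def __init__(self, sym_gen, default_bucket_key):
        self.sym_gen = sym_gen
        self.shared_params = {}  # one parameter set shared by every bucket
        self.buckets = {}        # bucket key -> stand-in for a bound module
        self.curr_key = None
        self.switch_bucket(default_bucket_key)

    def switch_bucket(self, bucket_key):
        if bucket_key not in self.buckets:
            # First use of this key: generate its symbol and attach the
            # shared parameters instead of allocating new ones.
            symbol, data_names, label_names = self.sym_gen(bucket_key)
            self.buckets[bucket_key] = {
                "symbol": symbol,
                "data_names": data_names,
                "label_names": label_names,
                "params": self.shared_params,
            }
        self.curr_key = bucket_key
        return self.buckets[bucket_key]


cache = BucketCache(toy_sym_gen, default_bucket_key=20)
module = cache.switch_bucket(10)
```

Switching back to a key that was seen before reuses the cached entry, so only the first switch to a given bucket pays the setup cost in this sketch.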
Evaluate and accumulate evaluation metric on outputs of the last forward computation.
This module helps to deal efficiently with varying-length inputs.
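One common way to handle varying-length inputs with buckets is to assign each sequence to the smallest bucket that fits it and pad up to that length. The plain-Python sketch below assumes that strategy; the helpers `choose_bucket` and `pad_to_bucket`, the bucket sizes, and the padding value are illustrative choices, not something the module prescribes.

```python
import bisect


def choose_bucket(length, buckets):
    """Return the smallest bucket size >= length; `buckets` must be sorted."""
    idx = bisect.bisect_left(buckets, length)
    if idx == len(buckets):
        raise ValueError(f"sequence of length {length} exceeds the largest bucket")
    return buckets[idx]


def pad_to_bucket(seq, buckets, pad_value=0):
    """Pad `seq` to its bucket size and return (bucket_key, padded_sequence)."""
    key = choose_bucket(len(seq), buckets)
    return key, seq + [pad_value] * (key - len(seq))


bucket_key, padded = pad_to_bucket([1, 2, 3], [5, 10, 20])
```

With a handful of bucket sizes, sequences are padded only up to the nearest bucket rather than to the longest sequence in the data set.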