Add a module to the chain. An example of adding two modules to a chain:
    val seqMod = new SequentialModule()
    seqMod.add(mod1).add(mod2)
The new module to add.
All the keyword arguments are saved as meta information for the added module. The currently known meta includes
This function returns this (the SequentialModule) so that a series of add calls can be chained easily.
Backward computation.
Gradient on the outputs to be propagated back. This parameter is only needed when bind is called on outputs that are not a loss function.
Bind the symbols to construct executors. This is necessary before one can perform computation with the module.
Typically dataIter.provideData.
Typically data_iter.provide_label.
Whether the executors should be bound for training. Default is true.
Whether the gradients with respect to the input data need to be computed. Default is false. This is typically not needed, but it may be required when implementing composition of modules.
Whether to force rebinding. Default is false. This function does nothing if the executors are already bound, but when this is true the executors are forced to rebind.
Default is None. This is used in bucketing. When not None, the shared module essentially corresponds to a different bucket: a module with a different symbol but the same set of parameters (e.g. unrolled RNNs with different lengths).
Requirement for gradient accumulation (globally). Can be 'write', 'add', or 'null' (defaults to 'write').
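The three gradient requirements can be illustrated with a minimal, self-contained sketch. It assumes the usual reading of these values: 'write' overwrites the gradient buffer, 'add' accumulates into it, and 'null' leaves it untouched because no gradient is requested. GradReqSketch and applyGrad are hypothetical names used only for illustration; they are not part of the module API, and plain arrays stand in for NDArray.

```scala
// Hypothetical helper, not an MXNet API: models what happens to a gradient
// buffer after one backward pass under each gradient requirement.
object GradReqSketch {
  def applyGrad(gradReq: String, buffer: Array[Float], fresh: Array[Float]): Array[Float] = {
    require(buffer.length == fresh.length, "buffer and gradient must have the same length")
    gradReq match {
      // 'write': the freshly computed gradient replaces the buffer contents.
      case "write" => fresh.clone()
      // 'add': the freshly computed gradient is accumulated into the buffer.
      case "add" => buffer.zip(fresh).map { case (b, g) => b + g }
      // 'null': no gradient is requested, so the buffer is left as it was.
      case "null" => buffer.clone()
      case other => throw new IllegalArgumentException(s"unknown gradient requirement: $other")
    }
  }
}
```

With 'add', the caller is responsible for clearing the buffer between accumulation windows, which is why 'write' is the more convenient default.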
Bind the symbols to construct executors. This is necessary before one can perform computation with the module.
Whether the executors should be bound for training. Default is true.
Whether the gradients with respect to the input data need to be computed. Default is false. This is typically not needed, but it may be required when implementing composition of modules.
Whether to force rebinding. Default is false. This function does nothing if the executors are already bound, but when this is true the executors are forced to rebind.
Typically DataIter.provideData.
A list of names for data required by this module.
Get data shapes.
The data shapes of the first module are the data shapes of a SequentialModule.
Train the module parameters.
If not None, this is used as the validation set, and the performance is evaluated on it after each epoch.
Number of epochs to run training.
Extra parameters for training.
Forward computation.
Input data.
Default is None, which means isTrain takes the value of forTraining.
Forward computation.
A batch of data.
Whether it is for training or not.
Get the gradients to the inputs, computed in the previous backward computation.
When data parallelism is used, the gradients are collected from multiple devices. The results look like [ [grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2] ], and those NDArrays might live on different devices.
Get the gradients to the inputs, computed in the previous backward computation.
When data parallelism is used, the gradients are merged from multiple devices so that they look as if they came from a single executor. The results look like [grad1, grad2].
Get outputs of the previous forward computation.
When data parallelism is used, the outputs are collected from multiple devices. The results look like [ [out1_dev1, out1_dev2], [out2_dev1, out2_dev2] ], and those NDArrays might live on different devices.
Get outputs of the previous forward computation.
When data parallelism is used, the outputs are merged from multiple devices so that they look as if they came from a single executor. The results look like [out1, out2].
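The difference between the per-device layout and the merged layout can be sketched as follows. This is an illustration only: it assumes that merging means concatenating each output's per-device slices along the batch axis, which is how a mini-batch is normally split under data parallelism. MergeSketch, Batch and mergeMultiContext are hypothetical names, and nested vectors stand in for NDArray.

```scala
// Hypothetical stand-ins, not MXNet classes.
object MergeSketch {
  // One array per device; each inner vector is one example of the mini-batch slice.
  type Batch = Vector[Vector[Float]]

  /** Turns [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]] into [out1, out2]
    * by concatenating the device slices of each output along the batch axis. */
  def mergeMultiContext(perDevice: Seq[Seq[Batch]]): Seq[Batch] =
    perDevice.map(slices => slices.foldLeft(Vector.empty[Vector[Float]])(_ ++ _))
}
```

The unmerged form avoids copying data between devices, while the merged form is easier to consume when the caller does not care where each slice was computed.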
Get current parameters.
(argParams, auxParams), each a Map from parameter name to its value (an NDArray).
Install and initialize optimizers.
Whether to set rescaleGrad and idx2name for the optimizer according to executorGroup. Default is true.
Whether to force re-initialization of the optimizer when an optimizer is already installed. Default is false.
Initialize the parameters and auxiliary states.
Called to initialize parameters if needed.
If not None, should be a dictionary of existing arg_params. Initialization will be copied from that.
If not None, should be a dictionary of existing aux_params. Initialization will be copied from that.
If true, params could contain missing values, and the initializer will be called to fill those missing params.
If true, will force re-initialization even if already initialized.
Whether to allow extra parameters that are not needed by the symbol. If this is true, no error will be thrown when argParams or auxParams contain extra parameters that are not needed by the executor.
Get label shapes.
The return value could be null if the module does not need labels, or if the module is not bound for training (in this case, label information is not available).
Load model parameters from file.
Path to input param file.
IOException if the param file is invalid.
A list of names for the outputs of this module.
Get output shapes.
The output shapes of the last module are the output shapes of a SequentialModule.
Run prediction and collect the outputs.
The DataIter to run inference on.
Default is -1, indicating running all the batches in the data iterator.
Whether to reset the data iterator before starting prediction. Default is true.
The return value will be a list [out1, out2, out3], where each element is the concatenation of that output over all the mini-batches. The concatenation process looks like this:
    outputBatches = [
      [a1, a2, a3], // batch a
      [b1, b2, b3]  // batch b
    ]
    result = [
      NDArray, // [a1, b1]
      NDArray, // [a2, b2]
      NDArray  // [a3, b3]
    ]
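The concatenation across mini-batches can be sketched in plain Scala. This is a minimal illustration of the data layout only, not the library's implementation: PredictConcatSketch, Output and concatOutputs are hypothetical names, and a Vector[Float] stands in for an NDArray.

```scala
// Hypothetical stand-ins, not MXNet classes.
object PredictConcatSketch {
  // Stand-in for an NDArray holding one output of one mini-batch.
  type Output = Vector[Float]

  /** outputBatches(b)(i) is output i of batch b.
    * result(i) is output i concatenated over all batches, in batch order. */
  def concatOutputs(outputBatches: Seq[Seq[Output]]): Seq[Output] = {
    require(outputBatches.nonEmpty, "need at least one batch")
    val numOutputs = outputBatches.head.length
    require(outputBatches.forall(_.length == numOutputs),
      "every batch must produce the same number of outputs")
    // Transpose batch-major to output-major, then join the pieces of each output.
    outputBatches.transpose.map(pieces => pieces.flatten.toVector)
  }
}
```

The check on the number of outputs reflects why this mode only works when every batch yields the same set of outputs.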
Run prediction and collect the outputs.
Default is -1, indicating running all the batches in the data iterator.
Whether to reset the data iterator before starting prediction. Default is true.
The return value will be a nested list like
    [ [out1_batch1, out2_batch1, ...], [out1_batch2, out2_batch2, ...] ]
This mode is useful because in some cases (e.g. bucketing), the module does not necessarily produce the same number of outputs for every batch.
Save model parameters to file.
Run prediction on eval_data and evaluate the performance according to eval_metric.
eval_data: DataIter
eval_metric: EvalMetric
Number of batches to run. Default is Integer.MAX_VALUE, indicating that evaluation runs until the DataIter finishes.
Could also be a list of functions.
Whether to reset eval_data before starting evaluation. Default is true.
Default 0. For compatibility, this will be passed to callbacks (if any). During training, this will correspond to the training epoch number.
Assign parameter and aux state values.
argParams: Dictionary of name to value (NDArray) mapping.
auxParams: Dictionary of name to value (NDArray) mapping.
allowMissing: If true, params could contain missing values, and the initializer will be called to fill those missing params.
forceInit: If true, will force re-initialization even if already initialized.
allowExtra: Whether to allow extra parameters that are not needed by the symbol. If this is true, no error will be thrown when argParams or auxParams contain extra parameters that are not needed by the executor.
Evaluate and accumulate evaluation metric on outputs of the last forward computation.
A SequentialModule is a container module that can chain multiple modules together. Note that building a computation graph with this kind of imperative container is less flexible and less efficient than using the symbolic graph, so it should be used only as a handy utility.
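The container idea can be sketched with toy classes. This is a simplified model under stated assumptions, not the MXNet implementation: ToyModule, Scale, SumAll and ToySequential are hypothetical, shapes are plain Vector[Int], and forward works on Vector[Float] rather than data batches. It shows three properties described in this reference: add returns the container so calls can be chained, the data shape comes from the first module, and the output shape comes from the last module.

```scala
// Toy stand-ins for illustration only; none of these are MXNet classes.
trait ToyModule {
  def dataShape: Vector[Int]
  def outputShape: Vector[Int]
  def forward(input: Vector[Float]): Vector[Float]
}

// Multiplies every element by a constant factor.
final class Scale(factor: Float, size: Int) extends ToyModule {
  val dataShape: Vector[Int] = Vector(size)
  val outputShape: Vector[Int] = Vector(size)
  def forward(input: Vector[Float]): Vector[Float] = input.map(_ * factor)
}

// Reduces the input to a single value.
final class SumAll(size: Int) extends ToyModule {
  val dataShape: Vector[Int] = Vector(size)
  val outputShape: Vector[Int] = Vector(1)
  def forward(input: Vector[Float]): Vector[Float] = Vector(input.sum)
}

final class ToySequential extends ToyModule {
  private var modules = Vector.empty[ToyModule]

  // Returns this so that add calls can be chained.
  def add(module: ToyModule): ToySequential = {
    modules = modules :+ module
    this
  }

  // The first module defines what the container accepts.
  def dataShape: Vector[Int] = {
    require(modules.nonEmpty, "no modules added")
    modules.head.dataShape
  }

  // The last module defines what the container produces.
  def outputShape: Vector[Int] = {
    require(modules.nonEmpty, "no modules added")
    modules.last.outputShape
  }

  // Each module consumes the output of the previous one.
  def forward(input: Vector[Float]): Vector[Float] =
    modules.foldLeft(input)((x, m) => m.forward(x))
}
```

Because each step runs imperatively through the previous module's result, nothing can be fused or reordered across module boundaries, which is one way to see why a single symbolic graph can be more efficient.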