The base class of all modules.
This module helps to deal efficiently with varying-length inputs.
DataParallelExecutorGroup is a group of executors that lives on a group of devices. This is a helper class used to implement data parallelism. Each mini-batch will be split and run on the devices.
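As a rough illustration of splitting a mini-batch across devices, the sketch below computes one contiguous slice of sample indices per device. The object and function names (`BatchSplitSketch`, `sliceBatch`) are hypothetical, and the even-split policy is an assumption for illustration; it is not necessarily how DataParallelExecutorGroup divides the work.

```scala
object BatchSplitSketch {
  // Split `batchSize` samples into one contiguous [start, end) slice per device.
  // Assumption: an even split, with earlier devices taking one extra sample
  // when the batch size is not divisible by the number of devices.
  def sliceBatch(batchSize: Int, numDevices: Int): Seq[(Int, Int)] = {
    require(numDevices > 0, "need at least one device")
    require(batchSize >= numDevices, "batch is too small to split")
    val base = batchSize / numDevices
    val extra = batchSize % numDevices
    val sizes = (0 until numDevices).map(i => if (i < extra) base + 1 else base)
    // Running sum of the sizes gives the start offset of every slice.
    val starts = sizes.scanLeft(0)(_ + _)
    starts.zip(sizes).map { case (start, size) => (start, start + size) }
  }
}
```

Each device would then run the same computation on its own slice, with the per-device results combined afterwards.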
Module is a basic module that wraps a Symbol. It is functionally the same as the FeedForward model, except under the module API.
A SequentialModule is a container module that can chain multiple modules together. Note that building a computation graph with this kind of imperative container is less flexible and less efficient than using the symbolic graph, so it should be used only as a handy utility.
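The idea of chaining can be sketched with a toy container that feeds the output of each module into the next. Everything here (`ToyModule`, `Scale`, `Shift`, `ToySequential`) is hypothetical and far simpler than the real SequentialModule; it only illustrates the composition pattern.

```scala
object ChainSketch {
  // Hypothetical stand-in for a module: maps an input vector to an output vector.
  trait ToyModule {
    def forward(input: Vector[Double]): Vector[Double]
  }

  // Multiplies every element by a constant factor.
  final class Scale(factor: Double) extends ToyModule {
    def forward(input: Vector[Double]): Vector[Double] = input.map(_ * factor)
  }

  // Adds a constant offset to every element.
  final class Shift(offset: Double) extends ToyModule {
    def forward(input: Vector[Double]): Vector[Double] = input.map(_ + offset)
  }

  // Container that passes the output of each module to the next one, in order.
  final class ToySequential(modules: Seq[ToyModule]) extends ToyModule {
    def forward(input: Vector[Double]): Vector[Double] =
      modules.foldLeft(input)((data, module) => module.forward(data))
  }
}
```

For example, `new ChainSketch.ToySequential(Seq(new ChainSketch.Scale(2.0), new ChainSketch.Shift(1.0)))` first doubles its input and then adds one.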
The base class of all modules. A module represents a computation component. A module is designed to abstract a computation "machine" on which one can run forward and backward passes, update parameters, and so on. The APIs aim to be easy to use, especially when the imperative API is needed to work with multiple modules (e.g. a stochastic depth network).
A module has several states:
- Initial state. Memory is not allocated yet, so the module is not ready for computation.
- Bound. Shapes for inputs, outputs, and parameters are all known, memory is allocated, and the module is ready for computation.
- Parameters initialized. For modules with parameters, doing computation before initializing the parameters might result in undefined outputs.
- Optimizer installed. An optimizer can be installed to a module. After this, the parameters of the module can be updated according to the optimizer after gradients are computed (forward-backward).
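The progression through these states can be sketched as a small class whose methods check the flags they depend on. `LifecycleSketch.ToyModule` is hypothetical; the flag names follow the properties documented here, but the exact precondition checks are an assumption and may differ from the real implementation.

```scala
object LifecycleSketch {
  // Hypothetical model of the module states; no memory or parameters are
  // actually allocated, only the state flags are tracked.
  final class ToyModule {
    var binded = false
    var paramsInitialized = false
    var optimizerInitialized = false

    // Initial state -> bound.
    def bind(): Unit = { binded = true }

    // Bound -> parameters initialized.
    def initParams(): Unit = {
      require(binded, "call bind() before initParams()")
      paramsInitialized = true
    }

    // Parameters initialized -> optimizer installed.
    def initOptimizer(): Unit = {
      require(binded && paramsInitialized, "bind and initialize parameters first")
      optimizerInitialized = true
    }

    // Computation needs allocated memory and initialized parameters.
    def forward(): Unit =
      require(binded && paramsInitialized, "module is not ready for computation")

    // Parameter updates need an installed optimizer.
    def update(): Unit =
      require(optimizerInitialized, "no optimizer installed")
  }
}
```

In this sketch, calling `forward()` on a fresh instance fails, while calling it after `bind()` and `initParams()` succeeds.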
For a module to interact with others, it should be able to report the following information in its raw stage (before it is bound):
- data_names: list of strings indicating the names of the required data.
- output_names: list of strings indicating the names of the required outputs.
And also the following richer information after it is bound:
- binded: bool, indicating whether the memory buffers needed for computation have been allocated.
- forTraining: whether the module is bound for training (if bound).
- paramsInitialized: bool, indicating whether the parameters of this module have been initialized.
- optimizerInitialized: bool, indicating whether an optimizer is defined and initialized.
- inputsNeedGrad: bool, indicating whether gradients with respect to the input data are needed. Might be useful when implementing composition of modules.
- dataShapes: a list of (name, shape). In theory, since the memory is allocated, we could directly provide the data arrays. But in the case of data parallelization, the data arrays might not be of the same shape as viewed from the external world.
- labelShapes: a list of (name, shape). This might be [] if the module does not need labels (e.g. it does not contain a loss function at the top), or if the module is not bound for training.
- outputShapes: a list of (name, shape) for outputs of the module.
- getParams(): return a tuple (argParams, auxParams). Each of those is a dictionary mapping names to NDArray. These NDArrays always live on the CPU. The actual parameters used for computing might live on other devices (GPUs); this function retrieves (a copy of) the latest parameters. Therefore, modifying the returned arrays does not change the parameters used for computation.
- setParams(argParams, auxParams): assign parameters to the devices doing the computation.
- initParams(...): a more flexible interface to assign or initialize the parameters.
- bind(): prepare the environment for computation.
- initOptimizer(): install an optimizer for parameter updating.
- forward(dataBatch): forward operation.
- backward(outGrads=None): backward operation.
- update(): update parameters according to the installed optimizer.
- getOutputs(): get outputs of the previous forward operation.
- getInputGrads(): get the gradients with respect to the inputs computed in the previous backward operation.
- updateMetric(metric, labels): update the performance metric for the results computed by the previous forward operation.
- symbol: the underlying symbolic graph for this module (if any). This property is not necessarily constant. For example, for BucketingModule, this property is simply the *current* symbol being used. For other modules, this value might not be well defined.
When these intermediate-level APIs are implemented properly, the following high-level APIs will be automatically available for a module:
- fit: train the module parameters on a data set.
- predict: run prediction on a data set and collect the outputs.
- score: run prediction on a data set and evaluate performance.
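To illustrate why the high-level API can be derived from the intermediate-level one, the sketch below writes `fit` and `predict` purely in terms of `forward`, `backward`, and `update`. The trait `ToyModule`, its scalar signatures, and the `Linear` model are hypothetical simplifications; the real methods operate on data batches and NDArrays.

```scala
object FitSketch {
  // Hypothetical, heavily simplified module interface; the real signatures differ.
  trait ToyModule {
    def forward(x: Double): Unit
    def backward(label: Double): Unit
    def update(): Unit
    def getOutput: Double
  }

  // Training loop that uses only the intermediate-level calls.
  def fit(module: ToyModule, data: Seq[(Double, Double)], numEpoch: Int): Unit =
    for (_ <- 0 until numEpoch; (x, y) <- data) {
      module.forward(x)
      module.backward(y)
      module.update()
    }

  // Prediction loop: run forward on every input and collect the outputs.
  def predict(module: ToyModule, inputs: Seq[Double]): Seq[Double] =
    inputs.map { x => module.forward(x); module.getOutput }

  // Toy model y = w * x, trained by gradient descent on the squared error.
  final class Linear(learningRate: Double) extends ToyModule {
    private var w = 0.0
    private var lastInput = 0.0
    private var lastOutput = 0.0
    private var grad = 0.0
    def forward(x: Double): Unit = { lastInput = x; lastOutput = w * x }
    def backward(label: Double): Unit = { grad = 2.0 * (lastOutput - label) * lastInput }
    def update(): Unit = { w -= learningRate * grad }
    def getOutput: Double = lastOutput
  }
}
```

Any class that implements the four intermediate-level methods of this toy trait can be passed to `fit` and `predict` unchanged, which is the design point made above.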