Run backward on all devices. backward should be called after a call to the forward function, and cannot be called unless self.for_training is True.
out_grads : Gradient on the outputs to be propagated back. This parameter is only needed when bind is called on outputs that are not a loss function.
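As an illustration, here is a minimal sketch of the call order, written against the high-level mx.mod.Module front end rather than the executor group directly; the toy network, the shapes, and the two CPU contexts standing in for devices are all invented. out_grads is omitted because the symbol ends in a loss (SoftmaxOutput), so no output gradient needs to be supplied.

```python
import mxnet as mx

# Toy network ending in a loss, so backward() needs no out_grads.
data = mx.sym.Variable('data')
label = mx.sym.Variable('softmax_label')
net = mx.sym.FullyConnected(data=data, num_hidden=10)
net = mx.sym.SoftmaxOutput(data=net, label=label, name='softmax')

# Two CPU contexts stand in for two devices; shapes are arbitrary.
mod = mx.mod.Module(net, data_names=['data'], label_names=['softmax_label'],
                    context=[mx.cpu(0), mx.cpu(1)])
mod.bind(data_shapes=[('data', (8, 20))],
         label_shapes=[('softmax_label', (8,))],
         for_training=True)          # backward() requires for_training=True
mod.init_params()

batch = mx.io.DataBatch(data=[mx.nd.ones((8, 20))], label=[mx.nd.zeros((8,))])
mod.forward(batch, is_train=True)    # forward must precede backward
mod.backward()                       # propagates the loss gradient on every device
```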
Bind executors on their respective devices.

data_shapes : DataDesc for input data. Should be a list of (name, shape) tuples, for the shapes of data. Note the order is important and should be the same as the order in which the DataIter provides the data.
label_shapes : DataDesc for input labels.
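A small sketch of what the shape arguments can look like; the names, shapes, and the NDArrayIter used to show the provider order are illustrative only.

```python
import mxnet as mx

# Shapes given as (name, shape) tuples; the order must match the order in
# which the DataIter provides the corresponding arrays.
data_shapes = [('data', (32, 3, 32, 32))]
label_shapes = [('softmax_label', (32,))]

# Equivalently, mx.io.DataDesc entries can be used.
data_descs = [mx.io.DataDesc('data', (32, 3, 32, 32))]

# A DataIter reports the same information via provide_data / provide_label.
it = mx.io.NDArrayIter(data=mx.nd.ones((32, 3, 32, 32)),
                       label=mx.nd.zeros((32,)), batch_size=32)
print(it.provide_data)    # DataDesc for 'data', in provider order
print(it.provide_label)   # DataDesc for 'softmax_label'
```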
Split data_batch according to the workload and run forward on each device.

is_train : Hint for the backend, indicating whether we are in the training phase. Default is None, in which case the value of self.for_training will be used.
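A rough sketch of the splitting step only, with an invented mini-batch and an assumed even workload split across two CPU contexts; the real executor group also dispatches the forward call on each slice.

```python
import mxnet as mx

# Invented mini-batch and device list; an even workload split is assumed.
batch_data = mx.nd.ones((8, 20))
contexts = [mx.cpu(0), mx.cpu(1)]

# Split the batch along its first axis and move each slice to its device;
# forward would then run on each slice independently.
slices = mx.nd.split(batch_data, num_outputs=len(contexts), axis=0)
per_device = [s.as_in_context(ctx) for s, ctx in zip(slices, contexts)]
print([(d.shape, d.context) for d in per_device])   # [((4, 20), cpu(0)), ((4, 20), cpu(1))]
```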
Get the gradients with respect to the inputs, computed in the previous backward computation. When data parallelism is used, the gradients are collected from multiple devices; the result looks like [[grad1_dev1, grad1_dev2], [grad2_dev1, grad2_dev2]], and those NDArrays may live on different devices.

Get the gradients with respect to the inputs, computed in the previous backward computation. When data parallelism is used, the gradients are merged from multiple devices so that they look as if they came from a single executor; the result looks like [grad1, grad2].
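A sketch of the two layouts through the mx.mod.Module front end, with an invented network and two CPU contexts standing in for devices; note that inputs_need_grad=True must be requested at bind time before input gradients are available.

```python
import mxnet as mx

data = mx.sym.Variable('data')
label = mx.sym.Variable('softmax_label')
net = mx.sym.SoftmaxOutput(mx.sym.FullyConnected(data=data, num_hidden=10),
                           label=label, name='softmax')

mod = mx.mod.Module(net, context=[mx.cpu(0), mx.cpu(1)])
mod.bind(data_shapes=[('data', (8, 20))],
         label_shapes=[('softmax_label', (8,))],
         for_training=True, inputs_need_grad=True)   # needed for input grads
mod.init_params()

batch = mx.io.DataBatch(data=[mx.nd.ones((8, 20))], label=[mx.nd.zeros((8,))])
mod.forward(batch, is_train=True)
mod.backward()

# Per-device layout: one inner list per input, one entry per device.
per_dev = mod.get_input_grads(merge_multi_context=False)
print([[g.shape for g in grads] for grads in per_dev])   # [[(4, 20), (4, 20)]]

# Merged layout: one NDArray per input, as if from a single executor.
merged = mod.get_input_grads(merge_multi_context=True)
print([g.shape for g in merged])                         # [(8, 20)]
```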
Get the outputs of the previous forward computation. When data parallelism is used, the outputs are collected from multiple devices; the result looks like [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]], and those NDArrays may live on different devices.

Get the outputs of the previous forward computation. When data parallelism is used, the outputs are merged from multiple devices so that they look as if they came from a single executor; the result looks like [out1, out2].
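A sketch of the two result layouts using hand-made per-device arrays; the shapes are invented, and concatenation along the batch axis is used here only to illustrate what merging means.

```python
import mxnet as mx

# Invented per-device results of one forward pass: two outputs on two devices,
# each device holding its half of an 8-example mini-batch.
out1_dev1, out1_dev2 = mx.nd.ones((4, 10)), mx.nd.ones((4, 10))
out2_dev1, out2_dev2 = mx.nd.ones((4, 1)),  mx.nd.ones((4, 1))

# Collected layout: one inner list per output, one entry per device.
collected = [[out1_dev1, out1_dev2], [out2_dev1, out2_dev2]]

# Merged layout: one array per output, stitched along the batch dimension.
merged = [mx.nd.concat(*devs, dim=0) for devs in collected]
print([o.shape for o in merged])   # [(8, 10), (8, 1)]
```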
Copy data from each executor to arg_params and aux_params.

arg_params : Target parameter arrays.
aux_params : Target auxiliary-state arrays.

Note that this function will update the NDArrays in arg_params and aux_params in place.
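A sketch of the in-place copy semantics with invented arrays; averaging the per-device copies before writing into the caller's arrays is an assumption about how the per-device values are reconciled, not something stated above.

```python
import mxnet as mx

# Hypothetical per-device copies of one parameter (they normally stay in sync).
fc_weight_dev1 = mx.nd.ones((10, 20))
fc_weight_dev2 = mx.nd.ones((10, 20))

# Target dict passed in by the caller; get_params fills these arrays in place.
arg_params = {'fc_weight': mx.nd.zeros((10, 20))}
aux_params = {}

# Reduce the per-device copies (averaging assumed) and write into the target
# NDArray with copyto, so the caller's preallocated arrays are updated in place.
merged = (fc_weight_dev1.copyto(mx.cpu()) + fc_weight_dev2.copyto(mx.cpu())) / 2
merged.copyto(arg_params['fc_weight'])
print(arg_params['fc_weight'].asnumpy()[0, :3])   # now 1.0, not 0.0
```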
Reshape executors.

label_shapes : Should be a list of (name, shape) tuples, for the shapes of the labels. Note the order is important and should be the same as the order in which the DataIter provides the labels.
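A sketch of reshaping to a new batch size through the mx.mod.Module front end; the network, shapes, and contexts are illustrative.

```python
import mxnet as mx

data = mx.sym.Variable('data')
label = mx.sym.Variable('softmax_label')
net = mx.sym.SoftmaxOutput(mx.sym.FullyConnected(data=data, num_hidden=10),
                           label=label, name='softmax')

mod = mx.mod.Module(net, context=[mx.cpu(0), mx.cpu(1)])
mod.bind(data_shapes=[('data', (8, 20))],
         label_shapes=[('softmax_label', (8,))])
mod.init_params()

# Reshape the bound executors for a smaller batch, e.g. the last partial batch.
mod.reshape(data_shapes=[('data', (4, 20))],
            label_shapes=[('softmax_label', (4,))])

batch = mx.io.DataBatch(data=[mx.nd.ones((4, 20))], label=[mx.nd.zeros((4,))])
mod.forward(batch, is_train=False)
print([o.shape for o in mod.get_outputs()])   # [(4, 10)]
```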
Assign, i.e. copy, parameters to all the executors.

arg_params : A dictionary of name to NDArray parameter mapping.
aux_params : A dictionary of name to NDArray auxiliary variable mapping.
allow_extra : Whether to allow extra parameters that are not needed by the symbol. If this is True, no error will be thrown when arg_params or aux_params contain extra parameters that are not needed by the executor.
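A sketch of the assignment through the mx.mod.Module front end; the extra 'unused_weight' entry is made up, and the allow_extra keyword is assumed to be available (it is only present in reasonably recent MXNet versions).

```python
import mxnet as mx

data = mx.sym.Variable('data')
label = mx.sym.Variable('softmax_label')
net = mx.sym.SoftmaxOutput(mx.sym.FullyConnected(data=data, num_hidden=10, name='fc'),
                           label=label, name='softmax')

mod = mx.mod.Module(net, context=[mx.cpu(0), mx.cpu(1)])
mod.bind(data_shapes=[('data', (8, 20))],
         label_shapes=[('softmax_label', (8,))])
mod.init_params()

# Dictionaries of name -> NDArray; here taken from the module itself.
arg_params, aux_params = mod.get_params()
arg_params['unused_weight'] = mx.nd.zeros((1,))   # extra entry the symbol does not need

# Copies the values to every executor on every device; allow_extra=True keeps
# the spare 'unused_weight' entry from raising an error.
mod.set_params(arg_params, aux_params, allow_extra=True)
```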
Accumulate the performance according to eval_metric on all devices.

eval_metric : The metric used for evaluation.
labels : Typically comes from the label of a DataBatch.
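A sketch through the mx.mod.Module front end with an invented network; the labels are taken straight from the DataBatch, as described above.

```python
import mxnet as mx

data = mx.sym.Variable('data')
label = mx.sym.Variable('softmax_label')
net = mx.sym.SoftmaxOutput(mx.sym.FullyConnected(data=data, num_hidden=10),
                           label=label, name='softmax')

mod = mx.mod.Module(net, context=[mx.cpu(0), mx.cpu(1)])
mod.bind(data_shapes=[('data', (8, 20))],
         label_shapes=[('softmax_label', (8,))])
mod.init_params()

batch = mx.io.DataBatch(data=[mx.nd.ones((8, 20))], label=[mx.nd.zeros((8,))])
mod.forward(batch, is_train=False)

# The metric is accumulated with predictions gathered from every device.
metric = mx.metric.Accuracy()
mod.update_metric(metric, batch.label)
print(metric.get())   # ('accuracy', <value>)
```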
DataParallelExecutorGroup is a group of executors that lives on a group of devices. This is a helper class used to implement data parallelism. Each mini-batch will be split and run on the devices.
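To tie the pieces together, here is a sketch of the overall data-parallel flow through the mx.mod.Module front end, which manages a group of executors in this way internally; the network and shapes are invented, and two CPU contexts stand in for the device group. An 8-example mini-batch is split in half, each half runs on its own device, and the results can be inspected per device or merged.

```python
import mxnet as mx

data = mx.sym.Variable('data')
label = mx.sym.Variable('softmax_label')
net = mx.sym.SoftmaxOutput(mx.sym.FullyConnected(data=data, num_hidden=10),
                           label=label, name='softmax')

# Two CPU contexts stand in for two devices in the group.
mod = mx.mod.Module(net, context=[mx.cpu(0), mx.cpu(1)])
mod.bind(data_shapes=[('data', (8, 20))],
         label_shapes=[('softmax_label', (8,))],
         for_training=True)
mod.init_params()

batch = mx.io.DataBatch(data=[mx.nd.ones((8, 20))], label=[mx.nd.zeros((8,))])
mod.forward(batch, is_train=True)    # the mini-batch is split across the devices
mod.backward()

# Each device processed half of the mini-batch ...
print([[o.shape for o in outs]
       for outs in mod.get_outputs(merge_multi_context=False)])   # [[(4, 10), (4, 10)]]
# ... and the merged view looks like the result of a single executor.
print([o.shape for o in mod.get_outputs()])                       # [(8, 10)]
```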