mxnet.npx.batch_norm
batch_norm(x, gamma, beta, running_mean, running_var, eps=0.001, momentum=0.9, fix_gamma=True, use_global_stats=False, output_mean_var=False, axis=1, cudnn_off=False, min_calib_range=None, max_calib_range=None, **kwargs)
Batch normalization.
Normalizes a data batch by mean and variance, and applies a scale gamma as well as an offset beta.
Assume the input has more than one dimension and we normalize along axis 1. We first compute the mean and variance along this axis:
\[\begin{split}data\_mean[i] = mean(data[:,i,:,...]) \\ data\_var[i] = var(data[:,i,:,...])\end{split}\]
Then compute the normalized output, which has the same shape as the input, as follows:
\[out[:,i,:,...] = \frac{data[:,i,:,...] - data\_mean[i]}{\sqrt{data\_var[i]+\epsilon}} * gamma[i] + beta[i]\]
Both mean and var return a scalar by treating the input as a vector.
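As an illustration only, the two formulas above can be sketched in plain Python for an input of shape (N, C, W) stored as nested lists. The function name batch_norm_axis1 is hypothetical and not part of MXNet, and the use of the population (biased) variance is an assumption of this sketch; the operator itself works on NDArrays.

```python
import math


def batch_norm_axis1(data, gamma, beta, eps=1e-3):
    """Normalize a nested list of shape (N, C, W) along axis 1 (channels).

    Hypothetical reference sketch; not the MXNet implementation.
    """
    num_channels = len(data[0])
    out = [[None] * num_channels for _ in data]
    for i in range(num_channels):
        # data[:, i, :] flattened into one vector, as in the formulas above
        values = [v for sample in data for v in sample[i]]
        mean = sum(values) / len(values)
        # Population (biased) variance: an assumption of this sketch
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        for n, sample in enumerate(data):
            out[n][i] = [(v - mean) * inv_std * gamma[i] + beta[i]
                         for v in sample[i]]
    return out


# Two samples, two channels, two values per channel
data = [[[1.0, 2.0], [10.0, 20.0]],
        [[3.0, 4.0], [30.0, 40.0]]]
out = batch_norm_axis1(data, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With gamma set to 1 and beta set to 0, each channel of out has a mean close to 0, and the output keeps the shape of the input.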
Assume the input has size k on axis 1, then both gamma and beta have shape (k,). If output_mean_var is set to true, then the operator outputs both data_mean and the inverse of data_var, which are needed for the backward pass. Note that the gradients of these two outputs are blocked.
Besides the inputs and the outputs, this operator accepts two auxiliary states, moving_mean and moving_var (listed as running_mean and running_var in the signature), which are k-length vectors. They are global statistics for the whole dataset, which are updated by:
    moving_mean = moving_mean * momentum + data_mean * (1 - momentum)
    moving_var = moving_var * momentum + data_var * (1 - momentum)
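The update rule above is an exponential moving average. A minimal sketch in plain Python, assuming the per-channel statistics are held in ordinary lists (the helper name update_moving_stats and the sample numbers are hypothetical):

```python
def update_moving_stats(moving_mean, moving_var, data_mean, data_var,
                        momentum=0.9):
    """Apply one moving-average update to k-length lists of statistics."""
    new_mean = [m * momentum + d * (1 - momentum)
                for m, d in zip(moving_mean, data_mean)]
    new_var = [v * momentum + d * (1 - momentum)
               for v, d in zip(moving_var, data_var)]
    return new_mean, new_var


# Start from mean 0 and variance 1, then see the same batch statistics 3 times
moving_mean, moving_var = [0.0, 0.0], [1.0, 1.0]
for _ in range(3):
    moving_mean, moving_var = update_moving_stats(
        moving_mean, moving_var,
        data_mean=[2.0, -1.0], data_var=[4.0, 0.25])
```

Each step moves the stored statistics a fraction (1 - momentum) of the way toward the current batch statistics, so a momentum close to 1 makes them change slowly.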
If use_global_stats is set to true, then moving_mean and moving_var are used instead of data_mean and data_var to compute the output. It is often used during inference.
The parameter axis specifies which axis of the input shape denotes the 'channel' (separately normalized groups). The default is 1. Specifying -1 sets the channel axis to be the last item in the input shape.
Both gamma and beta are learnable parameters. But if fix_gamma is true, then gamma is set to 1 and its gradient to 0.
Note
When fix_gamma is set to True, no sparse support is provided. If fix_gamma is set to False, the sparse tensors will fall back.
- Parameters
data (NDArray) – Input data to batch normalization (x in the signature)
gamma (NDArray) – gamma array
beta (NDArray) – beta array
moving_mean (NDArray) – running mean of input (running_mean in the signature)
moving_var (NDArray) – running variance of input (running_var in the signature)
eps (double, optional, default=0.001) – Epsilon to prevent division by zero. Must be no less than CUDNN_BN_MIN_EPSILON defined in cudnn.h when using cuDNN (usually 1e-5)
momentum (float, optional, default=0.9) – Momentum for moving average
fix_gamma (boolean, optional, default=1) – Fix gamma while training
use_global_stats (boolean, optional, default=0) – Whether to use global moving statistics instead of local batch statistics. This forces batch norm to act as a scale-and-shift operator.
output_mean_var (boolean, optional, default=0) – Output the mean and inverse std
axis (int, optional, default='1') – Specify which axis of the input shape is the channel axis
cudnn_off (boolean, optional, default=0) – Do not select CUDNN operator, if available
min_calib_range (float or None, optional, default=None) – The minimum scalar value in the form of float32 obtained through calibration. If present, it will be used by the quantized batch norm op to calculate the primitive scale. Note: this calib_range is used to calibrate the BN output.
max_calib_range (float or None, optional, default=None) – The maximum scalar value in the form of float32 obtained through calibration. If present, it will be used by the quantized batch norm op to calculate the primitive scale. Note: this calib_range is used to calibrate the BN output.
- Returns
out – The output of this function.
- Return type
NDArray or list of NDArrays
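To make the use_global_stats and fix_gamma flags described above concrete, here is a small plain-Python sketch for the values of a single channel. It is not the MXNet implementation: the helper normalize_channel is hypothetical, it takes scalars rather than NDArrays, and it assumes the population (biased) variance for the batch statistics.

```python
import math


def normalize_channel(values, gamma, beta, moving_mean, moving_var,
                      eps=1e-3, fix_gamma=True, use_global_stats=False):
    """Normalize the values of one channel (hypothetical reference sketch)."""
    if use_global_stats:
        # Inference-style: use the stored global statistics
        mean, var = moving_mean, moving_var
    else:
        # Training-style: use the statistics of the current batch
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
    # With fix_gamma, the scale is treated as 1 regardless of gamma
    scale = 1.0 if fix_gamma else gamma
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * scale + beta for v in values]


values = [1.0, 2.0, 3.0, 4.0]
# Batch statistics, gamma ignored because fix_gamma defaults to True
train_out = normalize_channel(values, gamma=5.0, beta=0.5,
                              moving_mean=0.0, moving_var=1.0)
# Global statistics with a learnable scale
infer_out = normalize_channel(values, gamma=5.0, beta=0.0,
                              moving_mean=0.0, moving_var=1.0,
                              fix_gamma=False, use_global_stats=True)
```

In the second call the output depends only on the stored statistics, gamma, and beta, which is why this mode reduces to a per-channel scale and shift.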
