layer_norm(data=None, gamma=None, beta=None, axis=None, eps=None, output_mean_var=None)

Layer normalization.

Normalizes the channels of the input tensor by mean and variance, and applies a scale gamma as well as offset beta.

Assume the input has more than one dimension and we normalize along axis 1. We first compute the mean and variance along this axis and then compute the normalized output, which has the same shape as input, as following:

\[out = \frac{data - mean(data, axis)}{\sqrt{var(data, axis) + \epsilon}} * gamma + beta\]

Both gamma and beta are learnable parameters.

Unlike BatchNorm and InstanceNorm, the mean and var are computed along the channel dimension.

Assume the input has size k on axis 1, then both gamma and beta have shape (k,). If output_mean_var is set to be true, then outputs both data_mean and data_std. Note that no gradient will be passed through these two outputs.

The parameter axis specifies which axis of the input shape denotes the ‘channel’ (separately normalized groups). The default is -1, which sets the channel axis to be the last item in the input shape.

  • data (NDArray) – Input data to layer normalization

  • gamma (NDArray) – gamma array

  • beta (NDArray) – beta array

  • axis (int, optional, default='-1') – The axis to perform layer normalization. Usually, this should be be axis of the channel dimension. Negative values means indexing from right to left.

  • eps (float, optional, default=9.99999975e-06) – An epsilon parameter to prevent division by 0.

  • output_mean_var (boolean, optional, default=0) – Output the mean and std calculated along the given axis.


out – The output of this function.

Return type

NDArray or list of NDArrays