Scales the gradient by a float factor.
The instances whose labels == ignore_label will be ignored during backward, if use_ignore is set to true.
If set to true, the softmax function will be computed along axis 1. This is applied when the shape of the input array differs from the shape of the label array.
Normalizes the gradient.
Multiplies gradient with output gradient element-wise.
If set to true, the softmax function will be computed along the last axis (-1).
Constant for computing a label-smoothed version of cross-entropy for the backward pass. This constant gets subtracted from the one-hot encoding of the gold label and distributed uniformly to all other labels.
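The smoothing rule described above can be illustrated with a minimal sketch. It is plain Python, not the library's implementation; the function name smoothed_target and its arguments are hypothetical. It assumes the constant is removed from the gold label's entry and split evenly across the remaining num_classes - 1 entries.

```python
def smoothed_target(label, num_classes, smooth_alpha):
    """Return a label-smoothed one-hot target as a list of floats.

    Hypothetical helper for illustration: the gold label keeps
    1 - smooth_alpha, and smooth_alpha is shared equally by the
    other num_classes - 1 labels.
    """
    if num_classes < 2:
        raise ValueError("label smoothing needs at least two classes")
    off_value = smooth_alpha / (num_classes - 1)
    target = [off_value] * num_classes
    target[label] = 1.0 - smooth_alpha
    return target


# Example: 4 classes, gold label 2, smoothing constant 0.1.
target = smoothed_target(label=2, num_classes=4, smooth_alpha=0.1)
```

With a smoothing constant of 0 this reduces to the ordinary one-hot encoding, and the entries always sum to 1.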
If set to true, the ignore_label value will not contribute to the backward gradient.
This Param Object is specifically used for SoftmaxOutput
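
As a rough illustration of how the gradient scale and the ignore-label options interact, the sketch below computes the usual softmax cross-entropy gradient (softmax output minus the one-hot label) in plain Python. It is an assumption-laden sketch, not the operator's actual code: the function names are hypothetical, normalization and label smoothing are left out, and the softmax is taken over each row of a 2-D input.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_output_grad(logits, labels, grad_scale=1.0,
                        ignore_label=-1, use_ignore=False):
    """Hypothetical sketch of the backward pass for a 2-D input.

    Each row's gradient is grad_scale * (softmax(row) - one_hot(label)).
    Rows whose label equals ignore_label get a zero gradient when
    use_ignore is true.
    """
    grads = []
    for row, label in zip(logits, labels):
        if use_ignore and label == ignore_label:
            grads.append([0.0] * len(row))
            continue
        probs = softmax(row)
        grads.append([
            grad_scale * (p - (1.0 if k == label else 0.0))
            for k, p in enumerate(probs)
        ])
    return grads


# Example: the second instance carries the ignore label and is skipped.
logits = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
labels = [2, -1]
grads = softmax_output_grad(logits, labels, grad_scale=2.0,
                            ignore_label=-1, use_ignore=True)
```

In this sketch the ignored instance contributes an all-zero row, while the first row's entries sum to zero and are scaled by the chosen factor.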