Step 4: Necessary components that are not in the network

Data and models are not the only components that you need to train a deep learning model. In this notebook, you will learn about the common components involved in training deep learning models. Here is a list of components necessary for training models in MXNet.

  1. Initialization

  2. Loss functions

    1. Built-in

    2. Custom

  3. Optimizers

  4. Metrics

from mxnet import np, npx, gluon
import mxnet as mx
from mxnet.gluon import nn

device = mx.cpu()


Initialization

In a previous notebook, you used net.initialize() to initialize the network before a forward pass. Now, you will learn about initialization in a little more detail.

First, define and initialize the sequential network from earlier. After you initialize it, print the parameters using the collect_params() method.

net = nn.Sequential()

net.add(nn.Dense(5, in_units=3, activation="relu"),
        nn.Dense(25, activation="relu"),
        nn.Dense(2))

net.initialize()
net
Sequential(
  (0): Dense(3 -> 5, Activation(relu))
  (1): Dense(-1 -> 25, Activation(relu))
  (2): Dense(-1 -> 2, linear)
)

params = net.collect_params()

for key, value in params.items():
    print(key, value)

0.weight Parameter (shape=(5, 3), dtype=float32)
0.bias Parameter (shape=(5,), dtype=float32)
1.weight Parameter (shape=(25, -1), dtype=float32)
1.bias Parameter (shape=(25,), dtype=float32)
2.weight Parameter (shape=(2, -1), dtype=float32)
2.bias Parameter (shape=(2,), dtype=float32)

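Next, print the parameters and their shapes again after the first forward pass. The -1 placeholders in the weight shapes are replaced by the input sizes inferred from the data.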

x = np.random.uniform(-1, 1, (10, 3))
net(x)  # Forward computation

params = net.collect_params()
for key, value in params.items():
    print(key, value)

0.weight Parameter (shape=(5, 3), dtype=float32)
0.bias Parameter (shape=(5,), dtype=float32)
1.weight Parameter (shape=(25, 5), dtype=float32)
1.bias Parameter (shape=(25,), dtype=float32)
2.weight Parameter (shape=(2, 25), dtype=float32)
2.bias Parameter (shape=(2,), dtype=float32)
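
The change from -1 to a concrete size can be illustrated with a small pure-Python sketch. LazyDense below is a hypothetical toy class, not part of MXNet; it only mimics the idea that an unknown input dimension is filled in when the first input arrives.

```python
import random

class LazyDense:
    """Toy dense layer (hypothetical, not an MXNet class) with a deferred input size."""

    def __init__(self, units, in_units=-1):
        self.units = units
        self.in_units = in_units  # -1 means "unknown until the first input arrives"
        self.weight = None

    def shape(self):
        return (self.units, self.in_units)

    def __call__(self, row):
        if self.weight is None:
            # Infer the missing dimension from the first input, then allocate.
            self.in_units = len(row)
            self.weight = [[random.uniform(-0.07, 0.07) for _ in range(self.in_units)]
                           for _ in range(self.units)]
        # Plain matrix-vector product, without bias or activation.
        return [sum(w * x for w, x in zip(w_row, row)) for w_row in self.weight]

layers = [LazyDense(5, in_units=3), LazyDense(25), LazyDense(2)]
shapes_before = [layer.shape() for layer in layers]

out = [0.1, -0.4, 0.7]  # one record with three features
for layer in layers:
    out = layer(out)
shapes_after = [layer.shape() for layer in layers]

print(shapes_before)  # [(5, 3), (25, -1), (2, -1)]
print(shapes_after)   # [(5, 3), (25, 5), (2, 25)]
```

The shapes mirror the two parameter listings above; how MXNet implements deferred initialization internally is not shown here.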

Built-in Initialization

MXNet makes initialization easy by providing many common initializers. The subset that you will use in the following sections includes:

  • Constant

  • Normal

For more information, see Initializers.

When you call net.initialize() without arguments, MXNet by default draws the weight matrices from a uniform distribution between -0.07 and 0.07 and sets all bias parameters to 0.
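
As a rough illustration of that default, the following sketch draws a 5 x 3 weight matrix and a zero bias vector with the Python standard library. The helper uniform_init is hypothetical and only imitates the described behavior; it does not call MXNet.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def uniform_init(rows, cols, scale=0.07):
    """Draw a rows x cols weight matrix uniformly from [-scale, scale]."""
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

weight = uniform_init(5, 3)  # same shape as the first layer's weight above
bias = [0.0] * 5             # biases start at zero

print(weight[0])
print(bias)
```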

To initialize your network using different built-in types, you have to use the init keyword argument in the initialize() method. Here is an example using constant and normal initialization.

from mxnet import init

# Constant init initializes the weights to be a constant value for all the params
net.initialize(init=init.Constant(3), device=device)
print(net[0].weight.data()[0])  # first row of the first layer's weights
[ 0.0110525  -0.01441184 -0.05202791]
/work/mxnet/python/mxnet/ UserWarning: Parameter 'weight' is already initialized, ignoring. Set force_reinit=True to re-initialize.
  return func(*args, **kwargs)
/work/mxnet/python/mxnet/ UserWarning: Parameter 'bias' is already initialized, ignoring. Set force_reinit=True to re-initialize.
  return func(*args, **kwargs)

The warnings above show that the Constant initializer was ignored because the network had already been initialized, which is why the printed weights are not 3. To reinitialize parameters that are already initialized, set the force_reinit flag to True. The Normal initializer draws the weights from a normal distribution with a mean of zero and a standard deviation of sigma.

net.initialize(init=init.Normal(sigma=0.2), force_reinit=True, device=device)
print(net[0].weight.data()[0])  # first row of the first layer's weights
[0.04093673 0.04288622 0.44828448]
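
The force_reinit behavior can be summarized with a toy model. ToyParam is a hypothetical class written for this sketch, not an MXNet type; it assumes only what the warnings above state, namely that a second initialization is ignored unless force_reinit=True.

```python
import random
import warnings

class ToyParam:
    """Hypothetical parameter that ignores repeated initialization by default."""

    def __init__(self, size):
        self.size = size
        self.data = None

    def initialize(self, init, force_reinit=False):
        if self.data is not None and not force_reinit:
            warnings.warn("Parameter is already initialized, ignoring.")
            return
        self.data = [init() for _ in range(self.size)]

random.seed(0)
p = ToyParam(3)
p.initialize(lambda: random.uniform(-0.07, 0.07))  # first initialization
first = list(p.data)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    p.initialize(lambda: 3.0)  # ignored: the values stay as they were
ignored = (p.data == first)

p.initialize(lambda: random.gauss(0.0, 0.2), force_reinit=True)  # values are replaced
replaced = (p.data != first)

print(ignored, replaced)  # True True
```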

Components used in a training loop

So far, you have seen how to create a model and how to initialize it using MXNet APIs, and you have learned the basics of using MXNet. The remaining question is how the model actually learns during training.

There are three main components for training an algorithm.

  1. Loss function: calculates how far the model is from the true distribution

  2. Autograd: the MXNet automatic differentiation tool that calculates the gradients used to optimize the parameters

  3. Optimizer: updates the parameters based on an optimization algorithm

You have already learned about autograd in the previous notebook. In this notebook, you will learn more about loss functions and optimizers.
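
To show how the three components fit together, here is a minimal pure-Python sketch of a training loop for a one-parameter model. All names (loss_fn, grad_fn, sgd_step) and the toy data are assumptions made for this illustration; the gradient is derived by hand where autograd would normally compute it.

```python
# Toy data following y = 2 * x, with one feature and no bias.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

def loss_fn(w):
    """Loss function: mean of the per-sample losses 0.5 * (w * x - y) ** 2."""
    return sum(0.5 * (w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad_fn(w):
    """Gradient of loss_fn with respect to w (autograd would compute this for you)."""
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def sgd_step(w, grad, lr=0.05):
    """Optimizer: one plain gradient descent update."""
    return w - lr * grad

w = 0.0
history = []
for _ in range(50):
    history.append(loss_fn(w))   # 1. measure how far the model is from the targets
    w = sgd_step(w, grad_fn(w))  # 2. compute the gradient, 3. update the parameter

print(w, history[0], history[-1])
```

In this toy problem the parameter approaches 2 and the loss shrinks at every step.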

Loss function

Loss functions are used to train neural networks and help the algorithm learn from the data. The loss function computes the difference between the output of the neural network and the ground truth, and this value is used to update the network weights during training. Next, you will look at a simple example.

Suppose you have a neural network net and the data is stored in a variable nn_input. The data consists of 5 records (rows) and 2 features (columns), and the output of the neural network for this data is stored in the variable nn_output.

net = gluon.nn.Dense(1)
net.initialize()  # parameters must be initialized before the first forward pass

nn_input = np.array([[1.2, 0.56],
                     [3.0, 0.72],
                     [0.89, 0.9],
                     [0.89, 2.3],
                     [0.99, 0.52]])

nn_output = net(nn_input)
nn_output
array([[-0.0928179 ],

The ground truth values for the data are stored in groundtruth_label:

groundtruth_label = np.array([[0.0083],
                             [0.00639]]).reshape(5, 1)

For this problem, you will use the L2 loss. L2Loss, closely related to mean squared error, is a regression loss function that computes the squared differences between the target values and the output of the neural network, scaled by one half. It is defined as:

\[L = \frac{1}{2N}\sum_i{|label_i - pred_i|^2}\]

Here N is the number of output values per sample, and the sum runs over those values. Because of the squaring, the L2 loss produces larger gradients for predictions that are farther from their targets, and it yields a smooth loss surface.

def L2Loss(output_values, true_values):
    return np.mean((output_values - true_values) ** 2, axis=1) / 2

L2Loss(nn_output, groundtruth_label)
array([0.00511241, 0.02495837, 0.00483336, 0.00586226, 0.0035302 ])

Now, you can do the same thing using the MXNet API:

from mxnet.gluon import nn, loss as gloss
loss = gloss.L2Loss()

loss(nn_output, groundtruth_label)
array([0.00511241, 0.02495837, 0.00483336, 0.00586226, 0.0035302 ])
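
The earlier remark that squaring produces larger gradients for larger errors can be checked with a few lines of plain Python. The functions l2_loss and l2_grad are illustrative helpers for a single scalar prediction, not MXNet functions.

```python
def l2_loss(pred, label):
    """Half squared error for a single scalar prediction."""
    return 0.5 * (pred - label) ** 2

def l2_grad(pred, label):
    """Derivative of l2_loss with respect to pred."""
    return pred - label

label = 0.0
for pred in [0.1, 1.0, 10.0]:
    # The gradient grows linearly with the error, the loss quadratically.
    print(pred, l2_loss(pred, label), l2_grad(pred, label))
```

A prediction that is ten times farther from its target therefore receives a gradient that is ten times larger.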

A network can improve by iteratively updating its weights to minimize the loss. Some tasks use a combination of multiple loss functions, but often you will just use one. MXNet Gluon provides a number of the most commonly used loss functions. The choice of loss function depends on your network and task. Some common task and loss function pairs include:

  • regression: L1Loss, L2Loss

  • classification: SigmoidBinaryCrossEntropyLoss, SoftmaxCrossEntropyLoss

  • embeddings: HingeLoss

Customizing your Loss functions

You can also create custom loss functions using Loss Blocks.

You can inherit from the base Loss class and write your own forward method. The backward propagation is computed automatically by autograd, provided that you build your loss from existing MXNet operators.

from mxnet.gluon.loss import Loss

class custom_L1_loss(Loss):
    def __init__(self, weight=None, batch_axis=0, **kwargs):
        super(custom_L1_loss, self).__init__(weight, batch_axis, **kwargs)

    def forward(self, pred, label):
        l = np.abs(label - pred)
        l = l.reshape(len(l),)
        return l

L1 = custom_L1_loss()
L1(nn_output, groundtruth_label)
array([0.10111789, 0.22342056, 0.09831944, 0.10827978, 0.08402617])
# Compare with the built-in L1 loss
l1 = gloss.L1Loss()
l1(nn_output, groundtruth_label)
array([0.10111789, 0.22342056, 0.09831944, 0.10827978, 0.08402617])


Optimizers

The loss function determines how much to change the parameters based on how far the model is from the ground truth. The optimizer determines how the model weights or parameters are updated based on the loss function. In Gluon, this optimization step is performed by the gluon.Trainer.

Here is a basic example of how to create a gluon.Trainer.

from mxnet import optimizer
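trainer = gluon.Trainer(net.collect_params(),
                        optimizer="sgd",  # assumption: the original arguments are cut off in this copy
                        optimizer_params={"learning_rate": 1})  # assumption: chosen to match the "* 1 / 5" update printed later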

When creating a Gluon Trainer, you must provide the trainer object with:

  1. A collection of parameters that need to be learned. These are the weights and biases of the network that you are training.

  2. An optimization algorithm (optimizer) to use for training. This algorithm updates the parameters at every training iteration when trainer.step is called. For more information, see optimizers.

curr_weight = net.weight.data()
print(curr_weight)
[[-0.06880813 -0.01830024]]
batch_size = len(nn_input)
trainer.step(batch_size, ignore_stale_grad=True)
print(net.weight.data())
[[-0.06880813 -0.01830024]]
print(curr_weight - net.weight.grad() * 1 / 5)
[[-0.06880813 -0.01830024]]
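
The expression in the last print statement has the form of a plain SGD update in which the gradient is scaled by the learning rate and divided by the batch size. The sketch below spells out that rule in pure Python. It assumes plain SGD with a learning rate of 1; the helper sgd_update and the non-zero gradient values are hypothetical.

```python
def sgd_update(weights, grads, learning_rate, batch_size):
    """Plain SGD: w <- w - learning_rate * grad / batch_size."""
    scale = learning_rate / batch_size
    return [w - scale * g for w, g in zip(weights, grads)]

weights = [-0.06880813, -0.01830024]  # values taken from the printed weights above

# With zero gradients the update leaves the weights unchanged.
unchanged = sgd_update(weights, [0.0, 0.0], learning_rate=1.0, batch_size=5)

# With hypothetical non-zero gradients the weights move against the gradient.
updated = sgd_update(weights, [0.5, -1.0], learning_rate=1.0, batch_size=5)

print(unchanged)
print(updated)
```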


Metrics

MXNet includes a metrics API that you can use to evaluate how your model is performing. This is typically used during training to monitor performance on the validation set. MXNet includes many commonly used metrics, such as the Accuracy metric used in the example that follows.

Now, you will define two arrays for a dummy binary classification example.

# Vector of likelihoods for all the classes
pred = np.array([[0.1, 0.9], [0.05, 0.95], [0.83, 0.17], [0.63, 0.37]])

labels = np.array([1, 1, 0, 1])

Before you can calculate the accuracy of your model, instantiate the metric (Accuracy). In practice, this is done before the training loop.

from mxnet.gluon.metric import Accuracy

acc = Accuracy()

To run and calculate the updated accuracy for each batch or epoch, you can call the update() method. This method uses labels and predictions which can be either class indexes or a vector of likelihoods for all of the classes.

acc.update(labels=labels, preds=pred)
print(acc.get())  # get() returns the metric name and its current value
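
To make explicit what an accuracy computation on likelihood vectors involves, here is a pure-Python sketch that uses the same pred and labels values as plain lists. The accuracy helper is written for this illustration and is not the MXNet implementation.

```python
pred = [[0.1, 0.9], [0.05, 0.95], [0.83, 0.17], [0.63, 0.37]]
labels = [1, 1, 0, 1]

def accuracy(labels, preds):
    """Fraction of rows whose most likely class equals the label."""
    # Index of the largest likelihood in each row (argmax).
    predicted = [max(range(len(row)), key=row.__getitem__) for row in preds]
    correct = sum(int(p == l) for p, l in zip(predicted, labels))
    return correct / len(labels)

print(accuracy(labels, pred))  # 0.75
```

The predicted classes are 1, 1, 0, 0, so three of the four records match their labels.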

Creating custom metrics

In addition to built-in metrics, if you want to create a custom metric, you can use the following skeleton code. This code inherits from the EvalMetric base class.

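class MyCustomMetric(EvalMetric):
    def __init__(self):
        # EvalMetric expects a metric name; this one is a placeholder
        super().__init__("my_custom_metric")

    def update(self, labels, preds):
        # Update the metric's internal state from one batch
        pass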

Here is an example using the Precision metric. First, define the two values labels and preds.

labels = np.array([0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1])
preds = np.array([0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0])

Next, define the custom metric class precision and instantiate it.

from mxnet.gluon.metric import EvalMetric

class precision(EvalMetric):
    def __init__(self):
        super().__init__("precision")

    def update(self, labels, preds):
        # True positives: predicted 1 where the label is 1
        tp_labels = (labels == 1)
        true_positives = sum(preds[tp_labels] == 1)
        # False positives: predicted 1 where the label is 0
        fp_labels = (labels == 0)
        false_positives = sum(preds[fp_labels] == 1)
        return true_positives / (true_positives + false_positives)

p = precision()

Finally, call the update method to return the precision for your data.

p.update(np.array(labels), np.array(preds))
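
As a cross-check, the same precision value can be computed with plain Python lists. The helper precision_score is a hypothetical name used only in this sketch.

```python
labels = [0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1]
preds = [0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

def precision_score(labels, preds):
    """Precision = true positives / (true positives + false positives)."""
    tp = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 1)
    fp = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 1)
    return tp / (tp + fp)

print(precision_score(labels, preds))  # 0.4
```

There are 2 true positives and 3 false positives, so the precision is 2 / 5.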

Next steps

Now that you have learned all the components required to train a neural network, you will see how to load your data using the Gluon API in Step 5: Gluon Datasets and DataLoader.