Automatic differentiation

MXNet supports automatic differentiation through the autograd package. autograd lets you differentiate a graph of NDArray operations using the chain rule. This is called define-by-run: the network is defined on the fly by running the forward computation. You can define exotic network structures and differentiate them, and each iteration can have a totally different network structure.

import mxnet as mx
from mxnet import autograd

To use autograd, we must first mark the variables that require gradients and attach gradient buffers to them:

x = mx.nd.array([[1, 2], [3, 4]])
x.attach_grad()

Now we can define the network while running the forward computation by wrapping it inside autograd.record() (operations outside of record() do not define a graph and cannot be differentiated):

with autograd.record():
  y = x * 2
  z = y * x

Let’s backprop with z.backward(), which is equivalent to z.backward(mx.nd.ones_like(z)). When z has more than one entry, z.backward() is equivalent to mx.nd.sum(z).backward():

z.backward()
print(x.grad)

Now, let’s see if this is the expected output.

Here, y = 2 * x and z = y * x, so z = 2 * x * x.

After doing backprop with z.backward(), we get the gradient dz/dx as follows:

dy/dx = 2, dz/dx = 4 * x

So x.grad should be the array [[4, 8], [12, 16]].
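
To confirm this numerically, we can compare x.grad against 4 * x. This is a minimal sanity check that reuses the x defined above:

# x.grad should equal dz/dx = 4 * x elementwise
expected = 4 * x
print((x.grad == expected).asnumpy())  # prints all ones if the gradient matches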
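
As mentioned earlier, z.backward() uses an implicit head gradient of ones. If you want to weight the entries of z differently, you can pass an explicit head gradient instead. A minimal sketch that re-runs the same forward computation; the head_gradient values below are arbitrary and only illustrative:

with autograd.record():
  y = x * 2
  z = y * x
head_gradient = mx.nd.array([[10, 1], [0.1, 0.01]])
z.backward(head_gradient)
print(x.grad)  # each entry of dz/dx (4 * x) is scaled by the corresponding head gradient entry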
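
Finally, because the graph is defined by running the forward computation (the define-by-run behavior mentioned in the introduction), ordinary Python control flow can change the graph from one call to the next. A sketch under that assumption; f is a hypothetical helper whose loop count depends on its input:

def f(a):
  # the number of doublings depends on the value of a,
  # so each call can build a different graph
  b = a * 2
  while b.norm().asscalar() < 1000:
    b = b * 2
  return b

a = mx.nd.random.uniform(shape=(2,))
a.attach_grad()
with autograd.record():
  c = f(a)
c.backward()
print(a.grad)  # equals c / a elementwise, since f only scales a by a data-dependent constant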