Overview

MXNet.jl Namespace

Most of the functions and types in MXNet.jl are organized in a flat namespace. Because many of these functions conflict with existing names in the Julia Base module, we wrap them all in an mx module. The convention for accessing the MXNet.jl interface is to use the mx. prefix explicitly:

julia> using MXNet

julia> x = mx.zeros(2, 3)             # MXNet NDArray
2×3 mx.NDArray{Float32} @ CPU0:
 0.0  0.0  0.0
 0.0  0.0  0.0

julia> y = zeros(eltype(x), size(x))  # Julia Array
2×3 Array{Float32,2}:
 0.0  0.0  0.0
 0.0  0.0  0.0

julia> copy!(y, x)                    # Overloaded function in Julia Base
2×3 Array{Float32,2}:
 0.0  0.0  0.0
 0.0  0.0  0.0

julia> z = mx.ones(size(x), mx.gpu()) # MXNet NDArray on GPU
2×3 mx.NDArray{Float32} @ GPU0:
 1.0  1.0  1.0
 1.0  1.0  1.0

julia> mx.copy!(z, y)                 # Same as copy!(z, y)
2×3 mx.NDArray{Float32} @ GPU0:
 0.0  0.0  0.0
 0.0  0.0  0.0

Note that functions like size and copy!, which are extensively overloaded for various types, work out of the box. Functions like zeros and ones, however, would be ambiguous, so we always use the mx. prefix for them. If you prefer, the mx. prefix can be used explicitly for all MXNet.jl functions, including size and copy!, as shown in the last line.
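The difference between the two kinds of functions can be sketched in plain Julia, without MXNet. The module toy and the type Box below are hypothetical stand-ins invented for this illustration; they are not part of MXNet.jl. The sketch assumes only standard Julia semantics: extending a Base function adds a method to it, while defining a function with the same name inside a module creates a separate function that must be qualified.

```julia
# Hypothetical stand-in for the mx module; not part of MXNet.jl.
module toy

struct Box                      # toy container that only remembers its shape
    dims::Tuple{Vararg{Int}}
end

# A new function that merely shares its name with Base.zeros.
# It is not a method of Base.zeros, so callers must write toy.zeros.
zeros(dims::Int...) = Box(dims)

# Extending Base.size adds a method to the existing generic function,
# so an unqualified size(x) dispatches here for Box arguments.
Base.size(b::Box) = b.dims

end # module

x = toy.zeros(2, 3)     # a toy Box
y = zeros(2, 3)         # Base.zeros: an ordinary Julia Array
println(size(x))        # prints (2, 3)
println(size(y))        # prints (2, 3)
```

Under these assumptions, size needs no prefix because dispatch picks the right method from the argument type, whereas zeros takes only a shape and so has nothing to dispatch on.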

Low Level Interface

NDArray

NDArray is the basic building block of the actual computations in MXNet. It is like a Julia Array object, with some important differences:

  • The actual data can live on a different Context (e.g. a GPU). For some contexts, accessing elements one by one is very slow, so indexing into an NDArray is not recommended in general. The easiest way to inspect the contents of an NDArray is to use the copy function to copy them into a Julia Array.
  • Operations on NDArray (including basic arithmetic and neural-network-related operators) are executed in parallel, with automatic dependency tracking to ensure correctness.
  • There are no generics in NDArray; the eltype is always mx.MX_float. For machine learning applications, single-precision floating point numbers are typically the best balance between precision, speed and portability. Also, since libmxnet is designed to support multiple front-end languages, it is much simpler to implement with a fixed data type.

Most of the computation is hidden in libmxnet behind operators corresponding to various neural network layers. Still, getting familiar with the NDArray API is useful for implementing an Optimizer or customized operators directly in Julia.

The following are common ways to create NDArray objects:

  • NDArray(undef, shape...; ctx = context, writable = true): create an uninitialized array of a given shape on a specific device. For example, NDArray(undef, 2, 3), NDArray(undef, 2, 3, ctx = mx.gpu(2)).
  • NDArray(undef, shape; ctx = context, writable = true): the same, with the shape given as a single tuple, e.g. NDArray(undef, (2, 3)).
  • NDArray{T}(undef, shape...; ctx = context, writable = true): create an uninitialized array with the given element type T.
  • mx.zeros(shape[, context]) and mx.ones(shape[, context]): similar to Julia's built-in zeros and ones.
  • mx.copy(jl_arr, context): copy the contents of a Julia Array to a specific device.
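As a minimal sketch of the constructors listed above, the following round-trips a Julia Array through an NDArray on the CPU. It uses only calls that appear elsewhere on this page (mx.zeros, mx.copy, copy, size, eltype); the variable names are illustrative, and a working MXNet.jl installation is assumed.

```julia
using MXNet

# A Julia Array with Float32 elements, matching the NDArray default.
jl = Float32[1 2 3; 4 5 6]

nd_zero = mx.zeros(2, 3)           # NDArray of zeros on the default context
nd_copy = mx.copy(jl, mx.cpu())    # NDArray holding a copy of jl on the CPU

# size and eltype work as they do for Julia Arrays.
println(size(nd_copy), " ", eltype(nd_copy))

# copy brings the contents back as a Julia Array for inspection.
back = copy(nd_copy)
println(back == jl)
```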

Most convenience functions on array objects, like size, length, ndims and eltype, work out of the box. Although indexing is not supported, it is possible to take slices:

julia> using MXNet

julia> a = mx.ones(2, 3)
2×3 NDArray{Float32,2} @ cpu0:
 1.0f0  1.0f0  1.0f0
 1.0f0  1.0f0  1.0f0

julia> b = mx.slice(a, 1:2)
2×2 NDArray{Float32,2} @ cpu0:
 1.0f0  1.0f0
 1.0f0  1.0f0

julia> b[:] = 2
2

julia> a
2×3 NDArray{Float32,2} @ cpu0:
 2.0f0  2.0f0  1.0f0
 2.0f0  2.0f0  1.0f0

A slice is a sub-region sharing the same memory with the original NDArray object. A slice is always a contiguous piece of memory, so only slicing on the last dimension is supported. The example above also shows a way to set the contents of an NDArray.

julia> using MXNet

julia> mx.seed!(42);

julia> a = NDArray(undef, 2, 3)
2×3 NDArray{Float32,2} @ cpu0:
 2.2f-44  3.36f-43  0.0f0
 0.0f0    0.0f0     4.5573f-41

julia> a[:] = 0.5              # set all elements to a scalar
0.5

julia> a[:] = rand(Float32, size(a));    # set contents with a Julia Array

julia> copy!(a, rand(Float32, size(a))); # set value by copying a Julia Array

julia> b = NDArray(undef, size(a))
2×3 NDArray{Float32,2} @ cpu0:
 2.2f-44  3.14f-43  0.0f0
 0.0f0    0.0f0     4.5573f-41

julia> b[:] = a;               # copying and assignment between NDArrays

Note that, due to the design of the Julia language, a normal assignment

a = b

does not copy the contents of b to a. Instead, it just makes the variable a point to the same object as b. Similarly, in-place arithmetic does not work as expected:

julia> using MXNet

julia> a = mx.ones(2)
2-element NDArray{Float32,1} @ cpu0:
 1.0f0
 1.0f0

julia> r = a           # keep a reference to a
2-element NDArray{Float32,1} @ cpu0:
 1.0f0
 1.0f0

julia> b = mx.ones(2)
2-element NDArray{Float32,1} @ cpu0:
 1.0f0
 1.0f0

julia> a += b          # translates to a = a + b
2-element NDArray{Float32,1} @ cpu0:
 2.0f0
 2.0f0

julia> a
2-element NDArray{Float32,1} @ cpu0:
 2.0f0
 2.0f0

julia> r
2-element NDArray{Float32,1} @ cpu0:
 1.0f0
 1.0f0

As we can see, a has the expected value, but instead of being updated in place, a new NDArray is created and a is set to point to this new object. If we look at r, which still refers to the old a, its content has not changed. There is currently no way in Julia to overload operators like += to get customized behavior.
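This rebinding is a property of the Julia language rather than of NDArray. The following sketch reproduces it with ordinary Julia Arrays, using the identity operator === to check whether two names refer to the same object. It does not use MXNet at all.

```julia
a = [1.0, 1.0]
r = a                 # r and a are two names for the same array object
b = [1.0, 1.0]

println(a === r)      # prints true

a += b                # same as a = a + b: allocates a new array and rebinds a

println(a === r)      # prints false: a now names a different object
println(a)            # prints [2.0, 2.0]
println(r)            # prints [1.0, 1.0]: the original array is untouched
```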

Instead, you need to write a[:] = a + b. Alternatively, if you want a real in-place += operation, MXNet.jl provides a simple macro, @mx.inplace:

julia> @mx.inplace a += b
2-element NDArray{Float32,1} @ cpu0:
 3.0f0
 3.0f0

julia> ex = macroexpand(Main, :(@mx.inplace a += b));  # ex is an explicit add_to! call on a and b

The macro translates the += operator into an explicit add_to! function call, which calls into libmxnet to add the contents of b into a directly. For example, the following is the update rule in the SGD Optimizer (both the gradient ∇ and the weight W are NDArray objects):

@inplace W .+= -η .* (∇ + λ .* W)

Note that there is not much magic in @mx.inplace: it only does a shallow translation. In the SGD update rule above, computations such as scaling the gradient by grad_scale and adding the weight decay all create temporary NDArray objects. To mitigate this issue, libmxnet has a customized memory allocator designed specifically for this kind of situation. The following snippet runs a simple benchmark comparing the allocation of temporary NDArrays with pre-allocation:

using Benchmark
using MXNet

N_REP = 1000
SHAPE = (128, 64)
CTX   = mx.cpu()
LR    = 0.1

function inplace_op()
  weight = mx.zeros(SHAPE, CTX)
  grad   = mx.ones(SHAPE, CTX)

  # pre-allocate temp objects
  grad_lr = NDArray(undef, SHAPE, ctx = CTX)

  for i = 1:N_REP
    copy!(grad_lr, grad)
    @mx.inplace grad_lr .*= LR
    @mx.inplace weight -= grad_lr
  end
  return weight
end

function normal_op()
  weight = mx.zeros(SHAPE, CTX)
  grad   = mx.ones(SHAPE, CTX)

  for i = 1:N_REP
    weight[:] -= LR * grad
  end
  return weight
end

# make sure the results are the same
@assert(maximum(abs.(copy(normal_op() - inplace_op()))) < 1e-6)

println(compare([inplace_op, normal_op], 100))

On my laptop, the comparison shows that normal_op, while allocating a lot of temporary NDArrays in the loop (performance gets worse as N_REP increases), is only about 2.3 times slower than the pre-allocated version.

Row  Function      Average    Relative  Replications
1    "inplace_op"  0.0074854  1.0       100
2    "normal_op"   0.0174202  2.32723   100

So it will usually not be a big problem unless this code is the bottleneck of the computation.

Distributed Key-value Store

The type KVStore and related methods are used for data sharing across different devices or machines. It provides a simple and efficient key-value storage system, mapping integer keys to NDArray values, that each device can push to or pull from.

The following example shows how to create a local KVStore, initialize a value and then pull it back.

kv    = mx.KVStore(:local)
shape = (2, 3)
key   = 3

mx.init!(kv, key, mx.ones(shape) * 2)
a = NDArray(undef, shape)
mx.pull!(kv, key, a) # pull value into a
a
2×3 NDArray{Float32,2} @ cpu0:
 2.0f0  2.0f0  2.0f0
 2.0f0  2.0f0  2.0f0

Intermediate Level Interface

Symbols and Composition

The way we build deep learning models in MXNet.jl is to use the powerful symbolic composition system. It is like Theano, except that we avoid long expression compilation times by providing larger, neural-network-related building blocks to guarantee computation performance.

The basic type is mx.SymbolicNode. The following is a trivial example of composing two symbols with the + operation.

A = mx.Variable(:A)
B = mx.Variable(:B)
C = A + B
print(C)  # debug printing
Symbol Outputs:
    output[0]=_plus0(0)
Variable:A
Variable:B
--------------------
Op:elemwise_add, Name=_plus0
Inputs:
    arg[0]=A(0) version=0
    arg[1]=B(0) version=0

We get a new SymbolicNode by composing existing SymbolicNodes with operations. The hierarchical architecture of a deep neural network can be realized by recursive composition. For example, the following code snippet constructs a simple 2-layer MLP, using a hidden layer of 128 units and a ReLU activation function.

net = mx.Variable(:data)
net = mx.FullyConnected(net, name=:fc1, num_hidden=128)
net = mx.Activation(net, name=:relu1, act_type=:relu)
net = mx.FullyConnected(net, name=:fc2, num_hidden=64)
net = mx.SoftmaxOutput(net, name=:out)
print(net)  # debug printing
Symbol Outputs:
    output[0]=out(0)
Variable:data
Variable:fc1_weight
Variable:fc1_bias
--------------------
Op:FullyConnected, Name=fc1
Inputs:
    arg[0]=data(0) version=0
    arg[1]=fc1_weight(0) version=0
    arg[2]=fc1_bias(0) version=0
Attrs:
    num_hidden=128
--------------------
Op:Activation, Name=relu1
Inputs:
    arg[0]=fc1(0)
Attrs:
    act_type=relu
Variable:fc2_weight
Variable:fc2_bias
--------------------
Op:FullyConnected, Name=fc2
Inputs:
    arg[0]=relu1(0)
    arg[1]=fc2_weight(0) version=0
    arg[2]=fc2_bias(0) version=0
Attrs:
    num_hidden=64
Variable:out_label
--------------------
Op:SoftmaxOutput, Name=out
Inputs:
    arg[0]=fc2(0)
    arg[1]=out_label(0) version=0

Each time, we take the previous symbol and compose it with an operation. Unlike the simple + example above, the operations here are "bigger" ones that correspond to common computation layers in deep neural networks.

Each of those operations takes one or more input symbols for composition, with optional hyper-parameters (e.g. num_hidden, act_type) to further customize the composition result.

When applying those operations, we can also specify a name for the result symbol. This is convenient if we want to refer to this symbol later on. If not supplied, a name will be automatically generated.

Each symbol takes some arguments. For example, in the + case above, to compute the value of C, we need to know the values of the two inputs A and B. For neural networks, the arguments fall primarily into two categories: inputs and parameters. Inputs are the data and labels for the network, while parameters are typically trainable weights, biases and filters.

When composing symbols, their arguments accumulate. We can list all the arguments with

mx.list_arguments(net)
6-element Array{Symbol,1}:
 :data      
 :fc1_weight
 :fc1_bias  
 :fc2_weight
 :fc2_bias  
 :out_label 

Note the names of the arguments are generated according to the provided name for each layer. We can also specify those names explicitly:

julia> using MXNet

julia> net = mx.Variable(:data)
SymbolicNode data

julia> w   = mx.Variable(:myweight)
SymbolicNode myweight

julia> net = mx.FullyConnected(net, weight=w, name=:fc1, num_hidden=128)
SymbolicNode fc1

julia> mx.list_arguments(net)
3-element Array{Symbol,1}:
 :data
 :myweight
 :fc1_bias

The simple fact is that a Variable is just a placeholder mx.SymbolicNode. In composition, we can use arbitrary symbols for arguments. For example:

julia> using MXNet

julia> net  = mx.Variable(:data)
SymbolicNode data

julia> net  = mx.FullyConnected(net, name=:fc1, num_hidden=128)
SymbolicNode fc1

julia> net2 = mx.Variable(:data2)
SymbolicNode data2

julia> net2 = mx.FullyConnected(net2, name=:net2, num_hidden=128)
SymbolicNode net2

julia> mx.list_arguments(net2)
3-element Array{Symbol,1}:
 :data2
 :net2_weight
 :net2_bias

julia> composed_net = net2(data2=net, name=:composed)
SymbolicNode composed

julia> mx.list_arguments(composed_net)
5-element Array{Symbol,1}:
 :data
 :fc1_weight
 :fc1_bias
 :net2_weight
 :net2_bias

Note that we use a composed symbol, net, as the argument data2 of net2 to get a new symbol, which we name :composed. This also shows that a symbol is itself a callable object, which can be invoked to fill in missing arguments and build more complicated symbol compositions.

Shape Inference

Given enough information, the shapes of all arguments in a composed symbol could be inferred automatically. For example, given the input shape, and some hyper-parameters like num_hidden, the shapes for the weights and bias in a neural network could be inferred.

julia> using MXNet

julia> net = mx.Variable(:data)
SymbolicNode data

julia> net = mx.FullyConnected(net, name=:fc1, num_hidden=10)
SymbolicNode fc1

julia> arg_shapes, out_shapes, aux_shapes = mx.infer_shape(net, data=(10, 64))
(Tuple[(10, 64), (10, 10), (10,)], Tuple[(10, 64)], Tuple[])

The returned shapes correspond to the arguments in the same order as returned by mx.list_arguments. The out_shapes are the shapes of the outputs, and aux_shapes can be safely ignored for now.

julia> for (n, s) in zip(mx.list_arguments(net), arg_shapes)
         println("$n\t=> $s")
       end
data    => (10, 64)
fc1_weight  => (10, 10)
fc1_bias    => (10,)

julia> for (n, s) in zip(mx.list_outputs(net), out_shapes)
         println("$n\t=> $s")
       end
fc1_output  => (10, 64)
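The arithmetic behind these shapes can be sketched in plain Julia. The helper fc_shapes below is hypothetical and not part of MXNet.jl. It assumes the column-major convention seen above, in which the last dimension of the data is the batch, and it further assumes that the weight is laid out as (input dimension, num_hidden); with both equal to 10 in the example above, the output alone does not settle that orientation.

```julia
# Hypothetical helper, not part of MXNet.jl: shapes of a fully connected layer,
# assuming data is (in_dim, batch) and weight is (in_dim, num_hidden).
function fc_shapes(data_shape::Tuple{Int,Int}, num_hidden::Int)
    in_dim, batch = data_shape
    weight = (in_dim, num_hidden)
    bias   = (num_hidden,)
    output = (num_hidden, batch)
    return weight, bias, output
end

weight_shape, bias_shape, out_shape = fc_shapes((10, 64), 10)
println(weight_shape, " ", bias_shape, " ", out_shape)

# The same output shape falls out of the plain matrix arithmetic Y = W'X .+ b.
X = ones(Float32, 10, 64)
W = ones(Float32, weight_shape...)
b = zeros(Float32, bias_shape...)
Y = W' * X .+ b
println(size(Y) == out_shape)   # prints true
```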

Binding and Executing

In order to execute the computation graph specified by a composed symbol, we bind the free variables to concrete values, specified as mx.NDArray objects. This creates an mx.Executor on a given mx.Context. A context describes the computation device (CPU, GPU, etc.), and an executor carries out the computation (forward/backward) specified in the corresponding symbolic composition.

julia> using MXNet

julia> A = mx.Variable(:A)
SymbolicNode A

julia> B = mx.Variable(:B)
SymbolicNode B

julia> C = A .* B
SymbolicNode _mul0

julia> a = mx.ones(3) * 4
3-element NDArray{Float32,1} @ cpu0:
 4.0f0
 4.0f0
 4.0f0

julia> b = mx.ones(3) * 2
3-element NDArray{Float32,1} @ cpu0:
 2.0f0
 2.0f0
 2.0f0

julia> c_exec = mx.bind(C, context=mx.cpu(), args=Dict(:A => a, :B => b));

julia> mx.forward(c_exec)
1-element Array{NDArray{Float32,1},1}:
 NDArray(Float32[8.0, 8.0, 8.0])

julia> c_exec.outputs[1]
3-element NDArray{Float32,1} @ cpu0:
 8.0f0
 8.0f0
 8.0f0

julia> copy(c_exec.outputs[1])  # copy turns NDArray into Julia Array
3-element Array{Float32,1}:
 8.0
 8.0
 8.0

For neural networks, it is easier to use simple_bind. Given the shapes of the input arguments, it performs shape inference for the rest of the arguments and creates the NDArrays automatically. In practice, the binding and executing steps are hidden under the Model interface.
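A minimal sketch of simple_bind on the one-layer network from the shape inference section might look as follows. The call form mx.simple_bind(net, context; data = shape) and the arg_arrays field of the executor are assumptions about the MXNet.jl API that are not shown elsewhere on this page, so check them against the installed version.

```julia
using MXNet

net = mx.Variable(:data)
net = mx.FullyConnected(net, name=:fc1, num_hidden=10)

# Only the input shape is given; the weight and bias shapes are inferred.
# Assumed signature: simple_bind(symbol, context; name = shape, ...).
exec = mx.simple_bind(net, mx.cpu(), data=(10, 64))

# arg_arrays is assumed to follow the order of mx.list_arguments.
for (name, arr) in zip(mx.list_arguments(net), exec.arg_arrays)
    println(name, " => ", size(arr))
end
```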

TODO Provide pointers to model tutorial and further details about binding and symbolic API.

High Level Interface

The high level interface includes the model training and prediction API, etc.