Hybridize

A Hybrid of Imperative and Symbolic Programming

Imperative programming makes use of programming statements to change a program’s state. Consider the following example of simple imperative programming code.

[1]:
def add(a, b):
    return a + b

def fancy_func(a, b, c, d):
    e = add(a, b)
    f = add(c, d)
    g = add(e, f)
    return g

fancy_func(1, 2, 3, 4)
[1]:
10

As expected, Python will perform an addition when running the statement e = add(a, b), and will store the result as the variable e, thereby changing the program’s state. The next two statements f = add(c, d) and g = add(e, f) will similarly perform additions and store the results as variables.

Although imperative programming is convenient, it may be inefficient. On the one hand, even though the add function is called repeatedly within fancy_func, Python executes the three function calls individually, one after the other. On the other hand, we need to keep the values of e and f until all the statements in fancy_func have been executed, because we do not know whether e and f will be used by other parts of the program after the statements e = add(a, b) and f = add(c, d) have run.

In contrast to imperative programming, symbolic programming usually performs the computation only after the computational process has been fully defined. Symbolic programming is used by multiple deep learning frameworks, including Theano and TensorFlow. The process of symbolic programming generally requires the following three steps:

  1. Define the computation process.

  2. Compile the computation process into an executable program.

  3. Provide the required inputs and call on the compiled program for execution.

In the example below, we utilize symbolic programming to re-implement the imperative programming code provided at the beginning of this section.

[2]:
def add_str():
    return '''
def add(a, b):
    return a + b
'''

def fancy_func_str():
    return '''
def fancy_func(a, b, c, d):
    e = add(a, b)
    f = add(c, d)
    g = add(e, f)
    return g
'''

def evoke_str():
    return add_str() + fancy_func_str() + '''
print(fancy_func(1, 2, 3, 4))
'''

prog = evoke_str()
print(prog)
y = compile(prog, '', 'exec')
exec(y)

def add(a, b):
    return a + b

def fancy_func(a, b, c, d):
    e = add(a, b)
    f = add(c, d)
    g = add(e, f)
    return g

print(fancy_func(1, 2, 3, 4))

10

The three functions defined above only return the computation process as a string; they do not compute anything themselves. Finally, the complete computation process is compiled with the compile function and run with exec. This leaves more room to optimize computation, since the system is able to view the entire program during its compilation. For example, during compilation, the program can be rewritten as print((1 + 2) + (3 + 4)) or even directly rewritten as print(10). Apart from reducing the number of function calls, this process also saves memory.
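The rewrite from print((1 + 2) + (3 + 4)) to print(10) can be sketched with Python's standard ast module. This is only an illustration of constant folding on this one expression: the FoldAdd class is a hypothetical helper written for this example, not what compile or any deep learning framework does internally, and ast.unparse assumes Python 3.9 or later.

```python
import ast


class FoldAdd(ast.NodeTransformer):
    """Hypothetical helper: replace `constant + constant` with its value."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first (bottom-up)
        if (isinstance(node.op, ast.Add)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)):
            folded = ast.Constant(node.left.value + node.right.value)
            return ast.copy_location(folded, node)
        return node


# The whole program is visible before it runs, so it can be rewritten.
tree = ast.parse('print((1 + 2) + (3 + 4))')
folded_tree = ast.fix_missing_locations(FoldAdd().visit(tree))
folded_src = ast.unparse(folded_tree)
print(folded_src)

# The rewritten program still compiles and runs like the original.
exec(compile(folded_tree, '<folded>', 'exec'))
```

Because the transformer sees the complete expression tree before anything is executed, it can remove the intermediate additions entirely, which is the kind of opportunity an imperative, statement-by-statement execution does not offer.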

A comparison of these two programming methods shows that

  • Imperative programming is easier. When imperative programming is used in Python, the majority of the code is straightforward and easy to write. It is also easier to debug, because we can readily obtain and print all relevant intermediate variable values, or make use of Python’s built-in debugging tools.

  • Symbolic programming is more efficient and easier to port. Symbolic programming makes it easier to optimize the program during compilation, and the program can be ported into a format independent of Python. This allows the program to be run in a non-Python environment, thus avoiding any potential performance issues related to the Python interpreter.

Hybrid programming provides the best of both worlds.

Most deep learning frameworks choose either imperative or symbolic programming. For example, both Theano and TensorFlow (inspired by the former) make use of symbolic programming, while Chainer and the later PyTorch utilize imperative programming. When designing Gluon, developers considered whether it was possible to harness the benefits of both imperative and symbolic programming. The developers believed that users should be able to develop and debug using pure imperative programming, while having the ability to convert most programs into symbolic programs to be run when product-level computing performance and deployment are required. Gluon achieves this through the introduction of hybrid programming.

In hybrid programming, we can build models using either the HybridBlock or the HybridSequential classes. By default, they are executed in the same way Block or Sequential classes are executed in imperative programming. When the hybridize function is called, Gluon will convert the program’s execution into the style used in symbolic programming. In fact, most models can make use of hybrid programming’s execution style.

Through the use of experiments, this section will demonstrate the benefits of hybrid programming.

Constructing Models Using the HybridSequential Class

Previously, we learned how to use the Sequential class to concatenate multiple layers. Next, we will replace the Sequential class with the HybridSequential class in order to make use of hybrid programming.

[3]:
from mxnet import np, npx, sym
from mxnet.gluon import nn
import time

def get_net():
    net = nn.HybridSequential()  # Here we use the class HybridSequential.
    net.add(nn.Dense(256, activation='relu'),
            nn.Dense(128, activation='relu'),
            nn.Dense(2))
    net.initialize()
    return net

x = np.random.normal(size=(1, 512))
net = get_net()
net(x)
[04:45:52] /work/mxnet/src/storage/storage.cc:202: Using Pooled (Naive) StorageManager for CPU
[3]:
array([[-0.00306124, -0.19131702]])

By calling the hybridize function, we are able to compile and optimize the computation of the layers chained together in the HybridSequential instance. The model’s computation result remains unchanged.

[4]:
net.hybridize()
net(x)
[4]:
array([[-0.00306124, -0.19131702]])

Note that only layers inheriting from the HybridBlock class will be optimized during computation. For example, the HybridSequential and Dense classes provided by Gluon are both subclasses of the HybridBlock class, so both will be optimized. A layer will not be optimized if it inherits from the Block class rather than the HybridBlock class.

Computing Performance

To demonstrate the performance improvement gained by the use of symbolic programming, we will compare the computation time before and after calling the hybridize function. Here we time 1000 forward computations of net. Before net calls the hybridize function, these computations use imperative programming; afterwards, they use symbolic programming.

[5]:
def benchmark(net, x):
    start = time.time()
    for i in range(1000):
        _ = net(x)
    npx.waitall()  # To facilitate timing, we wait for all computations to be completed.
    return time.time() - start

net = get_net()
print('before hybridizing: %.4f sec' % (benchmark(net, x)))
net.hybridize()
print('after hybridizing: %.4f sec' % (benchmark(net, x)))
before hybridizing: 0.4131 sec
after hybridizing: 0.2037 sec

As is observed in the above results, after a HybridSequential instance calls the hybridize function, computing performance is improved through the use of symbolic programming.
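When timing code like this, the first call after hybridize may include one-off work such as building the graph, so it can be useful to run a few untimed warm-up calls first. The following is a minimal sketch of that idea using a plain Python callable as a stand-in for the network. The name benchmark_with_warmup and its parameters are assumptions made for this example, and the npx.waitall() call that asynchronous MXNet code needs before reading the clock is omitted because the stand-in runs synchronously.

```python
import time


def benchmark_with_warmup(fn, arg, repeats=1000, warmup=1):
    """Return the seconds taken by `repeats` calls of fn(arg).

    `warmup` untimed calls run first so that one-off setup cost
    (for example, a first-call graph construction) is excluded.
    """
    for _ in range(warmup):
        fn(arg)
    start = time.perf_counter()  # monotonic, high-resolution clock
    for _ in range(repeats):
        fn(arg)
    return time.perf_counter() - start


# Stand-in for a network: any callable taking one argument works.
elapsed = benchmark_with_warmup(lambda v: v * 2 + 1, 3)
print('%.6f sec for 1000 calls' % elapsed)
```

time.perf_counter is used here instead of time.time because it is intended for measuring short durations; the structure of the loop is otherwise the same as in the benchmark function above.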

Achieving Symbolic Programming

After net has computed an output from an input, as in the net(x) call in the benchmark function, we can save the symbolic program and the model parameters to disk using the export function.

[6]:
net.export('my_mlp')
[6]:
('my_mlp-symbol.json', 'my_mlp-0000.params')

The .json and .params files generated during this process contain the symbolic program and the model parameters, respectively. They can be read by Python or by other front-end languages supported by MXNet, such as C++, R, Scala, and Perl. This allows us to deploy trained models to other devices and easily use other front-end programming languages. At the same time, because symbolic programming is used during deployment, the computing performance is often superior to that based on imperative programming.

In MXNet, a symbolic program refers to a program that makes use of the Symbol type. We know that, when the NDArray input x is provided to net, net(x) will directly calculate the model output and return a result based on x. For models that have called the hybridize function, we can also provide a Symbol-type input variable, and net(x) will return Symbol type results.

x = sym.var('data')
net(x)

Constructing Models Using the HybridBlock Class

Similar to the relationship between the Sequential and Block classes, the HybridSequential class is a subclass of HybridBlock.

Earlier, we demonstrated that, after calling the hybridize function, the model is able to achieve superior computing performance and portability. However, calling the hybridize function can also reduce the model's flexibility. We will demonstrate this by constructing a model using the HybridBlock class.

[7]:
class HybridNet(nn.HybridBlock):
    def __init__(self, **kwargs):
        super(HybridNet, self).__init__(**kwargs)
        self.hidden = nn.Dense(10)
        self.output = nn.Dense(2)

    def forward(self, x):
        print('x: ', x)
        x = npx.relu(self.hidden(x))
        print('hidden: ', x)
        return self.output(x)
[8]:
net = HybridNet()
net.initialize()
x = np.random.normal(size=(1, 4))
net(x)
x:  [[ 2.145466    0.64158314 -0.00491827  0.3836984 ]]
hidden:  [[0.15626392 0.         0.         0.10257901 0.         0.
  0.         0.         0.14553794 0.03259005]]
[8]:
array([[-0.00194377,  0.00215183]])

Repeating the forward computation will achieve the same results.

[9]:
net(x)
x:  [[ 2.145466    0.64158314 -0.00491827  0.3836984 ]]
hidden:  [[0.15626392 0.         0.         0.10257901 0.         0.
  0.         0.         0.14553794 0.03259005]]
[9]:
array([[-0.00194377,  0.00215183]])

Next, we will see what happens after we call the hybridize function.

[10]:
net.hybridize()
net(x)
x:  [[ 2.145466    0.64158314 -0.00491827  0.3836984 ]]
hidden:  [[0.15626392 0.         0.         0.10257901 0.         0.
  0.         0.         0.14553794 0.03259005]]
[10]:
array([[-0.00194377,  0.00215183]])

Now, we repeat the forward computation.

[11]:
net(x)
[11]:
array([[-0.00194377,  0.00215183]])

We can see that the two print statements defined in the forward function no longer print anything. This is because a symbolic computation graph was recorded the last time net(x) was run, which was the first call after hybridize. Afterwards, when we run net(x) again, MXNet no longer needs to access the Python code, but can directly execute the symbolic program at the C++ backend. This is another reason why model computing performance improves after the hybridize function is called. However, the programs we write may lose some flexibility. If we want to use the two print statements to debug the code in the above example, they will be skipped and nothing will be printed when the symbolic program is executed. Additionally, a few functions are not supported by Symbol (like asnumpy), and neither are in-place operations like a += b and a[:] = a + b (these must be rewritten as a = a + b). Therefore, we cannot rely on such functions or operations in the forward function once the hybridize function has been called.
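The record-once, replay-afterwards behaviour can be illustrated with a small pure-Python toy. This is a sketch under simplifying assumptions and not how MXNet is implemented: the names Sym, evaluate, ToyHybridBlock, and ToyNet are hypothetical, only addition and multiplication are supported, and the toy traces with a placeholder object rather than with real array values. A counter stands in for the print statements.

```python
class Sym:
    """Symbolic placeholder: records operations instead of computing them."""

    def __init__(self, expr):
        self.expr = expr

    def _record(self, op, other):
        rhs = other.expr if isinstance(other, Sym) else other
        return Sym((op, self.expr, rhs))

    def __add__(self, other):
        return self._record('add', other)

    def __mul__(self, other):
        return self._record('mul', other)


def evaluate(expr, env):
    """Replay a recorded expression without running the forward code."""
    if isinstance(expr, tuple):
        op, lhs, rhs = expr
        a, b = evaluate(lhs, env), evaluate(rhs, env)
        return a + b if op == 'add' else a * b
    return env[expr] if isinstance(expr, str) else expr


class ToyHybridBlock:
    def __init__(self):
        self._active = False
        self._graph = None

    def hybridize(self, active=True):
        self._active = active
        self._graph = None  # force a fresh trace on the next call

    def __call__(self, x):
        if not self._active:
            return self.forward(x)  # imperative: Python runs every time
        if self._graph is None:
            # Python code runs once here, only to record the graph.
            self._graph = self.forward(Sym('x')).expr
        return evaluate(self._graph, {'x': x})


class ToyNet(ToyHybridBlock):
    def __init__(self):
        super().__init__()
        self.python_calls = 0

    def forward(self, x):
        self.python_calls += 1  # side effect, like the print statements
        return x * 2 + 1


toy = ToyNet()
toy(3)
toy(3)
calls_before = toy.python_calls  # forward ran on both calls

toy.hybridize()
first = toy(3)   # records the graph, then evaluates it
second = toy(3)  # replays the graph; forward is not called
calls_after = toy.python_calls
print(calls_before, first, second, calls_after)
```

After hybridize, the side effect in forward happens only during the single recording call, which mirrors why the print statements disappear from later calls in the example above.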

Disabling Hybridization

If we want to disable hybridization and return to imperative execution, we can do so with the following code:

[12]:
net.hybridize(active=False)