Convert from Caffe to MXNet
Key topics covered include the following:
- Converting Caffe trained models to MXNet
- Calling Caffe operators in MXNet
Converting Caffe trained models to MXNet
The conversion tool is available at tools/caffe_converter. In the remainder of this section, we assume that the current working directory is tools/caffe_converter.
How to build
If Caffe’s Python package is installed, that is, if we can run import caffe in Python, then we are ready to go. For example, we can use the AWS Deep Learning AMI, which has both Caffe and MXNet installed.
Otherwise, we can install the Google protobuf compiler and its Python binding. This option is easier to install, but the conversion may run more slowly.
- Install the compiler:
  - Linux: install protobuf-compiler, e.g. sudo apt-get install protobuf-compiler for Ubuntu and sudo yum install protobuf-compiler for Redhat/Fedora.
  - Windows: download the win32 build of protobuf. Make sure to download the version that corresponds to the version of the Python binding installed in the next step. Extract it to any location, then add that location to your PATH.
  - Mac OS X: brew install protobuf
- Install the Python binding with either conda install -c conda-forge protobuf or pip install protobuf.
- Compile the Caffe proto definition: run make in Linux or Mac OS X, or make_win32.bat in Windows.
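Before running the converter, it can help to check which of the two options above is available. The following Python sketch is an illustration only: the helper names are hypothetical, and it assumes the protobuf fallback needs both the google.protobuf package and a caffe_pb2.py module (protoc's standard output name for caffe.proto) produced by the make step.

```python
import importlib.util
from pathlib import Path


def has_module(name):
    """Return True if `name` can be found on the import path."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised, for example, when a parent package such as "google" is missing.
        return False


def converter_backend(converter_dir="."):
    """Guess which dependency the Caffe converter can use (hypothetical helper).

    Returns "caffe" if Caffe's Python package is importable, "protobuf" if the
    protobuf binding and a compiled caffe_pb2.py are present, otherwise None.
    """
    if has_module("caffe"):
        return "caffe"
    # Assumption: the compiled proto definition is written as caffe_pb2.py
    # into the converter directory.
    compiled = Path(converter_dir, "caffe_pb2.py").exists()
    if has_module("google.protobuf") and compiled:
        return "protobuf"
    return None


if __name__ == "__main__":
    print(converter_backend() or "neither Caffe nor protobuf is set up")
```

Run it from tools/caffe_converter so that the caffe_pb2.py check looks in the right place.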
How to use
There are three tools:
- convert_symbol.py: converts a Caffe model definition in protobuf into MXNet’s Symbol in JSON format.
- convert_model.py: converts Caffe model parameters into MXNet’s NDArray format.
- convert_mean.py: converts a Caffe input mean file into MXNet’s NDArray format.
In addition, there are two tools:
- convert_caffe_modelzoo.py: downloads and converts models from the Caffe model zoo.
- test_converter.py: tests the converted models by checking the prediction accuracy.
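As a sketch of how convert_model.py fits into a workflow, the Python snippet below wraps it with subprocess. It assumes that convert_model.py takes the prototxt, the .caffemodel file, and an output prefix as positional arguments, and that it writes a standard MXNet checkpoint at epoch 0; check python convert_model.py --help before relying on either assumption. The helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def convert_command(prototxt, caffemodel, output_prefix):
    """Command line for convert_model.py (argument order is an assumption)."""
    return [sys.executable, "convert_model.py", prototxt, caffemodel, output_prefix]


def expected_outputs(output_prefix, epoch=0):
    """MXNet checkpoint file names: <prefix>-symbol.json and <prefix>-<epoch>.params."""
    return [f"{output_prefix}-symbol.json", f"{output_prefix}-{epoch:04d}.params"]


def convert(prototxt, caffemodel, output_prefix):
    """Run the converter and verify that the checkpoint files were written."""
    subprocess.run(convert_command(prototxt, caffemodel, output_prefix), check=True)
    missing = [p for p in expected_outputs(output_prefix) if not Path(p).exists()]
    if missing:
        raise FileNotFoundError(f"converter did not produce: {missing}")
    return expected_outputs(output_prefix)
```

If the files follow the checkpoint convention, they can be loaded with mx.model.load_checkpoint(output_prefix, 0), which returns the symbol together with the argument and auxiliary parameters.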
Calling Caffe operators in MXNet
Besides converting Caffe models, MXNet supports calling most Caffe operators directly, including network layers, data layers, and loss functions. This is particularly useful when there are customized operators implemented in Caffe, because we then do not need to re-implement them in MXNet.
How to install
This feature requires Caffe. In particular, we need to re-compile Caffe until PR #4527 is merged into Caffe. These are the steps to rebuild Caffe:
- Download Caffe, e.g.
  git clone https://github.com/BVLC/caffe
- Download the patch for the MXNet interface and apply it to Caffe, e.g.
  cd caffe && wget https://github.com/BVLC/caffe/pull/4527.patch && git apply 4527.patch
- Build and install Caffe by following the official guide.
Next, we need to compile MXNet with Caffe support:
- Copy make/config.mk (for Linux) or make/osx.mk (for Mac) into the MXNet root folder as config.mk, if you have not done so yet.
- Open the copied config.mk and uncomment these two lines:
  CAFFE_PATH = $(HOME)/caffe
  MXNET_PLUGINS += plugin/caffe/caffe.mk
  Modify CAFFE_PATH to point to your Caffe installation, if necessary.
- Then build with 8 threads:
  make clean && make -j8
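The config.mk edit above can also be scripted. The following Python sketch is illustrative only: enable_caffe_plugin and patch_config_file are hypothetical helpers, and they assume both lines appear in config.mk either uncommented or commented out with a single leading #.

```python
import re
from pathlib import Path

# Match the two config.mk lines, whether or not they are commented out.
_CAFFE_PATH = re.compile(r"^\s*#?\s*(CAFFE_PATH\s*=.*?)\s*$")
_CAFFE_PLUGIN = re.compile(r"^\s*#?\s*(MXNET_PLUGINS\s*\+=\s*plugin/caffe/caffe\.mk)\s*$")


def enable_caffe_plugin(config_text, caffe_path=None):
    """Return config_text with the Caffe plugin lines uncommented.

    If caffe_path is given, CAFFE_PATH is set to it instead of the default.
    """
    out = []
    for line in config_text.splitlines():
        path_match = _CAFFE_PATH.match(line)
        plugin_match = _CAFFE_PLUGIN.match(line)
        if path_match:
            line = f"CAFFE_PATH = {caffe_path}" if caffe_path else path_match.group(1)
        elif plugin_match:
            line = plugin_match.group(1)
        out.append(line)
    return "\n".join(out) + "\n"


def patch_config_file(path="config.mk", caffe_path=None):
    """Rewrite config.mk in place (run from the MXNet root folder)."""
    config = Path(path)
    config.write_text(enable_caffe_plugin(config.read_text(), caffe_path))
```

Calling patch_config_file(caffe_path="/path/to/caffe") before make covers both the uncommenting and the CAFFE_PATH change; all other lines are left untouched.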
How to use
This Caffe plugin adds three components to MXNet:
- sym.CaffeOp: Caffe neural network layers
- sym.CaffeLoss: Caffe loss functions
- io.CaffeDataIter: Caffe data layers
Use sym.CaffeOp
The following example defines a multi-layer perceptron for 10 classes:
import mxnet as mx

data = mx.sym.Variable('data')  # input placeholder
# Caffe InnerProduct layers with TanH activations; num_weight=2 covers each layer's weight and bias blobs
fc1 = mx.sym.CaffeOp(data_0=data, num_weight=2, name='fc1', prototxt='layer{type:"InnerProduct" inner_product_param{num_output: 128} }')
act1 = mx.sym.CaffeOp(data_0=fc1, prototxt='layer{type:"TanH"}')
fc2 = mx.sym.CaffeOp(data_0=act1, num_weight=2, name='fc2', prototxt='layer{type:"InnerProduct" inner_product_param{num_output: 64} }')
act2 = mx.sym.CaffeOp(data_0=fc2, prototxt='layer{type:"TanH"}')
fc3 = mx.sym.CaffeOp(data_0=act2, num_weight=2, name='fc3', prototxt='layer{type:"InnerProduct" inner_product_param{num_output: 10}}')
# MXNet softmax loss on top of the 10-way output
mlp = mx.sym.SoftmaxOutput(data=fc3, name='softmax')
Let’s break it down. First, data = mx.sym.Variable('data') defines a variable as a placeholder for the input. Then, the input is fed through Caffe operators with fc1 = mx.sym.CaffeOp(...). CaffeOp accepts several arguments:
- The inputs to a Caffe operator are named data_i for i = 0, ..., num_data-1.
- num_data is the number of inputs. It defaults to 1, and is therefore omitted in the above example.
- num_out is the number of outputs. It also defaults to 1, and is omitted as well.
- num_weight is the number of weights (blobs_). It defaults to 0, so it must be specified explicitly for layers that have weights, such as InnerProduct.
- prototxt is the protobuf configuration string.
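Because prototxt is an ordinary string, escaping the inner quotes by hand is easy to get wrong. The Python sketch below builds such strings; caffe_layer is a hypothetical convenience helper, not part of MXNet, and it writes parameter values verbatim, so string-valued fields must be quoted by the caller.

```python
def caffe_layer(layer_type, param_block=None, **params):
    """Build a prototxt string for mx.sym.CaffeOp (hypothetical helper).

    layer_type:  Caffe layer type, e.g. "InnerProduct" or "TanH".
    param_block: name of the layer's parameter message, e.g.
                 "inner_product_param"; omitted when None.
    params:      fields of that message, written verbatim as "key: value".
    """
    body = f'type:"{layer_type}"'
    if param_block is not None:
        fields = " ".join(f"{key}: {value}" for key, value in params.items())
        body += f" {param_block}{{{fields}}}"
    return f"layer{{{body}}}"


# Equivalent to the hand-written strings in the MLP example.
print(caffe_layer("InnerProduct", "inner_product_param", num_output=128))
# layer{type:"InnerProduct" inner_product_param{num_output: 128}}
print(caffe_layer("TanH"))
# layer{type:"TanH"}
```

With it, the first layer could be written as mx.sym.CaffeOp(data_0=data, num_weight=2, name='fc1', prototxt=caffe_layer("InnerProduct", "inner_product_param", num_output=128)).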
Use sym.CaffeLoss
Using Caffe loss is similar. By replacing the last line of the above example with the following two lines, we call Caffe loss instead of MXNet loss.
label = mx.sym.Variable('softmax_label')
mlp = mx.sym.CaffeLoss(data=fc3, label=label, grad_scale=1, name='softmax', prototxt="layer{type:\"SoftmaxWithLoss\"}")
Similar to CaffeOp, CaffeLoss has the arguments num_data (2 by default) and num_out (1 by default). But there are two differences:
- The inputs are data and label, and we need to explicitly create a variable placeholder for label, which MXNet losses do implicitly.
- grad_scale is the weight of this loss.
Use io.CaffeDataIter
We can also wrap a Caffe data layer into an MXNet data iterator. Below is an example that creates a data iterator for the MNIST test set:
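import mxnet as mx

# True flattens each image into a vector (e.g. for the MLP above);
# False keeps the image shape.
flat = True

# Caffe Data layer reading the MNIST test LMDB; scale 0.00390625 (= 1/256)
# maps pixel values from [0, 255] into [0, 1).
val = mx.io.CaffeDataIter(
    prototxt='''
    layer {
        name: "mnist"
        type: "Data"
        top: "data"
        top: "label"
        include {
            phase: TEST
        }
        transform_param {
            scale: 0.00390625
        }
        data_param {
            source: "caffe/examples/mnist/mnist_test_lmdb"
            batch_size: 100
            backend: LMDB
        }
    }''',
    flat=flat,
    num_examples=10000,  # size of the MNIST test split stored in mnist_test_lmdb
)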
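When several iterators share one layout, for example a training and a test iterator, the data-layer prototxt can be generated rather than written out by hand. The sketch below is illustrative: caffe_data_layer is a hypothetical helper, and its defaults (layer name, top names, LMDB backend) are simply copied from the MNIST example above.

```python
def caffe_data_layer(source, batch_size, phase="TRAIN", scale=None,
                     backend="LMDB", name="mnist"):
    """Return a Caffe Data-layer prototxt for mx.io.CaffeDataIter (hypothetical helper).

    The layer produces two tops, "data" and "label", as in the MNIST example.
    transform_param is emitted only when a scale is given.
    """
    transform = "" if scale is None else f" transform_param {{ scale: {scale} }}"
    return (
        f'layer {{ name: "{name}" type: "Data" top: "data" top: "label"'
        f" include {{ phase: {phase} }}"
        f"{transform}"
        f' data_param {{ source: "{source}" batch_size: {batch_size}'
        f" backend: {backend} }} }}"
    )


test_prototxt = caffe_data_layer(
    "caffe/examples/mnist/mnist_test_lmdb", batch_size=100,
    phase="TEST", scale=0.00390625,
)
print(test_prototxt)
```

The result can be passed as prototxt= to mx.io.CaffeDataIter; a training iterator would point source at the training LMDB and use phase="TRAIN". Protobuf text format ignores line breaks, so the single-line string is equivalent to the multi-line one.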
Put it all together
The complete example is available at example/caffe.