The following tutorials show how to tune MXNet and how to use tools that improve training and inference performance.


[Improving Performance](/api/faq/perf)

How to get the best performance from MXNet.
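As a small taste of that guide, here is a minimal sketch of one of its most common recommendations: hybridizing a Gluon network so its imperative graph is compiled into a static one. The model and input shape below are placeholders.

```python
import mxnet as mx
from mxnet import gluon

# Placeholder network; hybridization applies to any HybridBlock.
net = gluon.nn.HybridSequential()
net.add(gluon.nn.Dense(128, activation='relu'),
        gluon.nn.Dense(10))
net.initialize()

# static_alloc/static_shape reuse memory and skip shape checks between calls,
# which typically speeds up both training and inference.
net.hybridize(static_alloc=True, static_shape=True)

out = net(mx.nd.random.uniform(shape=(32, 784)))
```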


[Profiler](/api/faq/profiler)

How to profile MXNet models.
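A minimal sketch of the profiler workflow that tutorial walks through; the output filename and the matrix-multiply workload are placeholders.

```python
import mxnet as mx

# Configure what to record and where to write the trace.
mx.profiler.set_config(profile_all=True, aggregate_stats=True,
                       filename='profile_output.json')

x = mx.nd.random.uniform(shape=(1024, 1024))

mx.profiler.set_state('run')      # start collecting
y = mx.nd.dot(x, x)
mx.nd.waitall()                   # let async work finish before stopping
mx.profiler.set_state('stop')     # stop collecting

mx.profiler.dump()                # writes profile_output.json (viewable in chrome://tracing)
```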


[Compression: float16](/api/faq/float16)

How to use float16 in your model to boost training speed.
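A minimal sketch of the Gluon float16 recipe, assuming a CUDA GPU (float16 mainly pays off on GPUs with tensor cores); the single Dense layer and shapes are placeholders.

```python
import mxnet as mx
from mxnet import gluon

ctx = mx.gpu(0)                   # assumes a GPU is available
net = gluon.nn.Dense(10)
net.initialize(ctx=ctx)
net.cast('float16')               # cast parameters to float16

# multi_precision keeps a float32 master copy of the weights,
# which stabilizes SGD updates in float16 training.
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1, 'multi_precision': True})

data = mx.nd.random.uniform(shape=(32, 100), ctx=ctx).astype('float16')
with mx.autograd.record():
    loss = net(data).sum()
loss.backward()
trainer.step(32)
```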

[Gradient Compression](/api/faq/gradient_compression)

How to use gradient compression to reduce communication bandwidth and increase speed.
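A minimal sketch of enabling 2-bit gradient compression through a Gluon Trainer. It assumes the script is launched across multiple workers so the 'dist_sync' kvstore can initialize; the threshold value is just an example, and the model is a placeholder.

```python
import mxnet as mx
from mxnet import gluon

net = gluon.nn.Dense(10)          # placeholder model
net.initialize()

# Gradients pushed to the kvstore are quantized to 2 bits; 'threshold'
# controls which gradient values are clipped to +/- threshold.
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1},
                        kvstore='dist_sync',
                        compression_params={'type': '2bit', 'threshold': 0.5})
```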

Accelerated Backend


How to use NVIDIA’s TensorRT to boost inference performance.
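The contrib TensorRT API has shifted between MXNet releases; the sketch below follows the 1.5-era flow and uses a hypothetical checkpoint name, so treat it as illustrative rather than definitive.

```python
import mxnet as mx

# Hypothetical pre-trained checkpoint; substitute your own symbol/params.
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-18', 0)

trt_sym = sym.get_backend_symbol('TensorRT')      # partition the graph for TensorRT
arg_params, aux_params = mx.contrib.tensorrt.init_tensorrt_params(
    trt_sym, arg_params, aux_params)
mx.contrib.tensorrt.set_use_fp16(True)            # optional: build fp16 engines

executor = trt_sym.simple_bind(ctx=mx.gpu(0), data=(1, 3, 224, 224),
                               grad_req='null', force_rebind=True)
executor.copy_params_from(arg_params, aux_params)
out = executor.forward(is_train=False,
                       data=mx.nd.zeros((1, 3, 224, 224), ctx=mx.gpu(0)))
```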

Distributed Training

[Distributed Training Using the KVStore API](/api/faq/distributed_training.html)

How to use the KVStore API to train a model across multiple GPUs or machines.
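A minimal sketch of the pattern: create a distributed kvstore and hand it to a Gluon Trainer. It assumes the script is started with a launcher such as tools/launch.py, since 'dist_sync' cannot initialize in a single standalone process; the model is a placeholder.

```python
import mxnet as mx
from mxnet import gluon

# Synchronous distributed kvstore; requires workers started by a launcher.
kv = mx.kv.create('dist_sync')

net = gluon.nn.Dense(10)          # placeholder model
net.initialize()
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1}, kvstore=kv)

# Each worker can shard its portion of the data using its rank.
print('worker', kv.rank, 'of', kv.num_workers)
```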

[Training with Multiple GPUs Using Model Parallelism](/api/faq/model_parallel_lstm.html)

An overview of using multiple GPUs when training an LSTM.
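A rough sketch of the core idea, written with Gluon for brevity rather than as the tutorial's exact code: two LSTM layers live on different GPUs, and activations (not parameters) are copied between devices inside the forward pass.

```python
import mxnet as mx
from mxnet import gluon

# Place each layer's parameters on a different GPU.
layer1 = gluon.rnn.LSTM(hidden_size=512)
layer2 = gluon.rnn.LSTM(hidden_size=512)
layer1.initialize(ctx=mx.gpu(0))
layer2.initialize(ctx=mx.gpu(1))

# Placeholder input in (seq_len, batch, features) layout.
x = mx.nd.random.uniform(shape=(35, 32, 256), ctx=mx.gpu(0))
h = layer1(x)
h = h.as_in_context(mx.gpu(1))    # copy activations between devices
out = layer2(h)
```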

[Data Parallelism in MXNet](/api/faq/multi_device)

An overview of distributed training strategies.
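The most common of these strategies, synchronous data parallelism on a single machine, looks roughly like this in Gluon; the model, shapes, and labels below are placeholders.

```python
import mxnet as mx
from mxnet import autograd, gluon

ctx = [mx.gpu(0), mx.gpu(1)]      # assumes two GPUs
net = gluon.nn.Dense(10)          # placeholder model
net.initialize(ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

data = mx.nd.random.uniform(shape=(64, 100))
label = mx.nd.zeros((64,))

# Split the batch evenly across devices.
data_parts = gluon.utils.split_and_load(data, ctx)
label_parts = gluon.utils.split_and_load(label, ctx)

with autograd.record():
    losses = [loss_fn(net(X), y) for X, y in zip(data_parts, label_parts)]
for l in losses:
    l.backward()
trainer.step(data.shape[0])       # normalize by the total batch size
```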

MXNet with Horovod

A set of example scripts demonstrating MNIST and ImageNet training with Horovod as the distributed training backend.
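Those scripts all follow the same basic pattern, sketched below with a placeholder model; it assumes launching with `horovodrun` and one GPU per process.

```python
import mxnet as mx
from mxnet import gluon
import horovod.mxnet as hvd

hvd.init()
ctx = mx.gpu(hvd.local_rank())    # one GPU per Horovod process

net = gluon.nn.Dense(10)          # placeholder model
net.initialize(ctx=ctx)

# Scale the learning rate by the number of workers, as the examples do.
opt = mx.optimizer.create('sgd', learning_rate=0.01 * hvd.size())
trainer = hvd.DistributedTrainer(net.collect_params(), opt)

# Ensure every worker starts from rank 0's parameters.
hvd.broadcast_parameters(net.collect_params(), root_rank=0)
```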