Performance

The following tutorials show how to tune MXNet and how to use tools that improve training and inference performance.

Essential

Improving Performance (/api/faq/perf)

How to get the best performance from MXNet.
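Among other things, the FAQ covers environment-variable tuning; a minimal sketch (the values are illustrative, and which knobs matter depends on your hardware):

```python
import os

# Set before importing mxnet so the settings take effect.
os.environ['OMP_NUM_THREADS'] = '4'               # CPU compute threads
os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '1'  # let cuDNN auto-select conv algorithms

import mxnet as mx
```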

Profiler (backend/profiler.html)

How to profile MXNet models.
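For orientation, a minimal sketch of the profiler workflow (the output filename and the toy workload are illustrative):

```python
import mxnet as mx
from mxnet import profiler

# Configure the profiler before running any operators.
profiler.set_config(profile_all=True, aggregate_stats=True,
                    filename='profile_output.json')

profiler.set_state('run')
x = mx.nd.ones((1000, 1000))
y = mx.nd.dot(x, x)
mx.nd.waitall()                # wait for asynchronous execution to finish
profiler.set_state('stop')

print(profiler.dumps())        # aggregated per-operator statistics
```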

Compression

Compression: float16 (/api/faq/float16)

How to use float16 in your model to boost training speed.
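As a minimal sketch, a Gluon block can be cast to float16 (assumes a GPU context, where float16 math is actually accelerated; the layer and shapes are arbitrary):

```python
import mxnet as mx
from mxnet import gluon, nd

ctx = mx.gpu(0)                      # float16 compute targets GPUs (e.g. Tensor Cores)

net = gluon.nn.Dense(10)
net.initialize(ctx=ctx)
net.cast('float16')                  # cast parameters to float16

x = nd.random.uniform(shape=(4, 20), ctx=ctx).astype('float16')
print(net(x).dtype)                  # forward pass runs in float16
```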

Gradient Compression (/api/faq/gradient_compression)

How to use gradient compression to reduce communication bandwidth and increase speed.
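A minimal sketch of enabling 2-bit gradient compression on a Gluon Trainer (the threshold value is illustrative, and the dist_sync kvstore assumes the job was started with MXNet's distributed launcher):

```python
from mxnet import gluon

net = gluon.nn.Dense(10)
net.initialize()

# compression_params switches on gradient compression for worker-server traffic.
trainer = gluon.Trainer(
    net.collect_params(), 'sgd', {'learning_rate': 0.1},
    kvstore='dist_sync',
    compression_params={'type': '2bit', 'threshold': 0.5})
```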

Accelerated Backend

TensorRT (backend/tensorrt/tensorrt.html)

How to use NVIDIA’s TensorRT to boost inference performance.
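As a rough sketch (assuming an MXNet build with TensorRT enabled and a version that provides the backend partitioning API; the checkpoint prefix is hypothetical):

```python
import mxnet as mx

sym, arg_params, aux_params = mx.model.load_checkpoint('model', 0)

# Partition the graph so that supported subgraphs execute through TensorRT.
trt_sym = sym.optimize_for('TensorRT', args=arg_params, aux=aux_params,
                           ctx=mx.gpu(0))
```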

Distributed Training

Distributed Training Using the KVStore API (/api/faq/distributed_training.html)

How to use the KVStore API to train a model across multiple GPUs and machines.
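A minimal sketch of creating a distributed kvstore (assumes the processes were launched with MXNet's launcher, e.g. tools/launch.py, which sets up the scheduler and server roles):

```python
import mxnet as mx

kv = mx.kv.create('dist_sync')                  # synchronous distributed kvstore
print('worker', kv.rank, 'of', kv.num_workers)

# A Gluon Trainer can then aggregate gradients through this kvstore:
# trainer = mx.gluon.Trainer(net.collect_params(), 'sgd',
#                            {'learning_rate': 0.1}, kvstore=kv)
```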

Training with Multiple GPUs Using Model Parallelism (/api/faq/model_parallel_lstm.html)

An overview of using multiple GPUs when training an LSTM.
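The core idea, in a minimal sketch (two dense layers standing in for LSTM layers, pinned to different GPUs):

```python
import mxnet as mx
from mxnet import gluon, nd

layer1 = gluon.nn.Dense(256)
layer2 = gluon.nn.Dense(10)
layer1.initialize(ctx=mx.gpu(0))     # first part of the model on GPU 0
layer2.initialize(ctx=mx.gpu(1))     # second part on GPU 1

x = nd.random.uniform(shape=(4, 128), ctx=mx.gpu(0))
h = layer1(x)
out = layer2(h.copyto(mx.gpu(1)))    # move activations between devices
```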

Data Parallelism in MXNet (/api/faq/multi_device)

An overview of distributed training strategies.
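For contrast with model parallelism, a minimal data-parallel sketch using Gluon's split_and_load (the two-GPU setup and layer sizes are assumptions):

```python
import mxnet as mx
from mxnet import gluon, nd
from mxnet.gluon.utils import split_and_load

ctxs = [mx.gpu(0), mx.gpu(1)]
net = gluon.nn.Dense(10)
net.initialize(ctx=ctxs)              # replicate parameters on both devices

data = nd.random.uniform(shape=(8, 20))
shards = split_and_load(data, ctxs)   # split the batch across devices
outputs = [net(x) for x in shards]    # each shard runs on its own device
```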

MXNet with Horovod (https://github.com/apache/mxnet/tree/master/example/distributed_training-horovod)

A set of example scripts demonstrating MNIST and ImageNet training with Horovod as the distributed training backend.
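The setup steps those scripts share, in a minimal sketch (assumes Horovod is installed with MXNet support and the script is launched with horovodrun; scaling the learning rate by the worker count is a common convention, not a requirement):

```python
import mxnet as mx
import horovod.mxnet as hvd

hvd.init()
ctx = mx.gpu(hvd.local_rank())        # one GPU per process

# Wrap the optimizer so gradients are averaged across all workers.
opt = mx.optimizer.create('sgd', learning_rate=0.01 * hvd.size())
opt = hvd.DistributedOptimizer(opt)
```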