Performance¶
The following tutorials will help you learn how to tune MXNet and use tools that improve training and inference performance.
Essential¶
Improving Performance (https://mxnet.apache.org/api/faq/perf)
How to get the best performance from MXNet.
Profiler (backend/profiler.html)
How to profile MXNet models.
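As a quick orientation (not a substitute for the tutorial), the sketch below turns the built-in profiler on around a couple of NDArray operations and dumps the trace to a JSON file. The filename and the profile_all/aggregate_stats settings are illustrative choices.

```python
import mxnet as mx

# Write a trace viewable in chrome://tracing; the filename is an arbitrary choice.
mx.profiler.set_config(profile_all=True, aggregate_stats=True,
                       filename='profile_output.json')

mx.profiler.set_state('run')              # start recording
a = mx.nd.random.uniform(shape=(1024, 1024))
b = mx.nd.dot(a, a)
b.wait_to_read()                          # force the async work to finish inside the window
mx.profiler.set_state('stop')             # stop recording

print(mx.profiler.dumps())                # aggregated operator statistics
mx.profiler.dump()                        # flush the JSON trace to disk
```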
Tuning NumPy Operations (https://mxnet.apache.org/versions/master/tutorials/gluon/gotchas_numpy_in_mxnet.html)
Gotchas using NumPy in MXNet.
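One gotcha in this area is that converting an NDArray to NumPy blocks MXNet's asynchronous engine. A minimal illustration, with arbitrary shapes:

```python
import mxnet as mx

x = mx.nd.ones((1000, 1000))
y = mx.nd.dot(x, x)        # queued on the async engine, returns immediately

# .asnumpy() forces a blocking copy to NumPy; calling it every batch
# (e.g. for per-batch logging) repeatedly stalls the engine.
print(y.mean().asnumpy())

# Prefer keeping intermediate values as NDArrays and synchronizing rarely:
mx.nd.waitall()
```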
Compression¶
Compression: float16 (compression/float16.html)
How to use float16 in your model to boost training speed.
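For orientation, here is a minimal Gluon sketch of casting a network and its inputs to float16; the tiny network and input shapes are placeholders, and the tutorial covers the training-specific considerations this sketch omits.

```python
from mxnet import nd
from mxnet.gluon import nn

# Placeholder model; any Gluon block can be cast the same way.
net = nn.HybridSequential()
net.add(nn.Dense(128, activation='relu'), nn.Dense(10))
net.initialize()

# Cast the parameters to float16; inputs must be cast to match.
net.cast('float16')
out = net(nd.random.uniform(shape=(8, 64)).astype('float16'))
print(out.dtype)        # float16
```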
Gradient Compression (compression/gradient_compression.html)
How to use gradient compression to reduce communication bandwidth and increase speed.
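As a sketch: in Gluon, 2-bit gradient compression can be requested through the Trainer's compression_params. The threshold value below is only an example, and compression is mainly useful with a multi-device or distributed KVStore; see the tutorial for when it pays off.

```python
from mxnet import gluon
from mxnet.gluon import nn

net = nn.Dense(10)      # placeholder model
net.initialize()

# Quantize gradients to 2 bits before they are pushed to the KVStore;
# the threshold controls quantization and 0.5 is just an example value.
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1},
                        compression_params={'type': '2bit', 'threshold': 0.5})
```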
Accelerated Backend¶
TensorRT (backend/tensorRt.html)
How to use NVIDIA’s TensorRT to boost inference performance.
Distributed Training¶
Distributed Training Using the KVStore API (https://mxnet.apache.org/versions/master/faq/distributed_training.html)
How to use the KVStore API to train a model across multiple devices and machines.
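A minimal sketch of the Gluon side of KVStore-based distributed training. It assumes the script is started by a launcher (for example tools/launch.py) so the 'dist_sync' store can reach a scheduler; the model and optimizer settings are placeholders.

```python
import mxnet as mx
from mxnet import gluon
from mxnet.gluon import nn

# 'dist_sync' only works when scheduler/server/worker processes are launched;
# run standalone, this call will wait for a scheduler that never appears.
kv = mx.kv.create('dist_sync')

net = nn.Dense(10)
net.initialize()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1},
                        kvstore=kv)

# Each worker should read a distinct shard of the data, e.g. by using
# kv.rank and kv.num_workers when building its sampler.
print(kv.rank, kv.num_workers)
```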
Training with Multiple GPUs Using Model Parallelism (https://mxnet.apache.org/versions/master/faq/model_parallel_lstm.html)
An overview of using multiple GPUs when training an LSTM.
Data Parallelism in MXNet (https://mxnet.apache.org/versions/master/faq/multi_devices.html)
An overview of distributed training strategies.
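For the single-machine flavor of data parallelism, the sketch below splits one batch across the available GPUs with gluon.utils.split_and_load; the model, batch size, and learning rate are placeholders, and the batch size is assumed to divide evenly across the contexts.

```python
import mxnet as mx
from mxnet import autograd, gluon, nd
from mxnet.gluon import nn

# Use every visible GPU, or fall back to CPU.
ctx = [mx.gpu(i) for i in range(mx.context.num_gpus())] or [mx.cpu()]

net = nn.Dense(10)
net.initialize(ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

batch_size = 32                                   # assumed divisible by len(ctx)
data = nd.random.uniform(shape=(batch_size, 64))  # placeholder batch
label = nd.array([i % 10 for i in range(batch_size)])

# Slice the batch across contexts; gradients from all slices are
# aggregated when trainer.step() runs.
data_parts = gluon.utils.split_and_load(data, ctx)
label_parts = gluon.utils.split_and_load(label, ctx)
with autograd.record():
    losses = [loss_fn(net(x), y) for x, y in zip(data_parts, label_parts)]
for loss in losses:
    loss.backward()
trainer.step(batch_size)
```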
MXNet with Horovod (https://github.com/apache/incubator-mxnet/tree/master/example/distributed_training-horovod)
A set of example scripts demonstrating MNIST and ImageNet training with Horovod as the distributed training backend.
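As a hedged sketch of the overall shape of such a script (the linked examples are the authoritative versions): initialize Horovod, pin each worker to its local GPU, broadcast parameters from rank 0, and wrap the optimizer in hvd.DistributedTrainer. Scaling the learning rate by hvd.size() is a common convention, not a requirement.

```python
# Typically launched with something like: horovodrun -np 4 python train_hvd.py
import mxnet as mx
from mxnet.gluon import nn
import horovod.mxnet as hvd

hvd.init()
ctx = mx.gpu(hvd.local_rank()) if mx.context.num_gpus() else mx.cpu()

net = nn.Dense(10)      # placeholder model
net.initialize(ctx=ctx)

# Make every worker start from the same weights, then let Horovod average
# gradients across workers inside trainer.step().
hvd.broadcast_parameters(net.collect_params(), root_rank=0)
trainer = hvd.DistributedTrainer(net.collect_params(), 'sgd',
                                 {'learning_rate': 0.1 * hvd.size()})
```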