Error Handling Guide

MXNet contains structured error classes to indicate specific types of errors. Please raise a specific error type when possible, so that users can write code to handle a specific error category if necessary. In Python, you can raise the specific error object directly. In other languages such as C++, you simply add an <ErrorType>: prefix to the error message (see below).

Raise a Specific Error in C++

You can add an <ErrorType>: prefix to your error message to raise an error of the corresponding type. Note that you do not have to add a new type: mxnet.base.MXNetError is raised by default when the message has no error type prefix. This mechanism works for both the LOG(FATAL) and CHECK macros. The following code shows an example.

// Python frontend receives the following error type:
// ValueError: Check failed: x == y (0 vs. 1) : expect x and y to be equal.
CHECK_EQ(0, 1) << "ValueError: expect x and y to be equal.";


// Python frontend receives the following error type:
// InternalError: cannot reach here
LOG(FATAL) << "InternalError: cannot reach here";

As the example above shows, MXNet’s FFI system combines the Python and C++ stack traces into a single message and generates the corresponding error class automatically.
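To make the prefix mechanism more concrete, the following is a minimal sketch of how a frontend could map an <ErrorType>: prefix to an exception class. It is not MXNet's actual implementation: the names DefaultErrorSketch, InternalErrorSketch, _ERROR_TYPES and error_from_message are hypothetical and exist only for illustration.

```python
import re


class DefaultErrorSketch(RuntimeError):
    """Hypothetical stand-in for the default error type (no prefix found)."""


class InternalErrorSketch(DefaultErrorSketch):
    """Hypothetical stand-in for an internal error type."""


# Hypothetical registry that maps a message prefix to an exception class.
_ERROR_TYPES = {
    "ValueError": ValueError,
    "InternalError": InternalErrorSketch,
}

# Matches a leading "<ErrorType>: " prefix such as "ValueError: ".
_PREFIX = re.compile(r"^(\w+):\s")


def error_from_message(message):
    """Build an exception whose class is chosen by the message prefix."""
    match = _PREFIX.match(message)
    if match:
        error_cls = _ERROR_TYPES.get(match.group(1))
        if error_cls is not None:
            return error_cls(message)
    # No prefix, or an unregistered prefix: fall back to the default type.
    return DefaultErrorSketch(message)
```

Under these assumptions, error_from_message("InternalError: cannot reach here") returns an InternalErrorSketch instance, while a message without a registered prefix falls back to DefaultErrorSketch.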

How to choose an Error Type

You can go through the error types listed below, use common sense, and refer to the choices made in the existing code. We try to keep the number of error types reasonable. If you feel there is a need to add a new error type, take the following steps:

  • Send an RFC proposal with a description and usage examples in the current codebase.
  • Add the new error type to mxnet.error with clear documentation.
  • Update the list in this file to include the new error type.
  • Change the code to use the new error type.

We also recommend using less abstraction when creating short error messages. The code is more readable this way, and it leaves room to craft specific error messages when necessary.

def preferred():
    # Very clear about what is being raised and what is the error message.
    raise OpNotImplemented("Operator relu is not implemented in the MXNet frontend")

def _op_not_implemented(op_name):
    return OpNotImplemented("Operator {} is not implemented.".format(op_name))

def not_preferred():
    # Introduces another level of indirection.
    raise _op_not_implemented("relu")

If we need to introduce a wrapper function that constructs multi-line error messages, please put the wrapper in the same file so that other developers can look up the implementation easily.
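As an illustration of such a wrapper, here is a minimal sketch that builds a multi-line message and lives next to the code that raises it. The OpNotImplemented class defined here is only a stand-in so that the example runs on its own, and the helper _unsupported_ops_error and the function convert are hypothetical names.

```python
class OpNotImplemented(NotImplementedError):
    """Stand-in error class so that this sketch is self-contained."""


def _unsupported_ops_error(op_names):
    # Wrapper kept in the same file as its only caller, so readers can
    # find how the multi-line message is assembled.
    lines = ["The following operators are not implemented:"]
    lines.extend("  - {}".format(name) for name in sorted(op_names))
    return OpNotImplemented("\n".join(lines))


def convert(op_names, supported):
    """Return the operator list, or raise if any operator is unsupported."""
    missing = [name for name in op_names if name not in supported]
    if missing:
        raise _unsupported_ops_error(missing)
    return list(op_names)
```

The wrapper is justified here because the message spans several lines; for a one-line message, raising the error directly remains the clearer choice.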

Signal Handling

If code is not written carefully, some errors occur in the form of a signal, which is handled by the OS kernel. In MXNet, you can choose to handle certain signals as catchable exceptions. This can be combined with the error type selection above so that the exception can be caught in the Python frontend. Currently, the following signals are handled this way:

  • SIGFPE: throws FloatingPointError
  • SIGBUS: throws IOError

To extend this to other signals, you can modify the signal handler registration in /src/initialize.cc.
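The idea of turning a signal into a catchable exception can be sketched in pure Python. This is only an analogy, not the C++ handler in MXNet: it assumes a POSIX system, Python 3.8 or later, and execution in the main thread, and the names _SIGNAL_ERRORS, install_handlers and raise_and_catch are hypothetical. The signal is sent explicitly with signal.raise_signal rather than produced by a real fault.

```python
import signal

# Hypothetical mapping that mirrors the list above.
_SIGNAL_ERRORS = {
    signal.SIGFPE: FloatingPointError,
    signal.SIGBUS: IOError,
}


def _raise_for_signal(signum, frame):
    # Convert the received signal into the matching Python exception.
    name = signal.Signals(signum).name
    raise _SIGNAL_ERRORS[signum]("received signal {}".format(name))


def install_handlers():
    """Register the converting handler for every mapped signal."""
    for signum in _SIGNAL_ERRORS:
        signal.signal(signum, _raise_for_signal)


def raise_and_catch(signum):
    """Send a signal to this process and return the resulting exception."""
    try:
        signal.raise_signal(signum)
        return None
    except (FloatingPointError, IOError) as err:
        return err
```

After install_handlers() has run, raise_and_catch(signal.SIGFPE) returns a FloatingPointError instance instead of letting the process terminate.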