Gluon Recurrent Neural Network API

Overview

This document lists the recurrent neural network API in Gluon.

Recurrent Layers

Recurrent layers can be used in Sequential with other regular neural network layers. For example, to construct a sequence labeling model where a prediction is made for each time-step:

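import mxnet as mx

# Embedding -> LSTM -> per-time-step Dense classifier
model = mx.gluon.nn.Sequential()
with model.name_scope():
    model.add(mx.gluon.nn.Embedding(30, 10))
    model.add(mx.gluon.rnn.LSTM(20))
    # flatten=False keeps the leading axes, so one prediction is made per time step
    model.add(mx.gluon.nn.Dense(5, flatten=False))
model.initialize()
# a (2, 3) batch of token indices yields an output of shape (2, 3, 5)
model(mx.nd.ones((2,3)))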

`RNN`	Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.
`LSTM`	Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.
`GRU`	Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence.

Recurrent Cells

Recurrent cells allow fine-grained control when defining recurrent models. Users can explicitly step through and unroll cells to construct complex networks. Cells provide more flexibility but are slower than recurrent layers. Recurrent cells can be stacked with SequentialRNNCell:

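import mxnet as mx

model = mx.gluon.rnn.SequentialRNNCell()
with model.name_scope():
    model.add(mx.gluon.rnn.LSTMCell(20))
    model.add(mx.gluon.rnn.LSTMCell(20))
model.initialize()  # parameters must be initialized before the first call
states = model.begin_state(batch_size=32)
# 5 time steps, batch size 32, 10 input features
inputs = mx.nd.random.uniform(shape=(5, 32, 10))
outputs = []
# step through the sequence manually, carrying the states forward
for i in range(5):
    output, states = model(inputs[i], states)
    outputs.append(output)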

`RNNCell`	Elman RNN recurrent neural network cell.
`LSTMCell`	Long short-term memory (LSTM) network cell.
`GRUCell`	Gated Recurrent Unit (GRU) network cell.
`RecurrentCell`	Abstract base class for RNN cells.
`SequentialRNNCell`	Sequentially stacking multiple RNN cells.
`BidirectionalCell`	Bidirectional RNN cell.
`DropoutCell`	Applies dropout on input.
`ZoneoutCell`	Applies Zoneout on base cell.
`ResidualCell`	Adds residual connection as described in Wu et al, 2016 (https://arxiv.org/abs/1609.08144).

API Reference

Recurrent neural network module.

class mxnet.gluon.rnn.BidirectionalCell(l_cell, r_cell, output_prefix='bi_')

Bidirectional RNN cell.

Parameters:

l_cell (RecurrentCell) – Cell for forward unrolling.
r_cell (RecurrentCell) – Cell for backward unrolling.

class mxnet.gluon.rnn.DropoutCell(rate, axes=(), prefix=None, params=None)

Applies dropout on input.

Parameters:

rate (float) – Fraction of elements to drop out, which is 1 minus the fraction to retain.
axes (tuple of int, default ()) – The axes on which the dropout mask is shared. If empty, regular dropout is applied.

Inputs:

data: input tensor with shape (batch_size, size).
states: a list of recurrent state tensors.

Outputs:

out: output tensor with shape (batch_size, size).
next_states: returns input states directly.
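As a rough illustration of the inputs and outputs listed above, the following plain-Python sketch drops each element with probability rate and passes the states through unchanged. The function name dropout_step, the rng argument, and the rescaling of kept elements by 1 / (1 - rate) are assumptions of this sketch; the axes option is not modeled.

```python
import random


def dropout_step(data, states, rate, rng=random):
    """Return (out, next_states) for one call of a dropout-style cell.

    data is a list of rows with shape (batch_size, size).
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    # Keep each element with probability `keep`; rescale kept elements
    # (assumption: inverted dropout) so the expected value is unchanged.
    out = [[x / keep if rng.random() < keep else 0.0 for x in row]
           for row in data]
    # The states are returned untouched, as described above.
    return out, states


data = [[1.0, 2.0, 3.0, 4.0]]
states = ["state placeholder"]
out, next_states = dropout_step(data, states, rate=0.5, rng=random.Random(0))
identity, _ = dropout_step(data, states, rate=0.0)
```

With rate=0.0 every element is kept and the output equals the input.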

class mxnet.gluon.rnn.GRU(hidden_size, num_layers=1, layout='TNC', dropout=0, bidirectional=False, input_size=0, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', dtype='float32', **kwargs)

Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence. Note: this is an implementation of the cuDNN version of GRUs (slight modification compared to Cho et al. 2014; the reset gate \(r_t\) is applied after matrix multiplication).

For each element in the input sequence, each layer computes the following function:

\[\begin{split}\begin{array}{ll} r_t = sigmoid(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\ i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)} + b_{hn})) \\ h_t = (1 - i_t) * n_t + i_t * h_{(t-1)} \\ \end{array}\end{split}\]

where \(h_t\) is the hidden state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(r_t\), \(i_t\), \(n_t\) are the reset, input, and new gates, respectively.
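The equations above can be traced with a small plain-Python sketch of a single time step. The helper names (gru_step, affine, zero_params) and the dictionary keys used for the weights are hypothetical and chosen only to mirror the symbols in the formulas; this is not how the layer is implemented.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W v + b for a matrix W (list of rows) and vectors b, v."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def gru_step(x, h_prev, p):
    """One GRU time step; p is a hypothetical dict of weights and biases."""
    i2h = {g: affine(p["W_i" + g], p["b_i" + g], x) for g in "rin"}
    h2h = {g: affine(p["W_h" + g], p["b_h" + g], h_prev) for g in "rin"}
    r = [sigmoid(a + b) for a, b in zip(i2h["r"], h2h["r"])]
    i = [sigmoid(a + b) for a, b in zip(i2h["i"], h2h["i"])]
    # The reset gate multiplies the h2h term after the matrix multiplication.
    n = [math.tanh(a + rj * b) for a, rj, b in zip(i2h["n"], r, h2h["n"])]
    return [(1.0 - ij) * nj + ij * hj for ij, nj, hj in zip(i, n, h_prev)]


def zero_params(input_size, hidden_size):
    """All-zero parameters, convenient for checking the formulas by hand."""
    p = {}
    for g in "rin":
        p["W_i" + g] = [[0.0] * input_size for _ in range(hidden_size)]
        p["W_h" + g] = [[0.0] * hidden_size for _ in range(hidden_size)]
        p["b_i" + g] = [0.0] * hidden_size
        p["b_h" + g] = [0.0] * hidden_size
    return p


h = gru_step([1.0, 2.0, 3.0], [0.4, -0.8], zero_params(3, 2))
```

With all-zero parameters both gates equal 0.5 and n_t is 0, so the new hidden state is half of the previous one.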

Parameters:

hidden_size (int) – The number of features in the hidden state h.
num_layers (int, default 1) – Number of recurrent layers.
layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.
dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer.
bidirectional (bool, default False) – If True, becomes a bidirectional RNN.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
i2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
dtype (str, default 'float32') – Type to initialize the parameters and default states to.
input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.
prefix (str or None) – Prefix of this Block.
params (ParameterDict or None) – Shared Parameters for this Block.

Inputs:

data: input tensor with shape (sequence_length, batch_size, input_size) when layout is “TNC”. For other layouts, dimensions are permuted accordingly using transpose() operator which adds performance overhead. Consider creating batches in TNC layout during data batching step.
states: initial recurrent state tensor with shape (num_layers, batch_size, num_hidden). If bidirectional is True, shape will instead be (2*num_layers, batch_size, num_hidden). If states is None, zeros will be used as default begin states.

Outputs:

out: output tensor with shape (sequence_length, batch_size, num_hidden) when layout is “TNC”. If bidirectional is True, output shape will instead be (sequence_length, batch_size, 2*num_hidden).
out_states: output recurrent state tensor with the same shape as states. If states is None, out_states will not be returned.
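The shape rules above can be summarized in a small helper. The function rnn_layer_shapes is hypothetical and exists only for this illustration. It assumes that only the 'TNC' and 'NTC' layouts are of interest and that the state shape does not depend on the layout.

```python
def rnn_layer_shapes(seq_len, batch_size, input_size, hidden_size,
                     num_layers=1, bidirectional=False, layout="TNC"):
    """Return the (data, states, out) shapes described above."""
    if layout not in ("TNC", "NTC"):
        raise ValueError("this sketch only handles 'TNC' and 'NTC'")
    directions = 2 if bidirectional else 1
    in_sizes = {"T": seq_len, "N": batch_size, "C": input_size}
    out_sizes = {"T": seq_len, "N": batch_size,
                 "C": directions * hidden_size}
    # Data and output follow the layout string axis by axis.
    data = tuple(in_sizes[axis] for axis in layout)
    out = tuple(out_sizes[axis] for axis in layout)
    # Assumption: states stay (layers * directions, batch, hidden).
    states = (directions * num_layers, batch_size, hidden_size)
    return data, states, out


uni = rnn_layer_shapes(5, 3, 10, 100, num_layers=3)
bi = rnn_layer_shapes(5, 3, 10, 100, num_layers=3, bidirectional=True)
ntc = rnn_layer_shapes(5, 3, 10, 100, num_layers=3, layout="NTC")
```

The first call uses a sequence length of 5, batch size of 3, input size of 10, hidden size of 100 and 3 layers.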

Examples

>>> import mxnet as mx
>>> layer = mx.gluon.rnn.GRU(100, 3)
>>> layer.initialize()
>>> input = mx.nd.random.uniform(shape=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.nd.random.uniform(shape=(3, 3, 100))
>>> output, hn = layer(input, h0)

class mxnet.gluon.rnn.GRUCell(hidden_size, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, prefix=None, params=None)

Gated Recurrent Unit (GRU) network cell. Note: this is an implementation of the cuDNN version of GRUs (slight modification compared to Cho et al. 2014; the reset gate \(r_t\) is applied after matrix multiplication).

Each call computes the following function:

\[\begin{split}\begin{array}{ll} r_t = sigmoid(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\ i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)} + b_{hn})) \\ h_t = (1 - i_t) * n_t + i_t * h_{(t-1)} \\ \end{array}\end{split}\]

where \(h_t\) is the hidden state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(r_t\), \(i_t\), \(n_t\) are the reset, input, and new gates, respectively.

Parameters:

hidden_size (int) – Number of units in output symbol.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.
h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.
prefix (str, default 'gru_') – Prefix for names of Blocks (and names of weights if params is None).
params (Parameter or None, default None) – Container for weight sharing between cells. Created if None.

Inputs:

data: input tensor with shape (batch_size, input_size).
states: a list of one initial recurrent state tensor with shape (batch_size, num_hidden).

Outputs:

out: output tensor with shape (batch_size, num_hidden).
next_states: a list of one output recurrent state tensor with the same shape as states.

class mxnet.gluon.rnn.HybridRecurrentCell(prefix=None, params=None)

HybridRecurrentCell supports hybridize.

class mxnet.gluon.rnn.HybridSequentialRNNCell(prefix=None, params=None)

Sequentially stacking multiple HybridRNN cells.

add(cell)

Appends a cell into the stack.

Parameters:	cell (RecurrentCell) – The cell to add.

class mxnet.gluon.rnn.LSTM(hidden_size, num_layers=1, layout='TNC', dropout=0, bidirectional=False, input_size=0, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', projection_size=None, h2r_weight_initializer=None, state_clip_min=None, state_clip_max=None, state_clip_nan=False, dtype='float32', **kwargs)

Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.

For each element in the input sequence, each layer computes the following function:

\[\begin{split}\begin{array}{ll} i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ f_t = sigmoid(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{(t-1)} + b_{hg}) \\ o_t = sigmoid(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\ c_t = f_t * c_{(t-1)} + i_t * g_t \\ h_t = o_t * \tanh(c_t) \end{array}\end{split}\]

where \(h_t\) is the hidden state at time t, \(c_t\) is the cell state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(i_t\), \(f_t\), \(g_t\), \(o_t\) are the input, forget, cell, and out gates, respectively.
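To make the gate arithmetic concrete, here is a minimal plain-Python sketch of one LSTM time step. The names lstm_step, affine and zero_params, and the dictionary keys such as W_ig and W_hg, are assumptions of this sketch and not part of the Gluon API.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W v + b for a matrix W (list of rows) and vectors b, v."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step; p is a hypothetical dict of weights and biases."""
    def pre(gate):
        # Sum of the input-to-hidden and hidden-to-hidden terms for a gate.
        a = affine(p["W_i" + gate], p["b_i" + gate], x)
        b = affine(p["W_h" + gate], p["b_h" + gate], h_prev)
        return [u + v for u, v in zip(a, b)]

    i = [sigmoid(z) for z in pre("i")]    # input gate
    f = [sigmoid(z) for z in pre("f")]    # forget gate
    g = [math.tanh(z) for z in pre("g")]  # cell candidate
    o = [sigmoid(z) for z in pre("o")]    # output gate
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


def zero_params(input_size, hidden_size):
    """All-zero parameters, convenient for checking the formulas by hand."""
    p = {}
    for gate in "ifgo":
        p["W_i" + gate] = [[0.0] * input_size for _ in range(hidden_size)]
        p["W_h" + gate] = [[0.0] * hidden_size for _ in range(hidden_size)]
        p["b_i" + gate] = [0.0] * hidden_size
        p["b_h" + gate] = [0.0] * hidden_size
    return p


h, c = lstm_step([1.0], [0.0, 0.0], [1.0, -2.0], zero_params(1, 2))
```

With all-zero parameters the three sigmoid gates equal 0.5 and g_t is 0, so the cell state is halved and h_t is 0.5 * tanh(c_t).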

Parameters:

hidden_size (int) – The number of features in the hidden state h.
num_layers (int, default 1) – Number of recurrent layers.
layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.
dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer.
bidirectional (bool, default False) – If True, becomes a bidirectional RNN.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
i2h_bias_initializer (str or Initializer) – Initializer for the bias vector. The signature above defaults to 'zeros'; with the 'lstmbias' initializer, the bias for the forget gate is initialized to 1 while all other biases are initialized to zero.
h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
projection_size (int, default None) – The number of features after projection.
h2r_weight_initializer (str or Initializer, default None) – Initializer for the projected recurrent weights matrix, used for the linear transformation of the recurrent state to the projected space.
state_clip_min (float or None, default None) – Minimum clip value of LSTM states. This option must be used together with state_clip_max. If None, clipping is not applied.
state_clip_max (float or None, default None) – Maximum clip value of LSTM states. This option must be used together with state_clip_min. If None, clipping is not applied.
state_clip_nan (boolean, default False) – Whether to stop NaN from propagating in state by clipping it to min/max. If the clipping range is not specified, this option is ignored.
dtype (str, default 'float32') – Type to initialize the parameters and default states to.
input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.
prefix (str or None) – Prefix of this Block.
params (ParameterDict or None) – Shared Parameters for this Block.

Inputs:

data: input tensor with shape (sequence_length, batch_size, input_size) when layout is “TNC”. For other layouts, dimensions are permuted accordingly using transpose() operator which adds performance overhead. Consider creating batches in TNC layout during data batching step.
states: a list of two initial recurrent state tensors. Each has shape (num_layers, batch_size, num_hidden). If bidirectional is True, shape will instead be (2*num_layers, batch_size, num_hidden). If states is None, zeros will be used as default begin states.

Outputs:

out: output tensor with shape (sequence_length, batch_size, num_hidden) when layout is “TNC”. If bidirectional is True, output shape will instead be (sequence_length, batch_size, 2*num_hidden).
out_states: a list of two output recurrent state tensors with the same shape as in states. If states is None, out_states will not be returned.

Examples

>>> import mxnet as mx
>>> layer = mx.gluon.rnn.LSTM(100, 3)
>>> layer.initialize()
>>> input = mx.nd.random.uniform(shape=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.nd.random.uniform(shape=(3, 3, 100))
>>> c0 = mx.nd.random.uniform(shape=(3, 3, 100))
>>> output, hn = layer(input, [h0, c0])

class mxnet.gluon.rnn.LSTMCell(hidden_size, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, prefix=None, params=None, activation='tanh', recurrent_activation='sigmoid')

Long short-term memory (LSTM) network cell.

Each call computes the following function:

\[\begin{split}\begin{array}{ll} i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ f_t = sigmoid(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{(t-1)} + b_{hg}) \\ o_t = sigmoid(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\ c_t = f_t * c_{(t-1)} + i_t * g_t \\ h_t = o_t * \tanh(c_t) \end{array}\end{split}\]

where \(h_t\) is the hidden state at time t, \(c_t\) is the cell state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(i_t\), \(f_t\), \(g_t\), \(o_t\) are the input, forget, cell, and out gates, respectively.

Parameters:

hidden_size (int) – Number of units in output symbol.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.
h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.
prefix (str, default 'lstm_') – Prefix for names of Blocks (and names of weights if params is None).
params (Parameter or None, default None) – Container for weight sharing between cells. Created if None.
activation (str, default 'tanh') – Activation type to use. See nd/symbol Activation for supported types.
recurrent_activation (str, default 'sigmoid') – Activation type to use for the recurrent step. See nd/symbol Activation for supported types.

Inputs:

data: input tensor with shape (batch_size, input_size).
states: a list of two initial recurrent state tensors. Each has shape (batch_size, num_hidden).

Outputs:

out: output tensor with shape (batch_size, num_hidden).
next_states: a list of two output recurrent state tensors. Each has the same shape as states.

class mxnet.gluon.rnn.ModifierCell(base_cell)

Base class for modifier cells. A modifier cell takes a base cell, applies modifications to it (e.g. Zoneout), and returns a new cell.

After applying modifiers the base cell should no longer be called directly. The modifier cell should be used instead.
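The wrapping pattern can be pictured with a toy example in plain Python. Both classes below are hypothetical stand-ins written for this illustration and do not derive from the Gluon classes. The modifier holds the base cell, delegates the step to it, and then adjusts the result, here by adding a residual connection.

```python
class ToyCell:
    """Hypothetical stand-in for a base cell: doubles its input."""

    def __call__(self, inputs, states):
        return [2.0 * x for x in inputs], states


class ToyResidual:
    """Hypothetical modifier: adds the input to the base cell's output."""

    def __init__(self, base_cell):
        self.base_cell = base_cell

    def __call__(self, inputs, states):
        # Delegate the step to the wrapped cell, then modify its output.
        out, states = self.base_cell(inputs, states)
        return [o + x for o, x in zip(out, inputs)], states


cell = ToyResidual(ToyCell())
# From here on, call the modifier rather than the wrapped base cell.
out, states = cell([1.0, -2.0], [])
```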

class mxnet.gluon.rnn.RNN(hidden_size, num_layers=1, activation='relu', layout='TNC', dropout=0, bidirectional=False, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, dtype='float32', **kwargs)

Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.

For each element in the input sequence, each layer computes the following function:

\[h_t = \tanh(w_{ih} * x_t + b_{ih} + w_{hh} * h_{(t-1)} + b_{hh})\]

where \(h_t\) is the hidden state at time t, and \(x_t\) is the output of the previous layer at time t or \(input_t\) for the first layer. If activation='relu', then ReLU is used instead of tanh.
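The recurrence can be checked by hand with a short plain-Python sketch of one Elman step. The function name rnn_step and its argument names are assumptions made for this illustration and simply follow the symbols in the formula.

```python
import math


def rnn_step(x, h_prev, w_ih, b_ih, w_hh, b_hh, activation="tanh"):
    """One Elman RNN time step for a single example (vectors as lists)."""
    if activation == "tanh":
        act = math.tanh
    elif activation == "relu":
        act = lambda z: max(0.0, z)
    else:
        raise ValueError("activation must be 'tanh' or 'relu'")
    h = []
    for row_ih, bi, row_hh, bh in zip(w_ih, b_ih, w_hh, b_hh):
        # w_ih * x_t + b_ih + w_hh * h_(t-1) + b_hh for one hidden unit
        z = sum(w * xi for w, xi in zip(row_ih, x)) + bi
        z += sum(w * hj for w, hj in zip(row_hh, h_prev)) + bh
        h.append(act(z))
    return h


# One hidden unit and one input feature; the pre-activation is -0.5.
args = ([0.5], [0.0], [[1.0]], [0.0], [[0.0]], [-1.0])
h_tanh = rnn_step(*args, activation="tanh")
h_relu = rnn_step(*args, activation="relu")
```

The same pre-activation gives tanh(-0.5) with the tanh activation and 0 with ReLU.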

Parameters:

hidden_size (int) – The number of features in the hidden state h.
num_layers (int, default 1) – Number of recurrent layers.
activation ({'relu' or 'tanh'}, default 'relu') – The activation function to use.
layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.
dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer.
bidirectional (bool, default False) – If True, becomes a bidirectional RNN.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
i2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.
dtype (str, default 'float32') – Type to initialize the parameters and default states to.
prefix (str or None) – Prefix of this Block.
params (ParameterDict or None) – Shared Parameters for this Block.

Inputs:

data: input tensor with shape (sequence_length, batch_size, input_size) when layout is “TNC”. For other layouts, dimensions are permuted accordingly using transpose() operator which adds performance overhead. Consider creating batches in TNC layout during data batching step.
states: initial recurrent state tensor with shape (num_layers, batch_size, num_hidden). If bidirectional is True, shape will instead be (2*num_layers, batch_size, num_hidden). If states is None, zeros will be used as default begin states.

Outputs:

out: output tensor with shape (sequence_length, batch_size, num_hidden) when layout is “TNC”. If bidirectional is True, output shape will instead be (sequence_length, batch_size, 2*num_hidden).
out_states: output recurrent state tensor with the same shape as states. If states is None, out_states will not be returned.

Examples

>>> import mxnet as mx
>>> layer = mx.gluon.rnn.RNN(100, 3)
>>> layer.initialize()
>>> input = mx.nd.random.uniform(shape=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.nd.random.uniform(shape=(3, 3, 100))
>>> output, hn = layer(input, h0)

class mxnet.gluon.rnn.RNNCell(hidden_size, activation='tanh', i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, prefix=None, params=None)

Elman RNN recurrent neural network cell.

Each call computes the following function:

\[h_t = \tanh(w_{ih} * x_t + b_{ih} + w_{hh} * h_{(t-1)} + b_{hh})\]

where \(h_t\) is the hidden state at time t, and \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer. If activation='relu', then ReLU is used instead of tanh.

Parameters:

hidden_size (int) – Number of units in output symbol.
activation (str or Symbol, default 'tanh') – Type of activation function.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.
h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.
prefix (str, default 'rnn_') – Prefix for names of Blocks (and names of weights if params is None).
params (Parameter or None) – Container for weight sharing between cells. Created if None.

Inputs:

data: input tensor with shape (batch_size, input_size).
states: a list of one initial recurrent state tensor with shape (batch_size, num_hidden).

Outputs:

out: output tensor with shape (batch_size, num_hidden).
next_states: a list of one output recurrent state tensor with the same shape as states.

class mxnet.gluon.rnn.RecurrentCell(prefix=None, params=None)

Abstract base class for RNN cells.

Parameters:

prefix (str, optional) – Prefix for names of Blocks. This prefix is also used for names of weights if params is None, i.e. if params are being created and not reused.
params (Parameter or None, default None) – Container for weight sharing between cells. A new Parameter container is created if params is None.

begin_state(batch_size=0, func=<function zeros>, **kwargs)

Initial state for this cell.

Parameters:

func (callable, default symbol.zeros) – Function for creating initial state. For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states. For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.
batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.
**kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.
Returns:	states – Starting states for the first RNN step.
Return type:	nested list of Symbol

forward(inputs, states)

Unrolls the recurrent cell for one time step.

Parameters:

inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).
states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns:

output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.
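The contract between begin_state() and forward() can be illustrated with a toy cell in plain Python. RunningSumCell is a hypothetical class written for this sketch and does not derive from RecurrentCell. Its func hook is simplified to take a shape tuple, which is an assumption of the sketch. The states returned by one step are fed into the next step.

```python
class RunningSumCell:
    """Hypothetical cell whose single state is the running sum of its inputs."""

    def __init__(self, size):
        self.size = size

    def begin_state(self, batch_size=0, func=None):
        # func creates the initial state from a shape; zeros by default.
        make = func or (lambda shape: [[0.0] * shape[1]
                                       for _ in range(shape[0])])
        return [make((batch_size, self.size))]

    def forward(self, inputs, states):
        # Add the new input to the previous state, row by row.
        total = [[s + x for s, x in zip(srow, xrow)]
                 for srow, xrow in zip(states[0], inputs)]
        # The output and the new state coincide for this toy cell.
        return total, [total]


cell = RunningSumCell(size=2)
states = cell.begin_state(batch_size=1)
outputs = []
for step in ([[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]]):
    output, states = cell.forward(step, states)
    outputs.append(output)
```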