gluon.rnn¶
Built-in recurrent neural network layers are provided in the following two modules:
Recurrent neural network module. 

Contrib recurrent neural network module. 
Recurrent Cells¶
Long Short-Term Memory (LSTM) network cell. 

Gated Recurrent Unit (GRU) network cell. 

Abstract base class for RNN cells 

Sequentially stacking multiple RNN cells. 

Bidirectional RNN cell. 

Applies dropout on input. 

Applies Zoneout on base cell. 

Adds residual connection as described in Wu et al, 2016 (https://arxiv.org/abs/1609.08144). 
Recurrent Layers¶
Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence. 

Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence. 

Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence. 
API Reference¶
Recurrent neural network module.
Classes

Bidirectional RNN cell. 

Applies dropout on input. 

Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence. 

Gated Recurrent Unit (GRU) network cell. 
HybridRecurrentCell supports hybridize. 

Sequentially stacking multiple HybridRNN cells. 


Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence. 

Long Short-Term Memory (LSTM) network cell. 

Base class for modifier cells. 

Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence. 

Elman RNN recurrent neural network cell. 
Abstract base class for RNN cells 


Adds residual connection as described in Wu et al, 2016 (https://arxiv.org/abs/1609.08144). 
Sequentially stacking multiple RNN cells. 


Applies Zoneout on base cell. 

class
mxnet.gluon.rnn.
BidirectionalCell
(l_cell, r_cell)[source]¶ Bases:
mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell
Bidirectional RNN cell.
 Parameters
l_cell (RecurrentCell) – Cell for forward unrolling
r_cell (RecurrentCell) – Cell for backward unrolling
Methods
begin_state
(**kwargs)Initial state for this cell.
state_info
([batch_size])shape and layout information of states
unroll
(length, inputs[, begin_state, …])Unrolls an RNN cell across time steps.

begin_state
(**kwargs)[source]¶ Initial state for this cell.
 Parameters
func (callable, default symbol.zeros) –
Function for creating initial state.
For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.
For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.
batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.
**kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.
 Returns
states – Starting states for the first RNN step.
 Return type
nested list of Symbol

unroll
(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]¶ Unrolls an RNN cell across time steps.
 Parameters
length (int) – Number of steps to unroll.
inputs (Symbol, list of Symbol, or None) –
If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.
If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).
begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.
layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.
merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.
valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The i-th element will be the length of the i-th sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.
 Returns
outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().
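The unroll semantics above (step-by-step iteration, zero-masking of padded outputs, and carrying the last valid state) can be sketched in plain NumPy. This is an illustrative simplification, not Gluon's implementation; `step_fn` is a hypothetical single-state step function standing in for a cell's forward pass:

```python
import numpy as np

def unroll_sketch(step_fn, inputs, begin_state, valid_length=None):
    """Unroll a single-state cell over inputs of shape (length, batch_size, ...)
    ('TNC' layout). Padded outputs are masked with 0 and the last valid state
    is kept, mirroring the valid_length semantics described above."""
    state = begin_state
    outputs = []
    for t, x_t in enumerate(inputs):
        out, new_state = step_fn(x_t, state)
        if valid_length is not None:
            alive = (t < valid_length)[:, None]            # (batch_size, 1) mask
            out = np.where(alive, out, 0.0)                # zero the padded outputs
            new_state = np.where(alive, new_state, state)  # freeze finished sequences
        state = new_state
        outputs.append(out)
    return np.stack(outputs), state
```

With a toy accumulator cell and valid_length=[4, 2], the second sequence's state stops updating after step 2 and its later outputs come back as zeros.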

class
mxnet.gluon.rnn.
DropoutCell
(rate, axes=())[source]¶ Bases:
mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell
Applies dropout on input.
 Parameters
rate (float) – Percentage of elements to drop out, which is 1 - percentage to retain.
axes (tuple of int, default ()) – The axes on which dropout mask is shared. If empty, regular dropout is applied.
Methods
hybrid_forward
(F, inputs, states)Overrides to construct symbolic graph for this Block.
state_info
([batch_size])shape and layout information of states
unroll
(length, inputs[, begin_state, …])Unrolls an RNN cell across time steps.
 Inputs:
data: input tensor with shape (batch_size, size).
states: a list of recurrent state tensors.
 Outputs:
out: output tensor with shape (batch_size, size).
next_states: returns input states directly.
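To illustrate what sharing the dropout mask along axes means, here is a minimal inverted-dropout sketch in plain NumPy. `dropout_sketch` is a hypothetical helper, not Gluon's implementation:

```python
import numpy as np

def dropout_sketch(x, rate, axes=(), rng=None):
    """Inverted dropout: drop a fraction `rate` of elements and rescale the
    survivors by 1/(1-rate). The mask has size 1 along `axes`, so the same
    drop pattern is broadcast (shared) across those dimensions."""
    if rate == 0:
        return x
    rng = rng or np.random.default_rng(0)
    mask_shape = tuple(1 if ax in axes else d for ax, d in enumerate(x.shape))
    mask = (rng.random(mask_shape) >= rate) / (1.0 - rate)
    return x * mask
```

With axes=(0,), every row of the input sees the identical mask, which is how a dropout mask can be held fixed across time steps.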

unroll
(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]¶ Unrolls an RNN cell across time steps.
 Parameters
length (int) – Number of steps to unroll.
inputs (Symbol, list of Symbol, or None) –
If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.
If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).
begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.
layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.
merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.
valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The i-th element will be the length of the i-th sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.
 Returns
outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

class
mxnet.gluon.rnn.
GRU
(hidden_size, num_layers=1, layout='TNC', dropout=0, bidirectional=False, input_size=0, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', dtype='float32', **kwargs)[source]¶ Bases:
mxnet.gluon.rnn.rnn_layer._RNNLayer
Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence. Note: this is an implementation of the cuDNN version of GRUs (slight modification compared to Cho et al. 2014; the reset gate \(r_t\) is applied after matrix multiplication).
For each element in the input sequence, each layer computes the following function:
\[\begin{split}\begin{array}{ll} r_t = \mathrm{sigmoid}(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\ i_t = \mathrm{sigmoid}(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)} + b_{hn})) \\ h_t = (1 - i_t) * n_t + i_t * h_{(t-1)} \\ \end{array}\end{split}\]
where \(h_t\) is the hidden state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(r_t\), \(i_t\), \(n_t\) are the reset, input, and new gates, respectively.
 Parameters
hidden_size (int) – The number of features in the hidden state h
num_layers (int, default 1) – Number of recurrent layers.
layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.
dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer.
bidirectional (bool, default False) – If True, becomes a bidirectional RNN.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
i2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
dtype (str, default 'float32') – Type to initialize the parameters and default states to
input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.
 Inputs:
data: input tensor with shape (sequence_length, batch_size, input_size) when layout is “TNC”. For other layouts, dimensions are permuted accordingly using transpose() operator which adds performance overhead. Consider creating batches in TNC layout during data batching step.
states: initial recurrent state tensor with shape (num_layers, batch_size, num_hidden). If bidirectional is True, shape will instead be (2*num_layers, batch_size, num_hidden). If states is None, zeros will be used as default begin states.
 Outputs:
out: output tensor with shape (sequence_length, batch_size, num_hidden) when layout is “TNC”. If bidirectional is True, output shape will instead be (sequence_length, batch_size, 2*num_hidden)
out_states: output recurrent state tensor with the same shape as states. If states is None out_states will not be returned.
Examples
>>> layer = mx.gluon.rnn.GRU(100, 3)
>>> layer.initialize()
>>> input = mx.nd.random.uniform(shape=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.nd.random.uniform(shape=(3, 3, 100))
>>> output, hn = layer(input, h0)

class
mxnet.gluon.rnn.
GRUCell
(hidden_size, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, activation='tanh', recurrent_activation='sigmoid')[source]¶ Bases:
mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell
Gated Recurrent Unit (GRU) network cell. Note: this is an implementation of the cuDNN version of GRUs (slight modification compared to Cho et al. 2014; the reset gate \(r_t\) is applied after matrix multiplication).
Each call computes the following function:
\[\begin{split}\begin{array}{ll} r_t = \mathrm{sigmoid}(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\ i_t = \mathrm{sigmoid}(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)} + b_{hn})) \\ h_t = (1 - i_t) * n_t + i_t * h_{(t-1)} \\ \end{array}\end{split}\]
Methods
hybrid_forward
(F, inputs, states, …)Overrides to construct symbolic graph for this Block.
state_info
([batch_size])shape and layout information of states
where \(h_t\) is the hidden state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(r_t\), \(i_t\), \(n_t\) are the reset, input, and new gates, respectively.
 Parameters
hidden_size (int) – Number of units in output symbol.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.
h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.
input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.
activation (str, default 'tanh') – Activation type to use. See nd/symbol Activation for supported types.
recurrent_activation (str, default 'sigmoid') – Activation type to use for the recurrent step. See nd/symbol Activation for supported types.
 Inputs:
data: input tensor with shape (batch_size, input_size).
states: a list of one initial recurrent state tensor with shape (batch_size, num_hidden).
 Outputs:
out: output tensor with shape (batch_size, num_hidden).
next_states: a list of one output recurrent state tensor with the same shape as states.
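The equations above can be sketched as a single step in plain NumPy. This is an illustration, assuming a stacked [reset; input; new] weight layout for compactness rather than GRUCell's actual parameter layout:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, w_i2h, b_i2h, w_h2h, b_h2h):
    """One GRU step. Weights are stacked as [reset; input; new] blocks of
    shape (3*num_hidden, ...). The reset gate multiplies the recurrent term
    *after* the matrix multiplication, as in the cuDNN variant."""
    gi = x @ w_i2h.T + b_i2h        # input-to-hidden contributions, all 3 gates
    gh = h @ w_h2h.T + b_h2h        # hidden-to-hidden contributions
    i_r, i_i, i_n = np.split(gi, 3, axis=-1)
    h_r, h_i, h_n = np.split(gh, 3, axis=-1)
    r = sigmoid(i_r + h_r)          # reset gate r_t
    i = sigmoid(i_i + h_i)          # input (update) gate i_t
    n = np.tanh(i_n + r * h_n)      # new gate n_t
    return (1 - i) * n + i * h      # h_t
```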

class
mxnet.gluon.rnn.
HybridRecurrentCell
[source]¶ Bases:
mxnet.gluon.rnn.rnn_cell.RecurrentCell
,mxnet.gluon.block.HybridBlock
HybridRecurrentCell supports hybridize.
Methods
hybrid_forward
(F, x, *args, **kwargs)Overrides to construct symbolic graph for this Block.

class
mxnet.gluon.rnn.
HybridSequentialRNNCell
[source]¶ Bases:
mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell
Sequentially stacking multiple HybridRNN cells.
Methods
add
(cell)Appends a cell into the stack.
begin_state
(**kwargs)Initial state for this cell.
hybrid_forward
(F, inputs, states)Overrides to construct symbolic graph for this Block.
state_info
([batch_size])shape and layout information of states
unroll
(length, inputs[, begin_state, …])Unrolls an RNN cell across time steps.

add
(cell)[source]¶ Appends a cell into the stack.
 Parameters
cell (RecurrentCell) – The cell to add.

begin_state
(**kwargs)[source]¶ Initial state for this cell.
 Parameters
func (callable, default symbol.zeros) –
Function for creating initial state.
For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.
For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.
batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.
**kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.
 Returns
states – Starting states for the first RNN step.
 Return type
nested list of Symbol

unroll
(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]¶ Unrolls an RNN cell across time steps.
 Parameters
length (int) – Number of steps to unroll.
inputs (Symbol, list of Symbol, or None) –
If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.
If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).
begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.
layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.
merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.
valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The i-th element will be the length of the i-th sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.
 Returns
outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().


class
mxnet.gluon.rnn.
LSTM
(hidden_size, num_layers=1, layout='TNC', dropout=0, bidirectional=False, input_size=0, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', projection_size=None, h2r_weight_initializer=None, state_clip_min=None, state_clip_max=None, state_clip_nan=False, dtype='float32', **kwargs)[source]¶ Bases:
mxnet.gluon.rnn.rnn_layer._RNNLayer
Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.
For each element in the input sequence, each layer computes the following function:
\[\begin{split}\begin{array}{ll} i_t = \mathrm{sigmoid}(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ f_t = \mathrm{sigmoid}(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{(t-1)} + b_{hg}) \\ o_t = \mathrm{sigmoid}(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\ c_t = f_t * c_{(t-1)} + i_t * g_t \\ h_t = o_t * \tanh(c_t) \end{array}\end{split}\]
where \(h_t\) is the hidden state at time t, \(c_t\) is the cell state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(i_t\), \(f_t\), \(g_t\), \(o_t\) are the input, forget, cell, and out gates, respectively.
 Parameters
hidden_size (int) – The number of features in the hidden state h.
num_layers (int, default 1) – Number of recurrent layers.
layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.
dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer.
bidirectional (bool, default False) – If True, becomes a bidirectional RNN.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
i2h_bias_initializer (str or Initializer, default 'lstmbias') – Initializer for the bias vector. By default, bias for the forget gate is initialized to 1 while all other biases are initialized to zero.
h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
projection_size (int, default None) – The number of features after projection.
h2r_weight_initializer (str or Initializer, default None) – Initializer for the projected recurrent weights matrix, used for the linear transformation of the recurrent state to the projected space.
state_clip_min (float or None, default None) – Minimum clip value of LSTM states. This option must be used together with state_clip_max. If None, clipping is not applied.
state_clip_max (float or None, default None) – Maximum clip value of LSTM states. This option must be used together with state_clip_min. If None, clipping is not applied.
state_clip_nan (boolean, default False) – Whether to stop NaN from propagating in state by clipping it to min/max. If the clipping range is not specified, this option is ignored.
dtype (str, default 'float32') – Type to initialize the parameters and default states to
input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.
 Inputs:
data: input tensor with shape (sequence_length, batch_size, input_size) when layout is “TNC”. For other layouts, dimensions are permuted accordingly using transpose() operator which adds performance overhead. Consider creating batches in TNC layout during data batching step.
states: a list of two initial recurrent state tensors. Each has shape (num_layers, batch_size, num_hidden). If bidirectional is True, shape will instead be (2*num_layers, batch_size, num_hidden). If states is None, zeros will be used as default begin states.
 Outputs:
out: output tensor with shape (sequence_length, batch_size, num_hidden) when layout is “TNC”. If bidirectional is True, output shape will instead be (sequence_length, batch_size, 2*num_hidden)
out_states: a list of two output recurrent state tensors with the same shape as in states. If states is None out_states will not be returned.
Examples
>>> layer = mx.gluon.rnn.LSTM(100, 3)
>>> layer.initialize()
>>> input = mx.nd.random.uniform(shape=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.nd.random.uniform(shape=(3, 3, 100))
>>> c0 = mx.nd.random.uniform(shape=(3, 3, 100))
>>> output, hn = layer(input, [h0, c0])
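The LSTM equations above can also be sketched as a single step in plain NumPy (an illustration assuming gates stacked as [input; forget; cell; output], not the cuDNN-backed implementation this layer actually uses):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, w_i2h, b_i2h, w_h2h, b_h2h):
    """One LSTM step with gates stacked as [input; forget; cell; output]
    blocks of shape (4*num_hidden, ...)."""
    gates = x @ w_i2h.T + b_i2h + h @ w_h2h.T + b_h2h
    i, f, g, o = np.split(gates, 4, axis=-1)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_next = f * c + i * g          # cell state c_t
    h_next = o * np.tanh(c_next)    # hidden state h_t
    return h_next, c_next
```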

class
mxnet.gluon.rnn.
LSTMCell
(hidden_size, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, activation='tanh', recurrent_activation='sigmoid')[source]¶ Bases:
mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell
Long Short-Term Memory (LSTM) network cell.
Each call computes the following function:
\[\begin{split}\begin{array}{ll} i_t = \mathrm{sigmoid}(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ f_t = \mathrm{sigmoid}(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{(t-1)} + b_{hg}) \\ o_t = \mathrm{sigmoid}(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\ c_t = f_t * c_{(t-1)} + i_t * g_t \\ h_t = o_t * \tanh(c_t) \end{array}\end{split}\]
Methods
hybrid_forward
(F, inputs, states, …)Overrides to construct symbolic graph for this Block.
state_info
([batch_size])shape and layout information of states
where \(h_t\) is the hidden state at time t, \(c_t\) is the cell state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(i_t\), \(f_t\), \(g_t\), \(o_t\) are the input, forget, cell, and out gates, respectively.
 Parameters
hidden_size (int) – Number of units in output symbol.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.
h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.
input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.
activation (str, default 'tanh') – Activation type to use. See nd/symbol Activation for supported types.
recurrent_activation (str, default 'sigmoid') – Activation type to use for the recurrent step. See nd/symbol Activation for supported types.
 Inputs:
data: input tensor with shape (batch_size, input_size).
states: a list of two initial recurrent state tensors. Each has shape (batch_size, num_hidden).
 Outputs:
out: output tensor with shape (batch_size, num_hidden).
next_states: a list of two output recurrent state tensors. Each has the same shape as states.

class
mxnet.gluon.rnn.
ModifierCell
(base_cell)[source]¶ Bases:
mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell
Base class for modifier cells. A modifier cell takes a base cell, applies modifications to it (e.g. Zoneout), and returns a new cell.
After applying modifiers, the base cell should no longer be called directly; the modifier cell should be used instead.
Methods
begin_state
([func])Initial state for this cell.
hybrid_forward
(F, inputs, states)Overrides to construct symbolic graph for this Block.
state_info
([batch_size])shape and layout information of states
Attributes
params
Returns this Block’s parameter dictionary (does not include its children’s parameters).
begin_state
(func=<function zeros>, **kwargs)[source]¶ Initial state for this cell.
 Parameters
func (callable, default symbol.zeros) –
Function for creating initial state.
For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.
For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.
batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.
**kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.
 Returns
states – Starting states for the first RNN step.
 Return type
nested list of Symbol

property
params
¶ Returns this Block’s parameter dictionary (does not include its children’s parameters).


class
mxnet.gluon.rnn.
RNN
(hidden_size, num_layers=1, activation='relu', layout='TNC', dropout=0, bidirectional=False, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, dtype='float32', **kwargs)[source]¶ Bases:
mxnet.gluon.rnn.rnn_layer._RNNLayer
Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.
For each element in the input sequence, each layer computes the following function:
\[h_t = \tanh(w_{ih} * x_t + b_{ih} + w_{hh} * h_{(t-1)} + b_{hh})\]
where \(h_t\) is the hidden state at time t, and \(x_t\) is the output of the previous layer at time t or \(input_t\) for the first layer. If nonlinearity=’relu’, then ReLU is used instead of tanh.
 Parameters
hidden_size (int) – The number of features in the hidden state h.
num_layers (int, default 1) – Number of recurrent layers.
activation ({'relu' or 'tanh'}, default 'relu') – The activation function to use.
layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.
dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer.
bidirectional (bool, default False) – If True, becomes a bidirectional RNN.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
i2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.
dtype (str, default 'float32') – Type to initialize the parameters and default states to
 Inputs:
data: input tensor with shape (sequence_length, batch_size, input_size) when layout is “TNC”. For other layouts, dimensions are permuted accordingly using transpose() operator which adds performance overhead. Consider creating batches in TNC layout during data batching step.
states: initial recurrent state tensor with shape (num_layers, batch_size, num_hidden). If bidirectional is True, shape will instead be (2*num_layers, batch_size, num_hidden). If states is None, zeros will be used as default begin states.
 Outputs:
out: output tensor with shape (sequence_length, batch_size, num_hidden) when layout is “TNC”. If bidirectional is True, output shape will instead be (sequence_length, batch_size, 2*num_hidden)
out_states: output recurrent state tensor with the same shape as states. If states is None out_states will not be returned.
Examples
>>> layer = mx.gluon.rnn.RNN(100, 3)
>>> layer.initialize()
>>> input = mx.nd.random.uniform(shape=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.nd.random.uniform(shape=(3, 3, 100))
>>> output, hn = layer(input, h0)
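The Elman recurrence above amounts to a one-line step. Here is a plain-NumPy sketch (illustrative only, not the fused implementation used by this layer):

```python
import numpy as np

def elman_step(x, h, w_ih, b_ih, w_hh, b_hh, activation=np.tanh):
    """One Elman RNN step: h_t = act(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh).
    Pass np.tanh, or a ReLU such as lambda z: np.maximum(z, 0)."""
    return activation(x @ w_ih.T + b_ih + h @ w_hh.T + b_hh)
```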

class
mxnet.gluon.rnn.
RNNCell
(hidden_size, activation='tanh', i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0)[source]¶ Bases:
mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell
Elman RNN recurrent neural network cell.
Each call computes the following function:
\[h_t = \tanh(w_{ih} * x_t + b_{ih} + w_{hh} * h_{(t-1)} + b_{hh})\]
Methods
hybrid_forward
(F, inputs, states, …)Overrides to construct symbolic graph for this Block.
state_info
([batch_size])shape and layout information of states
where \(h_t\) is the hidden state at time t, and \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer. If nonlinearity=’relu’, then ReLU is used instead of tanh.
 Parameters
hidden_size (int) – Number of units in output symbol
activation (str or Symbol, default 'tanh') – Type of activation function.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.
h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.
input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.
 Inputs:
data: input tensor with shape (batch_size, input_size).
states: a list of one initial recurrent state tensor with shape (batch_size, num_hidden).
 Outputs:
out: output tensor with shape (batch_size, num_hidden).
next_states: a list of one output recurrent state tensor with the same shape as states.

class
mxnet.gluon.rnn.
RecurrentCell
[source]¶ Bases:
mxnet.gluon.block.Block
Abstract base class for RNN cells
Methods
begin_state
([batch_size, func])Initial state for this cell.
forward
(inputs, states)Unrolls the recurrent cell for one time step.
reset
()Reset before reusing the cell for another graph.
state_info
([batch_size])shape and layout information of states
unroll
(length, inputs[, begin_state, …])Unrolls an RNN cell across time steps.

begin_state
(batch_size=0, func=<function zeros>, **kwargs)[source]¶ Initial state for this cell.
 Parameters
func (callable, default symbol.zeros) –
Function for creating initial state.
For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.
For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.
batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.
**kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.
 Returns
states – Starting states for the first RNN step.
 Return type
nested list of Symbol

forward
(inputs, states)[source]¶ Unrolls the recurrent cell for one time step.
 Parameters
inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size * num_units).
states (list of sym.Variable) – RNN state from previous step or the output of begin_state().
 Returns
output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.
See also
begin_state()
This function can provide the states for the first time step.
unroll()
This function unrolls an RNN for a given number of (>=1) time steps.

unroll
(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]¶ Unrolls an RNN cell across time steps.
 Parameters
length (int) – Number of steps to unroll.
inputs (Symbol, list of Symbol, or None) –
If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.
If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).
begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.
layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.
merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates the outputs across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, outputs whichever format is faster.
valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The i-th element will be the length of the i-th sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be less than or equal to length.
 Returns
outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is the same as the output of begin_state().


class
mxnet.gluon.rnn.
ResidualCell
(base_cell)[source]¶ Bases:
mxnet.gluon.rnn.rnn_cell.ModifierCell
Adds a residual connection as described in Wu et al., 2016 (https://arxiv.org/abs/1609.08144). The output of the cell is the output of the base cell plus its input.
Methods
hybrid_forward
(F, inputs, states)Overrides to construct symbolic graph for this Block.
unroll
(length, inputs[, begin_state, …])Unrolls an RNN cell across time steps.

unroll
(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]¶ Unrolls an RNN cell across time steps.
 Parameters
length (int) – Number of steps to unroll.
inputs (Symbol, list of Symbol, or None) –
If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.
If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).
begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.
layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.
merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates the outputs across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, outputs whichever format is faster.
valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The i-th element will be the length of the i-th sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be less than or equal to length.
 Returns
outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is the same as the output of begin_state().


class
mxnet.gluon.rnn.
SequentialRNNCell
[source]¶ Bases:
mxnet.gluon.rnn.rnn_cell.RecurrentCell
Sequentially stacking multiple RNN cells.
Methods
add
(cell)Appends a cell to the stack.
begin_state
(**kwargs)Initial state for this cell.
state_info
([batch_size])shape and layout information of states
unroll
(length, inputs[, begin_state, …])Unrolls an RNN cell across time steps.

add
(cell)[source]¶ Appends a cell to the stack.
 Parameters
cell (RecurrentCell) – The cell to add.

begin_state
(**kwargs)[source]¶ Initial state for this cell.
 Parameters
func (callable, default symbol.zeros) –
Function for creating initial state.
For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.
For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.
batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.
**kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.
 Returns
states – Starting states for the first RNN step.
 Return type
nested list of Symbol

unroll
(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]¶ Unrolls an RNN cell across time steps.
 Parameters
length (int) – Number of steps to unroll.
inputs (Symbol, list of Symbol, or None) –
If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.
If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).
begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.
layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.
merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates the outputs across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, outputs whichever format is faster.
valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The i-th element will be the length of the i-th sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be less than or equal to length.
 Returns
outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is the same as the output of begin_state().


class
mxnet.gluon.rnn.
ZoneoutCell
(base_cell, zoneout_outputs=0.0, zoneout_states=0.0)[source]¶ Bases:
mxnet.gluon.rnn.rnn_cell.ModifierCell
Applies Zoneout on base cell.
Methods
hybrid_forward
(F, inputs, states)Overrides to construct symbolic graph for this Block.
reset
()Reset before reusing the cell for another graph.