mx.symbol.RNN
Description
Applies recurrent layers to input data. Currently, vanilla RNN, LSTM and GRU are implemented, with both multi-layer and bidirectional support.
When the input data is of type float32 and the environment variables MXNET_CUDA_ALLOW_TENSOR_CORE and MXNET_CUDA_TENSOR_OP_MATH_ALLOW_CONVERSION are set to 1, this operator will try to use pseudo-float16 precision (float32 math with float16 I/O) in order to use Tensor Cores on suitable NVIDIA GPUs. This can sometimes give significant speedups.
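For example, assuming a CUDA-enabled build, the variables can be set from R before any GPU work is done (a minimal sketch):

```r
# Allow Tensor Core use for float32 RNNs; must be set before the
# CUDA context is created, e.g. at the top of the script.
Sys.setenv(MXNET_CUDA_ALLOW_TENSOR_CORE = "1",
           MXNET_CUDA_TENSOR_OP_MATH_ALLOW_CONVERSION = "1")
```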
Vanilla RNN
Applies a single-gate recurrent layer to input X. Two kinds of activation function are supported: ReLU and Tanh.
With ReLU activation function:

$$h_t = \mathrm{ReLU}(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh})$$

With Tanh activation function:

$$h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh})$$
Reference paper: Finding Structure in Time - Elman, 1990. https://crl.ucsd.edu/~elman/Papers/fsit.pdf
LSTM
Long Short-Term Memory - Hochreiter & Schmidhuber, 1997. http://www.bioinf.jku.at/publications/older/2604.pdf
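In the same notation as above, the cuDNN-compatible LSTM cell computes the following, where $\sigma$ is the logistic sigmoid and $\circ$ denotes elementwise multiplication:

$$
\begin{aligned}
i_t &= \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\
f_t &= \sigma(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\
g_t &= \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{(t-1)} + b_{hg}) \\
o_t &= \sigma(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\
c_t &= f_t \circ c_{(t-1)} + i_t \circ g_t \\
h_t &= o_t \circ \tanh(c_t)
\end{aligned}
$$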
When a projection size is set, the LSTM uses a projection layer to reduce the parameter count, which can give some speedup without significant loss of accuracy (see the sketch below).
Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition - Sak et al. 2014. https://arxiv.org/abs/1402.1128
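A minimal sketch of constructing such a projected LSTM; the variable names and sizes are illustrative, not prescribed by the API, and the keyword arguments follow the operator's parameter names:

```r
library(mxnet)

data   <- mx.symbol.Variable("data")        # (seq_len, batch, input_dim)
params <- mx.symbol.Variable("lstm_params") # all weights/biases, concatenated
h0     <- mx.symbol.Variable("lstm_h0")     # initial hidden state
c0     <- mx.symbol.Variable("lstm_c0")     # initial cell state

proj_lstm <- mx.symbol.RNN(data = data, parameters = params,
                           state = h0, state_cell = c0,
                           state_size = 1024L, num_layers = 1L,
                           mode = "lstm",
                           projection_size = 512L)  # project 1024 -> 512
```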
GRU
Gated Recurrent Unit - Cho et al. 2014. http://arxiv.org/abs/1406.1078
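In the cuDNN-compatible formulation used here, the GRU cell computes:

$$
\begin{aligned}
r_t &= \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\
z_t &= \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{(t-1)} + b_{hz}) \\
n_t &= \tanh(W_{in} x_t + b_{in} + r_t \circ (W_{hn} h_{(t-1)} + b_{hn})) \\
h_t &= (1 - z_t) \circ n_t + z_t \circ h_{(t-1)}
\end{aligned}
$$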
The definition of the GRU here is slightly different from the paper, but compatible with cuDNN.
Usage
```r
mx.symbol.RNN(...)
```
Arguments

| Argument | Description |
|---|---|
| data | NDArray-or-Symbol. Input data to the RNN. |
| parameters | NDArray-or-Symbol. Vector of all RNN trainable parameters, concatenated. |
| state | NDArray-or-Symbol. Initial hidden state of the RNN. |
| state_cell | NDArray-or-Symbol. Initial cell state (only for LSTM). |
| sequence_length | NDArray-or-Symbol. Vector of valid sequence lengths for each element in the batch. (Only used if use_sequence_length is true.) |
| state_size | int (non-negative), required. Size of the state for each layer. |
| num_layers | int (non-negative), required. Number of stacked layers. |
| bidirectional | boolean, optional, default=0. Whether to use bidirectional recurrent layers. |
| mode | {'gru', 'lstm', 'rnn_relu', 'rnn_tanh'}, required. The type of RNN to compute. |
| p | float, optional, default=0. Drop rate of the dropout on the outputs of each RNN layer, except the last layer. |
| state_outputs | boolean, optional, default=0. Whether to have the states as symbol outputs. |
| projection_size | int or None, optional, default=None. Size of the LSTM projection. |
| lstm_state_clip_min | double or None, optional, default=None. Minimum clip value of LSTM states. Must be used together with lstm_state_clip_max. |
| lstm_state_clip_max | double or None, optional, default=None. Maximum clip value of LSTM states. Must be used together with lstm_state_clip_min. |
| lstm_state_clip_nan | boolean, optional, default=0. Whether to stop NaN from propagating in the state by clipping it to min/max. Ignored if the clipping range is not specified. |
| use_sequence_length | boolean, optional, default=0. If set to true, this layer takes an extra input sequence_length to specify variable-length sequences. |
| name | string, optional. Name of the resulting symbol. |
Value

out: The result mx.symbol.
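As an end-to-end illustration, a two-layer GRU symbol might be constructed as follows (a hedged sketch; the shapes and variable names are assumptions, not API requirements):

```r
library(mxnet)

data   <- mx.symbol.Variable("data")       # (seq_len, batch, input_dim)
params <- mx.symbol.Variable("gru_params") # all layers' parameters, concatenated
state  <- mx.symbol.Variable("gru_state")  # (num_layers, batch, state_size)

gru <- mx.symbol.RNN(data = data,
                     parameters = params,
                     state = state,
                     state_size = 256L,
                     num_layers = 2L,
                     mode = "gru",
                     p = 0.2,               # dropout between stacked layers
                     state_outputs = TRUE,  # also emit the final hidden state
                     name = "gru")
```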
Link to Source Code: http://github.com/apache/incubator-mxnet/blob/1.6.0/src/operator/rnn.cc#L377
