gluon.rnn

Built-in recurrent neural network layers are provided in the following module:

mxnet.gluon.rnn

Recurrent neural network module.

Recurrent Cells

rnn.LSTMCell

Long Short-Term Memory (LSTM) network cell.

rnn.GRUCell

Gated Recurrent Unit (GRU) network cell.

rnn.RecurrentCell

Abstract base class for RNN cells

rnn.LSTMPCell

Long Short-Term Memory Projected (LSTMP) network cell.

rnn.SequentialRNNCell

Sequentially stacking multiple RNN cells.

rnn.BidirectionalCell

Bidirectional RNN cell.

rnn.DropoutCell

Applies dropout on input.

rnn.VariationalDropoutCell

Applies Variational Dropout on base cell.

rnn.ZoneoutCell

Applies Zoneout on base cell.

rnn.ResidualCell

Adds a residual connection as described in Wu et al., 2016 (https://arxiv.org/abs/1609.08144).

Convolutional Recurrent Cells

rnn.Conv1DLSTMCell

1D Convolutional LSTM network cell.

rnn.Conv2DLSTMCell

2D Convolutional LSTM network cell.

rnn.Conv3DLSTMCell

3D Convolutional LSTM network cell.

rnn.Conv1DGRUCell

1D Convolutional Gated Recurrent Unit (GRU) network cell.

rnn.Conv2DGRUCell

2D Convolutional Gated Recurrent Unit (GRU) network cell.

rnn.Conv3DGRUCell

3D Convolutional Gated Recurrent Unit (GRU) network cell.

rnn.Conv1DRNNCell

1D Convolutional RNN cell.

rnn.Conv2DRNNCell

2D Convolutional RNN cell.

rnn.Conv3DRNNCell

3D Convolutional RNN cell.

Recurrent Layers

rnn.RNN

Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.

rnn.LSTM

Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.

rnn.GRU

Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence.
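For example, an LSTM layer can be created and applied to a batch of sequences as follows. This is a minimal sketch; the sizes and the use of MXNet 2.x numpy semantics via npx.set_np() are illustrative assumptions, not part of the reference below.

from mxnet import np, npx
from mxnet.gluon import rnn

npx.set_np()

# Two-layer LSTM; layout 'TNC' means (sequence length, batch size, features).
layer = rnn.LSTM(hidden_size=100, num_layers=2, layout='TNC')
layer.initialize()

x = np.random.uniform(size=(35, 8, 20))      # 35 steps, batch of 8, 20 features
states = layer.begin_state(batch_size=8)
output, new_states = layer(x, states)        # output has shape (35, 8, 100)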

API Reference

Recurrent neural network module.

Classes

BidirectionalCell(l_cell, r_cell)

Bidirectional RNN cell.

Conv1DGRUCell(input_shape, hidden_channels, …)

1D Convolutional Gated Recurrent Unit (GRU) network cell.

Conv1DLSTMCell(input_shape, hidden_channels, …)

1D Convolutional LSTM network cell.

Conv1DRNNCell(input_shape, hidden_channels, …)

1D Convolutional RNN cell.

Conv2DGRUCell(input_shape, hidden_channels, …)

2D Convolutional Gated Recurrent Unit (GRU) network cell.

Conv2DLSTMCell(input_shape, hidden_channels, …)

2D Convolutional LSTM network cell.

Conv2DRNNCell(input_shape, hidden_channels, …)

2D Convolutional RNN cell.

Conv3DGRUCell(input_shape, hidden_channels, …)

3D Convolutional Gated Recurrent Unit (GRU) network cell.

Conv3DLSTMCell(input_shape, hidden_channels, …)

3D Convolutional LSTM network cell.

Conv3DRNNCell(input_shape, hidden_channels, …)

3D Convolutional RNN cell.

DropoutCell(rate[, axes])

Applies dropout on input.

GRU(hidden_size[, num_layers, layout, …])

Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence.

GRUCell(hidden_size[, …])

Gated Recurrent Unit (GRU) network cell.

HybridRecurrentCell()

HybridRecurrentCell supports hybridization.

HybridSequentialRNNCell()

Sequentially stacking multiple HybridRNN cells.

LSTM(hidden_size[, num_layers, layout, …])

Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.

LSTMCell(hidden_size[, …])

Long Short-Term Memory (LSTM) network cell.

LSTMPCell(hidden_size, projection_size[, …])

Long Short-Term Memory Projected (LSTMP) network cell.

ModifierCell(base_cell)

Base class for modifier cells.

RNN(hidden_size[, num_layers, activation, …])

Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.

RNNCell(hidden_size[, activation, …])

Elman RNN recurrent neural network cell.

RecurrentCell()

Abstract base class for RNN cells

ResidualCell(base_cell)

Adds a residual connection as described in Wu et al., 2016 (https://arxiv.org/abs/1609.08144).

SequentialRNNCell()

Sequentially stacking multiple RNN cells.

VariationalDropoutCell(base_cell[, …])

Applies Variational Dropout on base cell.

ZoneoutCell(base_cell[, zoneout_outputs, …])

Applies Zoneout on base cell.

class BidirectionalCell(l_cell, r_cell)[source]

Bases: mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell

Bidirectional RNN cell.
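A minimal usage sketch; the cell sizes and input shapes are illustrative assumptions, and shape inference during unroll is assumed to follow MXNet 2.x numpy semantics:

from mxnet import np, npx
from mxnet.gluon import rnn

npx.set_np()

# Wrap a forward and a backward LSTMCell; their outputs are concatenated per step.
bi_cell = rnn.BidirectionalCell(rnn.LSTMCell(25), rnn.LSTMCell(25))
bi_cell.initialize()

inputs = np.random.uniform(size=(8, 10, 30))   # (batch, length, features), layout 'NTC'
outputs, states = bi_cell.unroll(10, inputs, layout='NTC', merge_outputs=True)
# outputs has shape (8, 10, 50): 25 forward + 25 backward units per step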

Parameters
  • l_cell (RecurrentCell) – Cell for forward unrolling.

  • r_cell (RecurrentCell) – Cell for backward unrolling.

Methods

apply(fn)

Applies fn recursively to every child block as well as self.

begin_state(**kwargs)

Initial state for this cell.

cast(dtype)

Cast this Block to use another data type.

collect_params([select])

Returns a Dict containing this Block's and all of its children's Parameters (by default), or only those Parameters whose names match the given regular expressions.

export(path[, epoch, remove_amp_cast])

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

hybridize([active])

Please refer to the description of HybridBlock.hybridize().

infer_shape(i, x, is_bidirect)

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize([init, device, verbose, force_reinit])

Initializes Parameters of this Block and its children.

load(prefix)

Load a model saved using the save API

load_dict(param_dict[, device, …])

Load parameters from dict

load_parameters(filename[, device, …])

Load parameters from file previously saved by save_parameters.

optimize_for(x, *args[, backend, clear, …])

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.

register_child(block[, name])

Registers block as a child of self.

register_forward_hook(hook)

Registers a forward hook on the block.

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

register_op_hook(callback[, monitor_all])

Install callback monitor.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated.

reset_device(device)

Re-assign all Parameters to other devices.

save(prefix)

Save the model architecture and parameters to load again later

save_parameters(filename[, deduplicate])

Save parameters to file.

setattr(name, value)

Set an attribute to a new value for all Parameters.

share_parameters(shared)

Share parameters recursively inside the model.

state_info([batch_size])

Shape and layout information of states.

summary(*inputs)

Print the summary of the model’s output and parameters.

unroll(length, inputs[, begin_state, …])

Unrolls an RNN cell across time steps.

zero_grad()

Sets all Parameters’ gradient buffer to 0.

Attributes

params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters

fn (callable) – Function to be applied to each submodule, of form fn(block).

Returns

Return type

this block

begin_state(**kwargs)[source]

Initial state for this cell.

Parameters
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns

states – Starting states for the first RNN step.

Return type

nested list of Symbol
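For instance, zero-valued starting states for an LSTMCell can be created as follows (a sketch; the sizes are illustrative and MXNet 2.x numpy semantics are assumed):

from mxnet import npx
from mxnet.gluon import rnn

npx.set_np()

cell = rnn.LSTMCell(100)
cell.initialize()
states = cell.begin_state(batch_size=32)
# An LSTMCell keeps two state tensors (hidden state and cell state),
# each of shape (32, 100) here.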

cast(dtype)

Cast this Block to use another data type.

Parameters

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block's and all of its children's Parameters (by default), or only those Parameters whose names match the given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’ using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters

select (str) – Regular expressions to match parameter names.

Returns

Return type

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there is only one input, it will be named data. When there are multiple inputs, they will be named data0, data1, etc.

Parameters
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.
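A typical flow is to hybridize, run one forward pass to build the cached graph, and then export. The sketch below assumes MXNet 2.x numpy semantics; the file prefix 'lstm-model' and the sizes are arbitrary examples:

from mxnet import np, npx
from mxnet.gluon import rnn

npx.set_np()

net = rnn.LSTM(100)
net.initialize()
net.hybridize()
net(np.random.uniform(size=(35, 8, 20)))           # builds the cached graph
sym_file, params_file = net.export('lstm-model')   # writes lstm-model-symbol.json and lstm-model-0000.params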

hybridize(active=True, **kwargs)

Please refer to the description of HybridBlock.hybridize().

infer_shape(i, x, is_bidirect)[source]

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameters of this Block and its children.

Parameters
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – Must be ‘current’ or ‘saved’. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – Must be ‘current’ or ‘saved’. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass, without calling the CachedOp. Can be used in place of hybridize; afterwards, export can be called or inference can be run. See example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')

# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)

Parameters
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – clears any previous optimizations

  • partition_if_dynamic (bool, default True) – whether to partition the graph when a dynamic shape op exists

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)

Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle
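A sketch of a hook that logs output shapes; the network and sizes below are illustrative assumptions:

from mxnet import np, npx
from mxnet.gluon import rnn

npx.set_np()

net = rnn.LSTM(16)
net.initialize()

def log_output_shape(block, inputs, output):
    # Called right after forward(); must not modify inputs or output.
    print(type(block).__name__, getattr(output, 'shape', None))

handle = net.register_forward_hook(log_output_shape)
net(np.random.uniform(size=(5, 2, 8)))   # the hook fires after this call
handle.detach()                          # remove the hook when it is no longer needed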

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters

device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (children are stored in an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models
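A round-trip sketch for save_parameters and load_parameters (the filename and sizes are arbitrary examples; MXNet 2.x numpy semantics assumed):

from mxnet import np, npx
from mxnet.gluon import rnn

npx.set_np()

net = rnn.GRU(50)
net.initialize()
net(np.random.uniform(size=(10, 4, 20)))   # run once so parameter shapes are known

net.save_parameters('gru.params')

net2 = rnn.GRU(50)
net2.load_parameters('gru.params')          # restores the weights into a fresh layer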

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradients w.r.t. a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which is equivalent to:

dense1.weight = dense0.weight
dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters

shared (Dict) – Dict of the shared parameters.

Returns

Return type

this block

state_info(batch_size=0)[source]

Shape and layout information of states.

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]

Unrolls an RNN cell across time steps.

Parameters
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.

Returns

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().
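For example, an LSTMCell can be unrolled over padded sequences using valid_length (a sketch; the shapes are illustrative and the masking behaviour follows the description above):

from mxnet import np, npx
from mxnet.gluon import rnn

npx.set_np()

cell = rnn.LSTMCell(20)
cell.initialize()

x = np.random.uniform(size=(4, 6, 10))   # (batch, length, features), layout 'NTC'
valid_length = np.array([6, 4, 3, 5])    # actual lengths; padded steps are masked with 0
outputs, states = cell.unroll(6, x, layout='NTC',
                              merge_outputs=True, valid_length=valid_length)
# outputs has shape (4, 6, 20)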

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class Conv1DGRUCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, ), i2h_dilate=(1, ), h2h_dilate=(1, ), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCW', activation='tanh')[source]

Bases: mxnet.gluon.rnn.conv_rnn_cell._ConvGRUCell

1D Convolutional Gated Recurrent Unit (GRU) network cell.

\[\begin{split}\begin{array}{ll} r_t = \sigma(W_r \ast x_t + R_r \ast h_{t-1} + b_r) \\ z_t = \sigma(W_z \ast x_t + R_z \ast h_{t-1} + b_z) \\ n_t = \tanh(W_i \ast x_t + b_i + r_t \circ (R_n \ast h_{t-1} + b_n)) \\ h^\prime_t = (1 - z_t) \circ n_t + z_t \circ h_{t-1} \\ \end{array}\end{split}\]
Parameters
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCW’ the shape should be (C, W).

  • hidden_channels (int) – Number of output channels.

  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.

  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.

  • i2h_pad (int or tuple of int, default (0,)) – Pad for input convolution.

  • i2h_dilate (int or tuple of int, default (1,)) – Input convolution dilate.

  • h2h_dilate (int or tuple of int, default (1,)) – Recurrent convolution dilate.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.

  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.

  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.

  • conv_layout (str, default 'NCW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCW’ and ‘NWC’.

  • activation (str or gluon.Block, default 'tanh') – Type of activation function used in n_t. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.

class Conv1DLSTMCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, ), i2h_dilate=(1, ), h2h_dilate=(1, ), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCW', activation='tanh')[source]

Bases: mxnet.gluon.rnn.conv_rnn_cell._ConvLSTMCell

1D Convolutional LSTM network cell.

Based on the paper “Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting”, Xingjian et al., NIPS 2015.

\[\begin{split}\begin{array}{ll} i_t = \sigma(W_i \ast x_t + R_i \ast h_{t-1} + b_i) \\ f_t = \sigma(W_f \ast x_t + R_f \ast h_{t-1} + b_f) \\ o_t = \sigma(W_o \ast x_t + R_o \ast h_{t-1} + b_o) \\ c^\prime_t = tanh(W_c \ast x_t + R_c \ast h_{t-1} + b_c) \\ c_t = f_t \circ c_{t-1} + i_t \circ c^\prime_t \\ h_t = o_t \circ tanh(c_t) \\ \end{array}\end{split}\]
Parameters
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCW’ the shape should be (C, W).

  • hidden_channels (int) – Number of output channels.

  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.

  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.

  • i2h_pad (int or tuple of int, default (0,)) – Pad for input convolution.

  • i2h_dilate (int or tuple of int, default (1,)) – Input convolution dilate.

  • h2h_dilate (int or tuple of int, default (1,)) – Recurrent convolution dilate.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.

  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.

  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.

  • conv_layout (str, default 'NCW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCW’ and ‘NWC’.

  • activation (str or gluon.Block, default 'tanh') – Type of activation function used in c^prime_t. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.

class Conv1DRNNCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, ), i2h_dilate=(1, ), h2h_dilate=(1, ), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCW', activation='tanh')[source]

Bases: mxnet.gluon.rnn.conv_rnn_cell._ConvRNNCell

1D Convolutional RNN cell.

\[h_t = tanh(W_i \ast x_t + R_i \ast h_{t-1} + b_i)\]
Parameters
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCW’ the shape should be (C, W).

  • hidden_channels (int) – Number of output channels.

  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.

  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.

  • i2h_pad (int or tuple of int, default (0,)) – Pad for input convolution.

  • i2h_dilate (int or tuple of int, default (1,)) – Input convolution dilate.

  • h2h_dilate (int or tuple of int, default (1,)) – Recurrent convolution dilate.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.

  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.

  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.

  • conv_layout (str, default 'NCW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCW’ and ‘NWC’.

  • activation (str or gluon.Block, default 'tanh') – Type of activation function. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.

class Conv2DGRUCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0), i2h_dilate=(1, 1), h2h_dilate=(1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCHW', activation='tanh')[source]

Bases: mxnet.gluon.rnn.conv_rnn_cell._ConvGRUCell

2D Convolutional Gated Recurrent Unit (GRU) network cell.

\[\begin{split}\begin{array}{ll} r_t = \sigma(W_r \ast x_t + R_r \ast h_{t-1} + b_r) \\ z_t = \sigma(W_z \ast x_t + R_z \ast h_{t-1} + b_z) \\ n_t = \tanh(W_i \ast x_t + b_i + r_t \circ (R_n \ast h_{t-1} + b_n)) \\ h^\prime_t = (1 - z_t) \circ n_t + z_t \circ h_{t-1} \\ \end{array}\end{split}\]
Parameters
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCHW’ the shape should be (C, H, W).

  • hidden_channels (int) – Number of output channels.

  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.

  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.

  • i2h_pad (int or tuple of int, default (0, 0)) – Pad for input convolution.

  • i2h_dilate (int or tuple of int, default (1, 1)) – Input convolution dilate.

  • h2h_dilate (int or tuple of int, default (1, 1)) – Recurrent convolution dilate.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.

  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.

  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.

  • conv_layout (str, default 'NCHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCHW’ and ‘NHWC’.

  • activation (str or gluon.Block, default 'tanh') – Type of activation function used in n_t. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.

class Conv2DLSTMCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0), i2h_dilate=(1, 1), h2h_dilate=(1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCHW', activation='tanh')[source]

Bases: mxnet.gluon.rnn.conv_rnn_cell._ConvLSTMCell

2D Convolutional LSTM network cell.

Based on the paper “Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting”, Xingjian et al., NIPS 2015.

\[\begin{split}\begin{array}{ll} i_t = \sigma(W_i \ast x_t + R_i \ast h_{t-1} + b_i) \\ f_t = \sigma(W_f \ast x_t + R_f \ast h_{t-1} + b_f) \\ o_t = \sigma(W_o \ast x_t + R_o \ast h_{t-1} + b_o) \\ c^\prime_t = tanh(W_c \ast x_t + R_c \ast h_{t-1} + b_c) \\ c_t = f_t \circ c_{t-1} + i_t \circ c^\prime_t \\ h_t = o_t \circ tanh(c_t) \\ \end{array}\end{split}\]
Parameters
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCHW’ the shape should be (C, H, W).

  • hidden_channels (int) – Number of output channels.

  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.

  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.

  • i2h_pad (int or tuple of int, default (0, 0)) – Pad for input convolution.

  • i2h_dilate (int or tuple of int, default (1, 1)) – Input convolution dilate.

  • h2h_dilate (int or tuple of int, default (1, 1)) – Recurrent convolution dilate.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.

  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.

  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.

  • conv_layout (str, default 'NCHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCHW’ and ‘NHWC’.

  • activation (str or gluon.Block, default 'tanh') – Type of activation function used in c^prime_t. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.
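A single-step sketch for Conv2DLSTMCell; the channel counts, kernel sizes, and input shape are illustrative assumptions:

from mxnet import np, npx
from mxnet.gluon import rnn

npx.set_np()

# 3 input channels, 8 hidden channels, 3x3 kernels; padding keeps H and W at 16.
cell = rnn.Conv2DLSTMCell(input_shape=(3, 16, 16), hidden_channels=8,
                          i2h_kernel=(3, 3), h2h_kernel=(3, 3), i2h_pad=(1, 1))
cell.initialize()

x = np.random.uniform(size=(4, 3, 16, 16))   # (N, C, H, W) for conv_layout 'NCHW'
states = cell.begin_state(batch_size=4)
output, states = cell(x, states)             # output has shape (4, 8, 16, 16)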

class Conv2DRNNCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0), i2h_dilate=(1, 1), h2h_dilate=(1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCHW', activation='tanh')[source]

Bases: mxnet.gluon.rnn.conv_rnn_cell._ConvRNNCell

2D Convolutional RNN cell.

\[h_t = tanh(W_i \ast x_t + R_i \ast h_{t-1} + b_i)\]
Parameters
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCHW’ the shape should be (C, H, W).

  • hidden_channels (int) – Number of output channels.

  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.

  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.

  • i2h_pad (int or tuple of int, default (0, 0)) – Pad for input convolution.

  • i2h_dilate (int or tuple of int, default (1, 1)) – Input convolution dilate.

  • h2h_dilate (int or tuple of int, default (1, 1)) – Recurrent convolution dilate.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.

  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.

  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.

  • conv_layout (str, default 'NCHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCHW’ and ‘NHWC’.

  • activation (str or gluon.Block, default 'tanh') – Type of activation function. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.

class Conv3DGRUCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0, 0), i2h_dilate=(1, 1, 1), h2h_dilate=(1, 1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCDHW', activation='tanh')[source]

Bases: mxnet.gluon.rnn.conv_rnn_cell._ConvGRUCell

3D Convolutional Gated Recurrent Unit (GRU) network cell.

\[\begin{split}\begin{array}{ll} r_t = \sigma(W_r \ast x_t + R_r \ast h_{t-1} + b_r) \\ z_t = \sigma(W_z \ast x_t + R_z \ast h_{t-1} + b_z) \\ n_t = \tanh(W_i \ast x_t + b_i + r_t \circ (R_n \ast h_{t-1} + b_n)) \\ h^\prime_t = (1 - z_t) \circ n_t + z_t \circ h_{t-1} \\ \end{array}\end{split}\]
Parameters
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCDHW’ the shape should be (C, D, H, W).

  • hidden_channels (int) – Number of output channels.

  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.

  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.

  • i2h_pad (int or tuple of int, default (0, 0, 0)) – Pad for input convolution.

  • i2h_dilate (int or tuple of int, default (1, 1, 1)) – Input convolution dilate.

  • h2h_dilate (int or tuple of int, default (1, 1, 1)) – Recurrent convolution dilate.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.

  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.

  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.

  • conv_layout (str, default 'NCDHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCDHW’ and ‘NDHWC’.

  • activation (str or gluon.Block, default 'tanh') – Type of activation function used in n_t. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.

class Conv3DLSTMCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0, 0), i2h_dilate=(1, 1, 1), h2h_dilate=(1, 1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCDHW', activation='tanh')[source]

Bases: mxnet.gluon.rnn.conv_rnn_cell._ConvLSTMCell

3D Convolutional LSTM network cell.

Based on the paper “Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting”, Xingjian et al., NIPS 2015.

\[\begin{split}\begin{array}{ll} i_t = \sigma(W_i \ast x_t + R_i \ast h_{t-1} + b_i) \\ f_t = \sigma(W_f \ast x_t + R_f \ast h_{t-1} + b_f) \\ o_t = \sigma(W_o \ast x_t + R_o \ast h_{t-1} + b_o) \\ c^\prime_t = tanh(W_c \ast x_t + R_c \ast h_{t-1} + b_c) \\ c_t = f_t \circ c_{t-1} + i_t \circ c^\prime_t \\ h_t = o_t \circ tanh(c_t) \\ \end{array}\end{split}\]
Parameters
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCDHW’ the shape should be (C, D, H, W).

  • hidden_channels (int) – Number of output channels.

  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.

  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.

  • i2h_pad (int or tuple of int, default (0, 0, 0)) – Pad for input convolution.

  • i2h_dilate (int or tuple of int, default (1, 1, 1)) – Input convolution dilate.

  • h2h_dilate (int or tuple of int, default (1, 1, 1)) – Recurrent convolution dilate.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.

  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.

  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.

  • conv_layout (str, default 'NCDHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCDHW’ and ‘NDHWC’.

  • activation (str or gluon.Block, default 'tanh') – Type of activation function used in c^prime_t. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.

class Conv3DRNNCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0, 0), i2h_dilate=(1, 1, 1), h2h_dilate=(1, 1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCDHW', activation='tanh')[source]

Bases: mxnet.gluon.rnn.conv_rnn_cell._ConvRNNCell

3D Convolutional RNN cell.

\[h_t = tanh(W_i \ast x_t + R_i \ast h_{t-1} + b_i)\]
Parameters
  • input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCDHW’ the shape should be (C, D, H, W).

  • hidden_channels (int) – Number of output channels.

  • i2h_kernel (int or tuple of int) – Input convolution kernel sizes.

  • h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.

  • i2h_pad (int or tuple of int, default (0, 0, 0)) – Pad for input convolution.

  • i2h_dilate (int or tuple of int, default (1, 1, 1)) – Input convolution dilate.

  • h2h_dilate (int or tuple of int, default (1, 1, 1)) – Recurrent convolution dilate.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the recurrent convolutions.

  • i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.

  • h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.

  • conv_layout (str, default 'NCDHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCDHW’ and ‘NDHWC’.

  • activation (str or gluon.Block, default 'tanh') – Type of activation function. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See Activation() for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.

class DropoutCell(rate, axes=())[source]

Bases: mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell

Applies dropout on input.

Parameters
  • rate (float) – Fraction of elements to drop out, which is 1 minus the fraction to retain.

  • axes (tuple of int, default ()) – The axes on which dropout mask is shared. If empty, regular dropout is applied.
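DropoutCell is typically stacked between recurrent cells, for example as in the sketch below (sizes are illustrative; dropout is only active in training mode):

from mxnet import np, npx
from mxnet.gluon import rnn

npx.set_np()

stack = rnn.SequentialRNNCell()
stack.add(rnn.LSTMCell(100))
stack.add(rnn.DropoutCell(0.5))   # drops half of the first cell's outputs during training
stack.add(rnn.LSTMCell(100))
stack.initialize()

x = np.random.uniform(size=(8, 10, 20))   # (batch, length, features), layout 'NTC'
outputs, states = stack.unroll(10, x, layout='NTC', merge_outputs=True)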

Methods

apply(fn)

Applies fn recursively to every child block as well as self.

begin_state([batch_size, func])

Initial state for this cell.

cast(dtype)

Cast this Block to use another data type.

collect_params([select])

Returns a Dict containing this Block's and all of its children's Parameters (by default), or only those Parameters whose names match the given regular expressions.

export(path[, epoch, remove_amp_cast])

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

forward(inputs, states)

Unrolls the recurrent cell for one time step.

hybridize([active])

Please refer to the description of HybridBlock.hybridize().

infer_shape(*args)

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize([init, device, verbose, force_reinit])

Initializes Parameters of this Block and its children.

load(prefix)

Load a model saved using the save API

load_dict(param_dict[, device, …])

Load parameters from dict

load_parameters(filename[, device, …])

Load parameters from file previously saved by save_parameters.

optimize_for(x, *args[, backend, clear, …])

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.

register_child(block[, name])

Registers block as a child of self.

register_forward_hook(hook)

Registers a forward hook on the block.

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

register_op_hook(callback[, monitor_all])

Install callback monitor.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated.

reset_device(device)

Re-assign all Parameters to other devices.

save(prefix)

Save the model architecture and parameters to load again later

save_parameters(filename[, deduplicate])

Save parameters to file.

setattr(name, value)

Set an attribute to a new value for all Parameters.

share_parameters(shared)

Share parameters recursively inside the model.

state_info([batch_size])

Shape and layout information of states.

summary(*inputs)

Print the summary of the model’s output and parameters.

unroll(length, inputs[, begin_state, …])

Unrolls an RNN cell across time steps.

zero_grad()

Sets all Parameters’ gradient buffer to 0.

Attributes

params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

Inputs:
  • data: input tensor with shape (batch_size, size).

  • states: a list of recurrent state tensors.

Outputs:
  • out: output tensor with shape (batch_size, size).

  • next_states: returns input states directly.

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters

fn (callable) – Function to be applied to each submodule, of form fn(block).

Returns

Return type

this block

begin_state(batch_size=0, func=<function zeros>, **kwargs)

Initial state for this cell.

Parameters
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns

states – Starting states for the first RNN step.

Return type

nested list of Symbol

cast(dtype)

Cast this Block to use another data type.

Parameters

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block's and all of its children's Parameters (by default), or only those Parameters whose names match the given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’ using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters

select (str) – Regular expressions to match parameter names.

Returns

Return type

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there is only one input, it will be named data. When there are multiple inputs, they will be named data0, data1, etc.

Parameters
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state()

This function can provide the states for the first time step.

unroll()

This function unrolls an RNN for a given number of (>=1) time steps.
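Stepping a cell one time step at a time usually goes through __call__, which wraps forward(). A sketch with illustrative sizes (MXNet 2.x numpy semantics assumed):

from mxnet import np, npx
from mxnet.gluon import rnn

npx.set_np()

cell = rnn.GRUCell(32)
cell.initialize()

states = cell.begin_state(batch_size=4)
for t in range(6):
    x_t = np.random.uniform(size=(4, 16))   # one step: (batch_size, input_size)
    output, states = cell(x_t, states)      # feed the returned states into the next step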

hybridize(active=True, **kwargs)

Please refer to the description of HybridBlock.hybridize().

infer_shape(*args)

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameters of this Block and its children.

Parameters
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – Must be ‘current’ or ‘saved’. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – Must be ‘current’ or ‘saved’. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass, without calling the CachedOp. Can be used in place of hybridize; afterwards, export can be called or inference can be run. See example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')

# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)

Parameters
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – clears any previous optimizations

  • partition_if_dynamic (bool, default True) – whether to partition the graph when a dynamic shape op exists

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)

Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters

device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since it is an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
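A rough sketch of the round trip (build_model and the ‘my-rnn’ prefix are illustrative only):

net = build_model()       # any Block that is constructed deterministically
net.save('my-rnn')        # writes my-rnn-model.json and my-rnn-model.params

net2 = build_model()      # recreate the exact same architecture
net2.load('my-rnn')       # restores parameter UUIDs and values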

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models
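For example, a minimal sketch (assumes net is an initialized Block and net2 has an identical structure; the filename is illustrative):

net.save_parameters('net.params', deduplicate=True)   # weights only, shared Parameters stored once
net2.load_parameters('net.params')                     # the structure of net2 must already match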

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which is equivalent to:

dense1.weight = dense0.weight
dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters

shared (Dict) – Dict of the shared parameters.

Returns

Return type

this block

state_info(batch_size=0)[source]

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]

Unrolls an RNN cell across time steps.

Parameters
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.

Returns

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().
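A rough imperative sketch of unrolling a single cell (assumes import mxnet as mx; shapes and sizes are illustrative):

cell = mx.gluon.rnn.GRUCell(100, input_size=50)
cell.initialize()
inputs = mx.np.random.uniform(size=(32, 5, 50))    # (batch_size, length, features) for 'NTC'
outputs, states = cell.unroll(5, inputs, layout='NTC', merge_outputs=True)
# outputs: (32, 5, 100); states: list holding the state after the last step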

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class GRU(hidden_size, num_layers=1, layout='TNC', dropout=0, bidirectional=False, input_size=0, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', dtype='float32', **kwargs)[source]

Bases: mxnet.gluon.rnn.rnn_layer._RNNLayer

Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence. Note: this is an implementation of the cuDNN version of GRUs (slight modification compared to Cho et al. 2014; the reset gate \(r_t\) is applied after matrix multiplication).

For each element in the input sequence, each layer computes the following function:

\[\begin{split}\begin{array}{ll} r_t = sigmoid(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\ i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)} + b_{hn})) \\ h_t = (1 - i_t) * n_t + i_t * h_{(t-1)} \\ \end{array}\end{split}\]

where \(h_t\) is the hidden state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(r_t\), \(i_t\), \(n_t\) are the reset, input, and new gates, respectively.

Parameters
  • hidden_size (int) – The number of features in the hidden state h

  • num_layers (int, default 1) – Number of recurrent layers.

  • layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.

  • dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer

  • bidirectional (bool, default False) – If True, becomes a bidirectional RNN.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.

  • i2h_bias_initializer (str or Initializer) – Initializer for the bias vector.

  • h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.

  • dtype (str, default 'float32') – Type to initialize the parameters and default states to

  • input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.

Inputs:
  • data: input tensor with shape (sequence_length, batch_size, input_size) when layout is “TNC”. For other layouts, dimensions are permuted accordingly using the transpose() operator, which adds performance overhead. Consider creating batches in TNC layout during the data batching step.

  • states: initial recurrent state tensor with shape (num_layers, batch_size, num_hidden). If bidirectional is True, shape will instead be (2*num_layers, batch_size, num_hidden). If states is None, zeros will be used as default begin states.

Outputs:
  • out: output tensor with shape (sequence_length, batch_size, num_hidden) when layout is “TNC”. If bidirectional is True, output shape will instead be (sequence_length, batch_size, 2*num_hidden)

  • out_states: output recurrent state tensor with the same shape as states. If states is None, out_states will not be returned.

Examples

>>> layer = mx.gluon.rnn.GRU(100, 3)
>>> layer.initialize()
>>> input = mx.np.random.uniform(size=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.np.random.uniform(size=(3, 3, 100))
>>> output, hn = layer(input, h0)
class GRUCell(hidden_size, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, activation='tanh', recurrent_activation='sigmoid')[source]

Bases: mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell

Gated Rectified Unit (GRU) network cell. Note: this is an implementation of the cuDNN version of GRUs (slight modification compared to Cho et al. 2014; the reset gate \(r_t\) is applied after matrix multiplication).

Each call computes the following function:

\[\begin{split}\begin{array}{ll} r_t = sigmoid(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\ i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)} + b_{hn})) \\ h_t = (1 - i_t) * n_t + i_t * h_{(t-1)} \\ \end{array}\end{split}\]

Methods

apply(fn)

Applies fn recursively to every child block as well as self.

begin_state([batch_size, func])

Initial state for this cell.

cast(dtype)

Cast this Block to use another data type.

collect_params([select])

Returns a Dict containing this Block’s and all of its children’s Parameters (default); can also return a Dict of the Parameters that match the given regular expressions.

export(path[, epoch, remove_amp_cast])

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

forward(inputs, states)

Unrolls the recurrent cell for one time step.

hybridize([active])

Please refer to the description of HybridBlock.hybridize().

infer_shape(i, x, is_bidirect)

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize([init, device, verbose, force_reinit])

Initializes Parameters of this Block and its children.

load(prefix)

Load a model saved using the save API

load_dict(param_dict[, device, …])

Load parameters from dict

load_parameters(filename[, device, …])

Load parameters from file previously saved by save_parameters.

optimize_for(x, *args[, backend, clear, …])

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.

register_child(block[, name])

Registers block as a child of self.

register_forward_hook(hook)

Registers a forward hook on the block.

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

register_op_hook(callback[, monitor_all])

Install callback monitor.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated.

reset_device(device)

Re-assign all Parameters to other devices.

save(prefix)

Save the model architecture and parameters to load again later

save_parameters(filename[, deduplicate])

Save parameters to file.

setattr(name, value)

Set an attribute to a new value for all Parameters.

share_parameters(shared)

Share parameters recursively inside the model.

state_info([batch_size])

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

unroll(length, inputs[, begin_state, …])

Unrolls an RNN cell across time steps.

zero_grad()

Sets all Parameters’ gradient buffer to 0.

Attributes

params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

where \(h_t\) is the hidden state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(r_t\), \(i_t\), \(n_t\) are the reset, input, and new gates, respectively.

Parameters
  • hidden_size (int) – Number of units in output symbol.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.

  • i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.

  • h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.

  • input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.

  • activation (str, default 'tanh') – Activation type to use. See nd/symbol Activation for supported types.

  • recurrent_activation (str, default 'sigmoid') – Activation type to use for the recurrent step. See nd/symbol Activation for supported types.

Inputs:
  • data: input tensor with shape (batch_size, input_size).

  • states: a list of one initial recurrent state tensor with shape (batch_size, num_hidden).

Outputs:
  • out: output tensor with shape (batch_size, num_hidden).

  • next_states: a list of one output recurrent state tensor with the same shape as states.
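A minimal single-step sketch (assumes import mxnet as mx; shapes are illustrative):

cell = mx.gluon.rnn.GRUCell(100, input_size=50)
cell.initialize()
x = mx.np.random.uniform(size=(32, 50))      # (batch_size, input_size)
h0 = [mx.np.zeros((32, 100))]                # one (batch_size, num_hidden) state; begin_state() can also build this
out, h1 = cell(x, h0)                        # out: (32, 100), h1: list of one state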

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters

fn (callable) – Function to be applied to each submodule, of form fn(block).

Returns

Return type

this block

begin_state(batch_size=0, func=<function zeros>, **kwargs)

Initial state for this cell.

Parameters
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns

states – Starting states for the first RNN step.

Return type

nested list of Symbol

cast(dtype)

Cast this Block to use another data type.

Parameters

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block’s and all of its children’s Parameters (default); can also return a Dict of the Parameters that match the given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

To collect all parameters whose names end with ‘weight’ or ‘bias’, use regular expressions:

model.collect_params('.*weight|.*bias')
Parameters

select (str) – regular expressions

Returns

Return type

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there is only one input, it will be named data. When there is more than one input, they will be named data0, data1, etc.

Parameters
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4-digit epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.
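A rough sketch of exporting a hybridized block and loading it back (assumes import mxnet as mx, a HybridBlock net with a single input, and a valid input x; the ‘gru-model’ prefix is illustrative):

net.hybridize()
net(x)                                           # run once so the cached graph exists
sym_file, params_file = net.export('gru-model', epoch=0)
# later, e.g.: net2 = mx.gluon.SymbolBlock.imports(sym_file, ['data'], params_file)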

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state()

This function can provide the states for the first time step.

unroll()

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer to the description of HybridBlock.hybridize().

infer_shape(i, x, is_bidirect)[source]

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameters of this Block and its children.

Parameters
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs to what they were when saved, in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. Each Block is uniquely identified by its class name and a unique ID; the children are traversed in order (since they form an OrderedDict) and the unique ID denotes that specific Block.

Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – Must be one of {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – Must be one of {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass, without calling the CachedOp. It can be used in place of hybridize; afterwards, export can be called or inference can be run. See example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')

# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)

Parameters
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of the backend, as registered in SubgraphBackendRegistry. Default None.

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – Clears any previous optimizations.

  • partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic-shape operator exists.

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)

Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters

device (Device or list of Device, default device.current_device()) – Assign Parameters to the given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since it is an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which is equivalent to:

dense1.weight = dense0.weight
dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters

shared (Dict) – Dict of the shared parameters.

Returns

Return type

this block

state_info(batch_size=0)[source]

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)

Unrolls an RNN cell across time steps.

Parameters
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.

Returns

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class HybridRecurrentCell[source]

Bases: mxnet.gluon.rnn.rnn_cell.RecurrentCell, mxnet.gluon.block.HybridBlock

HybridRecurrentCell supports hybridize.
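As a rough, untested sketch of what a custom hybridizable cell might look like (MinimalRNNCell, its layers, and the state_info dictionary keys are assumptions here, not part of this API reference):

import mxnet as mx

class MinimalRNNCell(mx.gluon.rnn.HybridRecurrentCell):
    # hypothetical cell: h_t = tanh(i2h(x_t) + h2h(h_{t-1}))
    def __init__(self, hidden_size, input_size):
        super().__init__()
        self._hidden_size = hidden_size
        self.i2h = mx.gluon.nn.Dense(hidden_size, in_units=input_size, flatten=False)
        self.h2h = mx.gluon.nn.Dense(hidden_size, in_units=hidden_size, flatten=False, use_bias=False)

    def state_info(self, batch_size=0):
        # one hidden state, laid out as (batch, channels)
        return [{'shape': (batch_size, self._hidden_size), '__layout__': 'NC'}]

    def forward(self, inputs, states):
        h = mx.np.tanh(self.i2h(inputs) + self.h2h(states[0]))
        return h, [h]

cell = MinimalRNNCell(100, input_size=50)
cell.initialize()
out, states = cell(mx.np.ones((4, 50)), [mx.np.zeros((4, 100))])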

Methods

apply(fn)

Applies fn recursively to every child block as well as self.

begin_state([batch_size, func])

Initial state for this cell.

cast(dtype)

Cast this Block to use another data type.

collect_params([select])

Returns a Dict containing this Block’s and all of its children’s Parameters (default); can also return a Dict of the Parameters that match the given regular expressions.

export(path[, epoch, remove_amp_cast])

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

forward(x, *args, **kwargs)

Unrolls the recurrent cell for one time step.

hybridize([active])

Please refer to the description of HybridBlock.hybridize().

infer_shape(*args)

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize([init, device, verbose, force_reinit])

Initializes Parameters of this Block and its children.

load(prefix)

Load a model saved using the save API

load_dict(param_dict[, device, …])

Load parameters from dict

load_parameters(filename[, device, …])

Load parameters from file previously saved by save_parameters.

optimize_for(x, *args[, backend, clear, …])

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.

register_child(block[, name])

Registers block as a child of self.

register_forward_hook(hook)

Registers a forward hook on the block.

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

register_op_hook(callback[, monitor_all])

Install callback monitor.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated.

reset_device(device)

Re-assign all Parameters to other devices.

save(prefix)

Save the model architecture and parameters to load again later

save_parameters(filename[, deduplicate])

Save parameters to file.

setattr(name, value)

Set an attribute to a new value for all Parameters.

share_parameters(shared)

Share parameters recursively inside the model.

state_info([batch_size])

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

unroll(length, inputs[, begin_state, …])

Unrolls an RNN cell across time steps.

zero_grad()

Sets all Parameters’ gradient buffer to 0.

Attributes

params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters

fn (callable) – Function to be applied to each submodule, of form fn(block).

Returns

Return type

this block

begin_state(batch_size=0, func=<function zeros>, **kwargs)

Initial state for this cell.

Parameters
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns

states – Starting states for the first RNN step.

Return type

nested list of Symbol

cast(dtype)

Cast this Block to use another data type.

Parameters

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block’s and all of its children’s Parameters (default); can also return a Dict of the Parameters that match the given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

To collect all parameters whose names end with ‘weight’ or ‘bias’, use regular expressions:

model.collect_params('.*weight|.*bias')
Parameters

select (str) – regular expressions

Returns

Return type

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there is only one input, it will be named data. When there is more than one input, they will be named data0, data1, etc.

Parameters
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4-digit epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(x, *args, **kwargs)[source]

Unrolls the recurrent cell for one time step.

Parameters
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state()

This function can provide the states for the first time step.

unroll()

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer to the description of HybridBlock.hybridize().

infer_shape(*args)

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameters of this Block and its children.

Parameters
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs to what they were when saved, in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. Each Block is uniquely identified by its class name and a unique ID; the children are traversed in order (since they form an OrderedDict) and the unique ID denotes that specific Block.

Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – Must be one of {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – Must be one of {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass, without calling the CachedOp. It can be used in place of hybridize; afterwards, export can be called or inference can be run. See example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')

# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)

Parameters
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of the backend, as registered in SubgraphBackendRegistry. Default None.

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – Clears any previous optimizations.

  • partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic-shape operator exists.

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)

Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters

device (Device or list of Device, default device.current_device()) – Assign Parameters to the given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since it is an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which is equivalent to:

dense1.weight = dense0.weight
dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters

shared (Dict) – Dict of the shared parameters.

Returns

Return type

this block

state_info(batch_size=0)

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)

Unrolls an RNN cell across time steps.

Parameters
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.

Returns

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class HybridSequentialRNNCell[source]

Bases: mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell

Sequentially stacking multiple HybridRNN cells.

Methods

add(cell)

Appends a cell into the stack.

apply(fn)

Applies fn recursively to every child block as well as self.

begin_state(**kwargs)

Initial state for this cell.

cast(dtype)

Cast this Block to use another data type.

collect_params([select])

Returns a Dict containing this Block’s and all of its children’s Parameters (default); can also return a Dict of the Parameters that match the given regular expressions.

export(path[, epoch, remove_amp_cast])

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

forward(inputs, states)

Unrolls the recurrent cell for one time step.

hybridize([active])

Please refer to the description of HybridBlock.hybridize().

infer_shape(_, x, is_bidirect)

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize([init, device, verbose, force_reinit])

Initializes Parameters of this Block and its children.

load(prefix)

Load a model saved using the save API

load_dict(param_dict[, device, …])

Load parameters from dict

load_parameters(filename[, device, …])

Load parameters from file previously saved by save_parameters.

optimize_for(x, *args[, backend, clear, …])

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.

register_child(block[, name])

Registers block as a child of self.

register_forward_hook(hook)

Registers a forward hook on the block.

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

register_op_hook(callback[, monitor_all])

Install callback monitor.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated.

reset_device(device)

Re-assign all Parameters to other devices.

save(prefix)

Save the model architecture and parameters to load again later

save_parameters(filename[, deduplicate])

Save parameters to file.

setattr(name, value)

Set an attribute to a new value for all Parameters.

share_parameters(shared)

Share parameters recursively inside the model.

state_info([batch_size])

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

unroll(length, inputs[, begin_state, …])

Unrolls an RNN cell across time steps.

zero_grad()

Sets all Parameters’ gradient buffer to 0.

Attributes

params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

add(cell)[source]

Appends a cell into the stack.

Parameters

cell (RecurrentCell) – The cell to add.
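For example, a rough sketch of stacking two GRU cells and unrolling them (assumes import mxnet as mx; sizes are illustrative):

stack = mx.gluon.rnn.HybridSequentialRNNCell()
stack.add(mx.gluon.rnn.GRUCell(100, input_size=50))
stack.add(mx.gluon.rnn.GRUCell(100, input_size=100))   # consumes the first cell's 100-unit output
stack.initialize()
inputs = mx.np.random.uniform(size=(32, 5, 50))         # (batch_size, length, features), 'NTC'
outputs, states = stack.unroll(5, inputs, layout='NTC', merge_outputs=True)
# outputs: (32, 5, 100); states: one state per stacked cell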

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters

fn (callable) – Function to be applied to each submodule, of form fn(block).

Returns

Return type

this block

begin_state(**kwargs)[source]

Initial state for this cell.

Parameters
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns

states – Starting states for the first RNN step.

Return type

nested list of Symbol

cast(dtype)

Cast this Block to use another data type.

Parameters

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block’s and all of its children’s Parameters (default); can also return a Dict of the Parameters that match the given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

To collect all parameters whose names end with ‘weight’ or ‘bias’, use regular expressions:

model.collect_params('.*weight|.*bias')
Parameters

select (str) – regular expressions

Returns

Return type

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there is only one input, it will be named data. When there is more than one input, they will be named data0, data1, etc.

Parameters
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4-digit epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state()

This function can provide the states for the first time step.

unroll()

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer to the description of HybridBlock.hybridize().

infer_shape(_, x, is_bidirect)[source]

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameters of this Block and its children.

Parameters
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs to what they were when saved, in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. Each Block is uniquely identified by its class name and a unique ID; the children are traversed in order (since they form an OrderedDict) and the unique ID denotes that specific Block.

Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – Must be one of {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – Must be in {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype used for casting the parameters.

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize; afterwards, export can be called or inference can be run. See example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')

# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)

Parameters
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – clears any previous optimizations

  • partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic shape op exists.

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)

Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle
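
For illustration, a minimal sketch of attaching and later detaching a forward hook (the hook here only prints the output shape; HookHandle.detach() removes it):

>>> import mxnet as mx
>>> def print_output_shape(block, inputs, output):
...     print(type(block).__name__, output.shape)
>>> layer = mx.gluon.rnn.LSTM(100, 3)
>>> layer.initialize()
>>> handle = layer.register_forward_hook(print_output_shape)
>>> out = layer(mx.np.random.uniform(size=(5, 3, 10)))   # hook prints: LSTM (5, 3, 100)
>>> handle.detach()                                      # stop monitoring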

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
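
For illustration, a minimal sketch of a monitor callback (the block must be hybridized and run once for the callback to fire; the callback here only prints):

>>> import mxnet as mx
>>> def monitor(name, op_name, arr):
...     print(name, op_name, arr.shape)
>>> net = mx.gluon.rnn.LSTM(100, 3)
>>> net.initialize()
>>> net.hybridize()
>>> net.register_op_hook(monitor, monitor_all=True)
>>> _ = net(mx.np.random.uniform(size=(5, 3, 10)))   # prints one line per monitored tensor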

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters

device (Device or list of Device, default device.current_device()) – Assign Parameters to the given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since they are stored in an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models
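
For illustration, a minimal parameter-only round trip (a sketch: the filename 'lstm.params' is arbitrary, and the loading model must have a matching structure):

>>> import mxnet as mx
>>> net = mx.gluon.rnn.LSTM(100, 3)
>>> net.initialize()
>>> _ = net(mx.np.random.uniform(size=(5, 3, 10)))   # run once so deferred shapes are known
>>> net.save_parameters('lstm.params')
>>> net2 = mx.gluon.rnn.LSTM(100, 3)
>>> net2.load_parameters('lstm.params')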

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which is equivalent to:

dense1.weight = dense0.weight
dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters

shared (Dict) – Dict of the shared parameters.

Returns

Return type

this block

state_info(batch_size=0)[source]

Shape and layout information of states.

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]

Unrolls an RNN cell across time steps.

Parameters
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.

Returns

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().
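
For illustration, a minimal unroll sketch (shown here with a single LSTMCell; the call is the same for this cell), assuming inputs in ‘NTC’ layout:

>>> import mxnet as mx
>>> cell = mx.gluon.rnn.LSTMCell(50)
>>> cell.initialize()
>>> seq = mx.np.random.uniform(size=(8, 10, 30))     # (batch_size, length, features) for ‘NTC’
>>> outputs, states = cell.unroll(10, seq, layout='NTC', merge_outputs=True)
>>> outputs.shape
(8, 10, 50)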

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class LSTM(hidden_size, num_layers=1, layout='TNC', dropout=0, bidirectional=False, input_size=0, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', projection_size=None, h2r_weight_initializer=None, state_clip_min=None, state_clip_max=None, state_clip_nan=False, dtype='float32', **kwargs)[source]

Bases: mxnet.gluon.rnn.rnn_layer._RNNLayer

Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.

For each element in the input sequence, each layer computes the following function:

\[\begin{split}\begin{array}{ll} i_t = \mathrm{sigmoid}(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ f_t = \mathrm{sigmoid}(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{(t-1)} + b_{hg}) \\ o_t = \mathrm{sigmoid}(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\ c_t = f_t * c_{(t-1)} + i_t * g_t \\ h_t = o_t * \tanh(c_t) \end{array}\end{split}\]

where \(h_t\) is the hidden state at time t, \(c_t\) is the cell state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(i_t\), \(f_t\), \(g_t\), \(o_t\) are the input, forget, cell, and out gates, respectively.

Parameters
  • hidden_size (int) – The number of features in the hidden state h.

  • num_layers (int, default 1) – Number of recurrent layers.

  • layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.

  • dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer.

  • bidirectional (bool, default False) – If True, becomes a bidirectional RNN.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.

  • i2h_bias_initializer (str or Initializer, default 'lstmbias') – Initializer for the bias vector. By default, bias for the forget gate is initialized to 1 while all other biases are initialized to zero.

  • h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.

  • projection_size (int, default None) – The number of features after projection.

  • h2r_weight_initializer (str or Initializer, default None) – Initializer for the projected recurrent weights matrix, used for the linear transformation of the recurrent state to the projected space.

  • state_clip_min (float or None, default None) – Minimum clip value of LSTM states. This option must be used together with state_clip_max. If None, clipping is not applied.

  • state_clip_max (float or None, default None) – Maximum clip value of LSTM states. This option must be used together with state_clip_min. If None, clipping is not applied.

  • state_clip_nan (boolean, default False) – Whether to stop NaN from propagating in state by clipping it to min/max. If the clipping range is not specified, this option is ignored.

  • dtype (str, default 'float32') – Type to initialize the parameters and default states to

  • input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.

Inputs:
  • data: input tensor with shape (sequence_length, batch_size, input_size) when layout is “TNC”. For other layouts, dimensions are permuted accordingly using the transpose() operator, which adds performance overhead. Consider creating batches in TNC layout during the data batching step.

  • states: a list of two initial recurrent state tensors. Each has shape (num_layers, batch_size, num_hidden). If bidirectional is True, shape will instead be (2*num_layers, batch_size, num_hidden). If states is None, zeros will be used as default begin states.

Outputs:
  • out: output tensor with shape (sequence_length, batch_size, num_hidden) when layout is “TNC”. If bidirectional is True, output shape will instead be (sequence_length, batch_size, 2*num_hidden)

  • out_states: a list of two output recurrent state tensors with the same shape as in states. If states is None out_states will not be returned.

Examples

>>> layer = mx.gluon.rnn.LSTM(100, 3)
>>> layer.initialize()
>>> input = mx.np.random.uniform(size=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.np.random.uniform(size=(3, 3, 100))
>>> c0 = mx.np.random.uniform(size=(3, 3, 100))
>>> output, hn = layer(input, [h0, c0])
class LSTMCell(hidden_size, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, activation='tanh', recurrent_activation='sigmoid')[source]

Bases: mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell

Long-Short Term Memory (LSTM) network cell.

Each call computes the following function:

\[\begin{split}\begin{array}{ll} i_t = \mathrm{sigmoid}(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ f_t = \mathrm{sigmoid}(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{(t-1)} + b_{hg}) \\ o_t = \mathrm{sigmoid}(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\ c_t = f_t * c_{(t-1)} + i_t * g_t \\ h_t = o_t * \tanh(c_t) \end{array}\end{split}\]

Methods

apply(fn)

Applies fn recursively to every child block as well as self.

begin_state([batch_size, func])

Initial state for this cell.

cast(dtype)

Cast this Block to use another data type.

collect_params([select])

Returns a Dict containing this Block’s and all of its children’s Parameters (by default). It can also return a Dict containing only the Parameters that match the given regular expressions.

export(path[, epoch, remove_amp_cast])

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

forward(inputs, states)

Unrolls the recurrent cell for one time step.

hybridize([active])

Please refer to the description of HybridBlock.hybridize().

infer_shape(i, x, is_bidirect)

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize([init, device, verbose, force_reinit])

Initializes Parameters of this Block and its children.

load(prefix)

Load a model saved using the save API

load_dict(param_dict[, device, …])

Load parameters from dict

load_parameters(filename[, device, …])

Load parameters from file previously saved by save_parameters.

optimize_for(x, *args[, backend, clear, …])

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.

register_child(block[, name])

Registers block as a child of self.

register_forward_hook(hook)

Registers a forward hook on the block.

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

register_op_hook(callback[, monitor_all])

Install callback monitor.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated.

reset_device(device)

Re-assign all Parameters to other devices.

save(prefix)

Save the model architecture and parameters to load again later

save_parameters(filename[, deduplicate])

Save parameters to file.

setattr(name, value)

Set an attribute to a new value for all Parameters.

share_parameters(shared)

Share parameters recursively inside the model.

state_info([batch_size])

Shape and layout information of states.

summary(*inputs)

Print the summary of the model’s output and parameters.

unroll(length, inputs[, begin_state, …])

Unrolls an RNN cell across time steps.

zero_grad()

Sets all Parameters’ gradient buffer to 0.

Attributes

params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

where \(h_t\) is the hidden state at time t, \(c_t\) is the cell state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(i_t\), \(f_t\), \(g_t\), \(o_t\) are the input, forget, cell, and out gates, respectively.

Parameters
  • hidden_size (int) – Number of units in output symbol.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.

  • i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.

  • h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.

  • input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.

  • activation (str, default 'tanh') – Activation type to use. See nd/symbol Activation for supported types.

  • recurrent_activation (str, default 'sigmoid') – Activation type to use for the recurrent step. See nd/symbol Activation for supported types.

  • Inputs

    • data: input tensor with shape (batch_size, input_size).

    • states: a list of two initial recurrent state tensors. Each has shape (batch_size, num_hidden).

  • Outputs

    • out: output tensor with shape (batch_size, num_hidden).

    • next_states: a list of two output recurrent state tensors. Each has the same shape as states.
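
For illustration, a minimal single-step sketch analogous to the LSTM example above, with zero tensors as the initial [h, c] states:

>>> import mxnet as mx
>>> cell = mx.gluon.rnn.LSTMCell(100)
>>> cell.initialize()
>>> x = mx.np.random.uniform(size=(3, 10))           # (batch_size, input_size)
>>> h0 = mx.np.zeros((3, 100))
>>> c0 = mx.np.zeros((3, 100))
>>> out, [h1, c1] = cell(x, [h0, c0])
>>> out.shape
(3, 100)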

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters

fn (callable) – Function to be applied to each submodule, of form fn(block).

Returns

Return type

this block

begin_state(batch_size=0, func=<function zeros>, **kwargs)

Initial state for this cell.

Parameters
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns

states – Starting states for the first RNN step.

Return type

nested list of Symbol
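
For illustration, a minimal sketch pairing begin_state() with a single-step call, assuming the default zeros initializer for the states:

>>> import mxnet as mx
>>> cell = mx.gluon.rnn.LSTMCell(100)
>>> cell.initialize()
>>> states = cell.begin_state(batch_size=3)          # zero-filled [h, c] states
>>> out, states = cell(mx.np.random.uniform(size=(3, 10)), states)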

cast(dtype)

Cast this Block to use another data type.

Parameters

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block’s and all of its children’s Parameters (by default). It can also return a Dict containing only the Parameters that match the given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’ using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters

select (str) – regular expressions

Returns

Return type

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there is only one input, it will be named data. When there is more than one input, they will be named data0, data1, etc.

Parameters
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state()

This function can provide the states for the first time step.

unroll()

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer to the description of HybridBlock.hybridize().

infer_shape(i, x, is_bidirect)[source]

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameters of this Block and its children.

Parameters
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs to the values they had when saved, in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by its class name and a unique ID assigned in creation order (since children are stored in an OrderedDict), and that unique ID is used to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – Must be in {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype used for casting the parameters.

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – Must be in {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype used for casting the parameters.

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize; afterwards, export can be called or inference can be run. See example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')

# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)

Parameters
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – clears any previous optimizations

  • partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic shape op exists.

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)

Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters

device (Device or list of Device, default device.current_device()) – Assign Parameters to the given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since they are stored in an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which is equivalent to:

dense1.weight = dense0.weight
dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters

shared (Dict) – Dict of the shared parameters.

Returns

Return type

this block

state_info(batch_size=0)[source]

Shape and layout information of states.

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)

Unrolls an RNN cell across time steps.

Parameters
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.

Returns

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class LSTMPCell(hidden_size, projection_size, i2h_weight_initializer=None, h2h_weight_initializer=None, h2r_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0)[source]

Bases: mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell

Long-Short Term Memory Projected (LSTMP) network cell. (https://arxiv.org/abs/1402.1128)

Each call computes the following function:

\[\begin{split}\begin{array}{ll} i_t = \mathrm{sigmoid}(W_{ii} x_t + b_{ii} + W_{ri} r_{(t-1)} + b_{ri}) \\ f_t = \mathrm{sigmoid}(W_{if} x_t + b_{if} + W_{rf} r_{(t-1)} + b_{rf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{rg} r_{(t-1)} + b_{rg}) \\ o_t = \mathrm{sigmoid}(W_{io} x_t + b_{io} + W_{ro} r_{(t-1)} + b_{ro}) \\ c_t = f_t * c_{(t-1)} + i_t * g_t \\ h_t = o_t * \tanh(c_t) \\ r_t = W_{hr} h_t \end{array}\end{split}\]

Methods

apply(fn)

Applies fn recursively to every child block as well as self.

begin_state([batch_size, func])

Initial state for this cell.

cast(dtype)

Cast this Block to use another data type.

collect_params([select])

Returns a Dict containing this Block’s and all of its children’s Parameters (by default). It can also return a Dict containing only the Parameters that match the given regular expressions.

export(path[, epoch, remove_amp_cast])

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

forward(inputs, states)

Unrolls the recurrent cell for one time step.

hybridize([active])

Please refer to the description of HybridBlock.hybridize().

infer_shape(i, x, is_bidirect)

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize([init, device, verbose, force_reinit])

Initializes Parameters of this Block and its children.

load(prefix)

Load a model saved using the save API

load_dict(param_dict[, device, …])

Load parameters from dict

load_parameters(filename[, device, …])

Load parameters from file previously saved by save_parameters.

optimize_for(x, *args[, backend, clear, …])

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.

register_child(block[, name])

Registers block as a child of self.

register_forward_hook(hook)

Registers a forward hook on the block.

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

register_op_hook(callback[, monitor_all])

Install callback monitor.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated.

reset_device(device)

Re-assign all Parameters to other devices.

save(prefix)

Save the model architecture and parameters to load again later

save_parameters(filename[, deduplicate])

Save parameters to file.

setattr(name, value)

Set an attribute to a new value for all Parameters.

share_parameters(shared)

Share parameters recursively inside the model.

state_info([batch_size])

Shape and layout information of states.

summary(*inputs)

Print the summary of the model’s output and parameters.

unroll(length, inputs[, begin_state, …])

Unrolls an RNN cell across time steps.

zero_grad()

Sets all Parameters’ gradient buffer to 0.

Attributes

params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

where \(r_t\) is the projected recurrent activation at time t, \(h_t\) is the hidden state at time t, \(c_t\) is the cell state at time t, \(x_t\) is the input at time t, and \(i_t\), \(f_t\), \(g_t\), \(o_t\) are the input, forget, cell, and out gates, respectively.

Parameters
  • hidden_size (int) – Number of units in cell state symbol.

  • projection_size (int) – Number of units in output symbol.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the hidden state.

  • h2r_weight_initializer (str or Initializer) – Initializer for the projection weights matrix, used for the linear transformation of the recurrent state.

  • i2h_bias_initializer (str or Initializer, default 'lstmbias') – Initializer for the bias vector. By default, bias for the forget gate is initialized to 1 while all other biases are initialized to zero.

  • h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.

  • Inputs

    • data: input tensor with shape (batch_size, input_size).

    • states: a list of two initial recurrent state tensors, with shape (batch_size, projection_size) and (batch_size, hidden_size) respectively.

  • Outputs

    • out: output tensor with shape (batch_size, projection_size).

    • next_states: a list of two output recurrent state tensors. Each has the same shape as states.
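
For illustration, a minimal single-step sketch (per the Inputs description, the first state has projection_size features and the second has hidden_size features; the output is the projected state r_t):

>>> import mxnet as mx
>>> cell = mx.gluon.rnn.LSTMPCell(hidden_size=100, projection_size=30)
>>> cell.initialize()
>>> x = mx.np.random.uniform(size=(3, 10))
>>> r0 = mx.np.zeros((3, 30))                        # projected recurrent state
>>> c0 = mx.np.zeros((3, 100))                       # cell state
>>> out, [r1, c1] = cell(x, [r0, c0])
>>> out.shape
(3, 30)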

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters

fn (callable) – Function to be applied to each submodule, of form fn(block).

Returns

Return type

this block

begin_state(batch_size=0, func=<function zeros>, **kwargs)

Initial state for this cell.

Parameters
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns

states – Starting states for the first RNN step.

Return type

nested list of Symbol

cast(dtype)

Cast this Block to use another data type.

Parameters

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block’s and all of its children’s Parameters (by default). It can also return a Dict containing only the Parameters that match the given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’ using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters

select (str) – regular expressions

Returns

Return type

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there is only one input, it will be named data. When there is more than one input, they will be named data0, data1, etc.

Parameters
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state()

This function can provide the states for the first time step.

unroll()

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer to the description of HybridBlock.hybridize().

infer_shape(i, x, is_bidirect)[source]

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameters of this Block and its children.

Parameters
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs to the values they had when saved, in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by its class name and a unique ID assigned in creation order (since children are stored in an OrderedDict), and that unique ID is used to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – Must be in {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype used for casting the parameters.

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – Must be in {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype used for casting the parameters.

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize; afterwards, export can be called or inference can be run. See example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')

# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)

Parameters
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – clears any previous optimizations

  • partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic shape op exists.

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)

Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters

device (Device or list of Device, default device.current_device()) – Assign Parameters to the given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since they are stored in an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which is equivalent to:

dense1.weight = dense0.weight
dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters

shared (Dict) – Dict of the shared parameters.

Returns

Return type

this block

state_info(batch_size=0)[source]

Shape and layout information of states.

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)

Unrolls an RNN cell across time steps.

Parameters
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.

Returns

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class ModifierCell(base_cell)[source]

Bases: mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell

Base class for modifier cells. A modifier cell takes a base cell, apply modifications on it (e.g. Zoneout), and returns a new cell.

After applying modifiers the base cell should no longer be called directly. The modifier cell should be used instead.
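
For example (a minimal sketch; ZoneoutCell and the sizes are arbitrary choices), a base cell can be wrapped in a concrete modifier and the wrapper used from then on:

import mxnet as mx
from mxnet.gluon import rnn

# a minimal sketch: wrap a base LSTMCell with zoneout and use the wrapper instead of the base cell
base = rnn.LSTMCell(hidden_size=100, input_size=30)
cell = rnn.ZoneoutCell(base, zoneout_states=0.2)
cell.initialize()

x = mx.np.random.uniform(size=(8, 30))      # (batch_size, input_size)
states = cell.begin_state(batch_size=8)
output, states = cell(x, states)            # one time step through the modified cell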

Methods

apply(fn)

Applies fn recursively to every child block as well as self.

begin_state([func])

Initial state for this cell.

cast(dtype)

Cast this Block to use another data type.

collect_params([select])

Returns a Dict containing this Block’s and all of its children’s Parameters (by default). It can also return a Dict containing only the Parameters that match the given regular expressions.

export(path[, epoch, remove_amp_cast])

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

forward(inputs, states)

Unrolls the recurrent cell for one time step.

hybridize([active])

Please refer to the description of HybridBlock.hybridize().

infer_shape(*args)

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize([init, device, verbose, force_reinit])

Initializes Parameters of this Block and its children.

load(prefix)

Load a model saved using the save API

load_dict(param_dict[, device, …])

Load parameters from dict

load_parameters(filename[, device, …])

Load parameters from file previously saved by save_parameters.

optimize_for(x, *args[, backend, clear, …])

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.

register_child(block[, name])

Registers block as a child of self.

register_forward_hook(hook)

Registers a forward hook on the block.

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

register_op_hook(callback[, monitor_all])

Install callback monitor.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated.

reset_device(device)

Re-assign all Parameters to other devices.

save(prefix)

Save the model architecture and parameters to load again later

save_parameters(filename[, deduplicate])

Save parameters to file.

setattr(name, value)

Set an attribute to a new value for all Parameters.

share_parameters(shared)

Share parameters recursively inside the model.

state_info([batch_size])

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

unroll(length, inputs[, begin_state, …])

Unrolls an RNN cell across time steps.

zero_grad()

Sets all Parameters’ gradient buffer to 0.

Attributes

params

Return an attribute of instance, which is of type owner.

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters

fn (callable) – Function to be applied to each submodule, of form fn(block).

Returns

Return type

this block

begin_state(func=<function zeros>, **kwargs)[source]

Initial state for this cell.

Parameters
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns

states – Starting states for the first RNN step.

Return type

nested list of Symbol
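
For example, a minimal sketch using the NDArray API (the cell and batch size are arbitrary; by default zero-valued states are created):

import mxnet as mx
from mxnet.gluon import rnn

# a minimal sketch: create default (zero) begin states for an LSTMCell
cell = rnn.LSTMCell(hidden_size=100, input_size=30)
cell.initialize()

states = cell.begin_state(batch_size=8)     # [h, c], each of shape (8, 100)
print([s.shape for s in states])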

cast(dtype)

Cast this Block to use another data type.

Parameters

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block’s and all of its children’s Parameters (by default). It can also return a Dict containing only the Parameters that match the given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’ using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters

select (str) – regular expressions

Returns

Return type

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there is only one input, it will be named data. When there are multiple inputs, they will be named data0, data1, etc.

Parameters
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.
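
As a minimal sketch (the layer, sizes, and the file prefix my_lstm are arbitrary), a hybridized block can be exported after one forward pass has built the cached graph:

import mxnet as mx
from mxnet.gluon import rnn

# a minimal sketch: export a hybridized LSTM layer after one forward pass
layer = rnn.LSTM(20, num_layers=1, input_size=10)
layer.initialize()
layer.hybridize()
x = mx.np.random.uniform(size=(5, 3, 10))   # (T, N, C); 'TNC' is the default layout
layer(x)                                     # run once so the cached graph exists
layer.export('my_lstm', epoch=0)             # writes my_lstm-symbol.json and my_lstm-0000.params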

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state()

This function can provide the states for the first time step.

unroll()

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer to the description of HybridBlock.hybridize().

infer_shape(*args)

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameters of this Block and its children.

Parameters
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It restores each Block’s parameter UUIDs to what they were when saved, in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order as when the model was saved. This is because each Block is uniquely identified by its class name and a unique ID assigned in creation order (since children are stored in an OrderedDict), and that unique ID is used to denote the specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – Must be in {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – Must be in {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize; afterwards, export can be called or inference can be run. See example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')

# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)

Parameters
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – clears any previous optimizations

  • partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic shape op exists.

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Return an attribute of instance, which is of type owner.

register_child(block, name=None)

Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle
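
For illustration, a minimal sketch of a hook that prints output shapes and is later removed through the returned handle (the layer and sizes are arbitrary):

import mxnet as mx
from mxnet.gluon import rnn

# a minimal sketch: log output shapes after every forward call
def shape_hook(block, inputs, output):
    print(type(block).__name__, getattr(output, 'shape', None))

layer = rnn.LSTM(20, input_size=10)
layer.initialize()
handle = layer.register_forward_hook(shape_hook)
layer(mx.np.random.uniform(size=(5, 3, 10)))
handle.detach()                              # remove the hook when it is no longer needed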

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
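
A minimal, heavily hedged sketch of a monitor callback (the layer and sizes are arbitrary; the callback follows the 3-parameter form described above):

import mxnet as mx
from mxnet.gluon import rnn

# a minimal sketch: inspect intermediate outputs of a hybridized block
def monitor(tensor_name, op_name, tensor):
    print(tensor_name, op_name, tensor.shape)

layer = rnn.LSTM(20, input_size=10)
layer.initialize()
layer.hybridize()
layer.register_op_hook(monitor, monitor_all=False)   # monitor outputs only
layer(mx.np.random.uniform(size=(5, 3, 10)))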

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters

device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since it is an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
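
A minimal round-trip sketch (the layer, sizes, and the checkpoint prefix are arbitrary; the second model must be built in the same deterministic order as the first):

from mxnet.gluon import rnn

# a minimal sketch: save with the `save` API and load into an identically constructed model
net = rnn.LSTM(20, input_size=10)
net.initialize()
net.save('checkpoint')                # writes checkpoint-model.json and checkpoint-model.params

net2 = rnn.LSTM(20, input_size=10)    # must be created the same way as `net`
net2.load('checkpoint')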

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models
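
A minimal sketch of the usual save_parameters / load_parameters round trip (the layer, sizes, and the file name lstm.params are arbitrary):

from mxnet.gluon import rnn

# a minimal sketch: save only the parameters and load them into a compatible block
net = rnn.LSTM(20, input_size=10)
net.initialize()
net.save_parameters('lstm.params')

net2 = rnn.LSTM(20, input_size=10)
net2.load_parameters('lstm.params')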

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to 'null' if you don’t need gradients w.r.t. a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which is equivalent to:

dense1.weight = dense0.weight
dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters

shared (Dict) – Dict of the shared parameters.

Returns

Return type

this block

state_info(batch_size=0)[source]

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)

Unrolls an RNN cell across time steps.

Parameters
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.

Returns

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class RNN(hidden_size, num_layers=1, activation='relu', layout='TNC', dropout=0, bidirectional=False, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, dtype='float32', **kwargs)[source]

Bases: mxnet.gluon.rnn.rnn_layer._RNNLayer

Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.

For each element in the input sequence, each layer computes the following function:

\[h_t = \tanh(w_{ih} * x_t + b_{ih} + w_{hh} * h_{(t-1)} + b_{hh})\]

where \(h_t\) is the hidden state at time t, and \(x_t\) is the output of the previous layer at time t or \(input_t\) for the first layer. If activation=’relu’, then ReLU is used instead of tanh.

Parameters
  • hidden_size (int) – The number of features in the hidden state h.

  • num_layers (int, default 1) – Number of recurrent layers.

  • activation ({'relu' or 'tanh'}, default 'relu') – The activation function to use.

  • layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.

  • dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer.

  • bidirectional (bool, default False) – If True, becomes a bidirectional RNN.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.

  • i2h_bias_initializer (str or Initializer) – Initializer for the bias vector.

  • h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.

  • input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.

  • dtype (str, default 'float32') – Type to initialize the parameters and default states to

Inputs:
  • data: input tensor with shape (sequence_length, batch_size, input_size) when layout is “TNC”. For other layouts, dimensions are permuted accordingly using transpose() operator which adds performance overhead. Consider creating batches in TNC layout during data batching step.

  • states: initial recurrent state tensor with shape (num_layers, batch_size, num_hidden). If bidirectional is True, shape will instead be (2*num_layers, batch_size, num_hidden). If states is None, zeros will be used as default begin states.

Outputs:
  • out: output tensor with shape (sequence_length, batch_size, num_hidden) when layout is “TNC”. If bidirectional is True, output shape will instead be (sequence_length, batch_size, 2*num_hidden)

  • out_states: output recurrent state tensor with the same shape as states. If states is None out_states will not be returned.

Examples

>>> layer = mx.gluon.rnn.RNN(100, 3)
>>> layer.initialize()
>>> input = mx.np.random.uniform(size=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.np.random.uniform(size=(3, 3, 100))
>>> output, hn = layer(input, h0)
class RNNCell(hidden_size, activation='tanh', i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0)[source]

Bases: mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell

Elman RNN recurrent neural network cell.

Each call computes the following function:

\[h_t = \tanh(w_{ih} * x_t + b_{ih} + w_{hh} * h_{(t-1)} + b_{hh})\]

Methods

apply(fn)

Applies fn recursively to every child block as well as self.

begin_state([batch_size, func])

Initial state for this cell.

cast(dtype)

Cast this Block to use another data type.

collect_params([select])

Returns a Dict containing this Block’s and all of its children’s Parameters (by default). It can also return a Dict containing only the Parameters that match the given regular expressions.

export(path[, epoch, remove_amp_cast])

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

forward(inputs, states)

Unrolls the recurrent cell for one time step.

hybridize([active])

Please refer to the description of HybridBlock.hybridize().

infer_shape(i, x, is_bidirect)

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize([init, device, verbose, force_reinit])

Initializes Parameters of this Block and its children.

load(prefix)

Load a model saved using the save API

load_dict(param_dict[, device, …])

Load parameters from dict

load_parameters(filename[, device, …])

Load parameters from file previously saved by save_parameters.

optimize_for(x, *args[, backend, clear, …])

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.

register_child(block[, name])

Registers block as a child of self.

register_forward_hook(hook)

Registers a forward hook on the block.

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

register_op_hook(callback[, monitor_all])

Install callback monitor.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated.

reset_device(device)

Re-assign all Parameters to other devices.

save(prefix)

Save the model architecture and parameters to load again later

save_parameters(filename[, deduplicate])

Save parameters to file.

setattr(name, value)

Set an attribute to a new value for all Parameters.

share_parameters(shared)

Share parameters recursively inside the model.

state_info([batch_size])

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

unroll(length, inputs[, begin_state, …])

Unrolls an RNN cell across time steps.

zero_grad()

Sets all Parameters’ gradient buffer to 0.

Attributes

params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

where \(h_t\) is the hidden state at time t, and \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer. If activation=’relu’, then ReLU is used instead of tanh.

Parameters
  • hidden_size (int) – Number of units in output symbol

  • activation (str or Symbol, default 'tanh') – Type of activation function.

  • i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.

  • h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.

  • i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.

  • h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.

  • input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.

Inputs:
  • data: input tensor with shape (batch_size, input_size).

  • states: a list of one initial recurrent state tensor with shape (batch_size, num_hidden).

Outputs:
  • out: output tensor with shape (batch_size, num_hidden).

  • next_states: a list of one output recurrent state tensor with the same shape as states.
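
A minimal sketch of a single step (the sizes are arbitrary; the initial state is created explicitly here):

import mxnet as mx
from mxnet.gluon import rnn

# a minimal sketch: one step of an RNNCell on a random batch
cell = rnn.RNNCell(hidden_size=100, input_size=50)
cell.initialize()

x = mx.np.random.uniform(size=(32, 50))     # (batch_size, input_size)
states = [mx.np.zeros((32, 100))]            # one state of shape (batch_size, num_hidden)
output, next_states = cell(x, states)        # output: (32, 100)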

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters

fn (callable) – Function to be applied to each submodule, of form fn(block).

Returns

Return type

this block

begin_state(batch_size=0, func=<function zeros>, **kwargs)

Initial state for this cell.

Parameters
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns

states – Starting states for the first RNN step.

Return type

nested list of Symbol

cast(dtype)

Cast this Block to use another data type.

Parameters

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block’s and all of its children’s Parameters (by default). It can also return a Dict containing only the Parameters that match the given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’ using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters

select (str) – regular expressions

Returns

Return type

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there is only one input, it will be named data. When there are multiple inputs, they will be named data0, data1, etc.

Parameters
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state()

This function can provide the states for the first time step.

unroll()

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer to the description of HybridBlock.hybridize().

infer_shape(i, x, is_bidirect)[source]

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameters of this Block and its children.

Parameters
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It restores each Block’s parameter UUIDs to what they were when saved, in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order as when the model was saved. This is because each Block is uniquely identified by its class name and a unique ID assigned in creation order (since children are stored in an OrderedDict), and that unique ID is used to denote the specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – Must be in {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – Must be in {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize; afterwards, export can be called or inference can be run. See example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')

# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)

Parameters
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – clears any previous optimizations

  • partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic shape op exists.

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)

Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters

device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since it is an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to 'null' if you don’t need gradients w.r.t. a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which is equivalent to:

dense1.weight = dense0.weight
dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters

shared (Dict) – Dict of the shared parameters.

Returns

Return type

this block

state_info(batch_size=0)[source]

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)

Unrolls an RNN cell across time steps.

Parameters
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.

Returns

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class RecurrentCell[source]

Bases: mxnet.gluon.block.Block

Abstract base class for RNN cells

Methods

apply(fn)

Applies fn recursively to every child block as well as self.

begin_state([batch_size, func])

Initial state for this cell.

cast(dtype)

Cast this Block to use another data type.

collect_params([select])

Returns a Dict containing this Block’s and all of its children’s Parameters (by default). It can also return a Dict containing only the Parameters that match the given regular expressions.

forward(inputs, states)

Unrolls the recurrent cell for one time step.

hybridize([active])

Please refer to the description of HybridBlock.hybridize().

initialize([init, device, verbose, force_reinit])

Initializes Parameters of this Block and its children.

load(prefix)

Load a model saved using the save API

load_dict(param_dict[, device, …])

Load parameters from dict

load_parameters(filename[, device, …])

Load parameters from file previously saved by save_parameters.

register_child(block[, name])

Registers block as a child of self.

register_forward_hook(hook)

Registers a forward hook on the block.

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

register_op_hook(callback[, monitor_all])

Install callback monitor.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated.

reset_device(device)

Re-assign all Parameters to other devices.

save(prefix)

Save the model architecture and parameters to load again later

save_parameters(filename[, deduplicate])

Save parameters to file.

setattr(name, value)

Set an attribute to a new value for all Parameters.

share_parameters(shared)

Share parameters recursively inside the model.

state_info([batch_size])

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

unroll(length, inputs[, begin_state, …])

Unrolls an RNN cell across time steps.

zero_grad()

Sets all Parameters’ gradient buffer to 0.

Attributes

params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters

fn (callable) – Function to be applied to each submodule, of form fn(block).

Returns

Return type

this block

begin_state(batch_size=0, func=<function zeros>, **kwargs)[source]

Initial state for this cell.

Parameters
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns

states – Starting states for the first RNN step.

Return type

nested list of Symbol

cast(dtype)

Cast this Block to use another data type.

Parameters

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block’s and all of its children’s Parameters (by default). It can also return a Dict containing only the Parameters that match the given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’ using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters

select (str) – regular expressions

Returns

Return type

The selected Dict

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state()

This function can provide the states for the first time step.

unroll()

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer to the description of HybridBlock.hybridize().

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameters of this Block and its children.

Parameters
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It restores each Block’s parameter UUIDs to what they were when saved, in order to match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order as when the model was saved. This is because each Block is uniquely identified by its class name and a unique ID assigned in creation order (since children are stored in an OrderedDict), and that unique ID is used to denote the specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – Must be in {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – Must be in {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

References

Saving and Loading Gluon Models

property params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)

Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()[source]

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters

device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (since it is an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to 'null' if you don’t need gradients w.r.t. a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which is equivalent to:

dense1.weight = dense0.weight
dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters

shared (Dict) – Dict of the shared parameters.

Returns

Return type

this block

state_info(batch_size=0)[source]

shape and layout information of states

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]

Unrolls an RNN cell across time steps.

Parameters
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.

Returns

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().
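For illustration, a minimal imperative sketch of unroll; the concrete LSTMCell, MXNet 2.x NumPy-style arrays via mx.np, and the sizes below are assumptions:

import mxnet as mx
from mxnet.gluon import rnn

cell = rnn.LSTMCell(hidden_size=20)
cell.initialize()

batch_size, seq_len, input_dim = 4, 10, 8
# 'NTC' layout: (batch, time, channel)
x = mx.np.random.uniform(size=(batch_size, seq_len, input_dim))

outputs, states = cell.unroll(seq_len, x, layout='NTC', merge_outputs=True)
# outputs: shape (4, 10, 20); states: [h, c], each of shape (4, 20)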

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class ResidualCell(base_cell)[source]

Bases: mxnet.gluon.rnn.rnn_cell.ModifierCell

Adds a residual connection, as described in Wu et al., 2016 (https://arxiv.org/abs/1609.08144). The output of the cell is the output of the base cell plus the cell’s input.
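A minimal usage sketch; the LSTMCell base cell and the sizes below are assumptions for illustration. Note that the residual addition requires the input feature size to match the base cell’s hidden_size:

import mxnet as mx
from mxnet.gluon import rnn

base = rnn.LSTMCell(hidden_size=16)
cell = rnn.ResidualCell(base)        # output = base cell output + input
cell.initialize()

# input feature size equals hidden_size so the shapes match for the addition
x = mx.np.random.uniform(size=(2, 5, 16))
outputs, states = cell.unroll(5, x, layout='NTC', merge_outputs=True)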

Methods

apply(fn)

Applies fn recursively to every child block as well as self.

cast(dtype)

Cast this Block to use another data type.

collect_params([select])

Returns a Dict containing this Block’s and all of its children’s Parameters (default), or only the Parameters whose names match the given regular expressions.

export(path[, epoch, remove_amp_cast])

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

forward(inputs, states)

Unrolls the recurrent cell for one time step.

hybridize([active])

Please refer to the description of HybridBlock.hybridize().

infer_shape(i, x, is_bidirect)

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize([init, device, verbose, force_reinit])

Initializes Parameters of this Block and its children.

load(prefix)

Load a model saved using the save API

load_dict(param_dict[, device, …])

Load parameters from dict

load_parameters(filename[, device, …])

Load parameters from file previously saved by save_parameters.

optimize_for(x, *args[, backend, clear, …])

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.

register_child(block[, name])

Registers block as a child of self.

register_forward_hook(hook)

Registers a forward hook on the block.

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

register_op_hook(callback[, monitor_all])

Install callback monitor.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated.

reset_device(device)

Re-assign all Parameters to other devices.

save(prefix)

Save the model architecture and parameters to load again later

save_parameters(filename[, deduplicate])

Save parameters to file.

setattr(name, value)

Set an attribute to a new value for all Parameters.

share_parameters(shared)

Share parameters recursively inside the model.

summary(*inputs)

Print the summary of the model’s output and parameters.

unroll(length, inputs[, begin_state, …])

Unrolls an RNN cell across time steps.

zero_grad()

Sets all Parameters’ gradient buffer to 0.

Attributes

params

Return an attribute of instance, which is of type owner.

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters

fn (callable) – Function to be applied to each submodule, of form fn(block).

Returns

Return type

this block

cast(dtype)

Cast this Block to use another data type.

Parameters

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block’s and all of its children’s Parameters (default), or only the Parameters whose names match the given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters

select (str) – regular expressions

Returns

Return type

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there is only one input, it will be named data. When there is more than one input, they will be named data0, data1, etc.

Parameters
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.
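A hedged usage sketch; the block and file names below are placeholders, and the block must be hybridized and run once so the cached graph exists before exporting:

import mxnet as mx
from mxnet.gluon import nn

net = nn.Dense(10)                   # placeholder HybridBlock
net.initialize()
net.hybridize()
net(mx.np.zeros((1, 4)))             # one forward pass builds the cached graph

net.export('my_model', epoch=0)      # writes my_model-symbol.json and my_model-0000.params

# reload later without the original Python class
net2 = mx.gluon.SymbolBlock.imports('my_model-symbol.json', ['data'],
                                    'my_model-0000.params')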

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state()

This function can provide the states for the first time step.

unroll()

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer to the description of HybridBlock.hybridize().

infer_shape(i, x, is_bidirect)[source]

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameters of this Block and its children.

Parameters
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs to the values they had when saved, so that they match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. Each Block is uniquely identified by its class name and a unique ID assigned in creation order (children are stored in an OrderedDict), and that unique ID is used to identify the specific Block.

Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
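A hedged round-trip sketch; build_net and x below are placeholders for whatever deterministic construction code and input produced the saved model:

net = build_net()                # must create the Blocks in the same order every run
net.initialize()
net(x)                           # placeholder forward pass so parameter shapes exist
net.save('checkpoint')           # writes checkpoint-model.json and checkpoint-model.params

restored = build_net()           # identical construction order
restored.load('checkpoint')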

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – Must be in {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – Must be in {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. It combines the work done in the hybridize API with part of the work done in the forward pass, without calling the CachedOp. It can be used in place of hybridize; afterwards, export can be called or inference can be run. See example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')

# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)

Parameters
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – clears any previous optimizations

  • partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic shape op exists.

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Return an attribute of instance, which is of type owner.

register_child(block, name=None)

Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle
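A small sketch of a hook that only inspects output shapes; net and x are placeholders for an existing block and its input:

def log_output(block, inputs, output):
    # called right after block's forward(); must not modify inputs or output
    print(type(block).__name__, getattr(output, 'shape', None))

handle = net.register_forward_hook(log_output)
net(x)            # the hook fires on every forward call
handle.detach()   # remove the hook when it is no longer needed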

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
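A minimal callback sketch, assuming a hybridized block net and an input x (both placeholders):

def monitor(name, op_name, tensor):
    # name: tensor being inspected, op_name: operator producing or consuming it
    print(name, op_name, tensor.shape)

net.hybridize()
net.register_op_hook(monitor, monitor_all=True)   # monitor both inputs and outputs
net(x)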

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters

device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (they are stored in an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models
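A hedged sketch of the parameters-only round trip; net, build_net, and the file name are placeholders:

net.save_parameters('net.params')     # weights only, no architecture

# later: rebuild the same architecture, then load the weights
net2 = build_net()                    # placeholder constructor
net2.load_parameters('net.params')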

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which is equivalent to:

dense1.weight = dense0.weight
dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters

shared (Dict) – Dict of the shared parameters.

Returns

Return type

this block

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]

Unrolls an RNN cell across time steps.

Parameters
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.

Returns

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class SequentialRNNCell[source]

Bases: mxnet.gluon.rnn.rnn_cell.RecurrentCell

Sequentially stacking multiple RNN cells.
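A minimal stacking sketch, assuming two LSTMCells and MXNet 2.x NumPy-style arrays:

import mxnet as mx
from mxnet.gluon import rnn

stack = rnn.SequentialRNNCell()
stack.add(rnn.LSTMCell(hidden_size=32))
stack.add(rnn.LSTMCell(hidden_size=32))
stack.initialize()

x = mx.np.random.uniform(size=(4, 10, 16))
outputs, states = stack.unroll(10, x, layout='NTC', merge_outputs=True)
# states is a flat list with the states of each stacked cell, in add() order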

Methods

add(cell)

Appends a cell into the stack.

apply(fn)

Applies fn recursively to every child block as well as self.

begin_state(**kwargs)

Initial state for this cell.

cast(dtype)

Cast this Block to use another data type.

collect_params([select])

Returns a Dict containing this Block’s and all of its children’s Parameters (default), or only the Parameters whose names match the given regular expressions.

forward(*args, **kwargs)

Unrolls the recurrent cell for one time step.

hybridize([active])

Please refer to the description of HybridBlock.hybridize().

initialize([init, device, verbose, force_reinit])

Initializes Parameters of this Block and its children.

load(prefix)

Load a model saved using the save API

load_dict(param_dict[, device, …])

Load parameters from dict

load_parameters(filename[, device, …])

Load parameters from file previously saved by save_parameters.

register_child(block[, name])

Registers block as a child of self.

register_forward_hook(hook)

Registers a forward hook on the block.

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

register_op_hook(callback[, monitor_all])

Install callback monitor.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated.

reset_device(device)

Re-assign all Parameters to other devices.

save(prefix)

Save the model architecture and parameters to load again later

save_parameters(filename[, deduplicate])

Save parameters to file.

setattr(name, value)

Set an attribute to a new value for all Parameters.

share_parameters(shared)

Share parameters recursively inside the model.

state_info([batch_size])

Shape and layout information of states.

summary(*inputs)

Print the summary of the model’s output and parameters.

unroll(length, inputs[, begin_state, …])

Unrolls an RNN cell across time steps.

zero_grad()

Sets all Parameters’ gradient buffer to 0.

Attributes

params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

add(cell)[source]

Appends a cell into the stack.

Parameters

cell (RecurrentCell) – The cell to add.

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters

fn (callable) – Function to be applied to each submodule, of form fn(block).

Returns

Return type

this block

begin_state(**kwargs)[source]

Initial state for this cell.

Parameters
  • func (callable, default symbol.zeros) –

    Function for creating initial state.

    For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.

    For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.

  • batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.

  • **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.

Returns

states – Starting states for the first RNN step.

Return type

nested list of Symbol
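For manual stepping, a hedged sketch that obtains zero-initialized states and feeds them through a single time step; the LSTMCell and sizes below are assumptions:

import mxnet as mx
from mxnet.gluon import rnn

cell = rnn.LSTMCell(hidden_size=8)
cell.initialize()

states = cell.begin_state(batch_size=4)    # zero-initialized by default
x_t = mx.np.random.uniform(size=(4, 8))    # one time step of input
output, states = cell(x_t, states)         # states feed the next step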

cast(dtype)

Cast this Block to use another data type.

Parameters

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block’s and all of its children’s Parameters (default), or only the Parameters whose names match the given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters

select (str) – regular expressions

Returns

Return type

The selected Dict

forward(*args, **kwargs)[source]

Unrolls the recurrent cell for one time step.

Parameters
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state()

This function can provide the states for the first time step.

unroll()

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer to the description of HybridBlock.hybridize().

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameters of this Block and its children.

Parameters
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs to the values they had when saved, so that they match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. Each Block is uniquely identified by its class name and a unique ID assigned in creation order (children are stored in an OrderedDict), and that unique ID is used to identify the specific Block.

Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – Must be in {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – Must be in {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

References

Saving and Loading Gluon Models

property params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

register_child(block, name=None)

Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters

device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (they are stored in an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which is equivalent to:

dense1.weight = dense0.weight
dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters

shared (Dict) – Dict of the shared parameters.

Returns

Return type

this block

state_info(batch_size=0)[source]

Shape and layout information of states.

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]

Unrolls an RNN cell across time steps.

Parameters
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.

Returns

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class VariationalDropoutCell(base_cell, drop_inputs=0.0, drop_states=0.0, drop_outputs=0.0)[source]

Bases: mxnet.gluon.rnn.rnn_cell.ModifierCell

Applies Variational Dropout on the base cell (https://arxiv.org/pdf/1512.05287.pdf).

Variational dropout uses the same dropout mask across time steps. It can be applied to RNN inputs, outputs, and states; the masks for each of them are not shared.

The dropout mask is initialized when stepping forward for the first time and remains the same until .reset() is called. Thus, if you use the cell and step manually without calling .unroll(), you should call .reset() after each sequence, as in the sketch after the parameter list below.

Parameters
  • base_cell (RecurrentCell) – The cell on which to perform variational dropout.

  • drop_inputs (float, default 0.) – The dropout rate for inputs. Won’t apply dropout if it equals 0.

  • drop_states (float, default 0.) – The dropout rate for state inputs on the first state channel. Won’t apply dropout if it equals 0.

  • drop_outputs (float, default 0.) – The dropout rate for outputs. Won’t apply dropout if it equals 0.
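A hedged sketch of manual stepping with mask re-sampling between sequences; the base LSTMCell, rates, and sizes are assumptions, and dropout is only applied in training mode (here under autograd.record()):

import mxnet as mx
from mxnet.gluon import rnn

base = rnn.LSTMCell(hidden_size=16)
cell = rnn.VariationalDropoutCell(base, drop_inputs=0.2, drop_states=0.2)
cell.initialize()

sequences = [mx.np.random.uniform(size=(4, 10, 16)) for _ in range(3)]
for seq in sequences:
    states = cell.begin_state(batch_size=4)
    with mx.autograd.record():                  # dropout is applied in training mode
        for t in range(seq.shape[1]):
            output, states = cell(seq[:, t, :], states)
    cell.reset()                                # re-sample dropout masks for the next sequence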

Methods

apply(fn)

Applies fn recursively to every child block as well as self.

cast(dtype)

Cast this Block to use another data type.

collect_params([select])

Returns a Dict containing this Block’s and all of its children’s Parameters (default), or only the Parameters whose names match the given regular expressions.

export(path[, epoch, remove_amp_cast])

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

forward(inputs, states)

Unrolls the recurrent cell for one time step.

hybridize([active])

Please refer to the description of HybridBlock.hybridize().

infer_shape(i, x, is_bidirect)

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize([init, device, verbose, force_reinit])

Initializes Parameters of this Block and its children.

load(prefix)

Load a model saved using the save API

load_dict(param_dict[, device, …])

Load parameters from dict

load_parameters(filename[, device, …])

Load parameters from file previously saved by save_parameters.

optimize_for(x, *args[, backend, clear, …])

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.

register_child(block[, name])

Registers block as a child of self.

register_forward_hook(hook)

Registers a forward hook on the block.

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

register_op_hook(callback[, monitor_all])

Install callback monitor.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated.

reset_device(device)

Re-assign all Parameters to other devices.

save(prefix)

Save the model architecture and parameters to load again later

save_parameters(filename[, deduplicate])

Save parameters to file.

setattr(name, value)

Set an attribute to a new value for all Parameters.

share_parameters(shared)

Share parameters recursively inside the model.

summary(*inputs)

Print the summary of the model’s output and parameters.

unroll(length, inputs[, begin_state, …])

Unrolls an RNN cell across time steps.

zero_grad()

Sets all Parameters’ gradient buffer to 0.

Attributes

params

Return an attribute of instance, which is of type owner.

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters

fn (callable) – Function to be applied to each submodule, of form fn(block).

Returns

Return type

this block

cast(dtype)

Cast this Block to use another data type.

Parameters

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block’s and all of its children’s Parameters (default), or only the Parameters whose names match the given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters

select (str) – regular expressions

Returns

Return type

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there is only one input, it will be named data. When there is more than one input, they will be named data0, data1, etc.

Parameters
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state()

This function can provide the states for the first time step.

unroll()

This function unrolls an RNN for a given number of (>=1) time steps.

hybridize(active=True, **kwargs)

Please refer to the description of HybridBlock.hybridize().

infer_shape(i, x, is_bidirect)[source]

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes Parameters of this Block and its children.

Parameters
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs to the values they had when saved, so that they match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order they were when the model was saved. Each Block is uniquely identified by its class name and a unique ID assigned in creation order (children are stored in an OrderedDict), and that unique ID is used to identify the specific Block.

Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – Must be in {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – Must be in {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

References

Saving and Loading Gluon Models

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. It combines the work done in the hybridize API with part of the work done in the forward pass, without calling the CachedOp. It can be used in place of hybridize; afterwards, export can be called or inference can be run. See example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')

# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)

Parameters
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – clears any previous optimizations

  • partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic shape op exists.

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Return an attribute of instance, which is of type owner.

register_child(block, name=None)

Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.

reset()[source]

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters

device (Device or list of Device, default device.current_device().) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters to load again later

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block’s children in order (they are stored in an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which is equivalent to:

dense1.weight = dense0.weight
dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters

shared (Dict) – Dict of the shared parameters.

Returns

Return type

this block

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]

Unrolls an RNN cell across time steps.

Parameters
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.

Returns

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().

zero_grad()

Sets all Parameters’ gradient buffer to 0.

class ZoneoutCell(base_cell, zoneout_outputs=0.0, zoneout_states=0.0)[source]

Bases: mxnet.gluon.rnn.rnn_cell.ModifierCell

Applies Zoneout on base cell.
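A minimal wrapping sketch; the LSTMCell base cell, zoneout rates, and sizes are assumptions for illustration:

import mxnet as mx
from mxnet.gluon import rnn

base = rnn.LSTMCell(hidden_size=16)
cell = rnn.ZoneoutCell(base, zoneout_outputs=0.1, zoneout_states=0.1)
cell.initialize()

x = mx.np.random.uniform(size=(4, 10, 16))
outputs, states = cell.unroll(10, x, layout='NTC', merge_outputs=True)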

Methods

apply(fn)

Applies fn recursively to every child block as well as self.

cast(dtype)

Cast this Block to use another data type.

collect_params([select])

Returns a Dict containing this Block’s and all of its children’s Parameters (default), or only the Parameters whose names match the given regular expressions.

export(path[, epoch, remove_amp_cast])

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

forward(inputs, states)

Unrolls the recurrent cell for one time step.

hybridize([active])

Please refer to the description of HybridBlock.hybridize().

infer_shape(i, x, is_bidirect)

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize([init, device, verbose, force_reinit])

Initializes Parameters of this Block and its children.

load(prefix)

Load a model saved using the save API

load_dict(param_dict[, device, …])

Load parameters from dict

load_parameters(filename[, device, …])

Load parameters from file previously saved by save_parameters.

optimize_for(x, *args[, backend, clear, …])

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.

register_child(block[, name])

Registers block as a child of self.

register_forward_hook(hook)

Registers a forward hook on the block.

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

register_op_hook(callback[, monitor_all])

Install callback monitor.

reset()

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated.

reset_device(device)

Re-assign all Parameters to other devices.

save(prefix)

Save the model architecture and parameters to load again later

save_parameters(filename[, deduplicate])

Save parameters to file.

setattr(name, value)

Set an attribute to a new value for all Parameters.

share_parameters(shared)

Share parameters recursively inside the model.

summary(*inputs)

Print the summary of the model’s output and parameters.

unroll(length, inputs[, begin_state, …])

Unrolls an RNN cell across time steps.

zero_grad()

Sets all Parameters’ gradient buffer to 0.

Attributes

params

Return an attribute of instance, which is of type owner.

apply(fn)

Applies fn recursively to every child block as well as self.

Parameters

fn (callable) – Function to be applied to each submodule, of form fn(block).

Returns

Return type

this block

cast(dtype)

Cast this Block to use another data type.

Parameters

dtype (str or numpy.dtype) – The new data type.

collect_params(select=None)

Returns a Dict containing this Block’s and all of its children’s Parameters (default), or only the Parameters whose names match the given regular expressions.

For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:

model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')

or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters

select (str) – regular expressions

Returns

Return type

The selected Dict

export(path, epoch=0, remove_amp_cast=True)

Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.

Note

When there is only one input, it will be named data. When there is more than one input, they will be named data0, data1, etc.

Parameters
  • path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.

  • epoch (int) – Epoch number of saved model.

  • remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.

Returns

  • symbol_filename (str) – Filename to which model symbols were saved, including path prefix.

  • params_filename (str) – Filename to which model parameters were saved, including path prefix.

forward(inputs, states)[source]

Unrolls the recurrent cell for one time step.

Parameters
  • inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).

  • states (list of sym.Variable) – RNN state from previous step or the output of begin_state().

Returns

  • output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is the same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.

See also

begin_state()

This function can provide the states for the first time step.

unroll()

This function unrolls an RNN for a given number of (>=1) time steps.
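For example, a minimal sketch of a single forward() step with an LSTMCell, using begin_state() for the initial states (shapes and names are illustrative):

import mxnet as mx
from mxnet.gluon import rnn

cell = rnn.LSTMCell(hidden_size=20)
cell.initialize()
inputs = mx.np.random.uniform(size=(4, 10))   # (batch_size, num_units)
states = cell.begin_state(batch_size=4)       # initial [h, c] for the first time step
output, new_states = cell(inputs, states)     # one unrolled time step
print(output.shape)                           # (4, 20)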

hybridize(active=True, **kwargs)

Please refer to the description of HybridBlock.hybridize().

infer_shape(i, x, is_bidirect)[source]

Infers shape of Parameters from inputs.

infer_type(*args)

Infers data type of Parameters from inputs.

initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)

Initializes the Parameters of this Block and its children.

Parameters
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
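A minimal sketch of initialization with an explicit Initializer and device (the Xavier initializer and CPU device here are illustrative choices):

import mxnet as mx
from mxnet.gluon import rnn

net = rnn.LSTM(hidden_size=100)
net.initialize(init=mx.init.Xavier(), device=mx.cpu())
# net.initialize(force_reinit=True) would re-initialize already-initialized Parameters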

load(prefix)

Load a model saved using the save API

Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block's parameter UUIDs to their saved values so that they match the names of the saved parameters.

This function assumes the Blocks in the model were created in the same order as when the model was saved. Each Block is uniquely identified by its class name and a unique ID assigned in creation order (since children are stored in an OrderedDict), and that unique ID is used to denote the specific Block.

Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.

Parameters

prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
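A rough sketch of the save()/load() round trip under the assumptions above (the 'checkpoint' prefix and the shapes are illustrative):

import mxnet as mx
from mxnet.gluon import rnn

net = rnn.LSTM(hidden_size=100)
net.initialize()
net(mx.np.ones((5, 3, 10)))         # one forward pass so parameter shapes are concrete
net.save('checkpoint')              # writes checkpoint-model.json and checkpoint-model.params

net2 = rnn.LSTM(hidden_size=100)    # must be constructed exactly as the saved model was
net2.load('checkpoint')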

load_dict(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from dict

Parameters
  • param_dict (dict) – Dictionary containing model parameters

  • device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any

  • dtype_source (str, default 'current') – Must be in {'current', 'saved'}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

load_parameters(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')

Load parameters from file previously saved by save_parameters.

Parameters
  • filename (str) – Path to parameter file.

  • device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.

  • allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.

  • ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.

  • cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.

  • dtype_source (str, default 'current') – Must be in {'current', 'saved'}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.

References

Saving and Loading Gluon Models
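For example, a minimal parameter-only round trip (the filename and shapes are illustrative):

import mxnet as mx
from mxnet.gluon import rnn

net = rnn.GRU(hidden_size=50)
net.initialize()
net(mx.np.ones((8, 2, 16)))                           # forward once so parameter shapes are known
net.save_parameters('gru.params')

net2 = rnn.GRU(hidden_size=50)
net2.load_parameters('gru.params', device=mx.cpu())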

optimize_for(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)

Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.

Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass, without calling the CachedOp. Can be used in place of hybridize; afterwards, export can be called or inference can be run. See example/extensions/lib_subgraph/README.md for more details.

Examples

# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')

# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)

Parameters
  • x (NDArray) – first input to model

  • *args (NDArray) – other inputs to model

  • backend (str) – The name of the backend, as registered in SubgraphBackendRegistry; default None.

  • backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

  • clear (bool, default False) – clears any previous optimizations

  • partition_if_dynamic (bool, default True) – whether to partition the graph when a dynamic shape op exists

  • static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.

  • static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.

  • inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.

  • forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.

  • backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.

  • **kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty

property params

Dictionary of this Block's Parameters (does not include its children's Parameters).

register_child(block, name=None)

Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

register_forward_hook(hook)

Registers a forward hook on the block.

The hook function is called immediately after forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward hook function of form hook(block, input, output) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle
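As a sketch, a hook that logs the output shape after every forward call (the hook and variable names are illustrative):

import mxnet as mx
from mxnet.gluon import rnn

def log_output_shape(block, inputs, output):
    # called after forward(); must not modify inputs or output
    print(type(block).__name__, getattr(output, 'shape', None))

layer = rnn.LSTM(hidden_size=100)
layer.initialize()
handle = layer.register_forward_hook(log_output_shape)
layer(mx.np.ones((5, 3, 10)))   # hook fires after this forward pass
handle.detach()                 # remove the hook when it is no longer needed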

register_forward_pre_hook(hook)

Registers a forward pre-hook on the block.

The hook function is called immediately before forward(). It should not modify the input or output.

Parameters

hook (callable) – The forward pre-hook function of form hook(block, input) -> None.

Returns

Return type

mxnet.gluon.utils.HookHandle

register_op_hook(callback, monitor_all=False)

Install callback monitor.

Parameters
  • callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).

  • monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
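A rough sketch of an op monitor callback (the monitor function below is illustrative; it only prints tensor names and shapes):

import mxnet as mx
from mxnet.gluon import rnn

def monitor(tensor_name, op_name, tensor):
    # inspect intermediate outputs produced after hybridization
    print(tensor_name, op_name, tensor.shape)

net = rnn.LSTM(hidden_size=100)
net.initialize()
net.hybridize()
net.register_op_hook(monitor, monitor_all=True)
net(mx.np.ones((5, 3, 10)))   # callback fires for monitored tensors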

reset()[source]

Reset before re-using the cell for another graph.

reset_ctx(ctx)

This function has been deprecated. Please refer to Block.reset_device.

reset_device(device)

Re-assign all Parameters to other devices.

Parameters

device (Device or list of Device, default device.current_device()) – Assign Parameters to the given device. If device is a list of Device, a copy will be made for each device.

save(prefix)

Save the model architecture and parameters so they can be loaded again later.

Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.

Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.

Recursively traverses a Block's children in order (since children are stored in an OrderedDict) and uses the unique ID to denote that specific Block.

Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.

For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.

Parameters

prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params

save_parameters(filename, deduplicate=False)

Save parameters to file.

Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().

Parameters
  • filename (str) – Path to file.

  • deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.

References

Saving and Loading Gluon Models

setattr(name, value)

Set an attribute to a new value for all Parameters.

For example, set grad_req to null if you don't need gradients w.r.t. a model's Parameters:

model.setattr('grad_req', 'null')

or change the learning rate multiplier:

model.setattr('lr_mult', 0.5)
Parameters
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

share_parameters(shared)

Share parameters recursively inside the model.

For example, if you want dense1 to share dense0’s weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which is equivalent to:

dense1.weight = dense0.weight
dense1.bias = dense0.bias

Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.

Parameters

shared (Dict) – Dict of the shared parameters.

Returns

Return type

this block

summary(*inputs)

Print the summary of the model’s output and parameters.

The network must have been initialized, and must not have been hybridized.

Parameters

inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.
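For example, a brief sketch (the shapes are illustrative; the network is initialized but not hybridized):

import mxnet as mx
from mxnet.gluon import rnn

net = rnn.LSTM(hidden_size=100)
net.initialize()
net.summary(mx.np.ones((5, 3, 10)))   # prints per-layer output shapes and parameter counts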

unroll(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)

Unrolls an RNN cell across time steps.

Parameters
  • length (int) – Number of steps to unroll.

  • inputs (Symbol, list of Symbol, or None) –

    If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.

    If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).

  • begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.

  • layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.

  • merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.

  • valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.

Returns

  • outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.

  • states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is the same as the output of begin_state().
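A minimal sketch of unrolling an LSTMCell over 6 time steps (the length, shapes, and layout below are illustrative):

import mxnet as mx
from mxnet.gluon import rnn

cell = rnn.LSTMCell(hidden_size=20)
cell.initialize()
seq = mx.np.random.uniform(size=(4, 6, 10))    # (batch_size, length, features) for layout 'NTC'
outputs, states = cell.unroll(6, seq, layout='NTC', merge_outputs=True)
print(outputs.shape)                           # (4, 6, 20)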

zero_grad()

Sets all Parameters’ gradient buffer to 0.