gluon.rnn
Built-in recurrent neural network layers are provided in the following module:
mxnet.gluon.rnn: Recurrent neural network module.
Recurrent Cells
LSTMCell: Long Short-Term Memory (LSTM) network cell.
GRUCell: Gated Recurrent Unit (GRU) network cell.
RecurrentCell: Abstract base class for RNN cells.
LSTMPCell: Long Short-Term Memory Projected (LSTMP) network cell.
SequentialRNNCell: Sequentially stacking multiple RNN cells.
BidirectionalCell: Bidirectional RNN cell.
DropoutCell: Applies dropout on input.
VariationalDropoutCell: Applies Variational Dropout on base cell.
ZoneoutCell: Applies Zoneout on base cell.
ResidualCell: Adds residual connection as described in Wu et al., 2016 (https://arxiv.org/abs/1609.08144).
Convolutional Recurrent Cells
Conv1DLSTMCell: 1D Convolutional LSTM network cell.
Conv2DLSTMCell: 2D Convolutional LSTM network cell.
Conv3DLSTMCell: 3D Convolutional LSTM network cell.
Conv1DGRUCell: 1D Convolutional Gated Recurrent Unit (GRU) network cell.
Conv2DGRUCell: 2D Convolutional Gated Recurrent Unit (GRU) network cell.
Conv3DGRUCell: 3D Convolutional Gated Recurrent Unit (GRU) network cell.
Conv1DRNNCell: 1D Convolutional RNN cell.
Conv2DRNNCell: 2D Convolutional RNN cell.
Conv3DRNNCell: 3D Convolutional RNN cell.
Recurrent Layers
RNN: Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.
LSTM: Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.
GRU: Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence.
API Reference
Recurrent neural network module.
Classes
BidirectionalCell: Bidirectional RNN cell.
Conv1DGRUCell: 1D Convolutional Gated Recurrent Unit (GRU) network cell.
Conv1DLSTMCell: 1D Convolutional LSTM network cell.
Conv1DRNNCell: 1D Convolutional RNN cell.
Conv2DGRUCell: 2D Convolutional Gated Recurrent Unit (GRU) network cell.
Conv2DLSTMCell: 2D Convolutional LSTM network cell.
Conv2DRNNCell: 2D Convolutional RNN cell.
Conv3DGRUCell: 3D Convolutional Gated Recurrent Unit (GRU) network cell.
Conv3DLSTMCell: 3D Convolutional LSTM network cell.
Conv3DRNNCell: 3D Convolutional RNN cell.
DropoutCell: Applies dropout on input.
GRU: Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence.
GRUCell: Gated Recurrent Unit (GRU) network cell.
HybridRecurrentCell: HybridRecurrentCell supports hybridize.
HybridSequentialRNNCell: Sequentially stacking multiple HybridRNN cells.
LSTM: Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.
LSTMCell: Long Short-Term Memory (LSTM) network cell.
LSTMPCell: Long Short-Term Memory Projected (LSTMP) network cell.
ModifierCell: Base class for modifier cells.
RNN: Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.
RNNCell: Elman RNN recurrent neural network cell.
RecurrentCell: Abstract base class for RNN cells.
ResidualCell: Adds residual connection as described in Wu et al., 2016 (https://arxiv.org/abs/1609.08144).
SequentialRNNCell: Sequentially stacking multiple RNN cells.
VariationalDropoutCell: Applies Variational Dropout on base cell.
ZoneoutCell: Applies Zoneout on base cell.
-
class BidirectionalCell(l_cell, r_cell)
Bases: mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell
Bidirectional RNN cell.
- Parameters
l_cell (RecurrentCell) – Cell for forward unrolling
r_cell (RecurrentCell) – Cell for backward unrolling
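A minimal usage sketch (assuming MXNet 2.x with the NumPy-style mx.np array API; layer sizes are illustrative). A BidirectionalCell is typically driven through unroll rather than stepped one time step at a time:
import mxnet as mx
from mxnet.gluon import rnn

# Forward and backward LSTM cells wrapped into one bidirectional cell.
bi_cell = rnn.BidirectionalCell(rnn.LSTMCell(100), rnn.LSTMCell(100))
bi_cell.initialize()

x = mx.np.random.uniform(size=(32, 5, 50))   # (batch, time, feature), layout 'NTC'
outputs, states = bi_cell.unroll(5, x, layout='NTC', merge_outputs=True)
# outputs is expected to have shape (32, 5, 200): the two directions are concatenated.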
Methods
apply(fn): Applies fn recursively to every child block as well as self.
begin_state(**kwargs): Initial state for this cell.
cast(dtype): Cast this Block to use another data type.
collect_params([select]): Returns a Dict containing this Block's and all of its children's Parameters (by default), or only those matching the given regular expressions.
export(path[, epoch, remove_amp_cast]): Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
hybridize([active]): Please refer to the description of HybridBlock.hybridize().
infer_shape(i, x, is_bidirect): Infers shape of Parameters from inputs.
infer_type(*args): Infers data type of Parameters from inputs.
initialize([init, device, verbose, force_reinit]): Initializes Parameters of this Block and its children.
load(prefix): Load a model saved using the save API.
load_dict(param_dict[, device, …]): Load parameters from dict.
load_parameters(filename[, device, …]): Load parameters from file previously saved by save_parameters.
optimize_for(x, *args[, backend, clear, …]): Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_child(block[, name]): Registers block as a child of self.
register_forward_hook(hook): Registers a forward hook on the block.
register_forward_pre_hook(hook): Registers a forward pre-hook on the block.
register_op_hook(callback[, monitor_all]): Install callback monitor.
reset(): Reset before re-using the cell for another graph.
reset_ctx(ctx): This function has been deprecated.
reset_device(device): Re-assign all Parameters to other devices.
save(prefix): Save the model architecture and parameters to load again later.
save_parameters(filename[, deduplicate]): Save parameters to file.
setattr(name, value): Set an attribute to a new value for all Parameters.
share_parameters(shared): Share parameters recursively inside the model.
state_info([batch_size]): Shape and layout information of states.
summary(*inputs): Print the summary of the model's output and parameters.
unroll(length, inputs[, begin_state, …]): Unrolls an RNN cell across time steps.
zero_grad(): Sets all Parameters' gradient buffer to 0.
Attributes
params: Returns this Block's parameter dictionary (does not include its children's parameters).
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
begin_state
(**kwargs)[source]¶ Initial state for this cell.
- Parameters
func (callable, default symbol.zeros) –
Function for creating initial state.
For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.
For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.
batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.
**kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.
- Returns
states – Starting states for the first RNN step.
- Return type
nested list of Symbol
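For example, explicit starting states can be created with begin_state and passed to unroll (a sketch assuming MXNet 2.x and the mx.np array API; sizes are illustrative):
import mxnet as mx
from mxnet.gluon import rnn

bi_cell = rnn.BidirectionalCell(rnn.GRUCell(16), rnn.GRUCell(16))
bi_cell.initialize()

# Zero-filled starting states for a batch of 4 sequences (one set per direction).
states = bi_cell.begin_state(batch_size=4)
x = mx.np.random.uniform(size=(4, 6, 8))     # (batch, time, feature), layout 'NTC'
outputs, states = bi_cell.unroll(6, x, begin_state=states, layout='NTC', merge_outputs=True)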
-
cast
(dtype)¶ Cast this Block to use another data type.
- Parameters
dtype (str or numpy.dtype) – The new data type.
-
collect_params(select=None)
Returns a Dict containing this Block's and all of its children's Parameters (by default), or only those Parameters whose names match the given regular expressions.
For example, collect the specified parameters in ['conv1.weight', 'conv1.bias', 'fc.weight', 'fc.bias']:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there is only one input, it will be named data. When there is more than one input, they will be named data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
hybridize
(active=True, **kwargs) Please refer to the description of HybridBlock.hybridize().
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)
Initializes Parameters of this Block and its children.
- Parameters
init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.
device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture; it resets each Block's parameter UUIDs to the values they had when saved, so that they match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order as when the model was saved: each Block is uniquely identified by its class name and a unique ID assigned in creation order (children are stored in an OrderedDict), and that unique ID is used to denote the specific Block.
Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize; afterwards, export can be called or inference can be run. See example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')
# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_child
(block, name=None)¶ Registers block as a child of self.
Block
s assigned to self as attributes will be registered automatically.
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install callback monitor.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset
()¶ Reset before re-using the cell for another graph.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
Block.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
-
share_parameters(shared)
Share parameters recursively inside the model.
For example, if you want dense1 to share dense0's weights, you can do:
dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which is equivalent to:
dense1.weight = dense0.weight
dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
unroll
(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]¶ Unrolls an RNN cell across time steps.
- Parameters
length (int) – Number of steps to unroll.
inputs (Symbol, list of Symbol, or None) –
If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.
If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).
begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.
layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.
merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.
valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.
- Returns
outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().
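For example, valid_length can be used to mask padded steps during unrolling (a sketch assuming MXNet 2.x and the mx.np array API; the cell and shapes are illustrative):
import mxnet as mx
from mxnet.gluon import rnn

cell = rnn.LSTMCell(32)
cell.initialize()

x = mx.np.random.uniform(size=(3, 10, 16))   # (batch, time, feature), layout 'NTC'
valid_len = mx.np.array([10, 7, 4])          # unpadded length of each sequence

# Outputs at padded steps are masked with 0, and the returned states
# correspond to the last valid step of each sequence.
outputs, states = cell.unroll(10, x, layout='NTC',
                              merge_outputs=True, valid_length=valid_len)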
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class Conv1DGRUCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0,), i2h_dilate=(1,), h2h_dilate=(1,), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCW', activation='tanh')
Bases: mxnet.gluon.rnn.conv_rnn_cell._ConvGRUCell
1D Convolutional Gated Recurrent Unit (GRU) network cell.
\[\begin{split}\begin{array}{ll} r_t = \sigma(W_r \ast x_t + R_r \ast h_{t-1} + b_r) \\ z_t = \sigma(W_z \ast x_t + R_z \ast h_{t-1} + b_z) \\ n_t = \tanh(W_i \ast x_t + b_i + r_t \circ (R_n \ast h_{t-1} + b_n)) \\ h^\prime_t = (1 - z_t) \circ n_t + z_t \circ h_{t-1} \\ \end{array}\end{split}\]- Parameters
input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCW’ the shape should be (C, W).
hidden_channels (int) – Number of output channels.
i2h_kernel (int or tuple of int) – Input convolution kernel sizes.
h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.
i2h_pad (int or tuple of int, default (0,)) – Pad for input convolution.
i2h_dilate (int or tuple of int, default (1,)) – Input convolution dilate.
h2h_dilate (int or tuple of int, default (1,)) – Recurrent convolution dilate.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the input convolutions.
i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.
h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.
conv_layout (str, default 'NCW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCW’ and ‘NWC’.
activation (str or gluon.Block, default 'tanh') – Type of activation function used in n_t. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See
Activation()
for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.
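A minimal sketch (assuming MXNet 2.x and the mx.np array API; the channel count, width, and kernel sizes are illustrative):
import mxnet as mx
from mxnet.gluon import rnn

# 1D ConvGRU over sequences of 3-channel signals of width 96 (layout 'NCW').
cell = rnn.Conv1DGRUCell(input_shape=(3, 96), hidden_channels=16,
                         i2h_kernel=(3,), h2h_kernel=(3,), i2h_pad=(1,))
cell.initialize()

x = mx.np.random.uniform(size=(2, 8, 3, 96))  # (batch, time, C, W)
outputs, states = cell.unroll(8, x, layout='NTC', merge_outputs=True)
# outputs is expected to have shape (2, 8, 16, 96).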
-
class Conv1DLSTMCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0,), i2h_dilate=(1,), h2h_dilate=(1,), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCW', activation='tanh')
Bases: mxnet.gluon.rnn.conv_rnn_cell._ConvLSTMCell
1D Convolutional LSTM network cell.
Based on the paper "Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting", Xingjian et al., NIPS 2015.
\[\begin{split}\begin{array}{ll} i_t = \sigma(W_i \ast x_t + R_i \ast h_{t-1} + b_i) \\ f_t = \sigma(W_f \ast x_t + R_f \ast h_{t-1} + b_f) \\ o_t = \sigma(W_o \ast x_t + R_o \ast h_{t-1} + b_o) \\ c^\prime_t = tanh(W_c \ast x_t + R_c \ast h_{t-1} + b_c) \\ c_t = f_t \circ c_{t-1} + i_t \circ c^\prime_t \\ h_t = o_t \circ tanh(c_t) \\ \end{array}\end{split}\]- Parameters
input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCW’ the shape should be (C, W).
hidden_channels (int) – Number of output channels.
i2h_kernel (int or tuple of int) – Input convolution kernel sizes.
h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.
i2h_pad (int or tuple of int, default (0,)) – Pad for input convolution.
i2h_dilate (int or tuple of int, default (1,)) – Input convolution dilate.
h2h_dilate (int or tuple of int, default (1,)) – Recurrent convolution dilate.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the input convolutions.
i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.
h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.
conv_layout (str, default 'NCW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCW’ and ‘NWC’.
activation (str or gluon.Block, default 'tanh') – Type of activation function used in c^prime_t. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See
Activation()
for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.
-
class Conv1DRNNCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0,), i2h_dilate=(1,), h2h_dilate=(1,), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCW', activation='tanh')
Bases: mxnet.gluon.rnn.conv_rnn_cell._ConvRNNCell
1D Convolutional RNN cell.
\[h_t = tanh(W_i \ast x_t + R_i \ast h_{t-1} + b_i)\]- Parameters
input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCW’ the shape should be (C, W).
hidden_channels (int) – Number of output channels.
i2h_kernel (int or tuple of int) – Input convolution kernel sizes.
h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.
i2h_pad (int or tuple of int, default (0,)) – Pad for input convolution.
i2h_dilate (int or tuple of int, default (1,)) – Input convolution dilate.
h2h_dilate (int or tuple of int, default (1,)) – Recurrent convolution dilate.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the input convolutions.
i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.
h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.
conv_layout (str, default 'NCW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCW’ and ‘NWC’.
activation (str or gluon.Block, default 'tanh') – Type of activation function. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See
Activation()
for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.
-
class Conv2DGRUCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0), i2h_dilate=(1, 1), h2h_dilate=(1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCHW', activation='tanh')
Bases: mxnet.gluon.rnn.conv_rnn_cell._ConvGRUCell
2D Convolutional Gated Recurrent Unit (GRU) network cell.
\[\begin{split}\begin{array}{ll} r_t = \sigma(W_r \ast x_t + R_r \ast h_{t-1} + b_r) \\ z_t = \sigma(W_z \ast x_t + R_z \ast h_{t-1} + b_z) \\ n_t = \tanh(W_i \ast x_t + b_i + r_t \circ (R_n \ast h_{t-1} + b_n)) \\ h^\prime_t = (1 - z_t) \circ n_t + z_t \circ h_{t-1} \\ \end{array}\end{split}\]- Parameters
input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCHW’ the shape should be (C, H, W).
hidden_channels (int) – Number of output channels.
i2h_kernel (int or tuple of int) – Input convolution kernel sizes.
h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.
i2h_pad (int or tuple of int, default (0, 0)) – Pad for input convolution.
i2h_dilate (int or tuple of int, default (1, 1)) – Input convolution dilate.
h2h_dilate (int or tuple of int, default (1, 1)) – Recurrent convolution dilate.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the input convolutions.
i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.
h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.
conv_layout (str, default 'NCHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCHW’ and ‘NHWC’.
activation (str or gluon.Block, default 'tanh') – Type of activation function used in n_t. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See
Activation()
for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.
-
class Conv2DLSTMCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0), i2h_dilate=(1, 1), h2h_dilate=(1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCHW', activation='tanh')
Bases: mxnet.gluon.rnn.conv_rnn_cell._ConvLSTMCell
2D Convolutional LSTM network cell.
Based on the paper "Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting", Xingjian et al., NIPS 2015.
\[\begin{split}\begin{array}{ll} i_t = \sigma(W_i \ast x_t + R_i \ast h_{t-1} + b_i) \\ f_t = \sigma(W_f \ast x_t + R_f \ast h_{t-1} + b_f) \\ o_t = \sigma(W_o \ast x_t + R_o \ast h_{t-1} + b_o) \\ c^\prime_t = tanh(W_c \ast x_t + R_c \ast h_{t-1} + b_c) \\ c_t = f_t \circ c_{t-1} + i_t \circ c^\prime_t \\ h_t = o_t \circ tanh(c_t) \\ \end{array}\end{split}\]- Parameters
input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCHW’ the shape should be (C, H, W).
hidden_channels (int) – Number of output channels.
i2h_kernel (int or tuple of int) – Input convolution kernel sizes.
h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.
i2h_pad (int or tuple of int, default (0, 0)) – Pad for input convolution.
i2h_dilate (int or tuple of int, default (1, 1)) – Input convolution dilate.
h2h_dilate (int or tuple of int, default (1, 1)) – Recurrent convolution dilate.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the input convolutions.
i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.
h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.
conv_layout (str, default 'NCHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCHW’ and ‘NHWC’.
activation (str or gluon.Block, default 'tanh') – Type of activation function used in c^prime_t. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See
Activation()
for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.
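A minimal sketch (assuming MXNet 2.x and the mx.np array API; the channel count, spatial size, and kernel sizes are illustrative):
import mxnet as mx
from mxnet.gluon import rnn

# ConvLSTM over 16x16 feature maps with 3 input channels (layout 'NCHW').
cell = rnn.Conv2DLSTMCell(input_shape=(3, 16, 16), hidden_channels=8,
                          i2h_kernel=(3, 3), h2h_kernel=(3, 3), i2h_pad=(1, 1))
cell.initialize()

x = mx.np.random.uniform(size=(2, 5, 3, 16, 16))  # (batch, time, C, H, W)
outputs, states = cell.unroll(5, x, layout='NTC', merge_outputs=True)
# outputs is expected to have shape (2, 5, 8, 16, 16).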
-
class Conv2DRNNCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0), i2h_dilate=(1, 1), h2h_dilate=(1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCHW', activation='tanh')
Bases: mxnet.gluon.rnn.conv_rnn_cell._ConvRNNCell
2D Convolutional RNN cell.
\[h_t = tanh(W_i \ast x_t + R_i \ast h_{t-1} + b_i)\]- Parameters
input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCHW’ the shape should be (C, H, W).
hidden_channels (int) – Number of output channels.
i2h_kernel (int or tuple of int) – Input convolution kernel sizes.
h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.
i2h_pad (int or tuple of int, default (0, 0)) – Pad for input convolution.
i2h_dilate (int or tuple of int, default (1, 1)) – Input convolution dilate.
h2h_dilate (int or tuple of int, default (1, 1)) – Recurrent convolution dilate.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the input convolutions.
i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.
h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.
conv_layout (str, default 'NCHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCHW’ and ‘NHWC’.
activation (str or gluon.Block, default 'tanh') – Type of activation function. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See
Activation()
for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.
-
class Conv3DGRUCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0, 0), i2h_dilate=(1, 1, 1), h2h_dilate=(1, 1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCDHW', activation='tanh')
Bases: mxnet.gluon.rnn.conv_rnn_cell._ConvGRUCell
3D Convolutional Gated Recurrent Unit (GRU) network cell.
\[\begin{split}\begin{array}{ll} r_t = \sigma(W_r \ast x_t + R_r \ast h_{t-1} + b_r) \\ z_t = \sigma(W_z \ast x_t + R_z \ast h_{t-1} + b_z) \\ n_t = \tanh(W_i \ast x_t + b_i + r_t \circ (R_n \ast h_{t-1} + b_n)) \\ h^\prime_t = (1 - z_t) \circ n_t + z_t \circ h_{t-1} \\ \end{array}\end{split}\]- Parameters
input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCDHW’ the shape should be (C, D, H, W).
hidden_channels (int) – Number of output channels.
i2h_kernel (int or tuple of int) – Input convolution kernel sizes.
h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.
i2h_pad (int or tuple of int, default (0, 0, 0)) – Pad for input convolution.
i2h_dilate (int or tuple of int, default (1, 1, 1)) – Input convolution dilate.
h2h_dilate (int or tuple of int, default (1, 1, 1)) – Recurrent convolution dilate.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the input convolutions.
i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.
h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.
conv_layout (str, default 'NCDHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCDHW’ and ‘NDHWC’.
activation (str or gluon.Block, default 'tanh') – Type of activation function used in n_t. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See
Activation()
for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.
-
class Conv3DLSTMCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0, 0), i2h_dilate=(1, 1, 1), h2h_dilate=(1, 1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCDHW', activation='tanh')
Bases: mxnet.gluon.rnn.conv_rnn_cell._ConvLSTMCell
3D Convolutional LSTM network cell.
Based on the paper "Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting", Xingjian et al., NIPS 2015.
\[\begin{split}\begin{array}{ll} i_t = \sigma(W_i \ast x_t + R_i \ast h_{t-1} + b_i) \\ f_t = \sigma(W_f \ast x_t + R_f \ast h_{t-1} + b_f) \\ o_t = \sigma(W_o \ast x_t + R_o \ast h_{t-1} + b_o) \\ c^\prime_t = tanh(W_c \ast x_t + R_c \ast h_{t-1} + b_c) \\ c_t = f_t \circ c_{t-1} + i_t \circ c^\prime_t \\ h_t = o_t \circ tanh(c_t) \\ \end{array}\end{split}\]- Parameters
input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCDHW’ the shape should be (C, D, H, W).
hidden_channels (int) – Number of output channels.
i2h_kernel (int or tuple of int) – Input convolution kernel sizes.
h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.
i2h_pad (int or tuple of int, default (0, 0, 0)) – Pad for input convolution.
i2h_dilate (int or tuple of int, default (1, 1, 1)) – Input convolution dilate.
h2h_dilate (int or tuple of int, default (1, 1, 1)) – Recurrent convolution dilate.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the input convolutions.
i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.
h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.
conv_layout (str, default 'NCDHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCDHW’ and ‘NDHWC’.
activation (str or gluon.Block, default 'tanh') – Type of activation function used in c^prime_t. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See
Activation()
for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.
-
class Conv3DRNNCell(input_shape, hidden_channels, i2h_kernel, h2h_kernel, i2h_pad=(0, 0, 0), i2h_dilate=(1, 1, 1), h2h_dilate=(1, 1, 1), i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', conv_layout='NCDHW', activation='tanh')
Bases: mxnet.gluon.rnn.conv_rnn_cell._ConvRNNCell
3D Convolutional RNN cell.
\[h_t = tanh(W_i \ast x_t + R_i \ast h_{t-1} + b_i)\]- Parameters
input_shape (tuple of int) – Input tensor shape at each time step for each sample, excluding dimension of the batch size and sequence length. Must be consistent with conv_layout. For example, for layout ‘NCDHW’ the shape should be (C, D, H, W).
hidden_channels (int) – Number of output channels.
i2h_kernel (int or tuple of int) – Input convolution kernel sizes.
h2h_kernel (int or tuple of int) – Recurrent convolution kernel sizes. Only odd-numbered sizes are supported.
i2h_pad (int or tuple of int, default (0, 0, 0)) – Pad for input convolution.
i2h_dilate (int or tuple of int, default (1, 1, 1)) – Input convolution dilate.
h2h_dilate (int or tuple of int, default (1, 1, 1)) – Recurrent convolution dilate.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the input convolutions.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the input convolutions.
i2h_bias_initializer (str or Initializer, default zeros) – Initializer for the input convolution bias vectors.
h2h_bias_initializer (str or Initializer, default zeros) – Initializer for the recurrent convolution bias vectors.
conv_layout (str, default 'NCDHW') – Layout for all convolution inputs, outputs and weights. Options are ‘NCDHW’ and ‘NDHWC’.
activation (str or gluon.Block, default 'tanh') – Type of activation function. If argument type is string, it’s equivalent to nn.Activation(act_type=str). See
Activation()
for available choices. Alternatively, other activation blocks such as nn.LeakyReLU can be used.
-
class DropoutCell(rate, axes=())
Bases: mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell
Applies dropout on input.
- Parameters
rate (float) – Percentage of elements to drop out, which is 1 - percentage to retain.
axes (tuple of int, default ()) – The axes on which dropout mask is shared. If empty, regular dropout is applied.
Methods
apply(fn): Applies fn recursively to every child block as well as self.
begin_state([batch_size, func]): Initial state for this cell.
cast(dtype): Cast this Block to use another data type.
collect_params([select]): Returns a Dict containing this Block's and all of its children's Parameters (by default), or only those matching the given regular expressions.
export(path[, epoch, remove_amp_cast]): Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward(inputs, states): Unrolls the recurrent cell for one time step.
hybridize([active]): Please refer to the description of HybridBlock.hybridize().
infer_shape(*args): Infers shape of Parameters from inputs.
infer_type(*args): Infers data type of Parameters from inputs.
initialize([init, device, verbose, force_reinit]): Initializes Parameters of this Block and its children.
load(prefix): Load a model saved using the save API.
load_dict(param_dict[, device, …]): Load parameters from dict.
load_parameters(filename[, device, …]): Load parameters from file previously saved by save_parameters.
optimize_for(x, *args[, backend, clear, …]): Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_child(block[, name]): Registers block as a child of self.
register_forward_hook(hook): Registers a forward hook on the block.
register_forward_pre_hook(hook): Registers a forward pre-hook on the block.
register_op_hook(callback[, monitor_all]): Install callback monitor.
reset(): Reset before re-using the cell for another graph.
reset_ctx(ctx): This function has been deprecated.
reset_device(device): Re-assign all Parameters to other devices.
save(prefix): Save the model architecture and parameters to load again later.
save_parameters(filename[, deduplicate]): Save parameters to file.
setattr(name, value): Set an attribute to a new value for all Parameters.
share_parameters(shared): Share parameters recursively inside the model.
state_info([batch_size]): Shape and layout information of states.
summary(*inputs): Print the summary of the model's output and parameters.
unroll(length, inputs[, begin_state, …]): Unrolls an RNN cell across time steps.
zero_grad(): Sets all Parameters' gradient buffer to 0.
Attributes
params: Returns this Block's parameter dictionary (does not include its children's parameters).
- Inputs:
data: input tensor with shape (batch_size, size).
states: a list of recurrent state tensors.
- Outputs:
out: output tensor with shape (batch_size, size).
next_states: returns input states directly.
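A sketch of using DropoutCell inside a stacked cell (assuming MXNet 2.x and the mx.np array API; sizes and the dropout rate are illustrative):
import mxnet as mx
from mxnet.gluon import rnn

# Apply 20% dropout to the outputs of an LSTM cell at every time step.
stack = rnn.SequentialRNNCell()
stack.add(rnn.LSTMCell(64))
stack.add(rnn.DropoutCell(0.2))
stack.initialize()

x = mx.np.random.uniform(size=(4, 12, 32))   # (batch, time, feature), layout 'NTC'
with mx.autograd.record():                   # dropout is only active in training mode
    outputs, states = stack.unroll(12, x, layout='NTC', merge_outputs=True)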
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
begin_state
(batch_size=0, func=<function zeros>, **kwargs)¶ Initial state for this cell.
- Parameters
func (callable, default symbol.zeros) –
Function for creating initial state.
For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.
For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.
batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.
**kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.
- Returns
states – Starting states for the first RNN step.
- Return type
nested list of Symbol
-
cast
(dtype)¶ Cast this Block to use another data type.
- Parameters
dtype (str or numpy.dtype) – The new data type.
-
collect_params(select=None)
Returns a Dict containing this Block's and all of its children's Parameters (by default), or only those Parameters whose names match the given regular expressions.
For example, collect the specified parameters in ['conv1.weight', 'conv1.bias', 'fc.weight', 'fc.bias']:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there is only one input, it will be named data. When there is more than one input, they will be named data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
forward
(inputs, states)[source]¶ Unrolls the recurrent cell for one time step.
- Parameters
inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).
states (list of sym.Variable) – RNN state from previous step or the output of begin_state().
- Returns
output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.
See also
begin_state()
This function can provide the states for the first time step.
unroll()
This function unrolls an RNN for a given number of (>=1) time steps.
-
hybridize
(active=True, **kwargs) Please refer to the description of HybridBlock.hybridize().
-
infer_shape
(*args)¶ Infers shape of Parameters from inputs.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)
Initializes Parameters of this Block and its children.
- Parameters
init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.
device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture; it resets each Block's parameter UUIDs to the values they had when saved, so that they match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order as when the model was saved: each Block is uniquely identified by its class name and a unique ID assigned in creation order (children are stored in an OrderedDict), and that unique ID is used to denote the specific Block.
Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
References
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize; afterwards, export can be called or inference can be run. See example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')
# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default False) – whether to partition the graph when dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_child
(block, name=None)¶ Registers block as a child of self.
Block
s assigned to self as attributes will be registered automatically.
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install callback monitor.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset
()¶ Reset before re-using the cell for another graph.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
Block.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since its an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
References
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
-
share_parameters(shared)
Share parameters recursively inside the model.
For example, if you want dense1 to share dense0's weights, you can do:
dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which is equivalent to:
dense1.weight = dense0.weight
dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
unroll
(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]¶ Unrolls an RNN cell across time steps.
- Parameters
length (int) – Number of steps to unroll.
inputs (Symbol, list of Symbol, or None) –
If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.
If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).
begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.
layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.
merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.
valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.
- Returns
outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().
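A minimal sketch of unroll() using a GRUCell for illustration; input_size is given explicitly so parameter shapes are known up front, and the shapes in the comments follow the description above.
import mxnet as mx

cell = mx.gluon.rnn.GRUCell(100, input_size=10)
cell.initialize()
x = mx.np.random.uniform(size=(3, 5, 10))   # (batch_size, length, input_size) for layout 'NTC'
outputs, states = cell.unroll(5, x, layout='NTC', merge_outputs=True)
print(outputs.shape)                        # expected (3, 5, 100): per-step outputs merged across time
print(states[0].shape)                      # expected (3, 100): final hidden state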
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
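A hedged sketch of a case where zero_grad() is typically needed: with grad_req set to 'add', gradients accumulate across backward calls and must be cleared explicitly between updates.
import mxnet as mx
from mxnet import autograd

net = mx.gluon.rnn.GRU(100, 3)
net.initialize()
net.setattr('grad_req', 'add')              # accumulate gradients instead of overwriting them

x = mx.np.random.uniform(size=(5, 3, 10))
with autograd.record():
    loss = net(x).sum()
loss.backward()
# ... consume the accumulated gradients (e.g. via a Trainer), then clear them
net.zero_grad()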
-
class
GRU
(hidden_size, num_layers=1, layout='TNC', dropout=0, bidirectional=False, input_size=0, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', dtype='float32', **kwargs)[source]¶ Bases:
mxnet.gluon.rnn.rnn_layer._RNNLayer
Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence. Note: this is an implementation of the cuDNN version of GRUs (slight modification compared to Cho et al. 2014; the reset gate \(r_t\) is applied after matrix multiplication).
For each element in the input sequence, each layer computes the following function:
\[\begin{split}\begin{array}{ll}
r_t = \mathrm{sigmoid}(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\
i_t = \mathrm{sigmoid}(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\
n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)} + b_{hn})) \\
h_t = (1 - i_t) * n_t + i_t * h_{(t-1)} \\
\end{array}\end{split}\]
where \(h_t\) is the hidden state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(r_t\), \(i_t\), \(n_t\) are the reset, input, and new gates, respectively.
- Parameters
hidden_size (int) – The number of features in the hidden state h
num_layers (int, default 1) – Number of recurrent layers.
layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.
dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer
bidirectional (bool, default False) – If True, becomes a bidirectional RNN.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
i2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
dtype (str, default 'float32') – Type to initialize the parameters and default states to
input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.
- Inputs:
data: input tensor with shape (sequence_length, batch_size, input_size) when layout is “TNC”. For other layouts, dimensions are permuted accordingly using the transpose() operator, which adds performance overhead. Consider creating batches in TNC layout during the data batching step.
states: initial recurrent state tensor with shape (num_layers, batch_size, num_hidden). If bidirectional is True, shape will instead be (2*num_layers, batch_size, num_hidden). If states is None, zeros will be used as default begin states.
- Outputs:
out: output tensor with shape (sequence_length, batch_size, num_hidden) when layout is “TNC”. If bidirectional is True, output shape will instead be (sequence_length, batch_size, 2*num_hidden).
out_states: output recurrent state tensor with the same shape as states. If states is None, out_states will not be returned.
Examples
>>> layer = mx.gluon.rnn.GRU(100, 3)
>>> layer.initialize()
>>> input = mx.np.random.uniform(size=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.np.random.uniform(size=(3, 3, 100))
>>> output, hn = layer(input, h0)
-
class
GRUCell
(hidden_size, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, activation='tanh', recurrent_activation='sigmoid')[source]¶ Bases:
mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell
Gated Recurrent Unit (GRU) network cell. Note: this is an implementation of the cuDNN version of GRUs (slight modification compared to Cho et al. 2014; the reset gate \(r_t\) is applied after matrix multiplication).
Each call computes the following function:
\[\begin{split}\begin{array}{ll}
r_t = \mathrm{sigmoid}(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\
i_t = \mathrm{sigmoid}(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\
n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)} + b_{hn})) \\
h_t = (1 - i_t) * n_t + i_t * h_{(t-1)} \\
\end{array}\end{split}\]
Methods
apply(fn) – Applies fn recursively to every child block as well as self.
begin_state([batch_size, func]) – Initial state for this cell.
cast(dtype) – Cast this Block to use another data type.
collect_params([select]) – Returns a Dict containing this Block’s and all of its children’s Parameters (default), or only those whose names match the given regular expressions.
export(path[, epoch, remove_amp_cast]) – Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward(inputs, states) – Unrolls the recurrent cell for one time step.
hybridize([active]) – Please refer to the description of HybridBlock.hybridize().
infer_shape(i, x, is_bidirect) – Infers shape of Parameters from inputs.
infer_type(*args) – Infers data type of Parameters from inputs.
initialize([init, device, verbose, force_reinit]) – Initializes Parameters of this Block and its children.
load(prefix) – Load a model saved using the save API.
load_dict(param_dict[, device, …]) – Load parameters from dict.
load_parameters(filename[, device, …]) – Load parameters from file previously saved by save_parameters.
optimize_for(x, *args[, backend, clear, …]) – Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_child(block[, name]) – Registers block as a child of self.
register_forward_hook(hook) – Registers a forward hook on the block.
register_forward_pre_hook(hook) – Registers a forward pre-hook on the block.
register_op_hook(callback[, monitor_all]) – Install callback monitor.
reset() – Reset before re-using the cell for another graph.
reset_ctx(ctx) – This function has been deprecated.
reset_device(device) – Re-assign all Parameters to other devices.
save(prefix) – Save the model architecture and parameters to load again later.
save_parameters(filename[, deduplicate]) – Save parameters to file.
setattr(name, value) – Set an attribute to a new value for all Parameters.
share_parameters(shared) – Share parameters recursively inside the model.
state_info([batch_size]) – Shape and layout information of states.
summary(*inputs) – Print the summary of the model’s output and parameters.
unroll(length, inputs[, begin_state, …]) – Unrolls an RNN cell across time steps.
zero_grad() – Sets all Parameters’ gradient buffer to 0.
Attributes
params – Returns this Block’s parameter dictionary (does not include its children’s parameters).
where \(h_t\) is the hidden state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(r_t\), \(i_t\), \(n_t\) are the reset, input, and new gates, respectively.
- Parameters
hidden_size (int) – Number of units in output symbol.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.
h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.
input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.
activation (str, default 'tanh') – Activation type to use. See nd/symbol Activation for supported types.
recurrent_activation (str, default 'sigmoid') – Activation type to use for the recurrent step. See nd/symbol Activation for supported types.
- Inputs:
data: input tensor with shape (batch_size, input_size).
states: a list of one initial recurrent state tensor with shape (batch_size, num_hidden).
- Outputs:
out: output tensor with shape (batch_size, num_hidden).
next_states: a list of one output recurrent state tensor with the same shape as states.
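A minimal sketch of a single GRUCell step matching the shapes above; input_size is given explicitly so parameter shapes are known up front, and the initial state is built by hand as a list of one zero tensor.
import mxnet as mx

cell = mx.gluon.rnn.GRUCell(100, input_size=50)
cell.initialize()

x_t = mx.np.random.uniform(size=(32, 50))   # one time step: (batch_size, input_size)
states = [mx.np.zeros((32, 100))]           # a list of one state tensor: (batch_size, num_hidden)
out, states = cell(x_t, states)             # out: (32, 100); states: updated list of one tensor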
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
begin_state
(batch_size=0, func=<function zeros>, **kwargs)¶ Initial state for this cell.
- Parameters
func (callable, default symbol.zeros) –
Function for creating initial state.
For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.
For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.
batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.
**kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.
- Returns
states – Starting states for the first RNN step.
- Return type
nested list of Symbol
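A short sketch of begin_state() for the NDArray case, assuming the default zero initializer; batch_size sets the ‘N’ dimension of the returned state.
import mxnet as mx

cell = mx.gluon.rnn.GRUCell(100, input_size=50)
cell.initialize()
states = cell.begin_state(batch_size=32)
print([s.shape for s in states])   # a GRUCell keeps a single state, expected [(32, 100)]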
-
cast
(dtype)¶ Cast this Block to use another data type.
- Parameters
dtype (str or numpy.dtype) – The new data type.
-
collect_params
(select=None)¶ Returns a Dict containing this Block’s and all of its children’s Parameters (default), or only the Parameters whose names match the given regular expressions.
For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
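A hedged sketch that combines collect_params() with a regular expression to act on a subset of parameters, here stopping gradients for the input-to-hidden weights; the pattern itself is illustrative.
import mxnet as mx

layer = mx.gluon.rnn.GRU(100, 3)
layer.initialize()
# select only parameters whose names contain 'i2h_weight' and stop their gradients
for name, param in layer.collect_params('.*i2h_weight').items():
    param.grad_req = 'null'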
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there is only one input, it will be named data. When there is more than one input, they will be named data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files, path-symbol.json and path-xxxx.params, will be created, where xxxx is the 4-digit epoch number. If None, do not export to file but return the Python Symbol object and the corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
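A minimal sketch of export() followed by re-loading the serialized model with gluon.SymbolBlock.imports; the prefix 'gru-export' is illustrative, and the block is hybridized and run once so the cached graph exists before exporting.
import mxnet as mx

net = mx.gluon.rnn.GRU(100, 3)
net.initialize()
net.hybridize()
net(mx.np.random.uniform(size=(5, 3, 10)))          # build the cached graph
sym_file, params_file = net.export('gru-export')    # gru-export-symbol.json, gru-export-0000.params

# reload as a SymbolBlock; the single input is named 'data' as noted above
deserialized = mx.gluon.SymbolBlock.imports(sym_file, ['data'], params_file)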
-
forward
(inputs, states)[source]¶ Unrolls the recurrent cell for one time step.
- Parameters
inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).
states (list of sym.Variable) – RNN state from previous step or the output of begin_state().
- Returns
output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.
See also
begin_state()
This function can provide the states for the first time step.
unroll()
This function unrolls an RNN for a given number of (>=1) time steps.
-
hybridize
(active=True, **kwargs)¶ Please refer to the description of HybridBlock.hybridize().
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order as when the model was saved. This is because each Block is uniquely identified by its class name and a unique ID assigned in creation order (since children are stored in an OrderedDict), and that unique ID is used to denote the specific Block.
Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – Must be in {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – Must be in {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. It can be used in place of hybridize; afterwards, export can be called or inference can be run. See example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')

# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic shape op exists.
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_child
(block, name=None)¶ Registers block as a child of self.
Block
s assigned to self as attributes will be registered automatically.
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
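A short sketch of attaching and later detaching a forward hook; the hook below only prints which block finished its forward pass and is purely illustrative.
import mxnet as mx

def log_call(block, inputs, output):
    # called after forward(); must not modify inputs or output
    print('forward finished for', type(block).__name__)

layer = mx.gluon.rnn.GRU(100, 3)
layer.initialize()
handle = layer.register_forward_hook(log_call)
layer(mx.np.random.uniform(size=(5, 3, 10)))
handle.detach()   # remove the hook via the returned HookHandle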
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install callback monitor.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset
()¶ Reset before re-using the cell for another graph.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
Block.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since they are stored in an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
-
share_parameters
(shared)¶ Share parameters recursively inside the model.
For example, if you want dense1 to share dense0’s weights, you can do:
dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which is equivalent to:
dense1.weight = dense0.weight
dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
unroll
(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)¶ Unrolls an RNN cell across time steps.
- Parameters
length (int) – Number of steps to unroll.
inputs (Symbol, list of Symbol, or None) –
If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.
If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).
begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.
layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.
merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.
valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.
- Returns
outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
HybridRecurrentCell
[source]¶ Bases:
mxnet.gluon.rnn.rnn_cell.RecurrentCell
,mxnet.gluon.block.HybridBlock
HybridRecurrentCell supports hybridize.
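For example, a cell derived from HybridRecurrentCell, such as GRUCell, can be hybridized before being stepped or unrolled. A minimal sketch, with input_size given explicitly so parameter shapes are known:
import mxnet as mx

cell = mx.gluon.rnn.GRUCell(100, input_size=10)
cell.initialize()
cell.hybridize()                                  # compile the per-step computation
x = mx.np.random.uniform(size=(3, 5, 10))
outputs, states = cell.unroll(5, x, layout='NTC', merge_outputs=True)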
Methods
apply(fn) – Applies fn recursively to every child block as well as self.
begin_state([batch_size, func]) – Initial state for this cell.
cast(dtype) – Cast this Block to use another data type.
collect_params([select]) – Returns a Dict containing this Block’s and all of its children’s Parameters (default), or only those whose names match the given regular expressions.
export(path[, epoch, remove_amp_cast]) – Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward(x, *args, **kwargs) – Unrolls the recurrent cell for one time step.
hybridize([active]) – Please refer to the description of HybridBlock.hybridize().
infer_shape(*args) – Infers shape of Parameters from inputs.
infer_type(*args) – Infers data type of Parameters from inputs.
initialize([init, device, verbose, force_reinit]) – Initializes Parameters of this Block and its children.
load(prefix) – Load a model saved using the save API.
load_dict(param_dict[, device, …]) – Load parameters from dict.
load_parameters(filename[, device, …]) – Load parameters from file previously saved by save_parameters.
optimize_for(x, *args[, backend, clear, …]) – Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_child(block[, name]) – Registers block as a child of self.
register_forward_hook(hook) – Registers a forward hook on the block.
register_forward_pre_hook(hook) – Registers a forward pre-hook on the block.
register_op_hook(callback[, monitor_all]) – Install callback monitor.
reset() – Reset before re-using the cell for another graph.
reset_ctx(ctx) – This function has been deprecated.
reset_device(device) – Re-assign all Parameters to other devices.
save(prefix) – Save the model architecture and parameters to load again later.
save_parameters(filename[, deduplicate]) – Save parameters to file.
setattr(name, value) – Set an attribute to a new value for all Parameters.
share_parameters(shared) – Share parameters recursively inside the model.
state_info([batch_size]) – Shape and layout information of states.
summary(*inputs) – Print the summary of the model’s output and parameters.
unroll(length, inputs[, begin_state, …]) – Unrolls an RNN cell across time steps.
zero_grad() – Sets all Parameters’ gradient buffer to 0.
Attributes
params – Returns this Block’s parameter dictionary (does not include its children’s parameters).
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
begin_state
(batch_size=0, func=<function zeros>, **kwargs)¶ Initial state for this cell.
- Parameters
func (callable, default symbol.zeros) –
Function for creating initial state.
For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.
For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.
batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.
**kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.
- Returns
states – Starting states for the first RNN step.
- Return type
nested list of Symbol
-
cast
(dtype)¶ Cast this Block to use another data type.
- Parameters
dtype (str or numpy.dtype) – The new data type.
-
collect_params
(select=None)¶ Returns a Dict containing this Block’s and all of its children’s Parameters (default), or only the Parameters whose names match the given regular expressions.
For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there is only one input, it will be named data. When there is more than one input, they will be named data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files, path-symbol.json and path-xxxx.params, will be created, where xxxx is the 4-digit epoch number. If None, do not export to file but return the Python Symbol object and the corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
forward
(x, *args, **kwargs)[source]¶ Unrolls the recurrent cell for one time step.
- Parameters
inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).
states (list of sym.Variable) – RNN state from previous step or the output of begin_state().
- Returns
output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.
See also
begin_state()
This function can provide the states for the first time step.
unroll()
This function unrolls an RNN for a given number of (>=1) time steps.
-
hybridize
(active=True, **kwargs)¶ Please refer to the description of HybridBlock.hybridize().
-
infer_shape
(*args)¶ Infers shape of Parameters from inputs.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order as when the model was saved. This is because each Block is uniquely identified by its class name and a unique ID assigned in creation order (since children are stored in an OrderedDict), and that unique ID is used to denote the specific Block.
Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – Must be in {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – Must be in {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. It can be used in place of hybridize; afterwards, export can be called or inference can be run. See example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')

# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic shape op exists.
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_child
(block, name=None)¶ Registers block as a child of self.
Block
s assigned to self as attributes will be registered automatically.
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install callback monitor.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset
()¶ Reset before re-using the cell for another graph.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
Block.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since they are stored in an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
-
share_parameters
(shared)¶ Share parameters recursively inside the model.
For example, if you want dense1 to share dense0’s weights, you can do:
dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which is equivalent to:
dense1.weight = dense0.weight
dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
state_info
(batch_size=0)¶ Shape and layout information of states.
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
unroll
(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)¶ Unrolls an RNN cell across time steps.
- Parameters
length (int) – Number of steps to unroll.
inputs (Symbol, list of Symbol, or None) –
If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.
If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).
begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.
layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.
merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.
valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.
- Returns
outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
-
class
HybridSequentialRNNCell
[source]¶ Bases:
mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell
Sequentially stacking multiple HybridRNN cells.
Methods
add(cell) – Appends a cell into the stack.
apply(fn) – Applies fn recursively to every child block as well as self.
begin_state(**kwargs) – Initial state for this cell.
cast(dtype) – Cast this Block to use another data type.
collect_params([select]) – Returns a Dict containing this Block’s and all of its children’s Parameters (default), or only those whose names match the given regular expressions.
export(path[, epoch, remove_amp_cast]) – Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward(inputs, states) – Unrolls the recurrent cell for one time step.
hybridize([active]) – Please refer to the description of HybridBlock.hybridize().
infer_shape(_, x, is_bidirect) – Infers shape of Parameters from inputs.
infer_type(*args) – Infers data type of Parameters from inputs.
initialize([init, device, verbose, force_reinit]) – Initializes Parameters of this Block and its children.
load(prefix) – Load a model saved using the save API.
load_dict(param_dict[, device, …]) – Load parameters from dict.
load_parameters(filename[, device, …]) – Load parameters from file previously saved by save_parameters.
optimize_for(x, *args[, backend, clear, …]) – Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_child(block[, name]) – Registers block as a child of self.
register_forward_hook(hook) – Registers a forward hook on the block.
register_forward_pre_hook(hook) – Registers a forward pre-hook on the block.
register_op_hook(callback[, monitor_all]) – Install callback monitor.
reset() – Reset before re-using the cell for another graph.
reset_ctx(ctx) – This function has been deprecated.
reset_device(device) – Re-assign all Parameters to other devices.
save(prefix) – Save the model architecture and parameters to load again later.
save_parameters(filename[, deduplicate]) – Save parameters to file.
setattr(name, value) – Set an attribute to a new value for all Parameters.
share_parameters(shared) – Share parameters recursively inside the model.
state_info([batch_size]) – Shape and layout information of states.
summary(*inputs) – Print the summary of the model’s output and parameters.
unroll(length, inputs[, begin_state, …]) – Unrolls an RNN cell across time steps.
zero_grad() – Sets all Parameters’ gradient buffer to 0.
Attributes
params – Returns this Block’s parameter dictionary (does not include its children’s parameters).
-
add
(cell)[source]¶ Appends a cell into the stack.
- Parameters
cell (RecurrentCell) – The cell to add.
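A minimal sketch of building a two-layer stack with add() and unrolling it as a single cell; the hidden sizes and input sizes used here are illustrative.
import mxnet as mx

stack = mx.gluon.rnn.HybridSequentialRNNCell()
stack.add(mx.gluon.rnn.GRUCell(100, input_size=50))    # first layer consumes the raw features
stack.add(mx.gluon.rnn.GRUCell(100, input_size=100))   # second layer consumes the first layer's output
stack.initialize()

x = mx.np.random.uniform(size=(3, 5, 50))              # (batch_size, length, input_size) for 'NTC'
outputs, states = stack.unroll(5, x, layout='NTC', merge_outputs=True)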
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
begin_state
(**kwargs)[source]¶ Initial state for this cell.
- Parameters
func (callable, default symbol.zeros) –
Function for creating initial state.
For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.
For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.
batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.
**kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.
- Returns
states – Starting states for the first RNN step.
- Return type
nested list of Symbol
-
cast
(dtype)¶ Cast this Block to use another data type.
- Parameters
dtype (str or numpy.dtype) – The new data type.
-
collect_params
(select=None)¶ Returns a Dict containing this Block’s and all of its children’s Parameters (default), or only the Parameters whose names match the given regular expressions.
For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there is only one input, it will be named data. When there is more than one input, they will be named data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files, path-symbol.json and path-xxxx.params, will be created, where xxxx is the 4-digit epoch number. If None, do not export to file but return the Python Symbol object and the corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
forward
(inputs, states)[source]¶ Unrolls the recurrent cell for one time step.
- Parameters
inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).
states (list of sym.Variable) – RNN state from previous step or the output of begin_state().
- Returns
output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.
See also
begin_state()
This function can provide the states for the first time step.
unroll()
This function unrolls an RNN for a given number of (>=1) time steps.
-
hybridize
(active=True, **kwargs)¶ Please refer to the description of HybridBlock.hybridize().
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order as when the model was saved. This is because each Block is uniquely identified by its class name and a unique ID assigned in creation order (since children are stored in an OrderedDict), and that unique ID is used to denote the specific Block.
Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – Must be in {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – Must be in {‘current’, ‘saved’}. Only valid if cast_dtype=True; specifies the source of the dtype for casting the parameters.
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. It can be used in place of hybridize; afterwards, export can be called or inference can be run. See example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')

# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic shape op exists.
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_child
(block, name=None)¶ Registers block as a child of self.
Block
s assigned to self as attributes will be registered automatically.
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install callback monitor.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
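As a sketch (assuming the block has been hybridized so the monitor can see operator-level tensors; the small Dense block is only for illustration), a callback that reports tensor names and shapes could look like:
import mxnet as mx
from mxnet.gluon import nn

def monitor(name, op_name, tensor):
    # name of the tensor, name of the producing/consuming operator, and the tensor itself
    print(name, op_name, tensor.shape)

net = nn.Dense(10)
net.initialize()
net.hybridize()
net.register_op_hook(monitor, monitor_all=True)
net(mx.np.random.uniform(size=(2, 5)))  # the callback fires for each monitored tensor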
-
reset
()¶ Reset before re-using the cell for another graph.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
Block.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block's children in order (since it's an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
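A minimal round trip might look like the following sketch (the file name and the small Dense block are arbitrary choices for illustration):
import mxnet as mx
from mxnet.gluon import nn

net = nn.Dense(10)
net.initialize()
net(mx.np.random.uniform(size=(2, 5)))  # run once so parameter shapes are known
net.save_parameters('dense.params')     # saves weights only, not the architecture

net2 = nn.Dense(10)                     # recreate the same architecture
net2.load_parameters('dense.params')    # restore the saved weights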
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
-
share_parameters
(shared)¶ Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:
dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
- which is equivalent to
dense1.weight = dense0.weight
dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
unroll
(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]¶ Unrolls an RNN cell across time steps.
- Parameters
length (int) – Number of steps to unroll.
inputs (Symbol, list of Symbol, or None) –
If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.
If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).
begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.
layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.
merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.
valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.
- Returns
outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().
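For instance, a minimal sketch using an LSTMCell (chosen here only for illustration) unrolled over a short sequence in 'NTC' layout:
import mxnet as mx

cell = mx.gluon.rnn.LSTMCell(20, input_size=10)
cell.initialize()
inputs = mx.np.random.uniform(size=(4, 6, 10))  # (batch_size, length, input_size) for 'NTC'
outputs, states = cell.unroll(6, inputs, layout='NTC', merge_outputs=True)
# outputs: shape (4, 6, 20); states: [h, c], each of shape (4, 20)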
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
-
class
LSTM
(hidden_size, num_layers=1, layout='TNC', dropout=0, bidirectional=False, input_size=0, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', projection_size=None, h2r_weight_initializer=None, state_clip_min=None, state_clip_max=None, state_clip_nan=False, dtype='float32', **kwargs)[source]¶ Bases:
mxnet.gluon.rnn.rnn_layer._RNNLayer
Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.
For each element in the input sequence, each layer computes the following function:
\[\begin{split}\begin{array}{ll} i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ f_t = sigmoid(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hc} h_{(t-1)} + b_{hg}) \\ o_t = sigmoid(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\ c_t = f_t * c_{(t-1)} + i_t * g_t \\ h_t = o_t * \tanh(c_t) \end{array}\end{split}\]where \(h_t\) is the hidden state at time t, \(c_t\) is the cell state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(i_t\), \(f_t\), \(g_t\), \(o_t\) are the input, forget, cell, and out gates, respectively.
- Parameters
hidden_size (int) – The number of features in the hidden state h.
num_layers (int, default 1) – Number of recurrent layers.
layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.
dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer.
bidirectional (bool, default False) – If True, becomes a bidirectional RNN.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
i2h_bias_initializer (str or Initializer, default 'lstmbias') – Initializer for the bias vector. By default, bias for the forget gate is initialized to 1 while all other biases are initialized to zero.
h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
projection_size (int, default None) – The number of features after projection.
h2r_weight_initializer (str or Initializer, default None) – Initializer for the projected recurrent weights matrix, used for the linear transformation of the recurrent state to the projected space.
state_clip_min (float or None, default None) – Minimum clip value of LSTM states. This option must be used together with state_clip_max. If None, clipping is not applied.
state_clip_max (float or None, default None) – Maximum clip value of LSTM states. This option must be used together with state_clip_min. If None, clipping is not applied.
state_clip_nan (boolean, default False) – Whether to stop NaN from propagating in state by clipping it to min/max. If the clipping range is not specified, this option is ignored.
dtype (str, default 'float32') – Type to initialize the parameters and default states to
input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.
- Inputs:
data: input tensor with shape (sequence_length, batch_size, input_size) when layout is “TNC”. For other layouts, dimensions are permuted accordingly using transpose() operator which adds performance overhead. Consider creating batches in TNC layout during data batching step.
states: a list of two initial recurrent state tensors. Each has shape (num_layers, batch_size, num_hidden). If bidirectional is True, shape will instead be (2*num_layers, batch_size, num_hidden). If states is None, zeros will be used as default begin states.
- Outputs:
out: output tensor with shape (sequence_length, batch_size, num_hidden) when layout is “TNC”. If bidirectional is True, output shape will instead be (sequence_length, batch_size, 2*num_hidden)
out_states: a list of two output recurrent state tensors with the same shape as in states. If states is None out_states will not be returned.
Examples
>>> layer = mx.gluon.rnn.LSTM(100, 3)
>>> layer.initialize()
>>> input = mx.np.random.uniform(size=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.np.random.uniform(size=(3, 3, 100))
>>> c0 = mx.np.random.uniform(size=(3, 3, 100))
>>> output, hn = layer(input, [h0, c0])
-
class
LSTMCell
(hidden_size, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, activation='tanh', recurrent_activation='sigmoid')[source]¶ Bases:
mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell
Long-Short Term Memory (LSTM) network cell.
Each call computes the following function:
\[\begin{split}\begin{array}{ll} i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ f_t = sigmoid(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hc} h_{(t-1)} + b_{hg}) \\ o_t = sigmoid(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\ c_t = f_t * c_{(t-1)} + i_t * g_t \\ h_t = o_t * \tanh(c_t) \end{array}\end{split}\]Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.begin_state
([batch_size, func])Initial state for this cell.
cast
(dtype)Cast this Block to use another data type.
collect_params
([select])Returns a Dict containing this Block's and all of its children's Parameters (default); it can also return a selected Dict of Parameters that match the given regular expressions.
export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(inputs, states)Unrolls the recurrent cell for one time step.
hybridize
([active])Please refer to the description of HybridBlock hybridize().
infer_shape
(i, x, is_bidirect)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_child
(block[, name])Registers block as a child of self.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install callback monitor.
reset
()Reset before re-using the cell for another graph.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
state_info
([batch_size])shape and layout information of states
summary
(*inputs)Print the summary of the model’s output and parameters.
unroll
(length, inputs[, begin_state, …])Unrolls an RNN cell across time steps.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).where \(h_t\) is the hidden state at time t, \(c_t\) is the cell state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(i_t\), \(f_t\), \(g_t\), \(o_t\) are the input, forget, cell, and out gates, respectively.
- Parameters
hidden_size (int) – Number of units in output symbol.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.
h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.
input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.
activation (str, default 'tanh') – Activation type to use. See nd/symbol Activation for supported types.
recurrent_activation (str, default 'sigmoid') – Activation type to use for the recurrent step. See nd/symbol Activation for supported types.
Inputs –
data: input tensor with shape (batch_size, input_size).
states: a list of two initial recurrent state tensors. Each has shape (batch_size, num_hidden).
Outputs –
out: output tensor with shape (batch_size, num_hidden).
next_states: a list of two output recurrent state tensors. Each has the same shape as states.
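For instance, a minimal single-step sketch (shapes chosen arbitrarily for illustration):
import mxnet as mx

cell = mx.gluon.rnn.LSTMCell(100, input_size=10)
cell.initialize()
x = mx.np.random.uniform(size=(3, 10))   # (batch_size, input_size)
states = cell.begin_state(batch_size=3)  # [h0, c0], each of shape (3, 100)
out, next_states = cell(x, states)       # one time step; out has shape (3, 100)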
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
begin_state
(batch_size=0, func=<function zeros>, **kwargs)¶ Initial state for this cell.
- Parameters
func (callable, default symbol.zeros) –
Function for creating initial state.
For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.
For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.
batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.
**kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.
- Returns
states – Starting states for the first RNN step.
- Return type
nested list of Symbol
-
cast
(dtype)¶ Cast this Block to use another data type.
- Parameters
dtype (str or numpy.dtype) – The new data type.
-
collect_params
(select=None)¶ Returns a Dict containing this Block's and all of its children's Parameters (default); it can also return a selected Dict of Parameters that match the given regular expressions.
For example, collect the specified parameters in ['conv1.weight', 'conv1.bias', 'fc.weight', 'fc.bias']:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there is only one input, it will have the name data. When there is more than one input, they will be named data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
forward
(inputs, states)[source]¶ Unrolls the recurrent cell for one time step.
- Parameters
inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size * num_units).
states (list of sym.Variable) – RNN state from previous step or the output of begin_state().
- Returns
output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.
See also
begin_state()
This function can provide the states for the first time step.
unroll()
This function unrolls an RNN for a given number of (>=1) time steps.
-
hybridize
(active=True, **kwargs)¶ Please refer to the description of HybridBlock hybridize().
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block's parameter UUIDs to what they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order as when the model was saved. This is because each Block is uniquely identified by its class name and a unique ID assigned in order (since it's an OrderedDict), and that unique ID is used to denote the specific Block.
Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – Must be one of {'current', 'saved'}. Only valid if cast_dtype=True; specifies the source of the dtype used for casting the parameters.
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – Must be one of {'current', 'saved'}. Only valid if cast_dtype=True; specifies the source of the dtype used for casting the parameters.
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize; afterwards, export can be called or inference can be run. See example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')
# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic shape op exists.
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_child
(block, name=None)¶ Registers block as a child of self.
Block
s assigned to self as attributes will be registered automatically.
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input.- Parameters
hook (callable) – The forward pre-hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install callback monitor.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset
()¶ Reset before re-using the cell for another graph.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
Block.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block's children in order (since it's an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
-
share_parameters
(shared)¶ Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:
dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
- which is equivalent to
dense1.weight = dense0.weight
dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
unroll
(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)¶ Unrolls an RNN cell across time steps.
- Parameters
length (int) – Number of steps to unroll.
inputs (Symbol, list of Symbol, or None) –
If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.
If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).
begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.
layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.
merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.
valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.
- Returns
outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
LSTMPCell
(hidden_size, projection_size, i2h_weight_initializer=None, h2h_weight_initializer=None, h2r_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0)[source]¶ Bases:
mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell
Long-Short Term Memory Projected (LSTMP) network cell. (https://arxiv.org/abs/1402.1128)
Each call computes the following function:
\[\begin{split}\begin{array}{ll} i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{ri} r_{(t-1)} + b_{ri}) \\ f_t = sigmoid(W_{if} x_t + b_{if} + W_{rf} r_{(t-1)} + b_{rf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{rc} r_{(t-1)} + b_{rg}) \\ o_t = sigmoid(W_{io} x_t + b_{io} + W_{ro} r_{(t-1)} + b_{ro}) \\ c_t = f_t * c_{(t-1)} + i_t * g_t \\ h_t = o_t * \tanh(c_t) \\ r_t = W_{hr} h_t \end{array}\end{split}\]Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.begin_state
([batch_size, func])Initial state for this cell.
cast
(dtype)Cast this Block to use another data type.
collect_params
([select])Returns a Dict containing this Block's and all of its children's Parameters (default); it can also return a selected Dict of Parameters that match the given regular expressions.
export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(inputs, states)Unrolls the recurrent cell for one time step.
hybridize
([active])Please refer to the description of HybridBlock hybridize().
infer_shape
(i, x, is_bidirect)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_child
(block[, name])Registers block as a child of self.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install callback monitor.
reset
()Reset before re-using the cell for another graph.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
state_info
([batch_size])shape and layout information of states
summary
(*inputs)Print the summary of the model’s output and parameters.
unroll
(length, inputs[, begin_state, …])Unrolls an RNN cell across time steps.
Sets all Parameters’ gradient buffer to 0.
Attributes
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).where \(r_t\) is the projected recurrent activation at time t, \(h_t\) is the hidden state at time t, \(c_t\) is the cell state at time t, \(x_t\) is the input at time t, and \(i_t\), \(f_t\), \(g_t\), \(o_t\) are the input, forget, cell, and out gates, respectively.
- Parameters
hidden_size (int) – Number of units in cell state symbol.
projection_size (int) – Number of units in output symbol.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the hidden state.
h2r_weight_initializer (str or Initializer) – Initializer for the projection weights matrix, used for the linear transformation of the recurrent state.
i2h_bias_initializer (str or Initializer, default 'lstmbias') – Initializer for the bias vector. By default, bias for the forget gate is initialized to 1 while all other biases are initialized to zero.
h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
Inputs –
data: input tensor with shape (batch_size, input_size).
states: a list of two initial recurrent state tensors, with shape (batch_size, projection_size) and (batch_size, hidden_size) respectively.
Outputs –
out: output tensor with shape (batch_size, projection_size).
next_states: a list of two output recurrent state tensors. Each has the same shape as states.
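For instance, a minimal single-step sketch (shapes chosen arbitrarily for illustration); note that the output and the first state carry projection_size features:
import mxnet as mx

cell = mx.gluon.rnn.LSTMPCell(hidden_size=100, projection_size=30, input_size=10)
cell.initialize()
x = mx.np.random.uniform(size=(3, 10))   # (batch_size, input_size)
states = cell.begin_state(batch_size=3)  # [projected state (3, 30), cell state (3, 100)]
out, next_states = cell(x, states)       # out has shape (3, 30)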
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
begin_state
(batch_size=0, func=<function zeros>, **kwargs)¶ Initial state for this cell.
- Parameters
func (callable, default symbol.zeros) –
Function for creating initial state.
For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.
For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.
batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.
**kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.
- Returns
states – Starting states for the first RNN step.
- Return type
nested list of Symbol
-
cast
(dtype)¶ Cast this Block to use another data type.
- Parameters
dtype (str or numpy.dtype) – The new data type.
-
collect_params
(select=None)¶ Returns a Dict containing this Block's and all of its children's Parameters (default); it can also return a selected Dict of Parameters that match the given regular expressions.
For example, collect the specified parameters in ['conv1.weight', 'conv1.bias', 'fc.weight', 'fc.bias']:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there is only one input, it will have the name data. When there is more than one input, they will be named data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
forward
(inputs, states)[source]¶ Unrolls the recurrent cell for one time step.
- Parameters
inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size * num_units).
states (list of sym.Variable) – RNN state from previous step or the output of begin_state().
- Returns
output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.
See also
begin_state()
This function can provide the states for the first time step.
unroll()
This function unrolls an RNN for a given number of (>=1) time steps.
-
hybridize
(active=True, **kwargs)¶ Please refer to the description of HybridBlock hybridize().
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block's parameter UUIDs to what they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order as when the model was saved. This is because each Block is uniquely identified by its class name and a unique ID assigned in order (since it's an OrderedDict), and that unique ID is used to denote the specific Block.
Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – Must be one of {'current', 'saved'}. Only valid if cast_dtype=True; specifies the source of the dtype used for casting the parameters.
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – Must be one of {'current', 'saved'}. Only valid if cast_dtype=True; specifies the source of the dtype used for casting the parameters.
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize; afterwards, export can be called or inference can be run. See example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')
# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic shape op exists.
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_child
(block, name=None)¶ Registers block as a child of self.
Block
s assigned to self as attributes will be registered automatically.
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input.- Parameters
hook (callable) – The forward pre-hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install callback monitor.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: the name of the tensor being inspected (str), the name of the operator producing or consuming that tensor (str), and the tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset
()¶ Reset before re-using the cell for another graph.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
Block.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block's children in order (since it's an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
-
share_parameters
(shared)¶ Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:
dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
- which is equivalent to
dense1.weight = dense0.weight
dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
unroll
(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)¶ Unrolls an RNN cell across time steps.
- Parameters
length (int) – Number of steps to unroll.
inputs (Symbol, list of Symbol, or None) –
If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.
If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).
begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.
layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.
merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.
valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.
- Returns
outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
ModifierCell
(base_cell)[source]¶ Bases:
mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell
Base class for modifier cells. A modifier cell takes a base cell, applies modifications to it (e.g. Zoneout), and returns a new cell.
After applying modifiers, the base cell should no longer be called directly; the modifier cell should be used instead.
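For example, a minimal sketch that wraps an LSTMCell with ZoneoutCell (one of the modifier cells in this module); the sizes and the zoneout rate are illustrative:
import mxnet as mx

base = mx.gluon.rnn.LSTMCell(100, input_size=10)
cell = mx.gluon.rnn.ZoneoutCell(base, zoneout_states=0.1)  # use the modifier, not the base cell, from here on
cell.initialize()
x = mx.np.random.uniform(size=(3, 10))
out, states = cell(x, cell.begin_state(batch_size=3))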
Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.begin_state
([func])Initial state for this cell.
cast
(dtype)Cast this Block to use another data type.
collect_params
([select])Returns a Dict containing this Block's and all of its children's Parameters (default); it can also return a selected Dict of Parameters that match the given regular expressions.
export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(inputs, states)Unrolls the recurrent cell for one time step.
hybridize
([active])Please refer to the description of HybridBlock hybridize().
infer_shape
(*args)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_child
(block[, name])Registers block as a child of self.
register_forward_hook
(hook)Registers a forward hook on the block.
Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install callback monitor.
reset
()Reset before re-using the cell for another graph.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
state_info
([batch_size])shape and layout information of states
summary
(*inputs)Print the summary of the model’s output and parameters.
unroll
(length, inputs[, begin_state, …])Unrolls an RNN cell across time steps.
Sets all Parameters’ gradient buffer to 0.
Attributes
Return an attribute of instance, which is of type owner.
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
begin_state
(func=<function zeros>, **kwargs)[source]¶ Initial state for this cell.
- Parameters
func (callable, default symbol.zeros) –
Function for creating initial state.
For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.
For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.
batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.
**kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.
- Returns
states – Starting states for the first RNN step.
- Return type
nested list of Symbol
-
cast
(dtype)¶ Cast this Block to use another data type.
- Parameters
dtype (str or numpy.dtype) – The new data type.
-
collect_params
(select=None)¶ Returns a Dict containing this Block's and all of its children's Parameters (default); it can also return a selected Dict of Parameters that match the given regular expressions.
For example, collect the specified parameters in ['conv1.weight', 'conv1.bias', 'fc.weight', 'fc.bias']:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there is only one input, it will have the name data. When there is more than one input, they will be named data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
forward
(inputs, states)[source]¶ Unrolls the recurrent cell for one time step.
- Parameters
inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size * num_units).
states (list of sym.Variable) – RNN state from previous step or the output of begin_state().
- Returns
output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.
See also
begin_state()
This function can provide the states for the first time step.
unroll()
This function unrolls an RNN for a given number of (>=1) time steps.
-
hybridize
(active=True, **kwargs)¶ Please refer to the description of HybridBlock hybridize().
-
infer_shape
(*args)¶ Infers shape of Parameters from inputs.
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
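For illustration, a minimal sketch of initializing a cell with an explicit initializer and device; the cell type and sizes are arbitrary choices, not part of the original reference.
import mxnet as mx
cell = mx.gluon.rnn.LSTMCell(64, input_size=32)
# Xavier is used for every Parameter whose own init is not set; force_reinit
# permits re-initialization even if the Parameters were initialized before.
cell.initialize(init=mx.init.Xavier(), device=mx.cpu(), force_reinit=True)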
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order as when the model was saved. This is because each Block is uniquely identified by its class name and a unique ID in order (since it’s an OrderedDict), and that unique ID is used to denote the specific Block.
Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')
# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic shape op exists.
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Return an attribute of instance, which is of type owner.
-
register_child
(block, name=None)¶ Registers block as a child of self.
Block
s assigned to self as attributes will be registered automatically.
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
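A minimal hook sketch (the cell and sizes are arbitrary): the hook only observes the call and must not modify its arguments; the returned handle can detach the hook later.
import mxnet as mx
def inspect_forward(block, inputs, output):
    # Called right after forward(); for a recurrent cell, output is an (output, states) pair.
    print(type(block).__name__, 'finished one forward call')
cell = mx.gluon.rnn.RNNCell(32, input_size=16)
cell.initialize()
handle = cell.register_forward_hook(inspect_forward)
# ... run forward passes ...
handle.detach()   # remove the hook when it is no longer needed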
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install callback monitor.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
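A minimal monitor sketch, assuming the block is hybridized before the callback can fire (the cell and sizes are arbitrary):
import mxnet as mx
def monitor(name, op_name, array):
    # name of the inspected tensor, the operator touching it, and the tensor itself
    print(name, op_name, array.shape)
cell = mx.gluon.rnn.LSTMCell(16, input_size=8)
cell.initialize()
cell.hybridize()
cell.register_op_hook(monitor, monitor_all=True)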
-
reset
()¶ Reset before re-using the cell for another graph.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
Block.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since it’s an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
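A minimal save/load round trip; the file name is hypothetical, and the architecture must be rebuilt in code before loading, since only parameters are stored.
import mxnet as mx
net = mx.gluon.rnn.LSTMCell(32, input_size=16)
net.initialize()
net.save_parameters('lstm_cell.params')        # weights only, no architecture
# Later: recreate the same architecture, then load the weights into it.
net2 = mx.gluon.rnn.LSTMCell(32, input_size=16)
net2.load_parameters('lstm_cell.params')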
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
-
share_parameters
(shared)¶ Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:
dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
- which is equivalent to
dense1.weight = dense0.weight
dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
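A minimal sketch, mirroring the RNN layer example later in this reference (shapes are arbitrary); the block is initialized but not hybridized when summary is called.
import mxnet as mx
layer = mx.gluon.rnn.RNN(100, 3, input_size=10)
layer.initialize()
x = mx.np.random.uniform(size=(5, 3, 10))   # (sequence_length, batch_size, input_size)
layer.summary(x)                            # prints per-block output shapes and parameter counts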
-
unroll
(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)¶ Unrolls an RNN cell across time steps.
- Parameters
length (int) – Number of steps to unroll.
inputs (Symbol, list of Symbol, or None) –
If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.
If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).
begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.
layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.
merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.
valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.
- Returns
outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().
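For reference, a minimal sketch of unrolling a bidirectional cell over a whole sequence with NDArray inputs (all sizes are arbitrary; stepping a bidirectional cell one step at a time is not its typical use).
import mxnet as mx
bi_cell = mx.gluon.rnn.BidirectionalCell(
    mx.gluon.rnn.LSTMCell(20, input_size=10),
    mx.gluon.rnn.LSTMCell(20, input_size=10))
bi_cell.initialize()
seq = mx.np.random.uniform(size=(4, 7, 10))   # (batch_size, length, input_size) for 'NTC'
outputs, states = bi_cell.unroll(7, seq, layout='NTC', merge_outputs=True)
# outputs has shape (4, 7, 40): forward and backward outputs concatenated per step.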
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
-
class
RNN
(hidden_size, num_layers=1, activation='relu', layout='TNC', dropout=0, bidirectional=False, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, dtype='float32', **kwargs)[source]¶ Bases:
mxnet.gluon.rnn.rnn_layer._RNNLayer
Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.
For each element in the input sequence, each layer computes the following function:
\[h_t = \tanh(w_{ih} * x_t + b_{ih} + w_{hh} * h_{(t-1)} + b_{hh})\]where \(h_t\) is the hidden state at time t, and \(x_t\) is the output of the previous layer at time t or \(input_t\) for the first layer. If activation='relu', then ReLU is used instead of tanh.
- Parameters
hidden_size (int) – The number of features in the hidden state h.
num_layers (int, default 1) – Number of recurrent layers.
activation ({'relu' or 'tanh'}, default 'relu') – The activation function to use.
layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.
dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer.
bidirectional (bool, default False) – If True, becomes a bidirectional RNN.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
i2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.
dtype (str, default 'float32') – Type to initialize the parameters and default states to
- Inputs:
data: input tensor with shape (sequence_length, batch_size, input_size) when layout is “TNC”. For other layouts, dimensions are permuted accordingly using transpose() operator which adds performance overhead. Consider creating batches in TNC layout during data batching step.
states: initial recurrent state tensor with shape (num_layers, batch_size, num_hidden). If bidirectional is True, shape will instead be (2*num_layers, batch_size, num_hidden). If states is None, zeros will be used as default begin states.
- Outputs:
out: output tensor with shape (sequence_length, batch_size, num_hidden) when layout is “TNC”. If bidirectional is True, output shape will instead be (sequence_length, batch_size, 2*num_hidden)
out_states: output recurrent state tensor with the same shape as states. If states is None out_states will not be returned.
Examples
>>> layer = mx.gluon.rnn.RNN(100, 3)
>>> layer.initialize()
>>> input = mx.np.random.uniform(size=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.np.random.uniform(size=(3, 3, 100))
>>> output, hn = layer(input, h0)
-
class
RNNCell
(hidden_size, activation='tanh', i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0)[source]¶ Bases:
mxnet.gluon.rnn.rnn_cell.HybridRecurrentCell
Elman RNN recurrent neural network cell.
Each call computes the following function:
\[h_t = \tanh(w_{ih} * x_t + b_{ih} + w_{hh} * h_{(t-1)} + b_{hh})\]Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.begin_state
([batch_size, func])Initial state for this cell.
cast
(dtype)Cast this Block to use another data type.
collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters (by default); it can also return aDict
containing only the Parameters that match the given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(inputs, states)Unrolls the recurrent cell for one time step.
hybridize
([active])Please refer description of HybridBlock hybridize().
infer_shape
(i, x, is_bidirect)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_child
(block[, name])Registers block as a child of self.
register_forward_hook
(hook)Registers a forward hook on the block.
register_forward_pre_hook
(hook)Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install callback monitor.
reset
()Reset before re-using the cell for another graph.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
state_info
([batch_size])shape and layout information of states
summary
(*inputs)Print the summary of the model’s output and parameters.
unroll
(length, inputs[, begin_state, …])Unrolls an RNN cell across time steps.
zero_grad
()Sets all Parameters’ gradient buffer to 0.
Attributes
params
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).where \(h_t\) is the hidden state at time t, and \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer. If activation='relu', then ReLU is used instead of tanh.
- Parameters
hidden_size (int) – Number of units in output symbol
activation (str or Symbol, default 'tanh') – Type of activation function.
i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
i2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.
h2h_bias_initializer (str or Initializer, default 'zeros') – Initializer for the bias vector.
input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.
- Inputs:
data: input tensor with shape (batch_size, input_size).
states: a list of one initial recurrent state tensor with shape (batch_size, num_hidden).
- Outputs:
out: output tensor with shape (batch_size, num_hidden).
next_states: a list of one output recurrent state tensor with the same shape as states.
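A minimal single-step sketch (all sizes are arbitrary; input_size is given so parameter shapes are known before initialization):
import mxnet as mx
cell = mx.gluon.rnn.RNNCell(100, input_size=50)
cell.initialize()
x = mx.np.random.uniform(size=(32, 50))     # (batch_size, input_size)
states = cell.begin_state(batch_size=32)    # a list with one state of shape (32, 100)
output, states = cell(x, states)            # output: (32, 100)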
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
begin_state
(batch_size=0, func=<function zeros>, **kwargs)¶ Initial state for this cell.
- Parameters
func (callable, default symbol.zeros) –
Function for creating initial state.
For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.
For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.
batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.
**kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.
- Returns
states – Starting states for the first RNN step.
- Return type
nested list of Symbol
-
cast
(dtype)¶ Cast this Block to use another data type.
- Parameters
dtype (str or numpy.dtype) – The new data type.
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters (by default); it can also return aDict
containing only the Parameters that match the given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there is only one input, it will be named data. When there are multiple inputs, they will be named data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
forward
(inputs, states)[source]¶ Unrolls the recurrent cell for one time step.
- Parameters
inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).
states (list of sym.Variable) – RNN state from previous step or the output of begin_state().
- Returns
output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.
See also
begin_state()
This function can provide the states for the first time step.
unroll()
This function unrolls an RNN for a given number of (>=1) time steps.
-
hybridize
(active=True, **kwargs)¶ Please refer description of HybridBlock hybridize().
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order as when the model was saved. This is because each Block is uniquely identified by its class name and a unique ID in order (since it’s an OrderedDict), and that unique ID is used to denote the specific Block.
Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')
# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default True) – Whether to partition the graph when a dynamic shape op exists.
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_child
(block, name=None)¶ Registers block as a child of self.
Block
s assigned to self as attributes will be registered automatically.
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install callback monitor.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset
()¶ Reset before re-using the cell for another graph.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
Block.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since it’s an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
-
share_parameters
(shared)¶ Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:
dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
- which is equivalent to
dense1.weight = dense0.weight
dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
unroll
(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)¶ Unrolls an RNN cell across time steps.
- Parameters
length (int) – Number of steps to unroll.
inputs (Symbol, list of Symbol, or None) –
If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.
If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).
begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.
layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.
merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.
valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.
- Returns
outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
RecurrentCell
[source]¶ Bases:
mxnet.gluon.block.Block
Abstract base class for RNN cells
Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.begin_state
([batch_size, func])Initial state for this cell.
cast
(dtype)Cast this Block to use another data type.
collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters (by default); it can also return aDict
containing only the Parameters that match the given regular expressions.forward
(inputs, states)Unrolls the recurrent cell for one time step.
hybridize
([active])Please refer description of HybridBlock hybridize().
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
register_child
(block[, name])Registers block as a child of self.
register_forward_hook
(hook)Registers a forward hook on the block.
register_forward_pre_hook
(hook)Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install callback monitor.
reset
()Reset before re-using the cell for another graph.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
state_info
([batch_size])shape and layout information of states
summary
(*inputs)Print the summary of the model’s output and parameters.
unroll
(length, inputs[, begin_state, …])Unrolls an RNN cell across time steps.
zero_grad
()Sets all Parameters’ gradient buffer to 0.
Attributes
params
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
begin_state
(batch_size=0, func=<function zeros>, **kwargs)[source]¶ Initial state for this cell.
- Parameters
func (callable, default symbol.zeros) –
Function for creating initial state.
For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.
For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.
batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.
**kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.
- Returns
states – Starting states for the first RNN step.
- Return type
nested list of Symbol
-
cast
(dtype)¶ Cast this Block to use another data type.
- Parameters
dtype (str or numpy.dtype) – The new data type.
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters (by default); it can also return aDict
containing only the Parameters that match the given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
forward
(inputs, states)[source]¶ Unrolls the recurrent cell for one time step.
- Parameters
inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).
states (list of sym.Variable) – RNN state from previous step or the output of begin_state().
- Returns
output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.
See also
begin_state()
This function can provide the states for the first time step.
unroll()
This function unrolls an RNN for a given number of (>=1) time steps.
-
hybridize
(active=True, **kwargs)¶ Please refer description of HybridBlock hybridize().
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order as when the model was saved. This is because each Block is uniquely identified by its class name and a unique ID in order (since it’s an OrderedDict), and that unique ID is used to denote the specific Block.
Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
property
params
¶ Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
-
register_child
(block, name=None)¶ Registers block as a child of self.
Block
s assigned to self as attributes will be registered automatically.
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before
forward()
. It should not modify the input or output.- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install callback monitor.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to
Block.reset_device
.
-
reset_device
(device)¶ Re-assign all Parameters to other devices.
- Parameters
device (Device or list of Device, default
device.current_device()
.) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since it’s an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use
HybridBlock.export()
.- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
-
share_parameters
(shared)¶ Share parameters recursively inside the model.
For example, if you want
dense1
to sharedense0
’s weights, you can do:
dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
- which is equivalent to
dense1.weight = dense0.weight
dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only
mxnet.ndarray.NDArray
is supported.
-
unroll
(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]¶ Unrolls an RNN cell across time steps.
- Parameters
length (int) – Number of steps to unroll.
inputs (Symbol, list of Symbol, or None) –
If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.
If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).
begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.
layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.
merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.
valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.
- Returns
outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
-
class
ResidualCell
(base_cell)[source]¶ Bases:
mxnet.gluon.rnn.rnn_cell.ModifierCell
Adds residual connection as described in Wu et al, 2016 (https://arxiv.org/abs/1609.08144). Output of the cell is output of the base cell plus input.
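A minimal single-step sketch (all sizes are arbitrary): because the cell output is added to the input, the base cell's hidden size must equal its input size.
import mxnet as mx
base = mx.gluon.rnn.GRUCell(50, input_size=50)
res_cell = mx.gluon.rnn.ResidualCell(base)
res_cell.initialize()
x = mx.np.random.uniform(size=(8, 50))
states = [mx.np.zeros((8, 50))]             # a single GRU state of shape (batch_size, hidden_size)
output, states = res_cell(x, states)        # output = base cell output + x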
Methods
apply
(fn)Applies
fn
recursively to every child block as well as self.cast
(dtype)Cast this Block to use another data type.
collect_params
([select])Returns a
Dict
containing thisBlock
and all of its children’s Parameters (by default); it can also return aDict
containing only the Parameters that match the given regular expressions.export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(inputs, states)Unrolls the recurrent cell for one time step.
hybridize
([active])Please refer description of HybridBlock hybridize().
infer_shape
(i, x, is_bidirect)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes
Parameter
s of thisBlock
and its children.load
(prefix)Load a model saved using the save API
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_child
(block[, name])Registers block as a child of self.
register_forward_hook
(hook)Registers a forward hook on the block.
register_forward_pre_hook
(hook)Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install callback monitor.
reset
()Reset before re-using the cell for another graph.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
unroll
(length, inputs[, begin_state, …])Unrolls an RNN cell across time steps.
zero_grad
()Sets all Parameters’ gradient buffer to 0.
Attributes
params
Return an attribute of instance, which is of type owner.
-
apply
(fn)¶ Applies
fn
recursively to every child block as well as self.- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
cast
(dtype)¶ Cast this Block to use another data type.
- Parameters
dtype (str or numpy.dtype) – The new data type.
-
collect_params
(select=None)¶ Returns a
Dict
containing thisBlock
and all of its children’s Parameters (by default); it can also return aDict
containing only the Parameters that match the given regular expressions.For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there is only one input, it will be named data. When there are multiple inputs, they will be named data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
forward
(inputs, states)[source]¶ Unrolls the recurrent cell for one time step.
- Parameters
inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).
states (list of sym.Variable) – RNN state from previous step or the output of begin_state().
- Returns
output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.
See also
begin_state()
This function can provide the states for the first time step.
unroll()
This function unrolls an RNN for a given number of (>=1) time steps.
-
hybridize
(active=True, **kwargs)¶ Please refer description of HybridBlock hybridize().
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes
Parameter
s of thisBlock
and its children.- Parameters
init (Initializer) – Global default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order as when the model was saved. This is because each Block is uniquely identified by its class name and a unique ID in order (since it’s an OrderedDict), and that unique ID is used to denote the specific Block.
Assumes that the model is created in an identical order every time. If the model cannot be recreated deterministically, do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')
# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default True) – whether to partition the graph when a dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Return an attribute of instance, which is of type owner.
-
register_child
(block, name=None)¶ Registers block as a child of self.
Blocks assigned to self as attributes will be registered automatically.
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after forward(). It should not modify the input or output.
- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before forward(). It should not modify the input or output.
- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install callback monitor.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset
()¶ Reset before re-using the cell for another graph.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to Block.reset_device.
-
reset_device
(device)¶ Re-assign all Parameters to other devices.
- Parameters
device (Device or list of Device, default device.current_device()) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since it’s an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().
- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
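A minimal parameter-only round trip, as a sketch (the filename and sizes are arbitrary; MXNet 2.x assumed — the structure must be re-created by the user before loading):
import mxnet as mx
from mxnet.gluon import rnn
cell = rnn.LSTMCell(20, input_size=10)
cell.initialize()
cell.save_parameters('lstm_cell.params')        # parameters only; no architecture is stored
same_cell = rnn.LSTMCell(20, input_size=10)     # re-create the same structure
same_cell.load_parameters('lstm_cell.params')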
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
-
share_parameters
(shared)¶ Share parameters recursively inside the model.
For example, if you want dense1 to share dense0’s weights, you can do:
dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which is equivalent to:
dense1.weight = dense0.weight
dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.
-
unroll
(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]¶ Unrolls an RNN cell across time steps.
- Parameters
length (int) – Number of steps to unroll.
inputs (Symbol, list of Symbol, or None) –
If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.
If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).
begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.
layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.
merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.
valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.
- Returns
outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is the same as the output of begin_state().
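As a sketch of the unrolling workflow for this bidirectional cell (all sizes are arbitrary and an MXNet 2.x np-based environment is assumed):
import mxnet as mx
from mxnet.gluon import rnn
bi_cell = rnn.BidirectionalCell(rnn.LSTMCell(20, input_size=10), rnn.LSTMCell(20, input_size=10))
bi_cell.initialize()
inputs = [mx.np.random.uniform(size=(4, 10)) for _ in range(5)]   # one (batch_size, features) array per step
outputs, states = bi_cell.unroll(5, inputs, merge_outputs=True)
# outputs: (4, 5, 40) -- forward and backward outputs concatenated along the feature axis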
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
-
class
SequentialRNNCell
[source]¶ Bases:
mxnet.gluon.rnn.rnn_cell.RecurrentCell
Sequentially stacking multiple RNN cells.
Methods
add
(cell)Appends a cell into the stack.
apply
(fn)Applies fn recursively to every child block as well as self.
begin_state
(**kwargs)Initial state for this cell.
cast
(dtype)Cast this Block to use another data type.
collect_params
([select])Returns a Dict containing this Block’s and all of its children’s Parameters (default); it can also return a selected Dict that matches given regular expressions.
forward
(*args, **kwargs)Unrolls the recurrent cell for one time step.
hybridize
([active])Please refer to the description of HybridBlock.hybridize().
initialize
([init, device, verbose, force_reinit])Initializes Parameters of this Block and its children.
load
(prefix)Load a model saved using the save API.
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
register_child
(block[, name])Registers block as a child of self.
register_forward_hook
(hook)Registers a forward hook on the block.
register_forward_pre_hook
(hook)Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install callback monitor.
reset
()Reset before re-using the cell for another graph.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
state_info
([batch_size])shape and layout information of states
summary
(*inputs)Print the summary of the model’s output and parameters.
unroll
(length, inputs[, begin_state, …])Unrolls an RNN cell across time steps.
zero_grad
()Sets all Parameters’ gradient buffer to 0.
Attributes
params
Returns this Block’s parameter dictionary (does not include its children’s parameters).
add
(cell)[source]¶ Appends a cell into the stack.
- Parameters
cell (RecurrentCell) – The cell to add.
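A small sketch of stacking cells with add() and unrolling the stack (all sizes are arbitrary; MXNet 2.x assumed):
import mxnet as mx
from mxnet.gluon import rnn
stacked = rnn.SequentialRNNCell()
stacked.add(rnn.LSTMCell(20, input_size=10))   # first layer consumes the 10-dim input
stacked.add(rnn.GRUCell(20, input_size=20))    # second layer consumes the first layer's output
stacked.initialize()
inputs = [mx.np.random.uniform(size=(4, 10)) for _ in range(5)]   # one (batch_size, features) array per step
outputs, states = stacked.unroll(5, inputs, merge_outputs=True)   # outputs: (4, 5, 20)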
-
apply
(fn)¶ Applies fn recursively to every child block as well as self.
- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
begin_state
(**kwargs)[source]¶ Initial state for this cell.
- Parameters
func (callable, default symbol.zeros) –
Function for creating initial state.
For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.
For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.
batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.
**kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.
- Returns
states – Starting states for the first RNN step.
- Return type
nested list of Symbol
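Manual stepping with begin_state() looks roughly like this (a sketch; sizes arbitrary, MXNet 2.x assumed):
import mxnet as mx
from mxnet.gluon import rnn
stacked = rnn.SequentialRNNCell()
stacked.add(rnn.LSTMCell(20, input_size=10))
stacked.initialize()
states = stacked.begin_state(batch_size=4)      # zero-filled starting states
x = mx.np.random.uniform(size=(4, 10))
output, states = stacked(x, states)             # one manual time step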
-
cast
(dtype)¶ Cast this Block to use another data type.
- Parameters
dtype (str or numpy.dtype) – The new data type.
-
collect_params
(select=None)¶ Returns a Dict containing this Block’s and all of its children’s Parameters (default); it can also return a selected Dict that matches given regular expressions.
For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
forward
(*args, **kwargs)[source]¶ Unrolls the recurrent cell for one time step.
- Parameters
inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).
states (list of sym.Variable) – RNN state from previous step or the output of begin_state().
- Returns
output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is the same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.
See also
begin_state()
This function can provide the states for the first time step.
unroll()
This function unrolls an RNN for a given number of (>=1) time steps.
-
hybridize
(active=True, **kwargs)¶ Please refer to the description of HybridBlock.hybridize().
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes Parameters of this Block and its children.
- Parameters
init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.
device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since it’s an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
property
params
¶ Returns this Block’s parameter dictionary (does not include its children’s parameters).
-
register_child
(block, name=None)¶ Registers block as a child of self.
Blocks assigned to self as attributes will be registered automatically.
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after forward(). It should not modify the input or output.
- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before forward(). It should not modify the input or output.
- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install callback monitor.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset
()¶ Reset before re-using the cell for another graph.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to Block.reset_device.
-
reset_device
(device)¶ Re-assign all Parameters to other devices.
- Parameters
device (Device or list of Device, default device.current_device()) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since it’s an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().
- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
-
share_parameters
(shared)¶ Share parameters recursively inside the model.
For example, if you want dense1 to share dense0’s weights, you can do:
dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which is equivalent to:
dense1.weight = dense0.weight
dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.
-
unroll
(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]¶ Unrolls an RNN cell across time steps.
- Parameters
length (int) – Number of steps to unroll.
inputs (Symbol, list of Symbol, or None) –
If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.
If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).
begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.
layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.
merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.
valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.
- Returns
outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is the same as the output of begin_state().
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
-
class
VariationalDropoutCell
(base_cell, drop_inputs=0.0, drop_states=0.0, drop_outputs=0.0)[source]¶ Bases:
mxnet.gluon.rnn.rnn_cell.ModifierCell
Applies Variational Dropout on base cell. https://arxiv.org/pdf/1512.05287.pdf
Variational dropout uses the same dropout mask across time-steps. It can be applied to RNN inputs, outputs, and states. The masks for them are not shared.
The dropout mask is initialized when stepping forward for the first time and will remain the same until .reset() is called. Thus, when using the cell and stepping manually without calling .unroll(), .reset() should be called after each sequence.
- Parameters
base_cell (RecurrentCell) – The cell on which to perform variational dropout.
drop_inputs (float, default 0.) – The dropout rate for inputs. Won’t apply dropout if it equals 0.
drop_states (float, default 0.) – The dropout rate for state inputs on the first state channel. Won’t apply dropout if it equals 0.
drop_outputs (float, default 0.) – The dropout rate for outputs. Won’t apply dropout if it equals 0.
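A short sketch of wrapping a base cell with variational dropout (rates and sizes are arbitrary; MXNet 2.x assumed). Since the dropout masks persist until reset(), call reset() between sequences when stepping manually:
import mxnet as mx
from mxnet.gluon import rnn
vd_cell = rnn.VariationalDropoutCell(rnn.LSTMCell(20, input_size=10), drop_inputs=0.2, drop_states=0.2)
vd_cell.initialize()
inputs = [mx.np.random.uniform(size=(4, 10)) for _ in range(5)]   # one (batch_size, features) array per step
outputs, states = vd_cell.unroll(5, inputs, merge_outputs=True)
vd_cell.reset()   # sample fresh dropout masks before the next sequence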
Methods
apply
(fn)Applies fn recursively to every child block as well as self.
cast
(dtype)Cast this Block to use another data type.
collect_params
([select])Returns a Dict containing this Block’s and all of its children’s Parameters (default); it can also return a selected Dict that matches given regular expressions.
export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(inputs, states)Unrolls the recurrent cell for one time step.
hybridize
([active])Please refer to the description of HybridBlock.hybridize().
infer_shape
(i, x, is_bidirect)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes Parameters of this Block and its children.
load
(prefix)Load a model saved using the save API.
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_child
(block[, name])Registers block as a child of self.
register_forward_hook
(hook)Registers a forward hook on the block.
register_forward_pre_hook
(hook)Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install callback monitor.
reset
()Reset before re-using the cell for another graph.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
unroll
(length, inputs[, begin_state, …])Unrolls an RNN cell across time steps.
zero_grad
()Sets all Parameters’ gradient buffer to 0.
Attributes
params
Return an attribute of instance, which is of type owner.
-
apply
(fn)¶ Applies fn recursively to every child block as well as self.
- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
cast
(dtype)¶ Cast this Block to use another data type.
- Parameters
dtype (str or numpy.dtype) – The new data type.
-
collect_params
(select=None)¶ Returns a Dict containing this Block’s and all of its children’s Parameters (default); it can also return a selected Dict that matches given regular expressions.
For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there is only one input, it will be named data. When there is more than one input, they will be named data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
forward
(inputs, states)[source]¶ Unrolls the recurrent cell for one time step.
- Parameters
inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).
states (list of sym.Variable) – RNN state from previous step or the output of begin_state().
- Returns
output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is the same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.
See also
begin_state()
This function can provide the states for the first time step.
unroll()
This function unrolls an RNN for a given number of (>=1) time steps.
-
hybridize
(active=True, **kwargs)¶ Please refer to the description of HybridBlock.hybridize().
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes Parameters of this Block and its children.
- Parameters
init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.
device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since it’s an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')
# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default True) – whether to partition the graph when a dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Return an attribute of instance, which is of type owner.
-
register_child
(block, name=None)¶ Registers block as a child of self.
Blocks assigned to self as attributes will be registered automatically.
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after forward(). It should not modify the input or output.
- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before forward(). It should not modify the input or output.
- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install callback monitor.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to Block.reset_device.
-
reset_device
(device)¶ Re-assign all Parameters to other devices.
- Parameters
device (Device or list of Device, default device.current_device()) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since it’s an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().
- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
-
share_parameters
(shared)¶ Share parameters recursively inside the model.
For example, if you want dense1 to share dense0’s weights, you can do:
dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which is equivalent to:
dense1.weight = dense0.weight
dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.
-
unroll
(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)[source]¶ Unrolls an RNN cell across time steps.
- Parameters
length (int) – Number of steps to unroll.
inputs (Symbol, list of Symbol, or None) –
If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.
If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).
begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.
layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.
merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.
valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.
- Returns
outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is the same as the output of begin_state().
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-
class
ZoneoutCell
(base_cell, zoneout_outputs=0.0, zoneout_states=0.0)[source]¶ Bases:
mxnet.gluon.rnn.rnn_cell.ModifierCell
Applies Zoneout on base cell.
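A short sketch of wrapping a base cell with zoneout (rates and sizes are arbitrary; MXNet 2.x assumed):
import mxnet as mx
from mxnet.gluon import rnn
z_cell = rnn.ZoneoutCell(rnn.LSTMCell(20, input_size=10), zoneout_outputs=0.1, zoneout_states=0.1)
z_cell.initialize()
inputs = [mx.np.random.uniform(size=(4, 10)) for _ in range(5)]   # one (batch_size, features) array per step
outputs, states = z_cell.unroll(5, inputs, merge_outputs=True)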
Methods
apply
(fn)Applies fn recursively to every child block as well as self.
cast
(dtype)Cast this Block to use another data type.
collect_params
([select])Returns a Dict containing this Block’s and all of its children’s Parameters (default); it can also return a selected Dict that matches given regular expressions.
export
(path[, epoch, remove_amp_cast])Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
forward
(inputs, states)Unrolls the recurrent cell for one time step.
hybridize
([active])Please refer to the description of HybridBlock.hybridize().
infer_shape
(i, x, is_bidirect)Infers shape of Parameters from inputs.
infer_type
(*args)Infers data type of Parameters from inputs.
initialize
([init, device, verbose, force_reinit])Initializes Parameters of this Block and its children.
load
(prefix)Load a model saved using the save API.
load_dict
(param_dict[, device, …])Load parameters from dict
load_parameters
(filename[, device, …])Load parameters from file previously saved by save_parameters.
optimize_for
(x, *args[, backend, clear, …])Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass.
register_child
(block[, name])Registers block as a child of self.
register_forward_hook
(hook)Registers a forward hook on the block.
register_forward_pre_hook
(hook)Registers a forward pre-hook on the block.
register_op_hook
(callback[, monitor_all])Install callback monitor.
reset
()Reset before re-using the cell for another graph.
reset_ctx
(ctx)This function has been deprecated.
reset_device
(device)Re-assign all Parameters to other devices.
save
(prefix)Save the model architecture and parameters to load again later
save_parameters
(filename[, deduplicate])Save parameters to file.
setattr
(name, value)Set an attribute to a new value for all Parameters.
share_parameters
(shared)Share parameters recursively inside the model.
summary
(*inputs)Print the summary of the model’s output and parameters.
unroll
(length, inputs[, begin_state, …])Unrolls an RNN cell across time steps.
zero_grad
()Sets all Parameters’ gradient buffer to 0.
Attributes
params
Return an attribute of instance, which is of type owner.
-
apply
(fn)¶ Applies fn recursively to every child block as well as self.
- Parameters
fn (callable) – Function to be applied to each submodule, of form fn(block).
- Returns
- Return type
this block
-
cast
(dtype)¶ Cast this Block to use another data type.
- Parameters
dtype (str or numpy.dtype) – The new data type.
-
collect_params
(select=None)¶ Returns a Dict containing this Block’s and all of its children’s Parameters (default); it can also return a selected Dict that matches given regular expressions.
For example, collect the specified parameters in [‘conv1.weight’, ‘conv1.bias’, ‘fc.weight’, ‘fc.bias’]:
model.collect_params('conv1.weight|conv1.bias|fc.weight|fc.bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
- Parameters
select (str) – regular expressions
- Returns
- Return type
The selected
Dict
-
export
(path, epoch=0, remove_amp_cast=True)¶ Export HybridBlock to json format that can be loaded by gluon.SymbolBlock.imports or the C++ interface.
Note
When there is only one input, it will be named data. When there is more than one input, they will be named data0, data1, etc.
- Parameters
path (str or None) – Path to save model. Two files path-symbol.json and path-xxxx.params will be created, where xxxx is the 4 digits epoch number. If None, do not export to file but return Python Symbol object and corresponding dictionary of parameters.
epoch (int) – Epoch number of saved model.
remove_amp_cast (bool, optional) – Whether to remove the amp_cast and amp_multicast operators, before saving the model.
- Returns
symbol_filename (str) – Filename to which model symbols were saved, including path prefix.
params_filename (str) – Filename to which model parameters were saved, including path prefix.
-
forward
(inputs, states)[source]¶ Unrolls the recurrent cell for one time step.
- Parameters
inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).
states (list of sym.Variable) – RNN state from previous step or the output of begin_state().
- Returns
output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is the same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.
See also
begin_state()
This function can provide the states for the first time step.
unroll()
This function unrolls an RNN for a given number of (>=1) time steps.
-
hybridize
(active=True, **kwargs)¶ Please refer to the description of HybridBlock.hybridize().
-
infer_type
(*args)¶ Infers data type of Parameters from inputs.
-
initialize
(init=<mxnet.initializer.Uniform object>, device=None, verbose=False, force_reinit=False)¶ Initializes Parameters of this Block and its children.
- Parameters
init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.
device (Device or list of Device) – Keeps a copy of Parameters on one or many device(s).
verbose (bool, default False) – Whether to verbosely print out details on initialization.
force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
-
load
(prefix)¶ Load a model saved using the save API
Reconfigures a model using the saved configuration. This function does not regenerate the model architecture. It resets each Block’s parameter UUIDs as they were when saved in order to match the names of the saved parameters.
This function assumes the Blocks in the model were created in the same order they were when the model was saved. This is because each Block is uniquely identified by Block class name and a unique ID in order (since it’s an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph (Symbol & inputs) and settings are restored if it had been hybridized before saving.
- Parameters
prefix (str) – The prefix to use in filenames for loading this model: <prefix>-model.json and <prefix>-model.params
-
load_dict
(param_dict, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from dict
- Parameters
param_dict (dict) – Dictionary containing model parameters
device (Device, optional) – Device context on which the memory is allocated. Default is mxnet.device.current_device().
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this dict.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
load_parameters
(filename, device=None, allow_missing=False, ignore_extra=False, cast_dtype=False, dtype_source='current')¶ Load parameters from file previously saved by save_parameters.
- Parameters
filename (str) – Path to parameter file.
device (Device or list of Device, default cpu()) – Device(s) to initialize loaded parameters on.
allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
cast_dtype (bool, default False) – Cast the data type of the NDArray loaded from the checkpoint to the dtype provided by the Parameter if any.
dtype_source (str, default 'current') – must be in {‘current’, ‘saved’} Only valid if cast_dtype=True, specify the source of the dtype for casting the parameters
-
optimize_for
(x, *args, backend=None, clear=False, partition_if_dynamic=True, static_alloc=False, static_shape=False, inline_limit=2, forward_bulk_size=None, backward_bulk_size=None, **kwargs)¶ Partitions the current HybridBlock and optimizes it for a given backend without executing a forward pass. Modifies the HybridBlock in-place.
Immediately partitions a HybridBlock using the specified backend. Combines the work done in the hybridize API with part of the work done in the forward pass without calling the CachedOp. Can be used in place of hybridize, afterwards export can be called or inference can be run. See README.md in example/extensions/lib_subgraph/README.md for more details.
Examples
# partition and then export to file
block.optimize_for(x, backend='myPart')
block.export('partitioned')
# partition and then run inference
block.optimize_for(x, backend='myPart')
block(x)
- Parameters
x (NDArray) – first input to model
*args (NDArray) – other inputs to model
backend (str) – The name of backend, as registered in SubgraphBackendRegistry, default None
backend_opts (dict of user-specified options to pass to the backend for partitioning, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
clear (bool, default False) – clears any previous optimizations
partition_if_dynamic (bool, default True) – whether to partition the graph when a dynamic shape op exists
static_alloc (bool, default False) – Statically allocate memory to improve speed. Memory usage may increase.
static_shape (bool, default False) – Optimize for invariant input shapes between iterations. Must also set static_alloc to True. Change of input shapes is still allowed but slower.
inline_limit (optional int, default 2) – Maximum number of operators that can be inlined.
forward_bulk_size (optional int, default None) – Segment size of bulk execution during forward pass.
backward_bulk_size (optional int, default None) – Segment size of bulk execution during backward pass.
**kwargs (The backend options, optional) – Passed on to PrePartition and PostPartition functions of SubgraphProperty
-
property
params
¶ Return an attribute of instance, which is of type owner.
-
register_child
(block, name=None)¶ Registers block as a child of self.
Blocks assigned to self as attributes will be registered automatically.
-
register_forward_hook
(hook)¶ Registers a forward hook on the block.
The hook function is called immediately after forward(). It should not modify the input or output.
- Parameters
hook (callable) – The forward hook function of form hook(block, input, output) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_forward_pre_hook
(hook)¶ Registers a forward pre-hook on the block.
The hook function is called immediately before forward(). It should not modify the input or output.
- Parameters
hook (callable) – The forward hook function of form hook(block, input) -> None.
- Returns
- Return type
mxnet.gluon.utils.HookHandle
-
register_op_hook
(callback, monitor_all=False)¶ Install callback monitor.
- Parameters
callback (function) – Function called to inspect the values of the intermediate outputs of blocks after hybridization. It takes 3 parameters: name of the tensor being inspected (str) name of the operator producing or consuming that tensor (str) tensor being inspected (NDArray).
monitor_all (bool, default False) – If True, monitor both input and output, otherwise monitor output only.
-
reset_ctx
(ctx)¶ This function has been deprecated. Please refer to Block.reset_device.
-
reset_device
(device)¶ Re-assign all Parameters to other devices.
- Parameters
device (Device or list of Device, default device.current_device()) – Assign Parameter to given device. If device is a list of Device, a copy will be made for each device.
-
save
(prefix)¶ Save the model architecture and parameters to load again later
Saves the model architecture as a nested dictionary where each Block in the model is a dictionary and its children are sub-dictionaries.
Each Block is uniquely identified by Block class name and a unique ID. We save each Block’s parameter UUID to restore later in order to match the saved parameters.
Recursively traverses a Block’s children in order (since it’s an OrderedDict) and uses the unique ID to denote that specific Block.
Assumes that the model is created in an identical order every time. If the model is not able to be recreated deterministically do not use this set of APIs to save/load your model.
For HybridBlocks, the cached_graph is saved (Symbol & inputs) if it has already been hybridized.
- Parameters
prefix (str) – The prefix to use in filenames for saving this model: <prefix>-model.json and <prefix>-model.params
-
save_parameters
(filename, deduplicate=False)¶ Save parameters to file.
Saved parameters can only be loaded with load_parameters. Note that this method only saves parameters, not model structure. If you want to save model structures, please use HybridBlock.export().
- Parameters
filename (str) – Path to file.
deduplicate (bool, default False) – If True, save shared parameters only once. Otherwise, if a Block contains multiple sub-blocks that share parameters, each of the shared parameters will be separately saved for every sub-block.
-
setattr
(name, value)¶ Set an attribute to a new value for all Parameters.
For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.setattr('grad_req', 'null')
or change the learning rate multiplier:
model.setattr('lr_mult', 0.5)
- Parameters
name (str) – Name of the attribute.
value (valid type for attribute name) – The new value for the attribute.
-
share_parameters
(shared)¶ Share parameters recursively inside the model.
For example, if you want dense1 to share dense0’s weights, you can do:
dense0 = nn.Dense(20)
dense1 = nn.Dense(20)
dense1.share_parameters(dense0.collect_params())
which is equivalent to:
dense1.weight = dense0.weight
dense1.bias = dense0.bias
Note that unlike the load_parameters or load_dict functions, share_parameters results in the Parameter object being shared (or tied) between the models, whereas load_parameters or load_dict only set the value of the data dictionary of a model. If you call load_parameters or load_dict after share_parameters, the loaded value will be reflected in all networks that use the shared (or tied) Parameter object.
- Parameters
shared (Dict) – Dict of the shared parameters.
- Returns
- Return type
this block
-
summary
(*inputs)¶ Print the summary of the model’s output and parameters.
The network must have been initialized, and must not have been hybridized.
- Parameters
inputs (object) – Any input that the model supports. For any tensor in the input, only mxnet.ndarray.NDArray is supported.
-
unroll
(length, inputs, begin_state=None, layout='NTC', merge_outputs=None, valid_length=None)¶ Unrolls an RNN cell across time steps.
- Parameters
length (int) – Number of steps to unroll.
inputs (Symbol, list of Symbol, or None) –
If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’.
If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, …).
begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.
layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.
merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, …) if layout is ‘NTC’, or (length, batch_size, …) if layout is ‘TNC’. If None, output whatever is faster.
valid_length (Symbol, NDArray or None) – valid_length specifies the length of the sequences in the batch without padding. This option is especially useful for building sequence-to-sequence models where the input and output sequences would potentially be padded. If valid_length is None, all sequences are assumed to have the same length. If valid_length is a Symbol or NDArray, it should have shape (batch_size,). The ith element will be the length of the ith sequence in the batch. The last valid state will be returned and the padded outputs will be masked with 0. Note that valid_length must be smaller than or equal to length.
- Returns
outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.
states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is the same as the output of begin_state().
-
zero_grad
()¶ Sets all Parameters’ gradient buffer to 0.
-