Gluon Package¶
Warning
This package is currently experimental and may change in the near future.
Overview¶
The Gluon package is a high-level interface for MXNet, designed to be easy to use while keeping most of the flexibility of the low-level API. Gluon supports both imperative and symbolic programming, making it easy to train complex models imperatively in Python and then deploy them as a symbolic graph in C++ and Scala.
Parameter¶
Parameter |
A Container holding parameters (weights) of `Block`s. |
ParameterDict |
A dictionary managing a set of parameters. |
Containers¶
Block |
Base class for all neural network layers and models. |
HybridBlock |
HybridBlock supports forwarding with both Symbol and NDArray. |
SymbolBlock |
Construct block from symbol. |
Neural Network Layers¶
Containers¶
Sequential |
Stacks `Block`s sequentially. |
HybridSequential |
Stacks `HybridBlock`s sequentially. |
Basic Layers¶
Dense |
Just your regular densely-connected NN layer. |
Activation |
Applies an activation function to input. |
Dropout |
Applies Dropout to the input. |
BatchNorm |
Batch normalization layer (Ioffe and Szegedy, 2014). |
LeakyReLU |
Leaky version of a Rectified Linear Unit. |
Embedding |
Turns non-negative integers (indexes/tokens) into dense vectors of fixed size. |
Convolutional Layers¶
Conv1D |
1D convolution layer (e.g. temporal convolution). |
Conv2D |
2D convolution layer (e.g. spatial convolution over images). |
Conv3D |
3D convolution layer (e.g. spatial convolution over volumes). |
Conv1DTranspose |
Transposed 1D convolution layer (sometimes called Deconvolution). |
Conv2DTranspose |
Transposed 2D convolution layer (sometimes called Deconvolution). |
Conv3DTranspose |
Transposed 3D convolution layer (sometimes called Deconvolution). |
Pooling Layers¶
MaxPool1D |
Max pooling operation for one dimensional data. |
MaxPool2D |
Max pooling operation for two dimensional (spatial) data. |
MaxPool3D |
Max pooling operation for 3D data (spatial or spatio-temporal). |
AvgPool1D |
Average pooling operation for temporal data. |
AvgPool2D |
Average pooling operation for spatial data. |
AvgPool3D |
Average pooling operation for 3D data (spatial or spatio-temporal). |
GlobalMaxPool1D |
Global max pooling operation for temporal data. |
GlobalMaxPool2D |
Global max pooling operation for spatial data. |
GlobalMaxPool3D |
Global max pooling operation for 3D data. |
GlobalAvgPool1D |
Global average pooling operation for temporal data. |
GlobalAvgPool2D |
Global average pooling operation for spatial data. |
GlobalAvgPool3D |
Global average pooling operation for 3D data. |
Recurrent Layers¶
RecurrentCell |
Abstract base class for RNN cells. |
RNN |
Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence. |
LSTM |
Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence. |
GRU |
Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence. |
RNNCell |
Simple recurrent neural network cell. |
LSTMCell |
Long Short-Term Memory (LSTM) network cell. |
GRUCell |
Gated Recurrent Unit (GRU) network cell. |
SequentialRNNCell |
Sequentially stacking multiple RNN cells. |
BidirectionalCell |
Bidirectional RNN cell. |
DropoutCell |
Applies dropout on input. |
ZoneoutCell |
Applies Zoneout on base cell. |
ResidualCell |
Adds residual connection as described in Wu et al, 2016 (https://arxiv.org/abs/1609.08144). |
Loss functions¶
L2Loss |
Calculates the mean squared error between output and label. |
L1Loss |
Calculates the mean absolute error between output and label. |
SoftmaxCrossEntropyLoss |
Computes the softmax cross entropy loss. |
KLDivLoss |
The Kullback-Leibler divergence loss. |
Utilities¶
split_data |
Splits an NDArray into num_slice slices along batch_axis. |
split_and_load |
Splits an NDArray into len(ctx_list) slices along batch_axis and loads each slice to one context in ctx_list. |
clip_global_norm |
Rescales NDArrays so that the sum of their 2-norm is smaller than max_norm. |
Data¶
Dataset |
Abstract dataset class. |
ArrayDataset |
A dataset with a data array and a label array. |
RecordFileDataset |
A dataset wrapping over a RecordIO (.rec) file. |
ImageRecordDataset |
Sampler |
Base class for samplers. |
SequentialSampler |
Samples elements from [0, length) sequentially. |
RandomSampler |
Samples elements from [0, length) randomly without replacement. |
BatchSampler |
Wraps over another Sampler and returns mini-batches of samples. |
DataLoader |
Loads data from a dataset and returns mini-batches of data. |
Vision¶
MNIST |
MNIST handwritten digits dataset from http://yann.lecun.com/exdb/mnist. |
CIFAR10 |
CIFAR10 image classification dataset from https://www.cs.toronto.edu/~kriz/cifar.html. |
Model Zoo¶
Model zoo provides pre-defined and pre-trained models to help bootstrap machine learning applications.
Vision¶
Module for pre-defined neural network models.
This module contains definitions for the following model architectures:
- AlexNet
- DenseNet
- Inception V3
- ResNet V1
- ResNet V2
- SqueezeNet
- VGG
You can construct a model with random weights by calling its constructor:
from mxnet.gluon.model_zoo import vision as models
resnet18 = models.resnet18_v1()
alexnet = models.alexnet()
squeezenet = models.squeezenet1_0()
densenet = models.densenet161()
We provide pre-trained models for all the models except ResNet V2. These can be constructed by passing pretrained=True:
from mxnet.gluon.model_zoo import vision as models
resnet18 = models.resnet18_v1(pretrained=True)
alexnet = models.alexnet(pretrained=True)
Pretrained models are converted from torchvision.
All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (N x 3 x H x W), where N is the batch size, and H and W are expected to be at least 224. The images have to be loaded into a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225].
The transformation should preferably happen at preprocessing. You can use mx.image.color_normalize for such a transformation:
image = image/255
normalized = mx.image.color_normalize(image,
mean=mx.nd.array([0.485, 0.456, 0.406]),
std=mx.nd.array([0.229, 0.224, 0.225]))
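The arithmetic behind this preprocessing can be sketched in plain Python, without MXNet. The helper name normalize_pixel is hypothetical and handles a single RGB pixel; it assumes 8-bit input values and uses the mean and std quoted above.

```python
# Channel statistics quoted in the text above (RGB order).
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Scale one 8-bit RGB pixel to [0, 1], then standardize each channel.

    Hypothetical helper for illustration only; it is not part of MXNet.
    """
    return [(value / 255.0 - m) / s for value, m, s in zip(rgb, MEAN, STD)]


white = normalize_pixel([255, 255, 255])
black = normalize_pixel([0, 0, 0])
```

Applied to a whole image, the same per-channel operation is what the color_normalize call above is expected to perform on the scaled array.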
get_model |
Returns a pre-defined model by name |
ResNet¶
resnet18_v1 |
ResNet-18 V1 model from “Deep Residual Learning for Image Recognition” paper. |
resnet34_v1 |
ResNet-34 V1 model from “Deep Residual Learning for Image Recognition” paper. |
resnet50_v1 |
ResNet-50 V1 model from “Deep Residual Learning for Image Recognition” paper. |
resnet101_v1 |
ResNet-101 V1 model from “Deep Residual Learning for Image Recognition” paper. |
resnet152_v1 |
ResNet-152 V1 model from “Deep Residual Learning for Image Recognition” paper. |
resnet18_v2 |
ResNet-18 V2 model from “Identity Mappings in Deep Residual Networks” paper. |
resnet34_v2 |
ResNet-34 V2 model from “Identity Mappings in Deep Residual Networks” paper. |
resnet50_v2 |
ResNet-50 V2 model from “Identity Mappings in Deep Residual Networks” paper. |
resnet101_v2 |
ResNet-101 V2 model from “Identity Mappings in Deep Residual Networks” paper. |
resnet152_v2 |
ResNet-152 V2 model from “Identity Mappings in Deep Residual Networks” paper. |
ResNetV1 |
ResNet V1 model from “Deep Residual Learning for Image Recognition” paper. |
ResNetV2 |
ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper. |
BasicBlockV1 |
BasicBlock V1 from “Deep Residual Learning for Image Recognition” paper. |
BasicBlockV2 |
BasicBlock V2 from “Identity Mappings in Deep Residual Networks” paper. |
BottleneckV1 |
Bottleneck V1 from “Deep Residual Learning for Image Recognition” paper. |
BottleneckV2 |
Bottleneck V2 from “Identity Mappings in Deep Residual Networks” paper. |
get_resnet |
ResNet V1 model from “Deep Residual Learning for Image Recognition” paper. |
VGG¶
vgg11 |
VGG-11 model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper. |
vgg13 |
VGG-13 model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper. |
vgg16 |
VGG-16 model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper. |
vgg19 |
VGG-19 model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper. |
vgg11_bn |
VGG-11 model with batch normalization from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper. |
vgg13_bn |
VGG-13 model with batch normalization from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper. |
vgg16_bn |
VGG-16 model with batch normalization from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper. |
vgg19_bn |
VGG-19 model with batch normalization from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper. |
VGG |
VGG model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper. |
get_vgg |
VGG model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper. |
Alexnet¶
alexnet |
AlexNet model from the “One weird trick...” paper. |
AlexNet |
AlexNet model from the “One weird trick...” paper. |
DenseNet¶
densenet121 |
Densenet-BC 121-layer model from the “Densely Connected Convolutional Networks” paper. |
densenet161 |
Densenet-BC 161-layer model from the “Densely Connected Convolutional Networks” paper. |
densenet169 |
Densenet-BC 169-layer model from the “Densely Connected Convolutional Networks” paper. |
densenet201 |
Densenet-BC 201-layer model from the “Densely Connected Convolutional Networks” paper. |
DenseNet |
Densenet-BC model from the “Densely Connected Convolutional Networks” paper. |
SqueezeNet¶
squeezenet1_0 |
SqueezeNet 1.0 model from the “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size” paper. |
squeezenet1_1 |
SqueezeNet 1.1 model from the official SqueezeNet repo. |
SqueezeNet |
SqueezeNet model from the “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size” paper. |
Inception¶
inception_v3 |
Inception v3 model from “Rethinking the Inception Architecture for Computer Vision” paper. |
Inception3 |
Inception v3 model from “Rethinking the Inception Architecture for Computer Vision” paper. |
API Reference¶
class mxnet.gluon.Parameter(name, grad_req='write', shape=None, dtype='float32', lr_mult=1.0, wd_mult=1.0, init=None, allow_deferred_init=False, differentiable=True)
A container holding parameters (weights) of `Block`s.
Parameter holds a copy of the parameter on each Context after it is initialized with Parameter.initialize(...). If grad_req is not 'null', it will also hold a gradient array on each Context:
import mxnet as mx
ctx = mx.gpu(0)
x = mx.nd.zeros((16, 100), ctx=ctx)
w = mx.gluon.Parameter('fc_weight', shape=(64, 100), init=mx.init.Xavier())
b = mx.gluon.Parameter('fc_bias', shape=(64,), init=mx.init.Zero())
w.initialize(ctx=ctx)
b.initialize(ctx=ctx)
out = mx.nd.FullyConnected(x, w.data(ctx), b.data(ctx), num_hidden=64)
Parameters: - name (str) – Name of this parameter.
- grad_req ({'write', 'add', 'null'}, default 'write') – Specifies how to update gradient to grad arrays.
  - 'write' means the gradient is written to the grad NDArray every time.
  - 'add' means the gradient is added to the grad NDArray every time. You need to manually call zero_grad() to clear the gradient buffer before each iteration when using this option.
  - 'null' means gradient is not requested for this parameter. Gradient arrays will not be allocated.
- shape (tuple of int, default None) – Shape of this parameter. By default the shape is not specified. A Parameter with unknown shape can be used with the Symbol API, but initialization will throw an error when using the NDArray API.
- dtype (numpy.dtype or str, default 'float32') – Data type of this parameter. For example, numpy.float32 or 'float32'.
- lr_mult (float, default 1.0) – Learning rate multiplier. Learning rate will be multiplied by lr_mult when updating this parameter with optimizer.
- wd_mult (float, default 1.0) – Weight decay multiplier (L2 regularizer coefficient). Works similarly to lr_mult.
- init (Initializer, default None) – Initializer of this parameter. Will use the global initializer by default.
grad_req
{'write', 'add', 'null'} – This can be set before or after initialization. Setting grad_req to 'null' with x.grad_req = 'null' saves memory and computation when you don't need the gradient w.r.t. x.
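The three grad_req modes can be illustrated with a toy model of a gradient buffer. This is a sketch under stated assumptions, not MXNet code: backward_pass is a hypothetical function, gradients are plain Python lists, and each call stands for one backward pass.

```python
def backward_pass(grad_buffer, new_grad, grad_req):
    """Return the gradient buffer after one backward pass (toy model).

    Hypothetical helper; it only mimics the semantics described above.
    """
    if grad_req == 'write':
        # The freshly computed gradient overwrites the buffer.
        return list(new_grad)
    if grad_req == 'add':
        # The gradient accumulates until the caller clears the buffer.
        return [old + new for old, new in zip(grad_buffer, new_grad)]
    if grad_req == 'null':
        # No gradient is requested, so no buffer exists.
        return None
    raise ValueError("grad_req must be 'write', 'add' or 'null'")


grad = [1.0, 2.0]
written = [0.0, 0.0]
added = [0.0, 0.0]
for _ in range(2):  # two backward passes without clearing the buffers
    written = backward_pass(written, grad, 'write')
    added = backward_pass(added, grad, 'add')
```

After two passes the 'write' buffer still holds one gradient while the 'add' buffer holds the sum of both, which is why 'add' requires zero_grad() between iterations.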
initialize(init=None, ctx=None, default_init=<mxnet.initializer.Uniform object>, force_reinit=False)
Initializes parameter and gradient arrays. Only used for NDArray API.
Parameters: - init (Initializer) – The initializer to use. Overrides Parameter.init and default_init.
- ctx (Context or list of Context, defaults to context.current_context()) – Initialize Parameter on given context. If ctx is a list of Context, a copy will be made for each context.
Note: Copies are independent arrays. The user is responsible for keeping their values consistent when updating. Normally gluon.Trainer does this for you.
- default_init (Initializer) – Default initializer is used when both init and Parameter.init are None.
- force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
Examples
>>> weight = mx.gluon.Parameter('weight', shape=(2, 2))
>>> weight.initialize(ctx=mx.cpu(0))
>>> weight.data()
[[-0.01068833  0.01729892]
 [ 0.02042518 -0.01618656]]
>>> weight.grad()
[[ 0.  0.]
 [ 0.  0.]]
>>> weight.initialize(ctx=[mx.gpu(0), mx.gpu(1)])
>>> weight.data(mx.gpu(0))
[[-0.00873779 -0.02834515]
 [ 0.05484822 -0.06206018]]
>>> weight.data(mx.gpu(1))
[[-0.00873779 -0.02834515]
 [ 0.05484822 -0.06206018]]
reset_ctx(ctx)
Re-assign Parameter to other contexts.
Parameters: ctx (Context or list of Context, default context.current_context()) – Assign Parameter to given context. If ctx is a list of Context, a copy will be made for each context.
data(ctx=None)
Returns a copy of this parameter on one context. Must have been initialized on this context before.
Parameters: ctx (Context) – Desired context.
Returns: NDArray on ctx
list_data()
Returns copies of this parameter on all contexts, in the same order as creation.
grad(ctx=None)
Returns a gradient buffer for this parameter on one context.
Parameters: ctx (Context) – Desired context.
class mxnet.gluon.ParameterDict(prefix='', shared=None)
A dictionary managing a set of parameters.
Parameters: - prefix (str, default '') – The prefix to be prepended to all Parameters’ names created by this dict.
- shared (ParameterDict or None) – If not None, when this dict’s get method creates a new parameter, will first try to retrieve it from shared dict. Usually used for sharing parameters with another Block.
prefix
Prefix of this dict. It will be prepended to the names of Parameters created with get.
get(name, **kwargs)
Retrieves a Parameter with name self.prefix+name. If not found, get will first try to retrieve it from the shared dict. If still not found, get will create a new Parameter with the keyword arguments and insert it into self.
Parameters: - name (str) – Name of the desired Parameter. It will be prepended with this dictionary’s prefix.
- **kwargs (dict) – The rest of the keyword arguments for the created Parameter.
Returns: The created or retrieved Parameter.
Return type: Parameter
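The lookup order that get follows can be sketched with a small stand-in class. ToyParameterDict is hypothetical and stores plain dicts instead of real Parameter objects; it only mirrors the three steps described above (own entries, then the shared dict, then creation).

```python
class ToyParameterDict:
    """Toy stand-in for illustration; not the MXNet implementation."""

    def __init__(self, prefix='', shared=None):
        self.prefix = prefix
        self.shared = shared
        self._params = {}

    def get(self, name, **kwargs):
        full_name = self.prefix + name
        # 1. Already managed by this dictionary.
        if full_name in self._params:
            return self._params[full_name]
        # 2. Available in the shared dictionary, if one was given.
        if self.shared is not None and full_name in self.shared._params:
            param = self.shared._params[full_name]
        else:
            # 3. Otherwise create a new entry from the keyword arguments.
            param = dict(kwargs, name=full_name)
        self._params[full_name] = param
        return param


base = ToyParameterDict(prefix='net_')
weight = base.get('weight', shape=(2, 2))

sharing = ToyParameterDict(prefix='net_', shared=base)
same_weight = sharing.get('weight')   # found in the shared dict
bias = sharing.get('bias', shape=(2,))  # created because it is new
```

Because the shared entry is returned by reference, both dictionaries end up pointing at the same object, which is the mechanism that lets two Blocks share weights.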
initialize(init=<mxnet.initializer.Uniform object>, ctx=None, verbose=False, force_reinit=False)
Initializes all Parameters managed by this dictionary to be used for NDArray API. It has no effect when using Symbol API.
Parameters: - init (Initializer) – Global default Initializer to be used when Parameter.init is None. Otherwise, Parameter.init takes precedence.
- ctx (Context or list of Context) – Keeps a copy of Parameters on one or many context(s).
- force_reinit (bool, default False) – Whether to force re-initialization if parameter is already initialized.
reset_ctx(ctx)
Re-assign all Parameters to other contexts.
Parameters: ctx (Context or list of Context, default context.current_context()) – Assign Parameter to given context. If ctx is a list of Context, a copy will be made for each context.
setattr(name, value)
Set an attribute to a new value for all Parameters.
For example, set grad_req to 'null' if you don’t need the gradient w.r.t. a model’s Parameters:
model.collect_params().setattr('grad_req', 'null')
or change the learning rate multiplier:
model.collect_params().setattr('lr_mult', 0.5)
Parameters: - name (str) – Name of the attribute.
- value (valid type for attribute name) – The new value for the attribute.
save(filename, strip_prefix='')
Save parameters to file.
Parameters: - filename (str) – Path to parameter file.
- strip_prefix (str, default '') – Strip prefix from parameter names before saving.
load(filename, ctx, allow_missing=False, ignore_extra=False, restore_prefix='')
Load parameters from file.
Parameters: - filename (str) – Path to parameter file.
- ctx (Context or list of Context) – Context(s) on which to initialize the loaded parameters.
- allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
- ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this ParameterDict.
- restore_prefix (str, default '') – Prepend prefix to names of stored parameters before loading.
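How allow_missing, ignore_extra and restore_prefix interact can be sketched as a name-matching check. The function check_loadable below is hypothetical and works on lists of names only; it assumes that a mismatch raises an error unless the corresponding flag is set, as the parameter descriptions above suggest.

```python
def check_loadable(saved_names, own_names, allow_missing=False,
                   ignore_extra=False, restore_prefix=''):
    """Return the names that would be loaded, or raise on a mismatch.

    Hypothetical helper for illustration; it does not read any file.
    """
    # Names in the file get the prefix back before they are compared.
    restored = {restore_prefix + name for name in saved_names}
    own = set(own_names)
    missing = sorted(own - restored)   # expected but absent from the file
    extra = sorted(restored - own)     # in the file but unknown here
    if missing and not allow_missing:
        raise KeyError('missing from file: %s' % missing)
    if extra and not ignore_extra:
        raise KeyError('not in this dictionary: %s' % extra)
    return sorted(own & restored)


saved = ['dense0_weight', 'dense0_bias']
own = ['model_dense0_weight', 'model_dense0_bias', 'model_dense1_weight']
loaded = check_loadable(saved, own, allow_missing=True,
                        restore_prefix='model_')
```

With the default allow_missing=False, the same call would raise because model_dense1_weight has no counterpart in the saved names.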
class mxnet.gluon.Block(prefix=None, params=None)
Base class for all neural network layers and models. Your models should subclass this class.
Blocks can be nested recursively in a tree structure. You can create and assign child Blocks as regular attributes:
import mxnet as mx
from mxnet.gluon import Block, nn
from mxnet import ndarray as F

class Model(Block):
    def __init__(self, **kwargs):
        super(Model, self).__init__(**kwargs)
        # use name_scope to give child Blocks appropriate names.
        # It also allows sharing Parameters between Blocks recursively.
        with self.name_scope():
            self.dense0 = nn.Dense(20)
            self.dense1 = nn.Dense(20)

    def forward(self, x):
        x = F.relu(self.dense0(x))
        return F.relu(self.dense1(x))

model = Model()
model.initialize(ctx=mx.cpu(0))
model(F.zeros((10, 10), ctx=mx.cpu(0)))
Child Blocks assigned this way will be registered, and collect_params will collect their Parameters recursively.
Parameters: - prefix (str) – Prefix acts like a name space. It will be prepended to the names of all Parameters and child `Block`s in this `Block`'s name_scope. Prefix should be unique within one model to prevent name collisions.
- params (ParameterDict or None) – ParameterDict for sharing weights with the new Block. For example, if you want dense1 to share dense0's weights, you can do:
dense0 = nn.Dense(20)
dense1 = nn.Dense(20, params=dense0.collect_params())
prefix
Prefix of this Block.
name
Name of this Block, without the trailing '_'.
name_scope()
Returns a name space object managing a child Block and parameter names. Should be used within a with statement:
with self.name_scope():
    self.dense = nn.Dense(20)
params
Returns this Block's parameter dictionary (does not include its children’s parameters).
collect_params()
Returns a ParameterDict containing this Block's and all of its children’s Parameters.
load_params(filename, ctx, allow_missing=False, ignore_extra=False)
Load parameters from file.
Parameters: - filename (str) – Path to parameter file.
- ctx (Context or list of Context) – Context(s) on which to initialize the loaded parameters.
- allow_missing (bool, default False) – Whether to silently skip loading parameters not represented in the file.
- ignore_extra (bool, default False) – Whether to silently ignore parameters from the file that are not present in this Block.
register_child(block)
Registers block as a child of self. `Block`s assigned to self as attributes will be registered automatically.
initialize(init=<mxnet.initializer.Uniform object>, ctx=None, verbose=False)
Initializes `Parameter`s of this `Block` and its children.
Equivalent to block.collect_params().initialize(...)
class mxnet.gluon.HybridBlock(prefix=None, params=None)
HybridBlock supports forwarding with both Symbol and NDArray.
Forward computation in HybridBlock must be static to work with `Symbol`s, i.e. you cannot call .asnumpy(), .shape, .dtype, etc. on tensors. Also, you cannot use branching or loop logic that depends on non-constant expressions such as random numbers or intermediate results, since they change the graph structure for each iteration.
Before activating with hybridize(), HybridBlock works just like a normal Block. After activation, HybridBlock will create a symbolic graph representing the forward computation and cache it. On subsequent forwards, the cached graph will be used instead of hybrid_forward.
Refer to the Hybrid tutorial to see the end-to-end usage.
class mxnet.gluon.SymbolBlock(outputs, inputs, params=None)
Construct block from symbol. This is useful for using pre-trained models as feature extractors. For example, you may want to extract the output from the fc2 layer in AlexNet.
Parameters: - outputs (Symbol or list of Symbol) – The desired output for SymbolBlock.
- inputs (Symbol or list of Symbol) – The Variables in the arguments of outputs that should be used as inputs.
- params (ParameterDict) – Parameter dictionary for arguments and auxiliary states of outputs that are not inputs.
Examples
>>> # To extract the feature from fc1 and fc2 layers of AlexNet:
>>> alexnet = gluon.model_zoo.vision.alexnet(pretrained=True, ctx=mx.cpu(), prefix='model_')
>>> inputs = mx.sym.var('data')
>>> out = alexnet(inputs)
>>> internals = out.get_internals()
>>> print(internals.list_outputs())
['data', ..., 'model_dense0_relu_fwd_output', ..., 'model_dense1_relu_fwd_output', ...]
>>> outputs = [internals['model_dense0_relu_fwd_output'], internals['model_dense1_relu_fwd_output']]
>>> # Create SymbolBlock that shares parameters with alexnet
>>> feat_model = gluon.SymbolBlock(outputs, inputs, params=alexnet.collect_params())
>>> x = mx.nd.random_normal(shape=(16, 3, 224, 224))
>>> print(feat_model(x))
class mxnet.gluon.nn.Sequential(prefix=None, params=None)
Stacks `Block`s sequentially.
Example:
net = nn.Sequential()
# use net's name_scope to give child Blocks appropriate names.
with net.name_scope():
    net.add(nn.Dense(10, activation='relu'))
    net.add(nn.Dense(20))
class mxnet.gluon.nn.HybridSequential(prefix=None, params=None)
Stacks `HybridBlock`s sequentially.
Example:
net = nn.HybridSequential()
# use net's name_scope to give child Blocks appropriate names.
with net.name_scope():
    net.add(nn.Dense(10, activation='relu'))
    net.add(nn.Dense(20))
class mxnet.gluon.nn.Dense(units, activation=None, use_bias=True, weight_initializer=None, bias_initializer='zeros', in_units=0, **kwargs)
Just your regular densely-connected NN layer.
Dense implements the operation: output = activation(dot(input, weight) + bias) where activation is the element-wise activation function passed as the activation argument, weight is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
Note: the input must be a tensor with rank 2. Use flatten to convert it to rank 2 manually if necessary.
Parameters: - units (int) – Dimensionality of the output space.
- activation (str) – Activation function to use. See help on Activation layer. If you don’t specify anything, no activation is applied (i.e. “linear” activation: a(x) = x).
- use_bias (bool) – Whether the layer uses a bias vector.
- weight_initializer (str or Initializer) – Initializer for the kernel weights matrix.
- bias_initializer (str or Initializer) – Initializer for the bias vector.
- in_units (int, optional) – Size of the input data. If not specified, initialization will be deferred to the first time forward is called and in_units will be inferred from the shape of input data.
- prefix (str or None) – See document of Block.
- params (ParameterDict or None) – See document of Block.
Input shape: A 2D input with shape (batch_size, in_units).
Output shape: The output has shape (batch_size, units).
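The operation output = activation(dot(input, weight) + bias) can be written out in plain Python to make the shapes concrete. This is a sketch, not the layer's implementation: dense_forward is a hypothetical name, and the weight is assumed to be laid out as (in_units, units) so that it matches the formula as written; the library may store it differently.

```python
def dense_forward(inputs, weight, bias=None, activation=None):
    """Compute activation(dot(inputs, weight) + bias) on nested lists.

    inputs: batch_size rows of in_units values.
    weight: in_units rows of units values (assumed layout).
    """
    units = len(weight[0])
    outputs = []
    for row in inputs:
        out_row = []
        for j in range(units):
            total = sum(row[i] * weight[i][j] for i in range(len(row)))
            if bias is not None:
                total += bias[j]
            if activation is not None:
                total = activation(total)
            out_row.append(total)
        outputs.append(out_row)
    return outputs


def relu(value):
    return max(0.0, value)


x = [[1.0, 2.0]]                          # (batch_size=1, in_units=2)
w = [[1.0, -1.0, 0.5], [0.0, 1.0, 0.5]]   # (in_units=2, units=3)
b = [0.0, -2.0, 1.0]
y = dense_forward(x, w, b, relu)          # shape (1, 3)
```

The result has one row per input row and one column per unit, matching the (batch_size, units) output shape stated above.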
class mxnet.gluon.nn.Activation(activation, **kwargs)
Applies an activation function to input.
Parameters: activation (str) – Name of activation function to use. See Activation() for available choices.
Input shape: Arbitrary.
Output shape: Same shape as input.
class mxnet.gluon.nn.Dropout(rate, **kwargs)
Applies Dropout to the input.
Dropout consists of randomly setting a fraction rate of input units to 0 at each update during training time, which helps prevent overfitting.
Parameters: rate (float) – Fraction of the input units to drop. Must be a number between 0 and 1.
Input shape: Arbitrary.
Output shape: Same shape as input.
References
Dropout: A Simple Way to Prevent Neural Networks from Overfitting
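A minimal sketch of the dropout operation on a list of values follows. The function name and signature are hypothetical. Rescaling the kept units by 1 / (1 - rate), often called inverted dropout, is an assumption added here so that the expected activation stays unchanged; the text above only states that a fraction of units is set to 0.

```python
import random


def dropout(values, rate, training=True, rng=random):
    """Zero each value with probability `rate` during training (toy sketch)."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError('rate must be between 0 and 1')
    if not training or rate == 0.0:
        # Outside training the input passes through unchanged.
        return list(values)
    keep = 1.0 - rate
    # Assumption: kept units are rescaled by 1 / keep (inverted dropout).
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)                 # seeded for repeatability
train_out = dropout([1.0] * 8, rate=0.5, rng=rng)
eval_out = dropout([1.0] * 8, rate=0.5, training=False)
```

Every element of train_out is either dropped or doubled, while eval_out equals the input.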
class mxnet.gluon.nn.BatchNorm(axis=1, momentum=0.9, epsilon=1e-05, center=True, scale=True, beta_initializer='zeros', gamma_initializer='ones', running_mean_initializer='zeros', running_variance_initializer='ones', in_channels=0, **kwargs)
Batch normalization layer (Ioffe and Szegedy, 2014). Normalizes the input at each batch, i.e. applies a transformation that maintains the mean activation close to 0 and the activation standard deviation close to 1.
Parameters: - axis (int, default 1) – The axis that should be normalized. This is typically the channels (C) axis. For instance, after a Conv2D layer with layout='NCHW', set axis=1 in BatchNorm. If layout='NHWC', then set axis=3.
- momentum (float, default 0.9) – Momentum for the moving average.
- epsilon (float, default 1e-5) – Small float added to variance to avoid dividing by zero.
- center (bool, default True) – If True, add offset of beta to normalized tensor. If False, beta is ignored.
- scale (bool, default True) – If True, multiply by gamma. If False, gamma is not used. When the next layer is linear (also e.g. nn.relu), this can be disabled since the scaling will be done by the next layer.
- beta_initializer (str or Initializer, default 'zeros') – Initializer for the beta weight.
- gamma_initializer (str or Initializer, default 'ones') – Initializer for the gamma weight.
- running_mean_initializer (str or Initializer, default 'zeros') – Initializer for the running (moving) mean.
- running_variance_initializer (str or Initializer, default 'ones') – Initializer for the running (moving) variance.
- in_channels (int, default 0) – Number of channels (feature maps) in input data. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
Input shape: Arbitrary.
Output shape: Same shape as input.
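The per-batch transformation that BatchNorm applies can be sketched for a single channel as follows. The function batch_norm_channel is hypothetical; it uses the batch mean and the biased batch variance, and it leaves out the running averages that the momentum parameter controls.

```python
import math


def batch_norm_channel(values, gamma=1.0, beta=0.0, epsilon=1e-5):
    """Normalize one channel's values across a batch (toy sketch).

    gamma and beta play the roles enabled by scale and center.
    """
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    scale = gamma / math.sqrt(variance + epsilon)
    return [scale * (v - mean) + beta for v in values]


normalized = batch_norm_channel([1.0, 2.0, 3.0, 4.0])
out_mean = sum(normalized) / len(normalized)
out_var = sum((v - out_mean) ** 2 for v in normalized) / len(normalized)
```

The output mean is close to 0 and the output variance is slightly below 1 because epsilon is added inside the square root.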
class mxnet.gluon.nn.LeakyReLU(alpha, **kwargs)
Leaky version of a Rectified Linear Unit.
It allows a small gradient when the unit is not active:
`f(x) = alpha * x for x < 0`, `f(x) = x for x >= 0`.
Parameters: alpha (float) – Slope coefficient for the negative half axis. Must be >= 0.
Input shape: Arbitrary.
Output shape: Same shape as input.
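The piecewise definition of LeakyReLU translates directly into a scalar function. The name leaky_relu is hypothetical and the sketch works on one number at a time.

```python
def leaky_relu(x, alpha):
    """Return x for x >= 0 and alpha * x for x < 0."""
    if alpha < 0:
        raise ValueError('alpha must be >= 0')
    return x if x >= 0 else alpha * x


positive = leaky_relu(3.0, alpha=0.1)    # passes through unchanged
negative = leaky_relu(-2.0, alpha=0.1)   # scaled by alpha
```

With alpha=0 the function reduces to a standard ReLU.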
class mxnet.gluon.nn.Embedding(input_dim, output_dim, dtype='float32', weight_initializer=None, **kwargs)
Turns non-negative integers (indexes/tokens) into dense vectors of fixed size, e.g. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]
Parameters: - input_dim (int) – Size of the vocabulary, i.e. maximum integer index + 1.
- output_dim (int) – Dimension of the dense embedding.
- dtype (str or np.dtype, default 'float32') – Data type of output embeddings.
- weight_initializer (Initializer) – Initializer for the embeddings matrix.
Input shape: 2D tensor with shape: (N, M).
Output shape: 3D tensor with shape: (N, M, output_dim).
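An embedding lookup amounts to replacing each integer index with a row of a table. The sketch below uses nested lists; embed and the table contents are hypothetical, and the table stands in for the layer's weight of shape (input_dim, output_dim).

```python
def embed(indices, table):
    """Replace every integer index with the matching row of `table`."""
    input_dim = len(table)
    output = []
    for row in indices:
        for index in row:
            # Indices must lie in [0, input_dim).
            if not 0 <= index < input_dim:
                raise IndexError('index %d outside [0, %d)' % (index, input_dim))
        output.append([list(table[index]) for index in row])
    return output


table = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]   # input_dim=3, output_dim=2
tokens = [[0, 2], [1, 1]]                       # shape (N=2, M=2)
vectors = embed(tokens, table)                  # shape (2, 2, 2)
```

The result gains one trailing dimension of size output_dim, in line with the (N, M) to (N, M, output_dim) shapes given above.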
class mxnet.gluon.nn.Conv1D(channels, kernel_size, strides=1, padding=0, dilation=1, groups=1, layout='NCW', activation=None, use_bias=True, weight_initializer=None, bias_initializer='zeros', in_channels=0, **kwargs)
1D convolution layer (e.g. temporal convolution).
This layer creates a convolution kernel that is convolved with the layer input over a single spatial (or temporal) dimension to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well.
If in_channels is not specified, Parameter initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
Parameters: - channels (int) – The dimensionality of the output space, i.e. the number of output channels (filters) in the convolution.
- kernel_size (int or tuple/list of 1 int) – Specifies the dimensions of the convolution window.
- strides (int or tuple/list of 1 int) – Specifies the strides of the convolution.
- padding (int or tuple/list of 1 int) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
- dilation (int or tuple/list of 1 int) – Specifies the dilation rate to use for dilated convolution.
- groups (int) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
- layout (str, default 'NCW') – Dimension ordering of data and weight. Can be ‘NCW’, ‘NWC’, etc. ‘N’, ‘C’, ‘W’ stands for batch, channel, and width (time) dimensions respectively. Convolution is applied on the ‘W’ dimension.
- in_channels (int, default 0) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
- activation (str) – Activation function to use. See Activation(). If you don’t specify anything, no activation is applied (i.e. “linear” activation: a(x) = x).
- use_bias (bool) – Whether the layer uses a bias vector.
- weight_initializer (str or Initializer) – Initializer for the weights matrix.
- bias_initializer (str or Initializer) – Initializer for the bias vector.
Input shape: This depends on the layout parameter. Input is a 3D array of shape (batch_size, in_channels, width) if layout is NCW.
Output shape: This depends on the layout parameter. Output is a 3D array of shape (batch_size, channels, out_width) if layout is NCW. out_width is calculated as:
out_width = floor((width+2*padding-dilation*(kernel_size-1)-1)/stride)+1
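The out_width formula can be checked with a small helper. conv1d_out_width is a hypothetical function that takes plain integers; integer floor division stands in for floor(...).

```python
def conv1d_out_width(width, kernel_size, stride=1, padding=0, dilation=1):
    """Evaluate the out_width formula quoted above for integer arguments."""
    span = dilation * (kernel_size - 1) + 1   # input positions one window covers
    return (width + 2 * padding - span) // stride + 1


plain = conv1d_out_width(10, kernel_size=3)
strided = conv1d_out_width(10, kernel_size=3, stride=2, padding=1)
dilated = conv1d_out_width(10, kernel_size=3, dilation=2)
```

For an input of width 10, these give 8, 5 and 6 output positions respectively.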
class mxnet.gluon.nn.Conv2D(channels, kernel_size, strides=(1, 1), padding=(0, 0), dilation=(1, 1), groups=1, layout='NCHW', activation=None, use_bias=True, weight_initializer=None, bias_initializer='zeros', in_channels=0, **kwargs)
2D convolution layer (e.g. spatial convolution over images).
This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well.
If in_channels is not specified, Parameter initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
Parameters: - channels (int) – The dimensionality of the output space, i.e. the number of output channels (filters) in the convolution.
- kernel_size (int or tuple/list of 2 int) – Specifies the dimensions of the convolution window.
- strides (int or tuple/list of 2 int) – Specifies the strides of the convolution.
- padding (int or tuple/list of 2 int) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
- dilation (int or tuple/list of 2 int) – Specifies the dilation rate to use for dilated convolution.
- groups (int) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
- layout (str, default 'NCHW') – Dimension ordering of data and weight. Can be ‘NCHW’, ‘NHWC’, etc. ‘N’, ‘C’, ‘H’, ‘W’ stand for batch, channel, height, and width dimensions respectively. Convolution is applied on the ‘H’ and ‘W’ dimensions.
- in_channels (int, default 0) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
- activation (str) – Activation function to use. See Activation(). If you don’t specify anything, no activation is applied (i.e. “linear” activation: a(x) = x).
- use_bias (bool) – Whether the layer uses a bias vector.
- weight_initializer (str or Initializer) – Initializer for the weights matrix.
- bias_initializer (str or Initializer) – Initializer for the bias vector.
- Input shape:
- This depends on the layout parameter. Input is 4D array of shape (batch_size, in_channels, height, width) if layout is NCHW.
- Output shape:
This depends on the layout parameter. Output is 4D array of shape (batch_size, channels, out_height, out_width) if layout is NCHW.
out_height and out_width are calculated as:
out_height = floor((height+2*padding[0]-dilation[0]*(kernel_size[0]-1)-1)/strides[0])+1
out_width = floor((width+2*padding[1]-dilation[1]*(kernel_size[1]-1)-1)/strides[1])+1
-
class
mxnet.gluon.nn.
Conv3D
(channels, kernel_size, strides=(1, 1, 1), padding=(0, 0, 0), dilation=(1, 1, 1), groups=1, layout='NCDHW', activation=None, use_bias=True, weight_initializer=None, bias_initializer='zeros', in_channels=0, **kwargs)[source]¶ 3D convolution layer (e.g. spatial convolution over volumes).
This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well.
If in_channels is not specified, Parameter initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
Parameters: - channels (int) – The dimensionality of the output space, i.e. the number of output channels (filters) in the convolution.
- kernel_size (int or tuple/list of 3 int) – Specifies the dimensions of the convolution window.
- strides (int or tuple/list of 3 int) – Specifies the strides of the convolution.
- padding (int or a tuple/list of 3 int) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
- dilation (int or tuple/list of 3 int) – Specifies the dilation rate to use for dilated convolution.
- groups (int) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
- layout (str, default 'NCDHW') – Dimension ordering of data and weight. Can be ‘NCDHW’, ‘NDHWC’, etc. ‘N’, ‘C’, ‘H’, ‘W’, ‘D’ stand for batch, channel, height, width and depth dimensions respectively. Convolution is applied on the ‘D’, ‘H’ and ‘W’ dimensions.
- in_channels (int, default 0) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
- activation (str) – Activation function to use. See Activation(). If you don’t specify anything, no activation is applied (i.e. “linear” activation: a(x) = x).
- use_bias (bool) – Whether the layer uses a bias vector.
- weight_initializer (str or Initializer) – Initializer for the weights matrix.
- bias_initializer (str or Initializer) – Initializer for the bias vector.
- Input shape:
- This depends on the layout parameter. Input is 5D array of shape (batch_size, in_channels, depth, height, width) if layout is NCDHW.
- Output shape:
This depends on the layout parameter. Output is 5D array of shape (batch_size, channels, out_depth, out_height, out_width) if layout is NCDHW.
out_depth, out_height and out_width are calculated as:
out_depth = floor((depth+2*padding[0]-dilation[0]*(kernel_size[0]-1)-1)/strides[0])+1
out_height = floor((height+2*padding[1]-dilation[1]*(kernel_size[1]-1)-1)/strides[1])+1
out_width = floor((width+2*padding[2]-dilation[2]*(kernel_size[2]-1)-1)/strides[2])+1
-
class
mxnet.gluon.nn.
Conv1DTranspose
(channels, kernel_size, strides=1, padding=0, output_padding=0, dilation=1, groups=1, layout='NCW', activation=None, use_bias=True, weight_initializer=None, bias_initializer='zeros', in_channels=0, **kwargs)[source]¶ Transposed 1D convolution layer (sometimes called Deconvolution).
The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution.
If in_channels is not specified, Parameter initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
Parameters: - channels (int) – The dimensionality of the output space, i.e. the number of output channels (filters) in the convolution.
- kernel_size (int or tuple/list of 1 int) – Specifies the dimensions of the convolution window.
- strides (int or tuple/list of 1 int) – Specifies the strides of the convolution.
- padding (int or a tuple/list of 1 int) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
- dilation (int or tuple/list of 1 int) – Specifies the dilation rate to use for dilated convolution.
- groups (int) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
- layout (str, default 'NCW') – Dimension ordering of data and weight. Can be ‘NCW’, ‘NWC’, etc. ‘N’, ‘C’, ‘W’ stand for batch, channel, and width (time) dimensions respectively. Convolution is applied on the ‘W’ dimension.
- in_channels (int, default 0) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
- activation (str) – Activation function to use. See Activation(). If you don’t specify anything, no activation is applied (i.e. “linear” activation: a(x) = x).
- use_bias (bool) – Whether the layer uses a bias vector.
- weight_initializer (str or Initializer) – Initializer for the weights matrix.
- bias_initializer (str or Initializer) – Initializer for the bias vector.
- Input shape:
- This depends on the layout parameter. Input is 3D array of shape (batch_size, in_channels, width) if layout is NCW.
- Output shape:
This depends on the layout parameter. Output is 3D array of shape (batch_size, channels, out_width) if layout is NCW.
out_width is calculated as:
out_width = (width-1)*strides-2*padding+kernel_size+output_padding
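To illustrate how the transposed formula relates to the forward convolution, the sketch below evaluates both in plain Python. The function names are hypothetical (not MXNet APIs), and dilation is assumed to be 1. It shows that with strides=2 two different input widths collapse to the same convolution output width, and that output_padding selects which of them the transposed layer reproduces.

```python
def conv1d_out_width(width, kernel_size, strides=1, padding=0):
    # Forward Conv1D formula with dilation=1.
    return (width + 2 * padding - kernel_size) // strides + 1


def conv1d_transpose_out_width(width, kernel_size, strides=1, padding=0, output_padding=0):
    # Conv1DTranspose formula as stated above.
    return (width - 1) * strides - 2 * padding + kernel_size + output_padding


# Widths 7 and 8 both map to width 3 under kernel_size=3, strides=2.
print(conv1d_out_width(7, kernel_size=3, strides=2))  # 3
print(conv1d_out_width(8, kernel_size=3, strides=2))  # 3

# Going back from width 3, output_padding picks the original width.
print(conv1d_transpose_out_width(3, kernel_size=3, strides=2))                    # 7
print(conv1d_transpose_out_width(3, kernel_size=3, strides=2, output_padding=1))  # 8
```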
-
class
mxnet.gluon.nn.
Conv2DTranspose
(channels, kernel_size, strides=(1, 1), padding=(0, 0), output_padding=(0, 0), dilation=(1, 1), groups=1, layout='NCHW', activation=None, use_bias=True, weight_initializer=None, bias_initializer='zeros', in_channels=0, **kwargs)[source]¶ Transposed 2D convolution layer (sometimes called Deconvolution).
The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution.
If in_channels is not specified, Parameter initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
Parameters: - channels (int) – The dimensionality of the output space, i.e. the number of output channels (filters) in the convolution.
- kernel_size (int or tuple/list of 2 int) – Specifies the dimensions of the convolution window.
- strides (int or tuple/list of 2 int) – Specifies the strides of the convolution.
- padding (int or a tuple/list of 2 int) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
- dilation (int or tuple/list of 2 int) – Specifies the dilation rate to use for dilated convolution.
- groups (int) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
- layout (str, default 'NCHW') – Dimension ordering of data and weight. Can be ‘NCHW’, ‘NHWC’, etc. ‘N’, ‘C’, ‘H’, ‘W’ stand for batch, channel, height, and width dimensions respectively. Convolution is applied on the ‘H’ and ‘W’ dimensions.
- in_channels (int, default 0) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
- activation (str) – Activation function to use. See Activation(). If you don’t specify anything, no activation is applied (i.e. “linear” activation: a(x) = x).
- use_bias (bool) – Whether the layer uses a bias vector.
- weight_initializer (str or Initializer) – Initializer for the weights matrix.
- bias_initializer (str or Initializer) – Initializer for the bias vector.
- Input shape:
- This depends on the layout parameter. Input is 4D array of shape (batch_size, in_channels, height, width) if layout is NCHW.
- Output shape:
This depends on the layout parameter. Output is 4D array of shape (batch_size, channels, out_height, out_width) if layout is NCHW.
out_height and out_width are calculated as:
out_height = (height-1)*strides[0]-2*padding[0]+kernel_size[0]+output_padding[0]
out_width = (width-1)*strides[1]-2*padding[1]+kernel_size[1]+output_padding[1]
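A common use of a transposed 2D convolution is upsampling by a factor of two. The sketch below evaluates the formulas above for two parameter choices that double the spatial size. conv2d_transpose_out_hw is a hypothetical helper for illustration, not an MXNet function, and it takes all arguments as 2-tuples for simplicity.

```python
def conv2d_transpose_out_hw(height, width, kernel_size, strides, padding, output_padding=(0, 0)):
    """Evaluate the Conv2DTranspose out_height/out_width formulas (illustrative helper)."""
    sizes = (height, width)
    return tuple(
        (sizes[i] - 1) * strides[i] - 2 * padding[i] + kernel_size[i] + output_padding[i]
        for i in range(2)
    )


# kernel 4, stride 2, padding 1: (h-1)*2 - 2 + 4 = 2*h
print(conv2d_transpose_out_hw(16, 16, (4, 4), (2, 2), (1, 1)))          # (32, 32)
# kernel 3, stride 2, padding 1 needs output_padding 1 to reach 2*h
print(conv2d_transpose_out_hw(16, 16, (3, 3), (2, 2), (1, 1), (1, 1)))  # (32, 32)
print(conv2d_transpose_out_hw(16, 16, (3, 3), (2, 2), (1, 1)))          # (31, 31)
```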
-
class
mxnet.gluon.nn.
Conv3DTranspose
(channels, kernel_size, strides=(1, 1, 1), padding=(0, 0, 0), output_padding=(0, 0, 0), dilation=(1, 1, 1), groups=1, layout='NCDHW', activation=None, use_bias=True, weight_initializer=None, bias_initializer='zeros', in_channels=0, **kwargs)[source]¶ Transposed 3D convolution layer (sometimes called Deconvolution).
The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution.
If in_channels is not specified, Parameter initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
Parameters: - channels (int) – The dimensionality of the output space, i.e. the number of output channels (filters) in the convolution.
- kernel_size (int or tuple/list of 3 int) – Specifies the dimensions of the convolution window.
- strides (int or tuple/list of 3 int) – Specifies the strides of the convolution.
- padding (int or a tuple/list of 3 int) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
- dilation (int or tuple/list of 3 int) – Specifies the dilation rate to use for dilated convolution.
- groups (int) – Controls the connections between inputs and outputs. At groups=1, all inputs are convolved to all outputs. At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
- layout (str, default 'NCDHW') – Dimension ordering of data and weight. Can be ‘NCDHW’, ‘NDHWC’, etc. ‘N’, ‘C’, ‘H’, ‘W’, ‘D’ stand for batch, channel, height, width and depth dimensions respectively. Convolution is applied on the ‘D’, ‘H’, and ‘W’ dimensions.
- in_channels (int, default 0) – The number of input channels to this layer. If not specified, initialization will be deferred to the first time forward is called and in_channels will be inferred from the shape of input data.
- activation (str) – Activation function to use. See Activation(). If you don’t specify anything, no activation is applied (i.e. “linear” activation: a(x) = x).
- use_bias (bool) – Whether the layer uses a bias vector.
- weight_initializer (str or Initializer) – Initializer for the weights matrix.
- bias_initializer (str or Initializer) – Initializer for the bias vector.
- Input shape:
- This depends on the layout parameter. Input is 5D array of shape (batch_size, in_channels, depth, height, width) if layout is NCDHW.
- Output shape:
This depends on the layout parameter. Output is 5D array of shape (batch_size, channels, out_depth, out_height, out_width) if layout is NCDHW. out_depth, out_height and out_width are calculated as:
out_depth = (depth-1)*strides[0]-2*padding[0]+kernel_size[0]+output_padding[0]
out_height = (height-1)*strides[1]-2*padding[1]+kernel_size[1]+output_padding[1]
out_width = (width-1)*strides[2]-2*padding[2]+kernel_size[2]+output_padding[2]
-
class
mxnet.gluon.nn.
MaxPool1D
(pool_size=2, strides=None, padding=0, layout='NCW', ceil_mode=False, **kwargs)[source]¶ Max pooling operation for one dimensional data.
Parameters: - pool_size (int) – Size of the max pooling windows.
- strides (int, or None) – Factor by which to downscale. E.g. 2 will halve the input size. If None, it will default to pool_size.
- padding (int) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
- layout (str, default 'NCW') – Dimension ordering of data and weight. Can be ‘NCW’, ‘NWC’, etc. ‘N’, ‘C’, ‘W’ stand for batch, channel, and width (time) dimensions respectively. Pooling is applied on the ‘W’ dimension.
- ceil_mode (bool, default False) – When True, will use ceil instead of floor to compute the output shape.
- Input shape:
- This depends on the layout parameter. Input is 3D array of shape (batch_size, channels, width) if layout is NCW.
- Output shape:
This depends on the layout parameter. Output is 3D array of shape (batch_size, channels, out_width) if layout is NCW.
out_width is calculated as:
out_width = floor((width+2*padding-pool_size)/strides)+1
When ceil_mode is True, ceil will be used instead of floor in this equation.
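The effect of ceil_mode and of the strides=None default can be checked with a short calculation. This is a minimal sketch of the formula above; pool1d_out_width is a hypothetical helper name, not an MXNet function.

```python
import math


def pool1d_out_width(width, pool_size=2, strides=None, padding=0, ceil_mode=False):
    """Evaluate the 1D pooling out_width formula (illustrative helper)."""
    if strides is None:
        # As documented above, strides defaults to pool_size.
        strides = pool_size
    rounding = math.ceil if ceil_mode else math.floor
    return rounding((width + 2 * padding - pool_size) / strides) + 1


print(pool1d_out_width(7))                  # 3
print(pool1d_out_width(7, ceil_mode=True))  # 4
print(pool1d_out_width(8))                  # 4
print(pool1d_out_width(8, ceil_mode=True))  # 4
```

The two modes differ only when the windows do not tile the padded input exactly, as with width 7 and pool_size 2.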
-
class
mxnet.gluon.nn.
MaxPool2D
(pool_size=(2, 2), strides=None, padding=0, layout='NCHW', ceil_mode=False, **kwargs)[source]¶ Max pooling operation for two dimensional (spatial) data.
Parameters: - pool_size (int or list/tuple of 2 ints) – Size of the max pooling windows.
- strides (int, list/tuple of 2 ints, or None) – Factor by which to downscale. E.g. 2 will halve the input size. If None, it will default to pool_size.
- padding (int or list/tuple of 2 ints) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
- layout (str, default 'NCHW') – Dimension ordering of data and weight. Can be ‘NCHW’, ‘NHWC’, etc. ‘N’, ‘C’, ‘H’, ‘W’ stand for batch, channel, height, and width dimensions respectively. Pooling is applied on the ‘H’ and ‘W’ dimensions.
- ceil_mode (bool, default False) – When True, will use ceil instead of floor to compute the output shape.
- Input shape:
- This depends on the layout parameter. Input is 4D array of shape (batch_size, channels, height, width) if layout is NCHW.
- Output shape:
This depends on the layout parameter. Output is 4D array of shape (batch_size, channels, out_height, out_width) if layout is NCHW.
out_height and out_width are calculated as:
out_height = floor((height+2*padding[0]-pool_size[0])/strides[0])+1
out_width = floor((width+2*padding[1]-pool_size[1])/strides[1])+1
When ceil_mode is True, ceil will be used instead of floor in this equation.
-
class
mxnet.gluon.nn.
MaxPool3D
(pool_size=(2, 2, 2), strides=None, padding=0, ceil_mode=False, layout='NCDHW', **kwargs)[source]¶ Max pooling operation for 3D data (spatial or spatio-temporal).
Parameters: - pool_size (int or list/tuple of 3 ints) – Size of the max pooling windows.
- strides (int, list/tuple of 3 ints, or None) – Factor by which to downscale. E.g. 2 will halve the input size. If None, it will default to pool_size.
- padding (int or list/tuple of 3 ints) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
- layout (str, default 'NCDHW') – Dimension ordering of data and weight. Can be ‘NCDHW’, ‘NDHWC’, etc. ‘N’, ‘C’, ‘H’, ‘W’, ‘D’ stand for batch, channel, height, width and depth dimensions respectively. Pooling is applied on the ‘D’, ‘H’ and ‘W’ dimensions.
- ceil_mode (bool, default False) – When True, will use ceil instead of floor to compute the output shape.
- Input shape:
- This depends on the layout parameter. Input is 5D array of shape (batch_size, channels, depth, height, width) if layout is NCDHW.
- Output shape:
This depends on the layout parameter. Output is 5D array of shape (batch_size, channels, out_depth, out_height, out_width) if layout is NCDHW.
out_depth, out_height and out_width are calculated as:
out_depth = floor((depth+2*padding[0]-pool_size[0])/strides[0])+1
out_height = floor((height+2*padding[1]-pool_size[1])/strides[1])+1
out_width = floor((width+2*padding[2]-pool_size[2])/strides[2])+1
When ceil_mode is True, ceil will be used instead of floor in this equation.
-
class
mxnet.gluon.nn.
AvgPool1D
(pool_size=2, strides=None, padding=0, layout='NCW', ceil_mode=False, **kwargs)[source]¶ Average pooling operation for temporal data.
Parameters: - pool_size (int) – Size of the average pooling windows.
- strides (int, or None) – Factor by which to downscale. E.g. 2 will halve the input size. If None, it will default to pool_size.
- padding (int) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
- layout (str, default 'NCW') – Dimension ordering of data and weight. Can be ‘NCW’, ‘NWC’, etc. ‘N’, ‘C’, ‘W’ stand for batch, channel, and width (time) dimensions respectively. Pooling is applied on the ‘W’ dimension.
- ceil_mode (bool, default False) – When True, will use ceil instead of floor to compute the output shape.
- Input shape:
- This depends on the layout parameter. Input is 3D array of shape (batch_size, channels, width) if layout is NCW.
- Output shape:
This depends on the layout parameter. Output is 3D array of shape (batch_size, channels, out_width) if layout is NCW.
out_width is calculated as:
out_width = floor((width+2*padding-pool_size)/strides)+1
When ceil_mode is True, ceil will be used instead of floor in this equation.
-
class
mxnet.gluon.nn.
AvgPool2D
(pool_size=(2, 2), strides=None, padding=0, ceil_mode=False, layout='NCHW', **kwargs)[source]¶ Average pooling operation for spatial data.
Parameters: - pool_size (int or list/tuple of 2 ints) – Size of the average pooling windows.
- strides (int, list/tuple of 2 ints, or None) – Factor by which to downscale. E.g. 2 will halve the input size. If None, it will default to pool_size.
- padding (int or list/tuple of 2 ints) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
- layout (str, default 'NCHW') – Dimension ordering of data and weight. Can be ‘NCHW’, ‘NHWC’, etc. ‘N’, ‘C’, ‘H’, ‘W’ stand for batch, channel, height, and width dimensions respectively. Pooling is applied on the ‘H’ and ‘W’ dimensions.
- ceil_mode (bool, default False) – When True, will use ceil instead of floor to compute the output shape.
- Input shape:
- This depends on the layout parameter. Input is 4D array of shape (batch_size, channels, height, width) if layout is NCHW.
- Output shape:
This depends on the layout parameter. Output is 4D array of shape (batch_size, channels, out_height, out_width) if layout is NCHW.
out_height and out_width are calculated as:
out_height = floor((height+2*padding[0]-pool_size[0])/strides[0])+1
out_width = floor((width+2*padding[1]-pool_size[1])/strides[1])+1
When ceil_mode is True, ceil will be used instead of floor in this equation.
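To make the operation itself concrete, the following sketch average-pools a single height-by-width plane stored as a list of lists. It is a simplified illustration, not MXNet's implementation: avg_pool2d_plane is a hypothetical name, and it assumes no padding and floor mode. The output size follows the out_height and out_width formulas above with padding set to 0.

```python
def avg_pool2d_plane(plane, pool_size=(2, 2), strides=None):
    """Average-pool one H x W plane (list of lists); no padding, floor mode."""
    if strides is None:
        strides = pool_size
    height, width = len(plane), len(plane[0])
    out_h = (height - pool_size[0]) // strides[0] + 1
    out_w = (width - pool_size[1]) // strides[1] + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * strides[0], j * strides[1]
            # Collect every value under the current pooling window.
            window = [plane[top + di][left + dj]
                      for di in range(pool_size[0])
                      for dj in range(pool_size[1])]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


plane = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
print(avg_pool2d_plane(plane))  # [[3.5, 5.5], [11.5, 13.5]]
```

Replacing the mean with max(window) gives the corresponding max pooling sketch.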
-
class
mxnet.gluon.nn.
AvgPool3D
(pool_size=(2, 2, 2), strides=None, padding=0, ceil_mode=False, layout='NCDHW', **kwargs)[source]¶ Average pooling operation for 3D data (spatial or spatio-temporal).
Parameters: - pool_size (int or list/tuple of 3 ints) – Size of the average pooling windows.
- strides (int, list/tuple of 3 ints, or None) – Factor by which to downscale. E.g. 2 will halve the input size. If None, it will default to pool_size.
- padding (int or list/tuple of 3 ints) – If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
- layout (str, default 'NCDHW') – Dimension ordering of data and weight. Can be ‘NCDHW’, ‘NDHWC’, etc. ‘N’, ‘C’, ‘H’, ‘W’, ‘D’ stand for batch, channel, height, width and depth dimensions respectively. Pooling is applied on the ‘D’, ‘H’ and ‘W’ dimensions.
- ceil_mode (bool, default False) – When True, will use ceil instead of floor to compute the output shape.
- Input shape:
- This depends on the layout parameter. Input is 5D array of shape (batch_size, channels, depth, height, width) if layout is NCDHW.
- Output shape:
This depends on the layout parameter. Output is 5D array of shape (batch_size, channels, out_depth, out_height, out_width) if layout is NCDHW.
out_depth, out_height and out_width are calculated as:
out_depth = floor((depth+2*padding[0]-pool_size[0])/strides[0])+1
out_height = floor((height+2*padding[1]-pool_size[1])/strides[1])+1
out_width = floor((width+2*padding[2]-pool_size[2])/strides[2])+1
When ceil_mode is True, ceil will be used instead of floor in this equation.
-
class
mxnet.gluon.nn.
GlobalMaxPool1D
(layout='NCW', **kwargs)[source]¶ Global max pooling operation for temporal data.
-
class
mxnet.gluon.nn.
GlobalMaxPool2D
(layout='NCHW', **kwargs)[source]¶ Global max pooling operation for spatial data.
-
class
mxnet.gluon.nn.
GlobalMaxPool3D
(layout='NCDHW', **kwargs)[source]¶ Global max pooling operation for 3D data.
-
class
mxnet.gluon.nn.
GlobalAvgPool1D
(layout='NCW', **kwargs)[source]¶ Global average pooling operation for temporal data.
-
class
mxnet.gluon.nn.
GlobalAvgPool2D
(layout='NCHW', **kwargs)[source]¶ Global average pooling operation for spatial data.
-
class
mxnet.gluon.nn.
GlobalAvgPool3D
(layout='NCDHW', **kwargs)[source]¶ Global average pooling operation for 3D data.
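The global pooling layers listed above reduce each channel's spatial extent to a single value. The sketch below illustrates this for the 2D case on nested Python lists in NCHW order. It is an illustration only: global_pool2d and mean are hypothetical helpers, and it assumes the spatial axes are kept with size 1, so the result has shape (batch_size, channels, 1, 1).

```python
def mean(values):
    return sum(values) / len(values)


def global_pool2d(data, reduce_fn):
    """Reduce every H x W plane of nested-list NCHW data to one value (N x C x 1 x 1)."""
    return [
        [[[reduce_fn([v for row in plane for v in row])]] for plane in sample]
        for sample in data
    ]


# One sample, two channels, each a 2 x 2 plane: shape (1, 2, 2, 2).
data = [[[[1, 2], [3, 4]],
         [[5, 6], [7, 8]]]]

print(global_pool2d(data, max))   # [[[[4]], [[8]]]]
print(global_pool2d(data, mean))  # [[[[2.5]], [[6.5]]]]
```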
-
class
mxnet.gluon.rnn.
RecurrentCell
(prefix=None, params=None)[source]¶ Abstract base class for RNN cells
Parameters: - prefix (str, optional) – Prefix for names of `Block`s (this prefix is also used for names of weights if `params` is None, i.e. if params are being created and not reused).
- params (Parameter or None, optional) – Container for weight sharing between cells. A new Parameter container is created if params is None.
-
begin_state
(batch_size=0, func=symbol.zeros, **kwargs)[source]¶ Initial state for this cell.
Parameters: - func (callable, default symbol.zeros) –
Function for creating initial state.
For Symbol API, func can be symbol.zeros, symbol.uniform, symbol.var etc. Use symbol.var if you want to directly feed input as states.
For NDArray API, func can be ndarray.zeros, ndarray.ones, etc.
- batch_size (int, default 0) – Only required for NDArray API. Size of the batch (‘N’ in layout) dimension of input.
- **kwargs – Additional keyword arguments passed to func. For example mean, std, dtype, etc.
Returns: states – Starting states for the first RNN step.
Return type: nested list of Symbol
-
unroll
(length, inputs, begin_state=None, layout='NTC', merge_outputs=None)[source]¶ Unrolls an RNN cell across time steps.
Parameters: - length (int) – Number of steps to unroll.
- inputs (Symbol, list of Symbol, or None) –
If inputs is a single Symbol (usually the output of Embedding symbol), it should have shape (batch_size, length, ...) if layout is ‘NTC’, or (length, batch_size, ...) if layout is ‘TNC’.
If inputs is a list of symbols (usually output of previous unroll), they should all have shape (batch_size, ...).
- begin_state (nested list of Symbol, optional) – Input states created by begin_state() or output state of another cell. Created from begin_state() if None.
- layout (str, optional) – layout of input symbol. Only used if inputs is a single Symbol.
- merge_outputs (bool, optional) – If False, returns outputs as a list of Symbols. If True, concatenates output across time steps and returns a single symbol with shape (batch_size, length, ...) if layout is ‘NTC’, or (length, batch_size, ...) if layout is ‘TNC’. If None, output whatever is faster.
Returns: - outputs (list of Symbol or Symbol) – Symbol (if merge_outputs is True) or list of Symbols (if merge_outputs is False) corresponding to the output from the RNN from this unrolling.
- states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state().
-
forward
(inputs, states)[source]¶ Unrolls the recurrent cell for one time step.
Parameters: - inputs (sym.Variable) – Input symbol, 2D, of shape (batch_size, num_units).
- states (list of sym.Variable) – RNN state from previous step or the output of begin_state().
Returns: - output (Symbol) – Symbol corresponding to the output from the RNN when unrolling for a single time step.
- states (list of Symbol) – The new state of this RNN after this unrolling. The type of this symbol is same as the output of begin_state(). This can be used as an input state to the next time step of this RNN.
See also
begin_state()
- This function can provide the states for the first time step.
unroll()
- This function unrolls an RNN for a given number of (>=1) time steps.
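The relationship between begin_state(), forward() and unroll() described above can be sketched in plain Python. This is not the MXNet implementation: ToyCell is a hypothetical stand-in whose state is a running sum, the inputs are nested lists with one scalar feature per step, and outputs are returned as a per-step list (the merge_outputs=False case).

```python
class ToyCell:
    """Hypothetical stand-in for a recurrent cell; the state is a running sum."""

    def begin_state(self, batch_size=0):
        return [[0.0] * batch_size]

    def forward(self, inputs, states):
        # inputs holds one value per batch element for a single time step.
        new_state = [s + x for s, x in zip(states[0], inputs)]
        return new_state, [new_state]


def unroll(cell, length, inputs, begin_state=None, layout="NTC"):
    """Step `cell` over `length` time steps and collect the per-step outputs."""
    if layout == "NTC":
        # Transpose (batch, time) to (time, batch) so we can iterate over time.
        inputs = [list(step) for step in zip(*inputs)]
    batch_size = len(inputs[0])
    states = begin_state if begin_state is not None else cell.begin_state(batch_size)
    outputs = []
    for t in range(length):
        output, states = cell.forward(inputs[t], states)
        outputs.append(output)
    return outputs, states


batch_major = [[1.0, 2.0, 3.0],     # sequence for batch element 0
               [10.0, 20.0, 30.0]]  # sequence for batch element 1
outputs, states = unroll(ToyCell(), 3, batch_major, layout="NTC")
print(outputs)  # [[1.0, 10.0], [3.0, 30.0], [6.0, 60.0]]
print(states)   # [[6.0, 60.0]]
```

The returned states have the same structure as the output of begin_state(), so they can be fed into a further unroll call.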
-
class
mxnet.gluon.rnn.
RNN
(hidden_size, num_layers=1, activation='relu', layout='TNC', dropout=0, bidirectional=False, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, **kwargs)[source]¶ Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.
For each element in the input sequence, each layer computes the following function:
\[h_t = \tanh(w_{ih} * x_t + b_{ih} + w_{hh} * h_{(t-1)} + b_{hh})\]
where \(h_t\) is the hidden state at time t, and \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer. If activation=’relu’, then ReLU is used instead of tanh.
Parameters: - hidden_size (int) – The number of features in the hidden state h.
- num_layers (int, default 1) – Number of recurrent layers.
- activation ({'relu' or 'tanh'}, default 'relu') – The activation function to use.
- layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.
- dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer.
- bidirectional (bool, default False) – If True, becomes a bidirectional RNN.
- i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
- h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
- i2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
- h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
- input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.
- prefix (str or None) – Prefix of this Block.
- params (ParameterDict or None) – Shared Parameters for this Block.
- Input shapes:
- The input shape depends on layout. For layout=’TNC’, the input has shape (sequence_length, batch_size, input_size).
- Output shape:
- The output shape depends on layout. For layout=’TNC’, the output has shape (sequence_length, batch_size, hidden_size). If bidirectional is True, the output shape will instead be (sequence_length, batch_size, 2*hidden_size).
- Recurrent state:
- The recurrent state is an NDArray with shape (num_layers, batch_size, hidden_size). If bidirectional is True, the recurrent state shape will instead be (2*num_layers, batch_size, hidden_size). If the input recurrent state is None, zeros are used as the default begin states, and the output recurrent state is omitted.
Examples
>>> import mxnet as mx
>>> layer = mx.gluon.rnn.RNN(100, 3)
>>> layer.initialize()
>>> input = mx.nd.random_uniform(shape=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.nd.random_uniform(shape=(3, 3, 100))
>>> output, hn = layer(input, h0)
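The per-step recurrence given for this layer can also be traced by hand. The sketch below is a deliberate simplification that does not use MXNet: the input and hidden state are scalars, the weight values are made up for illustration, and rnn_step, relu and run_rnn are hypothetical helper names.

```python
import math


def relu(value):
    return max(0.0, value)


def rnn_step(x_t, h_prev, w_ih, b_ih, w_hh, b_hh, activation=math.tanh):
    """One Elman step: h_t = act(w_ih*x_t + b_ih + w_hh*h_prev + b_hh)."""
    return activation(w_ih * x_t + b_ih + w_hh * h_prev + b_hh)


def run_rnn(xs, activation):
    """Apply rnn_step over a sequence, starting from a zero hidden state."""
    h = 0.0
    for x_t in xs:
        h = rnn_step(x_t, h, w_ih=0.5, b_ih=0.0, w_hh=1.0, b_hh=0.0, activation=activation)
    return h


sequence = [1.0, -2.0, 0.5]
print(run_rnn(sequence, relu))       # 0.25
print(run_rnn(sequence, math.tanh))  # some value strictly between -1 and 1
```

With ReLU the hidden state goes 0.5, then 0.0 (the negative pre-activation is clipped), then 0.25.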
-
class
mxnet.gluon.rnn.
LSTM
(hidden_size, num_layers=1, layout='TNC', dropout=0, bidirectional=False, input_size=0, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', **kwargs)[source]¶ Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.
For each element in the input sequence, each layer computes the following function:
\[\begin{array}{ll}
i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\
f_t = sigmoid(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\
g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{(t-1)} + b_{hg}) \\
o_t = sigmoid(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\
c_t = f_t * c_{(t-1)} + i_t * g_t \\
h_t = o_t * \tanh(c_t)
\end{array}\]
where \(h_t\) is the hidden state at time t, \(c_t\) is the cell state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(i_t\), \(f_t\), \(g_t\), \(o_t\) are the input, forget, cell, and out gates, respectively.
Parameters: - hidden_size (int) – The number of features in the hidden state h.
- num_layers (int, default 1) – Number of recurrent layers.
- layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.
- dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer.
- bidirectional (bool, default False) – If True, becomes a bidirectional RNN.
- i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
- h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
- i2h_bias_initializer (str or Initializer, default 'lstmbias') – Initializer for the bias vector. By default, bias for the forget gate is initialized to 1 while all other biases are initialized to zero.
- h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
- input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.
- prefix (str or None) – Prefix of this Block.
- params (ParameterDict or None) – Shared Parameters for this Block.
- Input shapes:
- The input shape depends on layout. For layout=’TNC’, the input has shape (sequence_length, batch_size, input_size).
- Output shape:
- The output shape depends on layout. For layout=’TNC’, the output has shape (sequence_length, batch_size, hidden_size). If bidirectional is True, the output shape will instead be (sequence_length, batch_size, 2*hidden_size).
- Recurrent state:
- The recurrent state is a list of two NDArrays. Both have shape (num_layers, batch_size, hidden_size). If bidirectional is True, each recurrent state will instead have shape (2*num_layers, batch_size, hidden_size). If the input recurrent state is None, zeros are used as the default begin states, and the output recurrent state is omitted.
Examples
>>> import mxnet as mx
>>> layer = mx.gluon.rnn.LSTM(100, 3)
>>> layer.initialize()
>>> input = mx.nd.random_uniform(shape=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.nd.random_uniform(shape=(3, 3, 100))
>>> c0 = mx.nd.random_uniform(shape=(3, 3, 100))
>>> output, hn = layer(input, [h0, c0])
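The per-step LSTM computation described by the equations for this layer can be sketched in plain Python. This is only an illustration on Python lists, not the library's implementation; the helper names (`affine`, `lstm_step`) and the dictionary keys such as `W_ii` or `b_hf` are hypothetical and simply mirror the subscripts in the formulas.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, b):
    # Matrix-vector product plus bias, on plain Python lists.
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step; p maps names like 'W_ii' or 'b_hf' to weights."""
    def gate(name, act):
        from_input = affine(p["W_i" + name], x, p["b_i" + name])
        from_state = affine(p["W_h" + name], h_prev, p["b_h" + name])
        return [act(u + v) for u, v in zip(from_input, from_state)]

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    g = gate("g", math.tanh)  # cell (candidate) gate
    o = gate("o", sigmoid)    # output gate
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


# With all-zero weights every sigmoid gate equals 0.5 and g equals 0,
# so the new cell state is half the previous one.
hidden, inputs = 2, 3
params = {}
for name in "ifgo":
    params["W_i" + name] = [[0.0] * inputs for _ in range(hidden)]
    params["W_h" + name] = [[0.0] * hidden for _ in range(hidden)]
    params["b_i" + name] = [0.0] * hidden
    params["b_h" + name] = [0.0] * hidden

h, c = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -1.0], params)
```

The zero-weight setup is chosen only because its result is easy to check by hand.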
-
class
mxnet.gluon.rnn.
GRU
(hidden_size, num_layers=1, layout='TNC', dropout=0, bidirectional=False, input_size=0, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', **kwargs)[source]¶ Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence.
For each element in the input sequence, each layer computes the following function:
\[\begin{split}\begin{array}{ll} r_t = sigmoid(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\ i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)} + b_{hn})) \\ h_t = (1 - i_t) * n_t + i_t * h_{(t-1)} \end{array}\end{split}\]
where \(h_t\) is the hidden state at time t, \(x_t\) is the hidden state of the previous layer at time t or \(input_t\) for the first layer, and \(r_t\), \(i_t\), \(n_t\) are the reset, input, and new gates, respectively.
Parameters: - hidden_size (int) – The number of features in the hidden state h.
- num_layers (int, default 1) – Number of recurrent layers.
- layout (str, default 'TNC') – The format of input and output tensors. T, N and C stand for sequence length, batch size, and feature dimensions respectively.
- dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer.
- bidirectional (bool, default False) – If True, becomes a bidirectional RNN.
- i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
- h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
- i2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
- h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
- input_size (int, default 0) – The number of expected features in the input x. If not specified, it will be inferred from input.
- prefix (str or None) – Prefix of this Block.
- params (ParameterDict or None) – Shared Parameters for this Block.
- Input shapes:
- The input shape depends on layout. For layout='TNC', the input has shape (sequence_length, batch_size, input_size).
- Output shape:
- The output shape depends on layout. For layout='TNC', the output has shape (sequence_length, batch_size, num_hidden). If bidirectional is True, the output shape will instead be (sequence_length, batch_size, 2*num_hidden).
- Recurrent state:
- The recurrent state is an NDArray with shape (num_layers, batch_size, num_hidden). If bidirectional is True, the recurrent state shape will instead be (2*num_layers, batch_size, num_hidden). If the input recurrent state is None, zeros are used as default begin states, and the output recurrent state is omitted.
Examples
>>> import mxnet as mx
>>> layer = mx.gluon.rnn.GRU(100, 3)
>>> layer.initialize()
>>> input = mx.nd.random_uniform(shape=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.nd.random_uniform(shape=(3, 3, 100))
>>> output, hn = layer(input, h0)
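As a rough illustration of the GRU equations for this layer, the following plain-Python sketch computes a single time step on Python lists. It is not the library's code; `affine`, `gru_step`, and the dictionary keys (`W_ir`, `b_hn`, and so on) are hypothetical names that follow the subscripts in the formulas.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, b):
    # Matrix-vector product plus bias, on plain Python lists.
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]


def gru_step(x, h_prev, p):
    """One GRU time step; p maps names like 'W_ir' or 'b_hn' to weights."""
    def pre(src, name, vec):
        return affine(p["W_" + src + name], vec, p["b_" + src + name])

    # Reset and input gates combine input and recurrent contributions.
    r = [sigmoid(a + b) for a, b in zip(pre("i", "r", x), pre("h", "r", h_prev))]
    i = [sigmoid(a + b) for a, b in zip(pre("i", "i", x), pre("h", "i", h_prev))]
    # The reset gate scales only the recurrent part of the new gate.
    n = [math.tanh(a + rt * b)
         for a, rt, b in zip(pre("i", "n", x), r, pre("h", "n", h_prev))]
    return [(1 - it) * nt + it * hp for it, nt, hp in zip(i, n, h_prev)]


# With all-zero weights both sigmoid gates equal 0.5 and n equals 0,
# so the new hidden state is half the previous one.
hidden, inputs = 2, 3
params = {}
for name in "rin":
    params["W_i" + name] = [[0.0] * inputs for _ in range(hidden)]
    params["W_h" + name] = [[0.0] * hidden for _ in range(hidden)]
    params["b_i" + name] = [0.0] * hidden
    params["b_h" + name] = [0.0] * hidden

h = gru_step([1.0, 2.0, 3.0], [2.0, -4.0], params)
```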
-
class
mxnet.gluon.rnn.
RNNCell
(hidden_size, activation='tanh', i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, prefix=None, params=None)[source]¶ Simple recurrent neural network cell.
Parameters: - hidden_size (int) – Number of units in output symbol.
- activation (str or Symbol, default 'tanh') – Type of activation function.
- i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
- h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
- i2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
- h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
- prefix (str, default 'rnn_') – Prefix for name of `Block`s (and name of weight if params is `None`).
- params (Parameter or None) – Container for weight sharing between cells. Created if None.
-
class
mxnet.gluon.rnn.
LSTMCell
(hidden_size, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, prefix=None, params=None)[source]¶ Long-Short Term Memory (LSTM) network cell.
Parameters: - hidden_size (int) – Number of units in output symbol.
- i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
- h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
- i2h_bias_initializer (str or Initializer, default 'lstmbias') – Initializer for the bias vector. By default, bias for the forget gate is initialized to 1 while all other biases are initialized to zero.
- h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
- prefix (str, default 'lstm_') – Prefix for name of `Block`s (and name of weight if params is `None`).
- params (Parameter or None) – Container for weight sharing between cells. Created if None.
-
class
mxnet.gluon.rnn.
GRUCell
(hidden_size, i2h_weight_initializer=None, h2h_weight_initializer=None, i2h_bias_initializer='zeros', h2h_bias_initializer='zeros', input_size=0, prefix=None, params=None)[source]¶ Gated Recurrent Unit (GRU) network cell. Note: this is an implementation of the cuDNN version of GRUs (slight modification compared to Cho et al. 2014).
Parameters: - hidden_size (int) – Number of units in output symbol.
- i2h_weight_initializer (str or Initializer) – Initializer for the input weights matrix, used for the linear transformation of the inputs.
- h2h_weight_initializer (str or Initializer) – Initializer for the recurrent weights matrix, used for the linear transformation of the recurrent state.
- i2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
- h2h_bias_initializer (str or Initializer) – Initializer for the bias vector.
- prefix (str, default 'gru_') – Prefix for name of `Block`s (and name of weight if params is `None`).
- params (Parameter or None) – Container for weight sharing between cells. Created if None.
-
class
mxnet.gluon.rnn.
SequentialRNNCell
(prefix=None, params=None)[source]¶ Sequentially stacking multiple RNN cells.
-
class
mxnet.gluon.rnn.
BidirectionalCell
(l_cell, r_cell, output_prefix='bi_')[source]¶ Bidirectional RNN cell.
Parameters: - l_cell (RecurrentCell) – Cell for forward unrolling
- r_cell (RecurrentCell) – Cell for backward unrolling
-
class
mxnet.gluon.rnn.
DropoutCell
(rate, prefix=None, params=None)[source]¶ Applies dropout on input.
Parameters: rate (float) – Percentage of elements to drop out, which is 1 - percentage to retain.
-
class
mxnet.gluon.rnn.
ZoneoutCell
(base_cell, zoneout_outputs=0.0, zoneout_states=0.0)[source]¶ Applies Zoneout on base cell.
-
class
mxnet.gluon.rnn.
ResidualCell
(base_cell)[source]¶ Adds residual connection as described in Wu et al, 2016 (https://arxiv.org/abs/1609.08144). Output of the cell is output of the base cell plus input.
-
class
mxnet.gluon.
Trainer
(params, optimizer, optimizer_params=None, kvstore='device')[source]¶ Applies an Optimizer on a set of Parameters. Trainer should be used together with autograd.
Parameters: - params (ParameterDict) – The set of parameters to optimize.
- optimizer (str or Optimizer) – The optimizer to use. See help on Optimizer for a list of available optimizers.
- optimizer_params (dict) – Keyword arguments to be passed to optimizer constructor. For example, {'learning_rate': 0.1}. All optimizers accept learning_rate, wd (weight decay), clip_gradient, and lr_scheduler. See each optimizer’s constructor for a list of additional supported arguments.
- kvstore (str or KVStore) – kvstore type for multi-gpu and distributed training. See help on mxnet.kvstore.create for more information.
-
step
(batch_size, ignore_stale_grad=False)[source]¶ Makes one step of parameter update. Should be called after autograd.compute_gradient and outside of record() scope.
Parameters: - batch_size (int) – Batch size of data processed. Gradient will be normalized by 1/batch_size. Set this to 1 if you normalized loss manually with loss = mean(loss).
- ignore_stale_grad (bool, optional, default=False) – If True, ignores Parameters with stale gradients (gradients that have not been updated by backward after the last step) and skips their update.
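To make the 1/batch_size normalization and the stale-gradient option concrete, here is a minimal sketch of one update step. It assumes plain SGD purely for illustration (the real Trainer delegates to whichever Optimizer was configured), and `sgd_step`, `stale`, and the scalar parameters are hypothetical.

```python
def sgd_step(params, grads, batch_size, learning_rate=0.1, stale=()):
    """Update params in place; grads hold gradients summed over the batch."""
    for name in params:
        if name in stale:
            # Mirrors ignore_stale_grad=True: leave this parameter untouched.
            continue
        # Normalize the summed gradient by 1/batch_size before the update.
        params[name] -= learning_rate * grads[name] / batch_size


params = {"w": 1.0, "b": 0.5}
grads = {"w": 8.0, "b": 4.0}
sgd_step(params, grads, batch_size=4, stale={"b"})
# w moves by 0.1 * 8.0 / 4; b is skipped because it is marked stale.
```

If the loss was already averaged over the batch, passing `batch_size=1` avoids normalizing twice, which matches the note on the batch_size parameter.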
-
class
mxnet.gluon.loss.
L2Loss
(weight=1.0, batch_axis=0, **kwargs)[source]¶ Calculates the mean squared error between output and label:
\[L = \frac{1}{2}\sum_i \Vert {output}_i - {label}_i \Vert^2.\]Output and label can have arbitrary shape as long as they have the same number of elements.
Parameters: - weight (float or None) – Global scalar weight for loss.
- sample_weight (Symbol or None) – Per sample weighting. Must be broadcastable to the same shape as loss. For example, if loss has shape (64, 10) and you want to weight each sample in the batch, sample_weight should have shape (64, 1).
- batch_axis (int, default 0) – The axis that represents mini-batch.
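The formula can be evaluated by hand for a small batch. The sketch below follows the written formula literally (half the sum of squared differences per sample, scaled by the global weight); whether the library additionally averages over non-batch axes is not stated here, so treat that detail as an assumption. `l2_loss` is a hypothetical helper operating on nested Python lists.

```python
def l2_loss(output, label, weight=1.0):
    """Per-sample loss: 0.5 * weight * sum of squared differences."""
    return [
        0.5 * weight * sum((o - l) ** 2 for o, l in zip(out_row, lab_row))
        for out_row, lab_row in zip(output, label)
    ]


# Two samples (batch_axis=0) with two features each.
output = [[1.0, 2.0], [0.0, 0.0]]
label = [[0.0, 0.0], [3.0, 4.0]]
losses = l2_loss(output, label)
# First sample: 0.5 * (1 + 4); second sample: 0.5 * (9 + 16).
```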
-
class
mxnet.gluon.loss.
L1Loss
(weight=None, batch_axis=0, **kwargs)[source]¶ Calculates the mean absolute error between output and label:
\[L = \sum_i \vert {output}_i - {label}_i \vert.\]Output and label must have the same shape.
Parameters: - weight (float or None) – Global scalar weight for loss.
- sample_weight (Symbol or None) – Per sample weighting. Must be broadcastable to the same shape as loss. For example, if loss has shape (64, 10) and you want to weight each sample in the batch, sample_weight should have shape (64, 1).
- batch_axis (int, default 0) – The axis that represents mini-batch.
-
class
mxnet.gluon.loss.
SoftmaxCrossEntropyLoss
(axis=-1, sparse_label=True, from_logits=False, weight=None, batch_axis=0, **kwargs)[source]¶ Computes the softmax cross entropy loss. (alias: SoftmaxCELoss)
If sparse_label is True, label should contain integer category indicators:
\[ \begin{align}\begin{aligned}p = {softmax}({output})\\L = -\sum_i {log}(p_{i,{label}_i})\end{aligned}\end{align} \]Label’s shape should be output’s shape without the axis dimension. i.e. for output.shape = (1,2,3,4) and axis = 2, label.shape should be (1,2,4).
If sparse_label is False, label should contain probability distribution with the same shape as output:
\[ \begin{align}\begin{aligned}p = {softmax}({output})\\L = -\sum_i \sum_j {label}_j {log}(p_{ij})\end{aligned}\end{align} \]Parameters: - axis (int, default -1) – The axis to sum over when computing softmax and entropy.
- sparse_label (bool, default True) – Whether label is an integer array instead of probability distribution.
- from_logits (bool, default False) – Whether input is a log probability (usually from log_softmax) instead of unnormalized numbers.
- weight (float or None) – Global scalar weight for loss.
- sample_weight (Symbol or None) – Per sample weighting. Must be broadcastable to the same shape as loss. For example, if loss has shape (64, 10) and you want to weight each sample in the batch, sample_weight should have shape (64, 1).
- batch_axis (int, default 0) – The axis that represents mini-batch.
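The two label formats can be compared with a small plain-Python sketch. It assumes a 2D output with the class axis last (axis=-1) and unnormalized scores (from_logits=False); `log_softmax` and `softmax_ce` are hypothetical helpers, not library functions.

```python
import math


def log_softmax(row):
    m = max(row)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_z for v in row]


def softmax_ce(output, label, sparse_label=True):
    """Per-sample softmax cross entropy over the last axis."""
    losses = []
    for row, lab in zip(output, label):
        logp = log_softmax(row)
        if sparse_label:
            # lab is an integer class index.
            losses.append(-logp[lab])
        else:
            # lab is a probability distribution over classes.
            losses.append(-sum(l * lp for l, lp in zip(lab, logp)))
    return losses


output = [[2.0, 1.0, 0.1], [0.5, 0.5, 0.5]]
sparse = softmax_ce(output, [0, 2])
dense = softmax_ce(output, [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
                   sparse_label=False)
```

A one-hot probability label gives the same loss as the corresponding integer label, which is why `sparse` and `dense` agree here.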
-
class
mxnet.gluon.loss.
KLDivLoss
(from_logits=True, weight=None, batch_axis=0, **kwargs)[source]¶ The Kullback-Leibler divergence loss.
KL divergence is a useful distance measure for continuous distributions and is often useful when performing direct regression over the space of (discretely sampled) continuous output distributions.
\[L = 1/n \sum_i (label_i * (log(label_i) - output_i))\]Label’s shape should be the same as output’s.
Parameters: - from_logits (bool, default is True) – Whether the input is log probability (usually from log_softmax) instead of unnormalized numbers.
- weight (float or None) – Global scalar weight for loss.
- sample_weight (Symbol or None) – Per sample weighting. Must be broadcastable to the same shape as loss. For example, if loss has shape (64, 10) and you want to weight each sample in the batch, sample_weight should have shape (64, 1).
- batch_axis (int, default 0) – The axis that represents mini-batch.
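A minimal sketch of the formula, assuming from_logits=True so that the output already holds log probabilities. `kl_div_loss` is a hypothetical helper, and treating a zero label entry as contributing zero is an assumption made here to avoid log(0).

```python
import math


def kl_div_loss(log_pred, label):
    """Mean over elements of label * (log(label) - log_pred)."""
    terms = [
        l * (math.log(l) - lp) if l > 0 else 0.0  # assumed convention: 0 * log(0) = 0
        for l, lp in zip(label, log_pred)
    ]
    return sum(terms) / len(terms)


label = [0.5, 0.5]
log_pred = [math.log(0.25), math.log(0.75)]
loss = kl_div_loss(log_pred, label)

# Identical distributions give zero divergence.
same = kl_div_loss([math.log(0.5), math.log(0.5)], label)
```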
-
utils.
split_data
(data, num_slice, batch_axis=0, even_split=True)¶ Splits an NDArray into num_slice slices along batch_axis. Usually used for data parallelism where each slice is sent to one device (e.g. a GPU).
Parameters: - data (NDArray) – A batch of data.
- num_slice (int) – Number of desired slices.
- batch_axis (int, default 0) – The axis along which to slice.
- even_split (bool, default True) – Whether to force all slices to have the same number of elements. If True, an error will be raised when num_slice does not evenly divide data.shape[batch_axis].
Returns: Return value is a list even if num_slice is 1.
Return type: list of NDArray
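The slicing behaviour can be sketched on a plain Python list, which stands in for an NDArray split along axis 0. `split_batch` is a hypothetical stand-in, and giving the remainder to the last slice when even_split is False is an assumption about the uneven case.

```python
def split_batch(data, num_slice, even_split=True):
    """Split a sequence into num_slice contiguous slices along axis 0."""
    size = len(data)
    if even_split and size % num_slice != 0:
        raise ValueError(
            "data of length %d cannot be evenly split into %d slices"
            % (size, num_slice))
    step = size // num_slice
    slices = [data[i * step:(i + 1) * step] for i in range(num_slice - 1)]
    # Assumption: the last slice absorbs any remainder.
    slices.append(data[(num_slice - 1) * step:])
    return slices


even = split_batch(list(range(6)), 3)
uneven = split_batch(list(range(7)), 3, even_split=False)
single = split_batch(list(range(4)), 1)  # still a list of one slice
```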
-
utils.
split_and_load
(data, ctx_list, batch_axis=0, even_split=True)¶ Splits an NDArray into len(ctx_list) slices along batch_axis and loads each slice to one context in ctx_list.
Parameters: - data (NDArray) – A batch of data.
- ctx_list (list of Context) – A list of Contexts.
- batch_axis (int, default 0) – The axis along which to slice.
- even_split (bool, default True) – Whether to force all slices to have the same number of elements.
Returns: Each corresponds to a context in ctx_list.
Return type: list of NDArray
-
utils.
clip_global_norm
(arrays, max_norm)¶ Rescales NDArrays so that their global 2-norm (the square root of the sum of their squared 2-norms) is no larger than max_norm.
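A small sketch of global-norm clipping on Python lists, assuming the global norm is the 2-norm of all arrays taken together. `clip_by_global_norm` is a hypothetical name, and both the small epsilon in the denominator and returning the pre-clipping norm are illustrative choices.

```python
import math


def clip_by_global_norm(arrays, max_norm):
    """Scale all arrays in place so their joint 2-norm is at most max_norm."""
    total_norm = math.sqrt(sum(v * v for arr in arrays for v in arr))
    scale = max_norm / (total_norm + 1e-8)
    if scale < 1.0:
        for arr in arrays:
            for k in range(len(arr)):
                arr[k] *= scale
    return total_norm


grads = [[3.0], [4.0]]
norm_before = clip_by_global_norm(grads, 1.0)  # joint norm is sqrt(9 + 16)
norm_after = math.sqrt(sum(v * v for arr in grads for v in arr))
```

Because every array is multiplied by the same factor, the direction of the combined gradient is preserved.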
-
class
mxnet.gluon.data.
Dataset
[source]¶ Abstract dataset class. All datasets should have this interface.
Subclasses need to override __getitem__, which returns the i-th element, and __len__, which returns the total number of elements.
Note
An mxnet or numpy array can be directly used as a dataset.
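The interface amounts to two methods. The sketch below is a standalone class so that it runs without MXNet installed; in real code it would subclass `mxnet.gluon.data.Dataset`. `SquaresDataset` is a hypothetical example.

```python
class SquaresDataset:
    """Hypothetical dataset whose i-th element is the pair (i, i * i)."""

    def __init__(self, length):
        self._length = length

    def __getitem__(self, idx):
        if not 0 <= idx < self._length:
            raise IndexError(idx)
        return idx, idx * idx

    def __len__(self):
        return self._length


dataset = SquaresDataset(4)
samples = [dataset[i] for i in range(len(dataset))]
```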
-
class
mxnet.gluon.data.
ArrayDataset
(data, label)[source]¶ A dataset with a data array and a label array.
The i-th sample is (data[i], label[i]).
Parameters: - data (array-like object) – The data array. Can be mxnet or numpy array.
- label (array-like object) – The label array. Can be mxnet or numpy array.
-
class
mxnet.gluon.data.
RecordFileDataset
(filename)[source]¶ A dataset wrapping over a RecordIO (.rec) file.
Each sample is a string representing the raw content of a record.
Parameters: filename (str) – Path to rec file.
-
class
mxnet.gluon.data.
Sampler
[source]¶ Base class for samplers.
All samplers should subclass Sampler and define __iter__ and __len__ methods.
-
class
mxnet.gluon.data.
SequentialSampler
(length)[source]¶ Samples elements from [0, length) sequentially.
Parameters: length (int) – Length of the sequence.
-
class
mxnet.gluon.data.
RandomSampler
(length)[source]¶ Samples elements from [0, length) randomly without replacement.
Parameters: length (int) – Length of the sequence.
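The two index samplers can be sketched as small iterable classes. These are standalone illustrations, not the library classes; `SequentialIndices` and `RandomIndices` are hypothetical names that only implement the `__iter__` and `__len__` methods required of a Sampler.

```python
import random


class SequentialIndices:
    """Yields 0, 1, ..., length - 1 in order."""

    def __init__(self, length):
        self._length = length

    def __iter__(self):
        return iter(range(self._length))

    def __len__(self):
        return self._length


class RandomIndices:
    """Yields each index in [0, length) exactly once, in random order."""

    def __init__(self, length):
        self._length = length

    def __iter__(self):
        indices = list(range(self._length))
        random.shuffle(indices)  # a fresh permutation on every pass
        return iter(indices)

    def __len__(self):
        return self._length


sequential = list(SequentialIndices(5))
shuffled = list(RandomIndices(5))
```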
-
class
mxnet.gluon.data.
BatchSampler
(sampler, batch_size, last_batch='keep')[source]¶ Wraps over another Sampler and returns mini-batches of samples.
Parameters: - sampler (Sampler) – The source Sampler.
- batch_size (int) – Size of mini-batch.
- last_batch ({'keep', 'discard', 'rollover'}) –
Specifies how the last batch is handled if batch_size does not evenly divide sequence length.
If ‘keep’, the last batch will be returned directly, but will contain fewer elements than batch_size requires.
If ‘discard’, the last batch will be discarded.
If ‘rollover’, the remaining elements will be rolled over to the next iteration.
Examples
>>> from mxnet import gluon
>>> sampler = gluon.data.SequentialSampler(10)
>>> batch_sampler = gluon.data.BatchSampler(sampler, 3, 'keep')
>>> list(batch_sampler)
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
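The three last_batch modes can be compared with a standalone sketch of the batching logic. `BatchIndices` is a hypothetical class written for illustration; it accepts any iterable of indices in place of a Sampler.

```python
class BatchIndices:
    """Groups indices from a sampler into lists of batch_size elements."""

    def __init__(self, sampler, batch_size, last_batch="keep"):
        self._sampler = sampler
        self._batch_size = batch_size
        self._last_batch = last_batch
        self._prev = []  # leftovers carried over in 'rollover' mode

    def __iter__(self):
        batch, self._prev = self._prev, []
        for index in self._sampler:
            batch.append(index)
            if len(batch) == self._batch_size:
                yield batch
                batch = []
        if batch:
            if self._last_batch == "keep":
                yield batch
            elif self._last_batch == "rollover":
                self._prev = batch
            elif self._last_batch != "discard":
                raise ValueError(
                    "last_batch must be 'keep', 'discard' or 'rollover'")


keep = list(BatchIndices(range(10), 3, "keep"))
discard = list(BatchIndices(range(10), 3, "discard"))

rollover = BatchIndices(range(10), 3, "rollover")
first_pass = list(rollover)   # index 9 is held back
second_pass = list(rollover)  # index 9 opens the next pass
```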
-
class
mxnet.gluon.data.
DataLoader
(dataset, batch_size=None, shuffle=False, sampler=None, last_batch=None, batch_sampler=None)[source]¶ Loads data from a dataset and returns mini-batches of data.
Parameters: - dataset (Dataset) – Source dataset. Note that numpy and mxnet arrays can be directly used as a Dataset.
- batch_size (int) – Size of mini-batch.
- shuffle (bool) – Whether to shuffle the samples.
- sampler (Sampler) – The sampler to use. Either specify sampler or shuffle, not both.
- last_batch ({'keep', 'discard', 'rollover'}) –
How to handle the last batch if batch_size does not evenly divide len(dataset).
keep - A batch with fewer samples than previous batches is returned.
discard - The last batch is discarded if it is incomplete.
rollover - The remaining samples are rolled over to the next epoch.
- batch_sampler (Sampler) – A sampler that returns mini-batches. Do not specify batch_size, shuffle, sampler, and last_batch if batch_sampler is specified.
Dataset container.
-
class
mxnet.gluon.data.vision.
MNIST
(root='~/.mxnet/datasets/', train=True, transform=None)[source]¶ MNIST handwritten digits dataset from http://yann.lecun.com/exdb/mnist.
Each sample is an image (in 3D NDArray) with shape (28, 28, 1).
Parameters: - root (str) – Path to temp folder for storing data.
- train (bool) – Whether to load the training or testing set.
- transform (function) –
A user defined callback that transforms each instance. For example:
transform=lambda data, label: (data.astype(np.float32)/255, label)
-
class
mxnet.gluon.data.vision.
CIFAR10
(root='~/.mxnet/datasets/', train=True, transform=None)[source]¶ CIFAR10 image classification dataset from https://www.cs.toronto.edu/~kriz/cifar.html.
Each sample is an image (in 3D NDArray) with shape (32, 32, 3).
Parameters: - root (str) – Path to temp folder for storing data.
- train (bool) – Whether to load the training or testing set.
- transform (function) –
A user defined callback that transforms each instance. For example:
transform=lambda data, label: (data.astype(np.float32)/255, label)
-
class
mxnet.gluon.data.vision.
ImageRecordDataset
(filename, flag=1, transform=None)[source]¶ A dataset wrapping over a RecordIO file containing images.
Each sample is an image and its corresponding label.
Parameters: - filename (str) – Path to rec file.
- flag ({0, 1}, default 1) –
If 0, always convert images to greyscale.
If 1, always convert images to colored (RGB).
- transform (function) –
A user defined callback that transforms each instance. For example:
transform=lambda data, label: (data.astype(np.float32)/255, label)
-
class
mxnet.gluon.data.vision.
ImageFolderDataset
(root, flag=1, transform=None)[source]¶ A dataset for loading image files stored in a folder structure like:
root/car/0001.jpg root/car/xxxa.jpg root/car/yyyb.jpg root/bus/123.jpg root/bus/023.jpg root/bus/wwww.jpg
Parameters: - root (str) – Path to root directory.
- flag ({0, 1}, default 1) – If 0, always convert loaded images to greyscale (1 channel). If 1, always convert loaded images to colored (3 channels).
- transform (callable) –
A function that takes data and label and transforms them:
transform = lambda data, label: (data.astype(np.float32)/255, label)
-
synsets
¶ list – List of class names. synsets[i] is the name for the integer label i.
-
items
¶ list of tuples – List of all images in (filename, label) pairs.
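How a folder layout like the one above maps to `synsets` and `items` can be sketched with the standard library. `list_image_folder` is a hypothetical helper; the alphabetical ordering of class folders and the accepted file extensions are assumptions made for this illustration.

```python
import os
import tempfile


def list_image_folder(root, exts=(".jpg", ".jpeg", ".png")):
    """Return (synsets, items) for images stored as root/<class>/<file>."""
    synsets, items = [], []
    for folder in sorted(os.listdir(root)):
        path = os.path.join(root, folder)
        if not os.path.isdir(path):
            continue
        label = len(synsets)  # integer label is the class's position
        synsets.append(folder)
        for filename in sorted(os.listdir(path)):
            if os.path.splitext(filename)[1].lower() in exts:
                items.append((os.path.join(path, filename), label))
    return synsets, items


# Build a tiny throwaway tree with empty files to exercise the helper.
with tempfile.TemporaryDirectory() as root:
    for rel in ("car/0001.jpg", "car/xxxa.jpg", "bus/123.jpg"):
        os.makedirs(os.path.join(root, os.path.dirname(rel)), exist_ok=True)
        open(os.path.join(root, rel), "w").close()
    synsets, items = list_image_folder(root)
    labels = [label for _, label in items]
    names = [os.path.basename(name) for name, _ in items]
```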
-
vision.
get_model
(name, **kwargs)¶ Returns a pre-defined model by name.
Parameters: - name (str) – Name of the model.
- pretrained (bool) – Whether to load the pretrained weights for model.
- classes (int) – Number of classes for the output layer.
Returns: The model.
Return type:
-
vision.
resnet18_v1
(**kwargs)¶ ResNet-18 V1 model from “Deep Residual Learning for Image Recognition” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
vision.
resnet34_v1
(**kwargs)¶ ResNet-34 V1 model from “Deep Residual Learning for Image Recognition” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
vision.
resnet50_v1
(**kwargs)¶ ResNet-50 V1 model from “Deep Residual Learning for Image Recognition” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
vision.
resnet101_v1
(**kwargs)¶ ResNet-101 V1 model from “Deep Residual Learning for Image Recognition” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
vision.
resnet152_v1
(**kwargs)¶ ResNet-152 V1 model from “Deep Residual Learning for Image Recognition” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
vision.
resnet18_v2
(**kwargs)¶ ResNet-18 V2 model from “Identity Mappings in Deep Residual Networks” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
vision.
resnet34_v2
(**kwargs)¶ ResNet-34 V2 model from “Identity Mappings in Deep Residual Networks” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
vision.
resnet50_v2
(**kwargs)¶ ResNet-50 V2 model from “Identity Mappings in Deep Residual Networks” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
vision.
resnet101_v2
(**kwargs)¶ ResNet-101 V2 model from “Identity Mappings in Deep Residual Networks” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
vision.
resnet152_v2
(**kwargs)¶ ResNet-152 V2 model from “Identity Mappings in Deep Residual Networks” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
vision.
get_resnet
(version, num_layers, pretrained=False, ctx=cpu(0), **kwargs)¶ ResNet V1 model from “Deep Residual Learning for Image Recognition” paper. ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.
Parameters: - version (int) – Version of ResNet. Options are 1, 2.
- num_layers (int) – Numbers of layers. Options are 18, 34, 50, 101, 152.
- pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
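The relationship between version, num_layers, and the residual block classes can be sketched as a lookup table. The per-stage layer counts and channel widths below are the configurations from the ResNet papers; assuming the library uses exactly this table is a guess, and `RESNET_SPEC` and `resolve_resnet` are hypothetical names.

```python
# depth -> (block kind, layers per stage, channels; one more entry than layers)
RESNET_SPEC = {
    18: ("basic", [2, 2, 2, 2], [64, 64, 128, 256, 512]),
    34: ("basic", [3, 4, 6, 3], [64, 64, 128, 256, 512]),
    50: ("bottleneck", [3, 4, 6, 3], [64, 256, 512, 1024, 2048]),
    101: ("bottleneck", [3, 4, 23, 3], [64, 256, 512, 1024, 2048]),
    152: ("bottleneck", [3, 8, 36, 3], [64, 256, 512, 1024, 2048]),
}


def resolve_resnet(version, num_layers):
    """Return (block class name, layers, channels) for a ResNet variant."""
    if version not in (1, 2):
        raise ValueError("version must be 1 or 2")
    if num_layers not in RESNET_SPEC:
        raise ValueError("num_layers must be one of %s" % sorted(RESNET_SPEC))
    kind, layers, channels = RESNET_SPEC[num_layers]
    prefix = "BasicBlock" if kind == "basic" else "Bottleneck"
    return "%sV%d" % (prefix, version), layers, channels


block, layers, channels = resolve_resnet(2, 50)
```

The depth in each model name follows from the table: a basic block has two convolutions and a bottleneck block has three, plus the first convolution and the final dense layer.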
-
class
mxnet.gluon.model_zoo.vision.
ResNetV1
(block, layers, channels, classes=1000, thumbnail=False, **kwargs)[source]¶ ResNet V1 model from “Deep Residual Learning for Image Recognition” paper.
Parameters: - block (HybridBlock) – Class for the residual block. Options are BasicBlockV1, BottleneckV1.
- layers (list of int) – Numbers of layers in each block
- channels (list of int) – Numbers of channels in each block. Length should be one larger than layers list.
- classes (int, default 1000) – Number of classification classes.
- thumbnail (bool, default False) – Enable thumbnail.
-
class
mxnet.gluon.model_zoo.vision.
BasicBlockV1
(channels, stride, downsample=False, in_channels=0, **kwargs)[source]¶ BasicBlock V1 from “Deep Residual Learning for Image Recognition” paper. This is used for ResNet V1 for 18, 34 layers.
Parameters: - channels (int) – Number of output channels.
- stride (int) – Stride size.
- downsample (bool, default False) – Whether to downsample the input.
- in_channels (int, default 0) – Number of input channels. Default is 0, to infer from the graph.
-
class
mxnet.gluon.model_zoo.vision.
BottleneckV1
(channels, stride, downsample=False, in_channels=0, **kwargs)[source]¶ Bottleneck V1 from “Deep Residual Learning for Image Recognition” paper. This is used for ResNet V1 for 50, 101, 152 layers.
Parameters: - channels (int) – Number of output channels.
- stride (int) – Stride size.
- downsample (bool, default False) – Whether to downsample the input.
- in_channels (int, default 0) – Number of input channels. Default is 0, to infer from the graph.
-
class
mxnet.gluon.model_zoo.vision.
ResNetV2
(block, layers, channels, classes=1000, thumbnail=False, **kwargs)[source]¶ ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.
Parameters: - block (HybridBlock) – Class for the residual block. Options are BasicBlockV2, BottleneckV2.
- layers (list of int) – Numbers of layers in each block
- channels (list of int) – Numbers of channels in each block. Length should be one larger than layers list.
- classes (int, default 1000) – Number of classification classes.
- thumbnail (bool, default False) – Enable thumbnail.
-
class
mxnet.gluon.model_zoo.vision.
BasicBlockV2
(channels, stride, downsample=False, in_channels=0, **kwargs)[source]¶ BasicBlock V2 from “Identity Mappings in Deep Residual Networks” paper. This is used for ResNet V2 for 18, 34 layers.
Parameters: - channels (int) – Number of output channels.
- stride (int) – Stride size.
- downsample (bool, default False) – Whether to downsample the input.
- in_channels (int, default 0) – Number of input channels. Default is 0, to infer from the graph.
-
class
mxnet.gluon.model_zoo.vision.
BottleneckV2
(channels, stride, downsample=False, in_channels=0, **kwargs)[source]¶ Bottleneck V2 from “Identity Mappings in Deep Residual Networks” paper. This is used for ResNet V2 for 50, 101, 152 layers.
Parameters: - channels (int) – Number of output channels.
- stride (int) – Stride size.
- downsample (bool, default False) – Whether to downsample the input.
- in_channels (int, default 0) – Number of input channels. Default is 0, to infer from the graph.
-
vision.
vgg11
(**kwargs)¶ VGG-11 model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
vision.
vgg13
(**kwargs)¶ VGG-13 model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
vision.
vgg16
(**kwargs)¶ VGG-16 model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
vision.
vgg19
(**kwargs)¶ VGG-19 model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
vision.
vgg11_bn
(**kwargs)¶ VGG-11 model with batch normalization from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
vision.
vgg13_bn
(**kwargs)¶ VGG-13 model with batch normalization from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
vision.
vgg16_bn
(**kwargs)¶ VGG-16 model with batch normalization from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
vision.
vgg19_bn
(**kwargs)¶ VGG-19 model with batch normalization from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
vision.
get_vgg
(num_layers, pretrained=False, ctx=cpu(0), **kwargs)¶ VGG model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.
Parameters: - num_layers (int) – Number of layers for the variant of VGG. Options are 11, 13, 16, 19.
- pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
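The num_layers options correspond to per-block convolution counts. The table below uses the configurations from the VGG paper; that the library stores them in this form is an assumption, and `VGG_SPEC` and `resolve_vgg` are hypothetical names.

```python
# num_layers -> (conv layers per feature block, filters per feature block)
VGG_SPEC = {
    11: ([1, 1, 2, 2, 2], [64, 128, 256, 512, 512]),
    13: ([2, 2, 2, 2, 2], [64, 128, 256, 512, 512]),
    16: ([2, 2, 3, 3, 3], [64, 128, 256, 512, 512]),
    19: ([2, 2, 4, 4, 4], [64, 128, 256, 512, 512]),
}


def resolve_vgg(num_layers):
    """Return (layers, filters) for a VGG variant."""
    if num_layers not in VGG_SPEC:
        raise ValueError("num_layers must be one of %s" % sorted(VGG_SPEC))
    return VGG_SPEC[num_layers]


layers, filters = resolve_vgg(16)
```

Each depth equals the total number of convolution layers plus the three fully connected layers at the end of the network.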
-
class
mxnet.gluon.model_zoo.vision.
VGG
(layers, filters, classes=1000, batch_norm=False, **kwargs)[source]¶ VGG model from the “Very Deep Convolutional Networks for Large-Scale Image Recognition” paper.
Parameters: - layers (list of int) – Numbers of layers in each feature block.
- filters (list of int) – Numbers of filters in each feature block. List length should match the layers.
- classes (int, default 1000) – Number of classification classes.
- batch_norm (bool, default False) – Use batch normalization.
-
vision.
alexnet
(pretrained=False, ctx=cpu(0), **kwargs)¶ AlexNet model from the “One weird trick...” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
class
mxnet.gluon.model_zoo.vision.
AlexNet
(classes=1000, **kwargs)[source]¶ AlexNet model from the “One weird trick...” paper.
Parameters: classes (int, default 1000) – Number of classes for the output layer.
-
vision.
densenet121
(**kwargs)¶ Densenet-BC 121-layer model from the “Densely Connected Convolutional Networks” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
vision.
densenet161
(**kwargs)¶ Densenet-BC 161-layer model from the “Densely Connected Convolutional Networks” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
vision.
densenet169
(**kwargs)¶ Densenet-BC 169-layer model from the “Densely Connected Convolutional Networks” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
vision.
densenet201
(**kwargs)¶ Densenet-BC 201-layer model from the “Densely Connected Convolutional Networks” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
class
mxnet.gluon.model_zoo.vision.
DenseNet
(num_init_features, growth_rate, block_config, bn_size=4, dropout=0, classes=1000, **kwargs)[source]¶ DenseNet-BC model from the “Densely Connected Convolutional Networks” paper.
Parameters: - num_init_features (int) – Number of filters to learn in the first convolution layer.
- growth_rate (int) – Number of filters to add each layer (k in the paper).
- block_config (list of int) – List of integers for numbers of layers in each pooling block.
- bn_size (int, default 4) – Multiplicative factor for the number of features in each bottleneck layer (i.e. bn_size * k features in the bottleneck layer).
- dropout (float, default 0) – Rate of dropout after each dense layer.
- classes (int, default 1000) – Number of classification classes.
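To make the interaction between num_init_features, growth_rate and block_config concrete, here is a small pure-Python sketch of the channel arithmetic. The helper name densenet_feature_counts is hypothetical, the halving between blocks assumes the compression factor of 0.5 used by DenseNet-BC in the paper, and the argument values are assumed to be the 121-layer configuration; none of this is taken from the Gluon source.

```python
def densenet_feature_counts(num_init_features, growth_rate, block_config):
    """Return the channel count at the end of each dense block.

    Hypothetical helper: each layer in a block adds growth_rate channels,
    and every transition between blocks is assumed to halve the count.
    """
    counts = []
    num_features = num_init_features
    for i, num_layers in enumerate(block_config):
        num_features += num_layers * growth_rate
        counts.append(num_features)
        # No transition (and so no halving) after the last block.
        if i != len(block_config) - 1:
            num_features //= 2
    return counts


# Assumed 121-layer configuration: 64 initial filters, k = 32.
counts_121 = densenet_feature_counts(64, 32, [6, 12, 24, 16])
print(counts_121)  # [256, 512, 1024, 1024]

# With bn_size=4, each bottleneck layer works on bn_size * k channels.
bottleneck_width = 4 * 32
print(bottleneck_width)  # 128
```

Under these assumptions the final dense block ends with 1024 channels, which is what the classifier layer would receive after pooling.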
-
vision.
squeezenet1_0
(**kwargs)¶ SqueezeNet 1.0 model from the “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
vision.
squeezenet1_1
(**kwargs)¶ SqueezeNet 1.1 model from the official SqueezeNet repo. SqueezeNet 1.1 has 2.4x less computation and slightly fewer parameters than SqueezeNet 1.0, without sacrificing accuracy.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
class
mxnet.gluon.model_zoo.vision.
SqueezeNet
(version, classes=1000, **kwargs)[source]¶ SqueezeNet model from the “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size” paper. Version 1.1 comes from the official SqueezeNet repo; it has 2.4x less computation and slightly fewer parameters than SqueezeNet 1.0, without sacrificing accuracy.
Parameters: - version (str) – Version of SqueezeNet. Options are ‘1.0’ and ‘1.1’.
- classes (int, default 1000) – Number of classification classes.
-
vision.
inception_v3
(pretrained=False, ctx=cpu(0), **kwargs)¶ Inception v3 model from the “Rethinking the Inception Architecture for Computer Vision” paper.
Parameters: - pretrained (bool, default False) – Whether to load the pretrained weights for model.
- ctx (Context, default CPU) – The context in which to load the pretrained weights.
-
class
mxnet.gluon.model_zoo.vision.
Inception3
(classes=1000, **kwargs)[source]¶ Inception v3 model from the “Rethinking the Inception Architecture for Computer Vision” paper.
Parameters: classes (int, default 1000) – Number of classification classes.