Gluon Data API
Overview
This document lists the data APIs in Gluon:
mxnet.gluon.data | Dataset utilities.
mxnet.gluon.data.vision | Vision utilities.
The Gluon Data API, defined in the gluon.data package, provides useful dataset loading and processing tools, as well as common public datasets.
In the rest of this document, we list routines provided by the gluon.data package.
Data
Dataset | Abstract dataset class.
ArrayDataset | A dataset that combines multiple dataset-like objects, e.g. Datasets, lists, arrays, etc.
RecordFileDataset | A dataset wrapping over a RecordIO (.rec) file.
Sampler | Base class for samplers.
SequentialSampler | Samples elements from [0, length) sequentially.
RandomSampler | Samples elements from [0, length) randomly without replacement.
BatchSampler | Wraps over another Sampler and returns mini-batches of samples.
DataLoader | Loads data from a dataset and returns mini-batches of data.
Vision
Vision Datasets
MNIST | MNIST handwritten digits dataset from http://yann.lecun.com/exdb/mnist
FashionMNIST | A dataset of Zalando's article images consisting of fashion products, a drop-in replacement of the original MNIST dataset from https://github.com/zalandoresearch/fashion-mnist
CIFAR10 | CIFAR10 image classification dataset from https://www.cs.toronto.edu/~kriz/cifar.html
CIFAR100 | CIFAR100 image classification dataset from https://www.cs.toronto.edu/~kriz/cifar.html
ImageRecordDataset | A dataset wrapping over a RecordIO file containing images.
ImageFolderDataset | A dataset for loading image files stored in a folder structure with one sub-folder per class.
API Reference
Dataset utilities.
class mxnet.gluon.data.ArrayDataset(*args)
A dataset that combines multiple dataset-like objects, e.g. Datasets, lists, arrays, etc.
The i-th sample is defined as (x1[i], x2[i], ...).
Parameters: *args (one or more dataset-like objects) – The data arrays.
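The indexing rule above can be illustrated with a small pure-Python stand-in. This is only a sketch of the rule, not the library's implementation: the class name `ZippedDataset` is hypothetical, and the check that all inputs have equal length is an assumption made for the example.

```python
class ZippedDataset:
    """Hypothetical stand-in that mimics the (x1[i], x2[i], ...) indexing rule."""

    def __init__(self, *args):
        if not args:
            raise ValueError("need at least one dataset-like object")
        # Assumption for this sketch: every input must have the same length.
        self._length = len(args[0])
        for position, data in enumerate(args):
            if len(data) != self._length:
                raise ValueError(
                    "input %d has length %d, expected %d"
                    % (position, len(data), self._length))
        self._data = args

    def __getitem__(self, idx):
        # The i-th sample is the tuple of the i-th element of every input.
        return tuple(data[idx] for data in self._data)

    def __len__(self):
        return self._length


features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
labels = [0, 1, 1]
pairs = ZippedDataset(features, labels)
second_sample = pairs[1]  # the second feature row paired with its label
```

Any object that supports `len()` and integer indexing can play the role of `features` or `labels` here, which is what "dataset-like" is taken to mean in this sketch.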
class mxnet.gluon.data.BatchSampler(sampler, batch_size, last_batch='keep')
Wraps over another Sampler and returns mini-batches of samples.
Parameters: - sampler (Sampler) – The source Sampler.
- batch_size (int) – Size of mini-batch.
- last_batch ({'keep', 'discard', 'rollover'}) –
Specifies how the last batch is handled if batch_size does not evenly divide the sequence length.
If 'keep', the last batch will be returned directly, but will contain fewer elements than batch_size requires.
If 'discard', the last batch will be discarded.
If 'rollover', the remaining elements will be rolled over to the next iteration.
Examples
>>> from mxnet import gluon
>>> sampler = gluon.data.SequentialSampler(10)
>>> batch_sampler = gluon.data.BatchSampler(sampler, 3, 'keep')
>>> list(batch_sampler)
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
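To make the three last_batch modes concrete, the following pure-Python sketch reproduces the behavior described above. It is an illustration under stated assumptions rather than the library's code: the class name `SketchBatchSampler` is hypothetical, and 'rollover' is modeled by carrying the leftover indices into the start of the next pass.

```python
class SketchBatchSampler:
    """Hypothetical re-implementation of the documented last_batch modes."""

    def __init__(self, indices, batch_size, last_batch="keep"):
        if last_batch not in ("keep", "discard", "rollover"):
            raise ValueError("last_batch must be 'keep', 'discard' or 'rollover'")
        self._indices = list(indices)
        self._batch_size = batch_size
        self._last_batch = last_batch
        self._leftover = []  # only used by 'rollover'

    def __iter__(self):
        # Start from whatever the previous pass rolled over (empty otherwise).
        batch, self._leftover = self._leftover, []
        for index in self._indices:
            batch.append(index)
            if len(batch) == self._batch_size:
                yield batch
                batch = []
        if batch:
            if self._last_batch == "keep":
                yield batch              # short final batch is returned
            elif self._last_batch == "rollover":
                self._leftover = batch   # saved for the next pass
            # 'discard': the short batch is simply dropped


keep = list(SketchBatchSampler(range(10), 3, "keep"))
discard = list(SketchBatchSampler(range(10), 3, "discard"))

rolling = SketchBatchSampler(range(10), 3, "rollover")
first_pass = list(rolling)   # index 9 is held back
second_pass = list(rolling)  # starts with the held-back index 9
```

With 'rollover', every pass yields only full batches, at the cost of passes that no longer line up exactly with the underlying sequence.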
class mxnet.gluon.data.DataLoader(dataset, batch_size=None, shuffle=False, sampler=None, last_batch=None, batch_sampler=None, batchify_fn=None, num_workers=0)
Loads data from a dataset and returns mini-batches of data.
Parameters: - dataset (Dataset) – Source dataset. Note that numpy and mxnet arrays can be directly used as a Dataset.
- batch_size (int) – Size of mini-batch.
- shuffle (bool) – Whether to shuffle the samples.
- sampler (Sampler) – The sampler to use. Either specify sampler or shuffle, not both.
- last_batch ({'keep', 'discard', 'rollover'}) –
How to handle the last batch if batch_size does not evenly divide len(dataset).
keep - A batch with fewer samples than previous batches is returned.
discard - The last batch is discarded if it is incomplete.
rollover - The remaining samples are rolled over to the next epoch.
- batch_sampler (Sampler) – A sampler that returns mini-batches. Do not specify batch_size, shuffle, sampler, and last_batch if batch_sampler is specified.
- batchify_fn (callable) –
Callback function to allow users to specify how to merge samples into a batch. Defaults to default_batchify_fn:
    import numpy as np
    from mxnet import nd

    def default_batchify_fn(data):
        # NDArray samples are stacked along a new batch axis.
        if isinstance(data[0], nd.NDArray):
            return nd.stack(*data)
        # Tuple samples are batchified field by field.
        elif isinstance(data[0], tuple):
            data = zip(*data)
            return [default_batchify_fn(i) for i in data]
        # Anything else is converted through numpy.
        else:
            data = np.asarray(data)
            return nd.array(data, dtype=data.dtype)
- num_workers (int, default 0) – The number of multiprocessing workers to use for data preprocessing. num_workers > 0 is not supported on Windows yet.
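The way these parameters fit together can be sketched in pure Python: pick an index order, cut it into batches, look up the samples, and merge them with a batchify function. Everything here is a simplified illustration, not the DataLoader implementation. The names `iterate_batches` and `transpose_batchify` and the `seed` argument are hypothetical, batches are plain lists rather than NDArrays, and only the 'keep' and 'discard' modes are modeled.

```python
import random


def transpose_batchify(samples):
    """Hypothetical batchify function: merge tuple samples field by field."""
    if isinstance(samples[0], tuple):
        return [transpose_batchify(list(field)) for field in zip(*samples)]
    return list(samples)


def iterate_batches(dataset, batch_size, shuffle=False, last_batch="keep",
                    batchify_fn=transpose_batchify, seed=None):
    """Yield merged mini-batches from any object with len() and indexing."""
    indices = list(range(len(dataset)))
    if shuffle:
        # A seeded generator keeps the sketch reproducible.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch_indices = indices[start:start + batch_size]
        if len(batch_indices) < batch_size and last_batch == "discard":
            break  # drop the incomplete final batch
        yield batchify_fn([dataset[i] for i in batch_indices])


# Five (value, label) samples, batched two at a time.
dataset = [(x, x % 2) for x in range(5)]
batches = list(iterate_batches(dataset, batch_size=2))
full_only = list(iterate_batches(dataset, batch_size=2, last_batch="discard"))
```

Each batch comes back as one list per field (values, then labels), which mirrors how a tuple-aware batchify function separates data from labels.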
class mxnet.gluon.data.Dataset
Abstract dataset class. All datasets should have this interface.
Subclasses need to override __getitem__, which returns the i-th element, and __len__, which returns the total number of elements.
Note
An mxnet or numpy array can be directly used as a dataset.
transform(fn, lazy=True)
Returns a new dataset with each sample transformed by the transformer function fn.
Parameters: - fn (callable) – A transformer function that takes a sample as input and returns the transformed sample.
- lazy (bool, default True) – If False, transforms all samples at once. Otherwise, transforms each sample on demand. Note that if fn is stochastic, you must set lazy to True or you will get the same result on all epochs.
Returns: The transformed dataset.
Return type: Dataset
transform_first(fn, lazy=True)
Returns a new dataset with the first element of each sample transformed by the transformer function fn.
This is useful, for example, when you only want to transform the data while keeping the label as is.
Parameters: - fn (callable) – A transformer function that takes the first element of a sample as input and returns the transformed element.
- lazy (bool, default True) – If False, transforms all samples at once. Otherwise, transforms each sample on demand. Note that if fn is stochastic, you must set lazy to True or you will get the same result on all epochs.
Returns: The transformed dataset.
Return type: Dataset
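The note about stochastic functions is easier to see with a small experiment. The sketch below uses hypothetical pure-Python stand-ins (`ListDataset`, `LazyTransformed`, `transform`, `transform_first`) that follow the interface described above; it is not the library's code. With lazy=False the random function runs once per sample and the results are frozen, while with lazy=True it runs again on every access.

```python
import random


class ListDataset:
    """Hypothetical dataset: only __getitem__ and __len__ are required."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, idx):
        return self._items[idx]

    def __len__(self):
        return len(self._items)


class LazyTransformed:
    """Applies fn each time a sample is accessed."""

    def __init__(self, data, fn):
        self._data, self._fn = data, fn

    def __getitem__(self, idx):
        return self._fn(self._data[idx])

    def __len__(self):
        return len(self._data)


def transform(data, fn, lazy=True):
    if lazy:
        return LazyTransformed(data, fn)
    # Eager mode: apply fn once to every sample and store the results.
    return ListDataset([fn(data[i]) for i in range(len(data))])


def transform_first(data, fn, lazy=True):
    # Transform only the first element of each sample; keep the rest as is.
    return transform(data, lambda s: (fn(s[0]),) + tuple(s[1:]), lazy)


rng = random.Random(0)


def jitter(x):
    """A stochastic transform: adds random noise to the value."""
    return x + rng.random()


base = ListDataset([1.0, 2.0, 3.0])
eager = transform(base, jitter, lazy=False)
lazy = transform(base, jitter, lazy=True)

# Read each dataset twice, as two training epochs would.
eager_epochs = [[eager[i] for i in range(len(eager))] for _ in range(2)]
lazy_epochs = [[lazy[i] for i in range(len(lazy))] for _ in range(2)]

labelled = ListDataset([(1, "a"), (2, "b")])
doubled = transform_first(labelled, lambda x: x * 2)
```

Here `eager_epochs` holds two identical passes, whereas the two passes in `lazy_epochs` differ, which is why random augmentation needs lazy=True.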
class mxnet.gluon.data.RandomSampler(length)
Samples elements from [0, length) randomly without replacement.
Parameters: length (int) – Length of the sequence.
class mxnet.gluon.data.RecordFileDataset(filename)
A dataset wrapping over a RecordIO (.rec) file.
Each sample is a string representing the raw content of a record.
Parameters: filename (str) – Path to rec file.
class mxnet.gluon.data.Sampler
Base class for samplers.
All samplers should subclass Sampler and define __iter__ and __len__ methods.
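As an illustration of that contract, here is a minimal sampler that defines only __iter__ and __len__. The class `EveryOtherSampler` is hypothetical and is written as a plain Python class so that it runs without MXNet; to use it with the library it would presumably subclass mxnet.gluon.data.Sampler as stated above.

```python
class EveryOtherSampler:
    """Hypothetical sampler that yields the even indices in [0, length)."""

    def __init__(self, length):
        self._length = length

    def __iter__(self):
        # Indices are produced in increasing order: 0, 2, 4, ...
        return iter(range(0, self._length, 2))

    def __len__(self):
        # Number of even integers in [0, length).
        return (self._length + 1) // 2


sampler = EveryOtherSampler(7)
indices = list(sampler)
```

Keeping __len__ consistent with the number of indices that __iter__ yields matters because callers may rely on it to know how many samples to expect.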
class mxnet.gluon.data.SequentialSampler(length)
Samples elements from [0, length) sequentially.
Parameters: length (int) – Length of the sequence.
class mxnet.gluon.data.SimpleDataset(data)
Simple Dataset wrapper for lists and arrays.
Parameters: data (dataset-like object) – Any object that implements len() and [].
Vision utilities.
Dataset container.
class mxnet.gluon.data.vision.datasets.MNIST(root='~/.mxnet/datasets/mnist', train=True, transform=None)
MNIST handwritten digits dataset from http://yann.lecun.com/exdb/mnist
Each sample is an image (in 3D NDArray) with shape (28, 28, 1).
Parameters: - root (str, default '~/.mxnet/datasets/mnist') – Path to temp folder for storing data.
- train (bool, default True) – Whether to load the training or testing set.
- transform (function, default None) – A user-defined callback that transforms each sample. For example:
    transform=lambda data, label: (data.astype(np.float32)/255, label)
class mxnet.gluon.data.vision.datasets.FashionMNIST(root='~/.mxnet/datasets/fashion-mnist', train=True, transform=None)
A dataset of Zalando's article images consisting of fashion products, a drop-in replacement of the original MNIST dataset from https://github.com/zalandoresearch/fashion-mnist
Each sample is an image (in 3D NDArray) with shape (28, 28, 1).
Parameters: - root (str, default '~/.mxnet/datasets/fashion-mnist') – Path to temp folder for storing data.
- train (bool, default True) – Whether to load the training or testing set.
- transform (function, default None) – A user-defined callback that transforms each sample. For example:
    transform=lambda data, label: (data.astype(np.float32)/255, label)
class mxnet.gluon.data.vision.datasets.CIFAR10(root='~/.mxnet/datasets/cifar10', train=True, transform=None)
CIFAR10 image classification dataset from https://www.cs.toronto.edu/~kriz/cifar.html
Each sample is an image (in 3D NDArray) with shape (32, 32, 3).
Parameters: - root (str, default '~/.mxnet/datasets/cifar10') – Path to temp folder for storing data.
- train (bool, default True) – Whether to load the training or testing set.
- transform (function, default None) – A user-defined callback that transforms each sample. For example:
    transform=lambda data, label: (data.astype(np.float32)/255, label)
class mxnet.gluon.data.vision.datasets.CIFAR100(root='~/.mxnet/datasets/cifar100', fine_label=False, train=True, transform=None)
CIFAR100 image classification dataset from https://www.cs.toronto.edu/~kriz/cifar.html
Each sample is an image (in 3D NDArray) with shape (32, 32, 3).
Parameters: - root (str, default '~/.mxnet/datasets/cifar100') – Path to temp folder for storing data.
- fine_label (bool, default False) – If True, load the fine-grained labels (100 classes); if False, load the coarse-grained labels (20 super-classes).
- train (bool, default True) – Whether to load the training or testing set.
- transform (function, default None) – A user-defined callback that transforms each sample. For example:
    transform=lambda data, label: (data.astype(np.float32)/255, label)
class mxnet.gluon.data.vision.datasets.ImageRecordDataset(filename, flag=1, transform=None)
A dataset wrapping over a RecordIO file containing images.
Each sample is an image and its corresponding label.
Parameters: - filename (str) – Path to rec file.
- flag ({0, 1}, default 1) –
If 0, always convert images to greyscale.
If 1, always convert images to colored (RGB).
- transform (function, default None) – A user-defined callback that transforms each sample. For example:
    transform=lambda data, label: (data.astype(np.float32)/255, label)
class mxnet.gluon.data.vision.datasets.ImageFolderDataset(root, flag=1, transform=None)
A dataset for loading image files stored in a folder structure like:
    root/car/0001.jpg
    root/car/xxxa.jpg
    root/car/yyyb.jpg
    root/bus/123.jpg
    root/bus/023.jpg
    root/bus/wwww.jpg
Parameters: - root (str) – Path to root directory.
- flag ({0, 1}, default 1) – If 0, always convert loaded images to greyscale (1 channel). If 1, always convert loaded images to colored (3 channels).
- transform (callable, default None) – A function that takes data and label and transforms them:
    transform = lambda data, label: (data.astype(np.float32)/255, label)
synsets (list) – List of class names. synsets[i] is the name for the integer label i.
items (list of tuples) – List of all images in (filename, label) pairs.
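The relationship between the folder layout, synsets, and items can be sketched with the standard library alone. This is an illustration, not the library's scanning code: the function name `scan_image_folder`, the alphabetical ordering of class folders, and the extension filter are all assumptions made for the example.

```python
import os
import tempfile


def scan_image_folder(root, extensions=(".jpg", ".jpeg", ".png")):
    """Hypothetical scan: one sub-folder per class, one label per sub-folder."""
    synsets, items = [], []
    # Assumption: class folders are visited in sorted order, so labels are
    # assigned alphabetically.
    for folder in sorted(os.listdir(root)):
        path = os.path.join(root, folder)
        if not os.path.isdir(path):
            continue  # stray files directly under root are ignored
        label = len(synsets)
        synsets.append(folder)
        for filename in sorted(os.listdir(path)):
            if os.path.splitext(filename)[1].lower() in extensions:
                items.append((os.path.join(path, filename), label))
    return synsets, items


# Build a tiny throwaway folder tree that follows the layout shown above.
with tempfile.TemporaryDirectory() as root:
    for relpath in ("car/0001.jpg", "bus/123.jpg", "bus/023.jpg"):
        full = os.path.join(root, *relpath.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "wb").close()  # empty placeholder file, not a real image
    synsets, items = scan_image_folder(root)
    # Strip the temporary directory prefix for easier inspection.
    labelled_names = [(os.path.basename(name), label) for name, label in items]
```

Under these assumptions `synsets[label]` recovers the class folder name for any entry in `items`, matching the description of the two attributes.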