mxnet.io

Data iterators for common data formats and utility functions.

Functions

CSVIter(*args, **kwargs)

Returns the CSV file iterator.

ImageDetRecordIter(*args, **kwargs)

Creates an iterator for an image detection dataset packed in RecordIO format.

ImageRecordInt8Iter(*args, **kwargs)

Iterates over image RecordIO files. Deprecated: use ImageRecordIter(dtype='int8') instead.

ImageRecordIter(*args, **kwargs)

Iterates over image RecordIO files.

ImageRecordIter_v1(*args, **kwargs)

Iterates over image RecordIO files.

Note: ImageRecordIter_v1 is deprecated. Use ImageRecordIter instead.

Reads batches of images from RecordIO files, with a rich set of data augmentation options.

One can use tools/im2rec.py to pack individual image files into RecordIO files.

Defined in src/io/iter_image_recordio.cc:L352

ImageRecordUInt8Iter(*args, **kwargs)

Iterates over image RecordIO files.

Note: ImageRecordUInt8Iter is deprecated. Use ImageRecordIter(dtype='uint8') instead.

This iterator is identical to ImageRecordIter except that it uses uint8 as the data type instead of float.

Defined in src/io/iter_image_recordio_2.cc:L930

ImageRecordUInt8Iter_v1(*args, **kwargs)

Iterates over image RecordIO files.

Note: ImageRecordUInt8Iter_v1 is deprecated. Use ImageRecordUInt8Iter instead.

This iterator is identical to ImageRecordIter except that it uses uint8 as the data type instead of float.

Defined in src/io/iter_image_recordio.cc:L377

LibSVMIter(*args, **kwargs)

Returns the LibSVM iterator, which returns data with csr storage type. This iterator is experimental and should be used with care.

The input data is stored in a format similar to the LibSVM file format, except that the indices are expected to be zero-based instead of one-based, and the column indices for each row are expected to be sorted in ascending order. Details of the LibSVM format are available at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/

The data_shape parameter is used to set the shape of each line of the data. The dimension of both data_shape and label_shape is expected to be 1.

The data_libsvm parameter is used to set the path to the input LibSVM file. When it is set to a directory, all the files in the directory will be read.

When label_libsvm is set to NULL, both data and label are read from the file specified by data_libsvm. In this case, the data is stored in csr storage type, while the label is a 1D dense array.

LibSVMIter only supports the round_batch parameter set to True. Therefore, if batch_size is 3 and there are 4 total rows in the libsvm file, 2 more examples are consumed in the first round.

When num_parts and part_index are provided, the data is split into num_parts partitions, and the iterator only reads the part_index-th partition. However, the partitions are not guaranteed to be even.

reset() is expected to be called only after a complete pass of data.

Example::

    # Contents of libsvm file data.t.
    1.0 0:0.5 2:1.2
    -2.0
    -3.0 0:0.6 1:2.4 2:1.2
    4 2:-1.2

    # Creates a LibSVMIter with batch_size=3.
    >>> data_iter = mx.io.LibSVMIter(data_libsvm = 'data.t', data_shape = (3,), batch_size = 3)
    # The data of the first batch is stored in csr storage type
    >>> batch = data_iter.next()
    >>> csr = batch.data[0]
    >>> csr
    <CSRNDArray 3x3 @cpu(0)>
    >>> csr.asnumpy()
    [[ 0.5  0.   1.2]
     [ 0.   0.   0. ]
     [ 0.6  2.4  1.2]]
    # The label of first batch
    >>> label = batch.label[0]
    >>> label
    [ 1. -2. -3.]
    <NDArray 3 @cpu(0)>

    >>> second_batch = data_iter.next()
    # The data of the second batch
    >>> second_batch.data[0].asnumpy()
    [[ 0.   0.  -1.2]
     [ 0.5  0.   1.2]
     [ 0.   0.   0. ]]
    # The label of the second batch
    >>> second_batch.label[0].asnumpy()
    [ 4.  1. -2.]

    >>> data_iter.reset()
    # To restart the iterator for the second pass of the data

When label_libsvm is set to the path to another LibSVM file, data is read from data_libsvm and label from label_libsvm. In this case, both data and label are stored in the csr format, and the label column in the data_libsvm file is ignored.

Example::

    # Contents of libsvm file label.t
    1.0
    -2.0 0:0.125
    -3.0 2:1.2
    4 1:1.0 2:-1.2

    # Creates a LibSVMIter with specified label file
    >>> data_iter = mx.io.LibSVMIter(data_libsvm = 'data.t', data_shape = (3,),
    ...                              label_libsvm = 'label.t', label_shape = (3,), batch_size = 3)

    # Both data and label are in csr storage type
    >>> batch = data_iter.next()
    >>> csr_data = batch.data[0]
    >>> csr_data
    <CSRNDArray 3x3 @cpu(0)>
    >>> csr_data.asnumpy()
    [[ 0.5  0.   1.2]
     [ 0.   0.   0. ]
     [ 0.6  2.4  1.2]]
    >>> csr_label = batch.label[0]
    >>> csr_label
    <CSRNDArray 3x3 @cpu(0)>
    >>> csr_label.asnumpy()
    [[ 0.     0.     0.   ]
     [ 0.125  0.     0.   ]
     [ 0.     0.     1.2  ]]

Defined in src/io/iter_libsvm.cc:L298
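To make the zero-based input format concrete, the following pure-Python sketch parses the first three rows of data.t from the example above into labels plus the three arrays that describe a CSR matrix (values, column indices, row pointers). It does not call MXNet: parse_libsvm_lines is a hypothetical helper written for illustration, and the real iterator may differ in details such as error handling.

```python
def parse_libsvm_lines(lines):
    """Parse zero-based LibSVM-style rows into labels and CSR components.

    Hypothetical helper for illustration; it is not part of MXNet.
    """
    labels, data, indices, indptr = [], [], [], [0]
    for line in lines:
        fields = line.split()
        labels.append(float(fields[0]))       # the first field is the label
        previous = -1
        for field in fields[1:]:
            index, value = field.split(":")
            index = int(index)
            if index <= previous:
                raise ValueError("column indices must be sorted in ascending order")
            previous = index
            indices.append(index)             # zero-based column index
            data.append(float(value))
        indptr.append(len(data))              # where this row ends in data/indices
    return labels, data, indices, indptr


# First three rows of data.t from the example above.
rows = ["1.0 0:0.5 2:1.2", "-2.0", "-3.0 0:0.6 1:2.4 2:1.2"]
labels, data, indices, indptr = parse_libsvm_lines(rows)
```

The empty second row shows up as two equal consecutive entries in indptr, which is how a CSR layout stores a row without non-zero values.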

MNISTIter(*args, **kwargs)

Iterates over the MNIST dataset.

One can download the dataset from http://yann.lecun.com/exdb/mnist/

Defined in src/io/iter_mnist.cc:L265

Classes

DataBatch(data[, label, pad, index, …])

A data batch.

DataDesc

DataDesc is used to store name, shape, type and layout information of the data or the label.

DataIter([batch_size])

The base class for an MXNet data iterator.

MXDataIter(handle[, data_name, label_name])

A Python wrapper for a C++ data iterator.

NDArrayIter(data[, label, batch_size, …])

Returns an iterator for mx.nd.NDArray, numpy.ndarray, h5py.Dataset, mx.nd.sparse.CSRNDArray, or scipy.sparse.csr_matrix.

PrefetchingIter(iters[, rename_data, …])

Performs pre-fetch for other data iterators.

ResizeIter(data_iter, size[, reset_internal])

Resize a data iterator to a given number of batches.

mxnet.io.CSVIter(*args, **kwargs)

Returns the CSV file iterator.

In this function, the data_shape parameter is used to set the shape of each line of the input data. If a row in an input file is 1,2,3,4,5,6 and data_shape is (3,2), that row will be reshaped, yielding the array [[1,2],[3,4],[5,6]] of shape (3,2).

By default, CSVIter has the round_batch parameter set to True. So, if batch_size is 3 and there are 4 total rows in the CSV file, 2 more examples are consumed in the first round. If the reset function is called after the first round, the call is ignored and the remaining examples are returned in the second round.

If one wants all the instances in the second round after calling reset, make sure to set round_batch to False.

If data_csv = 'data/' is set, then all the files in this directory will be read.

reset() is expected to be called only after a complete pass of data.

By default, CSVIter parses all entries in the data file as the float32 data type. If the dtype argument is set to 'int32' or 'int64', CSVIter parses all entries in the file as int32 or int64 accordingly.

Examples::

    # Contents of CSV file data/data.csv.
    1,2,3
    2,3,4
    3,4,5
    4,5,6

    # Creates a CSVIter with batch_size=2 and default round_batch=True.
    CSVIter = mx.io.CSVIter(data_csv = 'data/data.csv', data_shape = (3,),
                            batch_size = 2)

    # Two batches read from the above iterator are as follows:
    [[ 1.  2.  3.]
     [ 2.  3.  4.]]
    [[ 3.  4.  5.]
     [ 4.  5.  6.]]

    # Creates a CSVIter with default round_batch set to True.
    CSVIter = mx.io.CSVIter(data_csv = 'data/data.csv', data_shape = (3,),
                            batch_size = 3)

    # Two batches read from the above iterator in the first pass are as follows:
    [[ 1.  2.  3.]
     [ 2.  3.  4.]
     [ 3.  4.  5.]]

    [[ 4.  5.  6.]
     [ 1.  2.  3.]
     [ 2.  3.  4.]]

    # Now, the reset method is called.
    CSVIter.reset()

    # Batch read from the above iterator in the second pass is as follows:
    [[ 3.  4.  5.]
     [ 4.  5.  6.]
     [ 1.  2.  3.]]

    # Creates a CSVIter with round_batch=False.
    CSVIter = mx.io.CSVIter(data_csv = 'data/data.csv', data_shape = (3,),
                            batch_size = 3, round_batch=False)

    # Contents of the two batches read from the above iterator in both passes,
    # after calling the reset method before the second pass, are as follows:
    [[ 1.  2.  3.]
     [ 2.  3.  4.]
     [ 3.  4.  5.]]

    [[ 4.  5.  6.]
     [ 2.  3.  4.]
     [ 3.  4.  5.]]

    # Creates a CSVIter with dtype='int32'.
    CSVIter = mx.io.CSVIter(data_csv = 'data/data.csv', data_shape = (3,),
                            batch_size = 3, round_batch=False, dtype='int32')

    # Contents of the two batches read from the above iterator in both passes,
    # after calling the reset method before the second pass, are as follows:
    [[1 2 3]
     [2 3 4]
     [3 4 5]]

    [[4 5 6]
     [2 3 4]
     [3 4 5]]

Defined in src/io/iter_csv.cc:L308
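The reshaping of a row by data_shape can be illustrated without MXNet. The sketch below parses the row 1,2,3,4,5,6 as floats and reshapes it to (3,2) in row-major order. The helper reshape_row is hypothetical and only mirrors the documented result; it is not how CSVIter is implemented.

```python
def reshape_row(values, shape):
    """Reshape a flat list into nested lists of the given shape (row-major order)."""
    size = 1
    for dim in shape:
        size *= dim
    if len(values) != size:
        raise ValueError("row has %d values, expected %d" % (len(values), size))
    if len(shape) == 1:
        return list(values)
    step = size // shape[0]                   # number of values per leading index
    return [reshape_row(values[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


# One CSV row, parsed as floats to match the default float32 behaviour.
row = [float(field) for field in "1,2,3,4,5,6".split(",")]
example = reshape_row(row, (3, 2))
```

A row whose length does not match the product of data_shape raises an error in this sketch; how the real iterator reacts to such a row is not stated here.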

Parameters
  • data_csv (string, required) – The input CSV file or a directory path.

  • data_shape (Shape(tuple), required) – The shape of one example.

  • label_csv (string, optional, default='NULL') – The input CSV file or a directory path. If NULL, all labels will be returned as 0.

  • label_shape (Shape(tuple), optional, default=[1]) – The shape of one label.

  • batch_size (int (non-negative), required) – Batch size.

  • round_batch (boolean, optional, default=1) – Whether to use round robin to handle overflow batch or not.

  • prefetch_buffer (long (non-negative), optional, default=4) – Maximum number of batches to prefetch.

  • ctx ({'cpu', 'cpu_pinned', 'gpu'}, optional, default='gpu') – The context the data loader is optimized for. This only indicates the optimization strategy for the device; it does not mean that the prefetcher loads data to GPUs. If ctx is 'cpu_pinned' and device_id is not -1, cpu_pinned(device_id) is used as the context.

  • device_id (int, optional, default='-1') – The default device id for the context. -1 indicates the default device.

  • dtype ({None, 'bfloat16', 'float16', 'float32', 'float64', 'int32', 'int64', 'int8', 'uint8'},optional, default='None') – Output data type. None means no change.

Returns

The result iterator.

Return type

MXDataIter
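The wrap-around behaviour described for round_batch=True can be imitated in plain Python. The sketch below reproduces the documented example (4 rows, batch_size=3) across two passes. RoundBatchSketch is a hypothetical class, it assumes batch_size is not larger than the number of rows, and it is a reading of the documented output rather than the iterator's actual implementation.

```python
import math


class RoundBatchSketch:
    """Pure-Python imitation of the documented round_batch=True behaviour."""

    def __init__(self, rows, batch_size):
        if batch_size > len(rows):
            raise ValueError("this sketch assumes batch_size <= number of rows")
        self.rows = rows
        self.batch_size = batch_size
        self.cursor = 0    # index of the next row to read
        self.carried = 0   # rows of the coming pass already consumed by wrap-around

    def one_pass(self):
        """Return the batches of one pass, wrapping around at the end of the data."""
        remaining = len(self.rows) - self.carried
        num_batches = math.ceil(remaining / self.batch_size)
        batches = []
        for _ in range(num_batches):
            batch = []
            for _ in range(self.batch_size):
                batch.append(self.rows[self.cursor])
                self.cursor = (self.cursor + 1) % len(self.rows)
            batches.append(batch)
        # Rows borrowed from the next pass to fill the last batch.
        self.carried = num_batches * self.batch_size - remaining
        return batches


rows = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
sketch = RoundBatchSketch(rows, batch_size=3)
first_pass = sketch.one_pass()    # two batches, the second borrows two rows
second_pass = sketch.one_pass()   # one batch, starting at the third row
```

Because the first pass borrows two rows, the second pass begins with the third row, which matches the single batch shown for the second pass in the example.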

class mxnet.io.DataBatch(data, label=None, pad=None, index=None, bucket_key=None, provide_data=None, provide_label=None)

Bases: object

A data batch.

MXNet's data iterator returns a batch of data on each call to next. Each batch contains batch_size examples.

If the input data consists of images, the shape of these images depends on the layout attribute of the DataDesc object in the provide_data parameter.

If layout is set to 'NCHW', images should be stored in a 4-D matrix of shape (batch_size, num_channel, height, width). If layout is set to 'NHWC', images should be stored in a 4-D matrix of shape (batch_size, height, width, num_channel). The channels are often in RGB order.

Parameters
  • data (list of NDArray, each array containing batch_size examples.) – A list of input data.

  • label (list of NDArray, each array often containing a 1-dimensional array. optional) – A list of input labels.

  • pad (int, optional) – The number of examples padded at the end of a batch. It is used when the total number of examples read is not divisible by the batch_size. These extra padded examples are ignored in prediction.

  • index (numpy.array, optional) – The example indices in this batch.

  • bucket_key (int, optional) – The bucket key, used for bucketing module.

  • provide_data (list of DataDesc, optional) – A list of DataDesc objects. DataDesc is used to store name, shape, type and layout information of the data. The i-th element describes the name and shape of data[i].

  • provide_label (list of DataDesc, optional) – A list of DataDesc objects. DataDesc is used to store name, shape, type and layout information of the label. The i-th element describes the name and shape of label[i].
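The role of pad can be shown without MXNet. In the sketch below, the last batch of a five-example dataset is filled up to batch_size, pad records how many entries are filler, and the results for those entries are dropped after prediction. The helper names and the filling strategy (reusing examples from the start of the data) are assumptions made for illustration.

```python
def make_batches(examples, batch_size):
    """Split examples into full batches and return (batch, pad) pairs.

    The last batch is filled up with examples from the start of the data,
    and pad records how many of its entries are filler.
    """
    batches = []
    for start in range(0, len(examples), batch_size):
        batch = examples[start:start + batch_size]
        pad = batch_size - len(batch)
        batch = batch + examples[:pad]
        batches.append((batch, pad))
    return batches


def predict_all(examples, batch_size, predict):
    """Run predict on every batch and discard the results for padded entries."""
    results = []
    for batch, pad in make_batches(examples, batch_size):
        outputs = [predict(x) for x in batch]
        results.extend(outputs[:len(outputs) - pad])
    return results


predictions = predict_all([1, 2, 3, 4, 5], batch_size=3, predict=lambda x: x * 10)
```

Every batch keeps the same size, which is convenient for fixed-shape computation, while the final result list still has exactly one entry per real example.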

class mxnet.io.DataDesc

Bases: mxnet.io.io.DataDesc

DataDesc is used to store name, shape, type and layout information of the data or the label.

The layout describes how the axes in shape should be interpreted. For example, for image data, layout=NCHW indicates that the first axis is the number of examples in the batch (N), followed by the number of channels (C), the height (H), and the width (W) of the image.

For sequential data, the layout defaults to NTC, where N is the number of examples in the batch, T is the temporal axis representing time, and C is the number of channels.
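As a small illustration of how a layout string labels the axes of a shape, the sketch below pairs each layout letter with the matching dimension and then reorders an NCHW shape into NHWC. The function describe_axes and the example shapes are hypothetical and not part of MXNet.

```python
def describe_axes(layout, shape):
    """Pair each layout letter with the matching entry of shape."""
    if len(layout) != len(shape):
        raise ValueError("layout and shape must have the same number of axes")
    return dict(zip(layout, shape))


image_axes = describe_axes("NCHW", (32, 3, 224, 224))
sequence_axes = describe_axes("NTC", (32, 100, 16))

# The same image batch described in NHWC order.
nhwc_shape = tuple(image_axes[axis] for axis in "NHWC")
```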

Parameters
  • cls (DataDesc) – The class.

  • name (str) – Data name.

  • shape (tuple of int) – Data shape.

  • dtype (np.dtype, optional) – Data type.

  • layout (str, optional) – Data layout.

Methods

get_batch_axis(layout)

Get the dimension that corresponds to the batch size.

get_list(shapes, types)

Get DataDesc list from attribute lists.

static get_batch_axis(layout)

Get the dimension that corresponds to the batch size.

When data parallelism is used, the data will be automatically split and concatenated along the batch-size dimension. Axis can be -1, which means the whole array will be copied for each data-parallelism device.

Parameters

layout (str) – layout string. For example, “NCHW”.

Returns

An axis indicating the batch_size dimension.

Return type

int
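One plausible reading of this method, offered as a sketch rather than as MXNet's source, is that the batch axis is the position of the letter N in the layout string, with -1 when the layout contains no N. The name batch_axis and the choice of returning 0 when no layout is given are assumptions.

```python
def batch_axis(layout):
    """Return the index of the batch dimension 'N' in a layout string.

    Assumptions: a missing layout means the batch axis is 0, and a layout
    without 'N' yields -1 (the whole array is copied to each device).
    """
    if layout is None:
        return 0
    return layout.find("N")


axes = {layout: batch_axis(layout) for layout in ("NCHW", "TNC", "CHW")}
```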

static get_list(shapes, types)

Get DataDesc list from attribute lists.

Parameters
  • shapes (a tuple of (name_, shape_)) –

  • types (a tuple of (name_, np.dtype)) –
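The pairing of the two attribute lists can be sketched as follows. Desc is a hypothetical namedtuple standing in for DataDesc, dtypes are written as plain strings instead of np.dtype objects to keep the example free of dependencies, and the float32 fallback for names without a type is an assumption.

```python
from collections import namedtuple

# Hypothetical stand-in for DataDesc, reduced to three fields.
Desc = namedtuple("Desc", ["name", "shape", "dtype"])


def desc_list(shapes, types=None, default_dtype="float32"):
    """Build one descriptor per (name, shape) pair, looking up its dtype by name."""
    type_by_name = dict(types) if types is not None else {}
    return [Desc(name, shape, type_by_name.get(name, default_dtype))
            for name, shape in shapes]


descs = desc_list(
    shapes=[("data", (32, 3, 224, 224)), ("softmax_label", (32,))],
    types=[("data", "float16"), ("softmax_label", "float32")],
)
```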

class mxnet.io.DataIter(batch_size=0)

Bases: object

The base class for an MXNet data iterator.

All I/O in MXNet is handled by specializations of this class. Data iterators in MXNet are similar to standard iterators in Python. On each call to next, an iterator returns a DataBatch that represents the next batch of data. When there is no more data to return, the iterator raises a StopIteration exception.
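The protocol described above (next returns a batch, StopIteration ends a pass, reset starts a new one) can be sketched in plain Python. ListBatchIter is a hypothetical stand-in that returns plain lists instead of DataBatch objects and does not pad the last batch; it only illustrates the calling pattern.

```python
class ListBatchIter:
    """Hypothetical stand-in that follows the documented iterator protocol."""

    def __init__(self, examples, batch_size):
        self.examples = examples
        self.batch_size = batch_size
        self.cursor = 0

    def __iter__(self):
        return self

    def reset(self):
        """Move back to the beginning of the data."""
        self.cursor = 0

    def next(self):
        """Return the next batch, or raise StopIteration at the end of the data."""
        if self.cursor >= len(self.examples):
            raise StopIteration
        batch = self.examples[self.cursor:self.cursor + self.batch_size]
        self.cursor += self.batch_size
        return batch

    __next__ = next  # lets a Python for loop drive the iterator


data_iter = ListBatchIter(list(range(5)), batch_size=2)
first_epoch = [batch for batch in data_iter]
data_iter.reset()  # without reset, a second loop would see no data
second_epoch = [batch for batch in data_iter]
```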

Parameters

batch_size (int, optional) – The batch size, namely the number of items in the batch.

Methods

getdata()

Get data of current batch.

getindex()

Get index of the current batch.

getlabel()

Get label of the current batch.

getpad()

Get the number of padding examples in the current batch.

iter_next()

Move to the next batch.

next()

Get next data batch from iterator.

reset()

Reset the iterator to the beginning of the data.

See also

NDArrayIter

Data-iterator for MXNet NDArray or numpy-ndarray objects.

CSVIter

Data-iterator for csv data.

LibSVMIter

Data-iterator for libsvm data.

ImageIter

Data-iterator for images.

getdata()

Get data of current batch.

Returns

The data of the current batch.

Return type

list of NDArray

getindex()

Get index of the current batch.

Returns

index – The indices of examples in the current batch.

Return type

numpy.array

getlabel()

Get label of the current batch.

Returns

The label of the current batch.

Return type

list of NDArray

getpad()

Get the number of padding examples in the current batch.

Returns

Number of padding examples in the current batch.

Return type

int

iter_next()

Move to the next batch.

Returns

Whether the move is successful.

Return type

boolean

next()

Get next data batch from iterator.

Returns

The data of next batch.

Return type

DataBatch

Raises

StopIteration – If the end of the data is reached.

reset()

Reset the iterator to the beginning of the data.

mxnet.io.ImageDetRecordIter(*args, **kwargs)

Creates an iterator for an image detection dataset packed in RecordIO format.

Parameters
  • path_imglist (string, optional, default='') – Dataset Param: Path to image list.

  • path_imgrec (string, optional, default='./data/imgrec.rec') – Dataset Param: Path to image record file.

  • aug_seq (string, optional, default='det_aug_default') – Augmentation Param: the augmenter names to represent the sequence of augmenters to be applied, separated by commas. Additional keyword parameters will be seen by these augmenters. Make sure you don't use normal augmenters for detection tasks.

  • label_width (int, optional, default='-1') – Dataset Param: How many labels for an image, -1 for variable label size.

  • data_shape (Shape(tuple), required) – Dataset Param: Shape of each instance generated by the DataIter.

  • preprocess_threads (int, optional, default='4') – Backend Param: Number of threads used for preprocessing.

  • verbose (boolean, optional, default=1) – Auxiliary Param: Whether to output parser information.

  • num_parts (int, optional, default='1') – Partition the data into multiple parts.

  • part_index (int, optional, default='0') – The index of the part to read.

  • shuffle_chunk_size (long (non-negative), optional, default=0) – The size (in MB) of the shuffle chunk. Used with shuffle=True, it can enable global shuffling.

  • shuffle_chunk_seed (int, optional, default='0') – the seed for chunk shuffling

  • label_pad_width (int, optional, default='0') – Pad the output label to this width if set larger than 0; -1 for automatic estimation.

  • label_pad_value (float, optional, default=-1) – label padding value if enabled

  • shuffle (boolean, optional, default=0) – Augmentation Param: Whether to shuffle data.

  • seed (int, optional, default='0') – Augmentation Param: Random Seed.

  • batch_size (int (non-negative), required) – Batch size.

  • round_batch (boolean, optional, default=1) – Whether to use round robin to handle overflow batch or not.

  • prefetch_buffer (long (non-negative), optional, default=4) – Maximum number of batches to prefetch.

  • ctx ({'cpu', 'cpu_pinned', 'gpu'}, optional, default='gpu') – The context the data loader is optimized for. This only indicates the optimization strategy for the device; it does not mean that the prefetcher loads data to GPUs. If ctx is 'cpu_pinned' and device_id is not -1, cpu_pinned(device_id) is used as the context.

  • device_id (int, optional, default='-1') – The default device id for the context. -1 indicates the default device.

  • dtype ({None, 'bfloat16', 'float16', 'float32', 'float64', 'int32', 'int64', 'int8', 'uint8'},optional, default='None') – Output data type. None means no change.

  • resize (int, optional, default='-1') – Augmentation Param: scale shorter edge to size before applying other augmentations, -1 to disable.

  • rand_crop_prob (float, optional, default=0) – Augmentation Param: Probability of random cropping, <= 0 to disable

  • min_crop_scales (tuple of <float>, optional, default=[0]) – Augmentation Param: Min crop scales.

  • max_crop_scales (tuple of <float>, optional, default=[1]) – Augmentation Param: Max crop scales.

  • min_crop_aspect_ratios (tuple of <float>, optional, default=[1]) – Augmentation Param: Min crop aspect ratios.

  • max_crop_aspect_ratios (tuple of <float>, optional, default=[1]) – Augmentation Param: Max crop aspect ratios.

  • min_crop_overlaps (tuple of <float>, optional, default=[0]) – Augmentation Param: Minimum crop IOU between crop_box and ground-truths.

  • max_crop_overlaps (tuple of <float>, optional, default=[1]) – Augmentation Param: Maximum crop IOU between crop_box and ground-truth.

  • min_crop_sample_coverages (tuple of <float>, optional, default=[0]) – Augmentation Param: Minimum ratio of intersect/crop_area between crop box and ground-truths.

  • max_crop_sample_coverages (tuple of <float>, optional, default=[1]) – Augmentation Param: Maximum ratio of intersect/crop_area between crop box and ground-truths.

  • min_crop_object_coverages (tuple of <float>, optional, default=[0]) – Augmentation Param: Minimum ratio of intersect/gt_area between crop box and ground-truths.

  • max_crop_object_coverages (tuple of <float>, optional, default=[1]) – Augmentation Param: Maximum ratio of intersect/gt_area between crop box and ground-truths.

  • num_crop_sampler (int, optional, default='1') – Augmentation Param: Number of crop samplers.

  • crop_emit_mode ({'center', 'overlap'}, optional, default='center') – Augmentation Param: Emission mode for invalid ground-truths after crop. center: emit if the centroid of the object is out of the crop region; overlap: emit if the overlap is less than emit_overlap_thresh.

  • emit_overlap_thresh (float, optional, default=0.300000012) – Augmentation Param: Emit overlap thresh for emit mode overlap only.

  • max_crop_trials (Shape(tuple), optional, default=[25]) – Augmentation Param: Skip cropping if the number of failed crop trials exceeds this number.

  • rand_pad_prob (float, optional, default=0) – Augmentation Param: Probability for random padding.

  • max_pad_scale (float, optional, default=1) – Augmentation Param: Maximum padding scale.

  • max_random_hue (int, optional, default='0') – Augmentation Param: Maximum random value of H channel in HSL color space.

  • random_hue_prob (float, optional, default=0) – Augmentation Param: Probability to apply random hue.

  • max_random_saturation (int, optional, default='0') – Augmentation Param: Maximum random value of S channel in HSL color space.

  • random_saturation_prob (float, optional, default=0) – Augmentation Param: Probability to apply random saturation.

  • max_random_illumination (int, optional, default='0') – Augmentation Param: Maximum random value of L channel in HSL color space.

  • random_illumination_prob (float, optional, default=0) – Augmentation Param: Probability to apply random illumination.

  • max_random_contrast (float, optional, default=0) – Augmentation Param: Maximum random value of delta contrast.

  • random_contrast_prob (float, optional, default=0) – Augmentation Param: Probability to apply random contrast.

  • rand_mirror_prob (float, optional, default=0) – Augmentation Param: Probability of applying a horizontal flip (mirror).

  • fill_value (int, optional, default='127') – Augmentation Param: Filled color value while padding.

  • inter_method (int, optional, default='1') – Augmentation Param: 0-NN 1-bilinear 2-cubic 3-area 4-lanczos4 9-auto 10-rand.

  • resize_mode ({'fit', 'force', 'shrink'}, optional, default='force') – Augmentation Param: How the image data is fit to data_shape. force: force reshape to data_shape regardless of aspect ratio; shrink: ensure each side fits in data_shape, preserving the aspect ratio; fit: fit the image to data_shape, preserving the aspect ratio, and upscale if applicable.

  • mean_img (string, optional, default='') – Augmentation Param: Mean Image to be subtracted.

  • mean_r (float, optional, default=0) – Augmentation Param: Mean value on R channel.

  • mean_g (float, optional, default=0) – Augmentation Param: Mean value on G channel.

  • mean_b (float, optional, default=0) – Augmentation Param: Mean value on B channel.

  • mean_a (float, optional, default=0) – Augmentation Param: Mean value on Alpha channel.

  • std_r (float, optional, default=0) – Augmentation Param: Standard deviation on R channel.

  • std_g (float, optional, default=0) – Augmentation Param: Standard deviation on G channel.

  • std_b (float, optional, default=0) – Augmentation Param: Standard deviation on B channel.

  • std_a (float, optional, default=0) – Augmentation Param: Standard deviation on Alpha channel.

  • scale (float, optional, default=1) – Augmentation Param: Scale in color space.
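The three quantities that the crop constraints above refer to (IOU for min/max_crop_overlaps, intersect/crop_area for the sample coverages, and intersect/gt_area for the object coverages) can be computed as in the sketch below. The (x_min, y_min, x_max, y_max) box format, the helper names, and the example boxes are assumptions for illustration; both boxes are assumed to have positive area.

```python
def box_area(box):
    """Area of an axis-aligned box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def intersection_area(a, b):
    """Area shared by two axis-aligned boxes (0.0 if they do not overlap)."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return box_area(overlap)


def crop_metrics(crop_box, gt_box):
    """Return the IOU and the two coverage ratios for a crop and a ground-truth box."""
    inter = intersection_area(crop_box, gt_box)
    union = box_area(crop_box) + box_area(gt_box) - inter
    return {
        "iou": inter / union,                          # crop overlaps
        "sample_coverage": inter / box_area(crop_box),  # intersect / crop_area
        "object_coverage": inter / box_area(gt_box),    # intersect / gt_area
    }


metrics = crop_metrics(crop_box=(0.0, 0.0, 0.5, 0.5), gt_box=(0.25, 0.25, 0.75, 0.75))
```

A candidate crop would then be accepted only if each quantity lies between the corresponding min and max parameter; how the sampler combines these checks internally is not covered by this sketch.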

Returns

The result iterator.

Return type

MXDataIter

mxnet.io.ImageRecordInt8Iter(*args, **kwargs)

Iterates over image RecordIO files.

Note: ImageRecordInt8Iter is deprecated. Use ImageRecordIter(dtype='int8') instead.

This iterator is identical to ImageRecordIter except that it uses int8 as the data type instead of float.

Defined in src/io/iter_image_recordio_2.cc:L948

Parameters
  • path_imglist (string, optional, default='') – Path to the image list (.lst) file. Generally created with tools/im2rec.py. Format (Tab separated): <index of record> <one or more labels> <relative path from root folder>.

  • path_imgrec (string, optional, default='') – Path to the image RecordIO (.rec) file or a directory path. Created with tools/im2rec.py.

  • path_imgidx (string, optional, default='') – Path to the image RecordIO index (.idx) file. Created with tools/im2rec.py.

  • aug_seq (string, optional, default='aug_default') – The augmenter names to represent the sequence of augmenters to be applied, separated by commas. Additional keyword parameters will be seen by these augmenters.

  • label_width (int, optional, default='1') – The number of labels per image.

  • data_shape (Shape(tuple), required) – The shape of one output image in (channels, height, width) format.

  • preprocess_threads (int, optional, default='4') – The number of threads to do preprocessing.

  • verbose (boolean, optional, default=1) – Whether to output verbose information.

  • num_parts (int, optional, default='1') – Virtually partition the data into these many parts.

  • part_index (int, optional, default='0') – The i-th virtual partition to be read.

  • device_id (int, optional, default='0') – The device id used to create context for internal NDArray. Setting device_id to -1 will create Context::CPU(0). Setting device_id to valid positive device id will create Context::CPUPinned(device_id). Default is 0.

  • shuffle_chunk_size (long (non-negative), optional, default=0) – The data shuffle buffer size in MB. Only valid if shuffle is true.

  • shuffle_chunk_seed (int, optional, default='0') – The random seed for shuffling

  • seed_aug (int or None, optional, default='None') – Random seed for augmentations.

  • shuffle (boolean, optional, default=0) – Whether to shuffle data randomly or not.

  • seed (int, optional, default='0') – The random seed.

  • batch_size (int (non-negative), required) – Batch size.

  • round_batch (boolean, optional, default=1) – Whether to use round robin to handle overflow batch or not.

  • prefetch_buffer (long (non-negative), optional, default=4) – Maximum number of batches to prefetch.

  • ctx ({'cpu', 'cpu_pinned', 'gpu'}, optional, default='gpu') – The context the data loader is optimized for. This only indicates the optimization strategy for the device; it does not mean that the prefetcher loads data to GPUs. If ctx is 'cpu_pinned' and device_id is not -1, cpu_pinned(device_id) is used as the context.

  • dtype ({None, 'bfloat16', 'float16', 'float32', 'float64', 'int32', 'int64', 'int8', 'uint8'},optional, default='None') – Output data type. None means no change.

  • resize (int, optional, default='-1') – Down scale the shorter edge to a new size before applying other augmentations.

  • rand_crop (boolean, optional, default=0) – Whether to randomly crop the image.

  • random_resized_crop (boolean, optional, default=0) – Whether to perform random resized cropping on the image, as a standard preprocessing step for ResNet training on ImageNet data.

  • max_rotate_angle (int, optional, default='0') – Rotate by a random degree in [-v, v]

  • max_aspect_ratio (float, optional, default=0) – Change the aspect (namely width/height) to a random value. If min_aspect_ratio is None, the aspect ratio is sampled from [1 - max_aspect_ratio, 1 + max_aspect_ratio]; otherwise it is sampled from [min_aspect_ratio, max_aspect_ratio].

  • min_aspect_ratio (float or None, optional, default=None) – Change the aspect (namely width/height) to a random value in [min_aspect_ratio, max_aspect_ratio]

  • max_shear_ratio (float, optional, default=0) – Apply a shear transformation (namely (x,y)->(x+my,y)) with m randomly chosen from [-max_shear_ratio, max_shear_ratio].

  • max_crop_size (int, optional, default='-1') – Crop both width and height into a random size in [min_crop_size, max_crop_size]. Ignored if random_resized_crop is True.

  • min_crop_size (int, optional, default='-1') – Crop both width and height into a random size in [min_crop_size, max_crop_size]. Ignored if random_resized_crop is True.

  • max_random_scale (float, optional, default=1) – Resize into [width*s, height*s] with s randomly chosen from [min_random_scale, max_random_scale]. Ignored if random_resized_crop is True.

  • min_random_scale (float, optional, default=1) – Resize into [width*s, height*s] with s randomly chosen from [min_random_scale, max_random_scale]. Ignored if random_resized_crop is True.

  • max_random_area (float, optional, default=1) – Change the area (namely width * height) to a random value in [min_random_area, max_random_area]. Ignored if random_resized_crop is False.

  • min_random_area (float, optional, default=1) – Change the area (namely width * height) to a random value in [min_random_area, max_random_area]. Ignored if random_resized_crop is False.

  • max_img_size (float, optional, default=1e+10) – Set the maximal width and height after all resize and rotate augmentations are applied.

  • min_img_size (float, optional, default=0) – Set the minimal width and height after all resize and rotate augmentations are applied.

  • brightness (float, optional, default=0) – Add a random value in [-brightness, brightness] to the brightness of image.

  • contrast (float, optional, default=0) – Add a random value in [-contrast, contrast] to the contrast of image.

  • saturation (float, optional, default=0) – Add a random value in [-saturation, saturation] to the saturation of image.

  • pca_noise (float, optional, default=0) – Add PCA based noise to the image.

  • random_h (int, optional, default='0') – Add a random value in [-random_h, random_h] to the H channel in HSL color space.

  • random_s (int, optional, default='0') – Add a random value in [-random_s, random_s] to the S channel in HSL color space.

  • random_l (int, optional, default='0') – Add a random value in [-random_l, random_l] to the L channel in HSL color space.

  • rotate (int, optional, default='-1') – Rotate by an angle. If set, it overrides the max_rotate_angle option.

  • fill_value (int, optional, default='255') – Set the value of the padding pixels to fill_value.

  • inter_method (int, optional, default='1') – The interpolation method: 0-NN, 1-bilinear, 2-cubic, 3-area, 4-lanczos4, 9-auto, 10-rand.

  • pad (int, optional, default='0') – Change size from [width, height] into [pad + width + pad, pad + height + pad] by padding pixels.

Returns

The result iterator.

Return type

MXDataIter
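
The sampling rule described for max_aspect_ratio and min_aspect_ratio can be sketched in plain Python. This is only an illustration of the documented intervals: the helper name sample_aspect_ratio is hypothetical, and drawing uniformly from the interval is an assumption, since the parameter descriptions do not name the distribution.

```python
import random


def sample_aspect_ratio(max_aspect_ratio, min_aspect_ratio=None, rng=random):
    """Draw a width/height ratio from the documented interval.

    Hypothetical helper; uniform sampling is an assumption.
    """
    if min_aspect_ratio is None:
        # Symmetric interval around 1 when no lower bound is given.
        low, high = 1.0 - max_aspect_ratio, 1.0 + max_aspect_ratio
    else:
        # Explicit interval when both bounds are given.
        low, high = min_aspect_ratio, max_aspect_ratio
    return rng.uniform(low, high)


symmetric = [sample_aspect_ratio(0.25) for _ in range(1000)]
explicit = [sample_aspect_ratio(1.5, min_aspect_ratio=0.8) for _ in range(1000)]
unchanged = sample_aspect_ratio(0.0)  # default max_aspect_ratio=0 keeps the ratio at 1
```

With the default max_aspect_ratio of 0 and min_aspect_ratio of None, the interval collapses to the single value 1, so the aspect ratio is left as it is.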

mxnet.io.ImageRecordIter(*args, **kwargs)

Iterates over image RecordIO files.

Reads batches of images from .rec RecordIO files. Use the im2rec.py tool (in tools/) to pack raw image files into RecordIO files. This iterator is less flexible to customize, but it is fast and has many language bindings. To iterate over raw images directly, use ImageIter instead (in Python).

Example::

    data_iter = mx.io.ImageRecordIter(
        path_imgrec="./sample.rec",  # The target record file.
        data_shape=(3, 227, 227),    # Output data shape; a 227x227 region will be cropped from the original image.
        batch_size=4,                # Number of items per batch.
        resize=256                   # Resize the shorter edge to 256 before cropping.
        # You can specify more augmentation options. Use help(mx.io.ImageRecordIter) to see all the options.
    )
    # You can now use data_iter to access batches of images.
    batch = data_iter.next()  # First batch.
    images = batch.data[0]    # This will contain 4 (=batch_size) images, each of shape 3x227x227.
    # Process the images.
    ...
    data_iter.reset()  # To restart the iterator from the beginning.

Defined in src/io/iter_image_recordio_2.cc:L911

Parameters
  • path_imglist (string, optional, default='') – Path to the image list (.lst) file. Generally created with tools/im2rec.py. Format (Tab separated): <index of record> <one or more labels> <relative path from root folder>.

  • path_imgrec (string, optional, default='') – Path to the image RecordIO (.rec) file or a directory path. Created with tools/im2rec.py.

  • path_imgidx (string, optional, default='') – Path to the image RecordIO index (.idx) file. Created with tools/im2rec.py.

  • aug_seq (string, optional, default='aug_default') – Comma-separated names of the augmenters to apply, in sequence. Additional keyword parameters are passed on to these augmenters.

  • label_width (int, optional, default='1') – The number of labels per image.

  • data_shape (Shape(tuple), required) – The shape of one output image in (channels, height, width) format.

  • preprocess_threads (int, optional, default='4') – The number of threads to do preprocessing.

  • verbose (boolean, optional, default=1) – Whether to output verbose information.

  • num_parts (int, optional, default='1') – Virtually partition the data into this many parts.

  • part_index (int, optional, default='0') – The i-th virtual partition to be read.

  • device_id (int, optional, default='0') – The device id used to create the context for the internal NDArray. Setting device_id to -1 will create Context::CPU(0). Setting device_id to a valid positive device id will create Context::CPUPinned(device_id). Default is 0.

  • shuffle_chunk_size (long (non-negative), optional, default=0) – The data shuffle buffer size in MB. Only valid if shuffle is true.

  • shuffle_chunk_seed (int, optional, default='0') – The random seed for shuffling

  • seed_aug (int or None, optional, default='None') – Random seed for augmentations.

  • shuffle (boolean, optional, default=0) – Whether to shuffle data randomly or not.

  • seed (int, optional, default='0') – The random seed.

  • batch_size (int (non-negative), required) – Batch size.

  • round_batch (boolean, optional, default=1) – Whether to use round robin to handle overflow batch or not.

  • prefetch_buffer (long (non-negative), optional, default=4) – Maximum number of batches to prefetch.

  • ctx ({'cpu', 'cpu_pinned', 'gpu'}, optional, default='gpu') – Context the data loader is optimized for. Note that this only indicates the optimization strategy for devices; it does not mean the prefetcher will load data to GPUs. If ctx is 'cpu_pinned' and device_id is not -1, cpu_pinned(device_id) is used as the context.

  • dtype ({None, 'bfloat16', 'float16', 'float32', 'float64', 'int32', 'int64', 'int8', 'uint8'},optional, default='None') – Output data type. None means no change.

  • resize (int, optional, default='-1') – Downscale the shorter edge to a new size before applying other augmentations.

  • rand_crop (boolean, optional, default=0) – Whether to randomly crop the image.

  • random_resized_crop (boolean, optional, default=0) – Whether to perform random resized cropping on the image, a standard preprocessing step for ResNet training on ImageNet data.

  • max_rotate_angle (int, optional, default='0') – Rotate by a random angle, in degrees, drawn from [-v, v], where v is max_rotate_angle.

  • max_aspect_ratio (float, optional, default=0) – Change the aspect ratio (namely width/height) to a random value. If min_aspect_ratio is None, the aspect ratio is sampled from [1 - max_aspect_ratio, 1 + max_aspect_ratio]; otherwise it is sampled from [min_aspect_ratio, max_aspect_ratio].

  • min_aspect_ratio (float or None, optional, default=None) – Change the aspect ratio (namely width/height) to a random value in [min_aspect_ratio, max_aspect_ratio].

  • max_shear_ratio (float, optional, default=0) – Apply a shear transformation (namely (x,y)->(x+my,y)) with m randomly chosen from [-max_shear_ratio, max_shear_ratio].

  • max_crop_size (int, optional, default='-1') – Crop both width and height to a random size in [min_crop_size, max_crop_size]. Ignored if random_resized_crop is True.

  • min_crop_size (int, optional, default='-1') – Crop both width and height to a random size in [min_crop_size, max_crop_size]. Ignored if random_resized_crop is True.

  • max_random_scale (float, optional, default=1) – Resize to [width*s, height*s] with s randomly chosen from [min_random_scale, max_random_scale]. Ignored if random_resized_crop is True.

  • min_random_scale (float, optional, default=1) – Resize to [width*s, height*s] with s randomly chosen from [min_random_scale, max_random_scale]. Ignored if random_resized_crop is True.

  • max_random_area (float, optional, default=1) – Change the area (namely width * height) to a random value in [min_random_area, max_random_area]. Ignored if random_resized_crop is False.

  • min_random_area (float, optional, default=1) – Change the area (namely width * height) to a random value in [min_random_area, max_random_area]. Ignored if random_resized_crop is False.

  • max_img_size (float, optional, default=1e+10) – Set the maximal width and height after all resize and rotate augmentations are applied.

  • min_img_size (float, optional, default=0) – Set the minimal width and height after all resize and rotate augmentations are applied.

  • brightness (float, optional, default=0) – Add a random value in [-brightness, brightness] to the brightness of image.

  • contrast (float, optional, default=0) – Add a random value in [-contrast, contrast] to the contrast of image.

  • saturation (float, optional, default=0) – Add a random value in [-saturation, saturation] to the saturation of image.

  • pca_noise (float, optional, default=0) – Add PCA based noise to the image.

  • random_h (int, optional, default='0') – Add a random value in [-random_h, random_h] to the H channel in HSL color space.

  • random_s (int, optional, default='0') – Add a random value in [-random_s, random_s] to the S channel in HSL color space.

  • random_l (int, optional, default='0') – Add a random value in [-random_l, random_l] to the L channel in HSL color space.

  • rotate (int, optional, default='-1') – Rotate by an angle. If set, it overrides the max_rotate_angle option.

  • fill_value (int, optional, default='255') – Set the value of the padding pixels to fill_value.

  • inter_method (int, optional, default='1') – The interpolation method: 0-NN, 1-bilinear, 2-cubic, 3-area, 4-lanczos4, 9-auto, 10-rand.

  • pad (int, optional, default='0') – Change size from [width, height] into [pad + width + pad, pad + height + pad] by padding pixels.

  • mirror (boolean, optional, default=0) – Whether to mirror the image or not. If true, images are flipped along the horizontal axis.

  • rand_mirror (boolean, optional, default=0) – Whether to randomly mirror images or not. If true, 50% of the images will be randomly mirrored (flipped along the horizontal axis)

  • mean_img (string, optional, default='') – Filename of the mean image.

  • mean_r (float, optional, default=0) – The mean value to be subtracted on the R channel

  • mean_g (float, optional, default=0) – The mean value to be subtracted on the G channel

  • mean_b (float, optional, default=0) – The mean value to be subtracted on the B channel

  • mean_a (float, optional, default=0) – The mean value to be subtracted on the alpha channel

  • std_r (float, optional, default=1) – Augmentation Param: Standard deviation on R channel.

  • std_g (float, optional, default=1) – Augmentation Param: Standard deviation on G channel.

  • std_b (float, optional, default=1) – Augmentation Param: Standard deviation on B channel.

  • std_a (float, optional, default=1) – Augmentation Param: Standard deviation on Alpha channel.

  • scale (float, optional, default=1) – Multiply the image with a scale value.

  • max_random_contrast (float, optional, default=0) – Change the contrast with a value randomly chosen from [-max_random_contrast, max_random_contrast]

  • max_random_illumination (float, optional, default=0) – Change the illumination with a value randomly chosen from [-max_random_illumination, max_random_illumination]

Returns

The result iterator.

Return type

MXDataIter
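
The mean_*, std_* and scale parameters combine into a per-channel normalization. The sketch below is a plain-Python illustration, not the iterator's implementation. The helper names are hypothetical, and the order of operations (subtract the mean, divide by the standard deviation, then multiply by scale) is an assumption, because the parameter descriptions do not state it.

```python
def normalize_pixel(value, mean=0.0, std=1.0, scale=1.0):
    """Normalize one channel value.

    Assumed order: subtract mean, divide by std, multiply by scale.
    """
    return (value - mean) / std * scale


def normalize_rgb(pixel, means, stds, scale=1.0):
    """Apply normalize_pixel to each of the R, G and B channel values."""
    return tuple(
        normalize_pixel(v, m, s, scale) for v, m, s in zip(pixel, means, stds)
    )


# A pixel equal to the per-channel means maps to zero in every channel.
centered = normalize_rgb(
    (123.0, 117.0, 104.0),
    means=(123.0, 117.0, 104.0),
    stds=(58.0, 57.0, 57.0),
)

# With the defaults (mean 0, std 1), scale=1/255 maps [0, 255] to [0, 1].
rescaled = normalize_pixel(255.0, scale=1.0 / 255.0)
```

With all defaults (mean 0, standard deviation 1, scale 1) the pixel values pass through unchanged.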

mxnet.io.ImageRecordIter_v1(*args, **kwargs)

Iterates over image RecordIO files.

Note: ImageRecordIter_v1 is deprecated. Use ImageRecordIter instead.

Reads batches of images from RecordIO files, with a rich set of data augmentation options.

Use tools/im2rec.py to pack individual image files into RecordIO files.

Defined in src/io/iter_image_recordio.cc:L352

Parameters
  • path_imglist (string, optional, default='') – Path to the image list (.lst) file. Generally created with tools/im2rec.py. Format (Tab separated): <index of record> <one or more labels> <relative path from root folder>.

  • path_imgrec (string, optional, default='') – Path to the image RecordIO (.rec) file or a directory path. Created with tools/im2rec.py.

  • path_imgidx (string, optional, default='') – Path to the image RecordIO index (.idx) file. Created with tools/im2rec.py.

  • aug_seq (string, optional, default='aug_default') – Comma-separated names of the augmenters to apply, in sequence. Additional keyword parameters are passed on to these augmenters.

  • label_width (int, optional, default='1') – The number of labels per image.

  • data_shape (Shape(tuple), required) – The shape of one output image in (channels, height, width) format.

  • preprocess_threads (int, optional, default='4') – The number of threads to do preprocessing.

  • verbose (boolean, optional, default=1) – Whether to output verbose information.

  • num_parts (int, optional, default='1') – Virtually partition the data into this many parts.

  • part_index (int, optional, default='0') – The i-th virtual partition to be read.

  • device_id (int, optional, default='0') – The device id used to create the context for the internal NDArray. Setting device_id to -1 will create Context::CPU(0). Setting device_id to a valid positive device id will create Context::CPUPinned(device_id). Default is 0.

  • shuffle_chunk_size (long (non-negative), optional, default=0) – The data shuffle buffer size in MB. Only valid if shuffle is true.

  • shuffle_chunk_seed (int, optional, default='0') – The random seed for shuffling

  • seed_aug (int or None, optional, default='None') – Random seed for augmentations.

  • shuffle (boolean, optional, default=0) – Whether to shuffle data randomly or not.

  • seed (int, optional, default='0') – The random seed.

  • batch_size (int (non-negative), required) – Batch size.

  • round_batch (boolean, optional, default=1) – Whether to use round robin to handle overflow batch or not.

  • prefetch_buffer (long (non-negative), optional, default=4) – Maximum number of batches to prefetch.

  • ctx ({'cpu', 'cpu_pinned', 'gpu'}, optional, default='gpu') – Context the data loader is optimized for. Note that this only indicates the optimization strategy for devices; it does not mean the prefetcher will load data to GPUs. If ctx is 'cpu_pinned' and device_id is not -1, cpu_pinned(device_id) is used as the context.

  • dtype ({None, 'bfloat16', 'float16', 'float32', 'float64', 'int32', 'int64', 'int8', 'uint8'},optional, default='None') – Output data type. None means no change.

  • resize (int, optional, default='-1') – Downscale the shorter edge to a new size before applying other augmentations.

  • rand_crop (boolean, optional, default=0) – Whether to randomly crop the image.

  • random_resized_crop (boolean, optional, default=0) – Whether to perform random resized cropping on the image, a standard preprocessing step for ResNet training on ImageNet data.

  • max_rotate_angle (int, optional, default='0') – Rotate by a random angle, in degrees, drawn from [-v, v], where v is max_rotate_angle.

  • max_aspect_ratio (float, optional, default=0) – Change the aspect ratio (namely width/height) to a random value. If min_aspect_ratio is None, the aspect ratio is sampled from [1 - max_aspect_ratio, 1 + max_aspect_ratio]; otherwise it is sampled from [min_aspect_ratio, max_aspect_ratio].

  • min_aspect_ratio (float or None, optional, default=None) – Change the aspect ratio (namely width/height) to a random value in [min_aspect_ratio, max_aspect_ratio].

  • max_shear_ratio (float, optional, default=0) – Apply a shear transformation (namely (x,y)->(x+my,y)) with m randomly chosen from [-max_shear_ratio, max_shear_ratio].

  • max_crop_size (int, optional, default='-1') – Crop both width and height to a random size in [min_crop_size, max_crop_size]. Ignored if random_resized_crop is True.

  • min_crop_size (int, optional, default='-1') – Crop both width and height to a random size in [min_crop_size, max_crop_size]. Ignored if random_resized_crop is True.

  • max_random_scale (float, optional, default=1) – Resize to [width*s, height*s] with s randomly chosen from [min_random_scale, max_random_scale]. Ignored if random_resized_crop is True.

  • min_random_scale (float, optional, default=1) – Resize to [width*s, height*s] with s randomly chosen from [min_random_scale, max_random_scale]. Ignored if random_resized_crop is True.

  • max_random_area (float, optional, default=1) – Change the area (namely width * height) to a random value in [min_random_area, max_random_area]. Ignored if random_resized_crop is False.

  • min_random_area (float, optional, default=1) – Change the area (namely width * height) to a random value in [min_random_area, max_random_area]. Ignored if random_resized_crop is False.

  • max_img_size (float, optional, default=1e+10) – Set the maximal width and height after all resize and rotate augmentations are applied.

  • min_img_size (float, optional, default=0) – Set the minimal width and height after all resize and rotate augmentations are applied.

  • brightness (float, optional, default=0) – Add a random value in [-brightness, brightness] to the brightness of image.

  • contrast (float, optional, default=0) – Add a random value in [-contrast, contrast] to the contrast of image.

  • saturation (float, optional, default=0) – Add a random value in [-saturation, saturation] to the saturation of image.

  • pca_noise (float, optional, default=0) – Add PCA based noise to the image.

  • random_h (int, optional, default='0') – Add a random value in [-random_h, random_h] to the H channel in HSL color space.

  • random_s (int, optional, default='0') – Add a random value in [-random_s, random_s] to the S channel in HSL color space.

  • random_l (int, optional, default='0') – Add a random value in [-random_l, random_l] to the L channel in HSL color space.

  • rotate (int, optional, default='-1') – Rotate by an angle. If set, it overrides the max_rotate_angle option.

  • fill_value (int, optional, default='255') – Set the value of the padding pixels to fill_value.

  • inter_method (int, optional, default='1') – The interpolation method: 0-NN, 1-bilinear, 2-cubic, 3-area, 4-lanczos4, 9-auto, 10-rand.

  • pad (int, optional, default='0') – Change size from [width, height] into [pad + width + pad, pad + height + pad] by padding pixels.

  • mirror (boolean, optional, default=0) – Whether to mirror the image or not. If true, images are flipped along the horizontal axis.

  • rand_mirror (boolean, optional, default=0) – Whether to randomly mirror images or not. If true, 50% of the images will be randomly mirrored (flipped along the horizontal axis)

  • mean_img (string, optional, default='') – Filename of the mean image.

  • mean_r (float, optional, default=0) – The mean value to be subtracted on the R channel

  • mean_g (float, optional, default=0) – The mean value to be subtracted on the G channel

  • mean_b (float, optional, default=0) – The mean value to be subtracted on the B channel

  • mean_a (float, optional, default=0) – The mean value to be subtracted on the alpha channel

  • std_r (float, optional, default=1) – Augmentation Param: Standard deviation on R channel.

  • std_g (float, optional, default=1) – Augmentation Param: Standard deviation on G channel.

  • std_b (float, optional, default=1) – Augmentation Param: Standard deviation on B channel.

  • std_a (float, optional, default=1) – Augmentation Param: Standard deviation on Alpha channel.

  • scale (float, optional, default=1) – Multiply the image with a scale value.

  • max_random_contrast (float, optional, default=0) – Change the contrast with a value randomly chosen from [-max_random_contrast, max_random_contrast]

  • max_random_illumination (float, optional, default=0) – Change the illumination with a value randomly chosen from [-max_random_illumination, max_random_illumination]

Returns

The result iterator.

Return type

MXDataIter
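
The round-robin handling selected by round_batch can be sketched as follows. The sketch mirrors the behaviour that the CSVIter description in this reference spells out (4 rows, batch_size 3); whether the image iterators use identical bookkeeping is an assumption. The function one_pass is hypothetical, and the round_batch=False case is not modelled.

```python
def one_pass(rows, batch_size, start=0):
    """Return (batches, next_start) for one pass with round-robin overflow.

    The last batch wraps around to the beginning of the data, and the rows
    consumed that way are skipped at the start of the next pass.
    """
    n = len(rows)
    batches = []
    pos = start
    while pos < n:
        # Indices past the end wrap around to the first rows.
        batches.append([rows[i % n] for i in range(pos, pos + batch_size)])
        pos += batch_size
    return batches, pos - n


rows = ["row1", "row2", "row3", "row4"]
first_pass, next_start = one_pass(rows, batch_size=3)
second_pass, _ = one_pass(rows, batch_size=3, start=next_start)
```

Here the first pass yields two batches, the second of which borrows row1 and row2, so the second pass starts at row3 and yields a single batch.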

mxnet.io.ImageRecordUInt8Iter(*args, **kwargs)

Iterates over image RecordIO files.

Note: ImageRecordUInt8Iter is deprecated. Use ImageRecordIter(dtype='uint8') instead.

This iterator is identical to ImageRecordIter except that it uses uint8 as the data type instead of float.

Defined in src/io/iter_image_recordio_2.cc:L930

Parameters
  • path_imglist (string, optional, default='') – Path to the image list (.lst) file. Generally created with tools/im2rec.py. Format (Tab separated): <index of record> <one or more labels> <relative path from root folder>.

  • path_imgrec (string, optional, default='') – Path to the image RecordIO (.rec) file or a directory path. Created with tools/im2rec.py.

  • path_imgidx (string, optional, default='') – Path to the image RecordIO index (.idx) file. Created with tools/im2rec.py.

  • aug_seq (string, optional, default='aug_default') – Comma-separated names of the augmenters to apply, in sequence. Additional keyword parameters are passed on to these augmenters.

  • label_width (int, optional, default='1') – The number of labels per image.

  • data_shape (Shape(tuple), required) – The shape of one output image in (channels, height, width) format.

  • preprocess_threads (int, optional, default='4') – The number of threads to do preprocessing.

  • verbose (boolean, optional, default=1) – Whether to output verbose information.

  • num_parts (int, optional, default='1') – Virtually partition the data into this many parts.

  • part_index (int, optional, default='0') – The i-th virtual partition to be read.

  • device_id (int, optional, default='0') – The device id used to create the context for the internal NDArray. Setting device_id to -1 will create Context::CPU(0). Setting device_id to a valid positive device id will create Context::CPUPinned(device_id). Default is 0.

  • shuffle_chunk_size (long (non-negative), optional, default=0) – The data shuffle buffer size in MB. Only valid if shuffle is true.

  • shuffle_chunk_seed (int, optional, default='0') – The random seed for shuffling

  • seed_aug (int or None, optional, default='None') – Random seed for augmentations.

  • shuffle (boolean, optional, default=0) – Whether to shuffle data randomly or not.

  • seed (int, optional, default='0') – The random seed.

  • batch_size (int (non-negative), required) – Batch size.

  • round_batch (boolean, optional, default=1) – Whether to use round robin to handle overflow batch or not.

  • prefetch_buffer (long (non-negative), optional, default=4) – Maximum number of batches to prefetch.

  • ctx ({'cpu', 'cpu_pinned', 'gpu'}, optional, default='gpu') – Context the data loader is optimized for. Note that this only indicates the optimization strategy for devices; it does not mean the prefetcher will load data to GPUs. If ctx is 'cpu_pinned' and device_id is not -1, cpu_pinned(device_id) is used as the context.

  • dtype ({None, 'bfloat16', 'float16', 'float32', 'float64', 'int32', 'int64', 'int8', 'uint8'},optional, default='None') – Output data type. None means no change.

  • resize (int, optional, default='-1') – Downscale the shorter edge to a new size before applying other augmentations.

  • rand_crop (boolean, optional, default=0) – Whether to randomly crop the image.

  • random_resized_crop (boolean, optional, default=0) – Whether to perform random resized cropping on the image, a standard preprocessing step for ResNet training on ImageNet data.

  • max_rotate_angle (int, optional, default='0') – Rotate by a random angle, in degrees, drawn from [-v, v], where v is max_rotate_angle.

  • max_aspect_ratio (float, optional, default=0) – Change the aspect ratio (namely width/height) to a random value. If min_aspect_ratio is None, the aspect ratio is sampled from [1 - max_aspect_ratio, 1 + max_aspect_ratio]; otherwise it is sampled from [min_aspect_ratio, max_aspect_ratio].

  • min_aspect_ratio (float or None, optional, default=None) – Change the aspect ratio (namely width/height) to a random value in [min_aspect_ratio, max_aspect_ratio].

  • max_shear_ratio (float, optional, default=0) – Apply a shear transformation (namely (x,y)->(x+my,y)) with m randomly chosen from [-max_shear_ratio, max_shear_ratio].

  • max_crop_size (int, optional, default='-1') – Crop both width and height to a random size in [min_crop_size, max_crop_size]. Ignored if random_resized_crop is True.

  • min_crop_size (int, optional, default='-1') – Crop both width and height to a random size in [min_crop_size, max_crop_size]. Ignored if random_resized_crop is True.

  • max_random_scale (float, optional, default=1) – Resize to [width*s, height*s] with s randomly chosen from [min_random_scale, max_random_scale]. Ignored if random_resized_crop is True.

  • min_random_scale (float, optional, default=1) – Resize to [width*s, height*s] with s randomly chosen from [min_random_scale, max_random_scale]. Ignored if random_resized_crop is True.

  • max_random_area (float, optional, default=1) – Change the area (namely width * height) to a random value in [min_random_area, max_random_area]. Ignored if random_resized_crop is False.

  • min_random_area (float, optional, default=1) – Change the area (namely width * height) to a random value in [min_random_area, max_random_area]. Ignored if random_resized_crop is False.

  • max_img_size (float, optional, default=1e+10) – Set the maximal width and height after all resize and rotate augmentations are applied.

  • min_img_size (float, optional, default=0) – Set the minimal width and height after all resize and rotate augmentations are applied.

  • brightness (float, optional, default=0) – Add a random value in [-brightness, brightness] to the brightness of image.

  • contrast (float, optional, default=0) – Add a random value in [-contrast, contrast] to the contrast of image.

  • saturation (float, optional, default=0) – Add a random value in [-saturation, saturation] to the saturation of image.

  • pca_noise (float, optional, default=0) – Add PCA based noise to the image.

  • random_h (int, optional, default='0') – Add a random value in [-random_h, random_h] to the H channel in HSL color space.

  • random_s (int, optional, default='0') – Add a random value in [-random_s, random_s] to the S channel in HSL color space.

  • random_l (int, optional, default='0') – Add a random value in [-random_l, random_l] to the L channel in HSL color space.

  • rotate (int, optional, default='-1') – Rotate by an angle. If set, it overrides the max_rotate_angle option.

  • fill_value (int, optional, default='255') – Set the value of the padding pixels to fill_value.

  • inter_method (int, optional, default='1') – The interpolation method: 0-NN, 1-bilinear, 2-cubic, 3-area, 4-lanczos4, 9-auto, 10-rand.

  • pad (int, optional, default='0') – Change size from [width, height] into [pad + width + pad, pad + height + pad] by padding pixels.

Returns

The result iterator.

Return type

MXDataIter
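
The idea behind num_parts and part_index, where each worker reads only its own slice of the data, can be illustrated with a small index calculation. This is a conceptual sketch only: the function partition_bounds is hypothetical, and the split by contiguous, near-equal index ranges is an assumption. The actual split happens inside the backend and may follow different boundaries.

```python
def partition_bounds(num_items, num_parts, part_index):
    """Return the half-open index range [begin, end) of one virtual partition.

    Assumes contiguous, near-equal ranges; the real iterator may differ.
    """
    if not 0 <= part_index < num_parts:
        raise ValueError("part_index must be in [0, num_parts)")
    step = (num_items + num_parts - 1) // num_parts  # ceiling division
    begin = min(part_index * step, num_items)
    end = min(begin + step, num_items)
    return begin, end


# Ten records shared by three workers; each worker passes its own part_index.
bounds = [partition_bounds(10, 3, i) for i in range(3)]
```

In a distributed job, each worker would typically pass the total worker count as num_parts and its own rank as part_index.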

mxnet.io.ImageRecordUInt8Iter_v1(*args, **kwargs)

Iterates over image RecordIO files.

Note: ImageRecordUInt8Iter_v1 is deprecated. Use ImageRecordUInt8Iter instead.

This iterator is identical to ImageRecordIter except that it uses uint8 as the data type instead of float.

Defined in src/io/iter_image_recordio.cc:L377

Parameters
  • path_imglist (string, optional, default='') – Path to the image list (.lst) file. Generally created with tools/im2rec.py. Format (Tab separated): <index of record> <one or more labels> <relative path from root folder>.

  • path_imgrec (string, optional, default='') – Path to the image RecordIO (.rec) file or a directory path. Created with tools/im2rec.py.

  • path_imgidx (string, optional, default='') – Path to the image RecordIO index (.idx) file. Created with tools/im2rec.py.

  • aug_seq (string, optional, default='aug_default') – Comma-separated names of the augmenters to apply, in sequence. Additional keyword parameters are passed on to these augmenters.

  • label_width (int, optional, default='1') – The number of labels per image.

  • data_shape (Shape(tuple), required) – The shape of one output image in (channels, height, width) format.

  • preprocess_threads (int, optional, default='4') – The number of threads to do preprocessing.

  • verbose (boolean, optional, default=1) – Whether to output verbose information.

  • num_parts (int, optional, default='1') – Virtually partition the data into this many parts.

  • part_index (int, optional, default='0') – The i-th virtual partition to be read.

  • device_id (int, optional, default='0') – The device id used to create the context for the internal NDArray. Setting device_id to -1 will create Context::CPU(0). Setting device_id to a valid positive device id will create Context::CPUPinned(device_id). Default is 0.

  • shuffle_chunk_size (long (non-negative), optional, default=0) – The data shuffle buffer size in MB. Only valid if shuffle is true.

  • shuffle_chunk_seed (int, optional, default='0') – The random seed for shuffling

  • seed_aug (int or None, optional, default='None') – Random seed for augmentations.

  • shuffle (boolean, optional, default=0) – Whether to shuffle data randomly or not.

  • seed (int, optional, default='0') – The random seed.

  • batch_size (int (non-negative), required) – Batch size.

  • round_batch (boolean, optional, default=1) – Whether to use round robin to handle overflow batch or not.

  • prefetch_buffer (long (non-negative), optional, default=4) – Maximum number of batches to prefetch.

  • ctx ({'cpu', 'cpu_pinned', 'gpu'}, optional, default='gpu') – Context the data loader is optimized for. Note that this only indicates the optimization strategy for devices; it does not mean the prefetcher will load data to GPUs. If ctx is 'cpu_pinned' and device_id is not -1, cpu_pinned(device_id) is used as the context.

  • dtype ({None, 'bfloat16', 'float16', 'float32', 'float64', 'int32', 'int64', 'int8', 'uint8'},optional, default='None') – Output data type. None means no change.

  • resize (int, optional, default='-1') – Downscale the shorter edge to a new size before applying other augmentations.

  • rand_crop (boolean, optional, default=0) – Whether to randomly crop the image.

  • random_resized_crop (boolean, optional, default=0) – Whether to perform random resized cropping on the image, a standard preprocessing step for ResNet training on ImageNet data.

  • max_rotate_angle (int, optional, default='0') – Rotate by a random angle, in degrees, drawn from [-v, v], where v is max_rotate_angle.

  • max_aspect_ratio (float, optional, default=0) – Change the aspect ratio (namely width/height) to a random value. If min_aspect_ratio is None, the aspect ratio is sampled from [1 - max_aspect_ratio, 1 + max_aspect_ratio]; otherwise it is sampled from [min_aspect_ratio, max_aspect_ratio].

  • min_aspect_ratio (float or None, optional, default=None) – Change the aspect ratio (namely width/height) to a random value in [min_aspect_ratio, max_aspect_ratio].

  • max_shear_ratio (float, optional, default=0) – Apply a shear transformation (namely (x,y)->(x+my,y)) with m randomly chosen from [-max_shear_ratio, max_shear_ratio].

  • max_crop_size (int, optional, default='-1') – Crop both width and height to a random size in [min_crop_size, max_crop_size]. Ignored if random_resized_crop is True.

  • min_crop_size (int, optional, default='-1') – Crop both width and height to a random size in [min_crop_size, max_crop_size]. Ignored if random_resized_crop is True.

  • max_random_scale (float, optional, default=1) – Resize to [width*s, height*s] with s randomly chosen from [min_random_scale, max_random_scale]. Ignored if random_resized_crop is True.

  • min_random_scale (float, optional, default=1) – Resize to [width*s, height*s] with s randomly chosen from [min_random_scale, max_random_scale]. Ignored if random_resized_crop is True.

  • max_random_area (float, optional, default=1) – Change the area (namely width * height) to a random value in [min_random_area, max_random_area]. Ignored if random_resized_crop is False.

  • min_random_area (float, optional, default=1) – Change the area (namely width * height) to a random value in [min_random_area, max_random_area]. Ignored if random_resized_crop is False.

  • max_img_size (float, optional, default=1e+10) – Set the maximal width and height after all resize and rotate augmentations are applied.

  • min_img_size (float, optional, default=0) – Set the minimal width and height after all resize and rotate augmentations are applied.

  • brightness (float, optional, default=0) – Add a random value in [-brightness, brightness] to the brightness of image.

  • contrast (float, optional, default=0) – Add a random value in [-contrast, contrast] to the contrast of image.

  • saturation (float, optional, default=0) – Add a random value in [-saturation, saturation] to the saturation of image.

  • pca_noise (float, optional, default=0) – Add PCA based noise to the image.

  • random_h (int, optional, default='0') – Add a random value in [-random_h, random_h] to the H channel in HSL color space.

  • random_s (int, optional, default='0') – Add a random value in [-random_s, random_s] to the S channel in HSL color space.

  • random_l (int, optional, default='0') – Add a random value in [-random_l, random_l] to the L channel in HSL color space.

  • rotate (int, optional, default='-1') – Rotate by an angle. If set, it overrides the max_rotate_angle option.

  • fill_value (int, optional, default='255') – Set the padding pixels value to fill_value.

  • inter_method (int, optional, default='1') – The interpolation method: 0-NN 1-bilinear 2-cubic 3-area 4-lanczos4 9-auto 10-rand.

  • pad (int, optional, default='0') – Change size from [width, height] into [pad + width + pad, pad + height + pad] by padding pixels.
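
The geometry behind max_shear_ratio and pad can be sketched in a few lines of plain Python. This is only an illustration of the formulas quoted in the parameter list, not MXNet code: the helper names are hypothetical, and drawing m from a uniform distribution is an assumption (the parameter description only says "randomly chosen").

```python
import random


def sample_shear(max_shear_ratio, rng):
    """Draw m from [-max_shear_ratio, max_shear_ratio] (uniform is an assumption)."""
    return rng.uniform(-max_shear_ratio, max_shear_ratio)


def shear_point(x, y, m):
    """Apply the shear (x, y) -> (x + m*y, y) to a single coordinate pair."""
    return (x + m * y, y)


def padded_size(width, height, pad):
    """Size after padding `pad` pixels on every side of the image."""
    return (pad + width + pad, pad + height + pad)


rng = random.Random(0)           # fixed seed so the sketch is repeatable
m = sample_shear(0.1, rng)
sheared = shear_point(10.0, 20.0, m)
new_size = padded_size(224, 224, 4)
print(sheared, new_size)
```

Only the x coordinate moves under the shear, and the padded image grows by 2 * pad along each axis.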

Returns

The result iterator.

Return type

MXDataIter

mxnet.io.LibSVMIter(*args, **kwargs)

Returns the LibSVM iterator which returns data with csr storage type. This iterator is experimental and should be used with care.

The input data is stored in a format similar to the LibSVM file format, except that the indices are expected to be zero-based instead of one-based, and the column indices for each row are expected to be sorted in ascending order. Details of the LibSVM format are available here: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/

The data_shape parameter is used to set the shape of each line of the data. The dimension of both data_shape and label_shape is expected to be 1.

The data_libsvm parameter is used to set the path to the input LibSVM file. When it is set to a directory, all the files in the directory will be read.

When label_libsvm is set to NULL, both data and label are read from the file specified by data_libsvm. In this case, the data is stored in csr storage type, while the label is a 1D dense array.

The LibSVMIter only supports the round_batch parameter set to True. Therefore, if batch_size is 3 and there are 4 total rows in the libsvm file, 2 more examples are consumed at the first round.

When num_parts and part_index are provided, the data is split into num_parts partitions, and the iterator only reads the part_index-th partition. However, the partitions are not guaranteed to be even.

reset() is expected to be called only after a complete pass of data.

Example:

# Contents of libsvm file data.t.
1.0 0:0.5 2:1.2
-2.0
-3.0 0:0.6 1:2.4 2:1.2
4 2:-1.2

# Creates a LibSVMIter with batch_size=3.
>>> data_iter = mx.io.LibSVMIter(data_libsvm = 'data.t', data_shape = (3,), batch_size = 3)
# The data of the first batch is stored in csr storage type
>>> batch = data_iter.next()
>>> csr = batch.data[0]
<CSRNDArray 3x3 @cpu(0)>
>>> csr.asnumpy()
[[ 0.5 0. 1.2 ]
[ 0. 0. 0. ]
[ 0.6 2.4 1.2]]
# The label of first batch
>>> label = batch.label[0]
>>> label
[ 1. -2. -3.]
<NDArray 3 @cpu(0)>

>>> second_batch = data_iter.next()
# The data of the second batch
>>> second_batch.data[0].asnumpy()
[[ 0. 0. -1.2 ]
[ 0.5 0. 1.2 ]
[ 0. 0. 0. ]]
# The label of the second batch
>>> second_batch.label[0].asnumpy()
[ 4. 1. -2.]

>>> data_iter.reset()
# To restart the iterator for the second pass of the data

When label_libsvm is set to the path to another LibSVM file, data is read from data_libsvm and label from label_libsvm. In this case, both data and label are stored in the csr format, and the label column in the data_libsvm file is ignored.

Example:

# Contents of libsvm file label.t
1.0
-2.0 0:0.125
-3.0 2:1.2
4 1:1.0 2:-1.2

# Creates a LibSVMIter with specified label file
>>> data_iter = mx.io.LibSVMIter(data_libsvm = 'data.t', data_shape = (3,),
...                              label_libsvm = 'label.t', label_shape = (3,), batch_size = 3)

# Both data and label are in csr storage type
>>> batch = data_iter.next()
>>> csr_data = batch.data[0]
<CSRNDArray 3x3 @cpu(0)>
>>> csr_data.asnumpy()
[[ 0.5 0. 1.2 ]
[ 0. 0. 0. ]
[ 0.6 2.4 1.2 ]]
>>> csr_label = batch.label[0]
<CSRNDArray 3x3 @cpu(0)>
>>> csr_label.asnumpy()
[[ 0. 0. 0. ]
[ 0.125 0. 0. ]
[ 0. 0. 1.2 ]]

Defined in src/io/iter_libsvm.cc:L298
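
To make the zero-based input format more concrete, the following pure-Python sketch parses the four rows of data.t from the example into the three arrays of the usual CSR layout (values, column indices, row pointers). It is not how LibSVMIter is implemented; the function name parse_libsvm_lines is hypothetical and no MXNet call is involved.

```python
def parse_libsvm_lines(lines):
    """Parse 'label idx:val ...' rows with zero-based, ascending column indices.

    Returns (labels, data, indices, indptr), where the last three lists
    follow the standard CSR convention.
    """
    labels, data, indices, indptr = [], [], [], [0]
    for line in lines:
        tokens = line.split()
        labels.append(float(tokens[0]))       # first token is the label
        for token in tokens[1:]:
            idx, val = token.split(":")
            indices.append(int(idx))          # zero-based column index
            data.append(float(val))
        indptr.append(len(data))              # end of this row in `data`
    return labels, data, indices, indptr


rows = ["1.0 0:0.5 2:1.2", "-2.0", "-3.0 0:0.6 1:2.4 2:1.2", "4 2:-1.2"]
labels, data, indices, indptr = parse_libsvm_lines(rows)
print(labels)
print(data, indices, indptr)
```

The second row has a label but no features, so its slice of the data array is empty: two consecutive entries of indptr are equal.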

Parameters
  • data_libsvm (string, required) – The input zero-base indexed LibSVM data file or a directory path.

  • data_shape (Shape(tuple), required) – The shape of one example.

  • label_libsvm (string, optional, default='NULL') – The input LibSVM label file or a directory path. If NULL, all labels will be read from data_libsvm.

  • label_shape (Shape(tuple), optional, default=[1]) – The shape of one label.

  • num_parts (int, optional, default='1') – Partition the data into multiple parts.

  • part_index (int, optional, default='0') – The index of the part to read.

  • batch_size (int (non-negative), required) – Batch size.

  • round_batch (boolean, optional, default=1) – Whether to use round robin to handle overflow batch or not.

  • prefetch_buffer (long (non-negative), optional, default=4) – Maximum number of batches to prefetch.

  • ctx ({'cpu', 'cpu_pinned', 'gpu'}, optional, default='gpu') – The context that the data loader is optimized for. This only indicates the optimization strategy for the device; it does not mean that the prefetcher will load data onto GPUs. If ctx is ‘cpu_pinned’ and device_id is not -1, cpu_pinned(device_id) is used as the context.

  • device_id (int, optional, default='-1') – The default device id for the context. -1 indicates the default device.

  • dtype ({None, 'bfloat16', 'float16', 'float32', 'float64', 'int32', 'int64', 'int8', 'uint8'},optional, default='None') – Output data type. None means no change.

Returns

The result iterator.

Return type

MXDataIter
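
The round_batch behavior described for LibSVMIter (batch_size 3 over 4 rows consumes 2 extra examples in the first round) can be pictured with a small index calculation. The sketch below is an assumption-laden illustration in plain Python, not the C++ implementation; round_batches is a hypothetical helper, and it only models a single pass.

```python
def round_batches(num_rows, batch_size):
    """Yield (row_indices, num_wrapped) for one pass with round-robin overflow.

    When the last batch runs past the end of the data, it wraps around and
    borrows rows from the beginning; num_wrapped counts those borrowed rows.
    """
    for start in range(0, num_rows, batch_size):
        indices = [(start + i) % num_rows for i in range(batch_size)]
        wrapped = max(0, start + batch_size - num_rows)
        yield indices, wrapped


batches = list(round_batches(num_rows=4, batch_size=3))
for indices, wrapped in batches:
    print(indices, wrapped)
```

With 4 rows and batch_size 3, the second batch holds row 3 followed by rows 0 and 1, which matches the labels [4, 1, -2] shown for the second batch in the LibSVMIter example.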

mxnet.io.MNISTIter(*args, **kwargs)

Iterating on the MNIST dataset.

One can download the dataset from http://yann.lecun.com/exdb/mnist/

Defined in src/io/iter_mnist.cc:L265

Parameters
  • image (string, optional, default='./train-images-idx3-ubyte') – Dataset Param: Mnist image path.

  • label (string, optional, default='./train-labels-idx1-ubyte') – Dataset Param: Mnist label path.

  • batch_size (int, optional, default='128') – Batch Param: Batch Size.

  • shuffle (boolean, optional, default=1) – Augmentation Param: Whether to shuffle data.

  • flat (boolean, optional, default=0) – Augmentation Param: Whether to flat the data into 1D.

  • seed (int, optional, default='0') – Augmentation Param: Random Seed.

  • silent (boolean, optional, default=0) – Auxiliary Param: Whether to print out data info.

  • num_parts (int, optional, default='1') – Partition the data into multiple parts.

  • part_index (int, optional, default='0') – The index of the part to read.

  • prefetch_buffer (long (non-negative), optional, default=4) – Maximum number of batches to prefetch.

  • ctx ({'cpu', 'cpu_pinned', 'gpu'}, optional, default='gpu') – The context that the data loader is optimized for. This only indicates the optimization strategy for the device; it does not mean that the prefetcher will load data onto GPUs. If ctx is ‘cpu_pinned’ and device_id is not -1, cpu_pinned(device_id) is used as the context.

  • device_id (int, optional, default='-1') – The default device id for the context. -1 indicates the default device.

  • dtype ({None, 'bfloat16', 'float16', 'float32', 'float64', 'int32', 'int64', 'int8', 'uint8'},optional, default='None') – Output data type. None means no change.

Returns

The result iterator.

Return type

MXDataIter
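
The num_parts and part_index parameters are typically used so that each worker reads only its own shard of the data. How the native iterator chooses partition boundaries is not stated here, so the sketch below simply assumes a contiguous split; partition_bounds is a hypothetical helper, not an MXNet function.

```python
def partition_bounds(num_examples, num_parts, part_index):
    """Return the [begin, end) range of examples for one partition.

    Assumes a contiguous split with a rounded-up step, so the last
    partition may be smaller than the others.
    """
    step = (num_examples + num_parts - 1) // num_parts   # ceiling division
    begin = min(part_index * step, num_examples)
    end = min(begin + step, num_examples)
    return begin, end


parts = [partition_bounds(10, 3, i) for i in range(3)]
print(parts)
```

Under this assumption, 10 examples in 3 parts give shards of 4, 4 and 2 examples, which shows why partitions are not necessarily even.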

class mxnet.io.MXDataIter(handle, data_name='data', label_name='softmax_label', **kwargs)[source]

Bases: mxnet.io.io.DataIter

A Python wrapper for a C++ data iterator.

This iterator is the Python wrapper to all native C++ data iterators, such as CSVIter, ImageRecordIter, MNISTIter, etc. When initializing CSVIter for example, you will get an MXDataIter instance to use in your Python code. Calls to next, reset, etc will be delegated to the underlying C++ data iterators.

Usually you don’t need to interact with MXDataIter directly unless you are implementing your own data iterators in C++. To do that, please refer to examples under the src/io folder.

Parameters
  • handle (DataIterHandle, required) – The handle to the underlying C++ Data Iterator.

  • data_name (str, optional) – Data name. Defaults to “data”.

  • label_name (str, optional) – Label name. Defaults to “softmax_label”.

Methods

getdata()

Get data of current batch.

getindex()

Get index of the current batch.

getlabel()

Get label of the current batch.

getpad()

Get the number of padding examples in the current batch.

iter_next()

Move to the next batch.

next()

Get next data batch from iterator.

reset()

Reset the iterator to the beginning of the data.

getdata()[source]

Get data of current batch.

Returns

The data of the current batch.

Return type

list of NDArray

getindex()[source]

Get index of the current batch.

Returns

index – The indices of examples in the current batch.

Return type

numpy.array

getlabel()[source]

Get label of the current batch.

Returns

The label of the current batch.

Return type

list of NDArray

getpad()[source]

Get the number of padding examples in the current batch.

Returns

Number of padding examples in the current batch.

Return type

int

iter_next()[source]

Move to the next batch.

Returns

Whether the move is successful.

Return type

boolean

next()[source]

Get next data batch from iterator.

Returns

The data of next batch.

Return type

DataBatch

Raises

StopIteration – If the end of the data is reached.

reset()[source]

Reset the iterator to the beginning of the data.
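
The methods listed above fit together in a simple way: iter_next advances to the next batch and reports success, the get* methods read the current batch, and next combines them and raises StopIteration at the end of the data. The toy class below sketches that relationship in plain Python. ToyIter is hypothetical, works on a list instead of NDArrays, and returns a dict where MXNet would return a DataBatch.

```python
class ToyIter:
    """Hypothetical list-backed iterator that mimics the method protocol."""

    def __init__(self, rows, batch_size):
        self.rows = rows
        self.batch_size = batch_size
        self.cursor = -batch_size            # positioned before the first batch

    def reset(self):
        """Go back to the beginning of the data."""
        self.cursor = -self.batch_size

    def iter_next(self):
        """Move to the next batch; return whether the move succeeded."""
        self.cursor += self.batch_size
        return self.cursor < len(self.rows)

    def getdata(self):
        """Current batch, wrapping around to fill an incomplete last batch."""
        n = len(self.rows)
        return [self.rows[(self.cursor + i) % n] for i in range(self.batch_size)]

    def getpad(self):
        """Number of padding examples in the current batch."""
        return max(0, self.cursor + self.batch_size - len(self.rows))

    def next(self):
        if self.iter_next():
            return {"data": self.getdata(), "pad": self.getpad()}
        raise StopIteration

    __next__ = next                          # lets a for loop drive the iterator

    def __iter__(self):
        return self


it = ToyIter([10, 11, 12, 13, 14], batch_size=2)
first_pass = [(b["data"], b["pad"]) for b in it]
it.reset()
second_pass = [(b["data"], b["pad"]) for b in it]
print(first_pass)
```

Calling reset after a complete pass makes the second pass produce the same batches as the first.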

class mxnet.io.NDArrayIter(data, label=None, batch_size=1, shuffle=False, last_batch_handle='pad', data_name='data', label_name='softmax_label')[source]

Bases: mxnet.io.io.DataIter

Returns an iterator for mx.nd.NDArray, numpy.ndarray, h5py.Dataset, mx.nd.sparse.CSRNDArray or scipy.sparse.csr_matrix.

Examples

Methods

getdata()

Get data.

getlabel()

Get label.

getpad()

Get pad value of DataBatch.

hard_reset()

Ignore roll over data and set to start.

iter_next()

Increments the cursor by batch_size for the next batch and checks whether the cursor exceeds the number of data points.

next()

Returns the next batch of data.

reset()

Resets the iterator to the beginning of the data.

Attributes

provide_data

The name and shape of data provided by this iterator.

provide_label

The name and shape of label provided by this iterator.

>>> data = np.arange(40).reshape((10,2,2))
>>> labels = np.ones([10, 1])
>>> dataiter = mx.io.NDArrayIter(data, labels, 3, True, last_batch_handle='discard')
>>> for batch in dataiter:
...     print(batch.data[0].asnumpy())
...     batch.data[0].shape
...
[[[ 36.  37.]
  [ 38.  39.]]
 [[ 16.  17.]
  [ 18.  19.]]
 [[ 12.  13.]
  [ 14.  15.]]]
(3L, 2L, 2L)
[[[ 32.  33.]
  [ 34.  35.]]
 [[  4.   5.]
  [  6.   7.]]
 [[ 24.  25.]
  [ 26.  27.]]]
(3L, 2L, 2L)
[[[  8.   9.]
  [ 10.  11.]]
 [[ 20.  21.]
  [ 22.  23.]]
 [[ 28.  29.]
  [ 30.  31.]]]
(3L, 2L, 2L)
>>> dataiter.provide_data # Returns a list of `DataDesc`
[DataDesc[data,(3, 2L, 2L),<type 'numpy.float32'>,NCHW]]
>>> dataiter.provide_label # Returns a list of `DataDesc`
[DataDesc[softmax_label,(3, 1L),<type 'numpy.float32'>,NCHW]]

In the above example, data is shuffled as shuffle parameter is set to True and remaining examples are discarded as last_batch_handle parameter is set to discard.

Usage of last_batch_handle parameter:

>>> dataiter = mx.io.NDArrayIter(data, labels, 3, True, last_batch_handle='pad')
>>> batchidx = 0
>>> for batch in dataiter:
...     batchidx += 1
...
>>> batchidx  # Padding is added once the examples run out, so 10//3 + 1 = 4 batches are created.
4
>>> dataiter = mx.io.NDArrayIter(data, labels, 3, True, last_batch_handle='discard')
>>> batchidx = 0
>>> for batch in dataiter:
...     batchidx += 1
...
>>> batchidx  # Remaining examples are discarded, so 10//3 = 3 batches are created.
3
>>> dataiter = mx.io.NDArrayIter(data, labels, 3, False, last_batch_handle='roll_over')
>>> batchidx = 0
>>> for batch in dataiter:
...     batchidx += 1
...
>>> batchidx # Remaining examples are rolled over to the next iteration.
3
>>> dataiter.reset()
>>> dataiter.next().data[0].asnumpy()
[[[ 36.  37.]
  [ 38.  39.]]
 [[ 0.  1.]
  [ 2.  3.]]
 [[ 4.  5.]
  [ 6.  7.]]]

NDArrayIter also supports multiple inputs and labels.

>>> data = {'data1':np.zeros(shape=(10,2,2)), 'data2':np.zeros(shape=(20,2,2))}
>>> label = {'label1':np.zeros(shape=(10,1)), 'label2':np.zeros(shape=(20,1))}
>>> dataiter = mx.io.NDArrayIter(data, label, 3, True, last_batch_handle='discard')

NDArrayIter also supports mx.nd.sparse.CSRNDArray with last_batch_handle set to discard.

>>> csr_data = mx.nd.array(np.arange(40).reshape((10,4))).tostype('csr')
>>> labels = np.ones([10, 1])
>>> dataiter = mx.io.NDArrayIter(csr_data, labels, 3, last_batch_handle='discard')
>>> [batch.data[0] for batch in dataiter]
[
<CSRNDArray 3x4 @cpu(0)>,
<CSRNDArray 3x4 @cpu(0)>,
<CSRNDArray 3x4 @cpu(0)>]
Parameters
  • data (array or list of array or dict of string to array) – The input data.

  • label (array or list of array or dict of string to array, optional) – The input label.

  • batch_size (int) – Batch size of data.

  • shuffle (bool, optional) – Whether to shuffle the data. Only supported if no h5py.Dataset inputs are used.

  • last_batch_handle (str, optional) – How to handle the last batch. This parameter can be ‘pad’, ‘discard’ or ‘roll_over’. If ‘pad’, the last batch will be padded with data starting from the beginning. If ‘discard’, the last batch will be discarded. If ‘roll_over’, the remaining elements will be rolled over to the next iteration; note that this mode is intended for training and can cause problems if used for prediction.

  • data_name (str, optional) – The data name.

  • label_name (str, optional) – The label name.
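
The three last_batch_handle modes differ only in what happens to the examples left over at the end of an epoch. The following plain-Python sketch works on example indices rather than arrays and assumes no shuffling; epoch_batches is a hypothetical helper written to mirror the batch counts and the roll_over output shown in the examples, not NDArrayIter's actual code.

```python
def epoch_batches(num_data, batch_size, mode, leftover=()):
    """Return (batches, leftover_out) of example indices for one epoch.

    mode is 'pad', 'discard' or 'roll_over'; `leftover` holds indices
    carried over from the previous epoch and is only used by 'roll_over'.
    """
    indices = list(range(num_data))
    if mode == "roll_over":
        indices = list(leftover) + indices   # carried-over examples go first
    batches = []
    pos = 0
    while pos + batch_size <= len(indices):
        batches.append(indices[pos:pos + batch_size])
        pos += batch_size
    rest = indices[pos:]
    if mode == "pad" and rest:
        # Fill the last batch with examples from the beginning of the data.
        batches.append(rest + list(range(batch_size - len(rest))))
        rest = []
    elif mode == "discard":
        rest = []
    return batches, rest


pad_batches, _ = epoch_batches(10, 3, "pad")
discard_batches, _ = epoch_batches(10, 3, "discard")
roll_first, carried = epoch_batches(10, 3, "roll_over")
roll_second, _ = epoch_batches(10, 3, "roll_over", leftover=carried)
print(len(pad_batches), len(discard_batches), len(roll_first))
print(roll_second[0])
```

With 10 examples and batch_size 3, 'pad' yields 4 batches and the other two modes yield 3. In 'roll_over' mode the leftover example (index 9) opens the next epoch, followed by indices 0 and 1, as in the output after reset in the example.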

getdata()[source]

Get data.

getlabel()[source]

Get label.

getpad()[source]

Get pad value of DataBatch.

hard_reset()[source]

Ignore roll over data and set to start.

iter_next()[source]

Increments the cursor by batch_size for the next batch and checks whether the cursor exceeds the number of data points.

next()[source]

Returns the next batch of data.

property provide_data

The name and shape of data provided by this iterator.

property provide_label

The name and shape of label provided by this iterator.

reset()[source]

Resets the iterator to the beginning of the data.

class mxnet.io.PrefetchingIter(iters, rename_data=None, rename_label=None)[source]

Bases: mxnet.io.io.DataIter

Performs pre-fetch for other data iterators.

This iterator will create another thread to perform iter_next and then store the data in memory. It potentially accelerates the data read, at the cost of more memory usage.
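
The general idea of prefetching on a separate thread can be sketched with the standard library alone. This is a generic producer/consumer pattern, not PrefetchingIter's implementation; the prefetch function and its buffer_size argument are hypothetical.

```python
import queue
import threading


def prefetch(batches, buffer_size=2):
    """Yield items from `batches` while a background thread reads ahead.

    At most `buffer_size` items are held in memory at once, which is the
    memory cost paid for overlapping loading with consumption.
    """
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()                          # sentinel marking the end of data

    def worker():
        for batch in batches:
            buffer.put(batch)                # blocks while the buffer is full
        buffer.put(done)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buffer.get()                  # blocks until a batch is ready
        if item is done:
            return
        yield item


loaded = list(prefetch(iter(range(5))))
print(loaded)
```

The bounded queue is the design choice that matters here: it lets the producer run ahead of the consumer, but only by a fixed number of batches.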

Parameters
  • iters (DataIter or list of DataIter) – The data iterators to be pre-fetched.

  • rename_data (None or list of dict) – The i-th element is a renaming map for the i-th iter, in the form of {‘original_name’ : ‘new_name’}. Should have one entry for each entry in iters[i].provide_data.

  • rename_label (None or list of dict) – Similar to rename_data.

Methods

getdata()

Get data of current batch.

getindex()

Get index of the current batch.

getlabel()

Get label of the current batch.

getpad()

Get the number of padding examples in the current batch.

iter_next()

Move to the next batch.

next()

Get next data batch from iterator.

reset()

Reset the iterator to the beginning of the data.

Examples

>>> iter1 = mx.io.NDArrayIter({'data':mx.nd.ones((100,10))}, batch_size=25)
>>> iter2 = mx.io.NDArrayIter({'data':mx.nd.ones((100,10))}, batch_size=25)
>>> piter = mx.io.PrefetchingIter([iter1, iter2],
...                               rename_data=[{'data': 'data_1'}, {'data': 'data_2'}])
>>> print(piter.provide_data)
[DataDesc[data_1,(25, 10L),<type 'numpy.float32'>,NCHW],
 DataDesc[data_2,(25, 10L),<type 'numpy.float32'>,NCHW]]
getdata()[source]

Get data of current batch.

Returns

The data of the current batch.

Return type

list of NDArray

getindex()[source]

Get index of the current batch.

Returns

index – The indices of examples in the current batch.

Return type

numpy.array

getlabel()[source]

Get label of the current batch.

Returns

The label of the current batch.

Return type

list of NDArray

getpad()[source]

Get the number of padding examples in the current batch.

Returns

Number of padding examples in the current batch.

Return type

int

iter_next()[source]

Move to the next batch.

Returns

Whether the move is successful.

Return type

boolean

next()[source]

Get next data batch from iterator.

Returns

The data of next batch.

Return type

DataBatch

Raises

StopIteration – If the end of the data is reached.

reset()[source]

Reset the iterator to the beginning of the data.

class mxnet.io.ResizeIter(data_iter, size, reset_internal=True)[source]

Bases: mxnet.io.io.DataIter

Resize a data iterator to a given number of batches.

Parameters
  • data_iter (DataIter) – The data iterator to be resized.

  • size (int) – The number of batches per epoch to resize to.

  • reset_internal (bool) – Whether to reset internal iterator on ResizeIter.reset.
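
One way to picture resizing is as taking exactly size batches per epoch from the wrapped iterator, restarting it whenever it runs out. The sketch below models that on a plain list of batches with a fixed order; resized_epoch is a hypothetical helper, and carrying next_start between epochs stands in for reset_internal=False (pass start=0 every epoch to mimic reset_internal=True).

```python
def resized_epoch(batches, size, start=0):
    """Return (epoch, next_start): `size` batches taken cyclically.

    `start` is the position in the underlying data where this epoch begins;
    `next_start` is where the following epoch would continue if the
    underlying iterator is not reset in between.
    """
    total = len(batches)
    epoch = [batches[(start + i) % total] for i in range(size)]
    return epoch, (start + size) % total


source = ["b0", "b1", "b2", "b3"]            # e.g. 100 examples, batch_size 25
short_epoch, next_start = resized_epoch(source, 2)
long_epoch, _ = resized_epoch(source, 6)
continued, _ = resized_epoch(source, 2, start=next_start)
print(short_epoch, long_epoch, continued)
```

A size smaller than the underlying number of batches shortens the epoch, while a larger size makes the epoch wrap around and revisit early batches.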

Methods

getdata()

Get data of current batch.

getindex()

Get index of the current batch.

getlabel()

Get label of the current batch.

getpad()

Get the number of padding examples in the current batch.

iter_next()

Move to the next batch.

reset()

Reset the iterator to the beginning of the data.

Examples

>>> nd_iter = mx.io.NDArrayIter(mx.nd.ones((100,10)), batch_size=25)
>>> resize_iter = mx.io.ResizeIter(nd_iter, 2)
>>> for batch in resize_iter:
...     print(batch.data)
[<NDArray 25x10 @cpu(0)>]
[<NDArray 25x10 @cpu(0)>]
getdata()[source]

Get data of current batch.

Returns

The data of the current batch.

Return type

list of NDArray

getindex()[source]

Get index of the current batch.

Returns

index – The indices of examples in the current batch.

Return type

numpy.array

getlabel()[source]

Get label of the current batch.

Returns

The label of the current batch.

Return type

list of NDArray

getpad()[source]

Get the number of padding examples in the current batch.

Returns

Number of padding examples in the current batch.

Return type

int

iter_next()[source]

Move to the next batch.

Returns

Whether the move is successful.

Return type

boolean

reset()[source]

Reset the iterator to the beginning of the data.