MXNet Model Zoo
MXNet features fast implementations of many state-of-the-art models reported in the academic literature. This Model Zoo is an ongoing project to collect complete models, with Python scripts, pre-trained weights, and instructions on how to build and fine-tune these models.
How to Contribute a Pre-Trained Model (and what to include)
The Model Zoo has good entries for CNNs but is seeking content in other areas.
Issue a Pull Request containing the following:
- Gist Log
- .json model definition
- Model parameter file
- Readme file (details below)
The Readme file should contain:
- Model location and access instructions (wget)
- Confirmation that the trained model meets the accuracy published in the original paper
- Step-by-step instructions on how to use the trained model
- References to any other applicable docs or arXiv papers the model is based on
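The checklist above can be sanity-checked before opening a Pull Request. The following is a minimal sketch, not an official Model Zoo tool: the `missing_items` helper and its file-name patterns (`.json`, `.params`, a name starting with `readme`) are assumptions made for illustration, and the Gist log is left out because it may be a link rather than a file.

```python
from pathlib import Path

# Hypothetical rules: these patterns are assumptions for illustration,
# not an official Model Zoo requirement.
REQUIRED = {
    "model definition (.json)": lambda name: name.endswith(".json"),
    "parameter file (.params)": lambda name: name.endswith(".params"),
    "Readme file": lambda name: name.lower().startswith("readme"),
}


def missing_items(submission_dir):
    """Return the checklist items with no matching file in submission_dir."""
    names = [p.name for p in Path(submission_dir).iterdir() if p.is_file()]
    return [
        item
        for item, matches in REQUIRED.items()
        if not any(matches(name) for name in names)
    ]
```

Calling `missing_items` on a submission directory returns an empty list when every file-based item is present, and otherwise names the items that still need to be added.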
Convolutional Neural Networks (CNNs)
Convolutional neural networks are the state-of-the-art architecture for many image and video processing problems. Some available datasets include:
- ImageNet: a large corpus of 1 million natural images, divided into 1000 categories.
- CIFAR10: 60,000 natural images (32 x 32 pixels) from 10 categories.
- PASCAL VOC: Natural images annotated with object bounding boxes.
- UCF101: 13,320 videos from 101 action categories.
- Mini-Places2: Subset of the Places2 dataset. Includes 100,000 images from 100 scene categories.
- ImageNet 11k
- Places2: Places365-Standard has 1.6 million training images from 365 scene categories, which are used to train the Places365 CNNs, with 50 images per category in the validation set and 900 images per category in the test set. Places365-Challenge adds 6.2 million training images, for a total of about 8 million training images in the Places365 Challenge 2016. Its validation and test sets are the same as those of Places365-Standard.
- Multimedia Commons: YFCC100M (99.2 million images and 0.8 million videos from Flickr) and supplemental material (pre-extracted features, additional annotations).
For instructions on using these models, see the Python tutorial on using pre-trained ImageNet models.
Model Definition | Dataset | Model Weights | Research Basis | Contributors |
---|---|---|---|---|
CaffeNet | ImageNet | Param File | Krizhevsky, 2012 | @jspisak |
Network in Network (NiN) | ImageNet | Param File | Lin et al., 2014 | @jspisak |
SqueezeNet v1.1 | ImageNet | Param File | Iandola et al., 2016 | @jspisak |
VGG16 | ImageNet | Param File | Simonyan et al., 2015 | @jspisak |
VGG19 | ImageNet | Param File | Simonyan et al., 2015 | @jspisak |
Inception w/ BatchNorm | ImageNet | Param File | Szegedy et al., 2015 | @jspisak |
ResidualNet152 | ImageNet | Param File | He et al., 2015 | @jspisak |
ResNext101-64x4d | ImageNet | Param File | Xie et al., 2016 | @Jerryzcn |
Fast-RCNN | PASCAL VOC | [Param File] | Girshick, 2015 | |
Faster-RCNN | PASCAL VOC | [Param File] | Ren et al., 2016 | |
Single Shot Detection (SSD) | PASCAL VOC | [Param File] | Liu et al., 2016 | |
LocationNet | MultimediaCommons | Param File | Weyand et al., 2016 | @jychoi84 @kevinli7 |
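Once a model definition and its parameter file from the table above have been downloaded, they can be loaded roughly as follows. This is a sketch under stated assumptions: the prefix `vgg16` and epoch `0` are hypothetical placeholders, and the files are assumed to follow MXNet's checkpoint naming (`<prefix>-symbol.json` and `<prefix>-<epoch as 4 digits>.params`). An individual entry may use different names, so check its Readme.

```python
def checkpoint_paths(prefix, epoch):
    """File names used by MXNet's checkpoint convention for (prefix, epoch)."""
    return "%s-symbol.json" % prefix, "%s-%04d.params" % (prefix, epoch)


def load_pretrained(prefix, epoch):
    """Load a network symbol and its parameters; requires the mxnet package."""
    import mxnet as mx  # imported lazily so checkpoint_paths works without MXNet

    # load_checkpoint returns (symbol, arg_params, aux_params).
    return mx.model.load_checkpoint(prefix, epoch)


# "vgg16" and 0 are placeholder values, not names taken from the table.
print(checkpoint_paths("vgg16", 0))
```

`checkpoint_paths` only shows which files `load_pretrained` expects to find next to each other. The returned symbol and parameter dictionaries can then be bound to a module for inference or fine-tuning.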
Recurrent Neural Networks (RNNs) including LSTMs
MXNet supports many types of recurrent neural networks (RNNs), including Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) networks. Some available datasets include:
- Sherlock Holmes: Text corpus with ~1 million words. The task is predicting downstream words/characters.
- Penn Treebank (PTB): Text corpus with ~1 million words. Vocabulary is limited to 10,000 words. The task is predicting downstream words/characters.
- Shakespeare: Complete text from Shakespeare’s works.
- IMDB reviews: 25,000 movie reviews, labeled as positive or negative.
- Facebook bAbI: A set of 20 question-answering tasks, each with 1,000 training examples.
- Flickr8k, COCO: Images with associated captions (sentences). Flickr8k consists of 8,092 images with ~40,000 captions written by Amazon Mechanical Turk workers. COCO has 328,000 images, each with 5 captions. The COCO images also come with segmented, labeled objects.
Model Definition | Dataset | Model Weights | Research Basis | Contributors |
---|---|---|---|---|
LSTM - Image Captioning | Flickr8k, MS COCO | | Vinyals et al., 2015 | @... |
LSTM - Q&A System | bAbI | | Weston et al., 2015 | |
LSTM - Sentiment Analysis | IMDB | | Li et al., 2015 | |
Generative Adversarial Networks (GANs)
Generative Adversarial Networks train a competing pair of neural networks: a generator network, which transforms a latent vector into content such as an image, and a discriminator network, which tries to distinguish generated content from supplied “real” training content. When properly trained, the two reach a Nash equilibrium.
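To make the equilibrium concrete, the sketch below evaluates the standard minimax value function V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))] on a few hand-picked discriminator outputs. The choice of this objective is an assumption, since the text above does not name one, and `gan_value` is a hypothetical helper, not an MXNet function. Under this objective the equilibrium is the point where the discriminator outputs 0.5 for every sample, which gives a value of -log 4.

```python
import math


def gan_value(d_real, d_fake):
    """Sample estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs in (0, 1) on real samples.
    d_fake: discriminator outputs in (0, 1) on generated samples.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# At equilibrium the discriminator cannot tell the two apart and outputs 0.5.
at_equilibrium = gan_value([0.5, 0.5], [0.5, 0.5])
print(at_equilibrium, -math.log(4))

# A discriminator that still separates real from generated samples scores higher.
print(gan_value([0.9, 0.8], [0.1, 0.2]))
```

The discriminator is trained to raise this value and the generator to lower it, so a value well above -log 4 indicates that the generator's output is still easy to tell from real data.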
Model Definition | Dataset | Model Weights | Research Basis | Contributors |
---|---|---|---|---|
DCGANs | ImageNet | | Radford et al., 2016 | @... |
Text to Image Synthesis | MS COCO | | Reed et al., 2016 | |
Deep Jazz | | | Deepjazz.io | |
Other Models
MXNet supports a variety of model types beyond the canonical CNN and LSTM models, including deep reinforcement learning and linear models. Some available datasets and sources include:
- Google News: A text corpus with a vocabulary of 3 million words, prepared for word2vec.
- MovieLens 20M Dataset: 20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users. Includes tag genome data with 12 million relevance scores across 1,100 tags.
- Atari Video Game Emulator: Stella is a multi-platform Atari 2600 VCS emulator released under the GNU General Public License (GPL).
Model Definition | Dataset | Model Weights | Research Basis | Contributors |
---|---|---|---|---|
Word2Vec | Google News | | Mikolov et al., 2013 | @... |
Matrix Factorization | MovieLens 20M | | Huang et al., 2013 | |
Deep Q-Network | Atari video games | | Mnih et al., 2015 | |
Asynchronous advantage actor-critic (A3C) | Atari video games | | Mnih et al., 2016 | |