Datasets

These datasets can be used for benchmarking deep learning algorithms:

Symbolic Music Datasets

Piano-midi.de: classical piano pieces (http://www.piano-midi.de/)
Nottingham: over 1000 folk tunes (http://abc.sourceforge.net/NMD/)
MuseData: electronic library of classical music scores (http://musedata.stanford.edu/)
JSB Chorales: set of four-part harmonized chorales (http://www.jsbchorales.net/index.shtml)

Natural Images

MNIST: handwritten digits (http://yann.lecun.com/exdb/mnist/)
NIST: similar to MNIST, but larger
Perturbed NIST: a dataset developed in Yoshua Bengio’s class (NIST with many added deformations)
CIFAR10 / CIFAR100: 32×32 natural image dataset with 10/100 categories (http://www.cs.utoronto.ca/~kriz/cifar.html)
Caltech 101: pictures of objects belonging to 101 categories (http://www.vision.caltech.edu/Image_Datasets/Caltech101/)
Caltech 256: pictures of objects belonging to 256 categories (http://www.vision.caltech.edu/Image_Datasets/Caltech256/)
Caltech Silhouettes: 28×28 binary images containing silhouettes of objects from the Caltech 101 dataset
STL-10: an image recognition dataset for developing unsupervised feature learning, deep learning, and self-taught learning algorithms. It is inspired by CIFAR-10 but uses larger 96×96 images, fewer labelled training examples per class, and a large set of unlabelled images (http://www.stanford.edu/~acoates//stl10/)
SVHN: The Street View House Numbers dataset (http://ufldl.stanford.edu/housenumbers/)
NORB: binocular images of toy figurines under various illumination and pose (http://www.cs.nyu.edu/~ylclab/data/norb-v1.0/)
ImageNet: image database organized according to the WordNet hierarchy (http://www.image-net.org/)
Pascal VOC: various object recognition challenges (http://pascallin.ecs.soton.ac.uk/challenges/VOC/)
LabelMe: a large dataset of annotated images (http://labelme.csail.mit.edu/Release3.0/browserTools/php/dataset.php)
COIL-20: 20 objects, each imaged at 5-degree intervals over a full 360° rotation (http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php)
COIL-100: 100 objects, each imaged at 5-degree intervals over a full 360° rotation (http://www1.cs.columbia.edu/CAVE/software/softlib/coil-100.php)

Artificial Datasets

Arcade Universe: an artificial dataset generator producing images that contain arcade-game sprites such as Tetris pentomino/tetromino objects. It is based on O. Breleux’s bugland dataset generator.
A collection of datasets inspired by the ideas from BabyAISchool:
- BabyAIShapesDatasets: distinguishing between 3 simple shapes
- BabyAIImageAndQuestionDatasets: a question-image-answer dataset
Datasets generated for the purpose of an empirical evaluation of deep architectures (DeepVsShallowComparisonICML2007):
- MnistVariations: introducing controlled variations in MNIST
- RectanglesData: discriminating between wide and tall rectangles
- ConvexNonConvex: discriminating between convex and nonconvex shapes
- BackgroundCorrelation: controlling the degree of correlation in noisy MNIST backgrounds
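To make the RectanglesData task concrete, here is a minimal sketch of a wide-vs-tall rectangle generator. It assumes 28×28 binary images (the MNIST size), a rectangle drawn as a one-pixel outline, and label 1 for wide and 0 for tall; the names `make_rectangle` and `make_dataset` and these choices are illustrative assumptions, not the procedure used to build the original dataset.

```python
import random

SIZE = 28  # assumed image side length, matching MNIST


def make_rectangle(rng: random.Random) -> tuple[list[list[int]], int]:
    """Draw one rectangle outline on a blank SIZE x SIZE binary image.

    Returns (image, label): label is 1 if the rectangle is wider than
    it is tall, 0 if it is taller than it is wide. Width and height are
    forced to differ so that every image has an unambiguous label.
    """
    while True:
        width = rng.randint(3, SIZE)   # randint bounds are inclusive
        height = rng.randint(3, SIZE)
        if width != height:
            break
    # Choose a top-left corner that keeps the whole rectangle in the image.
    top = rng.randint(0, SIZE - height)
    left = rng.randint(0, SIZE - width)
    bottom, right = top + height - 1, left + width - 1

    image = [[0] * SIZE for _ in range(SIZE)]
    for col in range(left, right + 1):   # top and bottom edges
        image[top][col] = 1
        image[bottom][col] = 1
    for row in range(top, bottom + 1):   # left and right edges
        image[row][left] = 1
        image[row][right] = 1
    return image, int(width > height)


def make_dataset(n: int, seed: int = 0) -> list[tuple[list[list[int]], int]]:
    """Generate n labelled rectangle images from a seeded RNG."""
    rng = random.Random(seed)
    return [make_rectangle(rng) for _ in range(n)]


if __name__ == "__main__":
    data = make_dataset(1000)
    wide = sum(label for _, label in data)
    print(f"{wide} wide / {len(data) - wide} tall")
```

Because the label depends only on the aspect ratio and not on position or size, a classifier has to capture a global shape property, which is the point of this kind of synthetic benchmark.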

Faces

Labelled Faces in the Wild: 13,000 images of faces collected from the web, labelled with the name of the person pictured (http://vis-www.cs.umass.edu/lfw/)
Toronto Face Dataset
Olivetti: 400 face images, 10 each of 40 different people (http://www.cs.nyu.edu/~roweis/data.html)
Multi-Pie: The CMU Multi-PIE Face Database (http://www.multipie.org/)
Face-in-Action (http://www.flintbox.com/public/project/5486/)
JACFEE: Japanese and Caucasian Facial Expressions of Emotion (http://www.humintell.com/jacfee/)
FERET: The Facial Recognition Technology Database (http://www.itl.nist.gov/iad/humanid/feret/feret_master.html)
mmifacedb: MMI Facial Expression Database (http://www.mmifacedb.com/)
IndianFaceDatabase (http://vis-www.cs.umass.edu/~vidit/IndianFaceDatabase/)
The Yale Face Database (http://vision.ucsd.edu/content/yale-face-database)
The Yale Face Database B (http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB.html)

Text

20 newsgroups: classification task, mapping word occurrences to newsgroup ID (http://qwone.com/~jason/20Newsgroups/)
Reuters (RCV*) Corpora: text/topic prediction (http://about.reuters.com/researchandstandards/corpus/)
Penn Treebank: used for next word prediction or next character prediction (http://www.cis.upenn.edu/~treebank/)
Broadcast News: large text dataset, classically used for next word prediction (http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC97S44)
Wikipedia Dataset
Multidomain sentiment analysis dataset (http://www.cs.jhu.edu/~mdredze/datasets/sentiment/)
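Next-word and next-character prediction, as mentioned for Penn Treebank and Broadcast News, both come down to turning a token sequence into (context, next token) training pairs. The sketch below assumes plain whitespace tokenization for words and a fixed-length context window; the helper `next_token_pairs` and the window sizes are illustrative, not part of any corpus's official preprocessing.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def next_token_pairs(tokens: Sequence[T], context: int) -> list[tuple[tuple[T, ...], T]]:
    """Slide a window of `context` tokens over the sequence and pair
    each window with the token that immediately follows it."""
    return [
        (tuple(tokens[i : i + context]), tokens[i + context])
        for i in range(len(tokens) - context)
    ]


text = "the cat sat on the mat"

# Word level: whitespace tokenization (an assumption; real corpora need more care).
word_pairs = next_token_pairs(text.split(), context=2)

# Character level: every character, including spaces, is a token.
char_pairs = next_token_pairs(list(text), context=3)

print(word_pairs[0])  # (('the', 'cat'), 'sat')
print(char_pairs[0])  # (('t', 'h', 'e'), ' ')
```

The same function serves both granularities because it only needs an indexable sequence; the modelling choice is entirely in how the text is split into tokens.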

Speech

TIMIT Speech Corpus: phoneme classification (http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S1)
Aurora: TIMIT with noise and additional information

Recommendation Systems

MovieLens: two datasets available from http://www.grouplens.org. The first has 100,000 ratings for 1682 movies by 943 users, subdivided into five disjoint subsets. The second has about 1 million ratings for 3900 movies by 6040 users.
Jester: This dataset contains 4.1 million continuous ratings (-10.00 to +10.00) of 100 jokes from 73,421 users.
Netflix Prize: Netflix released an anonymised version of its movie rating dataset, consisting of 100 million ratings made by 480,000 users, each of whom rated between 1 and all of the 17,770 movies.
Book-Crossing dataset: This dataset is from the Book-Crossing community, and contains 278,858 users providing 1,149,780 ratings about 271,379 books.
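The counts above determine how sparse each user-item rating matrix is: density is the number of observed ratings divided by users × items. The short sketch below applies that formula to the figures quoted in this list, taking the approximate Netflix numbers at face value; the `DATASETS` dictionary and `density` helper are illustrative names only.

```python
# (ratings, users, items) as quoted in the list above.
DATASETS = {
    "MovieLens 100K": (100_000, 943, 1_682),
    "Jester": (4_100_000, 73_421, 100),
    "Netflix Prize": (100_000_000, 480_000, 17_770),
    "Book-Crossing": (1_149_780, 278_858, 271_379),
}


def density(ratings: int, users: int, items: int) -> float:
    """Fraction of the users x items matrix that holds an observed rating."""
    return ratings / (users * items)


for name, counts in DATASETS.items():
    print(f"{name:15s} {density(*counts):.4%}")
```

On these figures MovieLens 100K is roughly 6.3% dense and Netflix roughly 1.2%, while Jester is much denser (about 56%) because it has only 100 items, and Book-Crossing falls below 0.002%. Sparsity this different is one reason results on one of these datasets do not always transfer to another.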

Misc

“Musk” dataset
CMU Motion Capture Database (http://mocap.cs.cmu.edu/)
Brodatz dataset: texture modeling (http://www.ux.uis.no/~tranden/brodatz.html)
Million Song dataset (http://labrosa.ee.columbia.edu/millionsong/)
Merck Molecular Activity Challenge (http://www.kaggle.com/c/MerckActivity/data)

from: http://deeplearning.net/datasets/
