Open Data for Deep Learning

Here you’ll find an organized list of interesting, high-quality datasets for machine learning research. We welcome contributions to help curate this list! Other lists of such datasets are available elsewhere, for example on Wikipedia.

Recent Additions

Open Source Biometric Recognition Data
Google AudioSet: An expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos.
Uber 2B trip data: Slow rollout of access to ride data for 2 billion trips.

Natural-Image Datasets

MNIST: handwritten digits: The most commonly used sanity check. A dataset of 28x28, centered, grayscale handwritten digits. It is an easy task, so a method that works on MNIST will not necessarily work on harder problems.
CIFAR10 / CIFAR100: 32x32 color images with 10 / 100 categories. No longer commonly used, but, like MNIST, it can be an interesting sanity check.
Caltech 101: Pictures of objects belonging to 101 categor…