Showing posts from June, 2017

Open Data for Deep Learning

https://deeplearning4j.org/opendata#open-data-for-deep-learning

Here you’ll find an organized list of interesting, high-quality datasets for machine learning research. We welcome your contributions for curating this list! You can find other lists of such datasets on Wikipedia, for example.

Recent Additions

Open Source Biometric Recognition Data
Google Audioset: An expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos.
Uber 2B trip data: Slow rollout of access to ride data for 2 billion trips.

Natural-Image Datasets

MNIST: handwritten digits: The most commonly used sanity check. Dataset of 28x28, centered, B&W handwritten digits. It is an easy task, so the fact that something works on MNIST doesn’t mean it works in general.
CIFAR10 / CIFAR100: 32x32 color images with 10 / 100 categories. Not commonly used anymore, though once again, can be an interesting sanity check.
Caltech 101: Pictures of ob