Image Datasets

Chest X-Ray Images Pneumonia

Chest X-Ray Images (Pneumonia)

This dataset contains chest X-Ray images of patients with pneumonia alongside X-Ray images of patients in normal condition. For more information, please refer to https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia/home. Separate data files for training, testing and validation can be downloaded from Kaggle at https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia.

Here is some information regarding this dataset:

  • Number of images in the dataset: 5863 images (5216 images for training, 624 images for test and 16 images for validation)

  • Number of classes: 2 (Normal or Pneumonia)

  • Image resolution: varies from image to image
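To check the split sizes listed above after downloading, a small script can count the image files per split and class. This is a minimal sketch, assuming the archive unpacks into one folder per split (for example train, test and val), each with one subfolder per class; that folder layout and the accepted file suffixes are assumptions, not details taken from this page.

```python
from collections import Counter
from pathlib import Path

# Assumed image suffixes; adjust to match the downloaded files.
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def count_images(root):
    """Count image files per (split, class) under root/<split>/<class>/."""
    counts = Counter()
    for path in Path(root).glob("*/*/*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            split, label = path.parts[-3], path.parts[-2]
            counts[(split, label)] += 1
    return counts
```

Calling count_images on the unpacked folder returns a Counter keyed by (split, class), which can be compared against the numbers above.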

If you use this dataset:

Please make sure to read the license carefully; it is available at https://creativecommons.org/licenses/by/4.0/.

Please make sure to cite the paper:

D. S. Kermany, M. Goldbaum, W. Cai, et al. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell, 2018.

keywords: Vision, Image, Biology and Health, X-Ray, Classification

HAM10000

HAM10000:

This dataset contains 10015 dermatoscopic images of pigmented lesions from patients, in 7 diagnostic categories. For more than half of the subjects, the diagnosis was confirmed through histopathology, and for the rest of the patients through follow-up examinations, expert consensus, or in-vivo confocal microscopy. More information about the dataset, the diagnostic categories, features and patient conditions, as well as the links to download the dataset, can be found on either Harvard Dataverse https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DBW86T or Kaggle https://www.kaggle.com/kmader/skin-cancer-mnist-ham10000/home. This dataset is for non-commercial use only.

Here is some information regarding the dataset:

  • Number of images: 10015 dermatoscopic images

  • Number of categories: 7 diagnostic categories of pigmented lesions
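With 7 diagnostic categories, it is usually worth checking how many images fall into each one before training. The sketch below counts labels from a metadata CSV. The column name dx and the label abbreviations in the sample are assumptions made for illustration; check the header of the downloaded metadata file before relying on them.

```python
import csv
import io
from collections import Counter


def diagnosis_counts(csv_file, column="dx"):
    """Return a Counter of diagnostic labels read from an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Tiny in-memory sample standing in for the real metadata file (made-up rows).
sample = io.StringIO(
    "image_id,dx\n"
    "img_001,nv\n"
    "img_002,mel\n"
    "img_003,nv\n"
)
counts = diagnosis_counts(sample)
print(counts.most_common())
```

For the real file, pass an open file handle instead of the in-memory sample.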

If you use this dataset:

Make sure to read the Terms of Use carefully; they are available on the same page and must be accepted before downloading the data files. This dataset is for non-commercial use only.

Make sure to cite the dataset:

Tschandl, Philipp, 2018, The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions, https://doi.org/10.7910/DVN/DBW86T, Harvard Dataverse, V1, UNF:6:IQTf5Cb+3EzwZ95U5r0hnQ== [fileUNF]

keywords: Vision, Image, Biology and Health, Classification

CBIS-DDSM

CBIS-DDSM: Curated Breast Imaging Subset of DDSM:

This dataset contains screening mammography images and is a subset of the DDSM dataset (Digital Database for Screening Mammography http://marathon.csee.usf.edu/Mammography/Database.html). CBIS-DDSM contains images of cases in three categories (normal, benign, and malignant). The dataset also includes ROI segmentations, bounding boxes and pathologic diagnoses for the training data. This dataset can be downloaded from the Data Access section at https://wiki.cancerimagingarchive.net/display/Public/CBIS-DDSM#97542eefbc8e4234a95231cbcd86cb1d.

Here is some information regarding this dataset:

  • Number of images in the dataset: 10,239

  • Number of subjects: 6671

  • Total Images Size in GB: 163.6
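The description above mentions both ROI segmentations and bounding boxes. The sketch below illustrates how the two relate by deriving the tightest box around the nonzero pixels of a binary mask. It is an illustration only, assuming the mask has already been loaded as a list of rows of 0/1 values; it does not read the dataset's own file format, and the function name is hypothetical.

```python
def mask_to_bbox(mask):
    """Return (row_min, col_min, row_max, col_max) of nonzero pixels, or None."""
    rows = [r for r, line in enumerate(mask) if any(line)]
    if not rows:
        return None  # empty mask: no region of interest
    cols = [c for line in mask for c, value in enumerate(line) if value]
    return rows[0], min(cols), rows[-1], max(cols)


# Small made-up mask with a three-pixel region.
example_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
print(mask_to_bbox(example_mask))
```

The returned coordinates are inclusive on both ends.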

If you use this dataset:

Make sure to cite these papers:

R. S. Lee, F. Gimenez, A. Hoogi, D. Rubin. Curated Breast Imaging Subset of DDSM. The Cancer Imaging Archive, 2016.

R. S. Lee, F. Gimenez, A. Hoogi, K. K. Miyake, M. Gorovoy, D. L. Rubin. A Curated Mammography Data set for Use in Computer-aided Detection and Diagnosis Research. Scientific Data Volume 4, Article number: 170177, 2017.

K. Clark, B. Vendt, K. Smith, J. Freymann, J. Kirby, P. Koppel, S. Moore, S. Phillips, D. Maffitt, M. Pringle, L. Tarbox, F. Prior. The Cancer Imaging Archive(TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, 2013.

Make sure to follow the Policy and Terms of Use available on https://creativecommons.org/licenses/by/3.0/ and https://wiki.cancerimagingarchive.net/display/Public/Data+Usage+Policies+and+Restrictions.

keywords: Vision, Image, Biology and Health, CT, Classification, Cancer

NLST: National Lung Screening Trial

NLST: National Lung Screening Trial:

This dataset contains screening images of patients suffering from lung cancer, collected during a controlled clinical trial. The participants were followed for about 6.5 years and were randomly assigned to one of two groups, receiving either low-dose helical CT screening or single-view chest radiography. The dataset is not public, and a research proposal is required to gain access and download it. For more information about the research details, or to request access to the dataset, please refer to https://wiki.cancerimagingarchive.net/display/NLST/National+Lung+Screening+Trial#4c242d6186bf4aff949bb62cb2ab60da or https://biometry.nci.nih.gov/cdas/learn/nlst/images/. Additionally, a detailed description of the dataset participants, CT screening and abnormalities, X-Ray screening and abnormalities, diagnostic procedures, treatment, cause of death and much other useful information is available at https://biometry.nci.nih.gov/cdas/datasets/nlst/.

Here is some information regarding this dataset:

  • Number of images in the dataset: 21,082,502

  • Number of subjects: 26,254

  • Total Images Size in TB: 11.3

If you use this dataset:

Make sure to provide proper citations according to the Citations & Data Usage Policy available on the same page provided above.

Make sure to follow the Policy and Terms of Use even after receiving access to use the dataset for your own research; they are available at https://wiki.cancerimagingarchive.net/display/Public/Data+Usage+Policies+and+Restrictions.

keywords: Vision, Image, Biology and Health, CT, Classification, Cancer

Human Protein Atlas Image

Human Protein Atlas Image:

This dataset contains protein images of the human body, available from the Human Protein Atlas Image Classification Competition on Kaggle or from the Human Protein Atlas page https://www.proteinatlas.org/cell. The dataset may be used for the Kaggle competition, for research and education, and for other non-commercial purposes. Please refer to the competition rules on Kaggle for more information about the Terms of Use and the rules regarding the dataset: https://www.kaggle.com/c/human-protein-atlas-image-classification/rules.

Here is some information regarding this dataset:

  • Number of classes: 28 categories as integers from 0 to 27, each referring to a human protein.

  • Separate data files are available for training and testing in three resolutions: 512×512 PNG, 2048×2048 TIFF, 3072×3072 TIFF
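Since the classes are integers from 0 to 27, a common preprocessing step is to turn each image's labels into a fixed-length vector. The sketch below assumes that an image may carry several labels and that they arrive as a space-separated string such as "16 0"; both the string format and the function name are assumptions for illustration.

```python
NUM_CLASSES = 28  # labels are integers 0..27, as listed above


def encode_labels(target, num_classes=NUM_CLASSES):
    """Turn a space-separated label string such as "16 0" into a multi-hot list."""
    vector = [0] * num_classes
    for token in target.split():
        label = int(token)
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
        vector[label] = 1
    return vector


encoded = encode_labels("16 0")
print(sum(encoded), encoded[0], encoded[16])
```

Out-of-range labels raise an error rather than being silently dropped, which helps catch a mismatched label file early.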

If you use this dataset:

Make sure to use the dataset for non-commercial purposes only.

keywords: Vision, Image, Biology and Health, Classification, Protein, Cell, Object Detection

COIL-100

COIL-100:

This dataset contains color images of objects taken at 5-degree intervals over a 360-degree rotation. The dataset was collected by the Center for Research on Intelligent Systems at the Department of Computer Science, Columbia University. This dataset was used in a real-time image recognition study.

Here is some information regarding this dataset:

  • Number of images in the dataset: 7200 images

  • Number of classes: 100 object categories each with 72 poses

  • Image resolution: 128×128
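The figures above are consistent with each other: one pose every 5 degrees gives 72 poses per object, and 100 objects with 72 poses each give 7200 images. The sketch below reproduces that arithmetic and recovers the object id and angle from a file name. The file name pattern obj<id>__<angle>.png is an assumption about the download and should be checked against the actual files.

```python
import re

NUM_OBJECTS = 100
STEP_DEGREES = 5
# Assumed naming scheme, e.g. "obj12__355.png"; verify against the real files.
FILENAME_PATTERN = re.compile(r"^obj(\d+)__(\d+)\.png$")


def pose_angles(step=STEP_DEGREES):
    """List the rotation angles covered in one full turn."""
    return list(range(0, 360, step))


def parse_filename(name):
    """Return (object_id, angle_in_degrees) parsed from a file name."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return int(match.group(1)), int(match.group(2))


poses = pose_angles()
print(len(poses), NUM_OBJECTS * len(poses))  # prints "72 7200"
```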

More information can be found in the technical report cited below, or on the Kaggle page https://www.kaggle.com/jessicali9530/coil100/home.

The main page for the dataset can be found on http://www1.cs.columbia.edu/CAVE/software/softlib/coil-100.php.

If you use this dataset:

Please make sure to use the dataset for non-commercial research purposes only (Terms of Use).

Please refer to the technical report below and cite it:

S. A. Nene, S. K. Nayar and H. Murase, Columbia Object Image Library (COIL-100), Technical Report CUCS-006-96, February 1996.

keywords: Vision, Image, Classification, Object Detection, Rotation

WIDER FACE

WIDER FACE:

This dataset, a subset of the WIDER dataset, contains labeled face images with varying poses and scales in different situations such as marching or handshaking. Separate download links are available on the dataset page for training, validation and testing, which were formed by randomly selecting 40%, 10% and 50% of the data respectively. Evaluation and testing results are available for comparison in the results section of the page. This information and the download links can be found at http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/index.html.

Here is some information regarding this dataset:

  • Number of images in the dataset: 32,203 images

  • Number of labeled faces: 393,703

  • Image resolution: 1024×754
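The 40%, 10% and 50% proportions mentioned above can be illustrated with a simple random split. This is only a sketch of the idea: the official splits are provided as separate downloads and should be used as published, and the function name, the fixed seed and the rounding rule here are choices made for the example.

```python
import random


def split_dataset(items, fractions=(0.4, 0.1, 0.5), seed=0):
    """Shuffle items and cut them into train/validation/test by the given fractions."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_train = round(len(items) * fractions[0])
    n_val = round(len(items) * fractions[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # prints "40 10 50"
```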

If you use this dataset:

Please make sure to cite the paper:

S. Yang, P. Luo, C. C. Loy, X. Tang, WIDER FACE: A Face Detection Benchmark. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

keywords: Vision, Image, Face, Face Detection, Classification, Event

CelebA: Large-scale CelebFaces Attributes

CelebA: Large-scale CelebFaces Attributes:

This dataset contains color face images with 40 attribute annotations for each image. The dataset can be used for different computer vision tasks, including face detection, face attribute recognition and landmark or facial part localization. More information about the dataset and the download links can be found on the dataset page http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html or the Kaggle page https://www.kaggle.com/jessicali9530/celeba-dataset/version/2.

Here is some information regarding this dataset:

  • Number of images in the dataset: 202,599 images

  • Number of identities: 10,177 subjects

  • Image resolution: 178×218
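The 40 attribute annotations per image are typically consumed as a table of boolean flags. The sketch below parses a plain-text attribute listing under an assumed layout: a first line with the image count, a second line with the attribute names, then one row per image with a file name followed by 1 or -1 for each attribute. That layout and the three attribute names in the sample are assumptions for illustration; the real file should be inspected first.

```python
def parse_attributes(lines):
    """Parse a count line, a header line, then one 'filename v1 v2 ...' row per image."""
    lines = [line.strip() for line in lines if line.strip()]
    names = lines[1].split()
    table = {}
    for row in lines[2:]:
        filename, *values = row.split()
        if len(values) != len(names):
            raise ValueError(f"{filename}: expected {len(names)} values, got {len(values)}")
        # 1 marks an attribute as present, -1 as absent (assumed encoding).
        table[filename] = {name: value == "1" for name, value in zip(names, values)}
    return names, table


# Made-up sample with three attributes instead of the full 40.
sample_lines = [
    "2",
    "Eyeglasses Male Smiling",
    "000001.jpg -1 -1  1",
    "000002.jpg  1  1 -1",
]
names, table = parse_attributes(sample_lines)
print(table["000001.jpg"]["Smiling"])
```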

If you use this dataset:

Please make sure to use the dataset for non-commercial research purposes only (Terms of Use).

Please make sure to cite the paper:

S. Yang, P. Luo, C. C. Loy, X. Tang, From Facial Parts Responses to Face Detection: A Deep Learning Approach. IEEE International Conference on Computer Vision (ICCV), 2015.

keywords: Vision, Image, Face, Celeb Faces, Face Recognition

VGG & VGG2

VGG & VGG2:

These two face recognition datasets contain color face images of celebrities collected from the web. The images are available with large variation of poses and ages for both datasets.

VGG

VGG has no overlap with some other popular benchmarks such as LFW. Because the images are subject to copyright and VGG does not own them, VGG provides only the URLs of the images. More information and download links can be found at http://www.robots.ox.ac.uk/~vgg/data/vgg_face/. The dataset consists of one text file per celebrity, named after the celebrity, containing the image URLs and the corresponding face detections.

Here is some information regarding the VGG dataset:

  • Number of identities: 2622
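Because only URLs and face detections are distributed, a loader has to parse the per-celebrity text files before fetching any images. The sketch below parses one line under an assumed field order: an image id, the URL, then the left, top, right and bottom coordinates of the face box, with any further fields ignored. The field order and the names FaceRecord and parse_line are assumptions for illustration; the dataset's README should be consulted for the actual layout.

```python
from typing import NamedTuple, Tuple


class FaceRecord(NamedTuple):
    image_id: str
    url: str
    box: Tuple[float, float, float, float]  # (left, top, right, bottom), assumed order


def parse_line(line):
    """Parse one whitespace-separated line into a FaceRecord."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 fields, got {len(fields)}")
    left, top, right, bottom = (float(value) for value in fields[2:6])
    return FaceRecord(fields[0], fields[1], (left, top, right, bottom))


# Made-up line with a placeholder URL.
record = parse_line("00000001 http://example.com/a.jpg 10.5 20.0 110.5 140.0 3.0 4.1 1")
print(record.url, record.box)
```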

If you use this dataset:

Please make sure to use the dataset for non-commercial research purposes only (Terms of Use). The detailed Terms of Use can be found at http://www.robots.ox.ac.uk/~vgg/data/vgg_face/licence.txt.

Please make sure to cite the paper:

O. M. Parkhi, A. Vedaldi, A. Zisserman, Deep Face Recognition. British Machine Vision Conference, 2015.

VGG2

VGG2 provides loosely cropped faces in separate files for training and testing. More information and download links can be found at http://www.robots.ox.ac.uk/~vgg/data/vgg_face2/data_infor.html. You will need to create an account to download the files.

Here is some information regarding the VGG2 dataset:

  • Number of identities: 9131 (8631 identities for training, 500 identities for testing)

  • More than 3.3 million images in the wild

  • About 362 image samples per person on average

If you use this dataset:

Please make sure to pay attention to the License information for using the dataset for Commercial/Research purposes (Terms of Use) available on http://www.robots.ox.ac.uk/~vgg/data/vgg_face2/.

Please make sure to cite the paper:

Q. Cao, L. Shen, W. Xie, O. M. Parkhi, A. Zisserman, VGGFace2: A Dataset for Recognizing Face across Pose and Age. International Conference on Automatic Face and Gesture Recognition, 2018.

keywords: Vision, Image, Face, Face Verification, In the Wild

AgeDB

AgeDB:

This dataset contains face images of celebrities, politicians and scientists at different ages and in different poses. The annotations per image include the gender, age and identity of the person in the image. Ages range from 3 to 101 years. In the paper cited below, the authors used AgeDB for several experiments, including age estimation, age-invariant face verification and face age progression. The download link can be found at https://ibug.doc.ic.ac.uk/resources/agedb/.

Here is some information regarding this dataset:

  • Number of images in the dataset: 12,240 images

  • Number of identities: 440 subjects
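The per-image annotations (gender, age and identity) can be collected into records for tasks such as age estimation. The sketch below assumes they are encoded in each file name as index_identity_age_gender, for example 0_JaneDoe_35_f.jpg. That naming scheme, the sample name and the function name are assumptions for illustration and should be checked against the downloaded files.

```python
from pathlib import Path


def parse_agedb_name(filename):
    """Split an assumed 'index_identity_age_gender.ext' file name into a record."""
    parts = Path(filename).stem.split("_")
    if len(parts) != 4:
        raise ValueError(f"unexpected file name: {filename!r}")
    index, identity, age, gender = parts
    return {"index": int(index), "identity": identity, "age": int(age), "gender": gender}


# Hypothetical file name, not taken from the dataset.
record = parse_agedb_name("0_JaneDoe_35_f.jpg")
print(record["identity"], record["age"], record["gender"])
```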

If you use this dataset:

Please make sure to use the dataset for non-commercial research purposes only (Terms of Use). The detailed Terms of Use can be found on https://ibug.doc.ic.ac.uk/resources/agedb/.

Please make sure to cite the paper:

S. Moschoglou, A. Papaioannou, C. Sagonas, J. Deng, I. Kotsia, S. Zafeiriou. AgeDB: The First Manually Collected, In-the-wild Age Dataset. Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition (CVPR-W), 2017.

keywords: Vision, Image, Face, Age Estimation, Face Verification, Celeb Faces