Posts Tagged ‘Classification’

Chest X-Ray Images Pneumonia

Chest X-Ray Images (Pneumonia)

This dataset contains chest X-ray images of patients with pneumonia alongside chest X-ray images of normal cases. For more information, please refer to https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia/home. Separate data files for training, testing, and validation can be downloaded from Kaggle at https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia.

Here is some information regarding this dataset:

  • Number of images in the dataset: 5863 images (5216 images for training, 624 images for test and 16 images for validation)

  • Number of classes: 2 (Normal or Pneumonia)

  • Image resolution: varies from image to image
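The per-split counts listed above add up to 5856 rather than 5863, so it can be worth recounting the files after download. The following is a minimal sketch, assuming the archive unpacks into a root/split/class/ folder layout (for example train/NORMAL and train/PNEUMONIA); that layout, the folder names, and the accepted file extensions are assumptions and should be checked against the files you actually receive.

```python
from collections import Counter
from pathlib import Path

# Assumed image extensions; adjust if the downloaded files differ.
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def count_images(root):
    """Count image files per (split, class) under root/<split>/<class>/."""
    counts = Counter()
    for path in Path(root).glob("*/*/*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            split, label = path.parts[-3], path.parts[-2]
            counts[(split, label)] += 1
    return counts
```

Calling count_images on the unpacked folder returns a Counter whose values can be summed per split and compared with the figures above.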

If you use this dataset:

Please make sure to read the license carefully; it is available at https://creativecommons.org/licenses/by/4.0/.

Please make sure to cite the paper:

D. S. Kermany, M. Goldbaum, W. Cai, et al. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell, 2018.

keywords: Vision, Image, Biology and Health, X-Ray, Classification

HAM10000

HAM10000:

This dataset contains 10015 dermatoscopic images of pigmented lesions from patients in 7 diagnostic categories. For more than half of the subjects, the diagnosis was confirmed through histopathology; for the rest of the patients, it was confirmed through follow-up examinations, expert consensus, or in-vivo confocal microscopy. More information about the dataset, the diagnostic categories, features, and patient conditions, as well as the download links, can be found on Harvard Dataverse https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DBW86T or on Kaggle https://www.kaggle.com/kmader/skin-cancer-mnist-ham10000/home. This dataset is for non-commercial use only.

Here is some information regarding the dataset:

  • Number of images in the dataset: 10015 dermatoscopic images

  • Number of classes: 7 diagnostic categories of pigmented lesions
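To see how the images are distributed over the 7 diagnostic categories, the labels can be tallied from a metadata table. The sketch below assumes a CSV file with one row per image and a diagnosis column named dx; the column name and the sample label values are assumptions for illustration, so check them against the metadata file that ships with your download.

```python
import csv
import io
from collections import Counter


def count_by_diagnosis(csv_file, column="dx"):
    """Count rows per diagnostic category in an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Tiny in-memory sample with hypothetical label values, for illustration only.
sample = io.StringIO("image_id,dx\nimg_1,nv\nimg_2,mel\nimg_3,nv\n")
print(count_by_diagnosis(sample))
```

With the real metadata file, pass an open file handle instead of the in-memory sample.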

If you use this dataset:

Make sure to read the Terms of Use carefully. They are available on the same page and must be accepted before downloading the data files. This dataset is for non-commercial use only.

Make sure to cite the dataset:

Tschandl, Philipp, 2018, The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions, https://doi.org/10.7910/DVN/DBW86T, Harvard Dataverse, V1, UNF:6:IQTf5Cb+3EzwZ95U5r0hnQ== [fileUNF]

keywords: Vision, Image, Biology and Health, Classification

CBIS-DDSM

CBIS-DDSM: Curated Breast Imaging Subset of DDSM:

This dataset contains screening mammography images and is a subset of the DDSM dataset (Digital Database for Screening Mammography http://marathon.csee.usf.edu/Mammography/Database.html). CBIS-DDSM contains images of cases in three categories (normal, benign, and malignant). The dataset also includes ROI segmentations, bounding boxes, and pathologic diagnoses for the training data. This dataset can be downloaded from the Data Access section on https://wiki.cancerimagingarchive.net/display/Public/CBIS-DDSM#97542eefbc8e4234a95231cbcd86cb1d.

Here is some information regarding this dataset:

  • Number of images in the dataset: 10,239

  • Number of subjects: 6671

  • Total Images Size in GB: 163.6

If you use this dataset:

Make sure to cite these papers:

R. S. Lee, F. Gimenez, A. Hoogi, D. Rubin. Curated Breast Imaging Subset of DDSM. The Cancer Imaging Archive, 2016.

R. S. Lee, F. Gimenez, A. Hoogi, K. K. Miyake, M. Gorovoy, D. L. Rubin. A Curated Mammography Data set for Use in Computer-aided Detection and Diagnosis Research. Scientific Data Volume 4, Article number: 170177, 2017.

K. Clark, B. Vendt, K. Smith, J. Freymann, J. Kirby, P. Koppel, S. Moore, S. Phillips, D. Maffitt, M. Pringle, L. Tarbox, F. Prior. The Cancer Imaging Archive(TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, 2013.

Make sure to follow the Policy and Terms of Use available on https://creativecommons.org/licenses/by/3.0/ and https://wiki.cancerimagingarchive.net/display/Public/Data+Usage+Policies+and+Restrictions.

keywords: Vision, Image, Biology and Health, CT, Classification, Cancer

NLST: National Lung Screening Trial

NLST: National Lung Screening Trial:

This dataset contains images from the screening tests of patients suffering from lung cancer, collected during a controlled clinical trial. Participants were followed for about 6.5 years and were randomly assigned to one of two groups, receiving either low-dose helical CT screening or single-view chest radiography. The dataset is not public, and a research proposal is required to gain access and download it. For more information about the research details, or to request access to the dataset, please refer to https://wiki.cancerimagingarchive.net/display/NLST/National+Lung+Screening+Trial#4c242d6186bf4aff949bb62cb2ab60da or https://biometry.nci.nih.gov/cdas/learn/nlst/images/. Additionally, a detailed description of the dataset participants, CT screening and abnormalities, X-ray screening and abnormalities, diagnostic procedures, treatment, cause of death, and much other useful information is available on https://biometry.nci.nih.gov/cdas/datasets/nlst/.

Here is some information regarding this dataset:

  • Number of images in the dataset: 21,082,502

  • Number of subjects: 26,254

  • Total Images Size in TB: 11.3

If you use this dataset:

Make sure to provide proper citations according to the Citations & Data Usage Policy available on the same page provided above.

Make sure to follow the Policy and Terms of Use, even after receiving access to the dataset for your own research purposes. They are available on https://wiki.cancerimagingarchive.net/display/Public/Data+Usage+Policies+and+Restrictions.

keywords: Vision, Image, Biology and Health, CT, Classification, Cancer

Human Protein Atlas Image

Human Protein Atlas Image:

This dataset contains protein images of the human body, available from the Human Protein Atlas Image Classification Competition on Kaggle or from The Human Protein Atlas page https://www.proteinatlas.org/cell. The dataset may be used for the Kaggle competition, for research and education, and for other non-commercial purposes. Please refer to the competition rules on Kaggle for more information about the Terms of Use and the rules regarding the dataset https://www.kaggle.com/c/human-protein-atlas-image-classification/rules.

Here is some information regarding this dataset:

  • Number of classes: 28 categories as integers from 0 to 27, each referring to a human protein.

  • Available separate datafiles for training and testing with three resolutions: 512×512 PNG, 2048×2048 TIFF, 3072×3072 TIFF
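Because the classes are given as integers from 0 to 27, a common first step is to turn each image's labels into a fixed-length 0/1 vector. The sketch below assumes that an image can carry several labels and that they arrive as a space-separated string such as "16 0"; both the multi-label setup and that string format are assumptions to verify against the label file you download.

```python
NUM_CLASSES = 28  # categories are integers 0..27


def encode_labels(target, num_classes=NUM_CLASSES):
    """Turn a space-separated label string into a multi-hot list of 0/1 values."""
    vector = [0] * num_classes
    for token in target.split():
        index = int(token)
        if not 0 <= index < num_classes:
            raise ValueError(f"label {index} is outside 0..{num_classes - 1}")
        vector[index] = 1
    return vector


# Hypothetical label string, for illustration only.
print(encode_labels("16 0"))
```

Rejecting out-of-range labels early makes a malformed label file fail loudly instead of silently producing a wrong target vector.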

If you use this dataset:

Make sure to use the dataset for non-commercial purposes only.

keywords: Vision, Image, Biology and Health, Classification, Protein, Cell, Object Detection

COIL-100

COIL-100:

This dataset contains color images of objects photographed at every 5 degrees over a 360-degree rotation. The dataset was collected by the Center for Research on Intelligent Systems at the Department of Computer Science, Columbia University. This dataset was used in a real-time image recognition study.

Here is some information regarding this dataset:

  • Number of images in the dataset: 7200 images

  • Number of classes: 100 object categories each with 72 poses

  • Image resolution: 128×128
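The figures above are consistent with one another: one pose every 5 degrees over a full rotation gives 72 poses per object, and 100 objects with 72 poses each give 7200 images. A short check of that arithmetic (the variable names are illustrative only):

```python
STEP_DEGREES = 5    # one pose every 5 degrees
NUM_OBJECTS = 100   # object categories

# Pose angles covering one full rotation: 0, 5, ..., 355.
pose_angles = list(range(0, 360, STEP_DEGREES))
total_images = NUM_OBJECTS * len(pose_angles)

print(len(pose_angles), total_images)
```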

More information can be found in the technical report cited below or on the Kaggle page https://www.kaggle.com/jessicali9530/coil100/home.

The main page for the dataset can be found on http://www1.cs.columbia.edu/CAVE/software/softlib/coil-100.php.

If you use this dataset:

Please make sure to use the dataset for non-commercial research purposes only (Terms of Use).

Please refer to and cite the following technical report:

S. A. Nene, S. K. Nayar and H. Murase, Columbia Object Image Library (COIL-100), Technical Report CUCS-006-96, February 1996.

keywords: Vision, Image, Classification, Object Detection, Rotation

WIDER FACE

WIDER FACE:

This dataset, which is a subset of the WIDER dataset, contains labeled face images with varied poses, scales, and situations such as marching or hand shaking. Separate download links are available on the dataset page for the training, validation, and testing sets, which were formed by randomly selecting 40%, 10%, and 50% of the whole data respectively. Evaluation and testing results are available for comparison in the results section of the page. You can find this information, as well as the download links, on http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/index.html.

Here is some information regarding this dataset:

  • Number of images in the dataset: 32,203 images

  • Number of labeled faces: 393,703

  • Image resolution: 1024×754
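The 40% / 10% / 50% random selection described above can be illustrated with a small sketch. This is not the procedure the dataset authors used; the function name, the seed, and the rounding rule are assumptions, so the resulting sizes are only approximate and need not match the official files.

```python
import random


def split_indices(n_items, fractions=(0.4, 0.1, 0.5), seed=0):
    """Randomly partition range(n_items) into train/val/test index lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = round(n_items * fractions[0])
    n_val = round(n_items * fractions[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


train, val, test = split_indices(32203)
print(len(train), len(val), len(test))
```

Assigning the remainder to the last split guarantees that every image lands in exactly one subset, whatever the rounding does.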

If you use this dataset:

Please make sure to cite the paper:

S. Yang, P. Luo, C. C. Loy, X. Tang, WIDER FACE: A Face Detection Benchmark. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

keywords: Vision, Image, Face, Face Detection, Classification, Event

SALICON

SALICON:

This image dataset, which is also a mouse-tracking dataset, was created from a subset of images from a parent dataset called MS COCO 2014 (available on http://cocodataset.org/#home), with an additional annotation type, “fixation”. The visual attention data for this dataset was collected using mouse-tracking methods. The research behind this dataset aimed to answer questions about human visual attention and decision making. In the paper cited below, the authors evaluated their mouse-tracking method by comparing its results with eye tracking.

Here is some information regarding the latest version of this dataset:

  • Number of images in the dataset: 20,000 (10,000 images for training set, 5000 images for validation, 5000 for test set)

  • Number of classes: 80

  • Image resolution: 640×480

More details and links for download can be found on the dataset page http://salicon.net/ and SALICON challenge 2017 page http://salicon.net/challenge-2017/.

You might also be interested in using the SALICON API Python package available on GitHub https://github.com/NUS-VIP/salicon-api.

If you use this dataset:

Please make sure to read Terms of Use available on http://salicon.net/challenge-2017/.

Please make sure to cite the paper:

M. Jiang, S. Huang, J. Duan, Q. Zhao, SALICON: Saliency in Context. CVPR 2015.

keywords: Vision, Image, Classification, Scene, Saliency Analysis

SUN

SUN:

This dataset contains thousands of color images for scene recognition, provided by Princeton University. The images include environmental scenes, places, and objects. To create the dataset, the WordNet English dictionary was used to find any nouns that complete the sentence “I am in -a place-” or “Let’s go to -the place-”, and the data samples were categorized manually. The number of images per category varies, with a minimum of 100 images per category for the SUN397 version.

Different versions of the dataset are available. Here is some information about the SUN397 dataset:

  • Number of images in the dataset: 16,873

  • Number of classes: 397 (Abbey, Access_road, etc.)

Here is some information regarding the latest version of this dataset:

  • Number of images in the dataset: 131,067

  • Number of classes: 908 scene categories and 3819 object categories

More details and download links are available on the dataset pages https://vision.princeton.edu/projects/2010/SUN/ and https://groups.csail.mit.edu/vision/SUN/. Recommendations for the training and testing split are also available on those pages.

If you use this dataset, make sure to cite these two papers:

J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba. Sun Database: Large-scale Scene Recognition from Abbey to Zoo, IEEE Conference on Computer Vision and Pattern Recognition, 2010.

J. Xiao, K. A. Ehinger, J. Hays, A. Torralba, and A. Oliva, Sun Database: Exploring a Large Collection of Scene Categories. International Journal of Computer Vision (IJCV), 2014.

keywords: Vision, Image, Classification, Scene, Object Detection

LSUN

LSUN:

This dataset contains millions of color images of scenes and objects, which makes it far bigger than the ImageNet dataset. The labels were produced by combining human labeling effort with several different image classification models. The images come from the parent databases PASCAL VOC 2012 and 10 Million Images for 10 Scene Categories.

Here is some information regarding the LSUN dataset:

  • Number of images in the dataset: More than 59 million and still growing

  • Number of classes: 10 scene categories and 20 object categories

  • Scene categories: bedroom, bridge, church_outdoor, classroom, conference_room, dining_room, kitchen, living_room, restaurant, tower

  • Object categories: airplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining_table, dog, horse, motorbike, person, potted_plant, sheep, sofa, train, tv-monitor

The dataset can be downloaded either from GitHub https://github.com/fyu/lsun or from the category lists on http://tigress-web.princeton.edu/~fy/lsun/public/release/. More details are available on the dataset page http://www.yf.io/p/lsun.

If you use this dataset, make sure to cite the paper:

F. Yu, A. Seff, Y. Zhang, S. Song, T. Funkhouser, and J. Xiao. LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop. CoRR, abs/1506.03365, 2015.

keywords: Vision, Image, Classification, Scene, Object Detection