Network Datasets

Megascale Cell-Cell Similarity Network

Megascale Cell-Cell Similarity Network:

This single-cell RNA-sequencing dataset describes mouse brain cells. The data are preprocessed, and unwanted sources of variation are filtered out. Nodes represent mouse brain cells, and edges represent nearest-neighbor similarities between cells based on their gene expression.

Here is some information regarding this dataset:

  • Number of Nodes: 1,018,524

  • Number of Edges: 24,735,503
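As a rough illustration of how sparse this network is, the node and edge counts above can be turned into an average degree and a density. This is a minimal sketch that assumes the edges are undirected and each is counted once, which the description does not state explicitly.

```python
# Figures quoted in the list above.
num_nodes = 1_018_524
num_edges = 24_735_503

# Assumption: edges are undirected and counted once,
# so each edge contributes 2 to the sum of node degrees.
average_degree = 2 * num_edges / num_nodes

# Fraction of all possible undirected node pairs that are connected.
density = num_edges / (num_nodes * (num_nodes - 1) / 2)

print(f"average degree: {average_degree:.2f}")
print(f"density: {density:.2e}")
```

An average degree of a few dozen alongside a very small density is what one would expect from a nearest-neighbor graph over about a million cells.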

More information about the dataset and download links are available on SNAP at http://snap.stanford.edu/biodata/datasets/10023/10023-CC-Neuron.html.

If you use this dataset, make sure to cite the papers:

M. Zitnik, R. Sosic, J. Leskovec, Prioritizing Network Communities, Nature Communications, 2018.

G. X. Zheng et al., Massively Parallel Digital Transcriptional Profiling of Single Cells, Nature Communications, 2017.

M. Zitnik, R. Sosic, S. Maheshwari, J. Leskovec, BioSNAP Datasets: Biomedical Network Dataset Collection, 2018. http://snap.stanford.edu/biodata/

Keywords: Network, Biology and Health, RNA

Infectious Disease Spread: Flu

Infectious Disease Spread: Flu

This dataset captures the close interactions through which the flu virus can spread between infected and healthy students. The nodes cover almost the entire school population, and the edges are interactions of varying duration; most contacts are short. More information about the dataset and download links can be found at http://sing.stanford.edu/flu/ and in the two publications on the dataset.

Here is some information regarding the dataset:

  • Number of Nodes: 788 individuals (655 students, 73 teachers, 55 staff, and 5 others)

  • Number of Edges: 2,148,199 Close Proximity Records (CPRs): 762,868 interactions with a mean duration of 2.8 CPRs (~1 min), or 118,291 interactions with a mean duration of 18.7 CPRs (~6 min)
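The durations quoted above imply that one CPR covers roughly 20 seconds (2.8 CPRs is about 1 minute). The sketch below uses that inferred value, which is an assumption rather than a figure stated on this page, to convert CPR counts to minutes and to check that the population breakdown adds up.

```python
# Assumption: one Close Proximity Record (CPR) spans about 20 seconds.
# This is inferred from "2.8 CPRs (~1min)" and is not stated in the text.
SECONDS_PER_CPR = 20


def cpr_to_minutes(cprs: float) -> float:
    """Convert a duration expressed in CPRs to minutes."""
    return cprs * SECONDS_PER_CPR / 60


short_mean = cpr_to_minutes(2.8)   # mean duration of the 762,868 interactions
long_mean = cpr_to_minutes(18.7)   # mean duration of the 118,291 interactions

# Students + teachers + staff + other, as listed above.
total_individuals = 655 + 73 + 55 + 5

print(f"{short_mean:.1f} min, {long_mean:.1f} min, {total_individuals} individuals")
```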

Detailed information about the dataset can be found in the papers:

M. Salathe, M. Kazandjieva, J. W. Lee, P. Levis, M. W. Feldman, J. H. Jones, A High-Resolution Human Contact Network for Infectious Disease Transmission, Proceedings of the National Academy of Sciences (PNAS), 2010.

M. Kazandjieva, J. W. Lee, M. Salathe, M. W. Feldman, J. H. Jones, P. Levis, Experiences in Measuring a Human Contact Network for Epidemiology Research, Proceedings of the ACM Workshop on Hot Topics in Embedded Networked Sensors (HotEmNets), 2010.

Keywords: Network, Biology and Health, Spreading Phenomena, Epidemic Process, Disease, Flu, Time Series

Hypertext 2009 Dynamic Contact Network

Hypertext 2009 Dynamic Contact Network:

This dataset describes the dynamic network of face-to-face contacts among conference attendees, who volunteered to wear radio badges that monitored their direct contacts. Two data files are available for download: the Contact List and the Contact Intervals. The Contact List contains the active contacts between nodes (110 conference attendees) and the connection time in seconds. The Contact Intervals file contains a dictionary that maps each participant’s ID to that participant’s neighbors in the contact network, together with the active duration of each contact.
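A contact list of this kind can be turned into per-pair contact counts and a neighbor dictionary with a few lines of code. The sketch below is illustrative only: it assumes each record is a whitespace-separated line of the form "time id_a id_b", and the sample records are made up. Check the downloaded file for its actual layout before reusing it.

```python
from collections import defaultdict

# Toy records in an ASSUMED "time id_a id_b" layout; this is not real data.
SAMPLE = """\
20 1 2
20 2 3
40 1 2
60 2 1
"""


def parse_contacts(text):
    """Parse lines of 'time id_a id_b' into (time, a, b) integer tuples."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        t, a, b = line.split()
        records.append((int(t), int(a), int(b)))
    return records


def contact_counts(records):
    """Count records per unordered pair of participants."""
    counts = defaultdict(int)
    for _, a, b in records:
        counts[(min(a, b), max(a, b))] += 1
    return dict(counts)


def neighbors(records):
    """Map each participant ID to the sorted list of IDs it was in contact with."""
    adj = defaultdict(set)
    for _, a, b in records:
        adj[a].add(b)
        adj[b].add(a)
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


records = parse_contacts(SAMPLE)
counts = contact_counts(records)
adjacency = neighbors(records)
print(counts)
print(adjacency)
```

Treating each pair as unordered reflects the fact that a face-to-face contact has no direction.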

More information about the dataset and download links can be found at http://www.sociopatterns.org/datasets/hypertext-2009-dynamic-contact-network/.

If you use this dataset:

Make sure to read the license carefully; it is available at https://creativecommons.org/licenses/by-nc-sa/3.0/.

Make sure to cite the paper:

L. Isella, J. Stehle, A. Barrat, C. Cattuto, J. Pinton, W. Van den Broeck, What’s in a Crowd? Analysis of Face-to-Face Behavioral Networks, Journal of Theoretical Biology 271, 166, 2011.

Keywords: Network, Social Network, Face-to-face Contact

Les Miserables

Les Miserables:

This dataset is an undirected network of character co-occurrences in Victor Hugo’s novel ‘Les Miserables’. Nodes represent characters, and an edge connects two characters if they appear in the same chapter of the novel. Each edge carries a weight equal to the number of co-occurrences of the two characters.
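The way such a weighted co-occurrence network is built can be sketched in a few lines. The chapter lists below are toy data invented for illustration (they are not taken from the dataset or the novel), and `cooccurrence_weights` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations

# Toy chapter casts, made up for illustration; NOT the real dataset.
toy_chapters = [
    ["Valjean", "Javert"],
    ["Valjean", "Cosette", "Javert"],
    ["Cosette", "Marius"],
]


def cooccurrence_weights(chapters):
    """Return {(char_a, char_b): number of chapters in which both appear}."""
    weights = Counter()
    for characters in chapters:
        # Sort so each undirected pair always gets the same key,
        # and use a set so a character is counted once per chapter.
        for a, b in combinations(sorted(set(characters)), 2):
            weights[(a, b)] += 1
    return weights


weights = cooccurrence_weights(toy_chapters)
for (a, b), w in sorted(weights.items()):
    print(a, b, w)
```

Each key is an undirected edge and each value is its weight, which matches the structure described above.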

Here is some information regarding the dataset:

  • Number of Nodes: 77

  • Number of Edges: 254

More details about the dataset and download links can be found at http://konect.uni-koblenz.de/networks/moreno_lesmis.

If you use this dataset, make sure to cite the dataset and the paper:

Les Miserables Network Dataset – KONECT, April 2017.

D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing, volume 37. Addison-Wesley, Reading, 1993.

Keywords: Network

Route Views Project

Route Views Project:

This dataset is a snapshot of the structure of the Internet, collected and published by the University of Oregon’s Route Views Project. The project aims to give Internet operators real-time BGP information about the global routing system from the perspectives of several different locations around the Internet. Unlike earlier tools, it lets operators determine how the global routing system sees their own prefixes and AS space. More information about this dataset can be found on the project page at http://www.routeviews.org/routeviews/. Download links are available through the Archive section of the same page and on the Gephi wiki on GitHub at https://github.com/gephi/gephi/wiki/Datasets. A list of publications that used the dataset can be found at http://www.routeviews.org/routeviews/index.php/papers/.

Keywords: Web and Internet, Network

SNAP Social Circles

SNAP Social Circles:

This dataset contains circles (distinct friendship groups) from the ego-networks of users on three social networks: Facebook, Twitter, and Google+. The features are drawn from the users’ profiles and include university, school, sports team, etc. The dataset was collected for a study that builds a generative model of social circles and learns the model parameters without supervision.
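For readers unfamiliar with the term, an ego-network is usually taken to be one user (the ego), that user’s direct friends, and the friendships among them. The sketch below extracts one from a toy adjacency dictionary; the graph and the `ego_network` helper are hypothetical, and the SNAP files may organize the data differently.

```python
def ego_network(adjacency, ego):
    """Return (members, edges) of the subgraph induced by ego and its neighbors."""
    members = {ego} | adjacency.get(ego, set())
    edges = set()
    for u in members:
        for v in adjacency.get(u, set()):
            # Keep edges with both ends inside the ego-network;
            # u < v stores each undirected edge once.
            if v in members and u < v:
                edges.add((u, v))
    return members, edges


# Toy undirected friendship graph, made up for illustration.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

members, edges = ego_network(toy_graph, "a")
print(sorted(members))
print(sorted(edges))
```

A circle is then a subset of the ego’s friends, so each circle is a group of nodes inside one such ego-network.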

Here is some information regarding this dataset:

Facebook

  • Number of Ego-networks: 10

  • Number of Nodes: 4,039

  • Number of Edges: 88,234

More information and download links are available at https://snap.stanford.edu/data/ego-Facebook.html.

Twitter

  • Number of Ego-networks: 1,000

  • Number of Nodes: 81,306

  • Number of Edges: 1,768,149

More information and download links are available at https://snap.stanford.edu/data/ego-Twitter.html.

Google+

  • Number of Ego-networks: 133

  • Number of Nodes: 107,614

  • Number of Edges: 13,673,453

More information and download links are available at https://snap.stanford.edu/data/ego-Gplus.html.

If you use any of the three datasets above:

Make sure to cite the paper:

J. McAuley, J. Leskovec, Learning to Discover Social Circles in Ego Networks, NIPS, 2012.

Keywords: Social Network, Network, Facebook, Twitter, Google+