Data Archive

Comprehensive Collection of Educational Open Data Sources for Machine Learning and Data Science



The Complex Adaptive Systems Laboratory (CASL) has collected information about, and links to, numerous open-source machine learning and network science datasets for students, university professors, and data scientists. The links are provided for educational and research purposes only and are organized into categories by subject area. This archive is still growing and includes datasets from Kaggle, the UCI Machine Learning Repository, Harvard Dataverse, Skymind, Microsoft Research Open Data, and other sources. Each dataset description includes citations and links to the original owner's pages. Please contact the CASL at or if you have any datasets to share or any questions or concerns about the links provided.

General Terms of Use:

This archive is provided for research purposes only. By using this archive, you acknowledge that you take sole responsibility for how you use these datasets. You agree to refer to the original pages and follow the policies for each specific dataset. Use of a dataset means that you have agreed to the terms and conditions for that particular dataset, which is why we link to the original webpage for each dataset in our archive. The datasets should not be used for any illegal or hateful purposes, or for any other purpose contrary to the terms and conditions that pertain to each dataset.

All conditions for use and copyright provisions are posted on the owners' original websites, and it is the sole responsibility of each user who accesses these datasets to abide by such conditions and/or restrictions of use. It is imperative that the dataset owners' rights are preserved. By using this archive and the respective datasets, you hereby agree to indemnify and hold the University of Central Florida and UCF Board of Trustees harmless from any and all use by you that is contrary to the terms and conditions pertaining to a particular dataset.

The University of Central Florida Board of Trustees, the University of Central Florida and its employees, agents, faculty, staff and students ("University of Central Florida and Associated Parties") shall have no obligation, expressed or implied, to supervise, monitor, review, maintain, provide support services, or otherwise assume responsibility for the use of these datasets and associated websites. The University of Central Florida expressly disclaims any and all warranties (express or implied) in conjunction with these datasets and associated websites, including but not limited to warranties of merchantability, fitness for a particular purpose, warranties of non-infringement or non-violation of any kind whatsoever (including but not limited to non-infringement of any kind of intellectual property) and any and all other warranties. The RECIPIENT hereby agrees to accept the datasets and associated websites in accordance with the terms and conditions pertaining to each particular dataset. In no event will the University of Central Florida and Associated Parties be responsible to the RECIPIENT for any damages, including but not limited to direct damages, indirect damages, incidental damages, consequential damages, exemplary damages of any kind, lost goodwill, lost profits, lost business and/or any direct or indirect economic damages whatsoever regardless of whether such damages arise from claims based upon negligence, tort (including strict liability or other legal theory), and regardless of whether the University of Central Florida and Associated Parties were advised or had reason to know of the possibility of incurring such damages in advance. The University of Central Florida expressly retains all rights, benefits, and immunities of sovereign immunity under section 768.28, Florida Statutes.


Biology and Health Datasets
Image Datasets
Natural Language Processing
Social Networks
Time Series Analysis
Web and Internet