
Lists of publicly available datasets for machine learning


mdozmorov/Data_notes


Data-related notes

License: MIT. PRs welcome.

A continually expanding collection of data-related notes. Please contribute and get in touch! See MDmisc notes for other programming and genomics-related notes.

Table of contents

Datasets

Datasets in R

  • library(help = "datasets") or data() - list the built-in R datasets

  • A list of over 1,000 datasets available in R packages, curated by @VincentAB.

  • curran/data - A collection of public data sets, primarily in text format

  • Tidy Tuesday - A weekly social data project in R with curated datasets

  • dsbox - Data Science in the Box datasets

  • dslabs - Data Science Labs - Datasets and functions that can be used for data analysis practice, homework and projects in data science courses and workshops. 26 datasets are available for case studies in data visualization, statistical inference, modeling, linear regression, data wrangling and machine learning. Made by Rafael Irizarry and Amy Gill.

Genomics

Machine learning

Imaging

COVID-19

  • SARS-CoV-2 database of 21 uniformly processed COVID-19 scRNA-seq datasets (over 3.2 million cells). Table 1 lists COVID-19 data obtained with various technologies. A GitHub repository provides the processing scripts.
    Paper: Tian, Yuan, Lindsay N. Carpp, Helen E. R. Miller, Michael Zager, Evan W. Newell, and Raphael Gottardo. "Single-Cell Immunology of SARS-CoV-2 Infection." Nature Biotechnology, December 20, 2021. https://doi.org/10.1038/s41587-021-01131-y.

Text

  • Gitenberg is a collaborative, open source community curating and publishing highly usable and attractive ebooks in the public domain. Our books are free to use by anyone for any purpose. They contain detailed metadata and are accessible in a wide variety of formats. https://gitenberg.org/

Misc
