## Public Data Sources

* Open data catalogs from various governments and NGOs:
    * [NYC Open Data](https://nycopendata.socrata.com/)
    * [DC Open Data Catalog](http://data.dc.gov/) / [OpenDataDC](http://www.opendatadc.org/)
    * [DataLA](https://data.lacity.org/)
    * [data.gov](https://www.data.gov/) (see also: [Project Open Data Dashboard](http://data.civicagency.org/))
    * [data.gov.uk](http://data.gov.uk/)
    * [US Census Bureau](http://www.census.gov/)
    * [World Bank Open Data](http://data.worldbank.org/)
    * [Humanitarian Data Exchange](http://docs.hdx.rwlabs.org/)
    * [Sunlight Foundation](http://sunlightfoundation.com/api/): government-focused data
    * [ProPublica Data Store](https://projects.propublica.org/data-store/)
* Datasets hosted by academic institutions:
    * [UC Irvine Machine Learning Repository](http://archive.ics.uci.edu/ml/): datasets specifically designed for machine learning
    * [Stanford Large Network Dataset Collection](http://snap.stanford.edu/data/): graph data
    * [Inter-university Consortium for Political and Social Research](http://www.icpsr.umich.edu/)
    * [Pittsburgh Science of Learning Center's DataShop](http://www.learnlab.org/technologies/datashop/)
    * [Academic Torrents](http://academictorrents.com/): distributed network for sharing large research datasets
    * [Dataverse Project](http://dataverse.org/): searchable archive of research data
* Datasets hosted by private companies:
    * [Quandl](https://www.quandl.com/): over 10 million financial, economic, and social datasets
    * [Amazon Web Services Public Data Sets](http://aws.amazon.com/datasets/)
    * [Kaggle](http://www.kaggle.com/) provides datasets with their challenges, but each competition has its own rules as to whether the data can be used outside of the scope of the competition.
* Big lists of datasets:
    * [Awesome Public Datasets](https://github.com/caesar0301/awesome-public-datasets): Well-organized and frequently updated
    * [Rdatasets](http://vincentarelbundock.github.io/Rdatasets/): collection of 700+ datasets originally distributed with R packages
    * [RDataMining.com](http://www.rdatamining.com/resources/data)
    * [KDnuggets](http://www.kdnuggets.com/datasets/index.html)
    * [inside-R](http://www.inside-r.org/howto/finding-data-internet)
    * [100+ Interesting Data Sets for Statistics](http://rs.io/2014/05/29/list-of-data-sets.html)
    * [20 Free Big Data Sources](http://smartdatacollective.com/bernardmarr/235366/big-data-20-free-big-data-sources-everyone-should-know)
    * [Sebastian Raschka](https://github.com/rasbt/pattern_classification/blob/master/resources/dataset_collections.md): datasets categorized by format and topic
* APIs:
    * [Apigee](https://apigee.com/providers): explore dozens of popular APIs
    * [Mashape](https://www.mashape.com/): explore hundreds of APIs
    * [Python APIs](http://www.pythonforbeginners.com/api/list-of-python-apis): Python wrappers for many APIs
* Other interesting datasets:
    * [FiveThirtyEight](https://github.com/fivethirtyeight/data): data and code related to their articles
    * [The Upshot](https://github.com/TheUpshot/): data related to their articles
    * [Yelp Dataset Challenge](http://www.yelp.com/dataset_challenge): Yelp reviews, business attributes, users, and more from 10 cities
    * [Donors Choose](http://data.donorschoose.org/open-data/overview/): data related to their projects
    * [200,000+ Jeopardy questions](http://www.reddit.com/r/datasets/comments/1uyd0t/200000_jeopardy_questions_in_a_json_file/)
    * [CrowdFlower](http://www.crowdflower.com/data-for-everyone): interesting datasets created or enhanced by their contributors
    * [UFO reports](https://github.com/planetsig/ufo-reports): geolocated and time-standardized UFO reports for close to a century
    * [Reddit Top 2.5 Million](https://github.com/umbrae/reddit-top-2.5-million): all-time top 1,000 posts from each of the top 2,500 subreddits
* Other resources:
    * [Datasets subreddit](http://www.reddit.com/r/datasets/): ask for help finding a specific data set, or post your own
    * [Center for Data Innovation](http://www.datainnovation.org/category/publications/data-set-blog/): blog posts about interesting, recently-released data sets.