The best approach is to start with a problem that interests you, and then look for data. However, if you are creative and critical, you can go the other way around: start with the data and then identify areas of research.
There are many sources of data, and you can draw on anything you can get your hands on. Google will be your friend for finding a dataset. Here are some sources to explore:
Sample datasets you may use for this project
UCI Machine Learning Repository: Classic datasets for machine learning (e.g., Iris, Wine Quality). Structured and reliable.
Google Dataset Search: A search engine for datasets from various sources. Helps students find project-specific data.
Data.gov: U.S. government open data (e.g., education, climate). Authoritative and real-world focused.
World Bank Data Catalog: Global development data (e.g., GDP, health stats). Great for economic or social analyses.
NASA Open Data Portal: Space, climate, and earth science data. Engaging for science enthusiasts.
FiveThirtyEight: Datasets from articles on politics, sports, etc. Accessible and student-friendly.
Pew Research Center: Public opinion and demographic data. Useful for social science projects.
UNdata: United Nations stats on global issues (e.g., poverty, education). Broad and impactful.
IMDb Datasets: Movie and entertainment data. Fun for media-related projects.
Note: Check dataset licenses and terms of use, prefer machine-readable formats like CSV or JSON, and verify dataset size and data quality before committing to a project.
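The size and quality checks in the note above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the function name `summarize_csv` and the tiny sample data are made up for the example, and it assumes a UTF-8 CSV file with a header row. A real project would likely need more checks (data types, duplicates, outliers).

```python
import csv
import os
import tempfile


def summarize_csv(path):
    """Return basic size and quality facts about a CSV file with a header row."""
    rows = 0
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        columns = list(reader.fieldnames or [])
        missing = {name: 0 for name in columns}
        for row in reader:
            rows += 1
            for name in columns:
                value = row.get(name)
                # Count empty cells and cells absent from short rows as missing.
                if value is None or value.strip() == "":
                    missing[name] += 1
    return {
        "bytes": os.path.getsize(path),
        "rows": rows,
        "columns": columns,
        "missing": missing,
    }


# Hypothetical sample data, written to a temporary file for the demonstration.
sample = "city,riders,year\nSeattle,1200,2023\nBothell,,2023\nTacoma,800,\n"
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline="", encoding="utf-8"
) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

report = summarize_csv(sample_path)
os.remove(sample_path)

print("rows:", report["rows"])
print("columns:", report["columns"])
print("missing values per column:", report["missing"])
```

To try this on a downloaded dataset, call `summarize_csv` with the path to your own CSV file. A column with many missing values, or a file far larger or smaller than expected, is worth investigating before you commit to the project.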
Local datasets (bonus)
Data.seattle.gov for Seattle open government data (e.g., transit, schools). Relevant for North Creek students.
data.wa.gov for Washington state open data (e.g., statewide transit, education).
A variety of datasets are available from UW Libraries.
Other datasets
Awesome Public Datasets: A large, maintained collection of datasets.
Baron Schwartz’s list of datasets: Some entries are themselves rich lists of datasets, such as the Amazon AWS public datasets.
An archive of datasets distributed with the R statistical language.
Office for National Statistics (UK): A repository of detailed statistics about Great Britain and Northern Ireland.
CDC NCHS Data: Data access from the CDC’s National Center for Health Statistics.