Sources of Data

The best approach is to start with a problem that interests you, and then look for data. However, if you are creative and critical, you can go the other way around: start with the data and then identify areas of research.

There are MANY sources of data, and you can seek out anything and everything you can get your hands on. Google will be your friend for finding a dataset. Here are some sources for you to explore:

Sample datasets you may use for this project

  1. UCI Machine Learning Repository: Classic datasets for machine learning (e.g., Iris, Wine Quality). Structured and reliable.
  2. Google Dataset Search: A search engine for datasets from various sources. Helps students find project-specific data.
  3. Data.gov: U.S. government open data (e.g., education, climate). Authoritative and real-world focused.
  4. World Bank Data Catalog: Global development data (e.g., GDP, health stats). Great for economic or social analyses.
  5. NASA Open Data Portal: Space, climate, and earth science data. Engaging for science enthusiasts.
  6. FiveThirtyEight: Datasets from articles on politics, sports, etc. Accessible and student-friendly.
  7. Pew Research Center: Public opinion and demographic data. Useful for social science projects.
  8. UNdata: United Nations stats on global issues (e.g., poverty, education). Broad and impactful.
  9. IMDb Datasets: Movie and entertainment data. Fun for media-related projects.
  10. Kaggle Datasets (Blocked at NCHS): Thousands of datasets across domains (e.g., health, sports) with example notebooks. Ideal for beginners.

Local Bonus:

  • data.seattle.gov or data.wa.gov: Seattle or Washington state open government data (e.g., transit, schools). Relevant for North Creek students.

  • A variety of datasets are available from UW Libraries.

Other datasets you may find useful