The best approach is to start with a problem that interests you, and then look for data. However, if you are creative and critical, you can go the other way around: start with the data and then identify areas of research.

There are many sources of data, and you can seek out anything you can get your hands on. Google will be your friend for finding a dataset. Here are some sources to explore:

Sample datasets you may use for this project

  1. UCI Machine Learning Repository: Classic datasets for machine learning (e.g., Iris, Wine Quality). Structured and reliable.

  2. Google Dataset Search: A search engine for datasets from various sources. Helps students find project-specific data.

  3. Data.gov: U.S. government open data (e.g., education, climate). Authoritative and real-world focused.

  4. World Bank Data Catalog: Global development data (e.g., GDP, health stats). Great for economic or social analyses.

  5. NASA Open Data Portal: Space, climate, and earth science data. Engaging for science enthusiasts.

  6. FiveThirtyEight: Datasets from articles on politics, sports, etc. Accessible and student-friendly.

  7. Pew Research Center: Public opinion and demographic data. Useful for social science projects.

  8. UNdata: United Nations stats on global issues (e.g., poverty, education). Broad and impactful.

  9. IMDb Datasets: Movie and entertainment data. Fun for media-related projects.

Note: Check dataset licenses and terms of use, prefer machine-readable formats like CSV or JSON, and verify dataset size and data quality before committing to a project.
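As a rough illustration of the size and quality check mentioned in the note, here is a minimal Python sketch that counts rows, lists columns, and tallies empty cells per column in a CSV file. The function name `summarize_csv` and the small inline sample are made up for this example; they are not part of any dataset listed above, and treating only blank cells as "missing" is a simplifying assumption.

```python
import csv
import io


def summarize_csv(handle):
    """Return row count, column names, and per-column counts of empty cells.

    `handle` is any open text stream containing CSV data with a header row.
    """
    reader = csv.DictReader(handle)
    columns = reader.fieldnames or []
    missing = {name: 0 for name in columns}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            value = row.get(name)
            # Short rows yield None; blank cells yield an empty string.
            if value is None or value.strip() == "":
                missing[name] += 1
    return {"rows": rows, "columns": list(columns), "missing": missing}


# Tiny made-up sample standing in for a downloaded file.
sample = io.StringIO(
    "species,petal_length,petal_width\n"
    "setosa,1.4,0.2\n"
    "versicolor,,1.3\n"
    "virginica,5.5,\n"
)
report = summarize_csv(sample)
print(report)
```

For a real download, the same function can be called on a file opened with `open(path, newline="", encoding="utf-8")`. Many datasets mark missing values with placeholders such as `NA` or `?`, so the blank-cell test here would likely need adjusting to match the dataset's documentation.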

Local datasets (bonus):

Other datasets