Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Challenge goals help us to define expectations while still offering flexibility for you to design your own project. Meeting the requirements of a challenge goal is described here.

Challenge Description

Read the list of possible challenge tasks below and make sure you understand what it takes to meet the requirements of a Challenge Goal. Then, set aside ample time to completely fulfill the goal.

Many students underestimate the amount of work to meet a Challenge. A typical mistake is to assume that they can meet the demands of “Multiple Datasets” because they have a few datasets downloaded. In reality, to qualify you must work with four or more datasets what require merging together in various ways. The datasets need to have a valid reason for merging; the merging must add value to the research.

Others think that learning a new library will be quick and easy. They end up either not doing it at all or slapping on a piece of code that took 5-minutes to look up on the internet. The size of a student’s accomplishments will impact a student’s grade. Note that the amount of time spent is not graded, but it is very difficult to have a large accomplishment in a small amount of time.

Valuable Unit Testing

To qualify, you must deliver valuable Unit Tests of the methods that clean and organize your data. You need to provide some fake data and run the tests that validate the results. You should use Python’s Unit Test framework. Look at past warmup or projects where there was a run_tests.py file and tests folder. Copy that infrastructure.

Multiple Datasets

To qualify, you must work with four or more datasets what require merging together in various ways. The merging must be necessary to come up with a richer analysis. This requirement is not just about using more than one dataset across your research questions, but is more about combining (via merge) the datasets to make a more in-depth analysis.

Web Scraping or API Usage

To qualify, you must successfully scrape hundreds of rows of data off one or more web page. Or, you must use some public API to collect data from some data service (e.g. Spotify). The resulting data would be saved as a CSV file for later organization and analysis. To Web Scrape, consider using Beautiful Soup.

See more info on web scraping here

Statistical Validation

To qualify, you must do some extra work on top of your results to verify the validity of the results. For example, using some test of statistical significance to verify your results aren’t likely to happen by chance. The statistical analysis needs to be visible in the plots themselves and briefly discussed in your write-ups. You cannot simply use seaborn to plot a regression and consider this challenge fulfilled.

Machine Learning

Many students have attempted this with great failure. This challenge goal requires going above and beyond the fundamental steps of a Machine Learning pipeline we introduced in class. You cannot simply create a model, present some accuracy number and call it quits. To qualify you need to:

New Library

Learn a new Python library and use it in your project in a significant way to help with your analysis. Part of this class is being able to learn libraries in Python. Show that you are able to take what you’ve learned in the context of learning a library we have not discussed in-depth in this course. Here are some recommended libraries.