Overview

You will deliver the following:

More often than not, the raw data is not ready for plotting. It must be rearranged and organized into a format that matplotlib.pyplot can plot. The data may need to be reduced, filtered, joined, reorganized, or have new values calculated from it.
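As a minimal sketch of these operations in pandas, assume two small hypothetical tables: box office revenue per movie and GDP per year. The column names and values below are made up for illustration only. Each step is labeled with the kind of operation it performs.

```python
import pandas as pd

# Hypothetical raw data: one row per movie release (revenue in millions of USD)
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "year": [2019, 2019, 2020, 2020],
    "revenue_musd": [120.0, 80.0, 30.0, None],
})

# Hypothetical raw data: one row per year (GDP in trillions of USD)
gdp = pd.DataFrame({"year": [2019, 2020], "gdp_tusd": [21.4, 20.9]})

# Filter: drop movies with a missing revenue value
movies = movies.dropna(subset=["revenue_musd"])

# Reduce: total revenue per year
yearly = movies.groupby("year", as_index=False)["revenue_musd"].sum()

# Join: attach the GDP value for the same year
yearly = yearly.merge(gdp, on="year", how="inner")

# Calculate: revenue as a share of GDP (1 trillion USD = 1,000,000 million USD)
yearly["revenue_share_of_gdp"] = yearly["revenue_musd"] / (yearly["gdp_tusd"] * 1_000_000)

# Reorganize: index by year so each column can be plotted against year
plot_ready = yearly.set_index("year")

print(plot_ready)
```

The result has one row per year, which is the shape a line or bar plot over time expects.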

Sketches of Plots

Think about the types of plots you want to create and how those could answer the questions you’ve posed. Be creative and hypothetical in this step. Imagine what a plot would look like and then sketch it out by hand or using Paint.

The goal is to imagine what your plots might look like without being constrained by your coding abilities. Consider ways to convey information using assumed data values. If you find that you're drawing simple, repetitive bar plots or line plots, you need to dig deeper into your creativity. Get some inspiration by looking at other published reports and visualizations.

Good Features

Fabulous sketches will:

Things to consider:

Sample Sketch

[Figure: Sample Sketch]

This sketch was created in Paint in about 15 minutes. These are sketches for a hypothetical project on the box office revenues of movies. The project attempts to show a correlation between box office revenues, Netflix earnings, and America's GDP.

Do not be fooled! This sketch represents C-quality work. The sketches illustrate a few issues that should be addressed early. Before submitting this Data Organization Deliverable, the sketches should be updated to address these shortcomings:

There are some positive realizations from doing these sketches that will impact my project’s efforts:

GitHub Project

You must follow the structure found in the template project. The GitHub project has several important folders and files, described below.

Organization of Files & Folders

Follow the directions in the ./data_organized/README.md and other related README.md files in subfolders.

Preprocessing Data

To reduce processing time, save your processed data into the folder 'data_organized' so it does not have to be recomputed every time you plot. You can do this with: df.to_csv('data_organized/filename.csv')
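The call above writes relative to the current working directory and raises an error if the data_organized folder does not exist. It also writes the DataFrame index by default, which reappears as an extra unnamed column when the file is read back. A slightly more defensive sketch follows; the helper names save_processed and load_processed are hypothetical, not part of the template.

```python
import os

import pandas as pd


def save_processed(df: pd.DataFrame, filename: str, folder: str = "data_organized") -> str:
    """Save a processed DataFrame as CSV inside `folder` and return the file path."""
    # Create the output folder if it is missing; no error if it already exists
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    # index=False keeps the default row index out of the CSV file
    df.to_csv(path, index=False)
    return path


def load_processed(filename: str, folder: str = "data_organized") -> pd.DataFrame:
    """Read a previously saved CSV back, skipping the slow raw-data steps."""
    return pd.read_csv(os.path.join(folder, filename))


if __name__ == "__main__":
    example = pd.DataFrame({"year": [2019, 2020], "revenue_musd": [200.0, 30.0]})
    saved_path = save_processed(example, "yearly_revenue.csv")
    print(saved_path)
    print(load_processed("yearly_revenue.csv"))
```

Plotting code can then call load_processed instead of re-running the preprocessing each time.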

You will spend a surprising amount of time in this area. You will be much more successful if you can complete this early.

You may have additional helper files besides preprocess_data.py, but this file should be the entry point for preprocessing.

You should test this file’s code in run_tests.py.
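A minimal sketch of such a test, assuming a hypothetical preprocessing function named drop_missing_revenue. In a real project the function would live in preprocess_data.py and be imported into run_tests.py; it is defined inline here only so the example runs on its own. Plain assert statements are used so the file runs with python run_tests.py and no test framework.

```python
import pandas as pd


# Hypothetical stand-in for a function from preprocess_data.py;
# in the project you would write: from preprocess_data import drop_missing_revenue
def drop_missing_revenue(df: pd.DataFrame) -> pd.DataFrame:
    """Remove rows whose revenue value is missing and renumber the rows."""
    return df.dropna(subset=["revenue_musd"]).reset_index(drop=True)


def test_drop_missing_revenue() -> None:
    # Small hand-made input where the expected result is obvious
    raw = pd.DataFrame({"title": ["A", "B"], "revenue_musd": [10.0, None]})
    cleaned = drop_missing_revenue(raw)
    assert len(cleaned) == 1
    assert cleaned.loc[0, "title"] == "A"


if __name__ == "__main__":
    test_drop_missing_revenue()
    print("All tests passed.")
```

Small, hand-checkable inputs like this make a failing assertion easy to trace back to the preprocessing step that broke.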

Each file has a header comment that explains what code belongs in it. These files should use the main-method pattern (if __name__ == "__main__":) so that each file's functionality can be executed on its own.
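A skeleton of preprocess_data.py following this pattern might look like the sketch below. The header comment, function names, output file name, and sample data are hypothetical placeholders. The point is that importing the module (for example from run_tests.py) only defines functions, while running python preprocess_data.py executes the preprocessing.

```python
"""preprocess_data.py

Entry point for preprocessing: reads raw data, reshapes it for plotting,
and saves the results into data_organized/. (Hypothetical header comment.)
"""
import os

import pandas as pd

OUTPUT_FOLDER = "data_organized"  # folder where processed CSV files are saved


def clean_movies(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step: drop rows without revenue, then total by year."""
    cleaned = raw.dropna(subset=["revenue_musd"])
    return cleaned.groupby("year", as_index=False)["revenue_musd"].sum()


def main() -> None:
    # Hypothetical raw input; a real project would read its raw files here
    raw = pd.DataFrame({"year": [2019, 2019, 2020], "revenue_musd": [120.0, 80.0, None]})
    yearly = clean_movies(raw)
    os.makedirs(OUTPUT_FOLDER, exist_ok=True)
    yearly.to_csv(os.path.join(OUTPUT_FOLDER, "yearly_revenue.csv"), index=False)


# Runs only when executed directly (python preprocess_data.py), not on import
if __name__ == "__main__":
    main()
```

Keeping each step in its own function lets run_tests.py import and check clean_movies without writing any files.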