Datasets Example

Here is a short example. Your dataset documentation is likely to be longer!

Datasets Summary:

All the data can be found on this Fake link to Google Folder.

The table below lists the three datasets we are using.

DataSet | Source | Size | Notes
Report_Card_Graduation_2018-19.csv | Your link must be a deep link that goes to the data like this: catalog.data.gov | 81,267 | Graduation information for Washington State.
teachers_2014.csv | data.gov | 48x10 | Contains full-time teacher pay and benefits by school district.
geo_wa_counties.json | Natural Earth | NA | Contains geometry data for the counties in Washington State.

Graduation_2018.csv

This dataset contains graduation rates of high school students in the year 2018 only. The rates are by race and school district.

Column | Description
DistrictName | string: The name of the school district.
County | string: A list of county names that the school district is in. A district may span multiple counties.
StudentGroup | string: The race of the students in this row. Races included are [White, Hispanic/ Latino of any race(s), Black/ African American, Asian…]
GraduationRate | double: The percent of students of this race that graduated high school in four years.

teachers_2014.csv

This dataset contains salary & benefits information for full-time teachers by school district in the year 2014.

Column | Description
DNUM | integer: The number for the school district. For example, Northshore is 417.
PERV | integer: The number of personal vacation days that a teacher gets per year.
BASE | double: The base salary of a full-time teacher.
HRPAY | double: The additional pay given to a teacher beyond their base salary for simply being a teacher.
SPST | double: The average additional pay (stipend) given to a teacher for coaching a sport.
APST | double: The additional pay (stipend) given to an AP teacher.

Data Challenges

The datasets come from different years because we could not get accurate data for both sets for the same year. Any correlation between the 2014 teacher data and the 2018 graduation data therefore compares different years and may not reflect the true relationship. We need to highlight this!

While the teacher pay dataset is extensive, there is no single column that gives a simple summary of how much an “average” teacher makes. This is because we don’t know how many teachers receive certain types of stipends.
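
One possible workaround is to bracket the answer rather than report a single figure. The Python sketch below assumes every teacher receives BASE and HRPAY, and that SPST and APST go to an unknown share of teachers. The function name pay_bounds and the sample figures are hypothetical; they are not values from teachers_2014.csv.

```python
import csv
import io

# Hypothetical sample rows using the column names from teachers_2014.csv.
# The numbers are made up for illustration only.
SAMPLE = """DNUM,PERV,BASE,HRPAY,SPST,APST
417,3,50000.0,2000.0,4000.0,1500.0
1,2,45000.0,1000.0,3000.0,0.0
"""

def pay_bounds(row):
    """Return (low, high) estimates of total pay for one district row.

    low  assumes the teacher receives no stipends (BASE + HRPAY).
    high assumes the teacher receives both stipends (adds SPST and APST).
    """
    low = float(row["BASE"]) + float(row["HRPAY"])
    high = low + float(row["SPST"]) + float(row["APST"])
    return low, high

# Map each district number to its (low, high) pay range.
bounds = {}
for row in csv.DictReader(io.StringIO(SAMPLE)):
    bounds[int(row["DNUM"])] = pay_bounds(row)

for dnum, (low, high) in bounds.items():
    print(f"district {dnum}: between {low:.2f} and {high:.2f}")
```

Reporting a range keeps the uncertainty about stipends visible instead of hiding it inside one "average" number.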

It would be valuable to track changes in graduation rates over time in relation to changes in salary over time. I will be doing some extra work to find more datasets to allow graphing over time.

The school districts don’t map easily across datasets. One dataset identifies a district by number (DNUM) while the other uses a name string (DistrictName). I may need to manually create a mapping dataset that allows me to join the two together.
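
A minimal sketch of such a mapping join, in Python with only the standard library, might look like the following. The mapping file layout, the helper read_rows, and all sample values are assumptions for illustration; only the Northshore/417 pair comes from the column description above.

```python
import csv
import io

# Hypothetical hand-built mapping file: district number -> district name.
# Only the Northshore/417 pair comes from the documentation above.
MAPPING_CSV = """DNUM,DistrictName
417,Northshore
"""

# Made-up sample rows shaped like the two datasets.
TEACHERS_CSV = """DNUM,BASE
417,50000.0
999,40000.0
"""
GRADUATION_CSV = """DistrictName,StudentGroup,GraduationRate
Northshore,White,90.0
Northshore,Asian,95.0
"""

def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

name_by_dnum = {int(r["DNUM"]): r["DistrictName"] for r in read_rows(MAPPING_CSV)}

# Translate teacher rows from district number to district name,
# keeping track of numbers that the mapping does not cover yet.
base_by_name = {}
unmapped = []
for r in read_rows(TEACHERS_CSV):
    dnum = int(r["DNUM"])
    name = name_by_dnum.get(dnum)
    if name is None:
        unmapped.append(dnum)
    else:
        base_by_name[name] = float(r["BASE"])

# Attach the base salary to every graduation row with a mapped district.
joined = [
    {**g, "BASE": base_by_name[g["DistrictName"]]}
    for g in read_rows(GRADUATION_CSV)
    if g["DistrictName"] in base_by_name
]

print(joined)
print("district numbers still missing from the mapping:", unmapped)
```

Collecting the unmapped district numbers gives a to-do list for completing the mapping dataset by hand.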

It would be good to geospatially plot graduation rates, but the geometry data that I’ve found so far is only by county while the school districts can span many counties. I may have to manually pick, or randomly guess, which county a school district mostly represents. Or, perhaps I can locate geometry for the school districts themselves.
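
As an alternative to picking one county per district, a rough sketch could count each district toward every county it spans and average the rates per county. The Python example below makes several assumptions: the County field is taken to be a semicolon-separated list, the district names and rates are made up, and the mean is unweighted (it ignores how many students each district has in each county).

```python
from collections import defaultdict
from statistics import mean

# Made-up rows: (district, counties the district spans, graduation rate).
# The semicolon separator inside the county list is an assumption.
ROWS = [
    ("District A", "King;Snohomish", 90.0),
    ("District B", "King", 80.0),
    ("District C", "Snohomish", 70.0),
]

def county_rates(rows, sep=";"):
    """Return the unweighted mean graduation rate per county.

    A district that spans several counties contributes its rate
    to each of them.
    """
    rates = defaultdict(list)
    for _district, counties, rate in rows:
        for county in counties.split(sep):
            rates[county.strip()].append(rate)
    return {county: mean(values) for county, values in rates.items()}

by_county = county_rates(ROWS)
print(by_county)
```

The resulting county-level values could then be matched to the county geometry in geo_wa_counties.json, with the caveat that the averaging hides differences between districts.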