By the end of this lesson, students will be able to:
- Design spreadsheet data models that enable reproducible data analysis.
- Convert a pivot table operation to
pandas
groupby
and vice versa. - Write spreadsheet formulas that apply a function over many cells.
For this lesson, we’ll spend most of our time in the preceding notebook on Groupby and Indexing.
Later, we’ll download the earthquakes.csv
file and use it to create a spreadsheet. In lecture, we will visit sheets.new to create a new Google Sheet.
import pandas as pd
import seaborn as sns
sns.set_theme()
What is pivot table?¶
Let’s first revisit the life expectancy dataset and use this as an example of showing what it is in pandas.
life_expectancy = sns.load_dataset("healthexp", index_col=["Year", "Country"])
life_expectancy
Let’s try pivoting the table about the “Country” column. We can read the documentation of pivot_table
here.
pivoted_table = life_expectancy.pivot_table(index="Year", columns="Country", values="Life_Expectancy")
pivoted_table.head()
life_expectancy.columns, pivoted_table.columns
sns.relplot(pivoted_table, kind="line")
# pretty much the same as
# sns.relplot(life_expectancy, x="Year", y="Life_Expectancy", hue="Country", kind="line")
Check the pandas version of earthquakes pivot table¶
earthquakes = pd.read_csv("earthquakes.csv")