Skip to article frontmatterSkip to article content

By the end of this lesson, students will be able to:

  • Design spreadsheet data models that enable reproducible data analysis.
  • Convert a pivot table operation to pandas groupby and vice versa.
  • Write spreadsheet formulas that apply a function over many cells.

For this lesson, we’ll spend most of our time in the preceding notebook on Groupby and Indexing.

Later, we’ll download the earthquakes.csv file and use it to create a spreadsheet. In lecture, we will visit sheets.new to create a new Google Sheet.

import pandas as pd
import seaborn as sns

sns.set_theme()

What is pivot table?

Let’s first revisit the life expectancy dataset and use this as an example of showing what it is in pandas.

life_expectancy = sns.load_dataset("healthexp", index_col=["Year", "Country"])
life_expectancy

Let’s try pivoting the table about the “Country” column. We can read the documentation of pivot_table here.

pivoted_table = life_expectancy.pivot_table(index="Year", columns="Country", values="Life_Expectancy")
pivoted_table.head()
life_expectancy.columns, pivoted_table.columns
sns.relplot(pivoted_table, kind="line")
# pretty much the same as
# sns.relplot(life_expectancy, x="Year", y="Life_Expectancy", hue="Country", kind="line")

Check the pandas version of earthquakes pivot table

earthquakes = pd.read_csv("earthquakes.csv")