
Line plots are likely the most common plot you’ll create in IDP. They are fantastic for showing correlations. While line plots are not complicated, it can be frustrating to get multiple lines drawn the way you intend. Furthermore, there are many different APIs available for drawing lines, and they accept different named arguments (which is frustrating!).

On this page you’ll see how to:

Libraries

There are several different libraries and objects that allow you to plot a line. The one you use depends on the structure of your data and the complexity of your plot. I have found the DataFrame to be the most useful object for drawing; it handles most of what you will want to do. When working with subplots, the Axes object becomes invaluable.

You should read and become familiar with all the named arguments and options for each library/object.

| Library | API | Comments |
| --- | --- | --- |
| Pandas | DataFrame.plot | Recommended and most common method of plotting in IDP |
| Matplotlib.pyplot | plt.plot | The fundamental plotting API. All other objects will essentially call this one eventually. Access to more named arguments, but works on only one axis. |
| Matplotlib.axes | axes.plot | Allows one to target a plot to a specific axes. Very useful in customizing figures with multiple subplots. |
| Pandas | Series.plot | Useful when the object is a Series |
| Seaborn | sns.lineplot | Also works on unpivoted data. Provides simple access to some statistics and various styles of drawing lines. |
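
To make the difference between these entry points concrete, here is a minimal sketch that draws the same two lines through DataFrame.plot and through Axes.plot on a two-panel figure. The `demo` DataFrame and its values are made up for illustration; only the column names echo the temperature data used on this page.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: ten days of made-up temperatures.
dates = pd.date_range("2021-01-01", periods=10, freq="D", name="DATE")
demo = pd.DataFrame(
    {"MaxTemp": np.linspace(45, 54, 10), "MinTemp": np.linspace(30, 39, 10)},
    index=dates,
)

# One figure with two side-by-side Axes objects.
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# DataFrame.plot: pass ax= to send the lines to a specific subplot.
# The index is used as the x-axis automatically.
demo.plot(y=["MaxTemp", "MinTemp"], ax=ax_left, title="DataFrame.plot")

# Axes.plot: the x and y values are passed explicitly, one line per call.
ax_right.plot(demo.index, demo["MaxTemp"], label="MaxTemp")
ax_right.plot(demo.index, demo["MinTemp"], label="MinTemp")
ax_right.set_title("Axes.plot")
ax_right.tick_params(axis="x", rotation=30)
ax_right.legend()

fig.tight_layout()
```

The DataFrame call is shorter and handles the legend and date ticks on its own, while the Axes calls give line-by-line control.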

Helpful Resources/API

Data

The data is three years of temperature data from Snohomish County. This page will do some computations and data organization to enable better plots, but this is what the data basically looks like. We load it with pd.read_csv as a time series, that is, a DataFrame indexed by date.
Temperature Data
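
The loading code itself is not shown here, so the following is only a sketch of one common way to get a date-indexed DataFrame out of pd.read_csv. The inline CSV text and its three rows are hypothetical stand-ins for the real file; the `DATE` column name matches the index name used later on this page.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file; the real file has more
# columns and three years of rows.
csv_text = """DATE,MaxTemp,MinTemp,AvgTemp
2021-01-01,45,33,39
2021-01-02,47,35,41
2021-01-03,44,31,38
"""

# parse_dates + index_col turn the DATE column into a DatetimeIndex,
# which is what enables slicing by date strings and resampling.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["DATE"], index_col="DATE")

print(type(sample.index).__name__)
print(sample.loc["2021-01-02":"2021-01-03"])
```

With a real file you would pass its path to pd.read_csv instead of the `io.StringIO` wrapper.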

DataFrame

Simple
Multi-Line
Subplots

In this very simple one-liner, df.plot(), every column is drawn using the index as the x-axis. By default there is no label for the y-axis, and the x-axis ticks are rotated for us. The Sunrise and Sunset columns squash all the other data down to the bottom because their values are 24-hour clock times, which are far larger numbers than the temperatures. Furthermore, the 24-hour encoding causes “jumps” in those lines: the value goes from 1059 straight to 1100 because 1060 through 1099 are not valid times. We will fix this below.
Default Plot

# This will plot all columns on the same axes so long as the values
# in each column are numerical
df.plot()
# Move the legend to the side by tying the upper-left corner of the legend
# to the coordinate (1, .9), given as fractions of the axes width and height.
plt.legend(loc='upper left', bbox_to_anchor=(1.0, 0.9))
plt.title('Default Graph of a DataFrame')
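
The jumps in the Sunrise and Sunset lines come from treating HHMM clock values as ordinary numbers. One possible repair, sketched below, is to convert them to fractional hours before plotting. This is an assumption about how the fix could be done, not necessarily the approach taken on this page; the `times` DataFrame and the `hhmm_to_hours` helper are hypothetical.

```python
import pandas as pd

# Hypothetical sample: clock times stored as HHMM integers.
times = pd.DataFrame({"Sunrise": [759, 800, 801], "Sunset": [1659, 1700, 1701]})

def hhmm_to_hours(col: pd.Series) -> pd.Series:
    """Convert HHMM integers (e.g. 1659) to fractional hours (e.g. 16.98)."""
    return col // 100 + (col % 100) / 60

# apply() runs the helper on each column in turn.
hours = times.apply(hhmm_to_hours)

# With both columns in hours, the length of the day is a simple subtraction.
hours["DayLength"] = hours["Sunset"] - hours["Sunrise"]
print(hours.round(3))
```

After the conversion, one minute is always 1/60 of an hour, so the step from 7:59 to 8:00 is the same size as any other minute and the values sit on a 0 to 24 scale that is much closer to the temperatures.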

Simple Plots

Temperatures
Day light
Min & Max
Custom Line

Default Plot
We take a slice of the dataframe using loc and then plot only three columns.

# do a time-slice using the fact that we have a TimeSeries and plot only 2021
# plot the three lines on the DataFrame by providing all the column names
df.loc['2021-01':'2021-12'].plot(y=['MaxTemp', 'MinTemp', 'AvgTemp'])
plt.title('Temperatures for 2021')

Twinx Plots

There are times when we want to plot two lines together but their units are dramatically different. You can see this impact in the DataFrame Default image at the top of this page.

Here we will look at an example where the units get in the way of our plot. Then, we will address it using twinx. Online example

Let’s plot the normal temperature (average temperature over many years) along with the average current temperature and the amount of current precipitation.

First Attempt
Rescaled

Bad Precipitation
Here you’ll see how the amount of precipitation is dwarfed by the scale of the temperatures. Below is the code used to generate the above plot. Note that we resample at the week interval to smooth out the data. And, for precipitation, we use sum instead of mean to get a larger, more representative value.

We also use loc to do a time-slice of the data, which is easy because our DataFrame is indexed by date: pandas parses partial date strings such as '2021-08' and selects the matching range of rows.

# Creating a DataFrame and plotting it is very simple and gets us most of 
# what we want. We are missing a proper scale for the rain.

# Resample the Average Temperature at the weekly rate to remove noise.
at = df['AvgTemp'].loc['2021-08':'2022-07'].resample("W").mean()

# Resample and sum up the rain for the week to amplify the values.
# With mean(), the amounts of rain would be too small to see.
rain = df['Precipitation'].loc['2021-08':'2022-07'].resample('W').sum()

# To have a valid DataFrame, we need an equal number of rows in each column.
# Even though the normal temperature is already smooth, we resample to reduce
# the number of rows to match the other columns and to give a good mean value.
sn = df['Normal'].loc['2021-08':'2022-07'].resample("W").mean()

# Create the data frame from a dictionary representation of these 3 Series.
df_year = pd.DataFrame({'Normal Temp': sn, 
                        'Avg Temp for Week': at, 
                        'Total Rain for Week': rain})
df_year.plot()
plt.title('A Year of Precipitation & Temperatures')
# The y-axis represents two units!
plt.ylabel('Temp in Fahrenheit\nPrecipitation in Inches')
# Remove 'DATE' from label as it is obvious
plt.xlabel('')
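
To give the rain its own scale, the precipitation line can be moved onto a second y-axis created with twinx. The sketch below uses synthetic weekly values in a hypothetical `demo_year` DataFrame that mirrors the column names of df_year above, so the numbers are not real measurements.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical weekly data standing in for df_year; all values are synthetic.
weeks = pd.date_range("2021-08-01", periods=52, freq="W", name="DATE")
rng = np.random.default_rng(0)
temps = 52 + 12 * np.cos(np.linspace(0, 2 * np.pi, 52))
demo_year = pd.DataFrame(
    {
        "Normal Temp": temps,
        "Avg Temp for Week": temps + rng.normal(0, 3, 52),
        "Total Rain for Week": rng.uniform(0, 2.5, 52),
    },
    index=weeks,
)

fig, ax_temp = plt.subplots()
# Temperatures go on the left y-axis.
demo_year[["Normal Temp", "Avg Temp for Week"]].plot(ax=ax_temp, legend=False)
ax_temp.set_ylabel("Temp in Fahrenheit")
ax_temp.set_xlabel("")

# twinx() creates a second Axes that shares the x-axis but has its own
# y-axis on the right, so the rain is no longer dwarfed by the temperatures.
ax_rain = ax_temp.twinx()
demo_year["Total Rain for Week"].plot(ax=ax_rain, color="tab:green")
ax_rain.set_ylabel("Precipitation in Inches")

# Each Axes tracks its own lines, so merge them by hand into one legend.
lines = ax_temp.get_lines() + ax_rain.get_lines()
ax_temp.legend(lines, [line.get_label() for line in lines], loc="upper left")
ax_temp.set_title("A Year of Precipitation & Temperatures")
```

Each y-axis now carries a single unit, which also removes the need for a two-unit y-axis label. pandas also offers a secondary_y argument on DataFrame.plot that builds the second axis for you; calling twinx directly is used here because it makes the two Axes objects explicit.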

Another Twinx Example

This example is nice because it annotates the daylight curve with the length of the day. The plot shows how the average temperature “lags” the length of the day: the temperature doesn’t change immediately with the length of the day.

The code takes a time-slice for a 12-month period. When finding the equinox points, we use tuple packing and unpacking to combine two lines of code into one. This is a bit of a trick, but it is a nice way to shorten highly similar code.

Image
Code
Temperature & Daylight
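
The tuple packing and unpacking trick mentioned above can be sketched as follows. The daylight curve here is a synthetic cosine, not the page's real data, and "closest to 12 hours of daylight" is an assumed definition of the equinox points used only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daylight curve for one year: 12 hours plus or minus 4 hours,
# shortest in late December.
days = pd.date_range("2021-01-01", "2021-12-31", freq="D", name="DATE")
day_of_year = days.dayofyear.to_numpy()
daylight = pd.Series(
    12 - 4 * np.cos(2 * np.pi * (day_of_year + 10) / 365), index=days
)

# How far each day is from an even 12 hours of daylight.
gap = (daylight - 12).abs()

# Tuple packing and unpacking: the right side builds a 2-tuple and the left
# side unpacks it, so two near-identical statements fit on one line.
spring, fall = gap.loc[:"2021-06-30"].idxmin(), gap.loc["2021-07-01":].idxmin()

print(spring.date(), fall.date())
```

The single line is equivalent to two separate assignments, one for each half of the year. The resulting timestamps could then be passed to an annotation call to mark the points on the daylight curve.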

Seaborn

I’ve found that using Seaborn is helpful in only a few situations.

  1. When the data is in the “unpivoted” format.

  2. When you have multiple y-values at the same x-value and you want to average them out automatically.

  3. When you have many lines that you want to plot with different colors.

  4. When you want to differentiate the lines using size or style.

  5. When you want to do a scatter plot and do a line of best fit.

Pivoted vs Unpivoted

PIVOTED
The “pivot” format is where the data is simply organized by column. There is a column that contains the x-values, and there is a column that contains the y-values. In the data shown below, we would use the column year as the x-value, and the columns investment and/or return as the y-value to draw one or two lines.
Pivot Data

UNPIVOTED
The “unpivoted” format is where there is a column dedicated to expressing which “line” the row belongs to. For example, in the data shown below, you might use the DATE column of the DataFrame as the x-axis, use the value column as the y-value, and have different lines depending on the source column. In other words, you’d have one line for ‘MaxTemp’ and another line for ‘AvgTemp’. In this example, the source column contains the name of the line that the data belongs to.
Unpivoted Data

Here is code that shows how we can restructure the data from one format to the other.

df_unpivoted = df_pivoted.melt(value_name='value', var_name='source', ignore_index=False)
df_unpivoted.reset_index(inplace=True)

# go back to the original pivoted format
df_original = df_unpivoted.pivot(index='DATE', columns='source', values='value')