In this homework, you’ll read, process, and group CSV data to compute descriptive statistics in two ways for each problem: with the Pandas library and without the Pandas library.
import doctest
import io
import pandas as pd
# For prettifying doctest output involving data structures
# See also: https://stackoverflow.com/a/21227671
from pprint import pprint
In the Pokémon video game series, the player catches pokemon, fictional creatures trained to battle each other as part of a sport franchise. For this first task, you’ll practice creating your own pokemon-themed CSV dataset in the following format.
pokemon_box = pd.read_csv("pokemon_box.csv")
pokemon_box
- id is a numeric identifier that is unique to each species of pokemon.
- name is the name of the species of pokemon, such as Bulbasaur.
- level is the integer level of the pokemon.
- personality is a one-word lowercase string describing the personality of the pokemon, such as jolly.
- type is a one-word lowercase string describing the type of the pokemon, such as grass.
- weakness is the enemy type that this pokemon is weak toward. For example, Bulbasaur is weak to fire-type pokemon.
- atk, def, and hp are integers that indicate the attack power, defense power, and hit points of the pokemon.
- stage is an integer that indicates the particular developmental stage of the pokemon.
Assume the data is never empty (there’s at least one pokemon), that there’s no missing data (each pokemon has every attribute), and pokemon stats can be any non-negative integers, including 0.
This assessment introduces a new way of validating and testing your data programs by comparing two different approaches to implementing the same function: writing an implementation once using plain Python and again using Pandas. For each programming task below, you’ll write, document, and test each function in the same way to build confidence in their correctness and robustness.
In addition to the large pokemon_box dataset above, we’ve provided a much smaller pokemon_test dataset below.
pokemon_test = pd.read_csv(io.StringIO("""
id,name,level,personality,type,weakness,atk,def,hp,stage
59,Arcanine,35,impish,fire,water,50,55,90,2
59,Arcanine,35,gentle,fire,water,45,60,80,2
121,Starmie,67,sassy,water,electric,174,56,113,2
131,Lapras,72,lax,water,electric,107,113,29,1
"""))
pokemon_test
Note that it’s possible to have multiple pokemon with very similar attributes. In the pokemon_test dataset, there are two pokemon named “Arcanine” with the same id, level, and type, differing only in personality, atk, def, and hp. Since there’s no clearly unique key to use as an index, we won’t define a meaningful index for this assessment.
Outside Sources
Update the following Markdown cell to include your name and list your outside sources. Submitted work should be consistent with the curriculum and your sources.
Name: YOUR_NAME_HERE
- Enter your outside sources as a list here, or remove this line if you did not consult any outside sources at all.
Task: Create your own dataset
Before starting your programming tasks, create at least one additional testing dataset below. In total, each function you write should contain 3 tests:
- One test for the large pokemon_box dataset.
- One test for the small pokemon_test dataset.
- One test for your own pokemon_mine dataset below.
pokemon_mine = pd.read_csv(io.StringIO(
...
))
pokemon_mine
Task: Species count
Write a function python_species_count that takes a list of dictionaries representing the pokemon dataset and returns the number of unique pokemon species in the dataset, as determined by the name attribute, without using Pandas.
Write a function pandas_species_count that does the same thing but using a DataFrame as input.
Add your test case and a descriptive docstring for both functions.
def python_species_count(data):
"""
...
>>> python_species_count(pokemon_box.to_dict("records"))
82
>>> python_species_count(pokemon_test.to_dict("records"))
3
"""
...
doctest.run_docstring_examples(python_species_count, globals())
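One possible plain-Python approach is to collect every name into a set and count its elements. This is a minimal sketch, not a prescribed solution. The function name species_count_sketch and the trimmed sample_records (only the name column of pokemon_test) are hypothetical, and the different name keeps this cell from overwriting your python_species_count stub.

```python
def species_count_sketch(data):
    """Return how many distinct 'name' values appear in data, a list of
    dictionaries with one dictionary per pokemon. (Hypothetical helper.)"""
    names = set()
    for pokemon in data:
        # Adding a name that is already in the set has no effect, so duplicates collapse
        names.add(pokemon["name"])
    return len(names)


# Trimmed copy of the pokemon_test rows: only the "name" column is needed here
sample_records = [
    {"name": "Arcanine"},
    {"name": "Arcanine"},
    {"name": "Starmie"},
    {"name": "Lapras"},
]
print(species_count_sketch(sample_records))  # 3
```

The set comprehension len({pokemon["name"] for pokemon in data}) builds the same set in a single expression.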
def pandas_species_count(data):
"""
...
>>> pandas_species_count(pokemon_box)
82
>>> pandas_species_count(pokemon_test)
3
"""
...
doctest.run_docstring_examples(pandas_species_count, globals())
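A Pandas counterpart can rely on Series.nunique, which counts the distinct non-null values in a column and returns a plain Python int. This is a minimal sketch. The name species_count_pandas_sketch is hypothetical, and sample_df re-reads the same rows as pokemon_test.

```python
import io

import pandas as pd


def species_count_pandas_sketch(data):
    """Return how many distinct values appear in the 'name' column of the
    DataFrame data. (Hypothetical helper.)"""
    # Series.nunique counts distinct non-null values; the dataset has no missing data
    return data["name"].nunique()


# The same rows as pokemon_test
sample_df = pd.read_csv(io.StringIO("""
id,name,level,personality,type,weakness,atk,def,hp,stage
59,Arcanine,35,impish,fire,water,50,55,90,2
59,Arcanine,35,gentle,fire,water,45,60,80,2
121,Starmie,67,sassy,water,electric,174,56,113,2
131,Lapras,72,lax,water,electric,107,113,29,1
"""))
print(species_count_pandas_sketch(sample_df))  # 3
```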
Task: Max level
Write a function python_max_level that takes a list of dictionaries representing the pokemon dataset and returns a 2-element tuple (name, level) for the pokemon with the highest level in the dataset. If multiple pokemon share the highest level, return the one that appears first in the dataset.
Write a function pandas_max_level that does the same thing but using a DataFrame as input.
Add your test case and a descriptive docstring for both functions.
def python_max_level(data):
"""
...
>>> python_max_level(pokemon_box.to_dict("records"))
('Victreebel', 100)
>>> python_max_level(pokemon_test.to_dict("records"))
('Lapras', 72)
"""
...
doctest.run_docstring_examples(python_max_level, globals())
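A plain-Python sketch can scan the records once and keep the best one seen so far. Using a strict comparison is what makes the first of several tied pokemon win. It relies on the stated assumption that the data is never empty. The name max_level_sketch and the trimmed sample records are hypothetical.

```python
def max_level_sketch(data):
    """Return (name, level) for the highest-level pokemon in data, a non-empty
    list of dictionaries. Ties go to the record that appears first.
    (Hypothetical helper.)"""
    best = data[0]
    for pokemon in data[1:]:
        # Strict ">" means a later pokemon with an equal level never replaces best
        if pokemon["level"] > best["level"]:
            best = pokemon
    return (best["name"], best["level"])


# Trimmed copy of the pokemon_test rows: only "name" and "level" are needed
sample_records = [
    {"name": "Arcanine", "level": 35},
    {"name": "Arcanine", "level": 35},
    {"name": "Starmie", "level": 67},
    {"name": "Lapras", "level": 72},
]
print(max_level_sketch(sample_records))  # ('Lapras', 72)
```

The built-in max(data, key=lambda pokemon: pokemon["level"]) is a shorter alternative. When several items are maximal, max also returns the first one encountered.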
def pandas_max_level(data):
"""
...
>>> pandas_max_level(pokemon_box)
('Victreebel', 100)
>>> pandas_max_level(pokemon_test)
('Lapras', 72)
"""
...
doctest.run_docstring_examples(pandas_max_level, globals())
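In Pandas, Series.idxmax returns the index label of the first occurrence of the maximum, so it handles ties the way the task requires. This sketch assumes the default RangeIndex that read_csv produces, because no meaningful index is defined in this assessment. The name max_level_pandas_sketch is hypothetical. The int() call converts the NumPy integer so the tuple prints as a plain number.

```python
import io

import pandas as pd


def max_level_pandas_sketch(data):
    """Return (name, level) for the highest-level pokemon in the DataFrame
    data. Ties go to the row that appears first. (Hypothetical helper.)"""
    # idxmax gives the label of the first row holding the maximum level
    row = data.loc[data["level"].idxmax()]
    # int() turns the level into a built-in int so the tuple prints plainly
    return (row["name"], int(row["level"]))


# The same rows as pokemon_test
sample_df = pd.read_csv(io.StringIO("""
id,name,level,personality,type,weakness,atk,def,hp,stage
59,Arcanine,35,impish,fire,water,50,55,90,2
59,Arcanine,35,gentle,fire,water,45,60,80,2
121,Starmie,67,sassy,water,electric,174,56,113,2
131,Lapras,72,lax,water,electric,107,113,29,1
"""))
print(max_level_pandas_sketch(sample_df))  # ('Lapras', 72)
```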
Task: Filter range
Write a function python_filter_range that takes a list of dictionaries representing the pokemon dataset and two integers: a lower bound (inclusive) and an upper bound (exclusive). The function should return a list of the names of pokemon whose level falls within the bounds, in the same order that they appear in the dataset.
Write a function pandas_filter_range that does the same thing but using a DataFrame as input. To convert a Series to a list, use the built-in list function as shown below.
csv = """
name,age,species
Fido,4,dog
Meowrty,6,cat
Chester,1,dog
Phil,1,axolotl
"""
data = pd.read_csv(io.StringIO(csv))
list(data['name'])
# ['Fido', 'Meowrty', 'Chester', 'Phil']
list(data.loc[1])
# ['Meowrty', 6, 'cat']
Add your test case and a descriptive docstring for both functions.
def python_filter_range(data, lower, upper):
"""
...
>>> pprint(python_filter_range(pokemon_box.to_dict("records"), 0, 10))
['Primeape',
'Metapod',
'Caterpie',
'Ninetales',
'Weezing',
'Tangela',
'Butterfree',
'Exeggcute',
'Arcanine']
>>> pprint(python_filter_range(pokemon_test.to_dict("records"), 35, 72))
['Arcanine', 'Arcanine', 'Starmie']
"""
...
doctest.run_docstring_examples(python_filter_range, globals())
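A plain-Python sketch can express the half-open range with a chained comparison inside a list comprehension. A list comprehension keeps the records in their original order. The name filter_range_sketch and the trimmed sample records are hypothetical.

```python
def filter_range_sketch(data, lower, upper):
    """Return the names of pokemon in data (a list of dictionaries) whose
    level satisfies lower <= level < upper, in dataset order.
    (Hypothetical helper.)"""
    # Chained comparison: inclusive lower bound, exclusive upper bound
    return [pokemon["name"] for pokemon in data
            if lower <= pokemon["level"] < upper]


# Trimmed copy of the pokemon_test rows: only "name" and "level" are needed
sample_records = [
    {"name": "Arcanine", "level": 35},
    {"name": "Arcanine", "level": 35},
    {"name": "Starmie", "level": 67},
    {"name": "Lapras", "level": 72},
]
print(filter_range_sketch(sample_records, 35, 72))
# ['Arcanine', 'Arcanine', 'Starmie']
```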
def pandas_filter_range(data, lower, upper):
"""
...
>>> pprint(pandas_filter_range(pokemon_box, 0, 10))
['Primeape',
'Metapod',
'Caterpie',
'Ninetales',
'Weezing',
'Tangela',
'Butterfree',
'Exeggcute',
'Arcanine']
>>> pprint(pandas_filter_range(pokemon_test, 35, 72))
['Arcanine', 'Arcanine', 'Starmie']
"""
...
doctest.run_docstring_examples(pandas_filter_range, globals())
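With Pandas, one option is to build a boolean mask from two comparisons combined with &, then select the name column and convert it with list. Boolean indexing keeps rows in their original order. This is a minimal sketch. The name filter_range_pandas_sketch is hypothetical.

```python
import io

import pandas as pd


def filter_range_pandas_sketch(data, lower, upper):
    """Return the names of pokemon in the DataFrame data whose level satisfies
    lower <= level < upper, in dataset order. (Hypothetical helper.)"""
    # Each comparison needs its own parentheses because & binds tighter than >= and <
    mask = (data["level"] >= lower) & (data["level"] < upper)
    # Select matching rows in the "name" column and convert the Series to a list
    return list(data.loc[mask, "name"])


# The same rows as pokemon_test
sample_df = pd.read_csv(io.StringIO("""
id,name,level,personality,type,weakness,atk,def,hp,stage
59,Arcanine,35,impish,fire,water,50,55,90,2
59,Arcanine,35,gentle,fire,water,45,60,80,2
121,Starmie,67,sassy,water,electric,174,56,113,2
131,Lapras,72,lax,water,electric,107,113,29,1
"""))
print(filter_range_pandas_sketch(sample_df, 35, 72))
# ['Arcanine', 'Arcanine', 'Starmie']
```

In pandas 1.3 and later, data["level"].between(lower, upper, inclusive="left") builds the same half-open mask.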
Task: Mean attack for type
Write a function python_mean_attack_for_type that takes a list of dictionaries representing the pokemon dataset and a str representing the pokemon type. The function should return the average atk of all the pokemon in the dataset with the given type. If there are no pokemon of the given type, return None.
Write a function pandas_mean_attack_for_type that does the same thing but using a DataFrame as input.
Add your test case and a descriptive docstring for both functions.
def python_mean_attack_for_type(data, pokemon_type):
"""
...
>>> python_mean_attack_for_type(pokemon_box.to_dict("records"), "water")
99.75
>>> python_mean_attack_for_type(pokemon_test.to_dict("records"), "fire")
47.5
"""
...
doctest.run_docstring_examples(python_mean_attack_for_type, globals())
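A plain-Python sketch can keep a running total and a count for the matching type. It returns None before dividing when nothing matched, which avoids a ZeroDivisionError. The name mean_attack_for_type_sketch and the trimmed sample records are hypothetical.

```python
def mean_attack_for_type_sketch(data, pokemon_type):
    """Return the average 'atk' of pokemon in data (a list of dictionaries)
    whose 'type' equals pokemon_type, or None if none match.
    (Hypothetical helper.)"""
    total = 0
    count = 0
    for pokemon in data:
        if pokemon["type"] == pokemon_type:
            total += pokemon["atk"]
            count += 1
    # No matching pokemon: return None instead of dividing by zero
    if count == 0:
        return None
    return total / count


# Trimmed copy of the pokemon_test rows: only "type" and "atk" are needed
sample_records = [
    {"type": "fire", "atk": 50},
    {"type": "fire", "atk": 45},
    {"type": "water", "atk": 174},
    {"type": "water", "atk": 107},
]
print(mean_attack_for_type_sketch(sample_records, "fire"))   # 47.5
print(mean_attack_for_type_sketch(sample_records, "grass"))  # None
```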
def pandas_mean_attack_for_type(data, pokemon_type):
"""
...
>>> pandas_mean_attack_for_type(pokemon_box, "water")
99.75
>>> pandas_mean_attack_for_type(pokemon_test, "fire")
47.5
"""
...
doctest.run_docstring_examples(pandas_mean_attack_for_type, globals())
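A Pandas sketch can select the atk values for the requested type and average them. The explicit emptiness check matters because the mean of an empty Series is NaN, not None. The float() call converts the NumPy scalar so the doctest output reads like 47.5. The name mean_attack_for_type_pandas_sketch is hypothetical.

```python
import io

import pandas as pd


def mean_attack_for_type_pandas_sketch(data, pokemon_type):
    """Return the average 'atk' of rows in the DataFrame data whose 'type'
    equals pokemon_type, or None if none match. (Hypothetical helper.)"""
    matching = data.loc[data["type"] == pokemon_type, "atk"]
    # An empty selection would otherwise produce NaN from mean()
    if matching.empty:
        return None
    return float(matching.mean())


# The same rows as pokemon_test
sample_df = pd.read_csv(io.StringIO("""
id,name,level,personality,type,weakness,atk,def,hp,stage
59,Arcanine,35,impish,fire,water,50,55,90,2
59,Arcanine,35,gentle,fire,water,45,60,80,2
121,Starmie,67,sassy,water,electric,174,56,113,2
131,Lapras,72,lax,water,electric,107,113,29,1
"""))
print(mean_attack_for_type_pandas_sketch(sample_df, "fire"))   # 47.5
print(mean_attack_for_type_pandas_sketch(sample_df, "grass"))  # None
```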
Task: Count types
Write a function python_count_types that takes a list of dictionaries representing the pokemon dataset and returns a dictionary mapping each pokemon type to the number of pokemon of that type. The order of entries in the returned dictionary does not matter.
Write a function pandas_count_types that does the same thing but using a DataFrame as input. To convert a Series to a dict, use the built-in dict function as shown below.
csv = """
name,age,species
Fido,4,dog
Meowrty,6,cat
Chester,1,dog
Phil,1,axolotl
"""
data = pd.read_csv(io.StringIO(csv))
dict(data['name'])
# {0: 'Fido', 1: 'Meowrty', 2: 'Chester', 3: 'Phil'}
dict(data.loc[1])
# {'name': 'Meowrty', 'age': 6, 'species': 'cat'}
Add your test case and a descriptive docstring for both functions.
def python_count_types(data):
"""
...
>>> pprint(python_count_types(pokemon_box.to_dict("records")))
{'bug': 3,
'electric': 1,
'fairy': 3,
'fighting': 3,
'fire': 15,
'flying': 6,
'ghost': 3,
'grass': 17,
'ground': 5,
'normal': 10,
'poison': 12,
'psychic': 6,
'rock': 7,
'water': 24}
>>> pprint(python_count_types(pokemon_test.to_dict("records")))
{'fire': 2, 'water': 2}
"""
...
doctest.run_docstring_examples(python_count_types, globals())
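A plain-Python sketch can build the counts with dict.get, which returns a default of 0 the first time a type is seen. The name count_types_sketch and the trimmed sample records are hypothetical.

```python
def count_types_sketch(data):
    """Return a dictionary mapping each 'type' in data (a list of
    dictionaries) to how many pokemon have that type. (Hypothetical helper.)"""
    counts = {}
    for pokemon in data:
        pokemon_type = pokemon["type"]
        # dict.get supplies 0 for a type that has not been counted yet
        counts[pokemon_type] = counts.get(pokemon_type, 0) + 1
    return counts


# Trimmed copy of the pokemon_test rows: only "type" is needed
sample_records = [
    {"type": "fire"},
    {"type": "fire"},
    {"type": "water"},
    {"type": "water"},
]
print(count_types_sketch(sample_records))  # {'fire': 2, 'water': 2}
```

collections.Counter is a standard-library alternative. dict(Counter(pokemon["type"] for pokemon in data)) produces the same mapping.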
def pandas_count_types(data):
"""
...
>>> pprint(pandas_count_types(pokemon_box))
{'bug': 3,
'electric': 1,
'fairy': 3,
'fighting': 3,
'fire': 15,
'flying': 6,
'ghost': 3,
'grass': 17,
'ground': 5,
'normal': 10,
'poison': 12,
'psychic': 6,
'rock': 7,
'water': 24}
>>> pprint(pandas_count_types(pokemon_test))
{'fire': 2, 'water': 2}
"""
...
doctest.run_docstring_examples(pandas_count_types, globals())
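In Pandas, Series.value_counts counts how often each value appears. Calling dict() on the result would also work, but its values would be NumPy integers, which NumPy 2 prints as np.int64(2). That output may not match the doctest. This sketch therefore converts each count with int(). The name count_types_pandas_sketch is hypothetical. pprint sorts the keys, so the printed order is stable.

```python
import io
from pprint import pprint

import pandas as pd


def count_types_pandas_sketch(data):
    """Return a dictionary mapping each 'type' in the DataFrame data to how
    many rows have that type. (Hypothetical helper.)"""
    counts = data["type"].value_counts()
    # items() yields (type, count) pairs; int() strips the NumPy integer type
    return {pokemon_type: int(n) for pokemon_type, n in counts.items()}


# The same rows as pokemon_test
sample_df = pd.read_csv(io.StringIO("""
id,name,level,personality,type,weakness,atk,def,hp,stage
59,Arcanine,35,impish,fire,water,50,55,90,2
59,Arcanine,35,gentle,fire,water,45,60,80,2
121,Starmie,67,sassy,water,electric,174,56,113,2
131,Lapras,72,lax,water,electric,107,113,29,1
"""))
pprint(count_types_pandas_sketch(sample_df))  # {'fire': 2, 'water': 2}
```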
Task: Mean attack per type
Write a function python_mean_attack_per_type that takes a list of dictionaries representing the pokemon dataset and returns a dictionary mapping each pokemon type to the average atk of pokemon of that type. The order of entries in the returned dictionary does not matter.
Write a function pandas_mean_attack_per_type that does the same thing but using a DataFrame as input.
Add your test case and a descriptive docstring for both functions.
def python_mean_attack_per_type(data):
"""
...
>>> pprint(python_mean_attack_per_type(pokemon_box.to_dict("records")))
{'bug': 25.0,
'electric': 64.0,
'fairy': 76.33333333333333,
'fighting': 99.66666666666667,
'fire': 99.4,
'flying': 110.83333333333333,
'ghost': 88.0,
'grass': 105.3529411764706,
'ground': 116.6,
'normal': 108.0,
'poison': 121.75,
'psychic': 114.83333333333333,
'rock': 84.85714285714286,
'water': 99.75}
>>> pprint(python_mean_attack_per_type(pokemon_test.to_dict("records")))
{'fire': 47.5, 'water': 140.5}
"""
...
doctest.run_docstring_examples(python_mean_attack_per_type, globals())
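A plain-Python sketch can accumulate a total and a count per type in two dictionaries, then divide once at the end. Every type in totals also appears in counts with a count of at least 1, so the division is safe. The name mean_attack_per_type_sketch and the trimmed sample records are hypothetical.

```python
def mean_attack_per_type_sketch(data):
    """Return a dictionary mapping each 'type' in data (a list of
    dictionaries) to the average 'atk' of that type. (Hypothetical helper.)"""
    totals = {}
    counts = {}
    for pokemon in data:
        pokemon_type = pokemon["type"]
        totals[pokemon_type] = totals.get(pokemon_type, 0) + pokemon["atk"]
        counts[pokemon_type] = counts.get(pokemon_type, 0) + 1
    # Every key in totals was also counted at least once
    return {t: totals[t] / counts[t] for t in totals}


# Trimmed copy of the pokemon_test rows: only "type" and "atk" are needed
sample_records = [
    {"type": "fire", "atk": 50},
    {"type": "fire", "atk": 45},
    {"type": "water", "atk": 174},
    {"type": "water", "atk": 107},
]
print(mean_attack_per_type_sketch(sample_records))
# {'fire': 47.5, 'water': 140.5}
```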
def pandas_mean_attack_per_type(data):
"""
...
>>> pprint(pandas_mean_attack_per_type(pokemon_box))
{'bug': 25.0,
'electric': 64.0,
'fairy': 76.33333333333333,
'fighting': 99.66666666666667,
'fire': 99.4,
'flying': 110.83333333333333,
'ghost': 88.0,
'grass': 105.3529411764706,
'ground': 116.6,
'normal': 108.0,
'poison': 121.75,
'psychic': 114.83333333333333,
'rock': 84.85714285714286,
'water': 99.75}
>>> pprint(pandas_mean_attack_per_type(pokemon_test))
{'fire': 47.5, 'water': 140.5}
"""
...
doctest.run_docstring_examples(pandas_mean_attack_per_type, globals())
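With Pandas, groupby("type")["atk"].mean() computes all the per-type averages at once and returns a Series indexed by type. This sketch turns that Series into a dictionary and converts each mean with float(), for the same printing reason as in the earlier tasks. The name mean_attack_per_type_pandas_sketch is hypothetical.

```python
import io

import pandas as pd


def mean_attack_per_type_pandas_sketch(data):
    """Return a dictionary mapping each 'type' in the DataFrame data to the
    average 'atk' of that type. (Hypothetical helper.)"""
    # One mean per group; the result is a Series whose index is the type
    means = data.groupby("type")["atk"].mean()
    return {pokemon_type: float(m) for pokemon_type, m in means.items()}


# The same rows as pokemon_test
sample_df = pd.read_csv(io.StringIO("""
id,name,level,personality,type,weakness,atk,def,hp,stage
59,Arcanine,35,impish,fire,water,50,55,90,2
59,Arcanine,35,gentle,fire,water,45,60,80,2
121,Starmie,67,sassy,water,electric,174,56,113,2
131,Lapras,72,lax,water,electric,107,113,29,1
"""))
print(mean_attack_per_type_pandas_sketch(sample_df))
# {'fire': 47.5, 'water': 140.5}
```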
Testing
test_results = doctest.testmod()
print(test_results)
assert test_results.failed == 0, "There are failed doctests."
assert test_results.attempted >= 36, "Total number of doctests should be at least 36; less than 36 means you did not have three tests per function."