Pokemon - Intermediate Data Programming

In this homework, you’ll read, process, and group CSV data to compute descriptive statistics in two ways for each problem: with the Pandas library and without the Pandas library.

import doctest
import io
import pandas as pd

# For prettifying doctest output involving data structures
# See also: https://stackoverflow.com/a/21227671
from pprint import pprint

In the Pokémon video game series, the player catches pokemon, fictional creatures trained to battle each other as part of a sport franchise. For this first task, you’ll practice creating your own pokemon-themed CSV dataset in the following format.

pokemon_box = pd.read_csv("pokemon_box.csv")
pokemon_box

id is a unique numeric identifier corresponding to the species of a pokemon.
name is the name of the species of pokemon, such as Bulbasaur.
level is the integer level of the pokemon.
personality is a one-word string describing the personality of the pokemon, such as Jolly.
type is a one-word string describing the type of the pokemon, such as Grass.
weakness is the enemy type that this pokemon is weak against. For example, Bulbasaur is weak against fire-type pokemon.
atk, def, hp are integers that indicate the attack power, defense power, and hit points of the pokemon.
stage is an integer that indicates the particular developmental stage of the pokemon.

Assume that the data is never empty (there’s at least one pokemon), that there’s no missing data (each pokemon has every attribute), and that pokemon stats can be any non-negative integers, including 0.

This assessment introduces a new way of validating and testing your data programs by comparing two different approaches to implementing the same function: writing an implementation once using plain Python and again using Pandas. For each programming task below, you’ll write, document, and test each function in the same way to build confidence in their correctness and robustness.
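
As a minimal sketch of this two-implementation approach, the example below computes the same statistic twice on a small pet dataset and checks that the two answers agree. The dataset and the function names python_total_age and pandas_total_age are hypothetical and are not part of the assignment.

```python
import io

import pandas as pd

# Hypothetical toy dataset, unrelated to the graded tasks.
pets = pd.read_csv(io.StringIO("""
name,age,species
Fido,4,dog
Meowrty,6,cat
Chester,1,dog
Phil,1,axolotl
"""))


def python_total_age(data):
    """Return the sum of the age values in a list of dictionaries."""
    total = 0
    for row in data:
        total += row["age"]
    return total


def pandas_total_age(data):
    """Return the sum of the age column of a DataFrame as a plain int."""
    return int(data["age"].sum())


# If two independent implementations disagree, at least one of them has a bug.
assert python_total_age(pets.to_dict("records")) == pandas_total_age(pets)
```

Agreement between the two versions does not prove that either is correct, but a disagreement is a reliable sign that something needs a closer look.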

In addition to the large pokemon_box dataset above, we’ve provided a much smaller pokemon_test dataset below.

pokemon_test = pd.read_csv(io.StringIO("""
id,name,level,personality,type,weakness,atk,def,hp,stage
59,Arcanine,35,impish,fire,water,50,55,90,2
59,Arcanine,35,gentle,fire,water,45,60,80,2
121,Starmie,67,sassy,water,electric,174,56,113,2
131,Lapras,72,lax,water,electric,107,113,29,1
"""))
pokemon_test

Note that it’s possible to have multiple pokemon that have very similar attributes. In the pokemon_test dataset, there are two pokemon named “Arcanine” with the same id, level, and type, differing only in personality, atk, def, and hp. Since there’s no clearly unique key to use as an index, we won’t define a meaningful index for this assessment.
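
One way to see the repeated key for yourself is sketched below. It rebuilds the pokemon_test data from above, flags rows whose id is shared with another row, and shows the default positional index that read_csv assigns when no index column is requested. The variable name shared_id is only illustrative.

```python
import io

import pandas as pd

pokemon_test = pd.read_csv(io.StringIO("""
id,name,level,personality,type,weakness,atk,def,hp,stage
59,Arcanine,35,impish,fire,water,50,55,90,2
59,Arcanine,35,gentle,fire,water,45,60,80,2
121,Starmie,67,sassy,water,electric,174,56,113,2
131,Lapras,72,lax,water,electric,107,113,29,1
"""))

# keep=False marks every row that shares its id with some other row.
shared_id = pokemon_test.duplicated(subset=["id"], keep=False)
print(shared_id.tolist())  # [True, True, False, False]

# With no index column requested, rows are simply labeled by position.
print(pokemon_test.index.tolist())  # [0, 1, 2, 3]
```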

Outside Sources

Update the following Markdown cell to include your name and list your outside sources. Submitted work should be consistent with the curriculum and your sources.

Name: YOUR_NAME_HERE

Enter your outside sources as a list here, or remove this line if you did not consult any outside sources at all.

Task: Create your own dataset

Before starting your programming tasks, create at least one additional testing dataset below. In total, each function you write should contain three tests:

One test for the large pokemon_box dataset.
One test for the small pokemon_test dataset.
One test for your own pokemon_mine dataset below.

pokemon_mine = pd.read_csv(io.StringIO(
    ...
))
pokemon_mine

Task: Species count

Write a function python_species_count that takes a list of dictionaries representing the pokemon dataset and returns the number of unique pokemon species in the dataset as determined by the name attribute without using Pandas.

Write a function pandas_species_count that does the same thing but using a DataFrame as input.
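
The doctests in this assessment build the list-of-dictionaries input by calling to_dict("records") on a DataFrame. The sketch below shows what that conversion produces, using a small hypothetical pet dataset rather than the pokemon data.

```python
import io

import pandas as pd

# Hypothetical toy dataset, unrelated to the graded tasks.
pets = pd.read_csv(io.StringIO("""
name,age,species
Fido,4,dog
Meowrty,6,cat
Chester,1,dog
Phil,1,axolotl
"""))

# One dictionary per row, keyed by column name, in the original row order.
records = pets.to_dict("records")
print(len(records))  # 4
print(records[0])  # {'name': 'Fido', 'age': 4, 'species': 'dog'}

# Each row is an ordinary dict, so values are looked up by column name.
print(records[1]["species"])  # cat
```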

Add your test case and a descriptive docstring for both functions.

def python_species_count(data):
    """
    ...

    >>> python_species_count(pokemon_box.to_dict("records"))
    82
    >>> python_species_count(pokemon_test.to_dict("records"))
    3
    """
    ...


doctest.run_docstring_examples(python_species_count, globals())

def pandas_species_count(data):
    """
    ...

    >>> pandas_species_count(pokemon_box)
    82
    >>> pandas_species_count(pokemon_test)
    3
    """
    ...


doctest.run_docstring_examples(pandas_species_count, globals())

Task: Max level

Write a function python_max_level that takes a list of dictionaries representing the pokemon dataset and returns a 2-element tuple for the (name, level) of the pokemon with the highest level in the dataset. If there are multiple pokemon with the highest level, return the pokemon that appears first in the dataset.

Write a function pandas_max_level that does the same thing but using a DataFrame as input.
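
The tie-breaking rule deserves a quick experiment before you write your own tests. The sketch below uses a hypothetical pet dataset with a deliberate tie and relies on two documented behaviors: the built-in max returns the first maximal item it encounters, and Series.idxmax returns the label of the first occurrence of the maximum. It is an illustration on toy data, not the required implementation.

```python
import io

import pandas as pd

# Hypothetical data with a deliberate tie: Meowrty and Rex are both 6.
pets = pd.read_csv(io.StringIO("""
name,age
Fido,4
Meowrty,6
Rex,6
"""))

records = pets.to_dict("records")

# max() keeps the first maximal element when several compare equal.
oldest = max(records, key=lambda row: row["age"])
print(oldest["name"])  # Meowrty

# idxmax() gives the index label of the first row holding the maximum.
first_label = pets["age"].idxmax()
print(pets.loc[first_label, "name"])  # Meowrty
```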

Add your test case and a descriptive docstring for both functions.

def python_max_level(data):
    """
    ...

    >>> python_max_level(pokemon_box.to_dict("records"))
    ('Victreebel', 100)
    >>> python_max_level(pokemon_test.to_dict("records"))
    ('Lapras', 72)
    """
    ...


doctest.run_docstring_examples(python_max_level, globals())

def pandas_max_level(data):
    """
    ...

    >>> pandas_max_level(pokemon_box)
    ('Victreebel', 100)
    >>> pandas_max_level(pokemon_test)
    ('Lapras', 72)
    """
    ...


doctest.run_docstring_examples(pandas_max_level, globals())

Task: Filter range

Write a function python_filter_range that takes a list of dictionaries representing the pokemon dataset and two integers: a lower bound (inclusive) and an upper bound (exclusive). The function should return a list of the names of pokemon whose level falls within the bounds, in the same order that they appear in the dataset.

Write a function pandas_filter_range that does the same thing but using a DataFrame as input. To convert a Series to a list, use the built-in list function as shown below.

csv = """
name,age,species
Fido,4,dog
Meowrty,6,cat
Chester,1,dog
Phil,1,axolotl
"""
data = pd.read_csv(io.StringIO(csv))

list(data['name'])
# ['Fido', 'Meowrty', 'Chester', 'Phil']

list(data.loc[1])
# ['Meowrty', 6, 'cat']
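
To make the inclusive lower bound and exclusive upper bound concrete, here is a small sketch on the same pet data. The bounds 1 and 4 are arbitrary illustrative values: ages of exactly 1 are kept, while Fido's age of exactly 4 is left out.

```python
import io

import pandas as pd

data = pd.read_csv(io.StringIO("""
name,age,species
Fido,4,dog
Meowrty,6,cat
Chester,1,dog
Phil,1,axolotl
"""))

lower, upper = 1, 4  # illustrative bounds: lower included, upper excluded

# Plain Python: a chained comparison expresses the half-open range directly.
ages = [row["age"] for row in data.to_dict("records")]
in_range = [age for age in ages if lower <= age < upper]
print(in_range)  # [1, 1]

# Pandas: combine two boolean masks with &, then select with .loc.
mask = (data["age"] >= lower) & (data["age"] < upper)
names = list(data.loc[mask, "name"])
print(names)  # ['Chester', 'Phil']
```

The parentheses around each comparison are needed because & binds more tightly than >= and <.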

Add your test case and a descriptive docstring for both functions.

def python_filter_range(data, lower, upper):
    """
    ...

    >>> pprint(python_filter_range(pokemon_box.to_dict("records"), 0, 10))
    ['Primeape',
     'Metapod',
     'Caterpie',
     'Ninetales',
     'Weezing',
     'Tangela',
     'Butterfree',
     'Exeggcute',
     'Arcanine']
    >>> pprint(python_filter_range(pokemon_test.to_dict("records"), 35, 72))
    ['Arcanine', 'Arcanine', 'Starmie']
    """
    ...


doctest.run_docstring_examples(python_filter_range, globals())

def pandas_filter_range(data, lower, upper):
    """
    ...

    >>> pprint(pandas_filter_range(pokemon_box, 0, 10))
    ['Primeape',
     'Metapod',
     'Caterpie',
     'Ninetales',
     'Weezing',
     'Tangela',
     'Butterfree',
     'Exeggcute',
     'Arcanine']
    >>> pprint(pandas_filter_range(pokemon_test, 35, 72))
    ['Arcanine', 'Arcanine', 'Starmie']
    """
    ...


doctest.run_docstring_examples(pandas_filter_range, globals())

Task: Mean attack for type

Write a function python_mean_attack_for_type that takes a list of dictionaries representing the pokemon dataset and a str representing the pokemon type. The function should return the average atk for all the pokemon in the dataset with the given type. If there are no pokemon of the given type, return None.

Write a function pandas_mean_attack_for_type that does the same thing but using a DataFrame as input.
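
The "no pokemon of the given type" case is easy to overlook, because neither approach produces None on its own. The sketch below, on a hypothetical pet dataset with no pets of species "bird", shows what each approach does with an empty selection so that you can decide how to detect it.

```python
import io
import math

import pandas as pd

# Hypothetical toy dataset, unrelated to the graded tasks.
pets = pd.read_csv(io.StringIO("""
name,age,species
Fido,4,dog
Meowrty,6,cat
Chester,1,dog
Phil,1,axolotl
"""))

# No pet has the species "bird", so this selection has zero rows.
birds = pets[pets["species"] == "bird"]
print(birds.empty)  # True

# Pandas reports the mean of an empty selection as NaN, which is not None.
bird_mean = birds["age"].mean()
print(math.isnan(bird_mean))  # True

# In plain Python, averaging an empty list divides by zero.
bird_ages = [row["age"] for row in pets.to_dict("records")
             if row["species"] == "bird"]
try:
    sum(bird_ages) / len(bird_ages)
    outcome = "computed"
except ZeroDivisionError:
    outcome = "empty selection"
print(outcome)  # empty selection
```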

Add your test case and a descriptive docstring for both functions.

def python_mean_attack_for_type(data, pokemon_type):
    """
    ...

    >>> python_mean_attack_for_type(pokemon_box.to_dict("records"), "water")
    99.75
    >>> python_mean_attack_for_type(pokemon_test.to_dict("records"), "fire")
    47.5
    """
    ...


doctest.run_docstring_examples(python_mean_attack_for_type, globals())

def pandas_mean_attack_for_type(data, pokemon_type):
    """
    ...

    >>> pandas_mean_attack_for_type(pokemon_box, "water")
    99.75
    >>> pandas_mean_attack_for_type(pokemon_test, "fire")
    47.5
    """
    ...


doctest.run_docstring_examples(pandas_mean_attack_for_type, globals())

Task: Count types

Write a function python_count_types that takes a list of dictionaries representing the pokemon dataset and returns a dictionary of each pokemon type and the number of pokemon of that type. The order of entries in the returned dictionary does not matter.

Write a function pandas_count_types that does the same thing but using a DataFrame as input. To convert a Series to a dict, use the built-in dict function as shown below.

csv = """
name,age,species
Fido,4,dog
Meowrty,6,cat
Chester,1,dog
Phil,1,axolotl
"""
data = pd.read_csv(io.StringIO(csv))

dict(data['name'])
# {0: 'Fido', 1: 'Meowrty', 2: 'Chester', 3: 'Phil'}

dict(data.loc[1])
# {'name': 'Meowrty', 'age': 6, 'species': 'cat'}
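
As a related sketch on the same pet data, the example below counts rows per species in plain Python and with Series.value_counts. One assumption worth stating: values pulled out of a Series can be NumPy integers rather than Python int, and depending on the installed NumPy version they may print differently, which matters when a doctest compares printed output. Converting with int, as done here, is one way to sidestep that; the names py_counts and pd_counts are illustrative.

```python
import io

import pandas as pd

data = pd.read_csv(io.StringIO("""
name,age,species
Fido,4,dog
Meowrty,6,cat
Chester,1,dog
Phil,1,axolotl
"""))

# Plain Python: tally each species in a dictionary.
py_counts = {}
for row in data.to_dict("records"):
    species = row["species"]
    py_counts[species] = py_counts.get(species, 0) + 1
print(py_counts)  # {'dog': 2, 'cat': 1, 'axolotl': 1}

# Pandas: value_counts() returns a Series indexed by the distinct values.
counts = data["species"].value_counts()

# Convert each count to a built-in int so the printed form is predictable.
pd_counts = {species: int(n) for species, n in counts.items()}

# Dictionaries compare equal regardless of the order of their entries.
print(py_counts == pd_counts)  # True
```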

Add your test case and a descriptive docstring for both functions.

def python_count_types(data):
    """
    ...

    >>> pprint(python_count_types(pokemon_box.to_dict("records")))
    {'bug': 3,
     'electric': 1,
     'fairy': 3,
     'fighting': 3,
     'fire': 15,
     'flying': 6,
     'ghost': 3,
     'grass': 17,
     'ground': 5,
     'normal': 10,
     'poison': 12,
     'psychic': 6,
     'rock': 7,
     'water': 24}
    >>> pprint(python_count_types(pokemon_test.to_dict("records")))
    {'fire': 2, 'water': 2}
    """
    ...


doctest.run_docstring_examples(python_count_types, globals())

def pandas_count_types(data):
    """
    ...

    >>> pprint(pandas_count_types(pokemon_box))
    {'bug': 3,
     'electric': 1,
     'fairy': 3,
     'fighting': 3,
     'fire': 15,
     'flying': 6,
     'ghost': 3,
     'grass': 17,
     'ground': 5,
     'normal': 10,
     'poison': 12,
     'psychic': 6,
     'rock': 7,
     'water': 24}
    >>> pprint(pandas_count_types(pokemon_test))
    {'fire': 2, 'water': 2}
    """
    ...


doctest.run_docstring_examples(pandas_count_types, globals())

Task: Mean attack per type

Write a function python_mean_attack_per_type that takes a list of dictionaries representing the pokemon dataset and returns a dictionary of each pokemon type and the average atk of pokemon of that type. The order of entries in the returned dictionary does not matter.

Write a function pandas_mean_attack_per_type that does the same thing but using a DataFrame as input.
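
Computing one statistic per group follows the same pattern in both styles: accumulate per key in plain Python, or group and aggregate in Pandas. The sketch below averages age per species on a hypothetical pet dataset. It illustrates the pattern under those assumptions and is not the required implementation.

```python
import io

import pandas as pd

# Hypothetical toy dataset, unrelated to the graded tasks.
pets = pd.read_csv(io.StringIO("""
name,age,species
Fido,4,dog
Meowrty,6,cat
Chester,1,dog
Phil,1,axolotl
"""))

# Plain Python: keep a running total and a row count for each species.
totals = {}
counts = {}
for row in pets.to_dict("records"):
    species = row["species"]
    totals[species] = totals.get(species, 0) + row["age"]
    counts[species] = counts.get(species, 0) + 1
py_means = {species: totals[species] / counts[species] for species in totals}
print(py_means)  # {'dog': 2.5, 'cat': 6.0, 'axolotl': 1.0}

# Pandas: group rows by species, then average the age column in each group.
means = pets.groupby("species")["age"].mean()
pd_means = {species: float(value) for species, value in means.items()}

print(py_means == pd_means)  # True
```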

Add your test case and a descriptive docstring for both functions.

def python_mean_attack_per_type(data):
    """
    ...

    >>> pprint(python_mean_attack_per_type(pokemon_box.to_dict("records")))
    {'bug': 25.0,
     'electric': 64.0,
     'fairy': 76.33333333333333,
     'fighting': 99.66666666666667,
     'fire': 99.4,
     'flying': 110.83333333333333,
     'ghost': 88.0,
     'grass': 105.3529411764706,
     'ground': 116.6,
     'normal': 108.0,
     'poison': 121.75,
     'psychic': 114.83333333333333,
     'rock': 84.85714285714286,
     'water': 99.75}
    >>> pprint(python_mean_attack_per_type(pokemon_test.to_dict("records")))
    {'fire': 47.5, 'water': 140.5}
    """
    ...


doctest.run_docstring_examples(python_mean_attack_per_type, globals())

def pandas_mean_attack_per_type(data):
    """
    ...

    >>> pprint(pandas_mean_attack_per_type(pokemon_box))
    {'bug': 25.0,
     'electric': 64.0,
     'fairy': 76.33333333333333,
     'fighting': 99.66666666666667,
     'fire': 99.4,
     'flying': 110.83333333333333,
     'ghost': 88.0,
     'grass': 105.3529411764706,
     'ground': 116.6,
     'normal': 108.0,
     'poison': 121.75,
     'psychic': 114.83333333333333,
     'rock': 84.85714285714286,
     'water': 99.75}
    >>> pprint(pandas_mean_attack_per_type(pokemon_test))
    {'fire': 47.5, 'water': 140.5}
    """
    ...


doctest.run_docstring_examples(pandas_mean_attack_per_type, globals())

Testing

test_results = doctest.testmod()
print(test_results)
assert test_results.failed == 0, "There are failed doctests."
assert test_results.attempted >= 36, "Total number of doctests should be at least 36; fewer than 36 means you do not have three tests per function."