This activity checks your understanding of how a model learns the importance of each feature in a dataset.

The code below generates a synthetic dataset from a hidden formula. You will train a model on this data and see whether it “learns” the coefficients of that formula.

To complete the activity, print each feature name alongside its learned importance (coef_).
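As a warm-up, here is a minimal sketch of what coef_ holds after fitting. It uses a separate toy dataset, not the activity data: the names X_toy, y_toy, and toy_model, and the formula y = 2a + 3b, are assumptions made up for illustration. Because the toy target has no noise, the fitted coefficients should match the formula's weights up to floating-point error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: two features and a noise-free target y = 2*a + 3*b
rng = np.random.default_rng(0)
X_toy = rng.uniform(0, 1, size=(50, 2))
y_toy = 2 * X_toy[:, 0] + 3 * X_toy[:, 1]

toy_model = LinearRegression().fit(X_toy, y_toy)

# coef_ has one entry per feature column, in column order
print(toy_model.coef_)
```

Each entry of coef_ is the change in the prediction for a one-unit change in that feature, with the other features held fixed.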

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# --- STEP 1: GENERATE THE DATA ---
np.random.seed(10)
n = 100
data = {
    'Hours_Slept': np.random.uniform(4, 10, n),
    'Practice_Problems': np.random.randint(0, 50, n),
    'Coffee_Cups': np.random.randint(0, 8, n),      # Noise (not used in the formula)
    'Video_Game_Hours': np.random.uniform(0, 5, n)  # Negative impact
}
df = pd.DataFrame(data)

# The Hidden Formula:
# Score = (10 * Sleep) + (0.5 * Problems) - (5 * VideoGames) + Random Noise
df['Test_Score'] = (
    10 * df['Hours_Slept']
    + 0.5 * df['Practice_Problems']
    - 5 * df['Video_Game_Hours']
    + np.random.normal(0, 2, n)
)

# --- STEP 2: YOUR CODE HERE ---
# 1. Define X (all columns except Test_Score) and y (Test_Score)
# 2. Initialize and Fit a LinearRegression model
# 3. Print the .coef_ for each feature
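One possible way to carry out the three steps is sketched below. It is not the only valid solution. The block repeats the data generation from Step 1 so that it runs on its own; the variable names model, X, and y are choices made here, not requirements of the activity.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Step 1 repeated so this sketch is self-contained
np.random.seed(10)
n = 100
df = pd.DataFrame({
    'Hours_Slept': np.random.uniform(4, 10, n),
    'Practice_Problems': np.random.randint(0, 50, n),
    'Coffee_Cups': np.random.randint(0, 8, n),
    'Video_Game_Hours': np.random.uniform(0, 5, n),
})
df['Test_Score'] = (
    10 * df['Hours_Slept']
    + 0.5 * df['Practice_Problems']
    - 5 * df['Video_Game_Hours']
    + np.random.normal(0, 2, n)
)

# 1. Features are every column except the target
X = df.drop(columns='Test_Score')
y = df['Test_Score']

# 2. Fit an ordinary least-squares linear model
model = LinearRegression()
model.fit(X, y)

# 3. Pair each feature name with its learned coefficient
for feature, coef in zip(X.columns, model.coef_):
    print(f"{feature}: {coef:.2f}")
```

If the model has learned the hidden formula, the printed values should land close to 10, 0.5, and -5 for Hours_Slept, Practice_Problems, and Video_Game_Hours, and close to 0 for Coffee_Cups. Keep in mind that raw coefficients depend on each feature's units, so comparing their sizes directly only ranks importance when the features are on similar scales.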