Module 7

Libraries & Packages

pip, virtual environments, pandas, requests, and matplotlib — the Python ecosystem that transforms a scripting language into a professional powerhouse.

💡 What You'll Build
By the end of this module, you will install packages with pip, manage dependencies with virtual environments, analyze a CSV with pandas, call an API with requests, and create professional charts with matplotlib. You will set up a complete project structure with `requirements.txt` — the way real Python developers work.

How one library made Python the language of data science

In 2008, Wes McKinney, a quantitative analyst at a hedge fund, was frustrated. He needed to analyze financial data — millions of rows of stock prices, trades, and time series — and the existing tools were slow, clunky, and expensive. MATLAB cost thousands of dollars. R was powerful but awkward. Excel choked on large datasets.

So he built pandas — an open-source Python library for data analysis. He released it for free in 2009.

Today, pandas is installed over 100 million times per month. It single-handedly made Python the dominant language for data science, finance, and analytics. Data scientists at Netflix, Spotify, NASA, and every major bank use it daily. An entire industry runs on a library that one frustrated analyst built in his spare time.

This is the power of Python's ecosystem. You do not have to build everything yourself. Hundreds of thousands of libraries are available — free, one command to install — covering everything from data analysis to web scraping to machine learning.

In Module 6, you used csv and urllib — Python's built-in tools for data and APIs. They work, but they are verbose. Libraries like pandas, requests, and matplotlib do the same jobs in a fraction of the code, with far more power.

500,000+ packages on PyPI

100M+ pandas downloads per month

1 command to install any package

pip — the package installer

pip is Python's package manager. It downloads and installs libraries from PyPI (Python Package Index), the central repository of Python packages. Think of PyPI as an app store for Python code.

```bash
# Install a package
pip install pandas

# Install a specific version
pip install pandas==2.2.0

# Install multiple packages
pip install pandas matplotlib requests

# See what is installed
pip list

# Uninstall a package
pip uninstall pandas
```

⚠️ python vs python3, pip vs pip3
On some systems (especially Mac/Linux), `python` and `pip` point to Python 2, while `python3` and `pip3` point to Python 3. If `pip install pandas` gives a "command not found" error, try `pip3 install pandas`. To avoid confusion, always verify: `python --version` should say Python 3.x. If it says 2.x, use `python3` and `pip3` everywhere.
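A quick sanity check sidesteps the mismatch entirely: invoke pip *through* the interpreter you intend to use. This sketch assumes `python3` is on your PATH:

```shell
# Check which interpreter you actually have
python3 --version          # should report Python 3.x

# pip's version, plus which Python it is bound to
python3 -m pip --version

# Installing through the interpreter guarantees packages land in the
# same Python that runs your code:
# python3 -m pip install pandas
```

Because `python3 -m pip` runs pip as a module of that specific interpreter, it can never install into the "wrong" Python.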

Virtual environments — keeping projects separate

Imagine you have two projects. Project A needs pandas version 1.5. Project B needs pandas version 2.2. If both share the same Python installation, you cannot have both versions at once.

A virtual environment is a separate, isolated Python installation for each project. Think of it as giving each project its own toolbox instead of sharing one messy toolbox for everything.

```bash
# Create a virtual environment
python -m venv myproject_env

# Activate it (Mac/Linux)
source myproject_env/bin/activate

# Activate it (Windows)
myproject_env\Scripts\activate

# Your terminal now shows (myproject_env) — you are inside
# Now pip installs go into THIS environment only
pip install pandas

# Deactivate when done
deactivate
```

Step 1: Create — `python -m venv env_name` makes a new isolated environment

Step 2: Activate — `source env_name/bin/activate` (Mac/Linux) or `env_name\Scripts\activate` (Windows)

Step 3: Install — `pip install` commands now install ONLY in this environment

Step 4: Freeze — `pip freeze > requirements.txt` saves the exact list of packages

Step 5: Share — anyone can recreate your environment with `pip install -r requirements.txt`

🔑requirements.txt is your project's ingredient list
`pip freeze > requirements.txt` creates a file listing every installed package and its exact version. When a teammate clones your project, they run `pip install -r requirements.txt` and get the identical setup. Every professional Python project has a `requirements.txt`. It is the recipe card that ensures everyone is cooking with the same ingredients.
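For reference, a `requirements.txt` is just a plain text file, one pinned package per line. The packages and version numbers below are illustrative — yours will reflect whatever `pip freeze` finds in your environment:

```
pandas==2.2.0
requests==2.31.0
matplotlib==3.8.2
numpy==1.26.3
```

Note that `pip freeze` also lists packages you never installed directly — for example, numpy appears because pandas depends on it. That is intentional: pinning the full dependency tree is what makes the recreated environment identical.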

There Are No Dumb Questions

"Do I really need virtual environments? It seems like extra work."

For learning, you can skip them. For any real project you plan to share, deploy, or maintain, they are essential. Without them, installing a new package for one project can break a different project. The 30 seconds it takes to create a venv saves hours of debugging dependency conflicts later.

"What about conda? I see it mentioned in data science tutorials."

Conda is an alternative package manager popular in data science. It can manage both Python packages AND non-Python dependencies (like C libraries). For this course, pip and venv are simpler and sufficient. If you later do heavy data science or machine learning, you may want to explore conda or miniconda.

pandas — data analysis in one line

pandas turns Python into a spreadsheet on steroids. It reads CSVs, filters rows, calculates statistics, and handles missing data — tasks that take hours in Excel take seconds in pandas.

```python
import pandas as pd

# Read a CSV file into a DataFrame
df = pd.read_csv("employees.csv")

# See the first 5 rows
print(df.head())

# Basic statistics
print(df.describe())

# Filter rows
engineers = df[df["department"] == "Engineering"]

# Calculate average salary
avg_salary = df["salary"].mean()
print(f"Average salary: ${avg_salary:,.2f}")

# Sort by salary, highest first
top_paid = df.sort_values("salary", ascending=False)

# Add a new column
df["bonus"] = df["salary"] * 0.1
```

| pandas operation | What it does | Excel equivalent |
| --- | --- | --- |
| `pd.read_csv("file.csv")` | Load a CSV | File → Open |
| `df.head()` | Show first 5 rows | Scroll to top |
| `df.describe()` | Summary statistics | Manual formulas |
| `df[df["col"] > 50]` | Filter rows | Filter button |
| `df["col"].mean()` | Average of column | `=AVERAGE()` |
| `df.sort_values("col")` | Sort by column | Sort A→Z |
| `df.groupby("col").mean()` | Group and aggregate | Pivot table |
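The last row of that table — `groupby` — is the pivot-table workhorse, so it deserves a tiny demonstration. Here is a minimal sketch using a DataFrame built inline (the department names and salaries are made-up example data):

```python
import pandas as pd

# Build a small DataFrame directly instead of reading a CSV
df = pd.DataFrame({
    "department": ["Engineering", "Engineering", "Sales", "Sales", "Sales"],
    "salary": [95000, 105000, 60000, 70000, 80000],
})

# groupby splits the rows into one group per department,
# then mean() aggregates each group's salaries into a single number
avg_by_dept = df.groupby("department")["salary"].mean()
print(avg_by_dept)
# Engineering → 100000.0, Sales → 70000.0
```

One line replaces what would be a pivot table in Excel: split by a column, apply an aggregation, get one row per group.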


pandas in Action

25 XP

Create a CSV file called `sales_data.csv`:

```
product,category,units,price
Laptop,Electronics,120,999.99
Phone,Electronics,350,699.99
Desk,Furniture,85,249.99
Chair,Furniture,210,149.99
Keyboard,Electronics,500,79.99
Lamp,Furniture,175,39.99
```

Then write a pandas script that:

1. Reads the CSV
2. Adds a "revenue" column (units * price)
3. Finds the product with the highest revenue
4. Calculates the total revenue per category
5. Prints the results

_Hint: `df["revenue"] = df["units"] * df["price"]`. Use `df.sort_values("revenue", ascending=False).iloc[0]` for the top product. Use `df.groupby("category")["revenue"].sum()` for category totals._


requests — talking to APIs the easy way

In Module 6, we used urllib to call APIs. The requests library makes this much cleaner:

```python
import requests

# GET request — fetch data
response = requests.get("https://api.open-meteo.com/v1/forecast", params={
    "latitude": 40.71,
    "longitude": -74.01,
    "current_weather": True
})

data = response.json()    # Automatically parses JSON
weather = data["current_weather"]
print(f"NYC temperature: {weather['temperature']}°C")

# Check for errors
if response.status_code == 200:
    print("Success!")
else:
    print(f"Error: {response.status_code}")
```

urllib (built-in)

  • Verbose — 4+ lines per request
  • Must manually parse JSON
  • Error handling is clunky
  • Good for simple, quick scripts

requests (third-party)

  • Clean — 1-2 lines per request
  • .json() method built in
  • Status codes are easy to check
  • Best for real projects
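To make the comparison concrete, here is the same Open-Meteo call sketched with the built-in urllib: build the query string by hand, open the URL, read raw bytes, decode, then parse JSON yourself. The actual network request is left commented out so the sketch runs without internet access:

```python
import json
import urllib.parse
import urllib.request

# Manually encode the query parameters that requests handled via params=
params = urllib.parse.urlencode({
    "latitude": 40.71,
    "longitude": -74.01,
    "current_weather": "true",
})
url = f"https://api.open-meteo.com/v1/forecast?{params}"

# The request itself — uncomment to actually fetch:
# with urllib.request.urlopen(url) as response:
#     data = json.loads(response.read().decode("utf-8"))
#     print(data["current_weather"]["temperature"])
```

Every step that requests does for you — parameter encoding, decoding, JSON parsing — is explicit here. That is the whole case for the third-party library.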

There Are No Dumb Questions

"Why are there so many Python libraries? How do I know which to use?"

Python's philosophy is "batteries included" (good standard library) plus "there is a library for that" (rich ecosystem). For common tasks, one library dominates: pandas for data, requests for APIs, matplotlib for charts, flask or django for web. When in doubt, search "best Python library for X" — the community has strong consensus.

matplotlib — creating visualizations

matplotlib turns data into charts. It is the most widely used plotting library in Python and the foundation that other visualization libraries (seaborn, plotly) build on.

```python
import matplotlib.pyplot as plt

# Simple bar chart
products = ["Laptop", "Phone", "Desk", "Chair"]
revenue = [120000, 245000, 21250, 31500]

plt.figure(figsize=(8, 5))
plt.bar(products, revenue, color=["#3b82f6", "#8b5cf6", "#10b981", "#f59e0b"])
plt.title("Revenue by Product")
plt.xlabel("Product")
plt.ylabel("Revenue ($)")
plt.tight_layout()
plt.savefig("revenue_chart.png")
plt.show()
```

```python
# Line chart — trends over time
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [45, 52, 49, 63, 58, 71]

plt.figure(figsize=(8, 5))
plt.plot(months, sales, marker="o", color="#8b5cf6", linewidth=2)
plt.title("Monthly Sales Trend")
plt.xlabel("Month")
plt.ylabel("Sales (units)")
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("sales_trend.png")
plt.show()
```

```python
# Pie chart — proportions
categories = ["Electronics", "Furniture", "Clothing", "Food"]
sizes = [35, 25, 20, 20]

plt.figure(figsize=(6, 6))
plt.pie(sizes, labels=categories, autopct="%1.1f%%",
        colors=["#3b82f6", "#10b981", "#f59e0b", "#ef4444"])
plt.title("Revenue by Category")
plt.tight_layout()
plt.savefig("category_pie.png")
plt.show()
```


Visualize Your Data

25 XP

Using the sales data from the previous challenge, create two charts:

1. A bar chart showing revenue by product
2. A pie chart showing revenue share by category

Save both as PNG files. Make them look professional with titles, labels, and colors.

_Hint: Use pandas to calculate the data, then pass the values to matplotlib. `plt.savefig("chart.png")` must come BEFORE `plt.show()` — otherwise the figure is cleared before saving._


The essential starter kit

Here are the libraries every Python beginner should know about:

| Library | Purpose | Install command |
| --- | --- | --- |
| pandas | Data analysis and manipulation | `pip install pandas` |
| requests | HTTP requests and APIs | `pip install requests` |
| matplotlib | Charts and visualizations | `pip install matplotlib` |
| numpy | Fast numerical computing (arrays, math) | `pip install numpy` |
| python-dotenv | Load environment variables from .env files | `pip install python-dotenv` |
| openpyxl | Read/write Excel files | `pip install openpyxl` |
| beautifulsoup4 | Web scraping (parse HTML) | `pip install beautifulsoup4` |
| pytest | Testing your code | `pip install pytest` |

<classifychallenge xp="25" title="Which Library?" items={["Read a 50,000-row CSV and calculate averages per category","Fetch the current Bitcoin price from a web API","Create a bar chart of sales by region","Multiply two large matrices of numbers","Scrape product prices from a website","Read and write Excel (.xlsx) files"]} options={["pandas","requests","matplotlib","numpy","beautifulsoup4","openpyxl"]} hint="pandas handles CSV/data analysis. requests handles HTTP/API calls. matplotlib creates charts. numpy does fast numerical computing. beautifulsoup4 parses HTML for web scraping. openpyxl reads and writes Excel files.">


Set Up a Professional Project

50 XP

Create a complete professional project setup:

1. Create a new directory called `my_data_project`
2. Create a virtual environment inside it
3. Activate the environment
4. Install pandas, requests, and matplotlib
5. Freeze the requirements: `pip freeze > requirements.txt`
6. Create a `main.py` that imports all three and prints their versions:

```python
import pandas as pd
import requests
import matplotlib

print(f"pandas: {pd.__version__}")
print(f"requests: {requests.__version__}")
print(f"matplotlib: {matplotlib.__version__}")
```

_Hint: After `pip freeze > requirements.txt`, open the file and verify it lists all three packages with version numbers. This file is how you share your project's dependencies with others._


Back to Wes McKinney

McKinney built pandas because he was frustrated. The tools that existed were slow, expensive, or painful to use. So he built something better — and released it for free. That is the Python ecosystem in a nutshell: one frustrated developer's solution becomes the entire industry's standard tool. pandas, requests, matplotlib, numpy — all built by individuals or small teams who saw a problem and solved it.

You just learned to tap into that ecosystem. One pip install command gives you access to work that took brilliant developers years to build. That is leverage.

Next up: In the final module, every concept from this track comes together. You will build a complete data analysis project from scratch — load a dataset, clean it, analyze trends, create visualizations, and generate a report. It is the same workflow professional data analysts use every day, and it is portfolio-worthy.

Key takeaways

  • pip is Python's package manager — `pip install package_name` installs anything from PyPI's 500,000+ packages
  • Virtual environments isolate project dependencies — always use one for real projects (`python -m venv env_name`)
  • `requirements.txt` records your exact dependencies — create it with `pip freeze > requirements.txt`
  • pandas turns Python into a data analysis powerhouse — read CSVs, filter, aggregate, all in one line
  • requests makes API calls clean and simple — `response = requests.get(url)`, then `response.json()`
  • matplotlib creates publication-quality charts — bar, line, pie, scatter, and more
  • The ecosystem is Python's biggest strength — the language itself is simple; the libraries make it powerful


Knowledge Check

1. What is the purpose of a virtual environment in Python?

2. Which command saves a list of all installed packages and their versions to a file?

3. In pandas, what does `df[df['salary'] > 50000]` do?

4. Why must `plt.savefig('chart.png')` come BEFORE `plt.show()` in matplotlib?

Want to go deeper?

💻 Software Engineering Master Class

The complete software engineering program — from your first line of code to landing your first job.

View the full program