Libraries & Packages
pip, virtual environments, pandas, requests, and matplotlib — the Python ecosystem that transforms a scripting language into a professional powerhouse.
How one library made Python the language of data science
In 2008, a quantitative analyst named Wes McKinney, working at a hedge fund, was frustrated. He needed to analyze financial data — millions of rows of stock prices, trades, and time series — and the existing tools were slow, clunky, and expensive. MATLAB cost thousands of dollars. R was powerful but awkward. Excel choked on large datasets.
So he built pandas — an open-source Python library for data analysis. He released it for free in 2009.
Today, pandas is downloaded over 100 million times per month. More than any other single library, it made Python the dominant language for data science, finance, and analytics. Data scientists at Netflix, Spotify, NASA, and every major bank use it daily. An entire industry runs on a library that one frustrated analyst built in his spare time.
This is the power of Python's ecosystem. You do not have to build everything yourself. Hundreds of thousands of libraries are available — free, one command to install — covering everything from data analysis to web scraping to machine learning.
pip — the package installer
pip is Python's package manager. It downloads and installs libraries from PyPI (Python Package Index), the central repository of Python packages. Think of PyPI as an app store for Python code.
# Install a package
pip install pandas
# Install a specific version
pip install pandas==2.2.0
# Install multiple packages
pip install pandas matplotlib requests
# See what is installed
pip list
# Uninstall a package
pip uninstall pandas
Virtual environments — keeping projects separate
Imagine you have two projects. Project A needs pandas version 1.5. Project B needs pandas version 2.2. If both share the same Python installation, you cannot have both versions at once.
A virtual environment is a separate, isolated Python installation for each project. Think of it as giving each project its own toolbox instead of sharing one messy toolbox for everything.
# Create a virtual environment
python -m venv myproject_env
# Activate it (Mac/Linux)
source myproject_env/bin/activate
# Activate it (Windows)
myproject_env\Scripts\activate
# Your terminal now shows (myproject_env) — you are inside
# Now pip installs go into THIS environment only
pip install pandas
# Deactivate when done
deactivate
Step 1: Create — python -m venv env_name makes a new isolated environment
Step 2: Activate — source env_name/bin/activate (Mac/Linux) or env_name\Scripts\activate (Windows)
Step 3: Install — pip install packages — installs ONLY in this environment
Step 4: Freeze — pip freeze > requirements.txt — saves the exact list of packages
Step 5: Share — Anyone can recreate your environment: pip install -r requirements.txt
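Steps 4 and 5 hinge on requirements.txt, which is just a plain-text file with one pinned package per line. A sample of what pip freeze might produce (the version numbers here are illustrative):

```
pandas==2.2.0
requests==2.31.0
matplotlib==3.8.2
```

Because the versions are pinned exactly, pip install -r requirements.txt on another machine recreates the same environment.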
There Are No Dumb Questions
"Do I really need virtual environments? It seems like extra work."
For learning, you can skip them. For any real project you plan to share, deploy, or maintain, they are essential. Without them, installing a new package for one project can break a different project. The 30 seconds it takes to create a venv saves hours of debugging dependency conflicts later.
"What about conda? I see it mentioned in data science tutorials."
Conda is an alternative package manager popular in data science. It can manage both Python packages AND non-Python dependencies (like C libraries). For this course, pip and venv are simpler and sufficient. If you later do heavy data science or machine learning, you may want to explore conda or miniconda.
pandas — data analysis in one line
pandas turns Python into a spreadsheet on steroids. It reads CSVs, filters rows, calculates statistics, and handles missing data — tasks that take hours in Excel take seconds in pandas.
import pandas as pd
# Read a CSV file into a DataFrame
df = pd.read_csv("employees.csv")
# See the first 5 rows
print(df.head())
# Basic statistics
print(df.describe())
# Filter rows
engineers = df[df["department"] == "Engineering"]
# Calculate average salary
avg_salary = df["salary"].mean()
print(f"Average salary: ${avg_salary:,.2f}")
# Sort by salary, highest first
top_paid = df.sort_values("salary", ascending=False)
# Add a new column
df["bonus"] = df["salary"] * 0.1
| pandas operation | What it does | Excel equivalent |
|---|---|---|
| pd.read_csv("file.csv") | Load a CSV | File → Open |
| df.head() | Show first 5 rows | Scroll to top |
| df.describe() | Summary statistics | Manual formulas |
| df[df["col"] > 50] | Filter rows | Filter button |
| df["col"].mean() | Average of a column | =AVERAGE() |
| df.sort_values("col") | Sort by column | Sort A→Z |
| df.groupby("col").mean() | Group and aggregate | Pivot table |
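The last row of the table, groupby, is the pivot-table workhorse. A small self-contained sketch (using inline data instead of a CSV so it runs anywhere; the names and salaries are made up):

```python
import pandas as pd

# A tiny illustrative dataset, built inline instead of read from a CSV
df = pd.DataFrame({
    "department": ["Engineering", "Engineering", "Sales", "Sales"],
    "salary": [95000, 105000, 60000, 70000],
})

# Group rows by department, then average each group's salary —
# the one-line equivalent of an Excel pivot table
avg_by_dept = df.groupby("department")["salary"].mean()
print(avg_by_dept)
```

The result is a Series indexed by department, so avg_by_dept["Engineering"] gives that group's average directly.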
requests — talking to APIs the easy way
In Module 6, we used urllib to call APIs. The requests library makes this much cleaner:
import requests
# GET request — fetch data
response = requests.get("https://api.open-meteo.com/v1/forecast", params={
    "latitude": 40.71,
    "longitude": -74.01,
    "current_weather": True,
})

# Check the status code BEFORE parsing — .json() on an error page can fail
if response.status_code == 200:
    data = response.json()  # parses the JSON body automatically
    weather = data["current_weather"]
    print(f"NYC temperature: {weather['temperature']}°C")
else:
    print(f"Error: {response.status_code}")
✗ Without requests (urllib)
- ✗ Verbose — 4+ lines per request
- ✗ Must manually parse JSON
- ✗ Error handling is clunky
- ✗ Fine only for simple, quick scripts
✓ With requests
- ✓ Clean — 1-2 lines per request
- ✓ .json() method built in
- ✓ Status codes are easy to check
- ✓ Best for real projects
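For a taste of what the params= argument saves you, here is a sketch of the query-string assembly that requests performs internally, using only the standard library's urllib.parse:

```python
from urllib.parse import urlencode

# Manually turn a params dict into a URL query string —
# this is the bookkeeping that requests.get(url, params=...) does for you
params = {"latitude": 40.71, "longitude": -74.01, "current_weather": True}
query = urlencode(params)
url = "https://api.open-meteo.com/v1/forecast?" + query

print(url)
# https://api.open-meteo.com/v1/forecast?latitude=40.71&longitude=-74.01&current_weather=True
```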
There Are No Dumb Questions
"Why are there so many Python libraries? How do I know which to use?"
Python's philosophy is "batteries included" (good standard library) plus "there is a library for that" (rich ecosystem). For common tasks, one library dominates:
pandas for data, requests for APIs, matplotlib for charts, flask or django for web. When in doubt, search "best Python library for X" — the community has strong consensus.
matplotlib — creating visualizations
matplotlib turns data into charts. It is the most widely used plotting library in Python and the foundation that other visualization libraries (seaborn, plotly) build on.
import matplotlib.pyplot as plt
# Simple bar chart
products = ["Laptop", "Phone", "Desk", "Chair"]
revenue = [120000, 245000, 21250, 31500]
plt.figure(figsize=(8, 5))
plt.bar(products, revenue, color=["#3b82f6", "#8b5cf6", "#10b981", "#f59e0b"])
plt.title("Revenue by Product")
plt.xlabel("Product")
plt.ylabel("Revenue ($)")
plt.tight_layout()
plt.savefig("revenue_chart.png")
plt.show()
# Line chart — trends over time
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [45, 52, 49, 63, 58, 71]
plt.figure(figsize=(8, 5))
plt.plot(months, sales, marker="o", color="#8b5cf6", linewidth=2)
plt.title("Monthly Sales Trend")
plt.xlabel("Month")
plt.ylabel("Sales (units)")
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("sales_trend.png")
plt.show()
# Pie chart — proportions
categories = ["Electronics", "Furniture", "Clothing", "Food"]
sizes = [35, 25, 20, 20]
plt.figure(figsize=(6, 6))
plt.pie(sizes, labels=categories, autopct="%1.1f%%",
        colors=["#3b82f6", "#10b981", "#f59e0b", "#ef4444"])
plt.title("Revenue by Category")
plt.tight_layout()
plt.savefig("category_pie.png")
plt.show()
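One detail worth noticing in all three blocks above: savefig() always comes before show(). In interactive use, show() can clear the current figure, so a savefig() placed after it may write a blank image. A minimal sketch you can run headless (using the non-interactive Agg backend; the output path is just a temp file chosen for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, never opens a window
import matplotlib.pyplot as plt
import os
import tempfile

months = ["Jan", "Feb", "Mar"]
sales = [45, 52, 49]

plt.figure(figsize=(6, 4))
plt.bar(months, sales, color="#3b82f6")
plt.title("Q1 Sales")

# Save BEFORE show(), so the figure is still populated when it is written
out_path = os.path.join(tempfile.gettempdir(), "q1_sales.png")
plt.savefig(out_path)
plt.show()  # a no-op on the Agg backend

print(out_path)
```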
The essential starter kit
Here are the libraries every Python beginner should know about:
| Library | Purpose | Install command |
|---|---|---|
| pandas | Data analysis and manipulation | pip install pandas |
| requests | HTTP requests and APIs | pip install requests |
| matplotlib | Charts and visualizations | pip install matplotlib |
| numpy | Fast numerical computing (arrays, math) | pip install numpy |
| python-dotenv | Load environment variables from .env files | pip install python-dotenv |
| openpyxl | Read/write Excel files | pip install openpyxl |
| beautifulsoup4 | Web scraping (parse HTML) | pip install beautifulsoup4 |
| pytest | Testing your code | pip install pytest |
[Chart: monthly PyPI downloads per library (millions, approximate)]
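The last entry in the table, pytest, deserves a quick illustration: it discovers functions named test_* in files named test_*.py and runs their plain assert statements. A minimal sketch (the function and file name are made up for illustration):

```python
# Save as test_discount.py and run with:  pytest test_discount.py

def apply_discount(price, percent):
    """Return price reduced by percent, rounded to cents."""
    return round(price * (1 - percent / 100), 2)

def test_apply_discount():
    # Plain asserts are all pytest needs — no special assertion API
    assert apply_discount(100, 10) == 90.0
    assert apply_discount(59.99, 25) == 44.99

# The test function also runs as an ordinary script:
test_apply_discount()
print("all assertions passed")
```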
Key takeaways
- pip is Python's package manager — pip install package_name installs anything from PyPI's 500,000+ packages
- Virtual environments isolate project dependencies — always use one for real projects (python -m venv env_name)
- requirements.txt records your exact dependencies — create it with pip freeze > requirements.txt
- pandas turns Python into a data analysis powerhouse — read CSVs, filter, aggregate, all in one line
- requests makes API calls clean and simple — response = requests.get(url), then response.json()
- matplotlib creates publication-quality charts — bar, line, pie, scatter, and more
- The ecosystem is Python's biggest strength — the language itself is simple; the libraries make it powerful
Knowledge Check
1. What is the purpose of a virtual environment in Python?
2. Which command saves a list of all installed packages and their versions to a file?
3. In pandas, what does `df[df['salary'] > 50000]` do?
4. Why must `plt.savefig('chart.png')` come BEFORE `plt.show()` in matplotlib?