Module 6

Working with Files & Data

Reading files, writing CSVs, parsing JSON, and calling APIs — how to get real-world data in and out of your Python programs.

💡 What You'll Build
By the end of this module, you will read and write text files, parse CSV spreadsheets, work with JSON data, call a live weather API, and handle errors with try/except. You will build a complete data pipeline that reads a CSV, filters it, calculates averages, and writes a JSON report.

The intern who automated 4 hours of work in 12 lines

In 2019, a marketing intern at a mid-size e-commerce company was given a daily task: download a CSV report from the analytics dashboard, open it in Excel, filter out rows where revenue was below $10, calculate the daily average, and email a summary to the team. It took about 4 hours every day — downloading, clicking, copying, pasting, formatting.

After learning basic Python, the intern wrote 12 lines of code that did the entire job in 3 seconds. Download the file, filter it, calculate the average, format the summary. The manager was so impressed that the intern was promoted within three months.

That intern did not use machine learning. Did not use AI. Just read a file, processed the data, and wrote the output. The most practically valuable Python skill is not fancy algorithms — it is moving data in and out of files and APIs.

In Module 5, you learned to organize data in memory using lists and dictionaries. But when your program ends, that data vanishes. This module teaches you to make data permanent — reading it from files and writing it back.

4 hrs of manual work per day

3 sec when automated with Python

12 lines of code

Reading and writing text files

The simplest way to work with data is plain text files. Python's built-in open() function handles this.

python
# Writing to a file
with open("notes.txt", "w") as file:
    file.write("Line 1: Hello from Python\n")
    file.write("Line 2: This is a text file\n")
    file.write("Line 3: Easy, right?\n")

# Reading a file
with open("notes.txt", "r") as file:
    content = file.read()
    print(content)

# Reading line by line
with open("notes.txt", "r") as file:
    for line in file:
        print(line.strip())    # strip() removes the trailing newline and any other surrounding whitespace

| Mode | What it does | Creates file? |
|------|--------------|---------------|
| "r" | Read only | No (crashes if the file is missing) |
| "w" | Write (overwrites everything) | Yes |
| "a" | Append (adds to end) | Yes |
| "r+" | Read and write | No |

🔑 Always use 'with open()', never a bare 'open()'
The `with` keyword automatically closes the file when the block ends, even if your code raises an error partway through. Without it, you must call `file.close()` yourself, and if you forget (or an error happens first), the file stays open: data you wrote may never be flushed to disk, and on some systems the file stays locked until your program exits. The `with` pattern is called a "context manager," and it is the standard way to handle files in Python.
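
To see the automatic closing for yourself, here is a minimal sketch. It assumes a scratch file named `with_demo.txt` (a made-up name for this illustration) and shows roughly what `with` saves you from writing by hand.

```python
# with_demo.txt is a hypothetical scratch file used only for this demo.
with open("with_demo.txt", "w") as file:
    file.write("saved\n")
    print(file.closed)    # False: the file is still open inside the block

print(file.closed)        # True: closed automatically when the block ended

# Roughly what `with` does for you behind the scenes:
file = open("with_demo.txt", "r")
try:
    content = file.read()
finally:
    file.close()          # runs even if read() raised an error

print(content.strip())    # saved
```

The `try`/`finally` version works, but it is longer and easy to get wrong, which is why the `with` form is preferred.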

There Are No Dumb Questions

"What does 'w' mode do if the file already exists?"

It erases everything and starts fresh. This is the most dangerous file mode for beginners — you can accidentally delete hours of data with one open("important.txt", "w"). If you want to add to a file without erasing, use "a" (append) mode. Always double-check your mode before running.
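
A short sketch of the difference, assuming a throwaway file named `log_demo.txt` (a made-up name, so nothing important gets erased):

```python
# log_demo.txt is a hypothetical filename used only for this demo.
with open("log_demo.txt", "w") as file:     # "w": start with an empty file
    file.write("first entry\n")

with open("log_demo.txt", "a") as file:     # "a": keep existing text, add to the end
    file.write("second entry\n")

with open("log_demo.txt", "r") as file:
    after_append = file.read()
print(after_append)       # both entries are present

with open("log_demo.txt", "w") as file:     # "w" again: the old contents are erased
    file.write("only entry\n")

with open("log_demo.txt", "r") as file:
    after_overwrite = file.read()
print(after_overwrite)    # only the last write survives
```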

"What is \n?"

It is the "newline character": it tells the computer "go to the next line." When you press Enter in a text editor, it inserts a newline behind the scenes. When you read a file line by line, each line still ends with that \n, and strip() removes it (along with any other whitespace at the start or end of the line).
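
If you only want to drop the newline and keep the other spaces, `rstrip("\n")` is one option. A small illustration, using a made-up line of text:

```python
line = "  Line 1: Hello from Python  \n"

# strip() removes whitespace (spaces, tabs, newlines) from both ends.
print(repr(line.strip()))         # 'Line 1: Hello from Python'

# rstrip("\n") removes only newline characters from the right end.
print(repr(line.rstrip("\n")))    # '  Line 1: Hello from Python  '
```

`repr()` is used here so the quotes make the leftover spaces visible.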

Which File Mode?

25 XP

For each task, pick the right mode: "r" (read), "w" (write/overwrite), or "a" (append).

  • Read a configuration file at program startup
  • Save a brand new report to disk
  • Append today's log entry to an existing log file
  • Overwrite a settings file with updated values
  • Add a new row to the bottom of an existing CSV
  • Read a list of usernames from a text file

_Hint: Read mode (r) is for loading existing data without changing it. Write mode (w) creates or overwrites, so use it for new files or complete rewrites. Append mode (a) adds to the end without erasing, so use it for logs, growing CSVs, or cumulative data._

CSV files — the spreadsheet of programming

CSV (Comma-Separated Values) is one of the most widely used data formats in the world. Nearly every spreadsheet app can export it, and nearly every database can import it. It is just text with commas between values:

name,age,city,salary
Alice,30,London,75000
Bob,25,New York,68000
Charlie,35,Tokyo,82000

Python has a built-in csv module:

python
import csv

# Reading a CSV file
with open("employees.csv", "r", newline="") as file:    # newline="" is recommended by the csv docs
    reader = csv.DictReader(file)
    for row in reader:
        print(f"{row['name']} earns ${row['salary']}")

# Writing a CSV file
data = [
    {"name": "Alice", "score": 95},
    {"name": "Bob", "score": 87},
    {"name": "Charlie", "score": 92},
]

with open("scores.csv", "w", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=["name", "score"])
    writer.writeheader()
    writer.writerows(data)
⚠️ CSV values are always strings
When you read a CSV, every value comes in as a string, even numbers. `row['salary']` is `"75000"` (text), not `75000` (number). You must convert before doing math: `int(row['salary'])` or `float(row['price'])`. Forgetting this is one of the most common CSV bugs for beginners.
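
Here is a small sketch of the bug and the fix, using the sample rows from above. To keep it self-contained it reads from a string through `io.StringIO` (which lets a string stand in for an open file) rather than from a file on disk. That is a choice made for this demo, not something the csv module requires.

```python
import csv
import io

# io.StringIO stands in for an open file, so the demo needs no file on disk.
csv_text = """name,age,city,salary
Alice,30,London,75000
Bob,25,New York,68000
Charlie,35,Tokyo,82000
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

print(type(rows[0]["salary"]))                  # <class 'str'>
print(rows[0]["salary"] + rows[1]["salary"])    # 7500068000 (text glued together, not a sum)

# Convert each salary to int before doing arithmetic.
salaries = [int(row["salary"]) for row in rows]
average = sum(salaries) / len(salaries)
print(f"Average salary: ${average:,.2f}")       # Average salary: $75,000.00
```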

Analyze a CSV

25 XP

Create a file called `sales.csv` with this content:

```
product,units_sold,price
Widget A,150,9.99
Widget B,89,24.99
Widget C,210,4.99
Widget D,45,49.99
Widget E,178,14.99
```

Then write a Python script that:

1. Reads the CSV
2. Calculates the revenue for each product (units_sold * price)
3. Finds the product with the highest revenue
4. Prints a summary

_Hint: Use `csv.DictReader`. Convert `units_sold` to `int` and `price` to `float`. Track the best product as you loop._

JSON — the language of APIs

JSON (JavaScript Object Notation) is one of the most common formats for moving data across the internet. When an app fetches weather data, user profiles, or stock prices, the data usually arrives as JSON. And JSON looks almost identical to Python dictionaries:

json
{
    "name": "Alice",
    "age": 30,
    "skills": ["Python", "SQL", "Excel"],
    "address": {
        "city": "London",
        "country": "UK"
    }
}

Python's json module converts between JSON strings and Python data:

python
import json

# Python dict → JSON string
data = {"name": "Alice", "age": 30, "skills": ["Python", "SQL"]}
json_string = json.dumps(data, indent=2)
print(json_string)

# JSON string → Python dict
json_text = '{"name": "Bob", "age": 25}'
parsed = json.loads(json_text)
print(parsed["name"])    # "Bob"

# Write JSON to a file
with open("output.json", "w") as file:
    json.dump(data, file, indent=2)

# Read JSON back from that file
with open("output.json", "r") as file:
    loaded = json.load(file)
print(loaded["name"])    # "Alice"

| Function | Direction | Works with |
|----------|-----------|------------|
| json.dumps() | Python → JSON string | Dictionary in memory |
| json.loads() | JSON string → Python | String variable |
| json.dump() | Python → JSON file | Writes to file |
| json.load() | JSON file → Python | Reads from file |

🔑 dumps = dump string, loads = load string
The "s" at the end stands for "string." `json.dump()` writes to a file. `json.dumps()` creates a string. `json.load()` reads from a file. `json.loads()` reads from a string. Once you see the pattern, you will never mix them up.
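
JSON and Python dictionaries look alike, but a few values are spelled differently: JSON writes `true`, `false`, and `null` where Python writes `True`, `False`, and `None`. The json module translates in both directions. A quick sketch with made-up data:

```python
import json

json_text = '{"name": "Alice", "active": true, "manager": null, "address": {"city": "London"}}'
person = json.loads(json_text)

print(person["active"])             # True
print(person["manager"])            # None
print(person["address"]["city"])    # London

# Going back the other way restores the JSON spellings.
print(json.dumps({"active": True, "manager": None}))    # {"active": true, "manager": null}
```

Nested objects become nested dictionaries, so you chain square brackets to reach inner values.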

Working with APIs — getting live data

An API (Application Programming Interface) is a way for programs to talk to each other. A web API gives you a URL that returns data instead of a web page: you send a request, and the server sends back a response, usually as JSON.

python
import urllib.request
import json

# Fetch data from a public API
url = "https://api.open-meteo.com/v1/forecast?latitude=40.71&longitude=-74.01&current_weather=true"

with urllib.request.urlopen(url) as response:
    data = json.loads(response.read())

weather = data["current_weather"]
print(f"Temperature: {weather['temperature']}C")
print(f"Wind speed: {weather['windspeed']} km/h")

This fetches live weather data for New York City using a free, no-signup API. No API key required.

Step 1 — Construct the URL with any required parameters (latitude, longitude, etc.)

Step 2 — Send the request with urllib.request.urlopen()

Step 3 — Read the response and parse the JSON with json.loads()

Step 4 — Access the data like a Python dictionary — because it IS one now
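
For Step 1, typing the query string by hand works, but it gets error-prone as parameters pile up. One possible approach is to let `urllib.parse.urlencode` assemble it from a dictionary. The helper name `build_forecast_url` below is made up for this example.

```python
from urllib.parse import urlencode

def build_forecast_url(latitude, longitude):
    """Build an Open-Meteo forecast URL (hypothetical helper for this example)."""
    base = "https://api.open-meteo.com/v1/forecast"
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "current_weather": "true",
    }
    # urlencode joins the pairs with & and escapes any special characters.
    return f"{base}?{urlencode(params)}"

print(build_forecast_url(40.71, -74.01))
# https://api.open-meteo.com/v1/forecast?latitude=40.71&longitude=-74.01&current_weather=true
```

The result is the same URL used in the example above, so it can be passed straight to `urllib.request.urlopen()`.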

There Are No Dumb Questions

"Do I need the requests library? I see it in every tutorial."

requests is a third-party library that makes API calls easier and more readable. urllib is built into Python — no installation needed. For learning, urllib is fine. For real projects, install requests (we will cover this in Module 7). The concepts are identical.

"What if the API is down or the request fails?"

Your program will crash with a URLError (from the urllib.error module). To avoid that, wrap API calls in a try/except block so the program can handle the failure gracefully. The error handling section later in this module shows the pattern.
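
As a rough sketch of what that wrapping could look like, here is a small helper. The name `fetch_json` and the 10-second timeout are assumptions made for this illustration, not part of urllib.

```python
import json
import urllib.error
import urllib.request

def fetch_json(url, timeout=10):
    """Return the parsed JSON from url, or None if the request fails (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read())
    except (urllib.error.URLError, TimeoutError) as error:
        # URLError covers unreachable servers and HTTP error codes
        # (HTTPError is a subclass); TimeoutError covers a stalled response.
        print(f"Request failed: {error}")
        return None
```

With the weather URL from the example above, `data = fetch_json(url)` gives you either the parsed dictionary or `None`, so the rest of the script can check `if data is None:` before reading `data["current_weather"]`.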

Fetch Live Data

25 XP

Use the Open-Meteo weather API to fetch the current weather for your city. Find your city's latitude and longitude (Google it), then modify this code:

```python
import urllib.request
import json

lat = ___  # Your city's latitude
lon = ___  # Your city's longitude

url = f"https://api.open-meteo.com/v1/forecast?latitude={lat}&longitude={lon}&current_weather=true"

with urllib.request.urlopen(url) as response:
    data = json.loads(response.read())

weather = data["current_weather"]
print(f"Temperature: {weather['temperature']}C")
print(f"Wind speed: {weather['windspeed']} km/h")
```

Bonus: convert the temperature to Fahrenheit using your function from Module 4.

_Hint: London is roughly 51.51, -0.13. Tokyo is 35.68, 139.69. Paris is 48.86, 2.35._

Error handling — when things go wrong

Files can be missing. APIs can be down. Users can enter garbage. Error handling lets your program deal with problems gracefully instead of crashing.

python
# Without error handling — crashes on bad input
age = int(input("Enter your age: "))    # Crashes if user types "abc"

# With error handling — recovers gracefully
try:
    age = int(input("Enter your age: "))
    print(f"You are {age} years old")
except ValueError:
    print("That is not a valid number!")

# Multiple except blocks
try:
    with open("data.csv", "r") as file:
        data = file.read()
    value = int(data.split(",")[0])
except FileNotFoundError:
    print("File not found — check the filename")
except ValueError:
    print("File contains non-numeric data")
except Exception as e:
    print(f"Unexpected error: {e}")

The pattern: try the risky thing. If it fails, except catches the specific error and runs alternative code.
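
The same pattern fits neatly inside a function that returns a fallback value instead of crashing. A minimal sketch, where the helper name `read_first_number` and its `default` parameter are made up for this example:

```python
def read_first_number(filename, default=0):
    """Return the first comma-separated value in a file as an int (hypothetical helper)."""
    try:
        with open(filename, "r") as file:
            first_value = file.read().split(",")[0]
        return int(first_value)
    except FileNotFoundError:
        print(f"{filename} not found, using {default}")
        return default
    except ValueError:
        print(f"{filename} does not start with a number, using {default}")
        return default

# Assuming no file with this name exists, the FileNotFoundError branch runs.
print(read_first_number("no_such_file.csv"))    # prints the warning, then 0
```

Each `except` block names one specific error, so an unexpected problem still surfaces instead of being silently swallowed.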

Build a Data Pipeline

50 XP

Write a complete script that:

1. Reads `employees.csv` (create it with 5 employees: name, department, salary)
2. Filters only employees in the "Engineering" department
3. Calculates the average salary of engineers
4. Writes the results to `engineering_report.json`

Include error handling for the case where the CSV file does not exist.

Expected JSON output:

```json
{
  "department": "Engineering",
  "employee_count": 2,
  "average_salary": 85000.0,
  "employees": ["Alice", "Charlie"]
}
```

_Hint: Read with `csv.DictReader`. Filter with a list comprehension. Calculate average with `sum()/len()`. Write with `json.dump()`. Wrap file reading in try/except._

Back to the intern

That marketing intern's 12 lines of Python? You could write them now. Read a CSV with csv.DictReader, filter rows with a list comprehension, calculate an average with sum()/len(), and format the output with an f-string. Twelve lines, three seconds, four hours of manual work eliminated.

The difference between a junior developer and someone who just finished a tutorial is the ability to move data between the real world and your code. You just learned that skill — CSV in, JSON out, APIs on demand, errors handled gracefully.

Next up: You have been using Python's built-in tools — csv, json, urllib. They work, but the Python ecosystem has hundreds of thousands of third-party libraries that make everything easier. In the next module, you will learn to install packages with pip, manage projects with virtual environments, and use pandas (data analysis), requests (cleaner APIs), and matplotlib (charts and visualizations).

Key takeaways

  • with open() is the safe way to read and write files — it automatically closes the file, even on errors
  • "w" mode erases everything — use "a" (append) if you want to add to an existing file
  • CSV values are always strings — convert to int() or float() before doing math
  • json.dumps()/json.loads() work with strings; json.dump()/json.load() work with files — the "s" = "string"
  • APIs return JSON — fetch with urllib, parse with json.loads(), access like a dictionary
  • try/except handles errors gracefully — always wrap file and network operations
  • This is the most practical Python skill — most real-world automation is reading, processing, and writing data

Knowledge Check

1. What is the danger of opening a file with `open('data.txt', 'w')`?

2. When reading a CSV file with csv.DictReader, what data type are all values?

3. What is the difference between `json.dump()` and `json.dumps()`?

4. What Python construct should you use to handle a FileNotFoundError gracefully?
