Advanced SQL
JOINs, subqueries, window functions, and CTEs — the SQL that separates junior analysts from senior ones. This is the SQL that gets you hired.
The analyst who got promoted with one query
A product analyst at a fintech startup had been writing basic SQL for a year — SELECT, WHERE, GROUP BY. Solid but limited. One Friday, the VP of Product asked a question that stumped the whole analytics team: "For each customer, show me their most recent transaction, how it compares to their average, and rank all customers by lifetime value — but only for customers who made at least 3 purchases in the last 90 days."
Everyone else said they'd need to build it in Python or export the data to Excel. The analyst opened her SQL editor and wrote a single query using a CTE to filter active customers, a window function to calculate running averages and rankings, and a self-join to pull the most recent transaction.
Twelve lines of SQL. The VP had his answer in 30 seconds. The analyst was promoted to Senior Analyst two months later.
The difference between basic and advanced SQL isn't about writing more code. It's about solving harder problems in fewer steps.
JOINs: connecting the pieces
Real databases don't store everything in one table. Customer info lives in customers, orders in orders, products in products. JOINs connect them.
The four types of JOINs
| JOIN type | What it returns | When to use it |
|---|---|---|
| INNER JOIN | Only rows that match in both tables | When you want customers who HAVE orders |
| LEFT JOIN | All rows from the left table + matches from right | When you want ALL customers, even those with no orders |
| RIGHT JOIN | All rows from the right table + matches from left | Rarely used — just swap table order and use LEFT JOIN |
| FULL OUTER JOIN | All rows from both tables | When you need to find mismatches between tables |
-- INNER JOIN: Only customers who have placed orders
SELECT c.name, o.product, o.amount
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id;
-- LEFT JOIN: ALL customers, even if they haven't ordered
SELECT c.name, o.product, o.amount
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id;
The LEFT JOIN is the workhorse of analytics. If a customer has no orders, the order columns show NULL — which tells you something valuable: this is a customer who hasn't bought anything.
✗ Without AI
- ✗Only matching rows
- ✗Drops customers with no orders
- ✗Smaller result set
- ✗Use when both sides must exist
✓ With AI
- ✓All rows from left table
- ✓Keeps customers with no orders (NULLs)
- ✓Larger result set
- ✓Use when you want the full picture
There Are No Dumb Questions
"How do I know which table is 'left' and which is 'right'?"
The table in the FROM clause is the left table. The table in the JOIN clause is the right table.
FROM customers LEFT JOIN orders— customers is left, orders is right. Every customer appears; only matching orders appear."What happens if I JOIN on a column that isn't unique?"
You get duplicate rows. If customer 42 has 5 orders, a JOIN on customer_id produces 5 rows for customer 42 — one for each order. This is usually correct. But if you accidentally JOIN on a non-unique column (like
name), you'll get a "cartesian explosion" — thousands of false matches. Always JOIN on IDs.
JOIN Exercises
25 XPSubqueries: queries inside queries
A subquery is a SELECT statement nested inside another query. Think of it as a two-step question: first find something, then use that result.
In the WHERE clause
-- Find customers who spent more than the average
SELECT name, total_spent
FROM customers
WHERE total_spent > (SELECT AVG(total_spent) FROM customers);
The inner query calculates the average. The outer query filters using that average. You can't do this with a single WHERE clause.
In the FROM clause (derived tables)
-- Average orders per customer (two-step: first count, then average)
SELECT AVG(order_count) as avg_orders_per_customer
FROM (
SELECT customer_id, COUNT(*) as order_count
FROM orders
GROUP BY customer_id
) customer_orders;
With IN for list filtering
-- Find products that have never been ordered
SELECT name
FROM products
WHERE id NOT IN (SELECT DISTINCT product_id FROM orders);
CTEs: readable, reusable query blocks
A Common Table Expression (CTE) is a named temporary result set. It makes complex queries readable by breaking them into logical steps.
WITH active_customers AS (
SELECT customer_id, COUNT(*) as order_count, SUM(amount) as total_spent
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY customer_id
HAVING COUNT(*) >= 3
),
customer_details AS (
SELECT c.name, c.email, ac.order_count, ac.total_spent
FROM customers c
JOIN active_customers ac ON c.id = ac.customer_id
)
SELECT *
FROM customer_details
ORDER BY total_spent DESC;
Without the CTE, this would be a mess of nested subqueries. With CTEs, each step has a name and a purpose:
- active_customers — filter to customers with 3+ orders in 2025
- customer_details — join in names and emails
- Final SELECT — sort and display
There Are No Dumb Questions
"What's the difference between a CTE and a subquery?"
Functionally, they're often interchangeable. The difference is readability. A CTE sits at the top of the query with a clear name. A subquery is buried inside another query. For anything more than one level of nesting, CTEs are dramatically easier to read, debug, and maintain.
"Can I reference a CTE multiple times in the same query?"
Yes — that's one of the biggest advantages over subqueries. If you define
WITH monthly_revenue AS (...), you can JOIN to it, filter by it, and aggregate it multiple times without re-running the query.
Window functions: the advanced analytics power tool
Window functions perform calculations across a set of rows related to the current row — without collapsing the rows into groups like GROUP BY does.
The key difference: GROUP BY reduces 1,000 rows to 10 groups. Window functions keep all 1,000 rows but add a calculated column.
ROW_NUMBER, RANK, DENSE_RANK
-- Rank customers by total spending
SELECT
name,
total_spent,
ROW_NUMBER() OVER (ORDER BY total_spent DESC) as row_num,
RANK() OVER (ORDER BY total_spent DESC) as rank,
DENSE_RANK() OVER (ORDER BY total_spent DESC) as dense_rank
FROM customers;
| Function | Ties behavior | Example (scores: 100, 95, 95, 90) |
|---|---|---|
| ROW_NUMBER | No ties — arbitrary order | 1, 2, 3, 4 |
| RANK | Same rank for ties, gaps after | 1, 2, 2, 4 |
| DENSE_RANK | Same rank for ties, no gaps | 1, 2, 2, 3 |
Running totals and moving averages
-- Running total of revenue by month
SELECT
month,
revenue,
SUM(revenue) OVER (ORDER BY month) as running_total,
AVG(revenue) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as moving_avg_3mo
FROM monthly_revenue;
PARTITION BY: window within groups
-- Rank each employee within their department
SELECT
name,
department,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees;
PARTITION BY is like GROUP BY for window functions — it creates separate "windows" for each partition. The rank resets for each department.
SUM() OVER — running totals and cumulative values
AVG() OVER — moving averages (great for smoothing trends)
ROW_NUMBER() OVER — unique numbering (great for deduplication)
RANK() OVER — ranking with ties (great for leaderboards)
LAG() / LEAD() OVER — access previous/next row (great for period-over-period comparison)
LAG and LEAD: comparing rows
-- Month-over-month revenue change
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) as prev_month_revenue,
revenue - LAG(revenue) OVER (ORDER BY month) as change,
ROUND((revenue - LAG(revenue) OVER (ORDER BY month)) * 100.0
/ LAG(revenue) OVER (ORDER BY month), 1) as pct_change
FROM monthly_revenue;
This calculates "how much did revenue change from last month?" — one of the most common business questions, answered in a single query.
Window Function Challenge
50 XPPutting it all together
Here's the query that got our analyst promoted — reconstructed:
WITH active_customers AS (
-- Step 1: Find customers with 3+ purchases in last 90 days
SELECT customer_id
FROM transactions
WHERE transaction_date >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY customer_id
HAVING COUNT(*) >= 3
),
customer_stats AS (
-- Step 2: Calculate lifetime stats for active customers
SELECT
t.customer_id,
c.name,
SUM(t.amount) as lifetime_value,
AVG(t.amount) as avg_transaction,
MAX(t.transaction_date) as last_transaction_date,
RANK() OVER (ORDER BY SUM(t.amount) DESC) as ltv_rank
FROM transactions t
JOIN customers c ON t.customer_id = c.id
WHERE t.customer_id IN (SELECT customer_id FROM active_customers)
GROUP BY t.customer_id, c.name
),
latest_transactions AS (
-- Step 3: Get the most recent transaction for each customer
SELECT *
FROM (
SELECT
customer_id,
amount as latest_amount,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY transaction_date DESC) as rn
FROM transactions
) ranked
WHERE rn = 1
)
-- Step 4: Combine everything
SELECT
cs.name,
cs.lifetime_value,
cs.avg_transaction,
lt.latest_amount,
lt.latest_amount - cs.avg_transaction as vs_average,
cs.ltv_rank
FROM customer_stats cs
JOIN latest_transactions lt ON cs.customer_id = lt.customer_id
ORDER BY cs.ltv_rank;
Four CTEs, each doing one clear thing. Window functions for ranking and getting the latest row. JOINs to combine the pieces. This is senior-level SQL — and you now know every building block.
Read the Query
25 XPKey takeaways
- LEFT JOIN is the analyst's best friend. It keeps all rows from the left table, filling NULLs for non-matches — essential for seeing the complete picture.
- CTEs make complex queries readable. Name each step, break the logic into blocks, and compose them. Any query with more than one subquery should be a CTE.
- Window functions analyze without collapsing. ROW_NUMBER, RANK, SUM OVER, LAG, and LEAD calculate across rows while keeping every row visible.
- PARTITION BY is GROUP BY for windows. It creates separate calculations within groups without reducing the result set.
- LAG/LEAD compare adjacent rows. Month-over-month changes, period comparisons, and sequential analysis are all one function call away.
- Advanced SQL is a career differentiator. These skills are tested in every senior analyst and analytics engineer interview.
Knowledge Check
1.What is the key difference between INNER JOIN and LEFT JOIN?
2.What problem does a CTE (Common Table Expression) solve that subqueries don't?
3.Three employees have salaries of $90K, $85K, $85K, and $80K. Using RANK() OVER (ORDER BY salary DESC), what ranks do they receive?
4.What does LAG(revenue) OVER (ORDER BY month) return?