Window functions, the elegant stuff
You know how to GROUP BY and aggregate. Now I am going to show you how to compute things "per group" without collapsing rows. Window functions keep all your rows AND add extra columns that look across the group.
This is the lesson where your SQL stops looking junior. Window functions are the difference between "I know SQL" and "this person actually thinks in SQL."
The shape: function() OVER (...)
Every window function looks like this:
SELECT name, total,
ROW_NUMBER() OVER (PARTITION BY city ORDER BY total DESC) AS rank
FROM orders;PARTITION BYis like GROUP BY, but it does not collapse rows. It just defines the bucket.ORDER BYinside OVER orders rows within each bucket.- The function (here, ROW_NUMBER) runs over that ordered bucket.
Result: every order row is still there, plus a new rank column showing its position within its city. GROUP BY could never.
ROW_NUMBER, RANK, DENSE_RANK
All three number your rows. They differ on ties.
ROW_NUMBER()always 1, 2, 3, 4, even if two rows tie. Arbitrary order on ties.RANK()ties get the same number, then it skips. 1, 2, 2, 4.DENSE_RANK()ties get the same number, no skip. 1, 2, 2, 3.
Use ROW_NUMBER when you want "top 1 per group." Use RANK when ties matter.
SELECT city, name, total FROM (
SELECT city, name, total,
ROW_NUMBER() OVER (PARTITION BY city ORDER BY total DESC) AS rn
FROM orders
) ranked
WHERE rn = 1;That is "highest-value order in each city." A classic interview question.
LAG and LEAD: peek at neighbours
LAG looks at the previous row. LEAD looks at the next row. Both within the window.
SELECT order_date, total,
LAG(total) OVER (ORDER BY order_date) AS prev_total
FROM daily_revenue;Day-over-day comparison in one query, no self-join needed. Before window functions, this query was three times longer and twice as slow.
Running totals with SUM() OVER
The one that makes everyone go "ohhh."
SELECT order_date, total,
SUM(total) OVER (ORDER BY order_date) AS running_total
FROM orders;Each row shows the cumulative sum up to and including itself. Add PARTITION BY customer_id and you get a running total per customer. Add it to AVG or COUNT and you get moving averages and running counts.
When someone asks you for "month-on-month growth" or "running balance" or "compare each row to the previous," window functions are the answer. Reaching for a self-join here marks you as someone who learned SQL in 2008 and stopped.
A note on performance
Window functions are not free. They require a sort within each partition. On large tables, make sure the columns you PARTITION BY and ORDER BY are indexed, or at least that the query plan looks sane under EXPLAIN.
Try it on the right
Three assignments, all using window functions. The third one is a running total. When it works, take a moment. You have just written something that would have taken your 2010-self half a day.
That wraps the track. Go build something.
Free tools you can use while you learn
Watching quietly. Tap me if you want a tip.
Try this (0 of 2 done)
- 1
For each city, list the top 1 highest-spending order.
hint
Wrap your window query in a subquery, filter rank = 1.
show answer
SELECT * FROM ( SELECT c.city, c.name, o.total, ROW_NUMBER() OVER (PARTITION BY c.city ORDER BY o.total DESC) AS rk FROM orders o JOIN customers c ON c.id = o.customer_id WHERE o.status = 'delivered' ) WHERE rk = 1; - 2
Running total of order amount, ordered by placed_at.
hint
SUM(total) OVER (ORDER BY placed_at)
show answer
SELECT placed_at, total, SUM(total) OVER (ORDER BY placed_at) AS running_total FROM orders WHERE status = 'delivered';