GROUP BY and aggregates, counting like a manager
So far our queries have returned raw rows. Useful, but rarely what your boss actually asks for. They want totals. Averages. Counts per city. This is what aggregates are for.
GROUP BY is where SQL starts to feel like a real analysis tool. This is the lesson where you stop being a "data fetcher" and start being someone who answers questions.
The five aggregates
These are functions that collapse many rows into one number.
COUNT(*)how many rows.COUNT(column)ignores NULLs.SUM(column)total of a numeric column.AVG(column)average. Watch the precision, sometimes returns decimals you don't expect.MIN(column)smallest value. Works on numbers, dates, even strings (alphabetical).MAX(column)largest value.
SELECT COUNT(*), SUM(total), AVG(total)
FROM orders;That gives you one row with three numbers. The total order count, the total revenue, and the average order value, for the entire table.
GROUP BY: one row per bucket
What if you want those numbers per city? Add GROUP BY.
SELECT city, COUNT(*), SUM(total)
FROM orders
GROUP BY city;Now you get one row per city. Read GROUP BY as "make one bucket per unique value of this column, then run the aggregates inside each bucket."
- Every non-aggregate column in SELECT must appear in GROUP BY. This is a hard rule, not a suggestion.
- You can GROUP BY multiple columns:
GROUP BY city, statusgives one row per city-status combo.
The mental model: GROUP BY creates buckets. Aggregates summarise each bucket. If a column is not bucketed and not aggregated, SQL does not know what to do with it, and it will yell at you.
HAVING vs WHERE
Both filter rows. The difference is when.
WHEREfilters rows before grouping.HAVINGfilters groups after aggregating.
SELECT city, COUNT(*) AS order_count
FROM orders
WHERE status = 'delivered'
GROUP BY city
HAVING COUNT(*) > 100;Read it top to bottom. Take delivered orders only (WHERE). Bucket them by city (GROUP BY). Show only cities with more than 100 (HAVING).
If you tried to put COUNT(*) > 100 in WHERE, SQL would refuse. The count does not exist yet at WHERE time.
A real query, end to end
"Top 5 cities by revenue from delivered orders."
SELECT city, SUM(total) AS revenue
FROM orders
WHERE status = 'delivered'
GROUP BY city
ORDER BY revenue DESC
LIMIT 5;Every clause is doing one job. WHERE filters. GROUP BY buckets. SUM aggregates. ORDER BY sorts. LIMIT cuts. This is the shape of nine out of ten analytics queries.
Try it on the right
Three GROUP BY assignments. The third one combines GROUP BY with a JOIN, which is where it gets fun. Next: indexes, and why your slow query is probably missing one.
Free tools you can use while you learn
Watching quietly. Tap me if you want a tip.
Try this (0 of 3 done)
- 1
Revenue earned by each restaurant from delivered orders.
hint
GROUP BY restaurant, SUM(total), filter delivered first.
show answer
SELECT r.name, SUM(o.total) AS revenue FROM orders o JOIN restaurants r ON r.id = o.restaurant_id WHERE o.status = 'delivered' GROUP BY r.name ORDER BY revenue DESC; - 2
Customers who placed more than 2 orders. (HAVING clause.)
hint
GROUP BY customer then HAVING COUNT(*) > 2
show answer
SELECT c.name, COUNT(*) AS order_count FROM orders o JOIN customers c ON c.id = o.customer_id GROUP BY c.name HAVING COUNT(*) > 2; - 3
Average order value per city.
hint
AVG(total) grouped by city. JOIN customers.
show answer
SELECT c.city, ROUND(AVG(o.total), 2) AS avg_order FROM orders o JOIN customers c ON c.id = o.customer_id GROUP BY c.city;