How I Write SQL
I recently got asked by a friend and former co-worker how I write SQL. At first this caught me by surprise and I assumed there was nothing different, but after a few additional comments on it, it became clear most people have no concept for creating clean readable SQL. So without further adieu here’s how I write SQL, with a built up example query.
First let’s understand an example schema:
# \dt
Schema | Name | Type | Owner
--------+----------------------------+-------+----------------
public | app_rating | table | craig
public | app_recommendation | table | craig
public | app_userprofile | table | craig
public | app_wine | table | craig
public | app_winemakeup | table | craig
public | app_winery | table | craig
public | auth_user | table | craig
The above schema contains wines from wineries, that users give ratings and notes to. Especially relevant is the app_rating table, it contains a variety of things we’re going to want report against:
# \d app_rating
Table "public.app_rating"
Column | Type | Modifiers
------------+--------------------------+---------------------------------------------------------
id | integer | not null default nextval('app_rating_id_seq'::regclass)
user_id | integer | not null
wine_id | integer | not null
rated_at | date | not null
rating | integer | not null
notes | text |
tags | character varying(255)[] |
created_at | timestamp with time zone | not null
Most of the above should be pretty straightforward, though if you’re unfamiliar with Arrays in Postgres check out this earlier article
Given all this data lets say we want to produce some query that for a given wine contains the winery, the average rating, the tags for it. Diving in I’ll typically start by creating each key part then pulling it together. Let’s start with grabbing the average.
But first some basic structure, for maximum readability I make sure to use all caps for reserved SQL words. For a large query I make sure all my columns/conditions are on their own line. So to get the average it would look something like this:
SELECT
avg(rating),
wine_id
FROM
app_wine
GROUP BY
wine_id;
Next I’ll work with the array of tags which has some specific things to Postgres:
SELECT DISTINCT
unnest(tags) as tag,
wine_id
FROM
app_rating
GROUP BY
wine_id, tags;
Finally I’m going to put it all together. This is going to have an additional query to get the winery and the wine name as well. We’re also going to use CTE’s (Common Table Expressions), think of these as temporary views that can make your query more readable:
WITH
wine_ratings as (
SELECT
avg(rating) as rating,
wine_id
FROM
app_rating
GROUP BY
wine_id),
wine_tags as (
SELECT DISTINCT
unnest(tags) as tag,
wine_id
FROM
app_rating
GROUP BY
wine_id, tags),
wine_detail as (
SELECT
app_wine.name as name,
app_wine.id,
app_winery.name as winery
FROM
app_wine,
app_winery
WHERE app_wine.winery_id = app_winery.id
)
SELECT
name,
rating,
array_agg(tag),
winery
FROM
wine_ratings,
wine_detail
LEFT OUTER JOIN
wine_tags ON wine_detail.id = wine_tags.wine_id
WHERE wine_detail.id = wine_ratings.wine_id
GROUP BY
name,
rating,
winery
ORDER BY
rating DESC
One thing to point out, is SELECT
, FROM
and ORDER BY
are followed by a new line. When I have WHERE
multiple conditions I ensure the AND
and the condition occur on the same line. This is intentional to make those easier to read as well as easy to remove/add. The key to allowing it to still be readable is an extra two spaces before the AND
so the condition aligns with the above one. This would appear something similar to:
SELECT foo
FROM bar
WHERE foo.id = bar.foo_id
AND foo.created_at > now() - '7 days'::INTERVAL
And just for an example we get this result from the query:
name | rating | array_agg | winery
-----------------------+--------+--------------------+------------------------
Bordeaux Blend | 5.0 | {'dry', 'smooth'} | Chateau Rahoul
Cabernet Franc | 5.0 | {'chocolate'} | Beaucanon
Cabernet Sauvignon | 5.0 | {'young', 'dry'} | Hawkes
While very long, this should ideally be quite legible.