Using array_agg in Postgres – powerful and flexible
In almost any application it’s common to want to aggregate some set of values together, commonly in a comma separated form. Most developers do this by running a query to get much of the raw data, looping over the data and pushing it into a set, appending each new value to the appropriate key. Hopefully, it’s not a surprise that there’s a much better way to do this with PostgreSQL.
Postgres has a flexible and robust array datatype that comes with a variety of functions. Even without taking advantage of the array datatype in your application, you can still take advantage of some of the functions to get the functionality you need. Lets take a look at an example schema and use case.
An example
Given a project management application, you may have users
who have projects
that have tasks
. An example piece of functionality might be to send an email with a list of all projects that have tasks that are past their due dates of completion. Your schema might look something like this:
# \d users
Table "public.users"
Column | Type | Modifiers
------------+-----------------------------+-----------
id | integer | not null
email | character varying(255) |
...
# \d projects
Table "public.projects"
Column | Type | Modifiers
------------+-----------------------------+-----------
id | integer | not null
user_id | integer | not null
name | character varying(255) | not null
...
# \d tasks
Table "public.tasks"
Column | Type | Modifiers
--------------+-----------------------------+-----------
id | integer | not null
project_id | integer | not null
completed_at | timestamp without time zone |
due_at | timestamp without time zone |
...
To get a list of all projects that have tasks that haven’t been completed, you would start with something like:
SELECT
projects.name
FROM
projects,
tasks
WHERE projects.id = tasks.project_id
AND tasks.due_at > tasks.completed_at
AND tasks.due_at > now()
This would give you a list of projects which you could then easily join this with users:
SELECT
users.email
projects.name
FROM
projects,
tasks,
users
WHERE projects.id = tasks.project_id
AND tasks.due_at > tasks.completed_at
AND tasks.due_at > now()
AND users.id = projects.user_id
At this point you’ve got everything you need to pull this up into Ruby, Python, or other language of your choice and then build the full set. However if this is thousands or even hundreds of results you’ll be spending more time than necessary, grouping this data for a sensible email. With 3 other small changes you can have this already formatted for you to immediately send of in an email. The first is using a handy function called array_agg
which will aggregate items and then you can format them how you wish. The second is just ensuring you’re grouping correctly. Finally you’ll want to unnest the array so it formats the data in a clean way for you.
Looking at it all put together:
SELECT
users.email,
array_to_string(array_agg(projects.name), ',')) as projects
FROM
projects,
tasks,
users
WHERE projects.id = tasks.project_id
AND tasks.due_at > tasks.completed_at
AND tasks.due_at > now()
AND users.id = projects.user_id
GROUP BY
users.email
This would give you a nice clean result of projects that have overdue tasks that you could then send to the user in an email:
email | projects
---------------------------+-------------------
craig.kerstiens@gmail.com | blog, timetracker
craig@heroku.com | foo, bar, baz