
Category: Postgres

Pivoting in Postgres

Earlier today on an internal Heroku group alias, someone shared a dataclip listing some data grouped by a category. A few minutes later a reply came in with a modification to the query that used the crosstab function to pivot directly in SQL. There were immediately several reactions on the list that went something like this:

mindblown

While crosstab is a mostly simple function (there are a few rough edges), it really is all too handy. So here it is in action, taking some data that consists of:

  • row identifier, in this case date
  • category grouping, in this case OS
  • value
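With the tablefunc extension installed (that's where crosstab lives), pivoting data of that shape looks roughly like the sketch below. The site_stats table and the three OS columns are hypothetical; having to declare the output columns yourself is one of those rough edges.

```sql
-- Hypothetical table: site_stats(day date, os text, visits int)
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT *
FROM crosstab(
  -- source query: row identifier, category, value
  $$ SELECT day, os, visits FROM site_stats ORDER BY 1, 2 $$,
  -- category query: the distinct categories, in a stable order
  $$ SELECT DISTINCT os FROM site_stats ORDER BY 1 $$
) AS pivoted(day date, linux int, osx int, windows int);
```

Each distinct OS becomes its own column, with one row per date.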

Javascript Functions for PostgreSQL

JavaScript in Postgres has gotten a good bit of love lately. Part of that comes from Heroku Postgres recently adding support for JavaScript, and part from a variety of people championing its power, such as @leinweber (Embracing the web with JSON and PLV8) and @selenamarie (schema liberation with JSON and PLV8). In a recent conversation it was pointed out that it's a bit of a headache to have to create your own functions, and that an initial collection of them would make PLV8 that much more powerful. While many can look forward to PostgreSQL 9.3, which will have more built-in support for JSON, a few functions can really help make it more useful today.
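As a taste of the kind of helper being discussed, here's a sketch of a PLV8 function that pulls a string value out of a JSON document stored as text; the function name and behavior are illustrative, not built in.

```sql
CREATE EXTENSION IF NOT EXISTS plv8;

-- Illustrative helper: extract a top-level key from JSON stored as text
CREATE OR REPLACE FUNCTION json_string(data text, key text)
RETURNS text
LANGUAGE plv8 IMMUTABLE STRICT AS $$
  var obj = JSON.parse(data);
  return obj[key] == null ? null : String(obj[key]);
$$;

SELECT json_string('{"name": "craig"}', 'name');
```

Because the function is IMMUTABLE, it can even be used in an expression index over a JSON text column.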

Explaining your PostgreSQL data

I’ve written a bit before about understanding the output from EXPLAIN and EXPLAIN ANALYZE in PostgreSQL, though understandably getting a grasp on execution plans could probably use some more guidance. This time around I’m taking a bit of a cop-out: rather than writing more documentation myself (which I’ve done in a talk I frequently give, Postgres Demystified), I’m highlighting a few tools.

Explain explained

Postgres Indexing - A collection of indexing tips

Even from initial reviews of my previous post on expression-based indexes, I received a lot of questions and feedback about many different parts of indexing in Postgres. Here’s a mixed collection of valuable tips and guides covering much of that.

Unused Indexes

In an earlier tweet I joked about some SQL that would generate the SQL to add an index to every column:

# SELECT 'CREATE INDEX idx_' 
    || table_name || '_' 
    || column_name || ' ON ' 
    || table_name || ' ("' 
    || column_name || '");' 
  FROM information_schema.columns;
                      ?column?
---------------------------------------------------------------------
 CREATE INDEX idx_pg_proc_proname ON pg_proc ("proname");
 CREATE INDEX idx_pg_proc_pronamespace ON pg_proc ("pronamespace");
 CREATE INDEX idx_pg_proc_proowner ON pg_proc ("proowner");

The reasoning behind this is that guessing whether an index will be helpful can be a bit hard within Postgres. So the easy solution is to add indexes to everything, then observe whether they’re actually being used. And of course you want to add them to all tables and columns, because you never know if core Postgres may be missing some needed ones.
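Jokes aside, Postgres does track how often each index is actually scanned, so you can find candidates for removal. A sketch using the pg_stat_user_indexes view (filtering on zero scans and sorting by size is one reasonable heuristic, not a hard rule):

```sql
SELECT schemaname,
       relname       AS table_name,
       indexrelname  AS index_name,
       idx_scan,     -- number of index scans since stats were last reset
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```

Remember that these counters accumulate from the last statistics reset, so give a fresh database time before trusting a zero.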

Postgres Indexes – Expression/Functional Indexing

Postgres is rich with options for indexing. First you’ve got a variety of index types, and beyond that you can do a variety of things with each of them, such as creating unique indexes, using conditions to index only a portion of your data, or creating indexes based on complex expressions or functions. In cases where you commonly use various PostgreSQL functions in your application or reporting, you can get some great gains from this.
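A common case is case-insensitive email lookups. Assuming a hypothetical users table, an expression index lets the planner use the index whenever a query repeats the same expression:

```sql
-- Index the expression, not the raw column
CREATE INDEX idx_users_lower_email ON users (lower(email));

-- This query can use the index because it matches the expression exactly
SELECT * FROM users WHERE lower(email) = lower('Craig@example.com');
```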

My SQL Bad Habits

I’m reasonably proficient at SQL; when I was pseudocoding some logic for a coworker, he pointed out that my pseudocode looked like executable SQL. I’m fully capable of writing clear and readable SQL, which most SQL is not. Despite that, I still have several bad habits when it comes to SQL. Without further ado, here’s some of my dirty laundry, so hopefully others can avoid the same mistakes.

Using array_agg in Postgres – powerful and flexible

In almost any application it’s common to want to aggregate some set of values together, commonly in comma-separated form. Most developers do this by running a query to get much of the raw data, then looping over it and pushing each value into a hash or dictionary under the appropriate key. Hopefully it’s not a surprise that there’s a much better way to do this with PostgreSQL.

Postgres has a flexible and robust array datatype that comes with a variety of functions. Even without taking advantage of the array datatype in your application, you can still take advantage of some of the functions to get the functionality you need. Let’s take a look at an example schema and use case.
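As a preview, here’s the shape of the technique with a hypothetical users/purchases schema; both the aggregation and the comma-separated formatting happen in the database:

```sql
-- Hypothetical schema: users(id, email) and purchases(user_id, item)
SELECT u.email,
       array_agg(p.item)                        AS items,     -- native Postgres array
       array_to_string(array_agg(p.item), ', ') AS item_list  -- comma-separated string
FROM users u
JOIN purchases p ON p.user_id = u.id
GROUP BY u.email;
```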

Getting more out of psql (The PostgreSQL CLI)

After my last post I had a variety of readers reach out about the different tweaks they’d made to their workflows using psql. One person, Grégoire Hubert, had a wonderful, extensive list of items. Grégoire has been a freelance web developer and has worked with PostgreSQL for some time now, in addition to being the author of Pomm. Without further ado, here’s what he has to say on how he uses psql:

Get the most out of psql

psql, the PostgreSQL CLI client, is a powerful tool. Sadly, a lot of developers are not aware of its features and instead look for a GUI to provide what they need. Let’s fly over what psql can do for you.

How I work with Postgres – psql, My PostgreSQL Admin

On at least a weekly basis, and not uncommonly multiple times in a single week, I get this question:

@neilmiddleton I’ve been hunting for a nice PG interface that works within other things. PGAdmin kinda works, except the SQL editor is a piece of shit

Sometimes it leans more toward: what is the Sequel Pro equivalent for Postgres? My default answer is that I just use psql, though I then have to go on to explain how I use it. You can read the details below, or just get the highlights here:

  • Set your default EDITOR then use \e
  • On postgres 9.2 and up \x auto is your friend
  • Set history to unlimited
  • \d all the things
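The first three of those tips translate into a few lines of configuration. Here's a sketch of a ~/.psqlrc; the history size is a personal choice (psql has no literal "unlimited" setting, so a very large value stands in), and EDITOR is exported from your shell profile (e.g. `export EDITOR=vim` in ~/.bashrc) rather than set inside psql.

```
-- ~/.psqlrc (a sketch; values are personal preference)

-- effectively unlimited command history
\set HISTSIZE 100000

-- expanded display automatically when output is too wide (9.2+)
\x auto

-- always show query execution time
\timing on
```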

Before going into detail on why psql works perfectly fine as an interface, I want to rant for a minute about the problems with current editors and where I expect them to go in the future. First, this is not a knock on the work that’s been done on previous ones; for their time PgAdmin, phpPgAdmin, and others were valuable tools. But we’re coming to a point where there’s a broader set of database users than ever before, and empowering them is becoming ever more important.

Empowering developers, DBAs, product people, marketers, and others to be comfortable with their database will lead to more people taking advantage of what’s in their data. pg_stat_statements was a great start here, laying a foundation for capturing valuable information. Yet even with all of the powerful statistics PostgreSQL captures, many people are still terrified when they see something like:

                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=4.25..8.62 rows=100 width=107) (actual time=0.126..0.230 rows=100 loops=1)
   Hash Cond: (purchases.user_id = users.id)
   ->  Seq Scan on purchases  (cost=0.00..3.00 rows=100 width=84) (actual time=0.012..0.035 rows=100 loops=1)
   ->  Hash  (cost=3.00..3.00 rows=100 width=27) (actual time=0.097..0.097 rows=100 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 6kB
         ->  Seq Scan on users  (cost=0.00..3.00 rows=100 width=27) (actual time=0.007..0.042 rows=100 loops=1)
 Total runtime: 0.799 ms
(7 rows)

By surfacing this information in a digestible form, for example by building tools like datascope by @leinweber on top of pg_stat_statements and making them part of the default admin experience, we will truly begin empowering a new set of users.

But enough of a detour; those tools aren’t available today. If you’re interested in helping build them to make the community better, please reach out. For now I live in a world where I’m quite content with plain old psql. Here’s how:

More on Postgres Performance

If you missed my previous post on Understanding Postgres Performance, it’s a great starting point. In this particular post I’m going to dig into some real-life examples of optimizing queries and indexes.

It all starts with stats

I wrote about some of the great new features in Postgres 9.2 in the recent announcement of Postgres 9.2 support on Heroku. One of those awesome features is pg_stat_statements. It’s not commonly known how much information Postgres keeps about your database (beyond the data, of course), but in reality it keeps a great deal: everything from basic stuff like table size, to the cardinality of joins, to the distribution of indexes. And with pg_stat_statements it keeps a normalized record of the queries that are run.
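Assuming pg_stat_statements is in shared_preload_libraries and the extension has been created, pulling the most expensive normalized queries is a short query (column names here are as of Postgres 9.2):

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- The five normalized queries consuming the most total execution time
SELECT query,
       calls,
       total_time,
       total_time / calls AS avg_time_ms
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 5;
```

This is exactly the kind of data a friendlier admin tool could surface automatically.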