Friday 23 May 2014

pgbouncer, Django & Heroku.

TL;DR handle more traffic with pgbouncer & postgres on Heroku

If you use Django on Heroku and your traffic starts to climb to dizzying levels (say 4k requests per minute) you will discover (the hard way) that Django opens a database connection for each and every request. This goes largely unnoticed at small levels of traffic because it doesn’t matter, but as your app grows in popularity you will, at some point, fail in a very troubling way.

Ouch. Four spells of downtime, massive spikes in db connections. There is only one thing for it - pooled database connections. This post about Fixing database connections in Django has three very solid suggestions in it, but unfortunately the first two didn’t want to play with Django 1.6 and to be honest, I wanted to go down the route of having something that was dedicated to dbpooling and not tied into the app. So this post is about how I got pgbouncer to play nicely on Heroku and it assumes you have a working project already using postgres.

I actually had much more trouble setting this up on my local machine that I did deploying to production on Heroku. You don’t have to install it locally, but I was reluctant to deploy something I hadn’t tinkered with, so I opted to go local first.

Local Dev

First of all you need to get it working locally so that you can perform local, bog standard, one person at a time tests. For me, I had to install pgbouncer –

brew install pgbouncer
# load at launch - prevents 20 mins wondering why it isn't working
ln -sfv /usr/local/opt/pgbouncer/*.plist ~/Library/LaunchAgents
launchctl load ~/Library/LaunchAgents/homebrew.mxcl.pgbouncer.plist

Test that it is available –

pgbouncer
>>> Need config file.  See pgbouncer -h for usage.

Smashing, but out of the box it’s not going to work as the config file has no database settings in it and the paths to the log files don’t exist so it will not start correctly.

Edit the default file ( as installed by homebrew ) at /usr/local/etc/pgbouncer.ini. This is very much based on first steps with pgbouncer. I use the postgres.app from Heroku so my settings are for that–

# backup the original first
cp /usr/local/etc/pgbouncer.ini{,.backup}
# edit it, I use vim, feel free to use mate|subl etc
vi /usr/local/etc/pgbouncer.ini

The pgbouncer usage docs have all the things, but I whipped this up in vim, needless to say, this is not for production but local development–

[databases]
* = host=localhost port=5432 user=

[pgbouncer]
logfile = /usr/local/var/log/pgbouncer.log
pidfile = /usr/local/var/run/pgbouncer.pid
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = any

You’ll likely need to restart pgbouncer now and I’d recommend that you start it without launchctl and tail the logfile to make sure all works well. I had three tabs open in iTerm when I was doing this –

# tab one - unload (and make sure) that pgbouncer is dead, then start pgbouncer
# unload
launchctl unload ~/Library/LaunchAgents/homebrew.mxcl.pgbouncer.plist
# just to be sure
killall pgbouncer
 # now start it
pgbouncer /usr/local/etc/pgbouncer.ini
>>> 2014-05-23 14:02:38.930 73512 LOG File descriptor limit: ...
>>> 2014-05-23 14:02:38.934 73512 LOG listening on ...
>>> 2014-05-23 14:02:38.935 73512 LOG listening on ...
>>> 2014-05-23 14:02:38.935 73512 LOG process up ....

# tab two - tail that log file baby
tail -f /usr/local/var/log/pgbouncer.log
>>> BLAH FOO BAR # lots of things

# tab three - connect
psql -p 6432
>>> psql (9.3.1)
>>> Type "help" for help.
# win.
yourusername=#

Now disconnect, ctrl-c the tail and the pgbouncer process and hand it back over to launchctl, and double check you can still connect –

# give it back to launchctl
launchctl load ~/Library/LaunchAgents/homebrew.mxcl.pgbouncer.plist

# connect again
psql -p 6432
>>> psql (9.3.1)
>>> Type "help" for help.
yourusername=#

If this all worked for you, then head over to your database settings for your django project and update the port to use the pgbouncer port ( 6432 ) and not the default postgres port (5432) . This isn’t a lesson on django settings, so do what you got to do to seperate the local/dev/staging/production settings from one another! Suffice to say your django project should function as you expect.

Onwards.

Configure for Heroku.

This S.O. question was very helpful for the gist on the direction to take. Basically you have to –

  1. Enable and configure multi-buildpacks
  2. Update your Procfile to run pgbouncer before your processes

Seems easy right? Actually it is and between this very helpful post on S.O and the heroku-buildpack-pgbouncer github page we have all of the information we need.

First, we need to activate multiple buildpacks on our app, to run these commands you’ll need the Heroku toolbelt installed –

heroku config:add BUILDPACK_URL=https://github.com/ddollar/heroku-buildpack-multi.git -a YOUR_APP

Next, you need to create a .buildpacks file and into it you place the multiple buildpack URLS. In my case I was using pgbouncer and python –

https://github.com/gregburek/heroku-buildpack-pgbouncer.git#v0.3.0
https://github.com/heroku/heroku-buildpack-python

Finally, update your Procfile to put the pgbouncer binary call before your other stuff like this –

web: bin/start-pgbouncer-stunnel newrelic-admin run-program python ...
worker: bin/start-pgbouncer-stunnel  newrelic-admin run-program celery ....

When I deployed this to staging it went so smoothly, I doubted that it had even taken place. I was wrong, it was in and working. It’s not much, but here is a screenshot of log2viz simulation of about 7.6K RPM at a constant load of 500-600 connections per second. It was on a stack that was in comparison pitiful compared to a production stack that was getting going down at about 2.2K RPM.

Yes, the the results for median and 95th percentile in the above are gruesome, and would never be tolerated in production, but pushing two 1x dynos like this with only a few lines of code; well, It really was an easy, easy win.

Deploying to production was a breeze. I even through in a quick load test that help up 5K RPM and it didn’t even make the app blink –

I know there is more to these kinds of metrics than just a hard number of requests per minute; indeed the quality of a response (time, perception of snappiness etc etc) is just as important as quantity. But it’s great to know that you faced a challenge, felt totally and utterly destroyed by it, picked yourself back up and moved on. It’s also great to understand and know that this is how you learn and that sometimes, learning can be a horrible, hard, cold experience.

Just remember that in face of adversity, the only thing you can really control is your reaction to the situation.

But 5K RPM because reasons. K THX BYE.