Pressflow and Varnish, advanced hosting for Drupal
It's been a while since our last post, not because we didn't have anything to talk about (on the contrary) but because we were super-busy (we didn't even properly announce our new and redesigned company site). So what did we do? Among other things we were busy moving Merge to Tilburg, along with a lot of other exciting developments. More on that in an official announcement in the next few weeks. In the meantime, let's get back to some developments.
For some years now, we have been building and hosting Drupal sites. One of the things we always focus on is speed, performance and scalability, which is a delicate balance to keep in a flexible framework like Drupal with its thousands of modules. Keeping performance high on high-traffic Drupal sites demands:
- making the right architectural decisions in Drupal
- knowledge and fine-tuning of the whole LAMP stack
- optimizing your frontend
Now I won't be talking about no. 1 or 3 (in fact I already wrote a post about Drupal front-end optimization) but will instead focus on some hosting issues.
Varnish
As we all know, Drupal can be pretty heavy on resources, especially with a lot of modules enabled, and there is only so much you can do about it in Drupal itself. Of course you can use modules like Boost and caching tools like Memcached and APC, which will help take load off your MySQL database and CPU, but that won't prevent Apache from firing up each time a request comes in, eating valuable and limited memory each time. And under high load (spikes, the 'Slashdot effect') this will most likely grind your server to a halt.
This is where Varnish comes in. Varnish allows for Edge-Side Includes (ESI) which will make your websites rock! Essentially, Varnish stores copies of a page in a memory cache, and if that exact page is requested again, it is served back to the user immediately without going through the whole request cycle on your web server. The results are amazing. Not only are pages served blazingly fast, but server load decreases significantly. We noticed an immediate and structural drop in server load.
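To make the idea concrete, here is a minimal sketch of a whole-page cache in Python. It is a toy model of the behaviour described above, not how Varnish is implemented; the names `PageCache` and `slow_backend` and the 300-second lifetime are assumptions made up for illustration.

```python
import time


class PageCache:
    """Toy in-memory page cache keyed by URL (illustrative only, not Varnish)."""

    def __init__(self, backend, ttl_seconds=300.0, clock=time.monotonic):
        self.backend = backend    # callable: url -> page body (stands in for Apache + Drupal)
        self.ttl = ttl_seconds    # assumed lifetime of a cached page
        self.clock = clock
        self.store = {}           # url -> (expires_at, body)
        self.hits = 0
        self.misses = 0

    def get(self, url):
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and entry[0] > now:
            # Fresh copy in memory: the backend is never touched.
            self.hits += 1
            return entry[1]
        # Missing or expired: do the expensive request once and remember it.
        self.misses += 1
        body = self.backend(url)
        self.store[url] = (now + self.ttl, body)
        return body


def slow_backend(url):
    # Hypothetical stand-in for a full Drupal page build.
    return "<html>page for %s</html>" % url


cache = PageCache(slow_backend, ttl_seconds=300.0)
cache.get("/node/1")   # first request: goes to the backend
cache.get("/node/1")   # same page again: served from memory
```

The second request never reaches the backend, which is where the drop in server load comes from.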
So what's the downside?
Basically, this mostly works for anonymous users. Any time a user is logged in (and a session is set), it negates the use of Varnish. In fact, this is also the reason you will need Pressflow or Drupal 7 to be able to use a reverse proxy like Varnish and to get rid of the standard session cookie that Drupal 6 always sets (even for anonymous users). But you will most likely want to use Pressflow anyway if you need high performance (see comparison chart). You also need to know exactly which of the modules you use can break Varnish because they set a cookie.
If you need to optimize a high-traffic site with a lot of logged-in users, you will need another approach. But there aren't that many options yet (see comparison chart) and Drupal 7 won't deliver more. Most likely you will want to strip everything to the bare essentials (Drupal and stack), as described in this impressive presentation by 2bits.
Or - and this is where it gets pretty cutting-edge - you can go for real Edge-Side Includes with Drupal. Using this module, you can set the cache expiry time per block, user and role. This allows for advanced Varnish caching: instead of caching the whole page, you can now tell which parts (blocks) of the page should have a shorter lifespan.
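The cookie rule above can be sketched as a small decision function. This is a hedged illustration in Python, not actual Varnish configuration: it assumes Drupal session cookies have names starting with `SESS` (or `SSESS` over HTTPS in Drupal 7), and the function names are hypothetical.

```python
def has_session_cookie(cookie_header):
    """Return True if the Cookie header carries a Drupal-style session cookie."""
    for part in cookie_header.split(";"):
        name = part.strip().split("=", 1)[0]
        # Assumption: session cookie names start with SESS (or SSESS on HTTPS).
        if name.startswith(("SESS", "SSESS")):
            return True
    return False


def can_serve_from_cache(method, cookie_header=""):
    # Assumption: only GET/HEAD requests without a session cookie are cacheable.
    return method in ("GET", "HEAD") and not has_session_cookie(cookie_header)


print(can_serve_from_cache("GET", "has_js=1"))                 # anonymous visitor
print(can_serve_from_cache("GET", "has_js=1; SESSabc123=xyz"))  # logged-in user
```

Any module that sets its own session-like cookie for anonymous visitors would flip the first case to a cache bypass, which is why the module list matters.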
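As a rough sketch of the per-block idea, the Python below assembles a page from fragments that each carry their own lifetime. It is a toy model under stated assumptions, not the ESI module's API or Varnish's ESI processing; `FragmentCache`, `assemble_page` and the example keys are hypothetical.

```python
import time


class FragmentCache:
    """Toy cache where every fragment (block) has its own expiry time."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.store = {}   # key -> (expires_at, html)

    def get(self, key, ttl_seconds, render):
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        # Expired or never rendered: rebuild only this fragment.
        html = render()
        self.store[key] = (now + ttl_seconds, html)
        return html


def assemble_page(cache, fragments):
    """fragments: list of (key, ttl_seconds, render_callable) tuples."""
    return "".join(cache.get(key, ttl, render) for key, ttl, render in fragments)


cache = FragmentCache()
page = assemble_page(cache, [
    ("block:footer", 3600.0, lambda: "<div>footer</div>"),        # rarely changes
    ("block:latest-comments", 30.0, lambda: "<div>comments</div>"),  # short lifespan
])
```

To vary a block per user or role, the key could include that information, for example a hypothetical `"block:menu:role=editor"`, so that each role gets its own cached copy.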
In conclusion: there are many solutions, and in the last few weeks I've even seen services popping up like mushrooms that let you create Drupal sites on specialized (Aegir-based) hosting. If anything, it shows how important (and challenging) it is to have decent Drupal hosting. Nobody likes sluggish websites. Not even Google.
Update Dec '10: apparently Varnish 2.1.4 has a problem with Apache's Keep-Alive function. If you encounter very long page loads when you refresh (F5) or post something in Chrome, try turning this setting Off.