(Alternative title to this post: Rails Caching Enlightenment Through Perl)
Server Crashes Are Bad
Last night I was up coding, but sadly, not Rails code. Yesterday I had a call from a client telling me that the website had crashed, and their clients were pissed off because this happened just when they had put out their weekly newsletter. The server (a virtual server hosted at GoDaddy) had locked up tight, rendering all the sites on it unreachable.
The server was rebooted, and after digging into it there was no clear reason why it had crashed, which is of course not what they wanted to hear. I had a development server set up at home, so I did a bit of testing to see if I could reproduce the problem. That development server is old and slow (not even dual core), but usable as a Linux workstation, so I figured that if I could make the site feel faster there, the gains on the production server would be huge.
Determining A Baseline
First up was hitting it with the trusty Apache Benchmark utility (ab). I started with a perfectly reasonable 10 concurrent connections and 1000 requests against the landing page that was sent out in the newsletter, which is probably one of the more intensive pages on the site, DB- and logic-wise.
$ ab -c 10 -n 1000 "http://site.com/page.html?id=117"
Looking at my system monitor, I immediately saw my CPU use jump to 100%, disk access go from a blip here and there to constant, and the machine ground to a halt.
I hit ctrl-c pretty fast.
OK, problem found: the site is a vampire and sucks the life out of the server. I played with a few different settings and eventually settled on 5 concurrent connections and 20 requests (“-c 5 -n 20”), which gave me an average of 2000-4000ms to serve a page, about 1 request a second.
Horrible right? Before you put me against the wall to keep me from doing any web programming again, remember this is a really old server. Please? Maybe one smoke before it’s time for me to go?
Small Fixes, Small Gains
So there were three things that I figured I needed to look at:
- Use the YSlow plugin to find some small gains
- See if there is any just plain bad code that could be refactored out: loops within loops, useless recalculation, etc.
- Re-examine the number of queries going on on the page
YSlow gave me a few things to do: setting the Expires header for images, moving CSS to the top of the page and JS to the bottom, and a couple of other minor things. None of them gave me any real gains via ab.
Surprisingly, there wasn’t any low-hanging fruit in the form of bad code or useless loops within loops. This sucked, mostly because it meant going back through the SQL and refactoring that, which, I’m not sure about you, doesn’t sound like fun to me.
Somewhat more surprisingly, there were only a couple of extra queries, mostly related to the ORM I was using (Class::DBI), plus a few just silly things. Sadly, fixing these didn’t give me any more gains.
One thing I did find was where the problem was. When I commented out the main grid of items that is the focus of the page, the response time dropped from 2000-4000ms to about 200ms. Hmm, so what to do with this? What if I could make the time spent generating the main product grid go away? To check whether the dynamic generation itself was the cost (it wasn’t really that complex from what I could see), I commented out the dynamic code and pasted in the HTML it produced (from the view source window in Firefox). Again, 200-400ms, serving 9-10 requests a second, with almost no CPU or disk impact.
OK, so I thought: what if there was a way to pre-generate the HTML periodically, and then have the Perl code load that instead of building it dynamically each time? That almost sounds like... “caching”. Huh, almost like something that should be built in.
Honestly, my experience with caching has been minimal. Most of the time I am trying to prevent caching (for re-uploaded images with the same filename, that sort of thing), and it just hadn’t come up yet, probably because most of the sites I have worked on don’t get enough traffic to require it. Luckily I had just read something about caching in the HTML::Mason developer docs while looking up something else.
HTML::Mason has the concept of “components”, similar to partials in the Rails world. You’d call something like this:
blah blah <& "/comp/gallery.mc", id => $id, page => $cur_page, title => "TiR" &> blah blah
When the page is rendered, it calls the gallery.mc component with the given arguments, runs whatever code is in there (HTML::Mason isn’t the nicely separated MVC that Rails is, so there’s potentially lots of controller code in your pages and components) and replaces the <& &> tag with the output. The docs have a nice section on the built-in page and component (think fragment) caching, where all you need to do is add this code to the top of your component’s “init” section:
return if $m->cache_self(key => 'fookey', expires_in => '3 hours', [other options...] );
This lets your component see if it’s already in the cache, and not expired, and if it is, serves that, and if not, renders and then caches itself with the given key. The only tricky part is figuring out the right cache key to ensure it’s unique for each section of code. I ended up writing something like this:
my $key = "gallery|$id|$cur_page|$title";
return if $m->cache_self(key => $key, expires_in => '10 minutes', [other options...] );
This builds the cache key as a pipe-delimited string of the arguments sent to the component, ensuring that each differently rendered version of the page gets a different cache key. Not perfect, I’m sure, but a nice mix of good caching and safety.
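For the Rails-minded, here is a rough Ruby sketch of the same key-building idea. The cache_key helper is hypothetical (it is not part of Rails or HTML::Mason), and it assumes argument values never contain the “|” delimiter, since a value that did could collide with another key.

```ruby
# Hypothetical helper: builds a cache key from a component name and its
# arguments. Assumes values never contain the "|" delimiter.
def cache_key(component, args)
  # Sort by argument name so the key does not depend on argument order.
  parts = args.sort_by { |name, _| name.to_s }
              .map { |name, value| "#{name}=#{value}" }
  ([component] + parts).join("|")
end

key_a = cache_key("gallery", id: 117, page: 1, title: "TiR")
key_b = cache_key("gallery", title: "TiR", page: 1, id: 117)
key_c = cache_key("gallery", id: 117, page: 2, title: "TiR")

puts key_a           # same arguments in any order give the same key
puts key_a == key_b  # true
puts key_a == key_c  # false: a different page gets its own cache entry
```

Including the argument names in the key is a small safety margin over plain positional values: two different arguments that happen to share a value can’t be mistaken for each other.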
Running ‘ab’ again, I found that while the first couple of requests still took 2000-4000ms, subsequent pages were served in the 200-400ms range, and the CPU and disk load were way down.
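That slow-first, fast-after pattern is what you’d expect from a cache with an expiry. Here is a minimal Ruby sketch of the idea, assuming a made-up in-memory TinyCache class. It is not how HTML::Mason or Rails actually implement caching, and every name in it is hypothetical.

```ruby
# Hypothetical in-memory cache with per-entry expiry. A sketch of the
# "serve from cache if fresh, otherwise render and store" idea only.
class TinyCache
  Entry = Struct.new(:value, :expires_at)

  # The clock is injectable so expiry can be exercised without sleeping.
  def initialize(clock: -> { Time.now.to_f })
    @store = {}
    @clock = clock
  end

  # Returns the cached value for key if present and not yet expired;
  # otherwise runs the block, stores its result, and returns it.
  def fetch(key, expires_in:)
    now = @clock.call
    entry = @store[key]
    return entry.value if entry && entry.expires_at > now

    value = yield
    @store[key] = Entry.new(value, now + expires_in)
    value
  end
end

renders = 0
cache = TinyCache.new
html = nil

3.times do
  # expires_in is in seconds here: 600 seconds = 10 minutes.
  html = cache.fetch("gallery|117|1|TiR", expires_in: 600) do
    renders += 1 # stands in for the expensive grid rendering
    "<div>expensive grid</div>"
  end
end

puts renders # only the first request pays for the render
```

Only the first fetch runs the block; the other two are served from the stored entry until it expires, which is why the later ‘ab’ requests come back fast.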
WTF – This is a Rails Blog
So why am I telling you all this Perl stuff? It’s because this is related more to web programming and programmer mindset than Perl or HTML::Mason. You could replace “perl” with “ruby”, “component” with “partial” and “HTML::Mason” with “Rails” and get the same idea.
Because everything ran fine when the site was under development and only two or three people were hitting it, I didn’t have to worry about caching or performance issues. In fact, I didn’t even think about performance because things “just worked”. When things did go badly (again, server crashes == pissed off clients), I had to scramble to find a solution (luckily only one night of work).
I’m still doing testing with the new caching code, but I expect to put it online tonight or tomorrow, and look forward to the before and after numbers on the production server.
My lessons learned:
- Watch from the start for cacheable pieces of code. Big complex SQL queries or complex logic whose output can be generated once a week, once a day or even once a minute are candidates. In Rails it can be as simple as surrounding the code with <% cache do %> ... <% end %>.
- Test performance from the outset. Learn to love Apache Benchmark, start hitting your site’s potential hot pages early, and watch and learn what causes responsiveness to go down.
For those of you wanting some actual Rails resources to learn more about this stuff, have a look at the following:
- Understanding ‘ab’ results – Nice resource for how to read that output.
- Caching with Rails – The Rails Guides documentation with details on page, fragment, action caching and everything in between.
- Rails 2.1 Caching – A bit older, but a nice list of the caching capabilities introduced and available in Rails 2.1, still pretty relevant.
- The Scaling Rails Podcast Series – Fantastic information in here. I recommend watching all of them; if you can’t, hit #2, 3, 5, 6, 7 for caching, and then #15 and 16 for load testing with ab and friends.