Archive for the ‘Code’ Category

When and Where to use Caching In Rails

Tuesday, May 11th, 2010

(Alternative title to this post: Rails Caching Enlightenment Through Perl)

This entire post will either make you think I’m a horrible web programmer, or hopefully, show you the deep and meaningful insights that I’ve managed to eke out from the experience.

Server Crashes Are Bad
Last night I was up coding, but sadly, not rails code. Yesterday I had a call from a client telling me that the website had crashed, and their clients were pissed off because this happened just when they had put out their weekly newsletter. The server (a virtual hosted system hosted at GoDaddy) had locked up tight, rendering all the sites that were on the server un-reachable.

The server was rebooted, and digging into it there was no clear reason why it had crashed, which is of course not what they wanted to hear. I had a development server set up at home, so I did a bit of testing to see if I could duplicate it. The server was old and slow (not even dual core), but usable as a Linux workstation, so I figured that if I could make the site feel faster here, the gains on the production server would be huge.

Determining A Baseline
First I hit it with the trusty Apache Benchmark utility. I started with a perfectly reasonable 10 concurrent connections for 1000 hits against the landing page that was sent out in the newsletter, which is probably one of the more intensive pages on the site, DB and logic wise.

$ ab -c 10 -n 1000 "http://site.com/page.html?id=117"

Looking at my system monitor, I immediately saw my CPU use jump to 100%, disk access go from a blip here and there to constant, and the machine ground to a halt.

I hit ctrl-c pretty fast.

OK, problem found, the site is a vampire and sucks the life out of the server. I played with a few different settings and eventually ended up using 5 concurrent connections and 20 hits (“-c 5 -n 20”), which gave me an average of 2000-4000ms to serve a page, about 1 request a second.

Horrible right? Before you put me against the wall to keep me from doing any web programming again, remember this is a really old server. Please? Maybe one smoke before it’s time for me to go?

Small Fixes, Small Gains
So there were three things that I figured I needed to look at:

  1. Use the YSlow plugin to find some small gains
  2. See if there is any just plain bad code that could be refactored out: loops within loops, useless re-calculation, etc.
  3. Re-examine the number of queries going on on the page

YSlow gave me a few things to do: setting the expires header for images, moving CSS to the top and JS to the bottom of the page, and a couple of other minor things. None of them gave me any real gains via ab.

Surprisingly, there wasn’t any low-hanging fruit in the form of bad code or useless loops within loops. This sucked, mostly because it meant going back through the SQL and refactoring that, and I’m not sure about you, but that doesn’t sound like fun to me.
Somewhat more surprisingly there were only a couple of extra queries, mostly related to the ORM I was using (Class::DBI), and some that were just silly. Sadly, fixing these didn’t give me any more gains.

One thing I did find was where the issues were. When I commented out the main grid of items that is the focus of the page, the page response went from 2000-4000ms to about 200ms. Hmm…, so what to do with this? What if I could make it so the time to generate the main product grid didn’t happen? So I commented out the dynamic code and copied in the HTML it produced (from the view source window in Firefox) to see if the problem was the dynamic generation (which wasn’t really that complex from what I could see). Again, 200-400ms response times, serving 9-10 requests a second, with almost no CPU or disk impact.

OK, so I thought: what if there was a way to pre-generate the HTML periodically, and then have the perl code load that instead of doing it dynamically each time? That almost sounds like….. “caching”. Huh, almost like something that should be built in.

Enter Caching
Honestly my experiences with caching have been minimal, most of the time I am trying to prevent caching (for re-uploaded images with the same filename, that sort of thing), and also it just hadn’t come up yet, probably because most of the sites I have worked on don’t get huge enough traffic to require it. Luckily I had just read something about Caching in the HTML::Mason developer docs while finding some information for something else.

HTML::Mason has the concept of “components”, similar to partials in the rails world. You’d call something like this:

blah blah
<& "/comp/gallery.mc", id => $id, page => $cur_page, title => "TiR" &>
blah blah

When the page is rendered it calls the gallery.mc component with the given arguments and renders it, running whatever code is in there (HTML::Mason isn’t the nice separated MVC that Rails is, so there’s potentially lots of controller code in your pages and components), and replaces the <& &> with the output. The docs have a nice section on the built-in page and component (think fragment) caching, where all you need to do is add this code to the top of your component’s “init” section:

return if $m->cache_self(key => 'fookey', expires_in => '3 hours', [other options...] );

This lets your component see if it’s already in the cache, and not expired, and if it is, serves that, and if not, renders and then caches itself with the given key. The only tricky part is figuring out the right cache key to ensure it’s unique for each section of code. I ended up writing something like this:

$key = "gallery|$id|$cur_page|$title";
return if $m->cache_self(key => $key, expires_in => '10 minutes', [other options...] );

This builds the cache key by joining the arguments sent to the component, ensuring that each differently rendered version of the page will get a different cache key. Not perfect I’m sure, but a nice mix of good caching and safety.

Running ‘ab’ again I found that while the first couple of requests still took 2000-4000ms to run, subsequent pages were served in the 200-400ms range, and the CPU and disk load was way down.
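
To make the idea concrete outside of Mason, here is a rough Ruby sketch of what a cache_self-style call does: look up a key, serve the stored copy if it has not expired, otherwise render and store. FragmentCache, gallery_key and the 600 second expiry are made-up names and values for illustration, not how Mason or Rails actually implement it.

```ruby
# Hypothetical in-memory fragment cache, for illustration only.
class FragmentCache
  Entry = Struct.new(:value, :expires_at)

  def initialize(clock: -> { Time.now })
    @store = {}
    @clock = clock # injectable so expiry can be checked without sleeping
  end

  # Return the cached value for key, or run the block, keep its result
  # for expires_in seconds, and return it.
  def fetch(key, expires_in:)
    now = @clock.call
    entry = @store[key]
    return entry.value if entry && entry.expires_at > now

    value = yield
    @store[key] = Entry.new(value, now + expires_in)
    value
  end
end

# Build the key from every argument that changes the rendered output.
def gallery_key(id, page, title)
  ["gallery", id, page, title].join("|")
end

cache = FragmentCache.new
renders = 0
render_gallery = lambda do |id, page, title|
  cache.fetch(gallery_key(id, page, title), expires_in: 600) do
    renders += 1 # stands in for the slow SQL and templating
    "<div>#{title} gallery #{id}, page #{page}</div>"
  end
end

render_gallery.call(117, 1, "TiR")
render_gallery.call(117, 1, "TiR") # same key, served from the cache
render_gallery.call(117, 2, "TiR") # different page, different key
puts renders # prints 2
```

The first hit for each key still pays the full rendering cost, which matches the ab run above: a couple of slow requests, then fast ones until the entry expires.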

WTF – This is a Rails Blog
So why am I telling you all this Perl stuff? It’s because this is related more to web programming and programmer mindset than Perl or HTML::Mason. You could replace “perl” with “ruby”, “component” with “partial” and “HTML::Mason” with “Rails” and get the same idea.

Because everything ran fine when the site was under development and only two or three people were hitting it, I didn’t have to worry about caching or performance issues. In fact, I didn’t even think about performance because things “just worked”. When things did go badly (again, server crashes == pissed off clients), I had to scramble to find a solution (luckily only one night of work).

I’m still doing testing with the new caching code, but I expect to put it online tonight or tomorrow, and look forward to the before and after numbers on the production server.

Lessons Learned
My lessons learned:

  • Watch from the start for cachable pieces of code. Big complex SQL queries or complex logic whose output only needs to be regenerated once a week, once a day or even once a minute are candidates. In Rails it can be as simple as surrounding the code with <% cache do %> .. <% end %>.
  • Test performance from the onset. Learn to love Apache Benchmark and start hitting your site’s potential hot pages from the start, and watch and learn what causes responsiveness to go down.

Resources
For those of you wanting some actual rails resources to learn more about this stuff, have a look at the following:

  • Understanding ‘ab’ results – Nice resource for how to read that output.
  • Caching with Rails – The rails guides documentation with details on page, fragment, action caching and everything in between.
  • Rails 2.1 Caching – A bit older, but a nice list of the caching capabilities introduced and available in Rails 2.1, still pretty relevant.
  • The Scaling Rails Podcast Series – Fantastic information in here, I recommend watching all of them, if you can’t, hit #2, 3, 5, 6, 7 for caching, and then #15 and 16 for load testing with ab and friends.
Any other resources or hints as to how to deal with caching in Rails (or Perl for that matter! :) ?

Evolving A Simple Twitter to Blog Ruby Program Part 1

Tuesday, May 4th, 2010

In a “quick” and dirty exercise I built a little ruby program to grab my twitter posts and collate them into a list for posting a “tweets of the week” type blog post.  It was a lot more about figuring out how to do it than the actual output (I’d hope you just follow me on twitter rather than rely on me posting my tweets here).  Tonight at the FV.rb meeting @dkubb helped me a lot in pointing out some glaring non-rubyisms and I thought that going through some of the changes might help others moving from “old school” programming (structured, functional, perl-y) to “new hotness” programming (object oriented, yields, and awesomeness).

I started out with this code in a gist.  Nothing hugely bad, it pulls in either a URL or a file, parses the XML for the status updates, and for each one does some HTML replacement (@user, #hash, and URLs get auto-linked) and then spits out some HTML that can be copied and pasted into a blog entry.

Starting Off

The first step was figuring out how to parse the XML.   A bit of googling found some possibilities.  Hpricot, libxml-ruby, and Nokogiri.  The first post I saw noted that libxml-ruby was the fastest, which makes sense as it’ll be pretty close to the bare metal C libraries, so I took a run with that.  Not great success, the biggest challenge was figuring out how Ruby dealt with XML structure.  There was a lot of mucking around in IRB.

ruby-1.8.7-p249 > require 'xml'
 => true
ruby-1.8.7-p249 > parser = XML::Parser.file('twitter.xml')
 => #<LibXML::XML::Parser:0x101170140 @context=#<LibXML::XML::Parser::Context:0x101170168>>
ruby-1.8.7-p249 > doc = parser.parse
# snip xml spew to STDOUT
ruby-1.8.7-p249 > doc.class
 => LibXML::XML::Document
# Hmm.... does find work?
ruby-1.8.7-p249 > s = doc.find('/status')
 => #<LibXML::XML::XPath::Object:0x1018ff398>
ruby-1.8.7-p249 > s.methods
# snip list of methods, and searching for what to do
ruby-1.8.7-p249 > s.each { |node| puts node.class }
 => nil
ruby-1.8.7-p249 > s.each { |node| puts node.inspect }
 => nil
# WTF? OK, so what now then?

It was a bit frustrating, though probably mostly because I just didn’t grok how the XML was being represented internally, and thinking of it more like a Perl hash-of-hashes than whatever libxml-ruby was using.  So I moved on.  (Ironically while re-doing some of this for this article I went back and was running the commands figuring now I would get it, and failed miserably :)

Next I looked at Hpricot, but the syntax in the readme and examples scared me away.

Starting Progress on the First Iteration

Someone at work suggested that Nokogiri was the way to go, and realizing that parsing a few kb of XML probably wasn’t going to run me into any performance issues, I took a run at it with this.  I soon found that having a static XML file would be the easiest for testing, so I saved twitter.xml in the same directory as I was running IRB out of and played some more.

Much better.  Then to find out how to get a list of the statuses:

ruby-1.8.7-p249 > require 'nokogiri'
 => true
ruby-1.8.7-p249 > doc = Nokogiri::XML(File.new('twitter.xml'))
# snip
ruby-1.8.7-p249 > doc.class
 => Nokogiri::XML::Document
ruby-1.8.7-p249 > doc.xpath('/status')
 => []
ruby-1.8.7-p249 > doc.xpath('//status')
# snip lots more xml spew and more testing until...
ruby-1.8.7-p249 > doc.xpath('//status').each { |node| puts node.xpath(".//text").first.content }
# snip lovely output of each of the tweets in the xml file

Ok, so now I could run “.each()” on this, having discovered that the xpath() function basically allowed me to get a list of XML nodes with that path, and then I could get a list of node data from that, remembering to use the “start from current node” syntax (using the ‘.’ to represent the current location in the tree).
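
The difference between the three path styles is easier to see on a tiny document. The sketch below uses REXML rather than Nokogiri, purely so that it runs without installing a gem; the XPath expressions are the same ones passed to doc.xpath in the session above. The sample XML is made up and only assumes that the statuses sit under a single root element.

```ruby
require 'rexml/document'

# Made-up sample, loosely shaped like a timeline feed (an assumption).
xml = <<~XML
  <statuses>
    <status><id>1</id><text>first tweet</text></status>
    <status><id>2</id><text>second tweet</text></status>
  </statuses>
XML

doc = REXML::Document.new(xml)

# '/status' is anchored at the document root, whose only child is <statuses>.
rooted = REXML::XPath.match(doc, '/status')

# '//status' matches <status> elements at any depth.
anywhere = REXML::XPath.match(doc, '//status')

# './/text' starts from the current node, so each status yields its own text.
texts = anywhere.map { |status| REXML::XPath.first(status, './/text').text }

puts rooted.size
puts anywhere.size
puts texts.inspect
```

Dropping the leading '.' from the last expression would search from the top of the document again instead of from each status.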

The next steps were (relatively) easy.  Looping through each status, get some information (time, status ID, content, etc), format that into HTML, find and implement a couple of “convert @user to an HTML link” bits of code I found online, and voila, first iteration was completed and working.
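
For a flavour of the “convert @user to an HTML link” step, here is a minimal sketch. It is not the code from the gist: the linkify name, the regexes and the URL formats are all assumptions made for illustration, and it does no HTML escaping.

```ruby
# Hypothetical helper: link @mentions and #hashtags in a tweet.
# The URL formats below are illustrative assumptions.
def linkify(text)
  text
    .gsub(/@(\w+)/) { %(<a href="http://twitter.com/#{$1}">@#{$1}</a>) }
    .gsub(/(^|\s)#(\w+)/) { %(#{$1}<a href="http://search.twitter.com/search?q=%23#{$2}">##{$2}</a>) }
end

puts linkify("hi @dkubb see #ruby")
```

The hashtag pattern insists on a preceding space or start of line so that it does not fire on a '#' inside a URL fragment.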

Now With Some Expert Advice

So after reading the “What I wish I had been told a year ago” post, I figured the next stage was to convert it to a class, make it more ruby-y, and give it some tests.  Dan Kubb of DataMapper fame thankfully answered my call for help and moved me on to this current version with some helpful advice.

I’ll continue this later on this week with Part 2, in which I iterate into more awesomeness!

Is A Site For Ruby Idioms Needed? [Update: Yes!]

Monday, May 3rd, 2010

I’ve noticed lately that there is definitely a “Ruby way” to write Ruby code. When I first read Effective Perl Programming years and years ago I went from writing code that looks like this (note that I know the “FILE” is wrong, but the wordpress auto-syntax highlighter thingy doesn’t seem to deal well with the correct bracketed syntax):

while (my $line = FILE ) {
    if( $line =~ /foo/ ) {
		print "$line";
	}
}

(which isn’t all that un-perl-y to begin with, but that’s 15 years of perl in my brain stopping me from writing really un-idiomatic code) to far more idiomatic:

while (FILE) {
	print if /foo/
}

The point being that there are certain conventions and ways that your programming style will adapt to the given language. The following loop

for (i = 0; i < 10; i++) { }

Is perfectly natural in C, but if you saw it in perl or ruby, while it might be perfectly valid, it would look way out of place in either language, and you’d get funny looks if you presented it to a code review.

In ruby some of these would be to not do this:

t.gsub!(/(http|https)biglonguglyandhardlyworksregex/, "<a href=\"\\1\">\\1</a>")

But instead do this:

    URI.extract(t, %w[ http https ftp ]).each do |url|
      t.gsub!(url, "<a href=\"#{url}\">#{url}</a>")
    end

Or maybe instead of this

sum = 0
list = [1,2,3]
for i in list do
  sum += i
end
# sum is now 6

you use the much more awesomer (yes it’s a word)

list = [1,2,3]
sum = list.inject(0) { |s,v| s + v } # sum => 6
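
For comparison, two shorter spellings of the same sum. Note the assumption: Enumerable#sum only exists in Ruby 2.4 and later, so it would not have been available on the 1.8.7 used elsewhere on this page.

```ruby
list = [1, 2, 3]

# Symbol shorthand: inject applies + between the running total and each element.
sum_inject = list.inject(0, :+)

# Enumerable#sum (Ruby 2.4 and later) says the same thing directly.
sum_builtin = list.sum

puts sum_inject
puts sum_builtin
```

Passing the 0 explicitly means an empty list sums to 0 instead of nil.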

Of course, you can write bad code in almost any language pretty easily (note how I’m not making a php/python/whitespace/brainfuck joke here!)

So my question is this.  Is there some grand collection of these Ruby idioms?  Is there a need for them?  Would a fusion of StackOverflow and Refactor My Code be a useful collection to have somewhere?  Or are the resources out there (which are a bit scattered) good enough?

So far I’ve found:

Alternatively there are a lot of places where you can read others’ code to learn idiomatic Ruby by osmosis.  Github, RefactorMyCode, the popular Gems in the community, and The Ruby Quiz are all great resources for this.

My vision is a melding of StackOverflow (maybe using their new StackExchange community software?) and Refactor My Code where you can search for an idiom or programming operation based on code, Class, or tag, vote, and comment or submit a different version.  Sort of like a Perl Golf contest except instead of the fewest keystrokes being the goal it’s the cleanest/nicest/most effective way of doing the operation.

So what do you think, would this be useful to work on with the Ruby community, or is there enough information out there already that is google-able enough?  Everyone will also have their own way of doing things, but Ruby is an opinionated language (or is that only Rails?) so maybe there is One (or two) “correct” ways to do things.

Your thoughts appreciated.

Update – So a bit of discussion here, and lots of great comments on proggit show me that this idea does deserve a go of it.  I’ve registered ruby-idioms.com (pointed here for now) and hope to have something up in the next couple of weeks, and will take a few beta tests to have a run at it.  Keep an eye here for any news by subscribing to the RSS or following me on Twitter.  Thanks everyone!

More on Using Enums For Constant Data in Rails

Monday, April 19th, 2010

So I got around to following tip #3 on using Enums in AR, and found it worked…. mostly.  The problem comes in where the value isn’t already set.  I started with this in my model:

# game.rb
  # at the top of the file, define the list of genders
  GENDERS = %w( boy girl coed )
  # and validations for it
  validates_inclusion_of :gender,   :in => Game::GENDERS, :on => :create, :message => "extension %s is not included in the list"
 
  # finally define the gender as a symbol for lookups
  def gender
     read_attribute(:gender).to_sym
  end
  def gender=(value)
    write_attribute(:gender, value.to_s)
  end

This works fine until you have a nil value for gender.  OK, next step, just check if the value is nil before you read it and return nil if it is.  The only thing is that if you add something like this to the gender method

return nil if self.gender.nil?

then you get an ugly “stack level too deep” error, because when you call ‘self.gender’ it’s calling the gender method, which checks to see if self.gender is nil, which calls the gender method, which… well, you get the picture.

Took a bit of looking, and I’m not sure if this is the “correct” solution, but it does work properly.  I just modified the gender method as such:

#game.rb
  def gender
    attributes = attributes_before_type_cast
    if attributes["gender"]
      read_attribute(:gender).to_sym
    else
      nil
    end
  end

This uses attributes_before_type_cast to grab all the attributes into a hash (before they are mangled by whatever ActiveRecord does), checks to see if the ‘gender’ attribute is filled in, and returns either the symbol or nil.  Depending on if you’re learning or not, you may want to just check out the activerecord_symbolize plugin though :)
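
The recursion only happens when the getter calls itself, so another option is to read the raw value once and nil-check that. Here is a minimal sketch of that idea, assuming a plain-Ruby stand-in for the model: FakeRecord is hypothetical and only imitates read_attribute and write_attribute so the example runs without Rails.

```ruby
# Hypothetical stand-in for an ActiveRecord model, so the sketch runs
# without Rails. Only read_attribute/write_attribute are imitated.
class FakeRecord
  def initialize(attrs = {})
    @attributes = {}
    attrs.each { |name, value| send("#{name}=", value) }
  end

  def read_attribute(name)
    @attributes[name.to_s]
  end

  def write_attribute(name, value)
    @attributes[name.to_s] = value
  end
end

class Game < FakeRecord
  GENDERS = %w[boy girl coed].freeze

  # Read the raw value once; calling self.gender here would recurse.
  def gender
    value = read_attribute(:gender)
    value.nil? ? nil : value.to_sym
  end

  # Store a string, but leave nil as nil rather than turning it into "".
  def gender=(value)
    write_attribute(:gender, value.nil? ? nil : value.to_s)
  end
end

puts Game.new.gender.inspect
puts Game.new(gender: :girl).gender.inspect
```

Whether this behaves identically inside a real model is something to check against the Rails version in use.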

All working, and ready to commit to the main branch.

Using Enums For Constant Data in Rails and ActiveRecord

Friday, April 16th, 2010

I’ve been wondering what to write up for today’s entry, in an effort to keep up the “post a day” discipline.   I have about eight drafts sitting in the wordpress queue that are either incomplete thoughts or completely uninteresting posts.  So I figured I’d take a look at what my next challenge is in my own project.  After all, that’s what this site is for in some ways… a muse to get me off my butt and back into Rails code.

The last request that came in was for the ability to tag a game object with a non-boolean, but discrete bit of data to specify if it is a boys only, girls only, or coed game.  A constant you might say.

Back in the old perl world I’d probably look at doing something like adding an integer field to the database and then just remember that 1 = boys, 2 = girls and 0 = coed.  Or something.  Or was it 3 = coed?  You see the problem here though.  Having to remember the settings, and make sure that all the different files output the same values for the same settings… ugh.  Even putting in an output filter of some sort (or in rails parlance, a helper) is a bit of an ugly situation because that doesn’t help you with your database layer either.  I might also have used the “Constant” or “ReadOnly” modules to do the same, or even gone with the more complex solution of creating a foreign key into another table called “gamegendertype” or something that just has the 1 = boys, 2 = girls, 3 = coed mapping, but that’s unneeded complexity, database lookups/joins, etc (though it does make it easier to add another data type).

So I dug into it to find out what the canonical “rails way” was to do it.  And that search led to enums.  Enums allow you to specify your own discrete data types that can be searched on as if they were integers or the like, but are words like “Diamonds” “Hearts” “Clubs” and “Spades” for a solitaire game.

Rails doesn’t seem to have a built in way to do it, but the way that the various links I found had you do it were to store the string value in the backend database, but turn that into a :symbol for when it’s accessed to and from the front end.  A link is worth a thousand words though :)

For my project I’d rather not use a plugin (trying to learn by doing and all) so I’ll probably go with the virtual attribute method (last link).  Keep an eye on the commit log on my github account to see where it goes :)

Refactoring With Helpers

Wednesday, April 14th, 2010

Bit of a follow up to my last post, with mucho thanks to the guys at the FV.rb who helped out, @pennyminder and @dkubb.

I originally figured I needed to do some dynamic methods like you can do in Perl to auto-create methods, but it turns out the better way to do it nicely (or more nicely) was to refactor my previous code in two areas.

First is to move the view logic (ie: each user having a URL attached to them) out of the controller and into a helper, and to reverse it: instead of having a “<rolename>?” method for each user type (ie: user.referee? user.assignor?), I have a nice little method that just resolves the role path for the user.

  # application_helper.rb
  def user_role_path(user)
    role_path = {
      'clubadmin' => '/some/admin/path',
      'referee'   => bids_path(user),
      'assignor'  => assignor_path(user),
    }
    role_path[user.role.name.downcase]
  end

I don’t like having to have the hash in here, but since that is view specific, it seems to work nicely. Anyway, this gives me the ability to call this view specific code in my view and by using the following I can output the correct path based on the user.

# _header.html.erb
<%= link_to "My Page", user_role_path(session[:user]) %>

Also whereas the application helpers get access to all the “view” stuff, ie: it knows how to resolve something like bids_path(user), the controllers don’t have the RESTful functions, but can if you pull in the helper code. IE:

# users_controller.rb
  # pull in the helper functions from application_helper.rb
  include ApplicationHelper
  [...]
  def login
    # [...] authenticate
    # and now redirect....
    redirect_to user_role_path(session[:user])
  end

Not sure how pragmatic this is, but it eliminated a big ugly if/elsif/elsif/end block.
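
Here is the same lookup as a standalone sketch, with one variation: each role maps to a lambda, so a path is only built for the role that is actually requested, and unknown roles fall back to a default. FakeUser, the two path methods and the '/' default are hypothetical stand-ins for the Rails model and route helpers.

```ruby
# FakeUser and both *_path methods are hypothetical stand-ins for the
# Rails model and route helpers, so the sketch runs on its own.
FakeUser = Struct.new(:id, :role_name)

def bids_path(user)
  "/users/#{user.id}/bids"
end

def assignor_path(user)
  "/assignors/#{user.id}"
end

# Each role maps to a lambda, so a path is only built for the role
# that is actually looked up.
ROLE_PATHS = {
  'clubadmin' => ->(_user) { '/some/admin/path' },
  'referee'   => ->(user) { bids_path(user) },
  'assignor'  => ->(user) { assignor_path(user) }
}.freeze

def user_role_path(user, default = '/')
  builder = ROLE_PATHS[user.role_name.downcase]
  builder ? builder.call(user) : default
end

puts user_role_path(FakeUser.new(7, 'Referee'))
puts user_role_path(FakeUser.new(8, 'Spectator'))
```

Adding a role then means adding one line to the hash rather than another elsif branch.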

The second revelation was that I was perhaps going in the wrong direction with wanting to have the “user.<rolename>?” functions, where instead I could more simply have a function to see if the user has a role. IE:

  def has_role?(role)
    return self.role.include?(role)
  end

Way better than a big long if/elsif/elsif/elsif/elsif/end block or case statement, and way more extendable.
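
If the user.referee? style is still wanted on top of has_role?, the predicates can be generated from a list instead of written by hand. This is a sketch under assumptions, not the code from the commit: Role and User here are plain Ruby stand-ins, the role list is made up, and roles are compared by name, case-insensitively.

```ruby
# Hypothetical plain-Ruby models standing in for the ActiveRecord ones.
Role = Struct.new(:name)

class User
  ROLE_NAMES = %w[clubadmin referee assignor].freeze

  attr_reader :role

  def initialize(role)
    @role = role
  end

  # One predicate covers every role, compared case-insensitively by name.
  def has_role?(role_name)
    role.name.downcase == role_name.to_s.downcase
  end

  # Generate referee?, assignor? and so on from the list above.
  ROLE_NAMES.each do |role_name|
    define_method("#{role_name}?") { has_role?(role_name) }
  end
end

user = User.new(Role.new('Referee'))
puts user.has_role?(:referee)
puts user.referee?
puts user.assignor?
```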

Feel free to check out the commit on github for all the gory details.  As always any suggestions are welcome.