Archive for the ‘Database’ Category

DataMapper 1.0 Interview

Wednesday, July 21st, 2010

My buddy Dan Kubb has a nice interview on the recent DataMapper 1.0 Milestone.  DM is something that I have been wanting to look at forever, everytime Dan does his “let me just show you how to do this in DM instead of ActiveRecord” stints, I’m always amazed at how fast and easy he does things.  Of course, I’m also amazed at how fast and easy things are done in ActiveRecord by people who are proficient in it :)  Time to break out the Rails 3 and see how fast and easy the integration of DM into it is!

Anyway, a great interview and well worth the read.

Datamapper 1.0 Released

Tuesday, June 8th, 2010

DataMapper 1.0 releaseWell, as of right now there aren’t any release notes or website update, but Dan Kubb (@dkubb) has tagged the 1.0 release of DataMapper.  This is for the talk at RailsConf today that Dirkjan Bussink (@dbussink) will be giving in about 15 minutes.

A huge congrats to my friend Dan and the rest of the DataMapper team as I know this has been a colossal amount of work to get a project like this to the 1.0 state, especially with the amount of stability and test coverage that the project has.

Update: The slides from the talk are up here.

Database and Performance In Web Applications

Monday, May 17th, 2010

This is sort of a follow up to my last post about caching.  One thing I discovered after chatting with @dkubb was that while caching was a fantastic help to make the site faster, but it was really just covering up the actual issue.

As he’s a bit of a database guy, he looked at some of the things I was doing and showed me some good performance testing tools (more on that in a second) and discovered that there was a lot of slowdown from things like repetitive DB calls and repetitive DB calls on unindexed data.  Anyway, it was fairly eye opening to someone who hasn’t done a huge amount of work with high-traffic sites, and has been blindly believing that since it seems to “just work” it should “just work” no matter what the situation is.  I figure I’d list out a few techniques and commands you can use to deal with your own scaling problems before they become problems.

As always, I’d suggest you actually ignore the rest of this post as it’s going to be a lot of naval gazing, and instead go and watch the Scaling Rails podcast series :)

How Do I Tell If There’s a Problem

First thing I did was enable the slow query log in MySQL, assuming that’s your database of choice.  If you have a modern version, set the threshold very low, I have mine set for 0.05.  This means that any query that takes more than 50 milliseconds will be logged.  If you have an older MySQL you may need to patch it to allow microsecond support older versions only support down t0 1 second.  Depending on how bad things are, even a resolution of 1 second or slower can be a huge help.

When MySQL is restarted, the logs may look something like this (possibly, as it was in my case, lots of messages like these):

# User@Host: db[db] @ localhost []
# Query_time: 0  Lock_time: 0  Rows_sent: 1  Rows_examined: 37
SELECT id
FROM   page_content
WHERE  ( ( content_order = '1' AND page_id = '1' ) ) ORDER BY content_order;

This shows you the database (helpful if you have more than one), how long it took (I assume to a resolution of seconds), how many rows the query looked at (more on this in a second) and the actual db that was run.

Ok, not bad, now we have two things, first is a metric to run by.  If you don’t know how to determine how bad things are, it’s really hard to figure out if you’ve fixed it.  In this case I used the slow query log as the metric, if it was empty, the issue is “fixed”, if there are entries going into there, that means that the queries are slow and there is something to go after, either through design or database indexes.

Using “Explain” and Database Indexes

The first thing I saw in my case was that each time I loaded an item, to get it’s image it had to make one of these queries.  Kinda sucky.  Now realize that the gallery is showing 48 images on each page.  So 48 items each looking through 5600+ rows to get their thumbnails.  I’ll let you do the math, but it equalsreally sucky however you do it.  So how to determine exactly where the error is and how to fix it, or at least make it better.

So the above error in the slow log tells us the most important thing, the query being run.  So what I did was take that and run it in the MySQL command line.  Unsurprisingly, it displayed the one row that was expected.

mysql> SELECT id FROM
   page_content WHERE  ((content_order = '1' AND page_id = '1'))
   ORDER BY content_order;
+----+
| id |
+----+
|  5 |
+----+
1 row IN SET (0.00 sec)

Next step, why did it take so long.  Running the MySQL built in command ‘explain’ before your query will give you some interesting information.

mysql> EXPLAIN SELECT id FROM
    page_content WHERE ((content_order = '1' AND page_id = '1'))
    ORDER BY content_order;
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | TABLE        | type | possible_keys | KEY  | key_len | ref  | rows | Extra       |
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | page_content | ALL  | NULL          | NULL |    NULL | NULL |   37 | USING WHERE |
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
1 row IN SET (0.00 sec)

What does this mean? It means that to find the content entry for this page, the database had to go through 37 rows of data, and there were no possible keys (database indexes) for it to try to look through. Another thing I’m told to watch out for is “filesort” in the “extra” column, which means the system is doing extra work that will slow it down.

How To Fix This

Let me prefix this next bit of discussion  by saying I’m not a DBA, and know there’s way more to things like creating database indexes than what I’m presenting here, and that depending on what you do, you could make performance worse if you add too many indexes.

Now 37 isn’t all that bad, but it’s a scaling issue, so lets say the client puts more and more content on it, with no indexes on the database you get what’s called an On problem.  That is, the more database rows you have, the more it has to search through.  In fact, it’s pretty much the definition of “doesn’t scale” :)

A database index basically is a look-up table for queries.  It lets the database say “give me the data for conditions XYZ” and the database simply hands back the correct data.  Without indexes the database has to go through each row in the table to find the data matching the given conditions.   In this case I had of course an index on the primary key of this content table, but I had forgotten that it would be accessed primarily through the main “content_id” field, which does not have an index on it.  Oops, lets rectify this now:

mysql> CREATE INDEX page_content_page_id_index
    ON page_content(page_id);
Query OK, 37 rows affected (0.05 sec)
Records: 37  Duplicates: 0  Warnings: 0

And run the “explain” command again:

mysql> EXPLAIN SELECT id FROM
    page_content WHERE ((content_order = '1' AND page_id = '1'))
    ORDER BY content_order;
+----+-------------+--------------+------+----------------------------+----------------------------+---------+-------+------+-------------+
| id | select_type | TABLE        | type | possible_keys              | KEY                        | key_len | ref   | rows | Extra       |
+----+-------------+--------------+------+----------------------------+----------------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | page_content | ref  | page_content_page_id_index | page_content_page_id_index |       4 | const |    2 | USING WHERE |
+----+-------------+--------------+------+----------------------------+----------------------------+---------+-------+------+-------------+
1 row IN SET (0.00 sec)

So the big thing to look at here is the number of rows examined, 2.  Pretty damn good, but why were there 2?  This is because there are two rows with page_id = 1 and only 1 with content_order = 1.  A further optimization we can do is to add this index:

mysql> CREATE INDEX page_content_page_id_content_order_index
    ON page_content(page_id,content_order);

This creates an index on the page_id and content_order fields. Now when we do our query (slightly edited for width) we get:

mysql> EXPLAIN SELECT id FROM
    page_content WHERE ((content_order = '1' AND page_id = '1'))
    ORDER BY content_order;
+----+-------------+--------------+------+--------------------------------+------------------------------------------+---------+-------------+------+-------------+
| id | select_type | TABLE        | type | possible_keys                  | KEY                                      | key_len | ref         | rows | Extra       |
+----+-------------+--------------+------+--------------------------------+------------------------------------------+---------+-------------+------+-------------+
|  1 | SIMPLE      | page_content | ref  | page_content_page_id_index,... | page_content_page_id_content_order_index |       8 | const,const |    1 | USING WHERE |
+----+-------------+--------------+------+--------------------------------+------------------------------------------+---------+-------------+------+-------------+

Now the number of rows examined to find the results of the query is 1, down from 2, so a win right? Well, yes and no. Down this path could lie madness. You don’t want to create indexes on every single combination because as you increase the number of indexes you up the time that the database takes to find the results. Or something like that, I’m honestly a bit fuzzy on the whole thing, but I do know that you don’t want to add a bazillion database indexes if you can avoid it.

Conclusion

As I said above, I’m by no means a database expert (as you can clearly see from the original blunders I made), but the point of this was more providing you with some tips and pointers on how to do some of the troubleshooting and get yourself onto the path of better performance.  To sum it all up:

  • Keep an eye on your logs to keep an eye out for large numbers of database queries.
  • Enable slow query logging and watch for queries that are going over a large number of rows.  Anything that’s taking more than a second (on the high end) is a big red flag.
  • Learn to love the ‘explain’ command in MySQL and add indexes appropriately.
  • Don’t add too many indexes :)

Hope that helps you!  More tips and pointers as always, appreciated.

When and Where to use Caching In Rails

Tuesday, May 11th, 2010

(Alternative title to this post: Rails Caching Enlightenment Through Perl)

This entire post will either make you think I’m a horrible web programmer, or hopefully, show you the deep and meaningful insights that I’ve managed to eke out from the experience.

Server Crashes Are Bad
Last night I was up coding, but sadly, not rails code. Yesterday I had a call from a client telling me that the website had crashed, and their clients were pissed off because this happened just when they had put out their weekly newsletter. The server (a virtual hosted system hosted at GoDaddy) had locked up tight, rendering all the sites that were on the server un-reachable.

The server was rebooted, and digging into it there was no clear reason why it had crashed, which is of course not what they wanted to hear. I had a development server set up at home, so I did a bit of testing to see if I could duplicate it. The server was old and slow (not even dual core), but usable as a Linux workstation, so I figured that if I could make the site feel faster here, the gains on the production server would be huge.

Determining A Baseline
First was to hit it with the trusted Apache Benchmark utility, I started with a perfectly reasonable 10 concurrent connections for 1000 hits hitting the landing page that was sent out in the newsletter, and probably one of the more intensive pages DB and logic wise on the site.

$ ab -c 10 -n 1000 “http://site.com/page.html?id=117”

Looking at my system monitor, I immediately saw my CPU use jump to 100%, disk access go from a blip here and there to constant, and the machine ground to a halt.

I hit ctrl-c pretty fast.

OK, problem found, the site is a vampire and sucks the life out of the server. I played with a few different settings and eventually ended up using 5 concurrent connections and 20 hits “-c 5 -n 20” which gave me an average 2000-4000ms to serve pages, about 1 request a second.

Horrible right? Before you put me against the wall to keep me from doing any web programming again, remember this is a really old server. Please? Maybe one smoke before it’s time for me to go?

Small Fixes, Small Gains
So there were three things that I figured I needed to look at:

  1. Use the YSlow plugin to find some small gains
  2. Finally see if there are any just bad code that could be refactored out, loops within loops, useless re-calculation, etc.
  3. Re-examine the number of queries going on on the page

YSlow gave me a few things to do. Setting the expires header for images, moving CSS and JS to the top and bottom of the page, and a couple of other minor things that gave me no real gains via ab.

Surprisingly, there weren’t any low hanging fruit for bad code or useless loops within loops. This sucked, mostly because that meant me going through and re-looking at SQL and refactoring that, which I’m not sure about you, but that doesn’t sound like fun to me.
Somewhat more surprisingly there were only a couple of extra queries, mostly related to the ORM I was using, Class::DBI and some just silly things. Sadly none of these gave me any more gains.

One thing I did find was where the issues were. When I commented out the main grid of items that is the focus on the page, the page response went from 2000-4000ms response time to 200. Hmm…, so what to do with this. What if I could make it so the time to generate the main product grid didn’t happen? So I commented out the dynamic code, and copied in the HTML produced (from the view source window in firefox) to see if it was the dynamic generation (which wasn’t really that complex from what I could see). Again, 200-400ms time, serving 9-10 requests a second, with almost no CPU or disk impact.

OK, so I thought what if there was a way to pre-generate the HTML periodically, and then have the perl code load that instead of doing it dynamically each time. That almost sounds like….. “caching“. Huh, almost like something that should be built in.

Enter Caching
Honestly my experiences with caching have been minimal, most of the time I am trying to prevent caching (for re-uploaded images with the same filename, that sort of thing), and also it just hadn’t come up yet, probably because most of the sites I have worked on don’t get huge enough traffic to require it. Luckily I had just read something about Caching in the HTML::Mason developer docs while finding some information for something else.

HTML::Mason has the concept of “components”, similar to partials in the rails world. You’d call something like this:

blah blah
< & "/comp/gallery.mc", id => $id, page => $cur_page, title => "TiR" &>
blah blah

When the page is rendered it would call the gallery.mc component with the given arguments, render it, running whatever code is in there (HTML::Mason isn’t the nice separated MVC that Rails is, so there’s potentially lots of controller code in your pages and components) and replacing the < & &> with the output. The documents have a nice section on the built in page and component (think fragment) caching where all you need to do is to add this code to the top of your component’s “init” section:

return if $m->cache_self(key => 'fookey', expires_in => '3 hours', [other options...] );

This lets your component see if it’s already in the cache, and not expired, and if it is, serves that, and if not, renders and then caches itself with the given key. The only tricky part is figuring out the right cache key to ensure it’s unique for each section of code. I ended up writing something like this:

$key = "gallery|$id|$cur_page|$title";
return if $m->cache_self(key => $key, expires_in => '10 minutes', [other options...] );

This makes the cache key a hash of the arguments sent to the component, ensuring that each differently rendered version of the page will get a different cache key. Not perfect I’m sure, but a nice mix of good caching and safety.

Running ‘ab’ again I found that while the first couple of requests still took 2000-4000ms to run, subsequent pages were served in the 200-400ms range, and the CPU and disk load was way down.

WTF – This is a Rails Blog
So why am I telling you all this Perl stuff? It’s because this is related more to web programming and programmer mindset than Perl or HTML::Mason. You could replace “perl” with “ruby”, “component” with “partial” and “HTML::Mason” with “Rails” and get the same idea.

Because everything ran fine when the site was under development and only two or three people were hitting it I didn’t have to worry about caching or performance issues. In fact, I didn’t even think about performance because thigns “just worked”. When things did go badly (again, server crashes == pissed off clients), I had to scramble to find a solution (luckily only one night of work).

I’m still doing testing with the new caching code, but I expect to put it online tonight or tomorrow, and look forward to the before and after numbers on the production server.

Lessons Learned
My lessons learned:

  • Watch from the start for cachable pieces of code. Big complex SQL queries or complex logic that can be created once a week, day or even every minute is a candidate. In Rails it can be as simple as surrounding the code with < % cache do %> .. < % end %>.
  • Test performance from the onset. Learn to love Apache Benchmark and start hitting your sites potential hot pages from the start, and watch and learn what causes reponsiveness to go down.

Resources
For those of you wanting some actual rails resources to learn more about this stuff, have a look at the following:

  • Understanding ‘ab’ results – Nice resource for how to read that output.
  • Caching with Rails – The rails guides documentation with details on page, fragment, action caching and everything in between.
  • Rails 2.1 Caching – A bit older, but a nice list of the caching capabilities introduced and available in Rails 2.1, still pretty relevant.
  • The Scaling Rails Podcast Series – Fantastic information in here, I recommend watching all of them, if you can’t, hit #2, 3, 5, 6, 7 for caching, and then #15 and 16 for load testing with ab and friends.
Any other resources or hints as to how to deal with caching in Rails (or Perl for that matter! :) ?

More on Using Enums For Constant Data in Rails

Monday, April 19th, 2010

So I got around to rollowing tip #3 on using Enums in AR, and found it worked…. mostly.  The problem comes in where the value isn’t already set.  I started with this in my model:

# game.rb
  # at the top of the file, define the list of genders
  GENDERS = %w( boy girl coed )
  # and validations for it
  validates_inclusion_of :gender,   :in => Game::GENDERS, :on => :create, :message => "extension %s is not included in the list"
 
  # finally define the gender as a symbol for lookups
  def gender
     read_attribute(:gender).to_sym
  end
  def gender=(value)
    write_attribute(:gender, value.to_s)
  end

This works fine until you have a nil value for gender.  OK, next step, just check if the value is nil before you read it and return nil if it is.  Only thing is that if you were to add something like

if self.gender.nil? return nil

But then you get an ugly “stack level too deep” error, because when you call ‘self.gender’ it’s calling the gender method, which checks to see if self.gender is nil, which calls the gender method, which… well, you get the picture.

Took a bit of looking, and I’m not sure if this is the “correct” solution, but it does work properly.  I just modified the gender method as such:

#game.rb
  def gender
    attributes = attributes_before_type_cast
    if attributes["gender"]
      read_attribute(:gender).to_sym
    else
      nil
    end
  end

This uses the attributes_before_type_cast grabs all attributes into a hash (before they are mangled by whatever ActiveRecord does), checks to see if the ‘gender’ attribute is filled in and either returns it or nil.  Depending on if you’re learning or not, you may want to just check out the activerecord_symbolize plugin though :)

All working, and ready to commit to the main branch.

Quick Way To Experiment With ActiveRecord

Tuesday, April 13th, 2010

A while back I spent a fair amount of time with the Pro Active Record book, learning all about dynamic finders, associations, validations, and all the other magic it has, sometimes it was a bit of a pain to set everything up by hand, typing out the .rb model files, creating the databases, etc.  I figured, why do all that work (yea yea, I know it’s just typing out some text) when you have the power of rails, generators and rake at your disposal.  So what I’d do is simply use the rails generators.

$ rails test && cd test
[...]
$ script/generate model User name:string age:integer
[...]
$ rake db:create && rake db:migrate
$ vi app/models/user.rb # optional
$ script/console

Voila!  Four commands and you’re in an interactive shell that will let you manipulate your models, add data, run finds, etc.  No need to deal with anything other than an already set up database and model file, and you have a nice interactive shell to boot.

Best part?  Just do an “rm -rf <directory>” when you’re done to nuke the directory and either start again or try out the next experiment.