Database and Performance In Web Applications
This is sort of a follow up to my last post about caching. One thing I discovered after chatting with @dkubb was that while caching was a fantastic help to make the site faster, but it was really just covering up the actual issue.
As he’s a bit of a database guy, he looked at some of the things I was doing and showed me some good performance testing tools (more on that in a second) and discovered that there was a lot of slowdown from things like repetitive DB calls and repetitive DB calls on unindexed data. Anyway, it was fairly eye opening to someone who hasn’t done a huge amount of work with high-traffic sites, and has been blindly believing that since it seems to “just work” it should “just work” no matter what the situation is. I figure I’d list out a few techniques and commands you can use to deal with your own scaling problems before they become problems.
As always, I’d suggest you actually ignore the rest of this post as it’s going to be a lot of naval gazing, and instead go and watch the Scaling Rails podcast series
How Do I Tell If There’s a Problem
First thing I did was enable the slow query log in MySQL, assuming that’s your database of choice. If you have a modern version, set the threshold very low, I have mine set for 0.05. This means that any query that takes more than 50 milliseconds will be logged. If you have an older MySQL you may need to patch it to allow microsecond support older versions only support down t0 1 second. Depending on how bad things are, even a resolution of 1 second or slower can be a huge help.
When MySQL is restarted, the logs may look something like this (possibly, as it was in my case, lots of messages like these):
# USER@Host: db[db] @ localhost [] # Query_time: 0 Lock_time: 0 Rows_sent: 1 Rows_examined: 37 SELECT id FROM page_content WHERE ( ( content_order = '1' AND page_id = '1' ) ) ORDER BY content_order;
This shows you the database (helpful if you have more than one), how long it took (I assume to a resolution of seconds), how many rows the query looked at (more on this in a second) and the actual db that was run.
Ok, not bad, now we have two things, first is a metric to run by. If you don’t know how to determine how bad things are, it’s really hard to figure out if you’ve fixed it. In this case I used the slow query log as the metric, if it was empty, the issue is “fixed”, if there are entries going into there, that means that the queries are slow and there is something to go after, either through design or database indexes.
Using “Explain” and Database Indexes
The first thing I saw in my case was that each time I loaded an item, to get it’s image it had to make one of these queries. Kinda sucky. Now realize that the gallery is showing 48 images on each page. So 48 items each looking through 5600+ rows to get their thumbnails. I’ll let you do the math, but it equalsreally sucky however you do it. So how to determine exactly where the error is and how to fix it, or at least make it better.
So the above error in the slow log tells us the most important thing, the query being run. So what I did was take that and run it in the MySQL command line. Unsurprisingly, it displayed the one row that was expected.
mysql> SELECT id FROM page_content WHERE ((content_order = '1' AND page_id = '1')) ORDER BY content_order; +----+ | id | +----+ | 5 | +----+ 1 ROW IN SET (0.00 sec)
Next step, why did it take so long. Running the MySQL built in command ‘explain’ before your query will give you some interesting information.
mysql> EXPLAIN SELECT id FROM page_content WHERE ((content_order = '1' AND page_id = '1')) ORDER BY content_order; +----+-------------+--------------+------+---------------+------+---------+------+------+-------------+ | id | select_type | TABLE | TYPE | possible_keys | KEY | key_len | REF | ROWS | Extra | +----+-------------+--------------+------+---------------+------+---------+------+------+-------------+ | 1 | SIMPLE | page_content | ALL | NULL | NULL | NULL | NULL | 37 | USING WHERE | +----+-------------+--------------+------+---------------+------+---------+------+------+-------------+ 1 ROW IN SET (0.00 sec)
What does this mean? It means that to find the content entry for this page, the database had to go through 37 rows of data, and there were no possible keys (database indexes) for it to try to look through. Another thing I’m told to watch out for is “filesort” in the “extra” column, which means the system is doing extra work that will slow it down.
How To Fix This
Let me prefix this next bit of discussion by saying I’m not a DBA, and know there’s way more to things like creating database indexes than what I’m presenting here, and that depending on what you do, you could make performance worse if you add too many indexes.
Now 37 isn’t all that bad, but it’s a scaling issue, so lets say the client puts more and more content on it, with no indexes on the database you get what’s called an On problem. That is, the more database rows you have, the more it has to search through. In fact, it’s pretty much the definition of “doesn’t scale”
A database index basically is a look-up table for queries. It lets the database say “give me the data for conditions XYZ” and the database simply hands back the correct data. Without indexes the database has to go through each row in the table to find the data matching the given conditions. In this case I had of course an index on the primary key of this content table, but I had forgotten that it would be accessed primarily through the main “content_id” field, which does not have an index on it. Oops, lets rectify this now:
mysql> CREATE INDEX page_content_page_id_index ON page_content(page_id); Query OK, 37 ROWS affected (0.05 sec) Records: 37 Duplicates: 0 Warnings: 0
And run the “explain” command again:
mysql> EXPLAIN SELECT id FROM page_content WHERE ((content_order = '1' AND page_id = '1')) ORDER BY content_order; +----+-------------+--------------+------+----------------------------+----------------------------+---------+-------+------+-------------+ | id | select_type | TABLE | TYPE | possible_keys | KEY | key_len | REF | ROWS | Extra | +----+-------------+--------------+------+----------------------------+----------------------------+---------+-------+------+-------------+ | 1 | SIMPLE | page_content | REF | page_content_page_id_index | page_content_page_id_index | 4 | const | 2 | USING WHERE | +----+-------------+--------------+------+----------------------------+----------------------------+---------+-------+------+-------------+ 1 ROW IN SET (0.00 sec)
So the big thing to look at here is the number of rows examined, 2. Pretty damn good, but why were there 2? This is because there are two rows with page_id = 1 and only 1 with content_order = 1. A further optimization we can do is to add this index:
mysql> CREATE INDEX page_content_page_id_content_order_index ON page_content(page_id,content_order);
This creates an index on the page_id and content_order fields. Now when we do our query (slightly edited for width) we get:
mysql> EXPLAIN SELECT id FROM page_content WHERE ((content_order = '1' AND page_id = '1')) ORDER BY content_order; +----+-------------+--------------+------+--------------------------------+------------------------------------------+---------+-------------+------+-------------+ | id | select_type | TABLE | TYPE | possible_keys | KEY | key_len | REF | ROWS | Extra | +----+-------------+--------------+------+--------------------------------+------------------------------------------+---------+-------------+------+-------------+ | 1 | SIMPLE | page_content | REF | page_content_page_id_index,... | page_content_page_id_content_order_index | 8 | const,const | 1 | USING WHERE | +----+-------------+--------------+------+--------------------------------+------------------------------------------+---------+-------------+------+-------------+
Now the number of rows examined to find the results of the query is 1, down from 2, so a win right? Well, yes and no. Down this path could lie madness. You don’t want to create indexes on every single combination because as you increase the number of indexes you up the time that the database takes to find the results. Or something like that, I’m honestly a bit fuzzy on the whole thing, but I do know that you don’t want to add a bazillion database indexes if you can avoid it.
Conclusion
As I said above, I’m by no means a database expert (as you can clearly see from the original blunders I made), but the point of this was more providing you with some tips and pointers on how to do some of the troubleshooting and get yourself onto the path of better performance. To sum it all up:
- Keep an eye on your logs to keep an eye out for large numbers of database queries.
- Enable slow query logging and watch for queries that are going over a large number of rows. Anything that’s taking more than a second (on the high end) is a big red flag.
- Learn to love the ‘explain’ command in MySQL and add indexes appropriately.
- Don’t add too many indexes
Hope that helps you! More tips and pointers as always, appreciated.
Related posts (maybe):
May 17th, 2010 at 6:06 am
I wrote about a similar issue here http://anthonyw.net/technique-to-approach-mysql-performance-issues. I also suggest that any web developer that doesn’t have a full time DBA pick up the book High Performance MySQL found here: http://oreilly.com/catalog/9780596003067
Also your encoder is turning your “>” to actual character encodings (“>”), might want to check that out.
May 17th, 2010 at 8:22 am
Good links, and I did notice the bad encodings… will try to convince wp-sytax to Do The Right Thing with them.
May 17th, 2010 at 10:04 am
The problem with lot of indexes is the insertion time. Indexes are metadata structures that are modified every time you insert a new row. When you create a new row, the indexed data must be also added in the index and maybe some [complex] operation must be ran.
Another problem on indexes is that consume take disk space, although they are not preatty big.
Nice tutorial man.
May 17th, 2010 at 10:04 am
I believe the reason you don’t want too many indexes is because it hurts performance during inserts, updates, and deletes. The reason for this is that every time you perform one of these operations the database will most likely have to update each affected index and so the more indexes the more updates to perform. However, I don’t believe that more indexes affect performance during select operations.
May 17th, 2010 at 10:10 am
I was under the impression that if you have a ton of indexes the DB ends up taking extra time figuring out which index to use, however I’m sure that’s more of an edge case (or an issue with far larger databases than I’m dealing with and the insert/update performance is more likely to suffer. That said, I wonder if in the situation that you’ve got a more read-heavy database, then it’s ok to overload with more indexes?
May 17th, 2010 at 10:10 am
Thanks Diego, appreciate the feedback!
May 17th, 2010 at 11:40 am
Nice, just last week I had to face the music with a slow application. Caching and Indexes to the rescue!
May 17th, 2010 at 1:12 pm
Along with indexes, be sure to watch your joins. MySQL, for example, is often not efficient when using inner joins nested within outer joins (see http://docforge.com/wiki/MySQL/Performance).
May 19th, 2010 at 1:44 am
nice tips.
btw…
>> Learn to love the ‘exlain’ command i
should be “explain”
May 20th, 2010 at 1:56 am
CREATE INDEX page_content_page_id_content_order_index ON page_content(page_id,content_order)
The order of the columns does matter, especially if you are probably using innoDB with B+Tree index. Consider using your content_order as the leftmost column in your index since you are sorting by it. The index is always kept sorted by its leftmost columns.
May 24th, 2010 at 5:51 am
[...] Thinking In Rails » Blog Archive » Database and Performance In Web Applications [...]
November 4th, 2010 at 9:34 am
[...] that can’t be optimized, either because of latency you can’t avoid from disk loading, more database indexes vs loading from disk, the laws of physics, code complexity vs readability, etc. It is however something though that I [...]