Thursday, May 15th, 2008...1:00 pm
It’s the Little Things in Life
By: Robert Baskin (Online Director)
Jump to CommentsI know I said my next post would have to do with our new Server header that we’re sending back, but I just wanted to get this in while it’s fresh in my mind. And plus, MySQL optimization is always worth talking about!
There are several places around our site where we use MySQL’s DATEDIFF function, which calculates the number of days between two given times. One example is the “Most Popular” box, which shows the most popular viewed, e-mailed and commented stories on our site. That works by retrieving stories posted within the last week sorted by the relevant count (hitcount, emailcount, or commentcount). If there aren’t 10 stories within the past seven days, it will go to 14 days to try and fill the box, then 21 and so on.
The query we were running had the following WHERE clause: DATEDIFF(NOW(), articles.dateposted) < 7. That got back only articles posted in the last week. That worked like we wanted it to, but it was a bit slow. Instead of taking just a couple milliseconds, it would take about half a second. Remember that it has to do at least three queries to get all three types, and if it doesn’t find enough articles within the past week, it has to run the queries again with 14 days, then 21 etc. That really only happens over breaks though, when we’re not publishing daily. Regardless, even three queries taking half a second is a lot.
The problem was mitigated by the fact that we cache the results of our database calls. So we actually only generate that “Most Popular” box every couple hours, rather than for every request. So did I actually need to optimize it? Well why not! I enjoy do this kind of stuff, and it looked like a quick fix. I was reading some slides called Join-Fu by Jay Pipes, who works for MySQL. Slide 17 was particularly relevant. He was talking about date calculations and optimizing them. I thought of our “Most Popular” box and set about optimizing it.
The problem with our query was that it could not take advantage of two important parts of what makes MySQL fast: indexes and query caching. First, apparently queries with functions operating on a column means it can’t use an index. (Indexes are kind of a large topic to get into in this post, so I’ll just direct you to some reading on them.) Not using indexes means the query runs much more slowly. To solve this problem, I followed his advice and rewrote the WHERE condition to be “articles.dateposted > NOW() - INTERVAL 7 DAY,” using MySQL’s INTERVAL syntax. The time taken to execute the query dropped to around 100 milliseconds - 1/5 of what it was!
But because the query still had a function in it, MySQL has to evaluate the result when the query is executed and therefore couldn’t cache the results of the query. Again, query caching is a bit outside the scope of this post (more reading on it), but at the basic level, MySQL caches the results of queries it executes, so the next time they’re run it serves the cached result instead of running the query again. I wanted to make our query use the query cache instead of having to be run every time. To do that, I just replaced the NOW() in the WHERE condition with today’s date as generated by PHP’s date function. So now the WHERE condition looks like “articles.dateposted > ” . date(’Y-m-d’) . ” - INTERVAL 7 DAY”. That query takes about 100 milliseconds to run the first time, but one millisecond or less to run each subsequent time. Remember: MySQL’s query cache persists beyond requests, so it’s cached for as long as MySQL is running (or until MySQL expires it), not just until the end of PHP’s life at the end of the request.
Was the optimization worth it? Probably not, but it’s always nice to improve the speed of frequently run queries by orders of magnitude.
1 Comment
June 7th, 2008 at 3:13 pm
[...] about It’s the Little Things in Life [...]
Leave a Reply
You must be logged in to post a comment.