Thursday, August 14th, 2008...6:42 pm
Caching Part 3: Expiring Caches
By: Robert Baskin (Online Director)
This post is the third in my series about how we use caching here at the Yale Daily News. If you haven’t had a chance to read parts one and two, you should check them out first.
So as promised in parts one and two, this post will be about expiring caches. You want to know the secret of how we expire caches here at the YDN? The super-surreptitious algorithm? The answer is … we don’t!
Well, ok, it’s not exactly that we don’t expire caches. They do expire eventually. When we set the cache, we also set the time at which we want it to expire. That could be 10 minutes from now, two hours from now or a day from now. Until that time, the caching helper will retrieve the data from the cache. When that time rolls around, the helper decides that the data is no longer valid and regenerates the cache object.
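Here is a minimal sketch of that time-based behavior, written in Python rather than Cake's PHP. It is not the actual YDN caching helper: `TTLCache`, `fetch`, and the `regenerate` callback are hypothetical names chosen for illustration, and the store is a plain in-memory dictionary.

```python
import time


class TTLCache:
    """Tiny in-memory cache where every entry carries its own expiry time."""

    def __init__(self, clock=time.time):
        self._store = {}     # key -> (expires_at, value)
        self._clock = clock  # injectable so the behavior can be tested

    def fetch(self, key, ttl_seconds, regenerate):
        """Return the cached value, rebuilding it once its time is up."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # still valid: serve from the cache
        value = regenerate()  # missing or expired: rebuild the object
        self._store[key] = (now + ttl_seconds, value)
        return value
```

A call such as `cache.fetch("front_page", 600, build_front_page)` would serve the same front page data for ten minutes, no matter what editors change in the meantime, and only then call `build_front_page` again.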
What I mean by “We don’t expire caches” is that when editors make changes on the site, the relevant caches are not expired automatically. For example, let’s say an editor adds an article to the current issue. Ideally, it would show up immediately on the front page of the site. But since the data for the front page is cached, that article won’t show up immediately. It will have to wait until that cache expiration time rolls around; then it will show up.
Now, we do have a link that clears the entire cache. That invalidates all of the cache objects, so the entire site will be regenerated. We use that fairly often when we want updates to be reflected on the site immediately. But it’s not the greatest solution. Regenerating those caches is expensive; thus, clearing all of them is expensive. That’s why when we’re under load, I send out a “friendly” reminder saying “DON’T CLEAR THE **** CACHE!”
So why don’t we clear caches automatically? I know this is something of a cop-out, but it’s because it’s really, really hard. This is mostly because of the way Cake returns the data from the model and how we’ve architected our caching system. The model returns an array of all of the relevant data. For example, our cache object for the front page looks something like this:
Array {
    Issue {
        id: 1,
        Article {
            { id: 1 },
            { id: 12345 }
        }
    }
}
We save that array as our cache object. Now, let’s say an editor makes a change to article 12345. We would have to expire this cache. We’d have to expire the article page’s cache. Those are easy. But let’s say they change the headline. Is it in the most popular box? We’d have to expire that cache. Is it attached to a slideshow and cached in that slideshow’s cache object? As you can see, since our data is stored in so many cache objects, trying to expire them all doesn’t scale as a manual process.
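To see why this gets painful, here is a rough Python sketch of what "find every cache object that holds article 12345" would involve if the cache objects are nested arrays like the one above. The cache keys (`front_page`, `most_popular`, `slideshow_9`) and both function names are made up for illustration; this is not how the YDN code works, only a picture of the problem.

```python
def holds_article(node, article_id):
    """Recursively check whether a nested cache value contains the article."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Article":
                # An "Article" entry may be one record or a list of records.
                records = value if isinstance(value, list) else [value]
                if any(isinstance(r, dict) and r.get("id") == article_id
                       for r in records):
                    return True
            if holds_article(value, article_id):
                return True
        return False
    if isinstance(node, list):
        return any(holds_article(item, article_id) for item in node)
    return False


def keys_to_expire(cache_objects, article_id):
    """Walk every cache object and list the ones holding the article."""
    return sorted(key for key, value in cache_objects.items()
                  if holds_article(value, article_id))
```

Every edit would mean walking every cached array in full, which is work that grows with the number and size of the cache objects.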
One solution I’ve seen is to store some metadata about the data in the cache object in a separate field. So for the example above, store “issue_1, article_1, article_12345”. Then you can search through those to find which cache objects are holding things you need to expire. But that’s fairly slow and difficult to implement.
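A minimal sketch of that metadata idea, again in Python and again with hypothetical names (`TaggedCache`, `write`, `read`, `expire_tag`); it illustrates the approach described above and is not something we run:

```python
class TaggedCache:
    """Cache that stores a set of tags next to each cache object."""

    def __init__(self):
        self._values = {}  # key -> cached data
        self._tags = {}    # key -> set of tags such as "article_12345"

    def write(self, key, value, tags):
        self._values[key] = value
        self._tags[key] = set(tags)

    def read(self, key):
        return self._values.get(key)  # None when missing or expired

    def expire_tag(self, tag):
        """Drop every cache object tagged with `tag`; return their keys."""
        # This scans the tags of every entry, which is the slow part.
        stale = [key for key, tags in self._tags.items() if tag in tags]
        for key in stale:
            del self._values[key]
            del self._tags[key]
        return sorted(stale)
```

When an editor saves article 12345, the save hook would call `cache.expire_tag("article_12345")`, and every object written with that tag would be rebuilt on its next read. The hard part is making sure every write lists every record that ended up inside the cached array.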
So we’ve basically decided that we’re ok having some data lag. Ideally, we’d have caches expire immediately. Maybe in the next complete rewrite?
Up next: switching to memory caching and view caching