Sunday, August 24th, 2008...1:11 pm
Caching Part 4: Switching to Memory
By: Webmaster
This post is the fourth in my series about how we use caching here at the Yale Daily News. If you haven’t had a chance to read parts one, two and three, you should check them out first.
So as I’ve described, we got a system going where we were caching our data. Yes, we weren’t expiring caches very well, but the system was working and helping reduce load on our server, so we were somewhat out of the woods with regard to our performance problems. It was at that point that we realized that somewhere along the way, eAccelerator had been disabled on our server.
Let me explain the purpose of eAccelerator. eAccelerator is what is known as a PHP accelerator. PHP is different from languages like C# and Java because it is interpreted, not compiled ahead of time. When you write a C# program, you use what is known as a compiler (probably the one that comes with Visual Studio, though there are others). The compiler takes your source code and turns it into bytecode once, before the program ever runs. That bytecode can be executed by the computer much more easily and quickly than the original source code.
PHP is an interpreted language, like Python and Ruby. Instead of having a compiler that creates an executable with bytecode (the stuff that’s easy for a machine to read) ahead of time, when you visit a Web page, the PHP process starts up and goes through each line of the source code, compiling it to bytecode on every request. Doing this is expensive and, just like retrieving data from the database, ripe for caching.
eAccelerator lets the PHP interpreter run once, then saves the bytecode into memory. The next time a PHP script needs to be run, eAccelerator retrieves the bytecode from memory, and the PHP interpreter doesn’t need to run and do the complex parsing it normally would have to. Much faster – according to Wikipedia, 2-10 times as fast!
Well, somehow our eAccelerator installation on our convoluted cPanel setup had stopped working, and we needed to get another solution working. We found XCache, from the makers of Lighttpd. (At the time, eAccelerator hadn’t had a release in quite some time, whereas XCache development was quite active. For the record, I also hear good things about APC.) XCache solved our PHP accelerator woes, which was good.
However, it also offered variable caching. Instead of serializing data and saving it to a file, like our solution was doing at the time, we could instead pass in the data we wanted to save to a special XCache function. It would handle serializing the data and saving that data into its memory cache. There were also functions for retrieving the data, checking if it was expired, etc.
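Here is a minimal sketch of what a variable cache like this looks like from the caller’s side. It is written in Python and is not XCache’s actual API: the `MemoryCache` class, its method names and the `front_page_stories` key are all assumptions made for illustration. It also keeps objects in memory directly, without a serialization step.

```python
import time


class MemoryCache:
    """Tiny in-memory key/value cache with per-entry expiry (hypothetical API)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be tested without sleeping
        self._items = {}      # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        """Store value; ttl is a lifetime in seconds, None means no expiry."""
        expires_at = None if ttl is None else self._clock() + ttl
        self._items[key] = (value, expires_at)

    def isset(self, key):
        """Return True if key is present and has not expired."""
        entry = self._items.get(key)
        if entry is None:
            return False
        _, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._items[key]   # drop the stale entry lazily
            return False
        return True

    def get(self, key, default=None):
        """Return the cached value, or default if missing or expired."""
        return self._items[key][0] if self.isset(key) else default


cache = MemoryCache()
cache.set("front_page_stories", ["story 1", "story 2"], ttl=300)
stories = cache.get("front_page_stories")   # served from memory, no file read
```

The calling code only ever deals with set, get and an existence check. Where the data actually lives is the cache’s problem.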
This was great! Let me explain why. I’m a “computing assistant” at Yale, which means I help students fix their computers and support their software and hardware issues. I’m very frequently asked about upgrading their computers or buying new ones. Whenever I give advice, I always recommend buying more RAM, as much as they can get. If you understand that accessing RAM is faster than accessing the disk, you can skip the following sidebar. But if you’d like a (hopefully) easy-to-understand explanation, here’s how I explain why RAM is good to non-technical people:
Your processor is the brains of the computer. It’s very good at taking instructions and doing them very quickly. The hard part is getting instructions to it. It has some storage space available on it (CPU caches). Getting information from there is super fast, but there’s not much space, just a couple megabytes. Think of it like your desk – it’s very quick to reach over and grab the stapler, but there’s not enough room to store an elephant. The CPU generally keeps things it needs very frequently in this space, often the most basic parts of your operating system.
Then there’s your RAM. I recommend getting at least a couple gigabytes, which is many times more than what’s available right on your processor. You could fit almost anything in there, and it’s very fast. Think of it like your house – enough space for a couple elephants, and pretty quick to get something and bring it back to your desk.
The absolute last resort is your hard drive. Think of it like the United States of America. It’s massive – you can fit anything you want on here, and if you run out of room, just annex Canada (buy another hard drive for more space). However, it’s very very slow. You could have to drive across the entire country to pick up your elephant! That is why when your processor receives something, it tries to put it in your RAM. If you have lots of RAM, it can keep most of what it needs in there. If it runs out of room, then it has to go to your hard drive. Not good.
That was a long way to say that hitting the hard disk is bad. With our file-based caching, we had to hit the hard drive for every caching operation! Instead, we could use XCache’s variable caching, which lives in memory. Now our caching operations read from and write to memory, which is much faster.
By far the biggest player in the memory caching space is memcached. It’s the big dog. It was developed by Danga for use at LiveJournal, and now it’s used by everybody from Facebook to YouTube. We considered it, but I read that it’s actually slower than XCache because Apache has to connect to another process, whereas XCache runs in-process. That being said, memcached offers some advantages for multi-server solutions (and some nifty interoperation with nginx), so we may switch in the near future.
How does XCache avoid taking up all of the available memory? We set an amount of memory that it can use. If it fills that up, it deletes the least recently used cache object from the cache to make more room. It never uses more than the maximum we allow. This is called the LRU (least recently used) cache algorithm.
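LRU eviction is easy to sketch. The Python version below is an illustration only, not XCache’s implementation, and it makes one simplifying assumption: the limit is a number of entries (`max_entries`, a made-up parameter) rather than an amount of memory.

```python
from collections import OrderedDict


class LRUCache:
    """Bounded cache that evicts the least recently used entry when full."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._items = OrderedDict()   # oldest use first, newest use last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)   # reading counts as a use
        return self._items[key]

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)   # writing counts as a use too
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)   # evict the least recently used

    def __contains__(self, key):
        return key in self._items

    def __len__(self):
        return len(self._items)


cache = LRUCache(max_entries=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")      # "a" is now more recently used than "b"
cache.set("c", 3)   # cache is full, so "b" gets evicted
```

Notice that "b" is the one thrown out, even though "a" was stored first. What matters is when an entry was last used, not when it was added.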
So there you have it: how and why we switched to memory caching. One more left, and it’s a big one: view caching.