Better Living through Memcached
I wanted to put something specific in the title, like “Speed up your service” or “Reduce server load” or “Limit database calls” or… You see why I chose “Better Living.”
Memcached is a memory caching system with an obvious name. It lets you store basically any serializable data in a giant, memory-resident hash, then retrieve it with its unique key.
Imagine not querying your database on every request, and you only begin to get a sense of how useful this is.
Let’s go through a simple, single-server setup.
Installing Memcached
First you’ll need to make sure libevent is installed. Easy enough through yum or apt-get.
Download Memcached and untar it. Run the `configure` script; you can probably do this with no arguments. I used `--prefix=/usr` to change from the default of `/usr/local`. On your system you may want to add `--enable-64bit` or `--enable-threads`, depending on your configuration and expected load. Run a quick `make && make install` and you’re done.
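On a typical Linux box, the whole dance looks something like this (the version number is just a placeholder; grab whatever the current release is):

```shell
# Install the libevent dependency (yum shown; use apt-get on Debian/Ubuntu)
yum install libevent libevent-devel

# Unpack the Memcached source tarball and build it
tar -xzf memcached-x.y.z.tar.gz
cd memcached-x.y.z
./configure --prefix=/usr    # add --enable-64bit or --enable-threads if needed
make && make install
```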
Running Memcached
You can run Memcached from the command line by passing the `-d` flag to run it as a daemon. If you’re still root, you’ll need to add `-u nobody`, where “nobody” is some unprivileged user. And that’s it. Memcached is up and running on your server and ready to connect.
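Putting those flags together, a typical invocation might look like this (the memory size and port are just example values):

```shell
# Run as a daemon, as the unprivileged "nobody" user,
# with 64 MB of cache on the default port 11211
memcached -d -u nobody -m 64 -p 11211
```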
If you’re on a system with chkconfig, I’ve put together a simple chkconfig script for it. The zip file contains two files, `etc/rc.d/init.d/memcached` and `etc/sysconfig/memcached`. Copy them to those locations, `chmod +x /etc/rc.d/init.d/memcached`, then run `chkconfig --add memcached` and `chkconfig memcached on` to run Memcached when the server starts. You can type `service memcached start` to get it up and running.
Typing `memcached -h` will show you all the options, including which ports and IP addresses to listen on (or which socket to use), how much memory to allocate, and others. You can add any of these command-line options to the `$OPTIONS` variable in the `/etc/sysconfig/memcached` file to set the default options for the Memcached service.
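For example, the sysconfig file might look something like this (the flag values are purely illustrative):

```shell
# /etc/sysconfig/memcached
# Extra flags passed to memcached at startup
OPTIONS="-u nobody -m 64 -p 11211"
```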
Using Memcached (Finally!)
Memcached is installed and running. Now what? Well it sort of depends on your language—there are Memcached bindings for Perl, PHP, Ruby, Java, Python, C, C#, Lua, and even Postgres and MySQL—and your program. I’m not going to go through the specific implementation, but here’s the basic idea. (In PHP. Ok, so I lied about that specific-implementation thing.)
Let’s say our goal is to reduce database load. Then we want to try to preempt as many SELECT queries as we can.
If your original code was like this: (Please! Be safe about SQL injection.)

```php
// What post do we want?
$id = get_current_id();

// Query the database
$query = "SELECT * FROM posts WHERE id = '$id';";
$result = $db->query($query);

// Now we have the post data
$post = $result->fetch_assoc();
```
What we’re doing is getting an ID, possibly from a user, then getting the associated table row and returning it. Now to speed that up with Memcached:
```php
// Create a new Memcache object
$m = new Memcache;
$m->connect('localhost');

// Check Memcache for the data first
if ( !$post = $m->get("post:$id") ) {
    // Not in the cache, get from the database
    $query = "SELECT * FROM posts WHERE id = '$id';";
    $result = $db->query($query);

    // Get the post data
    $post = $result->fetch_assoc();

    // If we expect this data to be used again soon,
    // we can store it in the cache
    $m->set("post:$id", $post);
}
```
There we have all the basics. The string `"post:$id"` is the Memcached key. The post data is serialized and stored (with `Memcache::set`) as a key-value pair. If it’s already in the cache, which we can check with `Memcache::get`, we don’t need to waste a database query. (Unless the stored value was boolean `false`, in which case it gets confusing, since `false` also signals a cache miss.)
If you expect data to be used as soon as it’s created, you can do your database `INSERT` statements and then store the same data in the cache immediately with `Memcache::set` (or `Memcache::replace`, whichever is appropriate).
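In code, that write-through step is just a set right after the `INSERT`. A sketch, reusing the `$db` and `$m` objects from the example above and assuming `$db` is a mysqli-style connection:

```php
// Insert the new post into the database
// (again, sanitize your inputs in real code)
$query = "INSERT INTO posts (title, body) VALUES ('$title', '$body');";
$db->query($query);

// Grab the new row's ID and cache the post right away,
// so the first readers hit Memcached instead of the database
$id = $db->insert_id;
$post = array('id' => $id, 'title' => $title, 'body' => $body);
$m->set("post:$id", $post);
```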
So, what?
To be fair, my Memcached example made the code much longer, and required instantiating another object, another TCP connection… What’s the payoff?
Memcached was written to improve performance over at LiveJournal when they hit the 20+ million page view/day mark. Imagine that database call was in a loop, or if, instead of serializing one post, we built an array of all the posts from one user, and stored that. You can quickly see how many database queries we eliminate.
For fun, I ran a test that generated a random number, then tried to pull the associated row out of a database. (I only generated numbers for which I knew there was a row.) I iterated over this 3,000 times and did 10 trials each.
In PHP with a MySQL database, the Memcached version was around 60% faster, averaging 0.315 seconds vs. 0.820 seconds without Memcached.
But Wait! There’s More
Since Memcached was designed for a site that already existed on dozens of servers, it’s based on the idea that you’d run Memcached on all your web servers and then treat them like a pool. You can connect to several servers simultaneously to spread the Memcaching around. Each key gets hashed to a particular server in the pool, so the client knows exactly where to store and retrieve it.
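With the PHP Memcache extension, building a pool is just a matter of adding servers before you get and set. A sketch, with made-up hostnames:

```php
// Connect to a pool of Memcached servers instead of just one
// (web1/web2/web3 are hypothetical hostnames)
$m = new Memcache;
$m->addServer('web1.example.com', 11211);
$m->addServer('web2.example.com', 11211);
$m->addServer('web3.example.com', 11211);

// get() and set() work exactly as before; the extension hashes
// each key to decide which server in the pool to talk to
$m->set("post:$id", $post);
$post = $m->get("post:$id");
```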
Memcached can also compress data on the fly with zlib, so if you did plan on storing the last 20 posts of every user, you wouldn’t be out quite so much space.
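Turning compression on is a one-flag change to the set call:

```php
// Store the post compressed with zlib; get() transparently
// decompresses it on the way back out
$m->set("post:$id", $post, MEMCACHE_COMPRESSED);
```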
Finally, Memcached will keep adding data until it hits its memory limit, at which point it drops the Least Recently Used stuff to make room. So even though you can set data not to expire, you can’t trust that it will be in the cache forever.
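If you’d rather not leave lifetimes entirely to LRU eviction, you can give each item an explicit expiration via the fourth argument to set:

```php
// Cache the post for one hour (3600 seconds); an expiration of 0
// means "never expire," though LRU eviction can still drop it earlier
$m->set("post:$id", $post, 0, 3600);
```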
For sites that rely on database reads, Memcached can help a lot. If your site depends more on writes or updates, it will probably do very little, or even add unnecessary overhead.
But since most data-driven sites do significantly more reading than writing, Memcached is a great way to improve response time and quality of service, and to reduce load on the database for when you really do need to read or write to it.
Do You Memcached?
So, do you use Memcached? How? Does it speed up your service or have no noticeable effect?