Developing at Scale: Database Replication

When a website is small—like this one, for example—usually the entire thing, from the web server to the database, can live on a single server. Even a single virtual server. One of the first things that happens when a web site gets bigger is this is no longer true.

One reason is load. A popular website will simply require more than a single server, virtual or otherwise, can give, and the only way to keep scaling is to add more servers. For example, if the server runs out of available Apache connections and the number cannot be raised without negatively impacting performance.

Another reason is downtime. If a website is served from a single server, and that server goes down for any reason, planned or otherwise, then the website is down. At some point, downtime is essentially unacceptable—just ask Twitter—and redundancy is required.

Enter Replication

A common response is to set up database replication, where one database server operates as a “master,” and one or more other servers operate as “slaves.” In this setup, all of your writes to the database will go to the master, then “replicate” to the slaves, and all or most of the reads will come from the slaves. (Note that the slaves are doing both all the writes as well as all the reads: slaves are not a good place to recycle sub-par hardware.)

Replication introduces a new type of problem: if you naively send all reads to the slaves then data you just wrote will not be there.

La…wait for it…g

Even if the master and slave are sitting next to each other with a cable connecting them, replication will probably take more time than your code does to reach the next step. At a minimum, you need to assume that replication lag will be hundreds of milliseconds—an eternity when the time from one line in your web app to the next is measured in micro- or nanoseconds. In reality, replication in the real world may well take seconds, especially if your master and slaves are not physically next to each other.

The result is that ACIDity is essentially broken, specifically the Durability part. You cannot simply write data and immediately rely on its existence.

For example, say you have a large discussion forum. If you naively send all reads to the slaves, then someone’s post may take seconds to appear on the site. This is a problem if you’re trying to show a user their post immediately after posting it.

Smarter Reading

The solution is to occasionally read from the master. When you need to access data that was just written, it is probably only available on the master, so that’s where you’ll read it. Within a single HTTP request, this is fairly simple: just force any queries that rely on recently-written data to the master.

Outside of a single HTTP request, this is slightly more complex. If you’re following the practice of redirecting after a POST request to a GET request (which you should) then creating a new forum post and viewing it will be on two different HTTP requests.

One way around this is to set a very short-lived cookie that tells your web app to continue reading from the master. If any write occurs in a request, the response should include this cookie. The exact time-to-live will depend on how long your replication lag usually is—cover at least 4 or 5 standard deviations. Any request that has this cookie should honor it by reading only from the master.

A Pitch

One of the hardest things for new web developers is developing large-scale applications: first, you need a large-scale application! Setting up database replication is a huge pain, and if your site isn’t getting enough traffic, it’s not worth it.

Mozilla is one way aspiring web developers can get some experience working with large-scale web apps. All of our web apps are open source and open to contributions from community members. To get involved, stop by #webdev in IRC!

Surviving Pac Man

On Friday, Google showed off a fun new doodle in honor of the 30th anniversary of Pac Man: a Pac Man clone, complete with sounds.

Unfortunately, in the initial release, those sounds started playing automatically—an oversight or an homage to <bgsound>, I guess. Even if Google was open in a background tab or window, or in a hidden iframe created by an add-on, the Pac Man music and sound effects would start.

And that confused some people.

Many people came to SUMO looking for an explanation, and many of them, not finding anything in the knowledge base, started posting to our forum. So many, in fact, that our database server started running out of connections.

The pounding we took on the forums also caused replication on our slave databases to fall behind by as much as 1.25 hours, so even when we wrote an article about the noises [article has been removed], it didn’t show up for most people.

As Sean put it: “We just got DDOSed by Pac Man.”

To shore up the site and bring it back from the brink of toppling over, we worked with IT (thanks, Dave!) to implement a number of temporary solutions. We…

  • …disabled a particular kind of slow, frequent, and useless query.*
  • …blocked Google’s crawler from indexing the site.
  • …disabled our own sumobot’s forum-crawling features.
  • …rotated DB slaves out of the production pool to allow them to catch up.

Google has already removed the Pac Man doodle from their home page, and we can revert most of the emergency measures here on Monday. But the event does remind us to look at what we’re doing in Kitsune, our rewrite, to weather storms like this in the future.

One idea, suggested by Dave Dash, is a read-only mode where all pages that can trigger database writes are temporarily disabled. We’ll be looking pretty seriously at this over the next couple of days.

Another important take-away is to make damn sure pages only trigger database writes if they really need to. Writes can never bounce off a cache, so they are very expensive.

Finally, we should be more proactive in how we interact with our Zeus cache. We’ll also think about whether it makes sense to start using Wil Clouser’s Zeus interface, Hera, sooner than later.

“Too much traffic” is the best problem a web development team can have. Hopefully, the first time this happens to Kitsune, we’ll be ready.

* The queries that increment the number of views a forum thread has gotten are particularly slow for some reason. They’re also wildly inaccurate, since most people see a cached version of those pages and never trigger the query. The worst part: they occur on every (non-cached) page view, even while just reading.

(This post was translated into Belorussian, isn’t that cool?)

Responsible SQL: How to Authenticate Users

Most SQL-injection articles set a horrible example for young programmers.

Here is a very typical “bad example” of why you need to escape user data before it goes into SQL queries:

(ed. The symbol « is a line break that’s not in the real code.)

$username = $_POST[‘username’]; // username=admin
$password = $_POST[‘password’]; // password=’ OR 1=1; — ‘

$user = $db->query("SELECT * FROM users WHERE «
           username=’$username’ AND «
           password=’$password’ LIMIT 1;"
);

The point, of course, is that you must sanitize your user input, or else this person would run this query:

$user = $db->query("SELECT * FROM users WHERE «
           username=’admin’ AND «
           password = ” OR 1=1; — ‘ LIMIT 1;"
);

Which grants the sneaky user all your admin privileges. Other versions have nefarious users dropping your users or articles tables.

The problem is: this is the wrong way to authenticate users. These examples are written for beginners to understand the importance of sanitizing input, but they also provide a model to those beginners for how user authentication works. And it’s a very bad model.

This is a long one, more after the break. (more…)

Connecting PHP, IIS 6, and SQL Server 2005

I know I will be accosted for this, but at work we needed to run PHP on IIS 6 (fairly simple) and connect it to a remote database server running SQL Server 2005 (not terrible, once I gave up the Microsoft way).

Yeah yeah, do it in ASP.NET, I know. While I like C# as a language, I kind of hate ASP.NET as a framework, so what are you gonna do? Java was an option but the start-up time was too long for this project.

My first Google search for “PHP SQL Server 2005″ turned up the Microsoft SQL Server 2005 Driver for PHP. “Well great!” I thought. It’s just a PHP extension, very easy to install on Windows. But I didn’t know the horrid depths into which I was about to sink.

The Microsoft driver comes with an example application and database. The application assumes you are connecting to a local database. There is scant information about remote databases.

The driver defines this function:

sqlsrv_connect($host[, $connectionOptions[, ...]]);

The example application tells you to set $host to (local). Supposedly this works. However, after scouring the internet for several days, and trying every permutation of hostname, Windows networking name, port, IP address, white space, and several other variables that shouldn’t have been in there, I’ve decided it doesn’t talk to remote servers nicely.

PDO‘s ODBC driver, on the other hand, and a quick visit to www.connectionstrings.com, worked wonderfully.

Here is how I needed to create the PDO object. I hope this is useful for someone else:

(ed. The symbol « is a line break that’s not in the real code.)

$host     = '1.2.3.4';
$port     = '1433';
$database = 'MyDatabase';
$user     = 'MyDatabaseUser';
$password = 'MyDatabasePassword';

$dsn = "odbc:DRIVER={SQL Server}; «
 SERVER=$server,$port;DATABASE=$database";

try {
  // connect
  $conn = new PDO($dsn,$user,$password);
} catch (PDOException $e) {
  // fancy error handling
}

Help Me Scale

I’ve been reading Eran Hammer-Lahav’s intelligent posts on microblog scalability, and now I’m concerned about my own “microblog” site, Picofiction.

Similar to social networks, social updates, social messaging, social… Like many social web sites—amongst our weaponry…—Picofiction lets you “follow” your favorite authors, displaying all their posts along with yours.

I handle this very naïvely: everything is offloaded to the database. There are three tables involved here, one of users, one of posts, and one of follower/followee bindings.

Here’s the basic structure of this query:

SELECT post_id, post_body, post_date, post_type,
  user_name AS author_name, user_id AS author_id
FROM posts
LEFT JOIN users
ON posts.author_id = users.user_id
WHERE author_id = 'CURRENT_USER'
OR author_id IN (
  (SELECT followed_id
   FROM followers
   WHERE following_id = 'CURRENT_USER')
  )
ORDER BY post_date DESC
LIMIT PAGE_START,20;

Here’s where I need help: this works great on a single database, but it does not scale horizontally.

Since this horizontal scalability is such a hot topic right now, I’m asking for ideas. I’d like to put in the infrastructure before there is a need for it.

Eran points out that caching is not as simple a solution as we’d like to think. What do you cache? How do you keep caches in sync?

Does anyone have experience with MySQL Cluster Servers? It seems like the best way of scaling is to make the process as parallelizable as possible. The database then handles the parallelization, so the less I can do in the program the better, right?