Developing at Scale: Database Replication

When a website is small (like this one, for example), usually the entire thing, from the web server to the database, can live on a single server. Even a single virtual server. One of the first things that happens when a website gets bigger is that this is no longer true.

One reason is load. A popular website will simply require more than a single server, virtual or otherwise, can give, and the only way to keep scaling is to add more servers. For example, the server may run out of available Apache connections, and the limit cannot be raised without hurting performance.

Another reason is downtime. If a website is served from a single server, and that server goes down for any reason, planned or otherwise, then the website is down. At some point, downtime is essentially unacceptable—just ask Twitter—and redundancy is required.

Enter Replication

A common response is to set up database replication, where one database server operates as a “master,” and one or more other servers operate as “slaves.” In this setup, all of your writes to the database will go to the master, then “replicate” to the slaves, and all or most of the reads will come from the slaves. (Note that the slaves handle every write in addition to the reads: slaves are not a good place to recycle sub-par hardware.)
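The read/write split described above can be sketched in a few lines. This is a minimal illustration, not the API of any particular library: the class name, method names, and the server names "master", "slave1", and "slave2" are all made up for the example.

```python
import random


class ReplicationRouter:
    """Route writes to the master and reads to a randomly chosen slave.

    Hypothetical helper for illustration; the "databases" are just names.
    """

    def __init__(self, master, slaves):
        self.master = master
        # With no slaves configured, reads fall back to the master.
        self.slaves = list(slaves) or [master]

    def db_for_write(self):
        # Every write goes to the single master.
        return self.master

    def db_for_read(self):
        # Spread reads across the pool of slaves.
        return random.choice(self.slaves)


router = ReplicationRouter("master", ["slave1", "slave2"])
print(router.db_for_write())  # always "master"
print(router.db_for_read())   # one of the slaves
```

Picking a slave at random is the simplest possible policy; a real setup might weight slaves or skip ones that are unhealthy.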

Replication introduces a new type of problem: if you naively send all reads to the slaves, then data you just wrote may not be there yet.

La…wait for it…g

Even if the master and slave are sitting next to each other with a cable connecting them, replication will probably take more time than your code does to reach the next step. At a minimum, you need to assume that replication lag will be hundreds of milliseconds, an eternity when the time from one line in your web app to the next is measured in micro- or nanoseconds. In practice, replication may well take seconds, especially if your master and slaves are not physically next to each other.

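The result is that the guarantees you are used to from a single ACID database no longer hold across the whole cluster. The write itself is still durable on the master, but reads from a slave are only eventually consistent. You cannot simply write data and immediately rely on being able to read it back.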

For example, say you have a large discussion forum. If you naively send all reads to the slaves, then someone’s post may take seconds to appear on the site. This is a problem if you’re trying to show a user their post immediately after posting it.
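To make the missing-post problem concrete, here is a small in-memory simulation. It is only a sketch under stated assumptions: the `LaggingReplica` class, the 0.5 second lag, and the `post:1` key are invented for the example, and time is passed in explicitly instead of being read from a clock so the result is deterministic.

```python
class LaggingReplica:
    """In-memory stand-in for a slave that applies writes after a delay."""

    def __init__(self, lag_seconds):
        self.lag_seconds = lag_seconds
        self.data = {}
        self.pending = []  # (apply_at, key, value), kept in write order

    def replicate(self, now, key, value):
        # The write becomes visible on this replica only after the lag.
        self.pending.append((now + self.lag_seconds, key, value))

    def read(self, now, key):
        # Apply every write whose delay has elapsed, then answer the read.
        while self.pending and self.pending[0][0] <= now:
            _, k, v = self.pending.pop(0)
            self.data[k] = v
        return self.data.get(key)


master = {}
slave = LaggingReplica(lag_seconds=0.5)

# A write at t=0 lands on the master at once and is queued for the slave.
master["post:1"] = "Hello"
slave.replicate(0.0, "post:1", "Hello")

print(slave.read(0.001, "post:1"))  # None: the slave has not caught up yet
print(slave.read(0.6, "post:1"))    # Hello
```

A read one millisecond after the write finds nothing on the slave, even though the master already has the post.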

Smarter Reading

The solution is to occasionally read from the master. When you need to access data that was just written, it is probably only available on the master, so that’s where you’ll read it. Within a single HTTP request, this is fairly simple: just force any queries that rely on recently written data to go to the master.
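One way to force specific queries to the master inside a single request is a context manager that flips a per-thread flag. This is a sketch, assuming one request per thread; `use_master`, `db_for_read`, and the server names are hypothetical and not taken from any library.

```python
import contextlib
import random
import threading

# Per-thread state, so one request's pin does not leak into another's.
_state = threading.local()


@contextlib.contextmanager
def use_master():
    """Force reads inside the block to go to the master."""
    previous = getattr(_state, "pinned", False)
    _state.pinned = True
    try:
        yield
    finally:
        # Restore the earlier value so nested blocks behave correctly.
        _state.pinned = previous


def db_for_read(master="master", slaves=("slave1", "slave2")):
    """Pick the database for a read, honoring the pin if it is set."""
    if getattr(_state, "pinned", False):
        return master
    return random.choice(slaves)


print(db_for_read())      # one of the slaves
with use_master():
    print(db_for_read())  # master
print(db_for_read())      # one of the slaves again
```

Wrapping only the queries that depend on fresh data keeps most of the read load on the slaves.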

Outside of a single HTTP request, this is slightly more complex. If you’re following the practice of redirecting after a POST request to a GET request (which you should), then creating a new forum post and viewing it will happen in two different HTTP requests.

One way around this is to set a very short-lived cookie that tells your web app to continue reading from the master. If any write occurs in a request, the response should include this cookie. The exact time-to-live will depend on how long your replication lag usually is: cover at least 4 or 5 standard deviations above the mean lag. Any request that has this cookie should honor it by reading only from the master.
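The cookie approach might look roughly like the sketch below. Everything here is an assumption made for illustration: the cookie name `read_from_master`, the helper functions, and the lag samples are invented, and cookies are modeled as plain dictionaries so the example does not depend on a web framework.

```python
import math
import statistics

PIN_COOKIE = "read_from_master"  # hypothetical cookie name


def pin_ttl_seconds(lag_samples, k=5):
    """Cookie lifetime covering the mean lag plus k standard deviations."""
    mean = statistics.mean(lag_samples)
    stdev = statistics.pstdev(lag_samples)
    # Round up to whole seconds, the unit cookie lifetimes use.
    return math.ceil(mean + k * stdev)


def response_cookies(wrote_to_db, ttl):
    """Cookies to set on the response: pin to the master after any write."""
    if wrote_to_db:
        return {PIN_COOKIE: {"value": "1", "max_age": ttl}}
    return {}


def should_read_from_master(request_cookies):
    """A request that carries the pin cookie reads only from the master."""
    return PIN_COOKIE in request_cookies


# Made-up replication lag measurements, in seconds.
ttl = pin_ttl_seconds([0.2, 0.3, 0.25, 0.4, 1.1])
print(ttl)  # 3

print(should_read_from_master({PIN_COOKIE: "1"}))  # True
print(should_read_from_master({}))                 # False
```

Because the browser stops sending the cookie once it expires, the app only has to check for its presence; requests go back to the slaves on their own after the lag window has passed.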

A Pitch

One of the hardest things for new web developers is getting experience with large-scale applications: first, you need a large-scale application! Setting up database replication is a huge pain, and if your site isn’t getting enough traffic, it’s not worth it.

Mozilla is one way aspiring web developers can get some experience working with large-scale web apps. All of our web apps are open source and open to contributions from community members. To get involved, stop by #webdev in IRC!

Code-sharing Update

When we decided to move SUMO to a new platform, one of the reasons we chose Django was code sharing and reuse—specifically that SUMO and AMO would be able to share code, meaning both teams would save time and see benefits.

So how is that going? Were we right in our assumption here? The code we’re sharing so far:

  • MultiDB Router: A Django DB router that supports reading from a pool of slave databases.
  • Cache Machine: A powerful caching library for Django that, in particular, provides automatic object caching and invalidation through the ORM.
  • Jingo: An adapter for using Jinja2 templates with Django.
  • Django-Nose: A test runner for Django using Nose.
  • Django Debug Cache Panel: Adds a cache panel for Django Debug Toolbar.
  • Test-Utils: Tools we use for testing in the Django/Jinja2/Nose setup.
  • Bleach: A library for sanitizing and linkifying user HTML, based on html5lib.
  • Fixture Magic: Django management commands for working with fixture data.

Additionally, we expect both teams will probably use the following, eventually:

  • DidYouMean: A wrapper for Hunspell, using PyHunspell to provide spelling suggestions for searches.
  • Django Gearman: Provides an easier interface from Django to the Python Gearman bindings.
  • AMO’s JS and CSS minification: AMO has already solved the problem of JS and CSS minification with Django and Jinja2.

And it’s not a released library, but SUMO has also been able to directly reuse code from AMO to simplify pagination.

Overall, it seems like we’re doing really well on this! It’s great to see the projects not just sharing code, but packaging and publishing it on GitHub and PyPI. If any of the above is useful to you, go ahead and try it out! You can open issues with any of the packages on GitHub, or find us in #webdev on irc.mozilla.org.

WP: Better Search Widget 1.1

Better Search Widget 1.1 is a significant upgrade to Better Search Widget that adds new features and fixes an old bug with internationalization.

Features

(New features in bold.)

  • Optional default value.
  • Optional, custom widget title.
  • Optional onfocus and onblur listeners.
  • Optional, customizable focus and blur colors.
  • Custom button value.
  • Custom field size.

The built-in search widget has only one of these features, the optional, custom title.

Onfocus and Onblur

In order to use the blur and focus colors, you must enable the onfocus and onblur event listeners. In order to use the listeners, you must specify a default value (otherwise none of this makes sense). Here’s an example:

Bug Fixes

A pretty serious typo meant that none of the internationalization code worked correctly. This has been fixed, and en_US, en_GB, and fr_FR localizations are available. de_DE is coming. If you’d like to translate, there is a .pot file included in the languages directory.

License

Better Search Widget is released under the MIT License. If you use it, or have suggestions for new features or bug fixes, let me know!

Getting It

You can download Better Search Widget 1.1 now in a Zip file. Or, to save yourself some trouble, you can check it out from Subversion:

svn co svn://jamessocol.com/better-search-widget/tags/1.1.0 ./better-search-widget

(Run that in your wp-content/plugins directory.) Subversion will make it easiest to upgrade later.

Roadmap

Soon, though probably not today, I will be releasing Better Search Widget 2, which will take advantage of the new Widget API in WordPress 2.8. This will add support for multiple instances of the widget, but will require at least WordPress 2.8. You should upgrade, anyway.

Widget l10n

I spent some of today working on bringing a couple of WordPress widgets up-to-date (Better Search and Most Comments) only to discover there is a new widget API. I guess I haven’t been paying attention.

I’ll probably start some 2.0 branches tomorrow to take advantage of the new API. I wish I didn’t know how many people don’t keep their WordPress installations up to date, so I wouldn’t care about backwards compatibility.

At least both widgets got nice, new, and functional internationalization (i18n) code and new localization (l10n) files.

And BSW got a good feature update, incorporating a few ideas of my own and some suggestions from Marco Jung, who is also kindly doing a German localization. The built-in search widget has stepped up its game and fixed the thing BSW was originally designed to fix (no widget title), so I have a higher bar to clear to justify the name “Better Search Widget.”

I’ll write up the new features tomorrow.

JavaScript: Private Static Members, Part 2

Finally, it’s time to finish up the lesson on private static members and methods in JavaScript.

Last time, I introduced the technique of creating and immediately executing a function, using parentheses. I talked a little about returning a function and storing it in a variable.

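// Create a function and execute it immediately; its return value,
// the inner function, is what gets stored in myFunc.
var myFunc = (function () {
  return function () {
    alert("Hello, World!");
  };
})();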

alert(myFunc); // "function () … "

myFunc(); // Hello, World!
