Developing at Scale: Database Replication

When a website is small (like this one, for example), usually the entire thing, from the web server to the database, can live on a single server, even a single virtual server. One of the first things that happens when a website gets bigger is that this is no longer true.

One reason is load. A popular website will simply require more than a single server, virtual or otherwise, can give, and the only way to keep scaling is to add more servers. For example, the server may run out of available Apache connections, and the limit cannot be raised without hurting performance.

Another reason is downtime. If a website is served from a single server, and that server goes down for any reason, planned or otherwise, then the website is down. At some point, downtime is essentially unacceptable—just ask Twitter—and redundancy is required.

Enter Replication

A common response is to set up database replication, where one database server operates as a "master," and one or more other servers operate as "slaves." In this setup, all of your writes to the database go to the master and then "replicate" to the slaves, and all or most of the reads come from the slaves. (Note that the slaves end up doing all of the writes, replayed from the master, as well as all or most of the reads: slaves are not a good place to recycle sub-par hardware.)
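Here is a minimal sketch of that routing rule in JavaScript.

- The `ReplicatedDb` class is a hypothetical stand-in, not a real database driver's API.
- So are the connection objects, which only need a `query(sql)` method.
- Treating any statement that doesn't start with SELECT as a write is a deliberate simplification.

```javascript
// Minimal master/slave query router (illustrative sketch).
// `master` and each entry of `slaves` are hypothetical connection objects
// exposing a query(sql) method; they are not a real driver's API.
class ReplicatedDb {
  constructor(master, slaves) {
    this.master = master;
    this.slaves = slaves;
    this.next = 0; // round-robin index into this.slaves
  }

  // Simplification: anything that doesn't start with SELECT counts as a write.
  static isWrite(sql) {
    return !/^\s*select\b/i.test(sql);
  }

  query(sql) {
    // Writes always go to the master (and so does everything if there are no slaves).
    if (ReplicatedDb.isWrite(sql) || this.slaves.length === 0) {
      return this.master.query(sql);
    }
    // Reads rotate through the slaves.
    const slave = this.slaves[this.next];
    this.next = (this.next + 1) % this.slaves.length;
    return slave.query(sql);
  }
}

// Fake connections that just report which server handled the query.
const fakeConn = (name) => ({ query: (sql) => `${name}: ${sql}` });

const db = new ReplicatedDb(fakeConn("master"), [fakeConn("slave1"), fakeConn("slave2")]);
console.log(db.query("INSERT INTO posts (body) VALUES ('hi')")); // handled by master
console.log(db.query("SELECT * FROM posts")); // handled by slave1
console.log(db.query("SELECT * FROM posts")); // handled by slave2
```

Round-robin is only one way to spread reads across the slaves. A real setup might weight slaves by capacity or skip ones that are lagging badly.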

Replication introduces a new type of problem: if you naively send all reads to the slaves, data you have just written may not be there yet.

La…wait for it…g

Even if the master and slave are sitting next to each other with a cable connecting them, replication will probably take longer than your code takes to reach the next step. To be safe, assume that replication lag will be hundreds of milliseconds: an eternity when the time from one line in your web app to the next is measured in micro- or nanoseconds. In practice, replication may well take seconds, especially if your master and slaves are not physically next to each other.

The result is that read-your-writes consistency is essentially broken. The write is committed, and durable, on the master, but a slave may not have it yet. You cannot simply write data and immediately rely on reading it back.

For example, say you have a large discussion forum. If you naively send all reads to the slaves, then someone’s post may take seconds to appear on the site. This is a problem if you’re trying to show a user their post immediately after posting it.
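To make the lag concrete, here is a toy model that uses a simulated clock rather than real timers.

- The 500 ms delay is an assumed figure.
- `ToyReplicaSet` is a hypothetical illustration.
- The master applies a write immediately, while the slave only sees it once the delay has passed.

```javascript
// Toy model of asynchronous replication with a simulated clock (illustrative).
// Times are in milliseconds; LAG_MS is an assumed, fixed replication delay.
const LAG_MS = 500;

class ToyReplicaSet {
  constructor(lagMs) {
    this.lagMs = lagMs;
    this.master = new Map();
    this.slave = new Map();
    this.log = []; // pending replication events, in arrival order: { at, key, value }
  }

  write(now, key, value) {
    this.master.set(key, value); // committed on the master immediately
    this.log.push({ at: now + this.lagMs, key, value }); // reaches the slave later
  }

  // Apply every replication event whose arrival time has passed.
  replicate(now) {
    while (this.log.length > 0 && this.log[0].at <= now) {
      const { key, value } = this.log.shift();
      this.slave.set(key, value);
    }
  }

  readFromSlave(now, key) {
    this.replicate(now);
    return this.slave.get(key);
  }

  readFromMaster(key) {
    return this.master.get(key);
  }
}

const forum = new ToyReplicaSet(LAG_MS);
forum.write(0, "post:1", "First post!");
console.log(forum.readFromSlave(10, "post:1"));  // undefined: not replicated yet
console.log(forum.readFromMaster("post:1"));     // First post!
console.log(forum.readFromSlave(600, "post:1")); // First post!
```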

Smarter Reading

The solution is to read from the master some of the time. When you need to access data that was just written, it is probably only available on the master, so that's where you'll read it. Within a single HTTP request, this is fairly simple: just force any queries that rely on recently written data to go to the master.
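Within one request, a sketch might look like the following.

- `RequestDb` is a hypothetical per-request wrapper around stand-in connections (anything with a `query(sql)` method).
- Once the request writes anything, later reads in that same request stick to the master.
- A caller can also force an individual read to the master.

```javascript
// Per-request database handle (illustrative sketch). `master` and `slave` are
// hypothetical objects with a query(sql) method, not a real driver's API.
class RequestDb {
  constructor(master, slave) {
    this.master = master;
    this.slave = slave;
    this.wroteToMaster = false; // has this request written anything yet?
  }

  write(sql) {
    this.wroteToMaster = true;
    return this.master.query(sql);
  }

  // After any write in this request, reads go to the master, because the
  // slave may not have the new data yet. forceMaster overrides a single read.
  read(sql, { forceMaster = false } = {}) {
    const target = (forceMaster || this.wroteToMaster) ? this.master : this.slave;
    return target.query(sql);
  }
}

const conn = (name) => ({ query: (sql) => `${name}: ${sql}` });
const req = new RequestDb(conn("master"), conn("slave"));
console.log(req.read("SELECT COUNT(*) FROM posts")); // handled by slave
req.write("INSERT INTO posts (id, body) VALUES (1, 'hi')");
console.log(req.read("SELECT * FROM posts WHERE id = 1")); // handled by master
```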

Outside of a single HTTP request, this is slightly more complex. If you're following the practice of redirecting from a POST request to a GET request (which you should), then creating a new forum post and viewing it will happen in two different HTTP requests.

One way around this is to set a very short-lived cookie that tells your web app to keep reading from the master. If any write occurs during a request, the response should include this cookie. The exact time-to-live depends on how long your replication lag usually is: it should cover your average lag plus at least 4 or 5 standard deviations. Any request that carries this cookie should honor it by reading only from the master.
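A minimal sketch of the cookie approach, assuming you already collect replication lag samples in milliseconds.

- The cookie name `use_master`, the sample values, and the helper functions are hypothetical.
- `Max-Age`, `Path`, and `HttpOnly` are standard `Set-Cookie` attributes.
- The TTL is the mean lag plus k population standard deviations, with k = 5 by default.

```javascript
// Sticky "read from master" cookie (illustrative sketch).
// COOKIE_NAME and the lag samples are hypothetical; Max-Age, Path and HttpOnly
// are standard Set-Cookie attributes (RFC 6265).
const COOKIE_NAME = "use_master";

// TTL = mean lag + k population standard deviations, in milliseconds.
function stickyTtlMs(lagSamplesMs, k = 5) {
  const n = lagSamplesMs.length;
  const mean = lagSamplesMs.reduce((sum, x) => sum + x, 0) / n;
  const variance = lagSamplesMs.reduce((sum, x) => sum + (x - mean) ** 2, 0) / n;
  return mean + k * Math.sqrt(variance);
}

// Header value to attach to any response whose request performed a write.
// Max-Age is in whole seconds, so round up rather than expire too early.
function stickyCookieHeader(ttlMs) {
  const maxAge = Math.ceil(ttlMs / 1000);
  return `${COOKIE_NAME}=1; Max-Age=${maxAge}; Path=/; HttpOnly`;
}

// Given the raw Cookie request header, decide whether to read from the master.
function shouldReadFromMaster(cookieHeader) {
  if (!cookieHeader) return false;
  return cookieHeader
    .split(";")
    .map((pair) => pair.trim())
    .some((pair) => pair === `${COOKIE_NAME}=1`);
}

const ttl = stickyTtlMs([200, 300, 250, 400, 350]);
console.log(stickyCookieHeader(ttl)); // use_master=1; Max-Age=1; Path=/; HttpOnly
console.log(shouldReadFromMaster("session=abc; use_master=1")); // true
console.log(shouldReadFromMaster("session=abc"));               // false
```

With the sample lags above (mean 300 ms, standard deviation about 71 ms), the TTL comes to roughly 654 ms. Since `Max-Age` only counts whole seconds, that rounds up to a one-second cookie.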

A Pitch

One of the hardest things for new web developers is getting experience with large-scale applications: first, you need a large-scale application! Setting up database replication is a huge pain, and if your site isn't getting enough traffic, it's not worth it.

Mozilla is one way aspiring web developers can get some experience working with large-scale web apps. All of our web apps are open source and open to contributions from community members. To get involved, stop by #webdev in IRC!

Programming in Middle School?

Eating breakfast in the hotel the other morning, my father mentioned a Twitter conversation he had to join about programming courses in schools and math curricula.

Programming and math education? I just had to get involved, too.

Ben Grey and Colleen K had started talking about the value of programming courses in school. Ben and my father were (initially) against it, concerned that the skills would be obsolete before they could be used and would not transfer to most fields. Colleen and I took the pro side.

Ben and my father both pointed out that languages go extinct, which is true. But judiciously chosen languages have staying power. Basic can be run in a browser, but is obviously a fine starting point. C has been around since 1972, and it isn't going anywhere soon. JavaScript has been around since the mid-90s and seems to hold a secure position.

What of transferability? How many professional programmers still work in the first language they learned? Anyone? Anyone? Bueller? My first language was Perl, which helped me learn JavaScript and PHP, which helped me learn Java and C. People older than me probably started with Basic. People much older than me may have used ALGOL, which was on its fourth generation by the time C was born.

But what about the majority of students who don’t want to be programmers?

Programming is a powerful, concrete interface to the two mathematical concepts that cause the most problems for students in K-12: variables and functions.

In the US, our K-12 math curriculum covers three broad areas: numbers, variables, and functions. These roughly correspond to primary math (counting, operations, fractions, equalities), algebra and geometry, and (pre-)calculus.

Most students can wrap their heads around numbers, even fractions. They are relatively concrete. Some students struggle with operations like multiplication and exponents, but this is the lowest hump, the bunny hill. The majority make it down unscathed.

When does your school start losing people in the math program? It's probably around 8th grade. That's when most schools begin the dreaded algebra. Variables are intuitive to some, but abstract and meaningless to others. If you believe Myers and Briggs, some of us are predisposed to abstractions and generalizations (*NTJ), but is there a biological reason others are lost here? Or do math teachers and curricula need to change?

Of the people who survive algebra without hating math, how many make it past "pre-calculus" or "FST" (Functions, Statistics, and Trigonometry)? I've heard the story a dozen times myself: "I was good at math until my calculus class." If numbers are the bunny hill of K-12 math, then variables are the green circle, and functions are the double black diamond.

Again, functions may be completely intuitive to a few of us, but they can strike terror in the heart of even the most dedicated students.

Too often, when they reach a jump in complexity and struggle, students are told “it’s OK.” They are “not ‘math people.’” This can come from parents (“I was never good at math, either.”) or even well-intentioned teachers (“Maybe you’re just right-brained!”). There is a deeply held belief in this country that “math” is some innate ability, a genetic gift, and either you’re the next Will Hunting, or you may as well not try.

How does this relate to programming? How did you learn programming? Here’s a common roadmap:

  1. Imperative programming; no functions or abstractions; lots of constants; instructions in linear order.
  2. Using variables for input or consistency/ease-of-maintenance.
  3. Using functions and subroutines to encapsulate repeated operations.
  4. Using someone else’s functions—you probably don’t see the implementation.
  5. Branch off to more advanced ideas like object-oriented or functional programming; lots and lots and lots of abstractions.
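As a rough illustration, the first four stages might look like this in JavaScript. The rectangle-area example is a hypothetical one chosen for simplicity. Stage 5 is left out because it branches in too many directions to fit in a few lines.

```javascript
// Stage 1: imperative, constants only, instructions in linear order.
console.log(3 * 4); // 12

// Stage 2: variables for input and ease of maintenance.
let width = 3;
let height = 4;
console.log(width * height); // 12

// Stage 3: a function encapsulates the repeated operation.
function area(w, h) {
  return w * h;
}
console.log(area(3, 4), area(5, 6)); // 12 30

// Stage 4: using someone else's function without seeing its implementation.
console.log(Math.max(area(3, 4), area(5, 6))); // 30
```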

A remarkably parallel route. And instead of saying “x is a number,” you can say “var name holds the name the user enters!” The results are far more immediate and interactive. Play is cheap. (“What if I change this line? Oh it doesn’t work, better undo that.”) Instead of limiting functions to scary numbers and equations, more pedestrian words can be used as arguments and return values.
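For instance, a variable can hold a word, and a function can take and return words. The names below are hypothetical. In a browser the name might come from `prompt()`, but it is hard-coded here so the example runs anywhere.

```javascript
// A variable holding a word instead of a number (hard-coded for portability;
// in a browser it might come from prompt()).
const userName = "Ada";

// A function whose argument and return value are words, not numbers.
function greet(person) {
  return "Hello, " + person + "!";
}

// Functions compose, just like f(g(x)) in algebra.
function shout(text) {
  return text.toUpperCase();
}

console.log(greet(userName));        // Hello, Ada!
console.log(shout(greet(userName))); // HELLO, ADA!
```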

Programming is applied mathematics. Teachers spend much of their lives looking for new examples, better applications, to drive home the theory, when there is already a wealth of application available.

There is also a two-pronged economic argument. To paraphrase Mr. Friedman: anything that can be outsourced will be, and the new jobs created here will require deeper interaction with computers. A cursory understanding of the programming techniques underneath will benefit all students as they enter the job market, and the exposure may mean better-prepared students entering computer science programs. Basically, we need more talented, creative programmers (how many art students harbor latent programming skills?), and even non-programmers will benefit from the exposure and understanding.

And of course, the math skills. No subject (that is taught) is taught as badly as math in our schools. On the personal level, this translates to people who misunderstand interest and get themselves in trouble with debt; on a national level, it certainly doesn’t help when a subprime mortgage market bubbles and pops.

Even if you don’t use “math” on a day-to-day basis, it is another way of thinking, of solving problems. Algorithmic thinking, epitomized in computer programming, provides a layer of cognitive flexibility, and every layer we can add, we should.

I don't expect every student to become a master programmer, or even explicitly use those skills (or other skills from their math courses) every day. But I do expect schools to use every tool they have to make these methods of thinking and courses of study available to everyone. We wouldn't allow a future programmer to skip his English class, so why would we allow a future writer to skip his math and programming class?

JavaScript: Private Static Members, Part 1

A little while ago I talked about creating private variables and methods in JavaScript. This works, but is not necessarily efficient: each instance of the class creates new copies of the members. While that may be exactly what you want for instance variables (think of partNum in the old examples) it is not always ideal.

The complexity jumps significantly, though, so I'm dividing this topic into two parts.

To get started, we need to forget about all this Object-Oriented Programming for a minute and look at some of the neat tricks you can do with functions in JavaScript.
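Here is a sketch of both the problem and one function trick that addresses it.

- `Part` and `partNum` only echo the name from the earlier examples, which aren't reproduced here, so treat them as hypothetical stand-ins.
- The first constructor gives every instance its own copy of its method.
- The second uses an immediately-invoked function expression (IIFE) to keep a counter in one shared, hidden scope, so it behaves like a private static member.

This is one common pattern, not necessarily the approach taken in Part 2.

```javascript
// 1. Per-instance privileged methods: each object gets its own copy.
function Part(partNum) {
  // partNum is private: only functions created in this call can see it.
  this.getPartNum = function () {
    return partNum;
  };
}
const a = new Part(1);
const b = new Part(2);
console.log(a.getPartNum === b.getPartNum); // false: two separate functions

// 2. An IIFE runs once and creates one shared scope. Variables in it act like
// private static members: every instance sees the same copy, and nothing
// outside the IIFE can reach them directly.
const CountedPart = (function () {
  let instances = 0; // private static: shared by all CountedPart objects

  function CountedPart(partNum) {
    instances += 1;
    this.partNum = partNum;
  }

  // Defined once on the prototype, so all instances share this function.
  CountedPart.prototype.getInstanceCount = function () {
    return instances;
  };

  return CountedPart;
})();

const p1 = new CountedPart(10);
const p2 = new CountedPart(20);
console.log(p1.getInstanceCount(), p2.getInstanceCount()); // 2 2
console.log(typeof instances); // undefined: the counter is hidden
```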

Update: Part 2 is now available.

Private Variables in JavaScript

OK, enough of this social/ranting stuff. Time to write something vaguely technical.

I have a love-hate relationship with JavaScript. I think anyone who works with it does. Sometimes it just doesn’t do what you expect, and it’s certainly different.

One stumbling block, especially for people coming from real Object-Oriented languages like Java, Ruby, or let's even say PHP 5, is the lack of access control. When everything is an object, the inability to hide certain values can become a problem.
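A minimal sketch of the problem and the usual closure-based workaround. `OpenPart` and `ClosedPart` are hypothetical names. The integer check in the setter is only an example of the kind of validation that hiding a value makes possible.

```javascript
// Without access control: any code can overwrite the property.
function OpenPart(partNum) {
  this.partNum = partNum;
}
const openPart = new OpenPart(42);
openPart.partNum = "oops"; // nothing stops this

// With a closure: partNum lives in the constructor's scope, and only the
// methods created inside the constructor can read or change it.
function ClosedPart(partNum) {
  this.getPartNum = function () {
    return partNum;
  };
  this.setPartNum = function (value) {
    if (!Number.isInteger(value)) {
      throw new TypeError("partNum must be an integer");
    }
    partNum = value;
  };
}
const closedPart = new ClosedPart(42);
console.log(closedPart.partNum);      // undefined: no such public property
console.log(closedPart.getPartNum()); // 42
```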

System.out.println(“Hello World!”)

So I've been getting used to a couple of new languages lately, mostly Ruby and Java but also Python and C++ for comparison. For someone coming from PHP, VBScript and JavaScript, it has been confusing as hell.