Semi-disconnected this weekend

Shortly after I publish this blog post, I will disassemble the computer I’m using to write it and pack it all up.

I’ll probably have WiFi until tomorrow, but the best ways to get in touch with me are Twitter and personal email, or phone/text message. I’ll try to check Mozilla email but don’t expect a response before Monday.

Next week I’ll be in New York!

Moving to New York

It’s summer, and you know what that means: big life change blog post!

This one isn’t quite as big. I love the work I’m doing at Mozilla and am not looking to make a change there. But I am moving across the country. Again.

In July, a week after the Mozilla Summit, I’ll be picking up and making the move to New York.

I will miss the Mozilla office. As I deliberated and thought about this decision, that was the colossal “pro” for staying in the Bay Area. I’ve made a number of friends here, even though I haven’t done a good enough job of getting to know people. Feel free to think back on your favorite, relevant Bilbo Baggins quote.

I’ll also miss the fantastic working environment Mozilla HQ offers. I joked about having 20 people in the same room talking to each other on IRC, but the truth is it’s a wonderfully collaborative environment, and having people nearby to talk through a problem will be hard to replace.

Seriously, it’s awesome.

But the East Coast, and New York in particular, has my family. And my family can take Mozilla HQ in a fight.

It’s not just about family, but family by itself is enough to sway me. It’s family and a bucket of little-to-medium things. The more I thought about it, the more of those piled up on the New York side (weather and climate, time zone, culture, music, night life, urban personality, getting rid of my car, and so on): all the little things that add up to quality of life.

(“Weather and climate?” I hear you inquire. Yes, weather. I’m a northerner. I grew up with 4 seasons—they weren’t all 3 months but they were all there—and they mark time for me. And frankly, I miss weather. The Valley has a climate, but it doesn’t really have “weather,” not in the sense I know.)

I don’t know exactly where I’ll be yet. Definitely Manhattan or Brooklyn, but I’m looking at a number of different neighborhoods, from the Upper West Side to Murray Hill to Fort Greene.

Fortunately—obviously, I suppose—I have family I can stay with for a bit while I find a place. If anyone has realtor recommendations, I’m very interested!

New York is home. It’s where I was born. It’s where my father was born. It’s the best city in the world—Paris is a surprisingly close second—and it’s where I want to be.

See the form below?

Where’s home for you?

Weekly Update for 21/06/2010

Happy Summer!

Last week saw some good work done on the Support Questions for 2.2, and we now believe we’ve ironed out the issues that we saw during the 2.1 push—one of them was actually a Django bug we fixed by upgrading.

Last week

  • Helped come up with a solution to the issues with 2.1.
  • Reviewed some UI work on 2.2.
  • Worked with IT to get replication in staging.
  • Got a plan together for a better staging environment early next quarter.
  • Helped Josh with a lightning talk topic for the Summit.
  • Worked on a plan for Q3, and goals with the team.
  • Worked with our stats provider.
  • Pushed out 1.5.4.2 (née 1.5.5.1), a minor update.

This week (me)

  • Get 2.1 out the door and 2.2 on staging.
  • Finish out a Q3 plan after working with the SUMO team.
  • Finish working with our stats provider.
  • Dig into some proper 2.2 bugs.
  • Focus more on the message I’m sending.
  • I’d really like to dig into Fantasai’s question about ellipses some more.

This week (team)

  • Push 2.1.
  • Finish the AAQ form.
  • Finish the answering process.
  • Demo the normal beginning-to-end workflow on staging.

Developing at Scale: Database Replication

When a website is small (like this one, for example), usually the entire thing, from the web server to the database, can live on a single server. Even a single virtual server. One of the first things that happens when a website gets bigger is that this is no longer true.

One reason is load. A popular website will simply require more than a single server, virtual or otherwise, can give, and the only way to keep scaling is to add more servers. For example, the server may run out of available Apache connections, and the limit cannot be raised without negatively impacting performance.

Another reason is downtime. If a website is served from a single server, and that server goes down for any reason, planned or otherwise, then the website is down. At some point, downtime is essentially unacceptable—just ask Twitter—and redundancy is required.

Enter Replication

A common response is to set up database replication, where one database server operates as a “master,” and one or more other servers operate as “slaves.” In this setup, all of your writes to the database will go to the master, then “replicate” to the slaves, and all or most of the reads will come from the slaves. (Note that each slave has to apply every replicated write in addition to serving its share of the reads: slaves are not a good place to recycle sub-par hardware.)
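The routing rule described above can be sketched in a few lines. This is only an illustration, not how any particular framework does it: the class name, method names, and the database aliases are all made up for the example, and a real application would hand the chosen alias to whatever database layer it uses.

```python
import random


class ReplicaRouter:
    """Hypothetical router: writes go to the master, reads go to a slave."""

    def __init__(self, master, slaves):
        self.master = master          # alias of the master database
        self.slaves = list(slaves)    # aliases of the slave databases

    def db_for_write(self):
        # Every write goes to the master, which then replicates it.
        return self.master

    def db_for_read(self):
        # With no slaves configured, the master has to serve reads too.
        if not self.slaves:
            return self.master
        # Otherwise spread reads across the slaves at random.
        return random.choice(self.slaves)


router = ReplicaRouter("master", ["slave1", "slave2"])
print(router.db_for_write())
print(router.db_for_read())
```

Picking a slave at random is the simplest possible policy. It is good enough to show the idea, but it does nothing about slaves that are down or badly lagged.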

Replication introduces a new type of problem: if you naively send all reads to the slaves, then data you just wrote may not be there yet.

La…wait for it…g

Even if the master and slave are sitting next to each other with a cable connecting them, replication will probably take more time than your code does to reach the next step. At a minimum, you need to assume that replication lag will be hundreds of milliseconds, an eternity when the time from one line in your web app to the next is measured in micro- or nanoseconds. In the real world, replication may well take seconds, especially if your master and slaves are not physically next to each other.

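The result is that the guarantees you are used to from a single ACID database no longer hold across the whole setup. The write itself is durable on the master, but a read from a slave is not guaranteed to see it yet. You cannot simply write data and immediately rely on its existence.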

For example, say you have a large discussion forum. If you naively send all reads to the slaves, then someone’s post may take seconds to appear on the site. This is a problem if you’re trying to show a user their post immediately after posting it.

Smarter Reading

The solution is to occasionally read from the master. When you need to access data that was just written, it is probably only available on the master, so that’s where you’ll read it. Within a single HTTP request, this is fairly simple: just force any queries that rely on recently-written data to the master.
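Within a single request, one way to do this is to "pin" the request to the master as soon as it writes anything. The sketch below assumes one thread per request and uses a thread-local flag. The function names and the "master"/"slave" aliases are invented for the example, and a real app would call unpin() at the start of every request.

```python
import threading

# Per-thread state, assuming each request is handled by one thread.
_state = threading.local()


def unpin():
    """Reset the flag; call this at the start of each request."""
    _state.pinned = False


def is_pinned():
    return getattr(_state, "pinned", False)


def db_for_write(master="master"):
    # A write means later reads in this request may need fresh data,
    # so pin the rest of the request to the master.
    _state.pinned = True
    return master


def db_for_read(master="master", slave="slave"):
    # Pinned requests read from the master; everything else uses a slave.
    return master if is_pinned() else slave


unpin()
print(db_for_read())    # before any write in this request
db_for_write()
print(db_for_read())    # after a write in this request
```

Pinning the whole remainder of the request is a blunt instrument. It sends a few more reads to the master than strictly necessary, but it avoids having to reason about which individual queries depend on the data that was just written.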

Outside of a single HTTP request, this is slightly more complex. If you’re following the practice of redirecting after a POST request to a GET request (which you should), then creating a new forum post and viewing it will happen in two different HTTP requests.

One way around this is to set a very short-lived cookie that tells your web app to continue reading from the master. If any write occurs in a request, the response should include this cookie. The exact time-to-live will depend on how long your replication lag usually is: cover the average lag plus at least 4 or 5 standard deviations. Any request that has this cookie should honor it by reading only from the master.
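Here is a rough sketch of the cookie approach using only the standard library. The cookie name, the function names, and the sample lag numbers are all assumptions made up for illustration. The time-to-live is taken as the mean of some measured lag samples plus 5 standard deviations, rounded up to a whole second.

```python
import math
import statistics

PIN_COOKIE = "read_from_master"  # hypothetical cookie name


def cookie_ttl_seconds(lag_samples, sigmas=5):
    """Mean lag plus `sigmas` standard deviations, rounded up, at least 1s."""
    mean = statistics.mean(lag_samples)
    spread = statistics.pstdev(lag_samples)
    return max(1, math.ceil(mean + sigmas * spread))


def pin_cookie_header(wrote_this_request, ttl_seconds):
    """Return a Set-Cookie value if the request wrote anything, else None."""
    if not wrote_this_request:
        return None
    return f"{PIN_COOKIE}=1; Max-Age={ttl_seconds}; Path=/; HttpOnly"


def should_read_from_master(request_cookies):
    """Honor the cookie: any request carrying it reads only from the master."""
    return request_cookies.get(PIN_COOKIE) == "1"


# Made-up lag measurements, in seconds.
ttl = cookie_ttl_seconds([2.0, 4.0])
print(pin_cookie_header(True, ttl))
print(should_read_from_master({PIN_COOKIE: "1"}))
```

Because the browser drops the cookie once Max-Age expires, requests fall back to the slaves on their own, and nothing on the server has to remember who wrote recently.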

A Pitch

One of the hardest things for new web developers is developing large-scale applications: first, you need a large-scale application! Setting up database replication is a huge pain, and if your site isn’t getting enough traffic, it’s not worth it.

Mozilla is one way aspiring web developers can get some experience working with large-scale web apps. All of our web apps are open source and open to contributions from community members. To get involved, stop by #webdev in IRC!

How Hulu Should Use My Data

It’s always been a little strange to me that Hulu has profiles. I suppose they’re for people who interact via the comments on videos, but the profiles seem so bland and token. It’s as if someone remembered to add them right before they shipped and then forgot about them.

Specifically, the part of Hulu profiles I’m curious about is the “Favorite TV shows” and “Favorite Movies” boxes.

Hulu: don’t you know what my favorite movies and TV shows are?

Hulu should tell me my favorites. Do I ostensibly love The Simpsons but never watch it? (Yes.) Do I watch re-runs of pseudo-crime dramas like Lie To Me whenever they’re available? (Yes.) Have I watched every vampire-related bit of video available? (No. Only Buffy.)

Hulu already does recommendations, but they are surprisingly easy to miss, and seem to be based on watching one particular show or another. They’re not terrible, but when Netflix is willing to spend a million dollars to get some improvement in its recommendation engine, it’s obvious that people are looking for a little more than Hulu is giving at the moment.

I would like to see Hulu become the Last.fm of TV and film: teach me about myself, and bring that data and recommendations front and center. Put in a “Recommended Channel” and just show me stuff you think I’ll like, based not on a single movie or show, but the whole package, and what other people with similar taste also like. Show me those other people, too, and let me watch a “Neighborhood Channel.”

Go a step further and tell me what other people in my physical area are watching. You ask for my ZIP code: use it for more than marketing stats. If all my neighbors or coworkers are watching a show, maybe it’s worth checking out.

If there are two things people like, it’s being told random facts about themselves, and being told what else to like. Hulu should leverage that.