SUMO in Q2

At the end of 2010, I issued a challenge to my team: deploy support.mozilla.com continuously by the end of 2011. So, as we move into the last part of Q1, how are we doing, and what’s next?

So Far

This quarter we’ve managed to completely break free from our old svn repository. All our code is in git now. This has simplified our deployments significantly. We’ve also moved a bunch of code from difficult-to-maintain RewriteRules into the product and made it easier to manage.

We’ve still got some work to do on JavaScript unit tests and on moving our crontab into version control. Both are on track for this quarter; we just have to make sure we make the time.

Overall, I think we’ve done pretty well so far this quarter. I would give us a B, and we can still get to an A if we can tackle JS tests and crontabs.

Coming Up

The two things I’d like to see next are faster release cycles and a more continuous-deployment-friendly (CD-friendly) way of building things. Each helps us address a different challenge.

Faster release cycles will get us thinking about managing time and planning work when releases aren’t centered around a big piece of functionality. Instead of starting with a big thing and picking a set of little things to lump in with it, we’ll need to balance fixing little things with work on bigger features.

By the beginning of June, I want us to release every week, whether there’s something big or not.

CD also presents a new challenge for big features: how do you deal with code that isn’t completely ready yet?

There are a couple of ways of handling this. One is to use longer-lived feature branches. That works pretty well for desktop software like Firefox where you can hand someone a binary and say “test this.” But it’s a challenge with a web app because, unless we move our staging environments around all the time, it blocks QA from testing something until it’s not only ready but, in fact, already on production.

A better solution is to use feature flags to hide new functionality until it’s ready. In this model, everything lives on the same branch (you still use feature branches for bug fixes and review, and for the times something does need to live a little longer before merging). All staging servers, and eventually production, run the same code, but with different sets of flags turned on. You can turn a feature on in production after it’s been verified on stage.
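To make the model concrete, here is a minimal sketch in plain Python. It is not Waffle’s API: `FlagStore`, `render_question_page` and the `new-answer-editor` flag are hypothetical names used only for illustration. The point it shows is that stage and production run identical code and differ only in which flags are on.

```python
from dataclasses import dataclass, field


@dataclass
class FlagStore:
    """Hypothetical per-environment flag store (not Waffle's API)."""

    enabled: set = field(default_factory=set)

    def is_active(self, name: str) -> bool:
        # Unknown flags default to off, so unfinished features stay hidden.
        return name in self.enabled

    def turn_on(self, name: str) -> None:
        self.enabled.add(name)


def render_question_page(flags: FlagStore) -> str:
    # 'new-answer-editor' is a hypothetical feature used for illustration.
    if flags.is_active("new-answer-editor"):
        return "question page with new editor"
    return "question page with old editor"


# The same code is deployed everywhere; only the flag sets differ.
stage = FlagStore({"new-answer-editor"})
production = FlagStore()
```

Once QA has verified the feature on stage, calling `production.turn_on("new-answer-editor")` enables it in production without another deployment. Later, removing the flag means deleting the `if` branch and the old code path.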

This isn’t easier. It means structuring code differently and thinking about a new set of constraints. But it’s what we should do.

We will screw up, probably a couple of times, before we get this down. That’s why we should start now, when our mistakes will be confined to staging servers.

In Q2 I want us to stop using the ‘next’ branch and start using Waffle to control features until they’re ready.

Not everything needs to hide behind a flag: small bug fixes obviously don’t. This may be giving us a window into what QA is like under CD: big things are manually tested before we turn them on, but little fixes, already verified by the developer and reviewer, make it through to production without manual intervention from QA.

We also need to start planning time to remove flags and dead code once a feature has shipped. That is part of the development cost of a feature, and will hopefully be offset by reducing the time it takes to get something to production once it is ready.

Big Changes

While Q1 has been about a larger set of smaller, more concrete and isolated changes, Q2 is going to be about a smaller set of much bigger changes.

Q1 was about changing our code. Q2 is about changing our thinking.

I’m excited about diving into these challenges. I can’t wait for Q2.