
Posts Tagged ‘sumo’

Acronyms you should know: MTTD and MTTR

10 May

If you’re a SUMO contributor, there are two acronyms you will start to hear more often from us developers: MTTD and MTTR.

They mean “mean time to detect” and “mean time to resolve,” respectively, and they refer to how long it takes to detect an issue in production, and how long it takes to resolve that issue once it’s detected.
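To make the two definitions concrete, here is a minimal sketch of how they could be computed from a list of incidents. The `Incident` fields and the sample timestamps are invented for illustration; they are not real SUMO data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical fields, named only for this example.
    introduced: datetime  # when the problem reached production
    detected: datetime    # when someone noticed it
    resolved: datetime    # when the fix was pushed


def mean_delta(deltas):
    """Average a non-empty list of timedeltas."""
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents):
    """Mean time to detect: from introduction to detection."""
    return mean_delta([i.detected - i.introduced for i in incidents])


def mttr(incidents):
    """Mean time to resolve: from detection to resolution."""
    return mean_delta([i.resolved - i.detected for i in incidents])


# Made-up sample incidents.
incidents = [
    Incident(datetime(2011, 5, 1, 9, 0), datetime(2011, 5, 1, 10, 0),
             datetime(2011, 5, 1, 11, 30)),
    Incident(datetime(2011, 5, 2, 9, 0), datetime(2011, 5, 2, 12, 0),
             datetime(2011, 5, 2, 13, 0)),
]

print(mttd(incidents))  # 2:00:00
print(mttr(incidents))  # 1:15:00
```

Note that the MTTR clock here starts at detection, not at introduction, which matches the "once it's detected" wording above.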

As we move toward continuous deployment, these are two of the metrics we’ll be using to gauge the effectiveness of our tools and processes.

For major production issues, our MTTR is actually fairly good right now: if something cannot wait until the next scheduled release, it takes us 60-90 minutes from becoming aware of an issue to pushing a fix. I think we can bring that down with better release processes, but we’re starting from a pretty good place, which is great.

Our MTTD, on the other hand, needs work. SUMO 2.8.1 upgraded Django and included a sweeping change to our CSRF protection, which necessarily affected every form on the site. We discovered three related issues that warranted immediate hotfixes, but two of them went unnoticed for almost two days, until our contributors brought them to our attention.

It’s great that our contributors pointed out these issues to us. Our community is a critical part of “detection” and I want to encourage everyone to point out issues in the forums or IRC. It’s extremely helpful!

But there are things we can do to notice problems faster, too. One thing we are working to add is business metric graphs. We have useful data in Ganglia right now, but we will be using Graphite and Etsy’s StatsD to peer into what our users are doing. If we deploy a change and notice that no one is previewing articles, for example, we know immediately that we have an issue and can start diagnosing and fixing it.
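As a rough sketch of what recording one of those business metrics might look like, the snippet below sends a StatsD counter increment over UDP using only the standard library. The metric name `sumo.wiki.preview`, the host, and the helper names are assumptions for illustration; port 8125 is the usual StatsD default. In practice a client library would probably do this for us.

```python
import socket


def format_counter(name, value=1):
    """Build a StatsD counter packet of the form '<name>:<value>|c'."""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name, value=1, host="127.0.0.1", port=8125):
    """Fire-and-forget a counter increment over UDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(format_counter(name, value), (host, port))
    finally:
        sock.close()


# Hypothetical metric name; this would be called wherever a preview is rendered.
send_counter("sumo.wiki.preview")
```

Because UDP is fire-and-forget, a missing or slow metrics server does not slow down the request that records the metric. A preview counter that drops to zero right after a deploy is exactly the kind of signal the graphs are meant to surface.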

If you follow SUMO development, you’ll hear us use terms like MTTD, MTTR, and “detection” more often, and talk about how to reduce those times. We welcome your input and ideas as we start working on these challenges. And of course, keep telling us when things are broken!

 

Posted in Articles

 

A brief SumoDev update

12 Mar

A little while ago, I said that I thought we got a B in Q1, but we could move up to an A with a little more work. (This is my favorite grading system: everyone starts at 0 and works up.)

Well, we landed two things:

I said these two things would bring us up to an A, so, way to go team!

We entered this quarter with 5 goals around Continuous Deployment. We’ve hit three. The other two were stretch goals, and it looks like we’ll miss them. It’s been a particularly busy quarter for IT, and there’s just a bunch of work left to get the JS tests into CI. We’ll carry those forward into Q2 along with our goals of releasing every week and dropping the ‘next’ branch.

I’m really proud of my team and the work we’ve done this quarter. We’ve not only done great work improving the user experience across the site—especially for mobile users—we’ve also made significant progress toward simplifying and streamlining our releases, which will be crucial to CD.

 

 

Weekly Update for 11/3/11

11 Mar

Been a busy week!

Now to start a busy weekend!

 

 

SUMO in Q2

02 Mar

At the end of 2010, I issued a challenge to my team: deploy support.mozilla.com continuously by the end of 2011. So, as we move into the last part of Q1, how are we doing, and what’s next?

So Far

This quarter we’ve managed to completely break free from our old svn repository. All our code is in git now. This has simplified our deployments significantly. We’ve also moved a bunch of code from difficult-to-maintain RewriteRules into the product and made it easier to manage.

We’ve still got some work to do on JavaScript unit tests and moving our crontab into version control, but both are on track for this quarter; we just have to make sure we make the time.

Overall, I think we’ve done pretty well so far this quarter. I would give us a B, and we can still get to an A if we can tackle JS tests and crontabs.

Coming Up

The two things I’d like to see next are faster release cycles and a more CD-friendly way of building things. Each helps address a different challenge.

Faster release cycles will get us thinking about managing time and planning work when releases aren’t centered around a big piece of functionality. Instead of starting with a big thing and picking a set of little things to lump in with it, we’ll need to balance fixing little things with work on bigger features.

By the beginning of June, I want us to release every week, whether there’s something big or not.

CD also presents a new challenge for big features: how do you deal with code that isn’t completely ready yet?

There are a couple of ways of handling this. One is to use longer-lived feature branches. That works pretty well for desktop software like Firefox where you can hand someone a binary and say “test this.” But it’s a challenge with a web app because, unless we move our staging environments around all the time, it blocks QA from testing something until it’s not only ready but, in fact, already on production.

A better solution is to use feature flags to hide new functionality until it’s ready. In this model, everything lives on the same branch (you still do feature branches for bug fix/review and for those times something does need to live a little longer before merging) and all staging servers—and eventually production—run the same code, but with a different set of flags turned on. You can turn a feature on in production after it’s been verified on stage.
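Here is a minimal sketch of that model in plain Python, assuming a hypothetical `new-search` feature and a hard-coded `FLAGS` table. This is not Waffle’s API, just the underlying idea: one code path, with the flag set deciding which branch runs in each environment.

```python
# Hypothetical per-environment flag sets: same code everywhere, different flags on.
FLAGS = {
    "stage": {"new-search": True},
    "production": {"new-search": False},
}


def flag_is_active(environment, name):
    """Unknown flags and unknown environments default to off."""
    return FLAGS.get(environment, {}).get(name, False)


def search(environment, query):
    """Both implementations ship together; the flag picks one at runtime."""
    if flag_is_active(environment, "new-search"):
        return f"new results for {query}"
    return f"old results for {query}"


print(search("stage", "bookmarks"))       # new results for bookmarks
print(search("production", "bookmarks"))  # old results for bookmarks
```

Defaulting unknown flags to off is deliberate: unfinished code stays hidden unless someone explicitly turns it on.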

This isn’t easier. It means structuring code differently, thinking about a new set of constraints. It’s what we should do.

We will screw up, probably a couple of times, before we get this down. That’s why we should start now, when our mistakes will be confined to staging servers.

In Q2 I want us to stop using the ‘next’ branch and start using Waffle to control features until they’re ready.

Not everything needs to hide behind a flag: small bug fixes obviously don’t. This may be giving us a window into what QA is like under CD: big things are manually tested before we turn them on, but little fixes, already verified by the developer and reviewer, make it through to production without manual intervention from QA.

We also need to start planning time to remove flags and dead code once a feature has shipped. That is part of the development cost of a feature, and will hopefully be offset by reducing the time it takes to get something to production once it is ready.

Big Changes

While Q1 has been about a larger set of smaller, more concrete and isolated changes, Q2 is going to be about a smaller set of much bigger changes.

Q1 was about changing our code. Q2 is about changing our thinking.

I’m excited about diving into these challenges. I can’t wait for Q2.

 

 

The future of SUMO development

27 Dec

Just this month, the SUMO development team completed our transition to our new platform, Kitsune. This small release represented the culmination of nearly a year of great work, and I couldn’t be prouder of the team.

At first, as the end of the tunnel approached, and we weren’t sure what we’d be doing next, I felt a little like Inigo Montoya: I’d been in the rewrite business so long, now that it’s over…

2010 was a year of investment in our code, and in our infrastructure, both hardware and software. We moved our code from svn to git; started using Hudson to run our comprehensive unit test suite; centralized our configuration to simplify deployment. Now it’s time to start taking advantage of that investment.

In that spirit, I want to issue a John Kennedy-style challenge: by this time next year, I want SUMO to deploy continuously.

There is a lot of work to do to get there.

  • Pushing code to production needs to be automated, for all but the biggest, downtime-requiring changes.
  • We need to expand automated test coverage to include our front-end and JavaScript code.
  • Our code reviews and standards will need to be even higher.
  • We’ll need to redefine the relationship between development and QA.
  • When there are problems, we need the agility to respond quickly and the focus to learn from them and improve.
  • We’ll need to be able to dark launch and flip on features, preferably with the flexibility to test with small groups.
  • We’ll need to reevaluate our branch management and staging environments.
  • We’ll need to rethink how we organize and prioritize work.
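For the dark-launch item in the list above, one common way to test with a small group is to bucket users by a stable hash and enable the feature for a percentage of buckets. This is only a sketch of the general idea; the function names and the `flag:user_id` key format are assumptions, not something we have built.

```python
import hashlib


def bucket(flag, user_id):
    """Map a (flag, user) pair to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag, user_id, percent):
    """True for roughly `percent` percent of users, and always the same users."""
    return bucket(flag, user_id) < percent


# Hypothetical flag name: count how many of 1,000 users fall in a 10% rollout.
enabled = sum(is_enabled("new-search", uid, 10) for uid in range(1000))
print(f"{enabled} of 1000 users see the feature")
```

Hashing the flag name together with the user ID means a given user keeps the same answer across requests, while different flags pick different groups of users.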

And unlike 2010, we’ll have to make these investments and improvements while maintaining a fast-paced development schedule.

This is not something the SUMOdev team can do alone: this is a challenge for us, for our ops team, and for QA as well. I look forward to working more closely with these teams as we chase this target.

Continuous deployment will bring a number of benefits—many of the requirements I just listed are benefits and solid goals by themselves—especially for contributors.

  • Bugs are fixed for everyone as soon as they’re fixed for us.
  • We can respond to issues faster.
  • Code will be tested at production scale.
  • Our processes will continually improve.
  • We’ll reduce the load on individuals from IT and QA.

I call this a challenge because it will not be easy: it will be hard. It will push the bounds of our experience and our ingenuity. SUMO will be better, and we will be better.

How do we get there? Frankly, I don’t know yet. There are some clear, actionable items, like front-end testing and feature flags, and there is some brainstorming to do. There will be unanticipated hurdles to overcome and we will almost certainly make some missteps, and we will have to work around those.

In 2010, we pushed ourselves in terms of how much we could get done, and while we accomplished an incredible amount, that’s not a healthy pace to set for another year. In 2011, I want us to push ourselves in terms of what we can do, and how we do it.

2011 will be a great year for SUMOdev. And with this challenge, I just want to say: Game On.

 