<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Coffee on the Keyboard &#187; sumo</title>
	<atom:link href="http://coffeeonthekeyboard.com/tag/sumo/feed/" rel="self" type="application/rss+xml" />
	<link>http://coffeeonthekeyboard.com</link>
	<description>by James Socol</description>
	<lastBuildDate>Mon, 06 Feb 2012 23:33:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/>
		<item>
		<title>Acronyms you should know: MTTD and MTTR</title>
		<link>http://coffeeonthekeyboard.com/acronyms-you-should-know-mttd-and-mttr-597/</link>
		<comments>http://coffeeonthekeyboard.com/acronyms-you-should-know-mttd-and-mttr-597/#comments</comments>
		<pubDate>Tue, 10 May 2011 19:32:47 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[continuous deployment]]></category>
		<category><![CDATA[mozilla]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[sumo]]></category>
		<category><![CDATA[webdev]]></category>

		<guid isPermaLink="false">http://coffeeonthekeyboard.com/?p=597</guid>
		<description><![CDATA[If you&#8217;re a SUMO contributor, there are two acronyms you will start to hear more often from us developers: MTTD and MTTR. They mean &#8220;mean time to detect&#8221; and &#8220;mean time to resolve,&#8221; respectively, and they refer to how long it takes to detect an issue in production, and how long it takes to resolve that [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;re a <a href="https://support.mozilla.com/">SUMO</a> contributor, there are two acronyms you will start to hear more often from us developers: <strong>MTTD</strong> and <strong>MTTR</strong>.</p>
<p>They mean &#8220;<strong>mean time to detect</strong>&#8221; and &#8220;<strong>mean time to resolve</strong>,&#8221; respectively, and they refer to how long it takes to <strong>detect</strong> an issue in production, and how long it takes to <strong>resolve</strong> that issue once it&#8217;s detected.</p>
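<p>As a rough illustration of how the two numbers are computed, here is a minimal Python sketch. The incident records and the field names (<code>introduced</code>, <code>detected</code>, <code>resolved</code>) are made up for the example and are not taken from our bug tracker.</p>
<pre><code>from datetime import datetime, timedelta

# Hypothetical incident log: when the bug shipped, when we noticed it,
# and when the fix went out.
INCIDENTS = [
    {"introduced": datetime(2011, 5, 2, 10, 0),
     "detected": datetime(2011, 5, 2, 10, 30),
     "resolved": datetime(2011, 5, 2, 12, 0)},
    {"introduced": datetime(2011, 5, 2, 10, 0),
     "detected": datetime(2011, 5, 4, 9, 0),
     "resolved": datetime(2011, 5, 4, 10, 0)},
]


def mean_delta(incidents, start_key, end_key):
    """Average the time between two timestamps across all incidents."""
    total = sum((i[end_key] - i[start_key] for i in incidents), timedelta())
    return total / len(incidents)


def mttd(incidents):
    """Mean time to detect: from shipping the bug to noticing it."""
    return mean_delta(incidents, "introduced", "detected")


def mttr(incidents):
    """Mean time to resolve: from noticing the bug to pushing the fix."""
    return mean_delta(incidents, "detected", "resolved")
</code></pre>
<p>With these two made-up incidents, one slow detection dominates the MTTD average even though both fixes shipped quickly once someone noticed.</p>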
<p>As we move toward <strong><a href="http://coffeeonthekeyboard.com/the-future-of-sumo-development-511/">continuous deployment</a></strong>, these are two of the metrics we&#8217;ll be using to gauge the effectiveness of our tools and processes.</p>
<p>For major production issues, <strong>our MTTR is actually fairly good</strong> right now—if it&#8217;s something that cannot wait until the next scheduled release, it takes us 60-90 minutes from becoming aware of an issue to pushing a fix. I think we can do better with better release processes, but we&#8217;re starting off pretty good and going to <strong>get better</strong>, which is great.</p>
<p>Our <strong>MTTD</strong>, on the other hand, needs work. <a href="http://moxie.jamessocol.com/bugstats/sumo/2.8.1">SUMO 2.8.1</a> upgraded <a href="http://www.djangoproject.com/">Django</a> and included a sweeping change to our CSRF protection, which necessarily affected every form on the site. We discovered three related issues that warranted immediate hotfixes, but two of them went undetected for <strong>almost two days</strong>, until our contributors brought them to our attention.</p>
<p>It&#8217;s <strong>great</strong> that our contributors pointed out these issues to us. Our <strong>community is a critical part</strong> of &#8220;detection&#8221; and I want to encourage everyone to point out issues in the <a href="https://support.mozilla.com/forums/contributors">forums</a> or <a href="irc://irc.mozilla.org/sumodev">IRC</a>. It&#8217;s extremely helpful!</p>
<p>But there are things we can do, too, to <strong>notice things faster</strong>. One thing we are working to add is <a href="http://codeascraft.etsy.com/2010/12/08/track-every-release/"><strong>business metric graphs</strong></a>. We have useful data in <a href="http://ganglia.sourceforge.net/">Ganglia</a> right now, but we will be using <a href="http://graphite.wikidot.com/">Graphite</a> and <a href="http://codeascraft.etsy.com/">Etsy</a>&#8217;s <a href="https://github.com/etsy/statsd">StatsD</a> to peer into <strong>what our users are doing</strong>. If we deploy a change and notice that no one is <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=654827">previewing articles</a>, for example, we know immediately that we have an issue and can <strong>start diagnosing and fixing it</strong>.</p>
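<p>To give a feel for what counting a business metric involves, here is a minimal sketch that sends a StatsD counter over UDP using only the Python standard library. The metric name <code>sumo.article.preview</code>, the <code>preview_article</code> helper, and the host are assumptions for illustration; 8125 is the port StatsD listens on by default. This is not the client code we run.</p>
<pre><code>import socket


def statsd_counter(name, value=1):
    """Build a StatsD counter datagram such as b"sumo.article.preview:1|c"."""
    return "{0}:{1}|c".format(name, value).encode("ascii")


def send_metric(packet, host="127.0.0.1", port=8125):
    """Fire-and-forget over UDP: a missing StatsD daemon never breaks a request."""
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    except OSError:
        return  # metrics are best-effort
    try:
        sock.sendto(packet, (host, port))
    except OSError:
        pass  # still best-effort
    finally:
        sock.close()


def preview_article(text):
    """Hypothetical view helper: count every preview, then do the real work."""
    send_metric(statsd_counter("sumo.article.preview"))
    return text.strip()
</code></pre>
<p>Because the counter is incremented on every preview, a graph of it that drops to zero right after a deploy is exactly the kind of signal described above.</p>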
<p>If you follow SUMO development, you&#8217;ll hear us using terms like MTTD, MTTR, and &#8220;detection&#8221; more often, and talking about how to reduce those times. We welcome your input and ideas as we start working on these challenges. And of course, keep telling us when things are broken!</p>
]]></content:encoded>
			<wfw:commentRss>http://coffeeonthekeyboard.com/acronyms-you-should-know-mttd-and-mttr-597/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A brief SumoDev update</title>
		<link>http://coffeeonthekeyboard.com/a-brief-sumodev-update-578/</link>
		<comments>http://coffeeonthekeyboard.com/a-brief-sumodev-update-578/#comments</comments>
		<pubDate>Sat, 12 Mar 2011 18:54:52 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[continuous deployment]]></category>
		<category><![CDATA[mozilla]]></category>
		<category><![CDATA[sumo]]></category>
		<category><![CDATA[webdev]]></category>

		<guid isPermaLink="false">http://coffeeonthekeyboard.com/?p=578</guid>
		<description><![CDATA[A little while ago, I said that I thought we got a B in Q1, but we could move up to an A with a little more work. (This is my favorite grading system: everyone starts at 0 and works up.) Well, we landed two things: Initial JavaScript tests. (And for showfor.) Stuck a crontab [...]]]></description>
			<content:encoded><![CDATA[<p>A <a title="SUMO in Q2" href="http://coffeeonthekeyboard.com/sumo-in-q2-563/">little while ago</a>, I said that I thought we got a B in Q1, but we could move up to an A with a little more work. (This is my favorite grading system: everyone starts at 0 and works up.)</p>
<p>Well, we landed two things:</p>
<ul>
<li>Initial <a href="https://github.com/jsocol/kitsune/commit/1b03a4de5">JavaScript tests</a>. (And for <a href="https://github.com/jsocol/kitsune/commit/c28ee79bfc0">showfor</a>.)</li>
<li>Stuck a <a href="https://github.com/jsocol/kitsune/commit/bb98aede2">crontab</a> in git.</li>
</ul>
<p>I said these two things would bring us up to an A, so, way to go team!</p>
<p>We entered this quarter with 5 goals around <a href="https://wiki.mozilla.org/Support:Sumodev/Continuous_Deployment#Q1_2011">Continuous Deployment</a>. We&#8217;ve hit three. The other two were stretch goals, and it looks like we&#8217;ll miss them. It&#8217;s been a particularly busy quarter for IT, and there&#8217;s just a bunch of work left to get the JS tests into CI. We&#8217;ll carry those forward into Q2 along with our goals of releasing every week and dropping the &#8216;next&#8217; branch.</p>
<p>I&#8217;m really proud of my team and the work we&#8217;ve done this quarter. We&#8217;ve not only done great work improving the user experience across the site—especially for mobile users—we&#8217;ve also made significant progress toward simplifying and streamlining our releases, which will be crucial to CD.</p>
]]></content:encoded>
			<wfw:commentRss>http://coffeeonthekeyboard.com/a-brief-sumodev-update-578/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weekly Update for 11/3/11</title>
		<link>http://coffeeonthekeyboard.com/weekly-update-for-11311-575/</link>
		<comments>http://coffeeonthekeyboard.com/weekly-update-for-11311-575/#comments</comments>
		<pubDate>Fri, 11 Mar 2011 23:55:58 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[mozilla]]></category>
		<category><![CDATA[node python]]></category>
		<category><![CDATA[sumo]]></category>
		<category><![CDATA[webdev]]></category>
		<category><![CDATA[weekly update]]></category>

		<guid isPermaLink="false">http://coffeeonthekeyboard.com/?p=575</guid>
		<description><![CDATA[Been a busy week! Helped run down an issue with our ads on Reddit. Updated django-multidb-router. Learned a little about ContextDecorator and how to do that in Python 2.6. Shipped SUMO 2.6.1. Wrapped up SUMO 2.6.2, ready to go this weekend. Rolling through activity logs in SUMO 2.7. Building models. Reviewing code. Did some work [...]]]></description>
			<content:encoded><![CDATA[<p>Been a busy week!</p>
<ul>
<li>Helped run down an issue with our ads on Reddit.</li>
<li>Updated <a href="https://github.com/jbalogh/django-multidb-router">django-multidb-router</a>.
<ul>
<li>Learned a little about <a href="http://docs.python.org/dev/whatsnew/3.2.html#contextlib">ContextDecorator</a> and how to do that in Python 2.6.</li>
</ul>
</li>
<li>Shipped <a href="http://moxie.jamessocol.com/bugstats/sumo/2.6.1">SUMO 2.6.1</a>.</li>
<li>Wrapped up <a href="http://moxie.jamessocol.com/bugstats/sumo/2.6.2">SUMO 2.6.2</a>, ready to go this weekend.</li>
<li>Rolling through activity logs in <a href="http://moxie.jamessocol.com/bugstats/sumo/2.7">SUMO 2.7</a>.
<ul>
<li>Building models.</li>
<li>Reviewing code.</li>
</ul>
</li>
<li>Did some work on <a href="http://rasputinproject.org">Rasputin</a>:
<ul>
<li>Github let me have the URL <a href="https://github.com/rasputin">github.com/rasputin</a>!</li>
<li><a href="https://github.com/rasputin/rasputin-node">rasputin-node</a> is way more node-like.
<ul>
<li>I could really use input on the <a href="https://github.com/rasputin/rasputin-node/tree/amqp">AMQPBackend</a>.</li>
</ul>
</li>
<li>Added better threaded and multiprocess dispatchers to <a href="https://github.com/rasputin/rasputin">rasputin for Django</a>.
<ul>
<li>I think I need to punt on the logging module and go with <a href="https://github.com/mitsuhiko/logbook">Logbook</a>.</li>
<li>Want to get an AMQPBackend started.</li>
</ul>
</li>
</ul>
</li>
<li>Updated <a href="https://github.com/jsocol/logbot">logbot</a> to work with node 0.5.0 and <a href="https://github.com/indexzero/daemon.node">daemonize.node</a>.</li>
<li>Fixed a <a href="https://github.com/jsocol/bleach/commit/d9f2cf6cc">Bleach bug</a>.</li>
<li>Spent a long time tracking down an issue with celery: it just won&#8217;t quit. Literally, it doesn&#8217;t much care about SIGINT and SIGTERM.</li>
</ul>
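<p>For the curious, the ContextDecorator idea from the list above can be sketched in a few lines. This is a minimal version of the pattern, not the standard library&#8217;s implementation and not the code that went into django-multidb-router; the <code>record</code> class is a hypothetical example manager.</p>
<pre><code>import functools


class ContextDecorator(object):
    """Mixin that lets any context manager double as a function decorator."""

    def __call__(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Enter the context manager around every call to func.
            with self:
                return func(*args, **kwargs)
        return wrapper


class record(ContextDecorator):
    """Hypothetical example manager that logs entry and exit."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.log.append("exit")
        return False  # never swallow exceptions
</code></pre>
<p>The same object then works both as <code>with record(log):</code> and as <code>@record(log)</code> above a function definition.</p>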
<p>Now to start a busy weekend!</p>
]]></content:encoded>
			<wfw:commentRss>http://coffeeonthekeyboard.com/weekly-update-for-11311-575/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SUMO in Q2</title>
		<link>http://coffeeonthekeyboard.com/sumo-in-q2-563/</link>
		<comments>http://coffeeonthekeyboard.com/sumo-in-q2-563/#comments</comments>
		<pubDate>Wed, 02 Mar 2011 21:28:13 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[kitsune]]></category>
		<category><![CDATA[mozilla]]></category>
		<category><![CDATA[sumo]]></category>
		<category><![CDATA[waffle]]></category>
		<category><![CDATA[web dev]]></category>

		<guid isPermaLink="false">http://coffeeonthekeyboard.com/?p=563</guid>
		<description><![CDATA[At the end of 2010, I issued a challenge to my team: deploy support.mozilla.com continuously by the end of 2011. So, as we move into the last part of Q1, how are we doing, and what&#8217;s next? So Far This quarter we&#8217;ve managed to completely break free from our old svn repository. All our code [...]]]></description>
			<content:encoded><![CDATA[<p>At the end of 2010, I issued a challenge to my team: <a title="The future of SUMO development" href="http://coffeeonthekeyboard.com/the-future-of-sumo-development-511/">deploy support.mozilla.com continuously</a> by the end of 2011. So, as we move into the last part of Q1, how are we doing, and what&#8217;s next?</p>
<h3>So Far</h3>
<p>This quarter we&#8217;ve managed to completely <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=616703">break free from our old svn</a> repository. All our <a href="https://github.com/jsocol/kitsune">code is in git</a> now. This has simplified our deployments significantly. We&#8217;ve also <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=616702">moved a bunch of code</a> from difficult-to-maintain RewriteRules into the product and made it easier to manage.</p>
<p>We&#8217;ve still got some work to do on JavaScript unit tests and <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=625376">moving our crontab into version control</a>, but these are both on track for this quarter; we just have to make sure we make the time.</p>
<p>Overall, I think we&#8217;ve done pretty well so far this quarter. I would give us a B, and we can still get to an A if we can tackle JS tests and crontabs.</p>
<h3>Coming Up</h3>
<p>The two things I&#8217;d like to see next are <strong>faster cycles</strong> and moving toward a <strong>more CD-friendly</strong> way of building things. There are two challenges these help address.</p>
<p>Faster release cycles will get us thinking about managing time and planning work when releases aren&#8217;t centered around a big piece of functionality. Instead of starting with a big thing and picking a set of little things to lump in with it, we&#8217;ll need to balance fixing little things with work on bigger features.</p>
<p>By the beginning of June, I want us to <strong>release every week</strong>, whether there&#8217;s something big or not.</p>
<p>CD also presents a new challenge for big features: how do you deal with code that isn&#8217;t completely ready yet?</p>
<p>There are a couple of ways of handling this. One is to use longer-lived feature branches. That works pretty well for desktop software like Firefox where you can hand someone a binary and say &#8220;test this.&#8221; But it&#8217;s a challenge with a web app because, unless we move our staging environments around all the time, it blocks QA from testing something until it&#8217;s not only ready but, in fact, <em>already on production</em>.</p>
<p>A better solution is to use feature flags to hide new functionality until it&#8217;s ready. In this model, everything lives on the same branch (you still do feature branches for bug fix/review and for those times something does need to live a little longer before merging) and all staging servers—and eventually production—run the same code, but with a different set of flags turned on. You can turn a feature on in production after it&#8217;s been verified on stage.</p>
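<p>Here is a minimal sketch of that model in Python. It is deliberately not Waffle&#8217;s API: the <code>FLAGS_BY_ENV</code> table, the <code>is_enabled</code> function, and the flag name <code>new_kb_editor</code> are all hypothetical, and a real setup would more likely keep flags in the database than in a dictionary.</p>
<pre><code># Hypothetical per-environment settings: the same code runs everywhere,
# only the set of flags that are switched on differs.
FLAGS_BY_ENV = {
    "stage": {"new_kb_editor": True},
    "prod": {"new_kb_editor": False},
}


def is_enabled(env, name):
    """Unknown flags default to off, so unfinished code stays hidden."""
    return FLAGS_BY_ENV.get(env, {}).get(name, False)


def render_editor(env):
    """Both code paths ship together; the flag picks one at run time."""
    if is_enabled(env, "new_kb_editor"):
        return "new editor"
    return "old editor"
</code></pre>
<p>Flipping <code>new_kb_editor</code> to <code>True</code> for <code>prod</code> is then the whole launch; no branch merge or extra deploy is involved.</p>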
<p>This isn&#8217;t easier. It means structuring code differently, thinking about a new set of constraints. It&#8217;s what we should do.</p>
<p><em>We will screw up</em>, probably a couple of times, before we get this down. That&#8217;s why we should start now, when our mistakes will be confined to staging servers.</p>
<p>In Q2 I want us to <strong>stop using the &#8216;next&#8217; branch</strong> and start using <a title="Introducing Waffle for Django" href="http://coffeeonthekeyboard.com/introducing-waffle-for-django-541/">Waffle</a> to control features until they&#8217;re ready.</p>
<p>Not everything needs to hide behind a flag: small bug fixes obviously don&#8217;t. This may be giving us a window into what QA is like under CD: big things are manually tested before we turn them on, but little fixes, already verified by the developer and reviewer, make it through to production without manual intervention from QA.</p>
<p>We also need to start planning time to remove flags and dead code once a feature has shipped. That is part of the development cost of a feature, and will hopefully be offset by reducing the time it takes to get something to production once it is ready.</p>
<h3>Big Changes</h3>
<p>While Q1 has been about a larger set of smaller, more concrete and isolated changes, Q2 is going to be about a smaller set of much bigger changes.</p>
<p>Q1 was about changing our code. Q2 is about changing our thinking.</p>
<p>I&#8217;m excited about diving into these challenges. I can&#8217;t wait for Q2.</p>
]]></content:encoded>
			<wfw:commentRss>http://coffeeonthekeyboard.com/sumo-in-q2-563/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The future of SUMO development</title>
		<link>http://coffeeonthekeyboard.com/the-future-of-sumo-development-511/</link>
		<comments>http://coffeeonthekeyboard.com/the-future-of-sumo-development-511/#comments</comments>
		<pubDate>Mon, 27 Dec 2010 21:56:24 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[kitsune]]></category>
		<category><![CDATA[mozilla]]></category>
		<category><![CDATA[sumo]]></category>

		<guid isPermaLink="false">http://coffeeonthekeyboard.com/?p=511</guid>
		<description><![CDATA[Just this month, the SUMO development team completed our transition to our new platform, Kitsune. This small release represented the culmination of nearly a year of great work, and I couldn&#8217;t be prouder of the team. At first, as the end of the tunnel approached, and we weren&#8217;t sure what we&#8217;d be doing next, I [...]]]></description>
			<content:encoded><![CDATA[<p>Just this month, the <a href="http://support.mozilla.com/">SUMO</a> development team completed our transition to our new platform, <a href="http://github.com/jsocol/kitsune">Kitsune</a>. This small release represented the culmination of nearly a year of great work, and I couldn&#8217;t be prouder of the team.</p>
<p>At first, as the end of the tunnel approached, and we weren&#8217;t sure what we&#8217;d be doing next, I felt a little like Inigo Montoya: I&#8217;d been in the rewrite business so long, now that it&#8217;s over&#8230;</p>
<p>2010 was a year of investment in our code, and in our infrastructure, both hardware and software. We moved our code from svn to <a href="http://github.com/jsocol/kitsune">git</a>; started using Hudson to run our comprehensive <a href="https://hudson.mozilla.org/job/sumo-master/">unit test suite</a>; centralized our configuration to simplify deployment. Now it&#8217;s time to start taking advantage of that investment.</p>
<p>In that spirit, I want to issue a John Kennedy-style challenge: <strong>by this time next year, I want SUMO to <a href="http://radar.oreilly.com/2009/03/continuous-deployment-5-eas.html">deploy</a> <a href="http://toni.org/2010/05/19/in-praise-of-continuous-deployment-the-wordpress-com-story/">continuously</a></strong>.</p>
<p>There is a lot of work to do to get there.</p>
<ul>
<li><strong>Pushing code to production needs to be automated</strong>, for all but the biggest, downtime-requiring changes.</li>
<li>We need to <strong>expand automated test coverage</strong> to include our <strong>front-end and JavaScript</strong> code.</li>
<li>Our code reviews and <strong>standards</strong> will need to be <strong>even higher</strong>.</li>
<li>We&#8217;ll need to <strong>redefine the relationship between development and QA</strong>.</li>
<li>When there are problems, we need the <strong>agility to respond quickly</strong> and the focus to <a href="http://www.startuplessonslearned.com/2008/11/five-whys.html">learn from them</a> and improve.</li>
<li>We&#8217;ll need to be able to <strong><a href="http://agiletesting.blogspot.com/2009/07/dark-launching-and-other-lessons-from.html">dark launch</a> and <a href="http://code.flickr.com/blog/2009/12/02/flipping-out/">flip on features</a></strong>, preferably with the flexibility to test with small groups.</li>
<li>We&#8217;ll need to <strong>reevaluate our branch management</strong> and <strong>staging environments</strong>.</li>
<li>We&#8217;ll need to <strong>rethink how we organize and prioritize work</strong>.</li>
</ul>
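<p>The &#8220;test with small groups&#8221; part of the flag item above is commonly done by hashing each user into a stable bucket. The sketch below shows one way that might look; the function names and the flag name in the usage note are hypothetical, and this is not a description of any tool we have chosen.</p>
<pre><code>import hashlib


def bucket(flag, user_id):
    """Map a (flag, user) pair to a stable number from 0 to 99."""
    key = "{0}:{1}".format(flag, user_id).encode("utf-8")
    return int(hashlib.sha1(key).hexdigest(), 16) % 100


def in_rollout(flag, user_id, percent):
    """True for roughly `percent` percent of users, and always the same users."""
    return bucket(flag, user_id) in range(percent)
</code></pre>
<p>Calling <code>in_rollout("dark_launch", user_id, 5)</code> would expose a feature to about one user in twenty. Raising the percentage only adds users, so nobody who already has the feature loses it.</p>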
<p>And unlike 2010, we&#8217;ll have to make these investments and improvements while maintaining a fast-paced development schedule.</p>
<p>This is not something the SUMOdev team can do alone: this is a challenge for us, for our ops team, and for QA as well. I look forward to working more closely with these teams as we chase this target.</p>
<p>Continuous deployment will bring a number of benefits—many of the requirements I just listed are benefits and solid goals by themselves—especially for contributors.</p>
<ul>
<li>Bugs are <strong>fixed for everyone</strong> as soon as they&#8217;re fixed for us.</li>
<li>We can <strong>respond to issues</strong> faster.</li>
<li>Code will be <strong>tested at production scale</strong>.</li>
<li>Our <strong>processes will continually improve</strong>.</li>
<li>We&#8217;ll <strong>reduce the load</strong> on individuals from <strong>IT and QA</strong>.</li>
</ul>
<p>I call this a challenge because it will not be easy: it will be hard. It will push the bounds of our experience and our ingenuity. SUMO will be better, and we will be better.</p>
<p>How do we get there? Frankly, I don&#8217;t know yet. There are some clear, actionable items, like front-end testing and feature flags, and there is some brainstorming to do. There will be unanticipated hurdles to overcome and we will almost certainly make some missteps, and we will have to work around those.</p>
<p>In 2010, we pushed ourselves in terms of <em>how much</em> we could get done, and while we accomplished an incredible amount, that&#8217;s not a healthy pace to set for another year. In 2011, I want us to push ourselves in terms of <em>what</em> we can do, and <em>how</em> we do it.</p>
<p><strong>2011 will be a great year</strong> for SUMOdev. And with this challenge, I just want to say: <strong>Game On.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://coffeeonthekeyboard.com/the-future-of-sumo-development-511/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>An End and a Beginning</title>
		<link>http://coffeeonthekeyboard.com/an-end-and-a-beginning-486/</link>
		<comments>http://coffeeonthekeyboard.com/an-end-and-a-beginning-486/#comments</comments>
		<pubDate>Wed, 03 Nov 2010 13:47:31 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[kitsune]]></category>
		<category><![CDATA[mozilla]]></category>
		<category><![CDATA[planning]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[sumo]]></category>

		<guid isPermaLink="false">http://coffeeonthekeyboard.com/?p=486</guid>
		<description><![CDATA[2010 is coming to a close, and, with it, the end of our year-long project to create a new platform for support.mozilla.com (SUMO) is in sight. For the past year, developing the new platform has been our focus and has affected our roadmap. When 2011 starts, we&#8217;ll begin a new chapter for SUMO. It&#8217;s a [...]]]></description>
			<content:encoded><![CDATA[<p>2010 is coming to a close, and, with it, the end of our year-long project to create a new platform for <a href="http://support.mozilla.com/">support.mozilla.com</a> (SUMO) is in sight.</p>
<p>For the past year, developing the new platform has been our focus and has affected our roadmap. When 2011 starts, we&#8217;ll begin a new chapter for SUMO. It&#8217;s a very exciting time!</p>
<p>For the developers, this is the end of the investment phase and the beginning of the payoff for <a href="http://github.com/jsocol/kitsune">Kitsune</a>, the code-name for our new platform.</p>
<ul>
<li>The entire site will be faster.</li>
<li>We&#8217;ll be done rebuilding existing features, and can work on brand new features.</li>
<li>We&#8217;ll be free of our legacy code base, which will simplify some important sections of Kitsune.</li>
<li>We&#8217;ll be working on smaller, faster cycles.</li>
<li>We&#8217;ll be able to take time to circle back to fix things we&#8217;ve been unhappy about, but willing to live with during the migration to Kitsune.</li>
<li>We&#8217;ll be more effective at making the site even faster.</li>
<li>We&#8217;ll apply our <a href="http://master.support.mozilla.com/">new theme</a> to the entire site, making the experience more consistent and seamless.</li>
<li>We&#8217;ll be more agile, able to respond to issues faster.</li>
<li>We&#8217;ll be able to <a href="http://github.com/jsocol/kitsune">parallelize more</a>.</li>
<li>We&#8217;ll be able to push updates to the site far more frequently—and we&#8217;ve averaged releases every two weeks since August!</li>
<li>We&#8217;ll be able to take the time we need for large features and disruptive changes without blocking work on, or release of, smaller features and fixes.</li>
<li>Nagging issues with sessions will go away.</li>
<li>It will be easier to keep our entire platform up to date.</li>
<li>We&#8217;ll be free of an entire class of security issues.</li>
</ul>
<p>We&#8217;re just beginning to work on our roadmap for Q1, 2011. A lot of it is still up in the air, but there are some fun things on there. And we&#8217;ll be taking time to improve performance even more.</p>
<p>In the meantime, we&#8217;re getting very close to feature complete on <a href="http://moxie.jamessocol.com/bugstats/sumo/2.3">SUMO 2.3</a>, which will move the Knowledge Base over to the new platform.</p>
<p>After 2.3, there will be only one major release left in 2010, SUMO 2.4. SUMO 2.4 will be much smaller than 2.3—maybe 1/10th the number of bugs, and come out in a matter of weeks instead of months. But 2.4 will also be huge, in that it will move the final piece over to the new platform.</p>
]]></content:encoded>
			<wfw:commentRss>http://coffeeonthekeyboard.com/an-end-and-a-beginning-486/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Developing at Scale: Database Replication</title>
		<link>http://coffeeonthekeyboard.com/developing-at-scale-database-replication-444/</link>
		<comments>http://coffeeonthekeyboard.com/developing-at-scale-database-replication-444/#comments</comments>
		<pubDate>Thu, 17 Jun 2010 16:10:15 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Back-end]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[mozilla]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[sumo]]></category>
		<category><![CDATA[webdev]]></category>

		<guid isPermaLink="false">http://coffeeonthekeyboard.com/?p=444</guid>
		<description><![CDATA[When a website is small—like this one, for example—usually the entire thing, from the web server to the database, can live on a single server. Even a single virtual server. One of the first things that happens when a web site gets bigger is this is no longer true. One reason is load. A popular [...]]]></description>
			<content:encoded><![CDATA[<p>When a website is small—like this one, for example—usually the entire thing, from the web server to the database, can live on a single server. Even a single virtual server. One of the first things that happens when a web site gets bigger is this is no longer true.</p>
<p>One reason is load. A popular website will simply require more than a single server, virtual or otherwise, can give, and the only way to keep scaling is to add more servers. For example, the server may run out of available Apache connections, and the limit cannot be raised without negatively impacting performance.</p>
<p>Another reason is downtime. If a website is served from a single server, and that server goes down for any reason, planned or otherwise, then the website is down. At some point, downtime is essentially unacceptable—just ask Twitter—and redundancy is required.</p>
<h3>Enter Replication</h3>
<p>A common response is to set up database replication, where one database server operates as a &#8220;master,&#8221; and one or more other servers operate as &#8220;slaves.&#8221; In this setup, all of your <em>writes</em> to the database will go to the master, then &#8220;replicate&#8221; to the slaves, and all or most of the <em>reads</em> will come from the slaves. (Note that each slave applies all of the replicated writes in addition to serving reads, so slaves are not a good place to recycle sub-par hardware.)</p>
<p>Replication introduces a new type of problem: if you naively send <em>all</em> reads to the slaves then data you just wrote <em>will not be there</em>.</p>
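<p>The naive version of this routing is easy to sketch. The method names below follow the <code>db_for_read</code>/<code>db_for_write</code> convention of Django database routers, but the class itself and the aliases passed to it are hypothetical and not our production router.</p>
<pre><code>import itertools


class ReplicaRouter(object):
    """Send every write to the master and spread reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)  # simple round-robin

    def db_for_write(self, model=None, **hints):
        return self.master

    def db_for_read(self, model=None, **hints):
        # Naive: never reads from the master, so a row written a moment ago
        # may not be visible yet on the slave that gets picked.
        return next(self._slaves)
</code></pre>
<p>With <code>ReplicaRouter("default", ["slave1", "slave2"])</code>, a write lands on <code>default</code> and the very next read goes to <code>slave1</code>, which may not have received that write yet.</p>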
<h3>La&#8230;wait for it&#8230;g</h3>
<p>Even if the master and slave are sitting next to each other with a cable connecting them, replication will probably take more time than your code does to reach the next step. At a minimum, you need to assume that replication lag will be hundreds of milliseconds, an eternity when the time from one line in your web app to the next is measured in micro- or nanoseconds. In practice, replication may well take seconds, especially if your master and slaves are not physically next to each other.</p>
<p>The result is that <a href="http://en.wikipedia.org/wiki/ACID">ACIDity</a> is essentially broken from your application&#8217;s point of view. The write itself is durable on the master, but a read sent to a slave is not guaranteed to see it yet. You cannot simply write data and immediately rely on its existence.</p>
<p>For example, say you have a large discussion forum. If you naively send all reads to the slaves, then someone&#8217;s post may take seconds to appear on the site. This is a problem if you&#8217;re trying to show a user their post immediately after posting it.</p>
<h3>Smarter Reading</h3>
<p>The solution is to occasionally read from the master. When you need to access data that was just written, it is <em>probably</em> only available on the master, so that&#8217;s where you&#8217;ll read it. Within a single HTTP request, this is fairly simple: just force any queries that rely on recently-written data to the master.</p>
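<p>Within one request, forcing reads to the master might look like the sketch below. The <code>use_master</code> context manager, the thread-local flag, and the database aliases are assumptions for illustration; they are not the API of any particular library.</p>
<pre><code>import threading
from contextlib import contextmanager

_state = threading.local()


def pinned():
    """True while the current thread has been told to read from the master."""
    return getattr(_state, "pinned", False)


@contextmanager
def use_master():
    """Force reads inside the block to the master, then restore the old state."""
    previous = pinned()
    _state.pinned = True
    try:
        yield
    finally:
        _state.pinned = previous


def db_for_read(master="default", slave="slave"):
    """Pick the database alias for the next read."""
    return master if pinned() else slave
</code></pre>
<p>Any query that depends on data written earlier in the same request would run inside <code>with use_master():</code>, while everything else keeps going to a slave.</p>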
<p>Outside of a single HTTP request, this is slightly more complex. If you&#8217;re following the practice of redirecting after a POST request to a GET request (which you should) then creating a new forum post and viewing it will happen in two different HTTP requests.</p>
<p>One way around this is to set a very short-lived cookie that tells your web app to continue reading from the master. If any write occurs in a request, the response should include this cookie. The exact time-to-live will depend on how long your replication lag usually is—cover at least 4 or 5 standard deviations. Any request that has this cookie should honor it by reading only from the master.</p>
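<p>A minimal sketch of the cookie approach follows. The cookie name, the function names, and the lag samples are hypothetical, and reading &#8220;4 or 5 standard deviations&#8221; as &#8220;above the mean lag&#8221; is my assumption here.</p>
<pre><code>import statistics

PIN_COOKIE = "read_from_master"  # hypothetical cookie name


def pin_ttl_seconds(lag_samples, sigmas=5):
    """Mean lag plus `sigmas` standard deviations, plus a second of headroom."""
    mean = statistics.mean(lag_samples)
    spread = statistics.pstdev(lag_samples)
    return max(1, int(mean + sigmas * spread) + 1)


def choose_database(request_cookies):
    """Honor the cookie: pinned requests read only from the master."""
    return "master" if PIN_COOKIE in request_cookies else "slave"


def cookies_to_set(request_wrote, ttl):
    """Return {name: (value, max_age)} to attach to the response."""
    if request_wrote:
        return {PIN_COOKIE: ("1", ttl)}
    return {}
</code></pre>
<p>In a Django app the returned pair would map onto <code>response.set_cookie(name, value, max_age=ttl)</code>. The browser drops the cookie once <code>max_age</code> expires, so reads drift back to the slaves without any cleanup code.</p>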
<h3>A Pitch</h3>
<p>One of the hardest things for new web developers is developing large-scale applications: first, you need a large-scale application! Setting up database replication is a huge pain, and if your site isn&#8217;t getting enough traffic, it&#8217;s not worth it.</p>
<p>Mozilla is one way aspiring web developers can get some experience working with large-scale web apps. All of our web apps are open source and open to contributions from community members. To get involved, stop by <a href="irc://irc.mozilla.org/webdev">#webdev</a> in IRC!</p>
]]></content:encoded>
			<wfw:commentRss>http://coffeeonthekeyboard.com/developing-at-scale-database-replication-444/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Weekly Update for 06/14/2010</title>
		<link>http://coffeeonthekeyboard.com/weekly-update-for-06142010-438/</link>
		<comments>http://coffeeonthekeyboard.com/weekly-update-for-06142010-438/#comments</comments>
		<pubDate>Tue, 15 Jun 2010 04:58:26 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[mozilla]]></category>
		<category><![CDATA[sumo]]></category>
		<category><![CDATA[webdev]]></category>
		<category><![CDATA[weekly update]]></category>

		<guid isPermaLink="false">http://coffeeonthekeyboard.com/?p=438</guid>
		<description><![CDATA[Last week could have gone better. We tried to push SUMO 2.1 twice only to realize we had some issues with respect to replication that need to get ironed out. We think we have a fix for these issues and are rounding out the tests for that fix, but we won&#8217;t really know unless we [...]]]></description>
			<content:encoded><![CDATA[<p>Last week could have gone better. We tried to push SUMO 2.1 twice only to realize we had some issues with respect to replication that need to get ironed out.</p>
<p>We think we have a fix for these issues and are rounding out the tests for that fix, but we won&#8217;t <em>really</em> know unless we can test in a replicated environment. There are bugs open for IT to help us with that and to get replication set up on our staging server.</p>
<p>As <a href="http://morgamic.com/">Morgamic</a> said, we&#8217;ll gather info, document, learn and innovate, then repeat next time.</p>
<p>And, as &#8220;unsuccessful&#8221; pushes go, these went really well. Both times we gave it an hour, then were able to back everything out and reset in another half-hour, coming in well under the downtime window.</p>
<p><strong>Last week</strong></p>
<ul>
<li>Tried to push 2.1, twice. It didn&#8217;t take.</li>
<li>Filed IT bugs re: replication in staging.</li>
<li>Started thinking about Q3 goals.</li>
<li>Got the 2.2 (&#8220;questions&#8221;) branch rolling on Hudson.</li>
<li>Helped get people on the same page w/r/t 2.3 deliverables and timeline. (At least we&#8217;ll say I helped.)</li>
<li>Got all the people working on chat together.</li>
<li>Worked out a potential solution to our replication issues with Jeff and Erik.</li>
<li>Triaged 2.2—only about 5 bugs got moved out.</li>
<li>Reviewed 2.2 UI work, and a number of subsequent patches.</li>
</ul>
<p><strong>This week (me)</strong></p>
<ul>
<li>Have a timeline in place for 2.1 and 2.2.</li>
<li>Figure out replication in staging with IT.</li>
<li>Work out roughly what Q3 will look like.</li>
<li>1.5.5.1.</li>
<li>Get enough sleep.</li>
</ul>
<p><strong>This week (team)</strong></p>
<ul>
<li>Fix our replication issues.</li>
<li>Continue on 2.2 and accelerate.</li>
<li>Work with Cheng and Howse to get the AAQ done.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://coffeeonthekeyboard.com/weekly-update-for-06142010-438/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weekly Update for 06/07/2010</title>
		<link>http://coffeeonthekeyboard.com/weekly-update-for-07062010-427/</link>
		<comments>http://coffeeonthekeyboard.com/weekly-update-for-07062010-427/#comments</comments>
		<pubDate>Tue, 08 Jun 2010 05:23:37 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[mozilla]]></category>
		<category><![CDATA[sumo]]></category>
		<category><![CDATA[webdev]]></category>
		<category><![CDATA[weekly update]]></category>

		<guid isPermaLink="false">http://coffeeonthekeyboard.com/?p=427</guid>
		<description><![CDATA[I missed last week. I blame the holiday on Monday. Also Erik started, which is very exciting! Tomorrow afternoon is our planned push for SUMO 2.1, which is our new discussion forum component, and migrating the old data into that component. This is huge, since it&#8217;s the first new component serving content creation. (We&#8217;ve been [...]]]></description>
			<content:encoded><![CDATA[<p>I missed last week. I blame <a href="http://en.wikipedia.org/wiki/Memorial_Day">the holiday</a> on Monday. Also Erik started, which is very exciting!</p>
<p>Tomorrow afternoon is our planned <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=570656">push for SUMO 2.1</a>, which covers our new discussion forum component and the migration of the old data into it. This is huge, since it&#8217;s the first new component that handles content creation. (We&#8217;ve been running search results on Kitsune for a while now.)</p>
<p>Everything on our <a href="http://support-stage-new.mozilla.com/en-US/forums/contributors">staging server</a> feels faster on the Kitsune pages than the old pages. I can&#8217;t quite count to 2 while loading a Kitsune page. I can usually get to 3 on a Tiki page. That in itself is a huge win, to me. On top of the speed, we&#8217;ve made some big leaps in our infrastructure and have done a lot of work that will directly enable 2.2, our support questions milestone.</p>
<p><strong>Last two weeks</strong></p>
<ul>
<li>Closed out and reviewed a number of 2.1 bugs. Really, <a href="http://github.com/jsocol/kitsune/graphs/impact">I lost count</a>. It&#8217;s been fantastic to see the work spread out across the team.</li>
<li>Got <a href="http://moxie.jamessocol.com/bugstats/sumo/2.1">2.1</a> ready to go out tomorrow!(!!)</li>
<li>Fixed a number of small and last-minute 2.1 bugs.</li>
<li>Helped Paul finish out the data migration work so he could focus on his last final and graduate! (Congrats, Paul!)</li>
<li><a href="https://bugzilla.mozilla.org/show_bug.cgi?id=555896">Built avatars</a> for <a href="http://support-stage-new.mozilla.com/en-US/forums/contributors/670992">users without them</a>.</li>
<li>Got <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=563991">email notifications</a> to go out with some help from Jeremy.</li>
<li>Welcomed Erik. Helped him get all the development environment stuff worked out.</li>
</ul>
<p><strong>This week (me)</strong></p>
<ul>
<li>Get everyone introduced to Rypple.</li>
<li>Navigate a smooth 2.1 launch.</li>
<li>Poll the team about a SUMOdev on-site.</li>
<li>Finish reviewing Ricky&#8217;s 2.2 UI work.</li>
<li>Triage 2.2 and focus the bugs.</li>
<li>Help everyone figure out deliverables and timelines for 2.3 mockups as best I can. (Mostly I&#8217;ve done what I can here, I think.)</li>
</ul>
<p><strong>This week (team)</strong></p>
<ul>
<li>Launch 2.1. Smoothly.</li>
<li>Go go go on 2.2. On staging the day (evening?) after 2.1 launches.</li>
<li>Start spreading around reviews more.</li>
<li>Help triage 2.2 and focus it.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://coffeeonthekeyboard.com/weekly-update-for-07062010-427/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Surviving Pac Man</title>
		<link>http://coffeeonthekeyboard.com/surviving-pac-man-422/</link>
		<comments>http://coffeeonthekeyboard.com/surviving-pac-man-422/#comments</comments>
		<pubDate>Mon, 24 May 2010 06:09:36 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[ddos]]></category>
		<category><![CDATA[mozilla]]></category>
		<category><![CDATA[pac man]]></category>
		<category><![CDATA[spike]]></category>
		<category><![CDATA[sumo]]></category>
		<category><![CDATA[traffic]]></category>

		<guid isPermaLink="false">http://coffeeonthekeyboard.com/?p=422</guid>
		<description><![CDATA[On Friday, Google showed off a fun new doodle in honor of the 30th anniversary of Pac Man: a Pac Man clone, complete with sounds. Unfortunately, in the initial release, those sounds started playing automatically—an oversight or an homage to &#60;bgsound&#62;, I guess. Even if Google was open in a background tab or window, or [...]]]></description>
			<content:encoded><![CDATA[<p>On Friday, Google showed off a fun new <a href="http://www.google.com/logos/">doodle</a> in honor of the 30th anniversary of Pac Man: a <a href="http://www.google.com/pacman/">Pac Man clone</a>, complete with sounds.</p>
<p>Unfortunately, in the initial release, those sounds started playing automatically—an oversight or an homage to <code>&lt;bgsound&gt;</code>, I guess. Even if Google was open in a background tab or window, or in a hidden iframe created by an <a href="https://addons.mozilla.org/en-US/firefox/addon/2207/">add-on</a>, the Pac Man music and sound effects would start.</p>
<p>And that <a href="http://support.mozilla.com/en-US/forum/1/678028">confused some people</a>.</p>
<p>Many people came to <a href="http://support.mozilla.com/">SUMO</a> looking for an explanation, and many of them, not finding anything in the knowledge base, started posting to our forum. So many, in fact, that our database server started running out of connections.</p>
<p>The pounding we took on the forums also caused replication on our slave databases to fall behind by as much as 1.25 hours, so even when we wrote an article about the noises [article has been removed], it didn&#8217;t show up for most people.</p>
<p>As Sean put it: &#8220;We just got DDOSed by Pac Man.&#8221;</p>
<p>To shore up the site and bring it back from the brink of toppling over, we worked with IT (thanks, Dave!) to implement a number of temporary solutions. We&#8230;</p>
<ul>
<li>&#8230;disabled a particular kind of slow, frequent, and useless query.*</li>
<li>&#8230;blocked Google&#8217;s crawler from indexing the site.</li>
<li>&#8230;disabled our own sumobot&#8217;s forum-crawling features.</li>
<li>&#8230;rotated DB slaves out of the production pool to allow them to catch up.</li>
</ul>
<p>Google has already removed the Pac Man doodle from their home page, and we can revert most of the emergency measures here on Monday. But the event does remind us to look at what we&#8217;re doing in Kitsune, our rewrite, to weather storms like this in the future.</p>
<p>One idea, suggested by <a href="http://davedash.com/">Dave Dash</a>, is a read-only mode where all pages that can trigger database writes are temporarily disabled. We&#8217;ll be looking pretty seriously at this over the next couple of days.</p>
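<p>To make the read-only idea concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the <code>SETTINGS</code> flag, the <code>writes_to_database</code> decorator, and the <code>post_reply</code> view are hypothetical names, not anything that exists in Kitsune.</p>

```python
import functools

# Hypothetical switch that an operator would flip during a traffic spike.
SETTINGS = {"read_only": False}


class ReadOnlyError(Exception):
    """Raised when a write is attempted while the site is read-only."""


def writes_to_database(view):
    """Mark a view as able to trigger writes; refuse it in read-only mode."""
    @functools.wraps(view)
    def wrapper(*args, **kwargs):
        if SETTINGS["read_only"]:
            raise ReadOnlyError("The site is temporarily read-only.")
        return view(*args, **kwargs)
    return wrapper


@writes_to_database
def post_reply(thread_id, text):
    # Stand-in for a real view that would INSERT a forum post.
    return "saved reply to thread %d" % thread_id
```

<p>A real app would catch <code>ReadOnlyError</code> and show a friendly &#8220;try again later&#8221; page, while views without the decorator keep serving reads from the slaves and the cache.</p>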
<p>Another important take-away is to make damn sure pages only trigger database writes if they really need to. Writes can never bounce off a cache, so they are very expensive.</p>
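<p>One common way to cut down on writes like the view-count queries in the footnote is to buffer them in memory and write them out in batches. The Python sketch below shows that technique under assumptions; it is not something we have built, and the <code>ViewCountBuffer</code> class and its threshold are hypothetical.</p>

```python
from collections import Counter


class ViewCountBuffer:
    """Collect page views in memory and hand them back in batches."""

    def __init__(self, flush_every=100):
        self.flush_every = flush_every  # assumed threshold; tune for real traffic
        self.pending = Counter()
        self.seen = 0

    def record(self, thread_id):
        """Count one view; return a batch once enough views have piled up."""
        self.pending[thread_id] += 1
        self.seen += 1
        if self.seen >= self.flush_every:
            return self.flush()
        return None

    def flush(self):
        """Return {thread_id: views} for one batched write, then reset."""
        batch = dict(self.pending)
        self.pending.clear()
        self.seen = 0
        return batch
```

<p>With a threshold of 100, a hundred page views turn into a single trip to the master instead of a hundred. The trade-off is that counts lag slightly and a crashed process loses whatever it had buffered, which seems acceptable for numbers that are already inaccurate.</p>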
<p>Finally, we should be more proactive in how we interact with our Zeus cache. We&#8217;ll also think about whether it makes sense to start using <a href="http://micropipes.com/blog/">Wil Clouser&#8217;s</a> Zeus interface, <a href="http://github.com/clouserw/hera">Hera</a>, sooner rather than later.</p>
<p>&#8220;Too much traffic&#8221; is the best problem a web development team can have. Hopefully, the first time this happens to Kitsune, we&#8217;ll be ready.</p>
<p>* The queries that increment the number of views a forum thread has gotten are particularly slow for some reason. They&#8217;re also wildly inaccurate, since most people see a cached version of those pages and never trigger the query. The worst part: they occur on every (non-cached) page view, even while just reading.</p>
<p>(This post was <a href="http://pc.de/pages/surviving-pac-man-be">translated into Belarusian</a>. Isn&#8217;t that cool?)</p>
]]></content:encoded>
			<wfw:commentRss>http://coffeeonthekeyboard.com/surviving-pac-man-422/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

