The Evolution of SUMO

    by  • 23 February 2010 • Articles

    When I joined the SUMO team six months ago, the team was just starting a discussion of “where do we go from here?” SUMO was built on a CMS called TikiWiki, and over two years our copy had diverged significantly from the upstream project. (David Tenser wrote a more detailed history if you’re interested.)

    After a few months of talking and testing, and a few changes of direction, we’ve decided that SUMO will follow our colleagues on AMO and move to a custom web application built on Django, a Python web development framework.

    Why are we committing to such a dramatic new direction? Three major reasons. Keep in mind that SUMO was built on TikiWiki 1.10, a little more than two years out of date.

    Performance

    TikiWiki is a very feature-rich application. The unfortunate trade-off for us is performance, especially on a site serving 16 million users every week. As our European users in particular know, SUMO can be unacceptably slow at times, most of all when editing articles. Many of the changes we made to the platform (most of which we contributed back upstream over the past few months) aimed to improve performance with tools like output caching, database replication, and plain refactoring. When we evaluated the latest version of TikiWiki, we found that its performance was, on average, about the same.

    In the new platform, we’ll take advantage of techniques now available to us, including query caching and template/fragment caching, and we expect dramatic performance improvements. We’ll also avoid some of the performance pitfalls TikiWiki fell into over the years, with improvements to the security, database, and templating layers, among others.
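    Query caching means remembering the result of a database query so that repeated identical requests skip the database until the entry expires or is invalidated. The following is a minimal sketch of that idea, assuming a hypothetical `QueryCache` class and a plain dict standing in for a shared cache server such as memcached; it is not the caching layer SUMO or AMO actually use.

```python
import hashlib
import time


class QueryCache:
    """Cache query results keyed by SQL text and parameters, with a time-to-live.

    Hypothetical sketch: a plain dict stands in for a shared cache server.
    """

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # cache key -> (expires_at, rows)

    @staticmethod
    def _key(sql, params):
        # Hash the statement and its parameters so every key is short and uniform.
        raw = repr((sql, tuple(params))).encode("utf-8")
        return hashlib.sha1(raw).hexdigest()

    def fetch(self, sql, params, run_query):
        """Return fresh cached rows, or call run_query(sql, params) and cache them."""
        key = self._key(sql, params)
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the database is not touched
        rows = run_query(sql, params)
        self._store[key] = (now + self.ttl, rows)
        return rows

    def invalidate(self):
        """Drop every entry, e.g. after a write that may change query results."""
        self._store.clear()
```

    Clearing the whole cache on any write keeps the sketch short; a production caching layer would invalidate far more selectively, which is where most of the real complexity lies.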

    But I expect the biggest performance gain to come from moving from a general-purpose CMS to a dedicated web application focused on providing the SUMO experience.

    Hackability

    To work on SUMO, you have to overcome a steep learning curve. Components tend to be tightly coupled or grouped in unintuitive ways, and they are not as extensible as we’d like. Without a comprehensive test suite, a change to an important section of code can introduce regressions in dependent areas that seem unrelated. SUMO 1.x also won’t run without a relatively complete copy of its database, which makes it difficult for community members outside the company to contribute.

    With the new platform, and some discipline from the team, our goal is to address all of these problems and make it easier for someone to get started hacking on SUMO.

    • We’ll strive to keep code loosely coupled and extensible, using existing or external libraries whenever possible and turning our own contributions into external libraries where we can.
    • We’re adopting a test-driven development (TDD) workflow to make our components easier to hack on safely and to lighten the load on our QA team by reducing regressions.
    • TDD and Django will make it easier to work without a copy of the database, using fixtures and migrations to minimize the dependence on real data.
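    Django’s own test `TestCase` loads the files named in its `fixtures` attribute into a throwaway test database, so tests never need production data. The sketch below imitates that idea using only the standard library (`unittest` plus an in-memory `sqlite3` database) so it runs without Django. The fixture records follow the shape of a Django JSON fixture, but the `wiki.document` model, its fields, and the helpers `load_fixture` and `documents_in_locale` are hypothetical.

```python
import json
import sqlite3
import unittest

# Records shaped like a Django JSON fixture; the model and fields are hypothetical.
FIXTURE = json.loads("""
[
  {"model": "wiki.document", "pk": 1,
   "fields": {"title": "Firefox will not start", "locale": "en-US"}},
  {"model": "wiki.document", "pk": 2,
   "fields": {"title": "Firefox startet nicht", "locale": "de"}}
]
""")


def load_fixture(conn, records):
    """Insert each record into a table named after its model label."""
    for rec in records:
        table = rec["model"].replace(".", "_")
        fields = sorted(rec["fields"])
        cols = ["id"] + fields
        values = [rec["pk"]] + [rec["fields"][f] for f in fields]
        placeholders = ", ".join("?" for _ in cols)
        conn.execute(
            f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})",
            values,
        )


def documents_in_locale(conn, locale):
    """The code under test: titles of documents written in one locale."""
    rows = conn.execute(
        "SELECT title FROM wiki_document WHERE locale = ? ORDER BY id", (locale,)
    )
    return [title for (title,) in rows]


class DocumentLocaleTest(unittest.TestCase):
    def setUp(self):
        # A fresh in-memory database per test: no copy of real data required.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE wiki_document (id INTEGER PRIMARY KEY, title TEXT, locale TEXT)"
        )
        load_fixture(self.conn, FIXTURE)

    def tearDown(self):
        self.conn.close()

    def test_filters_by_locale(self):
        self.assertEqual(documents_in_locale(self.conn, "de"), ["Firefox startet nicht"])

    def test_unknown_locale_is_empty(self):
        self.assertEqual(documents_in_locale(self.conn, "fr"), [])
```

    Saved as a module, this runs under `python -m unittest`. Each test starts from the same small, known data set, which is what makes a test-first workflow practical for contributors who have no database dump.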

    The net effect of these decisions will be to lower the barrier to entry for SUMO development, and hopefully to make useful code available to other projects. Wil Clouser listed more strengths of Django as a platform when the AMO team decided to switch.

    Strength in Numbers

    By using the same platform as AMO, both teams will benefit from sharing code and resources. We’re already using the same template adapter, database router, caching layer, and HTML sanitizer. As open source developers often say, “given enough eyeballs, all bugs are shallow,” and by sharing code we get more eyes on it. We’ll benefit from the insights the AMO team has gained by starting its move from a PHP framework to Python just ahead of us. We’ll even be able to send code reviews across teams and draw on deeper knowledge of the problem domains we share: have a question about localization? Both teams can share expertise and best practices.
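    Django lets a project send queries to one of several configured databases through router classes that implement methods such as `db_for_read`, `db_for_write`, and `allow_relation`. The sketch below shows the kind of router that pairs with database replication: writes go to the primary and reads are spread across replicas. It is not the router SUMO and AMO share; the aliases `default`, `replica1`, and `replica2` are assumptions that would have to match the project’s `DATABASES` setting, and the class is written without importing Django so it runs on its own.

```python
import random


class PrimaryReplicaRouter:
    """Send writes to the primary database and spread reads over replicas.

    Uses the method names Django looks for on a database router; the database
    aliases below are hypothetical.
    """

    primary = "default"
    replicas = ("replica1", "replica2")

    def __init__(self, rng=None):
        # An injectable random source keeps the choice testable.
        self.rng = rng or random.Random()

    def db_for_read(self, model, **hints):
        return self.rng.choice(self.replicas)

    def db_for_write(self, model, **hints):
        return self.primary

    def allow_relation(self, obj1, obj2, **hints):
        # Every alias holds a copy of the same data, so cross-alias relations are fine.
        return True
```

    Because replicas can lag behind the primary, a page that reads immediately after a write may see stale data; routers often pin such reads to the primary, which this sketch does not attempt.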

    Solving problems once and sharing the solution directly reduces the amount of work both teams have to do. And when we write SUMO code in a way that AMO can use, we can also release it separately so others can benefit from our solutions, point out flaws, and contribute improvements.

    Other Changes

    Also among the changes coming in the next year:

    • Version Control System. Though we don’t have a specific plan in place, SUMO will likely move from SVN to Git for source control. Because Git is distributed, it supports a more collaborative workflow and makes it easier to push our code to public repositories like GitHub.
    • Continuous Integration. We’ll be using Hudson for continuous integration, which will automate our tests and alert us to potential issues and regressions. The web QA team has also been working to make sure our Selenium tests can run through Hudson, greatly increasing test coverage for a web application like SUMO.
    • Interface Localization. One of the ways we plan to improve the SUMO experience this year is by moving our interface localization to gettext, an industry-standard localization tool. As we move parts of the site from TikiWiki to Django, the new sections will be localized with gettext, which helps us take advantage of our great community through tools like Verbatim.

    A Foundation for the Future

    The goal of all of this work (and it will be a lot of work) is to put SUMO on a solid foundation for future growth and, at the same time, improve the experience for everyone, from developers to contributors to localizers to visitors. We have a daunting and aggressive road ahead of us, but I’m confident that we’ll emerge in a better place.

    SUMO 2 is codenamed Kitsune and is already up on GitHub.

    • http://blog.mozilla.com/axel Axel Hecht

      I seriously hope this will go well. My webapps are all over Django, which makes me both a fanboy and a grain of salt. You know all the good stuff, so here’s the salt ;-)

      Fixtures are nice for small apps, but my apps tend to grow to a size that makes it hard to come up with good test data to run interesting tests on. Not sure which side of that line the SUMO apps will fall on.

      Are there any concrete ideas on which apps to share between SUMO and AMO?

      Regarding wiki l10n, I’m happy to see that l10n is all over the notes. I’d like to add, though, that TikiWiki is the only system I have found so far that doesn’t totally suck at translating live documents. Make sure to design that one with as much input from the l10n community as possible? I’ll totally comment and help out here myself. Historically, this is risky business. MDC l10n died on “good enough for now” and hasn’t recovered since. The bad news is that you might have to start off in that area by being better than TikiWiki from the get-go. I’m aware that this is pretty much an “asshole requirement”, but we shouldn’t be surprised if SUMO needs to make up for MDC here. Again, or still, maybe?

    • http://jamessocol.com/ James

      @Axel: We probably won’t be sharing Django apps, but we are sharing libraries, middleware, and anything else we can.

      For l10n, there are really two problems: interface l10n, which we’ll handle with gettext, and content l10n. The latter is the tricky part, but there are good examples. Even AMO has localized add-on descriptions. It’s not a trivial problem, of course, and I’m glad you’re willing to help us make sure the l10n process is great!

    • http://blog.mozilla.com/axel Axel Hecht

      Well, add-on descriptions are far less lively documents than support articles, I guess.
