
Posts Tagged ‘python’

Bleach 1.0rc

16 Jan

After nearly a year, I’ve got something I’d like to call Bleach 1.0. But first, I want your feedback!

I incorporated some patches from the community this afternoon, and closed an issue that had been bugging me. These are all backwards-compatible changes made between versions 0.3.5 and 0.4.0.

Then there’s 0.5.0, which is the current version on PyPI and is only backwards-incompatible if you’re using the linkify filters.

In 0.5.0 I added a new/old API that doesn’t require a Bleach object. I say “new/old” because this is actually how it worked in the first place. As of 0.5.0, much like 0.1.0, you can do:

>>> import bleach
>>> bleach.clean('a <script>bad()</script> string')
'a &lt;script&gt;bad()&lt;/script&gt; string'

At some point, there’s a commit that says “Move to a more maintainable architecture.” I really wish I knew what I meant by that. My suspicion is that I was trying to avoid passing callables into linkify().

So why did I change my mind and alter the API completely?

As I was considering a set of patches, I started thinking about whether this was going to make the API more or less logical. I didn’t want to add a huge collection of kwargs, but I didn’t want to make it necessary to instantiate multiple Bleach objects just because you wanted slightly different options in different places. I also wasn’t thrilled with needing to subclass Bleach at all.

Thinking this through, I came to the conclusion that lots of kwargs, with sane defaults, is better than a kind-of-stateful, unnecessarily-class-based API.
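
To make that trade-off concrete, here is a toy sketch. It is not Bleach's code: toy_clean, ToySanitizer, and the regex are hypothetical stand-ins built on the standard library, and the regex is nowhere near safe enough for real HTML. It only shows how the same option looks as instance state versus as a keyword argument with a default.

```python
import html
import re

# Toy stand-in for real sanitizing; a regex like this is NOT safe for real HTML.
TAG_RE = re.compile(r"<[^>]*>")


def toy_clean(text, strip=False):
    """Escape markup by default; remove tags instead when strip=True."""
    if strip:
        return TAG_RE.sub("", text)
    return html.escape(text, quote=False)


class ToySanitizer:
    """Class-based style: the option is instance state, fixed up front."""

    def __init__(self, strip=False):
        self.strip = strip

    def clean(self, text):
        return toy_clean(text, strip=self.strip)


text = "a <b>bold</b> string"

# Class-based: one object per combination of options.
escaper = ToySanitizer()
stripper = ToySanitizer(strip=True)
class_results = (escaper.clean(text), stripper.clean(text))

# Function with kwargs: the option travels with the call, no objects to manage.
func_results = (toy_clean(text), toy_clean(text, strip=True))
```

Both styles produce the same output here; the difference is only in how many objects the caller has to create and keep track of when different places need slightly different options.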

Please look at the changes between 0.3.5 and 1.0-branch and let me know what you think.

At the very least, before creating an official 1.0, I’m going to take the time to fix all the PEP-8 violations in sanitizer.py.

Here are the recent changes, by version:

0.3.5
  • Add a strip kwarg to clean that strips blacklisted HTML instead of escaping it. (Default: False.)
0.4.0
  • Add a strip_comments kwarg to clean that strips HTML comments. (Note that this always happened before.) (Default: True.)
  • Add a styles kwarg to clean that takes a list of whitelisted CSS properties. (Note that before, allowing style attributes essentially allowed all CSS properties.) (Default: [].)
0.5.0
  • Add a nofollow_relative kwarg to linkify that controls setting rel="nofollow" on relative links within the text (links starting with /). (Default: True.)
  • Add the optional, class-less API. bleach.clean() works exactly like bleach.Bleach().clean(), and similarly with bleach.linkify().
  • Drop Bleach.filter_url and Bleach.filter_text. These are now kwargs passed into either version of linkify.
1.0.0rc1
  • Drop the Bleach class completely. All access is through the new API.
latest
  • Clean up PEP8 violations and some general coding style.
  • Add a skip_pre option to linkify that can skip creating links inside <pre> sections. (Default: False.)
  • Drop nofollow_relative. Fred’s concerns below are 100% valid and, even though it’s not a security issue per se, I don’t want to give false impressions.
 
3 Comments

Posted in Articles

 

An End and a Beginning

03 Nov

2010 is coming to a close, and, with it, the end of our year-long project to create a new platform for support.mozilla.com (SUMO) is in sight.

For the past year, developing the new platform has been our focus and has affected our roadmap. When 2011 starts, we’ll begin a new chapter for SUMO. It’s a very exciting time!

For the developers, this is the end of the investment phase and the beginning of the payoff for Kitsune, the code-name for our new platform.

  • The entire site will be faster.
  • We’ll be done rebuilding existing features, and can work on brand new features.
  • We’ll be free of our legacy code base, which will simplify some important sections of Kitsune.
  • We’ll be working in smaller, faster cycles.
  • We’ll be able to take time to circle back and fix things we’ve been unhappy about but were willing to live with during the migration to Kitsune.
  • We’ll be more effective at making the site even faster.
  • We’ll apply our new theme to the entire site, making the experience more consistent and seamless.
  • We’ll be more agile, able to respond to issues faster.
  • We’ll be able to parallelize more.
  • We’ll be able to push updates to the site far more frequently—and we’ve averaged releases every two weeks since August!
  • We’ll be able to take the time we need for large features and disruptive changes without blocking work on, or release of, smaller features and fixes.
  • Nagging issues with sessions will go away.
  • It will be easier to keep our entire platform up to date.
  • We’ll be free of an entire class of security issues.

We’re just beginning to work on our roadmap for Q1, 2011. A lot of it is still up in the air, but there are some fun things on there. And we’ll be taking time to improve performance even more.

In the meantime, we’re getting very close to feature complete on SUMO 2.3, which will move the Knowledge Base over to the new platform.

After 2.3, there will be only one major release left in 2010, SUMO 2.4. SUMO 2.4 will be much smaller than 2.3 (maybe 1/10th the number of bugs) and will come out in a matter of weeks instead of months. But 2.4 will also be huge, in that it will move the final piece over to the new platform.

 

 

Django Fixtures with Circular Foreign Keys

29 Sep

If you create a nice, perfectly normalized database, you (probably) won’t ever run into circular foreign keys (when a row in table A references a row in table B that references the same row in table A).

In the real world, this happens pretty regularly. The most common situation is a “current” or “last” denormalization. You don’t really want to do a subquery with a sort every time you want to know the latest post in a forum thread, or current revision of a wiki page.

The problem—one we’ve been dealing with since we decided to rebuild SUMO—is that trying to load data with circular foreign keys produces a “chicken and the egg” situation: since each row depends on the other, neither can be loaded first.

(This is part of a bigger problem with MySQL, which is that it lacks deferred foreign key checks.)

The solution to this is to temporarily disable foreign key checks while you load in data. It’s not hard, but Django is so far unwilling to do it.
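
As a rough illustration of the idea, and not the test-utils patch itself, here is a minimal sketch using SQLite from the standard library as a stand-in for MySQL. The thread and post tables and the helper names are hypothetical. In MySQL the equivalent switch is the foreign_key_checks session variable; SQLite uses PRAGMA foreign_keys.

```python
import sqlite3
from contextlib import contextmanager

# Autocommit mode: SQLite ignores PRAGMA foreign_keys inside a transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema with a circular reference: each thread points at its
# last post, and each post points back at its thread.
conn.execute(
    "CREATE TABLE thread (id INTEGER PRIMARY KEY, "
    "last_post_id INTEGER NOT NULL REFERENCES post(id))"
)
conn.execute(
    "CREATE TABLE post (id INTEGER PRIMARY KEY, "
    "thread_id INTEGER NOT NULL REFERENCES thread(id))"
)


def load_fixtures(conn):
    """Insert one thread and one post that reference each other."""
    conn.execute("INSERT INTO thread (id, last_post_id) VALUES (1, 1)")
    conn.execute("INSERT INTO post (id, thread_id) VALUES (1, 1)")


@contextmanager
def foreign_key_checks_disabled(conn):
    """Turn checks off for the duration of the block, then back on."""
    conn.execute("PRAGMA foreign_keys = OFF")
    try:
        yield
    finally:
        conn.execute("PRAGMA foreign_keys = ON")


# With checks on, the first insert fails: the post it points at does not exist yet.
try:
    load_fixtures(conn)
    loaded_with_checks = True
except sqlite3.IntegrityError:
    loaded_with_checks = False

# With checks off, both rows go in, and the data is consistent afterwards.
with foreign_key_checks_disabled(conn):
    load_fixtures(conn)

violations = conn.execute("PRAGMA foreign_key_check").fetchall()
thread_count = conn.execute("SELECT COUNT(*) FROM thread").fetchone()[0]
post_count = conn.execute("SELECT COUNT(*) FROM post").fetchone()[0]
```

The try/finally in the helper is there so that checks are switched back on even if loading fails partway through.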

Well, now we get the chance to see if their concerns are realistic: with the latest commit to Jeff Balogh’s test-utils package for Django, we’re disabling foreign key checks during fixture loading.

Both SUMO and AMO have had to do some acrobatic hackery to get around the limit. This solution is definitely a filthy hack, but it’s contained in a single, small place, rather than spread throughout test cases in multiple projects.

Suggestions for improving this hideous monkey patch are welcome, but in the meantime I’ll be removing the gross parts from Kitsune that we needed to work around this.

 

 

Code-sharing Update

08 Mar

When we decided to move SUMO to a new platform, one of the reasons we chose Django was code sharing and reuse—specifically that SUMO and AMO would be able to share code, meaning both teams would save time and see benefits.

So how is that going? Were we right in our assumption here? The code we’re sharing so far:

MultiDB Router
A Django DB router that supports reading from a pool of slave databases.
Cache Machine
A powerful caching library for Django that, in particular, provides automatic object caching and invalidation through the ORM.
Jingo
An adapter for using Jinja2 templates with Django.
Django-Nose
A test runner for Django using Nose.
Django Debug Cache Panel
Adds a cache panel for Django Debug Toolbar.
Test-Utils
Tools we use for testing in the Django/Jinja2/Nose setup.
Bleach
A library for sanitizing and linkifying user HTML, based on html5lib.
Fixture Magic
Django management commands for working with fixture data.

Additionally, we expect both teams will probably use the following, eventually:

DidYouMean
A wrapper for Hunspell, using PyHunspell to provide spelling suggestions for searches.
Django Gearman
Provides an easier interface from Django to the Python Gearman bindings.
AMO’s JS and CSS minification
AMO has already solved the problem of JS and CSS minification with Django and Jinja2.

And it’s not a released library, but SUMO has also been able to directly reuse code from AMO to simplify pagination.

Overall, it seems like we’re doing really well on this! It’s great to see the projects not just sharing code, but packaging and publishing it on GitHub and PyPI. If any of the above is useful to you, go ahead and try it out! You can open issues with any of the packages on GitHub, or find us in #webdev on irc.mozilla.org.

 

 

Bleach, HTML sanitizer and auto-linker

25 Feb

Bleach is a whitelist-based HTML sanitizer and auto-linker in Python, built on html5lib, for AMO and SUMO and released under the BSD license.

Bleach has two main functions: sanitizing HTML based on a whitelist of tags and attributes, and turning URLs into links. It uses html5lib for both.

For more information on using Bleach, see the README included in the source. For more info on how Bleach works, follow below the jump.

 