Bleach 1.0rc

16 Jan

After nearly a year, I’ve got something I’d like to call Bleach 1.0. But first, I want your feedback!

I incorporated some patches from the community this afternoon, and closed an issue that had been bugging me. These changes are all backwards-compatible and landed in versions 0.3.5 through 0.4.0.

Then there’s 0.5.0, which is the current version on PyPI and is backwards-incompatible only if you’re using the linkify filters (Bleach.filter_url and Bleach.filter_text).

In 0.5.0 I added a new/old API that doesn’t require a Bleach object. I say “new/old” because this is actually how it worked in the first place. As of 0.5.0, much like 0.1.0, you can do:

>>> import bleach
>>> bleach.clean('a <script>bad()</script> string')
'a &lt;script&gt;bad()&lt;/script&gt; string'

At some point, there’s a commit that says “Move to a more maintainable architecture.” I really wish I knew what I meant by that. My suspicion is that I was trying to avoid passing callables into linkify().

So why did I change my mind and alter the API completely?

As I was considering a set of patches, I started thinking about whether this was going to make the API more or less logical. I didn’t want to add a huge collection of kwargs, but I didn’t want to make it necessary to instantiate multiple Bleach objects just because you wanted slightly different options in different places. I also wasn’t thrilled with needing to subclass Bleach at all.

Thinking this through, I came to the conclusion that lots of kwargs, with sane defaults, are better than a kind-of-stateful, unnecessarily-class-based API.
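To make that trade-off concrete, here is a minimal sketch. It is not Bleach's code: Escaper, escape_text and the quote option are hypothetical stand-ins built on the standard library's html.escape. It only illustrates why per-call kwargs with defaults avoid juggling several configured objects.

```python
import html


# Class-based style (hypothetical): options live on the instance, so each
# distinct configuration needs its own object or a subclass.
class Escaper:
    def __init__(self, quote=True):
        self.quote = quote

    def clean(self, text):
        return html.escape(text, quote=self.quote)


# Function style (hypothetical): options are kwargs with sane defaults,
# chosen per call, with no state to carry around.
def escape_text(text, quote=True):
    return html.escape(text, quote=quote)


sample = '<b class="x">'

# Two configurations require two objects...
strict = Escaper()
lenient = Escaper(quote=False)
print(strict.clean(sample))
print(lenient.clean(sample))

# ...versus one function called with different kwargs.
print(escape_text(sample))
print(escape_text(sample, quote=False))
```

Both styles produce the same output; the difference is only in where the options live.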

Please look at the changes between 0.3.5 and 1.0-branch and let me know what you think.

At the very least, before creating an official 1.0, I’m going to take the time to fix all the PEP-8 violations in sanitizer.py.

Here are the recent changes, by version:

0.3.5
  • Add a strip kwarg to clean that strips blacklisted HTML instead of escaping it. (Default: False.)
0.4.0
  • Add a strip_comments kwarg to clean that strips HTML comments. (Note that comments were always stripped before; the kwarg makes this optional.) (Default: True.)
  • Add a styles kwarg to clean that takes a list of whitelisted CSS properties. (Note that before, allowing style attributes essentially allowed all CSS properties.) (Default: [].)
0.5.0
  • Add a nofollow_relative kwarg to linkify that controls setting rel="nofollow" on relative links within the text (links starting with /). (Default: True.)
  • Add the optional, class-less API. bleach.clean() works exactly like bleach.Bleach().clean(), and similarly with bleach.linkify().
  • Drop Bleach.filter_url and Bleach.filter_text. These are now kwargs passed into either version of linkify.
1.0.0rc1
  • Drop the Bleach class completely. All access is through the new API.
latest
  • Clean up PEP8 violations and some general coding style.
  • Add a skip_pre option to linkify that can skip creating links inside <pre> sections. (Default: False.)
  • Drop nofollow_relative. Fred’s concerns below are 100% valid and, even though it’s not a security issue per se, I don’t want to give false impressions.
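Fred's concern about the "starts with /" test can be illustrated with the standard library alone. This is a rough sketch, not Bleach's implementation: naive_is_relative and has_no_host are hypothetical helper names, and the second check is only meant to show where the first one goes wrong, not to serve as a security control.

```python
from urllib.parse import urlsplit


def naive_is_relative(href):
    # The check described for nofollow_relative: "starts with /".
    return href.startswith("/")


def has_no_host(href):
    # Hypothetical stricter check: treat a link as staying on the current
    # site only if it names neither a scheme nor a network location (host).
    parts = urlsplit(href)
    return not parts.scheme and not parts.netloc


# Print each href with the verdict of the naive check and the stricter one.
for href in ["/wiki/page", "//google.com", "page.html", "http://example.com/"]:
    print(href, naive_is_relative(href), has_no_host(href))
```

The two checks disagree on exactly the cases Fred raised: "//google.com" starts with a slash but points at another host, and "page.html" is relative without starting with a slash.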
 
3 Comments


  1. Fred

    17 January 2011 at 1:46 am

    Very nice, congrats on cleaning up the library :) I have a concern about the relative links setting. “Starting with /” is not a reliable way to check this. Try an href like: “//google.com” (sic). Further, relative links don’t have to start with a slash.

     
  2. James

    17 January 2011 at 10:53 am

    Hmm, a fair point. My motivating use case here is wiki-generated links in SUMO: those should definitely not have rel="nofollow", but links created with A tags should. Will have to think more about it.

     
  3. Coffee on the Keyboard » Bleach 1.0rc2

    25 January 2011 at 9:31 am

    [...] I announced Bleach 1.0rc1, a couple of important issues were found. Those have now been fixed. (Thanks, guys!) One of these [...]