After nearly a year, I’ve got something I’d like to call Bleach 1.0. But first, I want your feedback!
I incorporated some patches from the community this afternoon, and closed an issue that had been bugging me. These changes are all backwards-compatible and landed in versions 0.3.5 through 0.4.0.
Then there’s 0.5.0, which is the current version on PyPI and is only backwards-incompatible if you’re using the linkify filters.
In 0.5.0 I added a new/old API that doesn’t require a Bleach object. I say “new/old” because this is actually how it worked in the first place. As of 0.5.0, much like 0.1.0, you can do:
>>> import bleach
>>> bleach.clean('a <script>bad()</script> string')
'a &lt;script&gt;bad()&lt;/script&gt; string'
At some point, there’s a commit that says “Move to a more maintainable architecture.” I really wish I knew what I meant by that. My suspicion is that I was trying to avoid passing callables into linkify().
So why did I change my mind and alter the API completely?
As I was considering a set of patches, I started thinking about whether this was going to make the API more or less logical. I didn’t want to add a huge collection of kwargs, but I didn’t want to make it necessary to instantiate multiple Bleach objects just because you wanted slightly different options in different places. I also wasn’t thrilled with needing to subclass Bleach at all.
Thinking this through, I came to the conclusion that lots of kwargs, with sane defaults, is better than a kind-of-stateful, unnecessarily-class-based API.
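To make the trade-off concrete, here is a minimal sketch of the "kwargs with sane defaults" shape. It is a toy stand-in, not Bleach's implementation and not a safe sanitizer: `toy_clean`, `DEFAULT_TAGS`, and the regex are hypothetical names invented for illustration, and only the `tags` and `strip` options are modelled.

```python
import html
import re

# Hypothetical default whitelist, for illustration only (not Bleach's real list).
DEFAULT_TAGS = ("a", "b", "em", "strong")

# Very rough tag matcher; a real sanitizer parses HTML instead of using a regex.
_TAG_RE = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def toy_clean(text, tags=DEFAULT_TAGS, strip=False):
    """Toy module-level function: every option is a kwarg with a default."""
    def handle(match):
        if match.group(1).lower() in tags:
            return match.group(0)  # whitelisted tag: keep as written
        # Disallowed tag: drop it when strip=True, otherwise escape it.
        return "" if strip else html.escape(match.group(0))

    return _TAG_RE.sub(handle, text)


# Different options in different places, with no object to instantiate or subclass.
escaped = toy_clean("a <script>bad()</script> string")
stripped = toy_clean("a <script>bad()</script> string", strip=True)
no_tags = toy_clean("<b>hi</b>", tags=())
```

Each call site states only the options it cares about, which is the property I was after when I decided against keeping per-instance state.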
Please look at the changes between 0.3.5 and 1.0-branch and let me know what you think.
At the very least, before creating an official 1.0, I’m going to take the time to fix all the PEP-8 violations in sanitizer.py.
Here are the recent changes, by version:
- 0.3.5
  - Add a `strip` kwarg to `clean` that strips blacklisted HTML instead of escaping it. (Default: `False`.)
- 0.4.0
  - Add a `strip_comments` kwarg to `clean` that strips HTML comments. (Note that this always happened before.) (Default: `True`.)
  - Add a `styles` kwarg to `clean` that takes a list of whitelisted CSS properties. (Note that before, allowing `style` attributes essentially allowed all CSS properties.) (Default: `[]`.)
- 0.5.0
  - Add a `nofollow_relative` kwarg to `linkify` that controls setting `rel="nofollow"` on relative links within the text (links starting with `/`). (Default: `True`.)
  - Add the optional, class-less API. `bleach.clean()` works exactly like `bleach.Bleach().clean()`, and similarly with `bleach.linkify()`.
  - Drop `Bleach.filter_url` and `Bleach.filter_text`. These are now kwargs passed into either version of `linkify`.
- 1.0.0rc1
  - Drop the `Bleach` class completely. All access is through the new API.
- latest
  - Clean up PEP8 violations and some general coding style.
  - Add a `skip_pre` option to `linkify` that can skip creating links inside `<pre>` sections. (Default: `False`.)
  - Drop `nofollow_relative`. Fred’s concerns below are 100% valid and, even though it’s not a security issue per se, I don’t want to give false impressions.
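The problem with treating "starts with `/`" as "relative link" can be shown with the standard library alone. This is a sketch of the concern, not the check Bleach used or should use; `starts_with_slash` and `has_no_scheme_or_host` are hypothetical helper names.

```python
from urllib.parse import urlsplit


def starts_with_slash(href):
    """The naive test: any href beginning with '/' counts as relative."""
    return href.startswith("/")


def has_no_scheme_or_host(href):
    """Hypothetical stricter test: the href names neither a scheme nor a host."""
    parts = urlsplit(href)
    return not parts.scheme and not parts.netloc


# Compare the two tests on a few hrefs.
for href in ["/wiki/Home", "//google.com", "docs/page.html", "http://example.com/"]:
    print(href, starts_with_slash(href), has_no_scheme_or_host(href))
```

The naive test accepts `//google.com`, which points at another host, and rejects `docs/page.html`, which is relative but has no leading slash.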
Fred
17 January 2011 at 1:46 am
Very nice, congrats on cleaning up the library
I have a concern about the relative links setting. “Starting with /” is not a reliable way to check this. Try an href like: “//google.com” (sic). Further, relative links don’t have to start with a slash.
James
17 January 2011 at 10:53 am
Hmm, a fair point. My motivating use case here is wiki-generated links in SUMO: those should definitely not have rel="nofollow", but links created with A tags should. Will have to think more about it.