Bleach 1.0rc

After nearly a year, I’ve got something I’d like to call Bleach 1.0. But first, I want your feedback!

I incorporated some patches from the community this afternoon, and closed an issue that had been bugging me. These are all available in backwards-compatible changes between versions 0.3.5 and 0.4.0.

Then there’s 0.5.0, which is the current version on PyPI and is only backwards-incompatible if you’re using the linkify filters.

In 0.5.0 I added a new/old API that doesn’t require a Bleach object. I say “new/old” because this is actually how it worked in the first place. As of 0.5.0, much like 0.1.0, you can do:

import bleach >>> bleach.clean('a string') 'a string'

At some point, there’s a commit that says “Move to a more maintainable architecture.” I really wish I knew what I meant by that. My suspicion is that I was trying to avoid passing callables into linkify().

So why did I change my mind and alter the API completely?

As I was considering a set of patches, I started thinking about whether this was going to make the API more or less logical. I didn’t want to add a huge collection of kwargs, but I didn’t want to make it necessary to instantiate multiple Bleach objects just because you wanted slightly different options in different places. I also wasn’t thrilled with needing to subclass Bleach at all.

Thinking this through, I came to the conclusion that lots of kwargs, with sane defaults, is better than a kind-of-stateful, unnecessarily-class-based API.

Please look at the changes between 0.3.5 and 1.0-branch and let me know what you think.

At the very least, before creating an official 1.0, I’m going to take the time to fix all the PEP-8 violations in sanitizer.py.

Here are the recent changes, by version:

0.3.5
- Add a `strip` kwarg to `clean` that strips blacklisted HTML instead of escaping it. (Default: `False`.)
0.4.0
- Add a `strip_comments` kwarg to `clean` that strips HTML comments. (Note that this always happened before.) (Default: `True`.) - Add a `styles` kwarg to `clean` that takes a list of whitelisted CSS properties. (Note that before, allowing `style` attributes essentially allowed all CSS properties.) (Default: `[]`.)
0.5.0
- Add a `nofollow_relative` kwarg to `linkify` that controls setting `rel="nofollow"` on relative links within the text (links starting with `/`). (Default: `True`.) - Add the optional, class-less API. `bleach.clean()` works exactly like `bleach.Bleach().clean()`, and similarly with `bleach.linkify()`. - Drop `Bleach.filter_url` and `Bleach.filter_text`. These are now kwargs passed into either version of `linkify`.
1.0.0rc1
- Drop the `Bleach` class completely. All access is through the new API.
latest
- Clean up PEP8 violations and some general coding style. - Add a `skip_pre` option to `linkify` that can skip creating links inside `
` sections. (Default: `False`.)
- Drop `nofollow_relative`. Fred’s concerns below are 100% valid and, even though it’s not a security issue per se, I don’t want to give false impressions.