<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Coffee on the Keyboard &#187; html</title>
	<atom:link href="http://coffeeonthekeyboard.com/tag/html/feed/" rel="self" type="application/rss+xml" />
	<link>http://coffeeonthekeyboard.com</link>
	<description>by James Socol</description>
	<lastBuildDate>Fri, 20 Apr 2012 22:17:14 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/>		<item>
		<title>Bleach, HTML sanitizer and auto-linker</title>
		<link>http://coffeeonthekeyboard.com/bleach-html-sanitizer-and-auto-linker-for-django-344/</link>
		<comments>http://coffeeonthekeyboard.com/bleach-html-sanitizer-and-auto-linker-for-django-344/#comments</comments>
		<pubDate>Thu, 25 Feb 2010 19:22:00 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[django]]></category>
		<category><![CDATA[html]]></category>
		<category><![CDATA[mozilla]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[sanitize]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[user]]></category>

		<guid isPermaLink="false">http://coffeeonthekeyboard.com/?p=344</guid>
		<description><![CDATA[Bleach is a whitelist-based HTML sanitizer and auto-linker in Python, built on html5lib, for AMO and SUMO and released under the BSD license. Bleach has two main functions: sanitizing HTML based on a whitelist of tags and attributes, and turning URLs into links. It uses html5lib for both. For more information on using Bleach, see [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://github.com/jsocol/bleach">Bleach</a> is a whitelist-based HTML sanitizer and auto-linker in Python, built on <a href="http://code.google.com/p/html5lib/">html5lib</a>, for <a href="https://addons.mozilla.org/">AMO</a> and <a href="http://support.mozilla.com/">SUMO</a> and released under the BSD license.</p>
<p>Bleach has two main functions: sanitizing HTML based on a whitelist of tags and attributes, and turning URLs into links. It uses html5lib for both.</p>
<p>For more information on using Bleach, see the <a href="http://github.com/jsocol/bleach/blob/master/README.rst">README</a> included in the source. For more info on how Bleach works, follow below the jump.<span id="more-344"></span></p>
<h3>Sanitizing HTML</h3>
<p>Bleach&#8217;s <code>clean()</code> function uses a slightly customized version of html5lib&#8217;s <code>HTMLSanitizer</code> tokenizer that adds support for per-tag attribute whitelists. Anything that is not a whitelisted tag or a valid entity will be escaped. Legitimate entities and whitelisted tags are passed through. The default whitelist is set up for AMO.</p>
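<p>To make the per-tag attribute whitelist idea concrete, here is a minimal sketch in Python. It is not Bleach&#8217;s code and it does not use html5lib: it assumes the standard library&#8217;s <code>html.parser</code>, and the names <code>ALLOWED</code>, <code>CleanSketch</code> and <code>clean_sketch</code> are made up for illustration. It does not check URL schemes in <code>href</code>, so please don&#8217;t treat it as a real sanitizer.</p>

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> attribute names allowed on that tag.
ALLOWED = {"a": {"href", "title"}, "em": set(), "strong": set()}


class CleanSketch(HTMLParser):
    """Keep whitelisted tags and attributes; escape everything else."""

    def __init__(self, allowed):
        super().__init__(convert_charrefs=True)
        self.allowed = allowed
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.allowed:
            # Not whitelisted: show the tag as text instead of markup.
            self.out.append(escape(self.get_starttag_text()))
            return
        # Per-tag attribute whitelist: drop attributes this tag may not carry.
        kept = [(k, v) for k, v in attrs if k in self.allowed[tag]]
        rendered = "".join(
            ' %s="%s"' % (k, escape(v or "", quote=True)) for k, v in kept
        )
        self.out.append("<%s%s>" % (tag, rendered))

    def handle_endtag(self, tag):
        closing = "</%s>" % tag
        self.out.append(closing if tag in self.allowed else escape(closing))

    def handle_data(self, data):
        # Text arrives with entities decoded, so re-escape it on the way out.
        self.out.append(escape(data, quote=False))


def clean_sketch(html_text, allowed=ALLOWED):
    parser = CleanSketch(allowed)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

<p>With this whitelist, <code>clean_sketch()</code> keeps an <code>em</code> tag but strips its <code>onclick</code> attribute, and a <code>script</code> tag comes out as escaped text rather than markup.</p>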
<h3>Linkifying Text</h3>
<p>The <code>linkify()</code> function is a little more complicated. Naïve implementations usually rely on a simple regular expression to find URL-like strings, but this quickly becomes insufficient when you need to handle situations like these:</p>
<ul>
<li><code>&lt;em&gt;http://example.com&lt;/em&gt;</code> (should be linkified)</li>
<li><code>&lt;a href="http://example.com"&gt;test&lt;/a&gt;</code> (already linked, no need to linkify)</li>
<li><code>&lt;a href="http://example.com"&gt;http://example.com&lt;/a&gt;</code> (really don&#8217;t need to linkify)</li>
<li><code>&lt;em&gt;http://xx.com &lt;a href="http://example.com"&gt;http://example.com&lt;/a&gt;&lt;/em&gt;</code> (regular expression freak-out)</li>
</ul>
<p>So <code>linkify()</code> actually uses html5lib to build a document fragment and walks it, only applying the naïve regular expression in safe locations. In pseudocode:</p>
<div class="dean_ch" style="white-space: wrap;">
<ol>
<li class="li1">
<div class="de1">tree = parseFragment<span class="br0">&#40;</span><span class="kw2">input</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">linkify_nodes <span class="br0">&#40;</span>tree<span class="br0">&#41;</span>:</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw1">for</span> node <span class="kw1">in</span> tree:</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> node <span class="kw1">is</span> a text node:</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; replace node with text nodes <span class="kw1">and</span> links</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">else</span> <span class="kw1">if</span> node <span class="kw1">is</span> a link:</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> nofollow:</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">set</span> rel=<span class="st0">&quot;nofollow&quot;</span> on node</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">else</span>:</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; linkify_nodes<span class="br0">&#40;</span>node.<span class="me1">childNodes</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">return</span> <span class="kw3">string</span><span class="br0">&#40;</span>linkify_nodes<span class="br0">&#40;</span>tree<span class="br0">&#41;</span><span class="br0">&#41;</span></div>
</li>
</ol>
</div>
<p>This avoids attempting to apply the regular expression to things like tag attributes, the inside of <code>&lt;a&gt;</code> tags, and other places it should generally be avoided. It also lets us do things like set the <code>rel</code> attribute on links already in the text and pass the <code>href</code> attribute through the same filter it would go through if we created the link. This filter lets us redirect links through an outbound redirect, so people know they&#8217;re leaving a Mozilla site. You could do other things with it, like rickroll your visitors. That&#8217;s up to you.</p>
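<p>Here is a rough, runnable take on the same idea in Python. It is a sketch under assumptions, not what Bleach does: instead of building an html5lib tree it streams tokens from the standard library&#8217;s <code>html.parser</code> and counts how deep it is inside <code>&lt;a&gt;</code> elements. The names <code>URL_RE</code>, <code>LinkifySketch</code> and <code>linkify_sketch</code> are hypothetical, the URL pattern is deliberately naïve, and there is no <code>href</code> filter. It also drops comments, writes self-closing tags as an open/close pair, and will split a URL that contains an entity.</p>

```python
import re
from html import escape
from html.parser import HTMLParser

# Deliberately naive URL pattern; a real linkifier needs a more careful one.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


class LinkifySketch(HTMLParser):
    """Re-emit markup, linkifying URLs only in text outside any <a> element."""

    def __init__(self, nofollow=True):
        super().__init__(convert_charrefs=False)
        self.nofollow = nofollow
        self.out = []
        self.a_depth = 0  # how many <a> elements we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.a_depth += 1
            if self.nofollow:
                # Links already in the text get rel="nofollow" as well.
                attrs = [(k, v) for k, v in attrs if k != "rel"]
                attrs.append(("rel", "nofollow"))
        rendered = "".join(
            ' %s="%s"' % (k, escape(v or "", quote=True)) for k, v in attrs
        )
        self.out.append("<%s%s>" % (tag, rendered))

    def handle_endtag(self, tag):
        if tag == "a" and self.a_depth:
            self.a_depth -= 1
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if self.a_depth:
            self.out.append(data)  # already linked: leave the text alone
        else:
            self.out.append(URL_RE.sub(self._link, data))

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)  # pass named entities through

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)  # pass numeric entities through

    def _link(self, match):
        url = match.group(0)
        rel = ' rel="nofollow"' if self.nofollow else ""
        return '<a href="%s"%s>%s</a>' % (escape(url, quote=True), rel, url)


def linkify_sketch(html_text, nofollow=True):
    parser = LinkifySketch(nofollow=nofollow)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

<p>Run against the four cases listed earlier, the bare URL inside <code>&lt;em&gt;</code> gets wrapped in a link, while text that already sits inside an <code>&lt;a&gt;</code> is left alone, even in the mixed &#8220;freak-out&#8221; case.</p>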
<h3>Bad HTML</h3>
<p>Because both <code>clean()</code> and <code>linkify()</code> use <code>html5lib</code> and construct document trees, using either will fix up code mistakes, like unclosed tags, and escape bare entities. <code>linkify()</code> allows basically every tag and attribute, so if you need to limit the legal HTML to a subset, use <code>clean()</code> (or the shortcut <code>bleach()</code> to clean then linkify).</p>
<h3>Getting Bleach</h3>
<p>Bleach is <a href="http://github.com/jsocol/bleach">available on GitHub</a>, or can be installed via <code>pip</code> or <code>easy_install</code>. Improvements and test cases are very welcome! Actually, there&#8217;s one disabled test right now that is not supported. If you can make it work, that would be pretty great!</p>
]]></content:encoded>
			<wfw:commentRss>http://coffeeonthekeyboard.com/bleach-html-sanitizer-and-auto-linker-for-django-344/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Work Pattern: Designing Web Sites</title>
		<link>http://coffeeonthekeyboard.com/work-pattern-designing-web-sites-93/</link>
		<comments>http://coffeeonthekeyboard.com/work-pattern-designing-web-sites-93/#comments</comments>
		<pubDate>Mon, 26 May 2008 15:40:02 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[Accessibility]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[CSS]]></category>
		<category><![CDATA[html]]></category>
		<category><![CDATA[Standards]]></category>
		<category><![CDATA[work pattern]]></category>
		<category><![CDATA[workflow]]></category>
		<category><![CDATA[xhtml]]></category>

		<guid isPermaLink="false">http://coffeeonthekeyboard.com/work-pattern-designing-web-sites-93/</guid>
		<description><![CDATA[The premise of Design Patterns is that similar problems have similar solutions. In the same vein, I propose this Work Pattern: a set of common steps I use when I create a web site, and that maybe you can use, too. Elements and Outline My first step is usually to create an un-styled outline of a [...]]]></description>
			<content:encoded><![CDATA[<p>The premise of <a href="http://en.wikipedia.org/wiki/Design_pattern_(computer_science)">Design Patterns</a> is that similar problems have similar solutions. In the same vein, I propose this <strong>Work Pattern</strong>: a set of common steps I use when I create a web site, and that maybe you can use, too.</p>
<h4>Elements and Outline</h4>
<p>My first step is usually to create an un-styled outline of a &#8220;typical&#8221; page. I fire up my editor, fill in the basic <abbr title="eXtensible HyperText Markup Language">XHTML</abbr>, and then go to work inside the <code>&lt;body&gt;</code> tag.</p>
<p>Most sites have this fairly common structure: header, content, footer. And just for fun, let&#8217;s throw in navigation between the header and the content. It&#8217;s pretty easy to represent this in XHTML:</p>
<pre>&lt;div id="header"&gt;
&lt;/div&gt;

&lt;div id="navigation"&gt;
&lt;/div&gt;

&lt;div id="content"&gt;
&lt;/div&gt;

&lt;div id="footer"&gt;
&lt;/div&gt;</pre>
<p>This is my first skeleton for &gt;90% of the sites I design. It&#8217;s a very standard document. Sometimes navigation will be inside the header, but most often it goes like this.</p>
<p>Now you have to start thinking about what elements will be on the page. On this site, a blog, I used &#8220;articles&#8221; instead of &#8220;content&#8221; for the main div. I also added two side bars, and I knew that inside the articles div I&#8217;d want, well, articles.</p>
<pre>&lt;div id="header"&gt;
  &lt;h1&gt;Page Title&lt;/h1&gt;
&lt;/div&gt;

&lt;div id="navigation"&gt;
  &lt;ul&gt;
    &lt;li&gt;Link 1&lt;/li&gt;
    &lt;li&gt;Link 2&lt;/li&gt;
  &lt;/ul&gt;
&lt;/div&gt;

&lt;div id="articles"&gt;
  &lt;h2&gt;Recent Articles&lt;/h2&gt;
  &lt;div class="article"&gt;
    &lt;h3&gt;Article Title&lt;/h3&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;div id="theblog"&gt;
  &lt;h2&gt;Sidebar heading&lt;/h2&gt;
  &lt;p&gt;Sidebar paragraph&lt;/p&gt;
&lt;/div&gt;

&lt;div id="theworld"&gt;
  &lt;h2&gt;Sidebar heading&lt;/h2&gt;
  &lt;ul&gt;
    &lt;li&gt;Sidebar&lt;/li&gt;
    &lt;li&gt;list&lt;/li&gt;
  &lt;/ul&gt;
&lt;/div&gt;

&lt;div id="footer"&gt;
&lt;/div&gt;</pre>
<p class="image right"><img src="http://coffeeonthekeyboard.com/wp-content/uploads/2008/05/skeleton.png" alt="An Un-Styled Skeleton" /></p>
<p>I won&#8217;t bore you with more code examples; I think you get the idea. I <a href="http://coffeeonthekeyboard.com/building-accessible-sites-part-three-in-a-trilogy-69/" title="make an outline">make an outline</a>. I know at this point that my source is nice and valid, and that it will make sense when I <a href="http://coffeeonthekeyboard.com/assessing-accessibility-part-two-in-a-trilogy-67/" title="turn off the stylesheet">turn off the stylesheet</a>. I use <a href="http://coffeeonthekeyboard.com/use-semantics-to-guide-design-53/" title="semantic names">semantic names</a> for everything.</p>
<p>It&#8217;s not very pretty, but I now have a workable XHTML document, with a properly-nested outline, and most of the important elements. Good for me, because now I can start to style them.</p>
<h4>Layout and Style</h4>
<p>Now, I know what visual elements will need to go on the page. I know what page elements I need to style. Now I&#8217;ll start creating a style sheet.</p>
<p>My first style sheet will contain a few basic <abbr title="HyperText Markup Language">HTML</abbr> tags and the elements of my document. I could probably write an <abbr title="eXtensible Markup Language">XML</abbr>-to-<abbr title="Cascading Style Sheets">CSS</abbr> generator, given how strict I am about this step.</p>
<p>Ok, one more code example:</p>
<pre>body {}

h1,
h2,
h3,
h4,
h5,
h6 {}

a:link {}
a:visited {}
a:hover {}

#header {}
#header h1 {}

#navigation {}
#navigation ul {}
#navigation ul li {}

#articles {}
#articles h2 {}
#articles div.article {}
#articles div.article h3 {}

#theblog {}
#theblog h2 {}

#theworld {}
#theworld h2 {}
#theworld ul {}
#theworld ul li {}

#footer {}</pre>
<p>One of my favorite things about this is it&#8217;s almost impossible for a mistake in one section to mess up anything else.</p>
<p>But obviously there&#8217;s a lot in there I can combine, can shorten. Almost anything that&#8217;s true for <code>#theblog</code> will also be true for <code>#theworld</code> in this case, so <abbr title="Don't Repeat Yourself">DRY</abbr>, and keep things together as much as you can. But, when you&#8217;re just starting the style sheet, this is a good place to start.</p>
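<p>For the curious, the XML-to-CSS generator I joked about might look something like this in Python. It&#8217;s only a sketch: it assumes the outline is well-formed XHTML so that <code>xml.etree.ElementTree</code> can parse it, the names <code>OUTLINE</code>, <code>selector_for</code>, <code>skeleton_rules</code> and <code>css_skeleton</code> are invented for the example, and it only looks at the first class on an element.</p>

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the outline, wrapped in <body> so there is a single root.
OUTLINE = """
<body>
  <div id="header"><h1>Page Title</h1></div>
  <div id="navigation"><ul><li>Link 1</li><li>Link 2</li></ul></div>
  <div id="articles">
    <h2>Recent Articles</h2>
    <div class="article"><h3>Article Title</h3></div>
  </div>
</body>
"""


def selector_for(element):
    """Name an element as #id, tag.class, or plain tag, in that order."""
    if element.get("id"):
        return "#" + element.get("id")
    if element.get("class"):
        return "%s.%s" % (element.tag, element.get("class").split()[0])
    return element.tag


def skeleton_rules(element, prefix=""):
    """Yield one empty rule per distinct child selector, in document order."""
    seen = set()
    for child in element:
        selector = (prefix + " " + selector_for(child)).strip()
        if selector in seen:
            continue  # e.g. the second <li> repeats the first one's selector
        seen.add(selector)
        yield selector + " {}"
        yield from skeleton_rules(child, selector)


def css_skeleton(xhtml):
    return "\n".join(skeleton_rules(ET.fromstring(xhtml)))


print(css_skeleton(OUTLINE))
```

<p>Each rule comes out empty and fully qualified, which is the point: the combining and shortening still happens by hand afterwards.</p>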
<p>As I&#8217;m going, I add a lot to the style sheet. I also add a lot to the XHTML template. Pixels get tweaked left and right and I swear at <abbr title="Internet Explorer 6">IE6</abbr>, of course.</p>
<h4>Building Templates</h4>
<p>Once I have a complete, or near-complete, <a href="http://coffeeonthekeyboard.com/wp-content/themes/mock.htm">mock up</a>, it&#8217;s time to start building templates for your <abbr title="Content Management System">CMS</abbr> of choice. This is mostly copy-and-paste work at this point. Your <code>#header</code> and <code>#navigation</code> go into the header template. <code>#footer</code> goes into footer. <code>#content</code> goes in the content template.</p>
<p>See how easy that is?</p>
<p>Then you get to go through and actually add the template markup. Whether it&#8217;s Smarty or PHP or ASP doesn&#8217;t really matter; you just replace your dummy text with the right tags.</p>
<h4>Starting Out</h4>
<p>I love this process, but there is one thing you really need for it to go smoothly:</p>
<p>You need to know what kind of content you&#8217;ll have. When you&#8217;re redesigning your blog, or building an in-house site, it&#8217;s pretty easy to know. When you&#8217;re working for a client, you may need to twist some arms to get this information. (I love <a href="http://www.alistapart.com/articles/designbymetaphor">this A List Apart article</a> for advice on communicating with clients.)</p>
<p>One final thought: use comments. Any time I create a div, I wrap it in comments like this:</p>
<pre>&lt;!-- begin #articles --&gt;
&lt;div id="articles"&gt;
&lt;/div&gt;
&lt;!-- end #articles --&gt;</pre>
<p>I usually use the CSS selector because it&#8217;s specific, so <code>#articles</code>, <code>.article</code>, and so on. These comments—which I left out here to save space—have saved me so much time and effort compared to relying on indentation that I can&#8217;t imagine working without them.</p>
<p>I didn&#8217;t set out this process as a way to streamline my work, but rather, as I started noticing patterns that worked well, I started thinking about the process. Much like <a href="http://rubyonrails.org/">Rails</a>, which was already running <a href="http://www.basecamphq.com/">Basecamp</a> before it was a framework, I&#8217;ve been using more-and-more-polished versions of this work flow for months.</p>
<p>Maybe you&#8217;ll find it helpful, maybe not. Maybe you already have a &#8220;system&#8221; in place. If you do, what is it?</p>
]]></content:encoded>
			<wfw:commentRss>http://coffeeonthekeyboard.com/work-pattern-designing-web-sites-93/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The W3C Sucks</title>
		<link>http://coffeeonthekeyboard.com/the-w3c-sucks-92/</link>
		<comments>http://coffeeonthekeyboard.com/the-w3c-sucks-92/#comments</comments>
		<pubDate>Thu, 22 May 2008 22:54:53 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[CSS]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[Standards]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[dom]]></category>
		<category><![CDATA[html]]></category>
		<category><![CDATA[mathml]]></category>
		<category><![CDATA[things that suck]]></category>
		<category><![CDATA[w3c]]></category>
		<category><![CDATA[xhtml]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://coffeeonthekeyboard.com/the-w3c-sucks-92/</guid>
		<description><![CDATA[&#8220;If you wish to be a success in the world, promise everything, deliver nothing.&#8221; If you want to remain the standard-setting body for the web, promise new recommendations, never deliver. A decade ago, the W3C was actively working to improve the standards we designers and developers use every day. Sure there were some controversial things [...]]]></description>
			<content:encoded><![CDATA[<p>&#8220;If you wish to be a success in the world, promise everything, deliver nothing.&#8221;</p>
<p>If you want to remain the standard-setting body for the web, promise new recommendations, never deliver.<span id="more-92"></span></p>
<div class="image left"><img src="http://coffeeonthekeyboard.com/wp-content/uploads/2008/05/css.png" alt="CSS 2.1 is not even a published recommendation. Off with their (the W3C) heads." style="float: left" /></div>
<p>A decade ago, the <abbr title="World Wide Web Consortium">W3C</abbr> was actively working to improve the standards we designers and developers use every day. Sure there were some controversial things (<abbr title="HyperText Markup Language">HTML</abbr> 3.0, <abbr title="eXtensible Markup Language">XML</abbr> 1.1) that never caught on, but at least there was discussion, thought, and sometimes even action.</p>
<p>The W3C started work on the <abbr title="Cascading Style Sheets">CSS</abbr>3 specification the same year they published CSS2—1998. Ten years later, CSS2.1 is still not technically a published recommendation.</p>
<p>Between 1995, a year after the W3C was founded, and 1999, HTML went from version 2, an <abbr title="Request For Comments">RFC</abbr>, to version 4.01. Where is 5? In January of <em>this year</em> it became a Working Draft.</p>
<p>When was <abbr title="eXtensible HyperText Markup Language">XHTML</abbr> last updated? 2001. The <abbr title="Document Object Model">DOM</abbr>? 2004. <abbr title="Mathematical Markup Language">MathML</abbr>? 2003.</p>
<p>What happened?</p>
<p>When did &#8220;do nothing group&#8221; replace &#8220;working group&#8221; over there? (Probably around 2004.)</p>
<p>I realize that implementing new standards is not trivial. I also realize that standards are crucial to the continued growth of the web—this site is valid XHTML and uses valid CSS.</p>
<p>However, without updates, these &#8220;standards&#8221; will get old and die. Something else, or someone else, will replace them. We&#8217;ve already used CSS2 for a decade. Will we use it for another? (I want my drop shadows! I want my opacity! I want my rounded corners!)</p>
<p>I led with a quote from Napoleon, so I&#8217;ll finish with the French Revolution: Off with their heads. The W3C needs a change in leadership or a vigorous shakedown to get off their asses and do something.</p>
<p>If they&#8217;re not willing to put forth the effort, then let them eat cake while someone else does.</p>
]]></content:encoded>
			<wfw:commentRss>http://coffeeonthekeyboard.com/the-w3c-sucks-92/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

