<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Coffee on the Keyboard &#187; Database</title>
	<atom:link href="http://coffeeonthekeyboard.com/tag/database/feed/" rel="self" type="application/rss+xml" />
	<link>http://coffeeonthekeyboard.com</link>
	<description>by James Socol</description>
	<lastBuildDate>Mon, 06 Feb 2012 23:33:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/>		<item>
		<title>Django Fixtures with Circular Foreign Keys</title>
		<link>http://coffeeonthekeyboard.com/django-fixtures-with-circular-foreign-keys-480/</link>
		<comments>http://coffeeonthekeyboard.com/django-fixtures-with-circular-foreign-keys-480/#comments</comments>
		<pubDate>Wed, 29 Sep 2010 21:50:00 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[django]]></category>
		<category><![CDATA[mozilla]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[web dev]]></category>

		<guid isPermaLink="false">http://coffeeonthekeyboard.com/?p=480</guid>
		<description><![CDATA[If you create a nice, perfectly normalized database, you (probably) won&#8217;t ever run into circular foreign keys (when a row in table A references a row in table B that references the same row in table A). In the real world, this happens pretty regularly. The most common situation is a &#8220;current&#8221; or &#8220;last&#8221; denormalization. [...]]]></description>
			<content:encoded><![CDATA[<p>If you create a nice, perfectly normalized database, you (probably) won&#8217;t ever run into circular foreign keys (when a row in table A references a row in table B that references the same row in table A).</p>
<p>In the real world, this happens pretty regularly. The most common situation is a &#8220;current&#8221; or &#8220;last&#8221; denormalization. You don&#8217;t really want to do a subquery with a sort every time you want to know the latest post in a forum thread, or current revision of a wiki page.</p>
<p>The problem—one we&#8217;ve been dealing with since <a href="http://coffeeonthekeyboard.com/the-evolution-of-sumo-339/">we decided to rebuild SUMO</a>—is that trying to load data with circular foreign keys produces a &#8220;chicken and the egg&#8221; situation: since each row depends on the other, neither can be loaded first.</p>
<p>(This is part of a bigger problem with MySQL, which is that it lacks deferred foreign key checks.)</p>
<p>The solution to this is to temporarily disable foreign key checks while you load in data. It&#8217;s not hard, but Django is so far <a href="http://code.djangoproject.com/ticket/3615">unwilling</a> to do it.</p>
<p>Well, now we get the chance to see if their concerns are realistic: with <a href="http://github.com/jbalogh/test-utils/commit/ce0e9643ea3b38373823e04d8c2e5f2dc2de5665">the latest commit</a> to <a href="http://jeffbalogh.org/">Jeff Balogh&#8217;s</a> <a href="http://github.com/jbalogh/test-utils">test-utils</a> package for Django, we&#8217;re disabling foreign key checks during fixture loading.</p>
<p>Both <a href="http://support.mozilla.com/">SUMO</a> and <a href="https://addons.mozilla.org/">AMO</a> have had to do some acrobatic hackery to get around the limit. This solution is definitely a filthy hack, but it&#8217;s contained in a single, small place, rather than spread throughout test cases in multiple projects.</p>
<p>Suggestions for improving this hideous monkey patch are welcome, but in the meantime I&#8217;ll be removing the gross parts from <a href="http://github.com/jsocol/kitsune">Kitsune</a> that we needed to work around this.</p>
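<p>For readers who want to see the shape of the workaround, here is a minimal Python sketch. It is not the test-utils patch: it uses the standard library&#8217;s <code>sqlite3</code> module so it runs anywhere, and SQLite&#8217;s <code>PRAGMA foreign_keys</code> stands in for MySQL&#8217;s <code>SET FOREIGN_KEY_CHECKS</code>. The <code>thread</code> and <code>post</code> tables and both function names are made up for illustration.</p>
<pre>
import sqlite3
from contextlib import contextmanager

@contextmanager
def foreign_key_checks_disabled(conn):
    """Turn foreign key enforcement off for the duration of the block."""
    conn.commit()  # SQLite ignores this PRAGMA inside an open transaction
    conn.execute("PRAGMA foreign_keys = OFF")
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.execute("PRAGMA foreign_keys = ON")

def load_circular_fixture():
    """Load a thread and its last post, which reference each other."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.execute("CREATE TABLE thread (id INTEGER PRIMARY KEY, "
                 "last_post_id INTEGER REFERENCES post(id))")
    conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, "
                 "thread_id INTEGER REFERENCES thread(id))")
    with foreign_key_checks_disabled(conn):
        # Neither row could be inserted first with checks enabled.
        conn.execute("INSERT INTO thread (id, last_post_id) VALUES (1, 10)")
        conn.execute("INSERT INTO post (id, thread_id) VALUES (10, 1)")
    return conn
</pre>
<p>Keeping the toggle inside one context manager is the same trade-off described above: the hack lives in a single, small place, and checks are switched back on even if loading fails.</p>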
]]></content:encoded>
			<wfw:commentRss>http://coffeeonthekeyboard.com/django-fixtures-with-circular-foreign-keys-480/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Developing at Scale: Database Replication</title>
		<link>http://coffeeonthekeyboard.com/developing-at-scale-database-replication-444/</link>
		<comments>http://coffeeonthekeyboard.com/developing-at-scale-database-replication-444/#comments</comments>
		<pubDate>Thu, 17 Jun 2010 16:10:15 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Back-end]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[mozilla]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[sumo]]></category>
		<category><![CDATA[webdev]]></category>

		<guid isPermaLink="false">http://coffeeonthekeyboard.com/?p=444</guid>
		<description><![CDATA[When a website is small—like this one, for example—usually the entire thing, from the web server to the database, can live on a single server. Even a single virtual server. One of the first things that happens when a web site gets bigger is this is no longer true. One reason is load. A popular [...]]]></description>
			<content:encoded><![CDATA[<p>When a website is small, like this one, usually the entire thing, from the web server to the database, can live on a single server. Even a single virtual server. One of the first things that happens when a website gets bigger is that this is no longer true.</p>
<p>One reason is load. A popular website will simply require more than a single server, virtual or otherwise, can give, and the only way to keep scaling is to add more servers. For example, the server may run out of available Apache connections, and the limit cannot always be raised without hurting performance.</p>
<p>Another reason is downtime. If a website is served from a single server, and that server goes down for any reason, planned or otherwise, then the website is down. At some point, downtime is essentially unacceptable—just ask Twitter—and redundancy is required.</p>
<h3>Enter Replication</h3>
<p>A common response is to set up database replication, where one database server operates as a &#8220;master,&#8221; and one or more other servers operate as &#8220;slaves.&#8221; In this setup, all of your <em>writes</em> to the database will go to the master, then &#8220;replicate&#8221; to the slaves, and all or most of the <em>reads</em> will come from the slaves. (Note that each slave applies every write in addition to serving reads, so slaves are not a good place to recycle sub-par hardware.)</p>
<p>Replication introduces a new type of problem: if you naively send <em>all</em> reads to the slaves then data you just wrote <em>will not be there</em>.</p>
<h3>La&#8230;wait for it&#8230;g</h3>
<p>Even if the master and slave are sitting next to each other with a cable connecting them, replication will probably take more time than your code does to reach the next step. At a minimum, you need to assume that replication lag will be hundreds of milliseconds, an eternity when the time from one line in your web app to the next is measured in micro- or nanoseconds. In practice, replication may well take seconds, especially if your master and slaves are not physically next to each other.</p>
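<p>The result is that the guarantees you expect from <a href="http://en.wikipedia.org/wiki/ACID">ACID</a> are essentially broken from the application&#8217;s point of view. The write is <strong>D</strong>urable on the master, but a read sent to a slave may not see it yet: you cannot simply write data and immediately rely on its existence.</p>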
<p>For example, say you have a large discussion forum. If you naively send all reads to the slaves, then someone&#8217;s post may take seconds to appear on the site. This is a problem if you&#8217;re trying to show a user their post immediately after posting it.</p>
<h3>Smarter Reading</h3>
<p>The solution is to occasionally read from the master. When you need to access data that was just written, it is <em>probably</em> only available on the master, so that&#8217;s where you&#8217;ll read it. Within a single HTTP request, this is fairly simple: just force any queries that rely on recently-written data to the master.</p>
<p>Outside of a single HTTP request, this is slightly more complex. If you&#8217;re following the practice of redirecting after a POST request to a GET request (which you should) then creating a new forum post and viewing it will happen in two different HTTP requests.</p>
<p>One way around this is to set a very short-lived cookie that tells your web app to continue reading from the master. If any write occurs in a request, the response should include this cookie. The exact time-to-live will depend on how long your replication lag usually is—cover at least 4 or 5 standard deviations. Any request that has this cookie should honor it by reading only from the master.</p>
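<p>The cookie approach can be sketched in a few lines of Python. This is an illustration under assumptions, not code from any real app: the cookie name, the function names, and the idea of deriving the time-to-live from a list of measured lag samples are all hypothetical.</p>
<pre>
import statistics

PIN_COOKIE = "read_from_master"  # hypothetical cookie name

def pin_ttl_seconds(lag_samples, sigmas=5):
    """Cover the mean replication lag plus `sigmas` standard deviations."""
    mean = statistics.mean(lag_samples)
    spread = statistics.pstdev(lag_samples)
    return mean + sigmas * spread

def choose_read_database(request_cookies, wrote_in_this_request=False):
    """Return "master" when the data we need may not have replicated yet."""
    if wrote_in_this_request or PIN_COOKIE in request_cookies:
        return "master"
    return "slave"

def cookies_for_response(wrote_in_this_request, ttl_seconds):
    """Ask the browser to keep reading from the master for a short while."""
    if not wrote_in_this_request:
        return {}
    return {PIN_COOKIE: {"value": "1", "max_age": ttl_seconds}}
</pre>
<p>A request handler would call <code>choose_read_database()</code> before each read and merge <code>cookies_for_response()</code> into whatever response object the framework provides.</p>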
<h3>A Pitch</h3>
<p>One of the hardest things for new web developers is developing large-scale applications: first, you need a large-scale application! Setting up database replication is a huge pain, and if your site isn&#8217;t getting enough traffic, it&#8217;s not worth it.</p>
<p>Mozilla is one way aspiring web developers can get some experience working with large-scale web apps. All of our web apps are open source and open to contributions from community members. To get involved, stop by <a href="irc://irc.mozilla.org/webdev">#webdev</a> in IRC!</p>
]]></content:encoded>
			<wfw:commentRss>http://coffeeonthekeyboard.com/developing-at-scale-database-replication-444/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Surviving Pac Man</title>
		<link>http://coffeeonthekeyboard.com/surviving-pac-man-422/</link>
		<comments>http://coffeeonthekeyboard.com/surviving-pac-man-422/#comments</comments>
		<pubDate>Mon, 24 May 2010 06:09:36 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[ddos]]></category>
		<category><![CDATA[mozilla]]></category>
		<category><![CDATA[pac man]]></category>
		<category><![CDATA[spike]]></category>
		<category><![CDATA[sumo]]></category>
		<category><![CDATA[traffic]]></category>

		<guid isPermaLink="false">http://coffeeonthekeyboard.com/?p=422</guid>
		<description><![CDATA[On Friday, Google showed off a fun new doodle in honor of the 30th anniversary of Pac Man: a Pac Man clone, complete with sounds. Unfortunately, in the initial release, those sounds started playing automatically—an oversight or an homage to &#60;bgsound&#62;, I guess. Even if Google was open in a background tab or window, or [...]]]></description>
			<content:encoded><![CDATA[<p>On Friday, Google showed off a fun new <a href="http://www.google.com/logos/">doodle</a> in honor of the 30th anniversary of Pac Man: a <a href="http://www.google.com/pacman/">Pac Man clone</a>, complete with sounds.</p>
<p>Unfortunately, in the initial release, those sounds started playing automatically—an oversight or an homage to <code>&lt;bgsound&gt;</code>, I guess. Even if Google was open in a background tab or window, or in a hidden iframe created by an <a href="https://addons.mozilla.org/en-US/firefox/addon/2207/">add-on</a>, the Pac Man music and sound effects would start.</p>
<p>And that <a href="http://support.mozilla.com/en-US/forum/1/678028">confused some people</a>.</p>
<p>Many people came to <a href="http://support.mozilla.com/">SUMO</a> looking for an explanation, and many of them, not finding anything in the knowledge base, started posting to our forum. So many, in fact, that our database server started running out of connections.</p>
<p>The pounding we took on the forums also caused replication on our slave databases to fall behind by as much as 1.25 hours, so even when we wrote an article about the noises [article has been removed], it didn&#8217;t show up for most people.</p>
<p>As Sean put it: &#8220;We just got DDOSed by Pac Man.&#8221;</p>
<p>To shore up the site and bring it back from the brink of toppling over, we worked with IT (thanks, Dave!) to implement a number of temporary solutions. We&#8230;</p>
<ul>
<li>&#8230;disabled a particular kind of slow, frequent, and useless query.*</li>
<li>&#8230;blocked Google&#8217;s crawler from indexing the site.</li>
<li>&#8230;disabled our own sumobot&#8217;s forum-crawling features.</li>
<li>&#8230;rotated DB slaves out of the production pool to allow them to catch up.</li>
</ul>
<p>Google has already removed the Pac Man doodle from their home page, and we can revert most of the emergency measures here on Monday. But the event does remind us to look at what we&#8217;re doing in Kitsune, our rewrite, to weather storms like this in the future.</p>
<p>One idea, suggested by <a href="http://davedash.com/">Dave Dash</a>, is a read-only mode where all pages that can trigger database writes are temporarily disabled. We&#8217;ll be looking pretty seriously at this over the next couple of days.</p>
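<p>A read-only switch along those lines might look like the Python sketch below. It is only a sketch: <code>site_state</code>, <code>dispatch</code>, and the 503 response are hypothetical, and it assumes that only non-safe HTTP methods ever write to the database, which was not true of the view counter described in the footnote.</p>
<pre>
SAFE_METHODS = ("GET", "HEAD", "OPTIONS")
site_state = {"read_only": False}  # hypothetical switch flipped by an operator

def dispatch(method, view):
    """Run the view unless the site is read-only and the request could write."""
    if site_state["read_only"] and method not in SAFE_METHODS:
        return 503, "The site is temporarily read-only. Please try again later."
    return 200, view()
</pre>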
<p>Another important take-away is to make damn sure pages only trigger database writes if they really need to. Writes can never bounce off a cache, so they are very expensive.</p>
<p>Finally, we should be more proactive in how we interact with our Zeus cache. We&#8217;ll also think about whether it makes sense to start using <a href="http://micropipes.com/blog/">Wil Clouser&#8217;s</a> Zeus interface, <a href="http://github.com/clouserw/hera">Hera</a>, sooner rather than later.</p>
<p>&#8220;Too much traffic&#8221; is the best problem a web development team can have. Hopefully, the first time this happens to Kitsune, we&#8217;ll be ready.</p>
<p>* The queries that increment the number of views a forum thread has gotten are particularly slow for some reason. They&#8217;re also wildly inaccurate, since most people see a cached version of those pages and never trigger the query. The worst part: they occur on every (non-cached) page view, even while just reading.</p>
<p>(This post was <a href="http://pc.de/pages/surviving-pac-man-be">translated into Belarusian</a>. Isn&#8217;t that cool?)</p>
]]></content:encoded>
			<wfw:commentRss>http://coffeeonthekeyboard.com/surviving-pac-man-422/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Responsible SQL: How to Authenticate Users</title>
		<link>http://coffeeonthekeyboard.com/responsible-sql-how-to-authenticate-144/</link>
		<comments>http://coffeeonthekeyboard.com/responsible-sql-how-to-authenticate-144/#comments</comments>
		<pubDate>Sun, 09 Nov 2008 17:16:58 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[attack]]></category>
		<category><![CDATA[Back-end]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[injection]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://coffeeonthekeyboard.com/?p=144</guid>
		<description><![CDATA[Most SQL-injection articles set a horrible example for young programmers. Here is a very typical &#8220;bad example&#8221; of why you need to escape user data before it goes into SQL queries: (ed. The symbol « is a line break that’s not in the real code.) $username = $_POST&#91;&#8216;username&#8217;&#93;; // username=admin $password = $_POST&#91;&#8216;password&#8217;&#93;; // password=&#8217; [...]]]></description>
			<content:encoded><![CDATA[<p>Most SQL-injection articles set a horrible example for young programmers.</p>
<p>Here is a very typical &#8220;bad example&#8221; of why you need to escape user data before it goes into SQL queries:</p>
<p>(ed. The symbol « is a line break that’s not in the real code.)</p>
<div class="dean_ch" style="white-space: wrap;">
<ol>
<li class="li1">
<div class="de1"><span class="re0">$username</span> = <span class="re0">$_POST</span><span class="br0">&#91;</span><span class="st0">&#39;username&#39;</span><span class="br0">&#93;</span>; <span class="co1">// username=admin</span></div>
</li>
<li class="li1">
<div class="de1"><span class="re0">$password</span> = <span class="re0">$_POST</span><span class="br0">&#91;</span><span class="st0">&#39;password&#39;</span><span class="br0">&#93;</span>; <span class="co1">// password=&#39; OR 1=1; -- &#39;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="re0">$user</span> = <span class="re0">$db</span>-&gt;<span class="me1">query</span><span class="br0">&#40;</span><span class="st0">&quot;SELECT * FROM users WHERE «</span></div>
</li>
<li class="li2">
<div class="de2"><span class="st0"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; username=&#39;$username&#39; AND «</span></div>
</li>
<li class="li1">
<div class="de1"><span class="st0"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; password=&#39;$password&#39; LIMIT 1;&quot;</span><span class="br0">&#41;</span>;</div>
</li>
</ol>
</div>
<p>The point, of course, is that you must sanitize your user input, or else this person would run this query:</p>
<div class="dean_ch" style="white-space: wrap;">
<ol>
<li class="li1">
<div class="de1"><span class="re0">$user</span> = <span class="re0">$db</span>-&gt;<span class="me1">query</span><span class="br0">&#40;</span><span class="st0">&quot;SELECT * FROM users WHERE «</span></div>
</li>
<li class="li1">
<div class="de1"><span class="st0"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; username=&#39;admin&#39; AND «</span></div>
</li>
<li class="li1">
<div class="de1"><span class="st0"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; password = &#39;&#39; OR 1=1; -- &#39; LIMIT 1;&quot;</span><span class="br0">&#41;</span>;</div>
</li>
</ol>
</div>
<p>Which grants the sneaky user all your admin privileges. Other versions have nefarious users dropping your users or articles tables.</p>
<p>The problem is: this is the wrong way to authenticate users. These examples are written for beginners to understand the importance of sanitizing input, but they also provide a model to those beginners for how user authentication works. And it&#8217;s a very bad model.</p>
<p>This is a long one, more after the break.<span id="more-144"></span></p>
<p>The only upside to authenticating this way is that you don&#8217;t expose any information on failure: if I&#8217;m trying to hijack someone&#8217;s account, I can&#8217;t tell the difference between an invalid user name and a valid user name with a bad password. That&#8217;s good, but there are good reasons not to do this at the database level.</p>
<p>The &#8220;correct&#8221; way is not much more complex. Basically:</p>
<ol>
<li>Look up the record with the <strong>username</strong> only.</li>
<li>Get the (hashed) password out of the database.</li>
<li>Hash the submitted password.</li>
<li>Compare the two hashes.</li>
</ol>
<p>This is really not very hard to implement. In PHP:</p>
<div class="dean_ch" style="white-space: wrap;">
<ol>
<li class="li1">
<div class="de1"><span class="coMULTI">/**</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp;* Check a password against the database</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp;*</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp;* @param string $username The username to check</span></div>
</li>
<li class="li2">
<div class="de2"><span class="coMULTI">&nbsp;* @param string $password The (supposed) password</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp;* @return int 0=success, 1=bad username, 2=bad password</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp;*/</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw2">function</span> check_password <span class="br0">&#40;</span> <span class="re0">$username</span>, <span class="re0">$password</span> <span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="re0">$db</span> = <span class="kw2">new</span> mysqli<span class="br0">&#40;</span><span class="br0">&#41;</span>; <span class="co1">// we need to talk to the DB</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="co1">// the real_escape_string() function is much better</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="co1">// than addslashes() for escaping MySQL database input</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="re0">$_username</span> = <span class="re0">$db</span>-&gt;<span class="me1">real_escape_string</span><span class="br0">&#40;</span><span class="re0">$username</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="co1">// I try to make my SQL queries as easy to read</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="co1">// as possible. (Not always very easy.)</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="re0">$result</span> = <span class="re0">$db</span>-&gt;<span class="me1">query</span><span class="br0">&#40;</span><span class="st0">&quot;SELECT password &quot;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; .<span class="st0">&quot;FROM users &quot;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; .<span class="st0">&quot;WHERE username = &#39;{$_username}&#39; &quot;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; .<span class="st0">&quot;LIMIT 1;&quot;</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="co1">// we&#8217;re assuming the query ran correctly</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="co1">// if we can&#8217;t return a row, then there&#8217;s no user with</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="co1">// that name</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span> !<span class="re0">$user</span> = <span class="re0">$result</span>-&gt;<span class="me1">fetch_assoc</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="nu0">1</span>; <span class="co1">// return code for bad username</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="co1">// now, assuming the password was hashed with crypt()</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span> <span class="re0">$user</span><span class="br0">&#91;</span><span class="st0">&#39;password&#39;</span><span class="br0">&#93;</span> !== «</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <a href="http://www.php.net/crypt"><span class="kw3">crypt</span></a><span class="br0">&#40;</span><span class="re0">$password</span>, <span class="re0">$user</span><span class="br0">&#91;</span><span class="st0">&#39;password&#39;</span><span class="br0">&#93;</span><span class="br0">&#41;</span> <span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="nu0">2</span>; <span class="co1">// return code for bad password</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw1">return</span> <span class="nu0">0</span>; <span class="co1">// return code for success</span></div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span></div>
</li>
</ol>
</div>
<p>What&#8217;s going on here? Basically, we&#8217;re looking up the user by the username. If we don&#8217;t find a user, we return an error code. If we do find a user, we hash the password they supplied, using the stored hash as the salt, and check it against the hashed password we already have. If they don&#8217;t match, we return an error code. If they do, the user is allowed to log in.</p>
<p>There are two key differences between this method and the method so often espoused by tutorial writers:</p>
<ol>
<li>This method stores a hashed password instead of plain text.</li>
<li>This method differentiates between bad usernames and bad passwords.</li>
</ol>
<p>#1 should be obvious. Never store a plain-text password. It&#8217;s extremely dangerous: if someone ever gets a look at the table, they can just read the users&#8217; passwords, which may well be the same as their bank password (no, it shouldn&#8217;t be, but it probably is). And it&#8217;s unnecessary. Every server-side language implements the MD5 hash, which is weak but works. Better options (like PHP&#8217;s <a onclick="window.open(this.href,'newwindow'); return false;" href="http://www.php.net/crypt">crypt()</a>) can use algorithms like Blowfish, SHA-256, SHA-512, or at least MD5 with a random salt.</p>
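<p>To make the salted-hash idea concrete, here is a small Python sketch. It swaps in PBKDF2-HMAC-SHA256 from Python&#8217;s standard library rather than PHP&#8217;s <code>crypt()</code>, and the function names and iteration count are assumptions chosen for illustration, not recommendations from this article.</p>
<pre>
import hashlib
import hmac
import os

ITERATIONS = 200000  # assumed work factor; tune for your hardware

def hash_password(password, salt=None):
    """Return (salt, digest) using PBKDF2-HMAC-SHA256 and a random salt."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, stored_digest):
    """Re-hash the submitted password and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)
</pre>
<p>Only the salt and the digest would be stored. The submitted password is hashed again at login and the two digests are compared, exactly as in steps 3 and 4 of the list above.</p>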
<p>But wait, #2, I said it was better <em>not</em> to distinguish between a bad username and a bad password, right? Well&#8230; yes, to the end user. In either case, I should display a message like &#8220;Bad username or password&#8221; to the person who tried to log in.</p>
<p>Internally, however, I want to know what happened. Is someone targeting known users, or just trying random combinations? How did they find real usernames? Where should I be improving security?</p>
<p>You&#8217;re also minimizing the number of user-submitted strings that get sent to the database, so there are fewer opportunities for you to accidentally allow an injection attack. If you have a policy on username syntax, you can keep yourself even safer by not talking to the database if the username is bad:</p>
<p>(I&#8217;ve omitted logging or real error-handling here. In a live version, I would probably wrap most of this in a <code><a onclick="window.open(this.href,'newwindow'); return false;" href="http://us2.php.net/manual/en/language.exceptions.php">try</a></code> block, throw one of three types of exceptions, and do some logging in the <code>catch</code> block.)</p>
<div class="dean_ch" style="white-space: wrap;">
<ol>
<li class="li1">
<div class="de1"><span class="kw2">&lt;?php</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">// Usernames must start with a letter, and contain</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">// only letters, numbers, underscores and dots, but</span></div>
</li>
<li class="li2">
<div class="de2"><span class="co1">// must not end with a dot or underscore.</span></div>
</li>
<li class="li1">
<div class="de1"><span class="re0">$user_regex</span> = <span class="st0">&#39;/^[a-zA-Z][a-zA-Z0-9_<span class="es0">\.</span>]*[a-zA-Z0-9]$/D&#39;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">if</span> <span class="br0">&#40;</span> <a href="http://www.php.net/preg_match"><span class="kw3">preg_match</span></a><span class="br0">&#40;</span><span class="re0">$user_regex</span>,<span class="re0">$username</span><span class="br0">&#41;</span> <span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="co1">// the username matches our allowed syntax</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="re0">$auth</span> = check_password<span class="br0">&#40;</span><span class="re0">$username</span>, <span class="re0">$password</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span> <span class="re0">$auth</span> === <span class="nu0">0</span> <span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; <span class="co1">// the do_login() function is an exercise for the</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; <span class="co1">// reader; it is assumed to redirect and exit</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; do_login<span class="br0">&#40;</span><span class="re0">$username</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">// the username was bad, or the username/password</span></div>
</li>
<li class="li2">
<div class="de2"><span class="co1">// was wrong</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">// die() is an overly simplistic choice, here.</span></div>
</li>
<li class="li1">
<div class="de1"><a href="http://www.php.net/die"><span class="kw3">die</span></a><span class="br0">&#40;</span><span class="st0">&quot;Bad username or password.&quot;</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="kw2">?&gt;</span></div>
</li>
</ol>
</div>
<p>Obviously we still escape the username, to make damn sure, but this gives us another place to get information. Did someone actually enter <code>'; DROP TABLE users; --</code> into our login form, or did they just mistype their password?</p>
<p>I&#8217;m going to end with a request: if you&#8217;re about to write a tutorial for beginners, please be aware of what you&#8217;re modeling in your examples. If you&#8217;re doing something you would never do, for the sake of simplicity or because it&#8217;s not the focus of the tutorial, point that out. Link to another tutorial or at least mention that it&#8217;s a bad way to do something.</p>
<p>Don&#8217;t send a quiet message that wrong is OK.</p>
]]></content:encoded>
			<wfw:commentRss>http://coffeeonthekeyboard.com/responsible-sql-how-to-authenticate-144/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Connecting PHP, IIS 6, and SQL Server 2005</title>
		<link>http://coffeeonthekeyboard.com/connecting-php-iis-6-and-sql-server-2005-129/</link>
		<comments>http://coffeeonthekeyboard.com/connecting-php-iis-6-and-sql-server-2005-129/#comments</comments>
		<pubDate>Thu, 23 Oct 2008 16:33:20 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[Back-end]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[iis]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[pdo]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[sql server]]></category>

		<guid isPermaLink="false">http://coffeeonthekeyboard.com/?p=129</guid>
		<description><![CDATA[I know I will be accosted for this, but at work we needed to run PHP on IIS 6 (fairly simple) and connect it to a remote database server running SQL Server 2005 (not terrible, once I gave up the Microsoft way). Yeah yeah, do it in ASP.NET, I know. While I like C# as [...]]]></description>
			<content:encoded><![CDATA[<p>I know I will be accosted for this, but at work we needed to run PHP on IIS 6 (<a href="http://www.peterguy.com/php/install_IIS6.html">fairly simple</a>) and connect it to a remote database server running SQL Server 2005 (not terrible, once I gave up the Microsoft way).</p>
<p>Yeah yeah, do it in ASP.NET, I know. While I like C# as a language, I kind of hate ASP.NET as a framework, so what are you gonna do? Java was an option but the start-up time was too long for this project.</p>
<p>My first Google search for &#8220;PHP SQL Server 2005&#8221; turned up the Microsoft <a href="http://www.microsoft.com/sqlserver/2005/en/us/PHP-Driver.aspx">SQL Server 2005 Driver for PHP</a>. &#8220;Well great!&#8221; I thought. It&#8217;s just a PHP extension, very easy to install on Windows. But I didn&#8217;t know the horrid depths into which I was about to sink.</p>
<p>The Microsoft driver comes with an example application and database. The application assumes you are connecting to a local database. There is scant information about remote databases.</p>
<p>The driver defines this function:</p>
<pre>sqlsrv_connect($host[, $connectionOptions[, ...]]);</pre>
<p>The example application tells you to set <code>$host</code> to <var>(local)</var>. Supposedly this works. However, after scouring the internet for several days, and trying every permutation of hostname, Windows networking name, port, IP address, white space, and several other variables that shouldn&#8217;t have been in there, I&#8217;ve decided it doesn&#8217;t talk to remote servers nicely.</p>
<p><a href="http://us.php.net/manual/en/book.pdo.php">PDO</a>&#8217;s ODBC driver, on the other hand, worked wonderfully after a quick visit to <a href="http://www.connectionstrings.com/">www.connectionstrings.com</a>.</p>
<p>Here is how I needed to create the PDO object. I hope this is useful for someone else:</p>
<p>(ed. The symbol « is a line break that&#8217;s not in the real code.)</p>
<pre>$host     = '1.2.3.4';
$port     = '1433';
$database = 'MyDatabase';
$user     = 'MyDatabaseUser';
$password = 'MyDatabasePassword';

$dsn = "odbc:DRIVER={SQL Server}; «
 SERVER=$host,$port;DATABASE=$database";

try {
  // connect
  $conn = new PDO($dsn,$user,$password);
} catch (PDOException $e) {
  // fancy error handling
}</pre>
]]></content:encoded>
			<wfw:commentRss>http://coffeeonthekeyboard.com/connecting-php-iis-6-and-sql-server-2005-129/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Help Me Scale</title>
		<link>http://coffeeonthekeyboard.com/help-me-scale-97/</link>
		<comments>http://coffeeonthekeyboard.com/help-me-scale-97/#comments</comments>
		<pubDate>Fri, 06 Jun 2008 14:58:38 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Back-end]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[db]]></category>
		<category><![CDATA[load]]></category>
		<category><![CDATA[microblog]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[query]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[social messaging]]></category>
		<category><![CDATA[subquery]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://coffeeonthekeyboard.com/help-me-scale-97/</guid>
		<description><![CDATA[I&#8217;ve been reading Eran Hammer-Lahav&#8217;s intelligent posts on microblog scalability, and now I&#8217;m concerned about my own &#8220;microblog&#8221; site, Picofiction. Similar to social networks, social updates, social messaging, social&#8230; Like many social web sites—amongst our weaponry&#8230;—Picofiction lets you &#8220;follow&#8221; your favorite authors, displaying all their posts along with yours. I handle this very naïvely: everything [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been reading Eran Hammer-Lahav&#8217;s <a href="http://www.hueniverse.com/hueniverse/2008/04/scaling-a-micro.html">intelligent</a> <a href="http://www.hueniverse.com/hueniverse/2008/03/scaling-a-micro.html">posts on</a> <a href="http://www.hueniverse.com/hueniverse/2008/03/on-scaling-a-mi.html">microblog scalability</a>, and now I&#8217;m concerned about my own &#8220;microblog&#8221; site, <a href="http://picofiction.com/">Picofiction</a>.</p>
<p>Similar to social networks, social updates, social messaging, social&#8230; Like many social web sites—amongst our weaponry&#8230;—Picofiction lets you &#8220;follow&#8221; your favorite authors, displaying all their posts along with yours.</p>
<p>I handle this very naïvely: everything is offloaded to the database. There are three tables involved here, one of users, one of posts, and one of follower/followee bindings.</p>
<p>Here&#8217;s the basic structure of this query:</p>
<pre>SELECT post_id, post_body, post_date, post_type,
  user_name AS author_name, user_id AS author_id
FROM posts
LEFT JOIN users
ON posts.author_id = users.user_id
WHERE author_id = '<var>CURRENT_USER</var>'
OR author_id IN (
  SELECT followed_id
  FROM followers
  WHERE following_id = '<var>CURRENT_USER</var>'
)
ORDER BY post_date DESC
LIMIT <var>PAGE_START</var>,20;</pre>
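<p>To experiment with the shape of that query without touching a real server, here is a rough sketch using Python&#8217;s built-in <code>sqlite3</code> module as a stand-in for MySQL. The column definitions, the sample rows, and the <code>timeline</code> helper are assumptions made up for this illustration, not the actual Picofiction schema:</p>

```python
import sqlite3

# Throwaway in-memory database with an assumed, simplified schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE posts (post_id INTEGER PRIMARY KEY, author_id INTEGER,
                    post_body TEXT, post_date TEXT);
CREATE TABLE followers (following_id INTEGER, followed_id INTEGER);
INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
INSERT INTO posts VALUES
  (1, 1, 'mine',     '2008-06-01'),
  (2, 2, 'followed', '2008-06-02'),
  (3, 3, 'stranger', '2008-06-03');
INSERT INTO followers VALUES (1, 2);  -- user 1 follows user 2
""")


def timeline(current_user, page_start=0):
    """Return one page of posts by the user and by everyone they follow."""
    sql = """
    SELECT post_id, post_body, user_name AS author_name
    FROM posts
    LEFT JOIN users ON posts.author_id = users.user_id
    WHERE posts.author_id = :me
       OR posts.author_id IN (SELECT followed_id FROM followers
                              WHERE following_id = :me)
    ORDER BY post_date DESC
    LIMIT :start, 20
    """
    params = {"me": current_user, "start": page_start}
    return conn.execute(sql, params).fetchall()


page = timeline(1)  # newest first: bob's post, then ann's own post
```

<p>The user ID is bound as a named parameter rather than pasted into the SQL string, which is a choice of this sketch and not something the query above requires.</p>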
<p>Here&#8217;s where I need help: this works great on a single database, but it does not scale horizontally.</p>
<p>Since this horizontal scalability is such a hot topic right now, I&#8217;m asking for ideas. I&#8217;d like to put in the infrastructure <em>before</em> there is a need for it.</p>
<p>Eran points out that caching is not as simple a solution as we&#8217;d like to think. What do you cache? How do you keep caches in sync?</p>
<p>Does anyone have experience with MySQL Cluster Servers? It seems like the best way of scaling is to make the process as <a href="http://en.wikipedia.org/wiki/Amdahl%27s_law">parallelizable</a> as possible. The database then handles the parallelization, so the less I can do in the program the better, right?</p>
]]></content:encoded>
			<wfw:commentRss>http://coffeeonthekeyboard.com/help-me-scale-97/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MySQL Subqueries</title>
		<link>http://coffeeonthekeyboard.com/mysql-subqueries-48/</link>
		<comments>http://coffeeonthekeyboard.com/mysql-subqueries-48/#comments</comments>
		<pubDate>Mon, 06 Aug 2007 01:31:00 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[How To]]></category>
		<category><![CDATA[subquery]]></category>

		<guid isPermaLink="false">http://coffeeonthekeyboard.com/mysql-subqueries-48/</guid>
		<description><![CDATA[I often find it difficult to find tips and advice for doing relatively simple things in tools like MySQL, Ruby, Python, etc. So, starting with this post, I will help fill that niche. Today&#8217;s topic is Using Subqueries to Simplify your SQL Queries. For this article, I&#8217;m using PHP and MySQL for examples. There are [...]]]></description>
			<content:encoded><![CDATA[<p>I often find it difficult to find tips and advice for doing relatively simple things in tools like MySQL, Ruby, Python, etc. So, starting with this post, I will help fill that niche. Today&#8217;s topic is <strong>Using Subqueries to Simplify your SQL Queries.</strong></p>
<p class="note" style="font-style: italic; font-size: 90%; color: #999999">For this article, I&#8217;m using PHP and MySQL for examples. There are slightly different implementations of SQL in the various database engines, but this is one thing they all have in common.</p>
<p>SQL stands for &#8220;<strong>s</strong>tructured <strong>q</strong>uery <strong>l</strong>anguage,&#8221; and part of that structure is that queries can be nested: subqueries can make complex queries easier to write and, in some cases, faster. The idea of a subquery is simple: have the database perform one query and insert its result into another.</p>
<p>There are dozens of useful ways of using subqueries, but I will concentrate on two: subqueries in the <em>select expression</em> and subqueries in the <em>where clause</em>.</p>
<h3>Security Concerns</h3>
<p>In most web programming languages, the interface between the script and the database only allows one query per call for security reasons: an injection attack could input something like <code>'; DELETE FROM users;</code> and do some serious damage to a website. Imagine your SQL query to log in looked something like:</p>
<blockquote><p><code>SELECT * FROM users WHERE user_name = '$username' AND password = '$password';</code></p></blockquote>
<p>If you are not checking and cleaning the input appropriately, someone could type the <code>DELETE</code> snippet above into the username field of your login form and, if multiple queries were allowed, MySQL would execute the following:</p>
<blockquote><p><code>SELECT * FROM users WHERE user_name = ''; DELETE FROM users;' AND password = '';</code></p></blockquote>
<p>Since the empty string wouldn&#8217;t match any rows (hopefully), the first query would return nothing. The second query, the <code>DELETE</code> statement, would run, terminating at the second semicolon. Since the third piece of code is nonsense, MySQL would throw it out with an error.</p>
<p>To solve this problem, languages like PHP cause MySQL to issue an error any time there is more text (except comments) after the statement terminator, usually the semicolon. The downside is that situations arise where you need to run multiple queries. The result is often either a godawfully complicated statement with multiple <code>JOIN</code>s, or several separate queries, each of which requires a round trip to your database server and can slow down your application.</p>
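<p>As a rough illustration of both ideas, here is a minimal sketch using Python&#8217;s built-in <code>sqlite3</code> module as a stand-in for PHP and MySQL. Its <code>execute()</code> also refuses more than one statement per call. The table contents and the <code>find_user</code> helper are assumptions invented for this example, and the plain-text password column only mirrors the query above:</p>

```python
import sqlite3

# Hypothetical users table, only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'secret')")

malicious = "'; DELETE FROM users;"

# Unsafe: the input is pasted straight into the SQL text.
unsafe_sql = (
    "SELECT * FROM users WHERE user_name = '%s' AND password = '%s';"
    % (malicious, "")
)

# execute() accepts a single statement, so the stacked DELETE is refused.
# (The exception class differs between Python versions, hence the tuple.)
try:
    conn.execute(unsafe_sql)
    stacked_query_ran = True
except (sqlite3.Warning, sqlite3.ProgrammingError):
    stacked_query_ran = False


def find_user(user_name, password):
    """Look up a user with bound parameters instead of string pasting."""
    # Placeholders keep the input as data; it is never parsed as SQL.
    cur = conn.execute(
        "SELECT user_name FROM users WHERE user_name = ? AND password = ?",
        (user_name, password),
    )
    return cur.fetchone()
```

<p>With bound parameters the malicious string is simply compared against the <code>user_name</code> column and matches nothing, so the one-statement limit becomes a second line of defence rather than the only one.</p>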
<p>In the examples below, I&#8217;ll pretend we&#8217;re building a forum that has four tables:</p>
<ul>
<li><code>users</code> with primary key <code>user_id</code></li>
<li><code>forums</code>, a list of all the boards, with primary key <code>forum_id</code></li>
<li><code>threads</code> which links each thread to a forum with <code>forum_id</code> and has primary key <code>thread_id</code></li>
<li><code>posts</code> which links each post to a thread with <code>thread_id</code> and has primary key <code>post_id</code></li>
</ul>
<h3>Subqueries in Select Expressions</h3>
<p>One way to speed up your queries again is to use subqueries. Subqueries are full SQL queries nested within another query. For example:</p>
<blockquote><p><code>SELECT (SELECT COUNT(*) FROM t1);</code></p></blockquote>
<p>Obviously it&#8217;s a pretty simple example. Notice the parentheses. Subqueries must always be in parentheses, even if they are inside a function, and a subquery used as a single value like this must return exactly one column and at most one row:</p>
<blockquote><p><code>SELECT UPPER((SELECT user_name FROM users WHERE user_id = 1));</code></p></blockquote>
<p>Let&#8217;s get to work on our forum. Say that while reading all the threads of a forum you&#8217;d like to have both the number of threads and the number of posts in the forum. One way is to run two separate queries:</p>
<blockquote><p><code>SELECT COUNT(*) AS threads FROM threads WHERE forum_id='1';<br />
SELECT COUNT(*) AS posts FROM posts LEFT JOIN threads USING(thread_id) WHERE forum_id='1';</code></p></blockquote>
<p>That might not be so bad if your SQL server is <code>localhost</code>, but more and more hosts are running dedicated SQL servers, meaning that every query has to travel across the network, be processed, and travel back, slowing down your application. But we can run this in one query with two subqueries:</p>
<blockquote><p><code>SELECT<br />
(SELECT COUNT(*) FROM threads WHERE forum_id='1') AS threads,<br />
(SELECT COUNT(*) FROM posts LEFT JOIN threads USING(thread_id) WHERE forum_id='1') AS posts;</code></p></blockquote>
<p>We can add the above to our query to get the name of the forum and its description, so we can further decrease the number of trips to the database:</p>
<blockquote><p><code>SELECT<br />
(SELECT COUNT(*) FROM threads WHERE threads.forum_id=forums.forum_id) AS threads,<br />
(SELECT COUNT(*) FROM posts LEFT JOIN threads USING(thread_id) WHERE threads.forum_id=forums.forum_id) AS posts,<br />
forum_name,<br />
forum_description<br />
FROM forums WHERE forum_id='1';</code></p></blockquote>
<p>Notice that we also changed the <code>WHERE</code> clauses to match whatever forum ID we put into the &#8220;<em>outer query</em>&#8221;.</p>
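<p>To see the combined query run end to end, here is a small sketch using Python&#8217;s built-in <code>sqlite3</code> module in place of PHP and MySQL. The column types and the sample rows are assumptions made up for the illustration; only the table and column names come from the forum described above:</p>

```python
import sqlite3

# In-memory stand-in for the forum database, with invented sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forums  (forum_id INTEGER PRIMARY KEY,
                      forum_name TEXT, forum_description TEXT);
CREATE TABLE threads (thread_id INTEGER PRIMARY KEY, forum_id INTEGER);
CREATE TABLE posts   (post_id INTEGER PRIMARY KEY, thread_id INTEGER);
INSERT INTO forums  VALUES (1, 'General', 'Anything goes'),
                           (2, 'Help', 'Ask here');
INSERT INTO threads VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO posts   VALUES (100, 10), (101, 10), (102, 11), (200, 20);
""")

# Both counts are correlated subqueries: each one refers back to the
# forum row selected by the outer query.
query = """
SELECT
  (SELECT COUNT(*) FROM threads
    WHERE threads.forum_id = forums.forum_id) AS threads,
  (SELECT COUNT(*) FROM posts LEFT JOIN threads USING (thread_id)
    WHERE threads.forum_id = forums.forum_id) AS posts,
  forum_name,
  forum_description
FROM forums WHERE forum_id = ?
"""

# One round trip returns thread count, post count, name and description.
row = conn.execute(query, (1,)).fetchone()
```

<p>Changing the bound forum ID is enough to get the counts for another forum, because the subqueries follow the outer row instead of a hard-coded value.</p>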
<h3>Subqueries in Where Clauses</h3>
<p>Another simple and useful way to use a subquery is in a <code>WHERE</code> clause. Here you must be careful to match the <code>WHERE</code> syntax and the type of data returned by the subquery. For example, in <code>WHERE user_name = (...)</code>, the subquery (<code>(...)</code>) must return a single value, while in <code>WHERE post_date IN (...)</code>, the subquery can return a list.</p>
<p>In our forum, we might want to search for all posts by a specific user, but we don&#8217;t want our visitors to need to know the user ID—or perhaps we want a more descriptive URL, like <code>search.php?user=USER_NAME</code> instead of <code>search.php?user=#ID#</code>. But in our forum, to be efficient, we link posts to their author by the <code>user_id</code> column.</p>
<p>One way to do this is to run a query to find the ID then run another query to find the posts. Another way in this particular case is to use a <code>JOIN</code> statement. But yet another way is to do this:</p>
<blockquote><p><code>SELECT * FROM posts WHERE user_id = (SELECT user_id FROM users WHERE user_name = 'foo');</code></p></blockquote>
<p>In the case above, a <code>JOIN</code> would also get us the information we want, but in some cases this isn&#8217;t true, for example:</p>
<blockquote><p><code>SELECT column1 FROM t1<br />
WHERE column1 = (SELECT MAX(column2) FROM t2);</code></p></blockquote>
<p>When you need to compare against a <code>COUNT</code> or other aggregate of a column, a subquery like this is usually simpler than a <code>JOIN</code>, as well.</p>
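<p>The difference between a single-value subquery and a list subquery is easy to try out. The following is a minimal sketch with Python&#8217;s built-in <code>sqlite3</code> module standing in for PHP and MySQL; the sample rows, and the <code>user_id</code> column on <code>posts</code> as laid out here, are assumptions for the sake of the example:</p>

```python
import sqlite3

# In-memory stand-in for the forum tables, with invented sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users   (user_id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE threads (thread_id INTEGER PRIMARY KEY, forum_id INTEGER);
CREATE TABLE posts   (post_id INTEGER PRIMARY KEY,
                      thread_id INTEGER, user_id INTEGER);
INSERT INTO users   VALUES (1, 'foo'), (2, 'bar');
INSERT INTO threads VALUES (10, 1), (20, 2);
INSERT INTO posts   VALUES (100, 10, 1), (101, 10, 2), (200, 20, 1);
""")

# "=" needs a subquery that yields one value: the ID of a single user.
by_user = conn.execute(
    "SELECT post_id FROM posts "
    "WHERE user_id = (SELECT user_id FROM users WHERE user_name = ?) "
    "ORDER BY post_id",
    ("foo",),
).fetchall()

# "IN" accepts a subquery that yields a list: every thread in forum 1.
in_forum = conn.execute(
    "SELECT post_id FROM posts "
    "WHERE thread_id IN (SELECT thread_id FROM threads WHERE forum_id = ?) "
    "ORDER BY post_id",
    (1,),
).fetchall()
```

<p>Each result comes back from a single call, with no separate lookup of the user ID or the thread IDs beforehand.</p>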
<h3>Summary</h3>
<p>This article only scratched the surface of subqueries. Subqueries can be nested, they can appear in other places and do other things, and they can make your SQL more readable, among other things. I don&#8217;t claim that the SQL statements above are the world&#8217;s most efficient or best way to do things—if you know a better way, let me know! I just want to give an introduction to subqueries, a very basic part of SQL that few people I&#8217;ve met seem to understand.</p>
]]></content:encoded>
			<wfw:commentRss>http://coffeeonthekeyboard.com/mysql-subqueries-48/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

