<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Coffee on the Keyboard &#187; MySQL</title>
	<atom:link href="http://coffeeonthekeyboard.com/category/mysql/feed/" rel="self" type="application/rss+xml" />
	<link>http://coffeeonthekeyboard.com</link>
	<description>by James Socol</description>
	<lastBuildDate>Mon, 06 Feb 2012 23:33:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/>		<item>
		<title>Responsible SQL: How to Authenticate Users</title>
		<link>http://coffeeonthekeyboard.com/responsible-sql-how-to-authenticate-144/</link>
		<comments>http://coffeeonthekeyboard.com/responsible-sql-how-to-authenticate-144/#comments</comments>
		<pubDate>Sun, 09 Nov 2008 17:16:58 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[attack]]></category>
		<category><![CDATA[Back-end]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[injection]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://coffeeonthekeyboard.com/?p=144</guid>
		<description><![CDATA[Most SQL-injection articles set a horrible example for young programmers. Here is a very typical &#8220;bad example&#8221; of why you need to escape user data before it goes into SQL queries: (ed. The symbol « is a line break that’s not in the real code.) $username = $_POST&#91;'username'&#93;; // username=admin $password = $_POST&#91;'password'&#93;; // password=' [...]]]></description>
			<content:encoded><![CDATA[<p>Most SQL-injection articles set a horrible example for young programmers.</p>
<p>Here is a very typical &#8220;bad example&#8221; of why you need to escape user data before it goes into SQL queries:</p>
<p>(ed. The symbol « is a line break that’s not in the real code.)</p>
<div class="dean_ch" style="white-space: wrap;">
<ol>
<li class="li1">
<div class="de1"><span class="re0">$username</span> = <span class="re0">$_POST</span><span class="br0">&#91;</span><span class="st0">'username'</span><span class="br0">&#93;</span>; <span class="co1">// username=admin</span></div>
</li>
<li class="li1">
<div class="de1"><span class="re0">$password</span> = <span class="re0">$_POST</span><span class="br0">&#91;</span><span class="st0">'password'</span><span class="br0">&#93;</span>; <span class="co1">// password=' OR 1=1; -- '</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="re0">$user</span> = <span class="re0">$db</span>-&gt;<span class="me1">query</span><span class="br0">&#40;</span><span class="st0">&quot;SELECT * FROM users WHERE «</span></div>
</li>
<li class="li2">
<div class="de2"><span class="st0"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; username='$username' AND «</span></div>
</li>
<li class="li1">
<div class="de1"><span class="st0"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; password='$password' LIMIT 1;&quot;</span><span class="br0">&#41;</span>;</div>
</li>
</ol>
</div>
<p>The point, of course, is that you must sanitize your user input, or else this person would run this query:</p>
<div class="dean_ch" style="white-space: wrap;">
<ol>
<li class="li1">
<div class="de1"><span class="re0">$user</span> = <span class="re0">$db</span>-&gt;<span class="me1">query</span><span class="br0">&#40;</span><span class="st0">&quot;SELECT * FROM users WHERE «</span></div>
</li>
<li class="li1">
<div class="de1"><span class="st0"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; username='admin' AND «</span></div>
</li>
<li class="li1">
<div class="de1"><span class="st0"> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; password='' OR 1=1; -- '' LIMIT 1;&quot;</span><span class="br0">&#41;</span>;</div>
</li>
</ol>
</div>
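<p>Because <code>OR 1=1</code> is always true and <code>--</code> comments out the rest of the line, that query matches every row in the table. The first account is often the admin, so the sneaky user gets all your admin privileges. Other versions have nefarious users dropping your users or articles tables.</p>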
<p>The problem is: this is the wrong way to authenticate users. These examples are written for beginners to understand the importance of sanitizing input, but they also provide a model to those beginners for how user authentication works. And it&#8217;s a very bad model.</p>
<p>This is a long one, more after the break.<span id="more-144"></span></p>
<p>The only upside to authenticating this way is that you don&#8217;t expose any information on failure. That is, if I&#8217;m trying to hijack someone&#8217;s account, I can&#8217;t tell the difference between an invalid username and a valid username with a bad password. That&#8217;s good, but there are good reasons not to do this at the database level.</p>
<p>The &#8220;correct&#8221; way is not much more complex. Basically:</p>
<ol>
<li>Look up the record with the <strong>username</strong> only.</li>
<li>Get the (hashed) password out of the database.</li>
<li>Hash the submitted password.</li>
<li>Compare the two hashes.</li>
</ol>
<p>This is really not very hard to implement. In PHP:</p>
<div class="dean_ch" style="white-space: wrap;">
<ol>
<li class="li1">
<div class="de1"><span class="coMULTI">/**</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp;* Check a password against the database</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp;*</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp;* @<a href="http://twitter.com/param">param</a> string $username The username to check</span></div>
</li>
<li class="li2">
<div class="de2"><span class="coMULTI">&nbsp;* @<a href="http://twitter.com/param">param</a> string $password The (supposed) password</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp;* @<a href="http://twitter.com/return">return</a> int 0=success, 1=bad username, 2=bad password</span></div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">&nbsp;*/</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw2">function</span> check_password <span class="br0">&#40;</span> <span class="re0">$username</span>, <span class="re0">$password</span> <span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="re0">$db</span> = <span class="kw2">new</span> mysqli<span class="br0">&#40;</span><span class="br0">&#41;</span>; <span class="co1">// we need to talk to the DB</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="co1">// the real_escape_string() function is much better</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="co1">// than add_slashes() for escaping MySQL database input</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="re0">$_username</span> = <span class="re0">$db</span>-&gt;<span class="me1">real_escape_string</span><span class="br0">&#40;</span><span class="re0">$username</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="co1">// I try to make my SQL queries as easy to read</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="co1">// as possible. (Not always very easy.)</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="re0">$result</span> = <span class="re0">$db</span>-&gt;<span class="me1">query</span><span class="br0">&#40;</span><span class="st0">&quot;SELECT password &quot;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; .<span class="st0">&quot;FROM users &quot;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; .<span class="st0">&quot;WHERE username = '{$_username}' &quot;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; .<span class="st0">&quot;LIMIT 1;&quot;</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="co1">// we&#8217;re assuming the query ran correctly</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="co1">// if we can&#8217;t return a row, then there&#8217;s no user with</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="co1">// that name</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span> !<span class="re0">$user</span> = <span class="re0">$result</span>-&gt;<span class="me1">fetch_assoc</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="nu0">1</span>; <span class="co1">// return code for bad username</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="co1">// now, assuming the password was hashed with crypt()</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span> <span class="re0">$user</span><span class="br0">&#91;</span><span class="st0">&#8216;password&#8217;</span><span class="br0">&#93;</span> != «</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <a href="http://www.php.net/crypt"><span class="kw3">crypt</span></a><span class="br0">&#40;</span><span class="re0">$password</span>, <span class="re0">$user</span><span class="br0">&#91;</span><span class="st0">&#8216;password&#8217;</span><span class="br0">&#93;</span><span class="br0">&#41;</span> <span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="nu0">2</span>; <span class="co1">// return code for bad password</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw1">return</span> <span class="nu0">0</span>; <span class="co1">// return code for success</span></div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span></div>
</li>
</ol>
</div>
<p>What&#8217;s going on here? Basically, we&#8217;re looking up the user by the username. If we don&#8217;t find a user, we return an error code. If we do find a user, we hash the password they supplied (<code>crypt()</code> reuses the salt stored at the start of the existing hash) and compare the result with the hash we already have. If they don&#8217;t match, we return a different error code. If they do, the user is allowed to log in.</p>
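<p>If you&#8217;d rather not escape input by hand at all, a prepared statement keeps the username out of the SQL text entirely. Here is a minimal sketch of the same look-up-then-compare flow using PDO, under assumptions of my own: the function name <code>check_password_pdo()</code> and the two-column <code>users</code> table are hypothetical, the hashes come from <code>password_hash()</code> (PHP 5.5 or later, so newer than this post), and SQLite stands in for MySQL only so the example runs by itself.</p>

```php
<?php
// Sketch: the check_password() flow redone with PDO prepared statements.
// Hypothetical: the name check_password_pdo(), the users table layout,
// and hashes created by password_hash() (PHP 5.5+). SQLite is used only
// so the demo runs without a MySQL server.

/**
 * @return int 0=success, 1=bad username, 2=bad password
 */
function check_password_pdo(PDO $db, $username, $password)
{
    // The ? placeholder sends $username separately from the SQL text,
    // so nothing needs escaping and injection strings stay plain data.
    $stmt = $db->prepare('SELECT password FROM users WHERE username = ? LIMIT 1');
    $stmt->execute(array($username));
    $user = $stmt->fetch(PDO::FETCH_ASSOC);

    if ($user === false) {
        return 1; // no matching row: bad username
    }

    // password_verify() hashes $password with the salt stored in the
    // existing hash and compares the results in constant time.
    if (!password_verify($password, $user['password'])) {
        return 2; // bad password
    }

    return 0; // success
}

// Throwaway in-memory database holding one user.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (username TEXT PRIMARY KEY, password TEXT NOT NULL)');
$add = $db->prepare('INSERT INTO users (username, password) VALUES (?, ?)');
$add->execute(array('admin', password_hash('correct horse', PASSWORD_DEFAULT)));

var_dump(check_password_pdo($db, 'admin', 'correct horse'));  // int(0)
var_dump(check_password_pdo($db, 'admin', "' OR 1=1; -- '")); // int(2)
```

<p>The injection string from the top of this post is now just a wrong password. With MySQL you would only change the DSN passed to <code>new PDO()</code>; the return codes match the original function, so the calling code doesn&#8217;t need to change.</p>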
<p>There are two key differences between this method and the method so often espoused by tutorial writers:</p>
<ol>
<li>This method stores a hashed password instead of plain text.</li>
<li>This method differentiates between bad usernames and bad passwords.</li>
</ol>
<p>#1 should be obvious. Never store a password in plain text. It&#8217;s extremely dangerous: if someone ever gets a look at the table, they can just read the users&#8217; passwords, which may well be the same as their bank passwords (they shouldn&#8217;t be, but they probably are). And it&#8217;s unnecessary. Every server-side language implements the MD5 hash, which is weak but far better than plain text. Better options (like PHP&#8217;s <a onclick="window.open(this.href,'newwindow'); return false;" href="http://www.php.net/crypt">crypt()</a>) support salted algorithms such as Blowfish, SHA-256 and SHA-512, or at least MD5 with a random salt.</p>
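<p>To make the salting concrete, here is a small sketch using PHP&#8217;s password API. It&#8217;s an assumption on my part that you can use it: <code>password_hash()</code> and <code>password_verify()</code> need PHP 5.5 or later and <code>hash_equals()</code> needs 5.6, all newer than this post. With <code>PASSWORD_BCRYPT</code> they wrap <code>crypt()</code> with the Blowfish-based bcrypt algorithm and generate the salt for you.</p>

```php
<?php
// Sketch of salted hashing with PHP's password API (PHP 5.5+), which
// wraps crypt() and, with PASSWORD_BCRYPT, uses Blowfish-based bcrypt.

// At registration: a random salt is generated and stored inside the hash.
$stored = password_hash('hunter2', PASSWORD_BCRYPT);
$again  = password_hash('hunter2', PASSWORD_BCRYPT);

var_dump(substr($stored, 0, 4)); // string(4) "$2y$" (bcrypt marker)
var_dump($stored === $again);    // bool(false): same password, new salt

// At login: password_verify() reads the salt back out of the stored hash.
var_dump(password_verify('hunter2', $stored)); // bool(true)
var_dump(password_verify('hunter3', $stored)); // bool(false)

// The crypt($password, $stored_hash) pattern gives the same answer;
// hash_equals() (PHP 5.6+) compares the strings in constant time.
var_dump(hash_equals($stored, crypt('hunter2', $stored))); // bool(true)
```

<p>Because each hash carries its own random salt, two users with the same password get different hashes, and a precomputed table of unsalted MD5 values is useless against them.</p>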
<p>But wait, #2, I said it was better <em>not</em> to distinguish between a bad username and a bad password, right? Well&#8230; yes, to the end user. In either case, I should display a message like &#8220;Bad username or password&#8221; to the person who tried to log in.</p>
<p>Internally, however, I want to know what happened. Is someone targeting known users, or just trying random combinations? How did they find real usernames? Where should I be improving security?</p>
<p>(I&#8217;ve omitted logging or real error-handling here. In a live version, I would probably wrap most of this in a <code><a onclick="window.open(this.href,'newwindow'); return false;" href="http://us2.php.net/manual/en/language.exceptions.php">try</a></code> block, throw one of three types of exceptions, and do some logging in the <code>catch</code> block.)</p>
<p>Looking users up by name alone also minimizes the number of user-submitted strings that get sent to the database, so there are fewer opportunities to accidentally allow an injection attack. If you have a policy on username syntax, you can keep yourself even safer by not talking to the database at all when a username breaks it:</p>
<div class="dean_ch" style="white-space: wrap;">
<ol>
<li class="li1">
<div class="de1"><span class="kw2">&lt;?php</span></div>
</li>
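<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="re0">$username</span> = <span class="re0">$_POST</span><span class="br0">&#91;</span><span class="st0">'username'</span><span class="br0">&#93;</span>;</div>
</li>
<li class="li1">
<div class="de1"><span class="re0">$password</span> = <span class="re0">$_POST</span><span class="br0">&#91;</span><span class="st0">'password'</span><span class="br0">&#93;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>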
<li class="li1">
<div class="de1"><span class="co1">// Usernames must start with a letter, and contain</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">// only letters, numbers, underscores and dots, but</span></div>
</li>
<li class="li2">
<div class="de2"><span class="co1">// must not end with a dot or underscore.</span></div>
</li>
<li class="li1">
<div class="de1"><span class="re0">$user_regex</span> = <span class="st0">'/^[a-zA-Z]([a-zA-Z0-9_<span class="es0">\.</span>]*[a-zA-Z0-9])?\z/'</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">if</span> <span class="br0">&#40;</span> <a href="http://www.php.net/preg_match"><span class="kw3">preg_match</span></a><span class="br0">&#40;</span><span class="re0">$user_regex</span>,<span class="re0">$username</span><span class="br0">&#41;</span> <span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="co1">// the username matches our allowed syntax</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="re0">$auth</span> = check_password<span class="br0">&#40;</span><span class="re0">$username</span>, <span class="re0">$password</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span> <span class="re0">$auth</span> === <span class="nu0">0</span> <span class="br0">&#41;</span> <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; <span class="co1">// the do_login() function is an exercise</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; <span class="co1">// to the reader</span></div>
</li>
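<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; do_login<span class="br0">&#40;</span><span class="re0">$username</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; <span class="kw1">exit</span>; <span class="co1">// stop here instead of falling through to the error below</span></div>
</li>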
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="co1">// the username was bad, or the username/password</span></div>
</li>
<li class="li2">
<div class="de2"><span class="co1">// was wrong</span></div>
</li>
<li class="li1">
<div class="de1"><span class="co1">// die() is an overly simplistic choice, here.</span></div>
</li>
<li class="li1">
<div class="de1"><a href="http://www.php.net/die"><span class="kw3">die</span></a><span class="br0">&#40;</span><span class="st0">&quot;Bad username or password.&quot;</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="kw2">?&gt;</span></div>
</li>
</ol>
</div>
<p>Obviously we still escape the username inside <code>check_password()</code>, just to make damn sure, but this gives us another place to get information. Did someone actually enter <code>'; DROP TABLE users; --</code> into our login form, or did they just mistype their password?</p>
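<p>The anchors in that pattern matter more than they look. Here&#8217;s a sketch of the username policy as a reusable check; the name <code>is_valid_username()</code> is hypothetical. <code>^</code> and <code>\z</code> pin the match to the whole string, and the optional group lets one-letter names through, which the stated policy allows. Without anchors, <code>preg_match()</code> succeeds as soon as <em>any</em> substring fits, so the <code>DROP TABLE</code> string above would pass because it contains the word &#8220;DROP&#8221;.</p>

```php
<?php
// Sketch: the username policy as a reusable check (hypothetical name).
// ^ and \z pin the pattern to the whole string; \z, unlike $, does not
// allow a trailing newline. Without anchors, preg_match() succeeds if
// any substring matches.

function is_valid_username($username)
{
    // Starts with a letter; then letters, digits, "_" or "."; the
    // optional group allows one-letter names, and longer names must
    // end with a letter or digit.
    $pattern = '/^[a-zA-Z]([a-zA-Z0-9_.]*[a-zA-Z0-9])?\z/';

    return is_string($username) && preg_match($pattern, $username) === 1;
}

var_dump(is_valid_username('j.socol'));                 // bool(true)
var_dump(is_valid_username("'; DROP TABLE users; --")); // bool(false)
var_dump(is_valid_username('admin_'));                  // bool(false)
```

<p>A failed check here is exactly the kind of event worth logging: a real user rarely types something that breaks the username syntax.</p>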
<p>I&#8217;m going to end with a request: if you&#8217;re about to write a tutorial for beginners, please be aware of what you&#8217;re modeling in your examples. If you&#8217;re doing something you would never do, for the sake of simplicity or because it&#8217;s not the focus of the tutorial, point that out. Link to another tutorial or at least mention that it&#8217;s a bad way to do something.</p>
<p>Don&#8217;t send a quiet message that wrong is OK.</p>
]]></content:encoded>
			<wfw:commentRss>http://coffeeonthekeyboard.com/responsible-sql-how-to-authenticate-144/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Help Me Scale</title>
		<link>http://coffeeonthekeyboard.com/help-me-scale-97/</link>
		<comments>http://coffeeonthekeyboard.com/help-me-scale-97/#comments</comments>
		<pubDate>Fri, 06 Jun 2008 14:58:38 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Back-end]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[db]]></category>
		<category><![CDATA[load]]></category>
		<category><![CDATA[microblog]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[query]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[social messaging]]></category>
		<category><![CDATA[subquery]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://coffeeonthekeyboard.com/help-me-scale-97/</guid>
		<description><![CDATA[I&#8217;ve been reading Eran Hammer-Lahav&#8217;s intelligent posts on microblog scalability, and now I&#8217;m concerned about my own &#8220;microblog&#8221; site, Picofiction. Similar to social networks, social updates, social messaging, social&#8230; Like many social web sites—amongst our weaponry&#8230;—Picofiction lets you &#8220;follow&#8221; your favorite authors, displaying all their posts along with yours. I handle this very naïvely: everything [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been reading Eran Hammer-Lahav&#8217;s <a href="http://www.hueniverse.com/hueniverse/2008/04/scaling-a-micro.html">intelligent</a> <a href="http://www.hueniverse.com/hueniverse/2008/03/scaling-a-micro.html">posts on</a> <a href="http://www.hueniverse.com/hueniverse/2008/03/on-scaling-a-mi.html">microblog scalability</a>, and now I&#8217;m concerned about my own &#8220;microblog&#8221; site, <a href="http://picofiction.com/">Picofiction</a>.</p>
<p>Similar to social networks, social updates, social messaging, social&#8230; Like many social web sites—amongst our weaponry&#8230;—Picofiction lets you &#8220;follow&#8221; your favorite authors, displaying all their posts along with yours.</p>
<p>I handle this very naïvely: everything is offloaded to the database. There are three tables involved: one of users, one of posts, and one of follower/followee bindings.</p>
<p>Here&#8217;s the basic structure of this query:</p>
<pre>SELECT post_id, post_body, post_date, post_type,
  user_name AS author_name, user_id AS author_id
FROM posts
LEFT JOIN users
ON posts.author_id = users.user_id
WHERE author_id = '<var>CURRENT_USER</var>'
OR author_id IN (
  SELECT followed_id
  FROM followers
  WHERE following_id = '<var>CURRENT_USER</var>'
)
ORDER BY post_date DESC
LIMIT <var>PAGE_START</var>,20;</pre>
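<p>To pin down what that query returns, here is a sketch that runs it through PDO with bound parameters instead of pasted-in values. The table and column names follow the query; the column types, sample rows, the function name <code>fetch_timeline()</code> and the SQLite backend are my assumptions for a standalone demo. <code>LIMIT 20 OFFSET ?</code> is the same as MySQL&#8217;s <code>LIMIT start,20</code>, and both MySQL and SQLite accept it.</p>

```php
<?php
// Sketch: the timeline query with bound parameters. Table/column names
// follow the query above; types, sample rows, the function name and the
// SQLite backend are assumptions for a standalone demo.

function fetch_timeline(PDO $db, $current_user, $page_start)
{
    $stmt = $db->prepare(
        'SELECT post_id, post_body, post_date, post_type,
                user_name AS author_name, user_id AS author_id
         FROM posts
         LEFT JOIN users ON posts.author_id = users.user_id
         WHERE posts.author_id = ?
            OR posts.author_id IN (SELECT followed_id
                                   FROM followers
                                   WHERE following_id = ?)
         ORDER BY post_date DESC
         LIMIT 20 OFFSET ?'
    );
    $stmt->bindValue(1, $current_user, PDO::PARAM_INT);
    $stmt->bindValue(2, $current_user, PDO::PARAM_INT);
    $stmt->bindValue(3, $page_start, PDO::PARAM_INT); // OFFSET must be an integer
    $stmt->execute();

    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT)');
$db->exec('CREATE TABLE followers (following_id INTEGER, followed_id INTEGER)');
$db->exec('CREATE TABLE posts (post_id INTEGER PRIMARY KEY, author_id INTEGER,
           post_body TEXT, post_date TEXT, post_type TEXT)');
$db->exec("INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol')");
$db->exec('INSERT INTO followers VALUES (1, 2)'); // alice follows bob
$db->exec("INSERT INTO posts VALUES
           (1, 1, 'Alice writes', '2008-06-01', 'story'),
           (2, 2, 'Bob writes',   '2008-06-02', 'story'),
           (3, 3, 'Carol writes', '2008-06-03', 'story')");

foreach (fetch_timeline($db, 1, 0) as $post) {
    echo $post['author_name'], ': ', $post['post_body'], "\n";
}
// bob: Bob writes
// alice: Alice writes
```

<p>Carol&#8217;s newer post is left out because Alice doesn&#8217;t follow her. Every page of this query touches all three tables at once, which is part of why it is hard to split across servers.</p>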
<p>Here&#8217;s where I need help: this works great on a single database, but it does not scale horizontally.</p>
<p>Since horizontal scalability is such a hot topic right now, I&#8217;m asking for ideas. I&#8217;d like to put the infrastructure in place <em>before</em> there is a need for it.</p>
<p>Eran points out that caching is not as simple a solution as we&#8217;d like to think. What do you cache? How do you keep caches in sync?</p>
<p>Does anyone have experience with MySQL Cluster Servers? It seems like the best way of scaling is to make the process as <a href="http://en.wikipedia.org/wiki/Amdahl%27s_law">parallelizable</a> as possible. The database then handles the parallelization, so the less I can do in the program the better, right?</p>
]]></content:encoded>
			<wfw:commentRss>http://coffeeonthekeyboard.com/help-me-scale-97/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MySQL Subqueries</title>
		<link>http://coffeeonthekeyboard.com/mysql-subqueries-48/</link>
		<comments>http://coffeeonthekeyboard.com/mysql-subqueries-48/#comments</comments>
		<pubDate>Mon, 06 Aug 2007 01:31:00 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[How To]]></category>
		<category><![CDATA[subquery]]></category>

		<guid isPermaLink="false">http://coffeeonthekeyboard.com/mysql-subqueries-48/</guid>
		<description><![CDATA[I often have trouble finding tips and advice for doing relatively simple things in tools like MySQL, Ruby, Python, etc. So, starting with this post, I will help fill that niche. Today&#8217;s topic is Using Subqueries to Simplify your SQL Queries. For this article, I&#8217;m using PHP and MySQL for examples. There are [...]]]></description>
			<content:encoded><![CDATA[<p>I often have trouble finding tips and advice for doing relatively simple things in tools like MySQL, Ruby, Python, etc. So, starting with this post, I will help fill that niche. Today&#8217;s topic is <strong>Using Subqueries to Simplify your SQL Queries.</strong></p>
<p class="note" style="font-style: italic; font-size: 90%; color: #999999">For this article, I&#8217;m using PHP and MySQL for examples. There are slightly different implementations of SQL in the various database engines, but nearly all of them support subqueries (MySQL has since version 4.1).</p>
<p>SQL, the &#8220;<strong>s</strong>tructured <strong>q</strong>uery <strong>l</strong>anguage,&#8221; lets you nest one query inside another, and these subqueries can make complex queries easier to write and often faster to run. The idea of a subquery is simple: have the database perform one query and use its result inside another.</p>
<p>There are dozens of useful ways of using subqueries, but I will concentrate on two: subqueries in the <em>select expression</em> and subqueries in the <em>where clause</em>.</p>
<h3>Security Concerns</h3>
<p>Many database interfaces for web programming languages, such as PHP&#8217;s <code>mysql_query()</code>, allow only one query per call for security reasons: an injection attack could input something like <code>'; DELETE FROM users;</code> and do serious damage to a website. Imagine your SQL query to log in looked something like:</p>
<blockquote><p><code>SELECT * FROM users WHERE user_name = '$username' AND password = '$password';</code></p></blockquote>
<p>If you are not checking and cleaning the input appropriately, someone could type the snippet above into your login form and, if multiple queries were allowed, MySQL would execute the following:</p>
<blockquote><p><code>SELECT * FROM users WHERE user_name = ''; DELETE FROM users;' AND password = '';</code></p></blockquote>
<p>Since the empty string wouldn&#8217;t match any rows (hopefully), the first query would return nothing. The second query, the <code>DELETE</code> statement, would run up to the second semicolon and empty your <code>users</code> table. The third piece of code is nonsense, so MySQL would reject it with an error, but by then the damage is done.</p>
<p>To solve this problem, these interfaces cause MySQL to issue an error any time there is more text (except comments) after the statement terminator, usually the semicolon. The downside is that sometimes you need to run multiple queries. The result is often either a godawfully complicated statement with multiple <code>JOIN</code>s, or several separate queries, each of which requires a round trip to your database server and can slow down your application.</p>
<p>In the examples below, I&#8217;ll pretend we&#8217;re building a forum that has four tables:</p>
<ul>
<li><code>users</code> with primary key <code>user_id</code></li>
<li><code>forums</code>, a list of all the boards, with primary key <code>forum_id</code></li>
<li><code>threads</code> which links each thread to a forum with <code>forum_id</code> and has primary key <code>thread_id</code></li>
<li><code>posts</code> which links each post to a thread with <code>thread_id</code> and has primary key <code>post_id</code></li>
</ul>
<h3>Subqueries in Select Expressions</h3>
<p>One way to cut down on those round trips is to use subqueries. Subqueries are full SQL queries nested within another query. For example:</p>
<blockquote><p><code>SELECT (SELECT COUNT(*) FROM t1);</code></p></blockquote>
<p>Obviously it&#8217;s a pretty simple example. A subquery used as a value like this must return a single value (one row, one column). Notice the parentheses, too: subqueries must always be in parentheses, even if they are inside a function, like:</p>
<blockquote><p><code>SELECT ROUND((SELECT AVG(salary) FROM employees));</code></p></blockquote>
<p>Let&#8217;s get to work on our forum. Say that while reading all the threads of a forum you&#8217;d like to have both the number of threads and the number of posts in the forum. One way is to run two separate queries:</p>
<blockquote><p><code>SELECT COUNT(*) AS threads FROM threads WHERE forum_id='1';<br />
SELECT COUNT(*) AS posts FROM posts LEFT JOIN threads USING(thread_id) WHERE forum_id='1';</code></p></blockquote>
<p>That might not be so bad if your SQL server is <code>localhost</code>, but more and more hosts are running dedicated SQL servers, meaning that every query has to travel across the network, be processed, and come back, slowing down your application. But we can get both numbers in one query with two subqueries:</p>
<blockquote><p><code>SELECT<br />
(SELECT COUNT(*) FROM threads WHERE forum_id='1') AS threads,<br />
(SELECT COUNT(*) FROM posts LEFT JOIN threads USING(thread_id) WHERE forum_id='1') AS posts;</code></p></blockquote>
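<p>Here is a sketch of that single round trip from PHP, with the forum ID bound as a parameter rather than written into the SQL. The schema keeps only the columns the query needs from the tables listed earlier; the sample rows, the function name <code>forum_counts()</code> and the SQLite backend are assumptions so it runs on its own.</p>

```php
<?php
// Sketch: both counts in one round trip. Schema trimmed to the columns
// the query uses; sample rows, function name and SQLite are assumptions.

function forum_counts(PDO $db, $forum_id)
{
    $stmt = $db->prepare(
        'SELECT
           (SELECT COUNT(*) FROM threads WHERE forum_id = ?) AS threads,
           (SELECT COUNT(*) FROM posts LEFT JOIN threads USING (thread_id)
            WHERE forum_id = ?) AS posts'
    );
    $stmt->bindValue(1, $forum_id, PDO::PARAM_INT);
    $stmt->bindValue(2, $forum_id, PDO::PARAM_INT);
    $stmt->execute();
    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    return array('threads' => (int) $row['threads'], 'posts' => (int) $row['posts']);
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE threads (thread_id INTEGER PRIMARY KEY, forum_id INTEGER)');
$db->exec('CREATE TABLE posts (post_id INTEGER PRIMARY KEY, thread_id INTEGER)');
$db->exec('INSERT INTO threads VALUES (1, 1), (2, 1), (3, 2)'); // forum 1 has two threads
$db->exec('INSERT INTO posts VALUES (1, 1), (2, 1), (3, 2), (4, 3)');

$counts = forum_counts($db, 1);
echo "{$counts['threads']} threads, {$counts['posts']} posts\n"; // 2 threads, 3 posts
```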
<p>We can add the above to our query to get the name of the forum and its description, so we can further decrease the number of trips to the database:</p>
<blockquote><p><code>SELECT<br />
(SELECT COUNT(*) FROM threads WHERE threads.forum_id=forums.forum_id) AS threads,<br />
(SELECT COUNT(*) FROM posts LEFT JOIN threads USING(thread_id) WHERE threads.forum_id=forums.forum_id) AS posts,<br />
forum_name,<br />
forum_description<br />
FROM forums WHERE forum_id='1';</code></p></blockquote>
<p>Notice that we also changed the <code>WHERE</code> clauses to match whatever forum ID we put into the &#8220;<em>outer query</em>&#8221;.</p>
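<p>Because those subqueries refer to <code>forums.forum_id</code>, they are evaluated for each row of the outer query, so dropping the outer <code>WHERE</code> gives you a summary of every board at once. A sketch, with the trimmed schema, sample rows, the function name <code>forum_summaries()</code> and SQLite all being illustrative assumptions:</p>

```php
<?php
// Sketch: correlated subqueries. Each inner query refers to
// forums.forum_id from the outer query, so it is evaluated per forum row.
// Schema trimmed; sample rows, function name and SQLite are assumptions.

function forum_summaries(PDO $db)
{
    $sql = 'SELECT forum_name,
              (SELECT COUNT(*) FROM threads
               WHERE threads.forum_id = forums.forum_id) AS threads,
              (SELECT COUNT(*) FROM posts LEFT JOIN threads USING (thread_id)
               WHERE threads.forum_id = forums.forum_id) AS posts
            FROM forums
            ORDER BY forum_id';

    $summaries = array();
    foreach ($db->query($sql)->fetchAll(PDO::FETCH_ASSOC) as $row) {
        $summaries[$row['forum_name']] = array((int) $row['threads'], (int) $row['posts']);
    }
    return $summaries;
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE forums (forum_id INTEGER PRIMARY KEY, forum_name TEXT)');
$db->exec('CREATE TABLE threads (thread_id INTEGER PRIMARY KEY, forum_id INTEGER)');
$db->exec('CREATE TABLE posts (post_id INTEGER PRIMARY KEY, thread_id INTEGER)');
$db->exec("INSERT INTO forums VALUES (1, 'General'), (2, 'Help'), (3, 'Empty')");
$db->exec('INSERT INTO threads VALUES (1, 1), (2, 1), (3, 2)');
$db->exec('INSERT INTO posts VALUES (1, 1), (2, 1), (3, 2), (4, 3)');

foreach (forum_summaries($db) as $name => $counts) {
    printf("%s: threads=%d posts=%d\n", $name, $counts[0], $counts[1]);
}
// General: threads=2 posts=3
// Help: threads=1 posts=1
// Empty: threads=0 posts=0
```

<p>Boards with no threads still appear, with zero counts, because <code>COUNT(*)</code> over no rows returns 0.</p>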
<h3>Subqueries in Where Clauses</h3>
<p>Another simple and useful way to use a subquery is in a <code>WHERE</code> clause. Here you must be careful to match the <code>WHERE</code> syntax and the type of data returned by the subquery. For example, in <code>WHERE user_name = (...)</code>, the subquery (<code>(...)</code>) must return a single value, while in <code>WHERE post_date IN (...)</code>, the subquery can return a list.</p>
<p>In our forum, we might want to search for all posts by a specific user, but we don&#8217;t want our visitors to need to know the user ID—or perhaps we want a more descriptive URL, like <code>search.php?user=USER_NAME</code> instead of <code>search.php?user=#ID#</code>. But in our forum, to be efficient, we link posts to their author by the <code>user_id</code> column.</p>
<p>One way to do this is to run a query to find the ID then run another query to find the posts. Another way in this particular case is to use a <code>JOIN</code> statement. But yet another way is to do this:</p>
<blockquote><p><code>SELECT * FROM posts WHERE user_id = (SELECT user_id FROM users WHERE user_name = 'foo');</code></p></blockquote>
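<p>In a real search page the name comes from the URL, so it should be bound, not pasted into the SQL. Here is a sketch with PDO; the function name <code>posts_by_username()</code>, the trimmed schema and the SQLite backend are assumptions for illustration. Note the <code>UNIQUE</code> on <code>user_name</code>: with <code>=</code>, the subquery must return at most one row.</p>

```php
<?php
// Sketch: search by user name via a subquery in WHERE, with the name
// bound as a parameter. Function name, schema and SQLite are assumptions.

function posts_by_username(PDO $db, $user_name)
{
    $stmt = $db->prepare(
        'SELECT post_id FROM posts
         WHERE user_id = (SELECT user_id FROM users WHERE user_name = ?)
         ORDER BY post_id'
    );
    $stmt->execute(array($user_name));

    // Cast because some PDO drivers return numbers as strings.
    return array_map('intval', $stmt->fetchAll(PDO::FETCH_COLUMN));
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// UNIQUE matters: "=" needs the subquery to return at most one row.
$db->exec('CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT UNIQUE)');
$db->exec('CREATE TABLE posts (post_id INTEGER PRIMARY KEY, user_id INTEGER)');
$db->exec("INSERT INTO users VALUES (1, 'foo'), (2, 'bar')");
$db->exec('INSERT INTO posts VALUES (1, 1), (2, 2), (3, 1)');

echo implode(', ', posts_by_username($db, 'foo')), "\n"; // 1, 3
```

<p>An unknown name makes the subquery return no row, which compares as <code>NULL</code>, so the search simply comes back empty.</p>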
<p>In the case above, a <code>JOIN</code> would also get us the information we want, but in some cases this isn&#8217;t true, for example:</p>
<blockquote><p><code>SELECT column1 FROM t1<br />
WHERE column1 = (SELECT MAX(column2) FROM t2);</code></p></blockquote>
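<p>To see that query run end to end, here is a sketch with made-up tables and values (<code>t1</code>, <code>t2</code> and their contents are purely illustrative, and SQLite stands in for MySQL). The inner query doesn&#8217;t depend on the outer row, so it yields one value, and the outer query keeps the <code>t1</code> rows equal to it.</p>

```php
<?php
// Sketch: the t1/t2 example with made-up data; SQLite is used only so
// it runs standalone.

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE t1 (column1 INTEGER)');
$db->exec('CREATE TABLE t2 (column2 INTEGER)');
$db->exec('INSERT INTO t1 VALUES (1), (5), (9)');
$db->exec('INSERT INTO t2 VALUES (3), (5)');

// The subquery yields the single value MAX(column2) = 5; the outer
// query then returns only the t1 rows equal to it.
$matches = $db->query(
    'SELECT column1 FROM t1 WHERE column1 = (SELECT MAX(column2) FROM t2)'
)->fetchAll(PDO::FETCH_COLUMN);

echo implode(', ', $matches), "\n"; // 5
```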
<p>When you need to <code>COUNT</code> or otherwise aggregate one column, a subquery is often simpler than a <code>JOIN</code>, as well.</p>
<h3>Summary</h3>
<p>This article only scratched the surface of subqueries. Subqueries can be nested, they can appear in other places and do other things, and they can make your SQL more readable. I don&#8217;t claim that the SQL statements above are the world&#8217;s most efficient or best way to do things. If you know a better way, let me know! I just want to give an introduction to subqueries, a very basic part of SQL that few people I&#8217;ve met seem to understand.</p>
]]></content:encoded>
			<wfw:commentRss>http://coffeeonthekeyboard.com/mysql-subqueries-48/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

