Responsible SQL: How to Authenticate Users

Most SQL-injection articles set a horrible example for young programmers.

Here is a very typical “bad example” of why you need to escape user data before it goes into SQL queries:

(ed. The symbol « is a line break that’s not in the real code.)

$username = $_POST[‘username’]; // username=admin
  1. $password = $_POST[‘password’]; // password=’ OR 1=1; — ‘
  3. $user = $db->query("SELECT * FROM users WHERE «
  4.           username=’$username’ AND «
  5.           password=’$password’ LIMIT 1;");

The point, of course, is that you must sanitize your user input, or else this person would run this query:

$user = $db->query("SELECT * FROM users WHERE «
  1.           username=’admin’ AND «
  2.           password = ” OR 1=1; — ‘ LIMIT 1;");

Which grants the sneaky user all your admin privileges. Other versions have nefarious users dropping your users or articles tables.

The problem is: this is the wrong way to authenticate users. These examples are written for beginners to understand the importance of sanitizing input, but they also provide a model to those beginners for how user authentication works. And it’s a very bad model.

This is a long one, more after the break.

The only upside to authenticating this way is that you don’t expose any information on failure, that is, if I’m trying to hijack someone’s account, I can’t tell the difference between an invalid user name and a valid user name with a bad password. That’s good, but there are good reasons not to do this at the database level.

The “correct” way is not much more complex. Basically:

  1. Look up the record with the username only.
  2. Get the (hashed) password out of the database.
  3. Hash the submitted password.
  4. Compare the two hashes.

This is really not very hard to implement. In PHP:

  1.  * Check a password against the database
  2.  *
  3.  * @param string $username The username to check
  4.  * @param string $password The (supposed) password
  5.  * @return int 0=success, 1=bad username, 2=bad password
  6.  */
  7. function check_password ($username, $password){
  8.     $db = new mysqli(); // we need to talk to the DB
  10.     // the realescapestring() function is much better
  11.     // than add_slashes() for escaping MySQL database input
  12.     $username = $db->realescape_string($username);
  14.     // I try to make my SQL queries as easy to read
  15.     // as possible. (Not always very easy.)
  16.     $result = $db->query("SELECT password "
  17.                         ."FROM users "
  18.                         ."WHERE username = ‘{$_username}’ "
  19.                         ."LIMIT 1;");
  21.     // we’re assuming the query ran correctly
  23.     // if we can’t return a row, then there’s no user with
  24.     // that name
  25.     if( !$user = $result->fetch_assoc()){
  26.         return1; // return code for bad username
  27.     }
  29.     // now, assuming the password was hashed with crypt()
  30.     if($user[‘password’] != «
  31.             crypt($password, $user[‘password’])){
  32.         return2; // return code for bad password
  33.     }
  35.     return0; // return code for success
  36. }

What’s going on here? Basically, we’re looking up the user by the username. If we don’t find a user, we throw out an error. If we do find a user, we re-encrypt the password they supplied, and check it against the encrypted password we already have. If they don’t match, we throw out an error. If they do, the user is allowed to log in.

There are two key differences between this method and the method so often espoused by tutorial writers:

  1. This method stores an encrypted password instead of plain text.
  2. This method differentiates between bad usernames and bad passwords.

1 should be obvious. Never store an unencrypted password. It’s extremely dangerous: if someone ever gets a look at the table, they can just read the users’ passwords—which may well be the same as their bank password (no it shouldn’t be, but it probably is). And it’s unnecessary. Every server-side language implements the MD5 hash, which is weak but works. Better options (like PHP’s crypt()) can use algorithms like Triple-DES, SHA1, Blowfish, or at least MD5 with a random salt.

But wait, #2, I said it was better not to distinguish between a bad username and a bad password, right? Well… yes, to the end user. In either case, I should display a message like “Bad username or password” to the person who tried to log in.

Internally, however, I want to know what happened. Is someone targetting known users, or just trying random combinations? How did they find real usernames? Where should I be improving security?

You’re also minimizing the number of user-submitted strings that get sent to the database. There are fewer opportunities for you to accidently allows an injection attack. If you have a policy on username syntax, you can keep yourself even safer by not talking to the database if the username is bad:

(I’ve omitted logging or real error-handling here. In a live version, I would probably wrap most of this in a <a href="" onclick=",'newwindow'); return false;">try</a> block, throw one of three types of exceptions, and do some logging in the catch block.)

  2. // Usernames must start with a letter, and contain
  3. // only letters, numbers, underscores and dots, but
  4. // must not end with a dot or underscore.
  5. $userregex = ‘/[a-zA-Z][a-zA-Z0-9.]*[a-zA-Z0-9]/’;
  7. if(pregmatch($userregex,$username)){
  8.     // the username matches our allowed syntax
  9.     $auth = check_password($username, $password);
  11.     if($auth === 0){
  12.       // the do_login() function is an exercise
  13.       // to the reader
  14.       do_login($username);
  15.     }
  16. }
  18. // the username was bad, or the username/password
  19. // was wrong
  20. // die() is an overly simplistic choice, here.
  21. die("Bad username or password.");
  23. ?>

Obviously we still escape the username, to make damn sure, but this gives us another place to get information. Did someone actually enter '; DROP TABLE users; -- into our login form, or did they just mistype their password.

I’m going to end with a request: if you’re about to write a tutorial for beginners, please be aware of what you’re modeling in your examples. If you’re doing something you would never do, for the sake of simplicity or because it’s not the focus of the tutorial, point that out. Link to another tutorial or at least mention that it’s a bad way to do something.

Don’t send a quiet message that wrong is OK.