Simple out-of-process lock with Python and Memcached

On TodaysMeet I need to check that a name is not in use before creating a new record. Unfortunately, because names can be reused over time, I can’t create a UNIQUE key in the database and enforce it there. That means there is some tiny amount of time between checking for existence and writing a new record.

Normally, the gap is way too small to matter. But every once in a while some client will send multiple POST requests at more or less the same time. (The internet is a weird place at scale.)

Since the rest of the application relies on having only one “active” name at a time, writing the row twice causes problems throughout the app*.

The fix is simple in concept. I want to acquire a lock on the name so that only one thread/process/server at a time can actually do a check and write. Within a single process, or even across processes on a single server, Python has tools for this built in, such as threading.Lock and multiprocessing.Lock. (Assuming your concurrency model works with multiprocessing, I guess.)
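
For the single-process case, a minimal sketch of check-then-write under a built-in lock might look like this. The names here (names, names_guard, create_name) are made up for illustration, and the set stands in for the database table:

```python
import threading

names = set()                   # Stands in for the table of active names.
names_guard = threading.Lock()  # One lock shared by every thread in the process.


def create_name(name):
    """Check and write as one step; returns True if this call created the name."""
    with names_guard:
        if name in names:
            return False  # Someone else already holds this name.
        names.add(name)
        return True
```

This only protects threads inside one process, which is exactly why it stops being enough once there is more than one server.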

But I’d like to be able to run this across multiple machines and scale horizontally. Fortunately, I have memcached and memcached has an atomic add operation.

add works perfectly for this when multiple processes might be writing to the cache. Unlike set, add will only succeed if the key does not exist. Boom.
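
To make the difference between add and set concrete, here is a rough sketch using a tiny in-memory stand-in. FakeCache is hypothetical and only imitates the add/set/delete behavior described above; unlike a real memcached server, it makes no attempt to be atomic across threads or machines:

```python
import time


class FakeCache:
    """Hypothetical in-memory stand-in that imitates memcached add/set/delete."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp or None)

    def _live(self, key):
        entry = self._data.get(key)
        if entry is None:
            return False
        _, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # Expired entries behave as if absent.
            return False
        return True

    def set(self, key, value, timeout=None):
        expires_at = None if timeout is None else time.monotonic() + timeout
        self._data[key] = (value, expires_at)
        return True

    def add(self, key, value, timeout=None):
        if self._live(key):
            return False  # Key already present: add refuses to overwrite.
        return self.set(key, value, timeout)

    def delete(self, key):
        return self._data.pop(key, None) is not None


cache = FakeCache()
first = cache.add('namelock::demo', True, timeout=10)      # Key is new: succeeds.
second = cache.add('namelock::demo', True, timeout=10)     # Key exists: fails.
overwrite = cache.set('namelock::demo', True, timeout=10)  # set always succeeds.
```

The return value of add is the whole trick: whoever gets True holds the lock, and everyone else gets False.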

Here it is written up as a little context manager (the cache object here stands in for any memcached client, in my case the Django wrapper around it):

import hashlib
from contextlib import contextmanager

from django.core.cache import cache  # Or any client exposing add() and delete().

@contextmanager
def name_lock(name):
    # Hash the name so the key is always a safe, fixed-length memcached key.
    key = 'namelock::%s' % hashlib.md5(name.encode('utf-8')).hexdigest()
    # Django's cache API calls the expiry "timeout" (in seconds).
    lock = cache.add(key, True, timeout=10)  # In case the power goes out.
    try:
        yield lock  # Tell the inner block if it acquired the lock.
    finally:
        if lock:  # Only clear the lock if we had it.
            cache.delete(key)

Now I can have multiple processes on multiple machines safely check whether the name exists and create a row.

with name_lock(name) as lock:
    if lock and name_is_valid(name):
        save_name(name)
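
As a quick sanity check of the pattern, the sketch below runs 20 threads that all try to create the same name at once. Everything here is an assumption made for the demo: LocalCache is a hypothetical in-process stand-in for memcached (a threading.Lock makes its add atomic, and it ignores expiry), the saved list stands in for the database, and this variant of name_lock wraps the yield in try/finally so the lock is released even if the inner block raises.

```python
import hashlib
import threading
from contextlib import contextmanager


class LocalCache:
    """Hypothetical in-process stand-in for a memcached client (no expiry)."""

    def __init__(self):
        self._keys = set()
        self._mutex = threading.Lock()  # Makes add atomic for this demo.

    def add(self, key, value, timeout=None):
        with self._mutex:
            if key in self._keys:
                return False
            self._keys.add(key)
            return True

    def delete(self, key):
        with self._mutex:
            self._keys.discard(key)


cache = LocalCache()
saved = []  # Stands in for the database table.


@contextmanager
def name_lock(name):
    key = 'namelock::%s' % hashlib.md5(name.encode('utf-8')).hexdigest()
    lock = cache.add(key, True, timeout=10)
    try:
        yield lock
    finally:
        if lock:  # Only clear the lock if we had it.
            cache.delete(key)


def create(name):
    with name_lock(name) as lock:
        if lock and name not in saved:  # Check, then write, under the lock.
            saved.append(name)


threads = [threading.Thread(target=create, args=('standup',)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(saved))
```

Callers that lose the race simply skip the write, and any caller that gets the lock later sees the name already exists, so only one row should ever be saved.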

Is that the best way to do it? Probably not. If you can improve on this, or have an entirely better idea, I’d love to hear it!

*: The other interesting column is a datetime: expires. I’ve been bitten too often before by seconds ticking over to bother with a unique key on (name, expires).