Simple out-of-process lock with Python and Memcached
On TodaysMeet I need to check that a name is not in use before creating a new record. Unfortunately, because names can be reused over time, I can’t create a UNIQUE key in the database and enforce it there. That means there is some tiny amount of time between checking for existence and writing a new record.
Normally, the gap is way too small to matter. But every once in a while some client will send multiple POST requests at more or less the same time. (The internet is a weird place at scale.)
Since the rest of the application relies on having only one “active” name at a time, writing the row twice causes problems throughout the app*.
The concept is simple. I want to acquire a lock on the name so that only one thread/process/server at a time can actually do a check and write. On a single process, or even across processes on a single server, Python has tools for this built in. (Assuming your concurrency model works with multiprocessing, I guess.)
But I’d like to be able to run this across multiple machines and scale horizontally. Fortunately, I have memcached and memcached has an atomic add operation.
add works perfectly for this when multiple processes might be writing to the cache. Unlike set, add will only succeed if the key does not exist. Boom.
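To make the difference concrete, here is a small sketch of the add versus set semantics. FakeCache is a hypothetical in-memory stand-in written for this example, not a real memcached client; it only mimics the behavior that matters here:

```python
import threading


class FakeCache:
    """Hypothetical in-memory stand-in mimicking memcached's add/set/delete."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()  # memcached does this atomically server-side

    def set(self, key, value):
        # set always writes, whether or not the key exists.
        with self._mutex:
            self._data[key] = value
        return True

    def add(self, key, value):
        # add writes only if the key is absent, and reports whether it did.
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def delete(self, key):
        with self._mutex:
            self._data.pop(key, None)


cache = FakeCache()
first = cache.add('namelock::demo', True)      # key is new, so this succeeds
second = cache.add('namelock::demo', True)     # key exists, so this fails
overwrite = cache.set('namelock::demo', True)  # set succeeds regardless
cache.delete('namelock::demo')
third = cache.add('namelock::demo', True)      # key is gone, so add succeeds again
```

The return value of add is the whole trick: whoever gets True holds the lock, and everyone else gets False until the key is deleted or expires.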
Writing this up as a little context manager (the cache object here stands in for any memcached client, in my case the Django wrapper around it):
import hashlib
from contextlib import contextmanager

from django.core.cache import cache  # Or any memcached client with add/delete.


@contextmanager
def name_lock(name):
    # Hash the name so the key is always a short, valid memcached key.
    key = 'namelock::%s' % hashlib.md5(name.encode('utf-8')).hexdigest()
    lock = cache.add(key, True, timeout=10)  # Expires in case the power goes out.
    try:
        yield lock  # Tell the inner block if it acquired the lock.
    finally:
        if lock:  # Only clear the lock if we had it.
            cache.delete(key)
Now I can have multiple processes on multiple machines safely check the name existence and create a row.
with name_lock(name) as lock:
    if lock and name_is_valid(name):
        save_name(name)
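To see the pattern hold up under a race without a memcached server, here is a self-contained sketch. LocalCache is a hypothetical in-process stand-in for the cache client, the saved list stands in for the database table, and create is a made-up wrapper around the usage above. This version of the context manager releases the lock in a finally clause so an exception in the inner block does not leave the key behind:

```python
import hashlib
import threading
from contextlib import contextmanager


class LocalCache:
    """Hypothetical in-process stand-in for a memcached client (add/delete only)."""

    def __init__(self):
        self._keys = set()
        self._mutex = threading.Lock()

    def add(self, key, value, timeout=None):
        # Atomic "create only if absent"; timeout is accepted but ignored here.
        with self._mutex:
            if key in self._keys:
                return False
            self._keys.add(key)
            return True

    def delete(self, key):
        with self._mutex:
            self._keys.discard(key)


cache = LocalCache()
saved = []  # stand-in for the names table


@contextmanager
def name_lock(name):
    key = 'namelock::%s' % hashlib.md5(name.encode('utf-8')).hexdigest()
    lock = cache.add(key, True, 10)
    try:
        yield lock
    finally:
        if lock:
            cache.delete(key)


def create(name):
    with name_lock(name) as lock:
        # The existence check and the write both happen while holding the lock.
        if lock and name not in saved:
            saved.append(name)


# Twenty threads try to create the same name at once.
threads = [threading.Thread(target=create, args=('standup',)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Threads that fail to get the lock simply skip the write, and threads that get it later find the name already taken, so the name is saved once.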
Is that the best way to do it? Probably not. If you can improve on this, or have an entirely better idea, I’d love to hear it!
*: The other interesting column is a datetime: expires. I’ve been bitten too often before by seconds ticking over to bother with a unique key on (name, expires).