Increasingly I’ve been treating even RDBMSes like structured key-value stores. There are still foreign keys and relationships in there, but the access patterns are most commonly by some kind of “primary” key (not always the primary key on the table, but a natural one).
This makes object caching quick to implement and very effective.

Normally when I do something in more than two projects I’ll put it into a library, but for once this honestly feels too small, so instead, here’s a blog post and a gist.

Here’s a pattern I’ve been using in Django models:
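A minimal sketch of the kind of mixin I mean. The names here (`CachedModel`, `CACHE_MISS`, `_cache_key`, `natural_key`) are illustrative, and to keep the sketch self-contained a plain dict stands in for `django.core.cache.cache` and for the database; in a real project you’d use `cache.get`/`cache.set` and a real `Model.objects` lookup.

```python
cache = {}  # stand-in for django.core.cache.cache

# Sentinel so "this object does not exist" can be cached, too.
CACHE_MISS = object()


class CachedModel:
    # Stand-in for a Django model looked up by a natural key.
    objects = {}  # stand-in for Model.objects, keyed by natural key

    def __init__(self, natural_key, value):
        self.natural_key = natural_key
        self.value = value
        type(self).objects[natural_key] = self

    @classmethod
    def _cache_key(cls, key):
        return f"{cls.__name__}:{key}"

    @classmethod
    def get(cls, key):
        ck = cls._cache_key(key)
        obj = cache.get(ck)
        if obj is None:  # not cached yet: hit the "database"
            obj = cls.objects.get(key, CACHE_MISS)
            cache[ck] = obj  # caches misses as well as hits
        return None if obj is CACHE_MISS else obj

    def save(self):
        # In Django this would call super().save(...) first.
        type(self).objects[self.natural_key] = self
        self.flush()

    def flush(self):
        # Called from save()/delete() so writes evict the cached copy.
        cache.pop(self._cache_key(self.natural_key), None)
```

The point is that `get` is the single choke point for reads: hits and misses are both cached, and any write path ends by calling `flush`.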
Looking up an object looks like:
obj = MyModel.get(some_key)
Advantages of this pattern:
- Straightforward to implement; can be factored into a mixin without much work.
- Caches non-existent entries (“misses”).
- Very high hit rate in many common cases.
- Low risk of caching stale data.
- No signals or other spooky action at a distance.
- Easy to mock.
Disadvantages of this pattern:

- Subject to thundering herds when the read rate is very high or there are hot spots. This can be partially alleviated by updating delete() to write to the cache, too, but that increases the probability of caching stale data.
- No support for querysets or lists (intentional, as these are notoriously difficult to cache and invalidate correctly).
- Can’t use bulk queryset operations (e.g. update()), since they bypass save() and delete() and so never flush the cache.
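The mitigation mentioned above, where delete() writes to the cache instead of evicting, might look roughly like this. `Article` and the key format are made up for the sketch, and a dict again stands in for `django.core.cache.cache`:

```python
cache = {}
CACHE_MISS = object()  # sentinel for "this object does not exist"


class Article:
    def __init__(self, slug):
        self.slug = slug

    def _cache_key(self):
        return f"article:{self.slug}"

    def delete(self):
        # ...delete the row from the database here (super().delete())...
        # Writing the miss sentinel, rather than evicting the key,
        # means concurrent readers get a cached "does not exist"
        # instead of all stampeding the database at once. The cost:
        # if this write is wrong or racy, you've cached stale data.
        cache[self._cache_key()] = CACHE_MISS
```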
This works well when most read access is by the same natural key. You could extend it to support multiple keys, e.g. a name and an integer ID, by defining methods get_by_name(cls, name) and get_by_id(cls, pk), or similar, and then, in flush, generating all the keys and using cache.delete_many. It works badly when most access is via related managers, since those queries never go through get().
The same pattern absolutely works outside of the Django ORM, but the specifics depend on how you’re accessing your DB. Personally, I like accessor functions that return dictionaries.
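Outside the ORM, a cached accessor function might look like this sketch. `fetch_account`, `get_account`, and the `ROWS` table are hypothetical, and a dict again stands in for the cache; the shape is the same as the model version, with misses cached via a sentinel:

```python
cache = {}
CACHE_MISS = object()

# Hypothetical data layer: in real code fetch_account would run a
# query and build a dict from the row; here it reads a literal table.
ROWS = {42: {"id": 42, "email": "a@example.com"}}


def fetch_account(account_id):
    return ROWS.get(account_id)  # None when the row doesn't exist


def get_account(account_id):
    """Cached accessor: returns a plain dict, caching misses too."""
    ck = f"account:{account_id}"
    value = cache.get(ck)
    if value is None:  # not cached yet: hit the data layer
        row = fetch_account(account_id)
        value = CACHE_MISS if row is None else row
        cache[ck] = value
    return None if value is CACHE_MISS else value
```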
Update, 7 May
Jannis pointed out, correctly, that this does introduce another call that can fail, and thus it has implications for database transactions.

When I use this pattern, I typically enable atomic requests. Writes often cause side effects that need to propagate through various systems, so there’s usually more than one call that can fail. For the use cases I have today, atomic requests are enough. For others, more fine-grained transaction management is necessary.
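For reference, atomic requests are enabled per-database in Django’s settings; the engine and database name below are placeholders:

```python
# settings.py fragment: wrap each request's view in a transaction.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",  # placeholder
        "ATOMIC_REQUESTS": True,
    }
}
```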