An Object Caching Pattern for Django
Increasingly I’ve been treating even RDBMSes like structured key-value stores. There are still foreign keys and relationships in there, but the access patterns are most commonly by some kind of “primary” key (not always the primary key on the table, but a natural one).
Normally when I do something in more than two projects I’ll put it into a library, but for once this honestly feels too small, so instead, here’s a blog post and a gist.
Key-based access like this makes object caching quick to implement and very effective. Here’s a pattern I’ve been using in Django models:
Looking up an object looks like:
obj = MyModel.get(some_key)
Advantages of this pattern:
- Straightforward to implement, and can be factored into a mixin without much work.
- Caches non-existent entries (“misses”).
- Very high hit rate in many common cases.
- Low risk of caching stale data.
- No signals or other spooky action at a distance.
- Easy to mock `get()` in tests.
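To illustrate the last point, here is a minimal sketch using the standard library's `unittest.mock`. The `MyModel` stub and the `describe()` function are hypothetical, standing in for a real model and for whatever code calls `get()`.

```python
from unittest import mock


class MyModel:
    """Hypothetical model; the real get() would hit the cache or database."""

    @classmethod
    def get(cls, key):
        raise RuntimeError("tests should not reach the cache or database")


def describe(key):
    """Hypothetical code under test that looks an object up by its key."""
    obj = MyModel.get(key)
    return "%s=%s" % (key, obj.value)


def test_describe():
    fake = mock.Mock(value=42)
    # Because every read goes through one classmethod, one patch is enough.
    with mock.patch.object(MyModel, "get", return_value=fake) as patched:
        result = describe("answer")
    patched.assert_called_once_with("answer")
    return result
```

Since all reads funnel through a single entry point, the test needs neither a cache backend nor a database.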
Disadvantages:
- Subject to thundering herd when the read rate is very high or there are hot spots. This can be partially alleviated by updating `save()` and `delete()` to write to the cache, too, but that increases the probability of caching stale data.
- No support for querysets or lists (intentional, as these are notoriously difficult to cache and invalidate correctly).
- Can’t use the queryset `update()` or `delete()` methods, since they bypass the model’s own `save()` and `delete()` and so never touch the cache.
This works well when most read access is by the same natural key. You could extend it to support multiple keys (e.g. a name and an integer ID) by defining methods `get_by_name(cls, name)` and `get_by_id(cls, pk)`, or similar, and then, in `flush`, generating all the keys and using `cache.delete_many`. It works badly when most access is via related managers, e.g. `my_obj.something_set.all()`.
The same pattern absolutely works outside of the Django ORM, but the specifics depend on how you’re accessing your DB. Personally, I like accessor functions that return dictionaries (e.g. `get_some_object(key)`).
Update, 7 May
Jannis pointed out, correctly, that this does introduce another call that can fail and thus it has implications for database transactions.
When I use this pattern, I typically enable atomic requests. Writes often cause side effects that need to propagate through various systems, so there’s usually more than one call that can fail. For the use cases I have today, atomic requests are enough. For others, more fine-grained transaction management is necessary.
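For reference, atomic requests are switched on per database in Django's settings. The fragment below is a sketch; the `ENGINE` and `NAME` values are placeholders, not taken from any real project.

```python
# settings.py (fragment). ENGINE and NAME are placeholder values.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",
        # Run each view inside a database transaction: if the view raises,
        # the transaction is rolled back instead of committed.
        "ATOMIC_REQUESTS": True,
    }
}
```

Cache operations are not part of the database transaction, so a rollback undoes the row changes but not a cache delete. With invalidate-on-write that mostly costs one extra cache miss.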