Irregular Update 04/Oct/2014

I’ve been thinking about restarting the weekly updates I used to do, mostly just to get myself in the habit of writing more, but also to make myself take some note of what I’ve gotten done.

It has been a very busy summer since I started working full-time on TodaysMeet. But as long as the list of visible changes is, there are a bunch of under-the-hood changes I haven’t been able to talk about much yet, and I like being as open as I can about technical things.

The biggest change is the switch from periodic short polling (which produced too many requests per second for Linode’s NodeBalancers) to streaming connections.

The addition of streaming connections also marked the official switch of TodaysMeet from a monolithic Django app to a multi-service architecture†. Having spent time with Django and gevent working on chat prototypes for Mozilla Help way back in the day, I went with Node.js and Primus, which has been great*.

To keep the API responses fast (90%ile is under 50ms, plus network time) and to facilitate horizontal scaling for the streaming processes, I built a new intermediary process, called “reflektor” (because I’m a dork). Reflektor accepts HTTP requests and returns instantly, while queuing them for replay against one or more hosts. The API servers post to reflektor (90%ile, 3ms) which posts to the streaming service (“ekg”—see? dork).
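To make the accept-now, replay-later idea concrete, here is a minimal Python sketch. It is not reflektor’s code (reflektor is a Node.js service and its source isn’t shown here); the `Reflector` class, the `send` callable, and the host names are all made up for illustration, and a real version would want retries and logging.

```python
import queue
import threading


class Reflector:
    """Accept a payload immediately, then replay it to every host in the background."""

    def __init__(self, hosts, send):
        self.hosts = list(hosts)
        self.send = send  # hypothetical transport, e.g. a function that does an HTTP POST
        self.tasks = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def accept(self, payload):
        """Queue the payload and return right away with a 202 Accepted status."""
        self.tasks.put(payload)
        return 202

    def _run(self):
        while True:
            payload = self.tasks.get()
            try:
                if payload is None:  # sentinel: stop the worker
                    return
                for host in self.hosts:
                    try:
                        self.send(host, payload)
                    except Exception:
                        # Assumption: this sketch drops failed deliveries;
                        # a real service would log and retry.
                        pass
            finally:
                self.tasks.task_done()

    def close(self):
        """Let the worker finish everything already queued, then stop it."""
        self.tasks.put(None)
        self.worker.join()


delivered = []
reflector = Reflector(["ekg-1", "ekg-2"], lambda host, body: delivered.append((host, body)))
status = reflector.accept(b'{"message": "hi"}')  # returns before anything is sent
reflector.close()
```

The caller only pays for a queue insert, which is why the API side can stay fast no matter how many streaming hosts sit behind it.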

I am planning, once I get to a breather moment, to open-source a lot of the internal node packages I’ve written. The in-process task queue from reflektor is very close to ready to go, and I’ve done some work with logging, porting much of Python’s logging module (which is itself a port of log4j, and which I’m sure dozens of others have done, but…) and on HTTP host pools. All of which I’ve written to be stand-alone and open-able.
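For anyone wondering what an HTTP host pool does, here is one possible shape, sketched in Python. This is an assumption about the general technique, not the unreleased package: `HostPool`, `mark_down`, and the cooldown behavior are invented for the example.

```python
import itertools
import time


class HostPool:
    """Round-robin over hosts, skipping any marked down until a cooldown passes."""

    def __init__(self, hosts, cooldown=30.0, clock=time.monotonic):
        self.hosts = list(hosts)
        if not self.hosts:
            raise ValueError("need at least one host")
        self.cooldown = cooldown
        self.clock = clock  # injectable so tests can control time
        self.down_until = {}  # host -> time at which it may be tried again
        self._cycle = itertools.cycle(self.hosts)

    def mark_down(self, host):
        """Take a host out of rotation for `cooldown` seconds."""
        self.down_until[host] = self.clock() + self.cooldown

    def mark_up(self, host):
        """Put a host back in rotation immediately."""
        self.down_until.pop(host, None)

    def get(self):
        """Return the next healthy host, or the next host at all if none are healthy."""
        now = self.clock()
        fallback = None
        for _ in range(len(self.hosts)):
            host = next(self._cycle)
            if fallback is None:
                fallback = host
            if host not in self.down_until or self.down_until[host] <= now:
                return host
        return fallback
```

Falling back to a down host when every host is down is a design choice: trying something is usually better than failing without making a request.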

Streaming connections have allowed me to do all sorts of other things, the most user-visible of which is deleting comments from rooms in a meaningful way.

To keep the UI responsive and get the streams started faster, I’ve started loading non-streaming messages lazily, with infinite scroll. This made some of the logic (“which order do messages come in? do I want messages since or messages before?”) fairly complex. Let’s all be thankful for tests! Limiting the number of messages per response took nearly all the spikiness out of the API response times and made them much faster. Here’s the 90%ile for two weeks either side of the switch.
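The since/before question is easier to see in code. This is a simplified Python sketch, not the actual TodaysMeet API: the `page` function, its parameter names, and the integer `id` field are assumptions made for the example.

```python
def page(messages, since=None, before=None, limit=50):
    """Return at most `limit` messages, oldest first.

    messages: list of dicts with an increasing integer "id", oldest first.
    since:  return messages newer than this id, starting with the oldest
            of them (catching up after a reconnect).
    before: return messages older than this id, ending with the newest
            of them (scrolling back through history).
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if since is not None and before is not None:
        raise ValueError("pass since or before, not both")
    if since is not None:
        newer = [m for m in messages if m["id"] > since]
        return newer[:limit]
    if before is not None:
        older = [m for m in messages if m["id"] < before]
        return older[-limit:]
    # No cursor: the first load shows the most recent messages.
    return messages[-limit:]


history = [{"id": i} for i in range(1, 11)]
first_load = page(history, limit=3)              # ids 8, 9, 10
scroll_back = page(history, before=8, limit=3)   # ids 5, 6, 7
```

Note the asymmetry: “since” slices from the front of the matching messages and “before” slices from the back, which is exactly the kind of detail that is easy to get wrong without tests.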

Because I’ve started handling webhook requests from Mandrill, MailChimp, and now Stripe(!), I’ve been building a generic recording mechanism to prevent event replay, and that’s something that may make sense to open up, if it turns out to be generically useful. But given the lack of standardization of webhook payloads, it’s tough to do anything that’s both generic and interesting.
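One simple way to prevent replay is to record each event ID once and refuse duplicates. The Python sketch below assumes every provider sends some unique event identifier; the `EventLog` class, the table name, and the use of SQLite are illustrative choices, not a description of the real mechanism.

```python
import sqlite3


class EventLog:
    """Remember which webhook events have been seen, keyed by (source, event_id)."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS webhook_events ("
            " source TEXT NOT NULL,"
            " event_id TEXT NOT NULL,"
            " PRIMARY KEY (source, event_id))"
        )

    def record(self, source, event_id):
        """Return True the first time an event is seen, False on a replay."""
        try:
            with self.db:  # commits on success, rolls back on error
                self.db.execute(
                    "INSERT INTO webhook_events (source, event_id) VALUES (?, ?)",
                    (source, event_id),
                )
        except sqlite3.IntegrityError:
            # The primary key already exists, so this event was handled before.
            return False
        return True


log = EventLog()
if log.record("stripe", "evt_123"):
    pass  # first delivery: safe to process the event here
```

Letting the database’s uniqueness constraint make the decision avoids a check-then-insert race when the same event arrives twice at nearly the same time.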

I’ve taken to building my own RPMs of more and more essential libraries, like nginx and OpenSSL and node, and even py2cairo, which is a pain to install normally.

All of which has meant I’ve done a bunch of work on deployments, specifically zero-downtime deployments. Almost all of that has been in Fabric. No one on the site should notice, or see an error, during a deployment, no matter what they try to do.
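One common pattern for this, sketched in plain Python rather than Fabric, is a rolling deployment: update one host at a time while the others keep serving. Treat this as an assumption about the general approach, not a description of the real scripts; `rolling_deploy` and its four callables (`drain`, `update`, `healthy`, `restore`) are hypothetical names.

```python
def rolling_deploy(hosts, drain, update, healthy, restore):
    """Update hosts one at a time so the remaining hosts keep serving traffic.

    drain(host):   stop routing new requests to the host
    update(host):  install the new release and restart the service
    healthy(host): return True if the host passes its health check
    restore(host): put the host back into rotation
    """
    hosts = list(hosts)
    if len(hosts) < 2:
        raise ValueError("need at least two hosts to deploy without downtime")
    for host in hosts:
        drain(host)
        update(host)
        if not healthy(host):
            # Leave the bad host out of rotation and stop; the hosts not yet
            # touched are still running the old release and serving traffic.
            raise RuntimeError(f"{host} failed its health check; rollout stopped")
        restore(host)
```

Stopping at the first failed health check means a bad release only ever takes one host out of rotation.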

And the deployments work well! Having spent so much time working on continuous deployment at Mozilla and using the fantastic deployment systems at Bitly, I’m pretty proud of the hundreds of non-event deployments that I’ve done over the summer. Most of them during peak traffic, because it’s when I’m most awake if anything goes wrong.

One of the biggest challenges of running a one-person show is doing everything. Along with my lovely eng/devops/releng hat, I’ve spent an uncomfortable amount of time playing designer, social media intern, product designer, and, as I get ready to roll out the first paid TodaysMeet product, business person. Josh has been an invaluable friend and resource throughout. If you need design or product help, or development, you should hire him.

It’s been a busy summer. Even busier than the TodaysMeet blog lets on. Now it’s autumn, and with any luck it will be just as busy.

I want to write more about some of this stuff in detail, and hopefully give some shorter talks about it in the area. And open source some software! So I’m going to do my best to make some time to do that. I can’t promise that I’ll totally restart weekly updates, but at the very least I can do semi-regular little brain dumps like this.

* Since I chose Primus, has released a new major version.

† Not including things like StatsD, Graphite, and email.