Help Me Scale
I’ve been reading Eran Hammer-Lahav’s intelligent posts on microblog scalability, and now I’m concerned about my own “microblog” site, Picofiction.
Like many social web sites (social networks, social updates, social messaging, social… amongst our weaponry…), Picofiction lets you “follow” your favorite authors and shows all their posts alongside yours.
I handle this very naïvely: everything is offloaded to the database. Three tables are involved: one of users, one of posts, and one of follower/followee bindings.
Here’s the basic structure of this query:
-- 'CURRENT_USER' and PAGE_START are placeholders filled in by the application.
SELECT post_id, post_body, post_date, post_type,
       user_name AS author_name, user_id AS author_id
FROM posts
LEFT JOIN users ON posts.author_id = users.user_id
WHERE posts.author_id = 'CURRENT_USER'
   OR posts.author_id IN (SELECT followed_id FROM followers
                          WHERE following_id = 'CURRENT_USER')
ORDER BY post_date DESC
LIMIT PAGE_START, 20;
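To check that the query behaves as intended on a single database, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The column names come from the query above. The table definitions, the sample rows, and the `timeline` helper are assumptions for illustration, and the `'CURRENT_USER'` and `PAGE_START` placeholders become bound `?` parameters.

```python
import sqlite3

# Hypothetical schemas: column names come from the query, types are assumed.
SCHEMA = """
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE posts (post_id INTEGER PRIMARY KEY, author_id INTEGER,
                    post_body TEXT, post_date TEXT, post_type TEXT);
CREATE TABLE followers (following_id INTEGER, followed_id INTEGER);
"""

TIMELINE_SQL = """
SELECT post_id, post_body, post_date, post_type,
       user_name AS author_name, user_id AS author_id
FROM posts
LEFT JOIN users ON posts.author_id = users.user_id
WHERE posts.author_id = ?
   OR posts.author_id IN (SELECT followed_id FROM followers
                          WHERE following_id = ?)
ORDER BY post_date DESC
LIMIT ?, 20
"""

def timeline(conn, user_id, page_start=0):
    """Return one page (up to 20 rows) of the user's own and followed posts."""
    return conn.execute(TIMELINE_SQL, (user_id, user_id, page_start)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "ann"), (2, "bob"), (3, "cy")])
conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?, ?)", [
    (10, 1, "ann's post", "2009-05-01", "story"),
    (11, 2, "bob's post", "2009-05-02", "story"),
    (12, 3, "cy's post", "2009-05-03", "story"),
])
conn.execute("INSERT INTO followers VALUES (1, 2)")  # ann follows bob

for row in timeline(conn, 1):
    print(row[0], row[4])  # post_id, author_name
```

Ann sees her own post and Bob's, newest first, but not Cy's. Everything happens in one statement because all three tables share one server.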
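Here’s where I need help: this works well on a single database, but it does not scale horizontally. The query joins posts, users, and followers in one statement, which assumes all three tables live on the same server; once posts are split across machines, no single server can answer it alone.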
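One common way to spread this work across machines is to shard posts by author and replace the single join with a scatter-gather: ask each shard for the newest posts by the relevant authors, then merge the partial results. The sketch below fakes shards with in-memory lists. `NUM_SHARDS`, `shard_for()`, `query_shard()`, and the `Post` tuple are hypothetical, and `query_shard()` stands in for a per-shard SQL query.

```python
import heapq
import itertools
from collections import namedtuple

Post = namedtuple("Post", "post_date post_id author_id body")

NUM_SHARDS = 4  # hypothetical shard count

def shard_for(author_id):
    """Hypothetical placement rule: all of a user's posts live on one shard."""
    return author_id % NUM_SHARDS

def query_shard(shard_posts, author_ids, n):
    """Stand-in for one per-shard query: newest n posts by these authors."""
    matches = [p for p in shard_posts if p.author_id in author_ids]
    return sorted(matches, key=lambda p: p.post_date, reverse=True)[:n]

def timeline(shards, user_id, followed_ids, page_start=0, page_size=20):
    authors = {user_id, *followed_ids}
    # Group authors by shard so each shard is queried at most once.
    by_shard = {}
    for a in authors:
        by_shard.setdefault(shard_for(a), set()).add(a)
    # Each shard must cover the whole prefix up to the end of the page.
    needed = page_start + page_size
    partials = [query_shard(shards[s], ids, needed) for s, ids in by_shard.items()]
    # Every partial list is already newest-first, so a k-way merge suffices.
    merged = heapq.merge(*partials, key=lambda p: p.post_date, reverse=True)
    return list(itertools.islice(merged, page_start, needed))

shards = [[] for _ in range(NUM_SHARDS)]
for post in [Post("2009-05-01", 10, 1, "a"), Post("2009-05-02", 11, 2, "b"),
             Post("2009-05-03", 12, 3, "c"), Post("2009-05-04", 13, 2, "d")]:
    shards[shard_for(post.author_id)].append(post)

print([p.post_id for p in timeline(shards, 1, [2])])  # [13, 11, 10]
```

The cost of this design is visible in `needed`: deep pages force every shard to return `page_start + 20` rows, which is one reason sites often cap how far back a timeline can be paged.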
Since horizontal scalability is such a hot topic right now, I’m asking for ideas. I’d like to put the infrastructure in place before I need it.
Eran points out that caching is not as simple a solution as we’d like to think. What do you cache? How do you keep caches in sync?
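One possible answer to “what do you cache?” is a precomputed per-user timeline, often called fan-out on write: when someone posts, the post ID is pushed onto each follower’s cached list, so reading a page becomes a single lookup with no join. Here is a minimal in-memory sketch. The `TimelineCache` class and its length cap are hypothetical, and a real deployment would keep these lists in a shared cache server rather than a Python dict. The sketch also makes the sync cost concrete: one post by an author with N followers means N + 1 cache writes.

```python
from collections import defaultdict, deque

class TimelineCache:
    """Hypothetical fan-out-on-write cache of newest-first post IDs per user."""

    def __init__(self, max_len=800):
        # Each timeline is capped; appendleft drops the oldest ID when full.
        self.timelines = defaultdict(lambda: deque(maxlen=max_len))
        self.followers = defaultdict(set)  # author_id -> follower user_ids

    def follow(self, follower_id, author_id):
        # Note: following does not backfill earlier posts, one of the
        # consistency gaps a real system would have to handle.
        self.followers[author_id].add(follower_id)

    def publish(self, author_id, post_id):
        """Push the new post onto the author's and every follower's timeline."""
        for uid in {author_id, *self.followers[author_id]}:
            self.timelines[uid].appendleft(post_id)

    def page(self, user_id, page_start=0, page_size=20):
        return list(self.timelines[user_id])[page_start:page_start + page_size]

cache = TimelineCache(max_len=3)
cache.follow(1, 2)     # user 1 follows user 2
cache.publish(2, 101)
cache.publish(1, 102)
cache.publish(3, 103)  # user 1 does not follow user 3
print(cache.page(1))   # [102, 101]
```

Reads become cheap, but writes, unfollows, and deleted posts now have to touch many cached lists, which is exactly the synchronization problem Eran describes.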
Does anyone have experience with MySQL Cluster? It seems like the best way to scale is to make the work as parallelizable as possible. The database then handles the parallelization, so the less I do in the application, the better, right?
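Whether the database or the application does the work, what makes a sharded timeline parallelizable is that the per-shard queries are independent of each other. This sketch runs them concurrently with Python’s `concurrent.futures.ThreadPoolExecutor` and merges the results. `SHARDS` and `fetch_from_shard()` are hypothetical stand-ins (the function sleeps briefly to mimic a network call to a database server), and nothing here describes how MySQL Cluster distributes work internally.

```python
import heapq
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: shard id -> rows of (post_date, post_id, author_id).
SHARDS = {
    0: [("2009-05-04", 13, 2), ("2009-05-01", 10, 4)],
    1: [("2009-05-03", 12, 1), ("2009-05-02", 11, 5)],
}

def fetch_from_shard(shard_id, author_ids, n):
    """Stand-in for a per-shard query; the sleep mimics network latency."""
    time.sleep(0.05)
    rows = [r for r in SHARDS[shard_id] if r[2] in author_ids]
    return sorted(rows, reverse=True)[:n]  # newest first by post_date

def parallel_timeline(author_ids, n=20):
    # Submit every shard query at once; they overlap instead of queuing.
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        futures = [pool.submit(fetch_from_shard, s, author_ids, n) for s in SHARDS]
        partials = [f.result() for f in futures]
    # Tuples compare by post_date first, so a reverse merge is newest-first.
    merged = heapq.merge(*partials, reverse=True)
    return [post_id for _, post_id, _ in list(merged)[:n]]

print(parallel_timeline({1, 2}))  # [13, 12]
```

With the queries overlapping, a page costs roughly the latency of the slowest shard rather than the sum of all of them; the merge step, however, still lives in the application.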