Friday, April 01, 2011

Building a Facebook Feed Like system on a Sharded mySQL System

Building a Feed can be broken down into a few key questions.

Who can see what?
How many people can see it?

These key answers really dictates the design of data structure from a global read method to a global write method. A global read method can be summarized using some sql.

SELECT * FROM Activity WHERE userId IN (?,?,?,?,?) AND createDate > NOW() - INTERVAL 2 WEEKS;

Above is actually not an optimal SELECT, what is really done in my case is foreach '?' do a parallel query to each shard group of

SELECT * FROM Activity WHERE userId = ? AND createDate > NOW() - INTERVAL 2 WEEKS;

The reason why the 1st query is not optimal is due to the fact that their are two ranges in the 1st query. The IN clause is a range and createDate is a range thus you can't use a composite key (userId, createDate) the query is only using userId.


A global write method can be summarized with the following query:

SELECT * FROM Activity WHERE rowOwnerId = ? AND createDate > NOW() - INTERVAL 2 WEEKS.

Now you may be wondering if you are doing a global write, why is there a select? For each friend that is following the person who has activity done to said person which either they initiated or is acted upon, write a row for said person and their friends. Thus one action can create 5000+ writes such that the number of writes are proportional to the number of friends that is following the person being acted upon. Writes are N+1.

Global writes allows for you're view of the Feed to be fast but is really hard to keep it in real-time in sync, additionally its very expensive since a copy of the pointer can be copied N times and to get real speed and to avoid additional reads, the content is copied for each friend write. In global writes you have to create queues which succumbs to queuing theory. Additional to these considerations edits and deletes are very hard to keep in sync as well as need special consideration for new friends joining. These things can expose system lag constantly. That being said it can still be done, it's a lot of work that is expensive in terms of hardware cost and developer time.


The system I have built supports both yet currently implements a global read method. The mysql table structure follows.

userId bigint unsigned NOT NULL DEFAULT 0 - this is the person being acted upon.
parentOwnerId bigint unsigned NOT NULL DEFAULT 0 - this is the person who created the content
parentId bigint unsigned NOT NULL DEFAULT 0 - this defines the pointer for the actual content
parentType smallint unsigned NOT NULL DEFAULT 0 - this defines that the content is a wall post or a friend event or

itemType smallint unsigned NOT NULL DEFAULT 0 - this defines the actual action, which could be a comment to the parent Type or just the parent itself
itemId bigint unsigned NOT NULL DEFAULT 0 - this defines the pointer to the item for content retrieval
createDate timestamp not NULL DEFAULT 0 - this indicates when the event entered the system
modifiedDate timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP - this defines when the item was added to for a later feature.


The primary key in this mix is
userId, itemType, parentId

this defines that the user acted upon will only have 1 row foreach type of a parentId. This means if there are 1000 comments to a status update, the activity table will have 1 row for that user that made the comment even if that same user was the person who made 1000 comments. Note that this activity table was updated 1000 times because modifiedDate is a timestamp that gets updated on every action.

As you can tell the Activity Table is centered to the entire system and the actual content tables are what I call branches to Activity. Let's take an example to explain this.

user 1000 adds a status update, status update is a table called WallPosts.

WallPosts.posterId - 88888
WallPosts.itemOwnerId - 1000
WallPosts.createDate - NOW()
WallPosts.posterId - 1000
WallPosts.post - This is my status update.

Activity.userId - 1000
Activity.parentOwnerId - 1000
Activity.parentId - 88888
Activity.parentType - 1
Activity.itemType - 1
Activity.itemId - 88888
Activity.createDate - NOW()
Activity.modifiedDate - NOW()


So now friends of 1000 will query userId 1000's shard for the WallPost. Now let's have userId 2000 leave a comment in a table called Comments

Comments.commentId - 88889
Comments.itemOwnerId - 1000
Comments.itemType - 1
Comments.commenterId - 2000
Comments.createDate - NOW()
Comments.comment - " I left a comment on your status update!"

This says that friend 2000 left a comment on userId's 1000 (their shard) of "I left a comment on your status update!"


Activity.userId - 2000
Activity.parentOwnerId - 1000
Activity.parentType - 1
Activity.parentId - 88888
Activity.itemType - 2 //(left a comment)
Activity.itemId - 88889
Activity.createDate - NOW()
Activity.modifiedDate - NOW()


This says that friends of userId -2000, 2000 left a comment on UserId 1000's status update where the comment and the status update both reside on 1000's shard. The content can be reached via

88889 == CommentId

and friends of 2000 now know that 2000 left a comment for 1000. Thus the viral effect within 1/2 degree's of userId 1000.


There is a lot more going on behind the scenes like Children of parents, Using the combined read throughput of memcache, sharded mysql system and permission filtering but this is the general idea.


I am going to continually update this post, so keep checking back here as I add diagrams, flows, links and details looks at grouping events.

Note: By no means am I saying that a global write or Fan-Out write is not as good as a Fan-Out on Load or global read. I'm taking an approach to make it cheap and not trying to optimize to soon.

Here are some good links that I found after writing this post:

http://www.quora.com/What-are-the-scaling-issues-to-keep-in-mind-while-developing-a-social-network-feed


http://www.quora.com/What-are-best-practices-for-building-something-like-a-News-Feed?q=news+feed

No comments: