Wednesday, April 20, 2011

Building an email system on EC2 from top to bottom with SendGrid

Email is a beast. Sending email is easy but getting it pass spam filters when you’re a legitimate service is rather hard. Sending good email is especially hard when on EC2. The reason is due to spammers using and abusing elastic IPS. So, for startups your best bet of sending out a lot of email and getting it to the user is to use a service. I picked Sendgrid. It's cheap, fast, has good email tracking and builds all the appropriate email headers to get the mail white listed and into the destinations inbox. Sendgrid is the sender (think of them as an extension to sendmail). This is the easy part but to make a true email system that protects your users you need to take some things into consideration.


All HTML mail needs a Text counter part. Some people just like mutt or pine over HTML email. Thus when sending email send out mail in HTML format and Text format with mime headers so what ever email client is used can see a good formatted email.


All HTML links should be encrypted and encoded when passing identifying information. This needs to be done to make sure that the person that the link is intended for is clicked by that person. For instance

http://www.example.com/?enc=Ujcrq3uW8oU%2BpkW8bPJirwfczkMBnaWMObHlzCK8taau9PAjEQhCIZToj302zjVRs2f61bt7dddT7v21kpbhw6ZR5B1%2BoBIZNAznoLceK7z%2B%2BBm%2FS7%2FHKx0zfYah2Du%2FdaxP9dGel67SyQBp9ZJurXomrkyqkeLJiPioKMCaoygHruI%2FcJ83DvmOBNhqOjNQLyVMIHdjEWx3yYTMTsSZRUDdNPdaBfuTD3InspKINsQBBON0fPe890l3%2Bpb6p%2F4GtA%3D%3D&utm_source=sendgrid.com&utm_medium=email&utm_campaign=website


Now I can track retention and since the enc value is encrypted using AES-256 people are not going to break this encoding with out the Private key. Personally I am using this data for two purposes. The primary purpose is to ensure that the click comes from the intended person; the next purpose is to pass data around for what the app needs to fetch.

An example. XYZ commented on your status update. Click here to see the comment. When the person clicks I need to pull that specific activity to generate the message. Thus the link allows for that with no storage overhead. Here is some example code

public function encrypt( $data, $forUserId='' ){
#
# open cipher module (do not change cipher/mode)
#
$this->openCipherModule();
$this->createIV();
$this->setUserKey($forUserId);


$msg = json_encode($data);

$this->init();

$encoded = $this->doEncryption($data);

$this->closeEncryption();

return $encoded;
}


Now that I have sending down, links down, we need to put it all together. I am using sendmail as my mail transfer agent (MTA) and here is what is needed on EC2 to get it to work.


  1. yum install sendmail

  2. yum install sendmail-cf

  3. vim /etc/mail/sendmail.cf and add define(`SMART_HOST', `smtp.sendgrid.net')dnl *says send all localmail to sendgrid*

  4. vim /etc/mail/access and add AuthInfo:smtp.sendgrid.net "U:sendgrid username" "P:sendgrid pass for your account" "M:PLAIN" *when sending mail through sendgrid use your sendgrid account info*

  5. m4 /path/to/m4.cf /etc/mail/sendmail.mc > /etc/mail/sendmail.cf *"compile the changes"

  6. makemap hash /etc/mail/access.db < /etc/mail/access *encode the pass*

  7. /etc/init.d/sendmail restart




I choose to send mail locally to queue incase sendgrid goes down, which happens often this is why I don't make a socket connection to their servers realtime.

Next we need to configure PHP's SWIFT class to sendmail locally

$transport = Swift_SmtpTransport::newInstance('localhost', 25);
$this->swift = Swift_Mailer::newInstance($transport);



Now the only thing left to do is building a table to record all the clicks that people do to unsubscribe from getting email



CREATE TABLE `DoNotEmail` (
`userId` bigint(20) unsigned NOT NULL DEFAULT '0' COMMENT 'userId that is getting the email',
`emailAddr` varchar(255) NOT NULL COMMENT 'Denormalized email address',
`emailAddrHash` bigint(20) unsigned NOT NULL DEFAULT '0' COMMENT 'emailAddr in our numeric format',
`createdDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'when the email entered the system',
PRIMARY KEY (`emailAddrHash`,`userId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8



Any time a person clicks unsubscribe a row is inserted into this table. Anytime email is ready to be built and sent a query is performed on this table by emailAddrHash which is 8 bytes instead of 50+ bytes for email. I like to keep my keys small.


Most of the time will be spent building your email templates and this is just an abbreviated list of steps things to consider to move the process faster.

Monday, April 18, 2011

Operations is the corner stone of a Web company

Operations defined as the guys/gals who deal with servers have many different names. They are called Dev OPS, Sysadmins, System Engineers, Capacity Planners, Server Monkey (if you are a jerk), DBA, the guy in the corner with an Anarchy Symbol on his Shirt etc.

OPS is the front line defense and Cavalry. They know more about the system then the developer who installed software on OPS managed servers. Because they see the crash happen in the wild. One word of advice, your organization is going to die if you piss off the guy with ROOT access. If you don't know what that is - you really really need to be nice to OPS.

I have been at many layers of a company. I've been QA, Lab Tech, System Tuner, Server Installer, Web Developer, Software Engineer, DBA, Director of Engineering, Chief Architect, Business interface, yet I am just the dude that can get stuff done with the help of the people around me.

In many places I have seen ops treated, as Second Class Citizens-this is where I step in. I pride myself as being a member of OPS. I pride myself, as being a member of DEV but what I really pride myself in is being the Dude that is the bridge between OPS and DEV. The guy that says hey dude this is your 5th Custom Server type your messing with OPS puppet profile, can we standardize? The guy that will consolidate underutilized servers when OPS let’s me know or when I catch it myself. I am the guy that does what OPS want when OPS wants it done. The guy that looks at the same charts that OPS looks at to find what's broken. The guy that suggests in Engineering meetings which excludes OPS to include OPS. I am also the guy that suggests DEVs should be on call as well as OPS.

Value OPS! They are on your team, and if they do not look busy that means they are really good, DEVs and OPS is jelling and your team is going to WIN! If your OPS team is not happy make them because do you know how to log into the power strip and bounce the rack?

Friday, April 01, 2011

Building a Facebook Feed Like system on a Sharded mySQL System

Building a Feed can be broken down into a few key questions.

Who can see what?
How many people can see it?

These key answers really dictates the design of data structure from a global read method to a global write method. A global read method can be summarized using some sql.

SELECT * FROM Activity WHERE userId IN (?,?,?,?,?) AND createDate > NOW() - INTERVAL 2 WEEKS;

Above is actually not an optimal SELECT, what is really done in my case is foreach '?' do a parallel query to each shard group of

SELECT * FROM Activity WHERE userId = ? AND createDate > NOW() - INTERVAL 2 WEEKS;

The reason why the 1st query is not optimal is due to the fact that their are two ranges in the 1st query. The IN clause is a range and createDate is a range thus you can't use a composite key (userId, createDate) the query is only using userId.


A global write method can be summarized with the following query:

SELECT * FROM Activity WHERE rowOwnerId = ? AND createDate > NOW() - INTERVAL 2 WEEKS.

Now you may be wondering if you are doing a global write, why is there a select? For each friend that is following the person who has activity done to said person which either they initiated or is acted upon, write a row for said person and their friends. Thus one action can create 5000+ writes such that the number of writes are proportional to the number of friends that is following the person being acted upon. Writes are N+1.

Global writes allows for you're view of the Feed to be fast but is really hard to keep it in real-time in sync, additionally its very expensive since a copy of the pointer can be copied N times and to get real speed and to avoid additional reads, the content is copied for each friend write. In global writes you have to create queues which succumbs to queuing theory. Additional to these considerations edits and deletes are very hard to keep in sync as well as need special consideration for new friends joining. These things can expose system lag constantly. That being said it can still be done, it's a lot of work that is expensive in terms of hardware cost and developer time.


The system I have built supports both yet currently implements a global read method. The mysql table structure follows.

userId bigint unsigned NOT NULL DEFAULT 0 - this is the person being acted upon.
parentOwnerId bigint unsigned NOT NULL DEFAULT 0 - this is the person who created the content
parentId bigint unsigned NOT NULL DEFAULT 0 - this defines the pointer for the actual content
parentType smallint unsigned NOT NULL DEFAULT 0 - this defines that the content is a wall post or a friend event or

itemType smallint unsigned NOT NULL DEFAULT 0 - this defines the actual action, which could be a comment to the parent Type or just the parent itself
itemId bigint unsigned NOT NULL DEFAULT 0 - this defines the pointer to the item for content retrieval
createDate timestamp not NULL DEFAULT 0 - this indicates when the event entered the system
modifiedDate timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP - this defines when the item was added to for a later feature.


The primary key in this mix is
userId, itemType, parentId

this defines that the user acted upon will only have 1 row foreach type of a parentId. This means if there are 1000 comments to a status update, the activity table will have 1 row for that user that made the comment even if that same user was the person who made 1000 comments. Note that this activity table was updated 1000 times because modifiedDate is a timestamp that gets updated on every action.

As you can tell the Activity Table is centered to the entire system and the actual content tables are what I call branches to Activity. Let's take an example to explain this.

user 1000 adds a status update, status update is a table called WallPosts.

WallPosts.posterId - 88888
WallPosts.itemOwnerId - 1000
WallPosts.createDate - NOW()
WallPosts.posterId - 1000
WallPosts.post - This is my status update.

Activity.userId - 1000
Activity.parentOwnerId - 1000
Activity.parentId - 88888
Activity.parentType - 1
Activity.itemType - 1
Activity.itemId - 88888
Activity.createDate - NOW()
Activity.modifiedDate - NOW()


So now friends of 1000 will query userId 1000's shard for the WallPost. Now let's have userId 2000 leave a comment in a table called Comments

Comments.commentId - 88889
Comments.itemOwnerId - 1000
Comments.itemType - 1
Comments.commenterId - 2000
Comments.createDate - NOW()
Comments.comment - " I left a comment on your status update!"

This says that friend 2000 left a comment on userId's 1000 (their shard) of "I left a comment on your status update!"


Activity.userId - 2000
Activity.parentOwnerId - 1000
Activity.parentType - 1
Activity.parentId - 88888
Activity.itemType - 2 //(left a comment)
Activity.itemId - 88889
Activity.createDate - NOW()
Activity.modifiedDate - NOW()


This says that friends of userId -2000, 2000 left a comment on UserId 1000's status update where the comment and the status update both reside on 1000's shard. The content can be reached via

88889 == CommentId

and friends of 2000 now know that 2000 left a comment for 1000. Thus the viral effect within 1/2 degree's of userId 1000.


There is a lot more going on behind the scenes like Children of parents, Using the combined read throughput of memcache, sharded mysql system and permission filtering but this is the general idea.


I am going to continually update this post, so keep checking back here as I add diagrams, flows, links and details looks at grouping events.

Note: By no means am I saying that a global write or Fan-Out write is not as good as a Fan-Out on Load or global read. I'm taking an approach to make it cheap and not trying to optimize to soon.

Here are some good links that I found after writing this post:

http://www.quora.com/What-are-the-scaling-issues-to-keep-in-mind-while-developing-a-social-network-feed


http://www.quora.com/What-are-best-practices-for-building-something-like-a-News-Feed?q=news+feed