Wednesday, April 20, 2011

Building an email system on EC2 from top to bottom with SendGrid

Email is a beast. Sending email is easy, but getting it past spam filters when you’re a legitimate service is rather hard. Sending good email is especially hard from EC2, because spammers use and abuse Elastic IPs. So, for startups, your best bet for sending out a lot of email and getting it to the user is to use a service. I picked SendGrid. It's cheap, fast, has good email tracking, and builds all the appropriate email headers to get the mail whitelisted and into the destination's inbox. SendGrid is the sender (think of them as an extension to sendmail). This is the easy part, but to make a true email system that protects your users you need to take some things into consideration.

All HTML mail needs a text counterpart. Some people just prefer mutt or pine over HTML email. So when sending email, send both an HTML version and a text version with MIME headers, so that whatever email client is used can show a well-formatted email.
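As a rough illustration of what "HTML plus text with MIME headers" means on the wire, here is a minimal sketch that builds a multipart/alternative message by hand. The function name buildAlternativeMessage and the sample content are hypothetical; in practice a mail library such as Swift Mailer assembles these parts for you.

```php
<?php
// Hypothetical helper (not from the original post): builds the headers and
// body of a multipart/alternative message with a text part and an HTML part.
function buildAlternativeMessage(string $text, string $html, string $boundary): array
{
    $headers = [
        'MIME-Version: 1.0',
        'Content-Type: multipart/alternative; boundary="' . $boundary . '"',
    ];

    // Plain text goes first; clients that can render HTML pick the last
    // part they understand, so the HTML part goes second.
    $body = "--{$boundary}\r\n"
          . "Content-Type: text/plain; charset=UTF-8\r\n"
          . "Content-Transfer-Encoding: base64\r\n\r\n"
          . chunk_split(base64_encode($text))
          . "--{$boundary}\r\n"
          . "Content-Type: text/html; charset=UTF-8\r\n"
          . "Content-Transfer-Encoding: base64\r\n\r\n"
          . chunk_split(base64_encode($html))
          . "--{$boundary}--\r\n";

    return ['headers' => $headers, 'body' => $body];
}

// Example usage with a random boundary string.
$message = buildAlternativeMessage(
    "XYZ commented on your status update.",
    "<p><b>XYZ</b> commented on your status update.</p>",
    'b_' . bin2hex(random_bytes(8))
);
```

A text-only client such as mutt or pine shows the first part, while an HTML-capable client shows the second.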

All HTML links should be encrypted and encoded when passing identifying information. This needs to be done to make sure that the link is clicked by the person it was intended for. For instance, each link carries an encrypted enc value.

Now I can track retention, and since the enc value is encrypted using AES-256, people are not going to break this encryption without the secret key. Personally, I am using this data for two purposes. The primary purpose is to ensure that the click comes from the intended person; the second is to pass along the data the app needs to fetch.

An example: XYZ commented on your status update. Click here to see the comment. When the person clicks, I need to pull that specific activity to generate the message. The link allows for that with no storage overhead. Here is some example code:

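public function encrypt( $data, $forUserId='' ){
    # open cipher module (do not change cipher/mode)

    # serialize the payload so arrays survive the round trip
    $msg = json_encode($data);

    # encrypt the serialized message, not the raw $data
    $encoded = $this->doEncryption($msg);

    return $encoded;
}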
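The body of doEncryption is not shown above, so the following is only a sketch of how such a link token could be produced and checked. It assumes PHP's openssl extension with AES-256-GCM (PHP 7.1+) rather than whatever cipher module the original opens. The function names, the token layout (IV, tag, ciphertext), and the uid field are all hypothetical.

```php
<?php
// Hypothetical helpers (not the author's code): URL-safe base64 for link tokens.
function base64UrlEncode(string $bin): string
{
    return rtrim(strtr(base64_encode($bin), '+/', '-_'), '=');
}

function base64UrlDecode(string $txt)
{
    // Restore the padding that base64UrlEncode stripped.
    $padded = str_pad($txt, strlen($txt) + (4 - strlen($txt) % 4) % 4, '=');
    return base64_decode(strtr($padded, '-_', '+/'), true);
}

// Encrypt the link data together with the user id it is intended for.
// $key must be 32 raw bytes for AES-256.
function encryptLinkData(array $data, string $forUserId, string $key): string
{
    $iv  = random_bytes(12); // 96-bit nonce, the usual size for GCM
    $msg = json_encode(['uid' => $forUserId, 'data' => $data]);
    $tag = '';
    $cipher = openssl_encrypt($msg, 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag);
    return base64UrlEncode($iv . $tag . $cipher);
}

// Returns the data only if the token is authentic and was issued for
// $expectedUserId; returns null otherwise.
function decryptLinkData(string $token, string $expectedUserId, string $key): ?array
{
    $raw = base64UrlDecode($token);
    if ($raw === false || strlen($raw) <= 28) {
        return null;
    }
    $iv     = substr($raw, 0, 12);
    $tag    = substr($raw, 12, 16);
    $cipher = substr($raw, 28);
    $msg = openssl_decrypt($cipher, 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag);
    if ($msg === false) {
        return null; // wrong key or tampered token
    }
    $payload = json_decode($msg, true);
    if (!is_array($payload) || ($payload['uid'] ?? null) !== $expectedUserId) {
        return null; // click did not come from the intended person
    }
    return is_array($payload['data'] ?? null) ? $payload['data'] : null;
}
```

The token can then be placed in the link as the enc value. On click, the app decrypts it, confirms the user, and reads which activity to fetch without any lookup table.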

Now that I have sending down and links down, we need to put it all together. I am using sendmail as my mail transfer agent (MTA), and here is what is needed on EC2 to get it to work.

  1. yum install sendmail

  2. yum install sendmail-cf

  3. vim /etc/mail/ and add define(`SMART_HOST', `')dnl *send all local mail to SendGrid*

  4. vim /etc/mail/access and add "U:sendgrid username" "P:sendgrid pass for your account" "M:PLAIN" *when sending mail through sendgrid use your sendgrid account info*

  5. m4 /path/to/ /etc/mail/ > /etc/mail/ *compile the changes*

  6. makemap hash /etc/mail/access.db < /etc/mail/access *encode the pass*

  7. /etc/init.d/sendmail restart

I choose to send mail to a local queue in case SendGrid goes down, which happens often. This is why I don't make a socket connection to their servers in real time.

Next we need to configure PHP's Swift Mailer class to send mail through the local sendmail:

$transport = Swift_SmtpTransport::newInstance('localhost', 25);
$this->swift = Swift_Mailer::newInstance($transport);

Now the only thing left to do is to build a table that records every click people make to unsubscribe from email:

`userId` bigint(20) unsigned NOT NULL DEFAULT '0' COMMENT 'userId that is getting the email',
`emailAddr` varchar(255) NOT NULL COMMENT 'Denormalized email address',
`emailAddrHash` bigint(20) unsigned NOT NULL DEFAULT '0' COMMENT 'emailAddr in our numeric format',
`createdDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'when the email entered the system',
PRIMARY KEY (`emailAddrHash`,`userId`)

Any time a person clicks unsubscribe, a row is inserted into this table. Any time an email is ready to be built and sent, a query is performed on this table by emailAddrHash, which is 8 bytes instead of 50+ bytes for the email address. I like to keep my keys small.
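The comments below describe the hash as an md5sum reduced to a 64-bit integer in decimal, computed in PHP with bc_math. The sketch that follows is one way to get a value of that shape, but it is an assumption rather than the exact method: it takes the first 8 raw bytes of the MD5 digest and uses unpack and sprintf instead of bc_math, which requires 64-bit PHP. The function names and the lowercase/trim normalization are hypothetical.

```php
<?php
// Hypothetical helpers; the normalization and the choice of the first
// 8 digest bytes are assumptions, not the author's confirmed method.

// Convert 8 raw bytes to an unsigned 64-bit decimal string (64-bit PHP only).
function bytesToUnsignedDecimal(string $eightBytes): string
{
    // 'J' unpacks a big-endian 64-bit value. PHP holds it as a signed int,
    // so '%u' is used to print it as an unsigned decimal.
    $value = unpack('J', $eightBytes)[1];
    return sprintf('%u', $value);
}

// Numeric key for an email address, suitable for a BIGINT UNSIGNED column.
function emailAddrHash(string $emailAddr): string
{
    $normalized = strtolower(trim($emailAddr));
    $digest = md5($normalized, true); // 16 raw bytes
    return bytesToUnsignedDecimal(substr($digest, 0, 8));
}
```

The result is returned as a decimal string so it can be bound directly to the emailAddrHash column, including values above the signed 64-bit range.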

Most of your time will be spent building your email templates; this is just an abbreviated list of things to consider to move the process along faster.


Justin Swanhart said...

Is that really a hash, or is there an email address lookup table with a bigint auto_increment for up to 4B unique email addresses?

I ask, because I assume collisions would be fairly likely with a 64bit hash.

An MD5 hash, for example is 128 bits, which would require two BIGINT UNSIGNED columns for storage (or use BINARY(16) datatype).

Justin Swanhart said...

Oops. I meant:
18446744073709551615 addresses. Not 2B.

Dathan Vance Pattishall said...

the address space is huge, plus I made the primary key hash + userId so collisions are near impossible. Out of 120+ million email addresses I have yet to hit a collision, although there is code to detect it.

bucky said...

Can I ask how you are hashing your emails? I maintain an unsub list that has about 4 million records in it, and the way you are doing it looks much more efficient.

Anonymous said...

Do you mean "sendgrid" instead of "sendmail" after "incase" in the following sentence? If so, how often is often? Thanks.

I choose to send mail locally to queue incase sendmail goes down, which happens often this is why I don't make a socket connection to their servers realtime.

Danmark said...

Interesting post, thanks a lot for spending the time to write it. I like the direction you are taking your blog. I’ll be subscribing to your site in order to keep up in the future.

Dathan Vance Pattishall said...

@anon yes, meant sendgrid. Fixed, thanks.
@bucky I hash by md5sum. I take 16 bytes from it and convert it to a 64 bit int in decimal

bucky said...

@dathan are you using php md5 + pack or just doing that on the mysql side with md5 and convert? sorry I read your blog a lot and try to learn more but you are way above me ha

Dathan Vance Pattishall said...

@bucky I do it on the php side with bc_math yet I can do it on the mysql side as well. For this I don't want to make a network trip though.