Tuesday, December 20, 2011

The Effect of Using CloudFront and Why It Matters

For more than 12 years I have been building systems on every tier of the web: everything from low-level OS optimizations, MySQL internals, and interpreted-language performance tricks to static content optimization.

Building a CDN is easy. What makes Akamai or CloudFront attractive is the presences they have around the world, known as edge nodes, which serve your content from the location closest to the requester.

Their smart DNS servers direct each user to the closest edge node. This works well for JavaScript, images, CSS, and video because that content is static.

Here is a good example. Your system can serve content in less than 5 ms if the network is not involved. With network overhead, that content is served in 10 ms if you are close to the data center (say, about 1,200 miles). Yet your users on the East Coast (assuming your data center is on the West Coast) or, better yet, your users in Europe see this content in 355 ms. At around 300 ms users start to notice the lag, and the longer the lag, the more likely the user is to bounce. People hate waiting. Now, do you optimize the backend to serve the content faster, or do you put the content closer to the end user?

Put the content closer to the end user to bring that roughly 350 ms back down to 10 ms. This is what CloudFront, an Amazon product, does for you. Here is a good wiki page on setting up CloudFront. I've expanded on it by adding some Apache mod_rewrite rules that automate cache invalidation, so I don't have to call an API to purge the CDN cache. Below are the mod_rewrite rules:


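    # Strip the numeric cache-breaker segment so the request maps to the real file.
    # The leading slash in each pattern assumes server or virtual-host context.
    RewriteRule ^/static/(\d+)/(.*)$ /static/$2 [NC,QSA,L]
    RewriteRule ^/static/(\w+)/(\d+)/(.*)$ /static/$1/$3 [NC,QSA,L]
    RewriteRule ^/static/(\w+)/(\S+)/(\d+)/(.*)$ /static/$1/$2/$4 [NC,QSA,L]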

What these rules say is:

Given a URL

http://domain/static/12345/main.js

serve from

http://domain/static/main.js
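To make the mapping concrete, here is a minimal PHP sketch that imitates the three rewrite rules with regular expressions. It is only an illustration: the function name `strip_cache_breaker` is hypothetical, and Apache does this work itself, so nothing like this needs to run in production.

```php
<?php
// Hypothetical helper that mimics the RewriteRules above.
// The first matching pattern wins, similar to the [L] flag.
function strip_cache_breaker(string $path): string
{
    $rules = [
        '#^/static/(\d+)/(.*)$#i'             => '/static/$2',
        '#^/static/(\w+)/(\d+)/(.*)$#i'       => '/static/$1/$3',
        '#^/static/(\w+)/(\S+)/(\d+)/(.*)$#i' => '/static/$1/$2/$4',
    ];
    foreach ($rules as $pattern => $replacement) {
        if (preg_match($pattern, $path)) {
            return preg_replace($pattern, $replacement, $path);
        }
    }
    // No numeric segment found: the path is served unchanged.
    return $path;
}

echo strip_cache_breaker('/static/12345/main.js'), "\n";
// prints /static/main.js
echo strip_cache_breaker('/static/jquery/240184024/jquery.min.js'), "\n";
// prints /static/jquery/jquery.min.js
```

Because the number is simply discarded, any value in that position resolves to the same file on disk; the number matters only to the CDN, which treats each distinct URL as a separate cache entry.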

The dynamic URL is generated by taking the absolute value of the CRC32 of each file's contents. If a file changes on disk, so does its URL, which breaks the cache and forces CloudFront to refetch the content before serving it to the user. All of this is calculated once, during the deploy process.


For instance

http://d1wuzpn2rb4qzi.cloudfront.net/static/jquery/240184024/jquery.min.js

http://d1wuzpn2rb4qzi.cloudfront.net/static/jquery maps to http://your.schoolfeed.com/static/jquery

240184024 is the cache breaker. It comes from doing this during the deploy process:

$hashes[$file] = abs(crc32(file_get_contents($file)));

The deploy then generates PHP code that is global to the templates:

$str = "";
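The snippet above is cut off, so the following is only a guess at the shape of that deploy step, not the original code. It is a sketch that assumes the static files live under a `static` directory inside a document root and that the generated file simply returns the hash array. The function names `build_hashes` and `write_hash_file`, and the paths in the usage comment, are hypothetical.

```php
<?php
// Sketch of a deploy step (assumed layout: <docRoot>/static/...).
// Returns a map of URL path => checksum, e.g. "/static/jquery/jquery.min.js" => 240184024.
function build_hashes(string $docRoot): array
{
    $root = rtrim($docRoot, '/');
    $hashes = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root . '/static', FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if (!$file->isFile()) {
            continue;
        }
        // Key each entry by its path relative to the document root.
        $relative = substr($file->getPathname(), strlen($root));
        $relative = str_replace(DIRECTORY_SEPARATOR, '/', $relative);
        // Same expression as in the post; abs() guards against the negative
        // values crc32() can return on 32-bit PHP builds.
        $hashes[$relative] = abs(crc32(file_get_contents($file->getPathname())));
    }
    ksort($hashes);
    return $hashes;
}

// Writes the map as PHP source so templates can load it with require.
function write_hash_file(array $hashes, string $target): void
{
    $str = "<?php\nreturn " . var_export($hashes, true) . ";\n";
    file_put_contents($target, $str);
}

// Usage during deploy (hypothetical paths):
// write_hash_file(build_hashes('/var/www/site'), '/var/www/site/hashes.php');
// At request time: $hashes = require '/var/www/site/hashes.php';
```

Generating a PHP file rather than hashing at request time keeps the cost at deploy, which matches the point that everything is calculated once.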

Now, when building the reference link:

<script type="text/javascript" src="{cloudfront file='/static/jquery/jquery.min.js'}"></script>

{cloudfront} is a Smarty function that takes the input file, splits off the directory, and puts 240184024 into the path of the CloudFront URL.
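The core of such a function can be sketched in plain PHP. This is a stand-in under stated assumptions, not the actual plugin: the name `cloudfront_url`, the `$hashes` map passed in as an argument, and the `$cdnHost` parameter are all hypothetical, and how the function gets registered with Smarty depends on the Smarty version and is not shown.

```php
<?php
// Hypothetical stand-in for the {cloudfront} Smarty function.
// $hashes is the deploy-time map of file path => checksum.
function cloudfront_url(string $file, array $hashes, string $cdnHost): string
{
    if (!isset($hashes[$file])) {
        // Unknown file: fall back to the unversioned path on the CDN.
        return $cdnHost . $file;
    }
    // Split "/static/jquery/jquery.min.js" into directory and file name,
    // then place the checksum between them.
    $dir = rtrim(dirname($file), '/');
    return $cdnHost . $dir . '/' . $hashes[$file] . '/' . basename($file);
}

$hashes = ['/static/jquery/jquery.min.js' => 240184024];
echo cloudfront_url(
    '/static/jquery/jquery.min.js',
    $hashes,
    'http://d1wuzpn2rb4qzi.cloudfront.net'
), "\n";
// prints http://d1wuzpn2rb4qzi.cloudfront.net/static/jquery/240184024/jquery.min.js
```

Placing the checksum directly before the file name means the resulting URL has the shape the rewrite rules expect, so the origin can strip it again when CloudFront requests the file.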



So what has this done to the system?


Around 5 pm we see a drop in www_accesses; that is due to switching to CloudFront. The savings are nearly 60%. schoolFeed has a lot of JavaScript files. There is still another optimization to be done: grouping JavaScript files together to reduce the number of GETs.


Here we see a 35% drop in bytes out as a result of the change.

The effect on MySQL is about 3-5% more traffic on the backend, as users stick around longer now that things are snappier.



Always keep this in mind: as one tier becomes faster, the other tiers need the capacity to keep up with the added demand.

