Monday, April 21, 2008

Common Steps to Scale Linearly

Whenever I work at a place I do the following.

First I get a rundown of what the application is, what its demands are, and what the company expects it to be a year from now. For example, how many users are going to use the application: 10 million, 20 million, 100 million?

Then I find all the slowdowns:
- What are the my.cnf settings?
- What are the most active tables?
- What type of SQL is being used?
- How is the data accessed?
- Who/What owns the data?
- What is the Read-Write Ratio?
- How many servers handle the site load now, and how many will be needed within a few months?
- What are the reads per second, connections per second, and writes per second?
- How does the data grow? MxN, MxNxO, N^4 etc.
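
To make a few of these questions concrete, here is a rough Python sketch of how the per-second rates and the read-write ratio could be derived from MySQL's cumulative status counters. It assumes the rows of SHOW GLOBAL STATUS have already been loaded into a dict. The function name and the sample numbers are made up for illustration, and counting only SELECT as a read and INSERT/UPDATE/DELETE as writes is a simplification.

```python
def workload_summary(status):
    """Summarize MySQL global status counters (cumulative since server start).

    `status` is assumed to be a dict built from SHOW GLOBAL STATUS rows.
    The results are averages over the whole uptime, not peak values.
    """
    uptime = int(status["Uptime"])  # seconds since the server started
    reads = int(status["Com_select"])
    # Simplification: only these three statement types count as writes.
    writes = sum(int(status[k]) for k in ("Com_insert", "Com_update", "Com_delete"))
    return {
        "reads_per_sec": reads / uptime,
        "writes_per_sec": writes / uptime,
        "connections_per_sec": int(status["Connections"]) / uptime,
        "read_write_ratio": reads / writes if writes else float("inf"),
    }


# Made-up sample counters for a server that has been up for one day.
sample = {
    "Uptime": "86400",
    "Com_select": "8640000",
    "Com_insert": "432000",
    "Com_update": "345600",
    "Com_delete": "86400",
    "Connections": "172800",
}

print(workload_summary(sample))
```

Because the counters are cumulative, sampling them twice and taking the difference gives a better picture of peak load than the since-startup average shown here.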

Once I have this down (it takes a few days), I change everything :)

If the data is small and doesn't change often, I don't bother federating it at first. I go for the meat of the product. My goal is a system that is mean, lean, cheap, fast, and easy to maintain. I love my sleep.

So, the steps for federating:

- What is the main object?
- What are the mappings to this main object?
- Spread the data out by this main object.
- Cache lookups to the pointer that says where the main object's data is.
- Build everything around the main object(s).
- Use a versioning system.
- Document a global view of how things work, and write cookbooks so someone else can wake up in the middle of the night. I love my sleep :)
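
As a minimal sketch of the "spread data out by the main object" and "cache the pointer" steps, the Python below assumes the main object is a user and that a directory table records which shard holds each user. The class name, the shard names, and the "fewest users" placement rule are all hypothetical, and plain dicts stand in for the real lookup table and cache.

```python
from collections import Counter


class ShardDirectory:
    """Maps a main-object id (here, a user id) to the shard holding its data."""

    def __init__(self, shards):
        self.shards = list(shards)  # hypothetical shard names
        self.directory = {}         # stands in for a global lookup table
        self.cache = {}             # stands in for a cache in front of that table
        self.directory_reads = 0    # how often the cache was missed

    def assign(self, user_id):
        """Place a new user on the shard that currently has the fewest users."""
        counts = Counter(self.directory.values())
        shard = min(self.shards, key=lambda s: counts[s])
        self.directory[user_id] = shard
        return shard

    def locate(self, user_id):
        """Return the shard for a user, consulting the cache first."""
        shard = self.cache.get(user_id)
        if shard is None:
            self.directory_reads += 1
            shard = self.directory.get(user_id)
            if shard is None:
                shard = self.assign(user_id)
            self.cache[user_id] = shard
        return shard


directory = ShardDirectory(["db1", "db2"])
for uid in (1, 2, 1):
    print(uid, directory.locate(uid))
```

Every query for a user's data then goes to the shard returned by `locate`, so adding capacity means adding shards rather than growing one big server.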


At the same time, get these working:

- dsh
- Nagios
- Ganglia
- custom tools
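
For a sense of what a tool like dsh buys you, here is a rough Python stand-in (not dsh itself) that runs the same command on a list of hosts in parallel over the standard ssh client. The function names and host names are hypothetical, and it assumes key-based ssh login is already set up.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_run(host, command):
    """Run `command` on `host` with the ssh client; return (exit code, stdout)."""
    proc = subprocess.run(
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout


def run_everywhere(hosts, command, run=ssh_run, max_workers=8):
    """Run the same command on every host in parallel; return {host: result}."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda h: run(h, command), hosts))
    return dict(zip(hosts, results))


# Example call (hypothetical host names; needs working ssh keys):
# run_everywhere(["db1.example.com", "db2.example.com"], "uptime")
```

The `run` parameter exists only so the fan-out logic can be tried without real servers by passing in a fake runner.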



What you end up with is an easy-to-use, easy-to-maintain system that scales linearly as long as that main object is being referenced.

Above is the easy stuff. The time-consuming part is rewriting all the code to work with both the old and the new system, and migrating to the new one. This is needed to make sure no one is affected by the upgrade to the new system.
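
One common way to make code "work for old and new" is to read from the new system first, fall back to the old one, and copy records across as they are touched. The Python sketch below illustrates that idea under stated assumptions: the class and method names are hypothetical, dicts stand in for the two databases, and writing to both systems during the transition is a design choice of this sketch, not something the steps above prescribe.

```python
class MigratingStore:
    """Serves reads and writes while records move from an old store to a new one."""

    def __init__(self, old, new):
        self.old = old  # dict standing in for the legacy database
        self.new = new  # dict standing in for the federated system

    def read(self, key):
        """Prefer the new system; migrate the record on first touch."""
        if key in self.new:
            return self.new[key]
        value = self.old[key]  # raises KeyError if the record exists nowhere
        self.new[key] = value
        return value

    def write(self, key, value):
        """Write to both systems so the old one stays usable as a fallback."""
        self.new[key] = value
        self.old[key] = value

    def backfill(self):
        """Copy records that were never touched; return how many were moved."""
        moved = 0
        for key, value in list(self.old.items()):
            if key not in self.new:
                self.new[key] = value
                moved += 1
        return moved


old_db = {"u1": "alice", "u2": "bob"}
store = MigratingStore(old_db, {})
print(store.read("u1"), store.backfill())
```

Once `backfill` reports nothing left to move and the new system has proven itself, the fallback to the old store can be removed.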

2 comments:

mike said...

I love dsh. I don't know how long ago I discovered it but man. I use it daily now. Combined with ssh keys...

suhaib said...

Good points to start optimizing. I'd also check column indexes, WHERE clauses, and tmp tables.

Mike, can you point me to the 'dsh' tool? I am not sure how to locate it.