Monday, December 11, 2017

Designing a RDBMS SQL Table


Building tables initially should not really require a lot of thought. What? I'm suggesting that when designing a table think of the Table as a spreadsheet. Yes.


For instance, let's create a table that combines all social scores of a users' media in a single table. We will call this table platform_resources.

What do we need to record the social score total of a single person?
  • Who is this person? How I know this persona. 
  • What is the platform? Which Social Platform does this reference refer to
  • What is the platform identifier? What is the social platform identifier
  • What is a common social score for each user? View, Likes, Comments


So the table above answers my questions. For each piece of media that and interna_name owns, I am able to collect a summary of basic stats. By no means is this optimized. The row size is roughly

21+51+51+4+4+4+4+4+256+4 = 403 bytes not taking into account the primary key which is very large and takes a small byte overhead due to the exceeding an internal limit.

We are not optimizing yet, we are just answering questions.


The Primary Key was picked to be platform_id, platform, internal_name. Following the Left Most prefix rule for composite indexes, we have roughly 3 indexes in 1 index. The original primary key, platform_id & platform, then finally platform_id. The primary key was picked to be this because for a platform the platform_id is unique and the person who owns this platform_id should be represented. Additionally, since we are using INNODB the table is sorted by the primary key.

No optimizations just a basic table get's the job done. Now how would you optimize this table?
First, you should ask what are you optimizing the table for? Disk size? Memory fit? Because its ugly and it bothers me?

Let's estimate how this table will grow. This table is a MxN problem where for each internal_name they will have N resources per platform. The bounds of the growth are around 1000 items per year per platform. M is less than 20K so, It's really not worth it to optimize for any other reason just to do it because. So don't.

If I had to optimize because the MxN problem turned into a huge overhead.
First, I would reduce the row size of the table by making lookup tables for internal_name, platform, platform_id which keeps the primary key smaller - probably in 64 bits.

Next, distribute the table by either date_taken range since queries will be more interested in the latest data, or we can distribute the table by internal_name; this is another post.

Finally, sometimes you just need a table and you just want to query it like give me the total sum of views for all Instagram videos by a creator. The post is to think about optimizations when you need to think about optimizations and not beforehand. If your needs changes; change the schema to focus on the optimization you are going for. :)



No comments: