Wednesday, October 10, 2007

INNODB file_per_table growth based on Data Added

How does INNODB in mysql 5.0 tablespace per file grow?

I can just add data and hope to catch when the table space grows, but I really want to understand the internals of INNODB and build an accurate modal on how big the table space will grow when X rows are added.

    Some things that need to be known.
  • Data stored in INNODB is stored as pages with a default size of 16KB.
  • INNODB adds a page for every secondary INDEX

  • INNODB has some overhead of about 18 bytes (maybe more) per row



First let's set up a test environment


delimiter //

CREATE PROCEDURE resettest()

begin
DROP TABLE IF EXISTS testvarchar;
CREATE TABLE `testvarchar` (
`id` int(10) unsigned NOT NULL,
`data_key` varchar(255) default NULL,
`data_nokey` varchar(255) default NULL,
PRIMARY KEY (`id`),
KEY `data_key` (`data_key`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;


end;
//

delimiter ;


Now set up some inserts to populate the table


delimiter //

CREATE PROCEDURE populate(IN param1 INT)

begin

declare i int;

set i = 0;

start transaction;

while i < param1 do

insert into testvarchar (id, data_key, data_nokey)

values (i, repeat("-", 1), repeat("-", 250));

set i = i + 1;

end while;

commit;

end;
//

delimiter ;


Now from this I know that the size of each record is
id - 4 bytes
data_key - 2 bytes (varchar there is a pad of a single byte)
data_nokey - 251 bytes (ditto)
TOTAL: 257 bytes

Since there data_key AND it's not the PRIMARY INDEX-a page is allocated just for it. So let's sum up the data.

id - 4 bytes //PK
data_key - 2 bytes //Secondary Index
Total: 6 bytes

for a total 257 bytes for the table plus 6 bytes for the index for a total of 263 bytes.

Now there is some overhead on INNODB tables, there is a Record Header which is 5 bytes and the secondary index references the PRIMARY KEY (the clustered index), Plus a transaction-id which is 6 bytes and a roll pointer which is 7 bytes for the undo log

So the record size is 263 (INDEX && DATA) + INNODB OVERHEAD (17) = 280 bytes.

Now since the INDEX takes a page and data takes a page I need to take


32768/280 == NUMBER OF RECORDS to make the TABLE SPACE GROW 32 KB
This == 117 rows;


The start of the datafile is 112K which is 96K + Page for INDEX at create time


ls -lrh
total 132K
-rw-rw---- 1 mysql mysql 112K Oct 10 22:00 testvarchar.ibd
-rw-rw---- 1 mysql mysql 8.5K Oct 10 22:00 testvarchar.frm
-rw-rw---- 1 mysql mysql 65 Oct 10 00:16 db.opt


Adding 117 rows

call resettest();
call populate(117);



ls -lrh
total 164K
-rw-rw---- 1 mysql mysql 144K Oct 10 22:16 testvarchar.ibd
-rw-rw---- 1 mysql mysql 8.5K Oct 10 22:16 testvarchar.frm
-rw-rw---- 1 mysql mysql 65 Oct 10 00:16 db.opt



call resettest();
call populate(234); // 64K / 280 =~ 234



//GROWS another 32K
ls -lrh
total 196K
-rw-rw---- 1 mysql mysql 176K Oct 10 22:17 testvarchar.ibd
-rw-rw---- 1 mysql mysql 8.5K Oct 10 22:17 testvarchar.frm
-rw-rw---- 1 mysql mysql 65 Oct 10 00:16 db.opt


All though this is may all look right, in fact it's not. This may be a good approximation, but I am not taking into account the additional overhead added by INNODB to force the datafile to grow prior to this calculation at a much earlier row count. I have not tracked down what that is yet. It may be that innodb needs to reserved space based on the 2xheight of the binary tree to make sure there is enough space to undo leaf splitting. I'm sure that INNODB doesn't allocate the full 255 bytes for varchar in indexes or does it? Maybe you can shed some light?

No comments: