Friday, June 01, 2007

TO COUNT(*) or NOT TO COUNT(*)

Counts, we all love to show counts in our applications. Under a high traffic website, it's visually appealing to show a big number. But the cost of generating that big number realtime can be very expensive on our mySQL backend.

For example:


$sql = SELECT COUNT(*) FROM VERY_BIG_TABLE WHERE key_part1=NUM AND no_key IN (1,2,3) AND key_part2=NUM



Would you think this is fast? In a dev environment yes, but this query has to get the exact row count that meets the requirements of the WHERE clause.

In many cases we use a count to solve a BOOLEAN term in our code, for example

$count_result = db_fetch($sql)

if ($count_result) {
// do something
} else {
$SHOW_ERROR = 1;
}


Don't use a count for this purpose. Just do something like



$sql = SELECT PRIMARY_KEY FROM VERY_BIG_TABLE WHERE key_part1=NUM AND no_key IN (1,2,3) AND key_part2=NUM LIMIT 1;



This will tell mysql to return the 1st row found and that's it. In comparison this is about 10 times faster then doing a COUNT(*) under high concurrency.


But what if you need the actual number of rows:

Well, I suggest that what ever triggers an INSERT/DELETE/UPDATE into VERY_BIG_TABLE, do the count(*) at the end of the query and store the result in a meta table or summary table that is small and queried fast.

The draw back is, this makes your application a bit slower on writes, but if you do a lot of reads like most sites do, then caching the count will be a huge win, and this cache account tied to a transaction, makes the count extremely accurate.

14 comments:

Anonymous said...

If it is a MyISAM table COUNT(*) is for free and taken from SHOW TABLE STATUS (rows). You can take the same values for InnoDB, but it would be guessed and might be high or too load, but its faster than really counting all rows.

Anonymous said...

again with the write performance penility, you can also keep a count table, and use triggers off the normal table to keep them accurate.

Dathan Pattishall said...

Just to clarify this is not a COUNT(*) on a table it's a COUNT(*) for an index column.

Sergi said...

and what's the diference between COUNT(*) amd COUNT(primary_key)

Dathan Pattishall said...

@sergi

http://www.mysqlperformanceblog.com/2007/04/10/count-vs-countcol/#more-186

Here is a good explanation on counting a col. For PRIMARY keys it should be fine, but I tend to steer away from it, to be consistent.

Anonymous said...

Your idea to perform a count(*) after any write is terrible performance-wise and will not be thread-safe. The best way to provide these type of counts is to use Innodb tables and have a trigger that updates a summary table. Since the trigger action as well as the original insert/update will be done in an implicit transaction, you can guarantee that the counts will always be correct regardless of thread access.

With MyIsam, to guarantee a correct , thread-safe count you would have to do something like:

update summary_table set count = (select count(*) from table where key = some_value) where primary_key = a_value;


This would lock both tables for as long as the update and subselect took. You could also do explicit locking of the table. Obviously, these methods do not scale.


Also, the cardinality of the index traversed to answer the count(*) will affect the performance of the count. Including criteria that is outside the index will obviously drastically slow down the query since mysql will have to look at data outside the index node/file.

Anonymous said...

Wow, no coffee and I leave a wrong comment. :(

I meant to say that the summary table should be incrementally updated, not fully. You never have to run a count(*) with a summary table. Thus, the update should be (in a trigger or not):

update summary_table set count = count + 1 where primary_key = a_value;


Just with MyISAM or (Innodb and no transaction/trigger), you can't guarantee the accuracy of the counts.

Anonymous said...

MyISAM table COUNT(*) is only free when there is no search predicate. Introducing a search predicate causes the rows to actually get counted one by one.

Dathan Pattishall said...

UPDATE TABLE SET col=col+1 WHERE id = ?

fails, if one update is missed then your count is invalid.

Anonymous said...

if count(*) is evil, then you should cache these values, maybe using memcached or file cache

trigger count(*) on insert/update/delete can't solve the problem, and remember, write operations are difficult to scale

Anonymous said...

Great article Dathan!

any way, your method is fine if you
want to show only total counts, but
is not useful if you need use it with
group by

like:

select name, count(*)
from table_name
group by name

exists some kind of optimization for this?

Anonymous said...

I have seen people that would insert a counting col containing nothing but "1" in each row. To get the actual number of rows you would do a sum() which was supposedly faster than count().

Dennis said...

Some applications use a "soft" delete.
This requires actually summing the delete flag column.

I would like to hear a comment regarding mention that sum() is faster than count()

Anonymous said...

What about using mysql function SQL_CALC_FOUND_ROWS in a previously query, making count use obsolete ?