For example:
$sql = SELECT COUNT(*) FROM VERY_BIG_TABLE WHERE key_part1=NUM AND no_key IN (1,2,3) AND key_part2=NUM
Would you think this is fast? In a dev environment yes, but this query has to get the exact row count that meets the requirements of the WHERE clause.
In many cases we use a count to solve a BOOLEAN term in our code, for example
$count_result = db_fetch($sql)
if ($count_result) {
// do something
} else {
$SHOW_ERROR = 1;
}
Don't use a count for this purpose. Just do something like
$sql = SELECT PRIMARY_KEY FROM VERY_BIG_TABLE WHERE key_part1=NUM AND no_key IN (1,2,3) AND key_part2=NUM LIMIT 1;
This will tell mysql to return the 1st row found and that's it. In comparison this is about 10 times faster then doing a COUNT(*) under high concurrency.
But what if you need the actual number of rows:
Well, I suggest that what ever triggers an INSERT/DELETE/UPDATE into VERY_BIG_TABLE, do the count(*) at the end of the query and store the result in a meta table or summary table that is small and queried fast.
The draw back is, this makes your application a bit slower on writes, but if you do a lot of reads like most sites do, then caching the count will be a huge win, and this cache account tied to a transaction, makes the count extremely accurate.
14 comments:
If it is a MyISAM table COUNT(*) is for free and taken from SHOW TABLE STATUS (rows). You can take the same values for InnoDB, but it would be guessed and might be high or too load, but its faster than really counting all rows.
again with the write performance penility, you can also keep a count table, and use triggers off the normal table to keep them accurate.
Just to clarify this is not a COUNT(*) on a table it's a COUNT(*) for an index column.
and what's the diference between COUNT(*) amd COUNT(primary_key)
@sergi
http://www.mysqlperformanceblog.com/2007/04/10/count-vs-countcol/#more-186
Here is a good explanation on counting a col. For PRIMARY keys it should be fine, but I tend to steer away from it, to be consistent.
Your idea to perform a count(*) after any write is terrible performance-wise and will not be thread-safe. The best way to provide these type of counts is to use Innodb tables and have a trigger that updates a summary table. Since the trigger action as well as the original insert/update will be done in an implicit transaction, you can guarantee that the counts will always be correct regardless of thread access.
With MyIsam, to guarantee a correct , thread-safe count you would have to do something like:
update summary_table set count = (select count(*) from table where key = some_value) where primary_key = a_value;
This would lock both tables for as long as the update and subselect took. You could also do explicit locking of the table. Obviously, these methods do not scale.
Also, the cardinality of the index traversed to answer the count(*) will affect the performance of the count. Including criteria that is outside the index will obviously drastically slow down the query since mysql will have to look at data outside the index node/file.
Wow, no coffee and I leave a wrong comment. :(
I meant to say that the summary table should be incrementally updated, not fully. You never have to run a count(*) with a summary table. Thus, the update should be (in a trigger or not):
update summary_table set count = count + 1 where primary_key = a_value;
Just with MyISAM or (Innodb and no transaction/trigger), you can't guarantee the accuracy of the counts.
MyISAM table COUNT(*) is only free when there is no search predicate. Introducing a search predicate causes the rows to actually get counted one by one.
UPDATE TABLE SET col=col+1 WHERE id = ?
fails, if one update is missed then your count is invalid.
if count(*) is evil, then you should cache these values, maybe using memcached or file cache
trigger count(*) on insert/update/delete can't solve the problem, and remember, write operations are difficult to scale
Great article Dathan!
any way, your method is fine if you
want to show only total counts, but
is not useful if you need use it with
group by
like:
select name, count(*)
from table_name
group by name
exists some kind of optimization for this?
I have seen people that would insert a counting col containing nothing but "1" in each row. To get the actual number of rows you would do a sum() which was supposedly faster than count().
Some applications use a "soft" delete.
This requires actually summing the delete flag column.
I would like to hear a comment regarding mention that sum() is faster than count()
What about using mysql function SQL_CALC_FOUND_ROWS in a previously query, making count use obsolete ?
Post a Comment