These are the symptoms:
Losing some ticks... checking if CPU frequency changed.
warning: many lost ticks.
Your time source seems to be instable or some driver is hogging interrupts
rip __do_softirq+0x4d/0xd0
ttyS1: 1 input overrun(s)
ttyS1: 1 input overrun(s)
ttyS1: 1 input overrun(s)
ttyS1: 1 input overrun(s)
ttyS1: 1 input overrun(s)
BUT THE REAL PROBLEM is that it locks up the partition the ibdata file is on.
The system where the server locked up:
2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:56:28 EST 2006 x86_64 x86_64 x86_64 GNU/Linux
total used free shared buffers cached
Mem: 16412760 16389976 22784 0 82440 686368
-/+ buffers/cache: 15621168 791592
Swap: 8393952 144 8393808
I had O_DIRECT running in production for over a month on some pretty loaded servers, but once I put it on some older servers, all hell broke loose.
By "pretty loaded" I mean:
1000 qps, mostly SELECTs mixed with large range scans, at a high concurrency of 30 threads.
CPU I/O wait (WIO) of around 10-15%, which is within acceptable thresholds.
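To put "pretty loaded" in perspective: if we assume all 30 threads are kept busy on average (an assumption; the post only gives the thread count), Little's law ($L = \lambda W$: average requests in flight equals arrival rate times average time in the system) implies an average time per query of roughly:

```latex
W = \frac{L}{\lambda} = \frac{30}{1000\ \text{queries/s}} = 0.03\ \text{s} = 30\ \text{ms}
```

If fewer threads are actually active at once, the implied per-query time is correspondingly shorter.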
If you insist on running O_DIRECT, I recommend:
1. Test O_DIRECT on every OS version in your farm.
2. Test O_DIRECT under more load than you would ever realistically see.
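Part of that testing can be scripted. Below is a minimal Python sketch (Linux only, and our own illustration rather than the procedure used here) with two hypothetical helpers: `supports_o_direct` checks whether an aligned O_DIRECT write succeeds in a given directory, and `stress_o_direct` hammers one file with concurrent aligned O_DIRECT writes. The 4096-byte alignment, the probe file names and the default of 30 threads (mirroring the concurrency above) are assumptions. It exercises the filesystem directly, not InnoDB (which is switched to O_DIRECT with `innodb_flush_method=O_DIRECT`), so it is only a crude proxy and may not reproduce the lockup described above.

```python
import mmap
import os
import threading
import time

BLOCK = 4096  # assumed alignment; the real requirement depends on device and filesystem


def supports_o_direct(directory: str) -> bool:
    """Return True if an aligned O_DIRECT write succeeds in `directory` (hypothetical helper)."""
    if not hasattr(os, "O_DIRECT"):
        return False  # platform without O_DIRECT (e.g. macOS)
    path = os.path.join(directory, "odirect_probe.tmp")  # hypothetical file name
    buf = mmap.mmap(-1, BLOCK)  # anonymous mmap is page-aligned, as O_DIRECT needs
    buf.write(b"x" * BLOCK)
    fd = None
    try:
        fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC | os.O_DIRECT, 0o600)
        return os.write(fd, buf) == BLOCK
    except OSError:
        return False  # e.g. EINVAL when the filesystem rejects O_DIRECT
    finally:
        if fd is not None:
            os.close(fd)
        buf.close()
        try:
            os.remove(path)  # open may have created the file even if it then failed
        except FileNotFoundError:
            pass


def stress_o_direct(directory: str, threads: int = 30, writes_per_thread: int = 200):
    """Run concurrent aligned O_DIRECT writes against one file (hypothetical helper).

    Returns (successful_writes, failed_writes, elapsed_seconds).
    """
    path = os.path.join(directory, "odirect_stress.tmp")  # hypothetical file name
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC | os.O_DIRECT, 0o600)
    counts = {"ok": 0, "err": 0}
    lock = threading.Lock()

    def worker(index: int) -> None:
        buf = mmap.mmap(-1, BLOCK)  # each thread gets its own aligned buffer
        buf.write(bytes([index % 256]) * BLOCK)
        ok = err = 0
        for i in range(writes_per_thread):
            offset = (index * writes_per_thread + i) * BLOCK  # aligned, non-overlapping
            try:
                if os.pwrite(fd, buf, offset) == BLOCK:
                    ok += 1
                else:
                    err += 1  # short write
            except OSError:
                err += 1
        buf.close()
        with lock:
            counts["ok"] += ok
            counts["err"] += err

    start = time.monotonic()
    workers = [threading.Thread(target=worker, args=(n,)) for n in range(threads)]
    try:
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        os.fsync(fd)
    finally:
        os.close(fd)
        os.remove(path)
    return counts["ok"], counts["err"], time.monotonic() - start
```

For example, you might call `supports_o_direct` and then `stress_o_direct` on a directory that lives on the same partition as the ibdata file, with the database under load, while watching the kernel log for messages like the ones above. A clean run proves little; a hang or a burst of errors is worth investigating.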
7 comments:
Can you provide any details about this? What is a "pretty loaded server"? What is the hardware configuration? What kind of application workload does this database serve? Since we run a lot of MySQL databases (using InnoDB) on ext3 file systems, I'm concerned, but without anything to go on, I can't take this very seriously.
I am still tracking down why O_DIRECT does not work with ext3 on my 2.6 RHEL version. All I have to go on is that when I turned it off, the partition never froze again.
IM me; it turns out you do have support.
-Brian
We've been using O_DIRECT with InnoDB for more than 60 days now. No problems with locked partitions.
I have the same load here (1000 qps) with ext3 and O_DIRECT (MySQL 5.0.44sp1, Ubuntu Dapper server) and have had no issues for more than 6 months. Looks like a platform-specific problem to me.
Thank you for your input. We still have no clue as to why it's not working for us.
Maybe a conflict with MegaRAID?
I think it's pretty irresponsible to immediately assume ext3 was culpable in your issue. Did you, for example, remount with ext2 or XFS, repeat the situation that caused the crash, and NOT have it crash?
I'm trying to understand where ext3 fits into your hypothesis.