Wednesday, September 21, 2011

Stump the Murph: ulimit, pam and linux

There is a game that a small group of friends and I have been playing since my Friendster years. It's called Stump the Murph. Basically if there is some weird problem in Linux mainly but it's in a variety of subjects-that we can't figure out we pass it to one of our friends Kevin Murphy. In 8 years I believe I stumped him once but I can't remember what it is so it doesn't count.

Here is the problem

SQLSTATE[HY000] [1135] Can't create a new thread (errno 11); if you are not out of available memory, you can consult the manual for a possible OS-dependent bug

"Obviously" this means you need to raise the ulimit for the process running mysql. I say "obviously" because this error means different things. In most cases it means that the server ran out of memory. perror 11 says OS error code  11:  Resource temporarily unavailable, yet when there is enough memory there may be a pam_limit issue. In my case there is.


So I did the following

in /etc/security/limits.conf I added this

mysql   soft    nofile  10240
mysql   hard    nofile  1537454
mysql   soft    nproc   32768
mysql   hard    nproc   65535

yet when I test the changes su - mysql
I get
 

su: pam_limits(su-l:session): Could not set limit for 'nofile': Operation not permitted

So my next course of action is to check

/etc/pam.d/system-auth

wait a second it has

session required pam_limits.so
and

/etc/pam.d/su calls

session         include         system-auth

thus I don't need to add session required pam_limits.so

Now the game of Stump the Murph begins:

In about 1/2 hour Murph figured out the solution! He deduced that since 

cat /proc/sys/fs/file-max
1537454

you can't set the hard limit of nofile to 1537454 because in theory you could starve the kernel from file descriptors thus from murph's suggestion I did

mysql      soft    nofile  10240
mysql      hard    nofile  768727
mysql      soft    nproc   32768
mysql      hard    nproc   65535

#MAKE SURE root has the same or greater settings!!!

root      soft    nofile  10240
root      hard    nofile  768727
root      soft    nproc   32768
root      hard    nproc   65535


Thanks Murph!

Thursday, September 15, 2011

Amazon EBS mySQL, Disk Throughput and the Dual Edge of Software Raid

Amazon's EBS system is just a nice interface to a SAN subsystem, which manages the attachments of SAN LUNs. The problem with SAN when compared to Local SAS drives is latency and the shared controller, which caches IOPS for very distinct load profiles. Each load profile has an "optimized" cache profile from the SAN's redundant controller system. You may be able to attach petabytes of disks, but this system cannot utilize the true throughput when compared to small locally attached SAS Drives. Now the management portion of awesome. I love having the ability to mount more disk but I rarely need more space, I need speed.

How to get Speed out of Amazon's EBS volumes: Software RAID it!
mdadm --create /dev/md1 -v --raid-devices=8 --chunk=256 --level=raid10 /dev/xvdk /dev/xvdl /dev/xvdm /dev/xvdn /dev/xvdo /dev/xvdp /dev/xvdq /dev/xvdr

Take 8 EBS 125 GB volumes create a raid10 array with a 256KB chunk size. After various and mind numbing benchmarks I found that 256K is a good sweet spot. Feel free to do your own benches. The results have to be interpreted because of the nature of using a shared resource.

What I end up with is a 500GB partition, and I am roughly able to get around 22-25 MB of second of random I/O from 20 threads. To compare this to an 8 DISK 15K RPM PERC-6 2.5" SAS system I am able to get around 44 MB of second at a constant 1-2 ms response time for the same physical space. EBS volumes Response time per iop range from 6ms to 200ms. This sucks. Note: these numbers are based on RANDOM I/O 16KB Page size (4 iops per block write), what INNODB uses not sequential I/O.

Here is some iostat numbers from a live box with this configuration
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.83    0.00    1.75   22.32    0.08   74.01

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvdap1            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdh              0.00     0.00    0.00    1.00     0.00     8.00     8.00     0.00    0.00   0.00   0.00
xvdk              0.00     0.00   34.40   26.40  1100.80  1503.20    42.83     0.49    8.01   6.39  38.88
xvdl              0.00     0.00   13.20   26.40   422.40  1503.20    48.63     0.27    6.71   4.38  17.36
xvdm              0.00     0.20   32.40   27.00  1036.80  1524.20    43.11     0.30    5.13   4.19  24.88
xvdn              0.00     0.20    9.40   27.00   300.80  1524.20    50.14     0.15    4.11   2.48   9.04
xvdo              0.00     0.00   30.20   27.40   968.00  1496.80    42.79     0.45    7.76   6.56  37.76
xvdp              0.00     0.00   14.60   27.40   478.40  1496.80    47.03     0.22    5.26   3.92  16.48
xvdq              0.00     0.00   31.20   25.60   998.40  1501.60    44.01     0.38    6.73   5.32  30.24
xvdr              0.00     0.00    9.80   25.60   313.60  1501.60    51.28     0.16    4.50   2.35   8.32
md1               0.00     0.00  174.80   98.60  5606.40  6009.80    42.49     0.00    0.00   0.00   0.00

So, now that I have acceptable speed what is the drawback? A weekly cron job that runs a check across the raid array. On Amazon’s EBS system it cuts my throughput in 1/2

For my Amazon Linux system the cron job is located
-rwxr-xr-x 1 root root 2770 Jan 16  2011 /etc/cron.weekly/99-raid-check

It essentially runs

echo check > /sys/block/md1/md/sync_action

Yet, the check lasts for around 9000 min or 6.25 days! Thus I will only have .75 days of full throughput.

So to stop this I must run
echo idle > /sys/block/md1/md/sync_action

I do not recommend turning off the check, its needed. Now to find out a way to make this check happen faster.

Wednesday, September 07, 2011

Virtual Currency and Tracking What is Given Where

Virtual Currency is currency, which is used inside games, websites, applications, etc. It can be bought and traded and in many cases has a conversion to real currency. For instance Facebook Credits, this is virtual currency and each credit is worth 10 cents.

A common problem I run into when building Currency Sinks or earns is; what part of the application gave out the currency and why? Some techniques I used to manage this problem are to create a constants file of a bunch of action ids and used the constants wherever the application writes updates to the balance.

This causes a problem. Developers will just reuse the same id for very distinct reasons. When debugging situations I have to trace through a bunch of code.

Additionally problems can arise on deploy, because the use of currency change maybe pushed out yet the constants file is not. This causes data issues and or even preventable exceptions disrupting the customer experience.  The later possibility is much more offensive to me personally then data corruption. I go above and beyond to prevent any negative views a customer may have. My work shouldn't piss anyone off it should enable them.

I've tackled the problem by manually managing where the currency was deducted from, but this is the last time. If you do the same thing more then once and it can easily be solved by automation, automate it to make your life easier -- my motto.

How am I tackling the problem now?
By default I am adding the calling Objects class and method to my CurrencyHistory Table with each Currency Transaction and top app name. Now on the row basis I can track what class and method changed the balance and where it was called. Below is a table structure used to track currency and history of the currency balance change.

--
-- Holds the Balance of a users currency
--
DROP TABLE IF EXISTS Currency;
CREATE TABLE `Currency` (
  `userId` bigint unsigned NOT NULL,
  `regular` int NOT NULL,
  `premium` int NOT NULL,
  PRIMARY KEY (`userId`)
) ENGINE=InnoDB;

--
-- Holds the info on How the balance is generated
--
DROP TABLE IF EXISTS CurrencyHistory;
CREATE TABLE CurrencyHistory (
    transactionId BIGINT UNSIGNED NOT NULL COMMENT "Globally unique Id"
    userId BIGINT UNSIGNED NOT NULL COMMENT "User Identifier",
    itemId bigint unsigned NOT NULL DEFAULT 0 "Unique Identifier on what was bought or sold",
    itemOwnerId bigint unsigned NOT NULL DEFAULT 0 "User that was the last owner",
    regularAmount INT NOT NULL "How much was earned or sold",
    premiumAmount INT NOT NULL "How much was earned or sold for premium currency",
    createDate TIMESTAMP NOT NULL "When this event occurred",
    callStackId DEFAULT NULL "What part of the code base contains the code that changed currency the lookup table is generated on the fly from the json string and stored offline"
    PRIMARY KEY(userId, createDate, transactionId)
) Engine=InnoDB;



Now to modify Currency here is some PHP code:
  /**
    * Update user's balance
    * 
    * @param int $userId
    * @param int $regularDelta
    * @param int $premiumDelta
    * @return bool
    */
    public function CurrencyDB_updateBalance($userId, $regularDelta = 0, $premiumDelta = 0,  $itemId = 0, $itemOwnerId = 0) {

        $this->validate->id($userId);
        $this->validate->int($regularDelta, $premiumDelta);
        if( $regularDelta == 0 && $premiumDelta == 0 ) {
            return 0;
        }

        $this->shard($userId)->beginTransaction('updateBalance');
        try {
          
            if( $regularDelta < 0 || $premiumDelta < 0 ) { 
                $row = $this->shard($userId)->selectRow("SELECT * FROM Currency WHERE userId=? FOR UPDATE", array($userId));
                if( $row->regular < -$regularDelta || $row->premium < -$premiumDelta ) {
                    throw new CurrencyDBException("Insufficient funds", CurrencyDBException::INSUFFICIENT_FUNDS);
                }
            }
            
            // modify the balance
            $query = "INSERT INTO Currency(userId, regular, premium) VALUES(?, ?, ?) ON DUPLICATE KEY UPDATE regular=regular+VALUES(regular), premium=premium+VALUES(premium)";
            $this->shard($userId)->update($query, array($userId, $regularDelta, $premiumDelta), $this->cacheKey($userId));
            
            // get the GUID and record the details of the transaction
            $transactionId = $this->getTicket();
            $query = "INSERT INTO CurrencyHistory(transactionId, userId, itemId, itemOwnerId, regularAmount, premiumAmount, createDate, fromWhere) VALUES(?, ?, ?, ?, ?, ?, NOW(), ?)";
            $callStackId = $this->getCallStackId(); // get the backtrace as an ID
            $this->shard($userId)->update($query, array($transactionId, $userId, $itemId, $itemOwnerId, $regularDelta, $premiumDelta, $callStackId));
            
            $this->shard($userId)->commit('updateBalance');

        } catch (Exception $e) {

            $this->shard($userId)->rollback('updateBalance');
            throw $e;

        }
        return 1;
    }

 /**
    * Return an Id for the code stack
    * 
    * @return bigint
    */
   private function getCallStackId(){
        if ($GLOBALS['cfg']['disable_feature_backtrace_sql'] != 1){
        
            $trace      = debug_backtrace();
            $notdone    = 1;
            $numLoops   = 0;
            $maxLoops   = 10; // hope its not nested 10 levels deep
        
            while($numLoops++ < $maxLoops){
            
                $test = (strpos($trace[0]['class'], 'CurrencyDB') === FALSE ? 0 : 1);

                if (!$test){
                    break;
                }
                array_shift($trace);
            }
        

            $loc = array();
            
            
            $loc[] = substr(str_replace("/var/www/html","",$_SERVER['PHP_SELF']), 1);
            $loc[] = ($trace[0]['class'] ? $trace[0]['class'] : 'main');
            $loc[] = $trace[0]['function'] ? $trace[0]['function'].'()' : 'called_from_script';
            return $this->getCodeLookupEntityId(json_encode($loc));
                
        }
        
        return 0;
 }       

mysql> select * from CodeLookup;
+----------------------+------------------------------------------------------------------------+
| entityId             | json                                                                   |
+----------------------+------------------------------------------------------------------------+
| 17113479085265723743 | ["test\/dathan\/test_CheckBonus.php","CheckBonus","CheckBonus_give()"] |
+----------------------+------------------------------------------------------------------------+