Monday, February 06, 2012

SpyMemcached Transcoder with PHP PDO Objects using ZLIB

My technology stack serves more than 2 million daily active users. It's very basic: PHP talks to MySQL, Memcache, RabbitMQ, Gearman and Facebook. Now that we run more Java, specifically to support our SmartFox Server and other services, Java has to read data that PHP writes, so blurring the line between what data is set in PHP and what data is read in Java is a necessity.

Java's MySQL Connector/J makes reading MySQL data, IMHO, as simple as PHP's PDO. What is hard is reading values that PHP's Memcache library has stored in PHP's serialized format.

In PHP there are two main C-backed libraries. There is Memcache, the original PHP library, which I happen to use, and Memcached, the library I wanted to use but didn't deploy because it conflicted with the EC2 package system and caused issues (I fixed them, but too late to deploy). Memcache stores data in PHP's serialized format and, when compression is enabled, compresses it with ZLIB. Memcached can store data as PHP's serialized format, JSON, binary serialized (which is rather awesome) or JSON array notation, and supports a multitude of compression formats, none of which are pure ZLIB as far as I noticed.
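
To make the Memcache side concrete, here is a small stdlib-only sketch of how a Java reader might interpret the flags word that PHP's Memcache extension stores next to each value. It assumes the extension's bit values are 1 for "serialized" and 2 for "compressed" (the 2 matches the COMPRESSED constant in the transcoder below); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the decoding steps implied by a PHP Memcache flags word.
// Assumed bit values from the PHP Memcache extension: 1 = serialized, 2 = zlib-compressed.
public class PhpMemcacheFlags {
    static final int SERIALIZED = 1;
    static final int COMPRESSED = 2;

    public static List<String> describe(int flags) {
        List<String> steps = new ArrayList<>();
        if ((flags & COMPRESSED) != 0) {
            steps.add("inflate (zlib)");          // must happen first: the serialized text is inside
        }
        if ((flags & SERIALIZED) != 0) {
            steps.add("unserialize (PHP format)");
        }
        if (steps.isEmpty()) {
            steps.add("plain string");            // stored as-is
        }
        return steps;
    }

    public static void main(String[] args) {
        // An array stored with MEMCACHE_COMPRESSED would typically carry both bits
        System.out.println(describe(3)); // [inflate (zlib), unserialize (PHP format)]
        System.out.println(describe(0)); // [plain string]
    }
}
```

Order matters: the compressed bit has to be handled before the serialized bit, because the serialized text only exists after inflation.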

Here is the problem. Spymemcached is a Java library for talking to memcache, but it can't unserialize PHP's serialized format (nor read it natively and return a string), and it cannot decompress ZLIB, only GZIP. A great speed-up would be for Java to read the serialized data PHP has already set, sharing memcache resources between PHP and Java just as we already do for MySQL.
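
The ZLIB/GZIP mismatch is easy to demonstrate with nothing but java.util.zip. This sketch uses Deflater as a stand-in for the zlib compression PHP's Memcache applies (an assumption about equivalence of the wire format, not PHP's code), then shows that GZIPInputStream rejects those bytes while InflaterInputStream reads them. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

// Sketch: zlib-wrapped data is not GZIP, but InflaterInputStream decodes it.
public class ZlibVsGzip {

    // Deflater's default (nowrap = false) emits the zlib (RFC 1950) wrapper.
    public static byte[] zlibCompress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // GZIPInputStream checks the 0x1f 0x8b magic bytes; zlib data starts with 0x78,
    // so the constructor throws a ZipException (a subclass of IOException).
    public static boolean readableAsGzip(byte[] data) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            readAll(in);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static String inflate(byte[] data) throws IOException {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(data))) {
            return new String(readAll(in), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] zlib = zlibCompress("a:1:{s:3:\"foo\";i:42;}".getBytes(StandardCharsets.UTF_8));
        System.out.println(readableAsGzip(zlib)); // false
        System.out.println(inflate(zlib));        // a:1:{s:3:"foo";i:42;}
    }
}
```

This is why overriding decompress() with an InflaterInputStream, rather than relying on the GZIP-based default, is the key change in the transcoder.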

What needs to be done? Build your own Transcoder for Spymemcached. Fortunately, Spymemcached documents an interface for doing just that.

What is needed: implement the Spymemcached Transcoder interface, use org.lorecraft.phparser to unserialize the PHP data, and return the resulting Object.
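
For readers who haven't looked at PHP's serialized format, here is a deliberately small, stdlib-only sketch of the kind of parsing SerializedPhpParser performs. It is not the org.lorecraft.phparser implementation; the class name is hypothetical, it handles only N, b, i, d, s and a (arrays become a LinkedHashMap), it skips objects and references, and it treats string lengths as character counts, which is only correct for ASCII because PHP counts bytes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal unserializer for a subset of PHP's serialize() output.
public class MiniPhpUnserializer {
    private final String in;
    private int pos;

    private MiniPhpUnserializer(String in) {
        this.in = in;
    }

    public static Object unserialize(String s) {
        return new MiniPhpUnserializer(s).parse();
    }

    // Returns the text up to the delimiter and moves past the delimiter.
    private String readUntil(char delim) {
        int end = in.indexOf(delim, pos);
        if (end < 0) {
            throw new IllegalArgumentException("Missing '" + delim + "' at " + pos);
        }
        String token = in.substring(pos, end);
        pos = end + 1;
        return token;
    }

    private Object parse() {
        char type = in.charAt(pos);
        if (type == 'N') {                 // N;
            pos += 2;
            return null;
        }
        pos += 2;                          // skip "<type>:"
        switch (type) {
            case 'b':                      // b:1;
                return readUntil(';').equals("1");
            case 'i':                      // i:42;
                return Long.parseLong(readUntil(';'));
            case 'd':                      // d:0.5;
                return Double.parseDouble(readUntil(';'));
            case 's': {                    // s:<len>:"<chars>";
                int len = Integer.parseInt(readUntil(':'));
                String value = in.substring(pos + 1, pos + 1 + len);
                pos += len + 3;            // opening quote, chars, closing quote, semicolon
                return value;
            }
            case 'a': {                    // a:<count>:{key;value;...}
                int count = Integer.parseInt(readUntil(':'));
                pos++;                     // skip '{'
                Map<Object, Object> map = new LinkedHashMap<>();
                for (int i = 0; i < count; i++) {
                    Object key = parse();
                    map.put(key, parse());
                }
                pos++;                     // skip '}'
                return map;
            }
            default:
                throw new IllegalArgumentException("Unsupported type '" + type + "'");
        }
    }

    public static void main(String[] args) {
        System.out.println(unserialize("a:2:{s:3:\"foo\";i:42;i:0;b:1;}")); // {foo=42, 0=true}
    }
}
```

Like the transcoder's fallback, a real decoder should treat anything that fails to parse as a plain string, since PHP's Memcache stores plain strings without serializing them.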

Below is the code.

package com.schoolfeed.spymemcached;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.InflaterInputStream;

import net.spy.memcached.CachedData;
import net.spy.memcached.compat.CloseUtil;
import net.spy.memcached.transcoders.BaseSerializingTranscoder;
import net.spy.memcached.transcoders.Transcoder;
import org.lorecraft.phparser.SerializedPhpParser;


public class PHPSerializedTranscoder extends BaseSerializingTranscoder implements Transcoder<Object> {

 // Flag bit that PHP's Memcache extension sets on zlib-compressed values
 static final int COMPRESSED = 2;

 /**
  * Get a serializing transcoder with the default max data size.
  */
 public PHPSerializedTranscoder() {
  this(CachedData.MAX_SIZE);
 }

 /**
  * Get a serializing transcoder that specifies the max data size.
  */
 public PHPSerializedTranscoder(int max) {
  super(max);
 }
 
 /**
  * Decode the bytes read from memcache: decompress them if PHP flagged them
  * as compressed, then unserialize PHP's format. Values that don't parse as
  * PHP serialized data are returned as a plain String.
  * @param d the cached bytes and flags
  * @return the unserialized Object, the raw String, or null if decompression failed
  */
 @Override
 public Object decode(CachedData d) {
  byte[] data = d.getData();

  if ((d.getFlags() & COMPRESSED) != 0) {
   getLogger().debug("Looks like d is compressed");
   data = decompress(data);
  }

  // decodeString() returns null when data is null (e.g. decompression failed)
  String ds = decodeString(data);
  if (ds == null) {
   return null;
  }

  getLogger().debug("DECODED: [" + ds + "] about to SerializedPhpParser");

  Object rv;
  try {
   rv = new SerializedPhpParser(ds).parse();
   getLogger().debug("Parse was cool!!");
  } catch (Exception e) {
   // Not PHP-serialized (PHP stores plain strings as-is): return the raw string
   getLogger().debug("Not a PHP Object? : " + ds);
   rv = ds;
  }

  return rv;
 }
 
 /**
  * PHP's Memcache extension compresses data in ZLIB format, while the base
  * class expects GZIP, so override decompress() to inflate ZLIB data.
  *
  * @param in raw data from memcache
  * @return the inflated bytes, or null if in is null or inflation fails
  */
 @Override
 protected byte[] decompress(byte[] in) {
  if (in == null) {
   return null;
  }
  final int BUFFER = 2048;
  ByteArrayInputStream bis = new ByteArrayInputStream(in);
  ByteArrayOutputStream bos = new ByteArrayOutputStream();
  InflaterInputStream iis = null;
  try {
   iis = new InflaterInputStream(bis);
   byte[] buf = new byte[BUFFER];
   int r;
   // read() returns -1 at end of stream
   while ((r = iis.read(buf, 0, BUFFER)) != -1) {
    bos.write(buf, 0, r);
   }
  } catch (IOException e) {
   getLogger().warn("Failed to decompress data", e);
   return null;
  } finally {
   CloseUtil.close(iis);
   CloseUtil.close(bis);
   CloseUtil.close(bos);
  }
  // closing a ByteArrayOutputStream has no effect, so its bytes are still available
  return bos.toByteArray();
 }
 
 /**
  * encode -- not implemented yet. Fail loudly instead of handing
  * CachedData a null byte array.
  */
 @Override
 public CachedData encode(Object o) {
  throw new UnsupportedOperationException("PHPSerializedTranscoder is read-only");
 }
 
 /**
  * No need to decode asynchronously; decode on the calling thread.
  */
 public boolean asyncDecode(CachedData d) {
  return false;
 }

}


This is a stop-gap solution until we make the transition to Memcached with JSON encoding. Then I can use Jackson, a fast JSON encoder/decoder for Java, which gives us a portable message format between the two stacks and nearly any other language we might add to the system (like Python).
