Tuesday, May 23, 2017

Golang (Go) and BoltDB

I've been using Go for about three years now, and I am constantly impressed by how easy the language is to use. I started my career in C and Unix systems programming, moved on to Java and then PHP, and these days I am rather language agnostic. Of all the languages I know, Go is the most fun, and there is a strong community behind it.


BoltDB is yet another NoSQL key-value store, designed to be embedded, and I happened across it for a small use case. I use Go to crawl sites and parse the HTML DOM concurrently, gathering data for analysis from a variety of remote web sources. BoltDB keeps state as I move from my local MacBook to a remote server, and it is very easy to use. Basically, I needed a portable embedded database that is fast and resilient, without setting up MySQL and keeping the schema in sync between dev and production. None of this is user facing; it is just a set of Go packages that keep state so I know where to pick up after some sort of failure, like turning off my laptop or a random panic.


Let's look at BoltDB usage. Below is my struct. Everything is a string because I am not formatting or typing things yet.


type TableRow struct {
	Title    string
	Time     string
	Anchor   string
	Price    string
	Notified string // could make this a time.Time but let's be simple
}


Next, I create my.db if it doesn't exist. The check function panics if it is handed an error. The line defer db.Close() closes the database when the enclosing function returns. The addRecord function creates a bucket called parser_bucket (the name is a const) if this is the first run, then stores the value under the given key. It is something quick to make a point, and yes, there are more efficient ways to do this.

db, err := bolt.Open("my.db", 0644, &bolt.Options{Timeout: 10 * time.Second})
check(err)
defer db.Close()
check(addRecord(db, []byte("start"), "starting")) // create bucket when it doesn't exist


The function addRecord takes three arguments: db, the BoltDB handle (*bolt.DB); key, a byte slice; and a value, which can be anything, in our case the TableRow struct above. The function name starts with a lowercase letter, so it is unexported (not "public"). The value v is marshaled to JSON as a byte slice and stored in BoltDB once the bucket is known to exist. Finally, addRecord returns an error if the write transaction fails.

func addRecord(db *bolt.DB, key []byte, v interface{}) error {
	value, err := json.Marshal(v)
	check(err)
	return db.Update(func(tx *bolt.Tx) error {
		bkt, err := tx.CreateBucketIfNotExists([]byte(bucket))
		if err != nil {
			return err
		}

		fmt.Printf("Adding KEY %s\n", key)
		return bkt.Put(key, value)
	})
}
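
To make the marshaling step concrete, here is a small sketch of what ends up stored as the value and how it decodes again. It uses only encoding/json, with no BoltDB involved. The sample field values and the encodeRow and decodeRow helper names are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type TableRow struct {
	Title    string
	Time     string
	Anchor   string
	Price    string
	Notified string
}

// encodeRow turns any value into the JSON bytes that would be handed to Put.
func encodeRow(v interface{}) ([]byte, error) {
	return json.Marshal(v)
}

// decodeRow parses stored JSON bytes back into a TableRow.
func decodeRow(b []byte) (TableRow, error) {
	var row TableRow
	err := json.Unmarshal(b, &row)
	return row, err
}

func main() {
	in := TableRow{Title: "Example item", Price: "9.99"}
	b, err := encodeRow(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b)) // the JSON object stored under the key

	out, err := decodeRow(b)
	if err != nil {
		panic(err)
	}
	fmt.Println(out == in) // the round trip preserves every field
}
```

Note that the "start" record above holds the JSON string "starting" rather than an object, so decoding that particular value into a TableRow returns an error.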


To get a TableRow out of the database, a read transaction is performed in BoltDB. This function is capitalized, so it is exported from the package. GetRecord returns a pointer to a TableRow, or panics if an error occurred. If the key does not exist, it prints a message and returns an empty TableRow.

func GetRecord(db *bolt.DB, key string) *TableRow {
	row := TableRow{}
	err := db.View(func(tx *bolt.Tx) error {
		bkt := tx.Bucket([]byte(bucket))
		if bkt == nil {
			return fmt.Errorf("bucket %q not found", bucket)
		}

		val := bkt.Get([]byte(key))
		if len(val) == 0 {
			fmt.Printf("key %s does not exist\n", key)
			return nil
		}
		return json.Unmarshal(val, &row)
	})
	check(err)
	return &row
}
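
One consequence of the code above: when the key is missing, GetRecord still returns a pointer to an empty TableRow, so the caller cannot tell "not found" from "found but blank" without checking. A minimal sketch of such a check follows; isEmpty is a hypothetical helper, not part of BoltDB or of the code above.

```go
package main

import "fmt"

type TableRow struct {
	Title    string
	Time     string
	Anchor   string
	Price    string
	Notified string
}

// isEmpty reports whether row is nil or has every field at its zero value.
// Structs made only of strings are comparable, so == works directly.
func isEmpty(row *TableRow) bool {
	return row == nil || *row == (TableRow{})
}

func main() {
	missing := &TableRow{} // what GetRecord hands back for an unknown key
	found := &TableRow{Title: "Example item"}

	fmt.Println(isEmpty(missing))
	fmt.Println(isEmpty(found))
}
```

Returning an explicit error or a boolean from GetRecord would be the cleaner fix, but the check keeps the original signature intact.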





Calling this function returns a pointer to a TableRow. Go does have pointers, just without C-style pointer arithmetic, so I think of the result as a reference to the row.

That's it. That is all there really is to BoltDB: read transactions and write transactions that are concurrency-safe. As a sanity check, you can even run the Unix command strings on the database file to see whether the data was stored correctly; you should see JSON in the output (if that is your serializer).

In conclusion, BoltDB is fast, so far safe, and does exactly what I need: store state without depending on an external DB. Embedded databases are awesome and Go is awesome. Give it a try.






