Tuesday, April 23, 2019

Debugging awslabs' aws-service-operator with Go Delve in VS Code

Currently, I'm doing a lot of work in Kubernetes, especially around operators. One operator in particular I am working on is aws-service-operator from awslabs. We ran into a bug with the default behavior around the DynamoDB CR: the bundled CloudFormation template defaults RangeAttributeType to String, even though the operator supports strings, numbers, and bytes.

I know this is a bug; the highlighted text from the click-through clearly states it. But how do I verify the bug? My environment is a MacBook Pro with VS Code using all the Go tools extensions.

So let's set up the debug environment:

First, I need to set up the repo itself:

mkdir -p $GOPATH/src/github.com/awslabs
cd $GOPATH/src/github.com/awslabs
git clone git@github.com:awslabs/aws-service-operator.git

Now let's follow the development guidelines and build the environment outside of VS Code (getting dep and everything working).

$> code aws-service-operator // 'code' is the command-line launcher installed by VS Code.

Click the Debug menu, click Add Configuration, and paste the following.

    {
        // Use IntelliSense to learn about possible attributes.
        // Hover to view descriptions of existing attributes.
        // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
        "version": "0.2.0",
        "configurations": [
            {
                "name": "Launch",
                "type": "go",
                "request": "launch",
                "mode": "debug",
                "remotePath": "",
                "port": 2345,
                "host": "",
                "program": "${workspaceRoot}/cmd/aws-service-operator",
                "env": {},
                "args": ["server","--kubeconfig=/Users/dathan.pattishall/.kube/config", "--region=us-west-2", "--cluster-name=dathan-eks-cluster", "--resources=s3bucket,dynamodb,sqs", "--bucket=wek8s-dathan-aws-service-operator", "--default-namespace=system-addons"],
                "showLog": true
            }
        ]
    }

Click the Debug menu and click Start Debugging. This assumes you're using SAML for AWS auth and that your auth is admin with at least the IAM EKSWorkerNodeRole. If you are using AWS-Admin like I am, you are good.

Now let's start debugging. Put a breakpoint at line 101 of pkg/helpers/helpers.go. Step into

resource, err := clientSet.CloudFormationTemplates(cNamespace).Get(cName, metav1.GetOptions{})

You'll see that the application makes a call to itself to try to get the CloudFormation templates you installed. If you didn't install a CloudFormationTemplate called dynamodb, the default will be used:


This is where the bug is. The CloudFormation YAML has a bug where it does not !Ref the Hash or Range attribute types, and the workaround is to install a CloudFormationTemplate CR.

apiVersion: service-operator.aws/v1alpha1
kind: CloudFormationTemplate
metadata:
  name: dynamodb
url: "https://s3-us-west-2.amazonaws.com/a-temp-public-test/dynamodb.yaml"

output.url contains the CloudFormationTemplate, with a data field that defines the CloudFormation template. I can only surmise that, to keep code paths common, they make extra API calls for reusability: even though the aws-service-operator already has the CloudFormationTemplate, it fetches it remotely due to how the code is constructed, making redundant calls. You'll see this in the debugger: the operator makes an API call to itself, parses the YAML, then fetches the YAML from a remote endpoint.

So what we see here is that the operator pulls the template from a REST or HTTP endpoint even though it already has it defined in K8s itself.

The fix for the bug is as follows.

Before (the default template hardcodes the attribute types):

          - AttributeName: !Ref HashAttributeName
            AttributeType: "S"
          - AttributeName: !Ref RangeAttributeName
            AttributeType: "S"

After (referencing the user-supplied types):

          - AttributeName: !Ref HashAttributeName
            AttributeType: !Ref HashAttributeType
          - AttributeName: !Ref RangeAttributeName
            AttributeType: !Ref RangeAttributeType

In addition to this, awslabs uses N as a value. In YAML 1.1, an unquoted N is parsed as the boolean false (the spec treats y/Y/n/N/yes/no/on/off as booleans). Thus, in the YAML you pass to create a DynamoDB table, you need to quote it.
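A quick illustration of how a YAML 1.1 parser treats these scalars (the keys here are arbitrary, just for demonstration):

```yaml
unquoted: N     # parsed as the boolean false
quoted: "N"     # parsed as the string "N"
also: Yes       # parsed as the boolean true
```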

So in the end to create my table I need the following yaml to create the dynamodb table which I use to test the operator.

apiVersion: service-operator.aws/v1alpha1
kind: DynamoDB
metadata:
  name: sample-tablename
spec:
  hashAttribute:
    name: AuthorizationCode
    type: "S"
  rangeAttribute:
    name: CreatedAt
    type: "N"
  readCapacityUnits: 10
  writeCapacityUnits: 10

Notice that the S is quoted along with the "N"; otherwise, N equates to false.

In conclusion: Delve is awesome, the operator has a bug, and I was able to figure it out with this debugging method and produce this case: https://github.com/awslabs/aws-service-operator/issues/181

Tuesday, February 12, 2019

Aurora MySQL differences

I thought working with Aurora MySQL would be a breeze, but its subtle differences make me scratch my head. Thus I need to find out more about this and write a post :)

What is Aurora?

It's a MySQL wire-protocol-compatible storage management system that sits on top of MySQL and modifies some InnoDB internals. You can read more about the architecture here. I think of it as a Proxy Storage Engine System.

The differences start with just starting the server. Aurora MySQL has huge page support turned on by default, since AWS launches the Aurora MySQL server with their custom flag for InnoDB large page support:


This is not an open-source setting documented by the official MySQL build. In fact, there is not much information on this setting at all. I can only assume RDS instances are configured with huge page support as detailed here, and this custom setting for Aurora turns large page support on for mysqld.

So, what else is different between Aurora and InnoDB? From Amazon's docs:

The following MySQL parameters do not apply to Aurora MySQL:
  • innodb_adaptive_flushing
  • innodb_adaptive_flushing_lwm
  • innodb_checksum_algorithm
  • innodb_doublewrite
  • innodb_flush_method
  • innodb_flush_neighbors
  • innodb_io_capacity
  • innodb_io_capacity_max
  • innodb_log_buffer_size
  • innodb_log_file_size
  • innodb_log_files_in_group
  • innodb_max_dirty_pages_pct
  • innodb_use_native_aio
  • innodb_write_io_threads
  • thread_cache_size
The following MySQL status variables do not apply to Aurora MySQL:
  • innodb_buffer_pool_bytes_dirty
  • innodb_buffer_pool_pages_dirty
  • innodb_buffer_pool_pages_flushed
These lists are not exhaustive.

In summary, Aurora uses MySQL, but it's also a layer on top of MySQL. In essence, it's just another storage engine that forks InnoDB and provides management primitives built into the DBMS.

In the next few weeks, I'll describe how and why we launch Aurora instances, as well as capture more differences that have not made it into this list.

Friday, July 20, 2018

Hackathon process per week Sprints Idea

I like hackathons. Hackathons provide the freedom to build outside the process: the forced speed to deliver something to demo, and the fun self-deprecating "ooh this is really ugly/bad TODO don't do this" comments in the source/commit logs that tell a great story. Also, a great side effect: people are really interested in refactoring and fixing the code, especially if the demo went well.

So, I started thinking: what if we took this naturally formed fun process and defined a weekly sprint, with a daily standup reporting on progress toward the product goal, using a hackathon method?

Day 1 and 2

"How much can you get done in two days for the demo"

  • This portion is no more than an hour of planning. You talk to your team and divide up tasks for the hack you want to demo in two days. For instance, Johnny says "I'll write the service," Amanda says "I'll provide the data; it will be in MySQL," and Sammy says "I'll write the front end to demo. Johnny, let's agree on what you'll send me; for now, I will simulate some faux data."
  • Then each person builds their part.
  • During the process, Johnny builds the interface from an unauthenticated HTTP GET request with a JSON response to define what his service will return. Amanda finishes testing some queries for functionality and checks in her part: how to get the data, massage it, and which tables are which, NOT performance.
  • Johnny sends a sample interface to Sammy so some dynamic data can be injected into the mockup when Sammy requests data. They agree on a REST API using GET with a JSON response.
  • There are PRs when sharing an addition to the same place; otherwise, frequent merges.
  • When fixing something that made it into master, fix forward, so check into master :P
  • Each check-in should be filled with a series of TODO, FIXME, or "TODO don't do this" statements for speed, until that's no longer needed once you have a refined process.
  • Demo

What does each individual developer get?
Each developer produced something quickly to verify the viability of the idea, plus a vested interest in fixing the hacks and beautifying the code, reusing reusable parts, etc.

What does the team get?
The team feels that it got something out pretty quickly, has talking points about what to fix next, and can envision which systems could be used in other parts of the code. Finally, there's the chance to learn something new in the knowledge transfer, or the ability to fix an approach before going too far down the rabbit hole.

Day 3 

The next day is for mapping out what each developer wants to refactor, has to change, and gets to delete, with knowledge transfer of the good, the bad, and the embarrassing, plus an idea of the direction each person took. It is fun.
  • This is looking over the queries to make sure they make sense.
  • Are the correct indexes there? 
  • Are we really answering the correct questions efficiently if not how can we? 
  • What hacks do we need to undo to provide what we delivered?
  • How do I test this thing? I need to make sure before I refactor I have reproducible tests. 

Day 4 

Document, test, refactor: agree more as a team and focus on a code structure that enables adding the next round of features, while setting standards for the direction going forward (or revisiting them if need be).

Day 5

Do more of the same or get a beer with the team.

This process makes me feel that I am building something fast. The reason for the speed is to validate the idea or approach. Time is built into the process for testing, refactoring, and documenting, and the refactoring takes into account how to add new things faster: 50% building, 50% testing, documenting, refactoring, and making things better, producing a four-day work week with daily standups.

What about a really big project and delivering constantly?

  • Whiteboard what is needed to deliver such as what the product is, what does it solve, what are the features. 
  • Answer what is alpha
  • Answer what is beta.
  • Divide and conquer the vision for each "hackathon period"
  • Adjust projection of delivery based on the previous hackathon progress
  • Keep working and revisit each hackathon period to verify the correct thing is built correctly.
  • Profit from a fun fast paced delivery of code that treats features and delivery of great code the team all validates as equal partners.

Saturday, January 27, 2018

Spotify Top 200 in mySQL

I do a lot of data analysis lately, trying to find answers to my company's pressing questions through data. Let's look at the past year of 2017 and answer questions for people who like music.

  • artist: the artist name
  • track: the artist's track name
  • list_date: the chart date on which the artist showed up in the top 200
  • streams: the number of plays, following Spotify-specific rules
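These columns, plus the country and pos columns used in the queries below, suggest a table shaped roughly like this; the types, sizes, and indexes are my guesses, not the actual DDL:

```sql
CREATE TABLE spotify.top200 (
  country   CHAR(2)      NOT NULL,  -- e.g. 'us'
  list_date DATE         NOT NULL,  -- chart date
  pos       SMALLINT     NOT NULL,  -- chart position, 1..200
  artist    VARCHAR(255) NOT NULL,
  track     VARCHAR(255) NOT NULL,
  streams   INT UNSIGNED NOT NULL,  -- plays per Spotify's counting rules
  PRIMARY KEY (country, list_date, pos),
  KEY idx_artist (artist),
  KEY idx_track (track)
) ENGINE=InnoDB;
```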

Let's look at the data set

select count(*) from spotify.top200 WHERE country='us' and list_date >= '2017-01-01' and list_date < '2018-01-01';
| count(*) |
|    74142 |
1 row in set (0.04 sec)

How many artists made it in the top200 for the United States?

mysql> select count(DISTINCT(artist)) from spotify.top200 WHERE country='us' and list_date >= '2017-01-01' and list_date < '2018-01-01';
| count(DISTINCT(artist)) |
|                     527 |
1 row in set (0.09 sec)

Wow, it's really hard to be a musician: only 527 broke the top 200.

How many tracks in 2017 broke the top200?

 select count(DISTINCT(track)) from spotify.top200 WHERE country='us' and list_date >= '2017-01-01' and list_date < '2018-01-01';
| count(DISTINCT(track)) |
|                   1682 |

For the entire year, 1,682 songs largely defined the United States' listening habits.

Who showed up the most in the top200 for 2017?

mysql> select artist,count(*) AS CNT from spotify.top200 WHERE country='us' and list_date >= '2017-01-01' and list_date < '2018-01-01' group by 1 order by 2 DESC LIMIT 10;
| artist           | CNT  |
| Drake            | 3204 |
| Lil Uzi Vert     | 1891 |
| Kendrick Lamar   | 1874 |
| Post Malone      | 1776 |
| Ed Sheeran       | 1581 |
| The Weeknd       | 1566 |
| Migos            | 1550 |
| Future           | 1536 |
| The Chainsmokers | 1503 |
| Kodak Black      | 1318 |
10 rows in set (0.16 sec)

Drake killed it, but Lil Uzi Vert is the star of the year, IMHO. Drake has a pedigree while Lil Uzi just started running.

Also, from these artists I can tell hip hop dominated the US charts; let's verify this assumption.

mysql> select artist,SUM(streams) AS CNT from spotify.top200 WHERE country='us' and list_date >= '2017-01-01' and list_date < '2018-01-01' group by 1 order by 2 DESC LIMIT 10;
| artist           | CNT        |
| Drake            | 1253877919 |
| Kendrick Lamar   | 1161624639 |
| Post Malone      |  954546910 |
| Lil Uzi Vert     |  818889040 |
| Ed Sheeran       |  714523363 |
| Migos            |  682008192 |
| Future           |  574005011 |
| The Chainsmokers |  557708920 |
| 21 Savage        |  472043174 |
| Khalid           |  463878924 |
10 rows in set (0.48 sec)

Yup, hip hop dominated the top 10 by streams.

What about tracks? What are the top 10 tracks by streams?

 select track,SUM(streams) AS CNT from spotify.top200 WHERE country='us' and list_date >= '2017-01-01' and list_date < '2018-01-01' group by 1 order by 2 DESC LIMIT 10;
| track             | CNT       |
| HUMBLE.           | 340136186 |
| XO TOUR Llif3     | 314758565 |
| Congratulations   | 283551832 |
| Shape of You      | 280898054 |
| Unforgettable     | 261753940 |
| Mask Off          | 242524530 |
| Despacito - Remix | 241370570 |
| rockstar          | 225517132 |
| Location          | 224879215 |
| 1-800-273-8255    | 219689749 |
10 rows in set (0.43 sec)

Which tracks and artists had the most time in the top200?

 select artist,track,count(*) AS CNT from spotify.top200 WHERE country='us' and list_date >= '2017-01-01' and list_date < '2018-01-01' group by 2 order by 3 DESC LIMIT 10;
| artist           | track                               | CNT |
| D.R.A.M.         | Broccoli (feat. Lil Yachty)         | 485 |
| French Montana   | Unforgettable                       | 417 |
| PnB Rock         | Selfish                             | 394 |
| Travis Scott     | goosebumps                          | 365 |
| Post Malone      | Go Flex                             | 365 |
| Childish Gambino | Redbone                             | 365 |
| Post Malone      | Congratulations                     | 365 |
| Post Malone      | White Iverson                       | 365 |
| Migos            | Bad and Boujee (feat. Lil Uzi Vert) | 364 |
| Bruno Mars       | That's What I Like                  | 364 |
10 rows in set (0.20 sec)

Also from this data I can tell that Post Malone had a fantastic year!

So, more questions can be answered, like who held the number 1 position on the top200 the most?

select artist,track,count(*) AS CNT from spotify.top200 WHERE country='us' and pos=1 and list_date >= '2017-01-01' and list_date < '2018-01-01' group by 2 order by 3 DESC LIMIT 10;
| artist         | track                               | CNT |
| Post Malone    | rockstar                            | 105 |
| Kendrick Lamar | HUMBLE.                             |  67 |
| Ed Sheeran     | Shape of You                        |  48 |
| Luis Fonsi     | Despacito - Remix                   |  47 |
| Migos          | Bad and Boujee (feat. Lil Uzi Vert) |  29 |
| 21 Savage      | Bank Account                        |  20 |
| Drake          | Passionfruit                        |  12 |
| Logic          | 1-800-273-8255                      |  10 |
| Taylor Swift   | Look What You Made Me Do            |  10 |
| French Montana | Unforgettable                       |   7 |
10 rows in set (0.26 sec)

Wow, we can see here that Post Malone is the star!

In summary, pulling in public data sources and running simple queries can give clearer insight into data and answer some pressing questions one may have.

With the schema above what questions would you answer?

Friday, January 19, 2018

Deploying Go Applications in Docker Containers using a Scratch Docker File

Programming in Go is fantastic. I find it fun, expressive, and simple for building concurrent programs. But deploying a Go app from laptop to production is as hard now as it was back when I was building monolithic services. A great way to deploy nowadays is microservices in containers. Containers keep the environment in sync between a laptop and an AWS Linux instance, since in essence the environment is deployed, not just the code or binary.

Containerization of the environment is not ideal, though. Sometimes you ship containers of 1 GB in size or more. Deploying that across the LAN is OK; over the WAN, it is debatable. So, to deal with this problem, I use scratch Dockerfiles when deploying applications.

Starting from scratch Dockerfiles, I know there is no real environment overhead, since the environment is as basic as it can be. Additionally, I do not have to worry about the Go environment in my container, because we are not going to ship "golang and all its packages"; we are going to ship the binary itself. This is best described with an example.

The Dockerfile is like a Makefile, but for your environment: each line describes what goes into the image. Before executing the Dockerfile, we need to set up the environment.
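A minimal scratch Dockerfile for this workflow might look like the following; the binary name goprogram is an assumption to match the commands below:

```dockerfile
# Start from an empty image: no shell, no libc, nothing but what we copy in.
FROM scratch

# If the app makes outbound TLS calls, CA certificates must be copied in,
# e.g. from a builder stage or from the host:
# COPY ca-certificates.crt /etc/ssl/certs/

# Ship only the statically linked Linux binary built on the laptop.
COPY goprogram /goprogram

EXPOSE 8282
ENTRYPOINT ["/goprogram"]
```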

CGO_ENABLED=0 GOOS=linux go build .

This line builds the Go program as a statically linked Linux binary. (CGO_ENABLED=0 matters for scratch images: with cgo enabled, the binary may dynamically link against libc, which does not exist in a scratch container.)

docker build -t dathanvp/goprogram:latest .

This line says: execute the Dockerfile in the current directory and tag the image as dathanvp/goprogram:latest.

docker run -p 8282:8282 -v /Users/dathan/gocode/src/github.com/dathanvp/goprogram/logs:/mnt:rw dathanvp/goprogram:latest

Now, this is the magic. Docker opens port 8282 and maps it to port 8282 in the container. A volume is mounted from my laptop into the container's /mnt directory with read-write privileges (when executing my container in production, only this line changes). This volume keeps the logs persistent: containers reset state on restart, losing anything generated, hence the volume. Finally, docker run runs my image dathanvp/goprogram.

I deploy my containers to AWS by executing

docker push dathanvp/goprogram

This pushes my Go program from my laptop to cloud.docker.com, where my AWS instances can then pull it, enabling me to run my programs in production without having to set up the environment on AWS (other than Docker, of course).

Finally, why do it this way? I want my program to run on my laptop and on my AWS Ubuntu servers without having to keep Go development environments in sync. Additionally, I want my containers to be really small so I don't have to ship hundreds of megs around to start the application, which itself is about 13 MB; uploading over Comcast sucks. So, in conclusion, this is the best way I've found so far :)

Please let me know how you ship go applications and why.

Monday, December 11, 2017

Designing an RDBMS SQL Table

Building tables initially should not require a lot of thought. What? Yes: I'm suggesting that when designing a table, you think of the table as a spreadsheet.

For instance, let's create a table that combines all social scores of a users' media in a single table. We will call this table platform_resources.

What do we need to record the social score total of a single person?
  • Who is this person? How do I know this persona?
  • What is the platform? Which social platform does this reference refer to?
  • What is the platform identifier? The social platform's own identifier.
  • What is a common social score for each user? Views, likes, comments.

So the table above answers my questions. For each piece of media that an internal_name owns, I am able to collect a summary of basic stats. By no means is this optimized. The row size is roughly

21 + 51 + 51 + 4 + 4 + 4 + 4 + 4 + 256 + 4 = 403 bytes, not taking into account the primary key, which is very large and incurs a small byte overhead due to exceeding an internal limit.

We are not optimizing yet, we are just answering questions.

The primary key was picked to be (platform_id, platform, internal_name). Following the leftmost-prefix rule for composite indexes, we get roughly three indexes in one: the full primary key, (platform_id, platform), and finally (platform_id). It was picked this way because, for a platform, the platform_id is unique, and the person who owns this platform_id should be represented. Additionally, since we are using InnoDB, the table is sorted by the primary key.
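As a sketch, the table being described might look something like this; the column names beyond those discussed, and the exact types, are my reconstruction from the row-size arithmetic above, not the actual schema:

```sql
CREATE TABLE platform_resources (
  internal_name VARCHAR(50)  NOT NULL,  -- who owns the media
  platform      VARCHAR(20)  NOT NULL,  -- e.g. 'instagram', 'youtube'
  platform_id   VARCHAR(50)  NOT NULL,  -- the platform's identifier
  views         INT UNSIGNED NOT NULL DEFAULT 0,
  likes         INT UNSIGNED NOT NULL DEFAULT 0,
  comments      INT UNSIGNED NOT NULL DEFAULT 0,
  date_taken    DATE         NULL,
  description   VARCHAR(255) NULL,
  PRIMARY KEY (platform_id, platform, internal_name)
) ENGINE=InnoDB;
```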

No optimizations; just a basic table gets the job done. Now, how would you optimize this table?
First, you should ask: what are you optimizing the table for? Disk size? Memory fit? Because it's ugly and it bothers you?

Let's estimate how this table will grow. This table is an MxN problem, where each internal_name will have N resources per platform. The bounds of the growth are around 1,000 items per year per platform, and M is less than 20K, so it's really not worth optimizing just for the sake of it. So don't.

If I had to optimize because the MxN problem turned into a huge overhead:
First, I would reduce the row size by making lookup tables for internal_name, platform, and platform_id, which keeps the primary key smaller, probably within 64 bits.

Next, distribute the table, either by date_taken range, since queries will be more interested in the latest data, or by internal_name; but that's another post.

Finally, sometimes you just need a table that you can query, like "give me the total sum of views for all Instagram videos by a creator." The point of this post is to think about optimizations when you need to think about optimizations, and not beforehand. If your needs change, change the schema to focus on the optimization you are going for. :)