mySQL DBA, Architecture, Dev, Scale, HA, Code

Tuesday, February 07, 2023

ERROR 3546 (HY000) at line 24: @@GLOBAL.GTID_PURGED cannot be changed: the added gtid set must not overlap with @@GLOBAL.GTID_EXECUTED

As a MySQL 8.0 user, you may have encountered the following error message when trying to dump data from one database server and add that data to another server:

"ERROR 3546 (HY000) at line 24: @@GLOBAL.GTID_PURGED cannot be changed: the added gtid set must not overlap with @@GLOBAL.GTID_EXECUTED."

This error occurs when the GTID set that the dump file tries to add to gtid_purged overlaps with the GTIDs already recorded in gtid_executed on the target server, probably from a previous import. A typical use case is importing staging into development more than once.

GTIDs (Global Transaction Identifiers) are unique identifiers that MySQL generates for each committed transaction when GTID mode is enabled. They allow you to track changes to your data, even across multiple servers. When you receive this error message, it means that there is a conflict between the source and target server GTID sets.

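The solution to this issue is to reset the master on the target server before importing the dump file. RESET MASTER deletes all the binary logs, starts a new one, and empties gtid_executed, so the GTID set in the dump no longer overlaps with anything and the import succeeds. Only do this on a target whose binary logs and replication state you can afford to lose, such as a development server.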

mysql -uroot -e "RESET MASTER;"
mysql -uroot db < dump.sql
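
To see why the server complains, here is a minimal Python sketch of the overlap check. It is not MySQL's implementation; the function names are made up for illustration, and it assumes the plain uuid:interval[:interval] format (no tagged GTIDs). In practice one value would come from SELECT @@GLOBAL.gtid_executed on the target and the other from the SET @@GLOBAL.GTID_PURGED statement in the dump file.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: [(start, end), ...]}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        ranges = result.setdefault(uuid.lower(), [])
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A single id such as '7' is the interval 7-7.
            ranges.append((int(start), int(end or start)))
    return result


def gtid_sets_overlap(a, b):
    """Return True if any transaction id appears in both GTID sets."""
    set_a, set_b = parse_gtid_set(a), parse_gtid_set(b)
    for uuid, ranges_a in set_a.items():
        for start_a, end_a in ranges_a:
            for start_b, end_b in set_b.get(uuid, []):
                if start_a <= end_b and start_b <= end_a:
                    return True
    return False


# Hypothetical values: the target already executed 1-20 from a previous import.
executed = "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-20"
dump_purged = "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-5:11-18"
print(gtid_sets_overlap(executed, dump_purged))  # True
```

After RESET MASTER the executed set is empty, so the same check returns False for any dump.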

It is nice to blog again. I am blogging here about mySQL, and at https://dathan.github.io/blog/ about random other things.

Tuesday, April 23, 2019

Debugging awslab's aws-service-operator with go delve on vscode

Currently, I'm doing a lot of work in Kubernetes, especially around operators. One operator in particular that I am working on is aws-service-operator from awslabs. We ran into a bug with the default behavior around the dynamodb CR. There is a bug in this cloudformation template: it defaults the RangeAttributeType to String, even though the operator supports string, number, and bytes.

I know this is a bug; the highlighted text from the click-through clearly shows it. But how do I verify the bug? My environment is a MacBook Pro with vscode using all the go tools extensions.

So let's set up the debug environment:

First I need to set up the repo itself:
mkdir -p $GOPATH/src/github.com/awslabs
cd $GOPATH/src/github.com/awslabs
git clone git@github.com:awslabs/aws-service-operator.git

Now let's follow the development guideline and build the environment outside of vscode (getting dep and everything working)

$> code aws-service-operator   # "code" is the vscode launcher installed for use at the command line

Click the menu Debug, click Add Configuration. Paste below.


{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Launch",
            "type": "go",
            "request": "launch",
            "mode": "debug",
            "remotePath": "",
            "port": 2345,
            "host": "127.0.0.1",
            "program": "${workspaceRoot}/cmd/aws-service-operator",
            "env": {},
            "args": ["server","--kubeconfig=/Users/dathan.pattishall/.kube/config", "--region=us-west-2", "--cluster-name=dathan-eks-cluster", "--resources=s3bucket,dynamodb,sqs", "--bucket=wek8s-dathan-aws-service-operator", "--default-namespace=system-addons"],
            "showLog": true
        }
    ]
}

Click the menu Debug and click Start Debugging. This assumes that you're using saml for aws auth, your auth is admin and has at least IAM EKSWorkerNodeRole. If you are using AWS-Admin like I am, you are good.

Now let's start debugging. Put a breakpoint at line 101 of pkg/helpers/helpers.go and step into:

resource, err := clientSet.CloudFormationTemplates(cNamespace).Get(cName, metav1.GetOptions{})

You'll see that the application makes a call to itself to try to get the cloudformation templates you installed. If you didn't install any cloudformation template called dynamodb the default will be used:

https://s3-us-west-2.amazonaws.com/cloudkit-templates/dynamodb.yaml

This is where the bug is. The cloudformation yaml has a bug where it does not ref the Hash or Range Attribute Types and the workaround is to install a cloudformation CR.

apiVersion: service-operator.aws/v1alpha1
kind: CloudFormationTemplate
metadata:
  name: dynamodb
output:
  url: "https://s3-us-west-2.amazonaws.com/a-temp-public-test/dynamodb.yaml"

output.url contains the CloudFormationTemplate with a data field that defines the cloudformation template. I can only surmise that, to keep common code paths, the authors accept extra API calls for reusability: even though the aws-service-operator already has the CloudFormationTemplate, it needs to fetch it remotely because of how the code is constructed, making redundant calls. You'll see this in the debugger. It makes an API call to itself, then parses the YAML, then fetches the YAML from a remote endpoint.

Now what we see here is that the operator needs to pull the CR from a REST endpoint or HTTP endpoint even though it already has it defined in K8s itself.

The fix to the bug is as follows.

From:

AttributeDefinitions:
  -
    AttributeName: !Ref HashAttributeName
    AttributeType: "S"
  -
    AttributeName: !Ref RangeAttributeName
    AttributeType: "S"

To:

AttributeDefinitions:
  -
    AttributeName: !Ref HashAttributeName
    AttributeType: !Ref HashAttributeType
  -
    AttributeName: !Ref RangeAttributeName
    AttributeType: !Ref RangeAttributeType

In addition to this, awslabs uses N as a value. An unquoted N means false in YAML, because the YAML 1.1 boolean type treats y/n, yes/no, and on/off as booleans alongside true/false. Thus for the yaml passed to create a dynamodb table you need to quote it.

So in the end to create my table I need the following yaml to create the dynamodb table which I use to test the operator.

kind: DynamoDB
metadata:
  name: sample-tablename
spec:
  hashAttribute:
    name: AuthorizationCode
    type: "S"
  rangeAttribute:
    name: CreatedAt
    type: "N"
  readCapacityUnits: 10
  writeCapacityUnits: 10

Notice that the "S" is quoted along with the "N"; otherwise, N equates to false.
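
The quoting rule can be illustrated with a small Python sketch. It is not a YAML parser and not the operator's code; resolve_plain_scalar is a made-up helper that only mimics how the YAML 1.1 boolean type resolves an unquoted scalar. Parsers differ in how much of this list they implement, so treat it as an approximation.

```python
import re

# Unquoted scalars that the YAML 1.1 "bool" type resolves to booleans.
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)
YAML11_TRUE = {"y", "yes", "true", "on"}


def resolve_plain_scalar(token):
    """Mimic YAML 1.1 boolean resolution for an unquoted scalar.

    Returns a bool for boolean-looking tokens and the original string
    otherwise (numbers and nulls are ignored in this sketch). Quoted
    scalars never reach this step; they always stay strings.
    """
    if YAML11_BOOL.fullmatch(token):
        return token.lower() in YAML11_TRUE
    return token


print(resolve_plain_scalar("N"))  # False
print(resolve_plain_scalar("S"))  # S
```

This also shows that S would have survived unquoted, while N would not; quoting both keeps the attribute types consistent.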

In conclusion, Delve is awesome, the operator has a bug, and I was able to figure it out with this debugging method and produce this case: https://github.com/awslabs/aws-service-operator/issues/181

Tuesday, February 12, 2019

Aurora mySQL differences

I thought working with Aurora MySQL would be a breeze, but its subtle differences make me scratch my head. Thus I need to find out more about this and write a post :)

What is Aurora?

It's a mySQL wire protocol compatible storage management system that sits on top of mySQL and modifies some innodb internals. You can read more about the architecture here. I think of it as a Proxy Storage Engine System.

The differences start from just starting the server. Aurora MySQL has Huge Page support turned on by default since AWS launches Aurora MySQL server with their custom flag for innodb large page support:

innodb_shared_buffer_pool_uses_huge_pages

This setting is not part of the open source MySQL build or its official documentation. In fact, there is not much information on this setting at all. I can only assume RDS instances are configured with Huge page support as detailed here and this custom setting for Aurora turns large page support on for mysqld.

So, what else is different between Aurora and Innodb? From Amazon's docs:

The following MySQL parameters do not apply to Aurora MySQL:

innodb_adaptive_flushing
innodb_adaptive_flushing_lwm
innodb_checksum_algorithm
innodb_doublewrite
innodb_flush_method
innodb_flush_neighbors
innodb_io_capacity
innodb_io_capacity_max
innodb_log_buffer_size
innodb_log_file_size
innodb_log_files_in_group
innodb_max_dirty_pages_pct
innodb_use_native_aio
innodb_write_io_threads
thread_cache_size

The following MySQL status variables do not apply to Aurora MySQL:

innodb_buffer_pool_bytes_dirty
innodb_buffer_pool_pages_dirty
innodb_buffer_pool_pages_flushed

Note: These lists are not exhaustive.

In summary, Aurora uses mySQL but it's also a layer on top of mySQL. In essence, it's just another storage engine which forks Innodb and provides management primitives built into the DBMS.

In the next weeks, I'll describe how and why we launch Aurora instances, and capture more differences that have not made it into this list.

Friday, July 20, 2018

Hackathon process per week Sprints Idea

I like hackathons. Hackathons provide the freedom to build outside the process. The forced speed to deliver something to demo, and the fun self-deprecation of "ooh this is really ugly/bad TODO don't do this." in the source/commit logs, tell a great story. There is also a great side effect: people are really interested in refactoring and fixing the code, especially if the demo went well.

So, I started thinking: what if we take this naturally formed fun process and define a weekly sprint, with a daily standup reporting on progress toward the product goal, using a hackathon method?

Day 1 and 2

"How much can you get done in two days for the demo"

This portion is no more than an hour of planning. You talk to your team and divide up tasks for the hack you want to demo in two days. For instance, Johnny says "I'll write the service" and Amanda says "I'll provide the data; it will be in MySQL". Sammy says "I'll write the front end to demo. Johnny, let's agree on what you'll send me; for now, I will simulate some faux data."
Then each person builds their part.
During the process, Johnny builds the interface, an unauthenticated HTTP GET request with a JSON response, to define what his service will return. Amanda finishes testing some queries for functionality, NOT performance, and checks in her part: how to get the data, how to massage it, and which tables are which.
Johnny sends a sample interface to Sammy so some dynamic data can be injected into the mockup when Sammy requests data. They agreed on a REST API using GET with a JSON response.
Use pull requests when several people change the same place; otherwise, merge frequently.
When fixing something that made it into master, fix forward, so check into master :P
Each check-in should be filled with a series of TODO, FIXME or "TODO don't do this" statements for speed, until a refined process makes them unnecessary.
Demo

What does each developer get?

Each developer produces something quick to verify the viability of the idea, and gains a vested interest in fixing the hacks and beautifying the code, reusing reusable parts, etc.

What does the team get?

The team feels that they got something out pretty quickly. The team has talking points about what to fix next and which systems could possibly be used in other parts of the code. Finally, there is the chance to learn something new in the knowledge transfer, or to fix an approach before going too far down the rabbit hole.

Day 3

The next day is for mapping out what each developer wants to refactor, has to change, and gets to delete, along with a knowledge transfer of the good, bad, and embarrassing things and the direction each person took. It is fun.

This is looking over the queries to make sure they make sense.
Are the correct indexes there?
Are we really answering the correct questions efficiently? If not, how can we?
What hacks do we need to undo to provide what we delivered?
How do I test this thing? I need to make sure before I refactor I have reproducible tests.

Day 4

Document, test, and refactor. Agree more as a team, and focus on a code structure that enables adding the next round of features, while setting standards for the direction going forward or revisiting them if need be.

Day 5

Do more of the same or get a beer with the team.

This process makes me feel that I am building something fast. The reason for the speed is to validate the idea or approach. Time is built into the process for testing, refactoring and documenting. The refactoring takes into account how to add new things faster. The split is 50% building and 50% testing, documenting, refactoring, and making things better. This produces a 4-day work week with daily standups.

What about a really big project and delivering constantly?

Whiteboard what is needed to deliver, such as what the product is, what it solves, and what the features are.
Answer what is alpha.
Answer what is beta.
Divide and conquer the vision for each "hackathon period".
Adjust the projection of delivery based on the previous hackathon's progress.
Keep working and revisit each hackathon period to verify the correct thing is built correctly.
Profit from a fun, fast-paced delivery of code that treats features and great code, validated by the whole team, as equal partners.

Saturday, January 27, 2018

Spotify Top 200 in mySQL

I do a lot of data analysis lately, and I try to answer my company's pressing questions through data. Let's look at the past year of 2017 and answer questions for people who like music.

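artist is the artist name
track is the artist's track name
list_date is the chart date on which the artist shows up in the top200
streams is the number of plays following Spotify-specific rules
country is the country of the chart (the queries below use 'us')
pos is the track's position on the chart for that date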

Let's look at the data set

select count(*) from spotify.top200 WHERE country='us' and list_date >= '2017-01-01' and list_date < '2018-01-01';
+----------+
| count(*) |
+----------+
|    74142 |
+----------+
1 row in set (0.04 sec)

How many artists made it in the top200 for the United States?

mysql> select count(DISTINCT(artist)) from spotify.top200 WHERE country='us' and list_date >= '2017-01-01' and list_date < '2018-01-01';
+-------------------------+
| count(DISTINCT(artist)) |
+-------------------------+
|                     527 |
+-------------------------+
1 row in set (0.09 sec)

Wow, it's really hard to be a musician. Only 527 broke the top200.

How many tracks in 2017 broke the top200?

select count(DISTINCT(track)) from spotify.top200 WHERE country='us' and list_date >= '2017-01-01' and list_date < '2018-01-01';

+------------------------+
| count(DISTINCT(track)) |
+------------------------+
|                   1682 |
+------------------------+

For the entire year, 1682 songs defined the United States' listening habits for the most part.

Who showed up the most in the top200 for 2017?

mysql> select artist,count(*) AS CNT from spotify.top200 WHERE country='us' and list_date >= '2017-01-01' and list_date < '2018-01-01' group by 1 order by 2 DESC LIMIT 10;

+------------------+------+
| artist           | CNT  |
+------------------+------+
| Drake            | 3204 |
| Lil Uzi Vert     | 1891 |
| Kendrick Lamar   | 1874 |
| Post Malone      | 1776 |
| Ed Sheeran       | 1581 |
| The Weeknd       | 1566 |
| Migos            | 1550 |
| Future           | 1536 |
| The Chainsmokers | 1503 |
| Kodak Black      | 1318 |
+------------------+------+
10 rows in set (0.16 sec)

Drake killed it, but Lil Uzi Vert is the star of the year, IMHO. Drake has a pedigree while Lil Uzi just started running.

Also, from these artists I can tell hip hop dominated the US charts. Let's verify this assumption by looking at total streams instead of chart appearances.

mysql> select artist,SUM(streams) AS CNT from spotify.top200 WHERE country='us' and list_date >= '2017-01-01' and list_date < '2018-01-01' group by 1 order by 2 DESC LIMIT 10;

+------------------+------------+
| artist           | CNT        |
+------------------+------------+
| Drake            | 1253877919 |
| Kendrick Lamar   | 1161624639 |
| Post Malone      |  954546910 |
| Lil Uzi Vert     |  818889040 |
| Ed Sheeran       |  714523363 |
| Migos            |  682008192 |
| Future           |  574005011 |
| The Chainsmokers |  557708920 |
| 21 Savage        |  472043174 |
| Khalid           |  463878924 |
+------------------+------------+
10 rows in set (0.48 sec)

Yup, hip hop dominated the top 10 by streams.

What about tracks? What are the top 10 tracks by streams?

select track,SUM(streams) AS CNT from spotify.top200 WHERE country='us' and list_date >= '2017-01-01' and list_date < '2018-01-01' group by 1 order by 2 DESC LIMIT 10;

+-------------------+-----------+
| track             | CNT       |
+-------------------+-----------+
| HUMBLE.           | 340136186 |
| XO TOUR Llif3     | 314758565 |
| Congratulations   | 283551832 |
| Shape of You      | 280898054 |
| Unforgettable     | 261753940 |
| Mask Off          | 242524530 |
| Despacito - Remix | 241370570 |
| rockstar          | 225517132 |
| Location          | 224879215 |
| 1-800-273-8255    | 219689749 |
+-------------------+-----------+
10 rows in set (0.43 sec)

Which tracks and artists had the most time in the top200?

select artist,track,count(*) AS CNT from spotify.top200 WHERE country='us' and list_date >= '2017-01-01' and list_date < '2018-01-01' group by 2 order by 3 DESC LIMIT 10;

+------------------+-------------------------------------+-----+
| artist           | track                               | CNT |
+------------------+-------------------------------------+-----+
| D.R.A.M.         | Broccoli (feat. Lil Yachty)         | 485 |
| French Montana   | Unforgettable                       | 417 |
| PnB Rock         | Selfish                             | 394 |
| Travis Scott     | goosebumps                          | 365 |
| Post Malone      | Go Flex                             | 365 |
| Childish Gambino | Redbone                             | 365 |
| Post Malone      | Congratulations                     | 365 |
| Post Malone      | White Iverson                       | 365 |
| Migos            | Bad and Boujee (feat. Lil Uzi Vert) | 364 |
| Bruno Mars       | That's What I Like                  | 364 |
+------------------+-------------------------------------+-----+
10 rows in set (0.20 sec)
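
One caveat about the query above: it selects artist and track but groups by track alone (group by 2). Two different songs that share a title are therefore counted as one, and MySQL either rejects the query under ONLY_FULL_GROUP_BY or picks an arbitrary artist for the group. Here is a small sketch of the difference using Python's sqlite3 module with made-up rows (not the real chart data); the table only mirrors the column names used in this post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE top200 (artist TEXT, track TEXT, list_date TEXT)")

# Made-up rows for illustration only: two artists share one track title.
rows = [
    ("Artist A", "Same Title", "2017-01-01"),
    ("Artist A", "Same Title", "2017-01-02"),
    ("Artist B", "Same Title", "2017-01-01"),
    ("Artist C", "Other Song", "2017-01-01"),
]
conn.executemany("INSERT INTO top200 VALUES (?, ?, ?)", rows)

# Grouping by track alone merges the two different songs.
by_track = conn.execute(
    "SELECT track, COUNT(*) AS cnt FROM top200 "
    "GROUP BY track ORDER BY cnt DESC, track"
).fetchall()

# Grouping by artist and track keeps them apart.
by_artist_track = conn.execute(
    "SELECT artist, track, COUNT(*) AS cnt FROM top200 "
    "GROUP BY artist, track ORDER BY cnt DESC, artist"
).fetchall()

print(by_track)         # [('Same Title', 3), ('Other Song', 1)]
print(by_artist_track)  # [('Artist A', 'Same Title', 2), ('Artist B', 'Same Title', 1), ('Artist C', 'Other Song', 1)]
```

In MySQL the equivalent change would be group by 1,2 (or group by artist, track) in the queries that select both columns.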

Also from this data I can tell that Post Malone had a fantastic year!

So, more questions can be answered, like who held the number 1 position on the top200 the most?

select artist,track,count(*) AS CNT from spotify.top200 WHERE country='us' and pos=1 and list_date >= '2017-01-01' and list_date < '2018-01-01' group by 2 order by 3 DESC LIMIT 10;

+----------------+-------------------------------------+-----+
| artist         | track                               | CNT |
+----------------+-------------------------------------+-----+
| Post Malone    | rockstar                            | 105 |
| Kendrick Lamar | HUMBLE.                             |  67 |
| Ed Sheeran     | Shape of You                        |  48 |
| Luis Fonsi     | Despacito - Remix                   |  47 |
| Migos          | Bad and Boujee (feat. Lil Uzi Vert) |  29 |
| 21 Savage      | Bank Account                        |  20 |
| Drake          | Passionfruit                        |  12 |
| Logic          | 1-800-273-8255                      |  10 |
| Taylor Swift   | Look What You Made Me Do            |  10 |
| French Montana | Unforgettable                       |   7 |
+----------------+-------------------------------------+-----+
10 rows in set (0.26 sec)

Wow, you can see here that Post Malone is the star!

In summary, getting public data sources and doing simple queries can give a clearer insight into data to answer some pressing questions one may have.

With the schema above, what questions would you answer?

Friday, January 19, 2018

Deploying Go Applications in Docker Containers using a Scratch Docker File

Programming in golang is fantastic. I find it fun, expressive and simple to build concurrent programs. Deploying a golang app from laptop to production is as hard now as it was back when I was building monolithic services. A great way to deploy nowadays is to deploy microservices in containers. Containers keep the environment between laptop and AWS Linux instance in sync, since in essence the environment is deployed and not just the code or binary.

Containerization of the environment is not ideal, though. Sometimes you can ship containers of 1GB in size or more. Deploying that across the LAN is ok; over the WAN it is debatable. So, to deal with this problem I work with scratch Dockerfiles when deploying applications.

Starting from scratch Dockerfiles, I know that there is no real environment overhead since the environment is the most basic it can be. Additionally, I do not have to worry about the golang environment in my container because we are not going to ship "golang and all its packages", we are going to ship the binary itself. This is best described as an example.

The Dockerfile is like a Makefile, but for your environment. Each line describes a step in building the image. Prior to executing the Dockerfile, we need to set up the environment by building the binary.

GOOS=linux go build .

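This line will build the go program as a Linux binary. Because this is a cross-compile from my laptop, cgo is disabled by default, so the binary is statically linked, which is what a scratch image needs since it ships no system libraries.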

docker build -t dathanvp/goprogram:latest .

This line says: execute the Dockerfile and tag the image as dathanvp/goprogram:latest.

docker run -p 8282:8282 -v /Users/dathan/gocode/src/github.com/dathanvp/goprogram/logs:/mnt:rw dathanvp/goprogram:latest

Now, this is the magic. Docker will open port 8282 and map it to port 8282 in the container. A volume is attached from my laptop to the container's /mnt directory with read and write privileges. (When executing my container in production only this line changes.) This volume keeps the logs persistent: containers reset state and lose anything generated inside them, which is the reason for my volume. Finally, docker run runs my image dathanvp/goprogram.

I deploy my containers to AWS by executing

docker push dathanvp/goprogram

This pushes my go program from my laptop to cloud.docker.com where my aws instances can then pull from, enabling running my programs in production without having to set up the environment on aws (other than docker of course).

Finally, why do it this way? I want my program to run on my laptop and on my AWS ubuntu servers without having to keep golang development environments in sync. Additionally, I want my containers to be really small so I don't have to ship hundreds of megs around to start the application, which itself is about 13MB. Uploading from comcast sucks. So, in conclusion, this is the best way I've found so far :)

Please let me know how you ship go applications and why.