All Hands on Deck - Kubernetes Hackday (Part 1)


Intro

On the 5th of November 2018, the IT team at Kogan.com started another hackday. This time, we set our goal on learning Kubernetes. We wanted to answer the question: how exactly can we leverage container orchestration to make our deployment process faster and more efficient?

In order to achieve our goal, we set out to deploy one of our major apps, which controls customer subscription preferences, with Kubernetes on two different cloud providers: Google Cloud Platform (GCP, using GKE) and Amazon Web Services (AWS, using EKS). By doing this, we hoped to understand the pros and cons of each platform, while learning the intricacies of Kubernetes deployment at the same time.

This will be a two part blog series. The first part is a short overview of our motivation, goals, and how the day actually unfolded. The second part will be more technical, focusing on the approaches of the two teams and discussing the pros and cons of each platform.

Motivation

In order to understand our motivation, it is useful to have a general idea of our deployment process here at Kogan. Every day, at a set time, we do a daily deploy of our major apps. The deployment pipeline consists of a fairly expensive build step, storing the artefact, and then pushing the artefact to provisioned servers. Those servers are mostly provisioned with a combination of autoscaled CloudFormation and Salt.

While this existing process is sufficient in most cases, we are still facing some outstanding issues that we would like to improve. First, autoscaling takes too long, so we aren't able to react to changes in traffic patterns as quickly as we'd like. Second, the deployment process can take up to 15 minutes.

One solution for this is to introduce Docker into production, as it will help us with deployment speed and standardise our process further. The number one tool for container orchestration at the moment is Kubernetes, and that's where it comes into play.

The word Kubernetes had been floating around the office for a while, but few of the engineers could say they understood it, let alone experimented with it. Infrastructure can be a bit of a black box for some people, so we took this opportunity to learn together and get everyone somewhat across how deployment works at Kogan.

Goal & Planning

Since we wanted to learn the pros and cons of both the GCP and AWS platforms, we set up two teams of six to seven developers. Each team was responsible for deploying the app to their assigned platform in a manner suitable for a staging or UAT environment.

With this in mind, we wrote down a series of infrastructure-related tasks covering what constitutes "deploying" the app. We separated the tasks into major components, like setting up a cluster, deploying the web app, and deploying dependent services.

We began with an outline for the day, and then settled in to install all the tools we would require. After completing a short tutorial with minikube we split up into our respective teams to put together a proposed architecture diagram which we'd present to each other before any real hacking began.

In the middle of architecture discussion

Hackday

The initial challenge was understanding the fundamental concepts and terminology surrounding Kubernetes. What is a Pod? What is a Service? What is the difference between them? Where does a Deployment come in?
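
For anyone who, like us, was new to these terms, a rough sketch with kubectl helped it click. The app name and image below are illustrative placeholders, not our actual configuration:

# A Deployment declares the desired state: which image to run and how many replica Pods to keep alive.
kubectl create deployment subscriptions-web --image=registry.example.com/subscriptions:latest
kubectl scale deployment subscriptions-web --replicas=3

# Pods are the individual running instances the Deployment manages on our behalf.
kubectl get pods -l app=subscriptions-web

# A Service gives those Pods one stable address inside the cluster, however they come and go.
kubectl expose deployment subscriptions-web --port=80 --target-port=8000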

As the day unfolded and each team had their infrastructure running, we met some new challenges. The first was making the application Kubernetes "ready", which involved new configuration, variables, and strategies for managing static content. Then there were the difficulties of learning new tools, like Helm, for composing a Kubernetes deployment.

We found that examples and documentation could be lacking or outdated. There was a big jump in translating our existing docker-compose configuration to something that Kubernetes would understand, e.g. passing secrets to apps automatically and getting the application to communicate with the database.
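
As a concrete example of the kind of translation involved, here is roughly how values that used to sit in a docker-compose environment block become a Kubernetes Secret wired into a Deployment. The names and values below are placeholders rather than our real settings:

# Create a Secret holding the sensitive settings (placeholder values shown).
kubectl create secret generic subscriptions-secrets \
    --from-literal=DATABASE_URL='postgres://user:pass@db-host:5432/app' \
    --from-literal=DJANGO_SECRET_KEY='not-a-real-key'

# Inject every key of that Secret into the app's containers as environment variables.
kubectl set env deployment/subscriptions-web --from=secret/subscriptions-secrets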

There were also some difficulties in getting a cluster running. The GCP team, as expected, had an easier time, since GKE provides a managed Kubernetes control plane out of the box. The AWS team spent quite some time configuring the VPC and other AWS services to accommodate a Kubernetes cluster.
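
To give a sense of the gap: spinning up a managed GKE cluster is essentially one command, whereas EKS needed the surrounding VPC and other supporting services assembled first (tools like eksctl bundle much of that wiring, though we aren't claiming that's exactly what we ran on the day). The cluster names, zone and sizes below are illustrative:

# GKE: the control plane is managed, so a small cluster comes up with a single command.
gcloud container clusters create hackday-cluster --zone australia-southeast1-a --num-nodes 3

# EKS: eksctl hides a lot of the VPC and IAM wiring, but there are many more moving parts underneath.
eksctl create cluster --name hackday-cluster --nodes 3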

Due to these challenges, by the end of the day the GCP team had managed to set up their cluster, but had difficulties composing the configuration necessary to get the application running. The AWS team had successfully deployed the apps locally with Minikube, but had a harder time translating that configuration to the AWS cluster.

Conclusions

Even though we did not manage to deploy our app end-to-end with Kubernetes, we gained a significant amount of experience as a team. Everyone was involved in the nitty-gritty details of configuring Kubernetes, and while the depth of understanding varied, every team member came away across the basic Kubernetes concepts and tooling.

The hackday was a giant leap forward towards our long-term goal of running our production apps on Kubernetes. We also acquired valuable information on the pros and cons of each cloud platform. Overall, we're confident that the goals we originally set out had been achieved by the end of the day.

Stay tuned for part 2, where we will discuss the technical advantages and disadvantages of both the GCP and AWS platforms, and what we learned as a whole from a DevOps standpoint.

Team GCP in the middle of demo

Team AWS in the middle of discussion

Doughnuts accompanying this hackday

June Hackday - Lifx Smart Tiles (Part 3)

Last June we had a Hackday at Kogan.com! This Hackday focused on displaying information and providing useful alerts using hardware and software. Teams were asked to express something they find interesting through one of the available mediums.

In this third and final part of the series Jake will talk a bit about Team LifxnChill.

Team LifxnChill

Our team was given a set of 5 smart Lifx Tiles. These nifty light panels are 8x8 grids of diffused LEDs. Five tiles can be chained together, arranged and animated. They're programmable using an API (docs here). Each tile cannot be programmed separately. These have great potential for expressing all kinds of stuff.


Brainstorming

Ideas proposed included the following:

  • Use GitHub webhooks to do something with the lights when a commit is pushed, a pull request is merged/closed, or a deviant force pushes.

  • Standup Glow - Pulse when standup kicks off

  • Stretch goal - animations


Plan of attack

1. Write experimental commands

2. Create endpoints that use the commands

42. Create lambda functions for other stuff

We never got around to writing up steps 3 to 41. Steps 1, 2, and 42 were enough to get started!

The API

Two APIs were available, LAN and HTTP:

HTTP - Send off basic requests, such as pulses, brightness and cycles. Authenticated with an OAuth token. This had a low barrier to entry, and it was really fun to see ideas come together. Team members could POST requests and see the result immediately.

curl -X POST "https://api.lifx.com/v1/lights/all/effects/breathe" \
     -H "Authorization: Bearer YOUR_APP_TOKEN" \
     -d 'period=2' \
     -d 'cycles=5' \
     -d 'color=green'

Keep me POSTed - Lifx Tile

LAN - Limited to, you guessed it, the Local Area Network. The lower-latency LAN API lets you address each individual light rather than a whole tile, which meant you could use animations. We initially tried this with an existing package called photons-core, but opted for HTTP for reasons we'll explain later.


Problems faced

The LAN API was looking promising, until we discovered the complexity involved in getting the tiles running. Remember we only had one day here, so the focus had to be on getting something out. Using a local network also made it difficult for a team member working remotely to participate. With these factors in mind we opted for the HTTP API.

Getting into it

While developing with the tiles, we discovered that API calls often weren't coming through. We suspected throttling, but the team's cumulative usage was nowhere near enough to trigger it. It turned out the Tiles had a bug:

WHEN all Lifx Tiles are off
AND a cURL request is sent
Expected Result:
All tiles animate according to the options sent
Actual Result:
The first (master) tile ignites, but its daisy-chained tiles do not

When all tiles are off, you can't power them all on with a single request (which was incredibly frustrating). Everything worked as a proof of concept, but this was a dealbreaker for day-to-day usage.

At this point I decided to post on Lifx's forums seeking an answer. Not long after, they released a firmware update and voilà! The Tiles became usable.

Outcomes

We now have a proof-of-concept standup reminder, an orange/red/green status light integrated with our Jenkins pipeline, and a glow each time a commit is made.
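
As a rough idea of how the Jenkins status light works, the pipeline just needs to hit the Lifx set-state endpoint with a colour matching the build result. The token and the build-status variable below are placeholders, and the step itself is a sketch rather than our exact pipeline config:

# Hypothetical post-build step: green on success, red on failure.
if [ "$BUILD_STATUS" = "SUCCESS" ]; then COLOUR=green; else COLOUR=red; fi

curl -X PUT "https://api.lifx.com/v1/lights/all/state" \
     -H "Authorization: Bearer YOUR_APP_TOKEN" \
     -d "power=on" \
     -d "color=$COLOUR"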

In the future we’d like to move these actions over the LAN API with endpoints that our pipeline can hit, allowing the use of animations.

June Hackday - Team KASX (Part 1)

Last June we had a Hackday at Kogan.com! Team KASX was given the task of creating an app using React Native for a mobile device that could be used to display information such as Kogan’s stock price.

Kogame (Koh-Gah-Mi) - A real time game in Django

For our March hackday this year we decided to build a multiplayer game using [Django Channels](https://channels.readthedocs.io/en/latest/). We've been keeping an eye on channels for some time, and thought that with the release of channels 2.0 it was the right time to dive in and get some practical experience. The organisers didn't want to do yet-another-chat-system implementation, so they decided to make things a bit more interesting and look at writing a real-time game.