June Hackday - Deployment Traffic Lights (Part 2)

Last June we had a Hackday at Kogan.com! This Hackday focused on displaying information and providing useful alerts using hardware and software. Teams were asked to express something they find interesting through one of the available mediums.

In the second part of this series, Alec talks about Deployment Traffic Lights.

Deployment Traffic Lights

Our team set out to build a system to monitor the status of our deploys. We were given a handful of WiFi-enabled Arduinos, a light stick, a relay board and some jumper leads. Our plan was to poll Jenkins, and then turn on the lights according to the following mapping:

  • New build starting: display amber light.

  • Build in critical deployment step: flash amber light.

  • Build finished successfully: display green light.

  • Build finished in error state: display red light.
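
The mapping above is small enough to express as a single function. The following is a minimal sketch only: the post does not show the team's code, so the names BuildState, Light and lightFor are hypothetical, and the four states are simply the four bullets above.

```cpp
// Hypothetical names throughout; this only models the list above.
enum class BuildState { Starting, CriticalStep, Succeeded, Failed };
enum class Light { Amber, FlashingAmber, Green, Red };

// Map a Jenkins build state to the light pattern it should produce.
Light lightFor(BuildState state) {
    switch (state) {
        case BuildState::Starting:     return Light::Amber;
        case BuildState::CriticalStep: return Light::FlashingAmber;
        case BuildState::Succeeded:    return Light::Green;
        case BuildState::Failed:       return Light::Red;
    }
    return Light::Red;  // not reached for valid input; default to the alarming colour
}
```

Keeping the mapping in one place like this means the Jenkins side and the relay side only need to agree on the two enums.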

Getting started

The first step for this project was to get our development environments under control. With half of us running the newer MacBooks with only USB-C ports, the biggest challenge was installing and configuring a serial interface to upload our programs. While we were tinkering away, our IoT expert, Michael, was wiring up the lights to the relay board.

Team division

Once we were up and running, we broke down the tasks into 3 groups:

  • Jenkins connectivity - ensuring a safe connection between the Arduino and production Jenkins

  • Light management - building a set of functions to change light colours

  • Integrating both sides - the glue between Jenkins and the lights

Problems

As we were programming, we ran into a number of problems. Firstly, how do we “acknowledge” a failed build? If a build failed or was cancelled before a critical step, we wanted to be able to flip the light back to green to reduce anxiety around the office. We decided that the best way forward was to have a novelty-sized button that would mark a build as “OK”. This also meant that our code now had to keep track of individual build numbers: the Jenkins polling always retrieved the most recent build, so we needed to know whether the build on display was the one that had already been acknowledged.
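
To illustrate why the build number matters, here is a rough sketch of the bookkeeping involved. It is not the team's code: BuildStatus, AckTracker and their members are hypothetical names, and it assumes each poll yields the latest build's number and whether it failed.

```cpp
// Hypothetical sketch: remember which build number was acknowledged.
struct BuildStatus {
    int number;   // Jenkins build number of the most recent build
    bool failed;  // true if that build failed or was cancelled
};

class AckTracker {
public:
    // Called when the button is pressed while `current` is on display.
    void acknowledge(const BuildStatus& current) { ackedBuild_ = current.number; }

    // Show red only for a failed build that nobody has acknowledged yet.
    bool shouldShowRed(const BuildStatus& current) const {
        return current.failed && current.number != ackedBuild_;
    }

private:
    int ackedBuild_ = -1;  // -1 means no build has been acknowledged
};
```

Because the acknowledgement is tied to one build number, a later build that also fails turns the light red again instead of silently inheriting the earlier “OK”.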

Next, concurrency. An Arduino is, for all intents and purposes, single threaded. Now that we had some user input (a button), we needed to make sure that the button push event would be handled in a timely manner. In multithreaded environments, there’d normally be a thread for UI events, and then a pool of background workers to perform non-interactive, long-running tasks. In this case, the UI would be the button, and the background tasks would be HTTP requests and light changing.

We came up with 3 options to solve this problem:

  1. Implement asynchronous HTTP. An asynchronous model, much like JavaScript’s, seemed like a really good solution.

  2. Use interrupts. An interrupt pauses execution, executes another function, and then returns to the original paused position. If the interrupt function is small enough, this would fake asynchronous behaviour.

  3. Hold down the button until something happens. The lazy method. Keep the signal high until the program has processed it.
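
For option 2, the usual pattern is to have the interrupt handler do nothing except set a flag, and let the main loop pick the flag up later. The sketch below models that pattern so it compiles on a desktop; it is an illustration, not what the team built. The function names are hypothetical, and std::atomic<bool> stands in for the `volatile bool` that an Arduino sketch would typically use, with the handler registered through attachInterrupt().

```cpp
#include <atomic>

// Set by the interrupt handler, cleared by the main loop.
std::atomic<bool> buttonPressed{false};

// The interrupt handler: kept tiny, it only records that a press happened.
// On a real board this would be registered with attachInterrupt();
// in this desktop model it is simply called directly.
void onButtonInterrupt() { buttonPressed.store(true); }

// Called from the main loop between slow tasks.
// Returns true once per recorded press, then resets the flag.
bool consumeButtonPress() { return buttonPressed.exchange(false); }
```

The press is remembered even if it arrives in the middle of a slow HTTP request, which is the property option 3 lacks.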

As this was a Hackday and we were pressed for time, we quickly crossed off asynchronous HTTP. Interrupts would be the ideal solution, but we decided to go with the lazy option 3. If we had time at the end we would fix it. We decided that we’d also not poll Jenkins on every loop, to give time for the pseudo UI loop to “breathe”.
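
Not polling on every loop can be done by comparing timestamps rather than sleeping. The following is a minimal sketch under assumptions: PollTimer and the 5 second interval are made up for illustration (the post does not give the real interval), and `nowMs` stands for the value an Arduino's millis() would return.

```cpp
#include <cstdint>

// Hypothetical interval; the post does not say how often Jenkins was polled.
constexpr std::uint32_t kPollIntervalMs = 5000;

class PollTimer {
public:
    // Returns true on the first call, and afterwards whenever at least
    // kPollIntervalMs has elapsed since the last call that returned true.
    bool due(std::uint32_t nowMs) {
        // Unsigned subtraction stays correct when the counter wraps around.
        if (!started_ || nowMs - lastPollMs_ >= kPollIntervalMs) {
            started_ = true;
            lastPollMs_ = nowMs;
            return true;
        }
        return false;
    }

private:
    bool started_ = false;
    std::uint32_t lastPollMs_ = 0;
};
```

In a main loop, the button would be read on every iteration and the Jenkins request made only when due() returns true, so most iterations stay short and the held-down button is noticed sooner.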

Finally, secrets. We had sensitive data that we wanted to load onto the Arduino, specifically the WiFi password and a Jenkins token. Since the permissions for the Jenkins token would be very limited, we weren’t too concerned about leaking this. As for the WiFi password, we were too pressed for time to come up with a good solution, so we baked it into our repository (don’t look!).

Outcome

At the end of the day we didn’t have much to show. As it turns out, the WiFi module on the ESP32s isn’t that strong (or not that compatible with the office WiFi), and we had a lot of issues maintaining connectivity (even shutting the door would weaken the signal too much!). We did have connectivity to Jenkins, and the lights did change with the build status. The button was abandoned, and so were flashing lights (flashing lights meant asynchronous problems). We were left with green (everything OK), amber (deploying), and red (failed).