DECEMBER 2021 HACKDAY (PART 1)

‘Twas the week before Christmas…

…and like a ghost from the Charles Dickens classic, the Kogan Hackday was back to give the talented members of the Engineering team an opportunity to think of something other than what to add to last year’s Christmas shopping list.

What do “image recognition”, “web audio” and “boom gates” have in common? Absolutely nothing! But having a nice variety of topics this time around allowed us to push our limits as innovators and problem solvers through issues that we found interesting. After a quick introduction on the morning of the hack day, four teams were formed to work on the following problems:

  1. Product Colour Extraction
    Two teams wanted to accurately determine the colour of a product by utilising the product images’ Red, Green and Blue matrix values to build a machine learning model.
  2. Bangers and Hash
    The goal of this team was to create an interface that accepts string input; through the Web Audio API, this input is translated into an audio output. The chance to learn a bit of music theory and the potential to make “bangers” were also welcomed on the day.
  3. Carmageddon
    Parking spots are a scarce resource at the Kogan.com office, and since those who have an allocated spot aren’t always using them, the members of this team made it a mission to wire up a remote with an ESP-32 before connecting it to a server and a web/native app. This would allow any driving staff member to open the boom gate and park in an unused spot.

As a member of “Bangers and Hash”, I’ll be sharing how my team was able to integrate Web Audio API into our web app. I’ll also be giving a very brief overview of this system for context.

Web Audio API

The Web Audio API lets developers control audio on the web, handling a range of tasks from choosing the audio source to applying spatial effects. In its simplest form, the API performs operations within an “audio context”, which allows “modular routing”: operations are carried out by “audio nodes”, which are linked together to form an “audio routing graph”. Timing is controlled with low latency and high precision, which means your code can respond accurately to events and target specific samples.

Here is a typical workflow for the Web Audio API:
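
A minimal sketch of that flow in JavaScript (not the team’s code, just the standard pattern): create an audio context, create and configure the nodes, connect them into a routing graph, then start the source.

    // Create the audio context that hosts the routing graph.
    const audioContext = new AudioContext();

    // Create the source and effect nodes.
    const oscillator = audioContext.createOscillator();
    const gainNode = audioContext.createGain();

    // Configure the source: a 440 Hz sine wave at half volume.
    oscillator.type = 'sine';
    oscillator.frequency.setValueAtTime(440, audioContext.currentTime);
    gainNode.gain.setValueAtTime(0.5, audioContext.currentTime);

    // Link the nodes into an audio routing graph:
    // oscillator -> gain -> speakers.
    oscillator.connect(gainNode);
    gainNode.connect(audioContext.destination);

    // Play the tone for one second.
    oscillator.start();
    oscillator.stop(audioContext.currentTime + 1);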

You can read more about this here.

Brainstorming

After a quick presentation on the most basic parts of music theory from one of our members, the team went straight to planning. It wasn’t hard to agree on what our MVP should look like: a page with a text box where anyone can input text, and a Submit button that triggers the “translation” from text input to audio output. I would be lying, however, if I said that agreeing on the workflow felt just as automatic. After an hour or so of discussing how to get from text input to audio output, the team split into sub-groups, each tasked with a specific component of the web app.

Development

We were lucky to have the Engineering team’s User Experience Designer with us, so we were able to create a straightforward yet sleek user interface. Besides a textbox, a button and a cute little logo, the page also has a dropdown menu where users can select the key they want their audio output to be in. For the sake of simplicity, the options include a major scale, a minor scale and a jazz scale.

The next challenge was figuring out how to map the text input to its musical counterpart. Spaces (“ ”) between characters and words were treated as pauses, so those had to be taken into account as well.

We ended up representing each character as a tuple of indices that signify its musical equivalent and its duration. These values are calculated with the modulus operator so that they always fall within the bounds of the array of characters (consecutiveCharsInWord) and the array of durations (durationMap).

A “pause”, or an instance where no sound is produced, is defined by the number of spaces between sets of characters, i.e. words. In a pause’s tuple, the character index defaults to -1 to easily distinguish it from characters that are meant to produce a sound.

A snippet of how text input characters and their duration are mapped as a tuple
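
The team’s snippet isn’t reproduced here, but a minimal sketch of the mapping described above might look like this. The helper name mapTextToTuples, the durationMap values and the scale length are illustrative assumptions:

    // Hypothetical reconstruction of the mapping step described above.
    const durationMap = [0.25, 0.5, 1, 2]; // note lengths in seconds (assumed values)
    const SCALE_LENGTH = 8;                // notes available in the chosen scale (assumed)

    function mapTextToTuples(text) {
      const result = [];
      const words = text.split(' ');

      words.forEach((word, wordIndex) => {
        const consecutiveCharsInWord = word.split('');

        consecutiveCharsInWord.forEach((char, charIndex) => {
          // Wrap the character and its position into the available notes
          // and durations using the modulus operator.
          const noteIndex = char.charCodeAt(0) % SCALE_LENGTH;
          const duration = durationMap[charIndex % durationMap.length];
          result.push([noteIndex, duration]);
        });

        // Spaces between words become pauses: the note index defaults to -1
        // so the driver knows not to produce a sound for that tuple.
        if (wordIndex < words.length - 1) {
          result.push([-1, durationMap[0]]);
        }
      });

      return result;
    }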

These tuples are pushed into the result array, which is then passed to the app’s driver where sound is produced using the Web Audio API.

The driver class contains all the methods and settings necessary to produce a sound from the mapped text input. This is also where the audio context is initialized along with the gain, oscillators, octave multipliers and our sample scales, which are arrays of frequency values that match actual notes, e.g. middle C has a frequency of 261.63 Hz. In times like this, it’s definitely helpful to have two engineers in your team who also hold a degree in music!
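
A minimal sketch of that setup might look like this. The class shape, the property names and the frequencies beyond middle C are assumptions (standard equal-temperament values) rather than the team’s actual code:

    // How the driver might set up its context, gain and sample scales.
    class Driver {
      constructor() {
        this.audioContext = new AudioContext();

        // Master gain node that every oscillator routes through.
        this.masterGain = this.audioContext.createGain();
        this.masterGain.connect(this.audioContext.destination);

        // Sample scales: arrays of frequency values (in Hz) that match actual
        // notes, starting from middle C (261.63 Hz).
        this.scales = {
          major: [261.63, 293.66, 329.63, 349.23, 392.00, 440.00, 493.88, 523.25],
          minor: [261.63, 293.66, 311.13, 349.23, 392.00, 415.30, 466.16, 523.25],
        };

        // Octave multipliers: doubling a frequency raises the note by an octave.
        this.octaveMultipliers = [0.5, 1, 2];
      }
    }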

After the index in the tuple mentioned above is mapped to a frequency, this frequency and its duration are passed into a function called playVoice, where the gain and the oscillator are configured to produce the intended sound:

A snippet of the playVoice function
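
The actual snippet isn’t reproduced here, but a sketch of a playVoice-style function might look like the following. It’s written as a standalone function that also takes the audio context and output node so it stands on its own; the envelope values are assumptions:

    // Configure a gain node and an oscillator for the mapped frequency,
    // then start and stop it for the note's duration.
    function playVoice(audioContext, output, frequency, duration) {
      const now = audioContext.currentTime;

      const oscillator = audioContext.createOscillator();
      const gain = audioContext.createGain();

      oscillator.type = 'sine';
      oscillator.frequency.setValueAtTime(frequency, now);

      // Shape the gain so each note fades out instead of clicking off
      // (the envelope values here are assumptions).
      gain.gain.setValueAtTime(0.8, now);
      gain.gain.exponentialRampToValueAtTime(0.001, now + duration);

      // oscillator -> gain -> output (e.g. audioContext.destination).
      oscillator.connect(gain);
      gain.connect(output);

      oscillator.start(now);
      oscillator.stop(now + duration);
    }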

The Web Audio API is pretty simple, with lots of room to experiment, and our additional test functionality shows just that. Following the same workflow, our application has a “Test” button that plays a polyphonic tune by passing an array of arrays of tuples to the Driver class. From this test functionality, we were able to build on the foundation of our app by adding another textbox to show how polyphony can also be achieved from the text input.
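
As a rough illustration (the play method name and the tuple values here are assumptions), the “Test” button essentially does something like this:

    // Each inner array is one voice's sequence of [noteIndex, duration]
    // tuples; playing the voices together produces polyphony.
    const driver = new Driver();

    const melody = [[0, 0.5], [2, 0.5], [4, 1]];
    const harmony = [[0, 1], [4, 1]];

    driver.play([melody, harmony]);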

Showcase

To cap off the long day, each team was given a chance to present their work. The teams were also given some time to talk about how they accomplished the goals they set earlier in the day, as well as how they could improve their work.

The two “Product Colour Extraction” teams took us through their projects using existing products from the Kogan website. They presented ways to extract colours and determine the most dominant colour using different machine learning APIs.

The “Bangers and Hash” team demoed their “music box” by inputting some strings and turning our laptop’s volume up. We also walked everyone through the other parts of our app, such as choosing a scale and the output of the “Test” button.

The final team - “Carmageddon” - quickly talked about the system that they created before giving everyone a live demo.

And there you have it! Another fulfilling day that surely kept everyone on their toes from start to finish.

Remember to tune in for our next couple of blog posts where the other teams talk about the specifics of their projects. I personally can’t wait to read more about them.

If hack days where you can do practically anything from showcasing your hidden talents to solving frustrating parking issues are your thing, remember that we’re hiring!