Hack Day: Image Search

What if we could just upload an image of what we want to buy into our favourite shopping website's search bar? Wouldn't that make life about 100x easier?

These are questions that my team and I attempted to answer during our last hack day. We are always looking into ways of making the online shopping experience much more efficient for our users, and what better way to implement that than to make use of artificial intelligence? With new technologies being introduced into the image recognition space every day, we thought we should start laying some groundwork for something that could be really useful to us in the near future.

Research

We started out by exploring quick, out-of-the-box options instead of building our own model. One idea that came up during our research was to use a pre-trained model. What is a pre-trained model, you ask? Pre-trained models are ML models that have already been trained on millions of data points and are ready to use as-is. For our purposes, we settled on VGG-16, a convolutional neural network that is 16 layers deep. A version of the network pre-trained on more than a million images from the ImageNet database is readily available, and it can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. This model should get us started on a basic image labeling engine. What we need to build on top of it is the ability to accept an uploaded image and process it on the fly.

Implementation

Our work was divided into two main parts:

1. An image labeling app, whose response is the list of labels for an image the user uploads.
2. Modifying the existing Kogan.com website to accept an image as search input and handle all the integration with the image labeling server.

We split into two smaller groups: one team worked on the former and the other on the latter.

For the image labeling application, we chose Flask given its quick setup. We set up a Flask app with a single API endpoint that accepts an image upload and responds with the list of labels for the uploaded image. The backend engine uses the VGG-16 model to do the image labeling.

The first part is the usual image upload handling in Flask. Once we saved the image, we passed the image path to a VGGModel object to do the prediction.
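The endpoint can be sketched roughly like this (a minimal sketch: the `/label` route name, the `create_app` factory, and injecting the model as a parameter are our own choices for illustration; the real app followed the same save-then-predict flow):

```python
import os
import tempfile

from flask import Flask, jsonify, request


def create_app(model):
    """model: any object exposing get_predictions(image_path) -> list of labels."""
    app = Flask(__name__)

    @app.route('/label', methods=['POST'])
    def label_image():
        uploaded = request.files['image']
        # Save the upload to a temporary file so the model can read it from disk
        fd, path = tempfile.mkstemp(suffix=os.path.splitext(uploaded.filename)[1])
        os.close(fd)
        try:
            uploaded.save(path)
            labels = model.get_predictions(path)
        finally:
            os.remove(path)
        return jsonify({'labels': labels})

    return app
```

Injecting the model into the factory also keeps the heavyweight TensorFlow dependency out of the routing code, which made the app easier to poke at during the day.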

```python
from tensorflow.keras.applications import vgg16
from tensorflow.keras.applications.imagenet_utils import decode_predictions
from tensorflow.keras.preprocessing.image import img_to_array, load_img
import numpy as np


class VGGModel:
    def __init__(self):
        # Load VGG-16 with weights pre-trained on ImageNet
        self.model = vgg16.VGG16(weights='imagenet')

    def get_predictions(self, image_path):
        prepro_image = self.prepro_image(image_path)
        predictions = self.predict(prepro_image)

        # Keep only the label names, made query-friendly for the search URL
        labels = [label.replace('_', '+') for _, label, _ in predictions[0]]
        return labels

    def prepro_image(self, image):
        # VGG-16 expects a batch of 224x224 RGB images
        original = load_img(image, target_size=(224, 224))
        numpy_image = img_to_array(original)
        image_batch = np.expand_dims(numpy_image, axis=0)
        processed_image = vgg16.preprocess_input(image_batch.copy())
        return processed_image

    def predict(self, processed_image):
        predictions = self.model.predict(processed_image)
        # decode_predictions maps raw class scores to human-readable labels
        label_vgg = decode_predictions(predictions)
        return label_vgg
```

The get_predictions function returns a list of label strings, sorted from most to least confident. Once we’d finished this basic application, we extended our work by dockerising the whole server so it could be handed over to the other team for integration.
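To make the label extraction concrete: decode_predictions returns one inner list per image in the batch, each a list of (class id, label, score) tuples sorted by descending confidence. The values below are made up for illustration:

```python
# Illustrative (made-up) decode_predictions output for a single image:
# one inner list per image, tuples sorted from most to least confident.
predictions = [[
    ('n04120489', 'running_shoe', 0.83),
    ('n04254680', 'sneaker', 0.11),
    ('n03680355', 'loafer', 0.02),
]]

# The same extraction as in get_predictions: keep the label,
# swapping underscores for '+' so it slots into a search query string
labels = [label.replace('_', '+') for _, label, _ in predictions[0]]
print(labels)  # ['running+shoe', 'sneaker', 'loafer']
```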

The next part was to modify our site to accept an image upload, send the image to the image labeling server, and process the returned list of labels. We didn’t do anything special here: as a mock-up of the whole flow, we only had an unstyled button that reads a file and uploads it on click, implemented as a simple form with a single button that automatically submits when a file is selected. While the image should really go through our main site backend first, for the sake of simplicity we uploaded it directly to the image labeling server to get the result straight away.
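The site-side call amounts to a multipart file upload and reading the labels out of the JSON response. A hedged sketch of that client logic (the server address, the /label path, and the response shape {"labels": [...]} are assumptions, not the production setup):

```python
import requests

# Assumed address of the dockerised labeling server; adjust to your setup.
LABEL_SERVER = 'http://localhost:5000/label'


def get_search_terms(image_path, post=requests.post):
    """Upload an image and return the label list from the server's JSON response."""
    with open(image_path, 'rb') as f:
        response = post(LABEL_SERVER, files={'image': f})
    response.raise_for_status()
    return response.json()['labels']
```

The returned labels can then be dropped straight into the site's existing text search as query terms.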

The results we got were quite good, given that the model is open source and pre-trained on generic images. It can correctly label an image of shoes, a shopping cart, or a bucket. On the other hand, it gives odd results for more complex images, such as fruits, people, and other more specific subjects.

What we learned and how we could improve

Given the time constraints and the limits of our Machine Learning expertise, we think this is a good start for an image search feature on an ecommerce website. There are many considerations before rolling this out into production, such as prioritisation, brand detection, and more precise image labeling. Branding is an important one: the ability to detect “Nike shoes” instead of something generic like “shoes” or “running shoes” could hugely benefit the customer using it. A further enhancement would be to fine-tune the model to meet these needs.

Another solution would be to use a managed cloud service such as Google’s; however, it would give much the same results, since we haven’t fine-tuned anything yet, so we passed on this option as it would take some time to get it working and to sort out a GCP account. Nevertheless, this shows that a basic working image search does not require the huge time and effort investment one might expect.

Final words

As with the previous ones, this hack day allowed us to explore topics that could be useful to the further development of the website. It’s always interesting to discover new technologies and adapt them to our current codebase, as well as to collaborate with different people within the Engineering team to make our ideas come to life. It was also really nice to see what the other teams came up with, and the presentations at the end of the day showed how creative and skilled our engineers really are!