Going too deep with Django Sessions

The other day I was battling with some weird behaviour where a key in a session was updated, but sometimes it would revert after a while.

The key in question was a flag recording that a customer had been sent an email about abandoning their cart; when the key reverted, they ended up getting duplicate emails. To send these emails, we have an offline Celery task that looks over all sessions in the DB and checks a flag on the cart to know whether the email has already been sent.

When I dug into the cases where duplicate emails were sent, I noticed that all of the affected customers had come back to the site after the first email and started browsing again. But why? Why would browsing the site cause the flag to change state? And why wasn't everyone affected? ...

A hidden gem in Django 1.7: ManifestStaticFilesStorage

The biggest change in Django 1.7 was the built-in schema migration support, which everyone is aware of. However, 1.7 also shipped with lots of other great additions, and ManifestStaticFilesStorage, the new static files storage backend, was one of them.

Static file caching is everywhere

Before explaining what ManifestStaticFilesStorage is and how it works, here is an overview of why we need it at Kogan.com:

cache busting static

In order to deliver content to our customers as fast as possible, we cache static files by setting max-age in the Cache-Control response headers. Customers download the content once, and subsequent requests for static files are served from a cache. As shown in the diagram, if we were to use plain static file names like base.css, the file would be cached in the CDN as well as in the browser, and we would have a hard time invalidating those caches. Instead, we cache-bust by appending an MD5 hash of the file's content to the file name. When we deploy a new base.css, the {% static %} template tag turns base.css into base.d1833e.css and the browser requests the new file. The {% static %} template tag can translate base.css into base.d1833e.css thanks to the static files storage backend, which is configured through the STATICFILES_STORAGE setting in Django.
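To make the renaming step concrete, here is a minimal sketch of the idea in plain Python. It is not Django's implementation: hashed_name is a hypothetical helper, and the six-character digest length is an assumption chosen only to match the base.d1833e.css example above.

```python
import hashlib
import os


def hashed_name(name, content, digest_length=6):
    """Insert a short MD5 digest of the content before the file extension.

    Hypothetical helper for illustration; digest_length is an assumption.
    """
    digest = hashlib.md5(content).hexdigest()[:digest_length]
    root, ext = os.path.splitext(name)
    return "{0}.{1}{2}".format(root, digest, ext)


# Two versions of the same file get different names, so a cached copy
# of the old one is never served in place of the new one.
old = hashed_name("base.css", b"body { color: black; }")
new = hashed_name("base.css", b"body { color: navy; }")
print(old, new)
```

Because the name depends only on the content, an unchanged file keeps its name between deploys and stays cached.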

Before ManifestStaticFilesStorage

Our Django app was previously configured to use CachedStaticFilesStorage, which stores the file mappings in the CACHES backend; for us that was Redis. Django adds these mappings during collectstatic, when it gathers all static files and puts them in one place.

collectstatic

This solution coupled static asset deployment with code deployment, resulting in a number of issues:

  • Running collectstatic as part of code deployment --> slow deploys
  • Extra load on Redis
  • App servers were sometimes out of sync, as we deploy them in batches. When we started a deployment, Redis would be updated with the new keys and the first batch of app servers would get the new code, but the remaining servers still had the old code.

Out of sync app servers

ManifestStaticFilesStorage to the rescue

ManifestStaticFilesStorage has helped us to decouple the static compilation stage from deployments by allowing Django to read static file mappings from staticfiles.json on the filesystem. staticfiles.json is an artifact file produced by collectstatic when ManifestStaticFilesStorage is the backend. We can now include this staticfiles.json in our code package and deploy it to a single app server without affecting the others.

New ManifestStaticFilesStorage
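As a rough illustration of what reading the mappings from a file means, here is a small sketch that resolves a static name from manifest text. The layout used here (a "version" key plus a "paths" mapping) is an assumption about the file format, so check it against your own staticfiles.json. load_paths and static_url are hypothetical helpers, and falling back to the unhashed name is a simplification of this sketch rather than Django's behaviour.

```python
import json

# Assumed shape of staticfiles.json; verify against a real file.
manifest_text = json.dumps({
    "version": "1.0",
    "paths": {"base.css": "base.d1833e.css"},
})


def load_paths(text):
    """Parse manifest text and return the name -> hashed name mapping."""
    stored = json.loads(text)
    return stored.get("paths", {})


def static_url(name, paths, prefix="/static/"):
    """Resolve a static file name; unknown names are returned unhashed."""
    return prefix + paths.get(name, name)


paths = load_paths(manifest_text)
print(static_url("base.css", paths))
```

The point is that the lookup needs nothing but a file shipped with the code, so each app server resolves names consistently with the code it is running.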

Where is staticfiles.json located?

By default, staticfiles.json resides in STATIC_ROOT, the directory where all static files are collected. We host all our static assets in an S3 bucket, which means staticfiles.json would by default end up being synced to S3. However, we wanted it to live in the code directory so we could package it and ship it to each app server. Because ManifestStaticFilesStorage looks for staticfiles.json in STATIC_ROOT when reading the mappings, we had to override this behaviour, so we subclassed ManifestStaticFilesStorage:


import os

from django.conf import settings
from django.contrib.staticfiles.storage import ManifestStaticFilesStorage


class KoganManifestStaticFilesStorage(ManifestStaticFilesStorage):

    def read_manifest(self):
        """
        Looks up staticfiles.json in the project directory
        instead of STATIC_ROOT.
        """
        manifest_location = os.path.abspath(
            os.path.join(settings.PROJECT_ROOT, self.manifest_name)
        )
        try:
            # Read bytes so that decode() works on both Python 2 and 3.
            with open(manifest_location, 'rb') as manifest:
                return manifest.read().decode('utf-8')
        except IOError:
            return None

With the above change, Django's {% static %} template tag now reads the mappings from the staticfiles.json that resides in the project root directory.
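For completeness, the subclass still has to be selected in settings. A minimal fragment might look like this, where the module path myproject.storage is a placeholder and not our real layout:

```python
# settings.py (fragment)
# "myproject.storage" is a hypothetical module path; point it at
# wherever KoganManifestStaticFilesStorage lives in your project.
STATICFILES_STORAGE = "myproject.storage.KoganManifestStaticFilesStorage"
```

The subclass also expects a PROJECT_ROOT setting to be defined, since that is where it looks for staticfiles.json.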

Thanks Django

Thanks to Django 1.7, we've not only gotten a better schema migration system but also improved our deployment process. Not to mention that the ManifestStaticFilesStorage addition was only 40-50 lines of code (as of the day this blog post was published).

Catches when Expecting Exceptions in Django Unit Tests

To cover all bases when writing a suite of unit tests, you need to test for the exceptional cases. However, handling exceptions can break the usual flow of the test case and confuse Django.

Example scenario: unique_together

For example, we have an ecommerce site with many products serving multiple countries, which may have different national languages. Our products may have a description written in different languages, but only one description per (product, language) pair.

We can set up a unique_together constraint to enforce that unique pairing:

class Description(models.Model):
    product = models.ForeignKey("Product")
    language = models.ForeignKey("countries.Language")

    class Meta:
        unique_together = ("product", "language")

    subtitle = models.CharField(...)
    body = models.CharField(...)
    ...

Developer chooses assertRaises()

If the unique_together rule is violated, Django will raise an IntegrityError. A unit test can verify that this occurs using assertRaises() on a lambda function:

def test_unique_product_description(self):
    desc1 = DescriptionFactory(self.prod1, self.lang1)
    # A lambda body must be an expression, so no assignment here.
    self.assertRaises(
        IntegrityError,
        lambda: DescriptionFactory(self.prod1, self.lang1),
    )

The assertion passes, but the test will fail with a new exception.

A wild TransactionManagementError appears!

Raising the exception when creating a new object will break the current database transaction, causing further queries to be invalid. The next code that accesses the DB - probably the test teardown - will cause a TransactionManagementError to be thrown:

Traceback (most recent call last):
File ".../test_....py", line 29, in tearDown
   ...
File ...
   ...
File ".../django/db/backends/__init__.py", line 386, in validate_no_broken_transaction
An error occurred in the current transaction.
TransactionManagementError: An error occurred in the current transaction.
You can't execute queries until the end of the 'atomic' block.

Developer used transaction.atomic. It's super effective!

Wrapping the test (or just the assertion) in its own transaction will prevent the TransactionManagementError from occurring, as only the inner transaction will be affected by the IntegrityError:

def test_unique_product_description(self):
    desc1 = DescriptionFactory(self.prod1, self.lang1)
    # The nested atomic block confines the failure to its own savepoint.
    with transaction.atomic():
        self.assertRaises(
            IntegrityError,
            lambda: DescriptionFactory(self.prod1, self.lang1),
        )
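
A nested atomic block is backed by a database savepoint. To show that mechanism outside of Django, here is a small sketch using only the standard library's sqlite3 module. The table and savepoint names are made up for the example. One caveat: SQLite is more forgiving than a database like PostgreSQL and does not abort the whole transaction on a constraint error, so this only illustrates the savepoint pattern and does not reproduce the TransactionManagementError.

```python
import sqlite3

# isolation_level=None lets us issue BEGIN / SAVEPOINT / ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE description ("
    "product_id INTEGER, language_id INTEGER, "
    "UNIQUE (product_id, language_id))"
)

conn.execute("BEGIN")  # outer transaction, like the one wrapping a test
conn.execute("INSERT INTO description VALUES (1, 1)")

conn.execute("SAVEPOINT inner_block")  # roughly what a nested atomic opens
try:
    conn.execute("INSERT INTO description VALUES (1, 1)")  # duplicate pair
    raised = False
except sqlite3.IntegrityError:
    raised = True
    # Undo only the work done since the savepoint.
    conn.execute("ROLLBACK TO SAVEPOINT inner_block")
conn.execute("RELEASE SAVEPOINT inner_block")

# The outer transaction is still usable and still holds the first row.
row_count = conn.execute("SELECT COUNT(*) FROM description").fetchone()[0]
conn.execute("ROLLBACK")
print(raised, row_count)
```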

You don't have to catch 'em all: Another solution

Another way to fix this issue is to subclass your test from TransactionTestCase instead of the usual TestCase. Despite the name, TransactionTestCase doesn't use DB transactions to reset between tests; instead it truncates the tables. This may make the test slower for some cases, but will be more convenient if you are dealing with many IntegrityErrors in the one test. See the Django Documentation for more details on the difference between the two classes.

Webpack Your Things

We very recently finished migrating our front-end build process to Webpack. As with any reasonably sized codebase, it's always a little more complex than the 3 line examples in how-to guides. This post lists some of the higher-level things I learned during this undertaking; the next one in the Webpack Series will detail some specific quirks and solutions.

Resources

I was largely able to do this by leveraging the hard work of some clever heroes. This very conveniently timed blog post covered a lot of what was needed. JLongster also has some very good tips, helpful for more than just backend apps. Regarding documentation, the widely-cited Webpack How-to gives a pretty concise overview of most things you will need. And of course, the docs have a lot of information. Sometimes too much. But usually most things you need are listed there. Occasionally something isn't, which brings me to lesson one.

Lesson 1: cd node_modules

One of the biggest things I learned in this undertaking isn't limited just to Webpack, and helped fix a few other things. Previously I had treated the node_modules folder as a black box - just npm install and be on my way. This is fine for everyday usage, but when you hit barriers or bugs sometimes you need to do some digging. Rather than throwing random inputs at a black box to measure the effect, you can just crack it open.

A good example of this is the CommonsChunkPlugin, which is documented thusly:

If omitted and options.async or options.children is set all chunks are used, elsewise options.filename is used as chunk name
— chunk.name definition

I found this sentence somewhat confusing, but it was easy to clarify by reading the code that checks this. If/else blocks and some variable assignments are straightforward things to follow. And the nature of Webpack modules means they are generally quite small. If all else fails, just console.log everything.

Note this doesn't necessarily mean you must always open the box and understand the internal implementation of libs you are using. But it is reassuring to know that you can.

Lesson 2: Use Tables

For visualising data, never for layout. The quite excellent webpack analyse tool provides a tonne of useful data to improve your module situation. The crazy tree views look awesome, and are animated and zoomable. But they can quickly spiral out of control into a meaningless ball of branches. Fortunately, there are table views for all these pages as well. While repeatedly processing a bunch of lists to try to minimise file sizes isn't the most romantic task, tables let you sort and group a lot more easily. Conversely, the Chunks tree view stays parseable for longer (as you will have fewer chunks than modules). It can give a quick overview of whether any of your chunks are ballooning out of control, and the accompanying table can be used for more automated analysis.

Trees: Awesome, albeit unclear

Lesson 3: Measuring Victory

As with any code change, the best measure of success is not breaking anything. In this case our ideal outcome was that the change would be completely invisible to end users: everything deploys and the site still works. Beyond that, it was meant to simplify frontend development for all of our devs, and on that front it was again a success. We have simple build & watch tasks without global Node package requirements (except npm). We were able to deploy our first React component by adding a single line to process JSX. So we don't (yet) have a quantifiable metric for the success of this adventure into the world of Webpack. But as someone who formerly complained about writing build tasks, I can say it has been fun.


In the next post I will drill down into a few specific issues encountered and how they were fixed, and some other useful features and tricks. If you have any advice or thoughts, we'd love to work with you - careers.

Top 3 lessons from CssConfAu

We recently got the opportunity to attend cssconf in Melbourne, followed by Decompress (a day of lightning talks and hacking).

I found it to be a really rewarding experience. The material presented was really relevant to my work and I got to pick the brains of some of the speakers during break time :)

Here's a list of my top 3 takeaways from the CSS Conf AU by category:

  1. CSS & Usability & Accessibility
    • 4 1/2 methods of theming CSS, depending on your needs
    • Everyone is responsible for "UX" - that includes devs!
    • Every hour you spend making the web faster, more accessible and easier to use, turns into days of time saved by real people
    • SVGs are what we should eat for breakfast every morning
    • Pick your colors wisely, because they might increase/decrease your accessibility
  2. Web Page performance:
    • Cram your initial view into the first 14kb
    • Eliminate any non-critical resources blocking your critical path
    • Be responsible with your responsive design
    • Perceived performance > Actual performance
  3. Animations:
    • Improves your user experience
    • Animate exclusively on opacity and transforms
    • Perceived speed > Actual speed
    • Animations help infer context from revealed information
    • Use animations instead of gifs!

After following all the speakers on Twitter, I happened upon a couple of good resources that were previously unknown to me, and I’ve added them to my Easter break reading/review/code list:

  1. Front End Guidelines by Benjamin de Cock
    https://github.com/bendc/frontend-guidelines
  2. CSS Guidelines by Harry Roberts
    http://cssguidelin.es/
  3. Stray articles I’ve missed from Filament Group
    http://www.filamentgroup.com/lab/
  4. A guide to SVG Animations by Sara Soueidan (+ her other articles)
    https://css-tricks.com/guide-svg-animations-smil/

Like what we do? Join us then.

ReactJS + Flux Meetup

Yesterday we had the pleasure of hosting the first ReactJS Melbourne meetup and we're absolutely thrilled with the turn out!

For those who missed out, here are the slides from the Kogan.com prototype demo we presented yesterday:

We'd like to thank everyone who came, and we look forward to seeing what you create with these tools!

Here are some snaps from the event:


Like the sound of how we work? Check out our Careers Page!