Faster Django Tests by Disabling Signals

Django signals are a form of Inversion Of Control that developers can use to trigger changes based (usually) on updates to models.

The canonical example is automatically creating a UserProfile when a User instance is created:

from django.conf import settings
from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=settings.AUTH_USER_MODEL)
def post_save_receiver(sender, instance, created, **kwargs):
    UserProfile.objects.create(user=instance, ...)

Signals are useful for connecting your own behaviour to events that you might not have control over, such as those generated by the framework or libary code. In general, you should prefer other methods like overriding model save methods if you have the ability, as it makes code easier to reason about.

Signal receivers can sometimes be much more complex than creating a single object, and there can be many receivers for any given event, which multiply the time it takes to perform simple actions.

Tests and signals

Often you won't actually need your signals to execute when you're running your test suite, especially if you're creating and deleting many thousands of model instances. Disconnecting signals is tricky though, especially if the disconnect/reconnect logic can be stacked.

An easier, but more invasive method for suspending signal receivers is to check a global setting, and return early.

from django.conf import settings

@receiver(post_save, sender=MyModel)
def mymodel_post_save(sender, **kwargs):
    if settings.DEBUG:
        return
    work()

This has two drawbacks. Firstly, it's messy, and you need to remember to add the check to each receiver. Secondly, it depends on a specific setting that you might need to have turned off when running your tests, which makes it harder to test the actual receiver.

Selectively disabling signals

We can take this most recent idea of checking a setting, and wrap it up nicely in our own decorator. When running your tests, you can override the SUSPEND_SIGNALS setting per test method or class.

Here's a gist that does just that

import functools

from django.conf import settings
from django.dispatch import receiver

def suspendingreceiver(signal, **decorator_kwargs):
    def our_wrapper(func):
        @receiver(signal, **decorator_kwargs)
        @functools.wraps(func)
        def fake_receiver(sender, **kwargs):
            if settings.SUSPEND_SIGNALS:
                return
            return func(sender, **kwargs)
        return fake_receiver
    return our_wrapper

And using this decorator in your test suite is straightforward:

@override_settings(SUSPEND_SIGNALS=True)
class MyTestCase(TestCase):
    def test_method(self):
        Model.objects.create()  # post_save_receiver won't execute

@suspendingreceiver(post_save, sender=MyModel)
def mymodel_post_save(sender, **kwargs):
    work()

And just like that, we can skip signal receivers in our test suite as we like!

Kogame (Koh-Gah-Mi) - A real time game in Django

Kogame (Koh-Gah-Mi) - A real time game in Django

For our March hackday this year we decided to build a multiplayer game using

[Django Channels](https://channels.readthedocs.io/en/latest/). We've been keeping

an eye on channels over the time, and thought with the release of channels 2.0 that

it was the right time to dive in and get some practical experience. The organisers

didn't want to do yet-another-chat-system implementation, so they decided to make

things a bit more interesting, and look at writing a real-time game.

Going too deep with Django Sessions

The other day I was battling with some weird behaviour where a key in a session was updated, but sometimes it would revert after a while.

The key in question was a flag to say that a customer had been sent an email about abandoning their cart, and when the key reverted they ended up getting duplicate emails.  To achieve this, we have an offline Celery task that looks over all sessions in the DB and checks a flag on the cart to know if the email had already been sent.

When I dug into the cases where duplicate emails were sent, I noticed that all of them came back to the site after the first email and started browsing again. But why? Why would browsing the site cause the flag to change state? And why wasn't everyone effected? ...

A hidden gem in Django 1.7: ManifestStaticFilesStorage

The biggest change in Django 1.7 was the built-in schema migration support which everyone is aware of, however 1.7 also shipped with lots of other great additions, ManifestStaticFilesStorage - the new static files storage backend was one of them.

Static file caching is everywhere

Before explaining what ManifestStaticFilesStorage is and how it works, this is the overview of why we need it at Kogan.com:

cache busting static

In order to deliver the content to our customers as fast as possible, we cache the downloaded static files by using max-age request headers. This allows our customers to download the content once and the subsequent requests to static files will be served from a cache. As shown on the diagram, if we were to use normal static file names like base.css, the content of the file would be cached in the CDN as well as on the browser and we would have a hard time trying to invalidate these caches. We cache-bust the content by appending a md5 hash of the content of the file to the file name. When we deploy a new base.css, {% static %} template tag will turn base.css into base.d1833e.css and the browser will then request a new file. {% static %} template tag is able to translate base.css into base.d1833e.css thanks to static files storage backend. This setting is named STATICFILES_STORAGE in Django.

Before ManifestStaticFilesStorage

Our Django app was previously configured to use CachedStaticFilesStorage which resulted in placing file mappings in the CACHES backend, for us it was Redis. Django adds these mappings during collectstatic when it gathers all statics and puts them in one place.

collectstatic

This solution has coupled static assets deployment with code deployment resulting in a number of issues:

  • Running collectstatic as part of code deployment --> slow deploys
  • Extra load on Redis
  • App servers were sometimes out of sync as we deploy them in batch. When we start the deployment, Redis would be updated with the new keys, the first batch of App servers would get the new code, but the other half still had old code.

Out of sync app servers

ManifestStaticFilesStorage to the rescue

ManifestStaticFilesStorage has helped us to decouple the static compilation stage from deployments by allowing Django to read static file mappings from staticfiles.json on a filesystem. staticfiles.json is an artifact file produced by collectstatic with ManifestStaticFilesStorage as a backend. We can now include this staticfiles.json into our code package and deploy it to a single app server without affecting the others.

New ManifestStaticFilesStorage

Where is staticfiles.json located?

By default staticfiles.json will reside in STATIC_ROOT which is the directory where all static files are collected in. We host all our static assets on an S3 bucket which means staticfiles.json by default would end up being synced to S3. However, we wanted it to live in the code directory so we could package it and ship it to each app server. As a result of this, ManifestStaticFilesStorage will look for staticfiles.json in STATIC_ROOT in order to read the mappings. We had to overwrite this behaviour, so we subclassed ManifestStaticFilesStorage:


from django.contrib.staticfiles.storage import ManifestStaticFilesStorage
from django.conf import settings

class KoganManifestStaticFilesStorage(ManifestStaticFilesStorage):

    def read_manifest(self):
        """
        Looks up staticfiles.json in Project directory
        """
        manifest_location = os.path.abspath(
            os.path.join(settings.PROJECT_ROOT, self.manifest_name)
        )
        try:
            with open(manifest_location) as manifest:
                return manifest.read().decode('utf-8')
        except IOError:
            return None

With the above change, Django static template tag will now read the mappings from staticfiles.json that resides in project root directory.

Thanks Django

Thanks to Django 1.7, we've not only gotten a better schema migration system but also improved our deployment process. And not to mention ManifestStaticFilesStorage addition was only 40-50 lines of code (as of the day this blog post was published).

Catches when Expecting Exceptions in Django Unit Tests

To cover all bases when writing a suite of unit tests, you need to test for the exceptional cases. However, handling exceptions can break the usual flow of the test case and confuse Django.

Example scenario: unique_together

For example, we have an ecommerce site with many products serving multiple countries, which may have different national languages. Our products may have a description written in different languages, but only one description per (product, language) pair.

We can set up a unique_together constraint to enforce that unique pairing:

class Description(models.Model):
    product = models.ForeignKey("Product")
    language = models.ForeignKey("countries.Language")

    class Meta:
        unique_together = ("product", "language")

    subtitle = models.CharField(...)
    body = models.CharField(...)
    ...

Developer chooses AssertRaises()

If the unique_together rule is violated, Django will raise an IntegrityError. A unit test can verify that this occurs using assertRaises() on a lambda function:

def test_unique_product_description(self):
   desc1 = DescriptionFactory(self.prod1, self.lang1)
   self.assertRaises(IntegrityError, lambda:
      desc2 = DescriptionFactory(self.prod1, self.lang1)

The assertion passes, but the test will fail with a new exception.

A wild TransactionManagementError appears!

Raising the exception when creating a new object will break the current database transaction, causing further queries to be invalid. The next code that accesses the DB - probably the test teardown - will cause a TransactionManagementError to be thrown:

Traceback (most recent call last):
File ".../test_....py", line 29, in tearDown
   ...
File ...
   ...
File ".../django/db/backends/__init.py, line 386, in validate_no_broken_transaction
An error occurred in the current transaction.
TransactionManagementError: An error occurred in the current transaction.
You can't execute queries until the end of the 'atomic' block.

Developer used transaction.atomic. It's super effective!

Wrapping the test (or just the assertion) in its own transaction will prevent the TransactionManagementError from occurring, as only the inner transaction will be affected by the IntegrityError:

def test_unique_product_description(self):
   desc1 = DescriptionFactory(self.prod1, self.lang1)
   with transaction.atomic():
       self.assertRaises(IntegrityError, lambda:
          desc2 = DescriptionFactory(self.prod1, self.lang1)

You don't have to catch 'em all: Another solution

Another way to fix this issue is to subclass your test from TransactionTestCase instead of the usual TestCase. Despite the name, TransactionTestCase doesn't use DB transactions to reset between tests; instead it truncates the tables. This may make the test slower for some cases, but will be more convenient if you are dealing with many IntegrityErrors in the one test. See the Django Documentation for more details on the difference between the two classes.

Tips for writing unit tests for Django middleware

Django framework provides developers with great testing tools and it's dead easy to write tests for views using Django's test client. It has extensive documentation on how to use django.test.Client to write automated tests. However, we often want to write tests for components that we have no control over when using django.test.Client. An example of that is Django Middleware which is used to add business logic either before or after view processing. django.test.Client has no public API for developers to access the internal request object.

Here is a simple example of a middleware class that creates a stash from data saved in the session.


class Stash(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

class StashMiddleware(object):
    """
    Reconstructs the stash object from session
    and attach it to the request object
    """
    def process_request(self, request):
        stashed_data = request.session.get('stashed_data', None)
        # Instatiate the stash from data in session
        if stashed_data is None:
            stash = Stash()
        else:
            stash = Stash(**stashed_data)
        # Attach the stash to request object
        setattr(request, 'stash', stash)
        return None

Let's analyze what needs to be tested:
1. Assert that if the stashed data exists in the session, it should be set as an attribute of the request
2. Assert that if the stashed data doesn't exist in the session, an empty stash is created and attached to the request object
3. Assert that all attributes of the stash can be accessed

How about dependencies? What do we need in order to write this test?
- StashMiddleware class (this can be easily imported)
- request object as an argument in process_request(). This one is a bit harder to obtain, and since we are writing a unit test, let's just mock it.

We are now ready to write the test


from django.test import TestCase
from mock import Mock
from bugfreeapp.middleware import StashMiddleware, Stash

class StashMiddlewareTest(TestCase):

    def setUp(self):
        self.middleware = StashMiddleware()
        self.request = Mock()
        self.request.session = {}

This sets up an instance of StashMiddleware and mocks a request. I'm using Michael Foord's mock library to assist me with this. Since we know session is a dictionary like object, we can mock it with an empty dictionary.


    def test_process_request_without_stash(self):
        self.assertIsNone(self.middleware.process_request(self.request))
        self.assertIsInstance(self.request.stash, Stash)

    def test_process_request_with_stash(self):
        data = {'foo': 'bar'}
        self.request.session = {'stashed_data': data}
        self.assertIsNone(self.middleware.process_request(self.request))
        self.assertIsInstance(self.request.stash, Stash)
        self.assertEqual(self.request.stash.foo, 'bar')

The first test asserts that (without stashed data in the session):
- process_request returns None
- Stash object has been attached to request

The second test asserts that:
- process_request returns None
- Dictionary containing data in session is unpacked and used to create a Stash object.
- Stash attributes can be accessed

In both cases, we assert for a return value of process_request. This might sound like a redandunt thing to test for but it actually helps us to identify regressions. Knowing that process_request returns None, we don't have to worry about this middleware skipping the subsequent middlewares.

Tips

  • Not all tests can be written with django.test.client.Client.
  • Keep your dependencies for unit tests as light as possible, use mocks.
  • Write unit tests that run fast. Don't test ORM or network calls, try using mock.patch instead
  • Revisit your code if you have a hard time trying to set up dependencies, that normally indicates that the code is too coupled.

Like the sound of how we work? Check out our Careers Page!