Caching is one of the best strategies you can use when tuning the performance of your website. A lot of work goes into minimising the number of database queries for any given page, and making the queries that do hit the database efficient. But the best query is the one that's never executed at all.
There are many different types of cache:
- Browser caches that cache static assets in a user's browser.
- Proxy caches that cache web pages and static assets in data centers or edge nodes.
- Application caches that cache the result of expensive calculations in the application.
Django's cache framework provides an interface that backends can implement to act as application caches. A number of backends are provided out of the box:
- Memcached (recommended)
- Database
- Filesystem
- Local memory
- Dummy
django-redis is also extremely popular as a cache backend, and comes with a bunch of useful features.
The Benefits of Caching
Application caches in Django are used for storing rendered partial templates, fully rendered pages, and even the result of an expensive database aggregation.
The common theme is that the thing being stored is generally expensive to calculate compared to the cost of retrieving the data from a cache.
For example, rendering a navigation bar might require 3 database queries, a nested loop, and some non-trivial if-else statements. That might take "only" 30ms, but if your time budget is 100ms then you haven't got time for much else. Rather than paying the 30ms cost every time a user views a page, you might store that fragment in a cache, and return it the next time a user views a page. Getting the value from cache might cost 3ms, which would be a substantial time saving.
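For example, with Django's low-level cache API you might cache that rendered fragment like this (the key and template name here are hypothetical):

```python
from django.core.cache import cache
from django.template.loader import render_to_string

def navbar_html(user):
    # Hypothetical key/template: cache the rendered navigation fragment
    # for 5 minutes instead of paying the ~30ms render cost on every page.
    key = f"navbar:{user.pk}"
    return cache.get_or_set(
        key,
        lambda: render_to_string("includes/navbar.html", {"user": user}),
        timeout=300,
    )
```

Django's template layer also ships a {% cache %} tag that does much the same thing for template fragments.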
Caching Pitfalls
There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors.
Cache invalidation is one of the harder problems to overcome, because it requires:
- that you know that something cached has changed, and
- that you have a mechanism for clearing that cached something.
For this reason, cached values are often coupled with an expiry, called a TTL (time to live). When the time expires, the value is either purged from the cache, or the value is cleared on the next access.
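As a rough sketch of the two mechanisms working together, assuming a hypothetical Product model with an expensive calculate_price_summary() method:

```python
from django.core.cache import cache
from django.db.models.signals import post_save
from django.dispatch import receiver

from myapp.models import Product  # hypothetical app and model

PRICE_KEY = "product:{pk}:price_summary"

def get_price_summary(product):
    key = PRICE_KEY.format(pk=product.pk)
    summary = cache.get(key)
    if summary is None:
        summary = product.calculate_price_summary()  # the expensive calculation
        # A 10 minute TTL acts as a safety net if invalidation is ever missed.
        cache.set(key, summary, timeout=600)
    return summary

@receiver(post_save, sender=Product)
def invalidate_price_summary(sender, instance, **kwargs):
    # Explicit invalidation when we *know* the cached value has changed.
    cache.delete(PRICE_KEY.format(pk=instance.pk))
```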
Another, sometimes overlooked, problem is managing the size of a cache. This is of particular concern if you're storing very large strings like rendered templates, or calculated properties for a large number of objects. Memory can fill up, the process managing the cache can die, and the cache goes away. Being unable to access the cache can often mean your application server returns errors to users.
Overhead
Caching isn't free.
The most robust cache solutions are implemented as separate servers, sometimes running on the same machine but often a network hop away. You can end up paying up to a few milliseconds per request just to fetch data over the network.
Then there's the problem of storing complex objects. Strings and numbers are easy to store in a separate system; lists and class instances are not. To store and recreate complex data types we need to serialize and deserialize them, which comes with an overhead in time and memory. Django uses the pickle module for serialization.
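A quick, illustrative way to feel that overhead is to time a pickle round trip of a moderately sized list (absolute numbers will vary wildly by machine and data shape):

```python
import pickle
import timeit

class Item:
    def __init__(self, pk, name, price):
        self.pk = pk
        self.name = name
        self.price = price

items = [Item(i, f"item-{i}", i * 1.5) for i in range(1000)]

# Round-trip cost paid on every cache set/get when values are pickled.
cost = timeit.timeit(lambda: pickle.loads(pickle.dumps(items)), number=100)
print(f"100 round trips: {cost:.3f}s")
```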
Local-Memory Caching
By default Django will use the LocMemCache backend, which stores all key-values in the memory of the application. It's basically a per-process, thread-safe dictionary, and is not recommended for production because it's not particularly memory efficient.
Because the LocMemCache backend is per-process, there can be many slightly different versions of the cache across nodes and worker processes.
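For reference, a typical LocMemCache configuration looks something like this (LOCATION is just an arbitrary name for the in-process store):

```python
# settings.py
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        "LOCATION": "unique-snowflake",
        "OPTIONS": {
            "MAX_ENTRIES": 300,   # cull once this many keys exist
            "CULL_FREQUENCY": 3,  # drop roughly 1/3 of entries when culling
        },
    }
}
```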
Benefits:
- Avoids network overhead
- Great for access patterns that hit a small number of keys hundreds or thousands of times in a short period
Pitfalls:
- Still incurs serialization overhead
- Non-deterministic eviction strategy at max size
- Overuse can cause memory problems for the application server
- Cache invalidation is only local to that process
Better Local-Memory Caching
We found many places in our code base that were storing values in what were, effectively, global variables. Each location had a slightly different way of storing this data, but the purpose was always to avoid expensive re-calculation.
A lot of this data didn't make sense to store in our primary Redis cache: either it was highly local to the process (large offline Celery tasks), the network call overhead would wipe out any benefit of caching, or the value was read so often that eliminating the network overhead resulted in significant savings.
Still, these solutions were all poor re-implementations of a local memory cache, built to avoid the pitfalls of the local memory cache backend. If we could design a local memory cache without those limitations, we'd gain a useful, reusable caching component.
We began by taking the existing LocMemCache backend and changing the behaviour we weren't interested in.
Eliminating Pitfalls
The first limitation was the easiest to overcome. Since the cache is in-process, complex objects can be stored in a simple dictionary without any serialization. All calls to pickle were dropped.
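A toy illustration of the difference: with pickle, each read hands back a copy; without it, the caller gets the exact object that was stored (so cached values should be treated as read-only):

```python
import pickle

store = {}
value = {"ids": [1, 2, 3]}

# LocMemCache-style: pickled on set, unpickled on get, so each read is a copy.
store["pickled"] = pickle.dumps(value)
assert pickle.loads(store["pickled"]) is not value

# Our approach: the reference itself is stored, skipping serialization entirely.
store["raw"] = value
assert store["raw"] is value
```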
The default eviction strategy is to drop a large percentage of key-values once the cache reaches its max size. Unfortunately, this strategy can evict genuinely useful keys while leaving untouched keys around forever. It also requires iterating through the entire key space on every eviction, which is an O(N) operation. A better eviction strategy is Least Recently Used (LRU).
LRU works by keeping all keys ordered according to their last access time. When key 123 is retrieved, it's moved to the front of the queue. When a new key 321 is added, it's placed at the front of the queue. When the cache reaches its max size, the key at the very back of the queue is removed. Fresh and frequently accessed keys are kept, while old unused keys are evicted.
The easiest way of introducing LRU behaviour to the cache was to swap the caching dictionary for an LRU dictionary implementation. lru-dict was chosen as a fast C implementation, but other implementations could be used if CPython is not the target platform.
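If you haven't used it, lru-dict behaves like a size-bounded dict that evicts the least recently used key on overflow. A quick sketch of that behaviour:

```python
from lru import LRU  # pip install lru-dict

cache = LRU(3)        # hold at most 3 keys
cache["a"] = 1
cache["b"] = 2
cache["c"] = 3

_ = cache["a"]        # touching "a" makes it the most recently used key
cache["d"] = 4        # cache is full: the least recently used key ("b") is evicted

print("b" in cache)        # False
print(list(cache.keys()))  # most recently used first, e.g. ['d', 'a', 'c']
```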
Handling time-based expiration (TTL) needed to change once LRU eviction was in place, as we no longer had information about which keys were expiring. Storing the key and expiration date in a separate dictionary no longer made sense. Instead, we chose to store the expiration alongside the cached value and check it on each access:
lru_dict[key] = (value, expiration)
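In sketch form (class and method names here are simplified and hypothetical, not the backend's exact code), a stale value is simply dropped the next time it's read:

```python
import time
from lru import LRU

class ExpiringLRU:
    """Toy sketch: LRU storage with per-key expiry checked lazily on access."""

    def __init__(self, max_entries=100, default_timeout=300):
        self._lru = LRU(max_entries)
        self._default_timeout = default_timeout

    def set(self, key, value, timeout=None):
        timeout = self._default_timeout if timeout is None else timeout
        self._lru[key] = (value, time.monotonic() + timeout)

    def get(self, key, default=None):
        try:
            value, expiration = self._lru[key]
        except KeyError:
            return default
        if time.monotonic() >= expiration:
            # Stale: remove it on access rather than running a background sweeper.
            del self._lru[key]
            return default
        return value
```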
The final change was to remove key name validation. Memcached does not allow certain characters in its keys, so Django validates key names to ensure applications work across all cache backends. Validating a key is another O(N) operation, this time in the length of the key, which can add up to significant CPU overhead. Removing it got us about a 17% increase in throughput.
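In a custom backend, one way to get most of that win is to turn the validation hook into a no-op (a sketch only; in our backend the validation call is removed entirely rather than stubbed out):

```python
from django.core.cache.backends.locmem import LocMemCache

class NoValidationLocMemCache(LocMemCache):
    # Hypothetical subclass: keys never leave this process, so we skip the
    # per-character memcached compatibility check Django normally performs.
    def validate_key(self, key):
        pass
```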
There is no good way of preventing unintended memory growth in a local in-memory cache. A small number of keys can still be used to store huge lists. Inspecting the memory footprint of every value that enters the cache is expensive, and not an easy thing to do in Python anyway.
We've approached the memory growth problem with the following strategy:
- Default to a small maximum number of keys (50-100).
- Clearly document the limitations of the cache, and when it's useful.
- Don't make it the default cache! Using the local memory cache must be opt-in.
By making this cache opt-in, we're able to identify good candidates and apply targeted optimisation, without fear of a developer accidentally spraying large querysets into it.
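In practice, opt-in just means giving the cache a non-default alias in settings, so nothing touches it unless it asks for it by name. The alias and backend dotted path below are illustrative; check the repository README for the exact values:

```python
# settings.py
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/1",
    },
    # Opt-in local memory cache: only used when requested by name.
    "local": {
        "BACKEND": "lrucache_backend.LRUObjectCache",  # see the repo README for the exact path
        "TIMEOUT": 600,
        "OPTIONS": {"MAX_ENTRIES": 100},
    },
}
```

Application code then has to reach for it explicitly via django.core.cache.caches["local"], which keeps accidental use to a minimum.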
There is no good global cache invalidation solution that we were willing to implement. The tradeoff we made was to have relatively short TTLs, and to only add data that can be read stale.
The graphs below show the timing differences (lower is better) between LocMemCache, the first iteration of our LRU cache, and a second version of our LRU cache that eliminated key validation.
Set operations get extremely slow for LocMemCache as lots of evictions start to occur, due to the key iteration involved.
Results
This project began because a particular group of queries had begun to really impact our database. We determined that preloading all instances of a particular type and performing calculations in Python would be a lot more efficient than performing calculations in the database.
Because these values are used by “fat models”, it made sense to have a service layer return the data rather than have each model instance fetch or generate it. Fetching data over the network would have been extremely costly given how often our models required values from the cache. Eliminating serialization was also a large performance win: pickling lists of model instances isn't fast.
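The shape of that service layer is roughly this (model, key, and alias names are hypothetical, not our actual code):

```python
from django.core.cache import caches

from stores.models import Store  # hypothetical model

local_cache = caches["local"]  # the opt-in in-memory cache described earlier

def get_store_map():
    """Return a {store_id: Store} mapping, preloaded in a single query."""
    stores = local_cache.get("stores:by_id")
    if stores is None:
        # One query up front; callers then do lookups and calculations in
        # Python instead of issuing per-instance queries or aggregations.
        stores = {store.pk: store for store in Store.objects.all()}
        local_cache.set("stores:by_id", stores, timeout=600)
    return stores
```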
We measured a massive improvement to the CPU utilization of our database after deployment. Can you spot it?
We'll continue to selectively add small sets of stable data to the in-memory cache to take even more load off the database. We're considering data fetched by context processors first. Another big candidate is our store objects, which are used everywhere and change on the order of months or years. Monitoring application memory usage is now a lot more important, and we'll be keeping a close eye on those metrics as we scale up this cache.
Show me the code!
We've released the LRU cache backend on GitHub at https://github.com/kogan/django-lrucache-backend and published it to PyPI. There are also some performance and tuning tools, along with analysis, in the benchmarking folder. The benchmarking tools were adapted from the awesome python-diskcache project.
As mentioned before, there is a dependency on lru-dict, but any dict-like LRU implementation should work with little modification.
Happy Caching!