Before working with Django at Kogan I used SQLAlchemy. One of the many features I liked about SQLAlchemy was you had the freedom to join tables on any clause. This is especially useful for when you have a not-quite-normal schema and the data almost matches, for example, matching a denormalised ID from different systems.
The trick in Django is to use the undocumented ForeignObject
(the base class of ForeignKey
) which allows for more flexibility when joining. Using the ForeignObject
on its own will attempt to create a new column in the database which we don't want. Setting private_only=True
will let us use a "virtual column"!
Here's a snippet we use to set up joins:
class Relationship(models.ForeignObject): """ Create a django link between models on a field where a foreign key isn't used. This class allows that link to be realised through a proper relationship, allowing prefetches and select_related. """ def __init__(self, model, from_fields, to_fields, **kwargs): super().__init__( model, on_delete=models.DO_NOTHING, from_fields=from_fields, to_fields=to_fields, null=True, blank=True, **kwargs, ) def contribute_to_class(self, cls, name, private_only=False, **kwargs): # override the default to always make it private # this ensures that no additional columns are created super().contribute_to_class(cls, name, private_only=True, **kwargs)
This keeps the original column intact (so your production code won’t need a huge refactor) and allows you to prefetch or select_related other attributes off the referenced table.
Here’s a hypothetical situation where you might use this. Say you’re working on a legacy system (because you’d never make these mistakes now!). You’ve got a table customers
which has customer_number
which was generated by an external system. You’ve also got another table sales
which uses customer_number
as a foreign key. Unfortunately, customer_number
might reference something that doesn’t exist in the customers
table as it was dated a long time ago.
CREATE TABLE customers (
customer_number VARCHAR(100) NOT NULL UNIQUE,
name VARCHAR(100)
);
CREATE TABLE sales (
id int NOT NULL,
customer_number VARCHAR(100) NOT NULL,
PRIMARY KEY (id)
);
INSERT INTO customers (customer_number, name) VALUES ("CUST01", "Steven"), ("CUST02", "Amy");
INSERT INTO sales (id, customer_number) VALUES (1, "CUST01"), (2, "EXT-01");
How could you perform a join on these?
Let’s start with raw SQL.
SELECT id, name FROM sales LEFT JOIN customers ON sales.customer_number = customers.customer_number;
In SQLAlchemy, you could use a custom join condition:
q = session.query(Sales).join(Customer, Sales.customer_number == Customer.customer_number)
In Django, you can now do this:
Class Sale(models.Model):
customer_number = models.CharField(max_length=100)
customer_reference = Relationship("Customer", from_fields=["customer_number"], to_fields=["customer_number"])
Sale.objects.values_list("id", "customer_reference__name")
We've found this to be very useful for our own legacy schemas by being able to optimise blocks where multiple queries were previously necessary.