(TODO)
To be able to talk about code organisation, it's worth defining some common terminology. These definitions are taken from domain-driven design which, although it isn't possible to fully apply in Django, provides some valuable techniques we can use.
The reason DDD can't be fully applied the key pattern in DDD of separating your model from persistence so when the model changes your database doesn't necessarily have to, and vice versa. You'll often see this called the repository pattern. This has quite a number of benefits such as exposing behaviour rather than data, allowing you to optimise storage independently of your model, or even change where data is stored (perhaps into a separate service).
However, Django idiomatically merges the model and persistence layers with its ORM. It would be possible to treat the classes derived from django.models.Model
as being repositories and only ever using them to save or retrieve data, then expose plain Python classes internally. However, that would negate many of the reasons to use Django such as the auto-generated admin forms and suchlike.
This has implications around architecture because it means you're essentially restricted to a layered architecture rather than the more modern (and arguably superior) 'onion' architecture (aka ports and adapters, or hexagonal architecture).
An entity is an object whose identity is defined by an identifier, rather than by its data. This identifier may be the database primary key, or a surrogate identifier such as a policy number. Entities are often mutable, although this is not a requirement.
Many of the most common entities we deal with such as Policy
, Product
and User
are entities because their identifier is what defines them; a User
does not become a different user if they change their email address, for example.
A value object is an object whose identity is defined solely by its data. A common example is a string
as although there may be multiple instances of the string "hello world"
in memory we consider them to be the same for all intents and purposes. Value objects are always immutable, as any change to the data is effectively a different value object.
Unfortunately, efficient use of relational databases requires identifiers, so there are a couple of patterns that have evolved for storing value object.
The User
entity shows a common approach when there is a 1:1 relationship between an entity and a value object it contains: The user's address is stored inline in the user table as individual fields, so an Address
property can be synthesised, for example:
@dataclass
class Address:
line1: str
# ...
postcode: str
class User(Model):
@cached_property
def address(self) -> Address:
return Address(self.address_line1, ..., self.address_postcode)
The FinancialTransaction
entity shows the pattern to use where there is a 1:many relationship between the entity and its Entry
value objects: The value objects are stored in a separate table and then reference their parent object. At first glance the Entry
objects might look like entities because they have an identifier, but it's only their data that matters so they are really value objects; their identifier exists only to help with database structure.
An aggregate is one or more entities that logically form a single unit. In any aggregate there is a 'top level' entity, known as the aggregate root, which is responsible for enforcing any invariants in the entire aggregate.
Most entities will also be aggregate roots; larger aggregates composed of multiple entities are comparatively rare.
To make code manageable and maintainable, most large codebases are separated into different 'layers' which each have their own responsibility. Within a Django application, the layers we would typically have are, from bottom to top:
- Infrastructure
- Model & Persistence
- Domain Services
- Application Services
- Presentation
Infrastructure, in the application sense, is the set of basic facilities that other code can build on such as signals, queues, logging, metrics, and so on. This is arguably less of a layer than supporting code that sits alongside all other layers.
This layer has the responsibility of ensuring that all aggregates are valid by enforcing their invariants, and saving/retrieving data from the database. Any domain logic that is related to a single aggregate should live in this layer.
Domain services are used as a container for domain logic that relates to multiple aggregates, because the logic does not belong in any particular aggregate. Domain services, along with the model and persistence, form the domain model.
Application services are not part of the domain, but are used to orchestrate the domain into higher level processes. An example of this might be a renewals service which would orchestrate actions such as taking payment, renewing the policy, and sending notifications.
The presentation layer includes anything that exposes the application to the outside world, such as a user interface or API. The presentation layer should typically only use application services, although in some cases may use the domain model directly.
Even domain logic that appears trivial is worth encapsulating in a method.
This is best illustrated with an example. The Policy
model has a renewals_switch_to
property which is an optional Product
that should be used on renewals. To determine the product to use for renewals seemed as easy as the following, which was used in a number of places:
renewal_product = policy.renewals_switch_to or policy.product
However, the logic changed and this was subsequently updated in a number of places to also check things like whether new users were permitted, or whether the new product allowed renewals. Unfortunately it wasn't changed consistently everywhere, which meant that some places had different behaviour.
If this had been encapsulated in a method, then there would be a single source of truth for which product should be renewed to, and only one place would need to be changed:
class Policy(Model):
def renewal_product(self) -> Optional[Product]:
product = self.product.renewals_switch_to or self.product
return product if product.allow_new_users else None
You'll notice that this method uses two different aggregates, Policy
and Product
, so the method should arguably be in a domain service because aggregates should only have methods related to themselves. An alternative implementation might look something like this:
class PolicyRenewals:
def __init__(policy: Policy) -> None
self.policy = policy
def renewal_product(self) -> Optional[Product]:
product = self.policy.product.renewals_switch_to or self.policy.product
return product if product.allow_new_users else None
In this case either approach seems reasonable, because no properties of the Product
aggregate are used, and too many tiny domain services can hurt discoverability. However, if the method started getting more complex and using details of the product, or there were a set of related renewals methods, then it would make sense to extract them into a cohesive domain service.
There are a fair number of articles that explain why sequential ids might be an issue but, to summarise:
- Avoids information disclosure
- Stops entity enumeration -> IDOR
- Less prone to mistakes while querying DB
Don't:
class Entity(models.Model):
# Django adds a sequential ID by default
...
class Entity(models.Model):
id = models.AutoField(primary_key=True)
Do:
class Entity(models.Model):
id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)