Django Natural Key
I have found that many Django developers I have talked to are either unaware
of the natural_key
method that comes with Django models, and the
get_by_natural_key
method in model managers, or they do not know
how to use them.
I think one of the most amazing aspects of Django is the ability to automatically create the database right after defining the model and quickly enter data through the Django admin interface. During development, we add additional fields to the model, and as the model grows, the number of fields we enter through the admin panel also increases.
In certain situations, we use realistic data to decide on the HTML design of the page. For example, while creating a blog application, we enter a few blog entries. These entries have a title and a body, with the body possibly consisting of 4-5 paragraphs.
Sometimes, to avoid creating additional migrations, and since the project hasn’t been deployed yet, we can delete/drop the local development database and migration files and create the migrations from scratch.
In the end, it becomes necessary to repeatedly fill and empty the database. For these situations, we use Django’s fixture1 feature2.
For example, let’s say we have a model named Category
:
import uuid
from django.db import models
class Category(models.Model):
id = models.UUIDField(
primary_key=True,
default=uuid.uuid4,
editable=False,
)
title = models.CharField(max_length=255)
def __str__(self):
return self.title
Let’s say we have entered some data and we want to create a fixture for this model:
python manage.py dumpdata coders.category # coders is the name of app
# category is the name of model
Django outputs the dump result as json
to console:
[
{
"model": "coders.category",
"pk": "0d40f3ad-a500-4717-a0c4-cbaf880306df",
"fields": {
"title": "C++"
}
},
{
"model": "coders.category",
"pk": "1999fd35-7513-4b5d-add1-11db4d073cf3",
"fields": {
"title": "PHP"
}
},
{
"model": "coders.category",
"pk": "24786be1-7c4e-4923-baf6-54f61fe70976",
"fields": {
"title": "Bash"
}
}
]
Let’s say we have a Post
model and a category
field that is linked to the
Category
model with a ForeignKey:
import uuid
from django.db import models
class Post(models.Model):
id = models.UUIDField(
primary_key=True,
default=uuid.uuid4,
editable=False,
)
category = models.ForeignKey(
to='Category',
on_delete=models.CASCADE,
related_name='posts',
)
title = models.CharField(max_length=255)
def __str__(self):
return self.title
Similarly, let’s serialize it to JSON
using dumpdata
:
python manage.py dumpdata coders.post
Output:
[
{
"model": "coders.post",
"pk": "19b20f6b-b352-4152-8889-ec6c9118e28c",
"fields": {
"category": "1999fd35-7513-4b5d-add1-11db4d073cf3",
"title": "Modern PHP Frameworks"
}
},
{
"model": "coders.post",
"pk": "40cfc47f-bae2-4dff-a812-9fafea349799",
"fields": {
"category": "24786be1-7c4e-4923-baf6-54f61fe70976",
"title": "Bash Scripting Essentials"
}
},
{
"model": "coders.post",
"pk": "4cb31890-3c7b-461d-b87e-98131d0ee150",
"fields": {
"category": "0d40f3ad-a500-4717-a0c4-cbaf880306df",
"title": "Templates and Generic Programming in C++"
}
}
]
When we look at the output, we see that the pk
as primary key for Post
model and the category
field in the foreign key relationship contain the
pk
value as Category
model’s pk
.
The question is; when restoring this dump data, what order should we follow?
Should we load the dump of the Category
model first and then the dump of the
Post
model? Or is it the other way around?
What if the Post
model has other Foreign Keys or even Many to Many
relationships? In what order will we restore them? And will these restore
operations be idempotent?
Natural Keys to the Rescue!
In fact, the natural key strategy has been part of Django since the
beginning. If you have used Django’s built-in User
model and serialized with
dumpdata
using the --natural-primary
and --natural-foreign
arguments,
you might have noticed an interesting field in the output:
[
{
"model": "auth.user",
"fields": {
"password": "pbkdf2_sha256$720000$VrtuqbddGBXZxssR5dszGV$JgsU4a8sQnTGQ8RlND36CeXuTlZHugN3nID5wxNF+Nw=",
"last_login": "2024-05-19T18:08:04.481Z",
"is_superuser": true,
"username": "vigo",
"first_name": "Uğur",
"last_name": "Özyılmazel",
"email": "vigo@******",
"is_staff": true,
"is_active": true,
"date_joined": "2024-05-18T20:44:02.729Z",
"groups": [
],
"user_permissions": [
]
}
},
{
"model": "auth.user",
"fields": {
"password": "pbkdf2_sha256$720000$VrtuqbddGBXZxssR5dszGV$JgsU4a8sQnTGQ8RlND36CeXuTlZHugN3nID5wxNF+Nw=",
"last_login": "2024-05-18T20:45:04.615Z",
"is_superuser": true,
"username": "turbo",
"first_name": "Tunç",
"last_name": "Dindaş",
"email": "turbo@******"",
"is_staff": false,
"is_active": true,
"date_joined": "2024-05-18T20:44:02.729Z",
"groups": [
],
"user_permissions": [
]
}
}
]
Did you see a field named pk
? No. Now, let’s add an author
field to the
Post
model and take another dump:
import uuid
from django.conf import settings
class Post(models.Model):
id = models.UUIDField(
primary_key=True,
default=uuid.uuid4,
editable=False,
)
category = models.ForeignKey(
to='Category',
on_delete=models.CASCADE,
related_name='posts',
)
title = models.CharField(max_length=255)
author = models.ForeignKey(
to=settings.AUTH_USER_MODEL,
on_delete=models.CASCADE,
related_name='posts',
)
def __str__(self):
return self.title
A snippet from the data we dumped:
[
{
"model": "coders.post",
"pk": "19b20f6b-b352-4152-8889-ec6c9118e28c",
"fields": {
"category": "1999fd35-7513-4b5d-add1-11db4d073cf3",
"title": "Modern PHP Frameworks",
"author": 5
}
},
{
"model": "coders.post",
"pk": "1090da3a-af48-47dc-8041-c282b84b3d04",
"fields": {
"category": "f7d1ff24-c436-4ef5-bfb5-c1e8bcb64942",
"title": "Text Processing with Perl",
"author": 6
}
},
{
"model": "coders.post",
"pk": "17b0eea1-0ce5-4bc2-b4c2-13e1fd073516",
"fields": {
"category": "ea006dc9-7267-4d03-9541-37d42cdbefc1",
"title": "Introduction to JavaScript ES6",
"author": 4
}
}
]
Now, the question is: when restoring the author
and category
fields
related to this model with loaddata, what order should we follow? The
record with user ID 4
(author
) should be inserted into the user table with
ID 4 during the restore process. Who guarantees this?
Let’s find out the unique fields of User
model:
[
f.name
for f in User._meta.get_fields()
if getattr(f, 'unique', None) and f.get_internal_type() != 'AutoField'
]
['username']
Now let’s find out the username
for user ids 4,5,6
:
User.objects.values_list('id', 'username').filter(id__in=[4,5,6])
<QuerySet [(4, 'flatliners'), (5, 'ezelozy'), (6, 'yesimfo')]>
Why not use username
instead of id
in fixture? There can be only one
flatliners
or ezelozy
or yesimfo
right? How do we accomplish that? To
avoid such primary key confusions, Django provides us with an excellent model
instance method: natural_key
. If we look at
django/contrib/auth/base_user.py
:
class AbstractBaseUser(models.Model):
# fields, properties
#
def natural_key(self):
return (self.get_username(),)
So, if we ask, what is the natural key for user ID 4
?
User.objects.get(id=4).natural_key()
('flatliners',)
In the same file:
class BaseUserManager(models.Manager):
# methods, etc...
#
def get_by_natural_key(self, username):
return self.get(**{self.model.USERNAME_FIELD: username})
In the help text of dumpdata
there are two options:
python manage.py dumpdata --help
--natural-foreign Use natural foreign keys if they are available.
--natural-primary Use natural primary keys if they are available.
If the model manager has get_by_natural_key
method use it and generate
serialization with using natural keys instead of id
values. So, if we
update our models:
class CategoryManager(models.Manager):
def get_by_natural_key(self, title):
return self.get(title=title)
class Category(models.Model):
id = models.UUIDField(
primary_key=True,
default=uuid.uuid4,
editable=False,
)
title = models.CharField(max_length=255, unique=True)
objects = CategoryManager()
def __str__(self):
return self.title
def natural_key(self):
return (self.title,)
class PostManager(models.Manager):
def get_by_natural_key(self, title, category_nk):
category = Category.objects.get_by_natural_key(*category_nk)
return self.get(title=title, category=category)
class Post(models.Model):
id = models.UUIDField(
primary_key=True,
default=uuid.uuid4,
editable=False,
)
category = models.ForeignKey(
to='Category',
on_delete=models.CASCADE,
related_name='posts',
)
title = models.CharField(max_length=255)
author = models.ForeignKey(
to=settings.AUTH_USER_MODEL,
on_delete=models.CASCADE,
related_name='posts',
)
objects = PostManager()
def __str__(self):
return self.title
def natural_key(self):
return (self.title, self.category.natural_key())
Serialization / Deserialization
natural_key
is used to define a key (more humane) instead of ID
presentation. This method should return a tuple that uniquely
identifies an instance of your model using fields other than the primary
key. It is used in serialization process.
get_by_natural_key
is used to retrieve an instance of your model using the
natural key fields which should be implemented in model’s manager. It is used
in deserialization process.
In our example, author
field is a foreign key to User
model and User
model is already shipped with this strategy. So, if we call the dumpdata
:
python manage.py dumpdata coders.post --indent 4 --natural-foreign --natural-primary
The result looks like this:
[
{
"model": "coders.post",
"fields": {
"category": [
"JavaScript"
],
"title": "Asynchronous JavaScript",
"author": [
"flatliners"
]
}
},
{
"model": "coders.post",
"fields": {
"category": [
"Perl"
],
"title": "Text Processing with Perl",
"author": [
"yesimfo"
]
}
},
{
"model": "coders.post",
"fields": {
"category": [
"PHP"
],
"title": "Modern PHP Frameworks",
"author": [
"ezelozy"
]
}
}
]
Now, when Django loads the fixture with loaddata
, it uses natural keys
instead of ID values to find the category
and author
. Pretty amazing,
right?
We can also arrange dependencies during serialization. If we look at Django’s documentation:
class Book(models.Model):
name = models.CharField(max_length=100)
author = models.ForeignKey(Person, on_delete=models.CASCADE)
def natural_key(self):
return (self.name,) + self.author.natural_key()
Book
’s natural key is dependent on Person
’s natural key. Therefore Person
model should be serialized before the Book
model. All we need is to add
natural_key.dependencies
declaration to Book
model:
class Book(models.Model):
name = models.CharField(max_length=100)
author = models.ForeignKey(Person, on_delete=models.CASCADE)
def natural_key(self):
return (self.name,) + self.author.natural_key()
natural_key.dependencies = ['example_app.person']
The serialization order will be as follows: first, the Person
model, then the
Book
model will be serialized. Thanks to natural_key.dependencies
Finally, if you frequently need fixtures and perform load/dump operations like I do, you can confidently transfer data from one place to another without struggling with integrity errors. You can prepare fresh initial data for your project, and you can even quickly write and load these fixtures manually.