Applying natural sort to a Django CharField with PostgreSQL
When sorting strings, databases typically use lexicographical (alphabetical) ordering, which can produce unexpected results when dealing with strings that contain numbers.
Consider the following Django model:
from django.db import models
class Chapter(models.Model):
title = models.CharField(max_length=255)
If you have chapters with titles like "Chapter 1"
, "Chapter 10"
, "Chapter 2"
, the default Django ordering will sort them alphabetically:
>>> Chapter.objects.order_by('title')
<QuerySet [<Chapter: Chapter 1>, <Chapter: Chapter 10>, <Chapter: Chapter 2>]>
Natural sort order addresses this by treating numeric parts within strings as numbers rather than text, producing the more intuitive order: "Chapter 1"
, "Chapter 2"
, "Chapter 10"
.
Natural sort in PostgreSQL
Unfortunately, Django doesn’t have a built-in method to apply natural sort to a queryset, as this is typically not provided by the supported databases. You could use a custom function with sorted()
in Python, but if you’re using PostgreSQL there is a way to apply this sort at the database level using the collation feature.
What is a collation?
A collation is a set of rules for comparing and sorting strings. It is used to determine the order of strings in a database. The default collations that come with a typical PostgreSQL installation do not have natural sort, so we need to create a custom collation.
Creating a new collation
The CREATE COLLATION
statement is used to create a new collation:
CREATE COLLATION "natural" (provider = icu, locale = 'en-u-kn-true');
"natural"
is the name of this collation; you can use any descriptive name.provider
specifies the library providing the sorting rules. We must useicu
(International Components for Unicode) because it supports the natural sort order, and the defaultlibc
provider does not.locale
specifies the sorting rules from ICU to apply, using the BCP 47 standard.'en-u-kn-true'
:en
is the primary subtag and specifies the language,-u-
is the extension subtag to extend the language with specific behavior, and thekn-true
attribute enables numeric ordering, which treats numbers in strings as numeric values for sorting.
Instead of executing this SQL statement directly, we can use a Django migration to create the collation.
Django migration with CreateCollation
operation
-
Create an empty migration file with a descriptive name:
$ python manage.py makemigrations --empty --name create_collation_for_natural_sort
-
Add the CreateCollation operation to the migration file:
from django.contrib.postgres.operations import CreateCollation class Migration(migrations.Migration): ... operations = [ CreateCollation( name="natural", provider="icu", locale="en-u-kn-true", ), ..., ]
-
Run the migration:
$ python manage.py migrate
Using a collation to sort in Django
You can use the custom collation by using the database function Collate()
to order a queryset:
>>> from django.db.models.functions import Collate
>>> Chapter.objects.order_by(Collate("title", "natural"))
<QuerySet [<Chapter: Chapter 1>, <Chapter: Chapter 2>, <Chapter: Chapter 10>]>
Alternatively, you can set db_collation=
argument on the CharField
to apply the collation to the field:
class Chapter(models.Model):
title = models.CharField(max_length=255, db_collation="natural")
Don’t forget to create and run the migration for this model change. Once applied, the natural sort order will be used automatically whenever you sort by the title
field:
>>> Chapter.objects.order_by("title")
<QuerySet [<Chapter: Chapter 1>, <Chapter: Chapter 2>, <Chapter: Chapter 10>]>