Remove NaturalOrderingManager #3096

Closed
opened 2025-12-29 18:25:34 +01:00 by adam · 12 comments
Owner

Originally created by @jeremystretch on GitHub (Dec 27, 2019).

Originally assigned to: @jeremystretch on GitHub.

Proposed Changes

Replace NaturalOrderingManager with a simpler and more efficient approach: ordering first by string length and then by string. This article provides a concise example.

Justification

NaturalOrderingManager was introduced some time ago to effect the natural ordering of various models. (For example, ensure that Router10 appears in a list before Router2; the default alphabetic ordering does not do this). It works by splitting a field into its leading integers (if any), middle part, and trailing integers (if any), and ordering the three discrete values independently.

Although this approach works, it incurs significant performance penalty, which is visible when inspecting SQL queries. Casual testing with the Device model saw a drop from ~66ms to ~7ms per query using the length-based approach. Further, ordering is removed entirely during COUNT() queries as we are no longer modifying the default queryset returned by the manager.

I believe the only drawback to this approach is that we'll lose natural ordering on integer prefixes, which seems reasonable given that it's not a very common requirement, and ordering on trailing integers is presumably a much more common use case.

Originally created by @jeremystretch on GitHub (Dec 27, 2019). Originally assigned to: @jeremystretch on GitHub. ### Proposed Changes Replace `NaturalOrderingManager` with a simpler and more efficient approach: ordering first by string length and then by string. [This article](https://www.copterlabs.com/natural-sorting-in-mysql/) provides a concise example. ### Justification [`NaturalOrderingManager`](https://github.com/netbox-community/netbox/blob/develop/netbox/utilities/managers.py#L9) was introduced some time ago to effect the natural ordering of various models. (For example, ensure that Router10 appears in a list _before_ Router2; the default alphabetic ordering does not do this). It works by splitting a field into its leading integers (if any), middle part, and trailing integers (if any), and ordering the three discrete values independently. Although this approach works, it incurs significant performance penalty, which is visible when inspecting SQL queries. Casual testing with the Device model saw a drop from ~66ms to ~7ms per query using the length-based approach. Further, ordering is removed entirely during `COUNT()` queries as we are no longer modifying the default queryset returned by the manager. I believe the only drawback to this approach is that we'll lose natural ordering on integer prefixes, which seems reasonable given that it's not a very common requirement, and ordering on trailing integers is presumably a much more common use case.
adam added the status: acceptedtype: feature labels 2025-12-29 18:25:34 +01:00
adam closed this issue 2025-12-29 18:25:34 +01:00
Author
Owner

@steffann commented on GitHub (Dec 27, 2019):

I like the idea, but it doesn't work in some common cases. For example:

sander=# select * from ordering_test1 order by length(alnum), alnum;
   alnum    |  num
------------+--------
 ge-1/0/2   |  10002
 ge-2/0/2   |  20002
 ge-0/10/0  |   1000
 ge-1/0/10  |  10010
 ge-1/10/0  |  11000

Besides: Juniper EX and QFX change the interface name based on the type of transceiver, so there will be a mix of xe-0/0/0', ge-0/0/1` etc. That will not be sorted properly.

For Cisco gear with both GigabitEthernet1/1 and TenGigabitEthernet1/1 the new ordering would be nice though :)

@steffann commented on GitHub (Dec 27, 2019): I like the idea, but it doesn't work in some common cases. For example: ``` sander=# select * from ordering_test1 order by length(alnum), alnum; alnum | num ------------+-------- ge-1/0/2 | 10002 ge-2/0/2 | 20002 ge-0/10/0 | 1000 ge-1/0/10 | 10010 ge-1/10/0 | 11000 ``` Besides: Juniper EX and QFX change the interface name based on the type of transceiver, so there will be a mix of `xe-0/0/0', `ge-0/0/1` etc. That will not be sorted properly. For Cisco gear with both `GigabitEthernet1/1` and `TenGigabitEthernet1/1` the new ordering would be nice though :)
Author
Owner

@steffann commented on GitHub (Dec 27, 2019):

How about creating an extra field called _sortable_name that transforms the interface name to something the database can easily sort on?

For example left-padding every sequence of integers to a length of 4 or 5 digits (anybody has interface numbers >99999?). Or an ArrayField like:

sander=# select * from ordering_test1 order by arr;
   alnum   |  num  |   arr
-----------+-------+----------
 ge-0/0    |     0 | {0,0}
 ge-0/10/0 |  1000 | {0,10,0}
 ge-1/0/2  | 10002 | {1,0,2}
 ge-1/0/10 | 10010 | {1,0,10}
 ge-1/10/0 | 11000 | {1,10,0}
 ge-2/0/2  | 20002 | {2,0,2}
 ge-3/0    |  3000 | {3,0}

It can be automatically populated on save() and after that sorting on it wouldn't cost any performance.

@steffann commented on GitHub (Dec 27, 2019): How about creating an extra field called `_sortable_name` that transforms the interface name to something the database can easily sort on? For example left-padding every sequence of integers to a length of 4 or 5 digits (anybody has interface numbers >99999?). Or an ArrayField like: ``` sander=# select * from ordering_test1 order by arr; alnum | num | arr -----------+-------+---------- ge-0/0 | 0 | {0,0} ge-0/10/0 | 1000 | {0,10,0} ge-1/0/2 | 10002 | {1,0,2} ge-1/0/10 | 10010 | {1,0,10} ge-1/10/0 | 11000 | {1,10,0} ge-2/0/2 | 20002 | {2,0,2} ge-3/0 | 3000 | {3,0} ``` It can be automatically populated on `save()` and after that sorting on it wouldn't cost any performance.
Author
Owner

@jeremystretch commented on GitHub (Dec 27, 2019):

NaturalOrderingManager isn't used for interfaces; the Interface model has its own jumble of regex that it uses (see #3097). This is for generic natural ordering.

@jeremystretch commented on GitHub (Dec 27, 2019): NaturalOrderingManager isn't used for interfaces; the Interface model has its own jumble of regex that it uses (see #3097). This is for generic natural ordering.
Author
Owner

@steffann commented on GitHub (Dec 27, 2019):

/me was confused. Please ignore 🙂

@steffann commented on GitHub (Dec 27, 2019): /me was confused. Please ignore 🙂
Author
Owner

@jeremystretch commented on GitHub (Dec 27, 2019):

Come up with a solution for #3097 and all is forgiven! 😆

@jeremystretch commented on GitHub (Dec 27, 2019): Come up with a solution for #3097 and all is forgiven! :laughing:
Author
Owner

@jeremystretch commented on GitHub (Jan 3, 2020):

I have the code for this done in the 3799-natural-ordering branch. However, I've run into a suspected bug in Django that causes an exception when ordering by a ForeignKey to a model which orders by a function (e.g. Length()). I've refrained from submitting a PR since the code as it currently stands breaks several views.

This is on hold until I hear back regarding the bug report linked above. Worst case, it should be possible to work around this by cooking up a custom wrapper that provides a __getitem__ method to avoid the exception.

@jeremystretch commented on GitHub (Jan 3, 2020): I have the code for this done in the `3799-natural-ordering` branch. However, I've run into a [suspected bug in Django](https://code.djangoproject.com/ticket/31137) that causes an exception when ordering by a ForeignKey to a model which orders by a function (e.g. `Length()`). I've refrained from submitting a PR since the code as it currently stands breaks several views. This is on hold until I hear back regarding the bug report linked above. Worst case, it _should_ be possible to work around this by cooking up a custom wrapper that provides a `__getitem__` method to avoid the exception.
Author
Owner

@jeremystretch commented on GitHub (Jan 3, 2020):

After poking at this some more, it unfortunately doesn't seem that we can work around it with a wrapper. However, we can fall back to referencing each parent model's ordering explicitly. For instance:

    ordering = (Length('device__name'), 'device__name', Length('name'), 'name')

However, great would be needed to ensure the child's ordering for the parent never deviates from the parent's native ordering.

@jeremystretch commented on GitHub (Jan 3, 2020): After poking at this some more, it unfortunately doesn't seem that we can work around it with a wrapper. However, we can fall back to referencing each parent model's ordering explicitly. For instance: ```python ordering = (Length('device__name'), 'device__name', Length('name'), 'name') ``` However, great would be needed to ensure the child's ordering for the parent never deviates from the parent's native ordering.
Author
Owner

@jeremystretch commented on GitHub (Jan 6, 2020):

After some experimentation, I haven't been able to come up with a suitable workaround. This is a confirmed bug that has been resolved in Django 3.0 but is not eligible for back-porting to Django 2.2.

Marking this as blocked by #3848.

@jeremystretch commented on GitHub (Jan 6, 2020): After some experimentation, I haven't been able to come up with a suitable workaround. This is a [confirmed bug](https://code.djangoproject.com/ticket/30557) that has been resolved in Django 3.0 but is not eligible for back-porting to Django 2.2. Marking this as blocked by #3848.
Author
Owner

@candlerb commented on GitHub (Jan 28, 2020):

(For example, ensure that Router10 appears in a list before Router2; the default alphabetic ordering does not do this)

Presumably this should say "after" not "before".

However, unless I've missed something, I don't think that sorting by length and then by text works at all. Consider the following example:

  • bar1
  • foo1
  • bar10
  • foo10
  • switch1
  • firewall1

That list is sorted by length first, then by text. However, I would expect "firewall1" to become before "switch1", and also before "foo". I would also expect "bar10" to come before "foo1".

To optimise sorting without explicitly splitting the device name into two parts then index expressions are probably the best approach. But I think you'd need an expression which separates the input into (text part, number part) as a composite type rather than just using the length.

@candlerb commented on GitHub (Jan 28, 2020): > (For example, ensure that Router10 appears in a list _before_ Router2; the default alphabetic ordering does not do this) Presumably this should say "after" not "before". However, unless I've missed something, I don't think that sorting by length and then by text works at all. Consider the following example: * `bar1` * `foo1` * `bar10` * `foo10` * `switch1` * `firewall1` That list is sorted by length first, then by text. However, I would expect "firewall1" to become before "switch1", and also before "foo<any>". I would also expect "bar10" to come before "foo1". To optimise sorting without explicitly splitting the device name into two parts then [index expressions](https://www.postgresql.org/docs/9.6/indexes-expressional.html) are probably the best approach. But I think you'd need an expression which separates the input into (text part, number part) as a [composite type](https://www.postgresql.org/docs/9.6/rowtypes.html) rather than just using the length.
Author
Owner

@jeremystretch commented on GitHub (Jan 28, 2020):

I don't recall exactly what I was doing earlier, but it seemed to work at the time. It's likely I didn't have sufficiently diverse test data.

This should be a common enough problem that I'd expect to find plenty of existing solutions for reference. Might just need to dig a bit more.

@jeremystretch commented on GitHub (Jan 28, 2020): I don't recall exactly what I was doing earlier, but it seemed to work at the time. It's likely I didn't have sufficiently diverse test data. This should be a common enough problem that I'd expect to find plenty of existing solutions for reference. Might just need to dig a bit more.
Author
Owner

@jeremystretch commented on GitHub (Feb 4, 2020):

After poking at this some more, I think our best chance at an efficient, maintainable solution will be to normalize the sort field at write time and store that value in a separate column. For example, Router12 would become router00000012, which would get stored in a field separate from name.

While this is trivial to achieve inside a model's save() method, I would like to preserve create() and bulk_create() functionality to allow for better performance when populating test data. I haven't tried it yet, but we should be able to leverage pre_save() on the sort field to populate the normalized value on demand.

I envision something like this:

class NaturalSortField(CharField):
    def pre_save(model_instance, add):
        return naturalize(getattr(model_instance, self.source_field))

class Device(Model):
    name = models.CharField(max_length=64)
    _sort = models.NaturalSortField(source_field='name')

This would require a one-time migration to calculate all normalized values in bulk, but that should not be a disruptive process (we've performed similar operations in prior releases).

@jeremystretch commented on GitHub (Feb 4, 2020): After poking at this some more, I think our best chance at an efficient, maintainable solution will be to normalize the sort field at write time and store that value in a separate column. For example, `Router12` would become `router00000012`, which would get stored in a field separate from `name`. While this is trivial to achieve inside a model's `save()` method, I would like to preserve `create()` and `bulk_create()` functionality to allow for better performance when populating test data. I haven't tried it yet, but we should be able to leverage [`pre_save()`](https://docs.djangoproject.com/en/stable/ref/models/fields/#django.db.models.Field.pre_save) on the sort field to populate the normalized value on demand. I envision something like this: ```python class NaturalSortField(CharField): def pre_save(model_instance, add): return naturalize(getattr(model_instance, self.source_field)) class Device(Model): name = models.CharField(max_length=64) _sort = models.NaturalSortField(source_field='name') ``` This would require a one-time migration to calculate all normalized values in bulk, but that should not be a disruptive process (we've performed similar operations in prior releases).
Author
Owner

@jeremystretch commented on GitHub (Feb 7, 2020):

Testing of the initial implementation is very promising:

$ git checkout develop
$ time curl -s http://localhost:8000/api/dcim/interfaces/ > /dev/null
real	0m8.958s
user	0m0.005s
sys	0m0.005s
$ git checkout 3799-natural-ordering-field
$ time curl -s http://localhost:8000/api/dcim/interfaces/ > /dev/null
real	0m0.896s
user	0m0.006s
sys	0m0.003s

I was also able to fully replicate the existing ordering logic (at least according to the tests we have in place).

@jeremystretch commented on GitHub (Feb 7, 2020): Testing of the initial implementation is _very_ promising: ``` $ git checkout develop $ time curl -s http://localhost:8000/api/dcim/interfaces/ > /dev/null real 0m8.958s user 0m0.005s sys 0m0.005s $ git checkout 3799-natural-ordering-field $ time curl -s http://localhost:8000/api/dcim/interfaces/ > /dev/null real 0m0.896s user 0m0.006s sys 0m0.003s ``` I was also able to fully replicate the existing ordering logic (at least according to the tests we have in place).
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/netbox#3096