Custom field accessor database performance #2614

Closed
opened 2025-12-29 18:20:25 +01:00 by adam · 3 comments
Owner

Originally created by @paravoid on GitHub (May 14, 2019).

Environment

  • Python version: 3.5.3
  • NetBox version: 2.5.12

Use Case

One of the Netbox reports that we've written, needs to basically look at multiple Devices, and for each of them check various fields against another database for consistency. Some of them are Netbox's built-in fields, some of them are custom fields. Basically, semi-realistic example:

def test_devices_match(self):
    devices = Device.objects.all()  # or filter(...)
    for device in devices:
        other = fetch_from_other_data_source(device.serial)

        if device.asset_tag != other["asset_tag"]:
            self.log_failure(device, "Asset tag does not match")

        if device.cf()["ticket"] != other["ticket"]:
            self.log_failure(device, "Ticket does not match")

Observed behavior

Netbox issues hundreds of database queries, making this inefficient, slow and tough on the database. The .cf() accessor seems to be the culprit, as it's making a lot of queries for each of the ContentTypes etc., for each of those Device objects.

Proposed behavior

We've worked around that, with this piece of code:

devices = devices.prefetch_related("custom_field_values__field")
[...]
    netbox_ticket = None
    for cfv in device.custom_field_values.all():
        if cfv.field.name == "ticket":
            netbox_ticket = cfv.value
            break
[...]

This reduces the hotpath queries to just 1 big one, making this code path about 20-30x more efficient.

I think this could be generalized, and potentially replace the .cf() accessor.

I'm unaware of the history of .cf(), and it is a complicated piece of code, so I'm a bit hesitant diving in and submitting a PR there. Would love some guidance :)

Originally created by @paravoid on GitHub (May 14, 2019). ### Environment * Python version: 3.5.3 * NetBox version: 2.5.12 ### Use Case One of the Netbox reports that we've written, needs to basically look at multiple Devices, and for each of them check various fields against another database for consistency. Some of them are Netbox's built-in fields, some of them are custom fields. Basically, semi-realistic example: ```python def test_devices_match(self): devices = Device.objects.all() # or filter(...) for device in devices: other = fetch_from_other_data_source(device.serial) if device.asset_tag != other["asset_tag"]: self.log_failure(device, "Asset tag does not match") if device.cf()["ticket"] != other["ticket"]: self.log_failure(device, "Ticket does not match") ``` ### Observed behavior Netbox issues hundreds of database queries, making this inefficient, slow and tough on the database. The `.cf()` accessor seems to be the culprit, as it's making a lot of queries for each of the ContentTypes etc., for each of those Device objects. ### Proposed behavior We've worked around that, with this piece of code: ```python devices = devices.prefetch_related("custom_field_values__field") [...] netbox_ticket = None for cfv in device.custom_field_values.all(): if cfv.field.name == "ticket": netbox_ticket = cfv.value break [...] ``` This reduces the hotpath queries to just 1 big one, making this code path about 20-30x more efficient. I think this could be generalized, and potentially replace the `.cf()` accessor. I'm unaware of the history of `.cf()`, and it is a complicated piece of code, so I'm a bit hesitant diving in and submitting a PR there. Would love some guidance :)
adam added the status: acceptedtype: feature labels 2025-12-29 18:20:25 +01:00
adam closed this issue 2025-12-29 18:20:25 +01:00
Author
Owner

@jeremystretch commented on GitHub (May 17, 2019):

Related to #3190

@jeremystretch commented on GitHub (May 17, 2019): Related to #3190
Author
Owner

@jeremystretch commented on GitHub (May 29, 2019):

First, I should point out that cf was recently changed from a method to a property in #3190. Just something to be aware of with the upcoming v2.5.13 release.

There are a few opportunities for improvement here. First, we have this line in get_custom_fields():

fields = CustomField.objects.filter(obj_type=content_type)

Here, we're retrieving all applicable custom fields for the current instance. (This is needed to ensure we also represent fields which have no value assigned.) This line is called each time cf is accessed: not ideal. We should take a look at caching these similar to how ContentTypes are cached, however this will necessitate introducing a new manager for custom field models. This is a large enough undertaking to warrant its own issue.

Further down, we have this line:

values = CustomFieldValue.objects.filter(obj_type=content_type, obj_id=self.pk).select_related('field')

This forfeits any benefit from having pre-fetched values on the parent queryset. We should be referencing self.custom_field_values.all() instead.

And finally there's the cf property itself, which is currently calling get_custom_fields() on each access:

return {field.name: value for field, value in self.get_custom_fields().items()}

This means that accessing e.g. obj.cf.foo and obj.cf.bar will result in two separate queries. Instead, we can can cache the results from the first call to get_custom_fields() on the instance, and use that for successive references.

@jeremystretch commented on GitHub (May 29, 2019): First, I should point out that `cf` was recently changed from a method to a property in #3190. Just something to be aware of with the upcoming v2.5.13 release. There are a few opportunities for improvement here. First, we have this line in `get_custom_fields()`: ``` fields = CustomField.objects.filter(obj_type=content_type) ``` Here, we're retrieving all applicable custom fields for the current instance. (This is needed to ensure we also represent fields which have no value assigned.) This line is called each time `cf` is accessed: not ideal. We should take a look at caching these similar to [how ContentTypes are cached](https://github.com/django/django/blob/master/django/contrib/contenttypes/models.py#L34), however this will necessitate introducing a new manager for custom field models. This is a large enough undertaking to warrant its own issue. Further down, we have this line: ``` values = CustomFieldValue.objects.filter(obj_type=content_type, obj_id=self.pk).select_related('field') ``` This forfeits any benefit from having pre-fetched values on the parent queryset. We should be referencing `self.custom_field_values.all()` instead. And finally there's the `cf` property itself, which is currently calling `get_custom_fields()` on each access: ``` return {field.name: value for field, value in self.get_custom_fields().items()} ``` This means that accessing e.g. `obj.cf.foo` and `obj.cf.bar` will result in two separate queries. Instead, we can can cache the results from the first call to `get_custom_fields()` on the instance, and use that for successive references.
Author
Owner

@jeremystretch commented on GitHub (May 29, 2019):

I've pushed 823257c to address the last two points above, and opened #3226 for the first one.

@jeremystretch commented on GitHub (May 29, 2019): I've pushed 823257c to address the last two points above, and opened #3226 for the first one.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/netbox#2614