Cache the Config Context in the Database #3615

Closed
opened 2025-12-29 18:30:10 +01:00 by adam · 11 comments

Originally created by @dstarner on GitHub (Apr 27, 2020).

Environment

  • Python version: 3.7
  • NetBox version: 2.7.10

Proposed Functionality

[As discussed in Slack](https://networktocode.slack.com/archives/C3DQ6MZ0Q/p1587745271315900?thread_ts=1587743396.300200&cid=C3DQ6MZ0Q), it may be beneficial to cache the generated/merged config context in the database to reduce load times on bulk device list API requests.

Use Case

Fetching 1000 devices _with_ config context returned takes ~60s. _Without_ config context, the response takes ~5 seconds. This large overhead was determined to come from the generation and merging of the config contexts that apply to devices and virtual machines, which is redone on every request. Pre-caching the generated data in the database would avoid that repeated work.

Database Changes

This would require adding a `computed_config_context` field to [`ConfigContextModel`](https://github.com/netbox-community/netbox/blob/develop/netbox/extras/models.py#L841) that would contain the computed config context. The naming of this field is open to debate if something better is proposed.

The field would hold the pre-cached computed config context, and it would be rebuilt whenever an associated [`ConfigContext`](https://github.com/netbox-community/netbox/blob/develop/netbox/extras/models.py#L754) is updated, or during the `save()`/`post_save` handling of the [`ConfigContextModel`](https://github.com/netbox-community/netbox/blob/develop/netbox/extras/models.py#L841) itself.
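To make the proposal concrete, here is a minimal pure-Python sketch of the idea (plain classes standing in for the Django models; `deep_merge`, the `(weight, data)` context representation, and the rebuild hook are all hypothetical illustrations, not NetBox's actual implementation):

```python
def deep_merge(base, extra):
    """Recursively merge ``extra`` into a copy of ``base``; ``extra`` wins on conflicts."""
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


class Device:
    """Plain-Python stand-in for a ConfigContextModel subclass."""

    def __init__(self, contexts, local_context_data=None):
        # Applicable contexts as (weight, data) pairs -- a simplification.
        self.contexts = contexts
        self.local_context_data = local_context_data or {}
        self.computed_config_context = None  # the proposed cache field

    def rebuild_config_context(self):
        # Merge lowest weight first so higher weights win; local data last.
        data = {}
        for _, ctx_data in sorted(self.contexts, key=lambda c: c[0]):
            data = deep_merge(data, ctx_data)
        data = deep_merge(data, self.local_context_data)
        self.computed_config_context = data

    def save(self):
        # In Django, this rebuild would live in save() or a post_save receiver.
        self.rebuild_config_context()
```

Serializers could then return `computed_config_context` directly instead of re-running the merge on every request.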

External Dependencies

NA

adam added the status: revisions needed label 2025-12-29 18:30:10 +01:00
adam closed this issue 2025-12-29 18:30:10 +01:00

@jeremystretch commented on GitHub (Apr 27, 2020):

This proposal needs to be fleshed out more. What is the specific workflow being proposed? E.g. when do config contexts get calculated if not on-demand? How do we detect and resolve discrepancies? Is the calculation being performed in the background? What are the performance implications? Please be specific in your proposal.


@dstarner commented on GitHub (Apr 27, 2020):

I'll post a small design when I get a moment. If anyone sees this and wants to post their thoughts / comments before I flesh it out, feel free to.


@ebusto commented on GitHub (Apr 27, 2020):

This sounds great.

We are starting to use config contexts rather extensively, and have a few external services which relentlessly hammer the NetBox API. Eliminating the overhead of rendering the config context on demand would help scalability quite a bit.


@tyler-8 commented on GitHub (Apr 27, 2020):

I imagine this will look something like:

  1. Save signal on a ConfigContextModel, Device, or VM
  2. RQ Worker picks up task
  3. Call `.prerender_config_context()` method on all associated Devices/VMs (which uses the `.get_config_context()` method)
  4. Populate/overwrite `computed_config_context` field on the device or VM with the output

Serializers and views would be updated to show the contents of `computed_config_context` instead of calling `.get_config_context()` directly.
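The four steps above can be sketched with an in-memory queue standing in for RQ (`FakeQueue`, `handle_post_save`, and the trivial `get_config_context()` body are all hypothetical stand-ins):

```python
from typing import Callable, List


class FakeQueue:
    """Stand-in for an RQ queue: jobs run when a worker drains the queue."""

    def __init__(self):
        self.jobs: List[Callable[[], None]] = []

    def enqueue(self, fn, *args):
        self.jobs.append(lambda: fn(*args))

    def work(self):
        while self.jobs:
            self.jobs.pop(0)()


queue = FakeQueue()


class Device:
    def __init__(self, name):
        self.name = name
        self.computed_config_context = None

    def get_config_context(self):
        # Placeholder for NetBox's existing merge logic.
        return {"hostname": self.name}

    def prerender_config_context(self):
        # Step 4: populate/overwrite the cached field.
        self.computed_config_context = self.get_config_context()


def handle_post_save(device):
    # Steps 1-2: the save signal enqueues a background job.
    queue.enqueue(Device.prerender_config_context, device)
```

Note that until `queue.work()` runs, the cached field is stale, which is exactly the asynchrony concern raised below.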


@ebusto commented on GitHub (Apr 27, 2020):

Would the asynchronous nature of the RQ worker approach cause issues? That is, if I update a few attributes of a device, I'd expect to be able to click save and flip over to the config context tab and see the updated configuration.

Using just signals might be a better approach.


@ebusto commented on GitHub (Apr 27, 2020):

I implemented very similar functionality in one of our NetBox plugins, which allows you to group devices (and virtual machines) into services, such as "all cron servers owned by Unix Support" and "all SaltStack masters owned by Network Engineering".

Since manually adding and removing systems to and from services is tedious, it supports contexts, where you can specify criteria that are virtually the same as a configuration context's, with the addition of hostname patterns. A system that no longer meets the criteria is automatically removed from the service, and a system that was created or updated and now meets the criteria is automatically added to the service.

In order to keep the performance acceptable, I took the approach of building a query set representing the set of devices that need to be evaluated, and then evaluating each service against the query set, moving all of the work to the database.

When a `ServiceContext` is created, updated, or deleted, the query set is simply all devices (`Device.objects.all()`), but only against the specific service that was changed.

For devices that are created or updated, a signal receiver tracks the PK, and constructs the query set from all PKs once the transaction is committed. We often perform bulk updates of hundreds of devices at a time, so this query set can represent a large number of devices.

Our NetBox instance has 50k devices and 10k VMs, and with this approach the overhead is negligible.
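A rough pure-Python sketch of the PK-tracking pattern described above (the `ChangeTracker` class and `evaluate` callback are hypothetical; in Django the flush would be registered via `transaction.on_commit`):

```python
class ChangeTracker:
    """Collects PKs of saved devices and evaluates them in one pass at
    commit time, rather than doing per-device query work in the signal."""

    def __init__(self, evaluate):
        self.evaluate = evaluate  # callable taking a set of device PKs
        self.pending = set()

    def device_saved(self, pk):
        # Signal receiver: just record the PK -- cheap even for bulk updates.
        self.pending.add(pk)

    def on_commit(self):
        # One queryset over all changed devices; the work moves to the DB.
        if self.pending:
            self.evaluate(frozenset(self.pending))
            self.pending.clear()
```

Because the receiver only records PKs, a bulk update of hundreds of devices still triggers a single evaluation pass.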


@sdktr commented on GitHub (Apr 27, 2020):

> Would the asynchronous nature of the RQ worker approach cause issues? That is, if I update a few attributes of a device, I'd expect to be able to click save and flip over to the config context tab and see the updated configuration.
>
> Using just signals might be a better approach.

Agree on that. We can't trust the async update process to have finished before the next consumer requests the config_context and needs the update reflected there. But I also agree that we can't wait on a recompute of all affected data during a save.

The update signal on config_context (but also on a lot of other changes, like site/region/device, that can affect memberships!) should, IMO, do both of the following:

  1. invalidate the queryset (OR: invalidate the computed_config_context on all affected devices (old and new members of the queryset!))
  2. schedule the async task of regenerating

If the config_context of a device is requested, there should be a check on whether a valid computed config context is available. If not we should generate it on request (as current behavior).
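The invalidate-then-regenerate flow with an on-demand fallback can be sketched as follows (the `CachedContext` class and its method names are hypothetical illustrations of the proposal, not NetBox code):

```python
class CachedContext:
    """Sketch: invalidate on change, regenerate async, fall back on read."""

    def __init__(self, compute):
        self.compute = compute  # the expensive merge, injected for testing
        self.cached = None
        self.valid = False

    def invalidate(self):
        # Step 1: a ConfigContext (or site/region/device) change lands here.
        self.valid = False

    def regenerate(self):
        # Step 2: the scheduled async task eventually repopulates the cache.
        self.cached = self.compute()
        self.valid = True

    def get(self):
        # On request: serve the cache only if valid; otherwise compute now,
        # matching the current on-demand behavior.
        if not self.valid:
            self.regenerate()
        return self.cached
```

A consumer that reads between invalidation and the async rebuild simply pays the on-demand cost once, rather than seeing stale data.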

I hope we will in the future have a way of merging config_context on entities other than devices/VMs (e.g. at the site level). Storing the `computed_config_context` not with the device/VM but in a separate table (or cache) would be a future-proof way to handle this?

What should be the role of the Redis cache vs. the database for storing each of these properties (computed config context, (in)valid querysets, etc.)? The computed config context is in fact a cached/optimized copy of data, which might make it more suitable for storage in cache. On the other hand, storing it as a JSONB datatype in Postgres would open up some really cool possibilities for indexing/searching/querying _within_ the context.


@lampwins commented on GitHub (Apr 27, 2020):

I think this is an overcomplicated solution to the root problem, which is that we are not efficiently querying the ConfigContext model instances.

The reason this is so inefficient today is that [this method](https://github.com/netbox-community/netbox/blob/develop/netbox/dcim/api/serializers.py#L428) gets called on each serialized device instance, which in turn calls [this complex query logic](https://github.com/netbox-community/netbox/blob/d8cb58c74653da88d9aade40548b146a9311f5e6/netbox/extras/querysets.py#L24) _for every device_.

We should look to optimize this before going down the denormalization route.
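The optimization direction hinted at here can be illustrated in pure Python: fetch the applicable contexts once per request and evaluate membership in a single pass, instead of issuing one complex query per device (the `annotate_config_contexts` helper and its `(predicate, data)` representation are hypothetical simplifications):

```python
def annotate_config_contexts(devices, contexts):
    """Batch sketch: one pass over pre-fetched contexts for all devices,
    rather than one complex queryset evaluation per serialized device.

    ``devices`` is a list of dicts; ``contexts`` is a list of
    (applies_to, data) pairs standing in for ConfigContext filters.
    """
    results = {}
    for device in devices:
        merged = {}
        for applies_to, data in contexts:
            if applies_to(device):
                merged.update(data)  # simplification of the weighted merge
        results[device["name"]] = merged
    return results
```

In real Django terms this would mean pulling the `ConfigContext` rows once (with their assignment criteria) and annotating or joining them against the device queryset, rather than calling the per-object query path inside the serializer.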


@jeremystretch commented on GitHub (Apr 29, 2020):

@lampwins Do you want to close this issue in favor of a new one then? I want to make sure we keep the issue proposal consistent with the eventual solution to avoid confusion.


@dstarner commented on GitHub (Apr 30, 2020):

@jeremystretch I do not mind creating the new issue tied more closely to the query optimization. Feel free to close this one in the meantime.


@dstarner commented on GitHub (Apr 30, 2020):

@lampwins Interestingly enough, when I played around with this, Django Debug Toolbar noted that my SQL performance was generally okay, but that it was actually CPU time that was causing the slowness. I'm not sure at what level this CPU slowness is occurring.


Reference: starred/netbox#3615