mirror of
https://github.com/netbox-community/netbox.git
synced 2026-01-11 21:10:29 +01:00
Improve Performance of Generating Config Contexts #3624
Closed
opened 2025-12-29 18:30:13 +01:00 by adam
17 comments
Originally created by @dstarner on GitHub (Apr 30, 2020).
Originally assigned to: @lampwins on GitHub.
Environment
Proposed Functionality
Improve the `get_config_context` serializer method, which has been observed to perform slowly, with the slowdown growing as the number of devices fetched at a time increases.
I'm not sure exactly where the bottleneck is, but the method does perform a very complex query that may be the cause of the overhead. I did some profiling a while back and noticed that the SQL queries used an acceptable share of the request time, and that most of the time was spent in CPU execution. I'm not sure whether this is still the case across the board, but it may be worthwhile to profile and optimize whatever calls are made under this `get_config_context` serializer method.
Use Case
Fetching 1000 devices takes ~60s by default. Without config context (using `?exclude=config_context`) the response takes ~5 seconds. This large overhead was determined to come from generating and merging the config contexts that exist on devices / virtual machines. Reducing this overhead would make NetBox much friendlier and more acceptable to use.
Database Changes
This context is regenerated on every request via this serializer method. That method then calls this complex query, and somewhere in the query, the object generation, or the serialization an enormous amount of time is spent.
We would need to evaluate the performance of this query / method and determine where the bottleneck is occurring, and how to make it faster.
External Dependencies
N/A
Note that this is a revised version of #4544
@dstarner commented on GitHub (May 1, 2020):
@lampwins do you have any ideas / advice on how we could make these queries more performant? Sadly this is slightly out of my area of expertise.
@DouglasHeriot commented on GitHub (May 8, 2020):
This affects my use of NetBox (via the `netbox-community/ansible_modules` inventory plugin) - I can confirm it also takes me about 60s to query 1000 devices.
If nobody else gets to it first, I might have a go at digging into profiling what's going on here at some point in the next few months.
@tyler-8 commented on GitHub (May 11, 2020):
As @lampwins noted in the previous issue:
@lampwins commented on GitHub (May 11, 2020):
I dove into this a bit over the weekend and I think there are two primary ways we can approach this. The first is coming up with a reasonable way to do the deep merge of context data in psql and the second is doing some sort of annotation on the config context object query to have psql do the mapping for us in bulk.
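To make the "deep merge of context data" concrete, here is a minimal Python sketch of that kind of merge. It is not NetBox's implementation: the function names, the (weight, data) tuples, and the sample data are assumptions for illustration, and it assumes contexts are applied in ascending weight order so that heavier contexts override lighter ones.

```python
from copy import deepcopy


def deep_merge(base, override):
    """Return a new dict with `override` merged into `base`, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides hold a dict under this key: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and type mismatches: the override replaces the base value.
            merged[key] = deepcopy(value)
    return merged


def render_context(contexts):
    """Merge (weight, data) pairs in ascending weight order (hypothetical layout)."""
    result = {}
    for _weight, data in sorted(contexts, key=lambda pair: pair[0]):
        result = deep_merge(result, data)
    return result


# Hypothetical sample data: two contexts that apply to the same device.
contexts = [
    (2000, {"ntp": {"servers": ["10.0.0.2"]}, "site_role": "edge"}),
    (1000, {"ntp": {"servers": ["10.0.0.1"], "timezone": "UTC"}}),
]
rendered = render_context(contexts)
print(rendered)
```

Doing this once per device in Python is cheap on its own; the cost discussed in this thread comes from repeating it, together with its query, for every device in a large result set.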
@kevinreniers commented on GitHub (Jun 4, 2020):
Thanks to @tyler-8 for pointing me to this issue. I've also noticed that Redis performance is significantly impacted by this.
We have a number of API clients regularly calling the `/api/dcim/devices` endpoint for various limits and at various offsets. We noticed that the Redis server's CPU usage would spike to near 100% with just two or three simultaneous calls. As suggested in this thread, excluding config_contexts from the response alleviates this problem entirely. As soon as we did that, CPU usage dropped back down to below 0.5%.
Might I suggest that this functionality gets disabled by default for now, and that the API docs mention the very significant performance impact of enabling it?
Aside from that, couldn't this problem be solved by regenerating the config_contexts in a background task when an action happens that requires it to change, rather than generating it on-demand for every API call?
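As a rough illustration of that suggestion, the sketch below keeps rendered contexts in a store and re-renders them only when a change is reported, so reads never trigger rendering for devices already in the store. Everything here is hypothetical: `ContextCache`, `on_change`, and the in-memory dict stand in for whatever storage and background worker a real deployment would use, and working out which devices a change affects is left out entirely.

```python
class ContextCache:
    """Stores rendered contexts; re-renders on change instead of on every read."""

    def __init__(self, render):
        self._render = render      # callable: device_id -> rendered context dict
        self._rendered = {}
        self.render_calls = 0      # counts how often the expensive render ran

    def refresh(self, device_id):
        self.render_calls += 1
        self._rendered[device_id] = self._render(device_id)

    def on_change(self, device_ids):
        # In a real system this would run in a background task, not in the request.
        for device_id in device_ids:
            self.refresh(device_id)

    def get(self, device_id):
        # Render lazily only the first time a device is seen.
        if device_id not in self._rendered:
            self.refresh(device_id)
        return self._rendered[device_id]


# Hypothetical source data and render function.
source = {1: {"ntp": "a"}}
cache = ContextCache(lambda device_id: dict(source[device_id]))

first = cache.get(1)
second = cache.get(1)          # served from the store, no re-render
source[1]["ntp"] = "b"         # something that affects device 1 changes
cache.on_change([1])           # regenerate once, outside the read path
third = cache.get(1)
```

The trade-off is the usual one for precomputed data: reads become cheap, but every change to a context, or to anything that decides which contexts apply, has to trigger a refresh for the affected devices.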
@tyler-8 commented on GitHub (Jun 4, 2020):
Might be something to discuss in the Slack channel or Google Group before opening an issue to improve docs.
That was the initial topic that led to this issue. https://github.com/netbox-community/netbox/issues/4544 - the decision was that the logic to generate config contexts needs to be "fixed" first as the way it's currently written invokes multiple DB queries PER device in a query of N-devices, rather than a handful of queries.
@dstarner commented on GitHub (Jun 11, 2020):
What would be the best way to remove these queries? I don't mind working on it a bit; I just don't want to see this issue fall by the wayside, as it's pretty important to my team to fix.
@danielestevez commented on GitHub (Jun 15, 2020):
Maybe making `exclude=config_context` the default option could work as a quick fix for a minor version?
This is quite a blocker for upgrading to a newer version of NetBox, since there's no way we can control how third-party tools use the NetBox API.
@zacho112 commented on GitHub (Jun 16, 2020):
Quick fix to implement `exclude="config_context"` across our codebase:
It rewrites every filter() call to include the exclude (or append it), and turns the all() method into a filter() as well.
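The snippet from this comment is not preserved here. As a minimal sketch of the approach it describes, the wrapper below adds the exclude to every filter() call (appending it if the caller already passes one) and routes all() through filter(). The class names are hypothetical, the wrapped object is assumed only to expose a filter(**kwargs) method, and whether the server accepts more than one exclude value is not established in this thread.

```python
class ExcludeConfigContext:
    """Proxy around an endpoint-like object that always excludes config_context."""

    def __init__(self, endpoint, field="config_context"):
        self._endpoint = endpoint
        self._field = field

    def _with_exclude(self, kwargs):
        current = kwargs.get("exclude")
        if current is None:
            values = []
        elif isinstance(current, str):
            values = [current]
        else:
            values = list(current)
        if self._field not in values:
            values.append(self._field)   # append to whatever the caller asked for
        # Pass a plain string when there is only one value.
        exclude = values[0] if len(values) == 1 else values
        return {**kwargs, "exclude": exclude}

    def filter(self, *args, **kwargs):
        return self._endpoint.filter(*args, **self._with_exclude(kwargs))

    def all(self):
        # all() becomes a filter() so the exclude is applied here too.
        return self._endpoint.filter(**self._with_exclude({}))

    def __getattr__(self, name):
        # Everything else is delegated to the wrapped endpoint unchanged.
        return getattr(self._endpoint, name)


class FakeEndpoint:
    """Stand-in that records the keyword arguments it receives (illustration only)."""

    def __init__(self):
        self.calls = []

    def filter(self, *args, **kwargs):
        self.calls.append(kwargs)
        return []


fake = FakeEndpoint()
devices = ExcludeConfigContext(fake)
devices.all()
devices.filter(site="dc1", exclude="comments")
print(fake.calls)
```

A wrapper like this only helps for code you control; as noted earlier in the thread, third-party tools calling the API directly are unaffected.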
@jeremystretch commented on GitHub (Jul 24, 2020):
Tagging this as `under review` until a specific implementation has been identified.
@tyler-8 commented on GitHub (Jul 28, 2020):
I found this snippet that could potentially be molded for this use case - however this isn't doing a deep merge.
@roganartu commented on GitHub (Aug 14, 2020):
This is the explain I get on v2.8.8 for the deep join query mentioned for a single device, fwiw. It doesn't seem particularly bad in isolation, but I guess it's probably executing once per item in the result set.
@maxstr commented on GitHub (Aug 14, 2020):
When I profiled this I found the query and talking to the cache were pretty quick and all the slowness was in the serialization. It'd spend ~1-2s getting the data and then 20-30s serializing.
@tyler-8 commented on GitHub (Aug 14, 2020):
I think we have a good understanding of the problem already - the question now is how to change it to a more efficient method. That "20-30s serializing" is actually just performing even more queries for each device.
Executing a separate query (and then calculating the context merger in Python, separately for each device) is what is chewing up so much time. One of the most common API issues is around users pulling the device list for hundreds/thousands of devices at once and having web server timeouts because it takes upwards of 1 minute to return a response - which is an eternity in web terms.
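The difference between one query per device and a single bulk query can be shown with a small, self-contained sketch. It uses an in-memory SQLite database with a made-up two-table schema (devices and contexts matched by site only); NetBox's real schema, assignment rules, and ORM queries are considerably more involved, so treat this purely as an illustration of the query pattern.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE device (id INTEGER PRIMARY KEY, site TEXT);
    CREATE TABLE context (id INTEGER PRIMARY KEY, site TEXT, data TEXT);
""")
conn.executemany(
    "INSERT INTO device (id, site) VALUES (?, ?)",
    [(i, "site-a" if i % 2 else "site-b") for i in range(1, 7)],
)
conn.executemany(
    "INSERT INTO context (site, data) VALUES (?, ?)",
    [("site-a", "ntp-a"), ("site-b", "ntp-b"), ("site-b", "dns-b")],
)

counter = {"queries": 0}


def run(sql, params=()):
    """Execute a query and count it, so the two strategies can be compared."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()


def contexts_per_device():
    """One query for the device list, then one more query for every device."""
    result = {}
    for dev_id, site in run("SELECT id, site FROM device ORDER BY id"):
        rows = run("SELECT data FROM context WHERE site = ? ORDER BY id", (site,))
        result[dev_id] = [row[0] for row in rows]
    return result


def contexts_bulk():
    """A single joined query, grouped per device in Python."""
    rows = run(
        "SELECT d.id, c.data FROM device d "
        "JOIN context c ON c.site = d.site ORDER BY d.id, c.id"
    )
    grouped = defaultdict(list)
    for dev_id, data in rows:
        grouped[dev_id].append(data)
    return dict(grouped)


counter["queries"] = 0
slow = contexts_per_device()
slow_queries = counter["queries"]

counter["queries"] = 0
fast = contexts_bulk()
fast_queries = counter["queries"]

print(slow_queries, fast_queries)
```

Both functions return the same mapping, but the first issues a query count that grows with the number of devices while the second stays constant, which is the property the thread is trying to obtain for config contexts.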
@stale[bot] commented on GitHub (Sep 29, 2020):
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. NetBox is governed by a small group of core maintainers which means not all opened issues may receive direct feedback. Please see our contributing guide.
@lampwins commented on GitHub (Oct 20, 2020):
I have a solution going in the `4559-config-context-rendering` branch. Basically, I am using a subquery to annotate the device/vm query with the relevant config contexts. The only remaining major hurdle is how to handle the tree of regions because in the current solution we use a separate query from the region instance. It does not appear that django-mptt includes any custom queryset filters that would make this easy.
@lampwins commented on GitHub (Oct 25, 2020):
Here is some analysis of my solution.
There are 5000 devices in my test database and 10 config context objects randomly assigned in various objects in the hierarchy. For anyone interested, I have created two NetBox custom scripts to create the devices and config context objects here. Also, I have set `MAX_PAGE_SIZE = 0`.
Baseline, on develop branch with `?limit=0&exclude=config_context`:
On develop branch with `?limit=0`:
Finally, on 4559-config-context-rendering branch with `?limit=0`:
Keep in mind these results are only with the development server. While not entirely scientific, I'll take the 1 sec gain over baseline as a big win ;)
A simple test to ensure we still render the same way:
I have also added a handful of tests to more thoroughly cover the integrity of the rendered data.