Improve Performance of Generating Config Contexts #3624

Closed
opened 2025-12-29 18:30:13 +01:00 by adam · 17 comments
Owner

Originally created by @dstarner on GitHub (Apr 30, 2020).

Originally assigned to: @lampwins on GitHub.

Environment

  • Python version: 3.7
  • NetBox version: 2.7.12, but issue exists on <= 2.8.1

Proposed Functionality

Improve the get_config_context serializer method that is observed to have slow performance that builds up as the number of fetched devices at a time increases.

I'm not sure exactly where the bottleneck is, but it does perform a very complex query that may be the cause of the overhead. I performed some profiling awhile back and noticed that the SQL queries used an acceptable amount of request time, but that it was mostly using CPU execution time. I'm not sure if this is still the case across the board, but it may be worthwhile to profile and optimize whatever calls are made under this get_config_context serializer method.

Use Case

Fetching 1000 devices by default takes ~60s. Without config context - using ?excludes=config_context - the response takes ~5 seconds. This large overhead was determined to be from the generation & merging of the config contexts that exist on devices / virtual machines. Reducing this performance overhead would make NetBox's usage much more friendly and acceptable.

Database Changes

This context is regenerated on every request on via this serialize. This then calls this complex query which either in the query, object generation, or serialization takes an enormous amount of time.

We would need to evaluate the performance of this query / method and determine where the bottleneck is occurring, and how to make it faster.

External Dependencies

N/A


Note that this is a revised version of #4544

Originally created by @dstarner on GitHub (Apr 30, 2020). Originally assigned to: @lampwins on GitHub. <!-- NOTE: IF YOUR ISSUE DOES NOT FOLLOW THIS TEMPLATE, IT WILL BE CLOSED. This form is only for proposing specific new features or enhancements. If you have a general idea or question, please post to our mailing list instead of opening an issue: https://groups.google.com/forum/#!forum/netbox-discuss NOTE: Due to an excessive backlog of feature requests, we are not currently accepting any proposals which significantly extend NetBox's feature scope. Please describe the environment in which you are running NetBox. Be sure that you are running an unmodified instance of the latest stable release before submitting a bug report. --> ### Environment * Python version: 3.7 * NetBox version: 2.7.12, but issue exists on <= 2.8.1 <!-- Describe in detail the new functionality you are proposing. Include any specific changes to work flows, data models, or the user interface. --> ### Proposed Functionality Improve the [`get_config_context` serializer method](https://github.com/netbox-community/netbox/blob/develop/netbox/dcim/api/serializers.py#L428) that is observed to have slow performance that builds up as the number of fetched devices at a time increases. I'm not sure exactly where the bottleneck is, but it does perform a very complex query that may be the cause of the overhead. I performed some profiling awhile back and noticed that the SQL queries used an acceptable amount of request time, but that it was mostly using CPU execution time. I'm not sure if this is still the case across the board, but it may be worthwhile to profile and optimize whatever calls are made under this `get_config_context` serializer method. <!-- Convey an example use case for your proposed feature. Write from the perspective of a NetBox user who would benefit from the proposed functionality and describe how. ---> ### Use Case Fetching 1000 devices by default takes ~60s. _Without_ config context - using `?excludes=config_context` - the response takes ~5 seconds. This large overhead was determined to be from the generation & merging of the config contexts that exist on devices / virtual machines. Reducing this performance overhead would make NetBox's usage much more friendly and acceptable. <!-- Note any changes to the database schema necessary to support the new feature. For example, does the proposal require adding a new model or field? (Not all new features require database changes.) ---> ### Database Changes This context is regenerated on every request on via [this serialize](https://github.com/netbox-community/netbox/blob/develop/netbox/dcim/api/serializers.py#L428). This then calls [this complex query](https://github.com/netbox-community/netbox/blob/d8cb58c74653da88d9aade40548b146a9311f5e6/netbox/extras/querysets.py#L24) which either in the query, object generation, or serialization takes an enormous amount of time. We would need to evaluate the performance of this query / method and determine where the bottleneck is occurring, and how to make it faster. <!-- List any new dependencies on external libraries or services that this new feature would introduce. For example, does the proposal require the installation of a new Python package? (Not all new features introduce new dependencies.) --> ### External Dependencies N/A --- Note that this is a revised version of #4544
adam added the status: acceptedtype: feature labels 2025-12-29 18:30:13 +01:00
adam closed this issue 2025-12-29 18:30:13 +01:00
Author
Owner

@dstarner commented on GitHub (May 1, 2020):

@lampwins do you have any ideas / advice on how we could make these queries more performant? Sadly this is slightly out of my area of expertise.

@dstarner commented on GitHub (May 1, 2020): @lampwins do you have any ideas / advice on how we could make these queries more performant? Sadly this is slightly out of my area of expertise.
Author
Owner

@DouglasHeriot commented on GitHub (May 8, 2020):

This affects my use of Netbox (via the netbox-community/ansible_modules inventory plugin) - I can confirm it also takes me about ~60s to query 1000 devices.

If nobody else gets to it first, I might have a go at digging into profiling what's going on here at some point in the next few months.

@DouglasHeriot commented on GitHub (May 8, 2020): This affects my use of Netbox (via the `netbox-community/ansible_modules` inventory plugin) - I can confirm it also takes me about ~60s to query 1000 devices. If nobody else gets to it first, I might have a go at digging into profiling what's going on here at some point in the next few months.
Author
Owner

@tyler-8 commented on GitHub (May 11, 2020):

As @lampwins noted in the previous issue:

The reason this is so inefficient today is that this method gets called on each serialized device instance which in turn calls this complex query logic for every device.

@tyler-8 commented on GitHub (May 11, 2020): As @lampwins noted in the previous issue: > The reason this is so inefficient today is that [this method](https://github.com/netbox-community/netbox/blob/91e46ceb77c4907347e0af453af1cad8b1a32dba/netbox/dcim/api/serializers.py#L428) gets called on each serialized device instance which in turn calls [this complex query logic](https://github.com/netbox-community/netbox/blob/d8cb58c74653da88d9aade40548b146a9311f5e6/netbox/extras/querysets.py#L24) _for every device_.
Author
Owner

@lampwins commented on GitHub (May 11, 2020):

I dove into this a bit over the weekend and I think there are two primary ways we can approach this. The first is coming up with a reasonable way to do the deep merge of context data in psql and the second is doing some sort of annotation on the config context object query to have psql do the mapping for us in bulk.

@lampwins commented on GitHub (May 11, 2020): I dove into this a bit over the weekend and I think there are two primary ways we can approach this. The first is coming up with a reasonable way to do the deep merge of context data in psql and the second is doing some sort of annotation on the config context object query to have psql do the mapping for us in bulk.
Author
Owner

@kevinreniers commented on GitHub (Jun 4, 2020):

Thanks to @tyler-8 for pointing me to this issue. I've also noticed that redis performance is significantly impacted by this.

We have a number of API clients regularly calling the /api/dcim/devices endpoint for various limits and at various offsets. We noticed that the Redis server's CPU usage would spike to near 100% with just two or three simultaneous calls. As suggested in this thread, excluding config_contexts from the response alleviates this problem entirely. As soon as we did that, CPU usage dropped back down to below 0.5%.

Might I suggest that this functionality gets disabled by default for now, and that the API docs mention the very significant performance impact of enabling it?

Aside from that, couldn't this problem be solved by regenerating the config_contexts in a background task when an action happens that requires it to change, rather than generating it on-demand for every API call?

@kevinreniers commented on GitHub (Jun 4, 2020): Thanks to @tyler-8 for pointing me to this issue. I've also noticed that redis performance is significantly impacted by this. We have a number of API clients regularly calling the `/api/dcim/devices` endpoint for various limits and at various offsets. We noticed that the Redis server's CPU usage would spike to near 100% with just two or three simultaneous calls. As suggested in this thread, excluding config_contexts from the response alleviates this problem entirely. As soon as we did that, CPU usage dropped back down to below 0.5%. Might I suggest that this functionality gets disabled *by default* for now, and that the API docs mention the very significant performance impact of enabling it? Aside from that, couldn't this problem be solved by regenerating the config_contexts in a background task when an action happens that requires it to change, rather than generating it on-demand for every API call?
Author
Owner

@tyler-8 commented on GitHub (Jun 4, 2020):

Might I suggest that this functionality gets disabled by default for now, and that the API docs mention the very significant performance impact of enabling it?

Might be something to discuss in the Slack channel or Google Group before opening an issue to improve docs.

Aside from that, couldn't this problem be solved by regenerating the config_contexts in a background task when an action happens that requires it to change, rather than generating it on-demand for every API call?

That was the initial topic that led to this issue. https://github.com/netbox-community/netbox/issues/4544 - the decision was that the logic to generate config contexts needs to be "fixed" first as the way it's currently written invokes multiple DB queries PER device in a query of N-devices, rather than a handful of queries.

@tyler-8 commented on GitHub (Jun 4, 2020): > Might I suggest that this functionality gets disabled _by default_ for now, and that the API docs mention the very significant performance impact of enabling it? Might be something to discuss in the Slack channel or Google Group before opening an issue to improve docs. > Aside from that, couldn't this problem be solved by regenerating the config_contexts in a background task when an action happens that requires it to change, rather than generating it on-demand for every API call? That was the initial topic that led to this issue. https://github.com/netbox-community/netbox/issues/4544 - the decision was that the logic to generate config contexts needs to be "fixed" first as the way it's currently written invokes multiple DB queries PER device in a query of N-devices, rather than a handful of queries.
Author
Owner

@dstarner commented on GitHub (Jun 11, 2020):

What would be the best way to remove these queries? I don't mind working on it a bit, I just don't want to see this issue fall by the wayside as its pretty important to my team to fix.

@dstarner commented on GitHub (Jun 11, 2020): What would be the best way to remove these queries? I don't mind working on it a bit, I just don't want to see this issue fall by the wayside as its pretty important to my team to fix.
Author
Owner

@danielestevez commented on GitHub (Jun 15, 2020):

Maybe making this excludes=config_context the default option could work as a quickfix for a minor version?
This is quite a blocker to upgrade to a newer version of Netbox since there's no way we can control how third party tools use the Netbox API

@danielestevez commented on GitHub (Jun 15, 2020): Maybe making this `excludes=config_context` the default option could work as a quickfix for a minor version? This is quite a blocker to upgrade to a newer version of Netbox since there's no way we can control how third party tools use the Netbox API
Author
Owner

@zacho112 commented on GitHub (Jun 16, 2020):

Quick fix to implement the exclude="config_context" across our codebase:

Rewrites all filter to include the exclude (or append it), and the all() method to a filter() aswell.

import pynetbox
org_filter = pynetbox.core.endpoint.Endpoint.filter


def new_filter(self, *args, **kwargs):
    excludes = kwargs.get("exclude", [])
    if isinstance(excludes, str):
        excludes = [excludes]

    if "config_context" not in excludes:
        excludes.append("config_context")

    kwargs["exclude"] = excludes

    return org_filter(self, *args, **kwargs)


pynetbox.core.endpoint.Endpoint.filter = new_filter
pynetbox.core.endpoint.Endpoint.all = new_filter


nb = pynetbox.api(
    netbox_url,
    token=netbox_token,
    ssl_verify=False
)

start = time.time()
devices = nb.dcim.devices.all()
print(time.time() - start)
print(len(devices))
@zacho112 commented on GitHub (Jun 16, 2020): Quick fix to implement the `exclude="config_context"` across our codebase: Rewrites all filter to include the exclude (or append it), and the all() method to a filter() aswell. ``` import pynetbox org_filter = pynetbox.core.endpoint.Endpoint.filter def new_filter(self, *args, **kwargs): excludes = kwargs.get("exclude", []) if isinstance(excludes, str): excludes = [excludes] if "config_context" not in excludes: excludes.append("config_context") kwargs["exclude"] = excludes return org_filter(self, *args, **kwargs) pynetbox.core.endpoint.Endpoint.filter = new_filter pynetbox.core.endpoint.Endpoint.all = new_filter nb = pynetbox.api( netbox_url, token=netbox_token, ssl_verify=False ) start = time.time() devices = nb.dcim.devices.all() print(time.time() - start) print(len(devices)) ```
Author
Owner

@jeremystretch commented on GitHub (Jul 24, 2020):

Tagging this as under review until a specific implementation has been identified.

@jeremystretch commented on GitHub (Jul 24, 2020): Tagging this as `under review` until a specific implementation has been identified.
Author
Owner

@tyler-8 commented on GitHub (Jul 28, 2020):

I found this snippet that could potentially be molded for this use case - however this isn't doing a deep merge.

@tyler-8 commented on GitHub (Jul 28, 2020): I found [this snippet](https://gist.github.com/marcgibbons/22bf6a67a93908ef93bb27d18b229611) that could potentially be molded for this use case - however this isn't doing a deep merge.
Author
Owner

@roganartu commented on GitHub (Aug 14, 2020):

This is the explain I get on v2.8.8 for the deep join query mentioned for a single device, fwiw. It doesn't seem particularly bad in isolation, but I guess it's probably executing once per item in the result set.

netbox=# explain analyze  SELECT "extras_configcontext"."id",
       "extras_configcontext"."name",
       "extras_configcontext"."weight",
       "extras_configcontext"."description",
       "extras_configcontext"."is_active",
       "extras_configcontext"."data"
FROM   "extras_configcontext"
       LEFT OUTER JOIN "extras_configcontext_regions"
                    ON ( "extras_configcontext"."id" =
                         "extras_configcontext_regions"."configcontext_id" )
       LEFT OUTER JOIN "extras_configcontext_sites"
                    ON ( "extras_configcontext"."id" =
                         "extras_configcontext_sites"."configcontext_id"
                       )
       LEFT OUTER JOIN "extras_configcontext_roles"
                    ON ( "extras_configcontext"."id" =
                         "extras_configcontext_roles"."configcontext_id"
                       )
       LEFT OUTER JOIN "extras_configcontext_platforms"
                    ON ( "extras_configcontext"."id" =
       "extras_configcontext_platforms"."configcontext_id" )
       LEFT OUTER JOIN "extras_configcontext_cluster_groups"
                    ON ( "extras_configcontext"."id" =
       "extras_configcontext_cluster_groups"."configcontext_id" )
       LEFT OUTER JOIN "extras_configcontext_clusters"
                    ON ( "extras_configcontext"."id" =
                         "extras_configcontext_clusters"."configcontext_id" )
       LEFT OUTER JOIN "extras_configcontext_tenant_groups"
                    ON ( "extras_configcontext"."id" =
       "extras_configcontext_tenant_groups"."configcontext_id" )
       LEFT OUTER JOIN "extras_configcontext_tenants"
                    ON ( "extras_configcontext"."id" =
                         "extras_configcontext_tenants"."configcontext_id" )
       LEFT OUTER JOIN "extras_configcontext_tags"
                    ON ( "extras_configcontext"."id" =
                         "extras_configcontext_tags"."configcontext_id" )
       LEFT OUTER JOIN "extras_tag"
                    ON ( "extras_configcontext_tags"."tag_id" =
                         "extras_tag"."id" )
WHERE  ( ( "extras_configcontext_regions"."region_id" IN
                    (SELECT U0."id"
                                                          FROM
                    "dcim_region" U0
                                                          WHERE
                    ( U0."lft" <= 6
                      AND U0."rght" >= 7
                      AND U0."tree_id" = 1 )
                    )
            OR "extras_configcontext_regions"."region_id" IS NULL )
         AND ( "extras_configcontext_sites"."site_id" = 1455
                OR "extras_configcontext_sites"."site_id" IS NULL )
         AND ( "extras_configcontext_roles"."devicerole_id" = 100
                OR "extras_configcontext_roles"."devicerole_id" IS NULL )
         AND ( "extras_configcontext_platforms"."platform_id" = 25
                OR "extras_configcontext_platforms"."platform_id" IS NULL )
         AND ( "extras_configcontext_cluster_groups"."clustergroup_id" IS NULL
                OR "extras_configcontext_cluster_groups"."clustergroup_id" IS
                   NULL )
         AND ( "extras_configcontext_clusters"."cluster_id" IS NULL
                OR "extras_configcontext_clusters"."cluster_id" IS NULL )
         AND ( "extras_configcontext_tenant_groups"."tenantgroup_id" IS NULL
                OR "extras_configcontext_tenant_groups"."tenantgroup_id" IS NULL
             )
         AND ( "extras_configcontext_tenants"."tenant_id" IS NULL
                OR "extras_configcontext_tenants"."tenant_id" IS NULL )
         AND ( "extras_tag"."slug" IN (SELECT DISTINCT U0."slug"
                                       FROM   "extras_tag" U0
                                              INNER JOIN "extras_taggeditem" U1
                                                      ON ( U0."id" =
                                                         U1."tag_id" )
                                              INNER JOIN "django_content_type"
                                                         U2
                                                      ON ( U1."content_type_id"
                                                           =
                                                         U2."id" )
                                       WHERE  ( U2."app_label" = 'dcim'
                                                AND U2."model" = 'device'
                                                AND U1."object_id" = 593495 ))
                OR "extras_configcontext_tags"."tag_id" IS NULL )
         AND "extras_configcontext"."is_active" = true )
ORDER  BY "extras_configcontext"."weight" ASC,
          "extras_configcontext"."name" ASC;
                                                                                                               QUERY PLAN                                                                                                                
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=766.71..774.62 rows=3161 width=675) (actual time=0.293..0.293 rows=3 loops=1)
   Sort Key: extras_configcontext.weight, extras_configcontext.name
   Sort Method: quicksort  Memory: 25kB
   ->  Merge Left Join  (cost=127.75..582.96 rows=3161 width=675) (actual time=0.265..0.270 rows=3 loops=1)
         Merge Cond: (extras_configcontext.id = extras_configcontext_tenants.configcontext_id)
         Filter: (extras_configcontext_tenants.tenant_id IS NULL)
         ->  Merge Left Join  (cost=127.59..363.59 rows=930 width=675) (actual time=0.256..0.261 rows=3 loops=1)
               Merge Cond: (extras_configcontext.id = extras_configcontext_tags.configcontext_id)
               Filter: ((hashed SubPlan 2) OR (extras_configcontext_tags.tag_id IS NULL))
               ->  Merge Left Join  (cost=102.20..312.00 rows=1850 width=675) (actual time=0.123..0.126 rows=3 loops=1)
                     Merge Cond: (extras_configcontext.id = extras_configcontext_platforms.configcontext_id)
                     Filter: ((extras_configcontext_platforms.platform_id = 25) OR (extras_configcontext_platforms.platform_id IS NULL))
                     ->  Merge Left Join  (cost=102.05..186.74 rows=279 width=675) (actual time=0.113..0.116 rows=3 loops=1)
                           Merge Cond: (extras_configcontext.id = extras_configcontext_roles.configcontext_id)
                           Filter: ((extras_configcontext_roles.devicerole_id = 100) OR (extras_configcontext_roles.devicerole_id IS NULL))
                           ->  Merge Left Join  (cost=101.90..102.47 rows=34 width=675) (actual time=0.108..0.110 rows=3 loops=1)
                                 Merge Cond: (extras_configcontext.id = extras_configcontext_regions.configcontext_id)
                                 Filter: ((hashed SubPlan 1) OR (extras_configcontext_regions.region_id IS NULL))
                                 Rows Removed by Filter: 1
                                 ->  Sort  (cost=99.76..99.84 rows=34 width=675) (actual time=0.085..0.086 rows=3 loops=1)
                                       Sort Key: extras_configcontext.id
                                       Sort Method: quicksort  Memory: 25kB
                                       ->  Hash Right Join  (cost=49.63..98.90 rows=34 width=675) (actual time=0.065..0.067 rows=3 loops=1)
                                             Hash Cond: (extras_configcontext_tenant_groups.configcontext_id = extras_configcontext.id)
                                             Filter: (extras_configcontext_tenant_groups.tenantgroup_id IS NULL)
                                             ->  Seq Scan on extras_configcontext_tenant_groups  (cost=0.00..30.40 rows=2040 width=8) (actual time=0.001..0.001 rows=0 loops=1)
                                             ->  Hash  (cost=49.50..49.50 rows=10 width=675) (actual time=0.053..0.053 rows=3 loops=1)
                                                   Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                   ->  Nested Loop Left Join  (cost=5.52..49.50 rows=10 width=675) (actual time=0.037..0.048 rows=3 loops=1)
                                                         Filter: (extras_configcontext_clusters.cluster_id IS NULL)
                                                         ->  Nested Loop Left Join  (cost=3.29..13.95 rows=3 width=675) (actual time=0.033..0.041 rows=3 loops=1)
                                                               Filter: (extras_configcontext_cluster_groups.clustergroup_id IS NULL)
                                                               ->  Hash Right Join  (cost=1.05..2.11 rows=1 width=675) (actual time=0.026..0.030 rows=3 loops=1)
                                                                     Hash Cond: (extras_configcontext_sites.configcontext_id = extras_configcontext.id)
                                                                     Filter: ((extras_configcontext_sites.site_id = 1455) OR (extras_configcontext_sites.site_id IS NULL))
                                                                     Rows Removed by Filter: 2
                                                                     ->  Seq Scan on extras_configcontext_sites  (cost=0.00..1.03 rows=3 width=8) (actual time=0.002..0.002 rows=3 loops=1)
                                                                     ->  Hash  (cost=1.03..1.03 rows=2 width=675) (actual time=0.011..0.011 rows=3 loops=1)
                                                                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                                           ->  Seq Scan on extras_configcontext  (cost=0.00..1.03 rows=2 width=675) (actual time=0.005..0.006 rows=3 loops=1)
                                                                                 Filter: is_active
                                                               ->  Bitmap Heap Scan on extras_configcontext_cluster_groups  (cost=2.23..11.75 rows=10 width=8) (actual time=0.002..0.002 rows=0 loops=3)
                                                                     Recheck Cond: (extras_configcontext.id = configcontext_id)
                                                                     ->  Bitmap Index Scan on extras_configcontext_cluster_groups_configcontext_id_8f50b794  (cost=0.00..2.23 rows=10 width=0) (actual time=0.001..0.001 rows=0 loops=3)
                                                                           Index Cond: (extras_configcontext.id = configcontext_id)
                                                         ->  Bitmap Heap Scan on extras_configcontext_clusters  (cost=2.23..11.75 rows=10 width=8) (actual time=0.001..0.001 rows=0 loops=3)
                                                               Recheck Cond: (extras_configcontext.id = configcontext_id)
                                                               ->  Bitmap Index Scan on extras_configcontext_clusters_configcontext_id_ed579a40  (cost=0.00..2.23 rows=10 width=0) (actual time=0.001..0.001 rows=0 loops=3)
                                                                     Index Cond: (extras_configcontext.id = configcontext_id)
                                 ->  Sort  (cost=1.03..1.03 rows=2 width=8) (actual time=0.009..0.009 rows=2 loops=1)
                                       Sort Key: extras_configcontext_regions.configcontext_id
                                       Sort Method: quicksort  Memory: 25kB
                                       ->  Seq Scan on extras_configcontext_regions  (cost=0.00..1.02 rows=2 width=8) (actual time=0.002..0.003 rows=2 loops=1)
                                 SubPlan 1
                                   ->  Seq Scan on dcim_region u0  (cost=0.00..1.10 rows=1 width=4) (actual time=0.003..0.004 rows=2 loops=1)
                                         Filter: ((lft <= 6) AND (rght >= 7) AND (tree_id = 1))
                                         Rows Removed by Filter: 4
                           ->  Index Scan using extras_configcontext_roles_configcontext_id_59b67386 on extras_configcontext_roles  (cost=0.15..74.75 rows=2040 width=8) (actual time=0.005..0.005 rows=0 loops=1)
                     ->  Materialize  (cost=0.15..79.85 rows=2040 width=8) (actual time=0.008..0.009 rows=0 loops=1)
                           ->  Index Scan using extras_configcontext_platforms_configcontext_id_2a516699 on extras_configcontext_platforms  (cost=0.15..74.75 rows=2040 width=8) (actual time=0.005..0.005 rows=0 loops=1)
               ->  Sort  (cost=9.63..9.63 rows=2 width=19) (actual time=0.079..0.079 rows=2 loops=1)
                     Sort Key: extras_configcontext_tags.configcontext_id
                     Sort Method: quicksort  Memory: 25kB
                     ->  Hash Right Join  (cost=1.04..9.62 rows=2 width=19) (actual time=0.027..0.069 rows=2 loops=1)
                           Hash Cond: (extras_tag.id = extras_configcontext_tags.tag_id)
                           ->  Seq Scan on extras_tag  (cost=0.00..7.31 rows=331 width=15) (actual time=0.004..0.026 rows=331 loops=1)
                           ->  Hash  (cost=1.02..1.02 rows=2 width=8) (actual time=0.005..0.005 rows=2 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                 ->  Seq Scan on extras_configcontext_tags  (cost=0.00..1.02 rows=2 width=8) (actual time=0.002..0.003 rows=2 loops=1)
               SubPlan 2
                 ->  Unique  (cost=15.76..15.76 rows=1 width=11) (actual time=0.044..0.045 rows=4 loops=1)
                       ->  Sort  (cost=15.76..15.76 rows=1 width=11) (actual time=0.043..0.044 rows=4 loops=1)
                             Sort Key: u0_1.slug
                             Sort Method: quicksort  Memory: 25kB
                             ->  Nested Loop  (cost=0.44..15.75 rows=1 width=11) (actual time=0.022..0.032 rows=4 loops=1)
                                   ->  Nested Loop  (cost=0.29..10.55 rows=1 width=4) (actual time=0.017..0.023 rows=4 loops=1)
                                         ->  Seq Scan on django_content_type u2  (cost=0.00..2.23 rows=1 width=4) (actual time=0.007..0.011 rows=1 loops=1)
                                               Filter: (((app_label)::text = 'dcim'::text) AND ((model)::text = 'device'::text))
                                               Rows Removed by Filter: 81
                                         ->  Index Scan using extras_taggeditem_content_type_id_object_id_80e28e23_idx on extras_taggeditem u1  (cost=0.29..8.31 rows=1 width=8) (actual time=0.009..0.010 rows=4 loops=1)
                                               Index Cond: ((content_type_id = u2.id) AND (object_id = 593495))
                                   ->  Index Scan using extras_tag_pkey on extras_tag u0_1  (cost=0.15..5.17 rows=1 width=15) (actual time=0.002..0.002 rows=1 loops=4)
                                         Index Cond: (id = u1.tag_id)
         ->  Materialize  (cost=0.15..79.85 rows=2040 width=8) (actual time=0.008..0.008 rows=0 loops=1)
               ->  Index Scan using extras_configcontext_tenants_configcontext_id_b53552a6 on extras_configcontext_tenants  (cost=0.15..74.75 rows=2040 width=8) (actual time=0.005..0.005 rows=0 loops=1)
 Planning Time: 5.490 ms
 Execution Time: 0.591 ms
(87 rows)
@roganartu commented on GitHub (Aug 14, 2020): This is the explain I get on v2.8.8 for the deep join query mentioned for a single device, fwiw. It doesn't seem particularly bad in isolation, but I guess it's probably executing once per item in the result set. ``` netbox=# explain analyze SELECT "extras_configcontext"."id", "extras_configcontext"."name", "extras_configcontext"."weight", "extras_configcontext"."description", "extras_configcontext"."is_active", "extras_configcontext"."data" FROM "extras_configcontext" LEFT OUTER JOIN "extras_configcontext_regions" ON ( "extras_configcontext"."id" = "extras_configcontext_regions"."configcontext_id" ) LEFT OUTER JOIN "extras_configcontext_sites" ON ( "extras_configcontext"."id" = "extras_configcontext_sites"."configcontext_id" ) LEFT OUTER JOIN "extras_configcontext_roles" ON ( "extras_configcontext"."id" = "extras_configcontext_roles"."configcontext_id" ) LEFT OUTER JOIN "extras_configcontext_platforms" ON ( "extras_configcontext"."id" = "extras_configcontext_platforms"."configcontext_id" ) LEFT OUTER JOIN "extras_configcontext_cluster_groups" ON ( "extras_configcontext"."id" = "extras_configcontext_cluster_groups"."configcontext_id" ) LEFT OUTER JOIN "extras_configcontext_clusters" ON ( "extras_configcontext"."id" = "extras_configcontext_clusters"."configcontext_id" ) LEFT OUTER JOIN "extras_configcontext_tenant_groups" ON ( "extras_configcontext"."id" = "extras_configcontext_tenant_groups"."configcontext_id" ) LEFT OUTER JOIN "extras_configcontext_tenants" ON ( "extras_configcontext"."id" = "extras_configcontext_tenants"."configcontext_id" ) LEFT OUTER JOIN "extras_configcontext_tags" ON ( "extras_configcontext"."id" = "extras_configcontext_tags"."configcontext_id" ) LEFT OUTER JOIN "extras_tag" ON ( "extras_configcontext_tags"."tag_id" = "extras_tag"."id" ) WHERE ( ( "extras_configcontext_regions"."region_id" IN (SELECT U0."id" FROM "dcim_region" U0 WHERE ( U0."lft" <= 6 AND U0."rght" >= 7 AND U0."tree_id" = 1 ) ) OR "extras_configcontext_regions"."region_id" IS NULL ) AND ( "extras_configcontext_sites"."site_id" = 1455 OR "extras_configcontext_sites"."site_id" IS NULL ) AND ( "extras_configcontext_roles"."devicerole_id" = 100 OR "extras_configcontext_roles"."devicerole_id" IS NULL ) AND ( "extras_configcontext_platforms"."platform_id" = 25 OR "extras_configcontext_platforms"."platform_id" IS NULL ) AND ( "extras_configcontext_cluster_groups"."clustergroup_id" IS NULL OR "extras_configcontext_cluster_groups"."clustergroup_id" IS NULL ) AND ( "extras_configcontext_clusters"."cluster_id" IS NULL OR "extras_configcontext_clusters"."cluster_id" IS NULL ) AND ( "extras_configcontext_tenant_groups"."tenantgroup_id" IS NULL OR "extras_configcontext_tenant_groups"."tenantgroup_id" IS NULL ) AND ( "extras_configcontext_tenants"."tenant_id" IS NULL OR "extras_configcontext_tenants"."tenant_id" IS NULL ) AND ( "extras_tag"."slug" IN (SELECT DISTINCT U0."slug" FROM "extras_tag" U0 INNER JOIN "extras_taggeditem" U1 ON ( U0."id" = U1."tag_id" ) INNER JOIN "django_content_type" U2 ON ( U1."content_type_id" = U2."id" ) WHERE ( U2."app_label" = 'dcim' AND U2."model" = 'device' AND U1."object_id" = 593495 )) OR "extras_configcontext_tags"."tag_id" IS NULL ) AND "extras_configcontext"."is_active" = true ) ORDER BY "extras_configcontext"."weight" ASC, "extras_configcontext"."name" ASC; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=766.71..774.62 rows=3161 width=675) (actual time=0.293..0.293 rows=3 loops=1) Sort Key: extras_configcontext.weight, extras_configcontext.name Sort Method: quicksort Memory: 25kB -> Merge Left Join (cost=127.75..582.96 rows=3161 width=675) (actual time=0.265..0.270 rows=3 loops=1) Merge Cond: (extras_configcontext.id = extras_configcontext_tenants.configcontext_id) Filter: (extras_configcontext_tenants.tenant_id IS NULL) -> Merge Left Join (cost=127.59..363.59 rows=930 width=675) (actual time=0.256..0.261 rows=3 loops=1) Merge Cond: (extras_configcontext.id = extras_configcontext_tags.configcontext_id) Filter: ((hashed SubPlan 2) OR (extras_configcontext_tags.tag_id IS NULL)) -> Merge Left Join (cost=102.20..312.00 rows=1850 width=675) (actual time=0.123..0.126 rows=3 loops=1) Merge Cond: (extras_configcontext.id = extras_configcontext_platforms.configcontext_id) Filter: ((extras_configcontext_platforms.platform_id = 25) OR (extras_configcontext_platforms.platform_id IS NULL)) -> Merge Left Join (cost=102.05..186.74 rows=279 width=675) (actual time=0.113..0.116 rows=3 loops=1) Merge Cond: (extras_configcontext.id = extras_configcontext_roles.configcontext_id) Filter: ((extras_configcontext_roles.devicerole_id = 100) OR (extras_configcontext_roles.devicerole_id IS NULL)) -> Merge Left Join (cost=101.90..102.47 rows=34 width=675) (actual time=0.108..0.110 rows=3 loops=1) Merge Cond: (extras_configcontext.id = extras_configcontext_regions.configcontext_id) Filter: ((hashed SubPlan 1) OR (extras_configcontext_regions.region_id IS NULL)) Rows Removed by Filter: 1 -> Sort (cost=99.76..99.84 rows=34 width=675) (actual time=0.085..0.086 rows=3 loops=1) Sort Key: extras_configcontext.id Sort Method: quicksort Memory: 25kB -> Hash Right Join (cost=49.63..98.90 rows=34 width=675) (actual time=0.065..0.067 rows=3 loops=1) Hash Cond: (extras_configcontext_tenant_groups.configcontext_id = extras_configcontext.id) Filter: (extras_configcontext_tenant_groups.tenantgroup_id IS NULL) -> Seq Scan on extras_configcontext_tenant_groups (cost=0.00..30.40 rows=2040 width=8) (actual time=0.001..0.001 rows=0 loops=1) -> Hash (cost=49.50..49.50 rows=10 width=675) (actual time=0.053..0.053 rows=3 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Nested Loop Left Join (cost=5.52..49.50 rows=10 width=675) (actual time=0.037..0.048 rows=3 loops=1) Filter: (extras_configcontext_clusters.cluster_id IS NULL) -> Nested Loop Left Join (cost=3.29..13.95 rows=3 width=675) (actual time=0.033..0.041 rows=3 loops=1) Filter: (extras_configcontext_cluster_groups.clustergroup_id IS NULL) -> Hash Right Join (cost=1.05..2.11 rows=1 width=675) (actual time=0.026..0.030 rows=3 loops=1) Hash Cond: (extras_configcontext_sites.configcontext_id = extras_configcontext.id) Filter: ((extras_configcontext_sites.site_id = 1455) OR (extras_configcontext_sites.site_id IS NULL)) Rows Removed by Filter: 2 -> Seq Scan on extras_configcontext_sites (cost=0.00..1.03 rows=3 width=8) (actual time=0.002..0.002 rows=3 loops=1) -> Hash (cost=1.03..1.03 rows=2 width=675) (actual time=0.011..0.011 rows=3 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on extras_configcontext (cost=0.00..1.03 rows=2 width=675) (actual time=0.005..0.006 rows=3 loops=1) Filter: is_active -> Bitmap Heap Scan on extras_configcontext_cluster_groups (cost=2.23..11.75 rows=10 width=8) (actual time=0.002..0.002 rows=0 loops=3) Recheck Cond: (extras_configcontext.id = configcontext_id) -> Bitmap Index Scan on extras_configcontext_cluster_groups_configcontext_id_8f50b794 (cost=0.00..2.23 rows=10 width=0) (actual time=0.001..0.001 rows=0 loops=3) Index Cond: (extras_configcontext.id = configcontext_id) -> Bitmap Heap Scan on extras_configcontext_clusters (cost=2.23..11.75 rows=10 width=8) (actual time=0.001..0.001 rows=0 loops=3) Recheck Cond: (extras_configcontext.id = configcontext_id) -> Bitmap Index Scan on extras_configcontext_clusters_configcontext_id_ed579a40 (cost=0.00..2.23 rows=10 width=0) (actual time=0.001..0.001 rows=0 loops=3) Index Cond: (extras_configcontext.id = configcontext_id) -> Sort (cost=1.03..1.03 rows=2 width=8) (actual time=0.009..0.009 rows=2 loops=1) Sort Key: extras_configcontext_regions.configcontext_id Sort Method: quicksort Memory: 25kB -> Seq Scan on extras_configcontext_regions (cost=0.00..1.02 rows=2 width=8) (actual time=0.002..0.003 rows=2 loops=1) SubPlan 1 -> Seq Scan on dcim_region u0 (cost=0.00..1.10 rows=1 width=4) (actual time=0.003..0.004 rows=2 loops=1) Filter: ((lft <= 6) AND (rght >= 7) AND (tree_id = 1)) Rows Removed by Filter: 4 -> Index Scan using extras_configcontext_roles_configcontext_id_59b67386 on extras_configcontext_roles (cost=0.15..74.75 rows=2040 width=8) (actual time=0.005..0.005 rows=0 loops=1) -> Materialize (cost=0.15..79.85 rows=2040 width=8) (actual time=0.008..0.009 rows=0 loops=1) -> Index Scan using extras_configcontext_platforms_configcontext_id_2a516699 on extras_configcontext_platforms (cost=0.15..74.75 rows=2040 width=8) (actual time=0.005..0.005 rows=0 loops=1) -> Sort (cost=9.63..9.63 rows=2 width=19) (actual time=0.079..0.079 rows=2 loops=1) Sort Key: extras_configcontext_tags.configcontext_id Sort Method: quicksort Memory: 25kB -> Hash Right Join (cost=1.04..9.62 rows=2 width=19) (actual time=0.027..0.069 rows=2 loops=1) Hash Cond: (extras_tag.id = extras_configcontext_tags.tag_id) -> Seq Scan on extras_tag (cost=0.00..7.31 rows=331 width=15) (actual time=0.004..0.026 rows=331 loops=1) -> Hash (cost=1.02..1.02 rows=2 width=8) (actual time=0.005..0.005 rows=2 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on extras_configcontext_tags (cost=0.00..1.02 rows=2 width=8) (actual time=0.002..0.003 rows=2 loops=1) SubPlan 2 -> Unique (cost=15.76..15.76 rows=1 width=11) (actual time=0.044..0.045 rows=4 loops=1) -> Sort (cost=15.76..15.76 rows=1 width=11) (actual time=0.043..0.044 rows=4 loops=1) Sort Key: u0_1.slug Sort Method: quicksort Memory: 25kB -> Nested Loop (cost=0.44..15.75 rows=1 width=11) (actual time=0.022..0.032 rows=4 loops=1) -> Nested Loop (cost=0.29..10.55 rows=1 width=4) (actual time=0.017..0.023 rows=4 loops=1) -> Seq Scan on django_content_type u2 (cost=0.00..2.23 rows=1 width=4) (actual time=0.007..0.011 rows=1 loops=1) Filter: (((app_label)::text = 'dcim'::text) AND ((model)::text = 'device'::text)) Rows Removed by Filter: 81 -> Index Scan using extras_taggeditem_content_type_id_object_id_80e28e23_idx on extras_taggeditem u1 (cost=0.29..8.31 rows=1 width=8) (actual time=0.009..0.010 rows=4 loops=1) Index Cond: ((content_type_id = u2.id) AND (object_id = 593495)) -> Index Scan using extras_tag_pkey on extras_tag u0_1 (cost=0.15..5.17 rows=1 width=15) (actual time=0.002..0.002 rows=1 loops=4) Index Cond: (id = u1.tag_id) -> Materialize (cost=0.15..79.85 rows=2040 width=8) (actual time=0.008..0.008 rows=0 loops=1) -> Index Scan using extras_configcontext_tenants_configcontext_id_b53552a6 on extras_configcontext_tenants (cost=0.15..74.75 rows=2040 width=8) (actual time=0.005..0.005 rows=0 loops=1) Planning Time: 5.490 ms Execution Time: 0.591 ms (87 rows) ```
Author
Owner

@maxstr commented on GitHub (Aug 14, 2020):

When I profiled this I found the query and talking to the cache were pretty quick and all the slowness was in the serialization. It'd spend ~1-2s getting the data and then 20-30s serializing.

@maxstr commented on GitHub (Aug 14, 2020): When I profiled this I found the query and talking to the cache were pretty quick and all the slowness was in the serialization. It'd spend ~1-2s getting the data and then 20-30s serializing.
Author
Owner

@tyler-8 commented on GitHub (Aug 14, 2020):

When I profiled this I found the query and talking to the cache were pretty quick and all the slowness was in the serialization. It'd spend ~1-2s getting the data and then 20-30s serializing.

I think we have a good understanding of the problem already - the question now is how to change it to a more efficient method. That "20-30s serializing" is actually just performing even more queries for each device.

Executing a separate query (and then calculating the context merger in Python, separately for each device) is what is chewing up so much time. One of the most common API issues is around users pulling the device list for hundreds/thousands of devices at once and having web server timeouts because it takes upwards of 1 minute to return a response - which is an eternity in web terms.

@tyler-8 commented on GitHub (Aug 14, 2020): > When I profiled this I found the query and talking to the cache were pretty quick and all the slowness was in the serialization. It'd spend ~1-2s getting the data and then 20-30s serializing. I think we have a [good understanding](https://github.com/netbox-community/netbox/issues/4559#issuecomment-626784383) of the problem already - the question now is how to change it to a more efficient method. That "20-30s serializing" is actually just performing even more queries for each device. Executing a separate query (and then calculating the context merger in Python, separately for each device) is what is chewing up so much time. One of the most common API issues is around users pulling the device list for hundreds/thousands of devices at once and having web server timeouts because it takes upwards of 1 minute to return a response - which is an eternity in web terms.
Author
Owner

@stale[bot] commented on GitHub (Sep 29, 2020):

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. NetBox is governed by a small group of core maintainers which means not all opened issues may receive direct feedback. Please see our contributing guide.

@stale[bot] commented on GitHub (Sep 29, 2020): This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. NetBox is governed by a small group of core maintainers which means not all opened issues may receive direct feedback. Please see our [contributing guide](https://github.com/netbox-community/netbox/blob/develop/CONTRIBUTING.md).
Author
Owner

@lampwins commented on GitHub (Oct 20, 2020):

I have a solution going in the 4559-config-context-rendering branch. Basically, I am using a subquery to annotate the device/vm query with the relevant config contexts. The only remaining major hurdle is how to handle the tree of regions because in the current solution we use a separate query from the region instance. It does not appear that django-mptt includes any custom queryset filters that would make this easy.

@lampwins commented on GitHub (Oct 20, 2020): I have a solution going in the `4559-config-context-rendering` branch. Basically, I am using a subquery to annotate the device/vm query with the relevant config contexts. The only remaining major hurdle is how to handle the tree of regions because in the current solution we use a separate query from the region instance. It does not appear that django-mptt includes any custom queryset filters that would make this easy.
Author
Owner

@lampwins commented on GitHub (Oct 25, 2020):

Here is some analysis of my solution.

There are 5000 devices in my test database and 10 config context objects randomly assigned in various objects in the hierarchy. For anyone interested, I have created two NetBox custom scripts to create the devices and config context objects here. Also, I have set MAX_PAGE_SIZE = 0

Baseline, on develop branch with ?limit=0&exclude=config_context:

% curl -H "accept: application/json" -H "authorization: token b7232b1b115300f91f2aef57bc01f4c5fed4aa79" "http://localhost:8000/api/dcim/devices/?limit=0&exclude=config_context" | python -m json.tool > develop-exclude.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 5530k  100 5530k    0     0   587k      0  0:00:09  0:00:09 --:--:-- 1317k

On develop branch with ?limit=0:

 % curl -H "accept: application/json" -H "authorization: token b7232b1b115300f91f2aef57bc01f4c5fed4aa79" "http://localhost:8000/api/dcim/devices/?limit=0" | python -m json.tool > develop.json 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 5770k  100 5770k    0     0  61767      0  0:01:35  0:01:35 --:--:-- 1378k

Finally, on 4559-config-context-rendering branch with ?limit=0:

% curl -H "accept: application/json" -H "authorization: token b7232b1b115300f91f2aef57bc01f4c5fed4aa79" "http://localhost:8000/api/dcim/devices/?limit=0" | python -m json.tool > new-4559.json   
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 5770k  100 5770k    0     0   695k      0  0:00:08  0:00:08 --:--:-- 1413k

Keep in mind these results are only with the development server. While not entirely scientific, I'll take the 1 sec gain over baseline as a big win ;)

A simple test to ensure we still render the same way:

% diff develop.json new-4559.json
%

I have also added a handful of tests to more thoroughly cover the integrity of the rendered data.

@lampwins commented on GitHub (Oct 25, 2020): Here is some analysis of my solution. There are 5000 devices in my test database and 10 config context objects randomly assigned in various objects in the hierarchy. For anyone interested, I have created two NetBox custom scripts to create the devices and config context objects [here](https://github.com/lampwins/netbox-load-testing/tree/master/deployment/simple/test_data/custom_scripts). Also, I have set `MAX_PAGE_SIZE = 0` Baseline, on develop branch with `?limit=0&exclude=config_context`: ```bash % curl -H "accept: application/json" -H "authorization: token b7232b1b115300f91f2aef57bc01f4c5fed4aa79" "http://localhost:8000/api/dcim/devices/?limit=0&exclude=config_context" | python -m json.tool > develop-exclude.json % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 5530k 100 5530k 0 0 587k 0 0:00:09 0:00:09 --:--:-- 1317k ``` On develop branch with `?limit=0`: ```bash % curl -H "accept: application/json" -H "authorization: token b7232b1b115300f91f2aef57bc01f4c5fed4aa79" "http://localhost:8000/api/dcim/devices/?limit=0" | python -m json.tool > develop.json % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 5770k 100 5770k 0 0 61767 0 0:01:35 0:01:35 --:--:-- 1378k ``` Finally, on 4559-config-context-rendering branch with `?limit=0`: ```bash % curl -H "accept: application/json" -H "authorization: token b7232b1b115300f91f2aef57bc01f4c5fed4aa79" "http://localhost:8000/api/dcim/devices/?limit=0" | python -m json.tool > new-4559.json % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 5770k 100 5770k 0 0 695k 0 0:00:08 0:00:08 --:--:-- 1413k ``` Keep in mind these results are only with the development server. While not entirely scientific, I'll take the 1 sec gain over baseline as a big win ;) A simple test to ensure we still render the same way: ```bash % diff develop.json new-4559.json % ``` I have also added a handful of tests to more thoroughly cover the integrity of the rendered data.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/netbox#3624