REST response time does not scale properly with limit param while including config_context #6487

Closed
opened 2025-12-29 19:41:21 +01:00 by adam · 11 comments
Owner

Originally created by @BowmanKlinkenberg on GitHub (May 16, 2022).

Originally assigned to: @kkthxbye-code on GitHub.

NetBox version

v3.2.2

Python version

3.9

Steps to Reproduce

  1. GET devices from /api/dcim/devices, including config_context, with a resulting count field less than 100, and limit=100 (for example, use a query parameter that would result in a low number of matches)
  2. GET devices from /api/dcim/devices including config_context, with a resulting count field greater than 100,000, and limit=100 (for example, use a query parameter that would result in a high number of matches)

Expected Behavior

  1. Similar API response times from either query, scaling with limit

Observed Behavior

  1. When the exclude=config_context parameter is included in a query, there is a similar response time between queries that return a large dataset, and queries that return a small dataset
  2. When exclude=config_context is omitted (i.e. config context is fetched), queries that return a large dataset increase in response time dramatically vs. smaller queries, regardless of limit
Originally created by @BowmanKlinkenberg on GitHub (May 16, 2022). Originally assigned to: @kkthxbye-code on GitHub. ### NetBox version v3.2.2 ### Python version 3.9 ### Steps to Reproduce 1. GET devices from `/api/dcim/devices`, including config_context, with a resulting `count` field less than 100, and `limit=100` (for example, use a query parameter that would result in a low number of matches) 2. GET devices from `/api/dcim/devices` including config_context, with a resulting `count` field greater than 100,000, and `limit=100` (for example, use a query parameter that would result in a high number of matches) ### Expected Behavior 1. Similar API response times from either query, scaling with `limit` ### Observed Behavior 1. When the `exclude=config_context` parameter is included in a query, there is a similar response time between queries that return a large dataset, and queries that return a small dataset 2. When `exclude=config_context` is omitted (i.e. config context is fetched), queries that return a large dataset increase in response time dramatically vs. smaller queries, regardless of `limit`
adam added the type: bugstatus: accepted labels 2025-12-29 19:41:21 +01:00
adam closed this issue 2025-12-29 19:41:21 +01:00
Author
Owner

@kkthxbye-code commented on GitHub (May 18, 2022):

I can't quite replicate this. While there is a performance penalty of not excluding config_context, I can't replicate a difference between offset 5000 and no offset.

I can try generating a larger dataset, but some more info with regards to what your config contexts look like (amount, data size, what you are filtering on etc.) would probably help.

@kkthxbye-code commented on GitHub (May 18, 2022): I can't quite replicate this. While there is a performance penalty of not excluding config_context, I can't replicate a difference between offset 5000 and no offset. I can try generating a larger dataset, but some more info with regards to what your config contexts look like (amount, data size, what you are filtering on etc.) would probably help.
Author
Owner

@BowmanKlinkenberg commented on GitHub (May 18, 2022):

@kkthxbye-code we did further testing and found that the offset param was not the root of the issue. It's rather count or in other words the size of the result.

With config context enabled, response time seems to scale directly with the count field of the result, regardless of limit

With config context disabled, response time remains low, and scales depending on limit

Our dataset is around 140K device records. So, it was just coincidental that using offset was the largest count could get, since querying on offset without other params returns all devices, paged by limit

I'll add another comment with SQL delays our team is seeing in the queries that include config context.

@BowmanKlinkenberg commented on GitHub (May 18, 2022): @kkthxbye-code we did further testing and found that the `offset` param was not the root of the issue. It's rather `count` or in other words the size of the result. With config context enabled, response time seems to scale directly with the `count` field of the result, regardless of `limit` With config context disabled, response time remains low, and scales depending on `limit` Our dataset is around 140K device records. So, it was just coincidental that using `offset` was the largest `count` could get, since querying on offset without other params returns all devices, paged by `limit` I'll add another comment with SQL delays our team is seeing in the queries that include config context.
Author
Owner

@BowmanKlinkenberg commented on GitHub (May 18, 2022):

Slow Queries from All w/Context

(5.138) SELECT COUNT(*) FROM (SELECT DISTINCT "dcim_device"."id" AS "col1", "dcim_device"."created" AS "col2", "dcim_device"."last_updated" AS "col3", "dcim_device"."custom_field_data" AS "col4", "dcim_device"."local_context_data" AS "col5", "dcim_device"."device_type_id" AS "col6", "dcim_device"."device_role_id" AS "col7", "dcim_device"."tenant_id" AS "col8", "dcim_device"."platform_id" AS "col9", "dcim_device"."name" AS "col10", "dcim_device"."_name" AS "col11", "dcim_device"."serial" AS "col12", "dcim_device"."asset_tag" AS "col13", "dcim_device"."site_id" AS "col14", "dcim_device"."location_id" AS "col15", "dcim_device"."rack_id" AS "col16", "dcim_device"."position" AS "col17", "dcim_device"."face" AS "col18", "dcim_device"."status" AS "col19", "dcim_device"."airflow" AS "col20", "dcim_device"."primary_ip4_id" AS "col21", "dcim_device"."primary_ip6_id" AS "col22", "dcim_device"."cluster_id" AS "col23", "dcim_device"."virtual_chassis_id" AS "col24", "dcim_device"."vc_position" AS "col25", "dcim_device"."vc_priority" AS "col26", "dcim_device"."comments" AS "col27", (SELECT JSONB_AGG(V0."data" ORDER BY V0."weight", V0."name") AS "_data" FROM "extras_configcontext" V0 LEFT OUTER JOIN "extras_configcontext_platforms" V1 ON (V0."id" = V1."configcontext_id") LEFT OUTER JOIN "extras_configcontext_cluster_types" V3 ON (V0."id" = V3."configcontext_id") LEFT OUTER JOIN "extras_configcontext_cluster_groups" V5 ON (V0."id" = V5."configcontext_id") LEFT OUTER JOIN "extras_configcontext_clusters" V7 ON (V0."id" = V7."configcontext_id") LEFT OUTER JOIN "extras_configcontext_tenant_groups" V9 ON (V0."id" = V9."configcontext_id") LEFT OUTER JOIN "extras_configcontext_tenants" V11 ON (V0."id" = V11."configcontext_id") LEFT OUTER JOIN "extras_configcontext_tags" V13 ON (V0."id" = V13."configcontext_id") LEFT OUTER JOIN "extras_configcontext_device_types" V15 ON (V0."id" = V15."configcontext_id") LEFT OUTER JOIN "extras_configcontext_roles" V17 ON (V0."id" = V17."configcontext_id") LEFT OUTER JOIN "extras_configcontext_sites" V19 ON (V0."id" = V19."configcontext_id") LEFT OUTER JOIN "extras_configcontext_regions" V21 ON (V0."id" = V21."configcontext_id") LEFT OUTER JOIN "dcim_region" V22 ON (V21."region_id" = V22."id") LEFT OUTER JOIN "extras_configcontext_site_groups" V23 ON (V0."id" = V23."configcontext_id") LEFT OUTER JOIN "dcim_sitegroup" V24 ON (V23."sitegroup_id" = V24."id") WHERE ((V1."platform_id" = ("dcim_device"."platform_id") OR V1."platform_id" IS NULL) AND (V3."clustertype_id" = ("virtualization_cluster"."type_id") OR V3."clustertype_id" IS NULL) AND (V5."clustergroup_id" = ("virtualization_cluster"."group_id") OR V5."clustergroup_id" IS NULL) AND (V7."cluster_id" = ("dcim_device"."cluster_id") OR V7."cluster_id" IS NULL) AND (V9."tenantgroup_id" = ("tenancy_tenant"."group_id") OR V9."tenantgroup_id" IS NULL) AND (V11."tenant_id" = ("dcim_device"."tenant_id") OR V11."tenant_id" IS NULL) AND (V13."tag_id" IN (SELECT U0."tag_id" FROM "extras_taggeditem" U0 INNER JOIN "django_content_type" U1 ON (U0."content_type_id" = U1."id") WHERE (U1."app_label" = 'dcim' AND U1."model" = 'device' AND U0."object_id" = ("dcim_device"."id"))) OR V13."tag_id" IS NULL) AND V0."is_active" AND (V15."devicetype_id" = ("dcim_device"."device_type_id") OR V15."devicetype_id" IS NULL) AND (V17."devicerole_id" = ("dcim_device"."device_role_id") OR V17."devicerole_id" IS NULL) AND (V19."site_id" = ("dcim_device"."site_id") OR V19."site_id" IS NULL) AND ((V22."level" <= ("dcim_region"."level") AND V22."lft" <= ("dcim_region"."lft") AND V22."rght" >= ("dcim_region"."rght") AND V22."tree_id" = ("dcim_region"."tree_id")) OR V21."region_id" IS NULL) AND ((V24."level" <= ("dcim_sitegroup"."level") AND V24."lft" <= ("dcim_sitegroup"."lft") AND V24."rght" >= ("dcim_sitegroup"."rght") AND V24."tree_id" = ("dcim_sitegroup"."tree_id")) OR V23."sitegroup_id" IS NULL))) AS "config_context_data" FROM "dcim_device" LEFT OUTER JOIN "virtualization_cluster" ON ("dcim_device"."cluster_id" = "virtualization_cluster"."id") LEFT OUTER JOIN "tenancy_tenant" ON ("dcim_device"."tenant_id" = "tenancy_tenant"."id") INNER JOIN "dcim_site" ON ("dcim_device"."site_id" = "dcim_site"."id") LEFT OUTER JOIN "dcim_region" ON ("dcim_site"."region_id" = "dcim_region"."id") LEFT OUTER JOIN "dcim_sitegroup" ON ("dcim_site"."group_id" = "dcim_sitegroup"."id")) subquery; args=('dcim', 'device'); alias=default

(6.062) SELECT DISTINCT "dcim_device"."id", "dcim_device"."created", "dcim_device"."last_updated", "dcim_device"."custom_field_data", "dcim_device"."local_context_data", "dcim_device"."device_type_id", "dcim_device"."device_role_id", "dcim_device"."tenant_id", "dcim_device"."platform_id", "dcim_device"."name", "dcim_device"."_name", "dcim_device"."serial", "dcim_device"."asset_tag", "dcim_device"."site_id", "dcim_device"."location_id", "dcim_device"."rack_id", "dcim_device"."position", "dcim_device"."face", "dcim_device"."status", "dcim_device"."airflow", "dcim_device"."primary_ip4_id", "dcim_device"."primary_ip6_id", "dcim_device"."cluster_id", "dcim_device"."virtual_chassis_id", "dcim_device"."vc_position", "dcim_device"."vc_priority", "dcim_device"."comments", (SELECT JSONB_AGG(V0."data" ORDER BY V0."weight", V0."name") AS "_data" FROM "extras_configcontext" V0 LEFT OUTER JOIN "extras_configcontext_platforms" V1 ON (V0."id" = V1."configcontext_id") LEFT OUTER JOIN "extras_configcontext_cluster_types" V3 ON (V0."id" = V3."configcontext_id") LEFT OUTER JOIN "extras_configcontext_cluster_groups" V5 ON (V0."id" = V5."configcontext_id") LEFT OUTER JOIN "extras_configcontext_clusters" V7 ON (V0."id" = V7."configcontext_id") LEFT OUTER JOIN "extras_configcontext_tenant_groups" V9 ON (V0."id" = V9."configcontext_id") LEFT OUTER JOIN "extras_configcontext_tenants" V11 ON (V0."id" = V11."configcontext_id") LEFT OUTER JOIN "extras_configcontext_tags" V13 ON (V0."id" = V13."configcontext_id") LEFT OUTER JOIN "extras_configcontext_device_types" V15 ON (V0."id" = V15."configcontext_id") LEFT OUTER JOIN "extras_configcontext_roles" V17 ON (V0."id" = V17."configcontext_id") LEFT OUTER JOIN "extras_configcontext_sites" V19 ON (V0."id" = V19."configcontext_id") LEFT OUTER JOIN "extras_configcontext_regions" V21 ON (V0."id" = V21."configcontext_id") LEFT OUTER JOIN "dcim_region" V22 ON (V21."region_id" = V22."id") LEFT OUTER JOIN "extras_configcontext_site_groups" V23 ON (V0."id" = V23."configcontext_id") LEFT OUTER JOIN "dcim_sitegroup" V24 ON (V23."sitegroup_id" = V24."id") WHERE ((V1."platform_id" = ("dcim_device"."platform_id") OR V1."platform_id" IS NULL) AND (V3."clustertype_id" = ("virtualization_cluster"."type_id") OR V3."clustertype_id" IS NULL) AND (V5."clustergroup_id" = ("virtualization_cluster"."group_id") OR V5."clustergroup_id" IS NULL) AND (V7."cluster_id" = ("dcim_device"."cluster_id") OR V7."cluster_id" IS NULL) AND (V9."tenantgroup_id" = ("tenancy_tenant"."group_id") OR V9."tenantgroup_id" IS NULL) AND (V11."tenant_id" = ("dcim_device"."tenant_id") OR V11."tenant_id" IS NULL) AND (V13."tag_id" IN (SELECT U0."tag_id" FROM "extras_taggeditem" U0 INNER JOIN "django_content_type" U1 ON (U0."content_type_id" = U1."id") WHERE (U1."app_label" = 'dcim' AND U1."model" = 'device' AND U0."object_id" = ("dcim_device"."id"))) OR V13."tag_id" IS NULL) AND V0."is_active" AND (V15."devicetype_id" = ("dcim_device"."device_type_id") OR V15."devicetype_id" IS NULL) AND (V17."devicerole_id" = ("dcim_device"."device_role_id") OR V17."devicerole_id" IS NULL) AND (V19."site_id" = ("dcim_device"."site_id") OR V19."site_id" IS NULL) AND ((V22."level" <= ("dcim_region"."level") AND V22."lft" <= ("dcim_region"."lft") AND V22."rght" >= ("dcim_region"."rght") AND V22."tree_id" = ("dcim_region"."tree_id")) OR V21."region_id" IS NULL) AND ((V24."level" <= ("dcim_sitegroup"."level") AND V24."lft" <= ("dcim_sitegroup"."lft") AND V24."rght" >= ("dcim_sitegroup"."rght") AND V24."tree_id" = ("dcim_sitegroup"."tree_id")) OR V23."sitegroup_id" IS NULL))) AS "config_context_data" FROM "dcim_device" LEFT OUTER JOIN "virtualization_cluster" ON ("dcim_device"."cluster_id" = "virtualization_cluster"."id") LEFT OUTER JOIN "tenancy_tenant" ON ("dcim_device"."tenant_id" = "tenancy_tenant"."id") INNER JOIN "dcim_site" ON ("dcim_device"."site_id" = "dcim_site"."id") LEFT OUTER JOIN "dcim_region" ON ("dcim_site"."region_id" = "dcim_region"."id") LEFT OUTER JOIN "dcim_sitegroup" ON ("dcim_site"."group_id" = "dcim_sitegroup"."id") ORDER BY "dcim_device"."_name" ASC, "dcim_device"."id" ASC LIMIT 50; args=('dcim', 'device'); alias=default

Fast Queries w/o Context

(0.051) SELECT COUNT(*) AS "__count" FROM "dcim_device"; args=(); alias=default

(0.114) SELECT "dcim_device"."id", "dcim_device"."created", "dcim_device"."last_updated", "dcim_device"."custom_field_data", "dcim_device"."local_context_data", "dcim_device"."device_type_id", "dcim_device"."device_role_id", "dcim_device"."tenant_id", "dcim_device"."platform_id", "dcim_device"."name", "dcim_device"."_name", "dcim_device"."serial", "dcim_device"."asset_tag", "dcim_device"."site_id", "dcim_device"."location_id", "dcim_device"."rack_id", "dcim_device"."position", "dcim_device"."face", "dcim_device"."status", "dcim_device"."airflow", "dcim_device"."primary_ip4_id", "dcim_device"."primary_ip6_id", "dcim_device"."cluster_id", "dcim_device"."virtual_chassis_id", "dcim_device"."vc_position", "dcim_device"."vc_priority", "dcim_device"."comments" FROM "dcim_device" ORDER BY "dcim_device"."_name" ASC, "dcim_device"."id" ASC LIMIT 50; args=(); alias=default
@BowmanKlinkenberg commented on GitHub (May 18, 2022): Slow Queries from All w/Context ``` (5.138) SELECT COUNT(*) FROM (SELECT DISTINCT "dcim_device"."id" AS "col1", "dcim_device"."created" AS "col2", "dcim_device"."last_updated" AS "col3", "dcim_device"."custom_field_data" AS "col4", "dcim_device"."local_context_data" AS "col5", "dcim_device"."device_type_id" AS "col6", "dcim_device"."device_role_id" AS "col7", "dcim_device"."tenant_id" AS "col8", "dcim_device"."platform_id" AS "col9", "dcim_device"."name" AS "col10", "dcim_device"."_name" AS "col11", "dcim_device"."serial" AS "col12", "dcim_device"."asset_tag" AS "col13", "dcim_device"."site_id" AS "col14", "dcim_device"."location_id" AS "col15", "dcim_device"."rack_id" AS "col16", "dcim_device"."position" AS "col17", "dcim_device"."face" AS "col18", "dcim_device"."status" AS "col19", "dcim_device"."airflow" AS "col20", "dcim_device"."primary_ip4_id" AS "col21", "dcim_device"."primary_ip6_id" AS "col22", "dcim_device"."cluster_id" AS "col23", "dcim_device"."virtual_chassis_id" AS "col24", "dcim_device"."vc_position" AS "col25", "dcim_device"."vc_priority" AS "col26", "dcim_device"."comments" AS "col27", (SELECT JSONB_AGG(V0."data" ORDER BY V0."weight", V0."name") AS "_data" FROM "extras_configcontext" V0 LEFT OUTER JOIN "extras_configcontext_platforms" V1 ON (V0."id" = V1."configcontext_id") LEFT OUTER JOIN "extras_configcontext_cluster_types" V3 ON (V0."id" = V3."configcontext_id") LEFT OUTER JOIN "extras_configcontext_cluster_groups" V5 ON (V0."id" = V5."configcontext_id") LEFT OUTER JOIN "extras_configcontext_clusters" V7 ON (V0."id" = V7."configcontext_id") LEFT OUTER JOIN "extras_configcontext_tenant_groups" V9 ON (V0."id" = V9."configcontext_id") LEFT OUTER JOIN "extras_configcontext_tenants" V11 ON (V0."id" = V11."configcontext_id") LEFT OUTER JOIN "extras_configcontext_tags" V13 ON (V0."id" = V13."configcontext_id") LEFT OUTER JOIN "extras_configcontext_device_types" V15 ON (V0."id" = V15."configcontext_id") LEFT OUTER JOIN "extras_configcontext_roles" V17 ON (V0."id" = V17."configcontext_id") LEFT OUTER JOIN "extras_configcontext_sites" V19 ON (V0."id" = V19."configcontext_id") LEFT OUTER JOIN "extras_configcontext_regions" V21 ON (V0."id" = V21."configcontext_id") LEFT OUTER JOIN "dcim_region" V22 ON (V21."region_id" = V22."id") LEFT OUTER JOIN "extras_configcontext_site_groups" V23 ON (V0."id" = V23."configcontext_id") LEFT OUTER JOIN "dcim_sitegroup" V24 ON (V23."sitegroup_id" = V24."id") WHERE ((V1."platform_id" = ("dcim_device"."platform_id") OR V1."platform_id" IS NULL) AND (V3."clustertype_id" = ("virtualization_cluster"."type_id") OR V3."clustertype_id" IS NULL) AND (V5."clustergroup_id" = ("virtualization_cluster"."group_id") OR V5."clustergroup_id" IS NULL) AND (V7."cluster_id" = ("dcim_device"."cluster_id") OR V7."cluster_id" IS NULL) AND (V9."tenantgroup_id" = ("tenancy_tenant"."group_id") OR V9."tenantgroup_id" IS NULL) AND (V11."tenant_id" = ("dcim_device"."tenant_id") OR V11."tenant_id" IS NULL) AND (V13."tag_id" IN (SELECT U0."tag_id" FROM "extras_taggeditem" U0 INNER JOIN "django_content_type" U1 ON (U0."content_type_id" = U1."id") WHERE (U1."app_label" = 'dcim' AND U1."model" = 'device' AND U0."object_id" = ("dcim_device"."id"))) OR V13."tag_id" IS NULL) AND V0."is_active" AND (V15."devicetype_id" = ("dcim_device"."device_type_id") OR V15."devicetype_id" IS NULL) AND (V17."devicerole_id" = ("dcim_device"."device_role_id") OR V17."devicerole_id" IS NULL) AND (V19."site_id" = ("dcim_device"."site_id") OR V19."site_id" IS NULL) AND ((V22."level" <= ("dcim_region"."level") AND V22."lft" <= ("dcim_region"."lft") AND V22."rght" >= ("dcim_region"."rght") AND V22."tree_id" = ("dcim_region"."tree_id")) OR V21."region_id" IS NULL) AND ((V24."level" <= ("dcim_sitegroup"."level") AND V24."lft" <= ("dcim_sitegroup"."lft") AND V24."rght" >= ("dcim_sitegroup"."rght") AND V24."tree_id" = ("dcim_sitegroup"."tree_id")) OR V23."sitegroup_id" IS NULL))) AS "config_context_data" FROM "dcim_device" LEFT OUTER JOIN "virtualization_cluster" ON ("dcim_device"."cluster_id" = "virtualization_cluster"."id") LEFT OUTER JOIN "tenancy_tenant" ON ("dcim_device"."tenant_id" = "tenancy_tenant"."id") INNER JOIN "dcim_site" ON ("dcim_device"."site_id" = "dcim_site"."id") LEFT OUTER JOIN "dcim_region" ON ("dcim_site"."region_id" = "dcim_region"."id") LEFT OUTER JOIN "dcim_sitegroup" ON ("dcim_site"."group_id" = "dcim_sitegroup"."id")) subquery; args=('dcim', 'device'); alias=default (6.062) SELECT DISTINCT "dcim_device"."id", "dcim_device"."created", "dcim_device"."last_updated", "dcim_device"."custom_field_data", "dcim_device"."local_context_data", "dcim_device"."device_type_id", "dcim_device"."device_role_id", "dcim_device"."tenant_id", "dcim_device"."platform_id", "dcim_device"."name", "dcim_device"."_name", "dcim_device"."serial", "dcim_device"."asset_tag", "dcim_device"."site_id", "dcim_device"."location_id", "dcim_device"."rack_id", "dcim_device"."position", "dcim_device"."face", "dcim_device"."status", "dcim_device"."airflow", "dcim_device"."primary_ip4_id", "dcim_device"."primary_ip6_id", "dcim_device"."cluster_id", "dcim_device"."virtual_chassis_id", "dcim_device"."vc_position", "dcim_device"."vc_priority", "dcim_device"."comments", (SELECT JSONB_AGG(V0."data" ORDER BY V0."weight", V0."name") AS "_data" FROM "extras_configcontext" V0 LEFT OUTER JOIN "extras_configcontext_platforms" V1 ON (V0."id" = V1."configcontext_id") LEFT OUTER JOIN "extras_configcontext_cluster_types" V3 ON (V0."id" = V3."configcontext_id") LEFT OUTER JOIN "extras_configcontext_cluster_groups" V5 ON (V0."id" = V5."configcontext_id") LEFT OUTER JOIN "extras_configcontext_clusters" V7 ON (V0."id" = V7."configcontext_id") LEFT OUTER JOIN "extras_configcontext_tenant_groups" V9 ON (V0."id" = V9."configcontext_id") LEFT OUTER JOIN "extras_configcontext_tenants" V11 ON (V0."id" = V11."configcontext_id") LEFT OUTER JOIN "extras_configcontext_tags" V13 ON (V0."id" = V13."configcontext_id") LEFT OUTER JOIN "extras_configcontext_device_types" V15 ON (V0."id" = V15."configcontext_id") LEFT OUTER JOIN "extras_configcontext_roles" V17 ON (V0."id" = V17."configcontext_id") LEFT OUTER JOIN "extras_configcontext_sites" V19 ON (V0."id" = V19."configcontext_id") LEFT OUTER JOIN "extras_configcontext_regions" V21 ON (V0."id" = V21."configcontext_id") LEFT OUTER JOIN "dcim_region" V22 ON (V21."region_id" = V22."id") LEFT OUTER JOIN "extras_configcontext_site_groups" V23 ON (V0."id" = V23."configcontext_id") LEFT OUTER JOIN "dcim_sitegroup" V24 ON (V23."sitegroup_id" = V24."id") WHERE ((V1."platform_id" = ("dcim_device"."platform_id") OR V1."platform_id" IS NULL) AND (V3."clustertype_id" = ("virtualization_cluster"."type_id") OR V3."clustertype_id" IS NULL) AND (V5."clustergroup_id" = ("virtualization_cluster"."group_id") OR V5."clustergroup_id" IS NULL) AND (V7."cluster_id" = ("dcim_device"."cluster_id") OR V7."cluster_id" IS NULL) AND (V9."tenantgroup_id" = ("tenancy_tenant"."group_id") OR V9."tenantgroup_id" IS NULL) AND (V11."tenant_id" = ("dcim_device"."tenant_id") OR V11."tenant_id" IS NULL) AND (V13."tag_id" IN (SELECT U0."tag_id" FROM "extras_taggeditem" U0 INNER JOIN "django_content_type" U1 ON (U0."content_type_id" = U1."id") WHERE (U1."app_label" = 'dcim' AND U1."model" = 'device' AND U0."object_id" = ("dcim_device"."id"))) OR V13."tag_id" IS NULL) AND V0."is_active" AND (V15."devicetype_id" = ("dcim_device"."device_type_id") OR V15."devicetype_id" IS NULL) AND (V17."devicerole_id" = ("dcim_device"."device_role_id") OR V17."devicerole_id" IS NULL) AND (V19."site_id" = ("dcim_device"."site_id") OR V19."site_id" IS NULL) AND ((V22."level" <= ("dcim_region"."level") AND V22."lft" <= ("dcim_region"."lft") AND V22."rght" >= ("dcim_region"."rght") AND V22."tree_id" = ("dcim_region"."tree_id")) OR V21."region_id" IS NULL) AND ((V24."level" <= ("dcim_sitegroup"."level") AND V24."lft" <= ("dcim_sitegroup"."lft") AND V24."rght" >= ("dcim_sitegroup"."rght") AND V24."tree_id" = ("dcim_sitegroup"."tree_id")) OR V23."sitegroup_id" IS NULL))) AS "config_context_data" FROM "dcim_device" LEFT OUTER JOIN "virtualization_cluster" ON ("dcim_device"."cluster_id" = "virtualization_cluster"."id") LEFT OUTER JOIN "tenancy_tenant" ON ("dcim_device"."tenant_id" = "tenancy_tenant"."id") INNER JOIN "dcim_site" ON ("dcim_device"."site_id" = "dcim_site"."id") LEFT OUTER JOIN "dcim_region" ON ("dcim_site"."region_id" = "dcim_region"."id") LEFT OUTER JOIN "dcim_sitegroup" ON ("dcim_site"."group_id" = "dcim_sitegroup"."id") ORDER BY "dcim_device"."_name" ASC, "dcim_device"."id" ASC LIMIT 50; args=('dcim', 'device'); alias=default ``` Fast Queries w/o Context ``` (0.051) SELECT COUNT(*) AS "__count" FROM "dcim_device"; args=(); alias=default (0.114) SELECT "dcim_device"."id", "dcim_device"."created", "dcim_device"."last_updated", "dcim_device"."custom_field_data", "dcim_device"."local_context_data", "dcim_device"."device_type_id", "dcim_device"."device_role_id", "dcim_device"."tenant_id", "dcim_device"."platform_id", "dcim_device"."name", "dcim_device"."_name", "dcim_device"."serial", "dcim_device"."asset_tag", "dcim_device"."site_id", "dcim_device"."location_id", "dcim_device"."rack_id", "dcim_device"."position", "dcim_device"."face", "dcim_device"."status", "dcim_device"."airflow", "dcim_device"."primary_ip4_id", "dcim_device"."primary_ip6_id", "dcim_device"."cluster_id", "dcim_device"."virtual_chassis_id", "dcim_device"."vc_position", "dcim_device"."vc_priority", "dcim_device"."comments" FROM "dcim_device" ORDER BY "dcim_device"."_name" ASC, "dcim_device"."id" ASC LIMIT 50; args=(); alias=default ```
Author
Owner

@BowmanKlinkenberg commented on GitHub (May 18, 2022):

PS this isn't unique to a count of over 100K, it's just easier to measure the bigger the difference gets in test cases. After accounting for the base API response overhead, it seems to scale directly with count

Also, this doesn't seem to be related to whether or not config context is actually there or what size it is. It seems to be just related to whether exclude=config_context is present. It seems that the resulting query performs far differently.

@BowmanKlinkenberg commented on GitHub (May 18, 2022): PS this isn't unique to a `count` of over 100K, it's just easier to measure the bigger the difference gets in test cases. After accounting for the base API response overhead, it seems to scale directly with `count` Also, this doesn't seem to be related to whether or not config context is actually there or what size it is. It seems to be just related to whether `exclude=config_context` is present. It seems that the resulting query performs far differently.
Author
Owner

@kkthxbye-code commented on GitHub (May 21, 2022):

Had some time to look at it a little. There's two issues.

First is that the query is just heavy.

https://explain.dalibo.com/plan/VkK#plan

There's some obvious squential scans that seem slow:

https://github.com/netbox-community/netbox/blob/develop/netbox/extras/querysets.py#L103-L111

But it's only around 12% of the total query. I'm sure the entire query can be done faster, but I'm no where near good enough at the django ORM. Maybe it's possible to do the config_context annotation after the main query has been limited. I'm hoping @jeremystretch has time at some point to look at this.

Second issue is that as you said, there's a count executed as well. This effectively doubles the query time as django rest frameworks paginator basically just does a count around the entire query. Not sure if it's possible to define a custom pagination_class that doesn't annotate the queryset with config_context. Again, Jeremy or someone with more django experience probably needs to look at it.

For reference, the sql timing showing the count first then the real query:

image

@kkthxbye-code commented on GitHub (May 21, 2022): Had some time to look at it a little. There's two issues. First is that the query is just heavy. https://explain.dalibo.com/plan/VkK#plan There's some obvious squential scans that seem slow: https://github.com/netbox-community/netbox/blob/develop/netbox/extras/querysets.py#L103-L111 But it's only around 12% of the total query. I'm sure the entire query can be done faster, but I'm no where near good enough at the django ORM. Maybe it's possible to do the config_context annotation after the main query has been limited. I'm hoping @jeremystretch has time at some point to look at this. Second issue is that as you said, there's a count executed as well. This effectively doubles the query time as django rest frameworks paginator basically just does a count around the entire query. Not sure if it's possible to define a custom pagination_class that doesn't annotate the queryset with config_context. Again, Jeremy or someone with more django experience probably needs to look at it. For reference, the sql timing showing the count first then the real query: ![image](https://user-images.githubusercontent.com/400797/169667123-08df20d5-77d6-4a4a-9b4d-cc81a3f8d08a.png)
Author
Owner

@maximumG commented on GitHub (May 24, 2022):

We have exactly the same issue on our netbox instance with around 20K devices with 99 config context. Out of these 99 config context, around 10 are really big JSON structure applied to all of the 20K devices.

After some deep investigation it seems that collapsing every JSON config context for every devices in pure postgreSQL takes a long time. As pinpointed by @kkthxbye-code, the same big query is executed twice: on time for the device count rendering and a second time to render every devices with their associated config context.

I guess we can find some pagination workaround to avoid executing this long-running query just to render a device count. Does anyone knows if we can customize the pagination for DRF ?

@maximumG commented on GitHub (May 24, 2022): We have exactly the same issue on our netbox instance with around 20K devices with 99 config context. Out of these 99 config context, around 10 are really big JSON structure applied to all of the 20K devices. After some deep investigation it seems that collapsing every JSON config context for every devices in pure postgreSQL takes a long time. As pinpointed by @kkthxbye-code, the same big query is executed twice: on time for the device count rendering and a second time to render every devices with their associated config context. I guess we can find some pagination workaround to avoid executing this long-running query just to render a device count. Does anyone knows if we can customize the pagination for DRF ?
Author
Owner

@kkthxbye-code commented on GitHub (Jun 13, 2022):

I will tentatively own this, with the goal of removing the extra count overhead. Cloning the queryset in the paginator and running queryset.query.annotations.clear() seems to work as intended and removes the vast majority of the extra overhead.

@kkthxbye-code commented on GitHub (Jun 13, 2022): I will tentatively own this, with the goal of removing the extra count overhead. Cloning the queryset in the paginator and running queryset.query.annotations.clear() seems to work as intended and removes the vast majority of the extra overhead.
Author
Owner

@jeremystretch commented on GitHub (Jun 13, 2022):

Thanks @kkthxbye-code, I've been meaning to dig into this but haven't gotten to it yet. Let me know if I can be of any assistance.

@jeremystretch commented on GitHub (Jun 13, 2022): Thanks @kkthxbye-code, I've been meaning to dig into this but haven't gotten to it yet. Let me know if I can be of any assistance.
Author
Owner

@kkthxbye-code commented on GitHub (Jun 13, 2022):

I implemented a partial fix which should halve the load time of the device list through the API when config context is not excluded. It's really more of a workaround as I still think there should be a way to avoid the query scaling linearly with the number of devices, I'm still just hitting my head against the ORM when trying to fix it.

I'll try to keep digging when I get the time, but we probably still need your help here @jeremystretch !

@kkthxbye-code commented on GitHub (Jun 13, 2022): I implemented a partial fix which should halve the load time of the device list through the API when config context is not excluded. It's really more of a workaround as I still think there should be a way to avoid the query scaling linearly with the number of devices, I'm still just hitting my head against the ORM when trying to fix it. I'll try to keep digging when I get the time, but we probably still need your help here @jeremystretch !
Author
Owner

@jeremystretch commented on GitHub (Jun 17, 2022):

Although there likely remains work we can do to better optimize these queries, I'm going to leave this as closed by @kkthxbye-code's PR #9527 for a few reasons:

  • The PR effectively reduces query time by 50%, which is a huge win.
  • A new round of testing & evaluation is needed now that this change has been put into place.
  • IMO strategic performance improvements (beyond glaring inefficiencies) should be approached as feature work, rather than bugs.

Anyone is welcome to submit a follow-on issue if they'd like to further pursue improvements in this area once v3.2.5 has been released.

@jeremystretch commented on GitHub (Jun 17, 2022): Although there likely remains work we can do to better optimize these queries, I'm going to leave this as closed by @kkthxbye-code's PR #9527 for a few reasons: * The PR effectively reduces query time by 50%, which is a huge win. * A new round of testing & evaluation is needed now that this change has been put into place. * IMO strategic performance improvements (beyond glaring inefficiencies) should be approached as feature work, rather than bugs. Anyone is welcome to submit a follow-on issue if they'd like to further pursue improvements in this area once v3.2.5 has been released.
Author
Owner

@BowmanKlinkenberg commented on GitHub (Jun 17, 2022):

Great news! Thanks for your work and attention to this, @kkthxbye-code @jeremystretch!

@BowmanKlinkenberg commented on GitHub (Jun 17, 2022): Great news! Thanks for your work and attention to this, @kkthxbye-code @jeremystretch!
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/netbox#6487