Slow queries against table 'extras_objectchange' #2776

Closed
opened 2025-12-29 18:22:04 +01:00 by adam · 2 comments

Originally created by @ir0n-curta1n on GitHub (Aug 5, 2019).

Environment

  • Python version: 3.6
  • NetBox version: 2.6.1

Steps to Reproduce

  1. Populate table 'extras_objectchange' with a large number of rows. In my case, one million or so; the row count needed for the issue to become really noticeable will vary with disk type, amount/type of memory, and number of CPUs.
  2. Send multiple read/update requests in parallel to the NetBox web API via pynetbox. In my case, this was a script reading from our old DCIM source and sending the corresponding requests to the NetBox web API concurrently using asyncio (a minimal sketch of this pattern follows this list).
  3. Enable PostgreSQL statement logging and notice long-running statements against table 'extras_objectchange'.
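
For reference, the load pattern from step 2 can be sketched roughly as below. This is a hedged reproduction sketch, not our actual migration script: the URL, token, and device payload are placeholders, and pynetbox (which is synchronous) is driven from asyncio through a thread pool executor. Every successful write adds a row to table 'extras_objectchange'.

import asyncio

import pynetbox

nb = pynetbox.api("https://netbox.example.com", token="0123456789abcdef")

def create_device(i):
    # Placeholder payload; in the real script the data comes from the old DCIM tool.
    return nb.dcim.devices.create(
        name="migrated-device-{}".format(i),
        device_type=1,
        device_role=1,
        site=1,
    )

async def main(count=1000, workers=10):
    loop = asyncio.get_event_loop()
    sem = asyncio.Semaphore(workers)

    async def one(i):
        async with sem:
            # pynetbox calls block, so hand each one to the default executor.
            await loop.run_in_executor(None, create_device, i)

    await asyncio.gather(*(one(i) for i in range(count)))

asyncio.get_event_loop().run_until_complete(main())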

Expected Behavior

I expected API requests to keep completing quickly, without database performance degrading as the changelog table grew.

Observed Behavior

I believe the web API issues SQL statements that filter table 'extras_objectchange' on column 'time' (a column with no index), creating a bottleneck: each such statement triggers a sequential scan of 'extras_objectchange', and database performance suffers badly. Table 'extras_objectchange' needs an index on column 'time'. Here's the long version of what was observed:

My company is migrating off a DCIM product built completely in-house. As one of the steps in the migration we've written Python scripts that read our old DCIM tool and populate NetBox with the fetched data. The migration script uses pynetbox to send requests to the NetBox web API. Since we haven't gotten the migrated data just right on the first, second, or nth try, we've produced a good number of rows in table 'extras_objectchange'; the CHANGELOG_RETENTION value is kept at the default of 90 days.

At a certain point we noticed the migration script started running much slower. On investigation, gunicorn workers were queueing requests, consuming memory, and eventually causing the host to swap. Enabling PostgreSQL statement logging showed a large number of long-running statements against table 'extras_objectchange', such as:
SELECT "extras_objectchange"."id", "extras_objectchange"."time", "extras_objectchange"."user_id", "extras_objectchange"."user_name", "extras_objectchange"."request_id", "extras_objectchange"."action", "extras_objectchange"."changed_object_type_id", "extras_objectchange"."changed_object_id", "extras_objectchange"."related_object_type_id", "extras_objectchange"."related_object_id", "extras_objectchange"."object_repr", "extras_objectchange"."object_data" FROM "extras_objectchange" WHERE "extras_objectchange"."time" < '2019-04-26T14:44:52.229708+00:00'::timestamptz;

The static timestamp value used in the above statement was exactly 90 days in the past (our/default logging retention period) at the time of execution. Although I haven't traced these statements to a definite source, I don't see any PostgreSQL table triggers that would produce them, so I assume this is some sort of statement generated internally by NetBox. In any case, we see a lot of these statements when running our sync script. The problem with the statements is that their WHERE clause compares the 'time' column to a static value, and there appears to be no index on the 'time' column of table 'extras_objectchange'. This, combined with the number of rows we have in the table, makes for really poor performance: we're sequentially scanning a 1,000,000-row table.

explain SELECT "extras_objectchange"."id", "extras_objectchange"."time", "extras_objectchange"."user_id", "extras_objectchange"."user_name", "extras_objectchange"."request_id", "extras_objectchange"."action", "extras_objectchange"."changed_object_type_id", "extras_objectchange"."changed_object_id", "extras_objectchange"."related_object_type_id", "extras_objectchange"."related_object_id", "extras_objectchange"."object_repr", "extras_objectchange"."object_data" FROM "extras_objectchange" WHERE "extras_objectchange"."time" < '2019-04-26T14:44:52.229708+00:00'::timestamptz;
                                   QUERY PLAN     
--------------------------------------------------------------------------------
 Seq Scan on extras_objectchange  (cost=0.00..130200.30 rows=948049 width=364)
   Filter: ("time" < '2019-04-26 10:44:52.229708-04'::timestamp with time zone)
(2 rows)
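
As an interim workaround one could create the missing index by hand. The snippet below is a hedged sketch only, assuming direct access to the NetBox database (e.g. from 'manage.py shell'); the index name is arbitrary and not something NetBox defines.

from django.db import connection

with connection.cursor() as cursor:
    # "time" is quoted because it is also a PostgreSQL type name.
    # CONCURRENTLY would avoid locking the table, but cannot run inside a transaction.
    cursor.execute(
        'CREATE INDEX IF NOT EXISTS extras_objectchange_time_idx '
        'ON extras_objectchange ("time")'
    )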

Thanks for all the hard work creating and maintaining an awesome DCIM/IPAM product.

adam added the type: bug, status: accepted labels 2025-12-29 18:22:04 +01:00
adam closed this issue 2025-12-29 18:22:04 +01:00

@jeremystretch commented on GitHub (Aug 28, 2019):

The static timestamp value used in the above statement was exactly 90 days in the past (our/default logging retention period) at the time of execution. Although I haven't traced these statements to a definite source, I don't see any PostgreSQL table triggers that would produce them, so I assume this is some sort of statement generated internally by NetBox.

This is probably the auto-purge rule triggering. Every time the change logging middleware runs, there's a 1% chance that NetBox will purge expired Changelog entries. Reviewing the code, it occurs to me that this routine is being called far more often than it should be. The 1% chance is applied for every request rather than only for requests which effect changes. So that's one problem, and I've opened #3459 for that to be addressed separately.

Edit: Disregard. This only runs if a change has been logged.
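
To make the above concrete, the purge boils down to something like the following. This is a hedged sketch rather than the actual middleware code; only the CHANGELOG_RETENTION setting and the ObjectChange model come from NetBox itself, and the helper name is illustrative.

import random
from datetime import timedelta

from django.conf import settings
from django.utils import timezone

from extras.models import ObjectChange

def maybe_purge_changelog():
    # Roughly 1 in 100 logged changes triggers a purge of expired entries.
    if random.randint(1, 100) > 1:
        return
    if not settings.CHANGELOG_RETENTION:
        return
    cutoff = timezone.now() - timedelta(days=settings.CHANGELOG_RETENTION)
    # Deleting through the ORM first selects the matching rows, which is the
    # time < <static timestamp> query seen in the PostgreSQL logs; without an
    # index on "time" this is a sequential scan.
    ObjectChange.objects.filter(time__lt=cutoff).delete()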

If you had asked me whether Django DateTimeFields get an index created automatically, I would have said yes, but it appears that's not the case. We'll need to add an index for the field manually.
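
On the Django side the fix amounts to indexing the field, roughly as sketched below. The field options other than db_index are assumptions for illustration, not the actual NetBox model definition; running makemigrations after such a change produces an AlterField migration that creates the index in PostgreSQL.

from django.db import models

class ObjectChange(models.Model):
    # db_index=True makes Django create a B-tree index on "time".
    time = models.DateTimeField(auto_now_add=True, editable=False, db_index=True)
    # ... remaining fields unchanged ...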

@ir0n-curta1n commented on GitHub (Sep 4, 2019):

Awesome. Thank you for fixing this.

Reference: starred/netbox#2776