Creating devices takes an inordinate amount of time #7652

Closed
opened 2025-12-29 20:26:29 +01:00 by adam · 7 comments
Owner

Originally created by @kkthxbye-code on GitHub (Feb 17, 2023).

Originally assigned to: @jeremystretch on GitHub.

NetBox version

v3.4.4

Python version

3.10

Steps to Reproduce

This issue will be used to collect what I find while investigating the issue, as such the scenario is based on my test database and test devicetype.

  1. Create a devicetype with 62 interfaces
  2. Create a device based on the interface
  3. Notice the responsetime

Expected Behavior

The device should be create within a reasonable time period.

Observed Behavior

The device takes many seconds to create.

Do note, that while the performance was never great, it was made worse by both the search indexing and the fact that we fixed a bug where changelog entries were not created for instantiated components.

Originally created by @kkthxbye-code on GitHub (Feb 17, 2023). Originally assigned to: @jeremystretch on GitHub. ### NetBox version v3.4.4 ### Python version 3.10 ### Steps to Reproduce This issue will be used to collect what I find while investigating the issue, as such the scenario is based on my test database and test devicetype. 1. Create a devicetype with 62 interfaces 2. Create a device based on the interface 3. Notice the responsetime ### Expected Behavior The device should be create within a reasonable time period. ### Observed Behavior The device takes many seconds to create. Do note, that while the performance was never great, it was made worse by both the search indexing and the fact that we fixed a bug where changelog entries were not created for instantiated components.
adam added the type: bugstatus: accepted labels 2025-12-29 20:26:29 +01:00
adam closed this issue 2025-12-29 20:26:29 +01:00
Author
Owner

@kkthxbye-code commented on GitHub (Feb 17, 2023):

Some background around my test database:

Row counts:

extras_cachedvalue: 798217
dcim_device: 9622
dcim_interface: 97056

Rough response time (without debug) for a devicetype with 62 interfaces: 6.01 seconds

Initial findings:

A lot of the processing time is spent deleting from extras_cachedvalue:
image

It's slow because there's no index on object_id. Adding an index on object_id:

create index extras_cachedvalue_object_id_sdfsdf ON extras_cachedvalue(object_id);

Response time after adding the index: 2.36 s (~60% decrease).

Before index:
image

After index:
image

Insertions did not seem to be noticeably effected in my testing, but reindexing might be slower, will need to test. Otherwise this seems like a very easy win.

@kkthxbye-code commented on GitHub (Feb 17, 2023): Some background around my test database: Row counts: extras_cachedvalue: 798217 dcim_device: 9622 dcim_interface: 97056 Rough response time (without debug) for a devicetype with 62 interfaces: 6.01 seconds Initial findings: A lot of the processing time is spent deleting from extras_cachedvalue: ![image](https://user-images.githubusercontent.com/400797/219611087-e81be65f-8d58-49c8-94c0-c1961c8c3bf8.png) It's slow because there's no index on object_id. Adding an index on object_id: ```create index extras_cachedvalue_object_id_sdfsdf ON extras_cachedvalue(object_id);``` Response time after adding the index: 2.36 s (~60% decrease). Before index: ![image](https://user-images.githubusercontent.com/400797/219612037-582db89a-e611-4fdd-9a09-4f5de7c43121.png) After index: ![image](https://user-images.githubusercontent.com/400797/219611927-e78d00f3-3235-492f-8e4d-95ed8d7f1cfa.png) Insertions did not seem to be noticeably effected in my testing, but reindexing might be slower, will need to test. Otherwise this seems like a very easy win.
Author
Owner

@kkthxbye-code commented on GitHub (Feb 17, 2023):

I tested reindexing with and with an index on object_id and it seems like there's no difference.

With index:
python manage.py reindex 76,35s user 1,80s system 74% cpu 1:44,20 total

Without index:
python manage.py reindex 76,78s user 1,79s system 76% cpu 1:42,92 total

@kkthxbye-code commented on GitHub (Feb 17, 2023): I tested reindexing with and with an index on object_id and it seems like there's no difference. With index: python manage.py reindex 76,35s user 1,80s system 74% cpu 1:44,20 total Without index: python manage.py reindex 76,78s user 1,79s system 76% cpu 1:42,92 total
Author
Owner

@DanSheps commented on GitHub (Feb 17, 2023):

The device takes many seconds to create.

Do note, that while the performance was never great, it was made worse by both the search indexing and the fact that we fixed a bug where changelog entries were not created for instantiated components.

Wonder if we could push changelogging to the RQWorker (maybe include the option to spin up additional workers as well) for certain execution paths (anything in a RQ worker already can be completed there, anything web/api can go through the RQ).

This might also offer additional performance improvements, you just wouldn't get real-time change logging.

@DanSheps commented on GitHub (Feb 17, 2023): > The device takes many seconds to create. > > Do note, that while the performance was never great, it was made worse by both the search indexing and the fact that we fixed a bug where changelog entries were not created for instantiated components. Wonder if we could push changelogging to the RQWorker (maybe include the option to spin up additional workers as well) for certain execution paths (anything in a RQ worker already can be completed there, anything web/api can go through the RQ). This might also offer additional performance improvements, you just wouldn't get real-time change logging.
Author
Owner

@jeremystretch commented on GitHub (Feb 18, 2023):

Thanks for tackling this @kkthxbye-code!

As we discussed earlier today, we can sidestep the root issue here and avoid clearing any existing cache entries altogether when a new object is created (for which it's impossible to have any pre-existing entries). Of course, we should still add the index to serve cases where we do need to clear them.

Response time after adding the index: 2.36 s (~60 decrease).

I'm not able to reproduce such a stark reduction, but it's likely my cache just isn't large enough to impose that much of a delay. Can you say roughly how many object your test cache contains?

Wonder if we could push changelogging to the RQWorker

I opened #11202 to explore that idea recently, but haven't done any work on it yet.

@jeremystretch commented on GitHub (Feb 18, 2023): Thanks for tackling this @kkthxbye-code! As we discussed earlier today, we can sidestep the root issue here and avoid clearing any existing cache entries altogether when a new object is created (for which it's impossible to have any pre-existing entries). Of course, we should still add the index to serve cases where we _do_ need to clear them. > Response time after adding the index: 2.36 s (~60 decrease). I'm not able to reproduce such a stark reduction, but it's likely my cache just isn't large enough to impose that much of a delay. Can you say roughly how many object your test cache contains? > Wonder if we could push changelogging to the RQWorker I opened #11202 to explore that idea recently, but haven't done any work on it yet.
Author
Owner

@jeremystretch commented on GitHub (Feb 18, 2023):

@kkthxbye-code can you test PR #11785 when you get a chance? I haven't added the index yet, but it should result in the same improvement as we're sidestepping the call to clear() entirely.

If this is deemed acceptable to resolve this specific issue, I'd like to do an audit to add indexes everywhere we use generic foreign keys.

@jeremystretch commented on GitHub (Feb 18, 2023): @kkthxbye-code can you test PR #11785 when you get a chance? I haven't added the index yet, but it _should_ result in the same improvement as we're sidestepping the call to `clear()` entirely. If this is deemed acceptable to resolve this specific issue, I'd like to do an audit to add indexes everywhere we use generic foreign keys.
Author
Owner

@kkthxbye-code commented on GitHub (Feb 18, 2023):

@jeremystretch

I'm not able to reproduce such a stark reduction, but it's likely my cache just isn't large enough to impose that much of a delay. Can you say roughly how many object your test cache contains?

I noted it in my first reply, but around 800k entries. Important to note that the 60% reduction is without DEBUG enabled. What does the reduction look like for you when just looking at the SQL tab in the debug toolbar?

I'll try the PR out!

@kkthxbye-code commented on GitHub (Feb 18, 2023): @jeremystretch > I'm not able to reproduce such a stark reduction, but it's likely my cache just isn't large enough to impose that much of a delay. Can you say roughly how many object your test cache contains? I noted it in my first reply, but around 800k entries. Important to note that the 60% reduction is without DEBUG enabled. What does the reduction look like for you when just looking at the SQL tab in the debug toolbar? I'll try the PR out!
Author
Owner

@jeremystretch commented on GitHub (Feb 18, 2023):

I noted it in my first reply, but around 800k entries.

Sorry, missed that. I'll need to generate a bunch of objects to increase the cache to test properly on my end.

What does the reduction look like for you when just looking at the SQL tab in the debug toolbar?

It shaves some time off, but I don't think I have a large enough cache yet for it to be noticeable.

@jeremystretch commented on GitHub (Feb 18, 2023): > I noted it in my first reply, but around 800k entries. Sorry, missed that. I'll need to generate a bunch of objects to increase the cache to test properly on my end. > What does the reduction look like for you when just looking at the SQL tab in the debug toolbar? It shaves some time off, but I don't think I have a large enough cache yet for it to be noticeable.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/netbox#7652