Use a background job to handle bulk imports #11226

Closed
opened 2025-12-29 21:42:10 +01:00 by adam · 1 comment

Originally created by @jeremystretch on GitHub (May 27, 2025).

Originally assigned to: @jeremystretch on GitHub.

NetBox version

v4.3.1

Feature type

Change to existing functionality

Proposed functionality

Rather than processing bulk imports synchronously, NetBox should offload them to a background job and provide the user with a link (or automatically redirect them) to the job, where the progress of the import can be checked.
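
A minimal sketch of how the import might run as a background job, assuming the `JobRunner` framework available in current NetBox releases; the class name, the keyword arguments passed through `enqueue()`, and the view snippet are illustrative assumptions, not the final design:

```python
from django.apps import apps

from netbox.jobs import JobRunner


class BulkImportJob(JobRunner):
    """Illustrative job that creates objects from pre-parsed import records."""

    class Meta:
        name = "Bulk import"

    def run(self, *args, model=None, records=None, **kwargs):
        model_class = apps.get_model(model)  # e.g. "dcim.site"
        created, errors = 0, []
        for index, record in enumerate(records or []):
            try:
                model_class.objects.create(**record)
                created += 1
            except Exception as exc:
                errors.append({"row": index, "error": str(exc)})
        # Surface the outcome via the Job's data field so the job detail view can show it.
        self.job.data = {"created": created, "errors": errors}


# In the bulk import view (sketch): enqueue the job and send the user to its detail page.
# job = BulkImportJob.enqueue(user=request.user, model="dcim.site", records=parsed_rows)
# return redirect(job.get_absolute_url())
```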

Use case

This will mitigate timeout errors during very large imports by moving the actual record processing out of the request-response cycle.

Database changes

N/A

External dependencies

N/A

adam added the status: accepted, type: feature, and complexity: medium labels 2025-12-29 21:42:10 +01:00
adam closed this issue 2025-12-29 21:42:10 +01:00

@sleepinggenius2 commented on GitHub (May 27, 2025):

In the seemingly never-ending process of migrating data from spreadsheets and other systems with which we do not currently have (or cannot build) proper integrations, I find myself using the bulk import functionality quite often. My workflow today primarily relies on the textarea input, as it provides the quickest iteration when dealing with duplicate records or other errors. My datasets are often small (deltas) and would not benefit significantly from running asynchronously. However, this proposal has the potential to make that process significantly more cumbersome and less efficient from a UX standpoint. I therefore offer two suggestions:

  1. Make asynchronous execution opt-in or opt-out, depending on what the team feels is the more reasonable default. This would let a user retain the current behavior for smaller datasets while leveraging the new functionality for larger ones.
  2. Provide better strategies for dealing with errors. Running the import as a background job would give access to both the logs and the output of the Job object. It can be incredibly frustrating, with large datasets, to have the entire transaction rolled back after many minutes of processing because of a single errored record. I would greatly appreciate a mechanism that logs errors but allows the offending records to be skipped, so they do not block the import of the remaining records (a rough sketch of this follows the list). The job output could additionally capture either the entire data input or just the failed records, to make iterating easier.
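
A rough sketch of the skip-on-error behavior described in point 2, assuming a per-row savepoint inside the outer transaction (the helper name and signature are hypothetical):

```python
from django.db import transaction


def import_records(model_class, records, skip_errors=False):
    """Hypothetical helper: abort everything on the first error (current behavior),
    or log failed rows and keep going when skip_errors is True."""
    created, failed = [], []
    with transaction.atomic():
        for index, record in enumerate(records):
            try:
                with transaction.atomic():  # per-row savepoint
                    obj = model_class.objects.create(**record)
                created.append(obj.pk)
            except Exception as exc:
                if not skip_errors:
                    raise  # roll back the entire import, as today
                failed.append({"row": index, "data": record, "error": str(exc)})
    return created, failed
```

The `failed` list could then be written to the job's output so the unprocessed rows are easy to correct and re-submit.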

Overall, this sounds like a worthwhile proposal, and the ability to leverage the fields of the Job model could provide a number of benefits whether the import runs synchronously or asynchronously. Having a way to at least link to the appropriate object list with the `modified_by_request` filter (as the current redirect page does) would also still be incredibly helpful.
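
For that linking idea, a small assumption-laden example of building the filtered list URL, supposing the job stored the originating request ID (the `request_id` key and the device list view are placeholders):

```python
from django.urls import reverse


def modified_objects_url(job):
    # Hypothetical: link to the objects touched by this import, mirroring the
    # current post-import redirect. Assumes the request ID was saved in job.data.
    list_url = reverse("dcim:device_list")  # placeholder list view
    return f"{list_url}?modified_by_request={job.data['request_id']}"
```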


Reference: starred/netbox#11226