The RQ worker should verify that a job exists in the database before attempting to start it #10883

Closed
opened 2025-12-29 21:37:14 +01:00 by adam · 2 comments

Originally created by @jeremystretch on GitHub (Mar 12, 2025).

Originally assigned to: @jeremystretch on GitHub.

NetBox version

v4.2.5

Feature type

New functionality

Proposed functionality

Extend the handle() method of the base JobRunner class to confirm that a Job exists in the database before attempting to call start().

If the Job does not exist yet, it should retry several times after an incrementally increasing backoff timer (e.g. 0.5 seconds, 1 second, 2 seconds).
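A minimal sketch of the retry idea described above, not the actual NetBox implementation. The helper name `wait_for_job`, its parameters, and the default delays are hypothetical; in NetBox the `job_exists` callable would presumably wrap an ORM check such as `Job.objects.filter(pk=job.pk).exists()`, which is an assumption here.

```python
import time
from typing import Callable, Iterable


def wait_for_job(
    job_exists: Callable[[], bool],
    delays: Iterable[float] = (0.5, 1.0, 2.0),  # hypothetical backoff schedule
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True as soon as job_exists() succeeds.

    The check runs once immediately, then once after each delay.
    Returns False if the job is still missing after the last delay.
    """
    if job_exists():
        return True
    for delay in delays:
        sleep(delay)  # back off before checking again
        if job_exists():
            return True
    return False
```

A caller would invoke `start()` only when the helper returns True, and otherwise fail the task with a clear error. The `sleep` parameter is injectable so the backoff can be tested without real waiting.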

Use case

When a background RQ task is enqueued, a serialized representation of a Job object is stored with it. When the task executes, it does so with the assumption that its associated Job already exists in the database. However, this may not hold true due to various race conditions, such as the one captured by netboxlabs/netbox-branching#193. (I've also seen this happen with data source syncing.)

Implementing a short delay mitigates the race condition that occurs between enqueuing a background task in Redis and committing the PostgreSQL transaction within which its corresponding Job object was created.

Database changes

N/A

External dependencies

N/A

adam added the status: accepted, type: feature, and complexity: medium labels 2025-12-29 21:37:14 +01:00
adam closed this issue 2025-12-29 21:37:15 +01:00

@tobiasge commented on GitHub (Mar 13, 2025):

I've seen things like that happen in our own application; it was one of the reasons we switched to Celery for background tasks. Celery has delay_on_commit to mitigate this problem, but I think the same approach can be applied to RQ tasks. See these links:

- https://docs.celeryq.dev/en/stable/django/first-steps-with-django.html#trigger-tasks-at-the-end-of-the-database-transaction
- https://docs.celeryq.dev/en/stable/_modules/celery/contrib/django/task.html#DjangoTask.delay_on_commit

@jeremystretch commented on GitHub (Mar 13, 2025):

Ahh this is what @arthanson mentioned yesterday. I wasn't sure if this works outside an explicit transaction, but it seems to. That should work well then.
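The ordering that on-commit deferral provides can be illustrated with a toy model. This is a sketch only: `ToyConnection` is a hypothetical stand-in, not Django's or RQ's API. In a real Django project the equivalent hook is `django.db.transaction.on_commit(callback)`, which runs the callback immediately when no transaction is active.

```python
from typing import Callable, List, Optional


class ToyConnection:
    """Hypothetical stand-in for a DB connection with on-commit hooks."""

    def __init__(self) -> None:
        # None means no transaction is open.
        self._pending: Optional[List[Callable[[], None]]] = None

    def begin(self) -> None:
        self._pending = []

    def on_commit(self, callback: Callable[[], None]) -> None:
        if self._pending is None:
            callback()  # no open transaction: run right away
        else:
            self._pending.append(callback)  # defer until commit

    def commit(self) -> None:
        callbacks, self._pending = self._pending or [], None
        for callback in callbacks:
            callback()

    def rollback(self) -> None:
        self._pending = None  # deferred callbacks are discarded


def enqueue_after_commit_demo() -> List[str]:
    """Record the order of events when enqueuing is deferred to commit."""
    events: List[str] = []
    conn = ToyConnection()
    conn.begin()
    events.append("job row created")
    conn.on_commit(lambda: events.append("task enqueued"))
    events.append("other work in transaction")
    conn.commit()
    return events
```

In this model the task is enqueued only after everything inside the transaction has run and the commit has happened, so a worker picking it up would find the Job row. After a rollback the task is never enqueued at all.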


Reference: starred/netbox#10883