The RQ worker should verify that a job exists in the database before attempting to start it #10883

adam · 2025-12-29T21:37:14+01:00

adam commented

2025-12-29 21:37:14 +01:00

Originally created by @jeremystretch on GitHub (Mar 12, 2025).

Originally assigned to: @jeremystretch on GitHub.

NetBox version

v4.2.5

Feature type

New functionality

Proposed functionality

Extend the handle() method of the base JobRunner class to confirm that a Job exists in the database before attempting to call start().

If the Job does not exist yet, it should retry several times after an incrementally increasing backoff timer (e.g. 0.5 seconds, 1 second, 2 seconds).
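The retry behaviour proposed above could look roughly like the following. This is a minimal sketch, not NetBox's actual implementation: `wait_for_job`, its `job_exists` callback, and the injectable `sleep` parameter are hypothetical names introduced here for illustration, and the delays are simply the example values from the proposal.

```python
import time
from typing import Callable, Iterable


def wait_for_job(
    job_exists: Callable[[], bool],
    delays: Iterable[float] = (0.5, 1.0, 2.0),
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True as soon as job_exists() succeeds, else False.

    The check runs once immediately, then once more after each delay,
    so the default settings allow four checks over about 3.5 seconds.
    """
    if job_exists():
        return True
    for delay in delays:
        # Back off before checking again; each delay is longer than the last.
        sleep(delay)
        if job_exists():
            return True
    # The job never appeared; the caller decides whether to fail or log.
    return False
```

In a Django project the callback would presumably be an ORM existence check such as `lambda: Job.objects.filter(pk=job_id).exists()`, with `start()` called only when `wait_for_job` returns True. Passing `sleep` as a parameter keeps the helper testable without real waiting.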

Use case

When a background RQ task is enqueued, a serialized representation of a Job object is stored with it. When the task executes, it does so with the assumption that its associated Job already exists in the database. However, this may not hold true due to various race conditions, such as the one captured by netboxlabs/netbox-branching#193. (I've also seen this happen with data source syncing.)

Implementing a short delay mitigates the race condition that occurs between enqueuing a background task in Redis and committing the PostgreSQL transaction within which its corresponding Job object was created.

Database changes

N/A

External dependencies

N/A


adam added the status: accepted type: feature complexity: medium labels 2025-12-29 21:37:14 +01:00

adam closed this issue

2025-12-29 21:37:15 +01:00

adam commented

2025-12-29 21:37:15 +01:00

@tobiasge commented on GitHub (Mar 13, 2025):

I've seen things like that happen in our own application; it is one of the reasons we switched to Celery for background tasks. Celery provides delay_on_commit to mitigate this problem, and I think the same approach can be applied to RQ tasks. See these links:

- https://docs.celeryq.dev/en/stable/django/first-steps-with-django.html#trigger-tasks-at-the-end-of-the-database-transaction
- https://docs.celeryq.dev/en/stable/_modules/celery/contrib/django/task.html#DjangoTask.delay_on_commit
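The "enqueue only after commit" idea can be sketched independently of Celery. The helper below is a hypothetical illustration, not code from NetBox, RQ, or Celery: `enqueue_on_commit` and `FakeTransaction` are names invented here. `FakeTransaction` is a stand-in so the example runs without Django; in a real project the hook would presumably be Django's `transaction.on_commit`, which also runs the callback immediately when no transaction is active.

```python
from functools import partial
from typing import Any, Callable


def enqueue_on_commit(
    on_commit: Callable[[Callable[[], Any]], None],
    enqueue: Callable[..., Any],
    *args: Any,
    **kwargs: Any,
) -> None:
    """Defer enqueue(*args, **kwargs) until the current transaction commits."""
    on_commit(partial(enqueue, *args, **kwargs))


class FakeTransaction:
    """Hypothetical stand-in for a database transaction (illustration only)."""

    def __init__(self) -> None:
        self.in_atomic_block = False
        self._callbacks = []

    def begin(self) -> None:
        self.in_atomic_block = True

    def on_commit(self, func: Callable[[], Any]) -> None:
        if self.in_atomic_block:
            # Hold the callback until commit() so workers never see
            # a task whose database row is not yet visible.
            self._callbacks.append(func)
        else:
            # No open transaction: run right away.
            func()

    def commit(self) -> None:
        self.in_atomic_block = False
        callbacks, self._callbacks = self._callbacks, []
        for func in callbacks:
            func()


tx = FakeTransaction()
queued = []  # stands in for the Redis-backed queue

tx.begin()
enqueue_on_commit(tx.on_commit, queued.append, "job-1")
pending_before_commit = list(queued)  # still empty: nothing enqueued yet
tx.commit()
```

With Django and RQ, the equivalent call would be along the lines of `enqueue_on_commit(transaction.on_commit, queue.enqueue, task_func, job_id)`, where `queue`, `task_func`, and `job_id` are placeholders. Because the task reaches Redis only after the commit, the worker cannot pick it up before the Job row exists.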

adam commented

2025-12-29 21:37:16 +01:00

@jeremystretch commented on GitHub (Mar 13, 2025):

Ahh this is what @arthanson mentioned yesterday. I wasn't sure if this works outside an explicit transaction, but it seems to. That should work well then.


adam referenced this issue

2025-12-29 23:20:22 +01:00

[PR #10883] [MERGED] Closes #9439: Ensure thread safety of change logging functions #13708

Reference: starred/netbox#10883