Fix randomly failing test test_enqueue_once_after_enqueue #10449

Closed
opened 2025-12-29 21:31:34 +01:00 by adam · 2 comments

Originally created by @jeremystretch on GitHub (Nov 7, 2024).

Originally assigned to: @bctiemann on GitHub.

Proposed Changes

The test netbox.tests.test_jobs.EnqueueTest.test_enqueue_once_after_enqueue occasionally fails for an unknown reason (see this example: https://github.com/netbox-community/netbox/actions/runs/11724028782/job/32656992871?pr=17916). This needs to be investigated and resolved.

Justification

CI tests should always pass reliably.

adam added the status: accepted, type: housekeeping, netbox labels 2025-12-29 21:31:34 +01:00
adam closed this issue 2025-12-29 21:31:35 +01:00

@bctiemann commented on GitHub (Nov 13, 2024):

The exception is being raised from here, in rq/job.py (Job.get_status()):

        if refresh:
            status = self.connection.hget(self.key, 'status')
            if not status:
                raise InvalidJobOperation(f"Failed to retrieve status for job: {self.id}")
            self._status = JobStatus(as_text(status))

self.connection at that point when running locally is a redis.client.Redis instance. What is the setup in CI? Does it have Redis available? Or does this connection need to be mocked?
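
For illustration only, and as an assumption rather than a description of NetBox's actual CI workflow: if the connection does need to be mocked, one way to exercise the same hget() path without a live Redis server is to pin the queue to an in-memory fakeredis connection. Everything below is a self-contained sketch, not NetBox test code.

    # Sketch only: fakeredis stands in for a real Redis server.
    import fakeredis
    from rq import Queue

    connection = fakeredis.FakeStrictRedis()
    # is_async=False makes enqueue() run the job in-process, so no worker is needed.
    queue = Queue('default', connection=connection, is_async=False)

    job = queue.enqueue(len, [1, 2, 3])

    # Same code path as the failing hget() in rq/job.py: the 'status' hash field
    # is read back from the (fake) Redis connection.
    print(job.get_status(refresh=True))   # typically JobStatus.FINISHED
    print(job.result)                     # 3

If CI actually runs against a live Redis service, mocking would not explain the flake, but a setup like this can at least rule the backend in or out.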

It seems like whatever caching backend is present in CI is intermittently failing to return a result for the key of the job being enqueue_once'd:

        job1 = TestJobRunner.enqueue(instance, schedule_at=self.get_schedule_at())
        job2 = TestJobRunner.enqueue_once(instance, schedule_at=self.get_schedule_at(2))

and then, in the relevant portion of enqueue_once():

            # If the job parameters haven't changed, don't schedule a new job and keep the current schedule. Otherwise,
            # delete the existing job and schedule a new job instead.
            if (schedule_at and job.scheduled == schedule_at) and (job.interval == interval):
                return job
            job.delete()

We are trying to delete the job (because enqueue_once() is being called with new parameters), but its key is not in the cache when the above rq code runs, so the exception is raised. Maybe this is because the first instance of the job has already completed by the time the second one is enqueued? But I would expect the key to still be present with a status of finished, rather than to be missing entirely.

Since I have Redis in my local environment, this always works properly, but I suspect the setup in CI is different.
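
Purely as an illustration of the failure mode described above, and not NetBox's actual Job.delete() / enqueue_once() implementation: a hypothetical hardening of the cleanup step could treat an rq job whose key has already vanished as a no-op. The helper name and its queue/job_id parameters are made up for this sketch.

    from rq.exceptions import InvalidJobOperation, NoSuchJobError

    def cancel_rq_job_if_present(queue, job_id):
        """Cancel the underlying rq job, tolerating a key that has already vanished."""
        try:
            # Queue.fetch_job() returns None if the job hash no longer exists.
            rq_job = queue.fetch_job(str(job_id))
            if rq_job is not None:
                rq_job.cancel()
        except (InvalidJobOperation, NoSuchJobError):
            # The first job may have finished and had its Redis hash cleaned up
            # between the two enqueue calls; there is nothing left to cancel.
            pass

Whether something like that is the right fix, versus adjusting the test or the CI Redis setup, depends on what the investigation turns up.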


@jsenecal commented on GitHub (Nov 13, 2024):

HA! It was driving me mad yesterday, thanks for flagging this @jeremystretch


Reference: starred/netbox#10449