Fix randomly failing test test_enqueue_once_after_enqueue #10449

Closed
opened 2025-12-29 21:31:34 +01:00 by adam · 2 comments

Originally created by @jeremystretch on GitHub (Nov 7, 2024).

Originally assigned to: @bctiemann on GitHub.

Proposed Changes

The test netbox.tests.test_jobs.EnqueueTest.test_enqueue_once_after_enqueue occasionally fails for an unknown reason (see this example: https://github.com/netbox-community/netbox/actions/runs/11724028782/job/32656992871?pr=17916). This needs to be investigated and resolved.

Justification

CI tests should always pass reliably.

adam added the status: accepted, type: housekeeping, netbox labels 2025-12-29 21:31:34 +01:00
adam closed this issue 2025-12-29 21:31:35 +01:00

@bctiemann commented on GitHub (Nov 13, 2024):

The exception is being raised from here, in rq/job.py (Job.get_status()):

        if refresh:
            status = self.connection.hget(self.key, 'status')
            if not status:
                raise InvalidJobOperation(f"Failed to retrieve status for job: {self.id}")
            self._status = JobStatus(as_text(status))

self.connection at that point when running locally is a redis.client.Redis instance. What is the setup in CI? Does it have Redis available? Or does this connection need to be mocked?
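
For illustration only, and as an assumption rather than a description of NetBox's actual CI workflow: if the connection does need to be mocked, one way to exercise the same hget() path without a live Redis server is to pin the queue to an in-memory fakeredis connection. Everything below is a self-contained sketch, not NetBox test code.

    # Sketch only: fakeredis stands in for a real Redis server.
    import fakeredis
    from rq import Queue

    connection = fakeredis.FakeStrictRedis()
    # is_async=False makes enqueue() run the job in-process, so no worker is needed.
    queue = Queue('default', connection=connection, is_async=False)

    job = queue.enqueue(len, [1, 2, 3])

    # Same code path as the failing hget() in rq/job.py: the 'status' hash field
    # is read back from the (fake) Redis connection.
    print(job.get_status(refresh=True))   # typically JobStatus.FINISHED
    print(job.result)                     # 3

If CI actually runs against a live Redis service, mocking would not explain the flake, but a setup like this can at least rule the backend in or out.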

It seems like whatever caching backend is present in CI is intermittently failing to return a result for the key of the job being enqueue_once'd:

        job1 = TestJobRunner.enqueue(instance, schedule_at=self.get_schedule_at())
        job2 = TestJobRunner.enqueue_once(instance, schedule_at=self.get_schedule_at(2))

and then, in the relevant portion of enqueue_once():

            # If the job parameters haven't changed, don't schedule a new job and keep the current schedule. Otherwise,
            # delete the existing job and schedule a new job instead.
            if (schedule_at and job.scheduled == schedule_at) and (job.interval == interval):
                return job
            job.delete()

We are trying to delete the job (because enqueue_once() is being called with new parameters), but its key is not in the cache when the above rq code runs, so the exception is raised. Maybe this is because the first instance of the job has already completed by the time the second one is enqueued? But I would expect the key to still be present with a status of finished, rather than to be missing entirely.

Since I have Redis in my local environment, this always works properly, but I suspect the setup in CI is different.
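
Purely as an illustration of the failure mode described above, and not NetBox's actual Job.delete() / enqueue_once() implementation: a hypothetical hardening of the cleanup step could treat an rq job whose key has already vanished as a no-op. The helper name and its queue/job_id parameters are made up for this sketch.

    from rq.exceptions import InvalidJobOperation, NoSuchJobError

    def cancel_rq_job_if_present(queue, job_id):
        """Cancel the underlying rq job, tolerating a key that has already vanished."""
        try:
            # Queue.fetch_job() returns None if the job hash no longer exists.
            rq_job = queue.fetch_job(str(job_id))
            if rq_job is not None:
                rq_job.cancel()
        except (InvalidJobOperation, NoSuchJobError):
            # The first job may have finished and had its Redis hash cleaned up
            # between the two enqueue calls; there is nothing left to cancel.
            pass

Whether something like that is the right fix, versus adjusting the test or the CI Redis setup, depends on what the investigation turns up.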


@jsenecal commented on GitHub (Nov 13, 2024):

HA! It was driving me mad yesterday, thanks for flagging this @jeremystretch


Reference: starred/netbox#10449