Core's context-middleware race condition when using threads with shared memory #3238

Closed
opened 2025-12-29 18:27:00 +01:00 by adam · 7 comments
Owner

Originally created by @juan-vg on GitHub (Jan 29, 2020).

Environment

  • Python version: 3.6
  • NetBox version: 2.6

Steps to Reproduce

  1. Use a ThreadPoolExecutor where the threads use the ObjectChangeMiddleware

The perfect example can be found at #3726

Expected Behavior

The use of threads is transparent and works well without failures coming from system's core

Observed Behavior

Fails appear when several threads concurrently use the context-middleware

Traceback (most recent call last):
  File "/home/jstretch/netbox/netbox/ipam/tests/test_api.py", line 689, in setUp
    super().setUp()
  File "/home/jstretch/netbox/netbox/utilities/testing.py", line 15, in setUp
    self.user = User.objects.create(username='testuser', is_superuser=True)
  File "/home/jstretch/.virtualenvs/netbox/lib/python3.6/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/jstretch/.virtualenvs/netbox/lib/python3.6/site-packages/django/db/models/query.py", line 422, in create
    obj.save(force_insert=True, using=self.db)
  File "/home/jstretch/.virtualenvs/netbox/lib/python3.6/site-packages/django/contrib/auth/base_user.py", line 66, in save
    super().save(*args, **kwargs)
  File "/home/jstretch/.virtualenvs/netbox/lib/python3.6/site-packages/django/db/models/base.py", line 741, in save
    force_update=force_update, update_fields=update_fields)
  File "/home/jstretch/.virtualenvs/netbox/lib/python3.6/site-packages/django/db/models/base.py", line 790, in save_base
    update_fields=update_fields, raw=raw, using=using,
  File "/home/jstretch/.virtualenvs/netbox/lib/python3.6/site-packages/django/dispatch/dispatcher.py", line 175, in send
    for receiver in self._live_receivers(sender)
  File "/home/jstretch/.virtualenvs/netbox/lib/python3.6/site-packages/django/dispatch/dispatcher.py", line 175, in <listcomp>
    for receiver in self._live_receivers(sender)
  File "/home/jstretch/netbox/netbox/extras/middleware.py", line 27, in handle_changed_object
    _thread_locals.changed_objects.append(
AttributeError: '_thread._local' object has no attribute 'changed_objects'

Having a look at the "netbox/extras/middleware.py" file you will be able to see that the 'changed_objects' attrib is separately set for each thread environment (threading.local()) which calls first the class ObjectChangeMiddleware (indirectly calling the method __call__).

The problem appears when a thread calls the function handle_changed_object without calling first the class. In order to illustrate this behaviour I've added some logging to the file. What is actually happening can be seen on the following trace.
Format: (threadName)::funcName : message

$ ./manage.py test ipam.tests.test_api.ParallelAPITests
Creating test database for alias 'default'...
System check identified no issues (0 silenced).
(ThreadPoolExecutor-0_0)::__init__ :  
(ThreadPoolExecutor-0_1)::__init__ :  
(ThreadPoolExecutor-0_1)::__call__ :  
(ThreadPoolExecutor-0_2)::__call__ :  
(ThreadPoolExecutor-0_3)::__call__ :  
(ThreadPoolExecutor-0_5)::__call__ :  
(ThreadPoolExecutor-0_8)::__call__ :  
(ThreadPoolExecutor-0_0)::__call__ :  
(ThreadPoolExecutor-0_13)::__call__ :  
(ThreadPoolExecutor-0_6)::__call__ :  
(ThreadPoolExecutor-0_4)::__call__ :  
(ThreadPoolExecutor-0_10)::__call__ :  
(ThreadPoolExecutor-0_14)::__call__ :  
(ThreadPoolExecutor-0_15)::__call__ :  
(ThreadPoolExecutor-0_7)::__call__ :  
(ThreadPoolExecutor-0_11)::__call__ :  
(ThreadPoolExecutor-0_12)::__call__ :  
(ThreadPoolExecutor-0_9)::__call__ :  
E(MainThread)::handle_changed_object :  not hasattr(_thread_locals, "changed_objects")
E
======================================================================
ERROR: test_create_multiple_available_prefixes_parallel (ipam.tests.test_api.ParallelAPITests)
----------------------------------------------------------------------
...
  File "./netbox/extras/middleware.py", line 33, in handle_changed_object
    _thread_locals.changed_objects.append(
AttributeError: '_thread._local' object has no attribute 'changed_objects'

As you can see above, MainThread never calls the __call__ method, so the attrib is never initialized.

As you may have seen above, context middleware bases its use on a separated-thread approach, but doesn't follow a shared-memory one. It's intended to work on several threads, as long as each one creates a new instance of the class. However, on an environment where 2 or more threads can share memory (say variables), there is a race condition that can cause this class to fail. Just put together a separated and isolated environment for each thread, and a thread not being properly initialized because the class instance of the context-middleware has been shared to it.

Originally created by @juan-vg on GitHub (Jan 29, 2020). ### Environment * Python version: 3.6 * NetBox version: 2.6 <!-- Describe in detail the exact steps that someone else can take to reproduce this bug using the current stable release of NetBox (or the current beta release where applicable). Begin with the creation of any necessary database objects and call out every operation being performed explicitly. If reporting a bug in the REST API, be sure to reconstruct the raw HTTP request(s) being made: Don't rely on a wrapper like pynetbox. --> ### Steps to Reproduce 1. Use a ThreadPoolExecutor where the threads use the ObjectChangeMiddleware The perfect example can be found at #3726 <!-- What did you expect to happen? --> ### Expected Behavior The use of threads is transparent and works well without failures coming from system's core <!-- What happened instead? --> ### Observed Behavior Fails appear when several threads concurrently use the context-middleware ``` Traceback (most recent call last): File "/home/jstretch/netbox/netbox/ipam/tests/test_api.py", line 689, in setUp super().setUp() File "/home/jstretch/netbox/netbox/utilities/testing.py", line 15, in setUp self.user = User.objects.create(username='testuser', is_superuser=True) File "/home/jstretch/.virtualenvs/netbox/lib/python3.6/site-packages/django/db/models/manager.py", line 82, in manager_method return getattr(self.get_queryset(), name)(*args, **kwargs) File "/home/jstretch/.virtualenvs/netbox/lib/python3.6/site-packages/django/db/models/query.py", line 422, in create obj.save(force_insert=True, using=self.db) File "/home/jstretch/.virtualenvs/netbox/lib/python3.6/site-packages/django/contrib/auth/base_user.py", line 66, in save super().save(*args, **kwargs) File "/home/jstretch/.virtualenvs/netbox/lib/python3.6/site-packages/django/db/models/base.py", line 741, in save force_update=force_update, update_fields=update_fields) File "/home/jstretch/.virtualenvs/netbox/lib/python3.6/site-packages/django/db/models/base.py", line 790, in save_base update_fields=update_fields, raw=raw, using=using, File "/home/jstretch/.virtualenvs/netbox/lib/python3.6/site-packages/django/dispatch/dispatcher.py", line 175, in send for receiver in self._live_receivers(sender) File "/home/jstretch/.virtualenvs/netbox/lib/python3.6/site-packages/django/dispatch/dispatcher.py", line 175, in <listcomp> for receiver in self._live_receivers(sender) File "/home/jstretch/netbox/netbox/extras/middleware.py", line 27, in handle_changed_object _thread_locals.changed_objects.append( AttributeError: '_thread._local' object has no attribute 'changed_objects' ``` Having a look at the _"netbox/extras/middleware.py"_ file you will be able to see that the '_changed_objects_' attrib **is separately set for each thread** environment (_threading.local()_) **which calls first the class** _ObjectChangeMiddleware_ (indirectly calling the method _\_\_call\_\__). The problem appears when a thread calls the function _handle_changed_object_ without calling first the class. In order to illustrate this behaviour I've added some logging to the file. What is actually happening can be seen on the following trace. **Format**: _(threadName)::funcName : message_ ``` $ ./manage.py test ipam.tests.test_api.ParallelAPITests Creating test database for alias 'default'... System check identified no issues (0 silenced). (ThreadPoolExecutor-0_0)::__init__ : (ThreadPoolExecutor-0_1)::__init__ : (ThreadPoolExecutor-0_1)::__call__ : (ThreadPoolExecutor-0_2)::__call__ : (ThreadPoolExecutor-0_3)::__call__ : (ThreadPoolExecutor-0_5)::__call__ : (ThreadPoolExecutor-0_8)::__call__ : (ThreadPoolExecutor-0_0)::__call__ : (ThreadPoolExecutor-0_13)::__call__ : (ThreadPoolExecutor-0_6)::__call__ : (ThreadPoolExecutor-0_4)::__call__ : (ThreadPoolExecutor-0_10)::__call__ : (ThreadPoolExecutor-0_14)::__call__ : (ThreadPoolExecutor-0_15)::__call__ : (ThreadPoolExecutor-0_7)::__call__ : (ThreadPoolExecutor-0_11)::__call__ : (ThreadPoolExecutor-0_12)::__call__ : (ThreadPoolExecutor-0_9)::__call__ : E(MainThread)::handle_changed_object : not hasattr(_thread_locals, "changed_objects") E ====================================================================== ERROR: test_create_multiple_available_prefixes_parallel (ipam.tests.test_api.ParallelAPITests) ---------------------------------------------------------------------- ... File "./netbox/extras/middleware.py", line 33, in handle_changed_object _thread_locals.changed_objects.append( AttributeError: '_thread._local' object has no attribute 'changed_objects' ``` As you can see above, _MainThread_ never calls the _\_\_call\_\__ method, so the attrib is never initialized. As you may have seen above, context middleware bases its use on a separated-thread approach, but doesn't follow a shared-memory one. It's intended to work on several threads, as long as each one creates a new instance of the class. However, on an environment where 2 or more threads can share memory (say variables), there is a race condition that can cause this class to fail. **Just put together a separated and isolated environment for each thread, and a thread not being properly initialized because the class instance of the context-middleware has been shared to it**.
adam added the type: feature label 2025-12-29 18:27:00 +01:00
adam closed this issue 2025-12-29 18:27:00 +01:00
Author
Owner

@juan-vg commented on GitHub (Jan 29, 2020):

After carefully reading the already present code of the middleware I've concluded that the best and simple fix is the following patch. As far as I can see there's no possible backward incompatibility.

diff --git a/netbox/extras/middleware.py b/netbox/extras/middleware.py
index 57f8f37d..a42f86ef 100644
--- a/netbox/extras/middleware.py
+++ b/netbox/extras/middleware.py
@@ -18,12 +18,29 @@ from .webhooks import enqueue_webhooks
 _thread_locals = threading.local()
 
 
+def _init_thread_locals():
+    """
+    Initialize the thread local env
+    """
+    _thread_locals.changed_objects = []
+
+
+def _check_thread_locals():
+    """
+    Check if the thread locals need to be initialized
+    """
+    if not hasattr(_thread_locals, "changed_objects"):
+        _init_thread_locals()
+
+
 def handle_changed_object(sender, instance, **kwargs):
     """
     Fires when an object is created or updated.
     """
     # Queue the object for processing once the request completes
     action = OBJECTCHANGE_ACTION_CREATE if kwargs['created'] else OBJECTCHANGE_ACTION_UPDATE
+
+    _check_thread_locals()
     _thread_locals.changed_objects.append(
         (instance, action)
     )
@@ -44,6 +61,7 @@ def handle_deleted_object(sender, instance, **kwargs):
     if hasattr(instance, 'tags'):
         copy.tags = DummyQuerySet(instance.tags.all())
 
+    _check_thread_locals()
     # Queue the copy of the object for processing once the request completes
     _thread_locals.changed_objects.append(
         (copy, OBJECTCHANGE_ACTION_DELETE)
@@ -54,6 +72,7 @@ def purge_objectchange_cache(sender, **kwargs):
     """
     Delete any queued object changes waiting to be written.
     """
+    _check_thread_locals()
     _thread_locals.changed_objects = []
 
 
@@ -71,13 +90,13 @@ class ObjectChangeMiddleware(object):
     have been created. Conversely, deletions are acted upon immediately, so that the serialized representation of the
     object is recorded before it (and any related objects) are actually deleted from the database.
     """
+
     def __init__(self, get_response):
         self.get_response = get_response
 
     def __call__(self, request):
-
         # Initialize an empty list to cache objects being saved.
-        _thread_locals.changed_objects = []
+        _init_thread_locals()
 
         # Assign a random unique ID to the request. This will be used to associate multiple object changes made during
         # the same request.

@juan-vg commented on GitHub (Jan 29, 2020): After carefully reading the already present code of the middleware I've concluded that the best and simple fix is the following patch. As far as I can see there's no possible backward incompatibility. ```diff diff --git a/netbox/extras/middleware.py b/netbox/extras/middleware.py index 57f8f37d..a42f86ef 100644 --- a/netbox/extras/middleware.py +++ b/netbox/extras/middleware.py @@ -18,12 +18,29 @@ from .webhooks import enqueue_webhooks _thread_locals = threading.local() +def _init_thread_locals(): + """ + Initialize the thread local env + """ + _thread_locals.changed_objects = [] + + +def _check_thread_locals(): + """ + Check if the thread locals need to be initialized + """ + if not hasattr(_thread_locals, "changed_objects"): + _init_thread_locals() + + def handle_changed_object(sender, instance, **kwargs): """ Fires when an object is created or updated. """ # Queue the object for processing once the request completes action = OBJECTCHANGE_ACTION_CREATE if kwargs['created'] else OBJECTCHANGE_ACTION_UPDATE + + _check_thread_locals() _thread_locals.changed_objects.append( (instance, action) ) @@ -44,6 +61,7 @@ def handle_deleted_object(sender, instance, **kwargs): if hasattr(instance, 'tags'): copy.tags = DummyQuerySet(instance.tags.all()) + _check_thread_locals() # Queue the copy of the object for processing once the request completes _thread_locals.changed_objects.append( (copy, OBJECTCHANGE_ACTION_DELETE) @@ -54,6 +72,7 @@ def purge_objectchange_cache(sender, **kwargs): """ Delete any queued object changes waiting to be written. """ + _check_thread_locals() _thread_locals.changed_objects = [] @@ -71,13 +90,13 @@ class ObjectChangeMiddleware(object): have been created. Conversely, deletions are acted upon immediately, so that the serialized representation of the object is recorded before it (and any related objects) are actually deleted from the database. """ + def __init__(self, get_response): self.get_response = get_response def __call__(self, request): - # Initialize an empty list to cache objects being saved. - _thread_locals.changed_objects = [] + _init_thread_locals() # Assign a random unique ID to the request. This will be used to associate multiple object changes made during # the same request. ```
Author
Owner

@stale[bot] commented on GitHub (Feb 13, 2020):

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. NetBox is governed by a small group of core maintainers which means not all opened issues may receive direct feedback. Please see our contributing guide.

@stale[bot] commented on GitHub (Feb 13, 2020): This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. NetBox is governed by a small group of core maintainers which means not all opened issues may receive direct feedback. Please see our [contributing guide](https://github.com/netbox-community/netbox/blob/develop/CONTRIBUTING.md).
Author
Owner

@juan-vg commented on GitHub (Feb 13, 2020):

@jeremystretch Could you have a look please? I would not like this to be closed!

@juan-vg commented on GitHub (Feb 13, 2020): @jeremystretch Could you have a look please? I would not like this to be closed!
Author
Owner

@jeremystretch commented on GitHub (Feb 14, 2020):

Haven't been able to dig into this yet, but it's tagged now to keep stalebot away.

@jeremystretch commented on GitHub (Feb 14, 2020): Haven't been able to dig into this yet, but it's tagged now to keep stalebot away.
Author
Owner

@edwinbalani commented on GitHub (Jul 7, 2020):

I've been encountering this issue on a small installation. Interestingly, it's possible the issue's only cropped up shortly after an upgrade (first from 2.8.5 to 2.8.6, then 2.8.6 to 2.8.7) and corresponding restarts of our WSGI server (uwsgi) and the django-rq runner.

Is there a compelling reason to keep handle_changed_object and friends as module-level functions here, and not define them in the local scope of ObjectChangeMiddleware.__call__()? The middleware is the only place these functions are used anyway; they're not being imported elsewhere.

My aim here is to avoid the use of thread-local storage, which (without digging into this too much) seems to be one contributing cause of the problem.

I have a format like this in mind:

class ObjectChangeMiddleware:
    def __call__(self, request):
        changed_objects = []

        def handle_changed_object(sender, instance, **kwargs):
            action = ObjectChangeActionChoices.ACTION_CREATE if kwargs['created'] else ObjectChangeActionChoices.ACTION_UPDATE
            changed_objects.append((instance, action))
        post_save.connect(handle_changed_object, dispatch_uid='handle_changed_object')

        # repeat for other handlers
@edwinbalani commented on GitHub (Jul 7, 2020): I've been encountering this issue on a small installation. Interestingly, it's possible the issue's only cropped up shortly after an upgrade (first from 2.8.5 to 2.8.6, then 2.8.6 to 2.8.7) and corresponding restarts of our WSGI server (uwsgi) and the django-rq runner. Is there a compelling reason to keep `handle_changed_object` and friends as module-level functions here, and not define them in the local scope of `ObjectChangeMiddleware.__call__()`? The middleware is the only place these functions are used anyway; they're not being imported elsewhere. My aim here is to avoid the use of thread-local storage, which (without digging into this too much) seems to be one contributing cause of the problem. I have a format like this in mind: ```python class ObjectChangeMiddleware: def __call__(self, request): changed_objects = [] def handle_changed_object(sender, instance, **kwargs): action = ObjectChangeActionChoices.ACTION_CREATE if kwargs['created'] else ObjectChangeActionChoices.ACTION_UPDATE changed_objects.append((instance, action)) post_save.connect(handle_changed_object, dispatch_uid='handle_changed_object') # repeat for other handlers ```
Author
Owner

@jeremystretch commented on GitHub (Aug 17, 2020):

Blocked and pending closure assuming #4990 is resolved as intended by #5010.

@jeremystretch commented on GitHub (Aug 17, 2020): Blocked and pending closure assuming #4990 is resolved as intended by #5010.
Author
Owner

@jeremystretch commented on GitHub (Aug 18, 2020):

The middleware was extensively refactored under #5010 and this proposal is no longer relevant.

@jeremystretch commented on GitHub (Aug 18, 2020): The middleware was extensively refactored under #5010 and this proposal is no longer relevant.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/netbox#3238