gunicorn spends 80% of each request in get_object_permissions() #5445

Closed
opened 2025-12-29 19:28:04 +01:00 by adam · 21 comments
Owner

Originally created by @heroin-moose on GitHub (Sep 30, 2021).

NetBox version

v3.0.3

Python version

3.9

Steps to Reproduce

  1. Configure Netbox with AD authentication (a few thousands users + heavy nesting)
  2. Create a bunch of devices (~500 servers in my case, ~500 cables, ~50 VLANs, ~500 IP addresses, ~40 prefixes, ~40 racks in 3 sites)
  3. Try to access https://netbox.example.com/dcim/devices/1234
  4. See that gunicorn process is suddenly eats 100% CPU and the request takes more than a second

Expected Behavior

The request should be completed way faster.

Observed Behavior

For me the situtation looks like this: the HTTP request takes ~1.6s to complete and ~1.3s of that request was spent in authentication.py:get_object_permissions(). I am terrible with Python, but I came up with a patch:

@@ -11,6 +11,9 @@ from django.db.models import Q
 from users.models import ObjectPermission
 from utilities.permissions import permission_is_exempt, resolve_permission, resolve_permission_ct

+import time
+import syslog
+
 UserModel = get_user_model()


@@ -38,11 +42,18 @@ class ObjectPermissionMixin():

         # Create a dictionary mapping permissions to their constraints
         perms = defaultdict(list)
+        i = 0
+        start = time.time()
         for obj_perm in object_permissions:
             for object_type in obj_perm.object_types.all():
                 for action in obj_perm.actions:
                     perm_name = f"{object_type.app_label}.{action}_{object_type.model}"
                     perms[perm_name].extend(obj_perm.list_constraints())
+                    syslog.syslog(perm_name)
+                    i += 1
+        end = time.time()
+        t = end - start
+        syslog.syslog("took " + str(t) + " seconds with " + str(i) + " iterations")

         return perms

The output:

Sep 29 22:55:18 netbox.example.com gunicorn[78629]: took 1.325146198272705 seconds with 10397 iterations

Wow. That's a lot. Okay, more dirty patching:

@@ -24,7 +27,7 @@ class ObjectPermissionMixin():
         return user_obj._object_perm_cache

     def get_permission_filter(self, user_obj):
-        return Q(users=user_obj) | Q(groups__user=user_obj)
+        return Q(users=user_obj)

     def get_object_permissions(self, user_obj):
         """

And now I have this:

Sep 29 22:57:51 netbox.example.com gunicorn[78889]: took 0.17675495147705078 seconds with 1126 iterations

Still a lot (the request itself took ~400ms), but a lot better.

I'm not familiar with Netbox internals at all, so I'm not sure where to start digging.

Originally created by @heroin-moose on GitHub (Sep 30, 2021). ### NetBox version v3.0.3 ### Python version 3.9 ### Steps to Reproduce 1. Configure Netbox with AD authentication (a few thousands users + heavy nesting) 2. Create a bunch of devices (~500 servers in my case, ~500 cables, ~50 VLANs, ~500 IP addresses, ~40 prefixes, ~40 racks in 3 sites) 3. Try to access https://netbox.example.com/dcim/devices/1234 4. See that gunicorn process is suddenly eats 100% CPU and the request takes more than a second ### Expected Behavior The request should be completed way faster. ### Observed Behavior For me the situtation looks like this: the HTTP request takes ~1.6s to complete and ~1.3s of that request was spent in authentication.py:get_object_permissions(). I am terrible with Python, but I came up with a patch: ``` @@ -11,6 +11,9 @@ from django.db.models import Q from users.models import ObjectPermission from utilities.permissions import permission_is_exempt, resolve_permission, resolve_permission_ct +import time +import syslog + UserModel = get_user_model() @@ -38,11 +42,18 @@ class ObjectPermissionMixin(): # Create a dictionary mapping permissions to their constraints perms = defaultdict(list) + i = 0 + start = time.time() for obj_perm in object_permissions: for object_type in obj_perm.object_types.all(): for action in obj_perm.actions: perm_name = f"{object_type.app_label}.{action}_{object_type.model}" perms[perm_name].extend(obj_perm.list_constraints()) + syslog.syslog(perm_name) + i += 1 + end = time.time() + t = end - start + syslog.syslog("took " + str(t) + " seconds with " + str(i) + " iterations") return perms ``` The output: ``` Sep 29 22:55:18 netbox.example.com gunicorn[78629]: took 1.325146198272705 seconds with 10397 iterations ``` Wow. That's a lot. Okay, more dirty patching: ``` @@ -24,7 +27,7 @@ class ObjectPermissionMixin(): return user_obj._object_perm_cache def get_permission_filter(self, user_obj): - return Q(users=user_obj) | Q(groups__user=user_obj) + return Q(users=user_obj) def get_object_permissions(self, user_obj): """ ``` And now I have this: ``` Sep 29 22:57:51 netbox.example.com gunicorn[78889]: took 0.17675495147705078 seconds with 1126 iterations ``` Still a lot (the request itself took ~400ms), but a lot better. I'm not familiar with Netbox internals at all, so I'm not sure where to start digging.
adam added the type: bugstatus: under review labels 2025-12-29 19:28:04 +01:00
adam closed this issue 2025-12-29 19:28:04 +01:00
Author
Owner

@jeremystretch commented on GitHub (Sep 30, 2021):

Nice sleuthing! This looks related to (if not the same root issue as) #6926. Do you agree?

Also, can you confirm whether this is reproducible using local authentication, or only as an LDAP user?

@jeremystretch commented on GitHub (Sep 30, 2021): Nice sleuthing! This looks related to (if not the same root issue as) #6926. Do you agree? Also, can you confirm whether this is reproducible using local authentication, or only as an LDAP user?
Author
Owner

@heroin-moose commented on GitHub (Sep 30, 2021):

Nice sleuthing! This looks related to (if not the same root issue as) #6926. Do you agree?

I'm not sure, but looks related. AUTH_LDAP_FIND_GROUP_PERMS does not help me either.

Also, can you confirm whether this is reproducible using local authentication, or only as an LDAP user?

Can I enable both AD and local?

@heroin-moose commented on GitHub (Sep 30, 2021): > Nice sleuthing! This looks related to (if not the same root issue as) #6926. Do you agree? I'm not sure, but looks related. AUTH_LDAP_FIND_GROUP_PERMS does not help me either. > Also, can you confirm whether this is reproducible using local authentication, or only as an LDAP user? Can I enable both AD and local?
Author
Owner

@jeremystretch commented on GitHub (Sep 30, 2021):

Sure. I mean, does it happen if you log in as a user with a local NetBox account? (One defined under admin > users.)

@jeremystretch commented on GitHub (Sep 30, 2021): Sure. I mean, does it happen if you log in as a user with a local NetBox account? (One defined under admin > users.)
Author
Owner

@heroin-moose commented on GitHub (Sep 30, 2021):

Trying to remember the password for netbox user...

@heroin-moose commented on GitHub (Sep 30, 2021): Trying to remember the password for netbox user...
Author
Owner

@heroin-moose commented on GitHub (Sep 30, 2021):

No, it does not happen, NetBox is much more responsive and this function is not called at all.

@heroin-moose commented on GitHub (Sep 30, 2021): No, it does not happen, NetBox is much more responsive and this function is not called at all.
Author
Owner

@heroin-moose commented on GitHub (Sep 30, 2021):

Any other info you need?

@heroin-moose commented on GitHub (Sep 30, 2021): Any other info you need?
Author
Owner

@kkthxbye-code commented on GitHub (Sep 30, 2021):

To add to this, we experienced probably the same problem. I didn't have time to debug the issue, but for us it was caused by the way our permissions were structured.

We had a netbox-user and a netbox-admin group mapped by LDAP. Each group had an individual object permission for view, change, delete:

  • delete_rack
  • view_rack
  • change_rack
  • view_device
  • change_device
    etc.

These were assigned to the groups (all of the for the admin group and all except adminish tasks for the other group). These permissions were created by netbox at one point (or netbox_docker?) at around v2.7, but netbox does not seem to seed the permissions anymore on fresh installs. I solved the issue on our end by just creating two larger permissions, one called all_permissions and one all_permissions_except_admin, then response time was fast for both local and ldap users.

In summation, the issue with our deployment was pre-seeded individual object permissions assigned to the mapped ldap group (around 300 permissions I think).

@kkthxbye-code commented on GitHub (Sep 30, 2021): To add to this, we experienced probably the same problem. I didn't have time to debug the issue, but for us it was caused by the way our permissions were structured. We had a netbox-user and a netbox-admin group mapped by LDAP. Each group had an individual object permission for view, change, delete: - delete_rack - view_rack - change_rack - view_device - change_device etc. These were assigned to the groups (all of the for the admin group and all except adminish tasks for the other group). These permissions were created by netbox at one point (or netbox_docker?) at around v2.7, but netbox does not seem to seed the permissions anymore on fresh installs. I solved the issue on our end by just creating two larger permissions, one called all_permissions and one all_permissions_except_admin, then response time was fast for both local and ldap users. In summation, the issue with our deployment was pre-seeded individual object permissions assigned to the mapped ldap group (around 300 permissions I think).
Author
Owner

@heroin-moose commented on GitHub (Sep 30, 2021):

If I do understand correctly the root issue is not the fact that get_object_permissions() loops 10k times. The problem is that it does so on every request (i.e. it's not cached). Am I right, @jeremystretch?

@heroin-moose commented on GitHub (Sep 30, 2021): If I do understand correctly the root issue is not the fact that `get_object_permissions()` loops 10k times. The problem is that it does so on every request (i.e. it's not cached). Am I right, @jeremystretch?
Author
Owner

@heroin-moose commented on GitHub (Oct 18, 2021):

So, anything else I can do to help you fix this?

@heroin-moose commented on GitHub (Oct 18, 2021): So, anything else I can do to help you fix this?
Author
Owner

@jeremystretch commented on GitHub (Oct 18, 2021):

@heroin-moose you're certainly welcome to volunteer a fix, if you can identify one that doesn't break the behavior of permissions.

@jeremystretch commented on GitHub (Oct 18, 2021): @heroin-moose you're certainly welcome to volunteer a fix, if you can identify one that doesn't break the behavior of permissions.
Author
Owner

@heroin-moose commented on GitHub (Oct 18, 2021):

Can you describe the intended algorithm behind this function? Is my assumption that it shall not called on each request, but cached instead, correct?

@heroin-moose commented on GitHub (Oct 18, 2021): Can you describe the intended algorithm behind this function? Is my assumption that it shall not called on each request, but cached instead, correct?
Author
Owner

@jeremystretch commented on GitHub (Oct 18, 2021):

I'm sorry, I don't have any time to budget on this particular problem right now. Maybe you can track down the person who wrote it.

@jeremystretch commented on GitHub (Oct 18, 2021): I'm sorry, I don't have any time to budget on this particular problem right now. Maybe you can track down the person who wrote it.
Author
Owner

@heroin-moose commented on GitHub (Oct 18, 2021):

Okay, got ya.

@heroin-moose commented on GitHub (Oct 18, 2021): Okay, got ya.
Author
Owner

@xkilian commented on GitHub (Oct 18, 2021):

My collegue tracked down that the LDAP integration has a backwards implementation in Django for how to query groups. It first does a query using a service account and then queries to get all groups, then it looks for the user in the groups, then tries to match groups to Netbox groups. (bad design DJANGO side)
My collegue modified it(Django LDAP module) so that it uses the actuel user for the request and returns if it is a valid user, then netbox does the group mapping based on the user. We do not currently need groups to come from AD, but he will eventually modify it to do the initial query using the user and also request what groups the user is a member of. (We already do this internally in python using a reverse proxy web app in another project)
There is already someone that posted some code (pull request) to modify the LDAP DJANGO code to do the first part and it works well. (Ended up writing the same code as was published on the internet, so it is good and works.) I do not have a link, but a little googling should find it.
Anyway, this should improve overall speed in addition to any Netbox related optimizations.
Cheers.

@xkilian commented on GitHub (Oct 18, 2021): My collegue tracked down that the LDAP integration has a backwards implementation in Django for how to query groups. It first does a query using a service account and then queries to get all groups, then it looks for the user in the groups, then tries to match groups to Netbox groups. (bad design DJANGO side) My collegue modified it(Django LDAP module) so that it uses the actuel user for the request and returns if it is a valid user, then netbox does the group mapping based on the user. We do not currently need groups to come from AD, but he will eventually modify it to do the initial query using the user and also request what groups the user is a member of. (We already do this internally in python using a reverse proxy web app in another project) There is already someone that posted some code (pull request) to modify the LDAP DJANGO code to do the first part and it works well. (Ended up writing the same code as was published on the internet, so it is good and works.) I do not have a link, but a little googling should find it. Anyway, this should improve overall speed in addition to any Netbox related optimizations. Cheers.
Author
Owner

@buzzingbren commented on GitHub (Oct 25, 2021):

I don't have a solution. I ran into this as well with our setup. Using LDAP against an Active Directory, on its own works just fine.
But we're using https://github.com/bb-Ricardo/netbox-sync for filling our Netbox with data. And yes I know, the use case for Netbox is the other way as the source of truth. But for our little environment (4 ESX clusters in 4 separate sites) this just works better than an Excel Spreadsheet.
During synchronization Netbox-Sync uses the API to request info from Netbox and Update/Add info to Netbox.
But this "bug" (for us then) slowed it down to a halt (resulting in 500 Internal Server errors) since version 3.0.x.

We've now just disabled LDAP, because this was the only way for us. Even though the token was of a local defined user, it still screeched to a halt and was unworkable.
We're lucky that we're only a small team, so it's not a real issue to forgo the LDAP integration. But it would be nice to use again in the future.

@buzzingbren commented on GitHub (Oct 25, 2021): I don't have a solution. I ran into this as well with our setup. Using LDAP against an Active Directory, on its own works just fine. But we're using https://github.com/bb-Ricardo/netbox-sync for filling our Netbox with data. And yes I know, the use case for Netbox is the other way as the source of truth. But for our little environment (4 ESX clusters in 4 separate sites) this just works better than an Excel Spreadsheet. During synchronization Netbox-Sync uses the API to request info from Netbox and Update/Add info to Netbox. But this "bug" (for us then) slowed it down to a halt (resulting in 500 Internal Server errors) since version 3.0.x. We've now just disabled LDAP, because this was the only way for us. Even though the token was of a local defined user, it still screeched to a halt and was unworkable. We're lucky that we're only a small team, so it's not a real issue to forgo the LDAP integration. But it would be nice to use again in the future.
Author
Owner

@lastorel commented on GitHub (Oct 27, 2021):

The same situation. Kibana find ~70000 logs from LDAP server when cron starts my sync tasks...

@lastorel commented on GitHub (Oct 27, 2021): The same situation. Kibana find ~70000 logs from LDAP server when cron starts my sync tasks...
Author
Owner

@kkthxbye-code commented on GitHub (Oct 28, 2021):

Please check out the proposed fix in #7676. Note that issues like @lasorel posted are not the same, as this issue and the fix only address CPU usage. Caching issues should be raised in issue #6926

@kkthxbye-code commented on GitHub (Oct 28, 2021): Please check out the proposed fix in #7676. Note that issues like @lasorel posted are not the same, as this issue and the fix only address CPU usage. Caching issues should be raised in issue #6926
Author
Owner

@DanSheps commented on GitHub (Oct 29, 2021):

My collegue modified it(Django LDAP module) so that it uses the actuel user for the request and returns if it is a valid user, then netbox does the group mapping based on the user.

Could you link to the django issue?

@DanSheps commented on GitHub (Oct 29, 2021): > My collegue modified it(Django LDAP module) so that it uses the actuel user for the request and returns if it is a valid user, then netbox does the group mapping based on the user. Could you link to the django issue?
Author
Owner

@xkilian commented on GitHub (Oct 29, 2021):

DJANGO LDAP AUTH
Using memberof for the given user instead of querying group members and looking for the user in the returned data dump.

@xkilian commented on GitHub (Oct 29, 2021): [DJANGO LDAP AUTH](https://github.com/django-auth-ldap/django-auth-ldap/issues/210) Using memberof for the given user instead of querying group members and looking for the user in the returned data dump.
Author
Owner

@xkilian commented on GitHub (Oct 29, 2021):

3 keys points:

  • Unlike all concrete group types in django_auth_ldap, this does not require group search to be configured.
  • Does not require a more privileged user to query the groups
  • Much better performance
@xkilian commented on GitHub (Oct 29, 2021): 3 keys points: - Unlike all concrete group types in ``django_auth_ldap``, this does not require group search to be configured. - Does not require a more privileged user to query the groups - Much better performance
Author
Owner

@xkilian commented on GitHub (Nov 3, 2021):

This is the pull request: https://github.com/django-auth-ldap/django-auth-ldap/pull/180

@xkilian commented on GitHub (Nov 3, 2021): This is the pull request: https://github.com/django-auth-ldap/django-auth-ldap/pull/180
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/netbox#5445