Netbox workers exhibit slow memory leak #2948

Closed
opened 2025-12-29 18:23:52 +01:00 by adam · 6 comments

Originally created by @ajknv on GitHub (Oct 11, 2019).

Environment

  • Python version: 3.6.8
  • NetBox version: 2.6.3

Steps to Reproduce

(I did this as an A/B test on two servers in a clustered HA configuration)

  1. Deploy server A with NetBox configured with no max_requests value for its gunicorn workers (i.e. unlimited). For what it's worth, I deploy via the community Docker image, though backed by RDS and ElastiCache rather than the bundled DB containers.
  2. Deploy server B with NetBox configured to set a max_requests value for its gunicorn workers; I set it to 2500.
  3. Observe the memory use of both servers over the course of several days to a few weeks. Server A's free memory slowly drops, in my case from an initial high of ~12GB (the VM has 16GB allocated) to a current value of ~336K, with individual gunicorn worker processes reported as consuming gigabytes of memory (the current title holder is at 3.7GB). Server B holds more or less steady on free memory.
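
For reference, a minimal sketch of what the server B gunicorn settings might look like; the file name, bind address, and worker count here are illustrative assumptions, and only max_requests is the setting under test:

```python
# gunicorn_config.py -- illustrative values; only max_requests is the setting under test
bind = "0.0.0.0:8001"   # address NetBox is served on (example)
workers = 4             # number of worker processes (example)
timeout = 120           # worker timeout in seconds (example)

# Recycle each worker after it has served this many requests, releasing any
# memory it has accumulated. Server A omits this setting entirely (gunicorn's
# default of 0 means workers are never restarted).
max_requests = 2500
```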

Expected Behavior

While the gunicorn max_requests configuration does provide a workaround, ideally a server application like NetBox shouldn't slowly drain all of the host's memory until it seizes up.

Observed Behavior

I first started tracing and experimenting with performance parameters like gunicorn's max_requests while debugging why both instances in the HA cluster became completely unresponsive, seizing up the VMs so badly that they had to be hard-terminated in AWS and replaced. I'm continuing to observe server A for a similar total failure to fully verify that root cause, but I feel fairly confident that this memory-leaking behavior is the primary smoking gun.

adam closed this issue 2025-12-29 18:23:52 +01:00

@DanSheps commented on GitHub (Oct 11, 2019):

I am going to go out on a limb and say this is going to be an upstream issue.

There are four layers here:

  • Python
  • Gunicorn
  • Django
  • Netbox

Most functions are abstracted by the first two, so NetBox itself does not have a lot of interaction with the operating system at all. The memory leak, if you determine there is one, could come from any layer.


@jeremystretch commented on GitHub (Oct 12, 2019):

Agreed. Unless you can attribute the leak to something particular inside NetBox, I don't think there's much we can do. That said, it might be worth adding max_requests to the reference gunicorn config in the docs to mitigate the issue.


@ajknv commented on GitHub (Oct 14, 2019):

It is possible it could be upstream, no doubt, though given the popularity of all of those layers, that conclusion would seem to imply that every project built on these widely adopted frameworks suffers from a fairly aggressive memory leak. IMO that doesn't really pass the Occam's razor test as an explanation.

> NetBox itself does not have a lot of interaction with the operating system at all

What I mean by a memory leak in this context is something hanging on to and accumulating references to Python objects such that they can't be garbage collected, i.e. a memory leak in the sense typical of higher-level languages, not a raw malloc request straight to the OS. Something like retained DB result sets, for example.

> The memory leak, if you determine there is one

I'm fairly puzzled: setting aside doubts about which layer it's coming from, does the description in the issue not even convince you that there is a memory leak at all?

> Unless you can attribute the leak to something particular inside NetBox, I don't think there's much we can do.

I can try to find the time to help, but I was hoping that, as the maintainers of the project, you might consider it worthwhile to run some tools (static analysis/linters, memory analysis, etc.) or even just do an inspection pass over the data-handling code to look for possible culprits within NetBox.
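
As a starting point for that kind of investigation, here is a minimal, standalone sketch (not part of NetBox, and not anything the maintainers have endorsed) of how accumulating Python objects could be spotted inside a worker using only the standard-library tracemalloc module; comparing snapshots taken over a worker's lifetime shows which source lines keep allocating memory that is never released:

```python
# leak_probe.py -- illustrative sketch using only the Python standard library
import tracemalloc

tracemalloc.start(25)                      # record up to 25 stack frames per allocation
_baseline = tracemalloc.take_snapshot()    # snapshot taken once, early in the worker's life

def report_growth(limit=10):
    """Print the source lines whose allocations have grown the most since the baseline."""
    snapshot = tracemalloc.take_snapshot()
    for stat in snapshot.compare_to(_baseline, "lineno")[:limit]:
        print(stat)

# In a Django deployment this could be wired into a middleware or a periodic
# job so that report_growth() runs every few thousand requests; steadily
# growing entries point at the code (NetBox, Django, or a third-party
# library) that is holding the references.
```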


@jeremystretch commented on GitHub (Oct 15, 2019):

> I can try to find the time to help, but I was hoping that, as the maintainers of the project, you might consider it worthwhile to run some tools

I'm sure it's worthwhile, but so are the 186 other issues currently open that our four (part-time) maintainers are tasked with managing. Open source projects depend on contributions from their users to survive. @ajknv, if you're willing to commit to doing this work I'll leave this issue open. Otherwise, I think adding a line to the docs suggesting a max_requests limit in the gunicorn configuration is a very acceptable workaround.


@tyler-8 commented on GitHub (Oct 16, 2019):

I agree that max_requests is really the way to go. Memory leaks can be difficult and time-intensive to pin down, and even then may eventually be attributed to any number of third-party libraries in use (if the issue truly lies in the NetBox code to begin with).

I would add that max_requests should be paired with max_requests_jitter in the suggested configuration, which helps prevent all of the gunicorn workers from being restarted at nearly the same time. I'm using uwsgi myself, but it has similar configuration parameters.
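
In gunicorn config terms, the combined recommendation would look something like the sketch below (the values are examples, not tuned recommendations):

```python
# gunicorn_config.py -- recycle workers to cap memory growth (example values)
max_requests = 2500        # restart a worker after it has served this many requests
max_requests_jitter = 500  # add a random 0-500 extra requests to each worker's limit
                           # so the workers do not all restart at the same moment
```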

Further general recommendations: https://adamj.eu/tech/2019/09/19/working-around-memory-leaks-in-your-django-app/


@jeremystretch commented on GitHub (Nov 1, 2019):

Going to close this out as #3658 seems to provide an adequate workaround in the absence of a known root issue.
