Potential Memory Leak Problems #635

Closed
opened 2025-12-29 16:24:07 +01:00 by adam · 12 comments

Originally created by @WilliamMarti on GitHub (Jan 17, 2017).

So I have NetBox running on an Ubuntu t2.small (1 vCPU, 2 GB RAM) server in AWS. I am running NetBox v1.8.0 right now as well.

I have about 20,000 IPs in NetBox currently (and about another 50,000 to go). When I load the 'IP Addresses' page it is quite slow, which may be a CPU bottleneck, but once it has finished loading my memory jumps way up and stays up. When I start paging through the IP address pages, I eventually use all of my available memory and can crash the gunicorn app server.

After restarting the supervisor service, my server idles at about 290 MB of memory, but once I start loading pages I can see my memory climb. It is not as apparent with Sites or other data categories that are not as large; this is most prominent with IP addresses, I am guessing because I have so many.

It seems like NetBox is loading the entire set of IP addresses behind the scenes. I have looked into gunicorn and found that you can have the workers restart automatically after a set number of requests, but that seems like a band-aid. I have set `max_requests` to 2 in my gunicorn config, which seems to have helped the issue for now.

So I guess my question is: is this intended behavior? Regardless of the amount of RAM I have, continuously loading large pages over time looks like it will eventually consume all the available RAM on a server.
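The `max_requests` workaround described above can be captured in a gunicorn config file. A minimal sketch, assuming gunicorn is started with `-c gunicorn_config.py`; the bind address and worker count here are illustrative, not taken from the original setup:

```python
# gunicorn_config.py -- minimal sketch; only max_requests reflects the
# workaround described above, the other values are illustrative.
bind = "127.0.0.1:8001"
workers = 3

# Recycle each worker process after it has served this many requests,
# releasing whatever memory the process accumulated. A value of 2 is very
# aggressive; a few hundred is a more typical band-aid value.
max_requests = 2

# Random jitter (0..N extra requests per worker) so all workers do not
# restart at the same moment.
max_requests_jitter = 1
```

Note that this only masks a leak by recycling workers; it does not address the underlying unbounded query.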

adam closed this issue 2025-12-29 16:24:07 +01:00

@jeremystretch commented on GitHub (Jan 18, 2017):

It does seem that certain views are pulling down entire database tables with no `LIMIT` appended to the SQL query. I can't work out exactly why this is happening, but it seems to stem from the rendering of tables in django-tables2. It's probably worthwhile at this point to re-implement the object lists and ditch django-tables2 altogether.

Also, just to verify, you are not running with `DEBUG` enabled in the configuration, correct?
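The symptom described (a full-table SELECT despite pagination) typically arises when something iterates a lazy queryset before it is sliced. A plain-Python sketch, no Django required; `FakeQuerySet` is a hypothetical stand-in for a lazy Django queryset:

```python
class FakeQuerySet:
    """Stands in for a lazy Django queryset over a large table."""

    def __init__(self, total):
        self.total = total
        self.rows_fetched = 0  # proxy for rows pulled from the database

    def __getitem__(self, key):
        # qs[offset:offset + limit] corresponds to SQL LIMIT/OFFSET.
        if isinstance(key, slice):
            fetched = list(range(self.total)[key])
            self.rows_fetched += len(fetched)
            return fetched
        raise TypeError("only slicing is supported in this sketch")

    def __iter__(self):
        # Iterating the unsliced queryset evaluates the whole table.
        self.rows_fetched += self.total
        return iter(range(self.total))


# Slicing before evaluation touches only one page's worth of rows:
qs = FakeQuerySet(77_000)
page = qs[0:50]
assert len(page) == 50 and qs.rows_fetched == 50

# Handing the queryset to a renderer that iterates it first (as the
# table-rendering path appeared to do) pulls every row into memory:
qs = FakeQuerySet(77_000)
rows = list(qs)
assert qs.rows_fetched == 77_000
```

The same distinction explains why the debug-toolbar output later in this thread shows one query with `LIMIT 50` and one without.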


@WilliamMarti commented on GitHub (Jan 18, 2017):

Not from what I can tell. In settings.py:

`DEBUG = getattr(configuration, 'DEBUG', False)`

Nothing regarding DEBUG in my configuration.py file. I do have LDAP configured so I think:

`logger.setLevel(logging.DEBUG)`

is set. Not sure if that makes a difference.
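For reference, the `getattr` pattern in that settings.py line simply falls back to `False` when configuration.py defines no `DEBUG` attribute, and it is unrelated to the LDAP logger's `logging.DEBUG` level. A standalone sketch, where `types.SimpleNamespace` stands in for the configuration module:

```python
import types

# configuration.py with no DEBUG defined: getattr falls back to False.
configuration = types.SimpleNamespace()
DEBUG = getattr(configuration, 'DEBUG', False)
assert DEBUG is False

# DEBUG is only enabled by an explicit opt-in in configuration.py.
configuration.DEBUG = True
DEBUG = getattr(configuration, 'DEBUG', False)
assert DEBUG is True
```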


@jeremystretch commented on GitHub (Jan 18, 2017):

Nah, it's only enabled if you set `DEBUG = True` in configuration.py. (You'll see the debug toolbar appear on the right side of the screen when using NetBox, too.) Thanks for checking.


@jeremystretch commented on GitHub (Jan 20, 2017):

I've re-implemented the manner in which bulk editing and deletion is done on objects, which should be much more efficient now. However, I'm not sure whether this was contributing to your issue. The change has been pushed to the `develop` branch if you'd like to try cloning it (not recommended for production use). Otherwise, it will be included in the v1.8.3 release, which will probably be in a couple weeks.


@WilliamMarti commented on GitHub (Jan 21, 2017):

I can test this tomorrow, although based on your description I am not sure this is the fix. Merely reading/viewing data was making my RAM usage climb.


@WilliamMarti commented on GitHub (Jan 25, 2017):

Sorry for the delay, this is still on my radar. I am running into some server problems with my own deployment that are not related to NetBox itself.


@WilliamMarti commented on GitHub (Jan 25, 2017):

It looks like this is still going on. Having the workers restart after a set number of requests seems to alleviate the issue, though.

The bigger issue for me is the app selecting the entire table for whatever we are viewing. For most data types this isn't an issue because I have fewer than 3,000 entries, but I have 77,000 IPs entered currently, so this pegs my EC2 instance pretty hard. I realize I am running this on a pretty low-end instance, but if there were a way to do a `LIMIT 50` for each page's SELECT, I think that would fix the issue for me.

If not, I can probably move up to a t2.medium.


@jeremystretch commented on GitHub (Jan 25, 2017):

Can you try running NetBox with `DEBUG = True` set in configuration.py? This will display the debug toolbar along the right-hand side of the screen. Clicking the SQL tab will show a breakdown of each database query and the time it takes. I'm curious which queries are taking the longest.


@WilliamMarti commented on GitHub (Jan 25, 2017):

There are three suspect queries, one of them big. If I run these on my AWS instance I get a 502 error from nginx, which I am guessing is related to the slowness of the SELECT. I ran these tests on my VMware Workstation VM instead; still slow, but at least they complete. For this test, both NetBox and Postgres are on the same VM.

```sql
SELECT "ipam_ipaddress"."id", "ipam_ipaddress"."created", "ipam_ipaddress"."last_updated", "ipam_ipaddress"."family", "ipam_ipaddress"."address", "ipam_ipaddress"."vrf_id", "ipam_ipaddress"."tenant_id", "ipam_ipaddress"."status", "ipam_ipaddress"."interface_id", "ipam_ipaddress"."nat_inside_id", "ipam_ipaddress"."description", (INET(HOST(ipam_ipaddress.address))) AS "host", "ipam_vrf"."id", "ipam_vrf"."created", "ipam_vrf"."last_updated", "ipam_vrf"."name", "ipam_vrf"."rd", "ipam_vrf"."tenant_id", "ipam_vrf"."enforce_unique", "ipam_vrf"."description", "tenancy_tenant"."id", "tenancy_tenant"."created", "tenancy_tenant"."last_updated", "tenancy_tenant"."name", "tenancy_tenant"."slug", "tenancy_tenant"."group_id", "tenancy_tenant"."description", "tenancy_tenant"."comments", T4."id", T4."created", T4."last_updated", T4."name", T4."slug", T4."group_id", T4."description", T4."comments", "dcim_interface"."id", "dcim_interface"."device_id", "dcim_interface"."name", "dcim_interface"."form_factor", "dcim_interface"."mac_address", "dcim_interface"."mgmt_only", "dcim_interface"."description", "dcim_device"."id", "dcim_device"."created", "dcim_device"."last_updated", "dcim_device"."device_type_id", "dcim_device"."device_role_id", "dcim_device"."tenant_id", "dcim_device"."platform_id", "dcim_device"."name", "dcim_device"."serial", "dcim_device"."asset_tag", "dcim_device"."rack_id", "dcim_device"."position", "dcim_device"."face", "dcim_device"."status", "dcim_device"."primary_ip4_id", "dcim_device"."primary_ip6_id", "dcim_device"."comments" FROM "ipam_ipaddress" LEFT OUTER JOIN "ipam_vrf" ON ("ipam_ipaddress"."vrf_id" = "ipam_vrf"."id") LEFT OUTER JOIN "tenancy_tenant" ON ("ipam_vrf"."tenant_id" = "tenancy_tenant"."id") LEFT OUTER JOIN "tenancy_tenant" T4 ON ("ipam_ipaddress"."tenant_id" = T4."id") LEFT OUTER JOIN "dcim_interface" ON ("ipam_ipaddress"."interface_id" = "dcim_interface"."id") LEFT OUTER JOIN "dcim_device" ON ("dcim_interface"."device_id" = "dcim_device"."id") ORDER BY "ipam_ipaddress"."family" ASC, "host" ASC
```

This one took 1023.61 ms to complete.

```sql
SELECT COUNT(*) FROM (SELECT "ipam_ipaddress"."id" AS Col1, (INET(HOST(ipam_ipaddress.address))) AS "host" FROM "ipam_ipaddress" GROUP BY "ipam_ipaddress"."id", (INET(HOST(ipam_ipaddress.address)))) subquery
```

This took 125.89 ms to finish.

```sql
SELECT "ipam_ipaddress"."id", "ipam_ipaddress"."created", "ipam_ipaddress"."last_updated", "ipam_ipaddress"."family", "ipam_ipaddress"."address", "ipam_ipaddress"."vrf_id", "ipam_ipaddress"."tenant_id", "ipam_ipaddress"."status", "ipam_ipaddress"."interface_id", "ipam_ipaddress"."nat_inside_id", "ipam_ipaddress"."description", (INET(HOST(ipam_ipaddress.address))) AS "host", "ipam_vrf"."id", "ipam_vrf"."created", "ipam_vrf"."last_updated", "ipam_vrf"."name", "ipam_vrf"."rd", "ipam_vrf"."tenant_id", "ipam_vrf"."enforce_unique", "ipam_vrf"."description", "tenancy_tenant"."id", "tenancy_tenant"."created", "tenancy_tenant"."last_updated", "tenancy_tenant"."name", "tenancy_tenant"."slug", "tenancy_tenant"."group_id", "tenancy_tenant"."description", "tenancy_tenant"."comments", T4."id", T4."created", T4."last_updated", T4."name", T4."slug", T4."group_id", T4."description", T4."comments", "dcim_interface"."id", "dcim_interface"."device_id", "dcim_interface"."name", "dcim_interface"."form_factor", "dcim_interface"."mac_address", "dcim_interface"."mgmt_only", "dcim_interface"."description", "dcim_device"."id", "dcim_device"."created", "dcim_device"."last_updated", "dcim_device"."device_type_id", "dcim_device"."device_role_id", "dcim_device"."tenant_id", "dcim_device"."platform_id", "dcim_device"."name", "dcim_device"."serial", "dcim_device"."asset_tag", "dcim_device"."rack_id", "dcim_device"."position", "dcim_device"."face", "dcim_device"."status", "dcim_device"."primary_ip4_id", "dcim_device"."primary_ip6_id", "dcim_device"."comments" FROM "ipam_ipaddress" LEFT OUTER JOIN "ipam_vrf" ON ("ipam_ipaddress"."vrf_id" = "ipam_vrf"."id") LEFT OUTER JOIN "tenancy_tenant" ON ("ipam_vrf"."tenant_id" = "tenancy_tenant"."id") LEFT OUTER JOIN "tenancy_tenant" T4 ON ("ipam_ipaddress"."tenant_id" = T4."id") LEFT OUTER JOIN "dcim_interface" ON ("ipam_ipaddress"."interface_id" = "dcim_interface"."id") LEFT OUTER JOIN "dcim_device" ON ("dcim_interface"."device_id" = "dcim_device"."id") ORDER BY "ipam_ipaddress"."family" ASC, "host" ASC LIMIT 50
```

Took 119.15 ms to finish.


@jeremystretch commented on GitHub (Jan 26, 2017):

I'm pretty sure that first query shouldn't be there. IIRC it was removed when I re-implemented the method for bulk editing or deleting all objects matching a query ([code](https://github.com/digitalocean/netbox/commit/39d083eae71805a9ea9dc4a0ad6ca017419ef111#diff-2790e3b925f6b6677b3f6bef9dd0ada5)). Are you running on an updated copy of the `develop` branch?


@WilliamMarti commented on GitHub (Jan 27, 2017):

So, my apologies: once I switched my NetBox deployment to the `develop` branch, my issues went away immediately. I think this issue can be closed.

Thank you so much for the assistance.


@jeremystretch commented on GitHub (Jan 27, 2017):

Awesome! Glad we got that worked out. Thanks for not giving up on NetBox.

Reference: starred/netbox#635