Upgrade to v3.6-beta2 - migration causes OOM of the database process #8533

Closed
opened 2025-12-29 20:37:51 +01:00 by adam · 9 comments

Originally created by @cybarox on GitHub (Aug 30, 2023).

Originally assigned to: @jeremystretch on GitHub.

NetBox version

v3.5.9

Python version

3.10

Steps to Reproduce

  1. Dump the production database.
  2. Start a clean Vagrant VM (2 cores, 4 GB RAM) running Ubuntu 22.04.
  3. Set up NetBox v3.5.9 following the installation instructions.
  4. Create the database from the dump (PostgreSQL v14.9).
  5. Confirm that NetBox functions normally.
  6. Switch the repo to beta2.
  7. Run sudo /opt/netbox/upgrade.sh.
  8. The upgrade gets stuck on the Django dcim migrations and fails because postgresql.service is killed by the OOM killer.

We have over 18000 devices in NetBox. Maybe it has something to do with the high number of devices.

Expected Behavior

NetBox upgrades to v3.6-beta2

Observed Behavior

Upgrade fails due to terminated database process:

Operations to perform:
  Apply all migrations: account, admin, auth, circuits, contenttypes, core, dcim, django_rq, extras, ipam, sessions, social_django, taggit, tenancy, users, virtualization, wireless
Running migrations:
  Applying users.0004_netboxgroup_netboxuser... OK
  Applying account.0001_initial... OK
  Applying dcim.0173_remove_napalm_fields... OK
  Applying dcim.0174_device_latitude_device_longitude... OK
  Applying dcim.0174_rack_starting_unit... OK
  Applying dcim.0175_device_oob_ip... OK
  Applying dcim.0176_device_component_counters...Traceback (most recent call last):
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
  File "/opt/netbox/venv/lib/python3.10/site-packages/psycopg/cursor.py", line 737, in execute
    raise ex.with_traceback(None)
psycopg.OperationalError: consuming input failed: EOF detected

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/migrations/executor.py", line 252, in apply_migration
    state = migration.apply(state, schema_editor)
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/migrations/migration.py", line 132, in apply
    operation.database_forwards(
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/migrations/operations/special.py", line 193, in database_forwards
    self.code(from_state.apps, schema_editor)
  File "/opt/netbox/netbox/dcim/migrations/0176_device_component_counters.py", line 34, in recalculate_device_counts
    Device.objects.bulk_update(devices, [
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/models/query.py", line 892, in bulk_update
    rows_updated += queryset.filter(pk__in=pks).update(**update_kwargs)
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/models/query.py", line 1206, in update
    rows = query.get_compiler(self.db).execute_sql(CURSOR)
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1984, in execute_sql
    cursor = super().execute_sql(result_type)
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1562, in execute_sql
    cursor.execute(sql, params)
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 84, in _execute
    with self.db.wrap_database_errors:
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
  File "/opt/netbox/venv/lib/python3.10/site-packages/psycopg/cursor.py", line 737, in execute
    raise ex.with_traceback(None)
django.db.utils.OperationalError: consuming input failed: EOF detected

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/backends/base/base.py", line 289, in ensure_connection
    self.connect()
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/backends/base/base.py", line 270, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/backends/postgresql/base.py", line 275, in get_new_connection
    connection = self.Database.connect(**conn_params)
  File "/opt/netbox/venv/lib/python3.10/site-packages/psycopg/connection.py", line 729, in connect
    raise ex.with_traceback(None)
psycopg.OperationalError: connection failed: Connection refused
	Is the server running on that host and accepting TCP/IP connections?

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/netbox/netbox/manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/core/management/base.py", line 412, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/core/management/base.py", line 458, in execute
    output = self.handle(*args, **options)
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/core/management/base.py", line 106, in wrapper
    res = handle_func(*args, **kwargs)
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/core/management/commands/migrate.py", line 356, in handle
    post_migrate_state = executor.migrate(
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/migrations/executor.py", line 135, in migrate
    state = self._migrate_all_forwards(
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/migrations/executor.py", line 167, in _migrate_all_forwards
    state = self.apply_migration(
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/migrations/executor.py", line 249, in apply_migration
    with self.connection.schema_editor(
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/backends/base/schema.py", line 168, in __exit__
    self.atomic.__exit__(exc_type, exc_value, traceback)
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/transaction.py", line 307, in __exit__
    connection.set_autocommit(True)
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/backends/base/base.py", line 483, in set_autocommit
    self.ensure_connection()
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/backends/base/base.py", line 288, in ensure_connection
    with self.wrap_database_errors:
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/backends/base/base.py", line 289, in ensure_connection
    self.connect()
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/backends/base/base.py", line 270, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/opt/netbox/venv/lib/python3.10/site-packages/django/db/backends/postgresql/base.py", line 275, in get_new_connection
    connection = self.Database.connect(**conn_params)
  File "/opt/netbox/venv/lib/python3.10/site-packages/psycopg/connection.py", line 729, in connect
    raise ex.with_traceback(None)
django.db.utils.OperationalError: connection failed: Connection refused
	Is the server running on that host and accepting TCP/IP connections?
adam added the type: bug, status: accepted, beta, severity: medium labels 2025-12-29 20:37:51 +01:00
adam closed this issue 2025-12-29 20:37:51 +01:00

@abhi1693 commented on GitHub (Aug 30, 2023):

Thank you for opening a bug report. I was unable to reproduce the reported behavior on NetBox v3.6-beta2. Please re-confirm the reported behavior on the current stable release and adjust your post above as necessary. Remember to provide detailed steps that someone else can follow using a clean installation of NetBox to reproduce the issue. Remember to include the steps taken to create any initial objects or other data.


@cybarox commented on GitHub (Aug 30, 2023):

I understand that the steps to reproduce the error are not really meaningful. With the demo data, the migration works without problems. Our database has over 18k devices, 90k interfaces, 19k front ports, 16k rear ports and 3k console ports. The DB dump is around 450 MB in size. I have no idea how to reproduce this with a clean installation.


@abhi1693 commented on GitHub (Aug 30, 2023):

It's most likely a configuration issue with your PostgreSQL server. It may be killing the session once it runs beyond a certain time, maybe 5 minutes.


@cybarox commented on GitHub (Aug 30, 2023):

It is not a PostgreSQL issue; the process is killed by the OOM killer:

[ 2529.555076] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=system-postgresql.slice,mems_allowed=0,global_oom,task_memcg=/system.slice/system-postgresql.slice/postgresql@14-main.service,task=postgres,pid=5310,uid=114
[ 2529.555086] Out of memory: Killed process 5310 (postgres) total-vm:5199092kB, anon-rss:3583912kB, file-rss:2580kB, shmem-rss:5500kB, UID:114 pgtables:9416kB oom_score_adj:0

See also:
https://netdev-community.slack.com/files/U01Q2UCRPRP/F05Q42F3L4V/image.png
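
To put the kernel message above in perspective, here is a minimal sketch that pulls the memory figures out of the "Out of memory" line and converts them to GiB. The helper names `parse_oom_kb` and `kb_to_gib` are made up for illustration, and the conversion assumes the kernel's "kB" values are units of 1024 bytes.

```python
import re

# The "Out of memory" line quoted in the comment above.
OOM_LINE = (
    "Out of memory: Killed process 5310 (postgres) total-vm:5199092kB, "
    "anon-rss:3583912kB, file-rss:2580kB, shmem-rss:5500kB, UID:114 "
    "pgtables:9416kB oom_score_adj:0"
)


def parse_oom_kb(line):
    """Return every 'name:<number>kB' field of the line as {name: kB}."""
    return {name: int(value) for name, value in re.findall(r"([a-z-]+):(\d+)kB", line)}


def kb_to_gib(kb):
    """Convert a kB figure to GiB (assumption: 1 kB here means 1024 bytes)."""
    return kb / (1024 * 1024)


if __name__ == "__main__":
    for name, kb in parse_oom_kb(OOM_LINE).items():
        print(f"{name}: {kb_to_gib(kb):.2f} GiB")
```

Under that assumption, the anon-rss figure works out to roughly 3.4 GiB held by a single postgres process on a VM with 4 GB of RAM, which is consistent with a global OOM kill rather than a session timeout.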


@abhi1693 commented on GitHub (Aug 30, 2023):

I believe it's still a configuration issue if your VM doesn't have the resources to process the data. However, I'll leave this for another maintainer to look at, in case they can find an optimisation for the counter migration.

@abhi1693 commented on GitHub (Aug 30, 2023): I believe it's still a configuration issue if your VM doesn't have resources to process the data. However, I'll leave this for another maintainer to take a look if they find any optimisation for the counter migration.
Author
Owner

@jeremystretch commented on GitHub (Aug 30, 2023):

This can probably be addressed by defining a batch size for the bulk update operation in migration 0176_device_component_counters.
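
For readers unfamiliar with the idea, here is a rough sketch of what batching buys. Django's `bulk_update()` accepts an optional `batch_size` argument; when it is omitted, PostgreSQL receives all objects in a single UPDATE statement, which with 18k devices can get very large. The code below is not the NetBox migration. It is a plain-Python illustration in which `batched`, `update_in_batches`, and the value 100 are hypothetical names and numbers chosen for the example.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists holding at most batch_size items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def update_in_batches(objects, update_batch, batch_size=100):
    """Call update_batch once per batch and return how many calls were made.

    update_batch stands in for one UPDATE statement sent to the database.
    """
    statements = 0
    for batch in batched(objects, batch_size):
        update_batch(batch)
        statements += 1
    return statements


if __name__ == "__main__":
    sizes = []
    # 18,000 stand-in device IDs, roughly the scale reported in this issue.
    count = update_in_batches(range(18_000), lambda batch: sizes.append(len(batch)))
    print(f"{count} statements, largest batch: {max(sizes)} objects")
```

In a Django migration the same effect would come from passing `batch_size=` to the existing `Device.objects.bulk_update(...)` call, so that no single statement covers more than that many rows. The right value is a trade-off between statement size and the number of round trips.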


@jeremystretch commented on GitHub (Aug 30, 2023):

@cybarox are you able to test the migration using the 13605-optimize-migration branch I just created? (git checkout 13605-optimize-migration and run upgrade.sh again)


@cybarox commented on GitHub (Aug 30, 2023):

The batch_size value solved the problem. All migrations were applied successfully. Thank you @jeremystretch !


@jeremystretch commented on GitHub (Aug 30, 2023):

Excellent! Thanks for the quick confirmation @cybarox.


Reference: starred/netbox#8533