0.22.1 uses way, way more memory than 0.21? #491

Closed
opened 2025-12-29 02:19:02 +01:00 by adam · 13 comments

Originally created by @linsomniac on GitHub (Apr 28, 2023).

I was running 0.21 on an instance with 2GB of RAM. I upgraded to 0.22.1 and it immediately thrashed itself to death. I upgraded the instance to 4GB and it still starts thrashing fairly quickly. At the moment AWS won't let me upgrade it to 16GB. I have ~100 nodes in my headscale.

Is this known and expected?

adam added the bug label 2025-12-29 02:19:02 +01:00
adam closed this issue 2025-12-29 02:19:02 +01:00

@loprima-l commented on GitHub (Apr 28, 2023):

As far as I know, there has only been CPU trouble on large installations, but that sounds "normal" since Headscale isn't optimized for large installations yet.

Are you sure the problem came from RAM?


@linsomniac commented on GitHub (Apr 28, 2023):

Yep, I'm sure the problem was RAM. I was getting OOM messages on the console.

I've regularly run into memory issues. I was originally running on a 1GB machine, but started having both CPU and RAM issues when I added ~100 nodes, so I upped it. I do have fairly high disc I/O; I had to reduce the update frequency, going from a 10s to a 30s interval I think.

vmstat during this shows free memory (free+buf+cache) going down to ~100MB, and after I kill headscale it goes back up to 3.6GB free. During that time it was doing super heavy "blocks in" and I/O wait CPU time was ~80%, so heavy disc activity, heavy reads, heavy memory use.

Reminder: I was running 0.21 in 2GB on this system, installed 0.22.1 and restarted headscale, and started getting OOMs. I doubled the RAM and was still getting OOMs. I switched back to 0.21, have now been running for several hours, and have 3GB free, 470M in buff/cache, and 396MB "used".

Seems to point to 0.22.1 having dramatically higher memory use.
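
To pin the memory on the headscale process itself (rather than on page cache), something like the following can run alongside vmstat. This is only a sketch; it assumes headscale runs under systemd as headscale.service, so adjust the unit name to your setup.

    # Confirm the OOM killer fired and watch headscale's resident memory.
    dmesg -T | grep -i 'out of memory'              # kernel OOM-kill messages
    journalctl -u headscale.service --since "-1h"   # headscale's own logs around the incident
    while true; do
        ps -o pid,rss,vsz,comm -C headscale         # RSS/VSZ (KB) of the headscale process
        systemctl show headscale.service -p MemoryCurrent
        sleep 10
    done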


@loprima-l commented on GitHub (Apr 28, 2023):

Have you successfully rolled your system back to the previous version? I think that's the better option for now.

I think your issue is related to another issue that we chose not to fix yet.
Fixing those performance issues means a lot to me, since big environments make it easier to find bugs, but it can't be our priority. I'm going to look into it as soon as possible.


@loprima-l commented on GitHub (Apr 28, 2023):

Also, can you tell us a bit more about your Headscale instance: why are you using Headscale, and who are your users? Is it a prod environment? Etc.

I'm interested to know what kinds of large infrastructure are using Headscale.


@loprima-l commented on GitHub (Apr 29, 2023):

Hi, I think you should give #1377 a try if you have a bunch of ACLs, because with 100+ machines you probably have a lot of them.
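
If you want to test the patch before a release, it can be fetched straight from the pull request ref and built locally. A sketch, assuming a Go toolchain is available:

    # Fetch and build the patch from PR #1377 (GitHub exposes pull/<id>/head refs).
    git clone https://github.com/juanfont/headscale.git
    cd headscale
    git fetch origin pull/1377/head:pr-1377
    git checkout pr-1377
    go build -o headscale ./cmd/headscale     # binary lands at ./headscale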


@linsomniac commented on GitHub (Apr 29, 2023):

Yes, I have successfully returned to 0.21; I just had to wait for the OOM killer to make the system responsive enough to get a window to stop headscale and revert.

Why am I using headscale? I couldn't get buy-in to purchase Tailscale.

Size of ACLs: I have 3 groups, 5 subnets, 22 ACL rules, my entire acls.yaml is ~170 lines.

"headscale node list | wc" is 115 lines.

My environment is dev, staging, and production, mostly virtual machines and some AWS EC2 instances, mostly Linux. I deployed tailscale to all the dev/stg instances and a handful of production instances (mostly administrative things, plus the firewalls as subnet routers). The users are primarily me and one of the other operations people; I'm still in proof-of-concept mode. The longer-term plan would be to bring on the ~8 developers and maybe a couple of QA people, maybe up to 10 more.
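
For anyone comparing against their own deployment, the sizing numbers above can be gathered with something like this. It's a sketch: the acls.yaml path is whatever your config points at, and the grep is only a rough rule count.

    headscale node list | wc -l                    # node table rows (includes header lines)
    wc -l /etc/headscale/acls.yaml                 # total policy lines
    grep -c 'action:' /etc/headscale/acls.yaml     # approximate number of ACL rules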


@linsomniac commented on GitHub (Apr 29, 2023):

I've switched my EC2 instance to a t3a.xlarge with 16GB of RAM, restarted headscale with 0.22.1, and watched free memory dip down to 2GB before gradually returning to 14GB. Here's a sampling of vmstat output during this run:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 15365232  50972 477732    0    0 23264   100 2989 3572  3  3 86  7  1
 3  0      0 14937296  51044 478832    0    0    80   380 6240 2122 30  4 63  1  2
 4  0      0 13252048  51164 479000    0    0     0    88 77691 1149 85 15  0  0  0
 4  0      0 10792780  51328 479028    0    0     0     0 66285 2253 87 12  0  0  0
 4  0      0 8397076  51516 479520    0    0     0     0 55892  977 92  8  0  0  0
 4  0      0 6178920  51604 479560    0    0     0    80 49133  925 92  8  0  0  0
 4  0      0 3857496  51776 479688    0    0     0    28 82437 1706 86 13  1  0  0
 4  0      0 2100900  52272 479644    0    0     0   696 2430 1715 97  2  0  1  0
 3  1      0 3500568  53408 480084    0    0     0   376 3453 1316 98  1  0  1  0
 5  0      0 4365412  53964 480512    0    0     0   192 4178 1425 97  3  0  0  0
 4  0      0 5963896  54880 480664    0    0     0   416 2551 1712 97  2  1  0  0
 4  0      0 6821424  55412 480696    0    0     0   392 3425 1373 98  2  0  1  0
 5  0      0 12377728  56956 480728    0    0     0  1944 7442 4131 83  3 11  3  0
 1  0      0 14483708  58000 480820    0    0     0   348 3359 1662 36  3 59  1  2
 0  0      0 14493176  58372 480868    0    0     0   180 1142 1550  3  1 95  0  0

Looks like it does that every time I restart it (I was wondering if it was one-time housekeeping).

Maybe it's some combination of 110-ish hosts and 20-ish ACLs? But something changed between 0.21, which I've been able to run successfully in 2GB of RAM, and 0.22.1, which requires ~14GB.
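
To capture that spike repeatably across restarts, something like this works as a rough harness. It's a sketch, assuming headscale runs as a systemd unit named headscale.service and that ten minutes is enough for the spike to play out.

    # Log memory behaviour across a headscale restart.
    vmstat -t 5 > vmstat-restart.log &    # -t adds a timestamp column
    VMSTAT_PID=$!
    sudo systemctl restart headscale.service
    sleep 600                             # let the post-restart spike settle
    kill "$VMSTAT_PID"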


@loprima-l commented on GitHub (Apr 30, 2023):

Thanks for your reply, have you tried the patch in #1377?


@linsomniac commented on GitHub (May 1, 2023):

I haven't; I'll probably give it a try this evening.


@linsomniac commented on GitHub (May 2, 2023):

It looks like #1377 has been merged into main, so I grabbed that and built it, and it does indeed seem to have solved the memory issue.


@loprima-l commented on GitHub (May 2, 2023):

Super! Is the performance better or worse than on 0.21?


@linsomniac commented on GitHub (May 2, 2023):

I only ran it a little bit, but performance seemed similar to 0.21. I really didn't do much testing of it. I had kind of a janky build, built against libraries in /nix, and I decided to go back to running 0.21 for the moment, until the next release comes out. I couldn't seem to get the build to work, or at least couldn't find the resulting binary; when I did "go build", it wasn't writing to ~/go/bin like I was expecting.
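
For what it's worth, "go build" never writes to ~/go/bin; it drops the binary in the current directory (or wherever -o points), and "go install" is the command that populates $GOBIN (default ~/go/bin). A minimal sketch, assuming a checkout of the headscale repo:

    cd headscale                              # checkout of github.com/juanfont/headscale
    go build -o headscale ./cmd/headscale     # binary: ./headscale
    go install ./cmd/headscale                # binary: ~/go/bin/headscale (or $GOBIN)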


@kradalby commented on GitHub (May 10, 2023):

We will release with #1377 in a bit; please test that and reopen if it is still an issue.
