mirror of
https://github.com/juanfont/headscale.git
synced 2026-01-11 20:00:28 +01:00
Exponential CPU usage from allowed peer checks #381
Closed
opened 2025-12-29 01:27:57 +01:00 by adam · 9 comments
Labels: bug
Reference: starred/headscale#381
Originally created by @jblackwood-fes on GitHub (Nov 24, 2022).
Bug description
CPU usage grows exponentially as the number of peers grows, to the point where headscale cannot respond to updates fast enough for clients to remain connected.
This appears to be due to recalculating the allowed peers on every update, which is an O(n) operation for n peers. The allowed peer list should be static except when new peers are added, so updating the peer list once for each new peer would be a huge performance win.
Enabling ACLs makes this worse because there is more work per peer to check whether it is allowed, but the namespace-only checks eventually cause performance issues too, with thousands of peers.
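To illustrate why ACLs add work per peer, here is a minimal Go sketch contrasting a namespace-only check with a scan over policy rules. Node, Rule, namespaceAllowed and aclAllowed are hypothetical names, not headscale's code; real ACL policies in the Tailscale format are considerably richer (groups, tags, hosts, ports), and this sketch only matches namespaces. It also assumes a pair of peers is visible if either direction is allowed.

```go
package main

import "fmt"

// Node is a hypothetical, stripped-down node record used only for this sketch.
type Node struct {
	Name      string
	Namespace string
}

// Rule is a hypothetical, simplified ACL rule: nodes in any Src namespace
// may reach nodes in any Dst namespace.
type Rule struct {
	Src []string
	Dst []string
}

// namespaceAllowed is the namespace-only check: one comparison per pair.
func namespaceAllowed(a, b Node) bool {
	return a.Namespace == b.Namespace
}

// contains reports whether s appears in list.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// aclAllowed scans every rule for each pair, so its cost per pair grows with
// the size of the policy. A pair counts as visible if either direction is
// allowed (an assumption made for this sketch).
func aclAllowed(a, b Node, rules []Rule) bool {
	for _, r := range rules {
		if contains(r.Src, a.Namespace) && contains(r.Dst, b.Namespace) {
			return true
		}
		if contains(r.Src, b.Namespace) && contains(r.Dst, a.Namespace) {
			return true
		}
	}
	return false
}

func main() {
	dev := Node{Name: "laptop", Namespace: "dev"}
	prod := Node{Name: "db", Namespace: "prod"}
	lab := Node{Name: "pi", Namespace: "lab"}
	rules := []Rule{{Src: []string{"dev"}, Dst: []string{"prod"}}}

	fmt.Println(namespaceAllowed(dev, prod))  // different namespaces
	fmt.Println(aclAllowed(dev, prod, rules)) // matched by the dev -> prod rule
	fmt.Println(aclAllowed(prod, lab, rules)) // no rule mentions lab
}
```

The namespace check is a single string comparison per pair, while the rule scan costs time proportional to the policy size; both costs are then multiplied by the number of pairs checked on each update.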
To Reproduce
Create a network with 400-600 peers. The exact number at which the performance curve becomes a problem depends on the system specs, but on a 4-core server 600 peers is usually enough to overwhelm the system.
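A back-of-the-envelope cost model helps explain why a few hundred peers is where things tip over. This Go sketch assumes that each update round delivers one map update to every node, and that rebuilding a node's peer list checks all other nodes. Under that assumption a full recompute costs n(n-1) checks per round, which is quadratic in the number of peers (doubling from 200 to 400 peers roughly quadruples the work). The incremental alternative from the bug description only checks a joining node against the existing ones. The function names are hypothetical.

```go
package main

import "fmt"

// recomputeChecks models rebuilding every node's peer list on every update
// round: each of the n nodes checks the other n-1 nodes, every round.
func recomputeChecks(n, rounds int) int {
	return rounds * n * (n - 1)
}

// incrementalChecks models updating peer lists only when a node joins:
// when a node joins and k nodes are already present, it checks the k
// existing nodes and each existing node checks it, i.e. 2k checks.
// Update rounds without membership changes cost no checks at all.
func incrementalChecks(n int) int {
	total := 0
	for k := 0; k < n; k++ {
		total += 2 * k
	}
	return total
}

func main() {
	for _, n := range []int{200, 400, 600} {
		fmt.Printf("n=%d  recompute(100 rounds)=%d  incremental=%d\n",
			n, recomputeChecks(n, 100), incrementalChecks(n))
	}
}
```

The incremental total works out to the same n(n-1) as a single full build, but it is paid once as nodes join instead of on every round.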
Context info
@rjmalagon commented on GitHub (Nov 26, 2022):
Even 200 peers exceeds a healthy CPU quota.
@kradalby commented on GitHub (Nov 29, 2022):
While we are flattered that people use this for larger installations, our current scope is probably homelabs/small teams, and performance work will come after correctness. We will of course keep this issue around, but I think it is worth clarifying that this isn't really a "bug", as we have not attempted to make things efficient, just "correct".
@jblackwood-fes commented on GitHub (Dec 2, 2022):
I think keeping performance in mind helps to make sure the design can grow/scale.
I've done some testing, and caching peer lists until they need to change (new peers, or not yet loaded) can make a huge difference in performance. My code's a bit hack-ish, but I'm happy to share it with someone as a reference for a better fix.
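The caching idea described above (build a peer list when it is first needed, keep it until membership changes) can be sketched in Go as follows. This is not the commenter's code, which is not shown in the thread. PeerCache, AllowFunc, AddNode and Invalidate are hypothetical names, and the allow function stands in for whatever namespace or ACL check the server performs.

```go
package main

import (
	"fmt"
	"sync"
)

// AllowFunc reports whether node src may see node dst. It stands in for the
// namespace or ACL check; the signature is hypothetical.
type AllowFunc func(src, dst string) bool

// PeerCache memoizes each node's allowed-peer list. A list is built lazily on
// first request, and all lists are dropped when membership or policy changes.
type PeerCache struct {
	mu     sync.Mutex
	nodes  []string
	allow  AllowFunc
	lists  map[string][]string
	Checks int // allow() calls so far; read only while no other goroutine uses the cache
}

func NewPeerCache(nodes []string, allow AllowFunc) *PeerCache {
	return &PeerCache{
		nodes: append([]string(nil), nodes...),
		allow: allow,
		lists: make(map[string][]string),
	}
}

// Peers returns node's allowed peers, computing them only if not cached.
// The returned slice is shared with the cache and must not be modified.
func (c *PeerCache) Peers(node string) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if list, ok := c.lists[node]; ok {
		return list
	}
	var list []string
	for _, other := range c.nodes {
		if other == node {
			continue
		}
		c.Checks++
		if c.allow(node, other) {
			list = append(list, other)
		}
	}
	c.lists[node] = list
	return list
}

// AddNode registers a new node. Any existing node may now be allowed to see
// it, so every cached list is invalidated.
func (c *PeerCache) AddNode(node string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.nodes = append(c.nodes, node)
	c.lists = make(map[string][]string)
}

// Invalidate drops all cached lists, e.g. after a (hypothetical) policy reload.
func (c *PeerCache) Invalidate() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.lists = make(map[string][]string)
}

func main() {
	allowAll := func(src, dst string) bool { return true }
	cache := NewPeerCache([]string{"a", "b", "c"}, allowAll)

	cache.Peers("a") // computed: 2 checks
	cache.Peers("a") // served from the cache: no checks
	cache.AddNode("d")
	fmt.Println(cache.Peers("a"), cache.Checks) // recomputed after the join
}
```

Dropping every list on a join is the simplest correct invalidation; a finer-grained version would only check the new node against the existing ones, as suggested in the bug description.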
@magkopian commented on GitHub (Jan 22, 2023):
Just wanted to say that we are experiencing the same issue, and we have around 150 devices on our Tailnet. While we were at around 130 devices, Headscale was barely consuming any CPU. Now I see it fluctuating between 60% and 80%. I recently updated to v0.18.0 but haven't noticed much of a change in that regard.
With that being said, this is on a VPS with a single CPU and 512 MB of RAM. So, we can definitely add more resources to it if needed. I just thought it would be a good idea to share my own experience.
@qzydustin commented on GitHub (Mar 31, 2023):
This helps me a lot. Thank you for sharing.
@kradalby commented on GitHub (May 10, 2023):
This should be resolved in the next release.
@magkopian commented on GitHub (May 12, 2023):
I'm not sure what is going on, but I updated to 0.22.2 yesterday and the CPU usage actually jumped from the roughly 40% it had been at for weeks to close to 100%. It has stayed like that for over 16 hours so far.
Today I updated to 0.22.3 in the hope that the issue was fixed, but unfortunately nothing changed. Any guidance on how to troubleshoot this? Also, would it be safe to downgrade back to 0.22.1?
@kradalby commented on GitHub (May 12, 2023):
Running 0.22.1 should not be a problem; there have not been any database migrations.
Could you capture a profile of the CPU usage and upload it?
b01f1f1867/cmd/headscale/headscale.go (L15-L16)
@magkopian commented on GitHub (May 13, 2023):
I created a directory /var/log/headscale/profiling/ and added the following inside the headscale.service file:
However, when I tried restarting headscale, I got the following error:
Have I misunderstood what you were asking me to do?