mirror of
https://github.com/juanfont/headscale.git
synced 2026-01-11 20:00:28 +01:00
Frequent "offline" status causing subnet router re-election and connection disruptions #528
Closed
opened 2025-12-29 02:19:33 +01:00 by adam
·
11 comments
Originally created by @vsychov on GitHub (Jun 30, 2023).
Hello,
I have noticed a recurring issue where I often see console messages from headscale indicating that a machine has gone "offline", even though the machine is actually online and has no issues with its internet connection. As I am using tailscale as a subnet router, this results in re-election of the "primary route" if such a machine was being used as the "primary route", leading to connection disruptions.
It appears that the problem lies in how a machine is marked "offline", which relies on the `last_seen` field in the database. A machine goes offline when its `last_seen` value becomes 60 seconds old (`keepAliveInterval`). Therefore, a delay of even one extra second can make the machine go offline, leading to a new subnet router being elected.
It looks like the `last_seen` field is updated in `keepAliveTicker` and a few other places, which in my setup happens only every 40-60 seconds; that is not frequent enough.
From what I can see, this problem could be solved by also updating the `last_seen` field in the `updateCheckerTicker` (which by default fires every 10 seconds, `NodeUpdateCheckInterval`), simply by adding the update right after:
fe75b71620/hscontrol/poll.go (L561)
I hope this suggestion is helpful and look forward to any feedback.
Thank you
@kradalby commented on GitHub (Jul 7, 2023):
This might be fixed, or we might have the base to fix this once #1492 lands. It starts looking at the Online field and sends updates in a different way. This issue might not have been directly addressed, but it should be easier to fix.
@github-actions[bot] commented on GitHub (Dec 24, 2023):
This issue is stale because it has been open for 90 days with no activity.
@github-actions[bot] commented on GitHub (Dec 31, 2023):
This issue was closed because it has been inactive for 14 days since being marked as stale.
@andreyrd commented on GitHub (Jan 17, 2024):
This is still an active issue in the latest stable version.
Is this fixed in the latest alpha and is the latest alpha ready for use in a prod-like environment?
@kradalby commented on GitHub (Jan 19, 2024):
@andreyrd we follow common software release practices, and alpha software is not recommended for use in production. We need help testing it, so we release it under an alpha/beta label to signal that you need to be cautious when using it.
I believe the issue has been solved, but we need people who encounter the problem to test it. If you have the opportunity, that would be great.
@kradalby commented on GitHub (Feb 19, 2024):
Could you please test if this is still the case with https://github.com/juanfont/headscale/releases/tag/v0.23.0-alpha5 ?
@eNdiD commented on GitHub (Feb 27, 2024):
@kradalby with the latest 0.23.0-alpha5 there is some odd behavior. I constantly see my Android clients go offline while they continue to work fine with the tailnet, but Headscale seems to stop sending updates to them. To come back online they need to send an update themselves, for example by moving to a different network, or I have to manually restart the Tailscale connection on them.
I have seen the very same thing on my Raspberry Pi, but only once, and I'm not sure what the cause was. Other Linux clients stay online without an issue.
Update: Going offline is not instant. The Android nodes stay online for some time, on the order of hours. More interestingly, "offline" nodes may have a fairly fresh `last seen` value, like one minute ago.
Update 2: I believe it can be reproduced by switching networks, as in the following scenario:
@fortitudepub commented on GitHub (Mar 19, 2024):
I also found this issue with the 0.23.5+ version. After some investigation, I think it may be caused by the existing connection to the controller being reset (for example by switching the router or Wi-Fi, which may change the external NAT address, or for other reasons) while a new connection is established quickly. In that case, in poll.go the old connection's deferred cleanup may run after the new connection has been added, because the online status is now a map indexed by node key.
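The ordering problem suspected above can be sketched in Go as follows. This is an illustration only, not headscale's poll.go: `onlineTracker`, `disconnectNaive`, and `disconnectGuarded` are hypothetical names, and the per-connection ID guard is just one possible way to avoid the problem, not necessarily the fix that was adopted.

```go
package main

import (
	"fmt"
	"sync"
)

// onlineTracker is a hypothetical stand-in for a status map indexed by node key.
type onlineTracker struct {
	mu     sync.Mutex
	online map[string]bool   // node key -> online flag
	connID map[string]uint64 // node key -> ID of the most recent connection
}

func newOnlineTracker() *onlineTracker {
	return &onlineTracker{online: map[string]bool{}, connID: map[string]uint64{}}
}

// connect marks the node online and returns an ID for this connection.
func (t *onlineTracker) connect(nodeKey string) uint64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.connID[nodeKey]++
	t.online[nodeKey] = true
	return t.connID[nodeKey]
}

// disconnectNaive models the suspected bug: any closing connection clears the
// flag, even if a newer connection for the same node key already exists.
func (t *onlineTracker) disconnectNaive(nodeKey string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.online[nodeKey] = false
}

// disconnectGuarded clears the flag only if the closing connection is still
// the most recent one for this node key.
func (t *onlineTracker) disconnectGuarded(nodeKey string, id uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.connID[nodeKey] == id {
		t.online[nodeKey] = false
	}
}

func (t *onlineTracker) isOnline(nodeKey string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.online[nodeKey]
}

func main() {
	naive := newOnlineTracker()
	naive.connect("node-a")         // old connection
	naive.connect("node-a")         // network switch: new connection registers first
	naive.disconnectNaive("node-a") // old connection's deferred cleanup runs late
	fmt.Println("naive online:", naive.isOnline("node-a"))

	guarded := newOnlineTracker()
	oldID := guarded.connect("node-a")
	guarded.connect("node-a")
	guarded.disconnectGuarded("node-a", oldID) // stale cleanup is ignored
	fmt.Println("guarded online:", guarded.isOnline("node-a"))
}
```

In the naive variant the node ends up marked offline although its new connection is alive, which matches the symptom of "offline" nodes with a fresh last-seen value. The guarded variant keeps the node online because the stale cleanup no longer matches the current connection ID.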
@kradalby commented on GitHub (Apr 17, 2024):
Could you please try the newest alpha (https://github.com/juanfont/headscale/releases/tag/v0.23.0-alpha6) and report back?
@vsychov commented on GitHub (Apr 18, 2024):
Thanks @kradalby, I'll run tests today or tomorrow.
@kradalby commented on GitHub (May 24, 2024):
I believe the fixes in https://github.com/juanfont/headscale/releases/tag/v0.23.0-alpha12 should resolve this issue. Let me know if not and we will reopen it.