mirror of
https://github.com/juanfont/headscale.git
synced 2026-01-11 20:00:28 +01:00
[Bug] lastSeen not updating, causing ephemeral nodes to be incorrectly removed #739
Closed
opened 2025-12-29 02:23:07 +01:00 by adam · 10 comments
Originally created by @samcday on GitHub (Jul 12, 2024).
Is this a support request?
Is there an existing issue for this?
Current Behavior
This is kinda like #1725 ... but completely the opposite. Instead of ephemeral nodes never being expired, ephemeral nodes are always being expired on `v0.23.0-alpha12`.
Most of the nodes in my tailnet do not seem to be updating their `lastSeen`:
(note the online status of many of the nodes, and the stale `lastSeen` as compared to the current `date`)
Because the `lastSeen` isn't being updated, `DeleteExpiredEphemeralNodes` is deleting ephemeral nodes `h.cfg.EphemeralNodeInactivityTimeout` (in my case the config example default: 30 minutes) after they join the tailnet, even if they're still actively connected:
Expected Behavior
`lastSeen` is kept continually refreshed on nodes that are actively connected to the tailnet control plane. Or at least, the `lastSeen` is guaranteed to be refreshed at least once every `ephemeral_node_inactivity_timeout / 2` or something.
Ephemeral nodes don't get deleted when they're still online and active. That is, I think there are two solutions to this bug: 1) make sure `lastSeen` is updated, 2) ignore `lastSeen` during the expiration check IF `node.IsOnline`.
Steps To Reproduce
Environment
Runtime environment
Anything else?
No response
@samcday commented on GitHub (Jul 12, 2024):
This seems related: https://github.com/juanfont/headscale/blob/main/hscontrol/mapper/tail.go#L147-L149
I'm testing this commit in my tailnet setup at present:
d90d8d20c0
@kradalby commented on GitHub (Jul 12, 2024):
Hi! This seems to be a result of us no longer using the `lastSeen` field for a bunch of stuff after we actually read the docs for it: https://github.com/tailscale/tailscale/blob/main/tailcfg/tailcfg.go#L333-L337.
I've written a test to confirm your issue and I'm looking at rewriting the delete logic so it doesn't depend on the field and instead looks at when the node was disconnected.
@samcday commented on GitHub (Jul 12, 2024):
I've patched a fix for my deployment with this commit:
3f0dfca6f4
LMK if that looks like something you'd be interested in adopting. If so I'll raise a PR. Otherwise I eagerly await the rewrite+fix :D
@kradalby commented on GitHub (Jul 12, 2024):
@samcday I'm not completely done with tests yet, but could you give https://github.com/juanfont/headscale/pull/2008 a go?
@kradalby commented on GitHub (Jul 12, 2024):
This also seems sensible. I figured I'd try getting rid of those loops and using expiries a bit more cleverly. Let's see if I was too clever.
@samcday commented on GitHub (Jul 12, 2024):
Cool 👍 I'll check it out this evening or tomorrow.
FWIW, I would have tested it straight away if the PR machinery were to cut images that I could test (e.g. a `ghcr.io/juantfont/headscale/unstable-dev-pr-2008` or somesuch).
Relatedly, with minimal hax I was able to get the headscale goreleaser CI to run in my fork and easily test images built and pushed to ghcr.io/samcday. It's great to see CI machinery in such great shape in this project. 10/10 would fork again
@kradalby commented on GitHub (Jul 12, 2024):
Great, thanks, I'll write more tests in the mean time.
That would be neat, happy to take contributions for that one, or an issue, maybe @ohdearaugustin could have a look.
Thanks! Great to hear.
@samcday commented on GitHub (Jul 15, 2024):
I applied #2008 to my local deployment this morning. This tailnet has a handful of ephemeral (autoscaling) devices. I'll report back in a day or two after the changes have soaked a bit!
@kradalby commented on GitHub (Jul 17, 2024):
Hey @samcday, just checking in if you have seen any strange behaviour before we get it reviewed and merged?
@samcday commented on GitHub (Jul 17, 2024):
I've been seeing some weirdness when I try to run a pair of HA subnet routers with an ephemeral preauth key. It seems that every time I come back to examine that setup later, one of the pods has died and been expired out. I haven't dug into that further though (I just shrugged, and scaled back down to 1 replica for now).
The original issue was biting me more severely because I have autoscaling cloud nodes (using an ephemeral key) that couldn't reach their control plane in the tailnet. I can confirm that with #2008 running for the last couple of days I've had none of those nodes getting unceremoniously yeeted from the tailnet 🎉