[Bug] lastSeen not updating, causing ephemeral nodes to be incorrectly removed #739

Closed
opened 2025-12-29 02:23:07 +01:00 by adam · 10 comments

Originally created by @samcday on GitHub (Jul 12, 2024).

Is this a support request?

  • This is not a support request

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

This is kinda like #1725 ... but completely the opposite. Instead of ephemeral nodes never being expired, ephemeral nodes are always being expired on v0.23.0-alpha12.

Most of the nodes in my tailnet do not seem to be updating their `lastSeen`:

/ # headscale node list
{"level":"warn","time":1720770108,"message":"An updated version of Headscale has been found (0.23.0-alpha9 vs. your current v0.23.0-alpha12). Check it out https://github.com/juanfont/headscale/releases\n"}
{"level":"debug","timeout":5000,"time":1720770108,"message":"Setting timeout"}
{"level":"debug","socket":"/var/run/headscale/headscale.sock","time":1720770108,"message":"HEADSCALE_CLI_ADDRESS environment is not set, connecting to unix socket."}
ID | Hostname              | Name                  | MachineKey | NodeKey | User | IP addresses                                             | Ephemeral | Last seen           | Expiration          | Connected | Expired
1  | sam-desktop           | sam-desktop           | [eVvnq]    | [rEeDH] | sam  | 100.80.252.223, fd7a:115c:a1e0:36ae:ea56:d47f:5599:bf8c  | false     | 2024-07-11 20:48:58 | 0001-01-01 00:00:00 | online    | no
2  | sam-laptop            | sam-laptop            | [Z+WQN]    | [90YXd] | sam  | 100.117.81.112, fd7a:115c:a1e0:85d2:1875:8134:5cfc:9e2f  | false     | 2024-07-08 19:02:32 | 0001-01-01 00:00:00 | offline   | no
3  | localhost             | sam-pixel             | [2sO6K]    | [RPjmd] | sam  | 100.120.152.22, fd7a:115c:a1e0:3d98:1040:6cef:119a:18d9  | false     | 2024-07-12 07:38:23 | 0001-01-01 00:00:00 | online    | no
5  | sam-deskwart          | sam-deskwart          | [jHH/I]    | [UwXt7] | sam  | 100.83.11.59, fd7a:115c:a1e0:f6a5:86cb:f39:3cc3:4a4      | false     | 2024-07-12 07:39:27 | 0001-01-01 00:00:00 | online    | no
6  | sam-deck              | sam-deck              | [rDf+m]    | [saBU8] | sam  | 100.87.8.109, fd7a:115c:a1e0:b6f8:a2ec:de6b:5887:c52a    | false     | 2024-07-07 15:20:28 | 0001-01-01 00:00:00 | offline   | no
8  | home-cluster-router   | home-cluster-router   | [wv7AT]    | [+LpMc] | sam  | 100.120.168.113, fd7a:115c:a1e0:8fbf:6e88:a628:fb2d:6968 | false     | 2024-07-10 22:03:33 | 0001-01-01 00:00:00 | online    | no
18 | internal-net-router-0 | internal-net-router-0 | [Brxps]    | [YWv0x] | sam  | 100.122.125.77, fd7a:115c:a1e0:2783:36c7:3f68:3856:6ef   | false     | 2024-07-11 21:33:48 | 0001-01-01 00:00:00 | online    | no

/ # date
Fri Jul 12 07:41:49 UTC 2024

(note the online status of many of the nodes, and the stale `lastSeen` compared to the current `date`)

Because `lastSeen` isn't being updated, `DeleteExpiredEphemeralNodes` deletes ephemeral nodes `h.cfg.EphemeralNodeInactivityTimeout` (in my case the config-example default of 30 minutes) after they join the tailnet, even if they're still actively connected:

2024-07-12 08:35:32.895	{"level":"info","node":"pool1-1d0189563e8df30a","time":1720766132,"message":"Ephemeral client removed from database"}
2024-07-12 08:35:32.949	{"level":"error","caller":"/home/runner/work/headscale/headscale/hscontrol/poll.go:705","readOnly":false,"omitPeers":false,"stream":true,"node.id":27,"node":"pool1-1d0189563e8df30a","error":"record not found","time":1720766132,"message":"Could not get machine from db"}
2024-07-12 08:35:32.958	2024/07/12 06:35:32 http2: panic serving 172.30.3.219:33220: runtime error: invalid memory address or nil pointer dereference
2024-07-12 08:35:32.958	goroutine 5314 [running]:
2024-07-12 08:35:32.958	golang.org/x/net/http2.(*serverConn).runHandler.func1()
2024-07-12 08:35:32.958		/home/runner/go/pkg/mod/golang.org/x/net@v0.25.0/http2/server.go:2363 +0x145
2024-07-12 08:35:32.958	panic({0x1cec9e0?, 0x3372bf0?})
2024-07-12 08:35:32.958		/nix/store/6bvndddvxaypc42x6x4ari20gv3vfdgd-go-1.22.2/share/go/src/runtime/panic.go:770 +0x132
2024-07-12 08:35:32.958	github.com/juanfont/headscale/hscontrol.(*mapSession).serveLongPoll.func1()
2024-07-12 08:35:32.959		/home/runner/work/headscale/headscale/hscontrol/poll.go:194 +0x90
2024-07-12 08:35:32.959	github.com/juanfont/headscale/hscontrol.(*mapSession).serveLongPoll(0xc000002480)
2024-07-12 08:35:32.959		/home/runner/work/headscale/headscale/hscontrol/poll.go:277 +0x1214
2024-07-12 08:35:32.959	github.com/juanfont/headscale/hscontrol.(*noiseServer).NoisePollNetMapHandler(0xc00077ddd0, {0x23790f0, 0xc0005a8200}, 0xc000a1b680)
2024-07-12 08:35:32.959		/home/runner/work/headscale/headscale/hscontrol/noise.go:240 +0x365
2024-07-12 08:35:32.959	net/http.HandlerFunc.ServeHTTP(0x7efcc365a108?, {0x23790f0?, 0xc0005a8200?}, 0xd44d80?)
2024-07-12 08:35:32.959		/nix/store/6bvndddvxaypc42x6x4ari20gv3vfdgd-go-1.22.2/share/go/src/net/http/server.go:2166 +0x29
2024-07-12 08:35:32.959	github.com/juanfont/headscale/hscontrol.prometheusMiddleware.func1({0x23790f0, 0xc0005a8200}, 0xc000a1b680)
2024-07-12 08:35:32.959		/home/runner/work/headscale/headscale/hscontrol/metrics.go:87 +0x143
2024-07-12 08:35:32.959	net/http.HandlerFunc.ServeHTTP(0xc000a1b560?, {0x23790f0?, 0xc0005a8200?}, 0x7efc7c7042c8?)
2024-07-12 08:35:32.959		/nix/store/6bvndddvxaypc42x6x4ari20gv3vfdgd-go-1.22.2/share/go/src/net/http/server.go:2166 +0x29
2024-07-12 08:35:32.959	github.com/gorilla/mux.(*Router).ServeHTTP(0xc0003d26c0, {0x23790f0, 0xc0005a8200}, 0xc0008b77a0)
2024-07-12 08:35:32.959		/home/runner/go/pkg/mod/github.com/gorilla/mux@v1.8.1/mux.go:212 +0x1e2
2024-07-12 08:35:32.959	golang.org/x/net/http2.(*serverConn).runHandler(0x44973d?, 0xc000c62910?, 0x237bb40?, 0xc000c62b40?)
2024-07-12 08:35:32.959		/home/runner/go/pkg/mod/golang.org/x/net@v0.25.0/http2/server.go:2370 +0xbb
2024-07-12 08:35:32.959	created by golang.org/x/net/http2.(*serverConn).scheduleHandler in goroutine 5259
2024-07-12 08:35:32.959		/home/runner/go/pkg/mod/golang.org/x/net@v0.25.0/http2/server.go:2305 +0x21d
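
In other words, the cleanup described above boils down to a check of roughly this shape. This is a minimal, self-contained sketch with made-up types, not headscale's actual code; it only illustrates why a stale `lastSeen` dooms every ephemeral node about `EphemeralNodeInactivityTimeout` after it registers, whether or not it is still connected:

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical, simplified node record; headscale's real model differs.
type node struct {
	Ephemeral bool
	LastSeen  *time.Time
}

// shouldDeleteEphemeral mirrors the behaviour described above: an ephemeral
// node is flagged for deletion once now - lastSeen exceeds the inactivity
// timeout, with no regard for whether the node is still connected.
func shouldDeleteEphemeral(n node, now time.Time, timeout time.Duration) bool {
	return n.Ephemeral && n.LastSeen != nil && now.Sub(*n.LastSeen) > timeout
}

func main() {
	joined := time.Now().Add(-35 * time.Minute) // lastSeen was written once, at registration
	n := node{Ephemeral: true, LastSeen: &joined}

	// Prints "true": the node is reaped even if it is still online.
	fmt.Println(shouldDeleteEphemeral(n, time.Now(), 30*time.Minute))
}
```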

Expected Behavior

`lastSeen` is continually refreshed for nodes that are actively connected to the tailnet control plane. Or at the very least, `lastSeen` is guaranteed to be refreshed at least once every `ephemeral_node_inactivity_timeout / 2` or so.

Ephemeral nodes don't get deleted while they're still online and active. That is, I think there are two solutions to this bug: 1) make sure `lastSeen` is updated, 2) ignore `lastSeen` during the expiration check IF `node.IsOnline`.
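
A rough sketch of option 2, with hypothetical types and field names (headscale's real node model differs): skip the `lastSeen` comparison entirely while the node is known to be connected.

```go
package sketch

import "time"

// nodeState is an invented stand-in for the real node record.
type nodeState struct {
	Ephemeral bool
	IsOnline  bool
	LastSeen  *time.Time
}

// ephemeralExpired implements option 2 from above: an online node is never
// considered expired, regardless of how stale its lastSeen is.
func ephemeralExpired(n nodeState, now time.Time, timeout time.Duration) bool {
	if !n.Ephemeral || n.IsOnline {
		return false // never reap a node that is actively connected
	}
	return n.LastSeen != nil && now.Sub(*n.LastSeen) > timeout
}
```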

Steps To Reproduce

cat config-example.yaml | sed 's/^listen_addr:.*/listen_addr: 0.0.0.0:8080/' | sed 's/ephemeral_node_inactivity_timeout:.*/ephemeral_node_inactivity_timeout: 1m6s/'  > config-2006.yaml

docker run --name demo-2006 -p 8080:8080 -v `pwd`/config-2006.yaml:/etc/headscale/config.yaml headscale/headscale:v0.23.0-alpha12-debug serve

docker exec -it demo-2006 headscale user create sam

key=$(docker exec -it demo-2006 headscale pre create --ephemeral -u sam | tail -1)
sudo tailscale login --login-server=http://localhost:8080 --authkey=$key

docker exec -it demo-2006 headscale node list

sleep 66s

docker exec -it demo-2006 headscale node list

Environment

- OS: Container (running in Kube 1.30 on CoreOS-stable nodes)
- Headscale version: v0.23.0-alpha12
- Tailscale version: 1.68.2

Runtime environment

  • Headscale is behind a (reverse) proxy
  • Headscale runs in a container

Anything else?

No response

adam added the bug label 2025-12-29 02:23:07 +01:00
adam closed this issue 2025-12-29 02:23:07 +01:00

@samcday commented on GitHub (Jul 12, 2024):

This seems related: https://github.com/juanfont/headscale/blob/main/hscontrol/mapper/tail.go#L147-L149

I'm testing this commit in my tailnet setup at present: https://github.com/samcday/headscale/commit/d90d8d20c0e9e0291f722316f61d946cc2ea1ebe

@kradalby commented on GitHub (Jul 12, 2024):

Hi! This seems to be a result of us no longer using the `lastSeen` field for a bunch of things after we actually read its docs: https://github.com/tailscale/tailscale/blob/main/tailcfg/tailcfg.go#L333-L337.

I've written a test to confirm your issue, and I'm looking at rewriting the delete logic so it doesn't depend on that field and instead looks at when the node was disconnected.
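
To illustrate the direction sketched here (names invented for illustration; this is not the code from the eventual fix): record when a node's long-poll session ends and expire ephemeral nodes relative to that timestamp, so a node with an open session can never be counted as inactive.

```go
package sketch

import "time"

// ephemeralSession is a hypothetical stand-in for per-node connection state.
type ephemeralSession struct {
	DisconnectedAt *time.Time // nil while the node holds an open map session
}

// inactiveLongEnough only fires for nodes that actually disconnected and have
// stayed away longer than the configured inactivity timeout.
func inactiveLongEnough(s ephemeralSession, now time.Time, timeout time.Duration) bool {
	return s.DisconnectedAt != nil && now.Sub(*s.DisconnectedAt) > timeout
}
```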

@samcday commented on GitHub (Jul 12, 2024):

I've patched a fix for my deployment with this commit: https://github.com/juanfont/headscale/commit/3f0dfca6f474054b40ee543434c23ea7a4ca37a7

LMK if that looks like something you'd be interested in adopting. If so I'll raise a PR. Otherwise I eagerly await the rewrite+fix :D

@kradalby commented on GitHub (Jul 12, 2024):

@samcday I'm not completely done with tests yet, but could you give https://github.com/juanfont/headscale/pull/2008 a go?

@kradalby commented on GitHub (Jul 12, 2024):

> I've patched a fix for my deployment with this commit: [3f0dfca](https://github.com/juanfont/headscale/commit/3f0dfca6f474054b40ee543434c23ea7a4ca37a7)
>
> LMK if that looks like something you'd be interested in adopting. If so I'll raise a PR. Otherwise I eagerly await the rewrite+fix :D

This also seems sensible. I figured I'd have a go at getting rid of those loops and using expiries a bit more cleverly. Let's see if I was too clever.

@samcday commented on GitHub (Jul 12, 2024):

> @samcday I'm not completely done with tests yet, but could you give #2008 a go?

Cool 👍 I'll check it out this evening or tomorrow.

FWIW, I would have tested it straight away if the PR machinery were to cut images that I could test (e.g. a `ghcr.io/juanfont/headscale/unstable-dev-pr-2008` or somesuch).

Relatedly, with minimal hax I was able to get the headscale goreleaser CI to run in my fork and easily test images built and pushed to [ghcr.io/samcday](https://github.com/samcday/headscale/pkgs/container/headscale). It's great to see the CI machinery in such great shape in this project. 10/10 would fork again

@kradalby commented on GitHub (Jul 12, 2024):

> > @samcday I'm not completely done with tests yet, but could you give #2008 a go?
>
> Cool 👍 I'll check it out this evening or tomorrow.

Great, thanks, I'll write more tests in the mean time.

> FWIW, I would have tested it straight away if the PR machinery were to cut images that I could test (e.g. a `ghcr.io/juanfont/headscale/unstable-dev-pr-2008` or somesuch).

That would be neat; happy to take contributions for that one, or an issue. Maybe @ohdearaugustin could have a look.

> Relatedly, with minimal hax I was able to get the headscale goreleaser CI to run in my fork and easily test images built and pushed to ghcr.io/samcday. It's great to see CI machinery in such great shape in this project. 10/10 would fork again

Thanks! Great to hear.

@samcday commented on GitHub (Jul 15, 2024):

I applied #2008 to my local deployment this morning. This tailnet has a handful of ephemeral (autoscaling) devices. I'll report back in a day or two after the changes have soaked a bit!

@kradalby commented on GitHub (Jul 17, 2024):

Hey @samcday, just checking in: have you seen any strange behaviour before we get it reviewed and merged?

@samcday commented on GitHub (Jul 17, 2024):

I've been seeing some weirdness when I try to run a pair of HA subnet routers with an ephemeral preauth key. It seems that every time I come back to examine that setup later, one of the pods has died and been expired out. I haven't dug into that further though (I just shrugged and scaled back down to 1 replica for now).

The original issue was biting me more severely because I have autoscaling cloud nodes (using an ephemeral key) that couldn't reach their control plane in the tailnet. I can confirm that with #2008 running for the last couple of days, I've had none of those nodes getting unceremoniously yeeted from the tailnet 🎉
