Ephemeral nodes no longer correctly expire #631

Closed
opened 2025-12-29 02:21:24 +01:00 by adam · 3 comments
Owner

Originally created by @dustinblackman on GitHub (Feb 9, 2024).

Bug description

Related to https://github.com/juanfont/headscale/pull/1701

https://github.com/juanfont/headscale/pull/1701 introduced a bug where ephemeral nodes no longer expire correctly. Once a node reaches its intended expiry, it is correctly marked as expired in the TUI, but the node remains in the node list, where the expired_at timestamp continues to increase every 5 seconds. It also continues to allocate an IP for itself even though no inbound connections are accepted.

The following screenshot shows 3 ephemeral nodes that have all expired. Pay close attention to the timestamp in the top left corner: I run the command every ~5 seconds, and expired_at continues to increase each time. My theory is that an event isn't being fired or consumed correctly since the DB transaction changes.

[Image: Screenshot 2024-02-08 at 9 56 18 PM]

I'm hoping to continue debugging to see if I can find a solution in the upcoming week, but would love an assist as I'm still unfamiliar with the code base.

Environment

  • Version of headscale used: 83769ba715
  • Version of tailscale client: v1.58.2
  • OS (e.g. Linux, Mac, Cygwin, WSL, etc.): Debian bookworm

To Reproduce

The following branch of mine (https://github.com/dustinblackman/headscale/compare/main...ephemeral-debug) contains a docker-compose file and several scripts that spawn a full headscale cluster locally, adding 2 normal nodes, 3 ephemeral nodes, and an nginx server to test connections. The headscale container has live reload that monitors changes on your local file system, making development and testing very quick.

Due to https://github.com/juanfont/headscale/issues/1711, nodes authorized by auth keys have no expiry set, so the branch also includes a patch to expire ephemeral nodes after 30 seconds to assist with debugging.

Run the following to spawn the cluster locally.

git clone https://github.com/dustinblackman/headscale.git
cd headscale
git checkout ephemeral-debug
docker compose -f local-cluster.docker-compose.yml up

In a separate terminal once Headscale is running, you can enter the Headscale container with the following to access the CLI.

docker compose -f local-cluster.docker-compose.yml exec headscale /bin/bash 
headscale node list
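
To watch expired_at change between runs, as described in the bug description, the listing can be repeated on a timer. The following is a minimal sketch, not something from the branch: `poll_cmd` is a hypothetical helper name, and the 5-second interval simply mirrors the manual re-runs mentioned above.

```shell
# poll_cmd COUNT INTERVAL CMD [ARGS...]
# Runs CMD COUNT times, INTERVAL seconds apart, printing a wall-clock
# timestamp before each run so successive outputs can be compared.
# poll_cmd is a hypothetical helper, not part of headscale.
poll_cmd() {
  count=$1
  interval=$2
  shift 2
  i=0
  while [ "$i" -lt "$count" ]; do
    date '+%H:%M:%S'
    "$@"
    i=$((i + 1))
    # Skip the pause after the final run.
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
}
```

Inside the headscale container, `poll_cmd 12 5 headscale node list` would print the node list 12 times at 5-second intervals, roughly a minute in total.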

Additionally, you can enter any of the nodes and run tailscale commands to check connections:

docker ps | grep node
# Select a container ID from the above list
docker exec -it CONTAINER-ID /bin/bash
tailscale status

To reset the environment and start over while keeping the Go build cache:

docker compose -f local-cluster.docker-compose.yml kill
docker compose -f local-cluster.docker-compose.yml rm
docker volume ls | grep headscale | grep -v go | awk '{print $2}' | xargs -L1 docker volume rm
docker compose -f local-cluster.docker-compose.yml up
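
Before deleting anything, it may help to preview which volumes the pipeline above selects. The sketch below wraps the same filter in a function; `select_volumes` is a hypothetical name used only for illustration. Note that `grep -v go` drops any line containing "go", so it would also skip a volume whose name merely includes those letters.

```shell
# select_volumes reads `docker volume ls` output on stdin and prints the
# names the reset pipeline would delete: volumes whose line mentions
# "headscale" but not "go" (so the Go build cache is kept).
# select_volumes is a hypothetical helper name used for illustration.
select_volumes() {
  grep headscale | grep -v go | awk '{print $2}'
}
```

Running `docker volume ls | select_volumes` prints the candidates without removing them; appending `| xargs -L1 docker volume rm` performs the same deletion as the original command.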

Side note: The docker-compose cluster setup is what I consider the magic sauce to debugging systems like Headscale locally in a production-like environment. If you find this useful, I'd be happy to open up a PR with docs for future developers. :)

adam added the bug label 2025-12-29 02:21:24 +01:00
adam closed this issue 2025-12-29 02:21:24 +01:00
Author
Owner

@dustinblackman commented on GitHub (Feb 9, 2024):

cc @kradalby

Author
Owner

@dustinblackman commented on GitHub (Feb 9, 2024):

Going to close this. I think I misunderstood the feature set around ephemeral nodes, which made me consider this a bug. Once ephemeral nodes disconnect, they disappear as expected. Forcing them off the network is a manual intervention, which makes sense.

Sorry for the noise!

Author
Owner

@kradalby commented on GitHub (Feb 9, 2024):

@dustinblackman no problem. I didn't get around to looking into this today, but I wanted to say that I really appreciate the detailed report!

Reference: starred/headscale#631