[Feature] Support listening for metrics on Tailscale address #740


adam · 2025-12-29T02:23:07+01:00

adam commented

2025-12-29 02:23:07 +01:00

Originally created by @OddBloke on GitHub (Jul 10, 2024).

Use case

I use my Headscale node's Tailscale IP address as its metrics_listen_addr so I don't have to expose the metric data to the world. Whenever the node restarts, it ends up in a deadlocked condition: the Tailscale interface can't come up because Headscale isn't up, and Headscale won't come up because the Tailscale IP address isn't available to bind to:

Error starting server error="failed to bind to TCP address: listen tcp <REDACTED>:9090: bind: cannot assign requested address"

To work around this, I currently have to manually intervene on every reboot. I edit my config.yaml to disable metrics, let Headscale start so Tailscale can come up, undo my config.yaml change and restart Headscale. (This also means that Headscale doesn't automatically recover if the node goes down unexpectedly.)

Description

Headscale should support bringing up its core functionality even if it can't bring up the metrics server. This behaviour should be non-default and configurable so that, by default, metrics failures remain obvious to users. (For my use case this would still require a service restart to restore metrics, but Headscale would otherwise be up and listening on reboot.)

Ideally, Headscale would attempt to start the metrics server on an interval. For my use case, this would mean metrics would be restored automatically once the Tailscale interface was up.

Contribution

[ ] I can write the design doc for this feature
[ ] I can contribute this feature

How can it be implemented?


adam added the enhancement no-stale-bot labels 2025-12-29 02:23:07 +01:00

adam commented

2025-12-29 02:23:07 +01:00

@ArcticLampyrid commented on GitHub (Nov 19, 2024):

Workaround:
You can bind the metrics to a loopback address (e.g. ::1) and then set up port-forwarding rules via nftables (or iptables, if you use that).


adam commented

2025-12-29 02:23:07 +01:00

@kradalby commented on GitHub (Nov 21, 2024):

Yes, this is a bit of a bootstrap problem; there isn't really a good way to solve it in a "stable" way. Having logic for retrying the metrics server seems very complicated.

I would propose that you use a subnet router to scrape your Headscale via a Tailscale client running on the same host.

A possibility I think about occasionally, which would be "fun", is to serve Headscale endpoints and metrics straight onto the Headscale tailnet via tsnet (https://tailscale.com/kb/1244/tsnet), that is, running an embedded Tailscale inside Headscale and having it automatically join the network.

Happy for someone to play around with that; I have not had time to test it out.


adam commented

2025-12-29 02:23:08 +01:00

@Erisa commented on GitHub (Feb 17, 2025):

To add to this, a simple workaround for this use case is to use Tailscale Serve to proxy requests made to the Tailscale IP back to 127.0.0.1, like this:

root@headscale:~# tailscale serve --bg --http=9090 http://127.0.0.1:9090
Available within your tailnet:

http://headscale.net.isolated.network:9090/
|-- proxy http://127.0.0.1:9090

Serve started and running in the background.
To disable the proxy, run: tailscale serve --http=9090 off

Then any request on headscale:9090 will proxy back to 127.0.0.1:9090 and you won't run into any worse bootstrap problems than you would have had with Headscale and Tailscale on the same server anyway.

➜  ~ curl http://headscale:9090/metrics
# HELP go_gc_duration_seconds A summary of the wall-time pause (stop-the-world) duration in garbage collection cycles.
# TYPE go_gc_duration_seconds summary
[..etc..]

The only oddity I noticed is that requesting the IP without the hostname won't work:

➜  ~ curl http://100.64.0.13:9090/metrics
404 page not found

But this is easily resolved by using a TCP listener instead:

root@headscale:~# tailscale serve --bg --tcp=9090 tcp://127.0.0.1:9090
Available within your tailnet:

https://headscale.net.isolated.network:9090
|-- tcp://headscale.net.isolated.network:9090 (TLS over TCP)
|-- tcp://100.64.0.13:9090
|-- tcp://[fd7a:115c:a1e0::d]:9090
|--> tcp://127.0.0.1:9090
Serve started and running in the background.
To disable the proxy, run: tailscale serve --tcp=9090 off

➜  ~ curl http://100.64.0.13:9090/metrics
# HELP go_gc_duration_seconds A summary of the wall-time pause (stop-the-world) duration in garbage collection cycles.
# TYPE go_gc_duration_seconds summary
[..etc..]


adam referenced this issue

2025-12-29 03:19:43 +01:00

[PR #1400] [CLOSED] PR: Add a config to verify clients on DERP #2087

adam referenced this issue

2025-12-29 03:20:08 +01:00

[PR #1580] [CLOSED] feat(derp): support `verify-clients` #2189


Reference: starred/headscale#740