[Feature] Support listening for metrics on tailscale address #740

Open
opened 2025-12-29 02:23:07 +01:00 by adam · 3 comments

Originally created by @OddBloke on GitHub (Jul 10, 2024).

Use case

I use my Headscale node's Tailscale IP address as its metrics_listen_addr so I don't have to expose the metric data to the world. Whenever the node restarts, it ends up in a deadlocked condition: the Tailscale interface can't come up because Headscale isn't up, and Headscale won't come up because the Tailscale IP address isn't available to bind to:

Error starting server error="failed to bind to TCP address: listen tcp <REDACTED>:9090: bind: cannot assign requested address"

To work around this, I currently have to manually intervene on every reboot. I edit my config.yaml to disable metrics, let Headscale start so Tailscale can come up, undo my config.yaml change and restart Headscale. (This also means that Headscale doesn't automatically recover if the node goes down unexpectedly.)

Description

Headscale should support bringing up its core functionality even if it can't bring up the metrics server. This behaviour should be non-default and configurable so that, by default, metrics failures remain obvious to users. (For my use case, this would still require a service restart to restore metrics, but Headscale would otherwise be up and listening on reboot.)

Ideally, Headscale would attempt to start the metrics server on an interval. For my use case, this would mean metrics would be restored automatically once the Tailscale interface was up.

Contribution

  • I can write the design doc for this feature
  • I can contribute this feature

How can it be implemented?

adam added the enhancement and no-stale-bot labels 2025-12-29 02:23:07 +01:00

@ArcticLampyrid commented on GitHub (Nov 19, 2024):

Workaround:
You can bind the metrics listener to a loopback address (e.g. ::1) and then set up port forwarding rules via nftables (or iptables if you use that).


@kradalby commented on GitHub (Nov 21, 2024):

Yes, this is a bit of a bootstrap problem; there isn't really a good way to solve this in a "stable" way. Having logic for retrying the metrics server seems very complicated.

I would propose that you use a subnet router to scrape your Headscale via a Tailscale client that runs on it.

A possibility which I think of occasionally, and that would be "fun", is to serve up Headscale endpoints and metrics straight onto the Headscale tailnet via tsnet (https://tailscale.com/kb/1244/tsnet). That would mean running an embedded Tailscale inside Headscale and having it auto-join the network.

Happy for someone to play around with that; I haven't had time to test it out.


@Erisa commented on GitHub (Feb 17, 2025):

To add to this, a simple workaround for this use-case is to use Tailscale Serve to proxy the request to the Tailscale IP back to 127.0.0.1 like this:

root@headscale:~# tailscale serve --bg --http=9090 http://127.0.0.1:9090
Available within your tailnet:

http://headscale.net.isolated.network:9090/
|-- proxy http://127.0.0.1:9090

Serve started and running in the background.
To disable the proxy, run: tailscale serve --http=9090 off

Then any request on headscale:9090 will proxy back to 127.0.0.1:9090 and you won't run into any worse bootstrap problems than you would have had with Headscale and Tailscale on the same server anyway.

➜  ~ curl http://headscale:9090/metrics
# HELP go_gc_duration_seconds A summary of the wall-time pause (stop-the-world) duration in garbage collection cycles.
# TYPE go_gc_duration_seconds summary
[..etc..]

The only oddity I noticed is that requesting the IP without the hostname won't work:

➜  ~ curl http://100.64.0.13:9090/metrics
404 page not found

But this is easily resolved by using a TCP listener instead:

root@headscale:~# tailscale serve --bg --tcp=9090 tcp://127.0.0.1:9090
Available within your tailnet:

https://headscale.net.isolated.network:9090
|-- tcp://headscale.net.isolated.network:9090 (TLS over TCP)
|-- tcp://100.64.0.13:9090
|-- tcp://[fd7a:115c:a1e0::d]:9090
|--> tcp://127.0.0.1:9090
Serve started and running in the background.
To disable the proxy, run: tailscale serve --tcp=9090 off

➜  ~ curl http://100.64.0.13:9090/metrics
# HELP go_gc_duration_seconds A summary of the wall-time pause (stop-the-world) duration in garbage collection cycles.
# TYPE go_gc_duration_seconds summary
[..etc..]
Reference: starred/headscale#740