[question] high availability support? #29

Closed
opened 2025-12-29 01:20:24 +01:00 by adam · 6 comments

Originally created by @jsiebens on GitHub (Sep 1, 2021).

Seeing the number of replicas set to two in the k8s examples, I was just wondering if headscale supports such an HA setup?
What if nodes are contacting different instances of the headscale server?

adam closed this issue 2025-12-29 01:20:24 +01:00

@SilverBut commented on GitHub (Sep 2, 2021):

Not sure if the database transactions are well managed. If so, HA might be simple within the same region: just use a shared MySQL database, provided the service itself is not stateful.

But I think this is not what you mean (and not what I mean either), since deploying the service in different regions is still not possible (for example, two servers running in Russia and Japan). Maybe we should consider either supporting a distributed database (like TiDB) so state can be synced via the database, or using something like Raft or Paxos to build a cluster.
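To make the shared-database idea concrete, here is a minimal Go sketch, not headscale's actual code: the DSN, table, and query are illustrative, and headscale itself targets SQLite/PostgreSQL rather than MySQL. Several stateless replicas all point at one database, and transactions keep their concurrent writes consistent:

```go
// Sketch: stateless control-server replicas sharing one database.
// All names (DSN, table, query) are illustrative, not headscale's schema.
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // Postgres driver; headscale supports SQLite and PostgreSQL
)

func main() {
	// Every replica points at the same database, so any instance can
	// serve any request as long as no state lives in process memory.
	db, err := sql.Open("postgres", "postgres://headscale:secret@db.internal/headscale?sslmode=require")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Transactions keep concurrent replicas from clobbering each other:
	// the read-modify-write of a machine record happens atomically.
	tx, err := db.Begin()
	if err != nil {
		log.Fatal(err)
	}
	if _, err := tx.Exec(`UPDATE machines SET last_seen = now() WHERE id = $1`, 42); err != nil {
		tx.Rollback()
		log.Fatal(err)
	}
	if err := tx.Commit(); err != nil {
		log.Fatal(err)
	}
}
```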


@juanfont commented on GitHub (Sep 3, 2021):

We need to think a bit about it. It is not trivial with the current architecture, as a TCP connection is opened from each client to the server and kept alive.

This connection is used for keepalives and for sending network map updates to the client. Should we have more than one server instance, we would need a mechanism for cross-headscale communication to notify the peers polling different instances, which requires some changes on our side.
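One shape such cross-headscale signalling could take, purely as a sketch and not the project's design: each instance LISTENs on a shared Postgres channel and, when a notification arrives, pushes fresh network maps to the clients it is holding connections for. The channel name and payload here are hypothetical:

```go
// Sketch of cross-instance change notification via Postgres LISTEN/NOTIFY.
// The channel name "netmap_updates" and the payload format are hypothetical.
package main

import (
	"log"
	"time"

	"github.com/lib/pq"
)

func main() {
	dsn := "postgres://headscale:secret@db.internal/headscale?sslmode=require"

	// Each instance listens for updates committed by any other instance.
	listener := pq.NewListener(dsn, 10*time.Second, time.Minute, nil)
	if err := listener.Listen("netmap_updates"); err != nil {
		log.Fatal(err)
	}

	for n := range listener.Notify {
		if n == nil {
			continue // connection was re-established; a full resync may be needed
		}
		// The payload could identify which machine changed; here we would
		// look up the clients polling *this* instance and push them a
		// fresh network map over their long-lived connections.
		log.Printf("change notification: %s", n.Extra)
	}
}
```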

On the other hand, having the control server down is not great, but not immediately terrible. Everything keeps working, but slowly decaying (Tailscale.com has a KB article on this: https://tailscale.com/kb/1091/what-happens-if-the-coordination-server-is-down/):

> New users and devices cannot be added to the network.
> Keys cannot be refreshed and exchanged, meaning that existing devices will gradually lose access to each other.
> Firewall rules cannot be updated.
> Existing users cannot have their keys revoked.

Hope this helps...


@jsiebens commented on GitHub (Sep 3, 2021):

Hi @juanfont, that clarifies a lot. Thanks for the feedback!


@SuperPauly commented on GitHub (Apr 6, 2023):

Did anyone eventually work on this? Or is there another HA solution for Headscale?


@TKinslayer commented on GitHub (Dec 18, 2024):

Maybe I'm a bit late to the party, but basically the same question: any plan to add High Availability like Tailscale does? (Only the failover part with route advertising.) That would make headscale production-ready in my book, and I could start using it at work.
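For illustration, here is a rough Go sketch of the failover behavior being requested; the types and the online check are invented, not headscale's API. Among nodes advertising the same prefix, one is the primary router, and when it goes offline the next online advertiser is promoted:

```go
// Sketch of primary-route failover among redundant subnet routers.
// Types and the liveness signal are hypothetical, not headscale's API.
package main

import "fmt"

type Node struct {
	Name   string
	Online bool
}

// Route tracks every node advertising a prefix plus the current primary.
type Route struct {
	Prefix      string
	Advertisers []*Node
	Primary     *Node
}

// failoverIfNeeded promotes the first online advertiser when the
// current primary is offline; it returns true if the primary changed.
func (r *Route) failoverIfNeeded() bool {
	if r.Primary != nil && r.Primary.Online {
		return false
	}
	for _, n := range r.Advertisers {
		if n.Online {
			r.Primary = n
			return true
		}
	}
	r.Primary = nil // no healthy router left for this prefix
	return false
}

func main() {
	a, b := &Node{"router-a", true}, &Node{"router-b", true}
	route := &Route{Prefix: "10.0.0.0/24", Advertisers: []*Node{a, b}, Primary: a}

	a.Online = false // router-a drops off the tailnet
	if route.failoverIfNeeded() {
		fmt.Printf("%s is now routed via %s\n", route.Prefix, route.Primary.Name)
	}
}
```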


@nickdickinson commented on GitHub (Jan 10, 2025):

Could each headscale instance post its IP/address in the database, and then each expose an API endpoint to be notified when there is a network change recorded in the database (or however this should actually work)? Anyway, I guess it is a moot point if it is not on the roadmap.
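A rough sketch of that suggestion, with the instance table, the addresses, and the /notify endpoint all invented for illustration: each instance registers its address in a shared table so peers can discover it, and serves an endpoint that peers hit after committing a change.

```go
// Sketch: each instance registers its address in a shared table and
// exposes an endpoint peers can call when state changes.
// The table, endpoint path, and addresses are hypothetical.
package main

import (
	"database/sql"
	"log"
	"net/http"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres", "postgres://headscale:secret@db.internal/headscale?sslmode=require")
	if err != nil {
		log.Fatal(err)
	}

	self := "http://10.0.0.5:8081"
	// Register this instance so peers can discover and notify it.
	if _, err := db.Exec(`INSERT INTO instances (addr) VALUES ($1) ON CONFLICT DO NOTHING`, self); err != nil {
		log.Fatal(err)
	}

	// Peers call this after they commit a change; we would then re-read
	// state from the database and push updates to the clients connected
	// to this instance.
	http.HandleFunc("/notify", func(w http.ResponseWriter, r *http.Request) {
		log.Println("peer reported a change; refreshing network maps")
		w.WriteHeader(http.StatusNoContent)
	})
	log.Fatal(http.ListenAndServe(":8081", nil))
}
```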
