[question] high availability support? #29

Closed
opened 2025-12-29 01:20:24 +01:00 by adam · 6 comments

Originally created by @jsiebens on GitHub (Sep 1, 2021).

Seeing the number of replicas set to two in the k8s examples, I was just wondering if headscale supports such an HA setup?
What if nodes are contacting different instances of the headscale server?

adam closed this issue 2025-12-29 01:20:24 +01:00

@SilverBut commented on GitHub (Sep 2, 2021):

Not sure if the database transactions are well managed. If so, HA might be simple within the same region: just use a shared MySQL database, provided the service itself is not stateful.

But I think this is not what you mean (and not what I mean either), since deploying the service in different regions is still not possible (for example, two servers running in Russia and Japan). Maybe we should consider either supporting a distributed database (like TiDB) so state can be synced via the database, or using something like Raft or Paxos to build a cluster.
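To make the shared-database idea concrete, here is a minimal Go sketch, not headscale's actual code: the DSN, table, and query are illustrative, and headscale itself targets SQLite/PostgreSQL rather than MySQL. Several stateless replicas all point at one database, and transactions keep their concurrent writes consistent:

```go
// Sketch: stateless control-server replicas sharing one database.
// All names (DSN, table, query) are illustrative, not headscale's schema.
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // Postgres driver; headscale supports SQLite and PostgreSQL
)

func main() {
	// Every replica points at the same database, so any instance can
	// serve any request as long as no state lives in process memory.
	db, err := sql.Open("postgres", "postgres://headscale:secret@db.internal/headscale?sslmode=require")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Transactions keep concurrent replicas from clobbering each other:
	// the read-modify-write of a machine record happens atomically.
	tx, err := db.Begin()
	if err != nil {
		log.Fatal(err)
	}
	if _, err := tx.Exec(`UPDATE machines SET last_seen = now() WHERE id = $1`, 42); err != nil {
		tx.Rollback()
		log.Fatal(err)
	}
	if err := tx.Commit(); err != nil {
		log.Fatal(err)
	}
}
```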


@juanfont commented on GitHub (Sep 3, 2021):

We need to think a bit about it. It is not trivial with the current architecture, as a TCP connection is opened from each client to the server and kept alive.

This connection is used for keepalives and for sending network map updates to the client. Should we have more than one server instance, we would need a mechanism for cross-headscale communication to notify the peers polling different instances, which requires some changes on our side.
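One shape such cross-headscale signalling could take, purely as a sketch and not the project's design: each instance LISTENs on a shared Postgres channel and, when a notification arrives, pushes fresh network maps to the clients it is holding connections for. The channel name and payload here are hypothetical:

```go
// Sketch of cross-instance change notification via Postgres LISTEN/NOTIFY.
// The channel name "netmap_updates" and the payload format are hypothetical.
package main

import (
	"log"
	"time"

	"github.com/lib/pq"
)

func main() {
	dsn := "postgres://headscale:secret@db.internal/headscale?sslmode=require"

	// Each instance listens for updates committed by any other instance.
	listener := pq.NewListener(dsn, 10*time.Second, time.Minute, nil)
	if err := listener.Listen("netmap_updates"); err != nil {
		log.Fatal(err)
	}

	for n := range listener.Notify {
		if n == nil {
			continue // connection was re-established; a full resync may be needed
		}
		// The payload could identify which machine changed; here we would
		// look up the clients polling *this* instance and push them a
		// fresh network map over their long-lived connections.
		log.Printf("change notification: %s", n.Extra)
	}
}
```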

On the other hand, having the control server down is not great, but not immediately terrible. Everything keeps working, but slowly decaying (Tailscale.com has a KB article on this: https://tailscale.com/kb/1091/what-happens-if-the-coordination-server-is-down/):

> New users and devices cannot be added to the network.
> Keys cannot be refreshed and exchanged, meaning that existing devices will gradually lose access to each other.
> Firewall rules cannot be updated.
> Existing users cannot have their keys revoked.

Hope this helps...


@jsiebens commented on GitHub (Sep 3, 2021):

Hi @juanfont, that clarifies a lot. Thanks for the feedback!


@SuperPauly commented on GitHub (Apr 6, 2023):

Did anyone eventually work on this? Or is there another HA solution for Headscale?


@TKinslayer commented on GitHub (Dec 18, 2024):

Maybe I'm a bit late to the party, but basically the same question: any plan to add High Availability like Tailscale does? (Only the failover part with route advertising.) That would make headscale production-ready in my book, and I could start using it at work.
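For illustration, here is a rough Go sketch of the failover behavior being requested; the types and the online check are invented, not headscale's API. Among nodes advertising the same prefix, one is the primary router, and when it goes offline the next online advertiser is promoted:

```go
// Sketch of primary-route failover among redundant subnet routers.
// Types and the liveness signal are hypothetical, not headscale's API.
package main

import "fmt"

type Node struct {
	Name   string
	Online bool
}

// Route tracks every node advertising a prefix plus the current primary.
type Route struct {
	Prefix      string
	Advertisers []*Node
	Primary     *Node
}

// failoverIfNeeded promotes the first online advertiser when the
// current primary is offline; it returns true if the primary changed.
func (r *Route) failoverIfNeeded() bool {
	if r.Primary != nil && r.Primary.Online {
		return false
	}
	for _, n := range r.Advertisers {
		if n.Online {
			r.Primary = n
			return true
		}
	}
	r.Primary = nil // no healthy router left for this prefix
	return false
}

func main() {
	a, b := &Node{"router-a", true}, &Node{"router-b", true}
	route := &Route{Prefix: "10.0.0.0/24", Advertisers: []*Node{a, b}, Primary: a}

	a.Online = false // router-a drops off the tailnet
	if route.failoverIfNeeded() {
		fmt.Printf("%s is now routed via %s\n", route.Prefix, route.Primary.Name)
	}
}
```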


@nickdickinson commented on GitHub (Jan 10, 2025):

Could each headscale instance post its IP/address in the database, and then each expose an API endpoint to be notified when there is a network change recorded in the database (or however this should actually work)? Anyway, I guess it is a moot point if it is not on the roadmap.
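A rough sketch of that suggestion, with the instance table, the addresses, and the /notify endpoint all invented for illustration: each instance registers its address in a shared table so peers can discover it, and serves an endpoint that peers hit after committing a change.

```go
// Sketch: each instance registers its address in a shared table and
// exposes an endpoint peers can call when state changes.
// The table, endpoint path, and addresses are hypothetical.
package main

import (
	"database/sql"
	"log"
	"net/http"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres", "postgres://headscale:secret@db.internal/headscale?sslmode=require")
	if err != nil {
		log.Fatal(err)
	}

	self := "http://10.0.0.5:8081"
	// Register this instance so peers can discover and notify it.
	if _, err := db.Exec(`INSERT INTO instances (addr) VALUES ($1) ON CONFLICT DO NOTHING`, self); err != nil {
		log.Fatal(err)
	}

	// Peers call this after they commit a change; we would then re-read
	// state from the database and push updates to the clients connected
	// to this instance.
	http.HandleFunc("/notify", func(w http.ResponseWriter, r *http.Request) {
		log.Println("peer reported a change; refreshing network maps")
		w.WriteHeader(http.StatusNoContent)
	})
	log.Fatal(http.ListenAndServe(":8081", nil))
}
```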
