Shoehorn PSK-based authentication to provide key rotation and stronger identity guarantees #658

adam · 2025-12-29T02:21:42+01:00

Originally created by @gdonval on GitHub (Mar 6, 2024).

Why

Headscale, like Tailscale, currently treats WireGuard's public-private key pair as disposable: key pairs are created, distributed everywhere through Headscale's backend, and rotated as needed.

1. Any flaw in Curve25519 means game over, and it is not quantum-resistant.
2. Key rotation scales poorly: if node X is connected to 100 other nodes and the key for X is rotated, full connectivity is only restored after all 101 nodes are updated and fully in sync. Add NAT and reverse proxying, and key rotation can take even longer to propagate.
3. This breaks a nice WireGuard property: private keys are node identity. If node private keys were immutable, a compromised Headscale instance would be guaranteed not to have much effect on the existing cluster.

Description

What I propose, as a separate mode, is the following:

1. Upon registration with Headscale, a node generates its private key, as is the case today.
2. Headscale distributes the public part of the WireGuard key pair to the concerned nodes as usual. The only difference is that this key is now permanently tied to a specific node.
3. For each node pair, Headscale generates a separate PSK that is pushed onto that pair. This is the part that will be regularly pushed and rotated.
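
To make the per-pair step concrete, here is a minimal sketch of the proposal as written, with the coordinator generating the keys. It is not Headscale code: `pair_id`, `issue_psks` and `rotate_pair` are hypothetical names chosen for illustration. The only fixed fact it relies on is that a WireGuard preshared key is 32 random bytes, usually written as base64.

```python
import base64
import secrets
from itertools import combinations

PSK_BYTES = 32  # WireGuard preshared keys are 32 bytes, base64-encoded in configs


def pair_id(a: str, b: str) -> tuple[str, str]:
    """Order-independent identifier for a node pair (hypothetical helper)."""
    return (a, b) if a <= b else (b, a)


def new_psk() -> str:
    """Return a fresh random 32-byte key, base64-encoded."""
    return base64.b64encode(secrets.token_bytes(PSK_BYTES)).decode("ascii")


def issue_psks(nodes: list[str]) -> dict[tuple[str, str], str]:
    """Generate one independent PSK per unordered pair of nodes."""
    return {pair_id(a, b): new_psk() for a, b in combinations(nodes, 2)}


def rotate_pair(psks: dict[tuple[str, str], str], a: str, b: str) -> None:
    """Replace the PSK of a single pair; every other pair is left untouched."""
    psks[pair_id(a, b)] = new_psk()


psks = issue_psks(["alpha", "beta", "gamma", "delta"])
before = dict(psks)
rotate_pair(psks, "gamma", "alpha")
changed = [pair for pair in psks if psks[pair] != before[pair]]
print(len(psks), changed)
```

Four nodes give six pairs, and rotating one pair only touches the two nodes in that pair. Note that in this sketch the coordinator sees every PSK it hands out.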

In terms of how this helps with the points above:

1. With a PSK, existing and past encrypted connections remain secure, even against quantum attacks.
2. Rotating a key for X now involves synchronising pairs of nodes, two at a time, in any order (PSKs are tied to peers!). The added benefit is that key rotation can occur X hours after connection for each node, so all those updates can potentially be staggered.
3. In case the Headscale server gets compromised, AFAICT, nothing really prevents it from pretending a node's WireGuard keys were rotated and injecting references to a rogue node. I might be wrong there. But at any rate, if the WireGuard key pair is immutable in this mode, there is no way any existing server can be impersonated unless that server was compromised to begin with. This is strong. If there is mitigation in place to prevent such impersonation, it can be removed in this mode.
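
The staggering mentioned in the second point can be sketched as follows. This is only an illustration under stated assumptions: `pairs_due` is a hypothetical helper, and the 24-hour interval is a placeholder, since the proposal only says "X hours". Each pair carries its own timestamp, so rotations come due at different times instead of all at once.

```python
from datetime import datetime, timedelta, timezone

# Placeholder value; the proposal only says "X hours after connection".
ROTATION_INTERVAL = timedelta(hours=24)

Pair = tuple[str, str]


def pairs_due(
    last_rotated: dict[Pair, datetime],
    now: datetime,
    interval: timedelta = ROTATION_INTERVAL,
) -> list[Pair]:
    """Return the pairs whose PSK is at least `interval` old, oldest first."""
    due = [pair for pair, ts in last_rotated.items() if now - ts >= interval]
    return sorted(due, key=lambda pair: last_rotated[pair])


t0 = datetime(2024, 3, 6, tzinfo=timezone.utc)
last_rotated = {
    ("x", "a"): t0,                        # connected first
    ("x", "b"): t0 + timedelta(hours=5),   # connected 5 hours later
    ("x", "c"): t0 + timedelta(hours=11),  # connected 11 hours later
}

# 30 hours after the first connection, only the two oldest pairs are due.
due = pairs_due(last_rotated, t0 + timedelta(hours=30))
print(due)
```

Because each pair is handled on its own, a slow or unreachable peer only delays its own pair and does not hold back the rest of X's connections.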

adam added the enhancement and stale labels 2025-12-29 02:21:42 +01:00

adam closed this issue

2025-12-29 02:21:42 +01:00

@github-actions[bot] commented on GitHub (Jun 5, 2024):

This issue is stale because it has been open for 90 days with no activity.

@gdonval commented on GitHub (Jun 5, 2024):

Shouldn't be stale.

@kradalby commented on GitHub (Jun 5, 2024):

I think this would require changes to the Tailscale client, and therefore would be out of scope for this project.

@github-actions[bot] commented on GitHub (Sep 4, 2024):

This issue is stale because it has been open for 90 days with no activity.

@mjohnson9 commented on GitHub (Sep 10, 2024):

There is a significant weakness in the proposed design.

> For each node pair, Headscale generates a separate PSK that is pushed onto that pair.

The Headscale server should never know the PSK of any pair of peers.

If, as in the proposed scenario, there is a cryptographic break of Curve25519, the PSK becomes the sole cryptographic key material protecting the connection. If, as in the proposed solution, the Headscale server generated and/or stored the PSK, it is now capable of decrypting and spoofing traffic.

There are a few potential solutions, but I don't want to put out ideas using cryptographic primitives from back-of-the-napkin thoughts, because cryptographic primitives are prone to severe weakening through subtle misuse.

@gdonval commented on GitHub (Sep 11, 2024):

> If there is a cryptographic break of Curve25519, the PSK becomes the sole cryptographic key material

If the private key can be recovered from the public key, then that is correct. With the current system, though, every eavesdropper could decrypt everything in that situation. The PSK makes it at least a bit harder, and the situation is no worse than continuing to use no PSK.

> There are a few potential solutions

In my mind, there are multiple levels of "flaws" before reaching the dreaded full compromise of the private key, so I wasn't thinking "what if your crypto is completely broken and you want to keep using it?". That doesn't sound like a reasonable threat model, which is why I didn't even try to think about it.

Plus the main boon, I think, is to provide a mechanism that scales for 2M nodes as well as it does for 2 nodes while retaining strong node identity (you can even get a tailnet-lock-like feature for free with this)! As long as Curve25519 is not completely broken, a Headscale-provided PSK is a simple way to achieve all this. And if it is broken in the future, an eavesdropper would have to hope they also captured that exchange, which might not have been performed with elliptic-curve crypto.

I'm obviously not against getting a fancy post-quantum key exchange algorithm mixed into everything instead of just Headscale saying "here's your PSK to peer X", but I just found the latter very elegant (and not any worse than the normal scheme security-wise).

But yeah, we could let the nodes do a fancy post-quantum key exchange. Actually, for the sake of simplicity and to ensure Headscale itself doesn't eavesdrop, that could occur in a WireGuard channel first established without a PSK to kickstart the negotiation and then updated with the negotiated keys. (I'm actually serious even though I know it sounds convoluted: using the channel is a great way to achieve mutual authentication.)

There is just one thing though... If the private keys are compromised (i.e. if Curve25519 is completely broken), you don't get mutual authentication anymore. The key exchange would still provide protection against passive attackers, which is a good thing, but active attackers can MitM the whole thing. In prime position to do this kind of naughty stuff is the Headscale server (which is one reason why I suggested establishing and using a WireGuard channel to do the key exchange).

@github-actions[bot] commented on GitHub (Dec 27, 2024):

This issue is stale because it has been open for 90 days with no activity.

@github-actions[bot] commented on GitHub (Jan 4, 2025):

This issue was closed because it has been inactive for 14 days since being marked as stale.

@marek22k commented on GitHub (Jan 8, 2025):

It would also be a change in the Tailscale client and therefore outside this project, but Rosenpass (https://rosenpass.eu/) does something similar. Rosenpass uses post-quantum-secure algorithms to negotiate new PSKs every two minutes.
NetBird, a kind of Tailscale competitor, has already integrated this (https://docs.netbird.io/how-to/enable-post-quantum-cryptography), for example.
Related issue: https://github.com/tailscale/tailscale/issues/14370

adam referenced this issue

2025-12-29 02:30:34 +01:00

[PR #658] [MERGED] Quick fix to segfault on CLI when Headscale is not running #1557
