Commit Graph

20 Commits

Author SHA1 Message Date
Kristoffer Dalby
3e0a96ec3a all: fix test flakiness and improve test infrastructure
Buffer the AuthRequest verdict channel to prevent a race where the
sender blocks indefinitely if the receiver has already timed out, and
increase the auth followup test timeout from 100ms to 5s to prevent
spurious failures under load.

Skip postgres-backed tests when the postgres server is unavailable
instead of calling t.Fatal, which was preventing the rest of the test
suite from running.

Add TestMain to db, types, and policy/v2 packages to chdir to the
source directory before running tests. This ensures relative testdata/
paths resolve correctly when the test binary is executed from an
arbitrary working directory (e.g., via "go tool stress").
2026-03-14 02:52:28 -07:00
Kristoffer Dalby
3db0a483ed integration: add SSH check mode tests
Add ReadLog method to headscale integration container for log
inspection. Split SSH check mode tests into CLI and OIDC variants
and add comprehensive test coverage:

- TestSSHOneUserToOneCheckModeCLI: basic check mode with CLI approval
- TestSSHOneUserToOneCheckModeOIDC: check mode with OIDC approval
- TestSSHCheckModeUnapprovedTimeout: rejection on cache expiry
- TestSSHCheckModeCheckPeriodCLI: session expiry and re-auth
- TestSSHCheckModeAutoApprove: auto-approval within check period
- TestSSHCheckModeNegativeCLI: explicit rejection via CLI

Update existing integration tests to use headscale auth register.

Updates #1850
2026-02-25 21:28:05 +01:00
Kristoffer Dalby
cb3b6949ea auth: generalise auth flow and introduce AuthVerdict
Generalise the registration pipeline to a more general auth pipeline
supporting both node registrations and SSH check auth requests.
Rename RegistrationID to AuthID, unexport AuthRequest fields, and
introduce AuthVerdict to unify the auth finish API.

Add the urlParam generic helper for extracting typed URL parameters
from chi routes, used by the new auth request handler.

Updates #1850
2026-02-25 21:28:05 +01:00
Kristoffer Dalby
f20bd0cf08 node: implement disable key expiry via CLI and API
Add --disable flag to "headscale nodes expire" CLI command and
disable_expiry field handling in the gRPC API to allow disabling
key expiry for nodes. When disabled, the node's expiry is set to
NULL and IsExpired() returns false.

The CLI follows the new grpcRunE/RunE/printOutput patterns
introduced in the recent CLI refactor.

Also fix NodeSetExpiry to persist directly to the database instead
of going through persistNodeToDB which omits the expiry field.

Fixes #2681

Co-authored-by: Marco Santos <me@marcopsantos.com>
2026-02-20 21:49:55 +01:00
Kristoffer Dalby
ce580f8245 all: fix golangci-lint issues (#3064) 2026-02-06 21:45:32 +01:00
Kristoffer Dalby
ca75e096e6 integration: add test for tagged→user-owned conversion panic
Add TestTagsAuthKeyConvertToUserViaCLIRegister that reproduces the
exact panic from #3038: register a node with a tags-only PreAuthKey
(no user), force reauth with empty tags, then register via CLI with
a user. The mapper panics on node.Owner().Model().ID when User is nil.

The critical detail is using a tags-only PreAuthKey (User: nil). When
the key is created under a user, the node inherits the User pointer
from createAndSaveNewNode and the bug is masked.

Also add Owner() validity assertions to the existing unit test
TestTaggedNodeWithoutUserToDifferentUser to catch the nil pointer
at the unit test level.

Updates #3038
2026-02-04 15:44:55 +01:00
Kristoffer Dalby
306aabbbce state: fix nil pointer panic when re-registering tagged node without user
When a node was registered with a tags-only PreAuthKey (no user
associated), the node had User=nil and UserID=nil. When attempting to
re-register this node to a different user via HandleNodeFromAuthPath,
two issues occurred:

1. The code called oldUser.Name() without checking if oldUser was valid,
   causing a nil pointer dereference panic.

2. The existing node lookup logic didn't find the tagged node because it
   searched by (machineKey, userID), but tagged nodes have no userID.
   This caused a new node to be created instead of updating the existing
   tagged node.

Fix this by restructuring HandleNodeFromAuthPath to:
1. First check if a node exists for the same user (existing behavior)
2. If not found, check if an existing TAGGED node exists with the same
   machine key (regardless of userID)
3. If a tagged node exists, UPDATE it to convert from tagged to
   user-owned (preserving the node ID)
4. Only create a new node if the existing node is user-owned by a
   different user

This ensures consistent behavior between:
- personal → tagged → personal (same node, same owner)
- tagged (no user) → personal (same node, new owner)

Add a test that reproduces the panic and conversion scenario by:
1. Creating a tags-only PreAuthKey (no user)
2. Registering a node with that key
3. Re-registering the same machine to a different user
4. Verifying the node ID stays the same (conversion, not creation)

Fixes #3038
2026-02-04 15:44:55 +01:00
Kristoffer Dalby
46daa659e2 state: omit AuthKeyID/AuthKey in node Updates to prevent FK errors
When a PreAuthKey is deleted, the database correctly sets auth_key_id
to NULL on referencing nodes via ON DELETE SET NULL. However, the
NodeStore (in-memory cache) retains the old AuthKeyID value.

When nodes send MapRequests (e.g., after tailscaled restart), GORM's
Updates() tries to persist the stale AuthKeyID, causing a foreign key
constraint error when trying to reference a deleted PreAuthKey.

Fix this by adding AuthKeyID and AuthKey to the Omit() call in all
three places where nodes are updated via GORM's Updates():
- persistNodeToDB (MapRequest processing)
- HandleNodeFromAuthPath (re-auth via web/OIDC)
- HandleNodeFromPreAuthKey (re-registration with preauth key)

This tells GORM to never touch the auth_key_id column or AuthKey
association during node updates, letting the database handle the
foreign key relationship correctly.

Added TestDeletedPreAuthKeyNotRecreatedOnNodeUpdate to verify that
deleted PreAuthKeys are not recreated when nodes send MapRequests.
2026-01-26 12:12:11 +00:00
Kristoffer Dalby
0451dd4718 state: allow untagging nodes via reauth with empty RequestTags
When a node re-authenticates via OIDC/web auth with empty RequestTags
(from `tailscale up --advertise-tags= --force-reauth`), remove all tags
and return ownership to the authenticating user.

This allows nodes to transition from any tagged state (including nodes
originally registered with a tagged pre-auth key) back to user-owned.

Fixes #2979
2026-01-17 10:13:24 +01:00
Kristoffer Dalby
87bd67318b golangci-lint: use forbidigo to block time.Sleep (#2946) 2025-12-10 16:45:59 +00:00
Kristoffer Dalby
22ee2bfc9c tags: process tags on registration, simplify policy (#2931)
This PR investigates, adds tests and aims to correctly implement Tailscale's model for how Tags should be accepted, assigned and used to identify nodes in the Tailscale access and ownership model.

When evaluating in Headscale's policy, Tags are now only checked against a nodes "tags" list, which defines the source of truth for all tags for a given node. This simplifies the code for dealing with tags greatly, and should help us have less access bugs related to nodes belonging to tags or users.

A node can either be owned by a user, or a tag.

Next, to ensure the tags list on the node is correctly implemented, we first add tests for every registration scenario and combination of user, pre auth key and pre auth key with tags with the same registration expectation as observed by trying them all with the Tailscale control server. This should ensure that we implement the correct behaviour and that it does not change or break over time.

Lastly, the missing parts of the auth has been added, or changed in the cases where it was wrong. This has in large parts allowed us to delete and simplify a lot of code.
Now, tags can only be changed when a node authenticates or if set via the CLI/API. Tags can only be fully overwritten/replaced and any use of either auth or CLI will replace the current set if different.

A user owned device can be converted to a tagged device, but it cannot be changed back. A tagged device can never remove the last tag either, it has to have a minimum of one.
2025-12-08 18:51:07 +01:00
Kristoffer Dalby
15c84b34e0 policy: allow tags to own tags (#2930) 2025-12-06 10:23:35 +01:00
Kristoffer Dalby
eb788cd007 make tags first class node owner (#2885)
This PR changes tags to be something that exists on nodes in addition to users, to being its own thing. It is part of moving our tags support towards the correct tailscale compatible implementation.

There are probably rough edges in this PR, but the intention is to get it in, and then start fixing bugs from 0.28.0 milestone (long standing tags issue) to discover what works and what doesnt.

Updates #2417
Closes #2619
2025-12-02 12:01:25 +01:00
Kristoffer Dalby
3cf2d7195a auth: ensure machines are allowed in when pak change (#2917) 2025-12-02 12:01:02 +01:00
Kristoffer Dalby
eec196d200 modernize: run gopls modernize to bring up to 1.25 (#2920) 2025-12-01 19:40:25 +01:00
Kristoffer Dalby
da9018a0eb types: make pre auth key use bcrypt (#2853) 2025-11-12 16:36:36 +01:00
Kristoffer Dalby
4728a2ba9e hscontrol/state: allow expired auth keys for node re-registration
Skip auth key validation for existing nodes re-registering with the same
NodeKey. Pre-auth keys are only required for initial authentication.

NodeKey rotation still requires a valid auth key as it is a security-sensitive
operation that changes the node's cryptographic identity.

Fixes #2830
2025-11-11 05:12:59 -06:00
Kristoffer Dalby
fddc7117e4 stability and race conditions in auth and node store (#2781)
This PR addresses some consistency issues that was introduced or discovered with the nodestore.

nodestore:
Now returns the node that is being put or updated when it is finished. This closes a race condition where when we read it back, we do not necessarily get the node with the given change and it ensures we get all the other updates from that batch write.

auth:
Authentication paths have been unified and simplified. It removes a lot of bad branches and ensures we only do the minimal work.
A comprehensive auth test set has been created so we do not have to run integration tests to validate auth and it has allowed us to generate test cases for all the branches we currently know of.

integration:
added a lot more tooling and checks to validate that nodes reach the expected state when they come up and down. Standardised between the different auth models. A lot of this is to support or detect issues in the changes to nodestore (races) and auth (inconsistencies after login and reaching correct state)

This PR was assisted, particularly tests, by claude code.
2025-10-16 12:17:43 +02:00
Kristoffer Dalby
1553f0ab53 state: introduce state
this commit moves all of the read and write logic, and all different parts
of headscale that manages some sort of persistent and in memory state into
a separate package.

The goal of this is to clearly define the boundry between parts of the app
which accesses and modifies data, and where it happens. Previously, different
state (routes, policy, db and so on) was used directly, and sometime passed to
functions as pointers.

Now all access has to go through state. In the initial implementation,
most of the same functions exists and have just been moved. In the future
centralising this will allow us to optimise bottle necks with the database
(in memory state) and make the different parts talking to eachother do so
in the same way across headscale components.

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-06-24 07:58:54 +02:00
Kristoffer Dalby
45752db0f6 Return better web errors to the user (#2398)
* add dedicated http error to propagate to user

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>

* classify user errors in http handlers

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>

* move validation of pre auth key out of db

This move separates the logic a bit and allow us to
write specific errors for the caller, in this case the web
layer so we can present the user with the correct error
codes without bleeding web stuff into a generic validate.

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>

* update changelog

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>

---------

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-02-01 15:25:18 +01:00