Commit Graph

174 Commits

Author SHA1 Message Date
Kristoffer Dalby
3e0a96ec3a all: fix test flakiness and improve test infrastructure
Buffer the AuthRequest verdict channel to prevent a race where the
sender blocks indefinitely if the receiver has already timed out, and
increase the auth followup test timeout from 100ms to 5s to prevent
spurious failures under load.

Skip postgres-backed tests when the postgres server is unavailable
instead of calling t.Fatal, which was preventing the rest of the test
suite from running.

Add TestMain to db, types, and policy/v2 packages to chdir to the
source directory before running tests. This ensures relative testdata/
paths resolve correctly when the test binary is executed from an
arbitrary working directory (e.g., via "go tool stress").
2026-03-14 02:52:28 -07:00
DM
610c1daa4d types: avoid NodeView clone in CanAccess
NodeView.CanAccess called node2.AsStruct() on every check. In peer-map construction we run CanAccess in O(n^2) pair scans (often twice per pair), so that per-call clone multiplied into large heap churn
2026-02-26 19:15:07 -08:00
Kristoffer Dalby
107c2f2f70 policy, noise: implement SSH check action
Implement the SSH "check" action which requires additional
verification before allowing SSH access. The policy compiler generates
a HoldAndDelegate URL that the Tailscale client calls back to
headscale. The SSHActionHandler creates an auth session and waits for
approval via the generalised auth flow.

Sort check (HoldAndDelegate) rules before accept rules to match
Tailscale's first-match-wins evaluation order.

Updates #1850
2026-02-25 21:28:05 +01:00
Kristoffer Dalby
cb3b6949ea auth: generalise auth flow and introduce AuthVerdict
Generalise the registration pipeline to a more general auth pipeline
supporting both node registrations and SSH check auth requests.
Rename RegistrationID to AuthID, unexport AuthRequest fields, and
introduce AuthVerdict to unify the auth finish API.

Add the urlParam generic helper for extracting typed URL parameters
from chi routes, used by the new auth request handler.

Updates #1850
2026-02-25 21:28:05 +01:00
Kristoffer Dalby
894e6946dc hscontrol/types: regenerate types_view.go
make generate

Updates #3077
2026-02-20 21:51:00 +01:00
Kristoffer Dalby
75e56df9e4 hscontrol: enforce that tagged nodes never have user_id
Tagged nodes are owned by their tags, not a user. Enforce this
invariant at every write path:

- createAndSaveNewNode: do not set UserID for tagged PreAuthKey
  registration; clear UserID when advertise-tags are applied
  during OIDC/CLI registration
- SetNodeTags: clear UserID/User when tags are assigned
- processReauthTags: clear UserID/User when tags are applied
  during re-authentication
- validateNodeOwnership: reject tagged nodes with non-nil UserID
- NodeStore: skip nodesByUser indexing for tagged nodes since
  they have no owning user
- HandleNodeFromPreAuthKey: add fallback lookup for tagged PAK
  re-registration (tagged nodes indexed under UserID(0)); guard
  against nil User deref for tagged nodes in different-user check

Since tagged nodes now have user_id = NULL, ListNodesByUser
will not return them and DestroyUser naturally allows deleting
users whose nodes have all been tagged. The ON DELETE CASCADE
FK cannot reach tagged nodes through a NULL foreign key.

Also tone down shouty comments throughout state.go.

Fixes #3077
2026-02-20 21:51:00 +01:00
Kristoffer Dalby
43afeedde2 all: apply golangci-lint 2.9.0 fixes
Fix issues found by the upgraded golangci-lint:
- wsl_v5: add required whitespace in CLI files
- staticcheck SA4006: replace new(var.Field) with &localVar
  pattern since staticcheck does not recognize Go 1.26
  new(value) as a use of the variable
- staticcheck SA5011: use t.Fatal instead of t.Error for
  nil guard checks so execution stops
- unused: remove dead ptrTo helper function
2026-02-19 08:21:23 +01:00
Kristoffer Dalby
0f6d312ada all: upgrade to Go 1.26rc2 and modernize codebase
This commit upgrades the codebase from Go 1.25.5 to Go 1.26rc2 and
adopts new language features.

Toolchain updates:
- go.mod: go 1.25.5 → go 1.26rc2
- flake.nix: buildGo125Module → buildGo126Module, go_1_25 → go_1_26
- flake.nix: build golangci-lint from source with Go 1.26
- Dockerfile.integration: golang:1.25-trixie → golang:1.26rc2-trixie
- Dockerfile.tailscale-HEAD: golang:1.25-alpine → golang:1.26rc2-alpine
- Dockerfile.derper: golang:alpine → golang:1.26rc2-alpine
- .goreleaser.yml: go mod tidy -compat=1.25 → -compat=1.26
- cmd/hi/run.go: fallback Go version 1.25 → 1.26rc2
- .pre-commit-config.yaml: simplify golangci-lint hook entry

Code modernization using Go 1.26 features:
- Replace tsaddr.SortPrefixes with slices.SortFunc + netip.Prefix.Compare
- Replace ptr.To(x) with new(x) syntax
- Replace errors.As with errors.AsType[T]

Lint rule updates:
- Add forbidigo rules to prevent regression to old patterns
2026-02-08 12:35:23 +01:00
Kristoffer Dalby
ce580f8245 all: fix golangci-lint issues (#3064) 2026-02-06 21:45:32 +01:00
Kristoffer Dalby
3acce2da87 errors: rewrite errors to follow go best practices
Errors should not start capitalised and they should not contain the word error
or state that they "failed" as we already know it is an error

Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2026-02-06 07:40:29 +01:00
Kristoffer Dalby
4a9a329339 all: use lowercase log messages
Go style recommends that log messages and error strings should not be
capitalized (unless beginning with proper nouns or acronyms) and should
not end with punctuation.

This change normalizes all zerolog .Msg() and .Msgf() calls to start
with lowercase letters, following Go conventions and making logs more
consistent across the codebase.
2026-02-06 07:40:29 +01:00
Kristoffer Dalby
cf3d30b6f6 types: add MarshalZerologObject to domain types
Implement zerolog.LogObjectMarshaler interface on domain types
for structured logging:

- Node: logs node.id, node.name, machine.key (short), node.key (short),
  node.is_tagged, node.expired, node.online, node.tags, user.name
- User: logs user.id, user.name, user.display, user.provider
- PreAuthKey: logs pak.id, pak.prefix (masked), pak.reusable,
  pak.ephemeral, pak.used, pak.is_tagged, pak.tags
- APIKey: logs api_key.id, api_key.prefix (masked), api_key.expiration

Security: PreAuthKey and APIKey only log masked prefixes, never full
keys or hashes. Uses zf.* constants for consistent field naming.
2026-02-06 07:40:29 +01:00
Kristoffer Dalby
a04b21abc6 gen: regenerate protobuf and type views
Regenerated with updated grpc-gateway and tailscale dependencies.
2026-01-21 19:17:10 +00:00
Kristoffer Dalby
3b4b9a4436 hscontrol: fix tag updates not propagating to node self view
When SetNodeTags changed a node's tags, the node's self view wasn't
updated. The bug manifested as: the first SetNodeTags call updates
the server but the client's self view doesn't update until a second
call with the same tag.

Root cause: Three issues combined to prevent self-updates:

1. SetNodeTags returned PolicyChange which doesn't set OriginNode,
   so the mapper's self-update check failed.

2. The Change.Merge function didn't preserve OriginNode, so when
   changes were batched together, OriginNode was lost.

3. generateMapResponse checked OriginNode only in buildFromChange(),
   but PolicyChange uses RequiresRuntimePeerComputation which
   bypasses that code path entirely and calls policyChangeResponse()
   instead.

The fix addresses all three:
- state.go: Set OriginNode on the returned change
- change.go: Preserve OriginNode (and TargetNode) during merge
- batcher.go: Pass isSelfUpdate to policyChangeResponse so the
  origin node gets both self info AND packet filters
- mapper.go: Add includeSelf parameter to policyChangeResponse

Fixes #2978
2026-01-20 10:13:47 +01:00
Kristoffer Dalby
3689f05407 types: use Username() in User.Proto() when Name is empty
User.Proto() was returning u.Name directly, which is empty for OIDC
users who have their identifier in the Email field instead. This caused
"headscale nodes list" to show empty user names for OIDC-authenticated
nodes.

Only fall back to Username() when Name is empty, which provides a
display-friendly identifier (Email > ProviderIdentifier > ID). This
ensures OIDC users display their email while CLI users retain their
original Name.

Fixes #2972
2026-01-14 13:16:51 +01:00
Kristoffer Dalby
0516c0ec37 gen: regenerate protobuf code 2026-01-14 09:32:46 +01:00
Kristoffer Dalby
72fcb93ef3 cli: ensure tagged-devices is included in profile list (#2991) 2026-01-09 16:31:23 +01:00
Justin Angel
7be20912f5 oidc: make email verification configurable
Co-authored-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-12-18 11:42:32 +00:00
Kristoffer Dalby
5767ca5085 change: smarter change notifications
This commit replaces the ChangeSet with a simpler bool based
change model that can be directly used in the map builder to
build the appropriate map response based on the change that
has occured. Previously, we fell back to sending full maps
for a lot of changes as that was consider "the safe" thing to
do to ensure no updates were missed.

This was slightly problematic as a node that already has a list
of peers will only do full replacement of the peers if the list
is non-empty, meaning that it was not possible to remove all
nodes (if for example policy changed).

Now we will keep track of last seen nodes, so we can send remove
ids, but also we are much smarter on how we send smaller, partial
maps when needed.

Fixes #2389

Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2025-12-16 10:12:36 +01:00
Kristoffer Dalby
506bd8c8eb policy: more accurate node change
This commit changes so that node changes to the policy is
calculated if any of the nodes has changed in a way that might
affect the policy.

Previously we just checked if the number of nodes had changed,
which meant that if a node was added and removed, we would be
in a bad state.

Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
2025-12-16 10:12:36 +01:00
Kristoffer Dalby
642073f4b8 types: add option to disable taildrop, improve tests (#2955) 2025-12-12 11:35:16 +01:00
Kristoffer Dalby
c8376e44a2 mapper: move tail node conversion to node type (#2950) 2025-12-10 09:16:22 +01:00
Kristoffer Dalby
eb788cd007 make tags first class node owner (#2885)
This PR changes tags to be something that exists on nodes in addition to users, to being its own thing. It is part of moving our tags support towards the correct tailscale compatible implementation.

There are probably rough edges in this PR, but the intention is to get it in, and then start fixing bugs from 0.28.0 milestone (long standing tags issue) to discover what works and what doesnt.

Updates #2417
Closes #2619
2025-12-02 12:01:25 +01:00
Kristoffer Dalby
eec196d200 modernize: run gopls modernize to bring up to 1.25 (#2920) 2025-12-01 19:40:25 +01:00
Kristoffer Dalby
db293e0698 hscontrol/state: make NodeStore batch configuration tunable (#2886) 2025-11-28 16:38:29 +01:00
Kristoffer Dalby
7fb0f9a501 batcher: send endpoint and derp only updates. (#2856) 2025-11-13 20:38:49 +01:00
Kristoffer Dalby
da9018a0eb types: make pre auth key use bcrypt (#2853) 2025-11-12 16:36:36 +01:00
Kristoffer Dalby
2024219bd1 types: Distinguish subnet and exit node access
When we fixed the issue of node visibility of nodes
that only had access to eachother because of a subnet
route, we gave all nodes access to all exit routes by
accident.

This commit splits exit nodes and subnet routes in the
access.

If a matcher indicates that the node should have access to
any part of the subnet routes, we do not remove it from the
node list.

If a matcher destination is equal to the internet, and the
target node is an exit node, we also do not remove the access.

Fixes #2784
Fixes #2788

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-02 13:19:59 +01:00
Kristoffer Dalby
bd9cf42b96 types: NodeView CanAccess uses internal
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-02 13:19:59 +01:00
Kristoffer Dalby
1c0bb0338d types: split SubnetRoutes and ExitRoutes
There are situations where the subnet routes and exit nodes
must be treated differently. This splits it so SubnetRoutes
only returns routes that are not exit routes.

It adds `IsExitRoutes` and `AllApprovedRoutes` for convenience.

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-11-02 13:19:59 +01:00
Kristoffer Dalby
2bf1200483 policy: fix autogroup:self propagation and optimize cache invalidation (#2807) 2025-10-23 17:57:41 +02:00
Kristoffer Dalby
1cdea7ed9b stricter hostname validation and replace (#2383) 2025-10-22 13:50:39 +02:00
Juanjo Presa
c97d0ff23d Fix fatal error on missing config file by handling viper.ConfigFileNotFoundError
Correctly identify Viper's ConfigFileNotFoundError in LoadConfig to log a warning and use defaults, unifying behavior with empty config files. Fixes fatal error when no config file is present for CLI commands relying on environment variables.
2025-10-19 15:29:47 +02:00
Kristoffer Dalby
fddc7117e4 stability and race conditions in auth and node store (#2781)
This PR addresses some consistency issues that was introduced or discovered with the nodestore.

nodestore:
Now returns the node that is being put or updated when it is finished. This closes a race condition where when we read it back, we do not necessarily get the node with the given change and it ensures we get all the other updates from that batch write.

auth:
Authentication paths have been unified and simplified. It removes a lot of bad branches and ensures we only do the minimal work.
A comprehensive auth test set has been created so we do not have to run integration tests to validate auth and it has allowed us to generate test cases for all the branches we currently know of.

integration:
added a lot more tooling and checks to validate that nodes reach the expected state when they come up and down. Standardised between the different auth models. A lot of this is to support or detect issues in the changes to nodestore (races) and auth (inconsistencies after login and reaching correct state)

This PR was assisted, particularly tests, by claude code.
2025-10-16 12:17:43 +02:00
Andrey Bobelev
c4a8c038cd fix: return valid AuthUrl in followup request on expired reg id
- tailscale client gets a new AuthUrl and sets entry in the regcache
- regcache entry expires
- client doesn't know about that
- client always polls followup request а gets error

When user clicks "Login" in the app (after cache expiry), they visit
invalid URL and get "node not found in registration cache". Some clients
on Windows for e.g. can't get a new AuthUrl without restart the app.

To fix that we can issue a new reg id and return user a new valid
AuthUrl.

RegisterNode is refactored to be created with NewRegisterNode() to
autocreate channel and other stuff.
2025-10-11 05:57:39 +02:00
Andrey Bobelev
022098fe4e chore: make reg cache expiry tunable
Mostly for the tests, opts:

- tuning.register_cache_expiration
- tuning.register_cache_cleanup
2025-10-11 05:57:39 +02:00
Kristoffer Dalby
ed3a9c8d6d mapper: send change instead of full update (#2775) 2025-09-17 14:23:21 +02:00
Kristoffer Dalby
1b1c989268 {policy, node}: allow return paths in route reduction (#2767) 2025-09-12 11:47:51 +02:00
Kristoffer Dalby
3950f8f171 cli: use gobuild version handling (#2770) 2025-09-12 11:47:31 +02:00
Kristoffer Dalby
233dffc186 lint and leftover
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
9d236571f4 state/nodestore: in memory representation of nodes
Initial work on a nodestore which stores all of the nodes
and their relations in memory with relationship for peers
precalculated.

It is a copy-on-write structure, replacing the "snapshot"
when a change to the structure occurs. It is optimised for reads,
and while batches are not fast, they are grouped together
to do less of the expensive peer calculation if there are many
changes rapidly.

Writes will block until commited, while reads are never
blocked.

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
b87567628a derp: increase update frequency and harden on failures (#2741) 2025-08-22 10:40:38 +02:00
Florian Preinstorfer
be337c6a33 Enable derp.server.verify_clients by default
This setting is already enabled in example-config.yaml but would default
to false if no key is set.
2025-08-19 11:30:44 +02:00
Fredrik Ekre
5d8a2c25ea OIDC: Query userinfo endpoint before verifying user
This patch includes some changes to the OIDC integration in particular:
 - Make sure that userinfo claims are queried *before* comparing the
   user with the configured allowed groups, email and email domain.
 - Update user with group claim from the userinfo endpoint which is
   required for allowed groups to work correctly. This is essentially a
   continuation of #2545.
 - Let userinfo claims take precedence over id token claims.

With these changes I have verified that Headscale works as expected
together with Authelia without the documented escape hatch [0], i.e.
everything works even if the id token only contain the iss and sub
claims.

[0]: https://www.authelia.com/integration/openid-connect/headscale/#configuration-escape-hatch
2025-08-11 17:51:16 +02:00
Kristoffer Dalby
a058bf3cd3 mapper: produce map before poll (#2628) 2025-07-28 11:15:53 +02:00
Kristoffer Dalby
c6d7b512bd integration: replace time.Sleep with assert.EventuallyWithT (#2680) 2025-07-10 23:38:55 +02:00
Kristoffer Dalby
73023c2ec3 all: use immutable node view in read path
This commit changes most of our (*)types.Node to
types.NodeView, which is a readonly version of the
underlying node ensuring that there is no mutations
happening in the read path.

Based on the migration, there didnt seem to be any, but the
idea here is to prevent it in the future and simplify other
new implementations.

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-07-07 21:28:59 +01:00
Stavros Kois
855c48aec2 remove unneeded check (#2658) 2025-07-04 15:47:01 +00:00
Stavros Kois
ded049b905 don't crash if config file is missing (#2656) 2025-07-04 12:58:17 +00:00
Kristoffer Dalby
1553f0ab53 state: introduce state
this commit moves all of the read and write logic, and all different parts
of headscale that manages some sort of persistent and in memory state into
a separate package.

The goal of this is to clearly define the boundry between parts of the app
which accesses and modifies data, and where it happens. Previously, different
state (routes, policy, db and so on) was used directly, and sometime passed to
functions as pointers.

Now all access has to go through state. In the initial implementation,
most of the same functions exists and have just been moved. In the future
centralising this will allow us to optimise bottle necks with the database
(in memory state) and make the different parts talking to eachother do so
in the same way across headscale components.

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-06-24 07:58:54 +02:00