headscale

mirror of https://github.com/juanfont/headscale.git synced 2026-03-19 16:21:23 +01:00

Author	SHA1	Message	Date
Kristoffer Dalby	3ebe4d99c1	mapper/batcher: reduce lock contention with two-phase send Rewrite multiChannelNodeConn.send() to use a two-phase approach: 1. RLock: snapshot connections slice (cheap pointer copy) 2. Unlock: send to all connections (50ms timeouts happen here) 3. Lock: remove failed connections by pointer identity Previously, send() held the write lock for the entire duration of sending to all connections. With N stale connections each timing out at 50ms, this blocked addConnection/removeConnection for N*50ms. The two-phase approach holds the lock only for O(N) pointer operations, not for N*50ms I/O waits.	2026-03-14 02:52:28 -07:00
Kristoffer Dalby	da33795e79	mapper/batcher: fix race conditions in cleanup and lookups Replace the two-phase Load-check-Delete in cleanupOfflineNodes with xsync.Map.Compute() for atomic check-and-delete. This prevents the TOCTOU race where a node reconnects between the hasActiveConnections check and the Delete call. Add nil guards on all b.nodes.Load() and b.nodes.Range() call sites to prevent nil pointer panics from concurrent cleanup races.	2026-03-14 02:52:28 -07:00
Kristoffer Dalby	57070680a5	mapper/batcher: restructure internals for correctness Move per-node pending changes from a shared xsync.Map on the batcher into multiChannelNodeConn, protected by a dedicated mutex. The new appendPending/drainPending methods provide atomic append and drain operations, eliminating data races in addToBatch and processBatchedChanges. Add sync.Once to multiChannelNodeConn.close() to make it idempotent, preventing panics from concurrent close calls on the same channel. Add started atomic.Bool to guard Start() against being called multiple times, preventing orphaned goroutines. Add comprehensive concurrency tests validating these changes.	2026-03-14 02:52:28 -07:00
Kristoffer Dalby	21e02e5d1f	mapper/batcher: add unit tests and benchmarks Add comprehensive unit tests for the LockFreeBatcher covering AddNode/RemoveNode lifecycle, addToBatch routing (broadcast, targeted, full update), processBatchedChanges deduplication, cleanup of offline nodes, close/shutdown behavior, IsConnected state tracking, and connected map consistency. Add benchmarks for connection entry send, multi-channel send and broadcast, peer diff computation, sentPeers updates, addToBatch at various scales (10/100/1000 nodes), processBatchedChanges, broadcast delivery, IsConnected lookups, connected map enumeration, connection churn, and concurrent send+churn scenarios. Widen setupBatcherWithTestData to accept testing.TB so benchmarks can reuse the same database-backed test setup as unit tests.	2026-03-14 02:52:28 -07:00
DM	4aca9d6568	poll: stop stale map sessions through an explicit teardown hook When stale-send cleanup prunes a connection from the batcher, the old serveLongPoll session needs an explicit stop signal. Pass a stop hook into AddNode and trigger it when that connection is removed, so the session exits through its normal cancel path instead of relying on channel closure from the batcher side.	2026-03-12 01:27:34 -07:00
DM	3daf45e88a	mapper: close stale map channels after send timeouts When the batcher timed out sending to a node, it removed the channel from multiChannelNodeConn but left the old serveLongPoll goroutine running on that channel. That left a live stale session behind: it no longer received new updates, but it could still keep the stream open and block shutdown. Close the pruned channel when stale-send cleanup removes it so the old map session exits after draining any buffered update.	2026-03-12 01:27:34 -07:00
DM	b81d6c734d	mapper: handle RemoveNode after channel cleanup A connection can already be removed from multiChannelNodeConn by the stale-send cleanup path before serveLongPoll reaches its deferred RemoveNode call. In that case RemoveNode used to return early on "channel not found" and never updated the node's connected state. Drop that early return so RemoveNode still checks whether any active connections remain and marks the node disconnected when the last one is gone.	2026-03-12 01:27:34 -07:00
Kristoffer Dalby	cb3b6949ea	auth: generalise auth flow and introduce AuthVerdict Generalise the registration pipeline to a more general auth pipeline supporting both node registrations and SSH check auth requests. Rename RegistrationID to AuthID, unexport AuthRequest fields, and introduce AuthVerdict to unify the auth finish API. Add the urlParam generic helper for extracting typed URL parameters from chi routes, used by the new auth request handler. Updates #1850	2026-02-25 21:28:05 +01:00
Kristoffer Dalby	0f6d312ada	all: upgrade to Go 1.26rc2 and modernize codebase This commit upgrades the codebase from Go 1.25.5 to Go 1.26rc2 and adopts new language features. Toolchain updates: - go.mod: go 1.25.5 → go 1.26rc2 - flake.nix: buildGo125Module → buildGo126Module, go_1_25 → go_1_26 - flake.nix: build golangci-lint from source with Go 1.26 - Dockerfile.integration: golang:1.25-trixie → golang:1.26rc2-trixie - Dockerfile.tailscale-HEAD: golang:1.25-alpine → golang:1.26rc2-alpine - Dockerfile.derper: golang:alpine → golang:1.26rc2-alpine - .goreleaser.yml: go mod tidy -compat=1.25 → -compat=1.26 - cmd/hi/run.go: fallback Go version 1.25 → 1.26rc2 - .pre-commit-config.yaml: simplify golangci-lint hook entry Code modernization using Go 1.26 features: - Replace tsaddr.SortPrefixes with slices.SortFunc + netip.Prefix.Compare - Replace ptr.To(x) with new(x) syntax - Replace errors.As with errors.AsType[T] Lint rule updates: - Add forbidigo rules to prevent regression to old patterns	2026-02-08 12:35:23 +01:00
Kristoffer Dalby	ce580f8245	all: fix golangci-lint issues (#3064)	2026-02-06 21:45:32 +01:00
Kristoffer Dalby	3acce2da87	errors: rewrite errors to follow go best practices Errors should not start capitalised and they should not contain the word error or state that they "failed" as we already know it is an error Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>	2026-02-06 07:40:29 +01:00
Kristoffer Dalby	4a9a329339	all: use lowercase log messages Go style recommends that log messages and error strings should not be capitalized (unless beginning with proper nouns or acronyms) and should not end with punctuation. This change normalizes all zerolog .Msg() and .Msgf() calls to start with lowercase letters, following Go conventions and making logs more consistent across the codebase.	2026-02-06 07:40:29 +01:00
Kristoffer Dalby	53cdeff129	hscontrol/mapper: use sub-loggers and zf constants Add sub-logger patterns to worker(), AddNode(), RemoveNode() and multiChannelNodeConn to eliminate repeated field calls. Use zf.* constants for consistent field naming. Changes in batcher_lockfree.go: - Add wlog sub-logger in worker() with worker.id context - Add log field to multiChannelNodeConn struct - Initialize mc.log with node.id in newMultiChannelNodeConn() - Add nlog sub-loggers in AddNode() and RemoveNode() - Update all connection methods to use mc.log Changes in batcher.go: - Use zf.NodeID and zf.Reason in handleNodeChange()	2026-02-06 07:40:29 +01:00
Kristoffer Dalby	91730e2a1d	hscontrol: use EmbedObject for node logging Replace manual Uint64("node.id")/Str("node.name") field patterns with EmbedObject(node) which automatically includes all standard node fields (id, name, machine key, node key, online status, tags, user). This reduces code repetition and ensures consistent logging across: - state.go: Connect/Disconnect, persistNodeToDB, AutoApproveRoutes - auth.go: handleLogout, handleRegisterWithAuthKey	2026-02-06 07:40:29 +01:00
Kristoffer Dalby	ce7c256d1e	state: set User pointer during tagged→user-owned conversion processReauthTags sets UserID when converting a tagged node to user-owned, but does not set the User pointer. When the node was registered with a tags-only PreAuthKey (User: nil), the in-memory NodeStore cache holds a node with User=nil. The mapper's generateUserProfiles then calls node.Owner().Model().ID, which dereferences the nil pointer and panics. Set node.User alongside node.UserID in processReauthTags. Also add defensive nil checks in generateUserProfiles to gracefully handle nodes with invalid owners rather than panicking. Fixes #3038	2026-02-04 15:44:55 +01:00
Shourya Gautam	4e1834adaf	db: use PolicyManager for RequestTags migration Refactor the RequestTags migration (202601121700-migrate-hostinfo-request-tags) to use PolicyManager.NodeCanHaveTag() instead of reimplementing tag validation. Changes: - NewHeadscaleDatabase now accepts types.Config to allow migrations access to policy configuration - Add loadPolicyBytes helper to load policy from file or DB based on config - Add standalone GetPolicy(tx gorm.DB) for use during migrations - Replace custom tag validation logic with PolicyManager Benefits: - Full HuJSON parsing support (not just JSON) - Proper group expansion via PolicyManager - Support for nested tags and autogroups - Works with both file and database policy modes - Single source of truth for tag validation Co-Authored-By: Shourya Gautam <shouryamgautam@gmail.com>	2026-01-21 15:10:29 +01:00
Kristoffer Dalby	424e26d636	db: migrate tests from check.v1 to testify Migrate all database tests from gopkg.in/check.v1 Suite-based testing to standard Go tests with testify assert/require. Changes: - Remove empty Suite files (hscontrol/suite_test.go, hscontrol/mapper/suite_test.go) - Convert hscontrol/db/suite_test.go to modern helpers only - Convert 6 Suite test methods in node_test.go to standalone tests - Convert 5 Suite test methods in api_key_test.go to standalone tests - Fix stale global variable reference in db_test.go The legacy TestListPeers Suite method was renamed to TestListPeersManyNodes to avoid conflict with the existing modern TestListPeers function, as they test different aspects (basic peer listing vs ID filtering).	2026-01-20 15:41:33 +01:00
Kristoffer Dalby	3b4b9a4436	hscontrol: fix tag updates not propagating to node self view When SetNodeTags changed a node's tags, the node's self view wasn't updated. The bug manifested as: the first SetNodeTags call updates the server but the client's self view doesn't update until a second call with the same tag. Root cause: Three issues combined to prevent self-updates: 1. SetNodeTags returned PolicyChange which doesn't set OriginNode, so the mapper's self-update check failed. 2. The Change.Merge function didn't preserve OriginNode, so when changes were batched together, OriginNode was lost. 3. generateMapResponse checked OriginNode only in buildFromChange(), but PolicyChange uses RequiresRuntimePeerComputation which bypasses that code path entirely and calls policyChangeResponse() instead. The fix addresses all three: - state.go: Set OriginNode on the returned change - change.go: Preserve OriginNode (and TargetNode) during merge - batcher.go: Pass isSelfUpdate to policyChangeResponse so the origin node gets both self info AND packet filters - mapper.go: Add includeSelf parameter to policyChangeResponse Fixes #2978	2026-01-20 10:13:47 +01:00
Kristoffer Dalby	72fcb93ef3	cli: ensure tagged-devices is included in profile list (#2991)	2026-01-09 16:31:23 +01:00
Justin Angel	7be20912f5	oidc: make email verification configurable Co-authored-by: Kristoffer Dalby <kristoffer@tailscale.com>	2025-12-18 11:42:32 +00:00
Kristoffer Dalby	82d4275c3b	mapper: correct some variable names missed from change Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>	2025-12-17 13:19:26 +01:00
Kristoffer Dalby	f3767dddf8	batcher: ensure removal from batcher Fixes #2924 Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>	2025-12-17 13:19:26 +01:00
Kristoffer Dalby	5767ca5085	change: smarter change notifications This commit replaces the ChangeSet with a simpler bool-based change model that can be directly used in the map builder to build the appropriate map response based on the change that has occurred. Previously, we fell back to sending full maps for a lot of changes as that was considered "the safe" thing to do to ensure no updates were missed. This was slightly problematic as a node that already has a list of peers will only do full replacement of the peers if the list is non-empty, meaning that it was not possible to remove all nodes (if for example policy changed). Now we will keep track of last seen nodes, so we can send remove ids, but also we are much smarter about how we send smaller, partial maps when needed. Fixes #2389 Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>	2025-12-16 10:12:36 +01:00
Kristoffer Dalby	616c0e895d	batcher: fix closed panic Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>	2025-12-15 16:28:27 +01:00
Kristoffer Dalby	642073f4b8	types: add option to disable taildrop, improve tests (#2955)	2025-12-12 11:35:16 +01:00
Kristoffer Dalby	87bd67318b	golangci-lint: use forbidigo to block time.Sleep (#2946)	2025-12-10 16:45:59 +00:00
Kristoffer Dalby	0e1673041c	all: remove deadcode (#2952)	2025-12-10 15:55:15 +01:00
Kristoffer Dalby	c8376e44a2	mapper: move tail node conversion to node type (#2950)	2025-12-10 09:16:22 +01:00
Kristoffer Dalby	22ee2bfc9c	tags: process tags on registration, simplify policy (#2931) This PR investigates, adds tests and aims to correctly implement Tailscale's model for how Tags should be accepted, assigned and used to identify nodes in the Tailscale access and ownership model. When evaluating in Headscale's policy, Tags are now only checked against a node's "tags" list, which defines the source of truth for all tags for a given node. This simplifies the code for dealing with tags greatly, and should help us have fewer access bugs related to nodes belonging to tags or users. A node can either be owned by a user, or a tag. Next, to ensure the tags list on the node is correctly implemented, we first add tests for every registration scenario and combination of user, pre auth key and pre auth key with tags with the same registration expectation as observed by trying them all with the Tailscale control server. This should ensure that we implement the correct behaviour and that it does not change or break over time. Lastly, the missing parts of the auth have been added, or changed in the cases where it was wrong. This has in large parts allowed us to delete and simplify a lot of code. Now, tags can only be changed when a node authenticates or if set via the CLI/API. Tags can only be fully overwritten/replaced and any use of either auth or CLI will replace the current set if different. A user owned device can be converted to a tagged device, but it cannot be changed back. A tagged device can never remove the last tag either, it has to have a minimum of one.	2025-12-08 18:51:07 +01:00
Kristoffer Dalby	eb788cd007	make tags first class node owner (#2885) This PR changes tags from something that exists on nodes in addition to users to being its own thing. It is part of moving our tags support towards the correct tailscale compatible implementation. There are probably rough edges in this PR, but the intention is to get it in, and then start fixing bugs from 0.28.0 milestone (long standing tags issue) to discover what works and what doesn't. Updates #2417 Closes #2619	2025-12-02 12:01:25 +01:00
Kristoffer Dalby	eec196d200	modernize: run gopls modernize to bring up to 1.25 (#2920)	2025-12-01 19:40:25 +01:00
Kristoffer Dalby	db293e0698	hscontrol/state: make NodeStore batch configuration tunable (#2886)	2025-11-28 16:38:29 +01:00
Kristoffer Dalby	7fb0f9a501	batcher: send endpoint and derp only updates. (#2856)	2025-11-13 20:38:49 +01:00
Kristoffer Dalby	d7a43a7cf1	state: use AllApprovedRoutes instead of SubnetRoutes Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>	2025-11-02 13:19:59 +01:00
Kristoffer Dalby	2bf1200483	policy: fix autogroup:self propagation and optimize cache invalidation (#2807)	2025-10-23 17:57:41 +02:00
Florian Preinstorfer	46477b8021	Downgrade completed broadcast message to debug	2025-10-18 07:56:59 +02:00
Vitalij Dovhanyc	c2a58a304d	feat: add autogroup:self (#2789)	2025-10-16 12:59:52 +02:00
Kristoffer Dalby	ed3a9c8d6d	mapper: send change instead of full update (#2775)	2025-09-17 14:23:21 +02:00
Kristoffer Dalby	d41fb4d540	app: fix sigint hanging When the node notifier was replaced with batcher, we removed its closing, but forgot to add the batcher's closing, so it never stopped node connections and waited forever. Fixes #2751 Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>	2025-09-11 11:53:26 +02:00
Kristoffer Dalby	233dffc186	lint and leftover Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>	2025-09-09 09:40:00 +02:00
Kristoffer Dalby	9d236571f4	state/nodestore: in memory representation of nodes Initial work on a nodestore which stores all of the nodes and their relations in memory with relationships for peers precalculated. It is a copy-on-write structure, replacing the "snapshot" when a change to the structure occurs. It is optimised for reads, and while batches are not fast, they are grouped together to do less of the expensive peer calculation if there are many changes in rapid succession. Writes will block until committed, while reads are never blocked. Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>	2025-09-09 09:40:00 +02:00
Kristoffer Dalby	b6d5788231	mapper: produce map before poll Before this patch, we would send a message to each "node stream" that there is an update that needs to be turned into a mapresponse and sent to a node. Producing the mapresponse is a "costly" affair, which means that while a node was producing one, it might start blocking and creating full queues from the poller all the way up to where updates were sent. This could cause updates to time out and be dropped, as a bad node going away or spending too much time processing would cause all the other nodes to not get any updates. In addition, it contributed to "uncontrolled parallel processing" by potentially doing too many expensive operations at the same time: Each node stream is essentially a channel, meaning that if you have 30 nodes, we will try to process 30 map requests at the same time. If you have 8 cpu cores, that will saturate all the cores immediately and cause a lot of wasted switching between the processing. Now, all the maps are processed by workers in the mapper, and the number of workers is controllable. The recommended number of workers is a bit less than the number of CPU cores, allowing us to process them as fast as we can, and then send them to the poll. When the poll receives the map, it is only responsible for taking it and sending it to the node. This might not directly improve the performance of Headscale, but it will likely make the performance a lot more consistent. And I would argue the design is a lot easier to reason about. Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>	2025-09-09 09:40:00 +02:00
Kristoffer Dalby	8e25f7f9dd	bunch of qol (#2748)	2025-08-27 17:09:13 +02:00
Kristoffer Dalby	b87567628a	derp: increase update frequency and harden on failures (#2741)	2025-08-22 10:40:38 +02:00
Kristoffer Dalby	a058bf3cd3	mapper: produce map before poll (#2628)	2025-07-28 11:15:53 +02:00
Kristoffer Dalby	c6d7b512bd	integration: replace time.Sleep with assert.EventuallyWithT (#2680)	2025-07-10 23:38:55 +02:00
Kristoffer Dalby	73023c2ec3	all: use immutable node view in read path This commit changes most of our (*)types.Node to types.NodeView, which is a read-only version of the underlying node ensuring that there are no mutations happening in the read path. Based on the migration, there didn't seem to be any, but the idea here is to prevent it in the future and simplify other new implementations. Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>	2025-07-07 21:28:59 +01:00
Kristoffer Dalby	1553f0ab53	state: introduce state This commit moves all of the read and write logic, and all different parts of headscale that manage some sort of persistent and in memory state, into a separate package. The goal of this is to clearly define the boundary between parts of the app which access and modify data, and where it happens. Previously, different state (routes, policy, db and so on) was used directly, and sometimes passed to functions as pointers. Now all access has to go through state. In the initial implementation, most of the same functions exist and have just been moved. In the future centralising this will allow us to optimise bottlenecks with the database (in memory state) and make the different parts talking to each other do so in the same way across headscale components. Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>	2025-06-24 07:58:54 +02:00
Kristoffer Dalby	a52f1df180	policy: remove v1 code (#2600) * policy: remove v1 code Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com> * db: update test with v1 removal Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com> * integration: start moving to v2 policy Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com> * policy: add ssh unmarshal tests Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com> * changelog: add entry Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com> * policy: remove v1 comment Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com> * integration: remove comment out case Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com> * cleanup skipv1 Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com> * policy: remove v1 prefix workaround Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com> * policy: add all node ips if prefix/host is ts ip Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com> --------- Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>	2025-05-20 13:57:26 +02:00
Kristoffer Dalby	45e38cb080	policy: reduce routes sent to peers based on packetfilter (#2561) * notifier: use convenience funcs Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com> * policy: reduce routes based on policy Fixes #2365 Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com> * hsic: more helper methods Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com> * policy: more test cases Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com> * integration: add route with filter acl integration test Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com> * integration: correct route reduce test, now failing Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com> * mapper: compare peer routes against node Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com> * hs: more output to debug strings Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com> * types/node: slice.ContainsFunc Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com> * policy: more reduce route test Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com> * changelog: add entry for route filter Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com> --------- Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>	2025-05-04 21:52:47 +02:00
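The two-phase send from commit 3ebe4d99c1 (snapshot under RLock, send unlocked, prune under Lock by pointer identity) can be sketched in Go roughly as follows. This is a minimal illustration, not headscale's code: the types conn and nodeConns and their methods are hypothetical stand-ins for multiChannelNodeConn, and a plain channel send with a timer models the per-connection timeout.

```go
package main

import (
	"sync"
	"time"
)

// conn is a hypothetical stand-in for one map-stream connection of a node.
type conn struct {
	ch chan string
}

// nodeConns is a hypothetical, simplified model of a per-node connection
// set; it is not headscale's multiChannelNodeConn.
type nodeConns struct {
	mu    sync.RWMutex
	conns []*conn
}

func (n *nodeConns) add(c *conn) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.conns = append(n.conns, c)
}

func (n *nodeConns) count() int {
	n.mu.RLock()
	defer n.mu.RUnlock()
	return len(n.conns)
}

// send delivers msg to every connection and returns how many were pruned.
func (n *nodeConns) send(msg string, timeout time.Duration) int {
	// Phase 1: copy the slice of pointers under the read lock.
	n.mu.RLock()
	snapshot := make([]*conn, len(n.conns))
	copy(snapshot, n.conns)
	n.mu.RUnlock()

	// Phase 2: send with no lock held, so receivers that time out
	// do not block concurrent add calls.
	failed := make(map[*conn]struct{})
	for _, c := range snapshot {
		timer := time.NewTimer(timeout)
		select {
		case c.ch <- msg:
		case <-timer.C:
			failed[c] = struct{}{}
		}
		timer.Stop()
	}
	if len(failed) == 0 {
		return 0
	}

	// Phase 3: take the write lock only to drop failed connections by
	// pointer identity; connections added during phase 2 are kept.
	n.mu.Lock()
	defer n.mu.Unlock()
	kept := make([]*conn, 0, len(n.conns))
	removed := 0
	for _, c := range n.conns {
		if _, bad := failed[c]; bad {
			removed++
			continue
		}
		kept = append(kept, c)
	}
	n.conns = kept
	return removed
}
```

The write lock is held only for the final filtering loop, so the cost of N slow receivers is paid outside any critical section, which is the point the commit message makes.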
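Commit da33795e79 replaces a separate load, check and delete with xsync.Map.Compute() so the check and the delete happen atomically. The sketch below shows the same idea with a mutex-guarded map; the registry type and its compute helper are hypothetical stand-ins and deliberately do not reproduce the xsync API.

```go
package main

import "sync"

// entry is a hypothetical per-node record; active counts open connections.
type entry struct {
	active int
}

// registry is a mutex-guarded map standing in for xsync.Map. compute mimics
// an atomic read-modify-delete; it is NOT the xsync API.
type registry struct {
	mu sync.Mutex
	m  map[uint64]*entry
}

func newRegistry() *registry {
	return &registry{m: make(map[uint64]*entry)}
}

// compute runs fn with the lock held. fn returns the value to store, or
// del=true to delete the key (deleting a missing key is a no-op).
func (r *registry) compute(id uint64, fn func(old *entry, loaded bool) (updated *entry, del bool)) {
	r.mu.Lock()
	defer r.mu.Unlock()
	old, loaded := r.m[id]
	updated, del := fn(old, loaded)
	if del {
		delete(r.m, id)
		return
	}
	r.m[id] = updated
}

func (r *registry) connect(id uint64) {
	r.compute(id, func(old *entry, loaded bool) (*entry, bool) {
		if !loaded {
			return &entry{active: 1}, false
		}
		old.active++
		return old, false
	})
}

func (r *registry) disconnect(id uint64) {
	r.compute(id, func(old *entry, loaded bool) (*entry, bool) {
		if !loaded {
			return nil, true
		}
		old.active--
		return old, false
	})
}

// cleanupOffline deletes id only if it has no active connections. The check
// and the delete share one critical section, so a reconnect cannot slip in
// between them (the TOCTOU race described in the commit).
func (r *registry) cleanupOffline(id uint64) (removed bool) {
	r.compute(id, func(old *entry, loaded bool) (*entry, bool) {
		if !loaded {
			return nil, true
		}
		removed = old.active == 0
		return old, removed
	})
	return removed
}

func (r *registry) has(id uint64) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	_, ok := r.m[id]
	return ok
}
```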
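Commit 57070680a5 combines three small concurrency patterns: per-connection pending changes behind a dedicated mutex (appendPending/drainPending), an idempotent close() using sync.Once, and a Start() guarded by an atomic.Bool. A minimal Go sketch of those patterns follows; the nodeConn and batcher types here are hypothetical simplifications, and the worker goroutines are placeholders that only wait for shutdown.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// nodeConn is a hypothetical per-node connection holding its own pending
// changes, loosely modelled on the commit description.
type nodeConn struct {
	ch        chan string
	closeOnce sync.Once

	pendingMu sync.Mutex
	pending   []string
}

func newNodeConn() *nodeConn {
	return &nodeConn{ch: make(chan string, 8)}
}

// appendPending adds changes atomically with respect to drainPending.
func (c *nodeConn) appendPending(changes ...string) {
	c.pendingMu.Lock()
	defer c.pendingMu.Unlock()
	c.pending = append(c.pending, changes...)
}

// drainPending returns and clears the pending changes in one critical
// section, so no change can be appended between the read and the reset.
func (c *nodeConn) drainPending() []string {
	c.pendingMu.Lock()
	defer c.pendingMu.Unlock()
	out := c.pending
	c.pending = nil
	return out
}

// close is idempotent: closing an already-closed channel panics in Go, so
// sync.Once guarantees close(c.ch) runs at most once.
func (c *nodeConn) close() {
	c.closeOnce.Do(func() { close(c.ch) })
}

// batcher shows only the Start guard; its workers are placeholders.
type batcher struct {
	started atomic.Bool
	running atomic.Int32
	done    chan struct{}
}

// Start launches n workers on the first call and returns false on any later
// call, so repeated calls cannot orphan extra goroutines.
func (b *batcher) Start(n int) bool {
	if !b.started.CompareAndSwap(false, true) {
		return false
	}
	for i := 0; i < n; i++ {
		b.running.Add(1)
		go func() {
			defer b.running.Add(-1)
			<-b.done // placeholder for the real worker loop
		}()
	}
	return true
}
```

CompareAndSwap makes the "check started, then set started" step a single atomic operation, which a plain bool read followed by a write would not be.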
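The copy-on-write NodeStore described in commit 9d236571f4 (reads never block, writes replace the whole snapshot) can be illustrated with an atomic.Pointer to an immutable snapshot. This is a sketch under simplifying assumptions: the store and snapshot types are hypothetical, write batching is omitted, and computePeers assumes every node sees every other node, whereas headscale derives peers from policy.

```go
package main

import (
	"slices"
	"sync"
	"sync/atomic"
)

// snapshot is an immutable view of all nodes plus precomputed peer lists.
type snapshot struct {
	nodes map[uint64]string   // node ID -> hostname
	peers map[uint64][]uint64 // node ID -> sorted peer IDs
}

// store: readers load the current snapshot without locking; writers are
// serialised by writeMu and publish a fresh snapshot.
type store struct {
	writeMu sync.Mutex
	cur     atomic.Pointer[snapshot]
}

func newStore() *store {
	s := &store{}
	s.cur.Store(&snapshot{nodes: map[uint64]string{}, peers: map[uint64][]uint64{}})
	return s
}

// Peers never blocks: it reads whichever snapshot is current.
func (s *store) Peers(id uint64) []uint64 {
	return s.cur.Load().peers[id]
}

// Put copies the node map, applies the change, recomputes peers and swaps
// the pointer. Readers still holding the old snapshot keep a consistent view.
func (s *store) Put(id uint64, name string) {
	s.writeMu.Lock()
	defer s.writeMu.Unlock()
	old := s.cur.Load()
	nodes := make(map[uint64]string, len(old.nodes)+1)
	for k, v := range old.nodes {
		nodes[k] = v
	}
	nodes[id] = name
	s.cur.Store(&snapshot{nodes: nodes, peers: computePeers(nodes)})
}

// computePeers assumes a full mesh (every node peers with every other);
// this is an illustrative simplification, not headscale's peer logic.
func computePeers(nodes map[uint64]string) map[uint64][]uint64 {
	peers := make(map[uint64][]uint64, len(nodes))
	for id := range nodes {
		for other := range nodes {
			if other != id {
				peers[id] = append(peers[id], other)
			}
		}
		slices.Sort(peers[id])
	}
	return peers
}
```

Because the expensive peer computation runs once per published snapshot, grouping several writes into one snapshot (as the commit describes) reduces how often it runs.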
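Commit b6d5788231 moves map generation out of the per-node poll streams into a bounded set of mapper workers, so the poll only forwards a finished map. A minimal Go sketch of that bounded worker pool follows; job, buildMap and startWorkers are hypothetical names, buildMap is a placeholder for the expensive map generation, and each out channel stands in for a node's poll.

```go
package main

import (
	"fmt"
	"sync"
)

// job asks a worker to build a map response for one node and hand it to
// that node's poll channel.
type job struct {
	nodeID uint64
	out    chan<- string
}

// buildMap is a hypothetical placeholder for the costly map generation step.
func buildMap(nodeID uint64) string {
	return fmt.Sprintf("map for node %d", nodeID)
}

// startWorkers runs a fixed number of workers, so at most n maps are built
// concurrently regardless of how many nodes are connected.
func startWorkers(n int, jobs <-chan job) *sync.WaitGroup {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				// The poll side only receives a finished map and forwards it.
				j.out <- buildMap(j.nodeID)
			}
		}()
	}
	return &wg
}
```

Following the commit's recommendation, a caller might size n slightly below runtime.NumCPU(); with 30 nodes and 8 cores, this bounds map generation to n concurrent builds instead of 30.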

116 Commits