headscale

mirror of https://github.com/juanfont/headscale.git synced 2026-04-18 06:50:16 +02:00

Author	SHA1	Message	Date
Kristoffer Dalby	3ebe4d99c1	mapper/batcher: reduce lock contention with two-phase send Rewrite multiChannelNodeConn.send() to use a two-phase approach: 1. RLock: snapshot connections slice (cheap pointer copy) 2. Unlock: send to all connections (50ms timeouts happen here) 3. Lock: remove failed connections by pointer identity Previously, send() held the write lock for the entire duration of sending to all connections. With N stale connections each timing out at 50ms, this blocked addConnection/removeConnection for N*50ms. The two-phase approach holds the lock only for O(N) pointer operations, not for N*50ms I/O waits.	2026-03-14 02:52:28 -07:00
Kristoffer Dalby	da33795e79	mapper/batcher: fix race conditions in cleanup and lookups Replace the two-phase Load-check-Delete in cleanupOfflineNodes with xsync.Map.Compute() for atomic check-and-delete. This prevents the TOCTOU race where a node reconnects between the hasActiveConnections check and the Delete call. Add nil guards on all b.nodes.Load() and b.nodes.Range() call sites to prevent nil pointer panics from concurrent cleanup races.	2026-03-14 02:52:28 -07:00
Kristoffer Dalby	57070680a5	mapper/batcher: restructure internals for correctness Move per-node pending changes from a shared xsync.Map on the batcher into multiChannelNodeConn, protected by a dedicated mutex. The new appendPending/drainPending methods provide atomic append and drain operations, eliminating data races in addToBatch and processBatchedChanges. Add sync.Once to multiChannelNodeConn.close() to make it idempotent, preventing panics from concurrent close calls on the same channel. Add started atomic.Bool to guard Start() against being called multiple times, preventing orphaned goroutines. Add comprehensive concurrency tests validating these changes.	2026-03-14 02:52:28 -07:00
DM	4aca9d6568	poll: stop stale map sessions through an explicit teardown hook When stale-send cleanup prunes a connection from the batcher, the old serveLongPoll session needs an explicit stop signal. Pass a stop hook into AddNode and trigger it when that connection is removed, so the session exits through its normal cancel path instead of relying on channel closure from the batcher side.	2026-03-12 01:27:34 -07:00
DM	3daf45e88a	mapper: close stale map channels after send timeouts When the batcher timed out sending to a node, it removed the channel from multiChannelNodeConn but left the old serveLongPoll goroutine running on that channel. That left a live stale session behind: it no longer received new updates, but it could still keep the stream open and block shutdown. Close the pruned channel when stale-send cleanup removes it so the old map session exits after draining any buffered update.	2026-03-12 01:27:34 -07:00
DM	b81d6c734d	mapper: handle RemoveNode after channel cleanup A connection can already be removed from multiChannelNodeConn by the stale-send cleanup path before serveLongPoll reaches its deferred RemoveNode call. In that case RemoveNode used to return early on "channel not found" and never updated the node's connected state. Drop that early return so RemoveNode still checks whether any active connections remain and marks the node disconnected when the last one is gone.	2026-03-12 01:27:34 -07:00
Kristoffer Dalby	0f6d312ada	all: upgrade to Go 1.26rc2 and modernize codebase This commit upgrades the codebase from Go 1.25.5 to Go 1.26rc2 and adopts new language features. Toolchain updates: - go.mod: go 1.25.5 → go 1.26rc2 - flake.nix: buildGo125Module → buildGo126Module, go_1_25 → go_1_26 - flake.nix: build golangci-lint from source with Go 1.26 - Dockerfile.integration: golang:1.25-trixie → golang:1.26rc2-trixie - Dockerfile.tailscale-HEAD: golang:1.25-alpine → golang:1.26rc2-alpine - Dockerfile.derper: golang:alpine → golang:1.26rc2-alpine - .goreleaser.yml: go mod tidy -compat=1.25 → -compat=1.26 - cmd/hi/run.go: fallback Go version 1.25 → 1.26rc2 - .pre-commit-config.yaml: simplify golangci-lint hook entry Code modernization using Go 1.26 features: - Replace tsaddr.SortPrefixes with slices.SortFunc + netip.Prefix.Compare - Replace ptr.To(x) with new(x) syntax - Replace errors.As with errors.AsType[T] Lint rule updates: - Add forbidigo rules to prevent regression to old patterns	2026-02-08 12:35:23 +01:00
Kristoffer Dalby	ce580f8245	all: fix golangci-lint issues (#3064)	2026-02-06 21:45:32 +01:00
Kristoffer Dalby	3acce2da87	errors: rewrite errors to follow Go best practices Errors should not start capitalised, and they should not contain the word "error" or state that they "failed", as we already know it is an error. Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>	2026-02-06 07:40:29 +01:00
Kristoffer Dalby	4a9a329339	all: use lowercase log messages Go style recommends that log messages and error strings should not be capitalized (unless beginning with proper nouns or acronyms) and should not end with punctuation. This change normalizes all zerolog .Msg() and .Msgf() calls to start with lowercase letters, following Go conventions and making logs more consistent across the codebase.	2026-02-06 07:40:29 +01:00
Kristoffer Dalby	53cdeff129	hscontrol/mapper: use sub-loggers and zf constants Add sub-logger patterns to worker(), AddNode(), RemoveNode() and multiChannelNodeConn to eliminate repeated field calls. Use zf.* constants for consistent field naming. Changes in batcher_lockfree.go: - Add wlog sub-logger in worker() with worker.id context - Add log field to multiChannelNodeConn struct - Initialize mc.log with node.id in newMultiChannelNodeConn() - Add nlog sub-loggers in AddNode() and RemoveNode() - Update all connection methods to use mc.log Changes in batcher.go: - Use zf.NodeID and zf.Reason in handleNodeChange()	2026-02-06 07:40:29 +01:00
Justin Angel	7be20912f5	oidc: make email verification configurable Co-authored-by: Kristoffer Dalby <kristoffer@tailscale.com>	2025-12-18 11:42:32 +00:00
Kristoffer Dalby	82d4275c3b	mapper: correct some variable names missed from change Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>	2025-12-17 13:19:26 +01:00
Kristoffer Dalby	f3767dddf8	batcher: ensure removal from batcher Fixes #2924 Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>	2025-12-17 13:19:26 +01:00
Kristoffer Dalby	5767ca5085	change: smarter change notifications This commit replaces the ChangeSet with a simpler bool based change model that can be directly used in the map builder to build the appropriate map response based on the change that has occurred. Previously, we fell back to sending full maps for a lot of changes as that was considered "the safe" thing to do to ensure no updates were missed. This was slightly problematic as a node that already has a list of peers will only do full replacement of the peers if the list is non-empty, meaning that it was not possible to remove all nodes (if for example policy changed). Now we will keep track of last seen nodes, so we can send remove ids, but also we are much smarter on how we send smaller, partial maps when needed. Fixes #2389 Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>	2025-12-16 10:12:36 +01:00
Kristoffer Dalby	616c0e895d	batcher: fix closed panic Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>	2025-12-15 16:28:27 +01:00
Kristoffer Dalby	0e1673041c	all: remove deadcode (#2952)	2025-12-10 15:55:15 +01:00
Kristoffer Dalby	2bf1200483	policy: fix autogroup:self propagation and optimize cache invalidation (#2807)	2025-10-23 17:57:41 +02:00
Florian Preinstorfer	46477b8021	Downgrade completed broadcast message to debug	2025-10-18 07:56:59 +02:00
Kristoffer Dalby	d41fb4d540	app: fix sigint hanging When the node notifier was replaced with batcher, we removed its closing, but forgot to add the batcher's, so it was never stopping node connections and waiting forever. Fixes #2751 Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>	2025-09-11 11:53:26 +02:00
Kristoffer Dalby	233dffc186	lint and leftover Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>	2025-09-09 09:40:00 +02:00
Kristoffer Dalby	9d236571f4	state/nodestore: in memory representation of nodes Initial work on a nodestore which stores all of the nodes and their relations in memory with relationship for peers precalculated. It is a copy-on-write structure, replacing the "snapshot" when a change to the structure occurs. It is optimised for reads, and while batches are not fast, they are grouped together to do less of the expensive peer calculation if there are many changes rapidly. Writes will block until committed, while reads are never blocked. Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>	2025-09-09 09:40:00 +02:00
Kristoffer Dalby	8e25f7f9dd	bunch of qol (#2748)	2025-08-27 17:09:13 +02:00
Kristoffer Dalby	a058bf3cd3	mapper: produce map before poll (#2628)	2025-07-28 11:15:53 +02:00
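
The two-phase send described in commit 3ebe4d99c1 can be sketched roughly as follows. This is a minimal illustration written from the commit message alone, not the headscale implementation: the `conn` and `nodeConn` types, the `add` helper, and the `send` signature are hypothetical stand-ins for multiChannelNodeConn and its connections.

```go
package main

import (
	"fmt"
	"slices"
	"sync"
	"time"
)

// conn is a hypothetical stand-in for a single client connection.
type conn struct {
	id string
	ch chan string
}

// nodeConn is a hypothetical stand-in for multiChannelNodeConn.
type nodeConn struct {
	mu    sync.RWMutex
	conns []*conn
}

// add registers a connection under the write lock.
func (n *nodeConn) add(c *conn) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.conns = append(n.conns, c)
}

// send delivers msg to every connection and returns how many stale
// connections were pruned. No lock is held while waiting on timeouts.
func (n *nodeConn) send(msg string, timeout time.Duration) int {
	// Step 1: snapshot the slice under the read lock (pointer copy only).
	n.mu.RLock()
	snapshot := slices.Clone(n.conns)
	n.mu.RUnlock()

	// Step 2: send with no lock held; slow connections time out here.
	failed := make(map[*conn]struct{})
	for _, c := range snapshot {
		timer := time.NewTimer(timeout)
		select {
		case c.ch <- msg:
		case <-timer.C:
			failed[c] = struct{}{}
		}
		timer.Stop()
	}
	if len(failed) == 0 {
		return 0
	}

	// Step 3: take the write lock briefly and drop failed connections
	// by pointer identity. Connections added meanwhile are untouched.
	n.mu.Lock()
	defer n.mu.Unlock()
	before := len(n.conns)
	n.conns = slices.DeleteFunc(n.conns, func(c *conn) bool {
		_, bad := failed[c]
		return bad
	})
	return before - len(n.conns)
}

func main() {
	n := &nodeConn{}
	n.add(&conn{id: "healthy", ch: make(chan string, 1)}) // buffered: send succeeds
	n.add(&conn{id: "stale", ch: make(chan string)})      // no reader: send times out

	pruned := n.send("update", 50*time.Millisecond)
	fmt.Printf("pruned %d, remaining %d\n", pruned, len(n.conns))
}
```

Matching on pointer identity in the last step matters because the slice may have changed while the lock was released, so indices taken from the snapshot could point at the wrong entries.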

24 Commits
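
Commit 57070680a5 describes per-node pending changes guarded by a dedicated mutex, plus an idempotent close via sync.Once. A rough sketch of that pattern, assuming a hypothetical `pendingNode` type and a string-based `change` type (the real types in headscale are not shown in this log and may differ), could look like this:

```go
package main

import (
	"fmt"
	"sync"
)

// change is a hypothetical placeholder for a batched change.
type change string

// pendingNode is a hypothetical stand-in for the per-node state
// that the commit moves into multiChannelNodeConn.
type pendingNode struct {
	pendingMu sync.Mutex
	pending   []change

	closeOnce sync.Once
	done      chan struct{}
}

func newPendingNode() *pendingNode {
	return &pendingNode{done: make(chan struct{})}
}

// appendPending atomically adds changes to the node's pending list.
func (p *pendingNode) appendPending(cs ...change) {
	p.pendingMu.Lock()
	defer p.pendingMu.Unlock()
	p.pending = append(p.pending, cs...)
}

// drainPending atomically takes all pending changes and resets the list,
// so no change can be lost or delivered twice between append and drain.
func (p *pendingNode) drainPending() []change {
	p.pendingMu.Lock()
	defer p.pendingMu.Unlock()
	out := p.pending
	p.pending = nil
	return out
}

// close is safe to call from several goroutines; the channel is
// closed exactly once, so a second call cannot panic.
func (p *pendingNode) close() {
	p.closeOnce.Do(func() { close(p.done) })
}

func main() {
	p := newPendingNode()

	// Append concurrently from several goroutines.
	var wg sync.WaitGroup
	for w := 0; w < 8; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 100; i++ {
				p.appendPending("peer-changed")
			}
		}()
	}
	wg.Wait()

	fmt.Println("drained:", len(p.drainPending()))
	fmt.Println("drained again:", len(p.drainPending()))

	p.close()
	p.close() // second call is a no-op
}
```

Keeping the pending list on the node itself, rather than in a shared map, means append and drain contend only on that node's mutex.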