When a node disconnects, serveLongPoll defers a cleanup that starts a
grace period goroutine. This goroutine polls batcher.IsConnected() and,
if the node has not reconnected within ~10 seconds, calls
state.Disconnect() to mark it offline. A TOCTOU race exists: the node
can reconnect (calling Connect()) between the IsConnected check and
the Disconnect() call, causing the stale Disconnect() to overwrite
the new session's online status.
Fix with a monotonic per-node generation counter:
- State.Connect() increments the counter and returns the current
generation alongside the change list.
- State.Disconnect() accepts the generation from the caller and
rejects the call if a newer generation exists, making stale
disconnects from old sessions a no-op.
- serveLongPoll captures the generation at Connect() time and passes
it to Disconnect() in the deferred cleanup.
- RemoveNode's return value is now checked: if another session already
owns the batcher slot (reconnect happened), the old session skips
the grace period entirely.
Update batcher_test.go to track per-node connect generations and
pass them through to Disconnect(), matching production behavior.
Fixes the following test failures:
- server_state_online_after_reconnect_within_grace
- update_history_no_false_offline
- nodestore_correct_after_rapid_reconnect
- rapid_reconnect_peer_never_sees_offline
Remove XTestBatcherChannelClosingRace (~95 lines) and
XTestBatcherScalability (~515 lines). These were disabled by
prefixing with X (making them invisible to go test) and served
as dead code. The functionality they covered is exercised by the
active test suite.
Updates #2545
- Rename SCREAMING_SNAKE_CASE test constants to idiomatic Go
  camelCase. Remove highLoad* and extremeLoad* constants that were
  only referenced by disabled (X-prefixed) tests.
- Fix misleading assert message that said "1337" while checking
  for region ID 999.
- Remove emoji from test log output to avoid encoding issues
  in CI environments.
Updates #2545
Remove the Batcher interface since there is only one implementation.
Rename LockFreeBatcher to Batcher and merge batcher_lockfree.go into
batcher.go.
Drop type assertions in debug.go now that mapBatcher is a concrete
*mapper.Batcher pointer.
Move per-node pending changes from a shared xsync.Map on the batcher
into multiChannelNodeConn, protected by a dedicated mutex. The new
appendPending/drainPending methods provide atomic append and drain
operations, eliminating data races in addToBatch and
processBatchedChanges.
Add sync.Once to multiChannelNodeConn.close() to make it idempotent,
preventing panics from concurrent close calls on the same channel.
Add started atomic.Bool to guard Start() against being called
multiple times, preventing orphaned goroutines.
Add comprehensive concurrency tests validating these changes.
Add comprehensive unit tests for the LockFreeBatcher covering
AddNode/RemoveNode lifecycle, addToBatch routing (broadcast, targeted,
full update), processBatchedChanges deduplication, cleanup of offline
nodes, close/shutdown behavior, IsConnected state tracking, and
connected map consistency.
Add benchmarks for connection entry send, multi-channel send and
broadcast, peer diff computation, sentPeers updates, addToBatch at
various scales (10/100/1000 nodes), processBatchedChanges, broadcast
delivery, IsConnected lookups, connected map enumeration, connection
churn, and concurrent send+churn scenarios.
Widen setupBatcherWithTestData to accept testing.TB so benchmarks can
reuse the same database-backed test setup as unit tests.
When stale-send cleanup prunes a connection from the batcher, the old
serveLongPoll session needs an explicit stop signal. Pass a stop hook
into AddNode and trigger it when that connection is removed, so the
session exits through its normal cancel path instead of relying on
channel closure from the batcher side.
A connection can already be removed from multiChannelNodeConn by the
stale-send cleanup path before serveLongPoll reaches its deferred
RemoveNode call. In that case RemoveNode used to return early on
"channel not found" and never updated the node's connected state.
Drop that early return so RemoveNode still checks whether any active
connections remain and marks the node disconnected when the last one
is gone.
Generalise the registration pipeline into an auth pipeline supporting
both node registrations and SSH check auth requests.
Rename RegistrationID to AuthID, unexport AuthRequest fields, and
introduce AuthVerdict to unify the auth finish API.
Add the urlParam generic helper for extracting typed URL parameters
from chi routes, used by the new auth request handler.
Updates #1850
Refactor the RequestTags migration (202601121700-migrate-hostinfo-request-tags)
to use PolicyManager.NodeCanHaveTag() instead of reimplementing tag validation.
Changes:
- NewHeadscaleDatabase now accepts *types.Config to allow migrations
access to policy configuration
- Add loadPolicyBytes helper to load policy from file or DB based on config
- Add standalone GetPolicy(tx *gorm.DB) for use during migrations
- Replace custom tag validation logic with PolicyManager
Benefits:
- Full HuJSON parsing support (not just JSON)
- Proper group expansion via PolicyManager
- Support for nested tags and autogroups
- Works with both file and database policy modes
- Single source of truth for tag validation
Co-Authored-By: Shourya Gautam <shouryamgautam@gmail.com>
This commit replaces the ChangeSet with a simpler bool-based change
model that the map builder can use directly to build the appropriate
map response for the change that has occurred. Previously, we fell
back to sending full maps for a lot of changes, as that was considered
"the safe" thing to do to ensure no updates were missed.
This was slightly problematic because a node that already has a list
of peers will only do a full replacement of its peers if the new list
is non-empty, meaning it was not possible to remove all nodes (if,
for example, the policy changed).
Now we keep track of the last-seen nodes, so we can send removed IDs,
and we are much smarter about sending smaller, partial maps when
needed.
Fixes #2389
Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
When the node notifier was replaced with the batcher, we removed the
notifier's shutdown but forgot to add the batcher's, so node
connections were never stopped and waited forever.
Fixes #2751
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Initial work on a nodestore, which stores all of the nodes and their
relations in memory, with peer relationships precalculated.
It is a copy-on-write structure, replacing the "snapshot" when a
change to the structure occurs. It is optimised for reads, and while
batched writes are not fast, they are grouped together to do less of
the expensive peer calculation when many changes arrive rapidly.
Writes block until committed, while reads are never blocked.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Before this patch, we would send a message to each "node stream"
indicating that there is an update that needs to be turned into a
map response and sent to a node.
Producing the map response is a costly affair, which means that while
a node was producing one, it might start blocking and create full
queues from the poller all the way up to where updates were sent.
This could cause updates to time out and be dropped: a bad node going
away, or one spending too much time processing, would cause all the
other nodes to not get any updates.
In addition, it contributed to uncontrolled parallel processing by
potentially doing too many expensive operations at the same time:
each node stream is essentially a channel, meaning that if you have 30
nodes, we will try to process 30 map requests at the same time. If you
have 8 CPU cores, that will saturate all the cores immediately and
cause a lot of wasted switching between the processing.
Now all the maps are processed by workers in the mapper, and the
number of workers is controllable. The recommendation is a bit fewer
workers than the number of CPU cores, allowing us to process maps as
fast as we can and then send them to the poll.
When the poll receives the map, it is only responsible for taking it
and sending it to the node.
This might not directly improve Headscale's raw performance, but it
will likely make performance a lot more consistent. And I would argue
the design is a lot easier to reason about.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>