Commit Graph

3 Commits

Author SHA1 Message Date
Kristoffer Dalby
3ebe4d99c1 mapper/batcher: reduce lock contention with two-phase send
Rewrite multiChannelNodeConn.send() to use a two-phase approach:
1. RLock: snapshot connections slice (cheap pointer copy)
2. Unlock: send to all connections (50ms timeouts happen here)
3. Lock: remove failed connections by pointer identity

Previously, send() held the write lock for the entire duration of
sending to all connections. With N stale connections each timing out
at 50ms, this blocked addConnection/removeConnection for N*50ms.
The two-phase approach holds the lock only for O(N) pointer
operations, not for N*50ms I/O waits.
2026-03-14 02:52:28 -07:00
Kristoffer Dalby
da33795e79 mapper/batcher: fix race conditions in cleanup and lookups
Replace the two-phase Load-check-Delete in cleanupOfflineNodes with
xsync.Map.Compute() for atomic check-and-delete. This prevents the
TOCTOU race where a node reconnects between the hasActiveConnections
check and the Delete call.

Add nil guards on all b.nodes.Load() and b.nodes.Range() call sites
to prevent nil pointer panics from concurrent cleanup races.
2026-03-14 02:52:28 -07:00
Kristoffer Dalby
57070680a5 mapper/batcher: restructure internals for correctness
Move per-node pending changes from a shared xsync.Map on the batcher
into multiChannelNodeConn, protected by a dedicated mutex. The new
appendPending/drainPending methods provide atomic append and drain
operations, eliminating data races in addToBatch and
processBatchedChanges.

Add sync.Once to multiChannelNodeConn.close() to make it idempotent,
preventing panics from concurrent close calls on the same channel.

Add started atomic.Bool to guard Start() against being called
multiple times, preventing orphaned goroutines.

Add comprehensive concurrency tests validating these changes.
2026-03-14 02:52:28 -07:00