When send() is called on a node with zero active connections
(disconnected but kept for rapid reconnection), it returns nil
(success). handleNodeChange then calls updateSentPeers, recording
peers as delivered when no client received the data.
This corrupts lastSentPeers: future computePeerDiff calculations
produce wrong results because they compare against phantom state.
After reconnection, the node's initial map resets lastSentPeers,
but any changes processed during the disconnect window leave
stale entries that cause asymmetric peer visibility.
Return errNoActiveConnections from send() when there are no
connections. handleNodeChange treats this as a no-op (the change
was generated but not deliverable) and skips updateSentPeers,
keeping lastSentPeers consistent with what clients actually
received.
The Batcher's connected field (*xsync.Map[types.NodeID, *time.Time])
encoded three states via pointer semantics:
- nil value: node is connected
- non-nil time: node disconnected at that timestamp
- key missing: node was never seen
This was error-prone (nil meaning 'connected' inverts Go idioms),
redundant with b.nodes + hasActiveConnections(), and required keeping
two parallel maps in sync. It also contained a bug in RemoveNode where
new(time.Now()) was used instead of &now, producing a zero time.
Replace the separate connected map with a disconnectedAt field on
multiChannelNodeConn (atomic.Pointer[time.Time]), tracked directly
on the object that already manages the node's connections.
Changes:
- Add disconnectedAt field and helpers (markConnected, markDisconnected,
isConnected, offlineDuration) to multiChannelNodeConn
- Remove the connected field from Batcher
- Simplify IsConnected from two map lookups to one
- Simplify ConnectedMap and Debug from two-map iteration to one
- Rewrite cleanupOfflineNodes to scan b.nodes directly
- Remove the markDisconnectedIfNoConns helper
- Update all tests and benchmarks
Fixes#3141
- TestBatcher_CloseBeforeStart_DoesNotHang: verifies Close() before
Start() returns promptly now that done is initialized in NewBatcher.
- TestBatcher_QueueWorkAfterClose_DoesNotHang: verifies queueWork
returns via the done channel after Close(), even without Start().
- TestIsConnected_FalseAfterAddNodeFailure: verifies IsConnected
returns false after AddNode fails and removes the last connection.
- TestRemoveConnectionAtIndex_NilsTrailingSlot: verifies the backing
array slot is nil-ed after removal to avoid retaining pointers.
Updates #2545
Add four unit tests guarding fixes introduced in recent commits:
- TestConnectionEntry_SendFastPath_TimerStopped: verifies the
time.NewTimer fix (H1) does not leak goroutines after many
fast-path sends on a buffered channel.
- TestBatcher_CloseWaitsForWorkers: verifies Close() blocks until all
worker goroutines exit (H3), preventing sends on torn-down channels.
- TestBatcher_CloseThenStartIsNoop: verifies the one-shot lifecycle
contract; Start() after Close() must not spawn new goroutines.
- TestBatcher_CloseStopsTicker: verifies Close() stops the internal
ticker to prevent resource leaks.
Updates #2545
Add comprehensive unit tests for the LockFreeBatcher covering
AddNode/RemoveNode lifecycle, addToBatch routing (broadcast, targeted,
full update), processBatchedChanges deduplication, cleanup of offline
nodes, close/shutdown behavior, IsConnected state tracking, and
connected map consistency.
Add benchmarks for connection entry send, multi-channel send and
broadcast, peer diff computation, sentPeers updates, addToBatch at
various scales (10/100/1000 nodes), processBatchedChanges, broadcast
delivery, IsConnected lookups, connected map enumeration, connection
churn, and concurrent send+churn scenarios.
Widen setupBatcherWithTestData to accept testing.TB so benchmarks can
reuse the same database-backed test setup as unit tests.