headscale

mirror of https://github.com/juanfont/headscale.git synced 2026-03-24 10:21:33 +01:00

Author	SHA1	Message	Date
Tanayk07	568baf3d02	fix: align banner right-side border to consistent 64-char width	2026-03-19 07:08:35 +01:00
Tanayk07	5105033224	feat: add prominent warning banner for non-standard IP prefixes Add a highly visible ASCII-art warning banner that is printed at startup when the configured IP prefixes fall outside the standard Tailscale CGNAT (100.64.0.0/10) or ULA (fd7a:115c:a1e0::/48) ranges. The warning fires once even if both v4 and v6 are non-standard, and the warnBanner() function is reusable for other critical configuration warnings in the future. Also updates config-example.yaml to clarify that subsets of the default ranges are fine, but ranges outside CGNAT/ULA are not. Closes #3055	2026-03-19 07:08:35 +01:00
Kristoffer Dalby	3d53f97c82	hscontrol/servertest: fix test expectations for eventual consistency Three corrections to issue tests that had wrong assumptions about when data becomes available: 1. initial_map_should_include_peer_online_status: use WaitForCondition instead of checking the initial netmap. Online status is set by Connect() which sends a PeerChange patch after the initial RegisterResponse, so it may not be present immediately. 2. disco_key_should_propagate_to_peers: use WaitForCondition. The DiscoKey is sent in the first MapRequest (not RegisterRequest), so peers may not see it until a subsequent map update. 3. approved_route_without_announcement: invert the test expectation. Tailscale uses a strict advertise-then-approve model -- routes are only distributed when the node advertises them (Hostinfo.RoutableIPs) AND they are approved. An approval without advertisement is a dormant pre-approval. The test now asserts the route does NOT appear in AllowedIPs, matching upstream Tailscale semantics. Also fix TestClient.Reconnect to clear the cached netmap and drain pending updates before re-registering. Without this, WaitForPeers returned immediately based on the old session's stale data.	2026-03-19 07:05:58 +01:00
Kristoffer Dalby	1053fbb16b	hscontrol/state: fix online status reset during re-registration Two fixes to how online status is handled during registration: 1. Re-registration (applyAuthNodeUpdate, HandleNodeFromPreAuthKey) no longer resets IsOnline to false. Online status is managed exclusively by Connect()/Disconnect() in the poll session lifecycle. The reset caused a false offline blip: the auth handler's change notification triggered a map regeneration showing the node as offline to peers, even though Connect() would set it back to true moments later. 2. New node creation (createAndSaveNewNode) now explicitly sets IsOnline=false instead of leaving it nil. This ensures peers always receive a known online status rather than an ambiguous nil/unknown.	2026-03-19 07:05:58 +01:00
Kristoffer Dalby	b09af3846b	hscontrol/poll,state: fix grace period disconnect TOCTOU race When a node disconnects, serveLongPoll defers a cleanup that starts a grace period goroutine. This goroutine polls batcher.IsConnected() and, if the node has not reconnected within ~10 seconds, calls state.Disconnect() to mark it offline. A TOCTOU race exists: the node can reconnect (calling Connect()) between the IsConnected check and the Disconnect() call, causing the stale Disconnect() to overwrite the new session's online status. Fix with a monotonic per-node generation counter: - State.Connect() increments the counter and returns the current generation alongside the change list. - State.Disconnect() accepts the generation from the caller and rejects the call if a newer generation exists, making stale disconnects from old sessions a no-op. - serveLongPoll captures the generation at Connect() time and passes it to Disconnect() in the deferred cleanup. - RemoveNode's return value is now checked: if another session already owns the batcher slot (reconnect happened), the old session skips the grace period entirely. Update batcher_test.go to track per-node connect generations and pass them through to Disconnect(), matching production behavior. Fixes the following test failures: - server_state_online_after_reconnect_within_grace - update_history_no_false_offline - nodestore_correct_after_rapid_reconnect - rapid_reconnect_peer_never_sees_offline	2026-03-19 07:05:58 +01:00
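The generation-counter fix can be sketched in Go as follows. `onlineState`, its maps, and the single mutex are hypothetical simplifications of headscale's State and NodeStore. Only the Connect/Disconnect contract follows the commit message: Connect returns a generation, and a Disconnect carrying an older generation is a no-op.

```go
package main

import (
	"fmt"
	"sync"
)

// NodeID mirrors the idea of types.NodeID. This whole file is a sketch.
type NodeID uint64

// onlineState is a hypothetical stand-in for State: each node carries a
// monotonic generation next to its online flag.
type onlineState struct {
	mu     sync.Mutex
	gen    map[NodeID]uint64
	online map[NodeID]bool
}

func newOnlineState() *onlineState {
	return &onlineState{gen: map[NodeID]uint64{}, online: map[NodeID]bool{}}
}

// Connect bumps the generation and hands it back to the poll session.
func (s *onlineState) Connect(id NodeID) uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.gen[id]++
	s.online[id] = true
	return s.gen[id]
}

// Disconnect is ignored when a newer session has connected since gen was
// issued, so a late grace-period cleanup cannot mark a live node offline.
func (s *onlineState) Disconnect(id NodeID, gen uint64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if gen < s.gen[id] {
		return false
	}
	s.online[id] = false
	return true
}

func (s *onlineState) IsOnline(id NodeID) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.online[id]
}

func main() {
	s := newOnlineState()
	oldGen := s.Connect(1) // first session
	s.Connect(1)           // node reconnects inside the grace period
	// The old session's deferred cleanup finally runs:
	fmt.Println(s.Disconnect(1, oldGen), s.IsOnline(1)) // false true
}
```

Because the check and the write happen under the same lock, no interleaving of the two sessions can reproduce the IsConnected/Disconnect gap.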
Kristoffer Dalby	00c41b6422	hscontrol/servertest: add race, stress, and poll race tests Add three test files designed to stress the control plane under concurrent and adversarial conditions: - race_test.go: 14 tests exercising concurrent mutations, session replacement, batcher contention, NodeStore access, and map response delivery during disconnect. All pass the Go race detector. - poll_race_test.go: 8 tests targeting the poll.go grace period interleaving. These confirm a logical TOCTOU race: when a node disconnects and reconnects within the grace period, the old session's deferred Disconnect() can overwrite the new session's Connect(), leaving IsOnline=false despite an active poll session. - stress_test.go: sustained churn, rapid mutations, rolling replacement, data integrity checks under load, and verification that rapid reconnects do not leak false-offline notifications. Known failing tests (grace period TOCTOU race): - server_state_online_after_reconnect_within_grace - update_history_no_false_offline - rapid_reconnect_peer_never_sees_offline	2026-03-19 07:05:58 +01:00
Kristoffer Dalby	ab4e205ce7	hscontrol/servertest: expand issue tests to 24 scenarios, surface 4 issues Split TestIssues into 7 focused test functions to stay under cyclomatic complexity limits while testing more aggressively. Issues surfaced (4 failing tests): 1. initial_map_should_include_peer_online_status: Initial MapResponse has Online=nil for peers. Online status only arrives later via PeersChangedPatch. 2. disco_key_should_propagate_to_peers: DiscoPublicKey set by client is not visible to peers. Peers see zero disco key. 3. approved_route_without_announcement_is_visible: Server-side route approval without client-side announcement silently produces empty SubnetRoutes (intersection of empty announced + approved = empty). 4. nodestore_correct_after_rapid_reconnect: After 5 rapid reconnect cycles, NodeStore reports node as offline despite having an active poll session. The connect/disconnect grace period interleaving leaves IsOnline in an incorrect state. Passing tests (20) verify: - IP uniqueness across 10 nodes - IP stability across reconnect - New peers have addresses immediately - Node rename propagates to peers - Node delete removes from all peer lists - Hostinfo changes (OS field) propagate - NodeStore/DB consistency after route mutations - Grace period timing (8-20s window) - Ephemeral node deletion (not just offline) - 10-node simultaneous connect convergence - Rapid sequential node additions - Reconnect produces complete map - Cross-user visibility with default policy - Same-user multiple nodes get distinct IDs - Same-hostname nodes get unique GivenNames - Policy change during connect still converges - DERP region references are valid - User profiles present for self and peers - Self-update arrives after route approval - Route advertisement stored as AnnouncedRoutes	2026-03-19 07:05:58 +01:00
Kristoffer Dalby	f87b08676d	hscontrol/servertest: add policy, route, ephemeral, and content tests Extend the servertest harness with: - TestClient.Direct() accessor for advanced operations - TestClient.WaitForPeerCount and WaitForCondition helpers - TestHarness.ChangePolicy for ACL policy testing - AssertDERPMapPresent and AssertSelfHasAddresses New test suites: - content_test.go: self node, DERP map, peer properties, user profiles, update history monotonicity, and endpoint update propagation - policy_test.go: default allow-all, explicit policy, policy triggers updates on all nodes, multiple policy changes, multi-user mesh - ephemeral_test.go: ephemeral connect, cleanup after disconnect, mixed ephemeral/regular, reconnect prevents cleanup - routes_test.go: addresses in AllowedIPs, route advertise and approve, advertised routes via hostinfo, CGNAT range validation Also fix node_departs test to use WaitForCondition instead of assert.Eventually, and convert concurrent_join_and_leave to interleaved_join_and_leave with grace-period-tolerant assertions.	2026-03-19 07:05:58 +01:00
Kristoffer Dalby	ca7362e9aa	hscontrol/servertest: add control plane lifecycle and consistency tests Add three test files exercising the servertest harness: - lifecycle_test.go: connection, disconnection, reconnection, session replacement, and mesh formation at various sizes. - consistency_test.go: symmetric visibility, consistent peer state, address presence, concurrent join/leave convergence. - weather_test.go: rapid reconnects, flapping stability, reconnect with various delays, concurrent reconnects, and scale tests. All tests use table-driven patterns with subtests.	2026-03-19 07:05:58 +01:00
Kristoffer Dalby	0288614bdf	hscontrol: add servertest harness for in-process control plane testing Add a new hscontrol/servertest package that provides a test harness for exercising the full Headscale control protocol in-process, using Tailscale's controlclient.Direct as the client. The harness consists of: - TestServer: wraps a Headscale instance with an httptest.Server - TestClient: wraps controlclient.Direct with NetworkMap tracking - TestHarness: orchestrates N clients against a single server - Assertion helpers for mesh completeness, visibility, and consistency Export minimal accessor methods on Headscale (HTTPHandler, NoisePublicKey, GetState, SetServerURL, StartBatcher, StartEphemeralGC) so the servertest package can construct a working server from outside the hscontrol package. This enables fast, deterministic tests of connection lifecycle, update propagation, and network weather scenarios without Docker.	2026-03-19 07:05:58 +01:00
Kristoffer Dalby	82c7efccf8	mapper/batcher: serialize per-node work to prevent out-of-order delivery processBatchedChanges queued each pending change for a node as a separate work item. Since multiple workers pull from the same channel, two changes for the same node could be processed concurrently by different workers. This caused two problems: 1. MapResponses delivered out of order — a later change could finish generating before an earlier one, so the client sees stale state. 2. updateSentPeers and computePeerDiff race against each other — updateSentPeers does Clear() + Store() which is not atomic relative to a concurrent Range() in computePeerDiff. Bundle all pending changes for a node into a single work item so one worker processes them sequentially. Add a per-node workMu that serializes processing across consecutive batch ticks, preventing a second worker from starting tick N+1 while tick N is still in progress. Fixes #3140	2026-03-19 07:05:58 +01:00
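The following simplified Go sketch shows the bundling idea. It assumes a change can be modeled as an int and that delivery is just an append. Each node's pending changes are drained into one work item, and a per-node `workMu` is held while that item is processed. The names `nodeState`, `workItem`, `process`, and `runTick` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// change is a hypothetical stand-in for a pending update.
type change int

// nodeState is a hypothetical per-node record.
type nodeState struct {
	pendingMu sync.Mutex
	pending   []change

	workMu    sync.Mutex // held for a whole bundle, across batch ticks
	delivered []change   // what the "client" saw, in order
}

func (n *nodeState) appendPending(c change) {
	n.pendingMu.Lock()
	n.pending = append(n.pending, c)
	n.pendingMu.Unlock()
}

func (n *nodeState) drainPending() []change {
	n.pendingMu.Lock()
	defer n.pendingMu.Unlock()
	out := n.pending
	n.pending = nil
	return out
}

// workItem bundles every pending change for one node.
type workItem struct {
	node    *nodeState
	changes []change
}

// process handles one bundle sequentially while holding workMu, so a second
// worker cannot start the next tick's bundle for the same node in parallel.
func process(w workItem) {
	w.node.workMu.Lock()
	defer w.node.workMu.Unlock()
	w.node.delivered = append(w.node.delivered, w.changes...)
}

// runTick drains each node once and fans the bundles out to workers.
func runTick(nodes []*nodeState, workers int) {
	work := make(chan workItem)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for w := range work {
				process(w)
			}
		}()
	}
	for _, n := range nodes {
		if cs := n.drainPending(); len(cs) > 0 {
			work <- workItem{node: n, changes: cs}
		}
	}
	close(work)
	wg.Wait()
}

func main() {
	n := &nodeState{}
	for i := 1; i <= 5; i++ {
		n.appendPending(change(i))
	}
	runTick([]*nodeState{n}, 4)
	fmt.Println(n.delivered) // [1 2 3 4 5]
}
```

In this sketch `runTick` waits for its workers before returning, so `workMu` only matters when ticks overlap, as they can in the real batcher.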
Kristoffer Dalby	87b8507ac9	mapper/batcher: replace connected map with per-node disconnectedAt The Batcher's connected field (xsync.Map[types.NodeID, time.Time]) encoded three states via pointer semantics: - nil value: node is connected - non-nil time: node disconnected at that timestamp - key missing: node was never seen This was error-prone (nil meaning 'connected' inverts Go idioms), redundant with b.nodes + hasActiveConnections(), and required keeping two parallel maps in sync. It also contained a bug in RemoveNode where new(time.Now()) was used instead of &now, producing a zero time. Replace the separate connected map with a disconnectedAt field on multiChannelNodeConn (atomic.Pointer[time.Time]), tracked directly on the object that already manages the node's connections. Changes: - Add disconnectedAt field and helpers (markConnected, markDisconnected, isConnected, offlineDuration) to multiChannelNodeConn - Remove the connected field from Batcher - Simplify IsConnected from two map lookups to one - Simplify ConnectedMap and Debug from two-map iteration to one - Rewrite cleanupOfflineNodes to scan b.nodes directly - Remove the markDisconnectedIfNoConns helper - Update all tests and benchmarks Fixes #3141	2026-03-16 02:22:56 -07:00
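Here is a Go sketch of the per-node `disconnectedAt` field, built on the generic `atomic.Pointer` from `sync/atomic` (Go 1.19+). The helper names follow the commit message. Their bodies are assumptions, including the choice that a nil pointer means "no disconnect recorded", i.e. connected.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// nodeConn is a hypothetical, trimmed multiChannelNodeConn. In this sketch a
// nil disconnectedAt means connected, and a non-nil value is the time the last
// connection went away. "Never seen" is simply the absence of a nodeConn.
type nodeConn struct {
	disconnectedAt atomic.Pointer[time.Time]
}

func (n *nodeConn) markConnected() { n.disconnectedAt.Store(nil) }

func (n *nodeConn) markDisconnected() {
	now := time.Now()
	n.disconnectedAt.Store(&now) // address of a real timestamp, never a zero time
}

func (n *nodeConn) isConnected() bool { return n.disconnectedAt.Load() == nil }

// offlineDuration is 0 while connected.
func (n *nodeConn) offlineDuration() time.Duration {
	if t := n.disconnectedAt.Load(); t != nil {
		return time.Since(*t)
	}
	return 0
}

func main() {
	var n nodeConn
	fmt.Println(n.isConnected()) // true
	n.markDisconnected()
	fmt.Println(n.isConnected()) // false
	n.markConnected()
	fmt.Println(n.isConnected(), n.offlineDuration()) // true 0s
}
```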
Kristoffer Dalby	60317064fd	mapper/batcher: serialize per-node work to prevent out-of-order delivery processBatchedChanges queued each pending change for a node as a separate work item. Since multiple workers pull from the same channel, two changes for the same node could be processed concurrently by different workers. This caused two problems: 1. MapResponses delivered out of order — a later change could finish generating before an earlier one, so the client sees stale state. 2. updateSentPeers and computePeerDiff race against each other — updateSentPeers does Clear() + Store() which is not atomic relative to a concurrent Range() in computePeerDiff. Bundle all pending changes for a node into a single work item so one worker processes them sequentially. Add a per-node workMu that serializes processing across consecutive batch ticks, preventing a second worker from starting tick N+1 while tick N is still in progress. Fixes #3140	2026-03-16 02:22:46 -07:00
Juan Font	4d427cfe2a	noise: limit request body size to prevent unauthenticated OOM The Noise handshake accepts any machine key without checking registration, so all endpoints behind the Noise router are reachable without credentials. Three handlers used io.ReadAll without size limits, allowing an attacker to OOM-kill the server. Fix: - Add http.MaxBytesReader middleware (1 MiB) on the Noise router. - Replace io.ReadAll + json.Unmarshal with json.NewDecoder in PollNetMapHandler and RegistrationHandler. - Stop reading the body in NotImplementedHandler entirely.	2026-03-16 09:28:31 +01:00
Kristoffer Dalby	afd3a6acbc	mapper/batcher: remove disabled X-prefixed test functions Remove XTestBatcherChannelClosingRace (~95 lines) and XTestBatcherScalability (~515 lines). These were disabled by prefixing with X (making them invisible to go test) and served as dead code. The functionality they covered is exercised by the active test suite. Updates #2545	2026-03-14 02:52:28 -07:00
Kristoffer Dalby	feaf85bfbc	mapper/batcher: clean up test constants and output L8: Rename SCREAMING_SNAKE_CASE test constants to idiomatic Go camelCase. Remove highLoad* and extremeLoad* constants that were only referenced by disabled (X-prefixed) tests. L10: Fix misleading assert message that said "1337" while checking for region ID 999. L12: Remove emoji from test log output to avoid encoding issues in CI environments. Updates #2545	2026-03-14 02:52:28 -07:00
Kristoffer Dalby	86e279869e	mapper/batcher: minor production code cleanup L1: Replace crypto/rand with an atomic counter for generating connection IDs. These identifiers are process-local and do not need cryptographic randomness; a monotonic counter is cheaper and produces shorter, sortable IDs. L5: Use getActiveConnectionCount() in Debug() instead of directly locking the mutex and reading the connections slice. This avoids bypassing the accessor that already exists for this purpose. L6: Extract the hardcoded 15*time.Minute cleanup threshold into the named constant offlineNodeCleanupThreshold. L7: Inline the trivial addWork wrapper; AddWork now calls addToBatch directly. Updates #2545	2026-03-14 02:52:28 -07:00
Kristoffer Dalby	7881f65358	mapper: extract node connection types to node_conn.go Move connectionEntry, multiChannelNodeConn, generateConnectionID, and all their methods from batcher.go into a dedicated file. This reduces batcher.go from ~1170 lines to ~800 and separates per-node connection management from batcher orchestration. Pure move — no logic changes. Updates #2545	2026-03-14 02:52:28 -07:00
Kristoffer Dalby	2d549e579f	mapper/batcher: add regression tests for M1, M3, M7 fixes - TestBatcher_CloseBeforeStart_DoesNotHang: verifies Close() before Start() returns promptly now that done is initialized in NewBatcher. - TestBatcher_QueueWorkAfterClose_DoesNotHang: verifies queueWork returns via the done channel after Close(), even without Start(). - TestIsConnected_FalseAfterAddNodeFailure: verifies IsConnected returns false after AddNode fails and removes the last connection. - TestRemoveConnectionAtIndex_NilsTrailingSlot: verifies the backing array slot is nil-ed after removal to avoid retaining pointers. Updates #2545	2026-03-14 02:52:28 -07:00
Kristoffer Dalby	50e8b21471	mapper/batcher: fix pointer retention, done-channel init, and connected-map races M7: Nil out trailing *connectionEntry pointers in the backing array after slice removal in removeConnectionAtIndexLocked and send(). Without this, the GC cannot collect removed entries until the slice is reallocated. M1: Initialize the done channel in NewBatcher instead of Start(). Previously, calling Close() or queueWork before Start() would select on a nil channel, blocking forever. Moving the make() to the constructor ensures the channel is always usable. M2: Move b.connected.Delete and b.totalNodes decrement inside the Compute callback in cleanupOfflineNodes. Previously these ran after the Compute returned, allowing a concurrent AddNode to reconnect between the delete and the bookkeeping update, which would wipe the fresh connected state. M3: Call markDisconnectedIfNoConns on AddNode error paths. Previously, when initial map generation or send timed out, the connection was removed but b.connected retained its old nil (= connected) value, making IsConnected return true for a node with zero connections. Updates #2545	2026-03-14 02:52:28 -07:00
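This Go sketch isolates the M7 fix, using a hypothetical `removeAt` helper and a placeholder `connectionEntry`. Since Go 1.22, `slices.Delete` also zeroes the vacated tail, which has the same effect.

```go
package main

import "fmt"

// connectionEntry is a placeholder for the real type.
type connectionEntry struct{ id string }

// removeAt (hypothetical helper) deletes s[i] in place and nils the trailing
// slot, so the backing array stops referencing the removed entry and the GC
// can reclaim it before the slice is ever reallocated.
func removeAt(s []*connectionEntry, i int) []*connectionEntry {
	copy(s[i:], s[i+1:])
	s[len(s)-1] = nil
	return s[:len(s)-1]
}

func main() {
	s := []*connectionEntry{{"a"}, {"b"}, {"c"}}
	s = removeAt(s, 1)
	fmt.Println(len(s), cap(s), s[0].id, s[1].id) // 2 3 a c
}
```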
Kristoffer Dalby	8e26651f2c	mapper/batcher: add regression tests for timer leak and Close lifecycle Add four unit tests guarding fixes introduced in recent commits: - TestConnectionEntry_SendFastPath_TimerStopped: verifies the time.NewTimer fix (H1) does not leak goroutines after many fast-path sends on a buffered channel. - TestBatcher_CloseWaitsForWorkers: verifies Close() blocks until all worker goroutines exit (H3), preventing sends on torn-down channels. - TestBatcher_CloseThenStartIsNoop: verifies the one-shot lifecycle contract; Start() after Close() must not spawn new goroutines. - TestBatcher_CloseStopsTicker: verifies Close() stops the internal ticker to prevent resource leaks. Updates #2545	2026-03-14 02:52:28 -07:00
Kristoffer Dalby	57a38b5678	mapper/batcher: reduce hot-path log verbosity Remove Caller(), channel pointer formatting (fmt.Sprintf("%p",...)), and mutex timing from send(), addConnection(), and removeConnectionByChannel(). Move per-broadcast summary and no-connection logs from Debug to Trace. Remove per-connection "attempting"/"succeeded" logs entirely; keep Warn for failures. These methods run on every MapResponse delivery, so the savings compound quickly under load. Updates #2545	2026-03-14 02:52:28 -07:00
Kristoffer Dalby	051a38a4c4	mapper/batcher: track worker goroutines and stop ticker on Close Close() previously closed the done channel and returned immediately, without waiting for worker goroutines to exit. This caused goroutine leaks in tests and allowed workers to race with connection teardown. The ticker was also never stopped, leaking its internal goroutine. Add a sync.WaitGroup to track the doWork goroutine and every worker it spawns. Close() now calls wg.Wait() after signalling shutdown, ensuring all goroutines have exited before tearing down connections. Also stop the ticker to prevent resource leaks. Document that a Batcher must not be reused after Close().	2026-03-14 02:52:28 -07:00
Kristoffer Dalby	3276bda0c0	mapper/batcher: replace time.After with NewTimer to avoid timer leak connectionEntry.send() is on the hot path: called once per connection per broadcast tick. time.After allocates a timer that sits in the runtime timer heap until it fires (50 ms), even when the channel send succeeds immediately. At 1000 connected nodes, every tick leaks 1000 timers into the heap, creating continuous GC pressure. Replace with time.NewTimer + defer timer.Stop() so the timer is removed from the heap as soon as the fast-path send completes.	2026-03-14 02:52:28 -07:00
Kristoffer Dalby	2058343ad6	mapper: remove Batcher interface, rename to Batcher struct Remove the Batcher interface since there is only one implementation. Rename LockFreeBatcher to Batcher and merge batcher_lockfree.go into batcher.go. Drop type assertions in debug.go now that mapBatcher is a concrete *mapper.Batcher pointer.	2026-03-14 02:52:28 -07:00
Kristoffer Dalby	9b24a39943	mapper/batcher: add scale benchmarks Add benchmarks that systematically test node counts from 100 to 50,000 to identify scaling limits and validate performance under load.	2026-03-14 02:52:28 -07:00
Kristoffer Dalby	3ebe4d99c1	mapper/batcher: reduce lock contention with two-phase send Rewrite multiChannelNodeConn.send() to use a two-phase approach: 1. RLock: snapshot connections slice (cheap pointer copy) 2. Unlock: send to all connections (50ms timeouts happen here) 3. Lock: remove failed connections by pointer identity Previously, send() held the write lock for the entire duration of sending to all connections. With N stale connections each timing out at 50ms, this blocked addConnection/removeConnection for N×50ms. The two-phase approach holds the lock only for O(N) pointer operations, not for N×50ms I/O waits.	2026-03-14 02:52:28 -07:00
Kristoffer Dalby	da33795e79	mapper/batcher: fix race conditions in cleanup and lookups Replace the two-phase Load-check-Delete in cleanupOfflineNodes with xsync.Map.Compute() for atomic check-and-delete. This prevents the TOCTOU race where a node reconnects between the hasActiveConnections check and the Delete call. Add nil guards on all b.nodes.Load() and b.nodes.Range() call sites to prevent nil pointer panics from concurrent cleanup races.	2026-03-14 02:52:28 -07:00
Kristoffer Dalby	57070680a5	mapper/batcher: restructure internals for correctness Move per-node pending changes from a shared xsync.Map on the batcher into multiChannelNodeConn, protected by a dedicated mutex. The new appendPending/drainPending methods provide atomic append and drain operations, eliminating data races in addToBatch and processBatchedChanges. Add sync.Once to multiChannelNodeConn.close() to make it idempotent, preventing panics from concurrent close calls on the same channel. Add started atomic.Bool to guard Start() against being called multiple times, preventing orphaned goroutines. Add comprehensive concurrency tests validating these changes.	2026-03-14 02:52:28 -07:00
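Here is a Go sketch of idempotent close via `sync.Once`, on a hypothetical trimmed `nodeConn`. Only the first call runs the builtin `close`, so concurrent callers cannot trigger a "close of closed channel" panic.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeConn is a hypothetical, trimmed multiChannelNodeConn.
type nodeConn struct {
	closeOnce sync.Once
	updates   chan string
}

// close is idempotent: only the first call closes the channel, so concurrent
// or repeated calls cannot panic.
func (n *nodeConn) close() {
	n.closeOnce.Do(func() { close(n.updates) })
}

func main() {
	n := &nodeConn{updates: make(chan string)}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			n.close()
		}()
	}
	wg.Wait()
	_, open := <-n.updates
	fmt.Println("channel open:", open) // channel open: false
}
```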
Kristoffer Dalby	21e02e5d1f	mapper/batcher: add unit tests and benchmarks Add comprehensive unit tests for the LockFreeBatcher covering AddNode/RemoveNode lifecycle, addToBatch routing (broadcast, targeted, full update), processBatchedChanges deduplication, cleanup of offline nodes, close/shutdown behavior, IsConnected state tracking, and connected map consistency. Add benchmarks for connection entry send, multi-channel send and broadcast, peer diff computation, sentPeers updates, addToBatch at various scales (10/100/1000 nodes), processBatchedChanges, broadcast delivery, IsConnected lookups, connected map enumeration, connection churn, and concurrent send+churn scenarios. Widen setupBatcherWithTestData to accept testing.TB so benchmarks can reuse the same database-backed test setup as unit tests.	2026-03-14 02:52:28 -07:00
Kristoffer Dalby	3e0a96ec3a	all: fix test flakiness and improve test infrastructure Buffer the AuthRequest verdict channel to prevent a race where the sender blocks indefinitely if the receiver has already timed out, and increase the auth followup test timeout from 100ms to 5s to prevent spurious failures under load. Skip postgres-backed tests when the postgres server is unavailable instead of calling t.Fatal, which was preventing the rest of the test suite from running. Add TestMain to db, types, and policy/v2 packages to chdir to the source directory before running tests. This ensures relative testdata/ paths resolve correctly when the test binary is executed from an arbitrary working directory (e.g., via "go tool stress").	2026-03-14 02:52:28 -07:00
DM	fffc58b5d0	poll: fix poll test linter violations	2026-03-12 01:27:34 -07:00
DM	4aca9d6568	poll: stop stale map sessions through an explicit teardown hook When stale-send cleanup prunes a connection from the batcher, the old serveLongPoll session needs an explicit stop signal. Pass a stop hook into AddNode and trigger it when that connection is removed, so the session exits through its normal cancel path instead of relying on channel closure from the batcher side.	2026-03-12 01:27:34 -07:00
DM	3daf45e88a	mapper: close stale map channels after send timeouts When the batcher timed out sending to a node, it removed the channel from multiChannelNodeConn but left the old serveLongPoll goroutine running on that channel. That left a live stale session behind: it no longer received new updates, but it could still keep the stream open and block shutdown. Close the pruned channel when stale-send cleanup removes it so the old map session exits after draining any buffered update.	2026-03-12 01:27:34 -07:00
DM	b81d6c734d	mapper: handle RemoveNode after channel cleanup A connection can already be removed from multiChannelNodeConn by the stale-send cleanup path before serveLongPoll reaches its deferred RemoveNode call. In that case RemoveNode used to return early on "channel not found" and never updated the node's connected state. Drop that early return so RemoveNode still checks whether any active connections remain and marks the node disconnected when the last one is gone.	2026-03-12 01:27:34 -07:00
DM	8423af2732	Swap favicon for updated version	2026-03-03 05:59:40 +01:00
Florian Preinstorfer	9baa795ddb	Update docs for auth-id changes - Replace "headscale nodes register" with "headscale auth register" - Update from registration key to Auth ID - Fix API example to register a node	2026-03-01 13:38:22 +01:00
Kristoffer Dalby	6c59d3e601	policy/v2: add SSH compatibility testdata from Tailscale SaaS Add 39 test fixtures captured from Tailscale SaaS API responses to validate SSH policy compilation parity. Each JSON file contains the SSH policy section and expected compiled SSHRule arrays for 5 test nodes (3 user-owned, 2 tagged). Test series: SSH-A (basic), SSH-B (specific sources), SSH-C (destination combos), SSH-D (localpart), SSH-E (edge cases), SSH-F (multi-rule), SSH-G (acceptEnv). The data-driven TestSSHDataCompat harness uses cmp.Diff with principal order tolerance but strict rule ordering (first-match-wins semantics require exact order). Updates #3049	2026-02-28 05:14:11 -08:00
Kristoffer Dalby	0acf09bdd2	policy/v2: add localpart:@domain SSH user compilation Add support for localpart:@<domain> entries in SSH policy users. When a user SSHes into a target, their email local-part becomes the OS username (e.g. alice@example.com → OS user alice). Type system (types.go): - SSHUser.IsLocalpart() and ParseLocalpart() for validation - SSHUsers.LocalpartEntries(), NormalUsers(), ContainsLocalpart() - Enforces format: localpart:@<domain> (wildcard-only) - UserWildcard.Resolve for user:@domain SSH source aliases - acceptEnv passthrough for SSH rules Compilation (filter.go): - resolveLocalparts: pure function mapping users to local-parts by email domain. No node walking, easy to test. - groupSourcesByUser: single walk producing per-user principals with sorted user IDs, and tagged principals separately. - ipSetToPrincipals: shared helper replacing 6 inline copies. - selfPrincipalsForNode: self-access using pre-computed byUser. The approach separates data gathering from rule assembly. Localpart rules are interleaved per source user to match Tailscale SaaS first-match-wins ordering. Updates #3049	2026-02-28 05:14:11 -08:00
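A minimal Go sketch of the pure resolution step follows. It assumes users are (ID, email) pairs and that the domain comparison is case-insensitive, which the commit does not specify. `parseLocalpart` and `resolveLocalparts` here are illustrations, not headscale's signatures.

```go
package main

import (
	"fmt"
	"strings"
)

// user is a hypothetical record with only the fields this sketch needs.
type user struct {
	ID    uint
	Email string
}

// parseLocalpart validates the wildcard-only form "localpart:@<domain>" and
// returns the domain.
func parseLocalpart(entry string) (string, bool) {
	const prefix = "localpart:@"
	if !strings.HasPrefix(entry, prefix) {
		return "", false
	}
	domain := entry[len(prefix):]
	if domain == "" || strings.Contains(domain, "@") {
		return "", false
	}
	return domain, true
}

// resolveLocalparts maps each user whose email is in domain to the local part
// that becomes the OS username. It walks no nodes, so it is easy to test.
// Case-insensitive domain matching is an assumption of this sketch.
func resolveLocalparts(users []user, domain string) map[uint]string {
	out := make(map[uint]string)
	for _, u := range users {
		local, dom, ok := strings.Cut(u.Email, "@")
		if !ok || local == "" || !strings.EqualFold(dom, domain) {
			continue
		}
		out[u.ID] = local
	}
	return out
}

func main() {
	users := []user{{1, "alice@example.com"}, {2, "bob@other.org"}}
	if domain, ok := parseLocalpart("localpart:@example.com"); ok {
		fmt.Println(resolveLocalparts(users, domain)) // map[1:alice]
	}
}
```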
QEDeD	414d3bbbd8	Fix typo in comment about fsnotify behavior Correct loose (opposite of tight) to lose (opposite of keep).	2026-02-27 15:23:06 +01:00
DM	610c1daa4d	types: avoid NodeView clone in CanAccess NodeView.CanAccess called node2.AsStruct() on every check. In peer-map construction we run CanAccess in O(n^2) pair scans (often twice per pair), so that per-call clone added up to heavy heap churn.	2026-02-26 19:15:07 -08:00
Kristoffer Dalby	3db0a483ed	integration: add SSH check mode tests Add ReadLog method to headscale integration container for log inspection. Split SSH check mode tests into CLI and OIDC variants and add comprehensive test coverage: - TestSSHOneUserToOneCheckModeCLI: basic check mode with CLI approval - TestSSHOneUserToOneCheckModeOIDC: check mode with OIDC approval - TestSSHCheckModeUnapprovedTimeout: rejection on cache expiry - TestSSHCheckModeCheckPeriodCLI: session expiry and re-auth - TestSSHCheckModeAutoApprove: auto-approval within check period - TestSSHCheckModeNegativeCLI: explicit rejection via CLI Update existing integration tests to use headscale auth register. Updates #1850	2026-02-25 21:28:05 +01:00
Kristoffer Dalby	7bab8da366	state, policy, noise: implement SSH check period auto-approval Add SSH check period tracking so that recently authenticated users are auto-approved without requiring manual intervention each time. Introduce SSHCheckPeriod type with validation (min 1m, max 168h, "always" for every request) and encode the compiled check period as URL query parameters in the HoldAndDelegate URL. The SSHActionHandler checks recorded auth times before creating a new HoldAndDelegate flow. Auth timestamps are stored in-memory: - Default period (no explicit checkPeriod): auth covers any destination, keyed by source node with Dst=0 sentinel - Explicit period: auth covers only that specific destination, keyed by (source, destination) pair Auth times are cleared on policy changes. Updates #1850	2026-02-25 21:28:05 +01:00
Kristoffer Dalby	48cc98b787	hscontrol, cli: add auth register and approve commands Implement AuthRegister and AuthApprove gRPC handlers and add corresponding CLI commands (headscale auth register, approve, reject) for managing pending auth requests including SSH check approvals. Updates #1850	2026-02-25 21:28:05 +01:00
Kristoffer Dalby	107c2f2f70	policy, noise: implement SSH check action Implement the SSH "check" action which requires additional verification before allowing SSH access. The policy compiler generates a HoldAndDelegate URL that the Tailscale client calls back to headscale. The SSHActionHandler creates an auth session and waits for approval via the generalised auth flow. Sort check (HoldAndDelegate) rules before accept rules to match Tailscale's first-match-wins evaluation order. Updates #1850	2026-02-25 21:28:05 +01:00
Kristoffer Dalby	4a7e1475c0	templates: generalise auth templates for web and OIDC Extract shared HTML/CSS design into a common template and create generalised auth success and web auth templates that work for both node registration and SSH check authentication flows. Updates #1850	2026-02-25 21:28:05 +01:00
Kristoffer Dalby	cb3b6949ea	auth: generalise auth flow and introduce AuthVerdict Generalise the registration pipeline to a more general auth pipeline supporting both node registrations and SSH check auth requests. Rename RegistrationID to AuthID, unexport AuthRequest fields, and introduce AuthVerdict to unify the auth finish API. Add the urlParam generic helper for extracting typed URL parameters from chi routes, used by the new auth request handler. Updates #1850	2026-02-25 21:28:05 +01:00
Kristoffer Dalby	30338441c1	app: switch from gorilla to chi mux Replace gorilla/mux with go-chi/chi as the HTTP router and add a custom zerolog-based request logger to replace chi's default stdlib-based middleware.Logger, consistent with the rest of the application. Updates #1850	2026-02-25 21:28:05 +01:00
Kristoffer Dalby	8048f10d13	hscontrol/state: extract findExistingNodeForPAK to reduce complexity Extract the existing-node lookup logic from HandleNodeFromPreAuthKey into a separate method. This reduces the cyclomatic complexity from 32 to 28, below the gocyclo limit of 30. Updates #3077	2026-02-20 21:51:00 +01:00
Kristoffer Dalby	1e4fc3f179	hscontrol: add tests for deleting users with tagged nodes Test the tagged-node-survives-user-deletion scenario at two layers: DB layer (users_test.go): - success_user_only_has_tagged_nodes: tagged nodes with nil user_id do not block user deletion and survive it - error_user_has_tagged_and_owned_nodes: user-owned nodes still block deletion even when tagged nodes coexist App layer (grpcv1_test.go): - TestDeleteUser_TaggedNodeSurvives: full registration flow with tagged PreAuthKey verifies nil UserID after registration, absence from nodesByUser index, user deletion succeeds, and tagged node remains in global node list Also update auth_tags_test.go assertions to expect nil UserID on tagged nodes, consistent with the new invariant. Updates #3077	2026-02-20 21:51:00 +01:00


529 Commits