compileFilterRules checked only pol.ACLs == nil to decide whether
to return FilterAllowAll (permit-any). Policies that use only Grants
(no ACLs) had nil ACLs, so the function short-circuited before
compiling any CapGrant rules. This meant cap/relay, cap/drive, and
any other App-based grant capabilities were silently ignored.
Check both ACLs and Grants are empty before returning FilterAllowAll.
Updates #2180
Via grants steer routes to specific nodes per viewer. Until now,
all clients saw the same routes for each peer because route
assembly was viewer-independent. This implements per-viewer route
visibility so that via-designated peers serve routes only to
matching viewers, while non-designated peers have those routes
withdrawn.
Add ViaRouteResult type (Include/Exclude prefix lists) and
ViaRoutesForPeer to the PolicyManager interface. The v2
implementation iterates via grants, resolves sources against the
viewer, matches destinations against the peer's advertised routes
(both subnet and exit), and categorizes prefixes by whether the
peer has the via tag.
Add RoutesForPeer to State which composes global primary election,
via Include/Exclude filtering, exit routes, and ACL reduction.
When no via grants exist, it falls back to existing behavior.
Update the mapper to call RoutesForPeer per-peer instead of using
a single route function for all peers. The route function now
returns all routes (subnet + exit), and TailNode filters exit
routes out of the PrimaryRoutes field for HA tracking.
Updates #2180
compileViaGrant only handled *Prefix destinations, skipping
*AutoGroup entirely. This meant via grants with
dst=[autogroup:internet] produced no filter rules even when the
node was an exit node with approved exit routes.
Switch the destination loop from a type assertion to a type switch
that handles both *Prefix (subnet routes) and *AutoGroup (exit
routes via autogroup:internet). Also check ExitRoutes() in
addition to SubnetRoutes() so the function doesn't bail early
when a node only has exit routes.
Updates #2180
Add autogroup:danger-all as a valid source alias that matches ALL IP
addresses including non-Tailscale addresses. When used as a source,
it resolves to 0.0.0.0/0 + ::/0 internally but produces SrcIPs: ["*"]
in filter rules. When used as a destination, it is rejected with an
error matching Tailscale SaaS behavior.
Key changes:
- Add AutoGroupDangerAll constant and validation
- Add sourcesHaveDangerAll() helper and hasDangerAll parameter to
srcIPsWithRoutes() across all compilation paths
- Add ErrAutogroupDangerAllDst for destination rejection
- Remove 3 AUTOGROUP_DANGER_ALL skip entries (K6, K7, K8)
Updates #2180
Add five categories of grant validation that Tailscale enforces:
1. Capability name format: reject URL schemes (://) and restrict
tailscale.com domain to an allowlist of user-grantable caps.
2. Grant-specific autogroup:self: reject wildcard (*) sources with
autogroup:self destinations (stricter than ACL rules since * includes
tags which cannot use autogroup:self).
3. App + autogroup:internet: reject app grants targeting
autogroup:internet.
4. Raw default route CIDRs: reject 0.0.0.0/0 and ::/0 as grant
destinations, requiring "*" or "autogroup:internet" instead.
5. Via field: non-tag values (e.g. autogroup:tagged) are caught at
unmarshal time by Tag.UnmarshalJSON validation.
This resolves 23 ERROR_VALIDATION_GAP + 1 via validation test, reducing
the grant compat skip list from 28 to 5 remaining tests.
Updates #2180
Tailscale SaaS accepts grants with empty src=[] or dst=[] arrays,
producing no filter rules for any node. Headscale previously rejected
these with validation errors.
Remove the empty source/destination validation checks and add an
early return in compileGrantWithAutogroupSelf when the grant has
literally empty Sources or Destinations arrays. This is distinct
from sources that resolve to empty (e.g., group:empty) where
Tailscale still produces CapGrant rules with empty SrcIPs.
Updates #2180
Compile grants with "via" field into FilterRules that are placed only
on nodes matching the via tag and actually advertising the destination
subnets. Key behavior:
- Filter rules go exclusively to via-nodes with matching approved routes
- Destination subnets not advertised by the via node are silently dropped
- App-only via grants (no ip field) produce no packet filter rules
- Via grants are skipped in the global compileFilterRules since they
are node-specific
Reduces grant compat test skips from 41 to 30 (11 newly passing).
Updates #2180
Compile grant app fields into CapGrant FilterRules matching Tailscale
SaaS behavior. Key changes:
- Generate CapGrant rules in compileFilterRules and
compileGrantWithAutogroupSelf, with node-specific /32 and /128
Dsts for autogroup:self grants
- Add reversed companion rules for drive→drive-sharer and
relay→relay-target capabilities, ordered by original cap name
- Narrow broad CapGrant Dsts to node-specific prefixes in
ReduceFilterRules via new reduceCapGrantRule helper
- Skip merging CapGrant rules in mergeFilterRules to preserve
per-capability structure
- Remove ip+app mutual exclusivity validation (Tailscale accepts both)
- Add semantic JSON comparison for RawMessage types and netip.Prefix
comparators in test infrastructure
Reduces grant compat test skips from 99 to 41 (58 newly passing).
Updates #2180
When an ACL source list contains a wildcard (*) alongside explicit
sources (tags, groups, hosts, etc.), Tailscale preserves the individual
IPs from non-wildcard sources in SrcIPs alongside the merged wildcard
CGNAT ranges. Previously, headscale's IPSetBuilder would merge all
sources into a single set, absorbing the explicit IPs into the wildcard
range.
Track non-wildcard resolved addresses separately during source
resolution, then append their individual IP strings to the output
when a wildcard is also present. This fixes the remaining 5 ACL
compat test failures (K01 and M06 subtests).
Updates #2180
When an ACL has non-autogroup destinations (groups, users, tags, hosts)
alongside autogroup:self, emit non-self grants before self grants to
match Tailscale's filter rule ordering. ACLs with only autogroup
destinations (self + member) preserve the policy-defined order.
This fixes ACL-A17, ACL-SF07, and ACL-SF11 compat test failures.
Updates #2180
Remove 10 grant skip entries for subnet route filter rule generation.
These tests now pass after the exit route exclusion fix in
ReduceFilterRules, which correctly handles routable IPs overlap
for subnet-router nodes.
Updates skip count from 207 to 197 (v1) and 109 to 99 (v2),
with 10 additional tests now expected to pass.
Updates #2180
Exit nodes handle traffic via AllowedIPs/routing, not packet filter
rules. Skip exit routes (0.0.0.0/0, ::/0) when checking RoutableIPs
overlap in ReduceFilterRules, matching Tailscale SaaS behavior where
exit nodes do not receive filter rules for destinations that only
overlap via exit routes.
Updates #2180
Per Tailscale documentation, the wildcard (*) source includes "any
approved subnets" — the actually-advertised-and-approved routes from
nodes, not the autoApprover policy prefixes.
Change Asterix.resolve() to return just the base CGNAT+ULA set, and
add approved subnet routes as separate SrcIPs entries in the filter
compilation path. This preserves individual route prefixes that would
otherwise be merged by IPSet (e.g., 10.0.0.0/8 absorbing 10.33.0.0/16).
Also swap rule ordering in compileGrantWithAutogroupSelf() to emit
non-self destination rules before autogroup:self rules, matching the
Tailscale FilterRule wire format ordering.
Remove the unused AutoApproverPolicy.prefixes() method.
Updates #2180
Add routable_ips and approved_routes fields to the node topology
definitions in all golden test files. These represent the subnet
routes actually advertised by nodes on the Tailscale SaaS network
during data capture:
Routes topology (92 files, 6 router nodes):
big-router: 10.0.0.0/8
subnet-router: 10.33.0.0/16
ha-router1: 192.168.1.0/24
ha-router2: 192.168.1.0/24
multi-router: 172.16.0.0/24
exit-node: 0.0.0.0/0, ::/0
ACL topology (199 files, 1 router node):
subnet-router: 10.33.0.0/16
Grants topology (203 files, 1 router node):
subnet-router: 10.33.0.0/16
The route assignments were deduced from the golden data by analyzing
which router nodes receive FilterRules for which destination CIDRs
across all test files, and cross-referenced with the MTS setup
script (setup_grant_nodes.sh).
Updates #2180
Use ip.String() instead of netip.PrefixFrom(ip, ip.BitLen()).String()
when building DstPorts for autogroup:self destinations. This produces
bare IPs like "100.90.199.68" instead of CIDR notation like
"100.90.199.68/32", matching the Tailscale FilterRule wire format.
Updates #2180
Remove 91 entries from grantSkipReasons that are now passing:
- 90 MISSING_IPV6_ADDRS entries (identity aliases now include IPv6)
- 1 RAW_IPV6_ADDR_EXPANSION entry (address aliases no longer expand)
Move GRANT-P09_12B from the removed MISSING_IPV6_ADDRS category to
SUBNET_ROUTE_FILTER_RULES, which is its remaining failure mode.
Updates #2180
Now that AppendToIPSet includes both IPv4 and IPv6, tests with
nodes that have IPv6 addresses produce additional entries in SrcIPs
and DstPorts. Update the expected values accordingly.
Updates #2180
AppendToIPSet now adds both IPv4 and IPv6 addresses for nodes,
matching Tailscale's FilterRule wire format where identity-based
aliases (tags, users, groups, autogroups) resolve to both address
families.
Address-based aliases (raw IPs, host names) are unchanged: they
resolve to exactly the literal prefix. The appendIfNodeHasIP helper
that incorrectly expanded address aliases to include the matching
node's other IPs is removed, fixing the RAW_IPV6_ADDR_EXPANSION
bug where a raw fd7a: IPv6 address would incorrectly include the
node's IPv4.
Updates #2180
Replace the headscale-adapted routes golden files with authoritative
captures from Tailscale SaaS using the 12-node topology (8 original
grant nodes + 4 new route-specific nodes: ha-router1, ha-router2,
big-router, multi-router).
The golden data was captured via debug-packet-filter-rules from all
12 nodes. The routes driver now falls back to the standard 3-user
setup when topology.users is absent (matching the SaaS capture
format) and converts @passkey/@dalby.cc emails to @example.com.
92 test cases captured, all valid JSON, all from Tailscale SaaS.
Updates #2180
Replace the headscale-adapted ACL golden files with authoritative
captures from Tailscale SaaS using the 8-node grant topology.
The golden data was captured via debug-packet-filter-rules (FilterRule
wire format) from each of the 8 nodes after pushing each ACL policy
to the Tailscale API. This gives us the exact format Tailscale sends
to clients:
- SrcIPs use IP ranges (100.64.0.0-100.115.91.255) not CIDRs
- SrcIPs include subnet routes (10.33.0.0/16) for wildcard sources
- IPProto is omitted for default all-protocol rules
- DstPorts use bare IPs without /32 suffix
- Identity aliases include both IPv4 and IPv6 addresses
The test driver is updated to use the 8-node topology (3 users,
5 tagged nodes) matching the grant compat tests, with the same
email conversion (kratail2tid@passkey -> @example.com).
215 test cases: 199 success + 16 error (captured from API 400s).
All captured from Tailscale SaaS, no headscale-adapted values.
Updates #2180
Replace 8,286 lines of inline Go struct test expectations in
tailscale_routes_compat_test.go with 92 JSON golden files in
testdata/routes_results/ROUTES-*.json and a ~300-line Go driver in
tailscale_routes_data_compat_test.go.
Unlike the ACL and grants compat tests which use shared hardcoded node
topologies, the routes driver builds nodes from JSON topology data.
Each test file embeds its full topology including routable_ips and
approved_routes, making test files self-contained. This naturally
handles the IPv6 tests which use a different 4-node topology from the
standard 9-node setup.
Test count is preserved: 92 test cases across 19 original test
functions (SubnetBasics, ExitNodes, HARouters, FilterPlacement,
RouteCoverage, Overlapping, TagResolution, ProtocolPort, IPv6,
EdgeCases, AutoApprover, and additional variants).
Updates #2180
Replace 9,937 lines of inline Go struct test expectations in
tailscale_acl_compat_test.go with 215 JSON golden files in
testdata/acl_results/ACL-*.json and a ~400-line Go driver in
tailscale_acl_data_compat_test.go.
This matches the pattern used by the grants compat tests
(testdata/grant_results/GRANT-*.json + tailscale_grants_compat_test.go)
and the SSH compat tests (testdata/ssh_results/SSH-*.json +
tailscale_ssh_data_compat_test.go).
The JSON golden files contain the same test expectations as the
original Go file, preserving the Tailscale SaaS reference data.
The expectations are NOT adapted to match headscale current output —
they represent the target behavior.
Test count is preserved: 215 test cases (203 success + 12 error).
Updates #2180
- TestTagUserMutualExclusivity and TestUserToTagCrossIdentityGrant:
add ReduceFilterRules after compileFilterRulesForNode to match the
production filter pipeline in filterForNodeLocked. The compilation
step produces global rules for all ACLs; ReduceFilterRules strips
them down to only rules where the node is a destination.
- containsSrcIP/containsIP helpers: use util.ParseIPSet to handle
IP range strings like "100.64.0.1-100.64.0.3" produced by
ipSetToStrings when contiguous IPs are coalesced.
Updates #2180
Fix several test issues exposed by the ResolvedAddresses refactor:
- TestTagUserMutualExclusivity: remove incorrect ACL rule that was
testing the wrong invariant. The test now correctly validates that
without an explicit cross-identity grant, user-owned nodes cannot
reach tagged nodes. Add TestUserToTagCrossIdentityGrant to verify
that explicit user@ -> tag:X ACL rules produce valid filter rules.
- TestResolvePolicy/wildcard-alias: update expected prefixes to match
the CGNAT range minus ChromeOS VM range (multiple prefixes instead
of the encompassing 100.64.0.0/10).
- TestApproveRoutesWithPolicy: fix user Name fields from "testuser@"
to "testuser" to match how resolveUser trims the @ suffix before
comparing against stored names.
Updates #2180
Fix three nil dereference issues in the policy resolution code:
- newResolvedAddresses: preserve partial IP results when errors occur
instead of discarding valid IPSets. Callers already handle errors
and nil results independently, so returning both allows partial
resolution (e.g. groups with phantom users) to work correctly.
- resolveTagOwners: guard against nil ResolvedAddresses before calling
Prefixes(), since Resolve may return nil when resolution fails.
- Asterix.resolve: guard against nil *Policy pointer, which occurs
when resolving wildcards without a policy context (e.g. in tests).
Updates #2180
Replace the monolithic SRCIPS_FORMAT skip category (125 tests) with 7
specific subcategories based on analysis of actual test failures:
MISSING_IPV6_ADDRS - 90 tests: identity aliases resolve to IPv4 only
SUBNET_ROUTE_FILTER_RULES - 10 tests: no rules for subnet-routed CIDRs
AUTOGROUP_SELF_CIDR_FORMAT - 4 tests: /32 and /128 suffix on DstPorts IPs
USER_PASSKEY_WILDCARD - 2 tests: user:*@passkey unresolvable
RAW_IPV6_ADDR_EXPANSION - 2 tests: raw IPv6 expanded to include IPv4
SRCIPS_WILDCARD_NODE_DEDUP - 1 test: wildcard+specific node IP dedup
Also reclassify tests that moved between categories after the CGNAT
split range fix (4 tests now passing, others recategorized into
CAPGRANT_COMPILATION, ERROR_VALIDATION_GAP, VIA_COMPILATION, etc).
Total: 207 skipped, 30 passing (was 193 skipped, 19 passing).
Update the grants compatibility test skip list with 23 new entries for
the V-series tests (V07 and V24 pass without skipping).
New skip categories introduced:
- VIA_COMPILATION (3): via routes with specific src identities where
there is no SrcIPs format issue (V11, V12, V13)
- Additional VIA_COMPILATION_AND_SRCIPS_FORMAT (3): via with wildcard
src (V17, V21, V23)
- Additional CAPGRANT_COMPILATION (6): app grants on specific tags,
drive cap, autogroup:self app (V02, V03, V06, V19, V20, V25)
- Additional CAPGRANT_COMPILATION_AND_SRCIPS_FORMAT (2): mixed ip+app
on specific tags rejected by headscale (V09, V10)
- Additional ERROR_VALIDATION_GAP (9): autogroup:internet + app,
raw 0.0.0.0/0 and ::/0 as grant dst (V01, V04, V05, V08, V14-V16,
V18, V22)
Test totals: 237 total, 21 pass, 216 skip, 0 fail.
Updates #2180
Add GRANT-V01 through GRANT-V25 JSON files captured from Tailscale SaaS
to fill coverage gaps in the grants compatibility test suite.
These tests cover:
- App grants on specific tags (not just wildcards)
- Mixed ip+app grants on specific tags
- Via routes with specific src identities (tags, groups, members)
- Via with multiple dst subnets and multiple via tags
- Drive cap with reverse drive-sharer generation
- autogroup:self with app grants
- autogroup:internet rejection with app grants
- Raw default route CIDR (0.0.0.0/0, ::/0) rejection as grant dst
Updates #2180
Add TestGrantsCompat, a data-driven test that validates headscale's
grants implementation against 212 test cases captured from Tailscale
SaaS. Each test case loads a GRANT-*.json file from testdata/, applies
the policy through headscale's engine, and compares the resulting
packet filter rules against Tailscale's actual output.
Currently 19 tests pass and 193 are skipped with documented reasons:
- SRCIPS_FORMAT (125): IP range formatting differences
- CAPGRANT_COMPILATION (41): app capability grants not yet compiled
- ERROR_VALIDATION_GAP (14): validation strictness differences
- CAPGRANT_AND_SRCIPS_FORMAT (9): combined ip+app grant issues
- VIA_AND_SRCIPS_FORMAT (4): via route compilation not implemented
- AUTOGROUP_DANGER_ALL (3): autogroup:danger-all not supported
- VALIDATION_STRICTNESS (2): empty src/dst array handling
Updates #2180
Add 212 GRANT-*.json test files captured from Tailscale SaaS to
testdata/grant_results/. Each file contains a policy with grants,
the expected packet_filter_rules for 8 test nodes, and the topology
used during capture.
These files serve as the ground truth for the data-driven grants
compatibility test.
Updates #2180
Rename the ACL compatibility test file to include 'acl' in the name,
making room for the upcoming grants compatibility test file.
Also fix a godoclint issue by adding a blank line between the file
header comment and the package declaration.
Updates #2180
Implement ProtocolPort.MarshalJSON to produce string format matching
UnmarshalJSON expectations (e.g. "tcp:443", "udp:10000-20000", "*").
Add comprehensive TestGrantMarshalJSON with 10 test cases:
- IP-based grants with TCP, UDP, ICMP, and wildcard protocols
- Single ports, port ranges, and wildcard ports
- Capability-based grants using app field
- Grants with both ip and app fields
- Grants with via field for route filtering
- Testing omitempty behavior for ip, app, and via fields
- JSON round-trip validation (marshal → unmarshal → compare)
Add omitempty tag to Grant.InternetProtocols to avoid marshaling
null when field is empty.
Updates #2180
Add test for aclToGrants() function that converts ACL rules to Grant
format. Tests conversion of:
- Single-port TCP rules
- Multiple ACL entries to multiple Grants
- Port ranges and multiple ports in a single rule
- Wildcard protocols
- UDP, ICMP, and other protocol types
Ensures backward compatibility by verifying that ACL rules are correctly
transformed to the new Grant format.
Updates #2180
Add comprehensive tests for Grant unmarshaling covering:
- Valid grants with ip field (network access)
- Valid grants with app field (capabilities)
- Wildcard port handling
- Port range parsing
- Error cases (missing fields, conflicting fields)
Updates #2180
Add support for the Grant policy format as an alternative to ACL format,
following Tailscale's policy v2 specification. Grants provide a more
structured way to define network access rules with explicit separation
of IP-based and capability-based permissions.
Key changes:
- Add Grant struct with Sources, Destinations, InternetProtocols (ip),
and App (capabilities) fields
- Add ProtocolPort type for unmarshaling protocol:port strings
- Add Grant validation in Policy.validate() to enforce:
- Mutual exclusivity of ip and app fields
- Required ip or app field presence
- Non-empty sources and destinations
- Refactor compileFilterRules to support both ACLs and Grants
- Convert ACLs to Grants internally via aclToGrants() for unified
processing
- Extract destinationsToNetPortRange() helper for cleaner code
- Rename parseProtocol() to toIANAProtocolNumbers() for clarity
- Add ProtocolNumberToName mapping for reverse lookups
The Grant format allows policies to be written using either the legacy
ACL format or the new Grant format. ACLs are converted to Grants
internally, ensuring backward compatibility while enabling the new
format's benefits.
Updates #2180
Add a highly visible ASCII-art warning banner that is printed at
startup when the configured IP prefixes fall outside the standard
Tailscale CGNAT (100.64.0.0/10) or ULA (fd7a:115c:a1e0::/48) ranges.
The warning fires once even if both v4 and v6 are non-standard, and
the warnBanner() function is reusable for other critical configuration
warnings in the future.
Also updates config-example.yaml to clarify that subsets of the
default ranges are fine, but ranges outside CGNAT/ULA are not.
Closes#3055
Three corrections to issue tests that had wrong assumptions about
when data becomes available:
1. initial_map_should_include_peer_online_status: use WaitForCondition
instead of checking the initial netmap. Online status is set by
Connect() which sends a PeerChange patch after the initial
RegisterResponse, so it may not be present immediately.
2. disco_key_should_propagate_to_peers: use WaitForCondition. The
DiscoKey is sent in the first MapRequest (not RegisterRequest),
so peers may not see it until a subsequent map update.
3. approved_route_without_announcement: invert the test expectation.
Tailscale uses a strict advertise-then-approve model -- routes are
only distributed when the node advertises them (Hostinfo.RoutableIPs)
AND they are approved. An approval without advertisement is a dormant
pre-approval. The test now asserts the route does NOT appear in
AllowedIPs, matching upstream Tailscale semantics.
Also fix TestClient.Reconnect to clear the cached netmap and drain
pending updates before re-registering. Without this, WaitForPeers
returned immediately based on the old session's stale data.
Two fixes to how online status is handled during registration:
1. Re-registration (applyAuthNodeUpdate, HandleNodeFromPreAuthKey) no
longer resets IsOnline to false. Online status is managed exclusively
by Connect()/Disconnect() in the poll session lifecycle. The reset
caused a false offline blip: the auth handler's change notification
triggered a map regeneration showing the node as offline to peers,
even though Connect() would set it back to true moments later.
2. New node creation (createAndSaveNewNode) now explicitly sets
IsOnline=false instead of leaving it nil. This ensures peers always
receive a known online status rather than an ambiguous nil/unknown.
When a node disconnects, serveLongPoll defers a cleanup that starts a
grace period goroutine. This goroutine polls batcher.IsConnected() and,
if the node has not reconnected within ~10 seconds, calls
state.Disconnect() to mark it offline. A TOCTOU race exists: the node
can reconnect (calling Connect()) between the IsConnected check and
the Disconnect() call, causing the stale Disconnect() to overwrite
the new session's online status.
Fix with a monotonic per-node generation counter:
- State.Connect() increments the counter and returns the current
generation alongside the change list.
- State.Disconnect() accepts the generation from the caller and
rejects the call if a newer generation exists, making stale
disconnects from old sessions a no-op.
- serveLongPoll captures the generation at Connect() time and passes
it to Disconnect() in the deferred cleanup.
- RemoveNode's return value is now checked: if another session already
owns the batcher slot (reconnect happened), the old session skips
the grace period entirely.
Update batcher_test.go to track per-node connect generations and
pass them through to Disconnect(), matching production behavior.
Fixes the following test failures:
- server_state_online_after_reconnect_within_grace
- update_history_no_false_offline
- nodestore_correct_after_rapid_reconnect
- rapid_reconnect_peer_never_sees_offline
Add three test files designed to stress the control plane under
concurrent and adversarial conditions:
- race_test.go: 14 tests exercising concurrent mutations, session
replacement, batcher contention, NodeStore access, and map response
delivery during disconnect. All pass the Go race detector.
- poll_race_test.go: 8 tests targeting the poll.go grace period
interleaving. These confirm a logical TOCTOU race: when a node
disconnects and reconnects within the grace period, the old
session's deferred Disconnect() can overwrite the new session's
Connect(), leaving IsOnline=false despite an active poll session.
- stress_test.go: sustained churn, rapid mutations, rolling
replacement, data integrity checks under load, and verification
that rapid reconnects do not leak false-offline notifications.
Known failing tests (grace period TOCTOU race):
- server_state_online_after_reconnect_within_grace
- update_history_no_false_offline
- rapid_reconnect_peer_never_sees_offline
Split TestIssues into 7 focused test functions to stay under cyclomatic
complexity limits while testing more aggressively.
Issues surfaced (4 failing tests):
1. initial_map_should_include_peer_online_status: Initial MapResponse
has Online=nil for peers. Online status only arrives later via
PeersChangedPatch.
2. disco_key_should_propagate_to_peers: DiscoPublicKey set by client
is not visible to peers. Peers see zero disco key.
3. approved_route_without_announcement_is_visible: Server-side route
approval without client-side announcement silently produces empty
SubnetRoutes (intersection of empty announced + approved = empty).
4. nodestore_correct_after_rapid_reconnect: After 5 rapid reconnect
cycles, NodeStore reports node as offline despite having an active
poll session. The connect/disconnect grace period interleaving
leaves IsOnline in an incorrect state.
Passing tests (20) verify:
- IP uniqueness across 10 nodes
- IP stability across reconnect
- New peers have addresses immediately
- Node rename propagates to peers
- Node delete removes from all peer lists
- Hostinfo changes (OS field) propagate
- NodeStore/DB consistency after route mutations
- Grace period timing (8-20s window)
- Ephemeral node deletion (not just offline)
- 10-node simultaneous connect convergence
- Rapid sequential node additions
- Reconnect produces complete map
- Cross-user visibility with default policy
- Same-user multiple nodes get distinct IDs
- Same-hostname nodes get unique GivenNames
- Policy change during connect still converges
- DERP region references are valid
- User profiles present for self and peers
- Self-update arrives after route approval
- Route advertisement stored as AnnouncedRoutes
Extend the servertest harness with:
- TestClient.Direct() accessor for advanced operations
- TestClient.WaitForPeerCount and WaitForCondition helpers
- TestHarness.ChangePolicy for ACL policy testing
- AssertDERPMapPresent and AssertSelfHasAddresses
New test suites:
- content_test.go: self node, DERP map, peer properties, user profiles,
update history monotonicity, and endpoint update propagation
- policy_test.go: default allow-all, explicit policy, policy triggers
updates on all nodes, multiple policy changes, multi-user mesh
- ephemeral_test.go: ephemeral connect, cleanup after disconnect,
mixed ephemeral/regular, reconnect prevents cleanup
- routes_test.go: addresses in AllowedIPs, route advertise and approve,
advertised routes via hostinfo, CGNAT range validation
Also fix node_departs test to use WaitForCondition instead of
assert.Eventually, and convert concurrent_join_and_leave to
interleaved_join_and_leave with grace-period-tolerant assertions.
Add three test files exercising the servertest harness:
- lifecycle_test.go: connection, disconnection, reconnection, session
replacement, and mesh formation at various sizes.
- consistency_test.go: symmetric visibility, consistent peer state,
address presence, concurrent join/leave convergence.
- weather_test.go: rapid reconnects, flapping stability, reconnect
with various delays, concurrent reconnects, and scale tests.
All tests use table-driven patterns with subtests.
Add a new hscontrol/servertest package that provides a test harness
for exercising the full Headscale control protocol in-process, using
Tailscale's controlclient.Direct as the client.
The harness consists of:
- TestServer: wraps a Headscale instance with an httptest.Server
- TestClient: wraps controlclient.Direct with NetworkMap tracking
- TestHarness: orchestrates N clients against a single server
- Assertion helpers for mesh completeness, visibility, and consistency
Export minimal accessor methods on Headscale (HTTPHandler, NoisePublicKey,
GetState, SetServerURL, StartBatcher, StartEphemeralGC) so the servertest
package can construct a working server from outside the hscontrol package.
This enables fast, deterministic tests of connection lifecycle, update
propagation, and network weather scenarios without Docker.
processBatchedChanges queued each pending change for a node as a
separate work item. Since multiple workers pull from the same channel,
two changes for the same node could be processed concurrently by
different workers. This caused two problems:
1. MapResponses delivered out of order — a later change could finish
generating before an earlier one, so the client sees stale state.
2. updateSentPeers and computePeerDiff race against each other —
updateSentPeers does Clear() + Store() which is not atomic relative
to a concurrent Range() in computePeerDiff.
Bundle all pending changes for a node into a single work item so one
worker processes them sequentially. Add a per-node workMu that
serializes processing across consecutive batch ticks, preventing a
second worker from starting tick N+1 while tick N is still in progress.
Fixes#3140
The Batcher's connected field (*xsync.Map[types.NodeID, *time.Time])
encoded three states via pointer semantics:
- nil value: node is connected
- non-nil time: node disconnected at that timestamp
- key missing: node was never seen
This was error-prone (nil meaning 'connected' inverts Go idioms),
redundant with b.nodes + hasActiveConnections(), and required keeping
two parallel maps in sync. It also contained a bug in RemoveNode where
new(time.Now()) was used instead of &now, producing a zero time.
Replace the separate connected map with a disconnectedAt field on
multiChannelNodeConn (atomic.Pointer[time.Time]), tracked directly
on the object that already manages the node's connections.
Changes:
- Add disconnectedAt field and helpers (markConnected, markDisconnected,
isConnected, offlineDuration) to multiChannelNodeConn
- Remove the connected field from Batcher
- Simplify IsConnected from two map lookups to one
- Simplify ConnectedMap and Debug from two-map iteration to one
- Rewrite cleanupOfflineNodes to scan b.nodes directly
- Remove the markDisconnectedIfNoConns helper
- Update all tests and benchmarks
Fixes#3141